Word clouds are a visualization tool that highlights the relative frequency of words in a text. They have been around for more than 6 years now, and their popularity shows no sign of diminishing. According to Google Trends, word clouds are more relevant than ever:
In this post, I'll cover the strengths and weaknesses of word clouds in the context of qualitative text analysis. Here is what I found:
Weaknesses:
- Inaccurate because of design features
- Lost information
- Lack of context
Strengths:
- Simple to understand
- Easy to create
- Casual & visually appealing
Putting the Strengths and Weaknesses of Word Clouds to the Test
Let’s put the above assertions to the test. While researching word clouds, I found a great article about Basic Text Mining and Word Clouds.
It featured an inspiring word cloud of “War and Peace” that illustrated the point that while a word cloud highlights the main topics, it lacks context.
Easy to create
I made my own with Wordle. It took two minutes, and the tool is free. This way of making graphs is clearly accessible to everyone; there’s no need for fancy software.
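Part of what makes word clouds so easy to create is that the data behind one is just a word-frequency table, with font size growing with frequency. The Python sketch below illustrates that idea; it is not how Wordle works internally. The linear font-size rule and the helper names `word_frequencies` and `font_sizes` are assumptions made for illustration, and the sample sentence is invented.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lowercase word tokens: the frequency table behind a word cloud."""
    # Runs of letters and apostrophes count as words; anything else separates them.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def font_sizes(freqs: Counter, min_pt: float = 10.0, max_pt: float = 60.0) -> dict:
    """Map each word's count linearly onto [min_pt, max_pt] (an assumed rule)."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) / span * (max_pt - min_pt) for w, c in freqs.items()}


# Invented sample sentence, not text from the novel.
sample = "Pierre met Natasha. Natasha smiled at Pierre, and Pierre smiled back."
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
for word, count in freqs.most_common(3):
    print(f"{word}: {count} -> {sizes[word]:.0f}pt")
```

With a linear rule like this, any word whose count is close to the minimum is drawn near the minimum size, which is one way less frequent terms end up hard to read.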
The word clouds highlighted the words Pierre, Prince, Natasha, one, now and Andrew. At this point, I should disclose that I haven’t actually read “War and Peace”. For the purpose of this analysis, that’s a good thing, because I’m looking at the data with fresh eyes. If you have read the novel, you can judge the accuracy of the analysis for yourself.
Inaccurate because of design features
It’s easy to see that the visibility of terms changes across the different styles, especially for smaller words.
This already suggests that design choices make word clouds less accurate.
I went further and uploaded the whole novel into Keatext. That’s more than 555,000 words, which took about 25 minutes to analyze. Not bad, considering it takes people months to read. Our AI is trained on CX conversational text, such as open-ended survey answers and reviews, but it still did well processing literature.
Let’s have a look at the top topic results from Keatext:
Even at its most basic function, highlighting the most common topic, the word cloud failed. The top word, “man”, was barely visible in every version of the word clouds above. This is because the Keatext AI automatically grouped “man” and “men” together, while a word cloud counts each form as a separate word. This shows that with a simple word cloud, you can miss important information.
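The “man” versus “men” gap can be shown with a toy example. The sketch below counts surface forms the way a plain word cloud does, then counts them again after mapping variants onto one base form. The `BASE_FORM` dictionary is a hypothetical stand-in for a real lemmatizer and is not how Keatext groups terms; the token list is made up.

```python
from collections import Counter

# Invented token list for illustration, not counts from the novel.
tokens = ["man", "prince", "men", "prince", "man", "pierre", "men", "prince"]

# A plain word cloud counts every surface form on its own.
raw = Counter(tokens)

# Hypothetical normalization table mapping variant forms onto a base form.
# A real pipeline would use a lemmatizer; this dict is only a stand-in.
BASE_FORM = {"men": "man"}
grouped = Counter(BASE_FORM.get(t, t) for t in tokens)

print("raw:", raw.most_common(2))
print("grouped:", grouped.most_common(2))
```

In the raw count, “prince” comes out on top because “man” and “men” split their frequency between them; once the two forms are grouped, “man” leads.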
Lack of Context
I was intrigued to see that Keatext displayed “prince andrew” as a single topic. Based on the word cloud alone, we might have thought that Pierre was the prince, since “Pierre” and “Prince” are roughly the same size. As a CX-trained AI, Keatext was designed to give meaningful context to each topic.
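One simple way to keep multi-word names like “prince andrew” together is to count adjacent word pairs (bigrams) instead of single words. The sketch below uses that swapped-in technique; it is not Keatext’s topic detection, which this post doesn’t describe. The function name `bigram_counts` and the sample text are invented for illustration.

```python
import re
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs, without letting a pair span two sentences."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # zip pairs each token with the one that follows it.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented sample text, not a quote from the novel.
sample = "Prince Andrew spoke to Pierre. Pierre listened while Prince Andrew talked."
words = Counter(re.findall(r"[a-z]+", sample.lower()))
pairs = bigram_counts(sample)

print(words["prince"], words["andrew"], words["pierre"])
print(pairs.most_common(1))
```

In the sample, “prince”, “andrew” and “pierre” each appear twice, so a single-word cloud would draw them at the same size, while the pair count shows that “prince” belongs with “andrew”.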
Another useful feature is sentiment analysis. I remember reading on Wikipedia that none of the characters are clear heroes or villains. This shows in the sentiment overview: Natasha’s mentions are fairly balanced, while Pierre and Andrew have slightly more negative mentions. Judging by the way they are described, they come across as real people. Here are some bubble chart outputs directly from Keatext:
Poor Natasha seems to be having a rough time.
Having topics you can drill into means you can choose to look only at the related data.
Simple to understand
From what we’ve seen, word clouds fall short on accuracy in several ways. Yes, they are easy to grasp, but what do they actually convey? The information may be misleading, which is why they shouldn’t be used as analytics tools for serious research.
Casual & Visually Appealing
Sometimes even companies need to let their hair down when it comes to visualizations. If you really like word clouds, Keatext can help with the accuracy issues I mentioned above. With Keatext, you can easily export the data and make word clouds that focus on a specific topic, giving them context and even sentiment. Can you guess what this word cloud is about?
It seems affairs and family are a major source of embarrassment. I am once again struck by how relatable the feelings of the characters in the novel are.
This is what word clouds were meant to do all along: show the topic in its context.
Here’s another one about positive mentions of the men in “War and Peace”:
I find these word clouds wonderfully insightful because they are more specific than your average copy/paste job.
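A topic-focused word cloud amounts to filtering the exported mentions before counting words. The sketch below assumes a hypothetical export format (a list of records with `topic`, `sentiment` and `text` fields; Keatext’s real export may look different), a tiny hand-picked stopword list, and invented example sentences that are not quotes from the novel. The function name `focused_frequencies` is likewise made up.

```python
import re
from collections import Counter

# Hypothetical export: one record per mention. Field names are assumptions,
# and the sentences are invented examples, not quotes from the novel.
mentions = [
    {"topic": "natasha", "sentiment": "negative",
     "text": "Natasha felt ashamed of the affair."},
    {"topic": "natasha", "sentiment": "positive",
     "text": "Natasha sang and everyone was happy."},
    {"topic": "pierre", "sentiment": "negative",
     "text": "Pierre was ashamed before his family."},
]

# A tiny hand-picked stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "was", "his", "before", "everyone"}


def focused_frequencies(records, topic=None, sentiment=None):
    """Count non-stopword words in mentions matching the optional filters."""
    counts = Counter()
    for record in records:
        # Skip mentions outside the requested topic or sentiment.
        if topic is not None and record["topic"] != topic:
            continue
        if sentiment is not None and record["sentiment"] != sentiment:
            continue
        words = re.findall(r"[a-z]+", record["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Word frequencies for a "negative mentions" word cloud.
print(focused_frequencies(mentions, sentiment="negative").most_common(2))
```

The resulting counts can be fed to any word cloud tool; because the filtering happens first, every word in the cloud relates to the chosen topic or sentiment.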
Playing around with the data has made me realize how balanced the novel is. I can see how it captures the essence of human life in a mini universe. I think I’ll have a shot at reading it after all!
If you have any ideas for what else I could analyze in “War and Peace”, please send me your suggestions @KeatextAI.