Word clouds are visualization tools that highlight the relative frequency of words in a text. They have been around for more than six years now, and their popularity shows no sign of diminishing. In this post, I would like to cover the strengths and weaknesses of word clouds in the context of qualitative text analysis.
While doing the necessary word cloud research, I found this great article about basic text mining and word clouds that includes an inspiring word cloud of War and Peace. It demonstrates that while word clouds highlight main topics, they lack context. I decided to create my own War and Peace word cloud with Wordle. Here is what I found.
Easy to create
It took me two minutes to create my word cloud with this free tool. The word cloud highlighted the words Pierre, Prince, Natasha, one, now and Andrew. At this point, I should disclose that I haven’t actually read War and Peace. For the purpose of this analysis, that’s a good thing because I’m looking at the data with fresh eyes. If you have read the novel, it’ll be easier for you to judge the accuracy of the analysis.
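For readers curious about what sits behind a word cloud, here is a minimal Python sketch of the counting step. It is not how Wordle works internally, which I haven't verified. The stop-word list and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "at", "was", "he", "his"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words, skipping anything in the stop-word list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Invented sample text, not a quote from the novel.
sample = "Pierre looked at Natasha, and Natasha smiled at Pierre. The prince was silent."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

A word cloud then draws each word at a size that grows with its count, so anything missing from the stop-word list (such as "one" or "now") can end up as prominent as a main character.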
The visibility of words changes across the different styles, especially for the smaller words. See how different words are highlighted depending on the style you choose. These differences made me question the accuracy of word clouds.
I went further and uploaded the whole novel into Keatext, an AI-powered text analytics tool for feedback interpretation. It took about 25 minutes to analyze the 555k+ words. Not bad, considering the novel can take people months to read. Keatext AI is trained on customer experience conversational text such as open-ended survey answers and reviews, but it still did well processing literature.
Let’s have a look at the top topic results from Keatext:
Even for its basic function of highlighting the most common topic, the word cloud failed. The top word, man, was barely visible in any of the versions of the word cloud above. In contrast, Keatext’s ability to bundle similar terms resulted in the automatic grouping of man and men, highlighting the term man as highly relevant. With a simple word cloud, you can miss important information like this.
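To illustrate why bundling matters, here is a small Python sketch that merges variant forms before ranking. The mapping and the counts are invented for the example. This is not Keatext's method, which would rely on something far more capable than a hand-written table.

```python
from collections import Counter

# Hypothetical hand-written mapping from variant forms to a shared base form.
# A real system would use a lemmatizer or a trained model instead.
VARIANTS = {"men": "man", "women": "woman", "princes": "prince"}

def bundle(counts, variants=VARIANTS):
    """Merge the counts of variant forms into their base form."""
    merged = Counter()
    for word, n in counts.items():
        merged[variants.get(word, word)] += n
    return merged

# Made-up counts, chosen so that no single form of "man" leads on its own.
raw = Counter({"pierre": 5, "man": 4, "men": 3})
bundled = bundle(raw)
print(bundled.most_common(1))  # [('man', 7)]
```

Counted separately, man and men both trail pierre. Counted together, they come out on top, which is the kind of shift a plain word cloud hides.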
Lack of context
I was intrigued when I saw that Keatext displayed Prince Andrew as one topic. Based on the word cloud, I could have inferred that Pierre is the prince, since these two words are roughly the same size. However, being a CX-trained AI, Keatext showed me the relation between those two words right away, providing the necessary context.
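One simple way to recover some of that context is to count pairs of adjacent words rather than single words. The Python sketch below shows the idea on an invented sample. It is only an illustration, not a description of how Keatext links terms.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs, without pairing words across sentence breaks."""
    counts = Counter()
    for chunk in re.split(r"[.!?;]", text.lower()):
        words = re.findall(r"[a-z']+", chunk)
        counts.update(zip(words, words[1:]))
    return counts

# Invented sample text, not a quote from the novel.
sample = "Prince Andrew rode out. Pierre watched Prince Andrew leave. Pierre sighed."
pairs = bigram_counts(sample)
print(pairs[("prince", "andrew")], pairs[("prince", "pierre")])  # 2 0
```

The pair counts show that prince goes with andrew and never with pierre, which single-word sizes in a cloud cannot tell you.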
In addition to providing context, Keatext includes a sentiment analysis feature. In the Keatext analysis, I could see that Natasha is quite balanced, but Pierre and Andrew are slightly more negative. When you look at the way they are described, they come across as real people. Here are some bubble chart outputs directly from Keatext:
Poor Natasha seems to be having a rough time.
Simple to understand
Word clouds are easy to grasp, but what do they actually convey? From what I’ve seen, my word clouds failed dramatically when it came to accuracy. This led me to conclude that the information word clouds provide may be misleading. I wonder if word clouds should be used as analytics tools for serious research…
Casual and visually appealing
Word clouds are definitely an interesting way of visualizing data. But if you are concerned about the accuracy issue that I mentioned above, you should know that with Keatext you can easily export the data and make word clouds that are focused on a specific topic, with added context and even sentiment. See an example: