3 Strengths And 3 Weaknesses Of Word Clouds

Word clouds are a visualization tool that highlights the relative frequency of words in a text. They have been around for more than 6 years now, and their popularity doesn’t seem to be diminishing. According to Google Trends, word clouds are more relevant than ever:

word clouds are a popular search term on google

In this post I would like to cover the strengths and weaknesses of word clouds in the context of qualitative text analysis. Here is what I found:

3 Weaknesses:

  1. Inaccurate because of design features
  2. Lost information
  3. Lack of context

3 Strengths:

  1. Simple to understand
  2. Easy to create
  3. Casual & Visually appealing

Putting the Strengths and Weaknesses of Word Clouds to the Test

Let’s put the above assertions to the test. While doing my word cloud research, I found this great article about Basic Text Mining and Word Clouds. It had an inspiring word cloud of “War and Peace” to illustrate the point that while a word cloud does highlight the main topics, it lacks context.

Easy to create

I made my own with Wordle. It took 2 minutes, and the tool is free. This way of making graphs is accessible to everyone, with no need for fancy software.
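Under the hood, a word cloud is little more than a word count. As a rough sketch (not how Wordle itself is implemented), the Python below counts the words in a text and drops a few common stop words. The `STOP_WORDS` set, the `word_counts` helper and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list; real tools ship much longer ones.
STOP_WORDS = {"the", "and", "a", "of", "to", "was", "at"}

def word_counts(text):
    """Count lowercase words in text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

# Invented sample sentence, not a quote from the novel.
sample = "Pierre looked at Natasha and Natasha smiled at Pierre and the prince."
counts = word_counts(sample)
print(counts.most_common(3))
```

Which words end up in the stop word list is already a design decision, which may be why filler words such as "one" and "now" can survive into a cloud.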

The word cloud highlighted the words Pierre, Prince, Natasha, one, now and Andrew. At this point I should disclose that I haven’t actually read War and Peace. For the purpose of this analysis that’s a good thing, because I’m looking at the data with fresh eyes. If you have read the novel, you can judge the accuracy of the analysis for yourself.

Inaccurate because of design features

Another word cloud of War and Peace in a different style

It’s easy to observe that the visibility of terms changes across the different styles, especially for the smaller words.

Yet another word cloud of war and peace

This already supports the claim that design choices make word clouds less accurate.
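One way to see why: a tool has to turn word counts into font sizes, and that mapping is itself a design choice. The sketch below compares two hypothetical mappings, linear and square root, on made-up counts. Neither is claimed to be what Wordle does, and the `font_sizes` helper and point sizes are assumptions for illustration.

```python
import math

def font_sizes(counts, scale, min_pt=10, max_pt=60):
    """Map each count to a font size between min_pt and max_pt."""
    top = max(counts.values())
    return {
        word: round(min_pt + (max_pt - min_pt) * scale(n) / scale(top))
        for word, n in counts.items()
    }

# Made-up counts, for illustration only.
counts = {"pierre": 1000, "natasha": 500, "trousers": 10}

linear = font_sizes(counts, scale=lambda n: n)
sqrt = font_sizes(counts, scale=math.sqrt)

# Same counts, two different visual impressions.
for word in counts:
    print(word, linear[word], sqrt[word])
```

With the square-root mapping, rare words are drawn larger relative to the top word than with the linear one, so the same data can look quite different from style to style.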

I went further and uploaded the whole novel into Keatext. That’s 555k+ words, which took about 25 minutes to analyze. Not bad, considering it takes people months to read. Our AI is trained on CX conversational text such as open-ended survey answers and reviews, but it still did well processing literature.

Lost Information

Let’s have a look at the top topic results from Keatext:

Keatext Dashboard highlights top terms

Even for its basic function of highlighting the most common topic, the word cloud failed. The top word, “man”, was barely visible in any of the word cloud versions above. Keatext ranks it first because its AI automatically grouped “man” and “men” together, whereas the word clouds appear to treat them as separate words. That supports the claim that with a simple word cloud you can miss important information.
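The effect of grouping word forms can be sketched in a few lines of Python. This is not Keatext's method, which this post does not describe. It is a minimal illustration that uses a hand-written, hypothetical `LEMMAS` mapping and an invented sentence.

```python
from collections import Counter

# Hypothetical mapping from word forms to a shared base form.
LEMMAS = {"men": "man"}

def grouped_counts(words):
    """Count words after folding known variants into their base form."""
    return Counter(LEMMAS.get(w, w) for w in words)

# Invented sample, not a quote from the novel.
words = "the man saw two men and the prince saw the men".split()

raw = Counter(words)             # "man" and "men" counted separately
grouped = grouped_counts(words)  # "men" folded into "man"

print(raw["man"], raw["men"], raw["prince"])
print(grouped["man"], grouped["prince"])
```

Counted separately, neither form stands out. Counted together, "man" becomes the most frequent non-filler word in the sample.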

Lack of Context

I was intrigued when I saw that Keatext displayed “prince andrew” as one topic. Based on the word cloud, we might have thought that Pierre is the prince, since the two words are roughly the same size. Being a CX-trained AI, Keatext was designed to give meaningful context to the topic.
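Counting pairs of adjacent words is one simple way to recover this kind of context. The sketch below is an assumption about how such a pairing could be surfaced, not a description of how Keatext works. The `bigram_counts` helper and the sample text are invented.

```python
from collections import Counter

def bigram_counts(words):
    """Count each pair of adjacent words."""
    return Counter(zip(words, words[1:]))

# Invented sample, not a quote from the novel.
words = "prince andrew spoke to pierre and prince andrew left".split()
pairs = bigram_counts(words)

# "prince" is followed by "andrew" in this sample, never by "pierre".
print(pairs[("prince", "andrew")], pairs[("prince", "pierre")])
```

A single-word count would only show that "prince", "andrew" and "pierre" each appear, without saying which name the title belongs to.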

Another feature is the sentiment analysis. I remember reading on the Wiki that none of the characters are heroes or villains. This comes out in their overview, with Natasha being quite balanced and Pierre and Andrew having slightly more negative mentions. When you look at the way they are described, they come across as real people. Here are some bubble chart outputs directly from Keatext:

Pierre

Positive and negative mentions of Pierre

Natasha

positive and negative mentions of Natasha

Poor Natasha seems to be having a rough time.

Having topics that you can drill into means that you can choose to look at only the data related to them.

Simple to understand

From what we’ve seen, word clouds fail in a lot of accuracy areas. Yes, they are easy to grasp, but what is it that they actually convey? The information may be misleading. This is why they shouldn’t be used as an analytics tool for serious research.

Casual & Visually Appealing

Sometimes even companies need to let their hair down when it comes to visualizations. If you really like word clouds, Keatext can help with the accuracy issue I mentioned above. With Keatext you can easily export the data and make word clouds that focus on a specific topic and give it context and even sentiment. Can you guess what this word cloud is about?

a wordcloud of things that embarrassed the war and peace characters

It seems affairs and family are a main source of embarrassment. I am once again struck by how relatable the feelings of the characters in the novel are.

This is what word clouds were meant to do all along: show the topic in its context.
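A topic-focused word cloud starts from a filtered word count. As a rough sketch, assuming the text is already split into sentences and using a hypothetical `topic_counts` helper, you could keep only the sentences that mention the topic and count the other words in them. The sentences below are made up, and this is not a description of Keatext's export.

```python
import re
from collections import Counter

def topic_counts(sentences, topic):
    """Count words in sentences that mention topic, excluding topic itself."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if topic in words:
            counts.update(w for w in words if w != topic)
    return counts

# Invented sentences, not quotes from the novel.
sentences = [
    "Pierre felt embarrassed about the affair.",
    "Natasha danced all evening.",
    "The family was embarrassed by the affair.",
]
counts = topic_counts(sentences, "embarrassed")
print(counts["affair"], counts["danced"])
```

Feeding counts like these into a word cloud tool gives a cloud about one topic rather than about the whole text.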

Here’s another one about positive mentions of the men in “War and Peace”:

positive mentions of men in a word cloud

I find these word clouds wonderfully insightful because they are more specific than your average copy/paste job.

Playing around with the data has made me realize how balanced the novel is. I can see how it captures the essence of human life in a mini universe. I think I’ll have a shot at reading it after all!

If you have any ideas of what I could analyze in “War and Peace” please send me your suggestions @KeatextAI.

4 Comments

  1. Jay
    February 8, 2017

    Good post!

  2. Adam Philp
    February 11, 2017

    Good post. Are trousers really the third biggest source of embarrassment in War & Peace?

    • February 17, 2017

      No, trousers isn’t the third cause of embarrassment, but it’s one of them. That’s just word clouds being inaccurate due to graphic design.
