Welcome to the official blog of the Nexus Analytics Consulting Group! The purpose of this blog is for our team to share our work on analytics-related topics and to give the general public a better appreciation of what analytics can do.
Analytics is the use of data, quantitative analysis, and mathematical modeling to provide insights and drive decisions[i]. Analytics can be applied to many fields and industries to solve a variety of problems, which is why many organizations are eager to incorporate analytics into their systems and culture. As the content of this blog grows and diversifies, we hope that there will be something here for readers of all backgrounds and industries.
To learn more about our team and the kind of analytics we do, check out our team’s official page on the Nexus website: http://www.nexustech.com.ph/services/analytics-consulting/
For our first article, we decided to focus on the potential of harnessing social media data. Many organizations are already aware that managing social media presence is an important marketing tool and business driver. In certain industries, social media can make or break a business. By applying analytics to social media data, businesses can find more creative and innovative ways to tackle their challenges.
This two-part article will discuss the following:
To start, we collected up to 3,000 tweets from the Twitter account of a famous personality. We then used a word cloud, one of the most popular ways to visualize text data, to arrange the words that appear in the tweets within a shape and size them according to how often they are used. Can you tell which personality is being represented by the word cloud below[ii]? Are the most common words what you would expect from the tweets of this person?
As you may have guessed, these are tweets taken from the @BarackObama Twitter account. The word cloud is dominated by a handful of recurring words that point to common themes, such as "climate", "economy", "health", and most notably, "change". Several of the words are also plausibly related to one another, such as “economy” and “job”, as well as “health” and “care”.
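The word sizes in a cloud like this come from simple word counts. The following is a minimal sketch of that counting step, not the exact code behind the image above: the sample tweets, the tiny stop-word list, and the function names are all hypothetical, and only the Python standard library is used. The resulting counts could then be handed to a renderer such as the wordcloud package cited in the footnotes.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the article collected up to 3,000 real ones.
tweets = [
    "It's time to act on climate change. #ActOnClimate",
    "A stronger economy means more jobs for the middle class.",
    "Health care is a right. Climate change is real.",
]

# A tiny illustrative stop-word list (an assumption, not the article's list).
STOP_WORDS = {"a", "an", "the", "is", "it", "s", "to", "on", "for", "and", "of", "more"}


def tokenize(text):
    """Lowercase a tweet and return its words and hashtags, minus stop words."""
    tokens = re.findall(r"#?[a-z0-9_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequencies(texts):
    """Count how often each remaining word appears across all tweets."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


freqs = word_frequencies(tweets)
print(freqs.most_common(5))
```

In a word cloud, the words with the highest counts are drawn largest, which is why a small set of frequent words dominates the picture.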
Although a word cloud is a fun and simple way to visualize tweet contents, it does not show the context in which the words are used, and word frequency does not necessarily imply importance or relevance. For example, we can see from Obama's word cloud that "economy" is mentioned often; however, we do not know what is being said about it. Reading through every tweet that includes "economy" would be tedious, so we need a simpler way to get a general idea of the context surrounding it.
To address the issue of context, we can look at how often pairs or groups of words occur together to have a better idea of themes present in tweets. This information can be visualized using a network graph that shows how strong the connection between two words is. We applied this to Obama’s tweets and focused on the pairs of words that appeared at least 90 times:
The thickness of the connecting lines tells us how often two words co-occur in the tweet dataset. The strongest connections point to the most common issues mentioned in Obama’s tweets, such as raising the minimum wage (#raisethewage, minimum, wage), health care (health, care), and action on climate change (#actonclimate, climate, change). These relationships provide a little more context about the words found in the word cloud.
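The pair counts behind a graph like this can be sketched as follows. This is an illustration under stated assumptions rather than the method actually used for the figure: it assumes two words "occur together" when they appear in the same tweet, and the sample tweets, stop-word list, and function names are hypothetical. The threshold is set to 2 here only because the sample is tiny; the article used 90 on its full dataset.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample tweets for illustration only.
tweets = [
    "Raise the minimum wage. #RaiseTheWage",
    "It's time to raise the minimum wage.",
    "Act on climate change. #ActOnClimate",
    "Climate change is real.",
]

# A tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "it", "s", "to", "on", "is"}


def tokenize(text):
    """Lowercase a tweet and return its words and hashtags, minus stop words."""
    tokens = re.findall(r"#?[a-z0-9_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def pair_counts(texts):
    """Count how many tweets contain each unordered pair of distinct words."""
    counts = Counter()
    for text in texts:
        words = sorted(set(tokenize(text)))  # each word counts once per tweet
        counts.update(combinations(words, 2))
    return counts


def strong_pairs(counts, min_count):
    """Keep only the pairs that co-occur in at least min_count tweets."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


MIN_COUNT = 2  # the article used 90 on its much larger dataset
edges = strong_pairs(pair_counts(tweets), MIN_COUNT)
for (word_a, word_b), weight in sorted(edges.items()):
    print(word_a, word_b, weight)
```

Each surviving pair becomes an edge in the network graph, and its count can serve as the edge weight that controls line thickness.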
We can also pick specific keywords (e.g. obamacare and economy) and look at the words that frequently appear with them in the same tweet. This method can give us a broader context for specific topics.
The nodes of the “economy” network graph are highly interconnected: most of these words frequently co-occur not only with “economy” but with each other as well. The words connected to “obamacare”, on the other hand, mostly fall into separate subtopics.
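One rough way to put a number on this difference is to collect the words that share a tweet with a keyword and then check what fraction of those words also share a tweet with each other. The sketch below is a hypothetical illustration, not the analysis behind the graphs above: the sample tweets are made up, and the "density" measure and function names are our own assumptions for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately simple sample tweets.
tweets = [
    "The economy: jobs and growth.",
    "Economy: jobs and wages.",
    "Economy: growth and wages.",
    "Obamacare: enroll before the deadline.",
    "Obamacare: affordable coverage.",
]

# A tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "and", "before"}


def tokenize(text):
    """Lowercase a tweet and return its words and hashtags, minus stop words."""
    tokens = re.findall(r"#?[a-z0-9_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def keyword_neighbors(texts, keyword):
    """Count the words that appear in the same tweet as the keyword."""
    counts = Counter()
    for text in texts:
        words = set(tokenize(text))
        if keyword in words:
            counts.update(words - {keyword})
    return counts


def neighbor_density(texts, keyword):
    """Fraction of neighbor pairs that also share at least one tweet."""
    token_sets = [set(tokenize(text)) for text in texts]
    neighbors = set(keyword_neighbors(texts, keyword))
    linked = set()
    for words in token_sets:
        linked.update(combinations(sorted(words & neighbors), 2))
    possible = len(neighbors) * (len(neighbors) - 1) // 2
    return len(linked) / possible if possible else 0.0


for keyword in ("economy", "obamacare"):
    print(keyword, dict(keyword_neighbors(tweets, keyword)),
          round(neighbor_density(tweets, keyword), 2))
```

In this toy data, every word around "economy" also appears with every other such word, while the words around "obamacare" split into two groups that never meet, which mirrors the tightly connected versus subtopic-style patterns described above.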
To illustrate how these analyses can be applied to businesses, assume that an app development company wants to know what people are saying about its app. It can collect tweets mentioning the app and construct a simple word cloud to visualize word frequencies. From the word cloud, it might be obvious that certain features are frequently mentioned, but the context is unknown. A network graph, however, may reveal that a competitor’s name frequently appears with the feature. This alerts the company to the fact that people are talking about the feature and the competitor in the same tweet.
The natural next question is: are people praising or criticizing the feature in comparison to the competition? Since the visualizations we have so far are only concerned with word frequencies, we need another way to determine whether the tone of the tweets is positive or negative. Sentiment analysis, a popular method for analyzing text data that estimates the tone and opinion expressed in each post (or tweet), will be the focus of the second part of this article.
[i] Definition adapted from: Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning, p. 7. Boston, MA: Harvard Business School Press
[ii] Tweets taken from @BarackObama on Aug 16, 2016. Wordcloud generator: wordcloud 1.2.1 (A little word cloud generator). Reference: https://pypi.python.org/pypi/wordcloud. Barack Obama image mask retrieved from http://www.freakingnews.com/pictures/65500/Barack-Obama-Silhouette-65697.jpg.