Election day looms closer every week. US politics is rapidly becoming the preferred conversation topic for millions of Americans and non-Americans worldwide. What are these people saying? What do they think? What are their opinions? How do they feel?
We are using machine learning to find out! For the past few months, we’ve been collecting millions of tweets posted by users from around the world that discuss this topic and classifying them using sentiment analysis with MonkeyLearn. That is, for each tweet published that mentions either Hillary Clinton or Donald Trump, MonkeyLearn tags it with its sentiment, which can be either positive, neutral, or negative.
We created a simple tool called Tarsier for visualizing this analysis and getting some interesting insights into the conversation around the US elections.
Having millions and millions of tweets about Donald Trump and Hillary Clinton is of little use if we can’t find an easy way to analyze the data. Manually looking at the raw tweets doesn’t scale: there are too many of them.
For this purpose, we used MonkeyLearn to analyze all of these tweets with machine learning, getting their sentiment and extracting the most relevant keywords. For example, these are some tweets that we have collected and the sentiment MonkeyLearn assigned to them:
"@HillaryClinton will receive the first question at tonight's presidential debate, according to @CBSNews #ClintonVsTrump". Sentiment: Neutral.
"Americans trust @realDonaldTrump to Make our Economy Great Again!". Sentiment: Positive.
"Racial discord was conceived, nurtured, refined & perpetuated by Americans incl @realDonaldTrump's father. Get real!". Sentiment: Negative.
"@wcve it's amazing how our city loves him and he really loves our city. @HillaryClinton made a great choice for Vice President. @timkaine". Sentiment: Positive.
Luckily, we have a probability rating as well. This number indicates how “confident” the machine learning algorithm is in the sentiment it assigned to the tweet. If it’s too low, we don’t use the tweet in our analysis, to avoid polluting the results with bad data. Of course, the sentiment of a short piece of text such as a tweet is not always clear even to a human analyst. We have discussed why sentiment analysis is hard in a previous post.
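To make the idea concrete, here is a minimal sketch of that filtering step. It assumes each classified tweet is a plain dictionary with hypothetical `sentiment` and `probability` fields, and it uses an illustrative cutoff of 0.6; neither the field names nor the cutoff are taken from MonkeyLearn or Tarsier.

```python
from typing import Dict, List

# Hypothetical cutoff chosen for illustration; not necessarily the value used in Tarsier.
MIN_PROBABILITY = 0.6


def filter_confident(tweets: List[Dict], min_probability: float = MIN_PROBABILITY) -> List[Dict]:
    """Keep only the tweets whose sentiment was assigned with enough confidence."""
    return [t for t in tweets if t["probability"] >= min_probability]


# Made-up records; the field names are assumptions, not MonkeyLearn's response format.
classified = [
    {"text": "Great job at the debate!", "sentiment": "positive", "probability": 0.91},
    {"text": "Well, that happened.", "sentiment": "negative", "probability": 0.42},
    {"text": "Debate starts at 9pm.", "sentiment": "neutral", "probability": 0.77},
]

confident = filter_confident(classified)
print([t["sentiment"] for t in confident])
```

A higher cutoff discards more ambiguous tweets at the cost of a smaller sample, so the right value depends on how much data there is to spare.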
There are three graphs on the main view. The first one shows the ratio of positive tweets to negative tweets mentioning each of the candidates, that is, how positive the discussion on Twitter surrounding that candidate is. A higher value on this graph means that there are more positive tweets for each negative tweet, and vice versa:
As you can see in this first graph, there are many more negative tweets about the candidates than positive tweets.
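The ratio plotted in the first graph can be sketched in a few lines. This is only an illustration under the assumption that we already have the list of sentiment labels for one candidate on one day; the function name and the sample labels are made up.

```python
from collections import Counter
from typing import Iterable, Optional


def positive_negative_ratio(sentiments: Iterable[str]) -> Optional[float]:
    """Return (# positive) / (# negative), or None when there are no negative tweets."""
    counts = Counter(sentiments)
    if counts["negative"] == 0:
        return None  # the ratio is undefined without negative tweets
    return counts["positive"] / counts["negative"]


# Made-up labels for one candidate on one day.
labels = ["negative", "neutral", "positive", "negative", "neutral", "negative", "negative"]
ratio = positive_negative_ratio(labels)
print(ratio)  # 1 positive / 4 negative = 0.25
```

A value below 1 means negative tweets outnumber positive ones, which is the situation the graph shows for both candidates. Neutral tweets do not affect the ratio.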
The second and third graphs show the number of tweets each candidate got for each sentiment:
You can click on each line to the side to disable or enable traces, which allows you to compare only some of the traces, such as the number of positive Trump tweets vs the number of positive Clinton tweets, or positive Clinton tweets vs negative Clinton tweets.
It’s handy to look at both candidates’ graphs at the same time, which gives an insight into how Trump and Clinton compare to each other.
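The traces in these graphs boil down to counting tweets per day, candidate, and sentiment. Here is a rough sketch of that aggregation, assuming each tweet is a dictionary with hypothetical `date`, `candidate`, and `sentiment` fields (dates as ISO strings so that sorting them is chronological). The data below is made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (date, candidate, sentiment)


def daily_counts(tweets: List[dict]) -> Dict[Key, int]:
    """Count tweets for every (date, candidate, sentiment) combination."""
    counts: Dict[Key, int] = defaultdict(int)
    for tweet in tweets:
        counts[(tweet["date"], tweet["candidate"], tweet["sentiment"])] += 1
    return dict(counts)


def trace(counts: Dict[Key, int], candidate: str, sentiment: str) -> List[Tuple[str, int]]:
    """Return the (date, count) points for one line on the graph, sorted by date."""
    return sorted(
        (date, n)
        for (date, cand, sent), n in counts.items()
        if cand == candidate and sent == sentiment
    )


# Made-up sample data with hypothetical field names.
tweets = [
    {"date": "2016-10-08", "candidate": "trump", "sentiment": "negative"},
    {"date": "2016-10-07", "candidate": "trump", "sentiment": "negative"},
    {"date": "2016-10-07", "candidate": "trump", "sentiment": "negative"},
    {"date": "2016-10-07", "candidate": "clinton", "sentiment": "positive"},
    {"date": "2016-10-07", "candidate": "trump", "sentiment": "neutral"},
]

counts = daily_counts(tweets)
trump_negative = trace(counts, "trump", "negative")
print(trump_negative)
```

Enabling or disabling a trace then amounts to choosing which `(candidate, sentiment)` pairs to plot.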
If you click on a graph on a particular day, on the right side panel you will get the most relevant keywords from the date you just clicked. You’ll also get some example tweets from that day, which are useful to see what the discussion looked like.
For example, these are the most relevant keywords for the negative tweets about Trump posted on the 7th of October, the day the videotape containing Trump's offensive comments about Alicia Machado was published:
And these are some of the negative tweets about Trump on this particular day:
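The side panel's per-day keyword list can be approximated with a simple frequency count. Note that this is a stand-in and not MonkeyLearn's keyword extraction or relevance scoring: the sketch assumes keywords have already been extracted for each tweet and stored in a hypothetical `keywords` field, and it ranks them purely by how often they appear. The sample tweets are made up.

```python
from collections import Counter
from typing import List


def top_keywords(tweets: List[dict], date: str, candidate: str, sentiment: str, n: int = 3) -> List[str]:
    """Rank keywords by frequency among the tweets matching one day, candidate, and sentiment."""
    counter: Counter = Counter()
    for tweet in tweets:
        if (tweet["date"], tweet["candidate"], tweet["sentiment"]) == (date, candidate, sentiment):
            # Lowercase so that "FBI" and "fbi" are counted together.
            counter.update(keyword.lower() for keyword in tweet["keywords"])
    return [keyword for keyword, _ in counter.most_common(n)]


# Made-up sample data; field names and keyword lists are assumptions.
sample = [
    {"date": "2016-07-05", "candidate": "clinton", "sentiment": "negative", "keywords": ["emails", "FBI"]},
    {"date": "2016-07-05", "candidate": "clinton", "sentiment": "negative", "keywords": ["emails", "dark day"]},
    {"date": "2016-07-05", "candidate": "clinton", "sentiment": "negative", "keywords": ["emails", "fbi"]},
    {"date": "2016-07-05", "candidate": "clinton", "sentiment": "positive", "keywords": ["thanks obama"]},
]

top = top_keywords(sample, "2016-07-05", "clinton", "negative", n=2)
print(top)
```

Raw frequency favors generic words, so a real relevance score would likely weight keywords differently; the sketch only shows the grouping by day, candidate, and sentiment.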
The first thing that stands out is that @realDonaldTrump gets mentioned much more than @HillaryClinton. Trump’s Twitter presence is much larger than Clinton’s. On an average day, Donald Trump’s account gets about 450,000 mentions, while Hillary Clinton’s account gets only about 250,000.
Out of those tweets, the majority are tagged as “neutral”. These are factual tweets that don’t convey a sentiment. If you click on a specific date on the graphs, you can see some examples of what these tweets look like by going to the right side panel, selecting neutral and going to the tweets tab:
For both candidates, there are usually more negative than positive tweets. This means that whenever a candidate is mentioned in a tweet, that tweet is more likely to have a negative sentiment than a positive one. This supports a long-suspected truth: on the internet, people are more likely to criticize something than they are to praise it.
It is important to note that this doesn’t mean that, for instance, all the negative tweets that mention @realDonaldTrump are criticizing him. Some (probably most) are critical of Trump, but some are critical of Clinton, or Obama, or other issues.
Now let’s move on to specific dates and keywords. Tarsier lets you see a picture of what was being talked about on a particular day: check out the keywords for that day, and you can see what was in the news.
We looked into some landmark dates of the campaign to find out what people were saying on Twitter that day.
You can see a considerable rise in traffic on July 5th. A lot of people were not happy, and they were vocal about it. There’s a significant rise in traffic on Clinton’s side, with negative tweets taking off:
Checking out the keywords, you can see that emails is a big one among the negative and neutral tweets, alongside FBI, dark day, and american history.
But there’s also a rise in positive tweets. It is small compared to the rise in negative ones, but an increase nonetheless. What are these tweets saying? A lot of them seem to be sarcastic. Machine learning algorithms aren’t great at sarcasm (yet!), so tweets like thanks obama and Great job with those emails! are classified as positive.
There’s a prominent peak in Clinton mentions that day, across neutral, negative, and positive tweets alike. However, the negative backlash was much smaller than the one on July 5th:
Clearly this piece of news was met with mixed responses from Twitter users: positive Clinton keywords were things like Bernie supporters, thanks bernie, best choice, while negative Clinton keywords were bringing up criticism: NAFTA, disastrous crime bill, email breach:
Meanwhile, in the Trump camp, some important keywords were Bernie bros, divider, stronger candidate. It’s clear that people were echoing Donald Trump’s message of welcoming Sanders supporters who felt let down.
During the Republican convention (July 18–21) there’s a clear rise in Trump’s sentiment, which means there were more positive tweets than negative ones.
An interesting thing that can be seen on this date is that Ted Cruz appears as a keyword on Trump tweets, but only in neutral and negative ones. Looking at the most relevant tweets, this means that people on Twitter were either reporting on the senator’s speech or actively criticizing it, but not praising it. It’s clear that people did not regard this as a good move by Trump’s former rival.
During these days the Clinton sentiment is actually more positive than the Trump sentiment. Some of the relevant keywords for Clinton's positive tweets during these days include first woman president, proud, history and amazing time tonight, which clearly shows that people were excited (and vocal) about having the first woman nominee ever in the history of the US.
On this day we can see a nice rise in the Clinton Twitter mentions. Some keywords include disrespect, deplorable thing and shameful. If we click on the tweets tab, we can see some really strong and angry tweets like '@HillaryClinton the only thing deplorable is you. Stupid bitch'.
During the debates, there were so many tweets being published that, unfortunately, Twitter’s public API was limiting us and returning only a fraction of all the tweets. Although we don’t have the full picture of what went down on Twitter during the debate, we do still have enough data to understand what people thought:
Clinton saw a significant rise in the positiveness of her sentiment compared to Trump. There was no surprise in the keywords for both candidates. Positive tweets about Clinton praise her comebacks and mention that she had won the debate. On the other side, positive Trump tweets praise his job at the debate and also proclaim him the winner.
This time, there wasn’t the same reaction as in the previous debate: even though Trump mentions were still about how Trump won, Clinton mentions mostly said that she lost. Uncomfortable debate is one of the top Clinton keywords, which says a lot about the public reaction to the second debate.
Like in the first debate, the third and last presidential debate saw a considerable jump in the positive sentiment around Clinton, actually beating Trump in the general sentiment overall.
Most of the positive tweets about Clinton in the last hours before the debate were about wishing her good luck. After the debate, people were praising her with keywords like good job, best candidate, and next president. Some of the negative keywords for both candidates are related somehow to Clinton: on the Clinton Twitter mentions we can find keywords like war drums, corruption, pathological liar and FBI documents. And on the Trump Twitter mentions we can find keywords like crooked hillary, nasty woman and voter fraud.
Another cool thing you can do with Tarsier is find out what piece of news was making the rounds on a given day. In every election cycle, after some time has gone by, the big events are remembered (the conventions, big scandals, and whatnot), but the small things are forgotten, and it’s the day to day that shapes public opinion in the end.
As an example, it is well known that July 21st was the day Trump accepted the Republican nomination, and the keywords are all about the convention, Ted Cruz, and Trump’s speech. The next day, Hillary Clinton announced Tim Kaine would be her running mate (you can see a huge spike in traffic).
But what happened on, let’s say, a day as unremarkable as September 17th? If you check the keywords, you can see that Hillary Clinton was being bashed over the birther rumor news that was going around. Supporter Phillip Berg and Birther Lawsuit are the top Clinton keywords for that day, and some quick googling leads to articles from September 16th and 17th that mention those issues. Meanwhile, Trump was also being bashed over the birther issue.
These are events that are sometimes forgotten but end up shaping the decision of a lot of voters on election day. With this tool, you can go back and see these pieces of news and conversations, helping you better understand the big picture of the presidential campaign.
The conversation around the US elections has been omnipresent and highly polarized. There is a clear and deep division in the US political landscape and these are arguably the most controversial elections in recent history.
We feel that Tarsier is a simple but powerful tool that helps to understand how people are talking about the candidates on social media. We believe that it can bring some clarity to what's going on with these particular elections and surface some valuable insights from the data.
We invite you to play around with Tarsier and share in the comments what type of insights you find!
October 20th, 2016