Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text data using text analysis techniques. Sentiment analysis allows businesses to identify customer sentiment toward products, brands or services in online conversations and feedback.
In this blog post, we’ll go into more detail about what sentiment analysis is, how it works, and how you can use it to detect emotions within text. We’ve also included a few tutorials on how you can carry out sentiment analysis on your data, whether you’re new to machine learning or a Python pro!
Read along, bookmark this post for later, or jump to the sections that pique your interest:
- Sentiment Analysis Basics
- How Does Sentiment Analysis Work?
- Sentiment Analysis Applications
- Sentiment Analysis Resources
Let’s get started!
Sentiment Analysis Basics
Sentiment analysis models detect polarity within a text (e.g. a positive or negative opinion), whether it’s a whole document, paragraph, sentence, or clause.
Understanding people’s emotions is essential for businesses since customers are able to express their thoughts and feelings more openly than ever before. By automatically analyzing customer feedback, from survey responses to social media conversations, brands are able to listen attentively to their customers, and tailor products and services to meet their needs.
For example, one of our customers used sentiment analysis to automatically analyze 4,000+ reviews about their product, and discovered that customers were happy about their pricing but complained a lot about their …
Types of Sentiment Analysis
Sentiment analysis takes various forms, from models that focus on polarity (positive, negative, neutral) to those that detect feelings and emotions (angry, happy, sad, etc.), or even models that identify intentions (e.g. interested vs. not interested).
Here are some of the most popular types of sentiment analysis:
Fine-grained Sentiment Analysis
If polarity precision is important to your business, you might consider expanding your polarity categories to include:
- Very positive
- Very negative
This is usually referred to as fine-grained sentiment analysis, and could be used to interpret 5-star ratings in a review, for example:
- Very Positive = 5 stars
- Very Negative = 1 star
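If you already have star ratings, they can double as fine-grained labels, for example when building training data. Here is a minimal Python sketch; the post only pins down 5 stars and 1 star, so the labels for 4, 3, and 2 stars in the hypothetical `STAR_TO_LABEL` table are our assumption.

```python
# Hypothetical mapping: the post only fixes 5 stars = very positive and
# 1 star = very negative; the 4-, 3- and 2-star labels are assumptions.
STAR_TO_LABEL = {
    5: "very positive",
    4: "positive",
    3: "neutral",
    2: "negative",
    1: "very negative",
}


def stars_to_label(stars: int) -> str:
    """Turn a 1-5 star rating into a fine-grained sentiment label."""
    if stars not in STAR_TO_LABEL:
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    return STAR_TO_LABEL[stars]


# Label a few made-up reviews by their star rating.
reviews = [("Loved it, would buy again", 5), ("It's okay", 3), ("Broke on day one", 1)]
labeled = [(text, stars_to_label(stars)) for text, stars in reviews]
print(labeled)
```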
Emotion Detection
Emotion detection aims to detect emotions such as happiness, frustration, anger, sadness, and so on. Many emotion detection systems use lexicons (i.e. lists of words and the emotions they convey) or complex machine learning algorithms.
One of the downsides of using lexicons is that people express emotions in different ways. Some words that typically express anger, like bad or kill (e.g. your product is so bad or your customer support is killing me), might also express happiness (e.g. this is badass or you are killing it).
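To see this lexicon problem in action, here is a minimal lexicon-based emotion detector in Python. The word list `EMOTION_LEXICON` and the function `detect_emotion` are hypothetical and deliberately tiny; real emotion lexicons contain thousands of entries. Note how it labels you are killing it as anger, exactly the kind of mistake described above.

```python
import re
from collections import Counter
from typing import Optional

# A toy emotion lexicon (hypothetical, for illustration only).
EMOTION_LEXICON = {
    "bad": "anger",
    "kill": "anger",
    "killing": "anger",
    "love": "happiness",
    "great": "happiness",
    "sad": "sadness",
    "disappointed": "sadness",
}


def detect_emotion(text: str) -> Optional[str]:
    """Return the emotion whose lexicon words occur most often in the text, or None."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(detect_emotion("Your customer support is killing me"))  # anger (correct)
print(detect_emotion("You are killing it"))                    # anger (wrong: this is praise)
```

Because the lookup ignores the surrounding words, no amount of lexicon tuning alone fixes the second case; that takes context.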
Aspect-based Sentiment Analysis
When analyzing the sentiment of texts such as product reviews, you’ll usually want to know which particular aspects or features people mention in a positive, neutral, or negative way. That’s where aspect-based sentiment analysis can help. For example, in the text "The battery life of this camera is too short", an aspect-based classifier would be able to determine that the sentence expresses a negative opinion about the feature battery life.
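Aspect-based classifiers are usually trained models, but the core idea (scoring each aspect only by the words near it) can be sketched with a few rules. In this minimal Python example, the aspect list, the polarity lexicon, and the clause-splitting rule are all hypothetical simplifications.

```python
import re
from typing import Dict

# Hypothetical aspect list and polarity lexicon for camera reviews.
ASPECTS = ("battery life", "screen", "lens", "price")
POLARITY = {"great": 1, "sharp": 1, "affordable": 1, "short": -1, "blurry": -1, "expensive": -1}


def aspect_sentiment(text: str) -> Dict[str, str]:
    """Give each aspect mentioned in the text the polarity of the clause it appears in."""
    results: Dict[str, str] = {}
    # Very rough clause splitting: punctuation, commas, and the contrastive word "but".
    for clause in re.split(r"[.,!?;]|\bbut\b", text.lower()):
        score = sum(POLARITY.get(word, 0) for word in re.findall(r"[a-z]+", clause))
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect in ASPECTS:
            if aspect in clause:
                results[aspect] = label
    return results


print(aspect_sentiment("The battery life of this camera is too short"))
print(aspect_sentiment("The screen is great, but the battery life is too short."))
```

Splitting into clauses is what keeps "great" from leaking onto the battery life in the second review.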
Multilingual Sentiment Analysis
Multilingual sentiment analysis can be difficult. It involves a lot of preprocessing and resources. Most of these resources are available online (e.g. sentiment lexicons), while others need to be created (e.g. translated corpora or noise detection algorithms), but you’ll need to know how to code to use them.
Alternatively, you could detect language in texts automatically, then train a custom sentiment analysis model to classify texts in the language of your choice.
Check out these sentiment analysis examples to learn more about the different types of sentiment analysis.
Benefits of Sentiment Analysis
It’s estimated that 80% of the world’s data is unstructured; in other words, it isn’t organized in a predefined way. Huge amounts of text data (emails, support tickets, chats, social media conversations, surveys, articles, documents, etc.) are created every day, but this data is hard to analyze, understand, and sort through, not to mention time-consuming and expensive to process.
Sentiment analysis, however, helps businesses make sense of all this unstructured text by automatically tagging it.
Benefits of sentiment analysis include:
- Processing data at scale. Can you imagine manually sorting through thousands of tweets, customer support conversations, or customer reviews? There’s just too much data to process manually. Sentiment analysis helps businesses process huge amounts of data in an efficient and cost-effective way.
- Real-time analysis. Sentiment analysis can identify critical issues in real time. Is a PR crisis on social media escalating? Is an angry customer about to churn? Sentiment analysis models can help you identify these kinds of situations immediately, so you can take action right away.
- Consistent criteria. It’s estimated that people agree only around 60-65% of the time when determining the sentiment of a particular text. Tagging text by sentiment is highly subjective, influenced by personal experiences, thoughts, and beliefs. By using a centralized sentiment analysis system, companies can apply the same criteria to all of their data, helping them improve accuracy and gain better insights.
How Does Sentiment Analysis Work?
Sentiment analysis uses various Natural Language Processing (NLP) methods and algorithms, which we’ll go over in more detail in this section.
The main types of algorithms used include:
- Rule-based systems that perform sentiment analysis based on a set of manually crafted rules.
- Automatic systems that rely on machine learning techniques to learn from data.
- Hybrid systems that combine both rule-based and automatic approaches.
Usually, a rule-based system uses a set of human-crafted rules to help identify subjectivity, polarity, or the subject of an opinion.
These rules may include various techniques developed in computational linguistics, such as:
- Stemming, tokenization, part-of-speech tagging and parsing.
- Lexicons (i.e. lists of words and expressions).
Here’s a basic example of how a rule-based system works:
- Defines two lists of polarized words (e.g. negative words such as bad, worst, ugly, etc. and positive words such as good, best, beautiful, etc.).
- Counts the number of positive and negative words that appear in a given text.
- If the number of positive word appearances is greater than the number of negative word appearances, the system returns a positive sentiment, and vice versa. If the counts are equal, the system returns a neutral sentiment.
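These three steps translate almost line for line into Python. This is a minimal sketch: the word lists are just the example words from the first step, and tokenization is a simple regular expression.

```python
import re

# Step 1: two lists of polarized words (the examples from the text; extend as needed).
POSITIVE_WORDS = {"good", "best", "beautiful"}
NEGATIVE_WORDS = {"bad", "worst", "ugly"}


def rule_based_sentiment(text: str) -> str:
    """Classify text by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 2: count positive and negative word appearances.
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    # Step 3: compare the counts.
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The best camera, with a beautiful design"))  # positive
print(rule_based_sentiment("Good battery, bad screen"))                  # neutral
print(rule_based_sentiment("This is not good"))                          # positive (!)
```

The last call shows the naivety discussed in the next paragraph: word order is ignored, so not good still counts as a positive word.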
Rule-based systems are very naive, since they don’t take into account how words are combined in a sequence. Of course, more advanced processing techniques can be used, and new rules can be added to support new expressions and vocabulary. However, adding new rules may affect previous results, and the whole system can get very complex. Since rule-based systems often require fine-tuning and maintenance, they also need regular investment.
Automatic methods, contrary to rule-based systems, don't rely on manually crafted rules, but on machine learning techniques. A sentiment analysis task is usually modeled as a classification problem, whereby a classifier is fed a text and returns a category, e.g. positive, negative, or neutral.
Here’s how a machine learning classifier can be implemented:
The Training and Prediction Processes
In the training process (a), our model learns to associate a particular input (i.e. a text) with the corresponding output (tag) based on the samples used for training. The feature extractor transforms the text input into a feature vector. Pairs of feature vectors and tags (e.g. positive, negative, or neutral) are fed into the machine learning algorithm to generate a model.
In the prediction process (b), the feature extractor is used to transform unseen text inputs into feature vectors. These feature vectors are then fed into the model, which generates predicted tags (again, positive, negative, or neutral).
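Here is a minimal, self-contained sketch of both processes in pure Python. The feature extractor is a bag-of-words counter, and the machine learning algorithm is multinomial Naive Bayes (one of the classifiers listed in this section) with add-one smoothing. The four training texts are made up and far too few for real use, and `extract_features` and `NaiveBayesSentiment` are our own illustrative names, not any library’s API.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def extract_features(text: str) -> Counter:
    """Feature extractor: turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesSentiment":
        # (a) Training: feed (feature vector, tag) pairs to the algorithm.
        self.tag_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()
        for text, tag in examples:
            features = extract_features(text)
            self.tag_counts[tag] += 1
            self.word_counts[tag].update(features)
            self.vocab.update(features)
        return self

    def predict(self, text: str) -> Optional[str]:
        # (b) Prediction: extract features from unseen text, return the most likely tag.
        features = extract_features(text)
        total_docs = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, doc_count in self.tag_counts.items():
            total_words = sum(self.word_counts[tag].values())
            score = math.log(doc_count / total_docs)  # log prior
            for word, count in features.items():
                if word not in self.vocab:
                    continue  # words never seen in training carry no information
                likelihood = (self.word_counts[tag][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(likelihood)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


train = [
    ("I love this product, it works great", "positive"),
    ("Great support and fast delivery", "positive"),
    ("Terrible quality, it broke after a day", "negative"),
    ("I hate the slow and terrible support", "negative"),
]
model = NaiveBayesSentiment().fit(train)
print(model.predict("great product, love it"))   # positive
print(model.predict("terrible, slow delivery"))  # negative
```

Working in log space avoids multiplying many small probabilities together, which would quickly underflow on longer texts.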
Feature Extraction from Text
More recently, new feature extraction techniques based on word embeddings (also known as word vectors) have been applied. These representations make it possible for words with similar meanings to have similar representations, which can improve the performance of classifiers.
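To make the idea concrete, the sketch below compares word vectors with cosine similarity and averages them into a single feature vector for a text, one simple way to feed embeddings to a classifier. The three-dimensional vectors are invented for illustration; real embeddings (for example word2vec or GloVe) are learned from large corpora and typically have hundreds of dimensions.

```python
import math
from typing import Dict, List

# Toy embeddings invented for illustration; real ones are learned from data.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [0.9, 0.8, 0.1],
    "great": [0.85, 0.9, 0.15],
    "terrible": [-0.8, -0.7, 0.2],
}
DIM = 3


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """1.0 for vectors pointing the same way, -1.0 for opposite directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def text_vector(words: List[str]) -> List[float]:
    """Average the embeddings of known words into one fixed-length feature vector."""
    known = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


print(round(cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["great"]), 2))     # close to 1
print(round(cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["terrible"]), 2))  # negative
print(text_vector(["good", "great", "unknown"]))
```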
The classification step usually involves a statistical model like Naïve Bayes, Logistic Regression, Support Vector Machines, or Neural Networks:
Naïve Bayes: a family of probabilistic algorithms that uses Bayes’s Theorem to predict the category of a text.
Logistic Regression: a very well-known statistical algorithm used to predict the probability of a category (Y) given a set of features (X).
Support Vector Machines: a non-probabilistic model which represents text examples as points in a multidimensional space. Examples of different categories (sentiments) are mapped to distinct regions within that space, separated by a decision boundary. New texts are then assigned a category based on the region they’re mapped to.
Deep Learning: a diverse set of algorithms that attempt to mimic the human brain, by employing artificial neural networks to process data.
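As an illustration of one of these classifiers, here is logistic regression trained with plain stochastic gradient descent on binary bag-of-words features, in pure Python. It is a teaching sketch under simple assumptions (four made-up training texts, a fixed learning rate and number of epochs, no regularization), not how any production library implements it.

```python
import math
import re
from typing import Dict, List, Tuple


def word_set(text: str) -> set:
    """Binary bag-of-words: the set of lowercase words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    examples: List[Tuple[str, int]], epochs: int = 200, lr: float = 0.5
) -> Tuple[Dict[str, float], float]:
    """Learn word weights and a bias so that sigmoid(bias + sum of weights) ~ P(positive)."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:  # label: 1 = positive, 0 = negative
            words = word_set(text)
            p = sigmoid(bias + sum(weights.get(w, 0.0) for w in words))
            error = p - label  # gradient of the log-loss with respect to the score
            bias -= lr * error
            for w in words:
                weights[w] = weights.get(w, 0.0) - lr * error
    return weights, bias


def predict_proba(weights: Dict[str, float], bias: float, text: str) -> float:
    """Estimated probability that the text is positive."""
    return sigmoid(bias + sum(weights.get(w, 0.0) for w in word_set(text)))


train = [
    ("I love this product, it works great", 1),
    ("Great support and fast delivery", 1),
    ("Terrible quality, it broke after a day", 0),
    ("I hate the slow and terrible support", 0),
]
weights, bias = train_logistic_regression(train)
print(predict_proba(weights, bias, "great product, love it"))
print(predict_proba(weights, bias, "terrible, slow service"))
```

Unlike Naive Bayes, which counts words per category, logistic regression adjusts each word’s weight until the predicted probabilities match the training labels.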
Hybrid systems combine the desirable elements of rule-based and automatic techniques into one system. One huge benefit of these systems is that results are often more accurate.
Sentiment Analysis Challenges
In recent years, computer scientists have been trying to develop more accurate sentiment classifiers and to overcome their limitations. Let’s take a closer look at some of the challenges they face:
Subjectivity and Tone
Detecting whether a text is subjective or objective is just as important as analyzing its tone. In fact, so-called objective texts do not contain explicit sentiments. Say, for example, you intend to analyze the sentiment of the following two texts:
The package is nice.
The package is red.
Most people would say that sentiment is positive for the first one and neutral for the second one, right? Not all predicates (adjectives, verbs, and some nouns) should be treated the same with respect to how they create sentiment. In the examples above, nice is more subjective than red.
Context and Polarity
All utterances are uttered at some point in time, in some place, by and to some people; you get the point. All utterances are uttered in context, and analyzing sentiment without context gets pretty difficult. However, machines cannot learn about contexts if they are not mentioned explicitly. One of the problems that arises from context is changes in polarity. Look at the following responses to a survey:
Everything of it.
Nothing!
Imagine the responses above come from answers to the question What did you like about the event? The first response would be positive and the second one would be negative, right? Now, imagine the responses come from answers to the question What did you DISlike about the event? The negative word in the question flips the sentiment of the responses altogether.
A good deal of preprocessing or postprocessing will be needed if we are to take into account at least part of the context in which texts were produced. However, how to preprocess or postprocess data in order to capture the bits of context that will help analyze sentiment is not straightforward.
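One very simple form of postprocessing is to reinterpret an answer’s sentiment using the question it answers. The sketch below assumes an upstream classifier has already labeled each bare answer (for instance, Everything of it. as positive). The keyword check on "dislike" is a hypothetical rule that only illustrates the idea, since real survey questions can be phrased in countless ways.

```python
# Hypothetical postprocessing rule: flip polarity when the question asks about dislikes.
FLIPPED = {"positive": "negative", "negative": "positive", "neutral": "neutral"}


def sentiment_in_context(answer_sentiment: str, question: str) -> str:
    """Reinterpret an answer's sentiment in light of the survey question it answers."""
    asks_about_dislikes = "dislike" in question.lower()
    return FLIPPED[answer_sentiment] if asks_about_dislikes else answer_sentiment


# Suppose a classifier labeled the bare answer "Everything of it." as positive.
print(sentiment_in_context("positive", "What did you like about the event?"))     # positive
print(sentiment_in_context("positive", "What did you DISlike about the event?"))  # negative
```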
Irony and Sarcasm
When it comes to irony and sarcasm, people express their negative sentiments using positive words, which can be difficult for machines to detect without having a thorough understanding of the context of the situation in which a feeling was expressed.
For example, look at some possible answers to the question, Did you enjoy your shopping experience with us?
Yeah, sure. So smooth!
Not one, but many!
What sentiment would you assign to the responses above? The first response, with its exclamation mark, could be negative, right? The problem is that there is no textual cue that would help a machine learn, or at least question, that sentiment, since yeah and sure often belong to positive or neutral texts.
How about the second response? In this context, sentiment is positive, but we’re sure you can come up with many different contexts in which the same response can express negative sentiment.
Comparisons
How to treat comparisons in sentiment analysis is another challenge worth tackling. Look at the texts below:
This product is second to none.
This is better than older tools.
This is better than nothing.
The first comparison doesn’t need any contextual clues to be classified correctly. It’s clear that it’s positive.
The second and third texts are a little more difficult to classify, though. Would you classify them as neutral, positive, or even negative? Once again, context can make a difference. For example, if the ‘older tools’ in the second text were considered useless, then the second text is pretty similar to the third text.
Emojis
According to Guibon et al., there are two types of emojis. Western emojis (e.g. :D) are encoded in only one or two characters, whereas Eastern emojis (e.g. ¯\_(ツ)_/¯) are a longer combination of characters of a vertical nature. Emojis play an important role in the sentiment of texts, particularly in tweets.
You’ll need to pay special attention to the character level, as well as the word level, when performing sentiment analysis on tweets. A lot of preprocessing might also be needed. For example, you might want to preprocess social media content and transform both Western and Eastern emojis into tokens and whitelist them (i.e. always take them as a feature for classification purposes) in order to help improve sentiment analysis performance.
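A minimal sketch of that preprocessing step: a tokenizer that keeps a whitelist of Western and Eastern emojis as single tokens instead of letting punctuation handling break them apart. The `EMOTICONS` list is a small hypothetical sample; in practice you would load a much longer list.

```python
import re
from typing import List

# Hypothetical whitelist: a few Western and Eastern emojis.
EMOTICONS = [":)", ":-)", ":(", ":-(", ":D", ";)", "(^_^)", r"¯\_(ツ)_/¯"]

# Try longer emojis first so ":-)" wins over shorter matches; otherwise match words.
_EMOTICON_ALTERNATIVES = "|".join(
    re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)
)
_TOKEN_RE = re.compile(rf"{_EMOTICON_ALTERNATIVES}|\w+")


def tokenize_with_emoticons(text: str) -> List[str]:
    """Split text into word tokens, keeping whitelisted emojis as single tokens."""
    return _TOKEN_RE.findall(text)


print(tokenize_with_emoticons("Great service :D"))
print(tokenize_with_emoticons(r"No refund yet ¯\_(ツ)_/¯"))
```

Note that the text is not lowercased here, since that would turn :D into :d and break the match.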
Here’s a quite comprehensive list of emojis and their Unicode characters that may come in handy when preprocessing.
Defining Neutral
Defining what we mean by neutral is another challenge to tackle in order to perform accurate sentiment analysis. As in all classification problems, defining your categories (and, in this case, the neutral tag) is one of the most important parts of the problem. What you mean by neutral, positive, or negative does matter when you train sentiment analysis models. Since tagging data requires consistent tagging criteria, a good definition of the problem is a must.
Here are some ideas to help you identify and define neutral texts:
- Objective texts. So-called objective texts do not contain explicit sentiments, so you should include those texts in the neutral category.
- Irrelevant information. If you haven’t preprocessed your data to filter out irrelevant information, you can tag it neutral. However, be careful! Only do this if you know how this could affect overall performance. Sometimes, you will be adding noise to your classifier and performance could get worse.
- Texts containing wishes. Some wishes, like I wish the product had more integrations, are generally neutral. However, those including comparisons, like I wish the product were better, are pretty difficult to categorize.
How Accurate Is Sentiment Analysis?
Sentiment analysis is a tremendously difficult task, even for human beings. As a result, sentiment analysis classifiers might not be as precise as other types of classifiers. Remember that inter-annotator agreement is pretty low and that machines learn from the data they are fed (see above).
So, you might be asking: is it worth the effort? The answer is simple: it sure is! Chances are that sentiment analysis predictions will be wrong from time to time, but by using sentiment analysis you will get the opportunity to get it right about 70-80% of the time you submit your texts for classification.
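Both numbers in this discussion, human agreement and classifier accuracy, boil down to the same calculation: the share of texts on which two sets of labels match. Here is a minimal sketch; the label lists are made up for illustration.

```python
from typing import Sequence


def agreement_rate(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of texts on which two label sequences agree.

    Compare two human annotators to get inter-annotator agreement, or model
    predictions against human labels to get accuracy.
    """
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("expected two non-empty label sequences of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


human = ["positive", "negative", "neutral", "positive", "negative"]
model = ["positive", "negative", "positive", "positive", "neutral"]
print(agreement_rate(human, model))  # 0.6
```

Raw agreement does not correct for matches that happen by chance; chance-corrected measures such as Cohen’s kappa do, which is why annotation studies often report them instead.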
If you or your company have not used sentiment analysis before, you’ll see some improvement really quickly. For typical use cases, such as ticket routing, brand monitoring, and VoC analysis (see below), this means you will save a lot of time and money (which you are likely investing in in-house manual work nowadays), save your teams some frustration, and increase your (or your company’s) productivity.
Sentiment Analysis Use Cases & Applications
In this section, we’ll introduce use cases, applications, and examples of how sentiment analysis can be used for:
- Social media monitoring
- Brand monitoring
- Voice of customer (VoC)
- Customer service
- Market research
Social Media Monitoring
On the fateful evening of April 9th, 2017, United Airlines forcibly removed a passenger from an overbooked flight. The nightmarish incident was filmed by other passengers on their smartphones and posted immediately. One of the videos, posted to Facebook, was shared more than 87,000 times and viewed 6.8 million times by 6pm on Monday, just 24 hours later.
The fiasco was magnified horrifically by the company’s dismissive response. On Monday afternoon, United Airlines tweeted a statement from the CEO apologizing for “having to re-accommodate customers”. Cue public outrage – you can imagine the field day on Twitter.
This is exactly the kind of PR catastrophe you can avoid with sentiment analysis. It’s also an excellent example of why it’s important to care not only about whether people are talking about your brand, but also about how they’re talking about it. More mentions don’t necessarily mean positive mentions.
In today’s day and age, brands of all shapes and sizes have meaningful interactions with customers, leads, and even competition on social networks like Facebook, Twitter, and Instagram. Most marketing departments already track the volume of online mentions, treating more chatter as more brand awareness. Nowadays, however, businesses need to look for deeper insights. By using sentiment analysis on social media, we can get incredible insights into the quality of conversation that’s happening around a brand.
In short, sentiment analysis can be used to:
- Analyze tweets and/or Facebook posts over a period of time to detect the sentiment of a particular audience
- Monitor social media mentions of your brand and automatically categorize by urgency
- Automatically route social media mentions to team members best fit to respond
- Automate any or all of these processes
- Gain deep insights into what’s happening across your social media channels
Top benefits for social media monitoring:
Sentiment analysis is useful in social media monitoring because it helps you do all of the following:
- Prioritize action. Which is more urgent: a fuming customer or a “thanks!” shout-out? Obviously the angry customer. Sentiment analysis lets you easily filter unread mentions by positivity and negativity, helping you prioritize issues.
- Track trends over time.
- Tune into a specific point in time – i.e. the lead-up to a new product launch or the day a particular piece of bad press dropped.
- Keep a finger on the competition. Why not monitor your competitors’ social media the same way you monitor your own? If you tune in closely, maybe you notice there’s been a negative response to a particular feature of their new product, and you respond by designing a lead generation campaign targeting exactly that gap. They won’t even know what hit them.
Over the course of a few months during the 2016 US Presidential Elections, we collected and analyzed millions of tweets mentioning Clinton or Trump posted by users from around the world. We classified each of those tweets with a sentiment of either positive, neutral, or negative.
For example, here are some tweets we analyzed:
- Negative: “Racial discord was conceived, nurtured, refined & perpetuated by Americans incl @realDonaldTrump’s father. Get real!”
- Neutral: “@HillaryClinton will receive the first question at tonight’s presidential debate, according to @CBSNews #ClintonVsTrump”.
- Positive: “Americans trust @realDonaldTrump to Make our Economy Great Again!”
- Positive: “@wcve it’s amazing how our city loves him and he really loves our city. @HillaryClinton made a great choice for Vice President. @timkaine”.
From this simple, easy analysis, we found interesting insights:
- More tweets mentioned @realDonaldTrump (~450k/day) than @HillaryClinton (~250k/day). Again, this does not equal positivity, but does imply brand awareness (and in the case of something like elections, awareness is key).
- For both candidates, there were more negative than positive tweets. Given that it’s Twitter and politics, this was not much of a surprise.
- Trump had a better positive to negative Tweet ratio than Clinton.
To sum up, more people were tweeting about Trump, and a higher percentage of people tweeting about Trump were doing so more positively than those tweeting about Clinton.
Brand Monitoring
Not only do brands have a wealth of information available on social media, but also across the internet. Instead of focusing on specific social media platforms such as Facebook and Twitter, we can find mentions in places like news, blogs, and forums, again looking at not just the volume of mentions, but also the quality of those mentions.
In our United Airlines example, for instance, the flare-up started on the social media platforms of a few passengers. Within hours, it was picked up by news sites and spread like wildfire across the US. News then spread to China and Vietnam, as the passenger was reported to be an American of Chinese-Vietnamese descent and people accused the perpetrators of racial profiling. In China, the incident became the number one trending topic on Weibo, a microblogging site with almost 500 million users.
And again, this is all happening within mere hours and days of when the incident took place.
In short, sentiment analysis can be used to:
- Analyze news articles, blog posts, forum discussions, and other texts on the internet over a period of time to see sentiment of a particular audience
- Automatically categorize the urgency of all online mentions of your brand
- Automatically alert designated team members of online mentions that concern their area of work
- Automate any or all of these processes
- Better understand a brand’s online presence by getting all kinds of interesting insights and analytics
Top benefits for brand monitoring:
- Understand how your brand reputation evolves over time
- Research your competition and understand how their reputation also evolves over time.
- Identify potential PR crises and take immediate action. Again, prioritize which fires need to be put out immediately and which mentions can wait.
- Tune into a specific point in time. Again, maybe you want to look at just press mentions on the day of your IPO filing, or a new product launch. Sentiment analysis lets you do that.
Example: Expedia Canada
Around Christmas time, Expedia Canada ran a classic “escape winter” marketing campaign. All was well, except for the screeching violin they chose as background music. Understandably, people took to social media, blogs, and forums. Expedia noticed right away and removed the ad. Then, they created a series of follow-up spin-off videos: one showed the original actor smashing the violin, and in another one, they invited a real follower who had complained on Twitter to come in and rip the violin out of the actor’s hands. Though their original campaign was a flop, Expedia were able to redeem themselves by listening to their customers and responding.
Using sentiment analysis (and machine learning), you can automatically monitor all chatter around your brand and detect this type of potentially-explosive scenario while you still have time to defuse it.
Voice of Customer (VoC)
Social media and brand monitoring offer us immediate, unfiltered, invaluable information on customer sentiment. However, there are two other troves of insight: surveys and customer support interactions.
Net Promoter Score (NPS) surveys are one of the most popular ways for businesses to gain feedback. They start by asking a simple question (Would you recommend this company, product, and/or service to a friend or family member?) that results in a simple number or score. Businesses use these scores to identify customers as promoters, passives, or detractors. The goal is to gauge overall customer experience and find ways to elevate all customers to “promoter” level, where they theoretically will buy more, stay longer, and refer other customers.
Numerical survey data is easily aggregated and assessed, but the next question in NPS surveys asks customers why they gave the score they did. This triggers a series of open-ended responses that are a lot harder to analyze. However, with sentiment analysis these texts can be classified as positive or negative, giving you further insight into why customers gave the scores they did.
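The number itself is easy to compute. Using the standard NPS buckets (9-10 promoters, 7-8 passives, 0-6 detractors), the score is the percentage of promoters minus the percentage of detractors. The sketch below also groups the open-ended answers by bucket, so each group can then be passed to whatever sentiment model you use; the survey responses are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def nps_bucket(score: int) -> str:
    """Standard NPS buckets for an answer on a 0-10 scale."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS answers range from 0 to 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores: List[int]) -> float:
    """% promoters minus % detractors, so the result lies between -100 and 100."""
    if not scores:
        raise ValueError("no scores")
    buckets = [nps_bucket(s) for s in scores]
    return 100 * (buckets.count("promoter") - buckets.count("detractor")) / len(buckets)


def comments_by_bucket(responses: List[Tuple[int, str]]) -> Dict[str, List[str]]:
    """Group the open-ended "why?" answers by NPS bucket, ready for sentiment analysis."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for score, comment in responses:
        groups[nps_bucket(score)].append(comment)
    return dict(groups)


responses = [
    (10, "Fast onboarding and helpful support"),
    (9, "Great value for the price"),
    (8, "Good, but the app is a bit slow"),
    (4, "Billing errors two months in a row"),
]
print(net_promoter_score([s for s, _ in responses]))  # 25.0
print(comments_by_bucket(responses)["detractor"])
```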
In short, sentiment analysis can be used to:
- Analyze aggregated NPS or other survey responses
- Analyze aggregated customer support interactions
- Track customer sentiment about specific aspects of the business over time. This adds depth to explain why the overall NPS score might have changed, or if specific aspects have shifted independently.
- Target individual customers to improve their service. By automatically running sentiment analysis on incoming surveys, you can detect customers who feel strongly negative about your product or service, so you can respond to them right away.
- Determine if particular customer segments feel more strongly about your company. You can zero in on sentiment by certain demographics, interests, personas, etc.
Top benefits for understanding Voice of Customer (VoC):
- Use results of sentiment analysis to design better informed questions to ask on future surveys
- Understand the nuances of customer experience over time, along with why and how shifts are happening
- Empower your internal teams by giving them a deeper view of the customer experience, by segment and by specific aspects of the business
- Respond more quickly to signals and shifts from customers
Example: McKinsey City Voices project
In Brazil, federal public spending rose by 156% from 2007 to 2015 while people’s satisfaction with public services steadily decreased. Unhappy with this counterproductive progress, the Urban-planning Department recruited McKinsey to help them work on a series of new projects that would focus first on user experience, or citizen journeys, when delivering services. This citizen-centric style of governance has led to the rise of what we call Smart Cities.
McKinsey developed a tool called City Voices, which conducts citizen (customer) surveys across more than 150 different metrics, and then runs sentiment analysis to help leaders understand how constituents live and what they need, in order to better inform public policy. By using this tool, the Brazilian government was able to surface urgent needs – a safer bus system, for instance – and improve them first.
If even whole cities and countries, famous for their red tape and slow pace, are incorporating customer journeys and sentiment analysis into their decision-making processes, then innovative companies had better be far ahead.
Customer Service
We all know the drill: stellar customer experiences mean a higher rate of returning customers. Leading companies know that how they deliver is just as important as, if not more important than, what they deliver. Customers expect their experience with companies to be immediate, intuitive, personal, and hassle-free. In fact, research shows that 25% of customers will switch to a competitor after just one negative interaction.
We already looked at how we can use sentiment analysis in terms of the broader VoC, so now we’ll dial in on customer service teams.
Sentiment analysis can be used to:
- Automatically classify all incoming customer support queries
- Rapidly detect disgruntled customers and surface those tickets to the top
- Route queries to specific team members best suited to respond
- Gain deep insights into what’s happening across your customer support
Top benefits for customer service:
- Prioritize order for responding to tickets, being sure to address the most urgent needs first.
- Increase efficiency by automatically assigning tickets to a particular category or team member.
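A minimal sketch of those two steps: route each ticket to a team, then order every team’s queue so the most negative tickets come first. The `sentiment_score` field stands in for the output of whatever sentiment model you use (assumed here to be a number from -1 to 1, lower meaning more negative), and the `ROUTES` keyword table is hypothetical; a real system would more likely use a trained topic classifier for routing.

```python
from typing import Dict, List, NamedTuple


class Ticket(NamedTuple):
    ticket_id: str
    text: str
    # Assumed output of a sentiment model, from -1 (very negative) to 1 (very positive).
    sentiment_score: float


# Hypothetical routing table: keyword -> team.
ROUTES = {"refund": "billing", "invoice": "billing", "crash": "engineering", "password": "accounts"}


def route(ticket: Ticket, default: str = "general") -> str:
    """Assign a ticket to the first team whose keyword appears in its text."""
    text = ticket.text.lower()
    for keyword, team in ROUTES.items():
        if keyword in text:
            return team
    return default


def build_queues(tickets: List[Ticket]) -> Dict[str, List[Ticket]]:
    """Route every ticket, then sort each team's queue most-negative first."""
    queues: Dict[str, List[Ticket]] = {}
    for ticket in tickets:
        queues.setdefault(route(ticket), []).append(ticket)
    for queue in queues.values():
        queue.sort(key=lambda t: t.sentiment_score)
    return queues


tickets = [
    Ticket("T1", "Where is my refund?", -0.2),
    Ticket("T2", "The app crashes every time I log in", -0.9),
    Ticket("T3", "Refund received, thanks!", 0.8),
    Ticket("T4", "Still no refund after three weeks!!", -0.7),
]
queues = build_queues(tickets)
print([t.ticket_id for t in queues["billing"]])  # ['T4', 'T1', 'T3']
```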
Just for kicks, we decided to analyze how the four biggest US phone carriers (AT&T, Verizon, Sprint, and T-Mobile) handle customer support interactions on Twitter. We downloaded tens of thousands of tweets mentioning the companies (by name or by handle), and ran them through a MonkeyLearn sentiment model to categorize each tweet as positive, neutral, or negative. We then used our new Insight Extractor, which reads all text as one unit, extracts the most relevant keywords, and returns the most relevant sentences including each keyword.
Here are some insights:
- T-Mobile had by far the highest percentage of positive tweets
- Verizon was the only company with more negative tweets than positive ones
- Top keywords for positive tweets at Verizon included typical terms such as “new phone,” “thanks,” and “quality customer service”. Key interactions between customers and agents were formal and somewhat dry
- Top keywords for positive tweets at T-Mobile included names of customer support agents, since their team has higher engagement, as well as back-and-forth type conversations with their followers
To sum up, this could imply that a more personal, engaging take on social media elicits more positive responses and higher customer satisfaction.
Market Research
And as a final use case, sentiment analysis empowers all kinds of market research and competitive analysis. Whether you’re exploring a new market, anticipating future trends, or looking for an edge over the competition, sentiment analysis can make all the difference.
Sentiment analysis can be used to:
- Analyze product reviews of your brand and compare those with the competition
- Generate weekly, monthly, or daily reports – a sort of early-warning system
- Compare sentiment across international markets
- Analyze formal market reports or business journals for long-term, broader trends
- Analyze tweets and social media posts for real-time happenings
- Analyze reviews for unfiltered customer feedback
- Use aspect-based sentiment analysis to gain rich insight into the details and the reason for otherwise opaque market trends
Top benefits for market research:
- Tap into new sources of information
- Quantify otherwise qualitative information
- Add that qualitative dimension to already-gathered quantitative insights
- Provide information in real-time rather than in retrospect
- Automate regular (perhaps weekly) reports
- Fill in gaps where public data is scarce – in emerging markets, for instance.
Example: Hotel reviews on TripAdvisor
Our team was curious about how people feel about hotels in several major cities around the world, so we scraped and analyzed more than one million reviews from TripAdvisor. We looked at hotels in London, Paris, New York, Bangkok, Madrid, Beijing, and Rio de Janeiro.
Here are some insights:
- Reviews were mostly positive – on average, 82% of comments were tagged with a positive sentiment
- London hotels received the worst reviews
- London hotels were viewed as dirtier than New York hotels, and as having the worst food overall.
We used the keyword extraction module to analyze the actual content of the positive/negative reviews, and found a few more interesting insights:
- “Cockroaches” appears only in Bangkok. Watch out!
- “Croissants” appears only in Paris (as we might expect). Shockingly, though, they appear to be a letdown. Taking a closer look, we were able to conclude this was more a reflection on the subpar hotel breakfast food than on the city itself (phew!).
Sentiment Analysis Resources: Tutorials & Tools
Sentiment analysis is a really vast topic and beginners might not know how to get started. Luckily, there are many resources out there, from useful tutorials to all kinds of courses, articles, and papers that specialize in this topic. In this section, our goal is to give you a brief overview of how to get started with sentiment analysis.
1. Read the basics
Before diving into sentiment analysis literature and tutorials, make sure you understand the very basics of sentiment analysis. Maybe go over these sections once more:
- The basics of sentiment analysis.
- Different types of sentiment analysis.
- The benefits of sentiment analysis.
- How sentiment analysis works.
If you’re already familiar with the topic, you can explore more advanced sentiment analysis literature.
2. Try out an online tool
A good next step in your journey to learn more about sentiment analysis is to play and experiment with a sentiment analysis tool.
By having first-hand experience, you can quickly understand how sentiment analysis classifies expressions. You will also quickly learn what the challenges and caveats of this technology are.
Below, you can try out different models that were trained by MonkeyLearn for a diverse set of sentiment analysis tasks. Feel free to experiment with different expressions and see how different models behave and make predictions.
If you get an odd result, it could be because the expression you've used wasn’t recognized by the model (yet). Try entering more words to see how this affects the results.
Additionally, you can use MonkeyLearn to create a custom model for sentiment analysis to get specific results that are tailored to your domain and interest.
Cross-Domain Sentiment Analysis
This is a cross-domain sentiment analysis classifier for English texts, designed to work on any kind of text. If you're not sure which sentiment analysis model to use, we recommend this one.
A second model classifies English tweets according to their sentiment (positive, neutral, or negative).
A third model classifies English product reviews and opinions as positive or negative.
A fourth classifier was trained on data from several hotel review sites to distinguish between good and bad reviews.
3. Learn from a tutorial
There is a sentiment analysis tutorial for almost everyone: coders, non-coders, marketers, data analysts, support agents, salespeople, you name it. In this section, we’ll share a selection of tutorials so you can find something right up your alley.
Sentiment Analysis Tutorials for Coders
If you feel comfortable around code and APIs, you can quickly find all kinds of step-by-step guides and resources. Python is the most common programming language for tutorials on data analysis, machine learning, and NLP (including sentiment analysis), but R is quickly catching up, especially in tutorials aimed at data scientists and statisticians.
Sentiment Analysis of Top 100 Subreddits with Python
This is a Python web scraping and sentiment analysis tutorial that provides a step-by-step guide on how to analyze the top 100 subreddits by the sentiment of their comments.
It starts by explaining how to use Beautiful Soup, one of the most popular Python libraries for web scraping, in order to pull data out of web pages. The author uses this library to scrape the top subreddits web page to get the names of the top 100 subreddits (subreddits like /r/funny, /r/AskReddit and /r/todayilearned).
Once he has the names of the subreddits, he uses the PRAW library to interact with the Reddit API and extract the comments from these subreddits.
Finally, the author explains how to use TextBlob to perform sentiment analysis on the extracted comments.
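Once each comment has a polarity score (TextBlob's sentiment.polarity is a float between -1.0 and 1.0), ranking subreddits is just a matter of grouping and averaging. A minimal sketch, assuming the comments have already been scraped and scored. The (subreddit, polarity) pairs below are made-up sample values, not results from the tutorial, and rank_by_sentiment is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

def rank_by_sentiment(scored_comments):
    """Return (subreddit, mean polarity) pairs, most positive first."""
    by_subreddit = defaultdict(list)
    for subreddit, polarity in scored_comments:
        by_subreddit[subreddit].append(polarity)
    averages = {sub: mean(scores) for sub, scores in by_subreddit.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (subreddit, polarity) pairs; in the tutorial's pipeline the
# comments would come from PRAW and the scores from TextBlob.
scored = [
    ("funny", 0.5), ("funny", 0.25),
    ("AskReddit", 0.0), ("AskReddit", -0.5),
    ("todayilearned", 0.25),
]
ranking = rank_by_sentiment(scored)
```

Averaging treats every comment equally; weighting by comment score or filtering out very short comments are reasonable variations.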
Sentiment Analysis of Slack Reviews Using R
Let’s imagine that we're the Slack team and we're looking for an easy, reliable way to get data about users’ feelings about our product. We can turn to online reviews in order to answer some top-of-mind questions.
But, when there are thousands of reviews out there, it can be tough to sort through all this feedback and get the insights we're looking for. There is simply too much feedback to process manually.
With this in mind, we’ve provided a step-by-step guide on how you might conduct a seamless sentiment analysis of Slack reviews using R.
It analyzes a few thousand reviews of Slack on the product review site Capterra and draws some great insights from the data.
Sentiment Analysis of the State of the Union with R
Kaggle is a great resource for all kinds of tutorials related to data science. In this sentiment analysis in R tutorial by Rachael Tatman, you can learn how the author analyzed the sentiment of the State of the Union address, an annual speech given by the President of the United States to Congress.
This message is an opportunity for the president to inform US citizens (and the world) about how the government is doing on issues that are important to the country.
By analyzing the different messages in these State of the Union speeches, it’s possible to get a lot of interesting insights, like how sentiment has changed over time or which presidents gave more negative or positive speeches.
As a first step, the author tokenizes the data, which means taking the text from the speeches and breaking it up into individual words. Then, she compares these tokens against a list of words with associated positive or negative sentiments (a sentiment lexicon) and creates some visualizations using the ggplot2 package.
At the end of the tutorial, the author provides some exercises that are useful to get some additional practice and a deeper understanding of sentiment analysis.
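The tokenize-and-match approach described above fits in a few lines. The tutorial itself is written in R, so this is a Python sketch of the idea rather than the author's code. The tiny LEXICON is a hypothetical stand-in for a real sentiment lexicon with thousands of entries, and the sample sentence is made up.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; real lexicons contain thousands of entries.
LEXICON = {
    "prosperity": "positive", "strong": "positive", "hope": "positive",
    "crisis": "negative", "war": "negative", "fear": "negative",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_counts(text, lexicon=LEXICON):
    """Count how many tokens match each sentiment label in the lexicon."""
    return Counter(lexicon[token] for token in tokenize(text) if token in lexicon)

speech = "Our economy is strong, and we face the crisis with hope, not fear."
counts = lexicon_counts(speech)
net_sentiment = counts["positive"] - counts["negative"]
```

Note that "not fear" is counted as negative here: a word-by-word lexicon ignores context, which is exactly the kind of limitation that bigram-based analysis tries to address.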
Sentiment Analysis of Tweets Using NLTK
If you are a Python coder and want to learn how to train your first text classifier for sentiment analysis, there’s a step-by-step guide on Twitter sentiment analysis using Python and NLTK. The author uses the Natural Language Toolkit (NLTK) to train a classifier that can predict the sentiment of a new tweet.
To get started, the author explains how to extract a list of features from a predefined set of positive and negative tweets. These features are a set of distinctive words that can be used to represent each tweet and are a key part of training a classifier.
Then, you’ll learn how to prepare the training data that contains the labeled feature sets. Finally, he trains a Naive Bayes classifier, a simple but powerful algorithm that often works well on text classification problems.
Once the classifier is trained, the author explains how to use it to classify a new incoming tweet.
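To show what a Naive Bayes classifier does with word features, here is a from-scratch Python sketch rather than the NLTK code from the tutorial (NLTK provides this through nltk.NaiveBayesClassifier.train). It represents each tweet as the set of words it contains, scores each label as log P(label) plus the sum of log P(word | label), and uses add-one smoothing so unseen word and label combinations don't zero out a score. The six training tweets are made-up examples.

```python
import math
import re
from collections import Counter, defaultdict

def features(text):
    """Represent a tweet as the set of distinct lowercase words it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))

class NaiveBayes:
    """Naive Bayes over word-presence counts with add-one (Laplace) smoothing."""

    def train(self, labeled_texts):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in labeled_texts:
            words = features(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab |= words
        return self

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) + sum of log P(word | label) over known words
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in features(text) & self.vocab:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled tweets.
train = [
    ("I love this car", "positive"),
    ("This view is amazing", "positive"),
    ("I feel great this morning", "positive"),
    ("I do not like this car", "negative"),
    ("This view is horrible", "negative"),
    ("I feel tired this morning", "negative"),
]
clf = NaiveBayes().train(train)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.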
Sentiment Analysis on Songs Using R
If you are looking for a more advanced tutorial on sentiment analysis using R, then learn how to use the Tidytext package to perform sentiment analysis on Prince’s songs.
The author starts by analyzing basic information such as the lexical diversity of Prince’s lyrics. The tutorial then explores different sentiment lexicons (including AFINN, Bing, and NRC) and how well each one suits Prince’s lyrics, before explaining how to perform sentiment analysis on all of Prince’s songs. Finally, it tracks how the lyrics’ sentiment changed over the years and gives a practical explanation of how bigrams affect sentiment.
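Why bigrams matter: a word-by-word lexicon scores "not good" as positive because it only sees "good". The tutorial handles this in R; this Python sketch shows one common fix, flipping a word's score when the preceding word is a negator. The SCORES values and NEGATORS set are hypothetical, loosely in the style of AFINN's integer scores from -5 to +5.

```python
import re

# Hypothetical word scores in the style of AFINN (integers from -5 to +5).
SCORES = {"good": 3, "love": 3, "bad": -3, "sad": -2}
NEGATORS = {"not", "no", "never", "don't"}

def bigram_aware_score(text):
    """Sum word scores, flipping the sign when the previous word is a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    # Pair each token with its predecessor (None for the first token).
    for prev, word in zip([None] + tokens, tokens):
        score = SCORES.get(word, 0)
        if prev in NEGATORS:
            score = -score
        total += score
    return total
```

For example, "This song is good" scores 3, while "This song is not good" scores -3. Only looking one word back is a simplification: phrases like "not very good" would need a wider window.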
Sentiment Analysis of Tweets Using Scikit-learn and Jupyter Notebook
Scikit-learn is a simple and efficient tool for data analysis, most often used for data classification, regression, and clustering. It’s one of the most frequently used libraries in machine learning since it’s powerful but accessible to everybody. If you are serious about learning about data analysis and machine learning, there’s an easy-to-follow tutorial with scikit-learn to help you get started.
It explains how to train a logistic regression model for sentiment analysis. It starts by showing how to properly set up your environment, including Jupyter Notebook, an application that allows rapid prototyping and sharing of data-related projects.
Afterwards, the author explains how to prepare and vectorize the data with scikit-learn. Finally, the tutorial trains a linear classifier and shows how to evaluate the model by calculating its accuracy.
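The workflow described above (vectorize the text, fit a logistic regression, measure accuracy) can be sketched without any libraries to show what happens under the hood. This is a from-scratch Python sketch on made-up data, not the tutorial's code. In practice you would use scikit-learn's CountVectorizer and LogisticRegression, which add regularization and better optimizers. The learning rate and number of epochs are arbitrary assumptions.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts):
    """Assign each distinct word a column index."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Binary bag-of-words vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] = 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy(X, y, w, b):
    return sum(predict(x, w, b) == yi for x, yi in zip(X, y)) / len(y)

# Hypothetical data: 1 = positive, 0 = negative.
train_texts = ["great movie", "loved it", "really great fun",
               "terrible movie", "hated it", "really boring"]
y_train = [1, 1, 1, 0, 0, 0]
test_texts, y_test = ["great fun", "boring and terrible"], [1, 0]

vocab = build_vocab(train_texts)
X_train = [vectorize(t, vocab) for t in train_texts]
X_test = [vectorize(t, vocab) for t in test_texts]
w, b = train_logreg(X_train, y_train)
```

Always measure accuracy on held-out data as well as on the training set: a model can fit its training examples perfectly and still generalize poorly.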
Sentiment Analysis in Python using MonkeyLearn
Although open-source frameworks are great because of their flexibility, they can be a hassle to use if you don't have experience in machine learning or NLP. Most open-source frameworks don't have pre-trained models that you can use right away; you'll have to train one from scratch. You will also need to build the proper infrastructure for training and deploying the machine learning models.
Instead, you might be better off trying a SaaS API for sentiment analysis, such as MonkeyLearn’s. Learn how to do sentiment analysis with Python using MonkeyLearn’s API, and start using a pre-built sentiment analysis model with just six lines of code. Then, train your own custom sentiment analysis model using MonkeyLearn’s easy-to-use UI.
Sentiment Analysis Tutorials for Non-technical People
Until recently, sentiment analysis was a niche technology only accessible to people with coding skills and a background in machine learning. This is no longer the case, thanks to the rise of a variety of easy-to-use sentiment analysis tools.
The following tutorials can help you get started with sentiment analysis without a single line of code.
Sentiment Analysis with Excel
While we all know how to crunch numbers with Excel functions, analyzing text in spreadsheets is still a hard and manual process. It takes a lot of time to make sense of the text data to create reports and analyze trends. But luckily, there's a better way. Instead of spending hours going through each row, analyzing each text manually, you can use sentiment analysis with Excel to save time and get more done.
MonkeyLearn’s got your back, providing a fast and simple way to run sentiment analysis on your Excel spreadsheets.
First, you need to select a sentiment analysis model. You can either use a pre-trained sentiment analysis model or create your own model built with your own tags and criteria.
Then, you just need to upload your Excel file to run the sentiment analysis with the selected model. And voilà! MonkeyLearn will return a new Excel file with the original data plus two new columns: one with the sentiment analysis result and another one with the confidence of the result.
Sentiment Analysis with Zapier
Are you interested in knowing the sentiment of a set of tweets? Or maybe you want to understand whether survey responses are positive or negative? No worries: you can use automation tools like Zapier to connect with more than 1,000 apps, get the data you need, and run your sentiment analysis.
Our tutorial on sentiment analysis with Zapier will walk you through how to create a zap to get the data you need and run a sentiment analysis with MonkeyLearn, filter out samples by confidence so you eliminate those that are likely to lead to inaccurate predictions, and add a third step to your zap to save the results and create all kinds of data visualizations!
Sentiment Analysis in Google Sheets
MonkeyLearn can also power up your Google Sheets with sentiment analysis. Follow our step-by-step guide, where we explain how to do sentiment analysis directly in your Google Sheets using our add-on. We also go over some best practices and provide examples of interesting things you can do with your data.
Sentiment Analysis with RapidMiner
RapidMiner is a platform where you can create data mining processes without being an experienced data scientist. It provides a friendly user interface for building complete data analysis workflows, including loading your data, running machine learning models, and creating visualizations. It’s simple to use, and someone with no coding skills can quickly create automated processes and analyses of data.
Doing sentiment analysis with RapidMiner is straightforward with the MonkeyLearn extension.
First, you have to add the data (i.e. a source) from your computer to RapidMiner. You can upload data from a CSV file or a database, or use other data sources available on the RapidMiner Marketplace to import data from sources like Facebook, SAS, Tableau, and others.
As a second step, you have to add the MonkeyLearn classify operator and connect it to the input (your data). This operator allows you to use text classifiers available on MonkeyLearn, including those trained specifically for sentiment analysis.
Finally, you have to connect the output of the MonkeyLearn classify operator to the results port, click on ‘run’ and voilà! Here’s a more thorough introduction on how to set up the RapidMiner extension for MonkeyLearn.
Next Steps: Research Literature
So far, you’ve read about the basics of sentiment analysis, had first-hand experience with sentiment analysis models, and possibly set up sentiment analysis using one of the tutorials above.
Now you might be eager to level up your skills and learn more about sentiment analysis. In that case, the next step would be to dig into research and scientific literature.
Papers about Sentiment Analysis
The literature around sentiment analysis is massive; there are more than 55,700 scholarly articles, papers, theses, books, and abstracts out there.
The following are the most frequently cited and read papers in the sentiment analysis community in general:
- Opinion mining and sentiment analysis (Pang and Lee, 2008)
- Recognizing contextual polarity in phrase-level sentiment analysis (Wilson, Wiebe and Hoffmann, 2005)
- A survey of opinion mining and sentiment analysis (Liu and Zhang, 2012)
- Sentiment analysis and opinion mining (Liu, 2012)
Books about Sentiment Analysis
Bing Liu is a leading authority in the field and has written a book about sentiment analysis and opinion mining that’s super useful for those starting research on sentiment analysis. Liu does a wonderful job of explaining sentiment analysis in a way that is highly technical, yet understandable. He covers different aspects of sentiment analysis, including applications, research, sentiment classification using supervised and unsupervised learning, sentence subjectivity, aspect-based sentiment analysis, and more.
Courses and Lectures
Another good way to go deeper with sentiment analysis is to build your knowledge and skills in natural language processing (NLP), the computer science field that focuses on understanding ‘human’ language.
By combining machine learning, computational linguistics, and computer science, NLP allows a machine to understand natural language including people's sentiments, evaluations, attitudes, and emotions from written language.
There are a large number of courses, lectures, and resources available online, but the essential NLP course is the Stanford Coursera course by Dan Jurafsky and Christopher Manning. By taking this course, you will get a step-by-step introduction to the field by two of the most reputable names in the NLP community.
If you want a more hands-on course, you should enroll in the Data Science: Natural Language Processing (NLP) in Python course on Udemy. This course gives you a good introduction to NLP and what it can do, and it also has you build different projects in Python, including a spam detector, a sentiment analyzer, and an article spinner. Most of the lectures are really short (~5 minutes) and the course strikes the right balance between practical and theoretical content.
Sentiment Analysis Datasets
The key to mastering sentiment analysis is working on different datasets and experimenting with different approaches. First, you’ll need to get your hands on a dataset to carry out your experiments.
The following are some of our favorite sentiment analysis datasets for experimenting with sentiment analysis and a machine learning approach. They’re open and free to download:
- Product reviews: this dataset consists of a few million Amazon customer reviews with star ratings, super useful for training a sentiment analysis model.
- Restaurant reviews: this dataset consists of 5.2 million Yelp reviews with star ratings.
- Movie reviews: this dataset consists of 1,000 positive and 1,000 negative processed reviews. It also provides 5,331 positive and 5,331 negative processed sentences / snippets.
- Fine food reviews: this dataset consists of ~500,000 food reviews from Amazon. It includes product and user information, ratings, and a plain text version of every review.
- Twitter airline sentiment on Kaggle: this dataset consists of ~15,000 labeled tweets (positive, neutral, and negative) about airlines.
- First GOP Debate Twitter Sentiment: this dataset consists of ~14,000 labeled tweets (positive, neutral, and negative) about the first GOP debate of the 2016 presidential primaries, held in August 2015.
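Review datasets like the Amazon and Yelp collections above come with star ratings rather than sentiment labels, so a common first step is to derive labels from the stars. A minimal sketch, assuming a 1 to 5 star scale and the convention of treating 1-2 stars as negative, 3 as neutral, and 4-5 as positive. These thresholds are a judgment call, not part of any dataset's specification, and the sample rows are made up.

```python
from collections import Counter

def stars_to_label(stars):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"expected a rating from 1 to 5, got {stars}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"

# Hypothetical (review text, star rating) rows.
rows = [("Loved it", 5), ("Meh", 3), ("Broke after a day", 1), ("Pretty good", 4)]
labeled = [(text, stars_to_label(stars)) for text, stars in rows]

# Check the class balance before training: review data is often skewed positive.
label_distribution = Counter(label for _, label in labeled)
```

Some practitioners drop 3-star reviews entirely when training a binary positive/negative model, since they tend to be the most ambiguous.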
If you are interested in a rule-based approach, the following sentiment analysis lexicons will come in handy. These lexicons are dictionaries of words labeled with their sentiment, covering different domains:
- Sentiment Lexicons for 81 Languages: this dataset contains both positive and negative sentiment lexicons for 81 languages.
- SentiWordNet: this dataset contains about 29,000 words with a sentiment score between 0 and 1.
- Opinion Lexicon for Sentiment Analysis: this dataset provides a list of 4,782 negative words and 2,005 positive words in English.
- Wordstat Sentiment Dictionary: this dataset includes ~4800 positive and ~9000 negative words.
- Emoticon Sentiment Lexicon: this dataset contains a list of 477 emoticons labeled as positive, neutral, or negative.
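Many of these lexicons are distributed as plain word lists. Here is a minimal sketch for loading two such lists and merging them into a single lookup table. It assumes a simple format of one word per line, with comment lines starting with a semicolon. The Opinion Lexicon's files follow roughly this layout, but check each lexicon's own documentation. The file contents below are hypothetical. Words listed as both positive and negative, like "cheap" in this sample, are set aside because their polarity depends on the domain.

```python
import io

def load_word_list(lines):
    """Collect one word per line, skipping blank lines and ';' comment lines."""
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith(";"):
            words.add(word.lower())
    return words

def build_lexicon(positive, negative):
    """Merge two word lists into {word: label}, setting aside words found in both."""
    conflicts = positive & negative
    lexicon = {word: "positive" for word in positive - conflicts}
    lexicon.update({word: "negative" for word in negative - conflicts})
    return lexicon, conflicts

# In-memory stand-ins for word-list files (hypothetical contents).
positive_file = io.StringIO("; positive words\n\nexcellent\ncheap\nlove\n")
negative_file = io.StringIO("; negative words\n\nawful\ncheap\nslow\n")

lexicon, conflicts = build_lexicon(
    load_word_list(positive_file), load_word_list(negative_file)
)
```

With real files, replace the StringIO objects with open("positive-words.txt", encoding=...) calls, using the encoding the lexicon documents.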
Sentiment Analysis Tools and APIs
There are many sentiment analysis systems that can be consumed through an API or via a user interface. Broadly speaking, they can be classified into two categories:
- Open Source libraries
- SaaS Tools
Open Source Libraries
Among open source options, programming languages such as Python and Java are particularly well positioned, since they have strong data science communities and, as a result, plenty of open source libraries for data science, including natural language processing. In all of these cases, you'll need solid knowledge of machine learning and programming to use the libraries successfully.
Sentiment Analysis APIs for Python
Python is one of the top programming languages for data science and it has a strong community and a large set of options to implement NLP models.
The following are remarkable examples:
Scikit-learn is the go-to library for machine learning and has useful tools for text vectorization. Training a classifier on top of term-frequency or tf-idf vectorizations is very straightforward. Scikit-learn has implementations of Support Vector Machines, Naïve Bayes, and Logistic Regression, among others.
NLTK has been the traditional NLP library for Python. It has an active community and, besides providing low-level functions for NLP, it also lets you train machine learning classifiers.
spaCy is another, more recent NLP library with a growing community. Like NLTK, it provides a strong set of low-level functions for NLP, along with support for training text classifiers.
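To see what a tf-idf vectorizer computes, here is a from-scratch Python sketch using the textbook weighting tf(t, d) × log(N / df(t)). Here, tf is a term's share of the words in a document, N is the number of documents, and df is the number of documents that contain the term. Scikit-learn's TfidfVectorizer uses a smoothed variant and normalizes vectors by default, so its numbers will differ. The three sample reviews are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["the food was great", "the service was slow", "great food great price"]
vecs = tfidf(docs)
```

Terms that are rare across the collection, such as "slow" or "price" here, get higher weights than terms that appear in many documents, such as "the". A term that appears in every document gets a weight of zero. These sparse vectors can then be fed to any classifier.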
With the rise of deep learning in the last few years, a new set of data science libraries with support for NLP applications has been developed. Some of the most remarkable:
TensorFlow. Developed by Google, it provides a low-level set of tools to build and train neural networks. It also supports text vectorization, both with traditional word frequencies and with more advanced word embeddings.
Keras provides useful abstractions to work with multiple neural network types like recurrent neural networks (RNNs) and convolutional neural networks (CNNs) and easily stack layers of neurons. Keras can be run on top of TensorFlow or Theano. It also provides useful tools for text classification.
PyTorch is a recent Deep Learning framework backed by some prestigious organizations like Facebook, Twitter, Nvidia, Salesforce, Stanford University, University of Oxford, and Uber. It has quickly developed a strong community.
Sentiment Analysis APIs in Java
Java is another programming language with a strong data science community and some remarkable libraries for NLP.
- OpenNLP: a toolkit that supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution.
- Stanford CoreNLP: a Java suite of core NLP tools provided by The Stanford NLP Group.
- Lingpipe: a Java toolkit for processing text using computational linguistics. LingPipe is often used for text classification and entity extraction.
- Weka: a set of tools created by The University of Waikato for data pre-processing, classification, regression, clustering, association rules, and visualization.
Sentiment Analysis SaaS Tools
Implementing a sentiment analysis system from scratch is not an easy task. Usually, companies need to spend a lot of time, money, and resources on the following:
- A data science team.
- A development team.
- Deploying and scaling the infrastructure to train and run the models.
- Implementing and deploying an API to consume the models.
- Implementing tools to tag training examples.
- Adjusting the model hyperparameters.
If you want to avoid these hassles or you don't know how to code, a great alternative is to use sentiment analysis SaaS tools. You can call them via their API from any system and any programming language, which matters because software is built in many languages, but few of them have strong data science libraries. Another key advantage of these tools is that you don't even need to know how to code; they provide integrations with third-party apps such as Google Sheets, Excel, and Zapier, so you can use sentiment analysis right away to analyze data.
The following are some sentiment analysis tools worth a look:
- Google Cloud NLP
- IBM Watson
- Amazon Comprehend
Sentiment analysis can be applied to countless aspects of business, from brand monitoring and product analytics, to customer service and market research. By incorporating it into their existing systems and analytics, leading brands (not to mention entire cities) are able to work faster, with more accuracy, toward more useful ends.
Sentiment analysis has moved beyond merely an interesting, high-tech whim, and will soon become an indispensable tool for all companies of the modern age. Ultimately, sentiment analysis enables us to glean new insights, better understand our customers, and empower our own teams more effectively so that they do better and more productive work.
MonkeyLearn is an online platform that makes it easy to analyze text with Machine Learning.
If you need help building a sentiment analysis system for your business, reach out and we’ll help you get started.