Sentiment Analysis with Machine Learning: Process & Tutorial

Sentiment analysis is a machine learning tool that analyzes text for polarity, from positive to negative. By training machine learning tools with examples of emotions in text, machines can automatically learn how to detect sentiment in new text without human input.

To put it simply, machine learning allows computers to learn new tasks without being expressly programmed to perform them. Sentiment analysis models can be trained to read beyond mere definitions, to understand things like context, sarcasm, and misapplied words. For example:

“Super user-friendly interface. Yeah right. An engineering degree would be helpful.”

Out of context, the words ‘super user-friendly’ and ‘helpful’ could be read as positive, but this is clearly a negative comment. Using sentiment analysis, computers can automatically process text data and interpret it much as a human would, potentially saving hundreds of employee hours.

Imagine using machine learning to process customer service tickets, categorize them by urgency, and automatically route them to the correct department or employee. Or imagine analyzing thousands of product reviews and social media posts to gauge brand sentiment.

Read on to learn more about how machine learning works and how it can help your business.

How Does Sentiment Analysis with Machine Learning Work?

There are a number of techniques and complex algorithms used to train machines to perform sentiment analysis. Each has its pros and cons, but used together, they can provide exceptional results. Below are some of the most commonly used algorithms.

How machine learning in sentiment analysis works in both the training and prediction phases

Naive Bayes

Naive Bayes is a fairly simple family of probabilistic algorithms that, for sentiment analysis classification, assign a probability that a given word or phrase should be considered positive or negative.

Essentially, this is how Bayes’ theorem works: the probability of A, if B is true, is equal to the probability of B, if A is true, times the probability of A being true, divided by the probability of B being true.

Formula for Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
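To make the formula concrete, here is a worked example with made-up numbers (they are illustrative assumptions, not measurements): suppose 10% of positive training reviews contain the word "love", 2% of negative reviews do, and half of all reviews are positive. Taking A as "the review is positive" and B as "the review contains 'love'", the denominator P(B) is the share of all reviews containing "love":

$$
P(\text{positive} \mid \text{"love"}) = \frac{P(\text{"love"} \mid \text{positive})\,P(\text{positive})}{P(\text{"love"})} = \frac{0.10 \times 0.5}{0.10 \times 0.5 + 0.02 \times 0.5} = \frac{0.05}{0.06} \approx 0.83
$$

So a review containing "love" would be about 83% likely to be positive under these assumed numbers.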

But that’s a lot of math! Basically, Naive Bayes treats each word as independent evidence (the “naive” assumption) and combines the word probabilities it learned from labeled examples. So, with machine learning models trained for word polarity, we can calculate the likelihood that a word, phrase, or text is positive or negative.
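As a minimal sketch (not MonkeyLearn's implementation), here is a multinomial Naive Bayes classifier in plain Python. For each tag it adds log P(tag) to the log P(word | tag) of every word, working in log space to avoid underflow, and uses add-one smoothing so a word never seen with a tag doesn't zero out that tag's probability. The denominator P(B) from Bayes' theorem is skipped because it is the same for every tag. The tokenizer, training sentences, and tag names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,!?\"'").lower() for w in text.split())
    return [t for t in tokens if t]

class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per tag, used for P(tag)
        self.word_counts = {tag: Counter() for tag in self.labels}  # word frequencies per tag
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_tag, best_score = None, -math.inf
        for tag in self.labels:
            # log P(tag) + sum of log P(word | tag); P(text) is skipped because it
            # is the same for every tag and doesn't change which tag wins
            score = math.log(self.doc_counts[tag] / total_docs)
            total_words = sum(self.word_counts[tag].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[tag][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

# Hypothetical training data
train_texts = ["I love this product", "great support, very helpful",
               "terrible interface", "I hate the slow updates"]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("love the helpful support"))  # positive
print(model.predict("slow and terrible"))         # negative
```

With only four training sentences this is a toy; real models are trained on thousands of labeled examples.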

When techniques like lemmatization, stopword removal, and TF-IDF are applied, Naive Bayes can become more predictively accurate.
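Here is a rough sketch of two of those preprocessing steps in plain Python: stopword removal and TF-IDF weighting. The stopword list and helper names are illustrative assumptions. Lemmatization is left out because it normally relies on a library such as NLTK or spaCy. The weighting uses the basic tf × log(N / df) form; libraries often use smoothed variants.

```python
import math
from collections import Counter

# Illustrative stopword list (an assumption; real projects use a fuller list)
STOPWORDS = {"the", "a", "an", "is", "it", "and", "this", "i", "to", "of"}

def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords (lemmatization not shown)."""
    tokens = [w.strip(".,!?\"'").lower() for w in text.split()]
    return [t for t in tokens if t and t not in STOPWORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency within this document, scaled by how rare the term is overall
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["The interface is great", "The interface is slow", "Support is great and fast"]
w = tf_idf(docs)
print(w[1])
```

In the second document, "slow" ends up with a higher weight than "interface" because it appears in fewer documents, so it is more distinctive. Weights like these can then stand in for raw word counts as classifier features.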

Linear Regression

Linear regression is a statistical algorithm used to predict a Y value, given X features. Using machine learning, the data sets are examined to find a relationship: the data points are plotted along the X/Y axes, and a straight line is fit through them to predict new values.

Linear regression calculates how the X input (words and phrases) relates to the Y output (polarity). This will determine where words and phrases fall on a scale of polarity from “really positive” to “really negative” and everywhere in between. 
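To keep the X/Y picture, the sketch below collapses each text into a single hypothetical X value (positive-word count minus negative-word count) and fits a straight line with ordinary least squares. The word lists, texts, and polarity labels are made up for illustration. A real model would learn one coefficient per word or phrase rather than relying on a hand-written lexicon.

```python
# Hypothetical mini-lexicons, used only to turn text into one X value
POSITIVE = {"great", "love", "good", "helpful"}
NEGATIVE = {"terrible", "slow", "bad", "hate"}

def word_score(text):
    """X feature: number of positive words minus number of negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Training texts with hand-assigned polarity from -1 (really negative) to +1 (really positive)
texts = ["great product, love it", "good but slow", "terrible and slow", "good value", "bad support"]
polarity = [1.0, 0.0, -1.0, 0.5, -0.5]

slope, intercept = fit_line([word_score(t) for t in texts], polarity)
print(slope, intercept)                                    # 0.5 0.0
print(slope * word_score("helpful and good") + intercept)  # 1.0
```

Note that the line is unbounded: a text with three positive words would score 1.5, so predictions are typically clipped to the polarity scale.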

Support Vector Machines (SVM)

A support vector machine is another supervised machine learning model, similar to linear regression but more advanced. Rather than predicting a Y value, SVM learns a boundary that classifies text into sentiment categories, taking it a step beyond X/Y prediction.

For a simple visual explanation, we’ll use two tags: red and blue, with two data features: X and Y. We’ll train our classifier to classify any X/Y coordinate as either red or blue.

How SVM works: red and blue shapes represent the two tags, plotted against data features X and Y

The SVM then assigns a hyperplane that best separates the tags. In two dimensions this is simply a line (like in linear regression). Anything on one side of the line is red and anything on the other side is blue. For sentiment analysis this would be positive and negative.

The best hyperplane is the one that maximizes the margin, that is, the distance between the hyperplane and the nearest point of each tag:

SVM assigns a hyperplane that best separates the tags (the red and blue shapes)
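A minimal sketch of a linear SVM on the red/blue example, trained with plain sub-gradient descent on the hinge loss. This is one of several ways to fit an SVM; libraries such as scikit-learn and LIBSVM use dedicated solvers. The coordinates, learning rate, regularization strength, and function names are illustrative assumptions. Red points get label -1 and blue points +1. In sentiment analysis, each point would instead be a text's feature vector and the labels would be negative and positive.

```python
import random

def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=2000, seed=0):
    """Linear SVM via sub-gradient descent on the L2-regularized hinge loss.

    labels must be +1 or -1. Each step looks at one point and its margin y * (w.x + b).
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    b = 0.0
    data = list(zip(points, labels))
    for _ in range(epochs):
        rng.shuffle(data)
        for (x1, x2), y in data:
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:
                # Misclassified or inside the margin: move the boundary away from this point
                w = [w[0] - lr * (lam * w[0] - y * x1), w[1] - lr * (lam * w[1] - y * x2)]
                b += lr * y
            else:
                # Safely outside the margin: only the regularizer acts, shrinking w,
                # which is what pushes the solution toward a wider margin
                w = [w[0] * (1 - lr * lam), w[1] * (1 - lr * lam)]
    return w, b

def predict(w, b, point):
    """Which side of the learned line (hyperplane) the point falls on."""
    return "blue" if w[0] * point[0] + w[1] * point[1] + b >= 0 else "red"

# Hypothetical coordinates for the two tags
red = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
blue = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]

w, b = train_linear_svm(red + blue, [-1] * len(red) + [1] * len(blue))
print(predict(w, b, (1.5, 1.5)), predict(w, b, (4.5, 4.5)))
```

The fixed learning rate keeps the example short; it makes the final line wobble slightly around the maximum-margin solution rather than land on it exactly.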

However, as data sets become more complex, it may not be possible to draw a single line to classify the data into two camps:

two-dimensional hyperplane explaining how SVM works

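SVM handles this by mapping the data into a higher dimension, where a hyperplane can separate it. Imagine the above in three dimensions, with a Z axis added (for example, z = x² + y²): the tags can now be split by a flat plane, which traces out a circle in the original X/Y view.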

Mapped back to two dimensions with the best hyperplane, it looks like this:

Showing the best hyperplane for SVM

Very simply put, SVM can separate data that no straight line can, because it can work in more dimensions than the data starts with, which often makes it more accurate.
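The idea of adding a Z axis can be sketched directly. The code below lifts each 2D point into 3D with z = x² + y², after which a flat plane separates the two tags. The data points and the plane height are hypothetical and chosen by hand. A real SVM would learn the plane's position, and it would usually use a kernel function (such as a polynomial or RBF kernel) so that the mapping never has to be computed explicitly.

```python
def lift(point):
    """Map a 2D point (x, y) into 3D by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)

# Hypothetical data: red points cluster near the origin and blue points form a ring
# around them, so no straight line in the X/Y plane can separate the two tags.
inner_red = [(0.5, 0.0), (0.0, -0.5), (-0.4, 0.3), (0.2, 0.2)]
outer_blue = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0), (1.5, 1.5)]

def classify(point, plane_height=1.0):
    """In 3D, the flat plane z = plane_height separates the tags.

    Projected back onto the X/Y plane, that plane is the circle x^2 + y^2 = plane_height.
    """
    return "red" if lift(point)[2] < plane_height else "blue"

print([classify(p) for p in inner_red + outer_blue])
```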

Deep Learning

Deep learning is a subfield of machine learning that aims to process data in a way loosely inspired by the human brain, using “artificial neural networks.”

Deep learning is hierarchical machine learning. In other words, it’s multi-level: data passes through a chain of layers, and each layer builds on the output of the one before it. By applying these layers progressively, moving from step to step, deep learning is able to solve complex problems that simpler models may struggle with.
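To show the layered idea on a very small scale, here is a pure-Python sketch of a two-layer neural network trained with backpropagation. The hidden layer computes its own features from the words, and the output unit works from those features rather than the raw words. Real deep learning models have many more layers and units and are usually built with a framework such as PyTorch or TensorFlow. The vocabulary, training texts, layer sizes, and learning rate here are all illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical six-word vocabulary; each text becomes a 0/1 vector of word presence
VOCAB = ["great", "love", "helpful", "terrible", "hate", "slow"]

def featurize(text):
    words = {w.strip(".,!?").lower() for w in text.split()}
    return [1.0 if v in words else 0.0 for v in VOCAB]

class TinyNet:
    """Input -> hidden layer (tanh) -> output unit (sigmoid = probability of positive)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Layer 1: each hidden unit computes its own feature from the raw word inputs
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Layer 2: the output unit builds on the hidden features, not the raw words
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient-descent step on binary cross-entropy (backpropagation)."""
        h, y = self.forward(x)
        d_out = y - target  # gradient w.r.t. the output pre-activation for sigmoid + cross-entropy
        # Error signal for each hidden unit, using tanh'(a) = 1 - tanh(a)^2 and the current w2
        d_hidden = [d_out * w * (1.0 - hj * hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * d_out * hj
        self.b2 -= lr * d_out
        for j, dh in enumerate(d_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh

train = [("great support", 1), ("love it", 1), ("helpful and great", 1),
         ("terrible app", 0), ("I hate it", 0), ("slow and terrible", 0)]

net = TinyNet(len(VOCAB), n_hidden=4)
for _ in range(500):
    for text, label in train:
        net.train_step(featurize(text), label)

_, p = net.forward(featurize("love the great support"))
print(round(p, 3))  # estimated probability that the text is positive
```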

Sentiment Analysis with Machine Learning Tutorial

As you can see from the above, the calculations and algorithms involved in sentiment analysis are quite complex. But with user-friendly tools, sentiment analysis with machine learning is accessible to everyone, whether you have a computer science background or not.

MonkeyLearn offers simple SaaS tools that help you get started with machine learning right away – no coding required. Try out this premade sentiment analysis demo model to see for yourself how it works – you can do some really neat stuff with it.

MonkeyLearn’s simple user interface makes it easy to build your own sentiment analysis model in just a few short steps. Follow our tutorial below and see what sentiment analysis can do for you:

1. Choose your model

Once you’ve signed up to MonkeyLearn, go to the dashboard and choose ‘Create a model’, then click ‘Classifier’:

Step one in MonkeyLearn's model creation app: Choose a model

2. Choose your classifier

We want to show how machine learning works on customer opinions, so click on ‘Sentiment Analysis’:

Step two in MonkeyLearn's model creation app: Choose a classifier

3. Import your data

You can import data from an app or upload a CSV or Excel file. This will be used to train your sentiment analysis model. For this example, we’ll import data directly from Twitter.

Step three in MonkeyLearn's model creation app: Import your data

Enter a search query for tweets you’d like to use to train your model. It can be a keyword, hashtag, or brand mention. We’ll use the keyword ‘Zapier’ for this tutorial.

Next, choose the column you want to import data from (usually the text of the tweet):

Step four in MonkeyLearn's model creation app: highlight the data you want to use

4. Tag tweets to train your sentiment analysis classifier

Here’s where we see machine learning at work. Tag each tweet as Positive, Negative, or Neutral to train your model based on the opinion within the text. Once you tag a few, the model will begin making its own predictions. If it tags a tweet incorrectly, correct it:

Step five in MonkeyLearn's model creation app: start tagging data by positive, negative, or neutral

5. Test your classifier

Once the model has been trained with some examples, you can paste in your own text to see how it’s classified. If it’s not tagging correctly, you can keep training. The more you train the model, the better its predictions will become:

Step six in MonkeyLearn's model creation app: testing with your own data

MonkeyLearn shows a number of sentiment analysis statistics to help you understand how well machine learning is working: Precision and Recall are tag-level statistics, while Accuracy and F1 Score are statistics on the overall model. The keyword cloud helps visualize the most used words.

In the example below, more examples need to be tagged as Negative.

The analysis results for negative sentiments
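Precision, recall, accuracy, and F1 have standard definitions, and the sketch below computes them from a list of true tags and predicted tags. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sample tags are hypothetical. The source doesn't say how MonkeyLearn combines per-tag F1 into one model-level score, so averaging them equally (macro-averaging) is an assumption here.

```python
def evaluate(true_tags, predicted_tags):
    """Per-tag precision, recall, and F1, plus overall accuracy and macro-averaged F1."""
    pairs = list(zip(true_tags, predicted_tags))
    tags = sorted(set(true_tags) | set(predicted_tags))
    per_tag = {}
    for tag in tags:
        tp = sum(t == tag and p == tag for t, p in pairs)  # correctly given this tag
        fp = sum(t != tag and p == tag for t, p in pairs)  # wrongly given this tag
        fn = sum(t == tag and p != tag for t, p in pairs)  # missed instances of this tag
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_tag[tag] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(m["f1"] for m in per_tag.values()) / len(tags)
    return per_tag, accuracy, macro_f1

# Hypothetical test set: human tags vs. the model's predictions
truth = ["Positive", "Positive", "Negative", "Neutral", "Negative", "Positive"]
predicted = ["Positive", "Neutral", "Negative", "Neutral", "Positive", "Positive"]

per_tag, accuracy, macro_f1 = evaluate(truth, predicted)
print(per_tag["Negative"], round(accuracy, 3))
```

In this made-up sample, Negative has perfect precision but a recall of only 0.5: the model misses half of the negative texts, which is the kind of result that suggests tagging more Negative examples.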

6. Put your machine learning to work

Once your model is trained, you can upload huge amounts of data. MonkeyLearn offers three ways to upload your data:

  • Batch Analysis: upload a CSV or Excel file with new text. MonkeyLearn will process the data and provide your sentiment results.
  • Integrations: MonkeyLearn offers simple integrations with apps you probably already use:

MonkeyLearn's Integrations: Zapier, Google Sheets, RapidMiner, Zendesk

  • API: call your model from your own code for quick, plug-in analysis:

API code snippet
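MonkeyLearn's batch analysis runs in its web app, but the underlying pattern is easy to sketch locally: read each row of a CSV, classify its text, and write the result to a new column. The `classify` function below is a hypothetical keyword-based stand-in for a trained model, and the column names `text` and `sentiment` are assumptions. This uses only Python's standard `csv` module, not MonkeyLearn's API.

```python
import csv
import io

# Hypothetical keyword lists; replace classify() with a call to your trained model
POSITIVE_WORDS = {"love", "great", "helpful"}
NEGATIVE_WORDS = {"hate", "terrible", "slow"}

def classify(text):
    """Stand-in classifier: matches whole words, not substrings."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    if words & POSITIVE_WORDS:
        return "Positive"
    if words & NEGATIVE_WORDS:
        return "Negative"
    return "Neutral"

def batch_analyze(csv_in, csv_out, text_column="text"):
    """Copy every row from csv_in to csv_out, adding a 'sentiment' column."""
    reader = csv.DictReader(csv_in)
    writer = csv.DictWriter(csv_out, fieldnames=reader.fieldnames + ["sentiment"])
    writer.writeheader()
    for row in reader:
        row["sentiment"] = classify(row[text_column])
        writer.writerow(row)

# In-memory CSV so the example runs without files on disk
source = io.StringIO("id,text\n1,I love the new dashboard\n2,Export is terribly slow\n"
                     "3,Where is the settings page?\n")
out = io.StringIO()
batch_analyze(source, out)
print(out.getvalue())
```

For real files, open them with `newline=""` as the `csv` module documentation recommends, and pass the file objects to `batch_analyze` in place of the `StringIO` buffers.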

Put Machine Learning to Work for You

Sentiment analysis using machine learning can help any business analyze public opinion, improve customer support, and automate tasks with fast turnarounds, saving you not only time but also money. Sentiment analysis results will also give you real, actionable insights, helping you make the right decisions.

While machine learning can be complex, SaaS tools like MonkeyLearn make it simple for everyone to use. 

MonkeyLearn’s tools are also completely scalable, and can be effortlessly configured to your specific needs.

Learn more about how MonkeyLearn can help you get started with sentiment analysis.

Rachel Wolff

April 20th, 2020