Text Analysis with Python

Businesses deal with information all the time: emails, chat conversations, support tickets, social media mentions, product reviews, and so on. Did you know that 7 out of 8 customers leave their opinion after a purchase?

The good news is that all this text data contains valuable insights that you can use to make decisions about your products or services. The problem, however, is that all this information is unstructured, so it’s hard to retrieve useful insights. You could try analyzing your text data manually, but it would take forever. On top of that, it’s tedious work.

Thankfully, some tools can help you transform your data into meaningful insights, and text analysis with Python is one of them. 

While using machine learning to analyze text may seem daunting, it’s really simple to get started. Not only will these tools help you save a lot of time and resources when analyzing data, but they will also allow your teams to focus on more pressing (and motivating) tasks.

In this post, you will find everything you need to know about text analysis in general, how to use text analysis tools with Python, and all the necessary steps to create your own custom model for text analysis. 

Let’s get started!

What Is Text Analysis?

Text analysis is the automated process of examining text by extracting and classifying data from your written data sources (emails, Facebook comments, survey responses, chat conversations, and more). Analyzing these texts by hand is time-consuming, tedious, and ineffective, especially if you deal with large amounts of data every day. Walmart, for example, receives 200 billion rows of transactional data in just a few weeks! Imagine if they wanted to process this data with human agents: it would be impossible.

Text analysis with machine learning helps you deal with information overload. It never gets tired or bored, it never changes its criteria, and it can analyze hundreds or even thousands of pieces of data in just a few seconds.

These tools learn in a very similar way to humans, based on data and experience. We start differentiating between objects, emotions, themes, and so on, by seeing many examples of the same thing. Ever learned how to play chess? First, someone probably explained the rules to you, not just once, but over and over again. Then you might have observed how others play, and eventually started playing chess on your own. 

In a very similar fashion, we can teach machine learning tools to accurately distinguish between texts by manually feeding them tagged samples of data. Once you’ve tagged enough examples, these models can start making predictions on their own. How many of these texts do you need to tag? Well, there is no precise answer. It all comes down to your goals, the type of analysis you are going to carry out, and the number of tags involved. 
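To make the idea of learning from tagged samples more concrete, here is a minimal sketch of a Naive Bayes text classifier written with only the Python standard library. It is a toy illustration of the general technique, not how MonkeyLearn trains its models. The sample texts, the tags, and the function names are all hypothetical and were invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(tagged_samples):
    """Count tags and per-tag word frequencies from (text, tag) pairs."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, tag in tagged_samples:
        tag_counts[tag] += 1
        for word in tokenize(text):
            word_counts[tag][word] += 1
            vocabulary.add(word)
    return tag_counts, word_counts, vocabulary


def predict(model, text):
    """Return the tag with the highest log-probability for the text."""
    tag_counts, word_counts, vocabulary = model
    total_samples = sum(tag_counts.values())
    scores = {}
    for tag in tag_counts:
        # Log prior: how common the tag is among the tagged samples.
        score = math.log(tag_counts[tag] / total_samples)
        total_words = sum(word_counts[tag].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[tag][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[tag] = score
    return max(scores, key=scores.get)


# Hypothetical hand-tagged samples, invented for illustration only.
samples = [
    ("The support team answered my ticket quickly", "Customer Support"),
    ("Nobody from support replied to my email", "Customer Support"),
    ("The monthly plan is too expensive", "Pricing"),
    ("Great value for the price we pay", "Pricing"),
]

model = train(samples)
prediction = predict(model, "How expensive is the yearly plan?")
```

With only four samples the model is obviously fragile. The point is simply that every new tagged example updates the word counts the predictions are based on, which is why tagging more data tends to help.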

It also depends on the type of model you use: text classifiers or text extractors. On the one hand, text classifiers assign a category or tag to a piece of text based on its content. They are used for a wide variety of analyses, such as sentiment analysis, topic classification, urgency detection, intent categorization, and others.

On the other hand, text extractors identify and pull out data that is within the text. They are used for extracting the most relevant keywords or expressions from a text, as well as names of people, brands or companies, prices, dates, etc.
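The difference between the two model types is easier to see side by side. The sketch below uses simple hand-written rules as stand-ins for trained models: the classifier returns one tag for the whole text, while the extractor returns pieces of the text itself. The keyword list, the tag names, and the function names are hypothetical, and a real machine learning model would learn these cues from tagged data instead of relying on fixed rules.

```python
import re

# Hypothetical keyword list; a trained classifier would learn such cues from data.
URGENCY_KEYWORDS = ("urgent", "asap", "immediately")


def classify_urgency(text):
    """Classifier: assign a single tag to the whole text."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in URGENCY_KEYWORDS):
        return "Urgent"
    return "Not Urgent"


def extract_prices(text):
    """Extractor: pull dollar amounts such as $39 or $1,299.50 out of the text."""
    return re.findall(r"\$\d+(?:,\d{3})*(?:\.\d{2})?", text)


message = "Please fix this ASAP: I was charged $49.99 instead of $39."
tag = classify_urgency(message)    # one label describing the text
prices = extract_prices(message)   # spans found inside the text
```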

How to Use Text Analysis with Python

Python is one of the most popular programming languages today, especially in the field of scientific computing, in part because it is more intuitive than languages such as Java. It’s also more concise, so it takes less time and effort to carry out certain operations. Its clear syntax and readable code make it efficient to work with and easy to learn. All these perks make Python a great option for building a machine learning model for text analysis.

You might opt for open-source libraries such as scikit-learn or NLTK. Other options include spaCy (its API is simple and productive), Keras (a machine learning library with a focus on enabling fast experimentation), TensorFlow (for analyzing text with deep learning), and PyTorch (another library used for building deep neural networks for NLP).

However, building a model for analyzing texts with machine learning is not easy. Besides having to know about machine learning, you will need to spend time and resources to build the necessary infrastructure to run the model, train it, try it, and start all over again as many times as needed. 

Alternatively, there are many SaaS tools that can make your life easier when it comes to text analysis. One of them is MonkeyLearn, a simple but powerful platform for analyzing text with machine learning, which can save you a lot of time and resources when implementing a text analysis solution:

  • You don’t need to know about machine learning. MonkeyLearn provides ready-to-use models for specific text analysis tasks such as sentiment analysis, keyword extraction, or urgency detection. And if you are looking for maximum accuracy, you can train a custom model with your own data and criteria by using a simple UI. You just have to upload a bunch of texts and tag them manually. After you’ve fed your model a few examples, it will start making predictions on its own.
  • You don’t have to worry about setting up ecosystems and libraries to start training your model. This can be time-consuming and difficult. For example, scikit-learn depends on NumPy, SciPy, and joblib, all of which need to be installed and kept compatible. With MonkeyLearn, you just use a simple but beautiful API to make requests to a model, and in return, you receive the results of the analysis.

For example, this is how you make an API request to this pre-trained model for sentiment analysis:
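The original snippet is not reproduced here, so what follows is only a sketch of what such a request could look like using Python's standard library. The endpoint path, the `Token` authorization header, and the `data` field are assumptions based on MonkeyLearn's v3 REST API and should be checked against the official API documentation. `<YOUR_API_KEY>` and `<MODEL_ID>` are placeholders, and `build_classify_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_KEY = "<YOUR_API_KEY>"  # placeholder: replace with your own key
MODEL_ID = "<MODEL_ID>"     # placeholder: ID of the sentiment analysis model


def build_classify_request(api_key, model_id, texts):
    """Build (but do not send) a POST request that asks a classifier to tag texts."""
    # Assumed endpoint layout; verify it against the API documentation.
    url = f"https://api.monkeylearn.com/v3/classifiers/{model_id}/classify/"
    payload = json.dumps({"data": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_classify_request(API_KEY, MODEL_ID, ["This is a great tool!"])

# Sending the request needs a real key and model ID, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```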

The API response for this request will look like this:
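The original response is not preserved here either. The structure below is an assumption about how a classifier response is typically shaped (one item per input text, each with a list of classifications), and the tag name and confidence value are invented placeholders, not real model output. The `summarize` helper is hypothetical and only shows how you might read such a response in Python.

```python
# Assumed response shape; the tag and confidence are invented, not real output.
example_response = [
    {
        "text": "This is a great tool!",
        "external_id": None,
        "error": False,
        "classifications": [
            {"tag_name": "Positive", "tag_id": 0, "confidence": 0.99},
        ],
    }
]


def summarize(response_body):
    """Return (text, tag_name, confidence) for the first classification of each item."""
    rows = []
    for item in response_body:
        if item.get("error"):
            continue  # skip texts the API could not process
        for classification in item.get("classifications", [])[:1]:
            rows.append(
                (item["text"], classification["tag_name"], classification["confidence"])
            )
    return rows


summary = summarize(example_response)
```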

Create Your Own Text Analysis Model

Now, pre-trained models are great to start with. They will give you an idea of how machine learning works and how you can get insights out of your texts. However, if you are looking for the best possible accuracy, then you should build your own model. As you will be the one defining the tags and training your model with relevant samples, you’ll get better results.

Let’s take a look at how to build a custom topic classifier. We’ll train a model that can automatically classify reviews of a SaaS product into categories such as Customer Support, Ease of Use, and Pricing. Don’t worry, it’s easy and you’ll be able to integrate your model’s API with Python in no time.

1. Choose model type

Access your dashboard and click ‘create model’ in the top right-hand corner of the page. Then, choose ‘classifier’:

In the following screen, choose the ‘topic classification’ model: 

2. Upload training data

Now, you’ll need to import your data. You can upload a CSV or Excel file with text examples, or integrate MonkeyLearn with apps such as Zendesk, Gmail, or Twitter. For this tutorial, we are going to upload a CSV file with a set of reviews of a SaaS product (you can download the dataset here):
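Before uploading, it can help to check that your CSV file parses cleanly and contains no empty rows. The sketch below does this with Python's built-in `csv` module. The `text` column name, the sample rows, and the `load_texts` helper are hypothetical; adjust them to match your own file.

```python
import csv
import io

# Hypothetical file contents. For a real file you would use something like:
#     with open("reviews.csv", newline="", encoding="utf-8") as file_obj: ...
csv_text = (
    "text\n"
    '"Support was quick, friendly and helpful"\n'
    "Too pricey for small teams\n"
    "\n"
)


def load_texts(file_obj, column="text"):
    """Read one column from a CSV file, dropping blank or missing values."""
    reader = csv.DictReader(file_obj)
    texts = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:
            texts.append(text)
    return texts


reviews = load_texts(io.StringIO(csv_text))
```

Quoting matters here: the first review contains a comma, so it has to be wrapped in double quotes to stay in a single column.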

3. Define your tags

In this step, you’ll need to add the tags used to classify your texts. Try not to create too many, or the training process will be much more difficult. If you forget to add a tag here, you can always add it later on. In this case, we’ll define the tags Customer Support, Ease of Use, and Pricing:

4. Train your model

Now comes the most important part of creating your model: you have to train it! This means you have to accurately tag as many samples as possible. The model will learn the patterns and criteria you use for categorizing a review as Pricing, Ease of Use, or Customer Support. The more data you tag, the smarter your model becomes:

When you have reached a certain number of samples, your model will start making accurate predictions. 

5. Try your model out

You just need to write something in the text box to see how well your model works:

If your model needs to improve its predictions, you just have to tag more samples. 

6. Call the API with Python

Finally, you can use our API with Python to integrate the model with your apps. You can do this with a few lines of code:
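As a rough sketch, this is how the call could look with MonkeyLearn's Python SDK (installed with `pip install monkeylearn`). The API key and model ID are placeholders, and the `classify_reviews` and `main` helpers are hypothetical names introduced for this example. The SDK import sits inside `main()` so that the helper can be tried out with a stand-in client even when the SDK is not installed.

```python
def classify_reviews(client, model_id, texts):
    """Send texts to a classifier and return the parsed response body."""
    response = client.classifiers.classify(model_id, texts)
    return response.body


def main():
    # Imported here so the helper above does not require the SDK to be installed.
    from monkeylearn import MonkeyLearn

    ml = MonkeyLearn("<YOUR_API_KEY>")  # placeholder: replace with your own key
    results = classify_reviews(
        ml,
        "<MODEL_ID>",  # placeholder: the ID of the classifier you trained
        ["The support team was very helpful."],
    )
    print(results)


# Call main() once the placeholders above have been replaced with real values.
```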

The API will return a response with the result:
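Once you have results for many reviews, a natural next step is to count how often each topic comes up. The sketch below assumes each item in the response carries a `classifications` list with `tag_name` and `confidence` fields. The reviews, tags, and confidence values are invented for illustration and are not real model output, and `count_top_tags` is a hypothetical helper.

```python
from collections import Counter


def count_top_tags(response_body):
    """Count the highest-confidence tag of each successfully classified text."""
    counts = Counter()
    for item in response_body:
        classifications = item.get("classifications") or []
        if item.get("error") or not classifications:
            continue  # nothing to count for this text
        top = max(classifications, key=lambda c: c["confidence"])
        counts[top["tag_name"]] += 1
    return counts


# Invented example results in the assumed response shape.
example_body = [
    {"text": "Too expensive for what it does", "error": False,
     "classifications": [{"tag_name": "Pricing", "confidence": 0.91}]},
    {"text": "Support solved my issue, but the plan costs a lot", "error": False,
     "classifications": [{"tag_name": "Customer Support", "confidence": 0.62},
                         {"tag_name": "Pricing", "confidence": 0.35}]},
    {"text": "The new pricing tiers are confusing", "error": False,
     "classifications": [{"tag_name": "Pricing", "confidence": 0.88}]},
]

tag_counts = count_top_tags(example_body)
```

Counting only the top tag keeps the tally simple. If your reviews often cover several topics at once, you may prefer to count every tag above a confidence threshold instead.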


Analyzing your texts manually is a drag. Not only is it time-consuming, but it’s also ineffective and tedious. Human agents can only cope with a certain amount of information, no matter how hard-working they are. That is why text analysis with AI is essential for businesses – it allows teams to focus on more relevant and motivating tasks, and helps extract valuable insights.

Using text analysis with Python will save you a lot of time and resources, especially if you use SaaS tools such as MonkeyLearn instead of building a solution from scratch. Forget about setting up the necessary infrastructure, spending hours coding, and investing in expensive resources to run your own solution.

MonkeyLearn offers an array of pre-trained models you can start using right away. Once you’ve familiarized yourself with our platform, you can create your own text analysis models and integrate them with your apps using our simple API.

Get started with text analysis today! Just create an account on MonkeyLearn for free, or request a demo if you’d like to know more. Our team is ready to answer any questions you have!

Federico Pascual

COO & Co-Founder @MonkeyLearn. Machine Learning. @500startups B14. @Galvanize SoMa. TEDxDurazno Speaker. Wannabe musician and traveler.