A Guide and Tutorial to Text Mining with Python

Mining data for insights into your brand’s status is easy if you have the right tools. Using Python, you can program machines to analyze text from surveys, social media mentions, product reviews, and more.

First, you’ll need to find the text mining tool that’s right for you. Open-source libraries, like scikit-learn and TensorFlow, are readily available in Python. But you’ll need to build and train your own model, which can require hours of work and a serious computer science background. Open-source tools are also less user-friendly, and you’ll need to install and maintain them yourself.

SaaS tools with Python APIs, on the other hand, are easy to use, and you can start using ready-built machine learning models in next to no time, with no model building needed. Plus, they’ll automatically prepare text data for you using a number of natural language processing (NLP) techniques, like word tokenization, stemming, and lemmatization. Just sign up and away you go.
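To give you a feel for what those preparation steps do, here’s a toy sketch in plain Python. It is not how MonkeyLearn (or any production NLP library) implements them: the suffix list and the tiny lemma dictionary are assumptions made up for illustration.

```python
import re

# Toy lemma dictionary; real lemmatizers use a full vocabulary and part-of-speech tags.
LEMMAS = {"was": "be", "were": "be", "better": "good", "rooms": "room"}

# Suffixes stripped by the naive stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Map a token to its dictionary form when the toy dictionary knows it."""
    return LEMMAS.get(token, token)


tokens = tokenize("The rooms were cleaned daily.")
print(tokens)
print([stem(t) for t in tokens])
print([lemmatize(t) for t in tokens])
```

Notice the difference: stemming chops “cleaned” down to “clean” by rule, while lemmatization looks words up, so it can turn “were” into “be”.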

MonkeyLearn is a SaaS platform that offers an array of pre-built text mining tools and APIs that you can call from Python with minimal coding, in just a few minutes.

Sign up to MonkeyLearn to use these tools. Then, follow our tutorial as you perform sentiment analysis with our pre-built model. Try it out for free.

We’ll also show you how to call your model API with Python.

Tutorial On How to Do Text Mining With Python

Text mining with MonkeyLearn’s Python API is easy. There’s not a lot of code involved, and you can set it up in just a few minutes.

We’ll use the MonkeyLearn API to access text mining models automatically. The API tab has instructions on how to integrate using your own Python code (or Ruby, PHP, Node, or Java):

You can send plain requests to the MonkeyLearn API and parse the JSON responses yourself. But we’ve created SDKs in a number of languages to make API integration even easier.
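If you’d rather go the plain-request route, here’s a rough sketch using only Python’s standard library. The base URL, the endpoint path, and the `Token` authorization header are assumptions about the v3 REST API, and `YOUR_API_KEY` and `YOUR_MODEL_ID` are placeholders, so check the API docs before relying on any of them. The request is built but not sent.

```python
import json
import urllib.request

# Assumed base URL of the v3 REST API; confirm it in the API docs.
API_BASE = "https://api.monkeylearn.com/v3"


def build_classify_request(api_key, model_id, texts):
    """Build (but do not send) a POST request that classifies a list of texts."""
    payload = json.dumps({"data": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/classifiers/{model_id}/classify/",
        data=payload,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_response(raw_bytes):
    """Decode the JSON body of a response into Python lists and dicts."""
    return json.loads(raw_bytes.decode("utf-8"))


request = build_classify_request("YOUR_API_KEY", "YOUR_MODEL_ID", ["Great stay!"])
print(request.get_method(), request.full_url)

# Sending the request needs a valid key and network access:
# with urllib.request.urlopen(request) as response:
#     results = parse_response(response.read())
```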

To get started with our API, you’ll need the API key. Sign up here to get yours. Then install the Python SDK, which is published on PyPI as monkeylearn (pip install monkeylearn).

Now that you’re set up, you’re ready to run text mining with the code below:
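Here’s a minimal sketch of that call, assuming the SDK’s `MonkeyLearn` client and its `classifiers.classify` method. The `classify_texts` helper is a name invented for this example, and the API key and model ID are placeholders you’ll need to swap for your own.

```python
def classify_texts(api_key, model_id, texts):
    """Send texts to a MonkeyLearn classifier and return the parsed results."""
    # Imported inside the function so that defining it does not require the SDK.
    from monkeylearn import MonkeyLearn  # installed with: pip install monkeylearn

    ml = MonkeyLearn(api_key)
    response = ml.classifiers.classify(model_id, texts)
    # response.body holds the JSON payload converted to Python lists and dicts.
    return response.body


# Example call (needs a real API key and the ID of a sentiment analysis model):
# results = classify_texts("YOUR_API_KEY", "YOUR_MODEL_ID", ["This is a great tool!"])
# print(results)
```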

The output will be a Python dict generated from the JSON sent by MonkeyLearn and should look something like this:

This returns the input text list in the same order, with each text and the output of the model. Now you’re ready for automatic text mining to get real insights from your data. 
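To make that structure concrete, here’s an illustrative response and a small helper that pulls out the top tag for each text. The field names (`text`, `classifications`, `tag_name`, `confidence`) are assumptions about the response shape, and the texts and confidence values are invented for this sketch, not real model output.

```python
# Illustrative response for two input texts. The tags and confidence values
# are invented for this sketch, not real model output.
sample_response = [
    {
        "text": "The staff were friendly and the room was spotless.",
        "classifications": [{"tag_name": "Positive", "confidence": 0.97}],
    },
    {
        "text": "The wifi never worked and nobody helped.",
        "classifications": [{"tag_name": "Negative", "confidence": 0.91}],
    },
]


def top_tags(results):
    """Return (text, tag, confidence) for the highest-confidence tag of each text."""
    rows = []
    for item in results:
        classifications = item.get("classifications") or []
        if not classifications:
            # Keep the text in place so the output order matches the input order.
            rows.append((item["text"], None, None))
            continue
        best = max(classifications, key=lambda c: c["confidence"])
        rows.append((item["text"], best["tag_name"], best["confidence"]))
    return rows


for text, tag, confidence in top_tags(sample_response):
    print(f"{tag}\t{confidence}\t{text}")
```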

You can see full documentation of our API and its features in our docs.

Create and Train Your Own Text Mining Model With Python

The great thing about text mining is that the more your model is trained on your industry and the specific language it uses, the better it will perform.

For this tutorial we’ll use a sample CSV file from a dataset on hotel reviews, to classify them as positive or negative.
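Before uploading, it can help to peek at the file in Python. Here’s a sketch using the standard `csv` module. The column names `review` and `sentiment` and the two inline rows are made up for illustration, so adjust them to match the actual sample file (which may not include labels at all).

```python
import csv
import io

# Stand-in for the sample file; these two rows are made up for illustration.
SAMPLE_CSV = """review,sentiment
"Lovely room, great breakfast.",Positive
"Noisy at night and the shower was cold.",Negative
"""


def load_reviews(csv_file, text_column="review", label_column="sentiment"):
    """Read (text, label) pairs from an open CSV file, skipping empty texts."""
    reader = csv.DictReader(csv_file)
    pairs = []
    for row in reader:
        text = (row.get(text_column) or "").strip()
        if text:
            # The label is an empty string when the column is missing or blank.
            pairs.append((text, (row.get(label_column) or "").strip()))
    return pairs


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
print(len(reviews), "reviews loaded")
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` stand-in.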

Follow along to see how to create your own sentiment analysis model with Python:

1. Create a text classifier

Go to the MonkeyLearn dashboard, click Create a Model, then choose ‘Classifier’:

Choose ‘Topic Classification’:

2. Upload your sample data

Now we enter the data for our classifier. There are a number of ways to do this, but for this case, you’ll need to get the sample CSV file and click ‘CSV’ to upload it to your classifier:

3. Define tags

Now you need to determine the tags the model is going to use to classify your data:

4. Train your model

Next, you’ll need to manually tag some of your data by assigning the appropriate tag to each text. Remember, the more data you tag while training your model, the better it will perform.

5. Test the model

Your model’s ready! Select the ‘Run’ tab and enter new text to check for accuracy.
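If you’d like a number rather than a spot check, you can compare the model’s tags against labels you trust. Here’s a minimal sketch of that calculation. The two label lists are hypothetical, and the dashboard may report its own statistics that are computed differently.

```python
def accuracy(expected, predicted):
    """Fraction of texts whose predicted tag matches the expected tag."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)


# Hypothetical held-out labels and model predictions, for illustration only.
expected = ["Positive", "Negative", "Negative", "Positive"]
predicted = ["Positive", "Negative", "Positive", "Positive"]
print(accuracy(expected, predicted))  # 3 of 4 match -> 0.75
```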

Calling the Model API with Python

Your custom model is ready to use. To connect it to your tools, the setup is similar to the steps in the tutorial we showed you earlier. Just select your customized model’s ID instead:

And the output for the code will look like this:
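Assuming your custom classifier returns the same response shape as the pre-built model, you can summarize a batch of hotel reviews by counting the top tag for each one. The results below are invented stand-ins for the model’s output, and `min_confidence` is a hypothetical cutoff added for this sketch.

```python
from collections import Counter


def tally_tags(results, min_confidence=0.0):
    """Count the top tag of each result, ignoring tags below min_confidence."""
    counts = Counter()
    for item in results:
        classifications = item.get("classifications") or []
        if not classifications:
            continue
        best = max(classifications, key=lambda c: c["confidence"])
        if best["confidence"] >= min_confidence:
            counts[best["tag_name"]] += 1
    return counts


# Invented results standing in for the custom model's output.
results = [
    {"text": "Great location.", "classifications": [{"tag_name": "Positive", "confidence": 0.93}]},
    {"text": "Rude staff.", "classifications": [{"tag_name": "Negative", "confidence": 0.88}]},
    {"text": "It was fine.", "classifications": [{"tag_name": "Positive", "confidence": 0.52}]},
]
print(tally_tags(results, min_confidence=0.6))
```

With the 0.6 cutoff, the lukewarm third review is left out of the count instead of being tallied as positive.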

You can also create a classifier, upload data, and create tags, directly in our API. For more information on how to do this, check out our API documentation.
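As a rough idea of what uploading data from code could look like, here’s a sketch that turns tagged texts into a list of samples and hands them to the SDK. It assumes the SDK exposes `classifiers.upload_data(model_id, data)` and accepts samples shaped like `{"text": ..., "tags": [...]}`, so confirm both in the API documentation. The helper names are invented for this example.

```python
def build_training_samples(pairs):
    """Turn (text, tag) pairs into the list-of-dicts shape assumed for uploads."""
    return [{"text": text, "tags": [tag]} for text, tag in pairs if text and tag]


def upload_samples(api_key, model_id, pairs):
    """Upload tagged texts to a classifier (assumes the SDK's upload_data method)."""
    # Imported inside the function so that defining it does not require the SDK.
    from monkeylearn import MonkeyLearn  # installed with: pip install monkeylearn

    ml = MonkeyLearn(api_key)
    return ml.classifiers.upload_data(model_id, build_training_samples(pairs))


samples = build_training_samples([
    ("Lovely room, great breakfast.", "Positive"),
    ("Noisy at night.", "Negative"),
    ("", "Positive"),  # skipped: empty text
])
print(samples)
```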

Wrap Up

Now that you’ve learned how to do text mining using Python, you can use MonkeyLearn’s APIs to perform text mining tasks like topic and language classification, sentiment analysis, keyword extraction, and more. 

Choose to create custom models using our simple user interface, or directly in Python, and easily connect your text mining tools to the ones you use every day. 

Don’t miss out on all that data and the actionable insights it holds. You’ll save time and get faster, more consistent results than manual analysis can provide.

Sign up to MonkeyLearn to start using all our text mining models. 

Inés Roldós

Marketing @MonkeyLearn. Business Administration student.

