Tutorial – Text Mining in Python

Mining text for insights about your business is easy if you have the right tools.

Open-source tools, like scikit-learn and TensorFlow, are readily available in Python. But you’ll need to install them and build your own model, which can require hours of work and a serious computer science background. They’re also not the most user-friendly option.

SaaS tools, on the other hand, are easy to call from Python, and you can start using ready-built text mining models in next to no time, with no model building needed. Plus, they’ll automatically prepare text data for you using a number of natural language processing (NLP) techniques, like word tokenization, stemming, and lemmatization.
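To give a feel for what that preparation involves, here is a minimal sketch of tokenization and stemming in plain Python. The `tokenize` and `naive_stem` helpers and the suffix list are hypothetical and deliberately simplistic; they are not how MonkeyLearn or any production stemmer works. Lemmatization is left out because it needs a dictionary of word forms.

```python
import re

# Hypothetical, deliberately tiny suffix list for illustration only;
# real stemmers (such as the Porter stemmer) apply many more rules.
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("Very helpful and friendly staff.")
stems = [naive_stem(token) for token in tokens]
print(tokens)
print(stems)
```

A hosted model handles steps like these for you, so you only need to send raw text.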

To get started with text mining in Python, follow the simple tutorial below.

Tutorial On How to Do Text Mining in Python

MonkeyLearn is a SaaS platform that offers an array of pre-built text analysis tools and SaaS APIs in Python, allowing you to get started right away with just a few lines of code.

First, sign up to MonkeyLearn for free.

Then, follow our tutorial as you classify hotel reviews by topic with a pre-built text mining model.

You’ll use MonkeyLearn’s API to connect to text mining models programmatically. The API tab has instructions on how to integrate models using your own Python code (or Ruby, PHP, Node, or Java). There’s not a lot of code involved, and you can set it up in just a few minutes.

You can send plain requests to the MonkeyLearn API and parse the JSON responses yourself. But we’ve created SDKs in a number of languages to make API integration even easier.
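For comparison, a plain request might look roughly like the sketch below, which uses only the standard library. The base URL, the endpoint path, and the `Token` authorization scheme are assumptions here, so check them against the API documentation before relying on them. `build_classify_request` and `classify` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed base URL and auth scheme; confirm both in the MonkeyLearn API docs.
API_BASE = "https://api.monkeylearn.com/v3"


def build_classify_request(api_key, model_id, texts):
    """Build (but do not send) a POST request that classifies a list of texts."""
    payload = json.dumps({"data": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/classifiers/{model_id}/classify/",
        data=payload,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify(api_key, model_id, texts):
    """Send the request and decode the JSON response into Python objects."""
    request = build_classify_request(api_key, model_id, texts)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The SDK wraps this kind of request handling, which is why the rest of this tutorial uses it.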

Once you’ve signed up to MonkeyLearn, you’ll be able to access your API key to perform text mining. First, install the Python SDK:

pip install monkeylearn

Now that you’re set up, you’re ready to run text mining with the code below:

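from monkeylearn import MonkeyLearn

# Authenticate with the API key from your MonkeyLearn account
ml = MonkeyLearn('<<Your API key here>>')

# Texts to classify, and the ID of the pre-built model to use
data = ['Very helpful and friendly staff.', 'Bed was extremely comfortable.']
model_id = 'cl_TKb7XmdG'

# Send the texts to the model; result.body holds the parsed JSON response
result = ml.classifiers.classify(model_id, data)

print(result.body)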

The output will be a Python list of dicts (one per input text) generated from the JSON sent by MonkeyLearn, and should look something like this:

[{
    'text': 'Very helpful and friendly staff.',
    'classifications': [{
        'tag_name': 'Staff',
        'confidence': 1.0,
        'tag_id': 1403281
    }],
    'error': False,
    'external_id': None
}, {
    'text': 'Bed was extremely comfortable.',
    'classifications': [{
        'tag_name': 'Comfort & Facilities',
        'tag_id': 1406435,
        'confidence': 0.911
    }],
    'error': False,
    'external_id': None
}]

The results come back in the same order as the input list, and each entry pairs a text with the model’s output for it. Now you’re ready for automatic text mining to get real insights from your data.
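As a minimal sketch of working with that structure, the hypothetical `top_tags` helper below pulls out the highest-confidence tag for each text. It assumes the response shape shown in the sample output above; `sample` is a trimmed copy of that output, not a live API response.

```python
def top_tags(results):
    """Return (text, tag_name, confidence) for the best tag of each text."""
    rows = []
    for item in results:
        # Entries flagged as errors or without classifications get no tag.
        if item.get("error") or not item.get("classifications"):
            rows.append((item["text"], None, None))
            continue
        best = max(item["classifications"], key=lambda c: c["confidence"])
        rows.append((item["text"], best["tag_name"], best["confidence"]))
    return rows


# Trimmed copy of the sample output shown earlier in this tutorial.
sample = [
    {
        "text": "Very helpful and friendly staff.",
        "classifications": [
            {"tag_name": "Staff", "confidence": 1.0, "tag_id": 1403281}
        ],
        "error": False,
        "external_id": None,
    },
    {
        "text": "Bed was extremely comfortable.",
        "classifications": [
            {"tag_name": "Comfort & Facilities", "tag_id": 1406435, "confidence": 0.911}
        ],
        "error": False,
        "external_id": None,
    },
]

for text, tag, confidence in top_tags(sample):
    print(f"{tag}: {text} ({confidence})")
```

With the SDK, you would pass `result.body` to the helper instead of `sample`.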

You can see full documentation of our API and its features in our docs.

Create and Train Your Own Text Mining Model in Python

Now, you might want to create your own text mining model and connect it with our API in Python. The great thing about creating your own model is that you can train it with your own dataset, specific to the problem you’re trying to solve, and teach it to understand industry-specific language and opinions.

For this tutorial, we’ll use hotel reviews as our sample dataset, and classify them by topic.

Follow along to see how to create your own topic classifier and connect it to your favorite tools:

1. Create a text classifier

Go to the MonkeyLearn dashboard, click ‘Create a Model’, then choose ‘Classifier’.

Choose ‘Topic Classification’, which will allow you to sort your data by topic.

2. Upload the data you want to mine for insights

Next, import your data. In this tutorial, we’re using the sample CSV file containing hotel reviews. Download it from the ‘Data Library’, then click on the ‘CSV’ icon and upload it.

3. Define tags

Now define the tags (topics) you want to use to classify your data. Type in each tag, then click the ‘+’ icon. Once you’ve added all your tags, click ‘Continue’.

4. Train your text mining model

Train your text classification model by manually tagging each piece of text. Once you’ve tagged a few examples, the model will start making its own predictions. You can re-tag inaccurate examples to improve your model’s performance.

5. Test the model

Your model’s ready! Select the ‘Run’ tab and enter new text to check for accuracy.
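If you would rather check accuracy in bulk than one text at a time, a rough sketch is to compare the model’s top tag against tags you assigned by hand. The `accuracy` helper and the tiny labeled sample below are hypothetical, and the result structure is assumed to match the API output shown earlier in this tutorial.

```python
def accuracy(results, expected_tags):
    """Share of texts whose top predicted tag matches the hand-assigned tag."""
    if len(results) != len(expected_tags):
        raise ValueError("results and expected_tags must have the same length")
    if not results:
        return 0.0
    correct = 0
    for item, expected in zip(results, expected_tags):
        classifications = item.get("classifications") or []
        if not classifications:
            continue  # no prediction counts as a miss
        predicted = max(classifications, key=lambda c: c["confidence"])["tag_name"]
        if predicted == expected:
            correct += 1
    return correct / len(results)


# Hypothetical model output and hand-assigned tags for two reviews.
predictions = [
    {"text": "Very helpful and friendly staff.",
     "classifications": [{"tag_name": "Staff", "confidence": 1.0}]},
    {"text": "Really close to Times Square.",
     "classifications": [{"tag_name": "Staff", "confidence": 0.4}]},
]
hand_labels = ["Staff", "Location"]

score = accuracy(predictions, hand_labels)
print(f"Accuracy: {score:.0%}")
```

A low score on a held-out sample is a sign that the model needs more tagged examples.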

Calling the Model API with Python

Your custom text mining model is ready to use. To connect it to your tools, the setup is similar to the steps in the tutorial we showed you earlier. Just select your customized model’s ID instead of the demo one:

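from monkeylearn import MonkeyLearn

# Authenticate with the API key from your MonkeyLearn account
ml = MonkeyLearn('<<Your API key here>>')

# Text to classify, and the ID of your custom model
data = ['Really close to Times Square.']
model_id = '<<Your model ID here>>'

# Send the text to your model; result.body holds the parsed JSON response
result = ml.classifiers.classify(model_id, data)

print(result.body)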

And the output for the code will look like this:

[{
    'text': 'Really close to Times Square.',
    'classifications': [{
        'tag_name': 'Location',
        'confidence': 0.845,
        'tag_id': 122740460
    }],
    'error': False,
    'external_id': None
}]
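Because each prediction carries a confidence score, one possible way to use a custom model in practice is to accept high-confidence tags automatically and set the rest aside for manual review. The sketch below assumes the response shape shown above; the `split_by_confidence` helper and the 0.9 threshold are hypothetical choices, not MonkeyLearn recommendations.

```python
def split_by_confidence(results, threshold=0.9):
    """Separate confident predictions from texts that need manual review."""
    confident, needs_review = [], []
    for item in results:
        classifications = item.get("classifications") or []
        best = max(classifications, key=lambda c: c["confidence"], default=None)
        if best is not None and best["confidence"] >= threshold:
            confident.append((item["text"], best["tag_name"]))
        else:
            needs_review.append(item["text"])
    return confident, needs_review


# Trimmed copies of the sample outputs shown in this tutorial.
sample_results = [
    {"text": "Very helpful and friendly staff.",
     "classifications": [{"tag_name": "Staff", "confidence": 1.0}]},
    {"text": "Bed was extremely comfortable.",
     "classifications": [{"tag_name": "Comfort & Facilities", "confidence": 0.911}]},
    {"text": "Really close to Times Square.",
     "classifications": [{"tag_name": "Location", "confidence": 0.845}]},
]

confident, needs_review = split_by_confidence(sample_results)
print(confident)
print(needs_review)
```

Re-tagging the texts that land in the review list and feeding them back into training is one way to keep improving the model.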

You can also create a classifier, upload data, and create tags directly through our API. For more information on how to do this, check out our API documentation.

Wrap Up

Now that you’ve learned how to do text mining in Python, you can use MonkeyLearn’s APIs to perform text mining tasks like topic and language classification, sentiment analysis, keyword extraction, and more.

Choose to create custom models using our simple user interface, or directly in Python, and easily connect your text mining tools to the ones you use every day. 

Don’t miss out on all that data and the actionable insights it holds. You’ll save time and get more consistent results than manual analysis typically provides.

Sign up to MonkeyLearn to start using all our text mining models.

Inés Roldós

April 27th, 2020
