Introducing RapidMiner extension for MonkeyLearn

A few weeks ago we released the MonkeyLearn extension for RapidMiner, and since then it has become one of our sales team's favorite tools for demos and for building proofs of concept for our leads. Beyond that, users and customers are already using this integration for some really interesting data analysis, saving hours of manual data processing.

In short, RapidMiner is a platform for data science teams. It unifies all the preparation, development and deployment of machine learning models.

Although its capabilities far exceed what I have used it for, it is so simple to use that even someone like me, with no coding skills whatsoever, can quickly create automated processes and analyses.

In this post, I’ll show you how to use RapidMiner and MonkeyLearn to analyze reviews, including:

  • Creating your RapidMiner account.
  • Installing MonkeyLearn Extension.
  • Understanding and using the different types of operators within RapidMiner.
  • Creating a model within MonkeyLearn to analyze reviews.
  • Visualizing and understanding the results.

Creating your RapidMiner account

  1. To create your RapidMiner account follow this link.
  2. Once you have signed up you’ll need to download RapidMiner Studio. You can download the latest version here.

Installing MonkeyLearn Extension

To install the MonkeyLearn extension for RapidMiner follow these steps:

  1. Click on the Extensions tab and open up the "Marketplace" within the RapidMiner application.
  2. Use the search bar to search for MonkeyLearn.
  3. Click on the MonkeyLearn extension.
  4. Check the "Select for installation" box.
  5. Accept the terms of service.
  6. Install the package:

Understanding and using the different types of operators

Operators in RapidMiner are the building blocks used to create processes. An operator has input and output ports. Each operator defines the action performed on its input and provides the result as output.

MonkeyLearn has two different types of Operators:

  • Classifier Operator.
  • Extractor Operator.

When using an operator for the first time you will need to input your API key from MonkeyLearn to connect your account:

  1. In the "Operators" tab, under the extensions folder, open the folder for MonkeyLearn and select a MonkeyLearn Operator.
  2. In the "Parameters" tab, click on the MonkeyLearn logo next to API Token.
  3. Select "Add connection".
  4. Enter a name and select MonkeyLearn API Key as Connection Type.
  5. Click "Create".
  6. Add your MonkeyLearn API key (you can find it here).
  7. Then click "Save all Changes".
  8. Remember to select the corresponding API token when using a MonkeyLearn Operator:

Classifier Operator

The MonkeyLearn Classify Operator allows you to consume Classification models from the MonkeyLearn API. Classification models automatically assign a category to a text. MonkeyLearn has different pre-trained classifiers for specific tasks (like sentiment analysis, analyzing NPS surveys, or classifying startups according to their descriptions). You can also build and train your own custom classifier for your specific needs.

To use a Classifier Operator you need to:

  1. Connect the Operator to an Input (which you can do by using the mouse and dragging it).
  2. Connect the Output to the results port or to other operators.
  3. Select API Token (previously added).
  4. Select Model ID.
  5. Select Input Attribute (this would be the text sent to MonkeyLearn to classify).

Extractor Operator

The MonkeyLearn Extract Operator allows you to consume Extraction models from the MonkeyLearn API. Extraction models are used to extract data from text, that is, the result you are looking for exists within the text. MonkeyLearn has different extraction models to extract different types of data: keywords, entities, insights and much more.

To use an Extractor Operator you need to:

  1. Connect the Operator to an Input.
  2. Connect the Output to the results port or to other operators.
  3. Select API Token (previously added).
  4. Select Type of Extraction.
  5. Select Input Attribute (this would be the text sent to MonkeyLearn to make the Extraction).
  6. Optionally, check Split Rows to output each extraction on its own row instead of placing them all on the same line.
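To make the effect of Split Rows concrete, here is a minimal Python sketch. It is not how the extension is implemented: the function name `apply_extractions`, the row layout, the "; " separator, and the sample extractions are all hypothetical, chosen only to show the two output shapes. For simplicity it repeats the original text on every split row.

```python
from typing import Dict, List


def apply_extractions(rows: List[Dict[str, str]],
                      extractions: List[List[str]],
                      split_rows: bool) -> List[Dict[str, str]]:
    """Attach extractions to rows, mimicking the Split Rows option (illustrative only)."""
    output = []
    for row, found in zip(rows, extractions):
        if split_rows:
            # Split Rows checked: one output row per extraction.
            for item in found:
                output.append({**row, "MonkeyLearn Extraction": item})
        else:
            # Split Rows unchecked: all extractions stay together in one cell.
            # The "; " separator is an assumption made for this sketch.
            output.append({**row, "MonkeyLearn Extraction": "; ".join(found)})
    return output


# Invented sample data: one review and the extractions assumed for it.
rows = [{"Content": "Great location. The room was dirty."}]
extractions = [["Great location.", "The room was dirty."]]

split = apply_extractions(rows, extractions, split_rows=True)
joined = apply_extractions(rows, extractions, split_rows=False)
print(len(split), len(joined))
```

With Split Rows on, the single review becomes two rows; with it off, it stays as one row holding both extractions.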

To include an operator in the Process:

  1. Use the operator search bar to find the correct operator.
  2. Drag and Drop it on the Process tab:

Connecting data and operators:

To connect two operators, or a source of data with an operator, click and drag your mouse from the output of the first one to the input of the second.

Designing the process to analyze reviews

To analyze reviews we’ll follow a very simple process:

  1. Split reviews into different opinion units.
  2. Identify sentiment for each opinion unit.
  3. Identify Aspect for each opinion unit.
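As a rough mental model of these three steps, the sketch below chains three plain Python functions. They are toy stand-ins (sentence splitting on punctuation and keyword lookups), not MonkeyLearn models, and every function name and keyword list is made up for illustration.

```python
import re
from typing import Dict, List


def split_into_opinion_units(review: str) -> List[str]:
    """Toy stand-in for an opinion unit extractor: split after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def classify_sentiment(unit: str) -> str:
    """Toy stand-in for a sentiment classifier: look for a few negative words."""
    negative_words = {"dirty", "rude", "slow", "noisy"}
    words = set(re.findall(r"[a-z]+", unit.lower()))
    return "Bad" if words & negative_words else "Good"


def classify_aspect(unit: str) -> str:
    """Toy stand-in for an aspect classifier: the first matching keyword wins."""
    keywords = {
        "Location": ["location", "downtown"],
        "Staff": ["staff", "receptionist"],
        "Cleanliness": ["clean", "dirty"],
    }
    text = unit.lower()
    for aspect, terms in keywords.items():
        if any(term in text for term in terms):
            return aspect
    return "Unknown"


def analyze(review: str) -> List[Dict[str, str]]:
    """Run the three steps in order and return one record per opinion unit."""
    return [
        {"unit": u, "sentiment": classify_sentiment(u), "aspect": classify_aspect(u)}
        for u in split_into_opinion_units(review)
    ]


for record in analyze("The location was great. The room was dirty."):
    print(record)
```

The point of splitting first is visible even in this toy version: one review can yield a good opinion about one aspect and a bad opinion about another.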

To do this we’ll need to:

1. Select a source of reviews:

In this case, I’m going to use some reviews I have stored in a CSV file with a single column that contains thousands of reviews. To add data stored on your computer just:

1.1. Click the Add Data Button.

1.2. Select a file containing the information you want to process.

2. Add MonkeyLearn Extract Operator:

2.1. Connect Source of reviews output with operator input.

2.2. Select API Token.

2.3. Select Opinion Unit Extractor:

2.3.1. This model will grab each individual review and split it into different opinion units.

2.3.2. This is useful to us because we want to understand the sentiment and topic behind each sentence and not just the overall sentiment of the review.

2.4. Select Input Attribute Name (text to be extracted).

2.5. Check Split Rows. This puts every new Opinion Unit in a new row; if left unchecked, all Opinion Units from a review are output in the same cell:

[Figure: Adding the MonkeyLearn extract operator.]

3. Add MonkeyLearn Classify Operator:

3.1. Connect Output from Extract operator to the Classify Operator Input.

3.2. Select API Token.

3.3. Add Model ID: cl_TKb7XmdG. This machine learning model was trained with Hotel Reviews to identify aspects and topics mentioned in new unseen hotel reviews. It will classify each Opinion Unit into the following tags:

  1. Cleanliness.
  2. Comfort & Facilities.
  3. Food.
  4. Internet.
  5. Location.
  6. Staff.
  7. Value for Money.

3.4. Select Input Attribute Name: MonkeyLearn Extraction:

[Figure: Adding the MonkeyLearn classify operator.]

4. Add Rename Operator (from the Names & Roles Operators Folder):

4.1. Connect Output from Classify Operator to Rename Operator Input.

4.2. Select Classification Path as the old name.

4.3. Type "Aspect" in the new name field:

[Figure: Adding a rename operator.]

5. Add MonkeyLearn Classify Operator:

5.1. Connect Output from Rename Operator to the Classify Operator Input.

5.2. Select API Token.

5.3. Add Model ID: cl_rZ2P7hbs. This model will classify each opinion unit as Good or Bad.

5.4. Select Input Attribute Name: Aspect.

5.5. Connect the output port to the results port:

[Figure: Adding a classify operator for detecting aspect from reviews.]

6. Run the Process

It should look like this:

[Figure: Running the process with RapidMiner.]

Visualizing the results

RapidMiner has visualization tools built right into the Studio platform. We can use these to quickly visualize our review results and MonkeyLearn's predictions on our data. When the process ends, we’ll be taken to the Results tab, which will look something like this:

[Figure: Visualizing the results of our analysis.]

Each row presents an opinion unit (OU):

  • Content: Original Review (a "?" means the OU belongs to the review above).
  • MonkeyLearn Extraction: Each individual Opinion Unit.
  • Aspect: Topic mentioned in the review.
  • Path 1 - Category: Full category path.
  • Path 1 - Probability: Probability of the OU mentioning that Aspect.
  • Classification: Sentiment behind each OU.
  • Category 1: Full category path.
  • Probability 1: Probability of the OU being good or bad.
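If you export this table and want the original review on every row, the "?" placeholders in the Content column can be filled from the row above. The sketch below assumes the export is a list of dictionaries keyed by the column names listed above; the function name `fill_review_text` and the sample rows are invented for illustration.

```python
from typing import Dict, List, Optional


def fill_review_text(rows: List[Dict[str, str]],
                     column: str = "Content") -> List[Dict[str, str]]:
    """Replace "?" in `column` with the most recent non-"?" value above it."""
    filled = []
    last_seen: Optional[str] = None
    for row in rows:
        value = row[column]
        if value == "?":
            # Continuation row: reuse the review text from the row above, if any.
            if last_seen is not None:
                value = last_seen
        else:
            # A new review starts here; remember its text.
            last_seen = value
        filled.append({**row, column: value})
    return filled


# Invented sample rows shaped like the results table.
sample = [
    {"Content": "Great location. The room was dirty.", "MonkeyLearn Extraction": "Great location."},
    {"Content": "?", "MonkeyLearn Extraction": "The room was dirty."},
    {"Content": "Friendly staff.", "MonkeyLearn Extraction": "Friendly staff."},
]

for row in fill_review_text(sample):
    print(row["Content"], "|", row["MonkeyLearn Extraction"])
```

The function returns new dictionaries and leaves the input rows unchanged.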

Creating some charts:

With these results you can do some really interesting things. For example, you can build simple Bar Charts that quickly help you understand the reviews and their analysis. In this case, I started by analyzing sentiment:

  • Group-by Column: Classification Path (which corresponds to the sentiment).
  • Value Column: MonkeyLearn Extraction (which corresponds to Opinion Units Extracted).
  • Aggregation: count (So it will count the number of Opinion Units belonging to each tag).
  • Rotate labels (So it's easier to see).
  • Vertical (I prefer visualizing it that way, but you could quickly change it to horizontal).

[Figure: Creating some visualizations of our data analysis.]

From this graph, we can see that far more of the Opinions gathered were good (more than 2,200) than bad (barely 300).
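The chart settings above boil down to a group-by followed by a count. Here is a minimal sketch of that aggregation in Python, using three invented rows rather than the real results; the helper name `count_by` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_by(rows: List[Dict[str, str]], group_by: str) -> Counter:
    """Count rows per distinct value of the `group_by` column."""
    return Counter(row[group_by] for row in rows)


# Invented sample rows; the real table holds thousands of opinion units.
sample_rows = [
    {"MonkeyLearn Extraction": "Great location.", "Aspect": "Location", "Classification Path": "Good"},
    {"MonkeyLearn Extraction": "Friendly staff.", "Aspect": "Staff", "Classification Path": "Good"},
    {"MonkeyLearn Extraction": "The room was dirty.", "Aspect": "Cleanliness", "Classification Path": "Bad"},
]

sentiment_counts = count_by(sample_rows, "Classification Path")
aspect_counts = count_by(sample_rows, "Aspect")
print(sentiment_counts["Good"], sentiment_counts["Bad"])  # 2 1
```

Changing the `group_by` argument from "Classification Path" to "Aspect" mirrors switching the Group-by Column in the chart.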

Or we can look at how many Opinion Units mentioned each Aspect by switching the Group-by Column to Aspect, which will output a graph like this:

[Figure: Visualizing the aspect predictions of our reviews.]

This shows that most opinions were about location, staff, comfort & facilities, and value for money.

Wrapping up

As we can see, the possibilities of combining RapidMiner with MonkeyLearn are endless, and it's just a matter of getting started and playing with data. RapidMiner offers much more than I could cover in this guide, but the idea was to get you started on the path of analyzing reviews.

Do you have reviews about your product or services? Are you using this data to inform your decisions?

Give us a shout if you have any ideas you would like to explore.

Diego Ventura

January 25th, 2018
