A few weeks ago we released the MonkeyLearn extension for RapidMiner, and since then it has become one of our sales team's favorite tools for running demos and building proofs of concept for our leads. Beyond that, users and customers are using this integration to do some really interesting data analysis, saving hours of manual data processing.
In short, RapidMiner is a platform for data science teams. It unifies all the preparation, development and deployment of machine learning models.
Although its capabilities far exceed what I have used it for, it is so simple to use that even someone like me, with no coding skills whatsoever, can quickly create automated processes and analyses.
In this post, I’ll show you how to use RapidMiner and MonkeyLearn to analyze reviews, including:
To install the MonkeyLearn extension for RapidMiner follow these steps:
Operators in RapidMiner are the building blocks used to create processes. An operator has input and output ports. It defines what action is performed on the input and provides the result as output.
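If you do happen to write code, the operator idea can be pictured with a few lines of Python. This is only an analogy, not how RapidMiner works internally: the `Operator` class, its `run` method, `run_process` and the two example actions are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]  # one table row: column name -> value


@dataclass
class Operator:
    """Hypothetical stand-in for a RapidMiner operator (illustration only)."""
    name: str
    action: Callable[[List[Row]], List[Row]]  # what happens between the ports

    def run(self, rows: List[Row]) -> List[Row]:
        # Data arrives on the input port, the action is applied,
        # and the result leaves through the output port.
        return self.action(rows)


def lowercase_reviews(rows: List[Row]) -> List[Row]:
    # Example action: normalize the text of the "review" column.
    return [{**row, "review": row["review"].lower()} for row in rows]


def add_length(rows: List[Row]) -> List[Row]:
    # Example action: add a new column holding the review length.
    return [{**row, "length": str(len(row["review"]))} for row in rows]


def run_process(operators: List[Operator], rows: List[Row]) -> List[Row]:
    # "Connecting" operators means feeding each output into the next input.
    for operator in operators:
        rows = operator.run(rows)
    return rows


process = [Operator("Lowercase", lowercase_reviews), Operator("Length", add_length)]
result = run_process(process, [{"review": "Great Location"}])
print(result)
```

Each operator only knows about its own input and output, which is why they can be rearranged freely on the canvas.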
MonkeyLearn has two different types of Operators:
When using an operator for the first time you will need to input your API key from MonkeyLearn to connect your account:
The MonkeyLearn Classify Operator allows you to consume Classification models from the MonkeyLearn API. Classification models are used to classify information, that is, to automatically assign a category to a text. MonkeyLearn has different pre-trained classifiers for specific tasks (like sentiment analysis, analyzing NPS surveys, classifying startups according to their descriptions, etc.). You can also build and train your own custom classifier for your specific needs.
To use a Classifier Operator you need:
The MonkeyLearn Extract Operator allows you to consume Extraction models from the MonkeyLearn API. Extraction models are used to extract data from text, that is, the result you are looking for exists within the text. MonkeyLearn has different extraction models to extract different types of data: keywords, entities, insights and much more.
To use an Extractor Operator you need:
To include an operator in the Process:
Connecting data and operators:
To connect two operators, or a source of data and an operator, click and drag your mouse from the output port of the first one to the input port of the second.
To analyze reviews we’ll follow a very simple process:
To do this we’ll need to:
In this case, I’m going to be using some reviews I have stored in a CSV with just one simple column that contains thousands of reviews. To add data stored on your computer just:
1.1. Click the Add Data Button.
1.2. Select a file containing the information you want to process.
2.1. Connect Source of reviews output with operator input.
2.2. Select API Token.
2.3. Select Opinion Unit Extractor:
2.3.1. This model will grab each individual review and split it into different opinion units.
2.3.2. This is useful to us because we want to understand the sentiment and topic behind each sentence and not just the overall sentiment of the review.
2.4. Select Input Attribute Name (text to be extracted).
2.5. Check Split Rows. This will put every new Opinion Unit in a new row; if left unchecked, it will output all Opinion Units from a review in the same cell:
3.1. Connect Output from Extract operator to the Classify Operator Input.
3.2. Select API Token.
3.3. Add Model ID: cl_TKb7XmdG. This machine learning model was trained with Hotel Reviews to identify aspects and topics mentioned in new unseen hotel reviews. It will classify each Opinion Unit into the following tags:
3.4. Select Input Attribute Name: MonkeyLearn Extraction:
4.1. Connect Output from Classify Operator to Rename Operator Input.
4.2. Select Classification Path as the old name.
4.3. Type "Aspect" in the new name field:
5.1. Connect Output from Rename Operator to the Classify Operator Input.
5.2. Select API Token.
5.3. Add Model ID: cl_rZ2P7hbs. This model will classify each opinion unit as Good or Bad.
5.4. Select Input Attribute Name: Aspect.
5.5. Connect the output port to the results port:
It should look like this:
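For readers who prefer code, steps 2 to 4 can be mimicked with a rough Python sketch. Everything here is a toy: `toy_opinion_units` and `toy_aspect` are hypothetical rule-based stand-ins for the MonkeyLearn models, no API is called, and the `" | "` separator used when Split Rows is off is an assumption. Only the column names (MonkeyLearn Extraction, Classification Path, Aspect) come from the steps above.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def toy_opinion_units(review: str) -> List[str]:
    # Toy stand-in for the Opinion Unit Extractor: split on sentence ends.
    parts = review.replace("!", ".").split(".")
    return [part.strip() for part in parts if part.strip()]


def toy_aspect(text: str) -> str:
    # Toy stand-in for the aspect classifier used in step 3.
    return "Location" if "location" in text.lower() else "Staff"


def extract(rows: List[Row], input_attribute: str, split_rows: bool = True) -> List[Row]:
    out: List[Row] = []
    for row in rows:
        units = toy_opinion_units(row[input_attribute])
        if split_rows:
            # Split Rows checked: one output row per opinion unit.
            out.extend({**row, "MonkeyLearn Extraction": unit} for unit in units)
        else:
            # Split Rows unchecked: all units share one cell (separator assumed).
            out.append({**row, "MonkeyLearn Extraction": " | ".join(units)})
    return out


def classify(rows: List[Row], input_attribute: str, model: Callable[[str], str]) -> List[Row]:
    # Writes the predicted tag into a "Classification Path" column.
    return [{**row, "Classification Path": model(row[input_attribute])} for row in rows]


def rename(rows: List[Row], old: str, new: str) -> List[Row]:
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


reviews = [{"review": "The location was perfect. The staff was rude."}]
rows = extract(reviews, "review", split_rows=True)          # step 2
rows = classify(rows, "MonkeyLearn Extraction", toy_aspect)  # step 3
rows = rename(rows, "Classification Path", "Aspect")         # step 4
for row in rows:
    print(row["MonkeyLearn Extraction"], "->", row["Aspect"])
```

Step 5 would be one more `classify` call with a sentiment model. In this sketch, renaming first keeps the second call from overwriting the Aspect values.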
RapidMiner has visualization tools built right into the Studio platform. We can use them to quickly visualize our review results and MonkeyLearn's predictions on our data. When the process ends, we’ll be taken to the Results tab, which will look something like this:
Each row represents an opinion unit (OU):
With these results you can do some really interesting things. For example, you can build simple Bar Charts that will quickly help you understand the reviews and their analysis. In this case, I started by analyzing sentiment:
From this graph, we can see that far more of the Opinions gathered were good (more than 2,200) than bad (barely 300).
Or we can look at how many Opinion Units mentioned each Aspect by switching the Group-by Column to Aspect, which will output a graph like this:
This lets us see that most opinions were about location, staff, comfort & facilities, and value for money.
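If you export the results table, the same group-by counts behind these bar charts can be reproduced with a small Python sketch. The rows below are made up purely for illustration and are not the results from this analysis; the `Sentiment` column name and the `count_by` helper are assumptions as well.

```python
from collections import Counter
from typing import Dict, List

# Made-up rows for illustration only; not the actual results from this post.
results: List[Dict[str, str]] = [
    {"Aspect": "Location", "Sentiment": "Good"},
    {"Aspect": "Location", "Sentiment": "Good"},
    {"Aspect": "Staff", "Sentiment": "Bad"},
    {"Aspect": "Value for money", "Sentiment": "Good"},
]


def count_by(rows: List[Dict[str, str]], column: str) -> Counter:
    # Equivalent of choosing a Group-by Column and counting the rows in each group.
    return Counter(row[column] for row in rows)


sentiment_counts = count_by(results, "Sentiment")
aspect_counts = count_by(results, "Aspect")

# Print a crude text bar chart, largest group first.
for label, count in aspect_counts.most_common():
    print(f"{label:<16} {'#' * count} {count}")
```

Switching the column passed to `count_by` plays the same role as changing the Group-by Column in the chart settings.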
As we can see, the possibilities of combining RapidMiner with MonkeyLearn are endless, and it's just a matter of getting started and playing with data. RapidMiner offers much more than I could cover in this guide, but the idea was to get you started on the path of analyzing reviews.
Do you have reviews about your products or services? Are you using this data to inform your decisions?
Give us a shout if you have any ideas you would like to explore.
January 25th, 2018