Creating a Module
First, create a new module by clicking the Create Module button at the top of your screen:
Name, Description, Permissions and Module Type
- Name: the name of the module.
- Description: a short text that explains the functionality of your module.
- Permissions: Public (every user in MonkeyLearn will be able to see and use the module) or Private (only you or people that you explicitly invite through teams can see and use the module).
- Module Type:
Type of Problem
Type of Text and Language
Creating a Category Tree
- Create the categories directly in the category tree on the left, within the Sandbox/Tree tab.
- Create the categories when uploading tagged data: when you upload data, you also define the categories and their hierarchy.
Creating categories on the GUI
Just go to the Sandbox/Tree tab and add categories to the corresponding parent category:
Creating categories on data upload
Just go to the Sandbox/Samples tab and upload new tagged data with the Upload wizard:
When you upload a tagged dataset, you can specify the column that has the text content and the column that has the category. Use the combo boxes at the top of each column to select “Use as text” or “Use as category” respectively. If you upload samples with new categories, MonkeyLearn will create the corresponding categories for you. Take a look at the CSV/Excel file specification to learn the syntax for denoting hierarchies and multilabel categories.
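As a rough illustration of what a tagged file can look like, the sketch below writes a flat two-column CSV, with the text in the first column and the category in the second. The sample texts, the column order and the absence of a header row are assumptions made for this example. The syntax for hierarchies and multilabel categories is defined in the CSV/Excel file specification and is not reproduced here.

```python
import csv
import io

# Hypothetical tagged samples: (text, category) pairs.
samples = [
    ("The Lakers won by ten points last night.", "Basketball"),
    ("The senate passed the new budget bill.", "Politics"),
]


def write_tagged_csv(rows, handle):
    """Write one sample per row: text in column 1, category in column 2."""
    writer = csv.writer(handle)
    for text, category in rows:
        writer.writerow([text, category])


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_tagged_csv(samples, buffer)
csv_text = buffer.getvalue()
```

In the Upload wizard you would then mark the first column as “Use as text” and the second as “Use as category”.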
Adding Training Samples
Now that you have your category tree, you must upload training samples that are representative of each category node. If you created the category tree by uploading a tagged CSV/Excel file, you may have already uploaded some samples.
- Create a sample by pasting text into a textbox with Create sample.
- Upload a CSV/Excel file through the GUI.
- Upload data through the API.
Training your Classifier
Now you are ready to train your machine learning model. After creating the category tree and adding samples to each category (at least one sample per leaf node), you can train the model by clicking the Train button.
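Before clicking Train, the "at least one sample per leaf node" requirement can be checked offline. The following is a minimal sketch, assuming you keep a local copy of your category tree and samples. The dictionary layout and the function name are hypothetical and are not part of MonkeyLearn.

```python
# Hypothetical local copy of a category tree: category -> child categories.
tree = {
    "Root": ["Sports", "Politics"],
    "Sports": ["Basketball", "Football"],
    "Politics": [],
    "Basketball": [],
    "Football": [],
}

# Hypothetical samples, keyed by the category they are tagged with.
samples = {
    "Basketball": ["The Lakers won by ten points."],
    "Football": ["The match ended in a goalless draw."],
}


def leaves_without_samples(tree, samples):
    """Return the leaf categories (no children) that have no samples."""
    return sorted(
        name
        for name, children in tree.items()
        if not children and not samples.get(name)
    )


missing = leaves_without_samples(tree, samples)
```

Here `missing` contains Politics, so that category would need at least one sample before training.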
You will notice that the state changes to a yellow TRAINING along with a progress bar. As our example has few categories and samples, the training is almost instant. After the training is finished, if the process was successful, the state changes to a green TRAINED. Congratulations! You have trained your first machine learning model!
The screen now shows some performance indicators in the Statistics section. Depending on the samples you uploaded, the category tree and the selected parameters, you can obtain different results. In the picture, the results show an accuracy of 82% (please refer to the Classifier statistics section to review the different performance indicators).
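For reference, accuracy in its standard definition is the fraction of samples whose predicted category matches the true one. The sketch below computes it for a handful of made-up labels. How MonkeyLearn selects the samples it evaluates on is not described here, so this only illustrates the metric itself.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up example: 4 of 5 predictions are correct.
true_labels = ["Sports", "Sports", "Politics", "Politics", "Sports"]
predicted = ["Sports", "Politics", "Politics", "Politics", "Sports"]
score = accuracy(true_labels, predicted)
```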
You may have noticed that when you select a category in the tree, you can view the statistics and the samples associated with that particular category. Samples are shown only when the category has samples associated with it. Take into account that categories with child categories use the samples of their children for training. For example, when the classification module has to decide between Sports and Politics, the samples from Basketball and Football are used as samples for the Sports category.
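The way a parent category inherits the samples of its children can be sketched as a simple recursive walk over the tree. The data layout and function name below are assumptions for illustration, not MonkeyLearn's internal representation.

```python
# Hypothetical category tree: category -> child categories.
tree = {
    "Sports": ["Basketball", "Football"],
    "Basketball": [],
    "Football": [],
}

# Hypothetical samples tagged directly with each category.
samples = {
    "Basketball": ["The point guard scored thirty points."],
    "Football": ["The striker scored twice."],
}


def samples_for(category, tree, samples):
    """Collect a category's own samples plus those of all its descendants."""
    collected = list(samples.get(category, []))
    for child in tree.get(category, []):
        collected.extend(samples_for(child, tree, samples))
    return collected


sports_samples = samples_for("Sports", tree, samples)
```

Sports has no samples of its own in this example, yet `sports_samples` holds both the Basketball and the Football sample.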
To the right of the statistics you can see a keyword cloud that shows the terms used to characterize the samples and decide which category they belong to (in machine learning these are commonly called features or attributes). Take into account that the keywords can appear slightly transformed if you use stemming in your advanced settings. Also, the length of the terms depends on your configuration of the n-gram range. The following shows the keyword cloud corresponding to the Sports category in our example classifier:
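To see why the n-gram range affects the length of the terms, here is a minimal sketch of n-gram extraction. It assumes lowercase, whitespace-separated tokens, which is a simplification. The tokenization MonkeyLearn applies is not described here, and the function name is hypothetical.

```python
def ngrams(text, min_n=1, max_n=2):
    """Return all word n-grams of text with min_n <= n <= max_n."""
    tokens = text.lower().split()
    terms = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the text.
        for start in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[start:start + n]))
    return terms


unigrams_and_bigrams = ngrams("The team won", min_n=1, max_n=2)
```

With a range of 1 to 2, the terms include single words such as "team" and word pairs such as "team won". A range of 1 to 1 would yield single words only.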
Another useful tool for analyzing how well the classifier is performing, and in particular which errors it is making, is the Confusion matrix. If you select a particular parent category, you’ll see the confusion matrix at the bottom of the screen, like this:
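If you want to reproduce the idea outside the GUI, a confusion matrix simply counts, for every pair of categories, how often a sample of the actual category was predicted as the other one. The sketch below uses made-up labels and puts actual categories on rows and predicted categories on columns. That orientation is a choice made for this example and may differ from the one shown in MonkeyLearn.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, categories):
    """Rows are actual categories, columns are predicted categories."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [
        [counts[(actual, predicted)] for predicted in categories]
        for actual in categories
    ]


categories = ["Basketball", "Football"]
true_labels = ["Basketball", "Basketball", "Football", "Football", "Football"]
predicted = ["Basketball", "Football", "Football", "Football", "Basketball"]
matrix = confusion_matrix(true_labels, predicted, categories)
```

The diagonal holds the correct predictions. Every off-diagonal cell is a specific kind of error, for example a Basketball sample classified as Football.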
Testing your Classifier
After you train your module, MonkeyLearn publishes a web API that allows you to integrate your module into your project with any programming language. Take a look at Integrating Modules for more details. With the Classify tab, you can test those endpoints through a simple graphical interface.
You can type or paste a text into the MonkeyLearn interface, click Submit and obtain the corresponding classification in the result box.
The result is returned in JSON format. Check the API documentation for more details.
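Handling such a result in code usually amounts to parsing the JSON and reading the predicted categories. The response body below is a hypothetical example written for this sketch. The field names `result`, `label` and `probability` are assumptions, so check the API documentation for the actual format.

```python
import json

# Hypothetical response body; the real field names may differ.
raw = (
    '{"result": [[{"label": "Sports", "probability": 0.91},'
    ' {"label": "Basketball", "probability": 0.78}]]}'
)


def category_path(raw_json):
    """Return (label, probability) pairs for the first classified text."""
    payload = json.loads(raw_json)
    first_text = payload["result"][0]
    return [(item["label"], item["probability"]) for item in first_text]


path = category_path(raw)
```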
You can also classify a list of samples by uploading a CSV/Excel file. Just select Classify File and follow the wizard.
Putting your Module in Production
After you have created and trained a custom classifier, you may want to integrate it with your project via the API. The correct way to do that is to first Deploy your module. This process makes a copy of your module from the Sandbox to the Live version. Note that this generates a different endpoint for calling the module through the API. The live endpoint should be used in production, and the sandbox version only for development, experimentation and testing purposes. With this feature you can keep modifying and experimenting with your classifier in the sandbox without affecting the production (live) version of your module.
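One way to keep the two endpoints apart in your own project is to select the endpoint from configuration rather than hard-coding it. The following is a minimal sketch. The URLs are placeholders to be replaced with the endpoints MonkeyLearn gives you for your module, and the `APP_ENV` variable name is a hypothetical convention.

```python
import os

# Placeholder URLs: replace them with your module's real endpoints.
ENDPOINTS = {
    "sandbox": "https://example.invalid/sandbox-endpoint",
    "live": "https://example.invalid/live-endpoint",
}


def endpoint_for(environment=None):
    """Return the endpoint for 'sandbox' or 'live'.

    When no environment is given, read the hypothetical APP_ENV variable
    and fall back to the sandbox, so experiments never reach production
    by accident.
    """
    if environment is None:
        environment = os.environ.get("APP_ENV", "sandbox")
    if environment not in ENDPOINTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return ENDPOINTS[environment]
```

Production code would call `endpoint_for("live")`, while development and test code would use the sandbox default.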