Classification

What is Text Classification?

Text classification modules are used to classify information, that is, to assign a category to a text, a process also known as tagging. A category is a label, and categories are structured in a hierarchical category tree. For example, a category tree might organize categories related to retail products. A machine learning classifier learns to assign […]

How do Classifiers Work?

One of the key details you should be familiar with when building your own classifier is how the classes (or categories) are structured and how classification and training are implemented. The category tree: as you probably already know, categories in MonkeyLearn are hierarchical, and each category may have subcategories. There’s a special category […]

Category Trees

A category tree is the way categories (the tags that you want to assign to your texts) are organized into hierarchies. The term “tree” is used for this structure because its graphical representation resembles a tree, although the root of the tree is the top node and the bottom nodes are the “leaves” of the […]
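As a rough illustration of the hierarchy described above, a category tree can be modeled as nested dictionaries, with the leaves being the categories that have no subcategories. This is only a sketch: the subcategory names and the `leaf_paths` helper are hypothetical, and this is not how MonkeyLearn stores its trees.

```python
# Hypothetical category tree: each key is a category, each value holds its
# subcategories. An empty dict marks a leaf.
category_tree = {
    "Retail": {
        "Clothing": {},      # hypothetical subcategory
        "Electronics": {},   # hypothetical subcategory
    },
    "Travel & Vacations": {},
}


def leaf_paths(tree, prefix=()):
    """Yield the path from a top-level category down to every leaf."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Not a leaf: descend into the subcategories.
            yield from leaf_paths(children, path)
        else:
            yield path


for path in leaf_paths(category_tree):
    print(" / ".join(path))
```

Walking the tree this way lists every full path a text could be tagged with, from the top of the hierarchy down to a leaf.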

Gathering Training Data

Training samples, also known as datasets, give the classifier the information it needs to learn to associate texts with their corresponding categories. This is the way we train or “teach” our classifiers. From the samples, the machine learning model automatically learns to generalize “rules” to classify new unlabeled texts. In MonkeyLearn, training […]

Tagging Data

After gathering the data (text samples), you’ll have to tag each sample with one of your defined categories in order to build a Training Set. For example, suppose you have the following categories: Entertainment & Recreation, Food & Drinks, Health & Beauty, Retail, Travel & Vacations, Miscellaneous. The data should be saved in a CSV or Excel file […]

Creating Custom Classifiers

Creating a Module: first, create a new module by clicking the Create Module button at the top of your screen. A three-step dialog starts, where you specify the characteristics of your classifier. Name, Description, Permissions and Module Type: Name: the name of the module. Description: text that should explain the functionality of your module. […]

Classifier Statistics

Part of the training process consists of running tests to evaluate the prediction performance of the classifier. Before going on, make sure you have read the How do Classifiers Work? reference page. In the Sandbox/Tree tab, when you select a category node from the Category Tree, you will be able to see the statistics for the selected […]

Sandbox & Live Classifiers

For any given classifier you create, you’ll have a Sandbox version that you can modify and train, and a Live version. This allows you to work on your classifier in the sandbox while your production site safely uses the live version without downtime. Public modules only expose their live classifier since they are read-only. […]

Classifier Parameters

Classifier settings can be set when creating a new classifier or modified from the settings tab. These settings can have a great impact on the performance of the classifier, and the correct values depend on the particular classification problem you want to solve. If you edit any of these settings, you must retrain the project […]

CSV/Excel Data Files

MonkeyLearn classifiers use Comma Separated Values (CSV) or Excel files when importing data, and CSV when exporting their data (the category tree and its samples). The following sections show more details on the format accepted by MonkeyLearn. CSV Primer: CSV files are just plain text files with a specific format to represent rows and columns: […]
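To make the "plain text with rows and columns" idea concrete, here is a minimal sketch using Python's standard `csv` module. The two-column layout (`text`, `category`) and the sample texts are assumptions made for illustration; the exact column layout MonkeyLearn expects is not shown in this excerpt.

```python
import csv
import io

# Hypothetical tagged samples: a header row plus (text, category) pairs.
rows = [
    ("text", "category"),
    ("Great pasta, friendly staff", "Food & Drinks"),
    ('They said "sold out" at the door', "Entertainment & Recreation"),
]

# Write the rows as CSV into an in-memory text buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into rows and columns.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

Each row becomes one line of text and each column is separated by a comma. Fields that themselves contain commas or double quotes are wrapped in double quotes, with inner quotes doubled, which is why a CSV library is usually safer than splitting lines on commas by hand.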