A category tree is the way categories (the tags that you want to assign to your texts) are organized into hierarchies.
The term “tree” is used because the graphical representation of this structure resembles an upside-down tree: the root is the top node and the bottom nodes are the “leaves”. In MonkeyLearn, we use trees to represent the categories that are used to tag text.
So the first step in building a text classifier is to design a category tree that organizes the tags that we want to assign to the data.
If you want to assign a sentiment to an opinion expressed in text, you would probably have categories like Negative, Neutral and Positive.
If you want to assign the topic of a news article, you could have categories like: Sports, Politics, Science, etc.
You can define a hierarchical tree when you want to have subcategories. For example, let’s say that we want to be more specific and assign subcategories within Politics, Science and Sports.
You can be as specific as you want and add more subcategories as needed.
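As a rough illustration of what such a hierarchy looks like as data, the sketch below models a category tree as nested Python dictionaries and lists the path from the root to every leaf. The subcategory names (Football, Elections, and so on) and the `leaf_paths` helper are invented for the example; this is not MonkeyLearn's internal representation or API.

```python
# Hypothetical in-memory model of a category tree: each key is a category
# name and each value is a dict of its subcategories (empty for a leaf).
# The subcategory names are made up for illustration.
category_tree = {
    "Root": {
        "Sports": {"Football": {}, "Tennis": {}},
        "Politics": {"Elections": {}},
        "Science": {"Physics": {}, "Biology": {}},
    }
}


def leaf_paths(tree, prefix=()):
    """Yield the path from the root to every leaf category as a tuple."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Inner node: descend into its subcategories.
            yield from leaf_paths(children, path)
        else:
            # Leaf node: this is a category a text can end up in.
            yield path


for path in leaf_paths(category_tree):
    print(" / ".join(path))
```

Adding a more specific subcategory is then just a matter of adding another nested entry under the relevant parent.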
Tips to design a good Category Tree
- Try to organize categories according to their semantic relations. For example, Cell Phones and Laptops should be children of Electronics because they are specific types of electronic devices.
- Try to declare sibling categories that are disjoint. That is, avoid defining categories that are ambiguous or that overlap: there should be no doubt about which category a text belongs in.
- Make sure you have a label for each type of text you want to classify. For any text input, there should always be a corresponding category to which the text can be assigned.
To create a category tree you basically have two options.
Create a category tree through the GUI
You can design and create your own category tree directly through the MonkeyLearn GUI. When you have just created a classification module, the category tree has only one node, called Root. This node is the basis of every category tree and cannot be deleted. To start creating your own categories, click the contextual menu on the root category:
Create a category tree by uploading a CSV/Excel file
You can also upload samples and the tree structure with a CSV or Excel file. The file must have the following format:
| text sample 1 | category sample 1 |
| text sample 2 | category sample 2 |
| text sample N | category sample N |
Each row is a sample: the first column holds the sample’s text content and the second column holds the sample’s category label. Uploading samples with a CSV/Excel file into MonkeyLearn is as easy as following a simple wizard. You can read more about it in the CSV/Excel file documentation.
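A file in this two-column layout can be produced with Python's standard `csv` module. The sketch below is only an illustration: the sample texts, the output file name, the UTF-8 encoding, and the absence of a header row are assumptions, so check the CSV/Excel file documentation for the exact requirements.

```python
import csv
import os
import tempfile

# Illustrative samples: (text, category label). The texts are made up.
samples = [
    ("Great phone, I would buy it again.", "Positive"),
    ("The package arrived on Tuesday.", "Neutral"),
    ("The battery died after two days.", "Negative"),
]


def write_samples(path, rows):
    """Write one sample per row: text in column 1, category in column 2."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # The writer quotes fields that contain commas, so the text
        # always stays in a single column.
        writer.writerows(rows)


# Assumed output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "samples.csv")
write_samples(out_path, samples)
```

Using `csv.writer` rather than joining strings by hand keeps texts that contain commas or quotes from spilling into extra columns.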
Modifying your category tree
If you want to improve your category tree, you can use the contextual menus in every node:
- Add child, which we used before, creates a new child node under the selected node.
- Rename changes the name of a node.
- Change parent moves a node to another parent. A dialog will pop up so you can select the new parent to which the category will be moved.
- Delete all samples deletes all the samples that belong to the selected node.
- Delete category removes a category from the tree. A dialog will pop up to confirm the operation and to let you choose what to do with the corresponding samples. Three options are possible:
  - Delete all the corresponding samples.
  - Move the corresponding samples to the parent category.
  - Select a new category to which the corresponding samples will be transferred.
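To make the three sample-handling options of Delete category concrete, here is a minimal in-memory sketch. It is not MonkeyLearn's implementation or API: the `CategoryTree` class, its method names, and the `samples_action` values are hypothetical. It also assumes that category names are unique across the tree and that only categories without children can be deleted, since the text above does not say what happens to subcategories.

```python
class CategoryTree:
    """Hypothetical toy model of a category tree with samples per node."""

    def __init__(self):
        self.parent = {"Root": None}   # category name -> parent name
        self.samples = {"Root": []}    # category name -> list of texts

    def add_child(self, parent, name):
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        if name in self.parent:
            raise ValueError(f"category already exists: {name}")
        self.parent[name] = parent
        self.samples[name] = []

    def delete_category(self, name, samples_action="delete", target=None):
        """Remove a category and handle its samples in one of three ways."""
        if name == "Root":
            raise ValueError("the Root node cannot be deleted")
        if name not in self.parent:
            raise KeyError(f"unknown category: {name}")
        if any(p == name for p in self.parent.values()):
            # Assumption made for this sketch only.
            raise ValueError("delete or move the subcategories first")

        # Decide where the samples go before changing anything.
        if samples_action == "delete":
            destination = None
        elif samples_action == "move_to_parent":
            destination = self.parent[name]
        elif samples_action == "move_to":
            if target not in self.parent or target == name:
                raise ValueError(f"invalid target category: {target}")
            destination = target
        else:
            raise ValueError(f"unknown samples_action: {samples_action}")

        removed = self.samples.pop(name)
        del self.parent[name]
        if destination is not None:
            self.samples[destination].extend(removed)


tree = CategoryTree()
tree.add_child("Root", "Electronics")
tree.add_child("Electronics", "Laptops")
tree.samples["Laptops"].append("Lightweight 13-inch notebook")

# Option 2: move the samples to the parent category.
tree.delete_category("Laptops", samples_action="move_to_parent")
print(tree.samples["Electronics"])
```

Validating the destination before removing anything means a failed call leaves the tree and its samples untouched.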