Entity extraction, also called named entity extraction or named entity recognition (NER), is a text analysis technique that uses natural language processing (NLP) to identify named entities and extract them from raw text. Entity types include people, organizations, locations, email addresses, monetary values, and more.
Sign up to MonkeyLearn and try out easy-to-use extraction tools. You can analyze up to 300 queries for free, then purchase add-on packages to top up your queries. Read on to learn how to use MonkeyLearn’s extraction tools in Python and how to build your own custom entity extractors.
MonkeyLearn offers a suite of powerful SaaS text analysis tools and simple APIs that you can set up with just a few lines of code.
Pre-built extraction tools that you can call from Python include a person extractor, a company extractor, a location extractor, and more.
Entity extraction is easy with MonkeyLearn’s Python API. Learn how to set it up, then we’ll show you how to create a custom entity extractor.
Just sign up to MonkeyLearn for free and follow along.
In the API tab you can see how to integrate using your own Python code (or Ruby, PHP, Node, or Java). We’ll begin with the MonkeyLearn Python API for the pre-trained company extractor.
You can send plain requests to the API and parse the JSON responses yourself, but MonkeyLearn SDKs make integration easy.
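If you do want to go the plain-request route, it might look roughly like the sketch below, which uses only the Python standard library. The base URL, the `Authorization: Token ...` header, and the `{"data": [...]}` body are assumptions based on MonkeyLearn's v3 API conventions, so confirm them in the API docs before relying on them. `build_extract_request` and `send` are hypothetical helper names, not part of any MonkeyLearn package.

```python
import json
import urllib.request

# Assumed base URL for MonkeyLearn's v3 API; verify against the official docs.
API_BASE = "https://api.monkeylearn.com/v3"


def build_extract_request(api_key, model_id, texts):
    """Build (but do not send) a POST request for an extractor model."""
    url = f"{API_BASE}/extractors/{model_id}/extract/"
    payload = json.dumps({"data": texts}).encode("utf-8")
    headers = {
        "Authorization": f"Token {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_extract_request(
    "<<Your API key here>>", "ex_A9nCcXfn", ["SpaceX was founded in 2002."]
)
# results = send(request)  # uncomment once a real API key is in place
```

Building the request separately from sending it lets you inspect the URL, headers, and body before spending any queries.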
Sign up for an API key, then download and install the Python SDK:
pip install monkeylearn
Enter the code below to start running MonkeyLearn’s company extractor:
from monkeylearn import MonkeyLearn
ml = MonkeyLearn('<<Your API key here>>')
model_id = 'ex_A9nCcXfn'
data = ['first text', {'text': 'SpaceX is an aerospace manufacturer and space transport services company headquartered in California. It was founded in 2002 by entrepreneur and investor Elon Musk with the goal of reducing space transportation costs and enabling the colonization of Mars.', 'external_id': 'ANY_ID'}, '']
response = ml.extractors.extract(model_id, data=data)
print(response.body)
To try out other models, change the model ID: go to your MonkeyLearn dashboard, select the desired model, click ‘Run’, then ‘API’. The ID will be at the top of the page.
The output will be a Python list of dicts generated from the JSON sent by MonkeyLearn, with one entry per input text in the same order as the input, and should look something like this:
[
    {
        'text': 'first text',
        'external_id': None,
        'error': False,
        'extractions': []
    }, {
        'text': 'SpaceX is an aerospace manufacturer and space transport services company headquartered in California. It was founded in 2002 by entrepreneur and investor Elon Musk with the goal of reducing space transportation costs and enabling the colonization of Mars.',
        'external_id': 'ANY_ID',
        'error': False,
        'extractions': [{
            'tag_name': 'COMPANY',
            'extracted_text': 'SpaceX',
            'parsed_value': 'SpaceX',
            'count': 1
        }]
    }, {
        'text': '',
        'external_id': None,
        'error': True,
        'error_detail': 'Invalid text, empty strings are not allowed',
        'extractions': None
    }
]
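Because each result carries its own `error` flag, it helps to separate successful extractions from failed inputs before using them. Here is a minimal sketch that works on a list shaped like the output above; `split_results` is a hypothetical helper name, and the long SpaceX text is shortened for readability.

```python
# Sample results in the same shape as the output above (long text shortened).
results = [
    {"text": "first text", "external_id": None, "error": False, "extractions": []},
    {
        "text": "SpaceX is an aerospace manufacturer ...",
        "external_id": "ANY_ID",
        "error": False,
        "extractions": [
            {"tag_name": "COMPANY", "extracted_text": "SpaceX",
             "parsed_value": "SpaceX", "count": 1}
        ],
    },
    {
        "text": "",
        "external_id": None,
        "error": True,
        "error_detail": "Invalid text, empty strings are not allowed",
        "extractions": None,
    },
]


def split_results(results):
    """Return (entities, errors), each tagged with the index of its input text."""
    entities, errors = [], []
    for index, item in enumerate(results):
        if item["error"]:
            errors.append((index, item.get("error_detail", "unknown error")))
            continue
        # 'extractions' is None for failed inputs and may be empty otherwise.
        for extraction in item["extractions"] or []:
            entities.append((index, extraction["tag_name"], extraction["parsed_value"]))
    return entities, errors


entities, errors = split_results(results)
print(entities)  # [(1, 'COMPANY', 'SpaceX')]
print(errors)    # [(2, 'Invalid text, empty strings are not allowed')]
```

With the SDK, you would pass `response.body` to the helper instead of the hard-coded sample.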
Now that you have the simple setup down, you can try out other models or learn how to train your own. Follow along below and, in just five more steps, you’ll have a custom model that you can call in Python.
Building your own model will help you get the most out of text extraction. Follow along to train a model with our sample training data or upload your own. It’s an easy process, so if you don’t have your own dataset handy, you can always go through the tutorial for a quick intro and come back when you have it.
Quickly sign up to MonkeyLearn for free. In your dashboard, click 'Create Model' and choose ‘Extractor’.
Upload a CSV or Excel file, connect to one of the many app options, or use one of our sample data sets. This example uses ‘Laptop Features’ CSV from the MonkeyLearn data library.
If your sheet has more than one data column, select which column you’d like to use. Click ‘Continue.’
Next, define your tags. These are the labels that will define your named entities. Begin with at least one; you can always add more later.
Here is an example where “entities” can go far beyond just people’s names, addresses, etc. We will be tagging laptop stats by “Brand,” “Model,” and “Storage.”
Manually tag relevant words with the tag tab in the right column. After you’ve tagged a few, you’ll notice the model will begin making predictions. Correct any tag that is predicted incorrectly.
If multiple words or numbers need to be included in a single tag, you may have to hold ‘Option’ while you select, so that they are included together.
Once you’ve trained your model, you’ll be prompted to name it. From there you can test the model. Enter text directly or cut and paste; click ‘Extract Text’ to test.
The more you train your model, the better it will perform. This is especially true for language specific to certain industries, but the models generally learn quite fast.
Once your extractor is properly trained, it’s ready to get to work with automatic analysis. You can upload a file for batch processing, connect to the API, or try one of our available integrations.
Paste the simple code below, and you’re ready to go:
from monkeylearn import MonkeyLearn
ml = MonkeyLearn('<<Your API key here>>')
model_id = '<<Model ID>>'
data = ['first text', {'text': '<<Text Example>>', 'external_id': 'ANY_ID'}, '']
response = ml.extractors.extract(model_id, data=data)
print(response.body)
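With custom tags such as “Brand,” “Model,” and “Storage,” you will usually want the extractions grouped by tag. The sketch below assumes a custom extractor returns results in the same shape as the company extractor. The sample result is made up for illustration and is not real model output, and `group_by_tag` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up result for illustration only; a real custom model may return
# different spans, and the exact fields should be checked against the docs.
sample_result = {
    "text": "Dell XPS 13 with 512GB SSD",
    "external_id": None,
    "error": False,
    "extractions": [
        {"tag_name": "Brand", "extracted_text": "Dell", "parsed_value": "Dell"},
        {"tag_name": "Model", "extracted_text": "XPS 13", "parsed_value": "XPS 13"},
        {"tag_name": "Storage", "extracted_text": "512GB", "parsed_value": "512GB"},
    ],
}


def group_by_tag(result):
    """Map each tag name to the list of text spans extracted for it."""
    grouped = defaultdict(list)
    for extraction in result.get("extractions") or []:
        grouped[extraction["tag_name"]].append(extraction["extracted_text"])
    return dict(grouped)


record = group_by_tag(sample_result)
print(record)  # {'Brand': ['Dell'], 'Model': ['XPS 13'], 'Storage': ['512GB']}
```

Keeping a list per tag means a text that mentions two brands or two storage sizes is not silently truncated.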
Take a look at our docs for full API documentation and features.
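If you would rather run batch processing through the API than upload a file, one approach is to read a column from your CSV, drop empty cells (the API rejects empty strings, as shown earlier), and send the texts in chunks. This is only a sketch: the helper names, the column name, and the batch size are all hypothetical, and the SDK may already split large inputs for you, so check its docs first.

```python
import csv
import io


def read_column(csv_file, column):
    """Collect the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    values = ((row.get(column) or "").strip() for row in reader)
    return [value for value in values if value]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_in_batches(extract_fn, model_id, texts, batch_size=100):
    """Call extract_fn(model_id, data=batch) per batch and join the results.

    extract_fn is expected to behave like ml.extractors.extract and return
    an object whose .body is a list of per-text results.
    """
    results = []
    for batch in chunked(texts, batch_size):
        response = extract_fn(model_id, data=batch)
        results.extend(response.body)
    return results


# Small in-memory CSV standing in for a real file; the second row is empty.
demo_csv = io.StringIO("id,review\n1,Great laptop\n2,\n3,Battery died fast\n")
texts = read_column(demo_csv, "review")
print(texts)  # ['Great laptop', 'Battery died fast']
```

With the SDK set up as above, the call would be `extract_in_batches(ml.extractors.extract, model_id, texts)`.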
Entity extraction can save time on a number of tasks, and you can set up your model to extract any specific text you need. Best of all, once your model is well trained, it applies the same criteria consistently to every text it processes.
Once you get started with text analysis, you can try out even more advanced (but still easy-to-use) tools that MonkeyLearn has to offer.
Sign up to MonkeyLearn and find out what powerful machine learning SaaS tools can do to help your business.
June 17th, 2020