What Is Named Entity Recognition?
Named entity recognition (NER) ‒ also called entity identification or entity extraction ‒ is an information extraction technique that automatically identifies named entities in a text and classifies them into predefined categories. Entities can be names of people, organizations, locations, times, quantities, monetary values, percentages, and more.
Entity extraction is really useful for analyzing unstructured text (written content that doesn’t follow a predefined format). Think about business data, for instance: emails, social media posts, customer support tickets, online surveys, product reviews, and so on. All this text-based data can give companies a lot of meaningful information. However, processing such large volumes of data requires the right tools and technology.
With named entity recognition, you can obtain key information to understand what a text is about, making it a great starting point for all kinds of text analysis and data organization.
How Does Named Entity Recognition Work?
When we read a text, we naturally recognize named entities such as people, values, locations, and so on. For example, in the sentence “Mark Zuckerberg is one of the founders of Facebook, a company from the United States” we can identify three types of entities:
- “Person”: Mark Zuckerberg
- “Company”: Facebook
- “Location”: United States
For computers, however, recognizing entity types in human language (which is often complex and ambiguous) is not that simple.
Natural Language Processing (NLP) ‒ a subfield of Artificial Intelligence ‒ aims to bridge this gap by helping machines understand human language.
NLP studies the structure and rules of language, and creates intelligent systems capable of deriving meaning from text and speech, helping you solve problems like text classification and text extraction. Named entity recognition is an essential NLP task that allows us to spot the main entities in a text.
The most popular ways of extracting entities from text include:
The lexicon approach relies on a knowledge base called an ontology, which contains all the words or terms related to a particular topic, grouped into different categories. For example, you could use a lexicon of cities, states, and countries to recognize locations in data. When provided with input data, the system looks for matches with the named entities in the lexicon. The downside of this approach, however, is that it can’t extract new words that are not in the lexicon.
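To make the lookup idea concrete, here is a minimal sketch in Python. The tiny `LEXICON` dictionary and the `lexicon_ner` function are made up for illustration; a real lexicon would be far larger and would need to handle spelling variants.

```python
import re

# Hypothetical toy lexicon: category -> known terms.
LEXICON = {
    "Location": ["United States", "New York", "Paris"],
    "Company": ["Facebook"],
}

def lexicon_ner(text, lexicon=LEXICON):
    """Return (matched text, category, start offset) for every lexicon term found in text."""
    matches = []
    for category, terms in lexicon.items():
        for term in terms:
            # \b keeps "Paris" from matching inside a longer word such as "Parisian".
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text):
                matches.append((m.group(), category, m.start()))
    # Report matches in the order they appear in the text.
    return sorted(matches, key=lambda item: item[2])

sentence = ("Mark Zuckerberg is one of the founders of Facebook, "
            "a company from the United States")
entities = lexicon_ner(sentence)
for matched_text, category, _ in entities:
    print(category, "->", matched_text)
```

Notice that “Mark Zuckerberg” is not returned: there is no list of people in this toy lexicon, which is exactly the limitation described above.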
Rule-based systems for entity extraction employ a series of grammatical rules hand-crafted by computational linguists. Rules work well to extract entities like street names, phone numbers, social security numbers, or any other type of data that follows specific patterns.
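As a rough illustration, pattern-style rules can be written as regular expressions. The three patterns below are simplified assumptions (US-style phone and social security number formats, and a very narrow street pattern), not production-ready validators, and the names `RULES` and `rule_based_ner` are invented for this sketch.

```python
import re

# Illustrative patterns only; real rule sets are much larger and more careful.
RULES = {
    "Phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # A number, one or more capitalized words, then a street-type keyword.
    "Street": re.compile(r"\b\d+\s+(?:[A-Z][a-z]+\s)+(?:Street|Avenue|Road)\b"),
}

def rule_based_ner(text):
    """Return (label, matched text) for every rule that fires on the text."""
    found = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            found.append((label, m.group()))
    return found

text = "Call 555-123-4567 or write to 12 Main Street. SSN on file: 123-45-6789."
found = rule_based_ner(text)
for label, value in found:
    print(label, "->", value)
```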
With rule-based systems, you can get results with high precision but low recall. This means that most of the predictions are true positives (e.g., the majority of the words that a model tags as “company name” are actually companies), but the model finds only a small share of all the places where a company is actually mentioned.
Defining rules and patterns takes time, and rules are hard to adapt to new domains: they only work well for the purpose they were created for, and modifying them is difficult.
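If the two metrics are new to you, a small calculation may help. Precision is the share of predicted entities that are correct, and recall is the share of actual entities that were found. The company names below are invented for the example.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for two collections of entity strings."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical data: five companies are mentioned, the rules only catch two.
actual_companies = {"Facebook", "Google", "Acme Corp", "Initech", "Globex"}
predicted_companies = {"Facebook", "Google"}

precision, recall = precision_recall(predicted_companies, actual_companies)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints: precision=1.00, recall=0.40
```

Everything the rules tagged was right (high precision), but three of the five companies were missed (low recall).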
Machine Learning-Based Systems
Machine learning-based systems learn to recognize entities in text based on previous examples they’ve seen.
To build an entity extractor, you need to feed the model a large volume of annotated training data (including positive and negative examples), so that it can learn what an entity is. For instance, if you want to build a model to extract “locations”, you need to manually tag names of cities, countries, venues, etc., and mark other text as “not locations”. As a rule, the more representative examples you tag, the more accurate your model will be.
With this approach, the model’s predictions improve over time as it is retrained on new examples.
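To give a feel for what annotated training data can look like, here is a small sketch that turns tagged spans into per-word labels. It uses the BIO scheme (B = beginning of an entity, I = inside an entity, O = outside any entity), which is a common convention but an assumption here, since tools differ in the format they expect. The function name and sample sentence are made up.

```python
def spans_to_bio(tokens, entity_spans):
    """Convert (start_token, end_token_exclusive, label) spans into BIO tags."""
    tags = ["O"] * len(tokens)  # everything is "not an entity" by default
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

tokens = "The conference moved from New York to Paris".split()
# Manual annotations: "New York" covers tokens 4-5, "Paris" is token 7.
spans = [(4, 6, "LOC"), (7, 8, "LOC")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The words tagged `O` play the role of the “not locations” examples: the model learns from both kinds of label.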
The hybrid approach combines machine learning with rule-based systems. Basically, it consists of a model that’s been trained with tagged examples of data, and whose output is then refined with a series of hand-crafted rules to improve accuracy. With this approach, you can extract entities with a high level of precision.
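One simple way to picture a hybrid system is a rule layer that runs after the model and corrects or adds entities. In the sketch below, `model_predict` is a stand-in that returns hard-coded, hypothetical predictions (including one deliberate mistake) in place of a real trained model, and the email pattern is a simplified assumption.

```python
import re

# Simplified email pattern used as the hand-crafted rule.
EMAIL_RULE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def model_predict(text):
    """Stand-in for a trained model; returns hypothetical (entity, label) pairs."""
    return [("Mark Zuckerberg", "Person"), ("mark@example.com", "Person")]

def hybrid_ner(text):
    """Start from the model's predictions, then let the rule override or add labels."""
    entities = dict(model_predict(text))  # entity text -> model label
    for m in EMAIL_RULE.finditer(text):
        entities[m.group()] = "Email"     # the rule wins on conflicts
    return entities

result = hybrid_ner("Contact Mark Zuckerberg at mark@example.com.")
for entity, label in result.items():
    print(label, "->", entity)
```

Here the rule fixes the model’s mistaken “Person” label for the email address while leaving the other prediction untouched.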
Real World Use Cases of Named Entity Recognition
Named entity recognition has many different applications, either as a standalone tool or as a necessary step for more complex NLP tasks, such as question answering, text summarization, or machine translation.
In business, entity extraction can be used to improve many routine processes. Here are some interesting use cases:
Categorizing Tickets in Customer Support
As your company starts dealing with a rising number of customer support tickets, you’ll need to implement a new customer service strategy to handle customer requests in a fast, scalable, and effective way. Automating repetitive tasks like ticket tagging can save you valuable time and improve your resolution rates, boosting customer satisfaction.
You can use entity extraction to pull relevant pieces of data from your incoming tickets, like company names, product names, or serial numbers, making it easier to route tickets to the most suitable agent or team for handling that issue.
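The routing step could look something like the sketch below. The product names, team names, and the `extract_products` stand-in (a plain case-insensitive lookup used in place of a trained extractor) are all hypothetical.

```python
# Hypothetical mapping from product names to support teams.
TEAM_BY_PRODUCT = {
    "Acme Router": "networking",
    "Acme Cloud": "cloud",
}

def extract_products(ticket_text):
    """Stand-in extractor; a trained NER model would replace this lookup."""
    lowered = ticket_text.lower()
    return [product for product in TEAM_BY_PRODUCT if product.lower() in lowered]

def route_ticket(ticket_text, default_team="general"):
    """Send the ticket to the team for the first product found, else a default queue."""
    products = extract_products(ticket_text)
    return TEAM_BY_PRODUCT[products[0]] if products else default_team

print(route_ticket("My Acme Router keeps dropping the connection"))
print(route_ticket("I forgot my password"))
```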
Online reviews are a great source of customer feedback: they can provide rich insights about what clients like and dislike about your products, and the aspects of your business that need improving.
Let’s say you want to analyze reviews about your bank. You could use NER systems to easily extract locations, like local branches, mentioned by your clients. That way, you could detect which branches clients mention most often, and investigate why they’re mentioning these particular branches. For example, are they being mentioned in a positive or negative way, and can you detect trends and patterns that coincide with a particular incident?
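Once locations have been extracted, counting mentions takes only a few lines. The sketch below assumes an earlier NER step has already produced one list of branch names per review; the branch names are invented.

```python
from collections import Counter

# Hypothetical output of an NER step: extracted locations for each review.
locations_per_review = [
    ["Downtown branch"],
    ["Airport branch", "Downtown branch"],
    [],
    ["Downtown branch"],
]

def count_mentions(locations_per_review):
    """Count how many reviews mention each branch (once per review at most)."""
    counts = Counter()
    for locations in locations_per_review:
        counts.update(set(locations))
    return counts

mention_counts = count_mentions(locations_per_review)
print(mention_counts.most_common())
# prints: [('Downtown branch', 3), ('Airport branch', 1)]
```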
When analyzing reviews related to software or tech, it may be useful to train an NER extractor to pull specific product names or models. This way, you can send relevant support tickets to the teams allocated to each product. For example, large companies often have one product manager per product or feature.
Recruiters spend many hours of their day going through resumes, looking for the right candidate. Each resume contains the same type of information, but they’re often organized and formatted differently: a classic example of unstructured data.
By using an entity extractor, recruitment teams can instantly extract the most relevant information about candidates, from personal information (like name, address, phone number, date of birth, and email) to data related to their training and experience (such as certifications, degrees, company names, skills, etc.).
How to Do Named Entity Recognition
Unless you are interested in developing a system from scratch (which would be the most complex way to go), the easiest way to get started with named entity recognition is using an API. Basically, you can choose between two types:
Open-source APIs are for developers: they are free, flexible, and entail a gentle learning curve. Here are a few options:
- Stanford Named Entity Recognizer (SNER): this Java tool developed by Stanford University is widely regarded as a standard library for entity extraction. It’s based on Conditional Random Fields (CRF) and provides pre-trained models for extracting person, organization, location, and other entities.
- spaCy: a Python framework known for being fast and very easy to use. It has an excellent statistical system that you can use to build customized NER extractors.
- Natural Language Toolkit (NLTK): this suite of libraries for Python is widely used for NLP tasks. NLTK has its own function for recognizing named entities, called ne_chunk, and also provides a wrapper to use the Stanford NER tagger in Python.
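As a quick taste of what using one of these libraries looks like, here is a minimal sketch with spaCy. It assumes spaCy and its small English model, `en_core_web_sm`, are installed; the helper names `extract_entities` and `load_pipeline` are made up for this example. The exact entities and labels you get depend on the model version, so no output is shown.

```python
def extract_entities(nlp, text):
    """Run a spaCy pipeline on text and return (entity text, entity label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

def load_pipeline(model_name="en_core_web_sm"):
    """Return a loaded spaCy pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy
        return spacy.load(model_name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed; OSError: the model is not downloaded.
        return None

sentence = ("Mark Zuckerberg is one of the founders of Facebook, "
            "a company from the United States")

nlp = load_pipeline()
if nlp is None:
    print("Install first: pip install spacy && python -m spacy download en_core_web_sm")
else:
    for entity_text, label in extract_entities(nlp, sentence):
        print(label, "->", entity_text)
```

spaCy’s English models use labels such as PERSON, ORG, and GPE (countries, cities, and states) rather than the “Person”, “Company”, and “Location” names used earlier in this article.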
SaaS tools are ready-to-use, low-code, and cost-effective solutions. Plus, they are easy to integrate with other popular platforms.
MonkeyLearn, for example, is a text analysis SaaS platform that you can use for different NLP tasks, one of which is named entity recognition. You can use the ready-built API to integrate pre-trained entity extraction models, or you can easily build your own custom named entity extractor in just a few simple steps.
Let’s take a look at each option:
MonkeyLearn’s Pre-Trained Models for Entity Extraction
If you want to get started right away, pre-trained models are your best option. At MonkeyLearn you’ll find a public model for entity extraction which can label persons, locations, and organizations. Using a pre-trained model is simple and fast: you just have to paste the text you want to analyze into MonkeyLearn’s model interface, and click on “Extract Text”.
How to Build a Custom Entity Extractor with MonkeyLearn
If you want to get the most out of entity extraction, you’ll need to build your own extractor. This way you can train your model with relevant examples, and sort text with your own predefined categories and criteria.
Creating a custom NER model with MonkeyLearn is really simple. You just need to follow these steps:
- Create a new model.
- Import your data. You can upload a CSV or Excel file, connect to an app, or use one of our sample data sets.
- Select the column with the data you’d like to use to train your model.
- Define the tags for your model. These are the categories you will use for your entity extractor. Write at least one; you can add more later.
- Start training your model. Manually tag the words by choosing a tag from the right and clicking on the word that matches that tag. You’ll need to tag several examples to train your model. After a while, the model will start making its own predictions.
- Put your model to work. Now that you’ve trained your entity extractor, you can start analyzing your data. There are several ways to do this: upload a file for batch processing, connect to the API, or try one of our available integrations.
Named entity recognition (NER) helps you easily identify the key elements in a text, like names of people, places, brands, monetary values, and more. Extracting the main entities in a text helps sort unstructured data and detect important information, which is crucial if you have to deal with large datasets.
Companies can use NER to label relevant data in customer support tickets, detect entities mentioned in customer feedback, and easily extract important information such as contact details, locations, and dates.
Using entity extraction APIs (whether it’s through open-source libraries or SaaS tools) is the most popular way to get started with named entity recognition. Deciding on the best option, however, will depend on your skills, as well as the time and resources you’d like to invest.
With MonkeyLearn’s low-code, no-code approach, you can perform entity extraction quickly and easily. Use our pre-trained entity extraction model, or build a custom NER extractor by following a few simple steps.
Ready to see how it works? Sign up now to get started right away!