Text Classification vs Text Extraction: What’s the Difference?

Text analysis is the process of automatically organizing and evaluating unstructured text (documents, customer feedback, social media, email, etc.). It uses machine learning with natural language processing (NLP) to break down text and “understand” it, in order to gather information, structure data, and reach conclusions, much as a human would.

Two of the most useful text analysis techniques are text extraction and text classification, which can help you glean data-driven insights quickly and at scale.

In this article, we’re going to describe the main differences between classifiers and extractors, when to use each analysis type, and when to combine the two.

Let’s start with extractors.

What Is Text Extraction?

Text extraction, often referred to as keyword extraction, uses machine learning to automatically scan text and extract relevant or core words and phrases from unstructured data like news articles, surveys, and customer service tickets.

Text extraction can be used to:

Extract entities

A sub-task of keyword extraction is entity extraction (or entity recognition), used to pull out important data points, like names, organizations, and email addresses to automatically populate spreadsheets or databases.

Here's an example of how an extractor might pull out various specified entities from one piece of text:

[Figure: example of entity extraction, with “SpaceX” tagged as a company name]

Sample output from a pre-trained name extractor:

Tag | Value
PERSON | Elon Musk
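As a rough illustration of entity extraction, the Python sketch below pulls out entities that appear verbatim in a text. It is not MonkeyLearn’s model. It swaps the trained extractor for two simple rules: a hypothetical lookup table (`GAZETTEER`) of known names and a regular expression for email addresses. As a result, it only finds names it already knows.

```python
import re

# Hypothetical gazetteer (lookup table) of known entities. A trained entity
# extractor generalizes to names it has never seen; this lookup does not.
GAZETTEER = {
    "Elon Musk": "PERSON",
    "SpaceX": "COMPANY",
}

# Simple pattern for email addresses (not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (tag, value) pairs whose values appear verbatim in the text."""
    entities = [(tag, name) for name, tag in GAZETTEER.items() if name in text]
    entities += [("EMAIL", email) for email in EMAIL_RE.findall(text)]
    return entities


text = "Elon Musk unveiled the SpaceX spacesuit. Contact: press@example.com"
print(extract_entities(text))
# [('PERSON', 'Elon Musk'), ('COMPANY', 'SpaceX'), ('EMAIL', 'press@example.com')]
```

The output can be written straight into spreadsheet or database columns keyed by tag.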

Extract specific information

You can also specify other types of information to extract, such as product specifications (model, memory, color, brand, size, material, etc.), as in this example:

Tag | Value
brand | iView
display | 11.6
cpu | Intel Bay Trail Z3735F
memory | 2GB
disk | 32GB
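Here is a minimal sketch of extracting product specifications with regular expressions. It assumes listings are written in a predictable format. The listing text and every pattern in `SPEC_PATTERNS` are hypothetical. A trained extractor learns these cues from examples instead of relying on hand-written rules.

```python
import re

# Hypothetical patterns for one kind of listing; each has one capture group.
SPEC_PATTERNS = {
    "brand": r"^(\w+)",                            # assume the brand is the first word
    "display": r"(\d+(?:\.\d+)?)\s*(?:inch|\")",   # e.g. 11.6 inch
    "cpu": r"(Intel [\w ]+?Z\d{4}\w?)",            # e.g. Intel ... Z3735F
    "memory": r"(\d+\s*GB)\s*RAM",
    "disk": r"(\d+\s*GB)\s*(?:storage|eMMC|SSD)",
}


def extract_specs(listing: str) -> dict[str, str]:
    """Return each field whose pattern matches, keyed by field name."""
    specs = {}
    for field, pattern in SPEC_PATTERNS.items():
        match = re.search(pattern, listing, flags=re.IGNORECASE)
        if match:
            specs[field] = match.group(1)
    return specs


listing = "iView 11.6 inch tablet, Intel Bay Trail Z3735F, 2GB RAM, 32GB storage"
print(extract_specs(listing))
```

Running it on this hypothetical listing prints `{'brand': 'iView', 'display': '11.6', 'cpu': 'Intel Bay Trail Z3735F', 'memory': '2GB', 'disk': '32GB'}`. Hand-written patterns break as soon as sellers phrase listings differently, which is the main reason to train a model instead.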

Extract keywords

A keyword extraction model can, for example, recognize that a customer tweet is about an order. It can then extract the most pertinent words and phrases about the customer experience.

Sample output from a pre-trained keyword extractor:

Tag | Value
KEYWORD | elon musk
KEYWORD | second image
KEYWORD | spacesuit
KEYWORD | body look
KEYWORD | new design
KEYWORD | photo
KEYWORD | spacex
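A very simple baseline for keyword extraction is to count how often each non-stopword appears and keep the most frequent ones. The sketch below does exactly that. The stopword list and the example tweet are assumptions. Unlike a trained keyword extractor, it only returns single words, not multi-word phrases such as “new design.”

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real tools use much larger ones.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "it",
             "was", "my", "i", "for", "with", "about", "this", "that"}


def extract_keywords(text: str, top_k: int = 5) -> list[str]:
    """Return the top_k most frequent non-stopwords (ties keep first-seen order)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


tweet = ("My order arrived late. The order was missing an item, "
         "and support never answered about the order.")
print(extract_keywords(tweet, 3))  # ['order', 'arrived', 'late']
```

“order” ranks first because it appears three times. Ties among the remaining words are broken by first appearance.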

Summarize a text or document

You can also use text extraction to find the most relevant words, phrases, and sentences in a data set, social media posts, emails, and more. It can summarize thousands of customer feedback responses or very long documents by extracting only the most frequent and most important content.

Sample output from a pre-trained summary extractor:

Tag | Value
SUMMARY | The virus that causes COVID-19 is usually transmitted through droplets generated when an infected person coughs, sneezes, or exhales.
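Extractive summarization works by scoring each sentence according to how frequent its words are across the whole text. The top-scoring sentences are then kept in their original order. This frequency heuristic is a classic textbook approach, not necessarily how MonkeyLearn’s summary extractor works. The stopword list and sample text are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "was", "when", "through", "it"}


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, num_sentences: int = 1) -> list[str]:
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        # Average frequency (across the whole text) of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return [s for s in sentences if s in top]


text = ("The virus spreads through droplets. Droplets spread when people cough. "
        "The weather was nice.")
print(summarize(text))  # ['The virus spreads through droplets.']
```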

Clean data

MonkeyLearn's email cleaner automatically removes signatures, confidentiality clauses, and previous replies within an email thread, so you end up with a “clean” version with only the most recent reply.

[Figure: email extractor tool removing signature information]
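The email cleaner’s kind of cleanup can be approximated with a few heuristics: keep lines until the first signature delimiter, reply header, quoted line, or confidentiality notice. The markers in `CUTOFF_PATTERNS` are hypothetical and would miss many real-world formats. That is why a trained model is the more robust option.

```python
import re

# Hypothetical markers at which the most recent reply is assumed to end.
CUTOFF_PATTERNS = [
    re.compile(r"^--\s*$"),                 # conventional signature delimiter "-- "
    re.compile(r"^On .+ wrote:$"),          # typical reply header
    re.compile(r"^>"),                      # quoted text from earlier messages
    re.compile(r"^(confidentiality|this e-?mail .*confidential)", re.IGNORECASE),
]


def clean_email(body: str) -> str:
    """Keep only the lines before the first cutoff marker."""
    kept = []
    for line in body.splitlines():
        if any(pattern.search(line) for pattern in CUTOFF_PATTERNS):
            break
        kept.append(line)
    return "\n".join(kept).strip()


email = (
    "Hi team,\n"
    "The invoice is attached.\n"
    "\n"
    "-- \n"
    "Jane Doe\n"
    "Sales Lead\n"
    "\n"
    "On Mon, Jan 6, 2020 at 9:00 AM Bob wrote:\n"
    "> Can you send the invoice?"
)
print(clean_email(email))
```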

More generally, text extraction is great for “cleaning” text to eliminate irrelevant information that could distort the accuracy of your results.

Our boilerplate extractor extracts only relevant text from HTML. It can be used on websites or emails to remove clutter, like templates, navigation bars, and ads.

[Figure: boilerplate extractor tool extracting text from HTML]
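Here is a rough sketch of boilerplate removal using Python’s standard `html.parser`. Text inside tags that usually hold page chrome (`script`, `style`, `nav`, `header`, `footer`, `aside`) is dropped, and everything else is kept. The tag list is an assumption, and this approach does not detect ads served inside ordinary tags. Dedicated boilerplate extractors typically also weigh features such as text density and link density.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside a SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # how many boilerplate tags we are inside
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p { color: red; }</style></head><body>"
    "<nav>Home | Pricing | Blog</nav>"
    "<p>Our new tablet ships today.</p>"
    "<footer>Copyright 2020</footer></body></html>"
)
print(extract_main_text(page))  # Our new tablet ships today.
```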

Imagine performing text extraction on all manner of customer feedback, across dozens of platforms, and in real time.

The great thing about machine learning is that you can train text extraction models for specific fields and tasks to achieve high accuracy.

What Is Text Classification?

Text classification is the process of automatically assigning predefined tags or categories to text based on its content. Just like text extraction, text classification can be performed on all manner of unstructured text, like support tickets, emails, customer feedback, web pages, social media, and more.

Text classification can be used to categorize words, phrases, and entire texts by subject, topic, sentiment, intent, and more.

For example, this NPS feedback analyzer is pre-trained to tag survey responses by category: Customer Support, Ease of Use, Features, and Pricing.

Sample output:

Tag | Confidence
Customer Support | 61.9%
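To make “pre-trained to tag survey responses” concrete, here is a minimal multinomial Naive Bayes classifier trained on a handful of labeled responses. Naive Bayes is one classic algorithm for text classification, not necessarily the one behind the NPS analyzer. The training examples are invented for illustration, and the sketch returns only the best tag, not a confidence percentage.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_counts = Counter(labels)       # label -> number of examples
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus the log likelihood of each known word.
            score = math.log(count / total_examples)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    freq = self.word_counts[label][token]
                    score += math.log((freq + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled survey responses.
training = [
    ("The support team answered my question quickly", "Customer Support"),
    ("Support agents were friendly and helpful", "Customer Support"),
    ("The app is simple and intuitive to use", "Ease of Use"),
    ("Easy to set up, no learning curve", "Ease of Use"),
    ("I love the reporting and export features", "Features"),
    ("Please add more integrations and features", "Features"),
    ("The price is too high for small teams", "Pricing"),
    ("Plans are expensive compared to competitors", "Pricing"),
]
texts, labels = zip(*training)
model = NaiveBayesClassifier().fit(texts, labels)
print(model.predict("Too expensive for what it does"))  # Pricing
```

With only eight examples this model is fragile. The point the article makes about training applies directly: more (and more varied) labeled responses generally give better predictions.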

Using advanced machine learning algorithms and natural language processing (NLP), text classification tools can even sort text by sentiment (positive, negative, neutral, and beyond) to understand the opinion and emotion of the writer.

Sample output from a pre-trained sentiment analyzer:

Tag | Confidence
Negative | 99.9%
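Sentiment analyzers are usually trained on labeled data. The basic idea of mapping text to Positive, Negative, or Neutral can still be sketched with a small word lexicon. The word lists here are assumptions. This approach misses negation (“not easy”) and sarcasm, which trained models can learn to handle, at least in part.

```python
import re

# Tiny illustrative lexicon (an assumption); trained analyzers learn from labeled data.
POSITIVE = {"fast", "faster", "easy", "love", "great", "helpful", "intuitive"}
NEGATIVE = {"expensive", "slow", "hate", "broken", "late", "rude", "terrible"}


def sentiment(text: str) -> str:
    """Label text by counting positive minus negative lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(sentiment("The app is slow and the support was rude"))  # Negative
```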

For even more accuracy, learn how to train a custom sentiment analysis model specific to your needs and criteria. The more (and better) training data you give your model, the more accurate it will generally become.

Classification models can analyze thousands of texts in just minutes, and once your data is categorized and properly structured, you can perform even more comprehensive analyses.

Text Extraction vs Text Classification

The primary difference between text classification and text extraction relates to where the analysis result comes from.

Text extraction tools pull entities, words, or phrases that already appear in the text: the model extracts text based on predetermined parameters.
Text classification tools categorize text by understanding its overall meaning; the predefined categories don’t need to appear explicitly in the text.

The comment below about a new software purchase shows how extraction and classification work differently:

“While I think the new price is too expensive, it is considerably faster and the new interface is easy to use.”

A text extractor can pull out actual keywords and phrases, like “too expensive,” “considerably faster,” and “easy to use.”
A text classifier, on the other hand, would sort this feedback into predefined categories, like Price, Performance, and Usability, or perform sentiment analysis to classify the first half of this statement as Negative and the second half, Positive.
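The difference can be illustrated on the example comment with two toy functions. `extract` only returns spans copied from the text, while `classify` returns category labels that never appear in it. Both the phrase list and the cue words for each category are hypothetical stand-ins for trained models.

```python
import re

comment = ("While I think the new price is too expensive, it is considerably "
           "faster and the new interface is easy to use.")

# Hypothetical phrases an extractor has been set up to look for.
PHRASES = ["too expensive", "considerably faster", "easy to use"]

# Hypothetical predefined categories and the cue words that signal each one.
CATEGORY_CUES = {
    "Price": {"price", "expensive", "cheap"},
    "Performance": {"faster", "slow", "speed"},
    "Usability": {"interface", "easy", "intuitive"},
}


def extract(text: str) -> list[str]:
    """Extraction: every result is a span copied from the text."""
    return [phrase for phrase in PHRASES if phrase in text.lower()]


def classify(text: str) -> list[str]:
    """Classification: results are category names, which need not occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [label for label, cues in CATEGORY_CUES.items() if tokens & cues]


print(extract(comment))   # ['too expensive', 'considerably faster', 'easy to use']
print(classify(comment))  # ['Price', 'Performance', 'Usability']
```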

So, in general, extractors pull out pieces of information that appear in the text itself, while classifiers sort text into predefined categories.

Text Extractors or Classifiers: Which to Use and When?

It all depends on your business needs. But, as a general rule of thumb, text analysis is most powerful when you use extraction and classification together. Let’s jump in and see how we’d use both techniques for a few different business use cases.

Customer Feedback Analysis

Learn what customers love or hate about your brand, detect trending topics, and align your product or service with your customers’ needs.

Using text analysis tools, you can gather unstructured customer feedback from open-ended surveys, social media posts, blogs, emails, and more. Text analysis can offer insights you may never have thought possible. You can search the web for unsolicited feedback about your company or products, or wade through thousands of pages of emails or customer surveys in just minutes.

Extraction Techniques

Extract the names and locations of your customers.
Pull the most used and most relevant words and phrases from sources such as surveys, customer service tickets, and social media posts. You’ll know what’s trending, which problems keep recurring, and what your customers like most about your business.

Classification Techniques

Classify feedback into categories, so you get a view of different areas within your business or individual specs or attributes of a product or service.
Perform sentiment analysis on individual comments, or across thousands of them, for data-driven insights into your customers’ opinions and emotions.

Extraction and Classification Combined

Use extraction and classification in concert for even more fine-grained results. Let’s go back to the customer comment for an example: “While I think the new price is too expensive, it is considerably faster and the new interface is easy to use.”

We could first use the MonkeyLearn opinion unit extractor to break this sentence into three distinct statements: “new price is too expensive,” “it is considerably faster,” and “new interface is easy to use.” Then, we could perform sentiment analysis on each opinion unit. Product reviews and social media comments often contain several different statements, so it’s necessary to break them up into individual opinions to get accurate results.
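Here is a rough sketch of this two-step pipeline. It assumes a crude opinion-unit splitter that breaks at commas and at the words “while,” “but,” and “and,” plus a tiny sentiment lexicon. Neither is MonkeyLearn’s opinion unit extractor or sentiment model. A trained extractor would keep phrases such as “fast and reliable” together, whereas this splitter would wrongly cut them in two.

```python
import re

# Tiny hypothetical sentiment lexicon.
POSITIVE = {"faster", "easy", "love", "great", "helpful"}
NEGATIVE = {"expensive", "slow", "hate", "broken", "rude"}


def split_opinion_units(text: str) -> list[str]:
    """Crude splitter (an assumption): break at commas and at 'while', 'but', 'and'."""
    parts = re.split(r",|\b(?:while|but|and)\b", text, flags=re.IGNORECASE)
    return [part.strip(" .") for part in parts if part.strip(" .")]


def sentiment(text: str) -> str:
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"


comment = ("While I think the new price is too expensive, it is considerably "
           "faster and the new interface is easy to use.")
for unit in split_opinion_units(comment):
    print(f"{sentiment(unit):<8} | {unit}")
```

Split this way, the price complaint comes out Negative and the other two units Positive. Scored as a whole, the same comment would come out simply Positive under this lexicon, hiding the complaint about price.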

Another example would be performing keyword extraction on Facebook comments about a new product to detect which topics customers mention most often, then using those topics as the predefined tags in a topic classifier.
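The first step of this workflow, turning frequent keywords into candidate tags, can be sketched as follows. Counting how many comments mention each word is a simplified stand-in for a keyword extractor. Tagging by keyword match stands in for training a real topic classifier on those tags. The example comments and stopword list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list for this sketch.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "i", "but", "too", "could", "be"}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def top_keywords(comments: list[str], n: int) -> list[str]:
    """Rank words by how many comments mention them."""
    counts = Counter()
    for comment in comments:
        counts.update(words(comment))
    return [word for word, _ in counts.most_common(n)]


def tag(comment: str, tags: list[str]) -> list[str]:
    """Assign every candidate tag whose keyword appears in the comment."""
    present = words(comment)
    return [t for t in tags if t in present]


comments = [
    "The battery dies too fast",
    "Love the camera, but the battery is weak",
    "Camera quality is amazing",
    "Battery life could be better",
]
tags = top_keywords(comments, 2)
print(tags)  # ['battery', 'camera']
for comment in comments:
    print(tag(comment, tags), comment)
```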

You could even combine a topic classifier with a sentiment analyzer (known as aspect-based sentiment analysis) for an even deeper analysis of your Facebook comments.
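Here is a minimal sketch of aspect-based sentiment analysis over a batch of comments. Each comment gets a topic from hypothetical cue words and a polarity from a hypothetical lexicon, and the (topic, sentiment) pairs are counted. For simplicity, each comment is treated as a single opinion unit. In practice, you would split comments into opinion units first.

```python
import re
from collections import Counter

# Hypothetical topics (aspects), cue words, and sentiment lexicon.
TOPIC_CUES = {
    "Battery": {"battery", "charge", "charging"},
    "Camera": {"camera", "photo", "photos"},
}
POSITIVE = {"love", "great", "sharp", "lasts"}
NEGATIVE = {"terrible", "blurry", "drains", "dies"}


def aspect_summary(comments: list[str]) -> Counter:
    """Count (topic, sentiment) pairs, treating each comment as one opinion unit."""
    pairs = Counter()
    for comment in comments:
        words = set(re.findall(r"[a-z]+", comment.lower()))
        topic = next((t for t, cues in TOPIC_CUES.items() if words & cues), None)
        if topic is None:
            continue  # no known aspect mentioned
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        sentiment = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
        pairs[(topic, sentiment)] += 1
    return pairs


comments = [
    "The battery drains in hours, terrible",
    "Love the camera, photos are sharp",
    "Camera is blurry at night",
    "Battery lasts all day, great",
    "The battery dies by lunch",
]
for (topic, sentiment), count in sorted(aspect_summary(comments).items()):
    print(topic, sentiment, count)
```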

With deep learning SaaS tools, you can set up a number of extraction and classification techniques to work in unison, automatically, for extremely in-depth, accurate results.

Learn more about what customer feedback analysis can do for your company.

Customer Support Analysis

Automate your processes, make sure the most urgent requests are taken care of right away, and improve and expedite your customer service efforts.

Web support tickets, emails, chatbots – businesses can receive thousands (or more) of customer support queries on a daily basis. Text analysis software can help you organize and route any manner of support tickets to the proper department or individual employee.

Extraction Techniques

Extract names, addresses, and emails and automatically populate databases of customer information.
Use location extraction to find out what geographical area may be having more issues than others.
Analyze thousands of support tickets to extract keywords and uncover common and recurring complaints.

Classification Techniques

Organize support tickets by brand, product name, or category (Shipping, Returns, Service Agreement, etc.) and automatically route them to the proper department.
Use sentiment analysis to read tickets for the degree of urgency, irritation, or satisfaction.
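Routing can be sketched as a classifier whose output is looked up in a department table. Here, the classifier is simple cue-word overlap, and urgency is detected from a handful of keywords rather than a trained sentiment model. All categories, cue words, and department names are hypothetical.

```python
import re

# Hypothetical cue words per category; a production system would use a trained classifier.
CATEGORY_CUES = {
    "Shipping": {"shipping", "delivery", "tracking", "package", "shipped"},
    "Returns": {"return", "refund", "exchange"},
    "Service Agreement": {"contract", "renewal", "sla", "agreement"},
}
# Hypothetical mapping from category to the team that should handle it.
DEPARTMENTS = {
    "Shipping": "Logistics",
    "Returns": "Returns Desk",
    "Service Agreement": "Account Management",
}
URGENT_CUES = {"urgent", "asap", "immediately", "unacceptable"}


def route_ticket(text: str):
    """Return (category, department, priority) for a support ticket."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(tokens & cues) for cat, cues in CATEGORY_CUES.items()}
    category = max(scores, key=scores.get)
    if scores[category] == 0:
        category = None  # no cue words matched
    department = DEPARTMENTS.get(category, "General Support")
    priority = "high" if tokens & URGENT_CUES else "normal"
    return category, department, priority


print(route_ticket("My package never arrived and tracking shows nothing. Please fix this ASAP!"))
# ('Shipping', 'Logistics', 'high')
```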

Extraction and Classification Together

Use the email extractor to detect and remove unnecessary or redundant text, like signatures, confidentiality clauses, and previous replies. Then sort each email by topic and route them to the correct department.

Couple keyword extraction and sentiment analysis to check that employee responses keep a consistent company tone of voice, or to mirror a customer’s tone and manner of speaking. Once your extractors and classifiers are properly trained for your business, you can save hundreds of hours of manual work and respond to customers faster.

Brand Monitoring

Monitor your business 24/7 on social media, in real time, and over many years, so you can spot patterns and inconsistencies.

Machine learning tools can automatically track news reports, social media, online reviews, chats, and more about your brand (and your competitors), then organize and analyze the unstructured data to ensure you’re always making data-driven decisions.

Extraction Techniques

Search your area of expertise to extract keywords and find out what’s trending.
Compare keywords related to your brand over time, to see how they have changed.
Use location extraction to discover where the majority of your customers are located and where you may need to build your base.

Classification Techniques

Perform sentiment analysis on product or service reviews, over time, to find out if your brand is rising or falling.
Follow product rollouts and marketing campaigns to find which work best and why.
Monitor social media for negative comments and put out small fires before they go viral, or use positive comments to further improve your image.
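Tracking sentiment over time mostly comes down to grouping scored mentions by period. The sketch below averages a lexicon-based score per month. The lexicon and mentions are hypothetical. In practice, the scores would come from a trained sentiment analyzer.

```python
import re
from collections import defaultdict

# Hypothetical sentiment lexicon.
POSITIVE = {"love", "great", "fast", "reliable"}
NEGATIVE = {"hate", "broken", "slow", "awful"}


def score(text: str) -> int:
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def monthly_sentiment(mentions):
    """mentions: iterable of (ISO date 'YYYY-MM-DD', text); returns {'YYYY-MM': mean score}."""
    by_month = defaultdict(list)
    for date, text in mentions:
        by_month[date[:7]].append(score(text))
    return {month: sum(s) / len(s) for month, s in sorted(by_month.items())}


mentions = [
    ("2020-06-03", "Love the new app, great update"),
    ("2020-06-20", "App is slow today"),
    ("2020-07-02", "Checkout is broken, awful experience"),
    ("2020-07-15", "Still broken"),
]
print(monthly_sentiment(mentions))  # {'2020-06': 0.5, '2020-07': -1.5}
```

A falling monthly average like this one is the kind of signal that should prompt a closer look at the underlying mentions.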

Extraction and Classification Together

Monitor Twitter for brand mentions and use the opinion unit extractor to break full tweets into individual statements, then perform sentiment analysis on each one. This way, your results will be more thorough and accurate.

Here are some examples of how you might combine the two for brand monitoring:

Separate tweets into opinion units before performing sentiment analysis. If a tweet contains both positive and negative statements and you analyze it as a whole, the overall sentiment may simply be graded as “Neutral.”
Extract the names of organizations that are similar to your own from news reports, find out which are mentioned the most, then analyze for positive to negative polarity and how it relates to your company.
Extract reviews of your competition’s new product releases. Sort these reviews by individual features and perform sentiment analysis to find out which features users don’t like. Then you can swoop in with a better feature and get ahead of the competition.

Advantages and Disadvantages

Each text analysis technique has its advantages and disadvantages:

With text extractors you can detect new topics, themes, trends, and business competition right as they emerge – a properly trained extractor will be constantly searching for new keywords and organizations. Text extraction is a “dynamic” approach, pulling the actual names, words, and expressions used within text data, so your business will remain at the cutting edge.

On the other hand, there isn’t a predefined set of categories, so your analysis won’t be guided toward specific goals or data points you may need to uncover. Your results will be more diverse and heterogeneous, and less focused.

With text classifiers, your output will fall into a predefined set of categories that match your own criteria, so you’ll end up with the results you need.

If new topics emerge, however, you won’t be able to capture them unless you add more tags to your model and re-train it. Classification is more of a “static” approach.

In Summary...

Extraction and classification are both effective tools for analyzing unstructured text data to obtain insights about your company, your customers, and your competitors. Used together, however, they can deliver even richer results.

There are huge amounts of unstructured data about your company online, in emails and chats, in surveys, and more, that once properly structured and analyzed, can be downright revelatory.

MonkeyLearn’s SaaS text analysis tools are easy to use, scalable, and support dozens of text analysis techniques.

Request a demo to learn more about MonkeyLearn’s powerful text analysis tools and get the most out of your text data.

Rachel Wolff

August 14th, 2020
