Text analysis is the process of automatically organizing and evaluating unstructured text (documents, customer feedback, social media, email, etc.). It uses machine learning with natural language processing (NLP) to break down text and “understand” it, in order to gather information, structure data, and reach conclusions, much as a human would.
The most useful text analysis techniques are text extraction and text classification, which can help you quickly glean data-driven insights at scale.
In this article, we’re going to describe the main differences between classifiers and extractors, when to use each analysis type, and when to combine the two.
Let’s start with extractors.
Text extraction, often referred to as keyword extraction, uses machine learning to automatically scan text and extract relevant or core words and phrases from unstructured data like news articles, surveys, and customer service tickets.
Text extraction can be used to:
A sub-task of keyword extraction is entity extraction (or entity recognition), used to pull out important data points, like names, organizations, and email addresses to automatically populate spreadsheets or databases.
Here's an example of how an extractor might pull out various specified entities from one piece of text: “SpaceX [Company name] was founded by Elon Musk [Person] in California [Place].”
You can also specify other types of information to extract, such as product specifications (model, memory, color, brand, size, material, etc.).
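To make entity extraction a little more concrete, here is a minimal sketch in Python. It is not how MonkeyLearn's extractors work: the `KNOWN_ENTITIES` lookup table, the email pattern, and the `extract_entities` function are all hypothetical stand-ins. A trained extractor learns to recognize entities from labeled data rather than from a fixed list.

```python
import re

# Hypothetical lookup table; a trained model would learn entities from labeled data.
KNOWN_ENTITIES = {
    "SpaceX": "Company name",
    "Elon Musk": "Person",
    "California": "Place",
}

# Simplified email pattern, good enough for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_entities(text):
    """Return (entity, label) pairs found in text, in order of appearance."""
    found = []
    for name, label in KNOWN_ENTITIES.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), name, label))
    for match in EMAIL_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "Email address"))
    # Sort by position so results follow the order of the original text.
    return [(entity, label) for _, entity, label in sorted(found)]

entities = extract_entities("SpaceX was founded by Elon Musk in California.")
for entity, label in entities:
    print(f"{entity} [{label}]")
```

Each pair could then become a row in a spreadsheet or database table.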
Run a customer tweet through a keyword extraction model, for example, and it can quickly ascertain that the tweet is about a customer order, then extract the most pertinent words and phrases about the customer experience.
Another way in which you can use text extraction is to find the most relevant words and phrases from a data set, social media posts, emails, and more. It can effectively summarize thousands of customer feedback responses and huge documents by extracting only the most used and most important words and phrases.
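As a rough illustration of finding the "most used" words, the sketch below counts word frequencies across a handful of feedback responses. The stopword list, the sample responses, and the `top_keywords` function are assumptions made for this example. Real keyword extractors also weigh how important a phrase is, not just how often it appears.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {
    "the", "a", "an", "is", "it", "and", "to", "of", "was",
    "my", "i", "but", "very", "in", "for",
}

def top_keywords(responses, n=3):
    """Count non-stopword tokens across all responses and return the n most common."""
    counts = Counter()
    for response in responses:
        tokens = re.findall(r"[a-z']+", response.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Made-up feedback responses for illustration.
feedback = [
    "The delivery was late and the support was slow.",
    "Support answered fast, but delivery took a week.",
    "Great support!",
]
keywords = top_keywords(feedback, n=2)
print(keywords)
```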
Text extraction is also great for “cleaning” text to eliminate irrelevant information that could distort the accuracy of your results.
MonkeyLearn's email cleaner automatically removes signatures, confidentiality clauses, and previous replies within an email thread, so you end up with a “clean” version with only the most recent reply.
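The idea behind email cleaning can be sketched with a few simple heuristics. The reply-header pattern, the signature markers, and the `clean_email` function below are hypothetical and far cruder than a trained model. They only show the kind of text that gets stripped away.

```python
import re

# Hypothetical heuristics: these markers are assumptions, not MonkeyLearn's rules.
REPLY_HEADER = re.compile(r"^On .+ wrote:$")
SIGNATURE_MARKERS = ("--", "Best regards", "Kind regards", "Sent from my")

def clean_email(body):
    """Keep only the most recent reply: stop at quoted text, a reply header, or a signature."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        # Quoted lines and "On ... wrote:" headers mark the start of previous replies.
        if stripped.startswith(">") or REPLY_HEADER.match(stripped):
            break
        # A signature marker means the message body has ended.
        if any(stripped.startswith(marker) for marker in SIGNATURE_MARKERS):
            break
        kept.append(line)
    return "\n".join(kept).strip()

email = """Hi team,
The invoice total looks wrong. Can you check it?

Best regards,
Dana

On Mon, Aug 3, 2020, Support wrote:
> Thanks for reaching out.
"""
cleaned = clean_email(email)
print(cleaned)
```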
Our boilerplate extractor extracts only relevant text from HTML. It can be used on websites or emails to remove clutter, like templates, navigation bars, and ads.
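For a sense of what boilerplate removal involves, here is a minimal sketch using Python's standard `html.parser` module. The set of tags treated as clutter, the `MainTextParser` class, and the `extract_main_text` function are assumptions for this example. A production extractor has to cope with much messier pages.

```python
from html.parser import HTMLParser

# Tags treated as clutter here are an assumption for this sketch.
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class MainTextParser(HTMLParser):
    """Collect text that is not nested inside any of the skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)

def extract_main_text(html):
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = (
    "<html><nav>Home | Pricing</nav>"
    "<p>Our new release is faster.</p>"
    "<footer>Contact us</footer></html>"
)
main_text = extract_main_text(page)
print(main_text)
```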
Imagine performing text extraction on all manner of customer feedback, across dozens of platforms, and in real time.
The great thing about machine learning is that you can train text extraction models for specific fields and tasks, so they become highly accurate on your own data.
Text classification is the process of automatically assigning predefined tags or groupings to text that relate to its content. Just like text extraction, text classification can be performed on all manner of unstructured text, like support tickets, emails, customer feedback, web pages, social media, and more.
Text classification can be used to:
Take MonkeyLearn’s NPS feedback analyzer as an example. This model is pre-trained to tag survey responses by category: Customer Support, Ease of Use, Features, and Pricing.
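Separately from that pre-trained model, the basic idea of assigning predefined tags can be sketched with simple keyword rules. The `TAG_KEYWORDS` lists and the `classify` function below are hypothetical. A machine learning classifier learns these associations from labeled examples instead of relying on hand-written word lists.

```python
import re

# Hypothetical keyword lists per tag; a trained classifier learns these
# associations from labeled examples instead.
TAG_KEYWORDS = {
    "Customer Support": {"support", "agent", "helpdesk", "response"},
    "Ease of Use": {"easy", "intuitive", "confusing", "interface"},
    "Features": {"feature", "integration", "export", "dashboard"},
    "Pricing": {"price", "expensive", "cheap", "cost"},
}

def classify(text):
    """Return every predefined tag whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords]

tags = classify("The interface is easy to use but the price is too high.")
print(tags)
```

Note that the output can only ever contain tags defined up front, which is the defining trait of classification.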
Using advanced machine learning algorithms and natural language processing (NLP), text classification tools can even sort text by sentiment (positive, negative, neutral, and beyond) to understand the opinion and emotion of the writer.
Here’s an example from MonkeyLearn’s pre-trained sentiment analyzer:
For even more accuracy, learn how to train a custom sentiment analysis model specific to your needs and criteria. The more you train your model, the more accurate it will become.
Classification models can analyze thousands of texts in just minutes, and once your data is categorized and properly structured, you can perform even more comprehensive analyses.
The primary difference between text classification and text extraction relates to where the analysis result comes from.
The comment below about a new software purchase shows how extraction and classification work differently:
“While I think the new price is too expensive, it is considerably faster and the new interface is easy to use.”
So, in general, extractors pull out information related to tags and classifiers sort information related to categories.
It all depends on your business needs. But, as a general rule of thumb, text analysis is most powerful when you use extraction and classification together. Let’s jump in and see how we’d use both techniques for a few different business use cases.
Learn what customers love or hate about your brand, detect trending topics, and align your product or service with your customers’ needs.
Using text analysis tools, you can gather unstructured customer feedback from open-ended surveys, social media posts, blogs, emails, and more. Text analysis can surface insights in places you may never have thought to look. You can search the web for unsolicited feedback about your company or products, or wade through thousands of pages of emails or customer surveys in just minutes.
Use extraction and classification in concert for even more fine-grained results. Let’s go back to the customer comment for an example: “While I think the new price is too expensive, it is considerably faster and the new interface is easy to use.”
We could first use the MonkeyLearn opinion unit extractor to break this sentence into three distinct statements: “new price is too expensive,” “it is considerably faster,” and “new interface is easy to use.” Then, we could perform sentiment analysis on each opinion unit. Product reviews and social media comments often contain several different statements, so it’s necessary to break them up into individual opinions to get truly accurate results.
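Here is a toy sketch of that two-step flow: split the comment into opinion units, then score each one. The naive splitting rule, the tiny sentiment lexicon, and the function names are all assumptions for illustration. They are not MonkeyLearn's opinion unit extractor or sentiment analyzer, and the units come out slightly different from the ones quoted above.

```python
import re

# Tiny hand-made lexicon (an assumption); real sentiment models are trained
# on labeled data.
POSITIVE = {"faster", "easy", "great", "love"}
NEGATIVE = {"expensive", "slow", "hard", "hate"}

def split_opinion_units(text):
    """Naively split on commas and the word 'and'."""
    parts = re.split(r",\s*|\s+and\s+", text.strip().rstrip("."))
    return [part.strip() for part in parts if part.strip()]

def sentiment(unit):
    """Label a unit by counting lexicon hits."""
    words = set(re.findall(r"[a-z]+", unit.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

comment = (
    "While I think the new price is too expensive, it is considerably "
    "faster and the new interface is easy to use."
)
results = [(unit, sentiment(unit)) for unit in split_opinion_units(comment)]
for unit, label in results:
    print(f"{label}: {unit}")
```

Scored as a whole, the comment would mix one negative and two positive signals into a single label. Scoring each unit keeps the complaint about price separate from the praise for speed and usability.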
Another example would be performing keyword extraction on Facebook comments about a new product to detect which topics customers mention most often, then using those topics as the predefined tags in a topic classifier.
You could even combine a topic classifier with a sentiment analyzer (known as aspect-based sentiment analysis) for an even deeper analysis of your Facebook comments.
With deep learning SaaS tools, you can set up a number of extraction and classification techniques to work in unison, automatically, for extremely in-depth, accurate results.
Learn more about what customer feedback analysis can do for your company.
Automate your processes, make sure the most urgent requests are taken care of right away, and improve and expedite your customer service efforts.
Web support tickets, emails, chatbots – businesses can receive thousands (or more) of customer support queries on a daily basis. Text analysis software can help you organize and route any manner of support tickets to the proper department or individual employee.
Use the email extractor to detect and remove unnecessary or redundant text, like signatures, confidentiality clauses, and previous replies. Then sort each email by topic and route them to the correct department.
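The routing step might look something like the sketch below. The department names, keyword sets, urgency words, and the `route_ticket` function are hypothetical. In practice, a trained topic classifier would take the place of these hand-written rules.

```python
import re

# Hypothetical topic keywords and department names, for illustration only.
ROUTES = {
    "Billing": {"invoice", "refund", "charge", "payment"},
    "Technical Support": {"error", "crash", "bug", "login"},
}
URGENT_WORDS = {"urgent", "asap", "immediately", "down"}

def route_ticket(text):
    """Pick the first department whose keywords match, and flag urgent tickets."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    department = next(
        (name for name, keywords in ROUTES.items() if words & keywords),
        "General",  # fallback when no topic matches
    )
    return {"department": department, "urgent": bool(words & URGENT_WORDS)}

tickets = [
    "Urgent: the app shows an error at login",
    "Please refund my last payment",
    "Do you ship abroad?",
]
routed = [route_ticket(ticket) for ticket in tickets]
for ticket, result in zip(tickets, routed):
    print(result, "<-", ticket)
```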
Couple keyword extraction and sentiment analysis to check employee responses for a consistent company tone of voice, or to mirror a customer’s tone and manner of speaking. Once your extractors and classifiers are properly trained to your business, you’ll save hundreds of human hours and never leave a customer in the lurch.
Monitor your business 24/7 on social media, in real time, and over many years, so you can spot patterns and inconsistencies.
Machine learning tools can automatically track news reports, social media, online reviews, chats, and more about your brand (and your competitors), then organize and analyze the unstructured data to ensure you’re always making data-driven decisions.
Monitor Twitter for brand mentions and use the opinion unit extractor to break full tweets into individual statements, then perform sentiment analysis. This way, you’ll know your results will be thorough and accurate.
Here are some examples of how you might combine the two for brand monitoring:
Each text analysis technique has its advantages and disadvantages:
With text extractors you can detect new topics, themes, trends, and business competition right as they emerge – a properly trained extractor will be constantly searching for new keywords and organizations. Text extraction is a “dynamic” approach, pulling the actual names, words, and expressions used within text data, so your business will remain at the cutting edge.
With extractors, however, there isn’t a predefined set of categories, so your analysis won’t be guided toward the specific goals or data points that you may need to uncover. Your results will be more diverse and heterogeneous, and less focused.
With text classifiers, your output will fall into a predefined set of categories that match your own criteria, so you’ll end up with the results you need.
If new topics emerge, however, you won’t be able to capture them unless you add more tags to your model and re-train it. Classification is more of a “static” approach.
Extraction and classification are clearly both effective tools for analyzing unstructured text data to obtain insights about your company, your customers, and your competitors. When you use them together, however, your results go even further.
There are huge amounts of unstructured data about your company online, in emails and chats, in surveys, and more, that once properly structured and analyzed, can be downright revelatory.
MonkeyLearn’s SaaS text analysis tools are easy to use and entirely scalable, and can be put to work on dozens of text analysis techniques.
Request a demo to learn more about MonkeyLearn’s powerful text analysis tools and get the most out of your text data.
August 14th, 2020