Text Processing: What Is It?

Whenever customers use an app or service, send an email, or leave a comment on social media, they’re creating a data trail that contains a lot of valuable information for companies.

Luckily, you can manage large quantities of data in an effective, fast, and accurate way by combining text processing with machine learning.

Read on so you can learn more about what text processing is and how it works. Then, discover popular methods and tools for processing text.

What Is Text Processing?

Text processing is the automated process of analyzing and sorting unstructured text data to gain valuable insights. Using natural language processing (NLP) and machine learning, subfields of artificial intelligence, text processing tools are able to automatically understand human language and extract value from text data.

Since we naturally communicate in words, not numbers, companies receive a lot of raw text data via emails, chat conversations, social media, and other channels. This unstructured data is filled with insights and opinions about different topics, products, and services, but companies first need to organize, sort, and measure textual data to access this valuable information.

Product teams might use text processing to gather insights from customer feedback and shape their product roadmap, while customer support teams might use it to automate processes like ticket tagging and routing.

Text Processing Methods and Tools

Now that you are more familiar with text processing, let’s have a look at some of the most relevant methods and techniques to analyze and sort text data.

Statistical Methods

At the heart of text processing are math and statistics. You can use statistical methods such as frequency distribution, collocation, concordance, and TF-IDF to process and analyze text.

You might be wondering what all these statistical approaches entail. Here’s a quick overview:

Word Frequency

This statistical method pinpoints the most frequently used words or expressions in a specific piece of text. With this particular insight, you can address problematic situations, identify success areas, and more.
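To make the idea concrete, here is a minimal sketch of word frequency counting in Python. The stop-word list and the sample feedback are made up for illustration; a real project would likely use a longer stop-word list and a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real projects usually use a longer one.
STOP_WORDS = {"the", "is", "and", "was", "too", "a", "of", "to"}

def word_frequencies(text, top_n=3):
    """Return the top_n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up feedback, for illustration only.
feedback = (
    "The app is slow. The checkout is slow and the search is slow too. "
    "Support was helpful."
)
print(word_frequencies(feedback, top_n=1))  # [('slow', 3)]
```

Here the most frequent word, slow, points straight at a problem area worth investigating.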

Collocation

This method helps identify words that co-occur – meaning they commonly appear together. Bigrams (two adjacent words) and trigrams (three adjacent words) are the most common types of collocations found in text. For example, keep in touch or product launch are common collocations.
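As a rough sketch, bigrams can be found by pairing each word with the one that follows it and counting the pairs. This simplified version counts raw frequencies only and lets pairs span sentence boundaries; dedicated tools typically also score pairs with an association measure. The sample text is made up.

```python
import re
from collections import Counter

def count_bigrams(text):
    """Count pairs of adjacent words (bigrams) in text."""
    words = re.findall(r"[a-z]+", text.lower())
    # zip pairs each word with its right-hand neighbor.
    return Counter(zip(words, words[1:]))

# Made-up text, for illustration only.
text = (
    "The product launch went well. After the product launch, "
    "we promised to keep in touch. Please keep in touch!"
)
bigrams = count_bigrams(text)
# Keep only the pairs that occur more than once.
frequent = [pair for pair, count in bigrams.items() if count > 1]
print(frequent)
```

Pairs such as ("product", "launch") and ("in", "touch") appear twice in this sample, so they survive the filter.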

Concordance

Concordance is all about providing context – in essence, it helps decode the ambiguity of human language by analyzing how specific words are used in different contexts. For example, the word issue might be used for numerous scenarios such as a problem, a situation, a topic, or the act of supplying something:

There’s an issue with my account → problem

We have an issue to deal with → situation

It’s an important issue → topic

Your tracking number has been issued → supplied
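A concordance view like the one above can be sketched as a keyword-in-context listing: find each occurrence of a word and show a few words on either side. The function below is a minimal illustration, and the prefix match (so that issue also finds issued) is a crude assumption rather than proper stemming. It only surfaces the contexts; deciding which sense applies is still up to a reader or a model.

```python
import re

def concordance(texts, keyword, width=2):
    """Return each occurrence of keyword with `width` words of context per side."""
    lines = []
    for text in texts:
        words = re.findall(r"[\w']+", text.lower())
        for i, word in enumerate(words):
            # Crude prefix match so "issue" also finds "issued" and "issues".
            if word.startswith(keyword):
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left} [{word}] {right}".strip())
    return lines

tickets = [
    "There's an issue with my account",
    "It's an important issue",
    "Your tracking number has been issued",
]
for line in concordance(tickets, "issue"):
    print(line)
```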

TF-IDF

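TF-IDF stands for term frequency-inverse document frequency. This metric gauges how important a word is to a document in a collection: its score rises the more often the word appears in that document, but is offset by the number of documents in the collection that contain the word.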

To make matters simple, here’s an example: the words ‘the’ or ‘and’ usually appear quite frequently in all documents, so they are not very useful for identifying the unique topics or themes discussed in a set of documents. In contrast, imagine that the word ‘RAM’ appears multiple times but only in one document. The ‘uniqueness’ of this word may provide some useful information to understand what that specific document is talking about.
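The example above can be sketched in a few lines of Python. This uses one common, unsmoothed variant (term frequency times the log of total documents over documents containing the word); libraries often apply different smoothing or normalization, so treat the exact numbers as illustrative. The three documents are made up.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Made-up documents, for illustration only.
docs = [
    "the laptop has more ram and the ram is fast",
    "the phone and the tablet are cheap",
    "the printer and the scanner are loud",
]
scores = tf_idf(docs)
print(scores[0]["the"])                    # 0.0, since "the" is in every document
print(max(scores[0], key=scores[0].get))   # ram
```

Because ‘the’ appears in every document, its score drops to zero, while ‘ram’ gets the highest score in the first document.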

Text Classification

Text classification sorts text into pre-defined groups based on its content, helping businesses to automatically sort and analyze their textual information. Some of the most popular text classification tasks include topic analysis, sentiment analysis, intent detection, and language classification.

Topic Analysis

Topic analysis is a technique that interprets and categorizes large collections of text according to individual topics or themes.

With topic analysis, you no longer have to dread reading thousands of customer surveys or product reviews to identify the most talked-about topics about your product or service. Instead, you can have a readily available automated model that does just that.

For example, let’s say you work at Airbnb and have to sift through tens of thousands of online surveys about the service your platform provides. Doing it manually is unsustainable, time-consuming, and tedious. With topic analysis, you can do it in a matter of seconds.

You can define tags such as UX/UI, Quality, Functionality, and Pricing to automatically find out which topics come up most in the survey results. Take this review, for instance:

“I found the perfect little loft in the heart of the city. Love the look and feel of the mobile app, very easy to navigate and filter the best location-price combinations.”

A topic classifier would be able to process this information and automatically tag it under UX/UI.
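To show the input and output of topic tagging, here is a deliberately simple rule-based stand-in. It is not a machine learning model: it just counts matches against hand-written keyword lists, which are hypothetical and chosen for this example. A trained classifier would learn such associations from labeled examples instead.

```python
import re

# Hypothetical keyword lists; a trained model would learn these from examples.
TOPIC_KEYWORDS = {
    "UX/UI": {"app", "navigate", "look", "feel", "design"},
    "Quality": {"clean", "broken", "dirty", "comfortable"},
    "Functionality": {"booking", "search", "crash", "bug"},
    "Pricing": {"price", "fee", "expensive", "cheap"},
}

def tag_topic(text):
    """Return the tag whose keywords match the most words, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = {
        tag: sum(word in keywords for word in words)
        for tag, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

review = (
    "I found the perfect little loft in the heart of the city. "
    "Love the look and feel of the mobile app, very easy to navigate "
    "and filter the best location-price combinations."
)
print(tag_topic(review))  # UX/UI
```

The review also mentions price once, but the words about the app outnumber it, so UX/UI wins.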

Test MonkeyLearn’s unique feedback classifier to see how the model swiftly categorizes NPS responses for SaaS products into Ease of Use, Features, Pricing, and Support. It will give you a clear idea of how topic analysis classifies information according to topics.

Sentiment Analysis

Sentiment analysis automatically detects the emotional undertones of customer reviews, survey responses, social media posts, and so on. This sort of data helps companies learn and understand how customers feel about their brand, product, or service.

For example, sentiment analysis of Twitter data can help a company understand if customers are generally happy or angry with their brand or service. Take this tweet about Southwest:

This is clearly a negative tweet, and there are likely to be many other negative tweets mentioning the airline. By training a model to detect sentiment, you can delegate the task of categorizing texts into Positive, Neutral, and Negative to machines. Not only does this speed up the process, but it also lets you detect and prioritize negative comments and respond to them as quickly as possible, so that you avoid losing customers.

Test this pre-built sentiment classifier to get an idea of how it works.

Intent Detection

Intent classification automatically unearths the intent, goal or purpose behind text. This is particularly useful because it lets businesses know exactly where a user or lead is on their buyer journey.

Does a user express an intent to purchase, unsubscribe, or sign up via email or chat conversation? Take this question, for example:

‘Your software is just what I’m looking for, but I’d like to know if you offer a more affordable package for startups?’

This text would be classified as Request for Information.

Here’s another example: Let’s say you go to a pet store and buy a bag of kibble for your furry friend. You are very pleased with the experience and send an email asking to be added to the newsletter to receive coupons:

“Thank you for being so nice to me and my dog T-Bone. I’d love to be a part of the newsletter to receive coupons and news events. Thanks”

With an intent classifier in place, the pet store would immediately classify your email as Subscribe to Newsletter. With a clear intent detected, you can easily classify user interactions and address each unique situation. In addition, it can help you identify when you need to send a follow-up message, or assist a customer to close a sale.

Play around with the email response classifier, which classifies intents such as Interested, Not Interested, Unsubscribe, Autoresponder, Email Bounce, and Wrong Person.

Language Classification

Language detection models classify text based on language.

Let’s set the scene. You are an online retailer with stores worldwide, which means that you receive customer support tickets in different languages all the time. A language detection model can automatically detect language for each text and route it to the appropriate localized teams.

Take Amazon, for example. Amazon operates in many countries around the world, which means they receive support tickets in numerous languages. With a language detection classifier, tickets can easily be routed to the appropriate team to handle them. See this example:

“プロフィール写真を変更できる場所を誰かに教えてもらえますか？この新しいアップデートは役に立たない。”

A language detection classifier can easily detect this ticket is written in Japanese, helping businesses route it to a Japanese-speaking agent who can contact the client and address their issue.

Test MonkeyLearn’s language classifier for yourself, and see how it can identify over 49 different languages!

Text Extraction

Text extraction is a text processing technique that identifies and obtains valuable pieces of data that are present within the text. Whether it’s keywords, client names, product details, dates, prices, or any other information within data, text extraction gets the job done.

Let’s examine keyword extraction and entity extraction.

Keyword Extraction

Keyword extraction automatically detects and extracts the most relevant words or expressions in text.

Take the NFL, for example. By examining Twitter mentions for a specific team or game, you can extract the keywords that are being communicated most often. When it’s Sunday Game Day, thousands of fans post their support for their teams:

“I’m a cheesehead through and through, Green Bay will go all the way this time! Here we go #NFL100 season! #GoPackGo”

The keyword extractor can automatically detect words and expressions such as Green Bay, NFL100, and GoPackGo, which are representative of what is being talked about. This information can offer a glimpse into which teams are the favorites of the season, which cities are mentioned the most, which players are praised or criticized, and so on. Companies can use this information to better target game-related strategies.

Type your own text into MonkeyLearn’s pre-trained keyword extractor and see the machine learning magic at work.

Entity Extraction

Entity extraction automatically obtains names of people, companies, brands, and more. It is particularly helpful when you’re trying to single out the names of competitors, brands, and people that influence your business to a certain extent.

You can use entity extraction to identify company branches that are receiving good and bad feedback. Take Bank of America for example. With different branches scattered across the United States, it is very important to keep track of events in specific locations, good and bad.

Check out this Twitter exchange:

This type of information can help Bank of America home in on a bad situation, or replicate good actions across every branch.

Use our pre-trained company entity extractor to quickly extract company and organization entities from text in English.

Text Processing Use Cases and Applications

Text processing helps businesses automate processes and obtain valuable insights from data. This ultimately leads to better decision-making practices. In this section, we’ll focus on customer feedback and customer service, both of which can be enhanced with text processing tools.

Customer Feedback

Customer feedback is a key ingredient in any business strategy because it lets your customers know that you value their opinion. And, of course, it doesn’t hurt to gain valuable insights about your business, product, or service.

In general, customers use a number of platforms to express their opinions about your business, but the best way to get valuable feedback is through open-ended responses in surveys and product reviews. How can text processing tools help you make the most of this feedback?

Analyze Customer Surveys

Net Promoter Score (NPS) is one of the most popular tools businesses use to measure customer satisfaction. It typically asks customers to rate your business on a scale of 0-10 with a question like “How likely are you to recommend this brand to a friend or colleague?” Based on the answers to that question, you can classify your customers as Promoters, Passives, or Detractors.
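As a small sketch of that classification, the code below uses the standard NPS cutoffs, which the text above doesn’t spell out: 9-10 for Promoters, 7-8 for Passives, and 0-6 for Detractors. The overall score is the percentage of Promoters minus the percentage of Detractors. The sample scores are made up.

```python
def nps_category(score):
    """Map a 0-10 rating to the standard NPS category."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "Promoter"
    if score >= 7:
        return "Passive"
    return "Detractor"

def net_promoter_score(scores):
    """Return % Promoters minus % Detractors, rounded to a whole number."""
    categories = [nps_category(score) for score in scores]
    promoters = categories.count("Promoter")
    detractors = categories.count("Detractor")
    return round(100 * (promoters - detractors) / len(scores))

# Made-up survey ratings, for illustration only.
ratings = [10, 9, 8, 7, 6, 3, 10, 9]
print(net_promoter_score(ratings))  # 25
```

With four Promoters and two Detractors out of eight responses, the score is 50% minus 25%, or 25.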

But an NPS survey doesn’t stop there. There is a follow-up question that prompts customers to elaborate on the reasons for their score. In this open-ended question, customers often express their feelings about the product or service, as well as the brand itself. That type of information is extremely insightful, but it’s also harder to analyze.

Text processing with machine learning enables you to extract these insights easily and quickly in various ways. You could use a keyword extractor to identify the most common expressions in survey responses. On the other hand, a topic classifier can categorize information based on topic, helping you to understand what topics or aspects customers mention the most. On top of this, you could analyze your NPS responses using sentiment analysis to find out how your customers feel about topics and aspects, a technique known as aspect-based sentiment analysis.

Analyze Product Reviews

Product reviews are like a compass that steers customers towards or away from products. Take the launch of the iPhone 11 Pro, for example. The yearly release of Apple’s latest smartphone generates a flurry of online discussions, which represent a magnificent source of information. These discussions provide Apple with a deep level of understanding about which features are a hit or miss, how customers feel about pricing, thoughts on aesthetics, and much more.

All of this data is out there, waiting to be dissected, which is where text processing comes in. By using machine learning, Apple can process millions of product reviews from every channel in just seconds, providing them with valuable, up-to-date insights.

Customer Service

Customer service is all about strengthening relationships and boosting customer loyalty. Typically, customer service teams deal with tons of customer queries. With text processing, you can automate processes so support agents save precious time that is better spent actually helping customers.

Automatically Tag Support Tickets

When customers send a request, ask about a product or service, or complain about an issue or bug, this information needs to be processed and handled. A big part of attending to support tickets involves processing each one to make sure the appropriate team takes ownership and handles the issue promptly and accurately.

But let’s call a spade a spade: ticket categorization is boring and time-consuming. By coupling text processing with machine learning, you can automatically identify the topic of each support ticket and tag it accordingly.

Route and Triage Support Tickets

Once support tickets have been tagged, you’ll be able to route issues to the right person in real time, reducing response times and making teams more efficient. Classifiers can help your business automatically route tickets by topic, language, urgency, and more. Let’s say you receive a ticket tagged as Login Issues: this ticket will be passed on to the IT team.
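Once tags exist, routing can be as simple as a lookup table. The sketch below assumes each ticket is a dictionary with a tag field; only the Login Issues to IT mapping comes from the example above, and the other tags, teams, and the fallback team are hypothetical.

```python
# Hypothetical tag-to-team table; only "Login Issues" -> "IT" comes from the text.
ROUTES = {
    "Login Issues": "IT",
    "Billing": "Finance",
    "Feature Request": "Product",
}

def route_ticket(ticket, default_team="General Support"):
    """Return the team for a tagged ticket, falling back to a default."""
    return ROUTES.get(ticket["tag"], default_team)

# Made-up tickets, for illustration only.
tickets = [
    {"id": 1, "tag": "Login Issues"},
    {"id": 2, "tag": "Shipping"},
]
assignments = {ticket["id"]: route_ticket(ticket) for ticket in tickets}
print(assignments)  # {1: 'IT', 2: 'General Support'}
```

The default team catches any tag that has no route yet, so no ticket is dropped.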

Detect the Urgency of a Ticket

The ability to prioritize tickets based on urgency has a positive impact on your business. For example, you could use a sentiment analysis model to detect disgruntled customers or use an urgency detector to find issues that require immediate action.

Try Now: NPS Analysis Template (includes sentiment and topic analysis)

Now that you know what text processing can be used for, you probably want to give it a whirl! With MonkeyLearn’s plug-and-play templates, you can perform text analysis techniques in just a few clicks, and visualize the results in a striking dashboard.

Want to know how? Try out this NPS analysis template, which includes various text processing techniques.

Just follow the steps below:

1. Choose the NPS Analysis template

2. Upload your data

If you don't have a CSV file:

You can use our sample dataset.
Or, download your own survey responses from the survey tool you use with this documentation.

3. Match the CSV columns to the dashboard fields

created_at: Date that the response was sent.
text: Text of the response.
score: NPS score given by the customer.

4. Name your workflow

5. Wait for your data to import

6. Explore your dashboard!

Congratulations on getting to this point. Play around with your dashboard and filter by sentiment (negative, positive, or neutral), topic, keyword, NPS score or category – or all at the same time!

Once you’ve had a chance to be blown away by the results, share your NPS analysis dashboard with the rest of your team (just click on the ‘share’ button in the top right-hand corner).

And voila!
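If you prepare the CSV file yourself, a quick check that it contains the three columns from step 3 (created_at, text, and score) can save a failed import. The sketch below uses Python’s built-in csv module. The sample rows and the date format are made up for illustration, and the function is not part of any MonkeyLearn tooling.

```python
import csv
import io

REQUIRED_COLUMNS = {"created_at", "text", "score"}

def load_responses(csv_file):
    """Read survey responses, checking that the expected columns exist."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return [
        {
            "created_at": row["created_at"],
            "text": row["text"],
            "score": int(row["score"]),  # NPS score as an integer
        }
        for row in reader
    ]

# Made-up rows standing in for a real CSV file.
sample = io.StringIO(
    "created_at,text,score\n"
    "2019-11-15,Love the new dashboard,9\n"
    "2019-11-16,Too expensive for small teams,4\n"
)
responses = load_responses(sample)
print(len(responses))  # 2
```

For a file on disk, you would pass an open file object instead of the in-memory sample.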

Wrapping up

Now, more than ever, customers rely on data to support everyday decisions.

Regardless of industry, businesses must put data at the heart of their strategies. Not only is text processing a mighty tool to have in your arsenal, but it’s also super easy to get started with.

Text processing helps you discover valuable insights within customer feedback and is crucial for enhancing your customer service. Whether you’re using sentiment analysis, intent detection, entity extraction, or any of the other methods available, you’ll have insights at your fingertips, powering smarter decision-making within your business.

At MonkeyLearn, we provide easy-to-use machine learning models that can help you extract value from your data.

Prefer a customized view of how we can help you process your business data? Book a demo with one of our text processing experts.

Inés Roldós

November 15th, 2019
