New Keyword Extraction: smarter and more flexible extractor

We are excited to announce several improvements to our Keyword Extraction model. You can try it for free here; you just need a MonkeyLearn account.

This model extracts keywords from English text. A keyword can consist of one or more words. Keywords are the important topics in your content, and you can use them to index data, generate tag clouds, or power search.

The extractor combines statistical algorithms and natural language processing to analyze your content and identify the relevant keywords.

Our customers already love this extractor and are using it to power all kinds of applications. We decided to take it a step further and make this model the most powerful keyword extractor on the market.

Improvements

The extractor's response now includes, for each keyword it extracts, its count and its positions in the text.

Also, by using parameters in the API, you can control the following behaviors:

  • Number of keywords: Set the maximum number of keywords to extract. Defaults to 10.
  • Capitalization: Lowercase all the returned keywords. Defaults to 0 (false).
  • Company Names: Expand company names. If the word ‘Google’ appears in one part of the text and ‘Google Inc.’ appears in another, ‘Google’ is expanded to ‘Google Inc.’. Defaults to 0 (false).
  • Stemming: Reduce words to their base form in order to get better results. Defaults to 1 (true).
  • Acronyms: Expand acronyms to their full form, for example ‘US’ to ‘United States’, if both tokens appear in the given text. Defaults to 0 (false).
  • Hyphenated: Keep the ‘&’ character when it appears inside a name, for example ‘Ferrara & Wolf’. Defaults to 0 (false).
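
As a rough illustration of how these options and their defaults fit together, here is a minimal Python sketch that builds a parameter dictionary. The parameter names (`max_keywords`, `lowercase`, and so on) and the `build_params` helper are hypothetical and chosen only for readability; check the API reference for the real names. Only the default values come from the list above.

```python
# Hypothetical parameter names; only the default values come from the post.
DEFAULTS = {
    "max_keywords": 10,         # Number of keywords
    "lowercase": 0,             # Capitalization
    "expand_company_names": 0,  # Company Names
    "use_stemming": 1,          # Stemming
    "expand_acronyms": 0,       # Acronyms
    "keep_ampersand": 0,        # Hyphenated
}


def build_params(**overrides):
    """Return the defaults with any overrides applied.

    Boolean options are normalized to 0/1, matching the way the
    defaults are written in the list above.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name == "max_keywords":
            params[name] = int(value)
        else:
            params[name] = 1 if value else 0
    return params


params = build_params(expand_company_names=True, max_keywords=5)
```

A dictionary like this could then be sent along with the text in the request body; the exact request format is not covered here.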

Example

Input:

"Google Inc. is an American multinational technology company specializing in Internet-related services and products. These include online advertising services, search, cloud computing, and software. Most of its profits are derived from AdWords, an online advertising service that places advertising near the list of search results.

Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. Together they own about 14% of its shares but control 56% of the stockholder voting power through supervoting stock. They incorporated Google as a privately held company on September 4, 1998. An Initial Public Offering was announced on August 19, 2004. On the first day of IPO, the share price was set to $85 and it closed at $100.34, a price gain of 18%."

Output:

{  
   "result":[  
      [  
         {  
            "relevance":"0.980",
            "count":2,
            "positions_in_text":[  
               630,
               708
            ],
            "keyword extraction":"Initial Public Offering"
         },
         {  
            "relevance":"0.980",
            "count":2,
            "positions_in_text":[  
               130,
               247
            ],
            "keyword extraction":"online advertising services"
         },
         {  
            "relevance":"0.882",
            "count":3,
            "positions_in_text":[  
               0,
               331,
               570
            ],
            "keyword extraction":"Google Inc."
         },
         {  
            "relevance":"0.490",
            "count":1,
            "positions_in_text":[  
               27
            ],
            "keyword extraction":"multinational technology company"
         },
         {  
            "relevance":"0.490",
            "count":1,
            "positions_in_text":[  
               500
            ],
            "keyword extraction":"stockholder voting power"
         },
         {  
            "relevance":"0.294",
            "count":1,
            "positions_in_text":[  
               717
            ],
            "keyword extraction":"share price"
         },
         {  
            "relevance":"0.294",
            "count":1,
            "positions_in_text":[  
               368
            ],
            "keyword extraction":"Sergey Brin"
         },
         {  
            "relevance":"0.294",
            "count":1,
            "positions_in_text":[  
               396
            ],
            "keyword extraction":"Ph.D. students"
         },
         {  
            "relevance":"0.294",
            "count":1,
            "positions_in_text":[  
               533
            ],
            "keyword extraction":"supervoting stock"
         },
         {  
            "relevance":"0.294",
            "count":1,
            "positions_in_text":[  
               76
            ],
            "keyword extraction":"Internet-related services"
         }
      ]
   ]
}
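
To work with a response like this in code, a minimal Python sketch might look as follows. It assumes the response has already been received as a JSON string, and it uses the keys exactly as they appear in the output above. The helper names `ranked_keywords` and `context` are made up for this example. Judging from the sample, the positions look like zero-based character offsets (position 27 is where "multinational" starts in the input), and the outer list presumably holds one inner list per input text, but both points are inferences from this one example.

```python
import json

# Two entries copied from the output above, trimmed for brevity.
RESPONSE = json.loads("""
{"result": [[
  {"relevance": "0.490", "count": 1, "positions_in_text": [27],
   "keyword extraction": "multinational technology company"},
  {"relevance": "0.882", "count": 3, "positions_in_text": [0, 331, 570],
   "keyword extraction": "Google Inc."}
]]}
""")


def ranked_keywords(response):
    """Return (keyword, relevance, count, positions) tuples for the
    first text, sorted from most to least relevant."""
    rows = [
        (
            item["keyword extraction"],
            float(item["relevance"]),  # relevance arrives as a string
            item["count"],
            item["positions_in_text"],
        )
        for item in response["result"][0]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def context(text, position, width=20):
    """Return the slice of text starting at a reported position."""
    return text[position:position + width]


text = "Google Inc. is an American multinational technology company"
for keyword, relevance, count, positions in ranked_keywords(RESPONSE):
    print(keyword, relevance, count, positions)
print(context(text, 27, 13))
```

Note that with options such as company name expansion turned on, the text found at a position may differ from the returned keyword (for example ‘Google’ in the text versus ‘Google Inc.’ in the response).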

You can read the API reference for the keyword extractor here.

Feedback

We would love to hear your feedback on this new and smarter keyword extractor.

If you have any suggestions, please let us know!

Federico Pascual

May 20th, 2015
