Here we describe each of the public extractors that you can use.

Entity Extractor

Description

Extract entities from text using Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as people and companies. This implementation labels three classes: PERSON, ORGANIZATION, and LOCATION.

Implementation

The NER tagger is implemented with Conditional Random Field (CRF) sequence models and is trained on a large amount of data.
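
To make "labeling sequences of words" concrete, here is a minimal sketch of how per-token labels can be merged into entities. It assumes the tagger emits one label per token, with "O" marking tokens outside any entity; the `group_entities` helper and the hand-written labels are illustrative assumptions, not the extractor's actual output format.

```python
from itertools import groupby


def group_entities(tokens, labels, outside="O"):
    """Merge consecutive tokens that share a non-outside label into entities."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    entities = []
    # groupby yields runs of adjacent (token, label) pairs with the same label.
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == outside:
            continue
        text = " ".join(token for token, _ in run)
        entities.append((text, label))
    return entities


# Hand-labeled sample (hypothetical, not real extractor output).
tokens = ["Ada", "Lovelace", "worked", "with", "Charles", "Babbage", "in", "London", "."]
labels = ["PERSON", "PERSON", "O", "O", "PERSON", "PERSON", "O", "LOCATION", "O"]
entities = group_entities(tokens, labels)
print(entities)
```

Note that this simple scheme cannot separate two adjacent entities of the same class; taggers that need to do so typically use a begin/inside labeling convention instead.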

Example

Input:

Output:

Spanish Entity Extractor

Description

Extract entities from Spanish text using Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as people and companies. This implementation labels four classes: PERS, ORG, LUG, and OTROS.

Implementation

The NER tagger is implemented with Conditional Random Field (CRF) sequence models and is trained on a large amount of data.
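
If you process English and Spanish results together, it can help to normalize the Spanish tag names. The sketch below assumes the four tags abbreviate persona, organización, lugar, and otros; the English names chosen in the mapping and the `normalize_labels` helper are illustrative assumptions rather than part of the extractor.

```python
# Assumed correspondence between the Spanish tags and English class names.
SPANISH_TO_ENGLISH = {
    "PERS": "PERSON",
    "ORG": "ORGANIZATION",
    "LUG": "LOCATION",
    "OTROS": "OTHER",
}


def normalize_labels(entities):
    """Translate (text, spanish_label) pairs; unknown labels pass through unchanged."""
    return [(text, SPANISH_TO_ENGLISH.get(label, label)) for text, label in entities]


# Hand-written sample entities (hypothetical, not real extractor output).
sample = [("Gabriel García Márquez", "PERS"), ("Bogotá", "LUG"), ("Premio Nobel", "OTROS")]
normalized = normalize_labels(sample)
print(normalized)
```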

Example

Input:

Output:

Keywords Extractor

Description

Extract keywords from English text. A keyword can consist of one or more words. Keywords represent the important topics in your content and can be used to index data, generate tag clouds, or support search.

Implementation

The keyword extractor combines statistical methods with natural language processing to analyze your content and identify the relevant keywords.
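
As a rough illustration of a statistical approach, the sketch below treats maximal runs of non-stopword words as candidate keywords and scores each candidate by the summed frequency of its words. This is a simplified stand-in chosen for illustration, not the algorithm the extractor uses; the stopword list, function names, and scoring rule are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "can", "for", "from",
             "in", "is", "of", "on", "or", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into maximal runs of non-stopword words (one or more words each)."""
    phrases = []
    # Punctuation ends a phrase, so split the text into fragments first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def extract_keywords(text, top_n=5):
    """Rank candidate phrases by the summed corpus frequency of their words."""
    phrases = candidate_phrases(text)
    word_freq = Counter(word for phrase in phrases for word in phrase)
    scores = {" ".join(p): sum(word_freq[w] for w in p) for p in phrases}
    # Sort by score (descending), then alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_n]]


sample = "The tag cloud is built from keywords. Keywords are used to index the tag cloud."
keywords = extract_keywords(sample, top_n=2)
print(keywords)  # ['tag cloud', 'keywords']
```

Summing word frequencies favors longer phrases, which is why the two-word "tag cloud" outranks "keywords" here; a production system would normally balance that bias.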

Example

Input:

Output:

Spanish Keywords Extractor

Description

Extract keywords from Spanish text. A keyword can consist of one or more words. Keywords represent the important topics in your content and can be used to index data, generate tag clouds, or support search.

Implementation

The keyword extractor combines statistical methods with natural language processing to analyze your content and identify the relevant keywords.

Example

Input:

Output:

HTML to Text Extractor

Description

Extract the relevant text from HTML. The algorithm detects and removes the surplus “clutter” (boilerplate, templates) around the main textual content of a web page.
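
One common way to separate main content from clutter is to keep only text blocks that are reasonably long and contain few links. The sketch below applies that heuristic using Python's standard `html.parser`; it is an assumption about how such a filter might work, not the extractor's actual algorithm, and the class name, function name, and thresholds are hypothetical.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "h4", "h5", "h6",
              "article", "section", "blockquote"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collect text blocks together with the number of characters inside links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []        # list of (text, link_chars)
        self._parts = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def flush_block(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_main_text(html, min_words=8, max_link_density=0.3):
    """Keep blocks with at least min_words words and a low share of link text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush_block()
    kept = []
    for text, link_chars in collector.blocks:
        link_density = link_chars / len(text)
        if len(text.split()) >= min_words and link_density <= max_link_density:
            kept.append(text)
    return "\n".join(kept)


page = """<html><body>
<div><a href="/">Home</a> <a href="/about">About</a> <a href="/contact">Contact</a></div>
<p>Named entity recognition labels the names of people, organizations, and places in running text.</p>
<div>Copyright notice</div>
<script>var x = 1;</script>
</body></html>"""
main_text = extract_main_text(page)
print(main_text)
```

The navigation bar is dropped because it is short and almost entirely link text, and the footer because it is short; the thresholds would need tuning on real pages.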

Example

Input:

Output: