Hey Jebediah, I agree, it’s a bit confusing that the public ‘Twitter user profiling’ pipeline consumes so many queries at once (we are considering adding a warning so users know this before running it).
I’ll do my best to explain how this pipeline works. First of all, it’s important to note that we count 1 query for each ‘classification’ or ‘extraction’ task performed by MonkeyLearn.
The Twitter user profiling pipeline takes 100 random Twitter biographies from the people the given Twitter user follows.
Then, for each biography, we run the language detection classifier and keep only the biographies written in English, since those are the ones the rest of the pipeline can use. This step can consume up to 100 queries (1 query per language detection).
The next step is to apply topic detection to these English biographies to identify common topics and interests. Here we consume as many queries as topic classification tasks (in other words, one per English biography).
Finally, we group all the biographies together and run 1 last query to extract keywords from the combined text.
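To make the arithmetic concrete, here is a minimal sketch of how the query count adds up. The function and the `languages` lookup are hypothetical stand-ins (not the real MonkeyLearn SDK); each simulated task simply adds 1 to the counter, matching the 1-query-per-task rule above.

```python
# Hypothetical sketch of query counting in the Twitter user profiling
# pipeline. Each classification/extraction task counts as one query.

def count_pipeline_queries(biographies, languages):
    """Return the total queries consumed for a batch of follower bios.

    `languages` maps each biography to its (pre-known) language so the
    sketch stays runnable without a real language classifier.
    """
    queries = 0

    # Step 1: language detection -- one query per biography.
    english_bios = []
    for bio in biographies:
        queries += 1
        if languages[bio] == "en":
            english_bios.append(bio)

    # Step 2: topic classification -- one query per English biography.
    queries += len(english_bios)

    # Step 3: keyword extraction over the combined text -- one query.
    queries += 1

    return queries

# Example: 100 biographies, 60 of them in English
bios = [f"bio-{i}" for i in range(100)]
langs = {b: ("en" if i < 60 else "es") for i, b in enumerate(bios)}
print(count_pipeline_queries(bios, langs))  # 100 + 60 + 1 = 161
```

So for 100 followers with 60 English biographies, the pipeline would consume 161 queries in total.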
As for Pipelines: you still consume the same number of queries as you would by using each classifier and extractor independently (you need the same number of text mining tasks either way), but the advantages are less code to integrate MonkeyLearn, less API overhead, and much more speed.