This is the final part in a series where we use machine learning and natural language processing to analyze articles published in tech news sites in order to gain insights about the state of the startup industry.
On the first post, we collected all the articles published in TechCrunch, VentureBeat and Recode since 2007. We also filtered out all the ones that aren’t about startups. On the second post, we trained machine learning models that can tell what event is described in a piece of news (product launch, acquisition, fundraising, etc) and what industries the startup the article is about belongs to (Fintech, Machine Learning, and so on).
Now, we are finally ready to conduct our analysis on more than 270,000 articles, let’s go over the results!
The first thing we noticed when looking at the data was the result of the startup filter classifier, which detects how many articles about startups were written by these tech sites, which pride themselves in being about coverage of the latest things in tech:
So, TechCrunch, VentureBeat and Recode do write a lot about startups, but how has this evolved over time? It’s interesting to see how the number of articles about startups rises until a peak of 1,500 a month in early 2014:
While the total number of articles published on these sites never goes down, the number of articles about startups do takes a significant dip after 2014. This can be interpreted in different ways. Does this mean that startup activity slowed down after 2014? Or does this simply reflects changes in editorial priorities on these sites?
What type of articles are being published about startups? We’ll use the startup event classifier to analyze our data and find out! This model classifies articles about startups into different types of events:
The results show that fundraising is the most newsworthy event (36%), followed by articles announcing product launches (28%). Besides these two main topics, everything else that happened in startup world got significantly less attention:
As the volume of articles about startups declined in the last couple of years, so did the number of publications on most tags. However, some went down more than others:
By a wide margin, articles about fundraising took the biggest hit, going from a peak of 640 monthly articles in early 2014 to 300~350 in 2017.
Does this means that is harder nowadays to get funding than it used to be? Are investors more careful now, fearing a bubble? The data seems to suggest something of the sort. It could also be that tech sites simply don’t consider fundraising as newsworthy anymore.
Moving forward with our analysis, we’ll use the startup industry classifier to classify the articles about startups into 19 industries.
Unlike the models we have used so far, this is a multi-label classifier, which means that an article can fall under several tags. For instance, if an article mentions a startup that makes mobile games, it should be classified as both mobile and gaming.
So, what are the most popular industries? What’s hot and what’s not? Let’s visualize the coverage per industry for all the articles published in the last ten years to find out:
How have these trends evolved over time? Let’s compare the results from articles published in 2007 vs 2016:
The most popular startup industry in the last ten years has been Mobile. Looking far back in this tag is really interesting: it first has a peak in June - August 2007, which corresponds with the original iPhone launch. After this initial craze died down, the number of mobile articles started to slowly rise ahead of the pack, peaking in 2012 when smartphones really got going:
One of the reasons Mobile still has so much coverage is that it is a really broad tag: any startup that produces some kind of phone app or mobile device could be classified as Mobile. Even though this definition seems to encompass most startups these days, the noise seems to have died down a little bit.
Social media used to be the hottest thing around. Back in 2008 it was the field with the most coverage, with mentions on a whooping 40% of all startup news. It kept its relevance until early 2012, when social media startup news started to dwindle:
This mirrors the trend of startup companies themselves not being “a social network” anymore. After 2012, the landscape of social networks looked fairly stable, and most had given up on beating Facebook at their own game (even Google!). This change is clearly reflected in the data, with Social Media nosediving afterwards.
The second most popular industry these days is AI & Machine Learning. It went from virtually unknown to the second most covered tag in just a few years:
Terms like “Neural network”, “Natural language processing” and “Deep Learning” have become commonplace. Seemingly everyone is incorporating some sort of artificial intelligence into their products. And the press agrees with (and drives!) this popularity: right now, almost 11% of all articles about startups published in Techcrunch, Recode and Venturebeat cover some aspect of machine learning and artificial intelligence. If that doesn’t sound like a lot, think it’s nearly twice as much as any other tag in this analysis (besides mobile).
Besides AI, the industry that has gained most popularity in recent times is AR & VR. This industry started to take off in early 2013, right after the Oculus Rift successful kickstarter campaign and consolidated as a top industry in late 2015:
The rise and fall of Bitcoin (Bitcoin & Blockchain) was what you’d expect: it rises from obscurity in 2013, peaking at the start of 2014 and starting to fall sharply at the end of that year, a lot like Bitcoin price itself:
However, its newsworthiness apparently didn’t rise again when its price did, even with Blockchain becoming more popular for other applications.
As a next step, we’ll combine the results from the industry and events classifiers to discover additional insights.
By combining these classifiers, we can observe that the acquisitions in the mobile sector received more media attention than those in other industries:
This trend is partly due by some high profile acquisitions in the mobile space that received massive media coverage. Like when Facebook set the tech world ablaze with its $19 billion acquisition of WhatsApp, which generated a lot of discussion around why was Facebook spending so much money or if it was a smart move by Mark Zuckerberg.
Startups like Airbnb, Uber, Lyft and Instacart have been facing countless of lawsuits challenging their unconventional business practices for years. So, it comes as no surprise to find out it’s the industry with more press articles about legal problems:
Despite the fact that fintech raised less money in 2016, this sector received more press coverage than other industries, mostly because of some mega-rounds raised by startups like Ant Financial ($4.5 billion), Lu.com ($1.22 billion), JD Finance $1.01 billion):
So, we know which industries and events were most popular and when. But, what if we want to go deeper into a particular industry? We can turn to the keyword extractor! If can send a text from a certain year (or any other timeframe) we’ll get back the keywords for that year. We can then compare them with the keywords from other years and see what’s different.
This keyword analysis is great to see what was really hot in certain moments, and what important topic dominated the conversation in a certain timeframe. For example, the top keywords for the Gaming industry (which is huge by the way, more than 500 articles per year) used to be social games, social networks and casual games. Notably Zynga (remember them?) appears from 2010 to 2013.
However, this domination ended and was replaced by mobile games. Every year from 2013 onwards, it’s the top keyword, and social gaming is nowhere to be seen. This doesn’t mean that social gaming stopped existing, it’s just that the tech media stopped reporting on it.
Like the Gaming industry, the conversation around AR & VR changed quite a bit over the years. Right when this industry started to gain visibility in 2013, Oculus Rift was the top keyword by a wide margin. For a while, it was the product that grabbed headlines with almost no competition for media attention. However, this notoriety didn’t last long. By 2016, the Oculus Rift keyword lost its dominance and was down in sixth place alongside the keyword HTC Vive.
On this series of posts, we’ve scraped hundreds of thousands of articles, created text classifiers to analyze them using MonkeyLearn, and gotten insights from the results. The only way to perform an analysis like this is using machine learning and natural language processing, since there’s no way we can get a person to read through and interpret 270,000 articles.
If you’re interested, you can check out the code we used and perform your own analysis. Playing around with this data can be a lot of fun! Also, feel free to share your opinions and your own insights in the comments below, it’d be awesome to hear what you think!
May 11th, 2017