How accurate is CrunchBase? – Part 2

I am glad to report that the CrunchBase team has shown interest in the analysis I published in my last post, part 1 of this series. So much so that they have agreed to share the complete CrunchBase dataset with me: not just the part covering start-ups headquartered in the US (as publicly shared here) but the data for all corners of the world!

Armed with this plethora of data, I carried out a comparative analysis of the CrunchBase dataset across different regions. Here is the content of the report that I produced and sent to the CrunchBase team:

One can draw two key results from comparing the CrunchBase dataset to statistics published by Dow Jones’ VentureSource:

  1. For start-ups headquartered in the US and Europe, the CrunchBase investment data has become much more accurate in recent years. For instance, the sum of VC investment in US-headquartered start-ups is roughly equal to the amount reported by VentureSource (in both 2011 and 2012).
  2. The accuracy in other regions (e.g. India, Israel and China) has not steadily improved over time in the way that it has in the US and Europe.

Fig^ 1

Fig^ 2
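The comparison behind these two results boils down to a coverage ratio: the total invested amount recorded in CrunchBase for a region and year, divided by the benchmark total for the same region and year. The snippet below is a minimal sketch of that calculation, not the code used for the report. The record layout, the function name `coverage_by_region_year` and the numbers in the example are all hypothetical placeholders.

```python
from collections import defaultdict


def coverage_by_region_year(rounds, benchmark):
    """Return the CrunchBase / benchmark ratio for each (region, year).

    rounds    -- iterable of (region, year, amount) tuples (hypothetical layout)
    benchmark -- dict mapping (region, year) to the benchmark total,
                 e.g. a figure published by VentureSource
    """
    totals = defaultdict(float)
    for region, year, amount in rounds:
        totals[(region, year)] += amount

    # A ratio close to 1.0 means the two sources report similar totals.
    # Keys with a zero benchmark are skipped to avoid dividing by zero.
    return {
        key: totals.get(key, 0.0) / reference
        for key, reference in benchmark.items()
        if reference
    }


# Made-up placeholder numbers, purely to show the call shape.
example_rounds = [
    ("US", 2012, 5.0),
    ("US", 2012, 3.0),
    ("Europe", 2012, 2.0),
]
example_benchmark = {("US", 2012): 10.0, ("Europe", 2012): 4.0}
example_coverage = coverage_by_region_year(example_rounds, example_benchmark)
```

The same function can be applied to the number of rounds instead of invested amounts by passing 1 as the amount of each round.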

Another way to measure the accuracy of the database is to look at the average time lag between a round occurring and its data entry in the database. Once again, the analysis shows a clear improvement in that respect, especially in the US and Europe.

Fig^ 3

For rounds of investment in the database that occurred between Q1 2010 and Q4 2012, the time lag drops significantly over time (c.75% reduction in the US and c.82% in Europe). The US remains the region where the dataset is the most “up-to-date”, with an average delay of 22 days between a round occurring and its data entry (vs. 37 days in Europe).
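For readers who want to reproduce this kind of measurement, here is a minimal sketch of the time-lag calculation. It assumes each round carries two dates, the date the round occurred and the date it was entered in the database, and that rounds are grouped by the quarter in which they occurred. The grouping choice and all names (`average_lag_by_quarter`, `percent_reduction`) are my own assumptions for illustration, and the dates are invented.

```python
from datetime import date
from statistics import mean


def average_lag_by_quarter(rounds):
    """Average entry lag in days, keyed by (year, quarter) of the round.

    rounds -- iterable of (funded_on, entered_on) date pairs (hypothetical layout)
    """
    buckets = {}
    for funded_on, entered_on in rounds:
        quarter = (funded_on.year, (funded_on.month - 1) // 3 + 1)
        lag = (entered_on - funded_on).days
        buckets.setdefault(quarter, []).append(lag)
    return {quarter: mean(lags) for quarter, lags in sorted(buckets.items())}


def percent_reduction(first, last):
    """Percentage drop from the first value to the last value."""
    return 100.0 * (first - last) / first


# Invented dates, only to show how the two helpers fit together.
example_rounds = [
    (date(2010, 1, 10), date(2010, 4, 20)),   # 100 days of lag
    (date(2010, 2, 1), date(2010, 5, 12)),    # 100 days of lag
    (date(2012, 11, 1), date(2012, 11, 26)),  # 25 days of lag
]
example_lags = average_lag_by_quarter(example_rounds)
example_drop = percent_reduction(
    example_lags[(2010, 1)], example_lags[(2012, 4)]
)
```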

I have also tried to identify any significant “push” or “jump” in accuracy that may have occurred since the database was first created in May 2007. To do so, I looked at the monthly number of rounds entered in the database with a time lag greater than 200 days:

Fig^ 4

The above chart shows there was a significant peak in data entry activity around April 2010; this was probably caused by a concerted effort from the CrunchBase team to improve the historical accuracy of the dataset.
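The count behind this kind of chart can be sketched as follows: keep only the rounds whose entry lag exceeds the threshold, then tally them by the month in which they were entered. This is an illustrative sketch under the same assumed record layout of (round date, entry date) pairs. The function name `late_entries_per_month` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def late_entries_per_month(rounds, threshold_days=200):
    """Count rounds entered more than threshold_days after they occurred.

    rounds -- iterable of (funded_on, entered_on) date pairs (hypothetical layout)
    Returns a dict keyed by (year, month) of the entry date, in date order.
    """
    counts = Counter()
    for funded_on, entered_on in rounds:
        if (entered_on - funded_on).days > threshold_days:
            counts[(entered_on.year, entered_on.month)] += 1
    return dict(sorted(counts.items()))


# Invented dates, only to show the call shape.
example_rounds = [
    (date(2009, 1, 15), date(2010, 4, 2)),    # entered late
    (date(2008, 6, 1), date(2010, 4, 20)),    # entered late
    (date(2010, 3, 1), date(2010, 4, 5)),     # entered promptly, ignored
    (date(2009, 5, 1), date(2009, 11, 20)),   # entered late
]
example_counts = late_entries_per_month(example_rounds)
```

A month that stands out in the resulting counts points to a batch of historical rounds being back-filled at that time.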

More detail regarding the accuracy of the CrunchBase dataset in Europe

Whilst the CrunchBase dataset seems to reconcile very well with the VentureSource statistics at a European level for recent years, a more detailed analysis at a country level shows a much lower level of reconciliation:

Fig^ 5

Fig^ 6

From these charts we can deduce that:

  • There are countries in which the CrunchBase dataset seems very incomplete. In France in particular, both the number of rounds and the invested amounts represent only c.40% of the figures reported by VentureSource.
  • However, there are also countries for which CrunchBase seems to provide a more complete dataset than VentureSource (both for number of rounds and invested amounts) – for instance the UK and Germany.

These results imply that VentureSource cannot be assumed to be a comprehensive source for the European VC and start-up eco-system. To double-check this I produced the chart below, which compares overall statistics published by VentureSource, EVCA (European Private Equity and Venture Capital Association) and those that I got from analysing the CrunchBase dataset:

Fig^ 7

Whilst the CrunchBase dataset is clearly incomplete for 2007 data, its accuracy improves for more recent data. In fact, as of today CrunchBase may well be the most accurate database in the world when it comes to the European start-up scene (one has to keep in mind, however, that the strength of the database varies considerably by country – for instance it is strong in the UK but weak in France).

This should not come as a surprise: the weakness of professional databases for the European VC industry (such as Thomson and Dow Jones VentureSource) was already reported by EarlyBird Venture Capital in 2011:

Fig^ 8

Source: EarlyBird, EVCA, Preqin database: http://www.slideshare.net/earlybirdjason/earlybird-europe-venture-capital-report

To conclude, whilst my analysis shows that there is room for improvement at a country level, the fact remains that CrunchBase is already doing as good a job as any other existing database when it comes to keeping track of European VC investment activity. If CrunchBase were to actively launch and promote the CrunchBase Venture Programme in Europe, it would probably turn CrunchBase into the uncontested #1 database for the European start-up eco-system. And the best thing about it? Unlike its competitors, CrunchBase is free!