
When Big Data is not Big Understanding

Good article from Tim Harford (he of the enjoyable "Undercover Economist" books) in the FT last week, called "Big data: are we making a big mistake?". Tim injects some healthy realism into the Big Data hype without dismissing its importance and potential benefits. The article discusses four claims that are often made about Big Data:

  1. Data analysis often produces uncannily accurate results
  2. Capturing all the data makes statistical sampling obsolete
  3. Statistical correlation is all you need - no need to understand causation
  4. Enough data means that scientific or statistical models aren't needed

Models can have their own problems, but I can see where he is coming from; for instance, claims 3 and 4 above seem to be in direct contradiction. I particularly like the comment later in the article that "causality won't be discarded, but it is being knocked off its pedestal as the primary fountain of meaning."

I also liked one academic's definition, quoted in the article, of a big data set as one where "N = All"; the assumption that you really do have "all" the data is an incorrect one behind some of the Big Data analysis put forward. Large data sets can keep sampling error low, but sampling bias is still a potentially big problem: Twitter users, for example, are probably not representative of the human population in general.
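To illustrate the difference between sampling error and sampling bias, here is a minimal Python sketch on an entirely made-up population. It assumes that 20% of people use some hypothetical platform and that those users score higher on an arbitrary attribute than everyone else; the function names are illustrative only. A modest simple random sample lands close to the true mean, while the much larger "platform-only" data set is precise but wrong.

```python
import random
import statistics


def make_population(size, seed=0):
    """Hypothetical population of (on_platform, value) pairs."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        on_platform = rng.random() < 0.2  # assume 20% use the platform
        # Assume platform users score higher on the attribute on average.
        value = rng.gauss(60 if on_platform else 40, 10)
        population.append((on_platform, value))
    return population


def simple_random_sample_mean(population, n, seed=1):
    """Mean of a simple random sample of n people (small, but unbiased)."""
    rng = random.Random(seed)
    return statistics.fmean(value for _, value in rng.sample(population, n))


def platform_only_mean(population):
    """Mean over every platform user and nobody else (large, but biased)."""
    return statistics.fmean(value for on, value in population if on)


pop = make_population(200_000)
true_mean = statistics.fmean(value for _, value in pop)
small_random = simple_random_sample_mean(pop, 1_000)
big_biased = platform_only_mean(pop)

print(f"true mean:                  {true_mean:6.2f}")
print(f"random sample (n=1,000):    {small_random:6.2f}")
print(f"platform-only (~40k users): {big_biased:6.2f}")
```

Adding more platform users would only shrink the random noise around the wrong answer; the bias comes from who is in the data, not how many.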

So I will now press save on this blog post and help reinforce the impression that Big Data is a hot topic... which it is, though not for everyone, and I guess that is the point.


Retired Member. Member since 19 Mar 2009; 6,023 blog posts; 6,224 comments.

This post is from a series of posts in the group:

Data Management 101

A community blog about data and how to manage it

