Beware Of Committing Harakiri By Lying With Big Data

I recently saw the following post on LinkedIn:

“95% of banks in the study have created innovation labs.”

This figure seemed extremely high to me. I did a quick-and-dirty survey of three banks in my circle. Not one of them has an innovation lab.

Nevertheless, I couldn't conclude that the author of the post was lying: I found no evidence that he had used any of the three techniques for lying with Big Data that I described a couple of years ago. Nor could I spot any sleight of hand in his analysis.

I was about to reconcile myself to the 95% figure when I read the following comment by fintech thought leader Alex Jimenez:

“95% of the banks with innovations labs have innovation labs (one closed when they ran out of beer).”

Hmmm. Alex had a point.

I then saw the following snarky interpretation by another fintech thought leader, Ron Shevlin:

“No issue. It said 95% of the bank IN THE STUDY. Clearly, the study was a handpicked sample of banks that have innovation labs. And one bank that didn't.”

Ron's emphasis on “IN THE STUDY” gave me an epiphany: this could be a brand-new way to lie with Big Data.

Let me call it:

#4. Pixie Dust Sample

In this method, you compile a cohort whose members support your hypothesis. You then add a small sprinkling of truly random subjects to give the impression that your sample is representative of the population. When you run the survey on this sample, you'll obviously be able to prove whatever you set out to prove.

In a properly conducted survey, you'd draw a random sample of banks and ask each bank whether it has an innovation lab. I'd expect such a survey to reveal that 15-20% of banks have innovation labs (notwithstanding the 0% result of my personal survey).
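To make "properly conducted" a little more concrete: a simple random sample yields an estimate together with a margin of error. The sketch below assumes a hypothetical population of 1,000 banks, 18% of which have an innovation lab (a number picked only because it falls in the 15-20% range guessed above), and uses the textbook normal-approximation (Wald) interval, ignoring the finite-population correction. The function name `estimate_share_with_lab` is purely illustrative.

```python
import math
import random


def estimate_share_with_lab(sample_flags):
    """Return the sample proportion and a 95% normal-approximation (Wald) interval.

    Ignores the finite-population correction, so the interval is slightly
    wider than necessary when sampling without replacement from a small population.
    """
    n = len(sample_flags)
    p_hat = sum(sample_flags) / n
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical population: 1,000 banks, 180 of which (18%) have an innovation lab.
population = [True] * 180 + [False] * 820

rng = random.Random(42)
sample = rng.sample(population, 200)  # simple random sample, without replacement

p_hat, low, high = estimate_share_with_lab(sample)
print(f"Estimated share with a lab: {p_hat:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Every bank in the population has the same chance of being asked, which is exactly the property the pixie dust sample throws away.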

In the pixie dust sample method, you compile a sample by Googling for "banks with innovation labs". You ask each bank in that sample whether it has an innovation lab. You'd expect all of them to say yes. But, realistically, Google is not infallible, so its search results might contain a few erroneous entries for banks that don't have an innovation lab. Besides, a couple of banks might have shut down their innovation labs because they ran out of beer or for some other reason. Ergo, 95%, not 100%, of banks in your survey would say yes.
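A small simulation shows why the pixie dust sample produces 95% regardless of the true figure. Everything in it is hypothetical: the population of 1,000 banks, the 18% share with labs, and the 95/5 split of the pixie dust sample, where the 5 stray banks stand in for erroneous search results or labs that have since closed.

```python
import random


def survey_yes_rate(sample):
    """Fraction of sampled banks that answer 'yes, we have an innovation lab'."""
    return sum(bank["has_lab"] for bank in sample) / len(sample)


rng = random.Random(7)

# Hypothetical population: 1,000 banks, the first 180 (18%) have an innovation lab.
population = [{"id": i, "has_lab": i < 180} for i in range(1000)]
labs = [b for b in population if b["has_lab"]]
no_labs = [b for b in population if not b["has_lab"]]

# Properly conducted survey: a simple random sample of the whole population.
random_sample = rng.sample(population, 100)

# Pixie dust sample: 95 banks found by searching for "banks with innovation labs"
# (all of which really have one), plus 5 banks without a lab.
pixie_sample = rng.sample(labs, 95) + rng.sample(no_labs, 5)

print(f"Random sample:     {survey_yes_rate(random_sample):.0%} say yes")
print(f"Pixie dust sample: {survey_yes_rate(pixie_sample):.0%} say yes")
```

The random sample lands near the true 18%, while the pixie dust sample returns 95% by construction: the answer is fixed when the sample is compiled, before a single question is asked.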


This LinkedIn post brought back memories of a fintech whose "pay later" product I'd trialed a while ago.

A few months later, I got a call from a market research agency asking whether I'd heard of this deferred payment product.

I said yes.

The caller said thanks and hung up without asking any further questions.

I haven't heard of this fintech since.

Connecting the dots, I guess this is what happened behind the scenes:

  • The fintech wanted to gauge its brand awareness and appointed a market research (MR) agency to carry out a survey
  • The agency asked for big bucks to compile a statistically representative sample
  • The fintech balked at the cost
  • In the true spirit of partnership, the agency said, "Okay, give us a list of people to poll and we won't charge you anything for compiling the sample."
  • The fintech agreed to reciprocate the agency's overtures of partnership. After scouring all its hard disks, cloud storage and USB sticks, the fintech could come up with only one list in good enough shape: its existing customer list. It handed that list over to the agency.
  • Priding itself on its bias for action rather than talk, the fintech neglected to mention to the agency that the list consisted of existing customers
  • The agency conducted the survey on this pixie dust sample.

Most people, like me, said yes when asked whether they'd heard of the pay later product. A few might have forgotten about it and said no. Ergo, the survey found that "95% of people have heard about the fintech's pay later product".
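Had anyone compared the sample's make-up with the population it was meant to represent, the problem would have been obvious. Here is a minimal sketch of such a check, assuming the agency could tell existing customers from everyone else and knew roughly what share of the target population they form. The 1% share, the 500 respondents and both function names are hypothetical.

```python
def representation_ratios(sample_counts, population_shares):
    """Ratio of each group's share in the sample to its share in the population.

    A ratio near 1 means the group is represented in proportion; a ratio far
    above or below 1 means it is over- or under-represented.
    """
    total = sum(sample_counts.values())
    return {
        group: (sample_counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }


def flag_skewed_groups(ratios, tolerance=0.5):
    """Return the groups whose representation ratio lies outside 1 +/- tolerance."""
    return sorted(group for group, ratio in ratios.items() if abs(ratio - 1) > tolerance)


# Hypothetical figures: existing customers are 1% of the target population,
# but every respondent came from the customer list.
population_shares = {"existing_customer": 0.01, "non_customer": 0.99}
sample_counts = {"existing_customer": 500, "non_customer": 0}

ratios = representation_ratios(sample_counts, population_shares)
print(ratios)
print(flag_skewed_groups(ratios))
```

With these numbers, existing customers are over-represented a hundredfold and everyone else is missing entirely, so the survey measures awareness among the customer list, not the market. Proper survey design goes further (stratified sampling, weighting), but even this crude ratio would have exposed the pixie dust.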

The fintech was delighted to hear that its pay later product enjoyed such high brand awareness and promptly reported this figure to its VCs. Since VCs tend to have short attention spans, the founders didn't get into the details of the survey methodology or the composition of the sample. The VCs concluded that the fintech's brand awareness was already very high and directed the founders to stop all marketing campaigns.

Unsurprisingly, the fintech has disappeared from the market.


Companies that use the three techniques described in my blog post How To Lie With Big Data can reap benefits, at least in the short term, before they're exposed.

However, anybody who uses the fourth tactic outlined in this post would be committing harakiri (aka seppuku, aka suicide) by lying with Big Data, as this fintech did, unwittingly or otherwise.

Regardless of the exact status of this fintech or the accuracy of my conjecture about what happened behind the scenes, the purpose of this post is to highlight a real risk: lying with Big Data can have the unintended consequence of killing the liar.


Comments: (1)

A Finextra member, 23 July, 2018, 14:46

Good insight, nicely written. I recently attended a talk on how to do AI responsibly and avoid unintended consequences. What you mention above could result in unintended consequences if bias is not checked and the urge to be first is not resisted. It is important to be truthful and unbiased; at the very least, there need to be good values and practices around using big data.

Now hiring