How To Lie With Big Data

Darrell Huff's classic How To Lie With Statistics is as relevant today as it was when it was published 60 years ago. It's probably even more relevant now: more data has been generated in the last three years than in the entire previous history of mankind, so there's far more raw material for lying than ever before.

Whether it's statistics or big data, the goal of lying is still the same: Draw a favorable conclusion by ignoring unfavorable alternatives.

Let's take a few examples of fooling around with data to see how the lying techniques have evolved.   

#1. Faux Data

"I polled my 1000+ LinkedIn Connections to ask them whether they preferred fingerprint or password to access their Mobile Banking app. 80% of respondents voted in favor of Fingerprint."

It's intuitively clear that people will prefer the convenience of a fingerprint over the pain of entering a long password on the virtual keyboard of a smartphone. That said, we crave data before agreeing or disagreeing with anything. This statement slakes our thirst for numbers, so we readily give it the thumbs up.

There's only one problem: The aforementioned statement implies that 800 people voted for fingerprint. That's totally wrong, as you can see by looking up the actual figures here.

This technique works by using verbal sleight-of-hand. In this example, the wordplay is on the word "respondents".
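To see why the word "respondents" matters, here is a minimal Python sketch. The 1000 connections and the 80% share come from the statement above; the respondent count of 50 is purely hypothetical and is not the poll's real figure, which this post does not state. The helper `votes_for` is likewise an illustrative name.

```python
def votes_for(share_in_favor: float, respondents: int) -> int:
    """Number of people behind a percentage, given how many actually answered."""
    return round(share_in_favor * respondents)


connections = 1000     # audience size quoted in the statement
share_in_favor = 0.80  # share quoted in the statement

# What the statement invites you to assume: everyone who was polled responded.
implied = votes_for(share_in_favor, connections)

# HYPOTHETICAL respondent count, for illustration only.
# The real number of respondents is not given in this post.
hypothetical_respondents = 50
actual = votes_for(share_in_favor, hypothetical_respondents)

print(f"Votes implied by the statement: {implied}")
print(f"Votes if only {hypothetical_respondents} people responded: {actual}")
```

The same 80% can stand for very different head counts, which is why the number of respondents needs to be reported alongside the percentage.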

#2. Strategic Silence

"Sixty-two percent of consumers expect live chat to be available on mobile devices, and 82% would use it."

I read this in the following tweet by @rshevlin.

I agree with @rshevlin: "Nonsense. 62% of ppl don't know what live chat is".

If you probe deeper, you might find the truth unraveling as follows: "62% of shoppers who know about live chat want it on mobile devices". Considering that not more than 20% of consumers are likely to know about live chat, the above finding would translate to:

"12% of shoppers want live chat with brands on mobile devices".

(20% of 62% is 12.4%, or roughly 12%.)
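The adjustment above is a single multiplication, sketched here in Python. The 62% is the quoted survey figure and the 20% is this post's own estimate of how many consumers know about live chat; the function name `share_of_all` is illustrative only.

```python
def share_of_all(aware_share: float, want_share_among_aware: float) -> float:
    """Rescale a share measured within a subgroup to the whole population."""
    return aware_share * want_share_among_aware


aware = 0.20             # estimate from this post: consumers who know about live chat
want_among_aware = 0.62  # quoted figure, assumed to apply only to that subgroup

overall = share_of_all(aware, want_among_aware)
print(f"Share of all shoppers who want mobile live chat: {overall:.1%}")
```

The result moves in step with the awareness estimate, so a different guess for that 20% would change the headline figure proportionally.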

This won't help someone trying to sell mobile live chat. Hence the "strategic silence" on details.

#3. Exploiting Calculitis

"A growing digital advertising market has also seen increased adoption among B2B marketers."

Blogger Poornima Mohandas waxes eloquent about the popularity of PPC ads in B2B in her post “How to Create PPC Ads That Close Sales”.

Unfortunately, the data suggests that only about 15% of B2B marketers agree with her conclusion, which is disastrous for the author's obvious attempt to plug her white paper about PPC in her blog post.

So what happens?

We see the premise broken into two parts:

  1. Percentage of B2B marketers who use PPC: 48%
  2. Percentage of them who find PPC very effective: 32%.

Now, for somebody to like PPC, they need to have used it and found it effective. That figure works out to 32% of 48%, i.e. 15.36%. In other words, a vast majority of over 84% of B2B marketers don’t like PPC ads.
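For readers who would rather let a machine hold the calculator, the arithmetic above can be checked with a few lines of Python. This is only a sketch of the post's own reasoning: it treats "likes PPC" as "uses it and finds it very effective", and the variable names are illustrative.

```python
use_ppc = 0.48              # B2B marketers who use PPC
very_effective_given_use = 0.32  # of those users, the share who find it very effective

# Share of ALL B2B marketers who both use PPC and find it very effective.
like_ppc = use_ppc * very_effective_given_use

# Everyone else, under the post's reading of the two figures.
do_not_like_ppc = 1 - like_ppc

print(f"Use PPC and find it very effective: {like_ppc:.2%}")
print(f"Everyone else: {do_not_like_ppc:.2%}")
```

Multiplying is valid here because the second percentage is measured among the users counted by the first one, not among all marketers.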

But many readers afflicted by “calculitis” - the fear of using a calculator - might see the two percentages in isolation and say to themselves, "Wow, PPC is widely used" or "Hmm, PPC seems quite effective".

And you don't expect an author pitching a white paper on PPC to rush to correct that impression, do you?


I could go on and on with many more examples but they'd all prove the same thing: The techniques for lying with data are still crude.

While data has grown by leaps and bounds in the last 60 years, the tricks to lie with it haven't kept pace.

I leave it to you to decide whether that's a good or bad thing.


Comments: (1)

A Finextra member, 22 December, 2014, 07:19

This is applicable to all analytics; it goes by the old saying "Garbage in, garbage out". The quality and size of the data sample, among other factors, determine the output. But that does not indicate that something is wrong with the tools.
