Join the Community

22,178
Expert opinions
44,235
Total members
412
New members (last 30 days)
212
New opinions (last 30 days)
28,725
Total comments

Demystifying The Ubiquitous Sample Size Of 2000


Back in the day, we learned in statistics that you need a sample size of at least 2% of the population to draw statistically significant conclusions about the behavior of that population. In common parlance, "statistically significant" means "valid".

Nevertheless, if you're like me, you regularly come across surveys on populations of hundreds of millions of members that use sample sizes of 2000, which is way less than 2%. A few examples:

EXAMPLE 1

(https://twitter.com/pennycrosman/status/1075440678363107329)

EXAMPLE 2

Not even 2K...

(https://twitter.com/EuropeElects/status/1086794905580695558)

EXAMPLE 3

An Amazon Checking Account Could Displace $100 Billion In Bank Deposits (But It Won't)

EXAMPLE 4

Most Americans foresee death of cash in their lifetimes

EXAMPLE 5

Barely three percent of the 2000+ consumers surveyed by the FCA had made an investment in cryptoassets such as bitcoin and ether

----------

For reference, the population of Great Britain is ~60 million and that of the USA is ~300 million.

A sample of 2000 works out to roughly 0.0033 percent of the former and 0.00067 percent of the latter, both of which are well short of 2%.

Should we debunk the findings of such studies?

At one time, I thought yes.

But now I'm not so sure. Many of these studies were published by reputable media outlets that can't be dismissed so easily.

So, I decided to probe the topic further.

----------

I came across this online sample size calculator, which says that a sample size of 1006 yields results at a 95% confidence level with a 3% margin of error for a population of 300 million.

Not convinced by the above, I found a formula to calculate the margin of error here. When I plugged in the values, the results tallied with the above.
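The formula itself is not reproduced in this post. As a rough check, here is a minimal Python sketch of the textbook margin-of-error formula for a proportion estimated from a simple random sample, which may or may not be the exact one behind the link. The function name `margin_of_error` and its defaults (worst-case p = 0.5, z = 1.96 for 95% confidence) are assumptions made for illustration.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n.

    Assumptions: p=0.5 is the worst case, z=1.96 corresponds to 95% confidence.
    If a population size is given, the finite population correction is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # The correction only matters when n is a sizeable share of the population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error

for n in (1006, 2000):
    moe = margin_of_error(n, population=300_000_000)
    print(f"n={n}: +/- {moe * 100:.1f} percentage points")
```

Under these assumptions, a sample of 1006 gives a margin of about 3 percentage points and a sample of 2000 a little over 2, which is in line with the calculator's figure.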

I was also intrigued by the following line on the sample size calculator website:

“The sample size doesn't change much for populations larger than 20,000.”
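One way to see why this might be so is to run the numbers. The sketch below uses the standard sample-size formula for a proportion with a finite population correction. It is an assumption that the calculator works this way, and the name `required_sample_size` and the defaults (3% margin, p = 0.5, z = 1.96) are hypothetical choices for illustration.

```python
import math

def required_sample_size(population, margin=0.03, p=0.5, z=1.96):
    """Sample size needed for a given margin of error at ~95% confidence.

    Assumes simple random sampling and the worst-case proportion p=0.5.
    """
    # Size needed if the population were infinitely large.
    n_infinite = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction shrinks it for small populations.
    n = n_infinite / (1 + (n_infinite - 1) / population)
    return math.ceil(n)

for population in (1_000, 20_000, 1_000_000, 300_000_000):
    print(f"population {population:>11,}: sample of {required_sample_size(population)}")
```

With these inputs, the required sample grows quickly for small populations but barely moves between 20,000 and 300 million, because the correction term becomes negligible. The exact figures depend on the inputs a given calculator assumes, so they need not match the 1006 quoted above.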

----------

What gives?

I suspect it has something to do with the composition of samples and populations.

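Surveys tacitly assume that their samples are "truly representative" of the population. To judge whether that holds, we can be guided by the following maxim, which I call the Gallup Soup Principle and which is usually credited to pollster George Gallup: if a pot of soup has been given a good stir, a single spoonful is enough to tell you how the whole pot tastes.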

Some populations are homogeneous, i.e. of the same nature. For example, liquids that have been given a good stir. (Unlike James Bond's Martinis, which are shaken, not stirred!)

Other populations are heterogeneous, i.e. diverse in character. For example, the terrain of the earth.

A nation tends to be homogeneous on some parameters (e.g. nationality) but heterogeneous on some others (e.g. income).

For the sake of this post, homogeneous refers to "homogeneous by nature" as well as "becomes homogeneous after being given a good stir", and heterogeneous means "heterogeneous by nature" as well as "remains heterogeneous even after being given a good stir". Unlike liquids, many populations can't be stirred, so the "good stir" effectively happens by taking a random sample of the population.

For homogeneous populations, a sample of a mere 2K will exhibit the characteristics of the aforementioned spoonful of soup. Accordingly, results of surveys conducted with such a small sample could be valid for the entire population, however large it is.

For heterogeneous populations, a sample size of 2K is unlike the spoonful of soup. Accordingly, results of surveys conducted with such a sample may not be valid for the population. In fact, such surveys often yield misleading or contradictory results.

Misleading results

  • Indians don't speak Hindi (Survey of 2000 Indians in Tamil Nadu, a southern state of India in which Tamil is the local language)
  • America is all plains and has no mountains (Survey of the terrain of 2000 square miles of Kansas)
  • 95% of banks have innovation labs (Survey of 100% of banks that have innovation labs)
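The first example in the list can be mimicked with a toy simulation. The sketch below is purely illustrative: the two regions, their sizes and the trait rates are made-up numbers, and `make_person` and `share_with_trait` are hypothetical helper names. It compares a random sample of 2000 drawn from the whole population with a sample of 2000 drawn from a single region.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10% live in region "A", where 5% have the trait;
# the remaining 90% live in region "B", where 60% have the trait.
def make_person():
    region = "A" if rng.random() < 0.10 else "B"
    rate = 0.05 if region == "A" else 0.60
    return region, rng.random() < rate

population = [make_person() for _ in range(200_000)]

def share_with_trait(people):
    """Fraction of the given people who have the trait."""
    return sum(has_trait for _, has_trait in people) / len(people)

true_share = share_with_trait(population)

# "Well-stirred" sample: 2000 people drawn at random from everyone.
random_sample = rng.sample(population, 2000)

# "Unstirred" sample: 2000 people drawn from region A only.
region_a = [person for person in population if person[0] == "A"]
region_a_only = rng.sample(region_a, 2000)

print(f"whole population:        {true_share:.3f}")
print(f"random sample of 2000:   {share_with_trait(random_sample):.3f}")
print(f"2000 from region A only: {share_with_trait(region_a_only):.3f}")
```

In this made-up setup, the random draw should land close to the true share, while the single-region sample misses badly even though both contain 2000 people. That is consistent with the point above that the "good stir" comes from how the sample is picked.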

Contradictory results

  • Cash is dead v. Cash in circulation is growing
  • Branch is dead v. Banks are opening new branches
  • Omnichannel shopping is BS v. Book Online & Collect At Store is the future of retail, and so on.

----------

Whether a survey with a mere 2K sample size will deliver valid results for a population of millions depends on how homogeneous or heterogeneous the population is.

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
