
Data Analytics: As Good As the User

Numbers are persuasive, and visual representations add the ‘wow’ factor. Both are made possible by today’s sophisticated tools, which churn large amounts of data into beautiful, picturesque dashboards. Some look like abstract art that would certainly have thrilled Picasso. CxOs use these to make ‘fact-based’ business decisions. I met a CxO whose screen saver was a series of dashboards; she mentioned with pride that she had her finger on the pulse of the business and that her executives ran the business on quantitative facts. Very impressive. It did not take long for the company to run into trouble. The primary reason was that the attractive dashboards showed numbers that were not truly representative. What I am driving at is that the user of analytics needs a full understanding of the underlying algorithm that produces the numbers. This is a lifesaver, and it requires shredding the masterpiece of a dashboard and poking one’s head a bit into the unknown. Data in itself is an asset: digital gold. The problem lies in the packaging.

I list the five greatest ‘devils’ that result from poor packaging and can turn analytics into a punching bag for relieving stress.

  1. ‘Averages’ are wonderful for condensing a wide spectrum of information into one number, but that is only good for getting a perspective. The ‘devil’ is in the outliers, which can skew the average either way. If you ever need to make a decision on averages alone, which is not recommended, look closely at the dataset. I would suggest asking the data scientist to describe the underlying data in plain English.
  2. ‘Percentages’ give muscle when thrown into a discussion. To illustrate: 33.33 percent of university graduates end up marrying their professors. This is something; we straighten our spines and sit erect. The truth is that the class had 3 students, and one married her senior, who had been awarded a PhD and taken up a teaching assignment. The ‘devil’ is not only in the size of the numbers; it is also in the consistency of the qualitative data.
  3. A KPI (key performance indicator) is only an indicator. It is not a decision-making tool. A KPI can be an average, a percentage or a raw number. The ‘devil’ is the name ‘KPI’, which throws one off guard. Therefore, when you see that your business has grown, it always makes sense to pick up the phone and make enquiries with the right person.
  4. Data scientists are smart. They do a very good job of explaining the bell curve, the standard deviation and the mean. The ‘devil’ is that they speak a language that sounds like English, and after an hour of conversation we are where we started, with a whiteboard full of diagrams showing the ‘X’ axis and ‘Y’ axis clearly and the word ‘data’ splattered generously. It is time well spent to learn some statistics to survive such an ordeal and make sense of the whiteboard rather than look blankly at it.
  5. ‘Sampling’ is great; however, the sample has to be representative. The ‘devil’ is that samples can be deliberately biased to get the expected results. Make sure the size of a sample has some relation to the size of data that is interpreted. It all comes down to the central limit theorem.
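
The first two devils are easy to see with a few lines of Python. The sketch below is only an illustration: the deal sizes are made-up numbers, and `wilson_interval` is a helper written for this example (it uses the standard Wilson score interval for a proportion, which is one common choice, not something the dashboards above necessarily use).

```python
import math
import statistics

# Hypothetical deal sizes (in thousands); the last value is an outlier.
deal_sizes = [12, 14, 13, 15, 11, 14, 13, 250]

mean_size = statistics.mean(deal_sizes)      # pulled upward by the single outlier
median_size = statistics.median(deal_sizes)  # barely affected by the outlier

print(f"mean   = {mean_size}")    # 42.75
print(f"median = {median_size}")  # 13.5


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (illustrative helper)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# "33.33 percent" from a class of 3 students: 1 success out of 3.
low, high = wilson_interval(1, 3)
print(f"33.33% from n=3 is consistent with roughly {low:.0%} to {high:.0%}")
```

The mean suggests a typical deal of about 43, while seven of the eight deals are 15 or less. Likewise, the interval around "33.33 percent" runs from roughly 6% to 79%, which is another way of saying that a percentage based on three observations tells us almost nothing.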

When an analytics project is under consideration, it is necessary to have a project team that is conversant with the business and has sound knowledge of the principles of statistics. Secondly, any data source that is excluded from the project must be ring-fenced and the risk from its exclusion assessed. Thirdly, as the project progresses, each algorithm that is coded must be reviewed, tested and ‘dry run’ before moving into production. Finally, the biggest devil: the documentation must be comprehensive, include generous examples (use cases, if you will) of how the data is used, and clearly describe the outcome.

The one reality for any flavor of analytics is a necessary reliance on historical data; it does not matter whether the analytics is descriptive, predictive or prescriptive. Make sure the data is accurate.

Now to the cliché: data is the new oil. Oil in itself cannot run an automobile unless it is processed in the right way.



Comments: (3)

Ketharaman Swaminathan - GTM360 Marketing Solutions - Pune 17 December, 2018, 14:46

Re. "Make sure the size of a sample has some relation to the size of data that is interpreted." This is old wisdom that we've known for a long time.

However, lately, I find many sample sizes of ~2000, regardless of the size of the universe. When I looked around, I read in some random place that a sample size beyond 2000 does not increase the confidence level of the results, however large the population is beyond 20000.

This seems to contradict conventional wisdom. Keen to know your views on whether this is yet another example of "how to lie with statistics" or the outcome of some drastic advancement in the field of statistics that totally upends old wisdom about sample sizes.

Vishwanath Thanalapatti - Temenos - Canada 17 December, 2018, 18:34

I am suggesting two different attributes of a sample. The first is that the sample has to represent the population; for accuracy, two separate random samples may help. The second is the sample size itself: the larger the better. A 95% confidence level is good enough for drawing conclusions.

Ketharaman Swaminathan - GTM360 Marketing Solutions - Pune 17 December, 2018, 18:43

I know, but the source I referenced somewhat contradicts your second assertion, "the larger the better". I wonder if the resolution of the conundrum lies in the proposition that it's extremely hard to create a representative sample of just 2000 if the population size is high, i.e. that it becomes impossible to fulfill your first attribute.
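
As a rough check on the sample-size question discussed in this thread, the sketch below computes the textbook margin of error for a proportion at 95% confidence, with the finite population correction. It assumes simple random sampling and the worst case p = 0.5; the function name `margin_of_error` and the population figures are chosen here purely for illustration.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n.

    Assumes simple random sampling; z = 1.96 corresponds to 95% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: matters only when n is a sizeable share of the population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


for n, population in [(2000, 20_000), (2000, 20_000_000), (8000, 20_000_000)]:
    moe = margin_of_error(n, population)
    print(f"n={n:>5}, population={population:>10,}: +/- {moe:.2%}")
```

Under these assumptions, a sample of 2000 gives a margin of about 2.1 percentage points for a population of 20,000 and about 2.2 points for a population of 20 million, so the population size hardly matters once it is much larger than the sample. A larger sample still narrows the margin (quadrupling it to 8000 roughly halves it), but with diminishing returns. None of this addresses the first attribute: the formula says nothing about whether the sample is representative.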