Data purists would rap my knuckles for asking this question and reply, "Never".
On the other hand, "data sophists" who are accustomed to lying with Big Data in even cruder ways would wonder, "Duh, they're the same, no?"
If you don't belong to either camp, you might pause and wonder if there's a golden mean between the two extremes. Like me.
Let me use two examples to get a feel for when correlation can equal causation and when it can't.
Correlation: US spending on science, space and technology goes up or down in tandem with suicides by hanging, strangulation and suffocation.
Source: Spurious Correlations (http://tylervigen.com/)
<see graphic at bottom of post>
Causation: If suicides by hanging etc. go up, US spending on science etc. will also go up.
Action: Monitor suicide rate by hanging. If it goes up, release more budget for R&D. If it goes down, downsize R&D.
Even a diehard Data Sophist would intuitively accept that correlation does not equal causation in this case.
Correlation: Compared to other zip codes, there's a significantly higher attach rate of business loans with home loans in zip code 23508.
Causation: If home loans go up in zip code 23508, business loans will also go up.
Action: Monitor home loan volume. If it goes up, source additional funds for business loans. If it goes down, release funds earmarked for business loans.
As we saw in "Fate Of Predictive Analytics After Obama Credit Card Decline", the correlation made business sense when the bank in question discovered that its business banking and retail banking sales people sat in the same office in Norfolk (zip code 23508), a practice that led to better exchange of market information. Therefore, intuitively, we can agree that correlation could mean causation in this case.
(Notice my frequent use of intuition. It's intentional: even after all the numbers are collected, crunched and visualized, many business decisions are guided by the gut to some extent, at least the heuristic ones like developing a marketing plan, writing a book or recruiting a sales rep.)
To summarize, correlation does not equal causation in the first example and may equal causation in the second one.
By abstracting the basic differences between the two examples, I propose that correlation can equal causation if the following three conditions are met:
- The measured variables belong to the same domain
- The correlation makes some business sense in itself
- The causation can be validated by backtesting with past data
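For the third condition, here is a minimal sketch of what a backtest could look like in Python. It is only one possible reading of "backtesting": split the history into an earlier window and a later holdout window, and check whether the correlation found in the first still shows up in the second. The function names, the 0.7 threshold and the monthly loan counts are all my own assumptions, made up for illustration; they are not the bank's actual data or method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def backtest_correlation(xs, ys, split, threshold=0.7):
    """Compare the correlation before and after index `split`.

    Returns (r_past, r_holdout, held_up). `held_up` is True only if both
    windows show a correlation of the same sign and at least `threshold`
    in strength. The threshold is an arbitrary, illustrative cut-off.
    """
    r_past = pearson(xs[:split], ys[:split])
    r_holdout = pearson(xs[split:], ys[split:])
    same_sign = (r_past > 0) == (r_holdout > 0)
    strong = abs(r_past) >= threshold and abs(r_holdout) >= threshold
    return r_past, r_holdout, same_sign and strong


# Hypothetical monthly loan counts for one zip code (made-up numbers).
home_loans = [40, 42, 45, 43, 48, 50, 53, 51, 55, 58, 57, 60]
business_loans = [10, 11, 12, 11, 13, 14, 15, 14, 16, 17, 16, 18]

# Use the first 8 months as "past data" and the last 4 as the holdout.
r_past, r_holdout, held_up = backtest_correlation(home_loans, business_loans, split=8)
print(f"past r = {r_past:.2f}, holdout r = {r_holdout:.2f}, held up: {held_up}")
```

If the relationship falls apart in the holdout window, that is a good reason to distrust the causal story, however much business sense it seemed to make. If it holds, the backtest supports the story without settling it on its own, which is why the first two conditions still matter.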
So, to answer the question posed in the title of this post,
Correlation can equal causation - sometimes!