Below is a short summary and detailed review of this video written by FutureFactual:
Correlation vs Causation in Statistics: A Cat Height Island Case and Causal Networks
Short summary
This video examines the familiar statistical warning that correlation does not imply causation, using a playful Cat Height Island thought experiment to show how a correlation alone leaves many causal explanations open. It then explains how additional information and causal networks can narrow down the likely causes, sometimes to just a couple of scenarios, and ends with a caveat from quantum mechanics about the limits of causal explanation for certain correlations.
- Correlation vs causation: why a link between height and cat ownership doesn’t tell you which way the cause flows.
- The Cat Height Island thought experiment reduces 19 possible causal relationships to two when extra information is available.
- Causal networks and multiple correlations can sometimes imply causation, with timeline considerations helping further refine possibilities.
- Even in statistics, there are exotic cases where correlations challenge traditional cause-and-effect ideas, such as in quantum phenomena.
Overview
The video tackles a core issue in statistics: correlation does not automatically imply causation. It emphasizes that a measured association between two variables, like height and cat ownership, is not enough to establish which variable is causing the other. The speaker uses a vivid example involving two islands to illustrate that the real underlying mechanism could be different or even involve a separate causal factor altogether. This sets the stage for a deeper look at how correlations can be used, cautiously, to infer causality when combined with additional information and causal modeling.
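The "separate causal factor" case can be made concrete with a small simulation. This is a minimal sketch, not data from the video: the island labels, mean heights, and cat-ownership rates are made-up parameters, and `simulate` and `pearson` are hypothetical helper names. The island alone sets both variables, yet the pooled data still shows a height-cat correlation.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_per_island=5000, seed=0):
    """Hypothetical data: the island alone sets mean height and cat-ownership rate."""
    rng = random.Random(seed)
    # Assumed parameters: (mean height in cm, probability of owning a cat).
    params = {"A": (180.0, 0.8), "B": (165.0, 0.2)}
    rows = []
    for island, (mean_height, p_cat) in params.items():
        for _ in range(n_per_island):
            height = rng.gauss(mean_height, 7.0)      # height ignores cat ownership
            has_cat = 1 if rng.random() < p_cat else 0  # cats ignore height
            rows.append((island, height, has_cat))
    return rows

rows = simulate()
pooled = pearson([h for _, h, _ in rows], [c for _, _, c in rows])
within = {
    island: pearson(
        [h for i, h, _ in rows if i == island],
        [c for i, _, c in rows if i == island],
    )
    for island in ("A", "B")
}
print(f"pooled correlation: {pooled:.2f}")
for island, r in within.items():
    print(f"island {island} only: {r:.2f}")
```

Under these assumed parameters the pooled correlation should come out clearly positive while each within-island correlation stays close to zero, even though neither height nor cat ownership influences the other in the simulated data.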
The Cat Height Island thought experiment
In the island scenario, height and cat ownership are correlated, but the direction of causality is unknown. The speaker enumerates 19 potential causal relationships that could explain the correlation, with a playful extra possibility that the correlation might be an accident, bringing the number to 20. The goal is to show the complexity of distinguishing cause from effect using correlation alone.
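One way to arrive at the figure of 19 is sketched below. It rests on an assumption that this summary does not confirm: that the count covers every directed acyclic graph over the three variables island, height, and cat ownership in which height and cats end up correlated. The function names are hypothetical, and the dependence test is a hand-rolled d-separation check that only handles this three-variable case.

```python
from itertools import product

NODES = ("island", "height", "cats")
PAIRS = (("island", "height"), ("island", "cats"), ("height", "cats"))

def has_cycle(edges):
    """Detect a directed cycle by repeatedly peeling off nodes with no incoming edge."""
    remaining = set(NODES)
    while remaining:
        sources = [
            n for n in remaining
            if not any(effect == n and cause in remaining for cause, effect in edges)
        ]
        if not sources:
            return True
        remaining -= set(sources)
    return False

def all_dags():
    """Yield every acyclic graph on NODES as a frozenset of (cause, effect) arrows."""
    for choice in product((None, "fwd", "rev"), repeat=len(PAIRS)):
        edges = set()
        for (a, b), direction in zip(PAIRS, choice):
            if direction == "fwd":
                edges.add((a, b))
            elif direction == "rev":
                edges.add((b, a))
        if not has_cycle(edges):
            yield frozenset(edges)

def height_cats_correlated(edges):
    """True if the graph predicts a pooled correlation between height and cats."""
    # A direct arrow in either direction always produces a correlation.
    if ("height", "cats") in edges or ("cats", "height") in edges:
        return True
    # Otherwise the only route runs through the island and needs both links.
    h_link = ("height", "island") in edges or ("island", "height") in edges
    c_link = ("cats", "island") in edges or ("island", "cats") in edges
    if not (h_link and c_link):
        return False
    # If both arrows point INTO the island (a collider), no correlation arises.
    collider = ("height", "island") in edges and ("cats", "island") in edges
    return not collider

dags = list(all_dags())
explanations = [g for g in dags if height_cats_correlated(g)]
print(len(dags), len(explanations))  # → 25 19
```

Of the 25 acyclic graphs, 16 contain a direct arrow between height and cats and 3 more connect them only through the island, which gives 19 under this reading. Coincidence, the twentieth possibility, is not a graph and is left out of the count.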
Narrowing the possibilities with extra information
The video then introduces two pieces of additional information that dramatically reduce the number of viable explanations. First, if people born on a particular island stay there for life, height cannot influence which island a person lives on, which rules out every relationship in which height affects location. Second, if height and cat ownership are uncorrelated within each island considered on its own, then any direct influence between height and cats can be ruled out. With these constraints, only two options remain: either the island causes both height and cat ownership, or cat ownership influences the island, which in turn influences height. This is a concrete demonstration of how correlations, when paired with structural assumptions, can narrow the field of possible causal relationships.
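The narrowing step can be sketched as a filter over candidate causal graphs. This is an illustration under stated assumptions, not the video's own procedure: the candidates are taken to be acyclic graphs over island, height, and cats, "no correlation within an island" is read as independence of height and cats once the island is held fixed, and all names are hypothetical.

```python
from itertools import product

PAIRS = (("island", "height"), ("island", "cats"), ("height", "cats"))
# With three variables and at most one arrow per pair, these are the only cycles.
TRIANGLES = (
    {("island", "height"), ("height", "cats"), ("cats", "island")},
    {("height", "island"), ("cats", "height"), ("island", "cats")},
)

def candidate_graphs():
    """Every acyclic way to orient (or omit) an arrow between each pair."""
    for states in product((0, 1, -1), repeat=3):
        edges = set()
        for (a, b), state in zip(PAIRS, states):
            if state == 1:
                edges.add((a, b))
            elif state == -1:
                edges.add((b, a))
        if edges not in TRIANGLES:
            yield frozenset(edges)

def linked(edges, a, b):
    return (a, b) in edges or (b, a) in edges

def height_cats_dependent(edges, within_island):
    """Does the graph predict a height-cat correlation, pooled or within one island?"""
    if linked(edges, "height", "cats"):
        return True  # a direct arrow shows up in both settings
    if not (linked(edges, "island", "height") and linked(edges, "island", "cats")):
        return False
    collider = ("height", "island") in edges and ("cats", "island") in edges
    # Holding the island fixed blocks forks and chains but opens a collider.
    return collider == within_island

survivors = [
    g for g in candidate_graphs()
    if height_cats_dependent(g, within_island=False)      # pooled correlation exists
    and not height_cats_dependent(g, within_island=True)  # none inside a single island
    and ("height", "island") not in g                     # height cannot pick the island
]
for g in survivors:
    print(sorted(g))
```

Two graphs survive: one with arrows from the island to both height and cats, and one with an arrow from cats to the island and from the island to height. These match the two options described above.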
Timeline, assumptions, and general lessons
If one had information about the order in which cats and people arrived on the islands, the possibilities might narrow further to a single explanation. The broader takeaway is that any group of related things can be analyzed through the patterns of correlations and non-correlations among them to eliminate implausible cause-effect links. This is the essence of how correlations can imply causation, given the right causal framework and prior information.
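The timeline argument amounts to discarding any explanation in which an effect precedes its cause. A minimal sketch follows; the two candidate explanations are the ones left by the earlier constraints, while the timelines and the helper name `respects_timeline` are hypothetical.

```python
# The two remaining explanations, written as (cause, effect) arrows.
candidates = {
    "island drives both": {("island", "height"), ("island", "cats")},
    "cats -> island -> height": {("cats", "island"), ("island", "height")},
}

def respects_timeline(edges, happened_before):
    """Reject any arrow whose effect is known to have come before its cause."""
    return all((effect, cause) not in happened_before for cause, effect in edges)

def remaining(happened_before):
    return [
        name for name, edges in candidates.items()
        if respects_timeline(edges, happened_before)
    ]

# Hypothetical timeline 1: people were settled on their islands before cats arrived.
people_first = remaining({("island", "cats")})
# Hypothetical timeline 2: the cats were there before people settled.
cats_first = remaining({("cats", "island")})

print(people_first)  # → ['island drives both']
print(cats_first)    # → ['cats -> island -> height']
```

Either assumed ordering leaves exactly one explanation standing, which is the sense in which arrival order could settle the question.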
Limitations and cautionary notes
The video acknowledges a notable exception: some quantum experiments produce correlations that defy classical causal explanations. While the focus remains on classical causal inference using correlations and causal models, this caveat reminds viewers that classical causal reasoning is not guaranteed to apply in every domain. The message is to use causal models and multiple correlations to deduce causality where appropriate, while remaining aware of the unusual cases where standard intuitions may fail.
Takeaways for data analysis
For practitioners, the key lesson is that correlations are informative but not definitive. By combining multiple correlations, temporal information about events, and a well-specified causal framework, one can often eliminate unlikely directions of causality and identify plausible mechanisms. This approach is central to causal inference and the disciplined interpretation of statistical associations in fields spanning science and engineering.
