Correlation Is Not Causation: 7 Famous Examples That Prove It
Here's a fun fact: between 2000 and 2009, per-capita margarine consumption in the United States tracked the divorce rate in Maine almost year for year. The correlation coefficient? A startling 0.99, near-perfect statistical alignment. Should we ban margarine to save marriages? Obviously not. But this is exactly the kind of reasoning trap that catches smart, educated people every single day.
The phrase "correlation is not causation" has become something of a statistical cliché — repeated so often it's lost its teeth. So instead of explaining the concept abstractly, let's look at seven real examples where the data looked genuinely convincing, where people actually changed their beliefs or policies, and where the whole thing later collapsed under scrutiny. By the end, you'll have a sharper instinct for spotting these traps in the wild — in news headlines, health studies, and especially in the kinds of data visualizations that go viral.
1. Nicolas Cage Films and Pool Drownings
Tyler Vigen's Spurious Correlations project is a goldmine, but this one takes the crown. Between 1999 and 2009, the number of people who drowned by falling into swimming pools rose and fell with the number of Nicolas Cage films released that year, with a correlation coefficient of 0.666. That is far from perfect, but much stronger than two unrelated series have any right to show.
It's funny, which is the point. The joke works because we immediately see that these two things are completely unrelated. There is no plausible mechanism connecting Nicolas Cage's filmography to accidental drownings. When a proposed causal link is this absurd, our intuition correctly rejects it.
The lesson: Your intuition about plausible mechanisms matters. When a correlation lacks any coherent explanation for why A would cause B, that's your first red flag. Unfortunately, most real-world spurious correlations are far less obviously absurd — which is why they fool us.
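One way to see why coincidences like this are easy to find: short yearly series have very few data points, so if you compare enough unrelated pairs, some will line up by chance. The sketch below is only an illustration under invented assumptions (200 made-up series modeled as random walks over 11 "years", not Vigen's actual data or method). It searches every pair for the strongest correlation.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def random_walk(rng, length):
    """A drifting series: each value is the previous one plus random noise."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical setup: 200 unrelated series, 11 yearly values each.
series = [random_walk(rng, 11) for _ in range(200)]

# Search all pairs for the strongest correlation, positive or negative.
best = max(abs(pearson(a, b)) for a, b in itertools.combinations(series, 2))
print(f"strongest |r| among {200 * 199 // 2} unrelated pairs: {best:.3f}")
```

Every series here is generated independently, so any strong correlation the search turns up is coincidence by construction.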
2. Ice Cream Sales and Violent Crime
This one has been used in statistics classrooms for decades, but it still hits hard. Cities consistently record higher rates of violent crime during months when ice cream sales peak. The pattern is robust, it replicates across different cities and different years, and the numbers are real.
So does ice cream make people violent? Of course not. The hidden variable, the confounder, is temperature. Hot weather both drives ice cream consumption and keeps people outside longer, creating more opportunities for interpersonal conflict. Compare only days with similar temperatures, and the ice cream-crime correlation largely evaporates.
The lesson: Confounding variables are the most common cause of misleading correlations. A third factor — one you didn't measure or didn't think to control for — is driving both variables simultaneously. Always ask: "Is there something else that could be influencing both of these things at once?"
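The confounding pattern can be reproduced with a small simulation. This is a sketch with invented numbers, not real sales or crime data: temperature drives both series, and neither series affects the other. The names `ice_cream`, `crime`, and the coefficients are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented model: temperature (the confounder) feeds both variables.
temps = [rng.uniform(0, 35) for _ in range(5000)]            # degrees C
ice_cream = [10 + 2.0 * t + rng.gauss(0, 8) for t in temps]  # sales index
crime = [20 + 0.5 * t + rng.gauss(0, 4) for t in temps]      # incident index

# Across all days the two series move together.
r_all = pearson(ice_cream, crime)

# Hold the confounder roughly fixed: keep only days between 20 and 22 C.
band = [(i, c) for t, i, c in zip(temps, ice_cream, crime) if 20 <= t <= 22]
r_band = pearson([i for i, _ in band], [c for _, c in band])

print(f"all days:            r = {r_all:.2f}")
print(f"similar-temperature: r = {r_band:.2f}")
```

In this toy model the all-days correlation is strongly positive while the similar-temperature correlation sits near zero, which is what you would expect when a third variable is doing all the work.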
3. Shoe Size and Reading Ability in Children
Researchers studying elementary school children found a strong positive correlation between shoe size and reading comprehension scores. Bigger feet, better readers. The data is consistent and statistically significant.
The confounder here is age. Older children have bigger feet. Older children also read better, because they've had more practice and instruction. Once you control for age, the shoe size correlation disappears entirely. Shoe size is a proxy for a variable that actually matters.
The lesson: Proxy variables sneak into data analysis constantly. A measured variable that correlates with an unmeasured cause can appear to be the cause itself. This is why controlling for obvious confounders in statistical research isn't optional — it's the whole job.
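"Controlling for age" can be made concrete with a partial correlation: remove the part of each variable that a straight-line fit on age explains, then correlate what is left. The sketch below uses made-up numbers for children aged 6 to 11. The `residuals` helper and every coefficient are illustrative assumptions, not values from any study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(1)  # fixed seed so the run is repeatable

# Invented model: age drives both shoe size and reading score.
ages = [rng.uniform(6, 11) for _ in range(2000)]
shoe = [15 + 1.2 * a + rng.gauss(0, 1.0) for a in ages]
reading = [10 * a + rng.gauss(0, 8) for a in ages]

r_raw = pearson(shoe, reading)
# Partial correlation: correlate the age-adjusted leftovers.
r_partial = pearson(residuals(shoe, ages), residuals(reading, ages))

print(f"raw correlation:          {r_raw:.2f}")
print(f"after adjusting for age:  {r_partial:.2f}")
```

The residual approach assumes the relationship with age is roughly linear. Real analyses often use regression with several covariates instead, but the idea is the same.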
4. The Helmet and Head Injury Paradox
This one sounds alarming: some studies found that hospital admissions for head injuries were disproportionately high among cyclists who were wearing helmets at the time of the accident. Does this mean helmets cause more injuries?
No — and understanding why reveals a subtler statistical trap called selection bias. Cyclists who wear helmets are more likely to engage in riskier cycling behavior (faster speeds, more traffic, longer distances) than those who don't bother with helmets. The hospital data was capturing the behavior pattern, not a helmet effect. Within equivalent risk groups, helmets absolutely reduce injury severity.
The lesson: Who ends up in your dataset matters enormously. If the sample is systematically skewed, so that people with certain characteristics are more or less likely to show up in your data, the correlations you find can point in completely the wrong direction. Epidemiologists call one well-known version of this problem collider bias, and it trips up even experienced researchers.
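The helmet explanation can be sketched as a comparison within risk groups. All probabilities below are invented for illustration and are not estimates from any cycling study: high-risk riders are assumed to wear helmets more often, and a helmet is assumed to halve the chance of a head injury at either risk level.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

# counts[(risk, helmet)] = [number of riders, number injured]
counts = {(risk, helmet): [0, 0]
          for risk in ("low", "high") for helmet in (True, False)}

for _ in range(100_000):
    risk = "high" if rng.random() < 0.5 else "low"
    # Assumption: riskier riders are far more likely to wear a helmet.
    helmet = rng.random() < (0.9 if risk == "high" else 0.1)
    # Assumption: baseline injury chance depends on risk; a helmet halves it.
    base = 0.20 if risk == "high" else 0.04
    injured = rng.random() < base * (0.5 if helmet else 1.0)
    counts[(risk, helmet)][0] += 1
    counts[(risk, helmet)][1] += injured


def rate(keys):
    """Injury rate pooled over the given (risk, helmet) groups."""
    riders = sum(counts[k][0] for k in keys)
    hurt = sum(counts[k][1] for k in keys)
    return hurt / riders


pooled_helmet = rate([("low", True), ("high", True)])
pooled_bare = rate([("low", False), ("high", False)])
print(f"pooled: helmet {pooled_helmet:.3f} vs no helmet {pooled_bare:.3f}")

for risk in ("low", "high"):
    with_h, without_h = rate([(risk, True)]), rate([(risk, False)])
    print(f"{risk:>4} risk: helmet {with_h:.3f} vs no helmet {without_h:.3f}")
```

In this toy model helmets look harmful in the pooled comparison yet protective inside each risk group, because the pooled helmet group is dominated by high-risk riders.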
5. Nobel Prizes and Chocolate Consumption
In 2012, a paper published in the New England Journal of Medicine — yes, a real, serious medical journal — showed a strong correlation between per-capita chocolate consumption in a country and the number of Nobel laureates that country has produced per 10 million citizens. The correlation was 0.791, which is substantial.
The paper was written somewhat tongue-in-cheek, but it got serious media coverage, and many readers took it at face value. The actual explanation? Both variables correlate strongly with national wealth. Richer countries eat more chocolate (a luxury good) and also invest more in education, research, and institutions that produce Nobel Prize-winning work. Chocolate is a financial indicator in disguise.
The lesson: Wealth, education, and development are confounders in an enormous number of cross-country comparisons. Whenever you see a correlation between countries that involves any kind of consumption, health outcome, or cultural output, ask whether you're really just looking at GDP in different clothes.
6. Storks and Birth Rates in Europe
This is the classic that statistics professors have used for generations. In several European countries, regions with more storks have higher birth rates. The data is real. The storks are real. The babies are real. The correlation holds up.
The confounder? Rural land area. Storks nest in rural areas. Rural areas also tend to have larger families and higher birth rates than urban centers, often for economic and cultural reasons that have nothing whatsoever to do with storks. The stork population is a proxy for rurality; rurality drives birth rates; the stork-birth correlation is entirely an artifact.
The lesson: Geographic and demographic confounders are extremely common in public health and social science research. Whenever a correlation is observed across regions or communities, ask whether you're really measuring a difference in the underlying population rather than a direct relationship between the variables.
7. The Famous Breakfast Cereal Effect
Multiple studies over the years have found that children who eat breakfast perform better in school — better grades, better concentration, better test scores. This is probably real and probably causal. But here's where it gets complicated: studies also showed that specifically eating breakfast cereal was associated with academic performance, and cereal companies happily promoted this finding for decades.
The issue is that the cereal effect largely disappears once you control for household income and stability. Families who sit down for breakfast together, who can afford packaged cereal, and who maintain consistent morning routines tend to be in more stable economic circumstances. Those same families also invest more in education. The cereal was real, the correlation was real, but the causal arrow pointed somewhere completely different.
The lesson: Industry-funded research or research that supports convenient commercial conclusions deserves extra scrutiny. It's not that the researchers necessarily cheated — sometimes they genuinely didn't control for the right variables. But when a finding is commercially convenient, the pressure to not look too hard at confounders is real.
So How Do You Actually Tell the Difference?
After seven examples, a few practical questions emerge that you can apply whenever you encounter a striking correlation:
- Is there a plausible mechanism? Can you describe a coherent biological, social, or physical process by which A would cause B? If not, be skeptical.
- What's the obvious confounder? What third variable might be driving both A and B simultaneously? Temperature, wealth, age, and urbanization are responsible for more spurious correlations than almost anything else.
- Who's in the sample? If only certain types of people or regions or time periods ended up in the data, the correlation might be an artifact of that selection.
- Does the timing work? Causation requires that the cause precede the effect. If the supposed effect shows up first, or the two move in lockstep with no lag at all, that's a red flag.
- Has the finding replicated? A single study showing a correlation is almost never enough. A finding that replicates across different populations, methods, and research teams starts to carry real weight.
None of this means correlation is useless. Far from it. Correlational data gives us our first clues, points us toward hypotheses worth testing, and sometimes is the best evidence we can practically gather. The link between smoking and lung cancer was initially a correlational finding. So was the link between lead exposure and impaired cognitive development. Both turned out to be causal, and both changed public policy in important ways.
The goal isn't to dismiss every correlation with a wave of your hand. The goal is to hold it at arm's length long enough to ask the right questions — and to resist the deeply human temptation to see a pattern in data and immediately spin it into a story. Data is very good at showing us that two things move together. It takes a lot more work — careful study design, randomization, controlling for confounders, replication — to figure out why.
Until you've done that work, the humble "correlation is not causation" isn't a cliché. It's a firewall against bad reasoning. Keep it maintained.