Correlation and Causation

In elementary schools, it is possible to track a very interesting correlation relating to children’s test scores.  As it turns out, there is a very strong positive correlation between a child’s shoe size and the child’s performance on a standard spelling test.  On the surface, this seems very strange indeed.  How can it really be true that the bigger a child’s feet are, the better that child performs on spelling tests? 

The answer to this apparent mystery is simple:  as children age, several things happen.  One of these things is that their feet grow.  Another is that they consistently get better at spelling words.  So, shoe size is correlated with spelling ability; however, the only causal relationship at work in this example is that increasing age causes both of the initially noted variables to change. 
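To make the mechanism concrete, here is a small simulation sketch in Python. Everything in it is an assumption chosen for illustration: the age range, the growth rates, the noise levels, and the helper names (`simulate_children`, `pearson`, `residuals`) are hypothetical, not real measurements or an established method from this text. In the model, age is the only cause; shoe size and spelling score never influence each other. Removing the straight-line effect of age from both variables (a partial correlation) shows how much association is left once the common cause is accounted for.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


def simulate_children(n=5000, seed=0):
    """Hypothetical model: age drives both shoe size and spelling score,
    and neither of those two variables has any effect on the other."""
    rng = random.Random(seed)
    ages = [rng.randint(6, 11) for _ in range(n)]         # assumed age range
    shoes = [0.8 * a + rng.gauss(0, 0.7) for a in ages]   # feet grow with age
    scores = [9.0 * a + rng.gauss(0, 6.0) for a in ages]  # spelling improves with age
    return ages, shoes, scores


ages, shoes, scores = simulate_children()
raw = pearson(shoes, scores)
partial = pearson(residuals(shoes, ages), residuals(scores, ages))
print(f"shoe size vs. spelling score:        r = {raw:.2f}")
print(f"same, with age removed (partial r):  r = {partial:.2f}")
```

With these made-up settings the raw correlation comes out strongly positive (around 0.8), while the partial correlation after removing age is close to zero: the apparent link between feet and spelling is carried entirely by age.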

This is an illustration of one of the most important and oft-quoted points in the field of statistics:  correlation does not imply causation.  Just because two things reliably vary together does not mean that one has caused the other.  This is a simple point, but it bears repetition because of its titanic importance. 

As Mark Twain famously said (crediting the line to Benjamin Disraeli), “there are lies, damned lies, and statistics.”  To a certain degree, Twain is correct.  Though numbers don’t lie, people sure do, and it’s very easy to present statistics and mathematics in a way that misleads those who lack experience or technical training in interpreting sets of numbers, scores, or statistics.  Probability and statistics are misused every day to ill effect.  Creationists use pseudo-probability as evidence for why their crackpot ideas should take a place next to real biology in the science classroom.  Real estate agents can easily misrepresent the average price of housing in a neighborhood by cleverly substituting the mode or the mean for the median.  All around us, there are countless opportunities for innocent statistics to fall into the wrong hands and become agents of bamboozle.
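The real estate trick is easy to reproduce.  The prices below are invented for illustration (a hypothetical street of seven houses, one of them a mansion); they are not real data.  Python’s standard `statistics` module computes all three kinds of “average”:

```python
import statistics

# Hypothetical sale prices on one street: six ordinary houses and one mansion.
prices = [250_000, 260_000, 260_000, 275_000, 290_000, 310_000, 2_400_000]

mean = statistics.mean(prices)      # pulled upward by the single mansion
median = statistics.median(prices)  # middle value of the sorted list
mode = statistics.mode(prices)      # most frequently occurring value

print(f"mean:   ${mean:,.0f}")
print(f"median: ${median:,.0f}")
print(f"mode:   ${mode:,.0f}")
```

Here the mean comes to about $577,857, more than double the median of $275,000, while the mode is $260,000.  An agent who wants the street to sound upscale can quote the mean; one who wants it to sound like a bargain can quote the mode.  When prices are this skewed, the median usually describes a typical house best.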

In a way, then, the field of statistics is a bit like the field of rhetoric as described by Aristotle.  In his great work, named (appropriately enough) “Rhetoric,” Aristotle claims that rhetoric is an essential skill to learn, because to remain ignorant of the ways in which people persuade is to leave oneself open to being convinced, or to losing a debate, not because another has a good point but merely because another is better with slick words.  Statistics is the same way.  It’s an incredibly powerful tool, but anyone who doesn’t learn how to interpret statistical data properly is basically asking to be taken in by a manipulative data-cruncher who knows how to spin things like the widely misunderstood gap between correlation and causation.