A picture is worth a thousand numbers
Posted: September 1, 2011
I just reintroduced myself to a lovely little example of why it’s important to draw graphs and to look at outliers in your data, and I want to introduce you to it as well.
This came up today when I was trying to persuade someone that they really ought to look beyond the average values for spending, website visits, and so on when thinking about their customers. They weren’t convinced, as they thought that an average told you pretty much all the important stuff, and anything else was basically just getting carried away with overengineering the statistics for the fun of it.
Here is a very lovely little counterexample: Anscombe’s quartet, constructed by the statistician Francis J. Anscombe back in 1973. Feast your eyes on the four sets of data below, which have eleven points each.
(thanks Wikipedia for the picture).
Imagine that each of these datasets represents eleven of your customers on some measure or other (you can pick something that’s relevant to you). If you look at just the averages, or even start to get a bit clever and look at the variance and the correlation of X and Y for each set, you’ll find that every set has practically the same values.
Isn’t that fantastic? Each of the sets above has mean X of 9, variance X of 10, mean Y of 7.5, variance Y of 3.75 (both variances taken over all eleven points, dividing by n rather than n − 1), and correlation between X and Y of about 0.816, with the same best fit regression line, roughly y = 3 + 0.5x, for each. The figures agree to within rounding, so the usual summary measures you would reach for first come out just the same for every one. And yet anyone looking at the graphs can see in a second that each dataset is telling a completely different story about your business.
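If you would rather check the numbers than take my word for it, here is a minimal Python sketch that computes them. The data values are Anscombe’s published quartet; the names `QUARTET` and `summarize` are just my own labels for illustration, and the variances divide by n (the population form), which is the convention that gives 10 and 3.75.

```python
from math import sqrt

# Anscombe's quartet (1973). Sets I-III share the same X values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the basic summary statistics for one set of (x, y) points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population variances and covariance (divide by n, not n - 1).
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Least-squares line y = intercept + slope * x.
    slope = cov_xy / var_x
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "corr": cov_xy / sqrt(var_x * var_y),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


# Print each set's summary, rounded so the near-identical values line up.
for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Run it and the four printed rows should be indistinguishable at two decimal places, which is exactly the trap: nothing in that output hints at the curve, the outlier, or the single point propping up the whole regression line that the pictures show immediately.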