Standard deviations for fun and profit (or, why I heart statistics)

Despite having a maths degree, I graduated knowing nothing about applied maths. Nasty, dirty, smelly stuff. However, it turns out that simple statistics and applied maths are very handy in the nasty, dirty, smelly real world.

One of my favourite handy measures is standard deviation. It’s incredibly useful, because it provides a yardstick of ‘weirdness’ for normally distributed data (normally distributed == bell curve shape, which is roughly the distribution of a lot of ‘organically’ generated data).

Why is knowing the standard deviation useful? Because it lets you monitor a bunch of metrics at once for anything unusual, without worrying about their individual qualities.

Say you decide that you want an alert whenever your hourly bleep count goes more than 10% away from its mean. That works fine if you know the hourly bleep count well, and you know from experience that any deviation beyond 10% is a worry and anything within it is fine. Here’s my bleep count graph over time. Anything which goes outside the lines is going to trigger an alert.

However, if you now decide you want to monitor the (slightly noisier) blip count as well, your 10% threshold will drive you nuts with false alarms.


Unless you use the hero of the tale, standard deviation. In the example above, the bleep count has a standard deviation of 5 and a mean of 100. So 2 standard deviations is equivalent to 10% of the mean in this case, which we decided was our starting comfort level. About 5% of events will fall outside 2 s.d. in a normal distribution, so in our timeline of 50 events here we would expect 2 or 3 alerts. If that starts to feel like too many alerts, we could extend the limits out to 3 s.d., which will only pick up even more unusual events (~0.3% of the time).
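If you want to check that arithmetic yourself, here is a minimal sketch in R. The helper name `tail_prob` and the variable names are mine, made up for illustration; the mean, s.d. and event count are the ones from the bleep example.

```r
# Two-sided tail probability: the chance that a normally distributed value
# lands more than k standard deviations away from the mean (either side).
# tail_prob is a hypothetical helper name, not a built-in R function.
tail_prob <- function(k) 2 * pnorm(-k)

bleep_mean <- 100
bleep_sd   <- 5
n_events   <- 50

# 2 s.d. expressed as a percentage of the mean
pct_band <- 100 * 2 * bleep_sd / bleep_mean

# Expected number of alerts in the 50-event timeline
expected_alerts_2sd <- n_events * tail_prob(2)
expected_alerts_3sd <- n_events * tail_prob(3)

print(c(pct_band = pct_band,
        alerts_2sd = expected_alerts_2sd,
        alerts_3sd = expected_alerts_3sd))
```

The exact tail probability at 2 s.d. is a touch under 5%, so the expected alert count for 50 events comes out between 2 and 3, and at 3 s.d. you would expect well under one alert.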

Now if we’re happy with the sensitivity level, we can use 2 s.d. for the blip count as well. This will be totally consistent with our bleep alerting, and really easy to implement.
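To show how little there is to implement, here is a rough sketch of one alerting rule applied to both metrics. The function name `sd_alerts`, the seed and the simulated data are my own assumptions for illustration. It also estimates the mean and s.d. from the same data it checks, which is the simplest possible approach; in practice you might prefer to estimate them from a trusted historical window.

```r
# Hypothetical helper: flag values that sit more than k sample standard
# deviations away from the sample mean. Returns a logical vector.
sd_alerts <- function(x, k = 2) {
  centre <- mean(x)
  spread <- sd(x)
  abs(x - centre) > k * spread
}

set.seed(42)  # arbitrary seed, only so the simulated data are repeatable
bleep <- rnorm(50, mean = 100, sd = 5)
blip  <- rnorm(50, mean = 100, sd = 15)

# Same rule, same sensitivity, no per-metric tuning
print(which(sd_alerts(bleep)))
print(which(sd_alerts(blip)))
```

The point of the design is that `k` is the only knob: it means the same thing for every metric, however noisy that metric happens to be.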

Ahh, that’s better. So now we only get alerted when something genuinely unusual happens, within the parameters of standard blip variation, and we don’t get hounded by false alarms.
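To put a rough number on the false alarm problem, this sketch compares the share of blip readings you would expect outside each pair of alert lines, assuming the blips really are normally distributed with the mean of 100 and s.d. of 15 used to generate the graphs. The helper name `share_outside` is hypothetical.

```r
# Expected share of readings falling outside mean +/- band for a normally
# distributed metric with standard deviation s.
# share_outside is a hypothetical helper name.
share_outside <- function(band, s) 2 * pnorm(-band / s)

blip_mean <- 100
blip_sd   <- 15

# Fixed 10% threshold: lines at 90 and 110
fixed_10pct <- share_outside(0.10 * blip_mean, blip_sd)

# 2 s.d. threshold: lines at 70 and 130
two_sd <- share_outside(2 * blip_sd, blip_sd)

print(round(c(fixed_10pct = fixed_10pct, two_sd = two_sd), 3))
```

Under those assumptions the fixed 10% lines would fire on roughly half of all blip readings, while the 2 s.d. lines fire on a little under 5% of them, the same rate as for bleeps.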

Maybe statistics is not so nasty and smelly after all.

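Bonus material: the R code to generate the graphs is below:

# Histogram of some normally distributed data
bell <- rnorm(1000, mean = 0, sd = 10)
hist(bell, 100, main = "some normally distributed data", xlab = "value")

# Bleep count with alert lines 10% either side of the mean (= 2 s.d.)
plot.ts(rnorm(50, mean = 100, sd = 5), ylim = c(80, 120), ylab = "bleeps", main = "bleep count")
abline(h = 90, col = "red")
abline(h = 110, col = "red")

# Noisier blip count with the same 10% alert lines
blip <- rnorm(50, mean = 100, sd = 15)
plot.ts(blip, ylim = c(80, 120), ylab = "blips", main = "blip count")
abline(h = 90, col = "red")
abline(h = 110, col = "red")

# Same blip data with the alert lines moved out to 2 s.d. (30 either side of the mean)
plot.ts(blip, ylim = c(60, 140), ylab = "blips", main = "blip count with revised alert threshold")
abline(h = 70, col = "red")
abline(h = 130, col = "red")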


3 Comments on “Standard deviations for fun and profit (or, why I heart statistics)”

  1. Be interesting to see what the % for blip was?

    • magicdashes says:

      Oh yes, great question. For the blip count, 2 s.d. was 30, or 30% of the mean, so that’s where you would set a % alerting threshold if you need to continue defining it in that way.

  2. […] for how far away it gets from forecast before you should be worried. See my previous post on standard deviation for more […]

