Persistence and Predictability

 David Johnson, Statistical Analysis
Jun 01 2011

There seems to be some confusion, or lack of clarity, about my post the other day on corsi vs shooting percentage vs shooting rate, so let me clear it up in as straightforward a way as I can.

“Hawerchuk” over at writes the following:

“I’m not totally sure what he’s getting at. People use Fenwick because it’s persistent, and PDO because it’s not. Over the course of a single season, observed shooting and save percentage drive results, but they are not persistent.”

Dirk Hoag over at writes:

“Here’s an example of when NOT to use correlation as a tool in statistical analysis (when the variables in question are linked by definition). David makes a bad blunder here, by looking at scoring leaders, seeing a bunch of high shooting percentages, and concluding that shooting percentage is the true “talent”. The problem is that shooting percentage swings wildly from season to season, whereas shooting rates are much more consistent.”

The great advantage corsi/fenwick has over goals as an evaluator of talent is the greater sample size associated with it. The greater the sample size, the more confidence we can have in any conclusions we draw from it and the less chance that ‘luck’ messes things up. Year over year, shooting percentage fluctuates a lot, but that doesn’t necessarily mean that it isn’t a talent or doesn’t have persistence; it could mean that the sample size of one year is too small. The four year shooting percentage leader board seems to identify all the top offensive players, so it can’t be completely random. So what happens if we increase the sample size? Here are correlations of fenwick shooting percentages while on ice in 5v5 even strength situations for forwards:

Year(s) vs Year(s)            Correlation
2007-08 vs 2008-09            0.249
2008-09 vs 2009-10            0.268
2009-10 vs 2010-11            0.281
2007-09 vs 2009-11 (2yr)      0.497

As you can see, there isn’t a lot of persistence year over year, but for 2 years over 2 years we are starting to see some persistence. It is still not at the level of corsi/fenwick, but it is certainly not nonexistent either. And because fenwick shooting percentage has the greater correlation with scoring goals, it is on par with fenwick as a predictor of future goal scoring performance when we have 2 seasons of data, as I pointed out in my last post.
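To illustrate the sample size argument, here is a minimal simulation sketch. Everything in it is made up for illustration: the number of players, the number of shots per season, and the spread of "true" shooting talent are assumptions, not values taken from the data above. Each simulated player has a fixed talent, and the only thing that changes between the two comparisons is how many shots go into each observed shooting percentage.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def simulate_goals(talent, shots, rng):
    """Count goals when each shot scores independently with probability talent."""
    return sum(rng.random() < talent for _ in range(shots))

# Hypothetical parameters, chosen only for illustration.
N_PLAYERS = 2000
SHOTS_PER_SEASON = 400
rng = random.Random(2011)

# Each player gets a fixed "true" shooting percentage that never changes.
talents = [max(0.0, rng.gauss(0.06, 0.01)) for _ in range(N_PLAYERS)]

# Four simulated seasons of goal totals per player.
goals = [[simulate_goals(p, SHOTS_PER_SEASON, rng) for _ in range(4)]
         for p in talents]

# One season vs the next season.
one_year_a = [g[0] / SHOTS_PER_SEASON for g in goals]
one_year_b = [g[1] / SHOTS_PER_SEASON for g in goals]

# Two pooled seasons vs the following two pooled seasons.
two_year_a = [(g[0] + g[1]) / (2 * SHOTS_PER_SEASON) for g in goals]
two_year_b = [(g[2] + g[3]) / (2 * SHOTS_PER_SEASON) for g in goals]

r_one = pearson(one_year_a, one_year_b)
r_two = pearson(two_year_a, two_year_b)
print(f"1yr vs 1yr correlation: {r_one:.3f}")
print(f"2yr vs 2yr correlation: {r_two:.3f}")
```

Because talent is held constant here, any gap between the two correlations comes purely from the larger sample in the pooled seasons. It says nothing about how much real talent spread exists among NHL forwards.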

For the record, year over year correlation for fenwick for rate is approximately 0.60, depending on the years used, and the 2 year vs 2 year correlation is 0.66.

But as I pointed out in my previous post, you would probably never use shooting percentage as a predictor, because you may as well use goal rate instead, which has the same sample size limitations as shooting percentage but also factors in fenwick rate. Year over year correlation of GF20 (goals for per 20 minutes) is approximately 0.45, depending on the years used, and the 2 year vs 2 year correlation is 0.619. So GF20 has persistence and, being the very thing we are trying to predict, has a 100% correlation with itself, making it as reliable a predictor of future goal scoring rates as fenwick rate (or more so) with just one year of data and a better predictor when using 2 years of data. Let me repost the pertinent table of correlations:

Year(s) vs Year(s)                  FenF20 to GF20    GF20 to GF20
2007-08 vs 2008-09                  0.396             0.386
2008-09 vs 2009-10                  0.434             0.468
2009-10 vs 2010-11                  0.516             0.491
Average                             0.449             0.448
2007-09 vs 2009-11 (2yr)            0.498             0.619
2007-09 vs 2009-10 (2yr vs 1yr)     0.479             0.527
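As a quick arithmetic check, the Average row can be recomputed from the three single-season rows of the table. The variable names below are only for illustration.

```python
# Single-season correlations copied from the table above.
fenf20_to_gf20 = [0.396, 0.434, 0.516]
gf20_to_gf20 = [0.386, 0.468, 0.491]

# Simple arithmetic mean of each column, rounded to three decimals.
avg_fenf20 = round(sum(fenf20_to_gf20) / len(fenf20_to_gf20), 3)
avg_gf20 = round(sum(gf20_to_gf20) / len(gf20_to_gf20), 3)

print(avg_fenf20, avg_gf20)  # 0.449 0.448
```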

The conclusion is that when dealing with less than a year’s worth of data, fenwick/corsi is probably the better metric to identify talent and predict future performance; with anything greater than a year, goals for rate is the better metric; and with one year’s worth of data, they are about on par with each other.
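The reason goals for rate "factors in" fenwick rate is that it is the product of fenwick rate and fenwick shooting percentage. Here is a small sketch of that identity. It assumes FenF20 is defined the same way as GF20 (fenwick events for per 20 minutes of ice time), and the player totals are hypothetical.

```python
def per_20(events, toi_minutes):
    """Rate of events per 20 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    return events / toi_minutes * 20.0

def fenwick_shooting_pct(goals, fenwick_events):
    """Goals divided by fenwick events (unblocked shot attempts)."""
    if fenwick_events <= 0:
        raise ValueError("fenwick_events must be positive")
    return goals / fenwick_events

# Hypothetical on-ice totals for one forward, for illustration only.
goals_for = 60
fenwick_for = 800
toi_minutes = 1200.0

gf20 = per_20(goals_for, toi_minutes)
fenf20 = per_20(fenwick_for, toi_minutes)
fsh_pct = fenwick_shooting_pct(goals_for, fenwick_for)

# GF20 = FenF20 * fenwick shooting percentage, up to rounding error.
print(gf20, fenf20 * fsh_pct)
```

So any persistence in either factor shows up in GF20, which is why it carries the information of both.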

Note: This is only true for forwards. The same observations do not hold for defensemen, where we see very little persistence or predictability in any of these metrics, I presume because the majority of them don’t drive offense to any significant degree.