Goals, Corsi, and Weighted Shot Differential

Yesterday ‘Tangotiger’ introduced a new hockey metric that got the hockey Twitter world all excited. Go read the articles for the methodology and rationale behind the metric, but in short he ran a regression of first-half-season results against second-half-season results and found that goals and shot attempts that didn’t result in goals should be weighted differently. The final result was that, in his weighted shot differential, goals should be given a weight of 1.0 and shot attempts that didn’t result in goals (saved, missed the net, or blocked) a weight of 0.2. He concluded from this that Corsi is not a good statistic because it doesn’t apply the proper weighting. The reality, as others have pointed out, is that this new Weighted Shot Differential is actually highly correlated with Corsi, and here is why.

Consider the following formula for weighted shot for total (WSFT).

WSFT = Goals + (Corsi-Goals) * 0.2

We can reduce that formula further to

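WSFT = 0.2 * Corsi + 0.8 * Goals

Last season’s goals as a percentage of Corsi (effectively Corsi shooting percentage) ranged from 3.2% (Buffalo) to 5.3% (Anaheim), which means the teams’ WSFT formulas ranged from

WSFT = 0.2 * Corsi + 0.0255 * Corsi = 0.2255 * Corsi (for Buffalo)

to

WSFT = 0.2 * Corsi + 0.0428 * Corsi = 0.2428 * Corsi (for Anaheim)

The constant factor of 0.2 applies to every team and has no effect on how teams compare with one another. What separates teams is the goal term, and even between the two extremes the multiplier on Corsi differs by only about 8%, which really isn’t much of an adjustment to overall Corsi.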
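To put a number on the size of the adjustment, here is a minimal Python sketch that evaluates the WSFT definition, Goals + (Corsi - Goals) * 0.2, for two made-up teams. Both are assumed to have the same 4,000 shot attempts (a hypothetical figure, not real team data), and they differ only in goal percentage, which is set to the 3.2% and 5.3% extremes mentioned above. The function name weighted_shots is just an illustrative choice.

```python
def weighted_shots(corsi: float, goals: float, non_goal_weight: float = 0.2) -> float:
    """Weighted shot total: each goal counts 1.0, each non-goal attempt counts non_goal_weight."""
    return goals + non_goal_weight * (corsi - goals)


# Hypothetical totals: identical shot attempts for both teams, with goal
# percentages at the two extremes quoted in the text (3.2% and 5.3%).
corsi = 4000
low = weighted_shots(corsi, corsi * 0.032)
high = weighted_shots(corsi, corsi * 0.053)

print(f"low-percentage team WSFT:  {low:.1f}")
print(f"high-percentage team WSFT: {high:.1f}")
print(f"ratio between the two:     {high / low:.3f}")
```

Because both teams have identical Corsi here, the printed ratio isolates how far the goal term alone can move WSFT between the two extremes.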

The most important aspect of Tangotiger’s post is actually the part near the end that deals with sample size:

Now, I know what you are going to say: how come all-shots correlate so much better than only-goals?  That’s easy.  The best way to increase correlation is to increase the number of trials.  It’s really that simple.  100 shots is not as good a forecaster as 500 shots, which is not as good as 2000 shots.  So, if you have 10 non-goal shots and 1 goal-shot, then naturally, the 10 non-goal shots will correlate better with future goals.

And indeed, this is consistent with the above results! Since we weight each non-goal shot at 0.2 and each goal at 1.0, and if you have 2 EV goals and 20 EV non-goals, then guess what. The 2 EV goals count as “2 trials”, while the 20 EV [non-goals] count as “4 trials”. So, naturally, the 20 EV non-goals will correlate better than the 2 EV goals. But, that still doesn’t mean you can weight both the same. Not at all.

That is the crux of the whole Corsi vs. goals debate. The relative importance of shot attempts and goals as predictors of future goal production is all about sample size. Shot attempts are much more reliable over small samples. If we had a very large sample, goals would be the far better predictor (in theory, goals could be a perfect predictor in an infinitely large sample; Corsi could never be that). What Tangotiger did was attempt to determine the proper weightings for a 41-game sample. Nothing more. Nothing less. If you have a 30-game sample, goals would be weighted even less. If you had a larger sample, goals would be given more weight. In theory one should be able to develop a sliding scale where the weights vary with sample size. Until then we can only guess what the actual weights should be for any particular sample size.
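The sample-size argument can be sketched with a toy model in Python. Everything in it is an assumption made for illustration, not something taken from Tangotiger's work or from real data: each team has a true shot-attempt rate and a true conversion percentage that are independent of each other, game-to-game noise is Poisson, and all four league parameters are invented. The two functions give the expected correlation between what you observe over a number of games and a team's true goals-per-game talent.

```python
import math

# Hypothetical league-wide parameters (assumptions, not measured values).
MEAN_ATTEMPTS = 55.0   # average shot attempts per game
SD_ATTEMPTS = 4.0      # spread of true attempt-rate talent between teams
MEAN_PCT = 0.043       # average share of attempts that become goals
SD_PCT = 0.002         # spread of true conversion talent between teams

# Variance of true goals-per-game talent (attempt rate times conversion
# percentage) when the two talents are independent.
VAR_GOAL_TALENT = (
    (MEAN_ATTEMPTS * SD_PCT) ** 2
    + (MEAN_PCT * SD_ATTEMPTS) ** 2
    + (SD_ATTEMPTS * SD_PCT) ** 2
)
MEAN_GOALS = MEAN_ATTEMPTS * MEAN_PCT


def goals_vs_talent(games: int) -> float:
    """Expected correlation of observed goals per game with true goal talent."""
    observed_var = VAR_GOAL_TALENT + MEAN_GOALS / games  # talent plus Poisson noise
    return math.sqrt(VAR_GOAL_TALENT / observed_var)


def attempts_vs_talent(games: int) -> float:
    """Expected correlation of observed attempts per game with true goal talent."""
    observed_var = SD_ATTEMPTS ** 2 + MEAN_ATTEMPTS / games  # talent plus Poisson noise
    covariance = MEAN_PCT * SD_ATTEMPTS ** 2  # covariance of attempt talent with goal talent
    return covariance / math.sqrt(observed_var * VAR_GOAL_TALENT)


for games in (10, 41, 82, 500, 5000):
    print(f"{games:>5} games: goals {goals_vs_talent(games):.2f}, "
          f"attempts {attempts_vs_talent(games):.2f}")
```

Under these made-up parameters, shot attempts track true goal talent better over short samples, while goals overtake them once the sample gets large enough, and the attempts curve levels off below 1 because attempts say nothing about conversion talent. Where the crossover falls depends entirely on the assumed spreads, so the output should be read as showing the shape of the sliding scale and not the actual weights.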

I get a fair bit of flak for being somewhat anti-Corsi, but I am not really. I just feel the benefits of Corsi have been oversold and the issues with Corsi have been underreported. Corsi is a good evaluation tool, but far too often it is used as the sole evaluation tool. If all you have are small sample sizes then that might be all that you can use, but the reality is that for the majority of what we do in hockey analytics we have a lot more data to work with than, for example, a small sample of games. It is about using all of the tools we have at our disposal, not relying on just one for the majority of the analysis we conduct.