Evaluating Player Evaluation Metrics and Expected Goal Models


Think about the perfect scenario where we have an infinite sample size. A scenario where every player plays an infinite amount of ice time with and against every other player. In fact, every 6-player combination of 3F-2D-1G plays against every other 6-player combination an infinite amount of ice time. Players start an infinite number of times in the offensive zone, defensive zone and neutral zone and they play an infinite amount of time in all score scenarios.

Under this scenario there is no need to make considerations for sample size, quality of teammates, quality of competition, zone starts, score effects, etc. With an infinite sample size we get a perfect measure of shooting percentage, which in the real world is heavily distorted by small sample sizes. As a result, under this scenario, the perfect metric for a player’s overall value would be GF%. The perfect metric for a player’s offensive value would be GF60 and the perfect metric for a player’s defensive value would be GA60. There is no need to consider anything else. No need for Corsi, Fenwick or expected goals. No need to account for teammates, competition, zone starts or score effects. GF%, GF60 and GA60 tell us everything we need to know about a player’s overall value and overall talent level. Scoring goals and preventing goals are the only objectives a hockey player should have, so there is no reason to consider anything else.

Of course, in the real world we don’t have access to that idealized scenario, but that doesn’t change the fact that in a perfect world GF%, GF60 and GA60 would be the gold standard in player evaluation metrics. It may seem obvious, but it is important to remember that the only reason we even look at Corsi or zone starts or track things like zone entries is the limitations of the data available to us. It is also important to remember that, other than GF%/GF60/GA60, no metric can ever provide a perfect evaluation under any scenario. There will always be an upper limit to what any other metric can tell you about a player’s value. Maybe that is 90% of the player’s overall value, maybe it is 40%, but it is critically important to understand what that level is. Furthermore, since GF%/GF60/GA60 is the gold standard in player evaluation metrics, we should have an idea of the scenarios under which another metric (or analytical process) would result in a better player evaluation.

Let me state that differently. If you have developed a new metric to evaluate hockey players, not only should you be able to describe how to calculate that metric, but you should also be able to say when one should or could use it over GF%/GF60/GA60 and what the upper bound is on the percentage of a player’s overall value it could possibly explain.

Let’s take Corsi as an example. Its equivalents to GF%, GF60 and GA60 are CF%, CF60 and CA60. The reason we use Corsi is that it generates larger sample sizes significantly faster than goals do. Up to 30 times faster. A team might score 2 to 3 goals over 60 minutes of 5v5 play but it will get upwards of 50-60 shot attempts. The significantly larger sample sizes mean Corsi statistics become reliable at measuring what they do measure much more quickly than goal based statistics do. The flip side is that there is an upper bound on what Corsi statistics can tell us about a player because shot conversion rates (aka shooting percentage, aka shot quality) are not considered.

The two questions we ought to try and answer are:

  1. How much of a player’s overall value does Corsi explain? (defining an upper limit)
  2. At what sample size does Corsi explain a player’s value better than goals do and, as sample sizes increase, at what point does a goal based analysis surpass a Corsi based analysis?

Question #1 could be answered in a number of different ways. Let’s consider the offensive stats because they are easier to deal with, not needing to take goaltending into account. A simple way is to look at a long-term sample and determine how much of goal scoring is explained by Corsi. Using data from the 2007-08 season through March 3rd of this current season for forwards with >3000 minutes of 5v5 ice time, the r^2 between CF60 and GF60 is 0.53, which means the upper bound on CF60 explaining the offensive talent of forwards is 53%. Put in different terms, CF60 can at best explain 53% of a player’s offensive talent.
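As a rough illustration of the calculation described above, here is a minimal sketch of computing r^2 between two per-player rate stats. The function name `r_squared` and the sample numbers are assumptions made up for the example; they are not the data behind the 0.53 figure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined when either variable is constant")
    return cov * cov / (var_x * var_y)


# Made-up long-term on-ice rates for six forwards (illustration only).
cf60 = [52.1, 55.4, 58.0, 60.3, 49.7, 56.8]
gf60 = [2.1, 2.6, 2.4, 3.0, 2.0, 2.3]

print(f"r^2 between CF60 and GF60: {r_squared(cf60, gf60):.3f}")
```

With real data, each list would hold one value per forward who meets the ice-time threshold, in the same player order.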

Tom Awad had a series of articles on Hockey Prospectus looking at what makes good players good. In part one (http://www.hockeyprospectus.com/puck/article.php?articleid=625) he split forwards into four tiers based on their ice time. He then estimated each tier’s contribution to scoring and preventing goals from outshooting the opponent (shot quantity), shot quality (shot location), and ‘finishing’. Here is a recreation of Awad’s summary table.

Group       ± due to finishing    ± due to shot quality    ± due to outshooting
1st tier          0.22                   0.04                     0.15
2nd tier          0.07                   0.02                     0.10
3rd tier          0.00                   0.01                    -0.06
4th tier         -0.20                  -0.04                    -0.15

I personally would define shot quality as a combination of what Awad defines as ‘finishing’ plus what Awad defines as ‘shot quality’. If we sum these two together we see the range from tier one players to tier four players is +0.26 to -0.24, for a difference of 0.50. That compares to a net difference of 0.30 for outshooting (+0.15 to -0.15). This would imply that outshooting an opponent accounts for approximately 0.3 / (0.3+0.5) = 37.5% of the spread in talent, which in turn would imply that outshooting an opponent is only 37.5% of a player’s overall value. My own personal opinion is that this is probably on the low side, but it is fair to say that the upper bound on what Corsi can explain is probably in the 50% range at best. Corsi at best can tell you half of a player’s overall value, and that is only if you are able to isolate all other factors such as usage, quality of teammates, quality of opponents, score effects, etc. That isn’t very good.
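The arithmetic in the paragraph above can be checked with a short sketch. The numbers are the 1st and 4th tier rows of the recreated table; the dictionary layout and the helper name `spread` are assumptions chosen for the example.

```python
# 1st and 4th tier rows from the recreated Awad summary table.
tiers = {
    "1st": {"finishing": 0.22, "shot_quality": 0.04, "outshooting": 0.15},
    "4th": {"finishing": -0.20, "shot_quality": -0.04, "outshooting": -0.15},
}


def spread(components):
    """Gap between tier one and tier four, summed over the given components."""
    top = sum(tiers["1st"][c] for c in components)
    bottom = sum(tiers["4th"][c] for c in components)
    return top - bottom


# Shot quality in the broader sense used here: finishing plus shot location.
quality_spread = spread(["finishing", "shot_quality"])
outshooting_spread = spread(["outshooting"])

outshooting_share = outshooting_spread / (outshooting_spread + quality_spread)
print(f"quality spread:     {quality_spread:.2f}")
print(f"outshooting spread: {outshooting_spread:.2f}")
print(f"outshooting share:  {outshooting_share:.1%}")
```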

To answer the second question we would need to do an analysis over varying sample sizes and try to find where the Goals value surpassed the Corsi value. For simplicity though let me go back to the CF60 vs GF60 question and use a sample size of at least 500 minutes in a full season. I grabbed 2013-14 and 2014-15 5v5 data for forwards with at least 500 minutes of 5v5 ice time in each season from corsica.hockey (which I used because I wanted to look at their expected goals stat as well – see below). I then looked at how well 2013-14 CF60 and GF60 predicted 2014-15 GF60. I then did the same for the two previous full-year comparisons I could do (excluding the 2012-13 partial season). Here are my findings.

                                          CF60     GF60
2009-10 predicting 2010-11 GF60 (r^2)     0.263    0.235
2010-11 predicting 2011-12 GF60 (r^2)     0.126    0.228
2013-14 predicting 2014-15 GF60 (r^2)     0.184    0.281
Average                                   0.191    0.248

So with one year of data GF60 is, on average, the better predictor of next season’s GF60 (CF60 came out ahead only in the 2009-10 comparison). For forwards with >500 minutes of ice time you are better off using GF60 over CF60. The sample size where CF60 is better to use in evaluating a player’s offensive performance is somewhere between zero and something less than 500 minutes of ice time. That’s it. A partial season. And even then you can’t explain even half of what makes a good offensive player. That makes Corsi a not overly useful statistic for player evaluation in the grand scheme of things. This is why I have always recommended using a goal based analysis and looking at longer trends. It is the only way current analytics can provide a reasonably reliable evaluation of a player’s value.
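A season-over-season comparison like the one above could be set up roughly as follows. This is a sketch under assumptions: the per-player dictionaries, the field names (`toi`, `cf60`, `gf60`) and every number are made up for illustration, and the 500-minute threshold is applied to both seasons as described in the text.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def predictive_r2(season_a, season_b, predictor, target="gf60", min_minutes=500):
    """r^2 between `predictor` in season_a and `target` in season_b.

    Only players with at least `min_minutes` of ice time in BOTH seasons
    are included. Returns (r^2, number of qualifying players).
    """
    players = [
        p for p in season_a
        if p in season_b
        and season_a[p]["toi"] >= min_minutes
        and season_b[p]["toi"] >= min_minutes
    ]
    xs = [season_a[p][predictor] for p in players]
    ys = [season_b[p][target] for p in players]
    return r_squared(xs, ys), len(players)


# Hypothetical 5v5 data keyed by player; not real players or real numbers.
season_1 = {
    "A": {"toi": 900, "cf60": 55.0, "gf60": 2.5},
    "B": {"toi": 800, "cf60": 50.0, "gf60": 2.0},
    "C": {"toi": 700, "cf60": 60.0, "gf60": 3.0},
    "D": {"toi": 300, "cf60": 58.0, "gf60": 2.8},  # under 500 minutes
    "E": {"toi": 650, "cf60": 52.0, "gf60": 2.2},
    "F": {"toi": 1000, "cf60": 57.0, "gf60": 2.4},
}
season_2 = {
    "A": {"toi": 950, "cf60": 54.0, "gf60": 2.6},
    "B": {"toi": 820, "cf60": 51.0, "gf60": 2.1},
    "C": {"toi": 400, "cf60": 59.0, "gf60": 2.9},  # under 500 minutes
    "D": {"toi": 900, "cf60": 57.0, "gf60": 2.5},
    "E": {"toi": 700, "cf60": 53.0, "gf60": 2.0},
    "F": {"toi": 980, "cf60": 56.0, "gf60": 2.7},
}

for stat in ("cf60", "gf60"):
    r2, n = predictive_r2(season_1, season_2, stat)
    print(f"{stat} predicting next season gf60: r^2 = {r2:.3f} (n = {n})")
```

Repeating the same comparison at lower ice-time thresholds is one way to look for the crossover point where the goal based predictor overtakes the Corsi based one.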

To summarize, Corsi can at best explain approximately 50% of a player’s offensive talent, and somewhere around or a little less than 1 year (500 minutes) is where the sample size is large enough that GF60 is a more reliable measure of a player’s offensive performance.

The result is that for any player who has at least 500 minutes of ice time it is probably better to use GF60 than CF60 to evaluate his offensive ability. With anything less than that, sure, go ahead and use CF60 since it will be the better choice, but know that with that sample size and the limitations of CF60 (at best it explains ~50% of a player’s offensive talent) any conclusions are still fraught with uncertainty.

The bolded statement above is the kind of statement I would like to see everyone write for any metric they come up with. It contains the critical pieces of information we need to know.

Now that I am over 1350 words into this I can get to what prompted me to write this article in the first place. The value of expected goals models.

A while ago I looked at Nick Abe’s expected goals model and showed that the model seems to be under-accounting for shot quality. To test the model I did a simple correlation between the expected (on-ice) Corsi shooting percentage (eCSh%) based on the estimated expected goals and the observed (on-ice) Corsi shooting percentage (CSh%). The correlation was fairly low (r^2=0.198). While there will be some randomness and uncertainty within the observed CSh% data, I was looking at 8 1/2 seasons’ worth of data and forwards with >3000 minutes of ice time. This should put significant bounds on the amount of uncertainty in the observed data. Any expected goals model that came anywhere close to fully accounting for shot quality should be able to post an r^2 far greater than 0.198. Furthermore, when we looked at the data we could see that the players whose actual CSh% the model underestimated were all highly skilled players, while those whose CSh% the model overestimated were 3rd and 4th line defensive players.

There is a new expected goals model by Emmanuel Perry (@MannyElk) on his corsica.hockey website. Perry wrote about his expected goals model and I was pleased to see that he actually provided a fair assessment of the usefulness of the model. In short, it wasn’t very good at predicting future goal rates. I am going to provide some additional evaluation here.

As I did with Nick Abe’s expected goals model, the first thing I did was run a correlation between expected goals fenwick shooting percentage (xFSh%) and observed fenwick shooting percentage (FSh%). The result is a slightly better, but still not outstanding, r^2 of 0.2367.


For me this is still nothing to write home about. The expected goals model is only explaining about 24% of shooting percentage. Some portion of the other 76% is randomness but a significant portion is still unaccounted for.

As was found with Nick Abe’s expected goals model, the players whose FSh% is underestimated by the corsica.hockey model are predominantly top offensive players. The top seven underestimates are Stamkos, Horton, St. Louis, Sykora, Tanguay, Krejci and Crosby. The players whose FSh% the model overestimates the most are Tim Jackman, Patrick Dwyer, Chris Drury, Trevor Lewis, Tommy Wingels, Andrew Desjardins, Adam Hall, Todd Marchant, etc. They are predominantly 3rd and 4th line players.
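One way to produce lists like the ones above is to rank players by the gap between observed and expected on-ice shooting percentage. The sketch below assumes FSh% is on-ice goals for divided by on-ice unblocked shot attempts for (Fenwick), and that xFSh% is expected goals for divided by the same denominator. Those definitions, the field names and the three made-up players are assumptions for illustration, not the corsica.hockey data.

```python
def shooting_pct(goals, fenwick_for):
    """Goals (observed or expected) per unblocked shot attempt."""
    if fenwick_for <= 0:
        raise ValueError("fenwick_for must be positive")
    return goals / fenwick_for


def rank_by_residual(players):
    """Sort players by observed FSh% minus expected xFSh%, largest first.

    A positive residual means the model underestimates the player's on-ice
    shooting percentage; a negative residual means it overestimates it.
    """
    rows = []
    for name, s in players.items():
        fsh = shooting_pct(s["gf"], s["ff"])
        xfsh = shooting_pct(s["xgf"], s["ff"])
        rows.append((name, fsh - xfsh))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical multi-season on-ice totals (not real players or numbers).
players = {
    "Skilled scorer": {"gf": 300, "xgf": 250.0, "ff": 4000},
    "Middle-six forward": {"gf": 200, "xgf": 200.0, "ff": 3500},
    "Checking forward": {"gf": 140, "xgf": 170.0, "ff": 3200},
}

for name, residual in rank_by_residual(players):
    print(f"{name:20s} FSh% - xFSh% = {residual:+.4f}")
```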

It seems that maybe the expected goals model is capturing some of shot quality but in order to test how well it is doing I will first look at a comparison between CF60 and xGF60. To do this test I took an average absolute deviation from GF60 for all forwards with >3000 5v5 minutes from 2007 to 2016. Using CF60 and league average corsi shooting percentage to convert to a corsi based GF60 (cGF60) the average absolute deviation from GF60 was 11.5%. The average absolute deviation between xGF60 and GF60 is 10.8%. This means xGF60 is only a marginal improvement over CF60 and as I showed above CF60 is worse at predicting future GF60 than GF60 is once you get to ~500 minutes worth of data.
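The comparison above can be sketched as follows, assuming that "average absolute deviation" means the mean of |estimate - GF60| / GF60 across players, which is one plausible reading of a deviation quoted as a percentage. The league-average Corsi shooting percentage and all player values below are made-up placeholders.

```python
def corsi_based_gf60(cf60, league_csh_pct):
    """Convert a shot attempt rate into a goal rate using league-average CSh%."""
    return cf60 * league_csh_pct


def mean_abs_pct_deviation(estimates, observed):
    """Mean of |estimate - observed| / observed over all players."""
    if len(estimates) != len(observed) or not observed:
        raise ValueError("need two non-empty, equal-length sequences")
    return sum(abs(e - o) / o for e, o in zip(estimates, observed)) / len(observed)


# Hypothetical league-average Corsi shooting percentage (goals per shot attempt).
LEAGUE_CSH_PCT = 0.043

# Made-up long-term on-ice rates for five forwards (illustration only).
cf60 = [52.0, 56.0, 59.0, 61.0, 50.0]
xgf60 = [2.30, 2.45, 2.60, 2.70, 2.15]
gf60 = [2.10, 2.60, 2.90, 3.00, 1.90]

cgf60 = [corsi_based_gf60(c, LEAGUE_CSH_PCT) for c in cf60]

print(f"cGF60 vs GF60: {mean_abs_pct_deviation(cgf60, gf60):.1%}")
print(f"xGF60 vs GF60: {mean_abs_pct_deviation(xgf60, gf60):.1%}")
```

The same conversion applies to the Fenwick based estimate by swapping in a Fenwick rate and the league-average FSh%.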

Let’s have a look at some individual players’ data. Here is a chart of Stamkos’ FSh% and his xFSh% since he entered the league.


That is a pretty consistent underestimate of Stamkos’ FSh%. We can look at how this impacts his expected goals rate by comparing observed GF60 to xGF60 and cGF60.


On average xGF60 is underestimating Stamkos’ observed GF60 by 21%. For those wondering what myGF60 is, it is my GF60 estimate: Stamkos’ FF60 multiplied by the league average FSh%. As you can see, the xGF60 model is far closer to the league average FSh% model than to Stamkos’ actual GF60.

Here is a chart for Thomas Vanek, another player whose GF60 the expected goals model underestimates.


Unlike for Stamkos though, Vanek is one player where the xGF60 model is significantly better than the simple myGF60 model.

Now let’s look at someone at the other end of the spectrum, a player I have often written about as being more of a defensive player. Brandon Sutter.


Except for this season (where Brandon Sutter has been mostly injured) xGF60 overpredicts Sutter’s actual GF60. In fact, for about half the seasons myGF60 using league average FSh% was actually better than xGF60.

Here is Daniel Winnik.


Here you see xGF60 almost perfectly mimicking myGF60 with both overestimating Winnik’s actual GF60 pretty consistently.

Now to round out the analysis let’s take a look at some middle-of-the-road on-ice shooting percentage players.


Bickell is an interesting case as over the years his GF60 has fallen but both the xGF60 and cGF60 have risen. Neither of these has been very good at estimating Bickell’s offensive production.

One last player.


Not a lot to see here. Sometimes xGF60 is closer to GF60, sometimes cGF60 is.

Again, overall xGF60 is a slight improvement over CF60 but hardly worth the effort of calculating it.

Here is an updated version of the table I showed you earlier.

                                          CF60     xGF60    GF60
2009-10 predicting 2010-11 GF60 (r^2)     0.263    0.293    0.235
2010-11 predicting 2011-12 GF60 (r^2)     0.126    0.098    0.228
2013-14 predicting 2014-15 GF60 (r^2)     0.184    0.207    0.281
Average                                   0.191    0.199    0.248

There you have it. xGF60 is not significantly better than CF60. Stick to using GF60.

Now, there is one more xG model that I can check. It is one created by Hockey Graphs writer @DTMAboutHeart. A description of his model and the data can be found here. The data is a little less convenient to work with and there is no multi-year sample, so we are stuck analyzing the smaller sample size of single seasons. That said, we can still see similar trends to both Nick Abe’s model and Emmanuel Perry’s model. The largest underestimates of on-ice shooting percentage are for high-end offensive players like Toews, Spezza, Kovalchuk, the Sedins, Crosby, etc. while the largest overestimates are for third and fourth liners like Matt Martin, Travis Moen, Todd Marchant, Cal Clutterbuck, etc.

We can see this once again in a plot of Stamkos’ xOnSh% vs his actual OnSh%.


This expected goals model also consistently underestimates Stamkos’ on-ice shooting percentage by a fairly significant margin, though not quite as much as corsica.hockey’s model. The reason for the smaller gap is that this model incorporates regressed individual shooting percentage, which is kind of cheating but is the only way to make it work. I looked at a few players and found much the same trends, so I won’t make this post any longer than it already is with those details. That said, I would welcome seeing @DTMAboutHeart do a more complete and formal analysis of his model along the lines of what I have done above to show how well his model performs and when it is more useful than using a goal based stat.

None of this surprises me really. The issue is that these are largely based on shot location data and it has been shown time and time again that shot location is only a fraction of what goes into shot quality as Tom Awad concluded in his ‘What makes good players good’ article discussed above. We are simply unable to adequately account for shot quality with the data we currently have and with the data we currently have we can only marginally improve on corsi stats which are quite limited as a player evaluation metric in their own right.

Let me finish off with some tips I have for player evaluation.

  • For most players we have more than a full season’s worth of data so I recommend using it in your player evaluations.
  • Look for trends in multi-year goal stats like a player consistently posting above/below average statistics.
  • Definitely look at Rel or RelTM numbers as well to isolate quality of team factors.
  • If you are looking at rookies or players with less than 500 minutes of ice time feel free to look at Corsi, but understand that even under ideal conditions Corsi at best will only explain half of the player’s overall value. Be cautious about what conclusions you draw from it.
  • If you wish to use expected goals feel free to use it but see my previous point on Corsi as everything written there applies to the expected goals models too. Do not think just because the idea of expected goals sounds good that it is actually providing you with something significantly more useful. It isn’t.
  • Honestly, hockey analytics is not at a state where we can reliably predict a player’s value with even a full season’s worth of data. Let’s not pretend it is.