Are you adequately accounting for Shot Quality in your Expected Goals model?

The other day on Twitter I questioned whether an existing expected goals model by @DTMAboutHeart adequately accounted for shot quality. That tweet, it seems, prompted a response from the Hockey Graphs crew in which they all took turns downplaying the importance of shot quality.

We already know that the impact of shot quality (context + skill) is miniscule in comparison to other factors –pet bugs

 

The question to ask then is: “why does shot quality have so little relative impact on long-term results?” –rjessop

 

I didn’t intend to start a debate on shot quality; I was simply expressing my doubts that the xG model adequately accounted for shot quality. I do need to look into it more, and @DTMAboutHeart tells me he will be releasing player data in the next few days, at which time I hope to evaluate it.

During the discussion though Nick Abe stepped in to promote his expected goals model.

The good thing is he has made all the data available on his website XtraHockeyStats.com, though I couldn’t find an actual formula for how the numbers are calculated. With the data I was able to do a quick test to see if the model is adequately accounting for shot quality. Before I get into it, let’s start with some background.

What is shot quality?

Well, shot quality basically is answering the question of “How difficult is this shot to save?” or in a different light “How likely is it that this shot will result in a goal?” Shot quality definitely exists. Some shots are definitely more difficult to save than others. Those who want to minimize the importance of shot quality go by the premise that over the long haul the average shot quality of all the shots taken when a player is on the ice or a whole team over the course of a season will average out to pretty much the same value for all players and teams. The claim is that shot quality exists for a single shot, but for a collection of a large number of shots during which a player is on the ice there just isn’t much variability.

If shot quality exists, how would shot quality manifest itself in statistics?

The answer to this is simple. The most direct measure of shot quality will be shooting percentage as this is the conversion rate from shots to goals. If shot quality exists players with a higher average shot quality will have a higher shooting percentage and players with a lower average shot quality will have a lower shooting percentage.

But wait, I look at the stats and players do in fact have different on-ice shooting percentages? Does this not mean shot quality exists?

The simple answer is, well, sort of yes but maybe not. The problem is that because goals are scored so infrequently, there can be a lot of randomness in shooting percentages. It is really a sample size issue. When you hear about ‘regression to the mean’ this is basically what people are talking about. League average shooting percentage during 5 on 5 play is a little shy of 8% (~7.8% though it is much lower this season for some reason). Randomness over small sample sizes can result in unusually high or unusually low shooting percentages. If a player is on the ice for 200 shots, one would expect him to be on the ice for 16 goals (200*8%). If he had a couple of lucky bounces he might be on the ice for 18 goals and have a shooting percentage of 9%. Conversely there might be another player who has been unlucky and was only on the ice for 14 goals for a 7% shooting percentage. Over the long haul a player’s luck will eventually even out. He may have started out with some good luck, but then he might face a stretch of bad luck. When a lucky player starts to have no luck or bad luck his shooting percentage will drop, or regress towards the expected level of 8%. So the real question is how much of the variation in observed shooting percentages is due to talent and how much is due to luck.
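To put a rough size on that luck, here is a minimal sketch that treats every shot as an independent trial with the same 8% chance of going in. That is a simplifying assumption (a plain binomial model), not one of the published luck models, and the function name is just for illustration.

```python
import math

def on_ice_goal_range(shots, sh_pct):
    """Expected goals and one standard deviation of goals,
    assuming each shot is an independent trial with the same
    scoring probability (a plain binomial model)."""
    expected = shots * sh_pct
    sd = math.sqrt(shots * sh_pct * (1 - sh_pct))
    return expected, sd

shots = 200
expected, sd = on_ice_goal_range(shots, 0.08)
print(f"expected goals: {expected:.1f}")
print(f"one standard deviation: {sd:.2f} goals")
print(f"Sh% one SD below/above: {(expected - sd) / shots:.1%} to {(expected + sd) / shots:.1%}")
```

Under that assumption one standard deviation at 200 shots is a little under 4 goals, so the 14-goal and 18-goal players in the example are both well within what luck alone could produce.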

There are models to estimate luck but I won’t go into that here. Suffice to say that I believe that existing models over-allocate the variance in observed shooting percentage to luck and under-allocate it to talent in maintaining a higher than average shot quality (or lower than average for those who lack talent).

I hope this frames the debate well, but anyone who disputes it should feel free to post in the comments.

How to test if expected goal models adequately account for shot quality?

There are more complex ways you can do it but one quick and easy way is to compare expected shooting percentage (expected goals divided by shots) to observed shooting percentage. We know there is some luck in observed shooting percentage, but to the extent that shot quality exists and the model is capturing it there should be some correlation between expected shooting percentage and actual shooting percentage. On Nick Abe’s website he has corsi data instead of shot data so I will be looking at expected corsi shooting percentage (eCSh%) vs corsi shooting percentage (CSh%). I used all seasons from 2007-08 through this season combined, for all players with at least 3000 minutes of ice time. Here is the chart.
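For anyone who wants to repeat the comparison, here is a minimal sketch of the calculation. The player rows are made-up placeholders, not numbers from XtraHockeyStats.com, and the column layout (on-ice goals, on-ice expected goals, on-ice corsi events) is my assumption about how the data would be arranged.

```python
def shooting_pct(goals, attempts):
    """Goals (or expected goals) divided by corsi attempts."""
    return goals / attempts if attempts else 0.0

def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one column means no correlation to measure
    return (sxy * sxy) / (sxx * syy)

# Hypothetical rows: (name, on-ice goals, on-ice expected goals, on-ice corsi events)
players = [
    ("Player A", 150, 120.0, 2800),
    ("Player B", 95, 110.0, 2600),
    ("Player C", 130, 125.0, 2900),
    ("Player D", 80, 100.0, 2500),
]

csh = [shooting_pct(goals, corsi) for _, goals, _, corsi in players]
ecsh = [shooting_pct(xg, corsi) for _, _, xg, corsi in players]
print(f"r^2 between eCSh% and CSh%: {r_squared(ecsh, csh):.3f}")
```

The closer r^2 gets to the share of CSh% variance that is not luck, the more of the shot quality signal the model is capturing.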

[Chart: Abe eCSh% vs. CSh%]

So, there is a bit of a correlation. It isn’t great but it is there. In a recent article I looked at how time on ice correlated with a number of statistics, and its correlation with shooting percentage was significantly higher (r^2=0.452). From that perspective the expected goals model isn’t doing so well. The ice time coaches hand out correlates with shooting percentage better than this expected goals model does. One could argue that this means the “eye test” is outperforming the model. That is one piece of evidence that suggests that Abe’s expected goal model is not adequately accounting for shot quality.

The next thing I did was look at the difference between CSh% and eCSh% for individual players. Here are the top 20 and bottom 20 players in CSh%-eCSh%.

NAME               CSh%    eCSh%   CSh%-eCSh%
BRAD RICHARDS      5.64%   4.30%    1.34%
NATHAN HORTON      5.62%   4.30%    1.32%
SIDNEY CROSBY      5.75%   4.43%    1.32%
MARTIN ST LOUIS    5.86%   4.68%    1.17%
MIKE RIBEIRO       5.46%   4.29%    1.16%
RENE BOURQUE       5.09%   3.95%    1.14%
JAMIE BENN         5.47%   4.33%    1.13%
PASCAL DUPUIS      5.25%   4.14%    1.10%
ALEX TANGUAY       5.67%   4.57%    1.10%
STEVEN STAMKOS     5.58%   4.49%    1.09%
JASON SPEZZA       5.27%   4.19%    1.07%
ALEXANDER SEMIN    5.33%   4.27%    1.07%
BOBBY RYAN         5.76%   4.70%    1.06%
DAVID KREJCI       4.95%   3.96%    0.99%
JAROME IGINLA      5.26%   4.27%    0.99%
HENRIK SEDIN       5.34%   4.35%    0.98%
ALEX BURROWS       5.35%   4.37%    0.97%
JIRI HUDLER        5.30%   4.34%    0.96%
JOFFREY LUPUL      5.33%   4.36%    0.96%
THOMAS VANEK       5.57%   4.61%    0.96%

ERIK CONDRA        3.43%   4.05%   -0.61%
DWIGHT KING        3.60%   4.23%   -0.63%
GREGORY CAMPBELL   3.49%   4.13%   -0.64%
TRAVIS MOEN        3.56%   4.21%   -0.65%
DANIEL PAILLE      3.58%   4.27%   -0.69%
DREW MILLER        3.79%   4.49%   -0.69%
ZACK SMITH         2.89%   3.65%   -0.76%
MIKE RICHARDS      3.40%   4.17%   -0.77%
CHRIS DRURY        3.81%   4.62%   -0.81%
MARCUS KRUGER      3.40%   4.30%   -0.89%
RYAN CALLAHAN      3.87%   4.80%   -0.93%
PATRICK DWYER      3.00%   3.98%   -0.98%
TOMMY WINGELS      3.27%   4.27%   -1.00%
SHAWN THORNTON     3.12%   4.18%   -1.06%
NATE THOMPSON      3.47%   4.56%   -1.09%
TREVOR LEWIS       2.78%   3.93%   -1.16%
TODD MARCHANT      3.20%   4.39%   -1.19%
JORDAN STAAL       2.89%   4.15%   -1.25%
MATT MARTIN        3.31%   4.65%   -1.34%
BRIAN BOYLE        3.37%   4.73%   -1.36%

The thing that should immediately jump out at you is that practically all of the top 20 players are high end players, or in some cases linemates of high end players (Dupuis), while the bottom 20 are all 3rd and 4th line players. This does not bode well for the claim that the model adequately accounts for shot quality. If the model fully accounted for shot quality, the only thing that should remain when we calculate CSh% – eCSh% is the luck component of the observed CSh%. The luck component should be randomly distributed across all players, not good luck assigned to first liners and bad luck assigned to third and fourth liners. The only logical conclusion we can draw is that the model is not adequately accounting for shot quality.
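To get a feel for how unlikely that sorting would be under pure luck, here is a rough sketch using a hypergeometric calculation. The pool size (400 qualifying players) and the number of first liners in it (120) are assumptions I made up for illustration, not counts from the data set, and the function name is hypothetical.

```python
from math import comb

def prob_at_least(k, draws, group_size, population):
    """Chance that at least k of `draws` players chosen purely at random
    (without replacement) belong to a group of `group_size` players
    within a pool of `population` players."""
    total = comb(population, draws)
    upper = min(draws, group_size)
    hits = sum(
        comb(group_size, i) * comb(population - group_size, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

# Assumed for illustration only: 400 qualifying players, 120 of them first liners.
p = prob_at_least(18, 20, 120, 400)
print(f"chance that 18+ of a random top 20 are first liners: {p:.2e}")
```

If the residual really were just luck, a top 20 made up almost entirely of first liners would be an extremely rare draw under these assumed counts, which is the point of the argument above.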

Since the list is so well sorted, with first liners up top and 3rd/4th liners at the bottom, it seems that the luck component is still a relatively small component in CSh% – eCSh%. A significant portion of the variance we see there (which produced a range from -1.36% to +1.34%) is actually shot quality talent, not luck.

I don’t want to pick on Nick Abe because he is not alone here. Every shot quality model I have seen underestimates shot quality. It’s crazy to think about, but the hockey analytics community has been underestimating the importance of shot quality for a long time now. Shot quality exists. It is incredibly important. It is a big reason why good players are good players, as Tom Awad wrote about in 2010. It baffles me that to this day we have really smart people downplaying the role of shot quality in hockey, especially at the player level. It is mind boggling that just mentioning that a model might be underestimating shot quality results in an important hockey analytics blog bringing 7 of its authors together in an effort to (incorrectly) minimize the importance and value of looking at shot quality to near zero. I can’t believe we are still having this debate. Shot quality exists. It is important. Deal with it.

Now, I really do hope that @DTMAboutHeart’s expected goal model adequately accounts for shot quality, because it would be a major step forward. I have my doubts, but I am open to being proven wrong.

 

This article has 3 Comments

  1. David,

    I misunderstood your initial tweet. You’re talking about shooter quality not shot quality. I intentionally left shooter quality out of the eGF model. The reason I leave it out is because a) it naturally creates a bias and b) it actually reduces the information you receive from eGF.

    The bias it creates is obvious: if you’re a defenseman and you give up a shot to an average player, and eGF did account for shooter quality, then that shot would carry some eGF – let’s say 0.25. If however it was to a good shooter it would carry some new eGF of say 0.25 * (1 +/- Shooter Skill). So the very obvious problem is that a good defenseman spends more time against 1st liners – who, as your analysis points out, have a higher on-ice SH%. All else equal – being a top defenseman would automatically make your eGF% lower, or being a top forward would automatically make your eGF% higher.

    This then brings me to point B, which is the information lost. What we’re trying to do with an eGF is explain away the portion of GF/GA that can be explained by everything but shooter skill. If you include shooter skill then you end up with an eGF that just approximates GF/GA – which you already know. The purpose (at least in my opinion) of Corsi, Fenwick, or eGF is to determine how much of the actual GF% performance might be attributable to “playing the game right” and not just relying on a supernatural ability to shoot the puck.

    Finally, the chart you posted will tend to correspond very well with the ratings that I have. In particular that one is called OFFENSIVE AWARENESS. Because it is on ice CF% (not individual CF%) you can’t say that it is entirely shooter quality. Anyways, OFFENSIVE AWARENESS has a mean of 70 and a std deviation of 5. So if you wanted an eGF that fully took shooter quality into consideration you could still use mine.

    Simply take the eGF numbers provided on the website and add in a column for the individual’s offensive awareness. Since you’ve already calculated the std. deviation in this case to be 0.54% then a shooter quality eGF would be:

    original eGF + (OFF AW. – 70)/5 * 0.54

    You should then be able to isolate luck from skill.
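The adjustment in this comment can be sketched as a small function. This is only an illustration of the formula as written: the function and parameter names are hypothetical, and because the 0.54 is quoted as a percentage, the sketch assumes the value being adjusted is an expected shooting percentage expressed in percentage points.

```python
def shooter_adjusted(base_value, off_awareness, mean=70.0, sd=5.0, scale=0.54):
    """Apply 'original + (OFF AW. - 70)/5 * 0.54' from the comment.

    base_value    : the unadjusted figure (assumed here to be in percentage points)
    off_awareness : the player's OFFENSIVE AWARENESS rating
    mean, sd      : rating mean and standard deviation quoted in the comment
    scale         : the 0.54 standard deviation quoted in the comment
    """
    z_score = (off_awareness - mean) / sd
    return base_value + z_score * scale

# A player rated one standard deviation above the mean (75) gains 0.54.
print(round(shooter_adjusted(4.30, 75), 2))
```

A rating of exactly 70 leaves the figure unchanged, and a rating below 70 lowers it by the same rule.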

    1. If you are leaving shooter quality out of the model it isn’t so much expected goals as expected goals if you were an average shooter. That would have some value too for some applications but not for overall player value.

      I’ll have a look at your offensive awareness stat. I am not completely familiar with everything you have done but I’ll look at it more. I do think it is important to attempt to show how much of shot quality it is capturing and that is what I was trying to get to with this post.

      1. Yeah – my FTP is down right now for some reason or else I would do that calculation I talked about for you. I honestly have no idea how it would turn out relative to your expectations – but it should work. There is a detailed post in my blog about how the ratings work so you can learn more about them there. But to do it in the most unbiased way (i.e. not including data the eGF model shouldn’t have) you’d need to look at the ratings posted for a prior year i.e. 2010/2011 if you’re looking at 2011/2012 data. So it will be somewhat complicated to calculate and it might be easier for me to do an Excel file for you and post it when my FTP starts working again.

        And right now it is capturing absolutely 0 shot quality and it is expected goals if you are an average shooter – for the reasons stated above. I don’t believe shot quality should be included in expected goals.

        Thanks for the article though – always like to get feedback and comments.
