The other day on Twitter I questioned whether an existing expected goals model by @DTMAboutHeart adequately accounted for shot quality. That tweet, it seems, prompted a response from the Hockey Graphs crew in which they all take turns downplaying the importance of shot quality.
We already know that the impact of shot quality (context + skill) is minuscule in comparison to other factors – pet bugs
The question to ask then is: “why does shot quality have so little relative impact on long-term results?” –rjessop
I didn’t intend to start a debate on shot quality; I was simply expressing my doubts that the xG model adequately accounted for shot quality. I do need to look into it more, and @DTMAboutHeart tells me he will be releasing player data in the next few days, at which time I hope to evaluate it.
During the discussion though Nick Abe stepped in to promote his expected goals model.
— Nick Abe (@NickAbe) January 12, 2016
The good thing is he has made all the data available on his website, XtraHockeyStats.com, though I couldn’t find an actual formula for how the expected goals are calculated. With the data I was able to do a quick test to see if the model is adequately accounting for shot quality. Before I get into it, let’s start with some background.
What is shot quality?
Well, shot quality is basically answering the question “How difficult is this shot to save?” or, in a different light, “How likely is it that this shot will result in a goal?” Shot quality definitely exists: some shots are more difficult to save than others. Those who want to minimize the importance of shot quality go by the premise that, over the long haul, the average shot quality of all the shots taken while a player is on the ice (or by a whole team over the course of a season) will average out to pretty much the same value for all players and teams. The claim is that shot quality exists for a single shot, but across a large collection of shots taken while a player is on the ice there just isn’t much variability.
If shot quality exists, how would shot quality manifest itself in statistics?
The answer to this is simple: the most direct measure of shot quality is shooting percentage, as this is the conversion rate from shots to goals. If shot quality exists, players with a higher average shot quality will have a higher shooting percentage and players with a lower average shot quality will have a lower shooting percentage.
But wait, I look at the stats and players do in fact have different on-ice shooting percentages. Does this not mean shot quality exists?
The simple answer is: sort of yes, but maybe not. The problem is that, due to the infrequent nature of goals, there can be a lot of randomness in shooting percentages. It is really a sample size issue, and when you hear about “regression to the mean” this is basically what people are talking about. League-average shooting percentage during 5-on-5 play is a little shy of 8% (~7.8%, though it is much lower this season for some reason). Randomness over small sample sizes can produce unusually high or unusually low shooting percentages.

If a player is on the ice for 200 shots for, one would expect him to be on the ice for 16 goals (200 × 8%). With a couple of lucky bounces he might instead be on the ice for 18 goals, a 9% shooting percentage. Conversely, another player might be unlucky and be on the ice for only 14 goals, a 7% shooting percentage. Over the long haul a player’s luck will eventually even out: he may have started with some good luck, but then face a stretch of bad luck, and when a lucky player’s luck runs out his shooting percentage will drop, or regress, toward the expected 8%. So the real question is how much of the variation in observed shooting percentages is due to talent and how much is due to luck.
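To see how much spread pure luck can produce, here is a quick simulation (illustrative only, not from any real data): every simulated “player” has exactly league-average 8% talent and is on the ice for 200 shots, yet their observed shooting percentages still scatter widely.

```python
import random

random.seed(42)

LEAGUE_SH = 0.08   # approximate league-average 5-on-5 shooting percentage
SHOTS = 200        # shots-for while a player is on the ice

def simulate_sh_pct(n_players=1000, shots=SHOTS, true_sh=LEAGUE_SH):
    """Simulate players with identical true talent; return observed Sh% list."""
    pcts = []
    for _ in range(n_players):
        goals = sum(1 for _ in range(shots) if random.random() < true_sh)
        pcts.append(goals / shots)
    return pcts

pcts = simulate_sh_pct()
print(f"mean: {sum(pcts) / len(pcts):.1%}")
print(f"min:  {min(pcts):.1%}  max: {max(pcts):.1%}")
```

Even though every simulated player has the same 8% talent, some come out above 10% and others below 6% over 200 shots, purely by chance. That is the luck component the regression-to-the-mean argument relies on.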
There are models to estimate luck, but I won’t go into that here. Suffice to say that I believe existing models overallocate the variance in observed shooting percentage to luck and underallocate it to talent in maintaining a higher than average shot quality (or lower than average for those who lack talent).
I hope this frames the debate well but anyone who disputes it feel free to post in the comments.
How to test if expected goal models adequately account for shot quality?
There are more complex ways to do it, but one quick and easy way is to compare expected shooting percentage (expected goals divided by shots) to observed shooting percentage. We know there is some luck in observed shooting percentage, but to the extent that shot quality exists and the model is capturing it, there should be some correlation between expected and actual shooting percentage. Nick Abe’s website has corsi data instead of shot data, so I will be looking at expected corsi shooting percentage (eCSh%) vs. corsi shooting percentage (CSh%). I used all seasons from 2007-08 through this one combined, for all players with at least 3000 minutes of ice time. Here is the chart.
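The comparison itself is straightforward; a minimal sketch of computing r² between the two percentages is below. The sample values are hypothetical, purely for illustration; the real inputs would be the per-player eCSh% and CSh% figures from XtraHockeyStats.com.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Hypothetical eCSh% vs CSh% for a handful of players
ecsh = [0.045, 0.050, 0.048, 0.055, 0.042]
csh  = [0.052, 0.047, 0.058, 0.061, 0.040]
print(f"r^2 = {r_squared(ecsh, csh):.3f}")
```

If the model were fully capturing shot quality, the r² between expected and observed shooting percentage should be meaningfully above zero over a large sample of players.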
So, there is a bit of a correlation. It isn’t great, but it is there. In a recent article I looked at how time on ice correlated with a number of statistics, and its correlation with shooting percentage was significantly higher (r^2=0.452). From that perspective the expected goals model isn’t doing so well: coaches are handing out ice time in a way that correlates with shooting percentage better than this expected goals model does. One could argue this means the “eye test” is outperforming the model. That is one piece of evidence suggesting that Abe’s expected goals model is not adequately accounting for shot quality.
The next thing I did was look at the difference between CSh% and eCSh% for individual players. Here are the top 20 and bottom 20 players in CSh%-eCSh%.
| Player | CSh% | eCSh% | CSh% - eCSh% |
|---|---|---|---|
| MARTIN ST LOUIS | 5.86% | 4.68% | 1.17% |
The thing that should immediately jump out at you is that practically all of the top 20 players are high end players or in some cases are line mates of high end players (Dupuis) while the bottom 20 are all 3rd and 4th line players. This does not bode well for the claim that the model adequately accounts for shot quality. If the model fully accounted for shot quality the only thing that should remain when we calculate CSh% – eCSh% is the luck component of the observed CSh%. The luck component should be entirely randomly distributed across all players, not good luck assigned to first liners and bad luck assigned to third and fourth liners. The only logical conclusion we can have is that the model is not adequately accounting for shot quality.
Since the list is so well sorted, with first liners up top and 3rd/4th liners at the bottom, it seems that the luck component is still a relatively small part of CSh% – eCSh%. A significant portion of the variance we see there (which produced a range from -1.36% to +1.34%) is actually shot quality talent, not luck.
I don’t want to pick on Nick Abe because he is not alone here; every expected goals model I have seen underestimates shot quality. It’s crazy to think about, but the hockey analytics community has been underestimating the importance of shot quality for a long time now. Shot quality exists. It is incredibly important. It is a big reason why good players are good players, as Tom Awad wrote about in 2010. It baffles me that to this day we have really smart people downplaying the role of shot quality in hockey, especially at the player level. It is mind boggling that merely suggesting a model might be underestimating shot quality prompts an important hockey analytics blog to bring 7 of its authors together in an effort to (incorrectly) minimize the importance and value of looking at shot quality to near zero. I can’t believe we are still having this debate. Shot quality exists. It is important. Deal with it.
Now, I really do hope that @DTMAboutHeart’s expected goals model adequately accounts for shot quality, because that would be a major step forward. I have my doubts, but I am open to being proven wrong.