There has been a fair bit of discussion about shot quality among the hockey stats nuts over the past few weeks. It started with this Wall Street Journal article about defense independent goalie rating (DIGR), and several others have chimed in on the discussion, so it is my turn.
Gabe Desjardins has a post today talking about his hatred of shot quality and how it really isn’t a significant factor and is dominated by luck and randomness. Now, generally speaking, when others use the term shot quality they are mostly talking about things like shot distance/location, shot type, whether it was on a rebound, etc., because that is all data that is relatively easily available or easily calculated. When I talk shot quality I mean the overall difficulty of the shot, including factors that aren’t measurable, such as the circumstances (i.e. a 2-on-1, a one-timer off a cross-ice pass, the goalie getting screened, etc.). Unfortunately my definition means that shot quality isn’t easily calculated, but more on that later.
In his hatred post, Gabe dismisses pretty much everything related to shot quality in one get-to-the-point paragraph:
Alan’s initial observation – the likelihood of a shot going in vs a shooter’s distance from the net – is a good one. As are adjustments for shot type and rebounds. But it turned out there wasn’t much else there. Why? The indispensable JLikens explained why – he put an upper bound on what we could hope to learn from “shot quality” and showed that save percentage was dominated by luck. The similarly indispensable Vic Ferrari coined the stat “PDO” – simply the sum of shooting percentage and save percentage – and showed that it was almost entirely luck. Vic also showed that individual shooting percentage also regressed very heavily toward a player’s career averages. An exhaustive search of players whose shooting percentage vastly exceeded their expected shooting percentage given where they shot from turned up one winner: Ilya Kovalchuk…Who proceeded to shoot horribly for the worst-shooting team in recent memory last season.
So, what Gabe is suggesting is that players have little or no ability to generate goals aside from their ability to generate shots. Those who follow me know that I disagree. The problem with a lot of shot quality and shooting percentage studies is that sample sizes aren’t sufficient to draw conclusions at a high confidence level. Ilya Kovalchuk may be the only one that we can say is a better shooter than the average NHLer with a high degree of confidence, but it doesn’t mean he is the only one who is an above average shooter. It’s just that we can’t say that about the others at a statistically significant degree of confidence.
Part of the problem is that goals are very rare events. A 30-goal scorer is a pretty good player, but 30 events is an extremely small sample to draw conclusions from. Making matters worse, of the hundreds of players in the NHL only a small portion reach the 30-goal plateau. The majority fall in the 10-30 goal range, and I don’t care how you do your study, you won’t be able to say much of anything at a high confidence level about a 15-goal scorer.
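To put some numbers on how little one season of goals tells us, here is a quick sketch using the normal approximation to the binomial. The goal and shot totals below are illustrative assumptions, not any particular player's numbers.

```python
import math

# Sketch: 95% confidence interval on a player's true shooting percentage
# given his observed goals and shots (normal approximation to the binomial).
# Inputs are illustrative assumptions, not real player data.
def sh_pct_ci(goals, shots, z=1.96):
    """Return (low, high) bounds of the 95% CI on true shooting percentage."""
    p = goals / shots
    se = math.sqrt(p * (1 - p) / shots)
    return p - z * se, p + z * se

for goals, shots in [(15, 150), (30, 250), (120, 1000)]:
    lo, hi = sh_pct_ci(goals, shots)
    print(f"{goals:3d} goals on {shots:4d} shots: "
          f"95% CI on sh% = {100 * lo:.1f}% to {100 * hi:.1f}%")
```

A hypothetical 15-goal, 150-shot season leaves an interval of roughly 5% to 15%, wide enough to contain both a poor shooter and an elite one, which is exactly the sample-size problem described above.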
The thing is, though, just because you cannot say something at a high confidence level doesn’t mean it doesn’t exist. What we need to do is find ways of increasing the sample size to increase our confidence levels. One way I have done that is to use 4 years of data, and instead of individual shooting percentage I use on-ice shooting percentage (this is useful in identifying players who might be good passers and have the ability to improve their linemates’ shooting percentage). Just take the list of forwards sorted by on-ice 5v5 shooting percentage over the past 4 seasons. The top of that list is dominated by players we know to be good offensive players, and the bottom is dominated by third-line defensive role players. If shooting percentage were indeed random we would expect some Moen and Pahlsson types to be intermingled with the Sedins and Crosbys, but generally speaking they are not.
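To illustrate why pooling seasons helps, here is a toy simulation of my own. The talent levels and shot volumes are assumptions made up for the sketch (a 10% true on-ice shooter vs. a 7% one, ~500 on-ice shots-for per season), not measured values.

```python
import random

random.seed(7)

# Toy model (assumed numbers, not real data): a top-line forward with a
# true on-ice shooting percentage of 10% vs. a checking-line forward at 7%.
ELITE, CHECKER = 0.10, 0.07
SHOTS_PER_SEASON = 500  # rough on-ice 5v5 shots-for per season (assumption)

def observed_pct(true_pct, seasons):
    """Observed on-ice shooting percentage over `seasons` worth of shots."""
    shots = SHOTS_PER_SEASON * seasons
    goals = sum(random.random() < true_pct for _ in range(shots))
    return goals / shots

def ordering_holds(seasons, trials=1000):
    """How often the truly better shooter also ranks higher in the data."""
    wins = sum(observed_pct(ELITE, seasons) > observed_pct(CHECKER, seasons)
               for _ in range(trials))
    return wins / trials

print(f"1 season : correct ordering {100 * ordering_holds(1):.0f}% of the time")
print(f"4 seasons: correct ordering {100 * ordering_holds(4):.0f}% of the time")
```

With one season the noise still scrambles the ordering a meaningful fraction of the time; with four seasons pooled, the better shooter almost always sorts to the top, which is why the 4-year list looks so sensible.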
A year ago Tom Awad did a series of posts at Hockey Prospectus on “What Makes Good Players Good.” In the first post of that series he grouped forwards according to their even strength ice time. Coaches are going to play the good players more than the not so good players so this seems like a pretty legitimate way of stratifying the players. Tom came up with four tiers with the first tier of players being identified as the good players. The first tier of players contained 83 players. It will be much easier to draw conclusions at a high confidence level about a group of 83 players than we can about single players. Tom’s conclusions are the following:
The unmistakable conclusions from this table? Outshooting, out-qualitying and out-finishing all contribute to why Good Players dominate their opponents. Shot Quality only represents a small fraction of this advantage; outshooting and outfinishing are the largest contributors to good players’ +/-. This means that judging players uniquely by Corsi or Delta will be flawed: some good players are good puck controllers but poor finishers (Ryan Clowe, Scott Gomez), while others are good finishers but poor puck controllers (Ilya Kovalchuk, Nathan Horton). Needless to say, some will excel at both (Alexander Ovechkin, Daniel Sedin, Corey Perry). This is not to bash Corsi and Delta: puck possession remains a fundamental skill for winning hockey games. It’s just not the only skill.
In that paragraph, “shot quality” and “out-qualitying” refer to a shot quality model that incorporates things like shot location, “out-finishing” is essentially shooting percentage, and “outshooting” is self-explanatory. Tom’s conclusion is that the ability to generate shots from more difficult locations is a minor factor in being a better player, while both being able to take more shots and being able to capitalize on those shots are of far greater importance.
In the final table in his post he identifies the variation in +/- due to the three factors. This is a very telling table because it gives us an indication of how much each factor contributes to scoring goals. The following is the difference in +/- between the top tier of players and the bottom tier:
- +/- due to finishing: 0.42
- +/- due to shot quality: 0.08
- +/- due to outshooting: 0.30
In percentages, finishing ability accounted for 52.5% of the difference, outshooting 37.5%, and shot quality 10%. Just because we can’t identify individual player shooting ability at a high confidence level doesn’t mean it doesn’t exist.
If we use the above as a guide, it is fair to suggest that scoring goals is ~40% shot generation and ~60% the ability to capitalize on those shots (either through shot location or better shooting percentages from those locations). Shooting percentage matters and matters a lot. It’s just a talent that is difficult to identify.
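Working through Tom's tier-gap numbers from the table above, the arithmetic behind the percentages and the ~40/60 split looks like this:

```python
# Tom Awad's tier-gap numbers quoted above.
finishing, shot_quality, outshooting = 0.42, 0.08, 0.30
total = finishing + shot_quality + outshooting  # 0.80 of +/- gap in all

# Share of the top-tier vs. bottom-tier gap explained by each factor.
shares = {name: value / total
          for name, value in [("finishing", finishing),
                              ("shot quality", shot_quality),
                              ("outshooting", outshooting)]}
for name, share in shares.items():
    print(f"{name:13s}: {100 * share:.1f}%")

# The ~40/60 split in the text: shot generation vs. capitalizing on shots
# (finishing plus shot location).
generation = outshooting / total                   # 37.5%, i.e. "~40%"
capitalizing = (finishing + shot_quality) / total  # 62.5%, i.e. "~60%"
print(f"generation: {100 * generation:.1f}%  capitalizing: {100 * capitalizing:.1f}%")
```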
A while back I showed that goal rates are better than Corsi rates in evaluating players. In that study I showed that with just one season of data, goal-for rates predict future goal-for rates just as well as Fenwick-for rates do, and with two years of data goal-for rates significantly surpass Fenwick-for rates in predictability. I also showed that defensively, Fenwick-against rates are very poor predictors of future goal-against rates (to the point of uselessness), while goal-against rates were far better predictors of future goal-against rates, even at the single-season level.
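For readers who want the mechanics of a predictivity test like that, here is a generic sketch: correlate each player's season-1 metric with his season-2 goal-for rate. The function is plain Pearson correlation; the five-player data below is made up purely to show the shape of the comparison, and the printed r values mean nothing on their own.

```python
# Generic predictivity sketch: how well does a season-1 metric correlate
# with the season-2 goals-for rate? Toy data, invented for illustration.
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

gf_s1 = [0.80, 0.65, 0.75, 0.55, 0.90]   # season-1 goals-for per 20 min (toy)
ff_s1 = [14.0, 12.5, 13.0, 11.0, 15.0]   # season-1 Fenwick-for per 20 min (toy)
gf_s2 = [0.78, 0.60, 0.72, 0.58, 0.85]   # season-2 goals-for per 20 min (toy)

print(f"GF rate -> future GF rate: r = {pearson_r(gf_s1, gf_s2):.2f}")
print(f"FF rate -> future GF rate: r = {pearson_r(ff_s1, gf_s2):.2f}")
```

Run against real multi-season player data, whichever metric yields the higher out-of-sample r is the better predictor; that is the comparison the study above was making.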
The conclusion: there simply is no reliable way of evaluating a player statistically at even a marginally high confidence level using just a single year of data. Our choices are either performing a Corsi analysis and doing a good job at predicting 40% of the game, or performing a goal-based analysis and doing a poor job at predicting 100% of the game. Either way we end up with a fairly unreliable player evaluation. Using more data won’t improve a Corsi-based analysis because sample size isn’t the problem, but using more data can significantly improve a goal-based analysis. This is why I cringe when I see people performing a Corsi-based evaluation of players. It’s just not, and never will be, a good way of evaluating players.