I know I am in a bit of a minority, but in my opinion one of the greatest failings of hockey analytics thus far is overstating the importance of Corsi at both the team and (especially) the individual level.
In a post yesterday about Luke Gazdic, Tyler Dellow of mc79hockey.com wrote:
We care about Corsi% because it predicts future goals for/against better than just using goals for/against.
The problem is, this is only partly true and is missing an important qualifier at the end of the sentence. It should read:
We care about Corsi% because it predicts future goals for/against better than just using goals for/against when sample sizes are not sufficiently large.
We can debate what ‘sufficiently large’ sample sizes are, but based on some past research I have done, I’d suggest that at the team level it is something less than a full season’s worth of data, and at the player level it is probably between 500 and 750 minutes of ice time, depending on shot rates.
In a post on the limits of Corsi at Arctic Ice Hockey, Garret Hohl writes:
Winning in puck possession and scoring chances is important and will lead to wins but does not encompass the full game. The largest factors outside of possession and chances are luck (ie: bounces), special teams, and combination of goaltending and shot quality (probably in that order).
The problem with that paragraph is that it gives no context of sample size, and sample size means everything when writing a sentence like that. If the sample were 3 games played by a particular team, luck would quite probably be the most important factor in determining how many of those 3 games the team wins. If the sample is 300 games, luck is mostly irrelevant. Without considering sample size, there is no way of knowing what the ‘luck factor’ truly is. Furthermore, luck mostly shows up in goaltending (save percentage) and shot quality (shooting percentage), so while goaltending and shooting talent can have minimal impact on winning over small samples, where luck dominates, we can’t know what impact they have over the long haul without looking at larger samples. Far too many conclusions about shot quality and goaltending have been drawn from samples that are too small, and far too few people have attempted to actually quantify the importance of shooting talent at the team level. As a result, far too often I hear statements like ‘Team X’s shooting percentage is unsustainable’ when in reality it is sustainable.
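To put a rough number on the ‘luck factor’, here is a minimal Python sketch that treats each shot as an independent trial scoring with a fixed true probability. That is a simplifying assumption (real shots differ in quality), and the function name `sh_pct_luck_sd` and the shot totals below are hypothetical round numbers chosen for illustration, not actual team data.

```python
import math


def sh_pct_luck_sd(true_sh_pct: float, shots: int) -> float:
    """Spread (one standard deviation, in percentage points) of observed
    shooting percentage caused by luck alone, assuming every shot is an
    independent trial that scores with probability true_sh_pct / 100."""
    p = true_sh_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / shots)


# Hypothetical round shot totals standing in for "a few games",
# "about a season" and "about five seasons"; not real team data.
for shots in (75, 1500, 7500):
    print(f"{shots:>5} shots: +/- {sh_pct_luck_sd(8.0, shots):.2f} points")
```

Under this simplified model, a team with a true 8% shooting percentage sees its observed percentage swing by roughly 3 points from luck alone over 75 shots, but only about 0.3 points over 7,500 shots: the luck component shrinks with the square root of the sample.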
Below are the average 5v5close shooting percentages of the top 5 and bottom 5 teams over the five seasons from 2007-08 to 2011-12, along with those same teams’ shooting percentages last season (2012-13) and this season (2013-14, through Saturday’s games).
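| 5v5close Sh% | 2007-08 to 2011-12 (5-year) | 2012-13 | 2013-14 (through Saturday) |
|---|---|---|---|
| Top 5 Avg | 8.27 | 9.01 | 8.18 |
| Bottom 5 Avg | 7.14 | 6.51 | 6.68 |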
What you will see is that the top 5 teams had an average 5-year shooting percentage 1.13 percentage points higher than the bottom 5 teams. This is not insignificant either: it means that, on the same number of shots, the top 5 teams will score almost 16% more goals than the bottom 5 teams purely because of the difference in shooting percentage. If one looks at 5-year CF/60, the top 5 teams in CF/60 are just over 17% higher than the bottom 5 teams. Thus, over a 5-year span there is very little difference between the variation in shooting percentage and the variation in Corsi rates.
Now, are shooting percentages sustainable? Well, in the 2 seasons since, one lockout-shortened and one not yet complete, the top 5 teams by 5-year shooting percentage have actually, on average, improved, while the bottom 5 teams have, on average, gotten worse. Aside from the 2012-13 NY Islanders, all the other bottom 5 teams remained well below average and nowhere close to any of the top 5 teams. There is no observable regression occurring here.
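The 16% figure follows from simple arithmetic: goals are shots times shooting percentage, so on an equal number of shots the ratio of goals is just the ratio of shooting percentages. A small Python check using the 5-year averages from the table (the variable names are illustrative):

```python
# 5-year (2007-08 to 2011-12) 5v5close Sh% averages from the table above.
top5_sh_pct = 8.27
bottom5_sh_pct = 7.14

# Difference in percentage points (not percent).
gap_points = top5_sh_pct - bottom5_sh_pct

# Goals = shots x Sh%, so on an equal number of shots the goal ratio
# is just the ratio of the two shooting percentages.
extra_goals = top5_sh_pct / bottom5_sh_pct - 1.0

print(f"gap: {gap_points:.2f} points, extra goals: {extra_goals:.1%}")
# gap: 1.13 points, extra goals: 15.8%
```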
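One way to make ‘no observable regression’ concrete is to ask how much of the 5-year gap between the two groups survives in the later seasons. The Python sketch below uses only the group averages from the table; the `gap_retained` helper is a hypothetical name, and the ratio it computes is an illustrative measure, not a formal statistical test.

```python
# Average 5v5close Sh% of the 5-year top 5 and bottom 5 teams, from the table.
# "2013-14" is a partial season (through Saturday's games).
BASELINE = "2007-08 to 2011-12"
top5 = {BASELINE: 8.27, "2012-13": 9.01, "2013-14": 8.18}
bottom5 = {BASELINE: 7.14, "2012-13": 6.51, "2013-14": 6.68}


def gap_retained(top: dict, bottom: dict, baseline: str = BASELINE) -> dict:
    """Later-season gap between the groups as a fraction of the baseline gap.

    Near 0 would suggest full regression to the mean, near 1 no regression,
    and above 1 means the gap widened.
    """
    base_gap = top[baseline] - bottom[baseline]
    return {
        season: (top[season] - bottom[season]) / base_gap
        for season in top
        if season != baseline
    }


for season, share in gap_retained(top5, bottom5).items():
    print(f"{season}: {share:.2f} of the 5-year gap retained")
```

Both later seasons come out above 1 (about 2.2 for 2012-13 and 1.3 for 2013-14 to date), meaning the gap widened. With five teams per group, a lockout-shortened season and a partial season, this is descriptive only.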
Based on these observations, one can conclude that when it comes to scoring goals at the team level, shooting percentage is pretty close to being as important as shot generation. I won’t show it here, but if one did a similar study at the player ‘on-ice’ level, one would find that the difference between the best and worst on-ice shooting percentage players is significantly more important than the difference in shot generation.
I don’t quite know why hockey analytics got this so wrong and has largely not yet come around to the importance of shot quality (it is slowly moving, but not there yet), as there have been some good posts showing the importance of shot quality, but they have largely been ignored by the masses. Part of the problem is certainly that some of the early studies of shot quality looked at too small a sample size. Another reason is that 2009-10 seems to have been a really strange year for shooting percentages at the team level. Toronto, Edmonton and Philadelphia (three of the 5-year top 5 teams) ranked 25th, 23rd and 20th in shooting percentage, while San Jose, the NY Islanders and New Jersey (three of the 5-year bottom 5 teams) ranked 6th, 10th and 13th. These were anomalies for all those teams, so any year-over-year studies that used 2009-10 probably produced atypical results and less valid conclusions. Finally, I think part of the problem is that hockey analytics has followed the lead of a few very vocal people and dismissed some other important but less vocal voices. Regardless of how we got here, for hockey analytics to move forward we need to move past the notion that shot-based metrics are more important than goal-based metrics.
Shot-based metrics are OK to use only when we don’t have a very large sample size. The thing is, that isn’t the case for most players and teams. The majority of NHL players have played multiple seasons in the NHL, and teams have a history of data we can look at. We can look at multiple years of data to see how sustainable a particular team’s or player’s percentages are. It isn’t that difficult to do and will tell us far more about the player than looking at his CF% this season.
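When combining several seasons, one reasonable approach (assumed here, not spelled out in the post) is to sum the raw counts rather than average the seasonal percentages, so a short or lockout-shortened season doesn’t carry full weight. A minimal Python sketch with a hypothetical player; the `Season` class, helper names and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Season:
    label: str
    goals: int  # on-ice (or team) goals for
    shots: int  # on-ice (or team) shots for


def pooled_sh_pct(seasons: list[Season]) -> float:
    """Multi-season Sh% as total goals over total shots, so each season
    is weighted by how many shots it contributed."""
    return 100.0 * sum(s.goals for s in seasons) / sum(s.shots for s in seasons)


def mean_of_season_sh_pct(seasons: list[Season]) -> float:
    """Unweighted mean of each season's Sh%, shown only for comparison."""
    return sum(100.0 * s.goals / s.shots for s in seasons) / len(seasons)


# Hypothetical player history, not real data.
history = [Season("Year 1", 40, 400), Season("Year 2", 36, 450), Season("Year 3", 12, 100)]
for s in history:
    print(f"{s.label}: {100.0 * s.goals / s.shots:.1f}%")
print(f"pooled: {pooled_sh_pct(history):.2f}%")
print(f"mean of seasons: {mean_of_season_sh_pct(history):.2f}%")
```

Here the unweighted mean (10.00%) is pulled up by a 100-shot season at 12%, while the pooled figure (9.26%) weights each season by its shots. Printing the seasons individually alongside the pooled number is what lets you see whether a player’s percentage is consistent from year to year.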
When I am asked to look at a player I am not particularly knowledgeable about, the first thing I typically do is open up his WOWY pages at stats.hockeyanalysis.com, especially the graphs that quickly indicate how the player performs relative to his teammates. I’ll maybe look at a multi-year WOWY first, and then look at several single-year WOWYs to see if there are any trends I can spot. I’ll primarily look at GF% WOWYs but will consider CF% WOWYs too, and maybe even GF20/GA20/CF20/CA20 WOWYs. I look for trends over time, not how the player did during any particular year. This is because the percentages can matter a lot for some players, and it is important to know which players can post good percentages consistently from year to year. I then may look at the player’s individual numbers such as GF/60, Pts/60 and Assists/60, as well as IPP, IGP and IAP, to determine how involved he was in the offense while he was on the ice (and I’ll do this looking at several individual seasons as well as multiple seasons combined). Then I’ll take a look at his linemates, quality of competition, and usage (zone starts, PP/PK ice time, etc.). Only then will I start to feel comfortable drawing any kind of conclusions about the player.
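For readers unfamiliar with these stats, here is a minimal Python sketch of how they can be computed from raw 5v5 counts. The definitions are assumptions: GF20 as goals for per 20 minutes of ice time, and IPP/IGP/IAP as the share of on-ice goals for on which the player recorded a point, a goal, or an assist. The exact conventions on stats.hockeyanalysis.com may differ, and the `OnIce` class, helper names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnIce:
    """5v5 on-ice and individual counts for one player-season (hypothetical)."""
    toi_min: float    # ice time in minutes
    gf: int           # on-ice goals for
    ga: int           # on-ice goals against
    cf: int           # on-ice Corsi (shot attempts) for
    ca: int           # on-ice Corsi against
    goals: int = 0    # individual goals
    assists: int = 0  # individual assists


def pct(f: float, a: float) -> float:
    """For-share of events, e.g. GF% or CF%."""
    return 100.0 * f / (f + a)


def summarize(p: OnIce) -> dict:
    """Rate and involvement stats; assumes at least one on-ice goal for."""
    pts = p.goals + p.assists
    return {
        "GF%": pct(p.gf, p.ga),
        "CF%": pct(p.cf, p.ca),
        "GF20": 20.0 * p.gf / p.toi_min,
        "GA20": 20.0 * p.ga / p.toi_min,
        "Pts/60": 60.0 * pts / p.toi_min,
        "IPP": 100.0 * pts / p.gf,        # % of on-ice GF with a point
        "IGP": 100.0 * p.goals / p.gf,    # % of on-ice GF he scored
        "IAP": 100.0 * p.assists / p.gf,  # % of on-ice GF he assisted on
    }


def wowy_gf_pct(with_player: OnIce, without_player: OnIce) -> float:
    """A teammate's GF% with the player minus his GF% without him."""
    return pct(with_player.gf, with_player.ga) - pct(without_player.gf, without_player.ga)


player = OnIce(toi_min=1000, gf=40, ga=30, cf=900, ca=850, goals=10, assists=18)
print({k: round(v, 2) for k, v in summarize(player).items()})

# One teammate's minutes with and without the player (hypothetical).
with_p = OnIce(toi_min=400, gf=18, ga=12, cf=380, ca=350)
without_p = OnIce(toi_min=600, gf=20, ga=25, cf=520, ca=560)
print(round(wowy_gf_pct(with_p, without_p), 2))
```

Running `summarize` on each season separately and on the combined counts is one way to follow the "trends over time" approach described above.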
I recently wrote an article suggesting hockey analytics is hard, and the above explains why. There is no single stat we can look at to find an answer. A goal-based analysis has flaws. A Corsi-based analysis has flaws. Looking at just a single season has flaws. Looking at multiple seasons has flaws. There are score effects, quality of teammates, quality of opponents and zone starts that we need to consider, not to mention sample sizes. Coaching/style of play is another area that hockey analytics has barely touched, and yet it probably has a significant impact on statistics and results (perhaps an especially significant one on Corsi statistics). Hockey analytics is hard and Corsi doesn’t have all the answers, so it is important not to reduce hockey analytics to looking up some Corsi stats and drawing conclusions. I fear that hockey analytics has over-hyped the importance of Corsi at the expense of other important factors, and that is unfortunate.