David Johnson

Jan 17 2015
 

Shot quality as a talent at the team or player on-ice level has long been a topic of great debate, and I outlined some of that debate in an article I wrote earlier in the week. Those who don’t believe that shot quality is a significant factor in performance put a lot of stock in possession metrics such as Fenwick or Corsi. These are shot attempt based metrics and as such ignore shot quality altogether. For those, like myself, who believe shot quality matters (at least for some teams and especially some players), a possession based analysis is a (potentially) incomplete analysis. Today I am going to put that debate aside and ask the question: is there any relationship between possession and shooting percentage?

To answer this question I took a look at CF% and CSh% (Corsi shooting percentage = GF/CF) for all 30 teams over the previous 3 seasons combined in 5v5close situations. When I plot these, here is what I get.

Possession_vs_ShootingPct_3yr5v5close

Ok, so while there seems to be some correlation, it really isn’t all that significant. You might be inclined to end the investigation right here and conclude that there is no relationship, but when you actually look at the data you will find that of the 10 best CSh% teams, 8 are sub-50% CF% teams, and of the 10 worst CSh% teams, 6 are above-50% CF% teams. The two top CSh% teams with a CF% above 50% are Chicago and Pittsburgh, two teams with elite level talent. The four bottom CSh% teams with a CF% below 50% are Florida, Carolina, Minnesota and Buffalo. Of those teams, Florida, Carolina and Buffalo have combined for just one playoff appearance over the past 3 seasons.

So, it appears that the teams that break the trend of good CSh% equals poor CF% and poor CSh% equals good CF% are the truly good or truly bad teams, or, for lack of better terminology, outlier teams. What if we removed those really good and really bad outlier teams from our analysis and focused on the teams that are more typical in terms of talent? To do this in an unbiased way I used GF% to rank teams and removed the top 4 and bottom 4 GF% teams (8 total, or just over a quarter of the teams). This is what the chart looks like now.

Possession_vs_ShootingPct_3yr5v5close_5-26

Now that looks better. R^2 has jumped from 0.09 to 0.47 and there is a clear negative relationship between possession and corsi shooting percentage. For the record the teams that were removed were Boston, Anaheim, Chicago, Pittsburgh, Calgary, Buffalo, Edmonton, and Florida.
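The trimming step is simple to sketch in code. Here's a hypothetical version of it, not the code used for this article; the list-of-dicts layout and the "GF%"/"CF%"/"CSh%" keys are my assumptions.

```python
# A sketch of the outlier-removal step, assuming per-team stats live in
# a list of dicts with hypothetical "GF%", "CF%" and "CSh%" keys.
import numpy as np

def r_squared_without_outliers(teams, n_trim):
    """Drop the top and bottom n_trim teams by GF%, then compute the
    R^2 of a CF% vs CSh% linear fit on the remaining teams."""
    ranked = sorted(teams, key=lambda t: t["GF%"])
    middle = ranked[n_trim:len(ranked) - n_trim]  # trim both tails
    cf = np.array([t["CF%"] for t in middle])
    csh = np.array([t["CSh%"] for t in middle])
    r = np.corrcoef(cf, csh)[0, 1]                # Pearson r
    return r * r
```

With all 30 teams, n_trim=4 reproduces the middle-22 analysis here, and n_trim=6 the middle-18 analysis further down.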

Out of curiosity I took this one step further and removed the next two best (St. Louis, Detroit) and two worst (NY Islanders, Minnesota) teams and got the following chart.

Possession_vs_ShootingPct_3yr5v5close_7-24

Wow, R^2 jumps all the way to 0.77, which is a very strong correlation and indicates that for a large number of non-elite, non-terrible teams there is a strong negative relationship between possession and shooting percentage, such that the difference between a 45% and a 55% possession team is a 1.22% hit to CSh%. Considering the average team last season had about 2200 5v5close Corsi For events, that equates to a difference of about 27 goals. Considering the average NHL team had 90 5v5close goals last season, that is not an insignificant number.
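The ~27 goal figure follows directly from the two numbers quoted above; a quick back-of-envelope check:

```python
# Back-of-envelope check of the goal impact quoted above.
csh_gap = 0.0122      # 1.22% CSh% gap between a 45% and a 55% CF% team
corsi_for = 2200      # ~average 5v5close Corsi For events in a season
goal_diff = csh_gap * corsi_for
print(round(goal_diff, 1))  # 26.8, i.e. about 27 goals
```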

How does the R^2 hold up for this season? Well, if we include all teams the R^2 is 0.00, or absolutely no correlation. If we delete the top 4 and bottom 4 GF% teams it improves to 0.097. If we drop the top 6 and bottom 6 it jumps to 0.26, and if we drop the top 7 and bottom 7 teams and just focus on the middle 16 the R^2 jumps up to 0.35. Now these correlations are not nearly as good as the 3-year analysis above, but remember that our sample sizes are significantly smaller too (~43-45 games compared to 212 games). The general trend still continues: if we remove the really good and really bad outlier teams, there appears to be a relatively strong negative relationship between possession and shooting percentage.

Now that we have identified a relationship, one thing we can do is look at how teams have changed from last season to this season. Let’s take the Edmonton Oilers as an example, since they have improved their 5v5close CF% quite significantly this season yet are not an improved team. Let’s look at their numbers from last season and this season.

Season      CF%   CSh%
2014-15     48.7  3.49
2013-14     43.4  4.38
Difference   5.3 -0.89

So, their 5v5close CF% has improved from 43.4% to 48.7%. If we plug that 5.3% improvement into the regression equation above, we would expect their CSh% to drop by 0.65%, where it actually dropped 0.89%. Edmonton fell from 11th in CSh% last season to 27th this season.
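As a sanity check on that arithmetic: the 1.22% CSh% gap per 10 possession points quoted earlier implies a slope of about -0.122 CSh% points per CF% point (the exact fitted slope may differ slightly), and applying it to Edmonton's CF% change gives the expected drop.

```python
# Rough check of the expected CSh% change for Edmonton.
slope = -0.122                   # CSh% points per CF% point (approximate)
cf_change = 48.7 - 43.4          # Edmonton's CF% improvement
expected = slope * cf_change
print(round(expected, 2))        # -0.65, vs the actual -0.89
```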

A couple of months ago I investigated the relationship between Corsi against rates and save percentage and found that there does appear to be a relationship, such that an increase in corsi against corresponds with an improved save percentage. This is completely consistent with the analysis above, from which one could infer that an increase in shot attempts correlates with a decrease in shooting percentage.

It is difficult to say whether these correlations are due to systems or talent but I have a couple theories.

  1. Good possession teams play in the offensive zone more frequently and the defensive zone less frequently. This could result in a shot type bias away from higher quality “rush shots” and towards lower quality zone play shots.
  2. It could be related to style of play and passing. It has been shown that shots after passes are more likely to result in goals, and lateral movement, especially passes across the “Royal Road” down the center of the ice, also results in more goals. My theory is that passing, and in particular passing through the center of the ice, while more likely to result in a goal, is also more likely to result in a turnover. Thus teams that attempt riskier, longer passes, especially lateral passes, are more likely to see plays result in a goal if successful or a turnover (and no shot from that possession) if unsuccessful. Conversely, a more conservative passing team with fewer cross-ice passes through traffic would have fewer possessions end in turnovers, but in turn would not get rewarded with the high quality shots that result from those risky cross-ice plays.

In conclusion, if you have exceptional talent such as Pittsburgh with Crosby and Malkin or Chicago with Kane and Toews  or exceptional depth like Boston or Detroit you might be able to be a good possession and a good shooting percentage team but if you are not one of the truly elite teams in the league it seems you likely have to choose one or the other. Unless of course you are Buffalo and you are terrible at both.

 

Update: Tyler Dellow, in one of his few hockey related tweets since being hired by the Oilers, tweeted the following:

Tyler is right. Things fall apart for earlier seasons. Let’s examine this in more detail by looking at the R^2 between CF% and CSh% for individual seasons, for all teams and for the middle 26, 22, and 18 GF% teams. Here is what we have:

Possession_vs_ShootingPct_SingleYear5v5Close

All of the above relationships are negative, meaning improved CF% led to decreased CSh%, so it is very difficult to argue that this relationship isn’t real. More shots tends to mean lower shot quality.

Additionally, for 5 of the 7 seasons the middle 22 are better than the middle 26 which is better than all 30 teams (only 2007-08 and 2010-11 do not fit) and of those 5 seasons, four of them also have the middle 18 teams being better than the middle 22 (only 2011-12 is worse). This implies that there may be a few truly elite teams that can post a good CF% and a good CSh% and a few truly terrible teams that put up bad CF% and bad CSh% but for the mass of teams in the middle the trend holds.

Finally, the strongest relationships have occurred in the most recent seasons after removing the outlier teams from the sample, and from the numbers above, 2014-15 appears to be following that trend as well. It is difficult to say why, but it is an interesting observation. One has to wonder if it has anything to do with teams becoming more aware of and putting more focus on possession, which in turn is strengthening the negative correlation with shooting percentage.

 

Jan 12 2015
 

Let me start off by first saying that this isn’t going to be a research post as much as it will be a commentary on the past, present and future of shot quality research.

The History

I have had more than a few battles over shot quality, so I feel I have a more than decent understanding of the subject. As outlined in this post by Michael Schuckers, there are two aspects of shot quality: the quality of an individual shot, and the average shot quality of all shots taken by a team or by a player when on the ice.

Individual Shot Probability does matter and this has been illustrated time and time again.  There’s no doubt about it.  The most recent example is the distance analysis by Michael Parkatti.  Different shots have different probabilities of going in and there are plenty of factors that influence these probabilities, including the x and y coordinates as well as the type of shot.  Here are some heat maps to emphasize that.

What has not been shown to matter much, to my mind, is Average Shot Probability (ASP), either for shots that a goalie has faced or that a team has faced or that a team has generated over a long period of time.  It might be there but the consensus (yes, David, I see your hand is up)  is that it is not.  I’ve tried to look for it.  It, ASP, matters but not a ton.  I’ve got plans to take another look at it again this winter.  But there is little denying that where we are right now is that we lack evidence for the value of long term repeatable ASP.  Somewhere there’s a fourteen year old kid with mad R skills and a great idea on how to model these data and, perhaps, they’ll find that shot quality exists.  It’s just that right now we don’t have enough evidence for it.

Yes, the “David” with his hand up is me. I have always claimed that shot quality exists in the Average Shot Probability sense of the term. I believed it then and I believe it now. The reason I believe it is that the data supports it: there is clearly a difference between the on-ice shooting percentages of the players at the top of this list and the players at the bottom, and the players at the top are mostly those we all consider the elite offensive players in the league while the ones at the bottom are mostly 3rd and 4th line players. This isn’t coincidence or randomness, and it is the strongest evidence in support of shot quality we have. There are even teams that have consistently posted above or below average shooting percentages. Shot quality in the Average Shot Probability sense exists and we must acknowledge that.

A number of people have analyzed shot location or shot distance data (including Schuckers, myself and numerous others) and have found relatively little indication that shot location varies across teams significantly enough to have a meaningful impact on shooting percentage. This does not mean that Average Shot Probability does not exist (which Schuckers implied was the consensus) but rather that Average Shot Probability is not significantly influenced by variations in average shot location. There is an important difference, and the latter does not mean that Average Shot Probability does not exist (I think this is the crux of many of my past shot quality debates, like those with Gabe Desjardins).

One of my favourite articles on this subject is one written by Tom Awad in his “What makes good players good” series of posts. If you haven’t read this article I recommend you go read it now. It is probably the best article written on shot quality even though it isn’t explicitly about that. The most important thing to note is the last table which I will reproduce here:

Group     +/- due to finishing  +/- due to shot quality  +/- due to outshooting
1st tier   0.22                  0.04                     0.15
2nd tier   0.07                  0.02                     0.10
3rd tier   0.00                  0.01                    -0.06
4th tier  -0.20                 -0.04                    -0.15

In this table, ‘finishing’ is essentially having a better shooting percentage than your opponents, ‘shot quality’ is having a better average shot location, and ‘outshooting’ is as it sounds, outshooting your opponents. The greatest spread in talent between first tier and fourth tier players is the ability to out-finish opponents, followed closely by outshooting them. Having a better average shot location is a relatively minor factor in what makes good players good. A key takeaway is that average shot location has a relatively small impact on average shot probability, which is consistent with what everyone has found. This is the “consensus” that Schuckers is talking about.

Recent Developments

War-on-ice has recently come up with a new definition of a scoring chance and added the results to their statistical database. The definition starts with the notion of “danger zones,” areas surrounding the zone in front of the goal (similar to the “home plate” definition), with additional adjustments for rebound shots and rush shots (the latter based on my work from this past summer). Their formal definition of a scoring chance is as follows:

  • In the low danger zone, unblocked rebounds and rush shots only.
  • In the medium danger zone, all unblocked shots.
  • In the high danger zone, all shot attempts (since blocked shots taken here may be more representative of more “wide-open nets”, though we don’t know this for sure.)
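The three rules above amount to a small classifier. Here's a sketch of the definition as described; it assumes each shot already carries a danger-zone label and rebound/rush flags (how those get assigned is outside this sketch, and the argument names are mine).

```python
# Sketch of the war-on-ice scoring chance definition described above.
def is_scoring_chance(zone, blocked, rebound, rush):
    if zone == "high":
        return True                    # all attempts, blocked included
    if zone == "medium":
        return not blocked             # all unblocked shots
    if zone == "low":
        return (not blocked) and (rebound or rush)  # rebounds/rush only
    return False
```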

This definition, while likely an improvement over anything we have had previously (and there is evidence to support that), is still significantly dependent on shot location, and given the history of not being able to find much of a link between shot location and shot quality, I have concerns about it. In particular, are we watering down the definition of a scoring chance by relying too much on location? Might we get better results by looking at just rush and rebound shots? Until I see a formal analysis showing that shot location is a major factor in Average Shot Probability at either the team or player level, I have my doubts that using shot location to define a quality scoring chance is beneficial (it may in fact be harmful by diluting the definition).

Some other really interesting recent work is being done by former NHL goalie Stephen Valiquette, who identifies higher quality shots as (for the most part) those resulting from plays with significant lateral movement. In particular he defines the “Royal Road” as the line down the middle of the ice from one end to the other; when the puck moves laterally across this line, either by a pass or by being skated across, immediately before a shot is taken, the shot is more likely to result in a goal. To me this makes a lot of sense and I think this is really where the next great leap in shot quality analysis will come from. Speed of the play (i.e. rush shots) and lateral puck movement are likely the largest contributing factors to shot quality.

In support of the idea that puck movement is a significant factor in shot quality: a couple of years ago I looked at the relative impact a player can have on his linemates’ shooting percentages and found that many of the best players at boosting linemate shooting percentage are excellent playmakers.

The Future

The challenge with Valiquette’s “Royal Road” work is that it currently requires a lot of manual data tracking, which is time consuming and has the potential to introduce human error into the analysis. Furthermore, it doesn’t account for the speed of the play, which may also be a significant factor in shot quality as it limits the goalie’s reaction time. While I believe Valiquette’s work is a significant step forward in our understanding of the game, the holy grail of shot quality research will be when the NHL introduces player and puck tracking technology. When we get that data we will be able to dig far deeper into shot quality research and define shot quality in far greater detail. Everything that has been done up until now will pale in comparison to what we’ll be able to do with automated player and puck tracking data.

 

2014-15 vs 2013-14 Rush Shots and shooting/save percentages

Dec 31 2014
 

I have mentioned on twitter that I have been looking at rush shots as a percentage of overall shots and how teams have fared compared to last season. Here is how the teams have done from an offensive perspective.

Team RushShot% For Diff Sh% Diff
NY Rangers 7.24% 2.47%
Arizona 6.07% 0.92%
San Jose 5.89% -0.01%
Edmonton 5.64% -1.23%
Buffalo 5.02% 2.70%
Washington 3.64% 0.65%
Anaheim 3.10% -2.09%
Calgary 2.83% -0.33%
Pittsburgh 2.71% 0.72%
Columbus 2.51% -0.23%
New Jersey 2.37% 1.57%
Vancouver 2.31% 1.32%
Winnipeg 2.21% -1.71%
Montreal 1.52% -0.11%
Los Angeles 1.43% 1.09%
Minnesota 1.21% 1.12%
St. Louis 1.20% -1.16%
Ottawa 0.96% -0.04%
Florida 0.88% -2.26%
Dallas 0.81% 0.89%
Tampa Bay 0.48% 1.90%
Detroit -1.35% 0.08%
Carolina -1.64% -1.04%
Philadelphia -1.68% 0.91%
Chicago -1.71% 0.71%
Colorado -2.79% -2.84%
Boston -3.35% 0.88%
NY Islanders -3.37% 1.31%
Toronto -3.77% 1.62%
Nashville -3.97% 1.13%

In this table, RushShot% For Diff is the difference between this season’s 5v5 road RushShots/TotalShots and last year’s. A positive number indicates a higher percentage of a team’s shots are coming on the rush this season; a negative number indicates a lower percentage. Sh% Diff is the difference in 5v5 road shooting percentage between this season and last season (positive means improved shooting percentage).

Rush shots are generally tougher shots, and thus one would expect that a higher percentage of rush shots for would boost team shooting percentage. For some teams like the Rangers, Arizona and Buffalo this is true, while for others like Edmonton it hasn’t held up. Certainly sample size and randomness are issues here, as are roster changes. Overall the correlation between the two stats is 0.067, which isn’t great.

Here are comparable stats from a defensive standpoint.

Team RushShot%Against Diff Sv% Diff
Colorado 6.33% -0.37%
Minnesota 4.13% -1.41%
Montreal 4.05% 1.69%
Chicago 3.82% 1.61%
Carolina 3.10% -2.74%
Boston 3.08% -1.65%
Pittsburgh 3.01% 1.82%
Los Angeles 2.74% -2.37%
Edmonton 2.55% -2.29%
Anaheim 2.55% 0.68%
Toronto 2.09% -0.38%
Arizona 2.06% -1.87%
Washington 2.05% 1.78%
Nashville 1.07% 3.53%
Ottawa 0.94% -0.09%
San Jose 0.82% 0.95%
Columbus 0.50% -0.06%
Winnipeg 0.49% 1.33%
Dallas 0.21% -0.66%
Vancouver 0.10% -0.01%
New Jersey -0.08% 1.87%
Philadelphia -0.42% 0.41%
Tampa Bay -1.38% -1.93%
Detroit -1.66% 1.05%
Buffalo -1.76% -0.69%
Calgary -2.41% 1.55%
NY Rangers -2.93% 0.13%
St. Louis -3.36% 0.22%
NY Islanders -3.66% -0.44%
Florida -6.51% 3.28%

Here we would expect a higher RushShot Against Differential to lead to a lower save percentage as more rush shots against should lead to a higher average shot quality against. We see a higher correlation here than on the offensive side of things as the correlation between the two stats is -0.275.

I’ll revisit these tables as the season progresses to see if the correlations improve but the observations are interesting nonetheless.

(Note: See here for all my articles on rush shots from the summer. Of particular interest is the introduction to rush shots and why, with how I have defined rush shots, we are limited to using road data.)

Why zone starts don’t matter much

Dec 13 2014
 

I have written a number of posts on zone starts and how they generally don’t have a significant impact on a player’s overall statistics, but I constantly run across people who find that difficult to accept. There are still studies being done looking at how long the impact of a zone start lasts (this was based on some of the work of Tyler Dellow). While these are interesting studies, the important thing to understand is that while there is an impact, it has relatively little effect on a player’s overall statistics.

Before Tyler Dellow was hired by the Edmonton Oilers he suggested that my 10-second adjustment was not sufficient, and like the article I linked to above, he suggested the impact lasts much longer, though I could never pin down exactly what his number was. In my work I determined that beyond 10 seconds after a zone face off there is no noticeable impact on a player’s statistics, and even the 10-second adjustment is minimal. I hope to explain why with this post.

The first significant fact to know is that when you remove the 10 seconds of play following all offensive and defensive zone face offs from a player’s 5v5 statistics, you are removing approximately 15% of their ice time.

Now, let’s consider a hypothetical player who is a 50% corsi player during normal, non-face-off-influenced play, a 100% corsi player for the 10 seconds after an offensive zone face off, and a 0% corsi player for the 10 seconds after a defensive zone face off (these are the most extreme scenarios, which in reality don’t happen).

Now, let’s also assume the player is a 70% dzone face off player, meaning that of all the offensive and defensive face offs he is on the ice for, 70% are in the defensive zone and 30% are in the offensive zone. This is a pretty extreme zone start differential that only a handful of players get.

So now we have 85% of a players 5v5 ice time beyond the 10 seconds after a zone face off at 50% corsi. We have 4.5% of his ice time (30% of 15%) after an offensive zone face off with 100% corsi and we have 10.5% of his ice time after a defensive zone face off with 0% corsi. Add that all up and his expected corsi is 50%*0.85 + 100%*0.045 + 0%*0.105 = 47%.

That means, for this extreme zone start player, the maximum impact of the 10 seconds after a face off is a drop in his Corsi from 50% to 47%. In reality it would be less, because corsi isn’t 100%/0% for the 10 seconds after an offensive/defensive face off, but for the purposes of identifying an upper bound on the impact this suffices.

Now, one might suggest that the impact of a zone start lasts more than 10 seconds, which may be true. Remember, though, that the second 10 seconds would account for a smaller share of a player’s overall ice time than the 15% of the first 10 seconds, since there might be another face off or a line change. I don’t have that number off hand, but let’s assume it is 8%. Furthermore, the impact on Corsi will be far less significant in that second 10 seconds: likely more like 65%/35% than 100%/0%. If the next 10 seconds accounted for 8% of his ice time with a 65%/35% ozone/dzone corsi, it would drop his Corsi% from 47% to 46.5%, just an additional 0.5%, which is pretty much within the range of noise. Beyond that the impact would be negligible.

Now, let’s do these same calculations for a guy who has 60% defensive zone face offs and 40% offensive zone face offs which is far more common than 70/30. This player would have his 10 second impact take him from 50% corsi to 48.5% corsi. The following 10 seconds would see his corsi drop from 48.5% to 48.26%, just an additional quarter percent. The majority of players will be within this range (of all players with 500 5v5 minutes last season 87.4% were within 40-60% DZone%) and have a maximum potential impact of +/- 1.75%.
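The two worked examples above can be wrapped into one calculation. The ice-time shares and the 100%/0% then 65%/35% post-faceoff Corsi splits are the extreme assumptions from the text, not measured values:

```python
# Upper-bound Corsi% impact of zone starts, per the assumptions above.
def corsi_with_zone_starts(dzone_share, base=0.50,
                           first10_share=0.15, next10_share=0.08):
    ozone_share = 1.0 - dzone_share
    # first 10s after a faceoff: 100% Corsi after OZ draws, 0% after DZ
    first10 = first10_share * (ozone_share * 1.00 + dzone_share * 0.00)
    # next 10s: the milder 65%/35% split
    next10 = next10_share * (ozone_share * 0.65 + dzone_share * 0.35)
    rest = (1.0 - first10_share - next10_share) * base
    return rest + first10 + next10

print(round(corsi_with_zone_starts(0.70), 4))  # 0.4652, the ~46.5% above
print(round(corsi_with_zone_starts(0.60), 4))  # 0.4826, the 48.26% above
```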

Let’s take a look at some of the players who had the most intense defensive zone starts from last season and see how their 5v5 stats compare with their 5v5 – 10s after a zone start stats (which I call F10 stats) to show the true impact.

Player_Name Dzone% CF% F10CF% CF%-F10CF%
BOYD GORDON 82.0% 42.3 44.4 -2.1
MANNY MALHOTRA 79.1% 41.6 44.1 -2.5
JAY MCCLEMENT 71.8% 38.8 40.8 -2
BRANDON BOLLIG 81.7% 51 53.3 -2.3
MARCUS KRUGER 79.1% 51.6 54.4 -2.8
DOMINIC MOORE 75.4% 48.5 49.5 -1
PAUL GAUSTAD 71.7% 44.5 45.8 -1.3
BRIAN BOYLE 76.2% 47.1 48.5 -1.4
ADAM HALL 72.0% 44.1 44.5 -0.4
BRAD RICHARDSON 67.7% 47.7 47.9 -0.2
BEN SMITH 73.8% 51 53 -2
RADEK DVORAK 68.3% 42.8 43.4 -0.6
MATT HENDRICKS 72.2% 41.6 42.3 -0.7
DRAYSON BOWMAN 65.4% 45.9 47 -1.1
KYLE BRODZIAK 66.5% 44 45.6 -1.6
DAVID JONES 64.2% 45.3 45 0.3
TORREY MITCHELL 64.3% 45.5 45.3 0.2
PAUL RANGER 60.3% 42.4 43.8 -1.4
MATT COOKE 65.6% 45.1 46.4 -1.3
JEFF HALPERN 65.4% 49.1 49.7 -0.6

The biggest impact is just -2.8% (Marcus Kruger) and the average impact among these players is just -1.24%. This is well below my maximum impact estimates above, so my theory overestimates reality. The following scenarios might explain why.

Scenario 1: Player A is on the ice for a neutral zone face off which his team loses. The opposing team immediately goes on the offense and takes a shot, which the goalie saves, directing it into the corner, where the opposing team retrieves the puck and takes another shot, which the goalie saves and covers up, ending play. This would account for 2 shots against after a neutral zone face off, both of which Player A was on the ice for.

Scenario 2: Player A is on the ice for a neutral zone face off which his team loses. The opposing team immediately goes on the offense and takes a shot which the goalie saves and covers up, forcing a face off. Player A along with all his teammates remains on the ice for the defensive zone face off, which his team again loses, and the opposing team takes a shot which the goalie saves and covers again. This would account for 1 shot after a neutral zone face off and 1 shot after a defensive zone face off.

The reality is, both of these scenarios should be accounted for identically, as it was losing the neutral zone face off and letting the opposing team enter the zone that resulted in both shots. The fact that there was a face off between the shots doesn’t change that, provided the players didn’t change. We can’t let players off the hook just because the goalie covered the puck and forced a face off between shots. Really, when we count zone starts we should only count face offs where the player was not on the ice prior to that face off. By not doing so we are improperly assigning credit/blame for some shots for/against. This is why in reality the impact is smaller than I calculated in theory.

Do zone starts matter? Yes, a bit, for players with the most extreme zone start usage. For the majority of players it’s hardly worth considering.

 

Dec 8 2014
 

I have tackled the subject of on-ice shooting percentage a number of times here, but I think it is a subject that has been under-researched in hockey analytics. Historically people have done some split-half comparisons, found weak correlations, and written shooting percentage off as not being a significant or useful factor in hockey analytics. While some of that research has merit, a lot of it deals with sample sizes too small to produce really useful correlations. A split-half season correlation across the majority of players includes players that might have 3 goals in the first half and 7 in the second half, and that is just not enough to draw any conclusions from. Even year-over-year correlations have their issues: in addition to smallish sample sizes, they suffer from problems related to roster changes and how roster changes impact on-ice shooting percentages. Ideally we’d want to eliminate all these factors and get down to actual on-ice shooting percentage talent, factoring out both luck/randomness and roster changes.

Today @MimicoHero posted an article discussing shooting percentage (and save percentage) by looking at multi-year vs multi-year comparisons. It’s a good article, so have a read; I have written many articles like it in the past. This is important research but, as I alluded to above, year-over-year comparisons suffer from issues related to roster change, which potentially limits what we can learn from the data. People often look at even/odd games to eliminate these roster issues, and that is a pretty good methodology. Once, in the past, I took this idea to the extreme and used even/odd seconds in an attempt to isolate true talent from other factors (note that subsequent to that article I found a bug in my code that may have impacted the results, so I don’t have 100% confidence in them; I hope to revisit this in a future post to confirm the results). This pretty much assures that the teammates a player plays with, the opponents they play against, and the situations they play in will be almost identical in both halves of the data. I hope to revisit the even/odd second work in a future post to confirm and extend that research, but for this post I am going to focus solely on shooting percentage and use an even/odd shot methodology, which should also do a pretty good job of removing roster change effects.
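The even/odd shot split itself is trivial to implement. A minimal sketch, assuming each player's on-ice shots come as an ordered list of 0/1 goal outcomes (the data layout is my assumption):

```python
# Split a player's on-ice shots by parity and compute each half's Sh%.
def even_odd_shooting_pct(goal_flags):
    even = goal_flags[0::2]           # 1st, 3rd, 5th, ... shots
    odd = goal_flags[1::2]            # 2nd, 4th, 6th, ... shots
    return sum(even) / len(even), sum(odd) / len(odd)
```

Correlating the two halves across all qualifying players gives an even-vs-odd correlation coefficient for each sample-size cutoff.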

I took all 5v5 shot data from 2007-08 through 2013-14, and for each forward I took the first 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800 and 2000 shots that they were on the ice for. This allowed me to do 100 vs 100 shot, 200 vs 200 shot, up to 1000 vs 1000 shot comparisons. For comparison’s sake, in addition to even/odd shots I am also going to look at first half vs second half comparisons to get an idea of how different the correlations are (i.e. what the impact of roster changes is on a player’s on-ice shooting percentage). Here are the resulting correlation coefficients.

Scenario SplitHalf EvenVsOdd NPlayers
100v100 0.186 0.159 723
200v200 0.229 0.268 590
300v300 0.296 0.330 502
400v400 0.368 0.375 443
500v500 0.379 0.440 399
600v600 0.431 0.481 350
700v700 0.421 0.463 319
800v800 0.451 0.486 285
900v900 0.440 0.454 261
1000v1000 0.415 0.498 222

And here is the table in graphical form.

EvenVsOdd_FirstvsSecondHalf_ShPct

Let’s start with the good news. As expected, even vs odd correlations are better than first half vs second half correlations, though the difference really isn’t as significant as I might have expected. This is especially true at the larger sample sizes, where the gap between the two methods should theoretically widen.

What I did find a bit troubling is that correlations seem to max out at 600 shots vs 600 shots and even those correlations aren’t all that great (0.45-0.50). In theory as sample size increases one should get better and better correlations and as they approach infinity they should approach 1.00. Instead, they seem to approach 0.5 which had me questioning my data.

After some thought, though, I realized the problem was likely the decreasing number of players within the larger shot total groups. This restricts the spread in talent, as only top level players remain in those larger groups: as you increase the shot requirement, you start weeding out the lesser players who get less ice time and are on the ice for fewer shots. So, while randomness decreases with an increased number of shots, so does the spread in talent. My theory is the signal (talent) to noise (randomness) ratio is not actually improving enough to see improving results.

To test this theory I looked at the standard deviations within each even/odd group. Since we have a definitive N value for each group (100, 200, 300, etc.) and can calculate the average shooting percentage, it is possible to estimate the standard deviation due to randomness. With the overall standard deviation and an estimated standard deviation due to randomness, it is possible to calculate the standard deviation in on-ice shooting percentage talent. Here are the results of that math.
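The decomposition works because binomial noise on N shots has a known size. A sketch of the math, assuming a league-average on-ice Sh% of about 7.7% (my assumed input; the exact average used for the table may differ):

```python
import math

# Estimate SD of true on-ice Sh% talent from the observed spreads.
def talent_sd(sd_even, sd_odd, n_shots, avg_sh_pct):
    # binomial noise for n_shots at the league-average Sh%
    sd_random = math.sqrt(avg_sh_pct * (1 - avg_sh_pct) / n_shots)
    sd_observed_sq = (sd_even**2 + sd_odd**2) / 2   # pool both halves
    sd_talent = math.sqrt(max(sd_observed_sq - sd_random**2, 0.0))
    return sd_talent, sd_random
```

Plugging in the 100v100 row (2.98%, 2.84%, N=100) returns roughly 1.2% talent and 2.67% randomness, close to that row of the table.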

Scenario SD(EvenSh%) SD(OddSh%) SD(Randomness) SD(Talent)
100v100 2.98% 2.84% 2.67% 1.15%
200v200 2.22% 2.08% 1.91% 1.00%
300v300 1.99% 1.87% 1.56% 1.14%
400v400 1.71% 1.70% 1.35% 1.04%
500v500 1.56% 1.57% 1.21% 1.00%
600v600 1.50% 1.50% 1.11% 1.01%
700v700 1.35% 1.39% 1.03% 0.90%
800v800 1.35% 1.33% 0.96% 0.93%
900v900 1.24% 1.26% 0.91% 0.86%
1000v1000 1.14% 1.23% 0.86% 0.81%
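The variance decomposition described above can be sketched in a few lines of Python. This is only a sanity check, and the ~7.8% league-average on-ice shooting percentage used below is my assumption, not a figure from the table:

```python
import math

def talent_sd(sd_even, sd_odd, avg_sh_pct, n_shots):
    """Estimate the SD of on-ice shooting percentage talent by subtracting
    binomial (randomness) variance from the pooled observed variance."""
    var_rand = avg_sh_pct * (1 - avg_sh_pct) / n_shots   # randomness variance
    var_obs = (sd_even ** 2 + sd_odd ** 2) / 2           # pooled even/odd variance
    return math.sqrt(max(var_obs - var_rand, 0.0))

# The 100v100 row: SD(Even)=2.98%, SD(Odd)=2.84%, assuming ~7.8% average Sh%
sd = talent_sd(0.0298, 0.0284, 0.078, 100)   # ≈ 1.1%, close to the table's 1.15%
```

Plugging in the other rows the same way should reproduce the SD(Talent) column to within rounding, assuming the randomness component really is binomial.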

And again, the chart in graphical format.

EstimatingOnIceShootingPctTalent

The grey line is the randomness standard deviation and it behaves as expected, decreasing smoothly. It is a significant driver of the even and odd standard deviations, but the talent standard deviation slowly falls off as well. If we call SD(Talent) the signal and SD(Randomness) the noise then we can plot a signal to noise ratio calculated as SD(Talent) / SD(Randomness).

SignalToNoise

What is interesting is that the signal to noise ratio improves significantly up to 600v600 and then sort of levels off. This is pretty much in line with what we saw earlier in the first table and chart. After 600v600 we start dropping out the majority of the fourth liners, who don’t get enough ice time to be on the ice for 1400+ shots at 5v5, and later we start dropping out the third liners too. The result is that the signal to noise ratio flattens out.

With that said, there is probably enough information in the above charts to determine what a reasonable spread in on-ice shooting percentage talent actually is. Specifically, the yellow SD(Talent) line does give us a pretty good indication of what the spread in on-ice shooting percentage talent really is. Based on this analysis a reasonable estimate for one standard deviation in shooting percentage talent in a typical NHL season is probably around 1.0% or maybe slightly above.

What does that mean in real terms (i.e. goal production)? Well, the average NHL forward is on the ice for ~400 5v5 shots per season. Thus, a player with an average amount of ice time that shoots one standard deviation (I’ll use 1.0% as standard deviation to be conservative) above average would be on the ice for 4 extra goals due solely to their on-ice shooting percentage. Conversely an average ice time player with an on-ice shooting percentage one standard deviation below average would be on the ice for about 4 fewer goals.

Now of course if you are an elite player getting big minutes the benefit is far greater. Let’s take Sidney Crosby for example. Over the past 7 seasons his on-ice shooting percentage is about 3.33 standard deviations above average and last year he was on the ice for just over 700 shots. That equates to an extra 23 goals due to his extremely good on-ice shooting percentage. That’s pretty impressive if you think about it.

Now compare that to Scott Gomez, whose 7-year on-ice shooting percentage is about 1.6 standard deviations below average. In 2010-11 he was on the ice for 667 shots for. That year his lagging shooting percentage talent cost him an estimated 10.6 goals. Imagine, Crosby vs Gomez is a 33+ goal swing in just 5v5 offensive output.
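The back-of-envelope arithmetic in the last few paragraphs is simply (standard deviations above average) x (talent SD) x (shots for). A minimal sketch, using the ~1.0% talent SD estimated earlier:

```python
def extra_goals(sd_units_above_avg, shots_for, talent_sd=0.01):
    """On-ice goals above (or below) average attributable solely to
    on-ice shooting percentage talent."""
    return sd_units_above_avg * talent_sd * shots_for

crosby = extra_goals(3.33, 700)   # ≈ +23.3 goals
gomez = extra_goals(-1.6, 667)    # ≈ -10.7 goals
swing = crosby - gomez            # ≈ 34-goal swing at 5v5
```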

(Yes, I am taking some liberties in those last few paragraphs with assumptions relating to luck/randomness, quality of teammates and whatnot, so not all the good or bad can necessarily be attributed to a single player or to the extent described, but I think it drives home the point: a single player can have a significant impact through on-ice shooting percentage talent alone.)

In conclusion, even after you factor out luck and randomness, on-ice shooting percentage can play a significant role in goal production at the player level and, as I have been saying for years, must be taken into consideration in player evaluation. If you aren’t considering that a particular player might be particularly good or particularly bad at driving on-ice shooting percentage you may not be getting the full story.

(In a related post, there was an interesting article on Hockey Prospectus yesterday looking at how passing affects shooting percentage which supports some earlier findings that showed that good passers are often good at boosting teammates on-ice shooting percentage. Of course I have also shown that shots on the rush also result in higher shooting percentage so to the extent that players are good at generating rush shots they should be good at boosting their on-ice shooting percentages).

 

Goals, Corsi, and Weighted Shot Differential

Dec 01 2014
 

Yesterday ‘Tangotiger’ introduced a new hockey metric that got the hockey twitter world all excited. Go read the articles for the methodology and rationale behind the metric, but in short he conducted a first half season vs second half season regression and discovered that goals and shot attempts that didn’t result in goals should be weighted differently. The final result was that for his weighted shot differential, goals should be given a weight of 1.0 and shot attempts that didn’t result in goals (saved, missed the net or blocked) a weight of 0.2. He concluded that because of this Corsi is not a good statistic, since it doesn’t apply the proper weighting. The reality is, as others have pointed out, this new Weighted Shot Differential is actually highly correlated with Corsi, and here is why.

Consider the following formula for weighted shot for total (WSFT).

WSFT = Goals + (Corsi-Goals) * 0.2

We can expand that formula to

WSFT = 0.2 * Corsi + 0.8 * Goals

which, since scaling by a constant doesn’t affect correlations, is proportional to Corsi + 4 * Goals.

Last season’s goals as a percentage of Corsi (effectively Corsi shooting percentage) ranged from 3.2% (Buffalo) to 5.3% (Anaheim), which means teams’ WSFT was proportional to something between

Corsi + 4 * 0.032 * Corsi = 1.128 * Corsi (for Buffalo)

and

Corsi + 4 * 0.053 * Corsi = 1.212 * Corsi (for Anaheim)

which really isn’t much of an adjustment to overall Corsi.
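A quick sketch of that arithmetic, using a hypothetical team with 1,000 shot attempts (the per-attempt values here are my own calculation from the weights, not figures from Tangotiger’s post):

```python
def wsft(corsi, goals, w=0.2):
    # Tangotiger's weighting: goals count 1.0, non-goal attempts count w = 0.2
    return goals + (corsi - goals) * w

# At the extremes of last season's Corsi shooting percentage:
buffalo_like = wsft(1000, 32)   # 3.2% CSh% -> 225.6, i.e. ~0.226 per attempt
anaheim_like = wsft(1000, 53)   # 5.3% CSh% -> 242.4, i.e. ~0.242 per attempt
```

The per-attempt value barely moves across the whole league, which is why WSFT and Corsi end up so highly correlated.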

The most important aspect of Tangotiger’s post is actually the part near the end to do with sample size.

Now, I know what you are going to say: how come all-shots correlate so much better than only-goals?  That’s easy.  The best way to increase correlation is to increase the number of trials.  It’s really that simple.  100 shots is not as good a forecaster as 500 shots, which is not as good as 2000 shots.  So, if you have 10 non-goal shots and 1 goal-shot, then naturally, the 10 non-goal shots will correlate better with future goals.

And indeed, this is consistent with the above results!  Since we weight each non-goal shot at 0.2 and each goal at 1.0, and if you have 2 EV goals and 20 EV non-goals, then guess what.  The 2 EV goals count as “2 trials”, while the 20 EV non-goals count as “4 trials”.  So, naturally, the 20 EV non-goals will correlate better than the 2 EV goals.  But, that still doesn’t mean you can weight both the same.  Not at all.

That is the crux of the whole Corsi vs goals debate. The relative importance of shot attempts and goals as a predictor of future goal production is all about sample sizes. Shot attempts are much more reliable over small sample sizes. If we had a very, very large sample, goals would be the far better predictor (in theory goals could be a perfect predictor in an infinitely large sample; Corsi could never be that). What Tangotiger did was attempt to determine the proper weightings when considering a 41 game sample. Nothing more. Nothing less. If you had a 30 game sample, goals would be weighted even less. If you had a larger sample, goals would be given more weight. In theory one should be able to develop a sliding scale where the weights vary based on sample size. Until then we can only guess what the actual weights should be for any particular sample size.

I get a fair bit of flak for being somewhat anti-Corsi but I am not really. I just feel the benefits of Corsi have been oversold and the issues with Corsi have been underreported. Corsi is a good evaluation tool but far too often it is used as the sole evaluation tool. If all you have are small sample sizes then it might be all you can use, but the reality is that for the majority of what we do in hockey analytics we have a lot more data to work with. It is about using all of the tools we have at our disposal, not relying on just one for the majority of the analysis we conduct.

 

Does higher Corsi Against rates boost Save Percentage?

Nov 24 2014
 

Yesterday I wrote an article for MapleLeafsHotStove.com looking at the Leafs performance so far this season in comparison to previous seasons. In it I showed a chart comparing the Leafs CA/60 rate in comparison with their Save% and it was quite astonishing how they rose and fell in lock-step. Here is that chart:

MapleLeafs_21games_CA60SvPct

 

Very rarely in hockey analytics do you get a chart that looks as “nice” as that one, so it is something that really draws my attention. Essentially what this chart is saying is that the more shot attempts you give up, the higher the goalie’s save percentage will be. If this is true it would imply that more shots does not automatically mean more goals, at least not more goals at the same rate. It would also imply that in many cases more shots just means more shots that aren’t difficult for the goalie to save.

I have some theories on this. For one, we know that shots on the rush are more difficult to save. If you are generating a ton of shot attempts it probably means you are spending a lot of time in the offensive zone and if you are in the offensive zone generating shots, they are not the tougher rush shot variety. Thus, if you are generating a lot of shots it probably means they are of lower quality on average.

This is difficult to accept for a lot of people and there have been studies that have shown otherwise. For example, this one at brodeurisafraud.blogspot.com or this one at hockey-graphs.com. This morning twitter user @DTMAboutHeart posted his own chart showing the relationship did not exist. The problem with these studies is they aren’t necessarily looking at the same goalie in different situations. For example, if you plot CA60 vs Save% for all goalies you get some good and bad goalies on both good and bad CA60 teams. Of course the chart will be largely random in that situation.

Chris Boyle of SportsNet did a study that showed the relationship does exist and that higher shot totals lead to higher save percentages, but that analysis is also flawed due to selection bias, which led some to rightfully doubt the conclusions. Although I still think there is merit to what Chris Boyle did, there is also merit to the claims made by those who doubt his methodology. As such a different analysis really needs to be undertaken, which is what I have done here.

In my opinion, the proper way to answer the question of whether shot volume leads to higher save percentages is to look at how individual goalies save percentages have varied from year to year in relation to how their CA60 has varied from year to year. To do this I looked at the past 7 seasons of data and selected all goalie seasons where the goalie played at least 1500 minutes of 5v5 ice time. I then selected all goalies who have had at least 5 such seasons. There were 23 such goalies. I then took their 5-7 years worth of CA60 and save % stats and calculated a correlation between them. Here is what I found.

Player_Name Nyrs CA60 vs Sv% Correlation StdDev(CA60) StdDev(Sv%) One Team
EVGENI NABOKOV 6 0.036 5.49 0.59
JONAS HILLER 5 -0.117 4.63 0.69 Y
ANTTI NIEMI 5 0.629 4.41 0.67
STEVE MASON 5 -0.311 3.86 0.74
HENRIK LUNDQVIST 7 0.418 3.78 0.44 Y
MIIKKA KIPRUSOFF 5 0.571 3.62 0.78 Y
CAM WARD 5 0.566 3.53 0.55 Y
NIKLAS BACKSTROM 6 0.702 3.43 0.67 Y
JONATHAN QUICK 6 0.494 3.33 0.95 Y
TIM THOMAS 6 0.555 3.17 1.68 Mostly
CAREY PRICE 7 0.604 3.11 0.55 Y
ILYA BRYZGALOV 6 -0.645 3.05 0.90
TOMAS VOKOUN 5 0.776 2.88 0.69
RYAN MILLER 7 0.080 2.82 0.37 Y
DWAYNE ROLOSON 5 0.326 2.80 1.21
PEKKA RINNE 5 -0.087 2.43 0.36 Y
MARC-ANDRE FLEURY 5 0.264 2.30 0.78 Y
JIMMY HOWARD 5 -0.812 2.23 0.94 Y
MARTIN BRODEUR 5 0.802 2.17 1.03 Y
MIKE SMITH 5 0.433 2.02 0.89
ROBERTO LUONGO 6 -0.338 1.69 0.44 Mostly
ONDREJ PAVELEC 5 -0.144 1.48 0.85 Y
KARI LEHTONEN 6 -0.583 1.47 0.24
Average 0.183
Average (CA60 StdDev>2) 0.264
Average (CA60 StdDev>3) 0.292
Average (One Team) 0.237
Average (One Team, CA60 StdDev>2) 0.311
Average (One Team, CA60 StdDev>3) 0.474

The columns are:

  • NYrs – Number of seasons goalie played >1500 minutes at 5v5 play
  • CA60 vs Sv% Correlation – Correlation between CA60 and Save Percentage
  • StdDev(CA60) – The Standard Deviation in CA60
  • StdDev(Sv%) – The Standard Deviation in Sv%
  • One Team – Flag indicating whether goalie played with a single team (Mostly is single team except for a trade deadline trade in a single season)

So, you can see that there are both positive and negative correlations which puts the claim in some doubt. That said, the overall average correlation is 0.183 so there is some evidence that on average there is a positive correlation.

Now, if CA60 doesn’t vary much in the sample it is difficult to identify a relationship with save percentage. You just can’t correlate something to a variable if that variable is relatively stable. So, if I restrict the goalies to only those whose standard deviation in CA60 is >2.00 the average correlation between CA60 and save percentage rises to 0.264. If I restrict it further to >3.00 the average correlation rises to 0.292.

The players playing in front of the goalie and possibly the system the team plays behind may also impact save percentage. If we attempt to minimize this impact by looking at goalies that have only played for one team (or mostly one team) the average correlation between CA60 and save percentage is 0.237. If we restrict that further by looking at goalies with StdDev(CA60)>2 the correlation is 0.311. Restricting it further to goalies with StdDev(CA60)>3 the correlation rises to 0.474.
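The per-goalie calculation described above is just a Pearson correlation across each goalie’s qualifying seasons. A minimal sketch with made-up numbers (the season values below are illustrative, not any actual goalie’s):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# one CA60 and one 5v5 Sv% entry per qualifying season (hypothetical goalie)
ca60 = [52.1, 55.3, 58.0, 54.2, 60.1]
sv_pct = [0.918, 0.921, 0.925, 0.919, 0.927]
r = pearson(ca60, sv_pct)   # strongly positive for this made-up goalie
```

Repeating this for each of the 23 goalies and averaging the resulting r values is all the table above does.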

So, what have we learned?

  1. There appears to be a correlation between CA60 and save percentage.
  2. The correlation gets stronger if we restrict to goalies that haven’t changed teams (i.e. relative stability in who is playing in front of them and possibly the system being played).
  3. If we restrict to only goalies that have had reasonably large variations in CA60 over the years the correlation also gets stronger.

Based on these observations I believe it is reasonable to suggest that there is in fact a positive relationship between CA60 and save percentage, though it can be dominated by the impacts of changing teams or significantly changing rosters or playing styles in front of the goalie. Needless to say, this should change how we evaluate goalies as well as how we evaluate the defensive performance of players.

 

Sep 26 2014
 

Last night on twitter I posted some GF%RelTM statistics, which resulted in a number of comments, notably some from Stephen Burtch to the effect that GF% is nothing more than a fancy +/- stat and that players can’t be blamed or given credit for things such as save percentage.

It isn’t just Burtch who has this sentiment. In an article on ArcticIceHockey.com HappyCaraT writes that “+/- is a stat that is pure luck.” There has been a lot of bashing of +/-, some fair, some overblown, and the result is this kind of sentiment. To suggest that +/- or some similar stat is all luck and has no validity or usefulness is just silly. Yes, +/- is heavily team driven, but so is Corsi and nearly every other NHL statistic, so that is no reason to toss it aside. You just have to take that into consideration and look at things like ‘Rel’ stats and WOWY analysis. Yes, it is impacted by luck and randomness, but with large enough sample sizes that is largely mitigated and it becomes predictive of future performance.

Now, to address Burtch’s specific comment about on-ice save percentage, I don’t understand why anyone believes players cannot influence it. I have written about this before, but we know players can impact save percentage because score effects are real. When players are protecting a lead they give up more shots, but those shots become goals at a lower rate, and presumably they are playing against the opposition’s best offensive players, who definitely have better shooting percentages overall. Luck doesn’t only happen when you are protecting a lead and bad luck doesn’t always happen when you are trailing.

Furthermore, in recent months the following have been discovered:

  • Shots generated off successful (controlled) zone entries result in higher shooting percentages than shots generated off dump-ins.
  • Individual defensemen vary considerably in how well they limit successful zone entries against.

These two observations taken together imply that the players who are better at minimizing clean zone entries against should be able to boost their goalie’s on-ice save percentage. Who was the best Leaf defenseman in terms of limiting successful zone entries against last season? Dion Phaneuf. Who on the Leafs had the best Save%RelTM last year? Gunnarsson, who played mostly with Phaneuf. Phaneuf was a close second. In fact, over the past 4 seasons Phaneuf’s Save%RelTM has been +1.3%, +1.8%, +1.6% and +2.1%. Pretty consistently good. Is it a coincidence that a defenseman who is good at limiting successful zone entries against is good at boosting his goalie’s save percentage? I suspect not.

Now, what about Polak? Well, he has been -1.7, -2.4, -0.7, and -1.1. Not so good. Robidas has been -3.1, -3.5, -0.6, and -2.1. Wow, look at that. It’s a trend, and not a good one. Should we be predicting a tougher season for Maple Leaf goalies? Probably so.

When I get more time (currently working on my new website where you’ll get access to these RelTM stats) I’ll do some more research into the connection between zone entries against and save percentage. Until then I think there is at least some good evidence that limiting zone entries against is a big factor in a player’s ability to boost his on-ice save percentage (and his goalie’s save percentage).

So, can we please get past the idea that statistics like GF% or GF%RelTM have zero merit and that all hockey analytics must be done using Corsi or Fenwick? Are there special concerns that need to be considered with these statistics? Sure, but calling them irrelevant, all luck, and not useful is the kind of thinking that is only going to limit progress in hockey analytics. Shot quality exists and it’s real, at both ends of the rink. To take hockey analytics to the next level we need to research it and understand it better, not continually minimize it.

Sep 24 2014
 

Today apparently there was some discussion about the Avalanche and their non-interest in hockey analytics. In that discussion Corey Pronman wrote the following tweet:

 

I have seen the above logic from time to time. I think it dates back to something Gabe Desjardins wrote many years ago. I find the logic very odd though. Let me explain.

Let’s assume that the numbers are true: luck accounts for roughly 40% of winning and Corsi explains about 35%. According to my math, that leaves 25% unaccounted for. I don’t really consider 25% insignificant, but it is actually more significant than that.

Luck, or as I prefer to call it, randomness, is a component that is outside the control of a general manager, a coach, a player or anyone else who could potentially influence the outcome of the game. Thus it is pointless to bring luck into the equation. All that management and players for an NHL team really need to worry about is what they can control: the non-luck fraction of winning, or the other 60%.

Now, if Corsi is 35% of winning overall then it accounts for 58% of the controllable aspect of winning. That leaves 42% of what is controllable unaccounted for. If I were an owner of an NHL team, or an owner of a business of any kind, and my general manager told me that we are going to largely ignore 42% of  the controllable factors that lead to positive outcomes I’d be firing that general manager on the spot. It simply isn’t acceptable business practice to ignore 42% of what is within your control that produces good outcomes.
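The arithmetic here is worth making explicit (the 40%/35% splits are the numbers as I read them, used purely for illustration):

```python
luck = 0.40    # share of winning attributed to luck/randomness
corsi = 0.35   # share of winning explained by Corsi

controllable = 1 - luck               # 0.60: the fraction a team can influence
corsi_share = corsi / controllable    # ~0.58 of the controllable part
unexplained = 1 - corsi_share         # ~0.42 still controllable, but not Corsi
```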

Here is the real kicker though. The estimate that Corsi explains 35% of wins is based on historical data (and probably from several years ago). It does not necessarily mean it will be that way in the future. As teams become more aware of Corsi and possession it is certainly conceivable that the disparity across teams in Corsi shrinks, and thus the importance of Corsi as a differentiator among teams and as a predictor of winning shrinks. If teams switch focus to Corsi, those other factors might become the greater differentiator of team talent and the better predictor of success. It is easy to hop on the Corsi bandwagon now. The forward thinking teams and forward thinking hockey analytics researchers are those researching that other 42% to a significant degree.

Now, if you are a hockey analytics researcher raise your hand if you have spent ~60% of your research and analysis time on Corsi related issues and ~40% of your research time on non-Corsi related issues. If you are honest I suspect very few of you have raised your hand. The honest truth is those other factors have been unfairly downplayed and in my opinion that is very unfortunate.

 

Evaluating defensive ability

Sep 23 2014
 

A short while ago I asked my twitter followers who the best defensive defensemen in the NHL are, and it became clear to me that I am not certain people really know how to evaluate players’ defensive ability. I’ll explore that further in a bit, but first here are some of the answers I received.

  • Vlasic
  • Seabrook
  • Chara
  • Muzzin
  • Fayne
  • Giordano
  • Stralman
  • Andy Greene
  • Rozsival
  • Paul Martin
  • Shea Weber
  • Hjalmarsson
  • Phillips
  • and probably a few more I missed

It also spawned a lot of talk about corsi%, CorsiRel and players’ CF% with and without certain players. This really dumbfounds me because I find CF% an odd way of evaluating a player’s defensive ability: CF% mixes both corsi for and corsi against stats. It’s kind of like using +/- as a defensive stat when at least half of what goes into +/- is offensive ability.

So, how might I go about evaluating players defensively? Well, one thing I might do is look at a player’s CA60RelTM for the past few seasons in 5v5close situations. For defensemen with 1000 5v5close minutes over the last three seasons the leaders in CA60RelTM are Muzzin, Brodie, Stralman, Timonen, and Orlov. The worst are Doug Murray, Klesla, J. Schultz, Butler and Phaneuf.

Here is the thing though. I believe that defenders (at least some of them) are able to impact their goalie’s save percentage, so I personally think that CA60RelTM is probably not a complete evaluation of defensive ability. If we look at GA60RelTM instead, the top defensemen are Doug Hamilton, Bryce Salvador, Matt Niskanen, Sheldon Brookbank, and TJ Brodie while the worst are Schultz, Klesla, Brenden Dillon, Giordano and Tyson Barrie.

Of course I would also want to consider players who play against top offensive opponents and there I would look at guys who play against the best GF60 players on average. The players with the toughest GF60 opponents the past 3 seasons are Phaneuf, Weber, Girardi, McDonagh and Ekman-Larsson while the defensemen with the weakest GF60 opponents the past 3 seasons are Kindl, Meszaros, Sbisa, Engelland and Rozsival.

This is not very scientific because I just did this in about 15 minutes, but I filtered for all defensemen who had an opponent GF60 higher than 2.25 to get the defensemen with the toughest QoC. I then took all defensemen with a CA60RelTM below -3.0 and also a GA60RelTM below -0.2. This gave me 10 defensemen who might be worthy of consideration for being among the top defensemen defensively.
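The filter above boils down to three thresholds. Here is a sketch of it with hypothetical records (the field names and numbers are mine, for illustration only):

```python
# hypothetical per-defenseman stats over the 3-season window
dmen = [
    {"name": "Brodie", "opp_gf60": 2.31, "ca60_reltm": -3.4, "ga60_reltm": -0.35},
    {"name": "Kindl", "opp_gf60": 2.05, "ca60_reltm": -1.2, "ga60_reltm": 0.10},
    {"name": "Phaneuf", "opp_gf60": 2.40, "ca60_reltm": 1.8, "ga60_reltm": 0.05},
]

top_defensive_dmen = [
    d["name"] for d in dmen
    if d["opp_gf60"] > 2.25        # faces tough quality of competition
    and d["ca60_reltm"] < -3.0     # suppresses shot attempts relative to teammates
    and d["ga60_reltm"] < -0.2     # suppresses goals against relative to teammates
]
# -> ["Brodie"]
```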

  • Brodie
  • Hjalmarsson
  • Brodin
  • Tanev
  • Vlasic
  • Michalek
  • Enstrom
  • Campbell
  • Ekman-Larsson
  • Oduya

Agree? Disagree?

If I take out the GA60RelTM restriction Muzzin, Timonen, Subban, Braun, Doughty, Fayne, Goligoski, Giordano, Andy Greene, and Chara get added into the mix.

If we apply all the restrictions to forwards we get the following 19 players:

  • D. Sedin
  • H. Sedin
  • P. Bergeron
  • C. Perry
  • B. Marchand
  • M. Koivu
  • A. Ponikarovsky
  • A. Kopitar
  • A. Burrows
  • T. Zajac
  • B. Dubinsky
  • D. Backes
  • M. Backlund
  • D. Moss
  • A. Steen
  • A. Hemsky
  • G. Landeskog
  • C. MacArthur
  • M. Hossa

Agree? Disagree? Clearly some players are there as a line effect (Sedin/Sedin/Burrows, Steen/Backes, etc.) but generally speaking I’d consider most of those guys quality 2-way players.

This is in no way meant to be a definitive guide to evaluating players’ defensive ability but was meant more as a preliminary exercise to see what people think.