2014-15 vs 2013-14 Rush Shots and shooting/save percentages

 Uncategorized  Comments Off on 2014-15 vs 2013-14 Rush Shots and shooting/save percentages
Dec 312014

I have mentioned on twitter that I have been looking at rush shots as a percentage of overall shots and at how teams have fared compared to last season. Here is how the teams have done from an offensive perspective.

Team RushShot% For Diff Sh% Diff
NY Rangers 7.24% 2.47%
Arizona 6.07% 0.92%
San Jose 5.89% -0.01%
Edmonton 5.64% -1.23%
Buffalo 5.02% 2.70%
Washington 3.64% 0.65%
Anaheim 3.10% -2.09%
Calgary 2.83% -0.33%
Pittsburgh 2.71% 0.72%
Columbus 2.51% -0.23%
New Jersey 2.37% 1.57%
Vancouver 2.31% 1.32%
Winnipeg 2.21% -1.71%
Montreal 1.52% -0.11%
Los Angeles 1.43% 1.09%
Minnesota 1.21% 1.12%
St. Louis 1.20% -1.16%
Ottawa 0.96% -0.04%
Florida 0.88% -2.26%
Dallas 0.81% 0.89%
Tampa Bay 0.48% 1.90%
Detroit -1.35% 0.08%
Carolina -1.64% -1.04%
Philadelphia -1.68% 0.91%
Chicago -1.71% 0.71%
Colorado -2.79% -2.84%
Boston -3.35% 0.88%
NY Islanders -3.37% 1.31%
Toronto -3.77% 1.62%
Nashville -3.97% 1.13%

In this table RushShot% For Diff is the difference between this season's 5v5 road RushShots/TotalShots and last year's. A positive number indicates a higher percentage of the team's shots are coming on the rush this season, and a negative number indicates a lower percentage. Sh% Diff is the difference in 5v5 road shooting percentage between this season and last season (positive means an improved shooting percentage).

Rush shots are generally tougher shots, so one would expect that a higher percentage of shots coming on the rush would boost team shooting percentage. For some teams, like the Rangers, Arizona and Buffalo, this is true, while for others, like Edmonton, it hasn't held up. Certainly sample size and randomness are issues here, as are roster changes. Overall the correlation between the two stats is 0.067, which isn't great.
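For anyone who wants to check that number, the correlation can be recomputed directly from the table above. A minimal sketch (the values are re-keyed by hand from the table, so tiny rounding differences from the published 0.067 are possible):

```python
import numpy as np

# (RushShot% For Diff, Sh% Diff) from the table above, in the same team
# order from the NY Rangers down to Nashville.
rush_diff = np.array([7.24, 6.07, 5.89, 5.64, 5.02, 3.64, 3.10, 2.83, 2.71,
                      2.51, 2.37, 2.31, 2.21, 1.52, 1.43, 1.21, 1.20, 0.96,
                      0.88, 0.81, 0.48, -1.35, -1.64, -1.68, -1.71, -2.79,
                      -3.35, -3.37, -3.77, -3.97])
sh_diff = np.array([2.47, 0.92, -0.01, -1.23, 2.70, 0.65, -2.09, -0.33, 0.72,
                    -0.23, 1.57, 1.32, -1.71, -0.11, 1.09, 1.12, -1.16, -0.04,
                    -2.26, 0.89, 1.90, 0.08, -1.04, 0.91, 0.71, -2.84, 0.88,
                    1.31, 1.62, 1.13])

# Pearson correlation between the change in rush-shot share and the
# change in shooting percentage.
r = float(np.corrcoef(rush_diff, sh_diff)[0, 1])
```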

Here are comparable stats from a defensive standpoint.

Team RushShot%Against Diff Sv% Diff
Colorado 6.33% -0.37%
Minnesota 4.13% -1.41%
Montreal 4.05% 1.69%
Chicago 3.82% 1.61%
Carolina 3.10% -2.74%
Boston 3.08% -1.65%
Pittsburgh 3.01% 1.82%
Los Angeles 2.74% -2.37%
Edmonton 2.55% -2.29%
Anaheim 2.55% 0.68%
Toronto 2.09% -0.38%
Arizona 2.06% -1.87%
Washington 2.05% 1.78%
Nashville 1.07% 3.53%
Ottawa 0.94% -0.09%
San Jose 0.82% 0.95%
Columbus 0.50% -0.06%
Winnipeg 0.49% 1.33%
Dallas 0.21% -0.66%
Vancouver 0.10% -0.01%
New Jersey -0.08% 1.87%
Philadelphia -0.42% 0.41%
Tampa Bay -1.38% -1.93%
Detroit -1.66% 1.05%
Buffalo -1.76% -0.69%
Calgary -2.41% 1.55%
NY Rangers -2.93% 0.13%
St. Louis -3.36% 0.22%
NY Islanders -3.66% -0.44%
Florida -6.51% 3.28%

Here we would expect a higher RushShot Against Differential to lead to a lower save percentage as more rush shots against should lead to a higher average shot quality against. We see a higher correlation here than on the offensive side of things as the correlation between the two stats is -0.275.

I’ll revisit these tables as the season progresses to see if the correlations improve but the observations are interesting nonetheless.

(Note: See here for all my articles on rush shots from the summer. Of particular interest is the introduction to rush shots and why, with how I have defined rush shots, we are limited to using road data.)

Why zone starts don’t matter much

 Uncategorized  Comments Off on Why zone starts don’t matter much
Dec 132014

I have written a number of posts on zone starts and how they generally don't have a significant impact on a player's overall statistics, but I constantly run across people who find that difficult to accept. There are still studies being done looking at how long the impact of a zone start lasts (this was based on some of the work of Tyler Dellow). While these are interesting studies, the important thing to understand is that while there is an impact, it has relatively little effect on a player's overall statistics.

Before Tyler Dellow was hired by the Edmonton Oilers he suggested that my 10-second window was not sufficient and, like the article I linked to above, he suggested the impact lasts much longer, though I could never pin down exactly what his number was. In my work I determined that beyond 10 seconds after a zone face off there is no noticeable impact on a player's statistics, and even the 10-second adjustment is minimal. I hope to explain why with this post.

The first significant fact to know is that when you remove the 10 seconds of play after all offensive and defensive zone face offs from a player's 5v5 statistics, you are removing approximately 15% of their ice time.

Now, let's consider a hypothetical player who is a 50% corsi player during normal, non-face-off-influenced play. Suppose that for the 10 seconds after an offensive zone face off he is a 100% corsi player and for the 10 seconds after a defensive zone face off he is a 0% corsi player (these are the most extreme scenarios, which in reality don't happen).

Now, let's also assume that the player is a 70% dzone face off player, meaning that of all the offensive and defensive face offs he is on the ice for, 70% are in the defensive zone and 30% are in the offensive zone. This is a pretty extreme zone start differential that only a handful of players get.

So now we have 85% of the player's 5v5 ice time beyond the 10 seconds after a zone face off at 50% corsi. We have 4.5% of his ice time (30% of 15%) after an offensive zone face off at 100% corsi, and 10.5% of his ice time after a defensive zone face off at 0% corsi. Add that all up and his expected corsi is 50%*0.85 + 100%*0.045 + 0%*0.105 = 47%.

That means, for this extreme zone start player, the maximum impact of the 10 seconds after a face off is a drop in his corsi from 50% to 47%. In reality it would be less, because corsi isn't 100%/0% for the 10 seconds after an offensive/defensive face off, but for the purposes of identifying an upper bound on the impact this suffices.

Now, one might suggest that the impact of the zone start lasts more than 10 seconds, which may be true. Remember though that the second 10 seconds would account for a smaller percentage of the player's overall ice time than the 15% of the first 10 seconds, since there might be another face off or a line change. I don't have that number off hand but let's assume it is 8%. Furthermore, the impact on corsi will be far less significant in that second 10 seconds; corsi there is likely more like 65%/35% than 100%/0%. If the next 10 seconds accounted for 8% of his ice time with a 65%/35% ozone/dzone corsi, it would drop his Corsi% from 47% to 46.5%, just an additional 0.5%, which is pretty much within the range of noise. Beyond that the impact would be negligible.

Now, let’s do these same calculations for a guy who has 60% defensive zone face offs and 40% offensive zone face offs which is far more common than 70/30. This player would have his 10 second impact take him from 50% corsi to 48.5% corsi. The following 10 seconds would see his corsi drop from 48.5% to 48.26%, just an additional quarter percent. The majority of players will be within this range (of all players with 500 5v5 minutes last season 87.4% were within 40-60% DZone%) and have a maximum potential impact of +/- 1.75%.
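The upper-bound calculation above generalizes to any zone-start split. A small sketch under the same assumptions (15% of ice time falls in the 10 seconds after zone face offs, with 100%/0% corsi in those windows; the function name and defaults are mine):

```python
def expected_corsi(dzone_frac, base_corsi=0.50, faceoff_time_frac=0.15,
                   oz_corsi=1.00, dz_corsi=0.00):
    """Upper bound on a player's overall CF% given extreme post-face-off
    assumptions; dzone_frac is the share of his zone face offs taken in
    the defensive zone."""
    oz_time = faceoff_time_frac * (1 - dzone_frac)  # time after O-zone draws
    dz_time = faceoff_time_frac * dzone_frac        # time after D-zone draws
    rest = 1 - faceoff_time_frac                    # normal play
    return rest * base_corsi + oz_time * oz_corsi + dz_time * dz_corsi

print(expected_corsi(0.70))  # 70/30 D-zone player: ~0.47
print(expected_corsi(0.60))  # 60/40 D-zone player: ~0.485
```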

Let’s take a look at some of the players who had the most intense defensive zone starts from last season and see how their 5v5 stats compare with their 5v5 – 10s after a zone start stats (which I call F10 stats) to show the true impact.

Player_Name DZone% CF% F10 CF% CF% - F10 CF%
BOYD GORDON 82.0% 42.3 44.4 -2.1
MANNY MALHOTRA 79.1% 41.6 44.1 -2.5
JAY MCCLEMENT 71.8% 38.8 40.8 -2
BRANDON BOLLIG 81.7% 51 53.3 -2.3
MARCUS KRUGER 79.1% 51.6 54.4 -2.8
DOMINIC MOORE 75.4% 48.5 49.5 -1
PAUL GAUSTAD 71.7% 44.5 45.8 -1.3
BRIAN BOYLE 76.2% 47.1 48.5 -1.4
ADAM HALL 72.0% 44.1 44.5 -0.4
BRAD RICHARDSON 67.7% 47.7 47.9 -0.2
BEN SMITH 73.8% 51 53 -2
RADEK DVORAK 68.3% 42.8 43.4 -0.6
MATT HENDRICKS 72.2% 41.6 42.3 -0.7
DRAYSON BOWMAN 65.4% 45.9 47 -1.1
KYLE BRODZIAK 66.5% 44 45.6 -1.6
DAVID JONES 64.2% 45.3 45 0.3
TORREY MITCHELL 64.3% 45.5 45.3 0.2
PAUL RANGER 60.3% 42.4 43.8 -1.4
MATT COOKE 65.6% 45.1 46.4 -1.3
JEFF HALPERN 65.4% 49.1 49.7 -0.6

The biggest impact is just -2.8% (Marcus Kruger) and the average impact among these players is just -1.24%. This is well below my maximum impact estimates above, so my theory overestimates reality. The following scenarios might explain why.

Scenario 1: Player A is on the ice for a neutral zone face off which his team loses. The opposing team immediately goes on the offensive and takes a shot which the goalie saves, driving the puck into the corner, where the opposing team retrieves it and takes another shot, which the goalie saves and covers up, ending play. This accounts for 2 shots against after a neutral zone face off, both of which Player A was on the ice for.

Scenario 2: Player A is on the ice for a neutral zone face off which his team loses. The opposing team immediately goes on the offensive and takes a shot which the goalie saves and covers up, forcing a face off. Player A along with all his teammates remain on the ice for the defensive zone face off, which his team again loses, and the opposing team takes a shot which the goalie again saves and covers. This accounts for 1 shot after a neutral zone face off and 1 shot after a defensive zone face off.

The reality is, both of these scenarios should be accounted for identically, as it was losing the neutral zone face off and letting the opposing team enter the zone that resulted in both shots. The face off between the shots doesn't change that if the players didn't change. We can't let players off the hook just because the goalie covered up the puck and forced a face off in between shots. When we count zone starts we should really only be counting face offs where the player was not on the ice prior to that face off. By not doing so we are not properly assigning credit/blame for some shots for/against, and this is why the actual impact is smaller than I calculated in theory.

Do zone starts matter? Yes, a bit, for some of the more extreme zone start usage players. For the majority of players it's hardly worth considering.


Dec 082014

I have tackled the subject of on-ice shooting percentage a number of times here, but I think it is a subject that has been under-researched in hockey analytics. Historically people have done some split-half comparisons, found weak correlations, and written it off as an insignificant or not particularly useful factor in hockey analytics. While some of that research has merit, a lot of it deals with samples too small to produce any really useful correlations. Split-half season correlations across the majority of players include players that might have 3 goals in the first half and 7 in the second half, and that is just not enough to draw any conclusions from. Even year-over-year correlations have their issues: in addition to smallish sample sizes, they suffer from problems related to roster changes and how roster changes impact on-ice shooting percentages. Ideally we'd want to eliminate all these factors and get down to actual on-ice shooting percentage talent, factoring out both luck/randomness and roster changes.

Today @MimicoHero posted an article discussing shooting percentage (and save percentage) by looking at multi-year vs multi-year comparisons. It's a good article, so have a read; I have written many articles like it in the past. This is important research, but as I alluded to above, year-over-year comparisons suffer from issues related to roster change which potentially limit what we can actually learn from the data. People often look at even/odd games to eliminate these roster issues, and that is a pretty good methodology. Once in the past I took this idea to the extreme and used even/odd seconds in order to attempt to isolate true talent from other factors (note that subsequent to that article I found a bug in my code that may have impacted the results, so I don't have 100% confidence in them; I hope to revisit that work in a future post to confirm and extend the research). That methodology pretty much assures that the teammates a player plays with, the opponents they play against, and the situations they play in will be almost identical in both halves of the data. For this post, though, I am going to take another approach: I will focus solely on shooting percentage and use an even/odd shot methodology, which should do a pretty good job of removing roster change effects as well.

I took all 5v5 shot data from 2007-08 through 2013-14, and for each forward I took the first 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800 and 2000 shots that they were on the ice for. This allowed me to do 100 vs 100 shot, 200 vs 200 shot, …, 1000 vs 1000 shot comparisons. For comparison's sake, in addition to even/odd shots I am also going to look at first half vs second half comparisons to get an idea of how different the correlations are (i.e. what the impact of roster changes is on a player's on-ice shooting percentage). Here are the resulting correlation coefficients.

Scenario SplitHalf Even vs Odd NPlayers
100v100 0.186 0.159 723
200v200 0.229 0.268 590
300v300 0.296 0.330 502
400v400 0.368 0.375 443
500v500 0.379 0.440 399
600v600 0.431 0.481 350
700v700 0.421 0.463 319
800v800 0.451 0.486 285
900v900 0.440 0.454 261
1000v1000 0.415 0.498 222
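The even/odd split can be sketched in a few lines. This is a hypothetical re-implementation (function and variable names are mine), not the exact code used to build the table:

```python
import numpy as np

def split_half_correlation(shots_by_player, n_shots):
    """shots_by_player: player -> chronological list of 0/1 outcomes (goal
    or not) for shots taken while that player was on the ice. Correlates
    on-ice Sh% over even-indexed shots with Sh% over odd-indexed shots."""
    even_pct, odd_pct = [], []
    for shots in shots_by_player.values():
        if len(shots) < 2 * n_shots:
            continue  # require enough shots, mirroring the table's cutoffs
        s = np.asarray(shots[:2 * n_shots], dtype=float)
        even_pct.append(s[0::2].mean())  # 1st, 3rd, 5th, ... shot
        odd_pct.append(s[1::2].mean())   # 2nd, 4th, 6th, ... shot
    return float(np.corrcoef(even_pct, odd_pct)[0, 1])
```

Because consecutive shots come from nearly identical on-ice conditions, the two halves share teammates, opponents, and score situations almost perfectly.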

And here is the table in graphical form.


Let's start with the good news. As expected, even vs odd correlations are better than first half vs second half correlations, though it really isn't as significant a difference as I might have expected. This is especially true with the larger sample sizes, where the gap should theoretically get larger.

What I did find a bit troubling is that correlations seem to max out at 600 shots vs 600 shots and even those correlations aren’t all that great (0.45-0.50). In theory as sample size increases one should get better and better correlations and as they approach infinity they should approach 1.00. Instead, they seem to approach 0.5 which had me questioning my data.

After some thought, though, I realized the problem was likely due to the decreasing number of players within the larger shot total groups. This restricts the spread in talent, as only the top-level players remain in those larger groups: as you increase the shot requirements you start weeding out the lesser players who see less ice time and fewer shots. So, while randomness decreases with an increased number of shots, so does the spread in talent. My theory is the signal (talent) to noise (randomness) ratio is not actually improving enough to see improving results.

To test this theory I looked at the standard deviations within each even/odd group. Since we also have a definitive N value for each group (100, 200, 300, etc.) and I can calculate the average shooting percentage it is possible to estimate the standard deviation due to randomness. With the overall standard deviation and an estimated standard deviation of randomness it is possible to calculate the standard deviation in on-ice shooting percentage talent. Here are the results of that math.

Scenario SD(EvenSh%) SD(OddSh%) SD(Randomness) SD(Talent)
100v100 2.98% 2.84% 2.67% 1.15%
200v200 2.22% 2.08% 1.91% 1.00%
300v300 1.99% 1.87% 1.56% 1.14%
400v400 1.71% 1.70% 1.35% 1.04%
500v500 1.56% 1.57% 1.21% 1.00%
600v600 1.50% 1.50% 1.11% 1.01%
700v700 1.35% 1.39% 1.03% 0.90%
800v800 1.35% 1.33% 0.96% 0.93%
900v900 1.24% 1.26% 0.91% 0.86%
1000v1000 1.14% 1.23% 0.86% 0.81%
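The decomposition behind that table can be written down explicitly: treating each on-ice shot as a Bernoulli trial, the randomness component is the binomial standard deviation sqrt(p(1-p)/N), and the talent component is what remains of the observed spread. A sketch (the ~7.7% league-average on-ice Sh% is my assumption, so the reconstructed values land close to, but not exactly on, the table's):

```python
import math

def talent_sd(sd_even, sd_odd, avg_sh_pct, n_shots):
    """Split the observed spread in on-ice Sh% into binomial randomness
    and talent, assuming observed variance = talent var + sampling var."""
    sd_rand = math.sqrt(avg_sh_pct * (1 - avg_sh_pct) / n_shots)
    sd_obs = (sd_even + sd_odd) / 2  # average of the even/odd spreads
    sd_tal = math.sqrt(max(sd_obs ** 2 - sd_rand ** 2, 0.0))
    return sd_tal, sd_rand

# 100v100 row: observed SDs of 2.98% and 2.84% give roughly the table's
# SD(Randomness) = 2.67% and SD(Talent) = 1.15%.
talent, rand = talent_sd(0.0298, 0.0284, 0.077, 100)
```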

And again, the chart in graphical format.


The grey line is the randomness standard deviation and it behaves as expected, decreasing smoothly. It is a significant driver of the even and odd standard deviations, but the talent standard deviation slowly falls off as well. If we call SD(Talent) the signal and SD(Randomness) the noise, then we can plot a signal to noise ratio calculated as SD(Talent) / SD(Randomness).


What is interesting is that the signal to noise ratio improves significantly up to 600v600 and then levels off. This is pretty much in line with what we saw earlier in the first table and chart. After 600v600 we start dropping out the majority of the fourth liners, who don't get enough ice time to be on the ice for 1400+ shots at 5v5; later we start dropping out the third liners too. The result is that the signal to noise ratio flattens out.

With that said, there is probably enough information in the above charts to determine what a reasonable spread in on-ice shooting percentage talent actually is. Specifically, the yellow SD(Talent) line does give us a pretty good indication of what the spread in on-ice shooting percentage talent really is. Based on this analysis a reasonable estimate for one standard deviation in shooting percentage talent in a typical NHL season is probably around 1.0% or maybe slightly above.

What does that mean in real terms (i.e. goal production)? Well, the average NHL forward is on the ice for ~400 5v5 shots per season. Thus, a player with an average amount of ice time that shoots one standard deviation (I’ll use 1.0% as standard deviation to be conservative) above average would be on the ice for 4 extra goals due solely to their on-ice shooting percentage. Conversely an average ice time player with an on-ice shooting percentage one standard deviation below average would be on the ice for about 4 fewer goals.

Now of course if you are an elite player getting big minutes the benefit is far greater. Let’s take Sidney Crosby for example. Over the past 7 seasons his on-ice shooting percentage is about 3.33 standard deviations above average and last year he was on the ice for just over 700 shots. That equates to an extra 23 goals due to his extremely good on-ice shooting percentage. That’s pretty impressive if you think about it.

Now compare that to Scott Gomez, whose 7-year shooting percentage is about 1.6 standard deviations below average. In 2010-11 he was on the ice for 667 shots for. That year his lagging shooting percentage talent cost his team an estimated 10.6 goals. Imagine: Crosby vs Gomez is a 33+ goal swing in just 5v5 offensive output.
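The goal-impact arithmetic in the last few paragraphs is just a product of three numbers, sketched here (the shot counts and z-scores are the ones quoted above; the function name is mine):

```python
def extra_goals(z_sd_above_avg, on_ice_shots, sd_talent=0.01):
    """Extra 5v5 on-ice goals attributable to on-ice shooting percentage
    talent that sits z_sd_above_avg standard deviations from average,
    using a 1.0% talent standard deviation."""
    return z_sd_above_avg * sd_talent * on_ice_shots

print(extra_goals(1.0, 400))    # average-minutes player, +1 SD: ~4 goals
print(extra_goals(3.33, 700))   # Crosby-like: ~23 goals
print(extra_goals(-1.6, 667))   # Gomez-like: ~-10.7 goals
```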

(Yes, I am taking some liberties in those last few paragraphs with assumptions relating to luck/randomness, quality of team mates and what not so not all good or bad can necessarily be attributed to a single player or to the extent described but I think it drives the point, a single player can have a significant impact through on-ice shooting percentage talent alone).

In conclusion, even after you factor out luck and randomness, on-ice shooting percentage can play a significant role in goal production at the player level and, as I have been saying for years, must be taken into consideration in player evaluation. If you aren't considering that a particular player might be particularly good or particularly bad at driving on-ice shooting percentage, you may not be getting the full story.

(In a related post, there was an interesting article on Hockey Prospectus yesterday looking at how passing affects shooting percentage, which supports earlier findings that good passers are often good at boosting teammates' on-ice shooting percentages. Of course, I have also shown that shots on the rush result in higher shooting percentages, so to the extent that players are good at generating rush shots they should be good at boosting their on-ice shooting percentages.)


Goals, Corsi, and Weighted Shot Differential

 Uncategorized  Comments Off on Goals, Corsi, and Weighted Shot Differential
Dec 012014

Yesterday 'Tangotiger' introduced a new hockey metric that got the hockey twitter world all excited. Go read the articles for the methodology and rationale behind the metric, but in short he conducted a first half season vs second half season regression and discovered that goals, and shot attempts that didn't result in goals, should be weighted differently. The final result was that for his weighted shot differential, goals should be given a weight of 1.0 and shot attempts that didn't result in goals (saved, missed the net or blocked) should be given a weight of 0.2. He concluded that because of this, Corsi is not a good statistic, since it doesn't apply the proper weighting. The reality is, as others have pointed out, this new Weighted Shot Differential is actually highly correlated with corsi, and here is why.

Consider the following formula for weighted shot for total (WSFT).

WSFT = Goals + (Corsi-Goals) * 0.2

We can reduce that formula further to

WSFT = 0.2 * Corsi + 0.8 * Goals

and since multiplying by a constant changes neither correlations nor team rankings, WSFT is proportional to

Corsi + 4 * Goals

Last season's goals as a percentage of corsi (effectively corsi shooting percentage) ranged from 3.2% (Buffalo) to 5.3% (Anaheim), which means teams' (rescaled) WSFT ranged from

WSFT ∝ Corsi + 4 * 0.032 * Corsi = 1.128 * Corsi (for Buffalo)

to

WSFT ∝ Corsi + 4 * 0.053 * Corsi = 1.212 * Corsi (for Anaheim)

which really isn't much of an adjustment to overall Corsi.
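Working straight from the definition makes this concrete. A quick sketch (the two Corsi shooting percentages are the league extremes quoted above; the function name is mine):

```python
def wsft(corsi, goals, non_goal_weight=0.2):
    """Tangotiger's weighted shots: goals weighted 1.0, all other
    attempts (saved, missed, blocked) weighted 0.2."""
    return goals + (corsi - goals) * non_goal_weight

# Per 1000 Corsi events, WSFT differs between teams only via the small
# goal term, so it stays very nearly a constant multiple of Corsi.
for label, corsi_sh_pct in (("Buffalo-like", 0.032), ("Anaheim-like", 0.053)):
    ratio = wsft(1000, 1000 * corsi_sh_pct) / 1000
    print(label, round(ratio, 4))
```

Since correlation is unaffected by a constant multiplier, a metric pinned this tightly to Corsi will track Corsi very closely.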

The most important aspect of Tangotiger’s post is actually the part near the end to do with sample size.

Now, I know what you are going to say: how come all-shots correlate so much better than only-goals?  That’s easy.  The best way to increase correlation is to increase the number of trials.  It’s really that simple.  100 shots is not as good a forecaster as 500 shots, which is not as good as 2000 shots.  So, if you have 10 non-goal shots and 1 goal-shot, then naturally, the 10 non-goal shots will correlate better with future goals.

And indeed, this is consistent with the above results!  Since we weight each non-goal shot at 0.2 and each goal at 1.0, and if you have 2 EV goals and 20 EV non-goals, then guess what.  The 2 EV goals count as “2 trials”, while the 20 EV non-goals count as “4 trials”.  So, naturally, the 20 EV non-goals will correlate better than the 2 EV goals.  But, that still doesn’t mean you can weight both the same.  Not at all.

That is the crux of the whole corsi vs goals debate. The relative importance of shot attempts and goals as predictors of future goal production is all about sample sizes. Shot attempts are much more reliable over small sample sizes. If we had a very, very large sample, goals would be the far better predictor (in theory goals could be a perfect predictor in an infinitely large sample; Corsi could never be). What Tangotiger did was attempt to determine the proper weightings when considering a 41-game sample. Nothing more. Nothing less. If you have a 30-game sample, goals would be weighted even less. If you had a larger sample, goals would be given more weight. In theory one should be able to develop a sliding scale where the weights vary based on sample size. Until then we can only guess what the actual weights should be for any particular sample size.

I get a fair bit of flak for being somewhat anti-Corsi, but I am not really. I just feel the benefits of Corsi have been oversold and the issues with Corsi under-reported. Corsi is a good evaluation tool, but far too often it is used as the sole evaluation tool. If all you have are small sample sizes then that might be all you can use, but for the majority of what we do in hockey analytics we have a lot more data to work with than, for example, a 41-game sample. It is about using all of the tools we have at our disposal, not relying on just one for the majority of the analysis we conduct.


Do higher Corsi Against rates boost Save Percentage?

 Uncategorized  Comments Off on Do higher Corsi Against rates boost Save Percentage?
Nov 242014

Yesterday I wrote an article for MapleLeafsHotStove.com looking at the Leafs performance so far this season in comparison to previous seasons. In it I showed a chart comparing the Leafs CA/60 rate in comparison with their Save% and it was quite astonishing how they rose and fell in lock-step. Here is that chart:



Very rarely in hockey analytics do you get a chart that looks as “nice” as that one, so it is something that really draws my attention. Essentially what it is saying is that the more shot attempts you give up, the higher the goalie's save percentage will be. If this is true it would imply that more shots do not automatically mean more goals, at least not more goals at the same rate. It would imply that in many cases more shots just means more shots that aren't difficult for the goalie to save.

I have some theories on this. For one, we know that shots on the rush are more difficult to save. If you are generating a ton of shot attempts it probably means you are spending a lot of time in the offensive zone and if you are in the offensive zone generating shots, they are not the tougher rush shot variety. Thus, if you are generating a lot of shots it probably means they are of lower quality on average.

This is difficult to accept for a lot of people and there have been studies that have shown otherwise. For example, this one at brodeurisafraud.blogspot.com or this one at hockey-graphs.com. This morning twitter user @DTMAboutHeart posted his own chart showing the relationship did not exist. The problem with these studies is they aren’t necessarily looking at the same goalie in different situations. For example, if you plot CA60 vs Save% for all goalies you get some good and bad goalies on both good and bad CA60 teams. Of course the chart will be largely random in that situation.

Chris Boyle of SportsNet did a study showing that the relationship does exist and that higher shot totals lead to higher save percentages, but that analysis is also flawed due to selection bias, which led some to rightfully doubt the conclusions. Although I still think there is merit to what Chris Boyle did, there is also merit to the claims made by those who doubt his methodology. As such, a different analysis really needs to be undertaken, which is what I have done here.

In my opinion, the proper way to answer the question of whether shot volume leads to higher save percentages is to look at how individual goalies' save percentages have varied from year to year in relation to how their CA60 has varied from year to year. To do this I looked at the past 7 seasons of data and selected all goalie seasons where the goalie played at least 1500 minutes of 5v5 ice time. I then selected all goalies who have had at least 5 such seasons; there were 23 such goalies. I then took their 5-7 years' worth of CA60 and save % stats and calculated a correlation between them. Here is what I found.

Player_Name Nyrs CA60 vs Sv% Correlation StdDev(CA60) StdDev(Sv%) One Team
EVGENI NABOKOV 6 0.036 5.49 0.59
JONAS HILLER 5 -0.117 4.63 0.69 Y
ANTTI NIEMI 5 0.629 4.41 0.67
STEVE MASON 5 -0.311 3.86 0.74
HENRIK LUNDQVIST 7 0.418 3.78 0.44 Y
MIIKKA KIPRUSOFF 5 0.571 3.62 0.78 Y
CAM WARD 5 0.566 3.53 0.55 Y
NIKLAS BACKSTROM 6 0.702 3.43 0.67 Y
JONATHAN QUICK 6 0.494 3.33 0.95 Y
TIM THOMAS 6 0.555 3.17 1.68 Mostly
CAREY PRICE 7 0.604 3.11 0.55 Y
ILYA BRYZGALOV 6 -0.645 3.05 0.90
TOMAS VOKOUN 5 0.776 2.88 0.69
RYAN MILLER 7 0.080 2.82 0.37 Y
DWAYNE ROLOSON 5 0.326 2.80 1.21
PEKKA RINNE 5 -0.087 2.43 0.36 Y
MARC-ANDRE FLEURY 5 0.264 2.30 0.78 Y
JIMMY HOWARD 5 -0.812 2.23 0.94 Y
MARTIN BRODEUR 5 0.802 2.17 1.03 Y
MIKE SMITH 5 0.433 2.02 0.89
ROBERTO LUONGO 6 -0.338 1.69 0.44 Mostly
ONDREJ PAVELEC 5 -0.144 1.48 0.85 Y
KARI LEHTONEN 6 -0.583 1.47 0.24
Average 0.183
Average (CA60 StdDev>2) 0.264
Average (CA60 StdDev>3) 0.292
Average (One Team) 0.237
Average (One Team, CA60 StdDev>2) 0.311
Average (One Team, CA60 StdDev>3) 0.474

The columns are:

  • NYrs – Number of seasons goalie played >1500 minutes at 5v5 play
  • CA60 vs Sv% Correlation – Correlation between CA60 and Save Percentage
  • StdDev(CA60) – The Standard Deviation in CA60
  • StdDev(Sv%) – The Standard Deviation in Sv%
  • One Team – Flag indicating whether goalie played with a single team (Mostly is single team except for a trade deadline trade in a single season)

So, you can see that there are both positive and negative correlations which puts the claim in some doubt. That said, the overall average correlation is 0.183 so there is some evidence that on average there is a positive correlation.

Now, if CA60 doesn’t vary much in the sample it is difficult to identify a relationship with save %. You just can’t correlate something to a variable if that variable is relatively stable. So, if I restrict the goalies to only those whose standard deviation in CA60 is >2.00 the average correlation between CA60 and save percentage rises to 0.264. If I restrict it further to >3.00 the average correlation between CA60 and save percentage rises to 0.292.

The players playing in front of the goalie and possibly the system the team plays behind may also impact save percentage. If we attempt to minimize this impact by looking at goalies that have only played for one team (or mostly one team) the average correlation between CA60 and save percentage is 0.237. If we restrict that further by looking at goalies with StdDev(CA60)>2 the correlation is 0.311. Restricting it further to goalies with StdDev(CA60)>3 the correlation rises to 0.474.
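The within-goalie correlation procedure described above can be sketched as follows. This is a hypothetical re-implementation (names are mine), not the exact code behind the table:

```python
import numpy as np

def avg_within_goalie_correlation(goalie_seasons, min_seasons=5,
                                  min_ca60_sd=0.0):
    """goalie_seasons: name -> list of (ca60, sv_pct) pairs, one per
    qualifying season. Averages each goalie's own CA60 vs Sv% correlation,
    optionally restricted to goalies whose CA60 actually varied."""
    corrs = []
    for seasons in goalie_seasons.values():
        if len(seasons) < min_seasons:
            continue
        ca60, sv = (np.array(col, dtype=float) for col in zip(*seasons))
        if ca60.std() <= min_ca60_sd:
            continue  # can't detect a relationship if CA60 barely moves
        corrs.append(np.corrcoef(ca60, sv)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")
```

Averaging correlations computed within each goalie, rather than pooling all goalies into one scatter plot, is what keeps goalie talent differences from washing out the CA60 effect.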

So, what have we learned?

  1. There appears to be a correlation between CA60 and save percentage.
  2. The correlation gets stronger if we restrict to goalies that haven’t changed teams (i.e. relative stability in who is playing in front of them and possibly the system being played).
  3. If we restrict to only goalies that have had reasonably large variations in CA60 over the years the correlation also gets stronger.

Based on these observations I believe it is reasonable to suggest that there is in fact a positive relationship between CA60 and save percentage, though it can be dominated by the impacts of changing teams or significantly changing rosters or playing styles in front of the goalie. Needless to say, this should change how we evaluate goalies as well as how we evaluate the defensive performance of players.


Sep 262014

Last night on twitter I posted some GF%RelTM statistics, which resulted in a number of comments, notably some from Stephen Burtch about how players cannot be blamed or credited for GF%, how it is nothing more than a fancy +/- stat, and how players can't be blamed or given credit for things such as save percentage.

It isn't just Burtch who has this sentiment. In an article on ArcticIceHockey.com, HappyCaraT writes that “+/- is a stat that is pure luck.” There has been a lot of bashing of +/-, some fair, some overblown, and the result is this kind of sentiment. To suggest that +/- or some similar stat is all luck and has no validity or usefulness is just silly. Yes, +/- is heavily team driven, but so is Corsi and nearly every other NHL statistic, so that is no reason to toss it aside; you just have to take that into consideration and look at things like ‘Rel’ stats and WOWY analysis. Yes, it is impacted by luck and randomness, but given large enough sample sizes that is largely mitigated and it becomes predictive of future performance.

Now, to address Burtch's specific comment about on-ice save percentage: I don't understand why anyone believes players cannot influence it. I have written about this before, but we know players can impact save percentage because score effects are real. When players are protecting a lead they give up more shots, but those shots become goals at a lower rate, presumably while playing against the opposition's best offensive players, who definitely have better shooting percentages overall. Luck doesn't only happen when you are protecting a lead, and bad luck doesn't always happen when you are trailing.

Furthermore, in recent months the following have been discovered:

These two observations taken together imply that players who are better at minimizing clean zone entries against should be able to boost their goalie's on-ice save percentage. Who was the best Leaf defenseman in terms of limiting successful zone entries against last season? Dion Phaneuf. Who on the Leafs had the best Save%RelTM last year? Gunnarsson, who played mostly with Phaneuf; Phaneuf was a close second. In fact, over the past 4 seasons Phaneuf's Save%RelTM has been +1.3%, +1.8%, +1.6% and +2.1%. Pretty consistently good. Is it a coincidence that a defenseman who is good at limiting successful zone entries against is good at boosting his goalie's save percentage? I suspect not.

Now, what about Polak? Well, he has been -1.7, -2.4, -0.7, and -1.1. Not so good. Robidas has been -3.1, -3.5, -0.6, and -2.1. Wow, look at that. It’s a trend, and not a good one. Should we be predicting a tougher season for Maple Leafs goalies? Probably so.

When I get more time (I am currently working on my new website, where you’ll get access to these RelTM stats) I’ll do more research into the connection between zone entries against and save percentage. Until then, I think there is at least some good evidence that limiting zone entries against is a big factor in being able to boost your on-ice save percentage (as well as your goalies’ save percentage).

So, can we please get past the idea that statistics like GF% or GF%RelTM have zero merit and that all hockey analytics must be done using Corsi or Fenwick? Are there special concerns to consider with these statistics? Sure, but calling them irrelevant, all luck, and not useful is the kind of thinking that will only limit progress in hockey analytics. Shot quality exists and it’s real, at both ends of the rink. To take hockey analytics to the next level we need to research it and understand it better, not continually minimize it.

Sep 24 2014

Today apparently there was some discussion about the Avalanche and their non-interest in hockey analytics. In that discussion Corey Pronman wrote the following tweet:


I have seen the above logic from time to time. I think it dates back to something Gabe Desjardins wrote many years ago. I find the logic very odd though. Let me explain.

Let’s assume that the numbers are true. According to my math, that leaves 25% unaccounted for. I don’t consider 25% insignificant, and it is actually even more significant than that.

Luck, or as I prefer to call it, randomness, is a component outside the control of a general manager, a coach, a player, or anyone else who could potentially influence the outcome of the game. Thus it is pointless to bring luck into the equation. All that an NHL team’s management and players really need to worry about is what they can control: the non-luck fraction of winning, or the other 60%.

Now, if Corsi is 35% of winning overall, then it accounts for about 58% of the controllable aspect of winning. That leaves 42% of what is controllable unaccounted for. If I owned an NHL team, or a business of any kind, and my general manager told me we were going to largely ignore 42% of the controllable factors that lead to positive outcomes, I’d fire that general manager on the spot. It simply isn’t acceptable business practice to ignore 42% of what is within your control that produces good outcomes.
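The arithmetic above can be sketched in a few lines; the 40% and 35% figures come from the tweet being discussed and are assumptions of this sketch:

```python
# Back out how much of the *controllable* part of winning Corsi covers.
# Figures quoted above: luck ~40% of winning, Corsi ~35% of winning.
luck = 0.40
corsi = 0.35

controllable = 1.0 - luck            # 60% of winning is within a team's control
corsi_share = corsi / controllable   # Corsi's share of the controllable part
other_share = 1.0 - corsi_share      # the remainder that gets ignored

print(f"Corsi share of controllable factors: {corsi_share:.1%}")  # 58.3%
print(f"Unaccounted controllable factors:    {other_share:.1%}")  # 41.7%
```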

Here is the real kicker, though. The estimate that Corsi explains 35% of wins is based on historical data (and probably from several years ago). It does not necessarily mean it will be that way in the future. As teams become more aware of Corsi and possession, it is certainly conceivable that the disparity in Corsi across teams shrinks, and with it the importance of Corsi as a differentiator among teams and as a predictor of winning. If teams all shift their focus to Corsi, those other factors might become the greater differentiator of team talent and the better predictor of success. It is easy to hop on the Corsi bandwagon now. The forward-thinking teams and forward-thinking hockey analytics researchers are those researching that other 42% to a significant degree.

Now, if you are a hockey analytics researcher, raise your hand if you have spent ~60% of your research and analysis time on Corsi-related issues and ~40% on non-Corsi-related issues. If you are honest, I suspect very few of you have raised your hand. The honest truth is those other factors have been unfairly downplayed, and in my opinion that is very unfortunate.


Evaluating defensive ability

 Uncategorized  Comments Off on Evaluating defensive ability
Sep 23 2014

A short while ago I asked my twitter followers who the best defensive defensemen in the NHL are, and it became clear to me that people don’t really know how to evaluate players’ defensive ability. I’ll explore that further in a bit, but first here are some of the answers I received.

  • Vlasic
  • Seabrook
  • Chara
  • Muzzin
  • Fayne
  • Giordano
  • Stralman
  • Andy Greene
  • Rozsival
  • Paul Martin
  • Shea Weber
  • Hjalmarsson
  • Phillips
  • and probably a few more I missed

It also spawned a lot of talk about Corsi%, CorsiRel, and players’ CF% with and without certain players. This dumbfounds me, because CF% is an odd way of evaluating defensive ability: it mixes both Corsi for and Corsi against. It’s like using +/- as a defensive stat when at least half of what goes into +/- is offensive ability.

So, how might I go about evaluating players defensively? Well, one thing I might do is look at a player’s CA60RelTM for the past few seasons in 5v5close situations. For defensemen with 1000 5v5close minutes over the last three seasons, the leaders in CA60RelTM are Muzzin, Brodie, Stralman, Timonen, and Orlov. The worst are Doug Murray, Klesla, J. Schultz, Butler, and Phaneuf.

Here is the thing, though. I believe that defenders (at least some of them) are able to influence their goalies’ save percentage, so CA60RelTM alone is probably not a complete evaluation of defensive ability. If we look at GA60RelTM instead, the top defensemen are Dougie Hamilton, Bryce Salvador, Matt Niskanen, Sheldon Brookbank, and TJ Brodie, while the worst are Schultz, Klesla, Brenden Dillon, Giordano, and Tyson Barrie.

Of course I would also want to consider who plays against top offensive opponents, and for that I look at average opponent GF60. The defensemen facing the toughest GF60 opponents over the past three seasons are Phaneuf, Weber, Girardi, McDonagh, and Ekman-Larsson, while those facing the weakest GF60 opponents are Kindl, Meszaros, Sbisa, Engelland, and Rozsival.

This is not very scientific because I did it in about 15 minutes, but I filtered for defensemen with an opponent GF60 above 2.25 to get those facing the toughest QoC, then kept only those with a CA60RelTM below -3.0 and a GA60RelTM below -0.2. This gave me 10 defensemen who might be worthy of consideration as being among the top defensemen defensively.

  • Brodie
  • Hjalmarsson
  • Brodin
  • Tanev
  • Vlasic
  • Michalek
  • Enstrom
  • Campbell
  • Ekman-Larsson
  • Oduya

Agree? Disagree?

If I take out the GA60RelTM restriction, Muzzin, Timonen, Subban, Braun, Doughty, Fayne, Goligoski, Giordano, Andy Greene, and Chara get added to the mix.
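The filtering described above can be sketched as follows; the stat lines here are invented for illustration and are not the actual values from the stats site:

```python
# Hypothetical sketch of the 15-minute filter described above.
defensemen = {
    # name: (opponent GF60, CA60RelTM, GA60RelTM) -- made-up numbers
    "Brodie":  (2.31, -4.1, -0.35),
    "Tanev":   (2.28, -3.6, -0.28),
    "Phaneuf": (2.40,  2.9, -0.10),  # tough QoC, but poor CA60RelTM
}

def passes_filter(opp_gf60, ca60_reltm, ga60_reltm):
    """Tough competition plus strong shot- and goal-suppression numbers."""
    return opp_gf60 > 2.25 and ca60_reltm < -3.0 and ga60_reltm < -0.2

top_defensive = [name for name, stats in defensemen.items()
                 if passes_filter(*stats)]
print(top_defensive)  # ['Brodie', 'Tanev']
```

Dropping the GA60RelTM condition from `passes_filter` reproduces the looser second list.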

If we apply all the restrictions to forwards we get the following 19 players:

  • D. Sedin
  • H. Sedin
  • P. Bergeron
  • C. Perry
  • B. Marchand
  • M. Koivu
  • A. Ponikarovsky
  • A. Kopitar
  • A. Burrows
  • T. Zajac
  • B. Dubinsky
  • D. Backes
  • M. Backlund
  • D. Moss
  • A. Steen
  • A. Hemsky
  • G. Landeskog
  • C. MacArthur
  • M. Hossa

Agree? Disagree? Clearly some players are there due to line effects (Sedin/Sedin/Burrows, Steen/Backes, etc.), but generally speaking I’d consider most of those guys quality two-way players.

This is in no way meant to be a definitive guide to evaluating players’ defensive ability; it is more a preliminary exercise to see what people think.

TSN Analytics Team, Even Strength Play and Marc-Edouard Vlasic

 Uncategorized  Comments Off on TSN Analytics Team, Even Strength Play and Marc-Edouard Vlasic
Sep 18 2014

Earlier this week TSN announced the creation of an analytics team consisting of long-time TSN contributor Scott Cullen along with new additions James Mirtle of the Globe and Mail and hockey blogger Travis Yost. I am all for mainstream media jumping on board with hockey analytics, but once you go from independent hockey blogger to significant contributor at TSN, I think it opens the door to higher expectations and higher standards. Scott Cullen has a long track record with TSN, and I am confident James Mirtle will bring intelligent insight, as we are all familiar with and respect his work. While I am aware of Yost and his blogging history, I have to be honest in saying I have not read a ton of his stuff, so I was interested to see what he would offer. After reading his first two articles, I have to say I definitely think there is room for improvement.

Yost’s first article looked at trends in how teams use players during 5-on-5 play. The main point I think Yost was trying to make is that teams are phasing out goons and other “specialists” and replacing them with guys who can play bigger minutes at both ends of the rink. While this may well be true, I am not sure Yost’s evidence really supports it. He produced a chart showing that more players got more 5v5 ice time per game in 2013-14 than in 2007-08, and his conclusion was that this was evidence of teams moving away from goons and small-minute players.

The rightward shift here should seem apparent – a higher concentration of guys playing larger minutes now as opposed to seven years ago and fewer guys picking up scrap minutes in smaller roles. The number of forwards playing ten or less minutes a night has dropped from 109 in 2007, to 65 in 2014. And the number of forwards playing between 13 and 16 minutes a night has moved from 153 in 2007 to 231 in 2014. As a group, teams may still be leaning on their star players, but there’s also been a more balanced spread of total ice time than there was seven years ago.

First off, the rightward shift Yost talks about is likely almost exclusively due to the fact that there were far fewer penalties and power plays in 2013-14 than in 2007-08, as Yost himself pointed out earlier. This led to more even-strength ice time being doled out to the same number of players, which will almost certainly produce the observed right shift. As for a more balanced spread in ice time, I don’t see that either, at least not to any significant extent. To look at this properly, instead of minutes of even-strength ice time one should look at the percentage of a team’s even-strength minutes that each player played. This removes the difference in total even-strength ice time and truly lets you see whether teams are using a more balanced lineup. At the very least, one should scale each player’s ES TOI in one of the seasons by the ratio of league-wide ES TOI between the two seasons. I’d then be interested to see whether a “right shift” still occurs or whether there is a meaningful difference between the charts.
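A minimal sketch of the share-based comparison suggested above; the per-game ES minute totals (48 vs 51) are assumptions chosen only to show that a player whose role never changed still gains raw minutes when league-wide ES time grows:

```python
# Express ice time as a share of the team's ES minutes rather than raw
# minutes, so a league-wide change in ES time doesn't masquerade as usage.
def es_share(player_es_toi, team_es_toi):
    """Player's ES TOI as a fraction of his team's ES TOI in a game."""
    return player_es_toi / team_es_toi

share = 0.25               # player's unchanged role: 25% of team ES minutes
toi_2008 = share * 48.0    # ~48 ES min/game in 2007-08 (assumed) -> 12.00
toi_2014 = share * 51.0    # ~51 ES min/game in 2013-14 (assumed) -> 12.75

# Raw minutes "shift right" even though usage is identical.
print(f"2007-08: {toi_2008:.2f} min, share {es_share(toi_2008, 48.0):.0%}")
print(f"2013-14: {toi_2014:.2f} min, share {es_share(toi_2014, 51.0):.0%}")
```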

Yost’s second article for TSN.ca was about Marc-Edouard Vlasic and how he should probably get more recognition for how good he really is. That is a sentiment I can generally support, but Yost’s supporting evidence is analytically unsound in my opinion. The first thing Yost does is identify a number of defensemen generally considered the league’s best to compare Vlasic to. This is a good start, and Yost identified guys like Chara, Doughty, Karlsson, Pietrangelo, Subban, etc. What he did next was produce a bubble chart plotting even-strength Corsi% on the x-axis against even-strength goal% on the y-axis, with bubble size representing scoring production. To be honest, I have no clue what the value of this chart is. Both Corsi% and goal% are significantly team driven, but there was no accounting for quality of team; goal% carries a fair amount of luck and randomness, which was not discussed; and I have no idea what statistic was used for scoring production. The conclusion Yost drew from the chart was that Vlasic is right in the mix with some of the best defensemen in the league. The problem is, I am certain I could find a number of defensemen we generally consider mediocre who would be right there with Vlasic too.

There are proper ways to do this kind of analysis, and there is no way to do it without taking quality of teammates into consideration. On my stats site I have teammate statistics (denoted by TM), and one can easily compare a player’s on-ice stats to those of his teammates when the teammates are not playing with him. Doing this we get the following:

Player Name CF60 RelTM
P.K. SUBBAN 6.152

If we use CF60 as a proxy for offensive production we find the best offensive defensemen are Karlsson, Keith and Pietrangelo while the least offensive are Suter, Chara and Doughty. Vlasic is right in the middle and looks pretty good. One might be surprised at Doughty but the rest kind of make sense.

Now, let’s do the same for CA60.

Player Name CA60 RelTM
P.K. SUBBAN -4.586

For CA60 it is better to have a negative number as this indicates you are giving up fewer shot attempts than your teammates when they aren’t playing with you. Here Vlasic is second and looking pretty good.

Now we can combine these two stats by looking at CF% RelTM.

Player Name CF% RelTM
P.K. SUBBAN 4.8%

Out of this group, Vlasic is second best, which is pretty good and is evidence that he probably deserves to be in the company of these guys. With that said, this is just a cursory look and in no way a complete analysis. Not only are there limitations to looking solely at Corsi, but there are a lot of other factors to take into consideration as well (for example, Giordano is probably not that good; he only looks good because his Flames teammates are weak relative to the teammates of the other players on this list). Overall, though, this is how I think one should start an analysis of Vlasic and whether he deserves more credit for the player he is. To be fair to Yost, he gets into this a little by looking at a time series of Vlasic’s relative Corsi%, but that alone is not sufficient, and he doesn’t compare it to any of the other defensemen he is comparing Vlasic to.

Overall, I applaud TSN for wanting to jump on the analytics bandwagon, and I am certain Yost has the potential to provide better analysis than his first few posts, which, to be honest, left me a little underwhelmed if not disappointed.

On the flip side, I have seen some good stuff written recently by @MimicoHero that is worthy of mention. A recent blog post of his looked at Ryan Johansen’s value to the Blue Jackets, and in my opinion he did a pretty good job of accounting for usage (i.e. QoT, QoC, zone starts) and comparing Johansen to his peers. I like the tables he produced and how he looked at offense and defense separately. I’d probably weight QoT far more heavily in the usage metric he came up with, but overall it is a very good methodology for comparing players on different teams playing in different circumstances.


Aug 26 2014

I am sure many of you are aware that Corey Sznajder (@ShutdownLine) has been working on tracking zone entries and exits for every game from last season. A week and a half ago Corey was nice enough to send me the data for every team for all the games he had tracked so far (approximately 60% of the season, I’d estimate), and I have spent the past few days looking at it. Ultimately, everything you read from here on is thanks to the time and effort Corey has put into tracking this data.

As I have alluded to on twitter, I have found some interesting and potentially very significant findings but before I get to that let me summarize a bit of what is being tracked with respect to zone entries.

  • CarryIn% – the percentage of zone entry attempts where the team carried the puck over the blue line into the offensive zone.
  • FailedCarryIn% – the percentage of attempts where the team tried but failed to carry the puck over the blue line into the offensive zone.
  • DumpIn% – the percentage of attempts where the team dumped the puck into the offensive zone.

These three should sum to 100% (Corey’s original data treated FailedCarryIn% separately, so I made this adjustment) and represent the three possible outcomes of an attempt to enter the offensive zone: successful carry-in, failed carry-in, and dump-in.

I gathered all this information for and against for every team and put them in a table. I’ll spare you all the details as to how I arrived at this idea I had but here is what I essentially came up with:

  • Treat successful carry ins as a positive
  • Treat failed carry in attempts as a negative (probably results in a quality counter attack against)
  • Dump ins are considered neutral (ignored)

So, I then came up with NetCarryIn% which is CarryIn% – FailedCarryIn% and I calculated this for each team for and against to get NetCarryIn%For and NetCarryIn%Against for each team.

I then subtracted NetCarryIn%Against from NetCarryIn%For to get NetCarryIn%Diff.

In all one formula we have:

NetCarryIn%Diff = (CarryIn%For – FailedCarryIn%For) – (CarryIn%Against – FailedCarryIn%Against)
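The formula can be sketched in code as follows; the entry counts below are made up for illustration and are not Corey’s actual tracking numbers:

```python
# Entry counts are (carry-ins, failed carry-ins, dump-ins).
def entry_percentages(carry, failed, dump):
    """Return (CarryIn%, FailedCarryIn%, DumpIn%); the three sum to 1."""
    total = carry + failed + dump
    return carry / total, failed / total, dump / total

def net_carry_in_diff(for_counts, against_counts):
    """(CarryIn%For - FailedCarryIn%For) - (CarryIn%Against - FailedCarryIn%Against)"""
    ci_f, fc_f, _ = entry_percentages(*for_counts)
    ci_a, fc_a, _ = entry_percentages(*against_counts)
    return (ci_f - fc_f) - (ci_a - fc_a)

# Hypothetical season: 1000 entry attempts each way.
diff = net_carry_in_diff((520, 90, 390), (440, 110, 450))
print(f"NetCarryIn%Diff: {diff:.1%}")  # 10.0%
```

Note that dump-ins enter the denominator but not the numerator, which is exactly the “neutral” treatment described above.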

Hopefully I haven’t lost you. So, with that we now get the following results.

Team Playoffs? NetCarryIn%Diff RegWin%
Chicago Playoffs 12.2% 61.0%
Tampa Playoffs 6.1% 53.0%
Anaheim Playoffs 5.9% 64.6%
Colorado Playoffs 5.5% 59.1%
Detroit Playoffs 4.7% 51.2%
Minnesota Playoffs 4.1% 53.0%
Pittsburgh Playoffs 4.0% 59.8%
Dallas Playoffs 3.8% 51.8%
New Jersey . 3.4% 48.2%
Los Angeles Playoffs 1.7% 53.7%
Boston Playoffs 1.3% 67.1%
St. Louis Playoffs 1.2% 60.4%
Ottawa . 0.9% 47.6%
Columbus Playoffs 0.7% 51.8%
Edmonton . 0.7% 35.4%
NY Rangers Playoffs -0.1% 54.9%
Phoenix . -1.3% 48.8%
Montreal Playoffs -1.3% 53.0%
Vancouver . -1.7% 43.9%
Philadelphia Playoffs -1.8% 53.0%
Winnipeg . -1.8% 43.3%
San Jose Playoffs -2.3% 59.1%
NY Islanders . -3.0% 40.2%
Toronto . -4.8% 42.7%
Nashville . -6.0% 50.6%
Calgary . -6.4% 38.4%
Washington . -6.4% 46.3%
Florida . -6.7% 35.4%
Carolina . -6.8% 47.0%
Buffalo . -7.7% 25.6%

‘Playoffs’ indicates a playoff team and RegWin% is their regulation winning percentage (based on W-L-T after regulation time).

What is so amazing is that with the first ~60% of games we have done an excellent job of predicting who would make the playoffs. The top 8 teams (and 11 of the top 12) in this stat made the playoffs, and all of the bottom 8 missed. That’s pretty impressive for a predictor. What’s more, the r^2 with RegWin% is a solid 0.42, significantly better than the r^2 with 5v5 CF%, which is 0.31. Here is what the scatter plots look like.



I think what we are seeing is that if you are more successful than your opponent at carrying the puck into the offensive zone, without paying for it in costly turnovers on those carry-in attempts, you win the neutral zone, and that goes a long way towards winning the game. Recall that I have shown that shots on the rush are of higher quality than shots generated from zone play, so an important key to winning is maximizing your shots on the rush while minimizing your opponent’s. To an extent, this stat may actually be measuring some level of shot quality.
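For reference, the r^2 values quoted above are just the squared Pearson correlation. A minimal sketch, using a small toy sample rather than the full 30-team data:

```python
# Squared Pearson correlation, the "r^2" quoted above.
def r_squared(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Toy sample of five (NetCarryIn%Diff, RegWin%) pairs from the table above;
# the full comparison uses all 30 teams.
net_diff = [0.122, 0.061, 0.040, -0.018, -0.077]
reg_win  = [0.610, 0.530, 0.598, 0.530, 0.256]
print(round(r_squared(net_diff, reg_win), 2))
```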

Of course, why stop here? If it is in fact a measure of shot quality, why not combine it with shot quantity? To do this I took NetCarryIn%Diff and added to it the team’s Corsi% minus 50%. This is what we get.

Team Playoffs? NetCarryIn%Diff – CF% over 50%
Chicago Playoffs 17.7%
Los Angeles Playoffs 8.5%
New Jersey . 7.8%
Tampa Playoffs 7.1%
Detroit Playoffs 6.2%
Anaheim Playoffs 5.7%
Boston Playoffs 5.2%
St. Louis Playoffs 4.3%
Dallas Playoffs 4.3%
Ottawa . 3.3%
Minnesota Playoffs 2.7%
Pittsburgh Playoffs 2.7%
Colorado Playoffs 2.5%
NY Rangers Playoffs 2.3%
San Jose Playoffs 1.4%
Columbus Playoffs 0.6%
Vancouver . -0.4%
Phoenix . -0.8%
Winnipeg . -1.7%
Philadelphia Playoffs -1.8%
NY Islanders . -3.6%
Montreal Playoffs -4.6%
Edmonton . -5.0%
Florida . -5.7%
Carolina . -6.5%
Nashville . -7.5%
Washington . -8.7%
Calgary . -10.1%
Toronto . -11.9%
Buffalo . -14.7%

New Jersey still messes things up, but New Jersey is just a strange team when it comes to these stats. Think about this, though: had New Jersey and Ottawa made the playoffs instead of Philadelphia and Montreal, this stat would have a perfect record in predicting the playoff teams. It was perfect in the Western Conference.
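The combined stat in the table can be sketched as below; Chicago’s CF% of 55.5% is back-solved from the two tables (12.2% + 5.5% = 17.7%) and should be treated as an assumption of this sketch, not a sourced figure:

```python
# Blend entry quality with shot quantity, as described above.
def combined_metric(net_carry_in_diff, corsi_pct):
    """NetCarryIn%Diff plus the team's Corsi% over 50%."""
    return net_carry_in_diff + (corsi_pct - 0.50)

chicago = combined_metric(0.122, 0.555)  # CF% of 55.5% is back-solved
print(f"{chicago:.1%}")  # 17.7%, matching the table
```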

Compared to Regulation Win Percentage we get:


That’s a pretty nice correlation, and far better than Corsi% itself.

Now, this could all be one massive fluke and none of it repeatable, but I am highly doubtful that will be the case. We may be on to something here. It will be interesting to see what individual players look like with this stat, and I’ll also look at whether zone exits should somehow be factored into the equation. I suspect it may not be necessary, as zone exits may be measuring something similar to Corsi% (shot quantity over quality).