Persistence and Predictability

David Johnson, Statistical Analysis
Jun 01, 2011

There seems to be some confusion, or lack of clarity, about my post on corsi vs shooting percentage vs shooting rate the other day, so let me clear it up in as straightforward a way as I can.

“Hawerchuk” over at writes the following:

“I’m not totally sure what he’s getting at. People use Fenwick because it’s persistent, and PDO because it’s not. Over the course of a single season, observed shooting and save percentage drive results, but they are not persistent.”

Dirk Hoag over at writes:

“Here’s an example of when NOT to use correlation as a tool in statistical analysis (when the variables in question are linked by definition). David makes a bad blunder here, by looking at scoring leaders, seeing a bunch of high shooting percentages, and concluding that shooting percentage is the true “talent”. The problem is that shooting percentage swings wildly from season to season, whereas shooting rates are much more consistent.”

The great advantage corsi/fenwick has over goals as an evaluator of talent is the greater sample size associated with it. The greater the sample size, the more confidence we can have in any conclusions we draw from it and the less chance that ‘luck’ messes things up. Year over year shooting percentage fluctuates a lot, but that doesn’t necessarily mean that it isn’t a talent or doesn’t have persistence; it could mean that the sample size of one year is too small. The four year shooting percentage leader board seems to identify all the top offensive players so it can’t be completely random. So what happens if we increase the sample size? Here are correlations of fenwick shooting percentages while on ice in 5v5 even strength situations for forwards:

Year(s) vs Year(s) Correlation
200708 vs 200809 0.249
200809 vs 200910 0.268
200910 vs 201011 0.281
200709 vs 200911 (2yr) 0.497

As you can see, there isn’t a lot of persistence year over year, but for 2 years over 2 years we are starting to see some persistence. It is still not at the level of corsi/fenwick, but it is certainly not non-existent either. Combined with its greater correlation with scoring goals, this makes fenwick shooting percentage on par with fenwick rate as a predictor of future goal scoring performance when we have 2 seasons of data, as I pointed out in my last post.
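
For readers who want to reproduce this kind of check, here is a minimal sketch of a year-over-year correlation. The player labels and percentages are made up for illustration and are not the data behind the table above; the function is a plain Pearson correlation, which I am assuming is the kind of correlation being reported.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical on-ice fenwick shooting percentages for five forwards
season_a = {"A": 6.1, "B": 5.2, "C": 7.0, "D": 4.8, "E": 5.9}
season_b = {"A": 5.5, "B": 5.8, "C": 6.4, "D": 5.1, "E": 5.0}

# Only compare players who appear in both seasons
players = sorted(season_a.keys() & season_b.keys())
r = pearson_r([season_a[p] for p in players], [season_b[p] for p in players])
print(f"year-over-year r = {r:.3f}")
```

The 2 year vs 2 year version works the same way, except each player's percentage is first computed over the pooled goals and fenwick events of both seasons.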

For the record, year over year correlation for fenwick for rate is approximately 0.60 depending on years used  and 2 year vs 2 year correlation is 0.66.

But as I pointed out in my previous post, you would probably never use shooting percentage as a predictor because you may as well use goal rate instead, which has the same sample size limitations as shooting percentage but also factors in fenwick rate. Year over year correlation of GF20 (goals for per 20 minutes) is approximately 0.45 depending on years used and the 2 year vs 2 year correlation is 0.619, so GF20 has persistence and, being the very quantity we are trying to predict, is perfectly correlated with itself. That makes it as reliable a predictor (or more) of future goal scoring rates as fenwick rate with just one year of data and a better predictor when using 2 years of data. Let me repost the pertinent table of correlations:

Year(s) vs Year(s) FenF20 to GF20 GF20 to GF20
200708 vs 200809 0.396 0.386
200809 vs 200910 0.434 0.468
200910 vs 201011 0.516 0.491
Average 0.449 0.448
200709 vs 200911 (2yr) 0.498 0.619
200709 vs 200910 (2yr vs 1yr) 0.479 0.527

The conclusion is that when dealing with less than a year’s worth of data, fenwick/corsi is probably the better metric to identify talent and predict future performance; with more than a year of data, goals for rate is the better metric; and with exactly one year’s worth of data they are about on par with each other.

Note: This is only true for forwards. The same observations do not hold for defensemen, where we see very little persistence or predictability in any of these metrics, I presume because the majority of them don’t drive offense to any significant degree.

May 30, 2011

The general consensus among advanced hockey statistics analysts is that corsi/fenwick stats are the best statistics for measuring player and team talent levels. For those of you who are not aware of corsi and fenwick, let me give you a quick definition. Corsi numbers are the number of shots directed at the goal and include shots, missed shots and blocked shots. Fenwick numbers are the same except they do not include blocked shots (just shots and missed shots). I generally look at fenwick and will do that here, but fenwick and corsi are very highly correlated so the results would be similar if I used corsi.

The belief of many who support corsi and fenwick is that fenwick +/- or fenwick ratio (i.e. fenwick for /(fenwick for + fenwick against)) is an indication of which team is controlling the play, and that the team that controls the play more will, over time, score the most goals and thus win the most games. There is some good evidence to support this, and controlling the play does go a long way to controlling the score board. The problem I have with many corsi/fenwick enthusiasts is that they often dismiss the influence that the ability to drive or suppress shooting percentage plays in the equation. Many dismiss it outright; others feel it has so little impact it isn’t worth considering except when considering outliers or special cases. In this article I am going to take an in depth look at the two and their influence on scoring goals on an individual level.
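
To make the definitions concrete, here is a small sketch of the three quantities described above. The function names and the sample counts are mine, chosen only for illustration.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """All shot attempts directed at the goal."""
    return shots_on_goal + missed_shots + blocked_shots

def fenwick(shots_on_goal, missed_shots):
    """Shot attempts excluding blocked shots."""
    return shots_on_goal + missed_shots

def fenwick_ratio(fenwick_for, fenwick_against):
    """Share of all fenwick events that belong to the player's team."""
    total = fenwick_for + fenwick_against
    if total == 0:
        raise ValueError("no fenwick events recorded")
    return fenwick_for / total

# Made-up on-ice counts for one player
fen_for = fenwick(shots_on_goal=300, missed_shots=120)      # 420
fen_against = fenwick(shots_on_goal=280, missed_shots=100)  # 380
print(fen_for - fen_against)                # fenwick +/-
print(fenwick_ratio(fen_for, fen_against))  # fenwick ratio
```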

I have taken the last 4 seasons of 5v5 even strength data and pulled out all the forwards that have played a minimum of 2000 minutes of 5v5 ice time over those 4 seasons. There were a total of 310 forwards matching that criterion, and for those players I calculated the fenwick shooting percentage (goals / fenwick for), fenwick for rate (FenF20 – fenwick for per 20 minutes of ice time) and goal scoring rate (GF20 – goals for per 20 minutes of ice time) while the player was on the ice. What we find is that shooting percentage is more correlated with goal production than fenwick rate is.

Shooting % vs GF20 R^2 = 0.8272
FenF20 vs GF20 R^2 = 0.4657
Shooting % vs FenF20 R^2 = 0.1049

As you can see, shooting percentage is much more highly correlated with goal scoring rate than fenwick rate is which would seem to indicate that being able to drive shooting percentage is more important for scoring goals than taking a lot of shots.
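
For reference, the three on-ice rates can be sketched as below. The totals are invented for illustration. Note that with these definitions GF20 is exactly fenwick shooting percentage times FenF20, which the last line checks.

```python
def on_ice_rates(goals_for, fenwick_for, minutes):
    """Return (FenSh% as a fraction, FenF20, GF20) from on-ice 5v5 totals."""
    fen_sh_pct = goals_for / fenwick_for  # goals per fenwick event
    fen_f20 = fenwick_for / minutes * 20  # fenwick events per 20 minutes
    gf20 = goals_for / minutes * 20       # goals per 20 minutes
    return fen_sh_pct, fen_f20, gf20

# Made-up four-season totals for one forward
fen_sh_pct, fen_f20, gf20 = on_ice_rates(goals_for=100, fenwick_for=2000, minutes=2500)
print(fen_sh_pct, fen_f20, gf20)

# GF20 = FenSh% x FenF20 follows directly from the definitions
assert abs(fen_sh_pct * fen_f20 - gf20) < 1e-12
```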

Here is a list of the top 20 and bottom 10 players in fenwick shooting percentage and fenwick rate.

Rank Player FenSh% Player FenF20

For both lists, the players at the top of the list are for the most part considered top offensive players and the players at the bottom of the list are not even close to being considered quality offensive players. So, it seems that both shooting percentage and fenwick do a reasonable job at identifying offensively talented players. That said, the FenF20 list includes 7 players (Zetterberg, Datsyuk, Holmstrom, Franzen, Hudler, Cleary and Samuelsson) who have played mostly or fully with the Detroit Red Wings, and it seems unlikely to me that 7 of the top 20 offensive players are Red Wing players. Furthermore, the fenwick list also includes guys like Ponikarovsky, Samuelsson, Hudler, Cleary, Williams, etc. who would probably be considered secondary offensive players at best. This cursory overview seems to confirm what we saw with the correlations – Shooting Percentage is a better indicator of offensive talent than Fenwick For rates.

It is actually no surprise that the Red Wings dominate the fenwick rate leader board because the Red Wings’ organizational philosophy is all about puck control.

“It’s funny because our game looks at numbers just like other games,” says Red Wings general manager Ken Holland, “but as much value as we assign to puck possession and how essential it is to winning, we really don’t have a numerical value for it that everyone can agree on. Remember when [A’s general manager] Billy Beane started emphasizing on-base percentage in baseball? It wasn’t just a curious number; it changed the game. It redefined the type of player you wanted on your team. It’s coming in hockey; we just have to figure out how.”

This got the pro-corsi crowd riled up a bit as they said “Umm, yeah, we have that stat and it is called corsi” and were a bit bewildered at why NHL GMs didn’t make that recognition.  But anyway, what the above shows is that an organization that focuses on puck control dominates the corsi for statistic so I guess what that shows is that corsi/fenwick probably is a good measure of puck control.  But, as we have seen, fenwick (i.e. puck control) doesn’t automatically translate into goals scored.  There are no Red Wing players among the top 20 in fenwick shooting percentage and Datsyuk is the only Red Wing player in the top 20 in goals for per 20 minutes so while they take a lot of shots (or at least shot attempts), they aren’t the best at converting them into goals.

For me, and I am sure many others, the above is enough to conclude that shooting percentage matters a lot in scoring goals, but the staunch corsi supporters will argue that corsi is more persistent from season to season and thus is a better predictor of future performance. So which is the better predictor of future performance? The following table shows the correlation of shooting percentage and of fenwick rate with the following season’s goal scoring rate.

Year(s) vs Year(s) FenSh% to GF20 FenF20 to GF20
200708 vs 200809 0.253 0.396
200809 vs 200910 0.327 0.434
200910 vs 201011 0.317 0.516
Average 0.299 0.449
200709 vs 200911 (2yr) 0.479 0.498
200709 vs 200910 (2yr vs 1yr) 0.375 0.479

Note:  For the above season(s) vs season(s) correlation calculations, only players with at least 500 5v5 even strength minutes in each of the four seasons are included.  This way the same players are included in all season(s) vs season(s) correlation calculations.
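
The player filter described in the note can be sketched as follows. The data layout (a mapping from season to each player's 5v5 minutes) and the sample minutes are assumptions made for illustration.

```python
SEASONS = ("200708", "200809", "200910", "201011")
MIN_MINUTES = 500  # 5v5 even strength minutes required in every season

def eligible_players(minutes_by_season):
    """Return the players with at least MIN_MINUTES in each season in SEASONS.

    minutes_by_season maps season -> {player: 5v5 minutes} (hypothetical layout).
    """
    qualified_sets = [
        {p for p, m in minutes_by_season.get(season, {}).items() if m >= MIN_MINUTES}
        for season in SEASONS
    ]
    # A player must qualify in all four seasons, so intersect the sets
    return set.intersection(*qualified_sets)

# Made-up minutes for three players
minutes = {
    "200708": {"A": 900, "B": 450, "C": 700},
    "200809": {"A": 850, "B": 800, "C": 650},
    "200910": {"A": 910, "B": 820, "C": 499},
    "201011": {"A": 700, "B": 780, "C": 720},
}
print(sorted(eligible_players(minutes)))
```

Using the intersection keeps the player pool identical across every season(s) vs season(s) comparison, which is the point of the note.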

As you can see, when dealing with a single season of data the correlation with GF20 is much better for fenwick rate than for fenwick shooting percentage.  The gap closes when using 2 seasons as the predictor of a single season and is almost gone when using 2 seasons to predict the following 2 seasons.  It seems that the benefit of using corsi over shooting percentage diminishes to near zero when we have multiple seasons of data and though I haven’t tested it shooting percentage probably has an edge in player evaluation with 3 years of data.

Of course, you would never want to use shooting percentage as a predictor of future goal scoring rate when you could simply use past goal scoring rate as the predictor. Past goal scoring rate has the same ‘small sample size’ limitations as shooting percentage (both are limited by the number of goals scored) but scoring rate combines the prediction benefits of shooting percentage and fenwick rate. The table below is the same as above but I have added in GF20 as a predictor.

Year(s) vs Year(s) FenSh% to GF20 FenF20 to GF20 GF20 to GF20
200708 vs 200809 0.253 0.396 0.386
200809 vs 200910 0.327 0.434 0.468
200910 vs 201011 0.317 0.516 0.491
Average 0.299 0.449 0.448
200709 vs 200911 (2yr) 0.479 0.498 0.619
200709 vs 200910 (2yr vs 1yr) 0.375 0.479 0.527

The above table tells you everything you need to know. When looking at single seasons both GF20 and FenF20 perform similarly at predicting next season’s GF20, with fenwick shooting percentage well behind, but when we have 2 years of data as the starting point, GF20 is the clear leader. This means that when we have at least a full season’s worth of data (or approximately 500 minutes of ice time), goal scoring rates are as good as or better than corsi rates as a predictor of future performance, and beyond a year’s worth of data the benefits increase. When dealing with less than a full season of data, corsi/fenwick may still be the preferred stat when evaluating offensive performance.

So what about the defensive side of things?

Year(s) vs Year(s) FenA20 to GA20 GA20 to GA20
200708 vs 200809 0.265 0.557
200809 vs 200910 0.030 0.360
200910 vs 201011 0.120 0.470
Average 0.138 0.462
200709 vs 200911 (2yr) -0.037 0.371
200709 vs 200910 (2yr vs 1yr) 0.000 0.316

Defensively, fenwick against rate is very poorly correlated with future goals against rate and it gets worse, to the point of complete uselessness, when we consider more seasons.  Past goals against rate is a far better predictor of future goals against rate.

Where it gets interesting is that, unlike on offense, the correlation drops when you consider more seasons, which seems a bit strange. My guess is that we are seeing this because I am just looking at forwards, and defense is driven more by goaltending and defensemen; as more time passes, the greater the differences are in a forward’s goalie and defensemen teammates. Furthermore, forward ice time is largely driven by offensive ability (and not defensive ability), so many of the quality defensive forwards may be removed from the study because of the 500 minute per season minimum I am using (i.e. the group of players used in this study is biased towards those that aren’t focusing on defense). Further analysis is necessary to show whether either of these is true, but the conclusion to draw from the above table is that, for forwards at least, goals against rates are by far the better indicator of defensive ability.

In summary, it should be clear that we cannot simply ignore the impact of a player’s ability to drive or suppress shooting percentage in individual player performance evaluation, and so long as you have a full year of data (or 500 or more minutes of ice time) the preferred stat for individual player performance evaluation should be goal scoring rate. Corsi/fenwick likely only provide a benefit to individual performance evaluation when dealing with less than a full year of data.

Sep 16, 2010

On Monday I outlined an all-encompassing player evaluation model that allows us to evaluate every forward, defenseman and goalie under the same methodology. In short, the system takes how many goals are scored for and against while a player is on the ice and compares that to how many goals for/against one should expect based on the quality of his line mates and opposition. That model, I believe, makes a reasonable attempt at evaluating a player’s performance, but it can be improved.

The first method of improvement is to utilize the additional information we have about the quality of a player’s line mates and opposition once we have run the model. Initially I use the goals for and against performance of his line mates and opposition when the player being evaluated is not on the ice at the same time as them. But now that we have run the model we, at least theoretically, have a better understanding of the quality of his team mates and opposition. I can then take the output of the first model run and use it as the input of the second model run to get new and better results. I can continue doing this iteratively, and the good news is that after every iteration the difference between the player rating from that iteration and the previous iteration trends towards zero, which is a very nice result.


Jan 24, 2008

I am a believer that most people over-rate the draft as a tool for building a successful team. This is not to say that I don’t think it is an important tool, but rather that it isn’t the only tool and probably not the most important tool. Successful teams, even in the post-lockout NHL, can and have been built significantly through trades and free agent signings.

I looked at every team’s current roster and added up how many players on the roster were drafted by that team. For forwards and defensemen the player must have played 20 games this year in the NHL and for goalies they must have played 10 games. Some exceptions have been made for significant players who have not met the games played requirements due to injury, as I think the team deserves credit for these players too. Joe Sakic is an example. Less established players such as Carlo Colaiacovo were not counted even though he likely would have played 20 games had he been healthy.

Based on those requirements here are how each team fared.


13-San Jose



9-New Jersey
9-St. Louis

8-NY Rangers

7-Los Angeles

6-Tampa Bay


4-NY Islanders


At the top of the list you find a lot of pretty good teams, though the team at the very top, Buffalo, currently sits near the bottom of the eastern conference standings. At the bottom of the list you will find many bad or mediocre teams such as the NY Islanders, Carolina, Phoenix, Tampa and Atlanta, but at the very bottom of the list you will find two pretty good teams, including last year’s Stanley Cup winner, who have largely built their teams through trades and free agent signings. In Philadelphia’s case the team was almost completely rebuilt in the past year. Also in the lower part of the list is Calgary, who are a pretty good team but whose only draft picks that are playing a significant role on the team are Dion Phaneuf and Matthew Lombardi.

Is smart drafting important? Sure. Is it the be all and end all in building a successful team? Definitely not.

Click more for a list of each team’s draft picks currently playing on their roster. If I have missed anyone post a comment.


Nov 08, 2007

Well, after a couple of days of work I have managed to re-design my stats website which, to be honest, was pretty much non-existent and/or non-usable before this update. I have put the power rankings, player rankings, adjusted hits/giveaways/takeaways, and player on ice/off ice, with/without teammate and against/not against opponent data on that website. In a few days I will migrate much of that off of this blog site and move it there, where it will be easier to update, which means far more frequent updates.

For the regulars to this website, please head over there, take a look at the redesigned user interface and let me know what you think of the new look. Do you like it? Anything you want to see changed/improved? I ask because my plan is to use an almost identical look and feel here on my blog. I think by moving much of the stats over there and also using a cleaner page design the page will look significantly less cluttered. Eventually I hope to re-design in a similar way but before I go ahead and put in all that effort feedback from you would be greatly appreciated. I am almost ready to roll out this new design on this blog so if the feedback is positive I may change it over in the next day or two.

Jun 25, 2007

A couple weeks ago I posted an article about the Leafs defence and how it isn’t as bad as many people think. Well, since then I have been working on trying to improve on the methodology by including shot type (slap shot, wrist shot, snap shot, tip in, backhand, wraparound) and in that process I found a few mistakes/issues in what I did previously.

First, I found a bug in my program that caused a number of powerplay goals to be considered even strength goals. When I fixed this the general conclusions of that article remained intact, though the amount of the goals caused by the goalie was reduced for most teams. In general, the closer a team’s penalty kill ability was to their even strength ability the more valid the results, but where a team’s penalty kill ability was seemingly far superior or inferior to their even strength ability the results were skewed. The most notable team was the Philadelphia Flyers, who were downright horrible at even strength but for some reason managed to have a pretty good penalty kill. When I fixed the bug the Flyers looked far worse than they did in the previous article.

The second issue I discovered is that not all shot distances and types are created equal and that there is significant (unintentional) bias in how game monitors decide what is a wrist shot vs snap shot as well as the distance a shot is from the goal. It is a bit surprising but it is clear to me now that some game monitors have a real hard time distinguishing between a 10’ shot and a 15’ shot. Since teams play half their games at the same arena (their home arena) if that arena’s game monitor couldn’t judge shot type or shot distance very well their shot difficulty ratings would get significantly biased one way or the other.

In an ideal world there would be an easy method for factoring out that bias and using all the data, but I cannot think of any such easy method. The quick and dirty solution is to just look at shots against while playing on the road. This will eliminate any significant home arena bias, and hopefully any biases found in other arenas will get averaged out on their own. For the most part this is likely true, but I am still not completely happy with this solution because I think on some level teams play a bit differently at home than on the road. More on this later, but for now just looking at road shots is the best solution so let’s go with that.

Ok, so what I did was group shots into 19 categories based on shot type and distance.

  • 0-14′ wrist shot
  • 15-29′ wrist shot
  • 30-44′ wrist shot
  • 45+′ wrist shot
  • 0-14′ snap shot
  • 15-29′ snap shot
  • 30-44′ snap shot
  • 45+′ snap shot
  • 0-14′ slap shot
  • 15-29′ slap shot
  • 30-44′ slap shot
  • 45+′ slap shot
  • 0-9′ tip-in
  • 10-25′ tip-in
  • 26+′ tip-in
  • 0-12′ backhand
  • 13-24′ backhand
  • 25+′ backhand
  • wraparound
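
As a rough sketch, the grouping above could be coded like this. It assumes shot distances are recorded as whole feet and that shot types are spelled as in the list; both are assumptions about the data, not a description of my actual program.

```python
# Distance bands (in feet) per shot type; None means "and beyond"
DISTANCE_BANDS = {
    "wrist shot": [(0, 14), (15, 29), (30, 44), (45, None)],
    "snap shot": [(0, 14), (15, 29), (30, 44), (45, None)],
    "slap shot": [(0, 14), (15, 29), (30, 44), (45, None)],
    "tip-in": [(0, 9), (10, 25), (26, None)],
    "backhand": [(0, 12), (13, 24), (25, None)],
}

def band_label(low, high, shot_type):
    """Build a category label such as "15-29' wrist shot" or "45+' wrist shot"."""
    span = f"{low}+" if high is None else f"{low}-{high}"
    return f"{span}' {shot_type}"

def shot_category(shot_type, distance_ft):
    """Map a shot to one of the 19 categories (wraparounds ignore distance)."""
    if shot_type == "wraparound":
        return "wraparound"
    for low, high in DISTANCE_BANDS[shot_type]:
        if distance_ft >= low and (high is None or distance_ft <= high):
            return band_label(low, high, shot_type)
    raise ValueError(f"distance {distance_ft} is outside every band")

ALL_CATEGORIES = [
    band_label(low, high, shot_type)
    for shot_type, bands in DISTANCE_BANDS.items()
    for low, high in bands
] + ["wraparound"]

print(len(ALL_CATEGORIES))
print(shot_category("wrist shot", 22))
```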

I then performed an analysis more or less equivalent to the analysis done in the previous article. By doing that I come up with the following results:

Team ExpGA/60m GA/60m Goalie Impact
Philadelphia 2.58 3.15 0.57
Los Angeles 2.68 3.08 0.39
Edmonton 2.49 2.85 0.36
Washington 2.63 2.95 0.32
Phoenix 2.39 2.71 0.32
Tampa Bay 2.52 2.80 0.28
Montreal 2.71 2.98 0.27
Carolina 2.39 2.56 0.17
Chicago 2.66 2.82 0.15
Colorado 2.66 2.76 0.10
Boston 2.87 2.95 0.07
Calgary 2.46 2.50 0.03
San Jose 2.08 2.10 0.02
Pittsburgh 2.50 2.51 0.01
Toronto 2.44 2.42 -0.02
Anaheim 2.23 2.18 -0.05
Florida 2.58 2.52 -0.06
NY Islanders 2.71 2.62 -0.09
Columbus 2.02 1.93 -0.09
Minnesota 2.45 2.33 -0.12
NY Rangers 2.38 2.16 -0.22
Detroit 2.11 1.88 -0.24
Nashville 2.61 2.36 -0.25
Dallas 2.16 1.91 -0.26
Buffalo 2.67 2.38 -0.30
Ottawa 2.60 2.30 -0.30
New Jersey 2.48 2.11 -0.37
Atlanta 2.62 2.21 -0.41
Vancouver 2.34 1.81 -0.53
St. Louis 2.73 2.18 -0.55

Goalie Impact is the amount of goals per 60 minutes of even strength ice time the goalie is responsible for above or below what an average goalie would allow.
There are a lot of similarities between the table above and the table in my previous article but there are some teams that have moved up or down the list.
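
For clarity, the Goalie Impact column is simply actual goals against per 60 minutes minus expected goals against per 60 minutes. The sketch below recomputes it for three rows of the table. Because it works from the rounded columns, a recomputed value can differ from the printed one by 0.01 for some other teams.

```python
def goalie_impact(exp_ga_per_60, ga_per_60):
    """Goals per 60 minutes allowed above (+) or below (-) an average goalie."""
    return round(ga_per_60 - exp_ga_per_60, 2)

# (ExpGA/60m, GA/60m) copied from the table above
teams = {
    "Philadelphia": (2.58, 3.15),
    "Toronto": (2.44, 2.42),
    "St. Louis": (2.73, 2.18),
}

for team, (exp_ga, ga) in teams.items():
    print(team, goalie_impact(exp_ga, ga))
```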

Toronto: Toronto’s defence (ExpGA/60m) drops a bit in the ratings, influenced partially by factoring in the amount of time the Leafs spend at even strength but more significantly by removing a small home ice bias that made shots at the Air Canada Centre be reported as being at a slightly greater distance than they likely actually were (it should be noted the Air Canada Centre bias was much lower than at some other arenas). The result is that the Leafs dropped to 11th best road defence from 8th. Not a huge drop, but I figured since the article was primarily about the Leafs defence I should mention it. Also, these changes make Andrew Raycroft look much better than under the previous analysis. Using this current approach the net effect of Leaf goaltending is pretty neutral (i.e. Leaf goaltending was about average). And this gets to my problem with just looking at road statistics. The Leafs were generally a worse team at home than on the road despite the fact that it is typical for a team to play better (by about 10%) on home ice than on road ice. Raycroft may have been the culprit, as he had an .898 save percentage on the road and a .890 save percentage at home, but it could also be that the Leafs as a team played differently and gave up tougher shots at home. Without reliable statistics we will never know which is true or whether it is some combination of the two.

St. Louis: Wow! How did they become the team with the goalies that saved the most goals? That is hard to believe considering the names of the goalies they have on their roster but it seems to be a function of the number of shots and their difficulty. St. Louis goalies also posted a much better road save % than a home save %.

The next thing I looked at is how individual goalies performed. Based on league wide save percentages for the 19 groupings I calculated how many goals a perfectly average goalie should give up given the shot types that each goalie faced. Here are the results sorted by the number of goals the goalie saved per game.

Name Team ExpGoals Goals Diff Diff/game
SANFORD, CURTIS St. Louis 30.80 23.00 7.80 0.42
OSGOOD, CHRIS Detroit 22.82 18.00 4.82 0.34
LUONGO, ROBERTO Vancouver 59.43 42.00 17.43 0.33
BACASHIHUA, JASON St. Louis 22.43 19.00 3.43 0.33
HEDBERG, JOHAN Atlanta 20.96 17.00 3.96 0.32
LEGACE, MANNY St. Louis 35.99 27.00 8.99 0.29
BRODEUR, MARTIN New Jersey 81.80 65.00 16.80 0.28
LEHTONEN, KARI Atlanta 65.11 52.00 13.11 0.27
DIPIETRO, RICK NY Islanders 63.73 52.00 11.73 0.27
TURCO, MARTY Dallas 56.34 44.00 12.34 0.27
MILLER, RYAN Buffalo 54.75 43.00 11.75 0.26
VOKOUN, TOMAS Nashville 40.43 33.00 7.43 0.24
LUNDQVIST, HENRIK NY Rangers 59.50 49.00 10.50 0.21
GERBER, MARTIN Ottawa 29.63 26.00 3.63 0.19
BURKE, SEAN Los Angeles 33.80 31.00 2.80 0.17
EMERY, RAY Ottawa 57.16 50.00 7.16 0.17
NORRENA, FREDRIK Columbus 40.98 35.00 5.98 0.17
HASEK, DOMINIK Detroit 33.82 27.00 6.82 0.17
THIBAULT, JOCELYN Pittsburgh 26.19 24.00 2.19 0.17
RAYCROFT, ANDREW Toronto 62.74 55.00 7.74 0.15
MASON, CHRIS Nashville 44.26 40.00 4.26 0.15
BELFOUR, ED Florida 49.00 43.00 6.00 0.15
THOMAS, TIM Boston 62.58 57.00 5.58 0.13
HUET, CRISTOBAL Montreal 41.48 38.00 3.48 0.12
KHABIBULIN, NIKOLAI Chicago 58.75 54.00 4.75 0.12
BACKSTROM, NIKLAS Minnesota 36.16 34.00 2.16 0.08
KIPRUSOFF, MIIKKA Calgary 65.53 62.00 3.53 0.07
GRAHAME, JOHN Carolina 27.94 27.00 0.94 0.05
GIGUERE, J Anaheim 45.88 44.00 1.88 0.05
GARON, MATHIEU Los Angeles 22.90 22.00 0.90 0.04
TOSKALA, VESA San Jose 30.08 29.00 1.08 0.04
NABOKOV, EVGENI San Jose 39.80 39.00 0.80 0.02
KOLZIG, OLAF Washington 46.13 46.00 0.13 0.00
BRYZGALOV, ILJA Anaheim 25.00 25.00 0.00 0.00
FLEURY, MARC-ANDRE Pittsburgh 56.29 57.00 -0.71 -0.02
BUDAJ, PETER Colorado 56.11 57.00 -0.89 -0.02
SMITH, MIKE Dallas 15.23 16.00 -0.77 -0.05
ROLOSON, DWAYNE Edmonton 64.88 68.00 -3.12 -0.06
THEODORE, JOSE Colorado 27.57 29.00 -1.43 -0.07
HOLMQVIST, JOHAN Tampa Bay 45.68 48.00 -2.32 -0.07
JOSEPH, CURTIS Phoenix 51.39 54.00 -2.61 -0.07
BIRON, MARTIN Philadelphia 33.72 36.00 -2.28 -0.09
WARD, CAM Carolina 48.86 53.00 -4.14 -0.10
FERNANDEZ, MANNY Minnesota 38.52 42.00 -3.48 -0.12
NIITTYMAKI, ANTERO Philadelphia 50.70 55.00 -4.30 -0.12
BOUCHER, BRIAN Columbus 17.38 19.00 -1.62 -0.14
LECLAIRE, PASCAL Columbus 13.87 16.00 -2.13 -0.14
JOHNSON, BRENT Washington 37.11 40.00 -2.89 -0.15
AULD, ALEXANDER Florida 30.08 33.00 -2.92 -0.16
DENIS, MARC Tampa Bay 37.94 43.00 -5.06 -0.17
AEBISCHER, DAVID Montreal 26.86 31.00 -4.14 -0.20
TELLQVIST, MIKAEL Phoenix 23.38 28.00 -4.62 -0.22
MARKKANEN, JUSSI Edmonton 16.84 20.00 -3.16 -0.25
HALAK, JAROSLAV Montreal 20.38 25.00 -4.62 -0.41
TOIVONEN, HANNU Boston 16.45 22.00 -5.55 -0.55
CLOUTIER, DAN Los Angeles 17.88 28.00 -10.12 -0.68
DUNHAM, MIKE NY Islanders 18.94 28.00 -9.06 -0.75
ESCHE, ROBERT Philadelphia 23.18 32.00 -8.82 -0.86

For the most part the table makes perfect sense. It is still surprising to see the St. Louis goalies near the top of the list (I am beginning to think this is an anomaly of some sort) but it is no surprise to see Luongo, Brodeur, Lehtonen, DiPietro, Turco, Miller, Vokoun, Lundqvist, etc. near the top of the list and Esche, Dunham, Cloutier, Toivonen, etc. at the bottom of the list. So for the most part the list passes the smell test as everything seems right.
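
Here is a minimal sketch of how the ExpGoals and Diff columns fit together. The category names, league rates and shot counts in the first example are invented for illustration; only the Sanford line uses numbers from the table. The games figure in the last line is also hypothetical, since games played are not shown in the table.

```python
def expected_goals(shots_faced, league_goal_rate):
    """Goals an average goalie would allow on this mix of shots.

    shots_faced: {category: shots faced by the goalie}
    league_goal_rate: {category: league-wide goals per shot, as a fraction}
    """
    return sum(count * league_goal_rate[cat] for cat, count in shots_faced.items())

def goals_saved(exp_goals, goals_allowed):
    """Positive means the goalie allowed fewer goals than expected."""
    return exp_goals - goals_allowed

# Made-up league rates and shot counts for two of the 19 categories
league_goal_rate = {"0-14' wrist shot": 0.20, "45+' slap shot": 0.03}
shots_faced = {"0-14' wrist shot": 50, "45+' slap shot": 100}
print(expected_goals(shots_faced, league_goal_rate))

# Sanford's row from the table: ExpGoals 30.80, Goals 23.00, Diff 7.80
diff = goals_saved(30.80, 23.00)
print(round(diff, 2))

# Per-game figure, using a hypothetical games count
hypothetical_games = 20
print(round(diff / hypothetical_games, 2))
```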

Here are some more observations:

1. Leaf goalie Andrew Raycroft does OK here as well, as he would be somewhere in the middle of the NHL’s regular starting goalies.

2. It is interesting that Martin Gerber ranks higher than Ray Emery.

3. While the Panthers have seemingly upgraded from Belfour to Vokoun, the same cannot be said for the Leafs as Toskala is ranked well below Raycroft. I should add that while Toskala has a pretty good save percentage it could be attributed to the quality of his opponents, as he has started against the Kings 5 times, the Coyotes 4 times, and St. Louis, Columbus and the offensively weak Dallas Stars 3 times. That is a pretty easy schedule. Interestingly, like Raycroft, Toskala also seemed to perform much better on the road.

4. Teams might want to consider avoiding trading for Manny Fernandez and his large contract as he ranks quite poorly.

What’s left to do?

There are still a couple of things I would like to tackle in this area of analysis. The first would be to see if I can come up with some kind of reliable method for making use of the home stats. The second is that while I think the above analysis does a pretty good job of accounting for shot difficulty, the quality of the shooter is still a factor that is not fully accounted for and I’d like to see if I can find some kind of method for factoring that in. The problem is, I am not sure if there is enough data to properly evaluate individual shooters, but I might give it a try.

Draft Day Tomorrow

NHL, Statistical Analysis
Jun 21, 2007

It is draft day tomorrow so I figured it would be a good idea to refer everyone to a simple draft analysis I conducted a year or so ago. It basically gives everyone an indication of what to expect from players based on where they are drafted (i.e. top 5, top 10, mid first round, late first round, etc.) but everyone should remember that this is supposedly a weak draft so the chances of a player making an impact in the NHL are probably even lower (at least for first round picks).

May 30, 2007

Note: I have produced a followup article to this one that corrects some small mistakes I made with respect to counting some PP goals as even strength goals, factors in shot type, and better deals with some biases that are present in the NHL statistics. Feel free to read the article below but be sure to also read the followup article which I would consider a much more reliable evaluation of defence and goaltending in the NHL.

I don’t know how many gazillion times I have heard people say that the Leafs defence sucks big time and is the reason for the Leafs failures and every time I hear that I cringe because it is so not true. And then I argue that the real problem is not the Leafs defence but the Leafs goaltending. I often quote statistics like how the Leafs give up relatively few shots against and the counter argument against that is that the Leafs may not give up a lot of shots, but they give up high quality shots. Although I have always suspected that is not the case it is a real difficult argument to argue against because there is no easy way to evaluate shot quality. But being the stubborn guy that I am I am going to give it my best shot.

In the NHL’s play by play reports they keep track of the distance of each shot that is taken and I think this might be the easiest and only reliable stat to use as a proxy for shot quality. The idea is that the closer a shot is to the goal, the more difficult the shot is to stop. So, what I did was track all shots against that every team gave up this past season and group them according to the distance the shot was taken from the goal. The groupings I used were 0-5, 6-10, 11-15, 16-20, 21-30, 31-40, 41-50, 51-60, and 61+ feet. I also kept track of how many goals were scored from each distance grouping so I could determine shooting percentage for each distance group as well. Only even strength shots were considered. Here is what I found on a league-wide basis.

Distance Shots Goals Shooting %
0-5 201 53 26.4
6-10 3522 797 22.6
11-15 6854 1603 23.4
16-20 5471 1017 18.6
21-30 9080 1160 12.8
31-40 9133 651 7.1
41-50 8859 410 4.6
51-60 7412 325 4.4
61+ 3314 87 2.6
Total 53846 6103 11.3

As one might expect, the closer the shot the higher the chance that the puck goes in the net, generally speaking, with shots from inside of 20 feet or so being the best shots to take. I realize that there will still be some variance in the difficulty of shots within these groupings (i.e. a one timer on a cross ice pass being more difficult to stop than a straight shot from off the side of the net) but by factoring out shot distance we should be doing a pretty decent job of accounting for a significant portion of what makes a shot difficult.

The next thing I did was to look at how each team does in terms of giving up shots from the various distance groupings with a particular interest in seeing how the Leafs stacked up to the rest of the league. Warning: These results may be frightening to those who want so desperately to believe that the Leafs defence sucks. View with caution.

(Chart: even strength shots against by distance grouping for each team)

As you can see from the above chart, the Leafs do an excellent job of limiting the number of shots from close to mid range, particularly in the 11-15 foot range, where they give up the 4th fewest shots. From the other high shooting percentage distances (0-5, 6-10 and 16-20) the Leafs are middle of the pack or slightly below.

The next thing I did was to take a look at the shooting percentage against for each team from each distance grouping. The results can be seen in the following chart.

(Chart: even strength shooting percentage against by distance grouping for each team)

As you can see, the Leafs goaltenders have one of the highest overall shooting percentages against, which is consistent with the fact they have one of the worst save percentages in the league. But what is interesting is that Leaf goalies (mostly Raycroft) are the worst at saving shots from 11-15 feet as well as 41-50 feet and are among the worst from 16-20 feet and 21-30 feet. All I have to say is thank goodness the Leafs defence was good at limiting the number of shots from the 11-15 foot range or else last season would have been much worse for the Leafs.

Chris Boersma over at Hockey Numbers has done some interesting work looking at goalies’ save percentages based on shot location and he found that Raycroft really sucks at stopping pucks shot high and to the glove side. This is interesting because probably the best place from which to shoot a puck high is that 10-20 foot range: you are far enough out that you can get it up and over the goalie but close enough to not give the goalie a lot of reaction time. Clearly these kinds of shots are killing Andrew Raycroft and the Leafs.

So finally, I wanted to summarize all this data in terms of a single, easy to understand number so we can compare teams on how difficult the shots they give up are. To do this I took the number of shots each team gave up in each distance grouping, multiplied it by the league-wide shooting percentage for that group, and then summed up the numbers for all groups. The result is one number which represents how many goals a team would give up if they had a perfectly average goalie with a perfectly average save % on shots taken from each distance grouping. Here are the results:

Team Expected ES Goals Against
Dallas 165.52
Detroit 172.03
San Jose 184.91
Vancouver 188.65
Minnesota 189.13
Anaheim 190.83
Calgary 192.36
Toronto 193.47
Columbus 196.11
New Jersey 199.28
Florida 200.68
Tampa Bay 202.01
Chicago 202.04
Colorado 202.13
Edmonton 203.03
Los Angeles 203.96
Buffalo 204.50
Carolina 207.78
NY Rangers 208.88
Phoenix 210.28
Pittsburgh 210.48
Washington 211.93
Philadelphia 212.58
Nashville 216.16
Montreal 217.32
St. Louis 220.47
NY Islanders 221.55
Ottawa 223.25
Atlanta 225.13
Boston 226.57

What is interesting with that table is that for the most part the teams perceived as the good defensive teams (Dallas, Detroit, Vancouver, Minnesota, Anaheim, Calgary, New Jersey) are closer to the top of the list and look, surprise, right there with them is Toronto. Sorry Leaf defence bashers, looks like you have even less of an argument now.
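The expected even strength goals against calculation described above can be sketched roughly as follows. This is a minimal illustration and not the code actually used for the post: the league-wide shot and goal counts come from the league-wide table, but the per-team shot counts (`example_team_shots`) are hypothetical numbers made up for the example, and the function names are my own.

```python
# League-wide even strength shots and goals by distance grouping (feet),
# copied from the league-wide table earlier in the post.
LEAGUE_SHOTS = {"0-5": 201, "6-10": 3522, "11-15": 6854, "16-20": 5471,
                "21-30": 9080, "31-40": 9133, "41-50": 8859, "51-60": 7412,
                "61+": 3314}
LEAGUE_GOALS = {"0-5": 53, "6-10": 797, "11-15": 1603, "16-20": 1017,
                "21-30": 1160, "31-40": 651, "41-50": 410, "51-60": 325,
                "61+": 87}


def league_shooting_pct(shots, goals):
    """Fraction of shots that became goals in each distance grouping."""
    return {group: goals[group] / shots[group] for group in shots}


def expected_goals_against(team_shots, league_pct):
    """Goals a team would allow with league-average goaltending at every distance."""
    return sum(count * league_pct[group] for group, count in team_shots.items())


LEAGUE_PCT = league_shooting_pct(LEAGUE_SHOTS, LEAGUE_GOALS)

# HYPOTHETICAL shots against for one team (assumption, not real data).
example_team_shots = {"0-5": 6, "6-10": 110, "11-15": 200, "16-20": 180,
                      "21-30": 300, "31-40": 310, "41-50": 300, "51-60": 250,
                      "61+": 110}

print(round(expected_goals_against(example_team_shots, LEAGUE_PCT), 2))
```

A quick sanity check on the approach: feeding the league-wide shot totals back in as the "team" returns the league-wide goal total of 6103, as it should.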

Edit: I added the following a few hours after the original post

Maybe the most interesting thing we can do with the above information is see how much goaltending actually affected a team’s goals against. To do this I compared the above expected even strength goals against with each team’s actual even strength goals against. By subtracting expected from actual I can come up with ‘Goalie Goals’, which is an indication of how many goals the team’s goalies can be blamed for. These numbers are astounding.

Team Exp. Goals Actual Goals Goalie Goals
Los Angeles 203.96 250 46.04
Toronto 193.47 235 41.53
Philadelphia 212.58 254 41.42
Phoenix 210.28 245 34.72
Washington 211.93 245 33.07
Edmonton 203.03 222 18.97
Carolina 207.78 225 17.22
Tampa Bay 202.01 218 15.99
Chicago 202.04 218 15.96
Columbus 196.11 211 14.89
Florida 200.68 215 14.32
Montreal 217.32 229 11.68
Colorado 202.13 210 7.87
Boston 226.57 230 3.43
Dallas 165.52 167 1.48
Pittsburgh 210.48 209 -1.48
St. Louis 220.47 218 -2.47
Calgary 192.36 186 -6.36
Buffalo 204.50 197 -7.50
San Jose 184.91 175 -9.91
Detroit 172.03 158 -14.03
Atlanta 225.13 205 -20.13
Anaheim 190.83 170 -20.83
Vancouver 188.65 166 -22.65
NY Islanders 221.55 196 -25.55
Nashville 216.16 181 -35.16
Ottawa 223.25 186 -37.25
New Jersey 199.28 162 -37.28
Minnesota 189.13 151 -38.13
NY Rangers 208.88 169 -39.88

It is no surprise who is at the top of the list, as most of us knew Los Angeles, Toronto, Philadelphia, Phoenix, etc. had weak goaltending, and the bottom teams are no surprise either as all those teams are known to have good goalies. What is a surprise is the magnitude of the goals that can be blamed on the goalies and the number of goals that goalies saved for their teams. As bad as the Kings goalies were, seeing them be to blame for as many as 46 even strength goals is quite amazing. Similarly, seeing that Lundqvist saved his team nearly 40 goals on his own is quite amazing. If anyone wants to argue that goaltending isn’t the most important position in hockey they just need to look at these statistics.
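For anyone who wants to reproduce the ‘Goalie Goals’ column, the arithmetic is just actual minus expected. Here is a small sketch using three rows from the table above; the function name is my own and only there for illustration.

```python
def goalie_goals(expected, actual):
    """Positive: goals blamed on the goalies. Negative: goals the goalies saved."""
    return actual - expected


# (team, expected ES goals against, actual ES goals against) from the table above
rows = [
    ("Los Angeles", 203.96, 250),
    ("Toronto", 193.47, 235),
    ("NY Rangers", 208.88, 169),
]

for team, expected, actual in rows:
    print(f"{team}: {goalie_goals(expected, actual):+.2f}")
# Los Angeles: +46.04
# Toronto: +41.53
# NY Rangers: -39.88
```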

May 282007

I had been planning on doing this for a while but finally got around to it the past few days. I have set up a new subdomain where I am going to post all sorts of statistics. The first statistics are for the 2006-07 season and contain team goals for and against data for players while that player was on the ice. If you then click on a player’s name you can find out who that player has played with and against and how much ice time they played together or against each other. Also contained in the tables are goals for and against while that pair is on the ice together or on the ice playing against each other.

For example, we can take a look at Bryan McCabe and even strength goal production while he is on the ice with teammates. From that table you will see that McCabe has played the most even strength ice time (520:46) with Tomas Kaberle and during the time they were on the ice together the Leafs gave up 25 goals, or 0.960 goals per 20 minutes. We can then compare that to how well each player performs when they are not playing together. McCabe played 867:41 at even strength when Kaberle was not on the ice and in that time 34 goals were given up by the Leafs for a rate of 0.784 per 20 minutes. Similarly, Kaberle played 682:30 at even strength without McCabe and during that time the Leafs gave up 31 goals for a rate of 0.908 goals per 20 minutes. In other words, McCabe seemed to perform much better defensively when not with Kaberle and Kaberle performed slightly better defensively when not with McCabe (or maybe they both played with better defensive partners). With that knowledge, one might want to consider whether playing these two guys together is the best option. A similar analysis can be done with goals for statistics when playing with each of his teammates as well as defense and offense when playing against various opponents.
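The goals per 20 minutes rates quoted above can be checked with a few lines of code. This is only a sketch: the helper names are my own, and it assumes ice time is given as a `minutes:seconds` string like the ones in the tables.

```python
def toi_to_minutes(toi):
    """Convert an ice time string such as '520:46' (minutes:seconds) to minutes."""
    minutes, seconds = toi.split(":")
    return int(minutes) + int(seconds) / 60


def goals_per_20(goals, toi):
    """Goals per 20 minutes of ice time."""
    return goals / toi_to_minutes(toi) * 20


print(f"McCabe with Kaberle:    {goals_per_20(25, '520:46'):.3f}")
print(f"McCabe without Kaberle: {goals_per_20(34, '867:41'):.3f}")
print(f"Kaberle without McCabe: {goals_per_20(31, '682:30'):.3f}")
# McCabe with Kaberle:    0.960
# McCabe without Kaberle: 0.784
# Kaberle without McCabe: 0.908
```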

Note: These statistics are available for every player who played a game in the NHL last season except for goalies. Goalies are currently not included but I hope to add some data for them in the future as well as other player data.

Note 2: If you want to browse these statistics the best place to start off is at the index.

Note 3: These numbers are used fairly significantly in my player rankings algorithm.

Apr 072007

One can look at team goals against average and save percentage to get an idea of how good or bad a team’s goaltending is, but I wanted to get a better idea of how much each team has been affected by good or bad goaltending. As a quick and easy way of doing this I defined a stolen game as either of the following:

  1. a game in which the team scored 2 or fewer goals, allowed 30 or more shots and won
  2. a game in which the team scored 3 goals and allowed 35 or more shots and won.

It is a pretty arbitrary definition but should give a fairly reasonable idea of what teams are benefiting from very good goaltending. So here are the results:

Num Team Stolen
1 Columbus 7
2 Vancouver 6
3 New Jersey 5
4 NY Rangers 5
5 Anaheim 4
6 Atlanta 4
7 Calgary 4
8 Minnesota 4
9 Montreal 4
10 Nashville 4
11 St. Louis 4
12 Boston 3
13 Buffalo 3
14 Chicago 3
15 Edmonton 3
16 NY Islanders 3
17 Ottawa 3
18 Philadelphia 3
19 San Jose 3
20 Colorado 2
21 Dallas 2
22 Pittsburgh 2
23 Washington 2
24 Detroit 1
25 Florida 1
26 Phoenix 1
27 Tampa Bay 1
28 Toronto 1
29 Carolina 0
30 Los Angeles 0

Wow. Who’d have thought Columbus would be at the top of the list? I guess the likely reason for that is that the Blue Jackets have a lot of games in which they give up 30 or more shots. Not many surprises after that. At the bottom of the spectrum are Carolina and Los Angeles with no stolen games. The Hurricanes are a bit of a surprise as I thought they would have managed at least one or two, but the Kings and their generally horrid goaltending are no surprise.

I also defined blown games as:

  1. a game in which the team scored 4 or more goals, allowed 25 or fewer shots but lost.
  2. a game in which the team scored 5 or more goals and allowed 26-30 shots, but lost.

Again, those definitions are arbitrary but reasonable I think. And the results are:

Num Team Blown
1 Colorado 6
2 NY Rangers 5
3 Toronto 5
4 Anaheim 4
5 Columbus 4
6 Tampa Bay 4
7 Carolina 3
8 Florida 3
9 Los Angeles 3
10 Nashville 3
11 Phoenix 3
12 San Jose 3
13 St. Louis 3
14 Washington 3
15 Calgary 2
16 Chicago 2
17 Edmonton 2
18 Montreal 2
19 NY Islanders 2
20 Ottawa 2
21 Philadelphia 2
22 Atlanta 1
23 Buffalo 1
24 Dallas 1
25 Detroit 1
26 New Jersey 1
27 Pittsburgh 1
28 Boston 0
29 Minnesota 0
30 Vancouver 0

I am guessing that most of those Colorado blown games came early in the season with Jose Theodore in goal. Seeing the Rangers near the top of both lists is evidence of the up and down season that Lundqvist has had this year, and as a Leaf fan I am not surprised to see Toronto near the top of the blown games list. At the bottom of the blown games list you get the expected teams like Vancouver, Minnesota, New Jersey, Buffalo, etc. but a few surprises in Boston and Pittsburgh.
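For reference, the stolen and blown game definitions used above can be written down as a pair of simple checks. This is a rough sketch rather than the code behind the tables: the `Game` record and its field names are hypothetical, and it assumes any game that was not won (including overtime and shootout losses) counts as lost.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical per-game record from one team's point of view.
    goals_for: int
    shots_against: int
    won: bool


def is_stolen(game):
    """Won despite little offence and a heavy shot load against."""
    if not game.won:
        return False
    return ((game.goals_for <= 2 and game.shots_against >= 30)
            or (game.goals_for == 3 and game.shots_against >= 35))


def is_blown(game):
    """Lost despite plenty of offence and a light shot load against."""
    if game.won:
        return False
    return ((game.goals_for >= 4 and game.shots_against <= 25)
            or (game.goals_for >= 5 and 26 <= game.shots_against <= 30))


# Made-up games for illustration only.
season = [Game(2, 34, True), Game(3, 31, True), Game(5, 28, False), Game(4, 27, False)]
print(sum(is_stolen(g) for g in season), sum(is_blown(g) for g in season))
```

Counting stolen and blown games for a team is then just a matter of summing these checks over its schedule, as in the last line.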