Jul 10, 2013

One of the complaints against advanced statistics in hockey is the names of some of the statistics. People sometimes complain that names like Corsi, Fenwick and PDO aren't meaningful. I never really understood the complaint, because once you figure out what they mean (which honestly isn't that difficult), the names aren't much of an obstacle. That said, some people still seem to feel the names are a bit of a hurdle to getting into advanced hockey statistics. I am hoping to revamp and improve my hockey statistics database even more this summer, and in the process I wondered if there is interest in having me use a standardized hockey statistics nomenclature that we can all agree on. Here is what I am proposing:

Event Statistics

| Abbreviation | Description |
|---|---|
| TOI | Time on ice |
| G | Goals |
| A | Assists |
| FirstA | First Assists |
| SOG | Shots on goal |
| SAG | Shots at goal (includes missed shots) |
| ASAG | Attempted Shots at Goal (includes missed and blocked shots) |

Percentage Statistics

| Abbreviation | Description |
|---|---|
| Sh% | Shooting percentage (G/SOG) |
| SAGSh% | Shots at goal shooting percentage (G/SAG) |
| ASAGSh% | Attempted Shots at Goal shooting percentage (G/ASAG) |
| Sv% | Save percentage (1 - GA/SOGA) |
| SAGSv% | Shots at goal save percentage (1 - GA/SAGA) |
| ASAGSv% | Attempted Shots at Goal save percentage (1 - GA/ASAGA) |
| ShSv% | Shooting percentage + save percentage (Sh% + Sv%) |
| SAGShSv% | Shots at goal shooting percentage + save percentage (SAGSh% + SAGSv%) |
| ASAGShSv% | Attempted Shots at Goal shooting percentage + save percentage (ASAGSh% + ASAGSv%) |

Other Statistics

| Abbreviation | Description |
|---|---|
| IGP | Individual Goals Percentage (iG / GF) |
| IAP | Individual Assist Percentage (iA / GF) |
| IPP | Individual Points Percentage (iPts / GF) |
| ISOGP | Individual Shots on Goal Percentage (iSOG / SOGF) |
| ISAGP | Individual Shots at Goal Percentage (iSAG / SAGF) |
| IASAGP | Individual Attempted Shots at Goal Percentage (iASAG / ASAGF) |

Zone Starts

| Abbreviation | Description |
|---|---|
| OZFO | Number of Offensive Zone Face Offs |
| NZFO | Number of Neutral Zone Face Offs |
| DZFO | Number of Defensive Zone Face Offs |
| OZFO% | Offensive Zone Face Off Percentage: OZFO / (OZFO + NZFO + DZFO) |
| NZFO% | Neutral Zone Face Off Percentage: NZFO / (OZFO + NZFO + DZFO) |
| DZFO% | Defensive Zone Face Off Percentage: DZFO / (OZFO + NZFO + DZFO) |
| OZBias | Offensive Zone Bias: (2*OZFO + NZFO) / (OZFO + NZFO + DZFO) |
| DZBias | Defensive Zone Bias: (2*DZFO + NZFO) / (OZFO + NZFO + DZFO) |
| OZFOW% | Offensive Zone Face Off Winning Percentage |
| NZFOW% | Neutral Zone Face Off Winning Percentage |
| DZFOW% | Defensive Zone Face Off Winning Percentage |
| FOW% | Face off winning percentage (all zones) |

Prefixes and Suffixes

| Abbreviation | Description |
|---|---|
| i | Individual stats |
| TM | Average stats of team/line mates, weighted by TOI with the player |
| Opp | Stats of opposing players, weighted by TOI against the player |
| PctTm | Percentage of the team's stats the player recorded, in games the player played in |
| F | Stats for the player's team while the player is on the ice |
| A | Stats against the player's team while the player is on the ice |
| 20 or /20 | Stats per 20 minutes of ice time |
| 60 or /60 | Stats per 60 minutes of ice time |
| F% | Percentage of events that are by the player's own team (i.e. for) |
| D | Difference between For and Against statistics |
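As a rough illustration of how the proposed percentage, zone start and rate statistics fit together, here is a minimal Python sketch. The `OnIceCounts` container and all function names are hypothetical and are not part of stats.hockeyanalysis.com. Save percentage is computed the standard way, as 1 - GA/SOGA. The per-20/per-60 rates simply scale a count by the player's minutes of ice time.

```python
from dataclasses import dataclass


@dataclass
class OnIceCounts:
    """Hypothetical on-ice event counts for one player in one situation (e.g. 5v5)."""
    toi_minutes: float
    gf: int     # goals for while the player is on the ice
    ga: int     # goals against while the player is on the ice
    sogf: int   # shots on goal for
    soga: int   # shots on goal against
    ozfo: int   # offensive zone face offs
    nzfo: int   # neutral zone face offs
    dzfo: int   # defensive zone face offs


def sh_pct(c: OnIceCounts) -> float:
    """On-ice shooting percentage: GF / SOGF."""
    return c.gf / c.sogf


def sv_pct(c: OnIceCounts) -> float:
    """On-ice save percentage: 1 - GA / SOGA."""
    return 1 - c.ga / c.soga


def shsv_pct(c: OnIceCounts) -> float:
    """ShSv% (what is currently called PDO): Sh% + Sv%."""
    return sh_pct(c) + sv_pct(c)


def per_minutes(count: float, toi_minutes: float, minutes: float = 20) -> float:
    """Rate per `minutes` of ice time: the /20 suffix by default, /60 with minutes=60."""
    return count * minutes / toi_minutes


def f_pct(for_count: float, against_count: float) -> float:
    """F% suffix: share of events that belong to the player's own team."""
    return for_count / (for_count + against_count)


def differential(for_count: float, against_count: float) -> float:
    """D suffix: For minus Against."""
    return for_count - against_count


def zone_pcts(c: OnIceCounts) -> tuple[float, float, float]:
    """OZFO%, NZFO%, DZFO% as fractions of all face offs."""
    total = c.ozfo + c.nzfo + c.dzfo
    return c.ozfo / total, c.nzfo / total, c.dzfo / total


def oz_bias(c: OnIceCounts) -> float:
    """OZBias: (2*OZFO + NZFO) / (OZFO + NZFO + DZFO); 1.0 when OZFO equals DZFO."""
    return (2 * c.ozfo + c.nzfo) / (c.ozfo + c.nzfo + c.dzfo)


# Hypothetical numbers, for illustration only.
player = OnIceCounts(toi_minutes=400, gf=10, ga=8, sogf=100, soga=100,
                     ozfo=30, nzfo=40, dzfo=30)
print(f"Sh% {sh_pct(player):.3f}  Sv% {sv_pct(player):.3f}  ShSv% {shsv_pct(player):.3f}")
print(f"GF20 {per_minutes(player.gf, player.toi_minutes):.2f}  GF% {f_pct(player.gf, player.ga):.3f}")
print(f"GD {differential(player.gf, player.ga)}  OZBias {oz_bias(player):.2f}")
```

For example, GF20 is just `per_minutes(gf, toi)`, and GF60 is the same call with `minutes=60`.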

The major changes are that instead of calling shots + missed shots fenwick events, we call them Shots At Goal (SAG), and instead of calling shots + missed shots + blocked shots corsi events, we call them Attempted Shots At Goal (ASAG). Also, PDO, which is shooting percentage + save percentage, is now named ShSv%.

The prefixes and suffixes can be added to individual stats to create new statistics. For example:

  • iSh% = Individual Shooting Percentage (iG / iSOG)
  • TMSAG20 = Team mate average Shots at Goal per 20 minutes of ice time, weighted by TOI with the player
  • OppGF% = Opponent average Goals For Percentage, weighted by time on ice against the player
  • PctTmG = In games that the player played in, the percentage of his team's goals that the player himself scored.

Note that not all combinations of prefixes and suffixes make sense (for example, PctTmSh% or Sh%F), but that is self-explanatory, I think.
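The TM and PctTm prefixes involve a little more than a single division. Here is a minimal sketch, assuming each team mate's stat (e.g. his SAG20) and the TOI he shared with the player are already known. The function names and sample numbers are hypothetical.

```python
def toi_weighted_average(values_with_toi: list[tuple[float, float]]) -> float:
    """TM prefix: average of team mates' stats, weighted by TOI with the player.

    values_with_toi holds (team mate's stat, minutes played together) pairs.
    """
    total_minutes = sum(minutes for _, minutes in values_with_toi)
    return sum(value * minutes for value, minutes in values_with_toi) / total_minutes


def pct_team(player_total: float, team_total_in_games_played: float) -> float:
    """PctTm prefix: player's share of his team's total, counting only games he played in."""
    return player_total / team_total_in_games_played


# Hypothetical team mates' SAG20 values and shared minutes -> TMSAG20.
tm_sag20 = toi_weighted_average([(10.0, 300.0), (12.0, 100.0)])
# Hypothetical PctTmG: 15 goals out of 90 team goals in the games he played.
pct_tm_g = pct_team(15, 90)
print(f"TMSAG20 = {tm_sag20:.2f}, PctTmG = {pct_tm_g:.1%}")
```

The Opp prefix would use the same weighted average, with TOI against the player as the weights instead of TOI with him.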

What does everyone think? I am perfectly fine sticking with the way I have statistics currently presented but if the majority think something along the lines of the above is better I am all for making the change. If anyone has any other suggestions they are welcome as well. I just think that this is as good a time as any to come up with some standardized nomenclature.

Also, I currently have statistics for the following situations:

  • 5v5
  • 5v5 Home
  • 5v5 Road
  • 5v5 Close
  • 5v5 Tied
  • 5v5 Up 1
  • 5v5 Up 2+
  • 5v5 Down 1
  • 5v5 Down 2+
  • 5v5 Leading
  • 5v5 Trailing
  • 5v4 PP
  • 4v5 SH
  • Zone start adjusted data for all of the above except 5v4 PP and 4v5 SH.

If there is interest I may consider adding other situations. For example, first period, second period, third period, 4v4, 5v5 close home and 5v5 close road. Would anyone find these or any other situation interesting to look at?

Also, feel free to treat the comments of this post as the official place to make any other suggestions for upgrades/enhancements you would like to see made to stats.hockeyanalysis.com. I can’t promise I will implement them, but I hope to make some upgrades over the summer.

Update: Added ‘D’ to the suffix list, which stands for differential. So ASAGD would stand for Attempted Shots At Goal Differential, which is the equivalent of the corsi differential in use now. I might consider adding Rel, but I need to consider whether it is necessary. Thoughts?


Jul 05, 2013

Unfortunately I didn’t have as much time this week as I had hoped to do a full evaluation of unrestricted free agent centers like I did for wingers. But it is free agent day, and there was some big news regarding centers yesterday with the buyout of Grabovski, so I thought I’d throw something together looking at some offensive statistics of the top centers available. Let me start off by presenting the summary table.

| Center | G/60 | A/60 | Pts/60 | IPP | GF20-TMGF20 | FF20-TMFF20 | OZBias |
|---|---|---|---|---|---|---|---|
| Ribeiro | 0.593 | 1.512 | 2.11 | 80.5 | 0.113 | -0.025 | 102.6 |
| Filppula | 0.769 | 1.334 | 2.1 | 75 | 0.116 | -0.878 | 104.7 |
| Lecavalier | 0.799 | 1.186 | 1.99 | 68.1 | 0.139 | 0.381 | 100.7 |
| Grabovski | 0.899 | 0.961 | 1.86 | 65.4 | 0.196 | 2.406 | 96 |
| Roy | 0.587 | 1.146 | 1.73 | 67.4 | 0.039 | 0.747 | 98.7 |
| Weiss | 0.652 | 0.821 | 1.47 | 65.6 | 0.07 | -0.467 | 103.3 |
| Bozak | 0.566 | 0.775 | 1.34 | 54.2 | -0.062 | 0.292 | 99.8 |

The numbers above are 5v5 numbers over the past 3 seasons and the players are sorted by Pts/60. I threw in Lecavalier because he was a UFA for a brief period of time and is at more or less the same level as the others. I included Bozak to highlight just how much he doesn’t fit in with the rest of the group.

  • G/60 = Goals per 60 minutes of ice time.
  • A/60 = Assists per 60 minutes of ice time
  • Pts/60 = Points per 60 minutes of ice time.
  • IPP = Individual Points Percentage, or the percentage of goals scored while he is on the ice that the player had a point on.
  • GF20-TMGF20 = How much better his team mates’ on-ice goal production is when playing with him than without him.
  • FF20-TMFF20 = How much better his team mates’ on-ice shot generation (fenwick for) is when playing with him than without him.
  • OZBias = (2*OZ starts + NZ starts) / total zone starts, shown here ×100 so that 100 is neutral. It gives an indication of the player’s usage.

List sorted by G/60: Grabovski, Lecavalier, Filppula, Weiss, Ribeiro, Roy, Bozak

List sorted by A/60: Ribeiro, Filppula, Lecavalier, Roy, Grabovski, Weiss, Bozak

List sorted by Pts/60: Ribeiro, Filppula, Lecavalier, Grabovski, Roy, Weiss, Bozak

List sorted by IPP: Ribeiro, Filppula, Lecavalier, Roy, Weiss, Grabovski, Bozak

List sorted by GF20-TMGF20:  Grabovski, Lecavalier, Filppula, Ribeiro, Weiss, Roy, Bozak

List sorted by FF20-TMFF20: Grabovski, Roy, Lecavalier, Bozak, Ribeiro, Weiss, Filppula
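The sorted lists above can be reproduced mechanically from the summary table. Here is a minimal Python sketch with the figures copied from the table. It treats higher as better for every column and leaves OZBias out because that column describes usage rather than performance. `ranked_by` is a hypothetical helper name.

```python
# Figures copied from the summary table (5v5, past 3 seasons).
CENTERS = {
    "Ribeiro":    {"G/60": 0.593, "A/60": 1.512, "Pts/60": 2.11, "IPP": 80.5, "GF20-TMGF20": 0.113, "FF20-TMFF20": -0.025},
    "Filppula":   {"G/60": 0.769, "A/60": 1.334, "Pts/60": 2.10, "IPP": 75.0, "GF20-TMGF20": 0.116, "FF20-TMFF20": -0.878},
    "Lecavalier": {"G/60": 0.799, "A/60": 1.186, "Pts/60": 1.99, "IPP": 68.1, "GF20-TMGF20": 0.139, "FF20-TMFF20": 0.381},
    "Grabovski":  {"G/60": 0.899, "A/60": 0.961, "Pts/60": 1.86, "IPP": 65.4, "GF20-TMGF20": 0.196, "FF20-TMFF20": 2.406},
    "Roy":        {"G/60": 0.587, "A/60": 1.146, "Pts/60": 1.73, "IPP": 67.4, "GF20-TMGF20": 0.039, "FF20-TMFF20": 0.747},
    "Weiss":      {"G/60": 0.652, "A/60": 0.821, "Pts/60": 1.47, "IPP": 65.6, "GF20-TMGF20": 0.070, "FF20-TMFF20": -0.467},
    "Bozak":      {"G/60": 0.566, "A/60": 0.775, "Pts/60": 1.34, "IPP": 54.2, "GF20-TMGF20": -0.062, "FF20-TMFF20": 0.292},
}


def ranked_by(stat: str) -> list[str]:
    """Return center names sorted from highest to lowest value of `stat`."""
    return sorted(CENTERS, key=lambda name: CENTERS[name][stat], reverse=True)


for stat in ("G/60", "A/60", "Pts/60", "IPP", "GF20-TMGF20", "FF20-TMFF20"):
    print(f"{stat}: {', '.join(ranked_by(stat))}")
```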

Some comments on each player:

Mike Ribeiro: Easily the best play maker of the group and is most consistently involved in the play.

Valtteri Filppula: A better goal scorer than Ribeiro but not as good a play maker as Ribeiro, though better than the rest.

Vincent Lecavalier: Similar to Filppula in value but better at the possession game.

Mikhail Grabovski: Not a great play maker but a good finisher and good at driving shot generation indicating he is probably good at puck retrieval.

Derek Roy: Kind of a poor man’s Ribeiro, but much less valuable.

Stephen Weiss: More of a poor man’s Lecavalier. Easily had the worst line mates of the group and might do better in a different situation.

Tyler Bozak: Weak at goal scoring, bad at play making, not involved in the play and a drag on his team mates’ goal production. Not anywhere close to the same league as the others (and may be better suited for a different league too).

For me, Ribeiro is probably the best of the group in terms of pure offense because of his elite play making ability. Grabovski and Lecavalier are a little more balanced, with better scoring and puck retrieval skills, while Filppula is pretty solid all-round as well and has the flexibility of being used as either a center or a winger (which is valuable if locking in long-term). It’s difficult to compare Weiss to the rest because he simply hasn’t had nearly as good line mates, but it is probably safe to say he’d be a bit of a step down from Grabovski, Lecavalier or Filppula. Roy, on the other hand, would definitely be a step back, but he would still be a decent consolation prize on a lower-priced, shorter-term contract. Definitely not anything more than a #2 center, though.

As for Bozak, well, you simply don’t want him on your team, maybe not at any price, no matter how much of a bargain. I have tried and tried, but I just can’t find any redeeming qualities in him outside of his ability to win face offs, which has limited value. There simply is no reason why you would want to play him on any of your top 3 lines. None.

As a Leaf fan, and with the Leafs unable to keep Grabovski, my preference would be Ribeiro or Filppula, but I might be willing to take a chance on Weiss if the contract was right. Ribeiro’s play making skills with the Leafs’ wingers should be a good combination, and Filppula is a good all-round player who could shift down to wing if needed. Weiss seems like a solid two-way player who might be able to step up his game with better line mates, which he’d get with the Leafs. If they sign Bozak, I am not sure what I’ll do. It’ll be a sad day.


Jun 20, 2013

This year’s free agent class is a relatively thin one (pending compliance buyouts, of course), but there are a handful of good players who could be hitting the unrestricted free agent market this summer. Today I’ll take a look at the wingers.

In total I identified 15 wingers that I would consider quality NHL regulars. These are, in no particular order, Nathan Horton, Viktor Stalberg, Ryane Clowe, Mason Raymond, Clarke MacArthur, Patrick Elias, David Clarkson, Dan Cleary, Pascal Dupuis, Brad Boyes, Alexei Ponikarovsky, Jarome Iginla, Michael Ryder, Bryan Bickell, and Matt Cooke. I have omitted Teemu Selanne and Daniel Alfredsson from the list since, if they do return, it will almost certainly be with the Ducks and Senators respectively. I have also omitted Damien Brunner because he doesn’t have enough of a track record, as I am looking at 3 seasons of data in my statistical evaluation. I have also omitted Jaromir Jagr because, well, for some reason I forgot to include him and couldn’t be bothered to go back and plug him into all the tables. He still has some value, but I am not sure how significant it is.

(Note that unless mentioned otherwise, the stats below are 5v5 stats over the past 3 seasons)

Offensive Evaluation

To attempt to isolate a player’s offensive production from that of his team mates, one thing I like to do is compare his own on-ice stats with the on-ice stats of his team mates when they are playing apart from him. To do this I took each player’s FF20 and GF20 and divided them by his team mates’ FF20 and GF20 respectively. Here is how the wingers stack up against each other.

| Winger | FF20/TMFF20 | Winger | GF20/TMGF20 |
|---|---|---|---|
| Viktor Stalberg | 1.180 | Patrick Elias | 1.358 |
| Nathan Horton | 1.138 | Nathan Horton | 1.343 |
| Ryane Clowe | 1.087 | Jarome Iginla | 1.290 |
| Mason Raymond | 1.083 | Pascal Dupuis | 1.188 |
| Clarke MacArthur | 1.076 | Viktor Stalberg | 1.124 |
| Patrick Elias | 1.074 | Michael Ryder | 1.116 |
| David Clarkson | 1.066 | Clarke MacArthur | 1.111 |
| Dan Cleary | 1.049 | Ryane Clowe | 1.075 |
| Pascal Dupuis | 1.048 | Bryan Bickell | 1.058 |
| Brad Boyes | 1.044 | Brad Boyes | 1.042 |
| Alexei Ponikarovsky | 1.018 | Mason Raymond | 1.037 |
| Jarome Iginla | 1.017 | Matt Cooke | 0.962 |
| Michael Ryder | 0.999 | Alexei Ponikarovsky | 0.896 |
| Matt Cooke | 0.917 | Dan Cleary | 0.892 |
| Bryan Bickell | 0.896 | David Clarkson | 0.874 |
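The FF20/TMFF20 and GF20/TMGF20 ratios described above can be sketched in Python. The post doesn't say exactly how the team mates' rates are combined, so this sketch assumes a TOI-weighted average. Pooling the team mates' counts and minutes, as done here, gives the same result as that weighted average. The function names and sample numbers are hypothetical.

```python
def rate_per_20(count: float, toi_minutes: float) -> float:
    """Events per 20 minutes of ice time."""
    return count * 20 / toi_minutes


def on_ice_vs_teammates(player_count: float, player_toi: float,
                        teammates_apart: list[tuple[float, float]]) -> float:
    """Ratio of a player's on-ice rate to his team mates' rate when apart from him.

    teammates_apart: (count, toi_minutes) pairs, one per team mate, measured while
    that team mate was on the ice without the player. Summing counts and minutes
    equals a TOI-weighted average of the individual per-20 rates.
    """
    tm_count = sum(count for count, _ in teammates_apart)
    tm_toi = sum(toi for _, toi in teammates_apart)
    return rate_per_20(player_count, player_toi) / rate_per_20(tm_count, tm_toi)


# Hypothetical: 500 fenwick for in 1000 minutes (FF20 = 10.0) versus team mates
# generating 900 in 2000 minutes apart from him (TMFF20 = 9.0).
print(round(on_ice_vs_teammates(500, 1000, [(500, 1200), (400, 800)]), 3))
```

A value above 1.0 means the team produces more while the player is on the ice than his team mates do without him.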

Based on the above lists, you’d probably have to rank Horton, Stalberg and Elias as the top 3, with MacArthur and Clowe not far behind, while Cooke, Ponikarovsky and Bickell don’t look so good in comparison. Those are on-ice stats, though. How do their individual stats compare?

| Winger | G/60 | Winger | Points/60 |
|---|---|---|---|
| Nathan Horton | 1.111 | Pascal Dupuis | 2.28 |
| Jarome Iginla | 0.987 | Nathan Horton | 2.22 |
| Pascal Dupuis | 0.985 | Jarome Iginla | 2.09 |
| Viktor Stalberg | 0.964 | Viktor Stalberg | 2.03 |
| Michael Ryder | 0.941 | Patrick Elias | 2.01 |
| David Clarkson | 0.846 | Michael Ryder | 1.99 |
| Clarke MacArthur | 0.802 | Clarke MacArthur | 1.97 |
| Bryan Bickell | 0.779 | Bryan Bickell | 1.86 |
| Matt Cooke | 0.743 | Brad Boyes | 1.70 |
| Dan Cleary | 0.722 | Ryane Clowe | 1.70 |
| Patrick Elias | 0.700 | Matt Cooke | 1.69 |
| Mason Raymond | 0.645 | Dan Cleary | 1.69 |
| Ryane Clowe | 0.610 | Mason Raymond | 1.68 |
| Brad Boyes | 0.544 | David Clarkson | 1.28 |
| Alexei Ponikarovsky | 0.462 | Alexei Ponikarovsky | 1.20 |

Horton, Dupuis, Iginla and Stalberg hold the top 4 spots on both lists, while Ponikarovsky trails both lists. Individual stats are heavily influenced by the quality of line mates, so one measure I like to look at is the percentage of the goals his team scores while he is on the ice that he scored himself (IGP) or had a point on (IPP). The higher the percentage, the more integral the player is to his team’s offense when he is on the ice.

| Winger | IGP | Winger | IPP |
|---|---|---|---|
| David Clarkson | 50.7 | Patrick Elias | 82.1 |
| Jarome Iginla | 35.6 | David Clarkson | 76.7 |
| Viktor Stalberg | 34.9 | Bryan Bickell | 75.5 |
| Michael Ryder | 33.9 | Jarome Iginla | 75.2 |
| Nathan Horton | 33.1 | Clarke MacArthur | 73.5 |
| Bryan Bickell | 31.6 | Viktor Stalberg | 73.4 |
| Dan Cleary | 31.2 | Dan Cleary | 73.1 |
| Pascal Dupuis | 30.5 | Michael Ryder | 71.8 |
| Matt Cooke | 30.5 | Ryane Clowe | 70.9 |
| Clarke MacArthur | 29.9 | Pascal Dupuis | 70.8 |
| Patrick Elias | 28.6 | Brad Boyes | 69.9 |
| Mason Raymond | 26.4 | Matt Cooke | 69.5 |
| Ryane Clowe | 25.5 | Mason Raymond | 69.0 |
| Alexei Ponikarovsky | 25.0 | Nathan Horton | 66.1 |
| Brad Boyes | 22.3 | Alexei Ponikarovsky | 64.7 |

David Clarkson didn’t look so good in the previous lists, but when he is on the ice he is a major contributor to his team’s offense. Put him with some better offensive players and it is possible he could significantly boost his offensive production. The same can probably be said for Bryan Bickell, who has been given more ice time on the Blackhawks’ top lines these playoffs and has produced well above his regular season rates. He could be a good bargain pickup for a team that could get good production from him as a second line winger.
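IGP and IPP are simple shares of on-ice goals for, expressed as percentages like in the table. Here is a minimal sketch with hypothetical function names and numbers:

```python
def igp(individual_goals: int, on_ice_goals_for: int) -> float:
    """Individual Goals Percentage: share of on-ice goals for that the player scored."""
    return 100 * individual_goals / on_ice_goals_for


def ipp(goals: int, assists: int, on_ice_goals_for: int) -> float:
    """Individual Points Percentage: share of on-ice goals for the player had a point on."""
    return 100 * (goals + assists) / on_ice_goals_for


# Hypothetical 3-season 5v5 line: 20 goals, 25 assists, 60 goals for while on ice.
print(f"IGP {igp(20, 60):.1f}  IPP {ipp(20, 25, 60):.1f}")
```

Because a player can get at most one point on any given goal, IPP can't exceed 100, and IGP is always at most IPP.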

Defensive Evaluation

Defensive evaluation is much tougher than offensive evaluation, and I think in general wingers are the least important position as far as team defense goes. The best way to evaluate a player defensively is to compare his on-ice stats with those of his team mates. Similar to what I did above with FF20 and GF20, I looked at TMFA20/FA20 and TMGA20/GA20, with the team mate stats on top so that a value above 1 means fewer shots or goals against when the player is on the ice.

| Winger | TMFA20/FA20 | Winger | TMGA20/GA20 |
|---|---|---|---|
| Alexei Ponikarovsky | 1.150 | Alexei Ponikarovsky | 1.206 |
| Patrick Elias | 1.122 | Clarke MacArthur | 1.174 |
| Clarke MacArthur | 1.083 | Brad Boyes | 1.150 |
| David Clarkson | 1.069 | David Clarkson | 1.097 |
| Viktor Stalberg | 1.063 | Bryan Bickell | 1.086 |
| Nathan Horton | 1.052 | Pascal Dupuis | 1.078 |
| Ryane Clowe | 1.038 | Viktor Stalberg | 1.003 |
| Matt Cooke | 1.005 | Patrick Elias | 0.976 |
| Bryan Bickell | 1.001 | Michael Ryder | 0.954 |
| Brad Boyes | 0.996 | Matt Cooke | 0.948 |
| Dan Cleary | 0.973 | Jarome Iginla | 0.937 |
| Michael Ryder | 0.971 | Ryane Clowe | 0.933 |
| Jarome Iginla | 0.953 | Dan Cleary | 0.879 |
| Mason Raymond | 0.951 | Mason Raymond | 0.858 |
| Pascal Dupuis | 0.918 | Nathan Horton | 0.830 |

Ponikarovsky, MacArthur, Clarkson seem to be the best in the class here with Raymond, Cleary, and Iginla probably trailing the pack overall.
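The defensive tables flip the ratio (TMFA20/FA20 and TMGA20/GA20) so that, as on offense, a value above 1.0 favours the player. Here is a minimal sketch, assuming the team mates' against-rate is pooled over their minutes apart from the player. The function name and numbers are hypothetical.

```python
def defensive_ratio(player_against: float, player_toi: float,
                    tm_against: float, tm_toi: float) -> float:
    """Team mates' against-rate (apart from the player) divided by the player's
    on-ice against-rate, both per 20 minutes. Above 1.0 means fewer events
    against when the player is on the ice."""
    player_rate = player_against * 20 / player_toi
    tm_rate = tm_against * 20 / tm_toi
    return tm_rate / player_rate


# Hypothetical: FA20 of 8.0 with the player versus 9.0 for team mates without him.
print(defensive_ratio(400, 1000, 900, 2000))
```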

Overall Evaluation

There is nothing too scientific in this but if I had to rank the wingers in terms of value this is how I would rank them, with probably more emphasis on offensive value.

  1. Iginla – Perfect for a team close to contending that is looking for some help over the next couple of seasons.
  2. Clarkson – I am surprised I am ranking Clarkson over Horton, but he comes out ahead in more categories and may come cheaper. I’d still be cautious about overpaying, but he has scored a bunch of goals on a bad offensive team, so that is good.
  3. Horton – I really like Horton but injuries have to be a concern and he’ll likely demand a big contract. He is a first line guy though and would be a big addition to any team. Has a longer track record than Clarkson too so less risky (health issues aside).
  4. MacArthur – Good all-round winger ideal for a second line role or as a secondary player on a first line.
  5. Elias – Age is starting to show but still very solid. Probably stays in New Jersey on short term deal.
  6. Stalberg – Not quite as proven against top competition as MacArthur but similar potential.
  7. Ryder – All he seems to do is score goals and still can be a 30 goal guy if given top line duty. Less rugged version of Clarkson.
  8. Dupuis – Likely stays in Pittsburgh and continues to benefit from playing a lot on Crosby’s wing.
  9. Bickell – Probably worth taking a gamble on and playing in a second line role. Might be a 20 goal, 50 point guy in that role.
  10. Cooke – More useful for his PK skills. Decent 3rd line guy but limited offense.
  11. Boyes – Decent offensive depth guy if on a good value contract. Probably re-signs with Islanders as he probably has more value to them than anyone else. Probably gets more (and higher quality) ice time than he deserves.
  12. Cleary – Not as productive as he was a few years ago but still has some value as a 2nd/3rd line winger.
  13. Clowe – Probably best as a 3rd line guy you hope can provide some toughness and secondary offense.
  14. Raymond – From afar he seems like the guy you always hope can be more but never is.
  15. Ponikarovsky – He’s kind of like Cooke minus the agitator/cheap shot track record. Solid defensive 3rd liner at this point in his career.


Jun 18, 2013

If you have been following the discussion between Eric T and me, you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t and want to read the whole debate, you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed-minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent (even goal talent) by adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial and that the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics, so we can’t see where it is still failing miserably) in which he used time on ice to choose the mean to which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years of 2010-13, the predictive power of his regressed GF20 to predict future GF20 was 0.66, which was 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05, while 0.34 still remains unexplained. The question I attempt to answer today is: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted-for skill/situational variance?

When we compare 2007-10 GF20 to 2010-13 GF20, there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of team mates, a change in coaching style, natural career progression of the player, zone start usage, possibly any number of other factors that we do not currently know about, and true randomness. We do not yet know how to fully adjust for all of these non-random factors, so to get a true measure of the random component of a player’s stats we need two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are roughly equivalent across halves. The following table shows how well the even half predicts the odd half, based on how many total minutes (across both halves) the player has played.

| Total Minutes | GF20 (even vs odd games) |
|---|---|
| >500 | 0.79 |
| >1000 | 0.85 |
| >1500 | 0.88 |
| >2000 | 0.89 |
| >2500 | 0.88 |
| >3000 | 0.88 |
| >4000 | 0.89 |
| >5000 | 0.89 |

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half), the upper bound on true randomness is 0.21 while the predictive power of GF20 is 0.79. With more than 1000 minutes the randomness drops to 0.15, and at 1500 minutes and above it is around 0.11-0.12. It’s interesting, and a little counterintuitive, that setting the minimum above 1500 minutes (~750 in each even/odd half) doesn’t necessarily reduce the true randomness in GF20 any further.

Let’s take a look at the predictive power of fenwick shooting percentage in even games to predict fenwick shooting percentage in odd games.

| Total Minutes | FSh% (even vs odd games) |
|---|---|
| >500 | 0.54 |
| >1000 | 0.64 |
| >1500 | 0.71 |
| >2000 | 0.73 |
| >2500 | 0.72 |
| >3000 | 0.73 |
| >4000 | 0.72 |
| >5000 | 0.72 |

Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time, and there appears to be no benefit to increasing the minimum minutes played.

To summarize what we have learned, for forwards with >1000 minutes in each of 2007-10 and 2010-13:

| Measure | Value |
|---|---|
| GF20 predictive power, 3yr vs 3yr | 0.61 |
| True randomness estimate | 0.11 |
| Unaccounted-for factors estimate | 0.28 |
| Eric T’s regression benefit | 0.05 |

There is no denying that a regression algorithm can provide modest improvements, but regression only addresses the random component, which is roughly 30% (0.11 of the 0.39) of what GF20 fails to predict, and it is highly doubtful that further improvements to the regression algorithm will result in anything more than marginal benefits. The real benefit will come from researching the other 70% (the 0.28 of unaccounted-for factors) that we don’t know about. It is a much more difficult question to answer, but the benefit could be far more significant than any regression technique.

Addendum: After doing the above I thought, why not take this all the way? Instead of even and odd games, use even and odd seconds, so that what happens in one second goes in one bin and what happens in the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed, but the predictive power of GF20 increases marginally, particularly at lower minute cutoffs.
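The 30%/70% split follows from the summary figures when they are read, as the post does, as shares of a whole of 1. Here is a small worked check in Python using only the numbers from the summary table:

```python
# Figures from the summary table (forwards, >1000 minutes in each 3-year span).
gf20_predictive_power = 0.61
true_randomness = 0.11
regression_benefit = 0.05

unexplained = 1 - gf20_predictive_power          # what GF20 fails to predict
unaccounted = unexplained - true_randomness      # non-random, unaccounted-for factors
random_share = true_randomness / unexplained     # the part regression can target
unaccounted_share = unaccounted / unexplained
remaining_after_regression = unexplained - regression_benefit

print(f"unexplained {unexplained:.2f}, unaccounted {unaccounted:.2f}")
print(f"random share {random_share:.0%}, unaccounted share {unaccounted_share:.0%}")
print(f"still unexplained after regression {remaining_after_regression:.2f}")
```

The random share comes out near 28% and the unaccounted share near 72%, which the text rounds to 30% and 70%. The 0.34 left after regression matches the figure quoted for Eric T’s method.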

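| Total Minutes | GF20 (even vs odd seconds) | FSh% (even vs odd seconds) |
|---|---|---|
| >500 | 0.81 | 0.58 |
| >1000 | 0.86 | 0.68 |
| >1500 | 0.88 | 0.71 |
| >2000 | 0.89 | 0.73 |
| >2500 | 0.89 | 0.73 |
| >3000 | 0.90 | 0.75 |
| >4000 | 0.90 | 0.73 |
| >5000 | 0.89 | 0.71 |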
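Binning by second instead of by game only changes how ice time and goals are assigned to the two halves. Here is a minimal sketch, assuming shift intervals and goal times are available in game seconds. The post doesn't describe its data format, so the inputs and function names are hypothetical.

```python
def split_by_second(shifts, goal_seconds):
    """Split one player's on-ice time and goals for into even and odd seconds.

    shifts: (start_second, end_second) on-ice intervals, end exclusive.
    goal_seconds: game seconds at which his team scored while he was on the ice.
    Returns ([even_seconds, odd_seconds], [even_goals, odd_goals]).
    """
    toi_seconds = [0, 0]
    for start, end in shifts:
        for second in range(start, end):
            toi_seconds[second % 2] += 1
    goals = [0, 0]
    for second in goal_seconds:
        goals[second % 2] += 1
    return toi_seconds, goals


def gf20(goals: int, toi_seconds: int) -> float:
    """Goals for per 20 minutes, with ice time given in seconds."""
    return goals * 20 * 60 / toi_seconds
```

Summing these per-player bins over all games and then correlating the even and odd GF20 values across players gives the per-second version of the split-half comparison.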


Jun 12, 2013

Yesterday a paper about using regularized logistic regression to estimate player contribution in hockey came across my twitter feed. I skimmed through it, though not enough to fully understand it, but found some of the conclusions at least mildly interesting. This post is neither in support of nor against the paper but rather a rebuttal to a rebuttal from Eric T at NHLNumbers.com.

To summarize the paper, the authors conducted a goal based analysis to estimate player contribution and to summarize Eric T’s rebuttal, Eric T applauded the effort but suggested a shot based analysis would be more appropriate because that is where ‘modern hockey thought’ currently stands.


I think my biggest concern is that by focusing exclusively on goals, you allow for shooting percentage variance to have a significant impact on a player’s calculated value. Even with four years of data, variance plays a large role in the shooting and save percentages with a given player on the ice.

This is why much of modern hockey analysis starts with shot-based metrics; the shooting percentages introduce a lot of variance which must be accounted for to get a reasonable assessment of talent. If you used shots for your model, I suspect you’d easily identify more than a mere 60 players who have significantly non-zero talent levels — and the model could be further refined from there (e.g. give each shot a weight based on the shooter’s career shooting percentage).

That is, in essence, Eric T’s argument: shooting percentages are unreliable, so it is better to use a shot-based approach (though I find it a little ironic that he then suggests incorporating shooting percentage again).

The statement “even with four years of data, variance plays a large role in shooting and save percentages with a given player on the ice” is the one I have the biggest problem with. I have shown many times that goal scoring rates are a better predictor of future goal scoring than shot rates are when dealing with multiple seasons of data. Furthermore, any study that uses sufficient amounts of data (either by using multiple seasons of data or by grouping similar players and using their aggregate shooting percentage) has concluded that shot quality (the ability to sustain an elevated shooting percentage) exists and is significant. For example, we know that players who get a significant amount of ice time have significantly higher shooting percentages (see here and here and here), and just by looking at a list of players sorted by their long-term on-ice shooting percentages we see that good offensive players rise to the top and poor offensive players fall to the bottom (in no way can anyone conclude that that list is random in nature). There is ample evidence to suggest that with 4 years of data, goal-based metrics should be the preferred tool over shot/possession-based metrics.

Eric T brought up Dwayne Roloson, Kent Huskins, Sean O’Donnell and others as examples of where he feels the evaluation system failed, but pointing out a few counterexamples is not enough to toss the analysis out completely. There will always be exceptions and outliers when attempting to build an all-encompassing evaluation metric. For the methodology in the paper maybe it is Roloson and Huskins, but I can assure you that for any shot-based metric it will be Tyler Kennedy and Scott Gomez.

The standard against which an all-encompassing metric should be tested is not “is it perfect?”, with anything that fails that test tossed aside and ignored forever. These metrics will never be perfect and should never be used as the final say on a player’s value. In truth, they should be used to spark conversation, discussion and further investigation, not end it. When we see strange results, we shouldn’t assume they are true, but just as much we shouldn’t assume the whole methodology is worthless.

Furthermore, making any argument against a new methodology because it doesn’t conform to “modern hockey thought” and suggesting they revise it to make it conform more to “modern hockey thought” is plainly the worst thing one can do. The best discoveries in the history of humanity typically arise when people don’t conform to current thought processes but rather do something different. You are free to make an argument against something but make sure that argument is something deeper than “it doesn’t conform to modern hockey thought.”

Finally, my biggest beef with many in the pro corsi/possession/shot differential crowd is the way in which many immediately and abjectly dismiss anything that strays from a corsi/possession/shot differential analysis. This is as fundamentally misguided as claiming that corsi/possession/shot differential is meaningless and that goals are the only tool one should use in player evaluation. The truth is, both methods provide value. The possession method primarily provides value when dealing with small sample sizes, as it reduces small sample size and random variance issues. Shot differential metrics are inherently flawed, though, because shot differential isn’t the end goal of the player (goal differential is what matters in the win/loss column), and shot quality and the ability to drive/suppress shooting percentages exist and are real. There is nothing wrong with using possession metrics as an evaluation tool so long as we are aware of this limitation, just as there is nothing wrong with using goal-based metrics as an evaluation tool so long as we are aware of their sample size, randomness and uncertainty limitations. Neither is perfect, both have their uses, both have their limitations, and in reality both should be considered in any player evaluation.

(Note: Just to be clear, because apparently Tyler Dellow has a poor ability to interpret words properly, my critique of Eric T’s critique of the goal-based all-encompassing player evaluation metric does not in any way mean that I believe Dwayne Roloson helps his team score goals. To be completely honest, I seriously question how the authors of the paper incorporate goalies into the methodology, which is reflected in the fact that in my own all-encompassing player evaluation metrics – goal or shot based – I assume goalies have no influence on a team’s offensive production. Hope this clears the issue up for Tyler.)


Jun 11, 2013

Nathan Horton has been one of the stars of these NHL playoffs and will be an integral component of the Stanley Cup finals if the Bruins are going to beat the Chicago Blackhawks. Horton is also set to become an unrestricted free agent this summer, so his strong playoff performance comes at a good time. One of the things I have noticed about Horton while looking through the statistics is that he has one of the highest on-ice 5v5 shooting percentages of any NHL forward over the past 6 seasons (he ranks 16th among forwards with >300 minutes of ice time).

Part of the reason for this is that he is a fairly good shooter himself (he ranks 30th with a 5v5 shooting percentage of 12.25%), but this is in no way the main reason. Let’s take a look at Horton’s line mates’ shooting percentages over the past 6 seasons when playing with and without Horton.

| Line mate | Sh% w/o Horton | Sh% w/ Horton | Difference |
|---|---|---|---|
| Weiss | 11.28% | 12.84% | 1.56% |
| Lucic | 13.03% | 16.98% | 3.95% |
| Krejci | 11.41% | 12.10% | 0.68% |
| Booth | 8.44% | 11.26% | 2.82% |
| Frolik | 6.58% | 10.84% | 4.26% |
| Stillman | 10.03% | 15.38% | 5.35% |
| Zednik | 8.81% | 13.56% | 4.75% |
| Average | 9.94% | 13.28% | 3.34% |

Included are all forwards Horton has played at least 400 minutes of 5v5 ice time with over the past 6 seasons along with their individual shooting percentage when with Horton and when not with Horton. Every single one of them has an individual shooting percentage higher with Horton than when not with Horton and generally speaking significantly higher.  I have previously looked at how much players can influence their line mates shooting percentages and found that Horton was among the league leaders so the above table agrees with that assessment.

It is still possible that Horton is just really lucky, but that argument starts to lose steam when he seems to be getting lucky each and every year over the past 6 years (he has never had a 5v5 on-ice shooting percentage at or below league average). Whatever Horton is doing on the ice seems to allow his line mates to boost their own individual shooting percentages, and the result is that he has the 9th highest on-ice goals for rate over the past 6 seasons. He is a massively underrated player and is this summer’s Alexander Semin of the UFA market.


May 21, 2013

Last week there was a Twitter discussion on the merits of playing a defensive shell game: limiting scoring chances against but also limiting scoring chances for, even if it means the ratio of goals for to goals against gets worse. The two sides of the debate are as follows:

Argument 1: It is always best to play a game where you are expected to outscore the opposition, regardless of the goals for/against rates.

Argument 2: When playing with a lead late in the game it is more important to reduce the goals against rate than maintain the goals for rate, even if it means the goals for to goals against ratio drops significantly.

To test each theory I simulated a number of games between teams T1 and T2 under the following assumptions:

1. During normal play between teams T1 and T2, T1 will score at a rate of 2.75 goals/60 minutes and T2 will score at a rate of 2.50 goals/60 minutes. During this play it is expected that T1 will score approximately 52.4% of all the goals that are scored.
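The 52.4% figure follows directly from the two scoring rates. The sketch below (function names are illustrative) computes that share and also the expected number of goals each team scores in a short window, which shows how few goals are expected when only a handful of minutes remain.

```python
def expected_goal_share(rate_for: float, rate_against: float) -> float:
    """Fraction of all goals expected to be scored by the team with rate_for (goals/60)."""
    return rate_for / (rate_for + rate_against)


def expected_goals(rate_per_60: float, minutes: float) -> float:
    """Expected goals over `minutes` at a constant scoring rate given in goals per 60 minutes."""
    return rate_per_60 * minutes / 60.0


share = expected_goal_share(2.75, 2.50)
print(f"T1 goal share: {share:.1%}")  # T1 goal share: 52.4%
for minutes in (5, 20):
    t1 = expected_goals(2.75, minutes)
    t2 = expected_goals(2.50, minutes)
    print(f"{minutes} min: T1 {t1:.3f} goals, T2 {t2:.3f} goals")
```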

2. During play between T1 and T2 when T1 has a lead and is playing in defensive shell mode, T1 will score at a rate of 2.00 goals/60 and T2 will score at the same 2.00 goals/60 rate.

From there I simulated 1,000,000 games in which T1 is protecting a 1-goal lead for the remaining 2.5, 5, 7.5, 10, 12.5, 15, 17.5 and 20 minutes of a game, under both normal play and defensive shell play. Here are the results at the end of regulation.

Normal play

Time left  Wins    Losses  Ties    RegWin%  OTL Pts%  PlayoffWin%
2.5 min    911132  4471    99307   96.08%   93.60%    96.32%
5 min      847011  15230   187894  94.10%   89.40%    94.54%
7.5 min    799667  28880   268711  93.40%   86.68%    94.04%
10 min     764672  44692   340642  93.50%   84.98%    94.31%
12.5 min   738696  59869   405525  94.15%   84.01%    95.11%
15 min     717679  75094   464680  95.00%   83.38%    96.11%
17.5 min   702071  88968   518004  96.11%   83.16%    97.34%
20 min     690638  102013  565261  97.33%   83.20%    98.67%

Defensive Shell

Time left  Wins    Losses  Ties    RegWin%  OTL Pts%  PlayoffWin%
2.5 min    926241  3011    79934   96.62%   94.62%    96.81%
5 min      868285  10599   153384  94.50%   90.66%    94.86%
7.5 min    821835  21109   221668  93.27%   87.73%    93.79%
10 min     785935  32888   283819  92.78%   85.69%    93.46%
12.5 min   755920  46048   341509  92.67%   84.13%    93.48%
15 min     733346  58874   392918  92.98%   83.16%    93.92%
17.5 min   713419  72115   442202  93.45%   82.40%    94.50%
20 min     697687  85092   486930  94.12%   81.94%    95.27%

Wins, losses and ties are T1’s record after 60 minutes. RegWin% is the standard regulation winning percentage using 2 points for a win, 0 points for a loss and 1 point for a tie. PlayoffWin% is T1’s winning percentage in a playoff game, assuming they would win 52.4% of all overtime games. OTL Pts% is the points percentage under the current regular season system, where you get 2 points for a win of any kind, 1 point for an overtime loss and zero points for a regulation loss (under this system, for simplicity’s sake, I assumed a 50% chance of winning after regulation since we don’t know the odds of winning a shootout).

That is a lot of numbers, so let’s look at them in easier-to-read charts.




Under this constructed scenario, the break-even point between going into a defensive shell and continuing to play normal hockey is at about 7-7.5 minutes under the regulation win% and playoff win% systems, and at about 13 minutes under the point-for-an-overtime-loss system currently used during the regular season.
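The post does not describe exactly how the games were simulated. One common way to model this, and the assumption made here, is that each team’s goals arrive as independent Poisson processes at their stated rates for the rest of regulation. Under that assumption the regulation outcome probabilities can be computed exactly, which is what a simulation of that model would converge to. This is a sketch of the method, not the author’s code: the function names are hypothetical, OTL Pts% is left out, and because the original simulation’s mechanics are unknown, the percentages it prints need not match the tables above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def protect_lead(rate_leader: float, rate_trailer: float, minutes: float,
                 lead: int = 1, max_goals: int = 30):
    """Return (P(win), P(tie), P(loss)) in regulation for the team protecting `lead`.

    Assumes each team scores as an independent Poisson process with a constant
    rate (goals per 60 minutes); goal counts above max_goals are ignored, which
    is negligible for the short windows considered here.
    """
    lam_lead = rate_leader * minutes / 60.0
    lam_trail = rate_trailer * minutes / 60.0
    win = tie = loss = 0.0
    for g_lead in range(max_goals):
        p_lead = poisson_pmf(g_lead, lam_lead)
        for g_trail in range(max_goals):
            p = p_lead * poisson_pmf(g_trail, lam_trail)
            margin = lead + g_lead - g_trail  # leader's final goal differential
            if margin > 0:
                win += p
            elif margin == 0:
                tie += p
            else:
                loss += p
    return win, tie, loss


def reg_win_pct(win: float, tie: float) -> float:
    """2 points for a win, 1 for a tie, 0 for a loss, divided by the 2 points available."""
    return win + 0.5 * tie


def playoff_win_pct(win: float, tie: float, ot_win: float = 0.524) -> float:
    """Regulation wins plus ties converted to wins at the assumed overtime win rate."""
    return win + ot_win * tie


for minutes in (2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20):
    normal = protect_lead(2.75, 2.50, minutes)  # normal play rates
    shell = protect_lead(2.00, 2.00, minutes)   # defensive shell rates
    print(f"{minutes:>4} min  RegWin% normal {reg_win_pct(*normal[:2]):.2%} "
          f"shell {reg_win_pct(*shell[:2]):.2%}  "
          f"PlayoffWin% normal {playoff_win_pct(*normal[:2]):.2%} "
          f"shell {playoff_win_pct(*shell[:2]):.2%}")
```

Computing the probabilities exactly rather than sampling games avoids simulation noise, which matters when the normal and shell columns differ by only a few tenths of a percent near the break-even point.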

For some people this may not make sense intuitively. How can it be better to stop playing a system in which you are expected to outscore your opposition and start playing a system in which you are expected to score the same as your opponent? The reason is simple: over a short period of time you are essentially dealing with small sample size issues, and randomness becomes more important than long-term skill. Over a short time one team is almost as likely to score as the other, so which team scores next is close to random, if either team scores at all. The most important thing when protecting a lead is simply reducing the likelihood that your opponent will score, because the cost of your opponent scoring is far greater than the benefit of you scoring (it is irrelevant whether you win 3-1 or 2-1; a win is a win in the standings).

What is interesting is that awarding a point for an overtime loss actually provides additional incentive for teams to play the defensive shell game for longer periods of time. The cost of giving up a goal is not as great in that system, because being tied at the end of regulation guarantees you one point with the possibility of two, whereas in the other systems it does not. This means teams can play the defensive shell for roughly twice as long as they could otherwise.

Of course, this is only looking at one side of the equation. Typically the trailing team will get more offensively aggressive, even if it means increasing the possibility of having a goal scored against them. This is why teams pull their goalie late in the game: at that point scoring a goal is the only thing that matters, so you may as well risk giving one up to score. Over the last 5-10 minutes or so it probably makes sense for the trailing team to take more high-risk, high-reward plays in the offensive zone, because at that point scoring a goal has more benefit than the cost of giving up a goal.



May 15, 2013

After last week’s untimely pinch by Dion Phaneuf in game 4, which led to an overtime goal and gave the Bruins a 3-1 lead in the first round series, there was a lot of evaluation of Phaneuf as a defenseman, both good and bad. I was intending to write an article discussing the relative merits of Dion Phaneuf and attempting to get an idea of where he stands among NHL defensemen, but in the process of researching that I came across some interesting Phaneuf stats that I think deserve their own post, so here it is.

My observation was with respect to Phaneuf’s usage and performance when the Leafs are leading and when they are trailing over the previous 3 seasons. Let’s start off by looking at Phaneuf’s situational statistics over the past 3 seasons.

Stat        5v5     5v5close  5v5tied  Leading  Trailing
G/60        0.222   0.175     0.101    0.156    0.408
Pts/60      0.700   0.670     0.660    0.420    1.020
IPP         30.1%   31.1%     34.2%    20.0%    34.5%
GF20        0.773   0.721     0.640    0.692    0.986
GA20        0.841   0.760     0.943    0.865    0.714
GF%         47.9%   48.7%     40.4%    44.4%    58.0%
CF20        18.316  18.113    18.159   15.195   21.542
CA20        20.686  21.418    21.880   22.982   17.223
CF%         47.0%   45.8%     45.4%    39.8%    55.6%
OZ%         28.0%   26.7%     25.2%    24.2%    34.5%
DZ%         31.8%   30.3%     29.7%    37.5%    28.5%
NZ%         40.3%   43.0%     45.0%    38.3%    37.0%
DZBias      103.9   103.6     104.4    113.3    94.0
TeamDZBias  108.9   109       107      115.2    100.8
DZBiasDiff  -5      -5.4      -2.6     -1.9     -6.8

Regular readers should be familiar with most of the stats above, but if you are not, you can reference my glossary here. The one stat that I have not used before is DZBias. DZBias is defined as 2*DZ% + NZ%, so anything over 100 indicates the player has a bias towards starting shifts in the defensive zone and anything under 100 indicates a bias towards starting in the offensive zone. I prefer this to OZone%, which is OZStarts/(OZStarts+DZStarts), because it takes neutral zone starts into account as well. TeamDZBias is the zone start bias of the Leafs over the past 3 seasons, and DZBiasDiff is Phaneuf’s DZBias minus the team’s DZBias, which provides a zone start bias relative to the team. Anything less than 0 indicates usage more in the offensive zone relative to his teammates.
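The two zone start measures are simple enough to express directly. In this sketch (function names are illustrative) the docstring spells out why 100 is the neutral point: when OZ%, NZ% and DZ% sum to 100, 2*DZ% + NZ% is the same as 100 + DZ% - OZ%. The example uses the Leading column of the table above.

```python
def dz_bias(dz_pct: float, nz_pct: float) -> float:
    """DZBias = 2*DZ% + NZ%, with percentages on a 0-100 scale.

    When OZ% + NZ% + DZ% = 100, this equals 100 + DZ% - OZ%, so it is above
    100 exactly when a player starts more shifts in the defensive zone than
    in the offensive zone.
    """
    return 2 * dz_pct + nz_pct


def dz_bias_diff(player_bias: float, team_bias: float) -> float:
    """Player DZBias minus team DZBias; negative means more offensive usage than teammates."""
    return player_bias - team_bias


# 5v5 leading column from the table: DZ% 37.5, NZ% 38.3, team DZBias 115.2
leading = dz_bias(37.5, 38.3)
print(round(leading, 1), round(dz_bias_diff(leading, 115.2), 1))  # 113.3 -1.9
```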

So, what does this tell us about Phaneuf? Well, there isn’t a huge variation in either zone start usage or results across 5v5, 5v5close and 5v5tied situations, so the focus should be on the differences between 5v5 leading and 5v5 trailing, which are significant.

Typical score effects are that a leading team gives up more shots but of lower quality (defensive shells protect the danger zone in front of the net but allow more shots from the perimeter) and takes fewer shots but of higher quality (probably a result of more odd-man rushes due to pinching defensemen on the trailing team). Phaneuf seems to take this concept to the extreme, but more importantly, Phaneuf seems to excel in an offensive role and struggle in a defensive role. When the Leafs are trailing Phaneuf has 0.408 G/60 (10th of 180 defensemen) and 1.02 points/60 (36th of 180 defensemen), but when leading Phaneuf falls to 0.156 G/60 (64th of 177 defensemen) and 0.42 points/60 (137th of 177 defensemen). Furthermore, Phaneuf’s involvement in the offensive zone drops off significantly when leading (IPP drops from 34.5% when trailing to 20.0% when leading).

In terms of on-ice stats, Phaneuf’s CF% drops from 55.6% when trailing (79th of 180 defensemen) to a very poor 39.8% when leading (164th of 177 defensemen). Some may think this is due to zone starts, but Phaneuf is getting above average offensive zone starts both when trailing (ranks 100th of 180 defensemen) and when leading (ranks 154th of 177), and even the most aggressive zone start adjustments would in no way account for the difference. Similar observations can be made with on-ice goal stats as well. Let’s look at how Phaneuf ranks among defensemen over the past 3 seasons.

Stat  Leading (of 177)  Trailing (of 180)
GF20  109               25
GA20  125               71
GF%   126               36
CF20  128               31
CA20  174               154
CF%   164               79

That is a pretty significant improvement in rankings when trailing compared with when leading, especially in the offensive statistics (GF20, CF20). If zone starts aren’t a factor, might line mates be? Here are Phaneuf’s most frequent defense partners:

Trailing: Gunnarsson (364:33, 31.0%), Beauchemin (212:07, 18.0%), Aulie (162:09, 13.8%)

Leading: Gunnarsson (376:16, 32.5%), Aulie (234:17, 20.3%), Beauchemin (166:30, 14.4%)

Playing more with Beauchemin and less with Aulie when trailing ought to help, particularly one’s offensive stats, but I doubt that accounts for much of the difference. Also, when leading Phaneuf has a CF% of 41.2% with Gunnarsson, and when trailing that spikes to 54.6%. When leading, Phaneuf and Beauchemin have a CF% of 37.3%, and when trailing that spikes to 57.7%. With Aulie the difference is 36.6% vs 49.3%. Regardless of which defense partner Phaneuf is with, their stats improve dramatically when playing catch up compared with when protecting a lead.

The same is true for forwards. When protecting a lead Phaneuf plays more with Grabovski and Kulemin, and when playing catch up he plays a bit more with Kessel and Bozak, but with all of those forwards Phaneuf’s numbers are hugely better when playing catch up than when protecting a lead. Playing more with Grabovski and Kulemin when leading should, if anything, only help his statistics, as they are generally considered the Leafs’ better corsi players.

Let’s take a look at a chart of Phaneuf’s corsi WOWYs when leading and when trailing.



As you can see, when leading the majority of Phaneuf’s team mates are to the left of the diagonal line which means they have a better corsi% without Phaneuf than with.



When trailing the majority of Phaneuf’s team mates are near or to the right of the diagonal line which means they generally have better corsi% statistics when with Phaneuf than when apart.

So the question arises: why is this? It doesn’t seem to be zone starts. It doesn’t seem to be changes in line mates, and it isn’t that the team as a whole automatically becomes a great corsi% team when trailing, which Phaneuf could benefit from. When leading Phaneuf’s corsi% is 39.8%, which is worse than the team’s 41.2%, and when trailing Phaneuf’s corsi% is 55.6%, which is better than the team’s 54.4%. It seems to me that the conclusion we must draw from this is that Phaneuf has been poor at protecting a lead relative to his team mates, and we know his team mates have been poor at protecting a lead. Where Phaneuf excels is when he is asked to engage offensively, be that when playing catch up hockey or when playing on the PP (Phaneuf’s PP statistics are pretty solid). From the first table we know that Phaneuf has a slight bias towards more offensive zone starts (relative to his team mates), and digging further into the numbers suggests he should be given even more offensive opportunities and fewer defensive ones, because he seems like a much better player when asked to be engaged offensively than when asked to be a shut down defenseman.
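Corsi percentage can be recomputed from the CF20 and CA20 rates in the situational table: because both rates are per 20 minutes of the same ice time, the time cancels and CF20/(CF20+CA20) equals the ratio of raw attempt counts. The sketch below (names are illustrative) does that and compares the result with the team CF% values quoted above.

```python
def cf_pct(cf: float, ca: float) -> float:
    """Corsi-for percentage: share of all shot attempts (for + against) that are for."""
    return cf / (cf + ca)


# Phaneuf's CF20/CA20 from the situational table, and the team CF% quoted in the text
phaneuf = {"Leading": (15.195, 22.982), "Trailing": (21.542, 17.223)}
team_cf_pct = {"Leading": 0.412, "Trailing": 0.544}

for situation, (cf20, ca20) in phaneuf.items():
    player = cf_pct(cf20, ca20)
    relative = player - team_cf_pct[situation]  # positive means better than the team
    print(f"{situation}: {player:.1%} (team {team_cf_pct[situation]:.1%}, "
          f"relative {relative:+.1%})")
```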

Acquiring a quality shut down defenseman (ideally two) this offseason must be the #1 priority of Maple Leafs management, and Phaneuf’s usage must shift further away from a multi-purpose, heavy workload defenseman towards a primarily offensive role.


May 01, 2013

I brought this issue up on Twitter today because it got me thinking. Many hockey analysts dismiss face off winning % as a skill of much value, but many of the same people also claim that zone starts can have a significant impact on a player’s statistics. I haven’t really delved into the statistics to investigate this, but here is what I am wondering. Consider the following two players:

Player 1: Team wins 50% of face offs when he is on the ice and he starts in the offensive zone 55% of the time.

Player 2: Team wins 55% of face offs when he is on the ice, but his zone starts are neutral (50% in the offensive zone).

Given 1000 offensive and defensive zone face offs, the following will occur:

Outcome                          Player 1  Player 2
Win face off in offensive zone   275       275
Lose face off in offensive zone  275       225
Win face off in defensive zone   225       275
Lose face off in defensive zone  225       225
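The counts in the table come from multiplying the zone start split by the face off win rate. This sketch (the function name is illustrative) reproduces them and, like the example, assumes the win rate is the same in both zones.

```python
def faceoff_outcomes(oz_start_pct: float, fo_win_pct: float, total: int = 1000):
    """Split `total` offensive/defensive zone face offs into win/loss counts by zone.

    oz_start_pct: fraction of zone face offs taken in the offensive zone.
    fo_win_pct: fraction of face offs won with the player on the ice,
    assumed to be the same in both zones.
    """
    oz = total * oz_start_pct
    dz = total - oz
    return {
        "Win OZ": round(oz * fo_win_pct),
        "Lose OZ": round(oz * (1 - fo_win_pct)),
        "Win DZ": round(dz * fo_win_pct),
        "Lose DZ": round(dz * (1 - fo_win_pct)),
    }


player1 = faceoff_outcomes(oz_start_pct=0.55, fo_win_pct=0.50)
player2 = faceoff_outcomes(oz_start_pct=0.50, fo_win_pct=0.55)
print(player1)  # {'Win OZ': 275, 'Lose OZ': 275, 'Win DZ': 225, 'Lose DZ': 225}
print(player2)  # {'Win OZ': 275, 'Lose OZ': 225, 'Win DZ': 275, 'Lose DZ': 225}
```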

Both of these players will win the same number of offensive zone face offs and lose the same number of defensive zone face offs, which are the situations that intuitively should have the greatest impact on a player’s statistics. So, if Player 1 is going to be more significantly impacted by his zone starts than Player 2 is impacted by his face off win %, then losing face offs in the offensive zone must still have a significant positive impact on the player’s statistics, and winning face offs in the defensive zone must still have a significant negative impact on the player’s statistics. If this is not the case, then being able to win face offs should be more or less equivalent in importance to zone starts (and this is without considering any benefit of winning neutral zone face offs).

Now, I realize that there is a greater variance in zone start deployment than in face off winning percentage, but if a 55% face off win% is roughly equal to a 55% offensive zone start deployment, and a 55% face off win% has relatively little impact on a player’s statistics, then a 70% offensive zone start deployment (20 percentage points above neutral, four times the 5-point edge of a 55% deployment) would have roughly four times that relatively little impact, which is still probably relatively little.

I hope to be able to investigate this further, but on the surface it seems that if face off win% is of relatively little importance, that supports my claim that zone starts have relatively little impact on a player’s statistics.


Apr 25, 2013

I am hoping to get playoff stats on stats.hockeyanalysis.com, but it is going to take some work, especially if I am to do game by game and series by series stats including “with you” and “against you” stats. As such I have decided to start a crowd funding project at RocketHub.com (because unlike Kickstarter they support Canadians) to help justify the time I will have to put into getting these stats up in a relatively short time frame. Below is the description of the project and what I hope to achieve, and if you are interested in contributing you can do that at the project page at RocketHub.com. Your contributions are greatly appreciated and I think you will enjoy what I have planned for stats.hockeyanalysis.com.


Hello. This is David Johnson from HockeyAnalysis.com and creator of the popular advanced hockey statistics website Stats.HockeyAnalysis.com. Much of my work on hockey analytics has been at the macro level, or more specifically evaluating players over 1, 2, or more years. This works great for the regular season and for evaluating a player’s overall talent level, which is where my interest mostly lies, but there seems to be a strong demand for more micro level stats, such as how players or teams perform in a single game or over a short stretch of games (e.g. after the trade deadline, before and after a coach is replaced, etc.), and this is especially true during the Stanley Cup playoffs.

The problem is, much of my existing code base that I use for stats.hockeyanalysis.com is designed for macro level stats and to revamp it to calculate stats on a per game or per playoff series basis and make these available on the web will take a significant redesign and rewrite of large portions of the code.

My goal for this project is to make some of those changes so I can get some playoff stats up for those who are interested and, down the road, make per-game and per-group-of-games stats available for the regular season starting next season. Here is what I am hoping to generate for these playoffs:

  • Team stats by series and playoffs overall
  • Player stats by series and playoffs overall
  • Game by game team stats
  • “with you” stats by game, series and playoffs overall so you can see how the team performed with various pairs of players on the ice.
  • “against you” stats by game, series and playoffs so you can see which players were successful at scoring on or shutting down their opponents.
  • For each of the above I will be adding goal, shot, fenwick and corsi data (totals and possibly %’s).
  • Will add zone start data to “with you” and “against you” data as time permits.
  • Will start with just looking at 5v5 situations but will add other situations if time permits.

My intent is to start by adding playoff stats similar to the existing regular season stats and then as development progresses I’ll be adding the other features with hopefully the majority of them being added by the end of round 1 if not sooner.

I am looking for some funding so I can justify the significant time over the next few weeks that it will take to rewrite my code and make game by game playoff stats available. I figure if each of the regulars that use stats.hockeyanalysis.com contributes between $10 and $50 (larger donations certainly welcome though) it will be easy to reach my funding goal. Any additional funding beyond my goal will be devoted towards adding similar game by game features to the regular season data for the start of next season.