Jul 05, 2013
 

Unfortunately I didn’t have as much time this week as I had hoped, so I couldn’t do a full evaluation of unrestricted free agent centers like I did for wingers. But it is free agent day, and there was some big news regarding centers yesterday with the buyout of Grabovski, so I thought I’d throw a little something together looking at some offensive statistics of the top centers available. Let me start off by presenting the summary table.

Player G/60 A/60 Pts/60 IPP GF20-TMGF20 FF20-TMFF20 OZBias
Ribeiro 0.593 1.512 2.11 80.5 0.113 -0.025 102.6
Filppula 0.769 1.334 2.1 75 0.116 -0.878 104.7
Lecavalier 0.799 1.186 1.99 68.1 0.139 0.381 100.7
Grabovski 0.899 0.961 1.86 65.4 0.196 2.406 96
Roy 0.587 1.146 1.73 67.4 0.039 0.747 98.7
Weiss 0.652 0.821 1.47 65.6 0.07 -0.467 103.3
Bozak 0.566 0.775 1.34 54.2 -0.062 0.292 99.8


The numbers above are 5v5 numbers over the past 3 seasons and the players are sorted by Pts/60. I threw in Lecavalier because he was a UFA for a brief period of time and is at more or less the same level as the others. I included Bozak to highlight just how much he doesn’t fit in with the rest of the group.

  • G/60 = Goals per 60 minutes of ice time.
  • A/60 = Assists per 60 minutes of ice time.
  • Pts/60 = Points per 60 minutes of ice time.
  • IPP = Individual Points Percentage, or the percentage of goals scored while on ice that the player had a point on.
  • GF20-TMGF20 = How much better his team mates’ on-ice goal stats are when playing with him than without him.
  • FF20-TMFF20 = How much better his team mates’ on-ice shot generation is when playing with him than without him.
  • OZBias = OZ Starts*2 + NZStarts, which gives an indication of the player’s usage.
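As a rough illustration of the OZBias definition, here is a minimal Python sketch. It assumes (the post does not say so explicitly) that OZ Starts and NZStarts are percentages of a player's total zone starts, which would make 100 the neutral value. The function name `oz_bias` and the sample counts are made up for the example.

```python
def oz_bias(oz_starts, nz_starts, dz_starts):
    """OZBias = 2 * OZ% + NZ%, with zone starts converted to percentages.

    Assumption: the inputs are raw shift-start counts by zone.
    """
    total = oz_starts + nz_starts + dz_starts
    return 100.0 * (2 * oz_starts + nz_starts) / total


# Hypothetical counts, not real player data.
print(oz_bias(300, 400, 300))  # equal offensive and defensive zone starts
print(oz_bias(350, 400, 250))  # deployment tilted toward the offensive zone
```

Under that assumption a player with as many offensive as defensive zone starts lands on exactly 100, and values above 100 indicate offensive-zone-heavy usage.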

List sorted by G/60: Grabovski, Lecavalier, Filppula, Weiss, Ribeiro, Roy, Bozak

List sorted by A/60: Ribeiro, Filppula, Lecavalier, Roy, Grabovski, Weiss, Bozak

List sorted by Pts/60: Ribeiro, Filppula, Lecavalier, Grabovski, Roy, Weiss, Bozak

List sorted by IPP: Ribeiro, Filppula, Lecavalier, Roy, Weiss, Grabovski, Bozak

List sorted by GF20-TMGF20:  Grabovski, Lecavalier, Filppula, Ribeiro, Weiss, Roy, Bozak

List sorted by FF20-TMFF20: Grabovski, Roy, Lecavalier, Bozak, Ribeiro, Weiss, Filppula
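The sorted lists above can be reproduced straight from the summary table. The sketch below is just one way to do it in Python; the `CENTERS` dictionary and the `ranked` helper are names invented for the example, while the numbers are copied from the table.

```python
# 5v5 numbers copied from the summary table in this post.
COLUMNS = ["G/60", "A/60", "Pts/60", "IPP", "GF20-TMGF20", "FF20-TMFF20", "OZBias"]
CENTERS = {
    "Ribeiro":    [0.593, 1.512, 2.11, 80.5, 0.113, -0.025, 102.6],
    "Filppula":   [0.769, 1.334, 2.10, 75.0, 0.116, -0.878, 104.7],
    "Lecavalier": [0.799, 1.186, 1.99, 68.1, 0.139, 0.381, 100.7],
    "Grabovski":  [0.899, 0.961, 1.86, 65.4, 0.196, 2.406, 96.0],
    "Roy":        [0.587, 1.146, 1.73, 67.4, 0.039, 0.747, 98.7],
    "Weiss":      [0.652, 0.821, 1.47, 65.6, 0.070, -0.467, 103.3],
    "Bozak":      [0.566, 0.775, 1.34, 54.2, -0.062, 0.292, 99.8],
}


def ranked(stat):
    """Return player names sorted from highest to lowest on one column."""
    i = COLUMNS.index(stat)
    return sorted(CENTERS, key=lambda name: CENTERS[name][i], reverse=True)


# OZBias is a usage indicator rather than a performance stat, so skip it.
for stat in COLUMNS[:6]:
    print(f"List sorted by {stat}: {', '.join(ranked(stat))}")
```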

Some comments on each player:

Mike Ribeiro: Easily the best play maker of the group and is most consistently involved in the play.

Valtteri Filppula: A better goal scorer than Ribeiro and, while not as good a play maker as Ribeiro, better at it than the rest.

Vincent Lecavalier: Similar to Filppula in value but better at the possession game.

Mikhail Grabovski: Not a great play maker but a good finisher and good at driving shot generation indicating he is probably good at puck retrieval.

Derek Roy: Kind of a poor man’s Ribeiro but much less valuable.

Stephen Weiss: More of a poor man’s Lecavalier. Easily had the worst line mates of the group and might do better in a different situation.

Tyler Bozak: Weak at goal scoring, bad at play making, not involved in the play and a drag on his team mates’ goal production. Not anywhere close to the same league as the others (and may be better suited for a different league too).

For me, Ribeiro is probably the best of the group in terms of pure offense because of his elite play making ability. Grabovski and Lecavalier are a little more balanced with better scoring and puck retrieval skills, while Filppula is pretty solid all round as well and has the flexibility of being used as either a center or a winger (which is valuable if locking in long-term). It’s difficult to compare Weiss to the rest because he simply hasn’t had nearly as good line mates, but it is probably safe to say he’d be a bit of a step down from Grabovski, Lecavalier or Filppula. Roy, on the other hand, would definitely be a step back but still a decent consolation prize if on a lower priced contract with shorter term. He is definitely not anything more than a #2 center though.

As for Bozak, well, you simply don’t want him on your team, maybe not at any price, no matter how bargain basement that price is. I have tried and tried but I just can’t find any redeeming qualities for him outside of his ability to win face offs, which has limited value. There simply is no reason why you would want to play him on any of your top 3 lines. None.

Being a Leaf fan, and with the Leafs unable to keep Grabovski, my preference would be Ribeiro or Filppula, but I might be willing to take a chance on Weiss if the contract was right. Ribeiro’s play making skills with the Leafs’ wingers should be a good combination and Filppula is a good all round player who could shift down to wing if needed. Weiss seems like a solid 2-way player who might be able to step up his game with better line mates, which he’d get with the Leafs. If they sign Bozak, I am not sure what I’ll do. It’ll be a sad day.

 

Jun 20, 2013
 

This year’s free agent class is a relatively thin one, pending compliance buyouts of course, but there are a handful of good players that could be hitting the unrestricted free agent market this summer. Today I’ll take a look at the wingers.

In total I identified 15 wingers that I would consider quality NHL regulars. These are, in no particular order, Nathan Horton, Viktor Stalberg, Ryane Clowe, Mason Raymond, Clarke MacArthur, Patrick Elias, David Clarkson, Dan Cleary, Pascal Dupuis, Brad Boyes, Alexei Ponikarovsky, Jarome Iginla, Michael Ryder, Bryan Bickell, and Matt Cooke. I have omitted Teemu Selanne and Daniel Alfredsson from the list since if they do return it will almost certainly be with the Ducks and Senators respectively. I have also omitted Damien Brunner because he doesn’t have enough of a track record, as I am looking at 3 seasons of data in my statistical evaluation. I have also omitted Jaromir Jagr because, well, for some reason I forgot to include him and couldn’t be bothered to go back and plug him into all the tables. He still has some value, but I am not sure how significant it is.

(Note that unless mentioned otherwise, the stats below are 5v5 stats over the past 3 seasons)

Offensive Evaluation

In order to attempt to isolate a player’s offensive production from that of his team mates, one thing I like to do is compare his own on-ice stats with the on-ice stats of his team mates when they are playing apart from him. To do this I took each player’s FF20 and GF20 and divided by teammate FF20 and teammate GF20 respectively. Here is how the wingers stack up against each other.

Winger FF20/TMFF20 Winger GF20/TMGF20
Viktor Stalberg 1.180 Patrick Elias 1.358
Nathan Horton 1.138 Nathan Horton 1.343
Ryane Clowe 1.087 Jarome Iginla 1.290
Mason Raymond 1.083 Pascal Dupuis 1.188
Clarke MacArthur 1.076 Viktor Stalberg 1.124
Patrick Elias 1.074 Michael Ryder 1.116
David Clarkson 1.066 Clarke MacArthur 1.111
Dan Cleary 1.049 Ryane Clowe 1.075
Pascal Dupuis 1.048 Bryan Bickell 1.058
Brad Boyes 1.044 Brad Boyes 1.042
Alexei Ponikarovsky 1.018 Mason Raymond 1.037
Jarome Iginla 1.017 Matt Cooke 0.962
Michael Ryder 0.999 Alexei Ponikarovsky 0.896
Matt Cooke 0.917 Dan Cleary 0.892
Bryan Bickell 0.896 David Clarkson 0.874
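For readers who want to see the arithmetic behind the ratios in the table above, here is a minimal Python sketch. It assumes FF20 and GF20 are on-ice events for per 20 minutes of ice time (the post does not define them explicitly), and the helper names and sample inputs are hypothetical.

```python
def per20(events, toi_minutes):
    """Rate of on-ice events per 20 minutes of ice time."""
    return 20.0 * events / toi_minutes


def relative_to_teammates(player_rate, teammate_rate):
    """A ratio above 1.0 means the on-ice rate is higher with the player."""
    return player_rate / teammate_rate


# Hypothetical inputs, not taken from any real player.
ff20 = per20(events=1180, toi_minutes=2000)    # player on the ice
tmff20 = per20(events=1000, toi_minutes=2000)  # team mates apart from him
print(round(relative_to_teammates(ff20, tmff20), 3))
```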

Based on the above lists you’d probably have to rank Horton, Stalberg and Elias as the top 3 with MacArthur and Clowe not far behind, while Cooke, Ponikarovsky and Bickell don’t look so good in comparison. Those are on-ice stats though; how do their individual stats look in comparison?

Winger G/60 Winger Points/60
Nathan Horton 1.111 Pascal Dupuis 2.28
Jarome Iginla 0.987 Nathan Horton 2.22
Pascal Dupuis 0.985 Jarome Iginla 2.09
Viktor Stalberg 0.964 Viktor Stalberg 2.03
Michael Ryder 0.941 Patrick Elias 2.01
David Clarkson 0.846 Michael Ryder 1.99
Clarke MacArthur 0.802 Clarke MacArthur 1.97
Bryan Bickell 0.779 Bryan Bickell 1.86
Matt Cooke 0.743 Brad Boyes 1.70
Dan Cleary 0.722 Ryane Clowe 1.70
Patrick Elias 0.700 Matt Cooke 1.69
Mason Raymond 0.645 Dan Cleary 1.69
Ryane Clowe 0.610 Mason Raymond 1.68
Brad Boyes 0.544 David Clarkson 1.28
Alexei Ponikarovsky 0.462 Alexei Ponikarovsky 1.20

Horton, Dupuis, Iginla and Stalberg occupy the top 4 spots on both lists while Ponikarovsky trails both lists. Individual stats are heavily influenced by quality of line mates, so one measure I like to look at is the percentage of the goals their team scores when they are on the ice that they scored themselves (IGP) or had a point on (IPP). The higher the percentage, the more integral the player is to his team’s offense when he is on the ice.

Winger IGP Winger IPP
David Clarkson 50.7 Patrick Elias 82.1
Jarome Iginla 35.6 David Clarkson 76.7
Viktor Stalberg 34.9 Bryan Bickell 75.5
Michael Ryder 33.9 Jarome Iginla 75.2
Nathan Horton 33.1 Clarke MacArthur 73.5
Bryan Bickell 31.6 Viktor Stalberg 73.4
Dan Cleary 31.2 Dan Cleary 73.1
Pascal Dupuis 30.5 Michael Ryder 71.8
Matt Cooke 30.5 Ryane Clowe 70.9
Clarke MacArthur 29.9 Pascal Dupuis 70.8
Patrick Elias 28.6 Brad Boyes 69.9
Mason Raymond 26.4 Matt Cooke 69.5
Ryane Clowe 25.5 Mason Raymond 69.0
Alexei Ponikarovsky 25.0 Nathan Horton 66.1
Brad Boyes 22.3 Alexei Ponikarovsky 64.7
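The two percentages in the table above come down to simple division. The sketch below shows the calculation in Python as described in the text; the function names and the sample totals are made up for illustration.

```python
def igp(goals, on_ice_goals_for):
    """Share (in %) of on-ice team goals the player scored himself."""
    return 100.0 * goals / on_ice_goals_for


def ipp(goals, assists, on_ice_goals_for):
    """Share (in %) of on-ice team goals the player had a point on."""
    return 100.0 * (goals + assists) / on_ice_goals_for


# Hypothetical player: 10 goals and 20 assists on 40 on-ice goals for.
print(igp(10, 40))
print(ipp(10, 20, 40))
```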

David Clarkson didn’t look so good in previous lists but when he is on the ice he is a major contributor to the team’s offense. Put him with some better offensive players and it is possible he could significantly boost his offensive production. The same can probably be said for Bryan Bickell, who has been given more ice time on the Blackhawks’ top lines these playoffs and has produced well above his regular season rates. He could be a good bargain pickup for a team who could get good production from him as a second line winger.

Defensive Evaluation

Defensive evaluation is much tougher than offensive evaluation and I think in general wingers are the least important position as far as team defense goes. The best way to evaluate a player defensively is to compare their on-ice stats with those of their team mates. Similar to what I did above with FF20 and GF20, I looked at TMFA20/FA20 and TMGA20/GA20.

Winger TMFA20/FA20 Winger TMGA20/GA20
Alexei Ponikarovsky 1.150 Alexei Ponikarovsky 1.206
Patrick Elias 1.122 Clarke MacArthur 1.174
Clarke MacArthur 1.083 Brad Boyes 1.150
David Clarkson 1.069 David Clarkson 1.097
Viktor Stalberg 1.063 Bryan Bickell 1.086
Nathan Horton 1.052 Pascal Dupuis 1.078
Ryane Clowe 1.038 Viktor Stalberg 1.003
Matt Cooke 1.005 Patrick Elias 0.976
Bryan Bickell 1.001 Michael Ryder 0.954
Brad Boyes 0.996 Matt Cooke 0.948
Dan Cleary 0.973 Jarome Iginla 0.937
Michael Ryder 0.971 Ryane Clowe 0.933
Jarome Iginla 0.953 Dan Cleary 0.879
Mason Raymond 0.951 Mason Raymond 0.858
Pascal Dupuis 0.918 Nathan Horton 0.830

Ponikarovsky, MacArthur and Clarkson seem to be the best in the class here with Raymond, Cleary, and Iginla probably trailing the pack overall.

Overall Evaluation

There is nothing too scientific in this but if I had to rank the wingers in terms of value this is how I would rank them, with probably more emphasis on offensive value.

  1. Iginla – Perfect for a team that is close and looking for some help over the next couple of seasons.
  2. Clarkson – I am surprised I am ranking Clarkson over Horton but he comes out ahead in more categories and may come cheaper. I’d still be cautious about over paying but he has scored a bunch of goals on a bad offensive team so that is good.
  3. Horton – I really like Horton but injuries have to be a concern and he’ll likely demand a big contract. He is a first line guy though and would be a big addition to any team. Has a longer track record than Clarkson too so less risky (health issues aside).
  4. MacArthur – Good all-round winger ideal for a second line role or as a secondary player on a first line.
  5. Elias – Age is starting to show but still very solid. Probably stays in New Jersey on short term deal.
  6. Stalberg – Not quite as proven against top competition as MacArthur but similar potential.
  7. Ryder – All he seems to do is score goals and still can be a 30 goal guy if given top line duty. Less rugged version of Clarkson.
  8. Dupuis – Likely sticks in Pittsburgh and continues benefiting from playing a bunch on Crosby’s wing.
  9. Bickell – Probably worth taking a gamble on and playing in a second line role. Might be a 20 goal, 50 point guy in that role.
  10. Cooke – More useful for his PK skills. Decent 3rd line guy but limited offense.
  11. Boyes – Decent offensive depth guy if on a good value contract. Probably re-signs with Islanders as he probably has more value to them than anyone else. Probably gets more (and higher quality) ice time than he deserves.
  12. Cleary – Not as productive as he was a few years ago but still has some value as a 2nd/3rd line winger.
  13. Clowe – Probably best as a 3rd line guy you hope you can get some toughness and secondary offense from.
  14. Raymond – From afar he seems like the guy you always hope can be more but never is.
  15. Ponikarovsky – He is kind of like Cooke minus the agitator/cheap shot track record. Solid defensive 3rd liner at this point in his career.

 

Jun 18, 2013
 

If you have been following the discussion between Eric T and me you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t and want to read the whole debate you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial and that the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics so we can’t see where it is still failing miserably) in which he utilized time on ice to choose the mean to which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13, the predictive power of his regressed GF20 for future GF20 was 0.66, which was 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05 while there still remains 0.34 which is unexplained. The question I attempt to answer today is: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted for skill/situational variance?

When we look at 2007-10 GF20 and compare it to 2010-13 GF20 there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of team mates, a change in coaching style, natural career progression of the player, zone start usage, and possibly any number of other factors that we do not currently know about, as well as true randomness. To get a true measure of the random component of a player’s stats despite all of these non-random factors that we do not yet know how to fully adjust for, we need two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games that have been played over the past 6 seasons, split them into even and odd games and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are more or less equivalent across halves. The following table shows how good the even half is at predicting the odd half based on how many total minutes (across both halves) the player has played.

Total Minutes GF20 vs GF20
>500 0.79
>1000 0.85
>1500 0.88
>2000 0.89
>2500 0.88
>3000 0.88
>4000 0.89
>5000 0.89

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half) the upper bound on true randomness is 0.21 while the predictive power of GF20 is 0.79. With greater than 1000 minutes randomness drops to 0.15 and with greater than 1500 minutes and above the randomness is around 0.11-0.12. It’s interesting that setting the minimum above 1500 minutes (~750 in each even/odd half) of data doesn’t necessarily reduce the true randomness in GF20 which seems a little counter intuitive.
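The even/odd split described above might be sketched as follows in Python. This is only an illustration under stated assumptions: the post does not say exactly how "predictive power" was computed, so a plain Pearson correlation is used as a stand-in, GF20 is taken to be goals for per 20 minutes, and the record layout `(game_number, player, goals_for, toi_minutes)` and all function names are invented for the example.

```python
from collections import defaultdict
from math import sqrt


def split_half_gf20(records):
    """records: iterable of (game_number, player, goals_for, toi_minutes).

    Returns {player: (gf20_even_games, gf20_odd_games)} for players
    with ice time in both halves.
    """
    # totals[player][half] = [goals_for, toi]; half 0 = even games, 1 = odd.
    totals = defaultdict(lambda: [[0.0, 0.0], [0.0, 0.0]])
    for game, player, gf, toi in records:
        half = totals[player][game % 2]
        half[0] += gf
        half[1] += toi
    return {
        player: tuple(20.0 * gf / toi for gf, toi in halves)
        for player, halves in totals.items()
        if all(toi > 0 for _, toi in halves)
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Tiny made-up data set, only to show the call pattern.
records = [
    (1, "A", 2, 30.0), (2, "A", 2, 30.0),
    (1, "B", 1, 30.0), (2, "B", 1, 30.0),
    (1, "C", 3, 30.0), (2, "C", 2, 30.0),
]
halves = split_half_gf20(records)
even = [h[0] for h in halves.values()]
odd = [h[1] for h in halves.values()]
print(pearson(even, odd))
```

A minimum-minutes cutoff like the ones in the table could be applied by filtering players on their combined ice time before computing the correlation.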

Let’s take a look at the predictive power of fenwick shooting percentage in even games to predict fenwick shooting percentage in odd games.

Total Minutes FSh% vs FSh%
>500 0.54
>1000 0.64
>1500 0.71
>2000 0.73
>2500 0.72
>3000 0.73
>4000 0.72
>5000 0.72

Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time and there appears to be no benefit to increasing the minimum minutes played beyond that.

To summarize what we have learned, we have the following for forwards with >1000 minutes in each of 2007-10 and 2010-13.

GF20 predictive power 3yr vs 3yr 0.61
True Randomness Estimate 0.11
Unaccounted for factors estimate 0.28
Eric T’s regression benefit 0.05

There is no denying that a regression algorithm can provide modest improvements but this is only addressing 30% of what GF20 is failing to predict and it is highly doubtful that efforts to improve the regression algorithm any more will result in anything more than marginal benefits. The real benefit will come from researching the other 70% we don’t know about. It is a much more difficult  question to answer but the benefit could be far more significant than any regression technique.
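The split quoted above can be checked with a few lines of arithmetic. The Python snippet below simply treats the summary numbers as additive shares, the way the post does; the variable names are my own.

```python
predictive_power = 0.61  # GF20, 3yr vs 3yr, from the summary above
true_randomness = 0.11   # estimate from the even/odd split

unexplained = 1.0 - predictive_power         # what GF20 fails to predict
unaccounted = unexplained - true_randomness  # skill/situational factors

print(round(unexplained, 2))
print(round(unaccounted, 2))
print(round(true_randomness / unexplained, 2))  # share that is randomness
print(round(unaccounted / unexplained, 2))      # share still to be researched
```

The last two values work out to roughly 28% and 72%, which the text rounds to 30% and 70%.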

Addendum: After doing the above I thought, why not take this all the way and instead of doing even and odd games do even and odd seconds so what happens one second goes in one bin and what happens the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed but the predictive power of GF20 increases marginally, particularly when dealing with lower minute cutoffs.

Total Minutes GF20 vs GF20 FSh% vs FSh%
>500 0.81 0.58
>1000 0.86 0.68
>1500 0.88 0.71
>2000 0.89 0.73
>2500 0.89 0.73
>3000 0.90 0.75
>4000 0.90 0.73
>5000 0.89 0.71

 

Jun 12, 2013
 

Yesterday a paper about using regularized logistic regression to estimate player contribution in hockey came across my Twitter feed. I skimmed through the article, though not enough to fully understand it, and found some of the conclusions at least mildly interesting. This post is neither in support of nor against the paper but rather a rebuttal to a rebuttal from Eric T at NHLNumbers.com.

To summarize the paper, the authors conducted a goal based analysis to estimate player contribution and to summarize Eric T’s rebuttal, Eric T applauded the effort but suggested a shot based analysis would be more appropriate because that is where ‘modern hockey thought’ currently stands.

 

I think my biggest concern is that by focusing exclusively on goals, you allow for shooting percentage variance to have a significant impact on a player’s calculated value. Even with four years of data, variance plays a large role in the shooting and save percentages with a given player on the ice.

This is why much of modern hockey analysis starts with shot-based metrics; the shooting percentages introduce a lot of variance which must be accounted for to get a reasonable assessment of talent. If you used shots for your model, I suspect you’d easily identify more than a mere 60 players who have significantly non-zero talent levels — and the model could be further refined from there (e.g. give each shot a weight based on the shooter’s career shooting percentage).

That is in essence Eric T’s argument. Shooting percentages are unreliable so it is better to use a shot based approach (though I find it a little ironic that he then suggests incorporating shooting percentage again).

The “even with four years of data, variance plays a large role in shooting and save percentages with a given player on the ice” statement is the one that I have the biggest problem with. I have shown many times that goal scoring rates are a better predictor of future goal scoring than shot rates are when dealing with multiple seasons of data. Furthermore, any study that uses sufficient amounts of data (either by using multiple seasons of data or by grouping similar players and using their aggregate shooting percentage) has concluded that shot quality (ability to sustain an elevated shooting percentage) exists and is significant. For example, we know that players that get a significant amount of ice time have significantly higher shooting percentages (see here and here and here), and just by looking at a list of players sorted by their long-term on-ice shooting percentages we see that good offensive players rise to the top and poor offensive players fall to the bottom (in no way can anyone conclude that that list is random in nature). There is ample evidence to suggest that with 4 years of data goal based metrics should be the preferred tool over shot/possession based metrics.

Eric T brought up Dwayne Roloson, Kent Huskins, Sean O’Donnell, and others as examples of where he feels the evaluation system failed, but pointing out a few counter examples is not enough to toss the analysis out completely. There will always be exceptions and outliers when attempting to build an all-encompassing evaluation metric. For the methodology in the paper maybe it is Roloson and Huskins, but I can assure you that for any shot based metric it will be Tyler Kennedy and Scott Gomez.

The standard against which an all-encompassing metric should be tested is not “is it perfect”, with anything that fails that test tossed aside and ignored forever. These metrics will never be perfect and should never be used as the final say on a player’s value. In truth, they should be used to spark conversation and discussion and further investigation, not end it. When we see strange results, just as we shouldn’t assume they are true, we shouldn’t assume the whole methodology is worthless.

Furthermore, making any argument against a new methodology because it doesn’t conform to “modern hockey thought” and suggesting they revise it to make it conform more to “modern hockey thought” is plainly the worst thing one can do. The best discoveries in the history of humanity typically arise when people don’t conform to current thought processes but rather do something different. You are free to make an argument against something but make sure that argument is something deeper than “it doesn’t conform to modern hockey thought.”

Finally, my biggest beef with many in the pro corsi/possession/shot differential crowd is the way in which many immediately and abjectly dismiss anything that strays from a corsi/possession/shot differential analysis. This is as fundamentally misguided as those that claim that corsi/possession/shot differential is meaningless and goals are the only tool one should use in player evaluation. The truth is, both methods provide value. The possession method primarily provides value when dealing with small sample sizes as it will reduce small sample size and random variance issues. Shot differential metrics are inherently flawed though, because shot differential isn’t the end goal of the player (goal differential is what matters in the win/loss column) and shot quality and the ability to drive/suppress shooting percentages exist and are real. There is nothing wrong with using possession metrics as an evaluation tool so long as we are aware of this limitation, just as there is nothing wrong with using goal based metrics as an evaluation tool so long as we are aware of their sample size, randomness and uncertainty limitations. Neither is perfect, both have their uses, both have their limitations and in reality both should be considered in any player evaluation.

(Note: Just to be clear, because apparently Tyler Dellow has a poor ability to interpret words properly, my critique of Eric T’s critique of the goal based all-encompassing player evaluation metric does not in any way mean that I believe Dwayne Roloson helps his team score goals. To be completely honest, I seriously question how the authors of the paper incorporate goalies into the methodology and this is supported by the fact that in my own all-encompassing player evaluation metrics – goal or shot based – I assume goalies have no influence on a team’s offensive production. Hope this clears the issue up for Tyler.)

 

May 21, 2013
 

Last week there was a Twitter discussion on the merits of playing a defensive shell game by limiting scoring chances against but also limiting scoring chances for, even if it meant the ratio of goals for to goals against gets worse. The two sides of the debate are as follows:

Argument 1: It is always best to play a game where you are expected to out score the opposition regardless of the goals for/against rates.

Argument 2: When playing with a lead late in the game it is more important to reduce the goals against rate than maintain the goals for rate, even if it means the goals for to goals against ratio drops significantly.

To test each theory I simulated a number of games between teams T1 and T2 according to the following assumptions:

1. During normal play between teams T1 and T2, T1 will score at a rate of 2.75 goals/60 minutes and T2 will score at a rate of 2.50 goals/60 minutes. During this play it is expected that T1 will score approximately 52.4% of all the goals that are scored.

2. During play between T1 and T2 when T1 has a lead and is playing in defensive shell mode, T1 will score at a rate of 2.00 goals/60 and T2 will score at the same 2.00 goals/60 rate.

From there I simulated 1,000,000 games in which T1 is protecting a 1 goal lead for the remaining 2.5, 5, 7.5, 10, 12.5, 15, 17.5 and 20 minutes of a game under both normal style play and defensive shell style play. Here are the results at the end of regulation play.

Normal play

Remaining Wins Losses Ties RegWin% OTL Pts% PlayoffWinRate
2.5mins 911132 4471 99307 96.08% 93.60% 96.32%
5mins 847011 15230 187894 94.10% 89.40% 94.54%
7.5mins 799667 28880 268711 93.40% 86.68% 94.04%
10mins 764672 44692 340642 93.50% 84.98% 94.31%
12.5mins 738696 59869 405525 94.15% 84.01% 95.11%
15mins 717679 75094 464680 95.00% 83.38% 96.11%
17.5mins 702071 88968 518004 96.11% 83.16% 97.34%
20mins 690638 102013 565261 97.33% 83.20% 98.67%

Defensive Shell

Remaining Wins Losses Ties RegWin% OTL Pts% PlayoffWinRate
2.5mins 926241 3011 79934 96.62% 94.62% 96.81%
5mins 868285 10599 153384 94.50% 90.66% 94.86%
7.5mins 821835 21109 221668 93.27% 87.73% 93.79%
10mins 785935 32888 283819 92.78% 85.69% 93.46%
12.5mins 755920 46048 341509 92.67% 84.13% 93.48%
15mins 733346 58874 392918 92.98% 83.16% 93.92%
17.5mins 713419 72115 442202 93.45% 82.40% 94.50%
20mins 697687 85092 486930 94.12% 81.94% 95.27%

Wins, losses and ties are T1’s record after 60 minutes, and regulation win% is the standard regulation winning percentage using 2 points for a win, 0 points for a loss and 1 point for a tie. PlayoffWinRate is the winning percentage of T1 in a playoff game assuming that they would win 52.4% of all overtime games. OTL Pts% is the current regular season system where you get 1 point for an overtime loss, 2 points for a win of any kind and zero points for a regulation loss (under this system, for simplicity’s sake, I assumed a 50% chance of winning an overtime game since we don’t know the odds of winning a shoot out).
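For anyone who wants to experiment with a scenario like this, here is a minimal Monte Carlo sketch in Python. It is not the code behind the tables above, and its output need not match them. It assumes goals arrive as independent Poisson processes at the stated rates and that the shell only applies while T1 is actually ahead (once the game is tied both teams revert to normal rates). All function names are invented for the example.

```python
import random

NORMAL = (2.75, 2.50)  # (T1, T2) goals per 60 minutes, from the text
SHELL = (2.00, 2.00)   # rates while T1 protects a lead in shell mode


def simulate_game(minutes_left, use_shell, rng):
    """Return T1's final goal margin after starting one goal ahead."""
    margin, t = 1, 0.0
    while True:
        # Assumption: shell rates apply only while T1 holds the lead.
        r1, r2 = SHELL if (use_shell and margin > 0) else NORMAL
        t += rng.expovariate((r1 + r2) / 60.0)  # minutes until the next goal
        if t >= minutes_left:
            return margin
        margin += 1 if rng.random() < r1 / (r1 + r2) else -1


def simulate(minutes_left, use_shell, n_games=20_000, seed=1):
    """Return (wins, losses, ties) for T1 at the end of regulation."""
    rng = random.Random(seed)
    wins = losses = ties = 0
    for _ in range(n_games):
        margin = simulate_game(minutes_left, use_shell, rng)
        if margin > 0:
            wins += 1
        elif margin < 0:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties


def reg_win_pct(wins, losses, ties):
    """2 points for a win, 1 for a tie, 0 for a loss."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


def playoff_win_rate(wins, losses, ties, ot_win=0.524):
    """Ties go to overtime, which T1 is assumed to win 52.4% of the time."""
    return (wins + ot_win * ties) / (wins + losses + ties)


for minutes in (5, 10, 20):
    for label, shell in (("normal", False), ("shell", True)):
        record = simulate(minutes, shell)
        print(minutes, label, record,
              round(reg_win_pct(*record), 4),
              round(playoff_win_rate(*record), 4))
```

Because exponential waiting times are memoryless, redrawing the time to the next goal whenever the rates change keeps the simulation consistent with the Poisson assumption.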

That is a lot of numbers, so let’s look at these in nicer, easier to read charts.

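[Chart: regulation win %, normal play vs. defensive shell (DefensiveShellRegulationWinPct)]

[Chart: playoff win rate, normal play vs. defensive shell (DefensiveShellPlayoffWinRate)]

[Chart: OTL points %, normal play vs. defensive shell (DefensiveShellOTLPointsPct)]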

Under this constructed scenario the break even point for when to go into a defensive shell and when to continue playing normal hockey is at about 7-7.5 minutes for regulation win % and playoff win % systems and about 13 minutes for the point for an overtime loss system currently used during the regular season.

For some people this may not make sense intuitively. How can it be better to stop playing a system in which you are expected to out score your opposition and start playing a system in which you are expected to score the same as your opponent? The reason is simple: over a short period of time you are essentially dealing with small sample size issues and randomness becomes more important than long term skill. The reality is, over a short time one team is almost as likely to score as the other, so which team scores next is close to random, if any team scores at all. The most important thing when protecting a lead is simply reducing the likelihood that your opponent will score, because the cost of your opponent scoring is far greater than the benefit of you scoring (it is irrelevant whether you win 3-1 or 2-1, a win is a win in the standings).

What is interesting is that awarding the point for an overtime loss in reality provides additional incentive for teams to play the defensive shell game for longer periods of time. The cost of giving up a goal is not as great in that system because a tie at the end of regulation guarantees you one point with the possibility of 2, whereas in the other systems it does not. This means teams can play the defensive shell for roughly twice as long as they could otherwise.

Of course, this is only looking at one side of the equation. Typically the trailing team will get more offensively aggressive even if it means increasing the possibility of having a goal scored against them. This is why teams pull their goalie late in the game. At that point scoring a goal is the only thing that matters so you may as well risk giving one up to score. Over the last 5-10 minutes or so it probably makes sense for the trailing team to take more high risk high reward plays in the offensive zone because at that point scoring a goal has more benefit than the cost of giving up a goal.

 

 

May 01, 2013
 

I brought this issue up on Twitter today because it got me thinking. Many in hockey analytics dismiss face off winning % as a skill of little value, but many of the same people also claim that zone starts can have a significant impact on a player’s statistics. I haven’t really delved into the statistics to investigate this, but here is what I am wondering. Consider the following two players:

Player 1: Team wins 50% of face offs when he is on the ice and he starts in the offensive zone 55% of the time.

Player 2: Team wins 55% of face offs when he is on the ice but he has neutral zone starts.

Given 1000 zone face offs the following will occur:

Outcome Player 1 Player 2
Win Faceoff in OZone 275 275
Lose Faceoff in OZone 275 225
Win Faceoff in DZone 225 275
Lose Faceoff in DZone 225 225
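The counts in the table above follow directly from the two player descriptions. Here is a small Python sketch of that arithmetic, assuming (as the table does) that only offensive and defensive zone draws are counted and that the win percentage is the same in both zones; the function name is made up.

```python
def faceoff_outcomes(total, oz_pct, win_pct):
    """Split zone face offs into win/lose counts by zone.

    oz_pct and win_pct are whole-number percentages (e.g. 55 for 55%).
    """
    oz = total * oz_pct / 100
    dz = total - oz
    return {
        "win_oz": oz * win_pct / 100,
        "lose_oz": oz * (100 - win_pct) / 100,
        "win_dz": dz * win_pct / 100,
        "lose_dz": dz * (100 - win_pct) / 100,
    }


player1 = faceoff_outcomes(1000, oz_pct=55, win_pct=50)
player2 = faceoff_outcomes(1000, oz_pct=50, win_pct=55)
for outcome in player1:
    print(outcome, player1[outcome], player2[outcome])
```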

Both of these players will win the same number of offensive zone face offs and lose the same number of defensive zone face offs, which are the situations that intuitively should have the greatest impact on a player’s statistics. So, if Player 1 is going to be more significantly impacted by his zone starts than Player 2 is impacted by his face off win %, losing face offs in the offensive zone must still have a significant positive impact on the player’s statistics and winning face offs in the defensive zone must still have a significant negative impact on the player’s statistics. If this is not the case then being able to win face offs should be more or less equivalent in importance to zone starts (and this is without considering any benefit of winning neutral zone face offs).

Now, I realize that there is a greater variance in zone start deployment than in face off winning percentage. But if a 55% face off win % is roughly equal to a 55% offensive zone start deployment, and a 55% face off win % has relatively little impact on a player’s statistics, then a 70% zone start deployment (20 points above even rather than 5) would have roughly four times that relatively little impact, which is still probably relatively little.

I hope to be able to investigate this further, but on the surface it seems that if face off win % is of relatively little importance, that supports my claim that zone starts have relatively little impact on a player’s statistics.

 

Apr 25 2013
 

I am hoping to get playoff stats on stats.hockeyanalysis.com but it is going to take some work especially if I am to do game by game and series by series stats including “with you” and “against you” stats. As such I have decided to start a crowd funding project at RocketHub.com (because unlike Kickstarter they support Canadians) to help justify the time I will have to put in to getting these stats up in a relatively short time frame. Below is the description of the project and what I hope to achieve and if you are interested in contributing you can do that at the project page at RocketHub.com. Your contributions are greatly appreciated and I think you will enjoy what I have planned for stats.hockeyanalysis.com.

—————————————

Hello. This is David Johnson from HockeyAnalysis.com, creator of the popular advanced hockey statistics website Stats.HockeyAnalysis.com. Much of my work on hockey analytics has been at the macro level, or more specifically evaluating players over 1, 2, or more years. This works great for the regular season and for evaluating a player’s overall talent level, which is where my interest mostly lies, but there seems to be a strong demand for more micro level stats such as how players or teams perform in a single game or over a short stretch of games (i.e. after the trade deadline, before and after a coach got replaced, etc.) and this is especially true during the Stanley Cup playoffs.

The problem is, much of my existing code base that I use for stats.hockeyanalysis.com is designed for macro level stats and to revamp it to calculate stats on a per game or per playoff series basis and make these available on the web will take a significant redesign and rewrite of large portions of the code.

My goal for this project is to make some of those changes so I can get some playoff stats up for those that are interested and, down the road, make per game and per group of games data available for the regular season starting next season. Here is what I am hoping to generate for these playoffs:

  • Team stats by series and playoffs overall
  • Player stats by series and playoffs overall
  • Game by game team stats
  • “with you” stats by game, series and playoffs overall so you can see how the team performed with various pairs of players on the ice.
  • “against you” stats by game, series and playoffs so you can see which players were successful at scoring on or shutting down their opponents.
  • For each of the above I will be adding goal, shot, fenwick and corsi data (totals and possibly %’s).
  • Will add zone start data to “with you” and “against you” data as time permits.
  • Will start with just looking at 5v5 situations but will add other situations if time permits.

My intent is to start by adding playoff stats similar to the existing regular season stats and then as development progresses I’ll be adding the other features with hopefully the majority of them being added by the end of round 1 if not sooner.

I am looking for some funding so I can justify the significant time over the next few weeks that it will take to rewrite my code and make game by game playoff stats available. I figure if each of the regulars that use stats.hockeyanalysis.com contributes between $10 and $50 (larger donations certainly welcome though) it will be easy to reach my funding goal. Any additional funding beyond my goal will be devoted towards adding similar game by game features to the regular season data for the start of next season.

Apr 19 2013
 

Tyler Dellow has a post at mc79hockey.com looking at zone starts and defensemen, and if you read it the clear conclusion is that zone starts seem to matter quite a bit. In the third chart you can see that defensemen who get the most extreme defensive zone starts have an average corsi% of 44.7% while the average corsi% for defensemen with the most extreme offensive zone starts is 53.3%. Measured against an even 50%, this would seem to indicate that for defensemen zone starts can impact your corsi% anywhere from -5.3% to +3.3%. This is far more significant than I have estimated myself using a different methodology, so I wondered whether part of the reason is that when you start in the defensive zone you are playing with weaker teammates than when you start in the offensive zone. My reasoning is that players who get used primarily in the defensive zone are often weak offensive players, since a good offensive player will be given offensive opportunities. I wanted to explore this concept further and that is what I present to you here.

Unlike Tyler Dellow I used forwards in my analysis, but it is unlikely that this will have a major impact on the analysis as forwards and defensemen are always on the ice together. Another difference between my analysis and Tyler Dellow’s is that I used data from stats.hockeyanalysis.com whereas Tyler used stats from behindthenet.ca. Behindthenet.ca includes goalie pulled situations in their data and this has the potential to greatly exaggerate the impact of zone starts. I feel it is important to eliminate this factor so I have removed those situations from the data. I also only used 2011-12 data but that shouldn’t have a major impact on the results.

So, my theory is that players who start in the defensive zone are weaker players overall. The challenge is that the teammates of a player who starts frequently in the defensive zone likely start frequently in the defensive zone themselves, so their stats are subject to zone start effects too. If they have weak stats, we don’t know whether that is due to the zone starts or because they are weak players. My solution was to look at the players’ zone start adjusted stats that I have on stats.hockeyanalysis.com. These stats ignore the first 10 seconds after a zone face off, as it has been shown that the benefit/penalty of a zone face off has largely dissipated after 10 seconds. I understand that it may seem weird to use zone start adjusted data in a study that attempts to estimate the impact of zone starts but I don’t know what else to do.

I also want to point out that I will be using the ZS adjusted FF% of teammates when those teammates are not on the ice with the player, and this may also mitigate the ZS impact on the teammates’ stats. My reasoning is, if a player has an extensive number of defensive zone starts, it is quite possible that when his teammates are not playing with him their zone starts are more neutral or maybe even offensive zone biased. If there ever was a way to get a non-zone start impacted FF% to use as a QoT metric, this is probably the best we can do.

Ok, so what I did was compare a player’s 5v5 FF% (fenwick %) with his zone start adjusted 5v5 TMFF% (zone start adjusted FF% of teammates when those teammates are not playing with him), grouped by zone start profile, and came up with the following:

(Chart: FF% vs. TMFF% by zone start profile)

As you can see, TMFF% does seem to vary across zone start profiles as I had hypothesized, though to a lesser extent than the players’ zone start influenced FF%, which is to be expected. So, if we subtract TMFF% from FF% we get the following chart:

(Chart: FF% minus TMFF% by zone start profile)

This chart indicates that the zone start impact on forwards, once adjusted for quality of teammates (as best we can), ranges from -2.5% to +2.15%, which is significantly lower than the -5.3% to +3.3% estimate that Tyler Dellow came up with for defensemen without adjusting for quality of teammates and with goalie pulled situations included in the data. That said, this is still more significant than my own estimates from comparing 5v5 data to 5v5 data with the first 10 seconds after a zone start ignored. When I did that I calculated the impact of H. Sedin’s heavy offensive zone starts on his FF% to be +1.4% and considered this an upper bound. To investigate this further I plotted the average difference between 5v5 FF% and my 5v5 zone start adjusted FF% and got the following:

(Chart: FF% minus zone start adjusted FF% by zone start profile)

The above is an estimate of the average impact of zone starts using my zone start adjustment methodology which ignores the first 10 seconds after a zone face off. This is significantly lower than either of the previous 2 estimates as we can see in this summary table:

Methodology                                           ZS Impact Estimate
T. Dellow’s estimate for defensemen                   -5.3% to +3.3%
My TM adjusted estimate for forwards                  -2.5% to +2.15%
My 10 second after zone FO adjustment for forwards    -0.5% to +0.41%

I am pretty sure none of what I have said above will put an end to the debate over the impact of zone starts on a player’s statistics, but at the very least I hope it sheds some light on some of the issues involved. For me personally, I have the most confidence in my zone start adjustment method which removes the 10 seconds after a zone face off. My reasoning is that studies have shown that the effect of a zone face off is largely eliminated within the first 10 seconds (see here or here), and also that it is the only methodology that compares a player to himself under similar playing conditions (i.e. same season, almost identical QoT, QoC and situation profiles), eliminating most of the opportunity for confounding factors to influence the results. If this is the case, the impact of zone starts on a player’s stats is fairly small, to the point of being almost negligible for the majority of players.
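To make the third methodology concrete, here is a rough Python sketch of the comparison: compute FF% from all 5v5 events, compute it again with the events in the first 10 seconds after a zone face off dropped, and take the difference. The event counts below are made-up numbers for illustration only, and the function names are hypothetical rather than anything from stats.hockeyanalysis.com.

```python
def fenwick_pct(fenwick_for, fenwick_against):
    """FF%: the team's share of unblocked shot attempts while the player is on the ice."""
    return 100.0 * fenwick_for / (fenwick_for + fenwick_against)


def zone_start_impact(all_5v5, zs_adjusted):
    """Difference between raw FF% and zone start adjusted FF%.

    Each argument is a (fenwick_for, fenwick_against) pair. `zs_adjusted`
    excludes events in the first 10 seconds after a zone face off.
    """
    return fenwick_pct(*all_5v5) - fenwick_pct(*zs_adjusted)


# Hypothetical player with heavy offensive zone starts (invented counts).
raw_counts = (620, 540)        # all 5v5 events
adjusted_counts = (560, 500)   # first 10 seconds after zone face offs removed

impact = zone_start_impact(raw_counts, adjusted_counts)
print(f"Estimated zone start impact on FF%: {impact:+.2f} points")
```

Because both FF% values come from the same player in the same season, the difference isolates the events right after zone face offs, which is the reason given above for preferring this method.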

 

Apr 12 2013
 

Even though I think ‘usage’ and ‘tough minutes’ are vastly overstated factors in an individual player’s statistics, they are interesting to look at as they give us an indication of how a coach views the player. So for all the usage fans, here is another usage statistic which I will call the Leading-Trailing Index, or LT Index for short.

LT Index = TOI% when leading / TOI% when trailing

where TOI% is the percentage of the team’s overall ice time (in games that the player played in) that the player is on the ice (so a 5v5 TOI% of 20% means the player was on the ice for 20% of the time that the team was at 5v5). Thus, the LT Index is the ratio of the player’s share of ice time when his team is leading to his share of ice time when his team is trailing, which adjusts for how much time the team spends leading or trailing overall. Any number greater than 1.00 indicates the player gets a greater share of ice time when the team is leading and anything under 1.00 indicates the player gets a greater share of ice time when the team is trailing. So, a player with an LT Index greater than 1.00 is used more as a defensive player than an offensive one, and a player with an LT Index less than 1.00 is used more as an offensive player than a defensive one. Any player around 1.00 is a well balanced player. So, looking at this season’s data we have the following player usage:

Defensive Usage

Defenseman LT Index Forward LT Index
MICHAEL STONE 1.21 BJ CROMBEEN 1.66
KEITH AULIE 1.20 MATHIEU PERREAULT 1.45
RYAN MCDONAGH 1.19 CRAIG ADAMS 1.35
PAUL MARTIN 1.16 TRAVIS MOEN 1.33
BRYCE SALVADOR 1.15 BOYD GORDON 1.26
BRENDAN SMITH 1.14 JAMES WRIGHT 1.26
SCOTT HANNAN 1.14 MICHAEL FROLIK 1.23
ANDREJ SEKERA 1.13 BRIAN BOYLE 1.22
MIKE WEBER 1.13 MATT CALVERT 1.22
JUSTIN BRAUN 1.12 TANNER GLASS 1.20
BARRET JACKMAN 1.12 MATT MARTIN 1.19
ROBYN REGEHR 1.12 RUSLAN FEDOTENKO 1.19
CLAYTON STONER 1.12 STEPHEN GIONTA 1.19
ANTON VOLCHENKOV 1.11 CASEY CIZIKAS 1.19
RON HAINSEY 1.11 JEFF HALPERN 1.18
TIM GLEASON 1.11 DAVID JONES 1.17
ROSTISLAV KLESLA 1.11 NIKOLAI KULEMIN 1.17
ROB SCUDERI 1.10 ZACK KASSIAN 1.17
NIKLAS HJALMARSSON 1.10 RYAN CARTER 1.16
NICKLAS GROSSMANN 1.10 TORREY MITCHELL 1.16

Offensive Usage

Defenseman LT Index Forward LT Index
RYAN ELLIS 0.78 DEREK DORSETT 0.77
KRIS LETANG 0.84 RAFFI TORRES 0.77
MARK STREIT 0.86 TAYLOR HALL 0.79
KYLE QUINCEY 0.87 CORY CONACHER 0.79
MATT NISKANEN 0.87 JORDAN EBERLE 0.80
JUSTIN SCHULTZ 0.87 NAIL YAKUPOV 0.82
DOUGIE HAMILTON 0.88 RYAN NUGENT-HOPKINS 0.82
VICTOR HEDMAN 0.88 RICH CLUNE 0.82
DAN BOYLE 0.89 BLAKE COMEAU 0.82
KEVIN SHATTENKIRK 0.89 KYLE PALMIERI 0.84
ALEX PIETRANGELO 0.89 BRENDAN GALLAGHER 0.84
JOHN-MICHAEL LILES 0.90 CLAUDE GIROUX 0.86
JOHN CARLSON 0.90 VINCENT LECAVALIER 0.86
P.K. SUBBAN 0.90 DREW SHORE 0.86
LUBOMIR VISNOVSKY 0.91 TJ OSHIE 0.87
CODY FRANSON 0.91 ALEX OVECHKIN 0.87
JAMIE MCBAIN 0.91 JONATHAN HUBERDEAU 0.87
ROMAN JOSI 0.92 NICKLAS BACKSTROM 0.87
JARED SPURGEON 0.93 SCOTT HARTNELL 0.87
CHRISTIAN EHRHOFF 0.93 MARIAN HOSSA 0.87

Balanced Usage

Defenseman LT Index Forward LT Index
MICHAEL DEL ZOTTO 0.99 BRYAN LITTLE 0.99
ERIC BREWER 0.99 MIKE FISHER 0.99
JAKUB KINDL 0.99 MIKKEL BOEDKER 0.99
ADRIAN AUCOIN 0.99 ALEXEI PONIKAROVSKY 0.99
ALEX GOLIGOSKI 0.99 JASON POMINVILLE 0.99
ERIK GUDBRANSON 1.00 CHRIS STEWART 0.99
DREW DOUGHTY 1.00 DANIEL BRIERE 1.00
THOMAS HICKEY 1.00 RADIM VRBATA 1.00
JOHNNY ODUYA 1.00 ALEX TANGUAY 1.00
SLAVA VOYNOV 1.00 GABRIEL LANDESKOG 1.00
MATT IRWIN 1.00 JIRI TLUSTY 1.00
FRANCIS BOUILLON 1.01 COLIN WILSON 1.00
JONAS BRODIN 1.01 PATRICK DWYER 1.00
BRENT SEABROOK 1.01 JADEN SCHWARTZ 1.01
JOSH GORGES 1.01 BRANDON SAAD 1.01
DUSTIN BYFUGLIEN 1.01 LEO KOMAROV 1.01
BRENDEN DILLON 1.01 DREW MILLER 1.01
GREG ZANON 1.01 DAVID PERRON 1.01
KRIS RUSSELL 1.02 TOM PYATT 1.01

It’s amazing how much more BJ Crombeen gets used protecting a lead than when trailing. You’d have to think that score effects could have a significant impact on his stats because of this. There are not really a lot of surprises there, though in the case of a guy like Derek Dorsett, being in the ‘offensive usage’ category has more to do with the coach not wanting to use him defending a lead than with hoping he will score a goal to get the team back in the game.
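For anyone who wants to compute the LT Index themselves, here is a minimal Python sketch of the definition given above. The function and parameter names are my own, and the ice times are invented numbers purely to show the arithmetic.

```python
def toi_pct(player_seconds, team_seconds):
    """Share of the team's ice time (in a given game state) that the player was on the ice."""
    return player_seconds / team_seconds


def lt_index(player_leading, team_leading, player_trailing, team_trailing):
    """LT Index = TOI% when leading / TOI% when trailing."""
    return toi_pct(player_leading, team_leading) / toi_pct(player_trailing, team_trailing)


# Hypothetical ice times in seconds (invented for illustration).
index = lt_index(
    player_leading=2400, team_leading=10000,   # on ice for 24% of leading time
    player_trailing=1600, team_trailing=8000,  # on ice for 20% of trailing time
)
print(f"LT Index: {index:.2f}")
```

Dividing by the team totals first is what keeps a player on a team that rarely trails from looking like a defensive specialist just because he has few trailing minutes.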

 

Apr 12 2013
 

Now that I have added home and road stats to stats.hockeyanalysis.com I can take a look at how quality of competition differs when the team is at home vs when they are on the road. In theory, because the home team has last change, they should be able to dictate the match ups better and thus drive QoC a bit more. Let’s take a look at the top 10 defensemen in HARO QoC last season at home and on the road (only defensemen with at least 400 5v5 home/road minutes were considered).

Player Name Home HARO QOC Player Name Road HARO QOC
GIRARDI, DAN 8.81 MCDONAGH, RYAN 6.73
MCDONAGH, RYAN 8.49 GORGES, JOSH 6.48
PHANEUF, DION 8.46 GIRARDI, DAN 6.03
GARRISON, JASON 8.27 SUBBAN, P.K. 5.95
GORGES, JOSH 8.25 PHANEUF, DION 5.94
GLEASON, TIM 8.21 GUNNARSSON, CARL 5.48
SUBBAN, P.K. 8.19 ALZNER, KARL 5.35
WEAVER, MIKE 7.92 STAIOS, STEVE 5.15
ALZNER, KARL 7.74 TIMONEN, KIMMO 4.95
REGEHR, ROBYN 7.72 WEAVER, MIKE 4.67

There are definitely a lot of common names in the two lists, but we do notice that the HARO QoC is greater at home than on the road for these defensemen. Next I took a look at the standard deviation of HARO QoC across all the defensemen with 400 5v5 home/road minutes last season, which should give us an indication of how much QoC varies from player to player.

Situation   StdDev
Home        3.29
Road        2.45

The standard deviation is 34% higher at home than on the road, which again confirms that the variation in QoC is greater at home than on the road. All of this makes perfect sense but it is nice to see it backed up in actual numbers.
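The 34% figure is just the ratio of the two standard deviations, which a short Python sketch can confirm. The second half shows how the spread itself could be computed from per-player values. Those sample lists are invented, and using the population standard deviation (`pstdev`) is my assumption, since the exact variant used is not stated.

```python
from statistics import pstdev


def pct_higher(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


# Standard deviations reported in the table above.
home_sd, road_sd = 3.29, 2.45
print(f"Home spread is {pct_higher(home_sd, road_sd):.0f}% higher than road")

# Hypothetical per-player HARO QoC values, only to show the calculation.
sample_home = [8.8, 4.1, 0.5, -2.0, -3.9]
sample_road = [6.7, 3.0, 0.4, -1.5, -2.8]
print(pstdev(sample_home), pstdev(sample_road))
```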