In his book Hockey Abstract, Rob Vollman talks about persistence and why it matters when deciding whether a particular statistic has value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for being able to call something a talent or a skill, and how closely it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let’s look at persistence first. The easiest way to measure persistence is to look at the correlation of a statistic over some chunk of time vs some future chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year over year correlation)? For some statistics, such as shooting percentage, it may be necessary to use even larger samples, such as 3 year shooting percentage vs future 3 year shooting percentage.

One mistake that many people make when doing this is to conclude that a lack of correlation, and thus a lack of persistence, means that the statistic is not a repeatable skill and is thus essentially random. The thing is, the method we use to measure persistence can be a major factor in how well we can separate persistence from true randomness. Let’s take two methods for measuring persistence:

1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.

Both methods split the data roughly in half, so each is a half-vs-half comparison. I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so that we get large sample sizes for the shooting percentage calculations. Here are the correlations we get.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
| --- | --- | --- | --- |
| GF20 vs GF20 | 0.61 | 0.89 | 0.28 |
| FF20 vs FF20 | 0.62 | 0.97 | 0.35 |
| FSh% vs FSh% | 0.51 | 0.73 | 0.22 |

GF20 is Goals for per 20 minutes of ice time. FF20 is fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick Shooting Percentage or goals/fenwick.
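To make the even vs odd method concrete, here is a minimal sketch and not the code behind the numbers above. It assumes a hypothetical data layout in which each player has a list of integer seconds he was on the ice and a list of seconds at which an on-ice event (say, a goal for) happened; the function names are mine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def even_odd_rates(on_ice_seconds, event_seconds):
    """Per-20-minute event rates in the even and odd seconds for one player.

    on_ice_seconds: sequence of integer seconds the player was on the ice
                    (hypothetical layout, assumed for illustration).
    event_seconds:  seconds at which an on-ice event occurred.
    """
    rates = []
    for parity in (0, 1):
        toi = sum(1 for s in on_ice_seconds if s % 2 == parity)
        events = sum(1 for s in event_seconds if s % 2 == parity)
        # 1200 seconds = 20 minutes of ice time
        rates.append(events / (toi / 1200) if toi else 0.0)
    return tuple(rates)
```

Collecting the even rate and the odd rate for every qualifying forward and passing the two lists to `pearson` gives one even vs odd correlation of the kind shown in the table.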

We can see that the level of persistence we identify is much greater when looking at even vs odd minute correlation than when looking at 3 year vs 3 year correlation. A different test of persistence gives us significantly different results. The reason for this is that there are a lot of other factors that come into play when looking at 3 year vs 3 year correlations than even vs odd correlations. In the even vs odd correlations factors such as quality of team mates, quality of competition, zone starts, coaching tactics, etc. are non-factors because they should be almost exactly the same in the even minutes as the odd minutes. This is not true for the 3 year vs 3 year correlation. The difference between the two methods is roughly the amount of the correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.
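The arithmetic in that last step is simple enough to write down. This is only a sketch of the reasoning in the paragraph above, using the correlations from the table; the names `context` and `randomness` are my own labels.

```python
# Correlations copied from the table above: (3 year vs 3 year, even vs odd)
CORRELATIONS = {
    "GF20": (0.61, 0.89),
    "FF20": (0.62, 0.97),
    "FSh%": (0.51, 0.73),
}


def decompose(three_year, even_odd):
    """Split what the 3 year correlation misses into two rough pieces."""
    # Gap between the two methods: roughly attributable to changing
    # team mates, competition, zone starts, coaching tactics, etc.
    context = round(even_odd - three_year, 2)
    # What even vs odd still cannot explain: true randomness.
    randomness = round(1.0 - even_odd, 2)
    return context, randomness


for stat, (three_year, even_odd) in CORRELATIONS.items():
    context, randomness = decompose(three_year, even_odd)
    print(f"{stat}: context {context:.2f}, randomness {randomness:.2f}")
```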

Now, let’s look at how well they correlate with a positive outcome, scoring goals. But instead of just looking at that, let’s combine it with persistence by looking at how well they predict ‘other half’ goal scoring.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
| --- | --- | --- | --- |
| FF20 vs GF20 | 0.54 | 0.86 | 0.33 |
| GF20 vs FF20 | 0.44 | 0.86 | 0.42 |
| FSh% vs GF20 | 0.48 | 0.76 | 0.28 |
| GF20 vs FSh% | 0.57 | 0.77 | 0.20 |

As you can see, both FF20 and FSh% are very highly correlated with GF20, but this is far more evident when looking at even vs odd than when looking at 3 year vs 3 year correlations. FF20 is more predictive of ‘other half’ GF20 than FSh% is, though not significantly so, and this is likely due solely to the greater randomness of FSh% (due to sample size constraints), since within the same half FSh% is more correlated with GF20 than FF20 is: the correlation between even FF20 and even GF20 is 0.75 while the correlation between even FSh% and even GF20 is 0.90.
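Part of the reason both stats track GF20 is pure arithmetic: from the definitions above, GF20 is FF20 multiplied by FSh%. A minimal sketch, assuming raw goal and fenwick counts plus ice time in minutes are available (the function name is hypothetical, and FSh% is returned as a fraction rather than a percentage):

```python
def rates(goals_for, fenwick_for, toi_minutes):
    """Return (GF20, FF20, FSh%) from raw on-ice counts."""
    gf20 = goals_for / toi_minutes * 20    # goals for per 20 minutes
    ff20 = fenwick_for / toi_minutes * 20  # fenwick for per 20 minutes
    fsh = goals_for / fenwick_for          # goals per fenwick event
    return gf20, ff20, fsh


gf20, ff20, fsh = rates(goals_for=50, fenwick_for=500, toi_minutes=1000)
# The identity GF20 = FF20 * FSh% holds up to floating point error.
assert abs(gf20 - ff20 * fsh) < 1e-9
```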

What is also interesting to note is that even vs odd provides greater benefit for identifying FF20 value and persistence than for FSh%. What this tells us is that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think what this means is that GMs are valuing shooting percentage players more than fenwick players and thus are more likely to maintain a core of shooting percentage players on their team while letting fenwick players walk. Eric T. found that teams reward players for high shooting percentage more than high corsi so this is likely the reason we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
| --- | --- | --- | --- |
| FF20 vs FSh% | 0.38 | 0.66 | 0.28 |
| FSh% vs FF20 | 0.22 | 0.63 | 0.42 |

It is interesting to note that fenwick rates are highly correlated with shooting percentages especially when looking at the even vs odd data. What this tells us is that the skills that a player needs to generate a lot of scoring chances are a similar set of skills required to generate high quality scoring chances. Skills like good passing, puck control, quickness can lead to better puck possession and thus more shots but those same skills can also result in scoring at a higher rate on those chances. We know that this isn’t true for all players (see Scott Gomez) but generally speaking players that are good at controlling the puck are good at putting the puck in the net too.

Finally, let’s look at one more set of correlations. When looking at the above correlations for players with >1000 minutes in each ‘half’ of the data there are a lot of players that have significantly more than 1000 minutes and thus their ‘stats’ are more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally less than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So, I took all the players that had more than 1000 even and odd minutes over the course of the past 6 seasons but only those that had fewer than 2600 minutes in total. In essence, I took all the players that have between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above and the results should tell us approximately how reliable (predictive) one season’s worth of data is for a front line forward, assuming they played in exactly the same situation the following season.
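The selection rule is easy to state in code. This is just a sketch of the filter described above; the dictionary keys `even_min` and `odd_min` are hypothetical field names, not anything from my database.

```python
def select_players(players, min_half=1000, max_total=2600):
    """Keep players with more than min_half minutes in each half
    and fewer than max_total minutes overall."""
    return [
        p for p in players
        if p["even_min"] > min_half
        and p["odd_min"] > min_half
        and p["even_min"] + p["odd_min"] < max_total
    ]


# Made-up players purely to show the three possible outcomes.
sample = [
    {"name": "A", "even_min": 1100, "odd_min": 1150},  # kept
    {"name": "B", "even_min": 1400, "odd_min": 1400},  # too many minutes
    {"name": "C", "even_min": 900, "odd_min": 950},    # too few minutes
]
print([p["name"] for p in select_players(sample)])
```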

| Comparison | Even vs odd |
| --- | --- |
| GF20 vs GF20 | 0.82 |
| FF20 vs FF20 | 0.93 |
| FSh% vs FSh% | 0.63 |
| FF20 vs GF20 | 0.74 |
| GF20 vs FF20 | 0.77 |
| FSh% vs GF20 | 0.65 |
| GF20 vs FSh% | 0.66 |
| FF20 vs FSh% | 0.45 |
| FSh% vs FF20 | 0.40 |

It should be noted that because of the way in which I selected the players (limited ice time over past 6 seasons) to be included in this calculation there is an abundance of 3rd liners with a few players that reached retirement (e.g. Sundin) and young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take the first 2600 minutes of each player and do even/odd on that but I am too lazy to try and calculate that data so the above is the best we have. There is far less diversity in the list of players used than in the NHL in general so it is likely that for any particular player with between 1000 and 1300 minutes of ice time the correlations are stronger.

So, what does the above tell us? Once you factor out year over year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all pretty highly persistent with just one year’s worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution, adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics or better analytical techniques or some combination of the two only time will tell, but in theory at least there should be a lot more reliable information within a single year’s worth of data than we are currently able to make use of.

Rob Vollman was nice enough to send me a PDF version of his book Hockey Abstract, which I have spent a bit of time looking over the past couple of days. I have not read it right through but have read over a few sections and skimmed through a good chunk of the rest of the book. I must say, if you are looking for a fairly readable, not math heavy, practical introduction to hockey statistics this is an excellent start. I like how Vollman uses statistics to answer very simple questions such as “Who is the best player?” and “Who is the luckiest team?” and in doing so explains why certain statistics are used and why some are not.

While I think Hockey Abstract is an excellent intro to hockey statistics and everything I read is useful and informative I think one of the most important paragraphs in the book is in the Introduction on page 1.

One of the most common and recurring criticisms of statistical analysis in hockey is that it isn’t comprehensive and foolproof, which is why I want to establish upfront that none of these answers are meant to be definitive. After all, in several cases this book will be the first serious attempt to answer certain questions this way. Plus, I love hockey arguments―I want to refuel the conversations, not end them!

One of the things that irritates me about some that use hockey statistics to make arguments is that they often write in absolutes. This player’s corsi isn’t very good therefore he is a bad player. That team’s PDO is high therefore they are lucky and undeserving of their record. While there is ample evidence to suggest a poor corsi is evidence of a bad player or a high PDO is evidence of a lucky team, suggesting these are absolutely true might be a false claim. There are a number of good players that have poor corsi statistics and there are teams with elevated PDOs that achieve them through talent, not luck.

There is a lot that statistics can tell us about an NHL player or team, but there is still a lot we don’t (fully) know and can’t yet (fully) quantify. For example, I’ll argue that we can’t yet properly isolate an individual’s contribution and talent from his line mates and the best we can do is infer it by looking at a series of WOWY numbers or other statistics, but that is far from foolproof. There is still a lot to learn about hockey stats and we need to fuel more discussions and research, not end them with absolute statements (from either side of the pro/anti stats fence).

So, with that in mind, Hockey Abstract is a great introduction to hockey analytics and presents a good view of much of what is currently known in the field. After reading Hockey Abstract I am certain one would be far more familiar with hockey statistics and how, why and when we use them. I know I won’t have a problem recommending it to any of the numerous people who e-mail me asking for where they can get an intro to hockey statistics. My recommendation though would be to read it with a critical, but open, mind whether you are a big proponent of the value of hockey analytics, a skeptic, or somewhere in between.

Time permitting I’ll write a more detailed review and critique once I have finished reading it.

You can purchase a copy of Hockey Abstract from Amazon.com or a .pdf copy here.

Last week I posted an article with a proposal to standardize the terminology and nomenclature we use for advanced hockey statistics. This was an attempt to solicit feedback as to whether such a standardization was necessary and if so get some feedback on my proposed terminology. The response at first was slow to trickle in but the responses that did were somewhat telling. Generally speaking, while not a ton of people had an opinion, the feedback on standardizing terminology and getting rid of names like Corsi, Fenwick and PDO was positive.

Interestingly, or maybe not, the biggest resistance to the change was from some of the more hard core advanced statistics people. From this group the feedback ranged from “it will sort itself out eventually” to “people attack the name corsi as a way to attack the stat itself”.

Upon challenging Eric T. more he said he was open to standardization but believed it would eventually happen one way or another so wasn’t worried about the details now. In other conversations, he referred back to baseball and how its stats are named.

There is a clear difference though. DIPS is Defense Independent Pitching Statistics. BABIP is Batting Average on Balls In Play. OPS is On base Plus Slugging percentage. Corsi is a shot attempt differential stat that includes shots, missed shots and blocked shots and is named after an obscure former NHL goalie who used a similar metric to evaluate the work load a goalie experiences in a game during his time as a goalie coach for the Buffalo Sabres. Yes, you need an explanation of what the baseball stats mean but once you get that explanation you say “ahhh, it makes sense.” Down the road, when you see BABIP, the name itself is a reminder of its definition. It is a lot easier to remember BABIP and what it stands for than it is to remember PDO and what it stands for. Now, this may not seem that difficult for someone who speaks and works with the terminology daily, but for the casual dabblers in advanced stats who might only occasionally read an article referencing them it becomes more difficult. There are no clues within PDO to trigger a memory response to recall its definition and this is even more difficult for Corsi and Fenwick where the differences are subtle. I have joked in the past that PDO stands for Pretty Damn Obfuscated because, well, it is, and yet the stat itself is conceptually trivial to understand and calculate from its component parts.

Some of the arguments I have read on the resisting changes side of the equation essentially translate into “the onus is on others to learn our terminology and not on us to make it easier for them to learn” and “if they cannot be bothered to look up what a term means they probably won’t be bothered to understand what the stat says.” In some cases there may be some merit to these arguments but they also come across as being somewhat arrogant or elitist in nature. ‘It is not our duty to make it easy for others to learn, it is their duty to take the time to learn what we do.’ It is probably not deliberate and we probably all do this sort of thing with respect to our own “fields of expertise” whatever they may be but that doesn’t mean it is right. We need to remind ourselves that being more accommodating and understanding of newcomers and casual observers is only going to benefit the field over the long haul.

I myself took offense to Eric T’s article “Steering advanced regression tools towards modern hockey thought” because when I hear the idea of steering other people’s work to a specific way of thinking it smacks of the same sort of elitism and arrogance (i.e. there being a “right way” and a “wrong way” to think of hockey analytics and one must conform to the right way). I am sure this was not deliberate in any way on Eric’s part but it was made worse by the fact there was no effort to understand and critique the work that was actually done, only to point out what was not done up to “modern hockey thought” standards (which are not clearly published anywhere, more on this later). With that said, in the “Modern Hockey Thought” post Eric did write one thing that I think we can work off of:

I think the baseball community consistently gets less than it could out of analytical experts like yourself because they are often directing their high-powered tools at the wrong problems. I’m hoping to help ensure that hockey, with its greater analytical challenges, gets as much out of your expertise as possible.

I am not well versed in the history of baseball analytics but if this is occurring in the hockey analytics world, one must look into the reasons why. The truth is, those of us within the hockey analytics community have done a terrible job at removing barriers to entry regardless of how small those barriers to entry may seem. It starts with standardizing terminology and making terminology more understandable but the hockey analytics community has done a terrible job of making its research easily accessible. There is no well organized and maintained glossary of statistics and definitions. There is no well organized and maintained list of important papers and articles. There is no single place that one can go to get an up to date description of the state of hockey analytics, of what we know and what we don’t know, what we agree on and where differing opinions exist. We expect everyone to be up to speed on the current state of hockey analytics but don’t seem to want to put any effort into making it possible for people to do so. There was an attempt by Eric last summer on NHLNumbers.com to document the current state of hockey analytics and create a directory of important articles but it wasn’t finished and what was done isn’t easily found. If you want to learn about hockey analytics google is really your only resource but that can lead you in the wrong direction just as easily as the right direction and more likely than not lead to people giving up. As a hockey analytics community we need to be better at organizing the knowledge we have and making it far more accessible and it’s unfortunate that my effort to do so with regards to standardization of statistic naming conventions was mostly met with a big “meh, whatever” by the analytics community, or even worse ‘it means more work for us’.

| Event Statistics | Description |
| --- | --- |
| TOI | Time on ice |
| G | Goals |
| A | Assists |
| FirstA | First Assists |
| SoG | Shots on goal |
| SA | Shot Attempts (includes missed and blocked shots, formerly a corsi event) |
| UBSA | UnBlocked Shot Attempts (does not include blocked shots, formerly a fenwick event) |

| Percentage Statistics | Description |
| --- | --- |
| Sh% | Shooting percentage (G/S) |
| SASh% | Shot Attempt Shooting percentage (G/SA) |
| UBSA-Sh% | Unblocked Shot Attempt Shooting percentage (G/UBSA) |
| Sv% | Save percentage (GA/SA) |
| SASv% | Shot Attempt Save percentage (GA/SAA) |
| UBSASv% | Unblocked Shot Attempt save percentage (GA/UBSAA) |
| SPS | Save Plus Shooting (percentages) |
| SASPS | Shot Attempt Save Plus Shooting (percentages) |
| UBSASPS | Unblocked Shot Attempt Save Plus Shooting (percentages) |

| Other Statistics | Description |
| --- | --- |
| IGP | Individual Goals Percentage (iG / GF) |
| IAP | Individual Assist Percentage (iA / GF) |
| IPP | Individual Points Percentage (iPts / GF) |
| ISP | Individual Shot Percentage (iS / SF) |
| ISAP | Individual Shot Attempt Percentage (iSA / SAF) |
| IUBSAP | Individual Unblocked Shot Attempt Percentage (iUBSA / UBSAF) |

| Zone Starts | Description |
| --- | --- |
| OZFO | Number of Offensive Zone Face Offs |
| NZFO | Number of Neutral Zone Face Offs |
| DZFO | Number of Defensive Zone Face Offs |
| OZFO% | Offensive Zone Face Off Percentage: OZFO / (OZFO + NZFO + DZFO) |
| NZFO% | Neutral Zone Face Off Percentage: NZFO / (OZFO + NZFO + DZFO) |
| DZFO% | Defensive Zone Face Off Percentage: DZFO / (OZFO + NZFO + DZFO) |
| OZBias | Offensive Zone Bias: (2*OZFO + NZFO) / (OZFO + NZFO + DZFO) |
| DZBias | Defensive Zone Bias: (2*DZFO + NZFO) / (OZFO + NZFO + DZFO) |
| OZFOW% | Offensive Zone Face Off Winning Percentage |
| NZFOW% | Neutral Zone Face Off Winning Percentage |
| DZFOW% | Defensive Zone Face Off Winning Percentage |
| FOW% | Face off win percentage (all zones) |

| Prefix | Description |
| --- | --- |
| i | Individual Stats |
| TM | Average stats of team/line mates weighted by TOI with |
| Opp | Stats of opposing players weighted by TOI against |
| PctTm | Percent of team’s stats the player recorded in games the player played in |

| Suffix | Description |
| --- | --- |
| F | Stats for the player’s team while player is on the ice |
| A | Stats against the player’s team while player is on the ice |
| 20 or /20 | Stats per 20 minutes of ice time |
| 60 or /60 | Stats per 60 minutes of ice time |
| F% | Percentage of events that are by the player’s own team (i.e. for) |
| D | Difference between For and Against statistics (i.e. a +/- statistic) |

The major changes I made were to use “SA” (shot attempts) for corsi events and “UBSA” (unblocked shot attempts) for fenwick events instead of ASAG (attempted shots at goals) and SAG (shots at goal) in my previous iteration. This should make things a little clearer than my first proposal.  Update: Changed Sv+Sh% to SPS (Save Plus Shooting percentages).

One of the complaints against advanced statistics in hockey is the names of some of the statistics. People sometimes complain that names like Corsi, Fenwick, PDO, etc. aren’t meaningful. I never really understood the complaint because once you look up what they mean, it honestly isn’t all that difficult. That said, it still seems that some people feel the names are a bit of a hurdle to getting into advanced hockey statistics. I am hoping to revamp and improve my hockey statistics database even more this summer and in the process I wondered if there is interest in having me use some standardized hockey statistics nomenclature that we can all agree on. Here is what I am proposing:

| Event Statistics | Description |
| --- | --- |
| TOI | Time on ice |
| G | Goals |
| A | Assists |
| FirstA | First Assists |
| SOG | Shots on goal |
| SAG | Shots at goal (includes missed shots) |
| ASAG | Attempted Shots at Goal (includes missed and blocked shots) |

| Percentage Statistics | Description |
| --- | --- |
| Sh% | Shooting percentage (G/SoG) |
| SAGSh% | Shots at goal shooting percentage (G/SaG) |
| ASAGSh% | Attempted Shots at Goal Shooting percentage (G/aSaG) |
| Sv% | Save percentage (G/SoG) |
| SAGSv% | Shots at goal save percentage (G/SaG) |
| ASAGSv% | Attempted Shots at Goal Save percentage (G/aSaG) |
| ShSv% | Shooting percentage + save percentage (Sh% + Sv%) |
| SAGShSv% | Shots at goal shooting percentage + save percentage (SAGSh% + SAGSv%) |
| ASAGShSv% | Attempted Shots at goal shooting percentage + save percentage (ASAGSh% + ASAGSv%) |

| Other Statistics | Description |
| --- | --- |
| IGP | Individual Goals Percentage (iG / GF) |
| IAP | Individual Assist Percentage (iA / GF) |
| IPP | Individual Points Percentage (iPts / GF) |
| ISOGP | Individual Shots on Goal Percentage (iSOG / SOGF) |
| ISAGP | Individual Shots at Goal Percentage (iSAG / SAGF) |
| IASAGP | Individual Attempted Shots at Goal Percentage (iASAG / ASAGF) |

| Zone Starts | Description |
| --- | --- |
| OZFO | Number of Offensive Zone Face Offs |
| NZFO | Number of Neutral Zone Face Offs |
| DZFO | Number of Defensive Zone Face Offs |
| OZFO% | Offensive Zone Face Off Percentage: OZFO / (OZFO + NZFO + DZFO) |
| NZFO% | Neutral Zone Face Off Percentage: NZFO / (OZFO + NZFO + DZFO) |
| DZFO% | Defensive Zone Face Off Percentage: DZFO / (OZFO + NZFO + DZFO) |
| OZBias | Offensive Zone Bias: (2*OZFO + NZFO) / (OZFO + NZFO + DZFO) |
| DZBias | Defensive Zone Bias: (2*DZFO + NZFO) / (OZFO + NZFO + DZFO) |
| OZFOW% | Offensive Zone Face Off Winning Percentage |
| NZFOW% | Neutral Zone Face Off Winning Percentage |
| DZFOW% | Defensive Zone Face Off Winning Percentage |
| FOW% | Face off win percentage (all zones) |

| Prefix | Description |
| --- | --- |
| i | Individual Stats |
| TM | Average stats of team/line mates weighted by TOI with |
| Opp | Stats of opposing players weighted by TOI against |
| PctTm | Percent of team’s stats the player recorded in games the player played in |

| Suffix | Description |
| --- | --- |
| F | Stats for the player’s team while player is on the ice |
| A | Stats against the player’s team while player is on the ice |
| 20 or /20 | Stats per 20 minutes of ice time |
| 60 or /60 | Stats per 60 minutes of ice time |
| F% | Percentage of events that are by the player’s own team (i.e. for) |
| D | Difference between For and Against statistics |

The major changes are instead of calling shots + missed shots fenwick events we call them Shots At Goal (SAG) and instead of calling shots + missed shots + blocked shots corsi events we call them Attempted Shots At Goal (ASAG). Also PDO which is shooting percentage + save percentage is now named ShSv%.
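For anyone who has not worked with PDO before, here is a rough sketch of how ShSv% comes together from on-ice counts. It assumes save percentage is computed the usual way, as the share of shots against that do not go in; the function and argument names are just for illustration.

```python
def sh_sv(goals_for, shots_for, goals_against, shots_against):
    """ShSv% (PDO) as a fraction: on-ice shooting % plus on-ice save %."""
    sh_pct = goals_for / shots_for
    # Assumption: save % is the share of shots against that are stopped.
    sv_pct = 1 - goals_against / shots_against
    return sh_pct + sv_pct


# Made-up counts: 10% shooting plus 92% save percentage.
print(round(sh_sv(10, 100, 8, 100), 3))
```

The same function works for the SAG and ASAG versions if you pass shots at goal or attempted shots at goal in place of shots on goal.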

The prefixes and suffixes can be added to individual stats to create new statistics. For example:

• iSh% = Individual Shooting Percentage (iG / iSOG)
• TMSAG20 = Team mate average Shots at Goal per 20 minutes of ice time weighted by TOI with
• OppGF% = Opponent average Goals For Percentage weighted by time on ice against
• PctTmG = In games that the player played in, the percentage of his teams goals that the player himself scored.

Note that not all combinations of prefixes and suffixes make sense (for example, PctTmSh% or Sh%F), but I think that is self explanatory.
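The naming scheme is mechanical enough to sketch in a few lines. This is only an illustration of how the names are assembled from the prefix and suffix lists above; it makes no attempt to reject combinations that don’t make sense, and `stat_name` is a hypothetical helper.

```python
PREFIXES = {"", "i", "TM", "Opp", "PctTm"}
SUFFIXES = {"F", "A", "20", "60", "F%", "D"}


def stat_name(base, prefix="", suffixes=()):
    """Build a statistic name as prefix + base stat + suffixes."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    for suffix in suffixes:
        if suffix not in SUFFIXES:
            raise ValueError(f"unknown suffix: {suffix!r}")
    return prefix + base + "".join(suffixes)


print(stat_name("Sh%", prefix="i"))                    # iSh%
print(stat_name("SAG", prefix="TM", suffixes=["20"]))  # TMSAG20
print(stat_name("G", prefix="Opp", suffixes=["F%"]))   # OppGF%
print(stat_name("G", prefix="PctTm"))                  # PctTmG
```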

What does everyone think? I am perfectly fine sticking with the way I have statistics currently presented but if the majority think something along the lines of the above is better I am all for making the change. If anyone has any other suggestions they are welcome as well. I just think that this is as good a time as any to come up with some standardized nomenclature.

Also, I currently have statistics for the following situations:

• 5v5
• 5v5 Home
• 5v5 Close
• 5v5 Tied
• 5v5 Up 1
• 5v5 Up 2+
• 5v5 Down 1
• 5v5 Down 2+
• 5v5 Trailing
• 5v4 PP
• 4v5 SH
• Zone start adjusted data for all of the above except 5v4 PP and 4v5 SH.

If there is interest I may consider adding other situations. For example, first period, second period, third period, 4v4, 5v5 close home and 5v5 close road. Would anyone find these or any other situation interesting to look at?

Also feel free to consider the comments of this post the place where you can officially make any other suggestions of upgrades/enhancements you would like to see made to stats.hockeyanalysis.com. I can’t make any promises that I will implement them but I hope to make some upgrades over the summer.

Update:  Added ‘D’ to the suffix list which stands for differential. So ASAGD would stand for Attempted Shots At Goal Differential which is the equivalent of corsi differential in use now. Might consider adding Rel but need to consider if it is necessary or not. Thoughts?

Unfortunately I didn’t have as much time this week as I had hoped to do a full evaluation of unrestricted free agent centers like I did for wingers. But it is free agent day, and there was some big news regarding centers yesterday with the buyout of Grabovski, so I thought I’d throw a little something together looking at some offensive statistics for the top centers available. Let me start off by presenting you with the summary table.

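| Player | G/60 | A/60 | Pts/60 | IPP | GF20-TMGF20 | FF20-TMFF20 | OZBias |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ribeiro | 0.593 | 1.512 | 2.11 | 80.5 | 0.113 | -0.025 | 102.6 |
| Filppula | 0.769 | 1.334 | 2.1 | 75 | 0.116 | -0.878 | 104.7 |
| Lecavalier | 0.799 | 1.186 | 1.99 | 68.1 | 0.139 | 0.381 | 100.7 |
| Grabovski | 0.899 | 0.961 | 1.86 | 65.4 | 0.196 | 2.406 | 96 |
| Roy | 0.587 | 1.146 | 1.73 | 67.4 | 0.039 | 0.747 | 98.7 |
| Weiss | 0.652 | 0.821 | 1.47 | 65.6 | 0.07 | -0.467 | 103.3 |
| Bozak | 0.566 | 0.775 | 1.34 | 54.2 | -0.062 | 0.292 | 99.8 |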

The numbers above are 5v5 numbers over the past 3 seasons and the players are sorted by Pts/60. I threw in Lecavalier because he was a UFA for a brief period of time and is at more or less the same level as the others. I included Bozak to highlight just how much he doesn’t fit in with the rest of the group.

• G/60 = Goals per 60 minutes of ice time.
• A/60 = Assists per 60 minutes of ice time
• Pts/60 = Points per 60 minutes of ice time.
• IPP = Individual Points Percentage, or the percentage of goals scored while on ice that the player had a point on.
• GF20-TMGF20 = How much better are his team mates on-ice goal stats when playing with him than without.
• FF20-TMFF20 = How much better are his team mates on-ice shot generation when playing with him than without.
• OZBias = (2 × OZ starts + NZ starts) / (OZ + NZ + DZ starts), which gives an indication of the player’s usage.
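For reference, a minimal sketch of two of these calculations. The OZBias formula follows the zone start definition from my nomenclature proposal; multiplying by 100 so that neutral usage comes out as 100 is an assumption I have added to match the scale of the numbers in the table. The function names are hypothetical.

```python
def per_60(count, toi_minutes):
    """Events per 60 minutes of ice time (G/60, A/60, Pts/60)."""
    return count / toi_minutes * 60


def oz_bias(oz_starts, nz_starts, dz_starts, scale=100):
    """Offensive zone bias: (2*OZ + NZ) / (OZ + NZ + DZ).

    scale=100 is an assumption so that a player with as many offensive
    as defensive zone starts gets exactly 100.
    """
    total = oz_starts + nz_starts + dz_starts
    return scale * (2 * oz_starts + nz_starts) / total


print(oz_bias(10, 10, 10))  # balanced usage
print(oz_bias(20, 10, 10))  # more offensive zone starts
```

A value above 100 means more offensive zone starts than defensive zone starts, and a value below 100 means the opposite.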

List sorted by G/60: Grabovski, Lecavalier, Filppula, Weiss, Ribeiro, Roy, Bozak

List sorted by A/60: Ribeiro, Filppula, Lecavalier, Roy, Grabovski, Weiss, Bozak

List sorted by Pts/60: Ribeiro, Filppula, Lecavalier, Grabovski, Roy, Weiss, Bozak

List sorted by IPP: Ribeiro, Filppula, Lecavalier, Roy, Weiss, Grabovski, Bozak

List sorted by GF20-TMGF20:  Grabovski, Lecavalier, Filppula, Ribeiro, Weiss, Roy, Bozak

List sorted by FF20-TMFF20: Grabovski, Roy, Lecavalier, Bozak, Ribeiro, Weiss, Filppula
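If you want to reproduce these rankings or sort by a different column, here is a small sketch using the numbers from the summary table above. Only the data comes from the table; the `ranked` helper is just for illustration.

```python
# Values copied from the summary table (5v5, past 3 seasons).
CENTERS = {
    "Ribeiro":    {"G/60": 0.593, "A/60": 1.512, "Pts/60": 2.11, "IPP": 80.5,
                   "GF20-TMGF20": 0.113, "FF20-TMFF20": -0.025},
    "Filppula":   {"G/60": 0.769, "A/60": 1.334, "Pts/60": 2.10, "IPP": 75.0,
                   "GF20-TMGF20": 0.116, "FF20-TMFF20": -0.878},
    "Lecavalier": {"G/60": 0.799, "A/60": 1.186, "Pts/60": 1.99, "IPP": 68.1,
                   "GF20-TMGF20": 0.139, "FF20-TMFF20": 0.381},
    "Grabovski":  {"G/60": 0.899, "A/60": 0.961, "Pts/60": 1.86, "IPP": 65.4,
                   "GF20-TMGF20": 0.196, "FF20-TMFF20": 2.406},
    "Roy":        {"G/60": 0.587, "A/60": 1.146, "Pts/60": 1.73, "IPP": 67.4,
                   "GF20-TMGF20": 0.039, "FF20-TMFF20": 0.747},
    "Weiss":      {"G/60": 0.652, "A/60": 0.821, "Pts/60": 1.47, "IPP": 65.6,
                   "GF20-TMGF20": 0.070, "FF20-TMFF20": -0.467},
    "Bozak":      {"G/60": 0.566, "A/60": 0.775, "Pts/60": 1.34, "IPP": 54.2,
                   "GF20-TMGF20": -0.062, "FF20-TMFF20": 0.292},
}


def ranked(stat):
    """Player names sorted from best to worst on the given column."""
    return sorted(CENTERS, key=lambda name: CENTERS[name][stat], reverse=True)


for stat in ("G/60", "A/60", "Pts/60", "IPP", "GF20-TMGF20", "FF20-TMFF20"):
    print(f"{stat}: {', '.join(ranked(stat))}")
```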

Some comments on each player:

Mike Ribeiro: Easily the best play maker of the group and is most consistently involved in the play.

Valtteri Filppula: Better goal scorer than Ribeiro and not as good a play maker, but a better play maker than the rest.

Vincent Lecavalier: Similar to Filppula in value but better at the possession game.

Mikhail Grabovski: Not a great play maker but a good finisher and good at driving shot generation indicating he is probably good at puck retrieval.

Derek Roy: Kind of a poor man’s Ribeiro but much less valuable.

Stephen Weiss: More of a poor man’s Lecavalier. Easily had the worst line mates of the group and might do better in a different situation.

Tyler Bozak: Weak at goal scoring, bad at play making, not involved in the play and a drag on his team mates’ goal production. Not anywhere close to the same league as the others (and may be better suited for a different league too).

For me, Ribeiro is probably the best of the group in terms of pure offense because of his elite play making ability. Grabovski and Lecavalier are a little more balanced with better scoring and puck retrieval skills while Filppula is pretty solid all round as well and has the flexibility of being used as either a center or a winger (which is valuable if locking in long-term). It’s difficult to compare Weiss to the rest because he simply hasn’t had near as good of line mates but it is probably safe to say he’d be a bit of a step down from Grabovski, Lecavalier or Filppula. Roy, on the other hand, would definitely be a step back but still a decent consolation prize if on a lower priced contract with shorter term. Definitely not anything more than a #2 center though.

As for Bozak, well, you simply don’t want him on your team. Maybe not at any price no matter what the bargain basement price is. I have tried and tried but I just can’t find any redeeming qualities for him outside of his ability to win face offs which has limited value. There simply is no reason why you would want to play him on any of your top 3 lines. None.

As a Leaf fan, and with the team unable to keep Grabovski, my preference would be Ribeiro or Filppula, but I might be willing to take a chance on Weiss if the contract was right. Ribeiro’s play making skills with the Leafs’ wingers should be a good combination and Filppula is a good all round player who could shift to wing if needed. Weiss seems like a solid 2-way player who might be able to step up his game with better line mates, which he’d get with the Leafs. If they sign Bozak, I am not sure what I’ll do. It’ll be a sad day.

This year’s free agent class is a relatively thin one, pending compliance buyouts of course, but there are a handful of good players that could be hitting the unrestricted free agent market this summer. Today I’ll take a look at the wingers.

In total I identified 15 wingers that I would consider quality NHL regulars. These are, in no particular order, Nathan Horton, Viktor Stalberg, Ryane Clowe, Mason Raymond, Clarke MacArthur, Patrik Elias, David Clarkson, Dan Cleary, Pascal Dupuis, Brad Boyes, Alexei Ponikarovsky, Jarome Iginla, Michael Ryder, Bryan Bickell, and Matt Cooke. I have omitted from the list Teemu Selanne and Daniel Alfredsson since if they do return it will almost certainly be with the Ducks and Senators respectively. I have also omitted Damien Brunner because he doesn’t have enough of a track record as I am looking at 3 seasons of data in my statistical evaluation. I have also omitted Jaromir Jagr because, well, for some reason I forgot to include him and couldn’t be bothered to go back and plug him into all the tables. He still has some value, but I am not sure how significant it is.

(Note that unless mentioned otherwise, the stats below are 5v5 stats over the past 3 seasons)

Offensive Evaluation

In order to attempt to isolate a player’s offensive production from his team mates, one thing I like to do is compare his own on-ice stats with the on-ice stats of his team mates when they are playing apart from him. To do this I took each player’s FF20 and GF20 (fenwick for and goals for per 20 minutes of ice time) and divided by teammate FF20 and teammate GF20 respectively. Here is how the wingers stack up against each other.

| Winger | FF20/TMFF20 | Winger | GF20/TMGF20 |
|---|---|---|---|
| Viktor Stalberg | 1.180 | Patrick Elias | 1.358 |
| Nathan Horton | 1.138 | Nathan Horton | 1.343 |
| Ryane Clowe | 1.087 | Jarome Iginla | 1.290 |
| Mason Raymond | 1.083 | Pascal Dupuis | 1.188 |
| Clarke MacArthur | 1.076 | Viktor Stalberg | 1.124 |
| Patrick Elias | 1.074 | Michael Ryder | 1.116 |
| David Clarkson | 1.066 | Clarke MacArthur | 1.111 |
| Dan Cleary | 1.049 | Ryane Clowe | 1.075 |
| Pascal Dupuis | 1.048 | Bryan Bickell | 1.058 |
| Brad Boyes | 1.044 | Brad Boyes | 1.042 |
| Alexei Ponikarovsky | 1.018 | Mason Raymond | 1.037 |
| Jarome Iginla | 1.017 | Matt Cooke | 0.962 |
| Michael Ryder | 0.999 | Alexei Ponikarovsky | 0.896 |
| Matt Cooke | 0.917 | Dan Cleary | 0.892 |
| Bryan Bickell | 0.896 | David Clarkson | 0.874 |
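For anyone wanting to reproduce the ratios in this table, the calculation is just one rate divided by another. Here is a minimal Python sketch; the function name and the input numbers are made up for illustration and are not taken from any player above.

```python
def relative_rate(player_rate: float, teammate_rate: float) -> float:
    """Ratio of a player's on-ice rate to his teammates' rate when apart from him.

    A value above 1.0 means the team generates more with him on the ice.
    """
    if teammate_rate <= 0:
        raise ValueError("teammate rate must be positive")
    return player_rate / teammate_rate


# Hypothetical inputs (not from the tables): fenwick for per 20 minutes.
ff20 = 14.2      # player's on-ice FF20
tm_ff20 = 13.0   # teammates' FF20 when playing apart from him

print(round(relative_rate(ff20, tm_ff20), 3))
```

The same function works for GF20/TMGF20; only the inputs change.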

Based on the above lists you’d probably have to rank Horton, Stalberg and Elias as the top 3, with MacArthur and Clowe not far behind, while Cooke, Ponikarovsky and Bickell don’t look so good in comparison. Those are on-ice stats though. How do their individual stats look in comparison?

| Winger | G/60 | Winger | Points/60 |
|---|---|---|---|
| Nathan Horton | 1.111 | Pascal Dupuis | 2.28 |
| Jarome Iginla | 0.987 | Nathan Horton | 2.22 |
| Pascal Dupuis | 0.985 | Jarome Iginla | 2.09 |
| Viktor Stalberg | 0.964 | Viktor Stalberg | 2.03 |
| Michael Ryder | 0.941 | Patrick Elias | 2.01 |
| David Clarkson | 0.846 | Michael Ryder | 1.99 |
| Clarke MacArthur | 0.802 | Clarke MacArthur | 1.97 |
| Bryan Bickell | 0.779 | Bryan Bickell | 1.86 |
| Matt Cooke | 0.743 | Brad Boyes | 1.70 |
| Dan Cleary | 0.722 | Ryane Clowe | 1.70 |
| Patrick Elias | 0.700 | Matt Cooke | 1.69 |
| Mason Raymond | 0.645 | Dan Cleary | 1.69 |
| Ryane Clowe | 0.610 | Mason Raymond | 1.68 |
| Brad Boyes | 0.544 | David Clarkson | 1.28 |
| Alexei Ponikarovsky | 0.462 | Alexei Ponikarovsky | 1.20 |

Horton, Dupuis, Iginla and Stalberg take the top 4 spots on both lists while Ponikarovsky is last on both. Individual stats are heavily influenced by quality of line mates, so one measure I like to look at is the percentage of the goals a player’s team scores while he is on the ice that he scored himself (IGP) or had a point on (IPP). The higher the percentage, the more integral the player is to his team’s offense when he is on the ice.

| Winger | IGP | Winger | IPP |
|---|---|---|---|
| David Clarkson | 50.7 | Patrick Elias | 82.1 |
| Jarome Iginla | 35.6 | David Clarkson | 76.7 |
| Viktor Stalberg | 34.9 | Bryan Bickell | 75.5 |
| Michael Ryder | 33.9 | Jarome Iginla | 75.2 |
| Nathan Horton | 33.1 | Clarke MacArthur | 73.5 |
| Bryan Bickell | 31.6 | Viktor Stalberg | 73.4 |
| Dan Cleary | 31.2 | Dan Cleary | 73.1 |
| Pascal Dupuis | 30.5 | Michael Ryder | 71.8 |
| Matt Cooke | 30.5 | Ryane Clowe | 70.9 |
| Clarke MacArthur | 29.9 | Pascal Dupuis | 70.8 |
| Patrick Elias | 28.6 | Brad Boyes | 69.9 |
| Mason Raymond | 26.4 | Matt Cooke | 69.5 |
| Ryane Clowe | 25.5 | Mason Raymond | 69.0 |
| Alexei Ponikarovsky | 25.0 | Nathan Horton | 66.1 |
| Brad Boyes | 22.3 | Alexei Ponikarovsky | 64.7 |
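As a rough sketch of how IGP and IPP are calculated, both are simply a player's own goals or points expressed as a percentage of the goals his team scored while he was on the ice. The function name and the sample counts below are hypothetical and only there to show the arithmetic.

```python
def igp_ipp(goals: int, points: int, on_ice_goals_for: int) -> tuple[float, float]:
    """Return (IGP, IPP) as percentages of on-ice team goals."""
    if on_ice_goals_for <= 0:
        raise ValueError("need at least one on-ice goal for")
    igp = 100.0 * goals / on_ice_goals_for    # share of on-ice goals he scored
    ipp = 100.0 * points / on_ice_goals_for   # share he had a point on
    return igp, ipp


# Hypothetical winger: 15 goals and 33 points on 45 on-ice goals for.
igp, ipp = igp_ipp(goals=15, points=33, on_ice_goals_for=45)
print(f"IGP {igp:.1f}, IPP {ipp:.1f}")
```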

David Clarkson didn’t look so good in previous lists but when he is on the ice he is a major contributor to the team’s offense. Put him with some better offensive players and it is possible he could significantly boost his offensive production. The same can probably be said for Bryan Bickell, who has been given more ice time on the Blackhawks’ top lines these playoffs and has produced well above his regular season rates. He could be a good bargain pickup for a team that could get good production from him as a second line winger.

Defensive Evaluation

Defensive evaluation is much tougher than offensive evaluation and I think in general wingers are the least important position as far as team defense goes. The best way to evaluate a player defensively is to compare his on-ice stats with those of his team mates. Similar to what I did above with FF20 and GF20, I looked at TMFA20/FA20 and TMGA20/GA20.

| Winger | TMFA20/FA20 | Winger | TMGA20/GA20 |
|---|---|---|---|
| Alexei Ponikarovsky | 1.150 | Alexei Ponikarovsky | 1.206 |
| Patrick Elias | 1.122 | Clarke MacArthur | 1.174 |
| Clarke MacArthur | 1.083 | Brad Boyes | 1.150 |
| David Clarkson | 1.069 | David Clarkson | 1.097 |
| Viktor Stalberg | 1.063 | Bryan Bickell | 1.086 |
| Nathan Horton | 1.052 | Pascal Dupuis | 1.078 |
| Ryane Clowe | 1.038 | Viktor Stalberg | 1.003 |
| Matt Cooke | 1.005 | Patrick Elias | 0.976 |
| Bryan Bickell | 1.001 | Michael Ryder | 0.954 |
| Brad Boyes | 0.996 | Matt Cooke | 0.948 |
| Dan Cleary | 0.973 | Jarome Iginla | 0.937 |
| Michael Ryder | 0.971 | Ryane Clowe | 0.933 |
| Jarome Iginla | 0.953 | Dan Cleary | 0.879 |
| Mason Raymond | 0.951 | Mason Raymond | 0.858 |
| Pascal Dupuis | 0.918 | Nathan Horton | 0.830 |

Ponikarovsky, MacArthur and Clarkson seem to be the best in the class here, with Raymond, Cleary and Iginla probably trailing the pack overall.

Overall Evaluation

There is nothing too scientific in this but if I had to rank the wingers in terms of value this is how I would rank them, with probably more emphasis on offensive value.

1. Iginla – Perfect for a team that is close and looking for some help over the next couple of seasons.
2. Clarkson – I am surprised I am ranking Clarkson over Horton but he comes out ahead in more categories and may come cheaper. I’d still be cautious about over paying but he has scored a bunch of goals on a bad offensive team so that is good.
3. Horton – I really like Horton but injuries have to be a concern and he’ll likely demand a big contract. He is a first line guy though and would be a big addition to any team. Has a longer track record than Clarkson too so less risky (health issues aside).
4. MacArthur – Good all-round winger ideal for a second line role or as a secondary player on a first line.
5. Elias – Age is starting to show but still very solid. Probably stays in New Jersey on short term deal.
6. Stalberg – Not quite as proven against top competition as MacArthur but similar potential.
7. Ryder – All he seems to do is score goals and still can be a 30 goal guy if given top line duty. Less rugged version of Clarkson.
8. Dupuis – Likely sticks in Pittsburgh and continues benefiting from playing a bunch on Crosby’s wing.
9. Bickell – Probably worth taking a gamble on and playing in a second line role. Might be a 20 goal, 50 point guy in that role.
10. Cooke – More useful for his PK skills. Decent 3rd line guy but limited offense.
11. Boyes – Decent offensive depth guy if on a good value contract. Probably re-signs with Islanders as he probably has more value to them than anyone else. Probably gets more (and higher quality) ice time than he deserves.
12. Cleary – Not as productive as he was a few years ago but still has some value as a 2nd/3rd line winger.
13. Clowe – Probably best as a 3rd line guy you hope you can get some toughness and secondary offense from.
14. Raymond – From afar he seems like the guy you always hope can be more but never is.
15. Ponikarovsky – He is kind of like Cooke minus the agitator/cheap shot track record. Solid defensive 3rd liner at this point in his career.

If you have been following the discussion between Eric T and me you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t and want to read the whole debate you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial and that the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics so we can’t see where it is still failing miserably) in which he used time on ice to choose the mean to which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13, the power of his regressed GF20 to predict future GF20 was 0.66, which is 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05 while 0.34 still remains unexplained. The question I attempt to answer today is: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted for skill/situational variance?

When we look at 2007-10 GF20 and compare it to 2010-13 GF20 there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of team mates, a change in coaching style, natural career progression of the player, zone start usage, and possibly any number of other factors that we do not currently know about, as well as true randomness. To get a true measure of the random component of a player’s stats despite all of these non-random factors that we do not yet know how to fully adjust for, we need two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games that have been played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are more or less equivalent across halves. The following table shows how well the even half predicts the odd half based on how many total minutes (across both halves) the player has played.

| Total Minutes | GF20 vs GF20 |
|---|---|
| >500 | 0.79 |
| >1000 | 0.85 |
| >1500 | 0.88 |
| >2000 | 0.89 |
| >2500 | 0.88 |
| >3000 | 0.88 |
| >4000 | 0.89 |
| >5000 | 0.89 |

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half) the upper bound on true randomness is 0.21 while the predictive power of GF20 is 0.79. With greater than 1000 minutes randomness drops to 0.15 and with greater than 1500 minutes and above the randomness is around 0.11-0.12. It’s interesting that setting the minimum above 1500 minutes (~750 in each even/odd half) of data doesn’t necessarily reduce the true randomness in GF20 which seems a little counter intuitive.
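To make the even/odd split concrete, here is a small Python sketch of the idea. It does not use the real game data: the players, talent levels and game counts are randomly generated stand-ins, and the helper names are my own. It splits each player's games into even and odd halves, computes GF20 in each half, and correlates the two across players, treating one minus the correlation as the upper bound on randomness in the same way as above.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def split_half_gf20(games):
    """games: list of (goals_for, minutes) per game in chronological order.

    Returns (GF20 over even-indexed games, GF20 over odd-indexed games).
    """
    halves = []
    for parity in (0, 1):
        part = games[parity::2]
        goals = sum(g for g, _ in part)
        minutes = sum(m for _, m in part)
        halves.append(20.0 * goals / minutes)
    return tuple(halves)


# Synthetic stand-in data (an assumption, not NHL data): 150 players,
# 300 games each, 12 on-ice minutes per game, fixed "true" GF20 per player.
random.seed(1)
evens, odds = [], []
for _player in range(150):
    talent = random.uniform(0.6, 1.2)   # true goals for per 20 minutes
    p_goal = talent / 20.0              # chance of a goal in any one minute
    games = [
        (sum(random.random() < p_goal for _minute in range(12)), 12.0)
        for _game in range(300)
    ]
    even, odd = split_half_gf20(games)
    evens.append(even)
    odds.append(odd)

r = pearson(evens, odds)
print(f"even vs odd correlation: {r:.2f}, upper bound on randomness: {1 - r:.2f}")
```

Because talent is held fixed for each synthetic player, everything that keeps the correlation below 1.0 here is randomness by construction, which is the situation the even/odd split is trying to approximate with real data.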

Let’s take a look at the predictive power of fenwick shooting percentage in even games to predict fenwick shooting percentage in odd games.

| Total Minutes | FSh% vs FSh% |
|---|---|
| >500 | 0.54 |
| >1000 | 0.64 |
| >1500 | 0.71 |
| >2000 | 0.73 |
| >2500 | 0.72 |
| >3000 | 0.73 |
| >4000 | 0.72 |
| >5000 | 0.72 |

Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time and there appears to be no benefit to increasing the minimum minutes played beyond that.

To summarize what we have learned, here are the numbers for forwards with >1000 minutes in each of 2007-10 and 2010-13.

| Measure | Value |
|---|---|
| GF20 predictive power 3yr vs 3yr | 0.61 |
| True Randomness Estimate | 0.11 |
| Unaccounted for factors estimate | 0.28 |
| Eric T’s regression benefit | 0.05 |

There is no denying that a regression algorithm can provide modest improvements, but it only addresses about 30% of what GF20 is failing to predict, and it is highly doubtful that efforts to improve the regression algorithm any more will result in anything more than marginal benefits. The real benefit will come from researching the other 70% we don’t know about. It is a much more difficult question to answer but the benefit could be far more significant than any regression technique.
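The 30% and 70% figures come straight from the summary numbers, using the same additive accounting as the table. A quick Python check of that arithmetic (the variable names are mine):

```python
three_year_r = 0.61   # GF20 predictive power, 2007-10 vs 2010-13
randomness = 0.11     # true randomness estimate from the even/odd split

unexplained = 1.0 - three_year_r          # what raw GF20 fails to predict
unaccounted = unexplained - randomness    # left over after removing randomness

share_random = randomness / unexplained        # the "30%" in the text
share_unaccounted = unaccounted / unexplained  # the "70%" in the text

print(f"unexplained {unexplained:.2f}, unaccounted {unaccounted:.2f}")
print(f"random share {share_random:.0%}, unaccounted share {share_unaccounted:.0%}")
```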

Addendum: After doing the above I thought, why not take this all the way and instead of doing even and odd games do even and odd seconds so what happens one second goes in one bin and what happens the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed but the predictive power of GF20 increases marginally, particularly when dealing with lower minute cutoffs.

| Total Minutes | GF20 vs GF20 | FSh% vs FSh% |
|---|---|---|
| >500 | 0.81 | 0.58 |
| >1000 | 0.86 | 0.68 |
| >1500 | 0.88 | 0.71 |
| >2000 | 0.89 | 0.73 |
| >2500 | 0.89 | 0.73 |
| >3000 | 0.90 | 0.75 |
| >4000 | 0.90 | 0.73 |
| >5000 | 0.89 | 0.71 |

Yesterday a paper came across my twitter feed about using regularized logistic regression to estimate player contribution in hockey. I skimmed through it, not enough to fully understand it, but found some of the conclusions at least mildly interesting. This post is neither in support of nor against the paper but rather a rebuttal to a rebuttal from Eric T at NHLNumbers.com.

To summarize the paper, the authors conducted a goal based analysis to estimate player contribution and to summarize Eric T’s rebuttal, Eric T applauded the effort but suggested a shot based analysis would be more appropriate because that is where ‘modern hockey thought’ currently stands.

I think my biggest concern is that by focusing exclusively on goals, you allow for shooting percentage variance to have a significant impact on a player’s calculated value. Even with four years of data, variance plays a large role in the shooting and save percentages with a given player on the ice.

This is why much of modern hockey analysis starts with shot-based metrics; the shooting percentages introduce a lot of variance which must be accounted for to get a reasonable assessment of talent. If you used shots for your model, I suspect you’d easily identify more than a mere 60 players who have significantly non-zero talent levels — and the model could be further refined from there (e.g. give each shot a weight based on the shooter’s career shooting percentage).

That is in essence Eric T’s argument. Shooting percentages are unreliable so it is better to use a shot based approach (though I find it a little ironic that he then suggests incorporating shooting percentage again).

The “even with four years of data, variance plays a large role in shooting and save percentages with a given player on the ice” statement is the one that I have the biggest problem with. I have shown many times that goal scoring rates are a better predictor of future goal scoring than shot rates are when dealing with multiple seasons of data. Furthermore, any study that uses sufficient amounts of data (either by using multiple seasons of data or by grouping similar players and using their aggregate shooting percentage) has concluded that shot quality (the ability to sustain an elevated shooting percentage) exists and is significant. For example, we know that players that get a significant amount of ice time have significantly higher shooting percentages (see here and here and here), and just by looking at a list of players sorted by their long-term on-ice shooting percentages we see that good offensive players rise to the top and poor offensive players fall to the bottom (in no way can anyone conclude that that list is random in nature). There is ample evidence to suggest that with 4 years of data goal based metrics should be the preferred tool over shot/possession based metrics.

Eric T brought up Dwayne Roloson, Kent Huskins, Sean O’Donnell, and others as examples of where he feels the evaluation system failed, but pointing out a few counter examples is not enough to toss the analysis out completely. There will always be exceptions and outliers when attempting to build an all-encompassing evaluation metric. For the methodology in the paper maybe it is Roloson and Huskins, but I can assure you that for any shot based metric it will be Tyler Kennedy and Scott Gomez.

The standard against which an all-encompassing metric should be tested is not “is it perfect”, with anything that fails that test tossed aside and ignored forever. These metrics will never be perfect and should never be used as the final say on a player’s value. In truth, they should be used to spark conversation, discussion and further investigation, not end it. When we see strange results, just as we shouldn’t assume they are true, we shouldn’t assume the whole methodology is worthless.

Furthermore, making any argument against a new methodology because it doesn’t conform to “modern hockey thought” and suggesting they revise it to make it conform more to “modern hockey thought” is plainly the worst thing one can do. The best discoveries in the history of humanity typically arise when people don’t conform to current thought processes but rather do something different. You are free to make an argument against something but make sure that argument is something deeper than “it doesn’t conform to modern hockey thought.”

Finally, my biggest beef with many in the pro corsi/possession/shot differential crowd is the way in which many immediately and abjectly dismiss anything that strays from a corsi/possession/shot differential analysis. This is as fundamentally misguided as those that claim that corsi/possession/shot differential is meaningless and goals are the only tool one should use in player evaluation. The truth is, both methods provide value. The possession method primarily provides value when dealing with small sample sizes as it will reduce small sample size and random variance issues. Shot differential metrics are inherently flawed though, because shot differential isn’t the end goal of the player (goal differential is what matters in the win/loss column) and because shot quality and the ability to drive/suppress shooting percentages exist and are real. There is nothing wrong with using possession metrics as an evaluation tool so long as we are aware of this limitation, just as there is nothing wrong with using goal based metrics as an evaluation tool so long as we are aware of their sample size, randomness and uncertainty limitations. Neither is perfect, both have their uses, both have their limitations, and in reality both should be considered in any player evaluation.

(Note: Just to be clear, because apparently Tyler Dellow has a poor ability to interpret words properly, my critique of Eric T’s critique of the goal based all-encompassing player evaluation metric does not in any way mean that I believe Dwayne Roloson helps his team score goals. To be completely honest, I seriously question how the authors of the paper incorporate goalies into the methodology and this is supported by the fact that in my own all-encompassing player evaluation metrics – goal or shot based – I assume goalies have no influence on a team’s offensive production. Hope this clears the issue up for Tyler.)

Last week there was a twitter discussion on the merits of playing a defensive shell game by limiting scoring chances against but also limiting scoring chances for, even if it meant the ratio of goals for to goals against gets worse. The two sides of the debate are as follows:

Argument 1: It is always best to play a game where you are expected to out score the opposition regardless of the goals for/against rates.

Argument 2: When playing with a lead late in the game it is more important to reduce the goals against rate than maintain the goals for rate, even if it means the goals for to goals against ratio drops significantly.

To test each theory I simulated a number of games between teams T1 and T2 according to the following theories:

1. During normal play between teams T1 and T2, T1 will score at a rate of 2.75 goals/60 minutes and T2 will score at a rate of 2.50 goals/60 minutes. During this play it is expected that T1 will score approximately 52.4% of all the goals that are scored.

2. During play between T1 and T2 when T1 has a lead and is playing in defensive shell mode, T1 will score at a rate of 2.00 goals/60 and T2 will score at the same 2.00 goals/60 rate.

From there I simulated 1,000,000 games in which T1 is protecting a 1 goal lead for the remaining 2.5, 5, 7.5, 10, 12.5, 15, 17.5 and 20 minutes of a game under both normal style play and defensive shell style play. Here are the results at the end of regulation play.

Normal play

| Time remaining | Wins | Losses | Ties | RegWin% | OTL Pts% | PlayoffWin% |
|---|---|---|---|---|---|---|
| 2.5mins | 911132 | 4471 | 99307 | 96.08% | 93.60% | 96.32% |
| 5mins | 847011 | 15230 | 187894 | 94.10% | 89.40% | 94.54% |
| 7.5mins | 799667 | 28880 | 268711 | 93.40% | 86.68% | 94.04% |
| 10mins | 764672 | 44692 | 340642 | 93.50% | 84.98% | 94.31% |
| 12.5mins | 738696 | 59869 | 405525 | 94.15% | 84.01% | 95.11% |
| 15mins | 717679 | 75094 | 464680 | 95.00% | 83.38% | 96.11% |
| 17.5mins | 702071 | 88968 | 518004 | 96.11% | 83.16% | 97.34% |
| 20mins | 690638 | 102013 | 565261 | 97.33% | 83.20% | 98.67% |

Defensive Shell

| Time remaining | Wins | Losses | Ties | RegWin% | OTL Pts% | PlayoffWinRate |
|---|---|---|---|---|---|---|
| 2.5mins | 926241 | 3011 | 79934 | 96.62% | 94.62% | 96.81% |
| 5mins | 868285 | 10599 | 153384 | 94.50% | 90.66% | 94.86% |
| 7.5mins | 821835 | 21109 | 221668 | 93.27% | 87.73% | 93.79% |
| 10mins | 785935 | 32888 | 283819 | 92.78% | 85.69% | 93.46% |
| 12.5mins | 755920 | 46048 | 341509 | 92.67% | 84.13% | 93.48% |
| 15mins | 733346 | 58874 | 392918 | 92.98% | 83.16% | 93.92% |
| 17.5mins | 713419 | 72115 | 442202 | 93.45% | 82.40% | 94.50% |
| 20mins | 697687 | 85092 | 486930 | 94.12% | 81.94% | 95.27% |

Wins, losses and ties are T1's record after 60 minutes, and regulation win% is the standard regulation winning percentage using 2 points for a win, 0 points for a loss and 1 point for a tie. PlayoffWinRate is the winning percentage of T1 in a playoff game assuming that they would win 52.4% of all overtime games. OTL Pts% is the current regular season system where you get 1 point for an overtime loss, 2 points for a win of any kind and zero points for a regulation loss (under this system, for simplicity's sake, I assumed a 50% chance of winning an overtime game since we don’t know the odds of winning a shoot out).
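For readers who want to experiment with this kind of scenario themselves, here is a rough Python sketch. To be clear, it is not the simulation code behind the tables above and is not expected to reproduce those figures exactly. Instead of simulating 1,000,000 games it tracks the exact probability of each goal differential one second at a time, which is a different technique. It also makes some assumptions of its own: at most one goal per second, the shell rates apply only while T1 is actually ahead, and a tie after regulation is worth an expected 1.5 of 2 points under the overtime loss system. All function names are made up for this example.

```python
def lead_outcomes(minutes, shell, normal=(2.75, 2.50), shell_rates=(2.00, 2.00)):
    """Return (win, tie, loss) probabilities for T1 after `minutes` of play,
    starting from a one-goal lead. Rates are goals per 60 minutes (T1, T2)."""
    dist = {1: 1.0}  # goal differential (T1 minus T2) -> probability
    for _second in range(round(minutes * 60)):
        nxt = {}
        for diff, p in dist.items():
            if p < 1e-18:
                continue  # drop negligible states to keep the table small
            # Assumption: shell rates only while T1 leads, normal play otherwise.
            r1, r2 = shell_rates if (shell and diff > 0) else normal
            p1, p2 = r1 / 3600.0, r2 / 3600.0  # per-second scoring chances
            nxt[diff + 1] = nxt.get(diff + 1, 0.0) + p * p1
            nxt[diff - 1] = nxt.get(diff - 1, 0.0) + p * p2
            nxt[diff] = nxt.get(diff, 0.0) + p * (1.0 - p1 - p2)
        dist = nxt
    win = sum(p for d, p in dist.items() if d > 0)
    tie = dist.get(0, 0.0)
    loss = sum(p for d, p in dist.items() if d < 0)
    return win, tie, loss


def summarize(win, tie, loss, ot_win=0.524):
    """Turn outcome probabilities into the three measures described above."""
    reg = win + 0.5 * tie         # 2 for a win, 1 for a tie, as share of max
    otl = win + 0.75 * tie        # tie: 1 point plus a 50% shot at the second
    playoff = win + ot_win * tie  # ties decided in overtime
    return reg, otl, playoff


print(f"T1 share of goals in normal play: {2.75 / (2.75 + 2.50):.1%}")
for minutes in (2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20):
    for label, shell in (("normal", False), ("shell", True)):
        w, t, l = lead_outcomes(minutes, shell)
        reg, otl, playoff = summarize(w, t, l)
        print(f"{minutes:>4} min {label:>6}: win {w:.4f} tie {t:.4f} loss {l:.4f} "
              f"reg {reg:.4f} otl {otl:.4f} playoff {playoff:.4f}")
```

Changing the `normal` and `shell_rates` arguments is an easy way to see how sensitive the break even point is to the assumed scoring rates.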

That is a lot of numbers, so let’s look at these in nicer, easier to read charts.

Under this constructed scenario the break even point for when to go into a defensive shell and when to continue playing normal hockey is at about 7-7.5 minutes for regulation win % and playoff win % systems and about 13 minutes for the point for an overtime loss system currently used during the regular season.

For some people this may not make sense intuitively. How can it be better to stop playing a system in which you are expected to out score your opposition and start playing a system in which you are expected to score the same as your opponent? The reason is simple: over a short period of time you are essentially dealing with small sample size issues and randomness becomes more important than long term skill. The reality is, over a short time one team is almost as likely to score as the other, so which team scores next is close to random, if any team scores at all. The most important thing when protecting a lead is simply reducing the likelihood that your opponent will score, because the cost of your opponent scoring is far greater than the benefit of you scoring (it is irrelevant whether you win 3-1 or 2-1, a win is a win in the standings).

What is interesting is that awarding the point for an overtime loss in reality provides additional incentive for teams to play the defensive shell game for longer periods of time. The cost of giving up a goal is not as great in that system because a tie at the end of regulation guarantees you one point with the possibility of 2, whereas in the other systems it does not. This means teams can play the defensive shell for about twice as long as they could otherwise.

Of course, this is only looking at one side of the equation. Typically the trailing team will get more offensively aggressive even if it means increasing the possibility of having a goal scored against them. This is why teams pull their goalie late in the game. At that point scoring a goal is the only thing that matters so you may as well risk giving one up to score. Over the last 5-10 minutes or so it probably makes sense for the trailing team to take more high risk high reward plays in the offensive zone because at that point scoring a goal has more benefit than the cost of giving up a goal.

I brought this issue up on twitter today because it got me thinking. Many in hockey analytics dismiss face off winning % as a skill that has much value, but many of the same people also claim that zone starts can have a significant impact on a player’s statistics. I haven’t really delved into the statistics to investigate this, but here is what I am wondering. Consider the following two players:

Player 1: Team wins 50% of face offs when he is on the ice and he starts in the offensive zone 55% of the time.

Player 2: Team wins 55% of face offs when he is on the ice but he has neutral zone starts.

Given 1000 zone face offs the following will occur:

| | Player 1 | Player 2 |
|---|---|---|
| Win Faceoff in OZone | 275 | 275 |
| Lose Faceoff in OZone | 275 | 225 |
| Win Faceoff in DZone | 225 | 275 |
| Lose Faceoff in DZone | 225 | 225 |

Both of these players will win the same number of offensive zone face offs and lose the same number of defensive zone face offs, which are the situations that intuitively should have the greatest impact on a player’s statistics. So, if Player 1 is going to be more significantly impacted by his zone starts than Player 2 is impacted by his face off win %, losing face offs in the offensive zone must still have a significant positive impact on the player’s statistics and winning face offs in the defensive zone must still have a significant negative impact on the player’s statistics. If this is not the case then being able to win face offs should be more or less equivalent in importance to zone starts (and this is without considering any benefit of winning neutral zone face offs).
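The counts in the table follow directly from the two players' zone start and face off percentages. Here is a small Python sketch of that arithmetic, assuming (as the table does) that only offensive and defensive zone face offs are counted and that the win rate is the same in both zones. The function name is my own.

```python
def faceoff_breakdown(total, oz_share, win_share):
    """Split `total` zone face offs into win/lose counts by zone."""
    oz = total * oz_share   # face offs taken in the offensive zone
    dz = total - oz         # the rest are in the defensive zone
    return {
        "win_oz": round(oz * win_share),
        "lose_oz": round(oz * (1 - win_share)),
        "win_dz": round(dz * win_share),
        "lose_dz": round(dz * (1 - win_share)),
    }


player1 = faceoff_breakdown(1000, oz_share=0.55, win_share=0.50)
player2 = faceoff_breakdown(1000, oz_share=0.50, win_share=0.55)

for name, counts in (("Player 1", player1), ("Player 2", player2)):
    print(name, counts)
```

The only cells where the two players differ are lost offensive zone face offs and won defensive zone face offs, which is the point the argument turns on.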

Now, I realize that there is a greater variance in zone start deployment than in face off winning percentage, but if a 55% face off percentage is roughly equal to a 55% offensive zone start deployment, and a 55% face off win% has relatively little impact on a player’s statistics, then a 70% zone start deployment (four times as far from 50%) would have roughly four times “relatively little” impact on the player’s statistics, which is still probably relatively little.

I hope to be able to investigate this further, but on the surface it seems that if face off win% is of relatively little importance, that supports my claim that zone starts have relatively little impact on a player’s statistics.