Sep 16 2013

Let’s imagine a sport where two factors are equally correlated with winning so that FactorA is 50% correlated with winning and FactorB is 50% correlated with winning. For years, general managers in this sport only ever knew that FactorA existed, and when choosing how to build their teams they only ever considered FactorA. Now let’s assume that in this idealistic world, uninformed about FactorB, every general manager of every team allocated their financial resources perfectly based on their knowledge of FactorA. On top of that, every team is working under the same financial constraints, meaning they all spend exactly the same amount of money.

The result is, in this fictional world, FactorA becomes perfectly evenly distributed across every team. Strangely though, even after accounting for luck, teams have statistically significant differences in winning percentages.

Now, along comes a smart individual who discovers the existence of FactorB and finds out that FactorB correlates 100% with winning percentage (after factoring out luck) and concludes that General Managers were wrong all along and that FactorB is all that matters to winning and FactorA is irrelevant (has to be since it has zero correlation with winning). Upon discovering this he gets hired to become a General Manager of a team and while every other GM was only signing FactorA players he chose to go out and sign solely FactorB players. He made signing FactorB players his goal. Strangely, despite FactorB seemingly showing a 100% correlation with winning, his team didn’t win any more than anyone else.

The reason for this is that FactorA is in fact important. It just doesn’t seem important because everyone knows about FactorA and FactorA is getting evenly spread out across teams. Ignoring FactorA for FactorB is equally wrong as ignoring FactorB for FactorA. Upon learning of the existence of FactorB and its high correlation with winning, the goal of a General Manager is not to optimize his team for FactorB but to recognize that there is undiscovered value in players that have FactorB as a skill while not ignoring other skills that we previously knew existed.
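The thought experiment above can be checked with a small simulation. This is only a sketch under stated assumptions: the function names, the 30-team league size, and the rule that luck-free team strength is an equal blend of the two factors are all made up for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no spread."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a factor that is identical everywhere cannot correlate
    return cov / (var_x * var_y) ** 0.5


def simulate(n_teams=30, spread_a=0.0, spread_b=1.0, seed=1):
    """Return (corr of FactorA with strength, corr of FactorB with strength)."""
    rng = random.Random(seed)
    factor_a = [spread_a * rng.gauss(0, 1) for _ in range(n_teams)]
    factor_b = [spread_b * rng.gauss(0, 1) for _ in range(n_teams)]
    # Assumption: both factors matter equally to luck-free team strength.
    strength = [0.5 * a + 0.5 * b for a, b in zip(factor_a, factor_b)]
    return pearson(factor_a, strength), pearson(factor_b, strength)


# Every GM optimizes FactorA, so it is identical across teams.
print(simulate(spread_a=0.0, spread_b=1.0))
# Both factors vary equally across teams.
print(simulate(n_teams=1000, spread_a=1.0, spread_b=1.0))
```

In the first case FactorB shows a correlation of essentially 1 and FactorA shows none, even though both contribute equally by construction. In the second case the two correlations come out roughly equal.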

Bringing this back to hockey, let’s call FactorA shooting percentage and FactorB shot generation. Teams have typically doled out contracts based on shooting percentage but not based on corsi, as shown by Eric T. His conclusion was:

most teams don’t give out contracts because of Corsi. But a team that does will get more wins out of their budget than a team that follows the conventional path and overvalues finishing talent.

My response is, not if it comes at the expense of ignoring finishing talent. Based on Tom Awad’s work, finishing talent is probably at least 50% of out scoring your opposition (note that shooting percentage is a combination of out finishing and shot quality in Awad’s terminology).

So, if teams have been doling out contracts based on, effectively, shooting percentage then it is perfectly reasonable to assume that shooting percentage talent is more evenly distributed across teams than corsi-talent is. Under these circumstances corsi would be highly correlated with winning percentage because that is where the differences lie between teams. This doesn’t mean that corsi is the main factor in out scoring the opponent though and valuing corsi at the expense of shooting percentage will be a detriment to any General Manager.

Furthermore, if General Managers as a whole started paying primarily for corsi we will start to find that corsi talent becomes more evenly distributed across teams and thus shooting percentage would become much more highly correlated with winning (even after adjusting for luck). Furthermore, paying players based on corsi would potentially lead to players altering their style of play to optimize their corsi statistics to the detriment of the ultimate goal, out scoring the opponent.

It is certainly possible in the current hockey universe, in which players are paid more for shooting percentage than for corsi, that they play a style of game that optimizes shooting percentage at the expense of winning, so it is not unreasonable to expect the flip side to occur if corsi becomes a metric by which general managers dole out contracts.

Ultimately, the goal of any General Manager is to optimize his line up for out scoring the opposition, not out shooting percentage-ing them and not out corsi-ing them. Corsi or possession should never be considered the goal just as shooting percentage or any other identifiable skill shouldn’t be. The goal has been, is, and always will be to out score the opposition, and it’s the General Manager’s job to find the right balance of all the identifiable skills, not just those that seemingly correlate with winning.


Sep 14 2013

A while back I came up with a stat which at the time I called LT Index (short for Leading-Trailing Index). It is essentially the percentage of his team’s ice time when leading that a player is on the ice for, divided by the percentage of his team’s ice time when trailing that the player is on the ice for (in 5v5 situations and only in games in which the player played). I have decided to rename this statistic to Usage Ratio since it gives us an indication of whether players are used more in defensive situations (i.e. leading and protecting a lead and thus a Usage Ratio above 1.00) or in offensive situations (i.e. when trailing and in need of a goal and thus a Usage Ratio less than 1.00). I think it does a pretty good job of identifying how a player is used.
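As a quick illustration of the definition, here is a minimal sketch. The function and parameter names are my own, and the minutes in the example are invented rather than taken from any real player.

```python
def usage_ratio(player_toi_leading, team_toi_leading,
                player_toi_trailing, team_toi_trailing):
    """Share of team leading ice time divided by share of team trailing ice time.

    All inputs are 5v5 ice times in the same unit (e.g. minutes), counted
    only over games in which the player played.
    """
    share_leading = player_toi_leading / team_toi_leading
    share_trailing = player_toi_trailing / team_toi_trailing
    return share_leading / share_trailing


# Hypothetical player: on the ice for 300 of 1000 leading minutes
# and 200 of 1000 trailing minutes.
print(round(usage_ratio(300, 1000, 200, 1000), 2))  # 1.5, i.e. defensive usage
```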

I then compared players’ Usage Ratio to their 5v5 tied statistics using the theory that a player being used in a defensive role when leading/trailing is more likely to be used in a defensive role when the game is tied. This is also an out of sample comparison (which is always a nice thing to be able to do) since we are using leading/trailing situations to identify offensive vs defensive players and then comparing to 5v5 tied situations that in no way overlap the leading or trailing data.

Let’s start by looking at forwards using data over the last 3 seasons and including all forwards with >500 minutes of 5v5 tied ice time. The following charts compare Usage Ratio with 5v5 Tied CF%, CF60 and CA60.




Usage Ratio is on the horizontal axis with more defensive players to the right and offensive players to the left.

Usage Ratio has some correlation with CF% but that correlation is solely due to its connection with generating shot attempts for and not with restricting shot attempts against. Players we identify as offensive players via the Usage Ratio statistic do in fact generate more shots but players we identify as defensive players do not suppress opposition shots any. In fact, Usage Ratio and 5v5 tied CA60 are about as uncorrelated as you can possibly get. One may attempt to say this is because those defensive players are playing against offensive players (i.e. tough QoC), but if this were the case then those offensive players would be playing against defensive players (i.e. tough defensive QoC) and thus should see their shot attempts suppressed as well. We don’t observe that though. It just seems that players used as defensive players are no better at suppressing shot attempts against than offensive players but are, as expected, worse at generating shot attempts for.

Before we move on to defensemen let’s take a look at how Usage Ratio compares with shooting percentage and GF60.




As seen with CF60, Usage Ratio is correlated with both shooting percentage and GF60, and the correlation with GF60 is stronger than with CF60. Note that the sample size for 3 seasons (or 2 1/2 actually) of 5v5 tied data is about the same as the sample size for one season of 5v5 data (players in this study have between 500 and 1300 5v5 tied minutes, which is roughly equivalent to how many 5v5 minutes forwards play over the course of one full season).

FYI, the dot up at the top with the GF60 above 5 is Sidney Crosby (yeah, he is in a league of his own offensively) and the dot to the far right (heavy defensive usage) is Adam Hall.

Now let’s take a look at defensemen.




There really isn’t much going on here and how a defenseman is used really doesn’t tell us much at all about their 5v5 stats (only marginal correlation to CF60). As with forwards, defensemen that we identify as being used in a defensive manner are not any better at reducing shots against than defensemen we identify as being used in an offensive manner.

To summarize the above, players who get more minutes when playing catch up are in fact better offensive players, particularly when looking at forwards but players who get more minutes when protecting a lead are not necessarily any better defensively. We do know that there are better defensive players (the range of CA60 among forwards is similar to the range of CF60 so if there is offensive talent there is likely defensive talent too), and yet coaches aren’t playing these defensive players when protecting a lead. Coaches in general just don’t know who their good defensive players are.

Still not sold on this? Well, let’s compare 5v5 defensive zone start percentage (percentage of face offs taken in the defensive zone) to CF60 and CA60 (for forwards) in 5v5 tied situations.


Percentage of face offs in the defensive zone is on the horizontal axis and CF60 is on the vertical axis. This chart is telling us that the fewer defensive zone face offs a forward gets, and thus likely more offensive face offs, the more shot attempts for they produce. In short, players who get offensive zone starts get more shot attempts.


The opposite is not true though. Players who get more defensive face offs don’t give up any more or less shots than their low defensive zone face off counterparts. This tells me that if there is any connection between zone starts and CF% it is solely due to the fact that players who get offensive zone starts are better offensive players and not because players who get defensive zone starts are better defensive players.

You might again be saying to yourself, ‘the players who are getting the defensive zone starts are playing against better offensive players, so doesn’t it make sense that their CA60 is inflated above their talent levels (which presumably are better than average defensively)?’ This might be true, but if zone starts significantly impacted performance (as would be the case if that last statement were true), either directly or indirectly because zone starts are linked to QoC, then there should be more symmetry between the charts. There isn’t though. Let’s look at what these two charts tell us:

  1. The first chart tells us that players who get offensive zone starts generate more shot attempts.
  2. The second chart tells us that players who get defensive zone starts don’t give up more shot attempts against.

If zone starts were a major factor in results, those two statements don’t jibe. How can one side of the ledger show an advantage and the other side of the ledger be neutral? The way those statements can work in conjunction with each other is if zone starts don’t significantly impact results, which is what I believe (and have observed before).

But, if zone starts do not significantly impact results, then the results we see in the two charts above are driven by the players’ talent levels. Knowing that, we once again can observe that coaches are doing a decent job of identifying offensive players to start in the offensive zone but are doing a poor job at identifying defensive players to play in the defensive zone.

All of this is to say, NHL coaches generally do a poor job at identifying their best defensive players, so if you think that the guys who are getting all those defensive zone starts (aka ‘tough minutes’) are more likely to be defensive wizards, think again. They may not be.


Sep 06 2013

I had first intended this to be a comment on Tyler Dellow’s investigation into Phaneuf and Grabovski shot totals for and against when they were on the ice together, but once I started pulling numbers I decided it was important enough to have a post of its own and not get hidden in the comments somewhere. Go read Tyler’s post because it is a worthwhile read, but he found that when Grabovski/Phaneuf were on the ice together the Leafs were incredibly poor at getting shifts with shots while good at having shifts where they gave up shots, and it had very little to do with not getting multiple shots per shift or giving up multiple shots per shift at a higher rate.

This is helpful to know because it narrows the issue: the Leafs’ Corsi% last year with Grabovski/Phaneuf on the ice didn’t collapse because of a change in the rate at which multi-SAF and multi-SAA shifts occurred; it collapsed because the Leafs suddenly became extraordinarily poor at generating the first SAF and preventing the first SAA. If you’re blaming Korbinian Holzer or Mike Kostka or Jay McClement for this, you need to come up with a convincing explanation as to why their impact was felt in terms of the likelihood of the first shot attempt occurring, but not really on subsequent ones.

A lot of people blame Holzer or Kostka or McClement but I will present another (at least partial) explanation. Phaneuf and Grabovski’s numbers tanked because the Leafs were winning. Let me explain.

Here is a table of Phaneuf’s CF% over the last 4 seasons during various 5v5 situations: Tied, Leading, Trailing, Total. Note that part of 2009-10 season was with Calgary.

Season   Tied   Leading  Trailing  5v5
2009-10  53.4%  44.3%    58.2%     52.3%
2010-11  46.5%  38.6%    54.7%     47.1%
2011-12  47.7%  44.3%    56.4%     49.9%
2012-13  39.6%  35.7%    55.4%     41.9%

In Tied and Overall situations Phaneuf’s numbers tanked quite significantly, particularly last season, but where it gets really interesting is in the Leading and Trailing stats. When Leading his stats dropped off a bit to 35.7% last year but he was at 38.6% in 2010-11 and was only 44.3% the other years so pretty bad all round. What’s interesting is his trailing stats have maintained significantly higher levels right through from 2009-10 through 2012-13 with relatively very little fluctuation (compared to leading and tied stats).

Now, let’s look at the percentage of ice time Phaneuf played in each situation.

Season   Tied   Leading  Trailing
2009-10  41.2%  28.3%    30.5%
2010-11  31.9%  27.7%    40.4%
2011-12  33.5%  29.8%    36.6%
2012-13  32.9%  42.3%    24.8%

He played much more in tied situations in 2009-10 but maintained about the same level the following 3 years. Where the big difference lies is in the percentage of ice time he played while leading and trailing. He played far more while leading last year and far less while trailing. When you combine this with the previous table, it isn’t a surprise that his corsi numbers tanked. If we took last year’s CF% numbers and applied them to his ice time percentages from 2011-12 he’d have ended up with a CF% of 44.2%, which is a fair bit higher than his actual 2012-13 CF% of 41.9%. This means about 29% (or 2.3 CF% points) of his drop off in CF% from 2011-12 to 2012-13 can be attributed to ice time changes alone. That’s not an insignificant amount.
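The reweighting arithmetic can be reproduced from the two Phaneuf tables. This is a sketch of the calculation as I read it, with function and variable names of my own choosing. Note that weighting CF% by ice time share is an approximation, since overall CF% is really weighted by shot attempts, not minutes.

```python
def reweighted_cf(cf_by_state, toi_share_by_state):
    """Ice-time-weighted average of CF% across game states.

    Dividing by the total share absorbs rounding in the published
    percentages (they do not always sum to exactly 100).
    """
    total_share = sum(toi_share_by_state.values())
    weighted = sum(cf_by_state[s] * toi_share_by_state[s] for s in cf_by_state)
    return weighted / total_share


# Phaneuf's 2012-13 CF% by game state and 2011-12 ice time shares (from the tables).
cf_2012_13 = {"tied": 39.6, "leading": 35.7, "trailing": 55.4}
toi_2011_12 = {"tied": 33.5, "leading": 29.8, "trailing": 36.6}

hypothetical = reweighted_cf(cf_2012_13, toi_2011_12)
actual_2011_12, actual_2012_13 = 49.9, 41.9

# Share of the year-over-year drop explained by the ice time mix alone.
share_from_ice_time = (hypothetical - actual_2012_13) / (actual_2011_12 - actual_2012_13)

print(round(hypothetical, 1))         # 44.2
print(round(share_from_ice_time, 2))  # 0.29
```

The same function applies to any other player or team in the tables by swapping in their CF% and ice time shares.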

As for the rest, I believe Randy Carlyle’s more defensive style of hockey compared to Ron Wilson’s is a significant factor. When leading, teams play a more defensive game, and we see above (and you’d see with other players if you looked) that when leading your CF% tanks compared to when trailing and playing offensive hockey. How much of Phaneuf’s drop off in CF% in 5v5 tied situations last year is due to Phaneuf being asked to play a far more defensive role?  Probably a significant portion of it.

When we take everything into consideration, the majority of Phaneuf’s drop off in CF% last year can probably be attributed to Leading vs Trailing ice time differences and being asked to play a far more significant defensive role in tied situations, and probably only a very small portion of it can be attributed to playing with Holzer and Kostka or any change in quality of competition or zone starts (which I still claim have very little direct impact on stats, though they can be a proxy for a player’s style of play, defensive vs offensive).

Now, let’s take a quick look at Grabovski’s stats.

Season   Tied   Leading  Trailing  5v5
2009-10  58.0%  55.8%    56.1%     56.8%
2010-11  52.2%  49.8%    58.0%     53.6%
2011-12  52.8%  46.9%    59.2%     53.7%
2012-13  44.0%  38.2%    55.7%     44.3%

Much the same as Phaneuf. His 5v5 tied stats dropped off significantly but his trailing stats maintained at a fairly good level. His Leading stats have dropped off steadily since 2009-10, probably as he has been given more defensive responsibility.

Season   Tied   Leading  Trailing
2009-10  38.6%  20.3%    41.0%
2010-11  33.3%  28.9%    37.8%
2011-12  33.5%  26.8%    39.7%
2012-13  32.2%  42.7%    25.1%

Nothing too different from Phaneuf. If anything, the changes in Leading vs Trailing are more extreme. For Grabovski, 29.8% of his drop off in CF% last year can be attributed to changes in Leading/Trailing ice time, while I suspect a significant portion of the rest can be attributed to Randy Carlyle’s more defensive game, and to asking Grabovski to play a more defensive role in particular.

Now, how do the Leafs as a team look?

Season   Tied   Leading  Trailing  5v5
2009-10  52.1%  48.0%    56.1%     52.8%
2010-11  46.1%  41.6%    54.0%     47.8%
2011-12  47.9%  42.1%    55.6%     48.9%
2012-13  43.8%  39.5%    52.2%     44.1%

The Leafs’ drop off in CF% is pretty even across the board. They lost 4.1% when tied, 2.6% when leading and 3.4% when trailing. Interestingly that led to a 4.8% drop overall, which makes little sense until you look at their leading/trailing ice times.

Season   Tied   Leading  Trailing
2009-10  37.2%  22.0%    40.9%
2010-11  33.6%  28.9%    37.5%
2011-12  33.7%  29.8%    36.5%
2012-13  33.1%  42.0%    25.0%

Tied ice time remained about the same last year as in 2011-12 but leading ice time jumped from 29.8% to 42.0% while trailing ice time dropped from 36.5% to 25.0%. So, when we look at the Leafs as a whole and apply their 2012-13 leading/trailing/tied CF% stats to their 2011-12 ice time percentages, they would have only dropped from 48.9% to 45.6%. The remainder of the fall to 44.1% is due to changes in leading/trailing/tied ice times, or 30.8% of the drop off.

So, to summarize, about 30% of the drop off in the Leafs’ team and individual CF% from the 2011-12 season to last season can be directly attributed to changes in the Leafs’ leading/trailing/tied ice time percentages. This means 30% of the drop off can be attributed to the Leafs being a far better team last year at getting leads and winning games.  Or, if you believe that was largely due to lucky shooting, you can say 30% of the Leafs’ drop off in CF% is due to good luck.

Although I haven’t explicitly proven it, I’ll contend that a significant portion of the remainder comes down to Randy Carlyle being a far more defensive coach than Ron Wilson was. Maybe another day I’ll test this theory by looking at someone like Phil Kessel and seeing how his stats changed, because Phil Kessel was not given a heavy defensive role last year like Phaneuf and Grabovski were and thus may not have seen the same drop off, particularly in tied situations (quick check: Kessel was 47.3 CF% in 5v5 tied situations in 2011-12 and 42.3% last year so he saw a significant drop off too but not as much as Phaneuf or Grabovski). It may also be interesting to look at how ice time changes impact shooting and save percentages and whether this partly explains the Leafs’ high shooting percentage last year, and maybe what impact it had on their relatively decent save percentage compared to previous years.

As you can see though, ice time changes can have a significant impact on a player’s statistics and it is important to take that into consideration in player evaluation, like when I looked at Phaneuf’s leading/trailing stats a while back.

(All the stats in this post came from so feel free to go there, pull the data and analyze whichever team or player you want in leading/trailing/tied situations)

Aug 02 2013

In his book Hockey Abstract, Rob Vollman talks about persistence and its importance when it comes to a particular statistic having value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for being able to call something a talent or a skill and how close it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let’s look at persistence first. The easiest way to measure persistence is to look at the correlation of a statistic over some chunk of time vs some future chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year over year correlation)? For some statistics such as shooting percentage it may even be necessary to go with larger sample sizes, such as 3 year shooting percentage vs future 3 year shooting percentage.

One mistake that many people make when doing this is to conclude that a lack of correlation, and thus a lack of persistence, means that the statistic is not a repeatable skill and thus is, essentially, random. The thing is, the method we use to measure persistence can be a major factor in how well we can measure persistence and how well we can measure true randomness. Let’s take two methods for measuring persistence:

  1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
  2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.

Both methods split the data roughly in half so we are doing a half the data vs half the data comparison and I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so we get large sample sizes for shooting percentage calculations. Here are the correlations we get.

Comparison    2007-10 vs 2010-13  Even vs Odd  Difference
GF20 vs GF20  0.61                0.89         0.28
FF20 vs FF20  0.62                0.97         0.35
FSh% vs FSh%  0.51                0.73         0.22

GF20 is Goals for per 20 minutes of ice time. FF20 is fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick Shooting Percentage or goals/fenwick.
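For readers who prefer to see the definitions as arithmetic, here is a small sketch. The helper names and the sample totals are invented for illustration. Shots on goal are assumed to include goals.

```python
def per_20(count, toi_minutes):
    """Rate of an event per 20 minutes of ice time."""
    return count * 20 / toi_minutes


def fenwick_for(shots_on_goal, missed_shots):
    """Fenwick for: shots on goal (goals included) plus missed shots."""
    return shots_on_goal + missed_shots


def fenwick_sh_pct(goals, fenwick):
    """Fenwick shooting percentage as a fraction: goals / fenwick."""
    return goals / fenwick


# Hypothetical on-ice totals for one forward over 1200 minutes of 5v5 play.
toi, goals, shots, missed = 1200, 60, 550, 250

ff = fenwick_for(shots, missed)
gf20 = per_20(goals, toi)
ff20 = per_20(ff, toi)
fsh = fenwick_sh_pct(goals, ff)

print(round(gf20, 2), round(ff20, 2), round(fsh, 3))
```

By construction GF20 equals FF20 multiplied by FSh%, which is why the goal rate can be thought of as a volume component times a finishing component.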

We can see that the level of persistence we identify is much greater when looking at even vs odd correlation than when looking at 3 year vs 3 year correlation. A different test of persistence gives us significantly different results. The reason for this is that there are a lot of other factors that come into play when looking at 3 year vs 3 year correlations than even vs odd correlations. In the even vs odd correlations factors such as quality of team mates, quality of competition, zone starts, coaching tactics, etc. are non-factors because they should be almost exactly the same in the even seconds as in the odd seconds. This is not true for the 3 year vs 3 year correlation. The difference between the two methods is roughly the amount of the correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.

Now, let’s look at how well they correlate with a positive outcome, scoring goals. But instead of just looking at that, let’s combine it with persistence by looking at how well they predict ‘other half’ goal scoring.

Comparison    2007-10 vs 2010-13  Even vs Odd  Difference
FF20 vs GF20  0.54                0.86         0.33
GF20 vs FF20  0.44                0.86         0.42
FSh% vs GF20  0.48                0.76         0.28
GF20 vs FSh%  0.57                0.77         0.20

As you can see, both FF20 and FSh% are very highly correlated with GF20 but this is far more evident when looking at even vs odd than when looking at 3 year vs 3 year correlations. FF20 is more predictive of ‘other half’ GF20, though not significantly so, and this is likely solely due to the greater randomness of FSh% (due to sample size constraints) since FSh% is more correlated with GF20 than FF20 is. The correlation between even FF20 and even GF20 is 0.75 while the correlation between even FSh% and even GF20 is 0.90.

What is also interesting to note is that even vs odd provides greater benefit for identifying FF20 value and persistence than for FSh%. What this tells us is that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think what this means is that GMs are valuing shooting percentage players more than fenwick players and thus are more likely to maintain a core of shooting percentage players on their team while letting fenwick players walk. Eric T. found that teams reward players for high shooting percentage more than high corsi so this is likely the reason we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

Comparison    2007-10 vs 2010-13  Even vs Odd  Difference
FF20 vs FSh%  0.38                0.66         0.28
FSh% vs FF20  0.22                0.63         0.42

It is interesting to note that fenwick rates are highly correlated with shooting percentages especially when looking at the even vs odd data. What this tells us is that the skills that a player needs to generate a lot of scoring chances are a similar set of skills required to generate high quality scoring chances. Skills like good passing, puck control, quickness can lead to better puck possession and thus more shots but those same skills can also result in scoring at a higher rate on those chances. We know that this isn’t true for all players (see Scott Gomez) but generally speaking players that are good at controlling the puck are good at putting the puck in the net too.

Finally, let’s look at one more set of correlations. When looking at the above correlations for players with >1000 minutes in each ‘half’ of the data there are a lot of players that have significantly more than 1000 minutes and thus their ‘stats’ are more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally less than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So, I took all the players that had more than 1000 even and odd minutes over the course of the past 6 seasons but only those that had fewer than 2600 minutes in total. In essence, I took all the players that have between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above and the results should tell us approximately how reliable (predictive) one season’s worth of data is for a front line forward, assuming they played in exactly the same situation the following season.

Comparison    Even vs Odd
GF20 vs GF20  0.82
FF20 vs FF20  0.93
FSh% vs FSh%  0.63
FF20 vs GF20  0.74
GF20 vs FF20  0.77
FSh% vs GF20  0.65
GF20 vs FSh%  0.66
FF20 vs FSh%  0.45
FSh% vs FF20  0.40

It should be noted that because of the way in which I selected the players to be included in this calculation (limited ice time over the past 6 seasons) there is an abundance of 3rd liners, with a few players that reached retirement (e.g. Sundin) and young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take the first 2600 minutes of each player and do even/odd on that but I am too lazy to try and calculate that data so the above is the best we have. There is far less diversity in the list of players used than in the NHL in general so it is likely that for any particular player with between 1000 and 1300 minutes of ice time the correlations are stronger.

So, what does the above tell us? Once you factor out year over year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all pretty highly persistent with just one year’s worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution, adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics or better analytical techniques or some combination of the two only time will tell, but in theory at least there should be a lot more reliable information within a single year’s worth of data than we are currently able to make use of.


Jul 29 2013

Rob Vollman was nice enough to send me a PDF version of his book Hockey Abstract, which I have spent a bit of time over the past couple of days looking over. I have not read it right through but have read a few sections and skimmed through a good chunk of the rest of the book. I must say, if you are looking for a fairly readable, not math heavy, practical introduction to hockey statistics this is an excellent start. I like how Vollman uses statistics to answer very simple questions such as “Who is the best player?” and “Who is the luckiest team?” and in doing so explains why certain statistics are used and why some are not.

While I think Hockey Abstract is an excellent intro to hockey statistics and everything I read is useful and informative I think one of the most important paragraphs in the book is in the Introduction on page 1.

One of the most common and recurring criticisms of statistical analysis in hockey is that it isn’t comprehensive and foolproof, which is why I want to establish upfront that none of these answers are meant to be definitive. After all, in several cases this book will be the first serious attempt to answer certain questions this way. Plus, I love hockey arguments―I want to refuel the conversations, not end them!

One of the things that irritates me about some who use hockey statistics to make arguments is that they often write in absolutes. This player’s corsi isn’t very good therefore he is a bad player. That team’s PDO is high therefore they are lucky and undeserving of their record. While there is ample evidence to suggest a poor corsi is evidence of a bad player or a high PDO is evidence of a lucky team, suggesting these are absolutely true might be a false claim. There are a number of good players that have poor corsi statistics and there are teams with elevated PDOs that achieve them through talent, not luck.

There is a lot that statistics can tell us about an NHL player or team, but there is still a lot we don’t (fully) know and can’t yet (fully) quantify. For example, I’ll argue that we can’t yet properly isolate an individual’s contribution and talent from his line mates; the best we can do is infer it by looking at a series of WOWY numbers or other statistics, and that is far from foolproof. There is still a lot to learn about hockey stats and we need to fuel more discussions and research, not end them with absolute statements (from either side of the pro/anti stats fence).

So, with that in mind, Hockey Abstract is a great introduction to hockey analytics and presents a good view of much of what is currently known in the field. After reading Hockey Abstract I am certain one would be far more familiar with hockey statistics and how, why and when we use them. I know I won’t have a problem recommending it to any of the numerous people who e-mail me asking where they can get an intro to hockey statistics. My recommendation though would be to read it with a critical, but open, mind whether you are a big proponent of the value of hockey analytics, a skeptic, or somewhere in between.

Time permitting I’ll write a more detailed review and critique once I have finished reading it.

You can purchase a copy of Hockey Abstract from or a .pdf copy here.


Jul 16 2013

Last week I posted an article with a proposal to standardize the terminology and nomenclature we use for advanced hockey statistics. This was an attempt to solicit feedback as to whether such a standardization was necessary and if so get some feedback on my proposed terminology. The response at first was slow to trickle in but the responses that did were somewhat telling. Generally speaking, while not a ton of people had an opinion, the feedback on standardizing terminology and getting rid of names like Corsi, Fenwick and PDO was positive.






Interestingly, or maybe not, the biggest resistance to the change was from some of the more hard core advanced statistics people. From this group the feedback ranged from “it will sort itself out eventually” to “people attack the name corsi as a way to attack the stat itself”.


Upon challenging Eric T. more he said he was open to standardization but believed it would eventually happen one way or another so wasn’t worried about the details now. In other conversations, he referred back to baseball and how its stats are named.

There is a clear difference though. DIPS is Defense Independent Pitching Statistics. BABIP is Batting Average on Balls In Play. OPS is On base Plus Slugging percentage. Corsi is a shot attempt differential stat that includes shots, missed shots and blocked shots and is named after an obscure former NHL goalie who used a similar metric to evaluate the work load a goalie experiences in a game during his time as a goalie coach for the Buffalo Sabres. Yes, you need an explanation of what the baseball stats mean but once you get that explanation you say “ahhh, it makes sense.” Down the road, when you see BABIP, the name itself is a reminder of its definition. It is a lot easier to remember BABIP and what it stands for than it is to remember PDO and what it stands for. Now, this may not seem that difficult for someone who speaks and works with the terminology daily, but for the casual dabblers in advanced stats who might only occasionally read an article referencing advanced stats it becomes more difficult. There are no clues within PDO to trigger a memory response to recall its definition, and this is even more difficult for Corsi and Fenwick where the differences are subtle. I have joked in the past that PDO stands for Pretty Damn Obfuscated because, well, it is, and yet the stat itself is conceptually trivial to understand and calculate from its component parts.

Some of the arguments I have read on the resisting changes side of the equation essentially translate into “the onus is on others to learn our terminology and not on us to make it easier for them to learn” and “if they cannot be bothered to look up what a term means they probably won’t be bothered to understand what the stat says.” In some cases there may be some merit to these arguments but they also come across as being somewhat arrogant or elitist in nature. ‘It is not our duty to make it easy for others to learn; it is their duty to take the time to learn what we do.’ It is probably not deliberate and we probably all do this sort of thing with respect to our own “fields of expertise”, whatever they may be, but that doesn’t mean it is right. We need to remind ourselves that being more accommodating and understanding of newcomers and casual observers is only going to benefit the field over the long haul.

I myself took offense to Eric T’s article “Steering advanced regression tools towards modern hockey thought” because when I hear the idea of steering other people’s work to a specific way of thinking it smacks of the same sort of elitism and arrogance (i.e. there being a “right way” and a “wrong way” to think of hockey analytics and one must conform to the right way). I am sure this was not deliberate in any way on Eric’s part but it was made worse by the fact there was no effort to understand and critique the work that was actually done, only to point out what was not done up to “modern hockey thought” standards (which are not clearly published anywhere, more on this later). With that said, in the “Modern Hockey Thought” post Eric did write one thing that I think we can work off of:

I think the baseball community consistently gets less than it could out of analytical experts like yourself because they are often directing their high-powered tools at the wrong problems. I’m hoping to help ensure that hockey, with its greater analytical challenges, gets as much out of your expertise as possible.

I am not well versed in the history of baseball analytics but if this is occurring in the hockey analytics world, one must look into the reasons why. The truth is, those of us within the hockey analytics community have done a terrible job at removing barriers to entry, regardless of how small those barriers to entry may seem. It starts with standardizing terminology and making terminology more understandable, but the hockey analytics community has also done a terrible job of making its research easily accessible. There is no well organized and maintained glossary of statistics and definitions. There is no well organized and maintained list of important papers and articles. There is no single place that one can go to get an up to date description of the state of hockey analytics, of what we know and what we don’t know, what we agree on and where differing opinions exist. We expect everyone to be up to speed on the current state of hockey analytics but don’t seem to want to put any effort into making it possible for people to do so. There was an attempt by Eric last summer on to document the current state of hockey analytics and create a directory of important articles but it wasn’t finished and what was done isn’t easily found. If you want to learn about hockey analytics Google is really your only resource, but that can lead you in the wrong direction just as easily as the right direction and more likely than not leads to people giving up. As a hockey analytics community we need to be better at organizing the knowledge we have and making it far more accessible, and it’s unfortunate that my effort to do so with regards to standardization of statistic naming conventions was mostly met with a big “meh, whatever” by the analytics community, or even worse ‘it means more work for us’.



I am sure I will get criticized for some of the comments I have made here but honestly, I don’t care (and my critiques go beyond just the couple of people I mentioned here). If you feel you have been unfairly critiqued feel free to write your complaints in the comments. You can have your say, just keep them on topic and don’t expect a response from me. I have made my point and I know that there are others that agree with it. I also regularly get people e-mailing me asking for places they can go to get an intro to advanced hockey statistics and while I try to send them useful links I also know that they barely suffice. We need to do better and as such I will be going ahead with the following terminology changes when I update my stats site and I hope others will follow along. If we can’t even come together to reach an agreement on standardizing terminology there is no hope that we will be able to overcome the far more difficult challenges we face in making hockey analytics more accessible to anyone with an interest.

Event Statistics Description
TOI Time on ice
G Goals
A Assists
FirstA First Assists
SoG Shots on goal
SA Shot Attempts (includes missed and blocked shots, formerly a corsi event)
UBSA UnBlocked Shot Attempts (does not include blocked shots, formerly a fenwick event)
Percentage Statistics
Sh% Shooting percentage (G/SoG)
SASh% Shot Attempt Shooting percentage (G/SA)
UBSA-Sh% Unblocked Shot Attempt Shooting percentage (G/UBSA)
Sv% Save percentage (1 - GA/SoGA)
SASv% Shot Attempt Save percentage (1 - GA/SAA)
UBSASv% Unblocked Shot Attempt save percentage (1 - GA/UBSAA)
SPS Save Plus Shooting (percentages)
SASPS Shot Attempt Save Plus Shooting (percentages)
UBSASPS Unblocked Shot Attempt Save Plus Shooting (percentages)
Other Statistics
IGP Individual Goals Percentage (iG / GF)
IAP Individual Assist Percentage (iA / GF)
IPP Individual Points Percentage (iPts / GF)
ISP Individual Shot Percentage (iS / SF)
ISAP Individual Shot Attempt Percentage (iSA/SAF)
IUBSAP Individual Unblocked Shot Attempt Percentage (iUBSA/UBSAF)
Zone Starts
OZFO Number of Offensive Zone Face Offs
NZFO Number of Neutral Zone Face Offs
DZFO Number of Defensive Zone Face Offs
OZFO% Offensive Zone Face Off Percentage – OZFO /(OZFO+NZFO+DZFO)
NZFO% Neutral Zone Face Off Percentage – NZFO /(OZFO+NZFO+DZFO)
DZFO% Defensive Zone Face Off Percentage – DZFO /(OZFO+NZFO+DZFO)
OZBias Offensive Zone Bias – (2*OZFO + NZFO) / (OZFO + NZFO + DZFO)
DZBias Defensive Zone Bias – (2*DZFO + NZFO) / (OZFO + NZFO + DZFO)
OZFOW% Offensive Zone Face Off Winning Percentage
NZFOW% Neutral Zone Face Off Winning Percentage
DZFOW% Defensive Zone Face Off Winning Percentage
FOW% Face off win percentage (all zones)
i Individual Stats
TM Average stats of team/line mates weighted by TOI with
Opp Stats of opposing players weighted by TOI against
PctTm Percent of team’s stats the player recorded in games the player played in
F Stats for the player’s team while player is on the ice
A Stats against the player’s team while player is on the ice
20 or /20 Stats per 20 minutes of ice time
60 or /60 Stats per 60 minutes of ice time
F% Percentage of events that are by the players own team (i.e. for)
D Difference between For and Against statistics (i.e. a +/- statistic)

The major changes I made were to use “SA” (shot attempts) for corsi events and “UBSA” (unblocked shot attempts) for fenwick events instead of ASAG (attempted shots at goals) and SAG (shots at goal) in my previous iteration. This should make things a little clearer than my first proposal.  Update: Changed Sv+Sh% to SPS (Save Plus Shooting percentages).
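As a rough illustration of how the event and percentage statistics in the table fit together, here is a minimal Python sketch. The counts are made-up numbers, the class and function names are my own for illustration only, and save percentage is computed in the conventional way as the share of shots against that did not go in.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Raw event counts for one side (hypothetical field names)."""
    goals: int
    shots_on_goal: int  # SoG: goals plus saved shots
    missed: int         # shots that missed the net
    blocked: int        # shots blocked before reaching the net

    @property
    def ubsa(self) -> int:
        """UnBlocked Shot Attempts (formerly a fenwick event)."""
        return self.shots_on_goal + self.missed

    @property
    def sa(self) -> int:
        """Shot Attempts (formerly a corsi event)."""
        return self.ubsa + self.blocked


def shooting_pct(goals: int, attempts: int) -> float:
    """Goals as a percentage of attempts (works for SoG, UBSA or SA)."""
    return 100.0 * goals / attempts if attempts else 0.0


def save_pct(goals_against: int, attempts_against: int) -> float:
    """Percentage of attempts against that did not become goals."""
    if not attempts_against:
        return 0.0
    return 100.0 * (attempts_against - goals_against) / attempts_against


def sps(events_for: ShotCounts, events_against: ShotCounts) -> float:
    """Save Plus Shooting: Sh% + Sv%, both based on shots on goal."""
    return (shooting_pct(events_for.goals, events_for.shots_on_goal)
            + save_pct(events_against.goals, events_against.shots_on_goal))


# Made-up on-ice totals, for illustration only.
team_for = ShotCounts(goals=8, shots_on_goal=100, missed=40, blocked=35)
team_against = ShotCounts(goals=9, shots_on_goal=100, missed=38, blocked=30)

print("SA:", team_for.sa, "UBSA:", team_for.ubsa)
print("SPS:", sps(team_for, team_against))
```

Swapping `shots_on_goal` for `ubsa` or `sa` in `sps` would give the UBSASPS and SASPS variants.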


Jul 102013

One of the complaints against advanced statistics in hockey is the names of some of the advanced statistics. Sometimes people complain about names like Corsi, Fenwick, PDO, etc. because they don’t have meaningful names. I never really understood the complaint because, honestly, it isn’t all that difficult to figure the names out. That said, it still seems that some people feel the names are a bit of a hurdle to getting into advanced hockey statistics. I am hoping to revamp and improve my hockey statistics database even more this summer and in the process I wondered if there is interest in having me use some standardized hockey statistics nomenclature that we can all agree on. Here is what I am proposing:

Event Statistics Description
TOI Time on ice
G Goals
A Assists
FirstA First Assists
SOG Shots on goal
SAG Shots at goal (includes missed shots)
ASAG Attempted Shots at Goal (includes missed and blocked shots)
Percentage Statistics
Sh% Shooting percentage (G/SoG)
SAGSh% Shots at goal shooting percentage (G/SaG)
ASAGSh% Attempted Shots at Goal Shooting percentage (G/aSaG)
Sv% Save percentage (1 - GA/SoGA)
SAGSv% Shots at goal save percentage (1 - GA/SaGA)
ASAGSv% Attempted Shots at Goal Save percentage (1 - GA/aSaGA)
ShSv% Shooting percentage + save percentage (Sh% + Sv%)
SAGShSv% Shots at goal shooting percentage + save percentage (SAGSh% + SAGSv%)
ASAGShSv% Attempted Shots at goal shooting percentage + save percentage (ASAGSh% + ASAGSv%)
Other Statistics
IGP Individual Goals Percentage (iG / GF)
IAP Individual Assist Percentage (iA / GF)
IPP Individual Points Percentage (iPts / GF)
ISOGP Individual Shots on Goal Percentage (iSOG / SOGF)
ISAGP Individual Shots at Goal Percentage (iSAG / SAGF)
IASAGP Individual Attempted Shots at Goal Percentage (iASAG / ASAGF)
Zone Starts
OZFO Number of Offensive Zone Face Offs
NZFO Number of Neutral Zone Face Offs
DZFO Number of Defensive Zone Face Offs
OZFO% Offensive Zone Face Off Percentage – OZFO /(OZFO+NZFO+DZFO)
NZFO% Neutral Zone Face Off Percentage – NZFO /(OZFO+NZFO+DZFO)
DZFO% Defensive Zone Face Off Percentage – DZFO /(OZFO+NZFO+DZFO)
OZBias Offensive Zone Bias – (2*OZFO + NZFO) / (OZFO + NZFO + DZFO)
DZBias Defensive Zone Bias – (2*DZFO + NZFO) / (OZFO + NZFO + DZFO)
OZFOW% Offensive Zone Face Off Winning Percentage
NZFOW% Neutral Zone Face Off Winning Percentage
DZFOW% Defensive Zone Face Off Winning Percentage
FOW% Face off win percentage (all zones)
i Individual Stats
TM Average stats of team/line mates weighted by TOI with
Opp Stats of opposing players weighted by TOI against
PctTm Percent of team’s stats the player recorded in games the player played in
F Stats for the player’s team while player is on the ice
A Stats against the player’s team while player is on the ice
20 or /20 Stats per 20 minutes of ice time
60 or /60 Stats per 60 minutes of ice time
F% Percentage of events that are by the players own team (i.e. for)
D Difference between For and Against statistics

The major changes are that instead of calling shots + missed shots fenwick events we call them Shots At Goal (SAG), and instead of calling shots + missed shots + blocked shots corsi events we call them Attempted Shots At Goal (ASAG). Also PDO, which is shooting percentage + save percentage, is now named ShSv%.

The prefixes and suffixes can be added to individual stats to create new statistics. For example:

  • iSh% = Individual Shooting Percentage (iG / iSOG)
  • TMSAG20 = Team mate average Shots at Goal per 20 minutes of ice time weighted by TOI with
  • OppGF% = Opponent average Goals For Percentage weighted by time on ice against
  • PctTmG = In games that the player played in, the percentage of his teams goals that the player himself scored.

Note that not all combinations of prefixes and suffixes make sense (for example, PctTmSh% or Sh%F), but I think that is self explanatory.
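To make the suffixes a little more concrete, here is a small Python sketch of the 20 or 60 (rate), F% (for percentage) and D (differential) suffixes applied to goals. The function names and the sample totals are hypothetical and only meant to show the arithmetic.

```python
def per_minutes(count: float, toi_minutes: float, per: float = 20.0) -> float:
    """Rate suffix: events per `per` minutes of ice time (20 or 60)."""
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    return count * per / toi_minutes


def for_pct(events_for: float, events_against: float) -> float:
    """F% suffix: share of events that belong to the player's own team."""
    total = events_for + events_against
    return 100.0 * events_for / total if total else 0.0


def differential(events_for: float, events_against: float) -> float:
    """D suffix: For minus Against."""
    return events_for - events_against


# Made-up on-ice totals for one player: 30 GF, 24 GA in 600 minutes.
gf, ga, toi = 30, 24, 600.0

print("GF20:", per_minutes(gf, toi))           # goals for per 20 minutes
print("GA20:", per_minutes(ga, toi))           # goals against per 20 minutes
print("GF60:", per_minutes(gf, toi, per=60.0))
print("GF%: ", round(for_pct(gf, ga), 1))
print("GD:  ", differential(gf, ga))
```

The same three helpers apply unchanged to SAG or ASAG counts, which is really the point of building names from prefixes and suffixes.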

What does everyone think? I am perfectly fine sticking with the way I have statistics currently presented but if the majority think something along the lines of the above is better I am all for making the change. If anyone has any other suggestions they are welcome as well. I just think that this is as good a time as any to come up with some standardized nomenclature.

Also, I currently have statistics for the following situations:

  • 5v5
  • 5v5 Home
  • 5v5 Road
  • 5v5 Close
  • 5v5 Tied
  • 5v5 Up1
  • 5v5 Up 2+
  • 5v5 Down 1
  • 5v5 Down 2+
  • 5v5 Leading
  • 5v5 Trailing
  • 5v4 PP
  • 4v5 SH
  • Zone start adjusted data for all of the above except 5v4 PP and 4v5 SH.

If there is interest I may consider adding other situations. For example, first period, second period, third period, 4v4, 5v5 close home and 5v5 close road. Would anyone find these or any other situation interesting to look at?

Also feel free to consider the comments of this post the place where you can officially make any other suggestions of upgrades/enhancements you would like to see made. I can’t make any promises that I will implement them but I hope to make some upgrades over the summer.

Update:  Added ‘D’ to the suffix list which stands for differential. So ASAGD would stand for Attempted Shots At Goal Differential which is the equivalent of corsi differential in use now. Might consider adding Rel but need to consider if it is necessary or not. Thoughts?


Jul 052013

Unfortunately I didn’t have as much time this week as I had hoped to do a full evaluation of unrestricted free agent centers like I did for wingers but it is free agent day and there was some big news regarding centers yesterday with the buy out of Grabovski so I thought I’d throw a little something together where I look at some offensive statistics of some of the top centers available. Let me start off by presenting you with the summary table.

G/60 A/60 Pts/60 IPP GF20-TMGF20 FF20-TMFF20 OZBias
Ribeiro 0.593 1.512 2.11 80.5 0.113 -0.025 102.6
Filppula 0.769 1.334 2.1 75 0.116 -0.878 104.7
Lecavalier 0.799 1.186 1.99 68.1 0.139 0.381 100.7
Grabovski 0.899 0.961 1.86 65.4 0.196 2.406 96
Roy 0.587 1.146 1.73 67.4 0.039 0.747 98.7
Weiss 0.652 0.821 1.47 65.6 0.07 -0.467 103.3
Bozak 0.566 0.775 1.34 54.2 -0.062 0.292 99.8

The numbers above are 5v5 numbers over the past 3 seasons and the players are sorted by Pts/60. I threw in Lecavalier because he was a UFA for a brief period of time and is at more or less the same level as the others. I included Bozak to highlight just how much he doesn’t fit in with the rest of the group.

  • G/60 = Goals per 60 minutes of ice time.
  • A/60 = Assists per 60 minutes of ice time
  • Pts/60 = Points per 60 minutes of ice time.
  • IPP = Individual Points Percentage, or the percentage of goals scored while on ice that the player had a point on.
  • GF20-TMGF20 = How much better his team mates’ on-ice goal stats are when playing with him than without.
  • FF20-TMFF20 = How much better his team mates’ on-ice shot generation is when playing with him than without.
  • OZBias = (2*OZ starts + NZ starts) / (OZ + NZ + DZ starts), expressed as a percentage, and gives an indication of the player’s usage. A value above 100 means more offensive zone starts than defensive zone starts.
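For anyone who wants to check an OZBias figure by hand, here is a minimal Python sketch. The face off counts are made up, the function name is my own, and the scaling by 100 is an assumption inferred from the table values sitting around 100.

```python
def zone_bias(oz: int, nz: int, dz: int) -> tuple[float, float]:
    """Return (OZBias, DZBias) on a scale where 100 is neutral usage.

    oz, nz, dz are counts of offensive, neutral and defensive zone
    face offs the player was on the ice for.
    """
    total = oz + nz + dz
    if total == 0:
        raise ValueError("need at least one face off")
    oz_bias = 100.0 * (2 * oz + nz) / total
    dz_bias = 100.0 * (2 * dz + nz) / total
    return oz_bias, dz_bias


# Made-up usage: slightly more offensive than defensive zone starts.
oz_bias, dz_bias = zone_bias(oz=320, nz=380, dz=300)
print(oz_bias, dz_bias)
```

Because every face off is counted once in the denominator and the two numerators add up to twice the total, OZBias and DZBias always sum to 200, so one number is enough to describe a player's usage.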

List sorted by G/60: Grabovski, Lecavalier, Filppula, Weiss, Ribeiro, Roy, Bozak

List sorted by A/60: Ribeiro, Filppula, Lecavalier, Roy, Grabovski, Weiss, Bozak

List sorted by Pts/60: Ribeiro, Filppula, Lecavalier, Grabovski, Roy, Weiss, Bozak

List sorted by IPP: Ribeiro, Filppula, Lecavalier, Roy, Weiss, Grabovski, Bozak

List sorted by GF20-TMGF20:  Grabovski, Lecavalier, Filppula, Ribeiro, Weiss, Roy, Bozak

List sorted by FF20-TMFF20: Grabovski, Roy, Lecavalier, Bozak, Ribeiro, Weiss, Filppula
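The sorted lists above can be reproduced directly from the summary table. The short Python sketch below copies the table values (OZBias is left out since it is not ranked) and sorts the players from highest to lowest on any column; the variable and function names are my own.

```python
# Values copied from the summary table (5v5, past 3 seasons).
COLUMNS = ["G/60", "A/60", "Pts/60", "IPP", "GF20-TMGF20", "FF20-TMFF20"]
CENTERS = {
    "Ribeiro":    [0.593, 1.512, 2.11, 80.5, 0.113, -0.025],
    "Filppula":   [0.769, 1.334, 2.10, 75.0, 0.116, -0.878],
    "Lecavalier": [0.799, 1.186, 1.99, 68.1, 0.139, 0.381],
    "Grabovski":  [0.899, 0.961, 1.86, 65.4, 0.196, 2.406],
    "Roy":        [0.587, 1.146, 1.73, 67.4, 0.039, 0.747],
    "Weiss":      [0.652, 0.821, 1.47, 65.6, 0.070, -0.467],
    "Bozak":      [0.566, 0.775, 1.34, 54.2, -0.062, 0.292],
}


def ranked(stat: str) -> list[str]:
    """Player names sorted from best to worst on the given column."""
    idx = COLUMNS.index(stat)
    return sorted(CENTERS, key=lambda name: CENTERS[name][idx], reverse=True)


for stat in COLUMNS:
    print(f"List sorted by {stat}: {', '.join(ranked(stat))}")
```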

Some comments on each player:

Mike Ribeiro: Easily the best play maker of the group and is most consistently involved in the play.

Valtteri Filppula: Better goal scorer than Ribeiro and not as good a play maker, but a better play maker than the rest.

Vincent Lecavalier: Similar to Filppula in value but better at the possession game.

Mikhail Grabovski: Not a great play maker but a good finisher and good at driving shot generation indicating he is probably good at puck retrieval.

Derek Roy: Kind of a poor man’s Ribeiro but much less valuable.

Stephen Weiss: More of a poor man’s Lecavalier. Easily had the worst line mates of the group and might do better in a different situation.

Tyler Bozak: Weak at goal scoring, bad at play making, not involved in the play and a drag on his team mates’ goal production. Not anywhere close to the same league as the others (and may be better suited for a different league too).

For me, Ribeiro is probably the best of the group in terms of pure offense because of his elite play making ability. Grabovski and Lecavalier are a little more balanced with better scoring and puck retrieval skills while Filppula is pretty solid all round as well and has the flexibility of being used as either a center or a winger (which is valuable if locking in long-term). It’s difficult to compare Weiss to the rest because he simply hasn’t had nearly as good line mates but it is probably safe to say he’d be a bit of a step down from Grabovski, Lecavalier or Filppula. Roy, on the other hand, would definitely be a step back but still a decent consolation prize if on a lower priced contract with shorter term. Definitely not anything more than a #2 center though.

As for Bozak, well, you simply don’t want him on your team. Maybe not at any price no matter what the bargain basement price is. I have tried and tried but I just can’t find any redeeming qualities for him outside of his ability to win face offs which has limited value. There simply is no reason why you would want to play him on any of your top 3 lines. None.

As a Leaf fan, with the team unable to keep Grabovski, my preference would be Ribeiro or Filppula but I might be willing to take a chance on Weiss if the contract was right. Ribeiro’s play making skills with the Leafs’ wingers should be a good combination and Filppula is a good all round player who could shift down to wing if needed. Weiss seems like a solid 2-way player who might be able to step up his game with better line mates, which he’d get with the Leafs. If they sign Bozak, I am not sure what I’ll do. It’ll be a sad day.


Jun 202013

This year’s free agent class is a relatively thin one, pending compliance buy outs of course, but there are a handful of good players that could be hitting the unrestricted free agent market this summer. Today I’ll take a look at the wingers.

In total I identified 15 wingers that I would consider quality NHL regulars. These are, in no particular order, Nathan Horton, Viktor Stalberg, Ryane Clowe, Mason Raymond, Clarke MacArthur, Patrick Elias, David Clarkson, Dan Cleary, Pascal Dupuis, Brad Boyes, Alexei Ponikarovsky, Jarome Iginla, Michael Ryder, Bryan Bickell, and Matt Cooke. I have omitted from the list Teemu Selanne and Daniel Alfredsson since if they do return it will almost certainly be with the Ducks and Senators respectively. I have also omitted Damien Brunner because he doesn’t have enough of a track record as I am looking at 3 seasons of data in my statistical evaluation. I have also omitted Jaromir Jagr because, well, for some reason I forgot to include him and couldn’t be bothered to go back and plug him into all the tables. He still has some value, but I am not sure how significant it is.

(Note that unless mentioned otherwise, the stats below are 5v5 stats over the past 3 seasons)

Offensive Evaluation

In order to attempt to isolate a player’s offensive production from that of his team mates, one thing I like to do is compare his own on-ice stats with the on-ice stats of his team mates when they are playing apart from him. To do this I took each player’s FF20 and GF20 and divided by teammate FF20 and teammate GF20 respectively. Here is how the wingers stack up against each other.

Winger FF20/TMFF20 Winger GF20/TMGF20
Viktor Stalberg 1.180 Patrick Elias 1.358
Nathan Horton 1.138 Nathan Horton 1.343
Ryane Clowe 1.087 Jarome Iginla 1.290
Mason Raymond 1.083 Pascal Dupuis 1.188
Clarke MacArthur 1.076 Viktor Stalberg 1.124
Patrick Elias 1.074 Michael Ryder 1.116
David Clarkson 1.066 Clarke MacArthur 1.111
Dan Cleary 1.049 Ryane Clowe 1.075
Pascal Dupuis 1.048 Bryan Bickell 1.058
Brad Boyes 1.044 Brad Boyes 1.042
Alexei Ponikarovsky 1.018 Mason Raymond 1.037
Jarome Iginla 1.017 Matt Cooke 0.962
Michael Ryder 0.999 Alexei Ponikarovsky 0.896
Matt Cooke 0.917 Dan Cleary 0.892
Bryan Bickell 0.896 David Clarkson 0.874
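The ratios in the table above come from my database, but the arithmetic behind a single FF20/TMFF20 value can be sketched in a few lines of Python. The numbers below are invented, the helper names are my own, and the teammate average follows the TM definition I use (team mates’ stats apart from the player, weighted by the ice time each shared with him).

```python
def per20(count: float, toi_minutes: float) -> float:
    """Events per 20 minutes of ice time."""
    return count * 20.0 / toi_minutes


def toi_weighted_average(values_and_toi: list[tuple[float, float]]) -> float:
    """Average of values weighted by TOI; input is (value, toi_with) pairs."""
    total_toi = sum(toi for _, toi in values_and_toi)
    if total_toi <= 0:
        raise ValueError("total TOI must be positive")
    return sum(value * toi for value, toi in values_and_toi) / total_toi


# Made-up player: 704 unblocked shot attempts for in 1000 on-ice minutes.
player_ff20 = per20(704, 1000.0)

# Made-up team mates: (FF20 when apart from the player, minutes with him).
teammates = [(13.0, 600.0), (12.0, 300.0), (14.0, 100.0)]
tm_ff20 = toi_weighted_average(teammates)

ratio = player_ff20 / tm_ff20
print(f"FF20={player_ff20:.2f} TMFF20={tm_ff20:.2f} ratio={ratio:.3f}")
```

A ratio above 1 means the team generates more unblocked shot attempts with the player on the ice than his team mates manage without him.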

Based on the above lists you’d probably have to rank Horton, Stalberg and Elias the top 3 with MacArthur and Clowe not far behind, while Cooke, Ponikarovsky and Bickell don’t look so good in comparison. Those are on-ice stats though. How do their individual stats look in comparison?

Winger G/60 Winger Points/60
Nathan Horton 1.111 Pascal Dupuis 2.28
Jarome Iginla 0.987 Nathan Horton 2.22
Pascal Dupuis 0.985 Jarome Iginla 2.09
Viktor Stalberg 0.964 Viktor Stalberg 2.03
Michael Ryder 0.941 Patrick Elias 2.01
David Clarkson 0.846 Michael Ryder 1.99
Clarke MacArthur 0.802 Clarke MacArthur 1.97
Bryan Bickell 0.779 Bryan Bickell 1.86
Matt Cooke 0.743 Brad Boyes 1.70
Dan Cleary 0.722 Ryane Clowe 1.70
Patrick Elias 0.700 Matt Cooke 1.69
Mason Raymond 0.645 Dan Cleary 1.69
Ryane Clowe 0.610 Mason Raymond 1.68
Brad Boyes 0.544 David Clarkson 1.28
Alexei Ponikarovsky 0.462 Alexei Ponikarovsky 1.20

Horton, Dupuis, Iginla and Stalberg occupy the top 4 spots on both lists while Ponikarovsky trails both lists. Individual stats are heavily influenced by quality of line mates and one measure I like to look at is the percentage of the goals their team scores when they are on the ice that they scored themselves (IGP) or had a point on (IPP). The higher the percentage the more integral the player is to his team’s offense when he is on the ice.

Winger IGP Winger IPP
David Clarkson 50.7 Patrick Elias 82.1
Jarome Iginla 35.6 David Clarkson 76.7
Viktor Stalberg 34.9 Bryan Bickell 75.5
Michael Ryder 33.9 Jarome Iginla 75.2
Nathan Horton 33.1 Clarke MacArthur 73.5
Bryan Bickell 31.6 Viktor Stalberg 73.4
Dan Cleary 31.2 Dan Cleary 73.1
Pascal Dupuis 30.5 Michael Ryder 71.8
Matt Cooke 30.5 Ryane Clowe 70.9
Clarke MacArthur 29.9 Pascal Dupuis 70.8
Patrick Elias 28.6 Brad Boyes 69.9
Mason Raymond 26.4 Matt Cooke 69.5
Ryane Clowe 25.5 Mason Raymond 69.0
Alexei Ponikarovsky 25.0 Nathan Horton 66.1
Brad Boyes 22.3 Alexei Ponikarovsky 64.7
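For reference, IGP and IPP are simple ratios of individual production to on-ice goals for. A minimal Python sketch, with made-up totals and function names of my own choosing:

```python
def igp(individual_goals: int, on_ice_goals_for: int) -> float:
    """Individual Goals Percentage: iG / GF, as a percentage."""
    return 100.0 * individual_goals / on_ice_goals_for if on_ice_goals_for else 0.0


def ipp(individual_goals: int, individual_assists: int,
        on_ice_goals_for: int) -> float:
    """Individual Points Percentage: iPts / GF, as a percentage."""
    points = individual_goals + individual_assists
    return 100.0 * points / on_ice_goals_for if on_ice_goals_for else 0.0


# Made-up winger: 20 goals and 25 assists on 60 on-ice goals for.
print("IGP:", round(igp(20, 60), 1))
print("IPP:", round(ipp(20, 25, 60), 1))
```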

David Clarkson didn’t look so good in previous lists but when he is on the ice he is a major contributor to the team’s offense. Put him with some better offensive players and it is possible he could significantly boost his offensive production. The same can probably be said for Bryan Bickell, who has been given more ice time on the Blackhawks’ top lines these playoffs and has produced well above his regular season rates. He could be a good bargain pickup for a team who could get good production from him as a second line winger.

Defensive Evaluation

Defensive evaluation is much tougher than offensive evaluation and I think in general wingers are the least important position as far as team defense goes. The best way to evaluate a player defensively is to compare their on-ice stats with those of their team mates. Similar to what I did above with FF20 and GF20, I looked at TMFA20/FA20 and TMGA20/GA20.

Winger TMFA20/FA20 Winger TMGA20/GA20
Alexei Ponikarovsky 1.150 Alexei Ponikarovsky 1.206
Patrick Elias 1.122 Clarke MacArthur 1.174
Clarke MacArthur 1.083 Brad Boyes 1.150
David Clarkson 1.069 David Clarkson 1.097
Viktor Stalberg 1.063 Bryan Bickell 1.086
Nathan Horton 1.052 Pascal Dupuis 1.078
Ryane Clowe 1.038 Viktor Stalberg 1.003
Matt Cooke 1.005 Patrick Elias 0.976
Bryan Bickell 1.001 Michael Ryder 0.954
Brad Boyes 0.996 Matt Cooke 0.948
Dan Cleary 0.973 Jarome Iginla 0.937
Michael Ryder 0.971 Ryane Clowe 0.933
Jarome Iginla 0.953 Dan Cleary 0.879
Mason Raymond 0.951 Mason Raymond 0.858
Pascal Dupuis 0.918 Nathan Horton 0.830

Ponikarovsky, MacArthur, Clarkson seem to be the best in the class here with Raymond, Cleary, and Iginla probably trailing the pack overall.

Overall Evaluation

There is nothing too scientific in this but if I had to rank the wingers in terms of value this is how I would rank them, with probably more emphasis on offensive value.

  1. Iginla – Perfect for a team that is close and looking for some help over the next couple of seasons.
  2. Clarkson – I am surprised I am ranking Clarkson over Horton but he comes out ahead in more categories and may come cheaper. I’d still be cautious about over paying but he has scored a bunch of goals on a bad offensive team so that is good.
  3. Horton – I really like Horton but injuries have to be a concern and he’ll likely demand a big contract. He is a first line guy though and would be a big addition to any team. Has a longer track record than Clarkson too so less risky (health issues aside).
  4. MacArthur – Good all-round winger ideal for a second line role or as a secondary player on a first line.
  5. Elias – Age is starting to show but still very solid. Probably stays in New Jersey on short term deal.
  6. Stalberg – Not quite as proven against top competition as MacArthur but similar potential.
  7. Ryder – All he seems to do is score goals and still can be a 30 goal guy if given top line duty. Less rugged version of Clarkson.
  8. Dupuis – Likely sticks in Pittsburgh and continues benefiting from playing a bunch on Crosby’s wing.
  9. Bickell – Probably worth taking a gamble on and playing in a second line role. Might be a 20 goal, 50 point guy in that role.
  10. Cooke – More useful for his PK skills. Decent 3rd line guy but limited offense
  11. Boyes – Decent offensive depth guy if on a good value contract. Probably re-signs with Islanders as he probably has more value to them than anyone else. Probably gets more (and higher quality) ice time than he deserves.
  12. Cleary – Not as productive as he was a few years ago but still has some value as a 2nd/3rd line winger.
  13. Clowe – Probably best as a 3rd line guy you hope you can get some toughness and secondary offense from.
  14. Raymond – From afar he seems like the guy you always hope can be more but never is.
  15. Ponikarovsky – He is kind of like Cooke minus the agitator/cheap shot track record. Solid defensive 3rd liner at this point in his career.


Jun 182013

If you have been following the discussion between Eric T and me you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t and want to read the whole debate you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial and the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics so we can’t see where it is still failing miserably) in which he utilized time on ice to choose the mean to which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13, the power of his regressed GF20 to predict future GF20 was 0.66, which was 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05 while there still remains 0.34 which is unexplained. The question I attempt to answer today is: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted for skill/situational variance?

When we look at 2007-10 GF20 and compare it to 2010-13 GF20 there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of team mates, a change in coaching style, natural career progression of the player, zone start usage, and possibly any number of other factors that we do not currently know about, as well as true randomness. To get past all of these non-random factors that we do not yet know how to fully adjust for, and so get a true measure of the random component of a player’s stats, we need two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games that have been played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are more or less equivalent across halves. The following table shows how good the even half is at predicting the odd half based on how many total minutes (across both halves) the player has played.

Total Minutes GF20 vs GF20
>500 0.79
>1000 0.85
>1500 0.88
>2000 0.89
>2500 0.88
>3000 0.88
>4000 0.89
>5000 0.89
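The even/odd split itself is easy to sketch in Python. The version below is only an outline of the idea and not the code behind the table: the per-game records are a tiny made-up sample, the function names are my own, and it reports a plain Pearson correlation between the two halves as a stand-in for whatever measure of predictive power one prefers.

```python
from collections import defaultdict
from math import sqrt


def split_half_gf20(records):
    """Return {player: (even_gf20, odd_gf20)} from per-game records.

    records: iterable of (game_number, player, on_ice_goals_for, toi_minutes).
    """
    # player -> [[gf, toi] for even games, [gf, toi] for odd games]
    totals = defaultdict(lambda: [[0.0, 0.0], [0.0, 0.0]])
    for game, player, gf, toi in records:
        half = totals[player][game % 2]  # index 0 = even, 1 = odd
        half[0] += gf
        half[1] += toi
    result = {}
    for player, (even, odd) in totals.items():
        if even[1] > 0 and odd[1] > 0:  # need ice time in both halves
            result[player] = (20.0 * even[0] / even[1],
                              20.0 * odd[0] / odd[1])
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Tiny made-up sample: (game number, player, on-ice GF, TOI in minutes).
records = [
    (1, "A", 1, 15.0), (2, "A", 1, 15.0), (3, "A", 0, 15.0), (4, "A", 1, 15.0),
    (1, "B", 0, 10.0), (2, "B", 0, 10.0), (3, "B", 1, 10.0), (4, "B", 0, 10.0),
    (1, "C", 2, 20.0), (2, "C", 1, 20.0), (3, "C", 1, 20.0), (4, "C", 2, 20.0),
]

halves = split_half_gf20(records)
even = [halves[p][0] for p in sorted(halves)]
odd = [halves[p][1] for p in sorted(halves)]
r = pearson(even, odd)
print(halves)
print("even vs odd correlation:", r)
```

With real data one would also filter players by total minutes before computing the correlation, which is what the minute cutoffs in the table represent.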

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half) the upper bound on true randomness is 0.21 while the predictive power of GF20 is 0.79. With greater than 1000 minutes randomness drops to 0.15 and with greater than 1500 minutes and above the randomness is around 0.11-0.12. It’s interesting that setting the minimum above 1500 minutes (~750 in each even/odd half) of data doesn’t necessarily reduce the true randomness in GF20 which seems a little counter intuitive.

Let’s take a look at the predictive power of fenwick shooting percentage in even games to predict fenwick shooting percentage in odd games.

Total Minutes FSh% vs FSh%
>500 0.54
>1000 0.64
>1500 0.71
>2000 0.73
>2500 0.72
>3000 0.73
>4000 0.72
>5000 0.72

Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time and there appears to be no benefit to increasing the minimum minutes played beyond that.

To summarize what we have learned we have the following which is for forwards with >1000 minutes in each of 2007-10 and 2010-13.

GF20 predictive power 3yr vs 3yr 0.61
True Randomness Estimate 0.11
Unaccounted for factors estimate 0.28
Eric T’s regression benefit 0.05

There is no denying that a regression algorithm can provide modest improvements but this is only addressing 30% of what GF20 is failing to predict and it is highly doubtful that efforts to improve the regression algorithm any more will result in anything more than marginal benefits. The real benefit will come from researching the other 70% we don’t know about. It is a much more difficult question to answer but the benefit could be far more significant than any regression technique.
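The rough 30%/70% split comes straight from the summary numbers. Here is the arithmetic as a short Python sketch; the variable names are my own and the inputs are the figures from the summary above.

```python
raw_power = 0.61        # GF20 predictive power, 3yr vs 3yr
randomness = 0.11       # true randomness estimate from the even/odd split
regression_gain = 0.05  # improvement from Eric T's regression

# Everything raw GF20 fails to predict.
unexplained = 1.0 - raw_power

# What is left once true randomness is set aside.
unaccounted = unexplained - randomness

random_share = randomness / unexplained
unaccounted_share = unaccounted / unexplained

print(f"unexplained:        {unexplained:.2f}")
print(f"unaccounted for:    {unaccounted:.2f}")
print(f"randomness share:   {random_share:.0%}")
print(f"unaccounted share:  {unaccounted_share:.0%}")
print(f"regression gain vs. unexplained: {regression_gain / unexplained:.0%}")
```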

Addendum: After doing the above I thought, why not take this all the way and instead of doing even and odd games do even and odd seconds so what happens one second goes in one bin and what happens the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed but the predictive power of GF20 increases marginally, particularly when dealing with lower minute cutoffs.

Total Minutes GF20 vs GF20 FSh% vs FSh%
>500 0.81 0.58
>1000 0.86 0.68
>1500 0.88 0.71
>2000 0.89 0.73
>2500 0.89 0.73
>3000 0.90 0.75
>4000 0.90 0.73
>5000 0.89 0.71