Commentary on Hockey Analytics and Player Evaluation Models

Every time I see a chart like this being tweeted out I cringe.


I cringe because it gets a lot of attention. I cringe because it gets a lot of likes and retweets and accolades. I cringe because people put these kinds of stats on a pedestal, since they are presented as easy-to-understand numbers that are nicely visualized. I cringe because the underlying statistics are largely misleading.

Is Klefbom really the second best player in the NHL? No! Is he even remotely close? From the chart it seems Klefbom’s strength is even strength defense, yet we find that he was 6th among Oilers defensemen in GA60 (or, if you prefer, 4th in CA60). Among the 169 NHL defensemen with 750 minutes of 5v5 ice time, Klefbom ranks 151st in GA60 Rel and 84th in CA60 Rel. This is very consistent with his performance in previous years too. GAR aside, there is little evidence he is a good defensive defenseman. The chart is supposed to represent player contribution to goals above replacement, so I have to wonder: in what way is Klefbom so good at contributing to reducing goals against when he is on the ice for so many goals (and shots) against?
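For readers less familiar with the notation: GA60 is goals against per 60 minutes of ice time, and the “Rel” variants compare a player’s on-ice rate to his team’s rate with him off the ice. A minimal sketch of the arithmetic, using made-up numbers (not Klefbom’s actual totals):

```python
def per60(events, toi_minutes):
    """Rate stat: events per 60 minutes of ice time."""
    return events * 60.0 / toi_minutes

# Hypothetical illustrative numbers, not real player data.
on_ice_ga, on_ice_toi = 45, 1100.0    # goals against, minutes with player on ice
off_ice_ga, off_ice_toi = 80, 2900.0  # team goals against with player off ice

ga60 = per60(on_ice_ga, on_ice_toi)               # on-ice rate: ~2.45
ga60_rel = ga60 - per60(off_ice_ga, off_ice_toi)  # positive = team does worse with him on

print(round(ga60, 2), round(ga60_rel, 2))
```

CA60 and CA60 Rel work the same way, with shot attempts (Corsi) against in place of goals against.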

Anze Kopitar probably had the worst NHL season of his career, with just 12 goals and 52 points. Those 52 points were good for just 89th in the NHL. Among Kings forwards he ranked 8th in 5v5 GF% and 9th in 5v5 CF%. Despite this he manages to be the 4th best player in the NHL last season at contributing to goals above replacement? Something doesn’t seem quite right.
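(GF% and CF% are simple on-ice shares: the percentage of 5v5 goals, or shot attempts, that went the player’s way while he was on the ice. A quick sketch with hypothetical totals, not Kopitar’s real numbers:)

```python
def pct_for(events_for, events_against):
    """On-ice share of events, e.g. 5v5 GF% or CF%."""
    return 100.0 * events_for / (events_for + events_against)

# Hypothetical on-ice 5v5 totals for illustration only.
gf_pct = pct_for(40, 45)    # goals for vs. goals against -> ~47.1
cf_pct = pct_for(850, 900)  # shot attempts (Corsi) for vs. against -> ~48.6

print(round(gf_pct, 1), round(cf_pct, 1))
```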

Don’t even get me started on Nick Foligno again. He just isn’t a top 10 forward in the NHL.

My problem with these lists is they are presented with a level of confidence that they do not deserve. People jump in and start promoting Klefbom as one of the top defensemen in the NHL. He just isn’t. It is very misleading.

I don’t just pick on GAR either. A week or so ago I took issue with @web_sant over his player evaluation model.

Whenever I see someone come up with a new model, the first thing I do is look at the results and ask whether they make sense, even before I look at the underlying math. I know this offends some, but if the results don’t make sense I am not going to spend much time on the underlying math except to understand when and why it fails. Furthermore, if you are selling your model as a better way to make hockey decisions, you can’t expect NHL GMs to know everything about the underlying math; you have to sell it on the merits of the resulting evaluations. And you don’t need to know the underlying math to know that if Kopitar is rated 83 one year and 51 the next, the model is probably evaluating results, not player talent. Player talent should not fluctuate like that from year to year. I also know that if Klefbom is ranked highly for his defense and yet is regularly among the worst on his team in goals against, there is a good chance something is not right with the model. Sure, we can expect some surprising evaluations in any player evaluation model, but they shouldn’t be abundant and counter to most other evidence, and they still need to be explained.

A common critique when I point out these failures is that I am cherry picking and that, on the whole, the model works reasonably well. Maybe, but how do I know when it is working well and when it is not? Should I trade Klefbom or Ruhwedel for Karlsson because they are just as good on a far cheaper contract? The model says so.

I should also point out that cherry picking has been a significant part of the hockey analytics community’s critique of ‘traditional’ player evaluation for years. Whether it is the Avalanche or David Clarkson or Dave Bolland, the analytics community loves to point out when ‘traditional’ hockey people make mistakes. It is easy to say that analytics would have helped a team avoid the Bolland contract, but we also can’t ignore it when player evaluation models suggest that trading Ruhwedel for Karlsson might be a good idea. It is easy to point out where analytics would have made a wiser decision; it is far more difficult to prove that analytics is better overall than traditional decision making processes.

Those who have been in the hockey analytics community for a long time may find this strange, but I actually think Tyler Dellow has been doing some good stuff and making some good points lately. I think this because he is delving into the data to try and learn new things about how hockey works. I don’t agree with all of it, and not all of it will end up being beneficial, but it is still useful to do the digging. He had some particularly interesting comments on QoT and QoC earlier this week (and I followed with a few thoughts of my own as well). I have seen and participated in a lot of debates about QoC and QoT, and the more I look at things the more I wonder whether we really know how these factor into a player’s performance. I suspect a lot of the flaws we see in the models come from how poorly we are able to account for QoC, QoT, and the roles players get used in. We definitely need more investigation here, so I’ll applaud Dellow for doing the exploratory work. There is too little of this done in hockey analytics.

I want to wrap up this post with a few summary points.

  1. Developing models is great, but we must remember it isn’t sufficient to develop the best statistical model we can; we must develop a statistical model that an NHL team could integrate into its decision making process to make better decisions overall.
  2. In general, player production is largely not a function of a player’s own talent. The team he plays on, the line he plays on, the amount of ice time the coach gives him, and who the coach lines him up against all matter at least as much as his actual talent level. This is what makes hockey analytics so difficult compared to baseball, which is a far more individual sport. Outcomes are not individually driven; they are situation driven. Extracting individual player talent from outcome data is really, really difficult in hockey.
  3. Because of #2, using predictability as the primary goal of player evaluation models may not be wise. If situations change, we should expect outcomes to change. If we optimize models for predictiveness, we may just be optimizing for players whose situations don’t change. At minimum, it is useful to look at predictiveness for just the players that change teams.
  4. Individual player talent will not change much from year to year. Yes, age curves and injuries will change player talent over time, but generally a very talented player last year will be a very talented player this year. For this reason any player evaluation model should be evaluated on how persistent it is from year to year (i.e. Gardiner should not be a 47 one year and a 94 the next).
  5. Models are great, and everyone would love a single-number metric that tells us how good a player is, but at the same time we should never stop digging into the data to find new and interesting relationships. Rosters should never, ever get built using single-number player evaluations. We will always need to build teams with the right mix of defenders, playmakers, shooters and grinders, and single numbers will never tell you those details. To me, single-number player evaluations are for fans more than NHL front offices.


This article has 4 Comments

  1. Fantastic article.

    To test a metric, I like using half-seasons, comparing the first half of a season to the second half, since even less changes in that span than from year to year. Year over year, teams can change a lot, and player usage and the player himself are more likely to change.

  2. As for Klefbom, in my own tracking, he’s made a high number of major mistakes on Grade A scoring chances against two years in a row, though he came on like the real Tim Horton in the last month and playoffs, playing a far more aggressive and heady defensive game (which was reflected in him making far fewer such mistakes).

    Klefbom is a fine puck mover, and has been for some time now.

    If that end-of-season improvement was real, and if he can consistently maintain that level of play, he has a chance to become an honest to goodness No. 1 dman for his team, something the Oilers have lacked for years. On his own team, Sekera was a better, more consistent player this year on defence, in my opinion, as was Larsson, all things considered. Larsson was far more effective on defence, even as he was often left to clean up messes that came from Klefbom’s side of the ice.

    Klefbom, based on his inconsistent play last year, would likely be in the Top 60 NHL dmen, but that’s just a crude estimate of course.

    1. Agreed. He has some puck moving ability but is not great defensively. I personally think that limits him to a second pairing, but if he can become an average defensive player maybe he’ll be a #2 guy.

    2. Yeah, this is where shot metrics fall short. Defensive acumen as measured by metrics is about high quality scoring chance reduction and limiting multi-high quality chances per opposition zone sortie. The best and worst players at this all allow plenty of shots against, it’s the context that matters. I hope there’s a site that tracks HDSCA individually next season.
