Predictive Analytics Has Failed Hockey Analytics

A few weeks ago I wrote about the limitations of predictive analytics. Go read that if you haven’t. Here are some areas where predictive analytics has failed hockey analytics.

Gabe Desjardins used predictive analytics and suggested one must regress on-ice shooting percentage 80% towards the mean.

I found that player on-ice shooting regressed 80% to the mean…

 

Eric Tulsky used predictive analytics and suggested one must regress on-ice shooting percentage 67% towards the mean.

… our best guess for what he’d do in the future would be to take that number and pull it back 67% of the way towards the average shooting percentage.

In both those cases they were pulling on-ice shooting percentage back towards the league-wide average shooting percentage.

Both of these “regress to the mean” conclusions were based on predictive analytics, and if you accepted them you would rightfully conclude that a player’s ability to impact shooting percentage is a relatively minor component of a player’s offensive value. Shot generation would be far more significant. However, one could, as I did, just look at long-term on-ice shooting percentages and see that they don’t all congregate around 7.8 +/- 0.5% as 80% or 67% regression to the mean would suggest.

Later, however, Eric Tulsky modified his approach. He found that regressing to a league-wide mean wasn’t ideal and instead regressed towards a player-specific mean, which varied from player to player based on their ice time. This mean varied from approximately 6.2% to 9.7%.

… we would estimate that a forward who averages 15 minutes of 5v5 ice time per game would have a ~9.7% on-ice shooting percentage, and that his observed performance in a non-infinite sample should be regressed towards that value.

Regressing a player 67% towards 9.7% is significantly different from regressing that player 67% towards 7.8% (the approximate league-wide average, depending on season).
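To make that difference concrete, here is a minimal sketch of the arithmetic. The 10.0% observed on-ice shooting percentage is a made-up player used purely for illustration, and the function name `regress_to_mean` is my own; only the 80% and 67% regression amounts and the 7.8% and 9.7% means come from the discussion above.

```python
def regress_to_mean(observed, mean, fraction):
    """Pull `observed` the given fraction of the way towards `mean`."""
    return observed + fraction * (mean - observed)


# Hypothetical player (illustrative only, not taken from any of the cited work).
observed_sh_pct = 10.0

league_mean = 7.8    # approximate league-wide average quoted above
ice_time_mean = 9.7  # ice-time-based mean quoted above for a 15-minute forward

scenarios = [
    ("80% towards league-wide mean", league_mean, 0.80),
    ("67% towards league-wide mean", league_mean, 0.67),
    ("67% towards ice-time-based mean", ice_time_mean, 0.67),
]

for label, mean, fraction in scenarios:
    estimate = regress_to_mean(observed_sh_pct, mean, fraction)
    print(f"{label}: {estimate:.1f}%")
```

For this hypothetical player the estimates come out to roughly 8.2%, 8.5% and 9.8%. The same observed number produces a very different forecast depending on which mean you choose to regress towards.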

The initial regress-to-the-mean estimates that resulted from predictive analytics massively over-regressed on-ice shooting percentage.

In his article “What makes good players good,” Tom Awad found that shooting percentage independent of shot location (Awad calls shot location “shot quality” and this location-independent shooting percentage “finishing ability”) was a significant component of player value, at least on par with shot quantity.

The unmistakable conclusions from this table? Outshooting, out-qualitying and out-finishing all contribute to why Good Players dominate their opponents. Shot Quality only represents a small fraction of this advantage; outshooting and outfinishing are the largest contributors to good players +/-.

This was a non-predictive study that grouped players based on ice time and essentially found that ice time was a predictor of shooting percentage. Eric Tulsky found this too and, after 3 years of debate over whether to regress 80% or 67% towards the league-wide mean, eventually chose to regress to a mean shooting percentage reflective of ice time.

(Note: Another takeaway from the Awad article is that shot location, which is a core component of xGF models, is only a tiny contributing factor to making players better. This is a large reason why I don’t buy into xGF models. xGF just doesn’t improve things significantly.)

So when someone tells me that on-ice save percentage is not predictive or persistent, and thus not a player-controllable talent, I refuse to accept that as proof. Predictive analytics failed with shooting percentage and it is failing with save percentage too. Instead, we could consider a non-predictive investigation, or look at how players in certain roles influence save percentage, and see that players can influence save percentage. Even score effects (which everyone accepts) tell us that players can influence save percentage (when defending a lead, save percentages rise).

Over-reliance on predictive analytics has largely failed hockey analytics.

 

This article has 1 Comment

  1. I use machine learning and love it when applied to the NHL. With the limited data I was able to access I was able to predict the SC winner without seeing a single game all season.
    How it is put together is what counts.
