Limitations of Predictive Analytics in Player Evaluation

Hockey analytics generally revolves around identifying which statistics are best at predicting future events, or “out of sample” events more generally. The main reason Corsi has become such a popular statistic in hockey analytics is that it is more predictive of future results. We look at score- and venue-adjusted Corsi because we have found it to be more predictive still. Many people have developed “expected goals” models and are adopting them because they have been found to be more predictive than Corsi.

On the flip side, hockey analytics once largely dismissed on-ice shooting percentage as an important talent due to its relatively poor predictive ability, which led many people to conclude that shooting percentage would need to be regressed to the mean by such a large amount that it is relatively useless. While there is more acceptance of on-ice shooting percentage as a talent now than 5+ years ago, its importance is still vastly downplayed in player evaluation in favour of Corsi or expected goals metrics.

Most of the same arguments against the value of shooting percentage have now transferred over to a player’s ability to influence save percentage. It is not predictive, the claim goes, therefore it is not a talent (or at least not important in player evaluation).

Needless to say, hockey analytics is largely obsessed with predictive statistics. There are lots of reasons to want to be better at predicting the future, but I think we need to be careful about how obsessed we get with it when it comes to player evaluation. I have previously written about the difference between predicting the future and evaluating the past. In that piece I pointed out that predicting the future and player evaluation have subtle differences that we must keep in mind.

The methodology you use to regress data matters a great deal depending on what your goal is. Let me present two scenarios:

  1. You play fantasy sports and you want to predict what a player’s statistics will be in future seasons for your keeper league.
  2. You are the general manager of an NHL organization and you want to evaluate a player’s current talent to decide whether to trade for him for your playoff run.

These two scenarios lead to two related, but very different, questions. The first: what statistics can I expect this player to put up in the future? The second: how good is this player right now? Those questions may sound similar, but they must be answered in different ways.

My interest in hockey analytics is primarily in evaluating players from a general manager’s perspective, not in predicting future performance from a fantasy sports perspective (though I have done that too). With that in mind, I want to demonstrate how focusing on predictability in player evaluation will necessarily lead one to underestimate the importance of variables that suffer from small sample sizes (shooting and save percentages) in favour of ones that do not (shot metrics).

Comparing On-ice Sh% Talent vs Modelled Observed On-ice Sh%

To accomplish this, I want you to imagine an ideal world where we have 300 hockey players and we know exactly what each one’s shooting percentage would be if not for randomness and luck. In hockey analytics we normally have results and are trying to estimate talent, but in this ideal world we know talent and I am going to model results. By doing this we will be better able to understand the relationship between talent and results, as opposed to understanding only the predictive ability of statistics.

To keep our model somewhat close to NHL reality, I am going to assign the 300 players on-ice shooting percentages somewhat representative of actual NHL shooting percentages. The top 300 forwards have long-term on-ice shooting percentages between 5.0% and 10.5%. For this experiment I’ll assume a more normal range of 6% to 10%, which is where most players lie anyway. For the first 200 players I assigned shooting percentages in increments of 0.02%, starting with 6.02% and ending with 10.0% (i.e. 6.02%, 6.04%, 6.06%, …, 9.98%, 10.0%). As NHL players are a little more tightly packed in the middle, I assigned the remaining 100 players shooting percentages in increments of 0.02% starting from 7.02% (i.e. 7.02%, 7.04%, …, 8.98%, 9.00%).
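
For concreteness, here is a minimal sketch of that talent assignment in Python (NumPy assumed; the variable names are mine, not from the original analysis):

```python
import numpy as np

# 200 players evenly spaced from 6.02% to 10.00% in 0.02% steps,
# plus 100 more packed into the middle of the range (7.02%-9.00%).
wide   = np.arange(6.02, 10.00 + 1e-9, 0.02) / 100   # 200 talent values
middle = np.arange(7.02,  9.00 + 1e-9, 0.02) / 100   # 100 talent values
talent = np.concatenate([wide, middle])              # 300 "true" on-ice Sh% values
assert talent.size == 300
```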

Also, in any given year the top 300 forwards will each be on the ice for approximately 300 to 700 shots for, so for each player I randomly assigned a number between 300 and 700 to represent the number of shots they were on the ice for. Using a weighted coin flipping technique, I modelled ‘actual’ shooting percentages based on each player’s real shooting percentage talent and their real shots for total.
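
The weighted coin flip step is equivalent to one binomial draw per player: each shot for goes in with probability equal to the player’s true talent. A minimal sketch, continuing from the `talent` array above (the seed is arbitrary):

```python
rng = np.random.default_rng(seed=1)

# Each player is on the ice for a random 300-700 shots for.
shots = rng.integers(300, 701, size=talent.size)

# One weighted coin flip per shot = one binomial draw per player.
goals = rng.binomial(shots, talent)
observed_sh = goals / shots   # one modelled season of observed on-ice Sh%
```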

How well does this model reflect reality?

Before moving any further, I wanted to get an idea of whether the distribution of these modelled shooting percentages comes close to reflecting real NHL shooting percentages. To do so, I sorted my modelled shooting percentages from lowest to highest. I then did the same for NHL shooting percentages from the 2013-14 season. This is what we find when we plot them together.
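
A sketch of that comparison, assuming an `nhl_sh` array holding the 2013-14 observed on-ice Sh% values for the top 300 forwards (data you would need to supply):

```python
import matplotlib.pyplot as plt

plt.plot(np.sort(observed_sh) * 100, label="Modelled on-ice Sh%")
plt.plot(np.sort(nhl_sh) * 100, label="Actual 2013-14 on-ice Sh%")
plt.xlabel("Player rank (sorted lowest to highest)")
plt.ylabel("On-ice Sh% (%)")
plt.legend()
plt.show()
```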

The model fails a bit at the low end, as there are more NHLers with poorer on-ice shooting percentages (I should have used a range of 5-10%, not 6-10%), but generally speaking the modelled shooting percentages are pretty representative of actual NHL shooting percentages.

How well does Modelled Sh% correlate with Sh% Talent?

In this idealized world that I have created, we know (because we assigned it) each player’s true expected on-ice shooting percentage, and we also have their modelled on-ice shooting percentage. How well do these correlate? Let’s have a look.

An R^2 of 0.35 is not too bad; there is definitely a correlation here. The real question is, if we modelled a second season under the exact same conditions, how would Modelled Season 1 compare to Modelled Season 2? Let’s have a look.

An R^2 of 0.08 is pretty terrible, which indicates that Sh% has very little predictive ability, despite the fact that a single season of modelled Sh% is actually somewhat representative of actual shooting percentage talent.

Now, this was just one attempt, and it turns out to be a bit of an outlier. I ran the same test 5 times and took the average R^2, getting somewhat better results. The average R^2 for one modelled season vs. assigned shooting percentage talent is 0.39, and the average R^2 for Modelled Season 1 vs. Modelled Season 2 is 0.17. This still means that single-season Sh% may not be all that predictive, but it is far more representative of actual Sh% talent.
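
A sketch of that repeated experiment, reusing the `talent` array from above (the helper names are mine):

```python
def r_squared(x, y):
    # Squared Pearson correlation.
    return np.corrcoef(x, y)[0, 1] ** 2

def model_season(rng, talent, shots):
    # One modelled season: binomial goals over a fixed shot sample.
    return rng.binomial(shots, talent) / shots

talent_r2, predictive_r2 = [], []
for trial in range(5):
    rng = np.random.default_rng(trial)
    shots = rng.integers(300, 701, size=talent.size)
    season1 = model_season(rng, talent, shots)
    season2 = model_season(rng, talent, shots)
    talent_r2.append(r_squared(talent, season1))       # results vs. true talent
    predictive_r2.append(r_squared(season1, season2))  # results vs. future results

# Averages land in the neighbourhood of the 0.39 and 0.17 quoted above.
print(np.mean(talent_r2), np.mean(predictive_r2))
```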

I continued the analysis by modelling 2, 3, 4 and 5 years of data by doubling, tripling, quadrupling and quintupling the shot sample size respectively (a sketch of this extension follows the table). I again ran each 5 times and calculated an average R^2. Here is what I found.

Sample       R^2 vs Actual Talent    Predictive R^2
1 Season     0.39                    0.17
2 Seasons    0.55                    0.32
3 Seasons    0.67                    0.46
4 Seasons    0.72                    0.51
5 Seasons    0.77                    0.60
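
The multi-season extension referenced above, reusing the helpers from the previous snippet; pooling k seasons is modelled as multiplying each shot sample by k:

```python
for k in range(1, 6):   # 1 through 5 pooled seasons
    t_r2, p_r2 = [], []
    for trial in range(5):
        rng = np.random.default_rng(100 * k + trial)
        shots = rng.integers(300, 701, size=talent.size) * k
        season1 = model_season(rng, talent, shots)
        season2 = model_season(rng, talent, shots)
        t_r2.append(r_squared(talent, season1))
        p_r2.append(r_squared(season1, season2))
    print(f"{k} season(s): talent R^2 {np.mean(t_r2):.2f}, "
          f"predictive R^2 {np.mean(p_r2):.2f}")
```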

With just two seasons of data, observed shooting percentage explains 55% of the variance in actual shooting percentage talent, while it explains only 32% of the variance in a future observed season. If a player maintains a good shooting percentage for ~2 seasons (less if they get a lot of ice time), it is likely that there is real talent there.

It is important to remember that with real NHL data the predictive R^2 will be far worse, because players’ situations (teams, roles, coaches, injuries, age curves, etc.) change over time, while in my model I have held their true on-ice Sh% talent constant.

The reason we can’t predict the future as well as we can estimate talent is that, in predicting the future, we are using data influenced by randomness to predict more data influenced by randomness. There is randomness on both sides of the equation.
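
One way to formalize this is classical test theory: suppose an observed result x is true talent t plus independent noise e, and define the reliability rho = Var(t)/Var(x). Then the R^2 between one observed season and talent is rho, while the R^2 between two observed seasons is rho^2. In other words, the predictive R^2 should sit near the square of the talent R^2, which is roughly what the table above shows: 0.39^2 ≈ 0.15 vs. 0.17, 0.55^2 ≈ 0.30 vs. 0.32, and 0.77^2 ≈ 0.59 vs. 0.60. (This is a sketch; the binomial noise in my model is not perfectly identical across players, but the approximation is close.)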

Corsi metrics, on the other hand, are far more predictive because they are far more frequently occurring events and thus far less impacted by randomness. This makes shot metrics look vastly superior to shooting percentage, but I believe the model above shows that we are actually undervaluing shooting percentage in player evaluation.

What about Save Percentage?

The same is true for save percentage. Hockey analytics has largely downplayed a player’s ability to impact save percentage because it isn’t very predictive (here is another article). That said, save percentage is no different from shooting percentage: using predictive ability to evaluate how useful it is in player evaluation will result in us undervaluing it.

Dealing with save percentage is more difficult than shooting percentage because of goalie impacts. If you are Patrice Bergeron and played in front of Tim Thomas and Tuukka Rask your whole career, or Derek Stepan and played in front of Henrik Lundqvist your whole career, your numbers are going to look far better than Jordan Eberle’s, behind the revolving door of goaltending failures the Oilers have put out there. Ideally we want to factor a player’s own goalies out of the equation when we look at player impact.

For a variety of reasons, I believe a player’s ability to impact save percentage is about half their ability to impact shooting percentage. The standard deviation of long-term Sh%Rel is 1.177 while it is 0.585 for Sv%Rel (almost exactly half). For Sh%RelTM it is 0.96 vs. 0.55 for Sv%RelTM. Also, when I was creating the plots for my recent Roles and Stats series of posts, I generally had to halve the scales used on the offensive stat charts for the defensive stat charts. Players impacting save percentage at about half their ability to impact shooting percentage is probably a pretty good rule of thumb (it probably applies to shot generation vs. shot suppression ability too).

This one-half rule of thumb makes adapting my shooting percentage model to save percentage easy: instead of a range of 6-10% I’ll use a range of 7-9%, and instead of 0.02% increments I’ll use 0.01% increments (a sketch of the change follows). Run through all the numbers and here is what we find for save percentage.
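
A sketch of that parameter change; the text above only specifies the 7-9% range and 0.01% increments, so the middle-of-range packing for the extra 100 players is my assumption, mirroring the Sh% setup:

```python
# Halved talent spread: 7.01%-9.00% in 0.01% steps for 200 players,
# plus 100 players packed into the middle (7.51%-8.50%, my assumption).
wide      = np.arange(7.01, 9.00 + 1e-9, 0.01) / 100
middle    = np.arange(7.51, 8.50 + 1e-9, 0.01) / 100
sv_talent = np.concatenate([wide, middle])   # 300 "true" values

# Rerun the 1-5 season loops above with sv_talent in place of talent.
```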

Sample       R^2 vs Actual Talent    Predictive R^2
1 Season     0.15                    0.02
2 Seasons    0.25                    0.08
3 Seasons    0.32                    0.11
4 Seasons    0.41                    0.15
5 Seasons    0.46                    0.22

Now, these are much lower than what we saw for shooting percentage because the variation in talent is half as large. Essentially, the signal (talent variation) to noise (randomness) ratio is smaller, so it is more difficult to have confidence in our observations. However, there is still value in our observations above and beyond what the predictive R^2 values would indicate, and this is critically important to know.

Takeaways and Conclusions

The number one takeaway from this article is that predicting the future is different from evaluating the past. The future is necessarily different from the past, and one should recognise that while predicting the future and evaluating the past are related questions, they are also different questions; we need to understand that when jumping between the two.

If we apply the predictive limitations of a statistic to evaluating past data, we will necessarily undervalue variables that are influenced by randomness. If we then use predictive ability as the basis for regressing statistics in player evaluation, we will necessarily over-regress (something hockey analytics did for years with shooting percentage).

So when I throw out claims like Kris Russell or Matt Hunwick being better than their Corsi statistics suggest, I am not doing so just to rile up the analytics community; I do it because I believe it to be true. In fact, there is ample evidence that they are better defensively than their shots-against statistics suggest, as we see in their 5v5close Sv%RelTM numbers.

Season     Russell Sv%RelTM    Hunwick Sv%RelTM
2007-08    1.7                 2.1
2008-09    1.2                 1.4
2009-10    2.8                 1.5
2010-11    0.1                 0.5
2011-12    1.1                 -0.8
2012-13    1.5                 1.8
2013-14    0.5                 DNP
2014-15    2.0                 1.0
2015-16    -2.2                0.8
2016-17    1.8                 1.3

That is pretty consistently good, and when you do it long enough the uncertainty diminishes significantly. Nearly every day on Twitter I see people railing against Hunwick because of his poor Corsi, desperately wanting to see him benched in favour of Frank Corrado, but as I have written before, that is a solution to a theoretical problem, not a real one. Hockey analytics has largely dismissed players’ ability to influence save percentage because of the poor predictive ability of Sv%Rel and Sv%RelTM. I hope I have shown above that this is poor analytics, and that as a result Hunwick is a hockey analytics problem, not a Mike Babcock/Toronto Maple Leafs problem.