Estimating actual randomness in goal data
If you have been following the discussion between Eric T and me, you will know that there has been a rigorous debate over where hockey analytics is now, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t and want to read the whole debate, you can start here, then read this, followed by this and then this.
My original reason for writing my first post on the subject was that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed-minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, by adjusting for quality of teammates (QoT), quality of competition (QoC), usage, score effects, coaching styles, etc. can be equally beneficial, and that the focus need not be on regressing to the mean.
In Eric T’s last post on the subject, he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics, so we can’t see where it is still failing miserably), in which he used time on ice to choose the mean toward which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean he initially proposed, but the benefits are still somewhat modest. For players who played more than 1000 minutes in the 3 years from 2007-10 and more than 1000 minutes in the 3 years from 2010-13, the predictive power of his regressed GF20 (goals for per 20 minutes of ice time) for future GF20 was 0.66, which was 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05, while 0.34 still remains unexplained. The question I attempt to answer today is this: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted-for skill or situational variance?
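To make the idea of regressing toward a TOI-dependent mean concrete, here is a minimal sketch of generic shrinkage. It is not Eric T’s actual method: the TOI brackets, the prior means and the shrinkage weight `k` are all hypothetical. It also assumes GF20 is rebuilt as fenwick events per 20 minutes (FF20) times fenwick shooting percentage, which holds exactly for raw stats, with only the shooting-percentage factor regressed.

```python
# Hypothetical TOI brackets and prior means (illustrative only, not Eric T's values).
# Each entry is (minimum TOI in minutes, prior on-ice fenwick shooting %).
TOI_PRIORS = [(0.0, 0.070), (500.0, 0.075), (1000.0, 0.080), (1500.0, 0.085)]


def prior_for_toi(toi_minutes: float) -> float:
    """Prior mean of the highest bracket whose floor does not exceed toi_minutes."""
    prior = TOI_PRIORS[0][1]
    for floor, mean in TOI_PRIORS:
        if toi_minutes >= floor:
            prior = mean
    return prior


def regressed_fsh_pct(goals: int, fenwick: int, toi_minutes: float, k: float = 300.0) -> float:
    """Shrink observed FSh% toward the TOI-based prior.

    k acts like k extra fenwick events at the prior rate (hypothetical weight):
    small samples are pulled strongly toward the prior, large samples barely move.
    """
    return (goals + k * prior_for_toi(toi_minutes)) / (fenwick + k)


def regressed_gf20(goals: int, fenwick: int, toi_minutes: float, k: float = 300.0) -> float:
    """GF20 = FF20 * FSh%, with the shooting-percentage factor regressed."""
    ff20 = fenwick / toi_minutes * 20.0
    return ff20 * regressed_fsh_pct(goals, fenwick, toi_minutes, k)


if __name__ == "__main__":
    goals, fenwick, toi = 40, 400, 1200.0
    print(f"raw GF20:       {goals / toi * 20:.3f}")
    print(f"regressed GF20: {regressed_gf20(goals, fenwick, toi):.3f}")
```

With these made-up inputs (40 goals on 400 fenwick events in 1200 minutes), a raw GF20 of about 0.667 is pulled down to about 0.610; a larger `k` pulls harder toward the prior.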
When we compare 2007-10 GF20 to 2010-13 GF20, many factors can explain the differences: a change in quality of competition, a change in quality of teammates, a change in coaching style, natural career progression of the player, zone start usage, possibly any number of other factors that we do not currently know about, and true randomness. To get a true measure of the random component of a player’s stats, we need to neutralize all of the non-random factors that we do not yet know how to fully adjust for, which means finding two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those halves. This should, more or less, split a player’s 6 years evenly in half so that all those other factors are roughly equivalent across halves. The following table shows how well a player’s GF20 in the even half predicts his GF20 in the odd half, based on the total minutes (across both halves) that the player has played.
| Total Minutes | GF20 vs GF20 |
|---|---|
For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half), the upper bound on true randomness is 0.21, while the predictive power of GF20 is 0.79. It is an upper bound because any differences between the halves that remain also lower the predictive power. With more than 1000 minutes, randomness drops to 0.15, and with 1500 minutes or more it is around 0.11-0.12. Interestingly, setting the minimum above 1500 minutes (~750 in each even/odd half) doesn’t necessarily reduce the true randomness in GF20, which seems a little counterintuitive.
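The even/odd game split can be sketched as below, assuming per-game records of a player’s ice time and on-ice goals for (the record format and function names are hypothetical). It also assumes “predictive power” means the squared Pearson correlation (r²) between the two halves, so that 1 − r² is the randomness upper bound; the post does not state which statistic was used.

```python
from collections import defaultdict

# Hypothetical record format: (player, game_id, toi_minutes, on_ice_goals_for).
Record = tuple[str, int, float, int]


def split_gf20(records: list[Record]) -> dict[str, tuple[float, float, float]]:
    """Return {player: (even-game GF20, odd-game GF20, total minutes)}.

    Games are binned by the parity of game_id; players with no ice time
    in one of the halves are dropped.
    """
    toi = defaultdict(lambda: [0.0, 0.0])
    gf = defaultdict(lambda: [0, 0])
    for player, game_id, minutes, goals in records:
        half = game_id % 2  # 0 = even game, 1 = odd game
        toi[player][half] += minutes
        gf[player][half] += goals
    out = {}
    for player, (t_even, t_odd) in toi.items():
        if t_even > 0 and t_odd > 0:
            g_even, g_odd = gf[player]
            # GF20 = goals per 20 minutes of ice time
            out[player] = (g_even / t_even * 20, g_odd / t_odd * 20, t_even + t_odd)
    return out


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation; needs variance in both samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def even_odd_predictive_power(records: list[Record], min_total_minutes: float) -> float:
    """r^2 of even-game GF20 vs odd-game GF20 for players at or above the cutoff."""
    halves = [v for v in split_gf20(records).values() if v[2] >= min_total_minutes]
    return r_squared([v[0] for v in halves], [v[1] for v in halves])
```

Evaluating `1 - even_odd_predictive_power(records, cutoff)` for cutoffs of 500, 1000, 1500 and so on would give one row per cutoff of a table like the one above.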
Let’s take a look at how well fenwick shooting percentage in even games predicts fenwick shooting percentage in odd games.
| Total Minutes | FSh% vs FSh% |
|---|---|
Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time, and there appears to be no benefit to increasing the minimum minutes played beyond that.
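Here is a sketch of the fenwick shooting percentage version, sweeping the minimum-minutes cutoff to see where the randomness estimate stops falling. The input format is hypothetical, and 1 − r² is again assumed as the randomness measure. FSh% is pooled within each half (total goals divided by total fenwick events), which weights games with more events more heavily than averaging game-level percentages would.

```python
# Hypothetical per-player totals already split by even/odd game:
# (total_minutes, goals_even, fenwick_even, goals_odd, fenwick_odd).
PlayerHalves = tuple[float, int, int, int, int]


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation; needs variance in both samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def randomness_by_cutoff(players: list[PlayerHalves], cutoffs: list[float]) -> dict[float, float]:
    """Map each minimum-minutes cutoff to 1 - r^2 of even-half vs odd-half FSh%."""
    result = {}
    for cutoff in cutoffs:
        # keep players above the cutoff with at least one fenwick event per half
        kept = [p for p in players if p[0] >= cutoff and p[2] > 0 and p[4] > 0]
        even = [g_e / f_e for _, g_e, f_e, _, _ in kept]
        odd = [g_o / f_o for _, _, _, g_o, f_o in kept]
        result[cutoff] = 1.0 - r_squared(even, odd)
    return result
```

The cutoff where successive values stop shrinking is the point at which, in the post’s data, adding more minimum minutes no longer helps.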
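To summarize what we have learned, here are the figures for forwards with more than 1000 minutes in each of 2007-10 and 2010-13 (at least 2000 minutes in total, well into the range where the even/odd randomness estimate bottoms out):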
| Measure | Value |
|---|---|
| GF20 predictive power, 3yr vs 3yr | 0.61 |
| True randomness estimate | 0.11 |
| Unaccounted-for factors estimate | 0.28 |
| Eric T’s regression benefit | 0.05 |
There is no denying that a regression algorithm can provide modest improvements, but regression can only address the random component, which is 0.11 of the 0.39 that GF20 fails to predict, or roughly 30%. It is highly doubtful that further efforts to improve the regression algorithm will result in anything more than marginal benefits. The real benefit will come from researching the other 70% (the 0.28 of unaccounted-for factors) that we don’t know about. It is a much more difficult question to answer, but the benefit could be far more significant than that of any regression technique.
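The accounting behind the summary table is simple arithmetic; this snippet just reproduces it from the table’s own figures, showing where the roughly 30%/70% split of the unexplained portion comes from.

```python
# Figures from the summary table (forwards, >1000 minutes in each 3-year span).
predictive_power = 0.61  # 2007-10 GF20 predicting 2010-13 GF20
randomness = 0.11        # even/odd estimate of true randomness

unexplained = 1.0 - predictive_power           # what GF20 fails to predict (0.39)
unaccounted = unexplained - randomness         # left for skill/situational factors (0.28)
random_share = randomness / unexplained        # share regression could target
unaccounted_share = unaccounted / unexplained  # share needing other research

print(f"unexplained={unexplained:.2f} unaccounted={unaccounted:.2f}")
print(f"random share={random_share:.0%} unaccounted share={unaccounted_share:.0%}")
```

The shares come out to about 28% and 72%, which the text rounds to 30% and 70%.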
Addendum: After doing the above, I thought: why not take this all the way? Instead of splitting by even and odd games, split by even and odd seconds, so that what happens in one second goes in one bin and what happens in the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed, but the predictive power of GF20 increases marginally, particularly at the lower minute cutoffs.
| Total Minutes | GF20 vs GF20 | FSh% vs FSh% |
|---|---|---|
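The even/odd-second split from the addendum can be sketched for a single player as follows, assuming shifts are given as whole-second [start, end) intervals of game time and on-ice goals for are timestamped to the second (both formats and the function name are hypothetical). Ice time in each bin is counted from second parity with a closed-form count rather than by looping over every second.

```python
def second_parity_gf20(shifts: list[tuple[int, int]], goal_seconds: list[int]) -> tuple[float, float]:
    """Return (even-second GF20, odd-second GF20) for one player.

    shifts: on-ice intervals [start, end) in whole game seconds.
    goal_seconds: game second of each on-ice goal for.
    """
    even_secs = 0
    total_secs = 0
    for start, end in shifts:
        # even integers in [start, end) = ceil(end / 2) - ceil(start / 2)
        even_secs += (end + 1) // 2 - (start + 1) // 2
        total_secs += end - start
    odd_secs = total_secs - even_secs
    if even_secs == 0 or odd_secs == 0:
        raise ValueError("player needs ice time in both the even and odd bins")
    even_goals = sum(1 for s in goal_seconds if s % 2 == 0)
    odd_goals = len(goal_seconds) - even_goals
    # GF20 = goals per 20 minutes = goals / (seconds / 1200)
    return even_goals / even_secs * 1200, odd_goals / odd_secs * 1200
```

Running this per player and feeding the two bins into the same even/odd comparison as before gives the second-level version of the table.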