Order from Randomness?

If you want to claim that a piece of data is random, there must be no identifiable patterns within it; if there are, the data is not random.

For example, one can easily look at a long-term list of forwards sorted by on-ice shooting percentage and clearly see that it is not random. The top of the list is dominated by the players we would identify as elite offensive forwards, and the bottom of the list is dominated by 3rd and 4th liners. Even with just 2 years of data the list is fairly well sorted, with a range and standard deviation not much greater than those from 8 years of data. There is meaning in the data.
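Why the spread comparison matters: if on-ice shooting percentage were pure luck, the player-to-player spread should shrink with the square root of the number of shots, so 2 years of data should look far more spread out than 8. A minimal sketch, assuming every player shares the same true shooting percentage and each shot is an independent chance event (a binomial model); the 8% rate and 400 shots per season are illustrative assumptions, not figures from this post:

```python
import math


def luck_only_sd(true_pct: float, shots: int) -> float:
    """Standard deviation of observed shooting % across players when every
    player has the same true rate and differences come only from chance
    (binomial model)."""
    return math.sqrt(true_pct * (1 - true_pct) / shots)


# Illustrative assumptions: an 8% true rate and 400 on-ice shots per season.
TRUE_PCT = 0.08
SHOTS_PER_SEASON = 400

sd_2yr = luck_only_sd(TRUE_PCT, 2 * SHOTS_PER_SEASON)
sd_8yr = luck_only_sd(TRUE_PCT, 8 * SHOTS_PER_SEASON)

# Four times the shots means half the spread, whatever rate is assumed.
print(f"2-year SD: {sd_2yr:.4f}, 8-year SD: {sd_8yr:.4f}, ratio: {sd_2yr / sd_8yr:.2f}")
```

Under luck alone the 2-year spread would be about double the 8-year spread, so a 2-year spread that is not much larger than the 8-year one points to real, persistent differences between players.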

This brings us to the latest stat I have been discussing, Sv%RelTM. In yesterday's post I used 8 years of data to show that players who post a poor Sv%RelTM are generally more offence-oriented, while those who post a good Sv%RelTM are more likely to be defence-oriented.

Of course, while that is a long-term dataset, it is still just one dataset, and who knows, maybe it just happened to turn out that way by luck. Plus, we wouldn't want to wait 8 years to draw any conclusions, so I wanted to present some more data using 2-season datasets. In these charts I will look at the top 25 and bottom 25 forwards in Sv%RelTM and compare the group averages of the following stats:

  • GF60 RelTM
  • CF60 RelTM
  • % of Team DZFO / % of Team OZFO

You are probably well aware of the first two statistics, which measure a player's offensive results relative to the players he plays with (i.e. is he a better or worse offensive player than his teammates). The last stat indicates whether a player started more shifts in the defensive zone than in the offensive zone: the higher the number, the more defensive the player's role is likely to be. So, here is what we get.

Chart: Sv%RelTM vs. GF60 RelTM

Chart: Sv%RelTM vs. CF60 RelTM

Chart: Sv%RelTM vs. zone starts

(Note: all data used is 5v5close data for forwards with at least 600 minutes of ice time in the 2 season datasets and 2000 minutes in the 8 season dataset)
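The comparison described above (filter forwards by ice time, rank them by Sv%RelTM, then average each stat over the top and bottom 25) can be sketched as follows. The record layout and field names are hypothetical; the 600-minute threshold and the group size of 25 come from the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ForwardSeason:
    # Hypothetical record layout for one forward over one dataset window.
    name: str
    toi_minutes: float     # 5v5 close ice time
    sv_pct_rel_tm: float   # Sv%RelTM, in percentage points
    gf60_rel_tm: float     # GF60 RelTM
    cf60_rel_tm: float     # CF60 RelTM
    pct_team_dzfo: float   # player's % of the team's defensive zone faceoffs
    pct_team_ozfo: float   # player's % of the team's offensive zone faceoffs

    @property
    def zone_start_ratio(self) -> float:
        # Higher values suggest a more defensive deployment.
        return self.pct_team_dzfo / self.pct_team_ozfo


def compare_groups(players, min_toi=600.0, group_size=25):
    """Average stats for the best and worst `group_size` forwards by Sv%RelTM."""
    eligible = [p for p in players if p.toi_minutes >= min_toi]
    if len(eligible) < 2 * group_size:
        raise ValueError("not enough eligible forwards for two distinct groups")
    ranked = sorted(eligible, key=lambda p: p.sv_pct_rel_tm, reverse=True)
    groups = {"top": ranked[:group_size], "bottom": ranked[-group_size:]}
    return {
        label: {
            "Sv%RelTM": mean(p.sv_pct_rel_tm for p in group),
            "GF60 RelTM": mean(p.gf60_rel_tm for p in group),
            "CF60 RelTM": mean(p.cf60_rel_tm for p in group),
            "DZFO/OZFO": mean(p.zone_start_ratio for p in group),
        }
        for label, group in groups.items()
    }
```

For the 8-season dataset the same function would be called with `min_toi=2000.0`.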

What do we notice in these charts? Relative consistency. In every dataset, the poor Sv%RelTM players are on average better offensive players and get a smaller percentage of the defensive zone starts (and presumably a larger percentage of the offensive zone starts). You can't get order from randomness, so we can only conclude that Sv%RelTM is not a purely random stat.
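To see why a consistent pattern argues against pure randomness, a toy null model helps: if Sv%RelTM were assigned by chance, independent of role, the top-25 and bottom-25 groups should show no systematic gap in GF60 RelTM. This is a hedged sketch with made-up distributions (independent standard normal draws and a hypothetical pool of 300 forwards), not the data behind the charts.

```python
import random
from statistics import mean


def null_gap(n_players=300, group_size=25, seed=None):
    """Bottom-group minus top-group average GF60 RelTM when Sv%RelTM is pure
    noise. Both stats are drawn independently from a standard normal, an
    arbitrary illustrative choice."""
    rng = random.Random(seed)
    # (sv_rel_tm, gf60_rel_tm) pairs with no relationship between them
    players = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_players)]
    ranked = sorted(players, key=lambda p: p[0], reverse=True)
    top, bottom = ranked[:group_size], ranked[-group_size:]
    return mean(p[1] for p in bottom) - mean(p[1] for p in top)


gaps = [null_gap(seed=s) for s in range(1000)]
share_bottom_ahead = sum(g > 0 for g in gaps) / len(gaps)
print(f"mean gap: {mean(gaps):+.3f}, bottom group ahead in {share_bottom_ahead:.0%} of samples")
```

Under this null model each independent dataset is roughly a coin flip, so the chance that k independent datasets all favour the poor Sv%RelTM group is about 0.5 to the power k (about 6% for four).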

How important is Sv%RelTM? Over 8 years, the top 25 players had an average Sv%RelTM of +1.4 and the bottom 25 players had an average Sv%RelTM of -1.3 (the relative symmetry is nice too). If the average starting goalie posts a .915 save percentage, behind the top group he would have a .929 save percentage and behind the bottom group a .902 save percentage. Is that insignificant? While it is smaller than the variance in shooting percentage, I wouldn't say it is insignificant at all.
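The save percentage arithmetic above, written out. This assumes, as the .929 and .902 figures imply, that Sv%RelTM is expressed in percentage points and simply shifts the goalie's baseline; the per-1,000-shots volume is a hypothetical scale chosen only to express the gap in goals.

```python
def implied_save_pct(baseline_sv: float, sv_rel_tm: float) -> float:
    """Baseline save percentage shifted by Sv%RelTM given in percentage
    points (+1.4 means +0.014)."""
    return baseline_sv + sv_rel_tm / 100


baseline = 0.915
top = implied_save_pct(baseline, 1.4)      # top 25 average from the post
bottom = implied_save_pct(baseline, -1.3)  # bottom 25 average from the post
print(f"top group: {top:.3f}, bottom group: {bottom:.3f}")  # top group: 0.929, bottom group: 0.902

# Hypothetical volume: goals-against gap per 1,000 shots faced.
print(f"gap per 1,000 shots: {(top - bottom) * 1000:.0f} goals")  # gap per 1,000 shots: 27 goals
```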

But what about the lack of year over year persistence?

The lack of year over year persistence in Sv%RelTM and similar statistics is the main reason people doubt that players can in fact influence save percentage. No doubt this is a challenge to explain, but my analysis above clearly indicates that something non-random is going on. How can we reconcile two observations that seem to tell us completely opposite things? The best explanation I can suggest is that playing style is the most significant driver of Sv%RelTM, and that it is possible very few players play the same style over multiple seasons. They may move up or down the lineup, change teams, or get a different or more balanced role from a new coach. Furthermore, there are actually very few purely offensive or purely defensive players in the league, so we may only be talking about the outliers actually influencing save percentage to a significant degree. That means for many players Sv%RelTM may not be important to consider, but for those in specialized roles it needs to be considered.

All of this tells us once again that role and playing style seem to be significant factors in the statistics players put up. This isn't the first time I have shown that playing style may dramatically influence a player's statistics (see The Bozak-Corsi Dilemma and The Coaching-Corsi Dilemma). Playing style is a significantly understudied area of hockey analytics but might be far more important than we realize, especially for players with more specialized roles. We need to find better methodologies to identify, study, and account for the roles players play and the coaching style they are playing under.

To finish up, it is important to remember that order is not a result of randomness. A statistic may seem random in year over year correlations, but if it exhibits order and structure in other areas, it isn't random; other factors are at play that make it look random, and we need to investigate and understand those factors to further our knowledge.