Aug 02 2013

In his book Hockey Abstract, Rob Vollman discusses persistence and its importance in deciding whether a particular statistic has value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for calling something a talent or a skill. How closely it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let’s look at persistence first. The easiest way to measure persistence is to correlate a statistic over one chunk of time with the same statistic over a later chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year-over-year correlation)? For some statistics, such as shooting percentage, it may be necessary to use even larger sample sizes, such as 3 year shooting percentage vs the following 3 year shooting percentage.

Many people make the mistake of concluding that a lack of correlation, and therefore a lack of persistence, means the statistic is not a repeatable skill and is essentially random. The thing is, the method we use to measure persistence can be a major factor in how well we can measure persistence and, by extension, true randomness. Let’s take two methods for measuring persistence:

  1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
  2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.
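Method 2 can be sketched in Python for a single player. This is a minimal sketch, not the author's actual code. It assumes hypothetical inputs: the whole game seconds during which the player was on the ice at 5v5, plus a list of on-ice events tagged with the game second they occurred. `Event`, `split_even_odd` and the event kinds are illustrative names, not part of any real data feed.

```python
from dataclasses import dataclass


@dataclass
class Event:
    game_second: int  # hypothetical: whole seconds elapsed in the game when the event happened
    kind: str         # hypothetical: "goal", "shot" (saved shot on goal) or "miss"


def split_even_odd(events, on_ice_seconds):
    """Assign one player's 5v5 ice time and on-ice events to even or odd game seconds.

    on_ice_seconds: whole game seconds the player was on the ice at 5v5.
    Returns {"even": {...}, "odd": {...}} holding event counts and seconds played.
    """
    halves = {h: {"goal": 0, "shot": 0, "miss": 0, "seconds": 0} for h in ("even", "odd")}
    for s in on_ice_seconds:
        halves["even" if s % 2 == 0 else "odd"]["seconds"] += 1
    for e in events:
        halves["even" if e.game_second % 2 == 0 else "odd"][e.kind] += 1
    return halves
```

Interleaving seconds means both halves share almost exactly the same teammates, opponents, zone starts and coaching. That is why this split isolates randomness better than a 3 year vs 3 year split.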

Both methods split the data roughly in half, so in each case we are comparing half of the data against the other half. I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so that we get large sample sizes for the shooting percentage calculations. Here are the correlations we get.

| Comparison | 2007-10 vs 2010-13 | Even vs Odd | Difference |
|---|---|---|---|
| GF20 vs GF20 | 0.61 | 0.89 | 0.28 |
| FF20 vs FF20 | 0.62 | 0.97 | 0.35 |
| FSh% vs FSh% | 0.51 | 0.73 | 0.22 |

GF20 is Goals for per 20 minutes of ice time. FF20 is fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick Shooting Percentage or goals/fenwick.
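These three definitions can be written as a short Python sketch. It assumes, as in standard NHL counting, that goals are also shots on goal. To avoid double counting, the inputs are split into goals, saved shots and missed shots. The function names are illustrative only.

```python
def per20(count, minutes):
    """Rate per 20 minutes of ice time."""
    return 20.0 * count / minutes


def offensive_rates(goals, saved_shots, missed_shots, minutes):
    """GF20, FF20 and FSh% from on-ice counts.

    Fenwick for = unblocked shot attempts = goals + saved shots on goal + missed shots.
    """
    fenwick_for = goals + saved_shots + missed_shots
    return {
        "GF20": per20(goals, minutes),
        "FF20": per20(fenwick_for, minutes),
        "FSh%": 100.0 * goals / fenwick_for,
    }
```

By construction GF20 = FF20 × FSh% / 100, so goal scoring rate is exactly the product of shot generation and shooting percentage. That identity is why both FF20 and FSh% can correlate strongly with GF20.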

We can see that the level of persistence we identify is much greater with even vs odd correlations than with 3 year vs 3 year correlations. A different test of persistence gives us significantly different results. The reason is that many more factors come into play in 3 year vs 3 year correlations than in even vs odd correlations. In the even vs odd correlations, factors such as quality of teammates, quality of competition, zone starts, coaching tactics, etc. are non-factors, because they should be almost exactly the same in the even half as in the odd half. This is not true for the 3 year vs 3 year correlation. The difference between the two methods is roughly the amount of the correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.

Now, let’s look at how well they correlate with a positive outcome: scoring goals. But instead of looking at that alone, let’s combine it with persistence by looking at how well each statistic predicts ‘other half’ goal scoring.

| Comparison | 2007-10 vs 2010-13 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs GF20 | 0.54 | 0.86 | 0.33 |
| GF20 vs FF20 | 0.44 | 0.86 | 0.42 |
| FSh% vs GF20 | 0.48 | 0.76 | 0.28 |
| GF20 vs FSh% | 0.57 | 0.77 | 0.20 |

As you can see, both FF20 and FSh% are highly correlated with GF20, but this is far more evident with even vs odd correlations than with 3 year vs 3 year correlations. FF20 is more predictive of ‘other half’ GF20, though not significantly so. This is likely due solely to the greater randomness of FSh% (because of sample size constraints), since within the same half FSh% is more correlated with GF20 than FF20 is. The correlation between even FF20 and even GF20 is 0.75, while the correlation between even FSh% and even GF20 is 0.90.

Also interesting is that even vs odd gives a bigger boost in identifying FF20 value and persistence than it does for FSh%. This tells us that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think it means GMs value shooting percentage players more than fenwick players, and are therefore more likely to keep a core of shooting percentage players on their team while letting fenwick players walk. Eric T. found that teams reward players for a high shooting percentage more than for a high corsi, which is likely why we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

| Comparison | 2007-10 vs 2010-13 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs FSh% | 0.38 | 0.66 | 0.28 |
| FSh% vs FF20 | 0.22 | 0.63 | 0.42 |

It is interesting to note that fenwick rates are highly correlated with shooting percentages, especially in the even vs odd data. This tells us that the skills a player needs to generate a lot of scoring chances are similar to the skills required to generate high-quality scoring chances. Skills like good passing, puck control and quickness can lead to better puck possession and thus more shots, but those same skills can also result in scoring at a higher rate on those chances. We know this isn’t true for all players (see Scott Gomez), but generally speaking, players who are good at controlling the puck are good at putting the puck in the net too.

Finally, let’s look at one more set of correlations. Among the players with >1000 minutes in each ‘half’ of the data used above, many have significantly more than 1000 minutes, so their stats are more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally fewer than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So I took all the players with more than 1000 even minutes and more than 1000 odd minutes over the past 6 seasons, but only those with fewer than 2600 minutes in total. In essence, I took all the players with between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above. The results should tell us approximately how reliable (predictive) one season’s worth of data is for a front line forward, assuming he played in exactly the same situation the following season.

| Comparison | Even vs Odd |
|---|---|
| GF20 vs GF20 | 0.82 |
| FF20 vs FF20 | 0.93 |
| FSh% vs FSh% | 0.63 |
| FF20 vs GF20 | 0.74 |
| GF20 vs FF20 | 0.77 |
| FSh% vs GF20 | 0.65 |
| GF20 vs FSh% | 0.66 |
| FF20 vs FSh% | 0.45 |
| FSh% vs FF20 | 0.40 |

Because of the way I selected the players for this calculation (limited ice time over the past 6 seasons), there is an abundance of 3rd liners, with a few players who reached retirement (e.g. Sundin) and some young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take the first 2600 minutes of each player and do even/odd on that, but I am too lazy to calculate that data, so the above is the best we have. There is far less diversity in the list of players used than in the NHL in general, so it is likely that for any particular player with between 1000 and 1300 minutes of ice time the correlations are stronger.
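The player selection rule behind this last table can be sketched in Python. This is a minimal sketch. The dict keys `name`, `even_min` and `odd_min` are hypothetical, and the thresholds are the ones stated in the post.

```python
def one_season_sample(players, min_half=1000.0, max_total=2600.0):
    """Keep players with > min_half 5v5 minutes in BOTH the even and odd halves
    but < max_total minutes in total over the 6 seasons.

    players: iterable of dicts with hypothetical keys "name", "even_min", "odd_min".
    """
    return [
        p for p in players
        if p["even_min"] > min_half
        and p["odd_min"] > min_half
        and p["even_min"] + p["odd_min"] < max_total
    ]
```

Strictly, the two conditions allow one half to run up to 1600 minutes. However, a player's even and odd halves are nearly equal in size, so in practice each half lands in roughly the 1000 to 1300 minute range, about one season of top line 5v5 time.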

So, what does the above tell us? Once you factor out year-over-year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all quite persistent with just one year’s worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most people assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution and adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics, better analytical techniques, or some combination of the two, only time will tell. In theory at least, there should be a lot more reliable information within a single year’s worth of data than we are currently able to make use of.

 

Apr 19 2012

Prior to the season Gabe Desjardins and I had a conversation over at MC79hockey.com where I predicted several players would combine for a 5v5 on-ice shooting percentage above 10.0% while league average is just shy of 8.0%.  I documented this in a post prior to the season.  In short, I predicted the following:

  • Crosby, Gaborik, Ryan, St. Louis, H. Sedin, Toews, Heatley, Tanguay, Datsyuk, and Nathan Horton will have a combined on-ice shooting percentage above 10.0%
  • Only two of those 10 players will have an on-ice shooting percentage below 9.5%

So, how did my prediction fare? The following table tells all.

| Player | GF | SF | SH% |
|---|---|---|---|
| SIDNEY CROSBY | 31 | 198 | 15.66% |
| MARTIN ST. LOUIS | 74 | 601 | 12.31% |
| ALEX TANGUAY | 43 | 371 | 11.59% |
| MARIAN GABORIK | 57 | 582 | 9.79% |
| JONATHAN TOEWS | 51 | 525 | 9.71% |
| NATHAN HORTON | 34 | 359 | 9.47% |
| HENRIK SEDIN | 62 | 655 | 9.47% |
| BOBBY RYAN | 52 | 552 | 9.42% |
| PAVEL DATSYUK | 50 | 573 | 8.73% |
| DANY HEATLEY | 42 | 611 | 6.87% |
| Totals | 496 | 5027 | 9.87% |

Well, technically neither of my predictions came true. Only 5 players had on-ice shooting percentages above 9.5%, and as a group they did not maintain a shooting percentage above 10.0%. That said, my prediction wasn’t all that far off. 8 of the 10 players had an on-ice shooting percentage of 9.42% or better, and as a group they had an on-ice shooting percentage of 9.87%. If Crosby had been healthy for most of the season, or if the Minnesota Wild hadn’t sucked so badly, the group would have reached the 10.0% mark. So, when all is said and done, my predictions didn’t technically come true, but the intent of the prediction did: shooting percentage is a talent, is maintainable, and can be used as a predictor of future performance.
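The table's numbers can be checked against both predictions with a short Python sketch. It uses only the GF and SF values above. The group figure pools goals and shots (496 / 5027), which weights each player by his shots rather than averaging the ten percentages.

```python
# On-ice 5v5 goals for (GF) and shots for (SF), copied from the table above.
on_ice = {
    "Crosby": (31, 198), "St. Louis": (74, 601), "Tanguay": (43, 371),
    "Gaborik": (57, 582), "Toews": (51, 525), "Horton": (34, 359),
    "H. Sedin": (62, 655), "Ryan": (52, 552), "Datsyuk": (50, 573),
    "Heatley": (42, 611),
}

sh_pct = {name: 100.0 * gf / sf for name, (gf, sf) in on_ice.items()}

# Pooled group shooting percentage: total goals over total shots.
total_gf = sum(gf for gf, _ in on_ice.values())
total_sf = sum(sf for _, sf in on_ice.values())
group_sh_pct = 100.0 * total_gf / total_sf

above_95 = [name for name, pct in sh_pct.items() if pct > 9.5]
below_95 = [name for name, pct in sh_pct.items() if pct < 9.5]

prediction_1 = group_sh_pct > 10.0  # combined on-ice SH% above 10.0%
prediction_2 = len(below_95) <= 2   # only two players below 9.5%

print(f"group SH% = {group_sh_pct:.2f}%")                     # group SH% = 9.87%
print(len(above_95), len(below_95), prediction_1, prediction_2)  # 5 5 False False
```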

I now have 5 years of on-ice data on stats.hockeyanalysis.com, so I thought I would use it to look at how sustainable shooting percentage is. To do this I took all forwards with at least 350 minutes of 5v5 zone start adjusted ice time in each of the past 5 years and used the first 3 years of data (2007-08 through 2009-10) to predict the final 2 years (2010-11 and 2011-12). This means we used at least 1050 minutes of data over 3 seasons to predict at least 700 minutes of data over 2 seasons. The following chart shows the results for on-ice shooting percentage.

Clearly there is some persistence in on-ice shooting percentage. How does this compare to something like fenwick for rates (using FF20, Fenwick For per 20 minutes)?

Ok, so FF20 seems to be more persistent, but that doesn’t take away from the fact that shooting percentage is persistent and a reasonable predictor of future shooting percentage.  (FYI, the guy out on his own in the upper left is Kyle Wellwood)

The real question is whether either of them is any good at predicting future goal scoring rates (GF20, goals for per 20 minutes), because ultimately goals are what matter in hockey.

Ok, so both on-ice shooting percentage and on-ice fenwick for rates are somewhat reasonable predictors of future on-ice goals for rates, with a slight advantage to on-ice shooting percentage (sorry, just had to point that out). This is not inconsistent with what I found a year ago when I used 4 years of data to calculate 2 year vs 2 year correlations.

Of course, I would never suggest we use shooting percentage as a player evaluation tool, just as I don’t suggest we use fenwick as a player evaluation tool.  Both are sustainable, both can be used as predictors of future success, and both are true player skills, but the best predictor of future goal scoring is past goal scoring, as evidenced by the following chart.

That is pretty clear evidence that goal rates are the best predictor of future goal rates and thus, in my opinion anyway, the best player evaluation tool. Yes, there are still sample size issues with using goal rates for less than a full season’s worth of data. But for all those players for whom we have multiple seasons’ worth of data (or at least one full season with more than ~750 minutes of ice time), using anything other than goals as your player evaluation tool will potentially lead to less reliable and less accurate player evaluations.

As for the defensive side of the game, I have not found a single reasonably good predictor of future goals against rates, regardless of whether I look at corsi, fenwick, goals, shooting percentage or anything else.  This isn’t to suggest that players can’t influence defense, because I believe they can, but rather that there are too many other factors that I haven’t figured out how to isolate and remove from the equation.  Most important is the goalie and I feel the most difficult question to answer in hockey statistics is how to separate the goalie from the defenders. Plus, I believe there are far fewer players that truly focus on defense and thus goals against is largely driven by the opposition.

Note:  I won’t make any promises but my intention is to make this my last post on the subject of sustainability of on-ice shooting percentage and the benefit of using a goal based player analysis over a corsi/fenwick based analysis.  For all those who still fail to realize goals matter more than shots or shot attempts there is nothing more I can say.  All the evidence is above or in numerous other posts here at hockeyanalysis.com.  On-ice shooting percentage is a true player talent that is both sustainable and a viable predictor of future performance at least on par with fenwick rates.  If you choose to ignore reality from this point forward, it is at your own peril.