Dec 08 2014
 

I have tackled the subject of on-ice shooting percentage a number of times here, but I think it remains under-researched in hockey analytics. Historically, people have done split-half comparisons, found weak correlations, and written shooting percentage off as an insignificant factor in hockey analytics. While some of that research has merit, much of it deals with sample sizes too small to produce useful correlations. Split-half season correlations for the majority of players include players that might have 3 goals in the first half and 7 in the second half, and that is simply not enough to draw conclusions from. Even year-over-year correlations have their issues: in addition to smallish sample sizes, they suffer from problems related to roster changes and how those changes affect on-ice shooting percentages. Ideally we'd want to eliminate all of these factors and get down to actual on-ice shooting percentage talent, factoring out both luck/randomness and roster changes.

Today @MimicoHero posted an article discussing shooting percentage (and save percentage) by looking at multi-year vs multi-year comparisons. It's a good article, so have a read; I have written many articles like it in the past. This is important research, but as I alluded to above, year-over-year comparisons suffer from issues related to roster change, which potentially limits what we can actually learn from the data. People often look at even/odd games to eliminate these roster issues, and that is a pretty good methodology. Once in the past I took this idea to the extreme and used even/odd seconds in order to isolate true talent from other factors (note that subsequent to that article I found a bug in my code that may have impacted the results, so I don't have 100% confidence in them; I hope to revisit this in a future post to confirm the results). That approach pretty much assures that the teammates a player plays with, the opponents they play against, and the situations they play in will be almost identical in both halves of the data. I hope to revisit the even/odd second work in a future post to confirm and extend that research, but for this post I am going to take another approach. Here I will focus solely on shooting percentage and use an even/odd shot methodology, which should also do a pretty good job of removing roster change effects.

I took all 5v5 shot data from 2007-08 through 2013-14 and, for each forward, took the first 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800 and 2000 shots that they were on the ice for. This allowed me to do 100 vs 100 shot, 200 vs 200 shot, … 1000 vs 1000 shot comparisons. For comparison's sake, in addition to even/odd shots I am also going to look at first half vs second half comparisons to get an idea of how different the correlations are (i.e. what the impact of roster changes is on a player's on-ice shooting percentage). Here are the resulting correlation coefficients.

Scenario    Split-Half   Even vs Odd   N Players
100v100     0.186        0.159         723
200v200     0.229        0.268         590
300v300     0.296        0.330         502
400v400     0.368        0.375         443
500v500     0.379        0.440         399
600v600     0.431        0.481         350
700v700     0.421        0.463         319
800v800     0.451        0.486         285
900v900     0.440        0.454         261
1000v1000   0.415        0.498         222

And here is the table in graphical form.

[Chart: split-half vs even/odd on-ice Sh% correlations by sample size (EvenVsOdd_FirstvsSecondHalf_ShPct)]
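For anyone who wants to try this at home, here is a minimal sketch of the even/odd shot split, not the actual code behind the article. It assumes a shot-level table with one row per shot a player was on the ice for, in chronological order; the column names (player_id, shot_time, is_goal) are illustrative only.

```python
# Hypothetical sketch of the even/odd shot split described above; assumes a
# shot-level DataFrame with one row per on-ice shot, in chronological order.
# Column names (player_id, shot_time, is_goal) are illustrative, not from the post.
import pandas as pd
from scipy.stats import pearsonr

def even_odd_sh_pct(shots: pd.DataFrame, n_shots: int) -> pd.DataFrame:
    rows = []
    for player, df in shots.groupby("player_id"):
        df = df.sort_values("shot_time").head(n_shots)
        if len(df) < n_shots:        # skip players without enough on-ice shots
            continue
        even = df.iloc[0::2]         # 1st, 3rd, 5th, ... shots
        odd = df.iloc[1::2]          # 2nd, 4th, 6th, ... shots
        rows.append({"player_id": player,
                     "even_sh_pct": even["is_goal"].mean(),
                     "odd_sh_pct": odd["is_goal"].mean()})
    return pd.DataFrame(rows)

# e.g. the 600v600 comparison uses each forward's first 1200 on-ice shots:
# split = even_odd_sh_pct(shots, 1200)
# r, _ = pearsonr(split["even_sh_pct"], split["odd_sh_pct"])
```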

Let's start with the good news. As expected, even vs odd correlations are better than first half vs second half correlations, though the difference really isn't as significant as I might have expected. This is especially true at the larger sample sizes, where the gap should theoretically widen.

What I did find a bit troubling is that the correlations seem to max out at 600 shots vs 600 shots, and even those correlations aren't all that great (0.45-0.50). In theory, as sample size increases one should get better and better correlations, approaching 1.00 as the sample approaches infinity. Instead, they seem to approach 0.5, which had me questioning my data.

After some thought, though, I realized the problem was likely the decreasing number of players within the larger shot-total groups. This restricts the spread in talent, as only the top-level players remain in those larger groups: as you increase the shot requirement you start weeding out the lesser players who are on the ice for less ice time and fewer shots. So, while randomness decreases with an increased number of shots, so does the spread in talent. My theory is that the signal (talent) to noise (randomness) ratio is not actually improving enough to produce improving correlations.

To test this theory I looked at the standard deviations within each even/odd group. Since we have a definitive number of shots for each group (100, 200, 300, etc.) and can calculate the average shooting percentage, it is possible to estimate the standard deviation due to randomness. With the overall standard deviation and an estimated standard deviation of randomness, we can then back out the standard deviation in on-ice shooting percentage talent (a rough sketch of this calculation follows the chart below). Here are the results of that math.

Scenario    SD(Even Sh%)   SD(Odd Sh%)   SD(Randomness)   SD(Talent)
100v100     2.98%          2.84%         2.67%            1.15%
200v200     2.22%          2.08%         1.91%            1.00%
300v300     1.99%          1.87%         1.56%            1.14%
400v400     1.71%          1.70%         1.35%            1.04%
500v500     1.56%          1.57%         1.21%            1.00%
600v600     1.50%          1.50%         1.11%            1.01%
700v700     1.35%          1.39%         1.03%            0.90%
800v800     1.35%          1.33%         0.96%            0.93%
900v900     1.24%          1.26%         0.91%            0.86%
1000v1000   1.14%          1.23%         0.86%            0.81%

And again, here is the table in graphical form.

[Chart: observed, random and talent standard deviations of on-ice Sh% by sample size (EstimatingOnIceShootingPctTalent)]
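For what it's worth, here is a minimal sketch of that decomposition. It assumes goals on n shots are binomial with a league-average 5v5 on-ice Sh% of roughly 8% (an assumption on my part, though it roughly reproduces the SD(Randomness) column above).

```python
# Sketch of the talent/randomness decomposition; assumes goals on n shots are
# binomial with league-average on-ice Sh% p ~ 8% (assumed, not from the post).
import math

def sd_talent(sd_observed: float, n_shots: int, p: float = 0.08) -> float:
    sd_random = math.sqrt(p * (1 - p) / n_shots)  # binomial SD of Sh% over n shots
    # Variances of independent components add: observed^2 = talent^2 + random^2
    return math.sqrt(max(sd_observed ** 2 - sd_random ** 2, 0.0))

# e.g. the 600v600 row: SD(Even Sh%) = 1.50% over 600-shot samples
# sd_talent(0.0150, 600) -> ~0.0101, i.e. ~1.0% talent SD,
# and the signal-to-noise ratio is then sd_talent / sd_random
```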

The grey line is the randomness standard deviation and it behaves as expected, declining smoothly. It is a significant driver of the even and odd standard deviations, but the talent standard deviation slowly falls off as well. If we call SD(Talent) the signal and SD(Randomness) the noise, we can plot a signal-to-noise ratio calculated as SD(Talent) / SD(Randomness).

[Chart: signal-to-noise ratio, SD(Talent) / SD(Randomness), by sample size (SignalToNoise)]

What is interesting is that the signal-to-noise ratio improves significantly up to 600v600 and then more or less levels off. This is in line with what we saw earlier in the first table and chart. After 600v600 we start dropping out the majority of the fourth liners, who don't get enough ice time to be on the ice for 1400+ shots at 5v5, and later we start dropping out the third liners too. The result is that the signal-to-noise ratio flattens out.

With that said, there is probably enough information in the above charts to determine a reasonable spread in on-ice shooting percentage talent. Specifically, the yellow SD(Talent) line gives us a pretty good indication of what that spread really is. Based on this analysis, a reasonable estimate for one standard deviation in on-ice shooting percentage talent in a typical NHL season is around 1.0%, or maybe slightly above.

What does that mean in real terms (i.e. goal production)? Well, the average NHL forward is on the ice for ~400 5v5 shots per season. Thus, a player with an average amount of ice time whose on-ice shooting percentage is one standard deviation above average (I'll use 1.0% as the standard deviation, to be conservative) would be on the ice for 4 extra goals due solely to their on-ice shooting percentage. Conversely, an average-ice-time player with an on-ice shooting percentage one standard deviation below average would be on the ice for about 4 fewer goals.

Now of course if you are an elite player getting big minutes the benefit is far greater. Let’s take Sidney Crosby for example. Over the past 7 seasons his on-ice shooting percentage is about 3.33 standard deviations above average and last year he was on the ice for just over 700 shots. That equates to an extra 23 goals due to his extremely good on-ice shooting percentage. That’s pretty impressive if you think about it.

Now compare that to Scott Gomez, whose 7-year on-ice shooting percentage is about 1.6 standard deviations below average. In 2010-11 he was on the ice for 667 shots for. That year his lagging shooting percentage talent cost an estimated 10.6 goals. Imagine: Crosby vs Gomez is a 33+ goal swing in 5v5 offensive output alone.
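To make the arithmetic in the last three paragraphs explicit, here is the back-of-envelope calculation (the player figures are the approximations quoted above, not exact values):

```python
# Back-of-envelope goal impact of on-ice Sh% talent; uses the ~1.0% talent SD
# estimated above and the approximate player figures quoted in the text.
TALENT_SD = 0.010   # one standard deviation of on-ice Sh% talent

def extra_goals(on_ice_shots_for: int, sds_above_average: float) -> float:
    return on_ice_shots_for * sds_above_average * TALENT_SD

print(extra_goals(400, 1.0))    # average-TOI forward, +1 SD:  ~4.0 goals
print(extra_goals(700, 3.33))   # Crosby-like season:          ~23.3 goals
print(extra_goals(667, -1.6))   # Gomez-like season:           ~-10.7 goals
```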

(Yes, I am taking some liberties in those last few paragraphs with assumptions relating to luck/randomness, quality of teammates and whatnot, so not all of the good or bad can necessarily be attributed to a single player, or to the extent described, but I think it drives home the point: a single player can have a significant impact through on-ice shooting percentage talent alone.)

In conclusion, even after you factor out luck and randomness, on-ice shooting percentage can play a significant role in goal production at the player level and, as I have been saying for years, must be taken into consideration in player evaluation. If you aren't considering that a particular player might be particularly good or particularly bad at driving on-ice shooting percentage, you may not be getting the full story.

(In a related post, there was an interesting article on Hockey Prospectus yesterday looking at how passing affects shooting percentage, which supports earlier findings that good passers are often good at boosting their teammates' on-ice shooting percentage. Of course, I have also shown that shots on the rush result in a higher shooting percentage, so to the extent that players are good at generating rush shots they should also be good at boosting their on-ice shooting percentages.)

 

Apr 01 2014
 

Last week Tyler Dellow had a post titled “Two Graphs and 480 Words That Will Convince You On Corsi%” by which, you could say, I was less than convinced (read the comments). This post is my rebuttal, an attempt to convince you of the importance of Sh% in player evaluation.

The problem with shooting percentage is that it suffers from small sample size issues. Over small samples it often gets dominated by randomness (I prefer the term randomness to luck), but the question I have always had is: if we remove randomness from the equation, how important a skill is shooting percentage? To attempt to answer this I will look at the variance in on-ice shooting percentages among forwards as we increase the sample size from a single season (minimum 500 minutes of ice time) to 6 seasons (minimum 3000 minutes of ice time). As the sample size increases we would expect the variance due to randomness to decrease. This means that when the observed variance stops decreasing (or the rate of decrease slows significantly) as sample size increases, we are approaching the point where the remaining variance is variance in true talent, not small-sample randomness. So, without further ado, I present my first chart of on-ice shooting percentages for forwards in 5v5 situations.

 

[Chart: spread in forward on-ice Sh% by sample size (ShPctVarianceBySampleSize)]
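For those who want to replicate this kind of chart, a hedged sketch follows. It assumes season-level on-ice totals per forward; all column names (player_id, season, goals_for, shots_for, toi_5v5) are illustrative, not from my actual data.

```python
# Illustrative sketch of the variance-by-sample-size analysis; assumes a
# DataFrame of per-player, per-season on-ice totals (column names hypothetical).
import pandas as pd

def sh_pct_percentiles(seasons: pd.DataFrame, n_years: int,
                       min_minutes: float) -> pd.Series:
    pooled = (seasons.sort_values("season")
                     .groupby("player_id")
                     .head(n_years)                    # first n seasons per player
                     .groupby("player_id")[["goals_for", "shots_for", "toi_5v5"]]
                     .sum())
    pooled = pooled[pooled["toi_5v5"] >= min_minutes]  # apply the TOI cutoff
    sh_pct = pooled["goals_for"] / pooled["shots_for"]
    return sh_pct.quantile([0.10, 0.50, 0.90])         # spread across forwards

# e.g. one season / 500+ minutes vs six seasons / 3000+ minutes:
# sh_pct_percentiles(seasons, 1, 500)
# sh_pct_percentiles(seasons, 6, 3000)
```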

The decline in variance pretty much stops by the time you reach 5 years / 2500+ minutes worth of data, and after 3 years (1500+ minutes) the rate of decline drops off significantly. It is also worth noting that some of the drop-off over longer periods of time is due to age progression/regression and not a reduction in randomness.

What is the significance of all of this? Well, at 5 years a 90th percentile player would have 45% more goals, given an equal number of shots, than a 10th percentile player. A player one standard deviation above average will have 33% more goals for, given an equal number of shots, than a player one standard deviation below average.
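As a rough check on those percentages, if the five-year distribution is treated as approximately normal with a mean around 8.0% and a standard deviation around 1.15% (assumed figures, chosen here only to be consistent with the numbers quoted), the 90th and 10th percentiles sit about 1.28 SD either side of the mean:

```latex
% Assumed: \mu \approx 8.0\%, \; \sigma \approx 1.15\% (illustrative figures)
\frac{\mu + 1.28\sigma}{\mu - 1.28\sigma} \approx \frac{9.47}{6.53} \approx 1.45
\qquad
\frac{\mu + \sigma}{\mu - \sigma} \approx \frac{9.15}{6.85} \approx 1.34
```

which lands at roughly the 45% and 33% figures above.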

Now, let's compare this to the same chart for CF20 to get an idea of how shot generation varies across players.

 

[Chart: spread in forward CF20 by sample size (CF20VarianceBySampleSize)]

It's a little interesting that the top players show no regression over time but the bottom-tier players do. This may be because terrible shot-generating players don't stick around long enough. More important, though, is the magnitude of the difference between the top players and the bottom players. A 90th percentile CF20 player produces about 25% more shot attempts than a 10th percentile player, and a one-standard-deviation-above-average CF20 player produces about 18.5% more than a one-standard-deviation-below-average CF20 player (over 5 years). Both of these are well below (almost half of) the 45% and 33% we saw for shooting percentage.

I hear a lot of 'I told you so' from the pro-Corsi crowd in regards to the Leafs and their losing streak, and yes, their percentages have regressed this season, but I think it is worth noting that the Leafs are still an example of a team where CF% is not a good indicator of performance. The Leafs' 5v5close CF% is 42.5% but their 5v5close GF% is 47.6%. The idea that CF% and GF% are “tightly intertwined”, as Tyler Dellow wrote, is not supported by the Maple Leafs this season, despite the fact that the Maple Leafs are the latest “pro-Corsi” crowd's favourite “I told you so” team.

There is also some evidence that the Leafs have been “unlucky” this year. Their 5v5close shooting percentages over the past 3 seasons have been 8.82 (2nd), 8.59 (4th) and 10.54 (1st), while this year it has dropped to 8.17 (8th). Now, the question is how much of that is luck and how much is the loss of Grabovski and MacArthur and the addition of Clarkson (who is a generally poor on-ice Sh% player), but the Leafs' Sh% is well below the past few seasons and some of that may be bad luck (and notably, not “regression” from years of “good luck”).

In summary, generating shots matter, but capitalizing on them matters as much or more.

 

Aug 02 2013
 

In Rob Vollman's Hockey Abstract book he talks about persistence and its importance when it comes to a particular statistic having value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for being able to call something a talent or a skill and how close it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let's look at persistence first. The easiest way to measure persistence is to look at the correlation of a statistic over some chunk of time vs some future chunk of time; for example, how well a stat from last season correlates with the same stat this season (i.e. year-over-year correlation). For some statistics, such as shooting percentages, it may even be necessary to go with larger sample sizes, such as a 3-year shooting percentage vs the subsequent 3-year shooting percentage.

One mistake that many people make when doing this is to conclude that a lack of correlation, and thus lack of persistence, means the statistic is not a repeatable skill and is thus essentially random. The thing is, how we measure persistence can be a major factor in how well we can separate persistence from true randomness. Let's take two methods for measuring persistence:

  1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
  2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.

Both methods split the data roughly in half, so we are doing a half-vs-half comparison, and I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so that we get large sample sizes for the shooting percentage calculations. Here are the correlations we get.

Comparison      2007-10 vs 2010-13   Even vs Odd   Difference
GF20 vs GF20    0.61                 0.89          0.28
FF20 vs FF20    0.62                 0.97          0.35
FSh% vs FSh%    0.51                 0.73          0.22

GF20 is goals for per 20 minutes of ice time. FF20 is Fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick shooting percentage, or goals/Fenwick.
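A minimal sketch of the even/odd-second method might look like the following. It assumes a long-format table with one row per player per second of 5v5 ice time, flagged with any 'for' events in that second; the column names are mine, not from the original analysis.

```python
# Sketch of the even/odd-second split; assumes one row per player per second of
# 5v5 ice time, with goal_for/fenwick_for event flags (columns are hypothetical).
import pandas as pd

def even_odd_rates(seconds: pd.DataFrame) -> pd.DataFrame:
    seconds = seconds.assign(half=seconds["game_second"] % 2)  # 0 = even, 1 = odd
    g = seconds.groupby(["player_id", "half"]).agg(
        gf=("goal_for", "sum"),
        ff=("fenwick_for", "sum"),
        secs=("game_second", "size"),     # one row == one second of ice time
    )
    minutes = g["secs"] / 60.0
    g["GF20"] = g["gf"] / (minutes / 20.0)   # goals for per 20 minutes
    g["FF20"] = g["ff"] / (minutes / 20.0)   # Fenwick for per 20 minutes
    g["FSh%"] = g["gf"] / g["ff"]            # Fenwick shooting percentage
    return g[["GF20", "FF20", "FSh%"]].unstack("half")

# Correlating the even column with the odd column of any stat across players
# gives the "Even vs Odd" numbers in the table above.
```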

We can see that the level of persistence we identify is much greater when looking at the even vs odd second correlation than when looking at the 3-year vs 3-year correlation. A different test of persistence gives us significantly different results. The reason is that many more factors come into play in a 3-year vs 3-year comparison than in an even vs odd comparison. In the even vs odd correlations, factors such as quality of teammates, quality of competition, zone starts, coaching tactics, etc. are non-factors because they are almost exactly the same in the even seconds as in the odd seconds. This is not true for the 3-year vs 3-year correlation. The difference between the two methods is roughly the amount of the correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.

Now, let's look at how well they correlate with a positive outcome: scoring goals. Instead of looking at that in isolation, let's combine it with persistence by looking at how well each stat predicts 'other half' goal scoring.

Comparison      2007-10 vs 2010-13   Even vs Odd   Difference
FF20 vs GF20    0.54                 0.86          0.33
GF20 vs FF20    0.44                 0.86          0.42
FSh% vs GF20    0.48                 0.76          0.28
GF20 vs FSh%    0.57                 0.77          0.20

As you can see, both FF20 and FSh% are very highly correlated with GF20, and this is far more evident when looking at even vs odd than at 3-year vs 3-year correlations. FF20 is more predictive of 'other half' GF20, but not significantly so, and this is likely due solely to the greater randomness of FSh% (a sample size constraint), since FSh% is more correlated with GF20 than FF20 is: the correlation between even FF20 and even GF20 is 0.75, while the correlation between even FSh% and even GF20 is 0.90.

What is also interesting to note is that the even vs odd method provides a greater boost for identifying FF20 value and persistence than it does for FSh%. What this tells us is that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think what this means is that GMs value shooting percentage players more than Fenwick players and thus are more likely to maintain a core of shooting percentage players on their team while letting Fenwick players walk. Eric T. found that teams reward players for high shooting percentage more than high Corsi, so this is likely the reason we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

Comparison      2007-10 vs 2010-13   Even vs Odd   Difference
FF20 vs FSh%    0.38                 0.66          0.28
FSh% vs FF20    0.22                 0.63          0.42

It is interesting to note that Fenwick rates are highly correlated with shooting percentages, especially when looking at the even vs odd data. What this tells us is that the skills a player needs to generate a lot of scoring chances are similar to the skills required to generate high-quality scoring chances. Skills like good passing, puck control and quickness can lead to better puck possession and thus more shots, but those same skills can also result in scoring at a higher rate on those chances. We know that this isn't true for all players (see Scott Gomez), but generally speaking players that are good at controlling the puck are good at putting the puck in the net too.

Finally, let's look at one more set of correlations. When looking at the above correlations for players with >1000 minutes in each 'half' of the data, there are a lot of players that have significantly more than 1000 minutes and thus their stats are more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally fewer than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So, I took all the players that had more than 1000 even and odd minutes over the course of the past 6 seasons, but only those that had fewer than 2600 minutes in total. In essence, I took all the players that have between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above, and the results should tell us approximately how reliable (predictive) one season's worth of data is for a front-line forward, assuming they played in exactly the same situation the following season.

Comparison      Even vs Odd
GF20 vs GF20    0.82
FF20 vs FF20    0.93
FSh% vs FSh%    0.63
FF20 vs GF20    0.74
GF20 vs FF20    0.77
FSh% vs GF20    0.65
GF20 vs FSh%    0.66
FF20 vs FSh%    0.45
FSh% vs FF20    0.40

It should be noted that, because of the way I selected the players (limited ice time over the past 6 seasons), this group contains an abundance of third liners, with a few players that reached retirement (e.g. Sundin) and young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take the first 2600 minutes of each player and do even/odd on that, but I am too lazy to calculate that data, so the above is the best we have. There is far less diversity in this list of players than in the NHL in general, so it is likely that for any particular player with between 1000 and 1300 minutes of ice time the correlations are stronger.

So, what does the above tell us? Once you factor out year-over-year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all pretty highly persistent with just one year's worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution and adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics, better analytical techniques or some combination of the two, only time will tell, but in theory at least there should be a lot more reliable information within a single year's worth of data than we are currently able to make use of.

 

Jun 18 2013
 

If you have been following the discussion between Eric T and me, you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven't and want to read the whole debate, you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T's notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don't think we should ever be closed-minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal-scoring talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial, and that the focus need not be on regressing to the mean.

In Eric T's last post on the subject he finally got around to actually implementing a regression methodology (though he didn't post any player specifics, so we can't see where it is still failing miserably), in which he utilized time on ice to choose a mean towards which a player's shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13, the predictive power of his regressed GF20 on future GF20 was 0.66, which is 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05, while 0.34 still remains unexplained. The question I attempt to answer today is: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted-for skill/situational variance?

When we look at 2007-10 GF20 and compare it to 2010-13 GF20, there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of teammates, a change in coaching style, natural career progression of the player, zone start usage, possibly any number of other factors that we do not currently know about, as well as true randomness. To overcome all of the non-random factors that we do not yet know how to fully adjust for, and get a true measure of the random component of a player's stats, we need two sets of data with attributes (QoT, QoC, usage, etc.) as similar to each other as possible. The way I did this was to take each of the 6870 games played over the past 6 seasons, split them into even and odd games, and calculate each player's GF20 over each of those segments. This should, more or less, split a player's 6 years evenly in half such that all those other factors are roughly equivalent across halves. The following table shows how good the even half is at predicting the odd half, based on how many total minutes (across both halves) the player has played.

Total Minutes   GF20 vs GF20
>500            0.79
>1000           0.85
>1500           0.88
>2000           0.89
>2500           0.88
>3000           0.88
>4000           0.89
>5000           0.89

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each even/odd half), the upper bound on true randomness is 0.21 while the predictive power of GF20 is 0.79. With greater than 1000 minutes, randomness drops to 0.15, and from greater than 1500 minutes on, the randomness is around 0.11-0.12. It's interesting that setting the minimum above 1500 minutes (~750 in each even/odd half) doesn't necessarily reduce the true randomness in GF20 any further, which seems a little counterintuitive.
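A sketch of the even/odd-game split might look like this, assuming per-player, per-game on-ice totals keyed by a league-wide game number (all column names are again hypothetical, not my actual schema):

```python
# Sketch of the even/odd-game split; assumes per-player, per-game on-ice totals
# with a league-wide game_number (all column names are illustrative).
import pandas as pd
from scipy.stats import pearsonr

def even_odd_gf20_corr(games: pd.DataFrame, min_total_minutes: float) -> float:
    games = games.assign(half=games["game_number"] % 2)   # even/odd game bins
    g = games.groupby(["player_id", "half"]).agg(
        gf=("goals_for", "sum"), minutes=("toi_minutes", "sum"))
    g["GF20"] = g["gf"] * 20.0 / g["minutes"]
    wide = g["GF20"].unstack("half").dropna()             # even half vs odd half
    total = g["minutes"].groupby(level="player_id").sum() # both halves combined
    wide = wide[total.reindex(wide.index) >= min_total_minutes]
    r, _ = pearsonr(wide[0], wide[1])
    return r

# e.g. even_odd_gf20_corr(games, 1500) should land near the 0.88 in the table
```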

Let's take a look at the predictive power of Fenwick shooting percentage in even games to predict Fenwick shooting percentage in odd games.

Total Minutes   FSh% vs FSh%
>500            0.54
>1000           0.64
>1500           0.71
>2000           0.73
>2500           0.72
>3000           0.73
>4000           0.72
>5000           0.72

Like GF20, the true randomness of Fenwick shooting percentage seems to bottom out around 1500 minutes of ice time, and there appears to be no benefit to increasing the minimum minutes played beyond that.

To summarize what we have learned, we have the following for forwards with >1000 minutes in each of 2007-10 and 2010-13.

GF20 predictive power (3yr vs 3yr)   0.61
True randomness estimate             0.11
Unaccounted-for factors estimate     0.28
Eric T's regression benefit          0.05

There is no denying that a regression algorithm can provide modest improvements, but it only addresses the roughly 30% of what GF20 fails to predict that is true randomness, and it is highly doubtful that further improvements to the regression algorithm will yield more than marginal benefits. The real benefit will come from researching the other ~70% we don't know about. That is a much more difficult question to answer, but the payoff could be far more significant than any regression technique.
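One way to read the summary table, writing $r$ for the 3yr-vs-3yr predictive power of GF20 and $r_{eo}$ for the even/odd (same-context) correlation, is the following split of what GF20 fails to predict (a framing added here for clarity, using the numbers above):

```latex
\underbrace{1 - r}_{0.39\ \text{unpredicted}}
= \underbrace{(1 - r_{eo})}_{0.11\ \text{true randomness}}
+ \underbrace{(r_{eo} - r)}_{0.28\ \text{context changes / unaccounted factors}}
```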

Addendum: After doing the above I thought, why not take this all the way and, instead of even and odd games, do even and odd seconds: what happens in one second goes in one bin and what happens in the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot changes, but the predictive power of GF20 increases marginally, particularly at the lower minute cutoffs.

Total Minutes   GF20 vs GF20   FSh% vs FSh%
>500            0.81           0.58
>1000           0.86           0.68
>1500           0.88           0.71
>2000           0.89           0.73
>2500           0.89           0.73
>3000           0.90           0.75
>4000           0.90           0.73
>5000           0.89           0.71