Last week Tyler Dellow had a post titled “Two Graphs and 480 Words That Will Convince You On Corsi%” by which, as you can see in the comments, I was less than convinced. This post is my rebuttal, an attempt to convince you of the importance of Sh% in player evaluation.

The problem with shooting percentage is that it suffers from small sample size issues. Over small samples it is often dominated by randomness (I prefer the term randomness to luck), but the question I have always had is: if we remove randomness from the equation, how important a skill is shooting percentage? To attempt to answer this I will look at the variance in on-ice shooting percentages among forwards as we increase the sample size from a single season (minimum 500 minutes of ice time) to 6 seasons (minimum 3000 minutes of ice time). As the sample size increases we would expect the variance due to randomness to decrease. This means that when the observed variance stops decreasing (or the rate of decrease slows significantly) as the sample size increases, we know we are approaching the point where any remaining variance is variance in true talent, not small sample size randomness. So, without further ado, I present my first chart of on-ice shooting percentages for forwards in 5v5 situations.

The decline in variance pretty much stops by the time you reach 5 years/2500+ minutes worth of data, but after 3 years (1500+ minutes) the rate of decline has already fallen off significantly. It is also worth noting that some of the drop off over longer periods of time is due to age progression/regression and not due to a reduction in randomness.
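The logic of the variance-shrinkage argument can be sketched with a small simulation. This is a hedged illustration, not the chart's actual data: the talent spread, the 8% league-average Sh% and the shots-per-minute conversion below are all assumptions I'm making for demonstration purposes.

```python
import random
import statistics

random.seed(42)

def observed_sh_variance(n_shots, n_players=200):
    """Variance of measured on-ice Sh% across players whose true talent
    is drawn from an assumed distribution (mean 8%, sd 1%)."""
    observed = []
    for _ in range(n_players):
        talent = random.gauss(0.08, 0.01)
        goals = sum(1 for _ in range(n_shots) if random.random() < talent)
        observed.append(goals / n_shots)
    return statistics.pvariance(observed)

# Rough assumption: ~0.8 on-ice shots per minute, so 500 minutes is
# roughly 400 on-ice shots and 3000 minutes roughly 2400.
for shots in (400, 1200, 2400):
    print(f"{shots} shots: variance {observed_sh_variance(shots):.6f}")
```

As the shot sample grows, the binomial noise term p(1-p)/n shrinks while the true-talent variance stays put, so the observed variance flattens out toward the talent variance, which is exactly the signature the chart looks for.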

What is the significance of all of this? Well, at 5 years a 90th percentile player would have 45% more goals than a 10th percentile player given an equal number of shots. A player one standard deviation above average would have 33% more goals for than a player one standard deviation below average, again given an equal number of shots.
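To make the arithmetic concrete, here's a small sketch. The percentile Sh% values are hypothetical stand-ins, not numbers from the chart, chosen only so the ratio matches the roughly 45% gap described above.

```python
# With an equal number of shots, goals scale linearly with on-ice Sh%,
# so the goal gap between percentiles is just the ratio of their Sh%.
shots = 1000
sh_10th = 0.065   # hypothetical 10th percentile on-ice Sh%
sh_90th = 0.094   # hypothetical 90th percentile on-ice Sh%

goals_10th = shots * sh_10th
goals_90th = shots * sh_90th
extra_goals = goals_90th / goals_10th - 1  # ~0.45, i.e. ~45% more goals
```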

Now, let’s compare this to the same chart for CF/20 to get an idea of how shot generation varies across players.

It’s a little interesting that the top players show no regression over time but the bottom players do. This may be because terrible shot-generating players don’t stick around long enough. More important, though, is the magnitude of the difference between the top players and the bottom players. A 90th percentile CF20 player produces about 25% more shot attempts than a 10th percentile player, and a one standard deviation above average CF20 player produces about 18.5% more than a one standard deviation below average CF20 player (over 5 years). Both of these are well below (almost half of) the 45% and 33% we saw for shooting percentage.

I hear a lot of ‘I told you so’ from the pro-Corsi crowd in regards to the Leafs and their losing streak, and yes, their percentages have regressed this season, but I think it is worth noting that the Leafs are still an example of a team where CF% is not a good indicator of performance. The Leafs’ 5v5close CF% is 42.5% but their 5v5close GF% is 47.6%. The idea that CF% and GF% are “tightly intertwined”, as Tyler Dellow wrote, is not supported by the Maple Leafs this season, despite the fact that the Maple Leafs are the pro-Corsi crowd’s latest favourite “I told you so” team.

There is also some evidence that the Leafs have been “unlucky” this year. Their 5v5close shooting percentages over the past 3 seasons have been 8.82 (2nd), 8.59(4th), 10.54(1st) while this year it has dropped to 8.17 (8th). Now the question is how much of that is luck and how much is the loss of Grabovski and MacArthur and the addition of Clarkson (who is a generally poor on-ice Sh% player) but the Leafs Sh% is well below the past few seasons and some of that may be bad luck (and notably, not “regression” from years of “good luck”).

In summary, generating shots matters, but capitalizing on them matters as much or more.

In Rob Vollman’s Hockey Abstract book he talks about persistence and its importance in determining whether a particular statistic has value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for being able to call something a talent or a skill, and how closely it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let’s look at persistence first. The easiest way to measure persistence is to look at the correlation of a statistic over some chunk of time vs some future chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year over year correlation)? For some statistics, such as shooting percentages, it may be necessary to use larger samples, such as a 3 year shooting percentage vs the future 3 year shooting percentage.

One mistake that many people make when doing this is to conclude that a lack of correlation, and thus a lack of persistence, means that the statistic is not a repeatable skill and is thus essentially random. The thing is, the method by which we measure persistence can be a major factor in how well we can separate persistence from true randomness. Let’s take two methods for measuring persistence:

1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.

Both methods split the data roughly in half so we are doing a half the data vs half the data comparison and I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so we get large sample sizes for shooting percentage calculations. Here are the correlations we get.
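Both methods boil down to the same calculation: a Pearson correlation between a player-level stat in one half of the data and the same stat in the other half. A minimal sketch, with hypothetical per-player GF20 values standing in for the real even/odd splits:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical GF20 for six forwards in the even-second and odd-second halves:
even_gf20 = [0.95, 0.80, 1.10, 0.70, 0.88, 1.02]
odd_gf20  = [0.90, 0.78, 1.05, 0.75, 0.85, 0.98]
r = pearson(even_gf20, odd_gf20)
```

The same function applied to each player's 2007-10 value vs his 2010-13 value gives the year-over-year column.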

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
|---|---|---|---|
| GF20 vs GF20 | 0.61 | 0.89 | 0.28 |
| FF20 vs FF20 | 0.62 | 0.97 | 0.35 |
| FSh% vs FSh% | 0.51 | 0.73 | 0.22 |

GF20 is goals for per 20 minutes of ice time. FF20 is Fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick shooting percentage, or goals divided by Fenwick shots.

We can see that the level of persistence we identify is much greater when looking at the even vs odd second correlation than the 3 year vs 3 year correlation. A different test of persistence gives us significantly different results. The reason is that many more outside factors come into play in a 3 year vs 3 year correlation than in an even vs odd correlation. In the even vs odd correlations, factors such as quality of teammates, quality of competition, zone starts, coaching tactics, etc. are non-factors because they should be almost exactly the same in the even seconds as in the odd seconds. This is not true for the 3 year vs 3 year correlation. The difference between the two methods is roughly the amount of the correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.
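The decomposition just described is simple arithmetic on the table's correlations; a quick sketch:

```python
# (three-year r, even/odd r) pairs taken from the table above.
correlations = {"GF20": (0.61, 0.89), "FF20": (0.62, 0.97), "FSh%": (0.51, 0.73)}

for stat, (three_year, even_odd) in correlations.items():
    randomness = 1.00 - even_odd     # true randomness estimate
    context = even_odd - three_year  # QoT, QoC, zone starts, coaching, etc.
    print(f"{stat}: randomness ~{randomness:.2f}, context factors ~{context:.2f}")
```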

Now, let’s look at how well they correlate with a positive outcome: scoring goals. But instead of just looking at that, let’s combine it with persistence by looking at how well they predict ‘other half’ goal scoring.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs GF20 | 0.54 | 0.86 | 0.33 |
| GF20 vs FF20 | 0.44 | 0.86 | 0.42 |
| FSh% vs GF20 | 0.48 | 0.76 | 0.28 |
| GF20 vs FSh% | 0.57 | 0.77 | 0.20 |

As you can see, both FF20 and FSh% are highly correlated with GF20, and this is far more evident when looking at even vs odd than at 3 year vs 3 year correlations. FF20 is more predictive of ‘other half’ GF20, but not significantly so, and this is likely due solely to the greater randomness of FSh% (due to sample size constraints), since FSh% is more correlated with GF20 than FF20 is. The correlation between even FF20 and even GF20 is 0.75, while the correlation between even FSh% and even GF20 is 0.90.

What is also interesting to note is that even vs odd provides a greater benefit for identifying FF20 value and persistence than for FSh%. What this tells us is that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think what this means is that GMs value shooting percentage players more than Fenwick players and are thus more likely to keep a core of shooting percentage players on their team while letting Fenwick players walk. Eric T. found that teams reward players for high shooting percentage more than for high Corsi, so this is likely the reason we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs FSh% | 0.38 | 0.66 | 0.28 |
| FSh% vs FF20 | 0.22 | 0.63 | 0.42 |

It is interesting to note that Fenwick rates are highly correlated with shooting percentages, especially when looking at the even vs odd data. What this tells us is that the skills a player needs to generate a lot of scoring chances are similar to the skills required to generate high quality scoring chances. Skills like good passing, puck control and quickness can lead to better puck possession and thus more shots, but those same skills can also result in scoring at a higher rate on those chances. We know this isn’t true for all players (see Scott Gomez), but generally speaking, players that are good at controlling the puck are good at putting the puck in the net too.

Finally, let’s look at one more set of correlations. When looking at the above correlations for players with >1000 minutes in each ‘half’ of the data, there are a lot of players with significantly more than 1000 minutes, whose ‘stats’ are therefore more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally fewer than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So, I took all the players that had more than 1000 even and odd minutes over the course of the past 6 seasons, but only those that had fewer than 2600 minutes in total. In essence, I took all the players that have between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above, and the results should tell us approximately how reliable (predictive) one season’s worth of data is for a front line forward, assuming they played in exactly the same situation the following season.

| Comparison | Even vs Odd |
|---|---|
| GF20 vs GF20 | 0.82 |
| FF20 vs FF20 | 0.93 |
| FSh% vs FSh% | 0.63 |
| FF20 vs GF20 | 0.74 |
| GF20 vs FF20 | 0.77 |
| FSh% vs GF20 | 0.65 |
| GF20 vs FSh% | 0.66 |
| FF20 vs FSh% | 0.45 |
| FSh% vs FF20 | 0.40 |

It should be noted that because of the way I selected the players for this calculation (limited ice time over the past 6 seasons), there is an abundance of third liners, with a few players that reached retirement (e.g. Sundin) and young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take the first 2600 minutes of each player and do even/odd on that, but I am too lazy to try to calculate that data, so the above is the best we have. There is far less diversity in the list of players used than in the NHL in general, so it is likely that for any particular player with between 1000 and 1300 minutes of ice time the correlations are stronger.

So, what does the above tell us? Once you factor out year over year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all pretty highly persistent with just one year’s worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution and adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics or better analytical techniques or some combination of the two, only time will tell, but in theory at least there should be a lot more reliable information within a single year’s worth of data than we are currently able to make use of.

If you have been following the discussion between Eric T. and me, you will know that there has been a rigorous debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t, and want to read the whole debate, you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial, and that the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics, so we can’t see where it is still failing miserably) in which he utilized time on ice to choose the mean towards which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13, the predictive power of his regressed GF20 on future GF20 was 0.66, which is 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05, while 0.34 remains unexplained. The question I attempt to answer today is, for a player who has played 1000 minutes of ice time, what amount of his observed stats is true randomness and what amount is simply unaccounted-for skill/situational variance?

When we look at 2007-10 GF20 and compare it to 2010-13 GF20, there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of teammates, a change in coaching style, natural career progression of the player, zone start usage, possibly any number of other factors we do not currently know about, as well as true randomness. To overcome all of the non-random factors that we do not yet know how to fully adjust for, and get a true measure of the random component of a player’s stats, we need two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are more or less equivalent across halves. The following table shows how good the even half is at predicting the odd half, based on how many total minutes (across both halves) the player has played.
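The bookkeeping behind the even/odd game split can be sketched as follows; the game log below is hypothetical (the real calculation runs over six seasons of play-by-play data), but the structure is the same:

```python
def gf20(goals, minutes):
    """Goals for per 20 minutes of ice time."""
    return 20.0 * goals / minutes if minutes else 0.0

def split_gf20(games):
    """games: list of (goals_for, minutes) tuples in chronological order.
    Returns GF20 computed over the even-indexed and odd-indexed games."""
    even = [g for i, g in enumerate(games) if i % 2 == 0]
    odd = [g for i, g in enumerate(games) if i % 2 == 1]

    def aggregate(half):
        goals = sum(g for g, _ in half)
        minutes = sum(m for _, m in half)
        return gf20(goals, minutes)

    return aggregate(even), aggregate(odd)

# Hypothetical (goals for, 5v5 minutes) game log for one player:
games = [(1, 14.5), (0, 12.0), (2, 15.0), (1, 13.0), (0, 14.0), (1, 12.5)]
even_half, odd_half = split_gf20(games)
```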

| Total Minutes | GF20 vs GF20 |
|---|---|
| >500 | 0.79 |
| >1000 | 0.85 |
| >1500 | 0.88 |
| >2000 | 0.89 |
| >2500 | 0.88 |
| >3000 | 0.88 |
| >4000 | 0.89 |
| >5000 | 0.89 |

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each even/odd half) the upper bound on true randomness is 0.21, while the predictive power of GF20 is 0.79. With greater than 1000 minutes, randomness drops to 0.15, and from 1500 minutes up the randomness sits around 0.11-0.12. It’s interesting that raising the minimum above 1500 minutes (~750 in each even/odd half) doesn’t necessarily reduce the true randomness in GF20, which seems a little counterintuitive.

Let’s take a look at the predictive power of fenwick shooting percentage in even games to predict fenwick shooting percentage in odd games.

| Total Minutes | FSh% vs FSh% |
|---|---|
| >500 | 0.54 |
| >1000 | 0.64 |
| >1500 | 0.71 |
| >2000 | 0.73 |
| >2500 | 0.72 |
| >3000 | 0.73 |
| >4000 | 0.72 |
| >5000 | 0.72 |

Like GF20, the true randomness of Fenwick shooting percentage seems to bottom out at 1500 minutes of ice time, and there appears to be no benefit to increasing the minimum minutes played.

To summarize what we have learned, we have the following, for forwards with >1000 minutes in each of 2007-10 and 2010-13.

| Quantity | Value |
|---|---|
| GF20 predictive power, 3yr vs 3yr | 0.61 |
| True randomness estimate | 0.11 |
| Unaccounted-for factors estimate | 0.28 |
| Eric T’s regression benefit | 0.05 |

There is no denying that a regression algorithm can provide modest improvements, but this only addresses 30% of what GF20 is failing to predict, and it is highly doubtful that further efforts to improve the regression algorithm will result in anything more than marginal benefits. The real benefit will come from researching the other 70% we don’t know about. It is a much more difficult question to answer, but the benefit could be far more significant than any regression technique.

Addendum: After doing the above I thought, why not take this all the way and, instead of doing even and odd games, do even and odd seconds, so that what happens in one second goes in one bin and what happens the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed, but the predictive power of GF20 increases marginally, particularly at the lower minute cutoffs.

| Total Minutes | GF20 vs GF20 | FSh% vs FSh% |
|---|---|---|
| >500 | 0.81 | 0.58 |
| >1000 | 0.86 | 0.68 |
| >1500 | 0.88 | 0.71 |
| >2000 | 0.89 | 0.73 |
| >2500 | 0.89 | 0.73 |
| >3000 | 0.90 | 0.75 |
| >4000 | 0.90 | 0.73 |
| >5000 | 0.89 | 0.71 |

Yesterday there was a post on the Behind the Net blog which discussed the Washington Capitals’ 2009-10 even strength shooting percentage of 11.0%, and the conclusion was that it must be mostly luck that produced a shooting percentage that high. But was it? It was noted in the article that in 2007-08 the Capitals shot at 8.1%, in 2008-09 they shot at 8.2%, and this season they are shooting at 8.2% again. So clearly 2009-10 appears to be an anomaly, but was it a luck-driven anomaly or something else?

Most people in the hockey analysis world have been using a simple binomial distribution to simulate luck, so I’ll do that here too. The thing is, if the Washington Capitals were really an 8.2% shooting team last year, the chances of them shooting 11.0% or better on 2045 shots is a mere 0.0042%. That kind of luck we should expect once every 8000 NHL seasons. In short, we can be pretty confident that the Capitals’ 11.0% shooting percentage wasn’t all luck driven.
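That tail probability can be checked with the same simple binomial model, computed in log space to avoid overflow. This is my own re-derivation, so treat the exact figure it prints with caution; the point is only that it comes out vanishingly small.

```python
from math import exp, lgamma, log

def log_binom_pmf(k, n, p):
    """Log of the binomial probability mass function."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

n, p = 2045, 0.082            # shots taken, assumed true shooting talent
threshold = round(0.110 * n)  # 11.0% of 2045 shots, i.e. 225 goals

# P(at least `threshold` goals) if the team is truly an 8.2% shooter.
tail = sum(exp(log_binom_pmf(k, n, p)) for k in range(threshold, n + 1))
print(f"P(Sh% >= 11.0%) = {tail:.3e}")
```

Either way, an 11.0% season from a true 8.2% shooter essentially never happens by chance alone, which is the point.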

So the next question is: how much of it is luck, and how much can we attribute to other factors? Well, let’s assume that their good luck was significant, to the point where there would be only a 5% chance they could have experienced even more luck. We can do this by constructing a binomial distribution centered on a shooting percentage at which the chance of producing a shooting percentage of 11.0% or better is 5%. The result is shown in the following chart:

The far left vertical line is the number of goals Washington would have produced with an 8.2% shooting percentage, and the far right line is their actual shooting percentage. The center vertical line is the theoretical shooting percentage we would need to meet the 5% luck condition outlined above. Under this scenario one could suggest that, of the extra 57 goals Washington scored above what they would have scored shooting at 8.2%, 22 can be attributed to luck and 35 to skill.
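The 5% construction can be reproduced with a simple bisection over the binomial tail. This is my own re-implementation of the idea, not the original calculation, so the numbers it produces are approximate:

```python
from math import exp, lgamma, log

N_SHOTS, ACTUAL_GOALS, BASELINE = 2045, 225, 0.082

def tail_prob(p, n=N_SHOTS, threshold=ACTUAL_GOALS):
    """P(goals >= threshold) for a binomial(n, p) shooter, in log space."""
    def log_pmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return sum(exp(log_pmf(k)) for k in range(threshold, n + 1))

# Bisect for the talent level whose chance of >= 225 goals is exactly 5%.
lo, hi = BASELINE, 0.110
for _ in range(40):
    mid = (lo + hi) / 2
    if tail_prob(mid) < 0.05:
        lo = mid  # too little talent: upper tail too small
    else:
        hi = mid
true_talent = (lo + hi) / 2

skill_goals = true_talent * N_SHOTS - BASELINE * N_SHOTS  # roughly 35
luck_goals = ACTUAL_GOALS - true_talent * N_SHOTS         # roughly 22
```

The two pieces always sum to the full 57-goal excess over the 8.2% baseline; the bisection just decides where to draw the line between them.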

But what if we assumed the Capitals were extremely lucky, and there was only a 1% chance of having greater luck? Under that scenario their true talent level would be a 9.49% shooting percentage, and 26 goals would be due to skill and 31 due to luck.

Regardless of how you want to look at it, a significant portion of the Capitals elevated shooting percentage was likely due to non-luck factors, be they actual talent, playing style, score effects, etc.

(Updated to include 3 seasons of data as I now realize that more luck data was available)

The other day there was a post on the Behind the Net blog which used betting odds to estimate how lucky a team was during the 2009-10 season. In many ways it is quite an ingenious way to evaluate a team’s luck, and I recommend those who have not read it go take a look. Last night I was watching, sadly, the Leafs-Oilers game and thinking about luck in a hockey game, and whether a team has any control over the luck it experiences. It got me thinking: is a team which controls the flow of the play more likely to have ‘good luck’ stuff happen to it than ‘bad luck’ stuff?

I defined luck as how many standard deviations a team’s actual point total was from its expected point total, as defined in the document referenced in the Behind the Net blog post and in an updated document with 4 years of data. I have only included 3 seasons in this analysis since I have only been working with 3 seasons of data recently and was too lazy to go back and calculate a fourth season right now.
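A sketch of that luck measure; the point totals and standard deviation below are hypothetical placeholders, since the real expected values come from the betting-odds document:

```python
def luck(actual_points, expected_points, sd_points):
    """Luck = standard deviations between actual and expected point totals."""
    return (actual_points - expected_points) / sd_points

# e.g. a team that banked 107 points against a betting-odds expectation of
# 96.4 points with a standard deviation of 5.1 (all numbers hypothetical):
team_luck = luck(107, 96.4, 5.1)  # ~2.08 standard deviations of 'good luck'
```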

The most used stat to indicate how well a team controls the play is Corsi or Fenwick percentage, which is basically the number of shots a team directs at the opposing goal divided by the total number of shots that they and their opponents direct at the goals. I’ll be using Fenwick % here, which includes shots and missed shots but not blocked shots. So how does Fenwick % correlate with luck?

The correlation is fairly low, but a correlation exists. Maybe good teams can generate their own luck. Here is a table of team luck and Fenwick% for 2009-10.

| Team | Luck | Fen% |
|---|---|---|
| Chicago Blackhawks | 0.777 | 0.578 |
| Detroit Red Wings | 0.395 | 0.541 |
| Boston Bruins | -0.534 | 0.536 |
| Pittsburgh Penguins | -0.156 | 0.530 |
| Toronto Maple Leafs | -1.282 | 0.528 |
| New Jersey Devils | 0.459 | 0.522 |
| St. Louis Blues | 0.186 | 0.519 |
| Phoenix Coyotes | 2.092 | 0.515 |
| Nashville Predators | 1.225 | 0.514 |
| Calgary Flames | -0.590 | 0.513 |
| Washington Capitals | 1.883 | 0.512 |
| San Jose Sharks | 1.020 | 0.512 |
| Philadelphia Flyers | -1.157 | 0.511 |
| Ottawa Senators | 0.083 | 0.508 |
| Los Angeles Kings | 1.040 | 0.498 |
| Buffalo Sabres | 0.302 | 0.496 |
| Atlanta Thrashers | -0.347 | 0.496 |
| New York Rangers | -0.753 | 0.495 |
| Vancouver Canucks | 0.471 | 0.495 |
| Carolina Hurricanes | -0.555 | 0.491 |
| New York Islanders | -0.201 | 0.490 |
| Columbus Blue Jackets | -0.855 | 0.488 |
| Dallas Stars | -0.212 | 0.480 |
| Anaheim Ducks | -0.087 | 0.467 |
| Tampa Bay Lightning | -0.604 | 0.466 |
| Florida Panthers | -0.726 | 0.465 |
| Montreal Canadiens | 0.052 | 0.464 |
| Minnesota Wild | -0.486 | 0.459 |
| Colorado Avalanche | 0.599 | 0.449 |
| Edmonton Oilers | -1.993 | 0.446 |
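As a sanity check on the claimed relationship, here is the correlation computed from a handful of the table's rows (only a subset, so the value will differ from the full 30-team correlation):

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sqrt(sum((x - mx) ** 2 for x in xs))
                  * sqrt(sum((y - my) ** 2 for y in ys)))

# (luck, Fen%) pairs for eight teams taken straight from the table above:
rows = [(0.777, 0.578), (0.395, 0.541), (2.092, 0.515), (1.883, 0.512),
        (-1.993, 0.446), (-1.282, 0.528), (-0.726, 0.465), (0.599, 0.449)]
luck_vals = [row[0] for row in rows]
fen_vals = [row[1] for row in rows]
r = pearson(luck_vals, fen_vals)  # positive but modest for this subset
```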

When I was looking through the table something caught my attention. Of the bottom 15 teams in Fenwick%, only four had positive luck: Buffalo, Vancouver, Montreal and Colorado. Generally speaking, these four teams had good to very good goaltending. Of the top 15 teams in Fenwick%, only five had negative luck: Boston, Pittsburgh, Toronto, Calgary and Philadelphia. Boston and Calgary had good to very good goaltending (especially once Boston switched mostly to Rask), but Philadelphia, Pittsburgh and Toronto had mediocre to poor goaltending. That got me wondering whether goaltending correlated with luck at all, so I looked at the correlation between 5v5 game-tied shooting and save percentages and luck.

Like Fenwick%, there is an indication of a small correlation between shooting percentage and luck, and a bit more of a correlation with save percentage. Next I looked at combining all three factors. Initially I was going to combine them through some sort of average, but then decided to look at goals for percentage instead (goals for divided by goals for plus goals against), since that basically encompasses everything anyway, and we find that, combined, we get a relatively strong correlation with luck.

Now we are getting into correlation that might actually mean something, but what does it all mean? To be honest, I am not sure. Regardless of what ‘skill’ we look at, there does seem to be a small positive correlation between how good a team is and how good its luck is (as calculated from the betting lines). Does this mean that a bad team, and especially a team with bad goaltending, opens itself up to more bad luck than good teams or teams with good goaltending? Does it mean that luck manifests itself mostly in bad goals against? Or does it simply mean that people who bet on hockey games trend towards betting the underdog, which would push the underdogs’ expected winning percentage up and good teams’ expected winning percentage down, resulting in a poor estimation of luck? I am not sure how to determine the exact cause of the correlation, but if it is the latter, I have a word of advice: always bet the favourite.