# Estimating actual randomness in goal data

If you have been following the discussion between Eric T and me you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t been following and want to read the whole debate, you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed-minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial, and that the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics, so we can’t see where it is still failing miserably) in which he utilized time on ice to choose a mean to which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. For players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13, the predictive power of his regressed GF20 for future GF20 was 0.66, which is 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05, while 0.34 remains unexplained. The question I attempt to answer today is: for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness, and how much is simply unaccounted-for skill/situational variance?
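As a rough sketch of what such a regression looks like, here is a minimal version. The bin boundaries, prior means, and the reliability value used as a default are hypothetical placeholders for illustration, not Eric T’s actual numbers.

```python
def toi_prior(minutes):
    """Pick a prior mean GF20 from ice time; heavily used players get a
    higher prior. These cutoffs and means are hypothetical."""
    if minutes > 3000:
        return 0.95
    if minutes > 1500:
        return 0.85
    return 0.75

def regress_gf20(observed, minutes, r=0.61):
    """Regress an observed GF20 toward the ice-time-based prior,
    weighting the observation by its reliability r and the prior by 1 - r."""
    return r * observed + (1 - r) * toi_prior(minutes)

# A hot 1.10 GF20 over 3500 minutes gets pulled partway toward 0.95:
print(round(regress_gf20(1.10, 3500), 3))
```

The key design point is that the prior varies with usage rather than being a single league-wide mean, so heavily used players are not dragged down toward fourth-liner scoring rates.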

When we look at 2007-10 GF20 and compare it to 2010-13 GF20 there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of teammates, a change in coaching style, natural career progression of the player, zone start usage, possibly any number of other factors that we do not currently know about, as well as true randomness. To overcome all of these non-random factors that we do not yet know how to fully adjust for, and so get a true measure of the random component of a player’s stats, we need two sets of data whose attributes (QoT, QoC, usage, etc.) are as similar to each other as possible. The way I did this was to take each of the 6870 games that have been played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are roughly equivalent across halves. The following table shows how good the even half is at predicting the odd half based on how many total minutes (across both halves) the player has played.

| Total Minutes | GF20 vs GF20 |
|---------------|--------------|
| >500          | 0.79         |
| >1000         | 0.85         |
| >1500         | 0.88         |
| >2000         | 0.89         |
| >2500         | 0.88         |
| >3000         | 0.88         |
| >4000         | 0.89         |
| >5000         | 0.89         |

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half) the upper bound on true randomness is 0.21, while the predictive power of GF20 is 0.79. With greater than 1000 minutes randomness drops to 0.15, and from greater than 1500 minutes upward the randomness is around 0.11-0.12. It’s interesting that setting the minimum above 1500 minutes (~750 in each even/odd half) of data doesn’t necessarily reduce the true randomness in GF20, which seems a little counterintuitive.
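The split itself is simple bookkeeping; a minimal sketch follows. The `games` record layout and the helper names are my own illustration, not the actual code used for the analysis.

```python
from math import sqrt

def split_half_gf20(games):
    """Split one player's games into even- and odd-numbered halves and
    return GF20 (goals for per 20 minutes) for each half.

    `games` is a list of (game_number, goals_for, minutes) tuples."""
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # parity -> [goals, minutes]
    for game_number, goals, minutes in games:
        bucket = totals[game_number % 2]
        bucket[0] += goals
        bucket[1] += minutes
    return tuple(20.0 * g / m for g, m in totals.values())

def pearson(xs, ys):
    """Pearson correlation between even-half and odd-half values
    across all qualifying players."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Running `pearson` over the even-half and odd-half GF20 of every player above a given minutes cutoff gives the values in the table above.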

Let’s take a look at how well fenwick shooting percentage in even games predicts fenwick shooting percentage in odd games.

| Total Minutes | FSh% vs FSh% |
|---------------|--------------|
| >500          | 0.54         |
| >1000         | 0.64         |
| >1500         | 0.71         |
| >2000         | 0.73         |
| >2500         | 0.72         |
| >3000         | 0.73         |
| >4000         | 0.72         |
| >5000         | 0.72         |

Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time, and there appears to be no benefit to increasing the minimum minutes played beyond that.

To summarize what we have learned, the following is for forwards with >1000 minutes in each of 2007-10 and 2010-13.

| Component                            | Value |
|--------------------------------------|-------|
| GF20 predictive power (3yr vs 3yr)   | 0.61  |
| True randomness estimate             | 0.11  |
| Unaccounted-for factors estimate     | 0.28  |
| Eric T’s regression benefit          | 0.05  |
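The arithmetic behind that summary is worth making explicit: the three components sum to one, and the randomness share caps what any regression to the mean could ever recover. A back-of-the-envelope check (my own, not part of the original analysis):

```python
predictive_power = 0.61   # raw GF20, 3yr vs 3yr
true_randomness = 0.11    # from the even/odd split at high minute cutoffs

# Whatever GF20 fails to predict that is NOT randomness must be
# unaccounted-for skill/situational factors:
unaccounted = 1.0 - predictive_power - true_randomness
print(round(unaccounted, 2))  # prints 0.28

# Share of the unpredicted portion that is randomness, i.e. the most a
# regression to the mean could ever address:
print(round(true_randomness / (1.0 - predictive_power), 2))
```

The second figure comes out to roughly 0.28, which is where the “about 30%” in the next paragraph comes from.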

There is no denying that a regression algorithm can provide modest improvements, but this only addresses about 30% of what GF20 is failing to predict, and it is highly doubtful that further efforts to improve the regression algorithm will result in anything more than marginal benefits. The real benefit will come from researching the other 70% we don’t know about. It is a much more difficult question to answer, but the benefit could be far more significant than any regression technique.

Addendum: After doing the above I thought, why not take this all the way and, instead of even and odd games, use even and odd seconds, so that what happens in one second goes in one bin and what happens in the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot changed, but the predictive power of GF20 increases marginally, particularly at lower minute cutoffs.

| Total Minutes | GF20 vs GF20 | FSh% vs FSh% |
|---------------|--------------|--------------|
| >500          | 0.81         | 0.58         |
| >1000         | 0.86         | 0.68         |
| >1500         | 0.88         | 0.71         |
| >2000         | 0.89         | 0.73         |
| >2500         | 0.89         | 0.73         |
| >3000         | 0.90         | 0.75         |
| >4000         | 0.90         | 0.73         |
| >5000         | 0.89         | 0.71         |

1. Eric T. says:

This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly while I suggested that I think further research into isolating individual talent even goal talent through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial and focus need not be on regressing to the mean.

It’s cute how you frame it as an either/or proposition, as if we couldn’t regress to the mean and adjust for usage. Is multiplying everything by 1-r so labor-intensive that you couldn’t possibly have done that while also working on how to account for zone starts?

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology

I’ve been using a regression methodology for years. What you mean to say is that in my last post on the subject I finally caved in and walked you through how to apply it to your own methodology, doing for you the work that people have been telling you for years would improve your analysis.

This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest.

I like the part where you ignore the fact that it completely, dramatically, wildly changes the conclusion about defensemen, whom it would seem you’ve been evaluating wrongly for years.

“If we only pay attention to this half of the NHL, this trivial arithmetic only makes modest improvements to this one type of analysis” is a pretty poor defense for having ignored years of people telling you to look into it.

1. I’ve been using a regression methodology for years.

So publish your results (and if you have, point me to where) with the values for every player, so everyone can critique and nitpick about who it fails miserably for, like you did with the paper that started this whole discussion.

I like the part where you ignore the fact that it completely, dramatically, wildly changes the conclusion about defensemen, whom it would seem you’ve been evaluating wrongly for years.

How is this so? You wrote “regressing the team’s shooting percentage all the way back to league average”, which is essentially saying that defensemen have zero effect on shooting percentage. A goal-based analysis does not consider shots or shooting percentage individually, but that doesn’t mean it will fail if one of them has no effect. There are probably a ton of forwards that have a neutral effect on shooting percentage. That doesn’t mean a goal-based analysis fails for them.

Furthermore, the magnitude of the random errors associated with defensemen is no larger, or smaller, than that of forwards. If it is, it’s not random error at play.

1. Pierce Cunneen says:

If a defenseman has little control over his on-ice shooting percentage, then using goal rates (which you have been doing) is giving full credit to defensemen for something they have no control over. How can you not see the error in that? If a D-man can’t control on-ice SH%, then he can only control his shot differentials. Thus, using goal rates for D-men makes no sense, since it includes a variable (on-ice SH%) that is completely dependent on outside factors (QoC, QoT, goaltending, etc.). Thus, using SH% in the evaluation of defensemen adds a whole bunch of noise and obstructs the truth.

1. If a defenseman has little control over his on-ice shooting percentage, then using goal rates (which you have been doing) is giving full credit to defensemen for something they have no control over.

If a defenseman can boost shots by 10% and has a neutral effect on shooting percentage then he has a 10% boost on goals. If a forward has a 10% boost on shots and a 10% boost on shooting percentage he’ll have a 21% (1.1 * 1.1) boost on goals. Now, if I am correctly analyzing the goal stats I should be able to isolate both of these percentages.
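The compounding here is just multiplication of the two boosts; a quick arithmetic check:

```python
shot_boost = 1.10     # 10% boost on shot rate
sh_pct_boost = 1.10   # 10% boost on shooting percentage (the forward case)
neutral = 1.00        # no effect on shooting percentage (the defenseman case)

print(round(shot_boost * neutral - 1, 2))       # prints 0.1 (a 10% goal boost)
print(round(shot_boost * sh_pct_boost - 1, 2))  # prints 0.21 (a 21% goal boost)
```

The point being that goal rate is the product of shot rate and shooting percentage, so isolating the two factors recovers both contributions.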

You can debate whether my methodology for isolating a player’s impact on goals is doing a good job or not, but to suggest that an analysis of goals to isolate a defenseman’s contribution to goals is necessarily wrong because it uses goals is logically inaccurate.

Thus, using SH% in evaluation of defensemen is adding a whole bunch of noise

This is a different argument than the “it doesn’t work” argument above, and it has some validity, but it brings up an interesting question. If we develop an algorithm that perfectly isolates a defenseman’s contribution to goals and similarly produce an algorithm that perfectly isolates a defenseman’s contribution to shots, then, if defensemen in fact have zero impact on shooting %, doesn’t this give us the ability to estimate how ‘lucky’ they were, or more precisely, the amount of ‘random error’ they were on the ice for? And, if this were the case, I wonder if there is some way we can infer a forward’s on-ice random error from the on-ice random error of the defensemen he shared ice time with. Include goalies in the mix and you might be able to do this more accurately, and we may be able to get something better than a straight regression. We are a long way from being able to isolate individual goal and shot talent to a high level (and may never get close enough to make it work), but the idea is intriguing.

1. Pierce Cunneen says:

But goal rates might not reflect that 10% boost in shot rate, since outside factors might cause a drop in SH% that is beyond a defenseman’s control (this is going with the assumption that a defenseman cannot control his on-ice SH%, which I believe Eric brought up).

So a D-man might be really good at boosting shot rates, but because of competition, sucky teammates, great opposing goaltenders, etc., his SH% might drop significantly. Thus, the goal rates for that D-man are not reflective of his true value. That’s why I think Eric is saying we can’t use goal rates for defensemen (maybe I am misunderstanding Eric’s point).

Now the second part of your reply is more interesting. I would say that if a defenseman has no impact on his on-ice SH%, then his contribution to goals can only be reflected by his shot rate. Now shot rate also includes luck (though far less than goals, because shots are far more numerous), but a true evaluation of a D-man’s offensive contribution to goals would be his ability to increase shot rate.

2. So a D-man might be really good at boosting shot rates, but because of competition, sucky teammates, great opposing goaltenders, etc., his SH% might drop significantly. Thus, the goal rates for that D-man are not reflective of his true value. That’s why I think Eric is saying we can’t use goal rates for defensemen (maybe I am misunderstanding Eric’s point).

I don’t see how this differs from “A D-man might be really good at boosting shot rates, but because of competition that is good at suppressing shot rates and teammates that are poor at boosting shot rates, his shot rate might drop significantly.”

The question is, how well can we isolate an individual’s contribution to goal production (or shot production)?

3. Eric T. says:

A goal-based analysis does not consider shots or shooting percentage individually, but that doesn’t mean it will fail if one of them has no effect.

Including something that has sizable variance but for which there is no real talent leads to failure. It adds noise but not signal. That is an unequivocal bad; I’m not sure why you’d defend it at this point.

If we develop an algorithm that perfectly isolates a defenseman’s contribution to goals and similarly produce an algorithm that perfectly isolates a defenseman’s contribution to shots, then, if defensemen in fact have zero impact on shooting %, doesn’t this give us the ability to estimate how ‘lucky’ they were, or more precisely, the amount of ‘random error’ they were on the ice for?

Yes, it does. And you know what would give you the exact same information? Their on-ice shooting percentage.

So why do all the work to develop the algorithm that isolates their contribution to goals, knowing that they have essentially zero impact on shooting percentage? What value does that add?

Why quote their goal rates as evidence of who is playing well or poorly, when defenseman goal rate is just shot rate plus random noise?

so everyone can critique and nitpick about who it fails miserably for, like you did with the paper that started this whole discussion

You still seem not to understand what I was saying in that article.

My point was not “you got these players wrong, therefore your methodology is bad”; it was “your methodology is bad, therefore it got these players wrong”.

I’m tired of having to keep explaining what the article says when you misinterpret it in some way that rankles you. So I’m just going to walk through it line by line and explain what each section meant and hope we can then move on.

Hello. Your article on hockey analysis (“Estimating player contribution in hockey with regularized logistic regression”) came to my attention. I wanted to provide some feedback as someone who is less well-versed in the technical tools of data mining but perhaps more up to speed on the state of the art in hockey analysis.

Translation: I think there’s some stuff that others have worked out that you can benefit from, and I’m writing to help fill you in on that.

I think the baseball community consistently gets less than it could out of analytical experts like yourself because they are often directing their high-powered tools at the wrong problems. I’m hoping to help ensure that hockey, with its greater analytical challenges, gets as much out of your expertise as possible.

Translation: I know there’s potential for you to add value. That’s why I’m taking the time to write this.

I think my biggest concern is that by focusing exclusively on goals, you allow for shooting percentage variance to have a significant impact on a player’s calculated value. Even with four years of data, variance plays a large role in the shooting and save percentages with a given player on the ice.

This is the central thesis of my article. It’s something that, as a statistician, he quickly understood and is now working on adapting to. It’s something that I’ve spent the better part of a week proving to you.

I suspect this is a big part of why you rate Roloson so highly, for example.

See how that worked? I didn’t write to nitpick the ranking of Roloson; I used Roloson as a clear example of how the flaw in the methodology manifests itself.

His teams scored 2.02 even strength goals per game in the games that he played, while scoring just 1.82 even strength goals in the games he didn’t play — largely because they shot 0.7% better at 5v5 in the games he played than the games he didn’t.

And I walked him through exactly how the flawed result arose from the flawed methodology, so he could see what my concern was and how it affected the result.

Over some 4200 shots, a change of 0.7% represents a change of less than twice the standard error, so I find it much easier to believe that Roloson’s teammates just happened to run hot when he was on the ice than that he possesses a unique skill for producing high-percentage shots from across the rink which makes him a “quantifiable star”.

Lest someone argue that 9000 minutes of play is enough to identify shooting talent precisely, I ran through some simple arithmetic to make it abundantly clear that random noise is significant even at that extraordinary sample size.
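That arithmetic can be reproduced with the standard binomial error formula, taking the ~8.1% league-average 5v5 shooting percentage cited later in the letter as the baseline (my own reconstruction, not Eric T’s exact calculation):

```python
from math import sqrt

p = 0.081   # approximate league-average 5v5 shooting percentage
n = 4200    # shots with Roloson on the ice

se = sqrt(p * (1 - p) / n)   # binomial standard error of a proportion
print(round(100 * se, 2))    # prints 0.42 (in percentage points)

# The observed 0.7% gap is under two standard errors (~0.84%), so it is
# hard to distinguish from ordinary shooting-percentage variance.
```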

The problem doesn’t just plague goalies.

I feared that the answer might be “yeah, I should probably take the goals for part out of the goalie evaluation”, so I wanted to walk through a skater example to make sure it was abundantly clear to everyone that it is necessary to account for variance with every player.

Your model has Kent Huskins as the second-best defenseman over this four-year period, a result that is entirely driven by shooting percentages. Huskins’ teams were roughly even in shot differential when he was on the ice, but the shooting percentages tilted dramatically in his favor. In a league where the average shooting percentage is about 8.1% at 5v5, when he was on the ice, his teams shot 9.0% and the opponents shot 5.9%. Again, what’s the more plausible explanation, that a guy who can barely get ice time has a unique unrecognized talent for dramatically suppressing opponents’ shooting percentages or that coincidentally the goalies ran hot over the 1500 shots that came with him on the ice?

Again: the specific flaw in the methodology that I have developed leads to this specific mis-evaluation in this specific way.

Bear in mind how predictive similar streaks have been: Sean O’Donnell’s opponents shot 6.1% over 1500+ shots from ’07-10 and have shot 7.8% since; Mark Stuart’s opponents shot 6.3% over 1500+ shots from ’07-10 and have shot 8.1% since; David Krejci’s opponents shot 5.9% over 1500+ shots from ’08-11 and have shot 10.1% since…and, of course, Huskins himself has been at 7.5% since the period covered in your story.

Again: supplying evidence that the observed performance is more likely to be variance than talent, that what I am calling a flaw in the methodology is indeed a flaw (this time via a list of the closest comparables rather than calculating standard errors).

This is why much of modern hockey analysis starts with shot-based metrics; the shooting percentages introduce a lot of variance which must be accounted for to get a reasonable assessment of talent.

Returning to my central thesis, and reminding him that this flaw is easily corrected.

If you used shots for your model, I suspect you’d easily identify more than a mere 60 players who have significantly non-zero talent levels — and the model could be further refined from there (e.g. give each shot a weight based on the shooter’s career shooting percentage).

Encouraging him to continue onwards, and suggesting an easy path to use his exact same code to take steps toward reducing the impact of variance.
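The refinement suggested in that passage, weighting each shot by the shooter’s career percentage rather than counting goals, could be sketched as follows; the names and percentages are made up for illustration.

```python
# Expected goals from shot attempts, weighting each attempt by the
# shooter's career shooting percentage. All values are hypothetical.
career_sh_pct = {"Sniper": 0.12, "Grinder": 0.06}

# Shooters recorded while the player being evaluated was on the ice:
shots_on_ice = ["Sniper", "Sniper", "Grinder"]

expected_goals = sum(career_sh_pct[shooter] for shooter in shots_on_ice)
print(round(expected_goals, 2))  # prints 0.3
```

This keeps the large sample size of shots while reintroducing shooter quality, which is the middle ground between raw goal rates and pure shot counts.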

Which brings me to a criticism that is admittedly less substantial

OK, admittedly here I move into two paragraphs of complaining about the prose that was unnecessary. I’ll admit fault there, and I’m very impressed that he was able to gloss over that and respond positively to the helpful critiques that came before it.

But most of all, the biggest stylistic issue I have is that I feel that claims this surprising need either stronger evidence to support them or more discussion of the uncertainty surrounding them. Certainly, conventional wisdom is wrong in places and some of the value of quantitative analysis is helping to identify those places. But if my model had Roloson as a top-five goalie, I would ask how that came about before I proclaimed him a star. If my model suggested Colton Orr was being dragged down by his teammates, I would question the model before I questioned the coaching staff’s usage patterns. Does it pass the sniff test that Manny Malhotra and Colby Armstrong and Kent Huskins are ranked ahead of Evgeni Malkin and Henrik Sedin and Chris Pronger? If not, then where is the discussion about the weaknesses of the model?

This is a very important thing for me. We don’t know enough to make a perfect model, and any model will have flaws. I expect people to look at the unexpected results with a critical eye and think about what it says about their model, rather than just declare that despite his obviously poor puck skills, Luke Schenn probably does somehow improve his teammates’ shooting percentage.

Finally, I am always nervous about analysis that is exclusively backwards-looking. I would have liked to see some analysis of whether this model was predictive to any appreciable extent. How likely was a player who was highly rated over the 2007-2011 period to perform well in the following seasons? How did teams’ performance in 2011-12 correlate with their players’ cumulative estimated value over the four previous years? Answering these kinds of questions is critical to demonstrating that the model is useful, in my opinion — and is especially crucial when the model is giving shocking results.

This is also a very important thing for me, and for econometrics in general. Out of sample correlations are extremely important to establishing the usefulness of a model.

I think if you start putting the model to that kind of test of predictive capabilities, you will see how important it is to reduce shooting percentage variance by incorporating shots that are saved or miss the net as well as those that go in.

And back to the central thesis one last time for a tidy conclusion, in case anyone somehow misunderstood it.

The point was not “hahaha Kent Huskins hahaha you’re an idiot.” It was not “starting your analysis with goals is unacceptable”.

It was, very simply, “it is important to reduce shooting percentage variance by incorporating shots that are saved or miss the net”.

I hope we’re on the same page now; I’m getting tired of being criticized for things I didn’t say.

4. Yes, it does. And you know what would give you the exact same information? Their on-ice shooting percentage.

No it doesn’t, because on-ice shooting percentage is driven by a number of factors including QoT, QoC, zone starts, etc. In my even/odd analysis, defensemen with >4000 minutes of 5v5 ice time have a FSh% predictive power of 0.55. That’s lower than for forwards but far from zero. Defensemen can have an elevated on-ice shooting percentage far beyond noise because the forwards they play with do. Pretty sure a lot of the Penguins defensemen do.

So why do all the work to develop the algorithm that isolates their contribution to goals, knowing that they have essentially zero impact on shooting percentage? What value does that add?

Clearly you are missing the point, which is strange because you wrote this:

Why quote their goal rates as evidence of who is playing well or poorly, when defenseman goal rate is just shot rate plus random noise?

If defenseman contribution to goal rate = defenseman contribution to shot rate + noise, and we can calculate a defenseman’s contribution to goal rate and his contribution to shot rate, then we know what the noise is. So I was openly wondering, assuming we got this far, whether there is some way to use this noise information for defensemen to infer something about the noise their forward teammates experience.

Oh, thanks for the translation. It all makes sense now. I’ll sleep easy tonight.

2. Eric T. says:

Serious question: If you don’t think we should try to steer people towards what we think is right, then what is the point of any discussion at all?

1. Pierce Cunneen says:

<q cite="Serious question: If you don’t think we should try to steer people towards what we think is right, then what is the point of any discussion at all?"

correct Eric, what we know is right. I think your past two articles on BSH have proven the benefits of using shots analysis with regressed SH% over goal rates.

2. Pierce Cunneen says:

correct Eric, what we know is right. I think your past two articles on BSH have proven the benefits of using shots analysis with regressed SH% over goal rates.

1. Pierce Cunneen says:

not sure what is going on here, but it’s not letting me quote your last post, Eric.

3. Rather than focus solely on what they didn’t do, why not also look at what they did do, and what they did right that we all might be able to learn from? You brought up in one of your posts the concept of peer review, but peer review is not just about pointing out what could have been done better; it is about understanding what was done well and what furthers the field of study. It is easy to critique and nitpick, but the real benefit is in understanding what is underneath the things you can critique and nitpick. Critiquing and nitpicking is easy, so we all tend to do it far too often, myself included. Understanding the value underneath the obvious critiques and nitpicks is difficult, but that is where the true value likely lies. In my opinion, at least.

1. Eric T. says:

I’m not exactly sure what this means.

If an analyst does something, and I think there are changes they can make that would improve their work, is it “critiquing and nitpicking” if I make those suggestions?

I can’t imagine it’s really what you mean, but this comes across an awful lot like “if you don’t have anything nice to say then don’t say anything at all”, which I don’t think is how scientific progress is made.