Being honest about “possession” stats as a predictive tool
I often feel that I am the sole defender of goal-based hockey analytics in a world dominated by shot attempt (corsi) based analytics. In recent weeks I have often heard the pro-corsi crowd cite example after example of where corsi-based analytics “got it right” or “predicted something fairly well”. While it is always good to be able to cite examples where you got things right, a fair and honest evaluation looks at the complete picture, not just the good outcomes. Otherwise it is analytics by anecdote, which is an oxymoron if there ever was one.
For example, Kent Wilson of FlamesNation.ca recently wrote about the “Dawning of the Age of Fancy Stats”, in which he cited several instances where hockey analytics got it right or did well in predicting outcomes.
The big test case which seems to have moved the needle in favour of the nerds is, of course, the Toronto Maple Leafs. Toronto came into the season with inflated expectations after an outburst of percentages during the lock-out shortened year saw them break into the post-season. Their awful underlying numbers caused the stats oriented amongst us to be far more circumspect about their chances, of course.
Toronto is the recent example that the hockey analytics crowd likes to bring up in support of their case, but it is just one example. We don’t hear much about how many predicted the Ottawa Senators to be in the playoffs, with some even having them challenging for the top spot in the Eastern Conference. We don’t hear much about how the New Jersey Devils missed the playoffs yet again despite having the 5th best 5v5close Fenwick% in the league, the year after missing the playoffs with the 3rd best 5v5close Fenwick% in the league. If we are truly interested in hockey analytics we need a complete and unbiased assessment of all outcomes, not just the ones that support our underlying beliefs.
In the same article Kent Wilson quoted a tweet from Dimitri Filipovic about the success of Corsi in predicting outcomes of playoff series.
Relevant #fact: since ’08 playoffs, teams that were 5+ % better than their opponent in 5v5 fenwick close during the regular season are 25-7.
While interesting, it really doesn’t tell us a whole lot more than “when one team is significantly better at outshooting their opponents they more often than not win”. Well, that really isn’t saying a whole lot. It is more or less saying that when a dominant team plays a mediocre team, the dominant team usually wins. Not really that interesting when you think of it that way.
Here is another fact that puts that into perspective. Since the 2008 playoffs, the team with the better 5v5close Fenwick% has a 53-35-2 record (there were 2 cases where teams had identical fenwick% to 1 decimal place). That actually makes it sound like 5v5close Fenwick% is predictive overall, not just in cases where one team is significantly better than another. Of course, if we look at goals we find that the team with the better 5v5close goal% has a 54-34-1 record. In other words, 5v5close possession stats did no better at predicting playoff outcomes than 5v5close goal stats. It is easy to throw out stats that support a point of view, but it is far more important to look at the complete picture. That is what analytics is about.
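To put these records on the same footing, here is a minimal Python sketch that converts each one into a share of decided series won. The helper name `win_rate` and the choice to leave tied cases out of the denominator are assumptions made for illustration; the win-loss figures are the ones quoted above.

```python
def win_rate(wins, losses):
    """Share of decided series won; tied cases are left out of the denominator."""
    return wins / (wins + losses)

# (wins, losses) for the team favoured by each measure, as quoted in the text.
records = {
    "FF% edge of 5+ points": (25, 7),
    "better 5v5close FF%": (53, 35),
    "better 5v5close GF%": (54, 34),
}

for label, (wins, losses) in records.items():
    print(f"{label}: {wins}-{losses} -> {win_rate(wins, losses):.1%}")
```

Counted this way, the FF% and GF% rows differ by a single series out of 88 decided ones.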
A similar statistic was promoted by Michael Parkatti in a recent talk on hockey analytics at the University of Alberta. In that talk Parkatti stated that of the last 15 Stanley Cup winners, all but 3 had a “ShotShare” (all situations) of at least 53%. The exceptions were Pittsburgh in 2009, Boston in 2011 and Carolina in 2006. I will note that all three of these teams appear to have been below 51%, and the 2009 Penguins were below 50%. That seems sort of impressive, but I did some digging myself and found that every Stanley Cup winner since 1980 had a “GoalShare” (all situations) greater than 52%. Every single one. No exceptions. I didn’t look at any cup winners pre-1980, but the trend may very well go back a lot further. As impressive as 12 of 15 is, 34 of 34 is far more impressive.
Here is the thing. We know that goal percentage correlates with winning far better than corsi percentage. This is an indisputable fact. It is actually quite a bit better. The sole reason we use corsi is that goals are infrequent events and thus not necessarily indicative of true talent due to small sample size issues. This is a fair argument and one that I accept. In situations where you have small sample sizes, definitely use corsi as your predictive metric (but understand its limitations). The questions that need to be answered are what constitutes a small sample size and, more importantly, what sample size we need for goals to become as good a predictor of future events as corsi, or a better one. I have pegged this crossing point at about one season’s worth of data, maybe a bit more if looking at individual players who may not be getting 20 minutes of ice time a game (my guess is that somewhere above 750 minutes of ice time is where I’d start to get more comfortable using goal data than corsi data). I am certain not everyone agrees, but I haven’t seen a lot of analyses attempting to find this “crossing point”.
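The small sample size argument can be made concrete with the textbook standard error of a proportion, sqrt(p(1-p)/n). The sketch below is only an illustration under loud assumptions: the event counts (150 goals and 3000 shot attempts, for and against combined) are hypothetical round numbers and not measured league figures, and every event is treated as an independent trial. It shows why goal percentages are noisier than shot percentages over the same span; it does not locate the crossing point, because it says nothing about how much each measure reflects true talent.

```python
import math

def share_standard_error(p, n_events):
    """Standard error of an observed share p measured over n_events independent events."""
    return math.sqrt(p * (1 - p) / n_events)

# Hypothetical event counts for one team over one season (for + against).
# These are illustrative assumptions, not real league data.
ASSUMED_GOALS = 150
ASSUMED_SHOT_ATTEMPTS = 3000

# Sampling noise around a true 50% share for each kind of event.
goal_se = share_standard_error(0.5, ASSUMED_GOALS)
shot_se = share_standard_error(0.5, ASSUMED_SHOT_ATTEMPTS)

print(f"goal share noise: +/- {goal_se:.1%}")
print(f"shot share noise: +/- {shot_se:.1%}")
```

Under these assumed counts the sampling noise on a goal share is roughly four percentage points, against under one point for a shot share. Doubling the sample shrinks either figure by a factor of about 1.4, which is why the amount of data available matters so much for goal-based measures.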
Let’s take another look at how well 5v5close Fenwick% and Goal% predict playoff outcomes, but this time let’s look by season rather than overall.
In full seasons not affected by lockouts we find that GF% was generally the better predictor (only in 2008 did GF% underperform FF%), but in last year’s lockout-shortened season FF% significantly outperformed GF%. Was this a coincidence, or is it evidence that 48 games is not a large enough sample size to rely on GF% more than CF% but 82 games probably is?
I have seen numerous other examples in recent weeks where “analytics” supporters have used what amounts to not much more than anecdotal evidence to support their claims. This is not analytics. Analytics is a fair, unbiased and complete fact-based assessment of reality. Showing why a technique is a good predictor some of the time is not enough. You need to show why it is overall a better predictor all of the time, or at least define when it is and when it isn’t.
I recently wrote an article on whether last year’s statistics predicted this year’s playoff teams and found that GF% seemed to do at least as well as CF% despite last season being a lockout-shortened year.
With all that said, you will frequently find me using “possession” statistics, so I certainly don’t think they are useless. It is just my opinion that puck possession is only one aspect of the game, and that puck possession analytics has largely been oversold when it comes to how useful it is as a predictor. Conversely, goal-based analytics has largely been given a bad rap, which I find a little unfortunate.
(Another article worth reading is Matt Rudnitsky’s MONEYPUCK: Why Most People Need To Shut Up About ‘Advanced Stats’ In The NHL.)