Corsi vs Goals when predicting conference finalists

Every year I see a tweet like the following, and while it frustrates me, it also shows me people's biases.

It shows people's biases because Neil is clearly attempting to defend Corsi, something he has invested himself in over the years, by showing the value of Corsi in predicting conference finalists. It is frustrating because it isn't really telling us the whole story. Sure, seeing the #2, #6, #7, and #8 teams make the semi-finals seems like a good thing, but it is like saying the Boston Red Sox scored 6 runs last night. On the surface that sounds pretty good (it's above the MLB average for runs scored in a game) until I tell you their opponent, the Houston Astros, scored 7. Now scoring six runs seems much less useful or important. The problem here isn't the facts of Neil's tweet; it is that Neil isn't putting this observation in the context of how it compares to other methods of predicting the conference finalists. But hey, that is what I am here for, right?

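I am going to modify the methodology somewhat by looking at rankings within the conference each team plays in, not within the league as a whole, since the playoffs are conference based, not league based. I am also going to look at both CF% and GF% in both 5v5 and 5v5 score adjusted situations. Each number below is a team's rank within its conference, and the Total column sums the four ranks, so a lower total means the stat ranked the eventual finalists more highly. Here is what I found for the current season.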

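| Stat | Tampa (East) | Pittsburgh (East) | San Jose (West) | St. Louis (West) | Total |
|---|---|---|---|---|---|
| CF% – 5v5 | 2 | 1 | 6 | 5 | 14 |
| GF% – 5v5 | 5 | 4 | 3 | 2 | 14 |
| CF% – 5v5 score adjusted | 2 | 1 | 6 | 5 | 14 |
| GF% – 5v5 score adjusted | 5 | 4 | 3 | 2 | 14 |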
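
The totals and the East/West split can be checked with a few lines of Python. This is only a minimal sketch that hard-codes the ranks from the table above; the dictionary layout and the helper name `subtotal` are my own illustration and not part of any stats package.

```python
# Conference ranks for the 2016 finalists, copied from the table above.
# Order: Tampa, Pittsburgh (East), then San Jose, St. Louis (West).
ranks_2016 = {
    "CF% 5v5": [2, 1, 6, 5],
    "GF% 5v5": [5, 4, 3, 2],
    "CF% 5v5 score adjusted": [2, 1, 6, 5],
    "GF% 5v5 score adjusted": [5, 4, 3, 2],
}

def subtotal(ranks):
    """Return (east, west, total) rank sums; a lower sum is better."""
    east = sum(ranks[:2])   # first two entries are the East finalists
    west = sum(ranks[2:])   # last two entries are the West finalists
    return east, west, east + west

for stat, ranks in ranks_2016.items():
    east, west, total = subtotal(ranks)
    print(f"{stat}: East {east}, West {west}, Total {total}")
```

For CF% at 5v5 this gives East 3, West 11, Total 14, and for GF% at 5v5 it gives East 9, West 5, Total 14.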

Look at that: GF% and CF% perform equally well overall, with CF% performing better for Eastern Conference teams and GF% performing better for Western Conference teams. Adjusting for score provided zero additional value. Of course, one single season isn't a very large sample size to work with. How did last year's playoffs look?

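| Stat | Tampa (East) | NYR (East) | Chicago (West) | Anaheim (West) | Total |
|---|---|---|---|---|---|
| CF% – 5v5 | 2 | 10 | 2 | 8 | 22 |
| GF% – 5v5 | 1 | 2 | 5 | 7 | 15 |
| CF% – 5v5 score adjusted | 1 | 9 | 2 | 9 | 21 |
| GF% – 5v5 score adjusted | 1 | 2 | 5 | 7 | 15 |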

Hmm. Those CF% stats aren't looking quite so good, as they were outperformed by GF%, especially in the Eastern Conference. There was only a marginal benefit to adjusting for score.

Now, two seasons is still a pretty small sample size, so I looked at the playoffs back to 2010. Here is what I found for the total rank among conference finalists in each season.

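| Stat | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | Average |
|---|---|---|---|---|---|---|---|---|
| CF% – 5v5 | 14 | 22 | 21 | 15 | 27 | 14 | 29 | 20.3 |
| GF% – 5v5 | 14 | 15 | 15 | 10 | 25 | 11 | 21 | 15.9 |
| CF% – 5v5 score adjusted | 14 | 21 | 20 | 11 | 22 | 10 | 29 | 18.1 |
| GF% – 5v5 score adjusted | 14 | 15 | 15 | 10 | 24 | 11 | 21 | 15.7 |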

Hey, look at that. The only times a CF% stat did better than its corresponding GF% stat were 2011 and 2012, when score adjusted CF% outperformed score adjusted GF%, and then only by the slightest of margins (1 and 2 points respectively). Every other time, GF% was equal to or better than its corresponding CF% statistic.
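
For anyone who wants to verify the averages and the year-by-year comparison, here is a rough Python sketch. It only re-uses the season totals from the table above; the names `totals`, `average`, and `years_cf_beats_gf` are hypothetical helpers I made up for this illustration.

```python
# Season totals (sum of conference ranks of the four finalists),
# copied from the table above. Lower is better.
years = [2016, 2015, 2014, 2013, 2012, 2011, 2010]
totals = {
    "CF% 5v5": [14, 22, 21, 15, 27, 14, 29],
    "GF% 5v5": [14, 15, 15, 10, 25, 11, 21],
    "CF% 5v5 score adjusted": [14, 21, 20, 11, 22, 10, 29],
    "GF% 5v5 score adjusted": [14, 15, 15, 10, 24, 11, 21],
}

def average(values):
    """Mean of the season totals, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)

def years_cf_beats_gf(cf_key, gf_key):
    """Years in which the CF% total is strictly lower than the GF% total."""
    return [
        year
        for year, cf, gf in zip(years, totals[cf_key], totals[gf_key])
        if cf < gf
    ]

for stat, values in totals.items():
    print(f"{stat}: average {average(values)}")

print(years_cf_beats_gf("CF% 5v5", "GF% 5v5"))
print(years_cf_beats_gf("CF% 5v5 score adjusted", "GF% 5v5 score adjusted"))
```

The strict `cf < gf` comparison means ties, such as 2016, are not counted as a win for CF%.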

Generally speaking, GF% does a better job of predicting the conference finalists than CF%. There was some benefit to using score adjusted data, but it was fairly small, and CF% benefited more than GF%. So while the crux of Neil Greenberg's tweet is true (score adjusted CF% does a decent job of predicting conference finalists), what is missing is the fact that there are seemingly better stats to use for this.