## The declining value of fenwick/corsi with increased sample size

Over the last several days I have been playing around a fair bit with team data, analyzing various metrics for their usefulness in predicting future outcomes, and I have come across some interesting observations. Specifically, with more years of data, fenwick becomes significantly less valuable as a predictor while goals and the percentages become more valuable. Let me explain.

Let’s first look at the year over year correlations in the various stats themselves.

| Stat | Y1 vs Y2 | Y12 vs Y34 | Y123 vs Y45 |
| --- | --- | --- | --- |
| FF% | 0.3334 | 0.2447 | 0.1937 |
| FF60 | 0.2414 | 0.1635 | 0.0976 |
| FA60 | 0.3714 | 0.2743 | 0.3224 |
| GF% | 0.1891 | 0.2494 | 0.3514 |
| GF60 | 0.0409 | 0.1468 | 0.1854 |
| GA60 | 0.1953 | 0.3669 | 0.4476 |
| Sh% | 0.0002 | 0.0117 | 0.0047 |
| Sv% | 0.1278 | 0.2954 | 0.3350 |
| PDO | 0.0551 | 0.0564 | 0.1127 |
| RegPts | 0.2664 | 0.3890 | 0.3744 |

The above table shows the r^2 between past events and future events. The Y1 vs Y2 column is the r^2 between consecutive seasons (i.e. 0708 vs 0809, 0809 vs 0910, 0910 vs 1011, 1011 vs 1112). The Y12 vs Y34 column is a 2 year vs 2 year r^2 (i.e. 07-09 vs 09-11 and 08-10 vs 10-12) and the Y123 vs Y45 column is the 3 year vs 2 year comparison (i.e. 07-10 vs 10-12). RegPts is points earned during regulation play (using the win-loss-tie point system).
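For anyone who wants to reproduce this kind of table, here is a minimal Python sketch of the calculation. It assumes the team pairs from every window in a column are pooled into a single sample before computing r^2 (averaging per-window values would be the alternative). The function names, the dictionary layout and the numbers are hypothetical and for illustration only.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when one series has zero variance")
    return (sxy * sxy) / (sxx * syy)


def pooled_pairs(stat_by_window, window_pairs):
    """Pair each team's value in a past window with its value in a future window.

    stat_by_window: {window_label: {team: value}}
    window_pairs:   [(past_label, future_label), ...]
    Assumption: all pairs are pooled into one sample.
    """
    past, future = [], []
    for past_label, future_label in window_pairs:
        for team, value in stat_by_window[past_label].items():
            if team in stat_by_window[future_label]:
                past.append(value)
                future.append(stat_by_window[future_label][team])
    return past, future


# Made-up FF% values for three teams, not real data.
toy = {
    "0708": {"TOR": 0.49, "DET": 0.58, "BOS": 0.50},
    "0809": {"TOR": 0.50, "DET": 0.56, "BOS": 0.52},
    "0910": {"TOR": 0.51, "DET": 0.53, "BOS": 0.53},
}
past, future = pooled_pairs(toy, [("0708", "0809"), ("0809", "0910")])
print(round(r_squared(past, future), 4))
```

The same two functions cover the multi-year columns if the window labels refer to multi-year aggregates (for example "07-09" and "09-11") instead of single seasons.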

As you can see, with increased sample size, the ability of the fenwick stats to predict future fenwick stats diminishes, particularly for fenwick for and fenwick %. All the other stats generally get better with increased sample size, except for shooting percentage, which has no predictive power for future shooting percentage.

The increased predictive nature of the goal and percentage stats with increased sample size makes perfect sense as the increased sample size will decrease the random variability of these stats but I have no definitive explanation as to why the fenwick stats can’t maintain their predictive ability with increased sample sizes.

Let’s take a look at how well each statistic correlates with regulation points using various sample sizes.

| Stat | 1 year | 2 year | 3 year | 4 year | 5 year |
| --- | --- | --- | --- | --- | --- |
| FF% | 0.3030 | 0.4360 | 0.5383 | 0.5541 | 0.5461 |
| GF% | 0.7022 | 0.7919 | 0.8354 | 0.8525 | 0.8685 |
| Sh% | 0.0672 | 0.0662 | 0.0477 | 0.0435 | 0.0529 |
| Sv% | 0.2179 | 0.2482 | 0.2515 | 0.2958 | 0.3221 |
| PDO | 0.2956 | 0.2913 | 0.2948 | 0.3393 | 0.3937 |
| GF60 | 0.2505 | 0.3411 | 0.3404 | 0.3302 | 0.3226 |
| GA60 | 0.4575 | 0.5831 | 0.6418 | 0.6721 | 0.6794 |
| FF60 | 0.1954 | 0.3058 | 0.3655 | 0.4026 | 0.3951 |
| FA60 | 0.1788 | 0.2638 | 0.3531 | 0.3480 | 0.3357 |

Again, the values are r^2 with regulation points. Nothing too surprising there except maybe that team shooting percentage is so poorly correlated with winning because at the individual level it is clear that shooting percentages are highly correlated with goal scoring. It seems apparent from the table above that team save percentage is a significant factor in winning (or as my fellow Leaf fans can attest to, lack of save percentage is a significant factor in losing).
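As a rough sketch of how the inputs to a table like this could be built, the Python below computes the percentage stats from raw counts and pools counts across seasons before computing them. Several things here are assumptions rather than a description of the exact method used: that regulation points are scored 2 per win and 1 per tie, with any game tied after regulation counted as a tie; that multi-year values come from summing events across seasons rather than averaging yearly rates; and that PDO is expressed as Sh% plus Sv% on a 0 to 1 scale. The function names and counts are hypothetical.

```python
def regulation_points(wins, ties):
    """Win-loss-tie scoring: 2 points per regulation win, 1 per tie, 0 per loss.

    Assumption: games still tied after regulation are counted as ties.
    """
    return 2 * wins + ties


def pool_seasons(seasons):
    """Sum raw event counts across seasons (assumed way of building multi-year samples)."""
    keys = seasons[0].keys()
    return {key: sum(season[key] for season in seasons) for key in keys}


def team_rates(counts):
    """Percentage stats from raw counts.

    counts keys: gf/ga (goals), ff/fa (fenwick events), sf/sa (shots on goal).
    """
    sh_pct = counts["gf"] / counts["sf"]        # shooting percentage
    sv_pct = 1 - counts["ga"] / counts["sa"]    # save percentage
    return {
        "GF%": counts["gf"] / (counts["gf"] + counts["ga"]),
        "FF%": counts["ff"] / (counts["ff"] + counts["fa"]),
        "Sh%": sh_pct,
        "Sv%": sv_pct,
        "PDO": sh_pct + sv_pct,
    }


# Made-up counts for one team over two seasons, not real data.
season_1 = {"gf": 150, "ga": 160, "ff": 2500, "fa": 2600, "sf": 1800, "sa": 1900}
season_2 = {"gf": 170, "ga": 150, "ff": 2700, "fa": 2500, "sf": 1950, "sa": 1850}
two_year = team_rates(pool_seasons([season_1, season_2]))
print({name: round(value, 4) for name, value in two_year.items()})
```

Pooling counts first means a season with more events carries more weight than it would if the yearly percentages were simply averaged.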

The final table I want to look at is how well a few of the stats predict future regulation time point totals.

| Stat | Y1 vs Y2 | Y12 vs Y34 | Y123 vs Y45 |
| --- | --- | --- | --- |
| FF% | 0.2500 | 0.2257 | 0.1622 |
| GF% | 0.2214 | 0.3187 | 0.3429 |
| PDO | 0.0256 | 0.0534 | 0.1212 |
| RegPts | 0.2664 | 0.3890 | 0.3744 |

The values are r^2 with future regulation point totals. Regardless of the time frame used, past regulation time point totals are the best predictor of future regulation time point totals. Single-season FF% is slightly better than single-season GF% at predicting the following season’s regulation point totals (0.2500 vs 0.2214), but with 2 or more years of data GF% becomes a significantly better predictor as the predictive ability of GF% improves and that of FF% declines. This makes sense, as we observed earlier that increasing sample size improves GF%’s ability to predict future GF% while FF% gets worse, and that GF% is more highly correlated with regulation point totals than FF%.

One thing that is clear from the above tables is that defense has been far more important to winning than offense. Regardless of whether we look at GF60, FF60, or Sh% their level of importance trails their defensive counterpart (GA60, FA60 and Sv%), usually significantly. The defensive stats more highly correlate with winning and are more consistent from year to year. Defense and goaltending wins in the NHL.

What is interesting though is that this largely differs from what we see at the individual level. At the individual level there is much more variation in the offensive stats, indicating individual players have more control over the offensive side of the game. This might suggest that team philosophies drive the defensive side of the game (i.e. how defensive minded the team is, the playing style, etc.) but the offensive side of the game is dominated more by the offensive skill level of the individual players. At the very least it is something worthy of further investigation.

The last takeaway from this analysis is the declining predictive value of fenwick/corsi with increased sample size. I am not quite sure what to make of this. If anyone has any theories I’d be interested in hearing them. One theory I have is that fenwick rates are not a part of the average GM’s player personnel decisions and thus over time, as players come and go, a team’s fenwick rates will begin to vary. If this is the case, then this may represent an area of value that a GM could exploit.

GMs likely don’t use Fenwick plus-minus, but for a good reason, I’d argue.

Shots plus/minus stats are a questionable way to rate individual players, given the high number of false positives and negatives — about 40 per cent of the time a player will get a plus or a minus he does not deserve, as he had no significant impact in the shot for or against.

Over a season, with an increased sample size, this does not “even out” for two types of players in particular: good/average players who spend most of their time on ice with weak players and good/average players who spend most of their time on ice with great players. In the one case, the player will have a deflated Fenwick number, in the other case the player will have an inflated Fenwick number.

This is the essential problem any time you apply a stat earned by a team to an individual player, the same issue seen with official plus-minus.

These stats tell you about all ten players on the ice, the quality of teammates and the quality of competition, but it’s not always possible to pull out of that the impact of one player on the shot totals.

So if a GM decides to make personnel decisions based on Fenwick plus-minus, he won’t be a whole lot better off than a GM who makes them on goals plus-minus.

I’d think that the volatility in year to year Fenwick would be due to roster turnover. When looking at years 4 and 5 in comparison to year 1, you’re not increasing the sample size, you’re adding random, unrelated noise.

If you look at the Leafs roster from five years ago, you’ll notice that every single player from that team is no longer in the organization. The rosters of that year and this year are 100% different. Why then could you reasonably expect any correlation in year to year Fenwick and Corsi? If none of the players are the same, the outcomes won’t be the same.

When analyzing teams, you’re bounded by the 82-game regular season sample. Attempting to analyze the team over a trajectory of years is pointless because you’re simply not looking at the same team after year 1.

Explain then why correlations don’t get worse in the other stats?

Truth is, most teams have the same core of players. Boston has been built around Chara and Bergeron for years. Pittsburgh around Crosby, Malkin, Kunitz, Letang, etc. for years. Anaheim has been built around Getzlaf, Perry, Selanne, Ryan, etc. San Jose around Thornton, Marleau, Boyle, Pavelski, etc. Yes, there are teams like the Leafs who have undertaken a complete overhaul, but that isn’t true for most teams.

Excellent work as always David. I was playing with a similar idea.

1. Brian Macdonald has shown that even in one season Goals F/A are better than Corsi/Fenwick, so the tide HAS shifted away from possession stats’ predictive power.

2. Did you use Fenwick in all situations or did you use Close? This will help me give a theory as to what is happening. Thanks Dan

One other point. Phil (at the Sabermetric Research Blog) has shown (using Tango’s formula) that it takes ~144 games to eliminate luck from NHL regular season stats. This would imply that ~2 years of data is the best data set to use for future predictions (it’s what I use).

David;

What Fenwick numbers are you using for these calculations? Close or Tied? Thanks