Feb 27, 2013

Over the last several days I have been playing around a fair bit with team data, analyzing various metrics for their usefulness in predicting future outcomes, and I have come across some interesting observations. Specifically, with more years of data, fenwick becomes significantly less important/valuable while goals and the percentages become more important/valuable. Let me explain.

Let’s first look at the year over year correlations in the various stats themselves.

Stat      Y1 vs Y2    Y12 vs Y34    Y123 vs Y45
FF%       0.3334      0.2447        0.1937
FF60      0.2414      0.1635        0.0976
FA60      0.3714      0.2743        0.3224
GF%       0.1891      0.2494        0.3514
GF60      0.0409      0.1468        0.1854
GA60      0.1953      0.3669        0.4476
Sh%       0.0002      0.0117        0.0047
Sv%       0.1278      0.2954        0.3350
PDO       0.0551      0.0564        0.1127
RegPts    0.2664      0.3890        0.3744

The above table shows the r^2 between past events and future events. The Y1 vs Y2 column is the r^2 between subsequent years (i.e. 0708 vs 0809, 0809 vs 0910, 0910 vs 1011, 1011 vs 1112). The Y12 vs Y34 column is a 2 year vs 2 year r^2 (i.e. 07-09 vs 09-11 and 08-10 vs 10-12) and the Y123 vs Y45 column is the 3 year vs 2 year comparison (i.e. 07-10 vs 10-12). RegPts is points earned during regulation play (using a win-loss-tie point system).
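For readers who want to reproduce this kind of number, here is a minimal sketch of a year-over-year r^2 calculation. The function names, the team labels and the FF% values are all made up for illustration, and pooling every team's consecutive-season pairs into a single correlation is an assumption about the method, not something stated above.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def paired_seasons(stat_by_season, seasons):
    """Collect (earlier, later) values for every team across consecutive seasons."""
    earlier, later = [], []
    for s1, s2 in zip(seasons, seasons[1:]):
        for team, value in stat_by_season[s1].items():
            if team in stat_by_season[s2]:
                earlier.append(value)
                later.append(stat_by_season[s2][team])
    return earlier, later


# Hypothetical FF% values for three made-up teams (not real data).
ff_pct = {
    "0708": {"AAA": 52.0, "BBB": 48.5, "CCC": 50.1},
    "0809": {"AAA": 51.2, "BBB": 49.0, "CCC": 49.5},
    "0910": {"AAA": 50.3, "BBB": 50.2, "CCC": 48.8},
}
x, y = paired_seasons(ff_pct, ["0708", "0809", "0910"])
print(round(r_squared(x, y), 4))
```

The multi-year columns would work the same way, except each value fed into `r_squared` would first be computed over the combined seasons rather than a single one.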

As you can see, with increased sample size, the fenwick stats' ability to predict future fenwick stats diminishes, particularly for fenwick for and fenwick %. All the other stats generally get better with increased sample size, except for shooting percentage, which has essentially no predictive power for future shooting percentage.

The increased predictive nature of the goal and percentage stats with increased sample size makes perfect sense as the increased sample size will decrease the random variability of these stats but I have no definitive explanation as to why the fenwick stats can’t maintain their predictive ability with increased sample sizes.

Let’s take a look at how well each statistic correlates with regulation points using various sample sizes.

Stat    1 year    2 year    3 year    4 year    5 year
FF%     0.3030    0.4360    0.5383    0.5541    0.5461
GF%     0.7022    0.7919    0.8354    0.8525    0.8685
Sh%     0.0672    0.0662    0.0477    0.0435    0.0529
Sv%     0.2179    0.2482    0.2515    0.2958    0.3221
PDO     0.2956    0.2913    0.2948    0.3393    0.3937
GF60    0.2505    0.3411    0.3404    0.3302    0.3226
GA60    0.4575    0.5831    0.6418    0.6721    0.6794
FF60    0.1954    0.3058    0.3655    0.4026    0.3951
FA60    0.1788    0.2638    0.3531    0.3480    0.3357

Again, the values are r^2 with regulation points.  Nothing too surprising there except maybe that team shooting percentage is so poorly correlated with winning because at the individual level it is clear that shooting percentages are highly correlated with goal scoring. It seems apparent from the table above that team save percentage is a significant factor in winning (or as my fellow Leaf fans can attest to, lack of save percentage is a significant factor in losing).
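The tables never spell out how each stat is calculated, so here is a rough sketch using the definitions commonly used in hockey analytics. Treat the formulas as assumptions about what sits behind the numbers: fenwick counts unblocked shot attempts, Sh% and Sv% are based on shots on goal, and PDO is simply Sh% plus Sv%. The function name and the input totals are invented for the example.

```python
def team_stats(gf, ga, sf, sa, ff, fa, minutes):
    """Derive rate and percentage stats from raw team totals.

    gf/ga: goals for/against, sf/sa: shots on goal for/against,
    ff/fa: fenwick (unblocked shot attempts) for/against,
    minutes: ice time over which the totals were accumulated.
    """
    def share(for_count, against_count):
        # Percentage of the combined total that belongs to the team.
        return 100.0 * for_count / (for_count + against_count)

    def per60(count):
        # Events per 60 minutes of ice time.
        return 60.0 * count / minutes

    sh_pct = 100.0 * gf / sf            # share of own shots that score
    sv_pct = 100.0 * (1.0 - ga / sa)    # share of opposing shots stopped
    return {
        "FF%": share(ff, fa),
        "GF%": share(gf, ga),
        "FF60": per60(ff),
        "FA60": per60(fa),
        "GF60": per60(gf),
        "GA60": per60(ga),
        "Sh%": sh_pct,
        "Sv%": sv_pct,
        "PDO": sh_pct + sv_pct,
    }


# Made-up season totals for a hypothetical team.
stats = team_stats(gf=150, ga=140, sf=1800, sa=1750, ff=2500, fa=2400, minutes=4000)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```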

The final table I want to look at is how well a few of the stats predict future regulation time point totals.

Stat      Y1 vs Y2    Y12 vs Y34    Y123 vs Y45
FF%       0.2500      0.2257        0.1622
GF%       0.2214      0.3187        0.3429
PDO       0.0256      0.0534        0.1212
RegPts    0.2664      0.3890        0.3744

The values are r^2 with future regulation point totals. Regardless of time frame used, past regulation time point totals are the best predictor of future regulation time point totals. Single season FF% is slightly better than single season GF% at predicting the following season's regulation point totals, but with 2 or more years of data GF% becomes a significantly better predictor as the predictive ability of GF% improves and that of FF% declines. This makes sense, as we observed earlier that increasing the sample size improves how well GF% predicts future GF% while FF% gets worse, and that GF% is more highly correlated with regulation point totals than FF%.

One thing that is clear from the above tables is that defense has been far more important to winning than offense. Regardless of whether we look at GF60, FF60, or Sh%, their level of importance trails their defensive counterpart (GA60, FA60 and Sv%), usually significantly. The defensive stats more highly correlate with winning and are more consistent from year to year. Defense and goaltending win in the NHL.

What is interesting though is that this largely differs from what we see at the individual level. At the individual level there is much more variation in the offensive stats, indicating individual players have more control over the offensive side of the game. This might suggest that team philosophies drive the defensive side of the game (i.e. how defensive minded the team is, the playing style, etc.) but the offensive side of the game is dominated more by the offensive skill level of the individual players. At the very least it is something worthy of further investigation.

The last takeaway from this analysis is the declining predictive value of fenwick/corsi with increased sample size. I am not quite sure what to make of this. If anyone has any theories I’d be interested in hearing them. One theory I have is that fenwick rates are not a part of the average GM’s player personnel decisions and thus over time, as players come and go, a team's fenwick rates will begin to vary. If this is the case, then this may represent an area of value that a GM could exploit.


Jan 23, 2013

One of the challenges in hockey analytics, or any type of data analysis, is how to best visualize data in a way that is exceptionally informative and yet really simple to understand. I have been working on a few things and came up with something that I think might be a useful tool to understand how a player gets utilized by his coach.

Let’s start with some background. We can get an idea of how a player is utilized by looking at when the player gets used and how frequently he gets used. Offensive players get more ice time on the power play and more ice time when their team is trailing and needs a goal. Defensive players get more ice time on the PK and when they are protecting a lead. This all makes sense, but the issue is that some teams spend more time on the PP or PK than others, while bad teams end up trailing more than good teams and leading less. This means doing a straight time on ice comparison between players on different teams doesn’t always accurately depict the usage of the player. If a player on the Red Wings plays the same number of minutes with the lead as a player on the Blue Jackets, it doesn’t mean the players are used in the same way. The Blue Jackets will lead a game significantly less often than the Red Wings, so in the hypothetical example above the Blue Jackets are depending on their player for a higher percentage of their time with a lead than the Red Wings are on theirs.

To get around this I looked at percentages. If Player A played 500 minutes with a lead and his team played a total of 2000 minutes with a lead during games in which Player A played, then Player A’s ice time with a lead percentage would be 25%. In games in which Player A played, he was used in 25% of the team’s time leading. I can calculate these percentages for any situation, from 5v5 to 4v5 or 5v4 special teams to leading and trailing situations. The challenge is to visualize the data in a clear and understandable way. To do this I use radar charts. Let’s look at a couple of examples so you get an idea, and we’ll use players that have extreme and opposite usages: Daniel Sedin and Manny Malhotra.

For those not up to speed on my terminology, f10 is zone start adjusted ice time, which ignores the 10 seconds after a face off in either the offensive or defensive zone.

The charts above are largely driven by PP and PK ice time, but players that are used more often in offensive roles will have their charts bulge to the top and top right, while those in more defensive roles will have their charts bulge more to the bottom and bottom left. Also, the larger the ‘polygon’, the more ice time the player gets and the more he is relied on. In the examples above, Sedin is clearly used more often in offensive situations and clearly gets more ice time.
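The value on each axis of these charts is the usage percentage described earlier: the player's minutes in a situation divided by his team's minutes in that situation during games he played. A minimal sketch of that calculation follows. The function names, the situation labels and the minute totals are invented for illustration and are not the actual data behind the charts.

```python
def usage_pct(player_minutes, team_minutes):
    """Percent of the team's ice time in a situation that the player was on for."""
    if team_minutes <= 0:
        raise ValueError("team minutes must be positive")
    return 100.0 * player_minutes / team_minutes


def usage_profile(player_toi, team_toi):
    """Usage percentage per situation; team_toi covers only games the player played."""
    return {
        situation: usage_pct(minutes, team_toi[situation])
        for situation, minutes in player_toi.items()
    }


# Hypothetical minute totals for one player and his team.
player_toi = {"5v5": 1100.0, "5v4": 250.0, "4v5": 20.0, "leading": 500.0}
team_toi = {"5v5": 3600.0, "5v4": 400.0, "4v5": 380.0, "leading": 2000.0}

for situation, pct in usage_profile(player_toi, team_toi).items():
    print(f"{situation}: {pct:.1f}%")
```

With these numbers the "leading" entry comes out at 25%, matching the Player A example, and each percentage would become one spoke of the radar chart.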

Let’s now look at a player who is used in a more balanced way, Zdeno Chara.

That is a chart that is representative of a big ice time player who plays in all situations. We can then take it a step further and compare players such as the following.

In normal 5v5 situations Gardiner was depended on about as much as Phaneuf, but Phaneuf was relied on a lot more on special teams and a bit more when protecting a lead. Of course, you can also compare across teams with these charts:

Phaneuf and Chara were depended on almost equally in all situations except on the PP where Phaneuf was used far more frequently.

I am not sure where I will go with these charts but I think I’ll look at them from time to time as I am sure they will be of use in certain situations and I have a few ideas as to how to expand on them to make them even more interesting/useful.


Sep 03, 2012

A month and a half ago Eric T at NHLNumbers.com had a good post on quantifying the impact on teammate shooting percentage.  I wanted to take a second look at the relative importance the impact on teammate shooting percentage can have because I disagreed somewhat with Eric’s conclusions.

For a very small number of elite playmakers, the ability to drive shooting percentage can be a major component of their value. For the vast majority of the league, driving possession is a more significant and more reproducible path to success.

It is my belief that it is important to consider impact on shooting percentage for more than a “very small number of elite playmakers” and I’ll attempt to show that now.

The method that Eric used to identify a player’s impact on shooting percentage is to compare his linemates’ shooting percentages with him to their overall shooting percentages. As noted in the comments, the one flaw with this is that their overall shooting percentage is itself affected by the player we are trying to evaluate, which will end up underestimating the impact. In the comments Eric re-did the analysis using a true “without you” shooting percentage and the impact of driving teammate shooting percentages was greater than initially found, but he felt the conclusions didn’t change significantly.

Overall average for the top ten is a 1.2% boost (up from 0.9% in story) and 5 goals per year (up from 4.5). I don’t think this changes the conclusions appreciably.

In the minutes that a player is on the ice with one of the very best playmakers in the league, his shooting percentage will be about 1% better. For a player who gets ~150-200 shots per year and plays ~40-60% of his ice time with that top-tier playmaker, that’s less than a one-goal boost. It’s just not that big of a factor.

He also suggested that using the “without you” shooting percentage instead of the “overall shooting percentage” would probably result in “more accurate but less precise” analysis.  This is because a guy like Daniel Sedin would get very few shots when playing apart from Henrik Sedin because they rarely play apart and this small “apart” sample size might be subject to significant small sample size errors.
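To make the difference between the two baselines concrete, here is a small sketch comparing a teammate's shooting percentage with the player against both his overall and his “without you” shooting percentage. The function names and the goal and shot counts are hypothetical, chosen to mimic a pair of linemates who rarely play apart.

```python
def sh_pct(goals, shots):
    """Shooting percentage: goals per 100 shots."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return 100.0 * goals / shots


def shooting_impact(goals_with, shots_with, goals_without, shots_without):
    """Teammate's Sh% boost measured against two different baselines."""
    with_pct = sh_pct(goals_with, shots_with)
    without_pct = sh_pct(goals_without, shots_without)
    # The overall figure includes the "with" minutes, so it is pulled toward with_pct.
    overall_pct = sh_pct(goals_with + goals_without, shots_with + shots_without)
    return {
        "with": with_pct,
        "without": without_pct,
        "overall": overall_pct,
        "vs_overall": with_pct - overall_pct,
        "vs_without": with_pct - without_pct,
    }


# Hypothetical teammate: 180 shots with the player, only 20 apart from him.
result = shooting_impact(goals_with=18, shots_with=180, goals_without=1, shots_without=20)
print(result)
```

In this made-up case the boost is 0.5 percentage points against the overall baseline but 5 points against the “without you” baseline. The second number rests on just 20 shots, though, which is exactly the small sample problem described above.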
