Feb 27, 2013
 

Over the last several days I have been playing around with team data, analyzing various metrics for their usefulness in predicting future outcomes, and I have come across some interesting observations. Specifically, with more years of data, fenwick becomes significantly less valuable as a predictor while goals and the percentages become more valuable. Let me explain.

Let’s first look at the year over year correlations in the various stats themselves.

Stat      Y1 vs Y2   Y12 vs Y34   Y123 vs Y45
FF%       0.3334     0.2447       0.1937
FF60      0.2414     0.1635       0.0976
FA60      0.3714     0.2743       0.3224
GF%       0.1891     0.2494       0.3514
GF60      0.0409     0.1468       0.1854
GA60      0.1953     0.3669       0.4476
Sh%       0.0002     0.0117       0.0047
Sv%       0.1278     0.2954       0.3350
PDO       0.0551     0.0564       0.1127
RegPts    0.2664     0.3890       0.3744

The above table shows the r^2 between past events and future events. The Y1 vs Y2 column is the r^2 between subsequent years (i.e. 0708 vs 0809, 0809 vs 0910, 0910 vs 1011, 1011 vs 1112). The Y12 vs Y34 column is a 2 year vs 2 year r^2 (i.e. 07-09 vs 09-11 and 08-10 vs 10-12) and the Y123 vs Y45 column is the 3 year vs 2 year comparison (i.e. 07-10 vs 10-12). RegPts is points earned during regulation play (using a win-loss-tie point system).
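As a rough illustration of how a past-vs-future r^2 like those above can be computed, here is a minimal Python sketch. The team names and FF% values are made up, and `paired_windows` averages single-season values as a simplification (pooling the underlying multi-year totals would be more faithful). Neither helper is taken from the original analysis.

```python
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)

def paired_windows(by_team, past, future):
    """Pair each team's average over the `past` seasons with its average
    over the `future` seasons (hypothetical helper, simple averaging)."""
    xs = [mean(seasons[s] for s in past) for seasons in by_team.values()]
    ys = [mean(seasons[s] for s in future) for seasons in by_team.values()]
    return xs, ys

# Hypothetical FF% values for three made-up teams (not real data).
ff_pct = {
    "Team A": {"0708": 52.1, "0809": 51.4, "0910": 50.2, "1011": 50.9},
    "Team B": {"0708": 48.3, "0809": 49.0, "0910": 49.6, "1011": 48.8},
    "Team C": {"0708": 50.5, "0809": 47.9, "0910": 51.1, "1011": 50.0},
}

# A "Y12 vs Y34" style comparison: two past seasons vs the next two.
xs, ys = paired_windows(ff_pct, ["0708", "0809"], ["0910", "1011"])
print(round(r_squared(xs, ys), 4))
```

For a Y1 vs Y2 style column, every consecutive season pair for every team would go into the same two lists before calling `r_squared`.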

As you can see, with increased sample size, the ability of the fenwick stats to predict future fenwick stats diminishes, particularly for fenwick for and fenwick %. All the other stats generally get better with increased sample size, except for shooting percentage, which has no power to predict future shooting percentage.

The increased predictive nature of the goal and percentage stats with increased sample size makes perfect sense as the increased sample size will decrease the random variability of these stats but I have no definitive explanation as to why the fenwick stats can’t maintain their predictive ability with increased sample sizes.

Let’s take a look at how well each statistic correlates with regulation points using various sample sizes.

Stat    1 year   2 year   3 year   4 year   5 year
FF%     0.3030   0.4360   0.5383   0.5541   0.5461
GF%     0.7022   0.7919   0.8354   0.8525   0.8685
Sh%     0.0672   0.0662   0.0477   0.0435   0.0529
Sv%     0.2179   0.2482   0.2515   0.2958   0.3221
PDO     0.2956   0.2913   0.2948   0.3393   0.3937
GF60    0.2505   0.3411   0.3404   0.3302   0.3226
GA60    0.4575   0.5831   0.6418   0.6721   0.6794
FF60    0.1954   0.3058   0.3655   0.4026   0.3951
FA60    0.1788   0.2638   0.3531   0.3480   0.3357

Again, the values are r^2 with regulation points.  Nothing too surprising there except maybe that team shooting percentage is so poorly correlated with winning because at the individual level it is clear that shooting percentages are highly correlated with goal scoring. It seems apparent from the table above that team save percentage is a significant factor in winning (or as my fellow Leaf fans can attest to, lack of save percentage is a significant factor in losing).

The final table I want to look at shows how well a few of the stats predict future regulation time point totals.

Stat      Y1 vs Y2   Y12 vs Y34   Y123 vs Y45
FF%       0.2500     0.2257       0.1622
GF%       0.2214     0.3187       0.3429
PDO       0.0256     0.0534       0.1212
RegPts    0.2664     0.3890       0.3744

The values are r^2 with future regulation point totals. Regardless of time frame used, past regulation time point totals are the best predictor of future regulation time point totals. Single season FF% is slightly better than single season GF% at predicting the following season’s regulation point total (0.2500 vs 0.2214), but with 2 or more years of data GF% becomes a significantly better predictor as the predictive ability of GF% improves and that of FF% declines. This makes sense: we observed earlier that increasing the sample size improves GF%’s ability to predict future GF% while FF% gets worse, and that GF% is more highly correlated with regulation point totals than FF%.

One thing that is clear from the above tables is that defense has been far more important to winning than offense. Whether we look at GF60 or Sh%, their level of importance trails their defensive counterparts (GA60 and Sv%), usually significantly. The shot rates are a partial exception: FF60 correlates slightly better with regulation points than FA60 at every sample size, but FA60 is far more consistent from year to year. On the whole, the defensive stats correlate more highly with winning and are more consistent from year to year. Defense and goaltending win in the NHL.

What is interesting, though, is that this largely differs from what we see at the individual level. At the individual level there is much more variation in the offensive stats, indicating that individual players have more control over the offensive side of the game. This might suggest that team philosophies drive the defensive side of the game (i.e. how defensive minded the team is, the playing style, etc.) but the offensive side of the game is dominated more by the offensive skill level of the individual players. At the very least it is something worthy of further investigation.

The last takeaway from this analysis is the declining predictive value of fenwick/corsi with increased sample size. I am not quite sure what to make of this. If anyone has any theories I’d be interested in hearing them. One theory I have is that fenwick rates are not a part of the average GM’s player personnel decisions, and thus over time, as players come and go, a team’s fenwick rates will begin to vary. If this is the case, then this may represent an area of value that a GM could exploit.

 

Jan 30, 2013
 

For those familiar with my history, I have been a big proponent that there is more to the game of hockey than corsi and that players can certainly drive on-ice shooting percentage. I have not done much work at the team level, but now that I have team stats up at stats.hockeyanalysis.com I figured I’d take a look.

Since shooting percentages can vary significantly over small sample sizes, my goal was to use the largest sample size possible. As such, I used 5 years of team data (2007-08 through 2011-12) and looked at each team’s shooting and save percentages over that time. During those 5 years Vancouver led all teams in 5v5 ZS adjusted shooting percentage at 10.69% while Columbus trailed all teams with an 8.61% shooting percentage. What’s interesting to note is that the top 6 teams are Vancouver, Washington, Chicago, Philadelphia, Boston and Pittsburgh, all teams we would consider to have the best offensive talent in the league. Meanwhile, the bottom 5 teams are Columbus, Los Angeles, Phoenix, Carolina, and Minnesota, all teams (except maybe Carolina) more associated with defensive play and a defense-first system.

As far as save percentage goes, Phoenix led the league with a 91.83% save percentage while the NY Islanders trailed with an 89.04% save percentage. The top 5 teams were Phoenix, Boston, Anaheim, Nashville, and Montreal. The bottom 5 teams were NY Islanders, Tampa, Toronto, Chicago and Ottawa. No surprises there.

As far as sample size goes, teams on average had 7,627 shots for (or against) over the course of the 5 years, which gives us a reasonably large sample size to work with.

Now, in order to not use an extreme situation, I decided to compare the 5th best team to the 5th worst team in each category and then determine the probability that their deviations from each other are solely due to randomness.  This meant I was comparing Boston to Minnesota for shooting percentage and Montreal to Ottawa for save percentage.

[Chart: team shooting percentage comparison (Boston vs. Minnesota)]

As you can see, there isn’t a lot of overlap, meaning there isn’t a large probability that luck is the reason for the difference between these two teams’ 5 year shooting percentages. In fact, the intersecting area under the two curves amounts to just a 6.2% chance that the differences are luck driven. That’s pretty small, and the differences between the teams above Boston and those below Minnesota would be greater. I think we can be fairly certain that there are statistically significant differences between teams’ 5 year shooting percentages, and considering how much player movement and coaching change there is over the span of 5 years, that makes it that much more impressive. Single season differences could in theory be (and probably are) more significant.
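For readers who want to try this kind of overlap comparison themselves, here is a minimal Python sketch. It assumes each team’s percentage can be modelled as a normal curve with a binomial standard error, which may not be exactly how the chart was built. The two shooting percentages are hypothetical placeholders chosen to sit 1.27 percentage points apart (the gap between the 5th best and 5th worst teams), and 7,627 is the average shot total quoted above.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def binomial_se(p, n):
    """Standard error of a proportion p observed over n shots."""
    return math.sqrt(p * (1 - p) / n)

def overlap_area(mu1, sigma1, mu2, sigma2, steps=20000):
    """Area under min(pdf1, pdf2), integrated with the trapezoid rule."""
    lo = min(mu1 - 8 * sigma1, mu2 - 8 * sigma2)
    hi = max(mu1 + 8 * sigma1, mu2 + 8 * sigma2)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        y = min(normal_pdf(x, mu1, sigma1), normal_pdf(x, mu2, sigma2))
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the end points
        total += weight * y * dx
    return total

shots = 7627                     # average 5 year shot total from the post
p_high, p_low = 0.1000, 0.0873   # hypothetical values, 1.27 points apart

area = overlap_area(p_high, binomial_se(p_high, shots),
                    p_low, binomial_se(p_low, shots))
print(f"overlap: {area:.3f}")
```

With equal-width curves the overlap shrinks quickly as the gap between the means grows, which is why a gap of barely more than one percentage point can still leave little room for luck at this sample size.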

[Chart: team save percentage comparison (Montreal vs. Ottawa)]

The save percentage chart provides even stronger evidence that there are non-luck factors at play. The intersecting area under the curves equates to a 2.15% chance that the differences are due to luck alone. There is easily a statistically significant difference between Ottawa’s and Montreal’s 5 year save percentages. Long-term team save percentages are not luck driven!

So, the next question is, how much does it matter? Well, the average team takes approximately 1500 5v5 ZS adjusted shots each season. The difference in shooting percentage between the 5th best team and the 5th worst team is 1.27%, so that would equate to a difference of 19 goals per year during 5v5 ZS adjusted situations. The difference between the 5th best and 5th worst team in save percentage is 1.5%, which equates to a 22.5 goal difference. These are not insignificant goal totals and they are likely driven solely by the percentages.
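The arithmetic behind those goal totals is just shot volume times the percentage gap. A small Python sketch, where `goal_gap` is a name made up for illustration:

```python
def goal_gap(shots_per_season, pct_gap_points):
    """Goals per season implied by a percentage-point gap at a fixed shot volume."""
    return shots_per_season * pct_gap_points / 100.0

shots = 1500  # approximate 5v5 ZS adjusted shots per team per season, from the post

shooting_gap = goal_gap(shots, 1.27)  # 5th best vs 5th worst shooting percentage
save_gap = goal_gap(shots, 1.5)       # 5th best vs 5th worst save percentage

print(f"shooting: {shooting_gap:.2f} goals, saves: {save_gap:.2f} goals")
```

The same function can be reused with any other shot volume, for example a power play or a partial season, to see how the goal impact scales.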

Now, how does this equate to differences in shot rates? If we take the team with the 5th highest shot rate and apply a league average shooting percentage and then compare it to the team with the 5th lowest shot rate we would find a difference of 17.5 goals over the course of a single season. This is slightly lower than what we saw for shooting and save percentages.

What is interesting is that this (the percentages being more important than the shot rates) is not inconsistent with what we have seen at the individual level. In Tom Awad’s “What makes Good Players Good, Part I” post he identified 3 skills in which good players differ from bad players. He put the variation in +/- at 0.42 for finishing (shooting percentage), 0.08 for shot quality (shot location) and 0.30 for outshooting, which means outshooting accounts for just 37.5% of the overall difference (0.30 out of 0.80). I also showed that fenwick shooting percentage is more important than fenwick rates by a fairly significant margin.

Any player or team evaluation that doesn’t take into account the percentages or assumes the percentages are all luck driven is an evaluation that is not telling you the complete story.

 

Apr 26, 2012
 

While doing my earlier post on Luongo’s value I noticed that Luongo’s 5v5close zone start adjusted save percentage relative to the rest of the league is much more mediocre than his 5v5 save percentage. I decided to look into this further and realized that this is in large part due to zone start effects, and not score effects. This got me to look further into zone start effects on a goalie’s save percentage.

I previously wrote an article where I described a simple and straightforward method for adjusting for zone starts. Basically, you can fully account for zone start effects by ignoring the first 10 seconds after an offensive or defensive zone face off, so this is what I have been doing ever since. I hadn’t yet considered the effect on a goalie’s save percentage though, so here it is. In the table below you will find all goalies who played 3000 5v5 minutes over the previous 3 seasons. There are 46 such goalies.

Goalie                  5v5 Sv%  ZS Adj. Sv%  Diff.  10Sec Sv%  10Sec SA%
MICHAL NEUVIRTH         91.8%    90.5%        1.4%   96.0%      24.6%
JIMMY HOWARD            92.3%    90.6%        1.7%   98.2%      22.0%
ROBERTO LUONGO          93.0%    91.5%        1.5%   98.4%      21.6%
TIM THOMAS              93.1%    92.0%        1.1%   97.4%      21.0%
HENRIK LUNDQVIST        93.1%    92.0%        1.1%   97.4%      20.5%
TUUKKA RASK             93.0%    91.8%        1.2%   97.9%      20.3%
COREY CRAWFORD          92.2%    90.7%        1.4%   97.9%      20.1%
TOMAS VOKOUN            92.9%    91.7%        1.2%   97.7%      19.7%
EVGENI NABOKOV          92.6%    91.4%        1.2%   97.4%      19.5%
DWAYNE ROLOSON          91.4%    90.0%        1.4%   97.3%      19.1%
BRIAN BOUCHER           91.8%    90.4%        1.4%   97.9%      18.8%
SCOTT CLEMMENSEN        92.1%    90.7%        1.3%   97.9%      18.5%
JOSE THEODORE           92.6%    91.6%        1.0%   97.0%      18.4%
SERGEI BOBROVSKY        92.3%    91.0%        1.3%   97.8%      18.4%
SEMYON VARLAMOV         92.9%    91.9%        1.0%   97.4%      18.2%
JAMES REIMER            92.7%    91.7%        0.9%   96.9%      17.9%
RAY EMERY               91.8%    90.9%        0.9%   96.1%      17.8%
JOHAN HEDBERG           92.1%    91.1%        0.9%   96.5%      17.6%
MIIKKA KIPRUSOFF        92.5%    91.5%        1.0%   97.4%      17.5%
CRAIG ANDERSON          92.3%    91.2%        1.0%   97.2%      17.1%
JEAN-SEBASTIEN GIGUERE  91.7%    91.1%        0.6%   94.8%      17.0%
DEVAN DUBNYK            92.1%    91.0%        1.1%   97.4%      16.7%
PETER BUDAJ             92.1%    91.1%        0.9%   96.7%      16.6%
ANTTI NIEMI             92.7%    91.7%        1.0%   98.1%      16.1%
MARTY TURCO             91.8%    90.7%        1.1%   97.3%      16.0%
MARTIN BRODEUR          91.7%    90.5%        1.1%   97.7%      15.8%
JONATHAN QUICK          92.6%    91.9%        0.6%   96.1%      15.2%
BRIAN ELLIOTT           91.2%    90.3%        0.9%   96.2%      15.2%
CAREY PRICE             92.5%    91.9%        0.5%   95.6%      14.8%
JONAS GUSTAVSSON        90.9%    89.8%        1.1%   97.3%      14.7%
DAN ELLIS               91.3%    90.5%        0.8%   96.1%      14.3%
KARI LEHTONEN           92.6%    92.2%        0.5%   95.4%      14.2%
CAM WARD                92.6%    91.9%        0.7%   97.0%      13.5%
PEKKA RINNE             93.0%    92.5%        0.5%   96.1%      13.5%
CHRIS MASON             91.0%    90.7%        0.4%   93.3%      13.4%
MARC-ANDRE FLEURY       91.6%    90.9%        0.7%   96.4%      13.4%
ILYA BRYZGALOV          92.8%    92.3%        0.5%   96.2%      13.3%
NIKOLAI KHABIBULIN      91.0%    90.0%        1.0%   97.7%      13.3%
RYAN MILLER             92.7%    92.2%        0.5%   96.3%      13.2%
NIKLAS BACKSTROM        92.6%    92.2%        0.4%   95.2%      12.9%
ONDREJ PAVELEC          92.1%    91.5%        0.6%   96.0%      12.8%
STEVE MASON             91.2%    90.5%        0.8%   96.5%      12.6%
JONAS HILLER            92.6%    91.9%        0.7%   97.3%      12.5%
MIKE SMITH              92.4%    91.9%        0.6%   96.4%      12.4%
JAROSLAV HALAK          92.9%    92.4%        0.5%   96.8%      11.8%
MATHIEU GARON           91.0%    90.5%        0.5%   95.2%      11.4%
Average                 92.3%    91.4%        0.9%   96.9%      16.2%

Included in the table are 5v5 save percentage, 5v5 zone start adjusted save percentage, the difference between 5v5 save percentage and zone start adjusted save percentage, the goalie’s save percentage on shots within 10 seconds of an offensive/defensive zone face off, and the percentage of shots the goalie faced that were within 10 seconds of an offensive/defensive zone face off.
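To make the column definitions concrete, here is a minimal Python sketch of how the five numbers could be derived from a list of shots. The `Shot` record, its field names and the sample shots are all hypothetical, and treating a shot at exactly 10 seconds as inside the window is an assumption. Shots following a neutral zone face off would simply carry a large elapsed time.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    seconds_since_zone_faceoff: float  # time since the last off./def. zone face off
    is_goal: bool

def save_pct(shots):
    """Save percentage (0-100) for a list of shots, or None if the list is empty."""
    if not shots:
        return None
    saves = sum(1 for s in shots if not s.is_goal)
    return 100.0 * saves / len(shots)

def zone_start_split(shots, window=10.0):
    """Split shots into 'within window of a zone face off' and 'normal play'."""
    early = [s for s in shots if s.seconds_since_zone_faceoff <= window]
    normal = [s for s in shots if s.seconds_since_zone_faceoff > window]
    overall, adjusted = save_pct(shots), save_pct(normal)
    return {
        "sv_pct": overall,                 # 5v5 Sv%
        "zs_adj_sv_pct": adjusted,         # ZS Adj. Sv%
        "diff": None if overall is None or adjusted is None else overall - adjusted,
        "ten_sec_sv_pct": save_pct(early), # 10Sec Sv%
        "ten_sec_share": 100.0 * len(early) / len(shots) if shots else None,  # 10Sec SA%
    }

# Five made-up shots against one goalie.
sample = [Shot(3, False), Shot(5, False), Shot(12, True), Shot(40, False), Shot(90, False)]
print(zone_start_split(sample))
```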

As you can see, the average save percentage within 10 seconds of a face off is significantly higher than the average face off adjusted save percentage (96.9% vs 91.4%), and the variation across goalies in the percentage of shots faced within 10 seconds of a face off is very significant (average 16.2%, low of 11.4%, high of 24.6%). Furthermore, this percentage seems to be team driven (i.e. Rask/Thomas have quite similar/high percentages, S. Mason/Garon and Pavelec/C. Mason quite low). This can introduce a significant bias into a goalie’s save percentage. If we calculate an expected save percentage based on the number of ‘within 10 second’ and ‘during normal play’ shots and the average save percentages for those situations, the expected save percentage of Neuvirth would be 92.7% while the expected save percentage of Mathieu Garon would be 92.0%. Now that is just a 0.7% difference, which you may not think is huge, but the lowest 5v5 save percentage is 90.9% and the highest is 93.1%, for a range of 2.3%. That means a 0.7% variation due to within 10 seconds of a face off vs normal play could account for 30% of all variation in goaltender save percentage. I am too lazy to look into it, but I wonder if variation in the number of power play shots faced has as much impact on overall save percentage. I am certain that this is a far more significant factor than score effects.
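The expected save percentage calculation is a weighted average of the two situation averages. A minimal Python sketch, using the Average row of the table (96.9% within 10 seconds, 91.4% during normal play) as defaults; `expected_sv_pct` is a name made up for illustration:

```python
def expected_sv_pct(ten_sec_share, ten_sec_sv=96.9, normal_sv=91.4):
    """Expected save percentage for a goalie who faces `ten_sec_share` percent
    of his shots within 10 seconds of a zone face off, assuming he stops
    shots at the league-average rate in both situations."""
    share = ten_sec_share / 100.0
    return share * ten_sec_sv + (1.0 - share) * normal_sv

neuvirth = expected_sv_pct(24.6)  # highest 10Sec SA% in the table
garon = expected_sv_pct(11.4)     # lowest 10Sec SA% in the table

print(f"Neuvirth: {neuvirth:.2f}%  Garon: {garon:.2f}%  gap: {neuvirth - garon:.2f}")
```

With the rounded table values this gives roughly 92.75% and 92.03%, in line with the 92.7% and 92.0% quoted above, for a gap of about 0.7 percentage points from shot mix alone.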

Thus, I think it is extremely important to factor out shots faced immediately after a face off when evaluating goaltender performance (and any player’s performance for that matter). Furthermore, I fully stand by my previous Luongo post where I suggest he is a mere middle of the pack goalie.

 

Feb 05, 2012
 

One of my beefs in the analysis and evaluation of hockey players is the notion that PDO (on-ice shooting percentage plus on-ice save percentage) can be used as a proxy for luck.  A perfect example of how PDO is used as a proxy for luck is this article by Neil Greenberg about the Washington Capitals.

For example, when Alex Ovechkin has been on the ice during even strength this season, the team has a shooting percentage of 8.2 percent and has saved shots at a rate of .917. So that makes his PDO value 999 (.082+.917=.999), which is almost exactly the league average. In other words, Ovechkin has seen neither very good nor very bad “puck luck” this season.

What’s useful about this metric is that it’s “unstable,” and over a large-enough sample will regress to 1000. Why 1000? Because every shot that is a goal is a shot not saved, and vice versa.

My beef with such an analysis is the notion that PDO regresses to 1000 for all players, so that any player with a PDO above 1000 is lucky and any player with a PDO below 1000 is unlucky. While I do believe luck can influence PDO over small sample sizes, not all players have a natural PDO level of 1000, and there are two reasons why.

1.  Not all players play in front of perfectly average goalies which will have a major impact on the save percentage portion of PDO.

2. Players can drive shooting percentages.

To show you what I mean on point 2, I took 4 years (2007-08 to 2010-11) of 5v5 zone start adjusted data and grouped forwards based on their ice time over those 4 years and then calculated the on-ice shooting and save percentages and PDO for each group.  Here is what I found.

TOI (minutes)   SH%     SV%     PDO
<500            7.5%    90.9%   983.5
500-999         7.9%    91.2%   991.2
1000-1499       8.0%    91.2%   992.2
1500-1999       8.2%    91.2%   993.4
2000-2499       8.6%    91.1%   997.0
2500-2999       9.0%    91.2%   1001.9
3000-3499       9.3%    91.2%   1004.4
3500-4000       9.8%    90.8%   1006.1
4000+           10.4%   90.8%   1012.4

PDO varies from 983.5 up to 1012.4 depending on the group’s ice time. This is largely driven by shooting percentage, which varies from 7.5% to 10.4%, with the players with the lowest amount of ice time having the lowest on-ice shooting percentage and the players with the most ice time having the highest. Order is the enemy of luck, so seeing shooting percentages ordered this nicely tells me something other than luck is happening. Driving on-ice shooting percentage is a skill. This means more talented players can have a natural PDO (the PDO that they should regress to) above 1000 and less talented players can have a natural PDO below 1000. Factor in the goaltending and a player could have a natural PDO well above or well below 1000.
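As a rough sketch of the grouping described above, the Python below computes PDO from on-ice goals and shots and pools players into 500 minute ice time buckets. The field names (`toi`, `gf`, `sf`, `ga`, `sa`) and the sample players are made up, and pooling raw totals within each bucket (rather than averaging player PDOs) is an assumption about the method.

```python
from collections import defaultdict

def pdo(goals_for, shots_for, goals_against, shots_against):
    """On-ice shooting percentage plus on-ice save percentage, scaled to 1000."""
    sh_pct = goals_for / shots_for
    sv_pct = 1.0 - goals_against / shots_against
    return 1000.0 * (sh_pct + sv_pct)

def toi_bucket(minutes, width=500, cap=4000):
    """Label for the ice time group a player falls into."""
    if minutes >= cap:
        return f"{cap}+"
    low = int(minutes // width) * width
    if low == 0:
        return f"<{width}"
    return f"{low}-{low + width - 1}"

def pdo_by_bucket(players):
    """Pool on-ice goals and shots within each ice time bucket, then compute PDO."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # gf, sf, ga, sa
    for p in players:
        bucket_totals = totals[toi_bucket(p["toi"])]
        for i, key in enumerate(("gf", "sf", "ga", "sa")):
            bucket_totals[i] += p[key]
    return {bucket: pdo(*t) for bucket, t in totals.items()}

# Two made-up forwards in the lowest ice time group.
players = [
    {"toi": 300, "gf": 5, "sf": 100, "ga": 10, "sa": 100},
    {"toi": 450, "gf": 10, "sf": 100, "ga": 8, "sa": 100},
]
print(pdo_by_bucket(players))

# The Ovechkin example quoted earlier: 8.2% shooting, .917 save percentage.
print(round(pdo(82, 1000, 83, 1000)))
```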

Now, this is not to say that luck isn’t a factor in a player’s PDO, especially over small sample sizes; it’s just that we can’t estimate that luck by assuming every player’s natural “regress to” PDO is 1000. Daniel Sedin has a PDO of 1043 this season (through Thursday February 2nd). Is it fair to suggest he has been lucky and should see his PDO regress to 1000? When you consider that his 4-year PDO is 1035 (and his 3 year PDO is 1054), probably not. His natural “regress to” PDO is probably not that far off his current 1043 PDO. Now if you are talking about Todd Bertuzzi this season it’s a different story. Through Thursday he had a PDO of 1056 while his 4-year PDO is 994, and he hasn’t had a PDO above 1000 in any of the previous 3 seasons. It is probably fair to presume that Bertuzzi’s natural “regress to” PDO is much closer to 1000, maybe even below 1000, in which case it is fair to conclude that Bertuzzi has probably been quite lucky so far this season and is unlikely to continue at this pace for the remainder of the season.

When used properly PDO can be an indication of luck, but to do so we need to consider the context of a player’s PDO, not just assume all players’ PDOs will necessarily regress to 1000.

 

Mar 18, 2011
 

The guys over at Behind the Net have initiated a ‘prove shot quality exists’ competition and in response to that Rob Vollman took a quick and dirty look at shooting percentage suppression.  As I showed the other day, Rob’s logic was a little off.

Rob started off by identifying a number of players with high on ice save percentages over the past 3 seasons. Some of these guys were low minute players mostly playing on the fourth line against other fourth line caliber players, but there were a handful of players who played a relatively significant number of minutes and still put up good on ice save percentages. Let me remind you of a few names that Rob identified: forwards Marco Sturm, Manny Malhotra, Tyler Kennedy, Travis Moen, Taylor Pyatt, Michael Ryder, and defensemen Kent Huskins, Sean O’Donnell, Mike Weaver, Mark Stuart. I’ll get back to these guys later, but I’ll claim that Rob dismissed some of them prematurely by claiming they played against weak competition.

As you may or may not know I have developed offensive and defensive ratings for every player and these can be found at http://stats.hockeyanalysis.com/ Furthermore, I have created these using goals for/against as well as shots for/against, fenwick for/against, and corsi for/against.  For clarification, fenwick is shots + missed shots while Corsi is shots + missed shots + blocked shots.  For this study I decided to use fenwick instead of shots because I had the data handy and I was too lazy to get the shot data in the right format but there shouldn’t be a significant difference (the two are very highly correlated).


Jan 06, 2011
 

The score of a game influences how a team plays. When a team is trailing they play a more aggressive offensive game; when they are up a goal or more, they play a more defensive game. The question I answer today is, how does score influence a team’s save percentage?

To answer this question I looked at the past 3 seasons of 5v5 even strength save percentage data when the score is tied, when the team is up by a goal, when the team is up by 2 or more goals, when the team is down a goal and when the team is down by 2 or more goals.  For each team and score category I have a data point for 2007-08, 2008-09, 2009-10 as well as a three year average (2007-10).  For each score category I sorted from lowest to highest save percentage and then plotted them on one chart and got the following:

As you can see, save percentages are generally higher when the game is tied than when a team is leading or trailing, and they are at their worst when a team is trailing. This is probably not surprising, as a trailing team will open up its game in hopes of creating offense, which also puts it at risk defensively. Now, what that chart doesn’t tell us is whether all teams experience the same score effects or whether, for whatever reason, some teams actually have improved save percentages when trailing or leading. The following chart shows each team’s 3 year save percentage by score, ordered from lowest 5v5 game tied save percentage.

The majority of teams have the majority of their leading or trailing save percentages below their game tied save percentage, but there are a number of occasions where that doesn’t occur, and they are mostly related to up1 or up2+ save percentages. The only teams that had a down1 or down2+ save percentage above their game tied save percentage were:

  1. Dallas – Down1: 92.51% vs Tied: 91.74%
  2. Detroit – Down1: 93.05% vs Tied: 92.16%
  3. Pittsburgh – Down2+: 92.87% vs Tied: 92.78%
  4. Minnesota – Down2+: 93.21% vs Tied: 92.89%
  5. Florida: Down1: 93.92% vs Tied: 93.23%

On average, a team’s save percentage was 1.3% lower when down a goal than when the game was tied, and 1.9% lower when down 2+ goals. The average team save percentage at 5v5 tied is 92.7% vs 91.4% down a goal, 90.8% down 2+ goals, 92.2% up a goal and 92.1% up 2+ goals. Trailing can have a sizable negative impact on save percentage whereas leading can have a minor negative impact.

So what does this mean? It means we need to be careful when evaluating goalies (and probably shooters to some extent) based on overall save percentage (which includes special teams effects) or even 5v5 even strength save percentage, because the game situations a goalie has been exposed to will influence his save percentage. A goalie on a weak team will have his save percentage lowered simply because his team is going to be trailing more often and be forced to take chances to create offense, and thus he will be exposed to tougher shots, whereas a goalie on a good team that leads more often than it trails will not face as many tough shots.

One interesting thing I noticed while doing all this was the Toronto Maple Leafs’ performance when up by a single goal over the last 3 seasons. While they were middle of the pack at 5v5 game tied (16th in 3 year 5v5 game tied save percentage), they were downright horrific when they got up a goal. They just couldn’t hold a lead. The three worst single season save percentages when up a goal were the 2009-10 Leafs, 2008-09 Leafs, and the 2007-08 Leafs, so they were three for three there. Over the course of the past 3 seasons the Leafs posted an 88.4% save percentage when up a goal, which was 3.44 standard deviations from the mean. Next worst were the Ottawa Senators, who were well ahead of them at 90.8%, a mere 1.23 standard deviations from the mean. The good news for Leaf fans is that their 5v5 up a goal save percentage is much better this year: 95.6% (better than any team in any of the last 3 seasons), with 97.2% for Gustavsson and 93.9% for Giguere, so they are much better at maintaining the lead. Unfortunately, this season they can’t score well enough to get a lead to protect.