Dec 08 2014
 

I have tackled the subject of on-ice shooting percentage a number of times here, but I think it is a subject that has been under-researched in hockey analytics. Historically, people have done some split-half comparisons, found weak correlations, and written it off as not being a significant or useful factor in hockey analytics. While some of that research has merit, a lot of it deals with sample sizes too small to produce any really useful correlations. Split-half season correlations across the majority of players include players that might have 3 goals in the first half and 7 in the second half, and that is just not enough to draw any conclusions from. Even year-over-year correlations have their issues: in addition to smallish sample sizes, they suffer from problems related to roster changes and how roster changes impact on-ice shooting percentages. Ideally we’d want to eliminate all these factors and get down to actual on-ice shooting percentage talent, factoring out both luck/randomness and roster changes.

Today @MimicoHero posted an article discussing shooting percentage (and save percentage) using multi-year vs multi-year comparisons. It’s a good article so have a read; I have written many articles like it in the past. This is important research, but as I alluded to above, year-over-year comparisons suffer from issues related to roster changes, which potentially limit what we can actually learn from the data. People often use even/odd games to eliminate these roster issues, and that is a pretty good methodology. Once in the past I took this idea to the extreme and used even/odd seconds to try to isolate true talent from other factors (note that after writing that article I found a bug in my code that may have impacted the results, so I don’t have 100% confidence in them). That approach pretty much ensures that the teammates a player plays with, the opponents he plays against and the situations he plays in will be almost identical in both halves of the data. I hope to revisit the even/odd second work in a future post to confirm and extend that research, but for this post I am going to take another approach: I will focus solely on shooting percentage and use an even/odd shot methodology, which should also do a pretty good job of removing roster change effects.

I took all 5v5 shot data from 2007-08 through 2013-14 and, for each forward, took the first 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800 and 2000 shots for while he was on the ice. This allowed me to do 100 vs 100 shot, 200 vs 200 shot, … 1000 vs 1000 shot comparisons. For comparison’s sake, in addition to even/odd shots I am also going to look at first half vs second half comparisons to get an idea of how different the correlations are (i.e. what impact roster changes have on a player’s on-ice shooting percentage). Here are the resulting correlation coefficients.

Scenario SplitHalf EvenVsOdd NPlayers
100v100 0.186 0.159 723
200v200 0.229 0.268 590
300v300 0.296 0.330 502
400v400 0.368 0.375 443
500v500 0.379 0.440 399
600v600 0.431 0.481 350
700v700 0.421 0.463 319
800v800 0.451 0.486 285
900v900 0.440 0.454 261
1000v1000 0.415 0.498 222

And here is the table in graphical form.

[Figure: Even vs odd and first half vs second half on-ice shooting percentage correlations]
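The even/odd and first half/second half splits described above can be sketched in Python. This is a minimal illustration, not the code used for this post: it assumes each forward’s on-ice 5v5 shots for are available as a chronological list of 1s (goal) and 0s (no goal), and the function names and the toy data are hypothetical.

```python
from math import sqrt


def sh_pct(shots):
    """On-ice shooting percentage for a list of shot outcomes (1 = goal, 0 = no goal)."""
    return sum(shots) / len(shots)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def split_correlations(players, n):
    """Compare n vs n shots per player with first/second half and even/odd splits.

    players: hypothetical dict of player -> chronological list of on-ice shot outcomes.
    Players with fewer than 2*n shots are skipped; the rest use only their first 2*n shots.
    Returns (split_half_r, even_odd_r, number_of_players).
    """
    first_half, second_half, even, odd = [], [], [], []
    for shots in players.values():
        if len(shots) < 2 * n:
            continue
        sample = shots[: 2 * n]
        first_half.append(sh_pct(sample[:n]))
        second_half.append(sh_pct(sample[n:]))
        even.append(sh_pct(sample[0::2]))  # 1st, 3rd, 5th ... shots (the label is arbitrary)
        odd.append(sh_pct(sample[1::2]))   # 2nd, 4th, 6th ... shots
    return pearson(first_half, second_half), pearson(even, odd), len(even)


# Toy data only, to show the mechanics
players = {
    "A": [1, 1, 0, 0],
    "B": [0, 0, 0, 0],
    "C": [1, 1, 1, 1],
    "D": [1, 0],  # too few shots for n = 2, so excluded
}
split_r, even_odd_r, n_players = split_correlations(players, 2)
print(split_r, even_odd_r, n_players)  # about 0.5, about 1.0, 3
```

Splitting by alternating shots keeps teammates, opponents and game situations nearly identical in both halves, which is the point of the even/odd approach; the first/second half split does not.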

Let’s start with the good news. As expected, even vs odd correlations are better than first half vs second half correlations, though the difference isn’t as large as I might have expected. This is especially true with the larger sample sizes, where the gap should theoretically get larger.

What I did find a bit troubling is that correlations seem to max out at around 600 shots vs 600 shots, and even those correlations aren’t all that great (0.45-0.50). In theory, as sample size increases one should get better and better correlations, approaching 1.00 as the sample size approaches infinity. Instead, they seem to level off around 0.5, which had me questioning my data.

After some thought, though, I realized the problem was likely due to the decreasing number of players within the larger shot total groups. This restricts the spread in talent, as only the top-level players remain in those larger groups. As you increase the shot requirements you start weeding out the lesser players, who get less ice time and are on the ice for fewer shots. So, while randomness decreases with an increased number of shots, so does the spread in talent. My theory is that the signal (talent) to noise (randomness) ratio is not actually improving enough to produce better correlations.

To test this theory I looked at the standard deviations within each even/odd group. Since each group has a definitive N value (100, 200, 300, etc.) and I can calculate the average shooting percentage, it is possible to estimate the standard deviation due to randomness. With the overall standard deviation and an estimated standard deviation due to randomness, it is possible to calculate the standard deviation in on-ice shooting percentage talent. Here are the results of that math.

Scenario SD(EvenSh%) SD(OddSh%) SD(Randomness) SD(Talent)
100v100 2.98% 2.84% 2.67% 1.15%
200v200 2.22% 2.08% 1.91% 1.00%
300v300 1.99% 1.87% 1.56% 1.14%
400v400 1.71% 1.70% 1.35% 1.04%
500v500 1.56% 1.57% 1.21% 1.00%
600v600 1.50% 1.50% 1.11% 1.01%
700v700 1.35% 1.39% 1.03% 0.90%
800v800 1.35% 1.33% 0.96% 0.93%
900v900 1.24% 1.26% 0.91% 0.86%
1000v1000 1.14% 1.23% 0.86% 0.81%

And again, here is the table in graphical form.

[Figure: Estimating on-ice shooting percentage talent]
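The SD(Randomness) and SD(Talent) columns can be approximated with the sketch below. It assumes, as the description suggests but does not spell out, that SD(Randomness) is the binomial standard deviation sqrt(p(1 - p) / N) at the group’s average shooting percentage p, and that SD(Talent) comes from subtracting variances: SD(Observed)^2 = SD(Talent)^2 + SD(Randomness)^2. Using the average of the even and odd SDs as the observed SD reproduces the table to within rounding. The 8% shooting percentage in the example is a hypothetical round number, not a figure from this post.

```python
from math import sqrt


def sd_randomness(avg_sh_pct, n_shots):
    """Binomial SD of an observed Sh% over n_shots if every player had the same true rate."""
    return sqrt(avg_sh_pct * (1 - avg_sh_pct) / n_shots)


def sd_talent(sd_observed, sd_random):
    """Spread in true talent, assuming observed variance = talent variance + random variance."""
    return sqrt(max(sd_observed ** 2 - sd_random ** 2, 0.0))


# Hypothetical 8% average on-ice Sh% over 100 shots: close to the 2.67% in the 100v100 row
print(round(sd_randomness(0.08, 100), 4))  # 0.0271

# 100v100 row: average the even and odd SDs, then remove the randomness component
sd_obs = (0.0298 + 0.0284) / 2
print(round(sd_talent(sd_obs, 0.0267), 4))  # 0.0116, in line with the 1.15% in the table
```

The max(..., 0.0) guard only matters when random variation alone exceeds the observed spread, in which case the estimated talent spread is zero.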

The grey line is the randomness standard deviation, and it behaves as expected, decreasing smoothly. It is a significant driver of the even and odd standard deviations, but the talent standard deviation slowly falls off as well. If we call SD(Talent) the signal and SD(Randomness) the noise, then we can plot a signal to noise ratio calculated as SD(Talent) / SD(Randomness).

[Figure: Signal to noise ratio, SD(Talent) / SD(Randomness)]

What is interesting is that the signal to noise ratio improves significantly up to 600v600, then it sort of levels off. This is pretty much in line with what we saw earlier in the first table and chart. After 600v600 we start dropping out the majority of the fourth liners, who don’t get enough ice time to be on the ice for 1400+ shots at 5v5. Later we start dropping out the third liners too. The result is that the signal to noise ratio flattens out.
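For reference, the signal to noise ratio can be recomputed directly from the SD(Talent) and SD(Randomness) columns of the table above. The sketch only performs that division on values copied from the table (in percent); it adds no new data.

```python
# (SD(Talent), SD(Randomness)) in percent, copied from the table above
table = {
    "100v100": (1.15, 2.67),
    "200v200": (1.00, 1.91),
    "300v300": (1.14, 1.56),
    "400v400": (1.04, 1.35),
    "500v500": (1.00, 1.21),
    "600v600": (1.01, 1.11),
    "700v700": (0.90, 1.03),
    "800v800": (0.93, 0.96),
    "900v900": (0.86, 0.91),
    "1000v1000": (0.81, 0.86),
}

# Signal to noise ratio = SD(Talent) / SD(Randomness)
signal_to_noise = {scenario: talent / noise for scenario, (talent, noise) in table.items()}

for scenario, ratio in signal_to_noise.items():
    print(f"{scenario:>10}: {ratio:.2f}")
```

The ratio climbs from about 0.43 at 100v100 to about 0.91 at 600v600 and then stays between roughly 0.87 and 0.97, which is the leveling off described above.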

With that said, there is probably enough information in the above charts to determine a reasonable spread in on-ice shooting percentage talent, and the yellow SD(Talent) line gives us a pretty good indication of it. Based on this analysis, a reasonable estimate for one standard deviation in on-ice shooting percentage talent in a typical NHL season is probably around 1.0%, or maybe slightly above.

What does that mean in real terms (i.e. goal production)? Well, the average NHL forward is on the ice for ~400 5v5 shots per season. Thus, a player with an average amount of ice time whose on-ice shooting percentage is one standard deviation above average (I’ll use 1.0% as the standard deviation to be conservative) would be on the ice for 4 extra goals (400 × 1.0%) due solely to his on-ice shooting percentage. Conversely, an average ice time player with an on-ice shooting percentage one standard deviation below average would be on the ice for about 4 fewer goals.

Now, of course, if you are an elite player getting big minutes the benefit is far greater. Take Sidney Crosby, for example. Over the past 7 seasons his on-ice shooting percentage is about 3.33 standard deviations above average, and last year he was on the ice for just over 700 shots. That equates to an extra 23 goals due to his extremely good on-ice shooting percentage. That’s pretty impressive if you think about it.

Now compare that to Scott Gomez, whose 7-year on-ice shooting percentage is about 1.6 standard deviations below average. In 2010-11 he was on the ice for 667 shots for. That year his lagging on-ice shooting percentage talent cost an estimated 10.6 goals. Imagine: Crosby vs Gomez is a 33+ goal swing in just 5v5 offensive output.
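The goal estimates in the last few paragraphs are simple multiplication: shots for × number of standard deviations from average × the size of one standard deviation (the conservative 1.0% used above). A minimal sketch with a hypothetical helper name:

```python
def extra_goals(shots_for, sds_from_average, sd_talent=0.010):
    """Goals above (+) or below (-) what a league-average on-ice Sh% would produce."""
    return shots_for * sds_from_average * sd_talent


average_forward = extra_goals(400, 1.0)  # about 4 goals
crosby = extra_goals(700, 3.33)          # about 23.3 goals
gomez = extra_goals(667, -1.6)           # about -10.7 goals
print(f"{average_forward:.1f} {crosby:.1f} {gomez:.1f} swing={crosby - gomez:.1f}")
```

The results agree with the figures above to within rounding of the inputs.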

(Yes, I am taking some liberties in those last few paragraphs with assumptions relating to luck/randomness, quality of teammates and whatnot, so not all of the good or bad can necessarily be attributed to a single player, or to the extent described, but I think it drives home the point: a single player can have a significant impact through on-ice shooting percentage talent alone.)

In conclusion, even after you factor out luck and randomness, on-ice shooting percentage can play a significant role in goal production at the player level and, as I have been saying for years, must be taken into consideration in player evaluation. If you aren’t considering that a particular player might be particularly good or particularly bad at driving on-ice shooting percentage, you may not be getting the full story.

(On a related note, there was an interesting article on Hockey Prospectus yesterday looking at how passing affects shooting percentage, which supports some earlier findings that good passers are often good at boosting teammates’ on-ice shooting percentage. I have also shown that shots on the rush result in a higher shooting percentage, so to the extent that players are good at generating rush shots they should be good at boosting their on-ice shooting percentages.)

 

Sep 06 2013
 

I had first intended this to be a comment on Tyler Dellow’s investigation into Phaneuf and Grabovski’s shot totals for and against when they were on the ice together, but once I started pulling numbers I decided it was important enough to have a post of its own rather than get hidden in the comments somewhere. Go read Tyler’s post because it is a worthwhile read, but in short he found that when Grabovski and Phaneuf were on the ice together the Leafs were incredibly poor at getting shifts with shots and good at having shifts where they gave up shots, and that it had very little to do with getting fewer multiple-shot shifts or giving up more of them. As Tyler put it:

This is helpful to know because it narrows the issue: the Leafs’ Corsi% last year with Grabovski/Phaneuf on the ice didn’t collapse because of a change in the rate at which multi-SAF and multi-SAA shifts occurred; it collapsed because the Leafs suddenly became extraordinarily poor at generating the first SAF and preventing the first SAA. If you’re blaming Korbinian Holzer or Mike Kostka or Jay McClement for this, you need to come up with a convincing explanation as to why their impact was felt in terms of the likelihood of the first shot attempt occurring, but not really on subsequent ones.

A lot of people blame Holzer or Kostka or McClement, but I will present another (at least partial) explanation: Phaneuf and Grabovski’s numbers tanked because the Leafs were winning. Let me explain.

Here is a table of Phaneuf’s CF% over the last 4 seasons in various 5v5 situations: Tied, Leading, Trailing and Total. Note that part of the 2009-10 season was with Calgary.

Season Tied Leading Trailing 5v5
2009-10 53.4% 44.3% 58.2% 52.3%
2010-11 46.5% 38.6% 54.7% 47.1%
2011-12 47.7% 44.3% 56.4% 49.9%
2012-13 39.6% 35.7% 55.4% 41.9%

In tied and overall situations Phaneuf’s numbers tanked quite significantly, particularly last season, but where it gets really interesting is in the leading and trailing stats. When leading, his CF% dropped off a bit to 35.7% last year, but he was at 38.6% in 2010-11 and only 44.3% in the other years, so it was pretty bad all around. Meanwhile, his trailing CF% has stayed at significantly higher levels from 2009-10 through 2012-13, with relatively little fluctuation compared to his leading and tied stats.

Now, let’s look at the percentage of ice time Phaneuf played in each situation.

Season Tied Leading Trailing
2009-10 41.2% 28.3% 30.5%
2010-11 31.9% 27.7% 40.4%
2011-12 33.5% 29.8% 36.6%
2012-13 32.9% 42.3% 24.8%

He played a much larger share of his ice time in tied situations in 2009-10, but that share stayed about the same over the following 3 years. The big difference lies in the percentage of ice time he played while leading and trailing: last year he played far more while leading and far less while trailing. When you combine this with the previous table, it isn’t a surprise that his Corsi numbers tanked. If we took last year’s (2012-13) situational CF% and applied it to his 2011-12 ice time percentages, he’d have ended up with a CF% of 44.2%, which is a fair bit higher than his actual 2012-13 CF% of 41.9%. This means about 29% (2.3 of the 8.0 CF% points) of his drop-off in CF% from 2011-12 to 2012-13 can be attributed to ice time changes alone. That’s not an insignificant amount.
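The ice time adjustment above can be sketched as a weighted average: each situation’s CF% from one season, weighted by the share of ice time in that situation from another season. This is a minimal illustration of the arithmetic as described, assuming a simple time-weighted average (weighting by shot attempts would be more precise); the function name is hypothetical. Phaneuf’s numbers come from the two tables above.

```python
def reweighted_cf(cf_by_state, toi_share_by_state):
    """Weighted average of situational CF% using (possibly another season's) ice time shares."""
    total = sum(toi_share_by_state.values())  # shares may not sum to exactly 100 after rounding
    return sum(cf_by_state[s] * toi_share_by_state[s] for s in cf_by_state) / total


# Phaneuf: 2012-13 situational CF% applied to 2011-12 ice time shares
cf_2012_13 = {"tied": 39.6, "leading": 35.7, "trailing": 55.4}
toi_2011_12 = {"tied": 33.5, "leading": 29.8, "trailing": 36.6}

counterfactual = reweighted_cf(cf_2012_13, toi_2011_12)  # about 44.2
actual_2011_12, actual_2012_13 = 49.9, 41.9
share_from_ice_time = (counterfactual - actual_2012_13) / (actual_2011_12 - actual_2012_13)
print(f"{counterfactual:.1f}% CF, {share_from_ice_time:.0%} of the drop from ice time")
```

The same function works for Grabovski or for the team as a whole; for example, the Leafs’ 2012-13 situational CF% with their 2011-12 ice time shares comes out at about 45.6%.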

As for the rest, I believe Randy Carlyle’s more defensive style of hockey compared to Ron Wilson’s is a significant factor. When leading, teams play a more defensive game, and as we see above (and as you’ll see with other players if you look), CF% tanks when leading compared to when trailing and playing offensive hockey. How much of Phaneuf’s drop-off in CF% in 5v5 tied situations last year is due to his being asked to play a far more defensive role? Probably a significant portion of it.

When we take everything into consideration, the majority of Phaneuf’s drop-off in CF% last year can probably be attributed to leading vs trailing ice time differences and to his being asked to play a far more defensive role in tied situations. Probably only a very small portion of it can be attributed to playing with Holzer and Kostka, or to any change in quality of competition or zone starts (which I still claim have very little direct impact on stats, though they can be a proxy for style of play, defensive vs offensive).

Now, let’s take a quick look at Grabovski’s stats.

Season Tied Leading Trailing 5v5
2009-10 58.0% 55.8% 56.1% 56.8%
2010-11 52.2% 49.8% 58.0% 53.6%
2011-12 52.8% 46.9% 59.2% 53.7%
2012-13 44.0% 38.2% 55.7% 44.3%

Much the same as Phaneuf. His 5v5 tied CF% dropped off significantly, but his trailing CF% stayed at a fairly good level. His leading CF% has dropped off steadily since 2009-10, probably as he has been given more defensive responsibility.

Season Tied Leading Trailing
2009-10 38.6% 20.3% 41.0%
2010-11 33.3% 28.9% 37.8%
2011-12 33.5% 26.8% 39.7%
2012-13 32.2% 42.7% 25.1%

Nothing too different from Phaneuf; if anything, the changes in leading vs trailing ice time are more extreme. For Grabovski, 29.8% of his drop-off in CF% last year can be attributed to changes in leading/trailing ice time, while I suspect a significant portion of the rest can be attributed to Randy Carlyle’s more defensive game, and to asking Grabovski to play a more defensive role in particular.

Now, how do the Leafs as a team look?

Season Tied Leading Trailing 5v5
2009-10 52.1% 48.0% 56.1% 52.8%
2010-11 46.1% 41.6% 54.0% 47.8%
2011-12 47.9% 42.1% 55.6% 48.9%
2012-13 43.8% 39.5% 52.2% 44.1%

The Leafs’ drop-off in CF% is pretty even across the board. They lost 4.1 percentage points when tied, 2.6 when leading and 3.4 when trailing. Interestingly, that led to a 4.8-point drop overall, which seems odd since the overall drop is larger than any of the situational drops, until you look at their leading/trailing ice times.

Season Tied Leading Trailing
2009-10 37.2% 22.0% 40.9%
2010-11 33.6% 28.9% 37.5%
2011-12 33.7% 29.8% 36.5%
2012-13 33.1% 42.0% 25.0%

Tied ice time remained about the same in 2012-13 as in 2011-12, but leading ice time jumped from 29.8% to 42.0% while trailing ice time dropped from 36.5% to 25.0%. So, if we take the Leafs’ 2012-13 leading/trailing/tied CF% and apply it to their 2011-12 ice time percentages, they would have only dropped from 48.9% to 45.6%. The remainder of the fall to 44.1% is due to changes in leading/trailing/tied ice times, or 30.8% of the drop-off.

So, to summarize, about 30% of the drop-off in the Leafs’ team and individual CF% from the 2011-12 season to last season can be directly attributed to changes in the Leafs’ leading/trailing/tied ice time percentages. This means 30% of the drop-off can be attributed to the Leafs being a far better team last year at getting leads and winning games. Or, if you believe that was largely due to lucky shooting, you can say 30% of the Leafs’ drop-off in CF% is due to good luck.

Although I haven’t explicitly proven it, I’ll contend that a significant portion of the remainder comes down to Randy Carlyle being a far more defensive coach than Ron Wilson was. Maybe another day I’ll test this theory by looking at someone like Phil Kessel, who was not given a heavy defensive role last year like Phaneuf and Grabovski were and thus may not have seen the same drop-off, particularly in tied situations (quick check: Kessel’s 5v5 tied CF% was 47.3% in 2011-12 and 42.3% last year, so he saw a significant drop-off too, but not as much as Phaneuf or Grabovski). It may also be interesting to look at how ice time changes impact shooting and save percentages, whether this partly explains the Leafs’ high shooting percentage last year, and what impact it may have had on their relatively decent save percentage compared to previous years.

As you can see, though, ice time changes can have a significant impact on a player’s statistics, and it is important to take that into consideration in player evaluation, as when I looked at Phaneuf’s leading/trailing stats a while back.

(All the stats in this post came from stats.hockeyanalysis.com so feel free to go there, pull the data and analyze whichever team or player you want in leading/trailing/tied situations)

Feb 27 2013
 

Over the last several days I have been playing around a fair bit with team data, analyzing various metrics for their usefulness in predicting future outcomes, and I have come across some interesting observations. Specifically, with more years of data, fenwick becomes significantly less important/valuable while goals and the percentages become more important/valuable. Let me explain.

Let’s first look at the year over year correlations in the various stats themselves.

Stat Y1vsY2 Y12vsY34 Y123vsY45
FF% 0.3334 0.2447 0.1937
FF60 0.2414 0.1635 0.0976
FA60 0.3714 0.2743 0.3224
GF% 0.1891 0.2494 0.3514
GF60 0.0409 0.1468 0.1854
GA60 0.1953 0.3669 0.4476
Sh% 0.0002 0.0117 0.0047
Sv% 0.1278 0.2954 0.3350
PDO 0.0551 0.0564 0.1127
RegPts 0.2664 0.3890 0.3744

The above table shows the r^2 between past events and future events. The Y1 vs Y2 column is the r^2 between subsequent years (i.e. 0708 vs 0809, 0809 vs 0910, 0910 vs 1011, 1011 vs 1112). The Y12 vs Y34 column is a 2 year vs 2 year r^2 (i.e. 07-09 vs 09-11 and 08-10 vs 10-12), and the Y123 vs Y45 column is the 3 year vs 2 year comparison (i.e. 07-10 vs 10-12). RegPts is points earned during regulation play (using a win-loss-tie point system).

As you can see, with increased sample size the fenwick stats’ ability to predict future fenwick stats diminishes, particularly for fenwick for (FF60) and fenwick % (FF%). All the other stats generally get better with increased sample size, except for shooting percentage, which has essentially no predictive power for future shooting percentage.

The increased predictive nature of the goal and percentage stats with increased sample size makes perfect sense as the increased sample size will decrease the random variability of these stats but I have no definitive explanation as to why the fenwick stats can’t maintain their predictive ability with increased sample sizes.
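The multi-year comparisons in the first table can be sketched as follows. This is an illustrative reconstruction, not the code behind this post: it assumes per-team season totals of events for and against (e.g. fenwick or goals), pools the counts within each window before computing the percentage, and pools all window pairs into a single r^2, following the pairings listed above. It only covers the percentage stats (FF%, GF%); rate stats such as FF60 would also need ice time. All names and the toy data are hypothetical.

```python
from math import sqrt


def pooled_pct(seasons):
    """Pooled 'for' percentage from a list of (for, against) season totals."""
    total_for = sum(f for f, _ in seasons)
    total_against = sum(a for _, a in seasons)
    return total_for / (total_for + total_against)


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)


def window_r2(team_seasons, window_pairs):
    """team_seasons: team -> chronological list of (for, against) totals, one per season.
    window_pairs: list of (past_slice, future_slice); every pair is pooled into one r^2."""
    past, future = [], []
    for past_slice, future_slice in window_pairs:
        for seasons in team_seasons.values():
            past.append(pooled_pct(seasons[past_slice]))
            future.append(pooled_pct(seasons[future_slice]))
    return r_squared(past, future)


# Pairings for five seasons (index 0 = 2007-08 ... index 4 = 2011-12)
Y1_VS_Y2 = [(slice(i, i + 1), slice(i + 1, i + 2)) for i in range(4)]
Y12_VS_Y34 = [(slice(0, 2), slice(2, 4)), (slice(1, 3), slice(3, 5))]
Y123_VS_Y45 = [(slice(0, 3), slice(3, 5))]

# Toy teams whose results never change, so past perfectly predicts future
toy = {
    "Team A": [(600, 400)] * 5,
    "Team B": [(500, 500)] * 5,
    "Team C": [(450, 550)] * 5,
}
print(window_r2(toy, Y123_VS_Y45))  # 1.0
```

Pooling counts (rather than averaging single-season percentages) weights each season by how many events it contains.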

Let’s take a look at how well each statistic correlates with regulation points using various sample sizes.

Stat 1-year 2-year 3-year 4-year 5-year
FF% 0.3030 0.4360 0.5383 0.5541 0.5461
GF% 0.7022 0.7919 0.8354 0.8525 0.8685
Sh% 0.0672 0.0662 0.0477 0.0435 0.0529
Sv% 0.2179 0.2482 0.2515 0.2958 0.3221
PDO 0.2956 0.2913 0.2948 0.3393 0.3937
GF60 0.2505 0.3411 0.3404 0.3302 0.3226
GA60 0.4575 0.5831 0.6418 0.6721 0.6794
FF60 0.1954 0.3058 0.3655 0.4026 0.3951
FA60 0.1788 0.2638 0.3531 0.3480 0.3357

Again, the values are r^2 with regulation points. Nothing too surprising there, except maybe that team shooting percentage is so poorly correlated with winning, given that at the individual level shooting percentage is clearly highly correlated with goal scoring. It seems apparent from the table above that team save percentage is a significant factor in winning (or, as my fellow Leaf fans can attest, lack of save percentage is a significant factor in losing).

The final table I want to look at shows how well a few of the stats predict future regulation time point totals.

Stat Y1vsY2 Y12vsY34 Y123vsY45
FF% 0.2500 0.2257 0.1622
GF% 0.2214 0.3187 0.3429
PDO 0.0256 0.0534 0.1212
RegPts 0.2664 0.3890 0.3744

The values are r^2 with future regulation point totals. Regardless of time frame used, past regulation time point totals are the best predictor of future regulation time point totals. Single season FF% is slightly better at predicting following season regulation point totals but with 2 or more years of data GF% becomes a significantly better predictor as the predictive ability of GF% improves and FF% declines. This makes sense as we earlier observed that increasing sample size improves GF% predictability of future GF% while FF% gets worse and that GF% is more highly correlated with regulation point totals than FF%.

One thing that is clear from the above tables is that defense has been more important to winning than offense. GF60 and Sh% trail their defensive counterparts (GA60 and Sv%) significantly, both in how well they correlate with winning and in how consistent they are from year to year. The fenwick rates are closer: FA60 is more consistent from year to year than FF60, although FF60 correlates slightly better with regulation points. Defense and goaltending win in the NHL.

What is interesting, though, is that this largely differs from what we see at the individual level. At the individual level there is much more variation in the offensive stats, indicating individual players have more control over the offensive side of the game. This might suggest that team philosophies drive the defensive side of the game (i.e. how defensive-minded the team is, the playing style, etc.), while the offensive side of the game is dominated more by the offensive skill level of the individual players. At the very least it is something worthy of further investigation.

The last takeaway from this analysis is the declining predictive value of fenwick/corsi with increased sample size. I am not quite sure what to make of this, and if anyone has any theories I’d be interested in hearing them. One theory I have is that fenwick rates are not part of the average GM’s player personnel decisions, so over time, as players come and go, a team’s fenwick rates will drift. If this is the case, then this may represent an area of value that a GM could exploit.

 

Jan 30 2013
 

For those familiar with my history, I have been a big proponent of the idea that there is more to the game of hockey than corsi and that players can certainly drive on-ice shooting percentage. I have not done much work at the team level, but now that I have team stats up at stats.hockeyanalysis.com I figured I’d take a look.

Since shooting percentages can vary significantly over small sample sizes, my goal was to use the largest sample size possible. As such, I used 5 years of team data (2007-08 through 2011-12) and looked at each team’s shooting and save percentages over that time. During those 5 years Vancouver led all teams in 5v5 ZS adjusted shooting percentage at 10.69%, while Columbus trailed all teams with an 8.61% shooting percentage. What’s interesting to note is that the top 6 teams are Vancouver, Washington, Chicago, Philadelphia, Boston and Pittsburgh, all teams we would consider to have the best offensive talent in the league. Meanwhile, the bottom 5 teams are Columbus, Los Angeles, Phoenix, Carolina and Minnesota, all teams (except maybe Carolina) more associated with defensive play and a defense-first system.

As far as save percentage goes, Phoenix led the league with a 91.83% save percentage while the NY Islanders trailed with an 89.04% save percentage. The top 5 teams were Phoenix, Boston, Anaheim, Nashville and Montreal. The bottom 5 teams were the NY Islanders, Tampa, Toronto, Chicago and Ottawa. No surprises there.

As far as sample size goes, teams on average had 7,627 shots for (or against) over the course of the 5 years, which gives us a reasonably large sample size to work with.

Now, in order to not use an extreme situation, I decided to compare the 5th best team to the 5th worst team in each category and then determine the probability that their deviations from each other are solely due to randomness.  This meant I was comparing Boston to Minnesota for shooting percentage and Montreal to Ottawa for save percentage.

[Figure: Team shooting percentage comparison, Boston vs Minnesota]

As you can see, there isn’t a lot of overlap, meaning there isn’t a large probability that luck is the reason for the difference between these two teams’ 5-year shooting percentages. In fact, the intersecting area under the two curves amounts to just a 6.2% chance that the difference is luck driven. That’s pretty small, and the differences between the teams above Boston and below Minnesota would be even greater. I think we can be fairly certain that there are statistically significant differences between teams’ 5-year shooting percentages, and considering how much player movement and coaching change there is over the span of 5 years, that makes it all the more impressive. Single-season differences could in theory be (and likely are) more significant.

[Figure: Team save percentage comparison, Montreal vs Ottawa]

The save percentage chart provides even stronger evidence that there are non-luck factors at play. The intersecting area under the curves equates to a 2.15% chance that the difference is due to luck alone. There is easily a statistically significant difference between Ottawa’s and Montreal’s 5-year save percentages. Long-term team save percentages are not luck driven!
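The intersecting area under two curves, as used in the two comparisons above, can be computed with Python’s standard library. This sketch assumes each team’s 5-year percentage is modeled as a normal approximation to a binomial, centered on the observed percentage with standard deviation sqrt(p(1 - p) / shots). The exact Boston and Minnesota percentages aren’t listed here, so the inputs are hypothetical values 1.27 points apart with the 7,627-shot average sample; the sketch illustrates the calculation rather than reproducing the 6.2% figure.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def pct_curve(pct, shots):
    """Normal approximation to the sampling distribution of an observed percentage."""
    return NormalDist(mu=pct, sigma=sqrt(pct * (1 - pct) / shots))


def overlap_area(pct_a, shots_a, pct_b, shots_b):
    """Area shared by the two curves (the overlapping coefficient, between 0 and 1)."""
    return pct_curve(pct_a, shots_a).overlap(pct_curve(pct_b, shots_b))


# Hypothetical shooting percentages 1.27 points apart, each over 7,627 shots
print(f"{overlap_area(0.0960, 7627, 0.0833, 7627):.1%}")
```

With these made-up inputs the overlap lands in the same low single-digit range as the figures above; identical inputs give an overlap of 1.0.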

So, the next question is, how much does it matter? Well, the average team takes approximately 1500 5v5 ZS adjusted shots each season. The difference in shooting percentage between the 5th best team and the 5th worst team is 1.27%, so that would equate to a difference of 19 goals per year (1500 × 1.27%) during 5v5 ZS adjusted situations. The difference between the 5th best and 5th worst team in save percentage is 1.5%, which equates to a 22.5 goal difference. These are not insignificant goal totals, and they are likely driven solely by the percentages.

Now, how does this equate to differences in shot rates? If we take the team with the 5th highest shot rate and apply a league average shooting percentage and then compare it to the team with the 5th lowest shot rate we would find a difference of 17.5 goals over the course of a single season. This is slightly lower than what we saw for shooting and save percentages.

What is interesting is that this (the percentages being more important than the shot rates) is consistent with what we have seen at the individual level. In Tom Awad’s “What makes Good Players Good, Part I” post he identified 3 skills in which good players differed from bad players. He attributed the variation in +/- as 0.42 to finishing (shooting percentage), 0.08 to shot quality (shot location) and 0.30 to outshooting, which means outshooting accounts for just 37.5% of the overall difference (0.30 / 0.80). I have also shown that fenwick shooting percentage is more important than fenwick rates by a fairly significant margin.

Any player or team evaluation that doesn’t take into account the percentages or assumes the percentages are all luck driven is an evaluation that is not telling you the complete story.

 

Jan 17 2013
 

Yesterday evening James Mirtle from the Globe and Mail posted an article titled The Curious case of Tim Connolly and the Leafs. It’s worth a read, so go read it, but the premise of the article is that the narrative around Tim Connolly in training camp is that he had a poor year last year and needs to perform better this year. That makes sense from most people’s point of view, but Connolly tries to present a different perspective.

Connolly can be prickly to deal with and wasn’t particularly interested in talking about last season, but when pressed, you could tell he felt he did more of value than the narrative – that he’s been an unmitigated bust in Toronto – would suggest.

Here was his answer when asked (maybe for the second or third time) about needing to “rebound” this season.

“Even strength, I think I had my second highest career points last year,” Connolly said. “I’d like to improve my play on the power play and maybe play a bigger role. Penalty killing, I think, my individual percentage was 89 per cent I read somewhere. I was able to lead the forwards in blocked shots.”

He makes two points in there. The first is that he had his second highest even strength point total last year, and the second is something about his individual percentage being 89 percent. Let’s deal with the first one first by looking at his even strength points since the first lockout.

Season Goals Assists Points
2011-12 11 20 31
2010-11 7 16 23
2009-10 9 27 36
2008-09 12 16 28
2007-08 3 20 23
2005-06 9 20 29

(Note: Connolly only played 2 games in 2006-07 so I have omitted it from the table and discussion)

Tim Connolly is actually correct. His best even strength point total came in 2009-10, when he had 36 points, followed by his 31 even strength points last year. But let’s take a look at those point totals relative to even strength ice time.

Season ESTOI Points TOI/Pt
2011-12 940:12 31 30:20
2010-11 840:31 23 36:33
2009-10 966:41 36 26:51
2008-09 631:26 28 22:33
2007-08 603:18 23 26:14
2005-06 708:47 29 24:26

The last column is time on ice per point, i.e. the average even strength ice time between points. Last year he was on the ice for an average of 30 minutes and 20 seconds between each of his even strength points. This was his second worst rate of the post-lockout seasons in the table. So, while Connolly was technically correct in saying that he had his second highest even strength point total last season, it was a somewhat misleading representation of his performance.
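The TOI/Pt column is even strength ice time divided by even strength points. The small sketch below reproduces it from the mm:ss values in the table, assuming the result is rounded to the nearest second (which matches every row above); the helper names are hypothetical.

```python
def to_seconds(mmss):
    """Convert 'minutes:seconds' (e.g. '940:12') to total seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def toi_per_point(toi, points):
    """Average ice time between points, formatted as m:ss (rounded to the nearest second)."""
    secs = round(to_seconds(toi) / points)
    return f"{secs // 60}:{secs % 60:02d}"


seasons = {
    "2011-12": ("940:12", 31),
    "2010-11": ("840:31", 23),
    "2009-10": ("966:41", 36),
    "2008-09": ("631:26", 28),
    "2007-08": ("603:18", 23),
    "2005-06": ("708:47", 29),
}
for season, (toi, pts) in seasons.items():
    print(season, toi_per_point(toi, pts))
```

A lower TOI/Pt means more frequent scoring, which is why a higher point total can still hide a weaker scoring rate when ice time also went up.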

Now for the individual PK percentage. It generated a bit of Twitter conversation last night about what it actually is.

One might think it is the penalty kill percentage when he was on the ice, but that seems like a strange thing to calculate. Is it goals per 2 minutes of PK time? Is it goals per PK he spent any amount of time killing? I really didn’t know, so I dug deeper into the numbers by looking at the Leafs’ PK stats on my stats site and noticed that Connolly had the best on-ice save percentage (listed as lowest opposition shooting percentage) of any Leaf last season during 4v5 play, and that the save percentage while he was on the ice was just shy of 89% (88.68%). It seems that what Connolly may have meant to say was that he had an on-ice PK save percentage of 89%.

How good is an 89% save percentage on the PK? Well, of the 100 forwards with at least 100 4v5 minutes of ice time last year, Connolly ranks 42nd, so league-wide it isn’t that impressive, but considering the Leafs’ weak goaltending it might actually be fairly good.

Here is the thing though. Single season PK save percentage is so fraught with sample size issues that it is next to useless as a stat for goalies let alone forwards.

One could evaluate Connolly based on PK goals against rate, in which he came up 3rd on the Leafs (trailing Lombardi and Kulemin), but that is still fraught with sample size issues. More fairly, we should probably evaluate Connolly’s PK contribution based on shots against rate, or even more fairly fenwick or corsi against rates. In each of those categories he ranked 5th among Leafs with at least 50 minutes of 4v5 ice time, with only Joey Crabb being worse. Furthermore, among the 110 players with at least 100 minutes of 4v5 PK ice time last year, Connolly ranked 99th in fenwick against rate.

I don’t mean for this to be a Connolly-bashing article. I actually do think Connolly was a little misused and would probably do better with a better-defined role rather than being bounced around the lineup so much, so in that sense I agree with the premise of what Connolly is saying. With that said, it probably is fair to say that he didn’t have a great season, and if he wants a regular role in the top six with time on the PP and PK he needs to perform better. His use of stats to try to show he had a good season is really just evidence of how statistics can be misused to support almost any narrative you want. As they say, there are lies, damn lies, and then there are statistics.

 

Nov 08, 2012

Eric T. over at NHL Numbers had a post last week summarizing the current state of our statistical knowledge with respect to accounting for zone start differences. If you haven’t read it, definitely go read it, not only because it is a good read but because it concludes that the way the majority of people have been doing it is wrong.

Overall, no two estimates are in direct agreement, but the analyses that are known to derive from looking directly at the outcomes immediately following a faceoff converge in the range of 0.25 to 0.4 Corsi shots per faceoff — one-third to one-half of the figure in widespread use. It is very likely that we have been overestimating the importance of faceoffs; they still represent a significant correction on shot differential, but perhaps not as large as has been previously assumed.

In the article Eric refers to my observation that eliminating the 10 seconds after a zone start effectively removes any effect that the zone start had on the game.  From there he combined my zone start adjusted data found at stats.hockeyanalysis.com with zone start data from behindthenet.ca and came up with an estimate that a zone start is worth 0.35 corsi.  He did this by subtracting the 10 second zone start adjusted corsi from standard 5v5 corsi and then running a regression against the extra offensive zone starts the player had.  In the comments I discussed some further analysis I did on this using my own data (i.e. not the stuff on behindthenet.ca) and came up with similar, though slightly different, numbers.  In any event I figured the content of that comment was worthy of its own post here.

So, when I did the correlation between extra offensive zone starts and the difference between 5v5 corsi and 5v5 10 second zone start adjusted corsi, I got the following (using all players with >1000 minutes of ice time over the last 5 seasons):

My calculations come up with a slope of 0.3043, which is a little below that of Eric’s calculations, but since I don’t know the exact methodology he used that might explain the difference (e.g. I am not sure whether Eric used the complete 5 years of data or individual seasons).
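The regression described above (the gap between 5v5 corsi and 10 second zone start adjusted corsi, regressed on extra offensive zone starts) boils down to an ordinary least squares slope plus r^2. Here is a minimal Python sketch; the data points are made up, and I am assuming that “extra offensive zone starts” means offensive minus defensive zone faceoffs and that corsi is measured as a differential (for minus against).

```python
def ols_slope_r2(x, y):
    """Least squares slope of y on x, and the r^2 of that fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, r2


# Hypothetical players: extra offensive zone starts (OZ minus DZ faceoffs) ...
extra_oz_starts = [-100, -50, 0, 50, 100]
# ... and 5v5 corsi differential minus 10 second adjusted corsi differential.
corsi_gap = [-28, -17, 1, 14, 31]

slope, r2 = ols_slope_r2(extra_oz_starts, corsi_gap)
print(f"slope = {slope:.3f} corsi per extra OZ start, r^2 = {r2:.3f}")
```

Getting separate slopes for forwards, defensemen and goalies is just a matter of filtering the player list before calling the function.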

What is interesting is that when I explored things further, I noticed that the results varied across positions, but varied very little across talent levels.  Here are some more correlations for different positions and ice time restrictions.

Group | Slope | r^2
--- | --- | ---
All players >1000 min. | 0.30 | 0.55
Skaters >1000 min. | 0.28 | 0.52
Forwards >1000 min. | 0.26 | 0.50
Defensemen >1000 min. | 0.33 | 0.57
Goalies >1000 min. | 0.44 | 0.73
Forwards >500 min. | 0.26 | 0.50
Forwards >2500 min. | 0.26 | 0.52
Forwards 500-2500 min. | 0.26 | 0.39

Two observations:

1.  The slope for forwards is less than the slope for defensemen which is (quite a bit) less than the slope for goalies.

2.  There is no variation in slope no matter what restrictions we put on a forwards ice time.

There isn’t really much to say regarding the second observation except that it is nice to see consistency but the first observation is quite interesting.  Goalies, who have no impact on corsi, see the greatest zone start influences on corsi of any position.  It is a little odd but I think it addresses one of the concerns that Eric had pointed out in his article:

The next step would be to remove the last vestige of sampling bias from our analysis. The approaches that focus on the period immediately after the faceoff reduce the impact of teams’ tendency to use their best forwards in the offensive zone, but certainly do not remove it altogether.

I think that is exactly what we are witnessing here, but maybe more importantly, teams put out their best defensive players and, in particular, their best faceoff guys for defensive zone faceoffs. If David Steckel, who is an excellent faceoff guy, is getting all the defensive zone faceoffs, it is naturally going to suppress the corsi events immediately after those faceoffs because he is going to win the draw more often than not. There is probably more line matching done for zone faceoffs than during regular play, and that line matching suppresses some of the zone start impact. It is more difficult to line match when changing lines on the fly, so a good coach can more easily get favourable matchups at a faceoff. The result is that during normal 5v5 play offensive players might see a boost to their corsi (because they can exploit good matchups), while during offensive zone faceoffs they see their corsi suppressed because they will almost always be facing good defensive players and top faceoff guys. Thus, the boost to corsi from a zone start is not as large as it should be for offensive players. The opposite is true for defensive players.

Defensemen are less often line matched, so we see their corsi boost from an offensive zone faceoff come in a little higher than that of forwards, but it isn’t nearly as high as that of goalies because some defensemen are primarily used in offensive situations and others primarily in defensive situations.

Goalies though, tell us the real effect because they are always on the ice and they are not subject to any line matching.  In the table above you will notice that goalies have a significantly higher slope and an impressively high r^2.  I feel I have to post the chart of the correlation because it really is a nice chart to look at.

I have looked at a lot of correlations and charts in hockey stats but very few of them are as nice with as high a correlation as the chart above.

I believe that this is telling us that an offensive zone start is worth 0.44 corsi, but only when a player is facing opponents who are just as capable defensively as those he faces during regular 5v5 play, which, as I speculate above, is not necessarily (or likely) the case. The 0.44 adjustment really only applies to an idealized situation that doesn’t normally occur for any players other than goalies. So where does that leave us? Should we use a zone start adjustment of 0.44 corsi for all players, or should we use something like 0.33 for defensemen and 0.26 for forwards? The answer isn’t so simple. One could argue that we should apply 0.44 to all players and then make some sort of QoC adjustment, and that would make some sense. But if we are not intending to apply a QoC adjustment, does that mean we should use 0.33 and 0.26? Maybe, but that is a little inconsistent because it would mean applying a QoC adjustment only to the zone start portion of a player’s stats, and not to all of his stats. The answer for me is what I have been doing the past little while: don’t even attempt to adjust a player’s stats for zone start differences, and instead simply ignore the portion of play that is subject to being influenced by zone starts, namely the 10 seconds after a zone start faceoff. To me it seems like the simplest and easiest thing to do.
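The “ignore the 10 seconds after a zone start faceoff” approach can be sketched as a simple event filter. This is a hypothetical illustration, not the code behind stats.hockeyanalysis.com: the `Event` structure is invented, neutral zone faceoffs are simply ignored, and a full implementation would also have to remove the matching ice time so that per-60 rates stay consistent.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_s: int     # seconds elapsed in the game
    kind: str       # "FO" (faceoff), "CF" (corsi for) or "CA" (corsi against)
    zone: str = ""  # for faceoffs: "O", "D" or "N" from the player's point of view


def drop_post_zone_faceoff(events, window_s=10):
    """Drop corsi events within window_s seconds (inclusive) after an O- or D-zone faceoff."""
    kept = []
    last_zone_faceoff = None
    for ev in sorted(events, key=lambda e: e.time_s):
        if ev.kind == "FO":
            if ev.zone in ("O", "D"):  # neutral zone faceoffs are ignored here
                last_zone_faceoff = ev.time_s
            continue
        if last_zone_faceoff is not None and ev.time_s - last_zone_faceoff <= window_s:
            continue
        kept.append(ev)
    return kept


def corsi_diff(events):
    """Corsi for minus corsi against."""
    return sum(1 for e in events if e.kind == "CF") - sum(1 for e in events if e.kind == "CA")


shift = [
    Event(0, "FO", "O"), Event(3, "CF"), Event(7, "CF"), Event(40, "CA"),
    Event(100, "FO", "D"), Event(104, "CA"), Event(130, "CF"),
]
print(corsi_diff(shift), corsi_diff(drop_post_zone_faceoff(shift)))
```

Here the raw differential is +1, driven by two attempts right after the offensive zone draw; once the 10 second windows are dropped it falls to 0.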

 

Oct 29, 2012

The other week I wrote about breaking down IPP (Individual Point Percentage, which is individual points divided by the number of goals scored while the player was on the ice) into IGP (Individual Goal Percentage) and IFAP (Individual First Assist Percentage). It seems IGP does a decent job of identifying the pure goal scorers and IFAP does a decent job of identifying the pure playmakers. I have always been interested in team/line makeup and how to maximize a line’s performance, so I decided to take a look at WOWY IPP comparisons for two pairs of extremely talented players who have at times played together and at times played on separate lines over the past 5 years. These are Crosby/Malkin and Thornton/Marleau. Let’s start with Crosby/Malkin.

Situation | TOI | IGP | IFAP | IPP | G/60 | FA/60 | GF20
--- | --- | --- | --- | --- | --- | --- | ---
Crosby without Malkin | 2527:07 | 35.7% | 36.3% | 84.7% | 1.33 | 1.35 | 1.24
Crosby with Malkin | 954:29 | 41.9% | 30.2% | 91.9% | 2.26 | 1.63 | 1.80
Malkin without Crosby | 3588:42 | 32.2% | 38.3% | 86.7% | 0.97 | 1.15 | 1.00
Malkin with Crosby | 954:29 | 27.9% | 30.2% | 75.6% | 1.51 | 1.63 | 1.80

These two players have played significantly more ice time apart than together, but the comparison is still interesting. When separated, Crosby’s IGP and IFAP are very close together, indicating he is relatively balanced between being a goal scorer and a playmaker, but when he is playing with Malkin he becomes a more prominent goal scorer: his IGP rises from 35.7% without Malkin to 41.9% with Malkin, and his IFAP falls from 36.3% without Malkin to 30.2% with Malkin. Crosby got a point on 84.7% of all goals scored while he was on the ice without Malkin, which is a very high number, but it rises to 91.9% when he is playing with Malkin, which is a truly extraordinary number.

Malkin, strangely, sees both his IGP and his IFAP fall when playing with Crosby which means a smaller percentage of the goal production goes through Malkin when Crosby is on the ice. This makes sense since Crosby is in on nearly every goal scored when the two are on the ice together.  Interestingly, despite being in on a lower percentage of goals, Malkin did see his individual G/60 and individual FA/60 rise dramatically when playing with Crosby due to the fact that when those two are on the ice together they score goals at an exceptionally high rate.
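The Malkin result follows from a simple identity: an individual scoring rate equals the player’s share of on-ice goals multiplied by the on-ice goals for rate. A quick Python check reproduces Malkin’s G/60 and FA/60 figures from the Crosby/Malkin table to two decimals, assuming (my reading of the numbers, not something the table states) that GF20 is on-ice goals for per 20 minutes.

```python
def individual_rate_per_60(share_of_on_ice_goals: float, on_ice_gf_per_20: float) -> float:
    """Individual per-60 rate = share of on-ice goals x on-ice goals for per 60 minutes."""
    return share_of_on_ice_goals * on_ice_gf_per_20 * 3.0  # 60 minutes = 3 x 20


# Malkin's IGP, IFAP and GF20 values from the Crosby/Malkin table.
for label, igp, ifap, gf20 in [("without Crosby", 0.322, 0.383, 1.00),
                               ("with Crosby", 0.279, 0.302, 1.80)]:
    g60 = individual_rate_per_60(igp, gf20)
    fa60 = individual_rate_per_60(ifap, gf20)
    print(f"Malkin {label}: G/60 {g60:.2f}, FA/60 {fa60:.2f}")
```

Malkin’s shares of the on-ice goals shrink, but the on-ice scoring rate rises by 80% (GF20 from 1.00 to 1.80), so his individual rates still climb.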

I am not sure what to conclude here other than that if you desperately need to score a goal late in the game it would be awfully smart to play these two together. But, with that said, it may not be the most prudent use of resources over the course of a game because it seems to somewhat diminish Malkin’s ability to drive the play. Now, let’s take a look at Thornton/Marleau.

Situation | TOI | IGP | IFAP | IPP | G/60 | FA/60 | GF20
--- | --- | --- | --- | --- | --- | --- | ---
Thornton without Marleau | 2585:10 | 24.6% | 35.2% | 79.6% | 0.75 | 1.07 | 1.01
Thornton with Marleau | 2438:22 | 19.3% | 37.8% | 74.8% | 0.64 | 1.25 | 1.11
Marleau without Thornton | 2808:03 | 32.3% | 24.2% | 69.7% | 0.74 | 0.56 | 0.77
Marleau with Thornton | 2438:22 | 37.8% | 13.3% | 73.3% | 1.25 | 0.44 | 1.11

This shows that Thornton and Marleau are very different players. Marleau is clearly much more of a goal scorer while Thornton is clearly much more of a playmaker, and this is true regardless of whether they are playing together or apart. When playing with Marleau, Thornton sees his goal production drop from 0.75 G/60 to 0.64 G/60 but his FA/60 rise from 1.07 to 1.25. For Marleau, G/60 rises significantly when playing with Thornton, but his FA/60 falls a bit and his IFAP falls to an astonishingly low 13.3%. In short, Marleau’s goal production benefits a lot from playing with Thornton, while Marleau’s benefit to Thornton is less significant. I believe that if we extended this analysis to Thornton’s other line mates we would find that Thornton’s playmaking skills are easily the most significant driving force of the Sharks offense.

Having done this IPP WOWY comparison for these two pairs of players we can make some interesting observations and we can get a better idea of which player is driving the play when they are playing together (and apart).  That said, I think more work needs to be done to determine whether IPP WOWY is a useful player evaluation tool in general, or just something that might be interesting to look at in certain situations.  I’m curious what others think, or if you have another pair of players you want me to look at let me know (for example, Spezza/Alfredsson might be interesting).

 

Oct 17, 2012

Scott Reynolds over at NHLNumbers.com has written a series of articles on individual point percentage (IPP).  Individual point percentage is defined as the number of points an individual has collected divided by the number of goals scored while the player was on the ice.  In other words, it is the percentage of goals scored while the player was on the ice that the player either had a goal or an assist on.  Scott’s articles are on individual point percentages for 2011-12, individual point percentages for the last 5 seasons and individual point percentages on the power play.  Definitely go give them a read, as well as the comments, where some interesting discussions ensued.
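The IPP definition, together with the IGP/IFAP split discussed earlier, is just a set of ratios over on-ice goals for. Here is a minimal Python sketch with a hypothetical stat line; whatever is left of IPP after subtracting IGP and IFAP is the share of on-ice goals on which the player had the second assist.

```python
def ipp_components(goals, first_assists, second_assists, on_ice_goals_for):
    """Return (IGP, IFAP, IPP) as fractions of goals scored while the player was on the ice."""
    igp = goals / on_ice_goals_for
    ifap = first_assists / on_ice_goals_for
    ipp = (goals + first_assists + second_assists) / on_ice_goals_for
    return igp, ifap, ipp


# Hypothetical forward: 15 goals, 12 first assists, 6 second assists, 40 on-ice goals for.
igp, ifap, ipp = ipp_components(15, 12, 6, 40)
second_assist_share = ipp - igp - ifap
print(f"IGP {igp:.1%}  IFAP {ifap:.1%}  IPP {ipp:.1%}  second assists {second_assist_share:.1%}")
```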

At first I was skeptical of the value of IPP because essentially it only tells you how important the player is to the team’s offense when the player is on the ice, and not really anything about the actual skill level of the player. A good player with really weak line mates can put up a pretty good IPP even if he isn’t a great offensive player. Or, a good third liner could have an IPP similar to that of a good first liner without being anywhere close to him in overall talent. But, upon further thought, I figured there would be some value in determining who is leading the offense and who might be deserving of a line promotion (i.e. might be too good for his current line mates) or a demotion (might be holding his line mates back). So, I decided to look into IPP a bit further. I have calculated IPP over the past 5 years for 5v5 zone start adjusted ice time and only considered forwards with >2500 minutes of ice time over those 5 seasons. The top 30 players in terms of IPP are the following.

Player | IPP
--- | ---
SIDNEY CROSBY | 87.2%
JAMIE BENN | 84.9%
MARIAN GABORIK | 83.6%
EVGENI MALKIN | 83.1%
DANIEL SEDIN | 82.0%
MIKE RIBEIRO | 81.9%
HENRIK SEDIN | 81.6%
ILYA KOVALCHUK | 81.3%
RICK NASH | 81.3%
ZACH PARISE | 81.0%
MARC SAVARD | 81.0%
NIKOLAI ZHERDEV | 80.7%
JORDIN TOOTOO | 80.6%
WOJTEK WOLSKI | 80.3%
ALES HEMSKY | 80.1%
JASON POMINVILLE | 79.8%
ALEX OVECHKIN | 79.8%
PATRIK ELIAS | 79.7%
SCOTTIE UPSHALL | 79.5%
ALEXANDER SEMIN | 79.5%
PETER MUELLER | 79.1%
DAVID KREJCI | 79.0%
LOUI ERIKSSON | 78.8%
CURTIS GLENCROSS | 78.7%
CLAUDE GIROUX | 78.7%
JAMAL MAYERS | 78.5%
KRISTIAN HUSELIUS | 78.5%
TRENT HUNTER | 78.3%
RAY WHITNEY | 78.2%
MIKKO KOIVU | 78.0%

The above table is fairly similar to the top players that Scott identified, so I won’t go into too much detail. Some players that Scott identified, such as Jordan Eberle, didn’t make my list because they didn’t meet my 2500 minute ice time restriction, and because I am using faceoff adjusted ice time (eliminating the 10 seconds after a zone faceoff) the numbers for others are slightly different. But more or less the lists are comparable.


Sep 03, 2012

A month and a half ago Eric T at NHLNumbers.com had a good post on quantifying the impact on teammate shooting percentage. I wanted to take a second look at how important the impact on teammate shooting percentage can be, because I disagreed somewhat with Eric’s conclusions.

For a very small number of elite playmakers, the ability to drive shooting percentage can be a major component of their value. For the vast majority of the league, driving possession is a more significant and more reproducible path to success.

It is my belief that it is important to consider impact on shooting percentage for more than a “very small number of elite playmakers” and I’ll attempt to show that now.

The method that Eric used to identify a player’s impact on shooting percentage is to compare that player’s line mates’ shooting percentages with him to their overall shooting percentages. As noted in the comments, the one flaw with this is that their overall shooting percentage is itself influenced by the player we are trying to evaluate, which will end up underestimating the impact. In the comments Eric re-did the analysis using a true “without you” shooting percentage, and the impact of driving teammate shooting percentages was greater than initially estimated, but he concluded that this didn’t change his conclusions significantly.
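To see why comparing against the overall shooting percentage understates the effect, here is a small Python sketch with hypothetical line mate totals. Because the overall figure includes the shots taken with the playmaker, it is pulled toward the “with you” number, which shrinks the measured boost relative to a true “without you” baseline.

```python
def sh_pct(goals: int, shots: int) -> float:
    """Shooting percentage as a fraction."""
    return goals / shots


# Hypothetical line mate totals, split by whether the playmaker was on the ice.
with_goals, with_shots = 30, 300        # 10.0% with the playmaker
without_goals, without_shots = 16, 200  # 8.0% without him

with_pct = sh_pct(with_goals, with_shots)
without_pct = sh_pct(without_goals, without_shots)
# The overall figure (9.2% here) also contains the "with" shots.
overall_pct = sh_pct(with_goals + without_goals, with_shots + without_shots)

boost_vs_overall = with_pct - overall_pct  # comparison against overall shooting percentage
boost_vs_without = with_pct - without_pct  # comparison against the true "without you" figure
print(f"boost vs overall: {boost_vs_overall:.1%}, boost vs without: {boost_vs_without:.1%}")
```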

Overall average for the top ten is a 1.2% boost (up from 0.9% in story) and 5 goals per year (up from 4.5). I don’t think this changes the conclusions appreciably.

In the minutes that a player is on the ice with one of the very best playmakers in the league, his shooting percentage will be about 1% better. For a player who gets ~150-200 shots per year and plays ~40-60% of his ice time with that top-tier playmaker, that’s less than a one-goal boost. It’s just not that big of a factor.

He also suggested that using the “without you” shooting percentage instead of the “overall shooting percentage” would probably result in “more accurate but less precise” analysis.  This is because a guy like Daniel Sedin would get very few shots when playing apart from Henrik Sedin because they rarely play apart and this small “apart” sample size might be subject to significant small sample size errors.


Jul 11, 2012

I have been wondering about the benefits of using 5v5 close data instead of 5v5 data when we do player analysis and player comparisons. The rationale for comparing players in 5v5 close situations is that we are comparing players under similar situations. When teams have a comfortable lead they go into a defensive shell, resulting in fewer shots for but with a higher shooting percentage, and more shots against but with a lower shooting percentage. The opposite, of course, is true when a team is trailing. But what I have been thinking about recently is whether there is a quality of competition impact during close situations. My suspicion is that teams that are really good will play more time with the score close against other good teams and less time with the score close against significantly weaker teams. Conversely, weak teams will play more minutes with the score close against other weak teams than against good teams.

My hypothesis is that players on good teams will have a tougher QoC during 5v5 close situations than during overall 5v5 situations and players on weak teams will have weaker QoC during 5v5 close situations than during overall 5v5 situations.  Let’s put that hypothesis to the test.

The first thing I did was to select one key player from each of the 30 teams to represent that team in the study. Mostly forwards were chosen, but a few defensemen were chosen as well. From there I looked at the average of their opponents’ goals for percentage (goals for / [goals for + goals against]) over the past 3 seasons in zone start adjusted 5v5 situations as well as zone start adjusted 5v5 close situations, and then compared the difference to the player’s team’s record over the past three seasons. The table below shows the results.

Player | Team | Opp. GF% 5v5 | Opp. GF% Close | Close – 5v5 | 3yr Pts | Avg. Pts
--- | --- | --- | --- | --- | --- | ---
Doan | Phoenix | 50.3% | 50.6% | 0.3% | 303 | 101.0
Chara | Boston | 50.7% | 50.9% | 0.2% | 296 | 98.7
Toews | Chicago | 50.4% | 50.6% | 0.2% | 310 | 103.3
Datsyuk | Detroit | 50.8% | 51.0% | 0.2% | 308 | 102.7
Weber | Nashville | 50.5% | 50.7% | 0.2% | 303 | 101.0
Backes | St. Louis | 50.8% | 51.0% | 0.2% | 286 | 95.3
E. Staal | Carolina | 50.4% | 50.5% | 0.1% | 253 | 84.3
Ribeiro | Dallas | 50.5% | 50.6% | 0.1% | 272 | 90.7
Gaborik | NY Rangers | 50.1% | 50.2% | 0.1% | 289 | 96.3
Malkin | Pittsburgh | 50.1% | 50.2% | 0.1% | 315 | 105.0
Ovechkin | Washington | 49.9% | 50.0% | 0.1% | 320 | 106.7
Enstrom | Winnipeg | 50.1% | 50.2% | 0.1% | 247 | 82.3
Weiss | Florida | 50.3% | 50.3% | 0.0% | 243 | 81.0
Plekanec | Montreal | 50.4% | 50.4% | 0.0% | 262 | 87.3
Tavares | NY Islanders | 50.3% | 50.3% | 0.0% | 231 | 77.0
Hartnell | Philadelphia | 50.1% | 50.1% | 0.0% | 297 | 99.0
J. Thornton | San Jose | 50.9% | 50.9% | 0.0% | 314 | 104.7
Kessel | Toronto | 50.1% | 50.1% | 0.0% | 239 | 79.7
H. Sedin | Vancouver | 50.0% | 50.0% | 0.0% | 331 | 110.3
Nash | Columbus | 50.9% | 50.8% | -0.1% | 225 | 75.0
J. Eberle | Edmonton | 50.6% | 50.5% | -0.1% | 198 | 66.0
Kopitar | Los Angeles | 50.6% | 50.5% | -0.1% | 294 | 98.0
M. Koivu | Minnesota | 50.7% | 50.6% | -0.1% | 251 | 83.7
Parise | New Jersey | 50.8% | 50.7% | -0.1% | 286 | 95.3
Getzlaf | Anaheim | 51.0% | 50.8% | -0.2% | 268 | 89.3
Roy | Buffalo | 50.3% | 50.1% | -0.2% | 285 | 95.0
Stastny | Colorado | 50.3% | 50.1% | -0.2% | 251 | 83.7
Spezza | Ottawa | 50.6% | 50.4% | -0.2% | 260 | 86.7
Stamkos | Tampa | 50.2% | 50.0% | -0.2% | 267 | 89.0
Iginla | Calgary | 50.5% | 50.2% | -0.3% | 274 | 91.3
Group average | | 50.4% | 50.5% | >0 | | 97.3
Group average | | 50.3% | 50.3% | =0 | | 91.3
Group average | | 50.6% | 50.4% | <0 | | 86.6

The list above is sorted by the difference between the opposition’s 5v5 close GF% and the opposition’s 5v5 GF%. The bottom three rows of the last column are what tell the story. These show the average point totals of the teams for players whose opposition 5v5 close GF% was greater than, equal to, and less than the opposition’s 5v5 GF%. As you can see, the greater than group had a team average of 97.3 points, the equal to group had a team average of 91.3 points and the less than group had a team average of 86.6 points. This means that good teams on average have tougher 5v5 close opponents than straight 5v5 opponents, and weak teams have tougher 5v5 opponents than 5v5 close opponents, which is exactly what we predicted. It is also not unexpected. Weak teams tend to play close games against similarly weak teams while strong teams play close games against similarly strong teams.
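The three summary rows can be reproduced directly from the per-player rows. Here is a short Python sketch using the values from the table; grouping on the difference rounded to one decimal is my assumption about how the “=0” group was defined.

```python
# (player, opposition GF% 5v5, opposition GF% 5v5 close, team points over 3 seasons)
ROWS = [
    ("Doan", 50.3, 50.6, 303), ("Chara", 50.7, 50.9, 296), ("Toews", 50.4, 50.6, 310),
    ("Datsyuk", 50.8, 51.0, 308), ("Weber", 50.5, 50.7, 303), ("Backes", 50.8, 51.0, 286),
    ("E. Staal", 50.4, 50.5, 253), ("Ribeiro", 50.5, 50.6, 272), ("Gaborik", 50.1, 50.2, 289),
    ("Malkin", 50.1, 50.2, 315), ("Ovechkin", 49.9, 50.0, 320), ("Enstrom", 50.1, 50.2, 247),
    ("Weiss", 50.3, 50.3, 243), ("Plekanec", 50.4, 50.4, 262), ("Tavares", 50.3, 50.3, 231),
    ("Hartnell", 50.1, 50.1, 297), ("J. Thornton", 50.9, 50.9, 314), ("Kessel", 50.1, 50.1, 239),
    ("H. Sedin", 50.0, 50.0, 331), ("Nash", 50.9, 50.8, 225), ("J. Eberle", 50.6, 50.5, 198),
    ("Kopitar", 50.6, 50.5, 294), ("M. Koivu", 50.7, 50.6, 251), ("Parise", 50.8, 50.7, 286),
    ("Getzlaf", 51.0, 50.8, 268), ("Roy", 50.3, 50.1, 285), ("Stastny", 50.3, 50.1, 251),
    ("Spezza", 50.6, 50.4, 260), ("Stamkos", 50.2, 50.0, 267), ("Iginla", 50.5, 50.2, 274),
]


def group_average_points(rows):
    """Average points per season for teams grouped by the sign of (close - 5v5) opposition GF%."""
    groups = {">0": [], "=0": [], "<0": []}
    for _player, gf_5v5, gf_close, pts_3yr in rows:
        diff = round(gf_close - gf_5v5, 1)  # compare at the table's one-decimal precision
        key = ">0" if diff > 0 else ("=0" if diff == 0 else "<0")
        groups[key].append(pts_3yr / 3)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


for key, avg in group_average_points(ROWS).items():
    print(key, round(avg, 1))
```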

Another important observation is how little deviation from 50% there is in each player’s opposition GF% metrics. The range for the above players is from 49.9% to 51.0%. That is an incredibly tight range and reconfirms to me how little impact QoC has on a player’s performance, especially when considering longer periods of time.

I also conducted the same study using fenwick for percentage as the QoC metric instead of goals for percentage, but the results were less conclusive. The >0 group had an average of 93.2 team points in the standings, the =0 group had 93.4 team points and the <0 group had 83.25 team points. Furthermore, there was even less variance in opposition FF% than in GF%, and only 12 teams had any difference between opposition 5v5 and opposition 5v5 close FF%. For me, this is further evidence that fenwick/corsi are not optimal measures of player value.

Finally, I looked at the difference in player performance between 5v5 close and 5v5 situations and found no trends among the different performance levels. For GF%, almost every player had his 5v5 close GF% within 4% of his 5v5 GF% (r^2 between the two was 0.7346), and for FF% every player but Parise had his 5v5 close FF% within 1.7% of his 5v5 FF% (r^2 = 0.945). Furthermore, there was no consistency as to which players saw an improvement (or decrease) in their 5v5 close GF% or FF%, so it seems it might be luck driven (particularly for GF%) or maybe due to coaching factors.

So what does this all mean? It means that in 5v5 close situations good teams have a bias towards tougher QoC than weak teams do. Is it a significant factor in player performance? No, because the QoC metrics vary very little across players or from situation to situation (from my perspective QoC can be ignored the majority of the time). Does it mean that we should be using 5v5 close in our player analysis? I am still not sure. I think the benefits of doing so are still probably quite small, if there are any at all, as 5v5 close performance metrics mirror 5v5 performance metrics quite well, and in the case of goal metrics the larger sample size of 5v5 data almost certainly outweighs any benefit of using 5v5 close data.