Sep 14 2013
 

A while back I came up with a stat that I called LT Index (short for Leading-Trailing Index). It is the percentage of his team’s ice time when leading that a player is on the ice for, divided by the percentage of his team’s ice time when trailing that he is on the ice for (5v5 situations only, and only in games in which the player played). I have decided to rename this statistic Usage Ratio since it gives us an indication of whether a player is used more in defensive situations (i.e. protecting a lead, and thus a Usage Ratio above 1.00) or in offensive situations (i.e. trailing and in need of a goal, and thus a Usage Ratio below 1.00). I think it does a pretty good job of identifying how a player is used.
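As a minimal sketch of the definition above, Usage Ratio can be computed from four ice time totals. The function name, argument names and the sample minutes are hypothetical and only illustrate the arithmetic; they are not taken from any real player.

```python
def usage_ratio(player_lead_toi, team_lead_toi, player_trail_toi, team_trail_toi):
    """Share of the team's 5v5 leading ice time the player was on for,
    divided by his share of the team's 5v5 trailing ice time.
    Team totals should only cover games in which the player played."""
    if team_lead_toi <= 0 or team_trail_toi <= 0 or player_trail_toi <= 0:
        raise ValueError("need positive ice time totals to form the ratio")
    lead_share = player_lead_toi / team_lead_toi
    trail_share = player_trail_toi / team_trail_toi
    return lead_share / trail_share


# Hypothetical minutes: on the ice for 30% of leading time and 20% of trailing time.
ratio = usage_ratio(300.0, 1000.0, 240.0, 1200.0)
role = "defensive usage" if ratio > 1.0 else "offensive usage"
print(f"Usage Ratio {ratio:.2f} ({role})")
```

A ratio above 1.00 reads as defensive usage and below 1.00 as offensive usage, matching the interpretation in the text.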

I then compared players’ Usage Ratio to their 5v5 tied statistics, on the theory that a player used in a defensive role when leading/trailing is more likely to be used in a defensive role when the game is tied. This is also an out of sample comparison (which is always a nice thing to be able to do) since we are using leading/trailing situations to identify offensive vs defensive players and then comparing to 5v5 tied situations, which in no way overlap the leading or trailing data.

Let’s start by looking at forwards using data over the last 3 seasons and including all forwards with >500 minutes of 5v5 tied ice time. The following charts compare Usage Ratio with 5v5 Tied CF%, CF60 and CA60.

UsageRatiovsCFPct

UsageRatiovsCF60

UsageRatiovsCA60

Usage Ratio is on the horizontal axis with more defensive players to the right and offensive players to the left.

Usage Ratio has some correlation with CF% but that correlation is solely due to its connection with generating shot attempts for, not with restricting shot attempts against. Players we identify as offensive players via Usage Ratio do in fact generate more shots, but players we identify as defensive players do not suppress opposition shots at all. In fact, Usage Ratio and 5v5 tied CA60 are about as uncorrelated as you can possibly get. One might argue this is because those defensive players are playing against offensive players (i.e. tough QoC), but if that were the case then those offensive players would be playing against defensive players (i.e. tough defensive QoC) and should see their shot attempts suppressed as well. We don’t observe that. It just seems that players used as defensive players are no better at suppressing shot attempts against than offensive players but are, as expected, worse at generating shot attempts for.

Before we move on to defensemen let’s take a look at how Usage Ratio compares with shooting percentage and GF60.

UsageRatiovsShPct

 

UsageRatiovsGF60

As seen with CF60, Usage Ratio is correlated with both shooting percentage and GF60, and the correlation with GF60 is stronger than with CF60. Note that the sample size for 3 seasons (or 2 1/2 actually) of 5v5 tied data is about the same as the sample size for one season of 5v5 data (players in this study have between 500 and 1300 5v5 tied minutes, which is roughly equivalent to how many 5v5 minutes forwards play over the course of one full season).

FYI, the dot up at the top with the GF60 above 5 is Sidney Crosby (yeah, he is in a league of his own offensively) and the dot to the far right (heavy defensive usage) is Adam Hall.

Now let’s take a look at defensemen.

UsageRatiovsCFPctDefensemen

UsageRatiovsCF60Defensemen

UsageRatiovsCA60Defensemen

There really isn’t much going on here, and how a defenseman is used really doesn’t tell us much at all about their 5v5 stats (only marginal correlation to CF60). As with forwards, defensemen that we identify as being used in a defensive manner are not any better at reducing shots against than defensemen we identify as being used in an offensive manner.

To summarize the above, players who get more minutes when playing catch up are in fact better offensive players, particularly when looking at forwards but players who get more minutes when protecting a lead are not necessarily any better defensively. We do know that there are better defensive players (the range of CA60 among forwards is similar to the range of CF60 so if there is offensive talent there is likely defensive talent too), and yet coaches aren’t playing these defensive players when protecting a lead. Coaches in general just don’t know who their good defensive players are.

Still not sold on this? Well, let’s compare 5v5 defensive zone start percentage (percentage of face offs taken in the defensive zone) to CF60 and CA60 (for forwards) in 5v5 tied situations.

DefensiveFOPctvsCF60

Percentage of face offs in the defensive zone is on the horizontal axis and CF60 is on the vertical axis. This chart is telling us that the fewer defensive zone face offs a forward gets, and thus likely more offensive face offs, the more shot attempts for they produce. In short, players who get offensive zone starts get more shot attempts.

DefensiveFOPctvsCA60

The opposite is not true though. Players who get more defensive face offs don’t give up any more or less shots than their low defensive zone face off counterparts. This tells me that if there is any connection between zone starts and CF% it is solely due to the fact that players who get offensive zone starts are better offensive players and not because players who get defensive zone starts are better defensive players.

You might again be saying to yourself, ‘the players who are getting the defensive zone starts are playing against better offensive players, so doesn’t it make sense that their CA60 is inflated above their talent levels (which presumably are better than average defensively)?’ This might be true, but if zone starts significantly impacted performance (as would be the case if that last statement were true), either directly or indirectly because zone starts are linked to QoC, then there should be more symmetry between the charts. There isn’t though. Let’s look at what these two charts tell us:

  1. The first chart tells us that players who get offensive zone starts generate more shot attempts.
  2. The second chart tells us that players who get defensive zone starts don’t give up more shot attempts against.

If zone starts were a major factor in results, those two statements don’t jibe. How can one side of the ledger show an advantage and the other side of the ledger be neutral? The way those statements can work in conjunction with each other is if zone starts don’t significantly impact results, which is what I believe (and have observed before).

But if zone starts do not significantly impact results, then the results we see in the two charts above are driven by the players’ talent levels. Knowing that, we once again can observe that coaches are doing a decent job of identifying offensive players to start in the offensive zone but are doing a poor job of identifying defensive players to play in the defensive zone.

All of this is to say, NHL coaches generally do a poor job of identifying their best defensive players, so if you think that guy who is getting all those defensive zone starts (aka ‘tough minutes’) is more likely to be a defensive wizard, think again. He may not be.

 

Aug 02 2013
 

In his book Hockey Abstract, Rob Vollman talks about persistence and its importance in determining whether a particular statistic has value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for being able to call something a talent or a skill and how close it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let’s look at persistence first. The easiest way to measure persistence is to look at the correlation of a statistic over some chunk of time vs some future chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year over year correlation)? For some statistics such as shooting percentages it may even be necessary to go with larger sample sizes, such as 3 year shooting percentage vs future 3 year shooting percentage.

One mistake that many people make when doing this is to conclude that a lack of correlation, and thus a lack of persistence, means that the statistic is not a repeatable skill and thus, essentially, random. The thing is, the method we use to measure persistence can be a major factor in how well we can measure persistence and how well we can measure true randomness. Let’s take two methods for measuring persistence:

  1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
  2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.

Both methods split the data roughly in half so we are doing a half the data vs half the data comparison and I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so we get large sample sizes for shooting percentage calculations. Here are the correlations we get.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
|---|---|---|---|
| GF20 vs GF20 | 0.61 | 0.89 | 0.28 |
| FF20 vs FF20 | 0.62 | 0.97 | 0.35 |
| FSh% vs FSh% | 0.51 | 0.73 | 0.22 |

GF20 is Goals for per 20 minutes of ice time. FF20 is fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick Shooting Percentage or goals/fenwick.
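The three definitions above are simple rate calculations. Here is a small sketch of them; the function names and the sample totals are hypothetical and chosen only to show the arithmetic.

```python
def per_20(count, toi_minutes):
    """Events per 20 minutes of ice time (used for GF20 and FF20)."""
    return 20.0 * count / toi_minutes


def fenwick_sh_pct(goals_for, fenwick_for):
    """Fenwick shooting percentage: goals divided by fenwick events, in percent."""
    return 100.0 * goals_for / fenwick_for


# Hypothetical on-ice totals: 60 goals for and 900 fenwick events for in 1500 minutes.
gf20 = per_20(60, 1500.0)
ff20 = per_20(900, 1500.0)
fsh = fenwick_sh_pct(60, 900)
print(f"GF20={gf20:.3f} FF20={ff20:.2f} FSh%={fsh:.2f}")
```

Note that GF20 is just FF20 multiplied by FSh% (as a fraction), which is why the later comparisons look at how each of the two pieces relates to goal scoring.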

We can see that the level of persistence we identify is much greater when looking at even vs odd correlations than when looking at 3 year vs 3 year correlations. A different test of persistence gives us significantly different results. The reason for this is that a lot more factors come into play in the 3 year vs 3 year correlations than in the even vs odd correlations. In the even vs odd correlations, factors such as quality of team mates, quality of competition, zone starts, coaching tactics, etc. are non-factors because they should be almost exactly the same in the even seconds as in the odd seconds. This is not true for the 3 year vs 3 year correlation. The difference between the two methods is roughly the amount of the correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.
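The even vs odd test can be sketched as follows. This is only an illustration under assumptions: each player is represented by the list of on-ice seconds at which a goal for was scored plus his total on-ice seconds, the three players are made-up toy data, and the helper names are hypothetical. The actual data pipeline behind the numbers above is not shown in this post.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side has no variance")
    return cov / math.sqrt(var_x * var_y)


def even_odd_gf20(goal_seconds, toi_seconds):
    """Split on-ice seconds 0..toi_seconds-1 into even and odd bins and
    return (GF20 in even seconds, GF20 in odd seconds). 20 minutes = 1200 s."""
    even_goals = sum(1 for s in goal_seconds if s % 2 == 0)
    odd_goals = len(goal_seconds) - even_goals
    even_secs = (toi_seconds + 1) // 2  # seconds 0, 2, 4, ...
    odd_secs = toi_seconds // 2         # seconds 1, 3, 5, ...
    return 1200.0 * even_goals / even_secs, 1200.0 * odd_goals / odd_secs


# Toy data (hypothetical): goal timestamps and total on-ice seconds per player.
players = {
    "A": ([10, 501, 900, 1333], 72000),
    "B": ([4, 8, 15, 16, 23, 42], 60000),
    "C": ([7], 48000),
}
halves = [even_odd_gf20(goals, toi) for goals, toi in players.values()]
r = pearson([h[0] for h in halves], [h[1] for h in halves])
print(f"even vs odd correlation: {r:.2f}; randomness estimate: {1.0 - r:.2f}")
```

Because both bins come from the same shifts, team mates, opponents and zone starts are nearly identical across the two halves, which is the whole point of the split.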

Now, let’s look at how well they correlate with a positive outcome: scoring goals. But instead of just looking at that, let’s combine it with persistence by looking at how well they predict ‘other half’ goal scoring.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs GF20 | 0.54 | 0.86 | 0.33 |
| GF20 vs FF20 | 0.44 | 0.86 | 0.42 |
| FSh% vs GF20 | 0.48 | 0.76 | 0.28 |
| GF20 vs FSh% | 0.57 | 0.77 | 0.20 |

As you can see, both FF20 and FSh% are very highly correlated with GF20, but this is far more evident when looking at even vs odd than when looking at 3 year vs 3 year correlations. FF20 is more predictive of ‘other half’ GF20 than FSh% is, though not significantly so, and this is likely solely due to the greater randomness of FSh% (due to sample size constraints), since within the same half FSh% is more correlated with GF20 than FF20 is: the correlation between even FF20 and even GF20 is 0.75 while the correlation between even FSh% and even GF20 is 0.90.

What is also interesting to note is that even vs odd provides greater benefit for identifying FF20 value and persistence than for FSh%. What this tells us is that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think what this means is that GMs are valuing shooting percentage players more than fenwick players and thus are more likely to maintain a core of shooting percentage players on their team while letting fenwick players walk. Eric T. found that teams reward players for high shooting percentage more than high corsi so this is likely the reason we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

| Comparison | 0710 vs 1013 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs FSh% | 0.38 | 0.66 | 0.28 |
| FSh% vs FF20 | 0.22 | 0.63 | 0.42 |

It is interesting to note that fenwick rates are highly correlated with shooting percentages especially when looking at the even vs odd data. What this tells us is that the skills that a player needs to generate a lot of scoring chances are a similar set of skills required to generate high quality scoring chances. Skills like good passing, puck control, quickness can lead to better puck possession and thus more shots but those same skills can also result in scoring at a higher rate on those chances. We know that this isn’t true for all players (see Scott Gomez) but generally speaking players that are good at controlling the puck are good at putting the puck in the net too.

Finally, let’s look at one more set of correlations. When looking at the above correlations for players with >1000 minutes in each ‘half’ of the data, there are a lot of players that have significantly more than 1000 minutes and thus their ‘stats’ are more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally less than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So, I took all the players that had more than 1000 even and odd minutes over the course of the past 6 seasons but only those that had fewer than 2600 minutes in total. In essence, I took all the players that have between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above, and the results should tell us approximately how reliable (predictive) one season’s worth of data is for a front line forward, assuming they played in exactly the same situation the following season.

| Comparison | Even vs Odd |
|---|---|
| GF20 vs GF20 | 0.82 |
| FF20 vs FF20 | 0.93 |
| FSh% vs FSh% | 0.63 |
| FF20 vs GF20 | 0.74 |
| GF20 vs FF20 | 0.77 |
| FSh% vs GF20 | 0.65 |
| GF20 vs FSh% | 0.66 |
| FF20 vs FSh% | 0.45 |
| FSh% vs FF20 | 0.40 |

It should be noted that because of the way in which I selected the players to be included in this calculation (limited ice time over the past 6 seasons) there is an abundance of 3rd liners, with a few players that reached retirement (e.g. Sundin) and young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take the first 2600 minutes of each player and do even/odd on that, but I am too lazy to try and calculate that data so the above is the best we have. There is far less diversity in the list of players used than in the NHL in general, so it is likely that for any particular player with between 1000 and 1300 minutes of ice time the correlations are stronger.

So, what does the above tell us? Once you factor out year over year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all pretty highly persistent with just one year’s worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution and adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics or better analytical techniques or some combination of the two, only time will tell, but in theory at least there should be a lot more reliable information within a single year’s worth of data than we are currently able to make use of.

 

Jun 18 2013
 

If you have been following the discussion between Eric T and me you will know that there has been a rigorous discussion/debate over where hockey analytics is at, where it is going, and the benefits of applying “regression to the mean” to shooting percentages when evaluating players. For those who haven’t and want to read the whole debate you can start here, then read this, followed by this and then this.

The original reason for my first post on the subject is that I rejected Eric T’s notion that we should “steer” people researching hockey analytics towards “modern hockey thought”, in essence because I don’t think we should ever be closed minded, especially when hockey analytics is pretty new and there is still a lot to learn. This then spread into a discussion of the benefits of regressing shooting percentages to the mean, which Eric T supported wholeheartedly, while I suggested that further research into isolating individual talent, even goal talent, through adjusting for QoT, QoC, usage, score effects, coaching styles, etc. can be equally beneficial and that the focus need not be on regressing to the mean.

In Eric T’s last post on the subject he finally got around to actually implementing a regression methodology (though he didn’t post any player specifics so we can’t see where it is still failing miserably) in which he utilized time on ice to choose a mean to which a player’s shooting percentage should regress. This is certainly better than regressing to the league-wide mean, which he initially proposed, but the benefits are still somewhat modest. The results for players who played 1000 minutes in the 3 years of 2007-10 and 1000 minutes in the 3 years from 2010-13 showed the power of his regressed GF20 to predict future GF20 was 0.66, which was 0.05 higher than the 0.61 predictive power of raw GF20. So essentially his regression algorithm improved predictive power by 0.05 while there still remains 0.34 which is unexplained. The question I attempt to answer today is, for a player who has played 1000 minutes of ice time, how much of his observed stats is true randomness and how much is simply unaccounted for skill/situational variance.

When we look at 2007-10 GF20 and compare it to 2010-13 GF20 there are a lot of factors that can explain the differences: a change in quality of competition, a change in quality of team mates, a change in coaching style, natural career progression of the player, zone start usage, and possibly any number of other factors that we do not currently know about, as well as true randomness. To overcome all of these non-random factors that we do not yet know how to fully adjust for, and so get a true measure of the random component of a player’s stats, we need two sets of data that have attributes (QoT, QoC, usage, etc.) as similar to each other as possible. The way I did this was to take each of the 6870 games that have been played over the past 6 seasons, split them into even and odd games, and calculate each player’s GF20 over each of those segments. This should, more or less, split a player’s 6 years evenly in half such that all those other factors are more or less equivalent across halves. The following table shows how predictive the even half is of the odd half based on how many total minutes (across both halves) the player has played.
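A rough sketch of the even/odd game split for a single player follows. It assumes each game carries a sequential game number and that the player’s on-ice goals for and 5v5 minutes are known per game; the record layout, function name and the four sample games are hypothetical.

```python
def even_odd_game_gf20(games):
    """games: iterable of (game_number, goals_for, toi_minutes) for one player.
    Returns (GF20 over even-numbered games, GF20 over odd-numbered games)."""
    totals = {0: [0, 0.0], 1: [0, 0.0]}  # parity -> [goals, minutes]
    for game_number, goals_for, toi_minutes in games:
        bucket = totals[game_number % 2]
        bucket[0] += goals_for
        bucket[1] += toi_minutes

    def rate(goals, minutes):
        # GF20 is undefined for a half with no ice time.
        return 20.0 * goals / minutes if minutes > 0 else float("nan")

    return rate(*totals[0]), rate(*totals[1])


# Hypothetical four-game sample for one player.
sample = [(1, 1, 15.0), (2, 0, 15.0), (3, 2, 15.0), (4, 1, 15.0)]
even_gf20, odd_gf20 = even_odd_game_gf20(sample)
print(f"even games GF20={even_gf20:.3f}, odd games GF20={odd_gf20:.3f}")
```

Correlating the even and odd values across all players above a minutes cutoff gives one row of the table that follows.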

| Total Minutes | GF20 vs GF20 |
|---|---|
| >500 | 0.79 |
| >1000 | 0.85 |
| >1500 | 0.88 |
| >2000 | 0.89 |
| >2500 | 0.88 |
| >3000 | 0.88 |
| >4000 | 0.89 |
| >5000 | 0.89 |

For the group of players with more than 500 minutes of ice time (~250 minutes or more in each odd/even half) the upper bound on true randomness is 0.21 while the predictive power of GF20 is 0.79. With greater than 1000 minutes randomness drops to 0.15, and with 1500 minutes and above the randomness is around 0.11-0.12. It’s interesting that setting the minimum above 1500 minutes (~750 in each even/odd half) doesn’t necessarily reduce the true randomness in GF20, which seems a little counterintuitive.

Let’s take a look at the predictive power of fenwick shooting percentage in even games to predict fenwick shooting percentage in odd games.

| Total Minutes | FSh% vs FSh% |
|---|---|
| >500 | 0.54 |
| >1000 | 0.64 |
| >1500 | 0.71 |
| >2000 | 0.73 |
| >2500 | 0.72 |
| >3000 | 0.73 |
| >4000 | 0.72 |
| >5000 | 0.72 |

Like GF20, the true randomness of fenwick shooting percentage seems to bottom out at 1500 minutes of ice time and there appears to be no benefit to increasing the minimum minutes played beyond that.

To summarize what we have learned we have the following which is for forwards with >1000 minutes in each of 2007-10 and 2010-13.

| Component | Value |
|---|---|
| GF20 predictive power, 3yr vs 3yr | 0.61 |
| True randomness estimate | 0.11 |
| Unaccounted for factors estimate | 0.28 |
| Eric T’s regression benefit | 0.05 |

There is no denying that a regression algorithm can provide modest improvements, but it is only addressing the roughly 30% of what GF20 is failing to predict that is true randomness (0.11 of the 0.39 that raw GF20 leaves unpredicted), and it is highly doubtful that efforts to improve the regression algorithm any more will result in anything more than marginal benefits. The real benefit will come from researching the other 70% we don’t know about. It is a much more difficult question to answer but the benefit could be far more significant than any regression technique.
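The split quoted above is simple arithmetic on the correlations already reported. A short sketch, treating the correlations as additive shares the way this post does (that additive treatment is this post’s own accounting, not a formal variance decomposition; the variable names are mine):

```python
three_year_r = 0.61   # raw GF20, 2007-10 vs 2010-13
even_odd_r = 0.89     # raw GF20, even vs odd seconds

unexplained = 1.0 - three_year_r            # what 3yr vs 3yr GF20 fails to predict
true_randomness = 1.0 - even_odd_r          # upper bound on pure randomness
other_factors = even_odd_r - three_year_r   # QoT, QoC, usage, coaching, etc.

randomness_share = true_randomness / unexplained
other_share = other_factors / unexplained

print(f"unexplained={unexplained:.2f} randomness={true_randomness:.2f} other={other_factors:.2f}")
print(f"randomness share={randomness_share:.0%}, other factors share={other_share:.0%}")
```

The two shares come out near 28% and 72%, which is where the rounded 30%/70% figures come from.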

Addendum: After doing the above I thought, why not take this all the way and instead of doing even and odd games do even and odd seconds so what happens one second goes in one bin and what happens the following second goes in the other bin. This should absolutely eliminate any differences in QoC, QoT, zone starts, score effects, etc. As you might expect, not a lot has changed but the predictive power of GF20 increases marginally, particularly when dealing with lower minute cutoffs.

| Total Minutes | GF20 vs GF20 | FSh% vs FSh% |
|---|---|---|
| >500 | 0.81 | 0.58 |
| >1000 | 0.86 | 0.68 |
| >1500 | 0.88 | 0.71 |
| >2000 | 0.89 | 0.73 |
| >2500 | 0.89 | 0.73 |
| >3000 | 0.90 | 0.75 |
| >4000 | 0.90 | 0.73 |
| >5000 | 0.89 | 0.71 |

 

May 15 2013
 

After last week’s untimely pinch by Dion Phaneuf in game 4 that led to an overtime goal and the Bruins taking a 3-1 lead in the first round series, there was a lot of evaluation of Phaneuf as a defenseman, both good and bad. I was intending to write an article to discuss the relative merits of Dion Phaneuf and attempt to get an idea of where he stands among NHL defensemen, but in the process of researching that I came across some interesting Phaneuf stats that I think deserve their own post, so here it is.

My observation was with respect to Phaneuf’s usage and performance when the Leafs are leading and when they are trailing over the previous 3 seasons. Let’s start off by looking at Phaneuf’s situational statistics over the past 3 seasons.

| Stat | 5v5 | 5v5close | 5v5tied | Leading | Trailing |
|---|---|---|---|---|---|
| G/60 | 0.222 | 0.175 | 0.101 | 0.156 | 0.408 |
| Pts/60 | 0.700 | 0.670 | 0.660 | 0.420 | 1.020 |
| IPP | 30.1% | 31.1% | 34.2% | 20.0% | 34.5% |
| GF20 | 0.773 | 0.721 | 0.640 | 0.692 | 0.986 |
| GA20 | 0.841 | 0.760 | 0.943 | 0.865 | 0.714 |
| GF% | 47.9% | 48.7% | 40.4% | 44.4% | 58.0% |
| CF20 | 18.316 | 18.113 | 18.159 | 15.195 | 21.542 |
| CA20 | 20.686 | 21.418 | 21.880 | 22.982 | 17.223 |
| CF% | 47.0% | 45.8% | 45.4% | 39.8% | 55.6% |
| OZ% | 28.0% | 26.7% | 25.2% | 24.2% | 34.5% |
| DZ% | 31.8% | 30.3% | 29.7% | 37.5% | 28.5% |
| NZ% | 40.3% | 43.0% | 45.0% | 38.3% | 37.0% |
| DZBias | 103.9 | 103.6 | 104.4 | 113.3 | 94.0 |
| TeamDZBias | 108.9 | 109 | 107 | 115.2 | 100.8 |
| DZBiasDiff | -5 | -5.4 | -2.6 | -1.9 | -6.8 |

Most of the stats above should be familiar to regular readers, but if you are not familiar with them you can reference my glossary here. The one stat that I have not used before is DZBias. DZBias is defined as 2*DZ% + NZ%, and thus anything over 100 indicates the player has a bias towards starting shifts in the defensive zone and anything under 100 indicates a bias towards starting in the offensive zone. I prefer this to OZone%, which is OZStarts/(OZStarts+DZStarts), because it takes into account neutral zone starts as well. TeamDZBias is the zone start bias of the Leafs over the past 3 seasons, and DZBiasDiff is Phaneuf’s DZBias minus the team’s DZBias and provides a zone start bias relative to the team. Anything less than 0 indicates usage is more in the offensive zone relative to his teammates.
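As a quick worked check of the DZBias definition, here is a small sketch using Phaneuf’s leading numbers from the table (DZ% 37.5, NZ% 38.3, team DZBias 115.2). The function name is mine.

```python
def dz_bias(dz_pct, nz_pct):
    """DZBias = 2*DZ% + NZ%. When OZ% + DZ% + NZ% = 100 this equals
    100 + DZ% - OZ%, so 100 means balanced offensive/defensive zone starts."""
    return 2.0 * dz_pct + nz_pct


phaneuf_leading = dz_bias(37.5, 38.3)
team_leading = 115.2
dz_bias_diff = phaneuf_leading - team_leading

print(f"DZBias={phaneuf_leading:.1f}, DZBiasDiff={dz_bias_diff:.1f}")
```

The rounded results match the Leading column of the table: a DZBias of 113.3 and a DZBiasDiff of -1.9.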

So, what does this tell us about Phaneuf? Well, there isn’t a huge variation in either the zone start usage or the results during 5v5, 5v5close and 5v5tied situations, so the focus should be on the differences between 5v5leading and 5v5trailing, which are significant.

Typical score effects are that when leading a team gives up more shots but of lower quality (defensive shells protect the danger zone in front of the net but allow more shots from the perimeter) and takes fewer shots but of higher quality (probably a result of more odd-man rushes due to pinching defensemen of the trailing team). Phaneuf seems to take this concept to the extreme, but more importantly Phaneuf seems to excel in an offensive role and struggle in a defensive role. When the Leafs are trailing Phaneuf has 0.408 G/60 (10th of 180 defensemen) and 1.02 points/60 (36th of 180 defensemen), but when leading Phaneuf falls to 0.156 G/60 (64th of 177 defensemen) and 0.42 points/60 (137th of 177 defensemen). Furthermore, Phaneuf’s involvement in the offensive zone drops off significantly when leading (IPP drops from 34.5% when trailing to 20.0% when leading).

In terms of on-ice stats, Phaneuf’s CF% drops from 55.6% when trailing (79th of 180 defensemen) to a very poor 39.8% when leading (164th of 177 defensemen).  Some may be thinking this is due to zone starts but Phaneuf is getting above average offensive zone starts both when trailing (ranks 100th of 180 defensemen) and when leading (ranks 154th of 177) and using even the most aggressive zone start adjustments in no way will account for the difference. Similar observations can be made with on-ice goal stats as well. Let’s look at how Phaneuf ranks among defensemen over the past 3 seasons.

| Stat | Leading (of 177) | Trailing (of 180) |
|---|---|---|
| GF20 | 109 | 25 |
| GA20 | 125 | 71 |
| GF% | 126 | 36 |
| CF20 | 128 | 31 |
| CA20 | 174 | 154 |
| CF% | 164 | 79 |

That is a pretty significant improvement in rankings when trailing over when leading, especially in the offensive statistics (GF20, CF20). If zone starts aren’t a factor, might line mates be? Here are Phaneuf’s most frequent defense partners:

Trailing: Gunnarsson (364:33, 31.0%), Beauchemin (212:07, 18.0%), Aulie (162:09, 13.8%)

Leading: Gunnarsson (376:16, 32.5%), Aulie (234:17, 20.3%), Beauchemin (166:30, 14.4%)

Playing more with Beauchemin and less with Aulie when trailing ought to help, particularly one’s offensive stats, but I doubt that is going to account for that much of a difference. Also, when leading Phaneuf has a 41.2% CF% with Gunnarsson and when trailing that spikes to 54.6%. When leading Phaneuf and Beauchemin have a CF% of 37.3% and when trailing that spikes to 57.7%. With Aulie the difference is 36.6% vs 49.3%. Regardless of which defense partner Phaneuf is with, their stats are dramatically better in catch up situations than when protecting a lead.

The same is true for forwards. When protecting a lead Phaneuf plays more with Grabovski and Kulemin, and when playing catch up he plays a bit more with Kessel and Bozak, but with all of those forwards Phaneuf’s numbers are hugely better when playing catch up than when protecting a lead. If anything, playing more with Grabovski and Kulemin when leading should help his statistics, as they are generally considered the Leafs’ better corsi players.

Let’s take a look at a chart of Phaneuf’s corsi WOWY’s when leading and when trailing.

Leading:

PhaneufLeadingCorsiWOWY201013

As you can see, when leading the majority of Phaneuf’s team mates are to the left of the diagonal line which means they have a better corsi% without Phaneuf than with.

Trailing:

PhaneufTrailingCorsiWOWY201013

When trailing the majority of Phaneuf’s team mates are near or to the right of the diagonal line which means they generally have better corsi% statistics when with Phaneuf than when apart.

So the question arises, why is this? It doesn’t seem to be zone starts. It doesn’t seem to be changes in line mates, and it isn’t that the team as a whole automatically becomes a great corsi% team when trailing, which Phaneuf could benefit from. When leading Phaneuf’s corsi% is 39.8%, which is worse than the team’s 41.2%, and when trailing Phaneuf’s corsi% is 55.6%, which is better than the team’s 54.4%. It seems to me that the conclusion we must draw from this is that Phaneuf has been poor at protecting a lead relative to his team mates, and we know his team mates have been poor at protecting a lead. Where Phaneuf excels is when he is asked to engage offensively, be that when playing catch up hockey or when playing on the PP (Phaneuf’s PP statistics are pretty solid). From the first table we know that Phaneuf has a slight bias towards more offensive zone starts (relative to his team mates), and when we dig into the numbers further it probably shows that he should be given even more offensive opportunities and fewer defensive ones, because he seems like a much better player when asked to be engaged offensively than when he is asked to be a shut down defenseman.
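For reference, the corsi% and GF% figures quoted for Phaneuf follow directly from the per-20 rates in the first table. A small sketch of that relationship (the function name is mine; the inputs are the Leading and Trailing columns of the table):

```python
def pct_for(rate_for, rate_against):
    """Share of events that go the player's way, in percent.
    Works for corsi (CF20, CA20) and goals (GF20, GA20) alike."""
    return 100.0 * rate_for / (rate_for + rate_against)


leading_cf = pct_for(15.195, 22.982)   # CF20, CA20 when leading
trailing_cf = pct_for(21.542, 17.223)  # CF20, CA20 when trailing
leading_gf = pct_for(0.692, 0.865)     # GF20, GA20 when leading
trailing_gf = pct_for(0.986, 0.714)    # GF20, GA20 when trailing

print(f"CF%: {leading_cf:.1f} leading vs {trailing_cf:.1f} trailing")
print(f"GF%: {leading_gf:.1f} leading vs {trailing_gf:.1f} trailing")
```

The results reproduce the table’s 39.8% and 55.6% CF% and its 44.4% and 58.0% GF%.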

Acquiring a quality shut down defenseman (ideally two) this off season must be the #1 priority of Maple Leaf management and Phaneuf’s usage must shift further away from multi-purpose heavy work load defenseman to primarily an offensive usage defenseman.

 

May 01 2013
 

I brought this issue up on twitter today because it got me thinking. Many in hockey analytics dismiss face off winning % as a skill without much value, but many of the same people also claim that zone starts can have a significant impact on a player’s statistics. I haven’t really delved into the statistics to investigate this, but here is what I am wondering. Consider the following two players:

Player 1: Team wins 50% of face offs when he is on the ice and he starts in the offensive zone 55% of the time.

Player 2: Team wins 55% of face offs when he is on the ice but he has neutral zone starts.

Given 1000 zone face offs the following will occur:

| Outcome | Player 1 | Player 2 |
|---|---|---|
| Win Faceoff in OZone | 275 | 275 |
| Lose Faceoff in OZone | 275 | 225 |
| Win Faceoff in DZone | 225 | 275 |
| Lose Faceoff in DZone | 225 | 225 |

Both of these players will win the same number of offensive zone face offs and lose the same number of defensive zone face offs, which are the situations that intuitively should have the greatest impact on a player’s statistics. So, if Player 1 is going to be more significantly impacted by his zone starts than Player 2 is by his face off win %, then losing face offs in the offensive zone must still have a significant positive impact on a player’s statistics and winning face offs in the defensive zone must still have a significant negative impact. If this is not the case then being able to win face offs should be more or less equivalent in importance to zone starts (and this is without considering any benefit of winning neutral zone face offs).
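The face off counts for the two players can be reproduced with a few lines of arithmetic. This sketch assumes only offensive and defensive zone face offs are counted (as in the example) and uses whole percentages so the integer math is exact; the function and key names are mine.

```python
def faceoff_outcomes(total, oz_start_pct, win_pct):
    """Split `total` zone face offs into won/lost by zone.
    Percentages are whole numbers; integer division truncates any remainder."""
    oz = total * oz_start_pct // 100
    dz = total - oz
    win_oz = oz * win_pct // 100
    win_dz = dz * win_pct // 100
    return {
        "win_oz": win_oz,
        "lose_oz": oz - win_oz,
        "win_dz": win_dz,
        "lose_dz": dz - win_dz,
    }


player_1 = faceoff_outcomes(1000, oz_start_pct=55, win_pct=50)
player_2 = faceoff_outcomes(1000, oz_start_pct=50, win_pct=55)

# The two players only differ in lost OZ face offs and won DZ face offs.
same = [k for k in player_1 if player_1[k] == player_2[k]]
print(player_1, player_2, same)
```

The outcomes the two players share are exactly the won offensive zone face offs and the lost defensive zone face offs, which is the crux of the argument.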

Now, I realize that there is a greater variance in zone start deployment than in face off winning percentage. But if a 55% face off win % is roughly equal to a 55% offensive zone start deployment, and a 55% face off win % has relatively little impact on a player’s statistics, then a 70% zone start deployment would have that relatively little impact times four, which is still probably relatively little.

I hope to be able to investigate this further, but on the surface it seems that if face off win % is of relatively little importance, that supports my claim that zone starts have relatively little impact on a player’s statistics.

 

Apr 19 2013
 

Tyler Dellow has a post at mc79hockey.com looking at zone starts and defensemen and if you read it the clear conclusion is that zone starts seem to matter quite a bit. In the third chart you can see that defensemen who get the most extreme defensive zone starts have an average corsi% of 44.7% while the average corsi% for defensemen with the most extreme offensive zone starts is 53.3%. This would seem to indicate that for defensemen zone starts can impact your corsi% anywhere from -5.3% to +3.3%. This is far more significant than I have estimated myself using a different methodology so I pondered that part of the reason for this is that when you start in the defensive zone you are playing with weaker quality of teammates than when you start in the offensive zone. My reasoning is that players that get used primarily in the defensive zone are often weak offensive players as if you are a good offensive player you will be given offensive opportunities. I wanted to explore this concept further and that is what I present to you here.

Unlike Tyler Dellow I used forwards in my analysis, but it is unlikely that this will have a major impact on the analysis as forwards and defensemen are always on the ice together. One difference between my analysis and Tyler Dellow’s is that I used data from stats.hockeyanalysis.com whereas Tyler used stats from behindthenet.ca. Behindthenet.ca includes goalie pulled situations in their data and this has the potential to greatly inflate the impact of zone starts. I feel it is important to eliminate this factor so I have removed it from the data. I also only used 2011-12 data but that shouldn’t have a major impact on the results.

So, my theory is that players who start in the defensive zone are weaker players overall. The challenge is that the teammates of players who start frequently in the defensive zone likely start frequently in the defensive zone themselves, so their stats are subject to zone start effects too. If they have weak stats we don’t know whether that is due to the zone starts or because they are weak players. My solution was to look at the players’ zone start adjusted stats that I have on stats.hockeyanalysis.com. These stats ignore the first 10 seconds after a zone face off, as it has been shown that the benefit/penalty of a zone face off has largely dissipated after 10 seconds. I understand that it may seem weird to use zone start adjusted data in a study that attempts to estimate the impact of zone starts but I don’t know what else to do.
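To make the 10 second adjustment concrete, here is a minimal sketch in Python. The data layout (a list of timed fenwick events and a sorted list of zone face off times) and the function names are hypothetical and are not the actual stats.hockeyanalysis.com code. It only illustrates the idea of dropping shot attempts that occur within 10 seconds after an offensive or defensive zone face off.

```python
from bisect import bisect_right

WINDOW = 10.0  # seconds ignored after a zone face off (the figure used in this post)

def in_window(t, zone_faceoffs, window=WINDOW):
    """True if time t falls within `window` seconds after a zone face off.

    zone_faceoffs must be a sorted list of face off times in seconds.
    """
    i = bisect_right(zone_faceoffs, t)  # number of face offs at or before t
    return i > 0 and t - zone_faceoffs[i - 1] < window

def ff_pct(events):
    """events: list of (time, is_for) fenwick events. Returns FF / (FF + FA)."""
    if not events:
        return float("nan")
    ff = sum(1 for _, is_for in events if is_for)
    return ff / len(events)

def zs_adjusted_ff_pct(events, zone_faceoffs):
    """FF% computed only from events outside the post-face-off window."""
    kept = [e for e in events if not in_window(e[0], zone_faceoffs)]
    return ff_pct(kept)

# Hypothetical data: times in seconds, True = attempt for, False = attempt against
events = [(3, True), (8, True), (25, False), (40, True), (64, False), (90, False)]
zone_faceoffs = [0, 60]  # one offensive zone draw at 0s, one defensive zone draw at 60s

raw = ff_pct(events)                             # all six events counted
adj = zs_adjusted_ff_pct(events, zone_faceoffs)  # events at 3s, 8s and 64s dropped
print(f"raw FF% = {raw:.3f}, ZS adjusted FF% = {adj:.3f}")
```

In this made-up example the raw number is helped by two quick attempts right after the offensive zone draw, and the adjusted number removes them along with the attempt against right after the defensive zone draw.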

I want to also point out that I will be using the ZS adjusted FF% of team mates when they are not on the ice with the player, and this may also mitigate the ZS impact on the teammates’ stats. My reasoning is that if a player has an extensive number of defensive zone starts, it is quite possible that when his team mates are not playing with him their zone starts are more neutral or maybe even offensive zone biased. If there ever was a way to get a non-zone start impacted FF% to use as a QoT metric, this is probably the best we can do.

Ok, so what I did was compare a players 5v5 FF% (fenwick %) and zone start adjusted 5v5 TMFF% (zone start adjusted FF% of teammates when team mates are not playing with him) and came up with the following:

FFPct_vs_TMFFPct_by_ZS

As you can see, TMFF% does seem to vary across zone start profiles as I had hypothesized though to a lesser extent than the players zone start influenced FF% which is to be expected. So, if we subtract TMFF% from FF% we get the following chart:

FFPct-TMFFPct_by_ZS

This chart indicates that the zone start impact on forwards once adjusted for quality of teammates (as best we can) ranges from -2.5% to +2.15%, which is significantly lower than the -5.3% to +3.3% estimate that Tyler Dellow came up with for defensemen without adjusting for quality of teammates and with goalie pulled situations included in the data. That said, this is still more significant than my own estimates when I compared 5v5 data to 5v5 data with the first 10 seconds after a zone start ignored. When I did that I calculated the impact of H. Sedin’s heavy offensive zone starts on his FF% to be +1.4% and considered this an upper bound. To investigate this further I plotted the average difference between 5v5 FF% and my 5v5 zone start adjusted FF% and I get the following:

FFPct-ZSAdjFFPct_by_ZS

The above is an estimate of the average impact of zone starts using my zone start adjustment methodology which ignores the first 10 seconds after a zone face off. This is significantly lower than either of the previous 2 estimates as we can see in this summary table:

Methodology | ZS Impact Estimate
T. Dellow’s estimate for defensemen | -5.3% to +3.3%
My TM Adjusted estimate for forwards | -2.5% to +2.15%
My 10 second after Zone FO adjustment for forwards | -0.5% to +0.41%

I am pretty sure none of what I have said above will put an end to the debate over the impact of zone starts on a player’s statistics, but at the very least I hope it sheds some light on some of the issues involved. For me personally, I have the most confidence in my zone start adjustment method which removes the 10 seconds after a zone face off. My reasoning is that studies have shown that the effect of a zone face off is largely eliminated within the first 10 seconds (see here or here) and also that it is the only methodology that compares a player to himself under similar playing conditions (i.e. same season, almost identical QoT, QoC and situation profiles), eliminating most of the opportunity for confounding factors to influence the results. If this is the case, the impact of zone starts on a player’s stats is fairly small, to the point of being almost negligible for the majority of players.

 

Apr 17, 2013
 

Even though I am a proponent of shot quality and the idea that the percentages matter (shooting and save percentage), puck control and possession are still an important part of the game and the Maple Leafs are dreadful at it. One of the better easily available metrics for measuring possession is fenwick percentage (FF%), which is the percentage of shot attempts (shots + shots that missed the net) that your team took. So a FF% of 52% would mean your team took 52% of the shot attempts while the opposing team took 48%. During 5v5 situations this season the Maple Leafs have a FF% of 44.4% which is dead last in the NHL. So, who are the biggest culprits in dragging down the Maple Leafs possession game? Let’s take a look.

Forwards

Player Name FF% TMFF% OppFF% FF% – TMFF% FF%-TMFF%+OppFF%-0.5
MACARTHUR, CLARKE 0.485 0.44 0.507 0.045 0.052
KESSEL, PHIL 0.448 0.404 0.507 0.044 0.051
KOMAROV, LEO 0.475 0.439 0.508 0.036 0.044
KADRI, NAZEM 0.478 0.444 0.507 0.034 0.041
GRABOVSKI, MIKHAIL 0.45 0.424 0.508 0.026 0.034
VAN_RIEMSDYK, JAMES 0.456 0.433 0.508 0.023 0.031
FRATTIN, MATT 0.475 0.448 0.504 0.027 0.031
LUPUL, JOFFREY 0.465 0.445 0.502 0.02 0.022
BOZAK, TYLER 0.437 0.453 0.508 -0.016 -0.008
KULEMIN, NIKOLAI 0.421 0.454 0.51 -0.033 -0.023
ORR, COLTON 0.401 0.454 0.5 -0.053 -0.053
MCLAREN, FRAZER 0.388 0.443 0.501 -0.055 -0.054
MCCLEMENT, JAY 0.368 0.459 0.506 -0.091 -0.085

FF% is the player’s FF% when he is on the ice expressed in decimal form. TMFF% is an average of the player’s team mates’ FF% when they are not playing with the player in question (i.e. what his team mates do when they are separated from him, or a quality of teammate metric). OppFF% is an average of the player’s opponents’ FF% (i.e. a quality of competition metric). From those base stats I took FF% – TMFF%, which will tell us which players perform better than their teammates do when they aren’t playing with him (the higher the better). Finally I factored in OppFF% by adding in how much above 50% their opposition is on average. This gets us an all encompassing stat to indicate who are the drags on the Leafs possession game.
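For anyone who wants to reproduce the last two columns, here is a small sketch of the arithmetic in Python. The function name is my own label for the combined stat, and the three rows are copied from the forwards table above.

```python
def possession_score(ff, tmff, oppff):
    """FF% - TMFF% + (OppFF% - 0.5), with all inputs in decimal form."""
    return (ff - tmff) + (oppff - 0.5)

# (FF%, TMFF%, OppFF%) copied from the forwards table
rows = {
    "MACARTHUR, CLARKE": (0.485, 0.440, 0.507),
    "BOZAK, TYLER": (0.437, 0.453, 0.508),
    "MCCLEMENT, JAY": (0.368, 0.459, 0.506),
}

for name, (ff, tmff, oppff) in rows.items():
    rel = ff - tmff                            # FF% - TMFF%
    score = possession_score(ff, tmff, oppff)  # adds the QoC term
    print(f"{name:20s} {rel:+.3f} {score:+.3f}")
```

Note that the quality of competition term only moves the result by a few thousandths for these players, since everyone’s OppFF% sits very close to 0.500.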

Jay McClement is the Leafs’ greatest drag on possession. A few weeks ago I posted an article visually showing how much of a drag on possession McClement has been this year and in previous years. McClement’s 5v5 FF% over the past 6 seasons has been 46.2%, 46.8%, 45.3%, 47.5%, 46.2% and, this season, 36.8%.

Next up are the goons, Orr and McLaren, which is probably no surprise. They are more interested in looking for the next hit/fight than they are in the puck. In general they are low minute players so their negative impact is somewhat mitigated, but they are definite drags on possession.

Kulemin is the next biggest drag on possession, which might come as a bit of a surprise considering that he has generally been fairly decent in the past. Looking at the second WOWY chart here you can see that nearly every player has a worse CF% (same as FF% but includes shots that have been blocked) with Kulemin than without, except for McClement and to a much smaller extent Liles. This is dramatically different than previous seasons (see second chart again) when the majority of players did equally well or better with Kulemin, save for Grabovski. Is Kulemin having an off year? It may seem so.

Next up is my favourite whipping boy Tyler Bozak. Bozak is and has always been a drag on possession. Bozak ranks 293 of 312 forwards in FF% this season (McClement is dead last!) and in the previous 2 seasons he ranked 296th of 323 players.

Among forwards, McClement, McLaren, Orr, Kulemin and Bozak appear to be the biggest drags on the Maple Leafs possession game this season.

Defense

Player Name FF% TMFF% OppFF% FF% – TMFF% FF%-TMFF%+OppFF%-0.5
FRANSON, CODY 0.469 0.437 0.506 0.032 0.038
GARDINER, JAKE 0.463 0.44 0.506 0.023 0.029
KOSTKA, MICHAEL 0.459 0.435 0.504 0.024 0.028
GUNNARSSON, CARL 0.455 0.437 0.506 0.018 0.024
FRASER, MARK 0.461 0.445 0.506 0.016 0.022
LILES, JOHN-MICHAEL 0.445 0.443 0.503 0.002 0.005
PHANEUF, DION 0.422 0.455 0.509 -0.033 -0.024
HOLZER, KORBINIAN 0.399 0.452 0.504 -0.053 -0.049
O_BYRNE, RYAN 0.432 0.505 0.499 -0.073 -0.074

O’Byrne is a recent addition to the Leafs defense so you can’t blame the Leafs possession woes on him, but in Colorado he was a dreadful possession player so he won’t be the answer to the Leafs possession woes either.

Korbinian Holzer was dreadful in a Leaf uniform this year and we all know that so no surprise there but next up is Dion Phaneuf, the Leafs top paid and presumably best defenseman. In FF%-TMFF%+OppFF%-0.5 Phaneuf ranked a little better the previous 2 seasons (0.023 and 0.003) so it is possible that he is having an off year or had his stats dragged down a bit by Holzer but regardless, he isn’t having a great season possession wise.

 

 

Apr 16, 2013
 

If you follow me on twitter you know I am not a fan of Tyler Bozak and I have written about him in the past. As a Leaf fan I want to keep writing about his poor play because I really do not want to see him re-signed in Toronto. He isn’t a good player and simply does not deserve it, especially if he is going to be making upwards of $4M/yr on a 4+ year long contract. Let’s take a look at how he ranks in a variety of categories over the previous 3 seasons combined as well as this season.

Statistic 3yr 2012-13
5v5 G/60 219/324 130/310
5v5 A/60 168/324 144/310
5v5 Pts/60 199/324 139/310
5v5 IGP 265/324 195/310
5v5 IAP 202/324 221/310
5v5 IPP 288/324 268/310
5v5 FF20 155/324 173/310
5v5 FA20 319/324 309/310
5v5 FF% 275/324 291/310
5v4 G/60 116/155 57/147
5v4 A/60 144/155 98/147
5v4 Pts/60 150/155 89/147
5v4 IGP 76/155 66/147
5v4 IAP 131/155 110/147
5v4 IPP 139/155 114/147

The above are his rankings among other forwards (i.e. 219/324 means 219th among 324 forwards). The cutoffs are >1500 5v5 minutes over 3 years, >300 5v5 minutes in 2012-13, >400 5v4 minutes over 3 years and >75 5v4 minutes in 2012-13, and the 2012-13 stats cover games up to but not including last night’s. For 5v5 ice time we are essentially talking about the top 10-11 forwards on each team, or their regulars, and on the power play we are talking about the top 5 forwards in PP ice time per team.

In 3-year 5v5 goals, assists and points per 60 minutes of play Tyler Bozak ranks approximately the equivalent of a good 3rd line player. The thing is, he is doing that while playing on the first line, and his terrible IGP, IAP, and IPP numbers indicate he is doing a terrible job keeping pace with his fellow first line mates. If you look at his 3 year fenwick numbers (FF20, FA20 and FF%), which are on-ice stats, you see that when Tyler Bozak has been on the ice the Leafs have been mediocre at shot generation and terrible at shot prevention. Only a handful of players (literally, just 5) have a worse shot prevention record when they are on the ice.

On the power play things aren’t much better. He is second powerplay unit material at best, but he is near the bottom of the pack in assist and point generation and only a bit better in goal production.

Overall his numbers look a little better in 2012-13 but they certainly aren’t much to write home about, especially his IGP, IAP and IPP. He still looks to be a 3rd line offensive player with terrible defensive ability.

Another thing we can look at is his WOWY numbers with his most frequent line mate Phil Kessel.

Bozak w/Kessel Bozak wo/ Kessel
3yr GF20 0.874 0.648
3yr GA20 0.995 1.297
3yr GF% 46.8% 33.3%
3yr CF20 19.60 17.43
3yr CA20 20.89 20.82
3yr CF% 48.4% 45.6%
2012-13 GF20 0.956 0.000
2012-13 GA20 0.918 0.419
2012-13 GF% 51.0% 0.0%
2012-13 CF20 19.50 8.38
2012-13 CA20 21.53 25.55
2012-13 CF% 47.5% 24.7%

When Phil Kessel and Tyler Bozak are on the ice together they are not even breaking even. When Tyler Bozak is on the ice without Kessel they are significantly worse. Individually, Tyler Bozak has scored just 3 of his 26 5v5 goals (11.5%) and 8 of his 68 points (11.8%) over the previous 3 seasons when separated from Kessel despite playing nearly 20% of his ice time apart from Kessel. When not with Kessel his goal and point production drops significantly and as we know from above it wasn’t all that impressive to start with.
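The percentages in the WOWY table follow directly from the per-20 rates, and the with/without split of his scoring is simple division. Here is a short sketch in Python using the 3 year numbers from the table and the paragraph above. The helper name is my own.

```python
def pct(for_rate, against_rate):
    """Share 'for' of the total, e.g. GF% = GF20 / (GF20 + GA20), as a percentage."""
    total = for_rate + against_rate
    return 100.0 * for_rate / total if total else float("nan")

# 3 year 5v5 rates from the table: index 0 = with Kessel, index 1 = without Kessel
gf20 = (0.874, 0.648)
ga20 = (0.995, 1.297)
cf20 = (19.60, 17.43)
ca20 = (20.89, 20.82)

for label, i in (("with Kessel", 0), ("without Kessel", 1)):
    print(f"{label:15s} GF% {pct(gf20[i], ga20[i]):.1f}  CF% {pct(cf20[i], ca20[i]):.1f}")

# Share of Bozak's 3 year 5v5 production that came while apart from Kessel
goal_share = 100.0 * 3 / 26   # 3 of 26 goals
point_share = 100.0 * 8 / 68  # 8 of 68 points
print(f"goals apart: {goal_share:.1f}%, points apart: {point_share:.1f}%")
```

Both shares come in well below the nearly 20% of his ice time that he spent apart from Kessel, which is the point of the comparison.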

Not shown are Phil Kessel’s numbers when he isn’t playing with Tyler Bozak but they are generally better than when they are together. Phil Kessel when not playing with Tyler Bozak has a GF% of 50.4% and a CF% of 51.5% over the previous 3 seasons. Tyler Bozak appears to be a drag on Kessel’s offense.

The only argument you can make for keeping Bozak is that the Kessel-Bozak-Lupul/JVR line has been productive and is working, so why break them up. To me that argument only works when Bozak is making $1.5M and is not a significant drag on the salary cap, but you can’t be paying a player $3.5-4M to essentially be a place holder between Kessel and Lupul/JVR.

Related News Article: James Mirtle wrote an article on the tough decision Leaf management has regarding the re-signing of Tyler Bozak.

(I am going to try and include a glossary in my posts for advanced statistics mentioned in the post so those not familiar with advanced stats can find out what they mean but a full glossary can also be found here).

Glossary

  • G/60 – Goals scored per 60 minutes of play
  • A/60 – Assists per 60 minutes of play
  • Pts/60 – Points per 60 minutes of play
  • IGP – Percentage of the team’s goals while the player was on the ice that were scored by the player
  • IAP – Percentage of the team’s goals while the player was on the ice that the player had an assist on
  • IPP – Percentage of the team’s goals while the player was on the ice that the player scored or had an assist on
  • FF20 – Fenwick (shots + missed shots) by team per 20 minutes of ice time
  • FA20 – Fenwick (shots + missed shots) against team per 20 minutes of ice time
  • FF% – % of all shot attempts (shots + missed shots) while on ice that the players team took – FF/(FF+FA)
  • GF20, GA20, GF% – same as FF20, FA20, FF% except for goals
  • CF20, CA20, CF% – same as FF20, FA20, FF% but also includes shot attempts that were blocked (corsi)
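For those who prefer to see the glossary definitions as formulas, here is a rough sketch in Python. The function names and the season line at the bottom are hypothetical (not a real player) and are only there to show how the pieces fit together.

```python
def per60(count, toi_minutes):
    """Rate per 60 minutes of ice time (G/60, A/60, Pts/60)."""
    return 60.0 * count / toi_minutes

def per20(count, toi_minutes):
    """Rate per 20 minutes of ice time (FF20, FA20, GF20, ...)."""
    return 20.0 * count / toi_minutes

def individual_pcts(goals, assists, on_ice_team_goals):
    """Return (IGP, IAP, IPP) as percentages of the team's goals scored
    while the player was on the ice."""
    igp = 100.0 * goals / on_ice_team_goals
    iap = 100.0 * assists / on_ice_team_goals
    ipp = 100.0 * (goals + assists) / on_ice_team_goals
    return igp, iap, ipp

def share_pct(events_for, events_against):
    """FF%, CF% or GF%: for / (for + against), as a percentage."""
    return 100.0 * events_for / (events_for + events_against)

# Hypothetical 5v5 season line, not a real player
goals, assists, on_ice_goals, toi = 10, 15, 40, 900.0
ff, fa = 560, 640  # on-ice fenwick events for and against

igp, iap, ipp = individual_pcts(goals, assists, on_ice_goals)
print(f"G/60 {per60(goals, toi):.2f}  Pts/60 {per60(goals + assists, toi):.2f}")
print(f"IGP {igp:.1f}%  IAP {iap:.1f}%  IPP {ipp:.1f}%")
print(f"FF20 {per20(ff, toi):.2f}  FA20 {per20(fa, toi):.2f}  FF% {share_pct(ff, fa):.1f}%")
```

Since a player cannot both score and assist on the same goal, IPP is just IGP plus IAP.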

 

Apr 12, 2013
 

Even though I think the idea of ‘usage’ and ‘tough minutes’ is a vastly overstated factor in an individual player’s statistics, usage stats are interesting to look at as they give us an indication of how a coach views the player. So for all the usage fans, here is another usage statistic which I will call the Leading-Trailing Index, or LT Index for short.

LT Index = TOI% when leading / TOI% when trailing

where TOI% is the percentage of the team’s overall ice time (in games that the player played in) that the player is on the ice (so a 5v5 TOI% of 20% means the player was on the ice for 20% of the time that the team was at 5v5). Thus, the LT Index is a ratio of the player’s ice time when his team is leading to his ice time when his team is trailing, adjusted for the overall ice time that the team is leading/trailing. Any number greater than 1.00 indicates the player gets a greater share of ice time when the team is leading and anything under 1.00 indicates the player gets a greater share of ice time when the team is trailing. So, any player with an LT Index greater than one is used more as a defensive player than an offensive one, and any player with an LT Index less than one is used more as an offensive player than a defensive one. Any player around 1.00 is a well balanced player. So, looking at this season’s data we have the following player usage:

Defensive Usage

Defenseman LT Index Forward LT Index
MICHAEL STONE 1.21 BJ CROMBEEN 1.66
KEITH AULIE 1.20 MATHIEU PERREAULT 1.45
RYAN MCDONAGH 1.19 CRAIG ADAMS 1.35
PAUL MARTIN 1.16 TRAVIS MOEN 1.33
BRYCE SALVADOR 1.15 BOYD GORDON 1.26
BRENDAN SMITH 1.14 JAMES WRIGHT 1.26
SCOTT HANNAN 1.14 MICHAEL FROLIK 1.23
ANDREJ SEKERA 1.13 BRIAN BOYLE 1.22
MIKE WEBER 1.13 MATT CALVERT 1.22
JUSTIN BRAUN 1.12 TANNER GLASS 1.20
BARRET JACKMAN 1.12 MATT MARTIN 1.19
ROBYN REGEHR 1.12 RUSLAN FEDOTENKO 1.19
CLAYTON STONER 1.12 STEPHEN GIONTA 1.19
ANTON VOLCHENKOV 1.11 CASEY CIZIKAS 1.19
RON HAINSEY 1.11 JEFF HALPERN 1.18
TIM GLEASON 1.11 DAVID JONES 1.17
ROSTISLAV KLESLA 1.11 NIKOLAI KULEMIN 1.17
ROB SCUDERI 1.10 ZACK KASSIAN 1.17
NIKLAS HJALMARSSON 1.10 RYAN CARTER 1.16
NICKLAS GROSSMANN 1.10 TORREY MITCHELL 1.16

Offensive Usage

Defenseman LT Index Forward LT Index
RYAN ELLIS 0.78 DEREK DORSETT 0.77
KRIS LETANG 0.84 RAFFI TORRES 0.77
MARK STREIT 0.86 TAYLOR HALL 0.79
KYLE QUINCEY 0.87 CORY CONACHER 0.79
MATT NISKANEN 0.87 JORDAN EBERLE 0.80
JUSTIN SCHULTZ 0.87 NAIL YAKUPOV 0.82
DOUGIE HAMILTON 0.88 RYAN NUGENT-HOPKINS 0.82
VICTOR HEDMAN 0.88 RICH CLUNE 0.82
DAN BOYLE 0.89 BLAKE COMEAU 0.82
KEVIN SHATTENKIRK 0.89 KYLE PALMIERI 0.84
ALEX PIETRANGELO 0.89 BRENDAN GALLAGHER 0.84
JOHN-MICHAEL LILES 0.90 CLAUDE GIROUX 0.86
JOHN CARLSON 0.90 VINCENT LECAVALIER 0.86
P.K. SUBBAN 0.90 DREW SHORE 0.86
LUBOMIR VISNOVSKY 0.91 TJ OSHIE 0.87
CODY FRANSON 0.91 ALEX OVECHKIN 0.87
JAMIE MCBAIN 0.91 JONATHAN HUBERDEAU 0.87
ROMAN JOSI 0.92 NICKLAS BACKSTROM 0.87
JARED SPURGEON 0.93 SCOTT HARTNELL 0.87
CHRISTIAN EHRHOFF 0.93 MARIAN HOSSA 0.87

Balanced Usage

Defenseman LT Index Forward LT Index
MICHAEL DEL_ZOTTO 0.99 BRYAN LITTLE 0.99
ERIC BREWER 0.99 MIKE FISHER 0.99
JAKUB KINDL 0.99 MIKKEL BOEDKER 0.99
ADRIAN AUCOIN 0.99 ALEXEI PONIKAROVSKY 0.99
ALEX GOLIGOSKI 0.99 JASON POMINVILLE 0.99
ERIK GUDBRANSON 1.00 CHRIS STEWART 0.99
DREW DOUGHTY 1.00 DANIEL BRIERE 1.00
THOMAS HICKEY 1.00 RADIM VRBATA 1.00
JOHNNY ODUYA 1.00 ALEX TANGUAY 1.00
SLAVA VOYNOV 1.00 GABRIEL LANDESKOG 1.00
MATT IRWIN 1.00 JIRI TLUSTY 1.00
FRANCIS BOUILLON 1.01 COLIN WILSON 1.00
JONAS BRODIN 1.01 PATRICK DWYER 1.00
BRENT SEABROOK 1.01 JADEN SCHWARTZ 1.01
JOSH GORGES 1.01 BRANDON SAAD 1.01
DUSTIN BYFUGLIEN 1.01 LEO KOMAROV 1.01
BRENDEN DILLON 1.01 DREW MILLER 1.01
GREG ZANON 1.01 DAVID PERRON 1.01
KRIS RUSSELL 1.02 TOM PYATT 1.01

It’s amazing how much more BJ Crombeen gets used protecting a lead than when trailing. You’d have to think that score effects could have a significant impact on his stats because of this. Not really a lot of surprises there, though in the case of a guy like Derek Dorsett, him being in the ‘offensive usage’ category has more to do with the coach not wanting to use him defending a lead than hoping he will score a goal to get the team back in the game.
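For anyone wanting to compute the LT Index themselves, here is a minimal sketch in Python of the formula defined earlier in this post. The ice time numbers are made up, and the 0.02 tolerance used to call a player ‘balanced’ is my own assumption for illustration and is not part of the stat.

```python
def toi_pct(player_toi, team_toi):
    """Share of the team's ice time (in games the player played) that he was on the ice."""
    return player_toi / team_toi

def lt_index(lead_player, lead_team, trail_player, trail_team):
    """LT Index = TOI% when leading / TOI% when trailing."""
    return toi_pct(lead_player, lead_team) / toi_pct(trail_player, trail_team)

def usage_label(lt, tol=0.02):
    """Rough bucket. The tolerance is an assumption, not part of the stat."""
    if lt > 1.0 + tol:
        return "defensive"
    if lt < 1.0 - tol:
        return "offensive"
    return "balanced"

# Hypothetical 5v5 minutes: the team led for 1000 minutes and the player was on
# the ice for 300 of them (30%); the team trailed for 800 minutes and the player
# was on the ice for 200 of them (25%).
lt = lt_index(300.0, 1000.0, 200.0, 800.0)
print(f"LT Index = {lt:.2f} ({usage_label(lt)} usage)")
```

Dividing by the team’s leading and trailing ice time is what keeps a player on a team that rarely leads from looking like an offensive player just because there is little leading time to go around.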

 

Apr 12, 2013
 

Now that I have added home and road stats to stats.hockeyanalysis.com I can take a look at how quality of competition differs when the team is at home vs when they are on the road. In theory because the home team has last change they should be able to dictate the match ups better and thus should be able to drive QoC a bit better. Let’s take a look at the top 10 defensemen in HARO QoC last season at home and on the road (defensemen with 400 5v5 home/road minutes were considered).

Player Name Home HARO QOC Player Name Road HARO QOC
GIRARDI, DAN 8.81 MCDONAGH, RYAN 6.73
MCDONAGH, RYAN 8.49 GORGES, JOSH 6.48
PHANEUF, DION 8.46 GIRARDI, DAN 6.03
GARRISON, JASON 8.27 SUBBAN, P.K. 5.95
GORGES, JOSH 8.25 PHANEUF, DION 5.94
GLEASON, TIM 8.21 GUNNARSSON, CARL 5.48
SUBBAN, P.K. 8.19 ALZNER, KARL 5.35
WEAVER, MIKE 7.92 STAIOS, STEVE 5.15
ALZNER, KARL 7.74 TIMONEN, KIMMO 4.95
REGEHR, ROBYN 7.72 WEAVER, MIKE 4.67

There are definitely a lot of common names in each list, but we do notice that the HARO QoC is greater at home than on the road for these defensemen. Next I took a look at the standard deviation of HARO QoC across all the defensemen with 400 5v5 home/road minutes last season, which should give us an indication of how much QoC varies from player to player.

StdDev
Home 3.29
Road 2.45

The standard deviation is 34% higher at home than on the road, which again confirms that the variation in QoC is greater at home than on the road. All of this makes perfect sense but it is nice to see it backed up in actual numbers.
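Here is the arithmetic behind those two observations as a short Python sketch. The top 10 values are copied from the table above and the standard deviations are the ones I reported for the full group of defensemen. The full player list is not reproduced here, so the standard deviations themselves are taken as given rather than recomputed.

```python
from statistics import mean

# Top 10 HARO QoC values from the table above
home_top10 = [8.81, 8.49, 8.46, 8.27, 8.25, 8.21, 8.19, 7.92, 7.74, 7.72]
road_top10 = [6.73, 6.48, 6.03, 5.95, 5.94, 5.48, 5.35, 5.15, 4.95, 4.67]

# How much tougher the toughest home match ups are than the toughest road ones
gap = mean(home_top10) - mean(road_top10)

# Reported standard deviations across all defensemen with 400 5v5 home/road minutes
sd_home, sd_road = 3.29, 2.45
sd_increase_pct = 100.0 * (sd_home / sd_road - 1.0)

print(f"top 10 mean gap: {gap:.2f}")
print(f"home std dev is {sd_increase_pct:.0f}% higher than road")
```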