David Johnson

Apr 12, 2014
 

As of last night's games, all 16 playoff teams have been determined. Before I get into any playoff predictions, let's take a look at how last season's 5v5 close statistics did at predicting who would make the playoffs this season.

[Table: CFPctGFPctPlayoffPredictor]

The above table shows last year's 5v5 close GF% and CF%; the teams in red are this year's playoff teams.

There were 15 teams with a CF% above 50% last year, 9 of them made the playoffs this year while 6 missed. Of the remaining 15 teams that had sub 50% CF% last year, 7 of them made the playoffs this year while 8 missed. Seven of the top 10 CF% teams last year made the playoffs while 5 of the bottom 10 teams made the playoffs and 5 missed.

There were 18 teams last year with at least a 50% GF% and 11 of them made the playoffs this season while 7 missed. Of the 12 teams that failed to reach 50% GF% last season, 5 made the playoffs and 7 missed. Seven of the top 10 GF% teams made the playoffs this season while 7 of the bottom 10 missed the playoffs.

It is difficult to say one was significantly better than the other. The truth is, neither was particularly good, but with 7 of the bottom 9 GF% teams from last year missing the playoffs this year, that might be enough to give GF% a slight edge. That said, the better predictor might have been last season's point totals.

[Table: PtTotalsPlayoffPredictor]

 

Apr 08, 2014
 

Over the past few weeks, while I have been shifting my website from one web host to another in an attempt to fight off the DDoS attacks, I started thinking about how big my stats.hockeyanalysis.com database actually is. I was thinking about it because of how long it takes to upload the data to a new web host and how long it takes to set up the database again.

So, how many data points do I have in my database? A lot. A data point is any single piece of data, like the Leafs' 2008-09 CF%, Jarome Iginla's 2007-13 (6yr) individual Goals/60, or Jack Johnson's CF% while playing with Drew Doughty during the 2008-09 season. Each of those is a single data point.

Here is a summary of all the data point totals by table type.

Database Table Type      Total Records   Datapoints/Record   Total Datapoints
Individual+OnIce Stats   595,726         123                 73,274,298
WOWY                     3,983,667       54                  215,118,018
"Against You"            10,856,454      38                  412,545,252
Team Data                660             28                  18,480
Total                                                        700,956,048
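The total is nothing fancier than records times datapoints per record, summed over the tables; a trivial check using the numbers from the table above:

```python
tables = {
    "Individual+OnIce Stats": (595_726, 123),
    "WOWY": (3_983_667, 54),
    "Against You": (10_856_454, 38),
    "Team Data": (660, 28),
}

# Total datapoints = sum over tables of (records x datapoints per record)
total = sum(records * per_record for records, per_record in tables.values())
print(f"{total:,}")  # 700,956,048
```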

So yes, there are just over 700 million data points in my database, not including things like player names, player positions, players' teams, etc. Once I add in all the multi-year data that includes the current season, I estimate there will be over 900 million data points.

The majority, though not all (I’d estimate 70-80%), of these data points are accessible to you if you conduct the right searches. Which one of you is going to be the first to count them all?

Now, if I actually uploaded all the data I can generate (specifically WOWY and Against You data when players have played fewer than 5 minutes with/against each other) the number of data points would rise dramatically, probably several billion data points. This is why I don’t upload that data.

 

Apr 01, 2014
 

Last week Tyler Dellow had a post titled “Two Graphs and 480 Words That Will Convince You On Corsi%” by which, let's just say, I was less than convinced (read the comments). This post is my rebuttal, an attempt to convince you of the importance of Sh% in player evaluation.

The problem with shooting percentage is that it suffers from small sample size issues. Over small samples it is often dominated by randomness (I prefer the term randomness to luck), but the question I have always had is: if we remove randomness from the equation, how important a skill is shooting percentage? To attempt to answer this I will look at the variance in on-ice shooting percentages among forwards as we increase the sample size from a single season (minimum 500 minutes of ice time) to 6 seasons (minimum 3000 minutes of ice time). As the sample size increases we would expect the variance due to randomness to decrease. This means that when the observed variance stops decreasing (or the rate of decrease slows significantly) as sample size increases, we know we are approaching the point where the remaining variance is variance in true talent, not small sample size randomness. So, without further ado, I present my first chart of on-ice shooting percentages for forwards in 5v5 situations.

 

[Chart: ShPctVarianceBySampleSize]

The decline in variance pretty much stops by the time you reach 5 years / 2500+ minutes worth of data, and even after 3 years (1500+ minutes) the rate of decline falls off significantly. It is also worth noting that some of the drop-off over longer time frames is due to age progression/regression and not due to a reduction in randomness.
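For anyone who wants to reproduce this kind of chart, the computation is conceptually simple. Here is a minimal sketch of one way to do it, assuming you have a table of forward seasons with on-ice goal and shot counts; the file name and column names below are hypothetical, not something you can download as-is:

```python
import pandas as pd

# Hypothetical input: one row per forward-season of 5v5 data with columns
# player, season, toi_minutes, on_ice_goals_for, on_ice_shots_for
seasons = pd.read_csv("forward_seasons_5v5.csv")

def sh_pct_spread(df, n_seasons, min_minutes):
    """Pool each forward's first n_seasons, drop anyone under the TOI cutoff,
    and return the spread of pooled on-ice Sh% across qualifying forwards."""
    pooled = (df.sort_values("season")
                .groupby("player")
                .head(n_seasons)
                .groupby("player")
                .agg(toi=("toi_minutes", "sum"),
                     gf=("on_ice_goals_for", "sum"),
                     sf=("on_ice_shots_for", "sum")))
    pooled = pooled[pooled["toi"] >= min_minutes]
    sh_pct = 100 * pooled["gf"] / pooled["sf"]
    return {"players": len(sh_pct),
            "variance": sh_pct.var(),
            "p90/p10": sh_pct.quantile(0.9) / sh_pct.quantile(0.1)}

# One season at 500+ minutes up through six seasons at 3000+ minutes
for n, cutoff in [(1, 500), (2, 1000), (3, 1500), (4, 2000), (5, 2500), (6, 3000)]:
    print(n, sh_pct_spread(seasons, n, cutoff))
```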

What is the significance of all of this? Well, at 5 years a 90th percentile player would have 45% more goals than a 10th percentile player, given an equal number of shots. A player one standard deviation above average would have 33% more goals for than a player one standard deviation below average, again given an equal number of shots.
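To make the arithmetic concrete, here is a toy example; the 9.5% and 6.5% on-ice Sh% values are made up for illustration, not the actual percentile cutoffs from the chart:

```python
p90, p10 = 0.095, 0.065   # hypothetical 90th and 10th percentile on-ice Sh%
shots = 1000              # give both players the same number of shots for

print(p90 * shots, p10 * shots)        # 95.0 vs 65.0 goals for
print(round((p90 / p10 - 1) * 100))    # ~46% more goals on identical shot totals
```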

Now, let’s compare this to the same chart for CF/20 to get an idea of how shot generation varies across players.

[Chart: CF20VarianceBySampleSize]

It's a little interesting that the top players show no regression over time but the bottom-end players do. This may be because terrible shot-generating players don't stick around long enough. More important, though, is the magnitude of the difference between the top players and the bottom players. A 90th percentile CF20 player produces about 25% more shot attempts than a 10th percentile player, and a player one standard deviation above average produces about 18.5% more than a player one standard deviation below average (over 5 years). Both of these are well below (almost half of) the 45% and 33% we saw for shooting percentage.

I hear a lot of 'I told you so' from the pro-Corsi crowd in regards to the Leafs and their losing streak, and yes, their percentages have regressed this season, but I think it is worth noting that the Leafs are still an example of a team where CF% is not a good indicator of performance. The Leafs' 5v5 close CF% is 42.5% but their 5v5 close GF% is 47.6%. The idea that CF% and GF% are “tightly intertwined”, as Tyler Dellow wrote, is not supported by the Maple Leafs this season, despite the Maple Leafs being the latest “pro-Corsi” crowd favourite “I told you so” team.

There is also some evidence that the Leafs have been “unlucky” this year. Their 5v5 close shooting percentages over the past 3 seasons were 8.82 (2nd), 8.59 (4th) and 10.54 (1st), while this year it has dropped to 8.17 (8th). Now, the question is how much of that is luck and how much is the loss of Grabovski and MacArthur and the addition of Clarkson (who is generally a poor on-ice Sh% player), but the Leafs' Sh% is well below that of the past few seasons and some of that may be bad luck (and notably, not “regression” from years of “good luck”).

In summary, generating shots matters, but capitalizing on them matters as much or more.

 

Mar 12, 2014
 

I know I am in a bit of a minority but it is my opinion that one of the greatest failings of hockey analytics thus far is overstating the importance of Corsi at both the team and (especially) the individual level.

In a post yesterday about Luke Gazdic Tyler Dellow of mc79hockey.com wrote:

We care about Corsi% because it predicts future goals for/against better than just using goals for/against.

The problem is, this is only partly true and is missing an important qualifier at the end of the sentence. It should read:

We care about Corsi% because it predicts future goals for/against better than just using goals for/against when sample sizes are not sufficiently large.

We can debate what 'sufficiently large' sample sizes are, but at the team level I'd suggest it is something less than a full season's worth of data, and at the player level it is probably between 500 and 750 minutes of ice time, depending on shot rates, based on some past research I have done.

In a post on the limits of Corsi at Arctic Ice Hockey Garret Hohl writes:

Winning in puck possession and scoring chances is important and will lead to wins but does not encompass the full game. The largest factors outside of possession and chances are luck (ie: bounces), special teams, and combination of goaltending and shot quality (probably in that order).

The problem with that paragraph is that there is no context of sample size. Sample size means everything when writing a sentence like that. If the sample is 3 games played by a particular team, luck is quite probably the most important factor in determining how many of those 3 games the team wins. If the sample is 300 games, luck is mostly irrelevant. Without considering sample size, there is no way of knowing what the 'luck factor' truly is. Furthermore, luck mostly impacts goaltending (save percentage) and shot quality (shooting percentage), so while goaltending and shooting talent can have minimal impact on winning over small sample sizes, we can't know what impact they have over the long haul without looking at larger samples. Far too many conclusions about shot quality and goaltending have been drawn from samples that are too small, and far too few people have attempted to actually quantify the importance of shooting talent at the team level. As a result, far too often I hear statements like 'Team X's shooting percentage is unsustainable' when in reality it is sustainable.
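To illustrate the sample-size point, here is a quick toy simulation (the 55% true win probability is an arbitrary number I picked, not something derived from real teams): at 3 games, observed win% is nearly all noise; at 300 games, it hugs the true talent level.

```python
import random

def observed_win_pct_band(true_p, games, trials=10_000):
    """Simulate `trials` samples of `games` games for a team with true win
    probability `true_p`, and return the 10th-90th percentile band of the
    observed win percentages."""
    results = sorted(sum(random.random() < true_p for _ in range(games)) / games
                     for _ in range(trials))
    return results[trials // 10], results[trials - trials // 10 - 1]

for games in (3, 30, 300):
    lo, hi = observed_win_pct_band(0.55, games)
    print(games, round(lo, 3), round(hi, 3))
# The band shrinks dramatically as the number of games grows, which is why
# "luck" dominates a 3-game sample but is mostly irrelevant over 300 games.
```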

Below is a table of the top 5 and bottom 5 teams in terms of 5v5 close shooting percentage over the 5 years from the 2007-08 season to the 2011-12 season, along with their shooting percentages from last year and from this year through Saturday's games.

Team 2007-12 2012-13 2013-14
Pittsburgh 8.45 10.12 8.28
Philadelphia 8.30 8.96 8.14
Tampa 8.29 7.68 7.40
Edmonton 8.17 7.79 9.01
Toronto 8.16 10.52 8.05
Top 5 Avg 8.27 9.01 8.18
Bottom 5 Avg 7.14 6.51 6.68
NY Islanders 7.23 8.14 7.38
San Jose 7.19 6.59 7.23
New Jersey 7.14 6.35 6.65
NY Rangers 7.11 5.99 6.11
Florida 7.05 5.49 6.05

What you will see is that the top 5 teams had an average 5-year shooting percentage 1.13 percentage points higher than the bottom 5 teams. That is not insignificant: it means the top 5 teams would score almost 16% more goals than the bottom 5 teams based solely on the difference in their shooting percentages. If you look at 5-year CF/60, the top 5 teams are just over 17% higher than the bottom 5 teams. Thus, over a 5-year span there is very little difference between the variation in shooting percentage and the variation in Corsi rates.
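The 16% figure falls straight out of the table averages (the 17% CF/60 comparison uses 5-year shot rates that aren't shown above, so only the shooting percentage half can be checked here):

```python
top5_sh, bottom5_sh = 8.27, 7.14       # 5-year 5v5 close Sh% averages from the table

# On an equal number of shots, the goal advantage is just the ratio of Sh%
print(round((top5_sh / bottom5_sh - 1) * 100, 1))  # ~15.8%, i.e. almost 16% more goals
```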

Now, are shooting percentages sustainable? Well, in the 2 seasons since, one lockout-shortened and one not yet complete, the top 5 5-year teams have, on average, actually improved while the bottom 5 teams have, on average, gotten worse. Aside from the 2012-13 NY Islanders, all of the other bottom 5 teams remained well below average and nowhere close to any of the top 5 teams. There is no observable regression occurring here.

Based on these observations, one can conclude that when it comes to scoring goals at the team level, shooting percentage is pretty close to being equally as important as shot generation. I won't show it here, but if one did a similar study at the player 'on-ice' level, you would find that the difference between the best and worst shooting percentage players matters significantly more than the difference in shot generation.

I don't quite know why hockey analytics got this so wrong and has largely not yet come around to the importance of shot quality (it is slowly moving, but not there yet), as there have been some good posts showing the importance of shot quality, but they largely get ignored by the masses. Part of the problem is certainly that some of the early studies of shot quality just looked at too small a sample size. Another reason is that 2009-10 seems to have been a really strange year for shooting percentages at the team level. Toronto, Edmonton and Philadelphia (top 5 teams from above) ranked 25th, 23rd and 20th in shooting percentage while San Jose, NY Islanders and New Jersey (bottom 5 teams from above) ranked 6th, 10th and 13th. These were anomalies for all of those teams, so any year-over-year studies that used 2009-10 probably produced atypical results and less valid conclusions. Finally, I think part of the problem is that analytics has followed the lead of a few very vocal people and dismissed some other important but less vocal voices. Regardless of how we got here, for hockey analytics to move forward we need to move past the notion that shot-based metrics are more important than goal-based metrics.

Shot-based metrics are OK to use only when we don't have a sufficiently large sample size, and that isn't the situation for most players and teams. The majority of NHL players have played multiple seasons in the NHL, and teams have a history of data we can look at. We can look at multiple years of data to see how sustainable a particular team's or player's percentages are. It isn't that difficult to do, and it will tell us far more about a player than looking at his CF% this season.

When I am asked to look at a player I am not particularly knowledgeable about, the first thing I typically do is open up my WOWY pages for that player at stats.hockeyanalysis.com, especially the graphs, which quickly give me an indication of how the player performs relative to his teammates. I'll maybe look at a multi-year WOWY first, and then look at several single-year WOWYs to see if there are any trends I can spot. I'll primarily look at GF% WOWYs but will consider CF% WOWYs too, and maybe even GF20/GA20/CF20/CA20 WOWYs. I look for trends over time, not how the player did during any particular year. This is because the percentages can matter a lot for some players, and it is important to know which players can post good percentages consistently from year to year. I then may look at the player's individual numbers such as GF/60, Pts/60 and Assists/60, as well as IPP, IGP and IAP (see the sketch below for roughly what those measure), to determine how involved he was in the offense while on the ice (and I'll do this looking at several seasons, and multiple seasons combined). Then I'll take a look at his linemates, quality of competition, and usage (zone starts, PP/PK ice time, etc.). Only then will I start to feel comfortable drawing any kind of conclusions about the player.
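For readers unfamiliar with those last three acronyms, this is roughly what they measure, assuming the usual definitions of individual goals, assists and points as a share of on-ice goals for (the example numbers are invented):

```python
def involvement(goals, assists, on_ice_goals_for):
    """Share of on-ice goals for that the player personally contributed to."""
    return {
        "IGP": goals / on_ice_goals_for,               # individual goals percentage
        "IAP": assists / on_ice_goals_for,             # individual assists percentage
        "IPP": (goals + assists) / on_ice_goals_for,   # individual points percentage
    }

# Hypothetical forward: 20 goals and 30 assists on 80 on-ice goals for
print(involvement(20, 30, 80))  # {'IGP': 0.25, 'IAP': 0.375, 'IPP': 0.625}
```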

As I recently wrote in an article, hockey analytics is hard, and the above explains why. There is no single stat we can look at to find an answer. A goal-based analysis has flaws. A Corsi-based analysis has flaws. Looking at just a single season has flaws. Looking at multiple seasons has flaws. There are score effects, quality of teammates, quality of opponents and zone starts that we need to consider, not to mention sample sizes. Coaching and style of play is another area that hockey analytics has barely touched, and yet it probably has a significant impact on statistics and results (maybe especially on Corsi statistics). Hockey analytics is hard and Corsi doesn't have all the answers, so it is important not to reduce hockey analytics to looking up some Corsi stats and drawing conclusions. I fear that hockey analytics has over-hyped the importance of Corsi at the expense of other important factors, and that is unfortunate.

 

Feb 09, 2014
 

There is a recently posted article on BroadStreetHockey.com discussing overused and overrated statistics. The first statistic on that list is plus/minus. Plus/minus has its flaws and gets wildly misused at times, but that doesn't mean it is a useless statistic if used correctly, so I want to defend it a little and also put it in the same context as Corsi.

The rationale given in the BroadStreetHockey.com article for plus/minus being a bad statistic is that the top of the plus/minus list is dominated by a few teams. They list the top 10 players in +/- this season and conclude:

Now there are some good players on the list for sure, but look a little bit closer at the names on the list. The top-ten players come from a total of five teams. The top eight all come from three teams. Could it perhaps be more likely that plus/minus is more of a reflection of a team’s success than specific individuals?

Now that is a fair comment but let me present you the following table of CF% leaders as of a few days ago.

Player Name Team CF%
MUZZIN, JAKE Los_Angeles 0.614
WILLIAMS, JUSTIN Los_Angeles 0.611
KOPITAR, ANZE Los_Angeles 0.611
ERIKSSON, LOUI Boston 0.606
BERGERON, PATRICE Boston 0.605
TOFFOLI, TYLER Los_Angeles 0.595
TOEWS, JONATHAN Chicago 0.592
THORNTON, JOE San_Jose 0.591
MARCHAND, BRAD Boston 0.591
ROZSIVAL, MICHAL Chicago 0.590
TARASENKO, VLADIMIR St.Louis 0.589
KING, DWIGHT Los_Angeles 0.589
BROWN, DUSTIN Los_Angeles 0.586
DOUGHTY, DREW Los_Angeles 0.584
BURNS, BRENT San_Jose 0.583
BICKELL, BRYAN Chicago 0.582
HOSSA, MARIAN Chicago 0.581
KOIVU, MIKKO Minnesota 0.580
SAAD, BRANDON Chicago 0.579
SHARP, PATRICK Chicago 0.578
SHAW, ANDREW Chicago 0.578
SEABROOK, BRENT Chicago 0.576

Of the top 22 players, 8 are from Chicago and 7 are from Los Angeles. Do the Blackhawks and Kings have 68% of the top 22 players in the NHL? If we are tossing +/- aside because it is “more of a reflection of a team’s success than specific individuals” then we should be tossing aside Corsi as well, shouldn’t we?

The problem is not that the top of the +/- list is dominated by a few teams; it is that people misinterpret what it means and don't consider the context surrounding a player's +/-. No matter what statistic we use, we must consider context such as quality of team, ice time, etc. Plus/minus is no different in that regard.

There are legitimate criticisms of +/- that are unique to +/-, but in general I think a lot of the criticisms and subsequent dismissals of +/- as having any value whatsoever are largely unfounded. It isn't that plus/minus is overrated or overused; it is that it is often misused and misinterpreted, and to be honest I see this happen just as much with Corsi and the majority of other "advanced" statistics. It isn't the statistic that is the problem, it is the user of the statistic. That, unfortunately, will never change, but it shouldn't stop those of us who know how to use these statistics properly from using them to advance our knowledge of hockey. So please, can we stop dismissing plus/minus (and other stats) as valueless just because a bunch of people frequently misuse them.

The truth is there are zero (yes, zero) statistics in hockey that can't be, and aren't regularly, misused and cited without context. That goes for everything from goals and point totals to Corsi to whatever zone start or quality of competition metric you like. They are all prone to being misused and misinterpreted, and more often than not they are. It is not because the statistics themselves are inherently flawed or useless; it's because hockey analytics is hard and we are a long, long way from fully understanding all the dynamics at play. Some people are just more willing to dig deeper than others. That will never change.

 

(Note: This isn’t intended to be a critique of the Broad Street Hockey article because the gist of the article is true. The premise of the article is really about statistics needing context and I agree with this 100%. I just wish it wasn’t limited to stats like plus/minus, turnovers, blocked shots, etc. because advanced statistics are just as likely to be misused.)

 

Oct 01, 2013
 

It appears that Phil Kessel is on the verge of signing an 8-year, $8M/yr contract with the Leafs, so this is a good time to compare that contract to those of a couple of other elite wingers who have signed contracts in the past year or so: Corey Perry and Zach Parise. I have also chosen to include Rick Nash in the discussion because he is a comparable goal-scoring winger with a comparable salary, even though he signed his contract several years ago. Before we get into contracts, though, let's take a look at production levels by age.

[Chart: KesselGoalsPerGameByAge]

 

In terms of goal production, both Nash and Kessel got their careers started earlier than Perry or Parise, and both had their best goal production years earlier in their careers. Kessel, of course, had his best goal production year playing a significant amount of time with one of the best playmakers in the league at the time, Marc Savard. He has yet to match that level in Toronto, but of course he is playing with Tyler Bozak there. Aside from Perry's career year at age 25, he has generally been at or below the production level of the other three at the same age, while Nash has generally been the more productive player. Note that I have removed Parise's age-25 season as he missed the majority of that year to injury. Nash's age-20 season was lost to a lockout. Ages are based on draft year (the first season after the draft year is age 18).

 

[Chart: KesselPointsPerGameByAge]

There is not really a lot of difference in the points/game chart, which makes sense because all of these players are wingers and more goal scorers than playmakers. Parise once again had his peak season at age 23 while Perry again had his at age 25. Nash has been a little more consistent, fluctuating between 0.8 and 1.0 since his age-21 season, though one should remember that Nash's age-21 season was 2005-06, when goal production was inflated due to the obstruction crackdown and far more power plays. Kessel appears to still be on the upswing, and he has shown more playmaking ability with Lupul or van Riemsdyk on the other wing and no true playmaker at center.

Player Age Length Total $
Parise* 27 8 $80M
Perry 27 8 $69M
Kessel 25 8 $64M
Nash 25 8 $62.4M

*Parise’s salary over the first 8 years of his contract.

Parise's salary is a little wonky as he signed under the old CBA; it is a back-diving contract in which he earns $94M over the first 10 years and $4M over the final 3. Perry is the easiest to compare with as his is the most recent contract, while Nash signed several years ago when the salary cap was lower. All things considered, Kessel's contract is at least fairly priced, if not a slight bargain.

In conclusion, even though the others may have had higher 'peak' seasons (though it is certainly possible, maybe likely, that Kessel hasn't reached his peak yet), it is fair to suggest that Kessel deserves to be considered similarly talented to the other three, which makes his $8M/yr salary not only fair but maybe a slight bargain.

 

Sep 21, 2013
 

In a series of recent posts at mc79hockey.com, Tyler Dellow discussed a concept that was new to me, which he calls 'open play' hockey. In a post on “The Theory of the Application of Corsi%” he wrote:

I have my own calculation that I do of what I call an open play Corsi%. I wipe out the faceoff effects based on some math that I’ve done as to how long they persist and look just at what happened during the time in which there wasn’t a faceoff effect.

This sounds strangely similar to my zone start adjusted statistics, where I eliminate the first 10 seconds after an offensive or defensive zone face off, since I have found that beyond that point the effect of the face off has largely dissipated. I was curious as to how these open play numbers were actually calculated, and it seemed I wasn't the only one.

As far as I can tell, the tweet went unanswered.
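For reference, my 10-second adjustment amounts to something like the sketch below; the event structure and field names are made up for illustration, not what actually runs behind stats.hockeyanalysis.com:

```python
def zone_start_adjusted(events, window=10):
    """Drop shot attempts that occur within `window` seconds of an offensive or
    defensive zone face off; neutral zone face offs and everything else stay.
    Each event is a dict with (hypothetical) keys 'secs_since_faceoff' and
    'faceoff_zone' ('OZ', 'DZ' or 'NZ')."""
    return [e for e in events
            if not (e["faceoff_zone"] in ("OZ", "DZ")
                    and e["secs_since_faceoff"] < window)]
```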

In a followup post “New Metrics I” the concept of open play hockey was mentioned again.

I’m calculating what I call an open play Corsi% – basically, I knock out the stuff after faceoffs and then the stuff I’m left with, theoretically, doesn’t have any faceoff effects. It’s just guys playing hockey.

In the comments I asked if he could define more precisely what "stuff after faceoffs" meant, but the question went unanswered. Dellow has subsequently referenced open play hockey in his New Metrics 2 post and in a follow-up post answering questions about these new metrics. What still hasn't been explained, though, is how he actually determines "open play" hockey.

Searching Dellow's website for "open play", we find that the concept has been mentioned a couple of times previously. In a post titled Big Oilers Data IX: Neutral Zone Faceoff Wins we may get an answer as to what 'open play' actually is.

As those of you who have been reading this series as I’ve gone along will be aware, I’ve been kind of looking at things on the basis of eight different kinds of 5v5 shift: Open Play (no faceoff during shift), six types of shift with one faceoff (OZ+, OZ-, NZ+, NZ-, DZ+, DZ-) and multi-faceoff shifts. The cool thing with seven of those types of shift is that I can get a benchmark of a type by looking at how the Oilers opposition did in the same situation.

So, as best I can determine, open play is basically any shift that doesn't have a face off.

The next question I'd like to answer is: how different is 'open play' from my 10-second adjustment? This is an interesting question because I have had this debate with many people who suggest that my 10-second adjustment isn't adequate and that zone start effects are far more significant than my 10-second adjustment implies. I have even had debates with Tyler Dellow about this (see here, here and here), so I am really curious as to what impact open play hockey has on a player's statistics. Unfortunately, I don't have much 'open play' data to work with, but in the posts where Dellow has discussed it he has mentioned a few players' open play Corsi% statistics, so I will work with what I have. Here is a comparison of Dellow's open play stats and my 10-second zone start adjusted stats.

Player Year OpenPlay Corsi% ZSAdj CF% OZ% DZ%
Fraser 2012-13 50.8% 50.4% 40.1 25.3
Fraser 2011-12 52.8% 53.2% 31.1 35.5
Fraser 2010-11 45.2% 42.2% 30.4 35.1
Fraser 2009-10 59.2% 57.7% 29.2 40.5
Fraser 2008-09 51.8% 52.6% 30.9 37
O’Sullivan 2011-12 44.3% 42.0% 35.7 26
O’Sullivan 2010-11 45.2% 45.6% 29.4 34
O’Sullivan 2009-10 43.9% 44.1% 31 32.2
O’Sullivan 2007-08 45.5% 46.5% 29.9 29.4
Eager 2012-13 34.4% 35.6% 40.5 32.8
Eager 2011-12 42.0% 43.0% 29.6 30.7
Eager 2009-10 54.4% 54.5% 18.3 39.1
Eager 2008-09 52.9% 53.9% 22.6 37.4

I have included OZ% and DZ%, which are the percentages of face offs (including neutral zone face offs) that the player had in the offensive and defensive zone respectively. These statistics, along with ZSAdj CF%, can be found on stats.hockeyanalysis.com.

If it isn’t obvious to you that there isn’t much difference between the two, let me make it more obvious by looking at this in graphical form.

[Chart: OpenPlayvsZSAdjustedCorsiPct]

That's a pretty tight correlation, and we are dealing with some player seasons that had fairly significant zone start biases. Ben Eager had a very significant defensive zone start bias in both 2008-09 and 2009-10 but a sizable offensive zone bias in 2012-13. Colin Fraser had a sizable defensive zone bias in 2009-10 but a sizable offensive zone bias in 2012-13. Patrick O'Sullivan had a heavy offensive zone bias in 2011-12. There is no compelling evidence here that 'open play' statistics are any more reliable or better than my 10-second zone start adjusted data. There is essentially no difference, which reaffirms to me (yet again) that my 10-second adjustment is a perfectly reasonable method of adjusting for zone starts, which in turn tells us that zone starts do not have a huge impact on a player's statistics. Certainly not anywhere close to what many once believed, including Dellow himself. Any impact you do see is more likely due to the quality of players one plays with when one gets a significant number of defensive zone starts.
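If you want to put a number on how tight that relationship is, you can compute the correlation straight from the table above (the two lists below are simply the two CF% columns re-typed):

```python
from statistics import correlation  # Python 3.10+

open_play = [50.8, 52.8, 45.2, 59.2, 51.8, 44.3, 45.2, 43.9, 45.5, 34.4, 42.0, 54.4, 52.9]
zs_adj    = [50.4, 53.2, 42.2, 57.7, 52.6, 42.0, 45.6, 44.1, 46.5, 35.6, 43.0, 54.5, 53.9]

# Pearson correlation between open play Corsi% and 10-second zone start adjusted CF%
print(correlation(open_play, zs_adj))  # very close to 1
```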

Update: For Tyler Dellow's response, or lack thereof, read this. Best I can tell, he doesn't want to publicly say what open play is or how it shows zone starts affecting players' stats beyond my 10-second adjustment, because I might interpret what he says as evidence I am right, despite him clearly thinking the evidence proves me wrong. I guess rather than have me make a fool of myself by misinterpreting his results so I can believe I am right, he is going to withhold the evidence from everyone. I feel so touched that Dellow would choose to save me from the embarrassment of misinterpreting results over letting everyone know the real effect zone starts have on a player's statistics and why 'open play' is what we should be using to negate the effect of zone starts. Truthfully though, I am willing to take the risk of embarrassing myself if it furthers our knowledge of hockey statistics.

 

Related Articles:

Face offs and zone starts, is one more important than the other?

Tips for using Hockey Fancy Stats

 

 

Sep 16, 2013
 

Let's imagine a sport where two factors are equally correlated with winning, so that FactorA is 50% correlated with winning and FactorB is 50% correlated with winning. Now, for years general managers in this sport only ever knew that FactorA existed, and when choosing how to build their teams they only ever considered FactorA. Let's assume that in this idealized, yet uninformed about FactorB, world every general manager of every team allocated their financial resources perfectly based on their knowledge of FactorA. On top of that, every team is working under the same financial constraints, meaning they all spend the exact same amount of money.

The result is, in this fictional world, FactorA becomes perfectly evenly distributed across every team. Strangely though, even after accounting for luck, teams have statistically significant differences in winning percentages.

Now along comes a smart individual who discovers the existence of FactorB, finds that FactorB correlates 100% with winning percentage (after factoring out luck), and concludes that general managers were wrong all along: FactorB is all that matters to winning and FactorA is irrelevant (it has to be, since it now has zero correlation with winning). Upon discovering this he gets hired as the general manager of a team, and while every other GM was only signing FactorA players, he chose to go out and sign solely FactorB players. He made signing FactorB players his goal. Strangely, despite FactorB seemingly showing a 100% correlation with winning, his team didn't win any more than anyone else.

The reason for this is that FactorA is in fact important. It just doesn't seem important because everyone knows about FactorA and FactorA is evenly spread across teams. Ignoring FactorA for FactorB is just as wrong as ignoring FactorB for FactorA. Upon learning of the existence of FactorB and its high correlation with winning, the goal of a general manager is not to optimize his team for FactorB but to recognize that there is undiscovered value in players who have FactorB as a skill, while not ignoring the other skills we already knew existed.
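The thought experiment is easy to simulate; in this toy sketch (all of the numbers are invented), FactorA has been almost perfectly equalized across teams by GMs while FactorB varies freely, even though both contribute equally to winning:

```python
import random
from statistics import correlation

random.seed(1)
n = 300  # hypothetical team-seasons

factor_a = [random.gauss(1.0, 0.005) for _ in range(n)]  # nearly identical everywhere
factor_b = [random.gauss(1.0, 0.100) for _ in range(n)]  # ignored by GMs, so it still varies
winning  = [0.5 * a + 0.5 * b + random.gauss(0, 0.02)    # both factors matter equally
            for a, b in zip(factor_a, factor_b)]

print(correlation(factor_a, winning))  # near zero: no spread left to "predict" anything with
print(correlation(factor_b, winning))  # high: all the between-team spread lives here
```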

Bringing this back to hockey, let's call FactorA shooting percentage and FactorB shot generation. Teams have typically doled out contracts based on shooting percentage but not based on Corsi, as shown by Eric T. His conclusion was:

most teams don’t give out contracts because of Corsi. But a team that does will get more wins out of their budget than a team that follows the conventional path and overvalues finishing talent.

My response is, not if it comes at the expense of ignoring finishing talent. Based on Tom Awad’s work, finishing talent is probably at least 50% of out scoring your opposition (note that shooting percentage is a combination of out finishing and shot quality in Awad’s terminology).

So, if teams have been doling out contracts based on, effectively, shooting percentage then it is perfectly reasonable to assume that shooting percentage talent is more evenly distributed across teams than corsi-talent is. Under these circumstances corsi would be highly correlated with winning percentage because that is where the differences lie between teams. This doesn’t mean that corsi is the main factor in out scoring the opponent though and valuing corsi at the expense of shooting percentage will be a detriment to any General Manager.

Furthermore, if general managers as a whole started paying primarily for Corsi, we would start to find that Corsi talent becomes more evenly distributed across teams, and thus shooting percentage would become much more highly correlated with winning (even after adjusting for luck). Paying players based on Corsi could also lead to players altering their style of play to optimize their Corsi statistics, to the detriment of the ultimate goal: outscoring the opponent.

It is certainly possible in the current hockey universe, in which players are paid more for shooting percentage than for Corsi, that they play a style of game that optimizes shooting percentage at the expense of winning, so it is not unreasonable to expect the flip side to occur with Corsi if it becomes a metric by which general managers dole out contracts.

Ultimately, the goal of any general manager is to optimize his lineup for outscoring the opposition, not out-shooting-percentage-ing them and not out-Corsi-ing them. Corsi, or possession, should never be considered the goal, just as shooting percentage or any other identifiable skill shouldn't be. The goal has been, is, and always will be to outscore the opposition, and it is the general manager's job to find the right balance of all the identifiable skills, not just those that currently correlate with winning.

 

Sep 14, 2013
 

A while back I came up with a stat which at the time I called LT Index: the percentage of a player's team's ice time while leading that the player is on the ice for, divided by the percentage of the team's ice time while trailing that the player is on the ice for (in 5v5 situations, and only in games the player played in). LT Index stood for Leading-Trailing Index. I have decided to rename this statistic Usage Ratio, since it gives us an indication of whether a player is used more in defensive situations (i.e. leading and protecting a lead, and thus a Usage Ratio above 1.00) or in offensive situations (i.e. trailing and in need of a goal, and thus a Usage Ratio below 1.00). I think it does a pretty good job of identifying how a player is used.
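In code form the definition looks like this; the function and the example minutes are mine for illustration, not output from the site:

```python
def usage_ratio(player_toi_leading, team_toi_leading,
                player_toi_trailing, team_toi_trailing):
    """Share of the team's 5v5 leading ice time the player was on for, divided
    by his share of the team's 5v5 trailing ice time (games he played in only).
    Above 1.00 suggests defensive usage; below 1.00 suggests offensive usage."""
    pct_leading = player_toi_leading / team_toi_leading
    pct_trailing = player_toi_trailing / team_toi_trailing
    return pct_leading / pct_trailing

# Hypothetical checking-line forward: on the ice for 320 of his team's 900
# leading minutes but only 200 of its 950 trailing minutes
print(round(usage_ratio(320, 900, 200, 950), 2))  # ~1.69, i.e. defensive usage
```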

I then compared players' Usage Ratios to their 5v5 tied statistics, on the theory that a player used in a defensive role when leading/trailing is more likely to be used in a defensive role when the game is tied. This is also an out-of-sample comparison (which is always a nice thing to be able to do), since we are using leading/trailing situations to identify offensive vs. defensive players and then comparing against 5v5 tied situations, which in no way overlap the leading or trailing data.

Let’s start by looking at forwards using data over the last 3 seasons and including all forwards with >500 minutes of 5v5 tied ice time. The following charts compare Usage Ratio with 5v5 Tied CF%, CF60 and CA60.

[Chart: UsageRatiovsCFPct]

[Chart: UsageRatiovsCF60]

[Chart: UsageRatiovsCA60]

Usage Ratio is on the horizontal axis with more defensive players to the right and offensive players to the left.

Usage Ratio has some correlation with CF%, but that correlation is due solely to its connection with generating shot attempts for, not with restricting shot attempts against. Players we identify as offensive players via the Usage Ratio statistic do in fact generate more shots, but players we identify as defensive players do not suppress opposition shots at all. In fact, Usage Ratio and 5v5 tied CA60 are about as uncorrelated as you can possibly get. One might argue this is because those defensive players are playing against offensive players (i.e. tough QoC), but if that were the case then the offensive players would be playing against defensive players (i.e. tough defensive QoC) and thus should see their shot attempts suppressed as well. We don't observe that. It just seems that players used in defensive roles are no better at suppressing shot attempts against than offensive players, but are, as expected, worse at generating shot attempts for.

Before we move on to defensemen let’s take a look at how Usage Ratio compares with shooting percentage and GF60.

[Chart: UsageRatiovsShPct]

[Chart: UsageRatiovsGF60]

As with CF60, Usage Ratio is correlated with both shooting percentage and GF60, and the correlation with GF60 is stronger than with CF60. Note that the sample size for 3 seasons (or 2 1/2, actually) of 5v5 tied data is about the same as the sample size for one season of 5v5 data (players in this study have between 500 and 1300 5v5 tied minutes, which is roughly the number of 5v5 minutes a forward plays over the course of one full season).

FYI, the dot up at the top with the GF60 above 5 is Sidney Crosby (yeah, he is in a league of his own offensively) and the dot to the far right (heavy defensive usage) is Adam Hall.

Now let’s take a look at defensemen.

[Chart: UsageRatiovsCFPctDefensemen]

[Chart: UsageRatiovsCF60Defensemen]

[Chart: UsageRatiovsCA60Defensemen]

There really isn't much going on here; how a defenseman is used really doesn't tell us much at all about his 5v5 stats (there is only a marginal correlation with CF60). As with forwards, defensemen we identify as being used in a defensive manner are not any better at reducing shots against than defensemen we identify as being used in an offensive manner.

To summarize the above, players who get more minutes when playing catch-up are in fact better offensive players, particularly among forwards, but players who get more minutes when protecting a lead are not necessarily any better defensively. We do know that there are better and worse defensive players (the range of CA60 among forwards is similar to the range of CF60, so if there is offensive talent there is likely defensive talent too), and yet coaches aren't playing those defensive players when protecting a lead. Coaches in general just don't know who their good defensive players are.

Still not sold on this? Well, let’s compare 5v5 defensive zone start percentage (percentage of face offs taken in the defensive zone) to CF60 and CA60 (for forwards) in 5v5 tied situations.

[Chart: DefensiveFOPctvsCF60]

Percentage of face offs in the defensive zone is on the horizontal axis and CF60 is on the vertical axis. This chart is telling us that the fewer defensive zone face offs a forward gets, and thus likely the more offensive zone face offs, the more shot attempts for he produces. In short, players who get offensive zone starts generate more shot attempts.

[Chart: DefensiveFOPctvsCA60]

The opposite is not true, though. Players who get more defensive zone face offs don't give up any more or fewer shots than their low defensive zone face off counterparts. This tells me that if there is any connection between zone starts and CF%, it is solely because players who get offensive zone starts are better offensive players, not because players who get defensive zone starts are better defensive players.

You might again be saying to yourself, 'the players getting the defensive zone starts are playing against better offensive players, so doesn't it make sense that their CA60 is inflated above their talent level (which presumably is better than average defensively)?' This might be true, but if zone starts significantly impacted performance (as would be the case if that statement were true), either directly or indirectly because zone starts are linked to QoC, then there should be more symmetry between the charts. There isn't. Let's look at what these two charts tell us:

  1. The first chart tells us that players who get offensive zone starts generate more shot attempts.
  2. The second chart tells us that players who get defensive zone starts don’t give up more shots attempts against.

If zone starts were a major factor in results, those two statements wouldn't jibe: how can one side of the ledger show an advantage while the other side is neutral? The only way those statements can work in conjunction with each other is if zone starts don't significantly impact results, which is what I believe (and have observed before).

But if zone starts do not significantly impact results, then the results we see in the two charts above are driven by the players' talent levels. Knowing that, we can once again observe that coaches are doing a decent job of identifying offensive players to start in the offensive zone but a poor job of identifying defensive players to start in the defensive zone.

All of this is to say that NHL coaches generally do a poor job of identifying their best defensive players, so if you think the guy getting all those defensive zone starts (aka 'tough minutes') is likely to be a defensive wizard, think again. He may not be.

 

Sep 06, 2013
 

I had first intended this to be a comment on Tyler Dellow's investigation into Phaneuf and Grabovski's shot totals for and against when they were on the ice together, but once I started pulling numbers I decided it was important enough to have a post of its own rather than getting hidden in the comments somewhere. Go read Tyler's post because it is a worthwhile read, but in short he found that when Grabovski and Phaneuf were on the ice together the Leafs were incredibly poor at getting shifts with a shot attempt for while frequently having shifts where they gave up a shot attempt against, and that it had very little to do with the rate of shifts with multiple shot attempts for or against.

This is helpful to know because it narrows the issue: the Leafs’ Corsi% last year with Grabovski/Phaneuf on the ice didn’t collapse because of a change in the rate at which multi-SAF and multi-SAA shifts occurred; it collapsed because the Leafs suddenly became extraordinarily poor at generating the first SAF and preventing the first SAA. If you’re blaming Korbinian Holzer or Mike Kostka or Jay McClement for this, you need to come up with a convincing explanation as to why their impact was felt in terms of the likelihood of the first shot attempt occurring, but not really on subsequent ones.

A lot of people blame Holzer or Kostka or McClement, but I will present another (at least partial) explanation: Phaneuf and Grabovski's numbers tanked because the Leafs were winning. Let me explain.

Here is a table of Phaneuf's CF% over the last 4 seasons in various 5v5 situations: tied, leading, trailing, and overall. Note that part of the 2009-10 season was with Calgary.

Season Tied Leading Trailing 5v5
2009-10 53.4% 44.3% 58.2% 52.3%
2010-11 46.5% 38.6% 54.7% 47.1%
2011-12 47.7% 44.3% 56.4% 49.9%
2012-13 39.6% 35.7% 55.4% 41.9%

In tied and overall situations Phaneuf's numbers tanked quite significantly, particularly last season, but where it gets really interesting is in the leading and trailing stats. When leading, his CF% dropped to 35.7% last year, but he was at 38.6% in 2010-11 and only 44.3% in the other two years, so pretty bad all around. What's interesting is that his trailing numbers have held at significantly higher levels right through from 2009-10 to 2012-13, with relatively little fluctuation (compared to his leading and tied stats).

Now, let’s look at the percentage of ice time Phaneuf played in each situation.

Season Tied Leading Trailing
2009-10 41.2% 28.3% 30.5%
2010-11 31.9% 27.7% 40.4%
2011-12 33.5% 29.8% 36.6%
2012-13 32.9% 42.3% 24.8%

He played much more in tied situations in 2009-10 but maintained about the same share over the following 3 years. Where the big difference lies is in the percentage of ice time he played while leading and trailing: he played far more while leading last year and far less while trailing. When you combine this with the previous table, it isn't a surprise that his Corsi numbers tanked. If we took last year's situational CF% and applied it to his 2011-12 ice time percentages, he would have ended up with a CF% of 44.2%, a fair bit higher than his actual 2012-13 CF% of 41.9%. This means about 29% (or 2.3 CF% points) of his drop-off in CF% from 2011-12 to 2012-13 can be attributed to ice time changes alone. That's not an insignificant amount.
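For anyone who wants to check the 44.2% figure, it is just a weighted average of the two tables above: last year's situational CF% weighted by the prior year's ice time mix.

```python
cf_2012_13  = {"tied": 39.6, "leading": 35.7, "trailing": 55.4}  # Phaneuf's 2012-13 CF% by situation
toi_2011_12 = {"tied": 33.5, "leading": 29.8, "trailing": 36.6}  # his 2011-12 ice time split (%)

counterfactual = (sum(cf_2012_13[s] * toi_2011_12[s] for s in cf_2012_13)
                  / sum(toi_2011_12.values()))
print(round(counterfactual, 1))  # ~44.2, vs. his actual 2012-13 CF% of 41.9
```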

As for the rest, I believe Randy Carlyle's more defensive style of hockey, compared to Ron Wilson's, is a significant factor. When leading, teams play a more defensive game, and as we see above (and as you'll see with other players if you look), CF% when leading tanks compared to when trailing and playing offensive hockey. How much of Phaneuf's drop-off in CF% in 5v5 tied situations last year is due to him being asked to play a far more defensive role? Probably a significant portion of it.

When we take everything into consideration, the majority of Phaneuf's drop-off in CF% last year can probably be attributed to leading vs. trailing ice time differences and to being asked to play a far more defensive role in tied situations, and probably only a very small portion of it can be attributed to playing with Holzer and Kostka or to any change in quality of competition or zone starts (which I still claim have very little direct impact on stats, though they can be a proxy for style of play, defensive vs. offensive).

Now, let’s take a quick look at Grabovski’s stats.

Season Tied Leading Trailing 5v5
2009-10 58.0% 55.8% 56.1% 56.8%
2010-11 52.2% 49.8% 58.0% 53.6%
2011-12 52.8% 46.9% 59.2% 53.7%
2012-13 44.0% 38.2% 55.7% 44.3%

Much the same as Phaneuf. His 5v5 tied stats dropped off significantly but his trailing stats held at a fairly good level. His leading stats have dropped off steadily since 2009-10, probably as he has been given more defensive responsibility.

Season Tied Leading Trailing
2009-10 38.6% 20.3% 41.0%
2010-11 33.3% 28.9% 37.8%
2011-12 33.5% 26.8% 39.7%
2012-13 32.2% 42.7% 25.1%

Nothing too different from Phaneuf; if anything, the changes in leading vs. trailing ice time are more extreme. For Grabovski, 29.8% of his drop-off in CF% last year can be attributed to changes in leading/trailing ice time, while I suspect a significant portion of the rest can be attributed in large part to Randy Carlyle's more defensive game, and to Grabovski in particular being asked to play a more defensive role.

Now, how do the Leafs as a team look?

Season Tied Leading Trailing 5v5
2009-10 52.1% 48.0% 56.1% 52.8%
2010-11 46.1% 41.6% 54.0% 47.8%
2011-12 47.9% 42.1% 55.6% 48.9%
2012-13 43.8% 39.5% 52.2% 44.1%

The Leafs' drop-off in CF% is pretty even across the board. They lost 4.1 points when tied, 2.6 when leading and 3.4 when trailing. Interestingly, that produced a 4.8-point drop overall, larger than the drop in any individual situation, which makes little sense until you look at their leading/trailing ice times.

Season Tied Leading Trailing
2009-10 37.2% 22.0% 40.9%
2010-11 33.6% 28.9% 37.5%
2011-12 33.7% 29.8% 36.5%
2012-13 33.1% 42.0% 25.0%

Tied ice time remained about the same last year as in 2011-12, but leading ice time jumped from 29.8% to 42.0% while trailing ice time dropped from 36.5% to 25.0%. So, if we look at the Leafs as a whole and apply this year's leading/trailing/tied CF% numbers to last year's ice time percentages, they would only have dropped from 48.9% to 45.6%. The remainder of the fall to their actual 44.1% is due to the changes in leading/trailing/tied ice time, or 30.8% of the drop-off.

So, to summarize, about 30% of the drop-off in the Leafs' team and individual CF% from the 2011-12 season to last season can be directly attributed to changes in the Leafs' leading/trailing/tied ice time percentages. This means 30% of the drop-off can be attributed to the Leafs being a far better team last year at getting leads and winning games. Or, if you believe that was largely due to lucky shooting, you can say 30% of the Leafs' drop-off in CF% is due to good luck.

Although I haven't explicitly proven it, I'll contend that a significant portion of the remainder comes down to Randy Carlyle being a far more defensive coach than Ron Wilson was. Maybe another day I'll test this theory by looking at someone like Phil Kessel, who was not given a heavy defensive role last year the way Phaneuf and Grabovski were and thus may not have seen the same drop-off, particularly in tied situations (quick check: Kessel was at 47.3 CF% in 5v5 tied situations in 2011-12 and 42.3% last year, so he saw a significant drop-off too, but not as large as Phaneuf's or Grabovski's). It may also be interesting to look at how ice time changes impact shooting and save percentages, and whether that partly explains the Leafs' high shooting percentage last year and maybe what impact it had on their relatively decent save percentage compared to previous years.

As you can see, though, ice time changes can have a significant impact on a player's statistics, and it is important to take that into consideration in player evaluation, as I did when I looked at Phaneuf's leading/trailing stats a while back.

(All the stats in this post came from stats.hockeyanalysis.com so feel free to go there, pull the data and analyze whichever team or player you want in leading/trailing/tied situations)