Apr 262015
 

Hockey analytics is well behind analytics in other sports, particularly baseball, but we are now several years into what I will call modern (or current) hockey analytics which has largely focused on possession statistics such as Corsi and Fenwick. Last summer we even saw a number of teams publicly adopt analytics by picking up some prominent people from the public domain. Toronto, Edmonton, Carolina, Florida, and New Jersey to name a few. Results for those teams have clearly been mixed thus far but the greater question is whether hockey analytics, and possession analytics in particular, has had a greater impact on the game than just those few teams. I hope to answer some of those questions today.

One of the reasons why possession statistics such as Corsi became so popular is that it has shown that good possession teams often do well and it has also been identified as an undervalued skill as Eric Tulsky wrote about a couple of years ago. Contracts and salaries were generally given by teams to reward skills such as shooting percentage more than possession skills and thus possession skills were an undervalued talent. Teams could tap into this undervalued talent by getting good possession players at a fraction of the cost of good shooting percentage players. I warned that focusing too much on possession statistics is potentially harmful in the long run as it could result in players altering their playing style at the expense of what really matters, out scoring the opposition. I have shown that there is likely at least a loose inverse relationship between Corsi and shooting percentage implying that boosting one Corsi often has the negative consequences of reducing ones shooting percentage. I did this by looking at the impacts of coaching changes on Corsi and Shooting percentage and looking at the relationship between team CF% and Sh% when extreme outliers are removed.

So, the question is, have we started to see this shift where more teams are focused on possession and less so on shooting (and save) percentage? Has this shift altered team statistics and what leads to success in the NHL? Has the spread in talent across teams for the various metrics increased or decreased? To do this I am going to start off by investigating if there are any differences in statistics for the average team that makes the playoffs (or misses) compared to the average team that makes (or misses) the playoffs several years ago. Let’s start by comparing average playoff team GF% vs average non-playoff team GF% over the past 8 seasons (note that all statistics discussed here are 5v5close statistics unless otherwise specified).

Playoff_vs_NonPlayoff_Team_GFPct

Can’t really say too much has changed here. If anything the spread between good and bad teams has increased a bit but it could just be randomness too. The other observation is that 2012-13 is a bit of an anomaly where the non-playoff teams actually had a higher average GF% than the playoff teams did. This makes no sense other than in a shorter season strange things happened. We’ll get into this more but in a bit. For now, lets have a look at CF%.

Playoff_vs_NonPlayoff_Team_CFPct

Outside of 2012-13 that is about as stable of a chart as you could possible find. There was a slight increase in spread in 2008-09 but otherwise in full seasons the spread in CF% between playoff and non-playoff teams has been very persistent.

Now let’s take a look at shooting percentage.

Playoff_vs_NonPlayoff_Team_ShPct

For shooting percentage, not only is the short 2012-13 season an anomaly but so is 2011-12.  In both of these seasons non-playoff teams posted a shooting percentage higher than playoff teams did which I guess puts some water on the argument that lucky teams make the playoffs if one defines luck as posting an elevated shooting percentage. What is also interesting to note though that outside of these two seasons there appears to be a trend towards an increasing disparity between good and bad team shooting percentage. Let’s look at this difference more closely by plotting Average Playoff Team Sh% – Average Non-Playoff Team Sh%.

DifferenceBetweenPlayoffNonPlayoffTeamShPct

The trend line does not include 2011-12 and 2012-13 which I’ll admit could be interpreted as a bit of selection bias (though those are clear anomalies in this chart) but when one does ignore those two seasons the trend is pretty clear. The disparity in shooting percentage between playoff and non-playoff teams is growing.

This is really kind of counter-intuitive. In a hockey world where there is a hard salary cap and where shooting percentage is an expensive talent to acquire one would actually expect teams would have a difficult time keeping all of their high-shooting percentage players. This does not appear to be the case though.

What about save percentage?

Playoff_vs_NonPlayoff_Team_SvPct

Again, 2012-13 is an anomaly season but otherwise it is difficult to identify a trend in save percentage aside from playoff teams always tending to have a better save percentage than non-playoff teams which makes perfect sense.

And just to complete the charts, here is PDO.

Playoff_vs_NonPlayoff_Team_PDO

Not much new here. The short 2012-13 again appears to be an anomaly and 2011-12 driven by Sh% is a bit of an anomaly as well. Also driven by Sh% is what appears to be a slight increase in disparity between playoff team PDO and non-playoff team PDO.

The other thing we can do is look at the spread of these statistics by season by looking at the standard deviation across all teams to see if the spread is increasing, decreasing or staying more or less the same.

StandardDev_TeamGFPct_BySeason

Aside from the 2011-12 seasons there might be an upward trend in the size of the spread in team GF% which is kind of interesting and counter to the popular belief that parity is increasing in the NHL. It is possible that there is increased parity in the middle and what we are seeing above is driven by the extremes (a few extremely good or extremely bad teams).

StandardDev_TeamCFPct_BySeason

The spread in CF% appears to have increased the past three seasons. Could this be due to some teams jumping on board with possession statistics while others are not resulting in the increased disparity? Difficult to say but certainly possible. It could also be that more teams are going the tank and rebuild through high draft pick route (I am looking at you Edmonton and Buffalo).

StandardDev_TeamShPct_BySeason

As one would expect, the short season of 2012-13 produced the greatest spread in team shooting percentages but otherwise the spread in shooting percentage talent across teams has been pretty stable.

StandardDev_TeamSvPct_BySeason

Pretty much the same for save percentage – a bump in the short 2012-13 seasons but otherwise pretty stable.

StandardDev_TeamPDO_BySeason

And we finish it off with the standard deviations of PDO which is surprisingly variable considering how relatively stable both shooting percentage and save percentage were aside from 2012-13. Not quite sure what to make of that variability but there doesn’t seem to be any upward/downward trend otherwise.

From the above charts I think it is very difficult to suggest that there has been much change in outcomes thus far in the NHL’s adoption of analytics. There are some potentially interesting things surrounding shooting percentage and possibly the increased the variability in CF% the past couple seasons but overall we can’t say with any certainty that anything significant has changed thus far. It is still early though and it can take a number of seasons to change a teams focus so we’ll have to keep an eye on it but so far we aren’t seeing much impact.

 

Apr 132015
 

Since the Los Angeles Kings have been eliminated from the playoffs there has been a lot of discussion about why a team with such a good possession game failed to make the playoffs. This included my article from yesterday which generated a fair amount of discussion as well. A lot of the discussion can be summarized by the following tweet by Sunil Agnihotri referencing a comment by Walter Foddis.

The last paragraph is the one that interests me most.

“The substantive reason for LA not making the playoffs is the OT system, which does not reflect team strength. Statistically, OT outcomes have been shown to be a crap shoot. LA was unlucky in OT”

The fact that LA went 1-7 during overtime play does in fact mean that they were unlucky during OT play. They are a better team than that for sure (every team is expected to do better than that). OT results over the course of a single season are extremely random and thus one could consider them a crap shoot. The challenge I have is just because something is highly variable does that mean it is meaningless in our evaluation? Being unlucky in over time does not mean you are unlucky overall.

I’d hazard a guess that outcomes of the first 5 minutes of the second period for games that are played on a Thursday are highly random too. If a team missed the playoffs and had a terrible goal differential during the first 5 minutes of the second period in games that are played on a Thursday can we chalk up missing the playoffs to bad luck during the first 5 minutes of the second period in Thursday games? No, of course not. We don’t get to pick and choose what good luck or what bad luck we can blame results on. Just because we are more aware of bad luck that happens in overtime games doesn’t mean it is more important bad luck worthy of attributing blame to.

The reality of the situation is that unless you can be certain that the Kings OT bad luck is not offset by good luck during the remainder of the game you can’t blame the Kings missing the playoffs on their OT record.  I haven’t seen the complete luck analysis of the Kings season done to claim the Kings were unlucky during regulation and OT play as a whole so I am pretty reluctant to blame the Kings playoff miss on their OT record just yet.

The interesting question for me is whether 4v4 play is indicative of overall talent because if 4v4 hockey requires a completely different skill set then one could conclude that overtime play isn’t representative of true hockey talent. To answer this question I took the correlation between each teams 5v5close GF% over the past 8 seasons (to get large sample sizes though it would reduce the spread in talent) and compared it to their 4v4close GF% over the past 8 seasons (I used close since most 4v4 ice time is in OT and thus in close situations). Here are the results.

5v5close_vs_4c4close_GFPct

And the same for CF%.

5v5close_vs_4c4close_CFPct

Those correlations are good enough for me to consider that 5v5 skills are fairly transferable to 4v4 play and vice versa. Over small samples strange things happen, but to suggest that 4v4 play isn’t indicative of hockey skill and that is why one should ignore OT results is not valid either.

An interesting observation is that the slope on the CF% chart is almost exactly 1.0. The slope on the GF% chart is significantly higher than 1.0 which might indicate that 4v4 play is actually a better indicator of talent than 5v5 play (if you are good at 5v5 play you should be even better at 4v4 play). That said, if I force the intercept to zero the slop drops to 0.9958 or almost exactly even (and r^2 drops to 0.3123 with zero intercept) so maybe 5v5 and 4v4 are on par with each other. Regardless, this should at least alleviate Steve Burtch’s concern that poorer teams are more likely to score first during 4v4 play than during 5v5 play. I don’t believe that to be the case.

Now when we talk about shoot out record I think that it is safe to assume that the shoot out is a lot further from being representative of actual hockey talent than 4v4 play. There is probably not enough shoot out data to actually be able to do a similar analysis with any degree of confidence but I doubt there is much disagreement that the shoot out is a long way from being representative of real hockey.

 

Apr 122015
 

The other day I posted the following twitter comment after the Flames defeated the Kings to gain a playoff position while simultaneously eliminating the reigning Stanley Cup Champion Los Angeles Kings from the playoffs.

I posted this comment for two reasons. First because I think if you are being honest about evaluating possession analytics you have to consider the failures on an equal ground as the successes. I am certain that if the Kings defeated the Flames and ultimately made the playoffs over the Flames there would have been people that would use it as evidence that possession analytics is good at predicting future results. That would be a fair thing to do but you have to consider the failures too and possession analytics failed twice here, first with the Flames making the playoffs and second with the Kings missing. So, I made this comment because analytically it is the correct thing to do and I felt it needed to be said.

The other reason I made this comment was to see how people would react and to see whether people would react with fairness as explained above or in a defensive manner defending possession analytics and dismissing the Flames/Kings outcome as largely luck. For the most part the reaction was more subdued that I had thought but there were some jumping in defense of possession analytics including the following tweet from @67sound.

If you are relying on the LOS ANGELES KINGS to minimize the importance of possession metrics I don’t even know where to begin.

This is an over reaction because I didn’t actually try to minimize the importance of possession, I was just pointing out where it failed. If you follow me I use possession metrics all the time, I just think that there is too much consideration for when possession metrics succeed in predicting outcomes and too little consideration of when it fails and when other metrics succeed. I have talked about this before on a few occasions where people want to point out how well possession metrics are at predicting outcomes but not actually comparing the success rates against other predicting methodologies. In many instances possession statistics do a great job at predicting outcomes, but often goal based metrics actually do slightly better.

The follow up discussion to my tweet soon started to rationalize why the possession stats failed in predicting the Los Angeles Kings missing the playoffs.

Scott Cullen of TSN.ca wrote the following in his Statistically Speaking column about the Kings.

For starters, the Kings were 2-8 in shootouts and 1-7 in overtime games. Given the randomness involved in shootout results, that’s basically coming out on the wrong end of coin flips. 3-15 in overtime and shootout games, after going 12-8 the year before, is enough in tightly-contested standings, to come up short. Records in one-goal games tend to be unsustainable, but there’s enough of them in hockey that they make a huge difference in the standings.

Most of these are fair comments. The shootout record in almost completely random and not actually representative of how good they are at playing hockey (though I disagree with overtime records not being useful in evaluating how good the Kings are at playing hockey). With a bit better fortune the Kings likely would have made the playoffs and probably should have. The thing is though we all need to be careful not to use “luck” as a tool in confirmation bias as luck can be used to explain everything. Flames made the playoffs, write it off as good luck and move on without blinking an eye. They will regress next year, just watch. Kings missed the playoffs, write it off as bad luck and move on without blinking an eye. They will be better next year, just watch. A thorough review needs to be conducted, not just quickly write off anything that goes counter to our beliefs/predictions as luck.

The Kings missed the playoffs this year with 95 points. The previous four seasons they have had 100, 101 (prorated over 82 games), 95, and 98 points. So, on average the LA Kings have been a ~98 point team over the past 5 seasons. If they went 5-5 instead of 2-8 in shootouts that is exactly where they would have finished. For the most part this Kings team is what they have mostly been and what we probably should have expected. That is a good, but not elite, regular season team. Over these past 5 seasons they have finished 18th, 10th, 7th, 13th and 12th place overall. That actually compares somewhat poorly to the cross-town Anaheim Ducks who have finished 3rd, 2nd, 3rd, 25th, and 9th over the past 5 seasons. The Kings score adjusted Fenwick % over that time is 55.3% compared to the Ducks 50.3% and yet four of the five seasons the Ducks finished ahead of the Kings in the regular season. The reason for this is the Ducks have a 9.19 5v5close shooting percentage over the past four seasons compared to the Kings 6.69%. That difference is not luck. It’s a persistent repeatable skill that possession analytics doesn’t capture. Barring major off season roster moves no one should be predicting the Kings to end the regular season ahead of the Ducks next season. I suspect some will though just as was done for this season when using possession analytics to predict regular season point totals (Kings were predicted to get 107 points, Ducks 91).

So the Kings have been a pretty good but not a dominant regular season team. They have won the Stanley Cup twice during this period and have been a dominant possession team which has given us the perception that they are an elite team. Is it possible that we have generally over rated them because of their possession and post season success?  Maybe. Are they really a great team or just a good one that got hot when it mattered a couple times? It’s a question worth asking I think but if you just chalk up missing the playoffs this season to luck it is probably one you won’t be asking.

While we are on the subject of teams that were predicted to regress this season one such team is the Colorado Avalanche. A lot of people are tossing them out as an example of where possession statistics successfully predicted their failures this season. A major reason for predicting this regression was due to regression in their shooting and save percentages as Travis Yost of TSN.ca wrote prior to the season.

Using that regression for forecasting purposes, expect Colorado to shoot around 7.89 per cent for next year at evens and stop around 92.47 per cent of the shots.

Those are 5v5 shooting and save percentages Yost is talking about. In actual fact Colorado’s shooting hasn’t regressed this year as it is more or less identical to last seasons 5v5 shooting percentage (8.75% this season vs 8.80% last season). Save percentage has regressed almost what Yost predicted (92.52%) so he was right there (the role luck played in this is unknown though) but a major (and maybe the primary) reason for the Avalanche’s failures this season is they are playing a substantially worse possession game than last season. Colorado’s 5v5close CF% dropped from 47.4% last season to 42.9% this season which is a massive drop and likely the major reason for their failures this season. That drop can largely be attributed to letting two of their best CF% players leave in the off season – Paul Stastny and PA Parenteau and replacing them with poorer possession players in Iginla and especially Briere. Coaching may be a factor too. So some of the Avalanche’s failures this season can be attributed to a regression in save percentage but a significant part of it is due to poor off-season roster decisions.

Once again, we need to be careful with the “I told you they would regress” and leave it at that if the majority of their regression is due to factors you didn’t predict (to be fair Yost did mention that the Avalanche’s possession might drop a bit due to roster changes as well but it wasn’t the crux of his argument). It is quite possible, if not highly likely, the Avalanche is in fact a well above average shooting percentage team and we shouldn’t expect it to regress next season just as we shouldn’t expect the Ducks to either.

I need to reiterate here that it isn’t that I don’t believe that possession is an important aspect of the game. It is. It is why the Kings are good despite terrible shooting talent. It is why the Leafs are bad despite good shooting talent. What I really want to see and why I always point out where possession failed is because I want to ensure is that everyone evaluates possession fairly in the context of the complete game. I often hear things like “no one ever said possession was everything” and yet I frequently hear claims made without any mention of factors other than possession metrics. The Kings being a perfect example. Everyone assumed they were a great team that, barring massive bad luck, would make the playoffs and when they didn’t make the playoffs they started throwing out all the evidence of that bad luck. Truth is it was perfectly reasonable to predict that with even a little bit of bad luck the Kings could miss the playoffs though I don’t recall anyone really suggesting that (correct me if I am wrong though). It is also fair to suggest that if Colorado made smarter off season roster moves they could have been a playoff team again and not regress nearly to the extent they did but the discussion about the Avalanche revolved around bad possession, high PDO, they were lucky and will regress a lot. I want to see a better balance in hockey analytics as I think too much of hockey analytics is dominated by possession analytics. That is why I write tweets like the one about the Kings and Flames. There needs to be more balance.

So, my final words of advice is if you don’t believe that possession is everything (which apparently none of you do) you ought to be doing more than just conducting possession analytics. If you can honestly say you are doing that I congratulate you. If you can’t, well, what you do next is up to you.

 

Mar 212015
 

The other day on twitter I was called out by Sam Ventura who does some great work on war-on-ice.com. Specifically he did not like my article on zone starts that I wrote the other day.

Let me step in here and say that I have never denied this. Offensive zone face offs are more likely to result in shots for the team on offense and less likely for the team on defense. Ok, that is settled, lets move on.

 

This is the crux of the problem. At the micro level yes, the location of face offs impacts outcomes. On the macro or aggregate level they are minimal. I tried to explain that here in more detail but maybe it didn’t come across too well so I will try again, in another way, with the war-on-ice tools. Let’s look at the Shea Weber picture from above Sam’s tweets above.

Weber1

Ok, do there looks like a relationship. The higher the offensive zone start percentage the higher the CF%. Now, let’s take a look at the same chart but Offensive zone start percentage relative and see how the chart changes.

weber2

Significantly less correlation. Why? Because when the team is playing well the team as a whole generates more offensive zone starts. Not the other way around. We can also flip it around and look at how ZSO% compares to CF%Rel.

Weber3

And to finish the display we can look at ZSO%Rel vs CF%Rel.

Weber4

The relationship that Sam has observed is largely team driven, not Weber’s zone starts driven. There is a zone start impact on a players statistics but it is very minimal and for the majority of players can safely be ignored. The impact of the team is far more important. When the team does well it will result in a better CF% which in turn results in a higher ZSO% which is the reason for the high correlation. Zone starts don’t drive CF%, CF% drives zone starts. This makes total sense because the majority of zone starts will come after a shot on goal. The shot on goal produces the offensive zone face off, it isn’t the offensive zone face off that produces the shot on goal. We need to think of zone starts more as a result, not a cause.

On top of the team effect, I believe there is a style of play impact too which will take away even more correlation. When you play defensive hockey you often give up more shots. We see it in score effects all the time. Players who start more in the defensive zone are more likely to be the ones playing defensive hockey. This adds to the correlation as well and has nothing to do with zone starts.

Let me leave you with Phaneuf’s charts because his correlation in Sam’s charts was probably the greatest.

 

 

Phaneuf1

Phaneuf2b

Again, a significant portion of the relationship disappears when you look at ZSO%Rel.

For me, the main evidence that zone starts don’t have a significant effect on a player’s overall statistics is if I remove the 45seconds after all offensive/defensive zone face offs (which basically ignores the entire shift) the majority of players have the same CF% +/- 1% and only a handful with heavy offensive or defensive zone starts have an effect in the +/- 1-2%. If removing all shifts that start with an offensive or defensive zone start does not dramatically impact a players overall statistics you simply cannot conclude that zone start bias plays a prominent role in driving a players overall statistics. Yes, for a particular shift it will, but not overall. Furthermore, the majority of that impact occurs in the first 10 seconds after a face off which is why my zone start adjusted data removes these 10 seconds which is something I showed over 3 years ago.

The critical point to remember in all of this is shots drive where face offs occur, where face offs occur do not drive shots. Coaching and line changes for face offs can impact overall player statistics a little but really not all that much.

 

Mar 162015
 

Matthew Coller has an interesting article on Puck Prospectus about Shea Weber and his poor Relative Corsi. His conclusion was that Weber’s poor Relative Corsi is largely due to his playing time with Paul Gaustad in which he posted a very poor CF% along with having a very heavy defensive zone start. His conclusion was that Weber’s poor Corsi with Gaustad is in a significant way caused by the heavy defensive zone start bias. This is a case of correlation not causation as I outlined in the comment section of that article. I recommend you take the time to read both the article and my comments because they are worthwhile reads.

My issue with the article is that I don’t believe that zone starts dramatically impact a players overall statistics as I explained here. I just haven’t seen any convincing evidence that zone starts would change a players CF% much more than 1-2% and for most players considering zone starts in player evaluation is not important. The relationship that Coller observed is important though because there is a clear relationship between zone starts and CF%. The relationship isn’t causal though. What the zone starts signify is a style of play. Players with a heavy defensive zone start bias are likely asked by the coach to play a defense first game and in many cases generating offense is not an important issue. The result is often a relatively minor deviation in a players CA/60 but a major deviation in a players CF/60 from the overall team stats. Let’s look at Paul Gaustad as an example. Gaustad has a OZone% of just 12.2% which means he has over seven times as many defensive zone starts as offensive zone starts. Here are how his Corsi stats compare to Nashville’s overall stats in 5v5close situations this season.

CF60 CA60
Nashville 60.0 53.0
Gaustad 38.8 51.9

As you can see, despite a heavy defensive zone start bias when Gaustad is on the ice the Predators actually gives up slightly fewer shots attempts against than they do overall but it is pretty close. Offensively though, when Gaustad is on the ice there is significantly less offense generated. If zone starts are the explanation one would probably expect there to be more balance between more shot attempts against and fewer shot attempts for but this is not the case. The likely explanation is that when Gaustad is on the ice the team is largely focused on not giving up a goal rather than generating offense. I suspect they do this largely by not giving up the puck and maintaining puck possession when you get possession. When you take a shot you are actually giving up control of the puck. You may regain control but so might the other team. If you are focused on preventing goals the best way to do that is to not give up the puck.

Lets take a quick look at Filip Forsberg who has played with a heavy offensive zone start bias indicating he is probably used in more offensive situations.

CF60 CA60
Nashville 60.0 53.0
Forsberg 69.4 53.2

Forsberg’s CA/60 is actually very similar to the team average and not all that different from Gaustad’s (higher actually) but his CF/60 is almost 80% higher. Again, this is unlikely to be zone start influenced but rather some combination of talent and playing style.

So, it seems that Ozone% is likely an indication of style of play, or at least an indicator of the main objective of the players on the ice, and we have seen that this can have a major impact on shot attempt rates.  I want to take this discussion one step further by looking at whether players can influence shooting/save percentages based on their style of play. Since shooting/save percentages are highly variable over small sample sizes such as the number of shots for/against taken while a player is on the ice during a single season we need to find ways to work around the randomness associated with the percentages. One way to do this is to group players based on similar attributes and take a group average. One of my favourite hockey analytics articles was this one written by Tom Awad in which he grouped similar players based on ice time and in doing so he found that shooting better than your opponent is a major factor in what makes good players good. In this case I have grouped players based on their OZone% and then took a group average Sh%RelTM and Sv%RelTM during 5v5close situations.

Ozone% Sh% RelTM Sv% RelTM
<30% -0.92% 1.26%
30-35% -0.43% 0.59%
35-40% -0.38% 0.80%
40-45% -0.18% -0.03%
45-50% -0.07% -0.07%
50-55% 0.48% 0.10%
55-60% 0.50% -0.16%
60-65% 0.52% 0.36%
65+% 0.24% -1.07%

Graphically here is what we get.

ZoneStarts_vs_Percentages

As you can see, there is a fairly strong relationship between zone starts and Sh%RelTM and Sv%RelTM. Players with a heavy defensive zone start will generally have a positive impact on his teams save percentage and a negative impact on his teams shooting percentage. Conversely players with a heavier offensive zone start bias will generally have a positive impact on his teams shooting percentage and negative impact on his teams save percentage. Some of this is likely player talent but a significant portion of it is likely driven by style of play as we saw with Corsi. It is next to impossible to identify these relationships by looking at individual players statistics because of the small sample sizes but when we group similar players together the relationship becomes clear and is a relatively strong one.

For perspective, Paul Gaustad’s OZone% over past three seasons with Nashville is 21.2% while his Sh%RelTM is -1.4 and his Sv%RelTM is +1.9.

The major takeaways I hope people get from this article are the following:

  1. Zone starts really do not have a significant impact on a players statistics.
  2. Zone starts can be an indicator of a players style of play and style of play can have a major influence on a players statistics (see my Coaching/Corsi dilemma article for more evidence of how style of play impacts Corsi).
  3. Players are able to, through talent and/or playing style, influence save and shooting percentages.
  4. Finding trends in shooting/save percentages can be difficult due to small sample size issues but that does not mean they do not exist. Hockey is a complex sport to analyze but being creative in grouping similar players can allow you to pull out valuable information that you otherwise could not.

 

 

 

The Coaching-Corsi Dilemma

Feb 24, 2015
 

The other day I wrote about the Bozak-Corsi dilemma, which basically goes as follows:

  • The coaching change in Toronto from Carlyle to Horachek resulted in Tyler Bozak and the rest of the Leafs' top line posting dramatically improved Corsi (5v5 tied CF%). Does this mean Bozak et al. suddenly got good, or does it mean that Corsi is largely driven by playing style, which can be changed, and thus that the value of Corsi in player evaluation is greatly diminished?

Today I will look at the rest of the Leafs' 5v5 tied CF% from Carlyle to Horachek, as well as three other coaching changes that occurred during the 2011-12 season: Bruce Boudreau to Dale Hunter in Washington, Randy Carlyle to Bruce Boudreau in Anaheim, and Terry Murray to Darryl Sutter in Los Angeles. Expanding the analysis to more players and teams will determine whether the Bozak-Corsi dilemma can be generalized to a broader Coaching-Corsi dilemma. First, let's summarize how team 5v5 tied CF% changed with each coaching change.

Coaching Change                 CF% Pre   CF% Post   Difference
Kings – Murray to Sutter        50.7      58.5        7.8
Ducks – Carlyle to Boudreau     42.6      48.7        6.1
Leafs – Carlyle to Horachek     45.2      51.1        5.9
Capitals – Boudreau to Hunter   56.7      48.1       -8.6

The biggest positive impact was with the Kings, while the Capitals' change from Boudreau to Hunter had the biggest negative impact and the biggest change overall. All four coaching changes had a significant impact on the team's overall CF%. Let's look at the teams in the order listed above, starting with the Kings.
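For anyone new to the metric, team CF% is simply shot attempts for divided by total shot attempts, and the "Difference" column is the post-change CF% minus the pre-change CF%. A minimal sketch, where the shot-attempt totals are illustrative round numbers chosen to reproduce the Kings' percentages, not their actual counts:

```python
def cf_pct(cf, ca):
    """Corsi-for percentage: the team's share of all shot attempts."""
    return 100.0 * cf / (cf + ca)

def coaching_change_impact(pre, post):
    """pre/post: (CF, CA) shot-attempt totals under the old/new coach."""
    return cf_pct(*post) - cf_pct(*pre)

# Illustrative totals: 507 CF / 493 CA pre (50.7%), 585 CF / 415 CA post (58.5%)
delta = coaching_change_impact((507, 493), (585, 415))  # 7.8, as in the Kings row
```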

[Chart: Kings player CF% under Murray vs Sutter, 2011-12]

Shown here is each player's CF% under Murray (in blue) and under Sutter (in orange), with the difference shown in the grey bars. Only players with at least 50 5v5 tied minutes under both coaches are included. The first four players saw their CF% jump by at least 10%, the next four by at least 7.5%, and 11 of the 14 players saw their CF% jump by at least 5%. Only Drew Doughty saw his CF% drop, but he already had a team-best 59.3 CF% under Murray.

Here is the chart for the Ducks.

[Chart: Ducks player CF% under Carlyle vs Boudreau, 2011-12]

Every single player saw their CF% jump at least a little after the coaching change. Visnovsky's and Getzlaf's CF% jumped at least 10%, while Perry, Sbisa and Lydman jumped at least 7.5%, and Selanne and Cogliano at least 5%.

Now for the Leafs.

[Chart: Leafs player CF% under Carlyle vs Horachek, 2014-15]

JVR, Bozak, Rielly and Kessel saw at least a 10% boost in their CF%, while Polak and Gardiner saw boosts of at least 8%. No other player saw a jump of more than 3%, and Komarov actually saw a significant (10.8%) drop.

Now, for a reversal of fortunes here is the Capitals chart.

[Chart: Capitals player CF% under Boudreau vs Hunter, 2011-12]

For the Capitals, every single player saw a drop of at least 3.9% (in fact, only Wideman saw a drop of less than 5%), with the first six players seeing a drop of at least 10% and three more of at least 9%.

In total there are 53 players in the charts above; 17 of them saw an absolute change of at least 10%, while another 13 saw an absolute change of at least 7.5%. The average change was just shy of 8%. Looking at these four coaching changes, it is safe to say that it is not unusual for a coaching change, or a change in playing style, to shift a player's 5v5 tied CF% by 10% or more (nearly one third of the players above saw that big a change). If a normal range for 5v5 tied CF% is between 40% and 60%, I think it is safe to suggest that half or more of that spread might be due to playing style and not individual talent. Furthermore, there are almost certainly different playing styles on a single team (some lines play more defensive roles while others play more offensive roles), so even looking at CorsiRel stats might not factor out all coaching decisions. It certainly appears that Kessel, Bozak and JVR saw a far more significant boost in their CF% relative to the rest of the team, indicating that they likely changed their playing style the most.
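The aggregation above is straightforward counting. A sketch of how the summary numbers (average absolute change, counts over thresholds) could be computed from a list of per-player CF% differences; the sample data in the usage line is made up, not the actual 53-player set:

```python
def change_summary(deltas):
    """Summarize per-player CF% changes (post-coach minus pre-coach),
    expressed in percentage points."""
    abs_d = [abs(d) for d in deltas]
    return {
        "players": len(deltas),
        "avg_abs_change": sum(abs_d) / len(abs_d),
        "at_least_10": sum(1 for d in abs_d if d >= 10.0),
        "at_least_7_5": sum(1 for d in abs_d if d >= 7.5),
    }

# e.g. change_summary([12.0, -11.0, 8.0, 3.0]) reports 2 players at >= 10
# absolute change and an average absolute change of 8.5
```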

Above I looked at four coaching changes which had an average absolute impact of an 8% change in 5v5 tied CF%, with nearly one third of the players having an absolute change of greater than 10%. The majority of NHL players end a season with a 5v5 tied CF% between 40% and 60%. Based on the above analysis it is probably reasonable to believe that at least half of that spread can be attributed to variations in coaching/playing style, which means the actual talent spread is probably no wider than 45% to 55%, possibly even narrower.

Furthermore, I have previously shown that FF% (and thus likely CF%) loses predictive ability over longer periods of time at the team level. A significant reason for this is likely the higher number of coaching and roster changes that occur over a 4- or 5-year span. Every coaching change, and every time a player changes teams (or even the line they play on), can potentially lead to a playing style change which could significantly impact their CF%. Of course, none of this should come as much of a surprise, as we already know playing style can have a major impact on CF% thanks to score effects. On average, a team's 5v5 CF% when they are leading is about 10% higher than their 5v5 CF% when they are trailing. This 10% difference in CF% due solely to playing style dictated by the score lines up fairly well with what we have seen above, where a 10% change in CF% due to a coaching change is not abnormal. The Coaching-Corsi dilemma is real.

What all this means is that we need to consider playing style when we evaluate players, because playing style can have a major impact on a player's statistics. In fact, it may be the most important factor in a player's Corsi statistics. This is something we rarely do in analytics, but failing to do so could result in a very flawed player evaluation. It is something the hockey analytics community really needs to address in future research.

 

The Bozak-Corsi Dilemma

Feb 22, 2015
 

(Note: This is a cross post with MapleLeafsHotStove.com. You can find the original article here. I don't normally cross post, but this is relevant to hockey analytics as a whole, not just to Maple Leafs fans.)

A significant portion of modern hockey analytics revolves around Corsi (or SAT% as defined by the NHL), which is really nothing more than looking at which team takes more shot attempts. If you can out shoot your opponent, the theory is that it goes a long way to driving success in terms of out scoring your opponent and ultimately winning games. There is a lot of evidence to support the case that Corsi is a major component of on-ice success. While I believe many people put too much weight on Corsi statistics, I do accept that it is a major component of success.
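Concretely, Corsi counts every shot attempt: goals, shots on goal, misses and blocked shots. CF% (or SAT%) is then the share of all attempts taken by your team while you are on the ice, usually restricted to a game state such as 5v5 score-tied. A minimal sketch of computing 5v5 tied CF% from a list of shot-attempt events; the event fields here are assumptions for illustration, not any real feed's schema:

```python
def cf_pct_5v5_tied(events, team):
    """events: dicts with 'team', 'strength' and 'score_diff' for each
    shot attempt (goal, shot on goal, miss, or blocked shot).
    Returns the team's 5v5 score-tied CF%, or None with no qualifying events."""
    cf = ca = 0
    for e in events:
        if e["strength"] != "5v5" or e["score_diff"] != 0:
            continue  # keep only 5v5 attempts taken with the score tied
        if e["team"] == team:
            cf += 1
        else:
            ca += 1
    total = cf + ca
    return 100.0 * cf / total if total else None
```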

Over the past few weeks, I have looked at the Leafs' performance this season under Randy Carlyle and under Peter Horachek. First I looked at how zone start usage changed from Carlyle to Horachek and the impact of those changes on Corsi. Last week, I looked at a WOWY analysis of Tyler Bozak and David Booth to see if the change in linemates from Carlyle to Horachek accounted for the changes in results. The conclusion from these posts is that a significant portion of the Leafs' improved Corsi statistics is driven by the Leafs' top line, and that outside of the top line not a lot has changed with respect to their Corsi statistics. To highlight the improvement in the Leafs' top line, here are their 5v5 tied CF% statistics under Carlyle and under Horachek.

                 Bozak CF%   Kessel CF%   JVR CF%
Under Carlyle    38.4        41.0         39.0
Under Horachek   53.6        52.0         55.0
Difference       15.2        11.0         16.0

Under Carlyle, the trio of Bozak, Kessel and JVR were pretty close to a league-worst Corsi line, with Bozak being the worst of the three. Under Horachek, they are well above the break even 50.0% line and have put up pretty good Corsi percentages. As far as Corsi is concerned, this trio went from downright awful to well above average. All it took was, I presume, a playing style change demanded by a new coach.

For several years it has been believed that Corsi is an important tool in evaluating players. It was a major component of what has driven the analytics community to conclude that Bozak is a poor hockey player. The evidence above suggests that a simple playing style change can drive Corsi from downright terrible to pretty good. This leads to a bit of a dilemma within hockey analytics, which I will call the Bozak-Corsi dilemma, with two serious questions that need to be answered:

  1. Is Bozak now a pretty good player?
  2. More importantly, if a player (or a forward line) can dramatically alter their Corsi overnight, seemingly solely through a change in playing style (driven by a coaching change), must we conclude that Corsi is not primarily driven by individual player talent?

The first point will cause some angst within the Leafs fan base, but from my perspective the answer is no, because his (goal) WOWYs, Points/60, IPP, etc. are also pretty weak, although maybe he isn't as bad as previously thought if he plays an optimal style.

The second point is critically important, though, because it basically implies that Corsi has significantly less value (maybe little or no value) in individual player evaluation than previously thought, which should send ripples throughout the hockey analytics community. If Corsi is largely driven by playing style, must one conclude it isn't an individual skill? That isn't something I'd conclude based on three players, but it definitely makes you think about it more.

 

Feb 21, 2015
 

I wasn’t actually planning on writing anything formal about the new enhanced hockey stats on NHL.com but this post over at Jewels From The Crown was kind of the last straw.

Before getting into that article, let me say a few things. Despite the fact that I run a popular hockey stats site, I really wanted to see the NHL do a good job on their advanced hockey stats site. I honestly don't see them as a competitor, nor do I really care if they are, because I make no money off the site and my interest lies as much in analysis and research as it does in producing and running a website. I also see the NHL.com site as geared more to the average, casual user, while my site is geared more towards the hard-core user and researcher. I love hockey, I love hockey statistics and hockey analytics, and I really would have loved to see the NHL do this right and bring all of this to a wider audience than I, or any of the other hockey stats sites, ever could. While I still have that hope, my impression of this first attempt is that it is a very poor effort that could have gone much better.

So, now, what set me off on this bit of a tirade? Well, the post at Jewels From the Crown featured an interview with Chris Foster, NHL Director of Digital Business Development, and Gary Bettman. In it, Sheng Peng asked what 'exclusive' stats NHL.com offers over other sites such as mine. This was a portion of the answer.

Foster: I’ve got to double check. I’m not sure. There’s zone starts, I think those are completely brand-new. And the level of depth that we’re doing with primary and secondary assists. I don’t think anybody’s going to have that much detail. That first batch—shot attempts and unblocked shot attempts–there’s a lot of that. It’s that second batch of stats—primary assists and penalties drawn over time—those are the ones that will be more unique to the site. They may be out there but not to the level of depth that will be on NHL.com.

Zone starts? Brand-new? Really? Zone starts have been around for years. Primary assists are exclusive to NHL.com? Really? I've had them on my site for years too; I even go a step further and look at primary points (goals + primary assists). Penalties drawn have been available elsewhere too. I'll give the NHL the benefit of the doubt and believe that they are actually oblivious to what else is being done out there, because otherwise they are outright misleading and belittling the hard work that I and many others have done previously. Looking at their enhanced statistics site, it is clear that they haven't put much thought into this whole project or reached out to the analytics community, because there are tons of things that I think many would suggest they do differently. Here are a few examples:

  1. SAT and USAT are short for Shot ATtempts and Unblocked Shot ATtempts, otherwise known as Corsi and Fenwick respectively. I am OK with the name change, but for the NHL's target audience it is absolutely unnecessary to use both. Even I, as an analytics person, at times wonder why we have both Fenwick and Corsi. They are extremely highly correlated, and the benefits of one over the other are generally very minimal. For the casual user it is completely unnecessary to burden them with two separate stats. It would have been far better to simply use shot attempts (Corsi) and leave out the unblocked variety. Shot attempts are simple, straightforward, and easy to understand.
  2. The Skater Shooting/Time on Ice pages have both /20 and /60 statistics, which is redundant and pointless. One is just three times the other. Why one would see the need to present both side by side on the same page is beyond me. Let's present a stat. OK, now let's multiply it by three and present that too. Who thinks like that? Really? Who? Furthermore, when I started my site I used /20 stats because I figured a good player plays about 20 minutes per game. Other sites used /60 because a game is 60 minutes long and it tells how that caliber of player would produce over a full game. Both have merit, but for the purpose of consistency across sites I have converted all my stats to /60. Had they reached out to me I'd have told them this, and they may very well have done the smart thing and presented only /60 stats.
  3. Having a stats site without being able to filter based on games played or time on ice is practically useless. When I sort by SAT% I want to see who the best players are among those who play regular or semi-regular shifts. Instead, the top of the list is dominated by AHL call-ups with one or two games played that nobody has heard of and nobody cares about. Why do this?
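Points 2 and 3 can be sketched in a few lines: a /20 rate is always exactly the /60 rate divided by three, and a minimum time-on-ice filter removes small-sample call-ups before ranking. The player records here are hypothetical, purely for illustration.

```python
def per60(count, toi_minutes):
    """A raw count converted to a rate per 60 minutes of ice time."""
    return 60.0 * count / toi_minutes

def per20(count, toi_minutes):
    """Always exactly per60 / 3, which is why presenting both is redundant."""
    return 20.0 * count / toi_minutes

def ranked_by_sat_pct(players, min_toi=200.0):
    """Drop players under min_toi minutes, then sort by SAT% descending,
    so the top of the list is regular-shift players, not two-game call-ups."""
    qualified = [p for p in players if p["toi"] >= min_toi]
    return sorted(qualified, key=lambda p: p["sat_pct"], reverse=True)
```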

There are numerous other smaller mistakes as well (see Eric Tulsky's twitter timeline for a few of them). It's a shame, really, because I was hoping for and expecting a whole lot better. I applaud the NHL for hopping on the 'enhanced statistics' bandwagon, but what they released today screams of a poorly thought out beta release of a product developed by a group of amateurs, a long way from the major new product release by a multi-billion dollar organization (NHL), backed by another multi-billion dollar organization (SAP), that they promoted it as being.

I really do hope that the NHL gets their act together and makes it work, as I think it will be good for everyone: the NHL, the casual fan, and those in the hockey analytics community. We all benefit when the NHL does things well. We are, at the core, all hockey fans. My hope with this post is that it inspires the NHL to spend more time reaching out to the people who have been doing this for years. We have years of experience, knowledge and expertise that would have helped avoid many of the basic and senseless missteps we see today. If you are with the NHL and are reading this, I want you to know that I am more than willing to share my experience, and I am certain most everyone in the hockey analytics community would be as well. You just have to ask. My e-mail is david@hockeyanalysis.com.

Stat Site Upgrades

Feb 02, 2015
 

Some of these have already been announced on twitter, but I have recently made a number of upgrades to stats.hockeyanalysis.com and puckalytics.com. Here is the list.

New Situations:

  • Home and Road for 5v5 Tied, 5v5 Close, 5v5 Leading and 5v5 Trailing
  • 4v4
  • All Situations (all play)
  • All Power play (includes 5v4, 5v3, 4v3)
  • All Short handed (includes 4v5, 3v5, 3v4)

Multi-year stats with current season

  • 2013-15 (2yr), 2012-15 (3yr) and 2011-15 (4yr) stats have been added
  • Multi-year stats up to 2007-15 (8yr) will be added in the off season – too much data to update nightly.

 WOWY Zone Starts (stats.hockeyanalysis.com only)

  • WOWYs now include OZFO%, DZFO%, NZFO% and OZone% (tweaked UI a bit from initial release yesterday)
  • WOWYs are also now /60 instead of /20 as they were previously (now consistent with the rest of the site and puckalytics.com)
  • “Against You” stats now available for current season (currently only opponents with >15min TOI against but this will drop to 5 min. after update tonight)

Percent of Team Zone Starts (puckalytics.com only)

  • Now available is the percent of a team's offensive/defensive/neutral zone starts (in games the player played in only) that the player was on the ice for.

Various bug and data fixes

  • Fixed issue with Percent of Team stats for special team situations
  • Manually fixed a bunch of errors in NHL.com shift tables over past 4 seasons (should improve reliability of data)
  • Probably some others I have forgotten about

 

That’s all for now. As usual, if you find any problems or have any more requests for enhancements let me know.

 

The Value of Outliers

Jan 25, 2015
 

Ryan Stimson has been doing some valuable work tracking passes, and this morning he posted an interesting analysis of the data he (and others) have collected thus far. It is definitely worth a read and a valuable contribution to shot quality research, but the article created some twitter discussion regarding one of the techniques Ryan used. In particular, when Stimson looked at the correlation between two variables (e.g. passing ability vs shooting percentage), he noticed that there was often an outlier team, and he would subsequently look at the correlation between the two variables with the outlier team eliminated. This technique of removing outliers generated a bit of a backlash on twitter from @garik16, as it did when I used the same technique not long ago.

While I think that removing outliers has to be done with great caution and consideration, it is also important to acknowledge that outlier analysis can be an incredibly valuable tool in understanding what is going on. Teams aren't built randomly, and talent isn't evenly distributed across the league. Talent differences across teams may result in different statistical patterns across teams. Different organizations have different philosophies on players and playing styles, and this too may impact statistical patterns. As I have said before, we know that teams can manipulate statistical patterns by changing their playing style based on the score of the game (score effects are a well researched and fully accepted concept in hockey analytics), so it isn't difficult to envision various other statistical patterns being altered by organizational or coaching philosophies. As statistical analysts we have to be open to this and not just apply a statistical model, crank out the results, and settle on hard and fast conclusions. We need to spend the time to understand the underlying data too.
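The technique in question is easy to make concrete: compute a correlation across all teams, then again with a suspected outlier removed, and compare the two to see how much one team drives (or masks) the trend. A pure-Python Pearson r for illustration; the data in the usage line is invented, not Stimson's.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def pearson_without(xs, ys, drop_idx):
    """Re-run the correlation with the team at drop_idx (the outlier) removed."""
    xs2 = [x for i, x in enumerate(xs) if i != drop_idx]
    ys2 = [y for i, y in enumerate(ys) if i != drop_idx]
    return pearson(xs2, ys2)

# e.g. pearson(teams_x, teams_y) vs pearson_without(teams_x, teams_y, chi_idx)
# shows how much a single team like Chicago changes the league-wide trend
```

The point of the comparison is not to discard the outlier and move on, but to flag it for further study: a large gap between the two correlations tells you one team is behaving very differently from the rest of the league.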

I have spent a significant portion of my career working on air pollution research with some world-renowned scientists. Many years ago, not long after I finished university and was just embarking on my career, I was conducting research on the relationship between weather patterns and air pollution. During that work, a research scientist I highly respect told me that often the most interesting things can be learned when we study outliers. In that field, typical weather patterns resulted in typical pollution levels, but the study of outliers (atypical weather patterns) really highlighted the intricate relationship between weather patterns and air pollution.

Hockey isn’t baseball where there are a series of one-on-one battles that can be relatively easily incorporated into a statistical model because the only real factors involved are the talent levels of each player in the one-on-one battle. Unfortunately this isn’t how hockey works. Hockey is more like weather patterns where everything is interdependent on everything else and thus is very difficult to model. Sure, there are prevailing weather norms but occasionally outlier events happen like hurricanes or blizzards. It is these outliers that are the most interesting and most researched weather phenomena. Compared to a hurricane or a blizzard nobody really cares much about another 80F sunny day in Miami or a -5C January day in Ottawa. It’s just another day.

So, when I see someone suggest that you shouldn't investigate how outliers affect underlying trends, I get a bit defensive. If all you care about is what normally happens, you'll never truly understand the most interesting stuff. No NHL team strives to be ordinary; they strive to be elite, and being elite, by definition, means being an outlier. If you want to be an outlier, you ought to do everything you can to understand what makes an outlier an outlier.

In one of Stimson’s charts he identified Chicago as the outlier team. Interestingly, I identified Chicago as an outlier team in my study on the relationship between Corsi and shooting percentage because they are one of the few teams that can post a good Corsi and an elevated shooting percentage. Furthermore, when it comes to elite NHL teams, Chicago would be front and center in the discussion. Is this a coincidence? Maybe. Or maybe it isn’t. It could be luck, it could be skill, or it could be organizational philosophy and/or coaching tactics but understanding why outliers exist is of critical importance. (Note: This is where I see the convergence of hockey analytics with traditional ‘hockey people’ like coaches and scouts. Analytics can identify trends and outliers to those trends and coaches and scouts can help assess the reason why those trends and outliers occur.)

Ultimately, any NHL franchise that strives to be an elite team (which they all should) is striving to be an outlier. Without understanding what makes an outlier, how can you expect to become one? And you will only understand what makes an outlier by studying outliers independently from the underlying typical trend. This needs to be done with caution and care, so as not to simply reinforce preconceived beliefs, but by not doing outlier analysis you are not fully understanding what is happening.