Jun 30 2015
 

I am sure this post will ruffle some feathers in the hockey analytics community, but hey, it won’t be the first time I have accomplished that.

I have been looking through the list of potential free agents for players that are possibly undervalued, possibly overvalued, or otherwise interesting for one reason or another. There has been a fair bit of discussion around the three players that are the focus of this post. Justin Williams has been a favourite of the hockey analytics community, posting outstanding Corsi numbers year after year. Alexander Semin, who was bought out by the Carolina Hurricanes, is one of those guys who seems to be hated by coaches, scouts, general managers, and “traditional hockey people,” but analytics people look at his numbers and, last season aside, they look outstanding. Matt Beleskey is an unrestricted free agent that hockey analytics people want to warn teams about because he is coming off a career year with 22 goals driven largely by a high, and unsustainable, shooting percentage. The hockey analytics community is predicting he will be one of those guys teams overpay for and regret signing a year from now. So, I figured it would be worthwhile taking a deep look at these players because, from my observations, the deeper you look the more interesting things become and the story potentially changes.

I am rushing a bit to put this post together so it may come across as just me throwing out some numbers and charts. I apologize for that but bear with me, there is an interesting story that will develop.

For this post all numbers will be 5v5close numbers to minimize the impact of score effects. I am also going to focus on my RelTM statistics, which look at how each player influences his linemates. It is like a combined WOWY analysis where we can determine whether a player’s teammates perform better with him or apart from him.
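The RelTM idea can be sketched in code. This is not the actual Puckalytics implementation, just a minimal illustration of the combined-WOWY concept, with invented numbers and the assumption that shared ice time is used as the weight:

```python
# Conceptual sketch of a RelTM ("relative to teammates") statistic.
# Field names and weighting are illustrative assumptions, not the
# exact Puckalytics calculation.

def cf60_rel_tm(teammate_splits):
    """One entry per teammate: that teammate's CF60 while with the
    player, his CF60 while apart, and the TOI the pair shared
    (used here as the weight)."""
    total_toi = sum(t["shared_toi"] for t in teammate_splits)
    return sum(
        (t["cf60_with"] - t["cf60_apart"]) * t["shared_toi"]
        for t in teammate_splits
    ) / total_toi

splits = [
    {"cf60_with": 58.0, "cf60_apart": 54.0, "shared_toi": 600},
    {"cf60_with": 55.0, "cf60_apart": 56.0, "shared_toi": 400},
]
print(cf60_rel_tm(splits))  # 2.0: teammates generate more attempts with him
```

A positive value means teammates do better with the player than apart from him, which is exactly how the charts below should be read.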

Corsi

Let’s look at the corsi statistics first starting with the offensive and defensive components and then corsi percentage.

SeminWilliamsBeleskey_CF60RelTM

Here higher is better as it means teammates have a higher CF60 with the player than apart from him. The 3-year average is their CF60RelTM over the past three seasons. For the most part Williams is the best, Beleskey is the worst, and Semin bounces around a bit.

SeminWilliamsBeleskey_CA60RelTM

Here lower is better as it indicates there are fewer shots against when teammates are playing with the player than apart from him. Things looked a little different this past season, but prior to that Williams was always better than Beleskey and Semin bounced around a bit. This past season both Beleskey and Semin were better than Williams.

SeminWilliamsBeleskey_CFPctRelTM

 

Higher is better on this chart. What we see is Beleskey getting better, Williams getting worse, and Semin relatively stagnant or maybe showing a slight drop off. Last season the three players were almost identical. One has to wonder if age effects are at play here: Beleskey is 27 years old and has been entering his prime years, while Williams is 33 years old and is starting to leave his. Semin, at 31, has been in his prime years and may just be starting his decline.

Goals

Corsi is a useful metric, but I believe if you have multiple seasons’ worth of data you have to look at the goal data for trends as well, because goals are what really matter. What is interesting is that with these three players the goal data tell a somewhat different story.

SeminWilliamsBeleskey_GF60RelTM

Recall that for CF60 RelTM we saw Williams always better than Beleskey and Semin bouncing around a bit. Here we see Beleskey starting off below Semin and Williams, but over the past couple of seasons he has surpassed both of them and has had the better GF60 RelTM. Once again Beleskey is improving while Williams and Semin have fallen off some.

SeminWilliamsBeleskey_GA60RelTM

Lower is better here, so this is pretty much a repeat of the GF60 RelTM story: Beleskey is improving and has easily had the better GA60 RelTM, particularly over the past two seasons.

SeminWilliamsBeleskey_GFPctRelTM

As one would expect, Beleskey clearly has the better GF%RelTM over the past couple of seasons. What is interesting is that hockey analytics favourite Justin Williams has had a negative GF%RelTM in three of the past four seasons despite having a CF%RelTM well above zero in each of those seasons. Beleskey has had four straight seasons with a GF%RelTM above zero.

The Percentages

To summarize the above charts, Justin Williams looks far better when looking at Corsi than when looking at goals, while for Beleskey it is almost the opposite. Furthermore, Williams seems to be starting to show his age and decline while the younger Beleskey appears to still be improving. To explain the divergence between the Corsi and goal data, let’s have a look at two more charts: Sh%RelTM and Sv%RelTM.

SeminWilliamsBeleskey_ShPctRelTM

Higher is better on this chart. Williams has consistently been the worst of this group and has generally been at or below zero, meaning his teammates generally post better shooting percentages when not playing with Williams than when playing with him. Beleskey on the other hand has always had a positive impact, and Semin for the most part does as well (save for 2013-14).

SeminWilliamsBeleskey_SvPctRelTM

Beleskey’s Sv%RelTM numbers are what really got me to investigate him far more deeply. He has posted positive Sv%RelTM numbers for five straight years (2010-11 not shown) and they seem to be improving as well. Contrast that with Justin Williams, who has had a negative Sv%RelTM in four of the past five seasons, with only 2012-13 breaking that trend.

Aside: I get that people are skeptical that players can influence save percentage (I’ve seen and done the research), but I have also seen too many players show consistent trends to believe that it can’t and doesn’t happen. I have shown recently that coaches generally don’t dole out ice time based on defensive statistics, which leads me to believe that it isn’t a trait that coaches emphasize. If coaches don’t emphasize it, it is understandable why not many players exhibit that skill. This would make it difficult to find league-wide correlations, but it doesn’t mean that players with these skills don’t exist. It could in fact be a sign of untapped value.

Point Production

The last couple of charts I want to present are related to point production. First let’s look at 5v5close Points/60.

SeminWilliamsBeleskey_PtsPer60

What is interesting here is how much better Beleskey has been the past two seasons and how both Semin and Williams have experienced an equally significant drop off. Is aging a factor in these trends?

SeminWilliamsBeleskey_IPP

 

IPP is the percentage of goals scored while the player is on the ice on which the player had a point (a goal or an assist). It is an indication of how involved the player is in the offense created when he is on the ice. Until this past season Williams’ numbers were pretty good, while Semin went from OK to terrible this past season. It appears that both Semin and Williams had anomaly seasons, but again, is aging a factor here? Conversely, Beleskey appears to be improving and his last two seasons were very respectable, particularly for a player who also seems to have good defensive numbers.
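The IPP calculation itself is trivial; a quick sketch with invented numbers:

```python
def ipp(goals, assists, on_ice_gf):
    """Individual Points Percentage: the share of on-ice goals for
    on which the player recorded a goal or an assist."""
    return 100.0 * (goals + assists) / on_ice_gf

# Hypothetical player: 40 points on 55 on-ice goals for.
print(round(ipp(15, 25, 55), 1))  # 72.7
```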

In Summary

  • There is ample evidence that Justin Williams’ possession (Corsi) statistics are inflating his value, as he has fairly consistently had a poor influence on both shooting and save percentages.
  • There is also ample evidence that Justin Williams is already into his declining years, and giving him a longer term contract may not be wise.
  • Beleskey, on the other hand, appears to be better overall than his possession statistics indicate and also appears to still be improving in all aspects of the game as he enters his prime years.
  • Semin once had outstanding statistics no matter what you looked at. He has shown a decline over the past two seasons, and last season he fell off the cliff in a number of areas statistically. At only 31, if the price is reasonable he is worth the gamble on a shorter term contract, because if he can get anywhere close to where he was he’d be outstanding value.

My final thought is likely to generate some buzz and controversy amongst the analytics crowd, but of the three players I believe Matt Beleskey may be the best right now and almost certainly will be the best over the next several seasons as Williams and Semin age and Beleskey continues through his prime years. There, I said it. Discuss amongst yourselves.

 

Jun 28 2015
 

Yesterday I looked at which statistics TOI% correlates with, which gives us an indication of how coaches distribute ice time to their players. It has occurred to me that TOI% is really a “Rel” statistic in the sense that TOI% gets handed out based on how a player compares to the rest of his team, not the rest of the league. So, in comparing TOI% to overall stats such as GF%, CF%, and Sh%, I was really comparing apples to oranges: TOI% is a statistic relative to the player’s teammates while those other stats are relative to the league. In this post I plan on getting around this by looking at those other statistics relative to the player’s teammates, where the Rel stats are calculated as On Ice – Off Ice. Here is what we get.

TOI% vs R^2
GF60Rel 0.612
GF60 0.568
CF60Rel 0.547
Sh%Rel 0.484
CF%Rel 0.458
Sh% 0.453
GF%Rel 0.392
CF60 0.340
GF% 0.310
CF% 0.157
GA60Rel 0.132
GA60 0.104
Sv%Rel 0.095
Sv% 0.089
CA60 0.003
CA60Rel 0.002

In most cases the Rel stats have a higher correlation than the straight stats, which makes perfect sense. A bad team still needs to give ice time to some not-so-good players, and a great team will be limiting the ice time of some relatively good players. When we compare players to their teammates rather than to the league as a whole we would expect the correlation with TOI% to get stronger, and we do see that.
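For clarity, here is a minimal sketch of the two calculations behind the table: the Rel adjustment (on-ice minus off-ice) and the squared Pearson correlation with TOI%. The player values below are invented for illustration:

```python
# Sketch only: a "Rel" stat is on-ice minus off-ice, and the table
# reports the squared Pearson correlation (r^2) between TOI% and
# each stat. The five players here are made up.

def rel(on_ice, off_ice):
    return on_ice - off_ice

def r_squared(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

toi_pct = [34.0, 31.5, 29.0, 27.5, 25.0]
gf60_rel = [rel(2.6, 2.1), rel(2.4, 2.2), rel(2.2, 2.3),
            rel(2.1, 2.4), rel(1.9, 2.5)]
print(r_squared(toi_pct, gf60_rel))  # close to 1 for this toy data
```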

As we saw yesterday, we also see how poorly the defensive statistics correlate with TOI% and the Rel statistics are no different. This chart shows this observation really well.

TOIPct_correlations

There are no pure defensive statistics to the left of the red line and every statistic to the right of the red line is a pure defensive statistic.

Yesterday I postulated that coaches might view defense as driven more by systems than by individual performance, and thus individual performance doesn’t impact coaches’ decisions as much on the defensive side of the game. With that said, we often hear about players getting benched or having their ice time limited because they aren’t “doing the little things away from the puck,” which often gets interpreted as not being defensively responsible. I have previously wondered whether coaches just struggle with identifying what makes a player good defensively. It is far easier to identify the guy who is a great passer or the guy who has a great shot because it gets converted into goals. It is far more difficult to identify the guy who is positionally sound enough to inhibit scoring chances because there are no stats that directly measure “goals that would have been scored if he were not so good defensively.” Regardless of what you are looking at, the “it could have been worse” argument is always the most difficult to make.

It is clear to me that hockey analytics needs better measures of defensive performance, which should help us better evaluate both defenders and goalies. It is a big gaping hole in hockey analytics, but it also seems likely that there is a major inefficiency in how coaches utilize their players. I just can’t believe that, with ideal player utilization, there would be that large of a disconnect between ice time and defensive results.

 

Jun 27 2015
 

A few days ago I wrote a post looking at whether scoring chances and high danger scoring chances do a very good job of explaining variations in on-ice shooting percentages among NHL forwards. The short answer is that they do explain some of it (scoring chances better than high danger scoring chances) but are still a long way from being an ideal explanatory variable. We know that because I found that TOI% (the percentage of ice time the coach assigns to the player) had a far better correlation with shooting percentages.

In this post I want to take a look at how TOI% correlates with other metrics, because it will tell us how coaches decide to dole out ice time. Using statistics from Puckalytics.com I come up with the following table of how well TOI% correlates with various metrics. To attempt to eliminate score effects I am using 8-year 5v5close data for all forwards with >2000 minutes of ice time.

TOI% vs R^2
GF60 0.568
Sh% 0.452
CSh% 0.441
CF60 0.340
GF% 0.310
CF% 0.157
CSv% 0.130
GA60 0.104
Sv% 0.089
CA60 0.003

It is clear coaches dole out ice time based on offense, with a slight preference for shooting percentage over shot generation. Coaches don’t give the big minutes to the best defenders, and especially not to those that focus on shot suppression.

It is interesting that coaches don’t seem to value defense as much as offense, and shot suppression least of all. It is probably an important reason why we find it so difficult to find trends and relationships in defensive statistics. It is a little odd because we know that some coaches clearly stress defense more than others, and yet they don’t seem to be doling out ice time based on defensive results. This observation is likely the result of the belief that defense is a product of the system being employed by the team and less so the talent of the individual players, whereas offense is far more driven by individual player talent.

If you can put the biscuit in the basket you will get a lot of ice time. For those who can’t, ice time is doled out more based on who is most dedicated to playing the role they have been assigned in the system the coach has put in place.

What is interesting is that, if anything, analytics is driving the bias towards offense even more. Guys like Johnny Gaudreau (all 150 lbs of him) and Tyler Johnson are all the rage right now. Watching my Twitter feed I see analytically inclined Leafs fans getting excited over every small, skilled draft pick the Leafs make while ridiculing pickups of big, strong players by other teams, such as the Los Angeles Kings with Milan Lucic.

Small, fast, skilled players that can move the puck up the ice and put the puck in the net are in, while stay-at-home defensemen and defensive specialist forwards are getting pushed aside. I wonder how the table above will look five years from now.

 

Jun 24 2015
 

This past season War on Ice introduced two new shot quality metrics – Scoring Chances (SC) and High Danger Scoring Chances (HSC) – which are defined here. Stephen Burtch has previously evaluated these scoring chance metrics with respect to their ability to predict future goal scoring and goal differentials and found them to be better predictors than traditional possession statistics. As a strong believer in shot quality I am not surprised by this conclusion, but with this post I want to take a closer look at just how good these metrics are at measuring shot quality.

The premise underlying this analysis is a simple one: the higher the percentage of overall shots (or shot attempts) that are scoring chances or high danger scoring chances, the higher the likelihood the player will post a high shooting percentage. So, I will evaluate the following “on-ice” relationships:

  • HSCF/CF vs CSh%
  • HSCF/SF vs Sh%
  • SCF/CF vs CSh%
  • SCF/SF vs Sh%

To do this evaluation I took all single seasons for all forwards with at least 500 5v5 minutes in that season over the past 8 seasons. All totalled, there were 2,611 such player seasons. Here are the results:

HSCF_vs_CShPct

HSCF_vs_ShPct

SCF_vs_CShPct

SCF_vs_ShPct

There are two key takeaways from these charts. First, these metrics do positively correlate with shooting percentage, so to some degree they are capturing shot quality, though the correlations aren’t particularly strong. The second takeaway is that considering all scoring chances is better at measuring average shot quality than restricting to just high danger scoring chances.
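The evaluation behind these charts can be sketched as an ordinary least squares fit of on-ice Sh% against the scoring-chance share of shots. The three player rows below are invented stand-ins; the real inputs are the 2,611 player seasons described above:

```python
# Sketch of the evaluation: regress on-ice Sh% against the share of
# shots that were scoring chances (SCF/SF). Rows are invented.

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

players = [  # (SCF, SF, on-ice Sh%)
    (250, 480, 8.9),
    (220, 500, 7.6),
    (300, 520, 9.4),
]
sc_share = [scf / sf for scf, sf, _ in players]
sh_pct = [sh for _, _, sh in players]
slope, intercept = fit_line(sc_share, sh_pct)
# A positive slope means a higher scoring-chance share goes with a
# higher on-ice shooting percentage, as the charts above show.
```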

There is another metric that is known to correlate well with shooting percentage for forwards, though: ice time. To see how these scoring chance metrics stack up against ice time I looked at the relationship between TOI% (the percentage of the team’s ice time the player gets) and shooting percentage.

TOIPct_vs_ShPct

Now that is interesting. TOI% is quite a bit better at estimating shooting percentages than our shot quality metrics. This tells me two things. First, while the scoring chance data is telling us something about shot quality, there is still a lot that isn’t accounted for. Second, the scoring chance data hasn’t caught up to the coaches’ “eye test” yet – coaches are better at evaluating shooting talent than our scoring chance data.

The last thing I want to look at is a larger sample size so I took all forwards who have played at least 3000 minutes over the past 8 seasons combined – a total of 364 players. Here is the SCF/SF vs Sh% chart.

SCF_vs_ShPct_8yr

Going with the larger sample size improves things a fair bit. For interest, the r^2 for HSCF/SF vs Sh% using the 8 years of data is 0.21, so scoring chance data is still much better to use than high danger scoring chance data.

So, how did the coaches do over the 8 years at handing out ice time? Pretty well.

TOIPct_vs_ShPct_8yr

In conclusion, shot quality is still something we are having a terribly difficult time understanding. It clearly exists and is a significant factor in driving on-ice results, but our ability to measure and quantify what leads to higher shot quality is still clearly lacking. Scoring chances as defined by War on Ice might be a step in the right direction, but our attempts to quantify shot quality are still a step (or two) behind the coaches.

 

Jun 3 2015
 

Over the last several days I have tweeted several times (here, here and here) about my Sv%RelTM statistic, which can be found on Puckalytics.com, and it generated some interest from my followers as well as some skeptics.

 

The issue I have with that statement and others like it is that it uses a simple statistical model, applies it to all players, and then draws conclusions about all players based on the results without really understanding what the model is telling us or all the inherent problems with measuring players’ ability to impact shot quality against.

The most important factor is that goals are infrequent events, and a single season’s worth of data is simply not enough to reliably measure shooting and save percentages. This means we need a larger sample size to accurately measure these effects. Unfortunately, when you expand the sample size other factors come into play, most of which we never control for in our statistical models. Playing style seems to have a significant impact on the percentages (and on possession statistics too). Over the course of multiple seasons players move up and down lineups and are given different roles, players change teams, coaches get changed and implement new systems, young players improve with age, older players decline, etc. These are all significant factors that come into play and are rarely if ever controlled for, but they will all affect the reliability of the statistical models we apply.

Possession stats are great because they are based on shots, and shots are frequently occurring events. With as little as a single season’s worth of data (or less) we can identify persistence in possession statistics. What is interesting, though, is that I have shown that over longer periods of time persistence in possession statistics starts to fall off (especially in shots-for statistics). All those other factors that I mentioned above affect possession statistics too, but we don’t worry about them because we can identify possession talent using sample sizes small enough that they aren’t a factor. (In actual fact they are a factor, in that possession stats may be measuring playing style more than player talent.)

The reality is that claiming players have zero talent to influence their goalie’s save percentage is an extraordinary claim that shouldn’t be made lightly. There are hundreds of players with differing skill sets assigned different roles and playing different styles, and to conclude that none of those differences has an impact on goalie save percentage would be quite astonishing. In fact, this is something that Kyle Dubas recently talked about at the Sloan Conference.

“You read everything and I agree with it and it’s sound, where Player X can’t influence the goalie’s save percentage in a repeatable manner or for a different goalie and so on and so forth, and I can never bring myself, even though I agree all of the data and information is correct, I can’t bring myself to fully admit and accept that a defenseman, or a forward in a defensive posture, can’t alter the course of the game defensively.” –Kyle Dubas (about the 39:00 mark in the video)

So, I really wonder whether grouping all players together and running some simple statistical models is sufficient to make the claim that no player can impact goalie save percentage. I don’t believe so, for a number of reasons, including that we know save percentage varies based on score due to how players play when protecting a lead versus playing catch-up. If score effects impact save percentage, there should be no doubt that different players with different roles can do so as well, and thus we should be able to see this in the data.

The list below shows the 2010-13 (3-year) leaders (top 20 skaters) in Sv%RelTM among players who had at least 1500 minutes of 5v5close ice time over that 3-year span and at least 1000 minutes of 5v5close ice time over the following 2-year span (2013-15), which we will compare the 3-year data to.

Player_Name 2010-13 Sv% RelTM 2013-15 Sv% RelTM
DAVID BACKES 3.1 0.8
JASON GARRISON 2.6 0.7
DUSTIN BROWN 2.5 0.3
LOGAN COUTURE 2.4 2.0
MATT NISKANEN 2.2 0.9
STEVE OTT 2.1 1.5
TYLER SEGUIN 1.9 0.0
ANDREW MACDONALD 1.9 0.4
ALEXANDER SEMIN 1.8 -0.2
BRIAN CAMPBELL 1.8 0.1
MARIAN HOSSA 1.8 1.4
WILLIE MITCHELL 1.7 0.4
BRANDON SUTTER 1.7 0.8
BOBBY RYAN 1.7 0.4
FRANS NIELSEN 1.6 -0.2
DAN BOYLE 1.6 1.7
SERGEI GONCHAR 1.5 1.4
MATT MOULSON 1.4 0.3
DION PHANEUF 1.4 1.7
KEVIN SHATTENKIRK 1.3 -0.5
Average 1.9 0.695

Looking at the above table there should be no doubt that there is some level of persistence. Of the top 20 skaters in Sv%RelTM for the 3-year period, only 3 went on to post a negative Sv%RelTM in the following 2-year period, and none of those were significantly negative.
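The persistence claim can be checked directly from the table’s second column:

```python
# The 2013-15 Sv%RelTM values of the 2010-13 top-20, in table order.
followup = [0.8, 0.7, 0.3, 2.0, 0.9, 1.5, 0.0, 0.4, -0.2, 0.1,
            1.4, 0.4, 0.8, 0.4, -0.2, 1.7, 1.4, 0.3, 1.7, -0.5]

negative = sum(1 for v in followup if v < 0)
average = sum(followup) / len(followup)
print(negative)            # 3 of 20 slipped below zero
print(round(average, 3))   # 0.695, matching the table's average
```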

Here are the top 15 players in 5v5close Sv%RelTM over the past 5 seasons combined.

Player_Name Sv% RelTM
SHAWN HORCOFF 2.5
BRYCE SALVADOR 2.5
DAVID BACKES 2.3
ERIK CONDRA 2.1
DANNY DEKEYSER 2.1
DALE WEISE 2.0
IAN COLE 2.0
LOGAN COUTURE 2.0
STEVE OTT 1.9
PAUL GAUSTAD 1.9
JASON GARRISON 1.8
SLAVA VOYNOV 1.7
TREVOR LEWIS 1.7
DANIEL WINNIK 1.7
CAM ATKINSON 1.7

 

What you will notice is that the majority of them are considered defensive specialists or at the very least quality two-way players. Is this really evidence of randomness? No. There appears to be a certain type of player that rises to the top of this list, and randomness doesn’t produce that.

The ability of players to impact a goalie’s save percentage is real. It is difficult to reliably detect over small sample sizes, and over larger sample sizes it often gets washed out by other factors such as roster changes, coaching changes, changes in role, etc. In fact there may not be a lot of players that exhibit this talent to a significant degree, but that doesn’t mean it doesn’t exist. It most certainly does, and failure to recognize it will lead to failures in properly evaluating players. It is also important to really take the time to understand what a statistical model is, and is not, telling us and not to draw conclusions prematurely.

——

I have written on this topic several times previously. Here are a couple of posts that are worth reading. Eventually I hope I’ll be able to stop writing articles on this topic.

Defenders effect on save percentage

Why can’t players boost a goalies save percentage?

Is Hockey Analytics altering outcomes yet?

Apr 26 2015
 

Hockey analytics is well behind analytics in other sports, particularly baseball, but we are now several years into what I will call modern (or current) hockey analytics, which has largely focused on possession statistics such as Corsi and Fenwick. Last summer we even saw a number of teams publicly adopt analytics by picking up some prominent people from the public domain: Toronto, Edmonton, Carolina, Florida, and New Jersey to name a few. Results for those teams have clearly been mixed thus far, but the greater question is whether hockey analytics, and possession analytics in particular, has had a broader impact on the game than just those few teams. I hope to answer some of those questions today.

One of the reasons possession statistics such as Corsi became so popular is that good possession teams often do well, and possession has also been identified as an undervalued skill, as Eric Tulsky wrote about a couple of years ago. Contracts and salaries were generally handed out to reward skills such as shooting percentage more than possession skills, and thus possession was an undervalued talent. Teams could tap into this by getting good possession players at a fraction of the cost of good shooting percentage players. I warned that focusing too much on possession statistics is potentially harmful in the long run, as it could result in players altering their playing style at the expense of what really matters: outscoring the opposition. I have shown that there is likely at least a loose inverse relationship between Corsi and shooting percentage, implying that boosting one’s Corsi often has the negative consequence of reducing one’s shooting percentage. I did this by looking at the impacts of coaching changes on Corsi and shooting percentage and by looking at the relationship between team CF% and Sh% when extreme outliers are removed.

So, the question is: have we started to see this shift where more teams focus on possession and less on shooting (and save) percentage? Has this shift altered team statistics and what leads to success in the NHL? Has the spread in talent across teams for the various metrics increased or decreased? To find out I am going to start by investigating whether there are any differences in statistics between the average team that makes (or misses) the playoffs now compared to several years ago. Let’s start by comparing average playoff team GF% vs average non-playoff team GF% over the past 8 seasons (note that all statistics discussed here are 5v5close statistics unless otherwise specified).

Playoff_vs_NonPlayoff_Team_GFPct

Can’t really say too much has changed here. If anything the spread between good and bad teams has increased a bit, but it could just be randomness too. The other observation is that 2012-13 is a bit of an anomaly, where the non-playoff teams actually had a higher average GF% than the playoff teams did. This makes no sense other than that in a shorter season strange things happen. We’ll get into this more in a bit. For now, let’s have a look at CF%.

Playoff_vs_NonPlayoff_Team_CFPct

Outside of 2012-13 that is about as stable a chart as you could possibly find. There was a slight increase in spread in 2008-09, but otherwise in full seasons the spread in CF% between playoff and non-playoff teams has been very persistent.

Now let’s take a look at shooting percentage.

Playoff_vs_NonPlayoff_Team_ShPct

For shooting percentage, not only is the short 2012-13 season an anomaly but so is 2011-12. In both of these seasons non-playoff teams posted a higher shooting percentage than playoff teams did, which I guess pours some cold water on the argument that lucky teams make the playoffs, if one defines luck as posting an elevated shooting percentage. What is also interesting to note, though, is that outside of these two seasons there appears to be a trend towards an increasing disparity between good and bad team shooting percentages. Let’s look at this difference more closely by plotting Average Playoff Team Sh% – Average Non-Playoff Team Sh%.

DifferenceBetweenPlayoffNonPlayoffTeamShPct

The trend line does not include 2011-12 and 2012-13, which I’ll admit could be interpreted as a bit of selection bias (though those are clear anomalies in this chart), but when one does ignore those two seasons the trend is pretty clear: the disparity in shooting percentage between playoff and non-playoff teams is growing.

This is really kind of counter-intuitive. In a hockey world with a hard salary cap, where shooting percentage is an expensive talent to acquire, one would actually expect teams to have a difficult time keeping all of their high shooting percentage players. This does not appear to be the case though.

What about save percentage?

Playoff_vs_NonPlayoff_Team_SvPct

Again, 2012-13 is an anomaly season, but otherwise it is difficult to identify a trend in save percentage aside from playoff teams always tending to have a better save percentage than non-playoff teams, which makes perfect sense.

And just to complete the charts, here is PDO.

Playoff_vs_NonPlayoff_Team_PDO

Not much new here. The short 2012-13 season again appears to be an anomaly, and 2011-12, driven by Sh%, is a bit of an anomaly as well. Also driven by Sh% is what appears to be a slight increase in disparity between playoff team PDO and non-playoff team PDO.

The other thing we can do is look at the spread of these statistics by season, using the standard deviation across all teams, to see if the spread is increasing, decreasing, or staying more or less the same.
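The spread calculation behind the following charts is just a per-season standard deviation across teams. A minimal sketch with invented team values (five teams shown rather than thirty):

```python
# Population standard deviation of a team-level stat, season by season.
# The CF% values are invented stand-ins, not real team numbers.
from statistics import pstdev

season_cf_pct = {
    "2012-13": [55.1, 52.3, 50.0, 48.7, 44.9],  # wider, shortened season
    "2013-14": [54.0, 51.5, 50.2, 49.1, 46.2],  # tighter, full season
}
spread = {season: pstdev(vals) for season, vals in season_cf_pct.items()}
# In this toy data the short season shows the larger spread, echoing
# the 2012-13 bumps visible in the charts below.
```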

StandardDev_TeamGFPct_BySeason

Aside from the 2011-12 season there might be an upward trend in the size of the spread in team GF%, which is kind of interesting and counter to the popular belief that parity is increasing in the NHL. It is possible that there is increased parity in the middle and what we are seeing above is driven by the extremes (a few extremely good or extremely bad teams).

StandardDev_TeamCFPct_BySeason

The spread in CF% appears to have increased over the past three seasons. Could this be due to some teams jumping on board with possession statistics while others are not, resulting in the increased disparity? Difficult to say, but certainly possible. It could also be that more teams are going the tank-and-rebuild-through-high-draft-picks route (I am looking at you, Edmonton and Buffalo).

StandardDev_TeamShPct_BySeason

As one would expect, the short season of 2012-13 produced the greatest spread in team shooting percentages but otherwise the spread in shooting percentage talent across teams has been pretty stable.

StandardDev_TeamSvPct_BySeason

Pretty much the same for save percentage – a bump in the short 2012-13 season but otherwise pretty stable.

StandardDev_TeamPDO_BySeason

And we finish off with the standard deviations of PDO, which are surprisingly variable considering how relatively stable both shooting percentage and save percentage were aside from 2012-13. I’m not quite sure what to make of that variability, but there doesn’t seem to be any upward or downward trend otherwise.

From the above charts I think it is very difficult to argue that there has been much change in outcomes thus far from the NHL’s adoption of analytics. There are some potentially interesting things surrounding shooting percentage, and possibly the increased variability in CF% the past couple of seasons, but overall we can’t say with any certainty that anything significant has changed. It is still early, though, and it can take a number of seasons to change a team’s focus, so we’ll have to keep an eye on it, but so far we aren’t seeing much impact.

 

Is 4v4 overtime hockey a crap shoot we can or should ignore?

Apr 132015
 

Since the Los Angeles Kings have been eliminated from the playoffs there has been a lot of discussion about why a team with such a good possession game failed to make the playoffs. This included my article from yesterday which generated a fair amount of discussion as well. A lot of the discussion can be summarized by the following tweet by Sunil Agnihotri referencing a comment by Walter Foddis.

The last paragraph is the one that interests me most.

“The substantive reason for LA not making the playoffs is the OT system, which does not reflect team strength. Statistically, OT outcomes have been shown to be a crap shoot. LA was unlucky in OT”

The fact that LA went 1-7 in overtime play does in fact mean that they were unlucky in OT. They are a better team than that for sure (every team is expected to do better than that). OT results over the course of a single season are extremely random, and thus one could consider them a crap shoot. The question I have is: just because something is highly variable, does that mean it is meaningless in our evaluation? Being unlucky in overtime does not mean you were unlucky overall.

I’d hazard a guess that the outcomes of the first 5 minutes of the second period in games played on a Thursday are highly random too. If a team missed the playoffs and had a terrible goal differential during the first 5 minutes of the second period in Thursday games, can we chalk up missing the playoffs to bad luck in that span? No, of course not. We don’t get to pick and choose which good or bad luck we blame results on. Just because we are more aware of the bad luck that happens in overtime doesn’t mean it is more important bad luck, or more worthy of attributing blame to.

The reality of the situation is that unless you can be certain that the Kings’ OT bad luck was not offset by good luck during the remainder of their games, you can’t blame the Kings missing the playoffs on their OT record. I haven’t seen a complete luck analysis of the Kings’ season – regulation and OT play as a whole – so I am pretty reluctant to blame the Kings’ playoff miss on their OT record just yet.

The interesting question for me is whether 4v4 play is indicative of overall talent, because if 4v4 hockey requires a completely different skill set then one could conclude that overtime play isn’t representative of true hockey talent. To answer this question I took each team’s 5v5close GF% over the past 8 seasons (to get large sample sizes, though this reduces the spread in talent) and correlated it with their 4v4close GF% over the same 8 seasons (I used close situations since most 4v4 ice time is in OT and thus in close situations). Here are the results.

[Chart: 5v5close GF% vs 4v4close GF%]

And the same for CF%.

[Chart: 5v5close CF% vs 4v4close CF%]

Those correlations are good enough for me to consider 5v5 skills fairly transferable to 4v4 play and vice versa. Over small samples strange things happen, but to suggest that 4v4 play isn’t indicative of hockey skill, and that this is why one should ignore OT results, is not valid.

An interesting observation is that the slope on the CF% chart is almost exactly 1.0. The slope on the GF% chart is significantly higher than 1.0, which might indicate that 4v4 play is actually a better indicator of talent than 5v5 play (if you are good at 5v5 play you should be even better at 4v4 play). That said, if I force the intercept to zero the slope drops to 0.9958, or almost exactly even (and r^2 drops to 0.3123 with the zero intercept), so maybe 5v5 and 4v4 are on par with each other. Regardless, this should at least alleviate Steve Burtch’s concern that poorer teams are more likely to score first during 4v4 play than during 5v5 play. I don’t believe that to be the case.
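The slope comparison above is easy to reproduce: fit 4v4close GF% against 5v5close GF% with a free intercept, then with the intercept forced through zero. A minimal sketch with made-up team values (the post used 8 seasons of real team data):

```python
# Fit y = m*x + b (free intercept) and y = m0*x (intercept forced to zero)
# for two team-level stats. The x/y values here are hypothetical.

def fit_with_intercept(x, y):
    """Ordinary least squares y = m*x + b; returns (m, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return m, my - m * mx

def fit_through_origin(x, y):
    """Least squares y = m0*x with no intercept term."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

x = [48.0, 50.0, 52.0, 55.0, 46.0]  # 5v5close GF%, hypothetical
y = [46.5, 50.5, 53.5, 57.0, 44.0]  # 4v4close GF%, hypothetical

m, b = fit_with_intercept(x, y)
m0 = fit_through_origin(x, y)
print(f"free slope: {m:.3f}, intercept: {b:.2f}, zero-intercept slope: {m0:.4f}")
```

Even with these toy numbers the free slope comes out well above 1 while the zero-intercept slope sits near 1 – the same pattern described above, since the intercept absorbs part of the trend.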

Now, when we talk about shootout records, I think it is safe to assume that the shootout is a lot further from being representative of actual hockey talent than 4v4 play is. There is probably not enough shootout data to do a similar analysis with any degree of confidence, but I doubt there is much disagreement that the shootout is a long way from real hockey.

 

Apr 122015
 

The other day I posted the following twitter comment after the Flames defeated the Los Angeles Kings to claim a playoff position, simultaneously eliminating the reigning Stanley Cup Champions from the playoffs.

I posted this comment for two reasons. First, because I think if you are being honest about evaluating possession analytics you have to consider the failures on equal ground with the successes. I am certain that if the Kings had defeated the Flames and ultimately made the playoffs over them, some people would have used it as evidence that possession analytics is good at predicting future results. That would be a fair thing to do, but you have to consider the failures too, and possession analytics failed twice here: first with the Flames making the playoffs and second with the Kings missing. So, I made this comment because analytically it is the correct thing to do and I felt it needed to be said.

The other reason I made this comment was to see how people would react – whether with fairness, as explained above, or in a defensive manner, defending possession analytics and dismissing the Flames/Kings outcome as largely luck. For the most part the reaction was more subdued than I had expected, but some did jump to the defense of possession analytics, including the following tweet from @67sound.

If you are relying on the LOS ANGELES KINGS to minimize the importance of possession metrics I don’t even know where to begin.

This is an overreaction because I didn’t actually try to minimize the importance of possession; I was just pointing out where it failed. If you follow me you know I use possession metrics all the time. I just think there is too much consideration given to when possession metrics succeed in predicting outcomes and too little to when they fail and when other metrics succeed. I have talked about this before on a few occasions where people point out how good possession metrics are at predicting outcomes without actually comparing their success rates against other prediction methodologies. In many instances possession statistics do a great job of predicting outcomes, but often goal-based metrics actually do slightly better.

The follow up discussion to my tweet soon started to rationalize why the possession stats failed in predicting the Los Angeles Kings missing the playoffs.

Scott Cullen of TSN.ca wrote the following in his Statistically Speaking column about the Kings.

For starters, the Kings were 2-8 in shootouts and 1-7 in overtime games. Given the randomness involved in shootout results, that’s basically coming out on the wrong end of coin flips. 3-15 in overtime and shootout games, after going 12-8 the year before, is enough in tightly-contested standings, to come up short. Records in one-goal games tend to be unsustainable, but there’s enough of them in hockey that they make a huge difference in the standings.

Most of these are fair comments. The shootout record is almost completely random and not actually representative of how good they are at playing hockey (though I disagree with the idea that overtime records are not useful in evaluating how good the Kings are). With a bit better fortune the Kings likely would have made the playoffs, and probably should have. The thing is, though, we all need to be careful not to use “luck” as a tool for confirmation bias, as luck can be used to explain everything. Flames made the playoffs? Write it off as good luck and move on without blinking an eye. They will regress next year, just watch. Kings missed the playoffs? Write it off as bad luck and move on without blinking an eye. They will be better next year, just watch. A thorough review needs to be conducted; we shouldn’t just quickly write off anything that goes counter to our beliefs/predictions as luck.

The Kings missed the playoffs this year with 95 points. The previous four seasons they had 100, 101 (prorated over 82 games), 95, and 98 points. So, on average, the LA Kings have been a ~98 point team over the past 5 seasons. If they had gone 5-5 instead of 2-8 in shootouts, that is exactly where they would have finished. For the most part this Kings team is what they have mostly been and what we probably should have expected: a good, but not elite, regular season team. Over these past 5 seasons they have finished 18th, 10th, 7th, 13th and 12th overall. That actually compares somewhat poorly to the cross-town Anaheim Ducks, who have finished 3rd, 2nd, 3rd, 25th, and 9th over the same span. The Kings’ score-adjusted Fenwick % over that time is 55.3% compared to the Ducks’ 50.3%, and yet in four of the five seasons the Ducks finished ahead of the Kings in the regular season. The reason is that the Ducks have a 9.19% 5v5close shooting percentage over the past four seasons compared to the Kings’ 6.69%. That difference is not luck. It’s a persistent, repeatable skill that possession analytics doesn’t capture. Barring major off-season roster moves, no one should be predicting the Kings to end the regular season ahead of the Ducks next season. I suspect some will, though, just as was done this season when possession analytics was used to predict regular season point totals (the Kings were predicted to get 107 points, the Ducks 91).
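The shootout arithmetic above can be checked directly: every game that reaches a shootout banks at least 1 point (the loser point), and a win adds 1 more, so flipping three shootout losses to wins is worth exactly 3 standings points. A quick sketch:

```python
def shootout_points(wins: int, losses: int) -> int:
    """Standings points earned in games decided by shootout:
    2 for a win, 1 for the overtime/shootout loser point."""
    return 2 * wins + 1 * losses

actual = shootout_points(2, 8)   # the Kings' actual shootout record
better = shootout_points(5, 5)   # the hypothetical 5-5 record
print(95 + (better - actual))    # 95 + 3 = 98 points, the ~98-point average
```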

So the Kings have been a pretty good, but not dominant, regular season team. They have won the Stanley Cup twice during this period and have been a dominant possession team, which has given us the perception that they are an elite team. Is it possible that we have generally overrated them because of their possession numbers and post-season success? Maybe. Are they really a great team, or just a good one that got hot when it mattered a couple of times? It’s a question worth asking, I think, but if you just chalk up missing the playoffs this season to luck it is probably one you won’t be asking.

While we are on the subject of teams that were predicted to regress this season one such team is the Colorado Avalanche. A lot of people are tossing them out as an example of where possession statistics successfully predicted their failures this season. A major reason for predicting this regression was due to regression in their shooting and save percentages as Travis Yost of TSN.ca wrote prior to the season.

Using that regression for forecasting purposes, expect Colorado to shoot around 7.89 per cent for next year at evens and stop around 92.47 per cent of the shots.

Those are 5v5 shooting and save percentages Yost is talking about. In actual fact Colorado’s shooting hasn’t regressed this year, as it is more or less identical to last season’s 5v5 shooting percentage (8.75% this season vs 8.80% last season). Save percentage has regressed to almost exactly what Yost predicted (92.52%), so he was right there (though the role luck played is unknown), but a major (and maybe the primary) reason for the Avalanche’s failures this season is that they are playing a substantially worse possession game than last season. Colorado’s 5v5close CF% dropped from 47.4% last season to 42.9% this season, which is a massive drop and likely the major reason for their failures. That drop can largely be attributed to letting two of their best CF% players – Paul Stastny and PA Parenteau – leave in the off-season and replacing them with poorer possession players in Iginla and especially Briere. Coaching may be a factor too. So some of the Avalanche’s failures this season can be attributed to a regression in save percentage, but a significant part is due to poor off-season roster decisions.

Once again, we need to be careful with saying “I told you they would regress” and leaving it at that when the majority of the regression is due to factors you didn’t predict (to be fair, Yost did mention that the Avalanche’s possession might drop a bit due to roster changes, but it wasn’t the crux of his argument). It is quite possible, if not highly likely, that the Avalanche are in fact a well above average shooting percentage team, and we shouldn’t expect that to regress next season, just as we shouldn’t expect the Ducks’ to either.

I need to reiterate here that it isn’t that I don’t believe possession is an important aspect of the game. It is. It is why the Kings are good despite terrible shooting talent. It is why the Leafs are bad despite good shooting talent. The reason I always point out where possession failed is that I want to ensure everyone evaluates possession fairly, in the context of the complete game. I often hear things like “no one ever said possession was everything,” and yet I frequently hear claims made without any mention of factors other than possession metrics. The Kings are a perfect example. Everyone assumed they were a great team that, barring massive bad luck, would make the playoffs, and when they didn’t make the playoffs the evidence of that bad luck started getting thrown out. Truth is, it was perfectly reasonable to predict that with even a little bit of bad luck the Kings could miss the playoffs, though I don’t recall anyone really suggesting that (correct me if I am wrong). It is also fair to suggest that if Colorado had made smarter off-season roster moves they could have been a playoff team again and not regressed nearly as much, but the discussion about the Avalanche revolved around bad possession and a high PDO: they were lucky and will regress a lot. I want to see a better balance in hockey analytics, as I think too much of it is dominated by possession analytics. That is why I write tweets like the one about the Kings and Flames. There needs to be more balance.

So, my final word of advice is this: if you don’t believe that possession is everything (which apparently none of you do), you ought to be doing more than just possession analytics. If you can honestly say you are doing that, I congratulate you. If you can’t, well, what you do next is up to you.

 

Mar 212015
 

The other day on twitter I was called out by Sam Ventura who does some great work on war-on-ice.com. Specifically he did not like my article on zone starts that I wrote the other day.

Let me step in here and say that I have never denied this. Offensive zone face offs are more likely to result in shots for the team on offense and less likely for the team on defense. Ok, that is settled, let’s move on.

 

This is the crux of the problem. At the micro level, yes, the location of face offs impacts outcomes. At the macro or aggregate level the effects are minimal. I tried to explain that here in more detail, but maybe it didn’t come across too well, so I will try again, in another way, with the war-on-ice tools. Let’s look at the Shea Weber chart from Sam’s tweets above.

[Chart: Weber ZSO% vs CF%]

Ok, so there does look to be a relationship: the higher the offensive zone start percentage, the higher the CF%. Now let’s take a look at the same chart but with offensive zone start percentage relative to the team, and see how it changes.

[Chart: Weber ZSO%Rel vs CF%]

Significantly less correlation. Why? Because when the team is playing well, the team as a whole generates more offensive zone starts – not the other way around. We can also flip it around and look at how ZSO% compares to CF%Rel.

[Chart: Weber ZSO% vs CF%Rel]

And to finish the display we can look at ZSO%Rel vs CF%Rel.

[Chart: Weber ZSO%Rel vs CF%Rel]

The relationship that Sam has observed is largely team driven, not driven by Weber’s zone starts. There is a zone start impact on a player’s statistics, but it is very minimal and for the majority of players can safely be ignored. The impact of the team is far more important. When the team does well it will post a better CF%, which in turn results in a higher ZSO%, which is the reason for the high correlation. Zone starts don’t drive CF%; CF% drives zone starts. This makes total sense because the majority of zone starts come after a shot on goal. The shot on goal produces the offensive zone face off; it isn’t the offensive zone face off that produces the shot on goal. We need to think of zone starts more as a result, not a cause.

On top of the team effect, I believe there is a style-of-play impact too, which takes away even more of the correlation. When you play defensive hockey you often give up more shots – we see it in score effects all the time. Players who start more in the defensive zone are more likely to be the ones playing defensive hockey. This adds to the correlation as well and has nothing to do with zone starts themselves.

Let me leave you with Phaneuf’s charts because his correlation in Sam’s charts was probably the greatest.

 

 

[Chart: Phaneuf ZSO% vs CF%]

[Chart: Phaneuf ZSO%Rel vs CF%]

Again, a significant portion of the relationship disappears when you look at ZSO%Rel.

For me, the main evidence that zone starts don’t have a significant effect on a player’s overall statistics is this: if I remove the 45 seconds after all offensive/defensive zone face offs (which effectively ignores the entire shift), the majority of players have the same CF% within +/- 1%, and only a handful with heavy offensive or defensive zone starts see an effect in the +/- 1-2% range. If removing all shifts that start with an offensive or defensive zone face off does not dramatically change a player’s overall statistics, you simply cannot conclude that zone start bias plays a prominent role in driving those statistics. Yes, it will for a particular shift, but not overall. Furthermore, the majority of that impact occurs in the first 10 seconds after a face off, which is why my zone start adjusted data removes those 10 seconds – something I showed over 3 years ago.

The critical point to remember in all of this is that shots drive where face offs occur; where face offs occur does not drive shots. Coaching and line changes for face offs can impact overall player statistics a little, but really not all that much.
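The adjustment described above – dropping everything within some window after an offensive or defensive zone face off and recomputing CF% – can be sketched as follows. The event representation here is hypothetical; real play-by-play data obviously has more structure.

```python
# Minimal sketch of zone-start-adjusted CF%: drop any shot attempt that
# occurs within `window` seconds after an O/D-zone faceoff, then recompute
# CF% from the attempts that remain. Event fields are hypothetical.
def adjusted_cf_pct(events, faceoff_times, window=10.0):
    """events: list of (time, is_for) shot attempts; faceoff_times: times of
    offensive/defensive zone faceoffs. Returns CF% excluding post-faceoff
    attempts, or None if no attempts remain."""
    def near_faceoff(t):
        return any(0 <= t - ft <= window for ft in faceoff_times)
    kept = [is_for for t, is_for in events if not near_faceoff(t)]
    total = len(kept)
    return 100.0 * sum(kept) / total if total else None

events = [(5, True), (12, True), (30, False), (45, True), (61, False)]
faceoffs = [0, 40]
print(adjusted_cf_pct(events, faceoffs, window=10))
```

Running this with a 10-second window versus a 45-second window on real data is essentially the comparison described in the paragraph above.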

 

Zone Starts, Corsi, and the Percentages

Mar 162015
 

Matthew Coller has an interesting article on Puck Prospectus about Shea Weber and his poor Relative Corsi. His conclusion was that Weber’s poor Relative Corsi is largely due to his playing time with Paul Gaustad, with whom he posted a very poor CF% along with a very heavy defensive zone start bias. He concluded that Weber’s poor Corsi with Gaustad was in a significant way caused by that heavy defensive zone start bias. This is a case of correlation, not causation, as I outlined in the comment section of that article. I recommend you take the time to read both the article and my comments because they are worthwhile reads.

My issue with the article is that I don’t believe zone starts dramatically impact a player’s overall statistics, as I explained here. I just haven’t seen any convincing evidence that zone starts would change a player’s CF% by much more than 1-2%, and for most players considering zone starts in player evaluation is not important. The relationship that Coller observed is important, though, because there is a clear relationship between zone starts and CF%. The relationship isn’t causal, however. What the zone starts signify is a style of play. Players with a heavy defensive zone start bias are likely asked by the coach to play a defense-first game, and in many cases generating offense is not a priority. The result is often a relatively minor deviation in a player’s CA/60 but a major deviation in his CF/60 from the overall team stats. Let’s look at Paul Gaustad as an example. Gaustad has an OZone% of just 12.2%, which means he has over seven times as many defensive zone starts as offensive zone starts. Here is how his Corsi stats compare to Nashville’s overall stats in 5v5close situations this season.

            CF/60   CA/60
Nashville    60.0    53.0
Gaustad      38.8    51.9

As you can see, despite a heavy defensive zone start bias, when Gaustad is on the ice the Predators actually give up slightly fewer shot attempts against than they do overall, though it is pretty close. Offensively, though, when Gaustad is on the ice significantly less offense is generated. If zone starts were the explanation one would expect more balance – more shot attempts against along with fewer shot attempts for – but this is not the case. The likely explanation is that when Gaustad is on the ice the team is largely focused on not giving up a goal rather than on generating offense. I suspect they do this largely by not giving up the puck and maintaining possession once they get it. When you take a shot you are actually giving up control of the puck. You may regain control, but so might the other team. If you are focused on preventing goals, the best way to do that is to not give up the puck.

Let’s take a quick look at Filip Forsberg, who has played with a heavy offensive zone start bias, indicating he is probably used in more offensive situations.

            CF/60   CA/60
Nashville    60.0    53.0
Forsberg     69.4    53.2

Forsberg’s CA/60 is actually very similar to the team average and not all that different from Gaustad’s (slightly higher, actually), but his CF/60 is almost 80% higher than Gaustad’s. Again, this is unlikely to be zone start influenced but rather some combination of talent and playing style.

So it seems that OZone% is likely an indication of style of play, or at least an indicator of the main objective of the players on the ice, and we have seen that this can have a major impact on shot attempt rates. I want to take this discussion one step further by looking at whether players can influence shooting/save percentages based on their style of play. Since shooting/save percentages are highly variable over small sample sizes, such as the number of shots for/against taken while a player is on the ice during a single season, we need ways to work around the randomness associated with the percentages. One way to do this is to group players based on similar attributes and take a group average. One of my favourite hockey analytics articles is this one by Tom Awad, in which he grouped similar players based on ice time and found that shooting better than your opponent is a major part of what makes good players good. In this case I have grouped players based on their OZone% and then taken a group average Sh%RelTM and Sv%RelTM in 5v5close situations.

OZone%     Sh%RelTM   Sv%RelTM
<30%        -0.92%     1.26%
30-35%      -0.43%     0.59%
35-40%      -0.38%     0.80%
40-45%      -0.18%    -0.03%
45-50%      -0.07%    -0.07%
50-55%       0.48%     0.10%
55-60%       0.50%    -0.16%
60-65%       0.52%     0.36%
65+%         0.24%    -1.07%

Graphically here is what we get.

[Chart: OZone% vs group-average Sh%RelTM and Sv%RelTM]

As you can see, there is a fairly strong relationship between zone starts and both Sh%RelTM and Sv%RelTM. Players with a heavy defensive zone start bias will generally have a positive impact on their team’s save percentage and a negative impact on their team’s shooting percentage. Conversely, players with a heavier offensive zone start bias will generally have a positive impact on their team’s shooting percentage and a negative impact on their team’s save percentage. Some of this is likely player talent, but a significant portion is likely driven by style of play, as we saw with Corsi. It is next to impossible to identify these relationships by looking at individual player statistics because of the small sample sizes, but when we group similar players together the relationship becomes clear, and it is a relatively strong one.
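The grouping used above can be sketched as follows: bin players by OZone% using the same bins as the table, then average Sh%RelTM within each bin to wash out single-season percentage noise. The player values below are made up for illustration.

```python
# Bin players by OZone% (matching the table's bins) and take the group
# average of Sh%RelTM within each bin. Sample data is hypothetical.
from collections import defaultdict

def bin_label(ozone_pct):
    """Map an OZone% value to the table's bin labels."""
    if ozone_pct < 30:
        return "<30%"
    if ozone_pct >= 65:
        return "65+%"
    lo = int(ozone_pct // 5 * 5)
    return f"{lo}-{lo + 5}%"

def group_average(players):
    """players: list of (ozone_pct, sh_reltm). Returns bin -> mean Sh%RelTM."""
    groups = defaultdict(list)
    for oz, sh in players:
        groups[bin_label(oz)].append(sh)
    return {b: sum(v) / len(v) for b, v in groups.items()}

players = [(21.2, -1.4), (28.0, -0.5), (52.0, 0.6), (53.9, 0.3)]
print(group_average(players))
```

The same grouping with sv_reltm in place of sh_reltm produces the second column of the table.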

For perspective, Paul Gaustad’s OZone% over the past three seasons with Nashville is 21.2%, while his Sh%RelTM is -1.4% and his Sv%RelTM is +1.9%.

The major takeaways I hope people get from this article are the following:

  1. Zone starts really do not have a significant impact on a player’s statistics.
  2. Zone starts can be an indicator of a player’s style of play, and style of play can have a major influence on a player’s statistics (see my Coaching/Corsi dilemma article for more evidence of how style of play impacts Corsi).
  3. Players are able to, through talent and/or playing style, influence save and shooting percentages.
  4. Finding trends in shooting/save percentages can be difficult due to small sample size issues, but that does not mean they do not exist. Hockey is a complex sport to analyze, but being creative in grouping similar players can allow you to pull out valuable information that you otherwise could not.