Mar 21 2015

The other day on twitter I was called out by Sam Ventura who does some great work on Specifically he did not like my article on zone starts that I wrote the other day.

Let me step in here and say that I have never denied this. Offensive zone face offs are more likely to result in shots for the team on offense and less likely for the team on defense. Ok, that is settled, let’s move on.


This is the crux of the problem. At the micro level, yes, the location of face offs impacts outcomes. At the macro or aggregate level the effects are minimal. I tried to explain that here in more detail but maybe it didn’t come across too well so I will try again, in another way, with the war-on-ice tools. Let’s look at the Shea Weber picture from Sam’s tweets above.


Ok, so there looks to be a relationship. The higher the offensive zone start percentage, the higher the CF%. Now, let’s take a look at the same chart but with offensive zone start percentage relative (ZSO%Rel) and see how the chart changes.


Significantly less correlation. Why? Because when the team is playing well the team as a whole generates more offensive zone starts. Not the other way around. We can also flip it around and look at how ZSO% compares to CF%Rel.


And to finish the display we can look at ZSO%Rel vs CF%Rel.


The relationship that Sam has observed is largely team driven, not driven by Weber’s zone starts. There is a zone start impact on a player’s statistics but it is very minimal and for the majority of players can safely be ignored. The impact of the team is far more important. When the team does well it will result in a better CF% which in turn results in a higher ZSO%, which is the reason for the high correlation. Zone starts don’t drive CF%, CF% drives zone starts. This makes total sense because the majority of zone starts will come after a shot on goal. The shot on goal produces the offensive zone face off, it isn’t the offensive zone face off that produces the shot on goal. We need to think of zone starts more as a result, not a cause.

On top of the team effect, I believe there is a style of play impact too which will take away even more correlation. When you play defensive hockey you often give up more shots. We see it in score effects all the time. Players who start more in the defensive zone are more likely to be the ones playing defensive hockey. This adds to the correlation as well and has nothing to do with zone starts.

Let me leave you with Phaneuf’s charts because his correlation in Sam’s charts was probably the greatest.





Again, a significant portion of the relationship disappears when you look at ZSO%Rel.

For me, the main evidence that zone starts don’t have a significant effect on a player’s overall statistics is that if I remove the 45 seconds after all offensive/defensive zone face offs (which basically ignores the entire shift) the majority of players have the same CF% +/- 1% and only a handful with heavy offensive or defensive zone starts see an effect in the +/- 1-2% range. If removing all shifts that start with an offensive or defensive zone start does not dramatically impact a player’s overall statistics you simply cannot conclude that zone start bias plays a prominent role in driving a player’s overall statistics. Yes, for a particular shift it will, but not overall. Furthermore, the majority of that impact occurs in the first 10 seconds after a face off, which is why my zone start adjusted data removes these 10 seconds, something I showed over 3 years ago.
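A minimal sketch of this kind of adjustment, assuming a simple list of timestamped events for one player. The event format, the `cf_pct` helper and the sample data are hypothetical and are not the data or code behind the numbers quoted above.

```python
from typing import List, Tuple

# Hypothetical event format: (time_in_seconds, kind, detail)
#   kind "faceoff":      detail is the zone, "O", "D" or "N"
#   kind "shot_attempt": detail is "for" or "against"
Event = Tuple[float, str, str]


def cf_pct(events: List[Event], exclude_window: float = 0.0) -> float:
    """CF% over shot attempts, ignoring any attempt taken less than
    `exclude_window` seconds after an offensive or defensive zone face off."""
    shots_for = shots_against = 0
    last_zone_faceoff = None  # time of the most recent O or D zone face off
    for time, kind, detail in sorted(events):
        if kind == "faceoff":
            # Neutral zone face offs do not open an exclusion window.
            if detail in ("O", "D"):
                last_zone_faceoff = time
        elif kind == "shot_attempt":
            in_window = (
                last_zone_faceoff is not None
                and time - last_zone_faceoff < exclude_window
            )
            if in_window:
                continue
            if detail == "for":
                shots_for += 1
            else:
                shots_against += 1
    total = shots_for + shots_against
    return 100.0 * shots_for / total if total else float("nan")


# Made-up events, for illustration only.
events = [
    (0, "faceoff", "D"),
    (4, "shot_attempt", "against"),
    (30, "shot_attempt", "for"),
    (60, "faceoff", "O"),
    (65, "shot_attempt", "for"),
    (80, "shot_attempt", "against"),
    (120, "faceoff", "N"),
    (125, "shot_attempt", "for"),
]

raw = cf_pct(events)                   # no adjustment
adjusted_10 = cf_pct(events, 10.0)     # drop the first 10 seconds
adjusted_45 = cf_pct(events, 45.0)     # drop the first 45 seconds
print(raw, adjusted_10, adjusted_45)
```

Running the same function with different window lengths over a full season of real events is what would let you compare the raw and adjusted CF% for each player.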

The critical point to remember in all of this is that shots drive where face offs occur; where face offs occur does not drive shots. Coaching and line changes for face offs can impact overall player statistics a little but really not all that much.


Zone Starts, Corsi, and the Percentages

Mar 16 2015

Matthew Coller has an interesting article on Puck Prospectus about Shea Weber and his poor Relative Corsi. His conclusion was that Weber’s poor Relative Corsi is largely due to his playing time with Paul Gaustad in which he posted a very poor CF% along with having a very heavy defensive zone start. His conclusion was that Weber’s poor Corsi with Gaustad is in a significant way caused by the heavy defensive zone start bias. This is a case of correlation not causation as I outlined in the comment section of that article. I recommend you take the time to read both the article and my comments because they are worthwhile reads.

My issue with the article is that I don’t believe that zone starts dramatically impact a player’s overall statistics as I explained here. I just haven’t seen any convincing evidence that zone starts would change a player’s CF% by much more than 1-2%, and for most players considering zone starts in player evaluation is not important. The relationship that Coller observed is important though because there is a clear relationship between zone starts and CF%. The relationship isn’t causal though. What the zone starts signify is a style of play. Players with a heavy defensive zone start bias are likely asked by the coach to play a defense first game and in many cases generating offense is not an important issue. The result is often a relatively minor deviation in a player’s CA/60 but a major deviation in a player’s CF/60 from the overall team stats. Let’s look at Paul Gaustad as an example. Gaustad has an OZone% of just 12.2% which means he has over seven times as many defensive zone starts as offensive zone starts. Here is how his Corsi stats compare to Nashville’s overall stats in 5v5close situations this season.

Team/Player | CF60 | CA60
Nashville | 60.0 | 53.0
Gaustad | 38.8 | 51.9

As you can see, despite a heavy defensive zone start bias, when Gaustad is on the ice the Predators actually give up slightly fewer shot attempts against than they do overall, but it is pretty close. Offensively though, when Gaustad is on the ice there is significantly less offense generated. If zone starts are the explanation one would probably expect there to be more balance between more shot attempts against and fewer shot attempts for, but this is not the case. The likely explanation is that when Gaustad is on the ice the team is largely focused on not giving up a goal rather than generating offense. I suspect they do this largely by not giving up the puck and maintaining possession once they have it. When you take a shot you are actually giving up control of the puck. You may regain control but so might the other team. If you are focused on preventing goals the best way to do that is to not give up the puck.

Let’s take a quick look at Filip Forsberg, who has played with a heavy offensive zone start bias, indicating he is probably used in more offensive situations.

Team/Player | CF60 | CA60
Nashville | 60.0 | 53.0
Forsberg | 69.4 | 53.2

Forsberg’s CA/60 is actually very similar to the team average and not all that different from Gaustad’s (higher actually) but his CF/60 is almost 80% higher than Gaustad’s. Again, this is unlikely to be zone start influenced but rather some combination of talent and playing style.
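To put the numbers from the two tables side by side, here is a short sketch that converts each CF60/CA60 pair into a CF% and turns Gaustad’s OZone% into a defensive-to-offensive zone start ratio. The helper names are made up for illustration, and the ratio assumes OZone% is offensive zone starts divided by offensive plus defensive zone starts, which is consistent with the "over seven times" figure above.

```python
def cf_pct(cf60: float, ca60: float) -> float:
    """Corsi for percentage from per-60 rates for and against."""
    return 100.0 * cf60 / (cf60 + ca60)


def dz_to_oz_ratio(ozone_pct: float) -> float:
    """Defensive zone starts per offensive zone start.

    Assumes OZone% = OZ starts / (OZ starts + DZ starts), in percent.
    """
    return (100.0 - ozone_pct) / ozone_pct


# (CF60, CA60) in 5v5close situations, taken from the tables above.
rates = {
    "Nashville": (60.0, 53.0),
    "Gaustad": (38.8, 51.9),
    "Forsberg": (69.4, 53.2),
}

for name, (cf60, ca60) in rates.items():
    print(f"{name:10s} CF60={cf60:5.1f} CA60={ca60:5.1f} CF%={cf_pct(cf60, ca60):.1f}")

gaustad_ratio = dz_to_oz_ratio(12.2)
forsberg_vs_gaustad = 100.0 * (rates["Forsberg"][0] / rates["Gaustad"][0] - 1.0)
print(f"Gaustad DZ:OZ starts = {gaustad_ratio:.1f} to 1")
print(f"Forsberg CF60 vs Gaustad CF60 = +{forsberg_vs_gaustad:.0f}%")
```

The ratio works out to roughly 7.2 defensive zone starts per offensive zone start, and the CF60 gap between the two players to roughly 79%, while their CA60 values differ by just over one attempt per 60 minutes.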

So, it seems that Ozone% is likely an indication of style of play, or at least an indicator of the main objective of the players on the ice, and we have seen that this can have a major impact on shot attempt rates.  I want to take this discussion one step further by looking at whether players can influence shooting/save percentages based on their style of play. Since shooting/save percentages are highly variable over small sample sizes such as the number of shots for/against taken while a player is on the ice during a single season we need to find ways to work around the randomness associated with the percentages. One way to do this is to group players based on similar attributes and take a group average. One of my favourite hockey analytics articles was this one written by Tom Awad in which he grouped similar players based on ice time and in doing so he found that shooting better than your opponent is a major factor in what makes good players good. In this case I have grouped players based on their OZone% and then took a group average Sh%RelTM and Sv%RelTM during 5v5close situations.

OZone% | Sh% RelTM | Sv% RelTM
<30% | -0.92% | 1.26%
30-35% | -0.43% | 0.59%
35-40% | -0.38% | 0.80%
40-45% | -0.18% | -0.03%
45-50% | -0.07% | -0.07%
50-55% | 0.48% | 0.10%
55-60% | 0.50% | -0.16%
60-65% | 0.52% | 0.36%
65+% | 0.24% | -1.07%

Graphically here is what we get.


As you can see, there is a fairly strong relationship between zone starts and Sh%RelTM and Sv%RelTM. Players with a heavy defensive zone start bias will generally have a positive impact on their team’s save percentage and a negative impact on their team’s shooting percentage. Conversely, players with a heavier offensive zone start bias will generally have a positive impact on their team’s shooting percentage and a negative impact on their team’s save percentage. Some of this is likely player talent but a significant portion of it is likely driven by style of play as we saw with Corsi. It is next to impossible to identify these relationships by looking at individual players’ statistics because of the small sample sizes, but when we group similar players together the relationship becomes clear and is a relatively strong one.
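The grouping step can be sketched as follows. This is only an illustration under stated assumptions: the player tuples are made up, the function names are hypothetical, and a plain unweighted mean is used within each bin, which may differ from how the group averages in the table were actually computed.

```python
from collections import defaultdict
from statistics import mean

# Lower edges of the OZone% bins, in percent (same bins as the table above).
BIN_EDGES = [30, 35, 40, 45, 50, 55, 60, 65]


def ozone_bin(ozone_pct: float) -> str:
    """Return the label of the OZone% bin a player falls into."""
    if ozone_pct < BIN_EDGES[0]:
        return "<30%"
    if ozone_pct >= BIN_EDGES[-1]:
        return "65+%"
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        if low <= ozone_pct < high:
            return f"{low}-{high}%"
    raise ValueError(f"unexpected OZone% value: {ozone_pct}")


def group_averages(players):
    """players: iterable of (ozone_pct, sh_rel_tm, sv_rel_tm) tuples.

    Returns {bin_label: (mean Sh%RelTM, mean Sv%RelTM)} using an
    unweighted mean within each bin (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for ozone_pct, sh_rel, sv_rel in players:
        groups[ozone_bin(ozone_pct)].append((sh_rel, sv_rel))
    return {
        label: (
            mean([sh for sh, _ in values]),
            mean([sv for _, sv in values]),
        )
        for label, values in groups.items()
    }


# Hypothetical players, NOT real data.
players = [
    (25.0, -1.0, 1.5),
    (28.0, -0.6, 0.9),
    (52.0, 0.4, 0.2),
    (53.0, 0.6, 0.0),
    (70.0, 0.2, -1.0),
]

averages = group_averages(players)
for label, (sh_rel, sv_rel) in averages.items():
    print(f"{label:7s} Sh%RelTM={sh_rel:+.2f} Sv%RelTM={sv_rel:+.2f}")
```

Averaging within bins trades away individual detail for a larger effective sample, which is the whole point when single-season on-ice percentages are this noisy.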

For perspective, Paul Gaustad’s OZone% over the past three seasons with Nashville is 21.2% while his Sh%RelTM is -1.4 and his Sv%RelTM is +1.9.

The major takeaways I hope people get from this article are the following:

  1. Zone starts really do not have a significant impact on a player’s statistics.
  2. Zone starts can be an indicator of a player’s style of play and style of play can have a major influence on a player’s statistics (see my Coaching/Corsi dilemma article for more evidence of how style of play impacts Corsi).
  3. Players are able to, through talent and/or playing style, influence save and shooting percentages.
  4. Finding trends in shooting/save percentages can be difficult due to small sample size issues but that does not mean they do not exist. Hockey is a complex sport to analyze but being creative in grouping similar players can allow you to pull out valuable information that you otherwise could not.




The Coaching-Corsi dilemma

Feb 24 2015

The other day I wrote about the Bozak-Corsi dilemma, which basically goes as follows:

  • The coaching change in Toronto from Carlyle to Horachek resulted in Tyler Bozak and the rest of the Leafs’ top line posting dramatically improved Corsi (5v5 tied CF%). Does this mean Bozak et al. suddenly got good, or does it mean that Corsi is largely driven by playing style, which one can change, and thus the value of Corsi in player evaluation is greatly minimized?

Today I will look at the rest of the Leafs’ 5v5 Tied CF% from Carlyle to Horachek as well as three other coaching changes that occurred during the 2011-12 season. Those are Bruce Boudreau to Dale Hunter in Washington, Randy Carlyle to Bruce Boudreau in Anaheim and Terry Murray to Darryl Sutter in Los Angeles. Expanding the analysis to more players/teams will determine whether the Bozak-Corsi dilemma can be expanded to a more general Corsi-Coaching dilemma. First, let’s summarize how the team 5v5 tied CF% changed due to these coaching changes.

Coaching Change | CF% Pre | CF% Post | Difference
Kings – Murray to Sutter | 50.7 | 58.5 | 7.8
Ducks – Carlyle to Boudreau | 42.6 | 48.7 | 6.1
Leafs – Carlyle to Horachek | 45.2 | 51.1 | 5.9
Capitals – Boudreau to Hunter | 56.7 | 48.1 | -8.6
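The Difference column is simply CF% Post minus CF% Pre, in percentage points. As a quick check, and to get the average size of the team-level swing, here is a small sketch using the numbers from the table (the variable names are illustrative only):

```python
# (CF% pre, CF% post) at 5v5 tied, from the table above.
changes = {
    "Kings - Murray to Sutter": (50.7, 58.5),
    "Ducks - Carlyle to Boudreau": (42.6, 48.7),
    "Leafs - Carlyle to Horachek": (45.2, 51.1),
    "Capitals - Boudreau to Hunter": (56.7, 48.1),
}

# Post minus pre, rounded to one decimal like the table.
diffs = {team: round(post - pre, 1) for team, (pre, post) in changes.items()}

# Average size of the swing regardless of direction.
mean_abs_change = sum(abs(d) for d in diffs.values()) / len(diffs)

for team, diff in diffs.items():
    print(f"{team:32s} {diff:+.1f}")
print(f"Mean absolute team change: {mean_abs_change:.1f}")
```

This is the team-level average only; individual players can move by more or less than their team did.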

The biggest positive impact was with the Kings while the Capitals’ change from Boudreau to Hunter had the biggest negative impact and the biggest change overall. All four coaching changes saw a significant impact on the team’s overall CF%. Let’s look at the teams in the order listed above, starting with the Kings.


Shown here are each player’s CF% under Murray (in blue) and under Sutter (in orange) with the difference shown in the grey bars. Shown are players with at least 50 5v5tied minutes under both coaches. The first four players saw their CF% jump by at least 10%, the next four by at least 7.5%, and 11 of the 14 players saw their CF% jump at least 5%. Only Drew Doughty saw his drop but he already had a team best 59.3 CF% under Murray.

Here is the chart for the Ducks.


Every single player saw their CF% jump at least a little after the coaching change. Visnovsky’s and Getzlaf’s CF% jumped at least 10% while Perry, Sbisa and Lydman jumped at least 7.5% and Selanne and Cogliano at least 5%.

Now for the Leafs.


JVR, Bozak, Rielly and Kessel saw at least a 10% boost in their CF% while Polak and Gardiner were at least 8%. No other player saw a jump of more than 3% and Komarov actually has a significant (10.8%) drop off.

Now, for a reversal of fortunes here is the Capitals chart.


For the Capitals every single player saw at least a drop of 3.9% (in fact only Wideman saw a drop off of less than 5%) with the first 6 guys seeing a drop of at least 10% and three more at least 9%.

In total there are 53 players in the charts above, 17 of them saw an absolute change of at least 10% while another 13 saw an absolute change of at least 7.5%. The average change was just shy of 8%. By looking at these four coaching changes it is safe to say that it is not unusual for a coaching change, or a change in playing style, to impact a player’s 5v5 CF% by 10% or more (nearly one third of the players above saw that big of a change). If a normal range for 5v5tied CF% is between 40% and 60% I think it is safe to suggest that half or more of that spread might be due to playing style and not individual talent. Furthermore there are almost certainly different playing styles on a single team (some lines certainly play more defensive roles while others play more offensive roles) so even looking at CorsiRel stats might not factor out all coaching decisions. It certainly appears that Kessel-Bozak-JVR have seen a far more significant boost in their CF% relative to the rest of the team, indicating that they likely changed their playing style the most.

Above I looked at four coaching changes which had an average absolute impact of an 8% change on 5v5tied CF%, with nearly one third of the players having an absolute change of greater than 10%. The majority of NHL players end a season with a 5v5tied CF% of between 40% and 60%. Based on the above analysis it is probably reasonable to believe that at least half of that spread can be attributed to variations in coaching/playing style, which means the actual talent spread is probably no more than 45% to 55%, possibly even less.

Furthermore, I have previously shown that FF% (and thus likely CF%) loses predictive ability over longer periods of time at the team level. A significant reason for this is likely the higher number of coaching and roster changes that occur over a 4 or 5 year span. Every coaching change and every time a player changes teams (or even the line they play on) can potentially lead to a playing style change which could impact their CF% significantly. Of course, none of this should really come as much of a surprise as we already know playing style can have a major impact on CF% because we know all about score effects. On average a team’s 5v5 CF% when they are trailing is about 10% higher than their 5v5 CF% when they are leading. This 10% difference in CF% due solely to playing style dictated by the score lines up fairly well with what we have seen above, where a 10% change in CF% due to a coaching change is not abnormal. The Corsi-Coaching Dilemma is real.

What all this means is that we need to consider playing style when we evaluate players because playing style can have a major impact on a player’s statistics. In fact, it may be the most important factor in a player’s Corsi statistics. This is something that we rarely do in analytics but failure to do so could result in a very flawed player evaluation. This is something the hockey analytics community really needs to address in future research.


The Bozak-Corsi Dilemma

Feb 22 2015

(Note: This is a cross post with You can find the original article here. I don’t normally cross post but this is relevant to Hockey Analytics as a whole, not mostly to Maple Leaf fans.)

A significant portion of modern hockey analytics revolves around Corsi (or SAT% as defined by the NHL), which is really nothing more than looking at which team takes more shot attempts. If you can out shoot your opponent, the theory is that it goes a long way to driving success in terms of out scoring your opponent and ultimately winning games. There is a lot of evidence to support the case that Corsi is a major component of on-ice success. While I believe many people put too much weight on Corsi statistics, I do accept that it is a major component of success.

Over the past few weeks, I have looked at the Leafs’ performance this season under Randy Carlyle and under Peter Horachek. First I looked at how zone start usage has changed from Carlyle to Horachek and the impact of those changes on Corsi. Last week, I looked at a WOWY analysis of Tyler Bozak and David Booth to see if the change in linemates from Carlyle to Horachek accounted for the changes in results. The conclusion from these posts is that a significant portion of the Leafs’ improved Corsi statistics is driven by the Leafs’ top line, and that outside of the top line not a lot has changed with respect to their Corsi statistics. To highlight the improvement in the Leafs’ top line, here are their 5v5tied CF% statistics under Carlyle and under Horachek.

Coach | Bozak CF% | Kessel CF% | JVR CF%
Under Carlyle | 38.4 | 41.0 | 39.0
Under Horachek | 53.6 | 52.0 | 55.0
Difference | 15.2 | 11.0 | 16.0

Under Carlyle, the trio of Bozak, Kessel and JVR were pretty close to a league-worst Corsi line, with Bozak being the worst of the three. Under Horachek, they are well above the break even 50.0% line and have put up pretty good Corsi percentages. As far as Corsi is concerned, this trio went from downright awful to well above average. All it took was, I presume, a playing style change demanded by a new coach.

For several years it has been believed that Corsi is an important tool in evaluating players. It was a major component of what has driven the analytics community to conclude that Bozak is a poor hockey player. The evidence above suggests that a simple playing style change can drive Corsi from downright terrible to pretty good. This leads to a bit of a dilemma within hockey analytics, which I will call the Bozak-Corsi dilemma, with two serious questions that need to be answered:

  1. Is Bozak now a pretty good player?
  2. More importantly, if a player (or a forward line) can dramatically alter their Corsi overnight seemingly solely through changing playing style (driven by a coaching change), it must be concluded that Corsi is not primarily driven by individual player talent.

The first point will provoke some angst within the Leafs fan base, but from my perspective the answer is no because his (goal) WOWYs, Points/60, IPP, etc. are also pretty weak, although maybe he isn’t as bad as previously thought if he plays an optimal playing style.

The second point is critically important, though, because it basically implies that Corsi has significantly less value (maybe little or no value) in individual player evaluation than previously thought, which should send ripples throughout the hockey analytics community. If Corsi is largely driven by playing style, one must conclude it isn’t an individual skill? It isn’t something I’d conclude based on three players, but it definitely makes you think about it more.


Feb 21 2015

I wasn’t actually planning on writing anything formal about the new enhanced hockey stats on but this post over at Jewels From The Crown was kind of the last straw.

Before getting into that article let me say a few things. Despite the fact that I run a popular hockey stats site I really wanted to see the NHL do a good job on their advanced hockey stats site. I honestly don’t see them as a competitor nor do I really care if they are because I make no money off the site and my interest lies as much in analysis and research as it does in producing and running a website. I also see the site being more geared to the average, more casual user while my site is geared more towards the hard core user and researcher. I love hockey, I love hockey statistics and hockey analytics, and I really would have loved to see the NHL do this right to bring this to a wider audience than I, or any of the other hockey stats sites, ever could. While I still have that hope, my thought on this first attempt is that it is a very poor effort that could have gone much better.

So, now, what set me off on this bit of a tirade? Well, the post at Jewels From the Crown featured an interview with Chris Foster, NHL Director of Digital Business Development, and Gary Bettman. In it Sheng Peng asked about what ‘exclusive’ stats that offers over other sites such as mine. This was a portion of the answer.

Foster: I’ve got to double check. I’m not sure. There’s zone starts, I think those are completely brand-new. And the level of depth that we’re doing with primary and secondary assists. I don’t think anybody’s going to have that much detail. That first batch—shot attempts and unblocked shot attempts–there’s a lot of that. It’s that second batch of stats—primary assists and penalties drawn over time—those are the ones that will be more unique to the site. They may be out there but not to the level of depth that will be on

Zone starts? Brand-new? Really? Zone starts have been around for years. Primary assists are exclusive to Really? I’ve had them on my site for years too. I even go a step further and look at primary points (goals + primary assists). Penalties drawn has been around elsewhere too. I’ll give the NHL the benefit of the doubt and believe that they are actually oblivious to what else is being done out there because otherwise they are outright misleading and belittling the hard work that I and many others have done previously. Looking at their enhanced statistics site it is clear that they haven’t really put much thought into this whole project or reached out to the analytics community because there are tons of things that I think many would suggest they do differently. Here are a few examples:

  1. SAT and USAT are short for Shot ATtempts and Unblocked Shot ATtempts, otherwise known as Corsi and Fenwick respectively. I am OK with the name change but for the NHL’s target audience it is absolutely unnecessary to use both. Even I, as an analytics person, at times wonder why we have both Fenwick and Corsi. They are extremely highly correlated and the benefits of one over the other are generally very minimal. For the casual user it is completely unnecessary to burden them with these two separate stats. It would have been far better to simply use shot attempts (Corsi) and leave out the unblocked variety. Shot attempts are simple, straightforward, and easy to understand.
  2. The Skater Shooting/Time on Ice stats have both /20 and /60 statistics which is redundant and pointless. One is just 3 times the other. Why one would see the need to present both side by side on the same page is beyond me. Let’s present a stat. Ok, now let’s multiply it by three and present that too. Who thinks like that? Really? Who? Furthermore, when I started my site I used /20 stats because I figured a good player plays about 20 minutes per game. Other sites used /60 because a game is 60 minutes long and it tells how that caliber of player would produce in a full game. Both have merit but for the purpose of consistency across I have converted all my stats to /60. Had they reached out to me I’d have told them this and they may very well have done the smart thing and just presented /60 stats.
  3. Having a stats site and not being able to filter based on games played or time on ice is practically useless. When I sort by SAT% I want to see who the best players are who play regular or semi-regular shifts. Instead the top of the list is dominated by AHL call-ups with one or two games that nobody has heard of and nobody cares about. Why do this?
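Points 2 and 3 are easy to illustrate. Below is a minimal sketch using made-up skaters; the `Skater` record, the helper functions and the 200 minute threshold are all hypothetical and are not taken from any actual stats site.

```python
from typing import List, NamedTuple


class Skater(NamedTuple):
    name: str
    toi_minutes: float  # time on ice
    sat_for: int        # shot attempts for while on the ice
    sat_against: int    # shot attempts against while on the ice


def per_60(count: float, toi_minutes: float) -> float:
    return 60.0 * count / toi_minutes


def per_20(count: float, toi_minutes: float) -> float:
    return 20.0 * count / toi_minutes


def sat_pct(skater: Skater) -> float:
    return 100.0 * skater.sat_for / (skater.sat_for + skater.sat_against)


def leaderboard(skaters: List[Skater], min_toi: float = 0.0) -> List[Skater]:
    """Sort by SAT%, best first, keeping only skaters with enough ice time."""
    eligible = [s for s in skaters if s.toi_minutes >= min_toi]
    return sorted(eligible, key=sat_pct, reverse=True)


# Made-up skaters, for illustration only.
skaters = [
    Skater("Call-up A", 12.0, 9, 3),
    Skater("Regular B", 900.0, 880, 720),
    Skater("Regular C", 1100.0, 950, 1050),
]

# Point 2: the /60 rate carries no information beyond the /20 rate.
rate_60 = per_60(9, 12.0)
rate_20 = per_20(9, 12.0)
print(rate_60, rate_20, rate_60 / rate_20)

# Point 3: without a minimum TOI filter the call-up tops the list.
print([s.name for s in leaderboard(skaters)])
print([s.name for s in leaderboard(skaters, min_toi=200.0)])
```

With no filter the 12 minute call-up leads the list; with a minimum time on ice the regulars are ranked against each other, which is what most people sorting by SAT% actually want to see.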

There are numerous other smaller mistakes as well (see Eric Tulsky’s twitter time line for a few of them). It’s a shame really because I was hoping for and expecting a whole lot better. I applaud the NHL for hopping on the ‘enhanced statistics’ bandwagon but what they released today screams of a poorly thought out beta release of a product developed by a group of amateurs, a long way from the major new product release from a multi-billion dollar organization (NHL) backed up by another multi-billion dollar organization (SAP) which they promoted it as being.

I really do hope that the NHL gets their act together and makes it work as I think it will be good for everyone: the NHL, the casual fan, and those in the hockey analytics community. We all benefit when the NHL does things well. We are, at the core, all hockey fans. My hope with this post is that it inspires the NHL to spend more time reaching out to the people that have been doing this for years. We have years of experience, knowledge and expertise that would have helped avoid many of the basic and senseless missteps we see today. If you are with the NHL and are reading this I want you to know that I am more than willing to share my experience and I am certain most everyone in the hockey analytics community would as well. You just have to ask. My e-mail is

Stat Site Upgrades

Feb 02 2015

Some of these have been announced on twitter but I have recently made some upgrades to and Here is a list of the upgrades.

New Situations:

  • Home and Road for 5v5 Tied, 5v5 Close, 5v5 Leading and 5v5 Trailing
  • 4v4
  • All Situations (all play)
  • All Power play (includes 5v4, 5v3, 4v3)
  • All Short handed (includes 4v5, 3v5, 3v4)

Multi-year stats with current season

  • 2013-15 (2yr), 2012-15 (3yr) and 2011-15 (4yr) stats have been added
  • Multi-year stats up to 2007-15 (8yr) will be added in the off season – too much data to update nightly.

 WOWY Zone Starts ( only)

  • WOWYs now include OZFO%, DZFO%, NZFO% and OZone% (tweaked UI a bit from initial release yesterday)
  • WOWYs also now /60 instead of /20 as were previously (now consistent with rest of site and
  • “Against You” stats now available for current season (currently only opponents with >15min TOI against but this will drop to 5 min. after update tonight)

Percent of Team Zone Starts ( only)

  • Now available is the percent of a team’s offensive/defensive/neutral zone starts (only in games the player played in) that the player was on the ice for.

Various bug and data fixes

  • Fixed issue with Percent of Team stats for special team situations
  • Manually fixed a bunch of errors in shift tables over past 4 seasons (should improve reliability of data)
  • Probably some others I have forgotten about


That’s all for now. As usual, if you find any problems or have any more requests for enhancements let me know.


The Value of Outliers

Jan 25 2015

Ryan Stimson has been doing some valuable work tracking passes and this morning he posted an interesting analysis of the data he (and others) have collected thus far. It is a very interesting article and definitely worth a read. It is a valuable contribution to shot quality research but the article created some twitter discussion regarding one of the techniques that Ryan used. In particular, when Stimson was looking at the correlation between two variables (i.e. passing ability vs shooting percentage) he noticed that there was often an outlier team and he would subsequently look at the correlation between the two variables while eliminating the outlier team. This technique of removing outliers generated a bit of a backlash on twitter from @garik16 as it did when I used this technique not long ago.
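To make the technique concrete, here is a minimal sketch of computing a correlation with and without a single outlier team. The team labels and values are invented so that the effect is obvious; they are not Stimson’s data, and the function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_without(
    data: Dict[str, Tuple[float, float]], excluded: Iterable[str] = ()
) -> float:
    """Correlation across teams after dropping the teams in `excluded`."""
    dropped = set(excluded)
    kept = [pair for team, pair in data.items() if team not in dropped]
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return pearson(xs, ys)


# Invented (metric, shooting %) pairs: five teams on a clean downward
# trend plus one team, "F", that is high on both.
teams = {
    "A": (1.0, 9.0),
    "B": (2.0, 8.5),
    "C": (3.0, 8.0),
    "D": (4.0, 7.5),
    "E": (5.0, 7.0),
    "F": (6.0, 10.0),
}

r_all = correlation_without(teams)
r_no_outlier = correlation_without(teams, excluded=["F"])
print(f"all teams: r = {r_all:.2f}")
print(f"without F: r = {r_no_outlier:.2f}")
```

In this contrived example one team is enough to wipe out an otherwise perfect negative relationship. Reporting both numbers, and then asking what is different about team F, is the kind of analysis in question.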

While I think that removing outliers has to be done with great caution and consideration, it is also important to acknowledge that outlier analysis can be an incredibly valuable tool in understanding what is going on. Teams aren’t built randomly and talent isn’t evenly distributed across the league. Talent differences across teams may result in different statistical patterns across teams. Different organizations have different philosophies on players and playing styles and this too may impact statistical patterns. As I have said before, we know that teams can manipulate statistical patterns by changing their playing style based on the score of the game (score effects are a well researched and fully accepted concept in hockey analytics) so it isn’t difficult to envision that various other statistical patterns could be altered by organizational or coaching philosophies. As statistical analysts we have to be open to this and not just apply a statistical model, crank out the results, and settle on hard and fast conclusions. We need to spend the time to understand the underlying data too.

I have spent a significant portion of my career working on air pollution research with some world-renowned scientists. Many years ago, not long after I finished university and was just embarking on my career, I was conducting some research on the relationship between weather patterns and air pollution. While doing this research a research scientist that I highly respect told me that often the most interesting things can be learned when we study outliers. In this area of research typical weather patterns resulted in typical pollution levels, but the study of outliers (atypical weather patterns) can really highlight the intricate relationship between weather patterns and air pollution.

Hockey isn’t baseball where there are a series of one-on-one battles that can be relatively easily incorporated into a statistical model because the only real factors involved are the talent levels of each player in the one-on-one battle. Unfortunately this isn’t how hockey works. Hockey is more like weather patterns where everything is interdependent on everything else and thus is very difficult to model. Sure, there are prevailing weather norms but occasionally outlier events happen like hurricanes or blizzards. It is these outliers that are the most interesting and most researched weather phenomena. Compared to a hurricane or a blizzard nobody really cares much about another 80F sunny day in Miami or a -5C January day in Ottawa. It’s just another day.

So, when I see someone suggest that you shouldn’t investigate how outliers affect underlying trends I get a bit defensive. If all you care about is what normally happens you’ll never truly understand the most interesting stuff. No NHL team strives to be ordinary, they strive to be elite and being elite, by definition, means being an outlier. If you want to be an outlier, you ought to do everything you can to understand what makes an outlier an outlier.

In one of Stimson’s charts he identified Chicago as the outlier team. Interestingly, I identified Chicago as an outlier team in my study on the relationship between Corsi and shooting percentage because they are one of the few teams that can post a good Corsi and an elevated shooting percentage. Furthermore, when it comes to elite NHL teams, Chicago would be front and center in the discussion. Is this a coincidence? Maybe. Or maybe it isn’t. It could be luck, it could be skill, or it could be organizational philosophy and/or coaching tactics but understanding why outliers exist is of critical importance. (Note: This is where I see the convergence of hockey analytics with traditional ‘hockey people’ like coaches and scouts. Analytics can identify trends and outliers to those trends and coaches and scouts can help assess the reason why those trends and outliers occur.)

Ultimately, for any NHL franchise that strives to be an elite team (which they all should) it means they are striving to be an outlier. Without understanding what makes an outlier, how can you expect to be one? And you’ll only understand what makes an outlier by studying outliers independently from the underlying typical trend. This needs to be done with caution and care so as not to just reinforce preconceived beliefs, but by not doing outlier analysis you are not fully understanding what is happening.


Jan 202015

On the weekend I posted an article looking at the relationship between Corsi and Shooting percentage and suggested that good Corsi teams are often poor Shooting Percentage teams and that there is generally a negative correlation between Corsi and Shooting percentage. This relationship seems to hold for most teams except for the elite teams or the truly bad teams. Yesterday over at I looked at this relationship just prior to, during, and just after the Randy Carlyle coaching era and it seemed to hold true (to some extent) for the Leafs during that time period.

These kinds of relationships sometimes bring on a negative reaction among those familiar with hockey analytics, in particular those who believe strongly in possession and Corsi. I sometimes wonder why, because we see this relationship all the time with score effects, and score effects are a well-known and accepted concept in hockey analytics. Let’s recall what score effects are:

  • When a team is leading they will generally give up more shots and take fewer (resulting in a depressed Corsi) but generally the shots given up are of lower quality resulting in higher save percentage and the shots taken are of higher quality resulting in a higher shooting percentage.

So, due to some difference in playing style, when a team is leading they will see a drop in their Corsi and a boost in their shooting percentage. This is the exact same thing as the negative correlation I am observing in these articles. Why people find it hard to accept here but accept score effects is beyond me but some people have trouble with this. In any event, I want to take a look at how the relationship between Corsi (CF%) and shooting percentage has changed over the course of the season for the four teams that have made a coaching change thus far – Senators, Oilers, Devils and Leafs. Let’s look at these teams in reverse order and start with the Leafs first because I have already discussed them in the article and I’ll leave the Senators to last since they have the most interesting results. So, with that said, here is the 5v5 CF% vs Sh% chart for the Maple Leafs this season.


The black line indicates the time of the coaching change and what you see are the rolling averages over a 500 corsi event (for + against) sample. The correlation between these two is -0.20 so we do see a negative correlation. What we also see is that the Leafs CF% was actually rising under Carlyle prior to him being fired and the shooting percentage had already started falling off as well.
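For readers who want to reproduce this kind of chart, here is a minimal sketch of the rolling calculation. It assumes you have one row per 5v5 Corsi event in chronological order, with three flags per event: whether it was a shot attempt for the team, whether it was on goal, and whether it was a goal. The function name, the flag layout and the demo data are all hypothetical (the demo events are randomly generated, not real NHL data), and Sh% is taken here as goals for divided by shots on goal for within each window.

```python
import numpy as np

def rolling_cf_and_sh(is_for, on_goal, is_goal, window=500):
    """Rolling CF% and Sh% over a fixed window of Corsi events (for + against)."""
    is_for = np.asarray(is_for, dtype=float)
    shots_for = is_for * np.asarray(on_goal, dtype=float)   # shots on goal for
    goals_for = is_for * np.asarray(is_goal, dtype=float)   # goals for

    def window_sums(x):
        # Sum of every run of `window` consecutive events, via a cumulative sum.
        c = np.concatenate(([0.0], np.cumsum(x)))
        return c[window:] - c[:-window]

    cf_pct = 100.0 * window_sums(is_for) / window
    shots = window_sums(shots_for)
    goals = window_sums(goals_for)
    sh_pct = np.full_like(shots, np.nan)          # NaN where no shots on goal
    np.divide(100.0 * goals, shots, out=sh_pct, where=shots > 0)
    return cf_pct, sh_pct

# Synthetic demo events (made up purely to exercise the code).
rng = np.random.default_rng(0)
n = 3000
demo_for = rng.random(n) < 0.48
demo_on_goal = rng.random(n) < 0.55
demo_goal = demo_on_goal & (rng.random(n) < 0.08)

cf, sh = rolling_cf_and_sh(demo_for, demo_on_goal, demo_goal, window=500)
valid = ~np.isnan(sh)
r = np.corrcoef(cf[valid], sh[valid])[0, 1]
print(f"correlation between rolling CF% and rolling Sh%: {r:.2f}")
```

Keep in mind that consecutive rolling windows overlap almost entirely, so the points in the series are far from independent and the correlation coefficient should be read as descriptive only.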

How about the New Jersey Devils?


The correlation between CF% and Sh% for the Devils is -0.38, or a fair bit stronger than for the Leafs. The Devils have been on a run of much improved shooting percentage recently but that has corresponded with the lowest CF% levels of the season. While Sh% seemed to be on the rise prior to the coaching change it did jump up a bit more after the coaching change though has dropped back the last little bit. Overall the highest shooting percentages on the season have occurred after the coaching change which is also when the Devils have had their worst CF%. Surprisingly, the Devils might be one of the worst possession teams in the league right now.

And the Oilers?


The negative correlation is quite strong here as the correlation coefficient is -0.795. Early in the season the Oilers had a low CF% and a higher shooting percentage, which then reversed into a higher CF% and a lower shooting percentage before both converged in the middle just prior to the coaching change. After the coaching change the Oilers’ CF% dropped to season lows while shooting percentage jumped back to early season highs (though it has fallen off in recent games).

For the Leafs, Devils and Oilers it is difficult to say that their coaching changes have had a major impact thus far (maybe for the Leafs but it is too early to tell) as it seems for all teams their post coaching change trends appear to have actually started just prior to the coaching change. Everything is different for the Senators.


Unlike the three other teams, the coaching change in Ottawa appears to have had a significant positive impact, as both their CF% and their shooting percentage have increased dramatically from where they were just prior to the coaching change. When you see stuff like this you really wonder if this is in fact one of those instances where the coach (in this case Paul MacLean) really did lose the confidence of his players. This surge in both CF% and shooting percentage means the two statistics are positively correlated over the course of the season, with a correlation coefficient of 0.30.

In the future I’ll maybe take a look at a few other coaching changes from past seasons (i.e. Pittsburgh hiring Bylsma, Anaheim hiring Boudreau) to see how they looked and I might also take a look at save percentages as well. So far though all evidence points to the existence of a negative correlation between CF% and Sh% though there are also some exceptions to that rule like the Ottawa Senators after their coaching change.

Jan 172015

Shot quality as a talent at the team or player on-ice level has long been a topic of great debate and I outlined some of that debate in an article I wrote earlier in the week. Those who don’t believe that shot quality is a significant factor in performance put a lot of stock in possession metrics such as Fenwick or Corsi. These are shot attempt based metrics and as such ignore shot quality altogether. Those, like myself, who believe shot quality matters (at least for some teams and especially some players) consider a possession based analysis a (potentially) incomplete analysis. Today I am going to put that debate aside and ask the question, is there any relationship between possession and shooting percentage?

To answer this question I took a look at CF% and CSh% (Corsi shooting percentage = GF/CF) for all 30 teams over the previous 3 seasons combined in 5v5close situations. When I plot these, here is what I get.


Ok, so while there seems to be some correlation it really isn’t all that significant. You might be inclined to end the investigation right here and conclude that there is no relationship, but when you actually look at the data you will find that of the 10 best CSh% teams 8 are sub-50 CF% teams and of the 10 worst CSh% teams 6 are better than 50 CF% teams. The two top CSh% teams that have CF% above 50% are Chicago and Pittsburgh, two teams with elite level talent. The four bad CSh% teams that have a CF% below 50 are Florida, Carolina, Minnesota and Buffalo. Of those teams, Florida, Carolina and Buffalo have combined for just one playoff appearance over the past 3 seasons.

So, it appears that the teams that break the trend of good CSh% equals poor CF% and poor CSh% equals good CF% are the truly good or truly bad teams or, for better terminology, we could call them outlier teams. What if we attempted to remove the really good and really bad outlier teams from our analysis and focus on the teams that are more typical in terms of talent? To do this in an unbiased way I used GF% to rank teams and I removed the top 4 and bottom 4 GF% teams (8 total, or just over a quarter of the teams were removed). This is what the chart looks like now.


Now that looks better. R^2 has jumped from 0.09 to 0.47 and there is a clear negative relationship between possession and corsi shooting percentage. For the record the teams that were removed were Boston, Anaheim, Chicago, Pittsburgh, Calgary, Buffalo, Edmonton, and Florida.
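The trimming step is easy to express in code. The sketch below is one way to do it, not necessarily how the charts above were produced: rank teams by GF%, drop the same number from the top and the bottom, then fit a straight line of CSh% against CF% on what remains. The function name and the demo team numbers are hypothetical (the demo values are randomly generated, not the real 3-year data).

```python
import numpy as np

def trimmed_fit(cf_pct, csh_pct, gf_pct, drop_each_side=0):
    """Fit CSh% ~ CF% after removing the best and worst teams by GF%.

    Returns (slope, intercept, r_squared) for the remaining teams.
    """
    cf = np.asarray(cf_pct, dtype=float)
    csh = np.asarray(csh_pct, dtype=float)
    gf = np.asarray(gf_pct, dtype=float)

    order = np.argsort(gf)                                   # worst GF% first
    keep = order[drop_each_side:len(order) - drop_each_side]  # middle teams only

    slope, intercept = np.polyfit(cf[keep], csh[keep], 1)
    r = np.corrcoef(cf[keep], csh[keep])[0, 1]
    return slope, intercept, r ** 2

# Synthetic 30-team league (made-up values, only to show the call pattern).
rng = np.random.default_rng(1)
demo_cf = rng.normal(50.0, 3.0, 30)
demo_csh = 4.0 - 0.1 * (demo_cf - 50.0) + rng.normal(0.0, 0.3, 30)
demo_gf = 50.0 + 0.5 * (demo_cf - 50.0) + 5.0 * (demo_csh - 4.0)

for k in (0, 4, 6):
    slope, intercept, r2 = trimmed_fit(demo_cf, demo_csh, demo_gf, drop_each_side=k)
    print(f"drop {k} per side: n={30 - 2 * k}, slope={slope:.3f}, R^2={r2:.2f}")
```

Ranking on GF% rather than on CF% or CSh% directly is what keeps the trimming from simply cutting off the ends of either axis being plotted.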

Out of curiosity I took this one step further and removed the next two best (St. Louis, Detroit) and two worst (NY Islanders, Minnesota) teams and got the following chart.


Wow, R^2 jumps all the way to 0.77, which is a very strong correlation. It indicates that for a large number of non-elite, non-terrible teams there is a strong negative correlation between possession and shooting percentage, such that the difference between a 45% and a 55% possession team is a 1.22% hit to CSh%. Considering that last season the average team had about 2200 5v5close Corsi For events, that would equate to a difference of about 27 goals. Considering the average NHL team had 90 5v5close goals last season, that is not an insignificant number.
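The goal estimate is simple arithmetic, spelled out here as a small sketch. The only inputs are the figures quoted in the paragraph above (a 1.22% CSh% gap, about 2200 Corsi For events, 90 goals); the function name is just for illustration.

```python
def goal_swing(csh_gap_pct, corsi_for_events):
    """Goals gained or lost when CSh% shifts by csh_gap_pct percentage points."""
    return corsi_for_events * csh_gap_pct / 100.0

swing = goal_swing(1.22, 2200)   # roughly 27 goals
share = swing / 90               # relative to an average team's 90 5v5close goals

print(f"goal swing: {swing:.1f}")
print(f"share of an average team's 5v5close goals: {share:.0%}")
```

Put another way, the swing is close to 30% of an average team’s 5v5close goal total.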

How does the R^2 hold up for this season? Well, if we include all teams the R^2 is 0.00 or absolutely no correlation. If we delete the top 4 and bottom 4 GF% teams it improves to 0.097. If we drop the top 6 and bottom 6 it jumps to 0.26 and if we drop the top 7 and bottom 7 teams and just focus on the middle 16 the R^2 jumps up to 0.35. Now these correlations are not nearly as good as the 3-year analysis above but remember that our sample sizes are significantly smaller too (~43-45 games compared to 212 games). The general trend still continues. If we remove the really good and really bad outlier teams there appears to be a relatively strong negative relationship between possession and shooting percentage.

Now that we have identified a relationship, one thing we can do is look at how teams have changed from last season to this season. Let’s take the Edmonton Oilers as an example since they have improved their 5v5close CF% quite significantly this season but they are not an improved team. Let’s look at their numbers from last season and this season.

Season      CF%    CSh%
2014-15     48.7   3.49
2013-14     43.4   4.38
Difference   5.3  -0.89

So, their 5v5close CF% has improved from 43.4% to 48.7%. If we plug that 5.3% improvement into the regression equation above, we would expect their CSh% to drop 0.65%, whereas it actually dropped 0.89%. Edmonton dropped from 11th in CSh% last season to 27th this season.
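Here is the Edmonton calculation as a short sketch. The slope is an assumption derived from the statement that a 10 point CF% gap (45% vs 55%) corresponds to a 1.22% CSh% hit, which works out to -0.122 CSh% per point of CF%; the actual regression equation is not given in the text, and the function name is hypothetical.

```python
# Assumed slope: 1.22 points of CSh% lost per 10 points of CF% gained.
SLOPE_PER_CF_POINT = -1.22 / 10.0

def expected_csh_change(cf_before, cf_after, slope=SLOPE_PER_CF_POINT):
    """Expected change in CSh% implied by a change in CF% under the linear fit."""
    return slope * (cf_after - cf_before)

expected = expected_csh_change(43.4, 48.7)   # about -0.65
actual = 3.49 - 4.38                         # -0.89, from the table above

print(f"expected CSh% change: {expected:.2f}")
print(f"actual CSh% change:   {actual:.2f}")
print(f"unexplained by the trend: {actual - expected:.2f}")
```

Only a change in slope is needed here, so the intercept of the regression drops out of the comparison.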

A couple of months ago I investigated the relationship between Corsi Against rates and save percentage and found that there does appear to be a relationship such that an increase in Corsi against would result in an improved save percentage. This is completely consistent with the analysis above, from which one could infer that an increase in shot attempts correlates with a decrease in shooting percentage.

It is difficult to say whether these correlations are due to systems or talent but I have a couple theories.

  1. Good possession teams play in the offensive zone more frequently and the defensive zone less frequently. This could result in a shot type bias away from higher quality “rush shots” and towards lower quality zone play shots.
  2. It could be related to style of play and passing. It has been shown that shots after passes are more likely to result in goals, and that lateral movement, especially passes, across the “Royal Road” down the center of the ice also results in more goals. My theory is that passing, and in particular passing through the center of the ice, while more likely to result in a goal is also more likely to result in a turnover. Thus teams that take riskier, longer passes, especially lateral passes, are more likely to see plays result in a goal if successful or a turnover (and no shot from that possession) if unsuccessful. Conversely, a more conservative passing team with fewer cross-ice passes through traffic would have fewer possessions that end without a shot, but in turn would not get rewarded with the high quality shots that result from those risky cross-ice plays.

In conclusion, if you have exceptional talent such as Pittsburgh with Crosby and Malkin or Chicago with Kane and Toews, or exceptional depth like Boston or Detroit, you might be able to be a good possession and a good shooting percentage team, but if you are not one of the truly elite teams in the league it seems you likely have to choose one or the other. Unless of course you are Buffalo and you are terrible at both.


Update: Tyler Dellow, in one of his few hockey related tweets since being hired by the Oilers, tweeted the following:

Tyler is right. Things fall apart for earlier seasons. Let’s look at this in more detail by looking at R^2 between CF% and CSh% for individual seasons for all teams, middle 26, 22, and 18 GF% teams. Here is what we have:


All of the above relationships are negative, meaning improved CF% led to decreased CSh%, so it is very difficult to argue that this relationship isn’t real. More shots tend to mean lower shot quality.

Additionally, for 5 of the 7 seasons the middle 22 are better than the middle 26 which is better than all 30 teams (only 2007-08 and 2010-11 do not fit) and of those 5 seasons, four of them also have the middle 18 teams being better than the middle 22 (only 2011-12 is worse). This implies that there may be a few truly elite teams that can post a good CF% and a good CSh% and a few truly terrible teams that put up bad CF% and bad CSh% but for the mass of teams in the middle the trend holds.

Finally, the strongest relationships have occurred during the previous few seasons after removing the outlier teams from the sample, and from above 2014-15 appears to be following that trend as well. It is difficult to say why this is but it is an interesting observation. One has to wonder if it has anything to do with teams becoming more aware of and putting more focus on possession, which in turn is strengthening the negative correlation with shooting percentage.


Jan 122015

Let me start off by first saying that this isn’t going to be a research post as much as it will be a commentary on the past, present and future of shot quality research.

The History

I have had more than a few battles on shot quality so I feel I have a more than decent understanding of the subject. As outlined in this post by Michael Schuckers there are two aspects of shot quality: the quality of an individual shot, and the average shot quality of all shots taken by a team or a player when on the ice.

Individual Shot Probability does matter and this has been illustrated time and time again.  There’s no doubt about it.  The most recent example is the distance analysis by Michael Parkatti,  Different shots have different probabilities of going in and there are plenty of factors that influence these probabilities.  These include x and y coordinates as well as the type of shot matter.  Here are some heat maps to emphasize that.

What has not been shown to matter much, to my mind, is Average Shot Probability (ASP), either for shots that a goalie has faced or that a team has faced or that a team has generated over a long period of time.  It might be there but the consensus (yes, David, I see your hand is up)  is that it is not.  I’ve tried to look for it.  It, ASP, matters but not a ton.  I’ve got plans to take another look at it again this winter.  But there is little denying that where we are right now is that we lack evidence for the value of long term repeatable ASP.  Somewhere there’s a fourteen year old kid with mad R skills and a great idea on how to model these data and, perhaps, they’ll find that shot quality exists.  It’s just that right now we don’t have enough evidence for it.

Yes, the “David” with his hand up is me. I have always claimed that shot quality exists in the Average Shot Probability sense of the word. I believed it back then and I believe it now. The reason I believe it is that the data supports it. There is clearly a difference between the on-ice shooting percentages of the players at the top of this list and the players at the bottom. The players at the top are mostly the ones we all consider the elite offensive players in the league, and the ones at the bottom are mostly 3rd and 4th line players. This isn’t coincidence or randomness and is the strongest evidence in support of shot quality we have. There are even teams that have consistently posted above or below average shooting percentages. Shot quality in the Average Shot Probability sense exists and we must acknowledge that.

A number of people have analyzed shot location or shot distance data (including Schuckers, myself and numerous others) and have found relatively little indication that shot location varies across teams enough to have a significant impact on shooting percentage. This does not mean that Average Shot Probability does not exist (which Schuckers implied was the consensus) but rather that Average Shot Probability is not significantly influenced by variations in average shot locations. There is an important difference, and the latter does not mean that Average Shot Probability does not exist (I think this is the crux of many of my shot quality debates in the past like those with Gabe Desjardins).

One of my favourite articles on this subject is one written by Tom Awad in his “What makes good players good” series of posts. If you haven’t read this article I recommend you go read it now. It is probably the best article written on shot quality even though it isn’t explicitly about that. The most important thing to note is the last table which I will reproduce here:

Group      +/- due to finishing   +/- due to shot quality   +/- due to outshooting
1st tier    0.22                   0.04                      0.15
2nd tier    0.07                   0.02                      0.10
3rd tier    0.00                   0.01                     -0.06
4th tier   -0.20                  -0.04                     -0.15

In this table ‘finishing’ is essentially having a better shooting percentage than your opponents, ‘shot quality’ is having a better average shot location and ‘outshooting’ is as it sounds, outshooting your opponents. The greatest spread in talent between first tier players and fourth tier players is in being able to out-finish your opponents, followed closely by outshooting your opponents. Having a better average shot location is a relatively minor factor in what makes good players good. A key takeaway is that average shot location has a relatively small impact on average shot probability, which is consistent with what everyone has found. This is the “consensus” that Schuckers is talking about.

Recent Developments

War-on-ice has recently come up with a new definition for a scoring chance and added the results to their statistical database. The definition starts with the notion of “danger zones” which are areas surrounding the zone in front of the goal (similar to the “home plate” definition) with additional adjustments for rebound shots and rush shots (which are based on my work from this past summer). Their formal definition of a scoring chance is as follows:

  • In the low danger zone, unblocked rebounds and rush shots only.
  • In the medium danger zone, all unblocked shots.
  • In the high danger zone, all shot attempts (since blocked shots taken here may be more representative of more “wide-open nets”, though we don’t know this for sure.)
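The three rules above translate directly into a small classifier. This is only a sketch of the rules as listed here; the function name, the zone labels and the boolean flags are hypothetical and this is not war-on-ice’s own code.

```python
def is_scoring_chance(zone, blocked, rebound=False, rush=False):
    """Apply the three danger-zone rules to a single shot attempt.

    zone    -- "low", "medium" or "high" (hypothetical labels)
    blocked -- True if the attempt was blocked
    rebound -- True if the attempt was a rebound shot
    rush    -- True if the attempt was a rush shot
    """
    if zone == "high":
        return True                                  # all shot attempts count
    if zone == "medium":
        return not blocked                           # all unblocked shots
    if zone == "low":
        return (not blocked) and (rebound or rush)   # unblocked rebounds/rush only
    raise ValueError(f"unknown danger zone: {zone!r}")

# A few example attempts.
print(is_scoring_chance("high", blocked=True))               # True
print(is_scoring_chance("medium", blocked=True))             # False
print(is_scoring_chance("low", blocked=False, rush=True))    # True
print(is_scoring_chance("low", blocked=False))               # False
```

Written this way it is easy to see how much of the definition rests on location: the rush and rebound flags only matter in the low danger zone.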

This definition, while likely an improvement over anything we have had previously (and there is evidence to support that), is still significantly dependent on shot location, and based on the history of not being able to find much of a link between shot location and shot quality I have concerns about it. In particular, are we watering down the definition of a scoring chance by relying too much on location? Might we get better results by looking at just rush and rebound shots? Until I see a formal analysis that shows that shot location is a major factor in Average Shot Probability at either the team or player level, I have my doubts that using shot location to define a quality scoring chance is beneficial (and it may in fact be harmful by diluting the definition).

Some other really interesting work being done recently is by former NHL goalie Stephen Valiquette where he is identifying higher quality shots as being those that (for the most part) result from plays with significant lateral movement. In particular he defines the “Royal Road” as the line down the middle of the ice from one end to the other and when the puck moves laterally across this line either by a pass or being skated across immediately before a shot is taken the shot is more likely to result in a goal. To me this makes a lot of sense and I think is really where the next great leap in shot quality analysis will come from. Speed of the play (i.e. rush shots) and lateral puck movement are likely the largest contributing factors to shot quality.
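To make the “Royal Road” idea concrete, here is a rough sketch of how a crossing could be detected if tracked puck positions were available. Everything about the data layout is an assumption: it supposes a list of lateral puck coordinates recorded in the moments immediately before a shot, with 0 on the goal-to-goal center line, negative values on one side and positive values on the other. This is not Valiquette’s method, which relies on manual tracking.

```python
def crossed_royal_road(lateral_positions):
    """Return True if the puck changed sides of the center line before the shot.

    lateral_positions -- hypothetical sequence of lateral puck coordinates in
    time order, where 0 is the Royal Road itself.
    """
    previous_side = 0
    for y in lateral_positions:
        side = (y > 0) - (y < 0)      # +1, -1, or 0 when exactly on the line
        if side == 0:
            continue                  # on the line: wait to see where it goes
        if previous_side != 0 and side != previous_side:
            return True               # the puck moved from one side to the other
        previous_side = side
    return False

print(crossed_royal_road([-12.0, -4.0, 6.0]))   # cross-ice movement
print(crossed_royal_road([5.0, 8.0, 12.0]))     # stays on one side
```

A fuller version would also need to separate passes from carries and to decide how close to the shot the crossing has to be, neither of which is pinned down here.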

In support of the idea that puck movement is a significant factor in shot quality, a couple of years ago I looked at the relative impact a player can have on his linemates’ shooting percentages and found that many of the best players at boosting linemate shooting percentage are excellent playmakers.

The Future

The challenge with Valiquette’s “Royal Road” work is that it currently requires a lot of manual tracking of the data, which is time consuming and has the potential to bring human error into the analysis. Furthermore it doesn’t account for speed of the play, which may also be a significant factor in shot quality as it limits the goalie’s reaction time. While I believe Valiquette’s work is a significant step forward in our understanding of the game, the holy grail of shot quality research will be when the NHL introduces player and puck tracking technology. When we get this data we will be able to dig far deeper into shot quality research and define shot quality in far greater detail. Everything that has been done up until now will pale in comparison to what we’ll be able to do with automated player and puck tracking data.