Jul 15 2014
 

Before I get into rush shots of individual players I am going to look at some teams. I am starting with the Columbus Blue Jackets, a team Jeff Townsend suggested I look at because he was interested to see what impact the decline of Steve Mason and then the transition to Bobrovsky had. Before we get to that though, let’s first look at the offensive side of things (and if you haven’t read my introductory pieces on rush shots read them here, here and here).

ColumbusRushShPct

The League data is league average over the past 7 seasons.

There is a lot of variability here, particularly in the rush shot shooting percentages. This could be due to randomness, as the sample size for single season 5v5 road data is getting pretty small, particularly for rush shot data. Having looked at a number of these charts I think sample size is definitely going to be an issue. The key will be looking for trends above and beyond the variability.

Now for save percentages.

ColumbusRushSvPct

This chart is definitely a little more stable. Steve Mason’s excellent rookie season was 2008-09, when he actually had a below average non-rush 5v5 road save percentage but an above average rush save percentage. Columbus never again posted a rush save percentage anywhere close to league average until this past season. Interestingly, despite Bobrovsky’s good season in 2012-13 his 5v5 road save percentage that year was somewhat average (at home it was outstanding though, which just goes to show you how variable these things can be).

Let’s take a look at the percentage of shots that were rush shots for and against.

ColumbusRushPct

Not really sure what to read into that, but I thought I’d toss it out there for you.

Something that I haven’t looked at before is PDO which is the sum of shooting and save percentages. There is no reason we can’t do this for rush and non-rush shots so here is what it looks like for Columbus.

ColumbusRushPDO

Again, I am not sure what we can read into this PDO table. PDO is kind of an odd stat in my opinion. PDO typically gets used as a “luck” metric which it can be if it deviates from 100.0% significantly which is certainly the case for a couple of seasons of Rush Shot PDO.
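For anyone who wants to reproduce the split, here is a minimal sketch of the PDO calculation applied to rush and non-rush shots separately. The function name and all of the shot and goal counts are made up for illustration; they are not Columbus data.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO as a percentage: shooting % plus save %."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


# Hypothetical single-season 5v5 road totals (illustrative only).
rush_pdo = pdo(goals_for=12, shots_for=120, goals_against=15, shots_against=125)
non_rush_pdo = pdo(goals_for=40, shots_for=600, goals_against=45, shots_against=620)

print(f"Rush PDO: {rush_pdo:.1f}%")
print(f"Non-rush PDO: {non_rush_pdo:.1f}%")
```

With so few rush shots in a season, a handful of goals either way moves the rush number a long way, which is the sample size problem in miniature.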

I am still trying to figure out how useful any of this rush/non-rush information is. Certainly I think we hit some serious sample size issues when looking at a single season’s worth of road-only data and I think that puts some of the usefulness in question. I have done some year over year correlations and truthfully they aren’t very good. I think that is largely sample size related but I still think playing style and roster turnover can have significant impacts too. All that said, there is a clear difference between the difficulty of rush and non-rush shots, and teams that can maximize the number of rush shots they take and minimize the number of rush shots against will be better off.

 

Jul 02 2014
 

The other day I looked at the effect that Mike Weaver and Bryce Salvador had on their teams’ save percentage (if you haven’t read it, definitely go give it a read) when they were on the ice versus when they weren’t on the ice. Today I am going to take a look at the Maple Leaf defensemen to see if there are any interesting trends to spot. We’ll start with the new acquisitions.

Stephane Robidas

RobidasOnOffSavePct

(Blue line above orange is good in these charts, opposite is not good)

Aside from 2008-09 he has had a negative impact on his team’s save percentage. In 2007-08, 2009-10 and 2010-11 his main defense partner was Nicklas Grossman but in 2008-09 his main defense partner was Trevor Daley. Did this have anything to do with the better result in 2008-09? Well, aside from last season Daley’s on-ice save percentage has been at or better than the team save percentage so there might be something to that.

Roman Polak

PolakOnOffSavePct

Not really a lot happening there except in 2011-12 when he was worse than the team (and the team had significantly better goaltending). Remember though, the Blues have a pretty good defense so it is quite possible that not being worse than the rest of them is a good thing. Will be interesting to see how he does in a Leaf jersey this season.

Dion Phaneuf

PhaneufOnOffSavePct

Aside from 2008-09 there has been a slight positive impact on save percentage when he is on the ice. In 2008-09 he didn’t have a regular defense partner. At 5v5 he played a total of 1348:08 in ice time and his main defense partners were Giordano (364:56), Vandermeer (342:47), Pardy (304:27), Leopold (163:47), Regehr (85:08) and Sarich (77:41). That variety in defense partners can’t be a good thing. But, maybe Phaneuf has a slight positive impact on save percentage.

Cody Franson

FransonOnOffSavePct

So, he was good for a few years and then he was bad. What happened? Well, he was traded to the Leafs. For the 2009-10 and 2010-11 seasons his main defense partner was Shane O’Brien and he also spent significant time with Hamhuis. This could be a case of him playing “protected” minutes as he had really easy offensive QoC but I generally don’t think QoC has anything near as significant an impact as other factors so I am not sure what is going on. He has had pretty weak QoC the last couple seasons too so who knows.

Jake Gardiner

GardinerOnOffSavePct

It is only 3 seasons of data but so far so good for Gardiner. He has been a boost to the team’s save percentage and that is on top of his good possession numbers. In my opinion, Gardiner is quite likely the best defenseman. I’ll drop the “quite likely” from that statement when he repeats his success but against tougher QoC as that will remove any doubt.

Now, let’s take a look at a couple of departing Leaf defensemen.

Carl Gunnarsson

GunnarssonOnOffSavePct

Save for 2010-11 Leaf save percentage has been better with Gunnarsson on the ice. His two main defense partners that year were Luke Schenn and Mike Komisarek so maybe we can forgive him. In 2009-10 his defense partner was mainly Beauchemin or Kaberle and starting in 2011-12 it has mainly been Phaneuf.

Tim Gleason

GleasonOnOffSavePct

Tim Gleason gets a lot of criticism from Leaf fans, the analytics community, and maybe pretty much everyone but his teams have generally had a positive boost in save % when he is on the ice and in some cases a significant boost.

Based on the loss of Gunnarsson and Gleason, two defensemen who seem to be able to boost on-ice save percentage, and the addition of Robidas who has a negative impact and Polak who has a more neutral impact, it is quite possible the Leafs suffer a drop off in save percentage this season.

That said, I am not certain what to make of the impact we see and why it occurs. Of the 9 defensemen I have presented charts for the past few days (the 7 above as well as Weaver and Salvador in my previous post) it seems that the majority of them have all but one or two of their seasons consistently boosting or inhibiting their team’s save percentage. More investigation is needed as to why but I am becoming fairly confident that this is a repeatable talent. There is just too much consistency to consider it purely random.

 

Sep 21 2013
 

In a series of recent posts at mc79hockey.com, Tyler Dellow discussed a new concept (to me anyway) that he called ‘open play’ hockey. In a post on “The Theory of the Application of Corsi%” he wrote:

I have my own calculation that I do of what I call an open play Corsi%. I wipe out the faceoff effects based on some math that I’ve done as to how long they persist and look just at what happened during the time in which there wasn’t a faceoff effect.

This sounds strangely similar to my zone start adjusted statistics where I eliminate the first 10 seconds after an offensive or defensive zone face off as I have found that beyond that the effect of the face off is largely dissipated. I was curious as to how in fact these were calculated and it seemed I wasn’t the only one.

As far as I can tell, the tweet went unanswered.

In a followup post “New Metrics I” the concept of open play hockey was mentioned again.

I’m calculating what I call an open play Corsi% – basically, I knock out the stuff after faceoffs and then the stuff I’m left with, theoretically, doesn’t have any faceoff effects. It’s just guys playing hockey.

In the comments I asked if he could define more precisely what “stuff after faceoffs” meant but the question went unanswered. Dellow has subsequently referenced open play hockey in his New Metrics 2 post and in a follow up post answering questions about these new metrics. What still hasn’t been explained though is how he actually determines “open play” hockey.

Doing a search on Dellow’s website for “open play” we find that this concept has been mentioned a couple of times previously. In a post titled Big Oilers Data IX: Neutral Zone Faceoff Wins we might get an answer to exactly what ‘open play’ actually is.

As those of you who have been reading this series as I’ve gone along will be aware, I’ve been kind of looking at things on the basis of eight different kinds of 5v5 shift: Open Play (no faceoff during shift), six types of shift with one faceoff (OZ+, OZ-, NZ+, NZ-, DZ+, DZ-) and multi-faceoff shifts. The cool thing with seven of those types of shift is that I can get a benchmark of a type by looking at how the Oilers opposition did in the same situation.

So, as best I can determine, open play is basically any shift that doesn’t have a face off.
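For contrast, the 10-second zone start adjustment mentioned earlier amounts to a simple time filter rather than a shift filter. Here is a minimal sketch of that idea. The event and face off formats, the function name, and the choice to keep an event at exactly 10 seconds are all assumptions made for illustration, not a description of the code behind the site.

```python
def zone_start_adjusted(events, faceoffs, window=10):
    """Drop events within `window` seconds after an OZ or DZ face off.

    events:   list of (time_in_seconds, description) tuples
    faceoffs: list of (time_in_seconds, zone) tuples, zone in {"OZ", "NZ", "DZ"}
    """
    # Neutral zone face offs are ignored; only OZ and DZ draws start a window.
    cutoffs = [t for t, zone in faceoffs if zone in ("OZ", "DZ")]
    kept = []
    for t, description in events:
        in_window = any(0 <= t - start < window for start in cutoffs)
        if not in_window:
            kept.append((t, description))
    return kept


# Hypothetical shift: an offensive zone draw at 100s, a neutral zone draw at 200s.
faceoffs = [(100, "OZ"), (200, "NZ")]
events = [(95, "shot"), (103, "shot"), (110, "shot"), (205, "shot")]
print(zone_start_adjusted(events, faceoffs))
```

Only the shot at 103 seconds is dropped here. An open play filter as described above would instead throw out every event on any shift that included a face off.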

The next question I’d like to answer is how different ‘open play’ is from my 10 second adjustment. This is an interesting question because I have had this debate with many people who suggest that my 10 second adjustment isn’t adequate and that zone start effects are far more significant than it suggests. I have even had debates with Tyler Dellow about this (see here, here and here) so I am really curious as to what impact open play hockey has on a player’s statistics. Unfortunately, I don’t have much ‘open play’ data to go on, but in the posts where Dellow has discussed it he has mentioned a few players’ open play corsi% statistics so I will work with what I have. Here is a comparison of Dellow’s open play stats and my 10-second zone start adjusted stats.

Player       Year      OpenPlay Corsi%   ZSAdj CF%   OZ%    DZ%
Fraser       2012-13   50.8%             50.4%       40.1   25.3
Fraser       2011-12   52.8%             53.2%       31.1   35.5
Fraser       2010-11   45.2%             42.2%       30.4   35.1
Fraser       2009-10   59.2%             57.7%       29.2   40.5
Fraser       2008-09   51.8%             52.6%       30.9   37
O’Sullivan   2011-12   44.3%             42.0%       35.7   26
O’Sullivan   2010-11   45.2%             45.6%       29.4   34
O’Sullivan   2009-10   43.9%             44.1%       31     32.2
O’Sullivan   2007-08   45.5%             46.5%       29.9   29.4
Eager        2012-13   34.4%             35.6%       40.5   32.8
Eager        2011-12   42.0%             43.0%       29.6   30.7
Eager        2009-10   54.4%             54.5%       18.3   39.1
Eager        2008-09   52.9%             53.9%       22.6   37.4

I have included OZ% and DZ% which is the percentage of face offs (including neutral zone face offs) that the player had in the offensive and defensive zone. These statistics along with ZSAdj CF% can be found on stats.hockeyanalysis.com.

If it isn’t obvious to you that there isn’t much difference between the two, let me make it more obvious by looking at this in graphical form.

OpenPlayvsZSAdjustedCorsiPct

That’s a pretty tight correlation and we are dealing with some player seasons that have had fairly significant zone start biases. Ben Eager had a very significant defensive zone start bias in both 2008-09 and 2009-10 but a sizable offensive zone bias in 2012-13. Colin Fraser had a sizable defensive zone bias in 2009-10 but a sizable offensive zone bias in 2012-13. Patrick O’Sullivan had a heavy offensive zone bias in 2011-12. There is no compelling evidence here that ‘open play’ statistics are any more reliable or better than my 10-second zone start adjusted data. There is essentially no difference, which reaffirms to me (yet again) that my 10-second adjustment is a perfectly reasonable method to adjust for zone starts, which ultimately tells us that zone starts do not have a huge impact on a player’s statistics. Certainly not anywhere close to what many once believed, including Dellow himself. Any impact you see is more likely due to the quality of players one plays with if one gets a significant number of defensive zone starts.
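To put a number on how tight the relationship is, the two Corsi columns from the table can be run through a plain Pearson correlation. This is only a sketch using the 13 player seasons listed above; the variable and function names are my own choices for illustration.

```python
from math import sqrt

# OpenPlay Corsi% and ZSAdj CF% in the same row order as the table.
open_play = [50.8, 52.8, 45.2, 59.2, 51.8, 44.3, 45.2, 43.9, 45.5,
             34.4, 42.0, 54.4, 52.9]
zs_adj = [50.4, 53.2, 42.2, 57.7, 52.6, 42.0, 45.6, 44.1, 46.5,
          35.6, 43.0, 54.5, 53.9]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(open_play, zs_adj)
largest_gap = max(abs(x - y) for x, y in zip(open_play, zs_adj))
print(f"r = {r:.3f}, largest gap = {largest_gap:.1f} points")
```

Thirteen player seasons from three players is a small and hand-picked sample, so the result is suggestive rather than conclusive.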

Update: For Tyler Dellow’s response, or lack thereof, read this. Best I can tell, he doesn’t want to publicly say what open play is, or how it shows that zone starts affect a player’s stats beyond my 10-second adjustment, because I might interpret what he says as proof that I am right despite him clearly thinking the evidence proves me wrong. I guess rather than have me make a fool of myself by misinterpreting his results so I can believe I am right, he is going to withhold the evidence from everyone. I feel so touched that Dellow would choose to save me from the embarrassment of misinterpreting results over letting everyone know the real effect zone starts have on a player’s statistics and why ‘open play’ is what we should be using to negate the effect of zone starts. Truthfully though, I am willing to take the risk of embarrassing myself if it furthers our knowledge of hockey statistics.

 


 

 

Jul 10 2013
 

One of the complaints against advanced statistics in hockey is the names of some of the statistics. People complain that names like Corsi, Fenwick, PDO, etc. don’t mean anything on their own. I never really understood the complaint because once you learn what they stand for, which honestly isn’t that difficult, the names stop being a problem. That said, it still seems that some people find them a bit of a hurdle to getting into advanced hockey statistics. I am hoping to revamp and improve my hockey statistics database even more this summer and in the process I wondered if there is interest in having me use some standardized hockey statistics nomenclature that we can all agree on. Here is what I am proposing:

Event Statistics Description
TOI Time on ice
G Goals
A Assists
FirstA First Assists
SOG Shots on goal
SAG Shots at goal (includes missed shots)
ASAG Attempted Shots at Goal (includes missed and blocked shots)
Percentage Statistics
Sh% Shooting percentage (G/SoG)
SAGSh% Shots at goal shooting percentage (G/SaG)
ASAGSh% Attempted Shots at Goal Shooting percentage (G/aSaG)
Sv% Save percentage (1 - GA/SOGA)
SAGSv% Shots at goal save percentage (1 - GA/SAGA)
ASAGSv% Attempted Shots at Goal Save percentage (1 - GA/ASAGA)
ShSv% Shooting percentage + save percentage (Sh% + Sv%)
SAGShSv% Shots at goal shooting percentage + save percentage (SAGSh% + SAGSv%)
ASAGShSv% Attempted Shots at goal shooting percentage + save percentage (ASAGSh% + ASAGSv%)
Other Statistics
IGP Individual Goals Percentage (iG / GF)
IAP Individual Assist Percentage (iA / GF)
IPP Individual Points Percentage (iPts / GF)
ISOGP Individual Shots on Goal Percentage (iSOG / SOGF)
ISAGP Individual Shots at Goal Percentage (iSAG / SAGF)
IASAGP Individual Attempted Shots at Goal Percentage (iASAG / ASAGF)
Zone Starts
OZFO Number of Offensive Zone Face Offs
NZFO Number of Neutral Zone Face Offs
DZFO Number of Defensive Zone Face Offs
OZFO% Offensive Zone Face Off Percentage – OZFO /(OZFO+NZFO+DZFO)
NZFO% Neutral Zone Face Off Percentage – NZFO /(OZFO+NZFO+DZFO)
DZFO% Defensive Zone Face Off Percentage – DZFO /(OZFO+NZFO+DZFO)
OZBias Offensive Zone Bias – (2*OZFO + NZFO) / (OZFO + NZFO + DZFO)
DZBias Defensive Zone Bias – (2*DZFO + NZFO) / (OZFO + NZFO + DZFO)
OZFOW% Offensive Zone Face Off Winning Percentage
NZFOW% Neutral Zone Face Off Winning Percentage
DZFOW% Defensive Zone Face Off Winning Percentage
FOW% Face off win percentage (all zones)
Prefix
i Individual Stats
TM Average stats of team/line mates weighted by TOI with
Opp Stats of opposing players weighted by TOI against
PctTm Percent of Teams stats the player recorded in games the player played in
Suffix
F Stats for the players team while player is on the ice
A Stats against the players team while player is on the ice
20 or /20 Stats per 20 minutes of ice time
60 or /60 Stats per 60 minutes of ice time
F% Percentage of events that are by the players own team (i.e. for)
D Difference between For and Against statistics
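As a quick illustration of the zone start rows, here is a sketch that turns face off counts into the percentages and bias figures defined in the table. The function name and the counts are hypothetical, and the bias values follow the formulas exactly as written, so a player with equal offensive and defensive zone face offs comes out at 1.0.

```python
def zone_start_stats(ozfo, nzfo, dzfo):
    """Compute zone start percentages and biases from face off counts."""
    total = ozfo + nzfo + dzfo
    return {
        "OZFO%": 100.0 * ozfo / total,
        "NZFO%": 100.0 * nzfo / total,
        "DZFO%": 100.0 * dzfo / total,
        "OZBias": (2 * ozfo + nzfo) / total,
        "DZBias": (2 * dzfo + nzfo) / total,
    }


# Hypothetical season: 300 offensive, 400 neutral and 300 defensive zone draws.
stats = zone_start_stats(ozfo=300, nzfo=400, dzfo=300)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```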

The major changes are instead of calling shots + missed shots fenwick events we call them Shots At Goal (SAG) and instead of calling shots + missed shots + blocked shots corsi events we call them Attempted Shots At Goal (ASAG). Also PDO which is shooting percentage + save percentage is now named ShSv%.
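In code the three shot categories simply nest inside one another. A small sketch, with a hypothetical function name and made-up counts:

```python
def shot_totals(shots_on_goal, missed_shots, blocked_shots):
    """Build the proposed SOG / SAG / ASAG totals from raw event counts."""
    sog = shots_on_goal
    sag = sog + missed_shots      # what is currently called Fenwick
    asag = sag + blocked_shots    # what is currently called Corsi
    return {"SOG": sog, "SAG": sag, "ASAG": asag}


# Hypothetical single-game counts.
totals = shot_totals(shots_on_goal=30, missed_shots=12, blocked_shots=15)
print(totals)
```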

The prefixes and suffixes can be added to individual stats to create new statistics. For example:

  • iSh% = Individual Shooting Percentage (iG / iSOG)
  • TMSAG20 = Team mate average Shots at Goal per 20 minutes of ice time weighted by TOI with
  • OppGF% = Opponent average Goals For Percentage weighted by time on ice against
  • PctTmG = In games that the player played in, the percentage of his teams goals that the player himself scored.

Note that not all combinations of prefixes and suffixes make sense (PctTmSh% or Sh%F, for example) but that is self explanatory I think.
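The suffixes are just small transformations of a for/against pair of counts. Here is a sketch of the 20, F% and D suffixes; the helper names and the on-ice totals are invented for the example.

```python
def per_20(count, toi_minutes):
    """The '20' suffix: events per 20 minutes of ice time."""
    return 20.0 * count / toi_minutes


def for_pct(events_for, events_against):
    """The 'F%' suffix: share of events that belong to the player's team."""
    return 100.0 * events_for / (events_for + events_against)


def differential(events_for, events_against):
    """The 'D' suffix: for minus against."""
    return events_for - events_against


# Hypothetical on-ice totals: 45 goals for, 30 against in 900 minutes.
gf, ga, toi = 45, 30, 900.0
print("GF20:", per_20(gf, toi))
print("GA20:", round(per_20(ga, toi), 3))
print("GF%:", for_pct(gf, ga))
print("GD:", differential(gf, ga))
```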

What does everyone think? I am perfectly fine sticking with the way I have statistics currently presented but if the majority think something along the lines of the above is better I am all for making the change. If anyone has any other suggestions they are welcome as well. I just think that this is as good a time as any to come up with some standardized nomenclature.

Also, I currently have statistics for the following situations:

  • 5v5
  • 5v5 Home
  • 5v5 Road
  • 5v5 Close
  • 5v5 Tied
  • 5v5 Up1
  • 5v5 Up 2+
  • 5v5 Down 1
  • 5v5 Down 2+
  • 5v5 Leading
  • 5v5 Trailing
  • 5v4 PP
  • 4v5 SH
  • Zone start adjusted data for all of the above except 5v4 PP and 4v5 SH.

If there is interest I may consider adding other situations. For example, first period, second period, third period, 4v4, 5v5 close home and 5v5 close road. Would anyone find these or any other situation interesting to look at?

Also feel free to consider the comments of this post the place where you can officially make any other suggestions of upgrades/enhancements you would like to see made to stats.hockeyanalysis.com. I can’t make any promises I will implement them but I hope to make some upgrades over the summer.

Update:  Added ‘D’ to the suffix list which stands for differential. So ASAGD would stand for Attempted Shots At Goal Differential which is the equivalent of corsi differential in use now. Might consider adding Rel but need to consider if it is necessary or not. Thoughts?

 

Apr 11 2013
 

Every now and again someone asks me how I calculate HARO, HARD and HART ratings that you can find on stats.hockeyanalysis.com and it is at that point I realize that I don’t have an up to date description of how they are calculated so today I endeavor to write one.

First, let me define HARO, HARD and HART.

HARO – Hockey Analysis Rating Offense
HARD – Hockey Analysis Rating Defense
HART – Hockey Analysis Rating Total

So my goal when creating them was to create an offensive, defensive and overall total rating for each and every player. Now, here is a step by step guide as to how they are calculated.

Calculate WOWY’s and AYNAY’s

The first step is to calculate WOWY’s (With Or Without You) and AYNAY’s (Against You or Not Against You). You can find goal and corsi WOWY’s and AYNAY’s on stats.hockeyanalysis.com for every player for 5v5, 5v5 ZS adjusted and 5v5 close zone start adjusted situations. I calculate them for every situation you see on stats.hockeyanalysis.com, and for shots and fenwick as well, but they don’t get posted because it amounts to a massive amount of data.

(Distraction: 800 players playing against 800 other players means 640,000 data points for each TOI, GF20, GA20, SF20, SA20, FF20, FA20, CF20, CA20 when players are playing against each other and separate of each other per season and situation, or about 17.28 million data points for AYNAY’s for a single season per situation. Now consider when I do my 5 year ratings there are more like 1600 players generating more than 60 million datapoints.)
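The arithmetic behind those totals can be checked in a few lines. The factor of three below is my reading of “playing against each other and separate of each other” (one set of numbers together, plus one for each player apart from the other); it is an assumption, though it does reproduce the 17.28 million figure exactly.

```python
STATS = ["TOI", "GF20", "GA20", "SF20", "SA20", "FF20", "FA20", "CF20", "CA20"]
STATES = 3  # assumed: against each other, plus each player apart from the other


def aynay_data_points(n_players):
    """Data points for one season and one situation."""
    pairs = n_players * n_players
    return pairs * len(STATS) * STATES


print(f"{aynay_data_points(800):,}")
print(f"{aynay_data_points(1600):,}")
```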

Calculate TMGF20, TMGA20, OppGF20, OppGA20

What we need the WOWY’s for is to calculate TMGF20 (a TOI-with weighted average of the GF20 of the player’s teammates when they are not playing with him), TMGA20 (the same for the teammates’ GA20), OppGF20 (a TOI-against weighted average of the GF20 of the player’s opponents when they are not playing against him) and OppGA20 (the same for the opponents’ GA20).

So, let’s take a look at Alexander Steen’s 5v5 WOWY’s for 2011-12 to look at how TMGF20 is calculated. The columns we are interested in are the Teammate when apart TOI and GF20 columns which I will call TWA_TOI and TWA_GF20. TMGF20 is simply a TWA_TOI (teammate while apart time on ice) weighted average of TWA_GF20. This gives us a good indication of how Steen’s teammates perform offensively when they are not playing with Steen.

TMGA20 is calculated the same way but using TWA_GA20 instead of TWA_GF20. OppGF20 is calculated in a similar manner except using OWA_GF20 (Opponent while apart GF20) and OWA_TOI while OppGA20 uses OWA_GA20.
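The weighted average itself is straightforward. Here is a minimal sketch; the function name and the three teammate rows are invented for illustration and are not Steen’s actual numbers.

```python
def toi_weighted_average(rows):
    """TOI-weighted average of a rate stat.

    rows: list of (toi_while_apart, rate_while_apart) pairs, one per
    teammate (for TMGF20/TMGA20) or opponent (for OppGF20/OppGA20).
    """
    total_toi = sum(toi for toi, _ in rows)
    if total_toi == 0:
        return 0.0
    return sum(toi * rate for toi, rate in rows) / total_toi


# Hypothetical teammate rows: (TWA_TOI in minutes, TWA_GF20).
teammates = [(400.0, 0.80), (250.0, 1.00), (350.0, 0.60)]
tmgf20 = toi_weighted_average(teammates)
print(f"TMGF20 = {tmgf20:.3f}")
```

Feeding in TWA_GA20, OWA_GF20 or OWA_GA20 rows instead gives TMGA20, OppGF20 and OppGA20.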

The reason why I use while not playing with/against data is because I don’t want to have the talent level of the player we are evaluating influencing his own QoT and QoC metrics (which is essentially what TMGF20, TMGA20, OppGF20, OppGA20 are).

Calculate first iteration of HARO and HARD

The first iteration of HARO and HARD is simple. I first calculate an estimated GF20 and an estimated GA20 based on the player’s teammates and opposition.

ExpGF20 = (TMGF20 + OppGA20)/2
ExpGA20 = (TMGA20 + OppGF20)/2

Then I calculate HARO and HARD as a percentage improvement:

HARO(1st iteration) = 100*(GF20-ExpGF20) / ExpGF20
HARD(1st iteration) = 100*(ExpGA20 - GA20) / ExpGA20

So, a HARO of 20 would mean that when the player is on the ice the goal rate of his team is 20% higher than one would expect based on how his teammates and opponents performed during time when the player is not on the ice with/against them. Similarly, a HARD of 20 would mean the goals against rate of his team is 20% better (lower) than expected.

(Note: The opposition stats that get used are from the complementary situation. For 5v5 this means the opposition situation is also 5v5, but when calculating a rating for 5v5 leading the opposition situation is 5v5 trailing, so OppGF20 and OppGA20 would be calculated from 5v5 trailing data.)
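Put together, the first iteration is only a few lines. The sketch below follows the four formulas above directly; the function name and the input rates are hypothetical.

```python
def first_iteration(gf20, ga20, tmgf20, tmga20, oppgf20, oppga20):
    """First-pass HARO and HARD from on-ice rates and teammate/opponent rates."""
    exp_gf20 = (tmgf20 + oppga20) / 2
    exp_ga20 = (tmga20 + oppgf20) / 2
    haro = 100 * (gf20 - exp_gf20) / exp_gf20
    hard = 100 * (exp_ga20 - ga20) / exp_ga20
    return haro, hard


# Hypothetical player: expected GF20 and GA20 both work out to 0.80.
haro, hard = first_iteration(
    gf20=0.96, ga20=0.64,
    tmgf20=0.78, tmga20=0.75,
    oppgf20=0.85, oppga20=0.82,
)
print(f"HARO = {haro:.1f}, HARD = {hard:.1f}")
```

Both ratings come out at about 20 here: the player’s team scores 20% more and allows 20% fewer goals than expected.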

Now for a second iteration

The first iteration used GF20 and GA20 stats which is a good start, but after the first iteration we have teammate and opponent corrected evaluations of every player, which means we have better data about the quality of teammates and opponents the player has. This is where things get a little more complicated because I need to calculate a QoT and QoC metric based on the first iteration HARO and HARD values and then I need to convert that into a GF20 and GA20 equivalent number so I have something to compare the player’s GF20 and GA20 to.

To do this I calculate a TMHARO rating which is a TWA_TOI weighted average of first iteration HARO. TMHARD, OppHARO and OppHARD are calculated in a similar manner. Now I need to convert these to GF20 and GA20 based stats, so I do that by multiplying by league average GF20 (LAGF20) and league average GA20 (LAGA20), and from here I can calculate expected GF20 and expected GA20.

ExpGF20(2nd iteration) = (TMHARO*LAGF20 + OppHARD*LAGA20)/2
ExpGA20(2nd iteration) = (TMHARD*LAGA20 + OppHARO*LAGF20)/2

From there we can get a second iteration of HARO and HARD.

HARO(2nd iteration) = 100*(GF20-ExpGF20) / ExpGF20
HARD(2nd iteration) = 100*(ExpGA20 - GA20) / ExpGA20

Now we iterate again and again…

Now we repeat the above step over and over again using the previous iteration’s HARO and HARD values at every step.

Now calculate HART

Once we have done enough iterations we can calculate HART from the final iteration’s HARO and HARD values.

HART = (HARO + HARD) /2
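To show the shape of the whole loop, here is a rough sketch. It is an illustration only and not the code behind stats.hockeyanalysis.com. Several things in it are assumptions: the data layout and function names are invented, a rating is converted to a rate as league average times (1 + rating/100) for offense and (1 - rating/100) for defense, teammate and opponent weights are plain TOI rather than while-apart splits, and the number of iterations is fixed rather than tested for convergence.

```python
def weighted_rating(ratings, links):
    """TOI-weighted average rating over (player_name, toi) links."""
    total_toi = sum(toi for _, toi in links)
    if total_toi == 0:
        return 0.0
    return sum(ratings[name] * toi for name, toi in links) / total_toi


def iterate_ratings(players, first_haro, first_hard, lagf20, laga20, n_iter=10):
    """Refine HARO/HARD repeatedly, then compute HART.

    players: {name: {"gf20", "ga20", "teammates", "opponents"}} where the
    last two are lists of (name, toi) pairs (hypothetical layout).
    """
    haro, hard = dict(first_haro), dict(first_hard)
    for _ in range(n_iter):
        new_haro, new_hard = {}, {}
        for name, p in players.items():
            tm_haro = weighted_rating(haro, p["teammates"])
            tm_hard = weighted_rating(hard, p["teammates"])
            opp_haro = weighted_rating(haro, p["opponents"])
            opp_hard = weighted_rating(hard, p["opponents"])
            # Assumed conversion from ratings back to per-20 rates.
            exp_gf20 = (lagf20 * (1 + tm_haro / 100)
                        + laga20 * (1 - opp_hard / 100)) / 2
            exp_ga20 = (laga20 * (1 - tm_hard / 100)
                        + lagf20 * (1 + opp_haro / 100)) / 2
            new_haro[name] = 100 * (p["gf20"] - exp_gf20) / exp_gf20
            new_hard[name] = 100 * (exp_ga20 - p["ga20"]) / exp_ga20
        # Every player is updated from the previous iteration's values.
        haro, hard = new_haro, new_hard
    hart = {name: (haro[name] + hard[name]) / 2 for name in players}
    return haro, hard, hart


# Hypothetical two-player example with made-up rates and ice times.
players = {
    "A": {"gf20": 0.96, "ga20": 0.80,
          "teammates": [("B", 500.0)], "opponents": [("B", 100.0)]},
    "B": {"gf20": 0.80, "ga20": 0.80,
          "teammates": [("A", 500.0)], "opponents": [("A", 100.0)]},
}
zeros = {"A": 0.0, "B": 0.0}
haro, hard, hart = iterate_ratings(players, zeros, zeros, 0.80, 0.80, n_iter=5)
print({name: round(value, 1) for name, value in hart.items()})
```

The same loop works for shot, Fenwick or Corsi ratings by swapping in the matching rates and league averages.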

Now do the same for Shot, Fenwick and Corsi data

The above is for goal ratings but I have Shot, Fenwick and Corsi ratings as well and these can be calculated in the exact same way except using SF20, SA20, FF20, FA20, CF20 and CA20.

What about goalies?

Goalies are a little unique in that they only really play the defensive side of the game. For this reason I do not include goalies in calculating TMGF20 and OppGF20. For shot, fenwick and corsi I do not include the goalies on the defensive side of things either as I assume a goalie will not influence shots against (though this may not be entirely true as some goalies may be better at controlling rebounds and thus secondary shots but I’ll assume this is a minimal effect if it does exist). The result of this is goalies do have a HARD rating but no HARO, or shot/fenwick/corsi based HARD or HARO rating.

I hope this helps explain how my hockey analysis ratings are calculated but if you have any followup questions feel free to ask them in the comments.