Sep 21 2013
 

In a series of recent posts at mc79hockey.com, Tyler Dellow discussed a new concept (to me anyway) that he called ‘open play’ hockey. In a post on “The Theory of the Application of Corsi%” he wrote:

I have my own calculation that I do of what I call an open play Corsi%. I wipe out the faceoff effects based on some math that I’ve done as to how long they persist and look just at what happened during the time in which there wasn’t a faceoff effect.

This sounds strangely similar to my zone start adjusted statistics where I eliminate the first 10 seconds after an offensive or defensive zone face off as I have found that beyond that the effect of the face off is largely dissipated. I was curious as to how in fact these were calculated and it seemed I wasn’t the only one.

As far as I can tell, the tweet went unanswered.

In a followup post “New Metrics I” the concept of open play hockey was mentioned again.

I’m calculating what I call an open play Corsi% – basically, I knock out the stuff after faceoffs and then the stuff I’m left with, theoretically, doesn’t have any faceoff effects. It’s just guys playing hockey.

In the comments I asked if he could define more precisely what “stuff after faceoffs” meant but the question went unanswered. Dellow has subsequently referenced open play hockey in his New Metrics 2 post and in a follow up post answering questions about these new metrics. What still hasn’t been explained though is how he actually determines “open play” hockey.

Doing a search on Dellow’s website for “open play” we find that this concept has been mentioned a couple of times previously. A post titled Big Oilers Data IX: Neutral Zone Faceoff Wins may tell us exactly what ‘open play’ actually is.

As those of you who have been reading this series as I’ve gone along will be aware, I’ve been kind of looking at things on the basis of eight different kinds of 5v5 shift: Open Play (no faceoff during shift), six types of shift with one faceoff (OZ+, OZ-, NZ+, NZ-, DZ+, DZ-) and multi-faceoff shifts. The cool thing with seven of those types of shift is that I can get a benchmark of a type by looking at how the Oilers opposition did in the same situation.

So, as best I can determine, open play is basically any shift that doesn’t have  a face off.
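To make the two definitions concrete, here is a minimal Python sketch on toy data. The `Event` record and its fields are hypothetical and are not the format used by either site. It assumes ‘open play’ means shifts with no faceoff at all (my reading of the quote above) and that the 10-second adjustment drops shot attempts in the first 10 seconds after an offensive or defensive zone faceoff.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float            # seconds since the start of the game
    kind: str              # "faceoff" or "shot_attempt"
    zone: str = ""         # faceoffs only: "OZ", "NZ" or "DZ"
    for_team: bool = True  # shot attempts only: taken by the player's team?


def corsi_pct(events):
    """Share of shot attempts taken by the player's team, as a percentage."""
    shots = [e for e in events if e.kind == "shot_attempt"]
    if not shots:
        return None
    return 100.0 * sum(e.for_team for e in shots) / len(shots)


def zone_start_adjusted(events, window=10.0):
    """Drop shot attempts within `window` seconds after an OZ or DZ faceoff."""
    faceoff_times = [e.time for e in events
                     if e.kind == "faceoff" and e.zone in ("OZ", "DZ")]
    return [e for e in events
            if e.kind != "shot_attempt"
            or not any(0 <= e.time - t < window for t in faceoff_times)]


def open_play_shifts(shifts):
    """Keep only shifts that contain no faceoff (assumed open play definition)."""
    return [s for s in shifts if not any(e.kind == "faceoff" for e in s)]


# Toy data: one shift starting with an offensive zone faceoff, one on-the-fly shift.
shift_1 = [Event(0, "faceoff", zone="OZ"),
           Event(4, "shot_attempt", for_team=True),
           Event(30, "shot_attempt", for_team=False)]
shift_2 = [Event(100, "shot_attempt", for_team=True),
           Event(110, "shot_attempt", for_team=False),
           Event(120, "shot_attempt", for_team=True)]
shifts = [shift_1, shift_2]
all_events = shift_1 + shift_2

raw = corsi_pct(all_events)
zs_adj = corsi_pct(zone_start_adjusted(all_events))
open_play = corsi_pct([e for s in open_play_shifts(shifts) for e in s])
print(raw, zs_adj, open_play)
```

The difference is that the 10-second adjustment keeps the later part of a faceoff shift, while the open play reading throws the whole shift away.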

The next question I’d like to answer is how different ‘open play’ is from my 10 second adjustment. This is an interesting question because I have had this debate with many people who suggest that my 10 second adjustment isn’t adequate and that zone start effects are far more significant than my 10 second adjustment suggests. I have even had debates with Tyler Dellow about this (See here, here and here) so I am really curious as to what impact open play hockey has on a player’s statistics. Unfortunately, I don’t have much ‘open play’ data to go with, but in the posts where Dellow has discussed it he has mentioned a few players’ open play Corsi% statistics, so I will work with what I have. Here is a comparison of Dellow’s open play stats and my 10-second zone start adjusted stats.

| Player | Year | Open Play Corsi% | ZSAdj CF% | OZ% | DZ% |
| --- | --- | --- | --- | --- | --- |
| Fraser | 2012-13 | 50.8% | 50.4% | 40.1 | 25.3 |
| Fraser | 2011-12 | 52.8% | 53.2% | 31.1 | 35.5 |
| Fraser | 2010-11 | 45.2% | 42.2% | 30.4 | 35.1 |
| Fraser | 2009-10 | 59.2% | 57.7% | 29.2 | 40.5 |
| Fraser | 2008-09 | 51.8% | 52.6% | 30.9 | 37 |
| O’Sullivan | 2011-12 | 44.3% | 42.0% | 35.7 | 26 |
| O’Sullivan | 2010-11 | 45.2% | 45.6% | 29.4 | 34 |
| O’Sullivan | 2009-10 | 43.9% | 44.1% | 31 | 32.2 |
| O’Sullivan | 2007-08 | 45.5% | 46.5% | 29.9 | 29.4 |
| Eager | 2012-13 | 34.4% | 35.6% | 40.5 | 32.8 |
| Eager | 2011-12 | 42.0% | 43.0% | 29.6 | 30.7 |
| Eager | 2009-10 | 54.4% | 54.5% | 18.3 | 39.1 |
| Eager | 2008-09 | 52.9% | 53.9% | 22.6 | 37.4 |

I have included OZ% and DZ%, which are the percentages of face offs (including neutral zone face offs) that the player had in the offensive and defensive zones. These statistics along with ZSAdj CF% can be found on stats.hockeyanalysis.com.

If it isn’t obvious to you that there isn’t much difference between the two, let me make it more obvious by looking at this in graphical form.

[Figure: open play Corsi% vs. zone start adjusted Corsi%]

That’s a pretty tight correlation and we are dealing with some player seasons that have had fairly significant zone start biases. Ben Eager had a very significant defensive zone start bias in both 2008-09 and 2009-10 but a sizable offensive zone bias in 2012-13. Colin Fraser had a sizable defensive zone bias in 2009-10 but a sizable offensive zone bias in 2012-13. Patrick O’Sullivan had a heavy offensive zone bias in 2011-12. There is no compelling evidence here that ‘open play’ statistics are any more reliable or better than my 10-second zone start adjusted data. There is essentially no difference, which reaffirms to me (yet again) that my 10-second adjustment is a perfectly reasonable method to adjust for zone starts, which ultimately tells us that zone starts do not have a huge impact on a player’s statistics. Certainly not anywhere close to what many once believed, including Dellow himself. Any impact you see is more likely due to the quality of players one plays with if one gets a significant number of defensive zone starts.
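For anyone who wants to check the tightness of that relationship without the chart, here is a small Python sketch that computes the Pearson correlation and the mean absolute difference from the 13 player seasons in the table above. The variable names are mine; the numbers are copied from the table.

```python
import math

# Open play Corsi% (Dellow) and 10-second zone start adjusted CF%, in table order.
open_play = [50.8, 52.8, 45.2, 59.2, 51.8, 44.3, 45.2, 43.9, 45.5,
             34.4, 42.0, 54.4, 52.9]
zs_adjusted = [50.4, 53.2, 42.2, 57.7, 52.6, 42.0, 45.6, 44.1, 46.5,
               35.6, 43.0, 54.5, 53.9]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson(open_play, zs_adjusted)
mean_abs_diff = sum(abs(a - b) for a, b in zip(open_play, zs_adjusted)) / len(open_play)
print(f"Pearson r = {r:.3f}, mean absolute difference = {mean_abs_diff:.2f} points")
```

Thirteen player seasons from three players is a small sample, so this only describes the data shown here.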

Update: For Tyler Dellow’s response, or lack thereof, read this. Best I can tell, he doesn’t want to publicly say what open play is or how it shows zone starts affect players’ stats beyond my 10-second adjustment because I might interpret what he says as showing I am right despite him clearly thinking the evidence proves me wrong. I guess rather than have me make a fool of myself by misinterpreting his results so I can believe I am right, he is going to withhold the evidence from everyone. I feel so touched that Dellow would choose to save me from such embarrassment as misinterpreting results over letting everyone know the real effect zone starts have on a player’s statistics and why ‘open play’ is what we should be using to negate the effect of zone starts. Truthfully though, I am willing to take the risk of embarrassing myself if it furthers our knowledge of hockey statistics.

 


 

 

Jul 10 2013
 

One of the complaints against advanced statistics in hockey is about the names of some of the statistics. Sometimes people complain about names like Corsi, Fenwick, PDO, etc. because they aren’t meaningful. I never really understood the complaint because, honestly, the names aren’t that difficult to figure out. That said, it still seems that some people feel the names are a hurdle to getting into advanced hockey statistics. I am hoping to revamp and improve my hockey statistics database even more this summer and in the process I wondered if there is interest in having me use some standardized hockey statistics nomenclature that we can all agree on. Here is what I am proposing:

| Event Statistics | Description |
| --- | --- |
| TOI | Time on ice |
| G | Goals |
| A | Assists |
| FirstA | First assists |
| SOG | Shots on goal |
| SAG | Shots at goal (includes missed shots) |
| ASAG | Attempted shots at goal (includes missed and blocked shots) |

| Percentage Statistics | Description |
| --- | --- |
| Sh% | Shooting percentage (G / SOG) |
| SAGSh% | Shots at goal shooting percentage (G / SAG) |
| ASAGSh% | Attempted shots at goal shooting percentage (G / ASAG) |
| Sv% | Save percentage (1 - GA / SOGA) |
| SAGSv% | Shots at goal save percentage (1 - GA / SAGA) |
| ASAGSv% | Attempted shots at goal save percentage (1 - GA / ASAGA) |
| ShSv% | Shooting percentage + save percentage (Sh% + Sv%) |
| SAGShSv% | Shots at goal shooting percentage + save percentage (SAGSh% + SAGSv%) |
| ASAGShSv% | Attempted shots at goal shooting percentage + save percentage (ASAGSh% + ASAGSv%) |

| Other Statistics | Description |
| --- | --- |
| IGP | Individual Goals Percentage (iG / GF) |
| IAP | Individual Assist Percentage (iA / GF) |
| IPP | Individual Points Percentage (iPts / GF) |
| ISOGP | Individual Shots on Goal Percentage (iSOG / SOGF) |
| ISAGP | Individual Shots at Goal Percentage (iSAG / SAGF) |
| IASAGP | Individual Attempted Shots at Goal Percentage (iASAG / ASAGF) |

| Zone Starts | Description |
| --- | --- |
| OZFO | Number of Offensive Zone Face Offs |
| NZFO | Number of Neutral Zone Face Offs |
| DZFO | Number of Defensive Zone Face Offs |
| OZFO% | Offensive Zone Face Off Percentage: OZFO / (OZFO + NZFO + DZFO) |
| NZFO% | Neutral Zone Face Off Percentage: NZFO / (OZFO + NZFO + DZFO) |
| DZFO% | Defensive Zone Face Off Percentage: DZFO / (OZFO + NZFO + DZFO) |
| OZBias | Offensive Zone Bias: (2*OZFO + NZFO) / (OZFO + NZFO + DZFO) |
| DZBias | Defensive Zone Bias: (2*DZFO + NZFO) / (OZFO + NZFO + DZFO) |
| OZFOW% | Offensive Zone Face Off Winning Percentage |
| NZFOW% | Neutral Zone Face Off Winning Percentage |
| DZFOW% | Defensive Zone Face Off Winning Percentage |
| FOW% | Face off win percentage (all zones) |

| Prefix | Description |
| --- | --- |
| i | Individual stats |
| TM | Average stats of team/line mates weighted by TOI with |
| Opp | Stats of opposing players weighted by TOI against |
| PctTm | Percent of the team’s stats the player recorded in games the player played in |

| Suffix | Description |
| --- | --- |
| F | Stats for the player’s team while the player is on the ice |
| A | Stats against the player’s team while the player is on the ice |
| 20 or /20 | Stats per 20 minutes of ice time |
| 60 or /60 | Stats per 60 minutes of ice time |
| F% | Percentage of events that are by the player’s own team (i.e. for) |
| D | Difference between For and Against statistics |

The major changes are instead of calling shots + missed shots fenwick events we call them Shots At Goal (SAG) and instead of calling shots + missed shots + blocked shots corsi events we call them Attempted Shots At Goal (ASAG). Also PDO which is shooting percentage + save percentage is now named ShSv%.
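As a rough illustration of how the renamed event and percentage statistics relate to each other, here is a short Python sketch. The function names and the on-ice counts are hypothetical, and save percentage is taken in its usual sense of the share of shots on goal against that were stopped.

```python
def shot_totals(goals, saved, missed, blocked):
    """Return (SOG, SAG, ASAG) built up from the underlying event counts."""
    sog = goals + saved      # shots on goal
    sag = sog + missed       # shots at goal (Fenwick events)
    asag = sag + blocked     # attempted shots at goal (Corsi events)
    return sog, sag, asag


def pct(numerator, denominator):
    """Express a ratio as a percentage."""
    return 100.0 * numerator / denominator


# Hypothetical on-ice totals for one player: events for, then events against.
gf, ga = 10, 8
sogf, sagf, asagf = shot_totals(goals=gf, saved=90, missed=40, blocked=30)
soga, saga, asaga = shot_totals(goals=ga, saved=92, missed=35, blocked=25)

sh_pct = pct(gf, sogf)                    # Sh%
sv_pct = 100.0 - pct(ga, soga)            # Sv%
shsv_pct = sh_pct + sv_pct                # ShSv% (the stat currently called PDO)
sag_sh_pct = pct(gf, sagf)                # SAGSh%
asag_f_pct = pct(asagf, asagf + asaga)    # ASAGF% (currently Corsi%)

print(sogf, sagf, asagf, sh_pct, sv_pct, shsv_pct)
```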

The prefixes and suffixes can be added to individual stats to create new statistics. For example:

  • iSh% = Individual Shooting Percentage (iG / iSOG)
  • TMSAG20 = Team mate average Shots at Goal per 20 minutes of ice time weighted by TOI with
  • OppGF% = Opponent average Goals For Percentage weighted by time on ice against
  • PctTmG = In games that the player played in, the percentage of his team’s goals that the player himself scored.

Note that not all combinations of prefixes and suffixes make sense. For example, PctTmSh% and Sh%F are not meaningful, but that is self-explanatory I think.
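Here is a minimal Python sketch of what the suffixes do to a base statistic, using goals as the example. The helper names and the totals are hypothetical and only illustrate the definitions in the suffix list.

```python
def per_minutes(count, toi_minutes, per=20):
    """Rate suffix: events per `per` minutes of ice time (20 or 60)."""
    return count * per / toi_minutes


def for_pct(events_for, events_against):
    """F% suffix: share of events that belong to the player's own team."""
    return 100.0 * events_for / (events_for + events_against)


def differential(events_for, events_against):
    """D suffix: For minus Against."""
    return events_for - events_against


# Hypothetical on-ice totals: 30 goals for, 24 against, 600 minutes of ice time.
gf, ga, toi = 30, 24, 600.0

gf20 = per_minutes(gf, toi, per=20)   # GF20
ga20 = per_minutes(ga, toi, per=20)   # GA20
gf60 = per_minutes(gf, toi, per=60)   # GF60
gf_pct = for_pct(gf, ga)              # GF%
gd = differential(gf, ga)             # GD

print(gf20, ga20, gf60, gf_pct, gd)
```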

What does everyone think? I am perfectly fine sticking with the way I have statistics currently presented but if the majority think something along the lines of the above is better I am all for making the change. If anyone has any other suggestions they are welcome as well. I just think that this is as good a time as any to come up with some standardized nomenclature.

Also, I currently have statistics for the following situations:

  • 5v5
  • 5v5 Home
  • 5v5 Road
  • 5v5 Close
  • 5v5 Tied
  • 5v5 Up1
  • 5v5 Up 2+
  • 5v5 Down 1
  • 5v5 Down 2+
  • 5v5 Leading
  • 5v5 Trailing
  • 5v4 PP
  • 4v5 SH
  • Zone start adjusted data for all of the above except 5v4 PP and 4v5 SH.

If there is interest I may consider adding other situations. For example, first period, second period, third period, 4v4, 5v5 close home and 5v5 close road. Would anyone find these or any other situation interesting to look at?

Also feel free to consider the comments of this post the place where you can officially make any other suggestions of upgrades/enhancements you would like to see made to stats.hockeyanalysis.com. I can’t make any promises that I will implement them but I hope to make some upgrades over the summer.

Update:  Added ‘D’ to the suffix list which stands for differential. So ASAGD would stand for Attempted Shots At Goal Differential which is the equivalent of corsi differential in use now. Might consider adding Rel but need to consider if it is necessary or not. Thoughts?

 

Apr 11 2013
 

Every now and again someone asks me how I calculate HARO, HARD and HART ratings that you can find on stats.hockeyanalysis.com and it is at that point I realize that I don’t have an up to date description of how they are calculated so today I endeavor to write one.

First, let me define HARO, HARD and HART.

HARO – Hockey Analysis Rating Offense
HARD – Hockey Analysis Rating Defense
HART – Hockey Analysis Rating Total

So my goal when creating them was to create an offensive, defensive and overall total rating for each and every player. Now, here is a step by step guide as to how they are calculated.

Calculate WOWY’s and AYNAY’s

The first step is to calculate WOWY’s (With Or Without You) and AYNAY’s (Against You or Not Against You). You can find goal and corsi WOWY’s and AYNAY’s on stats.hockeyanalysis.com for every player for 5v5, 5v5 ZS adjusted and 5v5 close zone start adjusted situations. I calculate them for every situation you see on stats.hockeyanalysis.com, and for shots and fenwick as well, but they don’t get posted because it amounts to a massive amount of data.

(Distraction: 800 players playing against 800 other players means 640,000 data points for each TOI, GF20, GA20, SF20, SA20, FF20, FA20, CF20, CA20 when players are playing against each other and separate of each other per season and situation, or about 17.28 million data points for AYNAY’s for a single season per situation. Now consider when I do my 5 year ratings there are more like 1600 players generating more than 60 million datapoints.)
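The arithmetic behind those totals can be checked with a few lines of Python. This assumes the factor of three comes from tracking each pair in three states (against each other, and each player apart from the other), which is my reading of the description and does reproduce the 17.28 million figure.

```python
# The nine values tracked for every pair of players.
STATS = ["TOI", "GF20", "GA20", "SF20", "SA20", "FF20", "FA20", "CF20", "CA20"]

# Assumed states per pair: against each other, first player apart, second player apart.
STATES_PER_PAIR = 3


def aynay_data_points(n_players):
    """Data points for one season and one situation."""
    pairs = n_players * n_players
    return pairs * len(STATS) * STATES_PER_PAIR


single_season = aynay_data_points(800)
five_year = aynay_data_points(1600)
print(f"{single_season:,} and {five_year:,}")
```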

Calculate TMGF20, TMGA20, OppGF20, OppGA20

What we need the WOWY’s for is to calculate TMGF20 (a TOI-with weighted average GF20 of the player’s teammates when his teammates are not playing with him), TMGA20 (a TOI-with weighted average GA20 of the player’s teammates when his teammates are not playing with him), OppGF20 (a TOI-against weighted average GF20 of the player’s opponents when his opponents are not playing against him) and OppGA20 (a TOI-against weighted average GA20 of the player’s opponents when his opponents are not playing against him).

So, let’s take a look at Alexander Steen’s 5v5 WOWY’s for 2011-12 to look at how TMGF20 is calculated. The columns we are interested in are the Teammate when apart TOI and GF20 columns which I will call TWA_TOI and TWA_GF20. TMGF20 is simply a TWA_TOI (teammate while apart time on ice) weighted average of TWA_GF20. This gives us a good indication of how Steen’s teammates perform offensively when they are not playing with Steen.

TMGA20 is calculated the same way but using TWA_GA20 instead of TWA_GF20. OppGF20 is calculated in a similar manner except using OWA_GF20 (Opponent while apart GF20) and OWA_TOI while OppGA20 uses OWA_GA20.

I use the “while not playing with/against” data because I don’t want the talent level of the player we are evaluating to influence his own QoT and QoC metrics (which is essentially what TMGF20, TMGA20, OppGF20 and OppGA20 are).
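The weighted average itself is simple. Here is a small Python sketch; the teammate rows are made-up numbers for illustration, not Steen’s actual WOWY data, and the function name is mine.

```python
def toi_weighted_average(rows):
    """TOI weighted average of a rate stat.

    `rows` is a list of (toi, value) pairs, for example (TWA_TOI, TWA_GF20)
    for each teammate.
    """
    total_toi = sum(toi for toi, _ in rows)
    if total_toi == 0:
        return None
    return sum(toi * value for toi, value in rows) / total_toi


# Hypothetical teammates: (TWA_TOI in minutes, TWA_GF20).
teammates_apart = [(600.0, 0.90), (300.0, 0.60), (100.0, 1.20)]

tmgf20 = toi_weighted_average(teammates_apart)
print(round(tmgf20, 3))
```

The same function gives TMGA20, OppGF20 and OppGA20 when fed the corresponding TWA or OWA columns.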

Calculate first iteration of HARO and HARD

The first iteration of HARO and HARD is simple. I first calculate an estimated GF20 and an estimated GA20 based on the player’s teammates and opposition.

ExpGF20 = (TMGF20 + OppGA20)/2
ExpGA20 = (TMGA20 + OppGF20)/2

Then I calculate HARO and HARD as a percentage improvement:

HARO(1st iteration) = 100*(GF20-ExpGF20) / ExpGF20
HARD(1st iteration) = 100*(ExpGA20 - GA20) / ExpGA20

So, a HARO of 20 would mean that when the player is on the ice the goal rate of his team is 20% higher than one would expect based on how his teammates and opponents performed during time when the player is not on the ice with/against them. Similarly, a HARD of 20 would mean the goals against rate of his team is 20% better (lower) than expected.

(Note: The OppGF20 and OppGA20 that get used are from the complementary situation. For 5v5 this means the opposition situation is also 5v5, but when calculating a rating for 5v5 leading the opposition situation is 5v5 trailing, so OppGF20 would be OppGF20 calculated from 5v5 trailing data.)
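The first iteration formulas translate directly into code. Here is a minimal Python sketch; the function names and the input numbers are hypothetical and chosen so the arithmetic is easy to follow.

```python
def expected_rates(tmgf20, tmga20, oppgf20, oppga20):
    """Expected GF20 and GA20 from teammate and opponent quality."""
    exp_gf20 = (tmgf20 + oppga20) / 2
    exp_ga20 = (tmga20 + oppgf20) / 2
    return exp_gf20, exp_ga20


def haro(gf20, exp_gf20):
    """Percentage by which the on-ice goals for rate beats expectation."""
    return 100 * (gf20 - exp_gf20) / exp_gf20


def hard(ga20, exp_ga20):
    """Percentage by which the on-ice goals against rate beats (is below) expectation."""
    return 100 * (exp_ga20 - ga20) / exp_ga20


# Hypothetical player and quality-of-teammate/competition inputs.
exp_gf20, exp_ga20 = expected_rates(tmgf20=0.80, tmga20=0.75,
                                    oppgf20=0.85, oppga20=0.80)
haro_1 = haro(gf20=0.96, exp_gf20=exp_gf20)
hard_1 = hard(ga20=0.64, exp_ga20=exp_ga20)
print(round(haro_1, 1), round(hard_1, 1))
```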

Now for a second iteration

The first iteration used GF20 and GA20 stats which is a good start, but after the first iteration we have teammate and opponent corrected evaluations of every player, which means we have better data about the quality of teammates and opponents the player has. This is where things get a little more complicated because I need to calculate a QoT and QoC metric based on the first iteration HARO and HARD values and then I need to convert that into a GF20 and GA20 equivalent number that I can compare the player’s GF20 and GA20 to.

To do this I calculate a TMHARO rating which is a TWA_TOI weighted average of first iteration HARO. TMHARD, OppHARO and OppHARD are calculated in a similar manner. Now I need to convert these to GF20 and GA20 based stats so I do that by multiplying by league average GF20 (LAGF20) and league average GA20 (LAGA20) and from here I can calculate expected GF20 and expected GA20.

ExpGF20(2nd iteration) = (TMHARO*LAGF20 + OppHARD*LAGA20)/2
ExpGA20(2nd iteration) = (TMHARD*LAGA20 + OppHARO*LAGF20)/2

From there we can get a second iteration of HARO and HARD.

HARO(2nd iteration) = 100*(GF20-ExpGF20) / ExpGF20
HARD(2nd iteration) = 100*(ExpGA20 - GA20) / ExpGA20
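Here is a rough Python sketch of one second iteration step for a single player. How a percentage rating gets turned into a multiplier on the league average is not spelled out above, so this sketch assumes a HARO of +10 means 10% more goals for than league average and a HARD of +10 means 10% fewer goals against. It pairs teammate offense with opponent defense and teammate defense with opponent offense, as in the first iteration. The league averages and inputs are hypothetical.

```python
LAGF20 = 0.80  # hypothetical league average GF20
LAGA20 = 0.80  # hypothetical league average GA20


def offense_to_gf20(haro_pct):
    """Assumed conversion: HARO of +10 -> 10% above league average GF20."""
    return LAGF20 * (1 + haro_pct / 100)


def defense_to_ga20(hard_pct):
    """Assumed conversion: HARD of +10 -> 10% below league average GA20."""
    return LAGA20 * (1 - hard_pct / 100)


def second_iteration(gf20, ga20, tm_haro, tm_hard, opp_haro, opp_hard):
    """One rating update from TOI weighted teammate/opponent ratings."""
    exp_gf20 = (offense_to_gf20(tm_haro) + defense_to_ga20(opp_hard)) / 2
    exp_ga20 = (defense_to_ga20(tm_hard) + offense_to_gf20(opp_haro)) / 2
    haro = 100 * (gf20 - exp_gf20) / exp_gf20
    hard = 100 * (exp_ga20 - ga20) / exp_ga20
    return haro, hard


# Hypothetical player: good offensive teammates, weak defensive opponents.
haro_2, hard_2 = second_iteration(gf20=0.968, ga20=0.72,
                                  tm_haro=10, tm_hard=0,
                                  opp_haro=0, opp_hard=-10)
print(round(haro_2, 1), round(hard_2, 1))
```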

Now we iterate again and again…

Now we repeat the above step over and over again using the previous iteration’s HARO and HARD values at every step.

Now calculate HART

Once we have done enough iterations we can calculate HART from the final iteration’s HARO and HARD values.

HART = (HARO + HARD) /2
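To show how the repeated iterations and the final HART fit together, here is a toy Python sketch on a made-up four player league. It simplifies the process described above in two ways that are my assumptions: every player starts from a rating of 0 (league average) instead of the WOWY-based first iteration, and a rating of +10 is converted to a rate 10% better than league average. All names and numbers are hypothetical.

```python
LAGF20 = LAGA20 = 0.80  # hypothetical league averages


def weighted_rating(pairs, ratings):
    """TOI weighted average rating over (player, toi_apart) pairs."""
    total = sum(toi for _, toi in pairs)
    return sum(ratings[q] * toi for q, toi in pairs) / total


def iterate_ratings(players, teammates, opponents, n_iter=5):
    """Repeat the rating update n_iter times, then compute HART."""
    haro = {p: 0.0 for p in players}  # assumed starting point
    hard = {p: 0.0 for p in players}
    for _ in range(n_iter):
        new_haro, new_hard = {}, {}
        for p, (gf20, ga20) in players.items():
            tm_gf = LAGF20 * (1 + weighted_rating(teammates[p], haro) / 100)
            tm_ga = LAGA20 * (1 - weighted_rating(teammates[p], hard) / 100)
            opp_gf = LAGF20 * (1 + weighted_rating(opponents[p], haro) / 100)
            opp_ga = LAGA20 * (1 - weighted_rating(opponents[p], hard) / 100)
            exp_gf = (tm_gf + opp_ga) / 2
            exp_ga = (tm_ga + opp_gf) / 2
            new_haro[p] = 100 * (gf20 - exp_gf) / exp_gf
            new_hard[p] = 100 * (exp_ga - ga20) / exp_ga
        # Every player is updated from the previous iteration's ratings.
        haro, hard = new_haro, new_hard
    hart = {p: (haro[p] + hard[p]) / 2 for p in players}
    return haro, hard, hart


# Toy league: player -> (GF20, GA20); A and B are teammates, as are C and D.
players = {"A": (1.0, 0.7), "B": (0.8, 0.8), "C": (0.7, 0.9), "D": (0.8, 0.8)}
teammates = {"A": [("B", 500)], "B": [("A", 500)],
             "C": [("D", 500)], "D": [("C", 500)]}
opponents = {"A": [("C", 300), ("D", 200)], "B": [("C", 200), ("D", 300)],
             "C": [("A", 300), ("B", 200)], "D": [("A", 200), ("B", 300)]}

haro, hard, hart = iterate_ratings(players, teammates, opponents, n_iter=5)
for p in players:
    print(p, round(haro[p], 1), round(hard[p], 1), round(hart[p], 1))
```

A fixed number of iterations is used here for simplicity; checking how much the ratings change between iterations would be another way to decide when to stop.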

Now do the same for Shot, Fenwick and Corsi data

The above is for goal ratings but I have Shot, Fenwick and Corsi ratings as well and these can be calculated in the exact same way except using SF20, SA20, FF20, FA20, CF20 and CA20.

What about goalies?

Goalies are a little unique in that they only really play the defensive side of the game. For this reason I do not include goalies in calculating TMGF20 and OppGF20. For shot, fenwick and corsi I do not include the goalies on the defensive side of things either as I assume a goalie will not influence shots against (though this may not be entirely true as some goalies may be better at controlling rebounds and thus secondary shots but I’ll assume this is a minimal effect if it does exist). The result of this is goalies do have a HARD rating but no HARO, or shot/fenwick/corsi based HARD or HARO rating.

I hope this helps explain how my hockey analysis ratings are calculated but if you have any followup questions feel free to ask them in the comments.