Jul 16 2013
 

Last week I posted an article with a proposal to standardize the terminology and nomenclature we use for advanced hockey statistics. This was an attempt to solicit feedback as to whether such a standardization was necessary and if so get some feedback on my proposed terminology. The response at first was slow to trickle in but the responses that did were somewhat telling. Generally speaking, while not a ton of people had an opinion, the feedback on standardizing terminology and getting rid of names like Corsi, Fenwick and PDO was positive.

Interestingly, or maybe not, the biggest resistance to the change came from some of the more hard-core advanced statistics people. From this group the feedback ranged from “it will sort itself out eventually” to “people attack the name corsi as a way to attack the stat itself”.

 

When I challenged Eric T. further, he said he was open to standardization but believed it would eventually happen one way or another, so he wasn’t worried about the details now. In other conversations, he referred back to baseball and how its stats are named.

There is a clear difference though. DIPS is Defense Independent Pitching Statistics. BABIP is Batting Average on Balls In Play. OPS is On base Plus Slugging percentage. Corsi is a shot attempt differential stat that includes shots, missed shots and blocked shots and is named after an obscure former NHL goalie who used a similar metric to evaluate the work load a goalie experiences in a game during his time as a goalie coach for the Buffalo Sabres. Yes, you need an explanation of what the baseball stats mean but once you get that explanation you say “ahhh, it makes sense.” Down the road, when you see BABIP, the name itself is a reminder of its definition. It is a lot easier to remember BABIP and what it stands for than it is to remember PDO and what it stands for. Now, this may not seem that difficult for someone who speaks and works with the terminology daily, but for the casual dabblers in advanced stats who might only occasionally read an article referencing advanced stats it becomes more difficult. There are no clues within PDO to trigger a memory response to recall its definition and this is even more difficult for Corsi and Fenwick where the differences are subtle. I have joked in the past that PDO stands for Pretty Damn Obfuscated because, well it is, and yet the stat itself is conceptually trivial to understand and calculate from its component parts.

Some of the arguments I have read on the resisting-change side of the equation essentially translate into “the onus is on others to learn our terminology and not on us to make it easier for them to learn” and “if they cannot be bothered to look up what a term means they probably won’t be bothered to understand what the stat says.” In some cases there may be some merit to these arguments but they also come across as being somewhat arrogant or elitist in nature. ‘It is not our duty to make it easy for others to learn, it is their duty to take the time to learn what we do.’ It is probably not deliberate and we probably all do this sort of thing with respect to our own “fields of expertise”, whatever they may be, but that doesn’t mean it is right. We need to remind ourselves that being more accommodating and understanding of newcomers and casual observers is only going to benefit the field over the long haul.

I myself took offense to Eric T’s article “Steering advanced regression tools towards modern hockey thought” because when I hear the idea of steering other people’s work to a specific way of thinking it smacks of the same sort of elitism and arrogance (i.e. there being a “right way” and a “wrong way” to think of hockey analytics and one must conform to the right way). I am sure this was not deliberate in any way on Eric’s part but it was made worse by the fact there was no effort to understand and critique the work that was actually done, only to point out what was not done up to “modern hockey thought” standards (which are not clearly published anywhere, more on this later). With that said, in the “Modern Hockey Thought” post Eric did write one thing that I think we can work off of:

I think the baseball community consistently gets less than it could out of analytical experts like yourself because they are often directing their high-powered tools at the wrong problems. I’m hoping to help ensure that hockey, with its greater analytical challenges, gets as much out of your expertise as possible.

I am not well versed in the history of baseball analytics but if this is occurring in the hockey analytics world, one must look into the reasons why. The truth is, those of us within the hockey analytics community have done a terrible job at removing barriers to entry regardless of how small those barriers to entry may seem. It starts with standardizing terminology and making terminology more understandable, but the hockey analytics community has also done a terrible job of making its research easily accessible. There is no well organized and maintained glossary of statistics and definitions. There is no well organized and maintained list of important papers and articles. There is no single place that one can go to get an up to date description of the state of hockey analytics, of what we know and what we don’t know, what we agree on and where differing opinions exist. We expect everyone to be up to speed on the current state of hockey analytics but don’t seem to want to put any effort into making it possible for people to do so. There was an attempt by Eric last summer on NHLNumbers.com to document the current state of hockey analytics and create a directory of important articles but it wasn’t finished and what was done isn’t easily found. If you want to learn about hockey analytics Google is really your only resource, but that can lead you in the wrong direction just as easily as the right direction and more likely than not leads to people giving up. As a hockey analytics community we need to be better at organizing the knowledge we have and making it far more accessible, and it’s unfortunate that my effort to do so with regards to standardization of statistic naming conventions was mostly met with a big “meh, whatever” by the analytics community, or even worse ‘it means more work for us’.

I am sure I will get criticized for some of the comments I have made here but honestly, I don’t care (and my critiques go beyond just the couple of people I mentioned here). If you feel you have been unfairly critiqued feel free to write your complaints in the comments. You can have your say, just keep them on topic and don’t expect a response from me. I have made my point and I know that there are others that agree with it. I also regularly get people e-mailing me asking me for places they can go to get an intro to advanced hockey statistics and while I try to send them useful links I also know that they barely suffice. We need to do better and as such I will be going ahead with the following terminology changes when I update my stats site and I hope others will follow along. If we can’t even come together to reach an agreement on standardizing terminology there is no hope that we will be able to overcome the far more difficult challenges we face in making hockey analytics more accessible to anyone with an interest.

Event Statistics Description
TOI Time on ice
G Goals
A Assists
FirstA First Assists
SoG Shots on goal
SA Shot Attempts (includes missed and blocked shots, formerly a corsi event)
UBSA UnBlocked Shot Attempts (does not include blocked shots, formerly a fenwick event)
Percentage Statistics
Sh% Shooting percentage (G/S)
SASh% Shot Attempt Shooting percentage (G/SA)
UBSA-Sh% Unblocked Shot Attempt Shooting percentage (G/UBSA)
Sv% Save percentage (GA/SA)
SASv% Shot Attempt Save percentage (GA/SAA)
UBSASv% Unblocked Shot Attempt save percentage (GA/UBSAA)
SPS Save Plus Shooting (percentages)
SASPS Shot Attempt Save Plus Shooting (percentages)
UBSASPS Unblocked Shot Attempt Save Plus Shooting (percentages)
Other Statistics
IGP Individual Goals Percentage (iG / GF)
IAP Individual Assist Percentage (iA / GF)
IPP Individual Points Percentage (iPts / GF)
ISP Individual Shot Percentage (iS / SF)
ISAP Individual Shot Attempt Percentage (iSA/SAF)
IUBSAP Individual Unblocked Shot Attempt Percentage (iUBSA/UBSAF)
Zone Starts
OZFO Number of Offensive Zone Face Offs
NZFO Number of Neutral Zone Face Offs
DZFO Number of Defensive Zone Face Offs
OZFO% Offensive Zone Face Off Percentage – OZFO /(OZFO+NZFO+DZFO)
NZFO% Neutral Zone Face Off Percentage – NZFO /(OZFO+NZFO+DZFO)
DZFO% Defensive Zone Face Off Percentage – DZFO /(OZFO+NZFO+DZFO)
OZBias Offensive Zone Bias – (2*OZFO + NZFO) / (OZFO + NZFO + DZFO)
DZBias Defensive Zone Bias – (2*DZFO + NZFO) / (OZFO + NZFO + DZFO)
OZFOW% Offensive Zone Face Off Winning Percentage
NZFOW% Neutral Zone Face Off Winning Percentage
DZFOW% Defensive Zone Face Off Winning Percentage
FOW% Face off win percentage (all zones)
Prefix
i Individual Stats
TM Average stats of team/line mates weighted by TOI with
Opp Stats of opposing players weighted by TOI against
PctTm Percent of the team’s stats the player recorded in games the player played in
Suffix
F Stats for the player’s team while player is on the ice
A Stats against the player’s team while player is on the ice
20 or /20 Stats per 20 minutes of ice time
60 or /60 Stats per 60 minutes of ice time
F% Percentage of events that are by the player’s own team (i.e. for)
D Difference between For and Against statistics (i.e. a +/- statistic)
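
As a rough illustration of how the formula-based entries in the table fit together, here is a minimal Python sketch. The function names, argument names and sample numbers are assumptions made for the example and are not part of the proposed nomenclature. One further assumption: the table writes Sv% as GA/SA, which read literally is the share of shots against that become goals, so the sketch uses the conventional save percentage (one minus that share), with SA there read as shots on goal against.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    return numerator / denominator if denominator else None


def zone_start_stats(ozfo, nzfo, dzfo):
    """Zone start percentages and biases from face-off counts."""
    total = ozfo + nzfo + dzfo
    return {
        "OZFO%": ratio(ozfo, total),
        "NZFO%": ratio(nzfo, total),
        "DZFO%": ratio(dzfo, total),
        "OZBias": ratio(2 * ozfo + nzfo, total),
        "DZBias": ratio(2 * dzfo + nzfo, total),
    }


def individual_percentages(ig, ia, gf):
    """IGP, IAP and IPP: a player's share of the goals scored while on the ice."""
    return {
        "IGP": ratio(ig, gf),
        "IAP": ratio(ia, gf),
        "IPP": ratio(ig + ia, gf),  # iPts = iG + iA
    }


def save_plus_shooting(gf, sf, ga, shots_against):
    """SPS: shooting percentage plus save percentage, as fractions."""
    sh = ratio(gf, sf)                   # Sh% = G / S
    allowed = ratio(ga, shots_against)   # share of shots against that go in
    if sh is None or allowed is None:
        return None
    return sh + (1 - allowed)            # assumes Sv% = 1 - GA / shots against


if __name__ == "__main__":
    print(zone_start_stats(40, 40, 20))
    print(individual_percentages(5, 10, 20))
    print(save_plus_shooting(20, 200, 15, 250))
```

With 40 offensive, 40 neutral and 20 defensive zone face offs, `zone_start_stats` gives an OZBias of 1.2 and a DZBias of 0.8. The two biases always sum to 2, so 1.0 marks a balanced deployment.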

The major changes I made were to use “SA” (shot attempts) for corsi events and “UBSA” (unblocked shot attempts) for fenwick events instead of ASAG (attempted shots at goals) and SAG (shots at goal) in my previous iteration. This should make things a little clearer than my first proposal.  Update: Changed Sv+Sh% to SPS (Save Plus Shooting percentages).

 

  12 Responses to “The two sides of standardizing hockey nomenclature”

  1.  

    My position on names:

    1) I agree the names are sub-optimal
    2) I think it makes some difference in adoption
    3) I think many people overstate that difference
    4) I’d prefer to have optimal names if that’s easily accomplished
    5) Everything to date suggests it will be quite difficult, and I don’t think it’s worth putting a ton of effort into

    As for the rest, I’m not sure why we have to have this argument yet again, but…

    I myself took offense to Eric T’s article “Steering advanced regression tools towards modern hockey thought” because when I hear the idea of steering other people’s work to a specific way of thinking it smacks of the same sort of elitism and arrogance (i.e. there being a “right way” and a “wrong way” to think of hockey analytics and one must conform to the right way).

    [...]

    We expect everyone to be up to speed on the current state of hockey analytics but don’t seem to want to put any effort into making it possible for people to do so.

    The whole point of my article was to get them up to speed on the current state of hockey analytics. It seems strange for you to criticize me for doing that in the same article where you complain that nobody seems willing to put in effort to do that.

    Incidentally, even though you don’t seem to understand what I was doing, the authors of the paper that I was responding to understood it and appreciated it. I exchanged multiple emails with them and had a lengthy phone conversation about what’s already known and what is still underexplored and how I think they could add value instead of reliving past mistakes and reinventing past analysis.

    Also, there is a right way and a wrong way about certain things. Accounting for regression in measurements with poor repeatability is one of those things. Would it be elitist or arrogant to tell someone that they made a multiplication error?

    I am not well versed in the history of baseball analytics but if this is occurring in the hockey analytics world, one must look into the reasons why. The truth is, those of us within the hockey analytics community have done a terrible job at removing barriers to entry regardless of how small those barriers to entry may seem.

    I recommend you read the link at the top of my article that lays out the state of affairs in other sports. It will provide you with insight that might serve you well when you move on to guessing how to avoid those problems in hockey.

    I am sure this was not deliberate in any way on Eric’s part but it was made worse by the fact there was no effort to understand and critique the work that was actually done, only to point out what was not done up to “modern hockey thought” standards

    No effort to understand and critique the work? Read the article again carefully; I go point by point through how failing to account for variance impacted their results and what they might have done differently.

    If anyone here has failed to understand someone else’s work, it wasn’t me.

    •  

      1) I agree the names are sub-optimal
      2) I think it makes some difference in adoption
      3) I think many people overstate that difference
      4) I’d prefer to have optimal names if that’s easily accomplished
      5) Everything to date suggests it will be quite difficult, and I don’t think it’s worth putting a ton of effort into

      Fair enough on 1-4 but I have no clue why it would be so difficult except that certain core people inside the hockey analytics community are being stubborn about it (or, like you, somewhat ambivalent about it) while almost all the feedback I have received from outside the core community has been positive and supportive of the change. There is no reason for the change to be difficult and I personally think it sheds a poor light on the hockey analytics community. It’s kind of sad really. I mean, getting mocked because my proposed statistical definitions are organized in table form? Rejecting a name change because Shot Attempt Differential gets abbreviated to SAD which is a lot like the English word sad? These are awfully poor excuses for resisting a name change. Everyone is free to have a preference for what they want to call a stat but to claim changing ‘Corsi’ to ‘SAD’ is in any way a difficult thing to do is nothing more than a poor excuse for stubbornness.

      •  

        I’m mostly on your side on this, but I think there are valid reasons to be on the other side.

        I think there are legitimate differences of viewpoint about what the optimal names would be. There are many different reasonable opinions about how heavily to value having the name be readable, having it be descriptive, avoiding having stats have similar names, etc. I think we could get 20 people in a room and spend three hours debating things and still not reach a consensus.

        If it were easy to get agreement on names, I submit that the change would’ve been made by now.

        It won’t be easy to get agreement. And once we do pick something, the actual switch is still a pain — propagating the new names throughout the community, re-educating the people who were up to speed on the old names, etc. It’s common practice to provide links to previous work that supports a point; now each of those links might need to come with a translation kit for the people who grew up in the post-renaming era.

        Nothing about this is easy. I’m not mocking you for trying — hell, I’d like to see you succeed — but you shouldn’t assume that stubbornness is the only reason to resist such a movement.

  2.  

    Slightly OT:

    IAP Individual Assist Percentage (iA / GF)

    I don’t have a problem with the name, but hate the calculation. It should be (iA / (GF – iG)), in my opinion. Otherwise, the measure of how good a passer I am depends on how good a shooter I am — if all of my shots go in, then my individual assist percentage will likely be very low.

    •  

      Interesting thought, though I would then ask if IGP should be iG/(GF-iA) so that one wouldn’t be penalized for being a good passer instead of shooting.

      I like that IAP + IGP = IPP. IPP tells you how involved a player is in the offensive production of his team when he is on the ice, and IAP and IGP tell you whether he is more of a passer, more of a shooter or more balanced.

  3.  

    If you want my opinion…I like the idea of standardizing these names into acronyms but IMO any acronym with more than 3, MAX 4 letters, is far too long and it’s just not going to work. I would try finding other abbreviations for those terms.

    •  

      Everyone needs to remember that some of these stats that have longer names will never be used as mainstream stats, if at all. For example, the uses for UBSASPS (UnBlocked Shot Attempt Save Plus Shooting percentage) are minimal, if any, but having the nomenclature to describe them, even if somewhat complex, is not necessarily a negative.

      Honestly, if I had a choice we would all agree to use either all shot attempts or unblocked shot attempts so we’d only have to keep track of one but when I posed that question a few months ago several people insisted it is important to keep both. Thus we are stuck with both SA and UBSA.

      •  

        Both are necessary. Over small sample sizes, Corsi has better predictive value than Fenwick, but as the sample size increases, Fenwick becomes slightly more predictive (which is why we generally look at FF% at the end of the season).

        We shouldn’t throw one out if both bring value to the table.

  4.  

    Wouldn’t the use of SA for Corsi events cause confusion? nhl.com and the audience you are presumably designing this for already use it for Shots Against.

  5.  

    I’m a hockey fan who has not followed closely the development of the advanced stats, but I can see the value in them. One thing that’s hampered me in following them is their opaque names. A name should be a mnemonic device for the statistic in question.

    As it is when someone cites “PDO” in an article I think, “Now, which one is that? Despite appearances the name is not an acronym. So, no help there.” So I have to open another tab and search the net for a glossary. The first few results returned aren’t helpful so I give up on the search and the article.

    I used to teach intro economics to college freshmen. Imagine if economics named its most-used measurements like the new hockey stats: if gross domestic product was called “Ricardo” (we used to use “Smith” or gross national product, but switched years ago for reasons no one can remember), aggregate demand was called “Keynes”, median wage was “Galbraith” etc. This would result in sentences like, “Although Ricardo has increased each year since 2010, Keynes remains low because of stagnant Galbraith.”

    My students would have rioted. And rightly so. The name “GDP” helps us to remember what it represents in a way that “Ricardo” does not.

    Please reject obscurity for clarity. Thank you.

    •  

      I’m in the exact same boat as Voline. I haven’t closely followed the developments of advanced stats, but the logic behind them makes sense. When I do read about them, or want to read about them, I get confused by the names (I can’t remember which of Corsi or Fenwick includes blocked shots). Acronyms, or at least logical names, for these two core items would definitely reduce the number of people who are scared off because it sounds “complicated” or “weird”.

  6.  

    A very interesting summary of the discussion David, thanks.
