The two sides of standardizing hockey nomenclature
Last week I posted an article proposing that we standardize the terminology and nomenclature we use for advanced hockey statistics. It was an attempt to find out whether such standardization is necessary and, if so, to get feedback on my proposed terminology. Responses were slow to trickle in at first, but the ones that did arrive were somewhat telling. Generally speaking, while not a ton of people had an opinion, the feedback on standardizing terminology and getting rid of names like Corsi, Fenwick and PDO was positive.
@hockeyanalysis Good plan. “Corsi”, “Fenwick”, and “PDO” need to be eliminated from common useage.
— KM (@SmoulderinGT) July 10, 2013
— Draglikepull (@draglikepull) July 11, 2013
— Doogie (@doogie2k) July 11, 2013
Interestingly, or maybe not, the biggest resistance to the change came from some of the more hard core advanced statistics people. From this group the feedback ranged from “it will sort itself out eventually” to “people attack the name corsi as a way to attack the stat itself”.
Remember how much trouble baseball stats had in catching on because some people called it WAR and some called it WARP? #MeEither
— Eric T. (@BSH_EricT) July 11, 2013
— mc79hockey (@mc79hockey) July 11, 2013
When I challenged Eric T. further, he said he was open to standardization but believed it would eventually happen one way or another, so he wasn’t worried about the details now. In other conversations, he referred back to baseball and how its stats are named.
There is a clear difference though. DIPS is Defense Independent Pitching Statistics. BABIP is Batting Average on Balls In Play. OPS is On base Plus Slugging percentage. Corsi is a shot attempt differential stat that includes shots, missed shots and blocked shots, and it is named after an obscure former NHL goalie who used a similar metric to evaluate the work load a goalie experiences in a game during his time as a goalie coach for the Buffalo Sabres. Yes, you need an explanation of what the baseball stats mean, but once you get that explanation you say “ahhh, it makes sense.” Down the road, when you see BABIP, the name itself is a reminder of its definition. It is a lot easier to remember BABIP and what it stands for than it is to remember PDO and what it stands for. Now, this may not seem that difficult for someone who speaks and works with the terminology daily, but for casual dabblers who might only occasionally read an article referencing advanced stats it becomes more difficult. There are no clues within PDO to trigger a memory of its definition, and this is even more difficult for Corsi and Fenwick, where the differences are subtle. I have joked in the past that PDO stands for Pretty Damn Obfuscated because, well, it is, and yet the stat itself is conceptually trivial to understand and calculate from its component parts.
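To show just how trivial, here is a minimal Python sketch of PDO as the sum of on-ice shooting percentage and on-ice save percentage. The function names and the sample totals are made up for illustration, and results are kept as fractions rather than scaled to 100 or 1000.

```python
def shooting_pct(goals_for, shots_for):
    """Fraction of a team's shots on goal that became goals."""
    return goals_for / shots_for


def save_pct(goals_against, shots_against):
    """Fraction of the opponent's shots on goal that were stopped."""
    return 1 - goals_against / shots_against


def pdo(goals_for, shots_for, goals_against, shots_against):
    """Save percentage plus shooting percentage (the stat known as PDO)."""
    return shooting_pct(goals_for, shots_for) + save_pct(goals_against, shots_against)


# Hypothetical on-ice totals, not real data.
example = pdo(goals_for=8, shots_for=100, goals_against=10, shots_against=100)
print(round(example, 3))  # 0.98
```

Nothing in the three letters hints at any of this, which is the whole complaint.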
Some of the arguments I have read from the side resisting change essentially translate into “the onus is on others to learn our terminology and not on us to make it easier for them to learn” and “if they cannot be bothered to look up what a term means they probably won’t be bothered to understand what the stat says.” In some cases there may be some merit to these arguments, but they also come across as somewhat arrogant or elitist. ‘It is not our duty to make it easy for others to learn; it is their duty to take the time to learn what we do.’ It is probably not deliberate, and we probably all do this sort of thing with respect to our own “fields of expertise”, whatever they may be, but that doesn’t mean it is right. We need to remind ourselves that being more accommodating and understanding of newcomers and casual observers is only going to benefit the field over the long haul.
I myself took offense to Eric T’s article “Steering advanced regression tools towards modern hockey thought” because when I hear the idea of steering other people’s work to a specific way of thinking it smacks of the same sort of elitism and arrogance (i.e. there being a “right way” and a “wrong way” to think of hockey analytics and one must conform to the right way). I am sure this was not deliberate in any way on Eric’s part, but it was made worse by the fact that there was no effort to understand and critique the work that was actually done, only to point out what was not done up to “modern hockey thought” standards (which are not clearly published anywhere, more on this later). With that said, in the “Modern Hockey Thought” post Eric did write one thing that I think we can work off of:
I think the baseball community consistently gets less than it could out of analytical experts like yourself because they are often directing their high-powered tools at the wrong problems. I’m hoping to help ensure that hockey, with its greater analytical challenges, gets as much out of your expertise as possible.
I am not well versed in the history of baseball analytics, but if this is occurring in the hockey analytics world, one must look into the reasons why. The truth is, those of us within the hockey analytics community have done a terrible job at removing barriers to entry, regardless of how small those barriers may seem. It starts with standardizing terminology and making it more understandable, but the community has also done a terrible job of making its research easily accessible. There is no well organized and maintained glossary of statistics and definitions. There is no well organized and maintained list of important papers and articles. There is no single place that one can go to get an up to date description of the state of hockey analytics, of what we know and what we don’t know, what we agree on and where differing opinions exist. We expect everyone to be up to speed on the current state of hockey analytics but don’t seem to want to put any effort into making it possible for people to do so. There was an attempt by Eric last summer on NHLNumbers.com to document the current state of hockey analytics and create a directory of important articles, but it wasn’t finished and what was done isn’t easily found. If you want to learn about hockey analytics, Google is really your only resource, but that can lead you in the wrong direction just as easily as the right one and more likely than not leads to people giving up. As a hockey analytics community we need to be better at organizing the knowledge we have and making it far more accessible, and it’s unfortunate that my effort to do so with regard to standardizing statistic naming conventions was mostly met with a big “meh, whatever” by the analytics community, or even worse, ‘it means more work for us’.
— mc79hockey (@mc79hockey) July 11, 2013
I am sure I will get criticized for some of the comments I have made here but honestly, I don’t care (and my critiques go beyond just the couple of people I mentioned here). If you feel you have been unfairly critiqued, feel free to write your complaints in the comments. You can have your say, just keep them on topic and don’t expect a response from me. I have made my point and I know that there are others who agree with it. I also regularly get people e-mailing me asking for places they can go to get an intro to advanced hockey statistics, and while I try to send them useful links I also know that they barely suffice. We need to do better, and as such I will be going ahead with the following terminology changes when I update my stats site, and I hope others will follow along. If we can’t even come together to reach an agreement on standardizing terminology, there is no hope that we will be able to overcome the far more difficult challenges we face in making hockey analytics more accessible to anyone with an interest.
|TOI||Time on ice|
|SoG||Shots on goal|
|SA||Shot Attempts (includes missed and blocked shots, formerly a corsi event)|
|UBSA||UnBlocked Shot Attempts (does not include blocked shots, formerly a fenwick event)|
|Sh%||Shooting percentage (G/S)|
|SASh%||Shot Attempt Shooting percentage (G/SA)|
|UBSA-Sh%||Unblocked Shot Attempt Shooting percentage (G/UBSA)|
|Sv%||Save percentage (1 - GA/SA, where SA here means shots on goal against)|
|SASv%||Shot Attempt Save percentage (1 - GA/SAA)|
|UBSASv%||Unblocked Shot Attempt save percentage (1 - GA/UBSAA)|
|SPS||Save Plus Shooting (percentages)|
|SASPS||Shot Attempt Save Plus Shooting (percentages)|
|UBSASPS||Unblocked Shot Attempt Save Plus Shooting (percentages)|
|IGP||Individual Goals Percentage (iG / GF)|
|IAP||Individual Assist Percentage (iA / GF)|
|IPP||Individual Points Percentage (iPts / GF)|
|ISP||Individual Shot Percentage (iS / SF)|
|ISAP||Individual Shot Attempt Percentage (iSA/SAF)|
|IUBSAP||Individual Unblocked Shot Attempt Percentage (iUBSA/UBSAF)|
|OZFO||Number of Offensive Zone Face Offs|
|NZFO||Number of Neutral Zone Face Offs|
|DZFO||Number of Defensive Zone Face Offs|
|OZFO%||Offensive Zone Face Off Percentage – OZFO /(OZFO+NZFO+DZFO)|
|NZFO%||Neutral Zone Face Off Percentage – NZFO /(OZFO+NZFO+DZFO)|
|DZFO%||Defensive Zone Face Off Percentage – DZFO /(OZFO+NZFO+DZFO)|
|OZBias||Offensive Zone Bias – (2*OZFO + NZFO) / (OZFO + NZFO + DZFO)|
|DZBias||Defensive Zone Bias – (2*DZFO + NZFO) / (OZFO + NZFO + DZFO)|
|OZFOW%||Offensive Zone Face Off Winning Percentage|
|NZFOW%||Neutral Zone Face Off Winning Percentage|
|DZFOW%||Defensive Zone Face Off Winning Percentage|
|FOW%||Face off win percentage (all zones)|
|TM||Average stats of team/line mates weighted by TOI with|
|Opp||Stats of opposing players weighted by TOI against|
|PctTm||Percent of team’s stats the player recorded in games the player played in|
|F||Stats for the player’s team while player is on the ice|
|A||Stats against the player’s team while player is on the ice|
|20 or /20||Stats per 20 minutes of ice time|
|60 or /60||Stats per 60 minutes of ice time|
|F%||Percentage of events that are by the player’s own team (i.e. for)|
|D||Difference between For and Against statistics (i.e. a +/- statistic)|
The major changes I made were to use “SA” (shot attempts) for corsi events and “UBSA” (unblocked shot attempts) for fenwick events instead of ASAG (attempted shots at goals) and SAG (shots at goal) in my previous iteration. This should make things a little clearer than my first proposal. Update: Changed Sv+Sh% to SPS (Save Plus Shooting percentages).
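For anyone who wants to check their reading of the table, here is a rough Python sketch of the zone face off formulas along with the F% and D suffixes. The function names and sample counts are hypothetical, values are returned as fractions rather than percentages, and F / (F + A) is my reading of the F% definition, which the table gives only in words.

```python
def zone_faceoff_stats(ozfo, nzfo, dzfo):
    """Zone face off shares and biases from raw face off counts."""
    total = ozfo + nzfo + dzfo
    if total == 0:
        raise ValueError("no face offs recorded")
    return {
        "OZFO%": ozfo / total,
        "NZFO%": nzfo / total,
        "DZFO%": dzfo / total,
        "OZBias": (2 * ozfo + nzfo) / total,
        "DZBias": (2 * dzfo + nzfo) / total,
    }


def for_pct(events_for, events_against):
    """F%: share of events belonging to the player's own team (assumed F / (F + A))."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded")
    return events_for / total


def differential(events_for, events_against):
    """D: For minus Against, i.e. a +/- style statistic."""
    return events_for - events_against


# Hypothetical counts, not real data.
stats = zone_faceoff_stats(ozfo=40, nzfo=30, dzfo=30)
print(stats["OZBias"], stats["DZBias"])  # 1.1 0.9
```

With these formulas OZBias and DZBias always sum to 2, so a value above 1 means the player's face offs tilt toward that zone.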