So yesterday I wrote an article about the state of player evaluation models in hockey analytics. It was a mostly well-received article.
Web Sant, whose model I critiqued, had a ‘quibble’, and we had a pleasant follow-up discussion, even if we ended in respectful disagreement.
A quibble. I never said that my model purports to measure talent: “natural aptitude or skill.”
It does measure performance.
— Web Sant (@web_sant) July 17, 2017
Some others were receptive to the ideas, both directly and indirectly.
— James Lambert (@rivermen123) July 16, 2017
This post from David is interesting. I took on a similar theme, but more generally, in my article Catch-22. The basic issue seems to be if https://t.co/BpmmA5m8xo
— Stefan Wolejszo (@StefanWolejszo) July 16, 2017
This blog from Gelman about Bill James should be required reading for anyone arguing we should wave away obvious busts in model outcomes. pic.twitter.com/cSykygsJpi
— Oilers Nerd Alert (@OilersNerdAlert) July 17, 2017
Tyler Dellow even cited an interesting example from baseball analytics that essentially supports the theme of my article (read the tweet thread, it’s worth it).
I always find these posts of the least valuable (by WAR) 100 RBI seasons interesting. https://t.co/Tg1NcTVWj1
— dellowhockey (@dellowhockey) July 17, 2017
Then there is the response from the cult (current or alumni writers) known as Hockey Graphs.
It started out pleasantly, with one of the more respected members asking a few clarifying questions.
from a math point of view, why do u make a distinction between repeatability and predictiveness?
— Jack Han (@ml_han) July 16, 2017
That led to a modest, but reasonable, question about how GAR should be presented.
I’m curious how you feel we should be presenting stats like GAR/etc with a level of confidence that would be more appropriate?
— EvolvingWild (@EvolvingWild) July 16, 2017
However, today things really went off the rails. Garret Hohl suggests that I am only allowed to critique model outputs if I also make suggestions on how to improve the model methodology.
So, when you wish to critique models, you need to critique it on the methodology, the background theory, and the results from testing. (7/7)
— Garret Hohl (@GarretHohl) July 13, 2017
It isn’t my model, I am not going to do your work.
Manny chimes in with his usual pointless insight and personal attacks.
My advice to anybody getting started in hockey analytics is to look at Dave Johnson and resolve to be exactly the opposite of that.
— Manny (@MannyElk) July 16, 2017
And then some more wonderful insight today.
Nothing says “I’m out of touch” quite like using GA/60 as evidence that a sophisticated statistical model is defective.
— Manny (@MannyElk) July 17, 2017
Oh, and Ryan Stimson didn’t want to be left out of all of this either.
better than other existing metrics, it makes our predictions better. A few “I didn’t think that player would be” are expected and make for..
— Ryan Stimson (@RK_Stimp) July 17, 2017
Nice to see Tyler call Stimson out on what ‘few’ means. I think ‘few’ massively understates how poorly these models perform.
And of course Ryan couldn’t hold back from taking some backhanded shots over something I didn’t say.
“Hockey Analytics fails to properly account for QoC because Jay Bouwmeester has an on-ice sv% of 95.24 against Patrick Kane” – Contrarians.
— Ryan Stimson (@RK_Stimp) July 17, 2017
To be clear, I was asked by someone else about Bouwmeester’s performance against Patrick Kane, which is good and not just in Sv%, so I provided the numbers to support that.
However the one that really takes the cake (and prompted this article) is Nick Mercadante.
This is utter dog shit. https://t.co/hn2CB1FaFy
— Nick Mercadante (@NMercad) July 17, 2017
I mean, is this really necessary? Are we really going to be this immature over my opinion of someone’s hockey models?
I have written about the Hockey Graphs cult before. They have a pack mentality in which a critique of one member is treated as an outright attack on the whole group, counterattacks need not have a basis in reality, and personal attacks are more than welcome. The one thing you will not see in these counterattacks is any defense of the model’s output. There is some argument that the scientific method is the way to go, but no one is actually defending the results of the model that I critiqued. None. No one had any ‘Klefbom is rated the second best NHL player because….’ arguments. It simply can’t be defended. It isn’t a small error in evaluation, it is a huge one. There is no defending it.
There are reasons why these models fail so spectacularly. Hockey data is terrible and hockey is an incredibly complex sport. We don’t even know what we don’t know because we are so limited in the data we have. You can have the best models in the world, but if the inputs are terrible the outputs will be too. Garbage in, garbage out. This is not an attack on those who created the models, it is simply reality. It’s just unfortunate that one segment of the hockey analytics crowd can’t deal with that in an honest and upfront way.
For my part, I always look at this from the perspective of an NHL front office and how it might use hockey analytics. So my suggestion to everyone is this: when someone critiques your work, respond as if the critique came from an NHL general manager whom you would really like to work for. If we all did that, hockey Twitter would be a much better, more respectful, and more productive place.
Finally, I’ll leave you with this.
Bill James on WAR: CRAM IT. pic.twitter.com/1wgS20ESG6
— Internet🤔Contrarian (@bogcommenter) July 17, 2017
That is from http://www.billjamesonline.com/hey_bill/.