Ryan Stimson has been doing some valuable work tracking passes, and this morning he posted an analysis of the data he and others have collected so far. It is an interesting article and definitely worth a read, and it is a useful contribution to shot quality research. It also sparked some Twitter discussion about one of the techniques Ryan used. When Stimson looked at the correlation between two variables (e.g. passing ability vs. shooting percentage), he often noticed an outlier team, and he would then recompute the correlation between the two variables with that outlier team removed. This technique of removing outliers drew some pushback on Twitter from @garik16, just as it did when I used it not long ago.
@RK_Stimp Again 2 comments. 1. You can’t remove ANY data point from a data set of SIX teams – removing too high a % of the set there
— garik16 (@garik16) January 25, 2015
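To make the technique concrete, here is a minimal sketch of a "leave-one-out" correlation check: compute the Pearson correlation for all teams, then recompute it with each team dropped in turn, so you can see how much any single team drives the result. This is an illustrative sketch, not Stimson's actual code; the function names `pearson` and `leave_one_out_correlations` are hypothetical. It also shows garik16's concern in numbers: with six teams, dropping one removes 1/6 (about 17%) of the sample.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy)

def leave_one_out_correlations(labels, xs, ys):
    """Return (r for all teams, {team: r with that team removed})."""
    full_r = pearson(xs, ys)
    dropped = {}
    for i, label in enumerate(labels):
        # Rebuild both lists without the i-th team and recompute r.
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        dropped[label] = pearson(xs_rest, ys_rest)
    return full_r, dropped
```

A team whose removal moves r far more than any other team's is a candidate outlier worth investigating; with only six data points, though, any single removal can swing r substantially, which is why the result should be read with caution.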
While I think removing outliers has to be done with great caution and consideration, it is also important to acknowledge that outlier analysis can be an incredibly valuable tool for understanding what is going on. Teams aren't built randomly, and talent isn't evenly distributed across the league. Talent differences between teams may produce different statistical patterns for different teams. Organizations also have different philosophies on players and playing styles, and this too may affect statistical patterns. As I have said before, we know that teams can change statistical patterns by altering their playing style based on the score of the game (score effects are a well-researched and fully accepted concept in hockey analytics), so it isn't difficult to envision other statistical patterns being altered by organizational or coaching philosophies. As statistical analysts we have to be open to this rather than just applying a statistical model, cranking out the results, and settling on hard and fast conclusions. We need to spend the time to understand the underlying data too.
I have spent a significant portion of my career working on air pollution research with some world-renowned scientists. Many years ago, not long after I finished university and was just embarking on my career, I was researching the relationship between weather patterns and air pollution. While I was doing this research, a scientist I highly respect told me that the most interesting things can often be learned by studying outliers. In that field, typical weather patterns produced typical pollution levels, but studying the outliers (atypical weather patterns) could really highlight the intricate relationship between weather and air pollution.
Hockey isn't baseball, where the game is a series of one-on-one battles that can be incorporated into a statistical model relatively easily because the main factors involved are the talent levels of the two players in each battle. Unfortunately, that isn't how hockey works. Hockey is more like the weather, where everything depends on everything else, which makes it very difficult to model. Sure, there are prevailing weather norms, but occasionally outlier events like hurricanes or blizzards happen. It is these outliers that are the most interesting and most researched weather phenomena. Compared to a hurricane or a blizzard, nobody really cares much about another 80°F sunny day in Miami or a -5°C January day in Ottawa. It's just another day.
So when I see someone suggest that you shouldn't investigate how outliers affect underlying trends, I get a bit defensive. If all you care about is what normally happens, you'll never truly understand the most interesting stuff. No NHL team strives to be ordinary; they strive to be elite, and being elite, by definition, means being an outlier. If you want to be an outlier, you ought to do everything you can to understand what makes an outlier an outlier.
In one of Stimson's charts he identified Chicago as the outlier team. Interestingly, I identified Chicago as an outlier team in my study of the relationship between Corsi and shooting percentage, because they are one of the few teams that can post both a good Corsi and an elevated shooting percentage. Furthermore, in any discussion of elite NHL teams, Chicago would be front and center. Is this a coincidence? Maybe. Or maybe it isn't. It could be luck, it could be skill, or it could be organizational philosophy and/or coaching tactics, but understanding why outliers exist is of critical importance. (Note: This is where I see hockey analytics converging with traditional 'hockey people' like coaches and scouts. Analytics can identify trends and the outliers to those trends, and coaches and scouts can help assess why those trends and outliers occur.)
Ultimately, any NHL franchise that strives to be an elite team (which they all should) is striving to be an outlier. Without understanding what makes an outlier, how can you expect to become one? And you'll only understand what makes an outlier by studying outliers separately from the underlying typical trend. This needs to be done with caution and care so as not to simply reinforce preconceived beliefs, but if you don't do outlier analysis, you aren't fully understanding what is happening.