It is definitely more difficult to find defensive relationships; I have found this in nearly everything I have researched. I am not sure exactly why, but I suspect a big part of it is, as you suggest, that offense is more within your own control. I do think that some players are capable of suppressing shooting percentage against (boosting save percentage), but far fewer players can do that than can boost offensive shooting percentage.

I also did some research that seemed to indicate that coaches are actually quite poor at identifying defensive players so maybe that is a part of it too.

]]>Please check my comment in his later post from the 20th.

]]>Interestingly, despite my variations in methodology I got similar results: the R^2 for the CF correlation was .2966, for FF it was .2698, and for SCF it was .3262, all with a solid negative relationship between production rates for and the relevant Sh%. More precisely, I think it speaks to the idea that teams that pile on the shots indiscriminately are doing so at lower levels of quality, be it a function of being behind more often than even score adjustment can account for, of systems/coaching, or simply of poorer decision making/play by the team’s players. Here are some links for the pics of these three scatter plots:

CF: http://pbrd.co/1J9a5QL

FF: http://pbrd.co/1J9aylZ

SCF: http://pbrd.co/1J9aEtL
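
For anyone wanting to reproduce this kind of check, here is a minimal Python sketch of how an R^2 and the direction of the relationship can be computed for one pair of team-level columns. The function name and the five sample values are made up purely for illustration; they are not the data behind the charts above.

```python
import math

def r_squared_and_slope(xs, ys):
    """Return (R^2, slope) for a simple least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    r = sxy / math.sqrt(sxx * syy)
    # R^2 alone hides the direction, so the slope is returned alongside it.
    return r * r, sxy / sxx

# Hypothetical illustration only: attempts for per 60 and the matching Sh% for five teams.
cf60 = [50.0, 52.0, 54.0, 56.0, 58.0]
csh = [4.6, 4.4, 4.5, 4.1, 4.0]
r2, slope = r_squared_and_slope(cf60, csh)
print(f"R^2 = {r2:.4f}, slope = {slope:.4f}")
```

A negative slope together with a sizeable R^2 is what "a solid negative relationship" means here; an R^2 near zero says the rate tells you almost nothing about the percentage, whichever way the line tilts.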

I went one step further with this and examined the other end of the table: for this same selection of teams, how do their CA, FA, and SCA rates compare to the relevant Sh%? If there is indeed a persistent and inverse relationship between shot quantity and shot quality, then this should be observable at both ends of the ice. We would expect teams giving up greater numbers of shot attempts to also see a lower % of those attempts go in (this would be the defensive argument of shot quality, that “we allow more shots but they’re outside and lower%”, also known as the “Toronto Leafs strategy”). However, contrary to this expectation, the effect goes away completely. For CA, FA, and SCA rates there ends up being essentially no relationship between the rates against and the % of them that go in. The R^2s are .0634 for CA, .0599 for FA, and .0697 for SCA. Why would this be? My best guess is that teams have the most control over their own behavior and results, in terms of both the quantity and the quality of the offense they generate, but have little impact on the quality of offense directed against them and can only distinctly impact its quantity. I think this would be an important conclusion, and supportive of the statements made by @RegressedPDO about shot suppression being extremely important. Here are the 3 charts for the ‘against’ relationships:

CA: http://pbrd.co/1J9fNSR

FA: http://pbrd.co/1J9fOWP

SCA: http://pbrd.co/1J9fRlB

Also, just to try it out, I rejiggered the against analysis so that the teams excluded from the top and bottom were the best and worst in GA/60, rather than keeping the same teams from the GF/60 sort. While I don’t think this approach is correct, since it very much skews the sample to fit the result you want to achieve, I was curious (perhaps I should’ve just done both based on GF%, but I’m not sure that would change the results much). Even with this adjustment, the results don’t change much: the R^2 for CA is .0786 and for FA is .122, with SCA being the only significant change, moving to .3412 (oddly enough the strongest relationship out of all 9 metrics examined). This last point is very interesting, particularly in how it diverges from the Corsi and Fenwick measures. I would have to think more about why they could be so different.

]]>I am in general a strong believer in examining underlying rates, as I believe they contain extra and important information beyond simply percentage shares (an opinion shared with @RegressedPDO, @IneffectiveMath, and @acthomasca). It is very different if a team has a CF% of 55% with CF/60 of 55 and CA/60 of 45, as opposed to one with the same CF% but rates of CF/60 22 and CA/60 18. I think it would be much more reasonable to expect the first team to have their CF% negatively correlate with Sh% because of their high shot attempt generation, whereas the second team has the same CF% due primarily to shot attempt suppression and would not necessarily be expected to have a negative correlation with Sh%.
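
To make the arithmetic in that example concrete, here is a small Python sketch. The function name is just an illustrative choice; the formula is the usual share definition, CF / (CF + CA).

```python
def corsi_for_pct(cf60, ca60):
    """CF% as a share of all shot attempts: CF / (CF + CA), in percent."""
    return 100.0 * cf60 / (cf60 + ca60)

high_event = corsi_for_pct(55, 45)  # the team generating lots of attempts
low_event = corsi_for_pct(22, 18)   # the team suppressing attempts both ways

print(high_event, low_event)  # both come out at 55%
print(55 + 45, 22 + 18)       # total attempts per 60: 100 vs 40
```

The percentage share is identical, but the first team's games contain two and a half times as many shot attempts, which is exactly the information the share throws away.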

As to your last note of items for further investigation, I’d also like to see the impact on a team like the Capitals after Boudreau was fired and they brought Hunter on. Keep up the good work!

]]>Using your initial premise, 5-on-5 Close from the previous 3 seasons comparing CF% to CSh%, if we instead remove the top and bottom 4 GF60 teams the correlation improves to .4854. We even see an improvement when excluding the top and bottom 5 (I like 5, because roughly two-thirds of the league should fall within 1 standard deviation of the midpoint, so the middle 20 of 30 teams can all be viewed as “average”): using GF% we get a .5038, while using GF60 we get a .5527. However, we don’t get the same massive improvement you did when we dropped the top and bottom 6, so obviously something about GF% was better for that final entry.

Also on the same line of thought, I wondered why we were looking at CF% rather than Corsi For. So I looked at raw CF vs CSh% and for all 30 teams our correlation jumped to .1437, while eliminating the top and bottom 4 GF60 teams improved to a .5930 (it is a .4721 if you sort by GF% rather than GF60). Taken a step further and removing the top and bottom 5, leaving us with the 20 average teams within one standard deviation of the midpoint, was a massive .8247 (.4678 when sorting by GF%) and eliminating top/bottom 6 gives us a .8566 (.6151 using GF%).
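
The trimming procedure described here can be sketched in Python roughly as follows. This is only my reading of the steps (sort by GF60 or GF%, drop the top and bottom N, correlate what is left); the function names, dictionary keys, and the seven sample teams are hypothetical and are not the real league data. The comment does not say whether the quoted figures are r or R^2, so the sketch returns plain Pearson r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def trimmed_r(teams, n_trim, sort_key="gf60", x="cf", y="csh"):
    """Drop the top and bottom n_trim teams by sort_key, then correlate x with y."""
    ranked = sorted(teams, key=lambda t: t[sort_key])
    kept = ranked[n_trim:len(ranked) - n_trim]
    if len(kept) < 3:
        raise ValueError("too few teams left after trimming")
    return pearson_r([t[x] for t in kept], [t[y] for t in kept])

# Hypothetical teams for illustration only (not real NHL numbers).
teams = [
    {"team": "A", "gf60": 1.9, "cf": 4300, "csh": 5.0},
    {"team": "B", "gf60": 2.1, "cf": 4000, "csh": 4.6},
    {"team": "C", "gf60": 2.2, "cf": 4100, "csh": 4.5},
    {"team": "D", "gf60": 2.3, "cf": 4200, "csh": 4.4},
    {"team": "E", "gf60": 2.4, "cf": 4300, "csh": 4.3},
    {"team": "F", "gf60": 2.5, "cf": 4400, "csh": 4.2},
    {"team": "G", "gf60": 2.9, "cf": 4500, "csh": 5.2},
]

r_all = trimmed_r(teams, 0)      # every team included
r_trimmed = trimmed_r(teams, 1)  # best and worst GF60 team removed
print(f"all teams: r = {r_all:.4f}; trimmed: r = {r_trimmed:.4f}")
```

Passing `sort_key="gf_pct"` (with that key added to each team) would give the GF% variant. As noted elsewhere in this thread, picking which teams to drop based on an outcome measure can skew the sample toward the result, so the trimmed figure is best read next to the untrimmed one.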

So all in all it is a pretty clear indication that, for the average team, more shot attempts do indeed mean a lower Sh% (and conversely one can assume this also suggests that more shot attempts against tend to mean increased Sv%, since Sv% is simply 1 minus the opponent’s Sh%).

And lastly, I suppose I was curious to see how that worked when comparing Sh% rather than CSh% (since that was one of the comments on twitter). So sorting by GF60 and removing the top/bottom 4 teams we still have a decent .4186 correlation between raw Corsi and Sh% (.6717 when eliminating top/bottom 5 and .7132 when eliminating top/bottom 6). As you pointed out, it’s not quite as high, but it still follows the same trend and suggests there is an inverse relationship between the number of shot attempts and the team’s Sh%.

]]>There were more 5-on-5 goals per 60 from 2008-11 (2.31/2.34/2.29) than there were from 2011-14 (2.25/2.24/2.25). There were also more 5-on-5 shots per 60, although not quite by as wide a margin, from 2008-11 (29.2/29.3/29.5) than there were from 2011-14 (28.9/28.5/29.1). Conversely Sv% was lower (and as such Sh% was higher) from 2007-10 (.920/.921/.920) than it was from 2010-14 (.922/.922/.921/.923).

So when you are basing your results on comparing a period in which the league average scoring was higher (2007-11) to one in which the numbers were down on the whole (2011-14) then it stands to reason you are going to see a decline, which he attributes simply to being a “regression to the mean.” It’s not a huge difference (especially in the percentages themselves), but it could very well be enough to influence the results.

Your method of instead looking at even/odd allows a balance so that each data set has some information including the high scoring years as well as some information from the low years.

Personally I try to stick to just the past 3-4 seasons, in which league-wide numbers have remained more or less steady (at the time I researched it the 2014-15 numbers were in line with recent years as well), but then that does severely limit your sample size, so it probably wouldn’t be helpful to a project like this.

I’m not overly surprised to see that you found a point in which the signal and noise were at odds, as I remember reading somewhere (it may even have been from you) that after you hit 500-750 minutes you have enough data to offset the effect of noise on your results.

Anyway, thanks once again for providing us with an entertaining and informative look at the data.

]]>As you say, we will need to find new measures that indicate an edge.

]]>This approach has been adapted from bburke’s Advanced NFL Stats website. The best teams win at most 65% of games, which is also where the best theoretical predictive models top out. I will provide links. We know that randomness is split equally since it is not skill based. Therefore, to get to around a 63% win rate, the best skilled team must have ~30% skill and 1/2 of the luck component: 70/2 = 35, and 30 + 35 = 65%.
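
The arithmetic above can be written out as a short Python sketch. It assumes my reading of the argument: the best team captures the whole skill share and luck is split evenly between the two teams. The function names are hypothetical.

```python
def max_win_pct(skill_share):
    """Win% ceiling if the best team takes all of the skill share and half of the luck."""
    luck_share = 100.0 - skill_share
    return skill_share + luck_share / 2.0

def implied_skill_share(win_pct):
    """Invert the same relation: win% = 50 + skill/2, so skill = 2 * win% - 100."""
    return 2.0 * win_pct - 100.0

print(max_win_pct(30))          # 30 + 70/2 = 65.0
print(implied_skill_share(65))  # 30.0
print(implied_skill_share(63))  # 26.0
```

Under that reading, a 65% ceiling corresponds to a 30% skill share and a 63% ceiling to 26%, so the two figures quoted are in the same neighbourhood.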

This also passes the eye test. I counted ~300 goals in games last year and more than 2 out of 3 were primarily the result of luck: bad bounces, etc. I have made this point before and no one has refuted it. Perhaps someone will this time? So Corsi’s value is fighting for the 35%. I like Alan Ryder’s old breakdown: goaltending 15%, defense 40%, offense 45%. If we divide by 3, we get goaltending at 5% skill (about the observed skill difference we see, btw), offense at 15% skill, and defense at ~13%. So the next question is how much of offensive skill is Corsi and how much of defensive skill is from Corsi. Not sure about this answer yet, as separating defensive skill is very tough. Dan ]]>