Tyler Dellow has an interesting post on differences between the Kings' and Leafs' offensive production. He comes at the problem from a slightly different angle than I have explored in my rush shot series, so definitely go give it a read. These two paragraphs discuss an interesting theory of Dellow's.

That’s the sort of thing that can affect a team’s shooting percentage. To take it to an extreme, teams shot 6.2% in the ten seconds after an OZ faceoff win this year; the league average shooting percentage at 5v5 is more like 8%. Of course, when you win an offensive zone draw, you start with the puck but the other team has five guys back and in front of you.

I wonder whether there isn’t something like that going on here that explains LA’s persistent struggles with shooting percentage (as well as those of New Jersey, another team that piles up Corsi but can’t score – solving this problem is one of the burning questions in hockey analytics at the moment). It’s a theory, but one that seems to fit with what Eric’s suggested about how LA generates the bulk of their extra shots. It’s hard for me to explain the Leafs scoring so many more goals in the first 11 seconds after a puck has been carried in, particularly given that I suspect that LA, by virtue of their possession edge, probably enjoyed many more carries into the offensive zone overall.

Earlier today I posted some team rush statistics for the past 7 and past 3 seasons. Let's look in a little more detail at how the Leafs, Kings and Devils performed over the past 3 seasons.

Team         RushGF  RushSF  OtherGF  OtherSF  RushSh%  OtherSh%  Rush%
New Jersey       45     540      103     1675    8.33%     6.15%  24.4%
Toronto          66     523      128     1675   12.62%     7.64%  23.8%
Los Angeles      53     609      112     1978    8.70%     5.66%  23.5%

The Leafs scored the most rush goals despite taking the fewest rush shots, thanks to a vastly better rush shooting percentage (nearly 50% higher than the Devils' and Kings'). They do not generate more shots on the rush, but they do seem to generate higher-quality ones.

The Kings generate by far the most non-rush shots but have the poorest shooting percentage on them, so they do not score a ton of non-rush goals. The Devils don't generate many non-rush shots and don't have a great non-rush shooting percentage either, so they posted the fewest non-rush goals. The Leafs took the same number of non-rush shots as the Devils but with a significantly higher shooting percentage, and thus scored significantly more non-rush goals.

The Leafs scored 34% of their goals on the rush compared to 32% for the Kings and 30% for the Devils.
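The derived columns in the table above are straightforward to reproduce; here is a quick Python sketch using the raw counts from the table:

```python
# Raw 5v5 counts from the table above: (RushGF, RushSF, OtherGF, OtherSF)
teams = {
    "New Jersey":  (45, 540, 103, 1675),
    "Toronto":     (66, 523, 128, 1675),
    "Los Angeles": (53, 609, 112, 1978),
}

for team, (rush_gf, rush_sf, other_gf, other_sf) in teams.items():
    rush_sh = rush_gf / rush_sf                         # RushSh%
    other_sh = other_gf / other_sf                      # OtherSh%
    rush_pct = rush_sf / (rush_sf + other_sf)           # Rush% (share of shots off the rush)
    rush_goal_share = rush_gf / (rush_gf + other_gf)    # share of goals off the rush
    print(f"{team}: RushSh% {rush_sh:.2%}, OtherSh% {other_sh:.2%}, "
          f"Rush% {rush_pct:.1%}, rush goal share {rush_goal_share:.0%}")
```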

Are the Leafs a good rush team? Well, only Boston has scored more 5v5 road rush goals than the Leafs, so probably yes, but it is mostly because of finishing talent, not shot-generation talent: they rank fourth last in 5v5 road rush shots.

The Ducks' offense looks very similar to the Leafs': they don't get many rush shots but post a really high rush shooting percentage. Anaheim generates a few more non-rush shots than Toronto, but overall the two offenses are much alike.

The Kings are a slightly better rush team than the Devils, but neither is good, and both are weak shooting percentage teams whether the shot comes off the rush or not. The Kings make up for this by generating a lot of shots from offensive zone play, whereas the Devils don't.

In his book Hockey Abstract, Rob Vollman discusses persistence and its importance in determining whether a particular statistic has value in hockey analytics.

For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for calling something a talent or a skill, and how closely it correlates with winning or some other positive outcome (such as scoring goals) tells us how much value that skill has.

Let's look at persistence first. The easiest way to measure it is to look at the correlation of a statistic over some chunk of time with the same statistic over some future chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year-over-year correlation)? For some statistics, such as shooting percentages, it may be necessary to use even larger samples, such as a 3-year shooting percentage vs. the following 3-year shooting percentage.

One mistake many people make is to conclude that a lack of correlation, and thus a lack of persistence, means the statistic is not a repeatable skill and is therefore essentially random. The thing is, the method we use to measure persistence is a major factor in how much persistence, and how much true randomness, we can actually detect. Let's take two methods for measuring persistence:

1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.
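As a sketch of how these two splits work, here is a minimal Python version. The per-second event tuples are illustrative (the real data would come from play-by-play parsing), and the Pearson correlation is written out in plain Python for clarity:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def even_odd_split(events):
    """Split a player's event log into even- and odd-second halves.

    `events` is a hypothetical list of (second_of_game, goals, fenwick)
    tuples; the tuple layout is an assumption for illustration."""
    even = [e for e in events if e[0] % 2 == 0]
    odd = [e for e in events if e[0] % 2 == 1]
    return even, odd
```

Correlating a rate stat computed on the even half against the same stat on the odd half gives the "Even vs Odd" column below; correlating the 2007-10 value against the 2010-13 value gives the other column.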

Both methods split the data roughly in half, so we are doing a half-vs-half comparison. I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half, using 6 years of data so we get large samples for the shooting percentage calculations. Here are the correlations we get.

Comparison     0710 vs 1013  Even vs Odd  Difference
GF20 vs GF20           0.61         0.89        0.28
FF20 vs FF20           0.62         0.97        0.35
FSh% vs FSh%           0.51         0.73        0.22

GF20 is goals for per 20 minutes of ice time. FF20 is Fenwick for (shots + missed shots) per 20 minutes of ice time. FSh% is Fenwick shooting percentage, or goals divided by Fenwick.

We can see that the level of persistence we identify is much greater in the even vs odd correlation than in the 3-year vs 3-year correlation. A different test of persistence gives us significantly different results. The reason is that many more outside factors come into play in a 3-year vs 3-year comparison than in an even vs odd comparison. In the even vs odd correlations, factors such as quality of teammates, quality of competition, zone starts, coaching tactics, etc. are non-factors because they are almost exactly the same in the even seconds as in the odd seconds. That is not true for the 3-year vs 3-year correlation. The difference between the two methods is roughly the amount of correlation that can be attributed to those other factors. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.
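The decomposition described above is simple arithmetic; using the GF20 numbers from the table:

```python
# Correlations for GF20 taken from the table above
r_three_year = 0.61   # 2007-10 vs 2010-13
r_even_odd = 0.89     # even vs odd split

# Portion of the correlation attributable to changes in QoT, QoC,
# zone starts, coaching tactics, etc. between the two 3-year chunks
other_factors = r_even_odd - r_three_year

# Portion that no split can recover: true randomness
true_randomness = 1.00 - r_even_odd

print(round(other_factors, 2), round(true_randomness, 2))  # 0.28 0.11
```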

Now, let's look at how well they correlate with a positive outcome: scoring goals. Instead of looking at that in isolation, let's combine it with persistence by looking at how well each statistic predicts 'other half' goal scoring.

Comparison     0710 vs 1013  Even vs Odd  Difference
FF20 vs GF20           0.54         0.86        0.33
GF20 vs FF20           0.44         0.86        0.42
FSh% vs GF20           0.48         0.76        0.28
GF20 vs FSh%           0.57         0.77        0.20

As you can see, both FF20 and FSh% correlate highly with GF20, and this is far more evident in the even vs odd split than in the 3-year vs 3-year correlations. FF20 is somewhat more predictive of 'other half' GF20, but not significantly so, and that edge is likely due solely to the greater randomness of FSh% (a sample size constraint), since FSh% correlates more strongly with GF20 than FF20 does: the correlation between even FF20 and even GF20 is 0.75, while the correlation between even FSh% and even GF20 is 0.90.

What is also interesting to note is that the even vs odd split provides a greater boost for FF20 than for FSh%. This tells us that the skills related to FF20 are less persistent over calendar time than the skills related to FSh%. I have seen this before. I think it means GMs value shooting-percentage players more than Fenwick players, and are thus more likely to maintain a core of shooting-percentage players on their team while letting Fenwick players walk. Eric T. found that teams reward players more for high shooting percentage than for high Corsi, which is likely why we see this.

Now, let’s take a look at how well FF20 correlates with FSh%.

Comparison     0710 vs 1013  Even vs Odd  Difference
FF20 vs FSh%           0.38         0.66        0.28
FSh% vs FF20           0.22         0.63        0.42

It is interesting to note that Fenwick rates are highly correlated with shooting percentages, especially in the even vs odd data. This tells us that the skills a player needs to generate a lot of scoring chances largely overlap with the skills required to generate high-quality scoring chances. Skills like good passing, puck control and quickness can lead to better puck possession and thus more shots, but those same skills can also result in scoring at a higher rate on those chances. We know this isn't true for all players (see Scott Gomez), but generally speaking, players who are good at controlling the puck are good at putting the puck in the net too.

Finally, let's look at one more set of correlations. In the above correlations for players with >1000 minutes in each 'half' of the data, there are a lot of players with significantly more than 1000 minutes, and thus their stats are more reliable. In any given year a top-line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12) but generally fewer than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11). So, I took all the players with more than 1000 even and more than 1000 odd minutes over the past 6 seasons, but only those with fewer than 2600 minutes in total. In essence, I took all the players with between 1000 and 1300 even and odd minutes over the past 6 seasons. From this group of forwards I calculated the same correlations as above, and the results should tell us approximately how reliable (predictive) one season's worth of data is for a front-line forward, assuming he played in exactly the same situation the following season.

Comparison     Even vs Odd
GF20 vs GF20          0.82
FF20 vs FF20          0.93
FSh% vs FSh%          0.63
FF20 vs GF20          0.74
GF20 vs FF20          0.77
FSh% vs GF20          0.65
GF20 vs FSh%          0.66
FF20 vs FSh%          0.45
FSh% vs FF20          0.40

It should be noted that because of the way I selected the players for this calculation (limited ice time over the past 6 seasons), there is an abundance of third-liners, with a few players who reached retirement (e.g. Sundin) and some young players (e.g. Henrique, Landeskog) mixed in. It would have been better to take each player's first 2600 minutes and do the even/odd split on that, but I am too lazy to calculate that data, so the above is the best we have. There is far less diversity in this list of players than in the NHL in general, so for any particular player with between 1000 and 1300 minutes of ice time, the correlations are likely stronger.
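The selection described above amounts to a simple filter; here is a hypothetical sketch (the data layout, a map of player name to even/odd minutes, is an assumption for illustration):

```python
def one_season_sample(players, half_min=1000, total_max=2600):
    """Select forwards whose even- and odd-second halves each have 1000+
    minutes but whose combined 5v5 time stays under 2600 minutes,
    approximating one season's worth of data for a top-line forward.

    `players` maps name -> (even_minutes, odd_minutes)."""
    return {
        name: (even, odd)
        for name, (even, odd) in players.items()
        if even >= half_min and odd >= half_min and even + odd <= total_max
    }
```

A third-liner heavy sample falls out naturally: full-time top-six forwards over six seasons blow past the 2600-minute cap and are excluded.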

So, what does the above tell us? Once you factor out year-over-year changes in QoT, QoC, zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all quite persistent with just one year's worth of data for a top-line player. I think this is far more persistence, especially for FSh%, than most assume. The challenge is isolating and properly accounting for changes in QoT, QoC, zone starts, coaching tactics, etc., and this, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contributions and adjusting for QoT, QoC, usage, and so on. Whether that comes from better statistics, better analytical techniques, or some combination of the two, only time will tell, but in theory at least there should be a lot more reliable information within a single year's worth of data than we are currently able to make use of.

On Monday I outlined an all-encompassing player evaluation model that allows us to evaluate every forward, defenseman and goalie under the same methodology. In short, the system compares how many goals are scored for and against while a player is on the ice to how many goals for/against one should expect based on the quality of his linemates and opposition. That model, I believe, makes a reasonable attempt at evaluating a player's performance, but it can be improved.

The first method of improvement is to utilize the additional information we have about the quality of a player's linemates and opposition once we have run the model. Initially I use the goals for and against performance of his linemates and opposition when the player being evaluated is not on the ice at the same time as them. But once we have run the model we have, at least theoretically, a better understanding of the quality of his teammates and opposition. I can then take the output of the first model run and use it as the input to a second run to get new and better results. I can continue doing this iteratively, and the good news is that after every iteration the difference between that iteration's player ratings and the previous iteration's trends toward zero, which is a very nice result.
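As a toy illustration of this kind of iterative refinement (a simplified additive stand-in, not the actual model), each pass re-estimates a player's rating as his observed on-ice rate minus what his linemates' current ratings already explain, with damping for numerical stability. All names and rates below are invented:

```python
def iterate_ratings(observed, mates, alpha=0.5, tol=1e-9, max_iter=200):
    """Toy sketch of iterative rating refinement.

    observed[p] - p's on-ice goal rate above/below league average
    mates[p]    - p's regular linemates
    Each pass feeds the previous pass's ratings back in as the baseline;
    alpha damps the update so the iteration settles down."""
    ratings = {p: 0.0 for p in observed}        # start everyone at average
    delta = float("inf")
    for _ in range(max_iter):
        new = {}
        for p in observed:
            explained = sum(ratings[m] for m in mates[p]) / len(mates[p])
            target = observed[p] - explained    # what p adds beyond his mates
            new[p] = (1 - alpha) * ratings[p] + alpha * target
        delta = max(abs(new[p] - ratings[p]) for p in observed)
        ratings = new
        if delta < tol:                         # iteration-to-iteration change trends to zero
            break
    return ratings, delta
```

On a small closed roster the iteration converges to a fixed point where each rating is exactly the observed rate minus the average of the linemates' ratings, which is the self-consistency the post describes.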