# Measuring persistence, randomness, and true talent

In his book Hockey Abstract, Rob Vollman discusses persistence and why it matters when deciding whether a particular statistic has value in hockey analytics:

> For something to qualify as the key to winning, two things are required: (1) a close statistical correlation with winning percentage and (2) statistical persistence from one season to another.

More generally, persistence is a prerequisite for calling something a talent or a skill. How closely it correlates with winning, or with some other positive outcome such as scoring goals, tells us how much value that skill has.

Let’s look at persistence first. The easiest way to measure persistence is to correlate a statistic over one chunk of time with the same statistic over a later chunk of time. For example, how well does a stat from last season correlate with the same stat this season (i.e. year over year correlation)? For some statistics, such as shooting percentage, it may be necessary to use even larger samples, such as 3 year shooting percentage vs the following 3 year shooting percentage.
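The year over year check described above is a Pearson correlation across players. It pairs each player’s value in one period with his value in the later period. Below is a minimal sketch of that calculation. The helper name `pearson_r` and the five GF20 values are hypothetical, made up for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of player stats."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical GF20 for the same five players in consecutive seasons.
last_season = [0.95, 0.70, 0.82, 0.60, 1.10]
this_season = [0.90, 0.75, 0.78, 0.65, 1.02]
print(round(pearson_r(last_season, this_season), 2))
```

A value near 1.0 means players keep their relative ranking from one period to the next. A value near 0 means last period tells us little about this one.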

One mistake many people make when doing this is to conclude that a lack of correlation, and thus a lack of persistence, means the statistic is not a repeatable skill and is therefore essentially random. The thing is, the method we use to measure persistence can have a major effect on how much persistence we find, and therefore on how well we can identify true randomness. Let’s take two methods for measuring persistence:

1.  Three year vs three year correlation, or more precisely the correlation between 2007-10 and 2010-13.
2.  Even vs odd seconds over the course of 6 seasons, or the statistic during every even second vs the statistic during every odd second.
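The even vs odd split in method 2 can be sketched as follows. Each second of a player’s ice time, and each on-ice event, is assigned to a half by the parity of the elapsed game second at which it occurred.

This is a minimal sketch under a hypothetical data model, not the author’s actual pipeline:

- shifts are `(start, end)` pairs of elapsed seconds, with `end` exclusive;
- events are already filtered to the player’s on-ice events;
- the `Event` fields are invented for illustration.

```python
from dataclasses import dataclass

PARITY = ("even", "odd")


@dataclass
class Event:
    second: int       # elapsed game second of the event (hypothetical field)
    is_goal: bool     # goal for the player's team
    is_fenwick: bool  # unblocked shot attempt: shot on goal, missed shot or goal


def even_odd_halves(shifts, events):
    """Split a player's on-ice seconds and events into even and odd halves."""
    halves = {p: {"seconds": 0, "gf": 0, "ff": 0} for p in PARITY}
    for start, end in shifts:
        for s in range(start, end):          # each on-ice second goes to one half
            halves[PARITY[s % 2]]["seconds"] += 1
    for ev in events:
        half = halves[PARITY[ev.second % 2]]  # event goes to the half of its second
        half["gf"] += int(ev.is_goal)
        half["ff"] += int(ev.is_fenwick)
    return halves


shifts = [(0, 45), (120, 170)]
events = [Event(10, False, True), Event(33, True, True), Event(150, False, True)]
halves = even_odd_halves(shifts, events)
print(halves)
```

Because the two halves interleave second by second, linemates, opponents, zone starts and coaching are effectively identical in both halves.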

Both methods split the data roughly in half, so in each case we are comparing one half of the data with the other half. I am going to do this for offensive statistics for forwards with at least 1000 minutes of 5v5 ice time in each half. I am using 6 years of data so that we get large sample sizes for the shooting percentage calculations. Here are the correlations we get:

| Comparison | 2007-10 vs 2010-13 | Even vs Odd | Difference |
|---|---|---|---|
| GF20 vs GF20 | 0.61 | 0.89 | 0.28 |
| FF20 vs FF20 | 0.62 | 0.97 | 0.35 |
| FSh% vs FSh% | 0.51 | 0.73 | 0.22 |

GF20 is goals for per 20 minutes of ice time. FF20 is Fenwick for (shots on goal + missed shots, i.e. unblocked shot attempts) per 20 minutes of ice time. FSh% is Fenwick shooting percentage, or goals for / Fenwick for.
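The three definitions above translate directly into arithmetic. Below is a minimal sketch. The function names and the counts (40 goals for, 500 Fenwick events, 1000 minutes) are hypothetical.

```python
def per_20(count, toi_minutes):
    """Events per 20 minutes of ice time (used for GF20 and FF20)."""
    return 20.0 * count / toi_minutes


def fenwick_sh_pct(goals_for, fenwick_for):
    """Fenwick shooting percentage: goals for / unblocked shot attempts, in %."""
    return 100.0 * goals_for / fenwick_for


# Hypothetical half-sample for one forward.
gf, ff, toi = 40, 500, 1000.0
gf20 = per_20(gf, toi)
ff20 = per_20(ff, toi)
fsh = fenwick_sh_pct(gf, ff)
print(gf20, ff20, fsh)  # prints 0.8 10.0 8.0
```

Note that FSh% is simply GF20 / FF20 × 100, because the ice time cancels. GF20 is therefore the product of the other two, so the three statistics are not independent of one another.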

We can see that the level of persistence we identify is much greater when looking at the even vs odd correlation than when looking at the 3 year vs 3 year correlation. A different test of persistence gives us significantly different results. The reason is that far more factors come into play in 3 year vs 3 year correlations than in even vs odd correlations. In the even vs odd correlations, factors such as quality of teammates, quality of competition, zone starts and coaching tactics are non-factors, because they should be almost exactly the same in the even seconds as in the odd seconds. This is not true for the 3 year vs 3 year correlation. The difference between the two methods is roughly the part of the correlation that those other factors obscure. True randomness, and thus true lack of persistence, is essentially the difference between 1.00 and the even vs odd correlation. This equates to 0.11 for GF20, 0.03 for FF20 and 0.27 for FSh%.
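The paragraph above splits each statistic’s shortfall from perfect persistence into two parts:

- a context part: the even vs odd correlation minus the 3 year vs 3 year correlation;
- a randomness part: 1.00 minus the even vs odd correlation.

The sketch below applies that rule of thumb to the figures in the first table. It follows the author’s heuristic reading of the numbers and is not a formal variance decomposition. The helper name `decompose` is hypothetical.

```python
def decompose(r_three_year, r_even_odd):
    """Split the gap from perfect persistence using the article's heuristic."""
    return {
        # Persistence hidden by changes in QoT, QoC, zone starts, tactics, etc.
        "context": r_even_odd - r_three_year,
        # What remains unexplained even with context held constant.
        "randomness": 1.0 - r_even_odd,
    }


# (3 year vs 3 year, even vs odd) correlations from the table above.
table = {"GF20": (0.61, 0.89), "FF20": (0.62, 0.97), "FSh%": (0.51, 0.73)}
for stat, (r3, reo) in table.items():
    parts = decompose(r3, reo)
    print(f"{stat}: context {parts['context']:.2f}, randomness {parts['randomness']:.2f}")
# GF20: context 0.28, randomness 0.11
# FF20: context 0.35, randomness 0.03
# FSh%: context 0.22, randomness 0.27
```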

Now, let’s look at how well these statistics correlate with a positive outcome: scoring goals. But instead of looking at that alone, let’s combine it with persistence by looking at how well each statistic in one half predicts ‘other half’ goal scoring.

| Comparison | 2007-10 vs 2010-13 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs GF20 | 0.54 | 0.86 | 0.33 |
| GF20 vs FF20 | 0.44 | 0.86 | 0.42 |
| FSh% vs GF20 | 0.48 | 0.76 | 0.28 |
| GF20 vs FSh% | 0.57 | 0.77 | 0.20 |

As you can see, both FF20 and FSh% are very highly correlated with GF20, but this is far more evident in the even vs odd correlations than in the 3 year vs 3 year correlations. FF20 is more predictive of ‘other half’ GF20 than FSh% is, though not significantly so. This is likely due to the greater randomness of FSh% (a result of sample size constraints), since within the same half FSh% is more correlated with GF20 than FF20 is. The correlation between even FF20 and even GF20 is 0.75, while the correlation between even FSh% and even GF20 is 0.90.

What is also interesting is that the even vs odd split improves the measured persistence of FF20 more than that of FSh% (a difference of 0.35 vs 0.22). What this tells us is that the skills related to FF20 are not as persistent over time as the skills related to FSh%. I have seen this before. I think it means that GMs value shooting percentage players more than Fenwick players. They are therefore more likely to keep a core of shooting percentage players on their team while letting Fenwick players walk. Eric T. found that teams reward players for a high shooting percentage more than for a high Corsi, so this is likely the reason we are seeing this.

Now, let’s take a look at how well FF20 correlates with FSh%.

| Comparison | 2007-10 vs 2010-13 | Even vs Odd | Difference |
|---|---|---|---|
| FF20 vs FSh% | 0.38 | 0.66 | 0.28 |
| FSh% vs FF20 | 0.22 | 0.63 | 0.42 |

It is interesting to note that Fenwick rates are highly correlated with shooting percentages, especially in the even vs odd data. What this tells us is that the skills a player needs to generate a lot of scoring chances are similar to the skills required to generate high quality scoring chances. Skills like good passing, puck control and quickness can lead to better puck possession and thus more shots. Those same skills can also result in scoring at a higher rate on those chances. We know this isn’t true for all players (see Scott Gomez), but generally speaking, players who are good at controlling the puck are good at putting the puck in the net too.

Finally, let’s look at one more set of correlations. The correlations above include all players with more than 1000 minutes in each ‘half’ of the data, and many of those players have significantly more than 1000 minutes, so their stats are more reliable. In any given year a top line forward will get 1000+ minutes of 5v5 ice time (there were 125 such players in 2011-12), but generally fewer than 1300 minutes (only 5 players had more than 1300 minutes in 2010-11).

So I took all the players who had more than 1000 even minutes and more than 1000 odd minutes over the past 6 seasons, but only those with fewer than 2600 minutes in total. In essence, I took all the players with between 1000 and 1300 even and odd minutes over the past 6 seasons, so each half is roughly one season’s worth of ice time for a top line forward. From this group of forwards I calculated the same correlations as above. The results should tell us approximately how reliable (predictive) one season’s worth of data is for a top line forward, assuming he played in exactly the same situation the following season.
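The player selection described above is a simple filter on the two halves’ ice time. Below is a minimal sketch with the thresholds from the text. The dictionary keys and the three players are hypothetical.

```python
def one_season_sample(players, min_half=1000.0, max_total=2600.0):
    """Keep forwards with more than min_half minutes in both the even and odd
    halves but fewer than max_total minutes overall."""
    return [
        p for p in players
        if p["even_min"] > min_half
        and p["odd_min"] > min_half
        and p["even_min"] + p["odd_min"] < max_total
    ]


players = [
    {"name": "A", "even_min": 1100.0, "odd_min": 1150.0},  # kept
    {"name": "B", "even_min": 1500.0, "odd_min": 1400.0},  # too much total ice time
    {"name": "C", "even_min": 900.0, "odd_min": 1200.0},   # even half too small
]
selected = [p["name"] for p in one_season_sample(players)]
print(selected)  # prints ['A']
```

Strictly, these bounds allow either half to run anywhere between 1000 and 1600 minutes (2600 − 1000). That is why the 1000 to 1300 range in the text is an approximation.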

| Comparison | Even vs Odd |
|---|---|
| GF20 vs GF20 | 0.82 |
| FF20 vs FF20 | 0.93 |
| FSh% vs FSh% | 0.63 |
| FF20 vs GF20 | 0.74 |
| GF20 vs FF20 | 0.77 |
| FSh% vs GF20 | 0.65 |
| GF20 vs FSh% | 0.66 |
| FF20 vs FSh% | 0.45 |
| FSh% vs FF20 | 0.40 |

It should be noted that the way I selected the players (limited ice time over the past 6 seasons) means this calculation includes an abundance of 3rd liners. A few players who reached retirement (e.g. Sundin) and young players (e.g. Henrique, Landeskog) are mixed in. It would have been better to take the first 2600 minutes of each player’s data and do the even/odd split on that, but I am too lazy to try to calculate that data, so the above is the best we have. The list of players used is far less diverse than the NHL in general, and a narrower range of players tends to produce weaker correlations. It is therefore likely that the correlations are stronger for players in general with between 1000 and 1300 minutes of ice time.
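The last point is the statistical effect known as restriction of range: when a sample is less varied in true talent, the split-half correlation drops even though measurement noise stays the same. Below is a toy simulation of that effect, not a model of the author’s data.

It rests on three assumptions:

- talent is standard normal;
- each half equals talent plus normal noise with standard deviation 0.5;
- the "less diverse" sample is modeled as players whose talent lies within ±0.5.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n_players=20000, noise_sd=0.5, band=0.5, seed=1):
    """Return (full-sample r, restricted-sample r) for simulated even/odd halves."""
    rng = random.Random(seed)
    talent = [rng.gauss(0.0, 1.0) for _ in range(n_players)]
    even = [t + rng.gauss(0.0, noise_sd) for t in talent]  # same talent, fresh noise
    odd = [t + rng.gauss(0.0, noise_sd) for t in talent]
    full = pearson_r(even, odd)
    keep = [i for i, t in enumerate(talent) if abs(t) < band]  # narrow talent band
    restricted = pearson_r([even[i] for i in keep], [odd[i] for i in keep])
    return full, restricted


full, restricted = simulate()
print(f"full sample r = {full:.2f}, restricted sample r = {restricted:.2f}")
```

With these assumptions the theoretical values are about 0.80 for the full sample and about 0.24 for the restricted one. The noise is identical in both cases; only the spread of talent differs.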

So, what does the above tell us? Once you factor out year over year changes in QoT (quality of teammates), QoC (quality of competition), zone starts, coaching tactics, etc., GF20, FF20 and FSh% are all fairly highly persistent with just one year’s worth of data for a top line player. I think this is far more persistent, especially for FSh%, than most people assume. The challenge is being able to isolate and properly account for changes in QoT, QoC, zone starts, coaching tactics, etc. This, in my opinion, is where the greatest challenge in hockey analytics lies. We need better methods for isolating individual contribution and adjusting for QoT, QoC, usage, etc. Whether that comes from better statistics, better analytical techniques or some combination of the two, only time will tell. In theory, at least, there should be a lot more reliable information within a single year’s worth of data than we are currently able to make use of.

1. Alien says:

Yeah you can’t just look at Corsi For and Against rates (even if you look at them in the context of the rates of their team mates when playing without them and their competition when playing against others). You have to look at advanced statistics in a holistic way. I hate how opponents of advanced stats cherry pick Corsi rates and then say “see? I told you advanced stats are a joke. Games are won on the ice, not on Excel spreadsheets. This ain’t moneyball! Watch teh gamez!!” It doesn’t help though when Corsi proponents tell us that shot quality doesn’t matter and that Scott Gomez is just having bad luck while Sidney Crosby is lucky. That’s what makes us look like a joke to people. I can’t count how many times I’ve been derided by Leafs fans for pulling up advanced stats to make a case for or against a player. Especially whenever I talk about Nazem Kadri’s unsustainable on-ice sh% and PDO last season (higher than Sidney Crosby’s career level and Martin St. Louis’ level in the last 3 seasons). I get branded a moneyball Corsi kook.

For a player of his tenure, Scott Gomez probably has among the worst finishing ability in the NHL. But as a playmaker, he is very, very underrated. Look at his IAP (the percentage of on-ice goals on which he recorded an assist). He has seen some regression in his IAP in 2012-13 with the Sharks, but his IGP and IPP were both higher last season. That is probably because he played with 3rd/4th liners for the Sharks who, like him, lack finishing ability, so he was forced to shoot at the net more this time. Scott Gomez ranks 43rd of 301 NHL forwards (1500+ 5v5 minutes) in IAP but dead last in IGP in that sample. When you combine goals and assists and look at IPP, he’s 285th of 301. Considering that most NHL teams dress 2 scoring lines, *maybe* 3, I’m pretty sure the Sharks didn’t sign him for his offensive flair. They signed Gomez for three reasons:

- he’s great at bringing the puck north and generating shots;
- he’s a top 15% playmaker (43rd of 301);
- while he’s below average defensively, he’s not Tyler Bozak, so he’s good enough defensively for a 3rd line role.

And now he’s with the Florida Panthers for $900k/1 year. They’re probably going to use Scott Gomez in a similar capacity.

He’s paid little enough that they could bury him in the minors at no cap hit if he doesn’t perform. But Scott Gomez still has something to offer as an NHL regular in a 3rd scoring/checking line or 4th energy line role. He’s just probably not Top 6 quality anymore. His Corsi For stats are great, but probably not good enough to justify his anemic 58.2% IPP. Scott Gomez needs to play with 1 solid finisher and 1 well-rounded offensive player (goal scorer/playmaker) to justify a 2nd scoring line role. In hockey, assists are important too. But not being able to score goals is such a liability that you would have a hard time earning a top 6 role in the NHL. Players usually heralded as the best pure playmakers in the NHL shoot at a higher percentage than Scott Gomez. If their percentage is good or better, they get criticized for not shooting enough (e.g. Alex Tanguay has a very high shooting percentage, but he doesn’t take many shots, so it is a small sample size).

Scott Gomez, 2010-13 (3 seasons), forwards with 1500+ 5v5 minutes (n = 301):

| Stat | Rank | Value |
|---|---|---|
| IAP | 43/301 | 47.3% |
| IGP | 301/301 | 10.9% |
| IPP | 285/301 | 58.2% |
| Sh% | 301/301 | 3.11% |

Scott Gomez, 2012-13, forwards with 400+ 5v5 minutes (n = 277):

| Stat | Rank | Value |
|---|---|---|
| IAP | 90/277 | 45.5% |
| IGP | 237/277 | 18.2% |
| Sh% | 246/277 | 5% |

Sidney Crosby, 2010-13 (3 seasons), forwards with 1250+ 5v5 minutes (n = 327):

| Stat | Rank | Value |
|---|---|---|
| IAP | 10/327 | 52.5% |
| IGP | 110/327 | 31.4% |
| IPP | 3/327 | 83.9% |
| Sh% | 8/327 | 15.81% |
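The comment does not define IGP, IAP and IPP. Gomez’s 2010-13 counts (6 goals, 26 assists, 55 on-ice goals for, from the breakdown below) reproduce his figures exactly. That is consistent with the following readings:

- IGP: individual goals / on-ice goals for;
- IAP: individual assists / on-ice goals for;
- IPP: their sum.

This is an inference from the numbers, so treat the sketch as an assumption. The helper name `involvement` is hypothetical.

```python
def involvement(goals, assists, on_ice_gf):
    """Individual goal, assist and point shares of on-ice goals for, in %."""
    igp = 100.0 * goals / on_ice_gf
    iap = 100.0 * assists / on_ice_gf
    return {"IGP": igp, "IAP": iap, "IPP": igp + iap}


gomez = involvement(goals=6, assists=26, on_ice_gf=55)
print({k: round(v, 1) for k, v in gomez.items()})
# prints {'IGP': 10.9, 'IAP': 47.3, 'IPP': 58.2}
```

The same additivity holds for Crosby’s line: 31.4% + 52.5% = 83.9%.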

2. Alien says:

On the Scott Gomez tip, his on-ice Corsi Sh% (not to be confused with his individual shooting %) would rise to 3.77% from 2.64% if he played with teammates who were league average goal scorers (4.19% CSh%). It’s easy to blame Scott Gomez for the lack of production on his lines because he has a low shooting percentage, but he only takes 16.88% of his team’s on-ice shot attempts (Corsi events) anyway. Scott Gomez is a passer, and if his linemates can’t get the job done with scoring, his assist production is going to suffer. At a 3.77% on-ice CSh%, Scott Gomez’s Goals For rate would be above the NHL average. Scott Gomez would be productive on a 2nd scoring line with two average goal scorers. If you put him with a clinical goal scorer and a well-rounded offensive winger, he’d be even more productive. Scott Gomez will never be Sidney Crosby good, but he’s very underrated.

Goals For with-or-without-you (WOWY) numbers sometimes tell the story, but in Scott Gomez’s case they do not. Just because Scott Gomez’s teammates do better without him doesn’t necessarily mean that Scott Gomez is a drag. If you put Scott Gomez with guys who can’t score, that’s just bad pairing and line chemistry. Scott Gomez’s teammates do better without Gomez because they have a low iCorsi Sh% themselves and need to play with a goal scorer to be productive.

To evaluate Scott Gomez I used Corsi, on-ice Sh%, individual stats and production rates to provide a holistic picture. I know that the average Joe Schmoe hockey fan will write off my analysis. But I am 100% positive that Scott Gomez, if he’s not playing with guys who can’t score, would be a suitable 2nd scoring line forward on a number of NHL teams. If Scott Gomez were flanked by Phil Kessel and James van Riemsdyk/Joffrey Lupul, he would score more points than Tyler Bozak, at <20% of the cap hit.

Scott Gomez by the numbers, 2010-13 (3 seasons):

| Individual | Value |
|---|---|
| 5v5 TOI (min) | 1967.616667 |
| iCSh% | 1.71% |
| Individual goals | 6 |
| Individual assists | 26 |
| First assists | 17 |
| iCorsi | 351 |
| Individual share of on-ice Corsi events | 16.88% |
| G/60 | 0.183 |
| A/60 | 0.793 |
| P/60 | 0.976 |
| FirstA/60 | 0.518 |

| On-ice | Value |
|---|---|
| CF20 | 21.142 |
| Total on-ice Corsi | 2080 |
| GF20 | 0.559 |
| On-ice GF | 55 |
| On-ice CSh% | 2.64% |

| Teammates | Value |
|---|---|
| Teammate goals | 49 |
| Teammate Corsi | 1729 |
| Teammates’ iCSh% | 2.83% |
| NHL average CSh% | 4.19% |

| If Gomez’s teammates were average goal scorers | Value |
|---|---|
| Goals scored by Gomez’s teammates | 72.48670476 |
| Total on-ice goals for | 78.48670476 |
| On-ice CSh% | 3.77% |
| GF20 | 0.798 |
| NHL average GF20 (for comparison) | 0.760 |
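The counterfactual above can be reproduced from the raw counts. It has three steps:

1. Give Gomez’s teammates’ Corsi events (on-ice Corsi minus his iCorsi) the league average conversion rate.
2. Keep his own 6 goals fixed.
3. Recompute on-ice CSh% and GF20.

Below is a minimal sketch, with a hypothetical helper name and inputs taken from the tables. The comment’s 72.4867 goals imply an unrounded league average slightly above 4.19%. With the rounded 4.19% as input, the sketch gives about 72.45 teammate goals and a GF20 of about 0.797 rather than 0.798.

```python
def average_teammates_counterfactual(toi_min, ind_goals, icorsi, on_ice_gf,
                                     on_ice_corsi, league_csh):
    """Re-estimate on-ice goals if teammates converted their Corsi events at
    the league average rate, holding the player's own goals fixed."""
    teammate_corsi = on_ice_corsi - icorsi          # 2080 - 351 = 1729
    teammate_goals = on_ice_gf - ind_goals          # 55 - 6 = 49
    expected_teammate_goals = teammate_corsi * league_csh
    total_gf = ind_goals + expected_teammate_goals
    return {
        "teammates_icsh_pct": 100.0 * teammate_goals / teammate_corsi,
        "expected_teammate_goals": expected_teammate_goals,
        "total_gf": total_gf,
        "on_ice_csh_pct": 100.0 * total_gf / on_ice_corsi,
        "gf20": 20.0 * total_gf / toi_min,
        "actual_gf20": 20.0 * on_ice_gf / toi_min,
    }


gomez = average_teammates_counterfactual(
    toi_min=1967.616667, ind_goals=6, icorsi=351, on_ice_gf=55,
    on_ice_corsi=2080, league_csh=0.0419)
for key, value in gomez.items():
    print(f"{key}: {value:.3f}")
```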

| Assists | Value |
|---|---|
| Assist rate on teammate goals | 53.1% |
| First assist rate on teammate goals | 34.69% |
| Assists if teammates were average goal scorers | 38.46233314 |
| First assists if teammates were average goal scorers | 25.14844859 |
| A/60 if teammates were average goal scorers | 1.173 |
| FirstA/60 if teammates were average goal scorers | 0.767 |
| P/60 if teammates were average goal scorers | 1.356 |
| Tyler Bozak with 1337 (elite) goal-scoring teammates | 1.34 |
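The assist projections follow the same logic as the goals projection:

1. Hold Gomez’s assist and first-assist rates on teammate goals constant (26/49 and 17/49).
2. Apply those rates to the 72.4867 expected teammate goals.
3. Add back his own 6 goals for P/60.
4. Convert to per 60 minutes of his 1967.6 5v5 minutes.

Below is a minimal sketch with a hypothetical helper name. It reproduces the table’s values to three decimals.

```python
def counterfactual_rates(toi_min, goals, assists, first_assists,
                         teammate_goals, expected_teammate_goals):
    """Scale assists to the expected teammate goals, holding the player's
    assist rates on teammate goals constant; his own goals stay fixed."""
    assist_rate = assists / teammate_goals
    first_assist_rate = first_assists / teammate_goals
    exp_assists = expected_teammate_goals * assist_rate
    exp_first = expected_teammate_goals * first_assist_rate
    per_60 = 60.0 / toi_min  # converts a count over the sample to a per-60 rate
    return {
        "assist_rate_pct": 100.0 * assist_rate,
        "first_assist_rate_pct": 100.0 * first_assist_rate,
        "A/60": exp_assists * per_60,
        "FirstA/60": exp_first * per_60,
        "P/60": (exp_assists + goals) * per_60,
    }


rates = counterfactual_rates(toi_min=1967.616667, goals=6, assists=26,
                             first_assists=17, teammate_goals=49,
                             expected_teammate_goals=72.48670476)
for key, value in rates.items():
    print(f"{key}: {value:.3f}")
```

This assumes his share of assists would stay the same with better finishers, which is the comment’s premise rather than something the data establishes.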