Apr 11 2013
 

Stats.hockeyanalysis.com has just gotten even better! Several people have asked why I have zone start adjusted stats for teams, and it is a good question. The answer is that it was simply easier from a programming point of view to use the same ‘situations’ at both the player level and the team level. Since I was already calculating, for example, 5v5close zone start adjusted data for players, it was easy to add 5v5close zone start adjusted data for teams. Because it makes sense to have non-zone start adjusted data for teams, getting it implemented was on my to-do list. So, now it is done, and so much more. The situations that you can access data for at both the player and team level are:

  • 5v5
  • 5v5 home
  • 5v5 road
  • 5v5 close
  • 5v5 tied
  • 5v5 leading
  • 5v5 trailing
  • 5v5 up 1 goal
  • 5v5 up 2+ goals
  • 5v5 down 1 goal
  • 5v5 down 2+ goals
  • 5v4 PP
  • 4v5 PK

All of the above situations except 5v4 PP and 4v5 PK are also available in their Zone Adjusted forms, so in total there are now 24 different situations you can search for stats on. Have at it, and don’t blame me for any lost weekends (or lost productivity at work).

(As usual, if you find any issues with the new data please let me know. The stats should be correct, but while I have done some testing on the new code that displays the stats, it isn’t completely tested.)

 

Apr 05 2013
 

I often get asked questions about hockey analytics and hockey fancy stats: how to use them, what they mean, etc. There are plenty of good places to find definitions of various hockey stats, but sometimes what is more important than a definition is some guidelines on how to use them. So, with that said, here are several tips for people using advanced hockey stats.

Don’t overvalue Quality of Competition

I can’t count how often I’ll point out one player’s poor stats or another player’s good stats and immediately get the response “Yeah, but he always plays against the opposition’s best players” or “Yeah, but he doesn’t play against the opposition’s best players”, yet most people who say that kind of thing have no real idea how much quality of opponent affects a player’s statistics. The truth is it is not nearly as much as you might think. Despite some coaches desperately trying to employ line matching techniques, the variation in quality of competition metrics is dwarfed by the variation in quality of teammates, individual talent, and on-ice results. An analysis of Pavel Datsyuk and Valtteri Filppula showed that if Filppula had Datsyuk’s quality of competition, his CorsiFor% would drop from 51.05% to 50.90% and his GoalsFor% would drop from 55.65% to 55.02%. In the grand scheme of things, these are relatively minor effects.

Don’t overvalue Zone Starts either

Like quality of competition, many people use zone starts to justify a player’s good or poor statistics. The truth is zone starts are not a significant factor either. I have found that the effect of a zone start is largely gone about 10 seconds after the face off, and others have found the same. I account for zone starts by eliminating the 10 seconds after an offensive or defensive zone face off, and doing this has relatively little effect on a player’s stats. Henrik Sedin is maybe the most extreme case of a player getting primarily offensive zone starts, and factoring out all those zone starts took him from a 55.2% fenwick player to a 53.8% fenwick player. So even in the most extreme case there is only about a 1.4 percentage point impact on a player’s fenwick%, and the majority of players are nowhere close to the zone start bias of Henrik Sedin. For most players you are probably talking about something under a 0.5% impact on their fenwick%. As for individual stats, over the last 3 seasons H. Sedin had 34 goals and 172 points in 5v5 situations, and just 2 goals and 14 points came within 10 seconds of a zone face off, or about 5 points a year. If instead of 70% offensive zone face off deployment he had 50%, he might have had 10 points during the 10-second post-face-off window instead of 14. That’s a 4 point differential over 3 years for a guy who scored 172 points. In simple terms, about 2.3% of H. Sedin’s 5v5 points can be attributed to his offensive zone start bias.

A corollary of this is that if zone starts don’t matter much, a player’s face off winning percentage probably doesn’t matter much either, which is consistent with other studies. It’s a nice skill to have, but not worth a lot.
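The Sedin arithmetic above can be reproduced in a few lines of Python. This is only a sketch: it assumes, as the paragraph does, that points scored in the 10-second post-face-off window scale linearly with the share of offensive zone deployment, and the function name is purely illustrative.

```python
def rescale_window_points(window_points: float, actual_oz_share: float,
                          target_oz_share: float) -> float:
    """Scale points scored in the 10-second post-face-off window to a
    different offensive zone deployment, assuming a linear relationship."""
    return window_points * target_oz_share / actual_oz_share


# H. Sedin, 5v5, last 3 seasons (figures quoted in the paragraph above)
total_points = 172
window_points = 14

neutral_points = rescale_window_points(window_points, 0.70, 0.50)
zone_start_bonus = window_points - neutral_points
bonus_share = 100.0 * zone_start_bonus / total_points

print(f"Window points at 50% deployment: {neutral_points:.0f}")
print(f"Zone start bonus: {zone_start_bonus:.0f} points "
      f"({bonus_share:.1f}% of 5v5 points)")
```

The result, about 4 points or 2.3% of his 5v5 points, matches the figures in the text.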

Do not ignore Quality of Teammates

I have just told you to pretty much ignore quality of competition and zone starts, so what about quality of teammates? To put it simply, do not ignore them. Quality of teammates matters, and matters a lot. Sticking with the Vancouver Canucks, let’s use Alex Burrows as an example. Burrows mostly plays with the Sedin twins but has played on Kesler’s line a bit too. Over the past 3 seasons he has played about 77.9% of his ice time with H. Sedin, about 12.3% with Ryan Kesler, and the remainder with Malhotra and others. Burrows’ offensive production is significantly better when playing with H. Sedin: 88.7% of his goals and 87.2% of his points came during the 77.9% of his ice time he played with H. Sedin. If Burrows had played 100% of his ice time with H. Sedin and produced at the same rate, he would have scored 6 (9.7%) more goals and 13 (11%) more 5v5 points over the past 3 seasons. This is far more significant than the 2.3% boost H. Sedin saw from all his offensive zone starts, and I am not certain my Burrows example is the most extreme one in the NHL. How many more points would an average third liner get if he played mostly with H. Sedin instead of with other third liners? Who you play with matters a lot. You can’t look at Tyler Bozak’s decent point totals and conclude he is a decent player without considering that he plays a lot with Kessel and Lupul, two very good offensive players.

Opportunity is not talent

Along the same lines as the Quality of Teammates discussion, we must be careful not to mistake results that come from opportunity for talent. Over the past 2 seasons Corey Perry has the second most goals of any forward in the NHL, trailing only Steven Stamkos. That might seem impressive, but it is a little less so when you consider that Perry also had the 4th most 5v5 minutes during that time and the 11th most 5v4 minutes. Perry is a good goal scorer, but a lot of his goals come from opportunity (ice time) as much as from individual talent. Among forwards with at least 1500 minutes of 5v5 ice time over the past 2 seasons, Perry ranks just 30th in goals per 60 minutes of ice time. That’s still good, but far less impressive than second only to Steven Stamkos, and he is actually well behind teammate Bobby Ryan (6th) in this metric. Perry is a very good player, but he benefits more than most from getting a lot of ice time and PP ice time. His goal production is largely talent but also partly opportunity driven, and we need to keep this in perspective.
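To make the totals versus rate distinction concrete, here is a minimal Python sketch of ranking forwards by goals per 60 minutes with the 1500-minute threshold used above. The `Forward` record and the players in it are hypothetical, not real data.

```python
from dataclasses import dataclass


@dataclass
class Forward:
    name: str
    goals: int          # 5v5 goals over the period
    toi_minutes: float  # 5v5 ice time over the period, in minutes


def goals_per_60(player: Forward) -> float:
    """Goal rate per 60 minutes of ice time, which takes opportunity out of the count."""
    return 60.0 * player.goals / player.toi_minutes


def rank_by_goal_rate(players, min_toi=1500.0):
    """Rank players with at least `min_toi` minutes by goals per 60, best first."""
    eligible = [p for p in players if p.toi_minutes >= min_toi]
    return sorted(eligible, key=goals_per_60, reverse=True)


# Hypothetical forwards, not real data
forwards = [
    Forward("Big-minutes forward", goals=40, toi_minutes=2600.0),
    Forward("Efficient forward", goals=34, toi_minutes=1900.0),
    Forward("Part-timer", goals=12, toi_minutes=700.0),
]

for p in rank_by_goal_rate(forwards):
    print(f"{p.name}: {p.goals} goals, {goals_per_60(p):.2f} goals/60")
```

The big-minutes forward has more goals, but the efficient forward scores at a higher rate. The part-timer is excluded by the minimum ice time filter.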

Don’t ignore the percentages (shooting and save)

The percentages matter, particularly shooting percentage. I have shown that players can sustain elevated on-ice shooting percentages and that players can have an impact on their linemates’ shooting percentages, and Tom Awad has shown that a significant portion of the difference between good players and bad players is finishing ability (shooting percentage). There is even evidence that goal based metrics (which incorporate the percentages) are a better predictor of post season success than fenwick based metrics. What corsi/fenwick metrics have going for them is more reliability over small sample sizes, but once you approach a full season’s worth of data that benefit is largely gone and you get more benefit from having the percentages factored into the equation. If you want a better understanding of what considering the percentages can do for you, try a Malkin vs Gomez comparison or a Crosby vs Tyler Kennedy comparison over the past several years. Gomez and Kennedy actually look relatively comparable if you only consider shot based metrics, but both are terrible percentage players while Malkin and Crosby are excellent percentage players, and it is the percentages that make Malkin and Crosby so special. This is an extreme example, but the percentages should not be ignored if you want a true representation of a player’s abilities.
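To illustrate why the percentages matter, here is a small Python sketch that relies on a few simplifying assumptions. Goals for are modeled as shots for times on-ice shooting percentage, and goals against as shots against times (1 minus on-ice save percentage). All numbers are hypothetical and are not any real player's stats.

```python
def on_ice_gf_pct(shots_for: float, shots_against: float,
                  sh_pct: float, sv_pct: float) -> float:
    """On-ice GF% implied by shot totals and on-ice percentages.

    Goals for = shots for * shooting %; goals against = shots against * (1 - save %).
    """
    goals_for = shots_for * sh_pct
    goals_against = shots_against * (1.0 - sv_pct)
    return 100.0 * goals_for / (goals_for + goals_against)


# Two hypothetical players with an identical 52% shot share
shots_for, shots_against = 520, 480
strong = on_ice_gf_pct(shots_for, shots_against, sh_pct=0.100, sv_pct=0.925)
weak = on_ice_gf_pct(shots_for, shots_against, sh_pct=0.065, sv_pct=0.905)

print(f"Strong percentage player: {strong:.1f} GF%")
print(f"Weak percentage player:   {weak:.1f} GF%")
```

Both players have identical shot totals, but the percentages alone separate them by more than 15 points of GF%. That is the kind of gap a shot-only comparison hides.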

More is definitely better

One of the reasons many people have jumped on the shot attempt/corsi/fenwick bandwagon is that shot attempts are more frequent events than goals and thus give you more reliable metrics. This is true over small sample sizes, but as explained above, the percentages matter too and should not be ignored. Luckily, for most players we have ample data to get past the sample size issues. There is no reason to evaluate a player based on half a season’s data if that player has been in the league for several years. Look at 2, 3, 4 years of data. Look for trends. Is the player consistently a high corsi player? Is the player consistently a high shooting percentage player? Is the player improving? Declining? I have shown on numerous occasions that, starting at about one year of data, goals are a better predictor of future goal rates than corsi/fenwick, but multiple years are definitely better. Any conclusion about a player’s talent level using a single season of data or less (regardless of whether it is corsi or goal based) is subject to a significant level of uncertainty. We have multiple years of data for the majority of players, so use it. I even aggregate multiple years into one data set for you on stats.hockeyanalysis.com, so it isn’t even time consuming. The data is there, use it. More is definitely better.
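One reasonable way to combine several seasons is to pool them by summing events and ice time before computing a single rate, rather than averaging each season's rate. The Python sketch below takes that approach. It is an assumption for illustration, not a description of how stats.hockeyanalysis.com aggregates its data, and the season numbers are hypothetical.

```python
def pooled_rate_per_60(seasons):
    """Pool seasons by summing events and ice time, then compute one rate.

    `seasons` is a list of (events, toi_minutes) tuples. Summing first weights
    each season by its ice time, unlike averaging the per-season rates.
    """
    total_events = sum(events for events, _ in seasons)
    total_toi = sum(toi for _, toi in seasons)
    return 60.0 * total_events / total_toi


def naive_average_rate(seasons):
    """Unweighted mean of per-season rates, shown only for comparison."""
    return sum(60.0 * events / toi for events, toi in seasons) / len(seasons)


# Hypothetical on-ice goals for over three seasons, the last one injury-shortened
seasons = [(45, 1100.0), (50, 1150.0), (6, 200.0)]
print(f"Pooled GF/60:     {pooled_rate_per_60(seasons):.2f}")
print(f"Average of rates: {naive_average_rate(seasons):.2f}")
```

Pooling keeps a short, noisy season from carrying as much weight as a full one.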

WOWYs are where it is at

In my mind WOWYs are the best tool for advanced player evaluation. WOWY stands for “with or without you” and looks at how a player performs while on the ice with a teammate and while on the ice without that teammate. What WOWYs can tell you is whether a particular player is a core player driving team success or a player along for the ride. Players who consistently make their teammates’ statistics better when they are on the ice with them are the players you want on your team. Anze Kopitar is an example of a player who consistently makes his teammates better. Jack Johnson is an example of a player who does not, particularly when looking at goal based metrics. Then there are a large number of good players who neither drive your team’s success nor hold it back, or as I like to say, complementary players. Ideally you build your team around a core of players like Kopitar who drive success, fill it in with a group of complementary players, and quickly rid yourself of players like Jack Johnson who act as drags on the team.
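The WOWY idea can be sketched in Python as three splits over stretches of play: the two players together, the first without the second, and the second without the first. The `Segment` record, the player names and all the numbers below are hypothetical. This shows the general with-or-without-you technique, not the code behind stats.hockeyanalysis.com.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of 5v5 play with a fixed set of one team's skaters on the ice."""
    skaters: frozenset  # names of the team's skaters on the ice
    toi: float          # minutes
    gf: int             # goals for during the segment
    ga: int             # goals against during the segment


def split_stats(segments):
    """Total TOI and GF% for a list of segments."""
    toi = sum(s.toi for s in segments)
    gf = sum(s.gf for s in segments)
    ga = sum(s.ga for s in segments)
    gf_pct = 100.0 * gf / (gf + ga) if gf + ga else float("nan")
    return toi, gf_pct


def wowy(segments, a, b):
    """With-or-without-you splits for teammates a and b."""
    together = [s for s in segments if a in s.skaters and b in s.skaters]
    a_without_b = [s for s in segments if a in s.skaters and b not in s.skaters]
    b_without_a = [s for s in segments if b in s.skaters and a not in s.skaters]
    return {
        "together": split_stats(together),
        f"{a} without {b}": split_stats(a_without_b),
        f"{b} without {a}": split_stats(b_without_a),
    }


# Hypothetical season of segments for a "Driver" and a "Passenger"
segments = [
    Segment(frozenset({"Driver", "Passenger"}), 300.0, 15, 10),
    Segment(frozenset({"Driver", "Other"}), 300.0, 14, 11),
    Segment(frozenset({"Passenger", "Other"}), 300.0, 9, 13),
]
for split, (toi, gf_pct) in wowy(segments, "Driver", "Passenger").items():
    print(f"{split:24s} TOI {toi:6.1f}  GF% {gf_pct:5.1f}")
```

Driver's GF% barely moves without Passenger, while Passenger's drops sharply without Driver. That pattern marks Driver as the core player and Passenger as the one along for the ride.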

 

Apr 05 2013
 

Yesterday HabsEyesOnThePrize.com had a post on the importance of fenwick come playoff time over the past 5 seasons. It is definitely worth a look, so go check it out. In the post they look at FF% in 5v5close situations and see how well it translates into post season success. I wanted to take this a step further and look at PDO and GF% in 5v5close situations to see if they translate into post season success as well. Here is what I found:

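| Group | N | Avg | Playoff Avg | Cup Winners | Lost Cup Finals | Lost Third Round | Lost Second Round | Lost First Round | Missed Playoffs |
|---|---|---|---|---|---|---|---|---|---|
| GF% > 55 | 19 | 2.68 | 2.83 | 5 | 1 | 2 | 6 | 4 | 1 |
| GF% 50-55 | 59 | 1.22 | 1.64 | 0 | 2 | 6 | 10 | 26 | 15 |
| GF% 45-50 | 52 | 0.62 | 1.78 | 0 | 2 | 2 | 4 | 10 | 34 |
| GF% < 45 | 20 | 0.00 | - | 0 | 0 | 0 | 0 | 0 | 20 |
| FF% > 53 | 23 | 2.35 | 2.35 | 3 | 2 | 4 | 5 | 9 | 0 |
| FF% 50-53 | 55 | 1.15 | 1.70 | 2 | 2 | 1 | 10 | 22 | 18 |
| FF% 47-50 | 46 | 0.52 | 1.85 | 0 | 0 | 4 | 3 | 6 | 33 |
| FF% < 47 | 26 | 0.54 | 2.00 | 0 | 1 | 1 | 2 | 3 | 19 |
| PDO > 1010 | 27 | 1.63 | 2.20 | 2 | 2 | 2 | 6 | 8 | 7 |
| PDO 1000-1010 | 42 | 1.17 | 1.75 | 1 | 0 | 5 | 7 | 15 | 14 |
| PDO 990-1000 | 47 | 0.91 | 1.95 | 2 | 1 | 3 | 4 | 12 | 25 |
| PDO < 990 | 34 | 0.56 | 1.90 | 0 | 2 | 0 | 3 | 5 | 24 |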

I have grouped GF%, FF% and PDO into four categories each (the very good, the good, the mediocre and the bad) and looked at how many teams from each group made it to each round of the playoffs. If we say that winning the Cup is worth 5 points, reaching the finals is worth 4, reaching the 3rd round is worth 3, reaching the second round is worth 2, and making the playoffs is worth 1, then the Avg column is the average point total for all teams in that group. The Playoff Avg is the average point total for the teams in that group that made the playoffs.
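The scoring scheme above can be checked with a short Python sketch. The outcome keys are illustrative names for the table columns, and the example uses the GF% > 55 row from the table.

```python
# Points per playoff outcome, as defined in the paragraph above
POINTS = {"won_cup": 5, "lost_final": 4, "lost_third": 3,
          "lost_second": 2, "lost_first": 1, "missed": 0}


def group_averages(counts):
    """Return (Avg, Playoff Avg) for a group, given team counts per outcome."""
    total_points = sum(POINTS[outcome] * n for outcome, n in counts.items())
    n_teams = sum(counts.values())
    n_playoff = n_teams - counts.get("missed", 0)
    avg = total_points / n_teams
    playoff_avg = total_points / n_playoff if n_playoff else float("nan")
    return avg, playoff_avg


gf_over_55 = {"won_cup": 5, "lost_final": 1, "lost_third": 2,
              "lost_second": 6, "lost_first": 4, "missed": 1}
avg, playoff_avg = group_averages(gf_over_55)
print(f"Avg {avg:.2f}, Playoff Avg {playoff_avg:.2f}")  # Avg 2.68, Playoff Avg 2.83
```

A group in which no team made the playoffs (GF% < 45) has no defined Playoff Avg, which is why that cell in the table is a dash.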

As HabsEyesOnThePrize.com found, 5v5close FF% is definitely an important factor in making the playoffs and enjoying success in the playoffs. That said, GF% seems to be slightly more significant. All 5 Stanley Cup winners came from the GF% > 55 group while only 3 Cup winners came from the FF% > 53 group, and both Avg and Playoff Avg are higher in the GF% > 55 group than in the FF% > 53 group. PDO only seems marginally important, though teams with a very good PDO do have a slightly better chance of going deeper into the playoffs. Generally speaking, if you are trying to predict a Stanley Cup winner, 5v5close GF% is probably a better metric than 5v5close FF% and certainly better than PDO. Considering this is a significantly shorter season than usual, this may not hold this year because luck may be more of a factor in GF% than usual, but historically it has been the case.

So, who should we look at for playoff success this season? There are currently 9 teams with a 5v5close GF% > 55: Anaheim, Boston, Pittsburgh, Los Angeles, Montreal, Chicago, San Jose, Toronto and Vancouver. No other team is above 52.3%, so that list is unlikely to get any new additions before season’s end, though some teams could certainly fall out of it. If we also require a 5v5close FF% > 50%, then Toronto and Anaheim drop off the list, leaving Boston, Pittsburgh, Los Angeles, Montreal, Chicago, San Jose and Vancouver as your Stanley Cup favourites, but we all pretty much knew that already, didn’t we?

 

Apr 01 2013
 

I have been on a bit of a mission recently to push the idea that quality of competition (and zone starts) is not a huge factor in one’s statistics and that most people overvalue its importance. I can’t count how often I hear arguments like “but he plays all the tough minutes” as an excuse for why a player has poor statistics, and pretty much every time I do I cringe, because the person making the argument almost certainly has no clue how much those tough minutes affect a player’s statistics.

While thinking about how to do this study and which players to look at, I was listening to a podcast where the name Pavel Datsyuk came up, so I decided to take a look at him. In addition to being mentioned in a podcast, he is a really good 2-way player who plays against pretty tough quality of competition. For this study I used two-year 2010-12 data, and Datsyuk has the 10th highest HART QoC over that span in 5v5 zone start adjusted situations.

The next step was to look at how Datsyuk performed against various types of opposition. To do this I took all of Datsyuk’s opposing forwards whom he played at least 10 minutes of 5v5 ZS adjusted ice time against (you can find these players here), grouped them according to their HARO, HARD, CorHARO and CorHARD ratings, and looked at Datsyuk’s on-ice stats against each group.

| OppHARO | TOI% | GA20 |
|---|---|---|
| >1.1 | 46.84% | 0.918 |
| 0.9-1.1 | 34.37% | 0.626 |
| <0.9 | 18.79% | 0.391 |

Let’s go through a quick explanation of the above table. I have grouped Datsyuk’s opponents by their HARO ratings into three groups: those with a HARO above 1.1, those with a HARO between 0.9 and 1.1, and those with a HARO below 0.9. These groups represent strong, average and weak offensive players. Datsyuk played 46.84% of his ice time against the strong offensive group, 34.37% against the average offensive group and 18.79% against the weak offensive group. The GA20 column is Datsyuk’s on-ice goals against per 20 minutes, or essentially the goal scoring rate of Datsyuk’s opponents when playing against him. As you can see, the strong offensive players do significantly better than the average offensive players, who in turn do significantly better than the weak offensive players.

Now, let’s look at how Datsyuk does offensively based on the defensive ability of his opponents.

| OppHARD | TOI% | GF20 |
|---|---|---|
| >1.1 | 35.39% | 1.171 |
| 0.9-1.1 | 35.36% | 0.994 |
| <0.9 | 29.25% | 1.004 |

Interestingly, the defensive quality of Datsyuk’s opponents did not have a significant impact on Datsyuk’s ability to generate offense which is kind of an odd result.

Here are the same tables but for corsi stats.

| OppCorHARO | TOI% | CA20 |
|---|---|---|
| >1.1 | 15.59% | 15.44 |
| 0.9-1.1 | 77.79% | 13.78 |
| <0.9 | 6.63% | 10.84 |

 

| OppCorHARD | TOI% | CF20 |
|---|---|---|
| >1.1 | 18.39% | 15.89 |
| 0.9-1.1 | 68.81% | 18.49 |
| <0.9 | 12.80% | 22.69 |

I realize I should have tightened up the rating splits to get a more even distribution of TOI%, but I think we can still see the effect of QoC. When looking at corsi, we do see that CF20 varies with the defensive quality of opponents, which we didn’t see with GF20.

From the tables above, we can see that quality of opponent can have a significant impact on a player’s statistics. When you play against good offensive opponents, you are bound to give up a lot more goals than you will against weaker offensive opponents. The question that remains is whether some players actually play a significantly greater share of their ice time against good opponents than other players do. To look at this, I built the same tables for Valtteri Filppula, a player who rarely plays with Datsyuk and so, in theory, could face a significantly different set of opponents. Here are the same tables for Filppula.

| OppHARO | TOI% | GA20 |
|---|---|---|
| >1.1 | 42.52% | 1.096 |
| 0.9-1.1 | 35.35% | 0.716 |
| <0.9 | 22.12% | 0.838 |

 

| OppHARD | TOI% | GF20 |
|---|---|---|
| >1.1 | 32.79% | 0.841 |
| 0.9-1.1 | 35.53% | 1.197 |
| <0.9 | 31.68% | 1.370 |

 

| OppCorHARO | TOI% | CA20 |
|---|---|---|
| >1.1 | 12.88% | 19.03 |
| 0.9-1.1 | 78.20% | 16.16 |
| <0.9 | 8.92% | 14.40 |

 

| OppCorHARD | TOI% | CF20 |
|---|---|---|
| >1.1 | 20.89% | 15.48 |
| 0.9-1.1 | 64.94% | 17.16 |
| <0.9 | 14.17% | 19.09 |

Nothing too exciting or unexpected in those tables. What is more important is how the ice times differ from Datsyuk’s across groups and how those differences might affect Filppula’s statistics.

We see that Datsyuk plays a little more against good offensive players and a little less against weak offensive players, and he also plays a little more against good defensive players and a little less against weak defensive players. If we assume that Filppula played the same share of his ice time against each group as Datsyuk did, and that the average opponent quality within each group was the same for both players, we can calculate what Filppula’s stats would have been against Datsyuk’s QoC.

| Stat | Actual | With Datsyuk’s TOI distribution |
|---|---|---|
| GF20 | 1.135 | 1.122 |
| GA20 | 0.905 | 0.917 |
| GF% | 55.65% | 55.02% |
| CF20 | 17.08 | 17.09 |
| CA20 | 16.37 | 16.49 |
| CF% | 51.05% | 50.90% |

As you can see, that is not a huge difference. If we gave Filppula the same QoC as Datsyuk, he would be a 55.02% GF% player instead of a 55.65% GF% player. That is hardly enough to worry about, and the difference in CF% is even smaller.

From this and the other studies I have looked at, I have found very little evidence that QoC has a significant impact on a player’s statistics. The argument that a player can have bad stats because he plays the ‘tough minutes’ is, in my opinion, bogus. Player usage can have a small impact on a player’s statistics, but it is not anything to be concerned with for the vast majority of players, and it will never make a good player have bad statistics or a bad player have good statistics. Player usage charts (such as those found here or those found here) are interesting and pretty neat and do give you an idea of how a coach uses his players, but they are not a tool for justifying a player’s good, or poor, performance. ‘Tough minutes’ exist, but they are not all that important over the long haul.
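The reweighting described above amounts to a TOI-weighted average of Filppula's per-group rates, recomputed with Datsyuk's share of ice time against each group. The Python sketch below reproduces the goal-based rows of the table from the group tables. It assumes, as the text does, that Filppula's rate within each group would not change. The corsi rows can be reproduced the same way, allowing for rounding in the published TOI percentages.

```python
def weighted_rate(toi_shares, rates):
    """TOI-weighted average of per-group rates (shares are normalized to sum to 1)."""
    total = sum(toi_shares)
    return sum(w * r for w, r in zip(toi_shares, rates)) / total


def gf_pct(gf20, ga20):
    return 100.0 * gf20 / (gf20 + ga20)


# Filppula's rates against each opponent group (>1.1, 0.9-1.1, <0.9)
filppula_ga20_by_haro = [1.096, 0.716, 0.838]
filppula_gf20_by_hard = [0.841, 1.197, 1.370]

# Share of ice time (%) each player spent against each group
filppula_haro_toi = [42.52, 35.35, 22.12]
filppula_hard_toi = [32.79, 35.53, 31.68]
datsyuk_haro_toi = [46.84, 34.37, 18.79]
datsyuk_hard_toi = [35.39, 35.36, 29.25]

actual_gf = weighted_rate(filppula_hard_toi, filppula_gf20_by_hard)
actual_ga = weighted_rate(filppula_haro_toi, filppula_ga20_by_haro)
dats_gf = weighted_rate(datsyuk_hard_toi, filppula_gf20_by_hard)
dats_ga = weighted_rate(datsyuk_haro_toi, filppula_ga20_by_haro)

print(f"Actual:        GF20 {actual_gf:.3f}  GA20 {actual_ga:.3f}  "
      f"GF% {gf_pct(actual_gf, actual_ga):.2f}")
print(f"Datsyuk's TOI: GF20 {dats_gf:.3f}  GA20 {dats_ga:.3f}  "
      f"GF% {gf_pct(dats_gf, dats_ga):.2f}")
```

Only the weights change between the two lines. That is why even a noticeably different QoC distribution moves the result so little.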

 

 

Mar 20 2013
 

I generally think that most people give too much importance to quality of competition (QoC) and its impact on a player’s statistics, but if we are going to use QoC metrics, let’s at least try to use the best ones available. In this post I will look at some QoC metrics available on stats.hockeyanalysis.com and explain why they might be better than those typically in use.

OppGF20, OppGA20, OppGF%

These three stats are the averages of GF20 (on-ice goals for per 20 minutes), GA20 (on-ice goals against per 20 minutes) and GF% (on-ice GF / [on-ice GF + on-ice GA]) across all the opposition players a player lined up against, weighted by ice time against. In fact, these stats go a bit further in that they remove the ice time the opponents played against the player, so that a player won’t influence his own QoC (not nearly as important as it is for QoT, but still a good thing to do). So, essentially, these three stats measure the goal scoring ability of the opposition players, the goal defending ability of the opposition players, and the overall value of the opposition players. Note that opposition goalies are not included in the calculation of OppGF20, as it is assumed that goalies have no influence on scoring goals.

The benefit of these stats is that they are expressed in a unit (goals per 20 minutes of ice time) that is easily understood. OppGF20 is essentially how many goals we would expect the player’s opponents to score, on average, per 20 minutes of ice time. The drawback is that if good players play against good players and bad players play against bad players, a good player and a bad player may have similar statistics even though the good player is better because he did it against better quality opponents. There is no consideration for the context of the opponents’ statistics, and that may matter.
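Here is a minimal Python sketch of how OppGF20 could be computed from per-opponent totals. The `Opponent` record and its numbers are hypothetical. The sketch assumes the input lists skaters only, since goalies are excluded, and it follows the description above by dropping each opponent's time against the player before weighting by ice time against.

```python
from dataclasses import dataclass


@dataclass
class Opponent:
    name: str
    on_ice_gf: int        # opponent's total on-ice goals for
    toi: float            # opponent's total ice time (minutes)
    gf_vs_player: int     # opponent's on-ice goals for while facing the player
    toi_vs_player: float  # ice time the opponent spent facing the player (minutes)


def opp_gf20(opponents):
    """TOI-weighted average of opponents' GF20, excluding their time against the player."""
    weighted_sum = 0.0
    weight_total = 0.0
    for o in opponents:
        toi_without = o.toi - o.toi_vs_player
        gf20_without = 20.0 * (o.on_ice_gf - o.gf_vs_player) / toi_without
        weighted_sum += o.toi_vs_player * gf20_without
        weight_total += o.toi_vs_player
    return weighted_sum / weight_total


# Hypothetical opposing skaters (goalies excluded)
opponents = [
    Opponent("Scorer", on_ice_gf=44, toi=1000.0, gf_vs_player=4, toi_vs_player=100.0),
    Opponent("Checker", on_ice_gf=30, toi=1000.0, gf_vs_player=0, toi_vs_player=50.0),
]
print(f"OppGF20: {opp_gf20(opponents):.3f}")
```

OppGA20 follows the same pattern with goals against, and OppGF% can be built from the two.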

Let’s take a look at the top 10 forwards in OppGF20 last season.

| Player | Team | OppGF20 |
|---|---|---|
| Patrick Dwyer | Carolina | 0.811 |
| Brandon Sutter | Carolina | 0.811 |
| Travis Moen | Montreal | 0.811 |
| Carl Hagelin | NY Rangers | 0.806 |
| Marcel Goc | Florida | 0.804 |
| Tomas Plekanec | Montreal | 0.804 |
| Brooks Laich | Washington | 0.800 |
| Ryan Callahan | NY Rangers | 0.799 |
| Patrik Elias | New Jersey | 0.798 |
| Alexei Ponikarovsky | New Jersey | 0.795 |

You will notice that every single player is from the eastern conference. The reason for this is that the eastern conference is the more offensive conference. Looking at the top 10 players in OppGA20 (here the lowest values, since a lower OppGA20 means stingier opponents) shows the opposite.

| Player | Team | OppGA20 |
|---|---|---|
| Marcus Kruger | Chicago | 0.719 |
| Jamal Mayers | Chicago | 0.720 |
| Mark Letestu | Columbus | 0.721 |
| Andrew Brunette | Chicago | 0.723 |
| Andrew Cogliano | Anaheim | 0.723 |
| Viktor Stalberg | Chicago | 0.724 |
| Matt Halischuk | Nashville | 0.724 |
| Kyle Chipchura | Phoenix | 0.724 |
| Matt Belesky | Anaheim | 0.724 |
| Cory Emmerton | Detroit | 0.724 |

Now, what happens when we look at OppGF%?

| Player | Team | OppGF% |
|---|---|---|
| Mike Fisher | Nashville | 51.6% |
| Martin Havlat | San Jose | 51.4% |
| Vaclav Prospal | Columbus | 51.3% |
| Mike Cammalleri | Calgary | 51.3% |
| Martin Erat | Nashville | 51.3% |
| Sergei Kostitsyn | Nashville | 51.3% |
| Dave Bolland | Chicago | 51.2% |
| Rick Nash | Columbus | 51.2% |
| Travis Moen | Montreal | 51.0% |
| Patrick Marleau | San Jose | 51.0% |

These are predominantly western conference players, with a couple of eastern conference players mixed in. The reason for this western conference bias is that the western conference was the stronger conference, so it makes sense that the QoC would be tougher for western conference players.

OppFF20, OppFA20, OppFF%

These are exactly the same stats as the goal based stats above, but instead of goals for/against/percentage they use fenwick for/against/percentage (fenwick is shots + shots that missed the net). I won’t go into details, but you can find the top players in OppFF20 here, in OppFA20 here, and in OppFF% here. You will find a lot of similarities to the OppGF20, OppGA20 and OppGF% lists, but if you ask me which is the better QoC metric, I’d lean towards the goal based ones. The reason is that the small sample size issues we see with goal statistics are not nearly as significant in QoC metrics, because luck will average out over all opponents (for every unlucky opponent you are likely to have a lucky one to cancel out the effect). That said, if you are doing a fenwick based analysis, it probably makes more sense to use a fenwick based QoC metric.

HARO QoC, HARD QoC, HART QoC

As stated above, one of the flaws of the above QoC metrics is that there is no consideration for the context of the opponents’ statistics. One way around this is to use the HockeyAnalysis.com HARO (offense), HARD (defense) and HART (total/overall) ratings in calculating QoC. These are player ratings that take into account both quality of teammates and quality of competition (here is a brief explanation of what these ratings are). The HARO QoC, HARD QoC and HART QoC metrics are simply the average HARO, HARD and HART ratings of a player’s opponents.

Here are the top 10 forwards in HARO QoC last year:

| Player | Team | HARO QoC |
|---|---|---|
| Patrick Dwyer | Carolina | 6.0 |
| Brandon Sutter | Carolina | 5.9 |
| Travis Moen | Montreal | 5.8 |
| Tomas Plekanec | Montreal | 5.8 |
| Marcel Goc | Florida | 5.6 |
| Carl Hagelin | NY Rangers | 5.5 |
| Ryan Callahan | NY Rangers | 5.3 |
| Brooks Laich | Washington | 5.3 |
| Michael Grabner | NY Islanders | 5.2 |
| Patrik Elias | New Jersey | 5.2 |

There are a lot of similarities to the OppGF20 list, with the eastern conference dominating. There are a few changes but not many, which is no big surprise to me. There is very little evidence that QoC has a significant impact on a player’s statistics, so accounting for the opponents’ own QoC does not change the opponents’ stats much, and thus does not change a player’s QoC much either. That said, I believe these should produce slightly better QoC ratings. Also note that a 6.0 HARO QoC indicates that the opposing players are expected to produce a 6.0% boost over the league average GF20.

Here are the top 10 forwards in HARD QoC last year:

| Player | Team | HARD QoC |
|---|---|---|
| Jamal Mayers | Chicago | 6.0 |
| Marcus Kruger | Chicago | 5.9 |
| Mark Letestu | Columbus | 5.8 |
| Tim Jackman | Calgary | 5.3 |
| Colin Fraser | Los Angeles | 5.2 |
| Cory Emmerton | Detroit | 5.2 |
| Matt Belesky | Anaheim | 5.2 |
| Kyle Chipchura | Phoenix | 5.1 |
| Andrew Brunette | Chicago | 5.1 |
| Colton Gilles | Columbus | 5.0 |

And now the top 10 forwards in HART QoC last year:

| Player | Team | HART QoC |
|---|---|---|
| Dave Bolland | Chicago | 3.2 |
| Martin Havlat | San Jose | 3.0 |
| Mark Letestu | Columbus | 2.5 |
| Jeff Carter | Los Angeles | 2.5 |
| Derick Brassard | Columbus | 2.5 |
| Rick Nash | Columbus | 2.4 |
| Mike Fisher | Nashville | 2.4 |
| Vaclav Prospal | Columbus | 2.2 |
| Ryan Getzlaf | Anaheim | 2.2 |
| Viktor Stalberg | Chicago | 2.1 |

Shots and Corsi based QoC

You can also find similar QoC stats on stats.hockeyanalysis.com using shots as the base stat or using corsi (shots + shots that missed the net + shots that were blocked), but they work the same way as those above, so I won’t go into them in any detail.

CorsiRel QoC

The most commonly used QoC metric currently seems to be CorsiRel QoC (found on behindthenet.ca), but in my opinion this is not so much a QoC metric as a ‘usage’ metric. CorsiRel is a statistic that compares the team’s corsi differential when the player is on the ice to the team’s corsi differential when the player is not on the ice. CorsiRel QoC is the average CorsiRel of all the player’s opponents.

The problem with CorsiRel is that good players on a bad team with little depth can put up really high CorsiRel stats compared to similarly good players on a good team with good depth, because it essentially compares a player to his teammates. The more good teammates you have, the more difficult it is to put up a good CorsiRel. So, on any given team the players with a good CorsiRel are the best players on that team, but you can’t compare the CorsiRel of players on different teams because the quality of the teams could be different.

Because CorsiRel is flawed, CorsiRel QoC ends up being flawed too. For players on the same team, the player with the highest CorsiRel QoC plays against the toughest competition, so in this sense it tells us who is getting the toughest minutes on the team, but again CorsiRel QoC is not really that useful when comparing players across teams. For these reasons I consider CorsiRel QoC more of a tool for seeing how a player is used compared to his teammates, and not, in my opinion, a true QoC metric.

I may be biased, but in my opinion there is no reason to use CorsiRel QoC anymore. Whether you use OppGF20, OppGA20, OppGF%, HARO QoC, HARD QoC or HART QoC, or any of their shot/fenwick/corsi variants, they should all produce better QoC measures that are comparable across teams (the major drawback of CorsiRel QoC).
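The depth problem can be illustrated with a small Python sketch. It assumes CorsiRel is computed as the team's on-ice corsi differential per 60 minutes minus its off-ice differential per 60, which is the common formulation. The same hypothetical on-ice results are then placed on a shallow team and a deep team.

```python
def corsi_diff_per60(cf: int, ca: int, toi: float) -> float:
    """Corsi differential (CF - CA) per 60 minutes."""
    return 60.0 * (cf - ca) / toi


def corsi_rel(on_ice, off_ice):
    """On-ice minus off-ice corsi differential per 60.

    `on_ice` and `off_ice` are (CF, CA, TOI in minutes) tuples.
    """
    return corsi_diff_per60(*on_ice) - corsi_diff_per60(*off_ice)


# Identical hypothetical on-ice results: +5.0 per 60 with the player on the ice
player_on_ice = (550, 500, 600.0)
shallow_team_off = (1000, 1150, 1800.0)  # team is outshot badly without him (-5.0 per 60)
deep_team_off = (1200, 1100, 1800.0)     # team outshoots opponents without him too

rel_shallow = corsi_rel(player_on_ice, shallow_team_off)
rel_deep = corsi_rel(player_on_ice, deep_team_off)
print(f"CorsiRel on a shallow team: {rel_shallow:+.1f}")
print(f"CorsiRel on a deep team:    {rel_deep:+.1f}")
```

The player's own on-ice numbers are identical in both cases, yet his CorsiRel is several times higher on the shallow team. Any QoC metric averaged from CorsiRel inherits that team dependence.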

 

Mar 14 2013
 

I often see people using zone starts and/or quality of competition to justify a player’s unexpectedly poor or unexpectedly good play: Player X has a bad goal or corsi ratio because he plays all the tough minutes (i.e. the defensive zone starts and against the opposition’s best lines). I am pretty certain that quality of competition is vastly overemphasized (everyone plays against everyone to some extent) and vastly overshadowed by individual skill and quality of teammates, and I think the same is true of zone starts.

Eric Tulsky at NHL Numbers posted a good review of the research into the effects of zone starts on corsi statistics, and I recommend people give it a read. I want to look into the issue a little further, though. Most attempts to identify the impact of zone starts on a player’s stats have either inferred it from league-wide correlations or counted how many shots are taken after a zone face off. Both approaches have their faults. As Eric Tulsky pointed out, correlating every player’s corsi with his zone start stats doesn’t take into account that it is usually the top line players who get the offensive zone starts, so it likely overestimates the impact, since these players take more shots regardless of their zone starts. Eric Tulsky also counted the number of fenwick events that occur between an offensive zone face off and the time the puck leaves the offensive zone, and estimated it at 0.31 per offensive zone face off. This would imply that every extra offensive zone start a player takes is worth 0.31 fenwick events. Of course, this doesn’t take into account that the best offensive players in the league typically get more offensive zone starts, and it also doesn’t consider what happens after the puck leaves the zone. If the puck leaves the zone under the opposing team’s control, there is probably a negative fenwick effect for the next several seconds of play, reducing the 0.31 number further.

I want to get beyond these issues by looking at how zone starts affect individual players. I have previously argued that by 10 seconds after an offensive or defensive zone face off, the majority of the benefit (or penalty) of that face off has worn off. I wanted to go a bit further to be sure there is no residual effect, so I chose to conduct this analysis using a 45 second cutoff. Any time within 45 seconds of an offensive or defensive zone face off, with no other stoppage in play, is eliminated in my face off adjusted data. This should eliminate pretty much every second of every shift that started with an offensive or defensive zone face off, leaving just the play that occurred after neutral zone face offs or on-the-fly changes. I am going to call this F45 ice time, and it represents ice time that is not in any way affected by zone starts. With this in mind, I will look at the differences between straight 5v5 stats and F45 stats, and those differences will indicate how significantly zone starts impact a player’s stats.

To do this I will look at both fenwick for and fenwick against (FF20 and FA20) on a per 20 minutes of ice time basis. It should be noted that corsi rates are about 7.5% higher during F45 play (goal rates are ~15% higher!), so I will reduce the F45 rates by 7.5% to account for this and make a fair comparison (previous zone start studies may have been impacted by this as well). Now, let’s take a look at eight players (Manny Malhotra, Dave Bolland, Brian Boyle, Jay McClement, Tanner Glass, Brandon Sutter, Adam Hall, and Taylor Pyatt) with an excess of defensive zone starts.
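The F45 filter described above can be sketched in Python for a single period of game time. This is a simplified sketch under stated assumptions. The face off list format and the function names are hypothetical, zones are given from the team's perspective, and a real player-level analysis would also need to intersect these windows with the player's shifts.

```python
def f45_exclusion_windows(faceoffs, period_end, cutoff=45.0):
    """Return (start, end) windows, in seconds of game time, excluded from F45 play.

    `faceoffs` is a time-sorted list of (time, zone) tuples for one period, where
    zone is 'O', 'D' or 'N' from the team's perspective. Every face off follows a
    stoppage, so a window ends at the cutoff or at the next face off, whichever
    comes first.
    """
    windows = []
    for i, (t, zone) in enumerate(faceoffs):
        if zone not in ("O", "D"):
            continue  # neutral zone face offs do not start an exclusion window
        next_faceoff = faceoffs[i + 1][0] if i + 1 < len(faceoffs) else period_end
        windows.append((t, min(t + cutoff, next_faceoff)))
    return windows


def is_f45(event_time, windows):
    """True if an event happened outside every exclusion window."""
    return all(not (start <= event_time < end) for start, end in windows)


def f45_seconds(windows, period_length):
    """Game time left in the period after removing the exclusion windows."""
    return period_length - sum(end - start for start, end in windows)


# Hypothetical 20-minute period: face offs at 0 s (N), 100 s (O), 120 s (D), 400 s (N)
faceoffs = [(0.0, "N"), (100.0, "O"), (120.0, "D"), (400.0, "N")]
windows = f45_exclusion_windows(faceoffs, period_end=1200.0)
print(windows)
print(f45_seconds(windows, 1200.0))
print(is_f45(130.0, windows), is_f45(170.0, windows))
```

The offensive zone face off at 100 s is cut short by the stoppage before the 120 s face off, which then starts its own 45-second window.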

Player OZ% DZ% OZ%-DZ% FF20 FA20 FF%
Malhotra 12.2 54.6 -42.4 -3.09% 1.09% -1.0%
Bolland 19.8 40.5 -20.7 8.94% -5.25% 3.5%
B. Boyle 21.0 40.2 -19.2 2.87% 8.74% 0.3%
McClement 24.8 41.9 -17.1 -0.31% 1.34% -0.4%
Glass 20.5 37.1 -16.6 4.39% -6.00% 2.6%
Sutter 23.1 36.6 -13.5 -2.67% 2.32% -1.2%
Hall 20.7 33.9 -13.2 -4.06% 4.59% -2.2%
Pyatt 24.0 36.4 -12.4 0.38% -0.25% 0.2%
Average 20.8 40.2 -19.4 0.81% 0.82% 0.23%

The FF20 and FA20 columns show the % change from 5v5 play to F45 play, and the FF% column shows F45 FF% minus 5v5 FF% (in percentage points). The averages are straight averages, not weighted by ice time or zone starts. For players with a significant defensive zone bias, we would expect their F45 play to show an increase in FF20 and a decrease in FA20, resulting in an increase in FF%. Only Bolland, Glass and Pyatt saw all three move in the expected direction, and only 11 of the 24 individual changes went the way we would expect, so this isn’t the majority of the time. It is actually kind of surprising that these heavily defensive zone start biased players didn’t see a significant and systematic improvement in their fenwick rates.
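Here is a small Python tally of the table above that counts how often each change moved in the expected direction. The data is copied from the table; the `sign` parameter is an illustrative convenience so the same check can be run on the offensive zone start table.

```python
# (FF20 % change, FA20 % change, FF% change) from the defensive zone start table
defensive = {
    "Malhotra": (-3.09, 1.09, -1.0), "Bolland": (8.94, -5.25, 3.5),
    "B. Boyle": (2.87, 8.74, 0.3), "McClement": (-0.31, 1.34, -0.4),
    "Glass": (4.39, -6.00, 2.6), "Sutter": (-2.67, 2.32, -1.2),
    "Hall": (-4.06, 4.59, -2.2), "Pyatt": (0.38, -0.25, 0.2),
}

def expected_moves(row, sign):
    """Count changes in the expected direction.

    sign=+1 for defensive zone start players (expect FF20 up, FA20 down, FF% up),
    sign=-1 for offensive zone start players (expect the reverse).
    """
    ff20, fa20, ffpct = row
    return sum([sign * ff20 > 0, sign * fa20 < 0, sign * ffpct > 0])

counts = {name: expected_moves(row, sign=+1) for name, row in defensive.items()}
print(sum(counts.values()))                            # 11
print(sorted(n for n, c in counts.items() if c == 3))  # ['Bolland', 'Glass', 'Pyatt']
```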

Now, let’s look at eight players (Henrik Sedin, Patrick Kane, Marian Gaborik, Justin Abdelkader, Kyle Wellwood, Tomas Vanek, John Tavares and Jason Arnott) who had a heavy offensive zone start bias.

Player OZ% DZ% OZ%-DZ% FF20 FA20 FF%
H. Sedin 49.3 16.2 33.1 -3.72% 1.81% -1.4%
P. Kane 41.4 20.3 21.1 5.94% 4.66% 0.3%
Gaborik 39.0 22.8 16.2 0.60% 2.32% -0.4%
Abdelkader 37.5 26.0 11.5 3.93% 3.49% 0.1%
K. Wellwood 36.9 27.6 9.3 4.54% -2.32% 1.7%
Vanek 36.2 27.2 9.0 -3.39% 1.06% -1.1%
Tavares 35.8 27.2 8.6 -2.39% 1.83% -1.0%
Arnott 36.4 28.0 8.4 -3.41% 1.81% -1.3%
Average 39.1 24.4 14.7 0.26% 1.83% -0.39%

For offensive zone start biased players, we would expect their FF20 to decrease, their FA20 to increase and their FF% to decrease when we remove their zone start bias. This is mostly true for FA20 (only Wellwood deviated from expectations) but less true for FF20 and FF%, and overall the adjustments were relatively minor. Henrik Sedin saw the largest drop in FF%, but it only took him from a 55.2% fenwick player to a 53.8% fenwick player, which is still pretty good. This could very well be an upper bound on the benefit of excessive offensive zone starts.
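The FF% column follows directly from the two rate changes, since FF% = FF20 / (FF20 + FA20). This Python sketch (the function name is hypothetical) shows that Sedin’s −3.72% FF20 and +1.81% FA20 changes are consistent with the move from 55.2% to 53.8%.

```python
def ffpct_after(base_ffpct, ff20_change, fa20_change):
    """FF% after applying % changes to FF20 and FA20.

    The baseline FF%/FA% split stands in for the rates, because FF% only
    depends on the ratio of FF20 to FA20.
    """
    ff = base_ffpct * (1 + ff20_change / 100)
    fa = (100 - base_ffpct) * (1 + fa20_change / 100)
    return 100 * ff / (ff + fa)

print(round(ffpct_after(55.2, -3.72, 1.81), 1))  # 53.8
```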

Eric Tulsky also presented a paper at the recent Sloan Sports Analytics Conference in which he suggested that a successful zone entry by carrying the puck in is worth upwards of 0.60 fenwick events, versus upwards of 0.28 on a dump in. As pointed out earlier, Eric Tulsky counted 0.31 fenwick events between an offensive zone face off and the puck clearing the zone, and if the other team clears the zone with control of the puck, it is certainly possible that they will generate almost as many shots on their subsequent counter-rush, negating much of the benefit of the offensive zone start. Without studying zone exits and how frequently they result in successful entries into the opposing team’s end we won’t know for sure, but the data shown above indicates that this might be the case.
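The counter-rush argument can be written as a back-of-the-envelope expected value. In this Python sketch, the 0.31, 0.60 and 0.28 figures are Tulsky’s numbers quoted above. The two probabilities are hypothetical inputs, not measured values. The sketch also simplifies by treating every controlled exit as leading to an entry at the other end.

```python
OZ_FACEOFF_FENWICK = 0.31  # fenwick from an O-zone face off until the puck leaves the zone
CARRY_IN_FENWICK = 0.60    # approximate value of a carry-in entry ("upwards of")
DUMP_IN_FENWICK = 0.28     # approximate value of a dump-in entry

def net_oz_start_value(p_controlled_exit, p_carry_given_exit):
    """Hypothetical net fenwick value of an offensive zone start.

    p_controlled_exit: assumed chance the opponent clears the zone with control
    p_carry_given_exit: assumed chance that such an exit becomes a carry-in entry
        (otherwise it is treated as a dump in)
    """
    counter_rush = p_controlled_exit * (p_carry_given_exit * CARRY_IN_FENWICK
                                        + (1 - p_carry_given_exit) * DUMP_IN_FENWICK)
    return OZ_FACEOFF_FENWICK - counter_rush
```

With, say, a 50% controlled exit rate and 60% of those becoming carry-ins, the net value falls from 0.31 to roughly 0.07.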

The next question worth exploring is: if there is no significant benefit to starting your offensive players in the offensive zone, is there a penalty? For example, might it be better for the Canucks to start the Sedins solely in the defensive and neutral zones, on the theory that their talent with the puck will let them carry the puck into the offensive zone more frequently, which, as Eric Tulsky showed, more frequently results in shots and goals? I am not certain of that, but it might be worthy of further investigation. I suspect, again, that any benefit or penalty of any zone start deployment will largely be overshadowed by the players’ individual ability and the quality of their linemates. The ability to win puck battles, control the puck and move it up the ice is the real driver of stats, not the usage of the player.

All of this is to say that coaching strategy (at least player usage strategy) is probably not a significant factor in the statistical performance of players or the outcomes of games. I suspect, as I previously found, that the majority of the benefit of an offensive zone start comes from situations where you win the face off and take a shot that results in either a goal or the goalie catching or covering the puck for another face off. If the play goes beyond that, individual talent (puck retrieval, for example) takes over and the opposition gets an opportunity to counter attack. This is why, as I previously determined, eliminating the first 10 seconds after a face off is sufficient to remove the majority of the effects of a zone start, and even then the effects are probably not as significant as we think they should be.

 

Mar 11, 2013
 

There has been a fair bit of talk recently about Tyler Bozak and what the Leafs should do with him. He is clearly not suited to his #1C role, but he is set to become a UFA this summer, and if the Leafs intend to keep him he’ll need a new contract. To get an idea of his worth, I decided to see if I could identify a few comparable players.

Let’s start offensively. The first thing I looked at was primary points per 60 minutes of 5v5 ice time (primary points = goals + first assists). From last season through this past weekend’s games, Bozak had a PrPts/60 of 1.085, so as an initial cut off I pared the list of comparable players down to forwards with a PrPts/60 between 1.00 and 1.20 and at least 1000 minutes of ice time. There are some pretty good players on this list, such as Ryan Getzlaf, Stephen Weiss, Tomas Plekanec and Daniel Briere, but there are also some less talented players like Eric Nystrom and Marcel Goc.
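The first cut can be expressed as a simple filter. This Python sketch assumes hypothetical per-player 5v5 totals (goals, first assists, ice time in minutes) and position labels. The 1.00–1.20 band and the 1000 minute minimum are the thresholds from the text.

```python
from dataclasses import dataclass

@dataclass
class Skater:
    name: str
    position: str        # hypothetical labels: "C", "LW", "RW", "D"
    goals: int           # 5v5 goals
    first_assists: int   # 5v5 first assists
    toi_minutes: float   # 5v5 ice time

    @property
    def prpts60(self):
        """Primary points (goals + first assists) per 60 minutes of 5v5 ice time."""
        return 60.0 * (self.goals + self.first_assists) / self.toi_minutes

def comparables(skaters, low=1.00, high=1.20, min_toi=1000):
    """Forwards with enough ice time whose PrPts/60 falls in [low, high]."""
    return [s for s in skaters
            if s.position != "D" and s.toi_minutes >= min_toi
            and low <= s.prpts60 <= high]
```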

The next thing I considered is Primary Points Percentage (PrPts%), the percentage of goals scored while the player was on the ice on which he recorded a goal or first assist. Tyler Bozak’s PrPts% is a relatively weak 41.24% (Getzlaf’s, for example, is 52.38% and Plekanec’s is 56.22%). I then pared the list down to just centers, and this is what I came up with as comparable offensive centers, sorted by PrPts%.
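A minimal Python sketch of the second step follows. It assumes PrPts% is primary points divided by on-ice goals for, and uses a hypothetical tuple layout for the rows.

```python
def prpts_pct(goals, first_assists, on_ice_goals_for):
    """Share of on-ice goals for on which the player had a goal or first assist."""
    return 100.0 * (goals + first_assists) / on_ice_goals_for

def rank_centers(rows):
    """rows: (name, position, goals, first_assists, on_ice_goals_for) tuples.

    Keeps centers only and sorts them by PrPts%, highest first.
    """
    centers = [(name, prpts_pct(g, a1, gf))
               for name, pos, g, a1, gf in rows if pos == "C"]
    return sorted(centers, key=lambda row: row[1], reverse=True)
```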

Player Team PrPts/60 PrPts%
NIELSEN, FRANS NY Islanders 1.091 47.98%
SMITH, ZACK Ottawa 1.008 46.67%
VERMETTE, ANTOINE Phoenix 1.173 46.55%
LETESTU, MARK Columbus 1.138 46.32%
NUGENT-HOPKINS, RYAN Edmonton 1.182 46.14%
ZUBRUS, DAINIUS New Jersey 1.12 45.31%
KRUGER, MARCUS Chicago 1.115 43.78%
HANZAL, MARTIN Phoenix 1.078 42.27%
STAJAN, MATT Calgary 1.064 41.87%
BOZAK, TYLER Toronto 1.085 41.24%
KOIVU, SAKU Anaheim 1.15 38.49%

That is a list of mostly 2nd and 3rd line centers, along with the not yet fully developed Nugent-Hopkins. So, what about Bozak defensively? To evaluate defensive play I looked at each player’s 5v5 corsi events against per 20 minutes (CA20) and the ratio of the player’s CA20 to his teammates’ CA20 when they are not playing with him (TMCA20). This tells us whether the player’s teammates’ defensive stats improve while they are on the ice with him.
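The ratio can be sketched as follows in Python. The text does not say how teammates are combined into TMCA20. This sketch assumes their corsi against and ice time apart from the player are pooled, which weights each teammate by ice time. The function names are hypothetical.

```python
def ca20(corsi_against, toi_minutes):
    """Corsi events against per 20 minutes of ice time."""
    return 20.0 * corsi_against / toi_minutes

def ca20_ratio(player_ca, player_toi, teammates_apart_ca, teammates_apart_toi):
    """Player's on-ice CA20 divided by his teammates' CA20 when apart from him.

    teammates_apart_* are pooled totals (an assumption). A value below 1.0
    means fewer shot attempts against with the player than without him.
    """
    return ca20(player_ca, player_toi) / ca20(teammates_apart_ca, teammates_apart_toi)
```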

Player Name Team CA20 CA20/TMCA20
ZUBRUS, DAINIUS New Jersey 14.309 0.77
LETESTU, MARK Columbus 17.034 0.90
STAJAN, MATT Calgary 17.312 0.91
HANZAL, MARTIN Phoenix 18.122 0.93
VERMETTE, ANTOINE Phoenix 17.762 0.97
NIELSEN, FRANS NY Islanders 18.307 1.01
KOIVU, SAKU Anaheim 17.114 1.02
SMITH, ZACK Ottawa 18.771 1.04
KRUGER, MARCUS Chicago 15.940 1.05
BOZAK, TYLER Toronto 21.155 1.08

For CA20/TMCA20, lower is better, as it indicates that the player’s linemates’ CA20 is better with him than without him. Bozak ranks dead last in this category and also ranks dead last (by a significant margin) in CA20.

So, what does this tell us about Tyler Bozak? It probably means he has 3rd line offensive ability, but it is very questionable whether he is good enough defensively to be a useful 3rd liner. As for the best comparable to Tyler Bozak, I’d have to say Marcus Kruger, Matt Stajan or maybe Frans Nielsen, but Bozak is probably somewhat below all of them in value due to his poor defensive play.

 

Mar 06, 2013
 

One of the surprise player performances so far this season is that of Jakub Voracek. Voracek currently sits tied for 7th in points with 10 goals and 27 points in 24 games. That puts him on pace for 54 points in this lockout-shortened 48 game season, which is 4 points more than he has scored in any 82 game season (his career best was 50 points in 81 games in 2009-10).

Last season when Rick Nash was on the trade block I wrote an article about Nash and in it I had a few comments about Jakub Voracek as part of a WOWY analysis. Here is what I wrote:

Nash played best when he was paired up with Voracek and Brassard and only Voracek, Brassard and Huselius made Nash a better offensive player when playing with him.  Vermette, Umberger and Malhotra were drags on his offensive numbers.  When playing apart, Voracek’s numbers are better than Nash’s.  Same for Brassard’s (who is doing it again this year, 0.782 GF20 vs Nash’s 0.613 when apart).  As an aside, the numbers suggest that Voracek is a very good offensive player  and it was probably a big mistake to trade him.  It also suggest that the Flyers aren’t getting full value from him by playing him primarily with Maxime Talbot.  If someone acquired Voracek and put him in the right situations, he could be the next Joffrey Lupul.

Voracek wasn’t traded, but the departures of James van Riemsdyk and Jaromir Jagr opened up some spots on the top two lines, and Voracek got a promotion from playing mostly with Talbot to playing with Claude Giroux and getting lots of power play time. The results of that move are, as I predicted, very Joffrey Lupul-like. Lupul put up solid but unspectacular numbers while mostly being given second line minutes and secondary power play minutes for the majority of his career. His numbers looked unspectacular but were actually quite good considering his usage as a secondary offensive player and the quality of linemates he played with. When Lupul came to Toronto and was put on a line with another elite offensive player and given first line and first power play unit minutes, he started putting up high end offensive numbers. It wasn’t so much that Lupul had a breakout season or a career year; it’s more that he was finally given an opportunity to play with top end talent and given first line minutes. The exact same thing happened with Voracek. He put up solid numbers while given secondary minutes in secondary offensive roles and just needed a chance to prove his worth as a first line player with quality linemates. Now he has been given that chance, and the results are clear: he is a high end offensive talent.

 

Feb 27, 2013
 

The last several days I have been playing around a fair bit with team data, analyzing various metrics for their usefulness in predicting future outcomes, and I have come across some interesting observations. Specifically, with more years of data, fenwick becomes significantly less valuable while goals and the percentages become more valuable. Let me explain.

Let’s first look at the year over year correlations in the various stats themselves.

Y1 vs Y2 Y12 vs Y34 Y123 vs Y45
FF% 0.3334 0.2447 0.1937
FF60 0.2414 0.1635 0.0976
FA60 0.3714 0.2743 0.3224
GF% 0.1891 0.2494 0.3514
GF60 0.0409 0.1468 0.1854
GA60 0.1953 0.3669 0.4476
Sh% 0.0002 0.0117 0.0047
Sv% 0.1278 0.2954 0.3350
PDO 0.0551 0.0564 0.1127
RegPts 0.2664 0.3890 0.3744

The above table shows the r^2 between past events and future events. The Y1 vs Y2 column is the r^2 between subsequent years (i.e. 0708 vs 0809, 0809 vs 0910, 0910 vs 1011, 1011 vs 1112). The Y12 vs Y34 column is a 2 year vs 2 year r^2 (i.e. 07-09 vs 09-11 and 08-10 vs 10-12), and the Y123 vs Y45 column is a 3 year vs 2 year comparison (i.e. 07-10 vs 10-12). RegPts is points earned during regulation play (using the win-loss-tie point system).

As you can see, with increased sample size, the fenwick stats’ ability to predict future fenwick stats diminishes, particularly for fenwick for and fenwick %. The other stats generally get better with increased sample size, except for shooting percentage, which has no predictive power for future shooting percentage.

The increased predictive nature of the goal and percentage stats with increased sample size makes perfect sense as the increased sample size will decrease the random variability of these stats but I have no definitive explanation as to why the fenwick stats can’t maintain their predictive ability with increased sample sizes.

Let’s take a look at how well each statistic correlates with regulation points using various sample sizes.

1 year 2 year 3 year 4 year 5 year
FF% 0.3030 0.4360 0.5383 0.5541 0.5461
GF% 0.7022 0.7919 0.8354 0.8525 0.8685
Sh% 0.0672 0.0662 0.0477 0.0435 0.0529
Sv% 0.2179 0.2482 0.2515 0.2958 0.3221
PDO 0.2956 0.2913 0.2948 0.3393 0.3937
GF60 0.2505 0.3411 0.3404 0.3302 0.3226
GA60 0.4575 0.5831 0.6418 0.6721 0.6794
FF60 0.1954 0.3058 0.3655 0.4026 0.3951
FA60 0.1788 0.2638 0.3531 0.3480 0.3357

Again, the values are r^2 with regulation points. Nothing too surprising there, except maybe that team shooting percentage is so poorly correlated with winning, given that at the individual level shooting percentage is clearly highly correlated with goal scoring. It seems apparent from the table above that team save percentage is a significant factor in winning (or, as my fellow Leafs fans can attest, lack of save percentage is a significant factor in losing).

The final table I want to look at shows how well a few of the stats predict future regulation point totals.

Y1 vs Y2 Y12 vs Y34 Y123 vs Y45
FF% 0.2500 0.2257 0.1622
GF% 0.2214 0.3187 0.3429
PDO 0.0256 0.0534 0.1212
RegPts 0.2664 0.3890 0.3744

The values are r^2 with future regulation point totals. Regardless of time frame used, past regulation time point totals are the best predictor of future regulation time point totals. Single season FF% is slightly better at predicting following season regulation point totals but with 2 or more years of data GF% becomes a significantly better predictor as the predictive ability of GF% improves and FF% declines. This makes sense as we earlier observed that increasing sample size improves GF% predictability of future GF% while FF% gets worse and that GF% is more highly correlated with regulation point totals than FF%.

One thing that is clear from the above tables is that defense has been far more important to winning than offense. GF60 and Sh% trail their defensive counterparts (GA60 and Sv%) significantly, both in how well they correlate with winning and in how consistent they are from year to year. Fenwick is the partial exception: FF60 correlates slightly better with regulation points than FA60 does, but FA60 is much more consistent from year to year. Defense and goaltending win in the NHL.

What is interesting, though, is that this largely differs from what we see at the individual level, where there is much more variation in the offensive stats, indicating that individual players have more control over the offensive side of the game. This might suggest that team philosophies drive the defensive side of the game (i.e. how defensive-minded the team is, its playing style, etc.), while the offensive side is driven more by the offensive skill level of the individual players. At the very least it is something worthy of further investigation.

The last takeaway from this analysis is the declining predictive value of fenwick/corsi with increased sample size. I am not quite sure what to make of this, and if anyone has any theories I’d be interested in hearing them. One theory I have is that fenwick rates are not part of the average GM’s personnel decisions, so over time, as players come and go, a team’s fenwick rates will drift. If this is the case, it may represent an area of value that a GM could exploit.

 

Feb 18, 2013
 

I have some new and exciting enhancements to stats.hockeyanalysis.com for you all today. Charts, Charts, and more Charts.

Before we get to the charts, though, let me mention that I have made some modifications to my HARO, HARD and HART ratings. Most of the change is to the scale and presentation rather than the actual formula (though there were some tweaks there too). Instead of 1.00 representing an average player, 0 now does, and the scale has been multiplied by 100 to represent a percentage rather than a ratio. So the [Shot,Fenwick,Corsi]HARO offensive ratings should now be interpreted to mean that when the player was on the ice, his team had x% (where x is his rating) more goals [shots, fenwick, corsi] for than expected (as determined by his quality of teammates and quality of competition). A positive value means more goals were scored than expected, and a negative value means fewer goals were scored than expected. In other words, a positive value indicates the player boosted his team’s offensive performance, while a negative value means he was a drag on his team’s offense.

For the defensive [Shot,Fenwick,Corsi]HARD ratings the effect is opposite. The HARD ratings should be interpreted to mean that when the player was on the ice his team gave up x% (where x is his rating) fewer goals [shots, fenwick, corsi] than expected (as determined by quality of teammates and opposition). So a HARO rating of 10 indicates the player boosted his team’s expected goal scoring rate by 10%, and a HARD rating of 10 indicates he reduced his team’s expected goals against rate by 10%. The [Shot,Fenwick,Corsi]HART ratings are simply the average of the HARO and HARD ratings.
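The new scale can be illustrated with a few lines of Python. This sketch only covers how a rating is read. The expected rates come from the quality of teammates and competition model behind the ratings, which is not reproduced here, so they are simply passed in as inputs. The function names are hypothetical.

```python
def haro(actual_for, expected_for):
    """% more goals (or shots/fenwick/corsi) for than expected; 0 = average."""
    return 100.0 * (actual_for / expected_for - 1)

def hard(actual_against, expected_against):
    """% fewer goals (or shots/fenwick/corsi) against than expected; 0 = average."""
    return 100.0 * (1 - actual_against / expected_against)

def hart(haro_rating, hard_rating):
    """Overall rating: the simple average of the offensive and defensive ratings."""
    return (haro_rating + hard_rating) / 2
```

For example, a player whose team scores 2.75 goals per 60 with him on the ice when 2.5 were expected gets a HARO of about 10.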

Now on to the more exciting news: the charts. We all love charts, so I have added a bunch for you to enjoy. When you go to a player page now (e.g. Zdeno Chara) you will find a link named Visualize performance over time. Clicking this link will give you a visual representation of the player’s performance over the past several seasons, starting in 2007-08 if his career was active then. For example, here are Zdeno Chara’s performance charts. For forwards and defensemen there are 5 charts.

  1. Point production (G/60, A/60, First A/60 and Points/60)
  2. Individual shot, fenwick and corsi rates (shot/60, ifenwick/60, icorsi/60)
  3. HARO, HARD, FenHARO and FenHARD ratings
  4. GoalsFor%, ShotsFor%, FenwickFor% and CorsiFor%
  5. Zone Start %

This should give you a quick visualization of each player’s performance and how it has changed over time.

For goalies (e.g. Roberto Luongo), the only chart I have right now is of 5v5 zone start adjusted save percentage.

Maybe the charts that will generate the most interest, though, are the new WOWY charts (sure to make you scream “WOWY!!!”). To access them, simply go to a WOWY data page and click the “Visualize This Table” link at the top of the WOWY table (only for ‘with you’ WOWY, not ‘against you’). This will give you two WOWY bubble charts. The first plots teammate ‘with you’ GF% on the horizontal axis and teammate ‘without you’ GF% on the vertical axis. The second chart is the same but plots CF% instead. The size of each bubble reflects the total TOI with the player.

In these plots, good players will have the majority of their teammates’ bubbles below or to the right of the diagonal line running from the bottom left corner to the top right corner, and bad players will have the majority of their teammates’ bubbles above or to the left of that line. Players with a lot of teammates in the bottom right quadrant are really good because they are taking sub-par players and making them look good. Players with a lot of teammates in the upper left quadrant are bad because they make good players look bad.
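The reading rules above can be turned into a quick numeric summary. In this Python sketch, the “below the diagonal” share is weighted by TOI with the player, mirroring the bubble sizes. The quadrants are assumed to be split at 50%, which the charts themselves may not do. The data layout and function name are hypothetical.

```python
def wowy_summary(teammates, midpoint=50.0):
    """teammates: {name: (pct_with, pct_without, toi_with)} using GF% or CF%.

    Returns the TOI-weighted share of teammates below/right of the diagonal
    (better with the player than without), plus the teammates in the
    bottom-right and upper-left quadrants.
    """
    total_toi = sum(toi for _, _, toi in teammates.values())
    better_toi = sum(toi for w, wo, toi in teammates.values() if w > wo)
    bottom_right = [n for n, (w, wo, _) in teammates.items() if w > midpoint and wo < midpoint]
    upper_left = [n for n, (w, wo, _) in teammates.items() if w < midpoint and wo > midpoint]
    return better_toi / total_toi, bottom_right, upper_left
```

A share well above 0.5 with many bottom-right names fits the “great player” pattern described below; the reverse fits a drag on his teammates.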

For a look at two polar opposite players, compare Zdeno Chara’s WOWY charts with Jack Johnson’s WOWY charts (I have linked to the 3 year 5v5 ZS adjusted WOWY charts). Also, on Saturday I wrote a post about how bad Tyler Bozak is, and if you want more evidence of that, have a look at his 2 year WOWY charts. I am slowly becoming a big believer that WOWYs are where it’s at in evaluating players (though I guess I have always been a believer, as this is the core of my HARO, HARD and HART ratings). The great players are the ones who consistently make their teammates better. The good players are the ones who can really capitalize on playing with great players and don’t hold them back. The bad players are those who act as drags on their teammates. These WOWY charts are a quick and easy way of visualizing the different types of players. For the Leafs, Grabovski fits into the ‘great’ category, Kessel into the ‘good’ category and Bozak into the ‘bad’.

I have a few more ideas for charts and tables to add (I’ve got some ideas for more ‘usage’ type charts), but I think this will be the last major update for a while. That said, if you have any ideas of what you would like to see added, definitely let me know and I’ll see what I can do. As for the 2012-13 stats, it should be noted that they aren’t updated daily. I have been trying (fairly successfully so far) to update them every Monday, Wednesday and Friday morning, and I hope to continue that, but no guarantees.

Update: I know I said I wouldn’t do any more updates but I have made the WOWY charts better by adding WOWY charts for GF20, GA20, CF20 and CA20. Now we can easily see where a players strengths and weaknesses are (i.e. offense vs defense).