# Brendan Morrison and the failure of Corsi

Last night after news came out that Brendan Morrison had re-signed with the Calgary Flames, Kent Wilson tweeted the following:

Morrison back in Calgary. Check out his corsi tied rating fellow stats nerds: http://bit.ly/q1ywUj

The link is to the Calgary Flames 5v5 game-tied corsi ratings, which show Morrison had a 0.452 corsi rating (Corsi For %), dead last on the Flames. The problem with jumping to the conclusion that Morrison is bad is twofold:

1.  Corsi generally speaking isn’t good at evaluating players.

2.  One year of 5v5 game tied data is not enough to evaluate players, even with corsi.

Let’s take a look at Brendan Morrison over the past 4 years and I’ll show you exactly what I mean. First let’s look just at 5v5 play in any game score situation.

| Season(s) | CorF% | GF% |
|---|---|---|
| 2010-11 | 0.484 | 0.562 |
| 2009-10 | 0.514 | 0.627 |
| 2008-09 | 0.498 | 0.569 |
| 2007-08 | 0.430 | 0.500 |
| 2007-11 (4yr) | 0.491 | 0.577 |

In each and every year the goals for percentage is significantly higher than his corsi for percentage.  His corsi ratings make Morrison look mediocre at best but his goal ratings make him appear to be quite good.  This isn’t a fluke.  It is occurring systematically, every single season, over 4 seasons in which Morrison played for 5 different teams (Vancouver, Anaheim, Dallas, Washington, Calgary).
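To make the size of that gap concrete, here is a minimal sketch that subtracts CorF% from GF% for each row of the figures above. The numbers are copied from this post; the function name `gf_minus_corsi` is just an illustrative choice.

```python
# Season figures copied from the post above: (CorF%, GF%)
seasons = {
    "2010-11": (0.484, 0.562),
    "2009-10": (0.514, 0.627),
    "2008-09": (0.498, 0.569),
    "2007-08": (0.430, 0.500),
    "2007-11 (4yr)": (0.491, 0.577),
}


def gf_minus_corsi(rows):
    """Return GF% minus CorF% for each season, rounded to 3 decimals."""
    return {season: round(gf - cf, 3) for season, (cf, gf) in rows.items()}


gaps = gf_minus_corsi(seasons)
for season, gap in gaps.items():
    print(f"{season}: GF% exceeds CorF% by {gap:.3f}")
```

Every row comes out positive, which is the pattern described above. The sketch only restates the table; it says nothing on its own about why the gap exists.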

Now what about 5v5 game tied situations? Morrison’s 4 year game tied corsi for percentage is 0.482, while his 4 year game tied goals for percentage is 0.592 (which ranks 28th of 217 among forwards with at least 1000 5v5 game tied minutes over the past 4 seasons).

Personally, I’d rather have good goal ratings than good corsi ratings.  Morrison is a good signing by the Flames.

1. Here is the remedial math lesson to explain why this post and many others here are failures.

You wrote in the past:
We have a very simple equation.

Goals per 20 minutes = Shots per 20 minutes * Shooting percentage

You want to treat this as a linear equation. Now the equation of a line is written as y = mx + b where m is the slope of the line, b is the y-intercept, x is an independent variable and y is a dependent variable.

We have a slightly simpler situation in your attempted linear equation. There is no y-intercept so b=0. Thus we have a situation where our equation of a line is y = mx.

Now you want to treat both the number of shots and the shooting percentage as independent variables. Sometimes this is a valid thing to do if they truly are independent variables. One famous example where you can do this in science is Newton’s second law, F=ma. Mass (m) can be an independent variable and acceleration (a) can be an independent variable. You can change one arbitrarily without changing the other. Then you will get a force (F) that depends upon the m and a values.

There are equations you can write in a similar form where you can’t do this. A famous science equation that shows this principle is Einstein’s E=mc^2. We can treat m as an independent variable, but we cannot treat c^2 as an independent variable because it is a constant (the speed of light squared). We cannot vary it independently.

Now we have established (hopefully in a non-controversial way) that you can’t arbitrarily treat any variable in a linear (or linearized) equation as an independent variable.

A little more on topic, we can’t have an equation y = x f(x) where we treat x and f(x) as independent variables. This equation is not actually linear. The value of x sets f(x), and if we have a one-to-one mapping, f(x) sets the value of x (if it’s not a one-to-one mapping you have an inversion problem, but it doesn’t change anything significant).

Your equation is y = x f(x). Your equation is Goals per 20 minutes = Shots per 20 minutes * Shooting percentage

Shots is x. Shooting percentage is f(x). The shooting percentage clearly depends upon the number of shots. For most players, their shooting percentage will drop if they take more shots because the extra shots they are taking now are low percentage shots that they otherwise would have not taken. Similarly, a player can raise his shooting percentage by not taking some of the lower percentage shots they would otherwise take. There is a limit to this if you limit enough shots and, for example, only shoot on open nets, but the principle is clear. Shooting percentage is NOT independent from shots. It is a function of the number of shots. It is a complex function that varies from player to player, but it is a function nonetheless. Thus it is a failure to treat the equation as a linear equation even though you write it so that it looks like one. Many mathematical operations (such as correlation) assume linear equations in their interpretation, so they are incorrect to apply – but you do it anyway.

Now you have argued in the past that you use 4 years of data (or some other ridiculously large number) and you seem to think that you can now assume the variables are not linked (shooting percentage is not a function of number of shots). That is false and not a way out. If the variables are linked, they are linked whether we use 1 minute of data, 1 year of data or 1 millennium of data.
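A toy sketch of the shot-selection argument above may help make it concrete. It assumes a player takes his best chances first, so each extra shot is a lower percentage one. The chance probabilities and the names `CHANCE_QUALITY` and `shooting_line` are invented for illustration; they are not measured NHL data, and the sketch does not show that real players behave this way.

```python
# Hypothetical scoring chances per 20 minutes, best first. These probabilities
# are invented for illustration; they are not measured NHL data.
CHANCE_QUALITY = [0.30, 0.20, 0.12, 0.08, 0.05, 0.03, 0.02, 0.01]


def shooting_line(shots_taken, chances=CHANCE_QUALITY):
    """Expected goals and shooting percentage if the player shoots on his
    `shots_taken` best chances and passes up the rest (shots_taken >= 1)."""
    taken = chances[:shots_taken]
    expected_goals = sum(taken)
    shooting_pct = expected_goals / len(taken)
    return expected_goals, shooting_pct


results = []
for n in range(1, len(CHANCE_QUALITY) + 1):
    goals, pct = shooting_line(n)
    results.append((n, goals, pct))
    # goals equals n * pct by construction: y = x * f(x)
    print(f"shots={n}  sh%={pct:.3f}  expected goals={goals:.2f}")
```

Under these assumed numbers, shooting percentage falls as shots rise while expected goals still go up, which is what it means for f(x) to depend on x in y = x f(x). Whether real shot selection looks like this is exactly the point in dispute.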

1. For most players, their shooting percentage will drop if they take more shots because the extra shots they are taking now are low percentage shots that they otherwise would have not taken.

Really? What you are postulating is that shooting percentage is negatively correlated with shots taken: the more shots a player takes, the more of them must be low quality shots, and thus his shooting percentage will drop. Do players who take more shots really have lower shooting percentages? Prove it.

Actually, I’ll save you the time. The correlation for forwards between on ice shooting percentage and on ice fenwick for per 20 minutes is r^2=0.1169, and it is actually positively correlated. This means players who take more shots tend to produce slightly higher shooting percentages, which is the opposite of your hypothesis. This weak positive relationship actually isn’t surprising, and it is probably a result of good offensive players (i.e. those with good shooting percentages) getting more offensive zone starts, which drive up their shots per 20 minutes rates.

Similarly, a player can raise his shooting percentage by not taking some of the lower percentage shots they would otherwise take.

In theory yes but do you honestly believe Ovechkin is out there on the ice thinking “I can’t shoot from here because it will reduce my shooting percentage.” No, his only goal is to score a goal. That is his mission when he is on the ice. He isn’t out there thinking about his shooting rates or his shooting percentage just so he can mess up my math. He just wants to get a goal.

But ultimately, if your argument against using a goals based method for analyzing players is ‘you cannot do that because players can manipulate their shooting percentages’ then you are arguing against the use of corsi because corsi evaluates players using the assumption that all shots are created equal and have an equal chance of going in.

Of course, this is just a meaningless argument because I don’t evaluate players based on shooting percentage. I evaluate them based on goals for rates, and I have shown that goals for rates are as good at predicting future goals for rates as corsi with one year of data and better at predicting future goals for rates with 2 years of data. Hence, with 2 years of data, goal rates are the much better evaluator of talent.

2. David

You are being impossible. You are denying basic math.

I am merely saying that shooting percentage is a function of the number of shots taken. They are not independent variables. Stating the correlation between them is meaningless.

Unfortunately for you this is a serious flaw in your methodology. Your response is to stick your fingers in your ears and carry on.

1. “I am merely saying that shooting percentage is a function of the number of shots taken. They are not independent variables. Stating the correlation between them is meaningless.”

If there is no correlation between them, they are not linked – one has no connection with the other. That is basic math. The fact that there is a weak correlation means at best there is a weak dependence on each other (but a correlation does not imply dependence). Taking more shots does not necessarily mean a higher or lower shooting percentage. If I give Ovechkin more ice time he’ll get more shots, but that in no way affects his shooting ability (unless you want to bring up a fatigue argument).

But again, forget I ever brought up shooting percentage because as I have written elsewhere, you would never use shooting percentage as an evaluation metric because it has all the flaws of GF20 (dependent on small sample size of goals) but is less correlated with GF20 than GF20 (since GF20 has a 100% correlation with GF20).

So, let’s focus on the real point I want to make: the connection between GF20, FF20 and future GF20. With > 1 year of data, GF20 predicts future GF20 better than FF20 does. With > 1 year of data, GF20 is the better evaluator of goal scoring talent. At ~1 year of data, GF20 is about equivalent to FF20 as an evaluator of talent. Address that please, because ultimately that is my argument on why we should use goals over corsi/fenwick in player evaluation.

3. David

We still need to review some math, so here we go.

You write:

If there is no correlation between them, they are not linked – one has no connection with the other. That is basic math. The fact that there is a weak correlation means at best there is a weak dependence on each other (but a correlation does not imply dependence).

This is entirely false. I can give you a simple mathematical example to show this. What is the correlation between x and x^2? If we have sampled the functions sufficiently from -infinity to +infinity, the correlation is zero. Yet they are clearly linked variables. If we limit our range to positive values, we get a positive correlation. If we limit our range to negative numbers, we get a negative correlation. We can get very different correlations depending upon which values of x we choose to sample over.
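The x versus x^2 example can be checked numerically. The following is a minimal sketch using a hand-written Pearson correlation over three integer ranges; the sample ranges and the helper names `pearson_r` and `r_x_vs_x_squared` are illustrative choices, not anything taken from the analysis being debated.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_x_vs_x_squared(xs):
    """Correlation between x and x^2 over the sampled values xs."""
    return pearson_r(xs, [x * x for x in xs])


symmetric = list(range(-10, 11))  # sampled evenly around zero
positive = list(range(1, 11))     # positive values only
negative = list(range(-10, 0))    # negative values only

for label, xs in [("symmetric", symmetric), ("positive", positive), ("negative", negative)]:
    print(f"{label:>9}: r = {r_x_vs_x_squared(xs):+.3f}")
```

The symmetric range gives a correlation of essentially zero even though x^2 is completely determined by x, while the one-sided ranges give strongly positive and strongly negative values.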

Your problem here is that shooting percentage is a function of number of shots. I don’t know the explicit form of the function – it probably is different from one player to the next – but it is a clear function of the number of shots. As such when we find a correlation between number of shots and shooting percentage we are effectively finding the correlation between x and f(x). Depending upon the form of f(x) and the range we happen to be sampling over, we might be able to get any value. At any rate, the value we get is not something we can interpret the way you want to. Your interpretation assumes linear equations and we don’t have linear equations. We might have some very non-linear equations. We don’t exactly know. All we know for sure is they are not linear. Conclusions that assume linearity will fail.

If I give Ovechkin more ice time he’ll get more shots, but that in no way affects his shooting ability

This totally misses the point. Ovechkin is given as much ice time as he can handle. That is the only logical way for a team like Washington to use their best player – a player who is arguably the best player in the world.

If we play Ovechkin the amount of ice time he can handle and we tell him to shoot more, he will take more shots (if he listens). He will get more shots by taking more low quality shots he otherwise would have avoided. As a result, he will lower his shooting percentage. This is true of any player in the league and not just Ovechkin.

1. We can debate everything you have written but it is really irrelevant. Please address what matters: The relationship between GF20, FF20 and future GF20 and how GF20 predicts future GF20 better.

The fact is that the link between shooting % and shots taken is a matter of *theory*. One cannot discard a theory that they might be (for all intents and purposes) independent on the basis that one thinks a unit of analysis (Goals per 20: the foundation of David’s analysis) shouldn’t be linearized. By stating that they should be comprehended and explained in a complex and unknown *non-linear* way, one is simply bringing forth one’s own (admittedly vague) *theory* of shot% effect, not disproving another’s shot% effect *theory*.

Having looked at both your arguments on the dependent nature of shot% with shots taken, I have to say that I believe the best *theory* is that they are indeed (for all practical purposes) independent. Ontologically, they are probably slightly dependent like PSH states, but they are probably very much independent for all (positivistic) practical purposes. For instance, ontologically, I have no doubt that the dependence is real for some extremes in the set. Should Gomez shoot only slightly more, in the moments when you’re like “Shoot! god damn it!”, his shot% would probably increase because he misses out on some good environmental/strategic chances that he helped create, and these could presumably be finished at a higher rate than his total shot%. But this is a “shoot more only in those added-value chances” effect, not a link between shot% and shots taken. And you’ll notice that it goes contrary to the example PSH gave. A “shoot more” advice, presumably, is intended to have this effect on Gomez. If he executes it this way, he becomes a better player, as shown by the increase in his shot%. If, though, he indulges in more shooting overall and not parsimoniously, I believe that his shot% should be unchanged, with Gomez and surroundings staying equal. Of course the experience of the possible is impossible so we will most likely never know. But both of you guys’ work is IMHO definitely of the incremental variety, and is already allowing us to learn more.

In a way PSH’s theory has some ontological clout but his practical suggestion seems to be: find a player’s Goals per 20 Constant Potential (the link for a given player’s shot% and shots taken), then you will be able to rate players correctly. The *theory* I would choose in going about analyzing hockey is definitely that shot% and shots taken are independent. Please remember that within a positivist philosophy of science, theories and concepts are *instrumental*, i.e. they are not supposed to actually exist (ontologically). Even though this is conceptually a weakness of a positivist PoS, it is supposed to ironically lead us (though this is incremental and correlation is never a certain causation) to real existing forces/causations. In fact, even if I give PSH’s competing theory on shot% effect some ontological plausibility, I still think that David’s makes the most ontological sense. Shot% is a function of a very large number of decisions and executions taken with rockets on your feet in a strategically evolving environment. Shots taken is not its most relevant determinant. Nor are they its most relevant indicator; goals are.

4. The foundation on which you justify it is based upon the mathematical problems I have explained in these comments. You built your house on an unstable foundation. You want me to discuss how nice the bedroom window looks and I am telling you the house is about to fall over and that is far more important.

1. No. The reason why I prefer goal rates to corsi rates is because goal rates do a better job at predicting future goal rates. It has nothing to do with any other stuff I have discussed.

If you don’t like the math I used to show shooting percentage matters, so be it. I don’t care. It’s irrelevant. It’s just background stuff that led me to looking at goal rates as the predictor. If the math is flawed and I got there by mistake, so be it. Doesn’t matter. It would be like taking a wrong turn and taking a scenic route to the destination. Deal with the destination, not the route to get there.