Friday, May 23, 2014

Summer Project: Advanced Stats


My summer project is going to be really getting under the hood of advanced stats/analytics in hockey. Like I've said: professionally, I operate in a world of using stats to tell stories and analyze situations. I'm not resistant to the idea of advanced analytics in hockey. I just haven't been sold on the value of the metrics that are currently in vogue. So I want to understand them really well, to see whether I need to come to a more informed opinion.

I started out today by finding an article on a Jets blog about what Corsi is and is not. It was a very good article: rationally laid out, with well-stated arguments. It did not seem to have a pro-advanced analytics agenda, just more of a “here’s the deal, for better or for worse” attitude, which I respect.

Corsi: Defined

Corsi is simply the differential between shot attempts for and against, typically in even-strength situations. It can be applied at the team or player level, for a single game or a number of games. If the Wild takes 6 shots at/on goal and Chicago takes 3 while Mikko is on the ice, Mikko has a +3 Corsi. Corsi counts all shot attempts, regardless of the outcome (blocked, missed the net, on goal, in goal). Pretty intuitive, makes sense. Glad we're tracking that. All aboard so far.
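To make the arithmetic concrete, here is a minimal Python sketch of a Corsi count for one player's on-ice shot attempts. The outcome labels, the `ShotAttempt` record, the team codes and the `corsi` function are all hypothetical names chosen for illustration, not any stats site's actual data format; the sketch also assumes only the two teams on the ice appear in the list.

```python
from dataclasses import dataclass

# Hypothetical outcome labels; Corsi counts every one of them.
OUTCOMES = {"goal", "on_goal", "missed", "blocked"}


@dataclass
class ShotAttempt:
    team: str     # team that took the attempt (hypothetical code, e.g. "MIN")
    outcome: str  # one of OUTCOMES


def corsi(attempts, team):
    """Shot attempts for `team` minus shot attempts against it.

    Assumes every attempt belongs to one of the two teams on the ice,
    so anything not taken by `team` counts against it.
    """
    for a in attempts:
        if a.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {a.outcome}")
    shots_for = sum(1 for a in attempts if a.team == team)
    shots_against = len(attempts) - shots_for
    return shots_for - shots_against


# The example from the text: while Mikko is on the ice, the Wild attempt 6 shots
# and Chicago attempts 3. The outcomes are mixed, and all of them count.
mikko_on_ice = (
    [ShotAttempt("MIN", o) for o in ["goal", "on_goal", "on_goal", "missed", "blocked", "blocked"]]
    + [ShotAttempt("CHI", o) for o in ["on_goal", "missed", "blocked"]]
)
print(corsi(mikko_on_ice, "MIN"))  # 3
```

Whether an attempt went in, hit the goalie, missed or got blocked makes no difference to the count; that is the whole point of Corsi.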

Corsi: Applications, and a bit about Fenwick

The article then says that "Corsi can proxy scoring chances reasonably well". Okay, again, I don't debate that. It seems to me extremely rare that a team can register a scoring chance without first attempting a shot on goal. Sure, someone could suffer a Skoula moment that leads to a crazy chance without anyone taking a shot at the goal, but those are so rare as to be clear outliers. So, yeah, the team that wins the shots battle should come out on top. The higher a player's Corsi, the greater the likelihood that he was on the ice for goals-for instead of goals-against.

The point is made that Corsi lends itself to analysis of possession and territorial advantage. That’s a little thinner, I think, because not all shot attempts come after the same amount of possession or even zone time. Think of a team that isn’t supporting the puck in the offensive zone, getting those “one-and-done” chances and then retreating back on defense. They could win the Corsi for that game, and it wouldn’t give you an accurate depiction of possession or territorial advantage. Or say team A gets pinned in its zone for a while but doesn’t give up a shot attempt, then gets a breakaway and takes a shot: team A wins that stretch on Corsi despite losing it on territory. In other words, there are normal situations within a game that undermine the correlation between Corsi and possession and territorial advantage. But I’m still willing to say that the middle of the bell curve of outcomes will support a pretty strong correlation between these things.

Then we get into Fenwick, which is Corsi net of blocked shots. So, the more shots you take that get to or around the net (instead of getting blocked on the way) vs. the other team, the better. Fair enough.
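Since Fenwick is just Corsi with blocked attempts removed, one function with a switch covers both. This is a minimal sketch under the same assumptions as any on-ice count: the `Attempt` record, its outcome labels and the `differential` function are hypothetical names for illustration, and only two teams are assumed to appear in the list.

```python
from collections import namedtuple

# Hypothetical event record: the team that attempted the shot and what happened
# to it ("goal", "on_goal", "missed" or "blocked").
Attempt = namedtuple("Attempt", ["team", "outcome"])


def differential(attempts, team, count_blocked=True):
    """Shot attempts for `team` minus attempts against it.

    count_blocked=True gives Corsi; count_blocked=False drops blocked attempts,
    which gives Fenwick. Assumes only two teams appear in `attempts`.
    """
    kept = [a for a in attempts if count_blocked or a.outcome != "blocked"]
    shots_for = sum(1 for a in kept if a.team == team)
    return shots_for - (len(kept) - shots_for)


events = (
    [Attempt("MIN", o) for o in ["goal", "on_goal", "on_goal", "missed", "blocked", "blocked"]]
    + [Attempt("CHI", o) for o in ["on_goal", "missed", "blocked"]]
)
print(differential(events, "MIN"))                       # 3 (Corsi)
print(differential(events, "MIN", count_blocked=False))  # 2 (Fenwick)
```

In this made-up sequence the Wild drop from +3 to +2, because two of their six attempts were blocked against one of Chicago's three.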

I found other articles by the likes of Cam Charron and Jesse Spector that delved deeper into the genesis and history of Corsi, which were also very interesting.

Additive Applicability Challenges

But here’s where I think the breakdown in additive applicability to the broader game of hockey starts to come into play. Corsi grew out of Jim Corsi’s desire to track events within a game that would cause a goaltender to react physically. He made the novel leap that such events are not solely manifested by shots on goal/goals, but also by shot attempts. I play goalie, granted at the beer league level, and this kind of thinking warms the cockles of my heart. But lots of things that happen during a game cause me, as a goalie, to move, and many of them have no shot attempt associated with them. Now, I understand you can’t wrap everything into a metric like this. There has to be a line, and drawing that line at shot attempts makes good sense. I’m just saying that, when you extrapolate this beyond the goalie to the broader game, its effectiveness starts to fray at the edges a bit. Again, it’s not that I don’t see a correlation to things like scoring chances or even wins and losses; it’s that I don’t see an added level of insight into those things beyond what we get with simple shots for vs. shots against.


So, I ran the numbers for the Wild’s 2013-2014 season, using as my source. What I came up with was pretty interesting.

                          W    L   OTL   GP    Win%
When Corsi Positive      13   12    8    33   39.39%
When Corsi Negative      26   14    6    46   56.52%
When Fenwick Positive    15   13    9    37   40.54%
When Fenwick Negative    24   14    6    44   54.55%
When SOG* Positive       14   13    9    36   38.89%
When SOG* Negative       18   13    6    37   48.65%

Corsi Even+Pos           16   13    7    36   44.44%
Fenwick Even+Pos         18   12    8    38   47.37%
SOG* Even+Pos            22   14    9    45   48.89%
*SOG = Shots on Goal

The preceding data show the Wild’s record (win, loss, overtime/shootout loss), games played and winning percentage when they had a positive or negative Corsi, Fenwick or simple shots-on-goal differential. The upper section counts only games with a strictly positive or strictly negative differential; games with an even differential are left out. The lower section folds the even games into the positive group; the negative group is the same as in the upper section.
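The two groupings can be sketched in Python. This is a minimal sketch, assuming per-game differentials are already available as (differential, result) pairs; the function names and the six sample games are hypothetical and are not the data behind the table.

```python
from collections import Counter


def record_by_sign(games, fold_even_into_positive=False):
    """Tally W/L/OTL by the sign of a per-game differential.

    games: iterable of (differential, result) pairs, result in {"W", "L", "OTL"}.
    Games with a differential of exactly 0 are left out (upper section) unless
    fold_even_into_positive is True (lower section, "Even+Pos").
    """
    buckets = {"positive": Counter(), "negative": Counter()}
    for diff, result in games:
        if diff > 0 or (diff == 0 and fold_even_into_positive):
            buckets["positive"][result] += 1
        elif diff < 0:
            buckets["negative"][result] += 1
        # diff == 0 without folding: counted in neither bucket
    return buckets


def win_pct(record):
    """Wins as a percentage of games played (W + L + OTL)."""
    gp = record["W"] + record["L"] + record["OTL"]
    return 100 * record["W"] / gp if gp else 0.0


# Hypothetical games: (shot differential, result)
games = [(5, "W"), (-3, "W"), (0, "L"), (2, "OTL"), (-1, "L"), (0, "W")]
upper = record_by_sign(games)
lower = record_by_sign(games, fold_even_into_positive=True)
print(dict(upper["positive"]), win_pct(upper["positive"]))
print(dict(lower["positive"]), win_pct(lower["positive"]))
```

The only difference between the two sections is where the even games land; the negative bucket is identical in both.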

The thesis being tested was that Corsi and Fenwick do a good job of indicating outcomes, and a better job than traditional statistics (such as a simple SOG differential). The results from this sample set debunk that thesis in two ways.

First, a positive Corsi differential only correlated to a 39.39% winning percentage. A positive SOG differential correlated to a 38.89% winning percentage. That’s a wash. A positive Fenwick differential correlated to a 40.54% winning percentage, slightly better than both Corsi and straight SOG.

Second, if the thesis holds, a negative Corsi or Fenwick differential would indicate that the Wild trailed in chances, possession and territorial advantage, all of which would suggest the Wild would be more likely to lose the games in which their Corsi or Fenwick differential was negative. The data prove otherwise. In fact, the Wild had a significantly better winning percentage when their Corsi differential was negative (56.52%) than when it was positive (39.39%). The Fenwick data show the same thing: a better winning percentage when the differential was negative (54.55%) than when it was positive (40.54%). The same basic relative outcomes hold for the SOG differentials (48.65% negative vs. 38.89% positive), proving both that Corsi and Fenwick are no better at predicting, or correlating to, wins than simple SOG, and that Corsi, Fenwick and SOG are lousy predictors of outcomes in general.
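For anyone who wants to check the percentages, here is a short Python sketch that recomputes every Win% figure from the W-L-OTL records in the table. It makes explicit that "winning percentage" here means wins divided by games played, with overtime/shootout losses counted as non-wins; the `rows` dictionary and `win_pct` function are just illustrative names.

```python
# Records from the table above as (W, L, OTL).
rows = {
    "Corsi Positive":   (13, 12, 8),
    "Corsi Negative":   (26, 14, 6),
    "Fenwick Positive": (15, 13, 9),
    "Fenwick Negative": (24, 14, 6),
    "SOG Positive":     (14, 13, 9),
    "SOG Negative":     (18, 13, 6),
    "Corsi Even+Pos":   (16, 13, 7),
    "Fenwick Even+Pos": (18, 12, 8),
    "SOG Even+Pos":     (22, 14, 9),
}


def win_pct(w, l, otl):
    """Wins as a percentage of games played; OT/SO losses count as non-wins."""
    return 100 * w / (w + l + otl)


for name, (w, l, otl) in rows.items():
    gp = w + l + otl
    print(f"{name:<18} {w}-{l}-{otl}  GP={gp:<3} {win_pct(w, l, otl):.2f}%")
```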

Folding the games where the differential was even into the games where it was positive moves the data even further in favor of SOG being at least as good a metric as Corsi and Fenwick for predicting wins: 48.89% for SOG, versus 47.37% for Fenwick and 44.44% for Corsi.

A Word About Sample Size

When you challenge the advanced analytics set on their thesis, sooner or later they trot out sample size as a limitation. It’s funny because they don’t seem to have an issue with a small sample size when the data work for them, but it’s definitely an issue when the data work against them.

There can be no doubt that even an 82-game sample is sub-optimal. For the record, was giving me trouble going back farther than this season, so I will keep trying to add to the data set for the Wild. So, before the advanced stats guys jump all over me, I acknowledge this sample size is small. You could go insane trying to establish what an appropriate sample size is, and you could also argue that each roster is unique (and rosters are constantly in flux due to injuries, line changes, etc.), so maybe a full season is a reasonable sample size. But I know sample size is going to be brought up.


By way of a conclusion, I am not willing to say I have concluded anything. The fact is I would like to get more data. But people should also recognize that as you add to the sample size you invite the introduction of additional variables that could limit the applicability of the results.

I came into this thinking that advanced stats don’t tell me anything that elementary stats haven’t already told me. So far, I have not seen anything that would make me think otherwise.

Thursday, May 1, 2014

TDI May Day Version (Also, Wild Wins!)


Series-victory version of our "If you give us five minutes, we'll give you the Wild" micro-podcast format.

Topics discussed:

*Turning point of the series
*Yeo's post-game comments in the room
*Rivalry with Colorado

Wild Defeats Avs, Series Thoughts


In the end I think this series was about experience and depth. The Wild just had a little more of both, and was able to take advantage of those differences when it mattered the most. Colorado probably has more top-end talent than Minnesota. I do not think we have the guns to match what MacKinnon (as sure a star in the making as exists in the NHL right now), Landeskog, Duchene and Stastny can bring. And O'Reilly is no slouch either, although he is really cast more in the Wild player mold of sub-elite offensive talent rounded out by excellent three-zone tenacity. With Barrie organizing from the back, you do wonder if either of these past two games would have ended up differently.

But whereas the Avs were really a one-line team until Duchene came back, and a one-plus-line team after that, the Wild got more contributions throughout their lineup. The goals from Nino and Heatley last night, for example. Colorado had nine forwards finish with a total of 43 points. Minnesota had 11 forwards finish with a total of 50 points. Both teams' defenses contributed 12 points.

Put another way, Suter was the highest-scoring defenseman for the Wild, with four points in seven games. He currently stands 8th on the team in playoff scoring. Holden was the highest-scoring defenseman for the Avs, with five points in seven games. He currently stands 5th on the team in playoff scoring. (Barrie finished with two points in two-and-a-half games, for what it's worth.) You know who has the best plus/minus on the Wild right now? Heatley (+5). I know, I'm surprised, too. This was a very close series, and the point is the Wild got just enough additional scoring depth to win.

As far as experience, the Wild started with more recent playoff experience (from last year), but then also learned and drew on experience gained early in the series, and put it to work at the end. Again, we are talking incrementally here, as close as this series obviously was. But overcoming four one-goal deficits in a game seven is pretty remarkable. It cuts both ways (you fell behind four times!), but the Wild just was not a team that had that kind of mental fortitude - until lately. Interestingly, where the Avs had a decided experience advantage was in goal, and Varlamov could not protect four leads last night. Obviously some of the Avs' goals were stoppable, and I am not saying the Wild won the goaltending match-up so much as they survived a shootout. But you put a couple of defensemen (Barrie and ?) on that Avs team and look out.

And, as good a job as I thought Patrick Roy did, I thought Yeo was able to adapt better over the course of the series. Hey, Yeo's team was the only one to win on the road (granted he had one more kick at that can than Roy did). I was impressed with Roy's demeanor and ability to motivate his men to carry out his game plan, in general. I was a Patrick Roy, the goalie, fan - arrogance included. But I'll admit I have been surprised at how effective he has been as a coach. But maybe Yeo's ability to draw on last season's (brief) playoffs gave him just that much more of an edge?

And I have been harsh on Yeo a lot in this space, and elsewhere. But he has really impressed me since the calendar turned over to 2014. And there can be no debating that he has this team playing with more poise and confidence than I have maybe ever seen from a Wild team. At this point I have to think he is going to get a new contract from the Wild, and he has earned it.