Challenging the Hockey Statisticians - Part 1

Our own Justin Azevedo has run a great series explaining advanced stats to the uninitiated. The advanced stats perspective of hockey is a tough hurdle when you first come upon it. 

Kind of like walking into a Calculus class before you have done Algebra.


Over the last couple of years I've seen three potential reflex reactions to them in numerous hockey forums, blogs and online hockey conversations.


(1) The fan who says "this is complicated, therefore it must be right" and accepts it blindly without unpacking the epistemological foundations.


(2) The fan who says "this is too complicated for me to understand, therefore I am going to disregard all of it."


(3) The fan who is silenced by the advanced stats conversation. The thought being: "this is complicated, yet I can't disregard it because it is rationally based on empirical evidence, but I also can't challenge it because I do not know enough about it."


Number 3 would be me, so I keep reading more and more on the topic to further my understanding. Hopefully one day I can contribute something meaningful to the conversation. (That day is not today, by the way; this is a different kind of exercise to keep us occupied in the off-season.)

The folks who do work in this area are well aware of its limitations. Justin immediately caveats all 4 of his articles with: 

Advanced stats are not perfect and are mostly useless as evaluators of talent (like pretty much any other stat) without context, and the bigger the sample size the more accurate a stat will become-ideally.


Justin Azevedo - M&G

So how will advanced stats improve? The same way just about every idea in human history has: by being questioned through dialectic, a dialogue between two people holding different views with a rational foundation. So I am going to put forward some simplistic, Socratic-style questions here.


The challenge for our advanced stat advocates at M&G, be they our contributors like Justin, Ryan P and Arik, or our regular readers and our mentors *cough* Kent *cough*, is to answer these simple questions on advanced stats. As far as I can tell, the advanced hockey stat conversation entered public fan discourse a little over a decade ago, though NHL clubs were of course using it privately for much longer.


Some of you may be wondering why I didn't post these questions on another blog I write for, which is well known for its advanced stats writers. The reason is simple: I want to write these simplistic yet foundational questions from an introductory perspective for all M&G readers.

This is an introductory-level counter to Justin's introductory articles on advanced stats. As I mentioned, advanced hockey stats have been discussed in the public fan forum for over a decade. I suspect that these types of questions have circulated before and are tiresome or uninteresting for some stat writers who have been around forever.


That said, it would not upset me in the least if anyone in the SBNation hockey stats universe wanted to weigh in from anywhere else. That is the whole point: to get a dialogue going and share the knowledge.




This is not an anti-stats article. I acknowledge the validity of the advanced stats paradigm. It is irrational to disregard it completely, but I question the scope of its predictive power and, with that, its authority to do much more than describe past events and contextual circumstances.


It is certainly an interesting conversation, and it helps enrich and provide a fuller perspective on the players we watch. The Sedins are good, but the high percentage of O-zone starts really helps them, doesn't it? Hence, we get a fuller perspective of players and the game by listening to the advanced stats conversation, but I have also seen advanced stats wielded like a sledgehammer to crush the optimism of the casual fan. Is that justified when those who create the analysis themselves acknowledge its limitations?


Certainly the broad predictive strokes have merit. The Canucks will finish in the NHL's top ten next season and the Senators will miss the playoffs, but can you thread the needle on a bubble team and predict the Flames or Blue Jackets in or out of the playoffs? Can you state the Canucks will be top 3 again this season?


Certainly there is some general predictive power here, but is it significant? Is it any different from the perspective of the experienced hockey fan on the couch, beer in hand? Any such limit is not due to flaws in the existing computations or quantifications of Corsi, Qualcomp and so forth, but rather to the nature of the game of hockey. These are the foundational challenges that will be put forth for the statisticians' rebuttal.


Some sports, like baseball, slide extremely well into statistical analysis. I would postulate that others resist it significantly, and hockey is one of the latter.


The hockey gods, or what may more properly be called the X factor, are the reason I watch so many games year after year. Let's challenge the statisticians to explain how they deal with these various X factors.


(1) The Ice


Does any other sport have the diversity of playing surface that hockey does? Ice in the southern markets has been referred to as worker's ice, slower ice, bouncy ice. This is a consistent geographical effect. Are ice differentials noted in any analysis? Are there types of ice that certain players will prosper on, such that a player traded to a northern team could be expected to decline in performance, or vice versa?


Is it reasonable to hypothesize that Jay Bouwmeester's performance in Calgary is a result of not being able to perform as well on faster ice? Is Roberto Luongo's stay-in-the-crease approach cultivated by the fast ice and boards in Vancouver?


This is more than a home ice effect. I am considering it as a macro principle. If all 30 teams had perfectly cloned rosters, I would expect lower scoring and fewer goals allowed on the slower southern ice, and higher scoring and more goals allowed on the faster ice.


Is this a legitimate component for analysis, or should teams simply continue to look at things on a rink-by-rink basis as visitors? Would coaches on slower ice in the south do better to consider trapping and defensive systems and focus on acquiring those types of players, while northern teams on faster ice are better served picking up highly skilled, fast forwards?


As a macro principle, should ice quality be a foundational element of analysis? Do statisticians disregard it and, if so, what is the explanation?


(2) The Goal by Deflection or Luck


Hardly an uncommon event. A goal by deflection is almost something you expect to happen at least once in most games, but how would you weigh the skill vs luck factor of a deflected goal? I am not sure one can fairly say deflections are too small a percentage of goals scored to matter.


Can you blame a goalie for being beaten on a deflection? Can you fairly credit a player who is facing the wrong way and never sees the puck before it hits his skate and goes in, along with every other player on the ice? Do the numerous goals by deflection unfairly skew player stats upward for the scoring team and unfairly skew stats downward for the opposing team that is perfectly positioned defensively, including the goalie?


Or do we just say the hockey gods bless all teams equally with the luck or misfortune of deflection goals? (Remember the bouncy ice of the south.) It would not bother me in the slightest to see goals given a sub-category, an XDeflection goal, representing an evaluation of more luck than skill.


This is not to say that deflections are not practiced or that many players are not highly skilled at them. The player who jumps to my mind is Ryan Smyth, who has made a career out of the art. But I have seen just as many goals go in by luck deflections, off players' skates or bodies, as by skill deflections, and I hesitate to give those players the same credit as a Ryan Smyth type who knows position and deflection so well that his skill is hard to ignore.


The same goes for a pure luck goal, where a player scores on his own net for example, or an incredibly bad goal the goalie has let in: a goal so bad that it is a clear bungle. Again, why should the stats of the players on the ice be moved up or down in these cases? Not all teams were lucky enough to have the opposing team score on their own net last year.


This type of pure-error luck goal by Johnson is rare, but in principle is it any different from a goal that hits a skate and deflects in? Neither goal is really planned, is it? Does any player shoot with the intention of deflecting a goal in off another player's skate?

(3) Injury Analysis in Advanced Stats


Hockey is the one sport where injury is almost expected. Yet to the best of my knowledge players are not valued more positively because of their sturdiness. 


In an analysis of the value of players, is this factored in anywhere? A player who may excel when he is healthy cannot be considered as valuable to his team if he rarely plays a full season without injury. Is there any calculation that considers this? If not, why not? Players have histories of being sturdy or injury-prone, and this should be a factor considered somehow in the strength of a team going into the season, should it not?


Hockey is not the type of game where injury is the exception; it is the rule. How do statisticians account for the injury element in their analysis of a team's strength? One would intuit that it is reasonable to predict, in some cases, a higher level of injuries on a team overall, given the injury history of its players and its style of game.


As knowledge of concussions increases and players take longer and longer to return to active play, this element of analysis may be an area for advanced statisticians to consider. After the first concussion, the likelihood of another increases, as does the probability of missing more and more game time.


(4) Predictive Power?


The final question and perhaps the most important one. The question that cuts away all the complexity of advanced stats and brings the paradigm under the interrogator's light. 


Keep in mind I have already acknowledged the validity of the advanced stats paradigm in enriching our perspective of the game and of its players. It gives us a fuller understanding. At the bottom line, though, when someone states "Jarome Iginla is over the hill, look at his declining X, Y or Z statistical quantifications," it is a powerful statement. It is one that, due to the mathematical structure of the assertion, tends to steamroll those who want to counter-argue optimistically for the future. I don't like this common use of stats by some of its followers.


Last season Jarome Iginla crushed the pessimistic predictions of the statisticians. Matt Stajan was a massive bust who under-performed their predictions. Let's review a posted article from Flames Nation.


In fairness, the authors at Flames Nation, as all advanced stats writers usually do in some way, qualify their predictions if they make them. Rob Vollman immediately states how difficult it is to predict points for a player in an upcoming season, but he uses a good methodology and he has a lot of little beads flying across the abacus in the background. Let's look at the results of his work to challenge the statistical paradigm he represents.


6 of 9 players missed his statistical point production range: Iginla, Tanguay, Giordano, Stajan, Hagman and Bouwmeester.


Vollman hit the predicted point range for White, Bourque and Jokinen. That is a 33% success rate.
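
For anyone who wants to check my arithmetic, here is a minimal Python sketch of the tally. It only counts the hits and misses listed above; the variable names are my own and nothing here comes from Vollman's article or method.

```python
# Whether each player's actual points landed inside the projected range,
# as listed in the text above (True = hit, False = miss).
results = {
    "Iginla": False,
    "Tanguay": False,
    "Giordano": False,
    "Stajan": False,
    "Hagman": False,
    "Bouwmeester": False,
    "White": True,
    "Bourque": True,
    "Jokinen": True,
}

hits = sum(results.values())   # True counts as 1, False as 0
total = len(results)
hit_rate = hits / total

print(f"{hits} of {total} projections hit: {hit_rate:.0%}")
# prints "3 of 9 projections hit: 33%"
```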


Statistically speaking, should we consider this significant? When all the dust has settled, when all the buttons have been pushed, when all the gerbil-driven Excel spreadsheets have stopped churning, if you are not at least above 50% in your predictions, are you not better off flipping a coin?


I applaud Vollman for doing this, by the way. He is exactly the kind of stats writer I like, one who really tests methodologies and actually crunches out numbers for predictive results. Even though he misses, I respect his gusto for doing what we all want to see from advanced stats: predictive power. Our hockey pools await.


Kent Wilson mentions in the evaluation article something we hear endlessly from hockey statisticians. I used to hear it from financial analysts as well when their stock picks failed: "I did not have enough data, my data was flawed" or, in the case of hockey stats, "the sample size is too small, and if you bundle my projections all together it all averages out perfectly. I am actually right."


No, the math is right but its ability to predict the future is limited.


For financial analysts, that defense is fine if the task is to pick a list of stocks and bundle them into a mutual fund. But the task at hand was not to make a projection for the Calgary Flames as a bundled mutual fund of players; it was to project individual players' scoring. The article is entitled "Pre-season PLAYER projections review", not average scoring for all Flames forwards, unless I misread or misunderstood the original articles Vollman wrote.


It is like playing Monopoly with someone who has a permanent "Get Out of Jail Free" card. It is true, of course, that the larger the sample size, the better the predictive power. But when a prediction goes wrong, the response is, "well, the sample size is too small for reliable results." You rarely hear sample size mentioned when they get a prediction right.


What is a sufficient sample size?
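
One rough way to put numbers on that question is the textbook standard error of a proportion, sqrt(p(1-p)/n). The sketch below is an illustration only, not anyone's actual method: it reuses the 3-of-9 hit rate from above, the larger sample sizes are hypothetical, and the normal approximation behind the 1.96 multiplier is crude for a sample as small as nine.

```python
import math

def standard_error(p, n):
    """Standard error of an observed proportion p over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

observed = 3 / 9  # the 3-of-9 hit rate from the projections review

# Hypothetical sample sizes, to show how the margin shrinks as n grows.
for n in (9, 30, 100, 500):
    margin = 1.96 * standard_error(observed, n)  # approximate 95% margin
    print(f"n={n:>3}: hit rate {observed:.0%}, give or take {margin:.0%}")
```

The margin shrinks with the square root of the sample size, so quadrupling the number of projections only halves the uncertainty.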


If the old vet forwards of the Flames last year could not provide an extensive enough history, and they are into the final years of their careers, perhaps advanced stats simply cannot predict the future at all. Perhaps they really can only make the most general predictions and, if so, well, remember the old hockey fan with beer in hand on the couch: he can do that as well.


And with that comes the rub: if advanced hockey stats are limited in their predictive power and are primarily descriptive of past events, are they not overvalued at the current moment, until they evolve to show some consistent predictive results?


Anyone still want to say the Flames will miss the playoffs this year? ; ) 




Next week's preparatory reading


Ludwig Wittgenstein's "Remarks on the Foundations of Mathematics" & Gödel's Incompleteness Theorems.