Sports Analytics: Remember, Statistics aren’t Perfect!

April 15th, 2015 | 6 Comments

We came across an interesting article in the Wharton Magazine blog titled “The Dangerous Data Fetishes of Sports Analytics” by Ian Cooper. The main point of the article is that some sports statistics do not add value. The main example Ian cites is the “PDO” variable, which is used in hockey.

Take the example of PDO, a meaningless acronym that’s simply the sum of a team’s overall shooting percentage (the number of shots that result in goals) and its save percentage (the percentage of opponents’ shots that its goalie prevents from going in the net).
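To make the definition concrete, here is a minimal sketch of the calculation in Python. The function name, the 0-to-100 percentage scale, and the sample numbers are our own assumptions for illustration; they do not come from Ian's article, and scaling conventions for PDO vary.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO as shooting percentage plus save percentage.

    Both components are on a 0-100 scale (an assumed convention).
    "Shots" here means shots on goal.
    """
    if shots_for <= 0 or shots_against <= 0:
        raise ValueError("shot counts must be positive")
    # Share of the team's own shots that became goals.
    shooting_pct = 100.0 * goals_for / shots_for
    # Share of the opponents' shots that the goalie stopped.
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical team: 9 goals on 100 shots, 8 goals allowed on 100 shots.
team_pdo = pdo(goals_for=9, shots_for=100, goals_against=8, shots_against=100)
print(team_pdo)  # 9% shooting + 92% saves = 101.0
```

Note that every shot counts the same in this sum, whether it was taken from the crease or from the blue line.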

Overall, PDO can add some value, but it is not a perfect measure. Ian points out one of its flaws: the measure treats every shot the same, so it doesn’t matter how far away from the net you are when you shoot. That doesn’t seem to make sense, but it is assumed in the PDO variable! As Ian states:

I dug a bit further and what I found was really interesting (although not perhaps entirely surprising): those who shot from “in-close” did a better job of hitting the net than those who shot from far away. (For more, read “The Relationship Between Shooting Distance and Shot Percentage on Net.”)

That seems pretty straightforward! I have played hockey my entire life, and I agree with the study’s assessment: it is easier to score when you are close to the net!
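For readers who want to run this kind of check on their own shot data, here is a rough sketch in Python. It is not Ian's method: the function name, the two-bucket split, the 20-foot cutoff, and the sample shots are all assumptions made up for illustration, and the toy numbers are not evidence of anything.

```python
def on_net_rate_by_bucket(shots, cutoff_ft=20.0):
    """Return the share of shots that hit the net, split by distance.

    shots: iterable of (distance_ft, on_net) pairs, where on_net is a bool.
    cutoff_ft: assumed boundary between "in_close" and "far" shots.
    """
    counts = {"in_close": [0, 0], "far": [0, 0]}  # [on_net, attempts]
    for distance_ft, on_net in shots:
        bucket = "in_close" if distance_ft < cutoff_ft else "far"
        counts[bucket][0] += int(on_net)
        counts[bucket][1] += 1
    # A bucket with no attempts has no defined rate, so report None.
    return {b: (hits / n if n else None) for b, (hits, n) in counts.items()}


# Made-up shots, only to exercise the function.
toy_shots = [
    (8, True), (12, True), (15, False), (18, True),
    (35, True), (40, False), (45, False), (55, False),
]
print(on_net_rate_by_bucket(toy_shots))  # {'in_close': 0.75, 'far': 0.25}
```

With real data, the interesting question is whether the gap between the two buckets holds up as the sample grows or washes out as noise.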

Here is Ian’s main point:

But what’s increasingly clear in hockey, as in all areas that are being transformed by big data, is that data don’t apply themselves. Concepts like repeatability, sustainability and sample size are useful, but only if you understand how and when to apply them. To do that, it takes a good qualitative understanding of whatever subject you’re studying (in my case hockey), constant attention to whether the data are relevant to the hypothesis you’re testing, and the creativity and analytical chops to interpret whatever your scatter plot, histogram or regression model might be telling you.

The same is true in stock selection and in other sports: you need to understand why a variable works in order for it to be useful. Otherwise, it is most likely just noise.

As an avid NBA fan, I am aware of Daryl Morey, who is famous for his “sports analytics” capabilities. He is the General Manager of the Houston Rockets, who have had a successful season by all accounts. Part of his success has been driven by a combination of good draft picks such as Terrence Jones and Donatas Motiejunas, as well as picking up a lesser-known player in Patrick Beverley in free agency. However, he was also able to bring in some franchise players such as Dwight Howard (free agency) and James Harden (via trade), as well as pick up Josh Smith at pennies on the dollar (bad fallout from the Pistons).

Is the success of the Houston Rockets driven by sports analytics? Some say yes — the 76ers hired Sam Hinkie from Houston to employ similar advanced analytics. Sam has, for better or worse, embraced the tanking philosophy. However, others disagree.

I leave the last word to Charles Barkley for his views on sports analytics (and Daryl Morey):






  • The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).

About the Author:

Jack Vogel
Jack Vogel, Ph.D., conducts research in empirical asset pricing and behavioral finance, and is a co-author of DIY FINANCIAL ADVISOR: A Simple Solution to Build and Protect Your Wealth. His dissertation investigates how behavioral biases affect the value anomaly. His academic background includes experience as an instructor and research assistant at Drexel University in both the Finance and Mathematics departments, as well as a Finance instructor at Villanova University. Dr. Vogel is currently a Managing Member of Alpha Architect, LLC, an SEC-Registered Investment Advisor, where he heads the research department and serves as the Chief Financial Officer. He has a PhD in Finance and an MS in Mathematics from Drexel University, and graduated summa cum laude with a BS in Mathematics and Education from The University of Scranton.


  1. Michael Milburn April 15, 2015 at 2:37 pm

    Funny Barkley clip. “Analytics is just a bunch of guys who aint never played the game, and they never got the girls in high school, and they just want to get in the game.” That’s classic

    • Jack Vogel, PhD April 15, 2015 at 2:39 pm

      If nothing else, he is entertaining!

  2. kczat April 15, 2015 at 8:29 pm

    I have to wonder if Ian Cooper is misunderstanding how these statistics are used.

    I don’t have any knowledge of hockey analytics, but the same statistics (PDO and TSR) are used in soccer as well, which I am more familiar with. In soccer analytics, PDO is used as a measure of “luck” precisely because it is not believed to be repeatable. If a team has a high PDO, then you predict them to do worse in the future as their luck wanes, and if they have a low PDO, then you predict them to improve.

    Overall, the idea, as I understand it, is to split goals scored into two parts, PDO and TSR, where PDO is the part that quickly regresses to the mean and TSR is the part that is more stable.

    For what it’s worth, I’ve seen similar analysis in one of the books on valuation, where they argue that some metrics (like operating margin) quickly regress to the mean while ROIC is more stable. This seems to be a generic technique for improving the quality of predictions in areas where you have little data to work from.

    • Jack Vogel, PhD April 15, 2015 at 9:17 pm

      I think his main point is that there are some flaws in the measure.

      • kczat April 15, 2015 at 9:26 pm

        Again, I don’t know hockey analytics, but distance to the goal is commonly used in soccer analytics. (It’s a key component of “expected goals” models.) It’s possible that hockey isn’t doing this, but there is a lot of overlap between the two communities, so I’m suspicious that this isn’t fair criticism.

        • Jack Vogel, PhD April 15, 2015 at 9:31 pm

          I don’t think that they account for distance to net. Here is a quote from his article:

          “A blogger with whom I occasionally bounced ideas (hired this summer by an NHL team) came back and concluded, based on data he had seen for the entire NHL over half a season, that there was no relationship between shooting distance and hitting the net.

          According to a study he had read (and agreed with), if you were trying to guess the likelihood of whether or not a player would get a shot on net rather than miss or have the shot blocked before it got there, it didn’t matter whether that player shot from far away or in close. To use the language of hockey analytics, hitting the net from any distance was a matter of luck or randomness rather than a repeatable skill.

          That struck me as a little odd since it was the exact opposite of what both intuition and my own data suggested, but I thought about it some more and what became clear was that all the additional data had introduced a great deal of noise into the equation.”
