A Protocol to Prevent “Quants Gone Wild”

///, Machine Learning/A Protocol to Prevent “Quants Gone Wild”

A Protocol to Prevent “Quants Gone Wild”

By |2018-12-18T10:16:08+00:00December 17th, 2018|Academic Research Insight, Machine Learning|

A Backtesting Protocol in the Era of Machine Learning

  • Rob Arnott, Campbell Harvey, and Harry Markowitz
  • Working paper
  • A version of this paper can be found here
  • Want to read our summaries of academic finance papers? Check out our Academic Research Insight category.

What are the Research Questions?

Data mining in finance has long been a concern for academic researchers. Campbell Harvey, one of the authors on this paper, is leading the effort to ensure the integrity of empirical finance research. For example, see here for a post on his address to the AFA.

The concerns associated with data mining aren’t going away. A monster increase in affordable computing power is facilitating the use of machine learning to create predictive algorithms in finance. Machine learning algorithms have built-in defenses against data mining, but they aren’t full proof. Moreover, the data required to do proper cross-validation does not exist in finance (at least in the investing realm…HFT may be a different story).(1)

This paper addresses a basic question related to the use of quantitative methods (to include machine learning) in the context of finance:

  1. Can we develop a sensible research protocol to deal with data-mining concerns?

What are the Academic Insights?

The authors outline a great research protocol. Below we outline the 7 steps of the protocol with our simple key takeaway on each. Readers should dig into the paper for more detail on each component to fully appreciate what is being proposed.

  1. Research motivation
    • Start any research project with an ex-ante hypothesis, driven by economic foundations
    • For a “winning strategy,” ask the following question: Who is on the other side of the trade?…and why?
  2. Multiple testing and statistical methods
    • How many variables were tried?
    • How many combinations were used?
    • Do you have enough data to justify the value of additional complexity? Probably not.
  3. Data and sample choice
    • Live with the data you’ve been dealt — don’t cherry pick, transform, “clean”, and windsorize at random…
    • …but also make sure the data is accurate (e.g., market cap doesn’t exceed 10 trillion for 20 percent of the data set)
  4. Cross-validation
    • There is no real “out of sample” at this point, save live trading data and fresh historical data.
  5. Model dynamics
    • Beware of structural change. Humans are tricky animals with evolving tastes.
    • Avoid “tweaking” a model based on live results.
  6. Complexity
    • Keeps things as simple as possible, but no simpler.
    • We don’t have enough data to truly assess the value of complexity.
  7. Research culture
    • Reward good processes, not good results. (h.t. to Annie Duke for expressing a similar idea in “Thinking in Bets.”).
    • Do you know where the bodies are buried? Probably not –Do your own research!

Why does it matter?

The authors make a simple, but important point:

When data are limited, economic foundations become more important. 

Here is a framework for understanding how markets work.

The Most Important Chart from the Paper

Here is a “magical” backtest of a strategy that is long all stocks with tickers with an “s” as the third letter and short stocks with tickers that have “u” as the third letter.

  • In-sample and out-of-sample validation
  • No correlation with known factors
  • Low turnover


Machine learning offers a set of powerful tools that holds considerable promise for investment management. As with most quantitative applications in finance, the danger of misapplying these techniques can lead to disappointment. One crucial limitation involves data availability. Many of machine learning’s early successes originated in the physical and biological sciences, in which truly vast amounts of data are available. Machine learning applications often require far more data than are available in finance, which is of particular concern in longer-horizon investing. Hence, choosing the right applications before applying the tools is important. In addition, capital markets reflect the actions of people, which may be influenced by others’ actions and by the findings of past research. In many ways, the challenges that affect machine learning are merely a continuation of the long-standing issues researchers have always faced in quantitative finance. While investors need to be cautious—indeed, more cautious than in past applications of quantitative methods—these new tools offer many potential applications in finance. In this article, the authors develop a research protocol that pertains both to the application of machine learning techniques and to quantitative finance in general.

  • The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).
  • Join thousands of other readers and subscribe to our blog.
  • This site provides NO information on our value ETFs or our momentum ETFs. Please refer to this site.

References   [ + ]

1.We are highlighting some applications of machine learning to investing this week, so what follows is timely!

About the Author:

Wesley Gray, PhD
Wes Gray has published multiple academic papers and four books, including Quantitative Value (Wiley, 2012), DIY Financial Advisor (Wiley, 2015), and Quantitative Momentum (Wiley, 2016).After serving as a Captain in the United States Marine Corps, Dr. Gray earned an MBA and a PhD in finance from the University of Chicago where he studied under Nobel Prize Winner Eugene Fama. Next, Wes took an academic job in his wife’s hometown of Philadelphia and worked as a finance professor at Drexel University. Dr. Gray’s interest in bridging the research gap between academia and industry led him to found Alpha Architect, an asset management firm that delivers affordable active exposures for tax-sensitive investors. He is a contributor to multiple industry publications and regularly speaks to professional investor groups across the country. Wes currently resides in the suburbs of Philadelphia with his wife and three children.
Yes No
This website uses cookies and third party services. Settings Ok


We use “cookies” on this site. A cookie is a piece of data stored on a site visitor’s hard drive to help us improve your access to our site and identify repeat visitors to our site. For instance, when we use a cookie to identify you, you would not have to log in a password more than once, thereby saving time while on our site. Cookies can also enable us to track and target the interests of our users to enhance the experience on our site. Usage of a cookie is in no way linked to any personally identifiable information on our site. Some of our business partners may use cookies on our site (for example, advertisers). However, we have no access to or control over these cookies.

Embedded Content

Articles on this Site may include embedded content (e.g. videos, images, articles, etc.). Embedded content from other websites behaves in the exact same way as if the visitor has visited the other website.These websites may collect data about you, use cookies, embed additional third-party tracking, and monitor your interaction with that embedded content, including tracking your interaction with the embedded content if you have an account and are logged in to that website.