Evidence-Based Investing? Take that Alpha and Shove It.

March 3rd, 2017 | Factor Investing, Research Insights | 9 Comments

Johnny Paycheck has a great country song centered around the following lyric:

Take this job and shove it…I ain’t working here no more…

Campbell Harvey, in the 2017 AFA Presidential Address, makes an analogous comment on the current state of the financial economics field:

Take this alpha and shove it…I ain’t publishing this research no more…

Prof. Harvey is rightly concerned that the incentives to publish “strong significant results” are super high in finance and economics and that this is skewing our true understanding of reality. In short, Campbell has the intellectual fortitude to state plainly what many of us have known, or indirectly sensed: data-mining is probably rampant in financial economics.

[Image: Harvey calling out finance]

Here is a video lecture based on the topic. We highly recommend everyone watch it for further understanding.

To be clear, Prof. Harvey is not saying that all research is bogus. He is saying that we need to be much more skeptical, hold much higher standards, and develop more sensible techniques when determining whether a specific research finding is robust and reliable. For example, here are past posts we’ve done on the topic:

Prof. Harvey also points out the following regarding our discussion of their “factor zoo” paper, which looks at over 300 published findings and argues that the majority of these findings are likely false because, once multiple testing is accounted for, the t-stat hurdle needs to be above 3:

…raise the threshold for discovering a new finding to t>3

In this new discussion from Harvey, he makes an even more compelling point:

I would like to make the case today that making a decision based on t>3 is not sufficient either.
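One way to see why t>3 may not be enough, echoing the point in the paper's abstract that rare effects produce many false positives, is a quick back-of-the-envelope calculation. The sketch below is only illustrative: the function name, the assumed share of tested effects that are real (`prior_true`), and the assumed power (`power`) are hypothetical inputs, not numbers from Prof. Harvey's paper, and the t-stat is treated as a z-score.

```python
from statistics import NormalDist


def false_discovery_share(t_cutoff, prior_true, power):
    """Share of 'significant' findings that are actually false positives.

    t_cutoff:   two-sided t-stat hurdle, treated as a z-score (large-sample assumption)
    prior_true: assumed fraction of tested effects that are real (hypothetical input)
    power:      assumed chance that a real effect clears the hurdle (hypothetical input)
    """
    # Two-sided probability of clearing the hurdle when the null is true.
    alpha = 2 * (1 - NormalDist().cdf(t_cutoff))
    false_positives = alpha * (1 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


# Compare a world where half of tested ideas are real with worlds where real effects are rare.
for prior in (0.5, 0.1, 0.01):
    share = false_discovery_share(t_cutoff=3.0, prior_true=prior, power=0.8)
    print(f"prior_true={prior:.2f} -> false share of discoveries: {share:.1%}")
```

Under these assumed inputs, almost none of the t>3 discoveries are false when half of the tested effects are real, but roughly a quarter are false when only 1% of tested effects are real.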

Prof. Harvey’s discussion emphasizes the importance of the 5 elements of “factor identification” discussed in Swedroe/Berkin’s new book on factor investing:

  • Be persistent over a long period of time, and across several market cycles;
  • Be pervasive across a wide variety of investment universes, geographies, and sometimes asset classes;
  • Be robust to various specifications;
  • Have intuitive explanations grounded in strong risk and/or behavioural arguments, with reasonable barriers to arbitrage; and,
  • Be implementable after accounting for market impacts and transaction costs.

Why Should We Be Wary of “Evidence-Based Investing”?

Some of the most interesting research I’ve done has never been published, probably because the work wasn’t that good, but also because the results showed a “non-result.” For example, one of my favorite research projects involved the investigation of the so-called “limited attention” hypothesis. Limited attention is a core concept behind our tweak on momentum investing (see “frog in the pan” concept). We collected a new dataset of all NYSE bell events, which served as an “attention shock,” and allowed us to do a relatively clean test on “limited attention” and question our core assumption that this bias influences asset prices.

Here is a short-hand version of what we found:

…limited attention has little influence on asset prices.

Here is a direct comment from the referee report we received from the journal:

Regardless of the economic story, I think the weak/no result in price movement makes the paper a tough sell and hurts its potential contribution.

Not sure one can find clearer evidence that there is a desire to publish “positive” results. Kinda sad, but also reality.

Here are some figures from Prof. Harvey’s paper, which highlight that our experience is not an anecdote, but potentially a normal operating procedure.

First, the figure below shows the distribution of t-stats associated with research published on so-called “factor studies.” (See here for a discussion on factor investing):

[Figure: where are the t-stats?]

No t-stats? No publish.

Note that very few published papers show that something doesn’t work, but a ton of published papers show that something does work. One would think that knowing what doesn’t work is as important as knowing what does work, but not according to the journal editors.

Next, we look at the broader academic research fields to see if this is “standard,” or unique to finance/economics. To address this question, Prof. Harvey looks to research from Fanelli (2010). Fanelli examines how likely various research fields are to publish research that reports “positive” findings related to a particular hypothesis:

[Figure: data miners across the sciences]

Economists, and social sciences in general, are pretty bad. Psychology wins the grand prize for “data-mining.”

What Can the Academic Literature Do?

As Prof. Harvey points out, the American Statistical Association is aware of the problem that many researchers do not really understand statistical significance and that “the p-value has become a gatekeeper for whether work is publishable.” In reaction to the problem, the ASA released the following 6 principles to address the misuse of p-values:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Prof. Harvey suggests his own solutions to the problem:

  1. Use better research methods.
  2. Fix the agency problem.

Better Research Methods:

With respect to “better research methods,” the good professor’s recommendations revolve around transparency, better theory, and the use of a “Bayesian p-value,” which addresses the core question we really want answered: What’s the probability the null hypothesis is true, given the data? Of course, to calculate the Bayesian p-value, one has to input a prior belief about how likely it is that the null hypothesis is true. For example, is there a 50% chance the size effect is real? Or is there a 20% chance? Maybe if we are testing whether “beta matters” we put the probability at 50%. But maybe if we are looking at the size effect we think the odds are 4/1 in favor of the null, so there is only a 20% chance the null is false. The table below highlights how the Bayesian p-value can correct for our prior assumptions about how likely something is “real.”

How does this work? Below, Prof. Harvey looks at the size effect and shows that the p-value is .0099, which essentially says that, if the null hypothesis (a size effect of zero) is true, the chance of seeing a result at least this strong is less than 1%. Pretty unlikely. But what if we think that the size effect, which we argue doesn’t have great economic theory behind it, is probably only “real” with a 20% chance? The Bayesian p-value is now .125, which basically says that the chance the null hypothesis is true, given the data observed and our prior that the null is correct with an 80% chance, is about 12.5%. That is a lot less compelling than <1%!

[Table: Bayesian p-value]

Anyway, the Bayesian p-value concept is pretty cool and allows one to state up front how likely the alternative hypothesis is to be true. If one thinks the alternative is totally ridiculous (e.g., stock tickers predict performance), the Bayesian p-value can account for this, whereas the standard p-value approach won’t cut it. Cool.
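The size-effect numbers above can be roughly reproduced in a few lines. This is a minimal sketch, not Prof. Harvey's code: it assumes one common form of the minimum Bayes factor, exp(-z²/2), where z is the z-score implied by the two-sided p-value. That this is the exact variant behind the table is an assumption, and the function name is hypothetical.

```python
from math import exp
from statistics import NormalDist


def bayesianized_p_value(p_value, prior_odds_null):
    """Approximate probability that the null is true, given the data and a prior.

    p_value:         conventional two-sided p-value
    prior_odds_null: prior odds in favor of the null
                     (4.0 means 4-to-1, i.e. a 20% prior chance the effect is real)
    """
    # z-score implied by the two-sided p-value (normal approximation).
    z = NormalDist().inv_cdf(1 - p_value / 2)
    # Assumed form of the minimum Bayes factor: the strongest evidence
    # against the null that this z-score could possibly represent.
    min_bayes_factor = exp(-z * z / 2)
    posterior_odds_null = min_bayes_factor * prior_odds_null
    return posterior_odds_null / (1 + posterior_odds_null)


# Size-effect example: p-value of .0099 with 4-to-1 prior odds against the effect.
print(round(bayesianized_p_value(0.0099, 4.0), 3))
# Same data, but with even prior odds.
print(round(bayesianized_p_value(0.0099, 1.0), 3))
```

With 4-to-1 prior odds in favor of the null, the result comes out at roughly .125, in line with the figure quoted above. With even odds it is much smaller, which shows how much the conclusion depends on the prior.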

Fix the Agency Problem? Good Luck!

The agency problem, or the incentive for researchers to submit papers with strong statistical significance, is tough to tackle and Prof. Harvey suggests that solving this problem will be difficult.

Tough one to tackle.


The Scientific Outlook in Financial Economics

  • Campbell Harvey
  • A version of the paper can be found here.

Abstract:

It is time that we reassess how we approach our empirical research in financial economics. Given the competition for top journal space, there is an incentive to produce “significant” results. With the combination of: unreported tests, lack of adjustment for multiple tests, direct and indirect p-hacking, many of the research results that we are publishing will fail to hold up in the future. In addition, there are some fundamental issues with the interpretation of statistical significance. Increasing thresholds, such as t > 3, may be necessary but such a rule is not sufficient. If the effect being studied is rare, even a rule like t > 3 will produce a large number of false positives. I take a step back and explore the meaning of a p-value and detail its limitations. I offer a simple alternative approach known as the minimum Bayes factor which delivers a Bayesian p-value. I present a list of guidelines that are designed to provide a foundation for a robust, transparent research culture in financial economics. Finally, I offer some thoughts on the importance of risk taking (from the perspective of both authors and editors) to advance our field.


About the Author:

Wes Gray
After serving as a Captain in the United States Marine Corps, Dr. Gray earned a PhD and worked as a finance professor at Drexel University. Dr. Gray’s interest in bridging the research gap between academia and industry led him to found Alpha Architect, an asset management firm that delivers affordable active exposures for tax-sensitive investors. Dr. Gray has published four books and a number of academic articles. Wes is a regular contributor to multiple industry outlets, including the Wall Street Journal, Forbes, ETF.com, and the CFA Institute. Dr. Gray earned an MBA and a PhD in finance from the University of Chicago and graduated magna cum laude with a BS from The Wharton School of the University of Pennsylvania.
  • Marc Gerstein

    Surprised you’d cite the five bullet points from the book by Swedroe/Berkin. I don’t know Berkin at all, but have had much opportunity to debate Swedroe and find him to be a horrific data miner and that he has absolutely no understanding of any aspect of financial theory.

  • Hi Marc,
    That might be true on the personal level, but pulling Swedroe/Berkin out of the picture, the framework still makes sense to try and separate what’s real vs. fake (I think the framework is actually from AQR originally).

  • re Bayes: Totally agree, but at least there are some bounds and a framework around inputting a prior on the null hypothesis. For example, one might find that stock names that have the letter D have incredible alpha, under the null of market efficiency. The raw p-value is .0001. Pretty unlikely that it was a random event. But a skeptic might say, “Yeah right, this is total BS. The odds are more like 10/1.” You could say, “Fine, let’s plug it in and adjust the p-value/Tstat using your prior”. You bust out the calculation and the p-value is .01. So still a 1% chance the null is true, given the 10/1 prior, and given the data reviewed. Still pretty dang compelling that a strategy that buys stocks with the letter D is a legit finding.

    Long story short. Yes, the “prior” concept adds a new element of potential “opinion,” but it also might help facilitate a deeper understanding of what the p-value actually means.

  • Also, a simpler way to think about it is that the “subjective” prior allows one to input their assessment of “data-mining” likelihood up front, whereas the normal p-value calc assumes that the sample was independent and not shown after trying 1,000 other ideas that didn’t work.

  • sixchickensleft

    Hi Wes

    Let’s break this down Barney style: Incentives matter & researchers have an incentive to come up with results that can make their sponsors money. The problem is by no means confined to econ or finance:

    http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970

    Like they say, the only thing worse than no data is bad data.

    Not having to at least assemble and massage some numbers into making sense, the humanities have their own set of problems as evidenced by the bizarre theories they come up with that sends Little Johnny to the girls room to sit down and pee.

    In most cases investors believe faulty conclusions because that’s the only part of the paper they read, conveniently bypassing anything involving methods or data. Maybe the researchers know this!

    Sierra Sierra Delta Delta

    Mike

  • rgr that.

  • An interesting follow on piece from Prof. Harvey
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2874625

  • RT1C

    Wes,
    On a related note, have you looked at Hou, Xue and Zhang’s new paper, “Replicating Anomalies”? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2961979&utm_medium=email&utm_source=newsletter&utm_term=170509&utm_campaign=moneystuff

    They take a similar approach to that which took the natural sciences by storm a dozen years ago. They cite Harvey but go beyond. Momentum survives but the returns are lower than published. Maybe that helps explain why so many factor funds are not doing all that well compared to what one might have expected.

    Any thoughts?

  • Yep, been working on SSL transition for all our sites so haven’t had a chance to grind on content. But we have a piece tomorrow that mentions this article and we are doing a full review on it as well. Should be out soon.

    All great questions and something we spend every day thinking about.

    I think the narrative that factor funds aren’t doing well because of “crowding” has a few major holes in it. For example, we already know that value and momentum can go on epic drags for multiple years that are similar to, if not worse than, what we’ve seen. So were there too many momentum factor chasers in the mid-1800s? I doubt it. Moreover, Blitz has a paper looking at the aggregate factor flows from ETFs, and there is no clear pattern. https://alphaarchitect.com/2017/02/17/will-etfs-destroy-factor-investing-nope/

    Arguably the only factors that have crowding are those associated with S&P 500 index stocks: mega-cap and beta factors. They have a lot of capacity, but they are also seeing epic flows like we’ve never seen before.

    In short, who the hell knows. I concur with the epic amount of data mining out there since I’ve seen it from the inside. However, I think the narrative that highly persistent anomalies driven by a mix of risk/mispricing — such as value and momentum — are ‘arbed away,’ may be suffering, ironically, from small sample bias.