Non-Standard Errors

  • 164 authors
  • Working paper
  • A version of this paper can be found here
  • Want to read our summaries of academic finance papers? Check out our Academic Research Insight category

What are the research questions?

Most readers are familiar with p-hacking and the so-called replication crisis in financial research (see here, here, and here for differing views). Some claim that these research challenges are driven by a desire to find 'positive' results in the data, because positive results get published while negative results do not (the evidence backs these claims).

But this research project identifies and quantifies another potential issue with research: the researchers themselves! This "noise," created by differences in empirical techniques, programming language, data pre-processing, and so forth, is deemed "non-standard errors," which may add even more uncertainty to our quest to determine intellectual truth. Yikes!

In this epic study, the #fincap community delivers a fresh standardized dataset and a set of hypotheses to 164 research teams across the globe. The authors then try to identify the variation in the results due to differences in the researchers' approaches to tackling the problem.

The research questions the paper seeks to address are as follows:

  1. How large are non-standard errors for financial research?
  2. What explains the differences?
  3. Does peer-review minimize these differences?
  4. How aware are researchers of the differences?

What are the Academic Insights?

  1. How large are non-standard errors for financial research?
    1. Very large. On basic questions such as “How has market efficiency changed over time?” one gets a huge dispersion in ‘evidence-based’ insights.

  2. What explains the differences?
    1. Hard to say, but intuitive concepts such as team quality, process quality, and paper quality are examined. The evidence, collectively, seems to suggest it is hard to explain the research differences on the dimensions listed.

  3. Does peer-review minimize these differences?
    1. Yes, which reinforces why it is important to run a peer-review process and have people review your work with a critical eye.

  4. How aware are researchers of the differences?
    1. Not aware at all. But this is not surprising — we already know humans are generally confused, on average.

Why does it matter?

We already knew that data-mining was a problem in academic research, and researchers are working hard to fix this problem. However, this paper brings up a new source of variability: the researchers themselves! And the sad part is that all of these variations and biases embedded in research may not be tied to nefarious motives; they are simply part of the landscape and should be considered when reviewing academic research.

Of course, we should be clear that the takeaway is NOT to disregard academic research and the scientific approach to learning new things. Relying on intuition and gut feel is a process that will likely lead to even more bias and warped conclusions! So while academic research is not flawless, it's the best we've got. To me, the key data point from this paper is that we should reinforce peer-review processes and establish a research culture where criticism is applauded, not derided.

The most important chart from the paper


In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in sample estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that EGP variation across researchers adds uncertainty: non-standard errors. To study them, we let 164 teams test six hypotheses on the same sample. We find that non-standard errors are sizeable, on par with standard errors. Their size (i) co-varies only weakly with team merits, reproducibility, or peer rating, (ii) declines significantly after peer-feedback, and (iii) is underestimated by participants.
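The abstract above distinguishes sampling uncertainty (standard errors) from uncertainty introduced by researchers' analysis choices (non-standard errors). The purely illustrative Python sketch below makes the distinction concrete: it is not the paper's methodology, and the data, the "team choices" (outlier trimming and sample window), and all parameter values are made up for illustration.

```python
import random
import statistics

random.seed(7)

# One shared sample, mimicking the standardized dataset every team receives.
# (Values are simulated; the real study used actual market data.)
sample = [random.gauss(0.05, 1.0) for _ in range(1000)]

# Standard error of the mean for ONE fixed analysis path:
# uncertainty that comes from sampling alone.
se = statistics.stdev(sample) / len(sample) ** 0.5

def team_estimate(trim_frac, start_frac):
    """One hypothetical team's estimate: its own (defensible) outlier-trimming
    rule and its own choice of sample window, applied to the SAME data."""
    data = sample[int(len(sample) * start_frac):]   # sample-window choice
    data = sorted(data)
    k = int(len(data) * trim_frac)                  # outlier-trimming choice
    if k:
        data = data[k:len(data) - k]
    return statistics.mean(data)

# 164 hypothetical teams, each drawing its own analysis choices.
estimates = [team_estimate(random.uniform(0, 0.05), random.uniform(0, 0.3))
             for _ in range(164)]

# Non-standard error: dispersion of estimates ACROSS teams,
# even though every team analyzed the same sample.
nse = statistics.stdev(estimates)

print(f"standard error:     {se:.4f}")
print(f"non-standard error: {nse:.4f}")
```

Even in this toy setup, the cross-team dispersion is nonzero despite every team starting from identical data, which is the paper's core point: the evidence-generating process itself adds uncertainty on top of sampling error.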
