Order from Chaos: Data Science is Revolutionizing Investment Practice

  • Joseph Simonian, Marcos Lopez de Prado, and Frank Fabozzi
  • Journal of Portfolio Management, Fall 2018
  • A version of this paper can be found here

What are the Research Questions?

This editorial introduces data science to the wider investment community and highlights some of the advantages it can bring to everyday investment practice, along with the potential pitfalls discussed yesterday.

The paper answers two apparently simple questions:

  1. What is data science?
  2. How can data science help advance investing practice?

What are the Academic Insights?

  1. Data science is a field of study that combines the use of statistics and computing to discover or impose order in complex data to enhance informed decision-making. Machine learning, one branch of data science, comprises a family of computational techniques that facilitate the automated learning of patterns and the formation of predictions from data.  These algorithms are generally designed to solve one of two types of problems: a classification-type problem, in which the goal is to categorize data into different types, or a regression-type problem, in which the goal is to predict a quantity for a variable given the values for a set of predictor variables.
  2. Both types of problems are ubiquitous in finance, so machine learning can be viewed as a natural extension of investment practitioners’ existing toolset. Here is a concrete example: in trying to predict future returns for a restaurant stock, an analyst uses price momentum as a base trend signal. The signal is strong, but the analyst would like to supplement it with information on the number of patrons who have frequented the chain’s restaurants nationwide over the last 12 months. One proxy for patronage is the number of cars parked at the restaurants: if that number has been increasing over the last 12 months, it would seem to justify the strong price momentum observed in the market. Satellite imagery can supply those counts, but it is difficult to extract information from such “unstructured data sets”. This is where a machine learning technique helps: neural networks can be used to distinguish cars from non-cars (a classification problem).
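To make the two problem types concrete, here is a minimal Python sketch. It is not the authors' method, and every number in it is invented for illustration. The regression half fits a one-variable least-squares line from a hypothetical momentum score to a hypothetical return. The classification half labels image blobs as car or non-car with a simple nearest-centroid rule, which stands in for the neural network mentioned above only to keep the example short. The function names (`fit_line`, `fit_centroids`, `classify`) and the blob features (length and width in meters) are assumptions of this sketch.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_centroids(points, labels):
    """Nearest-centroid training: store the mean feature vector of each class."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(mean(column) for column in zip(*members))
        for label, members in groups.items()
    }


def classify(centroids, point):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, point))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Regression-type problem: predict a quantity (return) from a predictor (momentum).
# Hypothetical data, chosen so the fitted line is easy to check by hand.
momentum = [0.0, 1.0, 2.0, 3.0]
returns = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(momentum, returns)
predicted_return = intercept + slope * 4.0

# Classification-type problem: sort image blobs into "car" or "non-car".
# Hypothetical features: (length in meters, width in meters).
blobs = [(4.5, 1.8), (4.2, 1.7), (4.8, 1.9), (0.8, 0.6), (1.2, 0.9), (1.0, 0.8)]
labels = ["car", "car", "car", "non-car", "non-car", "non-car"]
centroids = fit_centroids(blobs, labels)

print(predicted_return)
print(classify(centroids, (4.4, 1.8)))
print(classify(centroids, (1.1, 0.7)))
```

In the restaurant example, the classifier's output (a count of blobs labeled as cars per image) would become one more predictor alongside momentum in the regression step. A real pipeline would need labeled imagery and a far more capable model than this toy rule.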

Why does it matter?

The authors conclude:

As the sophistication and power of computing continue to grow, data science will surely continue its march to the forefront of investment research and practice. The journey has just begun.

The Most Important Chart from the Paper

There was no chart in this paper.


Data are fundamental inputs to any applied scientific endeavor. Finance is no different. Yet for almost its entire existence as an organized field of inquiry, much of finance has relied almost exclusively on relatively primitive and rigid forms of data analysis to drive both investment theories and real-world portfolio management decisions. Over the years, it has become evident that the complexity of today’s markets, with a seemingly infinite amount of data being generated at rapid speeds, cannot be handled using the blunt mathematical arrows being deployed from the analytical quivers of most investment professionals. In response, the investment industry has begun to recognize the value and importance of the various methodological approaches that constitute what has come to be known as data science.
