Published On: December 15th, 2022 | Categories: Research Insights, Guest Posts, AI and Machine Learning

This article summarizes research that compares machine learning models with traditional linear models in predicting the cross-section of emerging market stock returns.

  • Machine Learning and The Cross-Section of Emerging Market Stock Returns, November 28, 2022
  • Hanauer, Matthias, and Kalsbach, Tobias.
  • A version of this paper can be found here.

What is the paper about?

The paper compares how well different models predict the cross-section of emerging market stock returns. More specifically, it differentiates between:

  1. Traditional linear models (ordinary least squares regression and elastic net) and
  2. Machine learning methods that allow for non-linearities and interactions (tree-based models such as gradient boosted regression trees and random forests, and neural networks with one to five layers)
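To make the distinction concrete, here is a minimal sketch, not the authors' implementation. It fits a one-predictor least squares line and a tiny gradient-boosted ensemble of one-split trees ("stumps") to a made-up U-shaped relationship. The data, the function names, and the settings (`n_rounds`, `learning_rate`) are all assumptions chosen for illustration; the paper's models use many firm characteristics and are evaluated out of sample, which this toy in-sample comparison does not attempt.

```python
def avg(values):
    values = list(values)
    return sum(values) / len(values)

def fit_line(xs, ys):
    """One-predictor ordinary least squares; returns (intercept, slope)."""
    mx, my = avg(xs), avg(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def fit_stump(xs, targets):
    """Find the single split on x that minimizes squared error.

    Returns (threshold, left_mean, right_mean); x <= threshold goes left.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = avg(left), avg(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump is fit to the current residuals."""
    base = avg(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= th else rm for th, lm, rm in stumps)

def mse(ys, preds):
    return avg((y - p) ** 2 for y, p in zip(ys, preds))

# Hypothetical data: x stands in for one characteristic, y for a return that is U-shaped in x.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

intercept, slope = fit_line(xs, ys)
linear_mse = mse(ys, [intercept + slope * x for x in xs])

model = fit_boosted_stumps(xs, ys)
boosted_mse = mse(ys, [predict_boosted(model, x) for x in xs])

print(f"linear MSE:  {linear_mse:.4f}")
print(f"boosted MSE: {boosted_mse:.4f}")
```

Because the made-up relationship is symmetric around zero, the fitted line is essentially flat and explains nothing, while the boosted stumps approximate the curve step by step. This is only meant to show what "allowing for non-linearities" means mechanically.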

What are the main results? 

#1

Return forecasts based on machine learning models lead to economically and statistically superior out-of-sample long-short returns compared to traditional linear models. Additionally, the Fama and French (2018) six-factor model can only partly explain these long-short returns, and their alphas remain highly significant.  
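As a rough illustration of how a long-short return can be built from return forecasts, the sketch below ranks stocks by forecast, buys the top group, and shorts the bottom group with equal weights. The function name, the tickers, the numbers, and the `fraction` parameter are hypothetical, and the paper's exact sorting and weighting scheme may differ.

```python
def long_short_return(forecasts, realized, fraction=0.1):
    """Equal-weighted return of the top-forecast stocks minus the bottom-forecast stocks.

    forecasts, realized: dicts mapping ticker -> predicted / realized next-period return.
    fraction: share of stocks placed in each leg (an assumption; 0.1 would mean deciles).
    """
    ranked = sorted(forecasts, key=forecasts.get)  # ascending by forecast
    n_leg = max(1, int(len(ranked) * fraction))
    short_leg = ranked[:n_leg]
    long_leg = ranked[-n_leg:]
    long_ret = sum(realized[t] for t in long_leg) / n_leg
    short_ret = sum(realized[t] for t in short_leg) / n_leg
    return long_ret - short_ret

# Made-up forecasts and realized returns for ten hypothetical stocks.
forecasts = {"A": 0.03, "B": -0.02, "C": 0.01, "D": 0.05, "E": -0.04,
             "F": 0.00, "G": 0.02, "H": -0.01, "I": 0.04, "J": -0.03}
realized = {"A": 0.02, "B": -0.01, "C": 0.00, "D": 0.06, "E": -0.05,
            "F": 0.01, "G": 0.01, "H": 0.00, "I": 0.02, "J": -0.01}

spread = long_short_return(forecasts, realized, fraction=0.2)
print(f"long-short return: {spread:.4f}")
```

With these made-up inputs the long leg holds D and I and the short leg holds E and J. Repeating this calculation each period with out-of-sample forecasts would give the kind of long-short return series that is then compared across models.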

The results are hypothetical results and are NOT an indicator of future results and do NOT represent returns that any investor actually attained.  Indexes are unmanaged and do not reflect management or trading fees, and one cannot invest directly in an index.

#2

These findings are robust to several methodological choices and for emerging market sub-regions. Furthermore, the authors document that machine learning forecasts beat linear models consistently over the sample period, and one cannot observe a decline in predictability over time.

#3

Developed market long-short returns based on machine learning forecasts derived in the same way as their emerging market counterparts cannot explain emerging market out-of-sample returns. However, models estimated solely on developed market data predict emerging market stock returns nearly as well as an emerging market model. These findings indicate that similar relationships between firm characteristics and future stock returns exist for developed and emerging markets but that the pricing of these characteristics is not fully integrated between developed and emerging markets. Furthermore, these results indicate potential diversification benefits for investors applying such strategies in both developed and emerging markets.

#4

The authors also document that the high returns of the machine learning strategies in emerging markets do not primarily stem from higher-risk months and do not revert quickly, suggesting that an underreaction explanation is more likely than a risk-based explanation. Although both linear and machine learning models show higher predictability for stocks associated with higher limits-to-arbitrage, the effect is less pronounced for machine learning forecasts than for linear regression forecasts, indicating that the superiority of machine learning models in emerging markets does not stem from limits to arbitrage.

#5

After accounting for transaction costs, imposing short-selling constraints, and limiting the investment universe to big stocks only, the authors document that machine learning-based return forecasts lead to significant net excess returns and net alphas, at least when efficient trading rules are applied.
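The link between turnover and net returns can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's cost model: the proportional cost rate (`cost_rate`), the function names, and the example weights are hypothetical, and drift in weights between rebalancing dates is ignored.

```python
def traded_fraction(old_weights, new_weights):
    """Total fraction of portfolio value bought plus sold when rebalancing."""
    tickers = set(old_weights) | set(new_weights)
    return sum(abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers)

def net_return(gross_return, old_weights, new_weights, cost_rate=0.005):
    """Gross return minus a proportional cost on everything traded.

    cost_rate: assumed cost per unit of value traded (0.005 = 50 basis points).
    """
    return gross_return - cost_rate * traded_fraction(old_weights, new_weights)

# Hypothetical rebalance: half the portfolio moves from stock B to stock C.
old = {"A": 0.5, "B": 0.5}
new = {"A": 0.5, "C": 0.5}

net = net_return(0.02, old, new)
print(f"traded: {traded_fraction(old, new):.2f}, net return: {net:.4f}")
```

In this toy case a 2% gross return falls to 1.5% after costs. The same arithmetic shows why a trading rule that reduces how much is traded at each rebalance preserves more of the gross return.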

Conclusion 

The paper documents that return forecasts from machine learning methods lead to superior out-of-sample returns in emerging markets. Interestingly, investors who already apply such strategies in developed markets seem to gain potential diversification benefits from also applying them in emerging markets. The authors also investigate the source of the predictability and conclude that it stems from mispricing rather than from higher risk. Still, the superiority of machine learning models in emerging markets does not stem from limits to arbitrage. Finally, significant net returns can be achieved even after accounting for transaction costs, imposing short-selling constraints, and limiting the investment universe to big stocks only.

About the Author: Matthias Hanauer

Matthias Hanauer is a Senior Researcher and Director at Robeco’s Quant Equity Research team and a postdoctoral researcher at the Technical University of Munich (TUM). His areas of expertise include international factor premia and stock selection research. He has published his work in various peer-reviewed finance journals, including the Journal of Banking and Finance, Finance Research Letters, and the Journal of Portfolio Management. Matthias joined Robeco in February 2014 after submitting his doctoral dissertation. He holds a PhD in Finance (summa cum laude) and a Master’s in Business Administration from Technische Universität München and is a CFA® charterholder.

About the Author: Tobias Kalsbach

Tobias Kalsbach is a Research Associate and PhD candidate at the Technische Universität München (TUM). His areas of expertise include machine learning and network analysis. Tobias joined TUM in October 2018. He holds a Master’s in Business Administration from Technische Universität München.
