What is a Bayesian solution? Good question. Bayesian statistics, named for Thomas Bayes, is a structured framework for updating the probability of an event as new data about that event becomes available. In the context of the infamous factor zoo in investing, Bayes’ rule gives investors a way to revise their beliefs about the likelihood that a stock will perform well as information evolves over time. Developers of quantitative factor strategies sorely need robust methods: it is tricky to improve the identification of the “best” factors while reducing the plague of false positives associated with data mining.
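As a minimal sketch of the updating idea described above, the snippet below applies Bayes' rule to a belief that a factor is genuine. The function name and all numbers (a 10% prior, an 80% chance of a supportive result if the factor is genuine, a 5% chance if it is not) are hypothetical choices for illustration, not values from the paper.

```python
def posterior_probability(prior, p_signal_given_true, p_signal_given_false):
    """Bayes' rule: P(true | signal) from a prior and two likelihoods."""
    evidence = p_signal_given_true * prior + p_signal_given_false * (1.0 - prior)
    return p_signal_given_true * prior / evidence


# Hypothetical inputs: start sceptical, then observe three supportive results.
belief = 0.10
history = [belief]
for _ in range(3):
    # Each posterior becomes the prior for the next piece of evidence.
    belief = posterior_probability(belief, 0.80, 0.05)
    history.append(belief)

# First update: 0.08 / (0.08 + 0.045) = 0.64
print([round(b, 4) for b in history])
```

Each pass through the loop treats the previous posterior as the new prior, which is the "revise beliefs as information evolves" step in code form.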

Bayesian Solutions for the Factor Zoo: We Just Ran Two Quadrillion Models

  • Svetlana Bryzgalova, Jiantao Huang, and Christian Julliard
  • Journal of Finance
  • A version of this paper can be found here

What are the research questions?

  1. What is the “factor zoo”?
  2. What are the key determinants of the framework’s performance?
  3. How does the Bayesian framework handle model selection and averaging?
  4. How are weak factors handled in the proposed framework?
  5. What was the answer to the best/weakest factor identification?

What are the Academic Insights?

  1. The term “factor zoo” refers to the large and growing number of potential factors that the empirical asset pricing literature has proposed as drivers of, or explanations for, equity returns. This proliferation creates a challenge for researchers and practitioners trying to identify which factors genuinely drive asset returns. Researchers in the field of factor investing need a better approach. The list of factors tested in the paper, with associated references, can be found in its Appendix B.
  2. The key determinants of the Bayesian framework’s performance in this study are:
    • Factor selection is accurate: The framework effectively identifies a small subset of factors likely to be part of the best model. This is crucial because it ensures that the model includes only those factors that have a significant impact on asset returns.
    • Effective shrinkage of factors by using spike-and-slab priors: The spike-and-slab prior is a type of prior distribution used in Bayesian statistics, particularly in variable selection and regression models. It is designed for situations where some coefficients are expected to be exactly zero (indicating irrelevant variables), while others can take non-zero values (indicating relevant variables). The prior places a high probability at zero for irrelevant factors (the spike) and assigns a normal distribution to relevant factors (the slab). As a consequence, relevant variables are separated from irrelevant ones: unnecessary coefficients are shrunk towards zero, while significant coefficients can be estimated more accurately. This prior effectively controls potential overfitting and helps ensure that the selected factors are genuinely influential.
    • Weak factors can be identified: The framework can detect weakly identified factors and manage their inclusion appropriately. By shrinking the coefficients of these factors towards zero (using spike-and-slab priors), the framework prevents weak factors from distorting the overall model.
    • Bayesian Model Averaging (BMA) provides optimal aggregation: BMA allows the framework to aggregate multiple models, weighing them based on their posterior probabilities. This aggregation leads to an averaged model that incorporates the strengths of various candidate models, improving the robustness and reliability of the predictions.
    • Model uncertainty is handled: BMA addresses model uncertainty by considering a wide range of potential models rather than relying on a single best model. This reduces the risk of overfitting and ensures that the final model is robust to different data scenarios.
    • In-sample and out-of-sample validation is conducted: The framework is validated both in-sample and out-of-sample, demonstrating its robustness and reliability. The BMA approach outperforms existing models in predicting asset returns, indicating its practical utility.
    • Cross-sectional and time-series data is tested: The framework performs well across both cross-sectional and time-series dimensions, highlighting its versatility and robustness in different analytical contexts.
    • Consistent with economic theory: The priors used in the framework are transparent and motivated by economic considerations, such as beliefs about the Sharpe ratio in the economy. This alignment ensures that the model selection process is not arbitrary but grounded in economic theory.
    • Overfitting is mitigated: The priors effectively control potential overfitting by imposing economically reasonable constraints on the model parameters.
  3. The Bayesian framework handles model selection and averaging through an approach that combines BMA, spike-and-slab priors, and posterior probabilities. The framework employs a spike-and-slab prior to handle weak factors, which helps to ensure that weakly identified factors are not selected by mistake. This prior shrinks the posterior distribution of weak factors towards zero, thus mitigating their impact on model selection. By focusing on cross-sectional performance and utilizing efficient computational techniques, the framework ensures robust and reliable model selection in the presence of many potential factors. This method not only identifies the best models but also aggregates information from multiple models to improve predictive performance and mitigate the risk of overfitting. The framework is designed to be numerically simple and computationally feasible, even for an extensive number of models. The authors demonstrate that their method can handle quadrillions of potential models within a reasonable time period, making it practical for real-world applications.
  4. Weak factors in asset pricing models are those that have little to no true covariance with asset returns, yet they may appear empirically relevant due to noise or limited sample size. These factors can distort model selection and inference, leading to unreliable results. The use of “spike-and-slab priors” effectively manages weak factors by shrinking their coefficients towards zero. This ensures that only genuinely influential factors are included in the model, enhancing the model’s reliability. By shrinking the posterior distributions of weak factors towards zero, the spike-and-slab prior ensures that these factors do not disproportionately affect the model selection process.
  5. A hypothetical illustration of the results:
    • Best Factors
      • Market (MKT)
      • Size (SMB)
      • Value (HML)
      • Momentum (MOM)
    • Weakest Factors
      • Industry-Specific Factors
      • Short-Term Technical Indicators
      • Highly Complex or Overfit Factors
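The spike-and-slab mechanism described in the insights above can be illustrated for a single coefficient. This is a textbook point-mass version under simplifying assumptions (one normally distributed estimate with a known standard error, a spike exactly at zero, a zero-mean normal slab), not the authors' specification; the function names and all inputs are hypothetical.

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal with the given variance, evaluated at x."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def spike_and_slab_posterior(estimate, std_error, slab_std, prior_inclusion=0.5):
    """Posterior inclusion probability and posterior mean for one coefficient.

    Assumed model: estimate ~ N(beta, std_error^2); beta is exactly 0 with
    probability 1 - prior_inclusion (spike), else beta ~ N(0, slab_std^2) (slab).
    """
    noise_var = std_error ** 2
    slab_var = slab_std ** 2
    like_spike = normal_pdf(estimate, noise_var)            # beta = 0
    like_slab = normal_pdf(estimate, noise_var + slab_var)  # beta integrated out
    numerator = prior_inclusion * like_slab
    inclusion = numerator / (numerator + (1.0 - prior_inclusion) * like_spike)
    # Within the slab, the estimate is shrunk by the usual normal-normal factor.
    shrink = slab_var / (slab_var + noise_var)
    return inclusion, inclusion * shrink * estimate


# Hypothetical inputs: same standard error, t-statistics of 1 and 5.
weak_inclusion, weak_mean = spike_and_slab_posterior(0.10, 0.10, slab_std=0.50)
strong_inclusion, strong_mean = spike_and_slab_posterior(0.50, 0.10, slab_std=0.50)
print(round(weak_inclusion, 3), round(weak_mean, 4))
print(round(strong_inclusion, 5), round(strong_mean, 4))
```

In this toy setting the weakly supported coefficient receives a low inclusion probability and is pulled strongly towards zero, while the well-supported one is left nearly untouched, which is the separation of relevant from irrelevant factors that the summary describes.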

Why does it matter?

Using Bayesian methods to construct factor strategies provides an opportunity to integrate new evidence with prior information while increasing robustness and reliability. In addition to the advantages discussed previously, users may expect risk management to improve as beliefs about estimation risk and market conditions are regularly updated. The authors propose a novel Bayesian framework for analyzing linear asset pricing models designed to handle issues associated with high-dimensional problems: (1) provide reliable price of risk estimates; (2) detect weakly identified factors; and (3) identify the best factor model, or provide a substitute, the Bayesian model averaging stochastic discount factor (BMA-SDF), when no best model can be identified. This approach offers a significant advancement in the empirical asset pricing literature, addressing key challenges posed by the factor zoo.
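To make the model-averaging idea concrete, here is a small sketch that enumerates every subset of four candidate regressors and weights the fitted models by an approximate posterior probability. It swaps in the standard BIC approximation with a uniform prior over models in place of the authors' spike-and-slab machinery, and it runs on simulated data; the function name `bma_linear`, the data-generating process, and the sample size are all assumptions made for illustration.

```python
import itertools

import numpy as np


def bma_linear(X, y):
    """Average OLS fits over every subset of the columns of X (BIC weights)."""
    n, k = X.shape
    masks, bics, coefs = [], [], []
    for bits in itertools.product([False, True], repeat=k):
        mask = np.array(bits, dtype=bool)
        design = np.column_stack([np.ones(n), X[:, mask]])  # intercept + subset
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        rss = float(resid @ resid)
        bics.append(n * np.log(rss / n) + design.shape[1] * np.log(n))
        full = np.zeros(k)
        full[mask] = beta[1:]  # excluded regressors keep a coefficient of zero
        masks.append(mask)
        coefs.append(full)
    bics = np.array(bics)
    # exp(-BIC/2) approximates each model's marginal likelihood.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    inclusion = weights @ np.array(masks, dtype=float)  # posterior inclusion
    averaged = weights @ np.array(coefs)                # model-averaged slopes
    return weights, inclusion, averaged


# Simulated data: columns 0 and 1 matter, columns 2 and 3 are pure noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(300)

weights, inclusion, averaged = bma_linear(X, y)
print(len(weights), np.round(inclusion, 3), np.round(averaged, 3))
```

Four candidates give 2^4 = 16 models, so brute-force enumeration is trivial here. The abstract's 2.25 quadrillion figure is consistent with 2^51 = 2,251,799,813,685,248, that is, every subset of 51 candidates (the factor count is inferred from the arithmetic, not stated in this summary), which is why exhaustive fitting of this kind would not scale to the paper's setting.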

The results are hypothetical results and are NOT an indicator of future results and do NOT represent returns that any investor actually attained.  Indexes are unmanaged and do not reflect management or trading fees, and one cannot invest directly in an index.

Abstract

We propose a novel framework for analyzing linear asset pricing models: simple, robust, and applicable to high-dimensional problems. For a (potentially misspecified) stand-alone model, it provides reliable price of risk estimates for both tradable and nontradable factors, and detects those weakly identified. For competing factors and (possibly nonnested) models, the method automatically selects the best specification—if a dominant one exists—or provides a Bayesian model averaging stochastic discount factor (BMA-SDF), if there is no clear winner. We analyze 2.25 quadrillion models generated by a large set of factors and find that the BMA-SDF outperforms existing models in- and out-of-sample.

About the Author: Tommi Johnsen, PhD

Tommi Johnsen is the former Director of the Reiman School of Finance and an Emeritus Professor at the Daniels College of Business at the University of Denver. She has worked extensively as a research consultant and investment advisor for institutional investors and wealth managers in quantitative methods and portfolio construction. She taught at the graduate and undergraduate levels and published research in several areas including: capital markets, portfolio management and performance analysis, financial applications of econometrics and the analysis of equity securities. In 2019, Dr. Johnsen published “Smarter Investing” with Palgrave/Macmillan, a top 10 in business book sales for the publisher.  She received her Ph.D. from the University of Colorado at Boulder, with a major field of study in Investments and a minor in Econometrics.  Currently, Dr. Johnsen is a consultant to wealthy families/individuals, asset managers, and wealth managers.

Important Disclosures

For informational and educational purposes only and should not be construed as specific investment, accounting, legal, or tax advice. Certain information is deemed to be reliable, but its accuracy and completeness cannot be guaranteed. Third party information may become outdated or otherwise superseded without notice.  Neither the Securities and Exchange Commission (SEC) nor any other federal or state agency has approved, determined the accuracy, or confirmed the adequacy of this article.

The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).
