A critical task in stock selection is identifying a firm’s true profitability. Given AI’s capacity to process vast amounts of data, an important question is: Can AI outsmart seasoned analysts? Matthew Shaffer and Charles Wang sought to answer this question in their October 2024 paper “Scaling Core Earnings Measurement with Large Language Models,” in which they studied the application of large language models (LLMs) to the estimation of core earnings. They began by noting that the task requires “judgment and integration of information scattered throughout financial disclosures contextualized with general industry knowledge. This has become increasingly difficult as financial disclosures have become more ‘bloated’ and accounting standards have increased non-recurring impacts on GAAP net income. LLMs, with their ability to process unstructured text, incorporate general knowledge, and mimic human reasoning, may be well-suited for this kind of task.”
Shaffer and Wang developed a process to use LLMs to estimate core earnings from the annual 10-K filings of a large sample of U.S. public companies: scraping filings from EDGAR, converting the HTML filings to clean text, and making calls to OpenAI’s GPT-4o API. Their sample included roughly 2,000 U.S. companies over the 24-year period from 2000 to 2023. They employed LLMs with two prompting strategies:
A baseline “out of the box” (lazy) approach that provided only a definition of core earnings and the full 10-K. The LLM was asked to estimate core earnings and provide a rationale; no other guidance was given, in order to assess the model’s baseline performance. Here is the full prompt used:
“You are a financial analyst tasked with determining a company’s core earnings based on its 10-K filing. Core earnings represent the persistent profitability of the company’s central and ongoing activities, exclusive of ancillary items and one-time shocks. This concept aims to capture the owner’s earnings – the sustainable, recurring profitability that accrues to equity holders. Please analyze the provided 10-K text and estimate the company’s core earnings. Start with the reported GAAP net income and make adjustments you deem necessary based on the information in the 10-K. Provide a clear explanation of your reasoning for each adjustment. Additionally, to make it possible to extract your answer later, please include the following tag at the end of your response, after you finish your reasoning and calculation: “*Core Earnings Calculation (final) = $[your determination]” where [your determination] is the final core earnings amount you calculate.”
A structured “sequential” approach, refined through experiments, which instructed the model to identify unusual losses, then unusual gains, and then to tabulate and aggregate them. This approach involved three threaded API (Application Programming Interface) calls, each with its own prompt and response. An API is a set of rules and protocols that allows different software applications to communicate, acting as a bridge through which they exchange data, features, and functionality. The three calls are listed below, followed by a minimal sketch of how such a pipeline might be implemented.
- Call 1: Identification of Non-Recurring Losses/Expenses: You are an expert financial analyst with extensive experience. Here’s a 10-K. Are there any nonrecurring/unusual expenses in the income statement, cash flow statement, footnotes, or MD&A? Be comprehensive and check your work twice. [Cleaned 10-K Text Inserted Here]
- Call 2: Identification of Non-Recurring Gains/Income: Are there any nonrecurring/unusual income in the income statement, cash flow statement, footnotes, or MD&A? Be comprehensive and check your work twice.
- Call 3: Computation of Adjusted Earnings: Based on the above, compute a new earnings measure. Start with net income, add back nonrecurring/unusual expenses, subtract nonrecurring/unusual income. No hypothetical values. Express in $ Millions. Provide a summary table like this example:
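To make the mechanics concrete, here is a minimal Python sketch of how the scrape-clean-prompt workflow and the three threaded calls might be wired together. This is not the authors’ code: the use of OpenAI’s Python client, the "gpt-4o" model string, and the helper function names are assumptions for illustration, and the prompts are condensed from those quoted above.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumes the OpenAI Python client (pip install openai) and an API key
# available in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # assumed model identifier


def ask(messages, prompt):
    """Append a user prompt to the running thread and return the model's reply."""
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model=MODEL, messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})  # keep the thread
    return reply


def sequential_core_earnings(clean_10k_text):
    """Three threaded calls: unusual losses, unusual gains, then aggregation."""
    messages = [{"role": "system",
                 "content": "You are an expert financial analyst with extensive experience."}]
    # Call 1: identify non-recurring losses/expenses in the full filing text.
    ask(messages,
        "Here's a 10-K. Are there any nonrecurring/unusual expenses in the income "
        "statement, cash flow statement, footnotes, or MD&A? Be comprehensive and "
        "check your work twice.\n\n" + clean_10k_text)
    # Call 2: identify non-recurring gains/income (the filing is already in context).
    ask(messages,
        "Are there any nonrecurring/unusual income items in the income statement, "
        "cash flow statement, footnotes, or MD&A? Be comprehensive and check your work twice.")
    # Call 3: aggregate into an adjusted earnings measure.
    return ask(messages,
               "Based on the above, compute a new earnings measure. Start with net income, "
               "add back nonrecurring/unusual expenses, subtract nonrecurring/unusual income. "
               "No hypothetical values. Express in $ Millions. Provide a summary table.")
```

The key design point is that the three calls share one message thread, so the aggregation step in Call 3 works directly from the model’s own itemized answers to Calls 1 and 2 rather than re-reading the filing.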
They evaluated the models’ analyses by reviewing their stated reasoning process and analyzing their core earnings measures with an array of standard quantitative tests. Following is a summary of their key findings:
- Under the baseline approach, the LLM confused core earnings with other earnings-type measures, such as EBITDA and cash flow, and made incorrect adjustments for recurring expenses such as interest, depreciation, and amortization.
- The sequential approach provided more accurate earnings forecasts: it outperformed GAAP Net Income and Compustat’s Earnings Per Share from Operations (OPEPS) and Operating Income After Depreciation (OIADP) in predicting average future earnings in most standard tests.
- The sequential prompt core earnings measure was higher than GAAP Net Income (i.e., income-increasing) only 61% of the time, compared to 87% for Compustat’s OIADP and 64% for OPEPS.
- The sequential prompt measure exceeded GAAP Net Income by less than 10 cents per share at the median, versus 90 cents per share for OIADP. The mean absolute prediction error was $1.58 for the sequential measure, $1.77 for the GAAP Net Income benchmark, and $1.56 for Compustat’s OPEPS. In regressions predicting next-period net income, the explanatory power (R²) was 70.86% for the sequential approach, exceeding the 60.87% for the GAAP Net Income benchmark.
- When the prediction horizon was extended to two years, the sequential measure produced an R² of 83.60%, higher than the 66.57% for Compustat’s OPEPS.
- The LLMs were more effective at capturing the components of earnings that persist over the longer term: the Sequential Prompt Core Earnings per Share measure had the desired property that its adjustments were not persistent. Compustat’s OPEPS shared this property, while both the Lazy Analyst measure of core earnings and Compustat’s OIADP lacked it, having recurring adjustments.
- By breaking the forecasting task into small steps, the sequential approach was superior at avoiding conceptual errors—nonrecurring gains and losses were filtered out and then aggregated in an accurate manner.
- The sequential measure generated a core earnings forecast that reflected the stable components of profitability over time.
- The sequential model produced an autoregressive coefficient (reflecting the level of persistence) of .917, compared with the GAAP Net Income AR coefficient of .849, suggesting that its core earnings measure captured the stable components of profitability (the underlying regressions are sketched after this list). However, Compustat’s OPEPS and OIADP had slightly higher persistence values, with coefficients of 1.0174 and 1.0178, respectively. The authors concluded that the sequential approach still provided a competitive and valid measure of stability, capturing more meaningful elements of core profitability, especially when compared to GAAP Net Income.
- Addressing the question of whether the adjustments the LLM made to GAAP Net Income were truly nonrecurring, they found that the adjustments included some recurring elements and did not completely exclude transitory components. The estimated persistence coefficients of the adjustments were .0288 (statistically insignificant) for the sequential measure, .0759 (significant at the 5% level) for the “lazy analyst” measure, and .3125 (significant at the 1% level) for Compustat’s OIADP.
- The core earnings estimate derived from the LLM sequential prompt produced the lowest mean squared error and the highest average R².
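The statistical tests behind these bullets boil down to a few simple panel regressions. The notation below is a hedged reconstruction based on the description above; the paper’s exact specifications (scaling, controls, estimation windows) may differ.

```latex
% Predictive regression behind the R^2 comparisons: how well does this
% period's measure X explain next period's net income?
NI_{i,t+1} = \alpha + \beta \, X_{i,t} + \varepsilon_{i,t+1},
\qquad X \in \{\text{GAAP NI},\ \text{OPEPS},\ \text{OIADP},\ \text{sequential core earnings}\}

% Persistence (AR(1)) regression behind the .917 vs. .849 comparison:
% a coefficient near one indicates a stable, recurring earnings stream.
X_{i,t+1} = \alpha + \rho \, X_{i,t} + \varepsilon_{i,t+1}

% Persistence of the adjustments (X minus GAAP net income): for a
% well-constructed core earnings measure these should not persist,
% so the coefficient should be close to zero.
\bigl(X_{i,t+1} - NI_{i,t+1}\bigr) = \alpha + \rho_{\mathrm{adj}} \, \bigl(X_{i,t} - NI_{i,t}\bigr) + \eta_{i,t+1}
```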
Their findings led Shaffer and Wang to conclude:
“Models can fail and succeed in complex tasks of this nature. For researchers, we pave a path for using current and future models to generate valid, neutral, scalable measures of core earnings, rather than relying on surrogates provided by company management or standard data providers. Overall, our findings suggest LLMs have enormous potential for lowering the costs associated with processing and analyzing the increasingly bloated financial disclosures of publicly traded companies.”
They added:
“Our results offer empirical support for anecdotal claims that these models can fail when used ‘out of the box,’ on complex tasks without sufficient guidance; but can perform remarkably well when properly guided…. We believe the most distinctive and relevant application of future LLMs may be in tasks that, like ours, blend background knowledge, reasoning, integration of text, and judgment–tasks that mirror those of human knowledge workers.”
Before concluding, we need to review the findings of a related study, “From Man vs. Machine to Man + Machine: The Art and AI of Stock Analyses,” published in the October 2024 issue of the Journal of Financial Economics. The authors, Sean Cao, Wei Jiang, Junbo Wang, and Baozhong Yang, examined how AI performs compared to human analysts in predicting stock returns. They built their own AI model for 12-month stock return predictions (inferred from 12-month target prices), to be compared against analyst forecasts made at the same time on the same stock. As inputs, they collected firm-level, industry-level, and macroeconomic variables, as well as textual information from firms’ disclosures, news, and social media (updated to just before the time of each analyst forecast), deliberately excluding information from the analyst forecasts themselves so that the AI model did not benefit from analyst insights. (A minimal illustrative sketch of this kind of setup appears after the quotations below.) Their sample of analyst forecasts was built from the Thomson Reuters I/B/E/S analyst database. After merging I/B/E/S with CRSP and Compustat data, their final sample consisted of 1,153,565 12-month target price forecasts on 6,315 firms issued by 11,890 analysts from 861 brokerage firms, and 5,885,063 one-quarter to four-quarter earnings predictions on 8,062 firms issued by 14,363 analysts from 926 brokerage firms, covering the period from 1996 to 2018; their model spanned 2001-2018. Their results led them to conclude:
“Overall, this study supports the hypothesis that analyst capabilities could be augmented by AI, and more importantly, that analysts’ work possesses incremental value to and synergies with AI modeling, especially in unusual and fast-evolving situations.”
They added:
“While the future of AI remains uncertain, the parts of human skills that are incremental to AI, as we document, allow for promising Man + Machine collaboration and augmentation.”
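As flagged above, here is a minimal illustrative sketch of the kind of setup Cao, Jiang, Wang, and Yang describe: tabular firm-level, industry-level, macroeconomic, and text-derived features (measured just before the analyst’s forecast) feeding a standard machine-learning regressor that predicts the 12-month return, which is then compared with the return implied by the analyst’s target price. Every column name, the synthetic data, and the choice of scikit-learn’s GradientBoostingRegressor are assumptions for illustration; the paper’s actual models and features are not reproduced here.

```python
# Illustrative sketch only; not the Cao, Jiang, Wang, and Yang implementation.
# Assumes scikit-learn, numpy, and pandas are installed; all column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1_000  # synthetic firm-forecast observations for demonstration only

# Hypothetical predictors: firm-level, industry-level, macro, and text-based scores,
# all measured just before the analyst forecast date (no analyst information included).
features = pd.DataFrame({
    "firm_momentum": rng.normal(size=n),
    "firm_book_to_market": rng.normal(size=n),
    "industry_return": rng.normal(size=n),
    "macro_term_spread": rng.normal(size=n),
    "disclosure_sentiment": rng.normal(size=n),
    "news_sentiment": rng.normal(size=n),
})
future_12m_return = rng.normal(size=n)  # placeholder target (realized 12-month return)

# Train on an earlier window, predict on a later one (walk-forward in the real setting).
train_X, test_X = features.iloc[:800], features.iloc[800:]
train_y, test_y = future_12m_return[:800], future_12m_return[800:]
model = GradientBoostingRegressor(random_state=0)
model.fit(train_X, train_y)
ai_forecast = model.predict(test_X)

# The study's comparison: is the AI forecast or the analyst's target-price-implied
# return closer to the realized return for the same stock at the same time?
analyst_implied_return = rng.normal(size=len(test_y))  # placeholder for I/B/E/S-derived values
ai_error = np.abs(ai_forecast - test_y)
analyst_error = np.abs(analyst_implied_return - test_y)
print(f"AI forecast wins on {np.mean(ai_error < analyst_error):.0%} of forecasts (synthetic data)")
```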
Investor Takeaways
Shaffer and Wang demonstrated that the use of LLMs could allow investors to access more reliable earnings metrics without relying solely on expensive proprietary databases or specialized financial expertise. In addition, since the LLMs produced more reliable forecasts of earnings, and the tool is widely available, their use should make the market more efficient. The takeaway for investors is that LLMs seem likely to make active security selection even more of a loser’s game than it already is. AI provides yet another reason why alpha is getting harder and harder to generate. For an in-depth discussion of the explanations for “The Incredible Shrinking Alpha,” I recommend Andrew Berkin’s and my book with that title.
Larry Swedroe is the author or co-author of 18 books on investing, including his latest Enrich Your Future.
About the Author: Larry Swedroe
Important Disclosures
For informational and educational purposes only and should not be construed as specific investment, accounting, legal, or tax advice. Certain information is deemed to be reliable, but its accuracy and completeness cannot be guaranteed. Third party information may become outdated or otherwise superseded without notice. Neither the Securities and Exchange Commission (SEC) nor any other federal or state agency has approved, determined the accuracy, or confirmed the adequacy of this article.
The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).