News is now data. But how is this data associated with changes in stock market returns and risks, and is there predictive power in the news via the words used?
This innovative paper asks and answers nine important questions about the interrelationship of news and stock market outcomes.
How should one best measure news using word flow?
Which aspects of word flow should be the focus of measurement?
How can we capture changes over time of the patterns that link frequency, topics, sentiment, and entropy measures of word flow with market outcomes?
Given the potential importance of identifying topical context, how should one identify topics?
Does the effect of our word flow measures operate through a risk channel?
How should one measure risk?
Do empirical patterns that apply to individual company stocks or the aggregate U.S. index also apply to other countries?
What source of news should one use?
Over what time frame should word flow predict risk and return?
What are the Academic Insights?
According to the authors, there are two major methods: (a) an atheoretical approach that takes no a priori position on which particular words should be the focus of the analysis, and (b) identification of key lists of words or combinations of keywords (based on a priori criteria) to see how their presence matters for market outcomes. This paper uses the former, which does not require researchers to know in advance which aspects of word flow are most relevant and is less prone to data mining issues.
The authors suggest focusing, at a minimum, on four aspects: sentiment (positive or negative) based on a preidentified dictionary; the frequency of appearance of certain words; the unusualness (entropy) of word strings; and the context in which words appear (topics). Context is important because sentiment matters differently depending on the topic.
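As a rough illustration of the sentiment and unusualness measures described above, the sketch below scores a tokenized article. The word lists, the function names, and the add-one-smoothed n-gram estimate are all assumptions made for illustration; the paper's dictionary and its exact entropy definition are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical mini-dictionaries; a real study would use a published sentiment lexicon.
POSITIVE = {"gain", "growth", "strong"}
NEGATIVE = {"loss", "default", "weak"}


def sentiment(tokens):
    """Net share of positive minus negative dictionary words in the article."""
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def ngrams(tokens, n):
    """All consecutive word strings of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unusualness(tokens, reference, n=2):
    """Average negative log2 probability of each n-gram's last word given its
    prefix, estimated from a reference corpus with add-one smoothing.
    Higher values mean the word strings are less familiar."""
    full = Counter(ngrams(reference, n))
    prefix = Counter(ngrams(reference, n - 1))
    vocab = len(set(reference)) or 1
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    total = 0.0
    for g in grams:
        p = (full[g] + 1) / (prefix[g[:-1]] + vocab)
        total -= math.log2(p)
    return total / len(grams)


reference = "the bank reported strong growth the bank reported strong gain".split()
familiar = "the bank reported strong growth".split()
scrambled = "growth bank the strong reported".split()

print(sentiment(["strong", "growth", "despite", "loss"]))
print(unusualness(familiar, reference), unusualness(scrambled, reference))
```

In this toy setup the scrambled phrase scores as more unusual than the familiar one, which is the kind of signal an entropy-style measure is meant to pick up.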
The authors use principal component analysis to identify a dividing point in the sample. They present results for the entire sample period (1998-2015) and for two subperiods (April 1998-February 2007 and March 2007-December 2015). They also use a rolling elastic net regression to allow the coefficients to change over time.
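To make the idea of a rolling elastic net concrete, here is a minimal pure-Python sketch, assuming centered data with no intercept. The coordinate-descent fit, the window length, the penalty settings, and the toy data are all illustrative assumptions, not the authors' specification. The objective follows the convention used by scikit-learn's `ElasticNet`, which is what one would more likely use in practice.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes no intercept (centered data)."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][m] * w[m] for m in range(k) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + alpha * (1 - l1_ratio)
            w[j] = soft_threshold(rho, alpha * l1_ratio) / denom if denom > 0 else 0.0
    return w


def rolling_elastic_net(X, y, window, **kwargs):
    """Refit on each trailing window so the coefficients can drift over time."""
    return [
        elastic_net(X[t - window:t], y[t - window:t], **kwargs)
        for t in range(window, len(X) + 1)
    ]


# Toy data: the true coefficient on the first feature flips from 2 to -1 halfway.
x0 = [1, -1, 2, -2, 1, -1, 2, -2]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
X = [[a, b] for a, b in zip(x0, x1)]
y = [2 * a for a in x0[:4]] + [-1 * a for a in x0[4:]]

coefs = rolling_elastic_net(X, y, window=4, alpha=0.0)
print(coefs[0], coefs[-1])
```

With the penalty switched off (`alpha=0.0`) the first and last windows recover the two regimes; raising `alpha` shrinks coefficients toward zero and eventually drops them entirely.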
Within the set of atheoretical means of identifying topics, there are two common methods: the Louvain method (Blondel et al., 2008), in which each word belongs to only one topic area, and latent Dirichlet allocation (LDA; see Blei, Ng, and Jordan, 2003), in which words can appear in more than one topic area. The authors focus on the first, which has the advantage of being computationally faster.
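The Louvain method works by greedily searching for a partition of a network that maximizes modularity. The sketch below does not implement the full algorithm; it only scores a given assignment of words to topics on a small word co-occurrence graph, to show what "each word belongs to only one topic area" is optimizing. The graph, the words, and the function name are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected weighted graph.

    edges: list of (word_u, word_v, weight), each undirected edge listed once.
    community_of: dict mapping each word to exactly one topic label.
    """
    total_weight = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # edge weight inside each topic
    degree = defaultdict(float)    # total degree of the words in each topic
    for u, v, w in edges:
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Hypothetical co-occurrence graph: two tight word clusters joined by one link.
edges = [
    ("rate", "bank", 1.0), ("bank", "loan", 1.0), ("rate", "loan", 1.0),
    ("election", "vote", 1.0), ("vote", "party", 1.0), ("election", "party", 1.0),
    ("loan", "election", 1.0),
]
two_topics = {"rate": "credit", "bank": "credit", "loan": "credit",
              "election": "politics", "vote": "politics", "party": "politics"}
one_topic = {word: "all" for word in two_topics}

print(modularity(edges, two_topics), modularity(edges, one_topic))
```

Splitting the words into the two natural clusters scores higher than lumping everything into one topic, which is the comparison a Louvain-style search repeats many times as it moves words between communities.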
Yes, the authors find that when a word flow measure predicts positive expected returns, it also predicts a reduction in risk. This suggests that the factors captured by news flow are not priced risks.
To capture risk, in addition to using the standard deviation of returns, the authors also employ the “maximum one-year drawdown.” This measures, at any point in time, the maximum percentage decline that occurs from the current index value during the next year. This measure also is intended to capture the fact that “downside risk” may be treated differently from “upside risk” (the standard deviation of returns treats them as identical).
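The drawdown definition above can be sketched in a few lines. This is a minimal illustration assuming a simple list of index levels and a horizon expressed as a number of observations; the function name, the handling of the final incomplete windows, and the toy prices are assumptions, not the authors' code.

```python
def forward_max_drawdown(prices, horizon):
    """For each date t, the largest percentage decline from prices[t]
    observed over the next `horizon` observations.
    Returns None where fewer than `horizon` future observations exist."""
    result = []
    for t, start in enumerate(prices):
        window = prices[t + 1:t + 1 + horizon]
        if len(window) < horizon:
            result.append(None)  # not enough future data to measure
            continue
        lowest = min(window)
        # If the index never falls below today's level, the drawdown is zero.
        result.append(max(0.0, (start - lowest) / start))
    return result


prices = [100, 90, 95, 80, 120, 110]
drawdowns = forward_max_drawdown(prices, horizon=2)
print(drawdowns)
```

With daily data, a one-year horizon would be roughly 252 observations. Unlike the standard deviation, this measure ignores how far the index rises and records only how far it falls.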
No, there is ample evidence in the literature that they do not. For this reason, the authors divide countries into emerging markets (EMs) and developed markets (DMs) and perform separate panel analyses of each group of countries. They look at a total of 51 countries from 1998 to 2015.
In this specific case, and given the global focus of the analysis, the authors looked for an English-language source and decided to use the Thomson Reuters news database.
Much of the existing finance literature on the effects of sentiment on individual stocks' returns has focused on high-frequency predictions. However, recent studies, including the one reviewed here, find that it can be useful to aggregate over longer periods of time when analyzing news.
Why does it matter?
News and the word choice within it are relatively new data sets to examine for predictive power in markets. It is possible that the words we use carry hidden meanings and correlations that we would never have noticed without the aid of a computer. Five interesting findings from this paper are as follows.
The plots for EMs and DMs are quite similar for all the topical categories. One noteworthy aspect of the event studies is that news events appear to cause more of a market reaction in the DM sample than in the EM sample. This can reflect either more timely reporting by Reuters in their developed market news bureaus or information leakage (perhaps due to weaker regulatory enforcement) in EM economies.
The nature of news, and the range of potential news outcomes, differ in EMs and DMs (reflecting important differences in the political and economic environments, which are reflected in returns outcomes).
The news contained in the text flow measures studied forecasts one-year-ahead returns and drawdowns. One interpretation of this finding is that word flow captures "collective unconscious" aspects of news that are not understood at the time articles appear but that capture influences on the market that have increasing relevance over time.
Principal components analysis of topic areas suggests a possible change in coefficient values occurs during the onset of the global financial crisis.
Word flow measures tend to have greater incremental predictive power (measured in terms of the percentage improvement in R-squared) for understanding returns and risks in EMs, although they also have important incremental predictive power for returns and drawdowns in DMs.
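The "percentage improvement in R-squared" used to gauge incremental predictive power can be computed as below. This is a small sketch with made-up numbers: the baseline and augmented predictions are hypothetical stand-ins for a model without and with word flow measures, and the function names are illustrative.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def pct_improvement(r2_base, r2_full):
    """Percentage gain in R-squared from adding predictors to a baseline model."""
    return 100 * (r2_full - r2_base) / r2_base


actual = [1, 2, 3, 4]
baseline_pred = [2, 2, 3, 3]  # hypothetical model without word flow measures
with_news_pred = [1, 2, 3, 3]  # hypothetical model with word flow measures

r2_base = r_squared(actual, baseline_pred)
r2_full = r_squared(actual, with_news_pred)
print(r2_base, r2_full, pct_improvement(r2_base, r2_full))
```

Because the improvement is measured relative to the baseline, the same absolute gain in R-squared counts for more where the baseline model explains less.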
The Most Important Chart from the Paper:
We develop a classification methodology for the context and content of news articles to predict risk and return in stock markets in 51 developed and emerging economies. A parsimonious summary of news, including topic-specific sentiment, frequency, and unusualness (entropy) of word flow, predicts future country-level returns, volatilities, and drawdowns. Economic and statistical significance is higher and larger for the year ahead than monthly predictions. The effect of news measures on market outcomes differs by country type and over time. News stories about emerging markets contain more incremental information. Out-of-sample testing confirms the economic value of our approach for forecasting country-level market outcomes.