Factor Timing Hypothesis

We can think of plausible reasons why past factor performance would influence future performance. Since we safely reside in fantasy land, let’s assume the value factor has had a good run recently (my sincerest condolences to Wes :-)). In this scenario, funds that employ the value strategy would see their AUMs increase through the combination of performance gains and an influx of new investors. The same funds would then invest more money into value stocks, lifting the prices of those stocks further. Of course, we can imagine the opposite scenario, where poor factor performance leads to continued poor results [cough][cough]value[/cough]. Using this intuition, we formulate our hypothesis: there is a sentiment state that affects whether factors will perform well in the future, and past factor performance influences that sentiment. We then design a machine learning algorithm that reflects this hypothesis.(1)

One obvious way to represent sentiment is to use a numerical variable: a positive number indicates positive sentiment, and vice versa. The sentiment’s evolution depends on two sources of data – past sentiment (because sentiment changes gradually) and recent performance. We can express this belief mathematically as follows:
new sentiment = [(sentiment retention variable) x (past sentiment)] + [(new performance influence variable) x (new performance)]

After we iteratively update the sentiment with past performance data, we can then attempt to predict future factor performance using the following formula:
expected factor performance = (sentiment scaling variable) x (final sentiment)

Now that we’ve set up the structure of the algorithm, our goal is to find the values of the sentiment retention, new performance influence, and sentiment scaling variables such that the expected factor performance aligns most closely with actual performance. How do we do that? This is where the magic of machine learning comes into play. We train machine learning models by feeding them historical data. For example, the data may indicate that monthly momentum factor performances were -0.6%, +1.0%, …, +0.2% from May 2015 to Apr 2018, and that the 3-month performance of the factor was +2.1% from May 2018 to Jul 2018. The machine learning model first tries some random numbers for the variables, calculates the expected factor performance using past performance data (i.e. May 2015 to Apr 2018), and then determines which variable values appear to have predicted future performance (i.e. May 2018 to Jul 2018) best. It then makes intelligent guesses to find better variable values that improve the predictions.
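To make the mechanics concrete, here is a minimal Python sketch of the two formulas above, together with the crudest possible “training” loop – trying random variable values and keeping whichever prediction lands closest to the realized performance. The return series and variable ranges below are made up for illustration; they are not the actual data or search procedure used in the study.

```python
import random

def expected_performance(returns, retention, influence, scaling):
    """Roll the sentiment state forward through past returns, then
    scale the final sentiment into an expected factor return.
    Variable names mirror the two formulas in the text."""
    sentiment = 0.0
    for r in returns:
        sentiment = retention * sentiment + influence * r
    return scaling * sentiment

# Hypothetical monthly factor returns (synthetic, for illustration only).
past_returns = [-0.006, 0.010, 0.004, -0.002, 0.008, 0.002]
actual_future = 0.021  # the realized 3-month performance we want to predict

# Naive random search: try random variable values, keep the best combination.
random.seed(0)
best, best_err = None, float("inf")
for _ in range(10_000):
    params = (random.uniform(-1, 1),    # sentiment retention, |.| <= 1
              random.uniform(-10, 10),  # new performance influence
              random.uniform(-10, 10))  # sentiment scaling
    err = abs(expected_performance(past_returns, *params) - actual_future)
    if err < best_err:
        best, best_err = params, err
```

A real optimizer would make the “intelligent guesses” described above rather than purely random ones, but the objective – shrinking the gap between predicted and actual performance – is the same.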
Bayesian Approach and Priors

There are two major approaches to finding the best variable values – Frequentist and Bayesian. The Frequentist approach is more common; it treats each variable as a single point estimate rather than a statistical distribution. In other words, it may assign a single value of ‘0.52’ to the sentiment retention variable, as opposed to a range between ‘0.3 and 0.7’. While the Frequentist approach is simpler, it’s more prone to overfitting (i.e. learning bad lessons from the data), particularly when working with small and/or noisy data sets. The Bayesian approach, on the other hand, models each variable as a statistical distribution.(2) In other words, it acknowledges that there’s some uncertainty in what the variable values should be. In addition to being more resistant to overfitting, the Bayesian approach allows us to incorporate ‘prior’ beliefs about the variables to help the machine learning model train better. As we’re dealing with a small, noisy data set, the Bayesian approach is favored for its flexibility. To use the Bayesian approach, we need to set a prior for each variable. Let me explain how I set them.
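The distinction is easiest to see on a toy one-variable problem. The sketch below (synthetic data, unrelated to the study) fits y = a·x + noise two ways: the Frequentist route returns a single best-fit number for a, while the Bayesian route combines a prior with the data to produce a whole distribution over a, summarized here by a mean and a standard deviation.

```python
import math
import random

random.seed(1)

# Toy problem: y = a * x + noise, with true a = 0.5 (synthetic data).
x = [random.gauss(0, 1) for _ in range(40)]
y = [0.5 * xi + random.gauss(0, 0.3) for xi in x]

# Frequentist: one best-fit number, via least squares.
a_point = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Bayesian: a whole distribution over 'a', approximated on a grid,
# combining a Normal(0, 1) prior with the data likelihood.
grid = [i / 200 - 1.0 for i in range(601)]   # candidate values -1.0 .. 2.0
def log_post(a):
    log_prior = -0.5 * a * a                 # Normal(0, 1) prior
    log_lik = -0.5 * sum((yi - a * xi) ** 2
                         for xi, yi in zip(x, y)) / 0.3 ** 2
    return log_prior + log_lik

lp = [log_post(a) for a in grid]
m = max(lp)                                  # subtract max for stability
w = [math.exp(v - m) for v in lp]
z = sum(w)
post = [wi / z for wi in w]                  # normalized posterior weights

a_mean = sum(a * p for a, p in zip(grid, post))               # posterior mean
a_sd = math.sqrt(sum((a - a_mean) ** 2 * p
                     for a, p in zip(grid, post)))            # uncertainty
```

The point estimate and the posterior mean land in roughly the same place here, but only the Bayesian version also reports how uncertain that value is – and the prior term is exactly where beliefs like the ones described next get encoded.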
- sentiment retention: This variable determines how much of the past sentiment carries over into the next month. I thought it reasonable to expect that the magnitude of the carried-over sentiment would decrease over time – memories tend to fade, not amplify. This led me to constrain the absolute value of the variable to be at most 1.
- new performance influence: I wasn’t sure what the effect of recent performance would be on the sentiment, but I didn’t believe the magnitude of the effect would be extremely large. For example, I didn’t think a +2% monthly performance would have a 10-times (i.e. +20%) magnitude effect on sentiment. To express this belief, I chose a normal distribution with a mean of 0 and a standard deviation of 5 for the prior, which leaves only about a 5% chance that the magnitude of the new performance influence exceeds 10.
- sentiment scaling: We’re trying to forecast the future quarterly return, and I believed the sentiment state should have a magnitude comparable to that of monthly returns, so a rough guess of 3 made sense for the variable’s value. I thought the variable was more likely to be positive than negative, but I wanted to allow for the possibility of a negative value. I also wanted to allow a wide range of values (since 3 is just a very rough guess), so I chose a normal distribution with a mean of 3 and a standard deviation of 10 as the prior.
- model uncertainty: We didn’t discuss this variable before, but it’s necessary for the Bayesian approach. Our model not only tries to predict future factor performance; it also models the uncertainty around those predictions by finding the right standard deviation around each prediction. I thought it reasonable to assume that the model uncertainty would be of similar magnitude to quarterly returns, which is roughly 8%. Standard deviation values must be positive, and since I assumed values above 20% were unlikely, I used a half-Cauchy distribution with a median of 20% as the prior for the model uncertainty.
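As a sanity check, the quantitative claims behind these priors can be probed numerically. The short sketch below is my own encoding of the stated priors using only the Python standard library: it verifies that a Normal(0, 5) prior leaves roughly a 5% chance of a magnitude above 10, that a Normal(3, 10) prior still leaves substantial probability on negative scaling values, and that a half-Cauchy distribution’s median equals its scale parameter, so a scale of 0.20 gives the stated 20% median.

```python
import math

def normal_tail(x, mu, sd):
    """P(X > x) for X ~ Normal(mu, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2)))

# new performance influence ~ Normal(0, 5):
# chance the magnitude exceeds 10, i.e. two standard deviations.
influence_tail = 2 * normal_tail(10, 0, 5)         # ~0.046, about 5%

# sentiment scaling ~ Normal(3, 10):
# positive values are favored, but negative values remain quite possible.
scaling_negative = 1 - normal_tail(0, 3, 10)       # ~0.38

# model uncertainty ~ half-Cauchy with scale 0.20:
# the median of a half-Cauchy equals its scale, giving the 20% median.
uncertainty_median = 0.20 * math.tan(math.pi / 4)  # = 0.20
```

These checks only confirm the internal consistency of the stated priors; the actual model would declare them inside a probabilistic programming framework rather than by hand.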