Predicting Stocks: Not a trivial matter!

Why is it so hard to predict stocks? As a little exercise, I decided to put my humble (and quite amateur) knowledge of Time Series Analysis to the test and attempted to build a prediction model for the closing stock price of a certain company, based on three months of historical data. What were the results?

The data 

The plot above shows the closing price of company X's stock. What do we see? Perhaps that it goes a little bit crazy, starting at around $41.30 on Mar 02 and dropping to a disastrous ~$33.00 on Mar 14. After this we notice a recovery, but still with volatility almost every day. On May 01, we see a drop of roughly $2.50 in price. So what can we conclude from all of this? Well... not much, really. Knowing all of this gives us some cues, but it doesn't really help us see into the future.

Fitting the trend

Of course, the first and most obvious step is to fit some models for the trend. While all of this looks very pretty, its purpose is not to give us predictions but to help us obtain a process without the trend; otherwise, the analysis becomes much harder than it already is. In this case, I decided to use an order-five polynomial fitted through linear regression, but you could have used something else (like the order-five moving average in yellow).
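
For readers who want to try the detrending step themselves, here is a minimal sketch in Python with NumPy. It is not the code behind the plots in this post: the function names are made up for illustration, the price series is synthetic, and rescaling the time axis is simply a precaution I assume is sensible for a fifth-order fit.

```python
import numpy as np

def polynomial_trend(prices, degree=5):
    """Least-squares polynomial trend fitted against the time index."""
    n = len(prices)
    # Rescale time to [-1, 1] so the high-order fit stays well conditioned.
    t = 2.0 * np.arange(n, dtype=float) / (n - 1) - 1.0
    coeffs = np.polyfit(t, prices, degree)
    return np.polyval(coeffs, t)

def moving_average(prices, window=5):
    """Moving average, returned only where the full window fits."""
    kernel = np.ones(window) / window
    return np.convolve(prices, kernel, mode="valid")

# Synthetic stand-in series (NOT the data used in this post).
rng = np.random.default_rng(0)
prices = 37.0 + np.cumsum(rng.normal(0.0, 0.8, size=42))

trend = polynomial_trend(prices)      # order-five polynomial trend
detrended = prices - trend            # what is left for the ARIMA step
smooth = moving_average(prices)       # order-five moving average, for comparison
```

The `detrended` series is what you would carry into the modelling step; the moving average is four points shorter than the input because the window cannot be centered on the first and last two days.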

Fitting an ARIMA model

After some experimentation, plus stationarity, causality, invertibility and other tests and analyses, ARIMA(1,1,0) turned out to be the most suitable model for this data (again, you could have obtained something different depending on your analysis!). Indeed, the residuals look zero-centered and, you could argue, more or less Gaussian, and the ACF values at every lag but one lie within 0.2 of zero.
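
To make the residual check a bit more concrete, here is a rough pure-Python sketch of what an ARIMA(1,1,0) fit boils down to: difference the series once, regress each change on the previous one, and look at the autocorrelation of what is left. This is an assumption-laden simplification (no drift term, plain least squares instead of maximum likelihood, synthetic data, made-up function names), so a proper statistics package may give slightly different numbers.

```python
import math
import random

def fit_arima_110(series):
    """Fit ARIMA(1,1,0) without drift by least squares.

    The differenced series is modelled as d[t] = phi * d[t-1] + e[t].
    Returns (phi, residuals).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den
    residuals = [cur - phi * prev for prev, cur in zip(diffs, diffs[1:])]
    return phi, residuals

def acf(values, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / var
        for k in range(1, max_lag + 1)
    ]

# Synthetic random-walk stand-in (NOT the data used in this post).
random.seed(1)
series = [37.0]
for _ in range(41):
    series.append(series[-1] + random.gauss(0.0, 0.8))

phi, residuals = fit_arima_110(series)
band = 1.96 / math.sqrt(len(residuals))  # rough 95% band for white noise
correlations = acf(residuals, max_lag=10)
outside = [lag for lag, r in enumerate(correlations, start=1) if abs(r) > band]

print(f"phi = {phi:.3f}, residual mean = {sum(residuals) / len(residuals):.3f}")
print(f"lags outside +/-{band:.2f}: {outside}")
```

If the model is adequate, the residual mean should sit near zero and few (ideally no) lags should land outside the band.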

Forecasting the future

For this exercise, I originally had 42 data points (that is, weekday data for 3 months; I leave the math to you), of which I used 32 for training and 10 for testing. The results after adjusting for the trend are shown in the graph above: the squiggly lines represent the original data points, while the almost straight line at the end represents the predictions from the model. They look pretty close to the real ones, eh? Here's the catch: notice the blue area and the bigger area around it; these are the 80% and 95% prediction intervals, respectively. They say that, if the model is right, the real value should fall inside that interval about 80% and 95% of the time, respectively. In this case, these are huge!!! In particular, the farther we predict into the future, the wider they become. Notice, for instance, the one for May 01: the 95% bounds there are roughly $34 and $15. In a real-world situation, a prediction this uncertain is practically useless; it is as good as guessing by eye what tomorrow's value will be (or even worse). The truth is, even domain professionals often have a hard time making these kinds of predictions. This shows just how hard it is to predict the stock market.
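
Why do the bands fan out like that? In an ARIMA(1,1,0) model every forecast step stacks another day of uncertainty on top of the previous ones, so the forecast variance can only grow. The sketch below shows the idea on a 32/10 split like the one described above. It is my own simplified illustration, not the code behind the graph: it assumes no drift, a least-squares fit, Gaussian errors and a synthetic random walk, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist

def forecast_arima_110(train, horizon, levels=(0.80, 0.95)):
    """Forecast an ARIMA(1,1,0) model (no drift) fitted by least squares.

    Returns one dict per step ahead: {"mean": m, 0.80: (lo, hi), 0.95: (lo, hi)}.
    """
    diffs = [b - a for a, b in zip(train, train[1:])]
    # AR(1) coefficient on the differences: d[t] = phi * d[t-1] + e[t].
    phi = (sum(c * p for p, c in zip(diffs, diffs[1:]))
           / sum(p * p for p in diffs[:-1]))
    residuals = [c - phi * p for p, c in zip(diffs, diffs[1:])]
    sigma2 = sum(e * e for e in residuals) / (len(residuals) - 1)

    z = {lvl: NormalDist().inv_cdf(0.5 + lvl / 2) for lvl in levels}
    level, step = train[-1], diffs[-1]
    psi, cum = 0.0, 0.0
    out = []
    for _ in range(horizon):
        step *= phi            # predicted next daily change
        level += step          # predicted next price
        psi = 1.0 + phi * psi  # psi weight: 1 + phi + ... + phi^(h-1)
        cum += psi * psi       # forecast variance accumulates with every step
        se = math.sqrt(sigma2 * cum)
        row = {"mean": level}
        for lvl in levels:
            row[lvl] = (level - z[lvl] * se, level + z[lvl] * se)
        out.append(row)
    return out

# Synthetic random-walk stand-in (NOT the data used in this post).
random.seed(2)
series = [37.0]
for _ in range(41):
    series.append(series[-1] + random.gauss(0.0, 0.8))

train, test = series[:32], series[32:]
forecasts = forecast_arima_110(train, horizon=len(test))
for actual, row in zip(test, forecasts):
    lo, hi = row[0.95]
    print(f"actual {actual:6.2f}  forecast {row['mean']:6.2f}  95%: [{lo:6.2f}, {hi:6.2f}]")
```

Running it prints the held-out values next to the forecasts, and the 95% range gets wider on every line, which is the same fanning-out effect described above.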


Although this exercise was based on very real data, as recent as May 06, 2020, it is a toy exercise and does not represent in any way a professional analysis or opinion. I don't recommend doing this kind of analysis on your own. If you wish to invest in the stock market, you should seek advice from a professional in the field.

