T+n time series analysis
This isn’t stock advice, it’s a data and math project. AAPL is interesting (not investing advice!) from a data perspective. Part of my big data project was the creation of a tool to analyze every symbol regressed on every other symbol for numerous lags. In addition to revealing the vendor’s mangling of data for unknown purposes, that analysis of lags produced at least one independent variable upon which the regression of future AAPL produced a .3 R-squared value. That is a weak or low effect size, but was a genuine discovery which passed hypothesis testing across time horizons. That was Direxion Daily S&P Oil & Gas Exp. & Prod. Bear 2X Shares (DRIP). DRIP was the first independent variable I discovered through the lag analysis. The problem is that the lag analysis is very computational intensive. Then, after doing that, ferreting out the red herrings because of bogus data takes another large amount of time. It seems that my data provider inserts bogus data producing .9 r^2 values between different vectors. These are of course problems that can one can mitigate with better code. That too, takes time. This isn’t a document about that first statistically significant predictor for AAPLt+3.
Keeping the t+ and t- in mind causes some difficulty for me. T+2 means today plus 2. If today is January 5, t+2 is January 7. If we are analyzing t+3, that is January 8. Programming in 0-indexed programming languages produced an inner impulse to count from 0 instead of 1. With zero indexing, the third option is 7 as shown in this example of a list [5, 6, 7, 8]. This is a heavily ingrained impulse that I must both use and mitigate. This project used Python and R and shell scripting.
The first 1 factor model devised using DRIP was as follows:
-2.065(DRIP Typical %change) -.0040 = AAPL Typical %changet+3
This model was tested in Stata using the data in manipulated in Python and R. This formula may be of no use in the future or even now. I might revisit this in discussing lags portion of the project later