I'm trying to obtain the half-life/lookback period for a simple mean-reversion trading strategy. Previously, I was using SciPy's stats package to do my linear regression. However, I wanted to be able to get critical values of the test statistic out of the regression, so I opted for statsmodels OLS. I'm using statsmodels.tools.tools.add_constant to account for the intercept in my data. The problem is that with SciPy (same data) I've been getting a negative slope for the regression. With statsmodels I'm getting a positive value. For a mean-reverting time series, the slope of an Augmented Dickey-Fuller regression should be negative!

Here is a sample of price data (x):
2007-07-24 1.03660
2007-07-25 1.04180
2007-07-26 1.05360
2007-07-27 1.06390
2007-07-30 1.06556
2007-07-31 1.06680
2007-08-01 1.05600
2007-08-02 1.05300
2007-08-03 1.05210
2007-08-06 1.05170
2007-08-07 1.05510
2007-08-08 1.04850
2007-08-09 1.05705
2007-08-10 1.05310
And a sample of the price change data: (y)
2007-07-24 -0.01016
2007-07-25 0.00520
2007-07-26 0.01180
2007-07-27 0.01030
2007-07-30 0.00166
2007-07-31 0.00124
2007-08-01 -0.01080
2007-08-02 -0.00300
2007-08-03 -0.00090
2007-08-06 -0.00040
2007-08-07 0.00340
2007-08-08 -0.00660
2007-08-09 0.00855
2007-08-10 -0.00395
This is the function that I'm calling to get my statistics:
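import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
from math import log

def adf(x, y):
    # Empty container for the results
    class holder(object):
        pass
    results = holder()
    # Regress y (price change) on x (price) plus a constant
    model = sm.GLS(y, sm.add_constant(x)).fit()
    coint = model.resid
    print(model.params)
    adfstat, pvalue, critvalues, res = ts.adfuller(coint, store=True, regresults=True)
    # params[1] is taken to be the slope on x
    results.df = model.params[1] / model.bse[1]
    results.crit = res.critvalues
    results.slope = model.params[1]
    results.halflife = -log(2) / model.params[1]
    results.lookback = int(round(-log(2) / model.params[1]))
    results.coint = coint
    return results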
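For comparison, a minimal library-free sketch of the regression described above can serve as a cross-check on the sign of the slope. It assumes the price change is regressed on the previous day's price with an intercept. The helper names (`ols_slope_intercept`, `half_life`) and the synthetic series are hypothetical, not part of SciPy or statsmodels, and the series is not the CAD=x data.

```python
from math import log

def ols_slope_intercept(x, y):
    """Closed-form simple linear regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def half_life(prices):
    """Regress the price change on the previous price; return (slope, half-life)."""
    lagged = prices[:-1]                                  # x[t-1]
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])]  # x[t] - x[t-1]
    slope, _ = ols_slope_intercept(lagged, changes)
    return slope, -log(2) / slope

# Synthetic, noise-free mean-reverting series (assumption for illustration):
# each step closes 10% of the gap to a long-run mean of 1.0.
prices = [2.0]
for _ in range(50):
    prices.append(prices[-1] - 0.1 * (prices[-1] - 1.0))

slope, hl = half_life(prices)
print(slope, hl)
```

On a mean-reverting series like this one the slope comes out negative and the half-life positive, so a positive slope from the same inputs would point at how the regressors are aligned or which coefficient is being read, rather than at the half-life formula.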
What's interesting is that I'm getting a half-life of -116 days for this time series. The result I'm looking to obtain is a half-life of around 115 days (positive). So this tells me the slope I'm getting in the regression is positive, when it should be negative. Any thoughts? I've achieved the correct result using SciPy stats; I only switched over to get the critical values. You might ask why I'm not just using adfuller() for this. It's because I also want to use this function to calculate, in one pass, the optimum hedge ratio and half-life of a cointegrated set of series, which I'm not sure you can do with adfuller().
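The link between the sign of the slope and the sign of the half-life can be checked with a few lines of arithmetic. This sketch simply inverts the half-life formula used in the function above; the helper name `slope_from_half_life` is hypothetical.

```python
from math import log

def slope_from_half_life(half_life_days):
    """Invert half_life = -log(2) / slope to recover the implied slope."""
    return -log(2) / half_life_days

observed_slope = slope_from_half_life(-116)  # slope implied by the -116 day result
expected_slope = slope_from_half_life(115)   # slope implied by the hoped-for 115 days

print(observed_slope)  # positive, roughly 0.00598
print(expected_slope)  # negative, roughly -0.00603
```

The two slopes are close in magnitude and opposite in sign, which is consistent with the regression picking up nearly the right coefficient with a flipped sign.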
P.S. My data sample for the regression is about 1216 data points long. If you want to test using the full set, use "ffn for python" to download financial data for the ticker "CAD=x" from 2007-07-22 to 2012-03-28.