OLS regression for augmented Dickey-Fuller test


Brandon Johnson

Jul 21, 2016, 3:26:54 PM
to pystatsmodels

I'm trying to obtain the halflife/lookback period for a simple mean reversion trading strategy. Previously I was using SciPy's stats package for the linear regression, but I wanted to get critical values of the test statistic out of the regression, so I switched to statsmodels OLS. I'm using statsmodels.tools.tools.add_constant to account for the intercept in my data. The problem is that SciPy gives me a negative slope for the regression, while statsmodels gives me a positive slope on the same data. For a mean-reverting time series, the slope of an Augmented Dickey-Fuller regression should be negative! Here is a sample of the price data (x):

2007-07-24  1.03660
2007-07-25  1.04180
2007-07-26  1.05360
2007-07-27  1.06390
2007-07-30  1.06556
2007-07-31  1.06680
2007-08-01  1.05600
2007-08-02  1.05300
2007-08-03  1.05210
2007-08-06  1.05170
2007-08-07  1.05510
2007-08-08  1.04850
2007-08-09  1.05705
2007-08-10  1.05310

And a sample of the price change data (y):

2007-07-24 -0.01016
2007-07-25  0.00520
2007-07-26  0.01180
2007-07-27  0.01030
2007-07-30  0.00166
2007-07-31  0.00124
2007-08-01 -0.01080
2007-08-02 -0.00300
2007-08-03 -0.00090
2007-08-06 -0.00040
2007-08-07  0.00340
2007-08-08 -0.00660
2007-08-09  0.00855
2007-08-10 -0.00395

This is the function that I'm calling to get my statistics:

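# Imports assumed from the names used below (sm, ts, log)
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
from numpy import log

def adf(x, y):
    # Simple container for the returned statistics
    class holder(object):
        pass
    results = holder()

    # GLS with no covariance matrix supplied gives the same estimates as OLS
    model = sm.GLS(y, sm.add_constant(x)).fit()
    coint = model.resid

    print(model.params)

    adfstat, pvalue, critvalues, res = ts.adfuller(coint, store=True, regresults=True)

    results.df = model.params[1]/model.bse[1]
    results.crit = res.critvalues
    results.slope = model.params[1]
    results.halflife = -log(2)/model.params[1]
    results.lookback = int(round(-log(2)/model.params[1]))
    results.coint = coint

    return results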

Interestingly, I'm getting a halflife of -116 days for this time series, where I expect a halflife of around 115 days (positive). This tells me the slope I'm getting from the regression is positive when it should be negative. Any thoughts? I've achieved the correct result using SciPy stats, but I switched over to get the critical values. You might ask why I'm not just using adfuller() for this: I also want to use this function to calculate, in one pass, the optimum hedge ratio and halflife of a cointegrated set of series, which I'm not sure adfuller() can do.

p.s. my data sample for regression is about 1216 data points long. If you want to test using the full set, use "ffn for python" to download financial data for ticker "CAD=x" from 2007-07-22 to 2012-03-28.

josef...@gmail.com

Jul 22, 2016, 4:09:34 AM
to pystatsmodels
I never looked at half life, and I don't remember all the details of ADF anymore.

However, the estimation is in first differences, with a lagged level as
regressor and additional lagged differences to capture the short-run
behavior.
So, from what I can see, the sign should be negative, i.e. if there is
a large positive value in the level, then the movement (diff) should be
down towards the mean. If there is a small or negative value, then the
movement should be up, i.e. in the positive direction.

Based on a quick look, Wikipedia seems to say the same:

https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test
See gamma < 0 as the alternative hypothesis (or gamma != 0 if two-sided).

Josef

josef...@gmail.com

Jul 22, 2016, 4:54:58 AM
to pystatsmodels
Also, based on reading the code, the final stored regression result
has the lagged level in the zero position:
adfstat = resols.tvalues[0]

add_trend defaults to prepend=False, so in the final regression the
constant and trend are at the end of the regressors.

Josef

