adfuller - testing for cointegration

Stephan

Jul 3, 2015, 2:36:55 PM

to pystat...@googlegroups.com

I'm testing stock prices for cointegration by checking their spreads for mean reversion.

For this I used

statsmodels.tsa.stattools.adfuller(spreadseries).

I tested with

spreadseries = numpy.log10(numpy.cumsum(numpy.random.randn(50000))+1000),

which is a nonstationary series, and

results = statsmodels.tsa.stattools.adfuller(spreadseries, maxlag=None, regression='c', autolag='AIC')

I obtained a p-value of 0.06, which almost suggests the series is stationary.

Why is that?

When I use adfuller like this:

results = statsmodels.tsa.stattools.adfuller(spreadseries, maxlag=None, regression='ct', autolag='AIC')

I get a p-value of 0.2, which indicates nonstationarity more clearly.

So I'm unsure how to use the test properly in my case.
Any help or ideas?

josef...@gmail.com

Jul 3, 2015, 3:03:56 PM

to pystatsmodels

Did you check more than one example?

Because this is random, we would need at least a small Monte Carlo to check whether there is any interesting or weird behavior.

My guess was that the log transform might mess up the test, but in one example I get large p-values with or without log10.

>>> import numpy

>>> import statsmodels.tsa.stattools

>>> p = (numpy.cumsum(numpy.random.randn(50000))+1000)

>>> results = statsmodels.tsa.stattools.adfuller(p, maxlag=None, regression='c', autolag='AIC')

>>> results

(-2.136242381506678, 0.23019320091444895, 8, 49991, {'5%': -2.8615978181014836, '10%': -2.5668007746632417, '1%': -3.4304808162631715}, 141879.18707858815)

>>> results_log10 = statsmodels.tsa.stattools.adfuller(numpy.log10(p), maxlag=None, regression='c', autolag='AIC')

>>> results_log10

(-2.0446687253882567, 0.26733715234622352, 8, 49991, {'5%': -2.8615978181014836, '10%': -2.5668007746632417, '1%': -3.4304808162631715}, -624405.95866750379)

>>> results_log10 = statsmodels.tsa.stattools.adfuller(numpy.log10(p), maxlag=10, regression='c', autolag='AIC')

>>> results_log10

(-2.0446687253882567, 0.26733715234622352, 8, 49991, {'5%': -2.8615978181014836, '10%': -2.5668007746632417, '1%': -3.4304808162631715}, -624984.43077686161)

BTW: If the time series is long, it's better to set maxlag explicitly, because the autolag search in adfuller still wastes memory.

Josef

Kevin Sheppard

Jul 5, 2015, 6:08:23 PM

to pystat...@googlegroups.com

> I obtained a p-value of 0.06, which almost suggests the series is stationary.

Under the null, 5% of nonstationary series will be classified as stationary when testing at the 5% level. This is the size of the test (the probability of a type I error).

Stephan

Jul 6, 2015, 1:56:19 PM

to pystat...@googlegroups.com

I ran my code several more times and, voila, I now obtain the p-values I would expect for both stationary and nonstationary series.
And indeed, when I fix maxlag, it runs much faster. However, the p-value obtained for the nonstationary series is quite sensitive to the maxlag setting.
I assume that in either case the regression parameter has to be set to regression='c', not 'ct'?
