adfuller - testing for cointegration

Stephan

Jul 3, 2015, 2:36:55 PM
to pystat...@googlegroups.com
I'm testing stock prices for cointegration by checking their spreads for mean reversion.

For this I used

statsmodels.tsa.stattools.adfuller(spreadseries).

I tested with

spreadseries = numpy.log10(numpy.cumsum(numpy.random.randn(50000))+1000)

which is a nonstationary series, and

results = statsmodels.tsa.stattools.adfuller(spreadseries, maxlag=None, regression='c', autolag='AIC')

I obtained a p-value of 0.06, which almost suggests that the series is stationary.

Why is that?

When I use adfuller like this:

results = statsmodels.tsa.stattools.adfuller(spreadseries, maxlag=None, regression='ct', autolag='AIC')

I get a p-value of 0.2, which indicates nonstationarity more clearly.

So I'm unsure how to use the test properly in my case.
Any help or ideas?


josef...@gmail.com

Jul 3, 2015, 3:03:56 PM
to pystatsmodels
Did you check more than one example?
Because this is random, we would need at least a small Monte Carlo to check whether there is any interesting or weird behavior.

My guess was that the log transform might mess up the test, but in one example I get large p-values with or without log10:

>>> p = (numpy.cumsum(numpy.random.randn(50000))+1000)
>>> import statsmodels.tsa.stattools
>>> results = statsmodels.tsa.stattools.adfuller(p, maxlag=None, regression='c', autolag='AIC')
>>> results
(-2.136242381506678, 0.23019320091444895, 8, 49991, {'5%': -2.8615978181014836, '10%': -2.5668007746632417, '1%': -3.4304808162631715}, 141879.18707858815)

>>> results_log10 = statsmodels.tsa.stattools.adfuller(numpy.log10(p), maxlag=None, regression='c', autolag='AIC')
>>> results_log10
(-2.0446687253882567, 0.26733715234622352, 8, 49991, {'5%': -2.8615978181014836, '10%': -2.5668007746632417, '1%': -3.4304808162631715}, -624405.95866750379)

>>> results_log10 = statsmodels.tsa.stattools.adfuller(numpy.log10(p), maxlag=10, regression='c', autolag='AIC')
>>> results_log10
(-2.0446687253882567, 0.26733715234622352, 8, 49991, {'5%': -2.8615978181014836, '10%': -2.5668007746632417, '1%': -3.4304808162631715}, -624984.43077686161)


BTW: If the time series is long, it's better to set maxlag explicitly, because the autolag search in adfuller still wastes memory.

Josef


Kevin Sheppard

Jul 5, 2015, 6:08:23 PM
to pystat...@googlegroups.com
> I obtained p value= 0.06, which does almost mean the series is stationary.

Under the null, 5% of nonstationary series should be classified as stationary at the 5% level. This is the size of the test (the probability of a type I error).

Stephan

Jul 6, 2015, 1:56:19 PM
to pystat...@googlegroups.com
I ran my code several more times and, voilà, I obtain the p-values I would expect for both stationary and nonstationary series.
And indeed, when I fix the maxlag argument, it runs much faster. However, the p-value obtained for the nonstationary series is quite sensitive to the maxlag setting.
I assume that in either case the regression parameter has to be set to regression='c', not 'ct'?