tsa plot_acf: is it possible to test if autocorrelations are zero on the graph?

nak3...@gmail.com

Jan 12, 2017, 3:08:40 PM
to pystatsmodels
A standard statistical test to check if an autocorrelation is equal to 0 at the 95% confidence level is to see if the magnitude of the autocorrelation is greater than 1.96/sqrt(T), where T is the number of data points you have. Is it possible to have the 1.96/sqrt(T) threshold plotted on the plot_acf graph? At first I thought that's what the blue shaded region was, but if you look at the example in the tutorial here you'll see that the blue threshold varies with the lag.
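
As a rough illustration of the rule described above, the following sketch computes the sample autocorrelations and flags the lags whose magnitude exceeds 1.96/sqrt(T). The helpers `sample_acf` and `white_noise_band` are hypothetical names written for this example, not statsmodels functions, and simulated Gaussian noise stands in for real data.

```python
import math
import random


def sample_acf(x, nlags):
    """Sample autocorrelations r(0), ..., r(nlags), using the 1/n normalisation."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [
        sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
        for k in range(nlags + 1)
    ]


def white_noise_band(nobs, z=1.96):
    """Half-width of the constant band z / sqrt(T), valid under a white noise null."""
    return z / math.sqrt(nobs)


# Assumption: simulated white noise replaces the user's actual series.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]

r = sample_acf(x, nlags=20)
band = white_noise_band(len(x))

# Lag 0 is always 1, so only lags >= 1 are compared against the band.
outside = [k for k in range(1, len(r)) if abs(r[k]) > band]
print(f"band = +/-{band:.4f}; lags outside the band: {outside}")
```

If the values are drawn on a matplotlib axes `ax`, the same constant band could be added as two dashed horizontal lines, for example with `ax.axhline(band, linestyle="--")` and `ax.axhline(-band, linestyle="--")`.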

josef...@gmail.com

Jan 12, 2017, 4:19:52 PM
to pystatsmodels


and there should be one in diagnostics

(busy at the moment and cannot look more closely.)

Josef

josef...@gmail.com

Jan 12, 2017, 5:12:58 PM
to pystatsmodels
The shaded area is the pointwise confidence interval, and not based on a joint test or simultaneous interval, AFAIR.

The Ljung-Box test (https://en.wikipedia.org/wiki/Ljung%E2%80%93Box_test) reports a sequence of tests for all lags up to the maxlag.
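
To make the "sequence of tests" concrete, here is a minimal sketch of the Ljung-Box statistic Q(h) = n(n+2) * sum_{k=1..h} r(k)^2 / (n-k), computed for every h up to the maximum lag. The function name `ljung_box_q` and the autocorrelation values are made up for illustration. Under the white noise null, Q(h) is compared with a chi-square distribution with h degrees of freedom; the p-value step is left out here because it needs a chi-square routine outside the standard library.

```python
def ljung_box_q(r, nobs):
    """Ljung-Box Q statistics for h = 1, ..., maxlag.

    r holds the sample autocorrelations r(0), ..., r(maxlag); r(0) is ignored.
    """
    q = []
    acc = 0.0
    for k in range(1, len(r)):
        acc += r[k] ** 2 / (nobs - k)
        q.append(nobs * (nobs + 2) * acc)
    return q


# Assumption: illustrative autocorrelations, not taken from real data.
r = [1.0, 0.5, 0.2, 0.1]
q = ljung_box_q(r, nobs=100)
for h, value in enumerate(q, start=1):
    print(f"Q({h}) = {value:.3f}")
```

In statsmodels this kind of test is available as `acorr_ljungbox` in `statsmodels.stats.diagnostic`, which also returns p-values.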

Aside: how many lags should we use?

Josef


 


josef...@gmail.com

Jan 12, 2017, 5:34:29 PM
to pystatsmodels

BTW: The newer version of the notebook here http://www.statsmodels.org/dev/examples/notebooks/generated/tsa_arma_0.html doesn't have the exceptions (missing updates after refactoring) at the end.
 

nak3...@gmail.com

Jan 12, 2017, 7:05:38 PM
to pystatsmodels

I don't think I conveyed what I wanted to convey. Here's an example from R (found on this website). Observe the dashed lines indicating the zero threshold.

josef...@gmail.com

Jan 12, 2017, 7:44:55 PM
to pystatsmodels



OK, so the question is which confidence intervals to compute.

I didn't find any information after searching a bit. The code predates pull requests, and I don't see a discussion on GitHub. A brief Google search seems to favor var = 1/N.

Skipper added this, and I don't remember which reference it was based on.
As far as I vaguely remember, there is an issue about the alternative in the test statistic used for creating the confidence interval.

I don't have a time series textbook handy to check.

A brief check of the Stata ts manual: it seems to have the same variance and confidence interval as statsmodels, based on an MA process, referring to Brockwell and Davis (2002), page 94.

I never checked the details or arguments for the different confidence intervals, and my Brockwell and Davis is on a dead notebook.
If there is a justification for preferring the 1/N confidence interval, then we could add an option for it.

Josef

josef...@gmail.com

Jan 12, 2017, 7:56:20 PM
to pystatsmodels

related difference between Stata and R, and how to replicate Stata's Bartlett confidence intervals in R
http://stats.stackexchange.com/questions/57577/correlogram-in-r-like-in-stata

josef...@gmail.com

Jan 12, 2017, 10:38:16 PM
to pystatsmodels


Looks like an option for 1/N would be useful.

1/N is the variance under the assumption of white noise.

What we and Stata have is a sequential confidence interval with a changing null hypothesis:

If we want to test, or compute confidence intervals for, acf(k) = 0, i.e. at the k-th lag, then there are several processes that are consistent with this null hypothesis:
1) the process is white noise, which would imply that the acf is zero at all lags, or
2) the process is an MA(q) with order q < k, i.e. the acf at lower lags could be non-zero, but the acf at lags >= k is zero. The worst case, or largest deviation from white noise consistent with acf(k) = 0 (and also zero acf at larger lags), is MA(q) with q = k-1.

1) implies that the variance of acf(k) is 1/nobs.
2) is what we and Stata use (the Bartlett confidence intervals).

Reference: Brockwell and Davis (2002), page 94 and section 2.4.
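
The difference between the two null hypotheses can be sketched numerically. Under case 2), Bartlett's approximation gives var(acf(k)) ~= (1 + 2 * sum_{j=1..k-1} acf(j)^2) / nobs, so the band widens with the lag, while under case 1) it stays at 1/nobs. The function `bartlett_se` and the autocorrelation values below are illustrative assumptions, not the statsmodels implementation.

```python
import math


def bartlett_se(r, nobs):
    """Standard errors of r(k), k = 1, ..., maxlag, under H0: MA(k-1).

    r holds the sample autocorrelations r(0), ..., r(maxlag).
    """
    se = []
    cum = 0.0  # running sum of r(j)^2 for j = 1, ..., k-1
    for k in range(1, len(r)):
        se.append(math.sqrt((1.0 + 2.0 * cum) / nobs))
        cum += r[k] ** 2
    return se


# Assumption: illustrative autocorrelations, not taken from real data.
r = [1.0, 0.5, 0.2, 0.1]
nobs = 100

se_bartlett = bartlett_se(r, nobs)
se_white_noise = 1.0 / math.sqrt(nobs)

for k, se in enumerate(se_bartlett, start=1):
    print(f"lag {k}: Bartlett +/-{1.96 * se:.4f}, white noise +/-{1.96 * se_white_noise:.4f}")
```

At lag 1 the two bands coincide, since no lower lags enter the sum; from lag 2 onward the Bartlett band is at least as wide as the white noise band.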

We also match Stata and Brockwell and Davis in using 1/nobs for pacf.
(The Stack Exchange answer used the same MA(q)-type calculation for pacf as well. The second-to-last comment on the answer points out that Bartlett should not be used there.)
The variance of pacf(k) under AR(p), p < k, is approximately 1/nobs (BD p. 96).

MA(q) and AR(p) seem to be the appropriate reference processes for deciding whether we should increase the MA or AR order. That is, we want a sequential choice instead of comparing all acf values against the white noise null.

This reminds me: Stata has a user-provided (Baum) autocorrelation test that does not restrict the acf at low lags, i.e. H0: acf(k)=0 for k > q. We don't have it (yet).
I thought we had a discussion somewhere, but all I can find is
https://github.com/statsmodels/statsmodels/issues/1175#issuecomment-29477371

Josef