Use for the biased autocorrelation function?


Max Linke

Jun 23, 2014, 1:06:14 PM
to pystat...@googlegroups.com
Hi

I want to estimate an autocorrelation time from a time series. When I calculate the autocorrelation function, I have the choice between a biased and an unbiased estimator.

To me, the biased estimator is equivalent to padding my original time series with zeros, i.e. the assumption that I can add pairs $x_i x_{i+lag}$ that are uncorrelated to the sum $acf(lag) = \sum_i x_i x_{i+lag}$. I would like to know why this is a reasonable assumption to make; I haven't found any information on it.
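
A minimal numpy sketch of the equivalence I mean (zero-mean data assumed; the names are just for illustration):

import numpy as np

# Sketch: the biased autocovariance (divide by nobs) equals the sum over
# the zero-padded series, since the extra products x_i * 0 contribute nothing.
x = np.random.randn(100)
x = x - x.mean()
nobs = len(x)
lag = 5

biased = np.dot(x[:nobs - lag], x[lag:]) / nobs

padded = np.concatenate([x, np.zeros(lag)])
padded_version = np.dot(padded[:nobs], padded[lag:lag + nobs]) / nobs

print(np.allclose(biased, padded_version))  # True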

From a gut feeling I would rather use the unbiased autocorrelation to avoid underestimating the actual autocorrelation time.

best Max

josef...@gmail.com

Jun 23, 2014, 9:01:37 PM
to pystatsmodels
On Mon, Jun 23, 2014 at 1:06 PM, Max Linke <max.l...@gmail.com> wrote:
> Hi

> I want to estimate an autocorrelation time from a time series. When I calculate the autocorrelation function, I have the choice between a biased and an unbiased estimator.

> To me, the biased estimator is equivalent to padding my original time series with zeros, i.e. the assumption that I can add pairs $x_i x_{i+lag}$ that are uncorrelated to the sum $acf(lag) = \sum_i x_i x_{i+lag}$. I would like to know why this is a reasonable assumption to make; I haven't found any information on it.

I don't understand why this should be the case: if you pad the array, then the number of observations increases, which will increase the denominator.

 

> From a gut feeling I would rather use the unbiased autocorrelation to avoid underestimating the actual autocorrelation time.

I don't remember ever having seen a Monte Carlo study or any clear statement about which should be "better". It's a bit like the debate about ddof in np.var or np.std.

In small samples the unbiased version might behave a bit better; however, it also depends on the intended usage.
For variance estimation, I don't really care about ddof=0 versus ddof=1. For hypothesis tests, using ddof>0 will often be better, because we usually want to maintain the size of the test, and a positive ddof makes the estimate and the hypothesis test a bit more "conservative".
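
For concreteness, the ddof analogy in numpy (just a minimal sketch, not a recommendation):

import numpy as np

x = np.random.randn(30)
# ddof=0 divides by n (biased, analogous to dividing the acf sum by nobs),
# ddof=1 divides by n - 1 (unbiased, analogous to dividing by nobs - lag).
print(np.var(x, ddof=0), np.var(x, ddof=1))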

For autocorrelation, the unbiased version divides by the number of summands that enter the calculation of that autocorrelation; for a given lag we have nobs - lag observation pairs.
However, if we divide by nobs instead, then we are "shrinking" the autocorrelation for larger lags. This might actually give an estimate with a smaller mean squared error, even though there is bias, because we have fewer observations at larger lags and the "noise" is relatively larger, i.e. the variance is larger.
(A bit in analogy to Ridge regression, where if we have many parameters we might be better off shrinking them towards zero.)
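
To make the two divisors concrete, here is a plain numpy sketch (the function name and normalization are mine, not a statsmodels recipe):

import numpy as np

def acf_biased_unbiased(x, nlags):
    # Sample autocorrelation with the two denominators discussed above:
    # "biased" divides the autocovariance at each lag by nobs,
    # "unbiased" divides by nobs - lag; both are normalized by the
    # lag-0 autocovariance.
    x = np.asarray(x, dtype=float)
    xd = x - x.mean()
    nobs = len(xd)
    gamma0 = np.dot(xd, xd) / nobs
    lags = np.arange(nlags + 1)
    sums = np.array([np.dot(xd[:nobs - k], xd[k:]) for k in lags])
    biased = (sums / nobs) / gamma0
    unbiased = (sums / (nobs - lags)) / gamma0
    return biased, unbiased

# At large lags the biased version pulls the estimates towards zero,
# which is the shrinkage mentioned above.
x = np.random.randn(200).cumsum() * 0.1 + np.random.randn(200)
b, u = acf_biased_unbiased(x, nlags=50)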


These are just general principles; I haven't looked at this in years and don't remember any handy reference.

Josef

 

> best Max

Sturla Molden

Jun 24, 2014, 8:07:34 PM
to pystat...@googlegroups.com
Max Linke <max.l...@gmail.com> wrote:

> From a gut feeling I would rather use the unbiased autocorrelation to avoid
> underestimating the actual autocorrelation time.

My gut feeling is to go for the more accurate estimator. There will always
be an error. Bias is just a part of the total error.
The striving for "unbiasedness" is one of my N issues with frequentist
statistics. Who came up with the idea that a "large unbiased error" is more
accurate than a "small biased error"? The "most powerful unbiased"
criterion for choosing estimators is an error of thought.

Sturla

josef...@gmail.com

Jun 24, 2014, 8:16:39 PM
to pystatsmodels
Classical statistics, which biases towards prior information (!) to get biased estimates that have a smaller MSE.

We just call it regularization or penalization nowadays.
(But don't tell the machine learners or Bayesians that we are doing the same thing, just with econometrics.)

:)

Josef


 

> Sturla

