Bootstrap resampling of timeseries


Andreas Hilboll

Feb 4, 2013, 5:12:54 AM
to pystat...@googlegroups.com
Hi,
I need to do some bootstrap resampling of timeseries. I have built my
own nonlinear trend fit function with scipy.optimize. For the bootstrap
resampling, it would be nice to have some ready-made function(s) for
shuffling the timeseries, with the possibility to specify the length of
the bootstrap interval.
Is there something like this in statsmodels?
Cheers, Andreas.

josef...@gmail.com

Feb 4, 2013, 8:11:37 PM
to pystat...@googlegroups.com
Unfortunately not.
Bootstrap in statsmodels is currently included only in some models for
specific cases, but there is still not much of a general setup.

There was a gist for the stationary bootstrap, but it looks like it has disappeared.
I haven't worked on time series models in quite some time, and never added this.

Block bootstrap should be easy to implement; the more difficult part
is figuring out what block size to choose, or, in the case of the
stationary bootstrap, what the average block length should be.


BTW: Sorry about the delayed approval of the message. It looks like
Google Groups has a long delay or forgets to send out moderator
notifications.

Josef




Paul Hobson

Feb 4, 2013, 8:46:42 PM
to pystat...@googlegroups.com
Andreas,

I'm working on something similar. For reshuffling the time series, I use numpy.random.random_integers. 

Does that help, or am I missing the mark here?
-paul 
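For comparison, a minimal sketch of this kind of i.i.d. resampling. It assumes numpy's `Generator` API: `numpy.random.random_integers` is deprecated in newer numpy releases, and `Generator.integers` (upper bound exclusive) is the usual replacement. The helper name `iid_bootstrap_indices` is hypothetical. Because every observation is drawn independently, this is only appropriate when the observations themselves are independent.

```python
import numpy as np

def iid_bootstrap_indices(nobs, rng):
    """Draw nobs indices from 0..nobs-1 uniformly, with replacement."""
    return rng.integers(0, nobs, size=nobs)

rng = np.random.default_rng(0)
y = np.arange(10.0)
y_star = y[iid_bootstrap_indices(len(y), rng)]  # one bootstrap replication
```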

josef...@gmail.com

Feb 4, 2013, 8:59:25 PM
to pystat...@googlegroups.com
Reshuffling the data observation by observation is only appropriate if
there is no dependence between observations.

In the block bootstrap we shuffle segments of the data: for example, split
the data into blocks of 5 observations and then reshuffle the blocks, or
choose random starting points and then take the next 5 observations, or
something like that.
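A minimal sketch of the two block schemes just described, assuming numpy; `block_bootstrap_indices` is a hypothetical helper, not a statsmodels function. It returns an index array, so the same draw can be applied to the data or to residuals. With `overlap=False` it resamples the fixed, non-overlapping segments; with `overlap=True` a block may start at any position (moving block bootstrap).

```python
import numpy as np

def block_bootstrap_indices(nobs, block_len, rng, overlap=True):
    """Index array of length nobs assembled from randomly drawn blocks.

    overlap=False: blocks are the fixed segments [0, b), [b, 2b), ...
                   (trailing observations that don't fill a whole block
                   are never used as a block start in this sketch)
    overlap=True:  a block may start at any position 0..nobs-block_len
    """
    if overlap:
        candidate_starts = np.arange(nobs - block_len + 1)
    else:
        candidate_starts = np.arange(0, nobs - block_len + 1, block_len)
    n_blocks = -(-nobs // block_len)  # ceil(nobs / block_len)
    starts = rng.choice(candidate_starts, size=n_blocks, replace=True)
    # each row holds one block of consecutive indices; flatten and trim to nobs
    idx = (starts[:, None] + np.arange(block_len)).ravel()
    return idx[:nobs]
```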

In the stationary bootstrap, the block length is also chosen randomly.
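A sketch of one common way to generate stationary bootstrap indices, assuming numpy; `stationary_bootstrap_indices` and `mean_block` are hypothetical names. At each position a new block starts at a random location with probability `1 / mean_block`, otherwise the next observation is taken, wrapping around the end of the series. This makes the block lengths geometric with mean `mean_block`.

```python
import numpy as np

def stationary_bootstrap_indices(nobs, mean_block, rng):
    """Stationary bootstrap index array (geometric block lengths, circular wrap)."""
    p = 1.0 / mean_block                   # probability of starting a new block
    new_block = rng.random(nobs) < p       # positions where a new block begins
    new_block[0] = True                    # the first position always starts a block
    random_starts = rng.integers(0, nobs, size=nobs)
    idx = np.empty(nobs, dtype=np.intp)
    for t in range(nobs):
        if new_block[t]:
            idx[t] = random_starts[t]
        else:
            idx[t] = (idx[t - 1] + 1) % nobs  # continue the block, wrapping at the end
    return idx
```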

Something like this is usually used when there is autocorrelation in
the data or the noise. I'm not quite sure about all the variations. I
think in some models we only reshuffle the noise when we simulate a
new dataset.

Josef


Guillaume Calmettes

Feb 4, 2013, 9:10:04 PM
to pystat...@googlegroups.com
Hi Andreas,

I also needed to bootstrap a nonlinear fit function to draw confidence interval bands for the original fit, as well as confidence limits for one of the parameters of the fit.

See the nbviewer link below for an example of what I did (not quite sure this is what you want to achieve).

http://nbviewer.ipython.org/4711516/

Guillaume

josef...@gmail.com

Feb 4, 2013, 9:56:36 PM
to pystat...@googlegroups.com

This kind of bootstrap, where you randomly draw from the residuals,
works when you can assume that the residuals for all observations come
from the same distribution and are independent, that is, there is no
correlation among residuals and no heteroscedasticity (it requires the
same noise variance for all observations).
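A minimal sketch of the residual bootstrap described here, assuming numpy. `residual_bootstrap`, `fit_line` and `predict_line` are hypothetical names; the straight-line trend via `np.polyfit` only keeps the example self-contained, and a nonlinear fit (for example one built on scipy.optimize) could be passed in the same way. The percentile interval at the end is the simplest choice, not the only one.

```python
import numpy as np

def residual_bootstrap(x, y, fit, predict, n_boot, rng):
    """Refit the model on fitted values plus residuals resampled with replacement.

    Assumes i.i.d. residuals: no autocorrelation, constant variance.
    """
    params = fit(x, y)
    fitted = predict(params, x)
    resid = y - fitted
    n = len(y)
    boot_params = []
    for _ in range(n_boot):
        y_star = fitted + resid[rng.integers(0, n, size=n)]
        boot_params.append(fit(x, y_star))
    return params, np.array(boot_params)

def fit_line(x, y):
    return np.polyfit(x, y, 1)  # returns [slope, intercept]

def predict_line(params, x):
    return np.polyval(params, x)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.5 + 0.3 * x + rng.normal(scale=0.5, size=x.size)
params, boot = residual_bootstrap(x, y, fit_line, predict_line, 500, rng)
slope_ci = np.percentile(boot[:, 0], [2.5, 97.5])  # percentile interval for the slope
```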

To the last point:
If the noise variance might differ across observations, the
'wild bootstrap' is a possibility. George and Ralph added it to the
nonparametric kernel regression in a similar way, but it's not yet
available as a standalone function.
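A minimal wild-bootstrap sketch, assuming numpy and Rademacher weights (random signs), which is one common choice of weights among several. This is not the code used in the kernel regression; `wild_bootstrap`, `fit_line` and `predict_line` are hypothetical names, and the line fit again only stands in for an arbitrary model.

```python
import numpy as np

def wild_bootstrap(x, y, fit, predict, n_boot, rng):
    """Wild bootstrap with Rademacher weights.

    Each observation keeps its own residual magnitude and only the sign is
    redrawn, so the bootstrap samples inherit heteroscedastic noise.
    """
    params = fit(x, y)
    fitted = predict(params, x)
    resid = y - fitted
    boot_params = []
    for _ in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=len(y))  # Rademacher weights
        boot_params.append(fit(x, fitted + resid * signs))
    return params, np.array(boot_params)

def fit_line(x, y):
    return np.polyfit(x, y, 1)  # returns [slope, intercept]

def predict_line(params, x):
    return np.polyval(params, x)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.5 + 0.3 * x + rng.normal(scale=0.1 + 0.1 * x)  # noise grows with x
params, boot = wild_bootstrap(x, y, fit_line, predict_line, 500, rng)
slope_ci = np.percentile(boot[:, 0], [2.5, 97.5])
```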

Josef

josef...@gmail.com

Feb 5, 2013, 4:26:13 PM
to pystat...@googlegroups.com

I thought it might be tricky to get the block resampling without
loops, so I gave it a try last night:

https://github.com/josef-pkt/statsmodels/commit/f009f4d089a72101cbaa0e5e3e9c837a539fa01b

The result is intended as an index into the original data/residual array.
Moving blocks with overlap seem to have better statistical properties,
from a quick check of some references.

(branch: boots)
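Not the code in the linked commit, but a sketch of how moving-block indices for many replications at once can be built without Python loops, using numpy broadcasting. `moving_block_index_matrix` is a hypothetical name. Each row of the result indexes one resampled series, as in `resid[ridx]`.

```python
import numpy as np

def moving_block_index_matrix(nobs, block_len, n_reps, rng):
    """(n_reps, nobs) array of moving-block-bootstrap indices, built without Python loops."""
    n_blocks = -(-nobs // block_len)  # ceil(nobs / block_len) blocks per replication
    starts = rng.integers(0, nobs - block_len + 1, size=(n_reps, n_blocks))
    # broadcast: (n_reps, n_blocks, 1) + (block_len,) -> (n_reps, n_blocks, block_len)
    idx = starts[:, :, None] + np.arange(block_len)
    return idx.reshape(n_reps, -1)[:, :nobs]

rng = np.random.default_rng(0)
resid = rng.normal(size=100)
ridx = moving_block_index_matrix(len(resid), 5, 1000, rng)
resid_star = resid[ridx]  # each row is one resampled residual series
```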

Josef

Andreas Hilboll

Feb 6, 2013, 10:59:19 AM
to pystat...@googlegroups.com, josef...@gmail.com

Cool, thanks, Josef! This should get me going :)
A.

josef...@gmail.com

Feb 6, 2013, 11:22:06 AM
to Andreas Hilboll, pystat...@googlegroups.com
And when you get going, please send improvements or bug reports back. :)

Josef


josef...@gmail.com

Feb 6, 2013, 12:17:42 PM
to pystat...@googlegroups.com
Just parking a coffee-break idea for later (I don't have the branch now):

Stationary bootstrap with (almost) no Python loop:
numpy has a function that uses interval boundaries, but I cannot find
the name right now, and I don't know which version of numpy added it.
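One possible way to implement the interval-boundary idea with functions that do exist in numpy: draw geometric block lengths, turn them into block boundaries with `np.cumsum`, and map each output position to its block with `np.searchsorted`. This is an assumption about how the idea could work, not necessarily the function Josef had in mind; `stationary_indices_no_loop` is a hypothetical name.

```python
import numpy as np

def stationary_indices_no_loop(nobs, mean_block, rng):
    """Stationary bootstrap indices using block boundaries instead of a loop."""
    # nobs blocks, each of length >= 1, always cover at least nobs positions
    lengths = rng.geometric(1.0 / mean_block, size=nobs)
    ends = np.cumsum(lengths)      # exclusive end position of each block
    begins = ends - lengths        # first output position of each block
    pos = np.arange(nobs)
    # block b covers output positions begins[b] <= pos < ends[b]
    block = np.searchsorted(ends, pos, side="right")
    starts = rng.integers(0, nobs, size=nobs)  # random start in the data for each block
    # offset within the block added to the block's start, wrapped circularly
    return (starts[block] + pos - begins[block]) % nobs
```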

Josef


josef...@gmail.com

Feb 6, 2013, 9:36:52 PM
to pystat...@googlegroups.com
Ain't working, doesn't exist in numpy

Just for fun (or as a joke), a check of the stationary bootstrap on the command line:

1000 draws of the index for the stationary bootstrap with mean block
length equal to 4, as a one-liner
(it oversamples heavily, drawing nobs blocks of length at least 1, so each draw has at least nobs indices before truncation):

>>> import numpy as np
>>> nobs = 10
>>> ridx = np.array([np.concatenate([np.arange(start, start+length) for start, length in zip(np.random.randint(nobs, size=nobs), np.random.geometric(1./4, size=nobs))])[:nobs] for _ in range(1000)])

>>> ridx %= nobs  # wrap indices past the end back to the start (circular blocks)

>>> for i in range(5): print(ridx[i])

[1 2 9 0 1 2 3 8 9 0]
[2 8 9 0 6 7 8 9 1 6]
[1 2 3 4 5 6 7 7 8 7]
[4 5 6 7 3 4 5 6 2 3]
[1 3 4 5 8 1 2 3 4 5]

>>> np.bincount(ridx.ravel())
array([1011, 989, 1019, 1017, 992, 983, 994, 1008, 980, 1007])

another run
array([1018, 979, 972, 1023, 1005, 1018, 993, 983, 1015, 994])

Looks roughly uniform,
but it's not "production" code.

Josef



Andreas Hilboll

Mar 9, 2013, 9:20:39 AM
to josef...@gmail.com, pystat...@googlegroups.com
I ended up using your code for the moving block bootstrap (MBB) as-is, as I didn't find any bugs.

Thanks again!

In the process, I coded the purely data-driven block-length selector by
Bühlmann and Künsch [1], as well as a bias-corrected, accelerated (BCa)
confidence interval calculation.
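A minimal sketch of the standard BCa formulas, for orientation only: it uses i.i.d. resampling and a leave-one-out jackknife for the acceleration, whereas a time-series version would have to resample (and jackknife) in a way that respects the dependence. Andreas's own code is not shown in the thread. `bca_interval` is a hypothetical name; `statistics.NormalDist` (Python 3.8+) provides the normal cdf and quantile function.

```python
import numpy as np
from statistics import NormalDist

def bca_interval(data, stat, n_boot=2000, alpha=0.05, rng=None):
    """BCa confidence interval for stat(data) using i.i.d. resampling."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    n = len(data)
    theta_hat = stat(data)
    boot = np.array([stat(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    norm = NormalDist()
    # bias correction z0: normal quantile of the share of replicates below the estimate,
    # clipped away from 0 and 1 so that inv_cdf stays finite
    prop = np.mean(boot < theta_hat)
    prop = min(max(prop, 1.0 / (2 * n_boot)), 1.0 - 1.0 / (2 * n_boot))
    z0 = norm.inv_cdf(prop)
    # acceleration a from the leave-one-out jackknife values
    jack = np.array([stat(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * np.sum(d ** 2) ** 1.5
    a = np.sum(d ** 3) / denom if denom > 0 else 0.0
    # adjusted percentile levels for the lower and upper limits
    levels = []
    for q in (alpha / 2, 1 - alpha / 2):
        zq = norm.inv_cdf(q)
        levels.append(norm.cdf(z0 + (z0 + zq) / (1 - a * (z0 + zq))))
    lo, hi = np.quantile(boot, levels)
    return theta_hat, lo, hi
```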

I'd like to share this somehow (together with the original
replication-generating loop function). Any ideas where it might fit
best? Someplace in statsmodels?

Andreas.


[1] http://www.sciencedirect.com/science/article/pii/S0167947399000146

josef...@gmail.com

Mar 9, 2013, 9:47:42 AM
to pystat...@googlegroups.com, Andreas Hilboll
> Bühlmann and Künsch [1], as well as a bias-corrected, accelerated (BCa)
> confidence interval calculation.

From a quick look at the paper, the block-length selection is
independent of a specific model.
Can it just be used on the residuals, for example?

If so, I would put it together with the bootstrap code.

>
> I'd like to somehow share this (together with the original
> replication-generating loop function). And ideas where it might fit
> best? Someplace in statsmodels?

It would fit well in statsmodels. I started a branch with the block
and stationary bootstrap iterators, but haven't done anything else
with it.

Currently we have an empty ``resampling`` directory in statsmodels
https://github.com/statsmodels/statsmodels/tree/master/statsmodels/resampling
and some code for iterators that is still in the sandbox/tools folder

Model-specific bootstrap code lives together with the models.

You could open a pull request and put your functions into
statsmodels/resampling.

It would also be useful to have an example of its usage, because
so far it's not clear how we should tie the bootstrap in with the
models to make it easier for users, and seeing some usage examples
will help.

Thank you,

Josef


Andreas Hilboll

Mar 14, 2013, 10:08:11 AM
to pystat...@googlegroups.com, josef...@gmail.com
> from a quick look at the paper, the block length selection is
> independent of a specific model.
> Can it be just used for example on the residuals?

I think so, yes. At least that's how I used it.

> You could open a pull request and put your functions into
> statsmodels/resampling.

Okay, I'll try to polish my code a bit and then open a PR against the
resampling directory.

Is there some data already in statsmodels I can use for test cases?

Andreas.

Vincent Arel

Mar 14, 2013, 10:18:43 AM
to pystat...@googlegroups.com
> Is there some data already in statsmodels I can use for test cases?
Yes, there are.

http://statsmodels.sourceforge.net/devel/datasets/index.html