Moving (rolling) statistics


Keith Goodman

Nov 4, 2010, 4:30:27 PM
to labele...@googlegroups.com, pystat...@googlegroups.com
I'm working on a module of moving (rolling) summary statistics. Some
function signatures:

mov_sum(arr, window, axis=-1, method='filter')
mov_nansum(arr, window, axis=-1, method='filter')

mov_max(arr, window, axis=-1, method='filter')
mov_nanmax(arr, window, axis=-1, method='filter')

The available methods for mov_nansum, for example, are 'filter',
'strides', 'cumsum', and 'loop'. The filter method (default) uses
scipy.ndimage.
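
As a rough illustration (a sketch, not the actual code in mov.py; the
NaN handling of incomplete windows is my assumption about the intended
behavior), a 'filter'-style trailing moving sum built on scipy.ndimage
might look like:

import numpy as np
from scipy import ndimage

def mov_sum_filter(arr, window, axis=-1):
    # Trailing moving sum along `axis` (hypothetical sketch).
    arr = np.asarray(arr, dtype=np.float64)
    # Shift the kernel so output[i] sums arr[..., i - window + 1 : i + 1].
    out = ndimage.correlate1d(arr, np.ones(window), axis=axis,
                              mode='constant', cval=0.0,
                              origin=-((window - 1) // 2))
    # The first window - 1 positions lack a full window; mark them NaN.
    index = [slice(None)] * arr.ndim
    index[axis] = slice(None, window - 1)
    out[tuple(index)] = np.nan
    return out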

The input is an nd NumPy array; the output is an array of the same
shape. If the input is, say, a 2d array and axis=1, then the moving
statistic is computed along each row separately (not over a 2d
rectangle, as is often done in image work).
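
For example, using the hypothetical mov_sum_filter sketch above on a
2d array with window=2 and axis=1:

>>> a = np.array([[1., 2., 3., 4.],
...               [5., 6., 7., 8.]])
>>> mov_sum_filter(a, window=2, axis=1)
array([[nan,  3.,  5.,  7.],
       [nan, 11., 13., 15.]])

Each row is handled independently, and positions without a full window
come back as NaN.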

The module is in the prototype stage. It only contains sum and max
(plus nansum, nanmax, and unit tests). Docstrings are missing, but all
functions share the same signature.

I'm looking for comments and suggestions before building it out. Would
anyone find a module like this useful?

code (BSD license):

http://gitorious.org/labeled-array/la/blobs/master/la/farray/mov.py
https://github.com/kwgoodman/la/blob/master/la/farray/mov.py

Some functions I plan to add (nan versions too):

mov_mean
mov_var
mov_std
mov_zscore
mov_median
mov_min
mov_prod
mov_percentile
mov_count
mov_ranking
mov_gmean
mov_all?
mov_any?
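
To give one example from the list, a 'cumsum'-method mov_mean could
look roughly like this (a sketch, not the planned implementation; it
trades speed for the usual floating-point accumulation error of a
cumulative sum):

import numpy as np

def mov_mean_cumsum(arr, window, axis=-1):
    # Trailing moving mean via a cumulative sum (hypothetical sketch).
    arr = np.asarray(arr, dtype=np.float64)
    csum = arr.cumsum(axis=axis)
    out = csum.copy()
    tail = [slice(None)] * arr.ndim
    head = [slice(None)] * arr.ndim
    tail[axis] = slice(window, None)
    head[axis] = slice(None, -window)
    # Window sum: out[i] = csum[i] - csum[i - window] for i >= window.
    out[tuple(tail)] = csum[tuple(tail)] - csum[tuple(head)]
    init = [slice(None)] * arr.ndim
    init[axis] = slice(None, window - 1)
    out[tuple(init)] = np.nan  # incomplete windows
    return out / window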

josef...@gmail.com

Nov 4, 2010, 4:52:28 PM
to labele...@googlegroups.com

If you assume equal spacing in time, then a moving trend could also be
computed quickly. I think the recipe (more complete than mine) is now
in the cookbook.
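
For the equally spaced case the trailing OLS slope reduces to a fixed
linear filter over the window, so it can ride on the same ndimage
machinery. A hypothetical sketch (mov_slope is my name, not the
cookbook's):

import numpy as np
from scipy import ndimage

def mov_slope(arr, window, axis=-1):
    # Trailing moving OLS slope against t = 0..window-1 (equal spacing).
    t = np.arange(window, dtype=np.float64)
    tc = t - t.mean()
    weights = tc / (tc ** 2).sum()  # slope = sum(weights * y) per window
    out = ndimage.correlate1d(np.asarray(arr, dtype=np.float64), weights,
                              axis=axis, mode='constant', cval=0.0,
                              origin=-((window - 1) // 2))
    index = [slice(None)] * out.ndim
    index[axis] = slice(None, window - 1)
    out[tuple(index)] = np.nan  # incomplete windows
    return out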
The strided version looks interesting, since I have never tried to
figure out those tricks. If you have a rough idea of the speed
advantages of the different methods, then you could add an 'auto'
method that uses some heuristics to choose the fastest one. I tried to
do this very roughly for some tsa calculations.
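
For reference, the strided trick builds a no-copy view with an extra
window-length axis and reduces over it. A sketch using
numpy.lib.stride_tricks.sliding_window_view, the safer wrapper around
as_strided (names here are illustrative):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mov_max_strides(arr, window, axis=-1):
    # Trailing moving max along `axis` via a strided window view.
    arr = np.asarray(arr, dtype=np.float64)
    # `windows` adds a trailing axis of length `window`; no data is copied.
    windows = sliding_window_view(arr, window, axis=axis)
    out = np.full(arr.shape, np.nan)
    index = [slice(None)] * arr.ndim
    index[axis] = slice(window - 1, None)
    out[tuple(index)] = windows.max(axis=-1)
    return out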

With ndimage you need crash tests, since wrong inputs can kill the
interpreter. Essentially, I switched all my convolutions to
scipy.signal, but ndimage has some extra features.

more later (time for the kids)

Josef

Keith Goodman

Nov 4, 2010, 5:10:05 PM
to labele...@googlegroups.com
On Thu, Nov 4, 2010 at 1:52 PM, <josef...@gmail.com> wrote:

> If you assume equal spacing in time, then a moving trend could also
> be computed quickly. I think the recipe (more complete than mine) is
> now in the cookbook.

That's a neat one. For this first round I'll stick to the basics. The
moving stats functions are the focus of la 0.5, BTW.

> The strided version looks interesting, since I have never tried to
> figure out those tricks. If you have a rough idea of the speed
> advantages of the different methods, then you could add an 'auto'
> method that uses some heuristics to choose the fastest one. I tried
> to do this very roughly for some tsa calculations.

I wondered about that, but in the end decided to leave it to the user
and instead provide some Sphinx docs with timing examples. If I pick
the problem right, I can get faster times with a simple loop than with
a filter!
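
A minimal timing harness along those lines, assuming the module is
importable as la.farray.mov (as in the linked file; sizes here are
illustrative):

import timeit
import numpy as np
from la.farray.mov import mov_sum  # hypothetical import path

arr = np.random.rand(500, 500)
for method in ('filter', 'strides', 'cumsum', 'loop'):
    t = timeit.timeit(lambda: mov_sum(arr, 20, axis=1, method=method),
                      number=10)
    print('%-8s %.3f s' % (method, t))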

> With ndimage you need crash tests, since wrong inputs can kill the
> interpreter. Essentially, I switched all my convolutions to
> scipy.signal, but ndimage has some extra features.

I'm brand new to ndimage; it's just what I found first. I did try to
add an optional lag input to the moving stats, but it gave me a
segfault in ndimage when the lag went above a small limit! So no lag
for now.
