Bottleneck 0.4.1


Keith Goodman

Mar 8, 2011, 4:19:48 PM
to SciPy Users List, bottl...@googlegroups.com
Bottleneck is a collection of fast NumPy array functions written in
Cython. It contains functions such as median, nanmedian, nanargmax,
and move_mean.

This is a bug fix release.

The low-level functions nanstd_3d_int32_axis1 and
nanstd_3d_int64_axis1, called by bottleneck.nanstd(), wrote beyond the
memory owned by the output array if both arr.shape[1] == 0 and
arr.shape[0] > arr.shape[2], where arr is the input array.
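To make the shape condition concrete, here is a hypothetical pure-Python reference for what a reduction along axis 1 of a 3-d integer array should produce. It is not Bottleneck's Cython code, the function name is made up, and it assumes NumPy's convention that a standard deviation over zero elements is NaN. The point it illustrates: the output has shape (arr.shape[0], arr.shape[2]), and every write is bounded by that shape, even when arr.shape[1] == 0.

```python
import math

import numpy as np


def nanstd_3d_int_axis1_ref(arr, ddof=0):
    """Hypothetical reference: std of a 3-d integer array along axis 1.

    Integer arrays cannot hold NaN, so nanstd reduces to plain std here.
    """
    if arr.ndim != 3 or arr.dtype.kind not in "iu":
        raise TypeError("expected a 3-d integer array")
    n0, n1, n2 = arr.shape
    # Reducing axis 1 leaves an output of shape (n0, n2).
    out = np.empty((n0, n2), dtype=np.float64)
    # Loop bounds come from the output's own shape, so no write can land
    # outside `out`, whatever the relationship between n0, n1 and n2.
    for i in range(n0):
        for k in range(n2):
            if n1 - ddof <= 0:
                # Empty axis (e.g. n1 == 0): no data, so the result is NaN.
                out[i, k] = np.nan
                continue
            column = [int(arr[i, j, k]) for j in range(n1)]
            mean = sum(column) / n1
            ssq = sum((x - mean) ** 2 for x in column)
            out[i, k] = math.sqrt(ssq / (n1 - ddof))
    return out


# The triggering case from the report: shape[1] == 0 and shape[0] > shape[2].
empty = np.zeros((5, 0, 2), dtype=np.int32)
print(nanstd_3d_int_axis1_ref(empty).shape)  # (5, 2), all NaN
```

For non-empty input the sketch agrees with NumPy's arr.std(axis=1), so it can serve as a slow oracle when testing a fast implementation.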

Thanks to Christoph Gohlke for finding an example to demonstrate the bug.

download
  http://pypi.python.org/pypi/Bottleneck
docs
  http://berkeleyanalytics.com/bottleneck
code
  http://github.com/kwgoodman/bottleneck
mailing list
  http://groups.google.com/group/bottle-neck
mailing list 2
  http://mail.scipy.org/mailman/listinfo/scipy-user

Keith Goodman

Mar 8, 2011, 6:07:57 PM
to SciPy Users List, bottl...@googlegroups.com
On Tue, Mar 8, 2011 at 1:19 PM, Keith Goodman <kwgo...@gmail.com> wrote:
> Bottleneck is a collection of fast NumPy array functions written in
> Cython. It contains functions like median, nanmedian, nanargmax,
> move_mean.
>
> This is a bug fix release.
>
> The low-level functions nanstd_3d_int32_axis1 and
> nanstd_3d_int64_axis1, called by bottleneck.nanstd(), wrote beyond the
> memory owned by the output array if both arr.shape[1] == 0 and
> arr.shape[0] > arr.shape[2], where arr is the input array.
>
> Thanks to Christoph Gohlke for finding an example to demonstrate the bug.

How embarrassing! The same bug in nanstd() that was fixed in 0.4.1
exists in nanvar(). Thank you, Christoph, for pointing that out. Fixed
in Bottleneck 0.4.2.
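Since one shape condition broke two functions, a regression check that runs every reducer over the problem shapes is a natural guard. The sketch below is hypothetical: it takes the reducer as an argument and, so that it runs without Bottleneck installed, passes NumPy's nanstd and nanvar as stand-ins for bottleneck.nanstd and bottleneck.nanvar. It assumes the expected answer for an empty axis is NaN, as in NumPy. Out-of-bounds writes do not always crash, so a passing check is evidence rather than proof.

```python
import warnings

import numpy as np


def check_empty_axis1(reducer, dtypes=(np.int32, np.int64)):
    """Run `reducer(arr, axis=1)` on 3-d arrays with arr.shape[1] == 0 and
    arr.shape[0] > arr.shape[2], the condition from the bug report."""
    shapes = [(2, 0, 1), (5, 0, 2), (9, 0, 3)]
    for dtype in dtypes:
        for shape in shapes:
            arr = np.zeros(shape, dtype=dtype)
            with warnings.catch_warnings():
                # NumPy warns about empty reductions; ignore that here.
                warnings.simplefilter("ignore", RuntimeWarning)
                out = reducer(arr, axis=1)
            expected_shape = (shape[0], shape[2])
            assert out.shape == expected_shape, (reducer, dtype, shape)
            assert np.isnan(out).all(), (reducer, dtype, shape)
    return True


# Stand-ins for bottleneck.nanstd / bottleneck.nanvar in this sketch.
for func in (np.nanstd, np.nanvar):
    print(func.__name__, check_empty_axis1(func))
```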

Keith Goodman

Mar 8, 2011, 10:17:40 PM
to SciPy Users List, bottl...@googlegroups.com
On Tue, Mar 8, 2011 at 6:06 PM, Wes McKinney <wesm...@gmail.com> wrote:

> Any interest adding a "min_periods" argument to the moving window
> functions in bottleneck?

Each moving window function in Bottleneck has a NaN version and a
non-NaN version, for example, move_nanmean() and move_mean().
Pandas has a single version, but you can adjust min_periods to get
the NaN behavior, the non-NaN behavior, or anything in between.
That's clever.
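To illustrate how one min_periods argument can span both behaviors, here is a hypothetical pure-Python moving mean. The function and its edge handling are assumptions for illustration, not the pandas or Bottleneck API: it averages the non-NaN values in each trailing window (shorter at the start) and returns NaN when fewer than min_periods of them are present. With min_periods=window any NaN in a window yields NaN, like a move_mean-style function; with min_periods=1 NaNs are skipped, like a move_nanmean-style function, possibly differing in how the first window-1 positions are treated.

```python
import math


def move_mean_min_periods(values, window, min_periods=None):
    """Hypothetical moving mean over trailing windows of length `window`.

    Each output is the mean of the non-NaN values in the window, or NaN if
    fewer than `min_periods` such values are present.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if min_periods is None:
        min_periods = window  # strictest setting: non-NaN behavior
    if not 1 <= min_periods <= window:
        raise ValueError("min_periods must be in [1, window]")
    out = []
    for i in range(len(values)):
        # Trailing window; shorter than `window` near the start.
        chunk = values[max(0, i - window + 1): i + 1]
        good = [v for v in chunk if not math.isnan(v)]
        out.append(sum(good) / len(good) if len(good) >= min_periods else math.nan)
    return out


data = [1.0, math.nan, 3.0, 4.0]
print(move_mean_min_periods(data, 2, min_periods=2))  # [nan, nan, nan, 3.5]
print(move_mean_min_periods(data, 2, min_periods=1))  # [1.0, 1.0, 3.0, 3.5]
```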

The rest of Bottleneck uses the NaN and non-NaN naming, for example,
nanmedian and median. I think it is simpler to stick with that (it
makes it easier to discover what Bottleneck can do, for example). It
is much harder to explain that the functionality is in a parameter
most users haven't seen before. But let me think about it. It would be
useful.

> One random question. Any idea on the long import time:
>
> $ time python -c "import bottleneck"
>
> real    0m0.712s
> user    0m0.546s
> sys     0m0.114s
> $ time python -c "import numpy"
>
> real    0m0.142s
> user    0m0.090s
> sys     0m0.049s
> $ time python -c "import scipy"
>
> real    0m0.201s
> user    0m0.132s
> sys     0m0.066s

Bottleneck has many low-level functions, for example,
median_2d_float64_axis0, median_2d_float64_axis1,
median_2d_int32_axis0, and so on. Maybe that explains it? But scipy
has a lot of functions too, so I don't know.
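One way to see where import time goes, rather than guessing, is CPython's -X importtime option (Python 3.7 and later, so it postdates this thread), which prints a per-module breakdown to stderr. The helper below is a sketch that runs it in a subprocess and returns the slowest modules by self time. The helper's name is hypothetical, and the parsing assumes the "import time: self [us] | cumulative | imported package" line format.

```python
import subprocess
import sys


def slowest_imports(module, top=5):
    """Return the `top` (self_us, cumulative_us, name) entries, slowest first."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            self_us, cum_us = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # header line: "self [us] | cumulative | imported package"
        # Nested imports are indented; strip to get the bare module name.
        rows.append((self_us, cum_us, fields[2].strip()))
    rows.sort(reverse=True)
    return rows[:top]


for self_us, cum_us, name in slowest_imports("json"):
    print(f"{name:30s} self={self_us:>8d}us cumulative={cum_us:>8d}us")
```

Running it with "bottleneck" in place of "json" would show whether the many compiled low-level functions or a dependency such as scipy dominates.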

Keith Goodman

Mar 13, 2011, 5:32:39 PM
to SciPy Users List, bottl...@googlegroups.com
On Tue, Mar 8, 2011 at 6:06 PM, Wes McKinney <wesm...@gmail.com> wrote:

> One random question. Any idea on the long import time:
>
> $ time python -c "import bottleneck"
>
> real    0m0.712s
> user    0m0.546s
> sys     0m0.114s
> $ time python -c "import numpy"
>
> real    0m0.142s
> user    0m0.090s
> sys     0m0.049s
> $ time python -c "import scipy"
>
> real    0m0.201s
> user    0m0.132s
> sys     0m0.066s

Bottleneck imports are now 3x faster. I switched to a lazy import of
scipy (Bottleneck rarely uses scipy).

Before:

$ time python -c "import bottleneck"

real 0m0.196s
user 0m0.150s
sys 0m0.040s

After:

$ time python -c "import bottleneck"

real 0m0.061s
user 0m0.010s
sys 0m0.050s

Does adding Bottleneck to your package increase the import time by
0.06 seconds? No, not if your package imports numpy:

$ time python -c "import numpy; import bottleneck"

real 0m0.060s
user 0m0.020s
sys 0m0.030s
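The timings suggest that most of the remaining cost of importing Bottleneck is loading NumPy itself, which a NumPy-based package pays anyway. A sketch for checking that kind of claim is to count how many modules an import adds to sys.modules on top of a given preload, in a fresh interpreter. The helper name is hypothetical; in the demo, stdlib modules stand in for numpy and bottleneck so it runs anywhere.

```python
import subprocess
import sys

_PROBE = """
import importlib, sys
for name in sys.argv[1].split(","):
    if name:
        importlib.import_module(name)
before = set(sys.modules)
importlib.import_module(sys.argv[2])
print(len(set(sys.modules) - before))
"""


def added_modules(target, preload=()):
    """Count modules that importing `target` adds after `preload` is imported,
    measured in a fresh interpreter so earlier imports don't interfere."""
    proc = subprocess.run(
        [sys.executable, "-c", _PROBE, ",".join(preload), target],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(proc.stdout.strip())


# Stand-in demo with stdlib modules; swap in ("numpy",) and "bottleneck"
# to compare the real packages where they are installed.
print(added_modules("json"))                      # cost of json from scratch
print(added_modules("json", preload=("json",)))  # 0: already loaded
```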

I used this pattern for lazy imports:

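email = None

def parse_email():
    # Rebind the module-level name rather than creating a local one.
    global email
    # Import on the first call only; later calls find the module bound.
    if email is None:
        import email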

which I found here:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead
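The same idea can be packaged as a small reusable helper. This is a sketch, not how Bottleneck itself is written: the names lazy_module and hsv_of_red are hypothetical, and colorsys stands in for a rarely used dependency such as scipy. It relies on importlib.import_module and on sys.modules holding every module that has already been imported, so calls after the first are a dictionary lookup.

```python
import importlib
import sys


def lazy_module(name):
    """Import `name` on first call and return the module object.

    After the first import the module lives in sys.modules, so later calls
    are a dictionary lookup rather than a fresh import.
    """
    module = sys.modules.get(name)
    if module is None:
        module = importlib.import_module(name)
    return module


def hsv_of_red():
    # Hypothetical caller: the dependency is imported only when this
    # function first runs, not when the enclosing module is imported.
    colorsys = lazy_module("colorsys")
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)


print("colorsys" in sys.modules)  # False in a fresh interpreter
hsv_of_red()
print("colorsys" in sys.modules)  # True
```

Compared with the global-variable pattern, the helper keeps the lazy dependency local to the function that needs it, at the cost of one function call per use.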

Thanks, Wes, for the report.
