Fwd: Fast Cython searchsorted for both arrays sorted

275 views
Skip to first unread message

Keith Goodman

unread,
Jun 4, 2012, 2:58:38 PM6/4/12
to bottl...@googlegroups.com
---------- Forwarded message ----------
From: Tom Aldcroft <aldc...@head.cfa.harvard.edu>
Date: Mon, Jun 4, 2012 at 2:13 PM
Subject: Fast Cython searchsorted for both arrays sorted
To: bottl...@googlegroups.com


I just saw the announce for bottleneck 0.6.  This looks really useful!
 By coincidence this weekend I wrote a little Cython code to replace
np.searchsorted(a, v) for the special case when both arrays are
sorted.  When this is the case and len(v) > len(a) / 100 (roughly
speaking) then the Cython version is 10 to 20 times faster.  This use
case is pretty common in my work, for instance in regridding
coordinate data etc.  Here is the code, first the Cython code and then
the Python wrapper function:

https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/fastss.pyx
https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/Numpy.py#L427

This is my first real Cython, and I didn't initially intend for
distribution, but if you think it would be a useful addition to
bottleneck then feel free to take the code.

Cheers,
Tom

On Mon, Jun 4, 2012 at 1:09 PM, Keith Goodman <kwgo...@gmail.com> wrote:
> Bottleneck is a collection of fast NumPy array functions written in
> Cython. It contains functions like median, nanargmax,
> move_median, rankdata, partsort, replace, anynan.
>
> The sixth release of bottleneck adds four new functions and, thanks to
> Dougal Sutherland, now supports Python 3.2.
>
> New functions:
> - replace(arr, old, new), e.g, replace(arr, np.nan, 0)
> - nn(arr, arr0, axis) nearest neighbor and its index of 1d arr0 in 2d arr
> - anynan(arr, axis) faster alternative to np.isnan(arr).any(axis)
> - allnan(arr, axis) faster alternative to np.isnan(arr).all(axis)
>
> Enhancements:
> - Python 3.2 support (may work on earlier versions of Python 3)
> - C files are now generated with Cython 0.16 instead of 0.14.1
> - Upgrade numpydoc from 0.3.1 to 0.4 to support Sphinx 1.0.1
>
> Breaks from 0.5.0:
> - Support for Python 2.5 dropped
> - Default axis for benchmark suite is now axis=1 (was 0)
>
> Bug fixes:
> - #31 Confusing error message in partsort and argpartsort
> - #32 Update path in MANIFEST.in
> - #35 Wrong output for very large (2**31) input arrays
>
> download
>   http://pypi.python.org/pypi/Bottleneck
> docs
>   http://berkeleyanalytics.com/bottleneck
> code
>   http://github.com/kwgoodman/bottleneck
> mailing list
>   http://groups.google.com/group/bottle-neck
> _______________________________________________
> SciPy-User mailing list
> SciPy...@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

Keith Goodman

unread,
Jun 4, 2012, 3:19:02 PM6/4/12
to bottl...@googlegroups.com
> ---------- Forwarded message ----------
> From: Tom Aldcroft
> Date: Mon, Jun 4, 2012 at 2:13 PM
> Subject: Fast Cython searchsorted for both arrays sorted
> To: bottl...@googlegroups.com
>
>
> I just saw the announce for bottleneck 0.6.  This looks really useful!
>  By coincidence this weekend I wrote a little Cython code to replace
> np.searchsorted(a, v) for the special case when both arrays are
> sorted.  When this is the case and len(v) > len(a) / 100 (roughly
> speaking) then the Cython version is 10 to 20 times faster.  This use
> case is pretty common in my work, for instance in regridding
> coordinate data etc.  Here is the code, first the Cython code and then
> the Python wrapper function:
>
> https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/fastss.pyx
> https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/Numpy.py#L427

Nice. I opened an issue: https://github.com/kwgoodman/bottleneck/issues/47
Reply all
Reply to author
Forward
0 new messages