la 0.5.0dev now uses Bottleneck


Keith Goodman

Mar 10, 2011, 12:02:08 PM
to labele...@googlegroups.com
The development version of la (0.5.0dev) is faster and adds fast
moving window methods to larry. The cost of the speed and new
functionality is that la 0.5.0 will require the Bottleneck package
(http://pypi.python.org/pypi/Bottleneck).

Faster

* sum, mean, std, var, min, max, median, ranking

Moving window

* fast (Bottleneck): move_sum, move_mean, move_std, move_min, move_max
* slow (Python): move_ranking, move_median, move_func
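
For readers new to moving window functions, here is a minimal plain-Python sketch of what a moving mean computes. It is not Bottleneck's implementation, and the name move_mean_sketch is made up for illustration. It assumes a 1d input with no NaNs and assumes the first window - 1 outputs are filled with NaN because the window is not yet full.

```python
import math


def move_mean_sketch(values, window):
    """Moving mean over a 1d sequence using a running sum.

    Hypothetical helper for illustration only. Positions where the
    window is not yet full are set to NaN (an assumed convention).
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    out = [math.nan] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v                      # add the newest element
        if i >= window:
            running -= values[i - window]  # drop the element that left the window
        if i >= window - 1:
            out[i] = running / window
    return out


result = move_mean_sketch([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
```

The running sum means each output costs one add and one subtract regardless of window size, which is the usual reason a dedicated moving window function beats calling a mean on every slice.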

Here's the motivation for using Bottleneck in the la package:

Import:

>> import la
>> import numpy as np
>> import scipy.stats

Make data:

>> lar = la.rand(1000,1000)
>> lar[lar > 0.5] = la.nan
>> arr = lar.A

Check:

>> lar.median()
0.24999568789356486
>> np.median(arr)
nan
>> scipy.stats.nanmedian(arr, axis=None)
array(0.24999568789356486)
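
The check above compares a median that skips NaNs with one that does not. As a small plain-Python sketch of the NaN-skipping behavior (the helper name nan_median_sketch is hypothetical, and this is not how la or Bottleneck compute it):

```python
import math
import statistics


def nan_median_sketch(values):
    """Median of the non-NaN entries; NaN if nothing is left.

    Hypothetical helper for illustration only.
    """
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return math.nan
    return statistics.median(kept)


data = [0.1, math.nan, 0.3, math.nan, 0.2]
med = nan_median_sketch(data)
```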

Time it:

>> timeit lar.median()
1000 loops, best of 3: 1.69 ms per loop
>> timeit np.median(arr)
10 loops, best of 3: 63.3 ms per loop
>> timeit scipy.stats.nanmedian(arr, axis=None)
10 loops, best of 3: 82.9 ms per loop

The development version of la can be downloaded from
https://github.com/kwgoodman/la. Please report any issues (good or
bad).

Keith Goodman

Mar 12, 2011, 10:17:44 AM
to labele...@googlegroups.com
On Thu, Mar 10, 2011 at 9:02 AM, Keith Goodman <kwgo...@gmail.com> wrote:

> Here's the motivation for using Bottleneck in the la package:
>
> Import:
>
> >> import la
> >> import numpy as np
> >> import scipy.stats
>
> Make data:
>
> >> lar = la.rand(1000,1000)
> >> lar[lar > 0.5] = la.nan
> >> arr = lar.A
>
> Check:
>
> >> lar.median()
>   0.24999568789356486
> >> np.median(arr)
>   nan
> >> scipy.stats.nanmedian(arr, axis=None)
>   array(0.24999568789356486)
>
> Time it:
>
> >> timeit lar.median()
> 1000 loops, best of 3: 1.69 ms per loop
> >> timeit np.median(arr)
> 10 loops, best of 3: 63.3 ms per loop
> >> timeit scipy.stats.nanmedian(arr, axis=None)
> 10 loops, best of 3: 82.9 ms per loop

The timing above wasn't fair: lar.median() worked on the input array
in place. In Bottleneck 0.4.3dev it now works on a copy:

>> timeit lar.median()
100 loops, best of 3: 13.8 ms per loop
>> timeit np.median(arr)
10 loops, best of 3: 75.5 ms per loop
>> timeit scipy.stats.nanmedian(arr, axis=None)
10 loops, best of 3: 90.9 ms per loop
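
A rough plain-Python sketch of the difference between the two behaviors. The function names are hypothetical and a full sort stands in for whatever selection Bottleneck actually uses. The point is only that an in-place median reorders the caller's data, so repeated timing loops may see input that earlier loops already reordered (my reading of why the first numbers looked too good), while the copying version leaves the input alone and pays for the copy.

```python
def median_inplace_sketch(values):
    """Median that sorts the caller's list in place (mutates the input)."""
    values.sort()
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def median_copy_sketch(values):
    """Median that works on a copy, leaving the caller's list untouched."""
    return median_inplace_sketch(list(values))


data = [3.0, 1.0, 2.0]
m_copy = median_copy_sketch(data)
after_copy = list(data)        # still in the original order
m_inplace = median_inplace_sketch(data)
after_inplace = list(data)     # now sorted
```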
