Boost for np.sum(x[np.isfinite(x)])

Jan-Philip Gehrcke

Feb 19, 2014, 9:29:02 AM
to bottl...@googlegroups.com
Hello,

I have another question for today. I see that bottleneck's nansum
returns inf or -inf if such values are within the set of numbers to be
summed:

>>> bn.nansum([1, np.nan, np.inf])
inf
>>> bn.nansum([1, np.nan, -np.inf])
-inf

This makes sense. In certain cases, however, people might want to ignore
(-)infs while summing up.

After taking the reciprocal of a large set of finite numbers X_i, the
result may contain inf in all places where X was zero. Summing up while
ignoring infs then requires something like this:

np.sum(x[np.isfinite(x)])

This involves creating a boolean array which is then used for
indexing, so that only a subset of the reciprocal set of numbers is summed.
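To make the intermediate steps explicit, here is a small sketch of that idiom. The sample array `X` is made up for illustration; the point is only that `np.isfinite` builds a temporary boolean array and the indexing step then copies the finite values before anything is summed.

```python
import numpy as np

# Hypothetical sample data: zeros become inf after taking the reciprocal.
X = np.array([2.0, 0.0, 4.0, 0.0, 0.5])

with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
    x = 1.0 / X                     # [0.5, inf, 0.25, inf, 2.0]

mask = np.isfinite(x)   # temporary boolean array, same length as x
subset = x[mask]        # boolean indexing copies the finite values
total = np.sum(subset)  # 0.5 + 0.25 + 2.0
print(total)
```

So the one-liner costs one pass to build the mask, one to copy the subset, and one to sum it.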

bottleneck seems to do a great job summing over arrays containing NaNs
while *ignoring* NaNs. Couldn't it also ignore (-)inf values, in a
function separate from `nansum`? I guess that would be significantly
faster than doing np.sum(x[np.isfinite(x)]).
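For reference, the behaviour I have in mind could be sketched in plain NumPy as below. `finite_sum` is a hypothetical name, not an existing bottleneck or NumPy function, and the sketch assumes a NumPy version whose `np.sum` accepts the `where=` argument. It still builds the boolean mask, but it avoids the extra copy made by `x[mask]`; I have not measured whether that is faster.

```python
import numpy as np

def finite_sum(x):
    """Hypothetical helper: sum of x, skipping NaN, inf and -inf."""
    x = np.asarray(x, dtype=float)
    # `where=` restricts the reduction to the finite entries; the mask is
    # still created, but the finite values are not copied to a new array.
    return np.sum(x, where=np.isfinite(x))

print(finite_sum([1.0, np.nan, np.inf, -np.inf, 2.5]))
```

A dedicated single-pass implementation would not need the mask at all, which is where I would expect the real gain.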


Thanks for insights,

Jan-Philip

Keith Goodman

Feb 19, 2014, 2:03:19 PM
to bottl...@googlegroups.com
You could do an in-place replace of np.inf with np.nan (repeat with
-np.inf if negative infinities can occur)

    bn.replace(x, np.inf, np.nan)

and then do

    bn.nansum(x)

The timings:

In [1]: a = np.random.rand(1000000)
In [2]: a[[1,4,8,9,100]] = np.inf
In [3]: timeit bn.nansum(a[np.isfinite(a)])
100 loops, best of 3: 3.34 ms per loop
In [4]: timeit bn.replace(a, np.inf, np.nan); bn.nansum(a)
1000 loops, best of 3: 1.49 ms per loop

Or, if you replace once and then need to do many sums or similar operations:

In [5]: timeit bn.nansum(a)
1000 loops, best of 3: 749 us per loop
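
A minimal sketch that covers both signs of infinity, using the same `bn.replace` and `bn.nansum` calls as above. The helper name `nansum_ignoring_infs` is made up for illustration, and the sketch works on a copy so the caller's array keeps its infs; that copy costs time, so drop it if modifying the array in place is acceptable. The NumPy fallback is only there so the snippet runs without bottleneck installed.

```python
import numpy as np

try:
    import bottleneck as bn
except ImportError:  # fall back to plain NumPy if bottleneck is missing
    bn = None

def nansum_ignoring_infs(x):
    """Hypothetical helper: sum x, treating inf and -inf like NaN."""
    work = np.array(x, dtype=np.float64)   # copy; the input stays untouched
    if bn is not None:
        bn.replace(work, np.inf, np.nan)   # in place on the copy
        bn.replace(work, -np.inf, np.nan)  # second pass for negative infinity
        return float(bn.nansum(work))
    work[np.isinf(work)] = np.nan          # NumPy fallback, same result
    return float(np.nansum(work))

a = np.array([1.0, np.nan, np.inf, -np.inf, 2.0])
total = nansum_ignoring_infs(a)
print(total)
```

Keep in mind that once the replace has run in place, the array no longer contains any infs, so repeated calls on the same array have nothing left to replace.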