In other words I've replaced the old move_sum and move_nansum with a new move_sum that takes an extra parameter. Fewer functions, more functionality.
I also rewrote the slow versions of the moving window functions, dropping support for the filter and strides methods and only providing the python loop method.
> In other words I've replaced the old move_sum and move_nansum with a new
> move_sum that takes an extra parameter. Fewer functions, more functionality.

I think this is a good idea. The current implementation of move_nansum is particularly confusing now that nansum (in numpy 1.9) returns 0 for all-NaN arrays. I suppose I could get that new behavior with minc=0? It's better to require users to specify what they want.
I would, however, suggest that min_count would be a better name, one that is actually self-descriptive. Both are equally fast to type in a world of autocomplete but I would struggle to guess the meaning of minc.
> I also rewrote the slow versions of the moving window functions, dropping
> support for the filter and strides methods and only providing the python
> loop method.

I'm a little sad to see the strides method go -- I thought it was nice to have that method around, at least as a reference point. But I suppose there's not really any need for it now that the moving window functions are n-dimensional.
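For reference, the strides approach for a 1-D moving sum can be sketched in a few lines with numpy's stride tricks. This is only an illustration of the general idea, not the code that was removed:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    def move_sum_strides(a, window):
        # Build an (n - window + 1, window) view where each row is one
        # overlapping window of `a`; no data is copied.
        a = np.asarray(a, dtype=np.float64)
        n = a.size
        windows = as_strided(a, shape=(n - window + 1, window),
                             strides=(a.strides[0], a.strides[0]))
        out = np.full(n, np.nan)
        # The first window - 1 positions have no full window, so they stay NaN.
        out[window - 1:] = windows.sum(axis=1)
        return out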
On Mon, Jan 5, 2015 at 1:43 PM, Stephan Hoyer <sho...@gmail.com> wrote:
>> In other words I've replaced the old move_sum and move_nansum with a new
>> move_sum that takes an extra parameter. Fewer functions, more functionality.
>
> I think this is a good idea. The current implementation of move_nansum is
> particularly confusing now that nansum (in numpy 1.9) returns 0 for all-NaN
> arrays. I suppose I could get that new behavior with minc=0? It's better to
> require users to specify what they want.

Hm. I didn't allow minc to be zero. I guess move_sum is the only one that has a default value even when there are no data. Should I try to add minc=0 capability to all the moving window functions?
Allowing min_count=0 adds complexity to the code. And I don't think it would be used much. Most often I bet min_count=0 would just be a user error (maybe meant axis=0). Is it worth it?
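To make the semantics we're discussing concrete, here is a slow pure-Python/numpy sketch of a 1-D move_sum with the proposed parameter. The name min_count and the NaN fill for windows with too few valid values are my reading of the proposal, not the actual implementation:

    import numpy as np

    def move_sum(a, window, min_count=None):
        # min_count = minimum number of non-NaN values a window must contain
        # to produce a result; windows with fewer valid values give NaN.
        # min_count=None means min_count=window.
        a = np.asarray(a, dtype=np.float64)
        if min_count is None:
            min_count = window
        out = np.full(a.shape, np.nan)
        for i in range(window - 1, a.size):
            w = a[i - window + 1:i + 1]
            valid = ~np.isnan(w)
            if valid.sum() >= min_count:
                out[i] = w[valid].sum()
        return out

    # move_sum([1.0, np.nan, 3.0, 4.0], window=2, min_count=1)
    #   -> [nan, 1.0, 3.0, 7.0]
    # move_sum([1.0, np.nan, 3.0, 4.0], window=2)  # min_count=window
    #   -> [nan, nan, nan, 7.0]

Under this reading, min_count=0 would make an all-NaN window return 0.0 (the new nansum-style behavior), which is exactly the case being debated above.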
A related thought occurred to me: what about similarly consolidating sum and nansum into sum with the parameter min_count? This would require some additional logic, but it would not be too bad, I think.
The big downside, of course, is speed. Moving window functions have to keep track of NaNs (even with min_count=window), but functions like sum (which doesn't exist in bottleneck) do not need to check whether each element is NaN. Adding min_count would mean checking each element, which would be slower than numpy.sum for large float input arrays.
On Thu, Jan 8, 2015 at 6:55 PM, Stephan Hoyer <sho...@gmail.com> wrote:

So nansum, for example, would be one function for the user but three functions for the developer:
if min_count == 0 (default):
    current bn.nansum code
elif min_count is None:  # which means min_count = a.shape[axis]
    new code for sum, which wouldn't have to check for NaNs or count them, so fast
elif min_count in the interval [1, a.shape[axis]]:
    code that checks for NaNs and counts them
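Spelled out as runnable (but slow) Python, that dispatch might look something like the following. The function name and the numpy calls are just an illustration of the three branches, not bottleneck's C code:

    import numpy as np

    def nansum_with_min_count(a, axis=None, min_count=0):
        a = np.asarray(a, dtype=np.float64)
        if min_count == 0:
            # current bn.nansum behavior: skip NaNs, all-NaN slices give 0
            return np.nansum(a, axis=axis)
        elif min_count is None:
            # min_count = a.shape[axis]: plain sum is enough, since NaN
            # propagates on its own -- no NaN checking or counting needed
            return np.sum(a, axis=axis)
        else:
            # 1 <= min_count <= a.shape[axis]: must count the non-NaN values
            total = np.nansum(a, axis=axis)
            count = np.sum(~np.isnan(a), axis=axis)
            return np.where(count >= min_count, total, np.nan)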