numpy.where() and multiple comparisons

John Ladasky

Jan 17, 2014, 8:51:17 PM

Hi folks,

I am awaiting my approval to join the numpy-discussion mailing list, at scipy.org. I realize that would be the best place to ask my question. However, numpy is so widely used, I figure that someone here would be able to help.

I like to use numpy.where() to select parts of arrays. I have encountered what I would consider to be a bug when you try to use where() in conjunction with the multiple comparison syntax of Python. Here's a minimal example:

Python 3.3.2+ (default, Oct 9 2013, 14:50:09)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.where(a < 5)
>>> b
(array([0, 1, 2, 3, 4]),)
>>> c = np.where(2 < a < 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Defining b works as I want and expect. The array contains the indices (not the values) of a where a < 5.

For my definition of c, I expect (array([3, 4, 5, 6]),). As you can see, I get a ValueError instead. I have seen the error message about "the truth value of an array with more than one element" before, and generally I understand how I (accidentally) provoke it. This time, I don't see it. In defining c, I expect to be stepping through a, one element at a time, just as I did when defining b.

Does anyone understand why this happens? Is there a smart work-around? Thanks.

duncan smith

Jan 17, 2014, 9:16:28 PM

>>> a = np.arange(10)
>>> c = np.where((2 < a) & (a < 7))
>>> c
(array([3, 4, 5, 6]),)

Duncan

John Ladasky

Jan 17, 2014, 11:00:20 PM

On Friday, January 17, 2014 6:16:28 PM UTC-8, duncan smith wrote:

> >>> a = np.arange(10)
> >>> c = np.where((2 < a) & (a < 7))
> >>> c
> (array([3, 4, 5, 6]),)

Nice! Thanks!

Now, why does the multiple comparison fail, if you happen to know?

Peter Otten

Jan 18, 2014, 3:50:00 AM

to pytho...@python.org

2 < a < 7

is equivalent to

2 < a and a < 7

Unlike `&`, `and` cannot be overridden (*), so the above implies that the
boolean value bool(2 < a) is evaluated. That triggers the error because the
numpy authors refused to guess -- and rightly so, as both implementable
options (a.any() or a.all()) would be wrong in a common case like yours.

(*) I assume overriding would collide with the short-circuiting of boolean
expressions.
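For readers who want to see this without numpy installed, here is a minimal sketch. The `Vec` class is a made-up stand-in for an array (it is not numpy's implementation); it only assumes elementwise comparisons and a `__bool__` that refuses to guess. The last part simulates what the chained comparison would select if the truth value were defined as any() or as all().

```python
class Vec:
    """Toy stand-in for an array with elementwise comparisons (hypothetical)."""

    def __init__(self, items):
        self.items = list(items)

    def __gt__(self, other):
        # Reached for `2 < v` through the reflected comparison.
        return Vec(x > other for x in self.items)

    def __lt__(self, other):
        return Vec(x < other for x in self.items)

    def __and__(self, other):
        return Vec(x and y for x, y in zip(self.items, other.items))

    def __bool__(self):
        # Mirror the refusal to pick a single truth value.
        raise ValueError("truth value of a multi-element Vec is ambiguous")

    def where(self):
        """Return the indices of the truthy elements."""
        return [i for i, x in enumerate(self.items) if x]


a = Vec(range(10))

# The chained form is `(2 < a) and (a < 7)`, so bool(2 < a) is requested.
try:
    2 < a < 7
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# The `&` form never asks for a truth value.
indices = ((2 < a) & (a < 7)).where()

# What `x and y` would return under each possible guess for bool(x).
first, second = 2 < a, a < 7
if_any = second if any(first.items) else first   # only `a < 7` survives
if_all = second if all(first.items) else first   # only `2 < a` survives

print(chained_error)
print(indices)          # [3, 4, 5, 6]
print(if_any.where())   # [0, 1, 2, 3, 4, 5, 6]
print(if_all.where())   # [3, 4, 5, 6, 7, 8, 9]
```

Either guess silently drops one of the two conditions, which is why raising an error is the safer behaviour.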

Tim Roberts

Jan 18, 2014, 4:20:54 PM

Peter Otten <__pet...@web.de> wrote:

>John Ladasky wrote:
>
>> On Friday, January 17, 2014 6:16:28 PM UTC-8, duncan smith wrote:
>>
>>> >>> a = np.arange(10)
>>> >>> c = np.where((2 < a) & (a < 7))
>>> >>> c
>>> (array([3, 4, 5, 6]),)
>>
>> Nice! Thanks!
>>
>> Now, why does the multiple comparison fail, if you happen to know?
>
>2 < a < 7
>
>is equivalent to
>
>2 < a and a < 7
>

>Unlike `&`, `and` cannot be overridden (*), ...

And just in case it isn't obvious to the original poster, the expression "2
< a" only works because the numpy.ndarray class has overrides for the
comparison operators. Python natively has no idea how to compare an integer
to a numpy array.

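Similarly, (2 < a) & (a < 7) works because numpy.ndarray has an override for
the "&" operator. So, that expression is evaluated roughly as

    numpy.ndarray.__and__(
        numpy.ndarray.__gt__(a, 2),
        numpy.ndarray.__lt__(a, 7)
    )

(2 < a ends up in __gt__ because int.__lt__(2, a) returns NotImplemented, so
Python falls back to the reflected method on the array.)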

As Peter said, there's no way to override the "and" operator.
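The dispatch can be observed with a small tracing class. This is only a sketch: `Traced` and the `calls` list are hypothetical names invented for the illustration, and the class records which special methods Python invokes rather than doing any real comparison.

```python
calls = []


class Traced:
    """Hypothetical class that logs the special methods Python calls on it."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        calls.append(f"{self.name}.__lt__")
        return Traced("lt_result")

    def __gt__(self, other):
        calls.append(f"{self.name}.__gt__")
        return Traced("gt_result")

    def __and__(self, other):
        calls.append(f"{self.name}.__and__")
        return Traced("and_result")


a = Traced("a")

# int does not know how to compare itself with a Traced object...
int_answer = (2).__lt__(a)

# ...so `2 < a` is handed to the reflected method a.__gt__(2).
result = (2 < a) & (a < 7)

print(int_answer)   # NotImplemented
print(calls)        # ['a.__gt__', 'a.__lt__', 'gt_result.__and__']
```

No `__bool__` is ever called here, which is the difference from the `and` form.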
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Terry Reedy

Jan 18, 2014, 7:12:35 PM

to pytho...@python.org

On 1/18/2014 3:50 AM, Peter Otten wrote:

> Unlike `&`, `and` cannot be overridden (*),

> (*) I assume overriding would collide with the short-circuiting of boolean
> expressions.

Yes. 'and' could be called a 'control-flow operator', but in Python it
is not a functional operator.

A functional binary operator expression like 'a + b' abbreviates a
function call, without using (). In this case, it could be written
'operator.add(a, b)'. This function, or its internal equivalent, calls
either a.__add__(b) or b.__radd__(a) or both. It is the overloading of
the special methods that overrides the operator.
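As a concrete illustration of that paragraph, here is a minimal sketch. The `Meters` class is a made-up example; the only library call used is `operator.add` from the standard library.

```python
import operator


class Meters:
    """Hypothetical value type that overloads `+` through the special methods."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented   # let Python try the other operand

    def __radd__(self, other):
        # Reached for `3 + Meters(2)` after int.__add__ returns NotImplemented.
        return self.__add__(other)


left = Meters(2) + 3                         # Meters.__add__
right = 3 + Meters(2)                        # falls back to Meters.__radd__
via_function = operator.add(Meters(2), 3)    # same dispatch, spelled as a call

print(left.value, right.value, via_function.value)   # 5 5 5
```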

The control flow expression 'a and b' cannot abbreviate a function call
because Python calls always evaluate all arguments first. It is
equivalent* to the conditional (control flow) *expression* (also not a
function operator) 'a if not a else b'. Evaluation of either expression
calls bool(a) and hence a.__bool__ or a.__len__.

'a or b' is equivalent* to 'a if a else b'

* 'a (and/or) b' evaluates 'a' once, whereas 'a if (not/)a else b'
evaluates 'a' twice. This is not equivalent when there are side-effects.
Here is an example where this matters.
    input('enter a non-0 number :') or 1
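The points about short-circuiting, single evaluation, and the `__len__` fallback can be checked with a short sketch. `noisy` and `Sized` are hypothetical helpers invented for this illustration; `noisy` stands in for a call with a side effect such as `input()`.

```python
evaluations = []


def noisy(value):
    """Hypothetical helper: record that `value` was evaluated, then return it."""
    evaluations.append(value)
    return value


class Sized:
    """Hypothetical class without __bool__, so truth testing uses __len__."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


# `and` returns a falsy left operand as-is; the right side is never evaluated.
skipped = noisy(0) and noisy("right")
after_and = list(evaluations)

# `or` evaluates its left operand exactly once.
evaluations.clear()
defaulted = noisy("5") or 1
after_or = list(evaluations)

# The conditional-expression spelling evaluates a truthy operand twice.
evaluations.clear()
spelled_out = noisy("5") if noisy("5") else 1
after_conditional = list(evaluations)

# Truth testing falls back to __len__ when __bool__ is not defined.
empty = Sized(0)
picked = empty and "unreached"

print(after_and)           # [0]
print(after_or)            # ['5']
print(after_conditional)   # ['5', '5']
print(picked is empty)     # True
```

With `input()` in place of `noisy`, the conditional spelling would prompt the user twice.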

--
Terry Jan Reedy