On Tue, Sep 10, 2013 at 12:51 PM, <josef...@gmail.com> wrote:
> On Tue, Sep 10, 2013 at 12:14 PM, andy hayden <andyh...@gmail.com> wrote:
>> There's a discussion on github to change the behaviour of `__nonzero__` for
>> pandas objects. We wanted to gauge users' feedback on proposed changes*.
>>
>> Bool behaviour in pandas (and numpy) often trips up and surprises new (and
>> experienced) users, for one thing because it differs from many python
>> objects.
>>
>> - For empty arrays it's Falsey
>> - For length one arrays it's bool of item (Note: bool(nan) is True)
>> - Otherwise it raises a ValueError: The truth value of an array with more
>> than one element is ambiguous. Use a.any() or a.all()
>>
>> One option (originally discussed in
>> https://github.com/pydata/pandas/issues/4633 and currently implemented in
>> master via https://github.com/pydata/pandas/pull/4657) is to turn off bool
>> **always**:
>>
>> - raise ValueError: The truth value of an array is ambiguous. Use a.empty,
>> a.any() or a.all().
>>
>> An alternative proposal being discussed is
>> (https://github.com/pydata/pandas/pull/4738):
>>
>> - For length one arrays it's bool of item (Perhaps raising on
>> bool(Series([nan])).)
>> - Otherwise raise a ValueError: The truth value of an array with more than
>> one element is ambiguous. Use a.empty, a.any() or a.all()
>
> How common is the use of Series([True]) and Series([False])?

I rely on this behavior in both numpy and pandas.

> Do dataframe or series .any() .all() and similar return a Series or
> a python bool?

Boolean I believe.

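To make that concrete, here is a minimal sketch with made-up data. As far as I know, Series.any()/.all() reduce to a single boolean, while DataFrame.any()/.all() reduce column-wise by default and hand back a Series, so a second reduction is needed to get one value. The variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([True, False])
df = pd.DataFrame({"a": [True, False], "b": [True, True]})

# Series reductions give a single boolean value.
s_any = s.any()
s_all = s.all()

# DataFrame reductions work per column by default and return a Series.
df_any = df.any()
df_all = df.all()

# Reducing a second time collapses the per-column result to one boolean.
df_all_all = df.all().all()

print(type(s_any).__name__, type(df_any).__name__)
print(df_all.to_dict(), df_all_all)
```

So whether you get a plain boolean back depends on whether you started from a Series or a DataFrame.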
FWIW, I'll briefly summarize my vote and the reasoning behind it. I
think we should keep the numpy behavior but fix numpy's wart-y
NaN handling, because as we all know this is an area that pandas
exists to improve. I'm operating under the assumption that the
Series/DataFrame being checked is the result of an indexing operation
that is expected to return one element. You can't control the
container that's returned, and I'd rather not have to add an .item()
everywhere in my code, but pandas should keep me from doing the wrong
thing, i.e., an ambiguous operation. Maybe it's confusing, but I
don't really see how you could shoot yourself in the foot. It just
seems drastic and unnecessary to disallow this behavior. If you want
to use .any() and .all() everywhere, nothing is stopping you.

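A small sketch of the situation I have in mind, with made-up data and names: boolean-mask indexing hands back a Series even when exactly one row matches, so the explicit route is to unwrap it with .item() every time.

```python
import pandas as pd

prices = pd.Series([10, 20, 30], index=["a", "b", "c"])  # made-up data

# Boolean-mask indexing returns a Series even when exactly one row matches.
hit = prices[prices.index == "b"] > 15
print(type(hit).__name__, len(hit))

# Explicit unwrapping: .item() returns the single element as a scalar and
# raises ValueError unless the Series has exactly one element.
flag = hit.item()
print(flag)

try:
    (prices[prices.index == "zzz"] > 15).item()
except ValueError as exc:
    print("no match:", exc)
```

Allowing `if hit:` for the length-one case would save the .item() call without hiding the empty or multi-element mistakes.
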
Behavior and reasoning:
1. Empty series raises. Maybe you screwed up your index? What is the
'correct' output of this?
    if pd.isnull(pd.DataFrame([])):
        print('this dataframe has no missing values?')
This seems ambiguous. You can't answer the question because there's no
information to evaluate the statement.
2. 1 element is fine. You know what you're doing, carry on. Also
.all() == .any() in this case, so it's not ambiguous.
3. Length > 1 raises. This is ambiguous. Ask for all, any, or empty.
Maybe you screwed up your index?
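
The three rules above can be sketched roughly as follows. This is only an illustration of the proposal, not pandas code: `proposed_bool` is a hypothetical helper, the error messages for the empty and NaN cases are made up, and raising on a lone NaN is my assumption about how the NaN wart would be fixed.

```python
import numpy as np
import pandas as pd


def proposed_bool(obj):
    """Hypothetical helper sketching the three rules; not part of pandas."""
    values = np.asarray(obj).ravel()
    if values.size == 0:
        # Rule 1: nothing to evaluate, so refuse to answer.
        raise ValueError("The truth value of an empty object is ambiguous.")
    if values.size > 1:
        # Rule 3: the caller has to pick a reduction explicitly.
        raise ValueError(
            "The truth value of an array with more than one element is "
            "ambiguous. Use a.empty, a.any() or a.all()"
        )
    item = values[0]
    if pd.isnull(item):
        # Assumption: a single missing value raises instead of being True.
        raise ValueError("The truth value of a missing value is ambiguous.")
    # Rule 2: exactly one element, so any() and all() agree.
    return bool(item)


one = pd.Series([True])
print(proposed_bool(one), one.any() == one.all())

# Why the empty case is ambiguous: the two reductions disagree.
empty = pd.Series([], dtype=bool)
print(empty.any(), empty.all())
```

On an empty Series, any() is False while all() is vacuously True, which is why I would rather raise than pick one.
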
Skipper