Changing bool behaviour of pandas objects  andy hayden  9/10/13 9:14 AM

There's a discussion on github to change the behaviour of `__nonzero__` for pandas objects. We wanted to gauge users' feedback on the proposed changes*. Bool behaviour in pandas (and numpy) often trips up and surprises new (and experienced) users, partly because it differs from that of many python objects:

- For empty arrays it's Falsey.
- For length-one arrays it's the bool of the item. (Note: bool(nan) is True.)
- Otherwise it raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

One option (originally discussed in https://github.com/pydata/pandas/issues/4633 and currently implemented in master via https://github.com/pydata/pandas/pull/4657) is to turn off bool **always**:

- raise ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

An alternative proposal being discussed (https://github.com/pydata/pandas/pull/4738):

- For length-one arrays it's the bool of the item. (Perhaps raising on bool(Series([nan])).)
- Otherwise raise ValueError: The truth value of an array with more than one element is ambiguous. Use a.empty, a.any() or a.all()

Note: bool of empty objects would be disallowed. In both cases, not/and/or would be specifically disallowed.

*sometime after 0.8 there was an API change from https://github.com/pydata/pandas/pull/1073, where bool(df) was df.empty; see https://github.com/pydata/pandas/issues/4633.
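For concreteness, the current numpy behaviour for the non-empty cases can be reproduced like this (a minimal sketch; the empty-array case is omitted here):

```python
import numpy as np

# Length-one arrays: bool() is the truth value of the single item.
print(bool(np.array([1])))       # True
print(bool(np.array([0])))       # False
print(bool(np.array([np.nan])))  # True -- note bool(nan) is True

# More than one element: ambiguous, so numpy raises ValueError.
try:
    bool(np.array([1, 2]))
except ValueError as e:
    print(e)
```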
Re: [pydata] Changing bool behaviour of pandas objects  Josef Pktd  9/10/13 9:51 AM

On Tue, Sep 10, 2013 at 12:14 PM, andy hayden <andyh...@gmail.com> wrote:

How common is the use of Series([True]) and Series([False])?

Do dataframe or series .any(), .all() and similar return a Series or a python bool?

Josef
Re: [pydata] Changing bool behaviour of pandas objects  Skipper Seabold  9/10/13 10:02 AM

On Tue, Sep 10, 2013 at 12:51 PM, <josef...@gmail.com> wrote:

I rely on this behavior in both numpy and pandas. Boolean, I believe.

FWIW, I'll summarize my vote and the reasoning for it here. I think we should continue the numpy behavior but fix the warty NaN handling in numpy, because as we all know this is an area that pandas exists to improve.

I'm operating under the assumption that the checked Series/DataFrame is the result of an indexing operation for which one element is expected to be returned. You can't control the container that's returned, and I'd rather not have to add an .item() everywhere in my code, but pandas should keep me from doing the wrong thing, i.e., doing an ambiguous operation. Maybe it's confusing, but I don't really see how you could shoot yourself in the foot. It just seems drastic and unnecessary to disallow this behavior. If you want to use .any() and .all() everywhere, then nothing is stopping you.

Behavior and reasoning:

1. Empty series raises. Maybe you screwed up your index? What is the 'correct' output of this?

   if pd.isnull(pd.DataFrame([])):
       print 'this dataframe has no missing values?'

   This seems ambiguous. You can't answer the question because there's no information to evaluate the statement.

2. 1 element is fine. You know what you're doing, carry on. Also .all() == .any() in this case, so it's not ambiguous.

3. Length > 1 raises. This is ambiguous. Ask for all, any, or empty. Maybe you screwed up your index?

Skipper
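Skipper's point in (2) — that .any() and .all() coincide for a single element — can be checked directly, which also answers Josef's question: on a Series, .any()/.all() return a scalar boolean, not a Series. A small illustration:

```python
import pandas as pd

s = pd.Series([True])
# With one element, any() and all() agree, so a truth value is unambiguous.
assert s.any() == s.all()
# The reduction returns a scalar boolean, not another Series.
assert not isinstance(s.any(), pd.Series)

t = pd.Series([True, False])
# With more elements they can disagree -- the ambiguity the ValueError points at.
assert t.any() and not t.all()
```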
Re: [pydata] Changing bool behaviour of pandas objects  Skipper Seabold  9/10/13 10:03 AM  On Tue, Sep 10, 2013 at 12:14 PM, andy hayden <andyh...@gmail.com> wrote:What do you mean by in both cases here?

Re: [pydata] Changing bool behaviour of pandas objects  Josef Pktd  9/10/13 10:22 AM  If I understand correctly:
Then the point is that pandas should have behavior that is useful for pandas "scalars". I think that's the issue (and the inconsistency with numpy), not just the "bool" of a scalar or one-element array.

>>> type(np.array(['', 'a'], dtype='O')[0])
<type 'str'>
>>> type(np.array([0, 1])[0])
<type 'numpy.int32'>
>>> type(np.array([0, 1])[0].item())
<type 'int'>
>>> type(np.array([0, 1], bool)[0])
<type 'numpy.bool_'>

When indexing into a numpy array, we get scalars that we can work with (besides small differences between the numpy scalar type and the related python type). So indexing into a boolean dataframe or series that returns one element should be usable as a bool. I assume numerical operations also work with a one-element series in an analogous way.

(I would prefer if numpy didn't have any python bool behavior for arrays with shape > (), and always raised.)

Josef
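The sessions above are from an older numpy; the scalar-extraction behaviour can be sketched version-independently with a boolean dtype (integer scalar types like int32 vs int64 vary by platform, so they're avoided here):

```python
import numpy as np

arr = np.array([False, True])
x = arr[0]
# Indexing yields a numpy scalar, not a Python bool...
assert isinstance(x, np.bool_)
# ...but .item() converts it to the native Python type,
assert isinstance(x.item(), bool)
# and numpy scalars work fine in a boolean context anyway.
assert bool(x) is False and bool(arr[1]) is True
```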

Re: [pydata] Changing bool behaviour of pandas objects  Josef Pktd  9/10/13 10:37 AM

I don't know where I have a more recent pandas than 0.11, so I'd better be quiet.

I'm getting a numpy.bool_:

>>> b
       0
0  False
1  False
2   True
3   True
>>> type(b.iloc[0, 0])
<type 'numpy.bool_'>
>>> bool(b.iloc[0, 0])
False
>>> bool(b.iloc[2, 0])
True
>>> pd.__version__
'0.11.0'
>>> type(b)
<class 'pandas.core.frame.DataFrame'>

Josef
Re: [pydata] Changing bool behaviour of pandas objects  andy hayden  9/11/13 4:33 AM

To add my thoughts: Explicit is better than implicit, and imo using bool on a pandas object is *never* explicit (and *always* ambiguous). I find writing code which depends on the context (the array/Series length) a strange idiom, and it's not one I use. This special case can be made completely unambiguous by using .item()... so why make it special? (We should add .item() to the ValueError message.)

Disallowing `__nonzero__` *entirely*, requiring users to be explicit, seems to me a clean and sensible solution to a common hiccup/cause of bugs in pandas code. And the ValueError would give immediate feedback on what the user should do to correct their code and remove the ambiguity.
Josef: we're talking about applying bool to pandas objects e.g. DataFrame and Series: bool(df) and bool(s).
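The explicit alternatives Andy is arguing for already exist; a short sketch of what replacing an ambiguous bool(s) looks like:

```python
import pandas as pd

s = pd.Series([True, False])

# Each of these states exactly which question is being asked:
assert not s.empty    # "is there anything here?"
assert s.any()        # "is at least one element truthy?"
assert not s.all()    # "are all elements truthy?"
assert len(s) == 2    # "how many elements are there?"
```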

Re: [pydata] Changing bool behaviour of pandas objects  Jeff  9/11/13 5:27 AM

FYI, master has this behavior (which makes sense and slightly sways me to Skipper's position):

In [3]: Series([]).item()
ValueError: can only convert an array of size 1 to a Python scalar

In [4]: Series([1]).item()
Out[4]: 1

In [5]: Series([1,2]).item()
ValueError: can only convert an array of size 1 to a Python scalar
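Jeff's session can be reproduced today (here with an explicit dtype for the empty Series, which later pandas versions otherwise infer as object):

```python
import pandas as pd

assert pd.Series([1]).item() == 1

# Both the empty and the length-2 Series refuse to convert to a scalar.
for s in (pd.Series([], dtype=float), pd.Series([1, 2])):
    try:
        s.item()
    except ValueError as e:
        print(e)  # e.g. "can only convert an array of size 1 to a Python scalar"
```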
Re: [pydata] Changing bool behaviour of pandas objects  andy hayden  9/11/13 5:42 AM

I agree that using item() here makes this behaviour completely explicit. However, imo it would be surprising for bool(s) to be sugar for bool(s.item())... which (I think) is Skipper's suggestion.

Re: [pydata] Changing bool behaviour of pandas objects  Nathaniel Smith  9/11/13 6:09 AM

On Wed, Sep 11, 2013 at 1:27 PM, Jeff <jeffr...@gmail.com> wrote:

.item() is a weird and, I think, widely misunderstood method (certainly I never understood it until getting more immersed in numpy's internals). Logically, there are two operations involved:

- Indexing. The "pure indexing" method of course is .__getitem__, []
- Conversion from numpy-defined types to python native types. Numpy has a "pure conversion" method, .tolist(). (This method is misnamed, e.g. if you call .tolist() on a numpy scalar then you get a Python scalar: type(np.int32(1).tolist()) is int, not list or np.int32.)

.item() *combines* these two operations: arr.item(*args) is defined as arr[args].tolist(), *except* that it has its own bizarro indexing rules:

- if multiple index arguments are given, it's just arr[args], *except* it is an error if the result is not a scalar.
- if one index argument is given, then the array is first flattened and then indexed (??)
- if no index arguments are given, then it's equivalent to .item(0), except that it's required that the flattened array have exactly 1 entry.

My guess is that this is a holdover from the old turmoil about whether scalar types should exist at all, which was a big issue around the numeric/numarray transition, where someone invented it as a compromise Python-type-using indexing operation that has survived until now, like a programmatic coelacanth. It's certainly not at all consistent with modern numpy style.

n
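Nathaniel's three indexing rules for .item() can be verified directly:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# .tolist() on a numpy scalar gives a Python scalar, not a list.
assert type(np.int32(1).tolist()) is int

# Multiple index arguments: behaves like arr[1, 2] plus conversion.
assert arr.item(1, 2) == 5 and type(arr.item(1, 2)) is int

# One argument: the array is flattened first, then indexed.
assert arr.item(4) == 4

# No arguments: requires the flattened array to have exactly one element.
try:
    arr.item()
except ValueError as e:
    print(e)
```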
Re: [pydata] Changing bool behaviour of pandas objects  Jeff  9/16/13 6:17 AM

The PR was updated to use @cpcloud's name suggestion in the error message. Note that since we only allow a single element of bool dtype thru, a single-element NaN/NaT already raises.

@hayd maybe let's summarize and decide? Pandas will differ from numpy in (1):

- empty will ALWAYS raise in a boolean context

And conform in (2):

- a single-element Series of dtype == bool will return the bool of its element

So weighing practicality and consistency, I think preserving (2) for the time being preserves backward compat. `` will still *work*.

Maybe the best way to move this forward is to accept this PR with a deprecation message on (2), which can then be changed in a future pandas release?
Re: Changing bool behaviour of pandas objects  Dale Jung  9/16/13 8:21 AM

I guess I'm the only one who made liberal use of:

if df:
    blah blah

during that small window that it worked. :P
