Regression when setting multiple object columns as index?


Jonathan Rocher

Dec 23, 2015, 6:50:12 PM
to pyd...@googlegroups.com

Dear all, 


I am investigating an issue that appeared after updating pandas from 0.15.2 to 0.16.1: setting multiple columns as the index fails when those columns contain non-trivial Python objects. You can reproduce the issue with:


from pandas import DataFrame

a = DataFrame(range(5), columns=["val"])
list_of_frozensets = [frozenset(l) for l in list("abcde")]
a["letter"] = list_of_frozensets
a["letter2"] = list_of_frozensets
print(a.set_index(["letter", "letter2"]))  # <<< This causes the error


Is this code not reasonable or is there a different recommended way to do that?


This raises a TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument.


That comes from the ordered=True in https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L4805. Setting it to False doesn't seem to cause any issues, though it might make indexing operations slower.
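
For context, here is a small plain-Python sketch of why an "ordered" requirement is awkward for frozensets. It is only an illustration and not a confirmed diagnosis of the TypeError above: frozenset comparisons test subset relations, so disjoint values have no natural ordering. The variable names are made up for the example.

```python
# Illustration only: "<" on frozensets means "is a proper subset of",
# which gives a partial order rather than a total one.
a = frozenset("a")
b = frozenset("b")
ab = frozenset("ab")

# Two disjoint frozensets are neither less than, greater than, nor equal.
disjoint_checks = (a < b, a > b, a == b)

# Each single-letter set is a proper subset of the two-letter set.
subset_checks = (a < ab, b < ab)

print(disjoint_checks)
print(subset_checks)
```

With values like these, any sorted category order is somewhat arbitrary, which may be why an unordered setting looks harmless here.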


Thoughts? Recommendations?
Jonathan

--
Jonathan Rocher, PhD
Scientific software developer & Project manager

tom

Dec 23, 2015, 8:20:03 PM
to pyd...@googlegroups.com
Seems to work correctly with 0.17.1, are you able to upgrade to it?

```
In [11]: a = DataFrame(list(range(5)), columns=['val'])

In [12]: list_of_frozensets = [frozenset(l) for l in list("abcde")]

In [13]: a["letter"] = list_of_frozensets

In [14]: a["letter2"] = list_of_frozensets

In [15]: a.set_index(["letter", "letter2"])
Out[15]:
                val
letter letter2
(a)    (a)        0
(b)    (b)        1
(c)    (c)        2
(d)    (d)        3
(e)    (e)        4
```

-Tom


Jonathan Rocher

Dec 28, 2015, 1:45:12 PM
to pyd...@googlegroups.com
Hey Tom, 

Thanks for responding, but oddly enough, when I run it on 0.17.1 or on current master, I still get the same error. Can someone else confirm? Do you guys agree that it is a problem? If so, I can file a bug.

Jonathan

Joris Van den Bossche

Dec 28, 2015, 2:36:24 PM
to PyData
For me the code snippet runs without error for current master, with numpy 1.10.1, python 2.7 on Windows.
Also works with 0.17.1 on python 3.5 (also numpy 1.10.1, Windows).

Regards,
Joris

Andy Ray Terrel

Dec 28, 2015, 3:03:14 PM
to pyd...@googlegroups.com
The code works for me on pandas: 0.17.0, numpy: 1.10.2, Mac OS X

-- Andy

Jonathan Rocher

Dec 29, 2015, 2:14:20 PM
to pyd...@googlegroups.com
Thanks very much to all of you for confirming. I am still on numpy 1.9.2 (Mac OS X and Windows), so that seems like a probable cause. I will reconfirm once I can update.

Happy holidays!
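
Since the reports in this thread differ mainly by pandas and numpy versions, a quick way to compare environments might look like the sketch below. The helper name `report_versions` is hypothetical, and the code only reads each package's `__version__` attribute; pandas also provides `pandas.show_versions()` for a fuller report.

```python
import importlib


def report_versions(names=("pandas", "numpy")):
    """Return a dict mapping each package name to its version string.

    A package that cannot be imported maps to None.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


print(report_versions())
```

Running this on both a failing and a working machine would show whether the numpy version is the only difference.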

