Proposal to improve Series.sort() api


Ed Schofield

Mar 24, 2015, 8:33:52 AM
to pyd...@googlegroups.com
Hi all,

I regularly teach people who are learning Pandas. Something that crops up very often as a barrier for people is the inconsistency of the .sort() API for Series (in-place) and DataFrames (not in-place). I explain over and over again that the .sort() API inconsistency is a historical wart that will hopefully be fixed one day. Now I would like to help get this fixed. I understand that the default in-place behaviour of .sort() on Series was originally a consequence of Series being subclasses of NumPy arrays. Now that Series have been decoupled from arrays, I am certain that Pandas would be easier to learn and use if the default sort behaviour for Series were changed to inplace=False.

Here is an example:

In [1]: !head -n7 olympics.csv

Country,Population (million),Gold,Silver,Bronze,Total
Grenada,0.10,1,0,0,1
Jamaica,2.74,4,4,4,12
Trinidad and Tobago,1.34,1,0,3,4
New Zealand,4.37,6,2,5,13
Bahamas,0.34,1,0,0,1
Slovenia,2.03,1,1,2,4

In [2]: medals = pd.read_csv('olympics.csv', skiprows=3)


In [3]: medals['Gold per million'] = medals['Gold'] / medals['Population (million)']


In [4]: medals.sort('Gold per million', ascending=False)[:20]['Gold per million'].plot(kind='bar')


which gives a nice sorted bar chart. However, trying the obvious analogous operation with Series:

In [18]: medals['Gold per million'].sort(ascending=False)[:20].plot(kind='bar')


fails with the following exception:


ValueError: This Series is a view of some other array, to sort in-place you must create a copy


which is rather uninformative for a beginner.
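
For comparison, here is a minimal sketch of the chained, non-in-place style that the Series call above was reaching for. It assumes pandas 0.17 or later, where Series.sort_values() returns a new sorted Series. The small inline table reuses the six rows from the CSV excerpt above instead of reading olympics.csv, and the plotting step is left out so the example has no matplotlib dependency.

```python
import pandas as pd

# Six rows copied from the olympics.csv excerpt shown earlier in this post.
medals = pd.DataFrame({
    "Country": ["Grenada", "Jamaica", "Trinidad and Tobago",
                "New Zealand", "Bahamas", "Slovenia"],
    "Population (million)": [0.10, 2.74, 1.34, 4.37, 0.34, 2.03],
    "Gold": [1, 4, 1, 6, 1, 1],
})
medals["Gold per million"] = medals["Gold"] / medals["Population (million)"]

# sort_values() returns a new Series, so further calls can be chained onto it.
per_million = medals.set_index("Country")["Gold per million"]
top3 = per_million.sort_values(ascending=False).head(3)

print(top3)
```

Because the sort is not in-place, per_million keeps its original row order and the chained expression evaluates to a Series rather than None.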


My conservative proposal to change the Series.sort() API is as follows:


Version 0.17: issue a deprecation warning for Series.sort() calls with no parameters (but still default to inplace=True)

Version 0.18: force passing inplace=True or inplace=False for calls to Series.sort()

Version 0.19: set the default for Series.sort() to inplace=False.
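
The first stage of this schedule could be sketched roughly as follows. This is not pandas code: staged_sort is a hypothetical stand-in that operates on a plain Python list, and the warning text and the choice of FutureWarning are assumptions made only for illustration. It shows one way to tell "inplace not passed" apart from an explicit value while keeping the old default.

```python
import warnings

_UNSET = object()  # sentinel: distinguishes "argument omitted" from an explicit value


def staged_sort(values, inplace=_UNSET, ascending=True):
    """Hypothetical illustration of stage 1 (warn, but keep inplace=True)."""
    if inplace is _UNSET:
        warnings.warn(
            "sort() without an explicit inplace argument is deprecated; "
            "pass inplace=True or inplace=False",
            FutureWarning,
            stacklevel=2,
        )
        inplace = True  # stage 1 still defaults to the old behaviour

    if inplace:
        values.sort(reverse=not ascending)  # mutates the caller's list
        return None
    return sorted(values, reverse=not ascending)  # leaves the input untouched
```

Under the same assumptions, stage 2 would replace the warning with an exception, and stage 3 would drop the sentinel and default to inplace=False.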


It would also be possible IMHO to jump straight to the second stage, requiring passing inplace=True or inplace=False for Series.sort() calls in v0.17.


I would be happy to submit pull requests for this if the maintainers give the proposal the green light.


Best wishes,

    Ed

Jeff

Mar 24, 2015, 8:40:38 AM
to pyd...@googlegroups.com
Ed,

thanks for your interest!

This exact topic has been discussed and some proposals put forth in the following 2 issues.

https://github.com/pydata/pandas/issues/5190

https://github.com/pydata/pandas/issues/8239

Would love to have you review, comment, and potentially implement these changes.

Once there is some agreement on what should happen, we can probably close 8239, then open a new one with a concise description of what needs to be done.

thanks!

Jeff

Joris Van den Bossche

Apr 6, 2015, 9:50:28 AM
to pyd...@googlegroups.com
There is now also a new proposal at https://github.com/pydata/pandas/issues/9816.

This proposes the introduction of a new `sorted` method on Series/DataFrame: a new method with a unified interface, but without the difficulty of having to deprecate the sort behaviour of Series now.

Feel free to chime in on the discussion here or on github!



Joris Van den Bossche

Aug 4, 2015, 5:27:57 PM
to PyData
Note that there is a renewed discussion on a proposed PR (https://github.com/pydata/pandas/pull/10726) to refine the API.

Current state:

- add `.sorted()` method that defaults to sorting on the values (and cannot sort the index)
- keep `.sort_index` to be the function to sort on the index
- deprecate `Series.sort`, `Series.order` and `DataFrame.sort`
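
The split between sorting on values and sorting on the index can be illustrated with a short sketch. It assumes pandas 0.17 or later, where the value-sorting method is named `sort_values`; it is used here as a stand-in for the `.sorted()` name in the list above, and the example Series is made up.

```python
import pandas as pd

# A made-up Series whose values and index labels are in different orders.
s = pd.Series([3.0, 1.0, 2.0], index=["b", "c", "a"])

by_value = s.sort_values()  # new Series ordered by the values
by_index = s.sort_index()   # new Series ordered by the index labels

print(by_value)
print(by_index)
```

Both calls return new objects and leave `s` unchanged.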

Any feedback is certainly welcome!