How to keep=max in pandas.DataFrame.drop_duplicates


进陆

May 8, 2017, 9:25:36 AM
to PyData
As the title says, I sometimes need to keep the row with the max value when the other columns are the same. Take the example from http://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest:

A B
1 10
1 20
2 30
2 40
3 10

Should turn into this:

A B
1 20
2 40
3 10

I would like to do this with `df.drop_duplicates(subset=['A'], keep=max)`, or with max swapped for some other function.
Is there a universal method to do this in pandas? Or do the developers plan to add this?
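For reference, here is a minimal sketch of how a "keep=max" (or "keep=min", etc.) behaviour can be emulated with calls that already exist in pandas. The helper name `drop_duplicates_keep` is hypothetical, not a pandas API. It assumes `func` is an aggregation that returns one of the group's actual values (such as "max" or "min"); something like "mean" would generally match no row.

```python
import pandas as pd

def drop_duplicates_keep(df, subset, column, func="max"):
    """Hypothetical helper (not part of pandas): within each group of rows that
    share the values in `subset`, keep the row whose `column` equals func(column)."""
    # Broadcast the per-group aggregate back onto every row of its group.
    target = df.groupby(subset)[column].transform(func)
    # Keep only rows matching the aggregate; if several rows tie, keep the first one.
    return df[df[column] == target].drop_duplicates(subset=subset, keep="first")

df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})
print(drop_duplicates_keep(df, ["A"], "B", "max"))
print(drop_duplicates_keep(df, ["A"], "B", "min"))
```

Passing the aggregation as a string ("max", "min") rather than the builtin keeps the call on pandas' own groupby implementations.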

Thanks

Tom Augspurger

May 8, 2017, 9:35:25 AM
to pyd...@googlegroups.com
You might try sorting and then keeping the first (or last):  `df.sort_values(['A', 'B'], ascending=False).drop_duplicates(subset=["A"], keep="first")`.
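A runnable version of the sort-then-deduplicate approach, applied to the example data from the question. The final `sort_index()` is an optional addition (not part of the suggestion above) that restores the original row order, since `ascending=False` also reverses the order of the A groups.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# Sort descending so that, within each value of A, the row with the largest B comes first,
# then keep only the first row seen for each A.
deduped = (
    df.sort_values(["A", "B"], ascending=False)
      .drop_duplicates(subset=["A"], keep="first")
      .sort_index()  # optional: put the surviving rows back in their original order
)
print(deduped)
```

The "(or last)" variant works the same way: `df.sort_values("B").drop_duplicates(subset=["A"], keep="last")` sorts ascending and keeps the last, i.e. largest, B per group.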

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Spencer Ogden

May 8, 2017, 10:09:54 AM
to PyData
Seems like groupby is what you are looking for:

import pandas
df = pandas.DataFrame({'A':[1,1,2,2,3],'B':[10,20,30,40,10]})
df

   A   B
0  1  10
1  1  20
2  2  30
3  2  40
4  3  10

df.groupby('A').max()

    B
A
1  20
2  40
3  10

or
df.groupby('A').max().reset_index()

   A   B
0  1  20
1  2  40
2  3  10
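One caveat worth illustrating: `groupby('A').max()` takes the maximum of every non-grouping column independently, so once the frame has more columns than A and B the result can mix values from different rows. A sketch of one alternative that keeps the actual row with the largest B per group uses `idxmax` (not mentioned in this thread); the extra column C below is hypothetical, added only to show the difference.

```python
import pandas as pd

# Hypothetical extra column C, added to show how the two approaches differ.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": [10, 20, 30, 40, 10],
    "C": ["x", "a", "y", "b", "z"],
})

# Column-wise max per group: C is maximised independently of B.
per_column = df.groupby("A").max().reset_index()

# Whole-row selection: index label of the largest B in each group, then look up those rows.
whole_rows = df.loc[df.groupby("A")["B"].idxmax()]

print(per_column)
print(whole_rows)
```

Here `per_column` pairs B=20 with C="x" even though the row holding B=20 has C="a", while `whole_rows` returns the original rows intact. On ties, `idxmax` picks the first occurrence.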

Spencer


On 5/8/2017 9:35 AM, Tom Augspurger wrote:
You might try sorting and then keeping the first (or last):  `df.sort_values(['A', 'B'], ascending=False).drop_duplicates(subset=["A"], keep="first")`.
On Mon, May 8, 2017 at 8:22 AM, 进陆 <lepto....@gmail.com> wrote: