how to keep=max in pandas.DataFrame.drop_duplicates

进陆

May 8, 2017, 9:25:36 AM

to PyData

As the title says, I sometimes need to drop duplicates but keep the row with the maximum value where the other items are the same. Take the example at http://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest

The DataFrame there should be reduced to one row per value of A, the row with the highest value, as if I could use `df.drop_duplicates(subset=['A'], keep=max)`, or swap max for some other function.

Is there a universal method to do this in pandas? Or do the developers plan to add this?

Thanks

Tom Augspurger

May 8, 2017, 9:35:25 AM

to pyd...@googlegroups.com

You might try sorting and then keeping the first (or last): `df.sort_values(['A', 'B'], ascending=False).drop_duplicates(subset=["A"], keep="first")`.
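A minimal sketch of the sort-then-drop approach, for anyone who wants to try it. The sample data and the extra column C are made up for illustration (they are not from the original question); C is there only to show that the whole row survives, not just the maximum of B.

```python
import pandas as pd

# Hypothetical sample data; column C is an assumption added for illustration.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": [10, 20, 30, 40, 10],
    "C": ["a", "b", "c", "d", "e"],
})

result = (
    # Sort descending so the largest B within each A comes first ...
    df.sort_values(["A", "B"], ascending=False)
      # ... then keep only that first row for each value of A.
      .drop_duplicates(subset=["A"], keep="first")
      # Restore ascending order of A and a clean 0..n-1 index.
      .sort_values("A")
      .reset_index(drop=True)
)
print(result)
```

To keep the minimum instead, the same chain should work with `keep="last"`, or with an ascending sort.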

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Spencer Ogden

May 8, 2017, 10:09:54 AM

to PyData

Seems like groupby is what you are looking for:

import pandas
df = pandas.DataFrame({'A':[1,1,2,2,3],'B':[10,20,30,40,10]})
df

	A	B
0	1	10
1	1	20
2	2	30
3	2	40
4	3	10

df.groupby('A').max()

	B
A

1	20
2	40
3	10

or

df.groupby('A').max().reset_index()

	A	B
0	1	20
1	2	40
2	3	10

Spencer
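One caveat worth sketching: `groupby('A').max()` takes the maximum of each column separately, so with more than one value column the result can combine values from different rows. If the goal is to keep the whole row that holds the largest B, selecting by `idxmax` is one option. The data and the column C below are hypothetical, chosen only to make the difference visible.

```python
import pandas as pd

# Hypothetical sample data; column C is an assumption added for illustration.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": [10, 20, 30, 40, 10],
    "C": ["z", "a", "y", "b", "c"],
})

# Column-wise maximum per group: B and C are maximised independently,
# so a resulting row need not exist in the original frame.
per_column = df.groupby("A").max().reset_index()

# idxmax gives the index label of the row with the largest B in each group;
# .loc then pulls those complete rows out of the original frame.
rows = df.loc[df.groupby("A")["B"].idxmax()].reset_index(drop=True)

print(per_column)
print(rows)
```

Swapping `idxmax` for `idxmin` keeps the row with the smallest B instead. When several rows tie for the maximum, `idxmax` returns the first of them.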

