Problem with groupby and nth in pandas 0.18.1

Benjamin Bertrand

Jul 5, 2016, 9:04:07 AM

to PyData

Hi,

With pandas 0.17.1 I used to do the following:

import pandas as pd

df = pd.DataFrame(
    {'device': ['A', 'A', 'A', 'B', 'B', 'B'],
     'timestamp': [0, 2, 4, 1, 3, 5]})
df['start'] = df.groupby('device')['timestamp'].nth(0)

It gave:

df
   device timestamp start
0  A          0       0
1  A          2       NaN
2  A          4       NaN
3  B          1       1
4  B          3       NaN
5  B          5       NaN

With pandas 0.18.1, this is what I get:

df
   device timestamp start
0  A          0       NaN
1  A          2       NaN
2  A          4       NaN
3  B          1       NaN
4  B          3       NaN
5  B          5       NaN

In pandas 0.17.1, df.groupby('device')['timestamp'].nth(0) returns the index and timestamp column:

0  0
3  1

But in pandas 0.18.1, it returns the device and timestamp column. The index is "lost":

device
A   0
B   1
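The all-NaN column is consistent with how pandas aligns on index labels when a Series is assigned as a column. A minimal sketch of that effect, assuming the two nth(0) results are rebuilt by hand (so it does not depend on the installed pandas version); the names keyed_by_row and keyed_by_device are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'device': ['A', 'A', 'A', 'B', 'B', 'B'],
     'timestamp': [0, 2, 4, 1, 3, 5]})

# Hand-built stand-ins for the two results shown above (assumption: these
# mimic the 0.17.1 and 0.18.1 outputs, they are not produced by nth itself).
keyed_by_row = pd.Series([0, 1], index=[0, 3], name='timestamp')
keyed_by_device = pd.Series(
    [0, 1], index=pd.Index(['A', 'B'], name='device'), name='timestamp')

# Column assignment aligns the Series on the DataFrame's index labels.
aligned = df.assign(start=keyed_by_row)       # labels 0 and 3 exist in df
unaligned = df.assign(start=keyed_by_device)  # labels 'A' and 'B' do not

print(aligned)
print(unaligned)
```

Only the Series keyed by the original row labels fills rows 0 and 3; the one keyed by device matches no row label, so every value comes out as NaN.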

Is this the new normal behavior?

How can I achieve the same thing as what I was doing in pandas 0.17.1?

My DataFrame is sorted by device and timestamp and I want to get the first (and last) timestamp for each device.

Thanks

Benjamin

Joris Van den Bossche

Jul 5, 2016, 9:25:46 AM

to PyData

Hi Benjamin,

This was changed in pandas 0.18.1 (see http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#groupby-nth-changes, https://github.com/pydata/pandas/pull/11039).

For now, to get the old result back, you can use head:

In [3]: df.groupby('device')['timestamp'].head(1)
Out[3]:
0    0
3    1
Name: timestamp, dtype: int64

In [4]: pd.__version__
Out[4]: u'0.18.1'

But of course this is not a solution if you want something other than the first element (nth(0)).
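For positions other than the first, one possible workaround (not something proposed in this thread) is to number the rows inside each group with cumcount and filter on that number, which keeps the original row index. A rough sketch, where n is a hypothetical 0-based position:

```python
import pandas as pd

df = pd.DataFrame(
    {'device': ['A', 'A', 'A', 'B', 'B', 'B'],
     'timestamp': [0, 2, 4, 1, 3, 5]})

n = 1  # hypothetical position within each group (0-based)

# cumcount numbers the rows of each group 0, 1, 2, ... in their current order.
position = df.groupby('device').cumcount()
nth_rows = df.loc[position == n, 'timestamp']

# Counting from the end of each group instead gives the last row per device.
from_end = df.groupby('device').cumcount(ascending=False)
last_rows = df.loc[from_end == 0, 'timestamp']

print(nth_rows)
print(last_rows)
```

Both results are indexed by the original row labels, so they can be assigned back to df in the same way as the 0.17.1 result.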

Given that there is no easy way to get the old result, and that it has been like that for a long time, maybe we should reconsider this.

Regards,

Joris

Benjamin Bertrand

Jul 12, 2016, 5:38:14 PM

to PyData

Thanks for the answer. Sorry I only noticed it today.

In the meantime, I found this Stack Overflow question with another solution using transform('idxmim'):

start_index = df.groupby('device')['timestamp'].transform('idxmim')
df['start'] = df.loc[start_index, 'timestamp'].values

df
Out[7]:
  device  timestamp  start
0      A          0      0
1      A          2      0
2      A          4      0
3      B          1      1
4      B          3      1
5      B          5      1

This gives me what I want. And I can use "idxmax" to get the end value.
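Since the DataFrame is sorted by device and timestamp, another way to sketch the same start and end columns is to broadcast the first and last value of each group with transform. This is an alternative to the snippet above, not the approach from the Stack Overflow answer, and the 'end' column name is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'device': ['A', 'A', 'A', 'B', 'B', 'B'],
     'timestamp': [0, 2, 4, 1, 3, 5]})

grouped = df.groupby('device')['timestamp']

# transform broadcasts each group's first/last value to every row of the group.
# This relies on the rows already being sorted by timestamp within each device.
df['start'] = grouped.transform('first')
df['end'] = grouped.transform('last')

print(df)
```

If the rows were not sorted, 'min' and 'max' could be used in place of 'first' and 'last'.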

Benjamin

Joris Van den Bossche

Jul 15, 2016, 5:50:35 PM

to PyData

I noticed that you can also have the original behaviour of 0.17 by passing as_index=False:

In [13]: df.groupby('device', as_index=False)['timestamp'].nth(0)
Out[13]:
0    0
3    1
Name: timestamp, dtype: int64

Are you sure the transform('idxmin') works? I get an error when I try that (on both 0.17.1 and 0.18.1): AttributeError: 'SeriesGroupBy' object has no attribute 'idxmim'

Regards,

Joris

Joris Van den Bossche

Jul 15, 2016, 5:52:59 PM

to PyData

Whoops, there was a typo in your code, which is why it failed: idxmim of course does not work, but idxmin does :-)
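For reference, a sketch of the snippet with the spelling corrected, extended with idxmax for the end value as mentioned earlier in the thread. The 'end' column and the by_device name are added here for illustration only:

```python
import pandas as pd

df = pd.DataFrame(
    {'device': ['A', 'A', 'A', 'B', 'B', 'B'],
     'timestamp': [0, 2, 4, 1, 3, 5]})

by_device = df.groupby('device')['timestamp']

# For every row, the index label of the smallest / largest timestamp
# in that row's group.
start_index = by_device.transform('idxmin')
end_index = by_device.transform('idxmax')

# Look the timestamps up by label; .values drops the index so the
# assignment is positional rather than aligned.
df['start'] = df.loc[start_index, 'timestamp'].values
df['end'] = df.loc[end_index, 'timestamp'].values

print(df)
```

Note that idxmin and idxmax pick the smallest and largest timestamp per device, which matches the first and last row only because the data is sorted by timestamp.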

Benjamin Bertrand

Jul 16, 2016, 11:16:20 AM

to PyData

Sorry for the typo :-)

I thought I did a copy/paste from my Jupyter Notebook.

But I managed to write idxmim twice...