columns overlap but no suffix specified

Soren

unread,

Sep 27, 2015, 4:02:24 PM9/27/15

to PyData

Hi,

when I want to join two tables on a specific column pandas by default asks for a suffices.

However, the column that is used for the join must have the same values. Including both columns in the output therefore is redundant.

You would always join the table and throw away one of the columns, wouldn't you?

Can I change this behavior somehow, so that the column appears without suffix and just one time instead of two?

That would be much more convenient in my opinion.

regards

Sören

Paul Hobson

unread,

Sep 27, 2015, 10:34:43 PM9/27/15

to pyd...@googlegroups.com

The columns on which the join is performed should not be duplicated. Could you post an exampled demonstrating this behavior?

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Soren

unread,

Sep 28, 2015, 1:44:39 PM9/28/15

to PyData

import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],

'key': ['K0', 'K1', 'K2', 'K3']})

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],

'key': ['K1', 'K0', 'K2', 'K3']})

print left.join(right,on='key',rsuffix='_r',how='right')

print left.join(right,on='key')

ValueError: columns overlap but no suffix specified: Index([u'key'], dtype='object')

print left.join(right,on='key',rsuffix='_r',how='left')

A key C key_r

0 A0 K0 NaN NaN

1 A1 K1 NaN NaN

2 A2 K2 NaN NaN

3 A3 K3 NaN NaN

Message has been deleted

Paul Hobson

unread,

Sep 28, 2015, 4:42:19 PM9/28/15

to pyd...@googlegroups.com

So there's the thing:

df.join -- use this when joining on an (subset of an) index

df.merge -- use this when joining on a subset of the columns

So using merge on your data as-is:

print(left.merge(right, on=['key']))

    A key   C
0  A0  K0  C1
1  A1  K1  C0
2  A2  K2  C2
3  A3  K3  C3

ll = left.set_index('key')

rr = right.set_index('key')

print(ll.join(rr))

      A   C
key        
K0   A0  C1
K1   A1  C0
K2   A2  C2
K3   A3  C3

On Sun, Sep 27, 2015 at 1:30 PM, Soren <soer...@gmail.com> wrote:

Where is the code that I posted??

Soren

unread,

Sep 28, 2015, 10:47:49 PM9/28/15

to PyData

Would it be difficult to change the behavior of join in a way that it does fill up the NaN's?

I mean, is there a meaningful reason for why join is behaving like that?

E.g. performance?

Paul Hobson

unread,

Sep 29, 2015, 11:18:59 AM9/29/15

to pyd...@googlegroups.com

I don't know the answers to your question other than: "join" is for indices, "merge" is for columns.

--

Reply all

Reply to author

Forward