columns overlap but no suffix specified

4,801 views
Skip to first unread message

Soren

unread,
Sep 27, 2015, 4:02:24 PM9/27/15
to PyData
Hi,

when I want to join two tables on a specific column pandas by default asks for a suffices. 
However, the column that is used for the join must have the same values. Including both columns in the output therefore is redundant. 
You would always join  the table and throw away one of the columns, wouldn't you?

Can I change this behavior somehow, so that the column appears without suffix and just one time instead of two?
That would be much more convenient in my opinion. 
regards
Sören

Paul Hobson

unread,
Sep 27, 2015, 10:34:43 PM9/27/15
to pyd...@googlegroups.com
The columns on which the join is performed should not be duplicated. Could you post an exampled demonstrating this behavior?

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Soren

unread,
Sep 28, 2015, 1:44:39 PM9/28/15
to PyData
import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'key': ['K0', 'K1', 'K2', 'K3']})
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                   'key': ['K1', 'K0', 'K2', 'K3']})
print left.join(right,on='key',rsuffix='_r',how='right')

print left.join(right,on='key')
ValueError: columns overlap but no suffix specified: Index([u'key'], dtype='object')

print left.join(right,on='key',rsuffix='_r',how='left')
A key    C key_r
0  A0  K0  NaN   NaN
1  A1  K1  NaN   NaN
2  A2  K2  NaN   NaN
3  A3  K3  NaN   NaN
Message has been deleted

Paul Hobson

unread,
Sep 28, 2015, 4:42:19 PM9/28/15
to pyd...@googlegroups.com
So there's the thing:

df.join -- use this when joining on an (subset of an) index
df.merge -- use this when joining on a subset of the columns

So using merge on your data as-is:
print(left.merge(right, on=['key']))
    A key   C
0  A0  K0  C1
1  A1  K1  C0
2  A2  K2  C2
3  A3  K3  C3

ll = left.set_index('key')
rr = right.set_index('key')
print(ll.join(rr))
      A   C
key        
K0   A0  C1
K1   A1  C0
K2   A2  C2
K3   A3  C3



On Sun, Sep 27, 2015 at 1:30 PM, Soren <soer...@gmail.com> wrote:

Where is the code that I posted??

Soren

unread,
Sep 28, 2015, 10:47:49 PM9/28/15
to PyData
Would it be difficult to change the behavior of join in a way that it does fill up the NaN's? 
I mean, is there a meaningful reason for why join is behaving like that?

E.g. performance?

Paul Hobson

unread,
Sep 29, 2015, 11:18:59 AM9/29/15
to pyd...@googlegroups.com
I don't know the answers to your question other than: "join" is for indices, "merge" is for columns.


--
Reply all
Reply to author
Forward
0 new messages