Adding columns to DataFrames in pandas

yarden

Jan 18, 2012, 5:11:41 PM
to pystatsmodels
Hi,

Suppose df1 is a DataFrame indexed by column X, and df2 is another
DataFrame indexed by the same range of X values. I'd like to add df2's
column Y to df1, using the X values to match entries between df1 and
df2. If df2 has X values that are not in df1, those entries should not
be added to df1.

Thanks for your help.

yarden

Jan 18, 2012, 7:56:49 PM
to pystatsmodels

Adam Klein

Jan 19, 2012, 1:11:17 PM
to pystat...@googlegroups.com

yarden

Jan 19, 2012, 1:24:22 PM
to pystat...@googlegroups.com
Thanks for your reply. I thought join was the right method but I was unable to get it to work. Suppose we have:

>>> df1
   col1    id
0 -0.3149  a
1  0.3524  b
2 -0.6351  c

>>> df2
   col2    id
0  0.1234  a
1  0.4563  d

# Attempt to join using "id" key
>>> df1.join(df2, on="id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pandas/core/frame.py", line 2605, in join
    return self._join_on(other, on, how, lsuffix, rsuffix)
  File "/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pandas/core/frame.py", line 2628, in _join_on
    lsuffix=lsuffix, rsuffix=rsuffix)
  File "/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pandas/core/internals.py", line 789, in join_on
    this, other = self._maybe_rename_join(other, lsuffix, rsuffix)
  File "/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pandas/core/internals.py", line 757, in _maybe_rename_join
    raise Exception('columns overlap: %s' % intersection)
Exception: columns overlap: [id]

Any idea what is wrong with this?  Thanks, --Yarden

Adam Klein

Jan 19, 2012, 2:11:44 PM
to pystat...@googlegroups.com
You cannot join a column to a column, but you can join a column to an index. So at least one of the DataFrames needs set_index("id") called on it. For instance:

In [70]: df1 = df1.set_index("id")

In [72]: df1
Out[72]: 
    col1  
id        
a  -0.3149
b   0.3524
c  -0.6351

In [73]: df2
Out[73]: 
   col2   id
0  0.1234 a 
1  0.4563 d 

In [74]: df2.join(df1, on="id")
Out[74]: 
   col2   id  col1  
0  0.1234 a  -0.3149
1  0.4563 d   NaN   
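
The join above keeps every row of df2, including "d". If the goal is the one in the original question (keep df1's rows and pull in col2 only where the id matches), the roles can be swapped. A minimal sketch, assuming a reasonably recent pandas; I have not checked it against the version used in this thread:

```python
import pandas as pd

# Same frames as in the thread; "id" is an ordinary column in both.
df1 = pd.DataFrame({"col1": [-0.3149, 0.3524, -0.6351], "id": ["a", "b", "c"]})
df2 = pd.DataFrame({"col2": [0.1234, 0.4563], "id": ["a", "d"]})

# Index df2 by "id", then match df1's "id" column against that index.
# join defaults to how="left", so every row of df1 is kept and the
# "d" row of df2, which has no match in df1, is dropped.
result = df1.join(df2.set_index("id"), on="id")
print(result)
```

Rows of df1 with no match in df2 ("b" and "c") get NaN in col2.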



Wes McKinney

Jan 19, 2012, 2:15:02 PM
to pystat...@googlegroups.com

Note that I've completely overhauled the join/merge functionality for 0.7.0
(to be released imminently, as soon as I can go through my e-mail and make
sure there are no lingering issues and all the tests pass on the OS
combinations):

http://pandas.sourceforge.net/merging.html#database-style-dataframe-joining-merging

You'll probably want to get upgraded to that ASAP as it will make your
life much easier ;)

- W
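
For anyone reading this later: with the database-style merge, the column-to-column case from earlier in the thread can be written directly, with no set_index step. A rough sketch, assuming a pandas version that provides pd.merge with the on and how arguments; the frames are the ones from the thread:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [-0.3149, 0.3524, -0.6351], "id": ["a", "b", "c"]})
df2 = pd.DataFrame({"col2": [0.1234, 0.4563], "id": ["a", "d"]})

# Left merge on the shared "id" column: all rows of df1 are kept,
# col2 is filled in where an id matches, and df2's "d" row is dropped.
merged = pd.merge(df1, df2, on="id", how="left")

# The default is an inner merge, which keeps only ids present in both.
inner = pd.merge(df1, df2, on="id")

print(merged)
print(inner)
```

The left merge matches the original request (nothing from df2 is added unless its id is already in df1). The inner merge is the stricter variant that also drops df1 rows with no match.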
