"Updating" a pandas dataframe with another

1,347 views
Skip to first unread message

Etienne Chamayou

unread,
Apr 29, 2017, 4:18:44 AM4/29/17
to PyData
I regularly need to "update" a dataframe (A) with another (B) with the following constraints/goals:
- A and B have the same columns
- I can rely on uniquess in one column X (or a set of columns) within A and within B (imagine my dataframes contain orders which are uniquely identified by an order id stored in X)
- B typically include rows whose X value is already in A (the order was already recorded in A but B brings some fresh info e.g. the order amount was changed as one product could not be delivered) in which case I want to overwrite A
- B rows whose X values are not found in A should basically be appended

Currently the way I do this is (with uniqueness in indexes):

import pandas as pd

df0 = pd.DataFrame([1, 2, 3], index = ['A', 'B', 'C'])
df1 = pd.DataFrame([4, 5], index = ['A', 'D'])

dfa = pd.concat([df0,df1])
dfa = dfa[~dfa.index.duplicated(keep='last')]

# Returns:
#   0
#B  2
#C  3
#A  4
#D  5

Does it seem like a reasonable way to achieve this? Could that be a one liner with df0.update(df1, join = 'outer') if 'outer' was available or would that be something else?



Lakshman Prasad

unread,
Apr 29, 2017, 10:00:09 AM4/29/17
to pyd...@googlegroups.com

You can do a merge of 2 DFs to achieve the same with less steps and CPU cycles.

For example, here is one I used recently:

      wares_all_merged = used_ware.merge(ware_df,on='title',how='inner')

You can lookup the whole notebook, over here: https://github.com/becomingGuru/usesthis_analysis_v2/blob/master/uses_this_mar_2017.ipynb if you'd like.


--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Etienne Chamayou

unread,
May 8, 2017, 6:57:03 AM5/8/17
to PyData
I don't think that this corresponds to what I'm talking about. Nevertheless, it is true that I can 
- perform an outer join between A and B
- then overwrite A columns with corresponding B columns for each row in which B columns are not empty
(- finally drop B columns)
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages