"Updating" a pandas dataframe with another

Etienne Chamayou

unread,

Apr 29, 2017, 4:18:44 AM4/29/17

to PyData

I regularly need to "update" a dataframe (A) with another (B) with the following constraints/goals:

- A and B have the same columns

- I can rely on uniquess in one column X (or a set of columns) within A and within B (imagine my dataframes contain orders which are uniquely identified by an order id stored in X)

- B typically include rows whose X value is already in A (the order was already recorded in A but B brings some fresh info e.g. the order amount was changed as one product could not be delivered) in which case I want to overwrite A

- B rows whose X values are not found in A should basically be appended

Currently the way I do this is (with uniqueness in indexes):

import pandas as pd

df0 = pd.DataFrame([1, 2, 3], index = ['A', 'B', 'C'])

df1 = pd.DataFrame([4, 5], index = ['A', 'D'])

dfa = pd.concat([df0,df1])

dfa = dfa[~dfa.index.duplicated(keep='last')]

# Returns:

# 0

#B 2

#C 3

#A 4

#D 5

Does it seem like a reasonable way to achieve this? Could that be a one liner with df0.update(df1, join = 'outer') if 'outer' was available or would that be something else?

Lakshman Prasad

unread,

Apr 29, 2017, 10:00:09 AM4/29/17

to pyd...@googlegroups.com

You can do a merge of 2 DFs to achieve the same with less steps and CPU cycles.

For example, here is one I used recently:

      wares_all_merged = used_ware.merge(ware_df,on='title',how='inner')

You can lookup the whole notebook, over here: https://github.com/becomingGuru/usesthis_analysis_v2/blob/master/uses_this_mar_2017.ipynb if you'd like.

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Etienne Chamayou

unread,

May 8, 2017, 6:57:03 AM5/8/17

to PyData

I don't think that this corresponds to what I'm talking about. Nevertheless, it is true that I can

- perform an outer join between A and B

- then overwrite A columns with corresponding B columns for each row in which B columns are not empty

(- finally drop B columns)

To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.

Reply all

Reply to author

Forward