I regularly need to "update" a dataframe (A) with another (B) with the following constraints/goals:
- A and B have the same columns
- I can rely on uniquess in one column X (or a set of columns) within A and within B (imagine my dataframes contain orders which are uniquely identified by an order id stored in X)
- B typically include rows whose X value is already in A (the order was already recorded in A but B brings some fresh info e.g. the order amount was changed as one product could not be delivered) in which case I want to overwrite A
- B rows whose X values are not found in A should basically be appended
Currently the way I do this is (with uniqueness in indexes):
import pandas as pd
df0 = pd.DataFrame([1, 2, 3], index = ['A', 'B', 'C'])
df1 = pd.DataFrame([4, 5], index = ['A', 'D'])
dfa = pd.concat([df0,df1])
dfa = dfa[~dfa.index.duplicated(keep='last')]
# Returns:
# 0
#B 2
#C 3
#A 4
#D 5
Does it seem like a reasonable way to achieve this? Could that be a one liner with df0.update(df1, join = 'outer') if 'outer' was available or would that be something else?