in-place concatenation?


Seth P

Sep 18, 2014, 9:41:58 PM
to pyd...@googlegroups.com
I'm doing something like this to add a new DataFrame to an existing one:
    df1 = pd.concat((df1, df2))
The problem is, I believe, that this temporarily doubles memory usage, which can be problematic when df1 is very large.
Is there a way to do in-place concatenation? (In case it matters/helps, df1 and df2 have the same columns and non-overlapping index.) Alas DataFrame.merge() doesn't seem to have an in_place option.

Of course I also want to do in-place concatenation of Panel objects...

Jeff Reback

Sep 18, 2014, 10:08:05 PM
to pyd...@googlegroups.com
The best way is to append both to an HDFStore (in chunks, if they are really large),

then read the store back in.

This works for Panels as well.

You cannot easily do an in-place concat.
In theory you could resize a numpy array and copy the data into it,
but I don't think that saves anything.
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephan Hoyer

Sep 18, 2014, 10:39:40 PM
to pyd...@googlegroups.com
Unfortunately, concatenating a dataframe in place is not possible, nor is it possible to "resize" a numpy array, due to the numpy memory model -- an array is a contiguous block of memory.
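
To illustrate the point about contiguous memory, here is a minimal sketch (the array names are made up for illustration) showing that `np.concatenate` returns a freshly allocated array rather than growing either input. While the copy is being made, both the inputs and the result are alive, which is where the temporary extra memory comes from.

```python
import numpy as np

# Two small arrays standing in for the blocks behind df1 and df2.
a = np.arange(4)
b = np.arange(4, 6)

# Concatenation allocates a new contiguous buffer and copies both inputs.
joined = np.concatenate([a, b])

# The result does not reuse the memory of either input.
shares_with_a = np.shares_memory(a, joined)
shares_with_b = np.shares_memory(b, joined)
print(shares_with_a, shares_with_b)
```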

I agree with Jeff that the best solution (if you have memory concerns) is probably to append to an HDF5 table, which does not need to copy to append.
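
A rough sketch of the HDF5 approach, assuming the optional PyTables package (`tables`) is installed. The file name, the key `"df"`, and the toy frames are placeholders, not anything from the original posts.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frames with the same columns and a non-overlapping index.
df1 = pd.DataFrame({"a": np.arange(3), "b": np.arange(3) * 2.0}, index=[0, 1, 2])
df2 = pd.DataFrame({"a": np.arange(3, 5), "b": np.arange(3, 5) * 2.0}, index=[3, 4])

# Placeholder location for the store.
path = os.path.join(tempfile.mkdtemp(), "frames.h5")

with pd.HDFStore(path, mode="w") as store:
    # append() writes in the appendable "table" format, so the second
    # call adds rows on disk instead of rebuilding the first frame.
    store.append("df", df1)
    store.append("df", df2)
    # Read the combined table back as a single DataFrame.
    combined = store.select("df")

print(combined)
```

If even the combined result is too large to hold comfortably, `store.select` also accepts a `chunksize` argument and then yields the table piece by piece.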

You could do the same thing with a database, which can also append rows efficiently.
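
As one possible version of the database route, the sketch below uses an in-memory SQLite database from the standard library. The table name `frames` and index label `idx` are arbitrary choices for the example; a real setup would presumably use an on-disk database.

```python
import sqlite3

import pandas as pd

# Toy frames with the same columns and a non-overlapping index.
df1 = pd.DataFrame({"a": [1, 2], "b": [0.1, 0.2]}, index=[0, 1])
df2 = pd.DataFrame({"a": [3], "b": [0.3]}, index=[2])

conn = sqlite3.connect(":memory:")

# if_exists="append" adds rows to the table (creating it on first use)
# without reading back what is already stored.
df1.to_sql("frames", conn, if_exists="append", index_label="idx")
df2.to_sql("frames", conn, if_exists="append", index_label="idx")

# Load the combined rows as one DataFrame, restoring the index.
combined = pd.read_sql("SELECT * FROM frames ORDER BY idx", conn, index_col="idx")
conn.close()

print(combined)
```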

Seth P

Sep 23, 2014, 10:08:27 PM
to pyd...@googlegroups.com
FWIW, the documentation for DataFrame.update(), http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.update.html#pandas.DataFrame.update, says that it supports "join : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’", but in practice it complains if join is anything other than 'left'.

Also, there seems to be an inconsistency in naming the {‘left’, ‘right’, ‘outer’, ‘inner’} parameter: DataFrame.update() uses "join", while DataFrame.join() and DataFrame.merge() use "how".
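
Both observations can be checked with a short script along these lines. It is only a sketch: the frames are made up, and the exact behavior may differ between pandas versions.

```python
import inspect

import pandas as pd

# Look at the parameter names each method actually accepts.
update_params = inspect.signature(pd.DataFrame.update).parameters
join_params = inspect.signature(pd.DataFrame.join).parameters
merge_params = inspect.signature(pd.DataFrame.merge).parameters

print("join" in update_params, "how" in join_params, "how" in merge_params)

# Try a non-default join in update() and record whether it is rejected.
df = pd.DataFrame({"a": [1, 2]})
other = pd.DataFrame({"a": [3, 4]})
try:
    df.update(other, join="outer")
    update_error = None
except NotImplementedError as exc:
    update_error = str(exc)

print(update_error)
```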