Hi,
This doesn't exactly answer your question, but your attempt at working without having all of the data in memory, with different portions of pandas dataframes having slightly different columns, resembles some of the problems I had up until recently.
Saving tables in HDF format doesn't allow for differing structures (column names, counts, types, etc.) as long as they're saved into the same node. Once the structure is created, it can't be easily modified.
You might have already figured out that you can work around that by saving dataframes to different nodes or HDF files.
If you just save the dataframes to separate nodes or files and operate on each dataframe individually, you're pretty much OK.
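To make the same-node limitation and the separate-node workaround concrete, here is a rough sketch. The frames, column names, node names and file name are all made up for illustration, and it assumes PyTables is installed. In the pandas versions I've used, the mismatched append fails with a ValueError, but treat that as an assumption too.

```python
import os
import tempfile

import pandas as pd

# Two hypothetical frames whose columns differ slightly.
df_a = pd.DataFrame({"x": [1, 2], "y": [0.1, 0.2]})
df_b = pd.DataFrame({"x": [3, 4], "z": [0.3, 0.4]})

# Hypothetical file name inside a throwaway temp directory.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path, mode="w") as store:
    # The first append fixes the table structure of the "shared" node.
    store.append("shared", df_a)
    try:
        # Different columns on the same node: expected to be rejected.
        store.append("shared", df_b)
        same_node_ok = True
    except ValueError:
        same_node_ok = False

    # Workaround: give each frame its own node.
    store.put("frames/a", df_a, format="table")
    store.put("frames/b", df_b, format="table")
    node_keys = sorted(store.keys())

# Each node can later be read back and handled on its own.
with pd.HDFStore(path, mode="r") as store:
    frame_b = store["frames/b"]
```

Writing to separate files instead of separate nodes works the same way, just with a different path per frame.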
However, if you use dask (a library based on pandas that enables out-of-core computations by working on multiple dataframe partitions), it can do most of the heavy lifting of working with multiple dataframes separately. Since version 0.10.0 (the latest), it also seamlessly allows saving every internal partition to a different HDF node (full disclosure: that's an ability I contributed).
Dask does some things well and other things not so much, so YMMV.
Hope that helps,
Nir