Hello all,
My query is related to the following question on StackOverflow:
http://stackoverflow.com/questions/12278347/how-can-i-efficiently-save-a-python-pandas-dataframe-in-hdf5-and-open-it-as-a-da
and is partially addressed by the trtools package mentioned in a notebook by Dale Jung:
https://groups.google.com/forum/?fromgroups=#!searchin/pydata/hdf5/pydata/bsgm5V4xP04/hCxi04Iq9z8J
I have data prepared in R using Bioconductor, which I can save quite nicely in an HDF5 file. The data.frame is stored as a compound dataset. When I read the file using the HDFStore class, the dataset is not recognised as a single entity (a DataFrame); instead, each column is available to me individually. I would like to work with the data and then store results again in a way that R can read them as a data.frame. So my questions:
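As a stopgap, a compound dataset like the one R writes can be read into a NumPy structured array (e.g. with h5py or PyTables) and wrapped in a DataFrame in one step. A minimal sketch, using an in-memory structured array to stand in for what such a read returns (the field names here are invented, not from your file):

```python
import numpy as np
import pandas as pd

# A compound HDF5 dataset read with h5py comes back as a NumPy
# structured array; mimic one here with made-up fields.
arr = np.array([(1, 0.5), (2, 1.5)],
               dtype=[("idnumber", "i4"), ("pressure", "f8")])

# pd.DataFrame maps each compound field to a column.
df = pd.DataFrame(arr)
```

This sidesteps HDFStore entirely for reading, at the cost of losing pandas' own metadata handling.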
1. If I can get R to write the data.frame as a table inside the hdf5 file, can pandas read that as a DataFrame?
2. Is there a way to have pandas write a DataFrame that can be read by R as a data.frame?
3. Is there a general interest in such round-trip data exchange, i.e., would hacking a more laborious solution maybe based on Dale Jung's trtools be worthwhile to the community?
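Regarding question 2, a sketch of the pandas side of the round trip, assuming PyTables is installed (the key name and columns are made up): appending through HDFStore stores the frame as a PyTables Table node under a group, which R's rhdf5 should at least see as a compound dataset, though whether it maps cleanly onto a data.frame is exactly the open part of the question.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"energy": [1.2, 3.4], "name": ["a", "b"]})

path = os.path.join(tempfile.mkdtemp(), "roundtrip.h5")

# append with the table format writes a "table" node under the
# "detector/readout" group inside the HDF5 file.
with pd.HDFStore(path, mode="w") as store:
    store.append("detector/readout", df)

back = pd.read_hdf(path, "detector/readout")
```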
Thanks in advance for any replies,
Moritz
--
In [969]: store.root.df._v_attrs.pandas_type
Out[969]: 'frame_table'

where df is a group node. Now, with the generic reader:
r_store.root.detector.readout._v_attrs.pandas_type
'frame_table'

where readout is a table node.
I tried

r_store.append("detector/readout", df)

but that throws an AttributeError because the table object lacks the necessary attribute:
AttributeError: 'Table' object has no attribute '_v_filters'
If I instead append to the group:

r_store.append("detector", df)

it creates a new table "table" as per usual.
data_columns := [], index_cols := [(0, 'index')], levels := 1, nan_rep := 'nan', non_index_axes := [(1, ['ADCcount', 'TDCcount', 'energy', 'grid_i', 'grid_j', 'idnumber', 'name', 'pressure'])], pandas_type := 'frame_table', pandas_version := '0.10.1', table_type := 'appendable_frame', values_cols := ['values_block_0', 'values_block_1', 'values_block_2']]

which you attach to the table node when reading a generic file. Why does that information have to be on the group node that is the parent of the table when pandas writes the data itself?