Saving Julia dataframe to read in R using HDF5

Pavel

Jan 22, 2015, 2:09:25 PM
to julia...@googlegroups.com
While reading R datasets into Julia has already received plenty of attention, sometimes the results of computations done in Julia need to be readable in R. To accomplish that, I tried saving a DataFrames.jl (https://github.com/JuliaStats/DataFrames.jl) object to an HDF5 file. My code so far is in my StackOverflow question (I probably should have posted here instead):
http://stackoverflow.com/questions/28084403/saving-julia-dataframe-to-read-in-r-using-hdf5

The data frame can then be reassembled in R using the rhdf5 package (http://www.bioconductor.org/packages/release/bioc/html/rhdf5.html). It works in principle, but is there a more elegant way to accomplish this? Something that does not require splitting the data frame apart and reassembling it in R, losing some column types along the way (e.g. Boolean columns do not work)?
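For readers without the StackOverflow code at hand, the column-by-column approach might be sketched as follows. This is not Pavel's code: it assumes the current HDF5.jl and DataFrames.jl APIs, a data frame without `missing` values, and a hypothetical helper named `write_columns`. Because HDF5 has no native Boolean type, `Bool` columns are stored here as `UInt8` 0/1 values.

```julia
using DataFrames, HDF5

# Hypothetical helper: write every column of `df` as its own HDF5 dataset,
# named after the column, so R can read the columns back one by one.
# Assumes no `missing` values (HDF5.jl cannot write Union{Missing,T} vectors).
function write_columns(fname::AbstractString, df::DataFrame)
    h5open(fname, "w") do fid
        for name in names(df)               # column names as Strings
            col = df[!, name]
            if eltype(col) == Bool
                fid[name] = UInt8.(col)     # store Bool as 0/1 bytes
            else
                fid[name] = collect(col)    # plain Vector (Int, Float64, String, ...)
            end
        end
    end
end

df = DataFrame(id = [1, 2, 3], x = [0.5, 1.5, 2.5],
               label = ["a", "b", "c"], flag = [true, false, true])
write_columns("trydf.h5", df)
```

On the R side, each column would then be read with `rhdf5::h5read("trydf.h5", "flag")`, converted with `as.logical()` where needed, and combined with `data.frame()`, which is exactly the manual reassembly the question hopes to avoid.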

Tim Holy

Jan 22, 2015, 6:25:06 PM
to julia...@googlegroups.com
In your code, could you basically replace `h5open` with `jldopen`? That way
when you try reading the same file again with Julia, you'll have all the type
information.

JLD is basically "HDF5 with annotations that JLD knows how to interpret." If
you're reading the file from another language, you don't have to pay attention
to the annotations (unless you want to).

--Tim

Pavel

Jan 22, 2015, 7:48:13 PM
to julia...@googlegroups.com
Thanks for responding, Tim. I tried `JLD.jldopen` instead. Now all the columns are saved, including the Boolean one without conversion to integer, as expected. However, the R session consistently crashes when I try even to look at the file structure with `rhdf5::h5ls("trydf.h5")`. I'm not sure whether this is an rhdf5 R-package issue, but something goes wrong when the JLD annotations are present.

On a more conceptual level, are R and Julia DataFrame structures too different to manage read/write without reassembling from separate columns?

Tim Holy

Jan 22, 2015, 9:31:01 PM
to julia...@googlegroups.com
On Thursday, January 22, 2015 04:48:13 PM Pavel wrote:
> Thanks Tim for responding. I tried with `JLD.jldopen` instead. Now all the
> columns are saved including boolean without conversion to integer, as
> expected. However R session consistently crashes when trying to even look
> at the file structure with `rhdf5::h5ls("trydf.h5")`. Not sure if this is
> rhdf5 R-package issue or not, but something goes wrong when JLD annotations
> are present.

From the Bioconductor website it appears that rhdf5 aims to be a generic HDF5
interface. So if it's crashing on a *.jld file (which is an HDF5 file), that
indicates some limitation of rhdf5.

> On a more conceptual level, are R and Julia DataFrame structures too
> different to manage read/write without reassembling from separate columns?

Can't answer that, because I don't know R. Maybe someone else can. If you try
running the jld_dataframe.jl test in HDF5.jl and inspect the results with
h5dump, you'll see that each column is already split out for you, if you know
where to look. (Start with the "df2" data set and follow the references.)

Best,
--Tim

Tom Short

Jan 22, 2015, 9:40:35 PM
to julia...@googlegroups.com
I don't know if it can do it yet, but the RCall package might be able to save data back to an RData file. It's a young package.

Also, you could use CSV files.
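The CSV route needs very little code with the CSV.jl package (an assumption: CSV.jl is not mentioned in this thread). A minimal sketch, using a small example data frame:

```julia
using CSV, DataFrames

df = DataFrame(id = [1, 2, 3], x = [0.5, 1.5, 2.5],
               label = ["a", "b", "c"], flag = [true, false, true])

# Write a header row followed by one row per record.
# Bool values are written as the text `true` / `false`.
CSV.write("trydf.csv", df)
```

In R, `read.csv("trydf.csv")` should then parse `true`/`false` back into a logical column without any manual conversion. The trade-off is that CSV is text, so files are larger and slower to parse than a binary format for big data frames.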

Pavel

Jan 27, 2015, 9:43:23 PM
to julia...@googlegroups.com
RCall.jl is a real breakthrough for combining Julia and R work; thanks for the pointer! Here is my example code:
https://gist.github.com/multidis/7ac6f4779e09c986be39

The main advantage is that column types are converted properly (in particular Bool), and a native R object is saved in an RData file. I have not tested performance with large objects yet, so any advice on improving the code is appreciated.
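The gist above is the reference for the 2015 version of RCall.jl, whose API has since changed. As a hedged sketch against today's RCall.jl (its `@rput` macro and `R"..."` string macro), the same Julia-to-RData round trip might look like this; the file name `trydf.RData` is only an example.

```julia
using DataFrames, RCall

df = DataFrame(id = [1, 2, 3], x = [0.5, 1.5, 2.5],
               label = ["a", "b", "c"], flag = [true, false, true])

# Copy the Julia data frame into R's global environment under the name `df`;
# RCall converts it to an R data.frame (Bool columns become logical).
@rput df

# Let R itself write the native object to an RData file.
R"save(df, file = 'trydf.RData')"
```

A later R session can then restore the data frame with `load("trydf.RData")`, with no column-by-column reassembly.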