reading XDR representations of BitsKinds


Douglas Bates

Jan 24, 2013, 2:55:55 PM
to juli...@googlegroups.com
I am progressing reasonably well on reading the ASCII version of the R save format (see https://github.com/dmbates/RDA.jl) and now, in a moment of madness, plan to try reading the binary format.  It mostly comes down to Int32 and Float64 values stored in XDR format (big-endian) within a file that is usually compressed with gzip or xz.
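
To make the format concrete, here is a minimal sketch of decoding the two primitives mentioned above. It uses Python's standard `struct` module rather than Julia, purely for illustration; the helper names `decode_xdr_int` and `decode_xdr_double` are hypothetical and not part of any XDR library.

```python
import struct

def decode_xdr_int(buf: bytes) -> int:
    """Decode a 4-byte big-endian signed integer (an XDR int, i.e. an Int32)."""
    return struct.unpack(">i", buf)[0]

def decode_xdr_double(buf: bytes) -> float:
    """Decode an 8-byte big-endian IEEE 754 double (an XDR double, i.e. a Float64)."""
    return struct.unpack(">d", buf)[0]

# Round-trip check: encode in big-endian order, then decode again.
encoded_int = struct.pack(">i", -2)
encoded_double = struct.pack(">d", 3.5)
print(decode_xdr_int(encoded_int))        # -2
print(decode_xdr_double(encoded_double))  # 3.5
```

Both primitives are fixed-width, so decoding them needs no state beyond the bytes themselves.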

Assuming that I have a gzip'd file, I can use the GZip module (extras/gzip.jl) to open it and pull bytes from it.  What I would like to do is to create an XDR structure (defined in /usr/include/rpc/xdr.h) with that stream and use the xdr_<whatever> functions to read the contents.  Is this going about things in a reasonable way?  If so, some hints on how I would accomplish this would be welcome.  (For example, in what shared object are the xdr library routines found?)  If not, what alternatives would be good?

By the way, reading can be done in a single pass so I don't ever need to seek in the file.
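
Because no seeking is needed, a forward-only reader over the decompressed stream is enough. The following is a rough sketch in Python (standard library only, not Julia). The helpers `read_exact`, `read_int32` and `read_float64` are hypothetical names, and the payload layout (a count followed by that many doubles) is an assumption made for the demonstration, not the actual R save layout.

```python
import gzip
import io
import struct

def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, moving forward only; raise if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"expected {n} bytes, stream ended {remaining} short")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_int32(stream) -> int:
    """Read one big-endian Int32 from the stream."""
    return struct.unpack(">i", read_exact(stream, 4))[0]

def read_float64(stream) -> float:
    """Read one big-endian Float64 from the stream."""
    return struct.unpack(">d", read_exact(stream, 8))[0]

# Hypothetical payload (NOT the R save layout): a count, then that many doubles.
payload = struct.pack(">i", 3) + struct.pack(">3d", 1.0, 2.5, -4.0)
compressed = gzip.compress(payload)

# gzip.open accepts a file object as well as a path; nothing here ever seeks.
with gzip.open(io.BytesIO(compressed), "rb") as stream:
    count = read_int32(stream)
    values = [read_float64(stream) for _ in range(count)]

print(count, values)
```

The reader only ever calls `read`, so the same helpers would work on any decompressing stream that exposes that one method.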

Patrick O'Leary

Jan 24, 2013, 3:07:50 PM
to juli...@googlegroups.com
XDR isn't that bad, and might be worth straight reimplementing. But if not, the bsd-xdr implementation (https://code.google.com/p/bsd-xdr/) might be a good idea, since it supports Windows hosts and doesn't have the Sun licensing uncertainty (which may have been resolved, but I can't seem to find where).

Douglas Bates

Jan 24, 2013, 3:24:19 PM
to juli...@googlegroups.com
I may be able to get away with using the hton function on the results of read(<GZipStream>, Array(<typ>, <size>)) although it may not be as effective as the xdr_calls.  I'll see how that goes.
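
The read-then-swap idea can be sketched as follows, in Python with the standard `array` module standing in for a typed Julia array. The function name `read_big_endian_array` is hypothetical. On a little-endian host, the host-to-network and network-to-host conversions are the same byte swap, so a single `byteswap` covers both directions.

```python
import io
import struct
import sys
from array import array

def read_big_endian_array(stream, typecode: str, count: int) -> array:
    """Read `count` items of `typecode` in one call, then convert to host byte order.

    Typecode "d" is an 8-byte double. Typecode "i" is a C int, which is
    4 bytes on common platforms but is not guaranteed to be.
    """
    values = array(typecode)
    nbytes = count * values.itemsize
    raw = stream.read(nbytes)
    if len(raw) != nbytes:
        raise EOFError("stream ended before the whole array was read")
    values.frombytes(raw)          # bytes are interpreted in host order here
    if sys.byteorder == "little":
        values.byteswap()          # big-endian (network) order to host order
    return values

# Demonstration on an in-memory stream holding four big-endian doubles.
source = io.BytesIO(struct.pack(">4d", 0.5, 1.0, -2.0, 1e10))
print(list(read_big_endian_array(source, "d", 4)))
```

Reading the whole block at once and swapping in place avoids one unpack call per element, which matters for columns with millions of rows.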

Douglas Bates

Jan 24, 2013, 6:08:29 PM
to juli...@googlegroups.com
On Thursday, January 24, 2013 2:24:19 PM UTC-6, Douglas Bates wrote:
I may be able to get away with using the hton function on the results of read(<GZipStream>, Array(<typ>, <size>)) although it may not be as effective as the xdr_calls.  I'll see how that goes.

That worked and I'm very pleased.  I am able to read a saved R dataframe with 2.6 million rows and 21 columns, most of which are factors (PooledDataArrays in the Julia DataFrames package), in less than a minute.

This is a big deal for those who use R because it provides access to the saved data sets from R packages (currently limited to the save format in the source package).  Nevertheless, this is just a first pass at it and I'm sure there is much optimization that could be done.

John Myles White

Jan 25, 2013, 2:37:35 PM
to juli...@googlegroups.com
This is really great news. Thanks so much for handling this, Doug!

 -- John
