rhdfs: Deserialization problem.


Andrés Gordo Navarro

Sep 8, 2014, 4:40:25 AM
to rha...@googlegroups.com

Hi everyone,

I am Andrés Gordo and I work with Hadoop. In our company project we developed an R algorithm that performs all the calculations and writes the results to HDFS using rhdfs, specifically the hdfs.write function.

By default, the hdfs.write function serializes the content it writes. Is it possible to write to HDFS directly without serializing the file? We can create the file on the local file system and upload it to HDFS with hdfs.put, but that is not writing directly.

We are also trying to deserialize the R file from Java (we use JRI and rJava to integrate Java and R), but all our attempts have been unsuccessful. Does anyone know anything about this?

We have searched for manuals, tutorials, and other information about rhdfs and the other RHadoop components, but we found nothing except the official RHadoop GitHub, and the documentation is not available there. Do you have any kind of guidance documents?

Thank you very much.
Andrés Gordo.

Antonio Piccolboni

Sep 17, 2014, 12:04:22 PM
to rha...@googlegroups.com
You can't avoid serialization, by definition. From Wikipedia:

In computer science, in the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and reconstructed later in the same or another computer environment.

What you need is a serialization format that's implemented on both sides. I can't help with that, but once you have that settled, read the manual for hdfs.write:

If the object is a raw vector, it is written directly to the con object, otherwise it is serialized and the bytes written to the con
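
A minimal sketch of the distinction in that manual excerpt, using only base R so it runs without a cluster. It compares the bytes produced by R's own serialization with the bytes of a raw vector built from the same text. The rhdfs calls in the trailing comments are an untested assumption about how the raw vector would then be written; the path and sample data are hypothetical.

```r
# Text we want to end up in HDFS as plain, readable bytes.
lines <- c("id,value", "1,3.14", "2,2.71")
text <- paste0(paste(lines, collapse = "\n"), "\n")

# What a non-raw object turns into: R's own serialization format.
serialized <- serialize(text, connection = NULL)

# What a raw vector contains: exactly the bytes of the text, nothing more.
plain <- charToRaw(text)

# The serialized form carries an R-specific header, so it is longer
# than the text it wraps; the raw vector round-trips to the same text.
length(serialized) > length(plain)
identical(rawToChar(plain), text)

# Assumed usage on a cluster (not run here; path is hypothetical):
# library(rhdfs)
# hdfs.init()
# con <- hdfs.file("/tmp/example.csv", "w")
# hdfs.write(plain, con)   # raw vector, so the bytes are written as-is
# hdfs.close(con)
```

Plain text written this way is itself a format that both R and Java can read, which is one way to settle the "implemented on both sides" requirement.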


Antonio

Thelonius Buddha

Jan 15, 2015, 9:36:29 AM
to rha...@googlegroups.com
I am having a similar issue (as here: http://stackoverflow.com/questions/27852357/string-character-in-rhdfs-output/27940947#27940947 ) with the serialization.  Does anyone have tips on reading text files generated by rhdfs outside of R?
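
For files that were already written in serialized form, one possible workaround is to convert them back to plain text in R before handing them to other tools. The sketch below is base R only and assumes the file holds a single character object from one write; the local file stands in for one copied out of HDFS (for example with hdfs.get), and all paths and contents are hypothetical.

```r
# Stand-in for a serialized file fetched from HDFS; created locally
# here so the sketch runs on its own.
serialized_path <- tempfile(fileext = ".bin")
writeBin(serialize("a,b\n1,2\n", connection = NULL), serialized_path)

# Read every byte of the file and rebuild the original R object.
n_bytes <- file.info(serialized_path)$size
bytes <- readBin(serialized_path, what = "raw", n = n_bytes)
obj <- unserialize(bytes)

# Write the text back out as raw bytes only, with no R header.
text_path <- tempfile(fileext = ".txt")
writeBin(charToRaw(obj), text_path)

# The result is an ordinary text file that any tool can read.
readLines(text_path)
```

If the file was produced by several writes, it would contain several serialized objects back to back, and this single unserialize call would not be enough.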

