RHDFS output format


Brian Dolan

Feb 4, 2015, 12:15:30 PM
to rha...@googlegroups.com
Hello RHadoopers,

I'm having a problem reading the output of an RHDFS process. When I write out to a file, a control character is placed at the top of the new file. I understand there is a serialization process, but I can't find any documentation on how to read that file except from within R. Can anyone provide a workaround that does not require writing to the local file system?

For instance
> ofile = hdfs.file("brian.txt", "w")
> hdfs.write("hi",ofile)
> hdfs.close(ofile)

$ hdfs dfs -cat brian.txt
X
    hi

What is that "X", exactly?

I have also opened this StackOverflow question, if anyone wants points: http://stackoverflow.com/questions/27852357/string-character-in-rhdfs-output/27940947

Thanks!
b

~~~~~~
May All Your Sequences Converge



Antonio Piccolboni

Feb 4, 2015, 12:33:45 PM
to RHadoop Google Group
A txt extension does not an ASCII file make, so if you want to inspect that file I suggest you use hexdump (that will be instructive about binary formats, but not immediately useful). By default, hdfs.write calls serialize on its main argument to get a raw vector out of it, unless it is already a raw vector. So what I would try is:

hdfs.write(charToRaw("hi"), ofile)


I apologize that I can't test it myself right now; it's a long story about Java versions. To read text files, there is a dedicated hdfs.read.text.file. I hope this helps.
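
To illustrate the serialization point, here is a minimal base R sketch (no rhdfs or Hadoop involved). It assumes, as described above, that hdfs.write passes non-raw input through serialize; the variable names are made up for illustration. With R's default binary serialization the output begins with the byte "X" followed by a newline, which would account for the stray "X" line, whereas charToRaw yields only the bytes of the string itself.

```r
# Base R only; this does not call rhdfs or touch HDFS.
payload <- "hi"

# Assumption (from the reply above): this is what hdfs.write does
# to an argument that is not already a raw vector.
serialized <- serialize(payload, connection = NULL)

# The suggested workaround: hand over the plain bytes of the string.
plain <- charToRaw(payload)

# The default binary (XDR) serialization format starts with "X\n".
header <- rawToChar(serialized[1:2])

print(header == "X\n")                     # is the header "X" + newline?
print(plain)                               # raw bytes of "hi"
print(length(serialized) > length(plain))  # serialization adds overhead
```

If the assumption holds, writing `plain` rather than `payload` should leave nothing but the two bytes of "hi" in the file.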
