Can input be a folder of Rdata files?

Elizabeth Tang

Mar 31, 2014, 5:46:26 AM
to rha...@googlegroups.com
Hi,

Can the input be a folder of Rdata files?

I realized that it works for CSV files and for a single R object written with to.dfs().
However, it does not seem to work for a folder of Rdata files in HDFS.

Any suggestions, or is this against how it is supposed to work?


Antonio Piccolboni

Mar 31, 2014, 12:25:34 PM
to RHadoop Google Group
Hi,
Unfortunately, that's not supposed to work. Hadoop needs a way to parse the data, at least enough to tell where each record starts and to separate the key from the value. Beyond that, the contents can be opaque, as long as one doesn't need certain features. With Rdata we don't have that information, or at least not in a format that Hadoop streaming can understand.

Antonio
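
[Editor's sketch] To make the point about records, keys, and values concrete, here is a minimal sketch of how rmr2 stores data as explicit key/value pairs, which is the structure a raw .Rdata file does not expose. It assumes the rmr2 package is installed and uses the local backend so that no cluster is needed; the keys and values are made up for illustration.

```r
library(rmr2)

# Assumption: local backend, so the example runs without a Hadoop cluster.
rmr.options(backend = "local")

# Build explicit key/value pairs; the data here is hypothetical.
kv <- keyval(c("a", "b", "a"), c(1, 2, 3))

# to.dfs writes the pairs in rmr2's native format, which keeps
# the record boundaries and the key/value split that a job needs.
stored <- to.dfs(kv)

# from.dfs reads the pairs back; keys() and values() take them apart.
restored <- from.dfs(stored)
k <- keys(restored)
v <- values(restored)
```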


--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Antonio Piccolboni

Mar 31, 2014, 11:33:24 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Is it possible to modify whatever process creates the .Rdata files to write the same data with to.dfs? I've never tried this, but if you write multiple files in the same directory with to.dfs, then you should be able to run a mapreduce job on them. Alternatively, you can specify all the input paths as a character vector.

Antonio
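
[Editor's sketch] The suggestion above could look roughly like the following: load each .Rdata file, rewrite its object with to.dfs, and pass all the resulting paths to mapreduce as a character vector. This is an untested-on-a-cluster sketch that assumes the local backend; the directory names, file names, the `score` column, and the helper `rdata.to.dfs` are all hypothetical, and each .Rdata file is assumed to hold a single data frame.

```r
library(rmr2)

# Assumption: local backend, so paths are ordinary local files.
rmr.options(backend = "local")

# Hypothetical stand-ins for the existing .Rdata files.
rdata.dir <- tempfile("rdata")
dfs.dir <- tempfile("dfs")
dir.create(rdata.dir)
dir.create(dfs.dir)
x1 <- data.frame(id = 1:3, score = c(10, 20, 30))
x2 <- data.frame(id = 4:6, score = c(40, 50, 60))
save(x1, file = file.path(rdata.dir, "part1.Rdata"))
save(x2, file = file.path(rdata.dir, "part2.Rdata"))

# Hypothetical helper: load one .Rdata file into a private environment
# and rewrite its first object with to.dfs under dfs.dir.
rdata.to.dfs <- function(rdata.path) {
  env <- new.env()
  obj.names <- load(rdata.path, envir = env)
  out.path <- file.path(dfs.dir, sub("\\.Rdata$", "", basename(rdata.path)))
  to.dfs(get(obj.names[1], envir = env), output = out.path)
  out.path
}

rdata.files <- list.files(rdata.dir, full.names = TRUE)
input.paths <- unname(vapply(rdata.files, rdata.to.dfs, character(1)))

# All input paths are passed as one character vector.
result <- from.dfs(
  mapreduce(
    input = input.paths,
    map = function(k, v) keyval(1, sum(v$score))
  )
)

# The map may run once per chunk, so add up the partial sums.
total <- sum(values(result))
```

On a real cluster the output paths would be HDFS paths rather than local temporary directories.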



Elizabeth Tang

Apr 2, 2014, 3:22:42 AM
to rha...@googlegroups.com, ant...@piccolboni.info
Ok, will experiment, thanks!



Elizabeth Tang

Apr 2, 2014, 9:33:48 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Yes, passing the inputs as a list works:

    input = list(to.dfs(x1), to.dfs(x2))

Thanks.
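
[Editor's sketch] For readers who want to try the reported solution end to end, here is a minimal sketch built around the same `input = list(to.dfs(x1), to.dfs(x2))` call. It assumes the local backend; the contents of `x1` and `x2` and the map and reduce functions are made up for illustration and are not from the thread.

```r
library(rmr2)

# Assumption: local backend, so the example runs without a Hadoop cluster.
rmr.options(backend = "local")

# Hypothetical data standing in for the objects from the .Rdata files.
x1 <- 1:5
x2 <- 6:10

out <- mapreduce(
  # Each to.dfs call writes one input; the list combines them into one job.
  input = list(to.dfs(x1), to.dfs(x2)),
  # Key each number by its parity.
  map = function(k, v) keyval(v %% 2, v),
  # Sum the numbers that share a key.
  reduce = function(k, vv) keyval(k, sum(vv))
)

sums <- from.dfs(out)
parity.sums <- sort(as.numeric(values(sums)))
```

The reduce step sees values from both inputs under the same key, which is a quick way to check that the job really read both of them.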

Antonio Piccolboni

Apr 2, 2014, 10:43:52 PM
to RHadoop Google Group
Thanks for reporting back; it helps everybody.

