Can input be a folder of Rdata files?

Elizabeth Tang

Mar 31, 2014, 5:46:26 AM

to rha...@googlegroups.com

Hi,

Can the input be a folder of Rdata files?

I realized that it works for CSV files and for a single R object written with to.dfs().

However, it does not seem to work for a folder of Rdata files in HDFS.

Any suggestions, or is this against how it is supposed to work?

Antonio Piccolboni

Mar 31, 2014, 12:25:34 PM

to RHadoop Google Group

Hi,

Unfortunately, that's not supposed to work. Hadoop needs a way to parse the data, at least enough to tell where records start and to distinguish keys from values. Beyond that, the data can be opaque as long as one doesn't need certain features. With Rdata we don't have that information, or at least not in a format that Hadoop streaming can understand.
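
To make the point about records and keys concrete, here is a minimal sketch. It assumes the rmr2 package is installed and uses its local backend, so no cluster is involved; the sample keys and values are made up for illustration.

```r
library(rmr2)

# Run everything in-process; no Hadoop cluster is needed for this sketch.
rmr.options(backend = "local")

# Made-up sample data: three records, each with an explicit key and value.
kv <- keyval(c("a", "b", "c"), c(1, 2, 3))

# to.dfs() writes the pairs in rmr2's native format, which carries the
# record boundaries and the key/value split in a form rmr2 can parse.
stored <- to.dfs(kv)

# from.dfs() reads the pairs back into memory.
back <- from.dfs(stored)
print(keys(back))
print(values(back))
```

A file written by save() carries no such framing that Hadoop streaming can read, which is why a folder of Rdata files cannot be used as input directly.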

Antonio

--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Antonio Piccolboni

Mar 31, 2014, 11:33:24 PM

to rha...@googlegroups.com, ant...@piccolboni.info

Is it possible to modify whatever process creates the .Rdata files to write the same data with to.dfs? I've never tried this, but if you write multiple files in the same directory with to.dfs, then you should be able to run a mapreduce job on them. Alternatively, you can specify all the input paths as a character vector.
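
A rough sketch of the character-vector variant, assuming rmr2 with the local backend, so the paths below are local files (on a cluster they would be HDFS paths). The data frames `x1` and `x2`, the directory name, and the file names are hypothetical stand-ins, and the row-counting map is only there to have something to run.

```r
library(rmr2)

# Local backend: paths refer to the local file system in this sketch.
rmr.options(backend = "local")

# Hypothetical stand-ins for the objects that were being saved as .Rdata.
x1 <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))
x2 <- data.frame(id = 4:5, score = c(0.1, 0.7))

# Write each object with to.dfs() into the same directory.
out.dir <- file.path(tempdir(), "rmr-input")
dir.create(out.dir, showWarnings = FALSE)
paths <- file.path(out.dir, c("part-x1", "part-x2"))
to.dfs(x1, output = paths[1])
to.dfs(x2, output = paths[2])

# Pass all input paths as a character vector; the map emits the number
# of rows in each chunk it receives.
result <- mapreduce(
  input = paths,
  map = function(k, v) keyval(1, nrow(v)))

row.counts <- values(from.dfs(result))
print(sum(row.counts))
```

Pointing `input` at `out.dir` itself is the untried variant mentioned above, so it is left out of the sketch.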

Antonio

Elizabeth Tang

Apr 2, 2014, 3:22:42 AM

to rha...@googlegroups.com, ant...@piccolboni.info

Ok, will experiment, thanks!

Elizabeth Tang

Apr 2, 2014, 9:33:48 PM

to rha...@googlegroups.com, ant...@piccolboni.info

Yes, the

input=list(to.dfs(x1), to.dfs(x2))

works.

Thanks.

Antonio Piccolboni

Apr 2, 2014, 10:43:52 PM

to RHadoop Google Group

Thanks for reporting back; it helps everybody.
