How can I read a CSV file as a data set to work with plyrmr?


Yanish Pradhananga

Sep 29, 2014, 10:19:45 AM
to rha...@googlegroups.com

bind.cols(mtcars, carb.per.cyl = carb/cyl)
This works perfectly.
inp = make.input.format("csv", sep = ",", stringsAsFactors = FALSE)
inpu <- from.dfs("/user/yanish/sampledata/a.csv", format = inp)
Here I'm getting an error.
#########
Similarly, I tried:
data = to.dfs(as.data.frame(read.csv("/usr/local/hadoop/hadoop/sample_data/a.csv", header = TRUE)))
bind.cols(data, something = somethin + somethi)
Error in UseMethod("bind.cols") : 
  no applicable method for 'bind.cols' applied to an object of class "function"
Even when I try it with the iris data set, it works.
How can I run plyrmr, or any MapReduce job, on a CSV file as a data set?
Or do I have to convert my CSV file into a data set before plyrmr can deal with it?
Any suggestions or guidance would be appreciated.

Antonio Piccolboni

Sep 29, 2014, 12:05:32 PM
to rha...@googlegroups.com
Hi,
can you build a test case that I can run? For instance with the iris data set?


On Monday, September 29, 2014 7:19:45 AM UTC-7, Yanish Pradhananga wrote:

bind.cols(mtcars, carb.per.cyl = carb/cyl)
This works perfectly.
inp = make.input.format("csv", sep = ",", stringsAsFactors = FALSE)
inpu <- from.dfs("/user/yanish/sampledata/a.csv", format = inp)
Here I'm getting an error.

What error? We are all professionals, we know how to report bugs, correct?
 

#########
Similarly, I tried:
data = to.dfs(as.data.frame(read.csv("/usr/local/hadoop/hadoop/sample_data/a.csv", header = TRUE)))
bind.cols(data, something = somethin + somethi)
Error in UseMethod("bind.cols") : 
  no applicable method for 'bind.cols' applied to an object of class "function"

bind.cols can't work directly on the output of to.dfs. rmr2 and plyrmr have some interoperability, but you can't mix and match to your heart's content.
 
Even when I try it with the iris data set, it works.
How can I run plyrmr, or any MapReduce job, on a CSV file as a data set?
Or do I have to convert my CSV file into a data set before plyrmr can deal with it?
Any suggestions or guidance would be appreciated.

I would try to build a self-contained plyrmr example that I can run, for instance using the iris or mtcars data sets, and learn each package before mixing and matching with rmr2. This works for me:


bind.cols(input(iris), aspect = Sepal.Length/Sepal.Width)

This also works (it shows how to read from disk after creating an on-disk data set):

IF = make.input.format("csv",sep = ",",stringsAsFactors = FALSE)
OF = make.output.format("csv",sep = ",")
iris.on.disk = output(input(iris), format = OF, input.format = IF)
bind.cols(input(iris.on.disk), aspect = Sepal.Length/Sepal.Width)


If you replace iris.on.disk with a path to an existing data set in the same format, it should also work.
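As a rough sketch of that last point, the snippet below points input() directly at an existing CSV file. Several things here are assumptions rather than anything confirmed in this thread: the rmr2 local backend is used so it can run without a cluster, the temporary file stands in for a real path such as the a.csv above, the file has no header row, and the csv input format is assumed to pass col.names through to read.table so the columns get usable names.

```r
library(rmr2)
library(plyrmr)

# Assumption: use the local backend so this runs without a Hadoop cluster.
rmr.options(backend = "local")

# Hypothetical stand-in for an existing CSV data set: iris written without
# a header row, since the csv format is assumed here to read files header-less.
csv.path = tempfile(fileext = ".csv")
write.table(iris, csv.path, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Same format call as above, plus col.names (assumed to be forwarded to
# read.table) so that the columns can be referred to by name.
IF = make.input.format("csv", sep = ",", stringsAsFactors = FALSE,
                       col.names = names(iris))

# Read the file through plyrmr's input() rather than from.dfs/to.dfs,
# add a column, and pull the result back as a data frame.
result = as.data.frame(
  bind.cols(input(csv.path, format = IF),
            aspect = Sepal.Length/Sepal.Width))
head(result)
```

If the real file does have a header row, one option is to strip it before loading, since otherwise it may be read as a data row.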


 
 