Linear Regression in R MapReduce (RHadoop)


vijay kumar

Jul 3, 2014, 12:15:39 PM
to rha...@googlegroups.com
I'm new to RHadoop and to rmr. I need to write a MapReduce job in R using rmr. I have tried writing one, but executing it gives an error. I am trying to read the input file from HDFS.


Error:
--------
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
   hadoop streaming failed with error code 1

Code:
--------
input = "/hdfs/bikes_LR/day.csv",
     map=
       function(.,Xi){
         yi =c[Xi[,1],]
         Xi = Xi[,-1]
        keyval(1,list(t(Xi)%*%yi))
       },
     reduce = TRUE ,
     combine = TRUE)))[[1]]
solve(XtX,XtY)



Input:
------------

instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
6,2011-01-06,1,0,1,0,4,1,1,0.204348,0.233209,0.518261,0.0895652,88,1518,1606
7,2011-01-07,1,0,1,0,5,1,2,0.196522,0.208839,0.498696,0.168726,148,1362,1510
8,2011-01-08,1,0,1,0,6,0,2,0.165,0.162254,0.535833,0.266804,68,891,959
9,2011-01-09,1,0,1,0,0,0,1,0.138333,0.116175,0.434167,0.36195,54,768,822
10,2011-01-10,1,0,1,0,1,1,1,0.150833,0.150888,0.482917,0.223267,41,1280,1321



Please point out any mistakes.

Antonio Piccolboni

Jul 3, 2014, 12:35:57 PM
to RHadoop Google Group
It looks like you didn't supply the input format. Please check help(mapreduce) and help(make.input.format). There is a preset csv format, but you'll have to change the separator to ",". If that doesn't do it, please review the bug report guidelines (this group's intro message) and provide a complete bug or problem report so that I can look into it. Thanks


Antonio


--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
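
The advice above can be sketched as follows. The rmr2 call appears only as a comment because it needs a working Hadoop setup; `make.input.format("csv", sep = ",")` and the `input.format` argument of `mapreduce` are the pieces the reply points to, but how the header line and column types are handled is an assumption to check against help(make.input.format). The runnable part uses base R on three rows of the sample input to show what a map function would have to work with once the file is parsed with a comma separator. The names `csv_lines`, `chunk`, `design` and `response` are hypothetical.

```r
# With rmr2 on a cluster, the input format would be passed roughly like this
# (not run here; assumes the header line has been removed from the file):
#   mapreduce(input = "/hdfs/bikes_LR/day.csv",
#             input.format = make.input.format("csv", sep = ","),
#             map = ...)

# Base R stand-in: parse three rows of the sample input with sep = ",".
csv_lines <- c(
  "1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985",
  "2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801",
  "3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349"
)
col_names <- c("instant", "dteday", "season", "yr", "mnth", "holiday",
               "weekday", "workingday", "weathersit", "temp", "atemp",
               "hum", "windspeed", "casual", "registered", "cnt")
chunk <- read.table(text = csv_lines, sep = ",", col.names = col_names,
                    stringsAsFactors = FALSE)

# Columns needed for lm(cnt ~ weathersit + temp + atemp + hum + windspeed)
predictors <- c("weathersit", "temp", "atemp", "hum", "windspeed")
design   <- cbind(1, as.matrix(chunk[, predictors]))  # leading 1s = intercept
response <- chunk$cnt

dim(design)
```

Without a separator that matches the file, each line would not split into these 16 fields, and numeric indexing such as `Xi[,-1]` in the map function would not see a numeric matrix.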

vijay kumar

Jul 3, 2014, 12:48:38 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Thank you for replying. Actually, I have tried all the possibilities.
But my doubt is with the code itself. I want to compute lm(cnt~weathersit+temp+atemp+hum+windspeed) using rmr.

Can you please share the code or modify mine? I can't figure out how to do it.

Antonio Piccolboni

Jul 3, 2014, 12:56:05 PM
to vijay kumar, RHadoop Google Group
On Thu, Jul 3, 2014 at 9:48 AM, vijay kumar <vijayk...@gmail.com> wrote:
> Thank you for replying. Actually, I have tried all the possibilities.

I would respectfully challenge the accuracy of this statement. You have clearly not tried the correct program, for one, and the possibilities are infinite if you don't bound program size and grow exponentially with program size.

> But my doubt is with the code itself. I want to compute lm(cnt~weathersit+temp+atemp+hum+windspeed) using rmr.
>
> Can you please share the code or modify mine? I can't figure out how to do it.

I think in the medium and long term it is much better for users to read the documentation, learn how to do things, and hold me accountable if the documentation is not sufficient than for me to hand out one-off coding freebies. Check out the file "getting-data-in-and-out.R" in the test directory (or some such) if the help() function doesn't solve your problem. Regards


Antonio

vijay kumar

Jul 3, 2014, 12:59:15 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Yes, my bad. I am new to R and also to RHadoop. I come from a Hadoop background, so I don't have much knowledge of R.
Kindly modify my code according to my input. I would be very happy if you could help me.

lee john

Jul 13, 2014, 10:08:46 AM
to rha...@googlegroups.com
Dear all,
   What is the meaning of . in function(.,Xi)?
Thank you in advance.

On Friday, July 4, 2014 at 12:15:39 AM UTC+8, vijay kumar wrote:

Antonio Piccolboni

Jul 13, 2014, 11:49:43 AM
to RHadoop Google Group
Nothing special, it's a function of two arguments. I use the . sometimes as argument when I am not going to access that argument in the body, that's just personal style. You could replace it with any valid identifier.


Antonio
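
A small illustration of the point above, in base R rather than rmr2: `.` is an ordinary argument name, so the two hypothetical functions below (`map_dot` and `map_key`, made up for this example) behave identically.

```r
# Two hypothetical map-style functions; only the name of the unused
# first argument differs.
map_dot <- function(., Xi) sum(Xi)
map_key <- function(k, Xi) sum(Xi)

map_dot(NULL, 1:3)
map_key(NULL, 1:3)

# The formal argument really is called "."
names(formals(map_dot))
```

Both calls return the same value because the first argument is never used in the body.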


hsinay Pradhananga

Jul 23, 2014, 3:38:02 AM
to rha...@googlegroups.com

library(rmr2)
x1 <- 1:9
x2 <- c(2, 5, 7, 8, 11, 13, 15, 16, 17)
y <- c(6, 4, 4, 2, 7, 4, 7, 9, 9)
set <- data.frame(x1, x2, y)
se <- to.dfs(set)
# Map-only job: solve the normal equations (X'X)^-1 X'y on each chunk
ma <- mapreduce(
  input = se,
  map = function(k, v) {
    X <- cbind(1, v[, 1], v[, 2])   # design matrix with an intercept column
    solve(t(X) %*% X) %*% (t(X) %*% v[, 3])
  })
from.dfs(ma)
I want to perform the regression using both map and reduce.
How can I do that?
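
One way to split this regression across map and reduce is to have each map call emit the partial products X'X and X'y for its chunk, have the reduce step add them up, and solve the normal equations once at the end. The sketch below simulates that with base R's Map and Reduce instead of rmr2, so it runs without Hadoop. The split into three chunks of three rows and the names `chunks`, `map_fn`, `reduce_fn`, `total` and `beta` are assumptions for illustration.

```r
x1 <- 1:9
x2 <- c(2, 5, 7, 8, 11, 13, 15, 16, 17)
y  <- c(6, 4, 4, 2, 7, 4, 7, 9, 9)
set <- data.frame(x1, x2, y)

# Pretend the framework split the data into three chunks of three rows.
chunks <- split(set, rep(1:3, each = 3))

# "Map": partial cross-products for one chunk.
map_fn <- function(v) {
  X <- cbind(1, v$x1, v$x2)              # design matrix with intercept
  list(XtX = crossprod(X), XtY = crossprod(X, v$y))
}

# "Reduce": element-wise sum of two partial results.
reduce_fn <- function(a, b) {
  list(XtX = a$XtX + b$XtX, XtY = a$XtY + b$XtY)
}

total <- Reduce(reduce_fn, Map(map_fn, chunks))
beta  <- solve(total$XtX, total$XtY)     # solve the normal equations once

# Compare with lm() on the full data
cbind(mapreduce = drop(beta), lm = coef(lm(y ~ x1 + x2, data = set)))
```

Summing works because X'X and X'y are sums over rows, so the totals do not depend on how the rows are chunked. Solving inside each map call, by contrast, gives one coefficient vector per chunk rather than a fit to the whole data set.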


vijay kumar

Jul 24, 2014, 1:54:28 AM
to rha...@googlegroups.com
Hey, here is the working code. Check it and get back to me if you have any doubts.


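# Cluster-specific paths for this installation
Sys.setenv(HADOOP_HOME="/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop")
Sys.setenv(HADOOP_CMD="/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.7.0.jar")

library(rmr2)
library(rhdfs)
hdfs.init()

# Toy data: the first column of X (x1) doubles as the row index
x1 <- 1:9
x2 <- c(2, 5, 7, 8, 11, 13, 15, 16, 17)
X <- matrix(c(x1, x2), ncol = 2)
se <- to.dfs(X)
y <- c(6, 4, 4, 2, 7, 4, 7, 9, 9)
y <- as.matrix(y)   # 9 x 1 matrix, kept in the R session (not in the DFS)

# Reducer, also used as combiner: add up the partial matrices
Sum <- function(., YY) {
  keyval(1, list(Reduce('+', YY)))
}

# X'X: each map call emits the cross-product of its chunk
XtX <-
  values(from.dfs(
    mapreduce(
      input = se,
      map = function(., Xi) {
        Xi <- Xi[, -1]   # drop the index column
        keyval(1, list(t(Xi) %*% Xi))
      },
      reduce = Sum,
      combine = TRUE)))[[1]]

solve(XtX)   # inverse of X'X, printed for inspection

# X'y: each map call looks up its rows of y through the index column
XtY <-
  values(from.dfs(
    mapreduce(
      input = se,
      map = function(., Xi) {
        yi <- y[Xi[, 1], ]   # responses for the rows in this chunk
        Xi <- Xi[, -1]       # drop the index column
        keyval(1, list(t(Xi) %*% yi))
      },
      reduce = Sum,
      combine = TRUE)))[[1]]

# Solve the normal equations
solve(XtX, XtY)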





--
Regards,
Vijay Kumar
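
As a cross-check of what the code above computes: after `Xi[,-1]` drops the index column, only x2 is left as a predictor and no column of ones is added, so the result appears to be the least-squares slope of y on x2 with no intercept. The base R sketch below reproduces the same arithmetic on all rows at once, without Hadoop, and compares it with lm(). The names `slope` and `fit` are made up for the example.

```r
x1 <- 1:9
x2 <- c(2, 5, 7, 8, 11, 13, 15, 16, 17)
X  <- matrix(c(x1, x2), ncol = 2)
y  <- as.matrix(c(6, 4, 4, 2, 7, 4, 7, 9, 9))

# Same arithmetic as the two map functions, applied to all rows at once
Xi  <- X[, -1]          # drops the index column; leaves x2 as a plain vector
yi  <- y[X[, 1], ]      # rows of y picked by the index column
XtX <- t(Xi) %*% Xi     # 1 x 1 matrix: sum(x2^2)
XtY <- t(Xi) %*% yi     # 1 x 1 matrix: sum(x2 * y)
slope <- solve(XtX, XtY)

# lm() without an intercept gives the same coefficient
fit <- lm(drop(y) ~ 0 + x2)
c(normal_equations = drop(slope), lm = unname(coef(fit)))

# To include an intercept, the map functions could use cbind(1, Xi)
# in place of Xi (an assumed modification, not part of the posted code).
```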

Yanish Pradhananga

Sep 3, 2014, 5:08:53 AM
to rha...@googlegroups.com

Hey, the code you pasted above is the same as the code from "Big Data Analytics with R and Hadoop". In the book's code, X.index is defined as
X.index = to.dfs(matrix(rnorm(20000), ncol=10))
In the code above you have instead written
se<-to.dfs(X)
Similarly for y:
y=as.matrix(rnorm(2000)) # as in the book
y=as.matrix(y,2) # in the code above
With y defined like this, the line below works fine:

yi = y[Xi[,1],]

But if you save y in the DFS, it gives an error, like this:
y=to.dfs(as.matrix(rnorm(2000)))
y=to.dfs(as.matrix(y,2))
yi=y[Xi[,1],]

Why does this give an error?
What is the value of yi in the code above and in this code?
What is the expected output from yi?
Please do tell me.
Another confusion for me is function(.,v) versus function(k,v), and
keyval(.,v), keyval(1,v), keyval(k,v).

What is the difference between these keyval calls and function parameters with ., k, 1 and so on?
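
On the `yi` question, the indexing can be tried in base R without Hadoop. The sketch assumes, as in the code earlier in the thread, that the first column of each chunk `Xi` holds row numbers and that `y` is an ordinary one-column matrix in the R session. The chunk of rows 4 to 6 is made up for illustration. Under those assumptions, `y[Xi[,1],]` returns the responses for exactly the rows in the chunk. If `y` is instead the result of `to.dfs()`, it is no longer a matrix but a handle to data stored in the DFS, which would explain why the same indexing fails; that is an interpretation of rmr2's behavior, not something confirmed in this thread.

```r
x1 <- 1:9
x2 <- c(2, 5, 7, 8, 11, 13, 15, 16, 17)
X  <- matrix(c(x1, x2), ncol = 2)
y  <- as.matrix(c(6, 4, 4, 2, 7, 4, 7, 9, 9))   # 9 x 1 matrix

# A hypothetical chunk as a map function might receive it: rows 4 to 6
Xi <- X[4:6, ]

# First column = row numbers, used to pick the matching responses
yi <- y[Xi[, 1], ]
yi   # 2 7 4
```

As for the keys: in `keyval(1, ...)` the 1 is a constant key, so every map output lands in the same group and one reduce call can sum them all. `.` and `k` are only names for the function's first argument, so `keyval(k, v)` passes along whatever key the function received under that name.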