MapReduce stops on large data without error messages.


jieun shin

Jul 9, 2015, 10:37:22 PM
to rha...@googlegroups.com

Hi, I'm a master's student in Korea and I'm studying big data with RHadoop.
I have a problem with the code below. Please give me a tip!

When I input a 10 MB file using the code below, I have no problems.
However, with a 200 MB file (much larger than 10 MB), MapReduce stops!
There are no error messages; the job just hangs at the 67% point of the reduce phase.
My question is: why does my code fail with the 200 MB data?

<My RHadoop code>-------------------------------------------------------------------------------------

library(rmr2)

cs.map <- function(., M)
{
  # Split each comma-separated text line into numeric fields,
  # stack the rows into a matrix, and add 1 to every value.
  jData <- do.call("rbind", lapply(strsplit(unlist(M), ","), as.numeric))
  # Every record is emitted under the same key, 1.
  keyval(1, jData + 1)
}
cs.reduce <- function(k, Z)
{
  # Identity reduce: pass the values straight through.
  keyval(k, Z)
}

Jinput <- '/JBH/l200M.csv'
Joutput <- '/result06'

mapreduce(input = Jinput, output = Joutput, input.format = "text",
          map = cs.map, reduce = cs.reduce, combine = FALSE)

-------------------------------------------------------------------------------------------------------

<My distributed Environment>-----------

Number of Nodes : 1 master
                  5 slaves
software versions:
 - OS : Ubuntu 14.04 LTS
 - Java : 1.7.0
 - Hadoop : 0.20.2
 - R : 3.1.0
 - rmr2 : 3.3.0
 - rhdfs : 1.0.8
---------------------------------------

Any assistance is appreciated.

Best Regards,

Jieun

Antonio Piccolboni

Aug 5, 2015, 6:38:47 PM
to RHadoop
It seems to me your map function is inefficient and the reduce function useless. If your input has some format that can be named, like csv, I would recommend using the input.format argument to mapreduce and the make.input.format function. Reading the manual would also not hurt, as would generally strengthening your knowledge of efficient R programming and mapreduce.
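
For instance, a minimal sketch along those lines (assuming the same Jinput and Joutput paths from the original post, and guessing that "csv" matches the file's layout): with make.input.format("csv"), each map call receives its chunk already parsed into a data frame, so the manual strsplit/rbind goes away; and since the posted reduce only echoes its input, the job can run map-only, which also avoids funnelling the entire data set through the single key 1.

library(rmr2)

# Parse the input as CSV; each map call then receives a data frame chunk.
csv.format <- make.input.format("csv", sep = ",")

cs.map <- function(., M) {
  # M already holds numeric columns; just add 1 and emit.
  keyval(1, as.matrix(M) + 1)
}

# Map-only job: the identity reduce is dropped entirely.
mapreduce(input = Jinput,
          output = Joutput,
          input.format = csv.format,
          map = cs.map)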