MapReduce stops on large data without error messages.


jieun shin

Jul 9, 2015, 10:37:22 PM
to rha...@googlegroups.com

Hi, I'm a master's student in Korea and I'm studying big data with RHadoop.
I have a problem with the code below. Please give me a tip!

When I input a 10 MB file using the code below, I have no problems.
However, with a 200 MB file (much larger than 10 MB), MapReduce stops!
There are no error messages; the job just hangs at the 67% point of the reduce phase.
My question is: why does my code fail with the 200 MB data?

<My RHadoop code>-------------------------------------------------------------------------------------

library(rmr2)

cs.map <- function(., M)
{
  # Split each comma-separated text line into numeric fields,
  # stack the rows into a matrix, and add 1 to every value.
  jData <- do.call("rbind", lapply(strsplit(unlist(M), ","), as.numeric))
  # Every record is emitted under the same key, 1.
  keyval(1, jData + 1)
}
cs.reduce <- function(k, Z)
{
  # Identity reduce: pass the values straight through.
  keyval(k, Z)
}

Jinput <- '/JBH/l200M.csv'
Joutput <- '/result06'

mapreduce(input = Jinput, output = Joutput, input.format = "text",
          map = cs.map, reduce = cs.reduce, combine = FALSE)

-------------------------------------------------------------------------------------------------------

<My distributed Environment>-----------

Number of Nodes : 1 master
                  5 slaves
software versions:
 - OS : Ubuntu 14.04 LTS
 - Java : 1.7.0
 - Hadoop : 0.20.2
 - R : 3.1.0
 - rmr2 : 3.3.0
 - rhdfs : 1.0.8
---------------------------------------

Any assistance is appreciated.

Best Regards,

Jieun

Antonio Piccolboni

Aug 5, 2015, 6:38:47 PM
to RHadoop
It seems to me your map function is inefficient and the reduce function useless. If your input has some format that can be named, like csv, I would recommend using the input.format argument to mapreduce and the make.input.format function. Reading the manual would also not hurt, as would generally strengthening your knowledge of efficient R programming and mapreduce.
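
For instance, a minimal sketch along those lines (assuming the same Jinput and Joutput paths from the original post, and guessing that "csv" matches the file's layout): with make.input.format("csv"), each map call receives its chunk already parsed into a data frame, so the manual strsplit/rbind goes away; and since the posted reduce only echoes its input, the job can run map-only, which also avoids funnelling the entire data set through the single key 1.

library(rmr2)

# Parse the input as CSV; each map call then receives a data frame chunk.
csv.format <- make.input.format("csv", sep = ",")

cs.map <- function(., M) {
  # M already holds numeric columns; just add 1 and emit.
  keyval(1, as.matrix(M) + 1)
}

# Map-only job: the identity reduce is dropped entirely.
mapreduce(input = Jinput,
          output = Joutput,
          input.format = csv.format,
          map = cs.map)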