Error in MapReduce

Beta

Apr 3, 2015, 11:04:05 AM

to rha...@googlegroups.com

Hi,

I was trying to write a MapReduce program that selects variables from a dataset. The following program worked perfectly.

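library(rmr2)

# Small in-memory sample of car data
Model  <- c("AMC", "AMC", "AMC", "Buick", "Buick")
MPG    <- c(22, 17, 22, 20, 15)
WEIGHT <- c(2930, 3350, 2640, 3250, 4080)
PRICE  <- c(4099, 4749, 3799, 4816, 7827)
Cars1  <- data.frame(Model, MPG, WEIGHT, PRICE)

# Write the data frame to the distributed file system
Cars <- to.dfs(Cars1)

# Sum PRICE per Model
out <- mapreduce(Cars,
                 map = function(k, v) {
                   keyval(v$Model, v$PRICE)
                 },
                 reduce = function(k, vv) {
                   keyval(k, sum(vv))
                 })

out1 <- from.dfs(out)
View(out1)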

But when I run the following code, I get no output from the map task. The code does not throw any error, yet nothing is emitted. I'd be grateful if somebody could tell me where I'm making the mistake.

# Emit (Month, Distance) pairs
map.R <- function(k, v) {
  keyval(v$Month, v$Distance)
}

# Sum the distances for each month
reduce.R <- function(k, vv) {
  keyval(k, sum(vv))
}

deptdelay <- function(input, output, pattern = ",") {
  mapreduce(input = input, output = output,
            input.format = make.input.format("csv", sep = pattern),
            map = map.R, reduce = reduce.R)
}

hdfs.root <- '/user/dir1/'
hdfs.data <- file.path(hdfs.root, 'flights.csv')
hdfs.out  <- file.path(hdfs.root, 'out')
out <- deptdelay(hdfs.data, hdfs.out)

The relevant section of the log is the following:

15/04/03 03:16:08 INFO mapreduce.Job: Job job_1428054454059_0002 completed successfully
15/04/03 03:16:09 INFO mapreduce.Job: Counters: 49

Map-Reduce Framework
		Map input records=227497
		Map output records=0
		Map output bytes=0
		Map output materialized bytes=12
		Input split bytes=218
		Combine input records=0
		Combine output records=0
		Reduce input groups=0
		Reduce shuffle bytes=12
		Reduce input records=0
		Reduce output records=0
		Spilled Records=0
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=1900
		CPU time spent (ms)=39570
		Physical memory (bytes) snapshot=547024896
		Virtual memory (bytes) snapshot=2714505216
		Total committed heap usage (bytes)=396361728

Antonio Piccolboni

Apr 3, 2015, 12:09:41 PM

to RHadoop Google Group

Hi,

I think the only problem is that you are assuming support for column names in csv. Unfortunately, because files are partitioned, supporting that is not trivial. Which partitions should carry the column information: one, or all of them? We could make up our own convention and support it, but csv is a transfer format for data coming from other tools, and unless we know what those tools are going to do, a convention of our own would be pretty useless. As things stand, the solution is to remove the headers from the csv files and provide that information at the format level, specifying col.names in make.input.format, and also colClasses if you want to be absolutely safe (class auto-detection may fail on very small partitions).
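
As a rough illustration of the point above, here is a minimal sketch in base R of what a mapper would see in a header-less partition, and how col.names and colClasses change that. The three-column layout, the names flight.cols and flight.classes, and the sample rows are all hypothetical, since the real columns of flights.csv are not shown in this thread. The final make.input.format call assumes, as described above, that col.names and colClasses can be given at the format level; it only runs if rmr2 is installed.

```r
# Hypothetical column layout; replace with the real columns of flights.csv.
flight.cols    <- c("Month", "Distance", "Carrier")
flight.classes <- c("integer", "numeric", "character")

# A header-less chunk, as a mapper might receive it from the middle of the file.
chunk <- "1,810,AA\n1,1235,UA\n2,810,AA\n"

# Without col.names, read.table assigns V1, V2, ... so there is no Month column.
no.names <- read.table(text = chunk, sep = ",", header = FALSE)
print(names(no.names))
print(is.null(no.names$Month))

# With col.names and colClasses, the columns can be addressed by name.
named <- read.table(text = chunk, sep = ",", header = FALSE,
                    col.names = flight.cols, colClasses = flight.classes)
print(tapply(named$Distance, named$Month, sum))

# The same information supplied at the format level (assumes rmr2 is installed).
if (requireNamespace("rmr2", quietly = TRUE)) {
  flights.format <- rmr2::make.input.format("csv", sep = ",",
                                            col.names = flight.cols,
                                            colClasses = flight.classes)
}
```

If that assumption holds, flights.format would be passed as input.format in the mapreduce call, after the header line has been removed from the csv file. When the columns are unnamed, v$Month and v$Distance are NULL, which would be consistent with the "Map output records=0" counter in the log.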

Beta

Apr 4, 2015, 10:26:00 AM

to rha...@googlegroups.com, ant...@piccolboni.info

Thanks a lot, Antonio! I tried your suggestion and it worked. Now I'm in a better position to play around with RHadoop. You have been very helpful.

Thank you again!
