Hello everyone,
For the following dataset and code I get only one reducer. If I set “mapred.reduce.tasks=4”, the job runs with 4 reduce tasks, with no errors and with correct results. What am I doing wrong? Perhaps I have misunderstood the concept.
The purpose is to compute the mean grade for each student, class and year. All the data is randomly generated and split into 4 CSV files (2010, 2011, 2012 and 2013), with fields separated by “;”.
Data set: grades for students in a school.
Year;Student;Class;Grade
2009;Mike;Math;8
2009;John;Biology;9
…
Code:
library(rmr2)
library(rhdfs)
hdfs.init()

hdfs.root <- 'catalog'
hdfs.in <- file.path(hdfs.root, 'data')
hdfs.out <- file.path(hdfs.root, 'out')

# Map: key each grade by (year, student, class)
map <- function(k, catalogl) {
  colnames(catalogl) <- c('year', 'student', 'class', 'grade')
  key <- data.frame(catalogl$year, catalogl$student, catalogl$class)
  return(keyval(key, catalogl$grade))
}

# Reduce: average the grades for each key
reduce <- function(key, grades) {
  keyval(key, mean(grades))
}

hdfs.out <- from.dfs(mapreduce(
  input = hdfs.in,
  input.format = make.input.format("csv", sep = ";"),
  map = map,
  reduce = reduce
  #, backend.parameters = list(hadoop = list(D = "mapred.reduce.tasks=4"))
))
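For reference, the result I expect can be sketched locally in plain R, without Hadoop. This is only a rough check under assumptions: the rows below are made-up sample data in the same Year;Student;Class;Grade layout, and base R's aggregate() stands in for the map and reduce steps.

```r
# Made-up sample rows in the same Year;Student;Class;Grade layout
grades <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  student = c("Mike", "Mike", "John", "John"),
  class   = c("Math", "Math", "Biology", "Biology"),
  grade   = c(8, 6, 9, 10),
  stringsAsFactors = FALSE
)

# Group by year, student and class, then average the grades in each group
means <- aggregate(grade ~ year + student + class, data = grades, FUN = mean)
print(means)
```

Each distinct (year, student, class) combination yields one row, which corresponds to one key reaching the reduce function.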
Thank you.