One reducer if not "forced" to use more

George Mihaila

Nov 18, 2013, 11:42:54 AM

to rha...@googlegroups.com

Hello everyone,

For the following dataset and code I get only one reducer. If I set “mapred.reduce.tasks=4”, the job runs 4 reduce tasks with no errors and correct results. What am I doing wrong? Perhaps I have not understood the concept.

The goal is to compute the mean grade for each student, class and year. All the data is randomly generated and split into 4 CSV files (2010, 2011, 2012 and 2013), with fields separated by “;”.

Data set: grades for students in a school.

Year;Student;Class;Grade

2009;Mike;Math;8

2009;John;Biology;9

…
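As a reference for checking the job's output, the same per-group mean can be computed in memory with base R. This is a minimal sketch: the first two rows are the sample rows shown above, while the third row is made up purely so that one (year, student, class) group has more than one grade.

```r
# Toy input: the first two rows are from the post; the third is a
# hypothetical row added so that one group has more than one grade.
grades <- read.csv(text = "Year;Student;Class;Grade
2009;Mike;Math;8
2009;John;Biology;9
2009;Mike;Math;6", sep = ";")

# Mean grade per (Year, Student, Class), computed in memory with base R.
# The MapReduce job should produce the same means, whatever the reducer count.
expected <- aggregate(Grade ~ Year + Student + Class, data = grades, FUN = mean)
print(expected)
```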

Code:

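```
library(rmr2)
library(rhdfs)
hdfs.init()

# HDFS paths (relative paths resolve under the user's HDFS home directory)
hdfs.root <- 'catalog'
hdfs.in <- file.path(hdfs.root, 'data')
hdfs.out <- file.path(hdfs.root, 'out')

# Map: name the CSV columns, then emit (year, student, class) as the key
# and the grade as the value
map <- function(k, catalogl) {
  colnames(catalogl) <- c('year', 'student', 'class', 'grade')
  key <- data.frame(catalogl$year, catalogl$student, catalogl$class)
  return(keyval(key, catalogl$grade))
}

# Reduce: called once per distinct key with all of that key's grades
reduce <- function(key, grades) {
  keyval(key, mean(grades))
}

# Run the job and read the result back from HDFS.
# Uncommenting backend.parameters forces 4 reduce tasks for this job only.
result <- from.dfs(mapreduce(input = hdfs.in,
                             input.format = make.input.format("csv", sep = ";"),
                             map = map,
                             reduce = reduce
                             #, backend.parameters = list(hadoop = list(D = "mapred.reduce.tasks=4"))
                             ))
```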
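Why 4 reducers give the same answer as 1: in MapReduce every value for a given key is sent to exactly one reducer, so the reducer count only changes how keys are spread across reduce tasks, not the per-key means. The base-R sketch that follows simulates this on a toy input (two sample rows from the post plus one made-up row). `simulate_job` is a hypothetical helper, and its round-robin assignment of keys is a simple stand-in for Hadoop's hash partitioner, not what Hadoop actually does.

```r
# Toy input: two rows from the post plus one hypothetical extra row
grades <- read.csv(text = "Year;Student;Class;Grade
2009;Mike;Math;8
2009;John;Biology;9
2009;Mike;Math;6", sep = ";")

# Hypothetical helper: mimic the shuffle by assigning each distinct key to one
# of n_reducers partitions (round-robin, standing in for hash partitioning),
# then let each "reducer" compute the mean for the keys it owns.
simulate_job <- function(df, n_reducers) {
  keys <- paste(df$Year, df$Student, df$Class, sep = "|")
  distinct_keys <- sort(unique(keys))
  partition <- (seq_along(distinct_keys) - 1) %% n_reducers
  results <- c()
  for (p in seq_len(n_reducers) - 1) {
    for (k in distinct_keys[partition == p]) {
      results[k] <- mean(df$Grade[keys == k])
    }
  }
  # Sort by key so outputs from different reducer counts are comparable
  results[sort(names(results))]
}

one  <- simulate_job(grades, 1)
four <- simulate_job(grades, 4)
identical(one, four)  # TRUE: every key is reduced exactly once either way
```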

Thank you.

Antonio Piccolboni

Nov 18, 2013, 12:56:27 PM

to RHadoop Google Group

The number of reducers depends on your cluster configuration. You can set it at that level, using the recommended value of 0.95 * the number of available reduce slots, or use

```
rmr.options(backend.parameters = ...)
```

with the same syntax you used above; it will then apply to all jobs started in the same R session. Thanks,

Antonio
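For context, the stock Hadoop 1.x default for `mapred.reduce.tasks` (in `mapred-default.xml`) is 1, which matches seeing a single reducer unless the cluster or the job overrides it. The Hadoop MapReduce tutorial's rule of thumb is 0.95 (or 1.75) times the total reduce slots, i.e. nodes × `mapred.tasktracker.reduce.tasks.maximum`. The sketch below turns that into a reducer count and passes it session-wide via `rmr.options(backend.parameters = ...)` as suggested above. It assumes rmr2 2.x behaviour; `suggested_reducers` is a hypothetical helper, and the 10-node, 2-slot cluster is made up for illustration.

```r
# Hypothetical helper: apply the 0.95 rule of thumb to the cluster's total
# reduce slots (nodes * reduce slots per node), never going below 1 reducer.
suggested_reducers <- function(nodes, reduce_slots_per_node) {
  max(1L, as.integer(floor(0.95 * nodes * reduce_slots_per_node)))
}

# Made-up cluster: 10 nodes with 2 reduce slots each, i.e. 20 slots in total
n <- suggested_reducers(nodes = 10, reduce_slots_per_node = 2)

# Same structure as the commented-out per-job setting in the question
params <- list(hadoop = list(D = paste0("mapred.reduce.tasks=", n)))

# Session-wide: mapreduce() calls made afterwards in this R session pick up
# these backend parameters (skipped if rmr2 is not installed)
if (requireNamespace("rmr2", quietly = TRUE)) {
  rmr2::rmr.options(backend.parameters = params)
}
```

Setting it per job (as in the commented-out line of the question) keeps the change local to one `mapreduce()` call; `rmr.options` is the more convenient choice when every job in the session should use the same value.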

--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
