Hello,
I am very new to RHadoop and am having a little trouble.
I have a large number of files, each containing a number of vectors. I wish to perform some vector operations on each vector and then, in the reducer stage, sum all vectors from the same file, so that the output is one filename plus a single vector per file.
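To make the goal concrete, here is a minimal plain-R sketch (no Hadoop involved) of the result I am after. The file names and vectors are made up for illustration, and `sum_by_file` is a hypothetical helper, not part of rmr.

```r
# Hypothetical sample data: each list element is one processed vector,
# tagged with the (made-up) name of the file it came from.
filenames <- c("a.txt", "a.txt", "b.txt")
vectors <- list(c(1, 2, 3), c(10, 20, 30), c(5, 5, 5))

# Hypothetical helper: group vector indices by filename, then add the
# vectors of each group element-wise so every file yields one vector.
sum_by_file <- function(filenames, vectors) {
  groups <- split(seq_along(vectors), filenames)
  lapply(groups, function(idx) Reduce(`+`, vectors[idx]))
}

result <- sum_by_file(filenames, vectors)
print(result)
```

Here `result` is a named list with one entry per file, and each entry is the element-wise sum of that file's vectors.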
Currently I am mimicking the wordcount example to some extent, but I want to emit the filename as the key and the processed vector as the value.
Something like:
map = function(., line) {
  # vector: the processed vector for this line (processing omitted here)
  keyval(Sys.getenv("mapreduce_map_input_file"), vector)
}
reduce = function(filename, vectors) {
  keyval(filename, sum(vectors))
}
When I run this, the vectors appear to be processed correctly, but I am not getting back the filenames. Instead I am getting unique ids of some kind (1, 2, 3, ...) that seem to correspond to the vector, not the file.
That is, I am outputting each vector rather than each file plus its associated vector.
I have also tried "map_input_file" and "MAPREDUCE_MAP_INPUT_FILE" as the environment variable names.