Most of the documentation I can find is about transforming and storing data sets as key/value pairs, which is traditional for MapReduce jobs. However, many other MapReduce APIs also allow you to operate on the data as though it were a simple sequential vector, i.e., to transform a vector of data into a new vector. Is such a structure possible for rhwatch() jobs without simply passing NULL or NA as the key in an rhcollect() call at the end of the mapper expression?
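To make the pattern concrete, here is a rough sketch of what I mean (assuming the usual RHIPE mapper expression form, where map.keys and map.values hold the input pairs; the doubling transform is just a placeholder):

```r
map <- expression({
  # iterate over the input values, ignoring the keys entirely
  lapply(seq_along(map.values), function(i) {
    v <- map.values[[i]]
    # treat v as one element of a vector and transform it
    rhcollect(NULL, v * 2)  # NULL key, since only the values matter
  })
})
```

Is emitting a NULL (or NA) key like this the intended approach, or is there a vector-oriented structure that avoids the dummy keys altogether?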
Additionally, if we want our MapReduce jobs to simply transform vectors rather than key/value pairs (e.g., data read in from a newline-delimited text file), should we use map.keys or map.values to access the data?
Thanks,
Alek Eskilson