I've been interested in this for a while, and I think I've made some progress. It's not a full solution, but it may be a step in the right direction that, with some more work, could become one:
1. Once a job is over, the command 'mapred job -status [[job_id]]' prints the full status message for that job: the job file, the tracking URL, and all the counters. I'm not sure how standard the counters are, but if the built-in ones are standard, anything that isn't in that set is probably a custom counter. Alternatively, knowing which counter names to look for may be enough for some regex matches.
2. It looks like the job ID is part of the object returned by the mapreduce function when in non-verbose mode. After some additional digging and some trial and error, this seems to work fine:
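library(rmr2)
# Write a small test vector to HDFS
myData <- to.dfs(1:1000)
myOut2 <- mapreduce(
  input = myData,
  map = function(k, v) {return(keyval(key = v, val = v^2))},
  verbose = FALSE)
# In non-verbose mode the job ID is attached to the returned object as an attribute
jobId <- attr(x = myOut2, which = "job.id")
# Ask Hadoop for the full status message of that job, counters included
system(paste(rmr2:::hadoop.cmd(), "job -status", jobId))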
It may require some further string manipulation, but at the very least the counters (and other outputs) become available in a form we can use programmatically.
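As a rough idea of what that string manipulation could look like, here is a minimal sketch. It assumes the status message prints each counter on its own indented line in the form "Counter name=123"; I haven't verified that layout across Hadoop versions. The helper name parse_counters and the sample text are made up for illustration and are not part of rmr2.

```r
# Hypothetical helper (not part of rmr2): pull "name=value" counter lines
# out of the text of a job status message.
# Assumption: each counter is printed as an indented "Counter name=123" line.
parse_counters <- function(status_lines) {
  pattern <- "^\\s*(.+)=(-?[0-9]+)\\s*$"
  # Keep only the lines that look like counters
  hits <- grep(pattern, status_lines, value = TRUE)
  # The number after "=" becomes the value, the text before it the name
  values <- as.numeric(sub(pattern, "\\2", hits))
  names(values) <- trimws(sub(pattern, "\\1", hits))
  values
}

# Made-up sample text for illustration only; real output will differ.
sample_status <- c(
  "Job: job_201301010000_0001",
  "Counters: 3",
  "\tMap-Reduce Framework",
  "\t\tMap input records=1000",
  "\t\tMap output records=1000",
  "\tFile System Counters",
  "\t\tFILE: Number of bytes read=0"
)
counters <- parse_counters(sample_status)
print(counters)
```

With a real job, the lines could presumably be captured by adding intern = TRUE to the system() call, which returns the command's output as a character vector instead of just printing it.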
I'm not sure how feasible it would be to make the job ID available as an attribute of the output in verbose mode, but I think it would definitely be useful.
Hope this helps,
-Saar