Hi Zubin,
First we need to solve compatibility issues that involve streaming. You can follow the evolution of one such issue here (btw, it's a Cloudera employee working on it). It's hard to know whether additional issues will arise, as this one alone blocks the test suite very early. One thing we could do is build Hadoop from trunk and test rmr2 against it; the fix should already be in there.
Once bugs are out of the way, YARN is pretty much transparent for rmr2. One aspect where it could help, which you seem to hint at, is the following, quoted from the Cloudera intro to YARN:
"Because YARN expects to schedule jobs with heterogeneous task resource requests, it instead allows containers to request variable amounts of memory and schedules based on those."
Pure Java jobs and rmr2 jobs (a combination of streaming and R) can differ considerably in how they use resources. For Java-only jobs one may try to allocate all available memory to Java, within the limits of the hardware and the desired level of parallelism; in rmr2 one may want to keep the Java processes lighter and leave more room for R. I don't know the details of how this would work (a dedicated rmr2 queue? some additional functionality in rmr2 to allocate resources on a per-job basis?), but it looks to me like there is potential, and I am looking forward to working with teams such as yours that have developed considerable experience with production deployments and large-scale rmr2-based computations.
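To make the Java-versus-R split a bit more concrete, here is a rough sketch of the arithmetic, written in Python only for brevity. It assumes the Hadoop 2 property names mapreduce.map.memory.mb (container size) and mapreduce.map.java.opts (JVM options), passed as generic -D options to a streaming job. The helper name split_container_memory and the numbers are purely illustrative, and the "room for R" figure is only approximate, since the JVM also uses some memory outside its heap.

```python
def split_container_memory(container_mb, jvm_heap_mb):
    """Cap the JVM heap inside a container and report what is left for R.

    Hypothetical helper: it only builds property strings, it does not
    talk to Hadoop or rmr2.
    """
    if jvm_heap_mb <= 0 or jvm_heap_mb >= container_mb:
        raise ValueError("JVM heap must be positive and smaller than the container")
    # Whatever the JVM heap does not claim is, roughly, available to the R process.
    r_room_mb = container_mb - jvm_heap_mb
    settings = {
        "mapreduce.map.memory.mb": str(container_mb),        # size requested from YARN
        "mapreduce.map.java.opts": "-Xmx%dm" % jvm_heap_mb,  # keep the Java side light
    }
    return settings, r_room_mb


# Illustrative numbers only: a 4 GB container with a 512 MB JVM heap.
settings, r_room_mb = split_container_memory(container_mb=4096, jvm_heap_mb=512)
for key, value in sorted(settings.items()):
    print("-D %s=%s" % (key, value))
print("memory left for R (approx.): %d MB" % r_room_mb)
```

The same split could in principle be chosen per job or fixed once for a dedicated queue; which of the two is practical is exactly the open question above.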
Of course YARN allows completely new types of applications, but it's hard for me to speculate on what the implications could be for R users. Those would be beyond the scope of rmr2.
Antonio