YARN and RHADOOP


Zubin Dowlaty

Jun 30, 2013, 9:54:31 PM

to rha...@googlegroups.com

I was at Hadoop Summit last week, and the Hortonworks guys were saying that RHadoop will support YARN. I'm curious what specific functionality this will enable for us. For example, when running rmr jobs, will YARN allow R to be plugged into a Hadoop cluster and managed more effectively? Curious minds want to know.

Antonio Piccolboni

Jul 1, 2013, 1:36:37 PM

to RHadoop Google Group

Hi Zubin,
First we need to solve compatibility issues that involve streaming. You can follow the evolution of one such issue here (btw, it's a Cloudera employee working on it). It's hard to know whether additional issues will arise, as this one alone blocks the test suite very early. One thing we could do is build Hadoop from trunk and test rmr2 against it. The fix should already be in there.

Once bugs are out of the way, YARN is pretty much transparent for rmr2. One aspect where it could help, which you seem to hint at, is the following, quoted from the Cloudera intro to YARN:

"Because YARN expects to schedule jobs with heterogeneous task resource requests, it instead allows containers to request variable amounts of memory and schedules based on those."

Pure Java jobs and rmr2 jobs (a combination of streaming and R) can be very different as far as resource handling is concerned. Whereas for Java-only jobs one may try to allocate all available memory to Java, within the limits of the hardware resources and the desired level of parallelism, in rmr2 one may want to keep the Java processes lighter and give more room to R. While I don't know the details of how this would work (a dedicated rmr2 queue? Some additional functionality in rmr2 to allocate resources on a per-job basis?), it looks to me like there is potential, and I am looking forward to working with teams such as yours that have developed considerable experience with production deployments and large-scale rmr2-based computations.
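
To make the memory trade-off above concrete, here is a minimal sketch of the arithmetic, not something rmr2 does today. It assumes a per-task YARN container whose memory is shared by the streaming JVM and the R process it launches. The helper names and the 25% Java share are hypothetical, chosen only for illustration; `mapreduce.map.memory.mb` and `mapreduce.map.java.opts` are the Hadoop 2 MapReduce properties for the map container size and the map JVM options.

```python
def split_container_memory(container_mb, java_heap_fraction=0.25):
    """Split one container's memory between the streaming JVM and R.

    Hypothetical helper: java_heap_fraction is an illustrative knob,
    not an rmr2 or Hadoop setting.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0.0 < java_heap_fraction < 1.0:
        raise ValueError("java_heap_fraction must be strictly between 0 and 1")
    # Keep the JVM heap small; whatever is left is headroom for the R process.
    java_heap_mb = int(container_mb * java_heap_fraction)
    r_headroom_mb = container_mb - java_heap_mb
    return java_heap_mb, r_headroom_mb


def streaming_d_options(container_mb, java_heap_fraction=0.25):
    """Build key=value strings for the map-side memory properties."""
    java_heap_mb, _ = split_container_memory(container_mb, java_heap_fraction)
    return [
        # Total memory requested from YARN for each map container.
        f"mapreduce.map.memory.mb={container_mb}",
        # Heap cap for the Java side only; R uses the remainder.
        f"mapreduce.map.java.opts=-Xmx{java_heap_mb}m",
    ]


if __name__ == "__main__":
    for option in streaming_d_options(4096):
        print(option)
```

Each string is in the key=value form that Hadoop's generic -D option accepts. A Java-only job would push the fraction toward 1, while an rmr2 job would keep it low so that most of the container is left to R. How such settings would be exposed per job or per queue in rmr2 is the open question raised above.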

Of course, YARN allows completely new types of applications, but it's hard for me to speculate on what the implications could be for R users. Those would be beyond the scope of rmr2.

Antonio
