mapreduce.map.memory.mb and other settings


Huahang Liu

May 20, 2013, 6:39:18 AM
to scoob...@googlegroups.com
Good day to you all. 

Scoobi looks very interesting to me, and I am trying to figure out whether it could solve most of my problems better than my current setup.

I am new here so please forgive me if this is a stupid question. 

In my current MapReduce jobs, in order to make good use of cluster resources, we set the following properties differently for each job:

mapreduce.map.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.memory.mb
mapreduce.reduce.java.opts
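
For reference, this is roughly how we do it today in a plain MapReduce driver (a simplified sketch; the values, heap sizes, and job name are just examples):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    val conf = new Configuration()
    // Per-job container sizes, with JVM heaps sized to fit inside them.
    conf.set("mapreduce.map.memory.mb", "2048")
    conf.set("mapreduce.map.java.opts", "-Xmx1638m")
    conf.set("mapreduce.reduce.memory.mb", "4096")
    conf.set("mapreduce.reduce.java.opts", "-Xmx3276m")
    val job = Job.getInstance(conf, "example-job")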

I would like to know whether it is possible to set these memory settings in Scoobi.

And also, in our cluster we have several queues allocated for different types of jobs, and we use a setting named "mapreduce.job.queuename" to submit jobs to different queues. Is this possible in Scoobi?

And finally, in order to optimise each job, we set a different "mapreduce.job.reduce.slowstart.completedmaps" for each of them. Is this possible in Scoobi? Or maybe I don't have to set this manually and Scoobi will figure out an optimal value for me?

Thanks in advance!

-
huahang

Eric Springer

May 20, 2013, 10:33:46 AM
to scoob...@googlegroups.com
Hi Huahang,

I think one of the guiding philosophies of Scoobi is human productivity over performance. Ideally you'd have both, but once you start raising the level of abstraction that becomes very hard. I do think Scoobi does an excellent job of keeping the overhead within a constant factor, which can be made up with a few more machines. I only say this because if you're at the point of tuning things like when the reducers start, you might be coming to it with the wrong expectations. The ideal job for Scoobi is a complicated chain of jobs where you're not too concerned about every ounce of performance, as opposed to a 1:1 wrapper around MapReduce.

That said, I think you should be able to configure all of those settings fine: there is a ScoobiConfiguration which exposes a Hadoop Configuration that you can set directly. Keep in mind, though, that a typical Scoobi application executes multiple MapReduce jobs, so it will be problematic if you want different settings for each job.
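
Something like this should do it (an untested sketch; I'm assuming the underlying Hadoop Configuration is reachable from the ScoobiConfiguration as shown, so check the API of the version you're on):

    import com.nicta.scoobi.Scoobi._

    object MyPipeline extends ScoobiApp {
      def run() {
        // ScoobiApp provides a ScoobiConfiguration named `configuration`;
        // the .configuration below is an assumed accessor for the wrapped
        // Hadoop Configuration.
        val conf = configuration.configuration
        conf.set("mapreduce.reduce.memory.mb", "4096")
        conf.set("mapreduce.job.queuename", "etl")
        conf.set("mapreduce.job.reduce.slowstart.completedmaps", "0.8")

        // Note: these apply to every MapReduce job Scoobi generates from
        // this pipeline, not to individual stages.
        val lines = fromTextFile("hdfs://in/example")
        persist(toTextFile(lines, "hdfs://out/example"))
      }
    }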




Huahang Liu

May 21, 2013, 10:59:45 PM
to scoob...@googlegroups.com
Hi Eric, 

I do appreciate the philosophy, human productivity over performance. 

The problem with a shared configuration, for me, is that in my pipeline some stages take a lot of memory, so I have to set a pretty large "mapreduce.reduce.memory.mb" (say, 4G) for them. But for a stage that needs just 256MB to run, YARN will still reserve 4G for it and won't let other jobs in until enough memory is unreserved.

I think ScoobiConfiguration will definitely be helpful for me, since I have to separate out the stages that take a large chunk of memory and apply different memory settings to them anyway, roughly along the lines of the sketch below.
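
Something like this is what I have in mind: split the pipeline into separate Scoobi applications chained through HDFS, each asking YARN for containers sized to its stage. Just a sketch with placeholder paths and transformations, and the same assumed accessor for the underlying Hadoop Configuration:

    import com.nicta.scoobi.Scoobi._

    // Memory-hungry stage: runs with big reduce containers.
    object HeavyStage extends ScoobiApp {
      def run() {
        configuration.configuration.set("mapreduce.reduce.memory.mb", "4096")
        val raw = fromTextFile("hdfs://data/raw")
        persist(toTextFile(raw.filter(_.nonEmpty), "hdfs://data/intermediate"))
      }
    }

    // Lightweight stage: asks YARN for much smaller containers.
    object LightStage extends ScoobiApp {
      def run() {
        configuration.configuration.set("mapreduce.reduce.memory.mb", "512")
        val mid = fromTextFile("hdfs://data/intermediate")
        persist(toTextFile(mid.map(_.toUpperCase), "hdfs://data/final"))
      }
    }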

Thank you, Eric! 

Huahang