controlling #mappers & #reducers in Scalding job?

Ittay Dror

Apr 23, 2014, 9:57:17 AM
to cascadi...@googlegroups.com
Hi,

How can I set a limit on the number of mappers & reducers in my Scalding job?

Regards,
Ittay

Jonathan Coveney

Apr 23, 2014, 2:52:39 PM
to cascadi...@googlegroups.com
The number of mappers is controlled by Hadoop and your input format. You can influence it by setting the min split size.
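
As a rough illustration of why the min split size affects the mapper count: Hadoop's FileInputFormat computes the split size roughly as max(minSize, min(maxSize, blockSize)) (the older mapred API uses a goal size in place of maxSize), and each splittable file yields about fileSize / splitSize map tasks. The sketch below is plain Scala, not Scalding API. The object and method names are made up for this example, and it ignores the small slop Hadoop allows on the last split. The min size itself is set through the mapred.min.split.size property (mapreduce.input.fileinputformat.split.minsize on Hadoop 2).

```scala
object SplitSizing {
  // Split size rule: never below minSize, otherwise capped by maxSize and the block size.
  def splitSize(minSize: Long, maxSize: Long, blockSize: Long): Long =
    math.max(minSize, math.min(maxSize, blockSize))

  // Approximate mapper count for one splittable file (ceiling division).
  def approxMappers(fileSize: Long, split: Long): Long =
    (fileSize + split - 1) / split
}

val mib = 1024L * 1024L
val gib = 1024L * mib

// Raising the min split size above the 128 MiB block size makes splits larger,
// so a 10 GiB file needs fewer mappers.
val defaultSplit = SplitSizing.splitSize(1L, Long.MaxValue, 128L * mib)
val raisedSplit  = SplitSizing.splitSize(256L * mib, Long.MaxValue, 128L * mib)
val mappersBefore = SplitSizing.approxMappers(10L * gib, defaultSplit)
val mappersAfter  = SplitSizing.approxMappers(10L * gib, raisedSplit)
```

Note that this only changes how many map tasks exist in total, not how many run at the same time.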

Reducers can be set on any intermediate step of a TypedPipe (you are using the typed API, right?!) that has WithReducers, such as a grouped pipe.

tpipe.group.withReducers(100) // pseudocode; tpipe is assumed to be a TypedPipe of (key, value) pairs


--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/f4447e4c-a7bb-487d-ac36-2fae5532a7b7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ittay Dror

Apr 23, 2014, 3:07:36 PM
to cascadi...@googlegroups.com

What do I do if I'm using the fields API?


Jonathan Coveney

Apr 23, 2014, 3:50:34 PM
to cascadi...@googlegroups.com

Ittay Dror

Apr 24, 2014, 1:57:44 AM
to cascadi...@googlegroups.com
About mappers (and reducers): I'm concerned with the number of concurrently running mappers (reducers). So I don't mind having 1,000, as long as just 10 are running at any given time. Can I control that?



Jonathan Coveney

Apr 24, 2014, 2:45:10 AM
to cascadi...@googlegroups.com
Can you give a better description of what you want? This sounds like a scheduling issue, not a Cascading or Scalding one.

On Wednesday, April 23, 2014, Ittay Dror <ittay...@gmail.com> wrote:

Ittay Dror

Apr 24, 2014, 3:07:57 AM
to cascadi...@googlegroups.com
I'm submitting a job into an environment that hosts other jobs, and I don't want it to take too many resources. So it should run only X processes at any given time.


Jonathan Coveney

Apr 24, 2014, 8:02:15 PM
to cascadi...@googlegroups.com
This sounds like something your scheduler needs to handle, not scalding or cascading.
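
If the cluster's scheduler defines queues or pools with capped capacity, the job's side of this is usually just a property naming the queue. A minimal sketch, assuming a queue called "limited" already exists on the cluster (the queue name and the helper object are hypothetical; the cap on concurrent tasks is configured in the scheduler, not here):

```scala
object QueueConfig {
  // Job-level Hadoop properties that route a job to a named scheduler queue.
  // Both spellings are included so the same map works on Hadoop 1 and Hadoop 2.
  def forQueue(queue: String): Map[String, String] = Map(
    "mapred.job.queue.name"   -> queue, // Hadoop 1 property name
    "mapreduce.job.queuename" -> queue  // Hadoop 2 property name
  )
}

// "limited" is a made-up queue name for illustration.
val queueProps = QueueConfig.forQueue("limited")
```

These properties could be passed as -D options when launching the job or merged into the job's configuration.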

