Is there any good way to control the number of mappers for the sub-tasks in one scalding job?


Tianshan Cui

Aug 28, 2018, 2:40:26 PM
to Scalding Development
Hi scalding users,

I am curious whether there is a good way to control the number of mappers for an individual MR stage. It seems we can easily control the number of reducers via withReducers, but I haven't found a similar way to do this for mappers. I know we could set up the job config and tune the split size, but that would affect the whole flow.

Any ideas? Maybe I missed something? :)

Thanks,
Tianshan

P. Oscar Boykin

Aug 28, 2018, 2:43:56 PM
to tiansh...@gmail.com, Scalding Development
That's right, it is much easier to control reducers compared to mappers.

Mappers are controlled by Hadoop, and there is no similarly simple knob for them. You can set some config parameters to give Hadoop an idea of how many map tasks to launch.
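As a rough illustration of how such config hints turn into a mapper count, here is a minimal sketch. It assumes the input is read through Hadoop's classic `org.apache.hadoop.mapred.FileInputFormat`, which sizes splits as max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of map tasks. The `SplitEstimate` object and its method names are hypothetical, it treats the input as a single splittable file, and it ignores Hadoop's slop factor for the last split; other input formats compute splits differently.

```scala
// Hypothetical helper, not part of Scalding or Hadoop.
object SplitEstimate {

  /** Split size in bytes: max(minSize, min(goalSize, blockSize)). */
  def splitSize(totalBytes: Long, requestedMaps: Int,
                minSplitBytes: Long, blockBytes: Long): Long = {
    // goalSize: what each split would be if the map-task hint were honored exactly
    val goalSize = totalBytes / math.max(requestedMaps, 1)
    // Clamp the minimum to 1 byte so the result is never zero
    math.max(math.max(minSplitBytes, 1L), math.min(goalSize, blockBytes))
  }

  /** Approximate number of map tasks for one splittable file. */
  def estimatedMaps(totalBytes: Long, requestedMaps: Int,
                    minSplitBytes: Long, blockBytes: Long): Long = {
    val size = splitSize(totalBytes, requestedMaps, minSplitBytes, blockBytes)
    (totalBytes + size - 1) / size // ceiling division
  }
}
```

Under these assumptions, for 10 GiB of input with 128 MiB blocks, a hint of 160 map tasks gives 64 MiB splits and about 160 mappers, while raising the minimum split size to 256 MiB gives about 40. The hint can only shrink splits below the block size; the minimum split size is what makes them larger.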

This Stack Overflow post may be helpful:


Tianshan Cui

Aug 28, 2018, 4:03:19 PM
to P. Oscar Boykin, Scalding Development
Thanks for the quick response. That totally makes sense. I guess the workaround in my case is to split the data manually in my job class by setting the Hadoop properties. :)
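For reference, setting Hadoop properties from a job class might look roughly like the sketch below. The job name, the word-count pipeline, and all numeric values are hypothetical. The property names are standard Hadoop 2 keys, but whether they take effect depends on the input format behind the source, so treat this as an assumption to verify on your cluster. It also shows the asymmetry discussed above: `withReducers` applies to one grouping step, while `config` applies to the whole flow.

```scala
import com.twitter.scalding._

// Hypothetical job for illustration only.
class TunedSplitsJob(args: Args) extends Job(args) {

  // Job.config is handed to every MR stage in the flow,
  // so these settings are not per-stage.
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map(
      // Hint for the number of map tasks; Hadoop may not honor it exactly.
      "mapreduce.job.maps" -> "200",
      // Lower bound on split size in bytes (256 MiB here, an arbitrary value);
      // larger splits mean fewer mappers.
      "mapreduce.input.fileinputformat.split.minsize" -> (256L * 1024 * 1024).toString
    )

  TypedPipe.from(TextLine(args("input")))
    .flatMap(_.split("\\s+"))
    .groupBy(identity)
    .withReducers(10) // per-stage reducer control, for contrast
    .size
    .write(TypedTsv[(String, Long)](args("output")))
}
```

If different stages need different mapper counts, one option under this approach is to run them as separate jobs, each with its own `config` override.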