As a manual hack to get bootstrap actions to work, I edited emr.py:745
to something like this:
args['bootstrap_actions'] = [botoemr.BootstrapAction(
    'configure_hadoop',
    's3://elasticmapreduce/bootstrap-actions/configure-hadoop',
    ["-s", "mapred.tasktracker.map.tasks.maximum=1",
     "-s", "mapred.map.tasks=5"])]
if self._master_bootstrap_script:
    args['bootstrap_actions'].extend([botoemr.BootstrapAction(
        'master', self._master_bootstrap_script['s3_uri'], [])])
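The third argument to BootstrapAction in the hack is a flat list in which each "-s" flag is followed by a "key=value" string. A small helper can build that list from a dict instead of writing it out by hand. The helper name is hypothetical; it only reproduces the "-s key=value" pattern used in the snippet, not any documented mrjob or boto API.

```python
def configure_hadoop_args(settings):
    """Flatten {key: value} into ["-s", "key=value", ...].

    Hypothetical helper: it mirrors the "-s key=value" argument pattern
    passed to the configure-hadoop bootstrap action in the hack.
    """
    args = []
    for key, value in settings.items():
        args.extend(["-s", "%s=%s" % (key, value)])
    return args


print(configure_hadoop_args({
    "mapred.tasktracker.map.tasks.maximum": 1,
    "mapred.map.tasks": 5,
}))
```

Because dicts keep insertion order in Python 3.7+, the flags come out in the order the settings were written.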
It appears, however, that setting mapred.map.tasks here has no
effect. Passing mapred.map.tasks through --jobconf also seems to be
ignored. Any idea why this parameter is ignored? Does mrjob do
something special to calculate the number of mappers?
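One likely explanation, assuming the job reads its input through Hadoop's old-API FileInputFormat (Hadoop Streaming uses TextInputFormat by default), is that mapred.map.tasks is only a hint: the real number of map tasks equals the number of input splits. The sketch below approximates that split logic: the split size is max(minSize, min(totalSize / requestedMaps, blockSize)), and each file is cut into splits until no more than 1.1 splits' worth remains. The function names are hypothetical, and the model ignores details such as non-splittable compressed files and empty files.

```python
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger


def split_size(goal_size, min_size, block_size):
    # max(minSize, min(goalSize, blockSize))
    return max(min_size, min(goal_size, block_size))


def estimate_map_tasks(file_sizes, requested_maps, block_size, min_size=1):
    """Approximate the map-task count for non-empty, splittable files.

    requested_maps plays the role of the mapred.map.tasks hint.
    """
    total = sum(file_sizes)
    goal = total // max(requested_maps, 1)
    size = split_size(goal, min_size, block_size)
    n_splits = 0
    for length in file_sizes:
        remaining = length
        # Emit full-size splits while more than SPLIT_SLOP splits remain.
        while remaining / size > SPLIT_SLOP:
            n_splits += 1
            remaining -= size
        if remaining:
            n_splits += 1  # whatever is left becomes the final split
    return n_splits


MiB = 1 << 20
# One 1 GiB file stored in 64 MiB blocks:
print(estimate_map_tasks([1024 * MiB], 5, 64 * MiB))   # 16
print(estimate_map_tasks([1024 * MiB], 50, 64 * MiB))  # 50
```

Under this model, asking for 5 maps over a 1 GiB file still yields 16 (one per block), while asking for 50 does raise the count: the hint can increase the number of mappers but cannot push it below roughly one per block unless mapred.min.split.size is raised.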
Jesse
On May 16, 12:20 pm, Dave Marin <d...@yelp.com> wrote:
> To set the number of maps/reduces to run, you can use --jobconf to set
> the appropriate Hadoop options. For example:
>
> mr_your_job.py --jobconf mapred.map.tasks=23 --jobconf mapred.reduce.tasks=42
>
> I believe that actually setting the number of mappers and reducers happens
> when Hadoop starts up. There's a way to do it with bootstrap actions on
> EMR, but we haven't built support for those yet (see
> https://github.com/Yelp/mrjob/issues/69).
>
> -Dave