Hi all,
I've been stuck on this problem for a bit, and I thought I had it solved, but no luck. When I run my job on EMR, I can only get it to launch one map task (two at most). The job is certainly big enough to need many more than that: the input is a single 200 MB gzip file, and each line requires a fairly CPU-intensive computation.
After some digging in the docs (both mrjob and Hadoop), I added the following to my mrjob command:
--bootstrap-action="s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapred.tasktracker.map.tasks.maximum=10 -m mapred.tasktracker.reduce.tasks.maximum=10"
That bootstrap action seemed to let me start two map tasks on two m1.small instances, but when I tested with a few c1.xlarge instances, only one map task started.
Any insight into how to get it to launch a reasonable number of map tasks?
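One thing I have not ruled out yet: as far as I understand, Hadoop cannot split a gzip-compressed file, so a single .gz input may go to a single map task no matter how many map slots are configured. Below is a minimal sketch of how I could split the input into several smaller gzip files to test that. The function name, output directory, part file naming, and lines-per-part value are all placeholders of mine, not anything from mrjob or EMR.

```python
import gzip
import itertools
import os


def split_gzip_by_lines(src_path, out_dir, lines_per_part):
    """Split one gzip file into several smaller gzip files.

    Lines are kept whole, so each part can be processed independently.
    Returns the list of part file paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    part_paths = []
    with gzip.open(src_path, "rb") as src:
        for index in itertools.count():
            # Read up to lines_per_part lines from the compressed input.
            chunk = list(itertools.islice(src, lines_per_part))
            if not chunk:
                break
            # Hypothetical naming scheme: part-00000.gz, part-00001.gz, ...
            part_path = os.path.join(out_dir, "part-%05d.gz" % index)
            with gzip.open(part_path, "wb") as part:
                part.writelines(chunk)
            part_paths.append(part_path)
    return part_paths
```

If that assumption is right, uploading the resulting directory and pointing the job at it instead of the single file should give Hadoop one input split per part, e.g. split_gzip_by_lines("input.gz", "input_parts", 100000), where both paths and the line count are made up for illustration.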
Thanks for the help,
Joan