First of all, congratulations on the 0.5.0 release!
I am now trying to use the new release with emr-4.3.0 or emr-4.4.0. AWS changed a lot in the EMR 4.x series, and things that could previously be done with bootstrap actions no longer work. In particular, I am trying to set up Ganglia right now, but I guess this is a broader question of how to specify preinstalled applications and their configurations.
The only way I have found to do this is through the RunJobFlow API call parameters, which can be specified via the emr_api_params config option (or the --emr-api-params command-line option), e.g. like this:
runners:
  emr:
    emr_api_params:
      Applications.member.1: 'Hadoop'
      Applications.member.2: 'Ganglia'
So far I've tried every config format I could think of, and none of them seems to work (I get a "MalformedInput" error with various messages).
Looking at the mrjob 0.5.0 code, I can see that it calls run_jobflow with boto arguments that represent complex objects, e.g. boto.emr.bootstrap_action.BootstrapAction. For the application params to work, it should probably also create boto.emr.emrobject.Application objects somewhere in _cluster_args(), but I don't see the Application class used anywhere in mrjob.
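For comparison, here is a minimal sketch of how a list of applications could be flattened into flat Query API parameters by hand. It assumes (not confirmed here) that RunJobFlow encodes each Application structure as `Applications.member.N.<Field>`, with N starting at 1 and `Name` as the field holding the application name. `flatten_applications` is a hypothetical helper, not part of mrjob or boto.

```python
def flatten_applications(apps):
    """Flatten a list of application dicts into AWS Query-style params.

    Assumption: RunJobFlow expects each Application as
    Applications.member.N.<Field>, with N starting at 1.
    Only scalar fields (e.g. Name, Version) are handled here.
    """
    params = {}
    for i, app in enumerate(apps, start=1):
        prefix = 'Applications.member.%d' % i
        for field, value in sorted(app.items()):
            # e.g. {'Name': 'Hadoop'} -> 'Applications.member.1.Name': 'Hadoop'
            params['%s.%s' % (prefix, field)] = value
    return params


if __name__ == '__main__':
    params = flatten_applications([{'Name': 'Hadoop'}, {'Name': 'Ganglia'}])
    for key in sorted(params):
        print('%s: %r' % (key, params[key]))
```

If that assumption holds, the resulting keys (such as `Applications.member.1.Name`) are what would go under emr_api_params, rather than bare `Applications.member.1`.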
Am I missing something? Maybe there is another way of installing EMR 4.x apps with MRJob?
Thanks for the good work, and congrats again on the release!