mrjob v0.7.2 is out!
If you are using Spark, mrjob now emulates archives (a YARN-only feature of Spark) on all non-YARN masters except local. This means you can use `mrjob spark-submit` to port a Spark job from YARN to Mesos or another non-YARN platform. It also makes Spark mrjobs more flexible.
This release fixes a long-standing security issue where we would sometimes copy your EC2 key pair file to the master node to be able to reach logs on other nodes. mrjob now uses `ssh-add` and the SSH agent instead.
Since Python 2 has reached end-of-life, the default `python_bin` when you’re using Python 2 is now `python2.7`, not `python`.
The `extra_cluster_params` option will now recursively merge dict params, so you can do things like:

    runners:
      emr:
        extra_cluster_params:
          Instances:
            EmrManagedMasterSecurityGroup: sg-foo

without clobbering the Instances param and wrecking your API query.
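
To make "recursively merge" concrete, here is a minimal Python sketch of the idea. It is not mrjob's actual implementation; `merge_params` and the sample `api_params` dict are hypothetical names and values made up for illustration. Only the `Instances` / `EmrManagedMasterSecurityGroup` entry comes from the config above.

```python
import copy


def merge_params(base, override):
    """Return a new dict with override merged into base.

    Hypothetical helper, not part of mrjob. Nested dicts are merged
    key by key; any other value in override replaces the one in base.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: recurse instead of replacing wholesale
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up parameters a runner might already have built for the API call
api_params = {
    "Name": "example-cluster",
    "Instances": {"InstanceCount": 3, "KeepJobFlowAliveWhenNoSteps": False},
}

# The extra_cluster_params from the config above
extra = {"Instances": {"EmrManagedMasterSecurityGroup": "sg-foo"}}

result = merge_params(api_params, extra)
print(result["Instances"])
```

With a shallow merge, `Instances` would end up containing only the security group. With the recursive merge, the existing `InstanceCount` and `KeepJobFlowAliveWhenNoSteps` entries survive alongside it.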
For more information, see:
https://mrjob.readthedocs.io/en/stable/whats-new.html#v0-7-2
-Dave
P.S. Contrary to my last email, I’m back working on mrjob for a few months on a contract basis, for a different company. If you or your company has mrjob features you’d like to hire me to work on, please let me know! :)