mrjob v0.7.2 released!


Dave Marin

Apr 13, 2020, 7:04:09 PM
to mr...@googlegroups.com
mrjob v0.7.2 is out!

If you are using Spark, mrjob now emulates archives (a YARN-only feature of Spark) on all non-YARN masters except for local. This means you can use `mrjob spark-submit` to port a Spark job from YARN to Mesos or other non-YARN platforms, and it also makes Spark mrjobs more flexible.

This release fixes a long-standing security issue where we would sometimes copy your EC2 key pair file to the master node to be able to reach logs on other nodes. mrjob now uses ssh-add and the SSH agent.

Since Python 2 has reached end-of-life, the default python_bin when you’re using Python 2 is now `python2.7`, not `python`.

The extra_cluster_params option will now recursively merge dict params, so you can do things like:

runners:
  emr:
    extra_cluster_params:
      Instances:
        EmrManagedMasterSecurityGroup: sg-foo

without clobbering the Instances param and wrecking your API query.
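
To illustrate the difference, here is a minimal Python sketch of a recursive dict merge compared with a shallow one. This is not mrjob's actual code: the `merge_params` helper and the default values in `defaults` are hypothetical, chosen only to show why merging nested dicts keeps the rest of `Instances` intact.

```python
import copy


def merge_params(defaults, extra):
    """Return a new dict with `extra` merged into `defaults`.

    Nested dicts are merged key by key; any other value in `extra`
    replaces the corresponding value in `defaults`.
    """
    merged = copy.deepcopy(defaults)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # both sides are dicts: recurse instead of overwriting
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical API params a runner might build (illustrative values only)
defaults = {
    'Name': 'example-cluster',
    'Instances': {
        'InstanceCount': 3,
        'KeepJobFlowAliveWhenNoSteps': False,
    },
}

# What the config snippet above would contribute
extra = {'Instances': {'EmrManagedMasterSecurityGroup': 'sg-foo'}}

# Recursive merge: the existing Instances keys survive
merged = merge_params(defaults, extra)

# Shallow merge: the whole Instances dict is replaced
shallow = {**defaults, **extra}

print(sorted(merged['Instances']))
print(sorted(shallow['Instances']))
```

With the shallow merge, `Instances` ends up containing only the security group, which is the "clobbering" described above; the recursive version adds the new key alongside the existing ones.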

For more information, see: https://mrjob.readthedocs.io/en/stable/whats-new.html#v0-7-2

-Dave

P.S. Contrary to my last email, I’m back working on mrjob for a few months on a contract basis, for a different company. If you or your company has mrjob features you’d like to hire me to work on, please let me know! :)
