mrjob v0.6.8 is out, big news for Spark users

Dave Marin

Apr 26, 2019, 2:40:45 PM

to mr...@googlegroups.com

mrjob v0.6.8 provides full support for Spark. You can now launch Spark code with any runner (except Google Cloud Dataproc; we're still working on that). mrjob not only integrates with existing features but also makes mrjob-specific features (e.g. setup scripts) work seamlessly inside Spark:

https://mrjob.readthedocs.io/en/stable/guides/spark.html

As if that weren’t enough, this release also adds a Spark runner that can run regular old MRJobs (originally designed to run on Hadoop Streaming) on any Spark cluster. So if your team is moving from Hadoop to Spark without Hadoop (e.g. on Mesos or Kubernetes), you can take your old MRJobs with you without rewriting a line of code. For more info, see:

https://mrjob.readthedocs.io/en/stable/guides/spark.html#running-classic-mrjobs-on-spark

As with all releases, there are also a number of bugfixes and small improvements; for the details, see:

https://mrjob.readthedocs.io/en/stable/whats-new.html#v0-6-8

-Dave
