Integration with EMR


Igor Gatis

Oct 29, 2013, 8:14:02 PM
to dumbo...@googlegroups.com
mrjob has great integration with EMR. I read this article, which dates from Dec '09 and says the integration with dumbo is manual. How much has changed since then?

brisssou

Oct 30, 2013, 3:46:23 AM
to dumbo...@googlegroups.com
Hey,

We do use dumbo on EMR, with the help of a small bootstrap script, which mainly installs dumbo and a bunch of Python dependencies we use.

Brice.
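
For readers wondering what such a bootstrap script could look like, here is a minimal sketch in Python. It is not the script mentioned above: the package list, the function names, and the `--install` switch are all hypothetical, and it assumes `pip` is available on the cluster nodes. By default it only prints the command it would run.

```python
#!/usr/bin/env python
"""Hypothetical EMR bootstrap sketch: install dumbo plus extra Python packages."""
import subprocess
import sys

# Assumed package list; the thread does not say which dependencies are installed.
PACKAGES = ["dumbo", "typedbytes"]


def build_install_command(packages, python=sys.executable):
    """Return the pip command (as an argument list) that installs the packages."""
    return [python, "-m", "pip", "install", "--quiet"] + list(packages)


def bootstrap(packages, dry_run=True):
    """Install the packages, or only return the command when dry_run is set."""
    command = build_install_command(packages)
    if not dry_run:
        # check_call raises CalledProcessError if pip exits non-zero,
        # so a failed install makes the bootstrap step fail loudly.
        subprocess.check_call(command)
    return command


if __name__ == "__main__":
    # Dry run unless the (hypothetical) --install switch is passed.
    do_install = "--install" in sys.argv
    print(" ".join(bootstrap(PACKAGES, dry_run=not do_install)))
```

Keeping the command construction separate from its execution makes it possible to check what would be installed without touching the node.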

Igor Gatis

Oct 30, 2013, 5:31:28 AM
to dumbo...@googlegroups.com
mrjob integrates with EMR very smoothly. It takes care of creating and deleting the cluster, setting it up, copying input to S3 if needed, etc. All one has to do is pass the --runner=emr parameter (ref, full doc here). I was wondering whether dumbo has something of that nature.



Klaas Bosteels

Oct 30, 2013, 5:39:50 AM
to dumbo...@googlegroups.com
Hey Igor,

I don't think anyone has worked on such deep integration so far, but we would definitely love to have it, and adding it shouldn't be too hard either. Maybe we could convince you to do some work on this? :)

-K

Igor Gatis

Oct 30, 2013, 5:54:15 AM
to dumbo...@googlegroups.com

Hi Klaas,

Yea maybe. I'm checking what already exists.

I wonder if it would be a matter of forking part of the mrjob code and writing some glue.
