Multi-step jobs from multiple classes?


Michael Armida

Mar 5, 2015, 5:59:51 PM
to mr...@googlegroups.com
Is it possible for a job to span multiple classes? steps() returns a list, so can that list include a second entry that is an MRStep referencing a method of another class? For example:

class AJob(MRJob):
    ...
    def steps(self):
        return [MRStep(mapper=self.mapper), MRStep(reducer=OtherClass.reducer)]

dm

Mar 8, 2015, 4:06:01 PM
to mr...@googlegroups.com
Yes, but you may be asking the wrong question.

Mappers, reducers, etc. can be anything that matches the expected function signature. They don't have to be attached to a class at all; they can be plain old functions! In your example, OtherClass.reducer appears to be a class or static method (though nothing prevents you from instantiating another job class and taking a bound instance method from it: OtherClass().reducer).
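
As a rough sketch of what "matches the function signature" means: a mapper is a callable that takes (key, value) and yields pairs, and a reducer is a callable that takes (key, values) and yields pairs. The snippet below deliberately does not import mrjob. The names split_words, sum_counts, and run_locally are made up for illustration, and run_locally is only a toy stand-in for the framework's map/group/reduce cycle, not how mrjob actually runs a step.

```python
from collections import defaultdict


def split_words(_, line):
    """Mapper: a plain function taking (key, value) and yielding pairs."""
    for word in line.split():
        yield word.lower(), 1


def sum_counts(word, counts):
    """Reducer: a plain function taking (key, values) and yielding pairs."""
    yield word, sum(counts)


def run_locally(lines, mapper, reducer):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)
    results = []
    for key in sorted(grouped):
        results.extend(reducer(key, iter(grouped[key])))
    return results


# Inside a job class, functions like these would be passed the same way
# bound methods are, e.g.:
#     return [MRStep(mapper=split_words, reducer=sum_counts)]
print(run_locally(["a b a", "B c"], split_words, sum_counts))
```

Neither function has a self parameter, which is the point: a step only needs a callable of the right shape. An unbound OtherClass.reducer that still expects self would not fit that shape, whereas OtherClass().reducer would.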

Making your mappers/reducers instance methods of your job class buys you two things:
- the module containing the job class is uploaded automatically, so you don't need to worry about setup options, PYTHONPATH, etc.
- self.stderr can be redirected to a StringIO for unit testing (self.increment_counter() and self.set_status() write to it).

-Dave