By the way, normally you'd pass some command-line arguments to your
job's constructor to feed in input files and configuration. In this
particular example, the job will ALWAYS read from stdin, and will
always run in local mode.

You can also initialize your job with args=sys.argv[1:] to just read
the standard options from the command line.

I apologize for not having better examples of running an MRJob from a
separate script. At Yelp we have a (crufty, old) framework for running
batch jobs that has its own command-line option parsing. A typical
batch job flow is something like:
- pick which log files to read based on a date range specified on
  the command line
- run an MRJob on these log files, passing through relevant options
  from the command line
- write the output of the MRJob to a database
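
To make that flow a bit more concrete, here is a rough sketch of what
such a wrapper script could look like. This is not our actual
framework. The --start/--end options, the logs/<date>.log naming
scheme, the results table, and the sqlite database are all made up for
illustration. The runner calls (make_runner(), cat_output(),
parse_output()) assume a recent mrjob (0.6 or later); older releases
used stream_output() and parse_output_line() instead.

```python
import argparse
import json
import sqlite3
from datetime import date, timedelta


def log_paths_for_range(start, end, pattern="logs/{day}.log"):
    """Return one log path per day from start to end, inclusive.

    The naming pattern is hypothetical; use whatever your logs use.
    """
    days = (end - start).days + 1
    return [
        pattern.format(day=(start + timedelta(days=i)).isoformat())
        for i in range(days)
    ]


def run_job(job_class, input_paths, passthrough_args=()):
    """Run an MRJob subclass on input_paths and return (key, value) pairs.

    Assumes the mrjob >= 0.6 runner API.
    """
    job = job_class(args=list(passthrough_args) + list(input_paths))
    with job.make_runner() as runner:
        runner.run()
        return list(job.parse_output(runner.cat_output()))


def save_results(conn, pairs):
    """Store each (key, value) pair as JSON text in a made-up table."""
    conn.execute("CREATE TABLE IF NOT EXISTS results (key TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO results VALUES (?, ?)",
        [(json.dumps(k), json.dumps(v)) for k, v in pairs],
    )
    conn.commit()


def main(job_class, argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--start", type=date.fromisoformat, required=True)
    parser.add_argument("--end", type=date.fromisoformat, required=True)
    # Anything we don't recognize (e.g. -r local) is passed to the job.
    opts, passthrough = parser.parse_known_args(argv)

    paths = log_paths_for_range(opts.start, opts.end)
    pairs = run_job(job_class, paths, passthrough)

    conn = sqlite3.connect("results.db")
    try:
        save_results(conn, pairs)
    finally:
        conn.close()


# Usage, where MRYourJob is a placeholder for your own job class,
# imported from its own module:
#   main(MRYourJob, sys.argv[1:])
```

Taking the job class as a parameter keeps the wrapper independent of
any particular job, and parse_known_args() is one simple way to pass
options through without the wrapper having to know about them.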

Hope this helps!

-Dave