Dynamic input path?


Eric Seufert

Nov 2, 2012, 11:50:41 AM
to mr...@googlegroups.com
Hi,

I want to run mrjobs so that the input path isn't set from the command line but rather constructed based on some external criteria. I have tried passing the --input_paths argument like this:

mymrjob = mrjobclass(inputs)

but it doesn't work; I can access input paths in __init__ with self.options.input_paths, but once the class initializes, I am prompted with "reading from STDIN". How do I pass a file list to the class? 

Steve Johnson

Nov 2, 2012, 1:51:08 PM
to mr...@googlegroups.com
Sounds like you're doing the right thing, but are you calling super() from __init__()? Also, why are you overriding __init__() at all?

Eric Seufert

Nov 3, 2012, 9:10:07 AM
to mr...@googlegroups.com
Yes, I'm calling super() from within __init__(). I'm overriding __init__() to set some initial variables, but I don't need to do that there. When I removed the __init__() function I got the same output, though.

I'm creating the class like this:

    x = myMRJobClass(baseargs)
    with x.make_runner() as runner:
        print "Starting runner"
        runner.run()

and the baseargs variable is a list of strings, which prints like this:

['-r', 'inline', "--input_paths=['data/20121103*']", '--db_to_use=mydb', '--data_location=data/', '--db_host=someIP']

Is this formatted correctly? I'm assuming it is, since I can access options.input_paths.
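
One thing that may be worth trying: as far as the mrjob documentation describes it, input paths are taken from the positional arguments in the args list, not from an option, and the "reading from STDIN" message is what appears when no positional paths are given. Under that assumption, a custom --input_paths option would be readable through self.options but would never reach the runner as input. The sketch below builds an argument list with the matching files appended as positional arguments. The helper name build_job_args is made up for illustration, and MyMRJobClass in the usage comment stands in for the job class from this thread.

```python
import glob


def build_job_args(pattern, runner="inline", extra_options=()):
    """Return an mrjob-style argument list with input paths as positionals.

    `pattern` is a glob pattern chosen by whatever external criteria apply.
    `extra_options` holds any custom switches the job defines itself.
    """
    # Expand the pattern ourselves so the job receives concrete file names.
    paths = sorted(glob.glob(pattern))
    if not paths:
        # Without positional paths the job would fall back to reading STDIN.
        raise ValueError("no input files match %r" % pattern)
    # Options first, then the input paths as plain positional arguments.
    return ["-r", runner] + list(extra_options) + paths


# Hypothetical usage, assuming MyMRJobClass is an MRJob subclass:
#
#     args = build_job_args("data/20121103*",
#                           extra_options=["--db_to_use=mydb"])
#     job = MyMRJobClass(args=args)
#     with job.make_runner() as runner:
#         runner.run()
```

Raising an error on an empty match is a design choice here: it surfaces a bad pattern right away instead of leaving the job waiting on STDIN.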
