You should only ever have one dumbo.run() call, and it's normal for
dumbo scripts to get executed twice (there are reasons for this, but
you shouldn't have to worry about them as a dumbo user). If you want
multiple map/reduce iterations, use a runner instead:

import sys
import dumbo

def runner(job):
    # one additer() call per map/reduce iteration
    job.additer(TestMapper(1), opts=[("output", "test1")])
    job.additer(TestMapper(2), opts=[("output", "test2")])

class TestMapper:
    def __init__(self, _val):
        sys.stderr.write("Executing __init__" + str(_val) + "\n")
        self.val = _val

    def __call__(self, key, val):
        yield key, self.val  # emits the init value, not the MR input value

if __name__ == "__main__":
    dumbo.main(runner)

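If you want to check what a mapper object like this emits before submitting anything, a rough local sketch is below. It does not use dumbo at all: ConstantMapper mirrors the TestMapper above under a different name, and run_mapper_locally is a made-up helper for illustration, not a dumbo function.

```python
import sys


class ConstantMapper:
    """Same shape as TestMapper above: configured once, then called per record."""

    def __init__(self, val):
        sys.stderr.write("Executing __init__ " + str(val) + "\n")
        self.val = val

    def __call__(self, key, value):
        # Emit the constructor argument and ignore the input value.
        yield key, self.val


def run_mapper_locally(mapper, records):
    """Hypothetical helper (not part of dumbo): feed (key, value) pairs to a mapper."""
    out = []
    for key, value in records:
        out.extend(mapper(key, value))
    return out


records = [("a", 10), ("b", 20)]
first = run_mapper_locally(ConstantMapper(1), records)
second = run_mapper_locally(ConstantMapper(2), records)
print(first)
print(second)
```

Each mapper instance keeps its own constructor value, which is why the two iterations in the runner can share one class and still behave differently.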
Hope this helps,
-Klaas
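
def variator(prog):
    # some_predicate, nr_to_input and nr_to_output are placeholders
    # for your own stopping condition and path helpers
    i = 0
    while some_predicate:
        clone = prog.clone()
        clone.addopt("param", "iteration=" + str(i))
        clone.addopt("input", nr_to_input(i))
        clone.addopt("output", nr_to_output(i))
        yield clone
        i += 1

def starter(prog):
    pass  # or set general opts or so

def runner(job):
    job.additer(Mapper)  # Mapper stands for your own mapper
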
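To get a feel for what a variator like this yields, here is a small sketch that runs without dumbo or Hadoop. FakeProgram is a hypothetical stand-in that only models the clone() and addopt() calls used above; the rounds argument and the data/roundN paths are also assumptions, replacing the unspecified predicate and path helpers.

```python
class FakeProgram:
    """Hypothetical stand-in for the object handed to a variator.

    Only clone() and addopt() are modelled, since those are the
    only calls the variator above makes.
    """

    def __init__(self, opts=None):
        self.opts = list(opts or [])

    def clone(self):
        # Copy the option list so each clone can be changed independently.
        return FakeProgram(self.opts)

    def addopt(self, key, value):
        self.opts.append((key, value))


def variator(prog, rounds=3):
    # 'rounds' replaces the unspecified stopping predicate (assumption).
    for i in range(rounds):
        clone = prog.clone()
        clone.addopt("param", "iteration=" + str(i))
        clone.addopt("input", "data/round%d" % i)         # assumed path scheme
        clone.addopt("output", "data/round%d" % (i + 1))  # assumed path scheme
        yield clone


base = FakeProgram([("name", "demo")])
variants = list(variator(base))
for v in variants:
    print(v.opts)
```

The point of cloning is that every yielded program carries its own input, output and param options while the original stays untouched, so each one can be started as a separate run.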
Please blog about it if it works, since this is completely
undocumented functionality as far as I know :)
-K
To tie these together, pass all three functions to dumbo.main:

dumbo.main(runner, starter, variator)
-K