Re: Puzzled by redundant output with multiple runs


Klaas Bosteels Sep 7, 2010 2:09 AM
Posted in group: dumbo-user
Hey Antonio,

You should only ever have one dumbo.run() call, and it's normal for a
dumbo script to get executed more than once: once when you start the
job and again in the map/reduce tasks it launches (you shouldn't have
to worry about the details as a dumbo user).
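To see why the value 2 never shows up, here is a hypothetical, simplified model in plain Python (it does not use dumbo at all). It assumes, as the EXEC lines in the quoted log suggest, that each dumbo.run() call in the launching process starts a map task (`python -m testinit map ...`) that re-executes the whole script, and that inside that task every run() call maps from the same stdin. `fake_run` and `script` are invented names for illustration only, not dumbo functions.

```python
# Hypothetical, simplified model of the "script runs more than once" behaviour.
# fake_run() is NOT dumbo's API; it only mimics what the EXEC lines suggest.

LOG = []  # stands in for stderr


class TestMapper:
    def __init__(self, val):
        LOG.append("Executing __init__" + str(val))
        self.val = val

    def __call__(self, key, value):
        yield key, self.val


def fake_run(mapper, output, proc):
    if proc["mode"] == "launch":
        # Launcher process (assumed): start a map task that runs the entire
        # script again, with the job input on its stdin and stdout -> `output`.
        task = {"mode": "map", "stdin": list(proc["input"]), "out": []}
        script(task)
        proc["files"][output] = task["out"]
    else:
        # Task process (assumed): every run() call maps from the same stdin,
        # so the first call consumes all of it and later calls see nothing.
        while proc["stdin"]:
            key, value = proc["stdin"].pop(0)
            proc["out"].extend(mapper(key, value))


def script(proc):
    # Mirrors the body of testinit.py, with its two run() calls.
    LOG.append("before run 1")
    fake_run(TestMapper(1), "test1", proc)
    LOG.append("in between runs")
    fake_run(TestMapper(2), "test2", proc)
    LOG.append("after run 2")


launcher = {"mode": "launch", "input": [(0, "mississippi")], "files": {}}
script(launcher)
print(launcher["files"])  # {'test1': [(0, 1)], 'test2': [(0, 1)]}
```

Under these assumptions the model reproduces the order of the stderr lines in Antonio's log, and in both tasks the first mapper, TestMapper(1), consumes all of the input, so both outputs contain only 1s. That is one way to see why a script should contain a single run() call.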

Instead, you should use a runner if you want to have multiple
map/reduce iterations:

import sys
import dumbo

def runner(job):
    job.additer(TestMapper(1), opts=[("output", "test1")])
    job.additer(TestMapper(2), opts=[("output", "test2")])

class TestMapper:
    def __init__(self, _val):
        sys.stderr.write("Executing __init__" + str(_val) + "\n")
        self.val = _val
    def __call__(self, key, val):
        yield key, self.val  # uses the init value, not the MR input value

if __name__ == "__main__":
    dumbo.main(runner)
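As for the original question, a quick way to convince yourself that constructor arguments are kept on the mapper instance is to call the mapper object by hand on a few (key, value) pairs, without Hadoop or dumbo. `apply_mapper` and the sample pairs below are hypothetical, for illustration only; they are not part of dumbo.

```python
import sys


class TestMapper:
    def __init__(self, _val):
        sys.stderr.write("Executing __init__" + str(_val) + "\n")
        self.val = _val

    def __call__(self, key, val):
        yield key, self.val  # uses the init value, not the MR input value


def apply_mapper(mapper, pairs):
    """Hypothetical helper: run a dumbo-style mapper over (key, value) pairs."""
    out = []
    for key, value in pairs:
        out.extend(mapper(key, value))
    return out


# Made-up sample input.
pairs = [(0, "mississippi"), (12, "missouri")]
print(apply_mapper(TestMapper(1), pairs))  # [(0, 1), (12, 1)]
print(apply_mapper(TestMapper(2), pairs))  # [(0, 2), (12, 2)]
```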


Hope this helps,
-Klaas

On Fri, Sep 3, 2010 at 8:54 PM, piccolbo <picc...@gmail.com> wrote:
> I have this little test program. I wanted to learn whether I can pass
> arguments to the __init__ of a mapper class.
>
> import dumbo
> import sys
>
> class TestMapper:
>    def __init__(self, _val):
>        sys.stderr.write("Executing __init__" + str(_val) + "\n")
>        self.val = _val
>    def __call__(self, key, val):
>        yield key, self.val #using init val, not MR input val
>
> if __name__ == '__main__':
>    sys.stderr.write("before run 1\n")
>    dumbo.run(TestMapper(1), opts = [("output", "test1")])
>    sys.stderr.write("in between runs\n")
>    dumbo.run(TestMapper(2), opts = [("output", "test2")])
>    sys.stderr.write("after run 2\n")
>
> This is the console output. Why on earth does "before run 1" print
> twice, before and after run 1? Also, "in between runs" prints twice
> and so does "after run 2". This is puzzling. As to the original
> question, the stderr prints might not be totally clear, but the mapper
> objects get instantiated with 1 but never with 2, or if they are, the
> instance with 2 is never used. Can somebody please enlighten me? Thanks
>
> Antonio
>
> rl1:~/dumbo/dc3$ dumbo start testinit.py -python python -input mississippi.txt
> before run 1
> Executing __init__1
> EXEC: PYTHONPATH="/usr/local/lib/python2.6/dist-packages/dumbo-0.21.26-py2.6.egg:$PYTHONPATH" python -m dumbo.cmd encodepipe -file mississippi.txt | PYTHONPATH="/usr/local/lib/python2.6/dist-packages/dumbo-0.21.26-py2.6.egg:$PYTHONPATH" dumbo_mrbase_class='dumbo.backends.common.MapRedBase' dumbo_jk_class='dumbo.backends.common.JoinKey' python -m testinit map 0 262144000 > 'test1'
> before run 1
> Executing __init__1
> in between runs
> Executing __init__2
> after run 2
> in between runs
> Executing __init__2
> EXEC: PYTHONPATH="/usr/local/lib/python2.6/dist-packages/dumbo-0.21.26-py2.6.egg:$PYTHONPATH" python -m dumbo.cmd encodepipe -file mississippi.txt | PYTHONPATH="/usr/local/lib/python2.6/dist-packages/dumbo-0.21.26-py2.6.egg:$PYTHONPATH" dumbo_mrbase_class='dumbo.backends.common.MapRedBase' dumbo_jk_class='dumbo.backends.common.JoinKey' python -m testinit map 0 262144000 > 'test2'
> before run 1
> Executing __init__1
> in between runs
> Executing __init__2
> after run 2
> after run 2