How to control the output of the reducer?


xpjian

Jan 4, 2013, 8:53:30 AM
to dumbo...@googlegroups.com
Hi all,
    I want to select the matching lines from a file and write them to the output file. But in the reducer I end up with "yield key, values.next()", and I cannot get rid of the key (in this case the key is the offset of the line in the file). Here is my code:

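import re

def mapper(key, value):
    # key is the offset of the line in the file, value is the line itself
    if re.match(r'\d', value):
        yield key, value

def reducer(key, values):
    # values is an iterator over all values for this key; take the first one
    yield key, values.next()


if __name__ == '__main__':
    import dumbo
    dumbo.run(mapper, reducer)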
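To see why the offsets end up in the output, here is a minimal local sketch that mimics the job without Hadoop or dumbo. `simulate_job` is a hypothetical helper, not part of dumbo; it assumes the text input hands the mapper (byte offset, line) pairs and that map output is sorted and grouped by key before reaching the reducer. The exact offsets depend on the input format, but whatever the reducer yields as its key is written out next to the value.

```python
import re
from itertools import groupby
from operator import itemgetter

DIGIT = re.compile(r'\d')

def mapper(key, value):
    # key is the (assumed) byte offset of the line, value is the line itself
    if DIGIT.match(value):
        yield key, value

def reducer(key, values):
    # values is an iterator over all map outputs sharing this key
    yield key, next(values)

def simulate_job(lines, mapper, reducer):
    """Hypothetical local stand-in for a MapReduce run (not part of dumbo)."""
    # Build (byte offset, line) input records
    records, offset = [], 0
    for line in lines:
        records.append((offset, line))
        offset += len(line) + 1  # +1 for the newline character
    mapped = [kv for k, v in records for kv in mapper(k, v)]
    mapped.sort(key=itemgetter(0))  # stand-in for the shuffle/sort phase
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in group)))
    return output

print(simulate_job(["a", "1", "3", "d"], mapper, reducer))
```

Each output pair still carries the offset key, because the reducer passes it through unchanged.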


My input file looks like this:
a
1
3
d

So I want the output file to look like this:
1
3


But the MapReduce job above outputs:
1, 1
2, 3

Can anyone help?

Gilles

Jan 5, 2013, 4:42:53 AM
to dumbo...@googlegroups.com
Hi,
Two comments first:
You don't have to use a reducer if you don't need one; you can run:
if __name__ == '__main__':
    import dumbo
    dumbo.run(mapper)

You can also use the identityreducer from dumbo.lib; its code looks like this:
def identityreducer(key, values):
    for value in values:
        yield (key, value)
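A minimal local sketch of what identityreducer does once the framework has grouped the map output. `shuffle_and_reduce` is a hypothetical stand-in for Hadoop's sort/group phase, and the integer keys are assumed byte offsets as in the question. Unlike taking only the first value, identityreducer re-emits every value for a key, and the key is still part of each output pair.

```python
from itertools import groupby
from operator import itemgetter

def identityreducer(key, values):
    # Re-emit every (key, value) pair unchanged
    for value in values:
        yield (key, value)

def shuffle_and_reduce(pairs, reducer):
    """Hypothetical local stand-in for the sort/group/reduce phase."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort by key
    out = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        out.extend(reducer(key, (v for _, v in group)))
    return out

print(shuffle_and_reduce([(4, "3"), (2, "1")], identityreducer))
```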

That said, if you want one output line per line matching your regex, use just a mapper and emit everything with an empty key.
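A minimal local sketch of the map-only approach, assuming the mapper replaces the offset key with an empty string. `run_map_only` is a hypothetical helper, not a dumbo function; it only shows that in a map-only job the mapper's pairs are the output, with no sorting or grouping. How an empty key is rendered in the final text file (for example whether a separator is still written) depends on the output format and is not modeled here.

```python
import re

DIGIT = re.compile(r'\d')

def mapper(key, value):
    # Drop the offset key; pair each matching line with an empty key
    if DIGIT.match(value):
        yield "", value

def run_map_only(lines, mapper):
    """Hypothetical local stand-in for a map-only job (no reducer)."""
    out, offset = [], 0
    for line in lines:
        out.extend(mapper(offset, line))
        offset += len(line) + 1  # assumed byte offset of the next line
    return out

print(run_map_only(["a", "1", "3", "d"], mapper))
```

Duplicates are kept: every matching input line produces one output pair.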
If you want a single output line per distinct value that matches your regex, you can use this code:

def mapper(key, value):
    import re
    if re.compile(r'\d').match(value):
        yield value,""
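The mapper above uses the matching line itself as the key, so identical lines meet in the same reduce call. The thread does not show a reducer for this case; the one below is an assumption that simply emits each distinct key once. `simulate_job` is again a hypothetical local stand-in for Hadoop's map, sort/group and reduce phases, not part of dumbo.

```python
import re
from itertools import groupby
from operator import itemgetter

DIGIT = re.compile(r'\d')

def mapper(key, value):
    # Use the matching line as the key so duplicates are grouped together
    if DIGIT.match(value):
        yield value, ""

def reducer(key, values):
    # Assumed reducer: emit each distinct line once, ignoring the empty values
    yield key, ""

def simulate_job(lines, mapper, reducer):
    """Hypothetical local stand-in for map, sort/group, reduce."""
    mapped, offset = [], 0
    for line in lines:
        mapped.extend(mapper(offset, line))
        offset += len(line) + 1
    mapped.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reducer(key, (v for _, v in group)))
    return out

print(simulate_job(["a", "1", "3", "1", "d"], mapper, reducer))
```

The duplicate "1" collapses into a single output pair because both copies share one key.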