I'm using Dumbo to build a book index. This is an example from a lecture I gave. I want to make sure I'm spreading knowledge and not misinformation. :)

Input data looks like this:

    page-1: Programming means writing code...
    page-2: A computer is a complex...

That's a clipping of the complete input dataset. I map that into...

    code 1
    code 1
    code 2
    computer 2
    ...

and that should be reduced to...

    code 1,2
    computer 2
    ...
and so on. Did I correctly implement my reducer? I'm basically trying to gracefully handle both of these cases: (a) the value is a single number and (b) the value is a comma-separated list of numbers.
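For readers following along, here is a minimal sketch of a mapper and reducer of the kind described above. It is not the code from the original post. It assumes each input value is one line such as `page-1: some text`, that words are runs of ASCII letters, and that the functions follow Dumbo's usual generator style of taking a key and yielding pairs. The names `mapper` and `reducer` and the tokenizing regex are illustrative choices.

```python
import re


def mapper(key, value):
    """Emit (word, page) for every word on a line like 'page-1: some text'.

    The incoming key is ignored; the line format is an assumption.
    """
    label, _, text = value.partition(":")
    page = label.strip().split("-")[1]
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word, page


def reducer(key, values):
    """Merge page values for one word.

    Each value may be (a) a single number such as '2' or
    (b) a comma-separated list such as '1,2'.
    """
    pages = set()
    for value in values:
        for part in str(value).split(","):
            part = part.strip()
            if part:
                pages.add(int(part))
    # Sort numerically so the output is stable regardless of input order.
    yield key, ",".join(str(p) for p in sorted(pages))
```

Because the reducer splits every value on commas before collecting page numbers, feeding its own output back in gives the same result, which is the idempotence property in question. Under Dumbo, functions like these would typically be handed to `dumbo.run(mapper, reducer)`.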
I tried to make it idempotent, but I'm not sure if I did it right.

Thanks,
-Adam
It seems your reducer covers both cases, but I wonder why the second case needs to be supported, since the mapper can't generate it.
> The framework calls the application's Reduce function once for each unique key in the sorted order.

D'oh! I also confirmed this "once per key" behavior in the Hadoop API docs. Now I'm thinking that with Dumbo (and Hadoop streaming in general) the reducer need not be idempotent. Is this correct?
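The "once per key" behavior can be illustrated with a small local simulation. This is only a sketch of the sort-and-group step in plain Python, not how Hadoop or Dumbo implement it; the helper name `simulate_reduce_calls` and the sample pairs are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def simulate_reduce_calls(mapped_pairs):
    """Return one (key, values) entry per unique key, in sorted key order.

    Each entry stands for a single call to the reduce function.
    """
    calls = []
    # Sorting first is required: groupby only groups adjacent equal keys.
    for key, group in groupby(sorted(mapped_pairs), key=itemgetter(0)):
        calls.append((key, [value for _, value in group]))
    return calls


pairs = [("code", "1"), ("computer", "2"), ("code", "1"), ("code", "2")]
calls = simulate_reduce_calls(pairs)
```

Here `calls` holds exactly two entries, one for `code` with all three of its values and one for `computer`, so in this simplified model the reduce function only ever sees raw mapper output.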