I've run into the same thing. My conclusion was that when you use a
combiner with Hadoop, it adds an extra serialization, sort, and
deserialization pass on the mapper side.
If the combiner can't reduce the data by a significant amount, that
pass is pure overhead. Some code (like scoobi's join) is organized so
that each mapper has a huge amount of reducible data (IIRC a 100x
reduction isn't unusual), so there the combiner is a huge win (and
without it, the job isn't actually even feasible to run).
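For context, here's a rough sketch of the kind of pipeline where the
combiner pays off, written against the DList API as I remember it
(combine's signature changed between scoobi versions, and the input
path is made up):

    import com.nicta.scoobi.Scoobi._

    val lines: DList[String] = fromTextFile("hdfs://input")  // assumed input

    // Word count: each mapper sees many repeats of common words, so the
    // combiner collapses per-mapper counts before the shuffle. On skewed
    // input the reduction in shuffled data can easily be ~100x.
    val counts: DList[(String, Int)] =
      lines.flatMap(_.split("""\s+""").toSeq)
           .map(word => (word, 1))
           .groupByKey
           .combine((a: Int, b: Int) => a + b)  // also runs as a Hadoop combiner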
Maybe a simple (and ugly) combineWithoutCombiner method could be
added that is just a wrapper around map [since I feel a configuration
option is too coarse for anything other than performance testing].
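Something like the following is what I have in mind. This is only a
sketch of the hypothetical method (it doesn't exist in scoobi today),
and the WireFormat/Grouping context bounds are from memory:

    import com.nicta.scoobi.Scoobi._

    // Hypothetical: same result as combine(f), but done as a plain map
    // over each group's values, so Hadoop never installs a combiner and
    // the mapper skips the extra serialize/sort/deserialize pass.
    // groupByKey guarantees each group is non-empty, so reduce is safe.
    def combineWithoutCombiner[K: WireFormat: Grouping, V: WireFormat](
        grouped: DList[(K, Iterable[V])])(f: (V, V) => V): DList[(K, V)] =
      grouped.map { case (k, vs) => (k, vs.reduce(f)) }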