I'm CC-ing this reply to the dumbo-user mailing list. Please direct
any other questions you might have to this list instead.
The main difference between Happy and Dumbo is that Happy uses Jython
whereas Dumbo relies on CPython (and the Hadoop Streaming interface
for communicating with Hadoop's Java code). Apparently Jython works
fine for the Freebase guys, but I experimented with it myself before
settling on CPython, and it seemed too slow for our needs. Also, Jython
cannot load Python modules that depend on C extensions (NumPy, for
instance).
Besides this fundamental difference, there are also some other
disparities, e.g.:
* Dumbo tries to be as succinct and pythonic as possible, whereas the
Happy guys seem to prefer a slightly more rigid and verbose API. In
particular, Dumbo is all about generators, while Happy doesn't use
generators at all (presumably because generator functionality was
added to Jython only recently).
* Happy seems to be more of a code dump than a proper open source
project. There's only one release available, most of the code seems to
be written by a single developer, and it doesn't seem to use a public
SCM and/or ticketing system. Apparently, there isn't much of a
community around it yet.
But I'm biased, of course. Happy is definitely an interesting project,
and I certainly wouldn't mind collaborating more with its developers.
Maybe we could even try to merge the two projects to some extent at
some point.
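
To give a rough idea of the generator style mentioned above, here is a
minimal word count sketch. The mapper and reducer are plain Python
generators; run_local is a hypothetical stand-in for Hadoop's sort and
shuffle step and is not part of Dumbo. It is only there so the example
runs without a cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key is the input position (unused here), value is one line of text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values is an iterator over all counts emitted for this word.
    yield key, sum(values)


def run_local(mapper, reducer, records):
    # Hypothetical local driver (an assumption for illustration, NOT Dumbo's
    # API): apply the mapper, sort by key, then feed each group to the reducer.
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    mapped.sort(key=itemgetter(0))
    for key, group in groupby(mapped, key=itemgetter(0)):
        for out in reducer(key, (v for _, v in group)):
            yield out


if __name__ == "__main__":
    lines = [(0, "a b a"), (6, "b c")]
    for word, count in run_local(mapper, reducer, lines):
        print(word, count)
```

With Dumbo itself you would hand the same two functions to
dumbo.run(mapper, reducer) and let Hadoop Streaming do the sorting and
grouping.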
-Klaas
> Hello,
> Could you please explain the differences in Dumbo and Happy (http://code.google.com/p/happy/).
>
> In other words, why should I choose to work with Dumbo over Happy, or vice-versa.
>
> I'm still incredibly new to Hadoop, and frankly, don't really know where to start from.
>
> Cheers,
> Ryan