Hey Daishi,
Hope you don't mind me forwarding this answer to the dumbo-user mailing list as well. In the future, you might want to post your questions directly to this list, btw; it's usually a quicker way of getting an answer... :)
AFAIK, Cloudera's distribution includes HADOOP-1722, HADOOP-5450, and HADOOP-5528, but not MAPREDUCE-764. The following blog post provides more info about the latter:
http://dumbotics.com/2009/07/15/mapreduce-764/
So Dumbo should work fine in most cases, but you might run into problems when you try to do fairly unusual things like outputting None.
I'm sure the Cloudera guys would be willing to apply the MAPREDUCE-764 bugfix as well, btw, now that it has been reviewed and accepted for Hadoop 0.21. It'd definitely be worth dropping them a line about it, if you ask me.
-Klaas
-----Original Message-----
From: Daishi Harada
Sent: Fri 6-11-2009 20:04
To: Bosteels, Klaas
Subject: Dumbo on Cloudera's Hadoop distribution
Hi Klaas,
Thanks for making a Pythonic interface to Hadoop available
to the wider community.
I'm currently running Cloudera's hadoop-0.20.1+152 distribution,
and I was wondering whether you knew if that distribution includes
the patches necessary to run Dumbo. I do see a blog post that
you wrote earlier this year which seems to indicate that it
worked at that time, but I know that both Cloudera's Hadoop and
Dumbo have had new releases since then.
If you have any comments/thoughts on how I might verify
correct operation, I'd appreciate it. (Will it be obvious if
it doesn't work? I.e., if the examples simply fail, then it's
easy to determine that the patches are still needed, but
I'm concerned that things might mostly work while I remain
unaware of corner-case failures.)
Thanks in advance,
Daishi