Strange encoding of None in output

Chris Smoak

Jul 13, 2009, 10:46:52 PM
to dumbo-user
I've tried a few ways of dealing with this, but I end up getting the
same encoding of None in each case.

Instead of None, I'm getting the string: '\x80\x02N.'

Am I missing something?
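
For readers puzzling over the string above: it matches what Python's pickle module produces for None under protocol 2. The short check below is only a sketch to illustrate that, written for Python 3 (so the value is a bytes literal rather than the Python 2 str the thread would have used). It does not by itself show where in the pipeline the pickled form leaks through.

```python
import pickle

# The value reported in place of None, written as a Python 3 bytes literal.
observed = b"\x80\x02N."

# Pickling None with protocol 2 yields four bytes:
# \x80\x02 (protocol 2 header), N (the None opcode), . (the stop opcode).
pickled_none = pickle.dumps(None, protocol=2)

print(pickled_none == observed)        # do the two byte strings match?
print(pickle.loads(observed) is None)  # does the observed string unpickle to None?
```

In other words, the output looks like a pickled None that was never unpickled on the way out.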

Zak Stone

Jul 13, 2009, 11:08:47 PM
to dumbo...@googlegroups.com
Hey, Chris!

I've seen the same behavior. My theory is that None is not encoded
properly because it is a Python object rather than a binary string,
and Dumbo can handle the latter but not the former. As a workaround, I
either pickle my keys and values myself or use the empty string instead
of None.
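
A minimal sketch of the two workarounds described above, assuming a mapper in the usual (key, value) generator shape. The helper names and the NONE_SENTINEL constant are hypothetical, made up for illustration, and are not part of Dumbo. Note that the empty-string approach cannot tell a real empty string apart from None.

```python
import pickle

NONE_SENTINEL = ""  # hypothetical sentinel; collides with genuine empty strings


def none_to_sentinel(value):
    """Swap None for the sentinel before a value is emitted."""
    return NONE_SENTINEL if value is None else value


def sentinel_to_none(value):
    """Restore None when a value is read back."""
    return None if value == NONE_SENTINEL else value


def pickle_value(value):
    """Alternative: pickle explicitly, so the consumer knows to unpickle."""
    return pickle.dumps(value, protocol=2)


def mapper(key, value):
    # Illustrative mapper: emit the sentinel instead of a bare None.
    yield key, none_to_sentinel(value)


print(list(mapper("k", None)))
print(sentinel_to_none(""))
print(pickle.loads(pickle_value(None)))
```

Whichever option is used, the code that consumes the output has to apply the matching decode step.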

Zak

Klaas Bosteels

Jul 14, 2009, 4:26:31 AM
to dumbo...@googlegroups.com
That's a bug! It would be awesome if you could create a ticket for it:

http://dumbo.assembla.com/spaces/dumbo

I just had a quick look but couldn't immediately find the cause.
Fortunately, as Zak pointed out, you can easily work around it...

-Klaas

Klaas Bosteels

Jul 15, 2009, 4:56:48 AM
to dumbo...@googlegroups.com
Follow up:

Thanks to Zak there's a ticket for this issue now:

http://dumbo.assembla.com/spaces/dumbo/tickets/54

As I explained in a comment on this ticket, the erroneous behaviour is
caused by a bug in Streaming's typed bytes code:

http://issues.apache.org/jira/browse/MAPREDUCE-764

A patch has been submitted; hopefully it'll get accepted/committed
soon. You don't have to wait until then, though, since you can also
just apply the patch yourself:

$ cd /path/to/hadoop
$ wget http://issues.apache.org/jira/secure/attachment/12413534/MAPREDUCE-764.patch
$ patch -p0 < MAPREDUCE-764.patch

-Klaas

Zak Stone

Jul 15, 2009, 10:38:09 AM
to dumbo...@googlegroups.com
Thanks so much, Klaas!

Zak