mongo_ext can raise "no c decoder for this type yet" for special characters.


Nicolas Fouché

Nov 30, 2009, 11:53:54 AM
to mongodb-user
Hi,

following the discussion about unreadable objects in the database
http://groups.google.com/group/mongodb-user/browse_thread/thread/4e3aa74f1a6fe5fc,
I found a problem with the C extension of the Ruby driver.

Here is a code snippet to reproduce the problem: http://pastie.org/720325

So it seems that mongo_ext can insert anything but cannot read it.
For information, the Mongo console strips the displayed value where
this character is present.


Here is another example for a document containing bad utf8 characters:

Here is what I retrieve without mongo_ext:
"Cette pi�ce jointe est un message MAPI 1.0 incorpor� et n'est pas
compatible avec ce syst�me de messagerie.\000�M\036��w��t���m8�g ��y"

Here is what is raised with mongo_ext:
TypeError: no c decoder for this type yet (-45)

And the Mongo console raises:
"decode failed. probably invalid utf-8 string [Cette pi?ce jointe est un message MAPI 1.0 incorpor? et n'est pas compatible avec ce syst?me de messagerie.]
why: TypeError: malformed UTF-8 character sequence at offset 8
exception: invalid utf8"


I understand that it is our task to ensure that we do not insert "bad"
things in MongoDB, but we get data from a lot of different sources,
and new encoding problems can happen at any time.

Should "mongo_ext" behave exactly like "mongo", returning what was
originally inserted without raising?

Thanks,

Nicolas

Michael Dirolf

Nov 30, 2009, 11:55:49 AM
to mongod...@googlegroups.com
mongo_ext shouldn't be allowing you to insert bad data to begin with.
Will take a look at your test case.

Nicolas Fouché

Nov 30, 2009, 1:09:25 PM
to mongodb-user
FYI, here is what I find in my logs: a lot of other unrecognized
types.

"#<TypeError: no c decoder for this type yet (32)>"
"#<TypeError: no c decoder for this type yet (0)>"
"#<TypeError: no c decoder for this type yet (62)>"
"#<TypeError: no c decoder for this type yet (-120)>"
"#<TypeError: no c decoder for this type yet (-61)>"
"#<TypeError: no c decoder for this type yet (122)>"
"#<TypeError: no c decoder for this type yet (67)>"
"#<TypeError: no c decoder for this type yet (-60)>"
"#<TypeError: no c decoder for this type yet (48)>"
"#<TypeError: no c decoder for this type yet (-62)>"
"#<TypeError: no c decoder for this type yet (49)>"
"#<TypeError: no c decoder for this type yet (44)>"
"#<TypeError: no c decoder for this type yet (31)>"
"#<TypeError: no c decoder for this type yet (109)>"
"#<TypeError: no c decoder for this type yet (-127)>"
"#<TypeError: no c decoder for this type yet (-48)>"
"#<TypeError: no c decoder for this type yet (28)>"
"#<TypeError: no c decoder for this type yet (82)>"
"#<TypeError: no c decoder for this type yet (73)>"
"#<TypeError: no c decoder for this type yet (87)>"
"#<TypeError: no c decoder for this type yet (63)>"
"#<TypeError: no c decoder for this type yet (105)>"
"#<TypeError: no c decoder for this type yet (92)>"
"#<TypeError: no c decoder for this type yet (78)>"
"#<TypeError: no c decoder for this type yet (120)>"
"#<TypeError: no c decoder for this type yet (-7)>"
"#<TypeError: no c decoder for this type yet (50)>"
"#<TypeError: no c decoder for this type yet (-49)>"
"#<TypeError: no c decoder for this type yet (58)>"
"#<TypeError: no c decoder for this type yet (41)>"
"#<TypeError: no c decoder for this type yet (108)>"
(...)
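
A note on the numbers in parentheses: several are negative, which would be consistent with the decoder holding a raw byte in a signed 8-bit C `char` before printing it. That is an assumption about the extension's internals, not something confirmed in this thread. Under that assumption, a logged code can be mapped back to the byte that was read. `signed_byte` and `raw_byte` are hypothetical helper names used only for this sketch.

```ruby
# Hypothetical helper: reinterpret an unsigned byte (0..255) as a signed
# 8-bit value, as C code would if it stored the byte in a signed char.
def signed_byte(byte)
  [byte].pack("C").unpack("c").first
end

# Hypothetical inverse: recover the raw byte from a logged code,
# which may be negative.
def raw_byte(code)
  code & 0xFF
end

signed_byte(0xD3)        # => -45
raw_byte(-45)            # => 211
raw_byte(-45).to_s(16)   # => "d3"
```

If the assumption holds, the -45 from the first message corresponds to byte 0xD3, which is not a BSON element type code, so the decoder may simply have been reading from the wrong offset.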


Michael Dirolf

Nov 30, 2009, 1:12:56 PM
to mongod...@googlegroups.com
Yeah, those all stem from the same root issue, which is that you're
being allowed to insert bad data to begin with.

Thanks for the update,
Mike

Nicolas Fouché

Nov 30, 2009, 1:44:09 PM
to mongodb-user
One other thing, in case it can help.

Here is the code we sometimes use in tests to know if a string
contains bad 'utf-8' characters. The original code was found in the
"stringex" gem.

require 'iconv'

# Raises ArgumentError when unpack("U") meets a malformed UTF-8 sequence.
def raise_unless_utf8(string)
  string.gsub(/[^\x00-\x7f]/u) do |codepoint|
    codepoint.unpack("U")[0]
  end
end

string = "日本語"
# Iconv.conv returns a String, so no .first is needed here.
string = Iconv.conv('shift-jis', 'utf-8', string)
raise_unless_utf8(string) # => ArgumentError: malformed UTF-8 character

string = "é"
string = Iconv.conv('iso-8859-1', 'utf-8', string)
raise_unless_utf8(string) # => ArgumentError: malformed UTF-8 character (expected 3 bytes, given 1 bytes)
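
For comparison, here is a minimal sketch of the same kind of check on Ruby 1.9 or later, where strings carry an encoding and Iconv is not required. This assumes a newer Ruby than the 1.8 setup used above. `valid_utf8?` and `raise_unless_valid_utf8` are hypothetical helper names, not part of the Ruby driver.

```ruby
# Hypothetical helper: true when the string's bytes are well-formed UTF-8.
# Works on a copy so the caller's string keeps its original encoding tag.
def valid_utf8?(string)
  string.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical guard to call on each string before saving a document.
def raise_unless_valid_utf8(string)
  raise ArgumentError, "malformed UTF-8: #{string.inspect}" unless valid_utf8?(string)
  string
end

sjis   = "日本語".encode("Shift_JIS")  # same text, Shift_JIS bytes
latin1 = "é".encode("ISO-8859-1")      # a single byte, 0xE9

valid_utf8?("日本語")  # => true
valid_utf8?(sjis)      # => false
valid_utf8?(latin1)    # => false
valid_utf8?("\x00")    # => true
```

Note that "\x00" passes this check, since NUL is a valid UTF-8 character.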

Nicolas


Nicolas Fouché

Dec 17, 2009, 7:16:43 AM
to mongodb-user
Hi,

I found that at least one UTF-8 character is not supported by
mongo_ext (0.18.1). It's the very first one: "\x00"
http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=string-literal&view=3

$KCODE = 'u'
char = "\x00"
document = {:_id => rand(999999999), :content => char}
id = collection.save(document)
collection.find({:_id => id}).to_a # => <TypeError: no c decoder for this type yet (0)>

I know it can be considered an edge case, but it's common to see
"=00" in quoted-printable strings in emails (which is converted to "\x00"
by unpack("M*")).
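
To illustrate the quoted-printable case, here is a small sketch showing how "=00" turns into a NUL byte, along with one possible workaround until the driver handles it. `strip_nul` is a hypothetical helper, and dropping NUL bytes is only acceptable if your application does not need them.

```ruby
# Decode a quoted-printable fragment; "=00" becomes a single NUL byte.
decoded = "abc=00def".unpack("M*").first
decoded.bytes.to_a  # => [97, 98, 99, 0, 100, 101, 102]

# Hypothetical workaround: remove NUL bytes before saving the document.
def strip_nul(string)
  string.delete("\x00")
end

strip_nul(decoded).bytes.to_a  # => [97, 98, 99, 100, 101, 102]
```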

It worked fine with the next 2047 utf-8 characters, see http://pastie.org/747024

Nicolas


Michael Dirolf

Dec 17, 2009, 10:00:21 AM
to mongod...@googlegroups.com
Thanks Nicolas - this is a bug. Will have a fix out soon and ping you here.

Michael Dirolf

Dec 17, 2009, 12:26:57 PM
to mongod...@googlegroups.com
Should be fixed in master, thanks again!