Is --objcheck mandatory to avoid inserting unreadable BSON objects in the database?


Nicolas Fouché

Nov 30, 2009, 7:44:58 AM
to mongodb-user
Hello,

we still can't set up a properly working slave on our production database. The clone() phase of the slave initialization fails with the message listed in SERVER-434 (http://jira.mongodb.org/browse/SERVER-434):

(...)
Sat Nov 28 14:28:03 Buildindex mydb.mycollection idxNo:0 { name: "_id_", ns: "mydb.mycollection", key: { _id: ObjId(000000000000000000000000) } }
Sat Nov 28 14:28:03 done for 0 records 0.006secs
Sat Nov 28 14:28:08 bad recv() len: 16760190
Sat Nov 28 14:28:08 Assertion: dbclient error communicating with server
Sat Nov 28 14:28:08 repl: AssertionException dbclient error communicating with server
Sat Nov 28 14:28:08 repl: sleep 2sec before next pass
Sat Nov 28 14:28:10 repl: main@myserver
(...)

Same message over and over.

It seems that this happens because of invalid objects (not just oversized ones, but ones containing invalid UTF-8 characters) that were blindly inserted by the server. As far as we understand, running our servers with --objcheck could solve this issue, but we wonder how it will impact performance (calling BSONObj::valid(), which calls toString() on every insert, seems inefficient, right?), and what to do with the existing data (what is the best way to get a clean database, even at the expense of losing invalid objects?).

Currently, any kind of client (mongod slave or Ruby driver) requesting any of those invalid objects gets an error. Here is a backtrace:

#<TypeError: no c decoder for this type yet (32)>
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/util/bson.rb:155:in `deserialize',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/util/bson.rb:155:in `deserialize',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:208:in `object_from_stream',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:187:in `next_object_on_wire',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:148:in `read_objects_off_wire',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:144:in `read_all',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:219:in `send_query_if_needed',
/opt/ruby-ee-1.8.7/lib/ruby/1.8/mutex_m.rb:67:in `synchronize',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/db.rb:551:in `_synchronize',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:216:in `send_query_if_needed',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:191:in `refill_via_get_more',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:170:in `num_remaining',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:178:in `more?',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/mongodb-mongo-0.14.1/lib/mongo/cursor.rb:100:in `to_a',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/nfo-mongomapper-0.3.5/lib/mongomapper/document.rb:178:in `find_every',
/opt/ruby-ee-1.8.7/lib/ruby/gems/1.8/gems/nfo-mongomapper-0.3.5/lib/mongomapper/document.rb:216:in `find_one',
(...)

FYI, displaying the document in a "mongo" console does not raise any error; it simply stops printing the value of the key at the point where the bad character is. In other words, the displayed value is truncated. Command used: db['mycollection'].find('myid').next()["content"]

So, if we set --objcheck in production, we would receive an error on inserts instead of on reads? In our architecture, we much prefer receiving errors during inserts, so that we don't get errors from client reads at random times, and so that we can fix our encoding issues and relaunch our insertion jobs.
To avoid using the server to validate BSON objects, would it be a good idea to implement the object check in the drivers?
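A driver-side check mostly comes down to verifying that every string value is well-formed UTF-8 before serialization. As a rough illustration (this is not the actual driver API; the method name is made up, and the check is deliberately permissive about overlong/surrogate edge cases), a byte-level validator could look like:

```ruby
# Hypothetical driver-side helper: returns true if the string's bytes
# form well-formed UTF-8 sequences. Simplified: it checks lead-byte
# ranges and continuation bytes, but does not reject every overlong
# or surrogate encoding.
def valid_utf8?(str)
  bytes = str.unpack('C*')
  i = 0
  while i < bytes.length
    b = bytes[i]
    if b < 0x80
      n = 0                      # ASCII, no continuation bytes
    elsif b >= 0xC2 && b <= 0xDF
      n = 1                      # 2-byte sequence
    elsif b >= 0xE0 && b <= 0xEF
      n = 2                      # 3-byte sequence
    elsif b >= 0xF0 && b <= 0xF4
      n = 3                      # 4-byte sequence
    else
      return false               # stray continuation or invalid lead byte
    end
    n.times do
      i += 1
      return false unless i < bytes.length && (bytes[i] & 0xC0) == 0x80
    end
    i += 1
  end
  true
end
```

A driver could run such a check on every string value at insert time and raise before the document ever reaches the server.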

We're currently running validate() on all our collections to see if it can help us retrieve the invalid BSON objects already inserted in the database. If not, how could we find them? Write a script that iterates over all objects in all collections with the Ruby driver and keeps the object _id whenever an exception is raised?
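That iterate-and-catch idea could be sketched like this; the per-document fetch is passed in as a block so the sketch doesn't depend on any real driver call (with the actual driver, the block would do a single-document fetch by _id):

```ruby
# Collect the _ids of documents whose individual fetch raises, e.g.
# because BSON deserialization fails. The caller supplies the fetch
# as a block, so this logic is driver-agnostic.
def find_unreadable_ids(ids)
  bad = []
  ids.each do |id|
    begin
      yield id               # fetch one document; may raise on bad BSON
    rescue StandardError
      bad << id              # remember the _id of the unreadable document
    end
  end
  bad
end
```

Fetching one document at a time is slower than a bulk cursor, but it isolates the failure to a single _id instead of aborting a whole batch.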

Thanks in advance,

Nicolas

Eliot Horowitz

Nov 30, 2009, 8:32:37 AM
to mongod...@googlegroups.com, mongodb-user
Have you tried a repair?

Do you know how they got there in the first place? --objcheck is definitely a decent performance hit. Better to figure out how it happened and block it there.


Nicolas Fouché

Nov 30, 2009, 9:31:14 AM
to mongodb-user
We know how they got there. We grab emails, parse them, and insert their content into MongoDB (quoted-printable decoding, base64 decoding, charset conversion). This can result in bad characters.

We can fix the problem, yes. But we also have to retrieve the bad documents already inserted, so we can re-process them.

But once this problem is fixed, we're not protected from other problems, and we'd be back in the exact same situation, with bad documents inserted in the database.

Is there really no way to validate BSON objects without hurting performance too much? Can't the drivers do that?

What would repair do? Remove invalid messages? If so, we don't want to lose them; we need to find them.


Dwight Merriman

Nov 30, 2009, 9:38:03 AM
to mongod...@googlegroups.com
Why not just run with --objcheck and see if it is fast enough for your purposes? It may be just fine.

Eliot Horowitz

Nov 30, 2009, 9:42:56 AM
to mongod...@googlegroups.com
The drivers shouldn't allow bad data to be inserted, so it's probably a driver bug that is causing it in the first place.
repair will remove the invalid objects, so you may not want to do that.
With the latest nightly, you can use $where to find objects that are too as a first step.

Nicolas Fouché

Nov 30, 2009, 9:51:33 AM
to mongodb-user
"you can use $where to find objects that are too as a first step."?
There must be a typo there. Can you share an example command, please?

I'm trying to reproduce the problem with a simple snippet of code, to
see what happens in the driver. I'll repost once it's done.

About validate(), here is the result with a collection containing at
least one bad object. It seems correct.

> db["mycollection"].validate();
{"ns" : "mydb.mycollection" , "result" : "
validate
details: 0x2aac2c2d6d30 ofs:b19d30
firstExtent:94:3a838600 ns:mydb.mycollection
lastExtent:b3:20f2b100 ns:mydb.mycollection
# extents:6
datasize?:11254020 nrecords?:1152 lastExtentSize:10135040
padding:1.48
first extent:
loc:94:3a838600 xnext:94:3c623d00 xprev:null
ns:mydb.mycollection
size:8192 firstRecord:94:3a839198 lastRecord:94:3a8386b0
1152 objects found, nobj:1152
11272452 bytes data w/headers
11254020 bytes data wout/headers
deletedList: 0001111111000000001
deleted: n: 174 size: 10031836
nIndexes:10
mydb.mycollection.$_id_ keys:1152
mydb.mycollection.$xxx_1 keys:2297
mydb.mycollection.$xxx_1 keys:1152
mydb.mycollection.$xxx_1 keys:1152
mydb.mycollection.$xxx_1 keys:1152
mydb.mycollection.$xxx_1 keys:1257
mydb.mycollection.$xxx_1 keys:109926
mydb.mycollection.$xxx_1 keys:1152
mydb.mycollection.$xxx_1 keys:1152
mydb.mycollection.$xxx_1 keys:1152
" , "ok" : 1 , "valid" : true , "lastExtentSize" : 10135040}



Eliot Horowitz

Nov 30, 2009, 9:52:50 AM
to mongod...@googlegroups.com
validate() won't find bad objects.

Eliot Horowitz

Nov 30, 2009, 9:56:20 AM
to mongod...@googlegroups.com
validate() doesn't check the objects themselves, just the collection consistency.
db.foo.find( { $where : "this.bsonsize() > 4000000" } , { _id : 1 } )
That will return all _ids where the object is over 4MB.
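Client-side, that same filter is just a size comparison over the raw BSON. As a hedged illustration with no driver involved (documents represented here as id => raw-bytes pairs; the 4MB default matches the document size cap of that era):

```ruby
# Return the ids whose raw BSON payload exceeds the size limit.
# Mirrors what the $where/this.bsonsize() query does server-side.
def oversized_ids(docs, limit = 4 * 1024 * 1024)
  docs.select { |id, raw| raw.bytesize > limit }.map { |id, _| id }
end
```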

Nicolas Fouché

Nov 30, 2009, 12:38:02 PM
to mongodb-user
Thanks,

I created a new thread about the "TypeError: no c decoder for this
type yet" error raised by the Ruby driver
http://groups.google.com/group/mongodb-user/browse_thread/thread/319bd1925a0f219a

So, a first solution would be to uninstall "mongo_ext" from our
server.

We still need to retrieve the bad messages from our collections, so I guess a script that iterates over all documents is our last resort.

Nicolas


Nicolas Fouché

Dec 10, 2009, 9:42:28 AM
to mongodb-user
Hi,

just discovered something today about replication and documents containing non-UTF-8 characters.

First, mongodump and mongorestore can dump and restore documents containing non-UTF-8 characters without any warning.

Second, cloning fails if one document contains a non-UTF-8 character, whereas live replication takes the document without any problem.

Here are the two scenarios, tested on 1.0.1, 1.1.3 and 1.1.4, with a master/slave and a replica-pair setup. My dump contains 8420 documents. Somewhere in the middle, there is one bad document:

1) Start the master, then start the slave. Once replication is alive, launch mongorestore. The clone grabs 4576 documents, then replication brings in 1335, and finally the remaining 2509 documents. Here are the logs of the slave: http://pastie.org/737211
If I query the slave with the mongo console, I DO find the bad document: http://pastie.org/737222

2) Start the master, launch mongorestore, and then start the slave. Cloning tries to get the 8420 documents and says that it skipped the bad one. Here are the logs of the slave: http://pastie.org/737217
If I query the slave with the mongo console, I DO NOT find the bad document.

These tests show that documents with non-UTF-8 characters are not related to SERVER-434, as cloning is not interrupted; it only skips documents. I'm still trying to find out why SERVER-434 happens, which is not easy with a very big database (more than 400GB in this case).

We're doing our best to avoid putting this kind of document into Mongo, and it's easier now with the fixes that appeared in the Ruby driver 0.18.1 (thanks, BTW).
But would it be possible to make cloning behave the same way as replication, i.e. keep a consistent state between the master and slave(s)?

Nicolas



Eliot Horowitz

Dec 10, 2009, 9:46:55 AM
to mongod...@googlegroups.com
We probably don't want to do anything more than we currently do with --objcheck.
Any other checks are going to be slow, and the drivers are supposed to guarantee that no non-UTF-8 strings are added.
Have you tried this check with --objcheck?

Nicolas Fouché

Dec 10, 2009, 12:29:50 PM
to mongodb-user
If I start the master with --objcheck, then the bad document is skipped, and the restoration is a little slower. A count on the collection returns 8419 documents (versus 8420 initially): http://pastie.org/737390

With --objcheck and the Ruby driver 0.18.1, we'll be able to safely
insert documents.

My concern is about **existing documents**; this is a production database. I cannot simply skip/forget these documents; I have to find and fix them.
The fastest solution would be to replicate this database and let the logs of the slave show the _id of the bad documents.

But, currently (1.1.4), the cloning phase only shows:

Thu Dec 10 14:04:14 done allocating datafile /data/slavedb-1.1.4/mydb.1, size: 128MB, took 0.112 secs
Thu Dec 10 14:04:33 Assertion: Invalid dbref/code/string/symbol size
skipping corrupt object from mydb.mycollection
Thu Dec 10 14:04:54 resync: done mydb

I tried setting the log level to 5 (-vvvvv), but I still cannot see
the document _id.

Unless you tell me that you can do it in the next few hours/days, I'll
try to update cloner.cpp myself :)

Nicolas

PS: I don't know if you consider this a bug, but the replication seems to restart again and again when the log level is set to 6, with or without --objcheck. Of course, it only happens with a collection containing a bad message; it goes fine with log level 5 or lower.
http://pastie.org/737517

PS2: Congratulations on 1.2.0!


Nicolas Fouché

Dec 10, 2009, 1:48:26 PM
to mongodb-user
Ok, I can now show the _id of documents with non-UTF-8 characters.

Here is my diff against 1.2.0, if anyone is interested: http://pastie.org/737683

diff --git a/db/cloner.cpp b/db/cloner.cpp
index ea2f003..3752b71 100644
--- a/db/cloner.cpp
+++ b/db/cloner.cpp
@@ -111,6 +111,18 @@ namespace mongo {
             /* assure object is valid. note this will slow us down a little. */
             if ( !tmp.valid() ) {
                 out() << "skipping corrupt object from " << from_collection << '\n';
+                // ---------------------
+                try {
+                    out() << "Calling BSONObj::getField('_id') " << tmp.getField("_id") << '\n';
+                } catch (...) {
+                    try {
+                        out() << "Could not get field _id, trying BSONObj::jsonString()" << tmp.jsonString() << '\n';
+                    } catch (...) {
+                        out() << "Could not get field _id, or jsonString() :(((\n";
+                    }
+                }
+                // ---------------------
+
                 continue;
             }

Nicolas


Eliot Horowitz

Dec 10, 2009, 1:55:08 PM
to mongod...@googlegroups.com
I pushed something similar to this to the master repo.
It tries to print out the first element, which hopefully is readable.