MongoException: could not initialize cursor across all shards because : Invalid BSONObj size

Shi Shei

Nov 25, 2011, 4:59:31 AM
to mongodb-user
Until yesterday we were using a replica set of 4 members. Now we have
changed it into a sharded system with 2 shards.
Mongo is balancing chunks to the new shard and everything seemed to be fine.

However, a query executed through the Perl driver (v0.37) that worked
yesterday fails today. The error message I now get is:
emergency 17536 query error: could not initialize cursor across all
shards because : Invalid BSONObj size: -286331154 (0xEEEEEEEE) first
element: _id: 768324454 @

I tested the same query using the Java driver (v2.6.5) and got the
same error:
Exception in thread "main" com.mongodb.MongoException: could not
initialize cursor across all shards because : Invalid BSONObj size:
-286331154 (0xEEEEEEEE) first element: _id: 793227605 @ offerstore/
s141:27018,s142:27018,s144:27018,s143:27018
at com.mongodb.MongoException.parse(MongoException.java:82)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:310)
at com.mongodb.DBCursor._check(DBCursor.java:360)
at com.mongodb.DBCursor._hasNext(DBCursor.java:490)
at com.mongodb.DBCursor.hasNext(DBCursor.java:515)

The query I'm executing is very simple:
{ "clickCount" : { $ne : null } }
I'm able to execute it without errors through the mongo shell.

I found out that when I set a batch size (say, 100) in Java, the query
runs without throwing the error. But why?
How can I solve the issue without setting a batch size?

We're using Mongo v2.0.1 running on 64-bit Linux.
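
For reference, here is roughly what the batch-size workaround looks
like with the 2.x Java driver. This is only a sketch: the mongos
host/port and the collection name "offers" are placeholders, not our
real setup.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCursor;
    import com.mongodb.Mongo;

    public class BatchSizeWorkaround {
        public static void main(String[] args) throws Exception {
            // Connect through mongos (hypothetical host/port)
            Mongo mongo = new Mongo("mongos-host", 27017);
            DB db = mongo.getDB("offerstore");

            // The query from above: { "clickCount" : { $ne : null } }
            BasicDBObject query = new BasicDBObject("clickCount",
                    new BasicDBObject("$ne", null));

            // With batchSize(100) the query runs; without it, the driver
            // throws "Invalid BSONObj size" while iterating the cursor.
            DBCursor cursor = db.getCollection("offers")
                    .find(query)
                    .batchSize(100);
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
            mongo.close();
        }
    }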

Eliot Horowitz

Nov 25, 2011, 10:13:41 AM
to mongod...@googlegroups.com
Can you send the mongos log?


Shi Shei

Nov 28, 2011, 4:00:05 AM
to mongodb-user
Here it is:

Mon Nov 28 09:53:19 [LockPinger] cluster sx175:20020,sx176:20020,sx177:20020 pinged successfully at Mon Nov 28 09:53:19 2011 by distributed lock pinger 'sx175:20020,sx176:20020,sx177:20020/sx175:27019:1322467103:1804289383', sleeping for 30000ms
Mon Nov 28 09:55:57 [mongosMain] connection accepted from 192.168.0.124:41542 #5
Mon Nov 28 09:56:06 [conn5] DBException in process: could not initialize cursor across all shards because : Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: 297885885 @ offerstore/s141:27018,s142:27018,s144:27018,s143:27018
Mon Nov 28 09:56:17 [conn5] end connection 192.168.0.124:41542

Better formatted:
http://pastebin.com/raw.php?i=XeBNkQZT

It's not much more than what I've already posted.
Let me know how I can help you track down the issue further.
Thanks!

On Nov 25, 4:13 pm, Eliot Horowitz <el...@10gen.com> wrote:
> Can you send the mongos log?
>

Kristina Chodorow

Nov 28, 2011, 1:06:57 PM
to mongodb-user
Sounds like there's a corrupt document on your offerstore/
s141:27018,s142:27018,s144:27018,s143:27018 shard. Did a mongod ever
crash without journaling enabled? Do you have a clean copy of that
data somewhere?
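
To check for corruption, you can run the validate command against the
suspect collection, connecting directly to a shard member rather than
through mongos. A minimal sketch with the Java driver; the host/port
and the collection name "offers" are placeholders:

    import com.mongodb.BasicDBObject;
    import com.mongodb.CommandResult;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    public class ValidateShardData {
        public static void main(String[] args) throws Exception {
            // Connect directly to one member of the suspect shard
            // (hypothetical host/port)
            Mongo mongo = new Mongo("s141", 27018);
            DB db = mongo.getDB("offerstore");

            // validate scans the collection and reports problems;
            // "offers" is a placeholder collection name
            CommandResult result = db.command(
                    new BasicDBObject("validate", "offers"));
            System.out.println(result);
            mongo.close();
        }
    }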

Shi Shei

Nov 29, 2011, 7:48:53 AM
to mongodb-user
At the time we were using 1.8.2, without journaling of course. It's
quite possible that a mongod crashed back then.
For a few days now we have been running 2.0.1, where journaling is
enabled by default.

Our data changes continuously, so we can't keep a "clean copy" of it;
any backup is always outdated.
That's what a replica set is good for, isn't it? Also, with journaling
enabled, the data should stay consistent even when the server crashes.

So, do we need to repair our database to overcome the issue?

Kristina Chodorow

Nov 29, 2011, 12:56:24 PM
to mongodb-user
The advantage of replication is that you have a higher chance of
having a clean copy of the data somewhere. If you have a member of
the replica set that hasn't crashed, you could resync the primary from
it and everything should work fine.

If every member of the set has crashed at some point without
journaling enabled, you should repair. Repairing removes any
corruption (it doesn't fix it, so you'll lose corrupt documents), so
if you're going to go that route, I'd recommend trying it on a couple
of different members until you find the one with the most recoverable
data, then using that to re-seed the set.
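
A repair is usually run offline with "mongod --repair" on the data
directory, but you can also issue the repairDatabase command from a
driver while connected directly to the member (not through mongos).
A minimal sketch, with an illustrative host/port:

    import com.mongodb.BasicDBObject;
    import com.mongodb.CommandResult;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    public class RepairShardMember {
        public static void main(String[] args) throws Exception {
            // Connect directly to the member being repaired
            // (hypothetical host/port)
            Mongo mongo = new Mongo("s141", 27018);
            DB db = mongo.getDB("offerstore");

            // repairDatabase rewrites the data files and drops documents
            // it can't read, so corrupt documents are lost, as noted above.
            // It blocks the server and needs free disk space roughly equal
            // to the current data size.
            CommandResult result = db.command(
                    new BasicDBObject("repairDatabase", 1));
            System.out.println(result);
            mongo.close();
        }
    }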

Journaling keeps data safe through crashes, but it doesn't
retroactively fix corrupted data.
