MongoDB 1.8.1 replica set secondary node reports Invalid BSONObj size

xiangjun wu

Apr 25, 2011, 9:55:11 AM

to mongodb-user

Hello,

I keep seeing the following in the log of a MongoDB replica set secondary
node:
Mon Apr 25 08:37:48 [conn5311] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4d2fa060b22397186299f851')
0x55ece9 0x4ede7e 0x7213f8 0x65b62a 0x78d476 0x795f91 0x799715 0x79bc54 0x79c7f7 0x64543b 0x752225 0x757938 0x8a3b3e 0x8b6a40 0x7f7d0746e7e1 0x7f7d06a34ead
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x129) [0x55ece9]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x46e) [0x4ede7e]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo11BtreeCursor7currentEv+0x68) [0x7213f8]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo11UserQueryOp4nextEv+0x86a) [0x65b62a]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo12QueryPlanSet6Runner6nextOpERNS_7QueryOpE+0x56) [0x78d476]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo12QueryPlanSet6Runner3runEv+0x861) [0x795f91]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE+0x255) [0x799715]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE+0x64) [0x79bc54]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo16MultiPlanScanner5runOpERNS_7QueryOpE+0x17) [0x79c7f7]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xafb) [0x64543b]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod() [0x752225]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b8) [0x757938]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a3b3e]
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod(thread_proxy+0x80) [0x8b6a40]
/lib64/libpthread.so.0(+0x77e1) [0x7f7d0746e7e1]
/lib64/libc.so.6(clone+0x6d) [0x7f7d06a34ead]
Mon Apr 25 08:37:48 [conn5311] assertion 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: ObjectId('4d2fa060b22397186299f851') ns:dpu-production.monitoring.indices query:{ url: /^http://www\.abc\.com/ }

Below is the problem document:
db.monitoring.indices.findOne({"_id" : ObjectId("4d2fa060b22397186299f851")});
{
"_id" : ObjectId("4d2fa060b22397186299f851"),

"url" : "http://www.abc.com/profiles?http://www.abc.com/
profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profiles"
}

Kyle Banker

Apr 25, 2011, 12:19:19 PM

to mongod...@googlegroups.com

This is usually caused by some kind of corruption in the primary
node's oplog. Were you ever running the primary as 1.8.0 with
journaling enabled?
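
For illustration only: a BSON document begins with its total length stored as a signed 32-bit little-endian integer, so a record whose first four bytes are all 0xEE decodes to exactly the negative size shown in the assertion. The Python sketch below is not MongoDB's own code; the function names are made up for this example, and the 16 MiB upper bound is an assumption about the limit, which may differ between builds.

```python
import struct

# Assumption: 16 MiB is used here as the upper bound for one document;
# the exact limit enforced by a given mongod build may differ.
MAX_BSON_SIZE = 16 * 1024 * 1024


def read_bson_size(buf):
    """Return the declared document size: a signed little-endian int32 at offset 0."""
    (size,) = struct.unpack_from("<i", buf, 0)
    return size


def size_is_plausible(size, max_size=MAX_BSON_SIZE):
    """An empty BSON document is 5 bytes long, so anything smaller is invalid."""
    return 5 <= size <= max_size


corrupt = bytes([0xEE] * 4)          # the four bytes reported in the log
size = read_bson_size(corrupt)
print(size, hex(size & 0xFFFFFFFF))  # -286331154 0xeeeeeeee
print(size_is_plausible(size))       # False
```

A length prefix like this cannot come from a valid document, which is consistent with on-disk or index corruption rather than with the contents of the url field.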

xiangjun wu

Apr 26, 2011, 1:23:24 AM

to mongodb-user

We are using 1.8.1 without journaling enabled.
Do we need to enable it?

Scott Hernandez

Apr 26, 2011, 1:26:42 AM

to mongod...@googlegroups.com

Did you have any unclean shutdowns (like a crash or fault)? Are any other secondaries having issues or just that one?

xiangjun wu

Apr 27, 2011, 2:45:53 AM

to mongodb-user

Yes. Our EC2 instance crashed last week due to an EBS issue.
What do you suggest to fix or work around this issue?
mongod is started with:
/opt/mongodb-linux-x86_64-1.8.1/bin/mongod --fork --dbpath=/data/db --logpath=/data/mongodb.log --replSet production --rest

On Apr 26, 1:26 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> Did you have any unclean shutdowns (like a crash or fault)? Are any other
> secondaries having issues or just that one?
>

xiangjun wu

Apr 27, 2011, 2:53:23 AM

to mongodb-user

Only one secondary. We have not seen this error on the primary node.
How can we work around this issue?

On Apr 26, 1:26 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:

> Did you have any unclean shutdowns (like a crash or fault)? Are any other
> secondaries having issues or just that one?
>

foreverman

Apr 27, 2011, 4:38:01 AM

to mongodb-user

It seems the error is triggered whenever the result set of a query
includes that document (4d2fa060b22397186299f851).
We tried to delete that document:

Model.collection.remove( "url" => "http://www.abc.com/profiles?http://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc...." )

But it seems we cannot remove it:
=> Model.collection.count( "url" => "http://www.abc.com/profiles?http://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc.com/profileshttp://www.abc...." )
=> 1

Any suggestions on this issue?

Thanks.

Kyle Banker

Apr 27, 2011, 10:34:15 AM

to mongod...@googlegroups.com

Can you reproduce the error in the mongod log by querying the
secondary? Or does this happen when querying the primary? If it's the
secondary, then that's probably where the corruption lies. But if all
of your nodes crashed, then you may have corruption on all of them. In
that case, it may be best to run a database repair on the primary and
then resync the secondaries. This will require downtime. Is that an
option?

Feifei Jia

Apr 28, 2011, 12:45:47 AM

to mongod...@googlegroups.com

On Wed, Apr 27, 2011 at 10:34:15AM -0400, Kyle Banker wrote:
> Can you reproduce the error in the mongod log by querying the
> secondary? Or does this happen when querying the primary? If it's the
> secondary, then that's probably where the corruption lies. But if all
> of your nodes crashed, then you may have corruption on all of them. In
> that case, it may be best to run a database repair on the primary and
> then resync the secondaries. This will require downtime. Is that an
> option?

Thanks Kyle. Our database is large, and a repair would take a long time.
What about dumping the data and restoring it?

1. dump data
2. drop collection
3. restore
4. create index

Will this solve the issue?

-- 8< --

--
Cheers
Feifei Jia

Scott Hernandez

Apr 28, 2011, 1:04:47 AM

to mongod...@googlegroups.com

Were you able to tell if it was a problem with all the replicas?

There is a mongodump repair option in recent versions but it locks, just like the global repair.

On Wed, Apr 27, 2011 at 9:45 PM, Feifei Jia <feif...@gmail.com> wrote:
> On Wed, Apr 27, 2011 at 10:34:15AM -0400, Kyle Banker wrote:
> > Can you reproduce the error in the mongod log by querying the
> > secondary? Or does this happen when querying the primary? If it's the
> > secondary, then that's probably where the corruption lies. But if all
> > of your nodes crashed, then you may have corruption on all of them. In
> > that case, it may be best to run a database repair on the primary and
> > then resync the secondaries. This will require downtime. Is that an
> > option?
>
> Thanks Kyle, our database is large, and repair will cost a lot of time.
> What about dumping the data, and restore back?
>
> 1. dump data
> 2. drop collection
> 3. restore
> 4. create index

It depends on the scope of the corruption. Do you have a good idea of that?

> Will this solve the issue?

It is hard to say without answers to the questions above.

Feifei Jia

Apr 28, 2011, 2:13:12 AM

to mongod...@googlegroups.com

On Wed, Apr 27, 2011 at 10:04:47PM -0700, Scott Hernandez wrote:
> Were you able to tell if it was a problem with all the replicas?

Yes, it happened in all replicas.

> There is a mongodump repair option in recent versions but it locks, just
> like the global repair.

So we cannot use that feature now?

>
> On Wed, Apr 27, 2011 at 9:45 PM, Feifei Jia <feif...@gmail.com> wrote:
>
> > On Wed, Apr 27, 2011 at 10:34:15AM -0400, Kyle Banker wrote:
> > > Can you reproduce the error in the mongod log by querying the
> > > secondary? Or does this happen when querying the primary? If it's the
> > > secondary, then that's probably where the corruption lies. But if all
> > > of your nodes crashed, then you may have corruption on all of them. In
> > > that case, it may be best to run a database repair on the primary and
> > > then resync the secondaries. This will require downtime. Is that an
> > > option?
> >
> > Thanks Kyle, our database is large, and repair will cost a lot of time.
> > What about dumping the data, and restore back?
> >
> > 1. dump data
> > 2. drop collection
> > 3. restore
> > 4. create index
> >
>

> It depends on the scope of the corruption. Do you have a good idea of that?

We believe that only one collection is corrupted.

>
> >
> > Will this solve the issue?
> >
>

> It is hard to say without the answers from the questions.

Do you think the problem is related to the index? I'm not sure whether to
use remove() or drop() to delete the collection.

Thanks in advance.

Feifei Jia

Apr 28, 2011, 5:10:53 AM

to mongodb-user

After running the db.collection.reIndex() command on the primary node,
the query succeeds on the primary only; the problem remains on the
secondary node.

Is this because the reIndex() operation is not executed on the secondary
node?

Do we have to drop and re-create the index on the primary node?

Thanks.

Kyle Banker

Apr 28, 2011, 10:58:19 AM

to mongod...@googlegroups.com

Yes. Drop and recreate is equivalent, and that will propagate to
secondary nodes.
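
As a rough sketch of the drop-and-recreate approach, assuming a pymongo-style collection object connected to the primary: the helper name `rebuild_index` is hypothetical, and the `url` key in the usage note is only a guess based on the query in the log, since the thread does not say which index is affected.

```python
def rebuild_index(collection, keys):
    """Drop the index described by `keys`, then create it again.

    `collection` is expected to behave like a pymongo Collection that is
    connected to the primary; `keys` is a list of (field, direction)
    pairs, for example [("url", 1)].
    """
    # Dropping by key specification avoids having to guess the index name.
    collection.drop_index(keys)
    # Recreate the same index and return whatever the driver reports
    # (pymongo returns the name of the created index).
    return collection.create_index(keys)
```

With a real connection this would be called as something like `rebuild_index(db["monitoring.indices"], [("url", 1)])`, where `db` is the database handle on the primary. Building a large index can block other operations for a while, so it is worth trying on a copy of the data first.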

xiangjun wu

Apr 28, 2011, 1:59:56 PM

to mongodb-user

After we ran "db.foo.reIndex()" on the primary node, the problem
disappeared there.
But judging from their logs, the secondary nodes never started
reindexing, so we still run into the problem on the secondary node.
It seems the reindex operation does not propagate to secondary
nodes. Is that right?
A full resync should work, but it is too painful.
Can you give us some ideas?
Thank you!

On Apr 28, 10:58 pm, Kyle Banker <k...@10gen.com> wrote:
> Yes. Drop and recreate is equivalent, and that will propagate to
> secondary nodes.
>

Feifei Jia

Apr 28, 2011, 10:01:06 PM

to mongod...@googlegroups.com

On Thu, Apr 28, 2011 at 10:58:19AM -0400, Kyle Banker wrote:
> Yes. Drop and recreate is equivalent, and that will propagate to
> secondary nodes.

So reIndex() will not propagate to secondary nodes? Is it OK to run
reIndex() on secondary nodes (since the primary has already re-created
the index)?

Scott Hernandez

Apr 28, 2011, 10:15:59 PM

to mongod...@googlegroups.com

On Thu, Apr 28, 2011 at 7:01 PM, Feifei Jia <feif...@gmail.com> wrote:
> On Thu, Apr 28, 2011 at 10:58:19AM -0400, Kyle Banker wrote:
> > Yes. Drop and recreate is equivalent, and that will propagate to
> > secondary nodes.
>
> So reIndex() will not propagate to secondary nodes? Is it OK to run
> reIndex() on secondary nodes (since the primary has already re-created
> the index)?

Yes, that is how it is done.

You can see this chart for a list of commands and whether they can be run on slaves or require admin db access: http://www.mongodb.org/display/DOCS/List%20of%20database%20commands

Feifei Jia

Apr 29, 2011, 2:55:15 AM

to mongod...@googlegroups.com

On Thu, Apr 28, 2011 at 07:15:59PM -0700, Scott Hernandez wrote:
> > On Thu, Apr 28, 2011 at 10:58:19AM -0400, Kyle Banker wrote:
> > > Yes. Drop and recreate is equivalent, and that will propagate to
> > > secondary nodes.
> >
> > So reIndex() will not propagate to secondary nodes? Is it OK to run
> > reIndex() on secondary nodes (since the primary has already re-created
> > the index)?
> >
>
> Yes, that is how it is done.
>
> You can see this chart for a list of commands and if they can be run on
> slaves, or require admin db access:
> http://www.mongodb.org/display/DOCS/List%20of%20database%20commands
>

First, we tried running reIndex() on the secondary node, only to find that
the indexes were dropped but could not be created again:

set1:SECONDARY> db.foo.reIndex()
{
    "nIndexesWas" : 24,
    "msg" : "indexes dropped for collection",
    "errmsg" : "exception: no index name specified",
    "code" : 12523,
    "ok" : 0
}

-- 8< --

Does this mean that users should not run reIndex() on secondary nodes?

--
Cheers
Feifei Jia
