Problem with replica set initial sync

iragsdale

Nov 2, 2011, 12:35:40 AM
to mongodb-user
Hi all. I have a replica set set up as one shard in a cluster, and over
the weekend the primary went down and the secondary took over as
primary. I've been trying to bring up a new box to act as the
primary, and the initial sync has failed at least 3-4 times.

It copies over the initial data (takes about 12 hours), and then
begins applying the oplog, and each time I get this in the logs:

Tue Nov 1 17:37:57 [rsSync] build index done 54628 records 0.294 secs
Tue Nov 1 17:37:57 [rsSync] replSet initial sync cloning db: config
Tue Nov 1 17:37:57 [rsSync] replSet initial sync query minValid
Tue Nov 1 17:38:04 [rsSync] replSet initial oplog application from mongobackup.cloudsmtp.com:27018 starting at Nov 1 07:11:22:8 to Nov 1 17:37:57:d7
Tue Nov 1 17:38:04 [rsSync] replSet info adding missing object
Tue Nov 1 17:38:04 [rsSync] Assertion failure !e.eoo() db/repl/../../bson/bsonobjbuilder.h 120
0x57eeb6 0x589d6b 0x827cbe 0x82bbc1 0x82da98 0x823168 0x82439a 0x824820 0xaa4560 0x7f57856639ca 0x7f5784c1270d
mongod(_ZN5mongo12sayDbContextEPKc+0x96) [0x57eeb6]
mongod(_ZN5mongo8assertedEPKcS1_j+0xfb) [0x589d6b]
mongod(_ZN5mongo11ReplSetImpl27initialSyncOplogApplicationEPKNS_6MemberENS_6OpTimeES4_+0x341e) [0x827cbe]
mongod(_ZN5mongo11ReplSetImpl18_syncDoInitialSyncEv+0x1261) [0x82bbc1]
mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x28) [0x82da98]
mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x58) [0x823168]
mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x4a) [0x82439a]
mongod(_ZN5mongo15startSyncThreadEv+0xa0) [0x824820]
mongod(thread_proxy+0x80) [0xaa4560]
/lib/libpthread.so.0(+0x69ca) [0x7f57856639ca]
/lib/libc.so.6(clone+0x6d) [0x7f5784c1270d]
Tue Nov 1 17:38:04 [rsSync] replSet initial sync failing, error applying oplog 0 assertion db/repl/../../bson/bsonobjbuilder.h:120
Tue Nov 1 17:38:04 [rsSync] replSet initial sync failed during applyoplog
Tue Nov 1 17:38:04 [rsSync] replSet cleaning up [1]
Tue Nov 1 17:38:04 [rsSync] replSet cleaning up [2]
Tue Nov 1 17:38:10 [rsSync] replSet initial sync pending
Tue Nov 1 17:38:10 [rsSync] replSet syncing to: mongobackup.cloudsmtp.com:27018
Tue Nov 1 17:38:10 [rsSync] replSet initial sync drop all databases
Tue Nov 1 17:38:10 [rsSync] dropAllDatabasesExceptLocal 71

It then begins the sync all over again. How can I add a new member to
this replica set? Is there anything I can do to debug the problem further?

Eliot Horowitz

Nov 2, 2011, 1:35:34 AM
to mongod...@googlegroups.com
What version is this with?

iragsdale

Nov 2, 2011, 2:21:31 AM
to mongodb-user
Sorry, I forgot to include that. This is 2.0.1, 64-bit Linux,
running on Ubuntu.

iragsdale

Nov 2, 2011, 1:37:12 PM
to mongodb-user
Is there any other info I can provide?

iragsdale

Nov 3, 2011, 1:57:11 AM
to mongodb-user
Are there really no suggestions to be made? This is happening
consistently and repeatably on two of our shards.

Nat

Nov 3, 2011, 2:37:38 AM
to mongod...@googlegroups.com
Did you try to bring it down and repair it before trying to sync up?

iragsdale

Nov 3, 2011, 3:23:07 AM
to mongodb-user
Bring down the primary instance and repair it? That would require
many hours of downtime, since my only replica failed. I'm really
hoping to avoid doing that.

Nat

Nov 3, 2011, 7:58:38 AM
to mongod...@googlegroups.com
The error you were getting usually means that the database is corrupted, so when you tried to resync, it failed.

iragsdale

Nov 3, 2011, 12:22:17 PM
to mongodb-user
Well, I seem to have gotten things up and running. Reverting to
MongoDB 2.0.0 eliminated that error and allowed the initial sync to
proceed. It got mostly caught up, and then ran into a problem where it
was trying to add a unique index on _id to some capped collections
that didn't have unique _id values. Fortunately that was a simple
problem to fix, and after that it caught up completely, and I have a
working replica again. I'm just about to try it on our other shard to
see if I get the same result.

- Ian

iragsdale

Nov 3, 2011, 3:28:13 PM
to mongodb-user
Well, I thought it was working OK, but I've got a new problem.

I have a set of capped collections (all very small) with no _id
indexes. They're updated pretty frequently. For some reason the
secondary keeps insisting on adding a unique index on {_id:1} to
them as it replicates (all I see in the log is "build index
whatever_db.rc_60x60_relayed_messages { _id: 1 }"), and then either
fails on the index creation or fails on the next insert, because
there is no unique _id field. This stops replication, and I no longer
have a working secondary.

- Ian

I've spent some time looking through the oplog to see what is causing
th

iragsdale

Nov 3, 2011, 3:52:05 PM
to mongodb-user
I accidentally cut off the part where I said I've looked through the
oplog to try to figure out why the indexes are being created, and I
can't see anything. None of them have _id indexes on the primary.

Greg Studer

Nov 4, 2011, 5:42:56 PM
to mongodb-user
I think the issue you're running into is SERVER-2019 and related
issues: basically, replication requires an _id field. Workarounds for
now include moving those collections to the local database, if they
are storing only transient information, but then they won't be
replicated. Alternatively, if you create an _id index on your capped
collections and resync, replication should work properly; this
requires every document in those collections to have an _id field.
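Greg's second workaround only works if every document in the capped collection carries an `_id` that no other document shares; Ian's log shows either the index build or the next insert failing otherwise. Below is a minimal Python sketch of a pre-flight check. It assumes the collection has already been exported and loaded as a list of dicts in natural order (for instance from a JSON dump), and that `_id` values are scalars or extended-JSON objects such as `{"$oid": "..."}`. `find_id_problems` is a hypothetical helper, not part of MongoDB or its drivers.

```python
import json
from collections import Counter
from typing import Any, Dict, Hashable, List, Tuple


def _id_key(value: Any) -> Hashable:
    # Extended-JSON exports wrap ObjectIds as {"$oid": "..."}. Dicts are not
    # hashable, so compare them through a canonical JSON string instead.
    if isinstance(value, dict):
        return json.dumps(value, sort_keys=True)
    return value


def find_id_problems(docs: List[Dict[str, Any]]) -> Tuple[List[int], List[Hashable]]:
    """Find documents that would block a unique index on {_id: 1}.

    Returns (positions of documents with no _id field,
             _id keys that appear on more than one document).
    Positions refer to the order of `docs` (the collection's natural order).
    """
    missing = [i for i, doc in enumerate(docs) if "_id" not in doc]
    counts = Counter(_id_key(doc["_id"]) for doc in docs if "_id" in doc)
    duplicates = [key for key, n in counts.items() if n > 1]
    return missing, duplicates
```

For example, `[{"_id": 1}, {"n": 6}, {"_id": 1}]` reports position 1 as missing an `_id` and `1` as a duplicated value; both would have to be fixed before a unique `_id` index could be built.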

iragsdale

Nov 22, 2011, 4:28:34 PM
to mongodb-user
It sounds like capped collections should have _id fields and indexes
added by default, or the documentation should state that they won't
work with replication; the documentation for capped collections
doesn't mention this at all.

In order to add the _id index, I'm going to have to dump and re-create
those collections: the existing documents have no _id field, so adding
one would make them grow, and documents in a capped collection aren't
allowed to grow. Thankfully these are very small collections, so it
will just be a huge pain.

This seems like a very serious bug to me: just using a capped
collection can cause replication to fail completely, even if the capped
collection is the only thing failing to sync. It seems like this should
be highlighted prominently in the capped collection documentation,
especially since it could be extremely hard to add that index after
the fact; if these were multi-gigabyte collections that had to be
dumped and reinserted, that would be a real operational problem.

- Ian
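Ian's plan amounts to exporting each capped collection, giving every document an `_id`, and reinserting the documents in their original order into a freshly created capped collection that has an `_id` index. The transformation step is sketched below in Python, assuming the dump has been loaded as a list of dicts in natural order. `with_ids` is a hypothetical helper, and `uuid4().hex` is only a stand-in ID generator; a real migration would more likely generate BSON ObjectIds through the driver.

```python
import uuid
from typing import Any, Callable, Dict, Iterable, List


def with_ids(docs: Iterable[Dict[str, Any]],
             make_id: Callable[[], Any] = lambda: uuid.uuid4().hex) -> List[Dict[str, Any]]:
    """Return copies of `docs`, in the same order, each carrying an _id.

    Documents that already have an _id keep it; the others get make_id().
    The input documents are not modified.
    """
    result = []
    for doc in docs:
        # Put _id first, then copy the remaining fields in their original order.
        new_doc = {"_id": doc["_id"] if "_id" in doc else make_id()}
        new_doc.update((k, v) for k, v in doc.items() if k != "_id")
        result.append(new_doc)
    return result
```

Reinserting the returned list in order would preserve the original natural order in the new capped collection. Creating that collection and performing the inserts depend on the driver and are not shown here.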

Greg Studer

Nov 27, 2011, 4:04:53 PM
to mongodb-user
This should definitely be much better marked in the docs; I'll update
them. I believe the previous behavior was for replication to create an
_id index implicitly, but that hurts performance, and performance is
generally why people use capped collections in the first place.

Grégoire Seux

Dec 1, 2011, 8:48:36 AM
to mongod...@googlegroups.com
I am having the same issue with version 2.0.0. Is there a bug report for this issue?

Greg Studer

Dec 1, 2011, 11:41:50 AM
to mongodb-user
SERVER-2019 for the bug, and DOCS-86 for the documentation. Definitely
vote on the issue so it gets more visibility.