Secondary will not build indexes if crash/restart during STARTUP2

Nimi Wariboko Jr

Sep 28, 2012, 2:20:31 PM
to mongod...@googlegroups.com
We are attempting to migrate ~300GB of data and ~100GB of indexes to another replica set. We previously had a machine on HDDs that could sync perfectly but was perpetually 24 hours behind. We upgraded to a server with SSDs and tried an initial sync twice, and both times the server shut down unexpectedly.

Fri Sep 28 04:06:04 [initandlisten] connection accepted from :37317 #1748 (18 connections now open)
Fri Sep 28 04:06:06 [rsSync] 40000000/374169652 10%
Fri Sep 28 04:06:18 [rsSync] 42000000/374169652 11%


***** SERVER RESTARTED *****


Fri Sep 28 04:09:35 [initandlisten] MongoDB starting : pid=898 port=27018 dbpath=/var/lib/mongodb 64-bit
Fri Sep 28 04:09:35 [initandlisten] db version v2.2.0, pdfile version 4.5
Fri Sep 28 04:09:35 [initandlisten] git version: f5e83eae9cfbec7fb7a071321928f00d1b0c5207

The shutdown itself is still unexplained, but it wouldn't really be an issue if the machine hadn't come back up as a SECONDARY, eligible to become primary, with no indexes. We weren't aware of this at first, so we stepped down our other machine, and the fact that the new machine had no indexes (except _id) caused a lot of trouble.
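
A minimal sketch of a check that could be run before stepping down the other member: compare the index names on each collection between the two machines. The function name and the sample data are hypothetical. The name lists themselves would have to be gathered separately, for example with a driver call such as PyMongo's `Collection.index_information()` against each member; that part is an assumption and is not shown here.

```python
def missing_indexes(primary_indexes, secondary_indexes):
    """Return, per collection, index names found on the primary but not on the secondary.

    Both arguments map a collection name to a list of index names.
    """
    missing = {}
    for coll, names in primary_indexes.items():
        absent = sorted(set(names) - set(secondary_indexes.get(coll, ())))
        if absent:
            missing[coll] = absent
    return missing


# Hypothetical index listings, for illustration only.
primary = {
    "events": ["_id_", "user_id_1", "created_at_-1"],
    "users": ["_id_", "email_1"],
}
secondary = {
    "events": ["_id_"],
    "users": ["_id_"],
}

report = missing_indexes(primary, secondary)
for coll, names in sorted(report.items()):
    print(coll, "is missing", ", ".join(names))
```

An empty result means the secondary has every index name the primary has; anything else suggests it is not safe to let that member take over.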

This may be unrelated, but we also had a lot of assertion failures in our logs:

Fri Sep 28 14:25:10 [rsSync]  local.oplog.rs warning assertion failure _intents.size() < 2000000 src/mongo/db/dur_commitjob.h 101
0xade6e1 0x802c5a 0x78c4a0 0x78c4ff 0x78c7d2 0x78c8ed 0x78c95b 0xa07c1a 0x626166 0x62de4b 0x73954c 0xb6708a 0x64b5eb 0x65345e 0x6538f8 0x65394a 0x653d58 0x7c3659 0x7fe12f2a39ca 0x7fe12e64acdd 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
 /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x11a) [0x802c5a]
 /usr/bin/mongod(_ZN5mongo3dur9CommitJob4noteEPvi+0x280) [0x78c4a0]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents8_unspoolEv+0x4f) [0x78c4ff]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents7unspoolEv+0x52) [0x78c7d2]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents4pushERKNS0_11WriteIntentE+0x6d) [0x78c8ed]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl18declareWriteIntentEPvj+0x6b) [0x78c95b]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl10writingPtrEPvj+0xa) [0xa07c1a]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails13addDeletedRecEPNS_13DeletedRecordENS_7DiskLocE+0x1a6) [0x626166]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE+0x1eb) [0x62de4b]
 /usr/bin/mongod(_ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci+0x6c) [0x73954c]
 /usr/bin/mongod(_ZN5mongo11_logOpObjRSERKNS_7BSONObjE+0x27a) [0xb6708a]
 /usr/bin/mongod(_ZN5mongo7replset8SyncTail15applyOpsToOplogEPSt5dequeINS_7BSONObjESaIS3_EE+0x4b) [0x64b5eb]
 /usr/bin/mongod(_ZN5mongo7replset8SyncTail16oplogApplicationEv+0x48e) [0x65345e]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0xb8) [0x6538f8]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2a) [0x65394a]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0xa8) [0x653d58]
 /usr/bin/mongod() [0x7c3659]
 /lib/libpthread.so.0(+0x69ca) [0x7fe12f2a39ca]
 /lib/libc.so.6(clone+0x6d) [0x7fe12e64acdd]

I checked Google and found reports that this assertion is harmless, but it ended up crashing MongoDB here:

Fri Sep 28 15:20:42 [rsSyncNotifier] dbexception in groupCommit causing immediate shutdown: 13524 out of memory AlignedBuilder
Fri Sep 28 15:20:42 gc1
Fri Sep 28 15:20:42 Got signal: 6 (Aborted).

Fri Sep 28 15:20:42 Backtrace:
0xade6e1 0x5582d9 0x7fe12e597af0 0x7fe12e597a75 0x7fe12e59b5c0 0xb503f7 0xa0a61a 0xa0a83a 0xa0818f 0xa0835c 0xad8d87 0xad9673 0xadb1b8 0x94ff58 0x95250c 0x9556cb 0x7c3659 0x7fe12f2a39ca 0x7fe12e64acdd 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5582d9]
 /lib/libc.so.6(+0x33af0) [0x7fe12e597af0]
 /lib/libc.so.6(gsignal+0x35) [0x7fe12e597a75]
 /lib/libc.so.6(abort+0x180) [0x7fe12e59b5c0]
 /usr/bin/mongod(_ZN5mongo10mongoAbortEPKc+0x47) [0xb503f7]
 /usr/bin/mongod() [0xa0a61a]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl9commitNowEv+0x1a) [0xa0a83a]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl16_aCommitIsNeededEv+0x3f) [0xa0818f]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl14commitIfNeededEb+0x4c) [0xa0835c]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite7lockTopERNS_9LockStateE+0x57) [0xad8d87]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWrite6lockDBERKSs+0xe3) [0xad9673]
 /usr/bin/mongod(_ZN5mongo4Lock7DBWriteC1ERKNS_10StringDataE+0x58) [0xadb1b8]
 /usr/bin/mongod(_ZN5mongo7replset14BackgroundSync9hasCursorEv+0x68) [0x94ff58]
 /usr/bin/mongod(_ZN5mongo7replset14BackgroundSync9markOplogEv+0x2c) [0x95250c]
 /usr/bin/mongod(_ZN5mongo7replset14BackgroundSync14notifierThreadEv+0x10b) [0x9556cb]
 /usr/bin/mongod() [0x7c3659]
 /lib/libpthread.so.0(+0x69ca) [0x7fe12f2a39ca]
 /lib/libc.so.6(clone+0x6d) [0x7fe12e64acdd]


I am out of ideas on how to solve this problem. As stated above, this was our second attempt at an initial sync.

Nimi Wariboko Jr

Sep 28, 2012, 2:35:52 PM
to mongod...@googlegroups.com
My best guess for the final crash is that the out-of-memory errors are caused by the missing indexes. The primary takes a lot of writes and updates, so when the secondary has to apply those same writes and updates with no meaningful index on the data, that most likely causes the crash. Our only options seem to be to take the secondary out and build the indexes manually, or to figure out why the indexes aren't being built during the sync.

The indexes were built before (same OS image, just different drives) and the logs aren't showing anything meaningful for the restart.

Kristina Chodorow

Oct 1, 2012, 10:53:50 AM
to mongod...@googlegroups.com
It looks like mongod is being killed by the OOM killer, which sometimes happens on large index builds (you could check your kernel log; if that's the cause, there should be a mention of why it's getting killed). MongoDB will not come up properly if it gets killed halfway through initial sync (as you've noticed).
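
A rough sketch of that kernel log check, assuming a Linux host. The exact wording of OOM killer messages varies between kernel versions, so the patterns below are a best-effort guess rather than a complete list, and the sample lines are invented for illustration (they are not from the logs posted in this thread).

```python
import re

# Phrases that commonly appear in Linux kernel logs when the OOM killer acts.
# Wording differs between kernel versions; treat these as a starting point.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer", re.IGNORECASE),
    re.compile(r"out of memory: kill(ed)? process", re.IGNORECASE),
    re.compile(r"killed process \d+", re.IGNORECASE),
]


def find_oom_kills(lines, process_name="mongod"):
    """Return the log lines that look like OOM killer activity involving process_name."""
    hits = []
    for line in lines:
        if process_name in line and any(p.search(line) for p in OOM_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits


# Invented sample lines, for illustration only.
sample = [
    "Sep 28 04:06:20 host kernel: mongod invoked oom-killer: gfp_mask=0x201da, order=0",
    "Sep 28 04:06:20 host kernel: Out of memory: Kill process 898 (mongod) score 950 or sacrifice child",
    "Sep 28 04:06:21 host kernel: eth0: link up",
]

for hit in find_oom_kills(sample):
    print(hit)
```

In practice the lines would come from the kernel log itself, for example the output of `dmesg` or a file such as `/var/log/kern.log` (the path is an assumption and depends on the distribution).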

There are a couple of possible ways of dealing with this:

1. Restore from a backup instead - do you have any backups you could use?
2. Add this member with the buildIndexes=false option. This will take more work: once it has finished its initial sync, you'll have to take it back out of the set, build the indexes, take buildIndexes=false out of the config, and bring it up again.

Let me know if you want to do #2, as it's more complicated and I can go through what the steps are exactly.
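
For reference, a sketch of what the member entry for option 2 might look like, written as a plain dictionary. `buildIndexes` and `priority` are member fields in the replica set configuration; the `_id` and host values below are hypothetical, and the helper name is made up for this example. Setting `priority` to 0 alongside `buildIndexes: false` is an assumption based on my reading of the MongoDB documentation, and it also keeps an index-less member from being elected primary.

```python
import json


def member_without_indexes(member_id, host):
    """Build a replica set member entry that skips index builds during initial sync."""
    return {
        "_id": member_id,        # hypothetical member id, must be unique in the set
        "host": host,            # hypothetical host:port of the new member
        "buildIndexes": False,   # do not build secondary indexes on this member
        "priority": 0,           # never eligible to become primary
    }


member = member_without_indexes(3, "newhost.example.com:27018")
print(json.dumps(member, indent=2))
```

The printed document has the shape that would be passed to `rs.add()` in the mongo shell or appended to the `members` array before a reconfig.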