2016-04-08T15:22:46.632-0400 I STORAGE  [FileAllocator] allocating new datafile /data_mount/mongodata/shard001/local/local.24, filling with zeroes...
2016-04-08T15:22:46.634-0400 I STORAGE  [FileAllocator] done allocating datafile /data_mount/mongodata/shard001/local/local.24, size: 2047MB, took 0.001 secs
2016-04-08T15:22:46.634-0400 I STORAGE  [FileAllocator] allocating new datafile /data_mount/mongodata/shard001/local/local.25, filling with zeroes...
2016-04-08T15:22:46.635-0400 I STORAGE  [FileAllocator] done allocating datafile /data_mount/mongodata/shard001/local/local.25, size: 2047MB, took 0.001 secs
2016-04-08T15:22:46.642-0400 I REPL     [rsSync] ******
2016-04-08T15:22:46.642-0400 I REPL     [rsSync] initial sync pending
2016-04-08T15:22:46.642-0400 I REPL     [ReplicationExecutor] syncing from: mongoserver002.public.mycomp.com:29111
2016-04-08T15:22:46.644-0400 I REPL     [rsSync] initial sync drop all databases
2016-04-08T15:22:46.644-0400 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2016-04-08T15:22:46.644-0400 I REPL     [rsSync] initial sync clone all databases
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] initial sync data copy, starting syncup
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] oplog sync 1 of 3
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] oplog sync 2 of 3
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] initial sync building indexes
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] oplog sync 3 of 3
2016-04-08T15:22:46.646-0400 I REPL     [rsSync] initial sync finishing up
2016-04-08T15:22:46.646-0400 I REPL     [rsSync] replSet set minValid=57080501:1
2016-04-08T15:22:46.647-0400 I REPL     [rsSync] initial sync done
2016-04-08T15:22:46.649-0400 I REPL     [ReplicationExecutor] transition to RECOVERING
2016-04-08T15:22:46.649-0400 I REPL     [ReplicationExecutor] transition to SECONDARY
2016-04-08T15:22:47.581-0400 I REPL     [ReplicationExecutor] could not find member to sync from
2016-04-08T15:22:47.583-0400 I REPL     [ReplicationExecutor] Member mongoserver009.public.mycomp.com:29112 is now in state SECONDARY
2016-04-08T15:33:26.684-0400 I REPL     [ReplicationExecutor] syncing from: mongoserver002.public.mycomp.com:29111
2016-04-08T15:33:26.685-0400 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to mongoserver002.public.mycomp.com:29111
2016-04-08T15:33:26.686-0400 I INDEX    [repl writer worker 15] allocating new ns file /data_mount/mongodata/shard001/myglobal/myglobal.ns, filling with zeroes...
2016-04-08T15:33:26.691-0400 F -        [repl writer worker 15] out of memory.
 0xf75549 0xf74e29 0x142202f 0xd1fe95 0xd4bd2c 0xd4f08f 0x914f35 0x9221b9 0x9222f1 0xca8904 0xcab815 0xf0881b 0xfc3684 0x3625c079d1 0x36258e89dd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B75549"},{"b":"400000","o":"B74E29"},{"b":"400000","o":"102202F"},{"b":"400000","o":"91FE95"},{"b":"400000","o":"94BD2C"},{"b":"400000","o":"94F08F"},{"b":"400000","o":"514F35"},{"b":"400000","o":"5221B9"},{"b":"400000","o":"5222F1"},{"b":"400000","o":"8A8904"},{"b":"400000","o":"8AB815"},{"b":"400000","o":"B0881B"},{"b":"400000","o":"BC3684"},{"b":"3625C00000","o":"79D1"},{"b":"3625800000","o":"E89DD"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "2.6.32-504.3.3.el6.x86_64", "version" : "#1 SMP Fri Dec 12 16:05:43 EST 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7F7AA372EDE22BA34234ADA10A8AF2E665681140" }, { "b" : "7FFF3A6AA000", "elfType" : 3, "buildId" : "E752C57E2BD5883E5CE1211B21FC5859B4520D90" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "174BE4CAD6B9CDE9463A1ED403A8A45667042F1B" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "58B33C1A58DAD354D36CB87FD14997F06BF1497D" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F860E546E4C495D35A272CBD8B5E6B31DD4B1A7F" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "114449B5D483AC2FAEE7DD8CD72F086D2C9E7BC0" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "3BD3452E91CB76304CCC9665D8742E55EF2EB903" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "743EA30ADF8E973D45AB59200C307F5ABC2749F6" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "BAA4D2FBB45B33028E58C6C4524D5F0D69C0FD60" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "7918143A0396110395C28377A1F202C769EFAC65" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "733AD3A438B5A695F7A63A77413F9B2C8C94E8E6" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "3BFA45BCE0E82E1D90D37A0CC8630F97F2003BF5" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "AF7DB57AA4CA5C35AEEEFDB94CC1B97827C710FA" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "07BD094ED077DA56CDD76B8F562586745BA01326" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "54FF9CD35F9E7E253F66C458DC902307190E0F80" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "D053BB4FF0C2FC983842F81598813B9B931AD0D1" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "129AD60521E5EF66722CA7C3DA6FC854DA5A8CDB" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "3BCCABE75DC61BBA81AAE45D164E26EF4F9F55DB" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "D02CF15D0B3F2216AD2C54CA960F028BF3C5E4D0" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "2D0F26E648D9661ABD83ED8B4BBE8F2CFA50393B" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf75549]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x49) [0xf74e29]
 mongod(tc_new+0x1AF) [0x142202f]
 mongod(_ZN5mongo14NamespaceIndex4initEPNS_16OperationContextE+0x425) [0xd1fe95]
 mongod(_ZN5mongo26MMAPV1DatabaseCatalogEntryC1EPNS_16OperationContextERKNS_10StringDataES5_bb+0x13C) [0xd4bd2c]
 mongod(_ZN5mongo12MMAPV1Engine23getDatabaseCatalogEntryEPNS_16OperationContextERKNS_10StringDataE+0x1AF) [0xd4f08f]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextERKNS_10StringDataEPb+0x105) [0x914f35]
 mongod(_ZN5mongo6Client7Context11_finishInitEv+0xF9) [0x9221b9]
 mongod(_ZN5mongo6Client7ContextC2EPNS_16OperationContextERKSsb+0x61) [0x9222f1]
 mongod(_ZN5mongo4repl8SyncTail9syncApplyEPNS_16OperationContextERKNS_7BSONObjEb+0x284) [0xca8904]
 mongod(_ZN5mongo4repl14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x65) [0xcab815]
 mongod(_ZN5mongo10threadpool6Worker4loopERKSs+0x2FB) [0xf0881b]
 mongod(+0xBC3684) [0xfc3684]
 libpthread.so.0(+0x79D1) [0x3625c079d1]
 libc.so.6(clone+0x6D) [0x36258e89dd]
----- END BACKTRACE -----
[mongouser@mongoserver001 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3097936
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[mongouser@mongoserver001 ~]$ /sbin/sysctl vm.max_map_count
vm.max_map_count = 131072
[mongouser@mongoserver001 ~]$ free -ltg
             total       used       free     shared    buffers     cached
Mem:           378          7        370          0          0          3
Low:           378          7        370
High:            0          0          0
-/+ buffers/cache:          3        374
Swap:           62          0         62
Total:         440          7        433
[mongouser@mongoserver001 ~]$
Hi Darshan,
The deployment configuration you have described looks much like the one you provided in the other thread, except that it had arbiters where the current configuration has hidden replica set members. Please review the overcommit settings as recommended there.
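If it helps, the kernel's current overcommit settings can be read directly from procfs on any Linux host (these paths are standard; the values shown will of course be specific to each machine):

```shell
# Read the kernel's memory overcommit policy.
# 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting.
cat /proc/sys/vm/overcommit_memory
# The ratio is only consulted when the policy above is 2.
cat /proc/sys/vm/overcommit_ratio
```

The same values can be changed at runtime with sysctl -w, or persisted across reboots in /etc/sysctl.conf.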
The third member of each replica set is a hidden member running on the same single machine, i.e. one machine hosts the hidden secondary members for all 15 shards.
Is each of these members running on a separate virtual machine, following the production notes recommendations?
Since all of these secondaries have to write the same data as their primaries, this one machine may be severely overloaded.
In the log, I see that the initial sync completed and the instance then began allocating a new namespace file (myglobal.ns). During this allocation, the instance failed with an "out of memory" error; the backtrace shows the failure occurring inside NamespaceIndex::init.
Since this member is still performing an initial sync, have you tried emptying its dbpath directory and restarting the process to force a fresh sync?
You should monitor the exact memory usage of the system around this time (if you are able to reproduce the issue), perhaps with an automated script that records db.serverStatus().mem and the output of iostat every second. Alternatively, you can use mongostat 1 to collect similar statistics. This will help you identify spikes in memory usage.
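As a rough sketch of such a sampling script (the port 29111 and the log file name mem_usage.log are assumptions based on the log above; for real monitoring you would raise the sample count and run it in the background):

```shell
# Take five one-second samples of mongod memory stats and disk activity.
# Assumes the legacy "mongo" shell and iostat are on PATH, and that the
# member under investigation listens on port 29111.
for i in 1 2 3 4 5; do
  date '+%Y-%m-%dT%H:%M:%S'
  mongo --port 29111 --quiet --eval 'printjson(db.serverStatus().mem)' 2>/dev/null
  iostat -x 1 1 2>/dev/null
  sleep 1
done >> mem_usage.log
```

Correlating the timestamps in mem_usage.log with the mongod log should make any memory spike around the namespace-file allocation visible.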
Regards
Ankur