Error: [repl writer worker 15] out of memory.


Darshan Shah

Apr 8, 2016, 4:20:48 PM
to mongodb-user
I am running MongoDB 3.0.6 with 15 shards, each shard being a 3-member replica set.
The third member of each replica set is a hidden member, and all 15 of these hidden members run on the same single machine, i.e. one machine hosts a hidden secondary for each of the 15 shards.
In this setup, one hidden secondary mongod consistently fails with an out-of-memory error even though plenty of memory is available.
Note that this secondary belongs to the replica set that is the primary shard for unsharded data, and this freshly created setup contains only a single small database with one collection.

Any help / pointers highly appreciated - Thanks!

The error in the log is:

2016-04-08T15:22:46.632-0400 I STORAGE  [FileAllocator] allocating new datafile /data_mount/mongodata/shard001/local/local.24, filling with zeroes...
2016-04-08T15:22:46.634-0400 I STORAGE  [FileAllocator] done allocating datafile /data_mount/mongodata/shard001/local/local.24, size: 2047MB,  took 0.001 secs
2016-04-08T15:22:46.634-0400 I STORAGE  [FileAllocator] allocating new datafile /data_mount/mongodata/shard001/local/local.25, filling with zeroes...
2016-04-08T15:22:46.635-0400 I STORAGE  [FileAllocator] done allocating datafile /data_mount/mongodata/shard001/local/local.25, size: 2047MB,  took 0.001 secs
2016-04-08T15:22:46.642-0400 I REPL     [rsSync] ******
2016-04-08T15:22:46.642-0400 I REPL     [rsSync] initial sync pending
2016-04-08T15:22:46.642-0400 I REPL     [ReplicationExecutor] syncing from: mongoserver002.public.mycomp.com:29111
2016-04-08T15:22:46.644-0400 I REPL     [rsSync] initial sync drop all databases
2016-04-08T15:22:46.644-0400 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2016-04-08T15:22:46.644-0400 I REPL     [rsSync] initial sync clone all databases
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] initial sync data copy, starting syncup
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] oplog sync 1 of 3
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] oplog sync 2 of 3
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] initial sync building indexes
2016-04-08T15:22:46.645-0400 I REPL     [rsSync] oplog sync 3 of 3
2016-04-08T15:22:46.646-0400 I REPL     [rsSync] initial sync finishing up
2016-04-08T15:22:46.646-0400 I REPL     [rsSync] replSet set minValid=57080501:1
2016-04-08T15:22:46.647-0400 I REPL     [rsSync] initial sync done
2016-04-08T15:22:46.649-0400 I REPL     [ReplicationExecutor] transition to RECOVERING
2016-04-08T15:22:46.649-0400 I REPL     [ReplicationExecutor] transition to SECONDARY
2016-04-08T15:22:47.581-0400 I REPL     [ReplicationExecutor] could not find member to sync from
2016-04-08T15:22:47.583-0400 I REPL     [ReplicationExecutor] Member mongoserver009.public.mycomp.com:29112 is now in state SECONDARY
2016-04-08T15:33:26.684-0400 I REPL     [ReplicationExecutor] syncing from: mongoserver002.public.mycomp.com:29111
2016-04-08T15:33:26.685-0400 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to mongoserver002.public.mycomp.com:29111
2016-04-08T15:33:26.686-0400 I INDEX    [repl writer worker 15] allocating new ns file /data_mount/mongodata/shard001/myglobal/myglobal.ns, filling with zeroes...
2016-04-08T15:33:26.691-0400 F -        [repl writer worker 15] out of memory.

 0xf75549 0xf74e29 0x142202f 0xd1fe95 0xd4bd2c 0xd4f08f 0x914f35 0x9221b9 0x9222f1 0xca8904 0xcab815 0xf0881b 0xfc3684 0x3625c079d1 0x36258e89dd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B75549"},{"b":"400000","o":"B74E29"},{"b":"400000","o":"102202F"},{"b":"400000","o":"91FE95"},{"b":"400000","o":"94BD2C"},{"b":"400000","o":"94F08F"},{"b":"400000","o":"514F35"},{"b":"400000","o":"5221B9"},{"b":"400000","o":"5222F1"},{"b":"400000","o":"8A8904"},{"b":"400000","o":"8AB815"},{"b":"400000","o":"B0881B"},{"b":"400000","o":"BC3684"},{"b":"3625C00000","o":"79D1"},{"b":"3625800000","o":"E89DD"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "2.6.32-504.3.3.el6.x86_64", "version" : "#1 SMP Fri Dec 12 16:05:43 EST 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7F7AA372EDE22BA34234ADA10A8AF2E665681140" }, { "b" : "7FFF3A6AA000", "elfType" : 3, "buildId" : "E752C57E2BD5883E5CE1211B21FC5859B4520D90" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "174BE4CAD6B9CDE9463A1ED403A8A45667042F1B" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "58B33C1A58DAD354D36CB87FD14997F06BF1497D" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F860E546E4C495D35A272CBD8B5E6B31DD4B1A7F" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "114449B5D483AC2FAEE7DD8CD72F086D2C9E7BC0" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "3BD3452E91CB76304CCC9665D8742E55EF2EB903" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "743EA30ADF8E973D45AB59200C307F5ABC2749F6" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "BAA4D2FBB45B33028E58C6C4524D5F0D69C0FD60" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "7918143A0396110395C28377A1F202C769EFAC65" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "733AD3A438B5A695F7A63A77413F9B2C8C94E8E6" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "3BFA45BCE0E82E1D90D37A0CC8630F97F2003BF5" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "AF7DB57AA4CA5C35AEEEFDB94CC1B97827C710FA" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "07BD094ED077DA56CDD76B8F562586745BA01326" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "54FF9CD35F9E7E253F66C458DC902307190E0F80" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "D053BB4FF0C2FC983842F81598813B9B931AD0D1" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "129AD60521E5EF66722CA7C3DA6FC854DA5A8CDB" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "3BCCABE75DC61BBA81AAE45D164E26EF4F9F55DB" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "D02CF15D0B3F2216AD2C54CA960F028BF3C5E4D0" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "2D0F26E648D9661ABD83ED8B4BBE8F2CFA50393B" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf75549]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x49) [0xf74e29]
 mongod(tc_new+0x1AF) [0x142202f]
 mongod(_ZN5mongo14NamespaceIndex4initEPNS_16OperationContextE+0x425) [0xd1fe95]
 mongod(_ZN5mongo26MMAPV1DatabaseCatalogEntryC1EPNS_16OperationContextERKNS_10StringDataES5_bb+0x13C) [0xd4bd2c]
 mongod(_ZN5mongo12MMAPV1Engine23getDatabaseCatalogEntryEPNS_16OperationContextERKNS_10StringDataE+0x1AF) [0xd4f08f]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextERKNS_10StringDataEPb+0x105) [0x914f35]
 mongod(_ZN5mongo6Client7Context11_finishInitEv+0xF9) [0x9221b9]
 mongod(_ZN5mongo6Client7ContextC2EPNS_16OperationContextERKSsb+0x61) [0x9222f1]
 mongod(_ZN5mongo4repl8SyncTail9syncApplyEPNS_16OperationContextERKNS_7BSONObjEb+0x284) [0xca8904]
 mongod(_ZN5mongo4repl14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x65) [0xcab815]
 mongod(_ZN5mongo10threadpool6Worker4loopERKSs+0x2FB) [0xf0881b]
 mongod(+0xBC3684) [0xfc3684]
 libpthread.so.0(+0x79D1) [0x3625c079d1]
 libc.so.6(clone+0x6D) [0x36258e89dd]
-----  END BACKTRACE  -----


Here are the other parameters, showing the limits, free memory, and other info:

[mongouser@mongoserver001 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3097936
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[mongouser@mongoserver001 ~]$ /sbin/sysctl vm.max_map_count
vm.max_map_count = 131072
[mongouser@mongoserver001 ~]$ free -ltg
             total       used       free     shared    buffers     cached
Mem:           378          7        370          0          0          3
Low:           378          7        370
High:            0          0          0
-/+ buffers/cache:          3        374
Swap:           62          0         62
Total:         440          7        433
[mongouser@mongoserver001 ~]$

Ankur Raina

Apr 18, 2016, 6:42:41 AM
to mongodb-user

Hi Darshan,

The deployment configuration you have described looks much like the one in your other thread, except that it had arbiters where the current configuration has hidden replica set members. Please review the overcommit settings as recommended there.
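
For reference, here is a quick way to check and adjust those settings (a sketch only; vm.overcommit_memory and vm.overcommit_ratio are standard Linux kernel parameters, and a value of 2 in particular can make large mmap-based allocations fail even while free memory is available):

# Check the current overcommit policy: 0 = heuristic, 1 = always allow, 2 = never overcommit
/sbin/sysctl vm.overcommit_memory vm.overcommit_ratio
# With vm.overcommit_memory=2, MMAPv1's large virtual mappings can be refused
# despite plenty of free RAM; the heuristic default (0) is the usual remedy:
sudo /sbin/sysctl -w vm.overcommit_memory=0
# To persist across reboots, add "vm.overcommit_memory = 0" to /etc/sysctl.conf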

The third member of each replica set is a hidden member, and all 15 of these hidden members run on the same single machine, i.e. one machine hosts a hidden secondary for each of the 15 shards.

Is each member running on a separate virtual machine that follows the production notes recommendations?

Since every secondary has to write the same data as its primary, this single machine may be heavily overloaded.

In the log, I see that the initial sync completed, and the instance was then allocating a new namespace file (myglobal.ns) when it failed with the “out of memory” error.

  • Since it is performing an initial sync, did you try emptying its dbpath directory and restarting the process (see the first sketch after this list)?

  • You should monitor the system's exact memory usage around this time (if you are able to reproduce the failure), for example with an automated script that records db.serverStatus().mem and system iostat every second (see the second sketch after this list). Alternatively, mongostat 1 reports similar statistics. This will help you identify spikes in memory usage.
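
A minimal sketch of the resync from the first point, using the dbpath from the log above. The config file path is hypothetical; substitute however you normally start this mongod, and move the old data aside rather than deleting it outright:

# Stop the affected hidden member first (adjust to your service manager)
mongod --shutdown --dbpath /data_mount/mongodata/shard001
# Keep the old data as a backup, then recreate an empty dbpath
mv /data_mount/mongodata/shard001 /data_mount/mongodata/shard001.bak
mkdir /data_mount/mongodata/shard001
# Restart with the same options; the member will perform a fresh initial sync
mongod --config /etc/mongod_shard001.conf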
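
And a rough version of the monitoring loop from the second point (assumes the mongo shell is on the PATH; replace <port> with the affected member's port):

while true; do
  date '+%FT%T'
  # resident/virtual/mapped memory as reported by the server itself
  mongo --port <port> --quiet --eval 'printjson(db.serverStatus().mem)'
  # system-level view of memory and disk activity
  free -m | head -n 2
  iostat -dx 1 1
  sleep 1
done >> mem_monitor.log 2>&1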

Regards
Ankur
