Troubleshooting instance crash

fernande...@gmail.com

Aug 2, 2017, 12:58:30 AM
to mongodb-user
Hi,

We are running a three-node MongoDB replica set on version 3.2.14. All nodes run CentOS Linux release 7.1.1503 with kernel version 3.10.0-229.14.1.el7.x86_64.

Today, one of our secondary nodes crashed with the following errors:

2017-08-01T12:10:57.174+0200 I COMMAND  [conn431] command **** command: dbStats { dbStats: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:187 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { R: 1 } } } protocol:op_query 228ms
2017-08-01T12:10:57.175+0200 I COMMAND  [conn2996] command ****.files-7001 command: getMore { getMore: 22136202299973, collection: "files-7001", batchSize: 10 } planSummary: IXSCAN { destination: 1, fileName: 1, active: 1 } cursorid:22136202299973 keysExamined:10 docsExamined:10 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:10 reslen:23228 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 190475 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 190ms
2017-08-01T12:10:57.175+0200 I COMMAND  [conn2809] command ****.files-3 command: getMore { getMore: 21738151959370, collection: "files-3", batchSize: 10 } planSummary: IXSCAN { destination: 1, fileName: 1, active: 1 } cursorid:21738151959370 keysExamined:10 docsExamined:10 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:10 reslen:29238 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 104112 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 104ms
2017-08-01T12:10:57.175+0200 I COMMAND  [conn3019] command ****.files-6841 command: getMore { getMore: 22105935130565, collection: "files-6841", batchSize: 10 } planSummary: IXSCAN { destination: 1, fileName: 1, active: 1 } cursorid:22105935130565 keysExamined:10 docsExamined:10 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:10 reslen:50568 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 174026 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 174ms
2017-08-01T12:10:57.175+0200 I COMMAND  [conn2801] command ****.collection-***********-YAL command: find { find: "collection-***********-YAL", filter: { node: 0, destination: "YAL", type: { $exists: false }, clients: 5001 }, sort: { node: 1, destination: 1, incoming: 1, contract: 1 } } planSummary: IXSCAN { node: 1, destination: 1, incoming: 1, contract: 1 } keysExamined:9 docsExamined:9 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:6 reslen:10553 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 221604 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 222ms
2017-08-01T12:10:57.175+0200 I COMMAND  [conn3020] command ****.files-6581 command: getMore { getMore: 22059538225790, collection: "files-6581", batchSize: 10 } planSummary: IXSCAN { destination: 1, fileName: 1, active: 1 } cursorid:22059538225790 keysExamined:0 docsExamined:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:10 reslen:37784 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 141312 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 141ms
2017-08-01T12:10:57.175+0200 I COMMAND  [conn3020] command ****.files-6581 command: getMore { getMore: 22059538225790, collection: "files-6581", batchSize: 10 } planSummary: IXSCAN { destination: 1, fileName: 1, active: 1 } cursorid:22059538225790 keysExamined:0 docsExamined:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:10 reslen:37784 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 141312 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 141ms
2017-08-01T12:10:57.577+0200 F -        [thread1] Invalid access at address: 0x78
2017-08-01T12:10:57.608+0200 F -        [thread1] Got signal: 11 (Segmentation fault).

 0x1349b82 0x1348cd9 0x1349058 0x7f60cc30c370 0x7f60cc307bb0 0x1a0c2bb 0x1a105e5 0x1a5eac5 0x1a593ba 0x1a59917 0x1a5b19a 0x1ac37e6 0x7f60cc304dc5 0x7f60cc03376d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F49B82","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F48CD9"},{"b":"400000","o":"F49058"},{"b":"7F60CC2FD000","o":"F370"},{"b":"7F60CC2FD000","o":"ABB0","s":"__pthread_mutex_unlock"},{"b":"400000","o":"160C2BB"},{"b":"400000","o":"16105E5","s":"__wt_split_multi"},{"b":"400000","o":"165EAC5","s":"__wt_evict"},{"b":"400000","o":"16593BA"},{"b":"400000","o":"1659917"},{"b":"400000","o":"165B19A","s":"__wt_evict_thread_run"},{"b":"400000","o":"16C37E6","s":"__wt_thread_run"},{"b":"7F60CC2FD000","o":"7DC5"},{"b":"7F60CBF3C000","o":"F776D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.14", "gitVersion" : "92f6668a768ebf294bd4f494c50f48459198e6a3", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-229.14.1.el7.x86_64", "version" : "#1 SMP Tue Sep 15 15:05:51 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "9033F908262138155963DFBEDCA6620B91383BC9" }, { "b" : "7FFDCDE6A000", "elfType" : 3, "buildId" : "E62F10F01C3E0DBFFBCD03D2359B63AFB3CBC24E" }, { "b" : "7F60CD224000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "BB96EE99138B19FECDAB55E80A1728B648ECAD50" }, { "b" : "7F60CCE3D000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "B154203FB7C05AEE29D5D6F6C000305191209FE4" }, { "b" : "7F60CCC35000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7F60CCA31000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7F60CC72F000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7F60CC519000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7F60CC2FD000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : 
"7F60CBF3C000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "626F78F2F88860B8844A5FAC68823D9605F802B2" }, { "b" : "7F60CD491000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "0874508AA13D28E3F48637C1D5BF067BA8D9FD3A" }, { "b" : "7F60CBCEE000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7F60CBA07000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7F60CB803000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "3A1166709F88740C49E060731832E3FAD2DFB66B" }, { "b" : "7F60CB5D1000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7F60CB3BB000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "E45643F27F3B3E960F3691AFC6EC27A98EF7B46B" }, { "b" : "7F60CB1AC000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7F60CAFA8000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F60CAD8E000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7F60CAB67000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7F60CA906000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "298B19C64B19995F2AA4DA7B852E90BA5302F630" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1349b82]
 mongod(+0xF48CD9) [0x1348cd9]
 mongod(+0xF49058) [0x1349058]
 libpthread.so.0(+0xF370) [0x7f60cc30c370]
 libpthread.so.0(__pthread_mutex_unlock+0x0) [0x7f60cc307bb0]
 mongod(+0x160C2BB) [0x1a0c2bb]
 mongod(__wt_split_multi+0x85) [0x1a105e5]
 mongod(__wt_evict+0xA55) [0x1a5eac5]
 mongod(+0x16593BA) [0x1a593ba]
 mongod(+0x1659917) [0x1a59917]
 mongod(__wt_evict_thread_run+0x6A) [0x1a5b19a]
 mongod(__wt_thread_run+0x16) [0x1ac37e6]
 libpthread.so.0(+0x7DC5) [0x7f60cc304dc5]
 libc.so.6(clone+0x6D) [0x7f60cc03376d]
-----  END BACKTRACE  -----

Since this is the first time we have had to deal with a server crash, could someone please give me some tips on the procedure to follow to troubleshoot the issue?

Regards

Weishan Ang

Aug 2, 2017, 11:17:15 AM
to mongodb-user
It might be worth creating a Jira ticket with MongoDB.

Wan Bachtiar

Aug 4, 2017, 12:18:08 AM
to mongodb-user

> We are running a Mongodb replica set with three nodes version 3.2.14, all of them running on CentOS Linux release 7.1.1503 with kernel version 3.10.0-229.14.1.el7.x86_64.

Hi Fernandez,

Are you running the WiredTiger storage engine?
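One way to answer that question is to read the `storageEngine.name` field of the `serverStatus` command output. The following is a minimal sketch, not a definitive procedure: the helper name `storage_engine_name` and the sample document are made up for illustration, and the connection string in the commented usage is an assumption about a local deployment.

```python
def storage_engine_name(server_status):
    """Return the storage engine name from a serverStatus document, or None."""
    return server_status.get("storageEngine", {}).get("name")


# Hypothetical, heavily trimmed serverStatus document used only for illustration.
SAMPLE_STATUS = {
    "version": "3.2.14",
    "storageEngine": {"name": "wiredTiger"},
}

if __name__ == "__main__":
    print(storage_engine_name(SAMPLE_STATUS))

    # Against a live node, with pymongo installed, the document could be
    # fetched like this (host and port are assumptions):
    #   from pymongo import MongoClient
    #   status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")
    #   print(storage_engine_name(status))
```

The `__wt_` prefixed symbols in the posted backtrace already point at WiredTiger code, so this check mainly serves as confirmation.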

If so, and based on the specific information you’ve provided below:

  • MongoDB version 3.2.14
  • mongod log: [thread1] Invalid access at address: 0x78
  • Backtrace log:
      mongod(__wt_split_multi+0x85) [0x1a105e5]
      mongod(__wt_evict+0xA55) [0x1a5eac5]

It is very likely that you’ve encountered the issue described in SERVER-29850. Please see the ticket for more information.
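For anyone wanting to check their own crash log for the same frames, here is a rough sketch that pulls symbol names out of the human-readable part of a mongod backtrace and lists the WiredTiger ones. It assumes the frame layout shown in the log above (`module(symbol+0xOFFSET) [0xADDRESS]`); the function names and the regular expression are my own and are not part of any MongoDB tooling.

```python
import re

# Matches frames such as " mongod(__wt_evict+0xA55) [0x1a5eac5]".
# The symbol part may be empty, as in " mongod(+0x160C2BB) [0x1a0c2bb]".
FRAME_RE = re.compile(
    r"^\s*(?P<module>[^\s(]+)\((?P<symbol>[^+)]*)\+0x[0-9A-Fa-f]+\)"
    r"\s+\[0x[0-9A-Fa-f]+\]"
)


def parse_frames(backtrace_text):
    """Return (module, symbol) pairs for every line that looks like a frame."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group("module"), match.group("symbol")))
    return frames


def wiredtiger_symbols(frames):
    """Keep only symbols carrying the __wt_ prefix used by WiredTiger."""
    return [symbol for _, symbol in frames if symbol.startswith("__wt_")]


# A few frames copied from the backtrace in this thread.
SAMPLE = """\
 mongod(+0x160C2BB) [0x1a0c2bb]
 mongod(__wt_split_multi+0x85) [0x1a105e5]
 mongod(__wt_evict+0xA55) [0x1a5eac5]
 mongod(__wt_evict_thread_run+0x6A) [0x1a5b19a]
"""

if __name__ == "__main__":
    symbols = wiredtiger_symbols(parse_frames(SAMPLE))
    print(symbols)
    # Both frames quoted above must be present for the match to be plausible.
    print({"__wt_split_multi", "__wt_evict"} <= set(symbols))
```

Finding both frames only suggests the same code path was involved; the ticket remains the place to confirm whether the crash is the same issue.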

Please try upgrading to v3.2.16, currently the latest revision of the 3.2 series (download center: previous versions), which contains the fix for the issue.

See Upgrade to the latest revision of MongoDB for upgrade details.
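If you have several nodes to check, a small sketch like the one below can compare each node's reported version (for example the `mongodbVersion` value in the backtrace's `processInfo`) against the fixed revision. The helper names are hypothetical, and the comparison assumes plain `major.minor.patch` version strings without release-candidate suffixes.

```python
def parse_version(text):
    """Turn '3.2.14' or 'v3.2.16' into a tuple of ints, e.g. (3, 2, 14)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def needs_upgrade(running, fixed_in):
    """True if `running` is an older revision than `fixed_in` in the same series."""
    running_v = parse_version(running)
    fixed_v = parse_version(fixed_in)
    if running_v[:2] != fixed_v[:2]:
        # Comparing across release series (e.g. 3.2 vs 3.4) is out of scope here.
        raise ValueError("versions belong to different release series")
    return running_v < fixed_v


if __name__ == "__main__":
    print(needs_upgrade("3.2.14", "v3.2.16"))
```

Tuples of integers are compared element by element, which avoids the string-comparison trap where "3.2.9" would sort after "3.2.16".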

Regards,
Wan.
