Intermittent Stack Trace Causing MongoDB to Crash

John A

May 3, 2016, 11:50:38 AM
to mongodb-user
We have a production 4-node replica set (including 1 hidden, non-voting, data-bearing member) running MongoDB v2.6.12. For the past 1-2 months we have seen an intermittent issue where 1 of the 3 voting members crashes. When the crash happens, there is a stack trace in the MongoDB log file:

2016-05-03T01:04:55.211-0400 [conn28306]  authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                                      mongo::printStackTrace+0x43
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\util\signal_handlers.cpp(107)                                 mongo::`anonymous namespace'::abruptQuit+0xf2
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\winsig.c(593)                   raise+0x1ed
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\abort.c(81)                     abort+0x18
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\db\instance.cpp(342)                                          mongo::mongoAbort+0x6e
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(647)                                               `mongo::dur::groupCommitWithLimitedLocks'::`1'::catch$0+0x97
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\SELF_64_amd64\crt\prebuild\eh\amd64\handlers.asm(44)  _CallSettingFrame+0x20
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\prebuild\eh\frame.cpp(1337)         __CxxCallCatchBlock+0xeb
2016-05-03T01:04:55.273-0400 [journal] ntdll.dll                                                                                 RtlCaptureContext+0x3c3
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(643)                                               mongo::dur::groupCommitWithLimitedLocks+0x27
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(783)                                               mongo::dur::durThreadGroupCommit+0x6d
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(855)                                               mongo::dur::durThread+0x27a
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)             boost::`anonymous namespace'::thread_start_function+0x21
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)                 _callthreadstartex+0x17
2016-05-03T01:04:55.273-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)                 _threadstartex+0x7f
2016-05-03T01:04:55.273-0400 [journal] KERNEL32.DLL                                                                              BaseThreadInitThunk+0xd
2016-05-03T01:04:55.273-0400 [journal] SEVERE: Got signal: 22 (SIGABRT).
Backtrace:

This affects only our production replica set. I do not know how to proceed with troubleshooting it. Can someone please advise me on how to find the source of the problem?

John A

May 4, 2016, 9:59:09 AM
to mongodb-user
I found the first occurrence of this issue in our environment. It looks like it started after one of the replica set members ran out of disk space. The disk space issue has since been fixed, but the crashes continue to happen intermittently with the same error messages. Does this mean my database is corrupt? Would db.repairDatabase() fix it? Is there any way to determine this for certain?

Here's the disk space message we got on the 17th of April along with the first crash message:

2016-04-17T01:14:22.408-0400 [conn3032850] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [#.#.#.#:53821]
2016-04-17T01:14:22.892-0400 [journal] error exception in dur::journal error appending to file D:\MongoDB\data\journal\j._390 8192 8192 errno:112 There is not enough space on the disk.
2016-04-17T01:14:22.908-0400 [journal] dbexception in groupCommitLL causing immediate shutdown: 13517 error appending to file D:\MongoDB\data\journal\j._390 8192 8192 errno:112 There is not enough space on the disk.
2016-04-17T01:14:22.924-0400 [journal] SEVERE: dur1
2016-04-17T01:14:25.689-0400 [conn3032205]  authenticate db: admin { authenticate: 1, nonce: "xxx", user: "mmsagentuser", key: "xxx" }
2016-04-17T01:14:28.127-0400 [conn3032682] end connection #.#.#.#:52202 (82 connections now open)
2016-04-17T01:14:28.127-0400 [initandlisten] connection accepted from #.#.#.#:52682 #3032928 (83 connections now open)
2016-04-17T01:14:28.127-0400 [conn3032928]  authenticate db: MyDB { authenticate: 1, user: "MyUser", nonce: "xxx", key: "xxx" }
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                                      mongo::printStackTrace+0x43
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\util\signal_handlers.cpp(107)                                 mongo::`anonymous namespace'::abruptQuit+0xf2
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\winsig.c(593)                   raise+0x1ed
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\abort.c(81)                     abort+0x18
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\db\instance.cpp(342)                                          mongo::mongoAbort+0x6e
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(647)                                               `mongo::dur::groupCommitWithLimitedLocks'::`1'::catch$0+0x97
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\SELF_64_amd64\crt\prebuild\eh\amd64\handlers.asm(44)  _CallSettingFrame+0x20
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\prebuild\eh\frame.cpp(1337)         __CxxCallCatchBlock+0xeb
2016-04-17T01:14:32.517-0400 [journal] ntdll.dll                                                                                 RtlCaptureContext+0x3c3
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(643)                                               mongo::dur::groupCommitWithLimitedLocks+0x27
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(783)                                               mongo::dur::durThreadGroupCommit+0x6d
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\mongo\db\dur.cpp(855)                                               mongo::dur::durThread+0x27a
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)             boost::`anonymous namespace'::thread_start_function+0x21
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)                 _callthreadstartex+0x17
2016-04-17T01:14:32.517-0400 [journal] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)                 _threadstartex+0x7f
2016-04-17T01:14:32.517-0400 [journal] KERNEL32.DLL                                                                              BaseThreadInitThunk+0xd
2016-04-17T01:14:32.517-0400 [journal] SEVERE: Got signal: 22 (SIGABRT).
Backtrace:
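
Since the errno:112 entries above are reported against the journal directory, one quick check is how much free space the volume holding that directory has right now. The following is only a minimal sketch: the function name `free_space_report` and the threshold parameter are made up for illustration, and the right threshold depends on your journal and data growth.

```python
import shutil


def free_space_report(path, min_free_bytes):
    """Report free space on the volume that contains `path`.

    `min_free_bytes` is a hypothetical alert threshold chosen by the caller;
    it is not a value recommended by MongoDB.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "low": usage.free < min_free_bytes,
    }
```

For example, `free_space_report(r"D:\MongoDB\data\journal", 5 * 1024**3)` would flag the journal volume from the log above when it has less than 5 GiB free (the 5 GiB figure is an arbitrary assumption).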

John A

May 4, 2016, 10:42:18 AM
to mongodb-user
I did more digging and found that every crash was preceded by an exception about low disk space. That is the cause of the problem. I will work to address that issue. Please consider this closed.
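
For anyone who wants to run the same check against their own logs, here is a rough sketch of the correlation described above: it scans log lines for `errno:112` entries and reports, for each `Got signal: 22 (SIGABRT)` line, whether a disk space error appeared shortly before it. The function names and the 60-second window are assumptions for illustration, and the timestamp format is taken from the log excerpts in this thread.

```python
import sys
from datetime import datetime, timedelta

# Timestamp layout as it appears in the log excerpts, e.g. 2016-04-17T01:14:22.892-0400
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if the line has none."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, TS_FORMAT)
    except ValueError:
        return None


def crashes_after_disk_errors(lines, window_seconds=60):
    """Return (crash_time, preceded_by_disk_error) for each SIGABRT line.

    A crash counts as preceded by a disk error when the most recent
    errno:112 line is at most `window_seconds` older than the crash.
    The window length is an arbitrary assumption.
    """
    window = timedelta(seconds=window_seconds)
    last_disk_error = None
    results = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines such as "Backtrace:" carry no timestamp
        if "errno:112" in line:
            last_disk_error = ts
        elif "Got signal: 22 (SIGABRT)" in line:
            preceded = last_disk_error is not None and ts - last_disk_error <= window
            results.append((ts, preceded))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_crashes.py <path to mongod log>  (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for crash_time, preceded in crashes_after_disk_errors(log):
            print(crash_time.isoformat(), "disk error before crash:", preceded)
```

A crash reported with `False` would suggest a cause other than disk space, or a disk error that fell outside the chosen window.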

