Mongodb Wired Tiger library panic exit

1,628 views
Skip to first unread message

Sherry Ummen

unread,
Mar 31, 2016, 5:37:26 AM3/31/16
to mongodb-user
Hi,

We have monogbd runnign on a server; its crashing randomnly atleast once in two days

Here is the log

2016-03-28T04:21:49.921+0300 E STORAGE  WiredTiger (-27550) [1459128109:913851][1216:8792152871504], log-server: C:\server\pr\journal\*: FindFirstFile: error return: 1450
2016-03-28T04:21:49.923+0300 E STORAGE  WiredTiger (-27550) [1459128109:922858][1216:8792152871504], log-server: dirlist journal prefix WiredTigerPreplog: error return: 1450
2016-03-28T04:21:49.923+0300 E STORAGE  WiredTiger (-27550) [1459128109:923858][1216:8792152871504], log-server: log pre-alloc server error: error return: 1450
2016-03-28T04:21:49.923+0300 E STORAGE  WiredTiger (-27550) [1459128109:923858][1216:8792152871504], log-server: log server error: error return: 1450
2016-03-28T04:22:29.843+0300 E STORAGE  WiredTiger (-27550) [1459128149:841224][1216:8792152871504], file:WiredTiger.wt, session.checkpoint: C:\server\pr\WiredTiger.turtle.set: error return: 1450
2016-03-28T04:22:29.843+0300 E STORAGE  WiredTiger (-27550) [1459128149:843228][1216:8792152871504], checkpoint-server: checkpoint server error: error return: 1450
2016-03-28T04:22:29.843+0300 E STORAGE  WiredTiger (-31804) [1459128149:843228][1216:8792152871504], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2016-03-28T04:22:29.844+0300 I -        Fatal Assertion 28558
2016-03-28T04:22:29.854+0300 I CONTROL  Stack trace failed, SymInitialize failed with error 8
2016-03-28T04:22:29.854+0300 I CONTROL 
2016-03-28T04:22:29.854+0300 I -       


Could someone please help me to understand what is going wrong. Thanks

-Sherry

Sherry Ummen

unread,
Apr 1, 2016, 2:53:54 AM4/1/16
to mongodb-user
We are using 3.0.6 version with WT

Kevin Adistambha

unread,
Apr 5, 2016, 3:30:42 AM4/5/16
to mongodb-user

Hi Sherry,

2016-03-28T04:21:49.921+0300 E STORAGE WiredTiger (-27550) [1459128109:913851][1216:8792152871504], log-server: C:\server\pr\journal*: FindFirstFile: error return: 1450

The error reported is that the WiredTiger engine is unable to list the contents of the directory C:\server\pr\journal\.

Could you check:

  • Is the directory C:\server\pr\journal\ local on the server, or mounted remote (eg. NFS, CIFS, …)?
  • Does the mongod process have permission to read and write to that directory?
  • Is there some kind of process that could prevent the mongod process from reading that directory (e.g. antivirus, etc.)?
  • Are you sure that there isn’t already a running instance of mongod.exe on this machine?

I would also encourage you to upgrade to the latest 3.0.x version (3.0.11 currently) for critical bug fixes.

Best regards,
Kevin

Sherry Ummen

unread,
Apr 5, 2016, 5:13:43 AM4/5/16
to mongodb-user
Ok I will upgrade and monitor.

Also the check list you made seems invalid in my case because this is happening randomnly. Which means at some point it worked and then failed.

Anyway I will upgrade and see

Sherry Ummen

unread,
Apr 5, 2016, 5:14:01 AM4/5/16
to mongodb-user
Thank you

Tim Boudreau

unread,
Mar 22, 2017, 4:29:25 PM3/22/17
to mongodb-user
FWIW, I am able to reproduce this or a very similar crash (same assertion error number, at any rate).  I have a test harness for a bunch of Java unit tests, which creates a new temporary directory, starts mongodb, configures a client and then injects that into the tests.


To ensure that the server is up and running, after starting the process, the test harness attempts to open a socket connection to the server in a loop until it succeeds or fails repeatedly for long enough to assume that something is wrong (it immediately closes the connection on success - it is not actually used).

In order for this process to not make build times enormously long, I had the test harness wait only 150ms after launching the process before attempting to connect.  With recent versions of mongodb, the attempt to connect (or, from the stack trace, perhaps closing the connection was the trigger) would cause a very similar crash.  Increasing the delay to 500ms cured the problem.

I would guess that mongod starts listening for connections before it's fully initialized.

-Tim

Kevin Adistambha

unread,
Mar 23, 2017, 5:57:47 PM3/23/17
to mongodb-user

Hi Tim

I would guess that mongod starts listening for connections before it’s fully initialized.

Actually MongoDB will not listen for connections before all initialization processes have finished. This is reflected in the logs, where the line waiting for connections on port 27017 should be printed after all initialization steps are done. If you’re curious about this, you can increase the log level in the command line option using the —setParameter option. For example:

mongod --setParameter logLevel=5 <other required parameters>

will start the server with the highest log level. Please note that running at such an elevated logging level could be detrimental for performance, and is not recommended for production deployments.

Regarding the assertion error you saw, could you please provide more details:

  • What is your MongoDB version and your OS version
  • What command line parameters you used to start the server
  • Could you also provide a self-contained script that can reliably reproduce the assertion error you’re seeing

Best regards
Kevin

Tim Boudreau

unread,
Mar 24, 2017, 3:58:37 AM3/24/17
to mongod...@googlegroups.com

Regarding the assertion error you saw, could you please provide more details:

  • What is your MongoDB version and your OS version
db version v3.4.1
git version: 5e103c4f5583e2566a45d740225dc250baacfbd7
OpenSSL version: OpenSSL 1.0.2j  26 Sep 2016
allocator: tcmalloc
modules: none
build environment:
    distarch: x86_64
    target_arch: x86_64 

OS:  Gentoo Linux, kernel 4.9.4

Target filesystem is BTRFS, running on an SSD on a laptop.
  • What command line parameters you used to start the server
mongod --dbpath /tmp/$GENERATED_DIR_NAME --nojournal --smallfiles -nssize 1 --noprealloc --slowms 5 --port $RANDOM_PORT --maxconns 50 --nohttpinterface --syncdelay 0 --oplogsize 1 --diaglog 0
 
  • Could you also provide a self-contained script that can reliably reproduce the assertion error you’re seeing
That's straightforward, if not small:

cd mastfrog-parent
git checkout mongodb-crash
./init

The init script will check out a bunch of Git submodules, and then initiate a Maven build which should reproduce the crash.  You'll need Maven 3.3.x and mongod on the PATH, and JAVA_HOME set to a JDK 8 install (using Oracle's, not OpenJDK).  The mongodb-crash branch simply backs up two submodules to the commit before I increased the delay.

The code that fails is in 

giulius-modules/giulius-parent/giulius-mongodb-async/src/test/java/com/mastfrog/giulius/mongodb/async/MongoHarness.java

which you can see online here:

https://github.com/timboudreau/giulius/blob/master/giulius-mongodb-async/src/test/java/com/mastfrog/giulius/mongodb/async/MongoHarness.java

- search for the text 150 for the delay before connecting in the original code on the mongodb-crash branch (in master it now is a constant CONNECT_WAIT_MILLIS).

Failure is when building the subproject giulius-modules/giulius-parent/giulius-mongo-async (for repeated builds, you can just cd there and run mvn install).

Let me know if you need any further info.

-Tim

Kevin Adistambha

unread,
Mar 27, 2017, 1:34:03 AM3/27/17
to mongodb-user

Hi Tim,

Thanks for the detailed response. I ran the script as you suggested, and managed to reproduce the Fatal Assertion 28558 error. Here’s the relevant part from the MongoDB log that I got:

2017-03-27T02:24:09.588+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2017-03-27T02:24:09.588+0000 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2017-03-27T02:24:09.588+0000 I NETWORK  [signalProcessingThread] closing listening socket: 6
2017-03-27T02:24:09.588+0000 I NETWORK  [signalProcessingThread] closing listening socket: 7
2017-03-27T02:24:09.588+0000 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-29001.sock
2017-03-27T02:24:09.588+0000 I NETWORK  [signalProcessingThread] shutdown: going to flush diaglog...
2017-03-27T02:24:09.588+0000 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
2017-03-27T02:24:09.589+0000 I STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
2017-03-27T02:24:09.589+0000 I -        [conn2] end connection 127.0.0.1:43644 (2 connections now open)
2017-03-27T02:24:09.592+0000 I -        [conn1] end connection 127.0.0.1:43642 (1 connection now open)
2017-03-27T02:24:09.593+0000 E STORAGE  [thread2] WiredTiger error (2) [1490581449:593073][13464:0x7fd5d2c54700], file:WiredTiger.wt, WT_SESSION.checkpoint: /tmp/mongo-1490581448019/WiredTiger.turtle.set: handle-open: open: No such file or directory
2017-03-27T02:24:09.593+0000 E STORAGE  [thread2] WiredTiger error (2) [1490581449:593144][13464:0x7fd5d2c54700], checkpoint-server: checkpoint server error: No such file or directory
2017-03-27T02:24:09.593+0000 E STORAGE  [thread2] WiredTiger error (-31804) [1490581449:593152][13464:0x7fd5d2c54700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-03-27T02:24:09.593+0000 I -        [thread2] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-03-27T02:24:09.593+0000 I -        [thread2]
***aborting after fassert() failure
2017-03-27T02:24:09.599+0000 F -        [thread2] Got signal: 6 (Aborted).

Could you please confirm that you’re seeing a similar progression of event in your log?

If you are, then apparently your test script removes MongoDB’s data files before MongoDB has a chance to shut itself down. This is evident in the message No such file or directory. When you specified your script to wait 500ms, it seems to allow MongoDB to shut down cleanly.

It’s worth noting that you’re using MMAPv1 specific command line parameters that doesn’t apply to WiredTiger, such as --smallfiles, --nssize, --noprealloc. Another option that you don’t seem to need is --oplogsize, since you are not running a replica set. The --diaglog option was also deprecated since MongoDB 2.6.

Also, please note that MongoDB was not tested with BTRFS, so there could be other subtle interaction between the filesystem and WiredTiger that resulted in obscure errors. For the recommended setting, please have a look at the Production Notes.

Best regards
Kevin

Reply all
Reply to author
Forward
0 new messages