primary config server crashes with Fatal Assertion 13515 errno:28


Darshan Shah

Aug 27, 2015, 11:20:38 AM
to mongodb-user
I have a sharded MongoDB setup, and only the primary config server crashes with Fatal Assertion 13515 (errno 28), even though there is over 1 TB of disk space available on the local ext4 disk.
Since this is a testing instance, there isn't much data or read/write activity on the servers either.

Any help is highly appreciated - thanks in advance.

Here is the log:


2015-08-26T22:20:01.974-0400 [conn39] command admin.$cmd command: fsync { fsync: true } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) W:9 reslen:51 113ms
2015-08-26T22:20:02.098-0400 [conn39] CMD fsync: sync:1 lock:0
2015-08-26T22:20:02.382-0400 [conn43] command admin.$cmd command: getLastError { getlasterror: 1, fsync: 1 } ntoreturn:1 keyUpdates:0 numYields:0  reslen:140 102ms
2015-08-26T22:20:02.710-0400 [conn78] command admin.$cmd command: getLastError { getlasterror: 1, fsync: 1 } ntoreturn:1 keyUpdates:0 numYields:0  reslen:122 117ms
2015-08-26T22:20:02.710-0400 [conn71] command admin.$cmd command: getLastError { getlasterror: 1, fsync: 1 } ntoreturn:1 keyUpdates:0 numYields:0  reslen:140 125ms
2015-08-26T22:20:02.907-0400 [conn71] command admin.$cmd command: getLastError { getlasterror: 1, fsync: 1 } ntoreturn:1 keyUpdates:0 numYields:0  reslen:122 103ms
2015-08-26T22:20:03.161-0400 [conn67] CMD fsync: sync:1 lock:0
2015-08-26T22:20:03.287-0400 [conn67] CMD fsync: sync:1 lock:0
2015-08-26T22:20:05.308-0400 [conn95] CMD fsync: sync:1 lock:0
2015-08-26T22:20:05.433-0400 [conn95] CMD fsync: sync:1 lock:0
2015-08-26T22:20:05.637-0400 [journal] LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes;  b=0x422a000 errno:28 No space left on device
2015-08-26T22:20:05.637-0400 [journal] Fatal Assertion 13515
2015-08-26T22:20:05.665-0400 [journal] 0x121eb61 0x11be149 0x11a0c5d 0x11be5a8 0xa781f7 0xa78462 0xa6b538 0xa6dbe9 0xa6df27 0x1263929 0x7f2d441fadf3 0x7f2d434e91ad
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121eb61]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11be149]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x11a0c5d]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x1a8) [0x11be5a8]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e7) [0xa781f7]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0xa78462]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x158) [0xa6b538]
 /oas_alias/mongodb/mongodb/bin/mongod() [0xa6dbe9]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6df27]
 /oas_alias/mongodb/mongodb/bin/mongod() [0x1263929]
 /lib64/libpthread.so.0(+0x7df3) [0x7f2d441fadf3]
 /lib64/libc.so.6(clone+0x6d) [0x7f2d434e91ad]
2015-08-26T22:20:05.665-0400 [journal]

***aborting after fassert() failure


2015-08-26T22:20:05.669-0400 [journal] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x121eb61 0x121df3e 0x7f2d43428640 0x7f2d434285c9 0x7f2d43429cd8 0x11a0cca 0x11be5a8 0xa781f7 0xa78462 0xa6b538 0xa6dbe9 0xa6df27 0x1263929 0x7f2d441fadf3 0x7f2d434e91ad
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x121eb61]
 /oas_alias/mongodb/mongodb/bin/mongod() [0x121df3e]
 /lib64/libc.so.6(+0x35640) [0x7f2d43428640]
 /lib64/libc.so.6(gsignal+0x39) [0x7f2d434285c9]
 /lib64/libc.so.6(abort+0x148) [0x7f2d43429cd8]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x11a0cca]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x1a8) [0x11be5a8]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e7) [0xa781f7]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0xa78462]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x158) [0xa6b538]
 /oas_alias/mongodb/mongodb/bin/mongod() [0xa6dbe9]
 /oas_alias/mongodb/mongodb/bin/mongod(_ZN5mongo3dur9durThreadEv+0x297) [0xa6df27]
 /oas_alias/m
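The errno in the [journal] line is the operating system's error code, not a MongoDB code. As a minimal sketch (the helper `errno_from_log` is hypothetical and not part of MongoDB), this pulls the `errno:N` token out of the journal line above and maps it to its symbolic name, which makes explicit that the failed journal write was reported as ENOSPC:

```python
import errno
import os
import re

# Journal line copied from the log above.
LINE = ("2015-08-26T22:20:05.637-0400 [journal] LogFile::synchronousAppend failed "
        "with 8192 bytes unwritten out of 8192 bytes;  b=0x422a000 errno:28 "
        "No space left on device")


def errno_from_log(line):
    """Return (code, symbolic name, strerror text) for an 'errno:N' token, or None."""
    match = re.search(r"errno:(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to names such as 'ENOSPC'.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(errno_from_log(LINE))
# On Linux this prints: (28, 'ENOSPC', 'No space left on device')
```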
 

Darshan Shah

Aug 27, 2015, 11:36:01 AM
to mongodb-user
More info that might be helpful:

Here are the file sizes in the config server's data directory (which also holds the log) and its journal folder:
[user@cfgsrvr configdb]$ ls -l
total 84820
-rw------- 1 user group 16777216 Aug 26 22:20 config.0
-rw------- 1 user group 16777216 Aug 25 16:51 config.ns
drwxrwsr-x 2 user group     4096 Aug 26 22:05 journal
-rw------- 1 user group 16777216 Aug 26 22:20 local.0
-rw------- 1 user group 16777216 Aug 26 22:20 local.ns
-rw-rw-r-- 1 user group 19718144 Aug 26 22:20 mongodb_configdb.log
-rw-rw-r-- 1 user group        6 Aug 25 16:38 mongodb_config.pid
-rwxrwxr-x 1 user group        6 Aug 25 16:38 mongod.lock
drwxrwsr-x 2 user group     4096 Aug 25 16:38 _tmp
[user@cfgsrvr configdb]$ cd journal/
[user@cfgsrvr journal]$ ls -l
total 38984
-rw------- 1 user group 39911424 Aug 26 22:20 j._52
-rw------- 1 user group       88 Aug 26 22:20 lsn


Available disk space:
[user@cfgsrvr configdb]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       2.1T  958G 1010G  49% /
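df derives its Size, Used and Avail columns from statvfs(3). On ext4, Avail (`f_bavail`) excludes the blocks reserved for root, so it can be smaller than the raw free count. A minimal Python sketch, with a hypothetical helper `space_report`, that reads the same counters for a directory such as the dbpath:

```python
import os


def space_report(path):
    """Return (size, used, avail) in bytes for the filesystem holding `path`, computed like df."""
    st = os.statvfs(path)
    size = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize  # df's Used column
    avail = st.f_bavail * st.f_frsize  # df's Avail: excludes blocks reserved for root
    return size, used, avail


size, used, avail = space_report(".")
# Shift by 30 bits to show GiB, which is what `df -h` labels "G".
print(f"size={size >> 30}G used={used >> 30}G avail={avail >> 30}G")
```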


Mount info:
[user@cfgsrvr journal]$ mount | grep -w /dev/sda4
/dev/sda4 on / type ext4 (rw,relatime,data=ordered)


Uname info:
[user@cfgsrvr journal]$ uname -sr
Linux 3.10.0-123.20.1.el7.x86_64


No quota is configured: running quota returns nothing.

The file /etc/security/limits.conf has nothing configured in it; every line is commented out.
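pam_limits also reads drop-in files under /etc/security/limits.d/, so an all-commented limits.conf does not by itself mean that no limits are set. A minimal sketch (the helpers `active_limit_entries` and `scan_limits` are hypothetical) that lists the non-comment entries from both locations, assuming the standard `<domain> <type> <item> <value>` line format; lines with a different number of fields are skipped:

```python
import glob


def active_limit_entries(lines):
    """Return (domain, type, item, value) tuples from limits.conf-style lines, skipping comments."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) == 4:
            entries.append(tuple(fields))
    return entries


def scan_limits(paths=("/etc/security/limits.conf",), pattern="/etc/security/limits.d/*.conf"):
    """Collect active entries from limits.conf plus any limits.d drop-in files."""
    found = {}
    for path in list(paths) + sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                entries = active_limit_entries(fh)
        except OSError:
            continue  # missing or unreadable file
        if entries:
            found[path] = entries
    return found


for path, entries in scan_limits().items():
    print(path, entries)
```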

Asya Kamsky

Aug 31, 2015, 1:53:37 AM
to mongodb-user
Can you check dmesg and see if there are any errors related to this device?

You might also try smartctl -H /dev/sda4

By the way, we recommend mounting the disk that holds the data files with noatime.
http://docs.mongodb.org/manual/administration/production-notes/
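To confirm which options the filesystem under the dbpath is actually mounted with (the mount output earlier in the thread shows relatime), here is a minimal Python sketch with a hypothetical helper `mount_for`. It picks the longest mount point that contains the path from /proc/mounts-style lines; reading /proc/mounts assumes Linux:

```python
import os


def mount_for(path, mount_lines):
    """Return (device, mount point, fstype, options) for the mount that contains `path`."""
    path = os.path.realpath(path)
    best = None
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mnt, fstype, opts = fields[:4]
        mnt = mnt.replace("\\040", " ")  # /proc/mounts escapes spaces as \040
        prefix = mnt.rstrip("/") + "/"
        if path == mnt or path.startswith(prefix):
            # Prefer the longest mount point; on ties, later lines (overmounts) win.
            if best is None or len(mnt) >= len(best[1]):
                best = (device, mnt, fstype, opts.split(","))
    return best


try:
    with open("/proc/mounts") as fh:
        info = mount_for(".", fh)
except OSError:
    info = None  # /proc/mounts is Linux-specific

if info:
    device, mnt, fstype, opts = info
    print(device, mnt, fstype, "noatime" in opts)
```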

Asya
> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user"
> group.
>
> For other MongoDB technical support options, see:
> http://www.mongodb.org/about/support/.
> ---
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mongodb-user...@googlegroups.com.
> To post to this group, send email to mongod...@googlegroups.com.
> Visit this group at http://groups.google.com/group/mongodb-user.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mongodb-user/e8c0ac31-6787-48d3-a1b9-d92fc29a3706%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Darshan Shah

Aug 31, 2015, 1:11:15 PM
to mongodb-user
I did not see any error messages in dmesg around the time the config server crashed; the dmesg log is filled with messages like these:
[Aug28 18:40] systemd-journald[536]: Vacuuming done, freed 100663296 bytes
[Aug29 07:05] systemd-journald[536]: Vacuuming done, freed 100663296 bytes
[Aug29 10:38] systemd-journald[536]: Vacuuming done, freed 100663296 bytes
[Aug29 11:57] systemd-journald[536]: Vacuuming done, freed 100663296 bytes
[Aug30 11:04] systemd-journald[536]: Vacuuming done, freed 100663296 bytes



I assumed that the options "bg, nolock, and noatime" applied only when using NFS, so I ignored them.
Should we use all of these options even on ext4?
Do they have anything to do with the config server crashing?

While the primary config server is down, I am still able to read from and write to the database using the Java driver, as well as by connecting to the mongos port with the mongo shell on any of the servers.

Output from smartctl for all 6 disks in the RAID:
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK


Note that I used the following command:
sudo /usr/sbin/smartctl -H -d megaraid,N /dev/sda4


I restarted the servers last week, and the primary config server crashed again over the weekend. Disk usage in the journal folder is higher now:
[user@configsrvr configdb]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       2.1T  1.6T  373G  82% /
[user@configsrvr configdb]$ ls -l
total 74576
-rw------- 1 user configsrvr 16777216 Aug 29 17:45 config.0
-rw------- 1 user configsrvr 16777216 Aug 28 16:33 config.ns
drwxrwsr-x 2 user configsrvr     4096 Aug 29 17:17 journal
-rw------- 1 user configsrvr 16777216 Aug 29 17:45 local.0
-rw------- 1 user configsrvr 16777216 Aug 29 17:45 local.ns
-rw-rw-r-- 1 user configsrvr  9227886 Aug 29 17:45 mongodb_configdb.log
-rw-rw-r-- 1 user configsrvr        6 Aug 28 16:06 mongodb_config.pid
-rwxrwxr-x 1 user configsrvr        6 Aug 28 16:06 mongod.lock
drwxrwsr-x 2 user configsrvr     4096 Aug 28 16:06 _tmp
[user@configsrvr configdb]$ cd journal/
[user@configsrvr journal]$ ls -l
total 104776
-rw------- 1 user configsrvr 107282432 Aug 29 17:45 j._45
-rw------- 1 user configsrvr        88 Aug 29 17:45 lsn
[user@configsrvr configdb]$



Please let me know if you need more info.

Thanks a ton!

Darshan Shah

Sep 2, 2015, 9:54:31 AM
to mongodb-user
I forgot to mention that in the smartctl command, N is 0 through 5 for the 6 disks.
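Looping over the six MegaRAID disk IDs can be scripted. A minimal sketch (the helpers `smartctl_health` and `parse_health` are hypothetical) that runs the same `smartctl -H -d megaraid,N` command for N = 0..5 and extracts the verdict line. It assumes smartctl prints either the `SMART Health Status:` line seen in this thread or the ATA-style `overall-health self-assessment test result:` line; any other format yields None. `capture_output` needs Python 3.7 or later:

```python
import subprocess


def parse_health(output):
    """Extract the health verdict from `smartctl -H` output, or None if not found."""
    for line in output.splitlines():
        if line.startswith("SMART Health Status:") or "overall-health self-assessment test result:" in line:
            return line.split(":", 1)[1].strip()
    return None


def smartctl_health(device="/dev/sda4", disk_ids=range(6)):
    """Run `smartctl -H -d megaraid,N` for each disk behind the RAID controller."""
    results = {}
    for n in disk_ids:
        cmd = ["sudo", "/usr/sbin/smartctl", "-H", "-d", f"megaraid,{n}", device]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[n] = parse_health(proc.stdout)
    return results


# Example (requires sudo and smartctl on the host): print(smartctl_health())
```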

marcin...@nokaut.pl

Sep 2, 2015, 2:41:34 PM
to mongodb-user
I have an idea: please check whether you have run out of free inodes. Run:
df -i

and show us the result.
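The same check can be made from a script, since statvfs(3) exposes the inode counters that `df -i` prints. A minimal sketch with hypothetical helpers `inode_usage` and `inode_percent_used`; df shows IUse% as a whole number, so the exact percentage here can differ from df's display:

```python
import os


def inode_percent_used(total, free):
    """Percentage of inodes in use, given total and free counts (0.0 if the FS reports none)."""
    if total == 0:
        return 0.0  # some filesystems do not report a fixed inode count
    return 100.0 * (total - free) / total


def inode_usage(path):
    """Return (total, used, free, percent_used) for the filesystem holding `path`, like `df -i`."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    return total, total - free, free, inode_percent_used(total, free)


print(inode_usage("."))
```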

Darshan Shah

Sep 8, 2015, 9:05:50 AM
to mongodb-user
Hi,

Here is the output after the config server crashed again; it looks like running out of inodes is not the cause:

[user@configsrvr ~]$ cd /data_mount/mongodata/configdb/
[user@configsrvr configdb]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       2.1T  1.1T  855G  57% /
[user@configsrvr configdb]$ df -i .
Filesystem        Inodes  IUsed     IFree IUse% Mounted on
/dev/sda4      137953280 587433 137365847    1% /


This is what I see in the log, even though, based on the above, there is still a lot of space left on the device:

2015-09-06T01:24:32.079-0400 [conn36] command admin.$cmd command: getLastError { getlasterror: 1, fsync: 1 } ntoreturn:1 keyUpdates:0 numYields:0  reslen:140 110ms
2015-09-06T01:24:34.881-0400 [journal] LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes;  b=0x436c000 errno:28 No space left on device
2015-09-06T01:24:34.881-0400 [journal] Fatal Assertion 13515
2015-09-06T01:24:34.903-0400 [journal] 0x121eb61 0x11be149 0x11a0c5d 0x11be5a8 0xa781f7 0xa78462 0xa6b538 0xa6dbe9 0xa6df27 0x1263929 0x7ffedac28df3 0x7ffed9f171ad


Here are the disk space usages:

[user@configsrvr configdb]$ ls -lh
total 66M
-rw------- 1 user configsrvr  16M Sep  6 01:24 config.0
-rw------- 1 user configsrvr  16M Sep  3 15:26 config.ns
drwxrwsr-x 2 user configsrvr 4.0K Sep  6 01:14 journal
-rw------- 1 user configsrvr  16M Sep  6 01:24 local.0
-rw------- 1 user configsrvr  16M Sep  6 01:24 local.ns
-rw-rw-r-- 1 user configsrvr 1.2M Sep  6 01:24 mongodb_configdb.log
-rw-rw-r-- 1 user configsrvr    5 Sep  3 15:01 mongodb_config.pid
-rwxrwxr-x 1 user configsrvr    5 Sep  3 15:01 mongod.lock
drwxrwsr-x 2 user configsrvr 4.0K Sep  3 15:01 _tmp
[user@configsrvr configdb]$ cd journal
[user@configsrvr journal]$ ls -lh
total 164M
-rw------- 1 user configsrvr 129M Sep  6 01:14 j._106
-rw------- 1 user configsrvr  36M Sep  6 01:24 j._107
-rw------- 1 user configsrvr   88 Sep  6 01:24 lsn
[user@configsrvr journal]$ cd ../_tmp/
[user@configsrvr _tmp]$ ls -lh
total 0


I'm running out of ideas about what might be happening and how to resolve it, so any help is highly appreciated.

Thanks!

Asya Kamsky

Sep 9, 2015, 5:56:31 PM
to mongodb-user
You said so far:
local ext4 disk.
/dev/sda4 on / type ext4 (rw,relatime,data=ordered)
Linux 3.10.0-123.20.1.el7.x86_64

Can you check the limits for the user? I believe ulimit -a shows them.
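`ulimit -a` reports the limits of the current shell; the limits that actually apply to the running mongod can differ, and on Linux they can be read from /proc/<pid>/limits (the pid is in mongodb_config.pid). A minimal Python sketch using the standard `resource` module that prints soft and hard values for a few limits relevant to a process writing files; which limits to inspect is an assumption, and `LIMITS`, `fmt` and `current_limits` are hypothetical names:

```python
import resource

# A few limits that can affect a process writing files; the selection is an assumption.
LIMITS = {
    "file size (RLIMIT_FSIZE)": resource.RLIMIT_FSIZE,  # exceeding it raises SIGXFSZ/EFBIG, not ENOSPC
    "open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "max processes (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "virtual memory (RLIMIT_AS)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for this process."""
    return {label: tuple(fmt(v) for v in resource.getrlimit(res)) for label, res in LIMITS.items()}


for label, (soft, hard) in current_limits().items():
    print(f"{label}: soft={soft} hard={hard}")
```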

Can you also provide the full output of "df -h" (without the ".")? I'm wondering
whether /tmp is mounted somewhere with less space.
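A script can answer the same question for specific directories: two paths whose `st_dev` values match live on the same filesystem, and statvfs gives each filesystem's available space. A minimal sketch with hypothetical helpers `fs_report` and `same_filesystem`; the candidate paths are taken from this thread and are placeholders for your own dbpath and journal directory:

```python
import os


def fs_report(paths):
    """Map each path to (device id, available bytes) for the filesystem it lives on."""
    report = {}
    for path in paths:
        st = os.statvfs(path)
        report[path] = (os.stat(path).st_dev, st.f_bavail * st.f_frsize)
    return report


def same_filesystem(report, a, b):
    """True if both paths in the report share a device id."""
    return report[a][0] == report[b][0]


# Paths from this thread; adjust to your own dbpath and journal directory.
candidates = ["/data_mount/mongodata/configdb", "/data_mount/mongodata/configdb/journal", "/tmp"]
existing = [p for p in candidates if os.path.exists(p)]
for path, (dev, avail) in fs_report(existing).items():
    print(f"{path}: device={dev} avail={avail >> 30}G")
```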

Asya

Vilius O

Jul 17, 2017, 10:42:40 AM
to mongodb-user
I have a very similar problem. The disk has free space, and there are free inodes available. The ulimits are at their defaults, there are no errors related to them, and they are not being reached. I'm running MongoDB 2.6.5. No bad blocks were found.
Did the OP solve the issue?


The log shows:
2017-07-15T18:30:02.219+0200 [conn37859] end connection 127.0.0.1:60761 (74 connections now open)
2017-07-15T18:30:09.237+0200 [journal] LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes;  b=0x5812000 errno:28 No space left on device
2017-07-15T18:30:09.237+0200 [journal] Fatal Assertion 13515
2017-07-15T18:30:09.290+0200 [journal] 0x11e9b11 0x118b849 0x116e37d 0x118bca8 0xa6c9e7 0xa6cc52 0xa60958 0xa63369 0xa6376d 0x122e4a9 0x7eff4a82f851 0x7eff49bd590d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e9b11]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x118b849]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x116e37d]
 /usr/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x1a8) [0x118bca8]
 /usr/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e7) [0xa6c9e7]
 /usr/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0xa6cc52]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x158) [0xa60958]
 /usr/bin/mongod() [0xa63369]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x2fd) [0xa6376d]
 /usr/bin/mongod() [0x122e4a9]
 /lib64/libpthread.so.0(+0x7851) [0x7eff4a82f851]
 /lib64/libc.so.6(clone+0x6d) [0x7eff49bd590d]
2017-07-15T18:30:09.290+0200 [journal]

***aborting after fassert() failure


2017-07-15T18:30:09.309+0200 [journal] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x11e9b11 0x11e8eee 0x7eff49b1f920 0x7eff49b1f8a5 0x7eff49b21085 0x116e3ea 0x118bca8 0xa6c9e7 0xa6cc52 0xa60958 0xa63369 0xa6376d 0x122e4a9 0x7eff4a82f851 0x7eff49bd590d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e9b11]
 /usr/bin/mongod() [0x11e8eee]
 /lib64/libc.so.6(+0x32920) [0x7eff49b1f920]
 /lib64/libc.so.6(gsignal+0x35) [0x7eff49b1f8a5]
 /lib64/libc.so.6(abort+0x175) [0x7eff49b21085]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x116e3ea]
 /usr/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x1a8) [0x118bca8]
 /usr/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e7) [0xa6c9e7]
 /usr/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0xa6cc52]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x158) [0xa60958]
 /usr/bin/mongod() [0xa63369]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x2fd) [0xa6376d]
 /usr/bin/mongod() [0x122e4a9]
 /lib64/libpthread.so.0(+0x7851) [0x7eff4a82f851]
 /lib64/libc.so.6(clone+0x6d) [0x7eff49bd590d]

Kevin Adistambha

Jul 24, 2017, 1:47:11 AM
to mongodb-user

Hi

> I have very similar problem. The disk has free space. There are free inodes available. ulimits are default but no errors on that nor they are reached.
> mongo 2.6.5.

Please note that you are replying to a thread that is almost 2 years old. Also, although the issue may look similar at a glance, the cause may be entirely different. It’s usually best to open a new thread with an exact description of your deployment, whether the issue is intermittent, and a description or code that can reliably reproduce the error. For example, you mentioned that “ulimits are default”. Could you include the output of ulimit -a to confirm whether your ulimit settings follow the suggested values on the ulimit settings page?

Having said that, the MongoDB 2.6 series has been out of support since October 2016. Also, MongoDB 2.6.5 was released in October 2014, almost three years ago as of today. Is there a reason why you are using this particular version? You may want to consider upgrading to the latest version (currently 3.4.6) to get the latest bug fixes and improvements, as the exact issue you’re seeing may have been fixed in later versions of MongoDB.

Best regards,
Kevin
