flushing mmaps after unclean restart


DanYamins

Apr 10, 2016, 7:38:05 AM
to mongodb-user
We had an unexpected shutdown of our MongoDB database due to hardware failure. The database process restarted automatically, but something seems to be going wrong during the restart. The log file is included below. It shows that after "recover done", the system has been flushing mmaps, like this:

.... [initandlisten] couldn't lazily get rwlock
.... [DataFileSync] flushing mmaps took 332569ms  for 62 files

It's been doing this for nearly 24 hours.  Is this normal?  How long should I wait before killing it and doing some kind of repair? 





=======
Sat Apr  9 12:33:30.130 [initandlisten] MongoDB starting : pid=3447 port=29101 dbpath=/data/db_convnet_repair 64-bit host=dicarlo5
Sat Apr  9 12:33:30.130 [initandlisten]
Sat Apr  9 12:33:30.130 [initandlisten] ** WARNING: You are running on a NUMA machine.
Sat Apr  9 12:33:30.130 [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
Sat Apr  9 12:33:30.130 [initandlisten] **              numactl --interleave=all mongod [other options]
Sat Apr  9 12:33:30.130 [initandlisten]
Sat Apr  9 12:33:30.130 [initandlisten] db version v2.4.6
Sat Apr  9 12:33:30.130 [initandlisten] git version: b9925db5eac369d77a3a5f5d98a145eaaacd9673
Sat Apr  9 12:33:30.130 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Sat Apr  9 12:33:30.130 [initandlisten] allocator: tcmalloc
Sat Apr  9 12:33:30.130 [initandlisten] options: { bind_ip: "127.0.0.1", config: "/home/mongodb/db_convnet_repair.conf", dbpath: "/data/db_convnet_repair", logpath: "/home/mongodb/db_convnet_repair.log", port: 29101 }
Sat Apr  9 12:33:30.130 [initandlisten]
Sat Apr  9 12:33:30.130 [initandlisten] ** WARNING: Readahead for /data/db_convnet_repair is set to 1024KB
Sat Apr  9 12:33:30.130 [initandlisten] **          We suggest setting it to 256KB (512 sectors) or less
Sat Apr  9 12:33:30.130 [initandlisten] **          http://dochub.mongodb.org/core/readahead
Sat Apr  9 12:33:51.183 [initandlisten] journal dir=/data/db_convnet_repair/journal
Sat Apr  9 12:33:51.183 [initandlisten] recover begin
Sat Apr  9 12:33:51.183 [initandlisten] info no lsn file in journal/ directory
Sat Apr  9 12:33:51.184 [initandlisten] recover lsn: 0
Sat Apr  9 12:33:51.184 [initandlisten] recover /data/db_convnet_repair/journal/j._0
Sat Apr  9 12:33:54.130 [initandlisten] recover cleaning up
Sat Apr  9 12:33:54.130 [initandlisten] removeJournalFiles
Sat Apr  9 12:34:19.100 [initandlisten] recover done
Sat Apr  9 12:34:19.100 [initandlisten] preallocating a journal file /data/db_convnet_repair/journal/prealloc.0
Sat Apr  9 12:39:42.249 [DataFileSync] flushing mmaps took 252124ms  for 62 files
Sat Apr  9 12:45:38.040 [DataFileSync] flushing mmaps took 355791ms  for 62 files
Sat Apr  9 12:45:48.517 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 12:51:10.609 [DataFileSync] flushing mmaps took 332569ms  for 62 files
Sat Apr  9 12:54:44.640 [DataFileSync] flushing mmaps took 214031ms  for 64 files
Sat Apr  9 13:00:37.567 [DataFileSync] flushing mmaps took 352926ms  for 64 files
Sat Apr  9 13:01:31.367 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 13:06:49.986 [DataFileSync] flushing mmaps took 372418ms  for 64 files
Sat Apr  9 13:11:25.073 [DataFileSync] flushing mmaps took 275087ms  for 64 files
Sat Apr  9 13:17:10.028 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 13:17:35.620 [DataFileSync] flushing mmaps took 370547ms  for 64 files
Sat Apr  9 13:20:55.396 [DataFileSync] flushing mmaps took 199776ms  for 65 files
Sat Apr  9 13:26:57.391 [DataFileSync] flushing mmaps took 361995ms  for 65 files
Sat Apr  9 13:27:56.662 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 13:33:13.741 [DataFileSync] flushing mmaps took 376350ms  for 65 files
.
.
.
Sun Apr 10 05:28:27.000 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 05:29:31.533 [DataFileSync] flushing mmaps took 682175ms  for 129 files
Sun Apr 10 05:39:19.004 [DataFileSync] flushing mmaps took 587471ms  for 129 files
Sun Apr 10 05:39:20.396 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 05:51:12.881 [DataFileSync] flushing mmaps took 713877ms  for 129 files
Sun Apr 10 06:01:33.616 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:02:39.204 [DataFileSync] flushing mmaps took 686323ms  for 130 files
Sun Apr 10 06:12:56.963 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:14:39.181 [DataFileSync] flushing mmaps took 719976ms  for 131 files
Sun Apr 10 06:21:15.675 [DataFileSync] flushing mmaps took 396494ms  for 131 files
Sun Apr 10 06:24:59.974 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:33:27.544 [DataFileSync] flushing mmaps took 731869ms  for 131 files
Sun Apr 10 06:44:20.129 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:44:45.180 [DataFileSync] flushing mmaps took 677635ms  for 133 files
Sun Apr 10 06:55:13.643 [DataFileSync] flushing mmaps took 628463ms  for 133 files
Sun Apr 10 07:05:34.373 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 07:06:53.134 [DataFileSync] flushing mmaps took 699491ms  for 133 files
Sun Apr 10 07:16:31.574 [DataFileSync] flushing mmaps took 578440ms  for 134 files
Sun Apr 10 07:17:10.835 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 07:29:18.096 [DataFileSync] flushing mmaps took 766522ms  for 134 files

DanYamins

Apr 10, 2016, 12:11:19 PM
to mongodb-user
Unfortunately, for unknown reasons, the server running the process above rebooted overnight, killing the mongod process (I believe due to a power outage).

MongoDB has been restarted and appears to be functioning normally. However, I have attempted to run an index-creation process on a collection.

 ...   Sun Apr 10 10:12:50.007 [conn4] build index saved_files.models.chunks { files_id: 1, n: 1 }
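
For reference, the build corresponds to a shell one-liner along these lines, reconstructed from the log line above (the exact invocation may have differed):

    # Hypothetical reconstruction: build the standard GridFS chunks index
    # { files_id: 1, n: 1 } on saved_files.models.chunks, using the port
    # from the log.
    mongo --port 29101 saved_files --eval 'db.models.chunks.ensureIndex({ files_id: 1, n: 1 })'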

This process started out at a reasonable speed [log snippet]:


   Sun Apr 10 10:13:00.085 [conn4]                 Index: (1/3) External Sort Progress: 11800/3609859      0%
   .
   .
   .
   Sun Apr 10 10:14:32.093 [conn4]                 Index: (1/3) External Sort Progress: 62900/3609859      1%  
   Sun Apr 10 10:14:38.889 [DataFileSync] flushing mmaps took 101089ms  for 5354 files
 

but has now slowed to a virtual standstill:

    Sun Apr 10 11:21:48.416 [DataFileSync] flushing mmaps took 1988103ms  for 5354 files

and it is getting slower. There does not appear to be any major system load on the machine, in terms of either CPU or memory usage.
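
For completeness, disk I/O (which the CPU and memory figures will not show) can be watched with a standard tool like iostat from sysstat; a sketch:

    # Extended per-device statistics every 2 seconds, in MB/s.
    # High %util and await alongside the long flush times would point
    # at the disk itself rather than CPU or memory.
    iostat -xm 2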

Any idea what might be going on, and whether this is related to the earlier problem we encountered with mmaps?

Kevin Adistambha

Apr 20, 2016, 3:43:44 AM
to mongodb-user

Hi Dan,

We had an unexpected shutdown of our mongodb database due to hardware failure.

What was the nature of the hardware failure? Was it repaired before the mongod process restarted?

Also, could you provide more information regarding your deployment, such as the following (the commands sketched after this list may help gather it):

  • the operating system and its version
  • whether the /data directory is mounted locally or remotely (e.g. over NFS)
  • the filesystem and mount options of the /data directory, e.g. the output of the mount command on Linux
  • whether any other processes are running on the machine besides the single mongod (e.g. another database server, a web server, or a second mongod process)
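
On Linux, something like the following should capture most of this (a sketch; adjust the paths to your layout):

    # Operating system and kernel version
    uname -a

    # Filesystem and mount options for the /data directory
    mount | grep /data

    # All mongod processes on the machine (to spot a second instance)
    ps aux | grep '[m]ongod'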

Sun Apr 10 07:29:18.096 [DataFileSync] flushing mmaps took 766522ms for 134 files

MongoDB uses a background thread to flush data modified in memory to disk every 60 seconds. "mmaps" refers to the memory-mapped data files used by the MMAPv1 storage engine. It is abnormal for these flushes to take as long as they do here (nearly 13 minutes in the line above).
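
If it helps, these flush timings are also exposed in the backgroundFlushing section of serverStatus; a quick check, with the port taken from your log:

    # average_ms and last_ms should normally be far below the 60-second
    # flush interval.
    mongo --port 29101 --eval 'printjson(db.serverStatus().backgroundFlushing)'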

There are three items in your log that are worth a closer look:

The first item is:

Sat Apr 9 12:33:30.130 [initandlisten] ** WARNING: You are running on a NUMA machine.

which indicates that you are running on NUMA hardware. We recommend disabling NUMA (or interleaving memory allocations) since leaving it active can lead to performance issues. Please see MongoDB and NUMA Hardware for more information.
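
For reference, the usual mitigation on Linux looks like the following (a sketch; run as root, with the config path taken from your log):

    # Disable zone reclaim, as recommended for NUMA hosts.
    echo 0 > /proc/sys/vm/zone_reclaim_mode

    # Start mongod with memory allocation interleaved across NUMA nodes.
    numactl --interleave=all mongod --config /home/mongodb/db_convnet_repair.conf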

The second item is:

Sat Apr 9 12:33:30.130 [initandlisten] ** WARNING: Readahead for /data/db_convnet_repair is set to 1024KB

This indicates that the readahead for the block device backing /data/db_convnet_repair is set too high. You might want to lower it to 256KB (512 sectors) or less, as the warning suggests.
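
For example, using blockdev, where /dev/<device> is a placeholder for whichever device backs /data (note this setting does not persist across reboots):

    # Current readahead in 512-byte sectors (2048 sectors = 1024KB).
    blockdev --getra /dev/<device>

    # Lower it to 512 sectors (256KB), per the startup warning.
    blockdev --setra 512 /dev/<device>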

The third item is:

Sat Apr 9 12:33:30.130 [initandlisten] db version v2.4.6

I recommend upgrading to the latest release in the 2.4.x series, which is currently 2.4.14, for bug fixes and improvements.
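
You can confirm what a running server reports with:

    mongo --port 29101 --eval 'print(db.version())'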

For MongoDB performance tuning, you may find the Version 2.4 Production Notes helpful.

Best regards,
Kevin
