flushing mmaps after unclean restart


DanYamins

Apr 10, 2016, 7:38:05 AM
to mongodb-user
We had an unexpected shutdown of our MongoDB database due to hardware failure. The database process restarted automatically, but something seems to be going wrong during the restart. The log file is included below. It shows that after "recover done", the system has been flushing mmaps, like this:

.... [initandlisten] couldn't lazily get rwlock
.... [DataFileSync] flushing mmaps took 332569ms  for 62 files

It's been doing this for nearly 24 hours.  Is this normal?  How long should I wait before killing it and doing some kind of repair? 





=======
Sat Apr  9 12:33:30.130 [initandlisten] MongoDB starting : pid=3447 port=29101 dbpath=/data/db_convnet_repair 64-bit host=dicarlo5
Sat Apr  9 12:33:30.130 [initandlisten]
Sat Apr  9 12:33:30.130 [initandlisten] ** WARNING: You are running on a NUMA machine.
Sat Apr  9 12:33:30.130 [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
Sat Apr  9 12:33:30.130 [initandlisten] **              numactl --interleave=all mongod [other options]
Sat Apr  9 12:33:30.130 [initandlisten]
Sat Apr  9 12:33:30.130 [initandlisten] db version v2.4.6
Sat Apr  9 12:33:30.130 [initandlisten] git version: b9925db5eac369d77a3a5f5d98a145eaaacd9673
Sat Apr  9 12:33:30.130 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Sat Apr  9 12:33:30.130 [initandlisten] allocator: tcmalloc
Sat Apr  9 12:33:30.130 [initandlisten] options: { bind_ip: "127.0.0.1", config: "/home/mongodb/db_convnet_repair.conf", dbpath: "/data/db_convnet_repair", logpath: "/home/mongodb/db_convnet_repair.log", port: 29101 }
Sat Apr  9 12:33:30.130 [initandlisten]
Sat Apr  9 12:33:30.130 [initandlisten] ** WARNING: Readahead for /data/db_convnet_repair is set to 1024KB
Sat Apr  9 12:33:30.130 [initandlisten] **          We suggest setting it to 256KB (512 sectors) or less
Sat Apr  9 12:33:30.130 [initandlisten] **          http://dochub.mongodb.org/core/readahead
Sat Apr  9 12:33:51.183 [initandlisten] journal dir=/data/db_convnet_repair/journal
Sat Apr  9 12:33:51.183 [initandlisten] recover begin
Sat Apr  9 12:33:51.183 [initandlisten] info no lsn file in journal/ directory
Sat Apr  9 12:33:51.184 [initandlisten] recover lsn: 0
Sat Apr  9 12:33:51.184 [initandlisten] recover /data/db_convnet_repair/journal/j._0
Sat Apr  9 12:33:54.130 [initandlisten] recover cleaning up
Sat Apr  9 12:33:54.130 [initandlisten] removeJournalFiles
Sat Apr  9 12:34:19.100 [initandlisten] recover done
Sat Apr  9 12:34:19.100 [initandlisten] preallocating a journal file /data/db_convnet_repair/journal/prealloc.0
Sat Apr  9 12:39:42.249 [DataFileSync] flushing mmaps took 252124ms  for 62 files
Sat Apr  9 12:45:38.040 [DataFileSync] flushing mmaps took 355791ms  for 62 files
Sat Apr  9 12:45:48.517 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 12:51:10.609 [DataFileSync] flushing mmaps took 332569ms  for 62 files
Sat Apr  9 12:54:44.640 [DataFileSync] flushing mmaps took 214031ms  for 64 files
Sat Apr  9 13:00:37.567 [DataFileSync] flushing mmaps took 352926ms  for 64 files
Sat Apr  9 13:01:31.367 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 13:06:49.986 [DataFileSync] flushing mmaps took 372418ms  for 64 files
Sat Apr  9 13:11:25.073 [DataFileSync] flushing mmaps took 275087ms  for 64 files
Sat Apr  9 13:17:10.028 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 13:17:35.620 [DataFileSync] flushing mmaps took 370547ms  for 64 files
Sat Apr  9 13:20:55.396 [DataFileSync] flushing mmaps took 199776ms  for 65 files
Sat Apr  9 13:26:57.391 [DataFileSync] flushing mmaps took 361995ms  for 65 files
Sat Apr  9 13:27:56.662 [initandlisten] couldn't lazily get rwlock
Sat Apr  9 13:33:13.741 [DataFileSync] flushing mmaps took 376350ms  for 65 files
.
.
.
Sun Apr 10 05:28:27.000 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 05:29:31.533 [DataFileSync] flushing mmaps took 682175ms  for 129 files
Sun Apr 10 05:39:19.004 [DataFileSync] flushing mmaps took 587471ms  for 129 files
Sun Apr 10 05:39:20.396 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 05:51:12.881 [DataFileSync] flushing mmaps took 713877ms  for 129 files
Sun Apr 10 06:01:33.616 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:02:39.204 [DataFileSync] flushing mmaps took 686323ms  for 130 files
Sun Apr 10 06:12:56.963 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:14:39.181 [DataFileSync] flushing mmaps took 719976ms  for 131 files
Sun Apr 10 06:21:15.675 [DataFileSync] flushing mmaps took 396494ms  for 131 files
Sun Apr 10 06:24:59.974 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:33:27.544 [DataFileSync] flushing mmaps took 731869ms  for 131 files
Sun Apr 10 06:44:20.129 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 06:44:45.180 [DataFileSync] flushing mmaps took 677635ms  for 133 files
Sun Apr 10 06:55:13.643 [DataFileSync] flushing mmaps took 628463ms  for 133 files
Sun Apr 10 07:05:34.373 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 07:06:53.134 [DataFileSync] flushing mmaps took 699491ms  for 133 files
Sun Apr 10 07:16:31.574 [DataFileSync] flushing mmaps took 578440ms  for 134 files
Sun Apr 10 07:17:10.835 [initandlisten] couldn't lazily get rwlock
Sun Apr 10 07:29:18.096 [DataFileSync] flushing mmaps took 766522ms  for 134 files

DanYamins

Apr 10, 2016, 12:11:19 PM
to mongodb-user
Unfortunately, for unknown reasons, the server running the process above rebooted overnight, killing the mongod process (I believe due to a power outage).

MongoDB has been restarted and appears to be functioning normally. However, I have attempted to run an index-creation process on a collection.

 ...   Sun Apr 10 10:12:50.007 [conn4] build index saved_files.models.chunks { files_id: 1, n: 1 }
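
For reference, the build corresponds to a shell one-liner along these lines, reconstructed from the log line above (the exact invocation may have differed):

    # Hypothetical reconstruction: build the standard GridFS chunks index
    # { files_id: 1, n: 1 } on saved_files.models.chunks, using the port
    # from the log.
    mongo --port 29101 saved_files --eval 'db.models.chunks.ensureIndex({ files_id: 1, n: 1 })'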

This process started out at a reasonable speed [log snippet]:


   Sun Apr 10 10:13:00.085 [conn4]                 Index: (1/3) External Sort Progress: 11800/3609859      0%
   .
   .
   .
   Sun Apr 10 10:14:32.093 [conn4]                 Index: (1/3) External Sort Progress: 62900/3609859      1%  
   Sun Apr 10 10:14:38.889 [DataFileSync] flushing mmaps took 101089ms  for 5354 files
 

but has now slowed to a virtual standstill:

    Sun Apr 10 11:21:48.416 [DataFileSync] flushing mmaps took 1988103ms  for 5354 files

and it is getting slower. There does not appear to be any major system load on the machine, in terms of either CPU or memory usage.
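
For completeness, disk I/O (which the CPU and memory figures will not show) can be watched with a standard tool like iostat from sysstat; a sketch:

    # Extended per-device statistics every 2 seconds, in MB/s.
    # High %util and await alongside the long flush times would point
    # at the disk itself rather than CPU or memory.
    iostat -xm 2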

Any idea what might be going on, and whether this is related to the earlier problem we encountered with mmaps?

Kevin Adistambha

Apr 20, 2016, 3:43:44 AM
to mongodb-user

Hi Dan,

We had an unexpected shutdown of our mongodb database due to hardware failure.

What was the nature of the hardware failure? Was it repaired before the mongod process restarted?

Also, could you provide more information regarding your deployment, such as the following (the commands sketched after this list may help gather it):

  • the operating system and its version
  • whether the /data directory is mounted locally or remotely (e.g. over NFS)
  • the filesystem and mount options of the /data directory, e.g. the output of the mount command on Linux
  • whether any other processes are running on the machine besides the single mongod (e.g. another database server, a web server, or a second mongod process)
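
On Linux, something like the following should capture most of this (a sketch; adjust the paths to your layout):

    # Operating system and kernel version
    uname -a

    # Filesystem and mount options for the /data directory
    mount | grep /data

    # All mongod processes on the machine (to spot a second instance)
    ps aux | grep '[m]ongod'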

Sun Apr 10 07:29:18.096 [DataFileSync] flushing mmaps took 766522ms for 134 files

MongoDB uses a background thread to flush data modified in memory to disk every 60 seconds. "mmaps" refers to the memory-mapped data files used by the MMAPv1 storage engine. It is abnormal for these flushes to take as long as they do here (nearly 13 minutes in the line above).
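
If it helps, these flush timings are also exposed in the backgroundFlushing section of serverStatus; a quick check, with the port taken from your log:

    # average_ms and last_ms should normally be far below the 60-second
    # flush interval.
    mongo --port 29101 --eval 'printjson(db.serverStatus().backgroundFlushing)'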

There are three items in your log that are worth a closer look:

The first item is:

Sat Apr 9 12:33:30.130 [initandlisten] ** WARNING: You are running on a NUMA machine.

which indicates that you are running on NUMA hardware. We recommend disabling NUMA (or interleaving memory allocations) since leaving it active can lead to performance issues. Please see MongoDB and NUMA Hardware for more information.
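
For reference, the usual mitigation on Linux looks like the following (a sketch; run as root, with the config path taken from your log):

    # Disable zone reclaim, as recommended for NUMA hosts.
    echo 0 > /proc/sys/vm/zone_reclaim_mode

    # Start mongod with memory allocation interleaved across NUMA nodes.
    numactl --interleave=all mongod --config /home/mongodb/db_convnet_repair.conf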

The second item is:

Sat Apr 9 12:33:30.130 [initandlisten] ** WARNING: Readahead for /data/db_convnet_repair is set to 1024KB

This indicates that the readahead for the block device backing /data/db_convnet_repair is set too high. You might want to lower it to 256KB (512 sectors) or less, as the warning suggests.
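
For example, using blockdev, where /dev/<device> is a placeholder for whichever device backs /data (note this setting does not persist across reboots):

    # Current readahead in 512-byte sectors (2048 sectors = 1024KB).
    blockdev --getra /dev/<device>

    # Lower it to 512 sectors (256KB), per the startup warning.
    blockdev --setra 512 /dev/<device>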

The third item is:

Sat Apr 9 12:33:30.130 [initandlisten] db version v2.4.6

I recommend upgrading to the latest release in the 2.4.x series, which is currently 2.4.14, for bug fixes and improvements.
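
You can confirm what a running server reports with:

    mongo --port 29101 --eval 'print(db.version())'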

For MongoDB performance tuning, you may find the Version 2.4 Production Notes helpful.

Best regards,
Kevin
