db.fsyncLock() for each db in instance, or just for admin db?

David Resnick

Apr 23, 2018, 4:14:25 PM
to mongodb-user
Hi,

I've created a procedure for backing up a mongo replica with multiple DBs using EBS snapshots. The data and journal files are on separate volumes, but the backups I create are consistently valid because I stop the mongod service before initiating snapshot creation.
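
Roughly, the procedure looks like this (volume IDs are placeholders for my actual volumes):

  # stop mongod so both volumes are quiescent before snapshotting
  sudo service mongod stop
  aws ec2 create-snapshot --volume-id vol-DATA --description "mongo data"
  aws ec2 create-snapshot --volume-id vol-JOURNAL --description "mongo journal"
  sudo service mongod start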

I'd like to use fsyncLock/fsyncUnlock instead, though my attempts so far have not been consistently successful.

I have a couple of questions regarding the procedure.

1. Is it required to call fsyncLock on every db on the instance, or just the admin db?
2. What is the significance of using RAID 10 on the volumes storing the data and journal files? Is this required for creating the snapshots, or just recommended for failure recovery?

Thanks,
David

David Resnick

Apr 30, 2018, 1:07:35 AM
to mongodb-user
The docs aren't 100% clear on which DB(s) need fsyncLock (just 'admin', or all DBs on the instance), and the experience I've had suggests that this might be the direction needed to fix my problem.

I would really appreciate a definitive answer here...

Thanks,
David

Stephen Steneker

Apr 30, 2018, 2:11:07 AM
to mongodb-user
On Monday, 30 April 2018 15:07:35 UTC+10, David Resnick wrote:
The docs aren't 100% clear on which DB(s) need fsyncLock (just 'admin', or all DBs on the instance), and the experience I've had suggests that this might be the direction needed to fix my problem.
  
1. Is it required to call fsyncLock on every db on the instance, or just the admin db?

Hi David,

db.fsyncLock() is an administrative command that affects an entire mongod instance, so you only need to run the command once to stop writes to all databases in that instance. For more information, see the db.fsyncLock() documentation, which notes the expected behaviour.
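
For example, a single lock/unlock pair in the shell covers every database on the instance:

  // connected to any database on the mongod instance
  db.fsyncLock()    // flush pending writes and block new writes for ALL databases
  // ... create your snapshots here ...
  db.fsyncUnlock()  // resume writes for the whole instance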

 
2. What is the significance of using RAID 10 on the volumes storing the data and journal files? Is this required for creating the snapshots, or just recommended for failure recovery?

I assume you are referring to the RAID recommendation in the MongoDB Production Notes or EC2 Deployment Notes. RAID is not a requirement for EBS, but if you choose to use RAID the recommended level is RAID 10. RAID 10 combines the fault tolerance of RAID 1 (mirroring) with the higher write performance of RAID 0 (striping). However, if you are using EBS in a RAID 10 configuration you will not be able to use EBS Snapshots to get a consistent backup.
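
Purely for illustration (device names are placeholders), a RAID 10 array over four EBS volumes would typically be assembled with mdadm:

  # stripe + mirror four EBS volumes into a single RAID 10 array
  sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi

As above, though, per-volume EBS snapshots of the members of such an array will not be mutually consistent.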


If you are still having issues creating consistent backups, can you confirm the specific version of MongoDB server you are using (x.y.z), your storage engine (if not the default), and whether your EBS configuration is using RAID? Can you also share more information on how you are validating your backups and the errors encountered?

Thanks,
Stennie

David Resnick

Apr 30, 2018, 4:47:27 PM
to mongodb-user
Thanks for the clarification regarding fsyncLock.

I'm using version 3.4.9, with WiredTiger and without RAID on EBS.

I'll create more snapshots using fsyncLock and see whether I continue to be unable to restore them. My previous attempts left me with mongo instances that failed to start across a number of retries (I don't have the error messages); they eventually started, but many DBs were missing.

- David

David Resnick

Jun 21, 2018, 8:51:22 AM
to mongodb-user
Hi,

I've retried and have the same problems. 

I run this before creating the EBS snapshots:

echo "use admin\ndb.fsyncUnlock()" | mongo

My configuration includes 2 EBS volumes, data and journal.

As I mentioned earlier, when I stop mongod before creating the snapshots, I am able to successfully bring an instance up with the same configuration.

Any suggestions on how I can get this working are appreciated.

The startup log after creating volumes based on the snapshots:

2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] MongoDB starting : pid=5583 port=27027 dbpath=/data/db-0 64-bit host=ip-98-0-10-129
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] db version v3.4.9
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] git version: 876ebee8c7dd0e2d992f36a848ff4dc50ee6603e
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] modules: none
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] build environment:
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten]     distmod: ubuntu1404
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-06-21T11:12:33.987+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod-0.conf", net: { bindIp: "0.0.0.0", ipv6: false, port: 27027 }, operationProfiling: { slowOpThresholdMs: 10000 }, storage: { dbPath: "/data/db-0", directoryPerDB: true, engine: "wiredTiger", indexBuildRetry: false, journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 10.0 } } }, systemLog: { destination: "file", logAppend: true, logRotate: "rename", path: "/data/logs/mongod-0.log", quiet: true } }
2018-06-21T11:12:33.987+0000 W -        [initandlisten] Detected unclean shutdown - /data/db-0/mongod.lock is not empty.
2018-06-21T11:12:34.001+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-06-21T11:12:34.001+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=10240M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-06-21T11:12:34.012+0000 I STORAGE  [initandlisten] WiredTiger message Both WiredTiger.turtle and WiredTiger.backup exist; recreating metadata from backup
2018-06-21T11:12:48.370+0000 E STORAGE  [initandlisten] WiredTiger error (17) [1529579568:370237][5583:0x7fd6929a4d00], WT_SESSION.create: /data/db-0/sizeStorer.wt: handle-open: open: File exists
2018-06-21T11:12:48.370+0000 I STORAGE  [initandlisten] WiredTiger message unexpected file sizeStorer.wt found, renamed to sizeStorer.wt.1
2018-06-21T11:12:48.373+0000 E STORAGE  [initandlisten] WiredTiger error (17) [1529579568:373846][5583:0x7fd6929a4d00], WT_SESSION.create: /data/db-0/_mdb_catalog.wt: handle-open: open: File exists
2018-06-21T11:12:48.373+0000 I STORAGE  [initandlisten] WiredTiger message unexpected file _mdb_catalog.wt found, renamed to _mdb_catalog.wt.1
2018-06-21T11:12:48.380+0000 I CONTROL  [initandlisten]
2018-06-21T11:12:48.380+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-06-21T11:12:48.380+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-06-21T11:12:48.380+0000 I CONTROL  [initandlisten]
2018-06-21T11:12:48.493+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db-0/diagnostic.data'
2018-06-21T11:12:48.631+0000 I INDEX    [initandlisten] build index on: admin.system.version properties: { v: 2, key: { version: 1 }, name: "incompatible_with_version_32", ns: "admin.system.version" }
2018-06-21T11:12:48.631+0000 I INDEX    [initandlisten]          building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2018-06-21T11:12:48.632+0000 I INDEX    [initandlisten] build index done.  scanned 0 total records. 0 secs
2018-06-21T11:12:48.633+0000 I COMMAND  [initandlisten] setting featureCompatibilityVersion to 3.4
2018-06-21T11:12:48.633+0000 I NETWORK  [thread1] waiting for connections on port 27027

Thanks,
David

Stephen Steneker

Jun 25, 2018, 2:30:29 AM
to mongodb-user
On Thursday, 21 June 2018 22:51:22 UTC+10, David Resnick wrote:
I've retried and have the same problems. 

I run this before creating the EBS snapshots:

echo "use admin\ndb.fsyncUnlock()" | mongo

Hi David,

You should be running db.fsyncLock() before creating the backups, and db.fsyncUnlock() when your backups are finished.

If you run db.fsyncUnlock() before creating snapshots you should see an error message like "fsyncUnlock called when not locked" (and this command will have no effect).

Instead of piping a command with the use admin interactive shell syntax, I would also suggest using the scripted syntax:

  mongo --eval "db.getSiblingDB('admin').fsyncLock()"
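
...with the matching unlock once your snapshots have been created:

  mongo --eval "db.getSiblingDB('admin').fsyncUnlock()"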


Regards,
Stennie

David Resnick

Jun 26, 2018, 3:13:50 AM
to mongodb-user
Hi Stephen,

Unfortunately, my mistake was only in my post; I just checked and did use db.fsyncLock() before creating the snapshots.

Thanks for the tip about a cleaner way to run the mongo shell snippet.

I've created EBS snapshot backups of 5 mongo instances and attempted restoring them. 4 started successfully but one failed with this:

2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] MongoDB starting : pid=6049 port=27037 dbpath=/data/db-1 64-bit host=ip-98-0-10-145
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] db version v3.4.9
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] git version: 876ebee8c7dd0e2d992f36a848ff4dc50ee6603e
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] modules: none
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] build environment:
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten]     distmod: ubuntu1404
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-06-25T14:37:35.973+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod-1.conf", net: { bindIp: "0.0.0.0", ipv6: false, port: 27037 }, operationProfiling: { slowOpThresholdMs: 10000 }, storage: { dbPath: "/data/db-1", directoryPerDB: true, engine: "wiredTiger", indexBuildRetry: false, journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 10.0 } } }, systemLog: { destination: "file", logAppend: true, logRotate: "rename", path: "/data/logs/mongod-1.log", quiet: true } }
2018-06-25T14:37:35.973+0000 W -        [initandlisten] Detected unclean shutdown - /data/db-1/mongod.lock is not empty.
2018-06-25T14:37:35.986+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-06-25T14:37:35.986+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=10240M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-06-25T14:37:35.996+0000 I STORAGE  [initandlisten] WiredTiger message Both WiredTiger.turtle and WiredTiger.backup exist; recreating metadata from backup
2018-06-25T14:44:13.584+0000 E STORAGE  [initandlisten] WiredTiger error (17) [1529937853:584738][6049:0x7f008e725d00], WT_SESSION.create: /data/db-1/sizeStorer.wt: handle-open: open: File exists
2018-06-25T14:44:13.584+0000 I STORAGE  [initandlisten] WiredTiger message unexpected file sizeStorer.wt found, renamed to sizeStorer.wt.1
2018-06-25T14:45:33.494+0000 I -        [initandlisten] Fatal Assertion 34433 at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 796
2018-06-25T14:45:33.494+0000 I -        [initandlisten]

***aborting after fassert() failure

Any ideas?

Thanks,
David

David Resnick

Jun 27, 2018, 1:17:15 AM
to mongodb-user
Another attempt: same 5 mongo instances, same procedure, new snapshots.

The same instance failed to come up from the snapshots. This is the log this time:

2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] MongoDB starting : pid=6126 port=27037 dbpath=/data/db-1 64-bit host=ip-98-0-10-124
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] db version v3.4.9
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] git version: 876ebee8c7dd0e2d992f36a848ff4dc50ee6603e
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] modules: none
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] build environment:
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten]     distmod: ubuntu1404
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-06-26T09:59:58.249+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod-1.conf", net: { bindIp: "0.0.0.0", ipv6: false, port: 27037 }, operationProfiling: { slowOpThresholdMs: 10000 }, storage: { dbPath: "/data/db-1", directoryPerDB: true, engine: "wiredTiger", indexBuildRetry: false, journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 10.0 } } }, systemLog: { destination: "file", logAppend: true, logRotate: "rename", path: "/data/logs/mongod-1.log", quiet: true } }
2018-06-26T09:59:58.249+0000 W -        [initandlisten] Detected unclean shutdown - /data/db-1/mongod.lock is not empty.
2018-06-26T09:59:58.262+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-06-26T09:59:58.262+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=10240M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-06-26T09:59:58.272+0000 I STORAGE  [initandlisten] WiredTiger message Both WiredTiger.turtle and WiredTiger.backup exist; recreating metadata from backup
2018-06-26T10:00:35.450+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1530007235:450517][6126:0x7fe5c4c79d00], connection: WiredTiger.backup: encountered an illegal file format or internal value
2018-06-26T10:00:35.450+0000 E STORAGE  [initandlisten] WiredTiger error (-31804) [1530007235:450577][6126:0x7fe5c4c79d00], connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-06-26T10:00:35.450+0000 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2018-06-26T10:00:35.450+0000 I -        [initandlisten]

***aborting after fassert() failure


2018-06-26T10:00:35.460+0000 F -        [initandlisten] Got signal: 6 (Aborted).

 0x555931ee34b1 0x555931ee26c9 0x555931ee2bad 0x7fe5c387d390 0x7fe5c34d7428 0x7fe5c34d902a 0x555931189903 0x555931bf59c6 0x555931193c12 0x555931193e2e 0x555931194086 0x55593288c97f 0x55593283af71 0x555931bda78a 0x555931bd2fd5 0x555931ac54f7 0x555931175adc 0x55593119543b 0x7fe5c34c2830 0x5559311f43cf
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55593096E000","o":"15754B1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55593096E000","o":"15746C9"},{"b":"55593096E000","o":"1574BAD"},{"b":"7FE5C386C000","o":"11390"},{"b":"7FE5C34A2000","o":"35428","s":"gsignal"},{"b":"7FE5C34A2000","o":"3702A","s":"abort"},{"b":"55593096E000","o":"81B903","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"55593096E000","o":"12879C6"},{"b":"55593096E000","o":"825C12","s":"__wt_eventv"},{"b":"55593096E000","o":"825E2E","s":"__wt_err"},{"b":"55593096E000","o":"826086","s":"__wt_panic"},{"b":"55593096E000","o":"1F1E97F","s":"__wt_turtle_init"},{"b":"55593096E000","o":"1ECCF71","s":"wiredtiger_open"},{"b":"55593096E000","o":"126C78A","s":"_ZN5mongo18WiredTigerKVEngineC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb"},{"b":"55593096E000","o":"1264FD5"},{"b":"55593096E000","o":"11574F7","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"55593096E000","o":"807ADC"},{"b":"55593096E000","o":"82743B","s":"main"},{"b":"7FE5C34A2000","o":"20830","s":"__libc_start_main"},{"b":"55593096E000","o":"8863CF"}],"processInfo":{ "mongodbVersion" : "3.4.9", "gitVersion" : "876ebee8c7dd0e2d992f36a848ff4dc50ee6603e", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-1054-aws", "version" : "#63-Ubuntu SMP Wed Mar 28 19:42:42 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "55593096E000", "elfType" : 3, "buildId" : "47463EE454C83B0D31139422160E82A1AE6FCDE8" }, { "b" : "7FFEDD964000", "elfType" : 3, "buildId" : "64D828864A19BFB01AD5C3F457BC3FD45C08D2D2" }, { "b" : "7FE5C47F8000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "5A9B9023B6E9ABCEE21DADDB95EF096F505E9E67" }, { "b" : "7FE5C43B4000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B467957BF7E767DD5C0CA92213C2DAF0182765D2" }, { "b" : "7FE5C41AC000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "89C34D7A182387D76D5CDA1F7718F5D58824DFB3" }, { "b" : "7FE5C3FA8000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "8CC8D0D119B142D839800BFF71FB71E73AEA7BD4" }, { "b" : "7FE5C3C9F000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "DFB85DE42DAFFD09640C8FE377D572DE3E168920" }, { "b" : "7FE5C3A89000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7FE5C386C000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "CE17E023542265FC11D9BC8F534BB4F070493D30" }, { "b" : "7FE5C34A2000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B5381A457906D279073822A5CEB24C4BFEF94DDB" }, { "b" : "7FE5C4A61000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5D7B6259552275A3C17BD4C3FD05F5A6BF40CAA5" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x555931ee34b1]
 mongod(+0x15746C9) [0x555931ee26c9]
 mongod(+0x1574BAD) [0x555931ee2bad]
 libpthread.so.0(+0x11390) [0x7fe5c387d390]
 libc.so.6(gsignal+0x38) [0x7fe5c34d7428]
 libc.so.6(abort+0x16A) [0x7fe5c34d902a]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x555931189903]
 mongod(+0x12879C6) [0x555931bf59c6]
 mongod(__wt_eventv+0x3D7) [0x555931193c12]
 mongod(__wt_err+0x9D) [0x555931193e2e]
 mongod(__wt_panic+0x24) [0x555931194086]
 mongod(__wt_turtle_init+0xB0F) [0x55593288c97f]
 mongod(wiredtiger_open+0x1641) [0x55593283af71]
 mongod(_ZN5mongo18WiredTigerKVEngineC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb+0x73A) [0x555931bda78a]
 mongod(+0x1264FD5) [0x555931bd2fd5]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x697) [0x555931ac54f7]
 mongod(+0x807ADC) [0x555931175adc]
 mongod(main+0x96B) [0x55593119543b]
 libc.so.6(__libc_start_main+0xF0) [0x7fe5c34c2830]
 mongod(+0x8863CF) [0x5559311f43cf]
-----  END BACKTRACE  -----

Suggestions for trying to prevent this are very much welcome.

- David