Failing to start mongo due to old lock file though journal was cleaned.

128 views
Skip to first unread message

Avi Ribchinsky

unread,
Oct 12, 2015, 5:32:04 PM10/12/15
to mongodb-user
Hi all, I'm a new member to this community. Have been working with mongo for several months now and still encounter some interesting errors which I hope the community can help resolve!

I have a local pre production mongodb deployment which is configured in the following topology: several shards, each shard has three replica sets and journaling is enabled for all, sever version is 2.6.9.

The problem:
One of the servers terminated with the following exit log and left an unlocked lock file which caused subsequent startups to fail.

... for command :{ $err: "socket exception [SEND_ERROR] for 0.1.2.187:1, code: 9001 }
2015-10-03T00:53:43.494+0000 [conn12906] waiting till out of critical section
2015-10-03T00:53:43.501+0000 [conn12900] waiting till out of critical section
2015-10-03T00:53:43.643+0000 [conn12904] waiting till out of critical section
2015-10-03T00:53:43.651+0000 [conn12907] waiting till out of critical section
2015-10-03T00:53:43.735+0000 [conn12905] waiting till out of critical section
2015-10-03T00:53:43.743+0000 [conn12902] waiting till out of critical section
2015-10-03T00:53:43.893+0000 [conn12903] waiting till out of critical section
2015-10-03T00:53:43.901+0000 [conn12901] waiting till out of critical section
2015-10-03T00:53:53.461+0000 [conn12908] ERROR: moveChunk commit failed: version is at 5|5||000000000000000000000000 instead of 6|1||55dce04fae57789a22a4d141
2015-10-03T00:53:53.461+0000 [conn12908] ERROR: TERMINATING

2015-10-03T00:53:53.461+0000 [conn12908] dbexit:
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to close listening sockets...
2015-10-03T00:53:53.461+0000 [conn12908] closing listening socket: 10
2015-10-03T00:53:53.461+0000 [conn12908] closing listening socket: 13
2015-10-03T00:53:53.461+0000 [conn12908] removing socket file: /tmp/mongodb-27022.sock
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to flush diaglog...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to close sockets...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: waiting for fs preallocator...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: lock for final commit...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: final commit...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: closing all files...
2015-10-03T00:53:53.461+0000 [conn12750] end connection 0.1.2.187:1 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12929] end connection 0.1.2.187:2 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12936] end connection 0.1.2.187:3 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12915] end connection 0.1.2.187:4 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12966] end connection 0.1.2.187:5 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12967] end connection 0.1.2.134:6 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12949] end connection 0.1.2.134:7 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12736] end connection 0.1.2.134:8 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12913] end connection 0.1.2.134:9 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12912] end connection 0.1.2.134:10 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12726] end connection 0.1.2.134:11 (22 connections now open)
2015-10-03T00:53:53.465+0000 [conn12908] closeAllFiles() finished
2015-10-03T00:53:53.465+0000 [conn12908] journalCleanup...
2015-10-03T00:53:53.465+0000 [conn12908] removeJournalFiles
2015-10-03T00:53:53.483+0000 [initandlisten] now exiting
2015-10-03T00:53:53.483+0000 [initandlisten] dbexit: ; exiting immediately

Subsequent server startups are failings due to old lock file:

2015-10-03T01:08:10.101+0000 [initandlisten] MongoDB starting : pid=13734 port=1 dbpath=/mongo/data/S1R2 64-bit host=myhost
2015-10-03T01:08:10.101+0000 [initandlisten] db version v2.6.9
2015-10-03T01:08:10.101+0000 [initandlisten] git version: df313bc75aa94d192330cb92756fc486ea604e64
2015-10-03T01:08:10.101+0000 [initandlisten] build info: Linux build20.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-10-03T01:08:10.101+0000 [initandlisten] allocator: tcmalloc
2015-10-03T01:08:10.101+0000 [initandlisten] options: { config: "/etc/st_mongod_1_S1R2.conf", net: { port: 1 }, processManagement: { fork: true }, replication: { replSet: "S1R" }, security: { keyFile: "/key.txt" }, storage: { dbPath: "/mongo/data/S1R2" }, systemLog: { destination: "file", path: "/mongo/data/S1R2.log" } }
2015-10-03T01:08:10.101+0000 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
2015-10-03T01:08:10.101+0000 [initandlisten] dbexit:
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: going to close listening sockets...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: going to flush diaglog...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: going to close sockets...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: waiting for fs preallocator...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: lock for final commit...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: final commit...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: closing all files...
2015-10-03T01:08:10.101+0000 [initandlisten] closeAllFiles() finished
2015-10-03T01:08:10.101+0000 [initandlisten] dbexit: really exiting now

It seems that the Journal files were cleaned and removed so why the file lock remained?
Since I have journaling enabled, what is the recommended action I can perform (restore from backup/ delete and sync and etc)?
Can I just delete lock file?
Also can it be determined from this log why the server was terminated?

Thanks in advance!

Wan Bachtiar

unread,
Nov 9, 2015, 9:26:25 PM11/9/15
to mongodb-user

It seems that the Journal files were cleaned and removed so
why the file lock remained?

Hi Avi,

If the mongod.lock file exists in the data directory specified by dbpath and is not a zero-byte file, this is an indication that the mongod process did not exit cleanly. Since there can be many factors that could caused a process to exit uncleanly, the deletion of this file should be done manually.


Since I have journaling enabled, what is the recommended action I can perform (restore from backup/ delete and sync and etc)? Can I just delete lock file?

Based on your log file, the journal file had been removed before the process exited :

2015-10-03T00:53:53.465+0000 [conn12908] closeAllFiles() finished
2015-10-03T00:53:53.465+0000 [conn12908] journalCleanup...
2015-10-03T00:53:53.465+0000 [conn12908] removeJournalFiles
2015-10-03T00:53:53.483+0000 [initandlisten] now exiting

In which case, it would be safe to delete the lock file manually and restart the process.

On the other hand if the journal file exists, the recommended way for a replica set deployment is to restore from a backup or restart the mongod instance with an empty dbPath and allow MongoDB to perform an initial sync to restore the data. See Resync replica set member


Also can it be determined from this log why the server was terminated?

Perhaps check for any error messages before the critical section of moveChunk. It is possible that the process terminated itself to prevent unwanted outcomes (for example: duplicate chunks or missing chunks) after an unexpected event has occurred during the critical section of moveChunk commit.

There is an open ticket for the log error message that you have posted in MongoDB JIRA issue tracker: SERVER-8358. I’ve noticed that you have posted a message in the ticket, please feel free to watch or upvote for updates.


Regards,
Wan.

Reply all
Reply to author
Forward
0 new messages