Hi all, I'm a new member to this community. Have been working with mongo for several months now and still encounter some interesting errors which I hope the community can help resolve!
I have a local pre production mongodb deployment which is configured in the following topology: several shards, each shard has three replica sets and journaling is enabled for all, sever version is 2.6.9.
The problem:
One of the servers terminated with the following exit log and left an unlocked lock file which caused subsequent startups to fail.
... for command :{ $err: "socket exception [SEND_ERROR] for
0.1.2.187:1, code: 9001 }
2015-10-03T00:53:43.494+0000 [conn12906] waiting till out of critical section
2015-10-03T00:53:43.501+0000 [conn12900] waiting till out of critical section
2015-10-03T00:53:43.643+0000 [conn12904] waiting till out of critical section
2015-10-03T00:53:43.651+0000 [conn12907] waiting till out of critical section
2015-10-03T00:53:43.735+0000 [conn12905] waiting till out of critical section
2015-10-03T00:53:43.743+0000 [conn12902] waiting till out of critical section
2015-10-03T00:53:43.893+0000 [conn12903] waiting till out of critical section
2015-10-03T00:53:43.901+0000 [conn12901] waiting till out of critical section
2015-10-03T00:53:53.461+0000 [conn12908] ERROR: moveChunk commit failed: version is at 5|5||000000000000000000000000 instead of 6|1||55dce04fae57789a22a4d141
2015-10-03T00:53:53.461+0000 [conn12908] ERROR: TERMINATING2015-10-03T00:53:53.461+0000 [conn12908] dbexit:
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to close listening sockets...
2015-10-03T00:53:53.461+0000 [conn12908] closing listening socket: 10
2015-10-03T00:53:53.461+0000 [conn12908] closing listening socket: 13
2015-10-03T00:53:53.461+0000 [conn12908] removing socket file: /tmp/mongodb-27022.sock
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to flush diaglog...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to close sockets...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: waiting for fs preallocator...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: lock for final commit...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: final commit...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: closing all files...
2015-10-03T00:53:53.461+0000 [conn12750] end connection
0.1.2.187:1 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12929] end connection
0.1.2.187:2 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12936] end connection
0.1.2.187:3 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12915] end connection
0.1.2.187:4 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12966] end connection
0.1.2.187:5 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12967] end connection
0.1.2.134:6 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12949] end connection
0.1.2.134:7 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12736] end connection
0.1.2.134:8 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12913] end connection
0.1.2.134:9 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12912] end connection
0.1.2.134:10 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12726] end connection
0.1.2.134:11 (22 connections now open)
2015-10-03T00:53:53.465+0000 [conn12908] closeAllFiles() finished
2015-10-03T00:53:53.465+0000 [conn12908] journalCleanup...
2015-10-03T00:53:53.465+0000 [conn12908] removeJournalFiles
2015-10-03T00:53:53.483+0000 [initandlisten] now exiting
2015-10-03T00:53:53.483+0000 [initandlisten] dbexit: ; exiting immediately
Subsequent server startups are failings due to old lock file:
2015-10-03T01:08:10.101+0000 [initandlisten] MongoDB starting : pid=13734 port=1 dbpath=/mongo/data/S1R2 64-bit host=myhost
2015-10-03T01:08:10.101+0000 [initandlisten] db version v2.6.9
2015-10-03T01:08:10.101+0000 [initandlisten] git version: df313bc75aa94d192330cb92756fc486ea604e64
2015-10-03T01:08:10.101+0000 [initandlisten] build info: Linux
build20.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-10-03T01:08:10.101+0000 [initandlisten] allocator: tcmalloc
2015-10-03T01:08:10.101+0000 [initandlisten] options: { config: "/etc/st_mongod_1_S1R2.conf", net: { port: 1 }, processManagement: { fork: true }, replication: { replSet: "S1R" }, security: { keyFile: "/key.txt" }, storage: { dbPath: "/mongo/data/S1R2" }, systemLog: { destination: "file", path: "/mongo/data/S1R2.log" } }
2015-10-03T01:08:10.101+0000 [initandlisten] exception in initAndListen: 12596 old lock file, terminating2015-10-03T01:08:10.101+0000 [initandlisten] dbexit:
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: going to close listening sockets...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: going to flush diaglog...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: going to close sockets...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: waiting for fs preallocator...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: lock for final commit...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: final commit...
2015-10-03T01:08:10.101+0000 [initandlisten] shutdown: closing all files...
2015-10-03T01:08:10.101+0000 [initandlisten] closeAllFiles() finished
2015-10-03T01:08:10.101+0000 [initandlisten] dbexit: really exiting now
It seems that the Journal files were cleaned and removed so why the file lock remained?
Since I have journaling enabled, what is the recommended action I can perform (restore from backup/ delete and sync and etc)?
Can I just delete lock file?
Also can it be determined from this log why the server was terminated?
Thanks in advance!