Chunk migration blocks read queries to mongos. HELP!


Abhishek Raj

Dec 14, 2017, 6:12:42 PM
to mongodb-user
Hi. We are running MongoDB 3.2.17 and recently enabled sharding for one of our collections. We set _waitForDelete: true to throttle chunk migration and limit the performance impact on the cluster. The chunk migration itself goes smoothly, but as soon as the range deleter kicks in, mongos starts dropping connections with socket exception [SEND_ERROR].
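
For readers who have not used this setting: a minimal sketch of how _waitForDelete is commonly enabled, assuming it is applied through the balancer document in config.settings via a mongos. The function names are hypothetical, and `settings_collection` is assumed to be a PyMongo collection handle for config.settings; this is not necessarily how it was set on this cluster.

```python
def balancer_wait_for_delete(enabled):
    """Build the filter and update for the balancer settings document."""
    query = {"_id": "balancer"}
    update = {"$set": {"_waitForDelete": bool(enabled)}}
    return query, update


def apply_wait_for_delete(settings_collection, enabled=True):
    """Upsert the setting into config.settings.

    settings_collection is assumed to be the config.settings collection
    reached through a mongos, for example
    pymongo.MongoClient(mongos_uri)["config"]["settings"].
    """
    query, update = balancer_wait_for_delete(enabled)
    return settings_collection.update_one(query, update, upsert=True)
```

With the setting on, the donor shard deletes the migrated range inline before the balancer moves on, which matches the "doing delete inline for cleanup of chunk data" line in the log below.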

We see these in the primary's logs:

SHARDING [conn12165] moveChunk data transfer progress: { active: true, sessionId: "sh1_sh2_5a32fe5711ce58b78673f0cf", ns: "<namespace>", from: "sh1/<ips>.", min: { key: "<min>" }, max: { key: "<max>" }, shardKeyPattern: { <pattern> }, state: "steady", counts: { cloned: 44971, clonedBytes: 16904159, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
SHARDING [conn12165] About to check if it is safe to enter critical section
SHARDING [conn12165] About to enter migrate critical section
SHARDING [conn12165] moveChunk setting version to: <version>
SHARDING [conn10988] Waiting for 10 seconds for the migration critical section to end
SHARDING [conn12165] moveChunk migrate commit accepted by TO-shard: { active: false, ns: "<namespace>", from: "sh1/<ips>", min: { key: "<min>" }, max: { key: "<max>" }, shardKeyPattern: { <pattern> }, state: "done", counts: { cloned: 44971, clonedBytes: 16904159, catchup: 0, steady: 0 }, ok: 1.0 }
SHARDING [conn12165] moveChunk updating self version to: <version through { key: "<max>" } -> { key: "<max>" } for collection '<namespace>'
SHARDING [conn12165] about to log metadata event into changelog: { _id: "<id>", server: "<server>", clientAddr: "<client>", time: new Date(1513291397898), what: "moveChunk.commit", ns: "<namespace>", details: { min: { key: "<min>" }, max: { key: "<max>" }, from: "sh1", to: "sh2", cloned: 44971, clonedBytes: 16904159, catchup: 0, steady: 0 } }
SHARDING [conn12165] MigrateFromStatus::done About to acquire global lock to exit critical section
SHARDING [conn12165] doing delete inline for cleanup of chunk data
SHARDING [conn12165] Deleter starting delete for: <namespace> from { key: "<min>" } -> { key: "<max>" }, with opId: 6008172
SHARDING [conn12165] rangeDeleter deleted 44977 documents for devices.devices from { key: "<min>" } -> { key: "<max>" }


The range deleter takes about a minute to complete, and right around the time it kicks in we start seeing these on the mongos:

SHARDING [conn10188480] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188575] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188420] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188266] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10187891] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10187582] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188665] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10186909] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10184469] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188594] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188353] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188475] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188725] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 
SHARDING [conn10188141] Exception thrown while processing query op for <op> :: caused by :: 9001 socket exception [SEND_ERROR] server [<ips>] 



We have spent hours trying out various things with no luck. Any help here would be greatly appreciated.

Abhishek Raj

Dec 15, 2017, 4:17:48 AM
to mongodb-user
Anything?

Kevin Adistambha

Jan 7, 2018, 8:10:52 PM
to mongodb-user

Hi Abhishek

From the log messages you posted, it appears that the chunk move happened successfully, and the range deleter finished deleting the documents. The socket exception you’re seeing may be caused by the application instead of internal MongoDB processes such as the range deleter.
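
One way to check whether the socket exceptions really line up with the range deleter is to compare timestamps across the two logs. The sketch below is only an illustration: it assumes the full log lines carry the usual 3.2-style prefix (ISO timestamp, severity letter, component, context), which was stripped from the excerpts above, and that both hosts log in the same time zone. All function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed line shape: "2017-12-14T22:43:17.898+0000 I SHARDING [conn1] message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d{3}\S*\s+\w\s+\S+\s+\[[^\]]+\]\s+(.*)$"
)


def parse_line(line):
    """Return (timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S"), m.group(2)


def deleter_windows(shard_lines):
    """Collect (start, end) pairs for each inline range delete on the shard."""
    windows, start = [], None
    for line in shard_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if msg.startswith("Deleter starting delete for:"):
            start = ts
        elif msg.startswith("rangeDeleter deleted") and start is not None:
            windows.append((start, ts))
            start = None
    return windows


def count_send_errors(mongos_lines, windows):
    """Count SEND_ERROR lines inside and outside the deleter windows."""
    inside = outside = 0
    for line in mongos_lines:
        parsed = parse_line(line)
        if parsed is None or "socket exception [SEND_ERROR]" not in parsed[1]:
            continue
        if any(start <= parsed[0] <= end for start, end in windows):
            inside += 1
        else:
            outside += 1
    return inside, outside
```

If nearly all of the errors fall inside the windows, that points toward the delete; if they are spread evenly, the application or the network is the more likely place to look.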

Are you seeing any issues in the operation of the sharded cluster that are connected to these messages? If you suspect that you have orphaned documents, you may want to look at the cleanupOrphaned command.
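
A rough sketch of how cleanupOrphaned is usually driven, in case it helps: the command cleans one range per call and reports where it stopped, so it is called repeatedly until no stoppedAtKey comes back. The names below are hypothetical; `run_admin_command` is assumed to be something like PyMongo's `client.admin.command`, with the client connected directly to the shard's primary rather than to mongos.

```python
def cleanup_orphans(run_admin_command, namespace, max_rounds=1000):
    """Call cleanupOrphaned repeatedly until the whole key space is covered.

    run_admin_command: callable taking a command document and returning the
    server reply as a dict (assumed: pymongo's client.admin.command).
    Returns the number of calls made.
    """
    next_key = {}  # an empty document starts from the lowest key
    rounds = 0
    while next_key is not None and rounds < max_rounds:
        result = run_admin_command(
            {"cleanupOrphaned": namespace, "startingFromKey": next_key}
        )
        if result.get("ok") != 1:
            raise RuntimeError("cleanupOrphaned did not complete: %r" % (result,))
        # stoppedAtKey is absent (or null) once the last range has been processed
        next_key = result.get("stoppedAtKey")
        rounds += 1
    return rounds
```

The max_rounds cap is only a safety net for the sketch; each call can generate delete load of its own, so it is worth running this during a quiet period.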

Best regards
Kevin
