I'm repairing my replica sets following the procedure at
http://www.mongodb.org/display/DOCS/Durability+and+Repair#DurabilityandRepair-RepairCommand. I first step down the masters, restart them, repair, etc., and then I do the same for the secondaries.
To speed this up, my script does the repairs in parallel. First it builds a list of which members are masters and which are secondaries. It then runs the procedure on all the masters in parallel, and once the (previous) masters have come back up, it does the same for the secondaries.
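Roughly, the ordering described above could be sketched like this in Python. This is only an illustration, not the actual script: `split_by_role`, `run_phase`, `repair_all`, and the `repair_member` placeholder are hypothetical names, the member list is made up, and the real step-down/restart/repair work would go where the placeholder is.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_role(members):
    """Split (host, is_master) pairs into a masters list and a secondaries list."""
    masters = [host for host, is_master in members if is_master]
    secondaries = [host for host, is_master in members if not is_master]
    return masters, secondaries


def run_phase(hosts, repair_fn, max_workers=8):
    """Run repair_fn on every host in parallel; return only when all have finished."""
    if not hosts:
        return []
    workers = min(max_workers, len(hosts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input hosts
        return list(pool.map(repair_fn, hosts))


def repair_all(members, repair_fn):
    """Repair all masters first, then all secondaries, each group in parallel."""
    masters, secondaries = split_by_role(members)
    finished = run_phase(masters, repair_fn)       # phase 1: (previous) masters
    finished += run_phase(secondaries, repair_fn)  # phase 2: secondaries
    return finished


def repair_member(host):
    """Hypothetical placeholder: step down if needed, restart, repair, wait until up."""
    return host


if __name__ == "__main__":
    # Made-up member list for illustration only
    members = [
        ("shard1-a:27017", True),
        ("shard1-b:27017", False),
        ("shard2-a:27017", True),
        ("shard2-b:27017", False),
    ]
    print(repair_all(members, repair_member))
```

The second phase only starts after every call in the first phase has returned, so `repair_member` would need to block until the member is back up for the ordering to hold.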
I notice, though, that when all is said and done, my application, specifically the mongos processes, seems to have problems reconnecting to the shards and continues to throw application errors. I see this in the mongos logs:
"warning could not clear last error from shard ... caused by ... socket exception"
"Socket say send() errno:32 Broken pipe ..."
If I restart the mongos processes, all is well, but this isn't ideal.
We're using MongoDB 2.0.4.
Thoughts?
Thanks,
Justin