Hi Ankit
In addition to the links Weishan provided, there is a wiki in MongoDB's GitHub repository: https://github.com/mongodb/mongo/wiki
You may be particularly interested in the Sharding Internals part of that wiki.
Regarding your questions:
If insertion into the db is stopped, only then does the balancer become active and start moving chunks. If I insert more data for a longer duration, it will create more chunks and the data will be more skewed. Chunk migration will itself take more time to balance the shards. So how does mongo decide when to migrate chunks?
This is answered in the Balancer section of the Wiki.
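As a rough illustration of the idea described there (a simplified sketch, not the server's actual code): in the releases I am familiar with, the balancer compares the chunk counts of the most loaded and least loaded shards, and starts a balancing round only once the difference reaches a threshold that grows with the total number of chunks. The threshold values below are the ones documented for those releases and are an assumption for your version; the function names are made up for this example.

```python
def migration_threshold(total_chunks):
    """Chunk-count gap that triggers balancing.

    Values are the documented ones for older releases; treat them as an
    assumption for whatever version you are running.
    """
    if total_chunks < 20:
        return 2
    if total_chunks < 80:
        return 4
    return 8


def should_balance(chunks_per_shard):
    """Return True when the gap between the shard with the most chunks
    and the shard with the fewest reaches the threshold."""
    counts = list(chunks_per_shard.values())
    gap = max(counts) - min(counts)
    return gap >= migration_threshold(sum(counts))
```

For example, should_balance({"shard0": 50, "shard1": 40}) is True (gap of 10 against a threshold of 8), while should_balance({"shard0": 10, "shard1": 9}) is False.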
I was able to notice spikes in write latency when data is inserted after 20M docs. Does it mean the balancer is moving some of the chunks intermittently?
Possibly. It is also possible that you overloaded the hardware resources of your test deployment.
The Count API gives inconsistent results during chunk migration because the balancer copies chunks from one shard to another and then deletes the old chunk. Should we expect that the Find API will also give incorrect results (duplicate docs)?
No. Each shard will only return the part of the collection that it is responsible for, as recorded in the config servers. Note that this is true only if you perform the query via the mongos process. Results may be inconsistent if you connect directly to the shard servers. Connecting directly to the shards is strongly discouraged.
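To make the difference concrete, here is a toy model in Python (purely illustrative, not how the server is implemented; all names and data are made up). One chunk is in the middle of a migration, so its document physically exists on both shards, but the metadata still lists only one owner. A query that filters by ownership returns each document once, while a raw count of what is stored on every shard does not.

```python
# Chunk ranges on the shard key, [low, high), and the shard that owns each
# one according to the (simulated) config metadata.
chunk_owner = [
    ((0, 100), "shardA"),
    ((100, 200), "shardA"),  # being migrated to shardB, not yet handed over
]

# What is physically stored on each shard during the migration.
shard_data = {
    "shardA": [{"_id": 1, "k": 42}, {"_id": 2, "k": 150}],
    "shardB": [{"_id": 2, "k": 150}],  # copy in flight
}


def owner_of(key):
    """Look up which shard owns the chunk containing this shard key."""
    for (low, high), shard in chunk_owner:
        if low <= key < high:
            return shard
    raise KeyError(key)


def find_with_ownership_filter():
    """Each shard contributes only documents in chunks it owns."""
    return [
        doc
        for shard, docs in shard_data.items()
        for doc in docs
        if owner_of(doc["k"]) == shard
    ]


def raw_count():
    """Counts everything stored on every shard, with no ownership filter."""
    return sum(len(docs) for docs in shard_data.values())
```

Here find_with_ownership_filter() returns two documents, while raw_count() reports three because the in-flight copy on shardB is counted as well.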
Best regards
Kevin