Hi Andrew,
Sharding is used to scale reads and writes beyond the capacity of a single server. However, inserting data in a sharded cluster is different from a non-sharded deployment.
We have our Mongos instance colocated with the mongoimport process to avoid having additional network latency, but that’s the only configuration change between the two set ups.
Could you provide more details on the two deployments (the standalone and the sharded cluster), i.e. hardware, MongoDB version, storage engine, how many mongod processes are running on each server, whether virtualization was used, and where the mongoimport process was run?
There are some differences in how MongoDB performs writes to a standalone node versus a sharded cluster. Could you provide more details about your sharded environment, e.g. your shard key, the specifics of the tags, and the output of sh.status()?
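In case it helps, here is a rough sketch of how that information could be gathered from the mongo shell while connected to the mongos. The names "mydb" and "mycoll" and the helper collectShardingInfo are placeholders of mine, not taken from your setup, so please adjust them to your own database and collection.

```javascript
// Sketch for the mongo shell, connected to the mongos.
// "mydb" and "mycoll" are placeholder names (assumptions), not from this thread.
function collectShardingInfo(database, shardingHelper, collName) {
  // Prints the shards, the shard key, chunk counts per shard, and tag ranges.
  shardingHelper.status();
  // Prints how the data and chunks of this collection are spread across shards.
  database.getCollection(collName).getShardDistribution();
  return {
    mongosVersion: database.version(),
    balancerEnabled: shardingHelper.getBalancerState(),
  };
}

// Only runs inside the shell, where `db` and `sh` are predefined globals.
if (typeof db !== "undefined" && typeof sh !== "undefined") {
  printjson(collectShardingInfo(db.getSiblingDB("mydb"), sh, "mycoll"));
}
```

The printed sh.status() and getShardDistribution() output is the part that is most useful to paste back into this thread.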
Importing into a sharded cluster involves more parts than importing into a standalone node. Depending on the data, chunks may be split and balanced around the cluster as the import is happening. This balancing process during import can have a large impact on import performance. To mitigate this balancing during import, you could:
Check the sh.status() output to ensure that the chunks are evenly distributed across the shards.
Regards,
Amar