Mongo Performance Degredation When Using Sharding

66 views
Skip to first unread message

Andrew FigPope

unread,
Aug 19, 2016, 5:05:32 PM8/19/16
to mongodb-user
We have a test Mongos cluster set up with the following machines:
  • 1 Config Server (A replica set with one machine)
  • 3 Shards (Each a replica set with one machine)
  • 3 Mongos
We're trying to scale our Insert performance when using mongoimport, but we're seeing a decrease in throughput when writing to our sharded cluster.

When writing from a single machine to a single Mongod instance, we're seeing throughputs of ~240 documents per second, however as soon as we put that machine into a Mongos cluster with the configuration specified above our throughput drops to ~120 documents per second (when writing to only that shard).

We have our Mongos instance colocated with the mongoimport process to avoid having additional network latency, but that's the only configuration change between the two set ups. We've set up our sharding using a Tag with a very specific range so all the documents we're importing should be going to that specific machine.

Anyone know why we'd be losing so much throughput?

Amar

unread,
Aug 25, 2016, 12:32:13 AM8/25/16
to mongodb-user

Hi Andrew,

Sharding is used to scale reads and writes beyond the capacity of a single server. However, inserting data in a sharded cluster is different from a non-sharded deployment.

We have our Mongos instance colocated with the mongoimport process to avoid having additional network latency, but that’s the only configuration change between the two set ups.

Could you provide more details on the two deployments (the standalone and the sharded cluster), i.e. hardware, MongoDB version, storage engine, how many mongod are running in each server, whether virtualization were used, where was the mongoimport process run?

We have our Mongos instance colocated with the mongoimport process to avoid having additional network latency, but that’s the only configuration change between the two set ups.

There are some differences in details when MongoDB perform writes to a standalone node vs to a sharded cluster. Could you provide more details into your sharded environment, e.g. your shard key, specifics of the tags, and the output of sh.status()

Importing into a sharded cluster involves more parts than importing into a standalone node. Depending on the data, chunks may be split and balanced around the cluster as the import is happening. This balancing process during import can have a large impact on import performance. To mitigate this balancing during import, you could:

  1. Pre-split the collection into as many chunks as needed. More is better.
  2. Let the balancer move the empty chunks throughout the cluster. This will be a relatively quick process since the chunks are still empty.
  3. Check the sh.status() output to ensure that the chunks are evenly distributed across the shards.
  4. Import the data with the balancer in a disabled state.
  5. Re-enable the balancer after import.

Regards,

Amar


Reply all
Reply to author
Forward
0 new messages