Nevertheless, why does the first sentence insist on 256GB,
which is a constant and much larger than 31GB?
Hi Yos,
The listed collection size of 256GB is a reasonable guideline for how large an existing collection can be and still be sharded successfully. I agree, though, that the manual could be clearer about where the number comes from.
When you initially shard a collection, mongos asks mongod to find split points to divide existing data (i.e. minKey to maxKey, or all documents) into as many chunks as necessary given several factors. Based on MongoDB v3.2.1:
Using the above description, let’s work out where the 256GB guideline came from:
ConfiguredChunkSize = 64 MB // default chunk size
AverageDocumentSize = 1 MB // db.collection.stats().avgObjSize
MaxSplitPoints = 8192
( ConfiguredChunkSize / (2 * AverageDocumentSize) ) * MaxSplitPoints = 262,144 MB // or ~256 GB.
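The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as written in this message, not MongoDB code; the function name and parameter names are made up for illustration, and the result reads as MB because the example assumes 1 MB documents.

```python
def max_initial_collection_size(chunk_size_mb=64, avg_doc_size_mb=1,
                                max_split_points=8192):
    """Evaluate the guideline formula from this thread (hypothetical helper).

    (ConfiguredChunkSize / (2 * AverageDocumentSize)) * MaxSplitPoints
    """
    # Each initial chunk is filled to about half of the configured chunk size.
    per_chunk = chunk_size_mb / (2 * avg_doc_size_mb)
    return per_chunk * max_split_points


if __name__ == "__main__":
    size_mb = max_initial_collection_size()
    print(f"{size_mb:,.0f} MB, or about {size_mb / 1024:.0f} GB")
```

Changing `chunk_size_mb` shows how the guideline scales with the configured chunk size.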
If you need to shard an existing collection larger than 256GB, you could temporarily raise the chunk size to get around the limit on the number of initial split points. For example, if you increase the chunk size to 128MB, you should be able to shard a collection of around 512GB. Once the initial split points have been created, you can lower the chunk size back to the default, and mongos will split the “larger” chunks the next time it checks whether they need to be split.
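To estimate how far the chunk size would need to be raised for a given collection, the same formula can be inverted. The sketch below assumes the values used in this thread (1 MB average document size, 8192 split points); the helper name is hypothetical, and it does not check whether the result is a chunk size the server will accept.

```python
import math


def min_chunk_size_mb(collection_size_mb, avg_doc_size_mb=1,
                      max_split_points=8192):
    """Smallest chunk size (MB) for which the guideline formula covers
    a collection of the given size (hypothetical helper)."""
    # Solve size = (chunk / (2 * avg_doc)) * max_split_points for chunk.
    needed = collection_size_mb * 2 * avg_doc_size_mb / max_split_points
    return math.ceil(needed)


if __name__ == "__main__":
    for size_gb in (256, 400, 512):
        chunk = min_chunk_size_mb(size_gb * 1024)
        print(f"{size_gb} GB collection: chunk size of at least {chunk} MB")
```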
Alternatively, you could pre-split chunks in a sharded cluster manually, and then migrate the data into the new sharded collection.
Generally, it is recommended not to wait until a collection is really huge before sharding it. For example, if you have a 250GB collection that you want to shard across two shards, then 125GB of data will need to be migrated between shards to balance the collection. This will take some time and will add load on the cluster, so sharding a collection earlier is a good idea regardless.
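As a rough way to see how the migration cost grows, the sketch below generalises the two-shard example. It assumes all data starts on a single shard and ends up spread evenly by size across the shards, which is a simplification; the function name is made up for illustration.

```python
def data_to_migrate_gb(collection_size_gb, shard_count):
    """Approximate data (GB) that must leave the original shard so that
    every shard ends up with an equal share (simplifying assumption)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # The original shard keeps 1/shard_count of the data; the rest moves.
    return collection_size_gb * (shard_count - 1) / shard_count


if __name__ == "__main__":
    for size_gb in (25, 250):
        moved = data_to_migrate_gb(size_gb, 2)
        print(f"{size_gb} GB over 2 shards: about {moved} GB to migrate")
```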
It may also be worth pointing out that there is no limit on the actual size of a sharded cluster, just the size of the initial collection you’re sharding.
Best regards,
Wan.
I am wondering whether we still need the factor of 2 for AverageDocumentSize when using the WiredTiger storage engine. Please correct me if I'm wrong, but I thought this applied only to MMAPv1.
Hi Saleem,
The point of multiplying AverageDocumentSize by two is to avoid creating chunks that are close to the maximum chunk size. If chunks were filled close to the maximum after the initial sharding process, then subsequent operations that grow a chunk would most likely trigger a chunk split.
This part of the initial sharding calculation is unrelated to the storage engine you are using.
Regards,
Wan.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/d417fd66-48e4-49b3-adcf-387979c0b4d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.