Sharding Existing Collection Data Size


Yos Tj

Jan 26, 2016, 1:12:17 AM
to mongodb-user
Hello,

Would you please help me to understand "Sharding Existing Collection Data Size"
in the manual?
https://docs.mongodb.org/manual/reference/limits/#Sharding-Existing-Collection-Data-Size

As for the maximum size of a collection that can be sharded,
the manual's first sentence says it "supports ... 256GB".

But the table that follows in the manual shows that the maximum size
varies with the shard key size and chunk size.

As an example, the table says that the max collection size is 31GB
if the collection has no data other than the shard key, and the
shard key size and chunk size are 512 bytes and 64MB, respectively.

In my understanding of the table, as the shard key size
becomes larger, the number of splits (meaning chunks, right?)
and the max collection size become smaller.

Nevertheless, why does the first sentence insist on 256GB,
which is a constant, and much larger than 31GB?

Regards,

Wan Bachtiar

Feb 4, 2016, 9:38:53 PM
to mongodb-user

 > Nevertheless, why does the first sentence insist on 256GB,
 > which is a constant, and much larger than 31GB?

Hi Yos,

The listed collection size of 256GB is a reasonable guideline for an existing collection to be sharded successfully. That said, I agree that the manual could be clearer about how that number came to be.

When you initially shard a collection, mongos asks mongod to find split points to divide existing data (i.e. minKey to maxKey, or all documents) into as many chunks as necessary given several factors. Based on MongoDB v3.2.1:

  • The initial total number of split points has to be lower than 8192 when you are sharding an existing collection.
  • The default chunk size is 64MB, but configurable up to a maximum of 1024 MB.
  • The configured chunk size is a maximum chunk size, not a target chunk size. Generally, chunks are created at approximately half of the configured size so that they have room to grow. Based on the split code snippet, the average document size is used to calculate the target chunk size.

Using the above description, let’s work out where the 256GB guideline came from:

ConfiguredChunkSize = 64 MB  // default chunk size
AverageDocumentSize = 1 MB // db.collection.stats().avgObjSize
MaxSplitPoints = 8192

( ConfiguredChunkSize / (2 * AverageDocumentSize) ) * MaxSplitPoints = 262,144 MB // or ~256 GB.
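
As a quick sketch of that arithmetic in the mongo shell (the variable names and the collection name "mycollection" are only illustrative; the 8192 split-point figure and the halving behaviour are taken from the description above rather than from a documented API):

  // Estimate the largest collection size that can be sharded in one pass.
  var configuredChunkSizeMB = 64;                                         // default chunk size
  var avgObjSizeMB = db.mycollection.stats().avgObjSize / (1024 * 1024);  // e.g. ~1 MB
  var maxSplitPoints = 8192;

  var maxShardableMB = (configuredChunkSizeMB / (2 * avgObjSizeMB)) * maxSplitPoints;
  print("Estimated max shardable collection size: " + maxShardableMB + " MB");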


If you need to shard an existing collection larger than 256GB, you can temporarily raise the chunk size to get around the initial maximum number of split points. For example, if you increase the chunk size to 128MB, then you should be able to shard a collection of around ~512GB. Once the initial split is done, you can lower the chunk size back to the default, and mongos will split the “larger” chunks the next time it checks whether they need to be split.
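
For example, a rough sketch of that workaround in the mongo shell (the namespace "mydb.bigCollection" and the key "myShardKey" are placeholders; the chunk size value is in MB and is stored in the config database):

  // 1. Temporarily raise the maximum chunk size to 128 MB.
  use config
  db.settings.save( { _id: "chunksize", value: 128 } )

  // 2. Shard the large existing collection.
  sh.shardCollection( "mydb.bigCollection", { myShardKey: 1 } )

  // 3. Once the initial split has completed, restore the 64 MB default.
  db.settings.save( { _id: "chunksize", value: 64 } )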

Alternatively, you could pre-split chunks in a sharded cluster manually, and then migrate the data into the new sharded collection.
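
A sketch of that approach, again with placeholder names; sh.splitAt() creates chunk boundaries at the shard key values you supply, which you would choose based on how your data is distributed:

  // Shard a new, empty collection first.
  sh.shardCollection( "mydb.newCollection", { myShardKey: 1 } )

  // Pre-split at chosen shard key values so the chunks exist up front.
  sh.splitAt( "mydb.newCollection", { myShardKey: 1000 } )
  sh.splitAt( "mydb.newCollection", { myShardKey: 2000 } )
  sh.splitAt( "mydb.newCollection", { myShardKey: 3000 } )

  // Then migrate the data (for example with mongodump/mongorestore or a
  // custom script) and let the balancer spread the chunks across the shards.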

Generally, it is recommended not to wait until a collection is really huge before sharding it. For example, if you have a 250GB collection that you want to shard across two shards, then 125GB of data will need to be migrated between the shards to balance the collection. This will take some time and will add load to the cluster, so sharding a collection early is a good idea regardless.

It may also be worth pointing out that there is no limit on the actual size of a sharded cluster, just the size of the initial collection you’re sharding.


Best regards,

Wan.

Saleem Mirza

Feb 5, 2016, 9:13:51 AM
to mongodb-user
@Wan, as you mentioned: 

ConfiguredChunkSize = 64 MB  // default chunk size
AverageDocumentSize = 1 MB // db.collection.stats().avgObjSize
MaxSplitPoints = 8192

( ConfiguredChunkSize / (2 * AverageDocumentSize) ) * MaxSplitPoints = 262,144 MB // or ~256 GB.

I am wondering if we still need to apply the factor of 2 to AverageDocumentSize in the case of the WiredTiger engine? Please correct me if I'm wrong, but I thought this was true only in the case of MMAPv1.

Wan Bachtiar

Feb 8, 2016, 12:03:40 AM
to mongodb-user

 > I am wondering if we still need to apply the factor of 2 to AverageDocumentSize in the case of the WiredTiger engine? Please correct me if I'm wrong, but I thought this was true only in the case of MMAPv1.

Hi Saleem,

The point of multiplying AverageDocumentSize by two is to avoid creating chunks close to the maximum chunk size. If chunks were filled close to the maximum size by the initial sharding process, then subsequent writes that grow them would very likely trigger chunk splits right away. With the default 64MB maximum and a 1MB average document size, for example, the initial chunks are targeted at roughly 32MB, so each chunk can roughly double before it needs to be split.

This part of the initial sharding calculation is unrelated to the storage engine you are using.

Regards,

Wan.

Yos Tj

Feb 22, 2016, 4:06:22 AM
to mongodb-user
Hi Wan,

Sorry for my late response.


 > The initial total number of split points has to be lower than 8192  when you are sharding an existing collection.

Oh, that's quite simple. But, in contrast, the table in the manual says
"Number of Splits" can be 31,558 or 85,196 (higher than 8192), depending on the shard key size.

I also found maxSplitPoints in the source code.
It is 8192 in v3.2, and was already 8192 in the very old v2.2.
  https://github.com/mongodb/mongo/blob/v2.2/src/mongo/s/chunk.cpp
    bool Chunk::multiSplit( const vector<BSONObj>& m , BSONObj& res ) const {
        const size_t maxSplitPoints = 8192;

Is the manual telling us the truth?

Regards,

Asya Kamsky

Feb 23, 2016, 5:52:06 PM
to mongod...@googlegroups.com
If the shard key size is the maximum allowed size, then 8192 is the largest possible number of initial split points that fits in a BSON document.


--
Asya Kamsky
Lead Product Manager
MongoDB
Download MongoDB - mongodb.org/downloads
Free MongoDB Monitoring - cloud.mongodb.com
Free Online Education - university.mongodb.com
Get Involved - mongodb.org/community
We're Hiring! - https://www.mongodb.com/careers

Yos Tj

Feb 24, 2016, 5:11:39 AM
to mongodb-user
Hello,


 > If the shard key size is the maximum allowed size, then 8192 is the largest possible number of initial split points that fits in a BSON document.

I'm confused. I know the maximum shard key size is 512 bytes,
and you say the largest possible initial number of splits is 8192. That is fine with me, so far.

But, in contrast, the manual says that the Number of Splits is 31,558 when the Shard Key Size is 512 bytes.
What does "31,558" mean here?
https://docs.mongodb.org/manual/reference/limits/#Sharding-Existing-Collection-Data-Size

Regards,

Asya Kamsky

Feb 25, 2016, 1:23:37 AM
to mongod...@googlegroups.com
It looks like a mistake. I'll open a docs ticket to get it fixed.  

Asya Kamsky

Mar 10, 2016, 2:39:00 PM
to mongodb-user
Sorry about the delay.

It turns out that the 8,192 split-point limit only applies if you are doing a manual multi-split. The initial split of the collection by the system is not limited to that number of chunks; however, it *is* limited to the number of split points that fit into the 16MB maximum document size.
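
As a back-of-the-envelope check (my own arithmetic, not something from the docs), that 16MB limit is at least roughly consistent with the 31,558 figure in the manual's table:

  // How many 512-byte split points fit in a 16 MB BSON document?
  var maxBsonBytes = 16 * 1024 * 1024;   // 16 MB BSON document size limit
  var shardKeyBytes = 512;               // maximum shard key size
  print( maxBsonBytes / shardKeyBytes ); // 32768 -- same ballpark as the 31,558 in
                                         // the table, the gap presumably being
                                         // per-entry BSON overhead.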

I filed DOCS-7254 to track clarifying the wording.

Asya