Understanding Padding Factor


Raji Sridar

Jun 3, 2014, 5:21:56 PM
to mongod...@googlegroups.com
Hi,

I am trying to use the padding factor to optimize update times on a document with at most 50 sub-documents of up to 1 KB each. I tested 100,000 inserts and 100,000 updates with no padding, padding factors 1.0, 2.0, 3.0 and 4.0, and manual padding followed by unsetting the pad, for 25, 30, 35, 40 and 50 sub-docs. I find that:

1. stats() after compact with padding factor 2, 3 or 4 still shows only paddingFactor 1. After updating the documents to hold 50 sub-docs, it shows paddingFactor 2.
Compact padding factor 4.0 Elapsed time:  594 ms
{
        "ns" : "test.assets4",
        "count" : 100000,
        "size" : 26880000,
        "avgObjSize" : 268,
        "storageSize" : 30969856,
        "numExtents" : 9,
        "nindexes" : 1,
        "lastExtentSize" : 11325440,
        "paddingFactor" : 1,
        "systemFlags" : 0,
        "userFlags" : 1,
        "totalIndexSize" : 2935184,
        "indexSizes" : {
                "_id_" : 2935184
        },
        "ok" : 1
}
WriteResult({ "nMatched" : 100000, "nUpserted" : 0, "nModified" : 100000 })
Adding 50 sub-doc in doc with padding factor 4.0 Elapsed time:  97636 ms
{
        "ns" : "test.assets4",
        "count" : 100000,
        "size" : 6552019952,
        "avgObjSize" : 65520,
        "storageSize" : 7167971328,
        "numExtents" : 26,
        "nindexes" : 1,
        "lastExtentSize" : 1861685248,
        "paddingFactor" : 2,
        "systemFlags" : 0,
        "userFlags" : 1,
        "totalIndexSize" : 2935184,
        "indexSizes" : {
                "_id_" : 2935184
        },
        "ok" : 1

2. For my use case, manually padding the initial document for 30 sub-docs and then unsetting the pad seems to be the most efficient; it gives the best times up to 40 1 KB sub-docs, and then performance starts deteriorating.
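
For reference, this is roughly what the two variants above look like in the mongo shell (a sketch only; the collection name assets4 comes from the stats above, while the subs and pad field names are made up):

// 1. Compact, requesting an explicit padding factor (MMAPv1, 2.x shell):
db.runCommand({ compact: "assets4", paddingFactor: 4.0 })

// 2. Manual padding: insert with a throwaway filler field sized for ~30
//    1 KB sub-docs, then $unset it so the extra allocated space stays
//    with the record.
db.assets4.insert({ _id: 1, subs: [], pad: new Array(30 * 1024).join("x") })
db.assets4.update({ _id: 1 }, { $unset: { pad: "" } })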

Are both of the above observations consistent with expectations based on MongoDB's design? Please let me know.

Raji Sridar

Jun 4, 2014, 9:38:09 PM
to mongod...@googlegroups.com
Is this a bug or expected behavior? How do I get a response from MongoDB developers?
Thanks
Raji 

s.molinari

Jun 5, 2014, 1:54:06 AM
to mongod...@googlegroups.com
Hi Raji,

Have you read this?


From what I understand, the padding factor can't go above 2.

 Each collection has its own padding factor, which defaults to 0 when you insert the first document in a collection, and may grow incrementally to 2 as documents grow as a result of updates.

Also important is the fact that usePowerOf2Sizes is the most performant for inserting AND updating, and it is now the default in 2.6.x.

Different allocation strategies support different kinds of workloads: the power of 2 allocations are more efficient for insert/update/delete workloads, while exact fit allocation is ideal for collections without update and delete workloads.
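
Side note: in 2.4/2.6 the allocation strategy can also be toggled per collection with collMod. A minimal sketch, with the collection name assumed:

// Enable power-of-2 record allocation (already the default for new
// collections in 2.6):
db.runCommand({ collMod: "assets4", usePowerOf2Sizes: true })

// Switch back to exact-fit allocation (suited to insert-only workloads):
db.runCommand({ collMod: "assets4", usePowerOf2Sizes: false })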

So, that being said, I think the logic of setting up your own padding/allocation to speed up updates is actually what is flawed.

Scott

s.molinari

Jun 5, 2014, 1:56:01 AM
to mongod...@googlegroups.com
Oh, and I am not a MongoDB developer. I am just a newb, slowly becoming an amateur at MongoDB. :D

Scott

Raji Sridar

Jun 6, 2014, 6:03:44 PM
to mongod...@googlegroups.com
Hi Scott,

Thanks for the response. I am confused about why the logic is flawed. Mongo by default does power-of-2 sizing or exact fit. I know my doc is going to have 1-50 sub-documents. What's wrong with pre-allocating for, say, 30 sub-docs? What's your logic in saying the default padding will suffice? Please clarify.

Regards
Raji


s.molinari

Jun 7, 2014, 7:24:28 AM
to mongod...@googlegroups.com
This is what I know, and I'll also admit it is founded on very little experience.

The allocation of memory/disk space (which padding is part of) is meant to avoid movement of documents on disk, which can happen if they get bigger than their allocated space. This kind of document movement is really bad for performance if it happens a lot. I am not sure what "a lot" is, but document movement seems to be something you really want to avoid if you can; I've been told that several times, and you can read it often in the manual. So that is the "why" of allocation. You probably already knew this.

But how does powerOf2Sizes allocation work?

Say you store the first document in your collection with only 1 sub-doc; the allocation will then be the next power-of-2 size above that record's size (e.g. 32KB, 64KB, 128KB, 256KB, etc., up to 4MB; after that, it can be anywhere between 4 and 16MB, rounded up to the nearest MB). So if your data in that one document is 24KB, the document will take up 32KB of space. If you then store a lot of documents with only 1 sub-document, you'll end up with a lot of 32KB blocks of space used.

Now say you need to store a new document with 50 sub-documents and it is actually 140KB in size. Mongo will then increase the allocation of that single record to 256KB (128KB is too small). And every other document you save after that will also get 256KB, despite actually being smaller. (I am not sure about this one fact; I couldn't find any examples or info.)

Now, what about updates?

For any new documents, they have plenty of room to grow. No problems there.

However, if you have to add the other 49 sub-documents to one or more of the older 32KB documents, this will cause document movement, since 140KB is much larger than the allocated 32KB. Not good, and exactly what you want to avoid.

So, what you might want to do to preallocate your data is to load an initial dummy document with all 50 sub-documents filled with fake data. If you are absolutely sure the documents won't grow out of this schema, then you could tighten down the padding a lot. But be aware that changing the schema in any way in the future, in a way that causes updated documents to grow in size, will cause havoc. The padding can only be smaller than the powerOf2Sizes allocation.
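
A minimal sketch of that pre-fill idea, assuming a hypothetical assets collection and ~1 KB sub-docs:

// Insert a document already carrying 50 fake ~1 KB sub-docs, so the record
// is allocated at (roughly) its full eventual size from the start.
var fakeSubs = [];
for (var i = 0; i < 50; i++) {
    fakeSubs.push({ seq: i, filler: new Array(1024).join("x") });
}
db.assets.insert({ _id: "template", subs: fakeSubs });
// The fake entries can later be replaced (or removed) as real data arrives.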

So in the end, if you aren't really sure about the size now or in the future, let MongoDB handle the allocation. 

If you are absolutely sure about the average size, prefill a dummy document to that average size and let Mongo handle the rest. 

If you really know the max size and it will never, ever change, prefill a dummy document to the max size, with very tight padding.

Although, with that last possibility, I am not really sure it will help performance in a read/update/insert scenario, which is what you are suggesting and expecting. I am thinking the powerOf2Sizes allocations are organized this way in order to match the chunks of data logically read from disk (which can also only be set in powers of 2).

The other advantage of powerOf2Sizes is the reuse of deleted data segments. Since they are "standardized", it is easy for Mongo to reuse those empty chunks.

I can't wait for Asya or William to grade/correct my answer. I hope one of them will. :) They are the real experts.

Scott

Asya Kamsky

Jun 9, 2014, 5:36:06 PM
to mongodb-user
It looks like you are using 2.6.x, so powers-of-two sizing will be the
default. That means you won't see the same behavior you would have seen
had you turned off the powers-of-2 sizing option for your collection.

You may be confusing the "paddingFactor" displayed in collection stats
- it simply tells you how much space will be allocated for the very
next document inserted into the collection. If it's 1, that means the
default; if it's 2, it's double the default, etc. Prior padding does
determine the future padding strategy, but you cannot tell from the
displayed paddingFactor how much padding each document currently has.
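
For example, that collection-wide value can be read straight off the stats of the collection from the original post:

// paddingFactor in collection stats is a collection-wide multiplier used
// for future allocations, not a per-document measurement:
db.assets4.stats().paddingFactor    // e.g. 1 (the default) or up to 2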

If you determined the optimal pattern for your use case, by all means, use it.

The performance deterioration may be unrelated to padding on the
documents; it may be due to other factors (index-to-RAM ratio, random
IO, something else)...

Asya

Raji Sridar

Jun 9, 2014, 7:13:42 PM
to mongod...@googlegroups.com
Thanks much Scott & Asya!
I was not expecting future padding in the output of 'paddingFactor' in stats (I was expecting the current padding factor). Is this future padding based on the last inserted record alone, an average of all inserted records, or some other formula?
Using Mongo 2.6.x, if I keep my array size as a power of 2 and my array element size as a power of 2, will that be the best padding possible (assuming all indexes are in RAM, no random IO, ...) to minimize fragmentation and any performance deterioration due to fragmentation and document movement within a collection (leaving out other factors that contribute to deterioration)? Please confirm. I greatly appreciate your inputs.

Thanks
Raji

Asya Kamsky

Jun 9, 2014, 10:06:54 PM
to mongodb-user
The padding factor is computed and recomputed continuously based on
all the updates that caused a document to move because it outgrew its
original allocation.

Asya

s.molinari

Jun 10, 2014, 2:08:55 AM
to mongod...@googlegroups.com
That I didn't know. Thanks, Asya. I keep getting those little golden nuggets here. ;)

What would happen if, say, later in time a new feature is added that means a ton of data must be added to older documents, extending them past their allocated storage size? I know it won't happen too often, but this could wreak havoc on the database, couldn't it? How do you avoid it? How do you cope with it?

If there is anything that I can foresee as a very serious issue for our use of Mongo, although rare, it would be retro-updating older documents with a load of data and causing a lot of document movement. Denying our customers the ability to update older documents with a lot of data wouldn't be a real solution either. Hmmm......

Scott

Asya Kamsky

Jun 10, 2014, 4:42:07 AM
to mongodb-user
Why would all the documents need to be updated at once?

The nice thing about a flexible schema is that you can do a lazy
update - update a document when you happen to be accessing it anyway.
Or you can bite the bullet, update them all at once during a
pseudo-maintenance period, and maybe compact after that...
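
A rough sketch of the bite-the-bullet variant (collection and field names are just placeholders):

// Backfill the new field on older documents during a maintenance window...
db.assets.update(
    { newFeatureData: { $exists: false } },
    { $set: { newFeatureData: {} } },
    { multi: true }
)

// ...then defragment the collection afterwards.
db.runCommand({ compact: "assets" })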

Asya

s.molinari

Jun 10, 2014, 8:55:32 AM
to mongod...@googlegroups.com
To be honest, I don't know. That is our problem. We won't be completely controlling what our customers can do with the database from a schema standpoint. We will only be putting limits on database usage to avoid things like high numbers of queries per HTTP request. But a limit on schema changes that could cause excessive document movement through updates isn't planned. Should it be?

Scott 

Raji Sridar

Jun 11, 2014, 2:59:08 PM
to mongod...@googlegroups.com
Hi Scott,

MongoDB preallocates up to double the document size using power-of-2 sizes by default starting in 2.6.x. That accommodates at most a doubling of the doc size. If you anticipate more growth than that, do a manual pad. For example, I expect up to 50 sub-docs; I tested starting with 30 (the doc size with 30 sub-docs already lands in the power-of-2 allocation that is big enough for around 50), unsetting the pad, and then filling it with actual sub-docs as and when they come. If you leave this open and just go with power-of-2 sizes, the price is the document-movement overhead.
Asya, is my understanding correct? Please confirm.

Thanks
Raji



s.molinari

Jun 12, 2014, 3:01:10 AM
to mongod...@googlegroups.com
As I understand it, powerOf2Sizes has nothing to do with doubling the document size. It has to do with the physical size of the document on disk. So if your pre-filled document with 30 sub-documents is, say, 169KB, then the space used for that document will be the next highest power-of-2 size, which is 256KB.
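
A quick back-of-the-envelope check of that rounding (plain shell JavaScript, not a MongoDB API):

// Round a record size in bytes up to the next power of 2.
function nextPowerOf2(bytes) {
    var size = 1;
    while (size < bytes) {
        size *= 2;
    }
    return size;
}

nextPowerOf2(169 * 1024) / 1024    // 256 (KB), as in the example above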

The only other possibility would be that it doubles the document size and then takes the next power-of-2 size, which would be 512KB, but that seems to me like a real waste of storage.

Asya, what is the real story? :)

Scott

Raji Sridar

Jun 12, 2014, 3:49:39 PM
to mongod...@googlegroups.com
It takes it to the next power of 2. In the extreme case, the next power of 2 is at most double the doc size. That's what the name implies and the documentation states, and I tested to ensure that a 30-sub-doc pad suffices for up to 50 sub-docs with the best performance.
If your max possible doc size is 512KB, bloat it to 257KB by manually padding it, and Mongo will allocate 512KB for it. Then unset the pad, and watch the doc grow to 512KB without any doc movement and with minimal update times (assuming you have enough RAM and the other factors that influence performance are in order). This is what I learnt from this test and am now sharing with you.
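
One way to sanity-check this is with the 2.6 shell's showDiskLoc() cursor helper; a sketch, with made-up collection and field names:

// Pad to ~257KB so the record is allocated in the 512KB bucket, then trim.
db.assets.insert({ _id: 42, subs: [], pad: new Array(257 * 1024).join("x") })
db.assets.update({ _id: 42 }, { $unset: { pad: "" } })

// Note the current disk location...
db.assets.find({ _id: 42 }).showDiskLoc()

// ...push a ~1 KB sub-doc to grow the document...
db.assets.update({ _id: 42 }, { $push: { subs: { filler: new Array(1024).join("x") } } })

// ...and compare: an unchanged $diskLoc means the document did not move.
db.assets.find({ _id: 42 }).showDiskLoc()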

Thanks
Raji


Asya Kamsky

Jun 12, 2014, 7:02:04 PM
to mongodb-user
Sounds like you both came to the same conclusion :)

Asya