Performance problems on multiKey index


Joao Santos

Sep 25, 2017, 7:39:57 AM
to mongodb-user
Hi,

We're thinking of using a multiKey index in an application we are developing, but after some performance testing we found that it considerably slows down document update times.
This is true both when you update the full document and when you use atomic $push/$pop operations to add/remove a single element from the array covered by the multiKey index. We expected the full document update to be slow, but we also expected the $push/$pop operations to stay fast even with the index. What we found is that the latency of the $push/$pop operations roughly doubles or triples.

These are the results we got on documents with the following structure (the multiKey index was configured on the "z" field of the array elements, i.e. "array.z"):
{
  "_id": 1,
  "array": [
    { "x": 2, "y": 3, "z": 4 },
    ...
  ]
}
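
For reference, the index and the operations we timed look roughly like this in the mongo shell (collection name and element values are just illustrative):

// multikey index on the "z" field of the array elements
db.test.createIndex({ "array.z": 1 })

// atomic push of one element
db.test.update({ _id: 1 }, { $push: { array: { x: 2, y: 3, z: 4 } } })

// atomic pop of one element (from the end of the array)
db.test.update({ _id: 1 }, { $pop: { array: 1 } })

// full document update: modify the array on the client, then rewrite the document
var doc = db.test.findOne({ _id: 1 })
doc.array.push({ x: 2, y: 3, z: 4 })
db.test.save(doc)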

$push - no index (push 1 element to an array of X elements)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1              4781       8941      16777
run 2              4688       9231      15105
run 3              5741       7892      14875
run 4              4463       7957      15218
run 5              4605       7915      17516
avg (ms)        4855.60    8387.20   15898.20
per op (ms)        0.97       1.68       3.18

update document - no index (add 1 element to an array of X elements and re-insert the document)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1              9114      15590      37451
run 2              8520      17198      38681
run 3              9234      16404      36017
run 4              8619      15300      35392
run 5              8018      15289      34851
avg (ms)        8701.00   15956.20   36478.40
per op (ms)        1.74       3.19       7.30
update cost
vs push             79%        90%       129%
$push - with index on "z" (push 1 element to an array of X elements)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1             13197      23596      46991
run 2             12680      23583      46014
run 3             12667      23949      45089
run 4             13091      23214      46071
run 5             13162      23944      46762
avg (ms)       12959.40   23657.20   46185.40
per op (ms)        2.59       4.73       9.24
idx overhead       167%       182%       191%

update document - with index on "z" (add 1 element to an array of X elements and re-insert the document)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1             19575      31323      65944
run 2             17839      32651      64005
run 3             17460      28983      66441
run 4             17160      34175      64536
run 5             16727      29216      65302
avg (ms)       17752.20   31269.60   65245.60
per op (ms)        3.55       6.25      13.05
update cost
vs push             37%        32%        41%
idx overhead       104%        96%        79%


We considered that it could be because the document was growing and MongoDB had to find a new place for it on disk, so we also tried $pop, but got the same poor results:
$pop - with index on "z" (pop 1 element from an array of X elements)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1             13110      24205      44863
run 2             12453      24068      44319
run 3             12264      23935      45605
run 4             12099      22539      45726
run 5             14051      24002      48023
avg (ms)       12795.40   23749.80   45707.20
per op (ms)        2.56       4.75       9.14


Is this expected behaviour for MongoDB, or is there a possible optimisation that can be applied to the index update when only a single element of the array is being inserted/deleted/updated?





Kevin Adistambha

Oct 4, 2017, 10:33:15 PM
to mongodb-user

Hi

Is this expected behaviour for MongoDB, or is there a possible optimisation that can be applied to the index update when only a single element of the array is being inserted/deleted/updated?

Do you absolutely require the array field (indexed via array.z) to be an array, or is there any alternative schema design that you could consider for your use case?

I wouldn’t call it a problem, but having a large array with thousands of entries might limit your options in the future, since there is a 16MB limit on BSON document size (Limitations and Thresholds). If this array could grow to an undetermined size, hitting this limit in the future might be a real possibility.
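
For illustration only, one alternative shape is to store each array element as its own small document, so that adding or removing an element becomes an insert or delete of one small document rather than a rewrite of a large one (the collection and field names here are just examples):

// one document per former array element, keyed back to its parent
db.elements.createIndex({ parentId: 1, z: 1 })

// "push": insert one small document
db.elements.insert({ parentId: 1, x: 2, y: 3, z: 4 })

// "pop": remove one element by its key
db.elements.remove({ parentId: 1, z: 4 }, { justOne: true })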

Another potential issue is the way multikey indexes work. MongoDB creates a multikey index by essentially creating one index key per array element. Thus, if your document contains an array with 2000 elements, MongoDB will create 2000 index entries for that single document. This could become an issue for index size.
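
A quick way to see this effect on your data (assuming a collection named test and the default index name "array.z_1") is to compare the per-index sizes reported by the collection stats:

// per-index sizes in bytes; the multikey index grows with the total number
// of array elements across all documents, not just the number of documents
db.test.stats().indexSizes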

I noted from your extensive experiments that the growth in operation timing between array sizes is roughly linear. For example, without an index, updating a document with a 1000-element array took approximately twice as long as updating one with 500 elements, and updating a document with 2000 elements took about twice as long as one with 1000 elements. It took longer with indexes, but that is because the indexes also need updating. The underlying reason is that WiredTiger is a no-overwrite storage engine, which means that even a small update translates to a full document rewrite inside WiredTiger. If your use case requires small updates to a large document, you could see faster performance using the MMAPv1 storage engine. Having said that, WiredTiger provides better concurrency, compression, and more efficient disk use.
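
If you want to experiment with that comparison, you can check which engine a deployment is currently running with:

// in the mongo shell: prints "wiredTiger" or "mmapv1"
db.serverStatus().storageEngine.name

and a throwaway MMAPv1 test instance can be started with, for example, mongod --storageEngine mmapv1 --dbpath /data/mmapv1-test (the --storageEngine option applies to the 3.x series; the dbpath is just an example).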

Regarding schema design, you might want to consider the following links:

Best regards,
Kevin
