Performance problems on multiKey index


Joao Santos

Sep 25, 2017, 7:39:57 AM
to mongodb-user
Hi,

We're thinking of using a multiKey index in an application we are developing, but after some performance testing we found that it considerably slows down document update times.
This is true both when you update the full document and when you use atomic $push/$pop operations to add/remove a single element from the array covered by the multiKey index. We expected the full document update to be slow, but we also expected the $push/$pop operations to stay fast even with the index. What we found is that the latency of the $push/$pop operations roughly doubles or triples.

These are the results we got on documents with the following structure (the multiKey index was configured on the "z" field of the array elements, i.e. "array.z"):
{
  "_id": 1,
  "array": [
    { "x": 2, "y": 3, "z": 4 },
    ...
  ]
}
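
For reference, the index and the operations we timed look roughly like this in the mongo shell (collection name and element values are just illustrative):

// multikey index on the "z" field of the array elements
db.test.createIndex({ "array.z": 1 })

// atomic push of one element
db.test.update({ _id: 1 }, { $push: { array: { x: 2, y: 3, z: 4 } } })

// atomic pop of one element (from the end of the array)
db.test.update({ _id: 1 }, { $pop: { array: 1 } })

// full document update: modify the array on the client, then rewrite the document
var doc = db.test.findOne({ _id: 1 })
doc.array.push({ x: 2, y: 3, z: 4 })
db.test.save(doc)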

$push - no index (push 1 element to an array of X elements)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1              4781       8941      16777
run 2              4688       9231      15105
run 3              5741       7892      14875
run 4              4463       7957      15218
run 5              4605       7915      17516
avg (ms)        4855.60    8387.20   15898.20
per op (ms)        0.97       1.68       3.18

update document - no index (add 1 element to an array of X elements and re-insert the document)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1              9114      15590      37451
run 2              8520      17198      38681
run 3              9234      16404      36017
run 4              8619      15300      35392
run 5              8018      15289      34851
avg (ms)        8701.00   15956.20   36478.40
per op (ms)        1.74       3.19       7.30
update cost
vs push             79%        90%       129%
$push - with index on "z" (push 1 element to an array of X elements)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1             13197      23596      46991
run 2             12680      23583      46014
run 3             12667      23949      45089
run 4             13091      23214      46071
run 5             13162      23944      46762
avg (ms)       12959.40   23657.20   46185.40
per op (ms)        2.59       4.73       9.24
idx overhead       167%       182%       191%

update document - with index on "z" (add 1 element to an array of X elements and re-insert the document)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1             19575      31323      65944
run 2             17839      32651      64005
run 3             17460      28983      66441
run 4             17160      34175      64536
run 5             16727      29216      65302
avg (ms)       17752.20   31269.60   65245.60
per op (ms)        3.55       6.25      13.05
update cost
vs push             37%        32%        41%
idx overhead       104%        96%        79%


We considered that it could be because the document was growing and MongoDB had to find a new place for it on disk, so we also tried $pop, but got the same poor results:
$pop - with index on "z" (pop 1 element from an array of X elements)
total latency (ms) for 5000 operations, 5 runs:

array length        500       1000       2000
run 1             13110      24205      44863
run 2             12453      24068      44319
run 3             12264      23935      45605
run 4             12099      22539      45726
run 5             14051      24002      48023
avg (ms)       12795.40   23749.80   45707.20
per op (ms)        2.56       4.75       9.14


Is this expected behaviour for MongoDB, or is there a possible optimisation that can be applied to the index update when only a single element of the array is being inserted/deleted/updated?





Kevin Adistambha

Oct 4, 2017, 10:33:15 PM
to mongodb-user

Hi

Is this expected behaviour for MongoDB, or is there a possible optimisation that can be applied to the index update when only a single element of the array is being inserted/deleted/updated?

Do you absolutely require the array field (indexed via array.z) to be an array, or is there any alternative schema design that you could consider for your use case?

I wouldn’t call it a problem, but having a large array with thousands of entries might limit your options in the future, since there is a 16MB limit on BSON document size (Limitations and Thresholds). If this array could grow to an undetermined size, hitting this limit in the future might be a real possibility.
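
For illustration only, one alternative shape is to store each array element as its own small document, so that adding or removing an element becomes an insert or delete of one small document rather than a rewrite of a large one (the collection and field names here are just examples):

// one document per former array element, keyed back to its parent
db.elements.createIndex({ parentId: 1, z: 1 })

// "push": insert one small document
db.elements.insert({ parentId: 1, x: 2, y: 3, z: 4 })

// "pop": remove one element by its key
db.elements.remove({ parentId: 1, z: 4 }, { justOne: true })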

Another potential issue is the way multikey indexes work. MongoDB creates a multikey index by essentially creating one index key per array element. Thus, if your document contains an array with 2000 elements, MongoDB will create 2000 index entries for that single document. This could become an issue for index size.
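
A quick way to see this effect on your data (assuming a collection named test and the default index name "array.z_1") is to compare the per-index sizes reported by the collection stats:

// per-index sizes in bytes; the multikey index grows with the total number
// of array elements across all documents, not just the number of documents
db.test.stats().indexSizes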

I noted from your extensive experiments that the growth in operation timing between array sizes is roughly linear. For example, without an index, updating a document with a 1000-element array took approximately twice as long as updating one with 500 elements, and updating a document with 2000 elements took about twice as long as one with 1000 elements. It took longer with indexes, but that is because the indexes also need updating. The underlying reason is that WiredTiger is a no-overwrite storage engine, which means that even a small update translates to a full document rewrite inside WiredTiger. If your use case requires small updates to a large document, you could see faster performance using the MMAPv1 storage engine. Having said that, WiredTiger provides better concurrency, compression, and more efficient disk use.
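
If you want to experiment with that comparison, you can check which engine a deployment is currently running with:

// in the mongo shell: prints "wiredTiger" or "mmapv1"
db.serverStatus().storageEngine.name

and a throwaway MMAPv1 test instance can be started with, for example, mongod --storageEngine mmapv1 --dbpath /data/mmapv1-test (the --storageEngine option applies to the 3.x series; the dbpath is just an example).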

Regarding schema design, you might want to consider the following links:

Best regards,
Kevin
