Slow update with large array of subdocuments

areka...@gmail.com

Apr 18, 2014, 1:48:18 PM
to mongod...@googlegroups.com
Hello,

This problem has been driving me nuts, and I am starting to believe that Mongo probably cannot handle it and that a schema change is required.

I will simplify everything to make things clearer. I have a small collection (about 5000 items) whose documents do not change in size or schema after they are inserted. I only make small updates to one specific field, and the field's type (an integer with value 0 or 1) stays the same after the update.

The document looks like this:

{
  _id: STRING,
  outputs: [
    {foo: INTEGER, index: INTEGER, bar: STRING},
    ... more subdocuments with the same structure ...
  ]
}

The outputs array sometimes has 3000 items, and may have more.

The following index exists:

db.collection.ensureIndex({"_id": 1,"outputs.index":1})

db.collection.stats() reports a padding factor of 1.

So, in my code I am doing something like this:

outputs = (load outputs from some other resource)
for each output
  db.collection.update(
    {"_id": output.STRING_ID, "outputs.index": output.INTEGER_INDEX},
    {$set: {"outputs.$.foo": output.INTEGER_FOO}})

The query takes more than 3000 ms, but a simple db.collection.find() takes less than 100 ms.

The following is the find().explain():

{
  "cursor" : "BtreeCursor _id_1_outputs.index_1",
  "isMultiKey" : true,
  "n" : 1,
  "nscannedObjects" : 1,
  "nscanned" : 1,
  "nscannedObjectsAllPlans" : 1,
  "nscannedAllPlans" : 1,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 1,
  "indexBounds" : {
    "_id" : [
      [
        "SOME_STRING",
        "SOME_STRING"
      ]
    ],
    "outputs.index" : [
      [
        SOME_INTEGER,
        SOME_INTEGER
      ]
    ]
  },
  "server" : "localhost:27017"
}

Finally, mongostat reports 99%-100% database lock and 0 faults, and CPU goes to 100%. All of this happens on a powerful server with lots of RAM and 4 CPUs, and I see the same behavior on a MacBook Air.

Mongo Version: 2.4.5

I would appreciate ANY ideas at all about what's going on here. Is it perhaps because there are many subdocuments in the array?

Cheers,

Ryan

Asya Kamsky

Apr 19, 2014, 2:43:17 AM
to mongodb-user

This is SERVER-8192.

It's a good idea to avoid huge indexed arrays, especially when you expect to be updating the values.

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/bb214adc-3876-4853-9304-518cd90d1272%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

areka...@gmail.com

Apr 19, 2014, 11:53:11 AM
to mongod...@googlegroups.com
Hello Asya,

Thank you for your reply. 

Is there a workaround for this, or is the only option to put the array in a separate collection?

Ryan

Asya Kamsky

Apr 21, 2014, 12:04:48 AM
to mongod...@googlegroups.com
It's not the best schema to have very large indexed arrays, even outside of this bug.

Putting the array in a separate collection will help. If the array elements don't need to be indexed, then not indexing them would help as well. Another option may be to flatten the outputs array into multiple documents with one output each; it really depends on how you use the data.
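
As a rough illustration of the flattening option, here is a minimal sketch in plain JavaScript. The helper name `flattenOutputs`, the `parentId` field, and the `outputs` collection name in the comments are assumptions made for this example; none of them come from the thread.

```javascript
// Sketch only: `flattenOutputs` and the `parentId` field name are
// hypothetical, not something from the thread or from MongoDB itself.
function flattenOutputs(doc) {
  // Produce one small document per array element, keyed back to its parent.
  return doc.outputs.map(function (output) {
    return {
      parentId: doc._id,
      index: output.index,
      foo: output.foo,
      bar: output.bar
    };
  });
}

var parent = {
  _id: "SOME_STRING",
  outputs: [
    {foo: 0, index: 0, bar: "a"},
    {foo: 1, index: 1, bar: "b"}
  ]
};
var flat = flattenOutputs(parent);

// In the mongo shell the result could then be stored and updated like this
// (the collection name `outputs` is also an assumption):
//   flat.forEach(function (d) { db.outputs.insert(d); });
//   db.outputs.ensureIndex({parentId: 1, index: 1});
//   db.outputs.update({parentId: "SOME_STRING", index: 1}, {$set: {foo: 0}});
```

With this layout each update touches one small document rather than a document holding a large indexed array; whether that suits a given application depends on how the data is read.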

Asya



areka...@gmail.com

Apr 22, 2014, 9:53:57 AM
to mongod...@googlegroups.com
Hi Asya,

Thanks for the reply. I guess I will just split the outputs array into a new collection and refactor my code.
I hope the bug is resolved soon, because I am confident that Mongo can handle my case, since I am not changing the documents in size, and a fix will definitely simplify use cases for other people too.

Cheers,

Ryan

areka...@gmail.com

Jul 2, 2014, 3:46:23 PM
to mongod...@googlegroups.com
Just a follow-up regarding this issue:

If I am updating a subdocument, like so:

db.collection.update(
  {"_id": output.STRING_ID},
  {$set: {"outputs.100.foo": output.INTEGER_FOO}})

Shouldn't this be very fast, since I am setting the indexed key of a subdocument at a specific position (position 100) in the array?
And my second question: if I am doing 1000 similar updates (but always providing the array index), shouldn't this be fast?

Or can Mongo simply not perform well with huge arrays of subdocuments and updates, no matter what I try?

Once again, the document does not change in size with the above update. 

Thanks for any input.

Cheers, 

Ryan

Asya Kamsky

Jul 4, 2014, 4:00:52 AM
to mongod...@googlegroups.com
Setting a single specific field may be "fast".
But 1000 times fast may not be all that fast anymore...
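
One possibility the thread does not discuss is to fold many positional changes for the same parent document into a single $set, so that the document is updated once rather than 1000 times. The sketch below only builds the update document; `buildSetDocument` and the shape of its input are hypothetical, and whether this is actually faster on 2.4.5 is an assumption that would need measuring.

```javascript
// Sketch only: `buildSetDocument` is a hypothetical helper. Each change
// names a position in the outputs array and the new value of foo.
function buildSetDocument(changes) {
  var set = {};
  changes.forEach(function (change) {
    set["outputs." + change.position + ".foo"] = change.foo;
  });
  return {$set: set};
}

var update = buildSetDocument([
  {position: 100, foo: 1},
  {position: 101, foo: 0}
]);
// update is {$set: {"outputs.100.foo": 1, "outputs.101.foo": 0}}

// In the mongo shell, one call per parent document instead of one per element:
//   db.collection.update({"_id": "SOME_STRING"}, update);
```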

Asya

areka...@gmail.com

Jul 4, 2014, 5:11:06 AM
to mongod...@googlegroups.com
Thanks for getting back to me Asya.

I do have one more question, though: what if I use $set to set the whole subdocuments array in one query? Once again, the array will not change in size; I am only changing a few zeros to ones and setting the whole array (which might include 1000 items). However, Mongo is slow when I do this, and I see 100% CPU usage as well. Is this expected?

Regards,

Ryan

William Berkeley

Jul 7, 2014, 11:32:50 AM
to mongod...@googlegroups.com
If you are doing the update by replacing the old array with a slightly changed new array, it's the same as just replacing the whole array, so it's expected that this would be slower than changing an individual entry. I think it's important to point out that in your case a lot of the work involved isn't strictly updating the document; it's updating all of the indexes associated with the document. There are a lot of index keys to update because you have indexed a large array. SERVER-8192, referenced by Asya, is about exactly this issue of performance when updating all the index entries for arrays.
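
To give a feel for how many index keys are involved, here is a simplified model in plain JavaScript. It is not MongoDB's actual key-generation code; it only assumes that a multikey index on {"_id": 1, "outputs.index": 1} holds one key per distinct (_id, outputs.index) pair in a document. The function name `indexKeysFor` is made up for this sketch.

```javascript
// Simplified model, not MongoDB's actual code: a multikey index on
// {"_id": 1, "outputs.index": 1} holds one key per distinct
// (_id, outputs.index) pair found in a document.
function indexKeysFor(doc) {
  var seen = {};
  var keys = [];
  doc.outputs.forEach(function (output) {
    var key = JSON.stringify([doc._id, output.index]);
    if (!seen[key]) {
      seen[key] = true;
      keys.push(key);
    }
  });
  return keys;
}

// Build a document shaped like the one in the original post.
var doc = {_id: "SOME_STRING", outputs: []};
for (var i = 0; i < 3000; i++) {
  doc.outputs.push({foo: 0, index: i, bar: "x"});
}

var keyCount = indexKeysFor(doc).length;  // 3000 keys for this one document
```

Under this model a single 3000-element document carries 3000 keys in that one index, which is the scale of index work being described here.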

-Will