efficiency of populate?

Bob

Jan 26, 2012, 4:28:28 AM

to Mongoose Node.JS ORM

Can anybody tell me something about how populate is implemented? Does
it do its lookups on the server or through additional client-side
queries?

If I have a few hundred thousand objects that I want to stream back to
the client in a single query, will including a populate of a small
object for that query change performance a great deal, more than in
proportion to the number of bytes of data returned?

--Bob

Aaron Heckmann

Jan 26, 2012, 12:43:54 PM

to mongoo...@googlegroups.com

The lookups are done on the client, since MongoDB has no joins.

A separate query is made for each populated path for each document. If you only populate one path in your case it ends up being a few hundred thousand queries.

Keep in mind that a query returning a few hundred thousand docs requires many "getMore" commands back to MongoDB anyway (the driver issues them transparently). The default batchSize of these "getMore" commands is 1000, meaning 1000 docs are loaded into memory before the next "getMore" fetches another 1000. The batchSize is configurable with the query.batchSize(int) method.

By default, Model.find() buffers all of the several hundred thousand docs in memory before passing them to your callback, so I'd use query.stream() for this. That way only batchSize docs are held in memory at a time. As each doc is streamed in, the extra query is made for population. After population, the doc is emitted in the 'data' event.

var query = Stuff.find(params).batchSize(something).stream();

http://mongoosejs.com/docs/querystream.html
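
To make the batching and the per-document population cost concrete, here is a minimal sketch that uses plain in-memory stand-ins rather than Mongoose or the driver. fetchInBatches, lookupRef, onData, and the sample data are hypothetical names invented for illustration, and the lookups are synchronous only to keep the sketch short.

```javascript
// Hypothetical stand-in for a cursor: hands back docs batchSize at a time,
// the way successive "getMore" calls would.
function* fetchInBatches(docs, batchSize) {
  for (let i = 0; i < docs.length; i += batchSize) {
    yield docs.slice(i, i + batchSize);
  }
}

// Hypothetical stand-in for the collection being populated from.
const refs = new Map([[1, { _id: 1, label: 'small object' }]]);
let populateQueries = 0;
function lookupRef(id) {
  populateQueries += 1; // one extra query per streamed doc
  return refs.get(id) || null;
}

// Sample data playing the role of the server-side result set:
// 2500 docs that all reference the same small object.
const allDocs = Array.from({ length: 2500 }, (_, i) => ({ _id: i, ref: 1 }));

let batches = 0;
let maxDocsInMemory = 0;
let emitted = 0;
function onData(doc) {
  emitted += 1; // stands in for a 'data' event handler
}

for (const batch of fetchInBatches(allDocs, 1000)) {
  batches += 1;
  maxDocsInMemory = Math.max(maxDocsInMemory, batch.length);
  for (const doc of batch) {
    // Populate first, then emit the doc.
    onData({ ...doc, ref: lookupRef(doc.ref) });
  }
}

console.log({ batches, maxDocsInMemory, populateQueries, emitted });
```

In this sketch the client never holds more than one batch at a time, but the number of population lookups still equals the number of docs.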

--
Aaron

Robert Mayo

Jan 26, 2012, 5:45:51 PM

to mongoo...@googlegroups.com

Thanks for the detailed response. Excellent information for my design.

Does even .each() buffer the whole query? I could imagine it working on a batch-at-a-time basis someday.

--Bob

Aaron Heckmann

Jan 26, 2012, 6:00:28 PM

to mongoo...@googlegroups.com

each() doesn't buffer either, but I've found it clunky and rather "non-node-like" as an API. each() also doesn't always close the underlying cursor and will be deprecated soonish. There's no benefit to each() over stream(). Streams are more configurable, do proper cleanup, and are compatible with other node streams.

Charlie

Aug 23, 2013, 7:10:34 AM

to mongoo...@googlegroups.com

Couldn't populate() just do one additional query per field that's being populated?

First, run the initial query, then run a second query like the one below?

{ _id: { $in: [<IDs from the first query>] } }

Then, on the client side, you could add the populated docs back into the main docs?
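
A rough sketch of that idea, using in-memory arrays in place of real collections. findByIds stands in for a query of the form { _id: { $in: ids } }; it, populateBatch, and the sample data are hypothetical and are not how Mongoose implements populate.

```javascript
// Hypothetical in-memory stand-ins for two collections.
const authors = [
  { _id: 1, name: 'Ann' },
  { _id: 2, name: 'Ben' },
];
const posts = [
  { _id: 10, title: 'a', author: 1 },
  { _id: 11, title: 'b', author: 2 },
  { _id: 12, title: 'c', author: 1 },
];

let queryCount = 0;

// Stand-in for a single query of the form { _id: { $in: ids } }.
function findByIds(collection, ids) {
  queryCount += 1;
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

// Populate one path for a whole batch of docs with one extra query.
function populateBatch(docs, path, collection) {
  // Step 1: gather the referenced IDs from the first query's results.
  const ids = docs.map((doc) => doc[path]);
  // Step 2: fetch all referenced docs at once and index them by _id.
  const byId = new Map(findByIds(collection, ids).map((d) => [d._id, d]));
  // Step 3: on the client side, put the fetched docs back into the main docs.
  return docs.map((doc) => ({ ...doc, [path]: byId.get(doc[path]) || null }));
}

const populated = populateBatch(posts, 'author', authors);
console.log(populated.map((p) => p.author.name), queryCount);
```

The extra cost here is one query per populated path rather than one per document.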

ta...@mindblazetech.com

May 6, 2014, 4:34:42 AM

to mongoo...@googlegroups.com

Yes, that would be a good implementation. I also think it could be optimized further if the mongoose layer did population intelligently. Say I get 1000 documents in total that need to be populated. If 300 of those 1000 have the same DBRef, meaning the same _id for population, mongoose should just reuse the already populated doc for those docs. In effect, a mini cache that lives for the duration of the population.
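
Roughly what I have in mind, as a sketch only. findById here is a hypothetical stand-in for the per-document lookup, and makeCachedPopulate is a made-up helper, not a mongoose API.

```javascript
// Hypothetical in-memory stand-in for the referenced collection.
const users = new Map([
  [1, { _id: 1, name: 'Ann' }],
  [2, { _id: 2, name: 'Ben' }],
]);

let lookups = 0;

// Stand-in for the extra query made to populate one document.
function findById(id) {
  lookups += 1;
  return users.get(id) || null;
}

// Returns a populate function whose cache lives only as long as the
// returned function does, i.e. for one population run.
function makeCachedPopulate(lookup) {
  const cache = new Map();
  return function populate(doc, path) {
    const id = doc[path];
    if (!cache.has(id)) {
      cache.set(id, lookup(id)); // first time this _id is seen: query once
    }
    return { ...doc, [path]: cache.get(id) }; // afterwards: reuse the result
  };
}

const populate = makeCachedPopulate(findById);
const docs = [
  { _id: 10, owner: 1 },
  { _id: 11, owner: 1 },
  { _id: 12, owner: 2 },
  { _id: 13, owner: 1 },
];
const result = docs.map((doc) => populate(doc, 'owner'));
console.log(result.map((d) => d.owner.name), lookups);
```

Four docs reference two distinct owners, so only two lookups are made. Note that docs sharing an _id also share the same populated object in this sketch.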

I think optimizing the population method in mongoose is a must, since it seems to be a core feature.

Aaron Heckmann, what do you think?

sproj...@gmail.com

May 21, 2014, 7:54:56 PM

to mongoo...@googlegroups.com

+1. Also, is there a way for mongoose to sit close to MongoDB instead of on the client side, so that it avoids the network round trip? If I have a SomeModel.find(..).populate(...), can the second query be run somewhere closer to MongoDB?
