MongoDB skip & limit when querying two collections

56 views
Skip to first unread message

El Hefe

unread,
Apr 13, 2014, 1:34:35 PM4/13/14
to mongod...@googlegroups.com
This is a cross post from SO, if people would rather reply there: http://stackoverflow.com/questions/22885057/mongodb-skip-limit-when-querying-two-collections

Let's say I have two collections, A and B, and a single document in A is related to N documents in B. For example, the schemas could look like this:

Collection A:
{id: (int),
 propA1: (int),
 propA2: (boolean)
}
Collection B:
{idA: (int), # id for document in Collection A
 propB1: (int),
 propB2: (...),
 ...
 propBN: (...)
}

I want to return properties propB2-BN and propA2 from my API, and only return information where (for example) propA2 = true, propB6 = 42, and propB1 = propA1.

This is normally fairly simple - I query Collection B to find documents where propB6 = 42, collect the idA values from the result, query Collection A with those values, and filter the results with the Collection A documents from the query.

However, adding skip and limit parameters to this seems impossible to do while keeping the behavior users would expect. Naively applying skip and limit to the first query means that, since filtering occurs after the query, less than limit documents could be returned. Worse, in some cases no documents could be returned when there are actually still documents in the collection to be read. For example, if the limit was 10 and the first 10 Collection B documents returned pointed to a document in Collection A where propA2 = false, the function would return nothing. Then the user would assume there's nothing left to read, which may not be the case.

A slightly less naive solution is to simply check if the return count is < limit, and if so, repeat the queries until the return count = limit. The problem here is that skip/limit queries where the user would expect exclusive sets of documents returned could actually return the same documents.

I want to apply skip and limit at the mongo query level, not at the API level, because the results of querying collection B could be very large.

MapReduce and the aggregation framework appear to only work on a single collection, so they don't appear to be alternatives.

This seems like something that'd come up a lot in Mongo use - any ideas/hints would be appreciated.


s.molinari

unread,
Apr 13, 2014, 3:00:35 PM4/13/14
to mongod...@googlegroups.com
Disclaimer: I am learning Mongo. 

AFAIK, you can also only query one collection at a time. So, you'd have to query for your set of documents in Collection A according to the A2=true filter, then get the IDs from those documents and use them to query for the data in Collection B, along with any other filtering you'd need. The A2=true filter doesn't really seem very selective, so you would most likely have to filter down further somehow to get a more refined set of results. Using skip and limit at this point won't be very helpful, as you pointed out. 

If the data in Collection B is not unbounded, or said differently, if the maximum number of possible related records in Collection B to one record in Collection A is finite, you might want to think about denormalizing the data, by moving the collection B into embedded documents of Collection A. That way, you only have one collection with all the data to query on. 

How many B records could there be for one record in Collection A? If the answer to that question is, only a limited amount, then you could think about denormalizing collection B.

Scott


Reply all
Reply to author
Forward
0 new messages