Let's say I have two collections, A and B, and a single document in A is related to N documents in B. For example, the schemas could look like this:
Collection A:
{id: (int),
propA1: (int),
propA2: (boolean)
}
Collection B:
{idA: (int), # id for document in Collection A
propB1: (int),
propB2: (...),
...
propBN: (...)
}
I want to return properties propB2 through propBN and propA2 from my API, and only return documents where (for example) propA2 = true, propB6 = 42, and propB1 = propA1.
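For concreteness, a matching pair of documents might look like this (the values are made up, shown here as Python dicts):
doc_a = {"id": 7, "propA1": 3, "propA2": True}                  # matches because propA2 is true
doc_b = {"idA": 7, "propB1": 3, "propB2": "...", "propB6": 42}  # matches because propB6 = 42 and propB1 = propA1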
This is normally fairly simple: I query Collection B to find documents where propB6 = 42, collect the idA values from the result, query Collection A with those values, and then filter the Collection B results using the Collection A documents that come back.
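In pymongo terms, a minimal sketch of that approach might look like the following (the database and collection names mydb / collA / collB are placeholders, not anything from my actual code):

from pymongo import MongoClient

db = MongoClient()["mydb"]  # placeholder database name

# 1. Query Collection B for the B-side condition.
b_docs = list(db.collB.find({"propB6": 42}))

# 2. Query Collection A for the referenced ids that also satisfy propA2 = true.
a_ids = [b["idA"] for b in b_docs]
a_by_id = {a["id"]: a for a in db.collA.find({"id": {"$in": a_ids}, "propA2": True})}

# 3. Filter the B results against the A documents (propB1 must equal propA1).
results = [b for b in b_docs
           if b["idA"] in a_by_id and b["propB1"] == a_by_id[b["idA"]]["propA1"]]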
However, adding skip and limit parameters to this seems impossible to do while keeping the behavior users would expect. Naively applying skip and limit to the first query means that, since filtering occurs after the query, fewer than limit documents could be returned. Worse, in some cases no documents could be returned even though there are still documents in the collection to be read. For example, if the limit was 10 and the first 10 Collection B documents returned all pointed to Collection A documents where propA2 = false, the function would return nothing. The user would then assume there is nothing left to read, which may not be the case.
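The naive version I'm describing just bolts skip and limit onto the first query, roughly like this (same placeholder names as the sketch above):

def naive_page(db, skip, limit):
    # skip/limit are applied to the Collection B query only...
    b_docs = list(db.collB.find({"propB6": 42}).skip(skip).limit(limit))

    a_ids = [b["idA"] for b in b_docs]
    a_by_id = {a["id"]: a for a in db.collA.find({"id": {"$in": a_ids}, "propA2": True})}

    # ...so the post-query filtering can drop documents, returning fewer than
    # `limit` results, or none at all, even when more matches exist further on.
    return [b for b in b_docs
            if b["idA"] in a_by_id and b["propB1"] == a_by_id[b["idA"]]["propA1"]]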
A slightly less naive solution is to check whether the return count is less than limit and, if so, repeat the queries until the return count equals limit (or no documents remain). The problem here is that successive skip/limit queries the user would expect to return disjoint sets of documents could actually return the same documents.
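That retry approach would look something like the sketch below (again with the placeholder names from above). Because skip is an offset into Collection B rather than into the filtered result set, two calls the user expects to be disjoint (say skip=0 and skip=limit) can re-read the same Collection B documents and return duplicates.

def retry_page(db, skip, limit):
    # Keep issuing skip/limit queries against Collection B until `limit`
    # filtered results have been collected or Collection B runs out.
    results = []
    offset = skip
    while len(results) < limit:
        b_docs = list(db.collB.find({"propB6": 42}).skip(offset).limit(limit))
        if not b_docs:
            break                      # nothing left to read in Collection B
        offset += len(b_docs)          # advance past the B documents just read

        a_ids = [b["idA"] for b in b_docs]
        a_by_id = {a["id"]: a for a in db.collA.find({"id": {"$in": a_ids}, "propA2": True})}
        results.extend(b for b in b_docs
                       if b["idA"] in a_by_id and b["propB1"] == a_by_id[b["idA"]]["propA1"])

    # `skip` here is a Collection B offset, not an offset into the filtered
    # results, which is exactly why consecutive pages can overlap.
    return results[:limit]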
I want to apply skip and limit at the Mongo query level, not at the API level, because the results of querying Collection B could be very large.
MapReduce and the aggregation framework appear to operate only on a single collection, so they don't seem to be viable alternatives.
This seems like something that'd come up a lot in Mongo use - any ideas/hints would be appreciated.