Why is the 2.5 aggregation (and maybe other) cursor's first batch of results still a single document?


Robert Moore

Nov 3, 2013, 12:23:14 AM
to mongo...@googlegroups.com
I'm looking at some of the code for the new cursor support for the aggregation command and was surprised that the first batch of results is returned in a reply message still containing a single document with the following structure:

{
  cursor : {
    id : NumberLong('1234567890'),
    ns : 'database.collection',
    firstBatch : [
      { ... }, 
      ...
    ]
  },
  ok : 1.0
}


Why is a single document still being returned with the results nested inside? I thought one of the reasons for moving to cursors was to allow result documents up to the maximum size. Doesn't this approach limit the documents in the first batch to less than that?

Wouldn't it be more natural to the drivers to move the cursor id into the OP_REPLY field reserved for that purpose?  You could also then use the query failed bit to indicate a failure.  Just like a query...
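For anyone following along without the wire protocol in their head: OP_REPLY's header already reserves a 64-bit cursorID field between responseFlags and the returned documents. A rough sketch of packing that 36-byte header (the helper name is mine, purely for illustration):

```python
import struct

OP_REPLY = 1  # opcode for a reply message

def pack_op_reply_header(request_id, response_to, response_flags,
                         cursor_id, starting_from, number_returned,
                         documents_len=0):
    """Pack the 36-byte OP_REPLY header that precedes the BSON documents."""
    message_length = 16 + 20 + documents_len  # standard header + reply fields + docs
    standard_header = struct.pack("<iiii", message_length, request_id,
                                  response_to, OP_REPLY)
    reply_fields = struct.pack("<iqii", response_flags, cursor_id,
                               starting_from, number_returned)
    return standard_header + reply_fields
```

The "query failed" bit mentioned above lives in responseFlags, alongside the dedicated cursorID slot.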

Sorry if this is a silly question but it was 180 degrees out of phase from what I expected. What am I missing?

Rob.

Derick Rethans

Nov 3, 2013, 10:14:54 AM
to mongo...@googlegroups.com
Not much. By default the firstBatch does contain responses, just like
what would happen with a normal query — with the difference that with a
normal query documents are indeed returned in full. Even then there is
still a limit on how much can be returned in a single package, btw
(right now, 48MB IIRC).

The aggregate cursor returns things as a firstBatch mostly to prevent
multiple round-trips to the server. In the vast majority of cases the
results are going to be less than 16MB.

If you *don't* want to have documents in the firstBatch, then you can
simply set the batchSize to 0 — then no documents will be part of the
first batch.
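To make that concrete, this is roughly the command document a driver would send (a sketch only; the collection name and pipeline are placeholders):

```python
# Sketch of the aggregate command requesting a cursor with an empty
# first batch: cursor.batchSize of 0 asks the server to return only
# the cursor id, with no documents in firstBatch.
aggregate_command = {
    "aggregate": "collection",          # target collection (placeholder)
    "pipeline": [{"$match": {}}],       # placeholder pipeline
    "cursor": {"batchSize": 0},         # no documents in the first batch
}
```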

This way of approaching queries (a simple request/result, without using
OP_REPLY etc. flags) is being adopted more and more. In 2.5.4 there is a
new write API using a similar interface. It is not unlikely that this
will be slowly transitioned to for all server interaction.
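For reference, the new write API follows the same request/result pattern: each write is itself a command document. A sketch of an insert, as I understand the 2.5.x previews (treat the exact field set as illustrative):

```python
# Sketch of a 2.5.4-style insert write command: the command document
# carries the documents to write plus ordering and write-concern options.
insert_command = {
    "insert": "collection",                  # target collection (placeholder)
    "documents": [{"_id": 1}, {"_id": 2}],   # documents to insert
    "ordered": True,                         # stop at the first error
    "writeConcern": {"w": 1},                # acknowledgement level
}
```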

cheers,
Derick

--
{
website: [ "http://mongodb.org", "http://derickrethans.nl" ],
twitter: [ "@derickr", "@mongodb" ]
}

Robert Moore

Nov 3, 2013, 10:57:38 AM
to mongo...@googlegroups.com

Not much. By default the firstBatch does contain responses, just like
what would happen with a normal query — with the difference that with a
normal query documents are indeed returned as full. At the maximum size
there is still a limit on how many documents can be returned in a
package btw (right now, 48MB iirc).

I expect the first response to contain results.  That was not my concern.  My concern was that there will be cases where users carefully construct (OK, they will be lucky that) their documents end up just under the 16MB limit from the aggregation, and they will still run into problems due to the response being nested inside another document. There is also now an undocumented internal messaging structure. I think it is fair to say that MongoDB Inc. has historically not done a good job documenting protocols... When will this new structure/interaction get documented?

The aggregate cursor returns things as a firstBatch mostly to prevent
multiple round-trips to the server. In the vast amount of cases the
results are going to be less than 16MB.

Agreed, for old aggregation pipelines, but with the new security features being baked into aggregation pipelines these cursors will become very common (in certain workloads) as they replace all queries.
 
If you *don't* want to have documents in the firstBatch, then you can
simply set the batchSize to 0 — then no documents will be part of the
first batch.

This way of approaching queries (a simple request/result, without using
OP_REPLY etc. flags) is being adopted more and more. In 2.5.4 there is a
new write API using a similar interface. It is not unlikely that this
will be slowly transitioned to for all server interaction.

But the results are already returned in an OP_REPLY message.  Why not just use it? The second through Nth batches of documents (returned by the getmore) come back using the 2.4 semantics.

If you/MongoDB want to move to a simple request/reply model, then just do it: create a new wire protocol from the ground up. Make a hard break to a completely new message format. Yes, you would have to support old and new for a few releases, but then you could fix whatever the perceived issues are with the current message structure. You will still need a way to return multiple documents, and that should be consistent across all the cases where documents are returned.

The write commands are another case where I think it might be half implemented. Why have insert/update/delete as distinct commands?  Why not have a single "write" command that takes a list of insert/update/delete operations to be applied in order? Inserts are limited to a single document, but you can nest as many operations in the list as you like. The results are then a document per operation. If you can interleave the results it opens a whole new set of possibilities: what if there were "continue on error" semantics similar to what insert has today? What if that command had optional (per-shard) all-or-nothing semantics? I think I might have just invented MongoDB's cross-document transactions... Imagine...
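A sketch of what that single "write" command might look like (this is purely the hypothetical proposal above, not anything the server implements; every field name here is invented):

```python
# Hypothetical single "write" command: one ordered list of
# insert/update/delete operations, with a per-batch error-handling knob.
write_command = {
    "write": "collection",                                   # target (placeholder)
    "ops": [
        {"insert": {"document": {"_id": 1, "x": 1}}},        # one doc per insert
        {"update": {"q": {"_id": 1}, "u": {"$set": {"x": 2}}}},
        {"delete": {"q": {"_id": 1}}},
    ],
    "continueOnError": False,  # the proposed continue / all-or-nothing semantics
}
```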

These are easy to fix now.  After 2.6 is released (what is the target date for that?) it will be a much bigger pain.

Rob.

Asya Kamsky

Nov 3, 2013, 12:02:30 PM
to mongo...@googlegroups.com
On Sun, Nov 3, 2013 at 10:57 AM, Robert Moore
<robert.a...@gmail.com> wrote:
>> Not much. By default the firstBatch does contain responses, just like
>> what would happen with a normal query — with the difference that with a
>> normal query documents are indeed returned as full. At the maximum size
>> there is still a limit on how many documents can be returned in a
> I expect the first response to contain results. That was not my concern.
> My concern was that there will be cases were users carefully construct ...
> documents to be just under the 16MB limit from the aggregation and they will still run
> into problems due to the response being nested inside of another document.

What problems would they run into? When you make a "regular" query you are not
guaranteed to get all the results in a single response; that's why getmore
exists, to get more "batches" of results. Why is it a problem for aggregation?

Asya

Robert Moore

Nov 3, 2013, 12:29:06 PM
to mongo...@googlegroups.com


On Sunday, November 3, 2013 12:02:30 PM UTC-5, Asya Kamsky wrote:

What problems would they run into?   When you make a "regular" query you are not
guaranteed to get all the results in a single response, that's why
getmore exists to get
more "batches" of results - why is it a problem for aggregation?

Imagine an aggregation where the first result is 16MB - 1 bytes.  Now what? Does it wait for a getmore? Does it get put in the "firstBatch" array and violate the document size limit?

If you look in the code you will see that the limit is actually 4MB (MaxBytesToReturnToClientAtOnce), which means even more cases where there needs to be a second round trip to the server before getting any results. If the results had been returned as part of the normal OP_REPLY you would have gotten a couple of the results in the first message. I don't think I have ever seen a reply to the initial query that did not contain at least one document (assuming there is a matching document).

Will the aggregation method work? Yes.  Is it optimal or expected?

Again: aggregation is the future of queries that want to use the new security features.  For some users that will be all queries.  Making it have as little performance impact as possible is important.

Rob.

Asya Kamsky

Nov 3, 2013, 12:37:41 PM
to mongo...@googlegroups.com
Robert,

I'm aware of the 4MB or 101-document limit for batches - in fact, this guarantees that the first batch will fit just fine into the document returned for aggregation. That's no different than for a find query.
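As a toy illustration of how those two limits interact (a simplification for discussion, not the server's actual batching code; the constants come from the numbers quoted in this thread):

```python
MAX_BATCH_DOCS = 101
MAX_BATCH_BYTES = 4 * 1024 * 1024  # MaxBytesToReturnToClientAtOnce, per the thread

def cut_first_batch(doc_sizes):
    """Return how many leading documents fit in one batch under both limits.

    At least one document is always sent, even if it alone exceeds the
    byte limit - which is how a classic OP_REPLY behaves for find.
    """
    batch_bytes, count = 0, 0
    for size in doc_sizes:
        if count >= MAX_BATCH_DOCS:
            break
        if count > 0 and batch_bytes + size > MAX_BATCH_BYTES:
            break
        batch_bytes += size
        count += 1
    return count
```

Under this model a single result of 16MB - 1 bytes still lands in the first batch; the question in the thread is whether the nested firstBatch format can deliver that same behavior.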

I guess I'm not sure what you would like to see under the hood instead.  If you got a batch of documents from aggregate it would not be any bigger, so additional batches would be fetched.  This happens automatically (via the driver) and shouldn't affect the application no matter what it's expecting, no?

Asya



--
You received this message because you are subscribed to the Google Groups "mongodb-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-dev...@googlegroups.com.
To post to this group, send email to mongo...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-dev.
For more options, visit https://groups.google.com/groups/opt_out.

Robert Moore

Nov 3, 2013, 2:17:53 PM
to mongo...@googlegroups.com


On Sunday, November 3, 2013 12:37:41 PM UTC-5, Asya Kamsky wrote:
Robert,

I'm aware of the 4MB or 101 document limit for batches - in fact, this guarantees that first batch would fit just fine into the document returned for aggregation - that's no different than for a find query.

I guess I'm not sure what you would like to see under the hood instead.  If you got a batch of documents from aggregate it would not be any bigger - so additional batches would be fetched.  This happens automatically (via the driver) and shouldn't affect the application no matter what it's expecting, no?

First - I work on one of those drivers. ;-) 

I have spent more hours than I care to count trying to make MongoDB go as fast as possible. I assure you we do lots of really crazy stuff and worry about every bit of latency we can squeeze out of the system. Having to make two round trips to the server to get 1 result is exactly the kind of stuff that blows our performance budget.

The find does return a result that is close to the 16MB limit in the first reply.  There is no 4MB limit imposed on the first reply. The aggregate cursor has to impose one, based on the format of the reply alone.

What I expected was for the reply to a cursor-based aggregation to be an OP_REPLY message where each of the result documents in the message was one of the results from the aggregation.

I have already written the code to adapt the "command" cursor into a standard reply; I am just wondering why not make the command cursors look like a query's cursor from the very first reply. Like I asked in the original message: what am I missing?

Rob

P.S. In writing a test to verify that find would return the document and the aggregation wouldn't (my attempt at understanding the code), I ran into a bug...  The code below (which uses my driver, and a pre-release version at that) returns this reply:
{
  errmsg : 'exception: Tried to create string longer than 16MB',
  code : 16493,
  ok : 0.0
}

If I reduce the size of the _id by 1000 bytes I get back a cursor document but with no entries in the "firstBatch".
{
  cursor : {
    id : NumberLong('89091207075701'),
    ns : 'acceptance_test.acceptance_1',
    firstBatch : []
  },
  ok : 1.0
}

I will try to reproduce in the shell and submit a Jira ticket.

    @Test
    public void testFindReallyBigDocument() {
        final DocumentBuilder doc = BuilderFactory.start();
        // 4 is the cString for _id.
        // 6 is the Binary Element overhead.
        // 5 is the Document overhead.
        doc.add("_id", new byte[Client.MAX_DOCUMENT_SIZE - (4 + 6 + 5)]);

        myCollection.insert(Durability.ACK, doc);

        final Find.Builder find = new Find.Builder();
        find.setQuery(Find.ALL);

        MongoIterator<Document> iter = myCollection.find(find);
        try {
            assertTrue(iter.hasNext());
            assertEquals(doc.build(), iter.next());
            assertFalse(iter.hasNext());
        }
        finally {
            iter.close();
        }

        // Now use aggregation.
        final Aggregation aggregation = Aggregation.builder().match(Find.ALL)
                .useCursor().build();

        iter = myCollection.aggregate(aggregation);
        try {
            assertTrue(iter.hasNext());
            assertEquals(doc.build(), iter.next());
            assertFalse(iter.hasNext());
        }
        finally {
            iter.close();
        }
    }


Robert Moore

Nov 3, 2013, 7:38:16 PM
to mongo...@googlegroups.com


On Sunday, November 3, 2013 2:17:53 PM UTC-5, Robert Moore wrote:
I will try to reproduce in the shell and submit a Jira ticket.

Asya Kamsky

Nov 3, 2013, 9:45:23 PM
to mongo...@googlegroups.com
While I can understand your position, I'm wondering how you would handle
the transition from 2.4 to 2.5.x/2.6 if aggregate now returns something different.

It seems like changing the behavior of aggregate command directly would
break any and all code that was relying on getting back a single document...

As a driver developer what would you expect as a "transition" or compatibility
option?

Asya



Robert Moore

Nov 3, 2013, 10:18:22 PM
to mongo...@googlegroups.com


On Sunday, November 3, 2013 9:45:23 PM UTC-5, Asya Kamsky wrote:
While I can understand your position, I'm wondering how you would handle
the transition from 2.4 to 2.5.x/2.6 if aggregate now returns something different.

It seems like changing the behavior of aggregate command directly would
break any and all code that was relying on getting back a single document...

As a driver developer what would you expect as a "transition" or compatibility
option?

In this case the driver asked the server to return something different by adding the "cursor" sub-document to the aggregate command.  There really is no direct compatibility issue, since old commands will continue to return what they always did: a document with a results array.

I think there is a "mental model" compatibility issue.  

I have spent some time in the Software Engineering ivory tower, and a lot of that time was spent thinking about how we build systems: how we make them maintainable and keep them maintainable. One of the strongest tools we have is what the Agile group calls "the metaphor". I have always called it the mental model.  The concept is to have a few simple rules/models that describe the way the system works. The idea is to allow you to intelligently reason about the system's behavior without having to go and look at the details. There are always exceptions, but you should try to avoid those if at all possible (and then document the heck out of them). If this is done right it helps developers, new and old, working on the system to understand the impact of changes and how to make those changes. As my grad adviser liked to say: this is all motherhood and apple pie; why are you telling me this?

One of my mental models for MongoDB was: a cursor starts with an OP_REPLY message containing a document for each result. The cursor id is in the reply header, and you ask for more results via an OP_GETMORE, which returns another OP_REPLY; rinse, repeat. The aggregation cursor ripped that fabric out from under my feet. Changing the mental model for a system is serious business, and I just want to make sure that I am not missing anything.  If the decision was arbitrary that is OK (the server developers' mental model is by necessity different from a driver developer's), but I just want to be sure. I also think maintaining my cursor model has some value...
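That model, as a sketch (send_query and send_getmore stand in for real wire-protocol calls; the reply dicts are placeholders for parsed OP_REPLY messages):

```python
def drain_cursor(send_query, send_getmore):
    """Classic cursor loop: the OP_QUERY reply carries documents plus a
    cursor id in the OP_REPLY header; OP_GETMORE repeats until the id
    comes back as 0."""
    reply = send_query()                       # initial OP_QUERY -> OP_REPLY
    yield from reply["documents"]
    while reply["cursor_id"] != 0:             # 0 means the cursor is exhausted
        reply = send_getmore(reply["cursor_id"])
        yield from reply["documents"]
```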

I'm happy to continue this discussion but I think I have circled the block enough times. Feel free to ignore me.

Rob.
