wrong number of documents when using batch size in sharded system

Shi Shei

Nov 25, 2011, 5:18:43 AM
to mongodb-user
Until yesterday we were using a replica set of 4 members. Now we have
changed it into a sharded system with 2 shards.
Mongo is balancing chunks to the new shard and everything seemed to be OK.

However, a query executed through the Java driver (v2.6.5), which
worked correctly yesterday, now returns an incorrect number of
documents.
I found out that when I set a batch size of, say, 100 in Java, it returns
the correct number of documents (roughly 150,000).
It also returns the correct number of documents when I omit the batch
size.

However, when I set the batch size to 1,000, I only get 2,410
documents.
When I set the batch size to 100,000, I get approximately 102,000
documents.

Why does the batch size have such an impact on my results? The query
executes quite fast since it only queries one indexed field and
returns the whole document. One document is about 1-2 kB.
Does the router (mongos) not handle the batch size option correctly?

We're using Mongo v2.0.1 running on 64-bit Linux.

Nat

Nov 25, 2011, 5:20:59 AM
to mongod...@googlegroups.com
Is the problem reproducible every time? It's possible that your data got moved by the balancer, so you are not getting complete data.

Shi Shei

Nov 25, 2011, 5:45:34 AM
to mongodb-user
Yes, it's reproducible every time.
I understand that I can get slightly different data due to balancing, but
not on this scale (2,410 vs. 150,000). It seems to be related to the
batch size.
Thank you for having a look at the issue!

Nat

Nov 25, 2011, 5:49:21 AM
to mongod...@googlegroups.com
Is there any change if you use a different sort order? Did you see any errors in the mongod/mongos log files? Can you also post your explain() plan?

Shi Shei

Nov 25, 2011, 6:01:49 AM
to mongodb-user
Oh yes! When I sort by _id, the query takes much more time and,
more interestingly, the number of documents I receive corresponds
exactly to the batch size!

I'll check whether the logs show anything about this. How can I use
explain() through the Java driver?
Thanks!

Shi Shei

Nov 25, 2011, 7:46:37 AM
to mongodb-user
Here is the query plan without using batch size:
http://pastebin.com/raw.php?i=KMyQSNY4

Here is the query plan using a batch size of 1,000:
http://pastebin.com/raw.php?i=0j8FqTFS

Here is the query plan using a batch size of 1,000 and sorting by _id:
http://pastebin.com/raw.php?i=8GxdiJat

Shi Shei

Nov 25, 2011, 8:46:10 AM
to mongodb-user
I'm sorry, the query plan without a batch size is this one:
http://pastebin.com/raw.php?i=aX4bYt5T

The first query plan posted above used a batch size of 500.

Thanks!

Shi Shei

Nov 25, 2011, 9:24:08 AM
to mongodb-user
I can reproduce it in the mongo shell as well, so the issue is not
related to the Java driver:

> var count = 0;
> var cursor = db.offer.find({shopId: 205640}).batchSize(100);
> for (var i = 0; cursor.hasNext(); ++i){cursor.next(); count++;}
154062
> count
154063
> var count = 0;
> var cursor = db.offer.find({shopId: 205640}).batchSize(1000);
> for (var i = 0; cursor.hasNext(); ++i){cursor.next(); count++;}
2419
> count
2420
> db.offer.find({shopId: 205640}).count()
156484
>

As you can see, a batch size of 100 retrieves almost all documents, but
a batch size of 1,000 retrieves only very few.
Any hints?
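
A note for anyone repeating this test: the number the shell prints after the loop is the value of the last `count++` expression, which is the value before the increment, so it is one less than the final `count`. A small helper that returns the total makes the comparison easier to read. This is only a sketch: `countDocs` and `arrayCursor` are made-up names, and `arrayCursor` is just a stand-in for a real cursor so the snippet runs without a server.

```javascript
// Count documents by draining any cursor-like object,
// i.e. anything that offers hasNext() and next().
function countDocs(cursor) {
  var n = 0;
  while (cursor.hasNext()) {
    cursor.next();
    n += 1;
  }
  return n; // the total, not the pre-increment value of the last step
}

// Hypothetical stand-in for a real cursor, backed by a plain array.
function arrayCursor(docs) {
  var i = 0;
  return {
    hasNext: function () { return i < docs.length; },
    next: function () { return docs[i++]; }
  };
}
```

In the mongo shell this could be called as `countDocs(db.offer.find({shopId: 205640}).batchSize(1000))` and the result compared with `db.offer.count({shopId: 205640})`.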

Eliot Horowitz

Nov 25, 2011, 10:06:49 AM
to mongod...@googlegroups.com
Can you do an explain in the shell and check the mongos logs for errors?
Are all mongos and mongod 2.0.1?

Shi Shei

Nov 25, 2011, 10:56:45 AM
to mongodb-user
Yes, all mongos and mongod are running v2.0.1
Only my mongo shell is running v1.8.3

Here is the query plan from the console using batch size 100:
http://pastebin.com/raw.php?i=F8ajfACS

These are the last lines of the mongos logs.
I executed the explain query at 16:50:
http://pastebin.com/raw.php?i=k3mmtkL2

Shi Shei

Nov 25, 2011, 11:05:50 AM
to mongodb-user
I just repeated the query using batch size 100 and then 1000 in the
mongo shell.
The first took about one hour but it retrieved almost all documents.
The second query took a few seconds and retrieved only 2419 documents:

MongoDB shell version: 1.8.3
connecting to: sx175:27019/offerStore


> var count = 0;
> var cursor = db.offer.find({shopId: 205640}).batchSize(100);
> for (var i = 0; cursor.hasNext(); ++i){cursor.next(); count++;}
154062

> var count = 0;
> var cursor = db.offer.find({shopId: 205640}).batchSize(1000);
> for (var i = 0; cursor.hasNext(); ++i){cursor.next(); count++;}
2419
>

The mongos log contains only one additional line:

Fri Nov 25 17:00:31 [LockPinger] cluster sx175:20020,sx176:20020,sx177:20020 pinged successfully at Fri Nov 25 17:00:31 2011 by distributed lock pinger 'sx175:20020,sx176:20020,sx177:20020/sx175:27019:1322234137:1804289383', sleeping for 30000ms


Shi Shei

Nov 25, 2011, 11:07:58 AM
to mongodb-user
Now I get this exception when running explain() with a batch size of 1,000:

> db.offer.find({shopId: 205640}).batchSize(1000).explain()
Fri Nov 25 17:06:26 uncaught exception: error: {
    "$err" : "could not initialize cursor across all shards because : no ts field in query @ offerStoreDE2/s197:27018,s209:27018,s198:27018 :: and :: no ts field in query @ offerstore/s141:27018,s142:27018,s144:27018,s143:27018",
    "code" : 14827
}

Shi Shei

Nov 25, 2011, 11:10:11 AM
to mongodb-user
Same message in the mongos logs:

Fri Nov 25 17:06:24 [conn2] DBException in process: could not initialize cursor across all shards because : no ts field in query @ offerStoreDE2/s197:27018,s209:27018,s198:27018 :: and :: no ts field in query @ offerstore/s141:27018,s142:27018,s144:27018,s143:27018

Eliot Horowitz

Nov 28, 2011, 1:36:43 AM
to mongod...@googlegroups.com
Do you have any other options set in the shell?
Can you send the entire session?
That last message is very weird.

Shi Shei

Nov 28, 2011, 3:07:50 AM
to mongodb-user
I don't use any options in the shell, at least none that I'm aware of.

Here is my last session replayed a few minutes ago:

MongoDB shell version: 1.8.3
connecting to: sx175:27019/offerStore
> var count = 0;

> var cursor = db.offer.find({shopId: 205640}).batchSize(1000);
> while(cursor.hasNext()){cursor.next(); count++;}
2419
> db.offer.find({shopId: 205640}).batchSize(1000).explain()
Mon Nov 28 09:01:10 uncaught exception: error: {
    "$err" : "could not initialize cursor across all shards because : no ts field in query @ offerStoreDE2/s197:27018,s209:27018,s198:27018 :: and :: no ts field in query @ offerstore/s141:27018,s142:27018,s144:27018,s143:27018",
    "code" : 14827
}
>

And here is the complete mongos log:
http://pastebin.com/raw.php?i=KacP6SCv

This mongos is used only by me, so the logs are clean. They are not
cluttered with requests from other mongo clients.

Thanks for looking into that!


Eliot Horowitz

Nov 29, 2011, 1:38:08 AM
to mongod...@googlegroups.com
Are the shards themselves 2.0.1 as well?

Shi Shei

Nov 29, 2011, 2:20:22 AM
to mongodb-user
Yes, all shards are 2.0.1 as well. Also all config servers and all
routers are 2.0.1.

Greg Studer

Dec 6, 2011, 2:36:12 PM
to mongodb-user
Can you turn up the logging on the mongos to level 5 temporarily and
show the output (use admin; db.runCommand({ setParameter : 1,
logLevel : 5 }))? That should give us the actual options being passed
to each shard.
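
Since the higher verbosity is only needed temporarily, the command above could be wrapped so that the previous level is put back afterwards. This is a rough sketch rather than a tested recipe: `withLogLevel` is a made-up helper name, and it assumes the `setParameter` reply reports the previous value in a `was` field (it falls back to 0 if that field is missing).

```javascript
// Temporarily raise the log level, run fn, then restore the old level.
// adminDb is expected to behave like the shell's db.getSiblingDB("admin").
function withLogLevel(adminDb, level, fn) {
  var res = adminDb.runCommand({ setParameter: 1, logLevel: level });
  if (!res.ok) {
    throw new Error("setParameter failed: " + res.errmsg);
  }
  // Assumption: the reply carries the previous level in "was".
  var previous = (typeof res.was === "number") ? res.was : 0;
  try {
    return fn();
  } finally {
    // Restore the earlier level even if fn throws (e.g. a failing explain()).
    adminDb.runCommand({ setParameter: 1, logLevel: previous });
  }
}
```

From a shell connected to the mongos, it might be used as `withLogLevel(db.getSiblingDB("admin"), 5, function () { return db.offer.find({shopId: 205640}).batchSize(1000).explain(); })`.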

Shi Shei

Jan 9, 2012, 8:50:50 AM
to mongodb-user
I reran the test.
I expected 166,031 documents, but only 2,317 were returned:

MongoDB shell version: 2.0.1
connecting to: localhost:27018/offerStore
mongos> db.offer.count({shopId: 205640});
166031
mongos> var count = 0;
mongos> var cursor = db.offer.find({shopId: 205640}).batchSize(1000);
mongos> while(cursor.hasNext()){cursor.next(); count++;}
2317
mongos> db.offer.find({shopId: 205640}).batchSize(1000).explain()
Mon Jan 9 14:46:09 uncaught exception: error: {
    "$err" : "could not initialize cursor across all shards because : no ts field in query @ offerStoreDE2/s197:27018,s209:27018,s198:27018 :: and :: no ts field in query @ offerStoreDE3/s117:27018,s129:27018,s118:27018 :: and :: no ts field in query @ offerstore/s142:27018,s144:27018,s143:27018",
    "code" : 14827
}
mongos> db.offer.count({shopId: 205640});
166031

Here are the router logs (level 6):
http://pastebin.com/raw.php?i=UbUBdTqf

Greg Studer

Jan 9, 2012, 11:04:03 AM
to mongodb-user
Thanks for following up. I can see that the batch size and other options
are getting passed incorrectly, at least in explain(). I opened ticket
SERVER-4651, which you can follow for the fix.

Shi Shei

Jan 9, 2012, 11:49:48 AM
to mongodb-user
Thank you Greg! This issue seems related:
http://groups.google.com/group/mongodb-user/browse_thread/thread/6b5f8bb97510707b
What do you think?

Greg Studer

Jan 10, 2012, 5:16:56 PM
to mongodb-user
Potentially. It's hard to say at this point, but incorrect options could
cause strange results similar to what you're seeing. It doesn't seem
like the non-explain results have the same incorrect options, though.
