Re: iterating across sharded collection


Greg Studer Apr 27, 2012 4:01 PM
Posted in group: mongodb-user
There's a known issue where cursors can time out during a long
iteration. Have you added the NoTimeout option to the query you send to
MongoDB?  How to do so depends on your driver.
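
For the Java driver used in the quoted code, setting the option might look roughly like the sketch below. It assumes the legacy 2.x Java driver (`com.mongodb.Mongo`, `DBCursor.addOption`, `Bytes.QUERYOPTION_NOTIMEOUT`). The host, port, and class name are placeholders of mine, not details from the original post, and I have not run this against your cluster.

```java
import com.mongodb.Bytes;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;

public class CountAllUsers {
    public static void main(String[] args) throws Exception {
        // Assumption: a mongos router is reachable at localhost:27017.
        Mongo m = new Mongo("localhost", 27017);
        try {
            DB db = m.getDB("main_db");
            DBCollection coll = db.getCollection("users");

            // Ask the server not to reap this cursor when it sits idle.
            DBCursor cur = coll.find().addOption(Bytes.QUERYOPTION_NOTIMEOUT);
            long counter = 0;
            try {
                while (cur.hasNext()) {
                    cur.next();
                    counter++;
                }
            } finally {
                // A no-timeout cursor is not cleaned up automatically,
                // so close it explicitly even if iteration fails.
                cur.close();
            }
            System.out.println("Documents iterated: " + counter);
        } finally {
            m.close();
        }
    }
}
```

Closing the cursor in a `finally` block matters more than usual here, since a cursor opened with this option will otherwise stay open on the server.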

On Apr 27, 12:45 pm, progolferyo <sfan...@gmail.com> wrote:
> We have a sharded cluster, 3 shards, each with a replica set.
> Standard stuff.  I am trying to write a script that simply iterates
> over the entire collection.
>
> So I connect to the mongos server and iterate across the entire
> collection.  For the purposes of testing, I'm just counting how many
> items I iterate over.  My Java code looks like this:
>
> DB db = m.getDB("main_db");
> DBCollection coll = db.getCollection("users");
> DBCursor cur = coll.find();
> int counter = 0;
>
> while(cur.hasNext()) {
>     cur.next();
>     counter++;
>
> }
>
> So the issue is that I have about 30M items in the collection, and when
> I run the script, counter finishes with around 11M items, which suggests
> it quit after the first shard.  If I run the script again shortly
> after the first run, I sometimes get to 20M items, which looks like
> it got to the second shard but not the third.  I have yet to get all
> the way to the third shard.
>
> So the question is: how do I actually iterate over an entire collection
> with the guarantee that I'm hitting all shards in the cluster?
>
> Am I doing something wrong in my implementation?  It seems pretty
> straightforward.