progolferyo
Apr 27, 2012, 12:45:03 PM
to mongodb-user
We have a sharded cluster: 3 shards, each backed by a replica set.
Standard stuff. I am trying to write a script that simply iterates
over the entire collection.
So I connect to the mongos server and iterate across the entire
collection. For the purposes of testing, I'm just counting how many
items I iterate over. My Java code looks like this:
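import com.mongodb.*;

// m is a Mongo instance already connected to the mongos router
DB db = m.getDB("main_db");
DBCollection coll = db.getCollection("users");
DBCursor cur = coll.find();    // no query: every document in the collection
int counter = 0;
try {
    while (cur.hasNext()) {
        cur.next();
        counter++;             // only counting documents for this test
    }
} finally {
    cur.close();               // release the server-side cursor
}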
So the issue is that I have about 30M items in the collection, and when
I run the script, counter finishes at around 11M items, which suggests
it stopped after the first shard. If I run the script again shortly
after the first run, I sometimes get to 20M items, which looks like it
reached the second shard but not the third. I have yet to get all the
way through to the third shard.
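One way to make the shortfall explicit in a test run is to drain the cursor and compare the result with the total you expect. The sketch below uses only java.util; IterationCheck and its Result class are hypothetical helpers, not part of the MongoDB driver. Because DBCursor implements Iterator<DBObject>, a cursor returned by coll.find() could be passed in directly.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical helper (not part of the MongoDB driver): drains any Iterator
 * and reports whether it yielded the expected number of items.
 */
public final class IterationCheck {

    /** Outcome of draining an iterator against an expected total. */
    public static final class Result {
        public final long seen;
        public final long expected;

        Result(long seen, long expected) {
            this.seen = seen;
            this.expected = expected;
        }

        /** True when the iterator produced at least as many items as expected. */
        public boolean complete() {
            return seen >= expected;
        }

        /** Items expected but never returned (0 if nothing is missing). */
        public long missing() {
            return Math.max(0, expected - seen);
        }
    }

    private IterationCheck() {}

    /** Consumes every remaining element of the iterator and counts them. */
    public static Result drainAndCheck(Iterator<?> it, long expected) {
        long seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return new Result(seen, expected);
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        // Simulate a cursor that stops early: only the first 2 of 5 items come back.
        Result r = drainAndCheck(data.subList(0, 2).iterator(), data.size());
        System.out.println(r.seen + " of " + r.expected + ", missing " + r.missing());
    }
}
```

Against the cluster, something like IterationCheck.drainAndCheck(coll.find(), coll.count()) would report how many documents never came back. Keep in mind that on a sharded cluster a count with no query can over-report if orphaned documents exist or a chunk migration is in progress, so a small gap is not by itself proof that a shard was skipped.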
So the question is: how do I actually iterate over an entire collection
with a guarantee that I'm hitting all shards in the cluster?
And am I doing something wrong in my implementation? It seems pretty
straightforward.
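A common pattern for paging through large collections, offered here only as something to try and not as a confirmed fix for the behavior above, is to walk the collection in _id order in bounded pages, so each page is a short, fresh query instead of one cursor that must survive the whole pass. The sketch below is driver-agnostic: KeysetScan, PageSource and InMemorySource are hypothetical names, and the in-memory source merely stands in for a collection sorted by _id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public final class KeysetScan {

    /** Hypothetical abstraction: one query returning up to `limit` keys strictly after `after`, ascending. */
    public interface PageSource<K> {
        List<K> pageAfter(K after, int limit); // after == null means "start from the beginning"
    }

    private KeysetScan() {}

    /** Walks the whole key space page by page and returns the number of items visited. */
    public static <K> long scanAll(PageSource<K> source, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long total = 0;
        K last = null;
        while (true) {
            List<K> page = source.pageAfter(last, pageSize);
            if (page.isEmpty()) {
                return total;                      // no keys left after `last`
            }
            total += page.size();
            last = page.get(page.size() - 1);      // resume strictly after the last key seen
        }
    }

    /** In-memory stand-in for a collection sorted by _id, for illustration only. */
    static final class InMemorySource implements PageSource<Integer> {
        private final NavigableSet<Integer> ids = new TreeSet<>();

        InMemorySource(int n) {
            for (int i = 0; i < n; i++) {
                ids.add(i);
            }
        }

        @Override
        public List<Integer> pageAfter(Integer after, int limit) {
            NavigableSet<Integer> tail = (after == null) ? ids : ids.tailSet(after, false);
            List<Integer> out = new ArrayList<>(limit);
            for (Integer id : tail) {
                if (out.size() == limit) {
                    break;
                }
                out.add(id);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(scanAll(new InMemorySource(25), 10));
    }
}
```

With the 2.x Java driver, each pageAfter call would roughly correspond to coll.find(new BasicDBObject("_id", new BasicDBObject("$gt", lastId))).sort(new BasicDBObject("_id", 1)).limit(pageSize), with the first page issued without the $gt filter. Because every page resumes strictly after the last _id seen, a page that fails can be retried from that key instead of restarting the whole scan.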