Data on wrong shard


Philip Southam

Nov 16, 2010, 8:21:38 PM
to mongod...@googlegroups.com
Got a severe problem here again. It seems to be another iteration of the same problem I had about a month ago: I've got a number of records that don't show up when I query them by _id (which is also the sharding key).

For example, here is the relevant part of the printShardingStatus output for this database:
 sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard1", "host" : "set1/mongo0-a:10000,mongo1-a:10000,mongo2-a:10000" }
      { "_id" : "shard2", "host" : "set2/mongo0-a:20000,mongo1-a:20000,mongo2-a:20000" }
      { "_id" : "shard3", "host" : "set3/mongo0-a:30000,mongo1-a:30000,mongo2-a:30000" }
  databases:
        { "_id" : "movieclips", "partitioned" : true, "primary" : "shard1" }
                movieclips.base chunks:
                        { "_id" : { $minKey : 1 } } -->> { "_id" : "234H" } on : shard3 { "t" : 4000, "i" : 0 }
                        { "_id" : "234H" } -->> { "_id" : "CK2Fd" } on : shard3 { "t" : 5000, "i" : 0 }
                        { "_id" : "CK2Fd" } -->> { "_id" : "HT2w" } on : shard3 { "t" : 7000, "i" : 0 }
                        { "_id" : "HT2w" } -->> { "_id" : "Nb2v" } on : shard1 { "t" : 7000, "i" : 1 }
                        { "_id" : "Nb2v" } -->> { "_id" : "Yz7gx" } on : shard2 { "t" : 3000, "i" : 2 }
                        { "_id" : "Yz7gx" } -->> { "_id" : "e2sb4" } on : shard2 { "t" : 6000, "i" : 2 }
                        { "_id" : "e2sb4" } -->> { "_id" : "jCY6e" } on : shard2 { "t" : 6000, "i" : 3 }
                        { "_id" : "jCY6e" } -->> { "_id" : "s5MW" } on : shard1 { "t" : 3000, "i" : 6 }
                        { "_id" : "s5MW" } -->> { "_id" : "zyxK" } on : shard1 { "t" : 3000, "i" : 7 }
                        { "_id" : "zyxK" } -->> { "_id" : { $maxKey : 1 } } on : shard2 { "t" : 3000, "i" : 0 }


Now I've got a record with the _id "N5gkw". According to the printShardingStatus output, this record should live on either shard1 or shard2 (I'm not sure whether numerics in strings sort before or after a-zA-Z). When I run

> db.base.find({_id:'N5gkw'}) 

on a mongos connected to the cluster, I get nothing back. But when I connect directly to the master of shard3 and run the same query, I get a result. I can get a result from the mongos connection by querying other properties of the object, just not by _id.
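
To illustrate (the set3 host is taken from the shard list above; the mongos host is a placeholder, and the prompts are approximate rather than a verbatim transcript):

# via mongos: nothing comes back
$ mongo <mongos-host>/movieclips
> db.base.find({_id:'N5gkw'})
>

# directly against whichever set3 member is currently master: it's there
$ mongo mongo0-a:30000/movieclips
> db.base.find({_id:'N5gkw'})
{ "_id" : "N5gkw", ... }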

I've tried upgrading to 1.6.4; restarting all the instances in my replica sets, all the mongos processes, the arbiters, and the config DB; and running a repair on the database holding the collection. I even tried re-importing the data used to create the dataset, with no luck.

If someone could tell me how to get the proper records back quickly, I would appreciate it. This bug is wreaking havoc on our site right now.


--
Philip Southam
VP Engineering, MOVIECLIPS.com
phi...@movieclips.com
http://j.mp/7Qo6Kh

Eliot Horowitz

Nov 16, 2010, 8:32:51 PM
to mongod...@googlegroups.com
Can you send all the mongos logs?
Is it all the documents in that chunk?

Philip Southam

Nov 16, 2010, 8:54:39 PM
to mongod...@googlegroups.com
No, unfortunately it doesn't look like the affected data is limited to one chunk; it's spread throughout the _id/shard-key distribution. One of the logs is taking forever to download; I'll send them once I've got them zipped up.

Eliot Horowitz

Nov 16, 2010, 8:55:42 PM
to mongod...@googlegroups.com
Can you do an explain on a bad query?

Philip Southam

Nov 16, 2010, 8:59:14 PM
to mongod...@googlegroups.com
OK, here's the explain from a mongos connection:

db.base.find({_id:'N5gkw'}).explain()
{
        "clusteredType" : "SerialServer",
        "shards" : {
                "set1/mongo0-a:10000,mongo1-a:10000,mongo2-a:10000" : [
                        {
                                "cursor" : "BtreeCursor _id_",
                                "nscanned" : 0,
                                "nscannedObjects" : 0,
                                "n" : 0,
                                "millis" : 0,
                                "indexBounds" : {
                                        "_id" : [
                                                [
                                                        "N5gkw",
                                                        "N5gkw"
                                                ]
                                        ]
                                }
                        }
                ]
        },
        "nscanned" : 0,
        "nscannedObjects" : 0,
        "n" : 0,
        "millisTotal" : 0,
        "millisAvg" : 0,
        "numQueries" : 1,
        "numShards" : 1
}


As mentioned before, this record actually lives on set/shard 3, even though the explain above shows mongos querying only set1 (which is where the chunk metadata maps 'N5gkw').

Philip Southam

Nov 16, 2010, 9:01:47 PM
to mongod...@googlegroups.com
Here are the various mongos logs from our different apps. mongos-1.log is from the client I'm using to try to re-update the data.

Scott Hernandez

Nov 16, 2010, 9:06:58 PM
to mongod...@googlegroups.com
When you run the same query on shard3 (or on each of the shards individually), does it return data?

Philip Southam

Nov 16, 2010, 9:11:32 PM
to mongod...@googlegroups.com
Yes, that's how I found out the record was in the wrong place: I connected directly to the master of the set backing that shard and issued the same query.

Eliot Horowitz

Nov 16, 2010, 9:15:45 PM
to mongod...@googlegroups.com
When did you upgrade to 1.6.4? Didn't we find that bug with you? We fixed it in 1.6.4, but now we need to get you cleaned up. I'll dive into the logs soon.

Philip Southam

Nov 16, 2010, 9:20:10 PM
to mongod...@googlegroups.com
I am indeed the same person who went through this with you before. I upgraded today, after discovering this issue, in the hopes that the upgrade would resolve it (I noticed that 1.6.4 finally made it into the Debian repo).

Philip Southam

Nov 16, 2010, 9:39:31 PM
to mongod...@googlegroups.com
Would there be any danger in doing this?

db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$lt:'234H'} } , to : "shard3" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'234H', $lt:'CK2Fd'} } , to : "shard3" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'CK2Fd', $lt:'HT2w'} } , to : "shard3" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'HT2w', $lt:'Nb2v'} } , to : "shard1" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'Nb2v', $lt:'Yz7gx'} } , to : "shard2" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'Yz7gx', $lt:'e2sb4'} } , to : "shard2" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'e2sb4', $lt:'jCY6e'} } , to : "shard2" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'jCY6e', $lt:'s5MW'} } , to : "shard1" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'s5MW', $lt:'zyxK'} } , to : "shard3" } );
db.runCommand( { moveChunk : "movieclips.base", find : {_id: {$gte:'zyxK'} } , to : "shard2" } );

Basically, a series of moveChunk commands based on what's in the printShardingStatus output.

Philip Southam

Nov 16, 2010, 10:07:29 PM
to mongod...@googlegroups.com
I've also got a mongodump from a couple of days ago. If I did a mongorestore --drop, could I expect that to help fix this, or would it cause more harm than good? On a standalone system it would be a no-brainer, but this sharding magic gives me pause before taking such drastic action.

Eliot Horowitz

Nov 16, 2010, 11:01:46 PM
to mongod...@googlegroups.com
I wouldn't call moveChunk.
moveChunk assumes things are in the state described by the metadata; since that's not true here, it could cause more problems.

Eliot Horowitz

Nov 16, 2010, 11:02:16 PM
to mongod...@googlegroups.com
Can you take a dump of each shard and then try mongorestore through mongos?
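
Something like this, roughly (a sketch rather than exact commands; point mongodump at whichever member of each set is currently master, and substitute your real mongos host for the placeholder):

# dump the collection straight from each shard's master
mongodump --host mongo0-a:10000 --db movieclips --collection base --out dump-shard1
mongodump --host mongo0-a:20000 --db movieclips --collection base --out dump-shard2
mongodump --host mongo0-a:30000 --db movieclips --collection base --out dump-shard3

# restore each dump back through mongos, so every insert is
# routed to the shard the chunk metadata says it belongs on
mongorestore --host <mongos-host> dump-shard1
mongorestore --host <mongos-host> dump-shard2
mongorestore --host <mongos-host> dump-shard3

Documents already sitting on the correct shard will just fail the insert on the duplicate _id; the misplaced ones get written where the metadata expects them.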

Philip Southam

Nov 16, 2010, 11:20:53 PM
to mongod...@googlegroups.com
Should I use the --drop flag when restoring through mongos?

Eliot Horowitz

Nov 16, 2010, 11:21:28 PM
to mongod...@googlegroups.com
No - I wouldn't.

Philip Southam

Nov 16, 2010, 11:56:11 PM
to mongod...@googlegroups.com
Now that was interesting: I've got about 6k+ tmp.mr collections in my metrics DB that don't show up when I do a show collections from a mongos client. Running the restores from each of the shards/sets now.
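
For the record, here's roughly how I counted them on each master (assuming the usual tmp.mr naming for map/reduce temp collections):

$ mongo mongo0-a:10000/metrics
> // temp map/reduce collections are listed in system.namespaces even when
> // "show collections" through mongos doesn't surface them
> db.system.namespaces.find({name: /tmp\.mr/}).count()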

Philip Southam

Nov 17, 2010, 12:33:48 AM
to mongod...@googlegroups.com
You're the man, Eliot! Mongorestoring each set's mongodump through the mongos client worked like a champ; thank you.

Now, one more question. I mentioned in my last email that I had 6k+ tmp.mr collections in another database that were only visible when I ran "show collections" on each set/shard master; that is, when I ran show collections from the mongos client in that database, they were not visible. I managed to drop most of them through some crafty copy/paste methods, but I've got about 1500 that return false when dropped, yet still show up in the output of show collections on the masters (one master in particular). How can I remove these? I hesitate to do a repair because I've got data constantly streaming into that DB, but if that's the only way, I'm willing. Also, would a db.repairDatabase() through the mongos work, or should I issue it from the master of each set/shard?
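
For what it's worth, the copy/paste approach boiled down to something like this, run against each master (a rough sketch, not the exact script; the ~1500 stubborn collections are the ones that print false here):

> use metrics
> db.system.namespaces.find({name: /tmp\.mr/}).toArray().forEach(function(c) {
...     var name = c.name.replace(/^metrics\./, '');  // strip the db prefix
...     print(name + ' -> ' + db.getCollection(name).drop());
... });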

Eliot Horowitz

Nov 17, 2010, 12:35:21 AM
to mongod...@googlegroups.com
I would bring up new slaves.
That way you get the benefit of a repair without downtime.
You can just swap them in when ready.
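
Roughly like this, on the primary of each set ("mongo3-a" is a placeholder for the new box; a fresh member does a full initial sync, which rewrites the data files the same way a repair would, so you get the compaction for free):

// start a new mongod with an empty data dir, then on the primary:
> rs.add("mongo3-a:10000")   // the new slave does a clean initial sync
// once it has caught up, drop the old member out of the set config
// and retire it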

Philip Southam

Nov 17, 2010, 3:03:36 PM
to mongod...@googlegroups.com
Eliot,

Do you happen to know the JIRA ticket for the improperly mapped shard keys bug that was fixed in 1.6.4? I'd like to include it in a post-mortem report I have to put together for our internal folks.

Thanks,

Eliot Horowitz

Nov 17, 2010, 3:07:20 PM
to mongod...@googlegroups.com