Received: by 10.58.135.42 with SMTP id pp10mr1364082veb.18.1346342700758; Thu, 30 Aug 2012 09:05:00 -0700 (PDT) X-BeenThere: mongodb-user@googlegroups.com Received: by 10.220.141.210 with SMTP id n18ls2417943vcu.2.gmail; Thu, 30 Aug 2012 09:04:52 -0700 (PDT) Received: by 10.52.30.133 with SMTP id s5mr976194vdh.10.1346342692224; Thu, 30 Aug 2012 09:04:52 -0700 (PDT) Date: Thu, 30 Aug 2012 09:04:51 -0700 (PDT) From: Daniel Schlegel To: mongodb-user@googlegroups.com Message-Id: <0b38f8d8-9499-424c-b6cb-05c6bf0337fc@googlegroups.com> In-Reply-To: <6958c597-db0a-48bf-a147-3dfccb331273@googlegroups.com> References: <55bcdddf-9c03-44c7-9b0c-25468f3523d4@googlegroups.com> <0407f0eb-ce1e-4a5c-abd0-d531e04c925a@googlegroups.com> <76d1791b-0bd6-44ee-9768-fb7fc65bf951@googlegroups.com> <6958c597-db0a-48bf-a147-3dfccb331273@googlegroups.com> Subject: Re: Balancing does not work MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_757_8652119.1346342691513" ------=_Part_757_8652119.1346342691513 Content-Type: multipart/alternative; boundary="----=_Part_758_22458677.1346342691519" ------=_Part_758_22458677.1346342691519 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Im pretty sure i restarted all the mongod's in our system. I made it again for the s2 mongod's. Not sure if it was exactly on point of restarting the mongod's but the exception seam to have disappeared. Is it possible, that i had to bounce first the primary of s1 and then the primary of s2? I did it in reverse order. Here are the outputs: s2:PRIMARY> rs.status() { "set" : "s2", "date" : ISODate("2012-08-30T15:52:40Z"), "myState" : 1, "members" : [ { "_id" : 0, "name" : "mongo21.example.com:20021", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 57, "optime" : Timestamp(1346341931000, 3), "optimeDate" : ISODate("2012-08-30T15:52:11Z"), "lastHeartbeat" : ISODate("2012-08-30T15:52:40Z"), "pingMs" : 0, "errmsg" : "syncing to: mongo22.example.com:20022" }, { "_id" : 1, "name" : "arbiter23.example.com:20023", "health" : 1, "state" : 7, "stateStr" : "ARBITER", "uptime" : 57, "lastHeartbeat" : ISODate("2012-08-30T15:52:40Z"), "pingMs" : 0 }, { "_id" : 2, "name" : "mongo22.example.com:20022", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 57, "optime" : Timestamp(1346341931000, 3), "optimeDate" : ISODate("2012-08-30T15:52:11Z"), "self" : true } ], "ok" : 1 } s2:PRIMARY> db.currentOp() { "inprog" : [ { "opid" : 748, "active" : true, "secs_running" : 2, "op" : "getmore", "ns" : "local.oplog.rs", "query" : { }, "client" : "6.23.38.199:46472", "desc" : "conn35", "threadId" : "0x7fad895e7700", "connectionId" : 35, "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(97), "w" : NumberLong(0) }, "timeAcquiringMicros" : { "r" : NumberLong(11), "w" : NumberLong(0) } } }, { "opid" : 24, "active" : false, "op" : "none", "ns" : "", "query" : { }, "desc" : "repl writer worker 1", "threadId" : "0x7fddb837b700", "waitingForLock" : false, "numYields" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(33) }, "timeAcquiringMicros" : { } } } ] } The really good looking part of it is the sharding status :) mongos> sh.status() --- Sharding Status --- sharding version: { "_id" : 1, "version" : 3 } shards: { "_id" : "s1", "host" : "s1/mongo11.example.com:20011,mongo12.example.com:20012" } { "_id" : "s2", "host" : "s2/mongo21.example.com:20021,mongo22.example.com:20022" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "example_production", "partitioned" : true, "primary" : "s1" } example_production.people chunks: s2 1 s1 3078 too many chunks to print, use verbose if you want to force print it's about 10min ago and there didn't appear another chunk on s2 but it seams really better. Thank you very much for your help. I will get back if it's not going on with sharding. Dani Am Donnerstag, 30. August 2012 17:43:50 UTC+2 schrieb Adam C: > > What it is saying is that the shard it is attempting to migrate to, namely > the primary from: > > s2/mongo21.example.com:20021,mongo22.example.com:20022 > > Thinks it is already taking part in a migration (as I mentioned, it can > only take part in one at a time, hence the error). Are you sure those > mongod instances were restarted? > > Can you show me rs.status() from that s2 replica set, and a db.currentOp() > from the primary? > > In addition - are there any migration messages (or other errors) visible > in the logs for the primary for that set? > > Adam > > On Thursday, August 30, 2012 4:26:36 PM UTC+1, Daniel Schlegel wrote: >> >> Hi Adam >> i removed all the entries and restartet both, mongos and mongod. >> The Problem is still here: >> >> Thu Aug 30 17:18:34 [conn502] resetting shard version of >> example_production.people on mongo22.example.com:20022, version is zero >> Thu Aug 30 17:18:34 [conn503] resetting shard version of >> example_production.people on mongo22.example.com:20022, version is zero >> ... >> >> Thu Aug 30 17:18:41 [Balancer] ns: example_production.people going to >> move { _id: "example_production.people-_id_MinKey", lastmod: Timestamp >> 1000|0, lastmodEpoch: ObjectId('503e9d5ef940d75c2de07f8e'), ns: >> "example_production.people", min: { _id: MinKey }, max: { _id: 304836 }, >> shard: "s1" } from: s1 to: s2 tag [] >> Thu Aug 30 17:18:41 [Balancer] moving chunk ns: example_production.people >> moving ( ns:example_production.people at: s1:s1/mongo11.example.com:20011 >> ,mongo12.example.com:20012 lastmod: 1|0||000000000000000000000000 min: { >> _id: MinKey } max: { _id: 304836 }) s1:s1/mongo11.example.com:20011, >> mongo12.example.com:20012 -> s2:s2/mongo21.example.com:20021, >> mongo22.example.com:20022 >> ... >> Thu Aug 30 17:18:44 [Balancer] moveChunk result: { cause: { errmsg: >> "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to >> engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 >> } >> Thu Aug 30 17:18:44 [Balancer] balancer move failed: { cause: { errmsg: >> "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to >> engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 >> } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey } >> >> mongos> sh.status() >> --- Sharding Status --- >> sharding version: { "_id" : 1, "version" : 3 } >> shards: >> { "_id" : "s1", "host" : "s1/mongo11.example.com:20011, >> mongo12.example.com:20012" } >> { "_id" : "s2", "host" : "s2/mongo21.example.com:20021, >> mongo22.example.com:20022" } >> databases: >> { "_id" : "admin", "partitioned" : false, "primary" : "config" } >> { "_id" : "example_production", "partitioned" : true, "primary" : "s1" >> } >> example_production.people chunks: >> s1 3079 >> too many chunks to print, use verbose if you want to force print >> >> these locks appeared again(in think this is like it should be): >> >> mongos> db.locks.find(); >> { "_id" : "example_production.people", "process" : >> "mongo11:20011:1346339905:604514035", "state" : 2, "ts" : >> ObjectId("503f85a8b61cd58923b1fcea"), "when" : >> ISODate("2012-08-30T15:24:24.524Z"), "who" : >> "mongo11:20011:1346339905:604514035:conn46:460093839", "why" : "migrate-{ >> _id: MinKey }" } >> { "_id" : "balancer", "process" : "worker:27069:1346339673:1804289383", >> "state" : 2, "ts" : ObjectId("503f85a740d7cd22f64415dc"), "when" : >> ISODate("2012-08-30T15:24:23.583Z"), "who" : >> "worker:27069:1346339673:1804289383:Balancer:846930886", "why" : "doing >> balance round" } >> >> i really appreciate your >> help, i'm pretty stuck with my current know how. >> Dani >> >> Am Donnerstag, 30. August 2012 16:36:23 UTC+2 schrieb Adam C: >>> >>> Well, let's clear up the old entries first, then we can bounce the >>> mongos and see if that helps things. It may help if you restart the >>> mongod's too - clear out anything stale on their side. Each primary can >>> only take part in a single migration at a time, so if they believe they are >>> actually doing a migration that may be part of the problem too. >>> >>> To remove the stale entries: >>> >>> use config; >>> db.locks.remove({"_id" : "admin-movePrimary"}); >>> db.locks.remove({"_id" : "example_production-movePrimary"}); >>> db.locks.remove({"_id" : "example_production_vanity-movePrimary"}); >>> >>> Then bounce the mongos, see if the migrations kick off. If not, bounce >>> the mongod's also to clear out anything stale there. >>> >>> Adam >>> >>> On Thursday, August 30, 2012 1:17:15 PM UTC+1, Daniel Schlegel wrote: >>>> >>>> Yes i did drain a shard because i had a problem removing a replica: >>>> 1. I was not able to remove a replica but had to move the server. >>>> 2. So ive added a new shard, drained the old one and remove the old >>>> shard. >>>> >>>> Heres the discussion about it: >>>> >>>> https://groups.google.com/forum/?fromgroups=#!searchin/mongodb-user/remove$20old$20replica/mongodb-user/Goi5cIqC6hQ/XmQF9NODUfMJ >>>> >>>> mongos> db.databases.find(); >>>> { "_id" : "admin", "partitioned" : false, "primary" : "config" } >>>> { "_id" : "silp_production", "partitioned" : true, "primary" : "s1" } >>>> >>>> Thank you so much >>>> Dani >>>> >>>> Am Donnerstag, 30. August 2012 13:48:51 UTC+2 schrieb Adam C: >>>>> >>>>> You have several movePrimary locks dating back to last month, >>>>> including the admin database, which should *always* be on "config" - did >>>>> you drain a shard, or use movePrimary for some other reason? >>>>> >>>>> Can you also post the output of: >>>>> >>>>> use config; >>>>> db.databases.find(); >>>>> >>>>> Adam. >>>>> >>>>> On Thursday, August 30, 2012 12:37:57 PM UTC+1, Daniel Schlegel wrote: >>>>>> >>>>>> example_production.people is the important collection. >>>>>> >>>>>> Am Donnerstag, 30. August 2012 13:37:02 UTC+2 schrieb Daniel Schlegel: >>>>>>> >>>>>>> Hi Adam >>>>>>> Here's the output: >>>>>>> >>>>>>> mongos> db.locks.find(); >>>>>>> >>>>>>> { "_id" : "admin-movePrimary", "process" : >>>>>>> "Web1:27069:1343571761:1804289383", "state" : 0, "ts" : >>>>>>> ObjectId("501555d74439248d85dc8867"), "when" : >>>>>>> ISODate("2012-07-29T15:25:11.099Z"), "who" : >>>>>>> "Web1:27069:1343571761:1804289383:conn122:1714636915", "why" : "Moving >>>>>>> primary shard of admin" } >>>>>>> >>>>>>> { "_id" : "example_production-movePrimary", "process" : >>>>>>> "Web1:27069:1343571761:1804289383", "state" : 0, "ts" : >>>>>>> ObjectId("501553614439248d85dc885a"), "when" : >>>>>>> ISODate("2012-07-29T15:14:41.616Z"), "who" : >>>>>>> "Web1:27069:1343571761:1804289383:conn1:1681692777", "why" : "Moving >>>>>>> primary shard of example_production" } >>>>>>> >>>>>>> { "_id" : "example_production_vanity-movePrimary", "process" : >>>>>>> "Web1:27069:1343571761:1804289383", "state" : 0, "ts" : >>>>>>> ObjectId("501552fb4439248d85dc8855"), "when" : >>>>>>> ISODate("2012-07-29T15:12:59.598Z"), "who" : >>>>>>> "Web1:27069:1343571761:1804289383:conn1:1681692777", "why" : "Moving >>>>>>> primary shard of example_production_vanity" } >>>>>>> >>>>>>> { "_id" : "balancer", "process" : "web1:27069:1346283357:314909341", >>>>>>> "state" : 2, "ts" : ObjectId("503f4f2a3c113ffbd8e4a7e9"), "when" : >>>>>>> ISODate("2012-08-30T11:31:54.320Z"), "who" : >>>>>>> "web1:27069:1346283357:314909341:Balancer:1842493053", "why" : "doing >>>>>>> balance round" } >>>>>>> >>>>>>> { "_id" : "example_production.people", "process" : >>>>>>> "mongo11:20011:1346282264:758785138", "state" : 0, "ts" : >>>>>>> ObjectId("503f4f2ce69a6c2009e22331"), "when" : >>>>>>> ISODate("2012-08-30T11:31:56.182Z"), "who" : >>>>>>> "mongo11:20011:1346282264:758785138:conn37208:1670912857", "why" : >>>>>>> "migrate-{ _id: MinKey }" } >>>>>>> >>>>>>> { "_id" : "example_production.new_coll", "process" : >>>>>>> "web1:27069:1346283357:314909341", "state" : 0, "ts" : >>>>>>> ObjectId("503f36eb3c113ffbd8e4a6a1"), "when" : >>>>>>> ISODate("2012-08-30T09:48:27.208Z"), "who" : >>>>>>> "web1:27069:1346283357:314909341:conn37665:149759223", "why" : "drop" } >>>>>>> >>>>>>> { "_id" : "example_production_vanity.metrics", "process" : >>>>>>> "web1:27069:1346283357:314909341", "state" : 0, "ts" : >>>>>>> ObjectId("503f37833c113ffbd8e4a6aa"), "when" : >>>>>>> ISODate("2012-08-30T09:50:59.474Z"), "who" : >>>>>>> "web1:27069:1346283357:314909341:conn37665:149759223", "why" : "drop" } >>>>>>> >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> Dani >>>>>>> >>>>>>> Am Donnerstag, 30. August 2012 12:32:06 UTC+2 schrieb Adam C: >>>>>>>> >>>>>>>> Dani, >>>>>>>> >>>>>>>> You might have a stale lock lying around, or the mongos may have a >>>>>>>> stale view of things - can you do a couple of things for me: >>>>>>>> >>>>>>>> 1. Bounce (restart) all of your mongos >>>>>>>> 2. Once the bounce is complete, log into the mongos and run: >>>>>>>> >>>>>>>> use config; >>>>>>>> db.locks.find(); >>>>>>>> >>>>>>>> And post the results here. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Adam >>>>>>>> >>>>>>>> On Thursday, August 30, 2012 11:20:08 AM UTC+1, Daniel Schlegel >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hello >>>>>>>>> i've upgraded to mongo 2.2.0 yesterday. all went like charme and i >>>>>>>>> could shard my collection. >>>>>>>>> In the documentation i read: >>>>>>>>> Both splits and migrates are performed automatically. >>>>>>>>> >>>>>>>>> But unfortunately this doesn't work on our setup. >>>>>>>>> I get error messages like these: >>>>>>>>> >>>>>>>>> Thu Aug 30 12:16:42 [Balancer] ns: production.people going to >>>>>>>>> move { _id: "production.people-_id_MinKey", lastmod: Timestamp 1000|0, >>>>>>>>> lastmodEpoch: ObjectId('503e9d5ef940d75c2de07f8e'), ns: >>>>>>>>> "production.people", min: { _id: MinKey }, max: { _id: 304836 }, shard: >>>>>>>>> "s1" } from: s1 to: s2 tag [] >>>>>>>>> >>>>>>>>> Thu Aug 30 12:16:42 [Balancer] moving chunk ns: production.people >>>>>>>>> moving ( ns:production.people at: s1:s1/mongo11.example.com:20011, >>>>>>>>> mongo12.example.com:20012 lastmod: 1|0||000000000000000000000000 >>>>>>>>> min: { _id: MinKey } max: { _id: 304836 }) s1:s1/ >>>>>>>>> mongo11.example.com:20011,mongo12.example.com:20012 -> s2:s2/ >>>>>>>>> mongo21.example.com:20021,mongo22.example.com:20022 >>>>>>>>> >>>>>>>>> Thu Aug 30 12:16:43 [Balancer] moveChunk result: { cause: { >>>>>>>>> errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed >>>>>>>>> to engage TO-shard in the data transfer: migrate already in progress", ok: >>>>>>>>> 0.0 } >>>>>>>>> >>>>>>>>> Thu Aug 30 12:16:43 [Balancer] balancer move failed: { cause: { >>>>>>>>> errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed >>>>>>>>> to engage TO-shard in the data transfer: migrate already in progress", ok: >>>>>>>>> 0.0 } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey } >>>>>>>>> >>>>>>>>> After 10h of waiting there is still all data on the first shard >>>>>>>>> and the second is empty. >>>>>>>>> I think about manually splitting but as it's not so easy to find >>>>>>>>> the right splitpoint in our database i've not done this yet. >>>>>>>>> >>>>>>>>> Thanks for your Help >>>>>>>>> Dani >>>>>>>>> >>>>>>>> ------=_Part_758_22458677.1346342691519 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable Im pretty sure i restarted all the mongod's in our system. I made it again = for the s2 mongod's.
Not sure if it was exactly on point of restarting = the mongod's but the exception seam to have disappeared. 
Is= it possible, that i had to bounce first the primary of s1 and then the pri= mary of s2? I did it in reverse order.

Here are th= e outputs:
s2:PRIMARY> rs.status()
{
= =09"set" : = "s2",
= =09"date" : ISODate("2012-08-30T15:52:40Z"),
=09"myState" : 1,
=09"m= embers" : [
=09=09{
=09=09=09"_id" : 0,
=09=09=09"name" : "mongo21.example.com= :20021",
=09=09=09"health" : 1,
=09=09=09"state" : 2,
=09=09=09"stateStr" : = "SECONDARY",
=09=09=09"uptime" : 57,
=09=09=09"optime" : Timestamp(1346341931= 000, 3),
=09=09=09"optimeDate" : ISODate("2012-08-30T15:52:11Z"),
= =09=09=09"l= astHeartbeat" : ISODate("2012-08-30T15:52:40Z"),
=09=09=09"pingMs" : 0,
=09=09=09"errmsg" : "syncing to: mongo22.example.com:20022"
=09=09},
=09=09{
=
=09=09=09"_id" : 1,
=09=09=09"name" : "arbiter23.example.com:20023",
=09=09=09"heal= th" : 1,
=09=09=09"state" : 7,
=09=09=09"stateStr" : "ARBITER",
=09=09=09"upti= me" : 57,
=09=09=09"lastHeartbeat" : ISODate("2012-08-30T15:52:40Z"),
<= div>=09=09=09"pingMs" : 0
=09=09},
=09=09{
=09=09=09"_id" : 2,
=09=09=09"name" : "mongo22.= example.com:20022",
=09=09=09"health" : 1,
=09=09=09"state" : 1,
=09=09=09"sta= teStr" : "PRIMARY",
=09=09=09"uptime" : 57,
=09=09=09"optime" : Timestamp(134= 6341931000, 3),
=09=09=09"optimeDate" : ISODate("2012-08-30T15:52:11Z"),
=09=09=09"self" : true
=09=09}
=09],
=09"ok" : 1
}

<= div>
s2:PRIMARY> db.currentOp()
{
=09"inprog" : [=
=09=09<= /span>{
= =09=09=09"opid" : 748,
=09=09=09"active" : true,
=09=09=09"secs_runnin= g" : 2,
= =09=09=09"op" : "getmore",
=09=09=09"ns" : "local.oplog.rs",
=09=09=09= "query" : {
=09=09=09=09
=09=09=09},
=09=09=09"client" : "6.23.38.199:46472",<= /div>
=09=09= =09"desc" : "conn35",
=09=09=09"threadId" : "0x7fad895e7700",
=09=09=09"connectionId" : 35,
=09=09=09"waitingForLock" : false,
=09=09=09"numYields"= : 0,
= =09=09=09"lockStats" : {
=09=09=09=09"timeLockedMicros" : {
=09=09=09=09=09"r" : NumberLong(97),
=09=09=09=09=09"w" : NumberLong(0)
=09=09=09=09}= ,
=09=09= =09=09"timeAcquiringMicros" : {
=09=09=09=09=09"r" : NumberLong(11),<= /div>
=09=09= =09=09=09"w" : NumberLong(0)
=09=09=09=09}
=09=09=09}
=09=09},
=09=09{
=09=09=09= "opid" : 24,
=09=09=09"active" : false,
=09=09=09"op" : "none",
<= div>=09=09=09"ns" : "",
=09=09=09"query" : {
=09=09=09=09
=09=09=09},
=09=09=09"desc" : "r= epl writer worker 1",
=09=09=09"threadId" : "0x7fddb837b700",
=09=09=09"waiti= ngForLock" : false,
=09=09=09"numYields" : 0,
=09=09=09"lockStats" : {
<= div>=09=09=09=09"timeLockedMicros" : {
=09=09=09=09=09"r" : NumberLong(0),
<= span class=3D"Apple-tab-span" style=3D"white-space:pre">=09=09=09=09=09"w" : NumberLong(33)
=09=09=09=09},
=09=09=09=09"timeAcquiringMicros" : {<= /div>
=09=09= =09=09=09
=09=09=09=09}
=09=09=09}
=09=09}
=09]
}
The really good looking part of it is the sharding status :)
mongos> sh.status()
--- Sharding Status --- = ;
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
=09{  "_id" : "s1",  "host" : "s1/mongo11.example.= com:20011,mongo12.example.com:20012" }
=09{  "_id" : "s2",  "host"= : "s2/mongo21.example.com:20021,mongo22.example.com:20022" }
&nb= sp; databases:
=09{  "_id" : "admin",  "partitioned" : false, &nbs= p;"primary" : "config" }
=09{  "_id" : "example_production",  "par= titioned" : true,  "primary" : "s1" }
=09=09example_production.people c= hunks:
= =09=09=09=09s2=091
=09=09=09=09s1=093078
=09=09=09too many chunks to print, use verbose if y= ou want to force print

it's about 10min ago = and there didn't appear another chunk on s2 but it seams really better.

Thank you very much for your help. I will get back if= it's not going on with sharding.

Dani
<= br>

Am Donnerstag, 30. August 2012 17:43:50 UTC+2 schrieb Adam C:<= blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;bord= er-left: 1px #ccc solid;padding-left: 1ex;">What it is saying is that the s= hard it is attempting to migrate to, namely the primary from:


= Thinks it is already taking part in a migration (as I mentioned, it can onl= y take part in one at a time, hence the error).  Are you sure those mo= ngod instances were restarted?

Can you show me rs.= status() from that s2 replica set, and a db.currentOp() from the primary?

In addition - are there any migration messages (or = other errors) visible in the logs for the primary for that set?
<= br>
Adam

On Thursday, August 30, 2012 4:26:36 PM UTC+1, Da= niel Schlegel wrote:
Hi Adam
i r= emoved all the entries and restartet both, mongos and mongod.
The= Problem is still here:

Thu Aug 30 17:18:34 [= conn502] resetting shard version of example_production.people on mongo22.example.com:20= 022, version is zero
Thu Aug 30 17:18:34 [conn503] resetting = shard version of example_production.people on mongo22.example.com:20022, version is= zero
...

Thu Aug 30 17:18:41 [Balancer]=  ns: example_production.people going to move { _id: "example_producti= on.people-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: Object= Id('503e9d5ef940d75c2de07f8e'), ns: "example_production.people", min: = { _id: MinKey }, max: { _id: 304836 }, shard: "s1" } from: s1 to: s2 tag []=
Thu Aug 30 17:18:41 [Balancer] moving chunk ns: example_producti= on.people moving ( ns:example_production.people at: s1:s1/mongo11.example.com:2001= 1,mongo1= 2.example.com:20012 lastmod: 1|0||000000000000000000000000 min: { = _id: MinKey } max: { _id: 304836 }) s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 -> s2:s2/mongo21.example.com:20021,mongo22.example.com:20022
...
Thu Aug 30 17:18:44 [Balancer] moveChunk result: { caus= e: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk f= ailed to engage TO-shard in the data transfer: migrate already in progress"= , ok: 0.0 }
Thu Aug 30 17:18:44 [Balancer] balancer move failed: = { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveC= hunk failed to engage TO-shard in the data transfer: migrate already in pro= gress", ok: 0.0 } from: s1 to: s2 chunk:  min: { _id: MinKey } max: { = _id: MinKey }

mongos> sh.status()
--- Sharding Status --- 
  sharding version: { = "_id" : 1, "version" : 3 }
  shards:
=09{  "_id" : "s1",  "host" : "s1/mongo11.example= .com:20011,mongo12.example.com:20012" }
=09{  "_id" : "s2",  "host" : "s2/mongo21.example.com:20021,mongo= 22.example.com:20022" }
  databases:
=09{  "_id" : "admin",  "partitione= d" : false,  "primary" : "config" }
=09{  "_id" : "example_production",  "partitioned"= : true,  "primary" : "s1" }
=09=09example_production.people chunks:
=09=09=09=09s1=093079
=09=09=09too many = chunks to print, use verbose if you want to force print
these locks appeared again(in think this is like it should be):=

mongos> db.locks.find();
{ "_id= " : "example_production.people", "process" : "mongo11:20011:1346339905:604514035", "state" : 2, "ts" : ObjectId("503f85a8b61cd58923b1fcea"),= "when" : ISODate("2012-08-30T15:24:24.524Z"), "who" : "mongo11:20011:= 1346339905:604514035:conn46:460093839", "why" : "migrate-{ _id: MinKey= }" }
{ "_id" : "balancer", "process" : "worker:27069:1346339673:= 1804289383", "state" : 2, "ts" : ObjectId("503f85a740d7cd22f64415= dc"), "when" : ISODate("2012-08-30T15:24:23.583Z"), "who" : "worker:27= 069:1346339673:1804289383:Balancer:846930886", "why" : "doing bal= ance round" }

i really appreciate you= r help, i'm pretty stuck with my current know how.
Dani

Am Donnerstag, 30. August 2012 16:36:23 UTC+2 schrieb Adam C:Well, let's clear up the old entries first, = then we can bounce the mongos and see if that helps things.  It may he= lp if you restart the mongod's too - clear out anything stale on their side= .  Each primary can only take part in a single migration at a time, so= if they believe they are actually doing a migration that may be part of th= e problem too.

To remove the stale entries:

use c= onfig;
db.locks.remove({"_id" : "admin-movePrimary"});
db.locks.r= emove({"_id" : "exampl= e_production-mo= vePrimary"});
db.locks.remove({"_id" : "example_production_vanit= y-movePrim= ary"});

Then bounce the mongos, see i= f the migrations kick off.  If not, bounce the mongod's also to clear = out anything stale there.

Adam

On Th= ursday, August 30, 2012 1:17:15 PM UTC+1, Daniel Schlegel wrote:Yes i did drain a shard because i had a proble= m removing a replica:
1. I was not able to remove a replica but had to = move the server.
2. So ive added a new shard, drained the old one= and remove the old shard.

Heres the discussion ab= out it:
https://groups.google.com/forum/?fromgrou= ps=3D#!searchin/mongodb-user/remove$20old$20replica/mongodb-user/= Goi5cIqC6hQ/XmQF9NODUfMJ

mongos> db= .databases.find();
{ "_id" : "admin", "partitioned" : false, "pri= mary" : "config" }
{ "_id" : "silp_production", "partitioned" : t= rue, "primary" : "s1" }

Thank you so much
Dani

Am Donnerstag, 30. August 2012 13:48:51 UTC+2 sc= hrieb Adam C:
You have several moveP= rimary locks dating back to last month, including the admin database, which= should *always* be on "config" - did you drain a shard, or use movePrimary= for some other reason?

Can you also post the output of:=

use config;
db.databases.find();
Adam.

On Thursday, August 30, 2012 12:37:57 PM UTC+1, Daniel Schleg= el wrote:
example_production.people is the important colle= ction.

Am Donnerstag, 30. August 2012 13:37:02 UTC+2 schrieb = Daniel Schlegel:
Hi Adam
<= div>Here's the output:

mongos> db.l= ocks.find();

{ "_id" : "admin-movePrimary", "process" : "Web1:27069:13435717= 61:1804289383", "state" : 0, "ts" : ObjectId("501555d74439248d85d= c8867"), "when" : ISODate("2012-07-29T15:25:11.099Z"), "who" : "Web1:2= 7069:1343571761:1804289383:conn122:1714636915", "why" : "Moving p= rimary shard of admin" }

{ "_id" : "example_production-movePrimary", "process" : "W= eb1:27069:1343571761:1804289383", "state" : 0, "ts" : ObjectId("5= 01553614439248d85dc885a"), "when" : ISODate("2012-07-29T15:14:41.616Z"= ), "who" : "Web1:27069:1343571761:1804289383:conn1:1681692777", "why" = : "Moving primary shard of example_production" }

{ "_id" : "example_production_vanity-movePrimary", "proces= s" : "Web1:27069:1343571761:1804289383", "state" : 0, "ts" : ObjectId(= "501552fb4439248d85dc8855"), "when" : ISODate("2012-07-29T15:12:59.598Z"), "who" : "Web1:27069:1343571761:1804289383:conn1:1681692777",= "why" : "Moving primary shard of example_production_vanity" }

{ "_id" : "balancer", "process" : "web1:27069:1346283357:3= 14909341", "state" : 2, "ts" : ObjectId("503f4f2a3c113ffbd8e4a7e9"), "= when" : ISODate("2012-08-30T11:31:54.320Z"), "who" : "web1:27069:13462= 83357:314909341:Balancer:1842493053", "why" : "doing balance roun= d" }

{ "_id" : "example_production.people", "process" : "mongo11:200= 11:1346282264:758785138", "state" : 0, "ts" : ObjectId("503f4f2ce= 69a6c2009e22331"), "when" : ISODate("2012-08-30T11:31:56.182Z"), "who"= : "mongo11:20011:1346282264:758785138:conn37208:1670912857", "wh= y" : "migrate-{ _id: MinKey }" }

{ "_id" : "example_production.new_coll", "process" : "web1:2706= 9:1346283357:314909341", "state" : 0, "ts" : ObjectId("503f36eb3c= 113ffbd8e4a6a1"), "when" : ISODate("2012-08-30T09:48:27.208Z"), "who" = : "web1:27069:1346283357:314909341:conn37665:149759223", "why" : = "drop" }

{ "_id" : "example_production_vanity.metrics", "process" := "web1:27069:1346283357:314909341", "state" : 0, "ts" : ObjectId("503f37833c113ffbd8e4a6aa"), "when" : ISODate("2012-08-30T09:50:59.474= Z"), "who" : "web1:27069:1346283357:314909341:conn37665:149759223", "why" : "drop" }


Thanks!

Dani


Am Donnerstag, 30. August 2012 12:32:06 UTC+2 schrieb= Adam C:
Dani,

Yo= u might have a stale lock lying around, or the mongos may have a stale view= of things - can you do a couple of things for me:

1. Bounce (restart) all of your mongos
2. Once the bounce is complete, = log into the mongos and run:

use config;
db.locks.find();

And post the results here.
=

Thanks,

Adam

On Thursday, August 3= 0, 2012 11:20:08 AM UTC+1, Daniel Schlegel wrote:
Hello
i've upgrade= d to mongo 2.2.0 yesterday. all went like charme and i could shard my colle= ction.
In the documentation i read:=
Both splits and migrates are performed automatic= ally.

But unfortunately this doesn't work on our setup. 
<= div>I get error messages like these:

Thu Aug 30 12:16:42 [Balancer]=   ns: production.people going to move { _id: "production.people-_id_Mi= nKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('503e9d5= ef940d75c2de07f8e'), ns: "production.people", min: { _id: MinKey }, max: { = _id: 304836 }, shard: "s1" } from: s1 to: s2 tag []

Thu Aug 30 12:16:42 [Balancer] moving chunk ns: production.peop= le moving ( ns:production.people at: s1:s1/mongo11.example.com:20011,mongo12.example.com= :20012 lastmod: 1|0||000000000000000000000000 min: { _id: MinKey }= max: { _id: 304836 }) s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 = -> s2:s2/= mongo21.example.com:20021,mongo22.example.com:20022

Thu Aug 30 12:16:43 [Balancer] moveChunk result: { cause: { err= msg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to= engage TO-shard in the data transfer: migrate already in progress", ok: 0.= 0 }

Thu Aug 30 12:16:43 [Balancer] balancer move failed: { cause: {= errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk faile= d to engage TO-shard in the data transfer: migrate already in progress", ok= : 0.0 } from: s1 to: s2 chunk:  min: { _id: MinKey } max: { _id: MinKe= y }


After 10h of waiting there is still all da= ta on the first shard and the second is empty.
I think about manually spl= itting but as it's not so easy to find the right splitpoint in our database= i've not done this yet.

Thanks for your Help
Dani
<= /div> ------=_Part_758_22458677.1346342691519-- ------=_Part_757_8652119.1346342691513--