Date: Thu, 30 Aug 2012 04:48:51 -0700 (PDT)
From: Adam C
To: mongodb-user@googlegroups.com
In-Reply-To: <0407f0eb-ce1e-4a5c-abd0-d531e04c925a@googlegroups.com>
References: <55bcdddf-9c03-44c7-9b0c-25468f3523d4@googlegroups.com> <0407f0eb-ce1e-4a5c-abd0-d531e04c925a@googlegroups.com>
Subject: Re: Balancing does not work

You have several movePrimary locks dating back to last month, including one for the admin database, which should *always* be on "config" - did you drain a shard, or use movePrimary for some other reason?

Can you also post the output of:

use config;
db.databases.find();

Adam.

On Thursday, August 30, 2012 12:37:57 PM UTC+1, Daniel Schlegel wrote:
>
> example_production.people is the important collection.
>
> On Thursday, 30 August 2012 13:37:02 UTC+2, Daniel Schlegel wrote:
>>
>> Hi Adam
>> Here's the output:
>>
>> mongos> db.locks.find();
>>
>> { "_id" : "admin-movePrimary", "process" : "Web1:27069:1343571761:1804289383", "state" : 0,
>>   "ts" : ObjectId("501555d74439248d85dc8867"), "when" : ISODate("2012-07-29T15:25:11.099Z"),
>>   "who" : "Web1:27069:1343571761:1804289383:conn122:1714636915",
>>   "why" : "Moving primary shard of admin" }
>>
>> { "_id" : "example_production-movePrimary", "process" : "Web1:27069:1343571761:1804289383", "state" : 0,
>>   "ts" : ObjectId("501553614439248d85dc885a"), "when" : ISODate("2012-07-29T15:14:41.616Z"),
>>   "who" : "Web1:27069:1343571761:1804289383:conn1:1681692777",
>>   "why" : "Moving primary shard of example_production" }
>>
>> { "_id" : "example_production_vanity-movePrimary", "process" : "Web1:27069:1343571761:1804289383", "state" : 0,
>>   "ts" : ObjectId("501552fb4439248d85dc8855"), "when" : ISODate("2012-07-29T15:12:59.598Z"),
>>   "who" : "Web1:27069:1343571761:1804289383:conn1:1681692777",
>>   "why" : "Moving primary shard of example_production_vanity" }
>>
>> { "_id" : "balancer", "process" : "web1:27069:1346283357:314909341", "state" : 2,
>>   "ts" : ObjectId("503f4f2a3c113ffbd8e4a7e9"), "when" : ISODate("2012-08-30T11:31:54.320Z"),
>>   "who" : "web1:27069:1346283357:314909341:Balancer:1842493053",
>>   "why" : "doing balance round" }
>>
>> { "_id" : "example_production.people", "process" : "mongo11:20011:1346282264:758785138", "state" : 0,
>>   "ts" : ObjectId("503f4f2ce69a6c2009e22331"), "when" : ISODate("2012-08-30T11:31:56.182Z"),
>>   "who" : "mongo11:20011:1346282264:758785138:conn37208:1670912857",
>>   "why" : "migrate-{ _id: MinKey }" }
>>
>> { "_id" : "example_production.new_coll", "process" : "web1:27069:1346283357:314909341", "state" : 0,
>>   "ts" : ObjectId("503f36eb3c113ffbd8e4a6a1"), "when" : ISODate("2012-08-30T09:48:27.208Z"),
>>   "who" : "web1:27069:1346283357:314909341:conn37665:149759223",
>>   "why" : "drop" }
>>
>> { "_id" : "example_production_vanity.metrics", "process" : "web1:27069:1346283357:314909341", "state" : 0,
>>   "ts" : ObjectId("503f37833c113ffbd8e4a6aa"), "when" : ISODate("2012-08-30T09:50:59.474Z"),
>>   "who" : "web1:27069:1346283357:314909341:conn37665:149759223",
>>   "why" : "drop" }
>>
>> Thanks!
>>
>> Dani
>>
>> On Thursday, 30 August 2012 12:32:06 UTC+2, Adam C wrote:
>>>
>>> Dani,
>>>
>>> You might have a stale lock lying around, or the mongos may have a stale
>>> view of things - can you do a couple of things for me:
>>>
>>> 1. Bounce (restart) all of your mongos
>>> 2. Once the bounce is complete, log into the mongos and run:
>>>
>>> use config;
>>> db.locks.find();
>>>
>>> And post the results here.
>>>
>>> Thanks,
>>>
>>> Adam
>>>
>>> On Thursday, August 30, 2012 11:20:08 AM UTC+1, Daniel Schlegel wrote:
>>>>
>>>> Hello
>>>> I upgraded to Mongo 2.2.0 yesterday. Everything went like a charm and I
>>>> could shard my collection.
>>>> In the documentation I read:
>>>> "Both splits and migrates are performed automatically."
>>>>
>>>> But unfortunately this doesn't work on our setup.
>>>> I get error messages like these:
>>>>
>>>> Thu Aug 30 12:16:42 [Balancer] ns: production.people going to move { _id: "production.people-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('503e9d5ef940d75c2de07f8e'), ns: "production.people", min: { _id: MinKey }, max: { _id: 304836 }, shard: "s1" } from: s1 to: s2 tag []
>>>>
>>>> Thu Aug 30 12:16:42 [Balancer] moving chunk ns: production.people moving ( ns:production.people at: s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 lastmod: 1|0||000000000000000000000000 min: { _id: MinKey } max: { _id: 304836 }) s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 -> s2:s2/mongo21.example.com:20021,mongo22.example.com:20022
>>>>
>>>> Thu Aug 30 12:16:43 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
>>>>
>>>> Thu Aug 30 12:16:43 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey }
>>>>
>>>> After 10 hours of waiting, all the data is still on the first shard and
>>>> the second is empty.
>>>> I'm thinking about splitting manually, but since it's not easy to find
>>>> the right split point in our database, I haven't done this yet.
>>>>
>>>> Thanks for your help
>>>> Dani
>>>>
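The check discussed in this thread, spotting entries in config.locks that date back much further than the rest, can be sketched in plain JavaScript. This is only an illustration under assumptions: `findOldLocks` is a hypothetical helper (not part of the mongo shell or any driver), the input is an array of documents shaped like the `db.locks.find()` output quoted in the thread, and `when` is assumed to be a JavaScript Date. The sample data is a subset of the posted locks, and the one-day threshold is arbitrary.

```javascript
// Hypothetical helper: returns the locks whose `when` timestamp is older
// than maxAgeMs relative to `now`, with their age in whole days.
function findOldLocks(locks, now, maxAgeMs) {
  return locks
    .filter(function (lock) {
      return now.getTime() - lock.when.getTime() > maxAgeMs;
    })
    .map(function (lock) {
      var ageMs = now.getTime() - lock.when.getTime();
      return {
        id: lock._id,
        state: lock.state,
        ageDays: Math.floor(ageMs / (24 * 60 * 60 * 1000))
      };
    });
}

// Subset of the lock documents posted in the thread (only the fields used here).
var sampleLocks = [
  { _id: "admin-movePrimary", state: 0, when: new Date("2012-07-29T15:25:11.099Z") },
  { _id: "balancer", state: 2, when: new Date("2012-08-30T11:31:54.320Z") },
  { _id: "example_production.people", state: 0, when: new Date("2012-08-30T11:31:56.182Z") }
];

// Reference time: the Date header of the reply, converted to UTC.
var now = new Date("2012-08-30T11:48:51Z");
var oldLocks = findOldLocks(sampleLocks, now, 24 * 60 * 60 * 1000);
console.log(JSON.stringify(oldLocks));
```

With this sample only the admin-movePrimary entry is reported. In the mongo shell the input could presumably come from `db.locks.find().toArray()` after `use config`; the helper only reports candidates and does not decide whether a lock is safe to remove.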