On Thursday, August 30, 2012 11:20:08 AM UTC+1, Daniel Schlegel wrote:
Hello,
I upgraded to MongoDB 2.2.0 yesterday. Everything went like a charm and I was able to shard my collection. In the documentation I read: "Both splits and migrates are performed automatically."
Unfortunately, this doesn't work on our setup. I get error messages like these:
Thu Aug 30 12:16:42 [Balancer] ns: production.people going to move { _id: "production.people-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('503e9d5ef940d75c2de07f8e'), ns: "production.people", min: { _id: MinKey }, max: { _id: 304836 }, shard: "s1" } from: s1 to: s2 tag []
Thu Aug 30 12:16:43 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Thu Aug 30 12:16:43 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey }
After 10 hours of waiting, all the data is still on the first shard and the second is empty. I am thinking about splitting manually, but since it's not easy to find the right split point in our database, I haven't done this yet.
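One way to pick split points without inspecting the data by hand is to take a sorted sample of shard-key values and cut it into equal slices. The sketch below is only an illustration: `pickSplitPoints` and the sample `ids` array are hypothetical, and it assumes the shard key is a numeric `_id`, as the chunk bounds in the log lines suggest.

```javascript
// Hypothetical helper: choose (parts - 1) split points that cut a sorted
// list of shard-key values into roughly equal slices.
function pickSplitPoints(sortedIds, parts) {
  var points = [];
  if (parts < 2 || sortedIds.length < parts) {
    return points; // nothing sensible to split
  }
  for (var i = 1; i < parts; i++) {
    var index = Math.floor((i * sortedIds.length) / parts);
    points.push(sortedIds[index]);
  }
  return points;
}

// Made-up sample of _id values, already sorted ascending.
var ids = [10, 20, 30, 40, 50, 60, 70, 80];
var splitPoints = pickSplitPoints(ids, 4);
```

In the mongo shell, connected to a mongos, each resulting value could then be passed to `sh.splitAt("production.people", { _id: value })`.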
On Thursday, August 30, 2012 12:32:06 UTC+2, Adam C wrote:
Dani,
You might have a stale lock lying around, or the mongos may have a stale view of things. Can you do a couple of things for me:
1. Bounce (restart) all of your mongos.
2. Once the bounce is complete, log into the mongos and run:
use config;
db.locks.find();
And post the results here.
Thanks,
Adam
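When the `db.locks.find()` output is long, it can help to narrow it down to locks that are still held but old. This is a minimal sketch, assuming each lock document carries a numeric `state` (0 meaning unlocked) and a `when` date, which is my understanding of how `config.locks` documents looked in this release. `findStaleLocks`, the sample documents and the cutoff date are illustrative only.

```javascript
// Hypothetical helper: return locks that are still marked as taken
// (state > 0) and were acquired before the cutoff date.
function findStaleLocks(locks, cutoff) {
  return locks.filter(function (lock) {
    return lock.state > 0 && lock.when < cutoff;
  });
}

// Made-up documents shaped like config.locks entries.
var sampleLocks = [
  { _id: "balancer", state: 0, when: new Date("2012-08-30T10:00:00Z") },
  { _id: "admin-movePrimary", state: 2, when: new Date("2012-07-15T09:00:00Z") },
  { _id: "example_production.people", state: 2, when: new Date("2012-08-30T10:16:00Z") }
];

// Treat anything held since before 29 Aug 2012 as suspicious.
var stale = findStaleLocks(sampleLocks, new Date("2012-08-29T00:00:00Z"));
```

In the shell, the input array could come from `db.getSiblingDB("config").locks.find().toArray()`.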
On Thursday, August 30, 2012 12:37:57 PM UTC+1, Daniel Schlegel wrote:
example_production.people is the important collection.
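On Thursday, August 30, 2012 13:48:51 UTC+2, Adam C wrote:
You have several movePrimary locks dating back to last month, including one for the admin database, whose primary should *always* be "config". Did you drain a shard, or use movePrimary for some other reason?
Can you also post the output of:
use config;
db.databases.find();
Adam.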
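The check on the admin database can also be written down as a small function over `config.databases` documents. This is a sketch, not an official tool: `findMisplacedAdmin` is a hypothetical name and the sample documents are made up, shaped like `{ _id, partitioned, primary }` entries.

```javascript
// Hypothetical helper: report an "admin" entry whose primary is not
// "config", which per the advice above should never happen.
function findMisplacedAdmin(databases) {
  return databases.filter(function (d) {
    return d._id === "admin" && d.primary !== "config";
  });
}

// Made-up examples of a healthy and a suspicious config.databases listing.
var healthy = [
  { _id: "admin", partitioned: false, primary: "config" },
  { _id: "example_production", partitioned: true, primary: "s1" }
];
var broken = [
  { _id: "admin", partitioned: false, primary: "s1" }
];
```

Calling `findMisplacedAdmin(db.getSiblingDB("config").databases.find().toArray())` from a mongos connection would be one way to feed it real data.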
On Thursday, August 30, 2012 1:17:15 PM UTC+1, Daniel Schlegel wrote:
Yes, I did drain a shard, because I had a problem removing a replica:
1. I was not able to remove a replica, but I had to move the server.
2. So I added a new shard, drained the old one, and removed the old shard.
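On Thursday, August 30, 2012 16:36:23 UTC+2, Adam C wrote:
Well, let's clear up the old entries first, then we can bounce the mongos and see if that helps things. It may help if you restart the mongods too, to clear out anything stale on their side. Each primary can only take part in a single migration at a time, so if they believe they are already doing a migration, that may be part of the problem too.
Then bounce the mongos and see if the migrations kick off. If not, bounce the mongods as well to clear out anything stale there.
Adam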
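To see at a glance which shard pair keeps refusing a migration, the balancer lines can be scanned for this error. A rough sketch: `parseFailedMove` is a hypothetical helper, and the regular expression only targets the `balancer move failed` line format quoted in this thread.

```javascript
// Hypothetical helper: extract the source and target shard from a
// "balancer move failed" log line, or return null for any other line.
function parseFailedMove(line) {
  if (line.indexOf("balancer move failed") === -1) {
    return null;
  }
  var match = /\} from: (\S+) to: (\S+) chunk:/.exec(line);
  if (!match) {
    return null;
  }
  return {
    from: match[1],
    to: match[2],
    alreadyMigrating: line.indexOf("migrate already in progress") !== -1
  };
}

// Log line taken from the original post.
var logLine = 'Thu Aug 30 12:16:43 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey }';
var failedMove = parseFailedMove(logLine);
```

If `alreadyMigrating` is true, the shard named in `to` is the one whose primary believes it is still busy with another migration.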
On Thursday, August 30, 2012 4:26:36 PM UTC+1, Daniel Schlegel wrote:
Hi Adam,
I removed all the entries and restarted both mongos and mongod. The problem is still here:
Thu Aug 30 17:18:34 [conn502] resetting shard version of example_production.people on mongo22.example.com:20022, version is zero
Thu Aug 30 17:18:34 [conn503] resetting shard version of example_production.people on mongo22.example.com:20022, version is zero
...
Thu Aug 30 17:18:41 [Balancer] ns: example_production.people going to move { _id: "example_production.people-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('503e9d5ef940d75c2de07f8e'), ns: "example_production.people", min: { _id: MinKey }, max: { _id: 304836 }, shard: "s1" } from: s1 to: s2 tag []
Thu Aug 30 17:18:41 [Balancer] moving chunk ns: example_production.people moving ( ns:example_production.people at: s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 lastmod: 1|0||000000000000000000000000 min: { _id: MinKey } max: { _id: 304836 }) s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 -> s2:s2/mongo21.example.com:20021,mongo22.example.com:20022
...
Thu Aug 30 17:18:44 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Thu Aug 30 17:18:44 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey }
mongos> sh.status()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "s1", "host" : "s1/mongo11.example.com:20011,mongo12.example.com:20012" }
{ "_id" : "s2", "host" : "s2/mongo21.example.com:20021,mongo22.example.com:20022" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "example_production", "partitioned" : true, "primary" : "s1" }
example_production.people chunks:
s1 3079
too many chunks to print, use verbose if you want to force print
These locks appeared again (I think this is how it should be):
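The imbalance can be quantified by counting chunk documents per shard. A minimal sketch, assuming input shaped like `config.chunks` documents with a `shard` field (as in the chunk document printed by the balancer); `countChunksByShard` and the sample array are illustrative only.

```javascript
// Hypothetical helper: count chunks per shard, including shards that
// currently own no chunks at all.
function countChunksByShard(chunks, shardNames) {
  var counts = {};
  shardNames.forEach(function (name) {
    counts[name] = 0;
  });
  chunks.forEach(function (chunk) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  });
  return counts;
}

// Made-up, much smaller stand-in for the real chunk list.
var sampleChunks = [
  { ns: "example_production.people", shard: "s1" },
  { ns: "example_production.people", shard: "s1" },
  { ns: "example_production.people", shard: "s1" }
];
var chunkCounts = countChunksByShard(sampleChunks, ["s1", "s2"]);
```

Against a live cluster, the array could come from `db.getSiblingDB("config").chunks.find({ ns: "example_production.people" }).toArray()`.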
So s2 (the TO shard) thinks it is already taking part in a migration (as I mentioned, a primary can only take part in one at a time, hence the error). Are you sure those mongod instances were restarted?
Can you show me rs.status() from the s2 replica set, and a db.currentOp() from its primary?
In addition, are there any migration messages (or other errors) visible in the logs for the primary of that set?
Adam
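For reference, the primary of a set can be picked out of the `rs.status()` document by looking at each member's `stateStr`. The sketch below uses a made-up status object; `findPrimary` is a hypothetical helper, the host names are borrowed from the log lines in this thread, and which member is actually primary there is not known.

```javascript
// Hypothetical helper: return the host name of the member reporting
// itself as PRIMARY, or null if the set currently has no primary.
function findPrimary(status) {
  var members = status.members || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === "PRIMARY") {
      return members[i].name;
    }
  }
  return null;
}

// Made-up status document shaped like rs.status() output.
var sampleStatus = {
  set: "s2",
  members: [
    { _id: 0, name: "mongo21.example.com:20021", stateStr: "SECONDARY" },
    { _id: 1, name: "mongo22.example.com:20022", stateStr: "PRIMARY" }
  ]
};
var primaryHost = findPrimary(sampleStatus);
```

Connecting to the returned host and running `db.currentOp()` there gives the view of the primary that was asked for.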
On Thursday, August 30, 2012 5:04:51 PM UTC+1, Daniel Schlegel wrote:
I'm pretty sure I restarted all the mongods in our system. I did it again for the s2 mongods. I'm not sure whether it happened exactly when I restarted the mongods, but the exception seems to have disappeared. Is it possible that I had to bounce the primary of s1 first and then the primary of s2? I did it in reverse order.
On Thursday, August 30, 2012 9:12:25 AM UTC-7, Adam C wrote:
The primaries are the ones responsible for the migrations, at the behest of the mongos. I would be surprised if the order mattered, but at least the chunks have started moving. Let us know if it breaks down or has issues again.
Adam
I've got a similar problem with my database. I had 2 shards and added a 3rd, and now the third one won't accept any data. The mongos logs show the same "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress" message that Daniel got. I've tried bouncing just the mongoses, then both the mongoses and the mongods, but I still get the same message. I also get
[Balancer] distributed lock 'balancer/dbs3a:27017:1346462580:1804289383' unlocked.
but I think that's just the balancer giving up its lock.
Any advice would be greatly appreciated.
On Thursday, August 30, 2012 9:12:25 AM UTC-7, Adam C wrote:
> The primaries are the ones responsible for the migrations, at the behest
> of the mongos. I would be surprised if the order mattered, but at least
> the chunks have started moving. Let us know if it breaks down or has
> issues again.
> Adam
> On Thursday, August 30, 2012 5:04:51 PM UTC+1, Daniel Schlegel wrote:
>> I'm pretty sure I restarted all the mongod's in our system. I did it
>> again for the s2 mongod's.
>> Not sure if it was exactly at the point of restarting the mongod's, but the
>> exception seems to have disappeared.
>> Is it possible that I had to bounce the primary of s1 first and then the
>> primary of s2? I did it in reverse order.
>>> Thinks it is already taking part in a migration (as I mentioned, it can
>>> only take part in one at a time, hence the error). Are you sure those
>>> mongod instances were restarted?
>>> Can you show me rs.status() from that s2 replica set, and a
>>> db.currentOp() from the primary?
>>> In addition - are there any migration messages (or other errors) visible
>>> in the logs for the primary for that set?
>>> Adam
>>> On Thursday, August 30, 2012 4:26:36 PM UTC+1, Daniel Schlegel wrote:
>>>> Hi Adam
>>>> I removed all the entries and restarted both mongos and mongod.
>>>> The problem is still here:
>>>> Thu Aug 30 17:18:34 [conn502] resetting shard version of
>>>> example_production.people on mongo22.example.com:20022, version is zero
>>>> Thu Aug 30 17:18:34 [conn503] resetting shard version of
>>>> example_production.people on mongo22.example.com:20022, version is zero
>>>> ...
>>>> Thu Aug 30 17:18:41 [Balancer] ns: example_production.people going to
>>>> move { _id: "example_production.people-_id_MinKey", lastmod: Timestamp
>>>> 1000|0, lastmodEpoch: ObjectId('503e9d5ef940d75c2de07f8e'), ns:
>>>> "example_production.people", min: { _id: MinKey }, max: { _id: 304836 },
>>>> shard: "s1" } from: s1 to: s2 tag []
>>>> Thu Aug 30 17:18:41 [Balancer] moving chunk ns:
>>>> example_production.people moving ( ns:example_production.people at: s1:s1/
>>>> mongo11.example.com:20011,mongo12.example.com:20012 lastmod:
>>>> 1|0||000000000000000000000000 min: { _id: MinKey } max: { _id: 304836 })
>>>> s1:s1/mongo11.example.com:20011,mongo12.example.com:20012 -> s2:s2/
>>>> mongo21.example.com:20021,mongo22.example.com:20022
>>>> ...
>>>> Thu Aug 30 17:18:44 [Balancer] moveChunk result: { cause: { errmsg:
>>>> "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to
>>>> engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
>>>> }
>>>> Thu Aug 30 17:18:44 [Balancer] balancer move failed: { cause: { errmsg:
>>>> "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to
>>>> engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
>>>> } from: s1 to: s2 chunk: min: { _id: MinKey } max: { _id: MinKey }
>>>> mongos> sh.status()
>>>> --- Sharding Status ---
>>>> sharding version: { "_id" : 1, "version" : 3 }
>>>> shards:
>>>> { "_id" : "s1", "host" : "s1/mongo11.example.com:20011,
>>>> mongo12.example.com:20012" }
>>>> { "_id" : "s2", "host" : "s2/mongo21.example.com:20021,
>>>> mongo22.example.com:20022" }
>>>> databases:
>>>> { "_id" : "admin", "partitioned" : false, "primary" : "config" }
>>>> { "_id" : "example_production", "partitioned" : true, "primary" :
>>>> "s1" }
>>>> example_production.people chunks:
>>>> s1 3079
>>>> too many chunks to print, use verbose if you want to force print
>>>> These locks appeared again (I think this is as it should be):
>>>> On Thursday, August 30, 2012 4:36:23 PM UTC+2, Adam C wrote:
>>>>> Well, let's clear up the old entries first, then we can bounce the
>>>>> mongos and see if that helps things. It may help if you restart the
>>>>> mongod's too - clear out anything stale on their side. Each primary can
>>>>> only take part in a single migration at a time, so if they believe they are
>>>>> actually doing a migration that may be part of the problem too.
>>>>> Then bounce the mongos, see if the migrations kick off. If not,
>>>>> bounce the mongod's also to clear out anything stale there.
>>>>> Adam
>>>>> On Thursday, August 30, 2012 1:17:15 PM UTC+1, Daniel Schlegel wrote:
>>>>>> Yes, I did drain a shard because I had a problem removing a replica:
>>>>>> 1. I was not able to remove a replica but had to move the server.
>>>>>> 2. So I added a new shard, drained the old one and removed the old
>>>>>> shard.
The upgrade sequence needed to ensure this does not happen is as follows:
1) turn off the balancer
2) upgrade all mongos instances
3) upgrade the config databases
4) upgrade each shard (using the procedure for replica sets or standalone, whichever applies)
5) turn the balancer back on
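The sequence above could be wrapped in a small helper and run from a mongos shell, roughly as sketched below. This is only an illustration: withBalancerPaused and the upgradeSteps callback are made-up names, the upgrade itself (swapping binaries, restarting processes) still happens outside the shell, and the only shell helpers assumed are sh.getBalancerState() and sh.setBalancerState(), which are passed in as the sh argument.

```javascript
// Sketch: pause the balancer, run the upgrade steps, then re-enable it.
// `sh` is the mongo shell sharding helper (or any object with the same two
// methods); `upgradeSteps` is a hypothetical callback standing in for the
// manual work of upgrading mongos, config servers and shards.
function withBalancerPaused(sh, upgradeSteps) {
    var wasEnabled = sh.getBalancerState();
    sh.setBalancerState(false);            // 1) turn off the balancer
    try {
        return upgradeSteps();             // 2) to 4) upgrade mongos, config, shards
    } finally {
        if (wasEnabled) {
            sh.setBalancerState(true);     // 5) turn the balancer back on
        }
    }
}
```

In the shell this would be called as withBalancerPaused(sh, function () { /* do the upgrades by hand, then return */ }). The finally block is there so the balancer is switched back on even if one of the steps throws, and it is left off if it was already off before the upgrade.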
On Saturday, September 1, 2012 1:46:33 PM UTC+1, Adam C wrote:
> Geoff, are you also using 2.2 or are you on the 2.0 branch?
> Can you post the output of the information I requested above on your
> system? In particular, from the mongos:
> use config;
> db.locks.find();
> Thanks,
> Adam
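When looking through the db.locks.find() output, the entries worth a closer look are the ones still marked as taken long after they were acquired (like the movePrimary locks dating back a month mentioned earlier in the thread). A rough sketch of that filter in plain JavaScript, so it can be pasted into the shell: findStaleLocks and maxAgeMs are made-up names, and the state (0 assumed to mean unlocked) and when fields are assumed from the usual layout of config.locks documents, so check them against your own output first.

```javascript
// Sketch: pick out lock documents that look stale.
// Assumed fields (from the usual config.locks layout): state, when.
function findStaleLocks(locks, now, maxAgeMs) {
    var stale = [];
    for (var i = 0; i < locks.length; i++) {
        var lock = locks[i];
        // Skip documents without a usable timestamp rather than guessing.
        if (!lock.when || typeof lock.when.getTime !== "function") {
            continue;
        }
        var taken = lock.state !== 0;                     // 0 is assumed to mean "unlocked"
        var ageMs = now.getTime() - lock.when.getTime();  // time since the lock was taken
        if (taken && ageMs > maxAgeMs) {
            stale.push(lock);
        }
    }
    return stale;
}
```

From a mongos shell this could be fed with findStaleLocks(db.getSiblingDB("config").locks.find().toArray(), new Date(), 24 * 60 * 60 * 1000). It only lists candidates; whether an entry is really safe to remove still needs a human decision.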
For anyone else running into similar issues here on 2.2.0, the root cause is very likely SERVER-7003 <https://jira.mongodb.org/browse/SERVER-7003>, which will be fixed in 2.2.1.
<code>
Sat Feb 9 04:26:19 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
User: MinKey }
Sat Feb 9 04:26:59 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
}
Sat Feb 9 04:27:01 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
User: MinKey }
Sat Feb 9 04:27:36 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
}
Sat Feb 9 04:27:38 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
UserID: MinKey }
Sat Feb 9 04:28:16 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
}
Sat Feb 9 04:28:22 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0
} from: replicaSet2 to: replicaSet1 chunk: min: { OutgoingMessagesID:
</code>
Also, we recently got DBException in process: socket exception [SEND_ERROR] for 127.0.0.1:41725 (bug https://jira.mongodb.org/browse/SERVER-7008),
which permanently breaks the mongos connection, with the cursor timing out on the application side (timeout: 30000, time left: 0:0, status: 0).
The fuller log shows that the socket exception [SEND_ERROR] for localhost occurs once ChunkManager autosplitting kicks in:
<code>Sat Feb 9 09:41:49 [conn7] authenticate db: que_db { authenticate: 1, user: "user", nonce: "xxxx", key: "xxx" }
Sat Feb 9 09:41:49 [conn7] resetting shard version of que_db.exp on m1.10001, version is zero
Sat Feb 9 11:01:38 [conn7] SyncClusterConnection connecting to [conf1:10003]
Sat Feb 9 11:01:38 [conn7] SyncClusterConnection connecting to [conf2:10003]
Sat Feb 9 11:01:39 [conn7] SyncClusterConnection connecting to [conf3:10003]
Sat Feb 9 11:01:41 [conn7] ChunkManager: time to load chunks for que_db.exp: 1ms sequenceNumber: 9 version: 1|40||50f5369065d3598b74ff2841 based on: 1|38||50f5369065d3598b74ff2841
Sat Feb 9 11:01:41 [conn7] autosplitted que_db.exp shard: ns:que_db.exp at: replicaSet2:replicaSet2/m2:10001,m2rpl:10001 lastmod: 1|24||000000000000000000000000 min: { UserID: 21596 } max: { UserID: 21744
} on: { UserID: 21608 } (splitThreshold 67108864)
Sat Feb 9 11:02:36 [conn7] DBException in process: socket exception [SEND_ERROR] for 127.0.0.1:41725
Sat Feb 9 11:02:36 [conn7] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:41725]
</code>
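To see how often the balancer hits this error and between which shards, a mongos log can be tallied with a few lines of JavaScript. A sketch only: countFailedMoves is a made-up helper, and it assumes the from:/to: shard names appear either on the "balancer move failed" line itself or on the wrapped line directly after it, as in the excerpts in this thread. Entries where the shard names were cut off are counted under "unknown".

```javascript
// Sketch: count "balancer move failed" entries per from/to shard pair.
// Assumes the log format seen in this thread; wrapped lines are handled by
// also looking at the line that follows the failure message.
function countFailedMoves(logText) {
    var pattern = /from: (\S+) to: (\S+)/;
    var lines = logText.split("\n");
    var counts = {};
    for (var i = 0; i < lines.length; i++) {
        if (lines[i].indexOf("balancer move failed") === -1) {
            continue;
        }
        var match = pattern.exec(lines[i]);
        var next = lines[i + 1] || "";
        // Only fall back to the next line if it is a continuation,
        // not the start of another [Balancer] entry.
        if (!match && next.indexOf("[Balancer]") === -1) {
            match = pattern.exec(next);
        }
        var key = match ? match[1] + " -> " + match[2] : "unknown";
        counts[key] = (counts[key] || 0) + 1;
    }
    return counts;
}
```

The result is a plain object keyed by "source -> destination", which is enough to tell whether every failure involves the same TO-shard.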
Thanks
Vasyl
On Saturday, October 6, 2012 7:28:43 PM UTC+3, Adam C wrote:
> For anyone else running into similar issues here on 2.2.0, the root cause
> is very likely SERVER-7003 <https://jira.mongodb.org/browse/SERVER-7003> which
> will be fixed in 2.2.1
>>>>>>> On Thursday, August 30, 2012 5:04:51 PM UTC+1, Daniel Schlegel wrote:
>>>>>>>> I'm pretty sure I restarted all the mongod's in our system. I did
>>>>>>>> it again for the s2 mongod's.
>>>>>>>> Not sure if it was exactly at the point of restarting the mongod's, but
>>>>>>>> the exception seems to have disappeared.
>>>>>>>> Is it possible that I had to bounce the primary of s1 first and
>>>>>>>> then the primary of s2? I did it in reverse order.
>>>>>>>> Here are the outputs:
>>>>>>>> s2:PRIMARY> rs.status()
>>>>>>>> {
>>>>>>>> "set" : "s2",
>>>>>>>> "date" :