No data moving to new shard


David K Storrs

Sep 3, 2012, 5:59:42 PM
to mongodb-user
I added a third shard to my cluster last week and no data has moved to
it at all. I've been digging for answers and I'm finding related
things but nothing that quite addresses the issue -- my first thought
was that there is a stale lock somewhere, but I don't see it. Any
help would be much appreciated, because I've GOT to get this sorted
soon -- the RS1 shard is about 80% full on disk.

DB is v2.0; we have been using v2.0 for several months now and have
not made the jump to 2.2 yet.

Below is everything I think might help someone figure this out. Sorry
for the wall o' text; I wasn't sure how much to include.

---------------------------------------------------------------------------
[The following done from the RS1 primary unless otherwise indicated,
if that matters]

mongos> ObjectId().getTimestamp()
ISODate("2012-09-03T21:03:09Z")

mongos> use admin
switched to db admin
mongos> db.printShardingStatus()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "RS1", "host" : "RS1/xxx:27018" }
{ "_id" : "RS2", "host" : "RS2/xxx:27018" }
{ "_id" : "RS3", "host" : "RS3/xxx:27018" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" :
"config" }
{ "_id" : "demo", "partitioned" : false, "primary" :
"RS1" }
{ "_id" : "cm_prod", "partitioned" : true, "primary" :
"RS1" }
cm_prod.profile_daily_stats chunks:
RS2 16
RS1 16
too many chunks to print, use verbose if you
want to force print
cm_prod.profile_raw_stats chunks:
RS2 160
RS1 157
too many chunks to print, use verbose if you
want to force print
cm_prod.video_latest_stats chunks:
RS1 152
RS2 153
too many chunks to print, use verbose if you
want to force print
cm_prod.video_raw_stats chunks:
RS1 3360
RS2 3364
too many chunks to print, use verbose if you
want to force print
{ "_id" : "test", "partitioned" : false, "primary" :
"RS2" }
{ "_id" : "admin; db", "partitioned" : false, "primary" :
"RS2" }
{ "_id" : "system", "partitioned" : false, "primary" :
"RS2" }
{ "_id" : "geoff", "partitioned" : false, "primary" :
"RS2" }
{ "_id" : "cm_demo_data", "partitioned" : false,
"primary" : "RS2" }
{ "_id" : "cm_develop_data", "partitioned" : false,
"primary" : "RS2" }
{ "_id" : "cm", "partitioned" : false, "primary" : "RS2" }

As you can see, there are only 4 sharded collections, all in the
cm_prod DB. Of those, video_raw_stats completely dominates the size:

mongos> use cm_prod
switched to db cm_prod

mongos> db.video_raw_stats.totalSize()
558973318352 [ == 520G]
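As a quick sanity check on that bracketed annotation (totalSize() reports bytes; the snippet below is just illustrative arithmetic, not part of the shell session):

```python
# totalSize() reports bytes; convert to GiB to confirm the "520G" note.
total_size_bytes = 558973318352

gib = total_size_bytes / 2**30  # 1 GiB = 2^30 bytes
print(f"{gib:.1f} GiB")  # about 520.6 GiB
```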


mongos> use config
switched to db config


There is only one balancer lock outstanding:

mongos> db.locks.find({ _id : 'balancer'})
{ "_id" : "balancer", "process" : "RS3:27017:1346462580:1804289383",
"state" : 2, "ts" : ObjectId("50451aea4bbb07ad98706071"), "when" :
ISODate("2012-09-03T21:02:34.033Z"), "who" :
"RS3:27017:1346462580:1804289383:Balancer:846930886", "why" : "doing
balance round" }




All existing locks (included for completeness):

mongos> db.locks.find()
{ "_id" : "balancer", "process" : "RS3:27017:1346462580:1804289383",
"state" : 2, "ts" : ObjectId("50451b814bbb07ad9870607f"), "when" :
ISODate("2012-09-03T21:05:05.089Z"), "who" :
"RS3:27017:1346462580:1804289383:Balancer:846930886", "why" : "doing
balance round" }

{ "_id" : "cm_prod.video_latest_stats", "process" :
"RS2:27018:1346463433:367275570", "state" : 0, "ts" :
ObjectId("50451b8828145fb74ca6281a"), "when" :
ISODate("2012-09-03T21:05:12.887Z"), "who" :
"RS2:27018:1346463433:367275570:conn19:96364934", "why" : "migrate-
{ _id: \"-OkLQldVbhI\" }" }

{ "_id" : "cm_prod.video_descriptions", "process" :
"RS1:27017:1343957738:1804289383", "state" : 0, "ts" :
ObjectId("5022feeaf7448e409c940963"), "when" :
ISODate("2012-08-09T00:06:02.618Z"), "who" :
"RS1:27017:1343957738:1804289383:conn9530:1058323149", "why" :
"drop" }

{ "_id" : "cm_prod.video_raw_stats", "process" :
"RS2:27018:1346463433:367275570", "state" : 0, "ts" :
ObjectId("50451b8a28145fb74ca6281c"), "when" :
ISODate("2012-09-03T21:05:14.050Z"), "who" :
"RS2:27018:1346463433:367275570:conn19:96364934", "why" : "migrate-
{ vt: \"--RsEGRDz6g\", h: 1343228523 }" }

{ "_id" : "cm_prod.profile_raw_stats", "process" :
"RS2:27018:1346463433:367275570", "state" : 0, "ts" :
ObjectId("50451b8928145fb74ca6281b"), "when" :
ISODate("2012-09-03T21:05:13.469Z"), "who" :
"RS2:27018:1346463433:367275570:conn19:96364934", "why" : "migrate-
{ u: MinKey, h: MinKey }" }

{ "_id" : "cm_prod.profile_daily_stats", "process" :
"RS1:27018:1346463419:1483632176", "state" : 0, "ts" :
ObjectId("50451b7d43ad3b2149370fe0"), "when" :
ISODate("2012-09-03T21:05:01.805Z"), "who" :
"RS1:27018:1346463419:1483632176:conn19:2127675310", "why" : "migrate-
{ u: \"lionsgatelive\", h: 1345939325 }" }

{ "_id" : "cm_prod.channel_engagements", "process" :
"RS1:27017:1342673255:1804289383", "state" : 0, "ts" :
ObjectId("5007927048e98d9778948bb9"), "when" :
ISODate("2012-07-19T04:52:00.791Z"), "who" :
"RS1:27017:1342673255:1804289383:conn2:972605807", "why" : "drop" }

{ "_id" : "cm_prod.channel_video_grouped", "process" :
"RS1:27017:1343540831:1804289383", "state" : 0, "ts" :
ObjectId("501adff96dac5fc19aab36e4"), "when" :
ISODate("2012-08-02T20:15:53.401Z"), "who" :
"RS1:27017:1343540831:1804289383:conn1093:258764218", "why" : "drop" }



On the RS2 machine:

[RS2 ~]# tail -n 30 /var/log/mongo/mongod.log
Mon Sep 3 21:26:41 [conn19] command admin.$cmd command: { moveChunk:
"cm_prod.video_latest_stats", from: "RS2/xxx:27018", to: "RS3/xxx:
27018", fromShard: "RS2", toShard: "RS3", min: { _id: "-OkLQldVbhI" },
max: { _id: "-amRGWbej2g" }, maxChunkSizeBytes: 67108864, shardId:
"cm_prod.video_latest_stats-_id_"-OkLQldVbhI"", configdb:
"dbconf1:27019,dbconf2:27019,dbconf3:27019", secondaryThrottle:
false } ntoreturn:1 reslen:199 680ms
Mon Sep 3 21:26:41 [conn19] received moveChunk request: { moveChunk:
"cm_prod.profile_raw_stats", from: "RS2/xxx:27018", to: "RS3/xxx:
27018", fromShard: "RS2", toShard: "RS3", min: { u: MinKey, h:
MinKey }, max: { u: "000christoph000", h: 1339286523 },
maxChunkSizeBytes: 67108864, shardId: "cm_prod.profile_raw_stats-
u_MinKeyh_MinKey", configdb:
"dbconf1:27019,dbconf2:27019,dbconf3:27019", secondaryThrottle:
false }
Mon Sep 3 21:26:41 [conn19] created new distributed lock for
cm_prod.profile_raw_stats on dbconf1:27019,dbconf2:27019,dbconf3:27019
( lock timeout : 900000, ping interval : 30000, process : 0 )
Mon Sep 3 21:26:41 [conn19] distributed lock
'cm_prod.profile_raw_stats/RS2:27018:1346463433:367275570' acquired,
ts : 5045209128145fb74ca6290c
Mon Sep 3 21:26:41 [conn19] about to log metadata event: { _id:
"RS2-2012-09-03T21:26:41-228109", server: "RS2", clientAddr:
"xxx.xxx.xxx.xxx (RS3):58137", time: new Date(1346707601262), what:
"moveChunk.start", ns: "cm_prod.profile_raw_stats", details: { min:
{ u: MinKey, h: MinKey }, max: { u: "000christoph000", h:
1339286523 }, from: "RS2", to: "RS3" } }
Mon Sep 3 21:26:41 [conn19] moveChunk request accepted at version 52|
7
Mon Sep 3 21:26:41 [conn19] moveChunk number of documents: 1241
Mon Sep 3 21:26:41 [conn19] distributed lock
'cm_prod.profile_raw_stats/RS2:27018:1346463433:367275570' unlocked.
Mon Sep 3 21:26:41 [conn19] about to log metadata event: { _id:
"RS2-2012-09-03T21:26:41-228110", server: "RS2", clientAddr:
"xxx.xxx.xxx.xxx (RS3):58137", time: new Date(1346707601534), what:
"moveChunk.from", ns: "cm_prod.profile_raw_stats", details: { min:
{ u: MinKey, h: MinKey }, max: { u: "000christoph000", h:
1339286523 }, step1: 0, step2: 279, note: "aborted" } }
Mon Sep 3 21:26:41 [conn19] command admin.$cmd command: { moveChunk:
"cm_prod.profile_raw_stats", from: "RS2/xxx:27018", to: "RS3/xxx:
27018", fromShard: "RS2", toShard: "RS3", min: { u: MinKey, h:
MinKey }, max: { u: "000christoph000", h: 1339286523 },
maxChunkSizeBytes: 67108864, shardId: "cm_prod.profile_raw_stats-
u_MinKeyh_MinKey", configdb:
"dbconf1:27019,dbconf2:27019,dbconf3:27019", secondaryThrottle:
false } ntoreturn:1 reslen:199 511ms
Mon Sep 3 21:26:41 [conn19] received moveChunk request: { moveChunk:
"cm_prod.video_raw_stats", from: "RS2/xxx:27018", to: "RS3/xxx:27018",
fromShard: "RS2", toShard: "RS3", min: { vt: "--RsEGRDz6g", h:
1343228523 }, max: { vt: "--u6Xjj9524", h: "1340809383" },
maxChunkSizeBytes: 67108864, shardId: "cm_prod.video_raw_stats-vt_"--
RsEGRDz6g"h_1343228523", configdb:
"dbconf1:27019,dbconf2:27019,dbconf3:27019", secondaryThrottle:
false }
Mon Sep 3 21:26:41 [conn19] created new distributed lock for
cm_prod.video_raw_stats on dbconf1:27019,dbconf2:27019,dbconf3:27019
( lock timeout : 900000, ping interval : 30000, process : 0 )
Mon Sep 3 21:26:41 [conn195] remove cm_prod.video_latest_stats query:
{ _id: "Fh-DHjnZ1Q8" } 125ms
Mon Sep 3 21:26:41 [conn177] remove cm_prod.video_latest_stats query:
{ _id: "NTo1BEPzXBs" } 118ms
Mon Sep 3 21:26:41 [conn191] remove cm_prod.video_latest_stats query:
{ _id: "0lM78b-hM8k" } 162ms
Mon Sep 3 21:26:41 [conn19] distributed lock 'cm_prod.video_raw_stats/
RS2:27018:1346463433:367275570' acquired, ts :
5045209128145fb74ca6290d
Mon Sep 3 21:26:41 [conn19] about to log metadata event: { _id:
"RS2-2012-09-03T21:26:41-228111", server: "RS2", clientAddr:
"xxx.xxx.xxx.xxx (RS3):58137", time: new Date(1346707601742), what:
"moveChunk.start", ns: "cm_prod.video_raw_stats", details: { min:
{ vt: "--RsEGRDz6g", h: 1343228523 }, max: { vt: "--u6Xjj9524", h:
"1340809383" }, from: "RS2", to: "RS3" } }
Mon Sep 3 21:26:41 [conn19] moveChunk request accepted at version
1542|437
Mon Sep 3 21:26:42 [conn177] remove cm_prod.video_latest_stats query:
{ _id: "75B0x5bT87U" } 140ms
Mon Sep 3 21:26:42 [conn157] query cm_prod.video_raw_stats query:
{ orderby: { h: -1 }, query: { vt: "4_V-NB2v6zI", hs: { $in:
[ 1346198400, 1345852800, 1345248000, 1343865600, 1338681600 ] } } }
ntoreturn:10 nscanned:10 scanAndOrder:1 nreturned:6 reslen:704 101ms
Mon Sep 3 21:26:43 [conn19] moveChunk number of documents: 68064
Mon Sep 3 21:26:43 [conn19] distributed lock 'cm_prod.video_raw_stats/
RS2:27018:1346463433:367275570' unlocked.
Mon Sep 3 21:26:43 [conn19] about to log metadata event: { _id:
"RS2-2012-09-03T21:26:43-228112", server: "RS2", clientAddr:
"xxx.xxx.xxx.xxx (RS3):58137", time: new Date(1346707603436), what:
"moveChunk.from", ns: "cm_prod.video_raw_stats", details: { min: { vt:
"--RsEGRDz6g", h: 1343228523 }, max: { vt: "--u6Xjj9524", h:
"1340809383" }, step1: 0, step2: 268, note: "aborted" } }
Mon Sep 3 21:26:43 [conn19] command admin.$cmd command: { moveChunk:
"cm_prod.video_raw_stats", from: "RS2/xxx:27018", to: "RS3/xxx:27018",
fromShard: "RS2", toShard: "RS3", min: { vt: "--RsEGRDz6g", h:
1343228523 }, max: { vt: "--u6Xjj9524", h: "1340809383" },
maxChunkSizeBytes: 67108864, shardId: "cm_prod.video_raw_stats-vt_"--
RsEGRDz6g"h_1343228523", configdb:
"dbconf1:27019,dbconf2:27019,dbconf3:27019", secondaryThrottle:
false } ntoreturn:1 reslen:199 2036ms
Mon Sep 3 21:26:44 [conn193] remove cm_prod.video_latest_stats query:
{ _id: "PXRPX2yZeBo" } 107ms
Mon Sep 3 21:26:44 [conn158] remove cm_prod.video_latest_stats query:
{ _id: "RhvB4mPPYr0" } 152ms
Mon Sep 3 21:26:44 [conn187] remove cm_prod.video_latest_stats query:
{ _id: "B2CtCrT3tXU" } 128ms
Mon Sep 3 21:26:44 [conn196] remove cm_prod.video_latest_stats query:
{ _id: "AlKQTBRJ3nk" } 121ms
Mon Sep 3 21:26:44 [conn159] remove cm_prod.video_latest_stats query:
{ _id: "HczAI7Hgf1Q" } 121ms
Mon Sep 3 21:26:44 [conn193] remove cm_prod.video_latest_stats query:
{ _id: "N5WjWgatUm8" } 146ms



On the RS3 machine:

[RS3 ~]# tail -n 30 /var/log/mongo/mongod.log

Mon Sep 3 20:37:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:37:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:37:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:39:37 [initandlisten] connection accepted from
127.0.1.1:43404 #192 (74 connections now open)
Mon Sep 3 20:40:35 [initandlisten] connection accepted from <RS1
machine private IP>:46020 #193 (75 connections now open)
Mon Sep 3 20:42:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:42:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:42:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:47:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:47:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:47:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:52:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:52:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:52:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:57:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:57:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 20:57:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:02:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:02:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:02:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:03:40 [conn140] end connection <RS2 machine private IP>:
45967 (74 connections now open)
Mon Sep 3 21:07:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:07:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:07:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:12:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:12:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:12:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:17:06 [conn4] command admin.$cmd command:
{ writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:17:14 [conn10] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms
Mon Sep 3 21:17:37 [conn14] command admin.$cmd command:
{ writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0 reslen:44 300000ms

David Hows

Sep 4, 2012, 1:19:43 AM
to mongod...@googlegroups.com
Hi David,

After some discussion, it seems that the new shard has registered but isn't being included properly in the balancing.

Would you be willing to execute the flushRouterConfig command to clean out the cached router configurations and reload them from the config servers, in order to get the router to acknowledge the new shard? The command is db.adminCommand( 'flushRouterConfig' ) on the mongos instance.

Documentation on the flush router config command is available here

Geoff L

Sep 4, 2012, 2:56:09 PM
to mongod...@googlegroups.com
Hi David,
I work with David Storrs; here's what he got from the flushRouterConfig. I also poked around in the mongos logs on the new shard machine and saw that the balancer was having issues migrating. After all of Dave's output is the mongos log with the messages.

Thanks for the response.  I tried it from mongos on the RS3 instance
and got the following:

mongos> db.adminCommand( 'flushRouterConfig' )
{ "flushed" : true, "ok" : 1 }

mongos> use cm_prod
switched to db cm_prod
mongos> db.printShardingStatus()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
        {  "_id" : "RS1",  "host" : "RS1/xxx:27018" }
        {  "_id" : "RS2",  "host" : "RS2/xxx:27018" }
        {  "_id" : "RS3",  "host" : "RS3/xxx:27018" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "demo",  "partitioned" : false,  "primary" : "RS1" }
        {  "_id" : "cm_prod",  "partitioned" : true,  "primary" : "RS1" }
                cm_prod.profile_daily_stats chunks:
                                RS2     16
                                RS1     16
                        too many chunks to print, use verbose if you
want to force print
                cm_prod.profile_raw_stats chunks:
                                RS2     160
                                RS1     158

                        too many chunks to print, use verbose if you
want to force print
                cm_prod.video_latest_stats chunks:
                                RS1     152
                                RS2     153
                        too many chunks to print, use verbose if you
want to force print
                cm_prod.video_raw_stats chunks:
                                RS1     3564
                                RS2     3533

                        too many chunks to print, use verbose if you
want to force print
        {  "_id" : "test",  "partitioned" : false,  "primary" : "RS2" }
        {  "_id" : "admin; db",  "partitioned" : false,  "primary" : "RS2" }
        {  "_id" : "system",  "partitioned" : false,  "primary" : "RS2" }
        {  "_id" : "geoff",  "partitioned" : false,  "primary" : "RS2" }
        {  "_id" : "cm_demo_data",  "partitioned" : false,  "primary" : "RS2" }
        {  "_id" : "cm_develop_data",  "partitioned" : false,
"primary" : "RS2" }
        {  "_id" : "cm",  "partitioned" : false,  "primary" : "RS2" }


[RS1]# date
Tue Sep  4 17:28:37 UTC 2012

[RS1]# tail /var/log/mongo/mongod.log
Tue Sep  4 17:28:18 [conn287] about to log metadata event: { _id:
"RS1-2012-09-04T17:28:18-94774", server: "RS1", clientAddr:
"xxx.xxx.xxx.xxx:40811", time: new Date(1346779698624), what:
"moveChunk.start", ns: "cm_prod.video_raw_stats", details: { min: {
vt: MinKey, h: MinKey }, max: { vt: "---56wpitTM", h: 1340006584 },
from: "RS1", to: "RS3" } }
Tue Sep  4 17:28:18 [conn287] moveChunk request accepted at version 1542|1167
Tue Sep  4 17:28:18 [conn287] moveChunk number of documents: 18
Tue Sep  4 17:28:18 [conn287] distributed lock
'cm_prod.video_raw_stats/RS1:27018:1346463419:1483632176' unlocked.
Tue Sep  4 17:28:18 [conn287] about to log metadata event: { _id:
"RS1-2012-09-04T17:28:18-94775", server: "RS1", clientAddr:
"xxx.xxx.xxx.xxx:40811", time: new Date(1346779698932), what:
"moveChunk.from", ns: "cm_prod.video_raw_stats", details: { min: { vt:
MinKey, h: MinKey }, max: { vt: "---56wpitTM", h: 1340006584 }, step1:
0, step2: 360, note: "aborted" } }
Tue Sep  4 17:28:19 [conn287] command admin.$cmd command: { moveChunk:
"cm_prod.video_raw_stats", from: "RS1/RS1:27018", to:
"RS3/RS3:27018", min: { vt: MinKey, h: MinKey }, max: { vt:
"---56wpitTM", h: 1340006584 }, maxChunkSizeBytes: 67108864, shardId:
"cm_prod.video_raw_stats-vt_MinKeyh_MinKey", configdb:
"dbconf1:27019,dbconf2:27019,dbconf3:27019" } ntoreturn:1 reslen:199
751ms
Tue Sep  4 17:28:20 [conn287] request split points lookup for chunk
cm_prod.video_latest_stats { : "c49ZJ_AtsAE" } -->> { : "cI6nKmxZqwQ"
}


RS3#: date
Tue Sep  4 17:27:07 UTC 2012

RS3#: tail /var/log/mongo/mongod.log
Tue Sep  4 17:24:05 [initandlisten] connection accepted from
xxx.xxx.xxx.xxx:43430 #321 (99 connections now open)
Tue Sep  4 17:24:23 [conn198] command admin.$cmd command:
{ writebacklisten: ObjectId('5041633c0774eaaaa46d50d3') } ntoreturn:1
keyUpdates:0  reslen:44 300000ms
Tue Sep  4 17:24:25 [conn200] command admin.$cmd command: {
writebacklisten: ObjectId('504163334bbb07ad9870119d') } ntoreturn:1
keyUpdates:0  reslen:44 300000ms
Tue Sep  4 17:24:25 [conn201] command admin.$cmd command: {
writebacklisten: ObjectId('50416343edfa40550746b1b8') } ntoreturn:1
keyUpdates:0  reslen:44 300000ms

RS3#: tail mongos.log
Tue Sep  4 17:49:11 [Balancer] moving chunk ns: cm_prod.video_latest_stats moving ( ns:cm_prod.video_latest_stats at: RS2:RS2/RS2:27018 lastmod: 60|1||000000000000000000000000 min: { _id: "-OkLQldVbhI" } max: { _id: "-amRGWbej2g" }) RS2:RS2/RS2:27018 -> RS3:RS3/RS3:27018
Tue Sep  4 17:49:15 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Tue Sep  4 17:49:15 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: RS2 to: RS3 chunk:  min: { _id: "-OkLQldVbhI" } max: { _id: "-OkLQldVbhI" }
Tue Sep  4 17:49:15 [Balancer] moving chunk ns: cm_prod.profile_raw_stats moving ( ns:cm_prod.profile_raw_stats at: RS2:RS2/RS2:27018 lastmod: 52|1||000000000000000000000000 min: { u: MinKey, h: MinKey } max: { u: "000christoph000", h: 1339286523 }) RS2:RS2/RS2:27018 -> RS3:RS3/RS3:27018
Tue Sep  4 17:49:16 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Tue Sep  4 17:49:16 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: RS2 to: RS3 chunk:  min: { u: MinKey, h: MinKey } max: { u: MinKey, h: MinKey }
Tue Sep  4 17:49:16 [Balancer] moving chunk ns: cm_prod.video_raw_stats moving ( ns:cm_prod.video_raw_stats at: RS1:RS1/RS1:27018 lastmod: 1541|0||000000000000000000000000 min: { vt: MinKey, h: MinKey } max: { vt: "---56wpitTM", h: 1340006584 }) RS1:RS1/RS1:27018 -> RS3:RS3/RS3:27018
Tue Sep  4 17:49:17 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Tue Sep  4 17:49:17 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: RS1 to: RS3 chunk:  min: { vt: MinKey, h: MinKey } max: { vt: MinKey, h: MinKey }
Tue Sep  4 17:49:17 [Balancer] distributed lock 'balancer/RS3:27017:1346462580:1804289383' unlocked.

David Hows

Sep 4, 2012, 10:53:11 PM
to mongod...@googlegroups.com
Hi Geoff,

I've seen that you have also been speaking with Adam.

Looking further, it seems that the balancer lock could be stale; can you confirm this? Please query as follows:
use config
db.locks.find({ _id : 'balancer'})

If the time in the "when" field of the lock is more than a few hours old (normally we say 2), it is likely that the lock is stale and needs to be deleted.

Would you please check the balancer lock again to see if it is indeed stale and, if so, try deleting it.
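That staleness test can be sketched client-side; the Python fragment below is illustrative only (the lock document shape mirrors the config.locks output shown earlier in the thread, and the two-hour threshold is the one suggested above -- fetching the document via a driver is left out):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=2)  # "normally we say 2"

def balancer_lock_is_stale(lock_doc, now=None):
    """Return True if a balancer lock's 'when' timestamp is older than
    the staleness threshold. lock_doc mirrors a config.locks document."""
    now = now or datetime.now(timezone.utc)
    return now - lock_doc["when"] > STALE_AFTER

# The balancer lock shown earlier in this thread:
lock = {
    "_id": "balancer",
    "state": 2,
    "when": datetime(2012, 9, 3, 21, 2, 34, tzinfo=timezone.utc),
}

# Checked a few minutes after it was taken, the lock is not stale:
print(balancer_lock_is_stale(lock, now=datetime(2012, 9, 3, 21, 10, tzinfo=timezone.utc)))  # False
# Checked a day later, it would be:
print(balancer_lock_is_stale(lock, now=datetime(2012, 9, 4, 21, 10, tzinfo=timezone.utc)))  # True
```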

Thanks

David

Geoff L

Sep 5, 2012, 1:49:42 PM
to mongod...@googlegroups.com
Hi Dave,

Yeah, I was talking to Adam; it looked like he might have been working on a problem similar to ours.

It doesn't seem like the balancer lock is stale. There are locks in the collection that are old, but refer to collections that no longer exist, so I don't believe they are relevant. I have also noticed that balancing has completely stopped when the new shard is in the cluster. If I remove the new shard from the cluster using the removeShard command, the cluster goes back to working fine. Is there something I'm supposed to do with the balancer when I add a new shard?

Thanks,

Geoff

David Hows

Sep 5, 2012, 7:07:56 PM
to mongod...@googlegroups.com
Hi Geoff,

Your problem is an interesting one. It's clear that the balancer isn't working, but there is no stale lock on the balancer, which is normally the cause of these kinds of issues.

Would you have any issue with removing the old locks on the collections that don't exist?

Additionally, would you be able to add back the third shard and enable logging on the mongos? Mongos is saying it's aware of a chunk migration already in progress, and the extra logging should give us an idea of what the cause is. You would only need to turn the logging up for a short time (a minute or so), just long enough to see the error message in the log around balancing.

To change the logging level, execute the following command: db.adminCommand({ setParameter: 1, logLevel: 5 })

Cheers,

David

David K Storrs

Sep 5, 2012, 8:51:56 PM
to mongod...@googlegroups.com


Well, we found the issue: we had Mongo 2.2 on shard 3 and 2.0 on shards 1 and 2, exactly the way we were certain we didn't.

We installed the new shard the day AFTER 2.2 came out, and when we did 'yum install ...' we got the 2.2 instance without realizing it.  When we started debugging this, the first thing we checked was "is this v2.0.6?", but we accidentally connected to the mongod on the dev server, not the prod server.  Everything after that was therefore obviously futile thrashing around, since we had already tested and "eliminated" the correct answer.  Once we found the problem, it took 20 minutes to fix, after nearly a week of forehead pounding.  

Yeesh.
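For what it's worth, a mismatch like this could be caught by comparing the output of db.version() across shards before debugging anything else. A hypothetical sketch (the helper is made up; the shard names and version strings mirror this thread):

```python
from collections import Counter

def versions_mismatch(versions):
    """Given {shard_name: version_string}, return the set of shards whose
    mongod version differs from the majority version."""
    counts = Counter(versions.values())
    majority, _ = counts.most_common(1)[0]
    return {shard for shard, v in versions.items() if v != majority}

# Roughly what this cluster looked like:
print(versions_mismatch({"RS1": "2.0.6", "RS2": "2.0.6", "RS3": "2.2.0"}))  # {'RS3'}
```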


Thanks for your help, David.

Dave

David Hows

Sep 5, 2012, 9:02:06 PM
to mongod...@googlegroups.com
Ah!

That will most certainly be the cause of the issue here!

Thanks for coming back to me with clarification

Cheers and good luck

David



Kev

Oct 4, 2012, 6:13:47 PM
to mongod...@googlegroups.com
Wasn't sure whether to make a new topic or not, but I would have named my problem the same thing.  

It seems like moving chunks has stopped happening recently and I'm not sure why.  I have a bunch of collections that were previously sharding fine.  Recently, I upgraded all my boxes to Mongo 2.2 (I verified that the mongod's are 2.2) but I noticed that my chunks weren't getting moved very much anymore.  

I grepped my mongo logs for the word "aborted" and saw a lot of these messages:
Thu Oct  4 22:09:12 [conn128724] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-04T22:09:12-302899", server: "ip-10-188-20-57", clientAddr: "10.166.29.83:44133", time: new Date(1349388552424), what: "moveChunk.from", ns: "db_name.collection1", details: { min: { did: MinKey, version: MinKey, error_hash: MinKey }, max: { did: "0000460e-b541-3e73-92f6-45644b1a7951", version: "3.2.1", error_hash: "e035313ce9ebbec5f9effb44b4550d0a3d4ad77d4a1e796a9a08acf0" }, step1 of 6: 1, step2 of 6: 209, note: "aborted" } }
Thu Oct  4 22:09:14 [conn115874] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-04T22:09:14-302901", server: "ip-10-188-20-57", clientAddr: "10.166.47.149:36241", time: new Date(1349388554635), what: "moveChunk.from", ns: "db_name.collection2", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fe6e574b8d7d56f7e000064') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fea3be08a43cb5505000343') }, step1 of 6: 0, step2 of 6: 242, note: "aborted" } }
Thu Oct  4 22:09:16 [conn115874] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-04T22:09:16-302903", server: "ip-10-188-20-57", clientAddr: "10.166.47.149:36241", time: new Date(1349388556107), what: "moveChunk.from", ns: "db_name.collection3", details: { min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, step1 of 6: 0, step2 of 6: 182, note: "aborted" } }
Thu Oct  4 22:09:20 [conn115874] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-04T22:09:20-302905", server: "ip-10-188-20-57", clientAddr: "10.166.47.149:36241", time: new Date(1349388560847), what: "moveChunk.from", ns: "db_name.collection1", details: { min: { did: MinKey, version: MinKey, error_hash: MinKey }, max: { did: "0000460e-b541-3e73-92f6-45644b1a7951", version: "3.2.1", error_hash: "e035313ce9ebbec5f9effb44b4550d0a3d4ad77d4a1e796a9a08acf0" }, step1 of 6: 1, step2 of 6: 3140, note: "aborted" } }
Thu Oct  4 22:09:32 [conn115852] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-04T22:09:32-302906", server: "ip-10-188-20-57", clientAddr: "10.166.39.153:46072", time: new Date(1349388572097), what: "moveChunk.from", ns: "db_name.collection2", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fe6e574b8d7d56f7e000064') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fea3be08a43cb5505000343') }, step1 of 6: 0, note: "aborted" } }
Thu Oct  4 22:09:45 [conn115852] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-04T22:09:45-302908", server: "ip-10-188-20-57", clientAddr: "10.166.39.153:46072", time: new Date(1349388585515), what: "moveChunk.from", ns: "db_name.collection3", details: { min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, step1 of 6: 0, step2 of 6: 3554, note: "aborted" } }

So it looks like a bunch of the moveChunk.from commands are being aborted.  

I tried the flushRouterConfig command and checked for stale locks, as well as making sure my mongods are the correct version.  I apologize for not giving enough detail here; I'm really not sure what would be useful to paste.

I'd appreciate any help on what I should be looking at to try to figure this out!

Cheers,
Kevin

Kev

Oct 4, 2012, 6:48:57 PM10/4/12
to mongod...@googlegroups.com
Also, lemme say that I'm not deleting anything from here so I don't think it is related to https://groups.google.com/forum/?fromgroups=#!searchin/mongodb-user/log$20metadata$20aborted/mongodb-user/q4MtfpDjSGc/R7EJBwA1h3EJ

David Hows

Oct 4, 2012, 8:00:36 PM10/4/12
to mongod...@googlegroups.com
Hi Kev,

How did you check for stale migrations? What did you find?

Can you attach the Mongos and Config server logs for the same time period? They should have more information about the status of these failing moves.

Cheers,

David

Kev

Oct 4, 2012, 8:15:17 PM10/4/12
to mongod...@googlegroups.com
In terms of how I checked for the migrations: I added a new collection within the past few days.  I noticed it wasn't sharding right away, but I figured it might take some time to move the data.  
Checking db.printShardingStatus() a few days later revealed that while I had chunks, they weren't moving properly:

crittercism.user_crash chunks:
        crittercism     21
        crittercism7     1

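As a side note for anyone reproducing this check: the chunk counts that db.printShardingStatus() prints come from the config.chunks collection, so the same tally can be computed directly.  A minimal sketch of that grouping logic in plain JavaScript; the array below stands in for something like db.getSiblingDB("config").chunks.find({ns: "crittercism.user_crash"}).toArray(), and the sample data just mirrors the 21-vs-1 split seen in the thread:

```javascript
// Tally how many chunks each shard owns for one namespace, given documents
// shaped like those in config.chunks ({ns, shard, min, max, ...}).
function chunkCountsByShard(chunks) {
  const counts = {};
  for (const chunk of chunks) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  }
  return counts;
}

// Sample data approximating the imbalance from the thread: 21 chunks stuck
// on one shard, a single chunk on the other.
const counts = chunkCountsByShard([
  ...Array.from({ length: 21 }, () => ({ ns: "crittercism.user_crash", shard: "crittercism" })),
  { ns: "crittercism.user_crash", shard: "crittercism7" },
]);
// counts → { crittercism: 21, crittercism7: 1 }
```

A roughly even spread per namespace is what a healthy balancer converges to; a heavily skewed tally like this one is the signal that migrations are failing.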
mongos> db.user_crash.stats()
{
    "sharded" : true,
    "ns" : "crittercism.user_crash",
    "count" : 4546157,
    "numExtents" : 19,
    "size" : 947845172,
    "storageSize" : 1310638080,
    "totalIndexSize" : 1079215648,
    "indexSizes" : {
        "_id_" : 147511392,
        "hashed_device_id_1_app_version_1_error_hash_1" : 931704256
    },
    "avgObjSize" : 208.49371722094068,
    "nindexes" : 2,
    "nchunks" : 21,
    "shards" : {
        "crittercism" : {
            "ns" : "crittercism.user_crash",
            "count" : 4546132,
            "size" : 947840112,
            "avgObjSize" : 208.493750731391,
            "storageSize" : 1310629888,
            "numExtents" : 18,
            "nindexes" : 2,
            "lastExtentSize" : 345985024,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 1079199296,
            "indexSizes" : {
                "_id_" : 147503216,
                "hashed_device_id_1_app_version_1_error_hash_1" : 931696080
            },
            "ok" : 1
        },
        "crittercism7" : {
            "ns" : "crittercism.user_crash",
            "count" : 25,
            "size" : 5060,
            "avgObjSize" : 202.4,
            "storageSize" : 8192,
            "numExtents" : 1,
            "nindexes" : 2,
            "lastExtentSize" : 8192,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 16352,
            "indexSizes" : {
                "_id_" : 8176,
                "hashed_device_id_1_app_version_1_error_hash_1" : 8176
            },
            "ok" : 1
        }
    },
    "ok" : 1
}

After seeing this strange imbalance, I went to the logs and was staring at them when I saw the moveChunk.from ... "aborted" log messages.  

I've attached a few of the logs from the same time period.  Sorry for the huge block of text.  

Config server logs:
Thu Oct  4 20:09:35 [conn156794] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:39 reslen:2417 306ms
Thu Oct  4 20:09:35 [conn156249] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:327019 reslen:664 327ms
Thu Oct  4 20:09:35 [conn156893] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:38 reslen:2417 305ms
Thu Oct  4 20:09:35 [conn158238] CMD fsync: sync:1 lock:0
Thu Oct  4 20:09:36 [conn156329] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:273725 reslen:664 274ms
Thu Oct  4 20:09:36 [conn158238] command admin.$cmd command: { fsync: 1 } ntoreturn:1 keyUpdates:0 locks(micros) W:16579 reslen:51 279ms
Thu Oct  4 20:09:36 [conn156506] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:304792 reslen:664 304ms
Thu Oct  4 20:09:36 [conn156813] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:73 reslen:2417 297ms
Thu Oct  4 20:09:36 [conn156707] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:39 reslen:2417 297ms
Thu Oct  4 20:09:36 [conn156739] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:38 reslen:2417 295ms
Thu Oct  4 20:09:36 [conn156271] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:275764 reslen:664 275ms
Thu Oct  4 20:09:36 [conn157170] CMD fsync: sync:1 lock:0
Thu Oct  4 20:09:37 [conn156492] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:280497 reslen:664 280ms
Thu Oct  4 20:09:37 [conn156288] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:297720 reslen:664 297ms
Thu Oct  4 20:09:37 [conn156517] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:299612 reslen:664 299ms
Thu Oct  4 20:09:37 [conn156259] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:271957 reslen:664 272ms
Thu Oct  4 20:09:38 [conn161043] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:270136 reslen:664 277ms
Thu Oct  4 20:09:38 [conn156473] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:319628 reslen:664 319ms
Thu Oct  4 20:09:38 [conn156689] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:55 reslen:2417 311ms
Thu Oct  4 20:09:38 [conn156935] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:73 reslen:2417 312ms
Thu Oct  4 20:09:38 [conn156330] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:273630 reslen:664 275ms
Thu Oct  4 20:09:38 [conn162503] CMD fsync: sync:1 lock:0
Thu Oct  4 20:09:39 [conn156267] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:272260 reslen:664 272ms
Thu Oct  4 20:09:39 [conn162503] command admin.$cmd command: { fsync: 1 } ntoreturn:1 keyUpdates:0 locks(micros) W:6229 reslen:51 286ms
Thu Oct  4 20:09:39 [conn157932] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:285307 reslen:664 285ms
Thu Oct  4 20:09:39 [conn161795] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:41 reslen:2417 285ms
Thu Oct  4 20:09:39 [conn156726] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:50 reslen:2417 287ms
Thu Oct  4 20:09:39 [conn156806] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:38 reslen:2417 288ms
Thu Oct  4 20:09:39 [conn156808] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:39 reslen:2417 290ms
Thu Oct  4 20:09:39 [conn156223] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:276058 reslen:664 276ms
Thu Oct  4 20:09:39 [conn156242] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:272934 reslen:664 272ms
Thu Oct  4 20:09:39 [conn156725] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:57 reslen:2417 551ms
Thu Oct  4 20:09:39 [conn156741] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:31 reslen:2417 552ms
Thu Oct  4 20:09:39 [conn156852] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:34 reslen:2417 555ms
Thu Oct  4 20:09:40 [conn156292] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:273821 reslen:664 273ms
Thu Oct  4 20:09:40 [conn162503] CMD fsync: sync:1 lock:0
Thu Oct  4 20:09:40 [conn156792] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:42 reslen:2417 560ms
Thu Oct  4 20:09:40 [conn156818] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:38 reslen:2417 285ms
Thu Oct  4 20:09:40 [conn156238] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:290925 reslen:664 290ms
Thu Oct  4 20:09:40 [conn156793] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:39 reslen:2417 576ms
Thu Oct  4 20:09:40 [conn156684] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:33 reslen:2417 577ms
Thu Oct  4 20:09:40 [conn156733] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:34 reslen:2417 573ms
Thu Oct  4 20:09:40 [conn156861] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:31 reslen:2417 856ms
Thu Oct  4 20:09:40 [conn156732] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:30 reslen:2417 581ms
Thu Oct  4 20:09:40 [conn156959] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:32 reslen:2417 574ms
Thu Oct  4 20:09:40 [conn156268] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:273691 reslen:664 280ms
Thu Oct  4 20:09:40 [conn156921] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:35 reslen:2417 282ms
Thu Oct  4 20:09:40 [conn156703] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:38 reslen:2417 282ms
Thu Oct  4 20:09:40 [conn156688] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:58 reslen:2417 284ms
Thu Oct  4 20:09:40 [conn156821] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:36 reslen:2417 283ms
Thu Oct  4 20:09:40 [conn156907] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:31 reslen:2417 283ms
Thu Oct  4 20:09:40 [conn156930] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:30 reslen:2417 284ms
Thu Oct  4 20:09:41 [conn156319] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:310838 reslen:664 310ms
Thu Oct  4 20:09:41 [conn156691] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:51 reslen:2417 308ms
Thu Oct  4 20:09:41 [conn156484] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:277125 reslen:664 277ms
Thu Oct  4 20:09:41 [conn156723] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:47 reslen:2417 279ms
Thu Oct  4 20:09:41 [conn156868] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:40 reslen:2417 591ms
Thu Oct  4 20:09:41 [conn156903] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:41 reslen:2417 591ms
Thu Oct  4 20:09:41 [conn156922] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:45 reslen:2417 591ms
Thu Oct  4 20:09:41 [conn156471] command config.$cmd command: { dbhash: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:276729 reslen:664 276ms
Thu Oct  4 20:09:41 [conn156958] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:52 reslen:2417 282ms
Thu Oct  4 20:09:41 [conn156671] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:67 reslen:2417 279ms
Thu Oct  4 20:09:41 [conn158141] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:39 reslen:2417 278ms
Thu Oct  4 20:09:41 [conn156687] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:38 reslen:2417 277ms
Thu Oct  4 20:09:41 [conn156699] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:36 reslen:2417 879ms

Mongod logs
Fri Oct  5 00:07:06 [conn120176] created new distributed lock for crittercism.rawcrashlog on silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Oct  5 00:07:06 [conn120176] distributed lock 'crittercism.rawcrashlog/ip-10-188-20-57:27018:1347666812:127904054' acquired, ts : 506e24aa8d8e9f9ae941c4b5
Fri Oct  5 00:07:06 [conn120176] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-05T00:07:06-306402", server: "ip-10-188-20-57", clientAddr: "10.160.161.228:44988", time: new Date(1349395626228), what: "moveChunk.start", ns: "crittercism.rawcrashlog", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fea3be08a43cb5505000343') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fed4d2f1a12c5245c00016c') }, from: "crittercism", to: "crittercism1" } }
Fri Oct  5 00:07:06 [conn120176] moveChunk request accepted at version 28825|81||000000000000000000000000
Fri Oct  5 00:07:06 [conn120176] moveChunk number of documents: 1724
Fri Oct  5 00:07:06 [conn120176] distributed lock 'crittercism.rawcrashlog/ip-10-188-20-57:27018:1347666812:127904054' unlocked. 
Fri Oct  5 00:07:06 [conn120176] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-05T00:07:06-306403", server: "ip-10-188-20-57", clientAddr: "10.160.161.228:44988", time: new Date(1349395626369), what: "moveChunk.from", ns: "crittercism.rawcrashlog", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fea3be08a43cb5505000343') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fed4d2f1a12c5245c00016c') }, step1 of 6: 0, step2 of 6: 216, note: "aborted" } }
Fri Oct  5 00:07:06 [conn120176] command admin.$cmd command: { moveChunk: "crittercism.rawcrashlog", from: "crittercism/apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism1/pineapple-3.crittercism.com:27018,pineapple-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism1", min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fea3be08a43cb5505000343') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('4fed4d2f1a12c5245c00016c') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.rawcrashlog-unsymbolized_hash_"bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526"_id_ObjectId('4fea3be08a43cb5505000343')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false } ntoreturn:1 keyUpdates:0 locks(micros) r:4569 reslen:199 348ms
Fri Oct  5 00:07:07 [conn120176] received moveChunk request: { moveChunk: "crittercism.crash_log", from: "crittercism/apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism9/acai-3.crittercism.com:27018,acai-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism9", min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.crash_log-hash_"e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979"_id_ObjectId('500b46d7e230974770000054')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false }
Fri Oct  5 00:07:07 [conn120176] created new distributed lock for crittercism.crash_log on silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Oct  5 00:07:07 [conn120176] distributed lock 'crittercism.crash_log/ip-10-188-20-57:27018:1347666812:127904054' acquired, ts : 506e24ab8d8e9f9ae941c4d2
Fri Oct  5 00:07:07 [conn120176] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-05T00:07:07-306404", server: "ip-10-188-20-57", clientAddr: "10.160.161.228:44988", time: new Date(1349395627739), what: "moveChunk.start", ns: "crittercism.crash_log", details: { min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, from: "crittercism", to: "crittercism9" } }
Fri Oct  5 00:07:07 [conn120176] moveChunk request accepted at version 2400|53||000000000000000000000000
Fri Oct  5 00:07:07 [conn120176] moveChunk number of documents: 9851
Fri Oct  5 00:07:07 [conn120176] distributed lock 'crittercism.crash_log/ip-10-188-20-57:27018:1347666812:127904054' unlocked. 
Fri Oct  5 00:07:07 [conn120176] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-05T00:07:07-306405", server: "ip-10-188-20-57", clientAddr: "10.160.161.228:44988", time: new Date(1349395627914), what: "moveChunk.from", ns: "crittercism.crash_log", details: { min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, step1 of 6: 1, step2 of 6: 209, note: "aborted" } }
Fri Oct  5 00:07:07 [conn120176] command admin.$cmd command: { moveChunk: "crittercism.crash_log", from: "crittercism/apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism9/acai-3.crittercism.com:27018,acai-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism9", min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.crash_log-hash_"e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979"_id_ObjectId('500b46d7e230974770000054')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false } ntoreturn:1 keyUpdates:0 locks(micros) r:24154 reslen:199 383ms
Fri Oct  5 00:07:08 [conn172529] end connection 10.161.114.205:39365 (2670 connections now open)
Fri Oct  5 00:07:08 [initandlisten] connection accepted from 10.161.114.205:39389 #172531 (2671 connections now open)

David Hows

Oct 4, 2012, 9:12:03 PM10/4/12
to mongod...@googlegroups.com
Hi Kevin,

Can you also post the Mongos logs.

The mongos is responsible for initiating the chunk migrations, so we would expect some more detail on the errors there.

Cheers,

David

Kev

Oct 5, 2012, 12:42:47 AM10/5/12
to mongod...@googlegroups.com
Oh sure.  Sorry, I misread the first request.

Fri Oct  5 04:29:32 [Balancer] moving chunk ns: crittercism.handled_exception_log moving ( ns:crittercism.handled_exception_log at: crittercism10:crittercism10/tomato-3.crittercism.com:27018,tomato-4.crittercism.com:27018 lastmod: 7215|1||000000000000000000000000 min: { unsymbolized_hash: "225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9", _id: ObjectId('5049340ccd2b6e79a6000298') } max: { unsymbolized_hash: "225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9", _id: ObjectId('50493b1bdcd8b022d600009e') }) crittercism10:crittercism10/tomato-3.crittercism.com:27018,tomato-4.crittercism.com:27018 -> crittercism4:crittercism4/grape-3.crittercism.com:27018,grape-4.crittercism.com:27018
Fri Oct  5 04:29:33 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Fri Oct  5 04:29:33 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: crittercism10 to: crittercism4 chunk:  min: { unsymbolized_hash: "225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9", _id: ObjectId('5049340ccd2b6e79a6000298') } max: { unsymbolized_hash: "225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9", _id: ObjectId('5049340ccd2b6e79a6000298') }
Fri Oct  5 04:29:33 [Balancer] moving chunk ns: crittercism.user_crash moving ( ns:crittercism.user_crash at: crittercism:crittercism/apple-3.crittercism.com:27018,apple-4.crittercism.com:27018 lastmod: 2|12||000000000000000000000000 min: { hashed_device_id: MinKey, app_version: MinKey, error_hash: MinKey } max: { hashed_device_id: "0000460e-b541-3e73-92f6-45644b1a7951", app_version: "3.2.1", error_hash: "e035313ce9ebbec5f9effb44b4550d0a3d4ad77d4a1e796a9a08acf0" }) crittercism:crittercism/apple-3.crittercism.com:27018,apple-4.crittercism.com:27018 -> crittercism1:crittercism1/pineapple-3.crittercism.com:27018,pineapple-4.crittercism.com:27018
Fri Oct  5 04:29:33 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Fri Oct  5 04:29:33 [Balancer] balancer move failed: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 } from: crittercism to: crittercism1 chunk:  min: { hashed_device_id: MinKey, app_version: MinKey, error_hash: MinKey } max: { hashed_device_id: MinKey, app_version: MinKey, error_hash: MinKey }
Fri Oct  5 04:29:33 [Balancer] distributed lock 'balancer/ip-10-161-6-230:27017:1348603845:1804289383' unlocked. 
Fri Oct  5 04:33:12 [LockPinger] cluster silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 pinged successfully at Fri Oct  5 04:33:12 2012 by distributed lock pinger 'silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019/ip-10-161-6-230:27017:1348603845:1804289383', sleeping for 30000ms
Fri Oct  5 04:37:42 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 29ms
Fri Oct  5 04:38:17 [LockPinger] cluster silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 pinged successfully at Fri Oct  5 04:38:16 2012 by distributed lock pinger 'silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019/ip-10-161-6-230:27017:1348603845:1804289383', sleeping for 30000ms

That does seem a bit more informative, though I'm still not quite sure how to solve it.  

David Hows

Oct 7, 2012, 6:02:14 PM10/7/12
to mongod...@googlegroups.com
Kev, 

It thinks there is a migration already in progress.

Can you execute these commands on the config server for your cluster, about 10 minutes apart?

use config;
db.locks.find({_id: "balancer"}).pretty();

This will find the current lock taken out by the balancer. I want to check whether you have a stale balancer lock.
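The two-snapshot comparison can be expressed as a small predicate: if the lock is still held (state 2) with the same ts in both snapshots taken minutes apart, it has likely gone stale, while a changing ts means the balancer is still cycling through rounds. A sketch of that check; the field names mirror config.locks, but this is an informal heuristic, not an official staleness test:

```javascript
// Decide whether the balancer lock looks stale, given two snapshots of the
// config.locks document taken some minutes apart. state 2 means "held"; a
// healthy balancer re-acquires with a fresh ts each round, so an unchanged
// held lock across the gap suggests it is stuck.
function balancerLockLooksStale(first, second) {
  if (!first || !second) return false;                 // lock absent: nothing stale
  return first.state === 2 && second.state === 2 &&    // held both times...
         String(first.ts) === String(second.ts);       // ...by the same acquisition
}

// Same held lock ten minutes on → stale; a fresh ts → balancer still moving.
const snapA = { _id: "balancer", state: 2, ts: "50722319d219a8c48596019e" };
const snapB = { _id: "balancer", state: 2, ts: "50722319d219a8c48596019e" };
const snapC = { _id: "balancer", state: 2, ts: "5072255c9c2b6bc0b96cfab8" };
// balancerLockLooksStale(snapA, snapB) → true
// balancerLockLooksStale(snapA, snapC) → false
```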

Cheers,

David


Kev

Oct 7, 2012, 9:00:11 PM10/7/12
to mongod...@googlegroups.com
Hey David, here's what you requested:

{
"_id" : "balancer",
"process" : "ip-10-160-170-16:27017:1349421999:1804289383",
"state" : 2,
"ts" : ObjectId("50722319d219a8c48596019e"),
"when" : ISODate("2012-10-08T00:49:29.678Z"),
"who" : "ip-10-160-170-16:27017:1349421999:1804289383:Balancer:846930886",
"why" : "doing balance round"
}

{
"_id" : "balancer",
"process" : "ip-10-160-214-226:27017:1349421165:1804289383",
"state" : 2,
"ts" : ObjectId("5072255c9c2b6bc0b96cfab8"),
"when" : ISODate("2012-10-08T00:59:08.397Z"),
"who" : "ip-10-160-214-226:27017:1349421165:1804289383:Balancer:846930886",
"why" : "doing balance round"

David Hows

Oct 7, 2012, 9:48:35 PM10/7/12
to mongod...@googlegroups.com
Hi Kevin,

Those locks do seem to be moving.

Would you be willing to try stopping the balancer, waiting for its locks to clear and then restarting it?

You will need to execute this from a shell connected to a mongos:
use config;
sh.stopBalancer();
sh.startBalancer();

Since you are using 2.2, sh.stopBalancer() should wait for the in-progress balancing round to finish and the lock to be released before returning.
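"Waiting for the lock to clear" amounts to polling the balancer document in config.locks until its state drops to 0 (or the document disappears). A hedged sketch of that loop with the read injected as a callback; in the shell the callback would be something like db.getSiblingDB("config").locks.findOne({_id: "balancer"}), with a sleep between attempts:

```javascript
// Poll an injected lock reader until the balancer lock is released, giving
// up after maxAttempts reads. Returns true if the lock cleared. A real shell
// version would pause between reads; this sketch omits the sleep so the
// logic stays self-contained.
function waitForBalancerRelease(readLock, maxAttempts = 10) {
  for (let i = 0; i < maxAttempts; i++) {
    const lock = readLock();
    if (!lock || lock.state === 0) return true;   // released (or gone)
  }
  return false;                                   // still held after maxAttempts
}
```

With a healthy balancer the lock clears within a round or two; if this loop keeps returning false, that is the stale-lock situation discussed below in the thread.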

Once you restart the balancer, can you check the output from your mongos instance?

Cheers,

David

Kev

Oct 8, 2012, 1:30:12 PM10/8/12
to mongod...@googlegroups.com
Hi David.  I stopped the balancer, but I don't think its lock ever cleared.  I always saw this in the db.locks.find() output:
{ "_id" : "balancer", "process" : "ip-10-166-159-125:27017:1348603846:1804289383", "state" : 2, "ts" : ObjectId("50730ad0b0ac522e95c71163"), "when" : ISODate("2012-10-08T17:18:08.541Z"), "who" : "ip-10-166-159-125:27017:1348603846:1804289383:Balancer:846930886", "why" : "doing balance round" }

I waited about an hour but it didn't clear.  How long do I typically have to wait for that to happen?  

When I restarted balancing, the lock's "when" field almost immediately updated to the current time.  

Interestingly enough, the mongos instances no longer display those errors, although the mongods continue to show the aborted migrations.  

David Hows

Oct 8, 2012, 6:12:08 PM10/8/12
to mongod...@googlegroups.com
Hi Kev,

It looks like that lock is stale.

Can you try running the following to set the state value to 0, then restarting the balancer.

You should be able to do the update with something like 
use config;
db.locks.update({ _id : "balancer" }, { $set: { state: 0 } })

Try that and see if your migrations start again and if you are still getting errors on the mongos.
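One caution on the update shape, worth spelling out: passing a plain document like { state: 0 } as the second argument to update() replaces the whole lock document, whereas { $set: { state: 0 } } modifies only that one field. A toy model of the difference; this simulates just enough of the update semantics to show the distinction and is not MongoDB's full update logic:

```javascript
// Minimal simulation of the two update shapes: a replacement document swaps
// out everything except _id, while a $set merges the named fields in.
function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some(k => k.startsWith("$"));
  if (!hasOperators) {
    return { _id: doc._id, ...update };           // replacement: other fields lost
  }
  return { ...doc, ...(update.$set || {}) };      // $set: fields merged in
}

const lock = { _id: "balancer", state: 2,
               process: "ip-10-166-159-125:27017", why: "doing balance round" };

const replaced = applyUpdate(lock, { state: 0 });           // drops process/why
const patched  = applyUpdate(lock, { $set: { state: 0 } }); // keeps them
// replaced → { _id: "balancer", state: 0 }
// patched  → { _id: "balancer", state: 0, process: "...", why: "..." }
```

Losing fields like process and who from the lock document would make later debugging of who held the lock impossible, so the $set form is the safer reset.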

Kev

Oct 8, 2012, 9:01:02 PM10/8/12
to mongod...@googlegroups.com
Hey David, 
     Doesn't look like migration has started again.  I'll try turning the balancer off for a longer period of time later tonight, but should I expect to have to wait an hour for it to clear?  

Mongos.log output
Tue Oct  9 00:49:56 [LockPinger] cluster silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 pinged successfully at Tue Oct  9 00:49:56 2012 by distributed lock pinger 'silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019/ip-10-174-183-254:27017:1349418400:1804289383', sleeping for 30000ms
Tue Oct  9 00:50:53 [conn7726] end connection 127.0.0.1:36687 (6 connections now open)
Tue Oct  9 00:51:16 [conn7727] end connection 127.0.0.1:37396 (5 connections now open)
Tue Oct  9 00:51:19 [conn7721] end connection 127.0.0.1:55807 (4 connections now open)
Tue Oct  9 00:51:20 [mongosMain] connection accepted from 127.0.0.1:43635 #7729 (5 connections now open)
Tue Oct  9 00:51:43 [mongosMain] connection accepted from 127.0.0.1:44458 #7730 (6 connections now open)
Tue Oct  9 00:51:57 [conn7728] end connection 127.0.0.1:40274 (5 connections now open)
Tue Oct  9 00:51:57 [mongosMain] connection accepted from 127.0.0.1:45427 #7731 (6 connections now open)
Tue Oct  9 00:52:54 [Balancer] warning: distributed lock 'balancer/ip-10-174-183-254:27017:1349418400:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent  ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('507375615b5aac7653e146e7') } update: { $set: { state: 1, who: "ip-10-174-183-254:27017:1349418400:1804289383:Balancer:1793813364", process: "ip-10-174-183-254:27017:1349418400:1804289383", when: new Date(1349743974432), why: "doing balance round", ts: ObjectId('507375660fa8ef3a64f0699e') } } gle1: { updatedExisting: true, n: 1, connectionId: 163862, waited: 2, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 213070, waited: 20, err: null, ok: 1.0 }
Tue Oct  9 00:52:54 [Balancer] lock update lost, lock 'balancer/ip-10-174-183-254:27017:1349418400:1804289383' not propagated.
Tue Oct  9 00:52:54 [Balancer] distributed lock 'balancer/ip-10-174-183-254:27017:1349418400:1804289383' was not acquired.
Tue Oct  9 00:53:00 [LockPinger] trying to delete 1 old lock entries for process ip-10-174-183-254:27017:1349418400:1804289383
Tue Oct  9 00:53:00 [LockPinger] handled late remove of old distributed lock with ts 507375660fa8ef3a64f0699e
Tue Oct  9 00:53:35 [conn7730] end connection 127.0.0.1:44458 (5 connections now open)
Tue Oct  9 00:53:40 [mongosMain] connection accepted from 127.0.0.1:49666 #7732 (6 connections now open)
Tue Oct  9 00:53:52 [conn7729] end connection 127.0.0.1:43635 (5 connections now open)
Tue Oct  9 00:54:01 [mongosMain] connection accepted from 127.0.0.1:50570 #7733 (6 connections now open)
Tue Oct  9 00:54:26 [conn7731] end connection 127.0.0.1:45427 (5 connections now open)
Tue Oct  9 00:54:31 [mongosMain] connection accepted from 127.0.0.1:51793 #7734 (6 connections now open)
Tue Oct  9 00:54:55 [mongosMain] connection accepted from 127.0.0.1:52447 #7735 (7 connections now open)
Tue Oct  9 00:55:02 [LockPinger] cluster silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 pinged successfully at Tue Oct  9 00:55:02 2012 by distributed lock pinger 'silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019/ip-10-174-183-254:27017:1349418400:1804289383', sleeping for 30000ms
Tue Oct  9 00:56:11 [conn7732] end connection 127.0.0.1:49666 (6 connections now open)
Tue Oct  9 00:56:30 [mongosMain] connection accepted from 127.0.0.1:56323 #7736 (7 connections now open)
Tue Oct  9 00:56:59 [conn7734] end connection 127.0.0.1:51793 (6 connections now open)
Tue Oct  9 00:57:00 [mongosMain] connection accepted from 127.0.0.1:57791 #7737 (7 connections now open)
Tue Oct  9 00:58:02 [conn7735] end connection 127.0.0.1:52447 (6 connections now open)
Tue Oct  9 00:58:06 [mongosMain] connection accepted from 127.0.0.1:60607 #7738 (7 connections now open)
Tue Oct  9 00:58:49 [conn7736] end connection 127.0.0.1:56323 (6 connections now open)
Tue Oct  9 00:58:58 [mongosMain] connection accepted from 127.0.0.1:34244 #7739 (7 connections now open)
Tue Oct  9 00:59:04 [conn7737] end connection 127.0.0.1:57791 (6 connections now open)
Tue Oct  9 00:59:04 [conn7733] end connection 127.0.0.1:50570 (5 connections now open)
Tue Oct  9 00:59:04 [conn7738] end connection 127.0.0.1:60607 (4 connections now open)
Tue Oct  9 00:59:04 [conn7739] end connection 127.0.0.1:34244 (3 connections now open)
Tue Oct  9 00:59:04 [conn7717] end connection 127.0.0.1:49790 (2 connections now open)
Tue Oct  9 00:59:06 [mongosMain] connection accepted from 127.0.0.1:34583 #7740 (3 connections now open)

Kev

Oct 9, 2012, 2:22:45 PM10/9/12
to mongod...@googlegroups.com
I tried messing around again with restarting the balancer so I did the following:
1.  sh.setBalancerState(false)
2.  Waited about 15 min
3.  Restarted every mongos
4.  Waited another 15 min
5.  sh.setBalancerState(true)

The logs look a little different, though still nothing is migrating.  

Tue Oct  9 18:19:20 [conn28] end connection 127.0.0.1:32917 (5 connections now open)
Tue Oct  9 18:19:52 [mongosMain] connection accepted from 127.0.0.1:39823 #31 (6 connections now open)
Tue Oct  9 18:20:24 [Balancer] distributed lock 'balancer/ip-10-161-7-17:27017:1349806072:1144614846' acquired, ts : 50746ae70f3bc9ac81018ad5
Tue Oct  9 18:20:26 [conn29] resetting shard version of crittercism.app on acai-3.crittercism.com:27018, version is zero
Tue Oct  9 18:20:26 [conn29] resetting shard version of crittercism.app on blueberry-3.crittercism.com:27018, version is zero
Tue Oct  9 18:20:26 [conn29] resetting shard version of crittercism.app on citron-3.crittercism.com:27018, version is zero
Tue Oct  9 18:20:26 [conn29] resetting shard version of crittercism.app on tomato-3.crittercism.com:27018, version is zero
Tue Oct  9 18:20:26 [conn29] resetting shard version of crittercism.app on olive-3.crittercism.com:27018, version is zero
Tue Oct  9 18:20:26 [conn29] resetting shard version of crittercism.app on mango-3.crittercism.com:27018, version is zero
Tue Oct  9 18:20:26 [Balancer]  ns: crittercism.rawcrashlog going to move { _id: "crittercism.rawcrashlog-unsymbolized_hash_"bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526"_id_ObjectId('503ad114f2318d1995000297')", lastmod: Timestamp 28866000|1, lastmodEpoch: ObjectId('000000000000000000000000'), ns: "crittercism.rawcrashlog", min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, shard: "crittercism" } from: crittercism to: crittercism4 tag []
Tue Oct  9 18:20:26 [Balancer]  ns: crittercism.symbolicatedcrashlog going to move { _id: "crittercism.symbolicatedcrashlog-unsymbolized_hash_"0a52c208a1c2320bdf179b7f2d4352a71b39fb9f4f9ac5111ca1bc7f"", lastmod: Timestamp 573000|1, ns: "crittercism.symbolicatedcrashlog", min: { unsymbolized_hash: "0a52c208a1c2320bdf179b7f2d4352a71b39fb9f4f9ac5111ca1bc7f" }, max: { unsymbolized_hash: "0a9f8f71b18aa8aa073b9af9bc220a67dfec2b9614c3f0afc026cdc4" }, shard: "crittercism4" } from: crittercism4 to: crittercism10 tag []
Tue Oct  9 18:20:27 [Balancer]  ns: crittercism.crash_log going to move { _id: "crittercism.crash_log-hash_"e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979"_id_ObjectId('500b46d7e230974770000054')", lastmod: Timestamp 2400000|1, lastmodEpoch: ObjectId('000000000000000000000000'), ns: "crittercism.crash_log", min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, shard: "crittercism" } from: crittercism to: crittercism9 tag []
Tue Oct  9 18:20:27 [Balancer]  ns: crittercism.crash_analytics going to move { _id: "crittercism.crash_analytics-day_id_ObjectId('4f779a800000000000000000')hash_"ffb66412df4cf23d23a51aa4340911861e8da00875464b5df2e79ece"", lastmod: Timestamp 162000|1, lastmodEpoch: ObjectId('000000000000000000000000'), ns: "crittercism.crash_analytics", min: { day_id: ObjectId('4f779a800000000000000000'), hash: "ffb66412df4cf23d23a51aa4340911861e8da00875464b5df2e79ece" }, max: { day_id: ObjectId('4f7a3d800000000000000000'), hash: "6fdccd2109ad06bb6c162e42b1d17f2d995f905602ecbabf831624fa" }, shard: "crittercism7" } from: crittercism7 to: crittercism9 tag []
Tue Oct  9 18:20:27 [Balancer]  ns: crittercism.handled_exception_log going to move { _id: "crittercism.handled_exception_log-unsymbolized_hash_"225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9"_id_ObjectId('5049340ccd2b6e79a6000298')", lastmod: Timestamp 7215000|1, lastmodEpoch: ObjectId('000000000000000000000000'), ns: "crittercism.handled_exception_log", min: { unsymbolized_hash: "225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9", _id: ObjectId('5049340ccd2b6e79a6000298') }, max: { unsymbolized_hash: "225440a5b53f8b748b036fd9f739d05b96b32cc5143d992eb24b7aa9", _id: ObjectId('50493b1bdcd8b022d600009e') }, shard: "crittercism10" } from: crittercism10 to: crittercism4 tag []
Tue Oct  9 18:20:27 [Balancer]  ns: crittercism.user_crash going to move { _id: "crittercism.user_crash-hashed_device_id_MinKeyapp_version_MinKeyerror_hash_MinKey", lastmod: Timestamp 2000|12, lastmodEpoch: ObjectId('506a1657d5dc72ee5f03460c'), ns: "crittercism.user_crash", min: { hashed_device_id: MinKey, app_version: MinKey, error_hash: MinKey }, max: { hashed_device_id: "0000460e-b541-3e73-92f6-45644b1a7951", app_version: "3.2.1", error_hash: "e035313ce9ebbec5f9effb44b4550d0a3d4ad77d4a1e796a9a08acf0" }, shard: "crittercism" } from: crittercism to: crittercism1 tag []
Tue Oct  9 18:20:27 [Balancer] moving chunk ns: crittercism.rawcrashlog moving ( ns:crittercism.rawcrashlog at: crittercism:crittercism/apple-2.crittercism.com:27018,apple-3.crittercism.com:27018,apple-4.crittercism.com:27018 lastmod: 28866|1||000000000000000000000000 min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') } max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }) crittercism:crittercism/apple-2.crittercism.com:27018,apple-3.crittercism.com:27018,apple-4.crittercism.com:27018 -> crittercism4:crittercism4/grape-2.crittercism.com:27018,grape-3.crittercism.com:27018,grape-4.crittercism.com:27018
Tue Oct  9 18:20:29 [Balancer] moveChunk result: { cause: { errmsg: "migrate already in progress", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: migrate already in progress", ok: 0.0 }
Tue Oct  9 18:20:30 [Balancer] ChunkManager: time to load chunks for crittercism.rawcrashlog: 684ms sequenceNumber: 18 version: 28866|2801||000000000000000000000000 based on: 28866|2797||000000000000000000000000
Tue Oct  9 18:20:30 [Balancer]   Assertion failure n == a src/mongo/s/shard.h 105
0x552271 0x5fcf6d 0x6d0a79 0x6e5b88 0x5ae60f 0x6cf4f6 0x6e09ac 0x807c15 0x80fd76 0x6b5f3e 0x6b75c4 0x61e189 0x7fefc6a72efc 0x7fefc5e03f4d 
 /usr/local/mongodb/bin/mongos(_ZN5mongo15printStackTraceERSo+0x21) [0x552271]
 /usr/local/mongodb/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0x5fcf6d]
 /usr/local/mongodb/bin/mongos(_ZN5mongo17ChunkRangeManager12_insertRangeESt23_Rb_tree_const_iteratorISt4pairIKNS_7BSONObjEN5boost10shared_ptrIKNS_5ChunkEEEEESB_+0x539) [0x6d0a79]
 /usr/local/mongodb/bin/mongos(_ZN5mongo12ChunkManager18loadExistingRangesERKSs+0x788) [0x6e5b88]
 /usr/local/mongodb/bin/mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsbb+0x4ef) [0x5ae60f]
 /usr/local/mongodb/bin/mongos(_ZNK5mongo12ChunkManager6reloadEb+0x96) [0x6cf4f6]
 /usr/local/mongodb/bin/mongos(_ZNK5mongo5Chunk13moveAndCommitERKNS_5ShardExbRNS_7BSONObjE+0xf2c) [0x6e09ac]
 /usr/local/mongodb/bin/mongos(_ZN5mongo8Balancer11_moveChunksEPKSt6vectorIN5boost10shared_ptrINS_11MigrateInfoEEESaIS5_EEb+0x415) [0x807c15]
 /usr/local/mongodb/bin/mongos(_ZN5mongo8Balancer3runEv+0x976) [0x80fd76]
 /usr/local/mongodb/bin/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbe) [0x6b5f3e]
 /usr/local/mongodb/bin/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x6b75c4]
 /usr/local/mongodb/bin/mongos() [0x61e189]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7fefc6a72efc]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fefc5e03f4d]
Tue Oct  9 18:20:31 [Balancer] distributed lock 'balancer/ip-10-161-7-17:27017:1349806072:1144614846' unlocked. 
Tue Oct  9 18:20:31 [Balancer] scoped connection to silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 not being returned to the pool
Tue Oct  9 18:20:31 [Balancer] caught exception while doing balance: assertion src/mongo/s/shard.h:105
Tue Oct  9 18:20:37 [Balancer] SyncClusterConnection connecting to [silva.crittercism.com:27019]
Tue Oct  9 18:20:37 [Balancer] SyncClusterConnection connecting to [kompany.crittercism.com:27019]
Tue Oct  9 18:20:37 [Balancer] SyncClusterConnection connecting to [balotelli.crittercism.com:27019]
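Since my first theory was a stale distributed lock, here is one way to check for that directly from a mongos (a sketch, assuming the standard config.locks metadata collection the balancer and moveChunk use; state 0 is unlocked, anything greater is held or contended):

```javascript
// mongos shell -- look for held or contended distributed locks
use config
db.locks.find({ state: { $gt: 0 } }).pretty()   // any lock currently held?
db.locks.find({ _id: "balancer" }).pretty()     // the balancer's own lock entry
```

Nothing there looked stale to me, but I may be misreading it.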

Kev

Oct 9, 2012, 2:24:07 PM
to mongod...@googlegroups.com
Oh, and here are the corresponding mongod logs from the same time:

Tue Oct  9 18:22:59 [conn204948] created new distributed lock for crittercism.rawcrashlog on silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Tue Oct  9 18:22:59 [conn204948] distributed lock 'crittercism.rawcrashlog/ip-10-188-20-57:27018:1347666812:127904054' acquired, ts : 50746b838d8e9f9ae99d6014
Tue Oct  9 18:22:59 [conn204948] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-09T18:22:59-486414", server: "ip-10-188-20-57", clientAddr: "10.161.7.17:49260", time: new Date(1349806979450), what: "moveChunk.start", ns: "crittercism.rawcrashlog", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, from: "crittercism", to: "crittercism4" } }
Tue Oct  9 18:22:59 [conn204948] moveChunk request accepted at version 28866|2785||000000000000000000000000
Tue Oct  9 18:22:59 [conn204948] moveChunk number of documents: 1789
Tue Oct  9 18:22:59 [conn204948] distributed lock 'crittercism.rawcrashlog/ip-10-188-20-57:27018:1347666812:127904054' unlocked. 
Tue Oct  9 18:22:59 [conn204948] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-09T18:22:59-486415", server: "ip-10-188-20-57", clientAddr: "10.161.7.17:49260", time: new Date(1349806979625), what: "moveChunk.from", ns: "crittercism.rawcrashlog", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, step1 of 6: 0, step2 of 6: 259, note: "aborted" } }
Tue Oct  9 18:22:59 [conn204948] command admin.$cmd command: { moveChunk: "crittercism.rawcrashlog", from: "crittercism/apple-2.crittercism.com:27018,apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism4/grape-2.crittercism.com:27018,grape-3.crittercism.com:27018,grape-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism4", min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.rawcrashlog-unsymbolized_hash_"bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526"_id_ObjectId('503ad114f2318d1995000297')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false } ntoreturn:1 keyUpdates:0 locks(micros) r:2636 reslen:199 393ms
Tue Oct  9 18:23:05 [conn205071] received moveChunk request: { moveChunk: "crittercism.rawcrashlog", from: "crittercism/apple-2.crittercism.com:27018,apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism4/grape-2.crittercism.com:27018,grape-3.crittercism.com:27018,grape-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism4", min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.rawcrashlog-unsymbolized_hash_"bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526"_id_ObjectId('503ad114f2318d1995000297')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false }
Tue Oct  9 18:23:05 [conn205071] created new distributed lock for crittercism.rawcrashlog on silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Tue Oct  9 18:23:06 [conn206374] end connection 10.179.39.230:37302 (1416 connections now open)
Tue Oct  9 18:23:06 [initandlisten] connection accepted from 10.179.39.230:37305 #206377 (1417 connections now open)
Tue Oct  9 18:23:06 [conn206252] update crittercism.user_crash query: { app_version: "1.2.1", hashed_device_id: "506fd29adac522e7cb3bb96ade36d260e6da07f8d3d8530093b4096798ae2f56g", error_hash: "6848b79d508962477d35ac877ac4eb6ce9b707bf7ce464f6a64f98b2" } update: { $inc: { count: 1 } } nscanned:0 nupdated:1 fastmodinsert:1 keyUpdates:0 locks(micros) w:457007 228ms
Tue Oct  9 18:23:06 [conn205071] distributed lock 'crittercism.rawcrashlog/ip-10-188-20-57:27018:1347666812:127904054' acquired, ts : 50746b898d8e9f9ae99d6089
Tue Oct  9 18:23:06 [conn205071] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-09T18:23:06-486416", server: "ip-10-188-20-57", clientAddr: "10.160.10.224:52449", time: new Date(1349806986838), what: "moveChunk.start", ns: "crittercism.rawcrashlog", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, from: "crittercism", to: "crittercism4" } }
Tue Oct  9 18:23:06 [conn205071] moveChunk request accepted at version 28866|2785||000000000000000000000000
Tue Oct  9 18:23:06 [conn205071] moveChunk number of documents: 1789
Tue Oct  9 18:23:07 [conn205071] distributed lock 'crittercism.rawcrashlog/ip-10-188-20-57:27018:1347666812:127904054' unlocked. 
Tue Oct  9 18:23:07 [conn205071] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-09T18:23:07-486417", server: "ip-10-188-20-57", clientAddr: "10.160.10.224:52449", time: new Date(1349806987013), what: "moveChunk.from", ns: "crittercism.rawcrashlog", details: { min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, step1 of 6: 2, step2 of 6: 1038, note: "aborted" } }
Tue Oct  9 18:23:07 [conn205071] command admin.$cmd command: { moveChunk: "crittercism.rawcrashlog", from: "crittercism/apple-2.crittercism.com:27018,apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism4/grape-2.crittercism.com:27018,grape-3.crittercism.com:27018,grape-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism4", min: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503ad114f2318d1995000297') }, max: { unsymbolized_hash: "bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526", _id: ObjectId('503e047d209996162c0002b0') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.rawcrashlog-unsymbolized_hash_"bad232c81412ae200d88cda6ca0bf7ed4cfcab6514895b061f2c5526"_id_ObjectId('503ad114f2318d1995000297')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false } ntoreturn:1 keyUpdates:0 locks(micros) r:2574 reslen:199 1207ms
Tue Oct  9 18:23:10 [conn205071] received moveChunk request: { moveChunk: "crittercism.crash_log", from: "crittercism/apple-2.crittercism.com:27018,apple-3.crittercism.com:27018,apple-4.crittercism.com:27018", to: "crittercism9/acai-3.crittercism.com:27018,acai-4.crittercism.com:27018", fromShard: "crittercism", toShard: "crittercism9", min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, maxChunkSizeBytes: 67108864, shardId: "crittercism.crash_log-hash_"e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979"_id_ObjectId('500b46d7e230974770000054')", configdb: "silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019", secondaryThrottle: false }
Tue Oct  9 18:23:10 [conn205071] created new distributed lock for crittercism.crash_log on silva.crittercism.com:27019,kompany.crittercism.com:27019,balotelli.crittercism.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Tue Oct  9 18:23:11 [conn205071] distributed lock 'crittercism.crash_log/ip-10-188-20-57:27018:1347666812:127904054' acquired, ts : 50746b8e8d8e9f9ae99d60e4
Tue Oct  9 18:23:11 [conn205071] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-09T18:23:11-486418", server: "ip-10-188-20-57", clientAddr: "10.160.10.224:52449", time: new Date(1349806991062), what: "moveChunk.start", ns: "crittercism.crash_log", details: { min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, from: "crittercism", to: "crittercism9" } }
Tue Oct  9 18:23:11 [conn205071] moveChunk request accepted at version 2400|53||000000000000000000000000
Tue Oct  9 18:23:11 [conn205071] moveChunk number of documents: 9851
Tue Oct  9 18:23:11 [conn205071] distributed lock 'crittercism.crash_log/ip-10-188-20-57:27018:1347666812:127904054' unlocked. 
Tue Oct  9 18:23:11 [conn205071] about to log metadata event: { _id: "ip-10-188-20-57-2012-10-09T18:23:11-486419", server: "ip-10-188-20-57", clientAddr: "10.160.10.224:52449", time: new Date(1349806991237), what: "moveChunk.from", ns: "crittercism.crash_log", details: { min: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500b46d7e230974770000054') }, max: { hash: "e364999fc1280323e3f1070c6322c818c2d27758049a4a3133170979", _id: ObjectId('500bf0351644321276000477') }, step1 of 6: 0, step2 of 6: 649, note: "aborted" } }
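The aborted migrations that these "about to log metadata event" lines record can also be pulled back out of the cluster metadata afterwards (a sketch, assuming the standard config.changelog collection those events are written to):

```javascript
// mongos shell -- most recent migration-related events cluster-wide
use config
db.changelog.find({ what: /moveChunk/ })
            .sort({ time: -1 })
            .limit(10)
            .pretty()
```

In my case every recent moveChunk.from entry carries note: "aborted", matching the logs above.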