I'm getting very poor performance trying to insert a large number of documents into my analytics system. Looking at the mongos logs I see a lot of lines with a multi-second "time to load chunks", and a lot of "warning: splitChunk failed" messages because the collection's metadata lock is taken. The auto-balancer is turned off.

I suspect the indexes are too big to fit in memory and the shards are thrashing trying to load index data from disk, but I would appreciate any advice on the quickest way to relieve this. I'm already on the highest-memory instances Amazon offers. Is my only option to add more shards? Re-balancing would take a very long time. There are no updates or queries happening, only inserts.

A sample of the mongos log:
Mon Jun 11 06:47:43 [conn246] ChunkManager: time to load chunks for pb3.hourly_stats: 3156ms sequenceNumber: 133 version: 38916|395
Mon Jun 11 06:47:45 [mongosMain] connection accepted from 127.0.0.1:35566 #259
Mon Jun 11 06:47:47 [conn254] ChunkManager: time to load chunks for pb3.hourly_stats: 3480ms sequenceNumber: 134 version: 38916|395
Mon Jun 11 06:47:51 [conn247] ChunkManager: time to load chunks for pb3.hourly_stats: 4184ms sequenceNumber: 135 version: 38916|397
Mon Jun 11 06:47:52 [mongosMain] connection accepted from 127.0.0.1:35567 #260
Mon Jun 11 06:47:54 [mongosMain] connection accepted from 127.0.0.1:35569 #261
Mon Jun 11 06:47:54 [mongosMain] connection accepted from 127.0.0.1:35570 #262
Mon Jun 11 06:47:55 [conn256] ChunkManager: time to load chunks for pb3.hourly_stats: 3312ms sequenceNumber: 136 version: 38916|397
Mon Jun 11 06:47:58 [conn248] ChunkManager: time to load chunks for pb3.hourly_stats: 3151ms sequenceNumber: 137 version: 38916|399
Mon Jun 11 06:48:00 [mongosMain] connection accepted from 127.0.0.1:35571 #263
Mon Jun 11 06:48:02 [conn257] ChunkManager: time to load chunks for pb3.hourly_stats: 3238ms sequenceNumber: 138 version: 38916|399
Mon Jun 11 06:48:07 [conn258] ChunkManager: time to load chunks for pb3.hourly_stats: 4293ms sequenceNumber: 139 version: 38916|401
Mon Jun 11 06:48:09 [mongosMain] connection accepted from 127.0.0.1:35573 #264
Mon Jun 11 06:48:13 [conn246] ChunkManager: time to load chunks for pb3.hourly_stats: 6308ms sequenceNumber: 140 version: 38916|403
Mon Jun 11 06:48:17 [conn260] ChunkManager: time to load chunks for pb3.hourly_stats: 3625ms sequenceNumber: 141 version: 38916|403
Mon Jun 11 06:48:21 [conn261] ChunkManager: time to load chunks for pb3.hourly_stats: 3283ms sequenceNumber: 142 version: 38916|405
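
For reference, the balancer state and the size of the chunk metadata can be confirmed from the mongos along these lines (a sketch of the shell session against the config database, not the exact one I ran):

mongos> use config
mongos> db.settings.findOne({ _id : "balancer" })              // expect something like { "_id" : "balancer", "stopped" : true }
mongos> db.chunks.find({ ns : "pb3.hourly_stats" }).count()    // ~196k chunks, matching nchunks in the stats below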
mongostat doesn't show anything locked because there's no other activity, but I see one of the three shards (which one rotates over time) taking a lot of faults. These are the 64 GB Amazon instances with RAID 10. mongostat output:
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 1 0 4057g 8116g 1.4g 11 0 0 1|24 1|23 62b 1k 78 06:51:15
0 0 0 0 0 2 0 4057g 8116g 1.4g 10 0 0 1|24 1|23 329b 1k 78 06:51:16
0 0 0 0 0 2 0 4057g 8116g 1.4g 13 0 0 1|24 1|23 329b 1k 78 06:51:17
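
To see the per-shard picture, the same numbers can be pulled from each shard's mongod directly; roughly this (a sketch; mem and extra_info are standard serverStatus sections, and page_faults is the Linux counter behind mongostat's faults column):

> db.serverStatus().mem          // resident is a tiny fraction of what's mapped on each shard
> db.serverStatus().extra_info   // page_faults climbs on whichever shard is currently taking inserts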
mongos> db.hourly_stats.stats()
{
"sharded" : true,
"flags" : 0,
"ns" : "pb3.hourly_stats",
"count" : 10869908568,
"numExtents" : 3787,
"size" : NumberLong("7012767684972"),
"storageSize" : NumberLong("8128240079712"),
"totalIndexSize" : NumberLong("3085999411712"),
"indexSizes" : {
"_id_" : NumberLong("1972366499264"),
},
"avgObjSize" : 645.1542477198875,
"nindexes" : 2,
"nchunks" : 195991,
"shards" : {
"shard0000" : {
"ns" : "pb3.hourly_stats",
"count" : 3560547991,
"size" : NumberLong("2332729783592"),
"avgObjSize" : 655.160326300458,
"storageSize" : NumberLong("2474551719936"),
"numExtents" : 1153,
"nindexes" : 2,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 926457716800,
"indexSizes" : {
"_id_" : 569445326576,
"log_file_name_1" : 357012390224
},
"ok" : 1
},
"shard0001" : {
"ns" : "pb3.hourly_stats",
"size" : NumberLong("2338500244900"),
"avgObjSize" : 639.2576539549224,
"storageSize" : NumberLong("2401851660816"),
"numExtents" : 1119,
"nindexes" : 2,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 1074908400384,
"indexSizes" : {
"_id_" : 695319768528,
"log_file_name_1" : 379588631856
},
"ok" : 1
},
"shard0002" : {
"ns" : "pb3.hourly_stats",
"count" : 3651210799,
"size" : NumberLong("2341537656480"),
"avgObjSize" : 641.3044289640314,
"storageSize" : NumberLong("3251836698960"),
"numExtents" : 1515,
"nindexes" : 2,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 1084633294528,
"indexSizes" : {
"_id_" : 707601404160,
"log_file_name_1" : 377031890368
},
"ok" : 1
}
},
"ok" : 1
}
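
For scale, a quick back-of-the-envelope from the stats above (plain shell arithmetic):

mongos> 3085999411712 / 3 / Math.pow(1024, 3)   // ≈ 958 GB of index per shard on average
mongos> 926457716800 / Math.pow(1024, 3)        // ≈ 863 GB on the smallest shard, vs ~64 GB of RAM

Even the _id_ index alone is roughly 530-660 GB per shard, nearly an order of magnitude more than RAM, which lines up with the fault pattern above.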