Speeding up chunk moving between shards

Beier

Jun 28, 2011, 5:46:25 PM
to mongodb-user
I started sharding a big collection with about 500 million documents, 150GB in size. The router divided the collection into over 5000 chunks and started moving them from the primary shard to the new one. The problem is that each chunk seems to take 20-30 minutes to move, which is a really long time considering the chunk size is only 64MB. Why is it so slow, and is there any way to speed it up? At this rate it's going to take a month to get balanced.
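
(For reference: balancing progress like this can be tracked from a mongos shell by counting chunks per shard in the config database. A minimal sketch; the namespace below is the placeholder name that appears later in this thread, and the counts in the comment are only illustrative.)

var counts = {};
db.getSiblingDB("config").chunks
  .find({ ns: "myDatabaseName.myCollectionName" })   // substitute your own namespace
  .forEach(function (c) { counts[c.shard] = (counts[c.shard] || 0) + 1; });
printjson(counts);   // e.g. { "rs_z": 4809, "rs_y": 191 } after 191 chunks have moved

db.printShardingStatus() gives a similar overview, but its output gets very long with thousands of chunks.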

mongodb 1.8.1
Two-shard setup; each shard is a replica set with 3 servers (1
primary, 1 secondary, and 1 hidden), plus 3 config servers.

Thanks,

Eliot Horowitz

Jun 28, 2011, 8:24:29 PM
to mongod...@googlegroups.com
Are other things taking place at the same time?
If so, 1.8.2 will speed things up a bit.

Can you send mongostat and iostat output so we can look for a bottleneck?

Greg Studer

Jun 29, 2011, 12:38:56 PM
to mongod...@googlegroups.com
Are you performing updates on the collection as you're moving chunks?
One possible cause is that it's taking a while to reach a steady state
before committing the migration: the donor has to copy the chunk's
documents and then catch up on any concurrent changes before the move
can commit.

Beier

Jun 30, 2011, 4:49:54 PM
to mongodb-user
Sorry, a correction: the data size is 240GB. I've heard there might be
issues if the initial data is over 256GB, but we are under that. And
we use the Amazon EC2 m2.2xlarge instance type for all non-config servers
(32GB memory, 12 EC2 compute units on 4 cores).

There is no extra load on any of the sharded servers other than the
chunk moves; we are waiting for the balancing to finish before hooking
it up to production.

The whole data set was imported from MySQL to the primary shard in
about 2 days (6 threads doing about 3000 inserts/sec, synchronous
inserts), so the slowness of moving chunks surprised me.

It's now been 2 days, and only 191 of the 5000 chunks have been moved
to the second shard.

Below are the iostat and mongostat logs you requested, captured
1pm-1:30pm PST today. I noticed that the primary shard, which contains
the majority of the data, has been very busy in terms of I/O, much
busier than the new shard. Is that the expected behavior?

http://dl.dropbox.com/u/282992/iostat_primary_shard.log
http://dl.dropbox.com/u/282992/mongostat_primary_shard.log

http://dl.dropbox.com/u/282992/iostat_secondary_shard.log
http://dl.dropbox.com/u/282992/mongostat_secondary_shard.log

Beier

Jun 30, 2011, 5:07:49 PM
to mongodb-user
Some more info that might be helpful.

All mongod servers use a single non-RAID EBS volume.

The initial import was done against a single non-sharded, non-replicated
mongod server. When it finished, we built a replica set and let
replication finish. Then we built a new replica set as a second shard
and started sharding.

Greg Studer

Jun 30, 2011, 10:42:21 PM
to Beier, mongodb-user
There's a high lock % which goes in phases in your primary shard's
mongostat log - this is probably the deletion of old chunks, which can
interfere with your chunk movement if it's constant. I'm looking to see
if there's a build with less aggressive chunk deletion, but in the
meantime, can you also send us a few db.currentOp()s to verify and
ensure there's nothing else going on? Feel free to open a support
ticket if you like.

Using a RAID setup would probably help here, as would doing a
pre-split before loading the data initially. It depends on your
timeline, but since the initial load was very fast, it may be quicker
for you to pre-split and reload the data into both shards at once.
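
(A rough sketch of what collecting those db.currentOp()s might look like, run in the mongo shell against the donor shard's primary. The msg and secs_running field names follow typical currentOp() output and may vary by server version.)

db.currentOp().inprog.forEach(function (op) {
    if (!op.active) return;                                   // skip idle connections
    var migration = op.msg && /moveChunk|migrate|clean/i.test(op.msg);
    if (migration || (op.secs_running && op.secs_running > 5)) {
        printjson({ opid: op.opid, op: op.op, ns: op.ns,
                    msg: op.msg, secs_running: op.secs_running });
    }
});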

Beier

Jul 4, 2011, 4:28:58 PM
to mongodb-user
Thanks for the suggestion about pre-splitting; I will definitely try
that out. But it looks like we have a new issue here.

The chunk movement stopped a couple of days ago (so currentOp() no
longer shows splitchunk) because it got stuck at a splitVector
operation: the chunk is over the 64MB limit (about 80MB). It has been
retrying to move the same chunk since then, but can't move on. I could
increase the chunk size to something like 128MB, but there is no
guarantee all chunks are under 128MB. Another option is to change the
shard key (for example, a compound key with the original shard key in
it), but then our application may end up hitting multiple shards on
every query. Any other good options?
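
(For what it's worth, a minimal sketch of the chunk-size change mentioned above, run from a mongos shell; the threshold is stored in megabytes in the config database's settings collection.)

var config = db.getSiblingDB("config");
config.settings.save({ _id: "chunksize", value: 128 });   // raise the 64MB default to 128MB

Note that this only raises the split/move threshold; a chunk whose documents all share a single shard key value can never be split, no matter how large the setting is.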

Beier

Jul 4, 2011, 4:42:24 PM
to mongodb-user
Here is a typical, repeated log entry from when it tries to move that
stuck chunk:

Mon Jul 4 13:34:49 [conn142] received moveChunk request: { moveChunk:
"myDatabaseName.myCollectionName", from: "rs_z/
mgoDz1.ourdomainname.com:27018,mgoDz2.ourdomainname.com:27018", to:
"rs_y/mgoDy1.ourdomainname.com:27018,mgoDy2.ourdomainname.com:27018",
min: { shardKeyInCol: 76586 }, max: { shardKeyInCol: 76587.0 },
maxChunkSizeBytes: 67108864, shardId: "myDatabaseName.myCollectionName-
shardKeyInCol_76586", configdb: "mgocf1.ourdomainname.com:
27029,mgocf2.ourdomainname.com:27029,mgocf3.myDatabaseName..." }
Mon Jul 4 13:34:49 [conn142] about to log metadata event: { _id:
"mgoDz1.ourdomainname.com-2011-07-04T20:34:49-58283", server:
"mgoDz1.ourdomainname.com", clientAddr: "10.68.63.252:34410", time:
new Date(1309811689555), what: "moveChunk.start", ns:
"myDatabaseName.myCollectionName", details: { min: { shardKeyInCol:
76586 }, max: { shardKeyInCol: 76587.0 }, from: "rs_z", to: "rs_y" } }
Mon Jul 4 13:34:49 [conn142] moveChunk request accepted at version
235|3
Mon Jul 4 13:34:49 [conn142] warning: can't move chunk of size
(aprox) 88136224 because maximum size allowed to move is 67108864 ns:
myDatabaseName.myCollectionName { shardKeyInCol: 76586 } ->
{ shardKeyInCol: 76587.0 }
Mon Jul 4 13:34:49 [conn142] about to log metadata event: { _id:
"mgoDz1.ourdomainname.com-2011-07-04T20:34:49-58284", server:
"mgoDz1.ourdomainname.com", clientAddr: "10.68.63.252:34410", time:
new Date(1309811689913), what: "moveChunk.from", ns:
"myDatabaseName.myCollectionName", details: { min: { shardKeyInCol:
76586 }, max: { shardKeyInCol: 76587.0 }, step1: 0, step2: 228, note:
"aborted" } }
Mon Jul 4 13:34:50 [conn142] query admin.$cmd ntoreturn:1 command:
{ moveChunk: "myDatabaseName.myCollectionName", from: "rs_z/
mgoDz1.ourdomainname.com:27018,mgoDz2.ourdomainname.com:27018", to:
"rs_y/mgoDy1.ourdomainname.com:27018,mgoDy2.ourdomainname.com:27018",
min: { shardKeyInCol: 76586 }, max: { shardKeyInCol: 76587.0 },
maxChunkSizeBytes: 67108864, shardId: "myDatabaseName.myCollectionName-
shardKeyInCol_76586", configdb: "mgocf1.ourdomainname.com:
27029,mgocf2.ourdomainname.com:27029,mgocf3.myDatabaseName..." }
reslen:116 529ms
Mon Jul 4 13:34:50 [conn142] request split points lookup for chunk
myDatabaseName.myCollectionName { : 76586 } -->> { : 76587.0 }
Mon Jul 4 13:34:50 [conn142] limiting split vector to 250000 (from
213705206) objects
Mon Jul 4 13:34:50 [conn142] splitVector doing another cycle because
of force, keyCount now: 111848
Mon Jul 4 13:34:50 [conn142] warning: chunk is larger than
168399703088 bytes because of key { shardKeyInCol: 76586 }
Mon Jul 4 13:34:50 [conn142] warning: Finding the split vector for
myDatabaseName.myCollectionName over { shardKeyInCol: 1.0 } keyCount:
111848 numSplits: 0 lookedAt: 223696 took 231ms
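
(The splitVector warning above suggests that a very large number of documents share the single shard key value 76586, which would explain why no split point can be found. A rough way to confirm that, using the namespace and key from the log, run via mongos:)

var coll = db.getSiblingDB("myDatabaseName").myCollectionName;
// Count how many documents share the key value the failed split complains about.
print("docs with shardKeyInCol = 76586: " + coll.find({ shardKeyInCol: 76586 }).count());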

Eliot Horowitz

Jul 4, 2011, 5:48:21 PM
to mongod...@googlegroups.com
It should be splitting those, then moving.
Can you send a larger section of the log and the mongos log?
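
(One way to check whether that chunk can be split at all is to request a manual split through mongos - a sketch using the namespace and key from the log above. If every document in the chunk shares the single value 76586, there is no valid split point and the command will fail.)

db.adminCommand({
    split: "myDatabaseName.myCollectionName",   // split the chunk containing this key at its median
    find: { shardKeyInCol: 76586 }
});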

Beier

Jul 5, 2011, 7:43:47 PM
to mongodb-user
I found this article and discussion, which describe probably the exact
same issue. It looks like pre-splitting is the way to go. Unfortunately
our key of choice doesn't have a very predictable pattern, so we'll
have to do some estimating; we'll see how it goes.

http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
http://groups.google.com/group/mongodb-user/browse_thread/thread/63d13fbf6d699594/
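
(A minimal pre-splitting sketch along the lines of those posts, run from a mongos shell before reloading the data. The split points here are hypothetical estimates of the key distribution; the shard names rs_z and rs_y are the ones from the logs earlier in the thread.)

var ns = "myDatabaseName.myCollectionName";
var splitPoints = [10000, 20000, 30000, 40000];        // hypothetical estimated key values
var shards = ["rs_z", "rs_y"];

splitPoints.forEach(function (point, i) {
    // Create a chunk boundary at each estimated point while the collection is still empty...
    db.adminCommand({ split: ns, middle: { shardKeyInCol: point } });
    // ...and spread the resulting chunks across both shards up front,
    // so the balancer has little left to move after the reload.
    db.adminCommand({ moveChunk: ns, find: { shardKeyInCol: point }, to: shards[i % shards.length] });
});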