Forcing a Round Robin on Sharded Collections

Anatoly Polinsky

Jun 6, 2011, 4:35:31 PM

to mongodb-user

Is there a way to force mongo to do a real time round robin for
inserts?

We have pretty high write / insert requirements [speed-wise and
number-wise], and we are trying to avoid global ( mongod ) locks.

Thank you,
/Anatoly

P.S. Pre splitting alone does not do it. Pre splitting and moving is
not a good solution, since it needs to be done and re done ( as data
grows ) manually.

Greg Studer

Jun 6, 2011, 5:21:11 PM

to mongod...@googlegroups.com

Nothing built-in - assuming you're talking about a sharded setup, the
shards are organized by a particular shard key, which determines where
the insert will go. You can choose shard keys which distribute the data
evenly (random hash values, for example), but insert order isn't a
feature of sharding.

As your data grows, mongodb should be able to automatically migrate it
between servers using the balancer so there are fewer hotspots when you
add more machines.

Anatoly Polinsky

Jun 6, 2011, 5:32:55 PM

to mongodb-user

"which determines where the insert will go." => that is: where the
insert will go _eventually_, once a chunk size is reached and a
balancer has finished moving it.

We are looking for a _real time_ round robin, since if a high number
of records is thrown at mongoS, it will start writing into a single
shard, which by the way will block writes, since the application layer
can throw ( through the socket ) data faster than the mongo server side
can process it.

If sharding is not an answer to high write throughput, due to
constant mongoD write blocks + data that does not fit into memory in
full, then what is?

Thank you,
/Anatoly

Robert Moore

Jun 6, 2011, 10:22:09 PM

to mongod...@googlegroups.com

You could set up a pre-shard on a prefix of the shard key and make
sure that it is distributed uniformly across the shard servers. E.g., for
10 shards using a "string" for the shard key it could be as simple as 10
splits and moves.
"0" => shard0000
"1" => shard0001
...
"9" => shard0009

When you insert documents construct the shard key so each document uses
a round-robin prefix, "0<rest>", "1<rest>", ..., "9<rest>", "0<rest>", etc.

If you keep the relative volume of documents for each shard about the
same the chunks will split fairly uniformly and the balancer won't ever
find much work to do.
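The scheme described in this post could be sketched in Python roughly as follows. This is only an illustration under assumptions: the namespace `mydb.events`, the key field `key`, and the `doc-N` suffixes are made up, and the sketch only builds the `split` / `moveChunk` admin command documents (to be sent to a mongos through whatever driver you use) rather than running them. Check the command shapes against your server version.

```python
from itertools import cycle


def presplit_commands(namespace, key_field, shards):
    """Build split/moveChunk command documents, one key prefix per shard.

    Single-character prefixes "0".."9" only sort correctly for up to
    10 shards, so anything larger is rejected here.
    """
    if len(shards) > 10:
        raise ValueError("single-digit prefixes support at most 10 shards")
    commands = []
    for i, shard in enumerate(shards):
        prefix = str(i)
        if i > 0:
            # Splitting at "1", "2", ... leaves one chunk per prefix.
            commands.append({"split": namespace, "middle": {key_field: prefix}})
        # Move the chunk containing this prefix to its target shard.
        commands.append(
            {"moveChunk": namespace, "find": {key_field: prefix}, "to": shard}
        )
    return commands


def round_robin_keys(rests, num_prefixes):
    """Yield "0<rest>", "1<rest>", ..., cycling through the prefixes."""
    prefixes = cycle([str(i) for i in range(num_prefixes)])
    for prefix, rest in zip(prefixes, rests):
        yield prefix + rest


shards = ["shard%04d" % i for i in range(10)]  # shard0000 .. shard0009
plan = presplit_commands("mydb.events", "key", shards)  # hypothetical names
keys = list(round_robin_keys(["doc-%d" % n for n in range(12)], len(shards)))

for command in plan:
    print(command)
print(keys)
```

One thing to keep in mind with this approach: the prefix depends on insert order, not on the document itself, so finding a document by `<rest>` later means trying every prefix.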

Rob.

Andrew Armstrong

Jun 6, 2011, 11:21:47 PM

to mongodb-user

You could vote for https://jira.mongodb.org/browse/SERVER-2001
(supposedly a 1 day change!) which would let you hash a shard key,
providing 'even' distribution of your records (pretty much round
robin).

Alternatively you can do this manually by using a hash (md5, sha1,
etc.) of a random value as your shard key.
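A minimal sketch of the manual variant, assuming the application computes the hash itself before inserting. The field name `shard_key` and the use of a random UUID as the value being hashed are illustrative choices, not anything MongoDB requires.

```python
import hashlib
import uuid


def hashed_shard_key(value):
    """Return the md5 hex digest of a string, for use as a shard key value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical document: a random id, plus its hash stored as the shard key.
doc_id = str(uuid.uuid4())
document = {"_id": doc_id, "shard_key": hashed_shard_key(doc_id)}
print(document)
```

Since the digest looks uniformly random, consecutive inserts land in unrelated parts of the key space, at the cost of losing meaningful range queries on the shard key.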

- Andrew

Anatoly Polinsky

Jun 7, 2011, 2:03:19 PM

to mongodb-user

@Robert,

Yep, that is what we tried, and that is why I mentioned that
pre-splitting and moving chunks would not be a good solution, since as
data grows we would need to re-pre-split / re-move, which would take
a long time with TBs of data. But thanks for your reply.

@Andrew,

Just voted for the JIRA. But wouldn't implementing the hash manually
still write to a single mongoD until the balancer decides a chunk
size is reached and starts moving it across the shards?

Thank you,
/Anatoly


Greg Studer

Jun 8, 2011, 1:43:25 PM

to mongod...@googlegroups.com

What kind of behavior are you concerned about as you add more data?
Assuming you start with N shards, pre-splitting and using a manual hash
shard key (as Andrew and Robert suggested) can give you even writes
across all N shards, even when you very first start inserting data, with
no balancing required. Or are you more concerned about what happens
when you need to add a new N+1 shard after many inserts? The new shard
will start empty, so some amount of data movement will probably need to
take place if the shards are to stay balanced.

Anatoly Polinsky

Jun 8, 2011, 5:44:31 PM

to mongodb-user

@Greg,

Scaling out ( adding more shards ) becomes a difficult task, since
the shard key should reflect some "business partitioning", and if all
you need is to maintain that "business partitioning", but to scale
out, you would have to RE shard everything, given a different number
of shards. Plus of course to RE move everything, which with TBs of
data could take... a good while.

/Anatoly

Eliot Horowitz

Jun 9, 2011, 7:13:00 AM

to mongod...@googlegroups.com

I'm not sure what you mean by business partitioning.

Let's say you don't need to do range queries on the shard key.

You can take whatever key you want, and hash it.

That distributes reads/writes completely evenly throughout a cluster.

When you add a new shard (let's say going from 5 to 6), about 1/6 (roughly 16.7%) of the data from each shard has to move.

So if you have 1 TB per machine, roughly 167 GB per machine has to move.

Which is a lot, but not an insane amount.

And if you do it early enough, the system will do it slowly so as to have a low impact.
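The arithmetic behind those numbers can be sketched as below, assuming data is spread evenly before and after the new shard is added. The function names are made up for illustration, and GB here means decimal gigabytes (1 TB = 1000 GB).

```python
def fraction_to_move(old_shards):
    """Fraction of each existing shard's data that moves to one new shard.

    Each shard goes from holding 1/N of the total to 1/(N+1), so it gives
    up (1/N - 1/(N+1)) / (1/N) = 1/(N+1) of what it currently holds.
    """
    return 1.0 / (old_shards + 1)


def gb_to_move_per_shard(old_shards, gb_per_shard):
    """Data volume (GB) each existing shard hands over to the new shard."""
    return gb_per_shard * fraction_to_move(old_shards)


# Going from 5 to 6 shards with 1 TB (1000 GB) per machine.
print(fraction_to_move(5))
print(gb_to_move_per_shard(5, 1000))
```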


Greg Studer

Jun 9, 2011, 8:27:18 AM

to mongod...@googlegroups.com

MongoDB doesn't require resharding all the data when you add an
additional server; only a small fraction (roughly 1/N) of the documents
would need to be migrated initially (or if this is unacceptable, you might
redirect inserts temporarily by manipulating the shard keys generated). Chunks
don't need to be contiguous on a particular server (though mongodb does
try when possible). Do you have an additional requirement that your
documents be maintained on particular servers?

Anatoly Polinsky

Jun 10, 2011, 1:44:25 PM

to mongodb-user

@Eliot,

We do need to run range queries on the shard key, and I agree
200GB can be moved as part of an end-of-day batch.

But even if we did not, I still don't see how hashing a shard key
would result in: "_completely evenly_ throughout a cluster". I
understand how it would help to achieve a _more_ even distribution,
but a true round robin? Can you give an example with simple numbers
and a simple function?

@Greg,

Yes, I understand if a balancer is on, and auto sharding is used,
adding a shard is not a problem. However to achieve a true round
robin, we have to pre split and pre move chunks accordingly. And then
adding an additional shard will require us to re... split?

Thank you,
/Anatoly

Greg Studer

Jun 10, 2011, 5:30:00 PM

to mongod...@googlegroups.com

> robin, we have to pre split and pre move chunks accordingly. And then
> adding additional shard will require to re... split?

You'll need to perform some splits (a metadata operation, very fast),
and migrate about 1/(N+1) of the data from each shard to the new shard
(this is all automated by the balancer, but you can do it manually), as
Eliot mentioned. If you're using hashed shard keys, future writes will
be distributed very evenly across the whole cluster once this takes
place. The round-robin order won't really matter for throughput, so
long as on average you're "feeding" each shard inserts at slightly less
than the maximum rate it can process - it won't matter at all if there
are always a few writes in each shard's queue. Hashing ensures this
even write distribution on average.

If absolute maximum throughput is less important than the organization
of the data, you can let mongod migrate data in hotspots to new shards
(for example, if one user has a lot of inserts), which will eventually
lead to close to the same distribution. This is a trade-off - if you
need to maintain data ordering, you will need to move data to maintain
this ordering (but never *all* of it, just hotspot chunks).

Eliot Horowitz

Jun 10, 2011, 7:40:18 PM

to mongod...@googlegroups.com

Let's say you have a hash function that evenly distributes keys between 0 and 100.

Then you put ranges

0-25 shard a

25-50 shard b

50-75 shard c

75-100 shard d

If you insert 1000 documents randomly, they'll be evenly distributed between each shard.
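This example can be tried out with a few lines of Python. It is only a sketch under assumptions: md5 reduced modulo 100 stands in for "a hash function that evenly distributes keys", the `doc-N` keys are made up, and the ranges are treated as half-open (25 belongs to shard b).

```python
import hashlib
from collections import Counter

# Half-open ranges [low, high) mapped to shard names, as in the example.
RANGES = [(0, 25, "a"), (25, 50, "b"), (50, 75, "c"), (75, 100, "d")]


def hash_to_0_100(key):
    """Map a string key to an integer in [0, 100) via md5 (illustrative)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 100


def shard_for(key):
    """Return the shard whose range contains the hashed key."""
    hashed = hash_to_0_100(key)
    for low, high, shard in RANGES:
        if low <= hashed < high:
            return shard
    raise AssertionError("hash value outside all ranges")


# Count how 1000 hypothetical documents spread across the four shards.
counts = Counter(shard_for("doc-%d" % n) for n in range(1000))
print(sorted(counts.items()))
```

The counts will not be exactly 250 each; hashing gives an even split on average, not a strict rotation, which is the distinction being discussed here.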

Anatoly Polinsky

Jun 12, 2011, 3:30:21 PM

to mongodb-user

@Greg,

If everything is pre split and pre moved, and all shards have
data, once I add a new shard I can pre split, pre move again? Last
time I tried it, it gave a "shard already has data" error.

@Eliot,

Ok, so it is not a true Round Robin, but an "optimistic" one =>
they will be _eventually_ somewhat evenly distributed, plus a shard key is
usually more complex than a single number: e.g. may include date, some
business id, etc.

I am still unclear why a true Round Robin is hard to provide _as
an option_ out of the box just by inserting each batch of documents
into the "next" shard to increase throughput?

Thank you,
/Anatoly


Eliot Horowitz

Jun 12, 2011, 6:18:08 PM

to mongod...@googlegroups.com

We are going to be adding hash based sharding which makes this easier: https://jira.mongodb.org/browse/SERVER-2001

Greg Studer

Jun 12, 2011, 7:11:07 PM

to mongod...@googlegroups.com

> data, once I add a new shard I can pre split, pre move again? Last
> time I tried it, it gave a "shard already has data" error.
>

When you add a new shard, you (regular-)split and move existing chunks
to the new (empty) shard, which then accepts writes in those chunk
ranges. Not sure exactly which error you're referring to, but you need
to add the empty shard first, then move data second.

> an option_ out of the box just by inserting each batch of documents
> into the "next" shard to increase throughput?

To do this, you'd need to track the "current" shard between every mongos
(there are usually many) across the network, which would be slow. If
you only track per-mongos, you will end up with big spikes in inserts
when all mongoses are in phase. To avoid spikes and coordinating, you
randomize - hashed keys - perhaps using other data too.
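A toy simulation of the spike argument, under loud assumptions: 20 routers and 4 shards are made-up numbers, every router sends exactly one insert per tick, and a seeded random choice of shard stands in for hashed keys. It compares the busiest shard per tick when all routers round-robin in phase against the randomized case.

```python
import random

NUM_ROUTERS = 20  # hypothetical number of mongos processes
NUM_SHARDS = 4    # hypothetical number of shards
TICKS = 1000


def round_robin_peaks(offsets, num_shards, ticks):
    """Busiest-shard load per tick when each router cycles through shards.

    offsets[i] is the shard router i starts on; equal offsets mean the
    routers are in phase.
    """
    peaks = []
    for tick in range(ticks):
        load = [0] * num_shards
        for offset in offsets:
            load[(offset + tick) % num_shards] += 1
        peaks.append(max(load))
    return peaks


def hashed_peaks(num_routers, num_shards, ticks, rng):
    """Busiest-shard load per tick when each insert picks a random shard."""
    peaks = []
    for _tick in range(ticks):
        load = [0] * num_shards
        for _router in range(num_routers):
            load[rng.randrange(num_shards)] += 1
        peaks.append(max(load))
    return peaks


rng = random.Random(42)
in_phase = round_robin_peaks([0] * NUM_ROUTERS, NUM_SHARDS, TICKS)
hashed = hashed_peaks(NUM_ROUTERS, NUM_SHARDS, TICKS, rng)
print("round robin in phase, busiest shard per tick:", max(in_phase))
print("random placement, mean busiest shard per tick:", sum(hashed) / len(hashed))
```

A perfectly staggered round robin would of course be perfectly even, but keeping the routers staggered is exactly the cross-router coordination described above; randomizing gets most of the benefit without it.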

> they be _eventually_ somewhat evenly distributed

If you have enough writes for this to be an issue, they'll really be
very evenly distributed using hashed keys.

///

Anatoly Polinsky

Jun 13, 2011, 11:39:58 AM

to mongodb-user

@Eliot,

Thanks for the JIRA => already voted for it after MongoPhilly back
in April.

@Greg,

If I have X shards with X mongoSs, having the "next shard key"
always belong to the "next physical shard", I don't see how keeping
the "current" shard is even relevant if the "batch insert" just "round
robins" over given X of mongoSs.

The tricky part is ' having the "next shard key" always belong to
the "next physical shard" ', which I think https://jira.mongodb.org/browse/SERVER-2001
would help with, otherwise it feels like an application has to actually
be _aware_ that it sits on top of X shards, which is... not really an
application concern.

Thank you,
/Anatoly

Greg Studer

Jun 13, 2011, 6:58:40 PM

to mongod...@googlegroups.com

> If I have X shards with X mongoSs, having the "next shard key"
> always belong to the "next physical shard", I don't see how keeping
> the "current" shard is even relevant if the "batch insert" just "round
> robins" over given X of mongoSs.

Each mongos instance will route writes identically to (multiple) shards,
and there is no mongos write lock - don't think ordered insertion into
the router mongoses will help you with your write locking concern? The
writes will go to the same shard mongod(s), maybe faster.

I think we're missing each other in some way here, let me know if I'm
misunderstanding something - it seems like in your case you have a (very
fast) single app client, which is somewhat of a special case. In
general there are multiple clients to a sharded db, say 20, and if each
client uses a round-robin approach, however you achieve this, you'll get
spikes of writes to shards when all 20 clients happen to pick the same
order at the same time. Hashing avoids this issue pretty cheaply, which
is why it (will be) an option, but we're just trying to consider what
happens when you suddenly need multiple app servers :-).

Anatoly Polinsky

Jun 27, 2011, 11:02:54 AM

to mongodb-user

@Greg,

I agree, hashing will help with write spikes, and that is what we
are going to rely on, in case we actually can use Mongo. For now
insert results alone are not satisfactory: http://bit.ly/khNCtv [ and
we also have heavy read requirements against the data that is being
"stream inserted", which with locking and the current single-threaded
map/reduce implementation is definitely a blocker ]

But back to round robin, we tried many clients, each connected to
their dedicated MongoS, where each client would insert into a single
shard => client#3 => document with business id #3 => goes to its
separate MongoS => that inserts it into shard #3. Where everything is
pre split, pre moved to achieve this ideal [not real life] scenario.
The real life scenario (hash based sharding) will be worse :) But even
with this setup Mongo can't keep up.

/Anatoly

P.S. The more time I spend with Mongo, the more I think that it is
suitable for small datasets with not too heavy write throughput, and
heavy ( with no aggregation ) read throughput.


Greg Studer

Jun 27, 2011, 11:54:56 AM

to mongod...@googlegroups.com

Yeah, single-threaded map reduce is being addressed somewhat, but not in
the current version - same with some concurrency fixes in 2.0. Sorry
you weren't able to get the performance you need, appears a single app
pushes data faster than a single mongod consumes it. Just for my own
curiosity, I wonder if you'd be able to scale shards outward even
further, so one app feeds two shards, for example, but this may or may
not become unmanageable/expensive for you vs other solutions.
