As your data grows, MongoDB should automatically migrate it between
servers using the balancer, so there are fewer hotspots when you add
more machines.
When you insert documents, construct the shard key so each document gets
a round-robin prefix: "0<rest>", "1<rest>", ..., "9<rest>", "0<rest>", etc.
If you keep the relative volume of documents for each shard about the
same, the chunks will split fairly uniformly and the balancer won't ever
find much work to do.
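
To make the prefix idea concrete, here is a minimal sketch in Python. The
function name round_robin_keys and the num_prefixes parameter are made up
for illustration; it only builds the key strings and does not talk to
MongoDB.

```python
def round_robin_keys(rests, num_prefixes=10):
    """Prefix each key with a digit that cycles 0, 1, ..., num_prefixes - 1."""
    # enumerate() gives the insert position; the modulo wraps it back to 0.
    return [f"{i % num_prefixes}{rest}" for i, rest in enumerate(rests)]


keys = round_robin_keys([f"doc{i}" for i in range(12)])
print(keys[0], keys[9], keys[10], keys[11])  # 0doc0 9doc9 0doc10 1doc11
```

Consecutive inserts then land in different prefix ranges, which is what
keeps the per-shard volume roughly equal.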
Rob.
You'll need to perform some splits (a metadata operation, very fast)
and migrate about 1/(N+1) of the data from each of the N existing shards
to the new shard (this is all automated by the balancer, but you can do
it manually), as Eliot mentioned. If you're using hashed shard keys,
future writes will be distributed very evenly across the whole cluster
once this takes place. The round-robin order won't really matter for
throughput, so long as on average you're "feeding" each shard inserts at
slightly less than the maximum rate it can process - it won't matter at
all if there are always a few writes in each shard's queue. Hashing
ensures this even write distribution on average.
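
The 1/(N+1) figure can be checked with a little arithmetic. This is just a
sketch, assuming the data is spread evenly over the N shards before and
over the N+1 shards after; rebalance_fractions is a hypothetical helper
name.

```python
from fractions import Fraction


def rebalance_fractions(n_shards):
    """Return (fraction of its own data, fraction of all data) each old shard moves."""
    before = Fraction(1, n_shards)       # share of all data per shard, N shards
    after = Fraction(1, n_shards + 1)    # share of all data per shard, N+1 shards
    moved_of_total = before - after      # what each old shard has to give up
    moved_of_own = moved_of_total / before
    return moved_of_own, moved_of_total


own, total = rebalance_fractions(4)
print(own, total)  # 1/5 1/20
```

So with 4 shards, each one hands over a fifth of what it holds, and the new
shard ends up with 4 * 1/20 = 1/5 of everything.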
If absolute maximum throughput is less important than the organization
of the data, you can let mongod migrate data in hotspots to new shards
(for example, if one user has a lot of inserts), which will eventually
lead to nearly the same distribution. This is a trade-off - if you
need to maintain data ordering, you will need to move data to maintain
this ordering (but never *all* of it, just hotspot chunks).
When you add a new shard, you (regular-)split and move existing chunks
to the new (empty) shard, which then accepts writes in those chunk
ranges. Not sure exactly which error you're referring to, but you need
to add the empty shard first, then move data second.
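
As a toy model of that order of operations (add the empty shard, then
move chunks), something like the following. It is only a sketch: the real
balancer has its own thresholds and policy, and add_shard_and_balance is
a made-up name, not a MongoDB command.

```python
def add_shard_and_balance(shards, new_name):
    """shards maps shard name -> list of chunk ids; returns (new layout, moves)."""
    shards = {name: list(chunks) for name, chunks in shards.items()}
    shards[new_name] = []  # step 1: the new shard joins empty
    moves = []
    while True:  # step 2: move one chunk at a time from the fullest shard
        donor = max(shards, key=lambda name: len(shards[name]))
        if len(shards[donor]) - len(shards[new_name]) <= 1:
            break  # close enough to even, stop moving
        chunk = shards[donor].pop()
        shards[new_name].append(chunk)
        moves.append((chunk, donor, new_name))
    return shards, moves


layout, moves = add_shard_and_balance(
    {"a": [0, 1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10, 11]}, "c"
)
print({name: len(chunks) for name, chunks in layout.items()})  # {'a': 4, 'b': 4, 'c': 4}
```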
> an option_ out of the box just by inserting each batch of documents
> into the "next" shard to increase throughput?
To do this, you'd need to track the "current" shard across every mongos
(there are usually many) over the network, which would be slow. If
you only track per-mongos, you will end up with big spikes in inserts
when all mongoses are in phase. To avoid both the spikes and the
coordination, you randomize - that is, use hashed keys - perhaps mixing
in other data too.
> they be _eventually_ somewhat evenly distributed
If you have enough writes for this to be an issue, they'll really be
very evenly distributed using hashed keys.
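
A quick way to see how even hashing gets is to simulate it. The sketch
below assumes MD5 plus a modulo as a stand-in; MongoDB's hashed shard keys
assign ranges of hash values to chunks rather than taking a modulo, so
this only illustrates the spread, not the actual mechanism. shard_for is
a hypothetical helper.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4      # assumed cluster size for the illustration
NUM_KEYS = 20000    # monotonically increasing keys, the worst case for range sharding


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index via a hash (stand-in for a hashed shard key)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


counts = Counter(shard_for(k) for k in range(NUM_KEYS))
for shard in range(NUM_SHARDS):
    print(shard, counts[shard])
```

Each shard should come out close to NUM_KEYS / NUM_SHARDS even though the
input keys are strictly increasing.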
///
Each mongos instance will route writes identically to (multiple) shards,
and there is no mongos write lock, so I don't think ordered insertion
into the mongos routers will help with your write-locking concern. The
writes will go to the same shard mongod(s), maybe faster.
I think we're missing each other in some way here, let me know if I'm
misunderstanding something - it seems like in your case you have a (very
fast) single app client, which is somewhat of a special case. In
general there are multiple clients to a sharded db, say 20, and if each
client uses a round-robin approach, however you achieve this, you'll get
spikes of writes to shards when all 20 clients happen to pick the same
order at the same time. Hashing avoids this issue pretty cheaply, which
is why it is (will be) an option, but we're just trying to consider what
happens when you suddenly need multiple app servers :-).
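
The spike effect is easy to simulate. This is a rough sketch under made-up
assumptions: 20 clients, 4 shards, one write per client per tick, and
uniform random choice standing in for hashing. None of the names below
come from MongoDB.

```python
import random
from collections import Counter


def peak_load(choices_per_tick):
    """Largest number of writes any single shard receives in one tick."""
    return max(max(Counter(tick).values()) for tick in choices_per_tick)


def in_phase_round_robin(num_clients, num_shards, ticks):
    # Every client starts at shard 0 and advances in lockstep.
    return [[t % num_shards] * num_clients for t in range(ticks)]


def randomized(num_clients, num_shards, ticks, seed=0):
    # Each client picks a shard independently, as hashing would on average.
    rng = random.Random(seed)
    return [[rng.randrange(num_shards) for _ in range(num_clients)]
            for _ in range(ticks)]


rr_peak = peak_load(in_phase_round_robin(20, 4, 100))
rnd_peak = peak_load(randomized(20, 4, 100))
print("round-robin, in phase:", rr_peak)  # 20: all clients hit one shard per tick
print("randomized:", rnd_peak)
```

In phase, one shard takes all 20 writes each tick while the other three
sit idle; with random choice the per-tick peak stays well below that.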