> I think the limiter with a single replica set is maxing out the write load on the current Master...
The way we describe this in our talks is as follows:
1. Replica Sets -> scaling reads
2. Sharding -> scaling writes
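To make that split concrete, here's a minimal sketch using a modern
pymongo; the host names, database, collection, and shard key are all
placeholders:

    from pymongo import MongoClient

    # Scaling reads: connect to the replica set and allow secondaries
    # to serve queries via a read preference.
    rs_client = MongoClient(
        "mongodb://host1:27017,host2:27017/?replicaSet=rs0",
        readPreference="secondaryPreferred",
    )

    # Scaling writes: shard a collection so writes are spread across
    # shards. These commands go through a mongos router.
    mongos = MongoClient("mongos-host:27017")
    mongos.admin.command("enableSharding", "mydb")
    mongos.admin.command("shardCollection", "mydb.events",
                         key={"user_id": 1})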
> Is there a thread somewhere that discusses in full the system design implications of sharding?
The full design implications of sharding can be quite complex. I'm not
sure that a thread or even a single doc is really going to adequately
address this. Maybe a book, probably some on-site consulting.
The unfortunate problem with several of your questions is that there
are multiple sets of trade-offs that need to be made. Here's a quick
sampling:
> When should I start worrying about maxing-out a sharded configuration?
Depends on performance specs. Are you write-heavy or read-heavy? To
remain performant, do you need indexes in RAM or all data in RAM?
Are your drives big enough for the data? If you're running SSDs you
may run out of space before your performance tanks.
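One rough way to sanity-check the RAM question is to compare your data
and index sizes against available memory. A pymongo sketch (dbStats
sizes are in bytes; "mydb" is a placeholder):

    from pymongo import MongoClient

    db = MongoClient("localhost", 27017)["mydb"]
    stats = db.command("dbStats")

    # Rules of thumb: indexes should fit in RAM; if you're read-heavy,
    # ideally the frequently-touched data does too.
    print("data size:  %d bytes" % stats["dataSize"])
    print("index size: %d bytes" % stats["indexSize"])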
> And what is the ideal configuration for high-capacity sharding?
Depends on "high-capacity" and "ideal". If all you want is capacity,
then just RAID 0 all of the 3TB drives you can fit into a box and
stuff in as much RAM as you can afford.
Of course, that's not really a complete answer, because there are
obviously other trade-offs here.
> Is this sharding front end designed to be infinitely** scalable, assuming some reasonable load-balancing scheme from the client application(s)?
We have a doc here on sharding limits:
http://www.mongodb.org/display/DOCS/Sharding+Limits
> Adding N pairs of (mongos + mongod config servers), then load-balancing my client application(s) connections to the N mongos boxes?
You can only have 1 or 3 config servers right now. So you're not
actually "adding pairs of mongos and config".
Our standard advice is to put one mongos on each application server.
For most people this means one mongos per web server. You can also
pool multiple mongos behind a load balancer, but you only need to do
that if the mongos processes start hogging too much RAM or CPU time.
In our experience that doesn't happen very often.
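In client terms, both layouts look like this (a pymongo sketch; the
host names are placeholders):

    from pymongo import MongoClient

    # Typical layout: a mongos runs on each app server, so the app
    # just talks to localhost.
    client = MongoClient("mongodb://localhost:27017")

    # Pooled layout: list several mongos routers in the seed list (or
    # put them behind a load balancer); the driver spreads requests
    # across the ones it can reach.
    client = MongoClient(
        "mongodb://mongos1:27017,mongos2:27017,mongos3:27017")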
> Is there any experience or data or guidance out there with this aspect of the MongoDB design?
At some level, this experience does exist. You can see our production
deployments page to see who's running us. Many of these companies are
running some form of RS or RS + Sharding.
http://www.mongodb.org/display/DOCS/Production+Deployments
However, when it comes to implementing RS + Sharding you're handling
lots of data across lots of servers. There are many trade-offs to be
made here based on what you need. Some basic things to consider:
- Read vs Write rates
- Replication of writes vs write speed (how many copies do you need
or want? see the write-concern sketch after this list)
- Fail-over scenarios within a data center
- Fail-over scenarios between data centers
- Conflict resolution or avoidance
- Backup & restore scenarios
- Data extraction: are you running M/R? Are you running reports on
the system?
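On the "how many copies do you need or want" point, the knob is the
write concern. A minimal pymongo sketch; the hosts, replica set name,
database, and collection are made up:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")

    # w=2: the insert is acknowledged only after two members have the
    # write. Safer against a node loss, but each write waits on
    # replication.
    events = client["mydb"].get_collection(
        "events", write_concern=WriteConcern(w=2))
    events.insert_one({"user_id": 42, "action": "login"})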
So, at some level, 10gen definitely wants to provide basic guidance
here. And we're working towards providing more of this both at
conferences and in white papers.
However, at the end of the day, the questions you're asking are
actually quite complex and will be really specific to your
implementation.
The best way to get this kind of guidance right now is to work with
10gen directly.
http://www.10gen.com/consulting
- Gates