Q limits scalability?

ooi

unread,

Oct 11, 2012, 6:16:14 PM10/11/12

to bigcou...@googlegroups.com

BigCouch documentation recommends to overshard because you can't (currently) add shards later, and the number of shards limits your scalability. You can only have up to Q x N nodes in a cluster, and they will have 1 shard each. Adam@cloudant's talk on BigCounch says it makes sense to overprovision as long as there are file descriptors to handle it and uses 64 as an example.

So I want to pick a really big value for Q, right? Our system is expected to grow by TB daily, and exceed PB over 25 years. I figure thousands of shards would be overkill for now, but let us continue to scale as scientists around the world try to read huge data sets from our multi-PB store. But…

Our puny test system was created with only 256 shards (Q=256, N=3, W=R=2). It crashed and burned on fairly small batch updates (<500KB) on a 3-node cluster (each with 3 cores, 6gb RAM). I didn't expect Q to have such an impact on performance, but it really does. As Q goes up, the time to do the bulk insert goes up dramatically:

Q    sec
4      6.4
8      7.7
16    10.8
32    16.9
64    37.0

When I started with BigCouch a couple months ago, I thought this would be able to scale from a couple of regular PCs to a sizable server farm as long as Q was big enough. Now it looks like you either have to pick Q low enough to run on those regular PCs (but then you can't scale past a couple dozen nodes) or you need to start with a big HW investment to support Q high enough to scale into petabytes.

Am I missing something? Has anyone else run into this? Why would it take so much more CPU? Has anyone worked around this?

And sorry adam@cloudant, this isn't limited by file descriptors. Tried upping that limit too...

Idan Shechter

unread,

Nov 11, 2012, 2:34:46 AM11/11/12

to bigcou...@googlegroups.com

I am interested to hear any opinion about it too.

Michael Miller

unread,

Nov 11, 2012, 12:42:24 PM11/11/12

to bigcou...@googlegroups.com, bigcou...@googlegroups.com

Hi, this is a fair question and worthy of a longer response than I can hack out on a phone. I'll try to turn around something more helpful later today, but in the short term:

1) you are correct. The optimal value for q depends on both DB scale and the resources available.

2) while bigcouch doesn't support automatic resharding, it is standard practice to manually reshard as you grow. Manual here means creating a new database and replicating, so it's not that brutal, nor that manual.

3) bigcouch is built to support both a single massive database or millions of small databases. In the latter case it is routine to have a node count far exceeding n*q.