The BigCouch documentation recommends oversharding because you can't (currently) add shards later, and the number of shards limits your scalability. You can only have up to Q x N nodes in a cluster, at which point each node holds 1 shard. Adam@cloudant's talk on BigCouch says it makes sense to overprovision as long as there are enough file descriptors to handle it, and uses 64 as an example.
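To make the Q x N ceiling concrete, here is a minimal Python sketch of the arithmetic. The function names are hypothetical (nothing here is BigCouch API), and it assumes shard copies are spread evenly across nodes, which is my simplification.

```python
def max_nodes(q, n):
    """Upper bound on cluster size: at most one shard copy per node."""
    return q * n


def shard_copies_per_node(q, n, nodes):
    """Average shard copies per node for one database (assumes even placement)."""
    if nodes < 1 or nodes > max_nodes(q, n):
        raise ValueError("node count must be between 1 and Q x N")
    return q * n / nodes


# Compare a few Q values on a 3-node cluster with N=3.
for q in (8, 64, 256):
    print(f"Q={q}: max nodes={max_nodes(q, 3)}, "
          f"copies per node on 3 nodes={shard_copies_per_node(q, 3, 3):.0f}")
```

With N=3 on a 3-node cluster, every node ends up with a copy of every shard, so as far as I understand Q=256 means 256 shard copies per node for each database.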
So I want to pick a really big value for Q, right? Our system is expected to grow by TB daily, and exceed PB over 25 years. I figure thousands of shards would be overkill for now, but would let us keep scaling as scientists around the world try to read huge data sets from our multi-PB store. But…
Our puny test system was created with only 256 shards (Q=256, N=3, W=R=2). It crashed and burned on fairly small batch updates (< 500 KB) on a 3-node cluster (each node with 3 cores and 6 GB RAM). I didn't expect Q to have such an impact on performance, but it really does. As Q goes up, the time to do the bulk insert goes up dramatically:
Q     bulk insert time (sec)
4      6.4
8      7.7
16    10.8
32    16.9
64    37.0
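A quick way to see how the slowdown compounds is to compute the ratio between consecutive measurements. This is only arithmetic on the numbers above, sketched in Python; the helper name is made up for illustration.

```python
# Measured bulk insert times from the post: Q -> seconds.
timings = {4: 6.4, 8: 7.7, 16: 10.8, 32: 16.9, 64: 37.0}


def doubling_ratios(data):
    """Return (q_from, q_to, slowdown) for each consecutive pair of Q values."""
    qs = sorted(data)
    return [(a, b, data[b] / data[a]) for a, b in zip(qs, qs[1:])]


for q_from, q_to, ratio in doubling_ratios(timings):
    print(f"Q {q_from:>2} -> {q_to:>2}: {ratio:.2f}x slower")
```

The ratio grows with each doubling of Q, and the last step (32 to 64) is more than 2x, so the cost is not levelling off anywhere in the range I measured.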
When I started with BigCouch a couple of months ago, I thought it would be able to scale from a couple of regular PCs to a sizable server farm as long as Q was big enough. Now it looks like you either have to pick Q low enough to run on those regular PCs (but then you can't scale past a couple dozen nodes), or you need to start with a big hardware investment to support a Q high enough to scale into petabytes.
Am I missing something? Has anyone else run into this? Why would it take so much more CPU? Has anyone worked around this?
And sorry adam@cloudant, this isn't limited by file descriptors. Tried upping that limit too...