Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Q limits scalability?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  3 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
ooi  
View profile  
 More options Oct 11 2012, 6:16 pm
From: ooi <jonathan.newbro...@gyregroup.com>
Date: Thu, 11 Oct 2012 15:16:14 -0700 (PDT)
Local: Thurs, Oct 11 2012 6:16 pm
Subject: Q limits scalability?

BigCouch documentation recommends to overshard because you can't
(currently) add shards later, and the number of shards limits your
scalability.  You can only have up to Q x N nodes in a cluster, and they
will have 1 shard each.  Adam@cloudant's talk on BigCounch says it makes
sense to overprovision as long as there are file descriptors to handle it
and uses 64 as an example.  

So I want to pick a really big value for Q, right?  Our system is expected
to grow by TB daily, and exceed PB over 25 years.  I figure thousands of
shards would be overkill for now, but let us continue to scale as
scientists around the world try to read huge data sets from our multi-PB
store.  But…

Our puny test system was created with only 256 shards (Q=256, N=3, W=R=2).  
It crashed and burned on fairly small batch updates (<500KB) on a 3-node
cluster (each with 3 cores, 6gb RAM).  I didn't expect Q to have such an
impact on performance, but it really does.  As Q goes up, the time to do
the bulk insert goes up dramatically:

Q    sec
4      6.4
8      7.7
16    10.8
32    16.9
64    37.0

When I started with BigCouch a couple months ago, I thought this would be
able to scale from a couple of regular PCs to a sizable server farm as long
as Q was big enough.  Now it looks like you either have to pick Q low
enough to run on those regular PCs (but then you can't scale past a couple
dozen nodes) or you need to start with a big HW investment to support Q
high enough to scale into petabytes.

Am I missing something?  Has anyone else run into this?  Why would it take
so much more CPU?  Has anyone worked around this?

And sorry adam@cloudant, this isn't limited by file descriptors.  Tried
upping that limit too...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Idan Shechter  
View profile  
 More options Nov 11 2012, 2:34 am
From: Idan Shechter <lesbitc...@gmail.com>
Date: Sat, 10 Nov 2012 23:34:46 -0800 (PST)
Local: Sun, Nov 11 2012 2:34 am
Subject: Re: Q limits scalability?

I am interested to hear any opinion about it too.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michael Miller  
View profile  
 More options Nov 11 2012, 12:42 pm
From: Michael Miller <mlmillerat...@gmail.com>
Date: Sun, 11 Nov 2012 09:42:24 -0800
Local: Sun, Nov 11 2012 12:42 pm
Subject: Re: Q limits scalability?

Hi, this is a fair question and worthy of a longer response than I can hack out on a phone. I'll try to turn around something more helpful later today, but in the short term:

1) you are correct.  The optimal value for q depends on both DB scale and the resources available.

2) while bigcouch doesn't support automatic resharding, it is standard practice to manually reshard as you grow. Manual here means creating a new database and replicating, so it's not that brutal, nor that manual.

3) bigcouch is built to support both a single massive database or millions of small databases. In the latter case it is routine to have a node count far exceeding n*q.

I'll try to add more specifics later.

Thanks, Mike Miller (Cloudant)

On Nov 10, 2012, at 11:34 PM, Idan Shechter <lesbitc...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »