2 Node High Availability

Andrew Eross

unread,

Sep 22, 2012, 8:56:50 AM9/22/12

to couc...@googlegroups.com

Hi folks,

I've been experimenting with Couchbase's clustering and hit an unexpected hiccup with the behavior upon node failure.

I have 2 database servers running in a simple master/standby setup and I wanted to add couchbase to both machines in as highly available a configuration as possible.

Initially I just setup an independent couchbase instance on each machine since we're mostly using CB for session storage, so it's bad, but not horrible if the data is lost.. it's more important for us that the app stays available with a quick switch-over in case the primary fails.

So I was hoping I could do one better and cluster the 2 nodes such that in case of the primary failing, we'd simply be able to instantly direct traffic at the secondary and have it continue services while preserving our data.

However, I've found this does not work that way, even worse I found I was actually reducing our availability because if the backup went down it takes out the primary and requests would start coming back with messages like "failed with: SERVER_ERROR proxy write to downstream 192.168.0.16"

I've seen that it's possible to do auto-failover with 3 nodes, but even that delays for some-odd 30 seconds, and also we don't have a third machine available.

I've also seen we could try to write our own fail-over system with the API to monitor the servers and make API calls to do the equivalent of pressing the fail-over button.

However, before going down that path.. I was wondering if this is a known use case and there's a standard solution that others are using? Is there a standard way to do a master/backup type solution?

Thank you!

Andrew

Aliaksey Kandratsenka

unread,

Sep 22, 2012, 7:52:46 PM9/22/12

to couc...@googlegroups.com

On Sat, Sep 22, 2012 at 3:56 PM, Andrew Eross <angrygr...@gmail.com> wrote:

Hi folks,

I've been experimenting with Couchbase's clustering and hit an unexpected hiccup with the behavior upon node failure.

I have 2 database servers running in a simple master/standby setup and I wanted to add couchbase to both machines in as highly available a configuration as possible.

Initially I just setup an independent couchbase instance on each machine since we're mostly using CB for session storage, so it's bad, but not horrible if the data is lost.. it's more important for us that the app stays available with a quick switch-over in case the primary fails.

So I was hoping I could do one better and cluster the 2 nodes such that in case of the primary failing, we'd simply be able to instantly direct traffic at the secondary and have it continue services while preserving our data.

However, I've found this does not work that way, even worse I found I was actually reducing our availability because if the backup went down it takes out the primary and requests would start coming back with messages like "failed with: SERVER_ERROR proxy write to downstream 192.168.0.16"

That's because couchbase is not master-slave but it's actually sharding your data to spread ops and data evenly between hosts. Each node is simultaneously master for subset of all keys and replica (aka slave) for some other subset of keys.

I've seen that it's possible to do auto-failover with 3 nodes, but even that delays for some-odd 30 seconds, and also we don't have a third machine available.

That's because our autofailover implementation is consensus based and very conservative. You may want to google for "github mysql failover" for recent famous incident where autofailover led to cascading failure. You can see some notable people are actually advocating for manual-only failover, but with automatic notification of admins that some action is required.

Our automatic failover implementation is mainly targeted for larger installations where probability of just one or few nodes failing, say at night when you sleep, is significant and when increased load on rest of cluster after failover of just one node is small.

In general if all you have is just 2 nodes and network between them, then reliable automatic failover is seemingly impossible. I.e. consider a case when both nodes are healthy but network link between them fails. You'll have two identical cluster partitions. Neither of partitions can distinguish this case from true failure of other node.

Note that's just 2 nodes. I believe in your case you may have some other boxes. E.g. application server nodes.

I've also seen we could try to write our own fail-over system with the API to monitor the servers and make API calls to do the equivalent of pressing the fail-over button.

However, before going down that path.. I was wondering if this is a known use case and there's a standard solution that others are using? Is there a standard way to do a master/backup type solution?

If you go that path I'll be eager seeing how exactly you plan to do that. One possibility would be failing over DB node based on some 'good enough' consensus on 'whether it's working or not' from perspective of client nodes.

Chad Kouse

unread,

Sep 23, 2012, 12:12:59 AM9/23/12

to couc...@googlegroups.com

I'd be interested to see your implementation too. A/B failover is hard. I think couchbase is perfect for a larger installation there where if one server from your cluster fails it only impacts, say, 1 out of 16 users. And then 30 seconds later fails over so even those 1:16 are happy. Then you just need a little logic in your application to know the difference between "service unavailable" and "session doesn't exist" and tell your users what is happening (ex "sorry we are down for maintenance--try again in a few seconds" rather than "please login anonymous user!")

--
Chad Kouse

Andrew Eross

unread,

Sep 24, 2012, 4:00:16 PM9/24/12

to couc...@googlegroups.com

Hi Aliaksey, Chad,

Thanks for the responses --

I saw the github article, definitely good food for thought.

I understand and thanks for the extra explanations! At first I was just taken aback when taking one node offline broke the cluster as I imagined the cluster would keep working automagically somehow you know? However, I see that's what auto-failover is for? Do I have it right this would be the same case with a cluster of 10 nodes.. e.g. if 1 node goes down, the whole cluster will no longer be able to write for 30 seconds until auto-failover (or an admin) removes the broken node?

I think the problem is that what I really want is a replication solution as opposed to a cluster solution. I'd like a primary master that replicates out to a standby secondary, analogous to our existing master/slave setup for our Postgres db where we use Slony for that purpose. However, I don't think CB currently provides just a replication facility?

I've been mulling over the various A/B failover methods and concluded it's probably not worth the effort compared to just adding a third node and using auto-failover.

However if we do want to just use the 2 CB nodes, we could use our load balancers as the management nodes and write a check for heartbeat to monitor the CB service on both machines, something like:

Check A is alive, check B is alive
If both nodes alive or both nodes dead, no action
If one node is dead and one node is alive, use API on live node to failover dead node and also remove dead node from the load balancer VIP
Disable further actions, notify admins

Would there be any problem with that?

Thanks!
Andrew

Chad Kouse

unread,

Sep 24, 2012, 4:39:50 PM9/24/12

to couc...@googlegroups.com

Andrew,

If you had 10 nodes and 1 node went down, any vbucket that the 1 failed node was the primary host for would be inaccessible for reads or writes until a failover happened. So, in plain english - if 1 out of 10 nodes went down 90% of your data would still be accessible and once you failed over that down node (or got it back up and running) all 100% of your data would be accessible once again.

--

Chad Kouse

Aliaksey Kandratsenka

unread,

Sep 24, 2012, 9:55:52 PM9/24/12

to couc...@googlegroups.com

On Mon, Sep 24, 2012 at 11:00 PM, Andrew Eross <angrygr...@gmail.com> wrote:

However if we do want to just use the 2 CB nodes, we could use our load balancers as the management nodes and write a check for heartbeat to monitor the CB service on both machines, something like:

Check A is alive, check B is alive
If both nodes alive or both nodes dead, no action
If one node is dead and one node is alive, use API on live node to failover dead node and also remove dead node from the load balancer VIP
Disable further actions, notify admins

Would there be any problem with that?

If you run it on both couchbase nodes, then it may cause quite severe problems in case of network partition. I.e. if both nodes are alive but lose network to other one, they'll both failover other one.

As for pure master-slave setup, it is possible to have that in principle, but as of now there's no official support for that in couchbase server. In most cases our approach where master-slave-ness is sharded and spread over nodes is clearly better as it spreads the load. I can only imagine pure master-slave being useful for 2 nodes automatic failover. I.e you could run that autofailover logic on just slave.

It is seemingly possible to run pure master slave if you force single vbucket (that's our name for shard). I don't think anybody tested this particular case, so use caution. Here's how you can do that.

* stop service

* export COUCHBASE_NUM_VBUCKETS=1 (or add it to initscript)

* start service

* create _new_ bucket and based on environment variable it will have exactly one vbucket

* add second node and rebalance

Your _lexicographically_ least node should be master.

NOTE however, there's no way to change number of vbuckets. Thus in this case you're stuck with max 2 nodes and pure master-slave. You will have to recreate your bucket in order to enable elasticity.

Andrew Eross

unread,

Oct 18, 2012, 12:50:53 PM10/18/12

to couc...@googlegroups.com

Hi guys,

I've been researching and slowly working on this (hence the delayed reply, apologies hope you haven't all forgotten what we were chatting about).

Chad - Thank you, and I've been reading the docs about this.. would you mind confirming if I have these points correct?

In the case of my 2 node Couchbase cluster setup:
Roughly 50% of the data is primary on each node (depending on the exact vbucket mapping)
In the case when I had node #2 fail and I started seeing write failures... in theory that was probably only failing for half of the requests coming in
Replicas of the data are kept on the other nodes.. so when my node #2 failed and I pressed the fail-over button on node #1, I didn't actually lose any data, just that any of the data that was primary on node #2 wasn't accessible until the fail-over occurred and node #1 was made the the sole primary for all data.

Aliaksey -

I was talking before about scripting a 2 node fail-over setup...with a check something like this:

Check A is alive, check B is alive
If both nodes alive or both nodes dead, no action
If one node is dead and one node is alive, use API on live node to failover dead node and also remove dead node from the load balancer VIP
Disable further actions, notify admins

You mentioned that if the check was run on the nodes themselves you could get a split brain (definitely agree), but I was thinking about running the checks from a 3rd party server(s). We have 2 other machines we use for load balancers, so I could have them act as the controller. Would you see any problem with this setup?

Thanks!

Andrew

Aliaksey Kandratsenka

unread,

Oct 18, 2012, 12:53:59 PM10/18/12

to couc...@googlegroups.com

On Thu, Oct 18, 2012 at 9:50 AM, Andrew Eross <angrygr...@gmail.com> wrote:

Aliaksey -

I was talking before about scripting a 2 node fail-over setup...with a check something like this:
Check A is alive, check B is alive

If both nodes alive or both nodes dead, no action
If one node is dead and one node is alive, use API on live node to failover dead node and also remove dead node from the load balancer VIP
Disable further actions, notify admins

You mentioned that if the check was run on the nodes themselves you could get a split brain (definitely agree), but I was thinking about running the checks from a 3rd party server(s). We have 2 other machines we use for load balancers, so I could have them act as the controller. Would you see any problem with this setup?

No problem. Unless you ran same script on two load balancers and there's network split separating one balancer node and one couchbase node in first partition, and second balancer node and second couchbase node.

Reply all

Reply to author

Forward