Multi region cluster vs multiple clusters

Iain Roberts

Dec 8, 2021, 12:57:29 AM

to RavenDB - an awesome database

Hi,

I am weighing up how to achieve high availability and would appreciate your thoughts.

I will be using 2 Azure regions with a stated round-trip latency of 14 ms (or maybe Azure + AWS). I believe that RavenDB's suggested approach is 2 clusters with bi-directional replication between them.

I am comparing this to a multi-region cluster where the secondary region includes some non-voting nodes, so that cluster-wide transactions avoid cross-region latency/availability problems by staying within the primary region. For example, 3 nodes in primary and 3 nodes in secondary (1 non-voting).

In a failover I would promote the 1 non-voting node to voting, demote the other, and move the preferred node over (moving the preferred node may be automatic). Some of these steps will be challenging if the regions cannot talk to each other.

The main reason I'm considering a single cluster is to make use of RavenDB's smarts around failover and recovery within a cluster. I doubt I can match the thought that has gone into it. There are several scenarios in which I believe RavenDB will take care of the failover:

1. RavenDB is down in the primary region. The application layer in primary will fail over to the secondary RavenDB.
2. The application layer is down in the primary region. Users can use the secondary application layer with the primary RavenDB layer (while the preferred node is moved to secondary).
3. Always doing some reads from the secondary. Well, maybe external replication is as fast as cluster replication, I am not sure.

Downsides
1. Might be slow to fail over?
2. 1 node down in primary now requires secondary for voting

I have found the following useful:
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/15-production-deployments#using-replication-outside-the-cluster

https://ravendb.net/articles/ravendb-and-multi-region-setup
https://ayende.com/blog/185057-C/analyzing-the-github-outage

Thanks,

Iain

Oren Eini (Ayende Rahien)

Dec 8, 2021, 4:53:04 AM

to rav...@googlegroups.com

We need to make a distinction here between cluster nodes and database group members.

In a cluster, a node may be voting or not.

A database group member is a part of a gossip network and can always accept writes.

In a multi region cluster, you may have 3 nodes in us-east-1 and 3 nodes in us-west-1, let's say.

If us-east-1 is down, you will have automatic failover to the nodes in us-west-1, no action needed. Certain operations (creating databases, creating indexes, subscriptions, etc.) will not work, however.

Reads & writes will continue to run without issue.

Note that you _cannot_ make the non-voting members into voting members unless the cluster is healthy (a majority of voting nodes is up).

There are emergency operations that you can take, but they complicate life afterward, once the issue is resolved.
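
To make the majority rule concrete, here is a minimal sketch in Python. It is plain arithmetic, not RavenDB code, and the topology (3 voters in the primary region, 2 voters plus 1 non-voting node in the secondary) is only one possible reading of the layout described in the question.

```python
def has_quorum(voters_up: int, voters_total: int) -> bool:
    """Return True when a strict majority of voting nodes is reachable."""
    return 2 * voters_up > voters_total


# Hypothetical topology (an assumption, not taken from RavenDB docs):
# non-voting nodes are simply left out of the count.
VOTERS_PRIMARY = 3
VOTERS_SECONDARY = 2
total_voters = VOTERS_PRIMARY + VOTERS_SECONDARY

# Whole primary region unreachable: only the secondary voters remain.
quorum_without_primary = has_quorum(VOTERS_SECONDARY, total_voters)

# A single primary node down: every other voter is still reachable.
quorum_with_one_node_down = has_quorum(total_voters - 1, total_voters)

print("primary region lost:", quorum_without_primary)
print("one primary node lost:", quorum_with_one_node_down)
```

Under these assumptions, losing the whole primary region leaves 2 of 5 voters, which is not a majority, so promoting the non-voting node through normal cluster operations would not be possible at exactly the moment it is wanted.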

You seem to want a failure of the database nodes in us-east-1 to cause the application on us-east-1 to talk to the database nodes on us-west-1. I would consider that a problem, not a feature.

That is what happened to GitHub, see:

https://ayende.com/blog/185057-C/analyzing-the-github-outage

Given 14 ms latency between the regions, let's assume you make 10 database queries per request.

When running locally, you'll have a page load time of < 25 ms, most likely.

When running cross region, you'll have a page load time of > 200 ms.

That compounds very quickly, and you are likely better off just directing all traffic to the other region instead.
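
As a rough back-of-the-envelope model of this, the sketch below assumes the queries run one after another and that each one pays exactly one cross-region round trip. The function name and the 25 ms local baseline are illustrative assumptions; real requests can incur extra round trips and other overhead, so treat the result as a floor that lands below the > 200 ms estimate given above.

```python
def page_time_ms(queries: int, rtt_ms: int, local_ms: int) -> int:
    """Lower-bound page time: local work plus one round trip per query."""
    return local_ms + queries * rtt_ms


QUERIES_PER_REQUEST = 10  # from the example above
CROSS_REGION_RTT_MS = 14  # stated latency between the two regions
LOCAL_PAGE_MS = 25        # assumed upper end of the local page time

added_network_ms = QUERIES_PER_REQUEST * CROSS_REGION_RTT_MS
cross_region_ms = page_time_ms(QUERIES_PER_REQUEST, CROSS_REGION_RTT_MS, LOCAL_PAGE_MS)

print("added network time (ms):", added_network_ms)
print("cross-region page time, lower bound (ms):", cross_region_ms)
```

Even this optimistic model makes the page several times slower than the local case, and the gap grows linearly with the number of queries per request.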

Oren Eini (Ayende Rahien)

Dec 8, 2021, 4:53:57 AM

to rav...@googlegroups.com

Also, just to note, with 3 nodes in each region, you need two nodes in the primary region to fail before you have an issue with the cluster.