Adding a docker instance to an existing cluster


pl...@asperasoft.com

Nov 10, 2017, 12:55:28 AM
to codership
Hi All,

I have three Docker instances running, let's say docker-1, docker-2 and docker-3. Currently, a small piece of logic runs before the cluster starts, to select the seed Docker instance.

The logic queries each peer's status on its cluster state port.

For example, docker-1 executes nc docker-2 5500 (5500 is the cluster state port) and it returns "docker-2":{"uuid":"xxx-xxxx-xxxx","seqno":-1,"safe_to_bootstrap":0,"host":"7315f1cb"}
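A minimal Python sketch of the same query, for illustration only. It assumes the state port writes one line in the form shown above and then closes the connection; the function names parse_peer_state and query_peer are hypothetical, not part of Galera.

```python
import json
import socket


def parse_peer_state(raw):
    """Parse a state-port reply into (peer_name, state_dict).

    Accepts both the named form  "docker-2":{...}  and a bare {...} object.
    peer_name is None when the reply carries no name.
    """
    text = raw.strip()
    if not text.startswith("{"):
        # The named form is a JSON fragment; wrap it to make a full object.
        text = "{" + text + "}"
    data = json.loads(text)
    if len(data) == 1:
        name, state = next(iter(data.items()))
        if isinstance(state, dict):
            return name, state
    return None, data


def query_peer(host, port=5500, timeout=2.0):
    """Read the whole reply from a peer's state port (assumed to close after sending)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_peer_state(b"".join(chunks).decode("utf-8"))
```

Calling query_peer("docker-2") would then stand in for nc docker-2 5500, with the timeout covering a peer that is down.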

The logic depends on the sequence number. If no sequence number is available (as at the initial stage), it selects docker-3. This also works for a graceful startup/shutdown: during a graceful recovery of the whole cluster, the Docker instance with the largest sequence number becomes the seed. At startup after a graceful shutdown, the sequence number is non-negative.
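The selection rule described above could be sketched roughly as follows in Python. The function name choose_seed and the tie-break (highest node name wins) are assumptions made for the example; the post only states that docker-3 is picked when no sequence number is present.

```python
def choose_seed(seqnos, default="docker-3"):
    """Pick the seed node from a {node_name: seqno} mapping.

    A seqno of -1 means "no usable sequence number". If no node has one,
    fall back to the default node, as in the initial-stage case.
    """
    known = {name: seqno for name, seqno in seqnos.items() if seqno >= 0}
    if not known:
        return default
    highest = max(known.values())
    # Assumed tie-break: among nodes sharing the highest seqno, take the
    # lexicographically largest name so every node reaches the same answer.
    return max(name for name, seqno in known.items() if seqno == highest)


# Initial stage: nothing recorded yet, so the default node is the seed.
print(choose_seed({"docker-1": -1, "docker-2": -1, "docker-3": -1}))

# Graceful full-cluster restart: the most advanced node is the seed.
print(choose_seed({"docker-1": 47400, "docker-2": 47500, "docker-3": 47450}))
```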

But I found a case where my logic is incorrect.

Consider three Docker instances running, with these actual sequence numbers:

docker-1's sequence no: 47500
docker-2's sequence no: 47500
docker-3's sequence no: 47500

But the grastate.dat file shows the sequence number of each running instance as -1. (Some say this is expected, but I don't understand why.)

Now I shut down docker-3 gracefully and update something on docker-2, so the sequence numbers become:

docker-1's sequence no: 47502
docker-2's sequence no: 47502
docker-3's sequence no: 47500 (this node is offline, so it does not get updated)

Now I start docker-3 again and expect it to join the cluster and update itself from docker-2 or docker-1. (Am I correct?)


But when docker-3 starts again, it queries its peers' status:

from docker-1: {"uuid":"xxx-xxxx-xxxx","seqno":-1,"safe_to_bootstrap":0,"host":"abcg1234"} because docker-1 has been running for a long time

from docker-2: {"uuid":"xxx-xxxx-xxxx","seqno":-1,"safe_to_bootstrap":0,"host":"7315f1cb"} because docker-2 has been running for a long time

and considers itself the leader, because its own seqno is 47500. Hence it starts a new cluster instead of joining the existing one.
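The failure can be reproduced with the numbers above. The Python sketch below is illustrative only: naive_leader mimics the "largest seqno wins" comparison, and peers_may_be_running is a hypothetical guard that flags the ambiguous case. A seqno of -1 from a reachable peer may mean the node is running, but it may also mean it crashed, so the guard only signals that a further liveness check is needed before bootstrapping.

```python
def naive_leader(own_name, own_seqno, peer_states):
    """Largest-seqno-wins comparison, as in the startup logic described above."""
    seqnos = {name: state["seqno"] for name, state in peer_states.items()}
    seqnos[own_name] = own_seqno
    return max(seqnos, key=seqnos.get)


def peers_may_be_running(peer_states):
    """True if any reachable peer reports seqno -1 (running or crashed)."""
    return any(state.get("seqno") == -1 for state in peer_states.values())


# Replies docker-3 receives from its peers when it restarts.
peers = {
    "docker-1": {"uuid": "xxx-xxxx-xxxx", "seqno": -1, "safe_to_bootstrap": 0},
    "docker-2": {"uuid": "xxx-xxxx-xxxx", "seqno": -1, "safe_to_bootstrap": 0},
}

leader = naive_leader("docker-3", 47500, peers)
needs_check = peers_may_be_running(peers)

# docker-3 wins the naive comparison, although the guard says the peers
# should be checked for a live cluster first.
print(leader, needs_check)  # prints: docker-3 True
```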

So I have two questions:

1) How can I query the latest GTID/sequence number from peers other than by running nc (netcat) against their cluster state port? (I can always query the peer's MySQL, but is there a better solution?)

2) If you know a better workaround or feel this method is incorrect, can you please suggest a better method?
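For reference, the "query the peer's MySQL" route from question 1 might look like the Python sketch below. It reads the Galera status variables wsrep_last_committed, wsrep_cluster_status and wsrep_local_state_comment through any DB-API cursor that returns rows as (name, value) tuples. The function name read_wsrep_status and the shape of the returned dictionary are assumptions for the example, and connecting to the peer is left to whichever MySQL driver is in use.

```python
WSREP_QUERY = (
    "SHOW GLOBAL STATUS WHERE Variable_name IN "
    "('wsrep_last_committed', 'wsrep_cluster_status', "
    "'wsrep_local_state_comment')"
)


def read_wsrep_status(cursor):
    """Return the live sequence number and cluster membership of one node.

    `cursor` is any DB-API cursor connected to the peer's MySQL server and
    returning rows as (Variable_name, Value) tuples.
    """
    cursor.execute(WSREP_QUERY)
    status = {name: value for name, value in cursor.fetchall()}
    return {
        # Sequence number of the last committed transaction on this node.
        "last_committed": int(status["wsrep_last_committed"]),
        # True when the node belongs to the primary component.
        "primary": status.get("wsrep_cluster_status") == "Primary",
        # True when the node is fully synced with the cluster.
        "synced": status.get("wsrep_local_state_comment") == "Synced",
    }
```

If any peer answers with primary set to True, a cluster is already running and the restarting node should join it rather than compare sequence numbers.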


This situation can occur in Marathon: when Marathon finds a Docker instance unhealthy, it restarts it, and I think my case is like such a restart.


Lammert Bies

Nov 14, 2017, 4:47:31 AM
to codership
I think you are missing the point of a database cluster. What you are trying to do here is define automated external logic to start a cluster from unsynced nodes. But the main idea behind a database cluster is that it never stops, and that there will always be a primary component to connect to. If all cluster nodes fail, a manual check of the failure reason and the cluster state is normally the best way to determine how to start the cluster again.

If full cluster failure is normal, or if your application is designed so that stopping all nodes is part of normal daily operation, you may as well reconsider your reasons for using a cluster, because obviously you don't need it for high availability.