Containerized 2-node cluster. Original "primary" shut down unexpectedly, what is the proper method to rejoin it to the cluster?

YeeP

Aug 9, 2017, 12:11:00 PM
to codership
So I started a cluster using the mariadb 10.1 Docker image from here: https://hub.docker.com/_/mariadb/

Here are my two run commands for the two containers:
mariadb
10.0.0.1
docker run \
  --name mariadb-0 \
  --restart=unless-stopped \
  -d \
  -v /root/mariadb:/etc/mysql/conf.d \
  -v /data/mariadb:/var/lib/mysql \
  -e MYSQL_INITDB_SKIP_TZINFO=yes \
  -e MYSQL_ROOT_PASSWORD=secret \
  -p 3306:3306 \
  -p 4567:4567/udp \
  -p 4567-4568:4567-4568 \
  -p 4444:4444 \
  mariadb:10.1 \
  --wsrep-new-cluster \
  --wsrep_node_address=10.0.0.1 \
  --wsrep_cluster_name=lamaria

10.0.0.2
docker run \
  --name mariadb-1 \
  --restart=unless-stopped \
  -d \
  -v /root/mariadb:/etc/mysql/conf.d \
  -v /data/mariadb:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=secret \
  -p 3306:3306 \
  -p 4567:4567/udp \
  -p 4567-4568:4567-4568 \
  -p 4444:4444 \
  mariadb:10.1 \
  --wsrep_node_address=10.0.0.2 \
  --wsrep_cluster_name=lamaria

-----------------------------
As you can see, the 10.0.0.1 node is the one that was started first chronologically, based on the addition of "--wsrep-new-cluster". This is also the node that went down unexpectedly. I tried to start it again with the exact same run command, which should not work because of the "safe_to_bootstrap" feature in 10.x. Regardless, here is the result of trying to start that node:

2017-08-08 17:59:56 140364513134528 [Note] mysqld (mysqld 10.1.22-MariaDB-1~jessie) starting as process 1 ...
2017-08-08 17:59:56 140364513134528 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-08-08 17:59:56 140364513134528 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2017-08-08 17:59:56 140364513134528 [Note] WSREP: wsrep_load(): Galera 25.3.19(r3667) by Codership Oy <in...@codership.com> loaded successfully.
2017-08-08 17:59:56 140364513134528 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-08-08 17:59:56 140364513134528 [Note] WSREP: Found saved state: 00a495d6-0e4b-11e7-b9cf-0ff280071f90:-1, safe_to_bootstrap: 0
2017-08-08 17:59:56 140364513134528 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.21.1.34; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ign
2017-08-08 17:59:56 140364513134528 [Note] WSREP: GCache history reset: old(00a495d6-0e4b-11e7-b9cf-0ff280071f90:0) -> new(00a495d6-0e4b-11e7-b9cf-0ff280071f90:-1)
2017-08-08 17:59:56 140364513134528 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-08-08 17:59:56 140364513134528 [Note] WSREP: wsrep_sst_grab()
2017-08-08 17:59:56 140364513134528 [Note] WSREP: Start replication
2017-08-08 17:59:56 140364513134528 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
2017-08-08 17:59:56 140364513134528 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-08-08 17:59:56 140364513134528 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2017-08-08 17:59:56 140364513134528 [ERROR] WSREP: wsrep::connect(gcomm://10.0.0.1,10.0.0.2) failed: 7

------------------------
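For reference, here is roughly what the grastate.dat in that node's data volume looks like (uuid and seqno taken from the saved-state line in the log above). The error says I could flip safe_to_bootstrap to 1 to force a bootstrap from this node, but that is exactly what I want to avoid, since the other node is still running and has newer data:

# GALERA saved state
version: 2.1
uuid:    00a495d6-0e4b-11e7-b9cf-0ff280071f90
seqno:   -1
safe_to_bootstrap: 0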
Note: some of my thoughts here come from this article: http://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/

So finally we come to my issue. At this point, the only way I have come up with to bring the failed node back is to shut down the node that is currently running, restart it as the "primary", and then join the cluster with the one that went down unexpectedly. Honestly, this methodology seems stupid to me. What is the point of having a cluster that requires this kind of intervention when one node goes down? If I have one node running, I should be able to rejoin regardless of which side went down, as long as one is currently running. At least, that is what I believe. Can anyone give me tips on a way to rejoin the cluster with the node that went down that does not require this kind of process?

Thank you,
Ryan

Jörg Brühe

Aug 9, 2017, 1:16:42 PM
to codersh...@googlegroups.com
Hello Ryan, all!


On 09.08.2017 17:42, YeeP wrote:
> So I started a cluster using the mariadb 10.1
> here: https://hub.docker.com/_/mariadb/
>
> Here are my two run commands for the two containers:
> mariadb
> 10.0.0.1
> docker run \
> [[...]]
> --wsrep-new-cluster \
> --wsrep_node_address=10.0.0.1 \
> --wsrep_cluster_name=lamaria
>
> 10.0.0.2
> docker run \
> [[...]]
> --wsrep_node_address=10.0.0.2 \
> --wsrep_cluster_name=lamaria
>
> -----------------------------
> As you can see, the 10.0.0.1 node is the one that was started first
> chronologically, based on the addition of "--wsrep-new-cluster".
> This is also the node that went down unexpectedly. I tried to start it
> again with the exact same run command, which should not work because of
> the "safe_to_bootstrap" feature in 10.x. Regardless, here is the result
> of trying to start that node:
>
> [[...]]
>
> ------------------------
> Note: some of my thoughts here come from this
> article: http://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/
> So finally we come to my issue. At this point, the only way I have come
> up with to bring the failed node back is to shut down the node that is
> currently running, restart it as the "primary", and then join the cluster
> with the one that went down unexpectedly. Honestly, this methodology
> seems stupid to me. What is the point of having a cluster that requires
> this kind of intervention when one node goes down? If I have one node
> running, I should be able to rejoin regardless of which side went down,
> as long as one is currently running. At least, that is what I believe.
> Can anyone give me tips on a way to rejoin the cluster with the node
> that went down that does not require this kind of process?

A Galera cluster is fully symmetric.
The start option "--wsrep-new-cluster" does not give that node any
special property; rather, it tells the node to start a new cluster
without having any partner node (they will join later).

"Primary" is not a permanent node property either; it only describes
that the node is currently part of a majority.

Just restart any failed node without the "--wsrep-new-cluster" option
(provided at least one other node is still up and running).
You'd best make sure that the config files of all nodes contain the full
list of nodes, so that a restarting node knows which ones to connect to.
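With your Docker setup that means removing the stopped container and running it again without that flag, roughly like this (a sketch based on your original mariadb-0 command; judging from the log, the config mounted from /root/mariadb already carries the gcomm:// address listing both nodes, which covers the "full list of nodes" point):

docker rm mariadb-0

docker run \
  --name mariadb-0 \
  --restart=unless-stopped \
  -d \
  -v /root/mariadb:/etc/mysql/conf.d \
  -v /data/mariadb:/var/lib/mysql \
  -e MYSQL_INITDB_SKIP_TZINFO=yes \
  -e MYSQL_ROOT_PASSWORD=secret \
  -p 3306:3306 \
  -p 4567:4567/udp \
  -p 4567-4568:4567-4568 \
  -p 4444:4444 \
  mariadb:10.1 \
  --wsrep_node_address=10.0.0.1 \
  --wsrep_cluster_name=lamaria

On startup it should then request a state transfer (IST or, if needed, a full SST) from the running node and rejoin automatically.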


HTH,
Jörg

--
Joerg Bruehe, Senior MySQL Support Engineer, joerg....@fromdual.com
FromDual GmbH, Rebenweg 6, CH - 8610 Uster; phone +41 44 500 58 26
Geschäftsführer: Oliver Sennhauser
Handelsregister-Eintrag: CH-020.4.044.539-3

YeeP

Aug 10, 2017, 9:57:09 AM
to codership
Thank you Jörg! That totally makes sense and I appreciate it.

Arjen Lentz

Aug 10, 2017, 8:03:55 PM
to codership-team
Hi Ryan

One should also note that a 2-node cluster is not particularly useful in cluster terms.

Consider the split brain problem.
If one node doesn't see the other node any more, it can't know whether the other node is actually dead, or if there is "just" a connection problem between the two nodes.
This, naturally, applies to both nodes. Neither of them can decide whether it is the "last one standing", and thus they cannot take over. So at this point your cluster is dead.

You can change the weight of a node; for instance, you could say that node B weighs more than node A. So if communication fails, and B happens to still be working, it will take control of the cluster (such as it is, with only one node) and continue to serve queries from your application. Node A, if it's still alive, will have less than 50% of the total weight on its own, and thus it won't serve any queries.
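In Galera that is done via the provider options, for example (hypothetical value; pc.weight defaults to 1, so this gives node B a quorum majority on its own):

# appended to node B's mysqld arguments, e.g. at the end of its docker run command
--wsrep_provider_options="pc.weight=2"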
While this may seem like a shiny improvement, it doesn't really help much in the two-node scenario, because you can't predict which node or data center (if separated) is going to fail. Thus it will only help if node A fails, and not if node B fails. Not ideal.

It's better to have at least 3 nodes, or an arbitrator on a separate physical machine (and possibly location).
Note that an arbitrator receives all the cluster's replication traffic anyway, so in network terms it is not really more efficient than a full node.
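If you do go the arbitrator route, garbd only needs the cluster address and the group name, something like this (a sketch using the addresses and cluster name from the first post):

garbd --address gcomm://10.0.0.1:4567,10.0.0.2:4567 \
      --group lamaria \
      --daemon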


Regards,
Arjen.
--
Arjen Lentz, Exec.Director @ Open Query Pty Ltd
Your databases and online infrastructure. Managed.
Web https://openquery.com.au | Twitter @openquery