2 nodes of a 3-node Galera Cluster restart again and again

向迎兵

Oct 10, 2021, 5:53:18 AM
to codership
Dear all,
I have spent more than two weeks trying to deploy a Galera Cluster with Docker, but I have not been able to get it working.

I searched almost all related topics on Google and studied almost all of the articles/tutorials at the links below:
My docker-stack.yml is attached.

I start the Docker stack with the following commands:
echo "prepare clean data directories ******************************************"
rm -rf mariadb01-data  mariadb02-data  mariadb03-data
sleep 1
mkdir  mariadb01-data  mariadb02-data  mariadb03-data

echo "prepare fresh overlay network *******************************************"
docker network rm terra-overlay-net
docker network prune -f
sleep 1
docker network create -d overlay --attachable --subnet 172.16.238.0/24 terra-overlay-net
sleep 1

echo "start services **********************************************************"
docker stack deploy --compose-file=docker-stack.yml terra-mariadb-cluster

I then view the logs (mariadb01.log and mariadb02.log are attached) with the following commands:
docker service logs -f terra-mariadb-cluster_mariadb01
docker service logs -f terra-mariadb-cluster_mariadb02
docker service logs -f terra-mariadb-cluster_mariadb03
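
The log output is long, so it can help to narrow it down to the error lines first. The following is only a minimal sketch: `wsrep_errors` is a hypothetical helper name that is not part of the attached deploy.sh, and it assumes that every relevant line carries the `[ERROR]` tag, as in the excerpts in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached deploy.sh): keep only the
# lines that mysqld or the SST script tagged as errors.
wsrep_errors() {
    # Reads log text on stdin. -F treats '[ERROR]' as a fixed string, so the
    # square brackets are not interpreted as a character class.
    grep -F '[ERROR]'
}

# Usage against a running service (2>&1 because the container's stderr
# stream may carry the server log):
#   docker service logs terra-mariadb-cluster_mariadb02 2>&1 | wsrep_errors
# Usage against a saved log file:
#   wsrep_errors < mariadb02.log
```

Note that `grep` exits with a non-zero status when no line matches, which matters if the helper is called from a script running under `set -e`.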

According to the logs, the first node, "mariadb01", DID receive messages from the other nodes, but it disconnected them again and again. The other two nodes, "mariadb02" and "mariadb03", restart again and again with the errors below (see the attached mariadb02.log for details):

WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20211009 07:08:04.838)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20211009 07:08:05.849)
WSREP_SST: [ERROR] previous SST script still running. (20211009 07:08:05.852)
2021-10-09  7:08:05 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address 'mariadb02' --datadir '/var/lib/mysql/' --parent '1' --mysqld-args --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb01,mariadb03,mariadb02 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb02 --wsrep-node-name=server2 --server-id=2 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2
    Read: '(null)'
2021-10-09  7:08:05 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address 'mariadb02' --datadir '/var/lib/mysql/' --parent '1' --mysqld-args --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb01,mariadb03,mariadb02 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb02 --wsrep-node-name=server2 --server-id=2 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2: 114 (Operation already in progress)
2021-10-09  7:08:05 1 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2021-10-09  7:08:05 1 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
2021-10-09  7:08:05 1 [Note] WSREP: ReplicatorSMM::abort()
2021-10-09  7:08:05 1 [Note] WSREP: Closing send monitor...
2021-10-09  7:08:05 1 [Note] WSREP: Closed send monitor.
2021-10-09  7:08:05 1 [Note] WSREP: gcomm: terminating thread
2021-10-09  7:08:05 1 [Note] WSREP: gcomm: joining thread
2021-10-09  7:08:05 1 [Note] WSREP: gcomm: closing backend
2021-10-09  7:08:06 1 [Note] WSREP: (a06c4fbb-93b6, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-10-09  7:08:06 1 [Note] WSREP: view(view_id(NON_PRIM,831e0fae-987b,16) memb {
    a06c4fbb-93b6,0
} joined {
} left {
} partitioned {
    831e0fae-987b,0
})
2021-10-09  7:08:06 1 [Note] WSREP: PC protocol downgrade 1 -> 0
2021-10-09  7:08:06 1 [Note] WSREP: view((empty))
2021-10-09  7:08:06 1 [Note] WSREP: gcomm: closed
2021-10-09  7:08:06 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2021-10-09  7:08:06 0 [Note] WSREP: Flow-control interval: [16, 16]
2021-10-09  7:08:06 0 [Note] WSREP: Received NON-PRIMARY.
2021-10-09  7:08:06 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 16)
2021-10-09  7:08:06 0 [Note] WSREP: New SELF-LEAVE.
2021-10-09  7:08:06 0 [Note] WSREP: Flow-control interval: [0, 0]
2021-10-09  7:08:06 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2021-10-09  7:08:06 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 16)
2021-10-09  7:08:06 0 [Note] WSREP: RECV thread exiting 0: Success
2021-10-09  7:08:06 1 [Note] WSREP: recv_thread() joined.
2021-10-09  7:08:06 1 [Note] WSREP: Closing replication queue.
2021-10-09  7:08:06 1 [Note] WSREP: Closing slave action queue.
2021-10-09  7:08:06 1 [Note] WSREP: mysqld: Terminated.
211009  7:08:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.12-MariaDB-1:10.5.12+maria~focal
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 336799 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f6e40000c58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6e58272d98 thread_stack 0x49000
mysqld(my_print_stacktrace+0x32)[0x56280b4d93a2]
Printing to addr2line failed
mysqld(handle_fatal_signal+0x485)[0x56280af2c0b5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f6e5b72a3c0]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7f6e5b20f941]
/usr/lib/galera/libgalera_smm.so(+0x212f52)[0x7f6e5ae3ef52]
/usr/lib/galera/libgalera_smm.so(+0x6b908)[0x7f6e5ac97908]
/usr/lib/galera/libgalera_smm.so(+0x79955)[0x7f6e5aca5955]
/usr/lib/galera/libgalera_smm.so(+0x7a3bf)[0x7f6e5aca63bf]
/usr/lib/galera/libgalera_smm.so(+0x7a9d4)[0x7f6e5aca69d4]
/usr/lib/galera/libgalera_smm.so(+0xabf88)[0x7f6e5acd7f88]
/usr/lib/galera/libgalera_smm.so(+0xac112)[0x7f6e5acd8112]
/usr/lib/galera/libgalera_smm.so(+0x78e00)[0x7f6e5aca4e00]
/usr/lib/galera/libgalera_smm.so(+0x4efe1)[0x7f6e5ac7afe1]
mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x56280b56f6d2]
mysqld(+0xc50a77)[0x56280b204a77]
mysqld(_Z15start_wsrep_THDPv+0x267)[0x56280b1f5157]
mysqld(+0xbd0a36)[0x56280b184a36]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f6e5b71e609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f6e5b30c293]
...


You can easily reproduce the problem with the commands above (or with deploy.sh).

Could you please help me fix the problems in docker-stack.yml?
Please! I have spent too much time on this.




mariadb01.log
docker-stack.yml
deploy.sh