Hi,
I cannot get the cluster to restart in the scenario below.
What's wrong with my procedure?
(1) A three-node cluster is running: node#1, node#2, and node#3.
(2) Kill node#1 on purpose with 'killall -9 mysqld_safe; killall -9 mysqld'.
(3) Kill node#2, too.
In this scenario, assume that node#2 will not be available for some time.
(4) Restart node#1 with 'mysqld_safe &' so that it rejoins the cluster.
However, node#1 never finishes starting up.
Node#1 keeps writing messages like these to its error log:
151125 0:32:40 [Note] WSREP: (71e534c1, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.125:4567' pointing to uuid 71e534c1 is blacklisted, skipping
151125 0:32:40 [Note] WSREP: (71e534c1, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.125:4567' pointing to uuid 71e534c1 is blacklisted, skipping
151125 0:32:41 [Note] WSREP: (71e534c1, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.125:4567' pointing to uuid 71e534c1 is blacklisted, skipping
Meanwhile, node#3 keeps writing messages like these to its error log:
151125 0:31:37 [Note] WSREP: (a0214ca0, 'tcp://0.0.0.0:4567') reconnecting to dd1fa25c (tcp://192.168.0.124:4567), attempt 330
151125 0:32:08 [Note] WSREP: (a0214ca0, 'tcp://0.0.0.0:4567') reconnecting to dd1fa25c (tcp://192.168.0.124:4567), attempt 360
151125 0:32:38 [Note] WSREP: (a0214ca0, 'tcp://0.0.0.0:4567') reconnecting to dd1fa25c (tcp://192.168.0.124:4567), attempt 390
Their IP addresses are:
node#1 192.168.0.125
node#2 192.168.0.124
node#3 192.168.0.153
(5) As an experiment, restart node#2 with 'mysqld_safe &'.
Then node#1 and node#3 proceeded and finally became ready to accept connections.
In node#1's error log, the messages above were followed by:
151125 0:48:24 [Note] WSREP: declaring a0214ca0 at tcp://192.168.0.153:4567 stable
151125 0:48:24 [Note] WSREP: declaring dd1fa25c at tcp://192.168.0.124:4567 stable
151125 0:48:24 [Note] WSREP: re-bootstrapping prim from partitioned components
151125 0:48:24 [Note] WSREP: view(view_id(PRIM,71e534c1,25) memb {
71e534c1,0
a0214ca0,0
dd1fa25c,0
} joined {
} left {
} partitioned {
})
151125 0:48:24 [Note] WSREP: save pc into disk
(snip)
In node#3's error log, the messages above were followed by:
151125 0:48:24 [Note] WSREP: declaring 71e534c1 at tcp://192.168.0.125:4567 stable
151125 0:48:24 [Note] WSREP: declaring dd1fa25c at tcp://192.168.0.124:4567 stable
151125 0:48:24 [Note] WSREP: re-bootstrapping prim from partitioned components
151125 0:48:24 [Note] WSREP: view(view_id(PRIM,71e534c1,25) memb {
71e534c1,0
a0214ca0,0
dd1fa25c,0
} joined {
} left {
} partitioned {
})
(snip)
I'm using mariadb-galera-10.0.20-linux-x86_64 on CentOS 6.
Regards,