Hi Alex,
I copied the wsrep_sst_xtrabackup file, which I downloaded from the
Codership website, to my /usr/bin directory and renamed it
wsrep_sst_backup.
Then, on a joiner machine that wasn't in the cluster at the
time, I issued:
garbd -o gmcast.listen_addr=tcp://0.0.0.0:3333 -g my_wsrep_cluster -a gcomm://10.0.211.79:4567 --sst backup --donor 10.0.211.79
The donor then seemed to work, but with a minor error:
WSREP_SST: [ERROR] innobackupex /tmp 2> /var/lib/mysql//innobackup.backup.log | nc (20120111 13:52:08.781)
This is a statement I added to the wsrep_sst_backup script.
I reviewed the script and found that the variable WSREP_SST_OPT_ADDR
was never set.
What's wrong with it?
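In the donor log below, the script is invoked with --address '', which would leave WSREP_SST_OPT_ADDR empty and could explain why nc only prints its usage text. Here is a minimal sketch of a guard that could run before the innobackupex | nc pipeline. The function names parse_sst_args and require_addr are hypothetical (they are not part of the stock script), and the plain host:port address format is an assumption:

```shell
#!/bin/bash
# Hypothetical helper: pick the value of --address out of the script arguments.
parse_sst_args() {
    WSREP_SST_OPT_ADDR=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --address)
                WSREP_SST_OPT_ADDR="${2-}"
                # Skip the option value too, if one was supplied.
                if [ $# -ge 2 ]; then shift; fi
                ;;
        esac
        shift
    done
}

# Hypothetical guard: refuse to continue when no address was passed.
require_addr() {
    if [ -z "$WSREP_SST_OPT_ADDR" ]; then
        echo "WSREP_SST: [ERROR] --address is empty, not starting innobackupex | nc" >&2
        return 22   # EINVAL, the same error number the donor log reports
    fi
    # Assumption: the address is a plain host:port pair.
    SST_HOST="${WSREP_SST_OPT_ADDR%%:*}"
    SST_PORT="${WSREP_SST_OPT_ADDR##*:}"
}
```

With a guard like this, the script would stop with a clear message instead of handing nc an empty host and port.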
If I want to take a hot copy of any node in the cluster, is the
operation above correct?
What are the steps for recovery?
Can you give me some instructions?
Many thanks!
Best regards,
Danny Pu
Jinan, Shandong Province, China
The donor's log was as follows:
120111 13:52:01 [Note] WSREP: Flow-control interval: [28, 28]
120111 13:52:01 [Note] WSREP: New cluster view: global state:
857b217e-3a8d-11e1-0800-5eb8ec0b0a53:28, view# 103: Primary, number of
nodes: 3, my index: 2, protocol version 2
120111 13:52:01 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
120111 13:52:01 [Note] WSREP: Assign initial position for
certification: 28, protocol version: 2
120111 13:52:01 [Note] WSREP: Member 0 (10.0.211.81) synced with
group.
120111 13:52:07 [Note] WSREP: declaring
0520ccde-5bb3-11e2-0800-5625fa869020 stable
120111 13:52:07 [Note] WSREP: declaring
0833def2-5bb3-11e2-0800-8cca055d837b stable
120111 13:52:07 [Note] WSREP: declaring
688ee448-72f4-11e1-0800-71e26c0e9177 stable
120111 13:52:08 [Note] WSREP: view(view_id(PRIM,
0520ccde-5bb3-11e2-0800-5625fa869020,107) memb {
0520ccde-5bb3-11e2-0800-5625fa869020,
0833def2-5bb3-11e2-0800-8cca055d837b,
688ee448-72f4-11e1-0800-71e26c0e9177,
aea8d302-3bf5-11e1-0800-fba69358a1b5,
} joined {
} left {
} partitioned {
})
120111 13:52:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap
= no, my_idx = 3, memb_num = 4
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: sent state msg:
09b8aca8-5bb3-11e2-0800-27e8b55ea5c5
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
09b8aca8-5bb3-11e2-0800-27e8b55ea5c5 from 0 (10.0.211.81)
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
09b8aca8-5bb3-11e2-0800-27e8b55ea5c5 from 2 (localhost.localdomain)
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
09b8aca8-5bb3-11e2-0800-27e8b55ea5c5 from 3 (10.0.211.79)
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
09b8aca8-5bb3-11e2-0800-27e8b55ea5c5 from 1 (garb)
120111 13:52:08 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 103,
members = 3/4 (joined/total),
act_id = 28,
last_appl. = 0,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 857b217e-3a8d-11e1-0800-5eb8ec0b0a53
120111 13:52:08 [Note] WSREP: Flow-control interval: [32, 32]
120111 13:52:08 [Note] WSREP: New cluster view: global state:
857b217e-3a8d-11e1-0800-5eb8ec0b0a53:28, view# 104: Primary, number of
nodes: 4, my index: 3, protocol version 2
120111 13:52:08 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
120111 13:52:08 [Note] WSREP: Assign initial position for
certification: 28, protocol version: 2
120111 13:52:08 [Note] WSREP: Node 1 (garb) requested state transfer
from '10.0.211.79'. Selected 3 (10.0.211.79)(SYNCED) as donor.
120111 13:52:08 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO:
28)
120111 13:52:08 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
120111 13:52:08 [Note] WSREP: Running: 'wsrep_sst_backup --role 'donor' --address '' --auth 'sst:123abc' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '857b217e-3a8d-11e1-0800-5eb8ec0b0a53:28''
120111 13:52:08 [Note] WSREP: 1 (garb): State transfer from 3
(10.0.211.79) complete.
120111 13:52:08 [Note] WSREP: sst_donor_thread signaled with 0
120111 13:52:08 [Note] WSREP: declaring
0520ccde-5bb3-11e2-0800-5625fa869020 stable
120111 13:52:08 [Note] WSREP: declaring
688ee448-72f4-11e1-0800-71e26c0e9177 stable
120111 13:52:08 [Note] WSREP: (aea8d302-3bf5-11e1-0800-fba69358a1b5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.211.81:3333
120111 13:52:08 [Note] WSREP: view(view_id(PRIM,
0520ccde-5bb3-11e2-0800-5625fa869020,108) memb {
0520ccde-5bb3-11e2-0800-5625fa869020,
688ee448-72f4-11e1-0800-71e26c0e9177,
aea8d302-3bf5-11e1-0800-fba69358a1b5,
} joined {
} left {
} partitioned {
0833def2-5bb3-11e2-0800-8cca055d837b,
})
120111 13:52:08 [Note] WSREP: forgetting 0833def2-5bb3-11e2-0800-8cca055d837b (tcp://10.0.211.81:3333)
120111 13:52:08 [Note] WSREP: (aea8d302-3bf5-11e1-0800-fba69358a1b5,
120111 13:52:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap
= no, my_idx = 2, memb_num = 3
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: sent state msg:
0a082e2c-5bb3-11e2-0800-8f0715bdf090
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
0a082e2c-5bb3-11e2-0800-8f0715bdf090 from 0 (10.0.211.81)
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
0a082e2c-5bb3-11e2-0800-8f0715bdf090 from 1 (localhost.localdomain)
120111 13:52:08 [Note] WSREP: STATE EXCHANGE: got state msg:
0a082e2c-5bb3-11e2-0800-8f0715bdf090 from 2 (10.0.211.79)
120111 13:52:08 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 104,
members = 3/3 (joined/total),
act_id = 28,
last_appl. = 0,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 857b217e-3a8d-11e1-0800-5eb8ec0b0a53
120111 13:52:08 [Note] WSREP: Flow-control interval: [28, 28]
120111 13:52:08 [Note] WSREP: New cluster view: global state:
857b217e-3a8d-11e1-0800-5eb8ec0b0a53:28, view# 105: Primary, number of
nodes: 3, my index: 2, protocol version 2
120111 13:52:08 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
120111 13:52:08 [Note] WSREP: Assign initial position for
certification: 28, protocol version: 2
WSREP_SST: [ERROR] innobackupex /tmp 2> /var/lib/mysql//innobackup.backup.log | nc (20120111 13:52:08.781)
usage: nc [-46DdhklnrStUuvzC] [-i interval] [-p source_port]
          [-s source_ip_address] [-T ToS] [-w timeout] [-X proxy_version]
          [-x proxy_address[:port]] [hostname] [port[s]]
120111 13:52:14 [Note] WSREP: cleaning up 0833def2-5bb3-11e2-0800-8cca055d837b (tcp://10.0.211.81:3333)
WSREP_SST: [ERROR] innobackupex finished with error: 25. Check /var/lib/mysql//innobackup.backup.log (20120111 13:52:15.028)
120111 13:52:15 [ERROR] WSREP: Failed to read from: wsrep_sst_backup --role 'donor' --address '' --auth 'sst:123abc' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '857b217e-3a8d-11e1-0800-5eb8ec0b0a53:28'
120111 13:52:15 [ERROR] WSREP: Process completed with error: wsrep_sst_backup --role 'donor' --address '' --auth 'sst:123abc' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '857b217e-3a8d-11e1-0800-5eb8ec0b0a53:28': 22 (Invalid argument)
120111 13:52:15 [Warning] WSREP: Could not find peer:
0833def2-5bb3-11e2-0800-8cca055d837b
120111 13:52:15 [Warning] WSREP: 2 (10.0.211.79): State transfer to -1
(left the group) failed: -1 (Operation not permitted)
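To separate the state-transfer failures from the surrounding group-communication notes in a log like the one above, a small filter can help. This is only a sketch: sst_errors is a hypothetical helper name, the log file path is whatever the server is configured to write to, and it assumes each entry sits on a single line in the real file (the entries above were wrapped by the mail client):

```shell
#!/bin/bash
# Hypothetical helper: print only the error lines emitted by the SST script
# (prefixed "WSREP_SST: [ERROR]") and by the wsrep layer ("[ERROR] WSREP:").
# $1 is the path to the MySQL error log.
sst_errors() {
    grep -E 'WSREP_SST: \[ERROR\]|\[ERROR\] WSREP:' "$1"
}
```

Calling sst_errors with the path to the donor's error log would print entries such as the "innobackupex finished with error: 25" and "Process completed with error" lines shown above.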
The joiner's terminal output was as follows:
[root@10 pushuaiye]# garbd -o gmcast.listen_addr=tcp://0.0.0.0:3333 -g my_wsrep_cluster -a gcomm://10.0.211.79:4567 --sst backup --donor 10.0.211.79
2013-01-11 14:11:37.379 INFO: Read config:
daemon: 0
address: gcomm://10.0.211.79:4567
group: my_wsrep_cluster
sst: backup
donor: 10.0.211.79
options: gmcast.listen_addr=tcp://0.0.0.0:3333; gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
cfg:
log:
2013-01-11 14:11:37.386 INFO: protonet asio version 0
2013-01-11 14:11:37.387 INFO: backend: asio
2013-01-11 14:11:37.396 INFO: GMCast version 0
2013-01-11 14:11:37.400 INFO: (c1d7fb8e-5bb5-11e2-0800-58929ca59ad6, 'tcp://0.0.0.0:3333') listening at tcp://0.0.0.0:3333
2013-01-11 14:11:37.400 INFO: (c1d7fb8e-5bb5-11e2-0800-58929ca59ad6, 'tcp://0.0.0.0:3333') multicast: , ttl: 1
2013-01-11 14:11:37.418 INFO: EVS version 0
2013-01-11 14:11:37.423 INFO: PC version 0
2013-01-11 14:11:37.423 INFO: gcomm: connecting to group 'my_wsrep_cluster', peer '10.0.211.79:4567'
2013-01-11 14:11:37.431 INFO: (c1d7fb8e-5bb5-11e2-0800-58929ca59ad6, 'tcp://0.0.0.0:3333') turning message relay requesting on, nonlive peers: tcp://10.0.211.78:4567 tcp://10.0.211.81:4567
2013-01-11 14:11:37.705 INFO: (c1d7fb8e-5bb5-11e2-0800-58929ca59ad6, 'tcp://0.0.0.0:3333') turning message relay requesting off
2013-01-11 14:11:37.705 INFO: (c1d7fb8e-5bb5-11e2-0800-58929ca59ad6, 'tcp://0.0.0.0:3333') cleaning up established 0x1f29f2f0 which is duplicate of 0x1f295e20
2013-01-11 14:11:38.902 INFO: declaring
0520ccde-5bb3-11e2-0800-5625fa869020 stable
2013-01-11 14:11:38.902 INFO: declaring
688ee448-72f4-11e1-0800-71e26c0e9177 stable
2013-01-11 14:11:38.902 INFO: declaring aea8d302-3bf5-11e1-0800-fba69358a1b5 stable
2013-01-11 14:11:39.906 INFO: view(view_id(PRIM,
0520ccde-5bb3-11e2-0800-5625fa869020,109) memb {
0520ccde-5bb3-11e2-0800-5625fa869020,
688ee448-72f4-11e1-0800-71e26c0e9177,
aea8d302-3bf5-11e1-0800-fba69358a1b5,
c1d7fb8e-5bb5-11e2-0800-58929ca59ad6,
} joined {
} left {
} partitioned {
})
2013-01-11 14:11:39.930 INFO: gcomm: connected
2013-01-11 14:11:39.930 INFO: Changing maximum packet size to 64500,
resulting msg size: 32636
2013-01-11 14:11:39.930 INFO: Shifting CLOSED -> OPEN (TO: 0)
2013-01-11 14:11:39.930 INFO: Opened channel 'my_wsrep_cluster'
2013-01-11 14:11:39.931 INFO: New COMPONENT: primary = yes, bootstrap
= no, my_idx = 3, memb_num = 4
2013-01-11 14:11:39.931 INFO: STATE EXCHANGE: Waiting for state UUID.
2013-01-11 14:11:39.931 INFO: STATE EXCHANGE: sent state msg:
c35812c8-5bb5-11e2-0800-708ccd89c894
2013-01-11 14:11:39.931 INFO: STATE EXCHANGE: got state msg:
c35812c8-5bb5-11e2-0800-708ccd89c894 from 0 (10.0.211.81)
2013-01-11 14:11:39.931 INFO: STATE EXCHANGE: got state msg:
c35812c8-5bb5-11e2-0800-708ccd89c894 from 1 (localhost.localdomain)
2013-01-11 14:11:39.931 INFO: STATE EXCHANGE: got state msg:
c35812c8-5bb5-11e2-0800-708ccd89c894 from 2 (10.0.211.79)
2013-01-11 14:11:39.933 INFO: STATE EXCHANGE: got state msg:
c35812c8-5bb5-11e2-0800-708ccd89c894 from 3 (garb)
2013-01-11 14:11:39.933 INFO: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 105,
members = 3/4 (joined/total),
act_id = 28,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 857b217e-3a8d-11e1-0800-5eb8ec0b0a53
2013-01-11 14:11:39.933 INFO: Flow-control interval: [9999999,
9999999]
2013-01-11 14:11:39.933 INFO: Shifting OPEN -> PRIMARY (TO: 28)
2013-01-11 14:11:39.933 INFO: Sending state transfer request:
'backup', size: 6
2013-01-11 14:11:39.934 INFO: Node 3 (garb) requested state transfer
from '10.0.211.79'. Selected 2 (10.0.211.79)(SYNCED) as donor.
2013-01-11 14:11:39.934 INFO: Shifting PRIMARY -> JOINER (TO: 28)
2013-01-11 14:11:39.934 INFO: Closing send monitor...
2013-01-11 14:11:39.934 INFO: Closed send monitor.
2013-01-11 14:11:39.934 INFO: gcomm: terminating thread
2013-01-11 14:11:39.934 INFO: gcomm: joining thread
2013-01-11 14:11:39.935 INFO: gcomm: closing backend
2013-01-11 14:11:39.964 INFO: view(view_id(NON_PRIM,
0520ccde-5bb3-11e2-0800-5625fa869020,109) memb {
c1d7fb8e-5bb5-11e2-0800-58929ca59ad6,
} joined {
} left {
} partitioned {
0520ccde-5bb3-11e2-0800-5625fa869020,
688ee448-72f4-11e1-0800-71e26c0e9177,
aea8d302-3bf5-11e1-0800-fba69358a1b5,
})
2013-01-11 14:11:39.964 INFO: view((empty))
2013-01-11 14:11:39.964 INFO: gcomm: closed
2013-01-11 14:11:39.966 INFO: 3 (garb): State transfer from 2
(10.0.211.79) complete.
2013-01-11 14:11:39.966 INFO: Shifting JOINER -> JOINED (TO: 28)
2013-01-11 14:11:39.966 WARN: 0x1f26ee80 down context(s) not set
2013-01-11 14:11:39.966 WARN: Failed to send SYNC signal: -107
(Transport endpoint is not connected)
2013-01-11 14:11:39.966 INFO: New COMPONENT: primary = no, bootstrap
= no, my_idx = 0, memb_num = 1
2013-01-11 14:11:39.966 INFO: Flow-control interval: [9999999,
9999999]
2013-01-11 14:11:39.966 INFO: Received NON-PRIMARY.
2013-01-11 14:11:39.967 INFO: Shifting JOINED -> OPEN (TO: 28)
2013-01-11 14:11:39.967 INFO: Received self-leave message.
2013-01-11 14:11:39.967 INFO: Flow-control interval: [9999999,
9999999]
2013-01-11 14:11:39.967 INFO: Received SELF-LEAVE. Closing
connection.
2013-01-11 14:11:39.967 INFO: Shifting OPEN -> CLOSED (TO: 28)
2013-01-11 14:11:39.967 INFO: RECV thread exiting 0: Success
2013-01-11 14:11:39.969 INFO: recv_thread() joined.
2013-01-11 14:11:39.970 INFO: Closing slave action queue.
2013-01-11 14:11:39.970 WARN: Attempt to close a closed connection
2013-01-11 14:11:39.970 INFO: Exiting main loop
2013-01-11 14:11:39.971 INFO: Shifting CLOSED -> DESTROYED (TO: 28)
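The sequence of state changes in a garbd or mysqld log like the ones above is easier to follow once the "Shifting X -> Y" entries are pulled out on their own. A minimal sketch, assuming the log is piped in on standard input (state_transitions is a hypothetical helper name):

```shell
#!/bin/bash
# Hypothetical helper: read log lines on stdin and print one "FROM -> TO"
# pair for every "Shifting FROM -> TO" entry; all other lines are dropped.
state_transitions() {
    sed -n 's|.*Shifting \([A-Z/]*\) -> \([A-Z/]*\).*|\1 -> \2|p'
}
```

Fed the joiner output above, this would list the transitions from CLOSED -> OPEN through to CLOSED -> DESTROYED in the order they were logged.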