MariyaDB clutter joiner cannot connect with doner

503 views
Skip to first unread message

parakrama55

unread,
Dec 23, 2015, 7:57:52 AM12/23/15
to codership
Hi All 

Im trying  to setup  3 node mariyaDB cluster in EC2 . 


So i created first node and started the node with service mysql start --wsrep-new-cluster

Node : 172.19.1.29 (config)

[mysqld]
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_doublewrite=0
innodb_support_xa=0
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="ncluster"
wsrep_cluster_address="gcomm://172.19.1.29,172.19.1.27"
#wsrep_sst_method=xtrabackup
wsrep_sst_method=xtrabackup-v2
wsrep_slave_threads=1
innodb_locks_unsafe_for_binlog=1
wsrep_sst_auth=galera:BCCSmNhHyFfLU
wsrep_provider_options="socket.ssl_cert=/etc/mysql/cert.pem; socket.ssl_key=/etc/mysql/key.pem"
#wsrep_node_address=172.19.1.29

[sst]
encrypt=3
tkey=/etc/mysql/key.pem
tcert=/etc/mysql/cert.pem

Then i add the second node to the cluster It god added without any issue

Node : 172.19.1.27 (config)

[mysqld]
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_doublewrite=0
innodb_support_xa=0
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="ncluster"
wsrep_cluster_address="gcomm://172.19.1.29"
#wsrep_sst_method=xtrabackup
wsrep_sst_method=xtrabackup-v2
wsrep_slave_threads=1
innodb_locks_unsafe_for_binlog=1
wsrep_sst_auth=galera:BCCSmNhHyFfLU
wsrep_provider_options="socket.ssl_cert=/etc/mysql/cert.pem; socket.ssl_key=/etc/mysql/key.pem"
#wsrep_node_address=172.19.1.190

[sst]
encrypt=3
tkey=/etc/mysql/key.pem
tcert=/etc/mysql/cert.pem




Then i added some data to the cluster via primary node and secondary node And data syncing worked fine between nodes 


Then i tryied to add the thired ( node but  when i added it i get the following errors in the logs . I have attached the error logs of both doner and joner do you guys have any clue why im getting this error ?  please advice 

New node configs (172.19.1.191)
========================

[mysqld]
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_doublewrite=0
innodb_support_xa=0
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="ncluster"
wsrep_cluster_address="gcomm://172.19.1.29,172.19.1.27"
#wsrep_sst_method=xtrabackup
wsrep_sst_method=xtrabackup-v2
wsrep_slave_threads=1
innodb_locks_unsafe_for_binlog=1
wsrep_sst_auth=galera:BCCSmNhHyFfLU
wsrep_provider_options="socket.ssl_cert=/etc/mysql/cert.pem; socket.ssl_key=/etc/mysql/key.pem"


[sst]
encrypt=3
tkey=/etc/mysql/key.pem
tcert=/etc/mysql/cert.pem



Error in Joiner noide (172.19.1.191)
=========================





Dec 23 11:52:07 ip-172-19-1-191 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Dec 23 11:52:07 ip-172-19-1-191 mysqld_safe: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.C7AWL7' --pid-file='/var/lib/mysql/ip-172-19-1-191-recover.pid'
Dec 23 11:52:07 ip-172-19-1-191 mysqld: 151223 11:52:07 [Note] /usr/sbin/mysqld (mysqld 10.0.22-MariaDB-1~trusty-wsrep-log) starting as process 4629 ...
Dec 23 11:52:10 ip-172-19-1-191 mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] /usr/sbin/mysqld (mysqld 10.0.22-MariaDB-1~trusty-wsrep-log) starting as process 4678 ...
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Read nil XID from storage engines, skipping position init
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <in...@codership.com> loaded successfully.
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: CRC-32C: using hardware acceleration.
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Passing config to GCS: base_host = 172.19.1.191; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.n
Dec 23 11:52:10 ip-172-19-1-191 mysqld: pvo = false; pc.recover
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Service thread queue flushed.
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: wsrep_sst_grab()
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Start replication
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: protonet asio version 0
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: Using CRC-32C for message checksums.
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: initializing ssl context
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: backend: asio
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: restore pc from disk failed
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: GMCast version 0
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: (999de2e6, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: (999de2e6, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: EVS version 0
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: gcomm: connecting to group 'ncluster', peer '172.19.1.29:,172.19.1.27:'
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.19.1.29:4567 local endpoint ssl://172.19.1.191:45020 cipher: AES128-SHA compression: 
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: (999de2e6, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
Dec 23 11:52:10 ip-172-19-1-191 mysqld: 151223 11:52:10 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.19.1.27:4567 local endpoint ssl://172.19.1.191:51430 cipher: AES128-SHA compression: 
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: declaring 78e5a847 at ssl://172.19.1.29:4567 stable
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: declaring 7d4c14a0 at ssl://172.19.1.27:4567 stable
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Node 78e5a847 state prim
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: view(view_id(PRIM,78e5a847,29) memb {
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #01178e5a847,0
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #0117d4c14a0,0
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011999de2e6,0
Dec 23 11:52:11 ip-172-19-1-191 mysqld: } joined {
Dec 23 11:52:11 ip-172-19-1-191 mysqld: } left {
Dec 23 11:52:11 ip-172-19-1-191 mysqld: } partitioned {
Dec 23 11:52:11 ip-172-19-1-191 mysqld: })
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: save pc into disk
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: gcomm: connected
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Opened channel 'ncluster'
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: sent state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: got state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62 from 0 (ip-172-19-1-29)
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: got state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62 from 1 (ip-172-19-1-27)
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Waiting for SST to complete.
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: got state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62 from 2 (ip-172-19-1-191)
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Quorum results:
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011version    = 3,
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011component  = PRIMARY,
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011conf_id    = 28,
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011members    = 2/3 (joined/total),
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011act_id     = 28,
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011last_appl. = -1,
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011group UUID = 3d580fc8-a862-11e5-a4c0-27d206784152
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Flow-control interval: [28, 28]
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 28)
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: State transfer required: 
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011Group state: 3d580fc8-a862-11e5-a4c0-27d206784152:28
Dec 23 11:52:11 ip-172-19-1-191 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: New cluster view: global state: 3d580fc8-a862-11e5-a4c0-27d206784152:28, view# 29: Primary, number of nodes: 3, my index: 2, protocol version 3
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Warning] WSREP: Gap in state sequence. Need state transfer.
Dec 23 11:52:11 ip-172-19-1-191 mysqld: 151223 11:52:11 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.19.1.191' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '4678' --binlog '/var/log/mysql/mariadb-bin' '
Dec 23 11:52:11 ip-172-19-1-191 mysqld: WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20151223 11:52:11.966)
Dec 23 11:52:11 ip-172-19-1-191 -wsrep-sst-joiner: Streaming with xbstream
Dec 23 11:52:11 ip-172-19-1-191 -wsrep-sst-joiner: Using socat as streamer
Dec 23 11:52:11 ip-172-19-1-191 -wsrep-sst-joiner: Using openssl based encryption with socat: with key and crt
Dec 23 11:52:11 ip-172-19-1-191 -wsrep-sst-joiner: Decrypting with certificate /etc/mysql/cert.pem, key /etc/mysql/key.pem
Dec 23 11:52:11 ip-172-19-1-191 -wsrep-sst-joiner: Stale sst_in_progress file: /var/lib/mysql//sst_in_progress
Dec 23 11:52:11 ip-172-19-1-191 -wsrep-sst-joiner: Evaluating timeout -k 110 100 socat -u openssl-listen:4444,reuseaddr,cert=/etc/mysql/cert.pem,key=/etc/mysql/key.pem,verify=0 stdio | xbstream -x; RC=( ${PIPESTATUS[@]} )
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.19.1.191:4444/xtrabackup_sst//1
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: REPL Protocols: 7 (3, 2)
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Service thread queue flushed.
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Assign initial position for certification: 28, protocol version: 3
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Service thread queue flushed.
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (3d580fc8-a862-11e5-a4c0-27d206784152): 1 (Operation not permitted)
Dec 23 11:52:12 ip-172-19-1-191 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Member 2.0 (ip-172-19-1-191) requested state transfer from '*any*'. Selected 0.0 (ip-172-19-1-29)(SYNCED) as donor.
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 28)
Dec 23 11:52:12 ip-172-19-1-191 mysqld: 151223 11:52:12 [Note] WSREP: Requesting state transfer: success, donor: 0
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Proceeding with SST
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Evaluating socat -u openssl-listen:4444,reuseaddr,cert=/etc/mysql/cert.pem,key=/etc/mysql/key.pem,verify=0 stdio | xbstream -x; RC=( ${PIPESTATUS[@]} )
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Cleaning the existing datadir and innodb-data/log directories
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/lib/mysql/ib_logfile1'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/lib/mysql/ib_logfile0'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log_control'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log.00000001'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/lib/mysql/ibdata1'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Cleaning the binlog directory /var/log/mysql as well
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/log/mysql/mariadb-bin.000001'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Waiting for SST streaming to complete!
Dec 23 11:52:14 ip-172-19-1-191 mysqld: 151223 11:52:14 [Note] WSREP: (999de2e6, 'ssl://0.0.0.0:4567') turning message relay requesting off
Dec 23 11:52:22 ip-172-19-1-191 -wsrep-sst-joiner: #007xbstream: Can't create/write to file './ibdata1' (Errcode: 17 - File exists)
Dec 23 11:52:22 ip-172-19-1-191 -wsrep-sst-joiner: xbstream: failed to create file.
Dec 23 11:52:22 ip-172-19-1-191 -wsrep-sst-joiner: 2015/12/23 11:52:22 socat[4949] E write(1, 0x21fad30, 8192): Broken pipe
Dec 23 11:52:22 ip-172-19-1-191 -wsrep-sst-joiner: Error while getting data from donor node:  exit codes: 1 1
Dec 23 11:52:22 ip-172-19-1-191 -wsrep-sst-joiner: Cleanup after exit with status:32
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.19.1.191' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '4678' --binlog '/var/log/mysql/mariadb-bin' : 32 (Broken pipe)
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [ERROR] WSREP: SST failed: 32 (Broken pipe)
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [ERROR] Aborting
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Warning] WSREP: 0.0 (ip-172-19-1-29): State transfer to 2.0 (ip-172-19-1-191) failed: -22 (Invalid argument)
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: gcomm: terminating thread
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: gcomm: joining thread
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: gcomm: closing backend
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: view(view_id(NON_PRIM,78e5a847,29) memb {
Dec 23 11:52:22 ip-172-19-1-191 mysqld: #011999de2e6,0
Dec 23 11:52:22 ip-172-19-1-191 mysqld: } joined {
Dec 23 11:52:22 ip-172-19-1-191 mysqld: } left {
Dec 23 11:52:22 ip-172-19-1-191 mysqld: } partitioned {
Dec 23 11:52:22 ip-172-19-1-191 mysqld: #01178e5a847,0
Dec 23 11:52:22 ip-172-19-1-191 mysqld: #0117d4c14a0,0
Dec 23 11:52:22 ip-172-19-1-191 mysqld: })
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: view((empty))
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: gcomm: closed
Dec 23 11:52:22 ip-172-19-1-191 mysqld: 151223 11:52:22 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Dec 23 11:52:22 ip-172-19-1-191 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
Dec 23 11:52:39 ip-172-19-1-191 /etc/init.d/mysql[5227]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Dec 23 11:52:39 ip-172-19-1-191 /etc/init.d/mysql[5227]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Dec 23 11:52:39 ip-172-19-1-191 /etc/init.d/mysql[5227]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")'
Dec 23 11:52:39 ip-172-19-1-191 /etc/init.d/mysql[5227]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!





Doner node error (192.168.1.29)
=======================

Dec 23 11:52:11 ip-172-19-1-29 mysqld: #01178e5a847,0
Dec 23 11:52:11 ip-172-19-1-29 mysqld: #0117d4c14a0,0
Dec 23 11:52:11 ip-172-19-1-29 mysqld: #011999de2e6,0
Dec 23 11:52:11 ip-172-19-1-29 mysqld: } joined {
Dec 23 11:52:11 ip-172-19-1-29 mysqld: } left {
Dec 23 11:52:11 ip-172-19-1-29 mysqld: } partitioned {
Dec 23 11:52:11 ip-172-19-1-29 mysqld: })
Dec 23 11:52:11 ip-172-19-1-29 mysqld: 151223 11:52:11 [Note] WSREP: save pc into disk
Dec 23 11:52:11 ip-172-19-1-29 mysqld: 151223 11:52:11 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
Dec 23 11:52:11 ip-172-19-1-29 mysqld: 151223 11:52:11 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 9a16972b-a96b-11e5-9ec7-c38b369e2e62
Dec 23 11:52:11 ip-172-19-1-29 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: sent state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62
Dec 23 11:52:11 ip-172-19-1-29 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: got state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62 from 0 (ip-172-19-1-29)
Dec 23 11:52:11 ip-172-19-1-29 mysqld: 151223 11:52:11 [Note] WSREP: STATE EXCHANGE: got state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62 from 1 (ip-172-19-1-27)
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: STATE EXCHANGE: got state msg: 9a16972b-a96b-11e5-9ec7-c38b369e2e62 from 2 (ip-172-19-1-191)
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Quorum results:
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011version    = 3,
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011component  = PRIMARY,
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011conf_id    = 28,
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011members    = 2/3 (joined/total),
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011act_id     = 28,
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011last_appl. = 0,
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
Dec 23 11:52:12 ip-172-19-1-29 mysqld: #011group UUID = 3d580fc8-a862-11e5-a4c0-27d206784152
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Flow-control interval: [28, 28]
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: New cluster view: global state: 3d580fc8-a862-11e5-a4c0-27d206784152:28, view# 29: Primary, number of nodes: 3, my index: 0, protocol version 3
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: REPL Protocols: 7 (3, 2)
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Service thread queue flushed.
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Assign initial position for certification: 28, protocol version: 3
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Service thread queue flushed.
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Member 2.0 (ip-172-19-1-191) requested state transfer from '*any*'. Selected 0.0 (ip-172-19-1-29)(SYNCED) as donor.
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 28)
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.19.1.191:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog '/var/log/mysql/mariadb-bin' --gtid '3d580fc8-a862-11e5-a4c0-27d206784152:28''
Dec 23 11:52:12 ip-172-19-1-29 mysqld: 151223 11:52:12 [Note] WSREP: sst_donor_thread signaled with 0
Dec 23 11:52:12 ip-172-19-1-29 mysqld: WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20151223 11:52:12.598)
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Streaming with xbstream
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Using socat as streamer
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Using openssl based encryption with socat: with key and crt
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Encrypting with certificate /etc/mysql/cert.pem, key /etc/mysql/key.pem
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Using /tmp/tmp.6q769bDash as innobackupex temporary directory
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Streaming GTID file before SST
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Evaluating xbstream -c ${INFO_FILE} | socat -u stdio openssl-connect:172.19.1.191:4444,cert=/etc/mysql/cert.pem,key=/etc/mysql/key.pem,verify=0; RC=( ${PIPESTATUS[@]} )
Dec 23 11:52:12 ip-172-19-1-29 -wsrep-sst-donor: Sleeping before data transfer for SST
Dec 23 11:52:14 ip-172-19-1-29 mysqld: 151223 11:52:14 [Note] WSREP: (78e5a847, 'ssl://0.0.0.0:4567') turning message relay requesting off
Dec 23 11:52:22 ip-172-19-1-29 -wsrep-sst-donor: Streaming the backup to joiner at 172.19.1.191 4444
Dec 23 11:52:22 ip-172-19-1-29 -wsrep-sst-donor: Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2> >(logger -p daemon.err -t -innobackupex-backup) | socat -u stdio openssl-connect:172.19.1.191:4444,cert=/etc/mysql/cert.pem,key=/etc/mysql/key.pem,verify=0; RC=( ${PIPESTATUS[@]} )
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: 151223 11:52:22 innobackupex: Starting the backup operation
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: 
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: IMPORTANT: Please check that the backup run completes successfully.
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup:            At the end of a successful backup run innobackupex
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup:            prints "completed OK!".
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: 
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: 151223 11:52:22 Connecting to MySQL server host: localhost, user: galera, password: not set, port: 3306, socket: /var/run/mysqld/mysqld.sock
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: Using server version 10.0.22-MariaDB-1~trusty-wsrep-log
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: innobackupex version 2.3.3 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 525ca7d)
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup: uses posix_fadvise().
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup: cd to /var/lib/mysql
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup: open files limit requested 0, set to 1024
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup: using the following InnoDB configuration:
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup:   innodb_data_home_dir = ./
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup:   innodb_log_group_home_dir = ./
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup:   innodb_log_files_in_group = 2
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup:   innodb_log_file_size = 50331648
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup: using O_DIRECT
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: 151223 11:52:22 >> log scanned up to (1653511)
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: xtrabackup: Generating a list of tablespaces
Dec 23 11:52:22 ip-172-19-1-29 -innobackupex-backup: 151223 11:52:22 [01] Streaming ./ibdata1
Dec 23 11:52:23 ip-172-19-1-29 -innobackupex-backup: 151223 11:52:23 [01]        ...done
Dec 23 11:52:23 ip-172-19-1-29 -wsrep-sst-donor: 2015/12/23 11:52:23 socat[10127] E SSL_write(): Connection reset by peer
Dec 23 11:52:23 ip-172-19-1-29 -innobackupex-backup: #007innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
Dec 23 11:52:23 ip-172-19-1-29 -innobackupex-backup: [01] xtrabackup: Error: failed to copy datafile.
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Warning] Aborted connection 43 to db: 'unconnected' user: 'galera' host: 'localhost' (Unknown error)
Dec 23 11:52:23 ip-172-19-1-29 -wsrep-sst-donor: innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log
Dec 23 11:52:23 ip-172-19-1-29 -wsrep-sst-donor: Cleanup after exit with status:22
Dec 23 11:52:23 ip-172-19-1-29 -wsrep-sst-donor: Cleaning up temporary directories
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.19.1.191:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog '/var/log/mysql/mariadb-bin' --gtid '3d580fc8-a862-11e5-a4c0-27d206784152:28'
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.19.1.191:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog '/var/log/mysql/mariadb-bin' --gtid '3d580fc8-a862-11e5-a4c0-27d206784152:28': 22 (Invalid argument)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.19.1.191:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog '/var/log/mysql/mariadb-bin' --gtid '3d580fc8-a862-11e5-a4c0-27d206784152:28'
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Warning] WSREP: 0.0 (ip-172-19-1-29): State transfer to 2.0 (ip-172-19-1-191) failed: -22 (Invalid argument)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 28)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Member 0.0 (ip-172-19-1-29) synced with group.
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 28)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: declaring 7d4c14a0 at ssl://172.19.1.27:4567 stable
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Synchronized with group, ready for connections
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: forgetting 999de2e6 (ssl://172.19.1.191:4567)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Node 78e5a847 state prim
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: view(view_id(PRIM,78e5a847,30) memb {
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #01178e5a847,0
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #0117d4c14a0,0
Dec 23 11:52:23 ip-172-19-1-29 mysqld: } joined {
Dec 23 11:52:23 ip-172-19-1-29 mysqld: } left {
Dec 23 11:52:23 ip-172-19-1-29 mysqld: } partitioned {
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011999de2e6,0
Dec 23 11:52:23 ip-172-19-1-29 mysqld: })
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: save pc into disk
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: forgetting 999de2e6 (ssl://172.19.1.191:4567)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: STATE_EXCHANGE: sent state UUID: a0e3b038-a96b-11e5-8ef6-93a653bcbb15
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: STATE EXCHANGE: sent state msg: a0e3b038-a96b-11e5-8ef6-93a653bcbb15
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: STATE EXCHANGE: got state msg: a0e3b038-a96b-11e5-8ef6-93a653bcbb15 from 0 (ip-172-19-1-29)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: STATE EXCHANGE: got state msg: a0e3b038-a96b-11e5-8ef6-93a653bcbb15 from 1 (ip-172-19-1-27)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Quorum results:
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011version    = 3,
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011component  = PRIMARY,
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011conf_id    = 29,
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011members    = 2/2 (joined/total),
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011act_id     = 28,
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011last_appl. = 0,
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
Dec 23 11:52:23 ip-172-19-1-29 mysqld: #011group UUID = 3d580fc8-a862-11e5-a4c0-27d206784152
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Flow-control interval: [23, 23]
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: New cluster view: global state: 3d580fc8-a862-11e5-a4c0-27d206784152:28, view# 30: Primary, number of nodes: 2, my index: 0, protocol version 3
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: REPL Protocols: 7 (3, 2)
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Service thread queue flushed.
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Assign initial position for certification: 28, protocol version: 3
Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Note] WSREP: Service thread queue flushed.

Bogdan Dobrelya

unread,
Apr 26, 2016, 11:10:05 AM4/26/16
to codership
Not much answers :/

Not, I can randomly reproduce the same issue, like in the logs snipper above

Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/lib/mysql/ibdata1'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Cleaning the binlog directory /var/log/mysql as well
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: removed '/var/log/mysql/mariadb-bin.000001'
Dec 23 11:52:12 ip-172-19-1-191 -wsrep-sst-joiner: Waiting for SST streaming to complete!
Dec 23 11:52:14 ip-172-19-1-191 mysqld: 151223 11:52:14 [Note] WSREP: (999de2e6, 'ssl://0.0.0.0:4567') turning message relay requesting off
Dec 23 11:52:22 ip-172-19-1-191 -wsrep-sst-joiner: #007xbstream: Can't create/write to file './ibdata1' (Errcode: 17 - File exists)

Looks like a race condition. It may be OK after a few start attempts in a raw, or may not.
Packages used: percona-xtradb-cluster-server-5.6 5.6.21-25.8-0ubuntu2,
percona-xtrabackup 2.2.3-2.1build1


середа, 23 грудня 2015 р. 13:57:52 UTC+1 користувач Dhanushka написав:

Andrew Garner

unread,
Apr 26, 2016, 1:16:56 PM4/26/16
to Bogdan Dobrelya, codership
Looks like stale bits in the .sst temporary directory on the joiner
(used to stage the copy), and the old files (likely from some previous
failure) are not being cleaned up.

I opened this recently with PXC but I expect I am not the first to
request this change.

https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1573159

This affects mariadb as well when using wsrep_sst_xtrabackup-v2 (which
has other issues with this method anyway). MariaDB also appears to
have an old open jira on this:

https://jira.mariadb.org/browse/MDEV-9884

This has annoyed me recently and I've taken to just patching the
wsrep_sst_xtrabackup-v2 script to ensure .sst is deleted before state
transfer is started.

~Andrew
> --
> You received this message because you are subscribed to the Google Groups
> "codership" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to codership-tea...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Bogdan Dobrelya

unread,
Apr 27, 2016, 7:58:42 AM4/27/16
to codership, bdob...@mirantis.com
Annoying, it is.
Actually it's a show stopper, and I'm wondering how such a bug has survived this long.

I tried to use, but with the same sad result:
[sst]
sst-special-dirs = 0

and made a patch:
--- b/wsrep_sst_xtrabackup-v2
+++ a/wsrep_sst_xtrabackup-v2
@@ -829,6 +829,7 @@ then
 
 
         MAGIC_FILE="${DATA}/${INFO_FILE}"
+        rm -f ${DATA}/ibdata1
         recv_joiner $DATA "${stagemsg}-SST" 0 0
 
         get_proc


вівторок, 26 квітня 2016 р. 19:16:56 UTC+2 користувач Andrew Garner написав:

Andrew Garner

unread,
Apr 27, 2016, 11:34:49 AM4/27/16
to Bogdan Dobrelya, codership
So, I see that MariaDB's wsrep_sst_xtrabackup-v2 is not removing the
stale .sst directory, but Percona XtraDB Cluster fixed this some time
ago in this commit:

https://github.com/percona/percona-xtradb-cluster/commit/0bd6bf1b8a0f53acabc2198e270129d8925b0cf6

In particular MariaDB needs to apply the change here:

https://github.com/percona/percona-xtradb-cluster/blob/5.6/scripts/wsrep_sst_xtrabackup-v2.sh#L862

Also in that commit sst-special-dirs was obsoleted - I don't think it
does anything useful anymore but generate a warning if you set it to
0. This seems to be the case in MariaDB 10.X and any recent
Percona-XtraDB-Cluster.

As far as some race condition with an ibdata1 magically appearing in
the $datadir, I'm not sure what that is. I've run a lot of SST tests
lately and can't say I've run across this. I have managed to
generate failures in SST - often due to mismanaging storage - and then
MariaDB will fail to recover without someone going in and manually
remove the .sst contents.

If you have any details how you can tickle a race condition here
without using SIGSTOP + touching a file, I am very interested.

hunter86bg

unread,
Apr 28, 2016, 6:21:59 AM4/28/16
to codership, bdob...@mirantis.com
In my setup ,

the .sst folder is cleaned up without issues, same for the Datadir once a SST is initiated. You could try to manually edit the xtabacup-v2 script and modify into "#!/bin/bash -x". Then try the SST again. Probably you might need to start the script itself to find out where it detects any issue, missing dependency or wrong permissions.Also, I didn't see if you have socat of the same version on both nodes (donor and joiner).Have you set a user for the xtrabackup-v2 in the my.cnf file ?
As I see , it's asking for socat:

Dec 23 11:52:23 ip-172-19-1-29 -wsrep-sst-donor: 2015/12/23 11:52:23
>> >> socat[10127] E SSL_write(): Connection reset by peer
>> >> Dec 23 11:52:23 ip-172-19-1-29 -innobackupex-backup: #007innobackupex:
>> >> Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
>> >> Dec 23 11:52:23 ip-172-19-1-29 -innobackupex-backup: [01] xtrabackup:
>> >> Error: failed to copy datafile.
>> >> Dec 23 11:52:23 ip-172-19-1-29 mysqld: 151223 11:52:23 [Warning]
>> >> Aborted
>> >> connection 43 to db: 'unconnected' user: 'galera' host: 'localhost'
>> >> (Unknown
>> >> error)

P.S.: I am not using encryption , as my Galera's are working in the same private LAN in the same rack.

Bogdan Dobrelya

unread,
May 5, 2016, 4:42:39 AM5/5/16
to codership, bdob...@mirantis.com
Answers inline, thank you!

середа, 27 квітня 2016 р. 17:34:49 UTC+2 користувач Andrew Garner написав:
So, I see that MariaDB's wsrep_sst_xtrabackup-v2 is not removing the
stale .sst directory, but Percona XtraDB Cluster fixed this some time
ago in this commit:

https://github.com/percona/percona-xtradb-cluster/commit/0bd6bf1b8a0f53acabc2198e270129d8925b0cf6

In particular MariaDB needs to apply the change here:

https://github.com/percona/percona-xtradb-cluster/blob/5.6/scripts/wsrep_sst_xtrabackup-v2.sh#L862


Yes, I ended up using wsrep_sst_xtrabackup-v2.sh script and percona-xtrabackup package from percona distribution on top of my MariaDB setup.
Works for me, although there are still issues may happen to the binary log files with non-default location [0].

 
Also in that commit sst-special-dirs was obsoleted - I don't think it
does anything useful anymore but generate a warning if you set it to
0.  This seems to be the case in MariaDB 10.X and any recent
Percona-XtraDB-Cluster.

As far as some race condition with an ibdata1 magically appearing in
the $datadir, I'm not sure what that is.   I've run a lot of SST tests

This is a race related to external monitoring race condition. But that's another story [1]

 
lately and can't say I've run across this.   I have managed to
generate failures in SST - often due to mismanaging storage - and then
MariaDB will fail to recover without someone going in and manually
remove the .sst contents.

If you have any details how you can tickle a race condition here
without using SIGSTOP + touching a file, I am  very interested.

As I noted above, I'm not sure how to fix the race with external monitoring 
Reply all
Reply to author
Forward
0 new messages