Galera - node 2 cannot join

1,511 views
Skip to first unread message

Yung Hee Lee

unread,
Sep 2, 2016, 6:09:30 AM9/2/16
to codership
Hi all,

My first node works but my second node mysql service always failed to start with error 'connection time out'. But the rsync is working.
I've tried to cnfigure from different references and found the second node always having the connection time out issue.
I have no idea what is wrong..

Here are the configuration of my nodes.

node1:
[mysqld]
datadir
=/var/lib/mysql
socket
=/var/lib/mysql/mysql.sock
binlog_format
=ROW
bind
-address=0.0.0.0
default_storage_engine
=innodb
innodb_autoinc_lock_mode
=2
innodb_flush_log_at_trx_commit
=0
innodb_buffer_pool_size
=122M
wsrep_on
=ON
wsrep_debug
=ON
wsrep_provider
=/usr/lib64/galera/libgalera_smm.so
wsrep_provider_options
="gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_name
=MariaDB_Cluster
wsrep_cluster_address
="gcomm://"
wsrep_node_name
=sdb1
wsrep_node_address
="node1_IP"
wsrep_sst_method
=rsync
wsrep_sst_auth
=username:password




node2:

[mysqld]
datadir
=/var/lib/mysql
socket
=/var/lib/mysql/mysql.sock
binlog_format
=ROW
bind
-address=0.0.0.0
default_storage_engine
=innodb
innodb_autoinc_lock_mode
=2
innodb_flush_log_at_trx_commit
=0
innodb_buffer_pool_size
=122M
wsrep_on
=ON
wsrep_debug
=ON
wsrep_provider
=/usr/lib64/galera/libgalera_smm.so
wsrep_provider_options
="gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_name
=MariaDB_Cluster
wsrep_cluster_address
="gcomm://node1_IP,node2_IP"
wsrep_node_name
=sweb1
wsrep_node_address
="node2_IP"
wsrep_sst_method
=rsync
#wsrep_sst_auth=username:password

I start my first node with following command:
service mysql start --wsrep-new-cluster


And start my second node with the command:
service mysql start

It runs for few minutes then display error below:
Starting mysql (via systemctl):  Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.

                                                           
[FAILED]




Here is the output in /var/log/messages:
(sweb1 is my node2)
Sep  2 17:26:00 sweb1 systemd: Starting MariaDB database server...
Sep  2 17:26:01 sweb1 sh: 2016-09-02 17:26:01 139944069388416 [Note] /usr/sbin/mysqld (mysqld 10.1.16-MariaDB) starting as process 26732 ...
Sep  2 17:26:03 sweb1 sh: WSREP: Recovered position 2f435f73-48c3-11e6-aea7-9a14860cf8a5:6027910
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] /usr/sbin/mysqld (mysqld 10.1.16-MariaDB) starting as process 26766 ...
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Setting wsrep_ready to 0
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Read nil XID from storage engines, skipping position init
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: wsrep_load(): Galera 25.3.15(r3578) by Codership Oy <info@codership.com> loaded successfully.
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: CRC-32C: using hardware acceleration.
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = NODE2_IP; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 300M; gcache.size = 300M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = fals
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685218936576 [Note] WSREP: Service thread queue flushed.
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: wsrep_sst_grab()
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Start replication
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: protonet asio version 0
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: Using CRC-32C for message checksums.
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: backend: asio
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: restore pc from disk successfully
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: GMCast version 0
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: (d2fae28f, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: (d2fae28f, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: EVS version 0
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: gcomm: connecting to group 'MariaDB_Cluster', peer 'NODE1_IP:,NODE2_IP:'
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Warning] WSREP: (d2fae28f, 'tcp://0.0.0.0:4567') address 'tcp://NODE2_IP:4567' points to own listening address, blacklisting
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: (d2fae28f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: declaring bc00a1e7 at tcp://NODE1_IP:4567 stable
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: re-bootstrapping prim from partitioned components
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: view(view_id(PRIM,bc00a1e7,4) memb {
Sep  2 17:26:04 sweb1 mysqld: bc00a1e7,0
Sep  2 17:26:04 sweb1 mysqld: d2fae28f,0
Sep  2 17:26:04 sweb1 mysqld: } joined {
Sep  2 17:26:04 sweb1 mysqld: } left {
Sep  2 17:26:04 sweb1 mysqld: } partitioned {
Sep  2 17:26:04 sweb1 mysqld: })
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: save pc into disk
Sep  2 17:26:04 sweb1 mysqld: 2016-09-02 17:26:04 140685646694528 [Note] WSREP: clear restored view
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646694528 [Note] WSREP: gcomm: connected
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646694528 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646694528 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646694528 [Note] WSREP: Opened channel 'MariaDB_Cluster'
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646694528 [Note] WSREP: Waiting for SST to complete.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: STATE EXCHANGE: sent state msg: 4409ace1-70ef-11e6-8c2a-7719abf77a67
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: STATE EXCHANGE: got state msg: 4409ace1-70ef-11e6-8c2a-7719abf77a67 from 0 (sweb1)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: STATE EXCHANGE: got state msg: 4409ace1-70ef-11e6-8c2a-7719abf77a67 from 1 (sweb1)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Warning] WSREP: Quorum: No node with complete state:
Sep  2 17:26:05 sweb1 mysqld: Version      : 3
Sep  2 17:26:05 sweb1 mysqld: Flags        : 0x3
Sep  2 17:26:05 sweb1 mysqld: Protocols    : 0 / 7 / 3
Sep  2 17:26:05 sweb1 mysqld: State        : NON-PRIMARY
Sep  2 17:26:05 sweb1 mysqld: Prim state   : DONOR
Sep  2 17:26:05 sweb1 mysqld: Prim UUID    : d1f31dd9-70eb-11e6-ac5f-8fc2cb1d66e7
Sep  2 17:26:05 sweb1 mysqld: Prim  seqno  : 2
Sep  2 17:26:05 sweb1 mysqld: First seqno  : -1
Sep  2 17:26:05 sweb1 mysqld: Last  seqno  : 6027910
Sep  2 17:26:05 sweb1 mysqld: Prim JOINED  : 1
Sep  2 17:26:05 sweb1 mysqld: State UUID   : 4409ace1-70ef-11e6-8c2a-7719abf77a67
Sep  2 17:26:05 sweb1 mysqld: Group UUID   : 2f435f73-48c3-11e6-aea7-9a14860cf8a5
Sep  2 17:26:05 sweb1 mysqld: Name         : 'sweb1'
Sep  2 17:26:05 sweb1 mysqld: Incoming addr: '2NODE1_IP:3306'
Sep  2 17:26:05 sweb1 mysqld: Version      : 3
Sep  2 17:26:05 sweb1 mysqld: Flags        : 00
Sep  2 17:26:05 sweb1 mysqld: Protocols    : 0 / 7 / 3
Sep  2 17:26:05 sweb1 mysqld: State        : NON-PRIMARY
Sep  2 17:26:05 sweb1 mysqld: Prim state   : NON-PRIMARY
Sep  2 17:26:05 sweb1 mysqld: Prim UUID    : 00000000-0000-0000-0000-000000000000
Sep  2 17:26:05 sweb1 mysqld: Prim  seqno  : -1
Sep  2 17:26:05 sweb1 mysqld: First seqno  : -1
Sep  2 17:26:05 sweb1 mysqld: Last  seqno  : -1
Sep  2 17:26:05 sweb1 mysqld: Prim JOINED  : 0
Sep  2 17:26:05 sweb1 mysqld: State UUID   : 4409ace1-70ef-11e6-8c2a-7719abf77a67
Sep  2 17:26:05 sweb1 mysqld: Group UUID   : 00000000-0000-0000-0000-000000000000
Sep  2 17:26:05 sweb1 mysqld: Name         : 'sweb1'
Sep  2 17:26:05 sweb1 mysqld: Incoming addr: 'NODE2_IP:3306'
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Full re-merge of primary d1f31dd9-70eb-11e6-ac5f-8fc2cb1d66e7 found: 1 of 1.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Quorum results:
Sep  2 17:26:05 sweb1 mysqld: version    = 3,
Sep  2 17:26:05 sweb1 mysqld: component  = PRIMARY,
Sep  2 17:26:05 sweb1 mysqld: conf_id    = 2,
Sep  2 17:26:05 sweb1 mysqld: members    = 1/2 (joined/total),
Sep  2 17:26:05 sweb1 mysqld: act_id     = 6027910,
Sep  2 17:26:05 sweb1 mysqld: last_appl. = -1,
Sep  2 17:26:05 sweb1 mysqld: protocols  = 0/7/3 (gcs/repl/appl),
Sep  2 17:26:05 sweb1 mysqld: group UUID = 2f435f73-48c3-11e6-aea7-9a14860cf8a5
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Flow-control interval: [23, 23]
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 6027910)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: State transfer required:
Sep  2 17:26:05 sweb1 mysqld: Group state: 2f435f73-48c3-11e6-aea7-9a14860cf8a5:6027910
Sep  2 17:26:05 sweb1 mysqld: Local state: 00000000-0000-0000-0000-000000000000:-1
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: New cluster view: global state: 2f435f73-48c3-11e6-aea7-9a14860cf8a5:6027910, view# 3: Primary, number of nodes: 2, my index: 1, protocol version 3
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Warning] WSREP: Gap in state sequence. Need state transfer.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: Setting wsrep_ready to 0
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685126661888 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'NODE2_IP' --datadir '/var/lib/mysql/'   --parent '26766'  '' '
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Warning] WSREP: Could not find peer:
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Warning] WSREP: 0.0 (sweb1): State transfer to -1.-1 (left the group) failed: -255 (Unknown error 255)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Member 0.0 (sweb1) synced with group.
Sep  2 17:26:05 sweb1 rsyncd[26804]: rsyncd version 3.0.9 starting, listening on port 4444
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: Prepared SST request: rsync|NODE2_IP:4444/rsync_sst
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: REPL Protocols: 7 (3, 2)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685218936576 [Note] WSREP: Service thread queue flushed.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: Assign initial position for certification: 6027910, protocol version: 3
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685218936576 [Note] WSREP: Service thread queue flushed.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (2f435f73-48c3-11e6-aea7-9a14860cf8a5): 1 (Operation not permitted)
Sep  2 17:26:05 sweb1 mysqld: at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Member 1.0 (sweb1) requested state transfer from '*any*'. Selected 0.0 (sweb1)(SYNCED) as donor.
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685156017920 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6027910)
Sep  2 17:26:05 sweb1 mysqld: 2016-09-02 17:26:05 140685646379776 [Note] WSREP: Requesting state transfer: success, donor: 0
Sep  2 17:26:05 sweb1 rsyncd[26819]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:05 sweb1 rsyncd[26819]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:05 sweb1 rsyncd[26819]: rsync to rsync_sst/ from UNKNOWN (NODE1_IP)
Sep  2 17:26:05 sweb1 rsyncd[26819]: receiving file list
Sep  2 17:26:07 sweb1 mysqld: 2016-09-02 17:26:07 140685164410624 [Note] WSREP: (d2fae28f, 'tcp://0.0.0.0:4567') turning message relay requesting off
Sep  2 17:26:35 sweb1 rsyncd[26937]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:35 sweb1 rsyncd[26937]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:35 sweb1 rsyncd[26819]: sent 73 bytes  received 348180544 bytes  total size 348137472
Sep  2 17:26:35 sweb1 rsyncd[26937]: rsync to rsync_sst-log_dir/ from UNKNOWN (NODE1_IP)
Sep  2 17:26:35 sweb1 rsyncd[26937]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27034]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27034]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[26937]: sent 111 bytes  received 201351509 bytes  total size 201326592
Sep  2 17:26:52 sweb1 rsyncd[27039]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27039]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27036]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27036]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27042]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27042]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27045]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27045]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27038]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27038]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27043]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27043]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27035]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27040]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27035]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27040]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27044]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27041]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27044]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27041]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27034]: rsync to rsync_sst/./mysql from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27039]: rsync to rsync_sst/./db5 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27037]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27037]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27038]: rsync to rsync_sst/./db1 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27043]: rsync to rsync_sst/./db2 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27042]: rsync to rsync_sst/./db3 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27045]: rsync to rsync_sst/./db4 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27036]: rsync to rsync_sst/./performance_schema from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27035]: rsync to rsync_sst/./test from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27040]: rsync to rsync_sst/./db6 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27044]: rsync to rsync_sst/./db7 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27041]: rsync to rsync_sst/./db8 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27039]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27037]: rsync to rsync_sst/./db9 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27045]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27035]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27043]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27042]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27036]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27040]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27038]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27044]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27041]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27034]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27037]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27036]: sent 54 bytes  received 203 bytes  total size 61
Sep  2 17:26:52 sweb1 rsyncd[27043]: sent 54 bytes  received 207 bytes  total size 61
Sep  2 17:26:52 sweb1 rsyncd[27045]: sent 54 bytes  received 203 bytes  total size 61
Sep  2 17:26:52 sweb1 rsyncd[27044]: sent 54 bytes  received 207 bytes  total size 61
Sep  2 17:26:52 sweb1 rsyncd[27059]: name lookup failed for NODE1_IP: Name or service not known
Sep  2 17:26:52 sweb1 rsyncd[27059]: connect from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27059]: rsync to rsync_sst/./db0 from UNKNOWN (NODE1_IP)
Sep  2 17:26:52 sweb1 rsyncd[27037]: sent 92 bytes  received 99703 bytes  total size 99433
Sep  2 17:26:52 sweb1 rsyncd[27059]: receiving file list
Sep  2 17:26:52 sweb1 rsyncd[27059]: sent 54 bytes  received 203 bytes  total size 61
Sep  2 17:26:59 sweb1 rsyncd[27034]: sent 1688 bytes  received 18221102 bytes  total size 18214054
Sep  2 17:27:34 sweb1 systemd: mariadb.service start operation timed out. Terminating.
Sep  2 17:29:04 sweb1 systemd: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
Sep  2 17:29:04 sweb1 systemd: Failed to start MariaDB database server.
Sep  2 17:29:04 sweb1 systemd: Unit mariadb.service entered failed state.
Sep  2 17:29:04 sweb1 systemd: mariadb.service failed.
Sep  2 17:29:50 sweb1 rsyncd[27035]: sent 149 bytes  received 713656656 bytes  total size 713569127
Sep  2 17:30:01 sweb1 systemd: Started Session 9338 of user gensupp.
Sep  2 17:30:01 sweb1 systemd: Starting Session 9338 of user gensupp.
Sep  2 17:30:01 sweb1 systemd: Created slice user-0.slice.
Sep  2 17:30:01 sweb1 systemd: Starting user-0.slice.
Sep  2 17:30:01 sweb1 systemd: Started Session 9337 of user root.
Sep  2 17:30:01 sweb1 systemd: Starting Session 9337 of user root.
Sep  2 17:30:01 sweb1 systemd: Removed slice user-0.slice.
Sep  2 17:30:01 sweb1 systemd: Stopping user-0.slice.




There are the processes run by user mysql. It seems node1 able to sync to node2.
[node2]# ps aux | grep mysql

mysql    
26766  0.0  0.3 528732 52984 ?        Ssl  17:26   0:00 /usr/sbin/mysqld --wsrep_start_position=2f435f73-48c3-11e6-aea7-9a14860cf8a5:6027910

mysql    
26774  0.1  0.0 113256  1640 ?        S    17:26   0:00 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role joiner --address NODE2_IP --datadir /var/lib/mysql/ --parent 26766

mysql    
26804  0.0  0.0 114644  1196 ?        S    17:26   0:00 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27038  0.0  0.0 146472  1876 ?        S    17:26   0:00 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27039  0.0  0.0 146472  1872 ?        S    17:26   0:00 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27040  0.0  0.0 146472  1856 ?        S    17:26   0:00 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27041  0.0  0.0 146472  1884 ?        S    17:26   0:00 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27042  0.0  0.0 146472  1856 ?        S    17:26   0:00 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27047  5.9  0.0 146732   880 ?        S    17:26   0:40 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27048  0.9  0.0 146732   868 ?        S    17:26   0:06 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27052  4.0  0.0 146732   868 ?        S    17:26   0:27 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27055  1.3  0.0 146732   884 ?        S    17:26   0:08 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
27056  5.2  0.0 146732   888 ?        S    17:26   0:35 rsync --daemon --no-detach --port 4444 --config /var/lib/mysql//rsync_sst.conf

mysql    
29922  0.0  0.0 107896   356 ?        S    17:38   0:00 sleep 1

root    
29924  0.0  0.0 112652   972 pts/0    S+   17:38   0:00 grep --color=auto mysql




However, mysqld is not running properly.
[node2]# service mysql status
 ERROR
! MySQL is running but PID file could not be found


I have no idea why my mariadb.service start operation always timed out in my second node.

Any hints?



p/s: 

1. ports 3306,4444,4567,4568 have been enabled in both servers. I even tried without firewall enabled also same result.

2. the sst user also has been created correctly (I can connect via command [mysql -h -u sst_user -p]) 

hunter86bg

unread,
Sep 3, 2016, 2:02:36 PM9/3/16
to codership
Please provide the OS and the packages installed.In RHEL you need also "socat", "nmap-ncat","rsync" to be installed.
Also consider setting the same information on all nodes for "wsrep_cluster_address=".

As you are using rsync
, you need to create this user in the OS ,and ensure that ssh communication is possible.
Please setup a passwordless ssh authentication (exchange keys for the sst user between both nodes) for user selected in "wsrep_sst_auth" and then try agian.Also check the DNS (or the "/etc/hosts" file) as you have issues with the

Yung Hee Lee

unread,
Sep 4, 2016, 9:52:34 PM9/4/16
to codership
Dear Sir,

I am using Cent OS 7.2
I've tried rsync manually via command, worked on both servers.
Weird thing happens. Now I setup node 2 as primary, node 1 as second, it works! But really no idea why node 2 must be primary..

Thanks!

hunter86bg

unread,
Sep 5, 2016, 2:37:54 AM9/5/16
to codership
If you have the possibility, please try to setup a 3rd node. Let's see how it is gonna behave.
Please consider an odd number of cluster members for your cluster (or using 2 nodes + Galera Arbitrator),as in case of failure of 1 of your 2 cluster nodes - the whole cluster will shut down.The reason for this behaviour is the "Split Brain" protection.

By the way , check that the socat and nmap-ncat are installed.When I configured my 3 node cluster it didn't require them , but I had issues.Also a SST Method "wsrep-v2" is recommended,as it is a non-blocking method and will allow the Donor to still process traffic.

Yin Xuesong

unread,
Sep 7, 2016, 5:15:58 AM9/7/16
to codership
maybe you should put "wsrep_cluster_name" "wsrep_node_name" "wsrep_node_address" 's location before "wsrep_cluster_address",like this:
#
# * Galera-related settings
#
[galera]
wsrep_cluster_name
=MyCluster
wsrep_node_name
= mariadb50
wsrep_node_address
=172.27.12.64


wsrep_on
=ON
wsrep_provider
=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address
="gcomm://"
#wsrep_cluster_address="gcomm://172.27.12.64,172.27.12.67,172.27.12.63"
binlog_format
=row
default_storage_engine
=InnoDB
innodb_autoinc_lock_mode
=2


在 2016年9月2日星期五 UTC+8下午6:09:30,Yung Hee Lee写道:

Will Fong

unread,
Sep 7, 2016, 8:28:29 PM9/7/16
to Yung Hee Lee, codership
Hi,

On Mon, Sep 5, 2016 at 9:52 AM, Yung Hee Lee <yungh...@gmail.com> wrote:
> Weird thing happens. Now I setup node 2 as primary, node 1 as second, it
> works! But really no idea why node 2 must be primary..

By "primary", do you mean you bootstrap node2, and then have node1 join it?

Sounds like a firewall issue. But we would need to see the full error
logs from both nodes.


-will


--
Will Fong, Senior Support Engineer
MariaDB Corporation
Reply all
Reply to author
Forward
0 new messages