Can't get Galera cluster working


maks...@balticom.eu

Feb 14, 2018, 6:14:44 AM
to codership
Hello!

I recently installed a Galera cluster, but I cannot get MariaDB running on the second node.

NODE1 /etc/my.cnf.d/server.cnf:


[server]

[mysqld]

[embedded]

[mysqld-5.5]

[mariadb]
datadir=/var/lib/mysql
tmpdir=/tmp/zabbix
[mariadb-10.1]
optimizer_switch = 'index_condition_pushdown=off'



wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.16.1,10.10.16.2"
wsrep_cluster_name='zabbix-cluster'
wsrep_node_address='10.10.16.1'
wsrep_node_name='zabbix-db-node1'
wsrep_sst_method=xtrabackup
binlog_format=row
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
query_cache_size = 0
query_cache_type = 0
default_storage_engine=InnoDB
wsrep_sst_auth="zabbix_db_node1:zabbix_db_node1"
innodb_flush_log_at_trx_commit = 2
wsrep_node_incoming_address='10.10.16.1'
wsrep_sst_receive_address='10.10.16.1'


innodb_file_per_table
innodb_buffer_pool_size = 10G 
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 128M
innodb_lock_wait_timeout = 500
innodb_log_file_size = 100M


[mariadb-5.5]

NODE2 /etc/my.cnf.d/server.cnf:


[server]

[mysqld]

[embedded]

[mysqld-5.5]

[mariadb]
datadir=/var/lib/mysql
tmpdir=/tmp/zabbix
[mariadb-10.1]
optimizer_switch = 'index_condition_pushdown=off'



wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.16.1,10.10.16.2"
wsrep_cluster_name='zabbix-cluster'
wsrep_node_address='10.10.16.2'
wsrep_node_name='zabbix-db-node2'
wsrep_sst_method=xtrabackup
binlog_format=row
innodb_doublewrite=1
query_cache_size = 0
query_cache_type = 0
innodb_autoinc_lock_mode=2
default_storage_engine=InnoDB
wsrep_sst_auth="zabbix_db_node2:zabbix_db_node2"
innodb_flush_log_at_trx_commit = 2
wsrep_node_incoming_address='10.10.16.2'
wsrep_sst_receive_address='10.10.16.2'


innodb_file_per_table
innodb_buffer_pool_size = 10G 
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 128M
innodb_lock_wait_timeout = 500
innodb_log_file_size = 100M


[mariadb-5.5]


1. I run galera_new_cluster on NODE1. Everything seems to be OK:

[root@zabbix-db-node1 ~]# systemctl status mariadb
● mariadb.service - MariaDB 10.1.31 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: active (running) since Wed 2018-02-14 05:31:09 EST; 8min ago
     Docs: man:mysqld(8)
  Process: 10744 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 10694 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 10691 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 10711 (mysqld)
   Status: "Taking your SQL requests now..."
   CGroup: /system.slice/mariadb.service
           └─10711 /usr/sbin/mysqld --wsrep-new-cluster

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139659971651328 [Note] WSREP: Flow-control interval: [16, 16]
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139659971651328 [Note] WSREP: Trying to continue unpaused monitor
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139660283120384 [Note] WSREP: New cluster view: global state: ad8e3ac3-116d-11e8-859f-eadc403c6207:1780, view# 3: P...ol version 3
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139660283120384 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139660283120384 [Note] WSREP: REPL Protocols: 7 (3, 2)
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139660283120384 [Note] WSREP: Assign initial position for certification: 1780, protocol version: 3
Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:19 139660030375680 [Note] WSREP: Service thread queue flushed.
Feb 14 05:31:23 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:23 139659980044032 [Note] WSREP: (2b3ca384, 'tcp://0.0.0.0:4567') connection established to 3324f771 tcp://10.10.16.2:4567
Feb 14 05:31:23 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:23 139659980044032 [Warning] WSREP: discarding established (time wait) 3324f771 (tcp://10.10.16.2:4567)
Feb 14 05:31:24 zabbix-db-node1 mysqld[10711]: 2018-02-14  5:31:24 139659980044032 [Note] WSREP:  cleaning up 3324f771 (tcp://10.10.16.2:4567)
Hint: Some lines were ellipsized, use -l to show in full.

2. I run systemctl start mariadb on NODE2, wait about 30 seconds, and get:

Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.

3. Then I look into the journal:

Feb 14 12:48:03 zabbix-db-node2 systemd[1]: Starting MariaDB 10.1.31 database server...
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] /usr/sbin/mysqld (mysqld 10.1.31-MariaDB) starting as process 3150 ...
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Read nil XID from storage engines, skipping position init
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_load(): Galera 25.3.22(r3764) by Codership Oy <in...@codership.com> loaded successfully.
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: CRC-32C: using hardware acceleration.
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Found saved state: ad8e3ac3-116d-11e8-859f-eadc403c6207:-1, safe_to_bootstrap: 0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.10.16.2; base_port = 4567; cert.log
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: GCache history reset: ad8e3ac3-116d-11e8-859f-eadc403c6207:0 -> ad8e3ac3-116d-11e8-859f-eadc403c6207:
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_sst_grab()
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Start replication
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: protonet asio version 0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Using CRC-32C for message checksums.
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: backend: asio
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: gcomm thread scheduling priority set to other:0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: restore pc from disk failed
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: GMCast version 0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: EVS version 0
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: gcomm: connecting to group 'zabbix-cluster', peer '10.10.16.1:,10.10.16.2:'
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') connection established to 887aad3d tcp://10.10.16.2:4567
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Warning] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') address 'tcp://10.10.16.2:4567' points to own listening address,
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') connection established to 2b3ca384 tcp://10.10.16.1:4567
Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: declaring 2b3ca384 at tcp://10.10.16.1:4567 stable
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Node 2b3ca384 state prim
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: view(view_id(PRIM,2b3ca384,4) memb {
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2b3ca384,0
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 887aad3d,0
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: } joined {
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: } left {
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: } partitioned {
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: })
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: save pc into disk
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: gcomm: connected
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Opened channel 'zabbix-cluster'
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Waiting for SST to complete.
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: sent state msg: 8539e68d-1173-11e8-9fd1-7be871dd1596
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: got state msg: 8539e68d-1173-11e8-9fd1-7be871dd1596 from 0 (zabbix-db-node1)
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: got state msg: 8539e68d-1173-11e8-9fd1-7be871dd1596 from 1 (zabbix-db-node2)
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Quorum results:
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: version    = 4,
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: component  = PRIMARY,
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: conf_id    = 3,
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: members    = 1/2 (joined/total),
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: act_id     = 2321,
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: last_appl. = -1,
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: protocols  = 0/7/3 (gcs/repl/appl),
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: group UUID = ad8e3ac3-116d-11e8-859f-eadc403c6207
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Flow-control interval: [23, 23]
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Trying to continue unpaused monitor
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2321)
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Note] WSREP: State transfer required:
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Group state: ad8e3ac3-116d-11e8-859f-eadc403c6207:2321
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Local state: ad8e3ac3-116d-11e8-859f-eadc403c6207:-1
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Note] WSREP: New cluster view: global state: ad8e3ac3-116d-11e8-859f-eadc403c6207:2321, view# 4: Primary, number o
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Warning] WSREP: Gap in state sequence. Need state transfer.
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '10.10.16.2' --datadir '/var/lib/mysql/'   -
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: /usr//bin/wsrep_sst_xtrabackup: line 397: setup_ports: command not found
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.16.2' --da
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Read: '(null)'
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.16.2' --datadir
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [ERROR] WSREP: Failed to prepare for 'xtrabackup' SST. Unrecoverable.
Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [ERROR] Aborting
Feb 14 12:48:18 zabbix-db-node2 mysqld[3150]: Error in my_thread_global_end(): 1 threads didn't exit
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: Failed to start MariaDB 10.1.31 database server.
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: Unit mariadb.service entered failed state.
Feb 14 12:48:18 zabbix-db-node2 systemd[1]: mariadb.service failed.

What is wrong?
Thank you!

Karl-König Königsson

Feb 15, 2018, 3:11:08 AM
to codership
One thing I noticed is that you have "wsrep_node_incoming_address='10.10.16.2'". From what I have gathered, this setting is not really necessary when you set up the cluster.

It might be a good plan to try to start the cluster with a minimal configuration; perhaps something like this:

wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.16.1,10.10.16.2"
wsrep_cluster_name='zabbix-cluster'
wsrep_sst_method=xtrabackup
binlog_format=row
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
query_cache_size = 0
query_cache_type = 0
default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit = 2

That should be a minimal cluster config but, just to be clear, I might be wrong: I have not set up a MariaDB cluster before, so I am just guessing.

Claudio Nanni

Feb 15, 2018, 3:58:30 AM
to maks...@balticom.eu, codership
Hi,

> What is wrong?

> Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: /usr//bin/wsrep_sst_xtrabackup: line 397: setup_ports: command not found


> wsrep_sst_method=xtrabackup
 

That method is outdated; you should use xtrabackup-v2:


wsrep_sst_method=xtrabackup-v2
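
To confirm which SST method each node is actually configured with after the change, something like the following may help. This is only a minimal sketch: sst_method is a hypothetical helper name (not part of MariaDB or Galera), and it assumes the setting is written with underscores, as in the config files posted in this thread.

```shell
# Hypothetical helper: print the wsrep_sst_method value found in a config file.
# Assumes the underscore spelling (wsrep_sst_method), as used in this thread.
sst_method() {
    # Keep only the value part of matching lines, take the last assignment,
    # then strip quotes and spaces around the value.
    sed -n 's/^[[:space:]]*wsrep_sst_method[[:space:]]*=[[:space:]]*//p' "$1" \
        | tail -n 1 | tr -d "\"' "
}

# Example call (run on each node; path taken from the original post):
# sst_method /etc/my.cnf.d/server.cnf
```

Both nodes should report xtrabackup-v2 before you try the join again.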


Please note also: MariaDB 10.1.31 is affected by https://jira.mariadb.org/browse/MDEV-15254

So if the node still won't join, just replace the two scripts, wsrep_sst_common and wsrep_sst_xtrabackup-v2, with the ones attached to that Jira report.
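
Before overwriting the packaged scripts, it may be worth keeping copies of the originals. A minimal sketch, assuming the scripts live in /usr/bin (the directory shown in the error message above); backup_sst_scripts is a hypothetical helper name, not something shipped with MariaDB:

```shell
# Hypothetical helper: keep a copy of the packaged SST scripts before replacing them.
backup_sst_scripts() {
    dir="${1:-/usr/bin}"   # assumed install directory, taken from the log path
    for name in wsrep_sst_common wsrep_sst_xtrabackup-v2; do
        # Only copy scripts that actually exist; -p preserves mode and timestamps.
        if [ -f "$dir/$name" ]; then
            cp -p "$dir/$name" "$dir/$name.orig"
        fi
    done
}

# Example call (as root, on each node):
# backup_sst_scripts /usr/bin
```

The .orig copies make it easy to roll back if the replacement scripts do not help.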


Best Regards
-- 
Claudio Nanni
Principal Support Engineer
MariaDB Corporation