Can't get Galera cluster working

maks...@balticom.eu

Feb 14, 2018, 6:14:44 AM

to codership

Hello!

I have recently installed a Galera cluster, but I cannot get MariaDB running on the second node.

NODE1 /etc/my.cnf.d/server.cnf:

[server]

[mysqld]

[embedded]

[mysqld-5.5]

[mariadb]

datadir=/var/lib/mysql

tmpdir=/tmp/zabbix

[mariadb-10.1]

optimizer_switch = 'index_condition_pushdown=off'

wsrep_on=ON

wsrep_provider=/usr/lib64/galera/libgalera_smm.so

wsrep_cluster_address="gcomm://10.10.16.1,10.10.16.2"

wsrep_cluster_name='zabbix-cluster'

wsrep_node_address='10.10.16.1'

wsrep_node_name='zabbix-db-node1'

wsrep_sst_method=xtrabackup

binlog_format=row

innodb_autoinc_lock_mode=2

innodb_doublewrite=1

query_cache_size = 0

query_cache_type = 0

default_storage_engine=InnoDB

wsrep_sst_auth="zabbix_db_node1:zabbix_db_node1"

innodb_flush_log_at_trx_commit = 2

wsrep_node_incoming_address='10.10.16.1'

wsrep_sst_receive_address='10.10.16.1'

innodb_file_per_table

innodb_buffer_pool_size = 10G

innodb_flush_method = O_DIRECT

innodb_log_buffer_size = 128M

innodb_lock_wait_timeout = 500

innodb_log_file_size = 100M

[mariadb-5.5]

NODE2 /etc/my.cnf.d/server.cnf:

[server]

[mysqld]

[embedded]

[mysqld-5.5]

[mariadb]

datadir=/var/lib/mysql

tmpdir=/tmp/zabbix

[mariadb-10.1]

optimizer_switch = 'index_condition_pushdown=off'

wsrep_on=ON

wsrep_provider=/usr/lib64/galera/libgalera_smm.so

wsrep_cluster_address="gcomm://10.10.16.1,10.10.16.2"

wsrep_cluster_name='zabbix-cluster'

wsrep_node_address='10.10.16.2'

wsrep_node_name='zabbix-db-node2'

wsrep_sst_method=xtrabackup

binlog_format=row

innodb_doublewrite=1

query_cache_size = 0

query_cache_type = 0

innodb_autoinc_lock_mode=2

default_storage_engine=InnoDB

wsrep_sst_auth="zabbix_db_node2:zabbix_db_node2"

innodb_flush_log_at_trx_commit = 2

wsrep_node_incoming_address='10.10.16.2'

wsrep_sst_receive_address='10.10.16.2'

innodb_file_per_table

innodb_buffer_pool_size = 10G

innodb_flush_method = O_DIRECT

innodb_log_buffer_size = 128M

innodb_lock_wait_timeout = 500

innodb_log_file_size = 100M

[mariadb-5.5]

1. I start galera_new_cluster on NODE1. Everything seems to be OK:

[root@zabbix-db-node1 ~]# systemctl status mariadb

● mariadb.service - MariaDB 10.1.31 database server

Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)

Drop-In: /etc/systemd/system/mariadb.service.d

└─migrated-from-my.cnf-settings.conf

Active: active (running) since Wed 2018-02-14 05:31:09 EST; 8min ago

Docs: man:mysqld(8)

https://mariadb.com/kb/en/library/systemd/

Process: 10744 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)

Process: 10694 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)

Process: 10691 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)

Main PID: 10711 (mysqld)

Status: "Taking your SQL requests now..."

CGroup: /system.slice/mariadb.service

└─10711 /usr/sbin/mysqld --wsrep-new-cluster

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139659971651328 [Note] WSREP: Flow-control interval: [16, 16]

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139659971651328 [Note] WSREP: Trying to continue unpaused monitor

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: New cluster view: global state: ad8e3ac3-116d-11e8-859f-eadc403c6207:1780, view# 3: P...ol version 3

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: REPL Protocols: 7 (3, 2)

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660283120384 [Note] WSREP: Assign initial position for certification: 1780, protocol version: 3

Feb 14 05:31:19 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:19 139660030375680 [Note] WSREP: Service thread queue flushed.

Feb 14 05:31:23 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:23 139659980044032 [Note] WSREP: (2b3ca384, 'tcp://0.0.0.0:4567') connection established to 3324f771 tcp://10.10.16.2:4567

Feb 14 05:31:23 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:23 139659980044032 [Warning] WSREP: discarding established (time wait) 3324f771 (tcp://10.10.16.2:4567)

Feb 14 05:31:24 zabbix-db-node1 mysqld[10711]: 2018-02-14 5:31:24 139659980044032 [Note] WSREP: cleaning up 3324f771 (tcp://10.10.16.2:4567)

Hint: Some lines were ellipsized, use -l to show in full.

2. I run systemctl start mariadb on NODE2, wait about 30 seconds, and get:

Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.

3. Then I look into the journal:

Feb 14 12:48:03 zabbix-db-node2 systemd[1]: Starting MariaDB 10.1.31 database server...

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] /usr/sbin/mysqld (mysqld 10.1.31-MariaDB) starting as process 3150 ...

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Read nil XID from storage engines, skipping position init

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_load(): Galera 25.3.22(r3764) by Codership Oy <in...@codership.com> loaded successfully.

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: CRC-32C: using hardware acceleration.

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Found saved state: ad8e3ac3-116d-11e8-859f-eadc403c6207:-1, safe_to_bootstrap: 0

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.10.16.2; base_port = 4567; cert.log

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: GCache history reset: ad8e3ac3-116d-11e8-859f-eadc403c6207:0 -> ad8e3ac3-116d-11e8-859f-eadc403c6207:

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: wsrep_sst_grab()

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Start replication

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: protonet asio version 0

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: Using CRC-32C for message checksums.

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: backend: asio

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: gcomm thread scheduling priority set to other:0

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: restore pc from disk failed

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: GMCast version 0

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: EVS version 0

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: gcomm: connecting to group 'zabbix-cluster', peer '10.10.16.1:,10.10.16.2:'

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') connection established to 887aad3d tcp://10.10.16.2:4567

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Warning] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') address 'tcp://10.10.16.2:4567' points to own listening address,

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') connection established to 2b3ca384 tcp://10.10.16.1:4567

Feb 14 12:48:03 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:03 139646100506880 [Note] WSREP: (887aad3d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: declaring 2b3ca384 at tcp://10.10.16.1:4567 stable

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Node 2b3ca384 state prim

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: view(view_id(PRIM,2b3ca384,4) memb {

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2b3ca384,0

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 887aad3d,0

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: } joined {

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: } left {

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: } partitioned {

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: })

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: save pc into disk

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: gcomm: connected

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Opened channel 'zabbix-cluster'

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100506880 [Note] WSREP: Waiting for SST to complete.

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: sent state msg: 8539e68d-1173-11e8-9fd1-7be871dd1596

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: got state msg: 8539e68d-1173-11e8-9fd1-7be871dd1596 from 0 (zabbix-db-node1)

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: STATE EXCHANGE: got state msg: 8539e68d-1173-11e8-9fd1-7be871dd1596 from 1 (zabbix-db-node2)

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Quorum results:

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: version = 4,

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: component = PRIMARY,

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: conf_id = 3,

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: members = 1/2 (joined/total),

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: act_id = 2321,

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: last_appl. = -1,

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: protocols = 0/7/3 (gcs/repl/appl),

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: group UUID = ad8e3ac3-116d-11e8-859f-eadc403c6207

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Flow-control interval: [23, 23]

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Trying to continue unpaused monitor

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645799098112 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2321)

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Note] WSREP: State transfer required:

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Group state: ad8e3ac3-116d-11e8-859f-eadc403c6207:2321

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Local state: ad8e3ac3-116d-11e8-859f-eadc403c6207:-1

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Note] WSREP: New cluster view: global state: ad8e3ac3-116d-11e8-859f-eadc403c6207:2321, view# 4: Primary, number o

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [Warning] WSREP: Gap in state sequence. Need state transfer.

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '10.10.16.2' --datadir '/var/lib/mysql/' -

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: /usr//bin/wsrep_sst_xtrabackup: line 397: setup_ports: command not found

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.16.2' --da

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: Read: '(null)'

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139645769742080 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.16.2' --datadir

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [ERROR] WSREP: Failed to prepare for 'xtrabackup' SST. Unrecoverable.

Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: 2018-02-14 12:48:04 139646100187904 [ERROR] Aborting

Feb 14 12:48:18 zabbix-db-node2 mysqld[3150]: Error in my_thread_global_end(): 1 threads didn't exit

Feb 14 12:48:18 zabbix-db-node2 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE

Feb 14 12:48:18 zabbix-db-node2 systemd[1]: Failed to start MariaDB 10.1.31 database server.

Feb 14 12:48:18 zabbix-db-node2 systemd[1]: Unit mariadb.service entered failed state.

Feb 14 12:48:18 zabbix-db-node2 systemd[1]: mariadb.service failed.

What is wrong?

Thank you!

Karl-König Königsson

Feb 15, 2018, 3:11:08 AM

to codership

One thing I noticed is that you have "wsrep_node_incoming_address='10.10.16.2'". From what I have gathered, this setting is not really necessary when you set up the cluster.

It might be a good plan to try to start the cluster with a minimal configuration; perhaps something like this:

wsrep_provider=/usr/lib64/galera/libgalera_smm.so

wsrep_cluster_address="gcomm://10.10.16.1,10.10.16.2"

wsrep_cluster_name='zabbix-cluster'

wsrep_sst_method=xtrabackup

binlog_format=row

innodb_autoinc_lock_mode=2

innodb_doublewrite=1

query_cache_size = 0

query_cache_type = 0

default_storage_engine=InnoDB

innodb_flush_log_at_trx_commit = 2

That should be a minimal cluster config but, just to be clear, I might be wrong - I have not set up a MariaDB cluster before so I am just guessing.

Claudio Nanni

Feb 15, 2018, 3:58:30 AM

to maks...@balticom.eu, codership

Hi,

> What is wrong?

> Feb 14 12:48:04 zabbix-db-node2 mysqld[3150]: /usr//bin/wsrep_sst_xtrabackup: line 397: setup_ports: command not found

> wsrep_sst_method=xtrabackup

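That method is outdated. As the journal line quoted above shows, the old wsrep_sst_xtrabackup script calls setup_ports, which the shell cannot find, so the script exits before it can report 'ready <addr>' to the server. You should use xtrabackup-v2 instead:

wsrep_sst_method=xtrabackup-v2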
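As a rough sketch only, the change could be scripted like this, assuming the setting lives in /etc/my.cnf.d/server.cnf as in the original post and that GNU sed is available. The function name set_sst_method is made up for illustration and is not part of MariaDB or Galera.

```shell
#!/bin/sh
# Hypothetical helper (not a MariaDB tool): rewrite the wsrep_sst_method
# line in a config file, keeping a backup of the original next to it.
# Usage: set_sst_method <config-file> <method>
set_sst_method() {
    cnf="$1"
    method="$2"
    # Keep a copy of the file before editing it in place.
    cp -p "$cnf" "$cnf.bak" || return 1
    # Replace the whole line, whatever value it currently has.
    sed -i "s|^[[:space:]]*wsrep_sst_method[[:space:]]*=.*|wsrep_sst_method=$method|" "$cnf"
}

# Example, to be run on each node before restarting the joiner:
# set_sst_method /etc/my.cnf.d/server.cnf xtrabackup-v2
```

Editing the file by hand works just as well; the point is only that the value must read xtrabackup-v2 on both nodes.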

Please note also: MariaDB 10.1.31 is affected by https://jira.mariadb.org/browse/MDEV-15254

So if the node still won't join, just replace the two scripts, wsrep_sst_common and wsrep_sst_xtrabackup-v2, with the ones attached to that Jira report.
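A possible way to swap the scripts while keeping the packaged versions around is sketched below. It assumes the scripts are installed in /usr/bin (the journal above shows /usr//bin/wsrep_sst_xtrabackup) and that the two attachments have already been saved to a local directory. The function name replace_sst_scripts and the directory in the usage line are placeholders, not anything shipped with MariaDB.

```shell
#!/bin/sh
# Hypothetical helper: back up the installed SST scripts and install the
# replacement copies from a download directory.
# Usage: replace_sst_scripts <dir-with-downloaded-scripts> <bin-dir>
replace_sst_scripts() {
    src="$1"
    bin="$2"
    # Refuse to touch anything unless both replacement scripts are present.
    for f in wsrep_sst_common wsrep_sst_xtrabackup-v2; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done
    for f in wsrep_sst_common wsrep_sst_xtrabackup-v2; do
        # Keep the packaged version so the change can be reverted later.
        if [ -f "$bin/$f" ]; then
            cp -p "$bin/$f" "$bin/$f.orig" || return 1
        fi
        install -m 0755 "$src/$f" "$bin/$f" || return 1
    done
}

# Example (the download directory is a placeholder):
# replace_sst_scripts /root/mdev-15254 /usr/bin
```

Since both the donor and the joiner run these scripts during a state transfer, it is probably safest to do this on every node.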

Best Regards

Claudio Nanni

Principal Support Engineer

MariaDB Corporation
