Galera cluster not starting

Judson Borges

7 Feb 2021, 10:54:38
to codership
Hello,

I have a three-node Galera cluster: Debian 10 (master), CentOS 7 (slave1), and CentOS 8 (slave2).

On master, when I run "show status like 'wsrep_cluster_size';" the result is:

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 0     |
+--------------------+-------+
1 row in set (0.002 sec)

On slave2, MariaDB does not start. The output of "systemctl status mariadb" is:

● mariadb.service - MariaDB 10.5.8 database server
   Loaded: loaded (/usr/lib/systemd/system/./mariadb.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: exit-code) since Sun 2021-02-07 10:40:10 -03; 14min ago
     Docs: man:mariadbd(8)
  Process: 2205 ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 2127 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSI>
  Process: 2125 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 2205 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"

fev 07 10:40:10 slave2 mariadbd[2205]: 2021-02-07 10:40:10 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
fev 07 10:40:10 slave2 mariadbd[2205]:          at gcomm/src/pc.cpp:connect():160
fev 07 10:40:10 slave2 mariadbd[2205]: 2021-02-07 10:40:10 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
fev 07 10:40:10 slave2 mariadbd[2205]: 2021-02-07 10:40:10 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1632: Failed to open channel 'galera_cluster' at 'gcomm://10.10.0.1,10.20.20.2,10.30.30.2'>
fev 07 10:40:10 slave2 mariadbd[2205]: 2021-02-07 10:40:10 0 [ERROR] WSREP: gcs connect failed: Connection timed out
fev 07 10:40:10 slave2 mariadbd[2205]: 2021-02-07 10:40:10 0 [ERROR] WSREP: wsrep::connect(gcomm://10.10.0.1,10.20.20.2,10.30.30.2) failed: 7
fev 07 10:40:10 slave2 mariadbd[2205]: 2021-02-07 10:40:10 0 [ERROR] Aborting
fev 07 10:40:10 slave2 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
fev 07 10:40:10 slave2 systemd[1]: mariadb.service: Failed with result 'exit-code'.
fev 07 10:40:10 slave2 systemd[1]: Failed to start MariaDB 10.5.8 database server.

On slave1, MariaDB does not start either. The output of "systemctl status mariadb" is:

mariadb.service - MariaDB 10.5.8 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: exit-code) since Dom 2021-02-07 10:40:14 -03; 14min ago
     Docs: man:mariadbd(8)
  Process: 1660 ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 1584 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 1582 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 1660 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"

Fev 07 10:40:14 slave1 mariadbd[1660]: at gcomm/src/pc.cpp:connect():160
Fev 07 10:40:14 slave1 mariadbd[1660]: 2021-02-07 10:40:14 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
Fev 07 10:40:14 slave1 mariadbd[1660]: 2021-02-07 10:40:14 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1632: Failed to open channel 'galera_cluster' at 'gcomm://10.10.0.1...n timed out)
Fev 07 10:40:14 slave1 mariadbd[1660]: 2021-02-07 10:40:14 0 [ERROR] WSREP: gcs connect failed: Connection timed out
Fev 07 10:40:14 slave1 mariadbd[1660]: 2021-02-07 10:40:14 0 [ERROR] WSREP: wsrep::connect(gcomm://10.10.0.1,10.20.20.2,10.30.30.2) failed: 7
Fev 07 10:40:14 slave1 mariadbd[1660]: 2021-02-07 10:40:14 0 [ERROR] Aborting
Fev 07 10:40:14 slave1 systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Fev 07 10:40:14 slave1 systemd[1]: Failed to start MariaDB 10.5.8 database server.
Fev 07 10:40:14 slave systemd[1]: Unit mariadb.service entered failed state.
Fev 07 10:40:14 slave1 systemd[1]: mariadb.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

What could be wrong?

Thanks in advance!


Khalid Mustafa

19 Feb 2021, 05:18:36
to codership
On slave1 and slave2, verify the provider path in the MariaDB configuration: "wsrep_provider=/usr/lib64/galera/libgalera_smm.so"
> Make sure the path points to a file that actually exists.
> Also check your /etc/my.cnf; it should point to your Galera directory.
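
The provider check suggested above could be scripted roughly as follows. This is only a sketch: the helper names `find_wsrep_provider` and `provider_exists` are made up for illustration, and it assumes the setting lives directly in the file you pass in (it does not follow `!include` or `!includedir` directives).

```python
import os
import re

def find_wsrep_provider(cnf_text):
    """Return the last uncommented wsrep_provider value in cnf_text, or None."""
    value = None
    for line in cnf_text.splitlines():
        # Lines starting with '#' or ';' do not match, so comments are skipped.
        m = re.match(r"\s*wsrep[_-]provider\s*=\s*(\S+)", line)
        if m:
            value = m.group(1).strip("'\"")
    return value

def provider_exists(path):
    """True if path is a regular file readable by the current user."""
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

To use it, read the configuration file (for example /etc/my.cnf), pass its text to `find_wsrep_provider`, and then pass the returned path to `provider_exists`.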
 

micah milano

24 Feb 2021, 02:43:13
to codership
It appears that you are getting a connection timeout to the other nodes specified in your gcomm:// line. Are you able to ping those IPs? Is there a firewall between them that might be blocking the required ports?
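
One way to test this beyond ping is to attempt TCP connections to the ports Galera uses by default. The sketch below is an illustration, not an official tool: `port_open` and `check_nodes` are hypothetical helper names, and the port list assumes the default configuration (3306 for clients, 4567 for group communication, 4568 for IST, 4444 for SST).

```python
import socket

# Default ports, assuming they were not changed in the configuration:
# 3306 MariaDB client, 4444 SST, 4567 group communication, 4568 IST.
GALERA_PORTS = (3306, 4444, 4567, 4568)

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def check_nodes(nodes, ports=GALERA_PORTS, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): port_open(h, p, timeout) for h in nodes for p in ports}
```

Calling `check_nodes(["10.10.0.1", "10.20.20.2", "10.30.30.2"])` from each node would cover the addresses in the gcomm:// line from the logs. Port 4567 is the one that matters for the error shown, and it only accepts connections on a node where MariaDB is already running; 4444 and 4568 normally listen only while a state transfer is in progress, so a failure there does not by itself indicate a firewall problem.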

Judson Borges

5 Mar 2021, 09:59:01
to codership
I reinstalled the OS on master and that solved the problem.