Galera Cluster will not start (MySQL)

Steven Hancz

Apr 3, 2017, 11:36:49 AM
to codership
Hi,
I am doing a new installation of Galera with MySQL 5.7 on RHEL 6.8.

Packages installed OK.

Here is the my.cnf file:

# cat /etc/my.cnf
# For advice on how to change settings please see

[mysqld]
#/usr/lib64/galera-3/libgalera_smm.so
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

user=mysql
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M

# Cluster specific
wsrep_cluster_name="GT_MySQL_Cluster"
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_address="gcomm://10.102.192.96,10.102.192.176,10.102.192.177"
wsrep_sst_method=rsync

# Unique to this host!
wsrep_node_name=GT_MySQL_Node1
wsrep_node_address="10.102.192.96"

[mysql_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


When I attempt to start the service for the first time I get this:

service mysqld start --wsrep-new-cluster


Initializing MySQL database:  2017-04-03T15:06:33.008317Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T15:06:33.451575Z 0 [Warning] InnoDB: New log files created, LSN=45790
2017-04-03T15:06:33.486814Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2017-04-03T15:06:33.539203Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 2007e851-187f-11e7-8a59-5cb9018de794.
2017-04-03T15:06:33.539943Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2017-04-03T15:06:33.540990Z 1 [Note] A temporary password is generated for root@localhost: jI_lj,pr#2?,
                                                           [  OK  ]
Installing validate password plugin:  2017-04-03T15:06:36.601475Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-04-03T15:06:36.603159Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.17) starting as process 27651 ...
2017-04-03T15:06:36.605029Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-04-03T15:06:36.605041Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
2017-04-03T15:06:36.610863Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <in...@codership.com> loaded successfully.
2017-04-03T15:06:36.610921Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-04-03T15:06:36.611143Z 0 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2017-04-03T15:06:36.611175Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 1
2017-04-03T15:06:36.613944Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.102.192.96; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 300M; gcache.recover = no; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.
2017-04-03T15:06:36.633067Z 0 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(00000000-0000-0000-0000-000000000000:-1)
2017-04-03T15:06:36.633508Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-04-03T15:06:36.633544Z 0 [Note] WSREP: wsrep_sst_grab()
2017-04-03T15:06:36.633556Z 0 [Note] WSREP: Start replication
2017-04-03T15:06:36.633579Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-04-03T15:06:36.633712Z 0 [Note] WSREP: protonet asio version 0
2017-04-03T15:06:36.633926Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-04-03T15:06:36.633983Z 0 [Note] WSREP: backend: asio
2017-04-03T15:06:36.634060Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2017-04-03T15:06:36.634232Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-04-03T15:06:36.634247Z 0 [Note] WSREP: restore pc from disk failed
2017-04-03T15:06:36.635624Z 0 [Note] WSREP: GMCast version 0
2017-04-03T15:06:36.636180Z 0 [Note] WSREP: (21e02ede, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-04-03T15:06:36.636214Z 0 [Note] WSREP: (21e02ede, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-04-03T15:06:36.636844Z 0 [Note] WSREP: EVS version 0
2017-04-03T15:06:36.637055Z 0 [Note] WSREP: gcomm: connecting to group 'GT_MySQL_Cluster', peer '10.102.192.96:,10.102.192.176:,10.102.192.177:'
2017-04-03T15:06:36.639204Z 0 [Note] WSREP: (21e02ede, 'tcp://0.0.0.0:4567') connection established to 21e02ede tcp://10.102.192.96:4567
2017-04-03T15:06:36.639223Z 0 [Warning] WSREP: (21e02ede, 'tcp://0.0.0.0:4567') address 'tcp://10.102.192.96:4567' points to own listening address, blacklisting
2017-04-03T15:06:39.639456Z 0 [Note] WSREP: (21e02ede, 'tcp://0.0.0.0:4567') connection to peer 21e02ede with addr tcp://10.102.192.96:4567 timed out, no messages seen in PT3S
2017-04-03T15:06:39.639646Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-04-03T15:06:39.639694Z 0 [Note] WSREP: view(view_id(NON_PRIM,21e02ede,1) memb {
        21e02ede,0
} joined {
} left {
} partitioned {
})
2017-04-03T15:06:40.140122Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50332S), skipping check
2017-04-03T15:07:09.649499Z 0 [Note] WSREP: view((empty))
2017-04-03T15:07:09.649721Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2017-04-03T15:07:09.649750Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-04-03T15:07:09.649911Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1404: Failed to open channel 'GT_MySQL_Cluster' at 'gcomm://10.102.192.96,10.102.192.176,10.102.192.177': -110 (Connection timed out)
2017-04-03T15:07:09.649953Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-04-03T15:07:09.649968Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.102.192.96,10.102.192.176,10.102.192.177) failed: 7
2017-04-03T15:07:09.649976Z 0 [ERROR] Aborting

2017-04-03T15:07:09.649987Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-04-03T15:07:09.650000Z 0 [Note] WSREP: Service disconnected.
2017-04-03T15:07:10.650118Z 0 [Note] WSREP: Some threads may fail to exit.
2017-04-03T15:07:10.650166Z 0 [Note] Binlog end
2017-04-03T15:07:10.650324Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

                                                           [FAILED]


Any clues? 

What am I doing wrong?

Thank you all.
Steven

hunter86bg

Apr 3, 2017, 4:44:35 PM
to codership
Let's check the most obvious things. First, do you have the "socat" and "nmap-ncat" packages installed?

Also check whether the firewall is open for tcp/3306 and tcp/4567. If you use firewalld, you can set up your own firewall service that covers both ports; otherwise this should do the trick:
firewall-cmd --add-service=mysql --permanent; firewall-cmd --add-port=4567/tcp --permanent; firewall-cmd --reload
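
The firewall-cmd command above applies to hosts running firewalld. RHEL 6.8, as used by the original poster, normally manages its firewall through the iptables service instead, so a rough equivalent might look like the sketch below. The helper name galera_iptables_rules is hypothetical, and the port list assumes the Galera defaults (3306 for clients, 4567 for replication, 4568 for incremental state transfer, 4444 for state snapshot transfer). It only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) iptables rules that open the
# default MySQL/Galera TCP ports. The port list is an assumption based on
# Galera defaults; adjust it if the cluster uses non-default ports.
galera_iptables_rules() {
    # 3306 = MySQL clients, 4567 = group communication,
    # 4568 = incremental state transfer, 4444 = state snapshot transfer
    for port in 3306 4567 4568 4444; do
        echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
    done
    # On RHEL 6 the running rules are persisted through the init script.
    echo "service iptables save"
}

galera_iptables_rules
```

The same rules would have to be applied on every node, since each node both opens and accepts connections on these ports.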

hunter86bg

Apr 3, 2017, 4:46:17 PM
to codership
P.S.: Check whether SELinux is in enforcing mode:
getenforce


Steven Hancz

Apr 4, 2017, 11:07:35 AM
to codership
Hi,

Thank you all for the help.
I did make some progress. It seems that one has to initialize mysql by executing:

mysqld --initialize

after that I can execute:

service mysqld start --wsrep-new-cluster

and finally execute:

mysql_secure_installation

The temporary password for the root user is returned by the first command (mysqld --initialize); use it to continue with mysql_secure_installation.
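
If the output of the initialization step has scrolled away, the temporary password can usually be recovered from the error log, because MySQL 5.7 records it in a line like the "A temporary password is generated for root@localhost" entry shown in the first post. The following is a minimal sketch; the helper name temp_root_password is hypothetical, and the log path in the usage note is an assumption taken from the log-error setting in the my.cnf above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent temporary root password
# recorded in a MySQL 5.7 error log. $1 = path to the log file.
temp_root_password() {
    # Keep only the matching lines, take the newest one, and strip
    # everything up to and including "root@localhost: ".
    grep 'A temporary password is generated for root@localhost: ' "$1" \
        | tail -n 1 \
        | sed 's/.*root@localhost: //'
}
```

It could be called as temp_root_password /var/log/mysqld.log, assuming that is where the server writes its error log on the host in question.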

After that I was able to start the service mysqld on two more nodes and the cluster was working.

However, once I stopped the cluster, it would not restart.

The last host that stopped reported the following:

# cat  /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    92c39dd8-1944-11e7-8e2d-6bc3135aa70f
seqno:   8
safe_to_bootstrap: 1

So one would think that this is the node where I should restart the service first. I executed service mysqld start, but it failed with:
MySQL Daemon failed to start.
Starting mysqld:                                           [FAILED]
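
The fields in the grastate.dat listing above can be read from a script before deciding how to start a node. This is a minimal sketch; the helper name grastate_field is hypothetical, and it assumes the simple "key: value" layout shown in the listing.

```shell
#!/bin/sh
# Hypothetical helper: print one field (for example seqno or
# safe_to_bootstrap) from a grastate.dat file.
# $1 = field name, $2 = path to grastate.dat
grastate_field() {
    # Split each line on the colon plus any following blanks and print
    # the value of the line whose key matches the requested field.
    awk -v key="$1" -F ':[ \t]*' '$1 == key { print $2 }' "$2"
}
```

For example, grastate_field safe_to_bootstrap /var/lib/mysql/grastate.dat would print 1 for the file shown above, which marks this node as the one that left the cluster last.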

The error log contains:

2017-04-04T14:58:41.723273Z 0 [Note] WSREP: (2f218e69, 'tcp://0.0.0.0:4567') connection to peer 2f218e69 with addr tcp://10.102.192.96:4567 timed out, no messages seen in PT3S
2017-04-04T14:58:41.724879Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50325S), skipping check
2017-04-04T14:59:11.234216Z 0 [Note] WSREP: view((empty))
2017-04-04T14:59:11.234458Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2017-04-04T14:59:11.234494Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-04-04T14:59:11.234614Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1404: Failed to open channel 'GT_MySQL_Cluster' at 'gcomm://10.102.192.96,10.102.192.176,10.102.192.177': -110 (Connection timed out)
2017-04-04T14:59:11.234658Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-04-04T14:59:11.234682Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.102.192.96,10.102.192.176,10.102.192.177) failed: 7
2017-04-04T14:59:11.234697Z 0 [ERROR] Aborting

2017-04-04T14:59:11.234715Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-04-04T14:59:11.234737Z 0 [Note] WSREP: Service disconnected.
2017-04-04T14:59:12.234876Z 0 [Note] WSREP: Some threads may fail to exit.
2017-04-04T14:59:12.234934Z 0 [Note] Binlog end


Hmm, what am I missing?

Steven

Steven Hancz

Apr 4, 2017, 1:43:47 PM
to codership
OK, it is resolved now.

Restarting the whole cluster also needs the --wsrep-new-cluster option on the first node that is started.

You need to use it on the first startup and on subsequent full restarts as well.

Steven
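
When every node has been stopped, the node to start with --wsrep-new-cluster should generally be the most advanced one, that is, the one with the highest seqno in its grastate.dat. The sketch below picks that node from a list of "node seqno" pairs; the helper name most_advanced_node is hypothetical, and the seqno values in the usage note for the second and third node are made up for illustration (only the value 8 for 10.102.192.96 comes from the thread).

```shell
#!/bin/sh
# Hypothetical helper: read "node seqno" pairs on stdin (one per line)
# and print the node with the highest seqno, i.e. the most advanced
# candidate for starting with --wsrep-new-cluster.
most_advanced_node() {
    # Sort numerically on the second column, keep the last (largest)
    # entry, and print only the node name.
    sort -k2,2n | tail -n 1 | awk '{ print $1 }'
}
```

Fed with the three lines "10.102.192.96 8", "10.102.192.176 7" and "10.102.192.177 -1", it prints 10.102.192.96. A seqno of -1 usually means the node did not shut down cleanly, so its real position would have to be recovered before comparing.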