Galera second node connects, but then crashes or so it appears

1,504 views
Skip to first unread message

Roland Giesler GTS

unread,
Sep 7, 2017, 5:25:52 PM9/7/17
to codership
mysql  Ver 15.1 Distrib 10.2.7-MariaDB, for FreeBSD11.0 (amd64) using readline 5.1

First Galera node is started and the second seems to be connecting:

MariaDB [(none)]> show status like '%wsrep%';
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_apply_oooe             | 0.000000                             |
| wsrep_apply_oool             | 0.000000                             |
| wsrep_apply_window           | 0.000000                             |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_deps_distance     | 0.000000                             |
| wsrep_cert_index_size        | 0                                    |
| wsrep_cert_interval          | 0.000000                             |
| wsrep_cluster_conf_id        | 2                                    |
| wsrep_cluster_size           | 2                                    |
| wsrep_cluster_state_uuid     | 2749a781-9248-11e7-83a2-0262448701d2 |
| wsrep_cluster_status         | Primary                              |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000000                             |
| wsrep_commit_window          | 0.000000                             |
| wsrep_connected              | ON                                   |
| wsrep_desync_count           | 0                                    |
| wsrep_evs_delayed            |                                      |
| wsrep_evs_evict_list         |                                      |
| wsrep_evs_repl_latency       | 0/0/0/0/0                            |
| wsrep_evs_state              | OPERATIONAL                          |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_gcomm_uuid             | f78818e9-930b-11e7-9979-42755dc8ab32 |
| wsrep_incoming_addresses     | 172.16.15.21:3306,172.16.15.20:3306  |
| wsrep_last_committed         | 0                                    |
| wsrep_local_bf_aborts        | 0                                    |
| wsrep_local_cached_downto    | 18446744073709551615                 |
| wsrep_local_cert_failures    | 0                                    |
| wsrep_local_commits          | 0                                    |
| wsrep_local_index            | 1                                    |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.000000                             |
| wsrep_local_recv_queue_max   | 1                                    |
| wsrep_local_recv_queue_min   | 0                                    |
| wsrep_local_replays          | 0                                    |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.000000                             |
| wsrep_local_send_queue_max   | 1                                    |
| wsrep_local_send_queue_min   | 0                                    |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_local_state_uuid       | 2749a781-9248-11e7-83a2-0262448701d2 |
| wsrep_protocol_version       | 7                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <in...@codership.com>    |
| wsrep_provider_version       | 3.21(rrelease_25.3.21)               |
| wsrep_ready                  | ON                                   |
| wsrep_received               | 3                                    |
| wsrep_received_bytes         | 343                                  |
| wsrep_repl_data_bytes        | 0                                    |
| wsrep_repl_keys              | 0                                    |
| wsrep_repl_keys_bytes        | 0                                    |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_replicated             | 0                                    |
| wsrep_replicated_bytes       | 0                                    |
| wsrep_thread_count           | 2                                    |
+------------------------------+--------------------------------------+


However:

The second node has this in it's log /var/db/mysql/hout3.log

2017-09-07 21:38:49 34424840192 [Note] /usr/local/libexec/mysqld (mysqld 10.2.7-MariaDB-log) starting as process 39492 ...
170907 21:38:51 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
2017-09-07 21:38:51 34424840192 [Note] /usr/local/libexec/mysqld (mysqld 10.2.7-MariaDB-log) starting as process 39661 ...
2017-09-07 21:38:51 34424840192 [Note] WSREP: Setting wsrep_ready to 0
2017-09-07 21:38:51 34424840192 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-09-07 21:38:51 34424840192 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/lib/libgalera_smm.so'
2017-09-07 21:38:51 34424840192 [Note] WSREP: wsrep_load(): Galera 3.21(rrelease_25.3.21) by Codership Oy <in...@codership.com> loaded successfully.
2017-09-07 21:38:51 34424840192 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2017-09-07 21:38:51 34424840192 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 1
2017-09-07 21:38:51 34424840192 [Note] WSREP: Passing config to GCS: base_dir = /var/db/mysql/; base_host = 172.16.15.21; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/db/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/db/mysql//galera.cache; gcache.page_size = 300M; gcache.recover = no; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.igno
2017-09-07 21:38:51 34424840192 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2017-09-07 21:38:51 34424840192 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-09-07 21:38:51 34424840192 [Note] WSREP: wsrep_sst_grab()
2017-09-07 21:38:51 34424840192 [Note] WSREP: Start replication
2017-09-07 21:38:51 34424840192 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-09-07 21:38:51 34424840192 [Note] WSREP: protonet asio version 0
2017-09-07 21:38:51 34424840192 [Note] WSREP: Using CRC-32C for message checksums.
2017-09-07 21:38:51 34424840192 [Note] WSREP: backend: asio
2017-09-07 21:38:51 34424840192 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-09-07 21:38:51 34424840192 [Note] WSREP: restore pc from disk successfully
2017-09-07 21:38:51 34424840192 [Note] WSREP: GMCast version 0
2017-09-07 21:38:51 34424840192 [Note] WSREP: (281feca2, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-09-07 21:38:51 34424840192 [Note] WSREP: (281feca2, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-09-07 21:38:51 34424840192 [Note] WSREP: EVS version 0
2017-09-07 21:38:51 34424840192 [Note] WSREP: gcomm: connecting to group 'hout_wsrep_cluster', peer '172.16.15.20:'
2017-09-07 21:38:51 34424840192 [ERROR] WSREP: bind: Address already in use
2017-09-07 21:38:51 34424840192 [ERROR] WSREP: failed to open gcomm backend connection: 48: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'bind: Address already in use': 48 (Address already in use)
         at gcomm/src/asio_tcp.cpp:listen():867
2017-09-07 21:38:51 34424840192 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -48 (Address already in use)
2017-09-07 21:38:51 34424840192 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1457: Failed to open channel 'hout_wsrep_cluster' at 'gcomm://172.16.15.20': -48 (Address already in use)
2017-09-07 21:38:51 34424840192 [ERROR] WSREP: gcs connect failed: Address already in use
2017-09-07 21:38:51 34424840192 [ERROR] WSREP: wsrep::connect(gcomm://172.16.15.20) failed: 7
2017-09-07 21:38:51 34424840192 [ERROR] Aborting

170907 21:38:52 mysqld_safe mysqld from pid file /var/db/mysql/hout3.pid ended

This "[ERROR] WSREP: bind: Address already in use" doesn't make sense to me.  Why does the previous line "WSREP: (281feca2, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567" occur?

I've tried to find documentation that explains what one should have in the config files to fix/prevent this, but haven't been able to locate any.

Here's my config for node "hout3":

# cat /usr/local/etc/my.cnf
# Example MariaDB config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MariaDB.
#
# MariaDB programs look for option files in a set of
# locations which depend on the deployment platform.
# You can copy this option file to one of those
# locations. For information about these locations, do:
# 'my_print_defaults --help' and see what is printed under
# Default options are read from the following files in the given order:
# More information at: http://dev.mysql.com/doc/mysql/en/option-files.html
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
#password       = your_password
#port           = 3306
#socket         = @MYSQL_UNIX_ADDR@

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port            = 3306
bind-address    = 172.16.15.21
skip-external-locking
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
#thread_concurrency = 16

# Point the following paths to a dedicated disk
#tmpdir         = /tmp/

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id       = 2

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slaver server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id       = 2
#
# The replication master for this slave - required
#master-host     =   <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user     =   <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =   <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port     =  <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin
#
# binary logging format - mixed recommended
#binlog_format=row

# Uncomment the following if you are using InnoDB tables
innodb_data_home_dir = /var/db/mysql
innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
innodb_log_group_home_dir = /var/db/mysql
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 1G
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

!include /usr/local/etc/wsrep.cnf

and

# cat /usr/local/etc/wsrep.cnf
# This file contains wsrep-related mysqld options. It should be included
# in the main MySQL configuration file.
#
# Options that need to be customized:
#  - wsrep_provider
#  - wsrep_cluster_address
#  - wsrep_sst_auth
# The rest of defaults should work out of the box.

##
## mysqld options _MANDATORY_ for correct opration of the cluster
##
[mysqld]

# (This must be substituted by wsrep_format)
binlog_format=ROW

# Currently only InnoDB storage engine is supported
default-storage-engine=innodb

# to avoid issues with 'bulk mode inserts' using autoinc
innodb_autoinc_lock_mode=2

# Override bind-address
# In some systems bind-address defaults to 127.0.0.1, and with mysqldump SST
# it will have (most likely) disastrous consequences on donor node
#bind-address=0.0.0.0

##
## WSREP options
##
wsrep_on=ON

# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/local/lib/libgalera_smm.so

# Provider specific configuration options
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"

# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="hout_wsrep_cluster"

# Group communication system handle
wsrep_cluster_address="gcomm://172.16.15.20"

# Human-readable node name (non-unique). Hostname by default.
wsrep_node_name=hout3

# Base replication <address|hostname>[:port] of the node.
# The values supplied will be used as defaults for state transfer receiving,
# listening ports and so on. Default: address of the first network interface.
wsrep_node_address=172.16.15.21:4567

# Address for incoming client connections. Autodetect by default.
#wsrep_node_incoming_address=

# How many threads will process writesets from other nodes
wsrep_slave_threads=1

# DBUG options for wsrep provider
wsrep_dbug_option=1

# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1

# Maximum number of rows in write set
wsrep_max_ws_rows=131072

# Maximum size of write set
wsrep_max_ws_size=1073741824

# to enable debug level logging, set this to 1
wsrep_debug=1

# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0

# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1

# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1

# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_282555_workaround=0

# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0

# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status  - new status of this node
# --uuid    - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index   - index of this node in the list
wsrep_notify_cmd=

##
## WSREP State Transfer options
##

# State Snapshot Transfer method
wsrep_sst_method=rsync

# Address which donor should send State Snapshot to.
# Should be the address of THIS node. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_sst_receive_address=

# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_auth=root:

# Desired SST donor name.
#wsrep_sst_donor=

# Reject client queries when donating SST (false)
#wsrep_sst_donor_rejects_queries=0

# Protocol version to use
# wsrep_protocol_version=

What should I do to get "hout3" to connect to "hout2" without this error?

Roland Giesler GTS

unread,
Sep 7, 2017, 6:09:36 PM9/7/17
to codership
It seems that mysql was already running.  Stopping those processes fixed that.  However, now I get this

2017-09-07 23:52:55 34820008448 [Note] WSREP: State transfer required:
       
Group state: 2749a781-9248-11e7-83a2-0262448701d2:0
       
Local state: 00000000-0000-0000-0000-000000000000:-1
2017-09-07 23:52:55 34820008448 [Note] WSREP: New cluster view: global state: 2749a781-9248-11e7-83a2-0262448701d2:0, view# 6: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-09-07 23:52:55 34820008448 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-09-07 23:52:55 34820008448 [Note] WSREP: Setting wsrep_ready to 0
2017-09-07 23:52:55 34827545088 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.16.15.21' --datadir '/var/db/mysql/'  --defaults-extra-file '/var/db/mysql/my.cnf'  --parent '1249' --binlog 'mysql-bin' '
WSREP_SST
: [ERROR] rsync daemon port '4444' has been taken (20170907 23:52:55.N)
WSREP_SST
: [INFO] Joiner cleanup. rsync PID: 1288 (20170907 23:52:55.N)
WSREP_SST
: [INFO] Joiner cleanup done. (20170907 23:52:56.N)
2017-09-07 23:52:56 34827545088 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '172.16.15.21' --datadir '/var/db/mysql/'  --defaults-extra-file '/var/db/mysql/my.cnf'  --parent '1249' --binlog 'mysql-bin'
       
Read: '(null)'
2017-09-07 23:52:56 34827545088 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.16.15.21' --datadir '/var/db/mysql/'  --defaults-extra-file '/var/db/mysql/my.cnf'  --parent '1249' --binlog 'mysql-bin' : 16 (Device busy)
2017-09-07 23:52:56 34820008448 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2017-09-07 23:52:56 34820008448 [ERROR] Aborting

Error in my_thread_global_end(): 1 threads didn't exit
170907 23:53:10 mysqld_safe mysqld from pid file /var/db/mysql/hout3.pid ended

However, port 4444 is not taken:

# netstat -an
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4      
0     36 172.16.15.21.22        192.168.120.224.41774  ESTABLISHED
tcp4      
0      0 *.22                   *.*                    LISTEN
udp4      
0      0 127.0.0.1.123          *.*                    
udp6      
0      0 fe80::1%lo0.123        *.*                    
udp6      
0      0 ::1.123                *.*                    
udp4      
0      0 172.16.15.21.123       *.*                    
udp4      
0      0 *.123                  *.*                    
udp6      
0      0 *.123                  *.*                    
udp4      
0      0 *.514                  *.*                    
udp6      
0      0 *.514                  *.*                    
Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8001a6b0870 stream      
0      0                0 fffff8001a6b0960                0                0
fffff8001a6b0960 stream      
0      0                0 fffff8001a6b0870                0                0
fffff8001a6b0b40 stream      
0      0 fffff8001a6f8588                0                0                0 /var/run/devd.pipe
fffff8001a6b05a0 dgram      
0      0                0 fffff8001a6b5780                0 fffff8001a6b5690
fffff8001a6b5690 dgram      
0      0                0 fffff8001a6b5780                0                0
fffff8001a6b5780 dgram      
0      0 fffff8001a664ce8                0 fffff8001a6b05a0                0 /var/run/logpriv
fffff8001a6b5870 dgram      
0      0 fffff8003acaa000                0                0                0 /var/run/log
fffff8001a6b0a50 seqpac      
0      0 fffff8001a6f83b0                0                0                0 /var/run/devd.seqpacket.pipe

What now?

Reply all
Reply to author
Forward
0 new messages