node timeout error when joining cluster


Sudha Ram

Jun 22, 2017, 6:02:56 AM

to codership

Hi,

I have a 3-node cluster set up, and everything works fine. But when I add network delay with "tc qdisc add dev eth0 root netem delay 5000ms" (5 seconds) to simulate a remote node (node 3), that node won't join the cluster.

Error log:

2017-06-22 14:35:03 14084 [Note] WSREP: Read nil XID from storage engines, skipping position init

2017-06-22 14:35:03 14084 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'

2017-06-22 14:35:03 14084 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <in...@codership.com> loaded successfully.

2017-06-22 14:35:03 14084 [Note] WSREP: CRC-32C: using hardware acceleration.

2017-06-22 14:35:03 14084 [Note] WSREP: Found saved state: fcc29741-49df-11e7-b78c-3ecc331426c8:-1, safe_to_bootsrap: 0

2017-06-22 14:35:03 14084 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.2.253; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.peer_timeout = PT10S; gmcast.segment = 1; gmcast.time_wait = PT15S; gmcast.version =

2017-06-22 14:35:03 14084 [Note] WSREP: GCache history reset: old(fcc29741-49df-11e7-b78c-3ecc331426c8:0) -> new(fcc29741-49df-11e7-b78c-3ecc331426c8:641)

2017-06-22 14:35:03 14084 [Note] WSREP: Assign initial position for certification: 641, protocol version: -1

2017-06-22 14:35:03 14084 [Note] WSREP: wsrep_sst_grab()

2017-06-22 14:35:03 14084 [Note] WSREP: Start replication

2017-06-22 14:35:03 14084 [Note] WSREP: Setting initial position to fcc29741-49df-11e7-b78c-3ecc331426c8:641

2017-06-22 14:35:03 14084 [Note] WSREP: protonet asio version 0

2017-06-22 14:35:03 14084 [Note] WSREP: Using CRC-32C for message checksums.

2017-06-22 14:35:03 14084 [Note] WSREP: initializing ssl context

2017-06-22 14:35:03 14084 [Note] WSREP: backend: asio

2017-06-22 14:35:03 14084 [Note] WSREP: gcomm thread scheduling priority set to other:0

2017-06-22 14:35:03 14084 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)

2017-06-22 14:35:03 14084 [Note] WSREP: restore pc from disk failed

2017-06-22 14:35:03 14084 [Note] WSREP: GMCast version 0

2017-06-22 14:35:03 14084 [Note] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567

2017-06-22 14:35:03 14084 [Note] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') multicast: , ttl: 1

2017-06-22 14:35:03 14084 [Note] WSREP: EVS version 0

2017-06-22 14:35:03 14084 [Note] WSREP: gcomm: connecting to group 'test_cluster', peer '192.168.2.251:,192.168.2.252:,192.168.2.253:'

2017-06-22 14:35:03 14084 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.2.253:59486 local endpoint ssl://192.168.2.253:4567 cipher: AES128-SHA compression:

2017-06-22 14:35:03 14084 [Note] WSREP: SSL handshake successful, remote endpoint ssl://192.168.2.253:4567 local endpoint ssl://192.168.2.253:59486 cipher: AES128-SHA compression:

2017-06-22 14:35:03 14084 [Note] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') connection established to e1070e3c ssl://192.168.2.253:4567

2017-06-22 14:35:03 14084 [Warning] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') address 'ssl://192.168.2.253:4567' points to own listening address, blacklisting

2017-06-22 14:35:06 14084 [Warning] WSREP: no nodes coming from prim view, prim not possible

2017-06-22 14:35:06 14084 [Note] WSREP: view(view_id(NON_PRIM,e1070e3c,1) memb {
	e1070e3c,1
} joined {
} left {
} partitioned {
})

2017-06-22 14:35:07 14084 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50834S), skipping check

2017-06-22 14:35:13 14084 [Note] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') connection to peer 00000000 with addr ssl://192.168.2.252:4567 timed out, no messages seen in PT10S

2017-06-22 14:35:13 14084 [Note] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') connection to peer e1070e3c with addr ssl://192.168.2.253:4567 timed out, no messages seen in PT10S

2017-06-22 14:35:28 14084 [Note] WSREP: (e1070e3c, 'ssl://0.0.0.0:4567') connection to peer 00000000 with addr ssl://192.168.2.252:4567 timed out, no messages seen in PT10S

2017-06-22 14:35:36 14084 [Note] WSREP: view((empty))

2017-06-22 14:35:36 14084 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
	 at gcomm/src/pc.cpp:connect():158

2017-06-22 14:35:36 14084 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)

2017-06-22 14:35:36 14084 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1404: Failed to open channel 'test_cluster' at 'gcomm://192.168.2.251,192.168.2.252,192.168.2.253': -110 (Connection timed out)

2017-06-22 14:35:36 14084 [ERROR] WSREP: gcs connect failed: Connection timed out

2017-06-22 14:35:36 14084 [ERROR] WSREP: wsrep::connect(gcomm://192.168.2.251,192.168.2.252,192.168.2.253) failed: 7

2017-06-22 14:35:36 14084 [ERROR] Aborting

2017-06-22 14:35:36 14084 [Note] WSREP: Service disconnected.

2017-06-22 14:35:37 14084 [Note] WSREP: Some threads may fail to exit.

2017-06-22 14:35:37 14084 [Note] Binlog end

2017-06-22 14:35:37 14084 [Note] /usr/sbin/mysqld: Shutdown complete
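The timing-related values in the "Passing config to GCS" line can be compared with the 5-second netem delay mechanically. Below is a minimal Python sketch, not Galera tooling. `parse_duration`, `parse_gcs_config` and `durations_at_or_below` are hypothetical helpers. The parser only handles the simple ISO-8601 `PT…H…M…S` durations that appear in this log. Flagging every duration that does not exceed the delay is a rough heuristic, not Galera's documented rule, and some of the flagged values are check periods rather than timeouts.

```python
import re

# Excerpt of the "Passing config to GCS" log line above (duration settings plus one size).
GCS_CONFIG = (
    "evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; "
    "evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; "
    "evs.join_retrans_period = PT1S; evs.stats_report_period = PT1M; "
    "evs.suspect_timeout = PT5S; evs.view_forget_timeout = PT24H; "
    "gcache.size = 128M; gmcast.peer_timeout = PT10S; gmcast.time_wait = PT15S"
)

# Hypothetical pattern for durations such as PT0.5S, PT1M, PT24H or PT1H30M.
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?(?:(?P<m>\d+(?:\.\d+)?)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(value):
    """Return the duration in seconds, or None if value is not a PT... duration."""
    match = _DURATION.match(value.strip())
    if not match or not any(match.groupdict().values()):
        return None  # not a duration (e.g. "128M") or a bare "PT"
    hours, minutes, seconds = (float(match.group(k) or 0) for k in ("h", "m", "s"))
    return hours * 3600 + minutes * 60 + seconds


def parse_gcs_config(text):
    """Split 'key = value; key = value' into a dict of stripped strings."""
    options = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            options[key.strip()] = value.strip()
    return options


def durations_at_or_below(options, delay_seconds):
    """Duration settings not longer than delay_seconds, sorted by key."""
    flagged = {}
    for key, value in options.items():
        seconds = parse_duration(value)
        if seconds is not None and seconds <= delay_seconds:
            flagged[key] = seconds
    return dict(sorted(flagged.items()))


if __name__ == "__main__":
    opts = parse_gcs_config(GCS_CONFIG)
    for key, seconds in durations_at_or_below(opts, 5.0).items():
        print(f"{key}: {seconds:g} s")
```

Run as a script with the 5-second delay, it lists four `evs.*` settings: `delay_margin`, `inactive_check_period`, `join_retrans_period` and `suspect_timeout`.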

My Galera settings:

[mysqld]

#binlog_format=ROW

#default-storage-engine=innodb

#innodb_autoinc_lock_mode=2

bind-address=0.0.0.0

# Galera Provider Configuration

wsrep_on=ON

wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration

wsrep_cluster_name="test_cluster"

wsrep_cluster_address="gcomm://192.168.2.251,192.168.2.252,192.168.2.253"

# Galera Synchronization Configuration

wsrep_sst_method=rsync

# Galera Node Configuration

wsrep_node_address="192.168.2.253"

wsrep_node_name="node-ins4"

#ssl settings

wsrep_provider_options="socket.ssl_key=/etc/mysql/certs/server-key.pem;socket.ssl_cert=/etc/mysql/certs/server-cert.pem;socket.ssl_ca=/etc/mysql/certs/ca-cert.pem;socket.checksum=2;socket.ssl_cipher=AES128-SHA;pc.bootstrap=YES;gmcast.segment=2;gmcast.peer_timeout=PT10S;gmcast.time_wait=PT15S"
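Because the log prints the settings the provider actually received, the `wsrep_provider_options` string can be checked against it. For example, the options set `gmcast.segment=2`, while the "Passing config to GCS" line shows `gmcast.segment = 1`. Below is a minimal Python sketch that surfaces such differences. `parse_options` and `mismatches` are hypothetical names, the SSL file paths are left out, and only keys present in both strings are compared, because the logged line is cut off after `gmcast.version`.

```python
# wsrep_provider_options from the [mysqld] section above (SSL file paths omitted).
PROVIDER_OPTIONS = (
    "socket.checksum=2;socket.ssl_cipher=AES128-SHA;pc.bootstrap=YES;"
    "gmcast.segment=2;gmcast.peer_timeout=PT10S;gmcast.time_wait=PT15S"
)

# The matching gmcast.* entries copied from the "Passing config to GCS" log line.
LOGGED_CONFIG = "gmcast.peer_timeout = PT10S; gmcast.segment = 1; gmcast.time_wait = PT15S"


def parse_options(text):
    """Parse 'k=v;k = v' pairs into a dict; tolerates spaces and empty items."""
    options = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            options[key.strip()] = value.strip()
    return options


def mismatches(configured, logged):
    """Keys present in both dicts whose values differ: key -> (configured, logged)."""
    return {
        key: (configured[key], logged[key])
        for key in sorted(configured.keys() & logged.keys())
        if configured[key] != logged[key]
    }


if __name__ == "__main__":
    diff = mismatches(parse_options(PROVIDER_OPTIONS), parse_options(LOGGED_CONFIG))
    for key, (want, got) in diff.items():
        print(f"{key}: my.cnf says {want!r}, log shows {got!r}")
```

With the values above, only `gmcast.segment` is reported.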