Unable to get nodes to join multi-DC Galera cluster.


Carlos Parada

Feb 18, 2019, 5:58:31 AM
to codership
Hi,

I am trying to set up a Galera cluster using 3 nodes in each of two different datacenters. This is just for testing for now; the plan going forward is to add a third DC or just use Galera Arbitrator.

I have tried lots of different configurations, but so far I have been unable to get nodes in DC2 to join the ones in DC1. I have tried bringing 3 nodes up in DC1 and then joining 3 from DC2, and also spinning up just the bootstrap node in DC1 and then adding a few nodes from DC2, to no avail. Network connectivity is completely open (ANY to ANY), and cross-DC ping times are acceptable at ~2.5 ms.

The nodes seem to connect initially but then just time out:

2019-02-18  9:31:32 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2019-02-18  9:31:32 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
2019-02-18  9:31:32 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1608: Failed to open channel 'my_wsrep_cluster' at 'gcomm://<list of ips>': -110 (Connection timed out)
2019-02-18  9:31:32 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2019-02-18  9:31:32 0 [ERROR] WSREP: wsrep::connect(gcomm://<list of ips>) failed: 7
2019-02-18  9:31:32 0 [ERROR] Aborting




My configuration is below.


#
# Galera Cluster: mandatory settings
#

[server]
bind-address=0.0.0.0
binlog_format=row
binlog_cache_size=2M
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
max_allowed_packet=1G
innodb_fast_shutdown=0
innodb_thread_concurrency=0

[galera]
wsrep_on=ON
wsrep_provider="/usr/lib/galera/libgalera_smm.so"
wsrep_provider_options="gmcast.segment=1; evs.suspect_timeout=PT1M; evs.inactive_timeout=PT1M; evs.send_window=1024; evs.user_send_window=1024; evs.inactive_check_period=PT30S; gcs.fc_factor=1; gcs.fc_limit=260; gcs.max_packet_size=2097152; evs.join_retrans_period=PT0.5S"
wsrep_cluster_address="gcomm://${cluster_nodes}"
wsrep-sst-method=rsync-wan

#
# Optional settings
#

# Tune this value for your system, roughly 2x cores; see https://mariadb.com/kb/en/mariadb/galera-cluster-system-variables/#wsrep_slave_threads
wsrep_slave_threads=4

# innodb_flush_log_at_trx_commit=0
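One detail worth double-checking in the config above (an observation, not a confirmed cause): Galera's `gmcast.segment` option is meant to group nodes by network segment, i.e. by datacenter, so that cross-segment traffic is relayed through a single representative. Using the same segment value on every node in both DCs defeats that grouping. A sketch of the usual per-DC assignment (segment values are illustrative; the rest of the option string is elided):

```ini
# On every node in DC1:
wsrep_provider_options="gmcast.segment=0; ..."

# On every node in DC2:
wsrep_provider_options="gmcast.segment=1; ..."
```

The segment numbers themselves are arbitrary; what matters is that nodes in the same DC share a value and the two DCs differ.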


Victor Lindellee

Feb 18, 2019, 6:14:08 AM
to codersh...@googlegroups.com, Carlos Parada
Hi Carlos!

We had problems connecting our nodes until we defined wsrep_sst_receive_address; somehow it assumed another IP address. I imagine this could be the case for you too, since you likely have different IP ranges in your two locations.
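For reference, setting it looks like the following. The address here is a placeholder (an assumption on my part, not from your setup); each node would advertise its own IP as reachable from the other DC:

```ini
[galera]
# Hypothetical address; each node sets its own IP reachable from the other DC
wsrep_sst_receive_address=10.0.1.11
```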

regards
Victor


Carlos Parada

Feb 18, 2019, 10:31:29 AM
to codership
Hi,

I just tried adding that option, but it didn't help; the cluster gets totally messed up after I try to add the nodes from DC2.

2019-02-18 15:28:56 0 [Note] WSREP: evs::proto(50320d06, GATHER, view_id(REG,50320d06,32)) suspecting node: 9da138c4
2019-02-18 15:28:56 0 [Note] WSREP: evs::proto(50320d06, GATHER, view_id(REG,50320d06,32)) suspected node without join message, declaring inactive
2019-02-18 15:28:56 0 [Note] WSREP: evs::proto(50320d06, GATHER, view_id(REG,50320d06,32)) suspecting node: a091ae33
2019-02-18 15:28:56 0 [Note] WSREP: evs::proto(50320d06, GATHER, view_id(REG,50320d06,32)) suspected node without join message, declaring inactive
2019-02-18 15:28:57 0 [Note] WSREP: (50320d06, 'tcp://0.0.0.0:4567') turning message relay requesting off


The bootstrap node (which was fine initially) also craps out:

| wsrep_evs_state             | GATHER                               |
| wsrep_gcomm_uuid            | 9da138c4-338e-11e9-8bca-a27c06269e24 |
| wsrep_cluster_capabilities  |                                      |
| wsrep_cluster_conf_id       | 18446744073709551615                 |
| wsrep_cluster_size          | 3                                    |
| wsrep_cluster_state_uuid    | 9b420d85-338e-11e9-a804-9ec4385d18fd |
| wsrep_cluster_status        | non-Primary                          |
| wsrep_connected             | ON                                   |
| wsrep_local_bf_aborts       | 0                                    |
| wsrep_local_index           | 1                                    |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name         | Galera                               |
| wsrep_provider_vendor       | Codership Oy <info@codership.com>    |
| wsrep_provider_version      | 26.4.0(r4463)                        |
| wsrep_ready                 | OFF                                  |


What settings are you using for wsrep_provider_options?

Thanks.

Victor Lindellee

Feb 19, 2019, 3:37:01 AM
to codersh...@googlegroups.com, Carlos Parada
Hi again!

I notice that your wsrep is advertising the wildcard address: `[Note] WSREP: (50320d06, 'tcp://0.0.0.0:4567')`

We also use the option wsrep_node_address, which might be helpful in your case.

For wsrep_provider_options we use nothing fancy, just a gcache definition like `wsrep_provider_options="gcache.size = 1G"`.
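Putting those together, a node's config might look roughly like this. The IP is a placeholder (each node sets its own address, reachable from the other DC); this is a sketch of what we do, not a guaranteed fix:

```ini
[galera]
# Hypothetical address; each node advertises its own reachable IP
# instead of the 0.0.0.0 wildcard seen in your log
wsrep_node_address=10.0.2.21
wsrep_sst_receive_address=10.0.2.21
wsrep_provider_options="gcache.size = 1G"
```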

regards
Victor
