Galera cluster and restart. Seems I need --wsrep-new-cluster every time.

Xavier BEAUDOUIN

Feb 8, 2019, 4:15:12 AM

to codership

Hello,

I am trying to build a Galera cluster to replace all the standalone MySQL/MariaDB servers we have.

I have 4 MariaDB 10.2.21 servers with Galera 25.3.25.

All our machines are FreeBSD 12-based.

MariaDB has been compiled with WSREP support and the default FreeBSD ports options
(https://www.freshports.org/databases/mariadb102-server/), i.e.:

CONNECT_EXTRA=on: Enable ODBC and XML in CONNECT engine
DOCS=on: Build and/or install documentation
====> Optional page compression
LZ4=off: LZ4 compression support
LZO=off: LZO compression support
SNAPPY=off: Snappy compression library support
ZSTD=off: Zstandard compression support (RocksDB only)
====> Optional MariaDB storage engines
INNOBASE=on: InnoDB default engine
MROONGA=off: Mroonga Full Text Search engine
OQGRAPH=off: Open Query Graph Computation engine
ROCKSDB=off: RocksDB LSM engine (Alpha)
SPHINX=on: SphinxSE engine
SPIDER=on: Partitioning and XA-transactions engine
TOKUDB=off: Fractal tree index tree data structure engine
XTRADB=off: Build XtraDB engine next to InnoDB
====> Optional Mroonga features
ZMQ=off: ZeroMQ support
MSGPACK=off: MsgPack support
====> GSSAPI Security API support: you have to select exactly one of them
GSSAPI_BASE=on: GSSAPI support via base system (needs Kerberos)
GSSAPI_HEIMDAL=off: GSSAPI support via security/heimdal
GSSAPI_MIT=off: GSSAPI support via security/krb5
GSSAPI_NONE=off: Disable GSSAPI support

Galera was built with the following port options
(https://www.freshports.org/databases/galera/):

===> The following configuration options are available for galera-25.3.25_1:
BOOSTPOOL=off: Use boost pool allocator
BPOSTATIC=off: Use static boost_program_options
DEBUG=off: Build with debugging support

The MariaDB configuration looks like this on each server (server-id is unique across all 4 servers):

; Puppetd DOES update this file
[mysqld]
;feedback=on
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
; 4 servers -> 12
innodb_thread_concurrency=12
thread_concurrency=12
wsrep_provider=/usr/local/lib/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M; gcache.recover=yes"
wsrep_cluster_name="db-vip"
; only on boot
wsrep_cluster_address="gcomm://192.0.2.1,192.0.2.2,192.0.2.3,192.0.2.4"
wsrep_sst_method="rsync"
wsrep_on=on
wsrep_slave_threads = 8
;transaction-isolation = READ-COMMITTED
max_allowed_packet=1024M
query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 64M
thread_cache_size = 16
key_buffer = 256M
slow-query-log = 1
long_query_time = 3
slow-query-log-file = /var/db/mysql/mysql-slow.log
log-queries-not-using-indexes=0
max_heap_table_size = 512M
join_buffer_size = 8M
max_connections = 4096
max_user_connections = 4096
sync_binlog=0
open_files_limit = 50000
server-id=1
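For reference, three of the options above are ones the Galera documentation lists as required on every node: binlog_format=ROW, default_storage_engine=InnoDB and innodb_autoinc_lock_mode=2. The Python sketch below checks an option file for them, plus a non-empty wsrep_provider and wsrep_cluster_address. The function name `check_galera_options` and the exact set of checked keys are assumptions for illustration. The parser only approximates MariaDB's option-file syntax (for example, it does not handle `!include` directives).

```python
import configparser

# Options the Galera docs list as required; this selection is an
# illustrative assumption, not an exhaustive validation.
REQUIRED = {
    "binlog_format": "ROW",
    "default_storage_engine": "InnoDB",
    "innodb_autoinc_lock_mode": "2",
}


def check_galera_options(text: str) -> list[str]:
    """Return the problems found in the [mysqld] section of an option file."""
    # MariaDB option files accept full-line comments starting with '#' or ';'.
    # Inline comments stay disabled, so the ';' separators inside
    # wsrep_provider_options are kept as part of the value.
    parser = configparser.ConfigParser(
        comment_prefixes=("#", ";"), strict=False, interpolation=None
    )
    parser.read_string(text)
    # MariaDB treats '-' and '_' in option names as equivalent.
    opts = {
        key.replace("-", "_"): value.strip().strip('"')
        for key, value in parser["mysqld"].items()
    }
    problems = []
    for key, expected in REQUIRED.items():
        actual = opts.get(key)
        if actual is None or actual.lower() != expected.lower():
            problems.append(f"{key} should be {expected}, got {actual}")
    for key in ("wsrep_provider", "wsrep_cluster_address"):
        if not opts.get(key):
            problems.append(f"{key} is not set")
    return problems
```

Running it on the file above should return an empty list. A node started with binlog_format=STATEMENT or without wsrep_provider would be reported.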

Now the big issue is restarting the cluster.

When MariaDB is down on all 4 servers, I cannot start it again because of the following:

2019-02-08 9:01:54 34388045824 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-02-08 9:01:54 34388045824 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/lib/libgalera_smm.so'
2019-02-08 9:01:54 34388045824 [Note] WSREP: wsrep_load(): Galera 3.25(rrelease_25.3.25) by Codership Oy <in...@codership.com> loaded successfully.
2019-02-08 9:01:54 34388045824 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-02-08 9:01:54 34388045824 [Note] WSREP: Found saved state: 0fc257b6-1a31-11e9-b1dc-87b5e2e64482:3423534, safe_to_bootstrap: 1
2019-02-08 9:01:54 34388045824 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 0fc257b6-1a31-11e9-b1dc-87b5e2e64482, offset: 285051568
2019-02-08 9:01:54 34388045824 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/314572840 bytes) complete.
2019-02-08 9:01:54 34388045824 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (314572840/314572840 bytes) complete.
2019-02-08 9:01:54 34388045824 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 3206965-3423534
2019-02-08 9:01:54 34388045824 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/314570728 bytes) complete.
2019-02-08 9:01:54 34388045824 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (314570728/314570728 bytes) complete.
2019-02-08 9:01:54 34388045824 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/216570 locked buffers
2019-02-08 9:01:54 34388045824 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 314570728/314572800
2019-02-08 9:01:54 34388045824 [Note] WSREP: Passing config to GCS: base_dir = /var/db/mysql/; base_host = 192.0.2.1; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/db/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/db/mysql//galera.cache; gcache.page_size = 300M; gcache.recover = yes; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S;
2019-02-08 9:01:54 34388045824 [Note] WSREP: Assign initial position for certification: 3423534, protocol version: -1
2019-02-08 9:01:54 34388045824 [Note] WSREP: wsrep_sst_grab()
2019-02-08 9:01:54 34388045824 [Note] WSREP: Start replication
2019-02-08 9:01:54 34388045824 [Note] WSREP: Start replication
2019-02-08 9:01:54 34388045824 [Note] WSREP: Setting initial position to 0fc257b6-1a31-11e9-b1dc-87b5e2e64482:3423534
2019-02-08 9:01:54 34388045824 [Note] WSREP: protonet asio version 0
2019-02-08 9:01:54 34388045824 [Note] WSREP: Using CRC-32C for message checksums.
2019-02-08 9:01:54 34388045824 [Note] WSREP: backend: asio
2019-02-08 9:01:54 34388045824 [Note] WSREP: gcomm thread scheduling priority set to other:0
2019-02-08 9:01:54 34388045824 [Warning] WSREP: access file(/var/db/mysql//gvwstate.dat) failed(No such file or directory)
2019-02-08 9:01:54 34388045824 [Note] WSREP: restore pc from disk failed
2019-02-08 9:01:54 34388045824 [Note] WSREP: GMCast version 0
2019-02-08 9:01:54 34388045824 [Note] WSREP: (cc866de5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-02-08 9:01:54 34388045824 [Note] WSREP: (cc866de5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-02-08 9:01:54 34388045824 [Note] WSREP: EVS version 0
2019-02-08 9:01:54 34388045824 [Note] WSREP: gcomm: connecting to group 'db-vip', peer '192.0.2.1:,192.0.2.2:,192.0.2.3:,192.0.2.4:'
2019-02-08 9:01:54 34388045824 [Note] WSREP: (cc866de5, 'tcp://0.0.0.0:4567') connection established to cc866de5 tcp://192.0.2.1:4567
2019-02-08 9:01:54 34388045824 [Warning] WSREP: (cc866de5, 'tcp://0.0.0.0:4567') address 'tcp://192.0.2.1:4567' points to own listening address, blacklisting
2019-02-08 9:01:57 34388045824 [Note] WSREP: (cc866de5, 'tcp://0.0.0.0:4567') connection to peer cc866de5 with addr tcp://192.0.2.1:4567 timed out, no messages seen in PT3S
2019-02-08 9:01:57 34388045824 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-02-08 9:01:57 34388045824 [Note] WSREP: view(view_id(NON_PRIM,cc866de5,1) memb {
cc866de5,0
} joined {
} left {
} partitioned {
})
2019-02-08 9:01:58 34388045824 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.74233S), skipping check
2019-02-08 9:02:28 34388045824 [Note] WSREP: view((empty))
2019-02-08 9:02:28 34388045824 [ERROR] WSREP: failed to open gcomm backend connection: 60: failed to reach primary view: 60 (Operation timed out)
at gcomm/src/pc.cpp:connect():158
2019-02-08 9:02:28 34388045824 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -60 (Operation timed out)
2019-02-08 9:02:28 34388045824 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'db-vip' at 'gcomm://192.0.2.1,192.0.2.2,192.0.2.3,192.0.2.4': -60 (Operation timed out)
2019-02-08 9:02:28 34388045824 [ERROR] WSREP: gcs connect failed: Operation timed out
2019-02-08 9:02:28 34388045824 [ERROR] WSREP: wsrep::connect(gcomm://192.0.2.1,192.0.2.2,192.0.2.3,192.0.2.4) failed: 7
2019-02-08 9:02:28 34388045824 [ERROR] Aborting

The node starting in this log was 192.0.2.1 (the last one to be shut down cleanly).
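The line "Found saved state: 0fc257b6-1a31-11e9-b1dc-87b5e2e64482:3423534, safe_to_bootstrap: 1" comes from the node's saved state file, grastate.dat, presumably in /var/db/mysql/ given the base_dir shown in the log. When every node is down there is no primary component left to join. The usual approach is to start exactly one node with --wsrep-new-cluster: the one marked safe_to_bootstrap: 1 after clean shutdowns, otherwise the one with the highest seqno. The Python sketch below picks that node from the grastate.dat contents collected from each server. The helper names are hypothetical. A seqno of -1 is treated as "position unknown", which in practice means recovering it first (for example with mysqld --wsrep-recover).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedState:
    node: str
    uuid: str
    seqno: int
    safe_to_bootstrap: bool


def parse_grastate(node: str, text: str) -> SavedState:
    """Parse the 'key: value' lines of a grastate.dat file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return SavedState(
        node=node,
        uuid=fields.get("uuid", ""),
        seqno=int(fields.get("seqno", "-1")),
        safe_to_bootstrap=fields.get("safe_to_bootstrap", "0") == "1",
    )


def pick_bootstrap_node(states: list[SavedState]) -> Optional[str]:
    """Return the node to start with --wsrep-new-cluster, or None if unclear."""
    # Nodes with different cluster UUIDs do not share a history.
    if len({s.uuid for s in states}) != 1:
        return None
    # Prefer the single node Galera marked as safe (last to leave cleanly).
    safe = [s for s in states if s.safe_to_bootstrap]
    if len(safe) == 1:
        return safe[0].node
    # Otherwise use the highest seqno, but only if every position is known.
    if any(s.seqno < 0 for s in states):
        return None
    return max(states, key=lambda s: s.seqno).node
```

With the state from the log above on 192.0.2.1 and a lower seqno on the other nodes, the sketch picks 192.0.2.1. The remaining nodes would then be started normally, without --wsrep-new-cluster.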

To get my cluster working again, I then have to start this MariaDB server with --wsrep-new-cluster, and then re-create all the other nodes as if they were brand-new servers.

Have I missed something? Is this the normal way a Galera cluster works?
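Re-creating the other nodes from scratch is not normally required. When a node rejoins, a donor can send it an incremental state transfer (IST) from its gcache instead of a full SST, as long as the gcache still holds every write-set the joiner is missing. The log above shows what this node's gcache held after recovery ("found gapless sequence 3206965-3423534"). The Python check below is a simplified model of that condition. It assumes a single donor and ignores donor selection and gcache rotation during the join. The function name `ist_possible` is hypothetical.

```python
def ist_possible(joiner_seqno: int, gcache_first: int, gcache_last: int) -> bool:
    """Simplified check: can a donor whose gcache holds write-sets
    gcache_first..gcache_last serve an IST to a joiner at joiner_seqno?"""
    if joiner_seqno < 0:
        # Unknown position (e.g. after a crash): this sketch assumes an SST.
        return False
    # The joiner needs every write-set after its own seqno, so the first one
    # it needs (joiner_seqno + 1) must still be in the donor's gcache.
    return gcache_first <= joiner_seqno + 1 and joiner_seqno <= gcache_last
```

For example, a node that stopped at seqno 3400000 falls inside the 3206965-3423534 range and could catch up by IST. A node stopped at 3000000 would need an SST, which is where a larger gcache.size helps.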

Kind regards,
Xavier

Jonathan Leroy - Inikup

Feb 8, 2019, 4:40:04 AM

to Xavier BEAUDOUIN, codership

On Fri, Feb 8, 2019 at 10:15, Xavier BEAUDOUIN <xbeau...@hotcity.lu> wrote:

> Is this normal way of working a galera server ?

Yep.

Jonathan Leroy
http://www.inikup.com/
Tel: +33 (0)9 74 77 41 72
