Hi,
I have a question about restarting the cluster after all the nodes have been shut down gracefully.
I might have missed a config setting or made a mistake somewhere. I am hoping that someone can help me out.
Really appreciate the help/comments.
Thanks,
Chandra
Sorry for the long message; I wanted to provide as much info as possible....
Issue: I am NOT able to restart the MariaDB cluster after both nodes have been shut down gracefully.
Also, I see that the gvwstate.dat file gets deleted once I stop the service (on both nodes).
(I am using 2 nodes and trying to build a master-slave kind of setup.)
The provider options set : pc.bootstrap=YES;pc.recovery=TRUE;pc.wait_prim=FALSE;gcache.recover=YES;pc.ignore_sb=TRUE
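To make the option string above easier to review, a minimal sketch like the following prints one option per line. The function name split_provider_options is made up for illustration; it only reformats the text and does not validate anything against Galera.

```shell
#!/bin/sh
# Hypothetical helper: print each Galera provider option on its own line.
# It only reformats the string; it does not validate option names or values.
split_provider_options() {
    # Options are separated by ';'. Turn each ';' into a newline,
    # trim surrounding spaces, and drop empty lines.
    printf '%s\n' "$1" | tr ';' '\n' | sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d'
}

# The option string is the one quoted in this post.
split_provider_options "pc.bootstrap=YES;pc.recovery=TRUE;pc.wait_prim=FALSE;gcache.recover=YES;pc.ignore_sb=TRUE"
```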
Steps followed (in this order):
Node 1: galera_new_cluster
Node 2: systemctl start mariadb
The cluster is up and running, and I can see cluster_size as 2 etc. (other info is in the attached file)
Node 2: systemctl stop mariadb
Node 1: systemctl stop mariadb
At this point the gvwstate.dat file is not present on either system.
Node 1: systemctl start mariadb
This fails with the following error:
Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.
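The file check done by hand after stopping both nodes could be scripted roughly as below. This is only a sketch: report_state_files is a hypothetical name, /var/lib/mysql is the base_dir shown in the log, and grastate.dat is not named in this post (it is assumed here to be the file behind the "Found saved state" log line).

```shell
#!/bin/sh
# Hypothetical helper: report which Galera state files exist in a data directory.
# grastate.dat is an assumption (not named in the post); gvwstate.dat is the
# file that goes missing after a graceful stop.
report_state_files() {
    datadir=$1
    for f in grastate.dat gvwstate.dat; do
        if [ -f "$datadir/$f" ]; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
}

# /var/lib/mysql is the data directory that appears in the log (base_dir).
report_state_files /var/lib/mysql
```

Running it on each node right after "systemctl stop mariadb" gives a quick side-by-side view of what is left on disk.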
A few excerpts from mysql.log:
2017-07-21 13:14:43 139707470456960 [Note] WSREP: Setting wsrep_ready to 0
2017-07-21 13:14:43 139707470456960 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-07-21 13:14:43 139707470456960 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2017-07-21 13:14:44 139707470456960 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <in...@codership.com> loaded successfully.
2017-07-21 13:14:44 139707470456960 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Found saved state: 7edab534-6e15-11e7-9600-9a33d350e94b:0, safe_to_bootsrap: 1
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Recovering GCache ring buffer: version: 1, UUID: 7edab534-6e15-11e7-9600-9a33d350e94b, offset: -1
2017-07-21 13:14:44 139707470456960 [Note] WSREP: GCache::RingBuffer initial scan (134217768 bytes)... 0.0% (0 bytes) complete.
2017-07-21 13:14:44 139707470456960 [Note] WSREP: GCache::RingBuffer initial scan (134217768 bytes)... 100.0% (134217768 bytes) complete.
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Recovering GCache ring buffer: didn't recover any events.
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.1.10.72; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0;
gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = YES; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.bootstrap = YES; pc.che
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2017-07-21 13:14:44 139707470456960 [Note] WSREP: wsrep_sst_grab()
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Start replication
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Setting initial position to 7edab534-6e15-11e7-9600-9a33d350e94b:0
2017-07-21 13:14:44 139707470456960 [Note] WSREP: protonet asio version 0
2017-07-21 13:14:44 139707470456960 [Note] WSREP: Using CRC-32C for message checksums.
2017-07-21 13:14:44 139707470456960 [Note] WSREP: initializing ssl context
2017-07-21 13:14:44 139707470456960 [Note] WSREP: backend: asio
2017-07-21 13:14:44 139707470456960 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-07-21 13:14:44 139707470456960 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2017-07-21 13:14:44 139707470456960 [Note] WSREP: restore pc from disk failed
2017-07-21 13:14:44 139707470456960 [Note] WSREP: GMCast version 0
2017-07-21 13:14:44 139707470456960 [Note] WSREP: (905415ad, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2017-07-21 13:14:44 139707470456960 [Note] WSREP: EVS version 0
2017-07-21 13:14:44 139707470456960 [Note] WSREP: gcomm: connecting to group 'galera', peer '10.1.10.72:,10.1.10.73:'
2017-07-21 13:14:44 139707470456960 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.1.10.72:41760 local endpoint ssl://10.1.10.72:4567 cipher: AES128-SHA compression:
2017-07-21 13:14:44 139707470456960 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.1.10.72:4567 local endpoint ssl://10.1.10.72:41760 cipher: AES128-SHA compression:
2017-07-21 13:14:44 139707470456960 [Note] WSREP: (905415ad, 'ssl://0.0.0.0:4567') connection established to 905415ad ssl://10.1.10.72:4567
2017-07-21 13:14:44 139707470456960 [Warning] WSREP: (905415ad, 'ssl://0.0.0.0:4567') address 'ssl://10.1.10.72:4567' points to own listening address, blacklisting
2017-07-21 13:14:47 139707470456960 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-07-21 13:14:47 139707470456960 [Note] WSREP: view(view_id(NON_PRIM,905415ad,1) memb {
905415ad,0
} joined {
} left {
} partitioned {
})
2017-07-21 13:14:47 139707470456960 [Note] WSREP: gcomm: connected
2017-07-21 13:14:47 139707470456960 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-07-21 13:14:47 139707470456960 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-07-21 13:14:47 139707470456960 [Note] WSREP: Opened channel 'galera'
2017-07-21 13:14:47 139707470456960 [Note] WSREP: Waiting for SST to complete.
2017-07-21 13:14:47 139707132389120 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-07-21 13:14:47 139707132389120 [Note] WSREP: Flow-control interval: [16, 16]
2017-07-21 13:14:47 139707132389120 [Note] WSREP: Received NON-PRIMARY.
2017-07-21 13:14:47 139707371542272 [Note] WSREP: New cluster view: global state: 7edab534-6e15-11e7-9600-9a33d350e94b:0, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
2017-07-21 13:14:47 139707371542272 [Note] WSREP: Setting wsrep_ready to 0
2017-07-21 13:14:48 139707140781824 [Note] WSREP: (905415ad, 'ssl://0.0.0.0:4567') connection to peer 905415ad with addr ssl://10.1.10.72:4567 timed out, no messages seen in PT3S
2017-07-21 13:14:48 139707140781824 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50597S), skipping check
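Since the excerpt above is long, a small filter like this one can pull out just the WSREP warnings. It is a sketch only: wsrep_warnings is a made-up name, and the two sample lines fed to it are copied from the log above.

```shell
#!/bin/sh
# Hypothetical helper: print only the WSREP warning lines from a log read on stdin.
wsrep_warnings() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F '[Warning] WSREP:'
}

# Sample input: two lines taken from the log excerpt in this post.
wsrep_warnings <<'EOF'
2017-07-21 13:14:44 139707470456960 [Note] WSREP: restore pc from disk failed
2017-07-21 13:14:47 139707470456960 [Warning] WSREP: no nodes coming from prim view, prim not possible
EOF
```

To run it against the real log, redirect the file into the function instead of the here-document.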
Node 1: some important variables/status values from the mysql client:
wsrep_provider_version | 25.3.20(r3703)
Variable_name: wsrep_provider_options
Value: base_dir = /var/lib/mysql/; base_host = 10.1.10.72; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT7.5S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = P1D; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0;
gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = YES; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = ssl://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.1.10.72; pc.announce_timeout = PT3S; pc.bootstrap = YES; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = true; pc.linger = PT20S; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = FALSE; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992; socket.ssl = YES; socket.ssl_ca = /opt/Certs/ca-cert.pem; socket.ssl_cert = /opt/Certs/server-cert.pem; socket.ssl_cipher = AES128-SHA; socket.ssl_compression = YES; socket.ssl_key = /opt/Certs/server-key.pem;
Env: 2 Linux CentOS systems (Linux vm33_mariadb_72 4.4.49-1.el7.elrepo.x86_64 #1 SMP Wed Feb 15 12:43:41 EST 2017 x86_64 x86_64 x86_64 GNU/Linux)
MariaDB packages:
MariaDB-compat-10.2.7-1.el7.centos.x86_64
MariaDB-client-10.2.7-1.el7.centos.x86_64
MariaDB-server-10.2.7-1.el7.centos.x86_64
MariaDB-common-10.2.7-1.el7.centos.x86_64
galera-25.3.20-1.rhel7.el7.centos.x86_64
socat-1.7.2.2-5.el7.x86_64
Configs: the important ones (the rest are in the attached file)
[galera]
#wsrep settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://10.1.10.72,10.1.10.73
wsrep_cluster_name=galera
wsrep_node_address=10.1.10.72
wsrep_node_name=vm33_mariadb_72
#wsrep_sst_method=xtrabackup-v2
wsrep_sst_method=rsync
wsrep_sst_auth=timsgalera:abc123
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
bind-address=10.1.10.72
query_cache_size=0
innodb_doublewrite=1
wsrep_provider_options="pc.bootstrap=YES;pc.recovery=TRUE;pc.wait_prim=FALSE;gcache.recover=YES;pc.ignore_sb=TRUE;socket.ssl_key=/opt/Certs/server-key.pem;socket.ssl_cert=/opt/Certs/server-cert.pem;socket.ssl_ca=/opt/Certs/ca-cert.pem;socket.ssl_cipher=AES128-SHA"
wsrep_log_conflicts=ON
wsrep_debug=ON
wsrep_notify_cmd=/opt/scripts/nodeStatusChange.sh