Not able to make second node join existing galera cluster

2,244 views
Skip to first unread message

trupti mali

unread,
Sep 3, 2015, 1:48:34 AM9/3/15
to codership

Hi,I already had a galera cluster setup of two nodes.(No issue of split brain was faced. Both the nodes were replicatiing data nicely. Then I took off the secondary node. And tried to add it back. While starting the second ndoe's mysql service, I am facing below issue. Highlighted in red...


Sep  2 13:49:59 localhost mysqld:

Sep  2 13:49:59 localhost mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

Sep  2 14:04:01 localhost systemd[1]: Starting LSB: Start and stop the mysql database server daemon...

Sep  2 14:04:02 localhost mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql

Sep  2 14:04:02 localhost mysqld_safe: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.SZVcjJ' --pid-file='/var/lib/mysql/localhost-recover.pid'

Sep  2 14:04:02 localhost mysqld: 150902 14:04:02 [Note] /usr/sbin/mysqld (mysqld 10.0.21-MariaDB-1~wheezy-wsrep-log) starting as process 9865 ...

Sep  2 14:04:04 localhost mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] /usr/sbin/mysqld (mysqld 10.0.21-MariaDB-1~wheezy-wsrep-log) starting as process 9906 ...

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Read nil XID from storage engines, skipping position init

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <in...@codership.com> loaded successfully.

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: CRC-32C: using hardware acceleration.

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Passing config to GCS: base_host = 192.168.198.146; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; p

Sep  2 14:04:04 localhost mysqld: c.npvo = false; pc.reco

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Service thread queue flushed.

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: wsrep_sst_grab()

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Start replication

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: protonet asio version 0

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: Using CRC-32C for message checksums.

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: backend: asio

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: restore pc from disk failed

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: GMCast version 0

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: (7869632d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: (7869632d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: EVS version 0

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: gcomm: connecting to group 'london_galera_cluster', peer '192.168.199.182:,192.168.198.146:'

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Warning] WSREP: (7869632d, 'tcp://0.0.0.0:4567') address 'tcp://192.168.198.146:4567' points to own listening address, blacklisting

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: (7869632d, 'tcp://0.0.0.0:4567') address 'tcp://192.168.198.146:4567' pointing to uuid 7869632d is blacklisted, skipping

Sep  2 14:04:04 localhost mysqld: 150902 14:04:04 [Note] WSREP: (7869632d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: declaring 3915befb at tcp://192.168.199.182:4567 stable

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Node 3915befb state prim

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: view(view_id(PRIM,3915befb,20) memb {

Sep  2 14:04:05 localhost mysqld: #0113915befb,0

Sep  2 14:04:05 localhost mysqld: #0117869632d,0

Sep  2 14:04:05 localhost mysqld: } joined {

Sep  2 14:04:05 localhost mysqld: } left {

Sep  2 14:04:05 localhost mysqld: } partitioned {

Sep  2 14:04:05 localhost mysqld: })

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: save pc into disk

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: gcomm: connected

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Opened channel 'london_galera_cluster'

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Waiting for SST to complete.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: STATE EXCHANGE: sent state msg: 78c26f22-517b-11e5-8730-d7bb507c047a

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: STATE EXCHANGE: got state msg: 78c26f22-517b-11e5-8730-d7bb507c047a from 0 (192.168.199.182)

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: STATE EXCHANGE: got state msg: 78c26f22-517b-11e5-8730-d7bb507c047a from 1 (192.168.198.146)

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Quorum results:

Sep  2 14:04:05 localhost mysqld: #011version    = 3,

Sep  2 14:04:05 localhost mysqld: #011component  = PRIMARY,

Sep  2 14:04:05 localhost mysqld: #011conf_id    = 19,

Sep  2 14:04:05 localhost mysqld: #011members    = 1/2 (joined/total),

Sep  2 14:04:05 localhost mysqld: #011act_id     = 1757,

Sep  2 14:04:05 localhost mysqld: #011last_appl. = -1,

Sep  2 14:04:05 localhost mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),

Sep  2 14:04:05 localhost mysqld: #011group UUID = 0c771f10-4fb5-11e5-a418-164f3b890a85

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Flow-control interval: [23, 23]

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1757)

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: State transfer required:

Sep  2 14:04:05 localhost mysqld: #011Group state: 0c771f10-4fb5-11e5-a418-164f3b890a85:1757

Sep  2 14:04:05 localhost mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: New cluster view: global state: 0c771f10-4fb5-11e5-a418-164f3b890a85:1757, view# 20: Primary, number of nodes: 2, my index: 1, protocol version 3

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Warning] WSREP: Gap in state sequence. Need state transfer.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.198.146' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '9906' --binlog '/var/log/mysql/mariadb-bin' '

Sep  2 14:04:05 localhost rsyncd[9956]: rsyncd version 3.1.1 starting, listening on port 4444

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Prepared SST request: rsync|192.168.XXX.XXX:4444/rsync_sst

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: REPL Protocols: 7 (3, 2)

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Service thread queue flushed.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Assign initial position for certification: 1757, protocol version: 3

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Service thread queue flushed.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (0c771f10-4fb5-11e5-a418-164f3b890a85): 1 (Operation not permitted)

Sep  2 14:04:05 localhost mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Member 1.0 (192.168.XXX.XXX) requested state transfer from '*any*'. Selected 0.0 (192.168.YYY.YYY)(SYNCED) as donor.

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1757)

Sep  2 14:04:05 localhost mysqld: 150902 14:04:05 [Note] WSREP: Requesting state transfer: success, donor: 0

Sep  2 14:04:06 localhost mysqld: 150902 14:04:06 [Warning] WSREP: 0.0 (192.168.YYY.YYY): State transfer to 1.0 (192.168.XXX.XXX) failed: -1 (Operation not permitted)

Sep  2 14:04:06 localhost mysqld: 150902 14:04:06 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.

Sep  2 14:04:06 localhost mysqld: 150902 14:04:06 [Note] WSREP: gcomm: terminating thread

Sep  2 14:04:06 localhost mysqld: 150902 14:04:06 [Note] WSREP: gcomm: joining thread

Sep  2 14:04:06 localhost mysqld: 150902 14:04:06 [Note] WSREP: gcomm: closing backend

Sep  2 14:04:07 localhost mysqld: 150902 14:04:07 [Note] WSREP: view(view_id(NON_PRIM,3915befb,20) memb {

Sep  2 14:04:07 localhost mysqld: #0117869632d,0

Sep  2 14:04:07 localhost mysqld: } joined {

Sep  2 14:04:07 localhost mysqld: } left {

Sep  2 14:04:07 localhost mysqld: } partitioned {

Sep  2 14:04:07 localhost mysqld: #0113915befb,0

Sep  2 14:04:07 localhost mysqld: })

Sep  2 14:04:07 localhost mysqld: 150902 14:04:07 [Note] WSREP: view((empty))

Sep  2 14:04:07 localhost mysqld: 150902 14:04:07 [Note] WSREP: gcomm: closed

Sep  2 14:04:07 localhost mysqld: 150902 14:04:07 [Note] WSREP: /usr/sbin/mysqld: Terminated.

Sep  2 14:04:08 localhost mysqld: WSREP_SST: [ERROR] Parent mysqld process (PID:9906) terminated unexpectedly. (20150902 14:04:08.010)

Sep  2 14:04:08 localhost mysqld: WSREP_SST: [INFO] Joiner cleanup. (20150902 14:04:08.017)

Sep  2 14:04:08 localhost rsyncd[9956]: sent 0 bytes  received 0 bytes  total size 0

Sep  2 14:04:08 localhost mysqld: WSREP_SST: [INFO] Joiner cleanup done. (20150902 14:04:08.537)

Sep  2 14:04:08 localhost mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

Sep  2 14:04:34 localhost /etc/init.d/mysql[10260]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in

Sep  2 14:04:34 localhost /etc/init.d/mysql[10260]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed

Sep  2 14:04:34 localhost /etc/init.d/mysql[10260]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")'

Sep  2 14:04:34 localhost /etc/init.d/mysql[10260]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

Sep  2 14:04:34 localhost /etc/init.d/mysql[10260]:

Sep  2 14:04:34 localhost mysql[9585]: Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!

Sep  2 14:04:34 localhost systemd[1]: mysql.service: control process exited, code=exited status=1

Sep  2 14:04:34 localhost systemd[1]: Failed to start LSB: Start and stop the mysql database server daemon.

Sep  2 14:04:34 localhost systemd[1]: Unit mysql.service entered failed state.

Philip Stoev

unread,
Sep 3, 2015, 2:27:09 AM9/3/15
to trupti mali, codersh...@googlegroups.com
Hello,

Please check the error log on the other node. It should have a corresponding
error message as to why it was unable to provide a snapshot to the new node.

Philip Stoev
--
You received this message because you are subscribed to the Google Groups
"codership" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to codership-tea...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

trupti mali

unread,
Sep 3, 2015, 3:14:25 AM9/3/15
to codership, trupt...@gmail.com
Hi Thanks for the response Philip.
Here is the o/p of log from the primary node



Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: (3915befb, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: declaring 8d3be05a at tcp://192.168.<Node2 IP>:4567 stable

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Node 3915befb state prim

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: view(view_id(PRIM,3915befb,24) memb {

Sep  3 07:08:18 localhost mysqld: #0113915befb,0

Sep  3 07:08:18 localhost mysqld: #0118d3be05a,0

Sep  3 07:08:18 localhost mysqld: } joined {

Sep  3 07:08:18 localhost mysqld: } left {

Sep  3 07:08:18 localhost mysqld: } partitioned {

Sep  3 07:08:18 localhost mysqld: })

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: save pc into disk

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8d8bf177-520a-11e5-bc55-777c43ddc81a

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: sent state msg: 8d8bf177-520a-11e5-bc55-777c43ddc81a

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: got state msg: 8d8bf177-520a-11e5-bc55-777c43ddc81a from 0 (192.168.199.182)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: got state msg: 8d8bf177-520a-11e5-bc55-777c43ddc81a from 1 (192.168.198.146)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Quorum results:

Sep  3 07:08:18 localhost mysqld: #011version    = 3,

Sep  3 07:08:18 localhost mysqld: #011component  = PRIMARY,

Sep  3 07:08:18 localhost mysqld: #011conf_id    = 23,

Sep  3 07:08:18 localhost mysqld: #011members    = 1/2 (joined/total),

Sep  3 07:08:18 localhost mysqld: #011act_id     = 1866,

Sep  3 07:08:18 localhost mysqld: #011last_appl. = 1857,

Sep  3 07:08:18 localhost mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),

Sep  3 07:08:18 localhost mysqld: #011group UUID = 0c771f10-4fb5-11e5-a418-164f3b890a85

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Flow-control interval: [23, 23]

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: New cluster view: global state: 0c771f10-4fb5-11e5-a418-164f3b890a85:1866, view# 24: Primary, number of nodes: 2, my index: 0, protocol version 3

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: REPL Protocols: 7 (3, 2)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Assign initial position for certification: 1866, protocol version: 3

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Member 1.0 (192.168.<node2 ip>) requested state transfer from '*any*'. Selected 0.0 (192.168.<Node1 ip>)(SYNCED) as donor.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1866)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.198.146:4444/rsync_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog 'binlog' --gtid '0c771f10-4fb5-11e5-a418-164f3b890a85:1866''

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: sst_donor_thread signaled with 0

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Flushing tables for SST...

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Provider paused at 0c771f10-4fb5-11e5-a418-164f3b890a85:1866 (1199)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Tables flushed.

Sep  3 07:08:19 localhost mysqld: tail: cannot open ‘binlog.index’ for reading: No such file or directory

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '192.168.<node2 ip>:4444/rsync_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog 'binlog' --gtid '0c771f10-4fb5-11e5-a418-164f3b890a85:1866'

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address '192.168.<node 2 ip>:4444/rsync_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog 'binlog' --gtid '0c771f10-4fb5-11e5-a418-164f3b890a85:1866': 1 (Operation not permitted)

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Note] WSREP: resuming provider at 1199

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Note] WSREP: Provider resumed.

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [ERROR] WSREP: Command did not run: wsrep_sst_rsync --role 'donor' --address '192.168.198.146:4444/rsync_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog 'binlog' --gtid '0c771f10-4fb5-11e5-a418-164f3b890a85:1866'

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Warning] WSREP: 0.0 (192.168.199.182): State transfer to 1.0 (192.168.<node 2 ip>) failed: -1 (Operation not permitted)

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 1866)

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Member 0.0 (192.168.199.182) synced with group.

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 1866)

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Synchronized with group, ready for connections

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: forgetting 8d3be05a (tcp://192.168.198.146:4567)

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Node 3915befb state prim

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: view(view_id(PRIM,3915befb,25) memb {

Sep  3 07:08:20 localhost mysqld: #0113915befb,0

Sep  3 07:08:20 localhost mysqld: } joined {

Sep  3 07:08:20 localhost mysqld: } left {

Sep  3 07:08:20 localhost mysqld: } partitioned {

Sep  3 07:08:20 localhost mysqld: #0118d3be05a,0

Sep  3 07:08:20 localhost mysqld: })

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: save pc into disk

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: forgetting 8d3be05a (tcp://192.168.198.146:4567)

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8eb372fa-520a-11e5-970f-2efc43c5fe36

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: STATE EXCHANGE: sent state msg: 8eb372fa-520a-11e5-970f-2efc43c5fe36

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: STATE EXCHANGE: got state msg: 8eb372fa-520a-11e5-970f-2efc43c5fe36 from 0 (192.168.199.182)

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Quorum results:

Sep  3 07:08:20 localhost mysqld: #011version    = 3,

Sep  3 07:08:20 localhost mysqld: #011component  = PRIMARY,

Sep  3 07:08:20 localhost mysqld: #011conf_id    = 24,

Sep  3 07:08:20 localhost mysqld: #011members    = 1/1 (joined/total),

Sep  3 07:08:20 localhost mysqld: #011act_id     = 1866,

Sep  3 07:08:20 localhost mysqld: #011last_appl. = 1857,

Sep  3 07:08:20 localhost mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),

Sep  3 07:08:20 localhost mysqld: #011group UUID = 0c771f10-4fb5-11e5-a418-164f3b890a85

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Flow-control interval: [16, 16]

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: New cluster view: global state: 0c771f10-4fb5-11e5-a418-164f3b890a85:1866, view# 25: Primary, number of nodes: 1, my index: 0, protocol version 3

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: REPL Protocols: 7 (3, 2)

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Assign initial position for certification: 1866, protocol version: 3

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: (3915befb, 'tcp://0.0.0.0:4567') turning message relay requesting off

Philip Stoev

unread,
Sep 3, 2015, 3:42:19 AM9/3/15
to trupti mali, codership, trupt...@gmail.com
Hello,

The logs you provided are from different dates, so they can not be matched
together . Please look at the log entries on one node that were generated at
the same time as the second node refused to start.

trupti mali

unread,
Sep 3, 2015, 4:19:46 AM9/3/15
to codership, trupt...@gmail.com
Ah i see you point. But the logs are same irrespective of date time. I have been facing this issue since yday.

 

Sep  3 07:06:50 localhost mysql[9811]: Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!

Sep  3 07:06:50 localhost systemd[1]: mysql.service: control process exited, code=exited status=1

Sep  3 07:06:50 localhost systemd[1]: Failed to start LSB: Start and stop the mysql database server daemon.

Sep  3 07:06:50 localhost systemd[1]: Unit mysql.service entered failed state.

Sep  3 07:08:14 localhost systemd[1]: Starting LSB: Start and stop the mysql database server daemon...

Sep  3 07:08:15 localhost mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql

Sep  3 07:08:15 localhost mysqld_safe: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.3u9bQL' --pid-file='/var/lib/mysql/localhost-recover.pid'

Sep  3 07:08:15 localhost mysqld: 150903  7:08:15 [Note] /usr/sbin/mysqld (mysqld 10.0.21-MariaDB-1~wheezy-wsrep-log) starting as process 10787 ...

Sep  3 07:08:17 localhost mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] /usr/sbin/mysqld (mysqld 10.0.21-MariaDB-1~wheezy-wsrep-log) starting as process 10828 ...

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Read nil XID from storage engines, skipping position init

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <in...@codership.com> loaded successfully.

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: CRC-32C: using hardware acceleration.

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Passing config to GCS: base_host = 192.168.198.146; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; p

Sep  3 07:08:17 localhost mysqld: c.npvo = false; pc.reco

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: wsrep_sst_grab()

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Start replication

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: protonet asio version 0

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: Using CRC-32C for message checksums.

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: backend: asio

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: restore pc from disk failed

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: GMCast version 0

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: (8d3be05a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: (8d3be05a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: EVS version 0

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: gcomm: connecting to group 'london_galera_cluster', peer '192.168.199.182:,192.168.198.146:'

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Warning] WSREP: (8d3be05a, 'tcp://0.0.0.0:4567') address 'tcp://192.168.198.146:4567' points to own listening address, blacklisting

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: (8d3be05a, 'tcp://0.0.0.0:4567') address 'tcp://192.168.198.146:4567' pointing to uuid 8d3be05a is blacklisted, skipping

Sep  3 07:08:17 localhost mysqld: 150903  7:08:17 [Note] WSREP: (8d3be05a, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: declaring 3915befb at tcp://192.168.199.182:4567 stable

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Node 3915befb state prim

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: view(view_id(PRIM,3915befb,24) memb {

Sep  3 07:08:18 localhost mysqld: #0113915befb,0

Sep  3 07:08:18 localhost mysqld: #0118d3be05a,0

Sep  3 07:08:18 localhost mysqld: } joined {

Sep  3 07:08:18 localhost mysqld: } left {

Sep  3 07:08:18 localhost mysqld: } partitioned {

Sep  3 07:08:18 localhost mysqld: })

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: save pc into disk

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: gcomm: connected

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Opened channel 'london_galera_cluster'

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Waiting for SST to complete.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: sent state msg: 8d8bf177-520a-11e5-bc55-777c43ddc81a

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: got state msg: 8d8bf177-520a-11e5-bc55-777c43ddc81a from 0 (192.168.199.182)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: STATE EXCHANGE: got state msg: 8d8bf177-520a-11e5-bc55-777c43ddc81a from 1 (192.168.198.146)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Quorum results:

Sep  3 07:08:18 localhost mysqld: #011version    = 3,

Sep  3 07:08:18 localhost mysqld: #011component  = PRIMARY,

Sep  3 07:08:18 localhost mysqld: #011conf_id    = 23,

Sep  3 07:08:18 localhost mysqld: #011members    = 1/2 (joined/total),

Sep  3 07:08:18 localhost mysqld: #011act_id     = 1866,

Sep  3 07:08:18 localhost mysqld: #011last_appl. = -1,

Sep  3 07:08:18 localhost mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),

Sep  3 07:08:18 localhost mysqld: #011group UUID = 0c771f10-4fb5-11e5-a418-164f3b890a85

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Flow-control interval: [23, 23]

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1866)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: State transfer required:

Sep  3 07:08:18 localhost mysqld: #011Group state: 0c771f10-4fb5-11e5-a418-164f3b890a85:1866

Sep  3 07:08:18 localhost mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: New cluster view: global state: 0c771f10-4fb5-11e5-a418-164f3b890a85:1866, view# 24: Primary, number of nodes: 2, my index: 1, protocol version 3

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Warning] WSREP: Gap in state sequence. Need state transfer.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.198.146' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '10828' --binlog '/var/log/mysql/mariadb-bin' '

Sep  3 07:08:18 localhost rsyncd[10878]: rsyncd version 3.1.1 starting, listening on port 4444

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Prepared SST request: rsync|192.168.198.146:4444/rsync_sst

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: REPL Protocols: 7 (3, 2)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Assign initial position for certification: 1866, protocol version: 3

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Service thread queue flushed.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (0c771f10-4fb5-11e5-a418-164f3b890a85): 1 (Operation not permitted)

Sep  3 07:08:18 localhost mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Member 1.0 (192.168.198.146) requested state transfer from '*any*'. Selected 0.0 (192.168.199.182)(SYNCED) as donor.

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1866)

Sep  3 07:08:18 localhost mysqld: 150903  7:08:18 [Note] WSREP: Requesting state transfer: success, donor: 0

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Warning] WSREP: 0.0 (192.168.199.182): State transfer to 1.0 (192.168.198.146) failed: -1 (Operation not permitted)

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Note] WSREP: gcomm: terminating thread

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Note] WSREP: gcomm: joining thread

Sep  3 07:08:19 localhost mysqld: 150903  7:08:19 [Note] WSREP: gcomm: closing backend

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: view(view_id(NON_PRIM,3915befb,24) memb {

Sep  3 07:08:20 localhost mysqld: #0118d3be05a,0

Sep  3 07:08:20 localhost mysqld: } joined {

Sep  3 07:08:20 localhost mysqld: } left {

Sep  3 07:08:20 localhost mysqld: } partitioned {

Sep  3 07:08:20 localhost mysqld: #0113915befb,0

Sep  3 07:08:20 localhost mysqld: })

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: view((empty))

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: gcomm: closed

Sep  3 07:08:20 localhost mysqld: 150903  7:08:20 [Note] WSREP: /usr/sbin/mysqld: Terminated.

Sep  3 07:08:20 localhost mysqld: WSREP_SST: [ERROR] Parent mysqld process (PID:10828) terminated unexpectedly. (20150903 07:08:20.982)

Sep  3 07:08:20 localhost mysqld: WSREP_SST: [INFO] Joiner cleanup. (20150903 07:08:20.988)

Sep  3 07:08:21 localhost rsyncd[10878]: sent 0 bytes  received 0 bytes  total size 0

Sep  3 07:08:21 localhost mysqld: WSREP_SST: [INFO] Joiner cleanup done. (20150903 07:08:21.507)

Sep  3 07:08:21 localhost mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

Sep  3 07:08:47 localhost /etc/init.d/mysql[11180]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in

Sep  3 07:08:47 localhost /etc/init.d/mysql[11180]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed

Sep  3 07:08:47 localhost /etc/init.d/mysql[11180]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")'

Sep  3 07:08:47 localhost /etc/init.d/mysql[11180]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!

Sep  3 07:08:47 localhost /etc/init.d/mysql[11180]:

Sep  3 07:08:47 localhost mysql[10507]: Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!

Sep  3 07:08:47 localhost systemd[1]: mysql.service: control process exited, code=exited status=1

Sep  3 07:08:47 localhost systemd[1]: Failed to start LSB: Start and stop the mysql database server daemon.

Sep  3 07:08:47 localhost systemd[1]: Unit mysql.service entered failed state.

Sep  3 07:30:01 localhost systemd[1]: Starting Run anacron jobs...

Sep  3 07:30:01 localhost systemd[1]: Started Run anacron jobs. 

trupti mali

unread,
Sep 3, 2015, 4:40:56 AM9/3/15
to codership, trupt...@gmail.com
Hi Philip ,
so to make it easier to read ,  I have attached the logs of the same time from two nodes. PFA the attachments  Node1 and Node2 ones.
 -those contain logs of Sept 3 (time from 7:08:17 to 7:08:20) on both the nodes.

Appreciate your responses.

Thx
Trupti
Node2.txt
PrimaryNode1.txt

Philip Stoev

unread,
Sep 3, 2015, 5:36:17 AM9/3/15
to trupti mali, codership, trupt...@gmail.com
Hello,

I found the relevant log line:

Sep 3 07:08:19 localhost mysqld: tail: cannot open ‘binlog.index’ for
reading: No such file or directory

This has been seen happening with servers were the binary log is enabled via
the log-bin option in my.cnf? If you do not intend to use the Galera cluster
with traditional MySQL async replication, can you remove those entries from
my.cnf on both nodes and try again?

Philip Stoev

-----Original Message-----
From: trupti mali
Sent: Thursday, September 03, 2015 11:40
To: codership
Cc: trupt...@gmail.com
Subject: Re: [codership-team] Not able to make second node join existing
galera cluster


trupti mali

unread,
Sep 3, 2015, 7:07:21 AM9/3/15
to codership, trupt...@gmail.com
Ah ok...is this the real issue? But ours is a hybrid replicaiton where two galera clusters talk asynchronously over WAN. Hence we have this set in my.cnf

log_bin                 = /var/log/mysql/mariadb-bin

log_bin_index           = /var/log/mysql/mariadb-bin.index.

Philip Stoev

unread,
Sep 3, 2015, 7:21:02 AM9/3/15
to trupti mali, codership, trupt...@gmail.com
Hello,

I think the problem comes from the presence of the log_bin_index setting. My
experiments show that if you remove it (and let the server pick the name for
the index file) then the node will start up properly.

If you remove the setting, then mysql will begin using the default name of
/var/log/mysql/mariadb-bin.index (without a trailing dot), which should work
OK with Galera. You may need to move the index file manually while the
server is not running, so that mysql can find it under its new name.

Alternatively, you can configure Galera to use xtrabackup rather than rsync
for performing snapshot transfers between servers. The xtrabackup method
does not seem to be affected by this issue.

trupti mali

unread,
Sep 3, 2015, 8:06:18 AM9/3/15
to codership, trupt...@gmail.com
Hello Philip,
Well that trailing dot was accidentally put in the post. In real there is not such dot. 
So I just tried commenting the index line in my.cnf of primary and secondary nodes and tried restarting the server.
I got the same error - (but with some weird notations around binlog.index.

Sep  3 11:53:15 localhost mysqld: tail: cannot open ?~@~Xbinlog.index?~@~Y for reading: No such file or directory



I also checked existence of this file in the dir - its as below -

root@localhost:/etc/mysql# ls -lrt /var/log/mysql

total 5132

-rw-rw---- 1 mysql adm     333 Aug 31 07:50 mariadb-bin.000001

-rw-rw---- 1 mysql adm     977 Aug 31 07:51 mariadb-bin.000002

-rw-rw---- 1 mysql adm     347 Aug 31 07:51 mariadb-bin.000003

-rw-rw---- 1 mysql adm     373 Aug 31 07:51 mariadb-bin.000004

-rw-rw---- 1 mysql adm     569 Aug 31 08:00 mariadb-bin.000005

-rw-rw---- 1 mysql adm     347 Aug 31 08:00 mariadb-bin.000006

-rw-rw---- 1 mysql adm     347 Aug 31 08:04 mariadb-bin.000007

-rw-rw---- 1 mysql adm     347 Aug 31 08:04 mariadb-bin.000008

-rw-rw---- 1 mysql adm       8 Aug 31 09:39 mariadb-bin.state

-rw-rw---- 1 mysql adm 5210396 Aug 31 09:39 mariadb-bin.000009

-rw-rw---- 1 mysql adm     418 Sep  3 11:53 mariadb-bin.index


are the rights given on these files right? Why would by default its looking for binlog.index. Does that mean the index file name should be binlog.index?


Thanks

Trupti

trupti mali

unread,
Sep 3, 2015, 10:14:55 AM9/3/15
to codership, trupt...@gmail.com
Another important discovery -
When I ran following command on primary node I saw two sets of binary logs...this could be a concern?


MariaDB [(none)]> SHOW BINARY LOGS;

+--------------------+-----------+

| Log_name           | File_size |

+--------------------+-----------+

| mariadb-bin.000001 |       333 |

| mariadb-bin.000002 |       977 |

| mariadb-bin.000003 |       347 |

| mariadb-bin.000004 |       373 |

| mariadb-bin.000005 |       569 |

| mariadb-bin.000006 |       347 |

| mariadb-bin.000007 |       347 |

| mariadb-bin.000008 |       347 |

| mariadb-bin.000009 |   5210396 |

| binlog.000001      |       342 |

| binlog.000002      | 106359000 |

| binlog.000003      |  70507772 |

| binlog.000004      |  47815394 |

| binlog.000005      |   3299462 |

| binlog.000006      |  19759499 |

| binlog.000007      |       415 |

| binlog.000008      |   1646061 |

| binlog.000009      |  31261549 |

+--------------------+-----------+


Philip Stoev

unread,
Sep 4, 2015, 1:28:21 AM9/4/15
to trupti mali, codership, trupt...@gmail.com
Hello,

For some reason the name of the binary logs has changed to "binlog" but the
index file is named something else. I do not know why that would be -- you
may wish to ask a MariaDB mailing list as this seems to be a async
replication configuration issue unrelated to Galera.

In order to side-step the issue with Galera, I think the easiest way would
be to use xtrabackup as an SST method.

Thank you.

Philip Stoev

-----Original Message-----
From: trupti mali
Sent: Thursday, September 03, 2015 17:14
To: codership
Cc: trupt...@gmail.com
Subject: Re: [codership-team] Not able to make second node join existing
galera cluster


trupti mali

unread,
Sep 10, 2015, 8:36:11 AM9/10/15
to codership, trupt...@gmail.com

Hi Philip,
Sorry for the delayed response. I was caught up with some other task. Only to get back to it today. So I tried to using SST method = xtrabackup-v2. Since I tried this on a fresh setup , I could get this working perfectly. 

So my architecture is a hybrid one.

I have 1 node in Location1 (Node1_loc1)
and 2 nodes in  Location2 (Node1_loc2 , Node2_loc2)

these later two nodes are connected by galera cluster over LAN. Wherein I used SST=Xtrabackup
Node1_loc1 is replicating to Node1_loc2 using Async Replicaiton.
Once I dumped te data onto Node1_loc2 and I could see that got replicated over the LAN and WAN nicely.
Then I tried adding a DB in Node1_loc1 , I could see it getting replicated in Node1_loc2. But When I tried connecting to Node2_loc2, I realized that I am no longer able to login using root credentials to Node2_loc2. Getting error [Warning] Access denied for user 'root'@'localhost' (using password: YES).

I have attached the logs of all the three nodes herewith.
Node1_Location2.txt
Node1-Location1.txt
Node2-Location2.txt

Philip Stoev

unread,
Sep 10, 2015, 8:46:48 AM9/10/15
to trupti mali, codership, trupt...@gmail.com
Hello,

Galera has synchronized Node2_loc2 with Node1_loc2 and this includes the
mysql database and the authentication information in the mysql.user table.

To log into Node2_loc2, you need to use the login credentials that you use
for Node1_loc2

trupti mali

unread,
Sep 10, 2015, 8:52:29 AM9/10/15
to codership, trupt...@gmail.com
Right , but my root pwd on both the nodes are same.  Infact its same on all the three nodes involved in the setup.

In the beginning when I dumped DB onto Node1_loc2 , I could see data replicated to Node2_loc2 nicely using the same root pwd.

It was when I made a DB update in Node1_loc1 which is asyncrhnously connected to the galera cluster, I faced this issue. Very strange.

But I faced the same issue when I started this thread few days back. While diagnosing this issue, I tried stoping Node2_loc2 (non primary node from galera cluster) and then it was not able to join the cluster again.

Philip Stoev

unread,
Sep 10, 2015, 9:32:02 AM9/10/15
to trupti mali, codership, trupt...@gmail.com
Hello,

The only thing I can think of is that your mysql root account may be
restricted by hostname and does no longer work for that reason. You can try
adding skip-grant-tables to the [mysqld] section of your my.cnf. If you then
restart the server you will be able to log in without a password and can
then examine the contents of the mysql database and/or recreate any accounts
that are problematic.

It appears to me that this is not an issue due to the Galera Replication
library, but rather a MariaDB or package installation issue you may wish to
email their mailing list about.

Philip Stoev

-----Original Message-----
From: trupti mali
Sent: Thursday, September 10, 2015 15:52
To: codership
Cc: trupt...@gmail.com
Subject: Re: [codership-team] Not able to make second node join existing
galera cluster


Reply all
Reply to author
Forward
0 new messages