On 2012-10-04 14:50, Lee Henson wrote:
> Hi
>
> Our 3-node cluster has just broken down. It looks like one of the
> nodes
> dropped out of the cluster (not sure why) and when it tried to
> connect back
> up again it resulted in the output shown below. The important lines
> look
> (to me) like:
>
>
> mysqld: 121004 3:59:30 [Note] WSREP: You have configured 'rsync'
> state
> snapshot transfer method which cannot be performed on a running
> server.
> Wsrep provider won't be able to fall back to it if other means of
> state
> transfer are unavailable. In that case you will need to restart the
> server.
> .....
> mysqld: 121004 3:59:31 [Warning] WSREP: 0 (pwpdb1n3): State transfer
> to 1
> (pwpdb1n1) failed: -125 (Operation canceled)
> mysqld: 121004 3:59:31 [ERROR] WSREP:
> gcs/src/gcs_group.c:gcs_group_handle_join_msg():691: Will never
> receive
> state. Need to abort.
>
>
> What does code 125 mean?
That we will learn when we see the corresponding log from pwpdb1n3, who
failed to deliver and hence may know something special... ;)
> Full log output:
>
> mysqld: 121004 3:59:19 [Note] WSREP:
> (74650a7c-d6a9-11e1-0800-1c210e737947, 'tcp://
0.0.0.0:4567') turning
> message relay requesting on, nonlive peers: tcp://
10.183.171.211:4567
> tcp://
10.183.171.238:4567
> mysqld: 121004 3:59:20 [Note] WSREP:
> (74650a7c-d6a9-11e1-0800-1c210e737947, 'tcp://
0.0.0.0:4567')
> reconnecting
> to c0c08f1c-00a6-11e2-0800-39ac6663a85c (tcp://
10.183.171.211:4567),
> attempt 0
> mysqld: 121004 3:59:20 [Note] WSREP:
> (74650a7c-d6a9-11e1-0800-1c210e737947, 'tcp://
0.0.0.0:4567')
> reconnecting
> to 34392287-d6a3-11e1-0800-50c8d16e4128 (tcp://
10.183.171.238:4567),
> attempt 0
> mysqld: 121004 3:59:20 [Note] WSREP:
> evs::proto(74650a7c-d6a9-11e1-0800-1c210e737947, OPERATIONAL,
> view_id(REG,34392287-d6a3-11e1-0800-50c8d16e4128,22)) suspecting
> node:
> 34392287-d6a3-11e1-0800-50c8d16e4128
> mysqld: 121004 3:59:20 [Note] WSREP:
> evs::proto(74650a7c-d6a9-11e1-0800-1c210e737947, OPERATIONAL,
> view_id(REG,34392287-d6a3-11e1-0800-50c8d16e4128,22)) suspecting
> node:
> c0c08f1c-00a6-11e2-0800-39ac6663a85c
> mysqld: 121004 3:59:21 [Note] WSREP: GMCast::handle_stable_view:
> view(view_id(NON_PRIM,34392287-d6a3-11e1-0800-50c8d16e4128,22) memb {
> mysqld: #01174650a7c-d6a9-11e1-0800-1c210e737947,
> mysqld: } joined {
> mysqld: } left {
> mysqld: } partitioned {
> mysqld: #01134392287-d6a3-11e1-0800-50c8d16e4128,
> mysqld: #011c0c08f1c-00a6-11e2-0800-39ac6663a85c,
> mysqld: })
> mysqld: 121004 3:59:21 [Note] WSREP: GMCast::handle_stable_view:
> view(view_id(NON_PRIM,74650a7c-d6a9-11e1-0800-1c210e737947,23) memb {
> mysqld: #01174650a7c-d6a9-11e1-0800-1c210e737947,
> mysqld: } joined {
> mysqld: } left {
> mysqld: } partitioned {
> mysqld: #01134392287-d6a3-11e1-0800-50c8d16e4128,
> mysqld: #011c0c08f1c-00a6-11e2-0800-39ac6663a85c,
> mysqld: })
> mysqld: 121004 3:59:21 [Note] WSREP: New COMPONENT: primary = no,
> my_idx =
> 0, memb_num = 1
> mysqld: 121004 3:59:21 [Note] WSREP: Flow-control interval: [8, 16]
> mysqld: 121004 3:59:21 [Note] WSREP: Received NON-PRIMARY.
> mysqld: 121004 3:59:21 [Note] WSREP: Shifting SYNCED -> OPEN (TO:
> 82422120)
> mysqld: 121004 3:59:21 [Note] WSREP: New COMPONENT: primary = no,
> my_idx =
> 0, memb_num = 1
> mysqld: 121004 3:59:21 [Note] WSREP: Flow-control interval: [8, 16]
> mysqld: 121004 3:59:21 [Note] WSREP: Received NON-PRIMARY.
> mysqld: 121004 3:59:21 [Note] WSREP: New cluster view: global state:
> 1e42b63f-d6a3-11e1-0800-4a6fc40ca7a2:82422120, view# -1: non-Primary,
> number of nodes: 1, my index: 0, protocol version 1
> mysqld: 121004 3:59:21 [Note] WSREP: wsrep_notify_cmd is not
> defined,
> skipping notification.
> mysqld: 121004 3:59:21 [Note] WSREP: New cluster view: global state:
> 1e42b63f-d6a3-11e1-0800-4a6fc40ca7a2:82422120, view# -1: non-Primary,
> number of nodes: 1, my index: 0, protocol version 1
> mysqld: 121004 3:59:21 [Note] WSREP: wsrep_notify_cmd is not
> defined,
> skipping notification.
> mysqld: 121004 3:59:23 [Note] WSREP:
> (74650a7c-d6a9-11e1-0800-1c210e737947, 'tcp://
0.0.0.0:4567')
> reconnecting
> to c0c08f1c-00a6-11e2-0800-39ac6663a85c (tcp://
10.183.171.211:4567),
> attempt 0
> mysqld: 121004 3:59:27 [Note] WSREP:
> (74650a7c-d6a9-11e1-0800-1c210e737947, 'tcp://
0.0.0.0:4567') turning
> message relay requesting off
> mysqld: 121004 3:59:28 [Note] WSREP: New COMPONENT: primary = yes,
> my_idx
> = 1, memb_num = 3
> mysqld: 121004 3:59:28 [Note] WSREP: STATE EXCHANGE: Waiting for
> state
> UUID.
> mysqld: 121004 3:59:28 [Note] WSREP: GMCast::handle_stable_view:
> view(view_id(PRIM,34392287-d6a3-11e1-0800-50c8d16e4128,24) memb {
> mysqld: #01134392287-d6a3-11e1-0800-50c8d16e4128,
> mysqld: #01174650a7c-d6a9-11e1-0800-1c210e737947,
> mysqld: #011c0c08f1c-00a6-11e2-0800-39ac6663a85c,
> mysqld: } joined {
> mysqld: } left {
> mysqld: } partitioned {
> mysqld: })
> mysqld: 121004 3:59:28 [Note] WSREP: declaring
> 34392287-d6a3-11e1-0800-50c8d16e4128 stable
> mysqld: 121004 3:59:28 [Note] WSREP: declaring
> c0c08f1c-00a6-11e2-0800-39ac6663a85c stable
> mysqld: 121004 3:59:28 [Note] WSREP: STATE EXCHANGE: sent state msg:
> e4ccadbe-0dd7-11e2-0800-263d4b71cb69
> mysqld: 121004 3:59:28 [Note] WSREP: STATE EXCHANGE: got state msg:
> e4ccadbe-0dd7-11e2-0800-263d4b71cb69 from 0 (pwpdb1n3)
> mysqld: 121004 3:59:28 [Note] WSREP: STATE EXCHANGE: got state msg:
> e4ccadbe-0dd7-11e2-0800-263d4b71cb69 from 1 (pwpdb1n1)
> mysqld: 121004 3:59:28 [Note] WSREP: STATE EXCHANGE: got state msg:
> e4ccadbe-0dd7-11e2-0800-263d4b71cb69 from 2 (pwpdb1n2)
> mysqld: 121004 3:59:28 [Note] WSREP: Quorum results:
> mysqld: #011version = 2,
> mysqld: #011component = PRIMARY,
> mysqld: #011conf_id = 21,
> mysqld: #011members = 2/3 (joined/total),
> mysqld: #011act_id = 82422209,
> mysqld: #011last_appl. = 82421930,
> mysqld: #011protocols = 0/3/1 (gcs/repl/appl),
> mysqld: #011group UUID = 1e42b63f-d6a3-11e1-0800-4a6fc40ca7a2
> mysqld: 121004 3:59:28 [Note] WSREP: Flow-control interval: [14, 28]
> mysqld: 121004 3:59:28 [Note] WSREP: Shifting OPEN -> PRIMARY (TO:
> 82422209)
> mysqld: 121004 3:59:28 [Note] WSREP: State transfer required:
> mysqld: #011Group state:
> 1e42b63f-d6a3-11e1-0800-4a6fc40ca7a2:82422209
> mysqld: #011Local state:
> 1e42b63f-d6a3-11e1-0800-4a6fc40ca7a2:82422120
> mysqld: 121004 3:59:28 [Note] WSREP: New cluster view: global state:
> 1e42b63f-d6a3-11e1-0800-4a6fc40ca7a2:82422209, view# 22: Primary,
> number of
> nodes: 3, my index: 1, protocol version 1
> mysqld: 121004 3:59:28 [Warning] WSREP: Gap in state sequence. Need
> state
> transfer.
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82397084
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82393286
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82390801
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82388878
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82385001
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82379491
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82375137
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82373471
> mysqld: 121004 3:59:30 [Note] WSREP: SST kill local trx: 82372237
> mysqld: 121004 3:59:30 [Note] WSREP: You have configured 'rsync'
> state
> snapshot transfer method which cannot be performed on a running
> server.
> Wsrep provider won't be able to fall back to it if other means of
> state
> transfer are unavailable. In that case you will need to restart the
> server.
> mysqld: 121004 3:59:30 [Note] WSREP: wsrep_notify_cmd is not
> defined,
> skipping notification.
> mysqld: 121004 3:59:30 [Note] WSREP: Assign initial position for
> certification: 82422209, protocol version: 2
> mysqld: 121004 3:59:30 [Note] WSREP: Prepared IST receiver,
> listening at:
> tcp://
173.45.237.153:4568
> mysqld: 121004 3:59:30 [Note] WSREP: Node 1 (pwpdb1n1) requested
> state
> transfer from '*any*'. Selected 0 (pwpdb1n3)(SYNCED) as donor.
> mysqld: 121004 3:59:30 [Note] WSREP: Shifting PRIMARY -> JOINER (TO:
> 82422245)
> mysqld: 121004 3:59:30 [Note] WSREP: Requesting state transfer:
> success,
> donor: 0
> mysqld: 121004 3:59:30 [Note] WSREP: GCache DEBUG:
> RingBuffer::seqno_reset(): discarded 134196507 bytes
> mysqld: 121004 3:59:30 [Note] WSREP: GCache DEBUG:
> RingBuffer::seqno_reset(): found 0/36 locked buffers
> mysqld: 121004 3:59:30 [Note] WSREP: Receiving IST: 89 writesets,
> seqnos
> 82422120-82422209
> mysqld: 121004 3:59:31 [Warning] WSREP: 0 (pwpdb1n3): State transfer
> to 1
> (pwpdb1n1) failed: -125 (Operation canceled)
> mysqld: 121004 3:59:31 [ERROR] WSREP:
> gcs/src/gcs_group.c:gcs_group_handle_join_msg():691: Will never
> receive
> state. Need to abort.
> mysqld: 121004 3:59:31 [Note] WSREP: gcomm: terminating thread
> mysqld: 121004 3:59:31 [Note] WSREP: gcomm: joining thread
> mysqld: 121004 3:59:31 [Note] WSREP: gcomm: closing backend
> mysqld: 121004 3:59:31 [Note] WSREP:
> evs::proto(74650a7c-d6a9-11e1-0800-1c210e737947, LEAVING,
> view_id(REG,34392287-d6a3-11e1-0800-50c8d16e4128,24)) uuid
> 34392287-d6a3-11e1-0800-50c8d16e4128 missing from install message,
> assuming
> partitioned
> mysqld: 121004 3:59:31 [Note] WSREP:
> evs::proto(74650a7c-d6a9-11e1-0800-1c210e737947, LEAVING,
> view_id(REG,34392287-d6a3-11e1-0800-50c8d16e4128,24)) uuid
> c0c08f1c-00a6-11e2-0800-39ac6663a85c missing from install message,
> assuming
> partitioned
> mysqld: 121004 3:59:31 [Note] WSREP: GMCast::handle_stable_view:
> view(view_id(NON_PRIM,34392287-d6a3-11e1-0800-50c8d16e4128,24) memb {
> mysqld: #01174650a7c-d6a9-11e1-0800-1c210e737947,
> mysqld: } joined {
> mysqld: } left {
> mysqld: } partitioned {
> mysqld: #01134392287-d6a3-11e1-0800-50c8d16e4128,
> mysqld: #011c0c08f1c-00a6-11e2-0800-39ac6663a85c,
> mysqld: })
> mysqld: 121004 3:59:31 [Note] WSREP: GMCast::handle_stable_view:
> view((empty))
> mysqld: 121004 3:59:31 [Note] WSREP: gcomm: closed
> mysqld: 121004 3:59:31 [Note] WSREP: /usr/sbin/mysqld: Terminated.
> mysqld_safe: Number of processes running now: 0
> mysqld_safe: mysqld restarted
> mysqld: 121004 3:59:31 [Note] WSREP: wsrep_load(): loading provider
> library '/usr/lib/galera/libgalera_smm.so'
> mysqld: 121004 3:59:31 [Note] WSREP: wsrep_load(): Galera
> 23.2.0(r120) by
> Codership Oy <
in...@codership.com> loaded succesfully.
> mysqld: 121004 3:59:31 [Note] WSREP: Reusing existing
> '/var/lib/mysql//galera.cache'.
> mysqld: 121004 3:59:31 [Note] WSREP: Passing config to GCS:
> gcache.dir =
> /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0;
>
gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M;
> gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5;
> gcs.fc_limit =
> 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500;
> gcs.max_throttle
> = 0.25; gcs.recv_q_hard_limit = 9223372036854775807;
> gcs.recv_q_soft_limit
> = 0.25; replicator.causal_read_timeout = PT30S;
> replicator.commit_order = 3
> mysqld: 121004 3:59:31 [Note] WSREP: wsrep_sst_grab()
> mysqld: 121004 3:59:31 [Note] WSREP: Start replication
> mysqld: 121004 3:59:31 [Note] WSREP: Found saved state:
> 00000000-0000-0000-0000-000000000000:-1
> mysqld: 121004 3:59:31 [Note] WSREP: Assign initial position for
> certification: -1, protocol version: -1
> mysqld: 121004 3:59:31 [Note] WSREP: Setting initial position to
> 00000000-0000-0000-0000-000000000000:-1
> mysqld: 121004 3:59:31 [Note] WSREP: protonet asio version 0
> mysqld: 121004 3:59:31 [Note] WSREP: backend: asio
> mysqld: 121004 3:59:31 [Note] WSREP: GMCast version 0
> mysqld: 121004 3:59:31 [Note] WSREP:
> (e6ebeb87-0dd7-11e2-0800-1f5cfae992c7, 'tcp://
0.0.0.0:4567')
> listening at
> tcp://
0.0.0.0:4567
> mysqld: 121004 3:59:31 [Note] WSREP:
> (e6ebeb87-0dd7-11e2-0800-1f5cfae992c7, 'tcp://
0.0.0.0:4567')
> multicast: ,
> ttl: 1
> mysqld: 121004 3:59:31 [Note] WSREP: EVS version 0
> mysqld: 121004 3:59:31 [Note] WSREP: PC version 0
> mysqld: 121004 3:59:31 [Note] WSREP: gcomm: connecting to group
> 'my_wsrep_cluster', peer ''
> mysqld: 121004 3:59:31 [Note] WSREP: GMCast::handle_stable_view:
> view(view_id(PRIM,e6ebeb87-0dd7-11e2-0800-1f5cfae992c7,1) memb {
> mysqld: #011e6ebeb87-0dd7-11e2-0800-1f5cfae992c7,
> mysqld: } joined {
> mysqld: } left {
> mysqld: } partitioned {
> mysqld: })
> mysqld: 121004 3:59:31 [Note] WSREP: gcomm: connected
> mysqld: 121004 3:59:31 [Note] WSREP: Changing maximum packet size to
> 64500, resulting msg size: 32636
> mysqld: 121004 3:59:31 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
> mysqld: 121004 3:59:31 [Note] WSREP: Opened channel
> 'my_wsrep_cluster'
> mysqld: 121004 3:59:31 [Note] WSREP: New COMPONENT: primary = yes,
> my_idx
> = 0, memb_num = 1
> mysqld: 121004 3:59:31 [Note] WSREP: Waiting for SST to complete.
> mysqld: 121004 3:59:31 [Note] WSREP: Starting new group from
> scratch:
> e6eca9d0-0dd7-11e2-0800-970a0d5930d3
> mysqld: 121004 3:59:31 [Note] WSREP: STATE_EXCHANGE: sent state
> UUID:
> e6ecca17-0dd7-11e2-0800-ff093cac64aa
> mysqld: 121004 3:59:31 [Note] WSREP: STATE EXCHANGE: sent state msg:
> e6ecca17-0dd7-11e2-0800-ff093cac64aa
> mysqld: 121004 3:59:31 [Note] WSREP: STATE EXCHANGE: got state msg:
> e6ecca17-0dd7-11e2-0800-ff093cac64aa from 0 (pwpdb1n1)
> mysqld: 121004 3:59:31 [Note] WSREP: Quorum results:
> mysqld: #011version = 2,
> mysqld: #011component = PRIMARY,
> mysqld: #011conf_id = 0,
> mysqld: #011members = 1/1 (joined/total),
> mysqld: #011act_id = 0,
> mysqld: #011last_appl. = -1,
> mysqld: #011protocols = 0/3/1 (gcs/repl/appl),
> mysqld: #011group UUID = e6eca9d0-0dd7-11e2-0800-970a0d5930d3
> mysqld: 121004 3:59:31 [Note] WSREP: Flow-control interval: [8, 16]
> mysqld: 121004 3:59:31 [Note] WSREP: Restored state OPEN -> JOINED
> (0)
> mysqld: 121004 3:59:31 [Note] WSREP: Member 0 (pwpdb1n1) synced with
> group.
> mysqld: 121004 3:59:31 [Note] WSREP: Shifting JOINED -> SYNCED (TO:
> 0)
> mysqld: 121004 3:59:31 [Note] WSREP: New cluster view: global state:
> e6eca9d0-0dd7-11e2-0800-970a0d5930d3:0, view# 1: Primary, number of
> nodes:
> 1, my index: 0, protocol version 1
> mysqld: 121004 3:59:31 [Note] WSREP: SST complete, seqno: 0
> mysqld: 121004 3:59:31 InnoDB: The InnoDB memory heap is disabled
> mysqld: 121004 3:59:31 InnoDB: Mutexes and rw_locks use GCC atomic
> builtins
> mysqld: 121004 3:59:31 InnoDB: Compressed tables use zlib 1.2.3.3
> mysqld: 121004 3:59:31 InnoDB: Using Linux native AIO
> mysqld: 121004 3:59:31 InnoDB: Initializing buffer pool, size =
> 128.0M
> mysqld: 121004 3:59:31 InnoDB: Completed initialization of buffer
> pool
> mysqld: 121004 3:59:31 InnoDB: highest supported file format is
> Barracuda.
> mysqld: InnoDB: The log sequence number in ibdata files does not
> match
> mysqld: InnoDB: the log sequence number in the ib_logfiles!
> mysqld: 121004 3:59:31 InnoDB: Database was not shut down normally!
> mysqld: InnoDB: Starting crash recovery.
> mysqld: InnoDB: Reading tablespace information from the .ibd files...
> ...
>
>
> Cheers
> Lee
--
Alexey Yurchenko,
Codership Oy,
www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011