Cluster state changes to "No node with complete state" very often


Anand S

Jan 4, 2018, 4:26:14 AM
to codership
Hi All,

We have a 3-node MariaDB Galera cluster used for a Drupal application. Only one of the 3 nodes takes writes at any given time; this is managed by MaxScale.

Our problem is that nodes keep disconnecting very often with the messages below, and we have been unable to find what is causing it. When a node drops out of the cluster and joins again, the application has trouble connecting to the cluster. Any help troubleshooting this issue is highly appreciated.

2018-01-04  3:41:04 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-01-04  4:02:14 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://node3:4567
2018-01-04  4:02:15 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') reconnecting to a963adc6 (tcp://node3:4567), attempt 0
2018-01-04  4:02:15 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') reconnecting to 7c54fbd4 (tcp://node2:4567), attempt 0
2018-01-04  4:02:16 139678116214528 [Note] WSREP: evs::proto(5544a269, OPERATIONAL, view_id(REG,5544a269,2834)) suspecting node: a963adc6
2018-01-04  4:02:16 139678116214528 [Note] WSREP: evs::proto(5544a269, OPERATIONAL, view_id(REG,5544a269,2834)) suspected node without join message, declaring inactive
2018-01-04  4:02:19 139678116214528 [Note] WSREP: evs::proto(5544a269, GATHER, view_id(REG,5544a269,2834)) suspecting node: 7c54fbd4
2018-01-04  4:02:19 139678116214528 [Note] WSREP: evs::proto(5544a269, GATHER, view_id(REG,5544a269,2834)) suspected node without join message, declaring inactive
2018-01-04  4:02:20 139678116214528 [Note] WSREP: view(view_id(NON_PRIM,5544a269,2834) memb {
        5544a269,0
} joined {
} left {
} partitioned {
        7c54fbd4,0
        a963adc6,0
})
2018-01-04  4:02:20 139678116214528 [Note] WSREP: view(view_id(NON_PRIM,5544a269,2835) memb {
        5544a269,0
} joined {
} left {
} partitioned {
        7c54fbd4,0
        a963adc6,0
})
2018-01-04  4:02:20 139678105724672 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Flow-control interval: [16, 16]
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Received NON-PRIMARY.
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 14486485)
2018-01-04  4:02:20 139678105724672 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Flow-control interval: [16, 16]
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Received NON-PRIMARY.
2018-01-04  4:02:20 139678578936576 [Note] WSREP: New cluster view: global state: 25399ae9-8a2d-11e6-ae1a-9f3235818cfa:14486485, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-01-04  4:02:20 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:02:20 139678578936576 [Note] WSREP: New cluster view: global state: 25399ae9-8a2d-11e6-ae1a-9f3235818cfa:14486485, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-01-04  4:02:20 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:02:20 139678116214528 [Note] WSREP: declaring 7c54fbd4 at tcp://node2:4567 stable
2018-01-04  4:02:20 139678116214528 [Note] WSREP: declaring a963adc6 at tcp://node3:4567 stable
2018-01-04  4:02:20 139678116214528 [Note] WSREP: re-bootstrapping prim from partitioned components
2018-01-04  4:02:20 139678116214528 [Note] WSREP: view(view_id(PRIM,5544a269,2836) memb {
        5544a269,0
        7c54fbd4,0
        a963adc6,0
} joined {
} left {
} partitioned {
})
2018-01-04  4:02:20 139678116214528 [Note] WSREP: save pc into disk
2018-01-04  4:02:20 139678105724672 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2018-01-04  4:02:20 139678105724672 [Note] WSREP: STATE_EXCHANGE: sent state UUID: f8ac73e9-f12d-11e7-a805-eb98a045e058
2018-01-04  4:02:20 139678105724672 [Note] WSREP: STATE EXCHANGE: sent state msg: f8ac73e9-f12d-11e7-a805-eb98a045e058
2018-01-04  4:02:20 139678105724672 [Note] WSREP: STATE EXCHANGE: got state msg: f8ac73e9-f12d-11e7-a805-eb98a045e058 from 0 (Node1)
2018-01-04  4:02:20 139678105724672 [Note] WSREP: STATE EXCHANGE: got state msg: f8ac73e9-f12d-11e7-a805-eb98a045e058 from 1 (Node2)
2018-01-04  4:02:20 139678105724672 [Note] WSREP: STATE EXCHANGE: got state msg: f8ac73e9-f12d-11e7-a805-eb98a045e058 from 2 (Node3)
2018-01-04  4:02:20 139678105724672 [Warning] WSREP: Quorum: No node with complete state:


        Version      : 3
        Flags        : 0x3
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : SYNCED
        Prim UUID    : fe57d05e-f12a-11e7-8055-6f125168b9f9
        Prim  seqno  : 1914
        First seqno  : 14473889
        Last  seqno  : 14486485
        Prim JOINED  : 3
        State UUID   : f8ac73e9-f12d-11e7-a805-eb98a045e058
        Group UUID   : 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
        Name         : 'Node1'
        Incoming addr: 'node1:3306'

        Version      : 3
        Flags        : 0x2
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : SYNCED
        Prim UUID    : fe57d05e-f12a-11e7-8055-6f125168b9f9
        Prim  seqno  : 1914
        First seqno  : 14473886
        Last  seqno  : 14486485
        Prim JOINED  : 3
        State UUID   : f8ac73e9-f12d-11e7-a805-eb98a045e058
        Group UUID   : 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
        Name         : 'Node2'
        Incoming addr: 'node2:3306'

        Version      : 3
        Flags        : 0x2
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : SYNCED
        Prim UUID    : fe57d05e-f12a-11e7-8055-6f125168b9f9
        Prim  seqno  : 1914
        First seqno  : 14473887
        Last  seqno  : 14486485
        Prim JOINED  : 3
        State UUID   : f8ac73e9-f12d-11e7-a805-eb98a045e058
        Group UUID   : 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
        Name         : 'Node3'
        Incoming addr: 'node3:3306'

2018-01-04  4:02:20 139678105724672 [Note] WSREP: Full re-merge of primary fe57d05e-f12a-11e7-8055-6f125168b9f9 found: 3 of 3.
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 1914,
        members    = 3/3 (joined/total),
        act_id     = 14486485,
        last_appl. = 14486416,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Flow-control interval: [28, 28]
2018-01-04  4:02:20 139678105724672 [Note] WSREP: Restored state OPEN -> SYNCED (14486485)
2018-01-04  4:02:20 139678578936576 [Note] WSREP: New cluster view: global state: 25399ae9-8a2d-11e6-ae1a-9f3235818cfa:14486485, view# 1915: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-01-04  4:02:20 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:02:20 139678578936576 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-01-04  4:02:20 139678173947648 [Note] WSREP: Service thread queue flushed.
2018-01-04  4:02:20 139678578936576 [Note] WSREP: Assign initial position for certification: 14486485, protocol version: 3
2018-01-04  4:02:20 139678173947648 [Note] WSREP: Service thread queue flushed.
2018-01-04  4:02:20 139678578936576 [Note] WSREP: Synchronized with group, ready for connections
2018-01-04  4:02:20 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:02:22 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-01-04  4:02:55 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://node2:4567 tcp://node3:4567
2018-01-04  4:02:56 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') reconnecting to 7c54fbd4 (tcp://node2:4567), attempt 0
2018-01-04  4:02:56 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') reconnecting to a963adc6 (tcp://node3:4567), attempt 0
2018-01-04  4:02:57 139678116214528 [Note] WSREP: evs::proto(5544a269, OPERATIONAL, view_id(REG,5544a269,2836)) suspecting node: 7c54fbd4
2018-01-04  4:02:57 139678116214528 [Note] WSREP: evs::proto(5544a269, OPERATIONAL, view_id(REG,5544a269,2836)) suspected node without join message, declaring inactive
2018-01-04  4:03:00 139678116214528 [Note] WSREP: evs::proto(5544a269, GATHER, view_id(REG,5544a269,2836)) suspecting node: a963adc6
2018-01-04  4:03:00 139678116214528 [Note] WSREP: evs::proto(5544a269, GATHER, view_id(REG,5544a269,2836)) suspected node without join message, declaring inactive
2018-01-04  4:03:01 139678116214528 [Note] WSREP: view(view_id(NON_PRIM,5544a269,2836) memb {
        5544a269,0
} joined {
} left {
} partitioned {
        7c54fbd4,0
        a963adc6,0
})
2018-01-04  4:03:01 139678105724672 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Flow-control interval: [16, 16]
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Received NON-PRIMARY.
2018-01-04  4:03:01 139678116214528 [Note] WSREP: view(view_id(NON_PRIM,5544a269,2837) memb {
        5544a269,0
} joined {
} left {
} partitioned {
        7c54fbd4,0
        a963adc6,0
})
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 14486485)
2018-01-04  4:03:01 139678105724672 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Flow-control interval: [16, 16]
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Received NON-PRIMARY.
2018-01-04  4:03:01 139678578936576 [Note] WSREP: New cluster view: global state: 25399ae9-8a2d-11e6-ae1a-9f3235818cfa:14486485, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-01-04  4:03:01 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:03:01 139678578936576 [Note] WSREP: New cluster view: global state: 25399ae9-8a2d-11e6-ae1a-9f3235818cfa:14486485, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-01-04  4:03:01 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:03:01 139678116214528 [Note] WSREP: declaring 7c54fbd4 at tcp://node2:4567 stable
2018-01-04  4:03:01 139678116214528 [Note] WSREP: declaring a963adc6 at tcp://node3:4567 stable
2018-01-04  4:03:01 139678116214528 [Note] WSREP: re-bootstrapping prim from partitioned components
2018-01-04  4:03:01 139678116214528 [Note] WSREP: view(view_id(PRIM,5544a269,2838) memb {
        5544a269,0
        7c54fbd4,0
        a963adc6,0
} joined {
} left {
} partitioned {
})
2018-01-04  4:03:01 139678116214528 [Note] WSREP: save pc into disk
2018-01-04  4:03:01 139678105724672 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2018-01-04  4:03:01 139678105724672 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 114484a1-f12e-11e7-a7a0-9f10765d3a38
2018-01-04  4:03:01 139678105724672 [Note] WSREP: STATE EXCHANGE: sent state msg: 114484a1-f12e-11e7-a7a0-9f10765d3a38
2018-01-04  4:03:01 139678105724672 [Note] WSREP: STATE EXCHANGE: got state msg: 114484a1-f12e-11e7-a7a0-9f10765d3a38 from 0 (Node1)
2018-01-04  4:03:01 139678105724672 [Note] WSREP: STATE EXCHANGE: got state msg: 114484a1-f12e-11e7-a7a0-9f10765d3a38 from 1 (Node2)
2018-01-04  4:03:01 139678105724672 [Note] WSREP: STATE EXCHANGE: got state msg: 114484a1-f12e-11e7-a7a0-9f10765d3a38 from 2 (Node3)
2018-01-04  4:03:01 139678105724672 [Warning] WSREP: Quorum: No node with complete state:
        Version      : 3
        Flags        : 0x3
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : SYNCED
        Prim UUID    : f8ac73e9-f12d-11e7-a805-eb98a045e058
        Prim  seqno  : 1915
        First seqno  : 14473889
        Last  seqno  : 14486485
        Prim JOINED  : 3
        State UUID   : 114484a1-f12e-11e7-a7a0-9f10765d3a38
        Group UUID   : 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
        Name         : 'Node1'
        Incoming addr: 'node1:3306'

        Version      : 3
        Flags        : 0x2
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : SYNCED
        Prim UUID    : f8ac73e9-f12d-11e7-a805-eb98a045e058
        Prim  seqno  : 1915
        First seqno  : 14473886
        Last  seqno  : 14486485
        Prim JOINED  : 3
        State UUID   : 114484a1-f12e-11e7-a7a0-9f10765d3a38
        Group UUID   : 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
        Name         : 'Node2'
        Incoming addr: 'node2:3306'

        Version      : 3
        Flags        : 0x2
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : SYNCED
        Prim UUID    : f8ac73e9-f12d-11e7-a805-eb98a045e058
        Prim  seqno  : 1915
        First seqno  : 14473887
        Last  seqno  : 14486485
        Prim JOINED  : 3
        State UUID   : 114484a1-f12e-11e7-a7a0-9f10765d3a38
        Group UUID   : 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
        Name         : 'Node3'
        Incoming addr: 'node3:3306'

2018-01-04  4:03:01 139678105724672 [Note] WSREP: Full re-merge of primary f8ac73e9-f12d-11e7-a805-eb98a045e058 found: 3 of 3.
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 1915,
        members    = 3/3 (joined/total),
        act_id     = 14486485,
        last_appl. = 14486416,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 25399ae9-8a2d-11e6-ae1a-9f3235818cfa
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Flow-control interval: [28, 28]
2018-01-04  4:03:01 139678105724672 [Note] WSREP: Restored state OPEN -> SYNCED (14486485)
2018-01-04  4:03:01 139678578936576 [Note] WSREP: New cluster view: global state: 25399ae9-8a2d-11e6-ae1a-9f3235818cfa:14486485, view# 1916: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-01-04  4:03:01 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:03:01 139678578936576 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-01-04  4:03:01 139678173947648 [Note] WSREP: Service thread queue flushed.
2018-01-04  4:03:01 139678578936576 [Note] WSREP: Assign initial position for certification: 14486485, protocol version: 3
2018-01-04  4:03:01 139678173947648 [Note] WSREP: Service thread queue flushed.
2018-01-04  4:03:01 139678578936576 [Note] WSREP: Synchronized with group, ready for connections
2018-01-04  4:03:01 139678578936576 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-01-04  4:03:04 139678116214528 [Note] WSREP: (5544a269, 'tcp://0.0.0.0:4567') turning message relay requesting off
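
To get a feel for how often this happens and how long each episode lasts, the error log can be scanned for the two messages that bracket every episode above: "turning message relay requesting on" (peers first seen as nonlive) and "Restored state OPEN -> SYNCED" (node back in the primary component). The Python sketch below is only an illustration under that assumption. The function names parse_ts and disruption_windows are made up for this example, and it expects the timestamp format shown in the log above.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log above, e.g. "2018-01-04  4:02:20".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2})\s")

# Marker strings copied from the log excerpt; other versions may word them differently.
START_MARK = "turning message relay requesting on"
END_MARK = "Restored state OPEN -> SYNCED"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None for continuation lines."""
    m = TS.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1) + " " + m.group(2), "%Y-%m-%d %H:%M:%S")


def disruption_windows(lines):
    """Pair each 'nonlive peers' notice with the next return to SYNCED.

    Returns a list of (start, end) datetime tuples, one per episode.
    """
    windows = []
    started = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # wrapped or indented lines carry no timestamp
        if START_MARK in line and started is None:
            started = ts
        elif END_MARK in line and started is not None:
            windows.append((started, ts))
            started = None
    return windows


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python windows.py /path/to/error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for start, end in disruption_windows(fh):
            print(start, "->", end, f"({(end - start).total_seconds():.0f}s)")
```

Run against the full error log of each node, the list of windows can be compared across nodes and against other events on the hosts (backups, cron jobs, network maintenance) to see whether the disconnects line up with anything.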



Thanks

