Why SST instead of IST?

Bram Jannsen

Jun 18, 2015, 9:38:20 AM
to codersh...@googlegroups.com
When I restarted one of my cluster members, it required a full SST, even though it had been down for only a couple of minutes and hardly any database changes were made in that period. When I checked the logs, I noticed this:

150618 14:15:35 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 58987190)
150618 14:15:35 [Note] WSREP: State transfer required:
        Group state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987190
        Local state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987180
150618 14:15:35 [Note] WSREP: New cluster view: global state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987190, view# 3: Primary, number of nodes: 3, my index: 2, protocol version 3
150618 14:15:35 [Warning] WSREP: Gap in state sequence. Need state transfer.
150618 14:15:35 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'REDACTED' --auth 'REDACTED' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '4290' --binlog 'mysql-bin' '
150618 14:15:35 [Note] WSREP: Prepared SST request: rsync|REDACTED:4444/rsync_sst
150618 14:15:35 [Note] WSREP: REPL Protocols: 7 (3, 2)
150618 14:15:35 [Note] WSREP: Service thread queue flushed.
150618 14:15:35 [Note] WSREP: Assign initial position for certification: 58987190, protocol version: 3
150618 14:15:35 [Note] WSREP: Service thread queue flushed.
150618 14:15:35 [Note] WSREP: IST receiver using ssl
150618 14:15:35 [Note] WSREP: Prepared IST receiver, listening at: ssl://REDACTED:4568
150618 14:15:35 [Note] WSREP: Member 2.0 (REDACTED) requested state transfer from '*any*'. Selected 0.0 (REDACTED)(SYNCED) as donor.
150618 14:15:35 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 58987190)
150618 14:15:35 [Note] WSREP: Requesting state transfer: success, donor: 0
150618 14:15:37 [Note] WSREP: (b8978ed4, 'ssl://0.0.0.0:4567') turning message relay requesting off

According to that, the node is missing only 10 transactions (the difference between 58987190 and 58987180). Why is Galera still doing an SST that takes almost 2 hours in this case?
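
The subtraction can be checked with a minimal sketch that pulls the two state lines out of the joiner's log and compares them. The helper name `state_gap` and the regular expression are illustrative assumptions, not part of any Galera tooling. The result only says how many sequence numbers the joiner is behind, not which transfer method the cluster ends up using.

```python
import re

# Matches the "<cluster UUID>:<seqno>" pair printed after
# "Group state:" / "Local state:" in the joiner's log.
STATE_RE = re.compile(r"(Group|Local) state:\s*([0-9a-f-]{36}):(-?\d+)")


def state_gap(log_text):
    """Return (same_cluster, missing_seqnos) for a joiner log excerpt.

    Hypothetical helper for illustration only.
    """
    states = {
        kind: (uuid, int(seqno))
        for kind, uuid, seqno in STATE_RE.findall(log_text)
    }
    group_uuid, group_seqno = states["Group"]
    local_uuid, local_seqno = states["Local"]
    # The gap is only meaningful if both states belong to the same cluster UUID.
    return group_uuid == local_uuid, group_seqno - local_seqno


log_excerpt = """
150618 14:15:35 [Note] WSREP: State transfer required:
        Group state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987190
        Local state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987180
"""

same_cluster, gap = state_gap(log_excerpt)
print(same_cluster, gap)  # True 10
```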

Regards, Bram J.

alexey.y...@galeracluster.com

Jun 18, 2015, 9:44:11 AM
to Bram Jannsen, codersh...@googlegroups.com
1. From this piece of the log it is not obvious that an SST happened.
2. There are TWO nodes involved in an SST, right? So you may want to check what the donor is saying.