When I restarted one of my cluster members, it required a full SST, even though it had been down for only a couple of minutes and hardly any database mutations were made in that period. When I checked the logs, I noticed this:
150618 14:15:35 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 58987190)
150618 14:15:35 [Note] WSREP: State transfer required:
Group state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987190
Local state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987180
150618 14:15:35 [Note] WSREP: New cluster view: global state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987190, view# 3: Primary, number of nodes: 3, my index: 2, protocol version 3
150618 14:15:35 [Warning] WSREP: Gap in state sequence. Need state transfer.
150618 14:15:35 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'REDACTED' --auth 'REDACTED' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '4290' --binlog 'mysql-bin' '
150618 14:15:35 [Note] WSREP: Prepared SST request: rsync|REDACTED:4444/rsync_sst
150618 14:15:35 [Note] WSREP: REPL Protocols: 7 (3, 2)
150618 14:15:35 [Note] WSREP: Service thread queue flushed.
150618 14:15:35 [Note] WSREP: Assign initial position for certification: 58987190, protocol version: 3
150618 14:15:35 [Note] WSREP: Service thread queue flushed.
150618 14:15:35 [Note] WSREP: IST receiver using ssl
150618 14:15:35 [Note] WSREP: Prepared IST receiver, listening at: ssl://REDACTED:4568
150618 14:15:35 [Note] WSREP: Member 2.0 (REDACTED) requested state transfer from '*any*'. Selected 0.0 (REDACTED)(SYNCED) as donor.
150618 14:15:35 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 58987190)
150618 14:15:35 [Note] WSREP: Requesting state transfer: success, donor: 0
150618 14:15:37 [Note] WSREP: (b8978ed4, 'ssl://0.0.0.0:4567') turning message relay requesting off
According to that, the node is only missing 10 transactions (the difference between the group seqno 58987190 and the local seqno 58987180), and both states carry the same UUID. Why is Galera still doing an SST that takes almost 2 hours in this case?
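For anyone who wants to double-check the arithmetic on their own logs, here is a minimal Python sketch that extracts the "Group state" and "Local state" lines and computes the gap. The function names (parse_states, seqno_gap) are made up for illustration and are not part of Galera or any of its tools; the only assumption is that the log prints the states in the UUID:seqno form shown above.

```python
import re

# Matches "<Group|Local> state: <uuid>:<seqno>" as printed in the WSREP log above.
STATE_RE = re.compile(r"(Group|Local) state:\s*([0-9a-f-]{36}):(-?\d+)")


def parse_states(log_text):
    """Return a dict like {'Group': (uuid, seqno), 'Local': (uuid, seqno)}."""
    states = {}
    for label, uuid, seqno in STATE_RE.findall(log_text):
        states[label] = (uuid, int(seqno))
    return states


def seqno_gap(log_text):
    """Return group seqno minus local seqno.

    Returns None if either state line is missing or the UUIDs differ,
    because the seqnos are only comparable within the same history.
    """
    states = parse_states(log_text)
    if "Group" not in states or "Local" not in states:
        return None
    group_uuid, group_seqno = states["Group"]
    local_uuid, local_seqno = states["Local"]
    if group_uuid != local_uuid:
        return None
    return group_seqno - local_seqno


# The two state lines copied from the log excerpt in this post.
SAMPLE = """Group state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987190
Local state: 1da5bdb3-9504-11e4-b698-a304b11a4d59:58987180"""

print(seqno_gap(SAMPLE))  # 10
```

This only confirms that the UUIDs match and that the gap is 10; it says nothing about why the donor chose SST.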
Regards, Bram J.