Re: [codership-team] Warnings about certification failed for TO isolated action

Alex Yurchenko

Sep 27, 2012, 1:36:23 PM
to codersh...@googlegroups.com
Hi Luke,

This warning is a false positive (a bug in reporting), which should be
fixed in later revisions. What is happening is that writesets that were
already contained in the snapshot are discarded, which causes this
message. As you can see, it started right after SST completed and cache
replay had begun. Later pages didn't produce such warnings because they
contained writesets accumulated after the state snapshot was made
(during catch-up).
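
To make the skip logic concrete, here is a minimal Python sketch of the idea. It is only an illustrative model, not Galera's actual code: the `Writeset` class, the `replay_cache` function, and the snapshot seqno used below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Writeset:
    global_seqno: int  # position in the cluster-wide total order ("g:" in the log)
    is_toi: bool       # True for TO-isolated actions such as TRUNCATE or other DDL


def replay_cache(writesets, snapshot_seqno):
    """Split cached writesets into those to apply and those to skip.

    Anything at or below snapshot_seqno is assumed to be already
    reflected in the transferred snapshot, so it is skipped.
    """
    applied, skipped = [], []
    for ws in writesets:
        if ws.global_seqno <= snapshot_seqno:
            skipped.append(ws)
        else:
            applied.append(ws)
    return applied, skipped


# Hypothetical cache contents and snapshot position, for illustration only.
cache = [Writeset(1, True), Writeset(10, True),
         Writeset(2463, True), Writeset(50000, False)]
applied, skipped = replay_cache(cache, snapshot_seqno=22196)
print([ws.global_seqno for ws in skipped])
print([ws.global_seqno for ws in applied])
```

In this model the skipped writesets are the ones that would trigger the spurious warning; nothing is lost, because their effects are already in the snapshot.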

The warnings belong to truncates, which are executed (in this case
skipped) like DDL in TO-isolation.
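
If you want to check this against your own log, a rough Python sketch along these lines could help. It assumes each warning sits on a single line in the real error log (unlike the wrapped quote below), and the function names and the `snapshot_seqno` argument are hypothetical; you would have to supply the seqno at which the snapshot was taken yourself.

```python
import re

# Matches the seqnos tuple of a "Certification failed for TO isolated action" warning.
WARNING_RE = re.compile(
    r"Certification failed for TO isolated action:.*?"
    r"seqnos\s*\(l:\s*(-?\d+),\s*g:\s*(-?\d+),\s*s:\s*(-?\d+),\s*d:\s*(-?\d+)"
)


def warning_global_seqnos(log_text):
    """Return the global seqno ("g:") of every matching warning line."""
    return [int(m.group(2)) for m in WARNING_RE.finditer(log_text)]


def unexplained_warnings(log_text, snapshot_seqno):
    """Return global seqnos of warnings that lie above the snapshot position."""
    return [g for g in warning_global_seqnos(log_text) if g > snapshot_seqno]


# Two warnings from the quoted log, unwrapped onto single lines.
sample = (
    "120927 10:37:43 [Warning] WSREP: Certification failed for TO isolated "
    "action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2 local: 0 "
    "state: MUST_ABORT flags: 65 conn_id: 45 trx_id: -1 seqnos (l: 2465, "
    "g: 2463, s: 2462, d: 2462, ts: 1348742159040992990)\n"
    "120927 10:37:52 [Warning] WSREP: Certification failed for TO isolated "
    "action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2 local: 0 "
    "state: MUST_ABORT flags: 65 conn_id: 286 trx_id: -1 seqnos (l: 22198, "
    "g: 22196, s: 22195, d: 22195, ts: 1348742202695906783)\n"
    "120927 10:38:38 [Note] WSREP: Synchronized with group, ready for connections\n"
)
print(warning_global_seqnos(sample))
```

If `unexplained_warnings` returns an empty list for your snapshot position, every warning falls within the range described above; a non-empty result would be worth a closer look.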

As a side note, I guess you'd have much more luck (much faster trx
rates and saturation) if you used updates ;)

Regards,
Alex

On 2012-09-27 14:18, Luke Bigum wrote:
> Hi all,
>
> This morning I was trying to fill Galera's ring buffer cache by flooding
> a cluster with transactions while doing an SST. I was pleasantly
> surprised to find that Galera started to page off more cache files when
> it realized the first one was full. When it started to replay all its
> transaction cache, I got these errors in the Joiner that I don't
> understand, and hopefully someone can fill me in.
>
> They only seem to occur for the first and second cache pages (out of 5
> total pages), and there are about 380 errors from a total transaction
> set of about 100,000:
>
> 120927 10:37:42 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 14 trx_id: -1 seqnos
> (l: 3, g: 1, s: 0, d: 0, ts: 1348742154187710560)
> ...
> 120927 10:37:42 [Note] WSREP: 0 (pf1xdb03): State transfer from 1
> (pf1xdb01) complete.
> 120927 10:37:42 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 14 trx_id: -1 seqnos
> (l: 12, g: 10, s: 9, d: 9, ts: 1348742154209548932)
> 120927 10:37:42 [Note] WSREP: Shifting JOINER -> JOINED (TO: 44763)
> 120927 10:37:42 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 14 trx_id: -1 seq
> ...
> 120927 10:37:43 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 45 trx_id: -1 seqnos (l: 2465,
> g:
> 2463, s: 2462, d: 2462, ts: 1348742159040992990)
> ...
> 120927 10:37:51 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 255 trx_id: -1 seqnos (l: 19736,
> g:
> 19734, s: 19733, d: 19733, ts: 1348742197501714399)
> 120927 10:37:51 [Note] WSREP: Deleted page
> /var/lib/mysql-xfs/gcache.page.000000
> 120927 10:37:52 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 286 trx_id: -1 seqnos (l: 22162,
> g:
> 22160, s: 22159, d: 22159, ts: 1348742202566546659)
> ...
> 120927 10:37:52 [Warning] WSREP: Certification failed for TO isolated
> action: source: 508f908d-088e-11e2-0800-58a7d2bb80c1 version: 2
> local: 0
> state: MUST_ABORT flags: 65 conn_id: 286 trx_id: -1 seqnos (l: 22198,
> g:
> 22196, s: 22195, d: 22195, ts: 1348742202695906783)
> 120927 10:38:02 [Note] WSREP: Created page
> /var/lib/mysql-xfs/gcache.page.000003 of size 134217728 bytes
> 120927 10:38:03 [Note] WSREP: Deleted page
> /var/lib/mysql-xfs/gcache.page.000001
> 120927 10:38:22 [Note] WSREP: Deleted page
> /var/lib/mysql-xfs/gcache.page.000002
> 120927 10:38:27 [Note] WSREP: Deleted page
> /var/lib/mysql-xfs/gcache.page.000003
> 120927 10:38:38 [Note] WSREP: Member 0 (pf1xdb03) synced with group.
> 120927 10:38:38 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 72648)
> 120927 10:38:38 [Note] WSREP: Synchronized with group, ready for
> connections
>
> The joining node apparently caught up, but I'm not sure whether these
> errors mean some transactions could have been missed. The test I was
> using to flood the database is a simple IOPS insert test: it continually
> inserts the same data into the same table with the same keys, then
> truncates those tables and starts all over again. This is Galera 2.1,
> which comes bundled with Percona XtraDB Cluster 5.5.27.
>
> Thanks,
>
> -Luke
>
> --
> Luke Bigum
> Senior Systems Engineer
>
> Information Systems
> luke....@lmax.com | http://www.lmax.com
> LMAX, Yellow Building, 1A Nicholas Road, London W11 4AN

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011