We just had a crash in our 4 node setup, as a result of a discution earlier we've statet adding primary keys to some of our tables, and today we might have added 100 primary keys, but this one made node2-4 puke and exit (following a larger downtime because they all wanted to SST teir 75GB database at once, i plan to remove mysqld from the init scripts and stop using safe_mysqld in order to be able to control the restart of mysql myself).
The query was execute on node1, cause 2-4 to crash, the query was:
ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST
and the scema of abo_errorlogin_log is (after the alter):
| abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL DEFAULT '',
`name` varchar(50) NOT NULL DEFAULT '',
`timestamps` text NOT NULL,
`last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
`sites` varchar(50) NOT NULL DEFAULT '',
`last_site` varchar(2) NOT NULL DEFAULT '',
`institution_key` varchar(255) NOT NULL DEFAULT '',
`tally` int(11) NOT NULL DEFAULT '0',
`email` varchar(100) NOT NULL DEFAULT '',
`not_student` tinyint(1) NOT NULL DEFAULT '0',
`has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
In the logfiles @node1:
Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.13:4567
Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d (tcp://192.168.0.13:4567), attempt 0
Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a (tcp://192.168.0.12:4567), attempt 0
Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to d1ae5b68-1848-11e2-0800-df1784883427 (tcp://192.168.0.14:4567), attempt 0
Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote endpoint tcp://192.168.0.14:4567 changed identity d1ae5b68-1848-11e2-0800-df1784883427 -> 932da75e-192b-11e2-0800-f92adf02e877
Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote endpoint tcp://192.168.0.12:4567 changed identity 779c1072-16e3-11e2-0800-d982ecf3214a -> 9356ae44-192b-11e2-0800-4817043eb975
Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote endpoint tcp://192.168.0.13:4567 changed identity e1b77fc5-16c6-11e2-0800-1ff76a856f4d -> 937a239d-192b-11e2-0800-36e5b1623d99
Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning message relay requesting off
Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: 779c1072-16e3-11e2-0800-d982ecf3214a
Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: d1ae5b68-1848-11e2-0800-df1784883427
[snip]
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
Oct 18 15:56:23 node1 mysqld: } joined {
Oct 18 15:56:23 node1 mysqld: } left {
Oct 18 15:56:23 node1 mysqld: } partitioned {
Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
Oct 18 15:56:23 node1 mysqld: })
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 0 (node1)
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 2 (node2)
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 1 (node4)
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 3 (node3)
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum results:
Oct 18 15:56:24 node1 mysqld: #011version = 2,
Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
Oct 18 15:56:24 node1 mysqld: #011group UUID = eb9d49a2-fe4c-11e1-0800-2738b8212139
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Flow-control interval: [16, 32]
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New cluster view: global state: eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: Primary, number of nodes: 4, my index: 0, protocol version 2
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Assign initial position for certification: 3217313, protocol version: 2
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Node 1 (node4) requested state transfer from '*any*'. Selected 0 (node1)(SYNCED) as donor.
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3217358)
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Running: 'wsrep_sst_rsync 'donor' '192.168.0.14:4444/rsync_sst' 'root:password' '/var/lib/mysql/' '/etc/mysql/conf.d/wsrep.cnf' 'eb9d49a2-fe4c-11e1-0800-2738b8212139' '3217358' '0''
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: sst_donor_thread signaled with 0
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Flushing tables for SST...
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Provider paused at eb9d49a2-fe4c-11e1-0800-2738b8212139:3217358
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Tables flushed.
and node2:
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] Slave SQL: Could not execute Update_rows event on table sugarcrm3.abo_errorlogin_log; Can't find record in 'abo_errorlogin_log', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 312, Error_code: 1032
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 3217313
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 271047 trx_id: 199094775 seqnos (l: 1942351, g: 3217313, s: 3217312, d: 3216422, ts: 1350568568555954215)
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply app buffer: x
Oct 18 15:56:08 node2 mysqld: ?P#023, seqno: 3217313, status: WSREP_FATAL
Oct 18 15:56:08 node2 mysqld: #011 at galera/src/replicator_smm.cpp:apply_wscoll():50
Oct 18 15:56:08 node2 mysqld: #011 at galera/src/replicator_smm.cpp:apply_trx_ws():121
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Node consistency compromized, aborting...
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closing send monitor...
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closed send monitor.
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: terminating thread
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: joining thread
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: closing backend
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: view(view_id(NON_PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836) memb {
Oct 18 15:56:10 node2 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
Oct 18 15:56:10 node2 mysqld: } joined {
Oct 18 15:56:10 node2 mysqld: } left {
Oct 18 15:56:10 node2
i am very interested to know if you find a reason for this ?...i have xtradb cluster in production and would like to be aware if anything i am doing wrong
On Thursday, October 18, 2012 11:24:22 AM UTC-4, mc wrote:
> We just had a crash in our 4 node setup, as a result of a discution > earlier we've statet adding primary keys to some of our tables, and today > we might have added 100 primary keys, but this one made node2-4 puke and > exit (following a larger downtime because they all wanted to SST teir 75GB > database at once, i plan to remove mysqld from the init scripts and stop > using safe_mysqld in order to be able to control the restart of mysql > myself).
> The query was execute on node1, cause 2-4 to crash, the query was:
> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL AUTO_INCREMENT > PRIMARY KEY FIRST
> and the scema of abo_errorlogin_log is (after the alter):
> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` ( > `id` int(11) NOT NULL AUTO_INCREMENT, > `username` varchar(30) NOT NULL DEFAULT '', > `name` varchar(50) NOT NULL DEFAULT '', > `timestamps` text NOT NULL, > `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0', > `sites` varchar(50) NOT NULL DEFAULT '', > `last_site` varchar(2) NOT NULL DEFAULT '', > `institution_key` varchar(255) NOT NULL DEFAULT '', > `tally` int(11) NOT NULL DEFAULT '0', > `email` varchar(100) NOT NULL DEFAULT '', > `not_student` tinyint(1) NOT NULL DEFAULT '0', > `has_been_changed` tinyint(1) NOT NULL DEFAULT '0', > PRIMARY KEY (`id`) > ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
> In the logfiles @node1:
> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning > message relay requesting on, nonlive peers: tcp://192.168.0.13:4567 > Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting > to e1b77fc5-16c6-11e2-0800-1ff76a856f4d (tcp://192.168.0.13:4567), > attempt 0 > Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting > to 779c1072-16e3-11e2-0800-d982ecf3214a (tcp://192.168.0.12:4567), > attempt 0 > Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting > to d1ae5b68-1848-11e2-0800-df1784883427 (tcp://192.168.0.14:4567), > attempt 0 > Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote > endpoint tcp://192.168.0.14:4567 changed identity > d1ae5b68-1848-11e2-0800-df1784883427 -> 932da75e-192b-11e2-0800-f92adf02e877 > Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote > endpoint tcp://192.168.0.12:4567 changed identity > 779c1072-16e3-11e2-0800-d982ecf3214a -> 9356ae44-192b-11e2-0800-4817043eb975 > Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote > endpoint tcp://192.168.0.13:4567 changed identity > e1b77fc5-16c6-11e2-0800-1ff76a856f4d -> 937a239d-192b-11e2-0800-36e5b1623d99 > Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning > message relay requesting off > Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: > evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, > view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: > 779c1072-16e3-11e2-0800-d982ecf3214a > Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: > evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, > view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: > d1ae5b68-1848-11e2-0800-df1784883427 > [snip] > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: > evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, > view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: > e1b77fc5-16c6-11e2-0800-1ff76a856f4d > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring > 932da75e-192b-11e2-0800-f92adf02e877 stable > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring > 9356ae44-192b-11e2-0800-4817043eb975 stable > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring > 937a239d-192b-11e2-0800-36e5b1623d99 stable > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: > view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb { > Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, > Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877, > Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975, > Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99, > Oct 18 15:56:23 node1 mysqld: } joined { > Oct 18 15:56:23 node1 mysqld: } left { > Oct 18 15:56:23 node1 mysqld: } partitioned { > Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a, > Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427, > Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d, > Oct 18 15:56:23 node1 mysqld: }) > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New COMPONENT: > primary = yes, bootstrap = no, my_idx = 0, memb_num = 4 > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: > STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9 > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 0 (node1) > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 2 (node2) > Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 1 (node4) > Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 3 (node3) > Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum results: > Oct 18 15:56:24 node1 mysqld: #011version = 2, > Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY, > Oct 18 15:56:24 node1 mysqld: #011conf_id = 7, > Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total), > Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313, > Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424, > Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl), > Oct 18 15:56:24 node1 mysqld: #011group UUID = > eb9d49a2-fe4c-11e1-0800-2738b8212139 > Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Flow-control > interval: [16, 32] > Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New cluster > view: global state: eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: > Primary, number of nodes: 4, my index: 0, protocol version 2 > Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: > wsrep_notify_cmd is not defined, skipping notification. > Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Assign initial > position for certification: 3217313, protocol version: 2 > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Node 1 (node4) > requested state transfer from '*any*'. Selected 0 (node1)(SYNCED) as donor. > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Shifting > SYNCED -> DONOR/DESYNCED (TO: 3217358) > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: > wsrep_notify_cmd is not defined, skipping notification. > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Running: > 'wsrep_sst_rsync 'donor' '192.168.0.14:4444/rsync_sst' 'root:password' > '/var/lib/mysql/' '/etc/mysql/conf.d/wsrep.cnf' > 'eb9d49a2-fe4c-11e1-0800-2738b8212139' '3217358' '0'' > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: > sst_donor_thread signaled with 0 > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Flushing > tables for SST... > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Provider > paused at eb9d49a2-fe4c-11e1-0800-2738b8212139:3217358 > Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Tables flushed.
> and node2: > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] Slave SQL: Could not > execute Update_rows event on table sugarcrm3.abo_errorlogin_log; Can't find > record in 'abo_errorlogin_log', Error_code: 1032; handler error > HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 312, > Error_code: 1032 > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Warning] WSREP: RBR event 2 > Update_rows apply warning: 120, 3217313 > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to > apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version: 2 local: 0 > state: CERTIFYING flags: 1 conn_id: 271047 trx_id: 199094775 seqnos (l: > 1942351, g: 3217313, s: 3217312, d: 3216422, ts: 1350568568555954215) > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to > apply app buffer: x > Oct 18 15:56:08 node2 mysqld: ?P#023, seqno: 3217313, status: WSREP_FATAL > Oct 18 15:56:08 node2 mysqld: #011 at > galera/src/replicator_smm.cpp:apply_wscoll():50 > Oct 18 15:56:08 node2 mysqld: #011 at > galera/src/replicator_smm.cpp:apply_trx_ws():121 > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Node > consistency compromized, aborting... > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closing send > monitor... > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closed send > monitor. > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: > terminating thread > Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: joining > thread > Oct 18 15:56:08 node2
> We just had a crash in our 4 node setup, as a result of a discution > earlier we've statet adding primary keys to some of our tables, and > today we might have added 100 primary keys, but this one made node2-4 > puke and exit (following a larger downtime because they all wanted to > SST teir 75GB database at once, i plan to remove mysqld from the init > scripts and stop using safe_mysqld in order to be able to control the > restart of mysql myself).
> The query was execute on node1, cause 2-4 to crash, the query was:
> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL AUTO_INCREMENT > PRIMARY KEY FIRST
> and the scema of abo_errorlogin_log is (after the alter):
> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
> `id` int(11) NOT NULL AUTO_INCREMENT,
> `username` varchar(30) NOT NULL DEFAULT '',
> `name` varchar(50) NOT NULL DEFAULT '',
> `timestamps` text NOT NULL,
> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
> `sites` varchar(50) NOT NULL DEFAULT '',
> `last_site` varchar(2) NOT NULL DEFAULT '',
> `institution_key` varchar(255) NOT NULL DEFAULT '',
> `tally` int(11) NOT NULL DEFAULT '0',
> `email` varchar(100) NOT NULL DEFAULT '',
> `not_student` tinyint(1) NOT NULL DEFAULT '0',
> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
> PRIMARY KEY (`id`)
> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
> In the logfiles @node1:
> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning > message relay requesting on, nonlive peers: tcp://192.168.0.13:4567
> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') > reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d > (tcp://192.168.0.13:4567), attempt 0
> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') > reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a > (tcp://192.168.0.12:4567), attempt 0
> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') > reconnecting to d1ae5b68-1848-11e2-0800-df1784883427 > (tcp://192.168.0.14:4567), attempt 0
> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote > endpoint tcp://192.168.0.14:4567 changed identity > d1ae5b68-1848-11e2-0800-df1784883427 -> > 932da75e-192b-11e2-0800-f92adf02e877
> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote > endpoint tcp://192.168.0.12:4567 changed identity > 779c1072-16e3-11e2-0800-d982ecf3214a -> > 9356ae44-192b-11e2-0800-4817043eb975
> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote > endpoint tcp://192.168.0.13:4567 changed identity > e1b77fc5-16c6-11e2-0800-1ff76a856f4d -> > 937a239d-192b-11e2-0800-36e5b1623d99
> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: > (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning > message relay requesting off
> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: > evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, > view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting > node: 779c1072-16e3-11e2-0800-d982ecf3214a
> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: > evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, > view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting > node: d1ae5b68-1848-11e2-0800-df1784883427
> [snip]
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: > evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, > view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting > node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring > 932da75e-192b-11e2-0800-f92adf02e877 stable
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring > 9356ae44-192b-11e2-0800-4817043eb975 stable
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring > 937a239d-192b-11e2-0800-36e5b1623d99 stable
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: > view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
> Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
> Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
> Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
> Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
> Oct 18 15:56:23 node1 mysqld: } joined {
> Oct 18 15:56:23 node1 mysqld: } left {
> Oct 18 15:56:23 node1 mysqld: } partitioned {
> Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
> Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
> Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
> Oct 18 15:56:23 node1 mysqld: })
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New > COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: > STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 0 > (node1)
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 2 > (node2)
> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 1 > (node4)
> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE > EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 3 > (node3)
> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum > results:
> Oct 18 15:56:24 node1 mysqld: #011version = 2,
> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
> Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
> Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
> Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
> Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
> Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
> Oct 18 15:56:24 node1 mysqld: #011group UUID = > eb9d49a2-fe4c-11e1-0800-2738b8212139
> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: > Flow-control interval: [16, 32]
> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New > cluster view: global state: > eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: Primary, number > of nodes: 4, my index: 0, protocol version 2
> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: > wsrep_notify_cmd is not defined, skipping notification.
> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Assign > initial position for certification: 3217313, protocol version: 2
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Node 1 > (node4) requested state transfer from '*any*'. Selected 0 > (node1)(SYNCED) as donor.
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Shifting > SYNCED -> DONOR/DESYNCED (TO: 3217358)
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: > wsrep_notify_cmd is not defined, skipping notification.
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Running: > 'wsrep_sst_rsync 'donor' '192.168.0.14:4444/rsync_sst' 'root:password' > '/var/lib/mysql/' '/etc/mysql/conf.d/wsrep.cnf' > 'eb9d49a2-fe4c-11e1-0800-2738b8212139' '3217358' '0''
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: > sst_donor_thread signaled with 0
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Flushing > tables for SST...
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Provider > paused at eb9d49a2-fe4c-11e1-0800-2738b8212139:3217358
> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Tables > flushed.
> and node2:
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] Slave SQL: Could > not execute Update_rows event on table sugarcrm3.abo_errorlogin_log; > Can't find record in 'abo_errorlogin_log', Error_code: 1032; handler > error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos > 312, Error_code: 1032
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Warning] WSREP: RBR > event 2 Update_rows apply warning: 120, 3217313
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to > apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version: 2 > local: 0 state: CERTIFYING flags: 1 conn_id: 271047 trx_id: 199094775 > seqnos (l: 1942351, g: 3217313, s: 3217312, d: 3216422, ts: > 1350568568555954215)
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to > apply app buffer: x
> Oct 18 15:56:08 node2 mysqld: ?P#023, seqno: 3217313, status: WSREP_FATAL
> Oct 18 15:56:08 node2 mysqld: #011 at > galera/src/replicator_smm.cpp:apply_wscoll():50
> Oct 18 15:56:08 node2 mysqld: #011 at > galera/src/replicator_smm.cpp:apply_trx_ws():121
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Node > consistency compromized, aborting...
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closing > send monitor...
> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closed > send monitor.
> Oct 18 15:56:08
This bug is in "Fix Committed" state. It means that fix is in the
development branch, but there is not yet a publicly available build.
Target release milestone is 5.5.27-23.7 and the bug state will change
to "Fix Released" when this build will be released.
> If so will upgrading to latest release fix this problem?
> Den 18/10/12 17.24, Mikkel Christensen skrev:
>> We just had a crash in our 4 node setup, as a result of a discution
>> earlier we've statet adding primary keys to some of our tables, and
>> today we might have added 100 primary keys, but this one made
>> node2-4 puke and exit (following a larger downtime because they all
>> wanted to SST teir 75GB database at once, i plan to remove mysqld
>> from the init scripts and stop using safe_mysqld in order to be
>> able to control the restart of mysql myself).
>> The query was execute on node1, cause 2-4 to crash, the query was:
>> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL
>> AUTO_INCREMENT PRIMARY KEY FIRST
>> and the scema of abo_errorlogin_log is (after the alter):
>> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
>> `id` int(11) NOT NULL AUTO_INCREMENT,
>> `username` varchar(30) NOT NULL DEFAULT '',
>> `name` varchar(50) NOT NULL DEFAULT '',
>> `timestamps` text NOT NULL,
>> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
>> `sites` varchar(50) NOT NULL DEFAULT '',
>> `last_site` varchar(2) NOT NULL DEFAULT '',
>> `institution_key` varchar(255) NOT NULL DEFAULT '',
>> `tally` int(11) NOT NULL DEFAULT '0',
>> `email` varchar(100) NOT NULL DEFAULT '',
>> `not_student` tinyint(1) NOT NULL DEFAULT '0',
>> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
>> PRIMARY KEY (`id`)
>> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
>> In the logfiles @node1:
>> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP:
>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>> turning message relay requesting on, nonlive peers:
>> tcp://192.168.0.13:4567
>> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP:
>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>> reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>> (tcp://192.168.0.13:4567), attempt 0
>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>> reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a
>> (tcp://192.168.0.12:4567), attempt 0
>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>> reconnecting to d1ae5b68-1848-11e2-0800-df1784883427
>> (tcp://192.168.0.14:4567), attempt 0
>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote
>> endpoint tcp://192.168.0.14:4567 changed identity
>> d1ae5b68-1848-11e2-0800-df1784883427 ->
>> 932da75e-192b-11e2-0800-f92adf02e877
>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote
>> endpoint tcp://192.168.0.12:4567 changed identity
>> 779c1072-16e3-11e2-0800-d982ecf3214a ->
>> 9356ae44-192b-11e2-0800-4817043eb975
>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote
>> endpoint tcp://192.168.0.13:4567 changed identity
>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d ->
>> 937a239d-192b-11e2-0800-36e5b1623d99
>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>> turning message relay requesting off
>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>> node: 779c1072-16e3-11e2-0800-d982ecf3214a
>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>> node: d1ae5b68-1848-11e2-0800-df1784883427
>> [snip]
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>> node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>> declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>> declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>> declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>> view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
>> Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
>> Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
>> Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
>> Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
>> Oct 18 15:56:23 node1 mysqld: } joined {
>> Oct 18 15:56:23 node1 mysqld: } left {
>> Oct 18 15:56:23 node1 mysqld: } partitioned {
>> Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
>> Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
>> Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
>> Oct 18 15:56:23 node1 mysqld: })
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New
>> COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>> STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>> EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>> 0 (node1)
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>> 2 (node2)
>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>> 1 (node4)
>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE
>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>> 3 (node3)
>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum results:
>> Oct 18 15:56:24 node1 mysqld: #011version = 2,
>> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
>> Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
>> Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
>> Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
>> Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
>> Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
>> Oct 18 15:56:24 node1 mysqld: #011group UUID =
>> eb9d49a2-fe4c-11e1-0800-2738b8212139
>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>> Flow-control interval: [16, 32]
>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New
>> cluster view: global state:
>> eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: Primary,
>> number of nodes: 4, my index: 0, protocol version 2
>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>> wsrep_notify_cmd is not defined, skipping notification.
>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Assign
>> initial position for certification: 3217313, protocol version: 2
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Node 1
>> (node4) requested state transfer from '*any*'. Selected 0
>> (node1)(SYNCED) as donor.
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>> Shifting SYNCED -> DONOR/DESYNCED (TO: 3217358)
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>> wsrep_notify_cmd is not defined, skipping notification.
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>> Running: 'wsrep_sst_rsync 'donor' '192.168.0.14:4444/rsync_sst'
>> 'root:password' '/var/lib/mysql/' '/etc/mysql/conf.d/wsrep.cnf'
>> 'eb9d49a2-fe4c-11e1-0800-2738b8212139' '3217358' '0''
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>> sst_donor_thread signaled with 0
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>> Flushing tables for SST...
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>> Provider paused at eb9d49a2-fe4c-11e1-0800-2738b8212139:3217358
>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Tables flushed.
>> and node2:
>> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] Slave SQL:
>> Could not execute Update_rows event on table
>> sugarcrm3.abo_errorlogin_log; Can't find record in
>> 'abo_errorlogin_log', Error_code: 1032; handler error
>> HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos
>> 312, Error_code: 1032
>> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Warning] WSREP: RBR
>> event 2 Update_rows apply warning: 120, 3217313
>> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed
>> to apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version:
>> 2 local: 0 state: CERTIFYING flags: 1 conn_id: 271047 trx_id:
>> 199094775 seqnos (l: 1942351, g: 3217313, s: 3217312, d: 3216422,
>> ts: 1350568568555954215)
>> Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed
>> to apply
Hold your horses! I see "interesting" behavior when testing ALTER to
create primary key:
Node 1:
=======
mysql> create table t (i int);
Query OK, 0 rows affected (0.37 sec)
mysql> insert into t values (1),(2),(3),(4),(5);
Query OK, 5 rows affected (0.00 sec)
mysql> ALTER TABLE `t` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY
KEY FIRST;Query OK, 5 rows affected (0.58 sec)
mysql> select * from t;
+----+------+
| id | i |
+----+------+
| 1 | 1 |
| 3 | 2 |
| 5 | 3 |
| 7 | 4 |
| 9 | 5 |
+----+------+
5 rows in set (0.00 sec)
Node 2:
=======
mysql> select * from t;
+----+------+
| id | i |
+----+------+
| 2 | 1 |
| 4 | 2 |
| 6 | 3 |
| 8 | 4 |
| 10 | 5 |
+----+------+
5 rows in set (0.00 sec)
=> reason for this inconsistency is in automatic control of autoinc
sequences. If such ALTER statement should be used, the autoinc control
must be turned off, and make sure every node has same
auto_increment_increment and auto_increment_offset values. I'm not
sure if we can prepare a fix to avoid this vulnerability, it may be
hard to diagnose.
> This bug is in "Fix Committed" state. It means that fix is in the
> development branch, but there is not yet a publicly available build.
> Target release milestone is 5.5.27-23.7 and the bug state will
> change to "Fix Released" when this build will be released.
>> If so will upgrading to latest release fix this problem?
>> Den 18/10/12 17.24, Mikkel Christensen skrev:
>>> We just had a crash in our 4 node setup, as a result of a
>>> discution earlier we've statet adding primary keys to some of our
>>> tables, and today we might have added 100 primary keys, but this
>>> one made node2-4 puke and exit (following a larger downtime
>>> because they all wanted to SST teir 75GB database at once, i plan
>>> to remove mysqld from the init scripts and stop using safe_mysqld
>>> in order to be able to control the restart of mysql myself).
>>> The query was execute on node1, cause 2-4 to crash, the query was:
>>> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL
>>> AUTO_INCREMENT PRIMARY KEY FIRST
>>> and the scema of abo_errorlogin_log is (after the alter):
>>> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
>>> `id` int(11) NOT NULL AUTO_INCREMENT,
>>> `username` varchar(30) NOT NULL DEFAULT '',
>>> `name` varchar(50) NOT NULL DEFAULT '',
>>> `timestamps` text NOT NULL,
>>> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
>>> `sites` varchar(50) NOT NULL DEFAULT '',
>>> `last_site` varchar(2) NOT NULL DEFAULT '',
>>> `institution_key` varchar(255) NOT NULL DEFAULT '',
>>> `tally` int(11) NOT NULL DEFAULT '0',
>>> `email` varchar(100) NOT NULL DEFAULT '',
>>> `not_student` tinyint(1) NOT NULL DEFAULT '0',
>>> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
>>> PRIMARY KEY (`id`)
>>> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
>>> In the logfiles @node1:
>>> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP:
>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>> turning message relay requesting on, nonlive peers:
>>> tcp://192.168.0.13:4567
>>> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP:
>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>> reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>> (tcp://192.168.0.13:4567), attempt 0
>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>> reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a
>>> (tcp://192.168.0.12:4567), attempt 0
>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>> reconnecting to d1ae5b68-1848-11e2-0800-df1784883427
>>> (tcp://192.168.0.14:4567), attempt 0
>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote
>>> endpoint tcp://192.168.0.14:4567 changed identity
>>> d1ae5b68-1848-11e2-0800-df1784883427 ->
>>> 932da75e-192b-11e2-0800-f92adf02e877
>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote
>>> endpoint tcp://192.168.0.12:4567 changed identity
>>> 779c1072-16e3-11e2-0800-d982ecf3214a ->
>>> 9356ae44-192b-11e2-0800-4817043eb975
>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote
>>> endpoint tcp://192.168.0.13:4567 changed identity
>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d ->
>>> 937a239d-192b-11e2-0800-36e5b1623d99
>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>> turning message relay requesting off
>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>>> node: 779c1072-16e3-11e2-0800-d982ecf3214a
>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>>> node: d1ae5b68-1848-11e2-0800-df1784883427
>>> [snip]
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>>> node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>> declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>> declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>> declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>> view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
>>> Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
>>> Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
>>> Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
>>> Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
>>> Oct 18 15:56:23 node1 mysqld: } joined {
>>> Oct 18 15:56:23 node1 mysqld: } left {
>>> Oct 18 15:56:23 node1 mysqld: } partitioned {
>>> Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
>>> Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
>>> Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
>>> Oct 18 15:56:23 node1 mysqld: })
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New
>>> COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>> STATE_EXCHANGE: sent state UUID:
>>> 9a50a41c-192b-11e2-0800-b239f13b62a9
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>> EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>>> 0 (node1)
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>>> 2 (node2)
>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>>> 1 (node4)
>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE
>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from
>>> 3 (node3)
>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum results:
>>> Oct 18 15:56:24 node1 mysqld: #011version = 2,
>>> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
>>> Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
>>> Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
>>> Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
>>> Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
>>> Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
>>> Oct 18 15:56:24 node1 mysqld: #011group UUID =
>>> eb9d49a2-fe4c-11e1-0800-2738b8212139
>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>> Flow-control interval: [16, 32]
>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New
>>> cluster view: global state:
>>> eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: Primary,
>>> number of nodes: 4, my index: 0, protocol version 2
>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>> wsrep_notify_cmd is not defined, skipping notification.
>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Assign
>>> initial position for certification: 3217313, protocol version: 2
>>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Node 1
>>> (node4) requested state transfer from '*any*'. Selected 0
>>> (node1)(SYNCED) as donor.
>>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>>> Shifting SYNCED -> DONOR/DESYNCED (TO: 3217358)
>>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>>> wsrep_notify_cmd is not defined, skipping notification.
>>> Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP:
>>> Running: 'wsrep_sst_rsync 'donor' '192.168.0.14:4444/rsync_sst'
>>> 'root:password' '/var/lib/mysql/'
But as said, cannot promise a fix for this yet. As a potential fix, We
could refuse to add an autoinc column if increment or offset are not
default values.
> => reason for this inconsistency is in automatic control of autoinc
> sequences. If such ALTER statement should be used, the autoinc
> control must be turned off, and make sure every node has same
> auto_increment_increment and auto_increment_offset values. I'm not
> sure if we can prepare a fix to avoid this vulnerability, it may be
> hard to diagnose.
>> This bug is in "Fix Committed" state. It means that fix is in the
>> development branch, but there is not yet a publicly available
>> build. Target release milestone is 5.5.27-23.7 and the bug state
>> will change to "Fix Released" when this build will be released.
>>> If so will upgrading to latest release fix this problem?
>>> Den 18/10/12 17.24, Mikkel Christensen skrev:
>>>> We just had a crash in our 4 node setup, as a result of a
>>>> discution earlier we've statet adding primary keys to some of our
>>>> tables, and today we might have added 100 primary keys, but this
>>>> one made node2-4 puke and exit (following a larger downtime
>>>> because they all wanted to SST teir 75GB database at once, i plan
>>>> to remove mysqld from the init scripts and stop using safe_mysqld
>>>> in order to be able to control the restart of mysql myself).
>>>> The query was execute on node1, cause 2-4 to crash, the query was:
>>>> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL
>>>> AUTO_INCREMENT PRIMARY KEY FIRST
>>>> and the scema of abo_errorlogin_log is (after the alter):
>>>> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
>>>> `id` int(11) NOT NULL AUTO_INCREMENT,
>>>> `username` varchar(30) NOT NULL DEFAULT '',
>>>> `name` varchar(50) NOT NULL DEFAULT '',
>>>> `timestamps` text NOT NULL,
>>>> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
>>>> `sites` varchar(50) NOT NULL DEFAULT '',
>>>> `last_site` varchar(2) NOT NULL DEFAULT '',
>>>> `institution_key` varchar(255) NOT NULL DEFAULT '',
>>>> `tally` int(11) NOT NULL DEFAULT '0',
>>>> `email` varchar(100) NOT NULL DEFAULT '',
>>>> `not_student` tinyint(1) NOT NULL DEFAULT '0',
>>>> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
>>>> PRIMARY KEY (`id`)
>>>> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
>>>> In the logfiles @node1:
>>>> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP:
>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>> turning message relay requesting on, nonlive peers:
>>>> tcp://192.168.0.13:4567
>>>> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP:
>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>> reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>>> (tcp://192.168.0.13:4567), attempt 0
>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>> reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a
>>>> (tcp://192.168.0.12:4567), attempt 0
>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>> reconnecting to d1ae5b68-1848-11e2-0800-df1784883427
>>>> (tcp://192.168.0.14:4567), attempt 0
>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>> remote endpoint tcp://192.168.0.14:4567 changed identity
>>>> d1ae5b68-1848-11e2-0800-df1784883427 ->
>>>> 932da75e-192b-11e2-0800-f92adf02e877
>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>> remote endpoint tcp://192.168.0.12:4567 changed identity
>>>> 779c1072-16e3-11e2-0800-d982ecf3214a ->
>>>> 9356ae44-192b-11e2-0800-4817043eb975
>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>> remote endpoint tcp://192.168.0.13:4567 changed identity
>>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d ->
>>>> 937a239d-192b-11e2-0800-36e5b1623d99
>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>> turning message relay requesting off
>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>>>> node: 779c1072-16e3-11e2-0800-d982ecf3214a
>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>>>> node: d1ae5b68-1848-11e2-0800-df1784883427
>>>> [snip]
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting
>>>> node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>> declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>> declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>> declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>> view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
>>>> Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
>>>> Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
>>>> Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
>>>> Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
>>>> Oct 18 15:56:23 node1 mysqld: } joined {
>>>> Oct 18 15:56:23 node1 mysqld: } left {
>>>> Oct 18 15:56:23 node1 mysqld: } partitioned {
>>>> Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
>>>> Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
>>>> Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
>>>> Oct 18 15:56:23 node1 mysqld: })
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New
>>>> COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>> STATE_EXCHANGE: sent state UUID:
>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>> EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>> from 0 (node1)
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>> from 2 (node2)
>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>> from 1 (node4)
>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE
>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>> from 3 (node3)
>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>>> Quorum results:
>>>> Oct 18 15:56:24 node1 mysqld: #011version = 2,
>>>> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
>>>> Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
>>>> Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
>>>> Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
>>>> Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
>>>> Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
>>>> Oct 18 15:56:24 node1 mysqld: #011group UUID =
>>>> eb9d49a2-fe4c-11e1-0800-2738b8212139
>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>>> Flow-control interval: [16, 32]
>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New
>>>> cluster view: global state:
>>>> eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: Primary,
>>>> number of nodes: 4, my index: 0, protocol version 2
>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>>> wsrep_notify_cmd is not defined, skipping notification.
>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>>> Assign initial position for certification: 3217313, protocol
>>>> version:
> But as said, cannot promise a fix for this yet. As a potential fix, We > could refuse to add an autoinc column if increment or offset are not > default values.
So i can fix this issue by making sure that auto_increment:offsets are equal on all nodes?
>> => reason for this inconsistency is in automatic control of autoinc >> sequences. If such ALTER statement should be used, the autoinc >> control must be turned off, and make sure every node has same >> auto_increment_increment and auto_increment_offset values. I'm not >> sure if we can prepare a fix to avoid this vulnerability, it may be >> hard to diagnose.
So we should avoid adding auto_increment columns to tables who allready has content to make sure that this doesn't happend?
Or what should i tell the developpers to (not) do to avoid it to crash again?
>>> This bug is in "Fix Committed" state. It means that fix is in the >>> development branch, but there is not yet a publicly available build. >>> Target release milestone is 5.5.27-23.7 and the bug state will >>> change to "Fix Released" when this build will be released.
>>>> If so will upgrading to latest release fix this problem?
>>>> Den 18/10/12 17.24, Mikkel Christensen skrev:
>>>>> We just had a crash in our 4 node setup, as a result of a >>>>> discution earlier we've statet adding primary keys to some of our >>>>> tables, and today we might have added 100 primary keys, but this >>>>> one made node2-4 puke and exit (following a larger downtime >>>>> because they all wanted to SST teir 75GB database at once, i plan >>>>> to remove mysqld from the init scripts and stop using safe_mysqld >>>>> in order to be able to control the restart of mysql myself).
>>>>> The query was execute on node1, cause 2-4 to crash, the query was:
>>>>> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL >>>>> AUTO_INCREMENT PRIMARY KEY FIRST
>>>>> and the scema of abo_errorlogin_log is (after the alter):
>>>>> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
>>>>> `id` int(11) NOT NULL AUTO_INCREMENT,
>>>>> `username` varchar(30) NOT NULL DEFAULT '',
>>>>> `name` varchar(50) NOT NULL DEFAULT '',
>>>>> `timestamps` text NOT NULL,
>>>>> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
>>>>> `sites` varchar(50) NOT NULL DEFAULT '',
>>>>> `last_site` varchar(2) NOT NULL DEFAULT '',
>>>>> `institution_key` varchar(255) NOT NULL DEFAULT '',
>>>>> `tally` int(11) NOT NULL DEFAULT '0',
>>>>> `email` varchar(100) NOT NULL DEFAULT '',
>>>>> `not_student` tinyint(1) NOT NULL DEFAULT '0',
>>>>> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
>>>>> PRIMARY KEY (`id`)
>>>>> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
>>>>> In the logfiles @node1:
>>>>> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP: >>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') >>>>> turning message relay requesting on, nonlive peers: >>>>> tcp://192.168.0.13:4567
>>>>> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP: >>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') >>>>> reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d >>>>> (tcp://192.168.0.13:4567), attempt 0
>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: >>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') >>>>> reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a >>>>> (tcp://192.168.0.12:4567), attempt 0
>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: >>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') >>>>> reconnecting to d1ae5b68-1848-11e2-0800-df1784883427 >>>>> (tcp://192.168.0.14:4567), attempt 0
>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote >>>>> endpoint tcp://192.168.0.14:4567 changed identity >>>>> d1ae5b68-1848-11e2-0800-df1784883427 -> >>>>> 932da75e-192b-11e2-0800-f92adf02e877
>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote >>>>> endpoint tcp://192.168.0.12:4567 changed identity >>>>> 779c1072-16e3-11e2-0800-d982ecf3214a -> >>>>> 9356ae44-192b-11e2-0800-4817043eb975
>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote >>>>> endpoint tcp://192.168.0.13:4567 changed identity >>>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d -> >>>>> 937a239d-192b-11e2-0800-36e5b1623d99
>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: >>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') >>>>> turning message relay requesting off
>>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: >>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, >>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting >>>>> node: 779c1072-16e3-11e2-0800-d982ecf3214a
>>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: >>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, >>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting >>>>> node: d1ae5b68-1848-11e2-0800-df1784883427
>>>>> [snip]
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: >>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, >>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting >>>>> node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: >>>>> declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: >>>>> declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: >>>>> declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: >>>>> view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #011932da75e-192b-11e2-0800-f92adf02e877,
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #0119356ae44-192b-11e2-0800-4817043eb975,
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #011937a239d-192b-11e2-0800-36e5b1623d99,
>>>>> Oct 18 15:56:23 node1 mysqld: } joined {
>>>>> Oct 18 15:56:23 node1 mysqld: } left {
>>>>> Oct 18 15:56:23 node1 mysqld: } partitioned {
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #011779c1072-16e3-11e2-0800-d982ecf3214a,
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #011d1ae5b68-1848-11e2-0800-df1784883427,
>>>>> Oct 18 15:56:23 node1 mysqld: >>>>> #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
>>>>> Oct 18 15:56:23 node1 mysqld: })
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New >>>>> COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: >>>>> STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE >>>>> EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE >>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from >>>>> 0 (node1)
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE >>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from >>>>> 2 (node2)
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE >>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from >>>>> 1 (node4)
>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE >>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from >>>>> 3 (node3)
>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum >>>>> results:
>>>>> Oct 18 15:56:24 node1 mysqld: #011version = 2,
>>>>> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
>>>>> Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
>>>>> Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
>>>>> Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
>>>>> Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
>>>>> Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
>>>>> Oct 18 15:56:24 node1 mysqld: #011group UUID = >>>>> eb9d49a2-fe4c-11e1-0800-2738b8212139
>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24
>> But as said, cannot promise a fix for this yet. As a potential fix,
>> We could refuse to add an autoinc column if increment or offset are
>> not default values.
> So i can fix this issue by making sure that auto_increment:offsets
> are equal on all nodes?
>>> => reason for this inconsistency is in automatic control of
>>> autoinc sequences. If such ALTER statement should be used, the
>>> autoinc control must be turned off, and make sure every node has
>>> same auto_increment_increment and auto_increment_offset values.
>>> I'm not sure if we can prepare a fix to avoid this vulnerability,
>>> it may be hard to diagnose.
> So we should avoid adding auto_increment columns to tables who
> allready has content to make sure that this doesn't happend?
> Or what should i tell the developpers to (not) do to avoid it to crash again?
>>>> This bug is in "Fix Committed" state. It means that fix is in the
>>>> development branch, but there is not yet a publicly available
>>>> build. Target release milestone is 5.5.27-23.7 and the bug state
>>>> will change to "Fix Released" when this build will be released.
>>>>> If so will upgrading to latest release fix this problem?
>>>>> Den 18/10/12 17.24, Mikkel Christensen skrev:
>>>>>> We just had a crash in our 4 node setup, as a result of a
>>>>>> discution earlier we've statet adding primary keys to some of
>>>>>> our tables, and today we might have added 100 primary keys, but
>>>>>> this one made node2-4 puke and exit (following a larger
>>>>>> downtime because they all wanted to SST teir 75GB database at
>>>>>> once, i plan to remove mysqld from the init scripts and stop
>>>>>> using safe_mysqld in order to be able to control the restart of
>>>>>> mysql myself).
>>>>>> The query was execute on node1, cause 2-4 to crash, the query was:
>>>>>> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL
>>>>>> AUTO_INCREMENT PRIMARY KEY FIRST
>>>>>> and the scema of abo_errorlogin_log is (after the alter):
>>>>>> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
>>>>>> `id` int(11) NOT NULL AUTO_INCREMENT,
>>>>>> `username` varchar(30) NOT NULL DEFAULT '',
>>>>>> `name` varchar(50) NOT NULL DEFAULT '',
>>>>>> `timestamps` text NOT NULL,
>>>>>> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
>>>>>> `sites` varchar(50) NOT NULL DEFAULT '',
>>>>>> `last_site` varchar(2) NOT NULL DEFAULT '',
>>>>>> `institution_key` varchar(255) NOT NULL DEFAULT '',
>>>>>> `tally` int(11) NOT NULL DEFAULT '0',
>>>>>> `email` varchar(100) NOT NULL DEFAULT '',
>>>>>> `not_student` tinyint(1) NOT NULL DEFAULT '0',
>>>>>> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
>>>>>> PRIMARY KEY (`id`)
>>>>>> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
>>>>>> In the logfiles @node1:
>>>>>> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP:
>>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>>>> turning message relay requesting on, nonlive peers:
>>>>>> tcp://192.168.0.13:4567
>>>>>> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP:
>>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>>>> reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>>>>> (tcp://192.168.0.13:4567), attempt 0
>>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>>>> reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a
>>>>>> (tcp://192.168.0.12:4567), attempt 0
>>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>>>> reconnecting to d1ae5b68-1848-11e2-0800-df1784883427
>>>>>> (tcp://192.168.0.14:4567), attempt 0
>>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>>>> remote endpoint tcp://192.168.0.14:4567 changed identity
>>>>>> d1ae5b68-1848-11e2-0800-df1784883427 ->
>>>>>> 932da75e-192b-11e2-0800-f92adf02e877
>>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>>>> remote endpoint tcp://192.168.0.12:4567 changed identity
>>>>>> 779c1072-16e3-11e2-0800-d982ecf3214a ->
>>>>>> 9356ae44-192b-11e2-0800-4817043eb975
>>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>>>> remote endpoint tcp://192.168.0.13:4567 changed identity
>>>>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d ->
>>>>>> 937a239d-192b-11e2-0800-36e5b1623d99
>>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567')
>>>>>> turning message relay requesting off
>>>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836))
>>>>>> suspecting node: 779c1072-16e3-11e2-0800-d982ecf3214a
>>>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836))
>>>>>> suspecting node: d1ae5b68-1848-11e2-0800-df1784883427
>>>>>> [snip]
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836))
>>>>>> suspecting node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837)
>>>>>> memb {
>>>>>> Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
>>>>>> Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
>>>>>> Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
>>>>>> Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
>>>>>> Oct 18 15:56:23 node1 mysqld: } joined {
>>>>>> Oct 18 15:56:23 node1 mysqld: } left {
>>>>>> Oct 18 15:56:23 node1 mysqld: } partitioned {
>>>>>> Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
>>>>>> Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
>>>>>> Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
>>>>>> Oct 18 15:56:23 node1 mysqld: })
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New
>>>>>> COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num
>>>>>> = 4
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> STATE_EXCHANGE: sent state UUID:
>>>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> STATE EXCHANGE: sent state msg:
>>>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> STATE EXCHANGE: got state msg:
>>>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9 from 0 (node1)
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> STATE EXCHANGE: got state msg:
>>>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9 from 2 (node2)
>>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>>> STATE EXCHANGE: got state msg:
>>>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9 from 1 (node4)
>>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>>>>> STATE EXCHANGE: got state msg:
>>>>>> 9a50a41c-192b-11e2-0800-b239f13b62a9 from 3 (node3)
>>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP:
>>>>>> Quorum results:
>>>>>> Oct 18 15:56:24 node1 mysqld: #011version = 2,
>>>>>> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
>>>>>> Oct 18
MySQL manual claims that auto_increment_increment and
auto_increment_offset are session variables. To be honest, this gives
me all kinds of wild ideas, but assuming it's true, then Galera could
simply set them to 1 and 0 respectively for all DDL statements. Since
you handle DDL specially anyway, and they are never part of a larger
transaction, it should not be a problem to do this.
Mikkel, note that just turning off wsrep_autoincrement_control
globally is no good either, because then your inserts will not work
correctly. To the tables that already have auto_increment columns,
that is.
> But as said, cannot promise a fix for this yet. As a potential fix, We could
> refuse to add an autoinc column if increment or offset are not default
> values.
>> => reason for this inconsistency is in automatic control of autoinc
>> sequences. If such ALTER statement should be used, the autoinc control must
>> be turned off, and make sure every node has same auto_increment_increment
>> and auto_increment_offset values. I'm not sure if we can prepare a fix to
>> avoid this vulnerability, it may be hard to diagnose.
>>> This bug is in "Fix Committed" state. It means that fix is in the
>>> development branch, but there is not yet a publicly available build. Target
>>> release milestone is 5.5.27-23.7 and the bug state will change to "Fix
>>> Released" when this build will be released.
>>>> If so will upgrading to latest release fix this problem?
>>>> Den 18/10/12 17.24, Mikkel Christensen skrev:
>>>>> We just had a crash in our 4 node setup, as a result of a discution
>>>>> earlier we've statet adding primary keys to some of our tables, and today we
>>>>> might have added 100 primary keys, but this one made node2-4 puke and exit
>>>>> (following a larger downtime because they all wanted to SST teir 75GB
>>>>> database at once, i plan to remove mysqld from the init scripts and stop
>>>>> using safe_mysqld in order to be able to control the restart of mysql
>>>>> myself).
>>>>> The query was execute on node1, cause 2-4 to crash, the query was:
>>>>> ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL AUTO_INCREMENT
>>>>> PRIMARY KEY FIRST
>>>>> and the scema of abo_errorlogin_log is (after the alter):
>>>>> | abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
>>>>> `id` int(11) NOT NULL AUTO_INCREMENT,
>>>>> `username` varchar(30) NOT NULL DEFAULT '',
>>>>> `name` varchar(50) NOT NULL DEFAULT '',
>>>>> `timestamps` text NOT NULL,
>>>>> `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
>>>>> `sites` varchar(50) NOT NULL DEFAULT '',
>>>>> `last_site` varchar(2) NOT NULL DEFAULT '',
>>>>> `institution_key` varchar(255) NOT NULL DEFAULT '',
>>>>> `tally` int(11) NOT NULL DEFAULT '0',
>>>>> `email` varchar(100) NOT NULL DEFAULT '',
>>>>> `not_student` tinyint(1) NOT NULL DEFAULT '0',
>>>>> `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
>>>>> PRIMARY KEY (`id`)
>>>>> ) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |
>>>>> In the logfiles @node1:
>>>>> Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP:
>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning message
>>>>> relay requesting on, nonlive peers: tcp://192.168.0.13:4567
>>>>> Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP:
>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to
>>>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d (tcp://192.168.0.13:4567), attempt 0
>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to
>>>>> 779c1072-16e3-11e2-0800-d982ecf3214a (tcp://192.168.0.12:4567), attempt 0
>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP:
>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to
>>>>> d1ae5b68-1848-11e2-0800-df1784883427 (tcp://192.168.0.14:4567), attempt 0
>>>>> Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote
>>>>> endpoint tcp://192.168.0.14:4567 changed identity
>>>>> d1ae5b68-1848-11e2-0800-df1784883427 -> 932da75e-192b-11e2-0800-f92adf02e877
>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote
>>>>> endpoint tcp://192.168.0.12:4567 changed identity
>>>>> 779c1072-16e3-11e2-0800-d982ecf3214a -> 9356ae44-192b-11e2-0800-4817043eb975
>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote
>>>>> endpoint tcp://192.168.0.13:4567 changed identity
>>>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d -> 937a239d-192b-11e2-0800-36e5b1623d99
>>>>> Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP:
>>>>> (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning message
>>>>> relay requesting off
>>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node:
>>>>> 779c1072-16e3-11e2-0800-d982ecf3214a
>>>>> Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP:
>>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node:
>>>>> d1ae5b68-1848-11e2-0800-df1784883427
>>>>> [snip]
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>> evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER,
>>>>> view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node:
>>>>> e1b77fc5-16c6-11e2-0800-1ff76a856f4d
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring
>>>>> 932da75e-192b-11e2-0800-f92adf02e877 stable
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring
>>>>> 9356ae44-192b-11e2-0800-4817043eb975 stable
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring
>>>>> 937a239d-192b-11e2-0800-36e5b1623d99 stable
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>> view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
>>>>> Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
>>>>> Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
>>>>> Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
>>>>> Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
>>>>> Oct 18 15:56:23 node1 mysqld: } joined {
>>>>> Oct 18 15:56:23 node1 mysqld: } left {
>>>>> Oct 18 15:56:23 node1 mysqld: } partitioned {
>>>>> Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
>>>>> Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
>>>>> Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
>>>>> Oct 18 15:56:23 node1 mysqld: })
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New
>>>>> COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP:
>>>>> STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>>> EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 0 (node1)
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 2 (node2)
>>>>> Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE
>>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 1 (node4)
>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE
>>>>> EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 3 (node3)
>>>>> Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum
>>>>> results:
>>>>> Oct 18 15:56:24 node1 mysqld: #011version = 2,
>>>>> Oct 18 15:56:24 node1 mysqld: #011component = PRIMARY,
>>>>> Oct 18 15:56:24 node1 mysqld: #011conf_id = 7,
>>>>> Oct 18 15:56:24 node1 mysqld: #011members = 1/4 (joined/total),
>>>>> Oct 18 15:56:24 node1 mysqld: #011act_id = 3217313,
>>>>> Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
>>>>> Oct 18 15:56:24 node1 mysqld: #011protocols = 0/4/2 (gcs/repl/appl),
>>>>> Oct 18 15:56:24
> Mikkel, note that just turning off wsrep_autoincrement_control
> globally is no good either, because then your inserts will not work
> correctly. To the tables that already have auto_increment columns,
> that is.
This is not entirely accurate. Inserts WILL work correctly (that is - nodes will stay consistent), except some of them may be rejected due to a cluster-wide conflict (this is the part that your application may not be prepared to handle) - and that's only when you'll be trying to insert to the same table at different nodes at the very much same moment. And even then this might be mitigated for autocommit inserts by setting wsrep_retry_autocommit to, say, 10.
In other words, wsrep_autoincrement_control is just an optional setting to reduce cluster-wide conflicts, and if you direct all inserts to the table to a single node, wsrep_autoincrement_control is not needed at all.
Regards,
Alex
-- Alexey Yurchenko,
Codership Oy, www.codership.com Skype: alexey.yurchenko, Phone: +358-400-516-011
In this case Mikkel is writing to multiple nodes. You agree that not
having wsrep_autoincrement_control=ON would be unsafe, or is Galera
doing some magic to handle it?
<alexey.yurche...@codership.com> wrote:
> On 2012-10-19 21:07, Henrik Ingo wrote:
>> Mikkel, note that just turning off wsrep_autoincrement_control
>> globally is no good either, because then your inserts will not work
>> correctly. To the tables that already have auto_increment columns,
>> that is.
> This is not entirely accurate. Inserts WILL work correctly (that is - nodes
> will stay consistent), except some of them may be rejected due to a
> cluster-wide conflict (this is the part that your application may not be
> prepared to handle) - and that's only when you'll be trying to insert to the
> same table at different nodes at the very much same moment. And even then
> this might be mitigated for autocommit inserts by setting
> wsrep_retry_autocommit to, say, 10.
> In other words, wsrep_autoincrement_control is just an optional setting to
> reduce cluster-wide conflicts, and if you direct all inserts to the table to
> a single node, wsrep_autoincrement_control is not needed at all.
> In this case Mikkel is writing to multiple nodes. You agree that not
> having wsrep_autoincrement_control=ON would be unsafe, or is Galera
> doing some magic to handle it?
Galera detects a conflict and deterministically rolls back all but one transactions trying to insert to the given row. Thus I consider it safe from the data point of view - all nodes will be consistent, and if you get OK on INSERT, you can rest assured that row was inserted. As I mentioned below it is quite likely that application may freak out (how often are you prepared to handle deadlock on inserting the default value into an autoincrement column?), so from that point of view you probably can call it "unsafe" - for application, but I doubt that you will :) With a properly written application you will just get more deadlocks without wsrep_autoincrement_control than with it.
Also note that if you do like 1 insert per second, it is pretty safe in all respects.
> On Wed, Oct 24, 2012 at 8:58 AM, Alex Yurchenko
> <alexey.yurche...@codership.com> wrote:
>> On 2012-10-19 21:07, Henrik Ingo wrote:
>>> Mikkel, note that just turning off wsrep_autoincrement_control
>>> globally is no good either, because then your inserts will not work
>>> correctly. To the tables that already have auto_increment columns,
>>> that is.
>> This is not entirely accurate. Inserts WILL work correctly (that is >> - nodes
>> will stay consistent), except some of them may be rejected due to a
>> cluster-wide conflict (this is the part that your application may >> not be
>> prepared to handle) - and that's only when you'll be trying to >> insert to the
>> same table at different nodes at the very much same moment. And even >> then
>> this might be mitigated for autocommit inserts by setting
>> wsrep_retry_autocommit to, say, 10.
>> In other words, wsrep_autoincrement_control is just an optional >> setting to
>> reduce cluster-wide conflicts, and if you direct all inserts to the >> table to
>> a single node, wsrep_autoincrement_control is not needed at all.
<alexey.yurche...@codership.com> wrote:
> On 2012-10-24 10:07, Henrik Ingo wrote:
>> Alex,
>> In this case Mikkel is writing to multiple nodes. You agree that not
>> having wsrep_autoincrement_control=ON would be unsafe, or is Galera
>> doing some magic to handle it?
> Galera detects a conflict and deterministically rolls back all but one
> transactions trying to insert to the given row. Thus I consider it safe from
> the data point of view - all nodes will be consistent, and if you get OK on
> INSERT, you can rest assured that row was inserted. As I mentioned below it
> is quite likely that application may freak out (how often are you prepared
> to handle deadlock on inserting the default value into an autoincrement
> column?), so from that point of view you probably can call it "unsafe" - for
> application, but I doubt that you will :) With a properly written
> application you will just get more deadlocks without
> wsrep_autoincrement_control than with it.
> Also note that if you do like 1 insert per second, it is pretty safe in all
> respects.
Ah, right.
What I was worried about was that if I
1) insert a row on node A, and it is assigned auto_increment value 1
2) later, not concurrently, insert a row on node B, it is also
assigned auto_increment value 1
2b) -> things fail
You didn't quite answer this, but reading your text I realize that the
above won't happen, because by the time (1) is completed, the
auto_increment counter also on node B is incremented to return 2 on
next call. This happens also in single node MySQL if I just plain
insert a value "1" into a column that is of type auto_increment.)