3 of 4 nodes failed during "alter table"

118 views
Skip to first unread message

Mikkel Christensen

unread,
Oct 18, 2012, 11:24:14 AM10/18/12
to codersh...@googlegroups.com
We just had a crash in our 4 node setup, as a result of a discution earlier we've statet adding primary keys to some of our tables, and today we might have added 100 primary keys, but this one made node2-4 puke and exit (following a larger downtime because they all wanted to SST teir 75GB database at once, i plan to remove mysqld from the init scripts and stop using safe_mysqld in order to be able to control the restart of mysql myself).

The query was execute on node1, cause 2-4 to crash, the query was:

ALTER TABLE `abo_errorlogin_log` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST



and the scema of abo_errorlogin_log is (after the alter):

| abo_errorlogin_log | CREATE TABLE `abo_errorlogin_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `username` varchar(30) NOT NULL DEFAULT '',
  `name` varchar(50) NOT NULL DEFAULT '',
  `timestamps` text NOT NULL,
  `last_timestamp` bigint(20) unsigned NOT NULL DEFAULT '0',
  `sites` varchar(50) NOT NULL DEFAULT '',
  `last_site` varchar(2) NOT NULL DEFAULT '',
  `institution_key` varchar(255) NOT NULL DEFAULT '',
  `tally` int(11) NOT NULL DEFAULT '0',
  `email` varchar(100) NOT NULL DEFAULT '',
  `not_student` tinyint(1) NOT NULL DEFAULT '0',
  `has_been_changed` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=97882 DEFAULT CHARSET=latin1 |


In the logfiles @node1:

Oct 18 15:56:10 node1 mysqld: 121018 15:56:10 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.13:4567
Oct 18 15:56:11 node1 mysqld: 121018 15:56:11 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to e1b77fc5-16c6-11e2-0800-1ff76a856f4d (tcp://192.168.0.13:4567), attempt 0
Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to 779c1072-16e3-11e2-0800-d982ecf3214a (tcp://192.168.0.12:4567), attempt 0
Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') reconnecting to d1ae5b68-1848-11e2-0800-df1784883427 (tcp://192.168.0.14:4567), attempt 0
Oct 18 15:56:12 node1 mysqld: 121018 15:56:12 [Note] WSREP: remote endpoint tcp://192.168.0.14:4567 changed identity d1ae5b68-1848-11e2-0800-df1784883427 -> 932da75e-192b-11e2-0800-f92adf02e877
Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote endpoint tcp://192.168.0.12:4567 changed identity 779c1072-16e3-11e2-0800-d982ecf3214a -> 9356ae44-192b-11e2-0800-4817043eb975
Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: remote endpoint tcp://192.168.0.13:4567 changed identity e1b77fc5-16c6-11e2-0800-1ff76a856f4d -> 937a239d-192b-11e2-0800-36e5b1623d99
Oct 18 15:56:13 node1 mysqld: 121018 15:56:13 [Note] WSREP: (13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, 'tcp://0.0.0.0:4567') turning message relay requesting off
Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: 779c1072-16e3-11e2-0800-d982ecf3214a
Oct 18 15:56:14 node1 mysqld: 121018 15:56:14 [Note] WSREP: evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: d1ae5b68-1848-11e2-0800-df1784883427
[snip]
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: evs::proto(13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8, GATHER, view_id(REG,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836)) suspecting node: e1b77fc5-16c6-11e2-0800-1ff76a856f4d
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring 932da75e-192b-11e2-0800-f92adf02e877 stable
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring 9356ae44-192b-11e2-0800-4817043eb975 stable
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: declaring 937a239d-192b-11e2-0800-36e5b1623d99 stable
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: view(view_id(PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,837) memb {
Oct 18 15:56:23 node1 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
Oct 18 15:56:23 node1 mysqld: #011932da75e-192b-11e2-0800-f92adf02e877,
Oct 18 15:56:23 node1 mysqld: #0119356ae44-192b-11e2-0800-4817043eb975,
Oct 18 15:56:23 node1 mysqld: #011937a239d-192b-11e2-0800-36e5b1623d99,
Oct 18 15:56:23 node1 mysqld: } joined {
Oct 18 15:56:23 node1 mysqld: } left {
Oct 18 15:56:23 node1 mysqld: } partitioned {
Oct 18 15:56:23 node1 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
Oct 18 15:56:23 node1 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
Oct 18 15:56:23 node1 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
Oct 18 15:56:23 node1 mysqld: })
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 9a50a41c-192b-11e2-0800-b239f13b62a9
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: sent state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 0 (node1)
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 2 (node2)
Oct 18 15:56:23 node1 mysqld: 121018 15:56:23 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 1 (node4)
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: STATE EXCHANGE: got state msg: 9a50a41c-192b-11e2-0800-b239f13b62a9 from 3 (node3)
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Quorum results:
Oct 18 15:56:24 node1 mysqld: #011version    = 2,
Oct 18 15:56:24 node1 mysqld: #011component  = PRIMARY,
Oct 18 15:56:24 node1 mysqld: #011conf_id    = 7,
Oct 18 15:56:24 node1 mysqld: #011members    = 1/4 (joined/total),
Oct 18 15:56:24 node1 mysqld: #011act_id     = 3217313,
Oct 18 15:56:24 node1 mysqld: #011last_appl. = 3216424,
Oct 18 15:56:24 node1 mysqld: #011protocols  = 0/4/2 (gcs/repl/appl),
Oct 18 15:56:24 node1 mysqld: #011group UUID = eb9d49a2-fe4c-11e1-0800-2738b8212139
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Flow-control interval: [16, 32]
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: New cluster view: global state: eb9d49a2-fe4c-11e1-0800-2738b8212139:3217313, view# 8: Primary, number of nodes: 4, my index: 0, protocol version 2
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Oct 18 15:56:24 node1 mysqld: 121018 15:56:24 [Note] WSREP: Assign initial position for certification: 3217313, protocol version: 2
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Node 1 (node4) requested state transfer from '*any*'. Selected 0 (node1)(SYNCED) as donor.
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3217358)
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Running: 'wsrep_sst_rsync 'donor' '192.168.0.14:4444/rsync_sst' 'root:password' '/var/lib/mysql/' '/etc/mysql/conf.d/wsrep.cnf' 'eb9d49a2-fe4c-11e1-0800-2738b8212139' '3217358' '0''
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: sst_donor_thread signaled with 0
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Flushing tables for SST...
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Provider paused at eb9d49a2-fe4c-11e1-0800-2738b8212139:3217358
Oct 18 15:56:26 node1 mysqld: 121018 15:56:26 [Note] WSREP: Tables flushed.


and node2:
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] Slave SQL: Could not execute Update_rows event on table sugarcrm3.abo_errorlogin_log; Can't find record in 'abo_errorlogin_log', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 312, Error_code: 1032
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 3217313
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 271047 trx_id: 199094775 seqnos (l: 1942351, g: 3217313, s: 3217312, d: 3216422, ts: 1350568568555954215)
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply app buffer: x
Oct 18 15:56:08 node2 mysqld: ?P#023, seqno: 3217313, status: WSREP_FATAL
Oct 18 15:56:08 node2 mysqld: #011 at galera/src/replicator_smm.cpp:apply_wscoll():50
Oct 18 15:56:08 node2 mysqld: #011 at galera/src/replicator_smm.cpp:apply_trx_ws():121
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [ERROR] WSREP: Node consistency compromized, aborting...
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closing send monitor...
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: Closed send monitor.
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: terminating thread
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: joining thread
Oct 18 15:56:08 node2 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: closing backend
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: view(view_id(NON_PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836) memb {
Oct 18 15:56:10 node2 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
Oct 18 15:56:10 node2 mysqld: } joined {
Oct 18 15:56:10 node2 mysqld: } left {
Oct 18 15:56:10 node2 mysqld: } partitioned {
Oct 18 15:56:10 node2 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
Oct 18 15:56:10 node2 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
Oct 18 15:56:10 node2 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
Oct 18 15:56:10 node2 mysqld: })
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: view((empty))
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: gcomm: closed
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Flow-control interval: [8, 16]
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Received NON-PRIMARY.
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 3217313)
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Received self-leave message.
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Flow-control interval: [0, 0]
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3217313)
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: RECV thread exiting 0: Success
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: recv_thread() joined.
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: Closing slave action queue.
Oct 18 15:56:10 node2 mysqld: 121018 15:56:10 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Oct 18 15:56:11 node2 mysqld_safe: Number of processes running now: 0
Oct 18 15:56:11 node2 mysqld_safe: mysqld restarted

node3:
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [ERROR] Slave SQL: Could not execute Update_rows event on table sugarcrm3.abo_errorlogin_log; Can't find record in 'abo_errorlogin_log', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND;
 the event's master log FIRST, end_log_pos 312, Error_code: 1032
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 3217313
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 271047 trx_id: 199094775 seqnos (l: 2010094, g: 32173
13, s: 3217312, d: 3216422, ts: 1350568568555954215)
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply app buffer: x
Oct 18 15:56:08 node3 mysqld: <80>P#023, seqno: 3217313, status: WSREP_FATAL
Oct 18 15:56:08 node3 mysqld: #011 at galera/src/replicator_smm.cpp:apply_wscoll():50
Oct 18 15:56:08 node3 mysqld: #011 at galera/src/replicator_smm.cpp:apply_trx_ws():121
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [ERROR] WSREP: Node consistency compromized, aborting...
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [Note] WSREP: Closing send monitor...
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [Note] WSREP: Closed send monitor.
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: terminating thread
Oct 18 15:56:08 node3 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: joining threadOct 18 15:56:08 node3 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: closing backendOct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: view(view_id(NON_PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836) memb {
Oct 18 15:56:10 node3 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
Oct 18 15:56:10 node3 mysqld: } joined {
Oct 18 15:56:10 node3 mysqld: } left {
Oct 18 15:56:10 node3 mysqld: } partitioned {
Oct 18 15:56:10 node3 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
Oct 18 15:56:10 node3 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
Oct 18 15:56:10 node3 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
Oct 18 15:56:10 node3 mysqld: })
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: view((empty))
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: gcomm: closed
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Flow-control interval: [8, 16]
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Received NON-PRIMARY.
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 3217313)
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Received self-leave message.
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Flow-control interval: [0, 0]
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3217313)
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: RECV thread exiting 0: Success
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: recv_thread() joined.
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: Closing slave action queue.
Oct 18 15:56:10 node3 mysqld: 121018 15:56:10 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Oct 18 15:56:12 node3 mysqld_safe: Number of processes running now: 0
Oct 18 15:56:12 node3 mysqld_safe: mysqld restarted

and node4:
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [ERROR] Slave SQL: Could not execute Update_rows event on table sugarcrm3.abo_errorlogin_log; Can't find record in 'abo_errorlogin_log', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 312, Error_code: 1032
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 3217313
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply trx: source: 13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 271047 trx_id: 199094775 seqnos (l: 741198, g: 3217313, s: 3217312, d: 3216422, ts: 1350568568555954215)
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [ERROR] WSREP: Failed to apply app buffer: x
Oct 18 15:56:08 node4 mysqld: <80>P#023, seqno: 3217313, status: WSREP_FATAL
Oct 18 15:56:08 node4 mysqld: #011 at galera/src/replicator_smm.cpp:apply_wscoll():50
Oct 18 15:56:08 node4 mysqld: #011 at galera/src/replicator_smm.cpp:apply_trx_ws():121
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [ERROR] WSREP: Node consistency compromized, aborting...
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [Note] WSREP: Closing send monitor...
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [Note] WSREP: Closed send monitor.
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: terminating thread
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: joining thread
Oct 18 15:56:08 node4 mysqld: 121018 15:56:08 [Note] WSREP: gcomm: closing backend
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: view(view_id(NON_PRIM,13b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,836) memb {
Oct 18 15:56:10 node4 mysqld: #011d1ae5b68-1848-11e2-0800-df1784883427,
Oct 18 15:56:10 node4 mysqld: } joined {
Oct 18 15:56:10 node4 mysqld: } left {
Oct 18 15:56:10 node4 mysqld: } partitioned {
Oct 18 15:56:10 node4 mysqld: #01113b3b9be-17a7-11e2-0800-5d2a4f0ec2e8,
Oct 18 15:56:10 node4 mysqld: #011779c1072-16e3-11e2-0800-d982ecf3214a,
Oct 18 15:56:10 node4 mysqld: #011e1b77fc5-16c6-11e2-0800-1ff76a856f4d,
Oct 18 15:56:10 node4 mysqld: })
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: view((empty))
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: gcomm: closed
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Flow-control interval: [8, 16]
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Received NON-PRIMARY.
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 3217313)
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Received self-leave message.
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Flow-control interval: [0, 0]
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3217313)
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: RECV thread exiting 0: Success
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: recv_thread() joined.
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: Closing slave action queue.
Oct 18 15:56:10 node4 mysqld: 121018 15:56:10 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Oct 18 15:56:11 node4 mysqld_safe: Number of processes running now: 0
Oct 18 15:56:11 node4 mysqld_safe: mysqld restarted

Why did this happend? And what do i do to avoid it happending again?

-- 
De bedste hilsener / Best Regards
Mikkel Christensen

Mikkel Christensen

unread,
Oct 19, 2012, 4:20:09 AM10/19/12
to codersh...@googlegroups.com
Might this be related:

https://bugs.launchpad.net/codership-mysql/+bug/1003929

If so will upgrading to latest release fix this problem?

Den 18/10/12 17.24, Mikkel Christensen skrev:
--
 
 


-- 
De bedste hilsener / Best Regards
Mikkel Christensen

Mikjaer ApS
Silkeborgvej 128
8000 Århus C

Tlf. +45 4440 1337

Hosting - High Availability - High Performance Hosting - Webshops
Linux - Unix - BSD - Firewalls - Sikkerhed - Søgemaskineoptimering

Systemudvikling: Php, Perl - C - C+ - C# - Javascript - Smarty
MySQL - Delphi mv.

Vi anvender udelukkende professionelle folk, de ydelser vi ikke
umiddelbart kan levere vha. inhouse konsulenter løser vi via vort
enorme netværk af freelancere. På samme vis kan vi via søster-
virksomheder, samarbejdspartnere og interne resourcer levere så
godt som alt inden for IT Udstyr, Hosting, Internet forbindelser,
Udvikling, Mobil kommunikation, Auditering, Fejlfinding, mv.

seppo....@codership.com

unread,
Oct 19, 2012, 4:27:56 AM10/19/12
to codersh...@googlegroups.com
This bug is in "Fix Committed" state. It means that fix is in the
development branch, but there is not yet a publicly available build.
Target release milestone is 5.5.27-23.7 and the bug state will change
to "Fix Released" when this build will be released.

-seppo
> --


seppo....@codership.com

unread,
Oct 19, 2012, 8:05:06 AM10/19/12
to codersh...@googlegroups.com
Hold your horses! I see "interesting" behavior when testing ALTER to
create primary key:

Node 1:
=======

mysql> create table t (i int);
Query OK, 0 rows affected (0.37 sec)

mysql> insert into t values (1),(2),(3),(4),(5);
Query OK, 5 rows affected (0.00 sec)

mysql> ALTER TABLE `t` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY
KEY FIRST;Query OK, 5 rows affected (0.58 sec)

mysql> select * from t;
+----+------+
| id | i |
+----+------+
| 1 | 1 |
| 3 | 2 |
| 5 | 3 |
| 7 | 4 |
| 9 | 5 |
+----+------+
5 rows in set (0.00 sec)


Node 2:
=======

mysql> select * from t;
+----+------+
| id | i |
+----+------+
| 2 | 1 |
| 4 | 2 |
| 6 | 3 |
| 8 | 4 |
| 10 | 5 |
+----+------+
5 rows in set (0.00 sec)


=> reason for this inconsistency is in automatic control of autoinc
sequences. If such ALTER statement should be used, the autoinc control
must be turned off, and make sure every node has same
auto_increment_increment and auto_increment_offset values. I'm not
sure if we can prepare a fix to avoid this vulnerability, it may be
hard to diagnose.


-seppo
> --


seppo....@codership.com

unread,
Oct 19, 2012, 8:20:31 AM10/19/12
to codersh...@googlegroups.com
This behavior is actually a known issue, we have a bug tracker here:
https://bugs.launchpad.net/codership-mysql/+bug/587170

But as said, cannot promise a fix for this yet. As a potential fix, We
could refuse to add an autoinc column if increment or offset are not
default values.

-seppo
> --



Mikkel Christensen

unread,
Oct 19, 2012, 9:02:46 AM10/19/12
to codersh...@googlegroups.com
Den 19/10/12 14.20, seppo....@codership.com skrev:
> This behavior is actually a known issue, we have a bug tracker here:
> https://bugs.launchpad.net/codership-mysql/+bug/587170
>
> But as said, cannot promise a fix for this yet. As a potential fix, We
> could refuse to add an autoinc column if increment or offset are not
> default values.
So i can fix this issue by making sure that auto_increment:offsets are
equal on all nodes?
So we should avoid adding auto_increment columns to tables who allready
has content to make sure that this doesn't happend?

Or what should i tell the developpers to (not) do to avoid it to crash
again?

/ Mikkel
>>>> 8000 �rhus C
>>>>
>>>> Tlf. +45 4440 1337
>>>>
>>>> Hosting - High Availability - High Performance Hosting - Webshops
>>>> Linux - Unix - BSD - Firewalls - Sikkerhed - S�gemaskineoptimering
>>>>
>>>> Systemudvikling: Php, Perl - C - C+ - C# - Javascript - Smarty
>>>> MySQL - Delphi mv.
>>>>
>>>> Vi anvender udelukkende professionelle folk, de ydelser vi ikke
>>>> umiddelbart kan levere vha. inhouse konsulenter l�ser vi via vort
>>>> enorme netv�rk af freelancere. P� samme vis kan vi via s�ster-
>>>> virksomheder, samarbejdspartnere og interne resourcer levere s�
>>>> godt som alt inden for IT Udstyr, Hosting, Internet forbindelser,
>>>> Udvikling, Mobil kommunikation, Auditering, Fejlfinding, mv.
>>>>
>>>> --
>>>
>>>
>>> --
>>
>>
>> --
>
>
>


--
De bedste hilsener / Best Regards
Mikkel Christensen

Mikjaer ApS
Silkeborgvej 128
8000 �rhus C

Tlf. +45 4440 1337

Hosting - High Availability - High Performance Hosting - Webshops
Linux - Unix - BSD - Firewalls - Sikkerhed - S�gemaskineoptimering

Systemudvikling: Php, Perl - C - C+ - C# - Javascript - Smarty
MySQL - Delphi mv.

Vi anvender udelukkende professionelle folk, de ydelser vi ikke
umiddelbart kan levere vha. inhouse konsulenter l�ser vi via vort
enorme netv�rk af freelancere. P� samme vis kan vi via s�ster-
virksomheder, samarbejdspartnere og interne resourcer levere s�

seppo....@codership.com

unread,
Oct 19, 2012, 9:32:05 AM10/19/12
to codersh...@googlegroups.com
yes, better also set wsrep_autoincrement_control=0. Otherwise Galera
will adjust increments & offsets during next node join.
>>>>> 8000 Århus C
>>>>>
>>>>> Tlf. +45 4440 1337
>>>>>
>>>>> Hosting - High Availability - High Performance Hosting - Webshops
>>>>> Linux - Unix - BSD - Firewalls - Sikkerhed - Søgemaskineoptimering
>>>>>
>>>>> Systemudvikling: Php, Perl - C - C+ - C# - Javascript - Smarty
>>>>> MySQL - Delphi mv.
>>>>>
>>>>> Vi anvender udelukkende professionelle folk, de ydelser vi ikke
>>>>> umiddelbart kan levere vha. inhouse konsulenter løser vi via vort
>>>>> enorme netværk af freelancere. På samme vis kan vi via søster-
>>>>> virksomheder, samarbejdspartnere og interne resourcer levere så
>>>>> godt som alt inden for IT Udstyr, Hosting, Internet forbindelser,
>>>>> Udvikling, Mobil kommunikation, Auditering, Fejlfinding, mv.
>>>>>
>>>>> --
>>>>
>>>>
>>>> --
>>>
>>>
>>> --
>>
>>
>>
>
>
> --
> De bedste hilsener / Best Regards
> Mikkel Christensen
>
> Mikjaer ApS
> Silkeborgvej 128
> 8000 Århus C
>
> Tlf. +45 4440 1337
>
> Hosting - High Availability - High Performance Hosting - Webshops
> Linux - Unix - BSD - Firewalls - Sikkerhed - Søgemaskineoptimering
>
> Systemudvikling: Php, Perl - C - C+ - C# - Javascript - Smarty
> MySQL - Delphi mv.
>
> Vi anvender udelukkende professionelle folk, de ydelser vi ikke
> umiddelbart kan levere vha. inhouse konsulenter løser vi via vort
> enorme netværk af freelancere. På samme vis kan vi via søster-
> virksomheder, samarbejdspartnere og interne resourcer levere så

Henrik Ingo

unread,
Oct 19, 2012, 2:07:24 PM10/19/12
to seppo....@codership.com, codersh...@googlegroups.com
Seppo,

MySQL manual claims that auto_increment_increment and
auto_increment_offset are session variables. To be honest, this gives
me all kinds of wild ideas, but assuming it's true, then Galera could
simply set them to 1 and 0 respectively for all DDL statements. Since
you handle DDL specially anyway, and they are never part of a larger
transaction, it should not be a problem to do this.

Mikkel, note that just turning off wsrep_autoincrement_control
globally is no good either, because then your inserts will not work
correctly. To the tables that already have auto_increment columns,
that is.

henrik
> --
>
>



--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

Alex Yurchenko

unread,
Oct 24, 2012, 1:58:36 AM10/24/12
to codersh...@googlegroups.com
On 2012-10-19 21:07, Henrik Ingo wrote:
> Mikkel, note that just turning off wsrep_autoincrement_control
> globally is no good either, because then your inserts will not work
> correctly. To the tables that already have auto_increment columns,
> that is.

This is not entirely accurate. Inserts WILL work correctly (that is -
nodes will stay consistent), except some of them may be rejected due to
a cluster-wide conflict (this is the part that your application may not
be prepared to handle) - and that's only when you'll be trying to insert
to the same table at different nodes at the very much same moment. And
even then this might be mitigated for autocommit inserts by setting
wsrep_retry_autocommit to, say, 10.

In other words, wsrep_autoincrement_control is just an optional setting
to reduce cluster-wide conflicts, and if you direct all inserts to the
table to a single node, wsrep_autoincrement_control is not needed at
all.

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Henrik Ingo

unread,
Oct 24, 2012, 3:07:17 AM10/24/12
to Alex Yurchenko, codersh...@googlegroups.com
Alex,

In this case Mikkel is writing to multiple nodes. You agree that not
having wsrep_autoincrement_control=ON would be unsafe, or is Galera
doing some magic to handle it?

henrik

Alex Yurchenko

unread,
Oct 24, 2012, 2:40:55 PM10/24/12
to henri...@avoinelama.fi, codersh...@googlegroups.com
On 2012-10-24 10:07, Henrik Ingo wrote:
> Alex,
>
> In this case Mikkel is writing to multiple nodes. You agree that not
> having wsrep_autoincrement_control=ON would be unsafe, or is Galera
> doing some magic to handle it?

Galera detects a conflict and deterministically rolls back all but one
transactions trying to insert to the given row. Thus I consider it safe
from the data point of view - all nodes will be consistent, and if you
get OK on INSERT, you can rest assured that row was inserted. As I
mentioned below it is quite likely that application may freak out (how
often are you prepared to handle deadlock on inserting the default value
into an autoincrement column?), so from that point of view you probably
can call it "unsafe" - for application, but I doubt that you will :)
With a properly written application you will just get more deadlocks
without wsrep_autoincrement_control than with it.

Also note that if you do like 1 insert per second, it is pretty safe in
all respects.

Regards,
Alex

Henrik Ingo

unread,
Oct 24, 2012, 11:15:32 PM10/24/12
to Alex Yurchenko, codersh...@googlegroups.com
On Wed, Oct 24, 2012 at 9:40 PM, Alex Yurchenko
<alexey.y...@codership.com> wrote:
> On 2012-10-24 10:07, Henrik Ingo wrote:
>>
>> Alex,
>>
>> In this case Mikkel is writing to multiple nodes. You agree that not
>> having wsrep_autoincrement_control=ON would be unsafe, or is Galera
>> doing some magic to handle it?
>
>
> Galera detects a conflict and deterministically rolls back all but one
> transactions trying to insert to the given row. Thus I consider it safe from
> the data point of view - all nodes will be consistent, and if you get OK on
> INSERT, you can rest assured that row was inserted. As I mentioned below it
> is quite likely that application may freak out (how often are you prepared
> to handle deadlock on inserting the default value into an autoincrement
> column?), so from that point of view you probably can call it "unsafe" - for
> application, but I doubt that you will :) With a properly written
> application you will just get more deadlocks without
> wsrep_autoincrement_control than with it.
>
> Also note that if you do like 1 insert per second, it is pretty safe in all
> respects.

Ah, right.

What I was worried about was that if I
1) insert a row on node A, and it is assigned auto_increment value 1
2) later, not concurrently, insert a row on node B, it is also
assigned auto_increment value 1
2b) -> things fail

You didn't quite answer this, but reading your text I realize that the
above won't happen, because by the time (1) is completed, the
auto_increment counter also on node B is incremented to return 2 on
next call. This happens also in single node MySQL if I just plain
insert a value "1" into a column that is of type auto_increment.)

Thanks for clarifying, again.

henrik
Reply all
Reply to author
Forward
0 new messages