MySQL 5.6 / GTID status?

Thorn Roby

Jan 13, 2014, 2:20:09 PM

to prm-d...@googlegroups.com

I understood from the December webinar on the Geo enhancements that MySQL with GTID support would be available later in the month. I've downloaded the GitHub repository and I'm not really sure what to do with it. Do I just need the mysql-prm agent piece? At this point I don't need the booth components.

I'm mostly unsure how to reconcile GTID's requirement for log-slave-updates with the installation document, which prohibits it.
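
For reference, MySQL 5.6 refuses to start with gtid_mode=ON unless binary logging and log-slave-updates are both enabled. A minimal sketch of a check for those two options in a my.cnf-style file follows; the function name missing_gtid_options is hypothetical, and it assumes the agent supplies gtid_mode and enforce_gtid_consistency itself, as the mysqld command line quoted later in this thread suggests.

```shell
#!/bin/sh
# Print each binary-log option that MySQL 5.6 needs for gtid_mode=ON but
# that is absent from the given my.cnf-style file.
# Hypothetical helper: dashes and underscores in option names are treated
# as equivalent, and section headers are ignored.
missing_gtid_options() {
    cnf=$1
    for opt in log_bin log_slave_updates; do
        # Normalise dashes to underscores, then look for the option name
        # at the start of a line, followed by "=" or end of line.
        if ! sed 's/-/_/g' "$cnf" |
            grep -Eq "^[[:space:]]*${opt}[[:space:]]*(=|\$)"; then
            echo "$opt"
        fi
    done
}
```

Running `missing_gtid_options /etc/my.cnf` prints nothing when both options are present.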

Yves Trudeau

Jan 13, 2014, 6:29:44 PM

to prm-d...@googlegroups.com

Hi Thorn,
I am working on it. Most of Fred's work is integrated, and I am now merging the new master score code. Ping me back in a week; it should be ready.

Regards,

Yves

Yves Trudeau

Jan 16, 2014, 12:49:25 PM

to prm-d...@googlegroups.com

Hi Thorn,
Have a look on GitHub; I pushed it this morning. I'll blog about it and announce it in the next few days.

Regards,

Yves

Thorn Roby

Jan 16, 2014, 3:24:41 PM

to prm-d...@googlegroups.com

Great, thanks, I'll give it a try. Am I right in understanding that if I'm not doing the Geo stuff the only piece I need to download is the mysql-prm agent?

Yves Trudeau

Jan 16, 2014, 3:43:16 PM

to prm-d...@googlegroups.com

The Geo support is built in. For GTID, download mysql_prm56.

Regards,

Yves

Thorn Roby

Jan 22, 2014, 7:13:36 PM

to prm-d...@googlegroups.com

I've set up a 3 node cluster using the mysql_prm56 agent, with one set of addresses for the VIPs to be accessed by DB clients (on the first NIC, named em1 on these servers, with addresses on "MM.NN.50.X" in the following status output) and another set on the second (em2, on "MM.NN.32.X"). MySQL (Percona 5.6.15 using GTID) is running and replication is working (except that the cluster seems to intermittently kill the mysql process on the 2 slaves). After trying a number of things, I eventually added an explicit "nic" attribute for the em2 interfaces, but that made no difference. The VIPs (these are distinct from the primary physical addresses of the em1 interfaces) appear to be assigned correctly, but the mysql processes on the em2 interfaces are never seen. Here is the output of "crm configure show", "crm status" and "show slave status":

crm configure show:

node eng-mysqlem2-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.180" nic="em2"
node eng-mysqlem2-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.181" nic="em2"
node eng-mysqlem2-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.182" nic="em2"
node eng-mysqlha-p1.mydomain.net
node eng-mysqlha-p2.mydomain.net
node eng-mysqlha-p3.mydomain.net
primitive p_mysql ocf:rootpass:mysql \
        params config="/etc/my.cnf" pid="/var/lib/mysql/mysqld.pid" socket="/var/lib/mysql/mysql.sock" replication_user="repl" replication_passwd="reppass" max_slave_lag="60" evict_outdated_slaves="false" binary="/root/PS5615/bin/mysqld" test_user="root" test_passwd="rootpass" \
        op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
        op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive reader_vip_1 ocf:heartbeat:IPaddr2 \
        params ip="MM.NN.2.10" nic="em1" \
        op monitor interval="10s"
primitive reader_vip_2 ocf:heartbeat:IPaddr2 \
        params ip="MM.NN.2.12" nic="em1" \
        op monitor interval="10s"
primitive reader_vip_3 ocf:heartbeat:IPaddr2 \
        params ip="MM.NN.2.14" nic="em1" \
        op monitor interval="10s"
primitive writer_vip ocf:heartbeat:IPaddr2 \
        params ip="MM.NN.2.9" nic="em1" \
        op monitor interval="10s"
ms ms_MySQL p_mysql \
        meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true"
location loc-No-reader-vip-2 reader_vip_2 \
        rule $id="rule-no-reader-vip-2" -inf: readable gt 0
location loc-No-reader-vip-3 reader_vip_3 \
        rule $id="rule-no-reader-vip-3" -inf: readable gt 0
location loc-no-reader-vip-1 reader_vip_1 \
        rule $id="rule-no-reader-vip-1" -inf: readable gt 0
colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master
order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="3" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1338928815"
property $id="mysql_replication" \
        p_mysql_REPL_STATUS="9f4f986b-82e5-11e3-9869-d4ae52e8cd5b:9" \
        p_mysql_REPL_INFO="eng-mysqlha-p2.mydomain.net|:"
#vim:set syntax=pcmk

crm status:

Last updated: Wed Jan 22 16:45:56 2014
Last change: Wed Jan 22 16:44:33 2014 via cibadmin on eng-mysqlha-p1.mydomain.net
Stack: classic openais (with plugin)
Current DC: eng-mysqlha-p3.mydomain.net - partition with quorum
Version: 1.1.8-7.el6-394e906
6 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ eng-mysqlha-p1.mydomain.net eng-mysqlha-p2.mydomain.net eng-mysqlha-p3.mydomain.net ]
OFFLINE: [ eng-mysqlem2-p1.mydomain.net eng-mysqlem2-p2.mydomain.net eng-mysqlem2-p3.mydomain.net ]

 reader_vip_1 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p2.mydomain.net
 reader_vip_2 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p3.mydomain.net
 reader_vip_3 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p1.mydomain.net
 Master/Slave Set: ms_MySQL [p_mysql]
     Masters: [ eng-mysqlha-p1.mydomain.net ]
     Stopped: [ p_mysql:1 p_mysql:2 ]

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: MM.NN.32.180
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql_bin.000010
Read_Master_Log_Pos: 1837
Relay_Log_File: mysqld-relay-bin.000013
Relay_Log_Pos: 448
Relay_Master_Log_File: mysql_bin.000010
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1837
Relay_Log_Space: 622
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 9f4f986b-82e5-11e3-9869-d4ae52e8cd5b
Master_Info_File: /data/mysql/data5615/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 9f4f986b-82e5-11e3-9869-d4ae52e8cd5b:1-14
Executed_Gtid_Set: 05b8fa69-82f4-11e3-98c8-d4ae52e8ccb9:1-7,
    9f4f986b-82e5-11e3-9869-d4ae52e8cd5b:1-14
Auto_Position: 1

Yves Trudeau

Jan 23, 2014, 11:03:03 AM

to prm-d...@googlegroups.com

Hi Thorn,

Is it normal that you have 6 nodes defined? I think you shouldn't have the eng-mysqlem2-p*.mydomain.net nodes, only the eng-mysqlha-p*.mydomain.net nodes. In that case you should have:

node eng-mysqlem2-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.180" nic="em2"
node eng-mysqlem2-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.181" nic="em2"
node eng-mysqlem2-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.182" nic="em2"
node eng-mysqlha-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.180" nic="em2"
node eng-mysqlha-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.181" nic="em2"
node eng-mysqlha-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.182" nic="em2"

Yves Trudeau

Jan 23, 2014, 11:08:00 AM

to prm-d...@googlegroups.com

Sent too soon... gmail shortcuts :(

In such a case you should have only:

node eng-mysqlha-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.180" nic="em2"
node eng-mysqlha-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.181" nic="em2"
node eng-mysqlha-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.182" nic="em2"

Also activate the trace log on the slaves; that may tell us why they are being stopped. To enable the trace:

mkdir -p /tmp/mysql.ocf.ra.debug

touch /tmp/mysql.ocf.ra.debug/log

If there's a stop, there will be a "stop" event in the log file; look at the previous monitor event for a clue as to why.
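
The two trace commands above can be wrapped in a small helper so they are easy to repeat on each slave. This is only a sketch: the function name enable_ra_trace is hypothetical, and the default directory is simply the path quoted above.

```shell
#!/bin/sh
# Create the trace directory and the empty "log" file inside it.
# Hypothetical helper; the default path is the one quoted in this thread,
# and a different directory can be passed as the first argument.
enable_ra_trace() {
    dir=${1:-/tmp/mysql.ocf.ra.debug}
    mkdir -p "$dir" && touch "$dir/log"
}
```

Calling `enable_ra_trace` with no argument on a slave performs the same two steps as the manual commands.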

Regards,

Yves

Thorn Roby

Jan 23, 2014, 1:35:56 PM

to prm-d...@googlegroups.com

Maybe I should clarify what I'm trying to do: keep corosync/pacemaker/replication traffic on a "private" network, MM.NN.32.176/28, on the second NIC, and "public" DB client access to the VIPs (as well as other access such as ssh) on MM.NN.2.50/24 on the first NIC:

-------------+-----------------------+-----------------------+------------  db client access
             |                       |                       |
     +------em1------+       +------em1------+       +------em1------+
     |               |       |               |       |               |
     +------em2------+       +------em2------+       +------em2------+
             |                       |                       |
-------------+-----------------------+-----------------------+------------  replication, corosync, pacemaker traffic

There is no gateway on the MM.NN.32.176/28 network. It is supposedly configured for multicast; I'm not certain of that, but corosync seems OK.

Here are my assumptions:

1. Replication: traffic on em2, initially the first system is the master, the other 2 are slaves. MySQL thinks this is working.

2. Corosync/Pacemaker - monitoring on em2 interfaces, able to establish VIP reader/writer addresses on em1 interfaces.

3. Client DB access to VIPs established on em1.

4. Other access (ssh) to physical em1 address.

Corosync.conf:

compatibility: whitetank

totem {
        version: 2
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: MM.NN.32.176
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        debug: off
        logfile: /var/log/cluster/corosync.log
        debug: on
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

When I define 3 nodes on em2 like this:

node eng-mysqlem2-p1.mydomain.net attributes p_mysql_mysql_master_IP="MM.NN.32.180" nic="em2"

node eng-mysqlem2-p2.mydomain.net attributes p_mysql_mysql_master_IP="MM.NN.32.181" nic="em2"

node eng-mysqlem2-p3.mydomain.net attributes p_mysql_mysql_master_IP="MM.NN.32.182" nic="em2"

The other mysqlha nodes on em1 get automatically restored to the pacemaker configuration, whether I make the change on node 1 or node 3, which has become the current DC:

Current DC: eng-mysqlha-p3.mydomain.net - partition with quorum
Version: 1.1.8-7.el6-394e906
6 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ eng-mysqlha-p1.mydomain.net eng-mysqlha-p2.mydomain.net eng-mysqlha-p3.mydomain.net ]
OFFLINE: [ eng-mysqlem2-p1.mydomain.net eng-mysqlem2-p2.mydomain.net eng-mysqlem2-p3.mydomain.net ]

 reader_vip_1 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p2.mydomain.net
 reader_vip_2 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p3.mydomain.net
 reader_vip_3 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p1.mydomain.net
 Master/Slave Set: ms_MySQL [p_mysql]
     Masters: [ eng-mysqlha-p1.mydomain.net ]
     Stopped: [ p_mysql:1 p_mysql:2 ]

[root@eng-mysqlha-p1 ~]# crm configure show
node eng-mysqlem2-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="10.50.32.180" nic="em2"
node eng-mysqlem2-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="10.50.32.181" nic="em2"
node eng-mysqlem2-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="10.50.32.182" nic="em2"
node eng-mysqlha-p1.mydomain.net
node eng-mysqlha-p2.mydomain.net
node eng-mysqlha-p3.mydomain.net

If I define the 3 nodes as the em1 interfaces:

node eng-mysqlha-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.2.28" nic="em1"
node eng-mysqlha-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.2.29" nic="em1"
node eng-mysqlha-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.2.30" nic="em1"

[root@eng-mysqlha-p3 ~]# crm status
Last updated: Thu Jan 23 11:30:37 2014
Last change: Thu Jan 23 11:30:33 2014 via cibadmin on eng-mysqlha-p3.mydomain.net
Stack: classic openais (with plugin)
Current DC: eng-mysqlha-p3.mydomain.net - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ eng-mysqlha-p1.mydomain.net eng-mysqlha-p2.mydomain.net eng-mysqlha-p3.mydomain.net ]

 reader_vip_1 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p2.mydomain.net
 reader_vip_2 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p3.mydomain.net
 reader_vip_3 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p1.mydomain.net
 Master/Slave Set: ms_MySQL [p_mysql]
     Masters: [ eng-mysqlha-p1.mydomain.net ]
     Stopped: [ p_mysql:1 p_mysql:2 ]

Which looks pretty good except the slave mysql processes are actually running. Do I need to manually start them in pacemaker? And is it OK to have a configuration that shows no knowledge of the network on which the pacemaker traffic is running?

Yves Trudeau

Jan 23, 2014, 3:30:22 PM

to prm-d...@googlegroups.com

Hi Thorn,

In the node section, you need this:

node eng-mysqlha-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="10.50.32.180"
node eng-mysqlha-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="10.50.32.181"
node eng-mysqlha-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="10.50.32.182"

and that's all. Replication will use the 10.50.32.x IPs.

Check the trace log files on the slaves to see why Pacemaker thinks they are stopped. Did you start them manually with the init.d script, or did you let Pacemaker start them? If you start them manually, you must make sure the pid and socket files in the CIB match the ones in my.cnf. Starting manually also uses mysqld_safe, which is not desirable. It is best to let Pacemaker start MySQL.
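
One way to compare the pid and socket values is sketched below. Both function names are hypothetical, the CIB values are passed in by hand rather than read from the cluster, and only the [mysqld] section of my.cnf is consulted, which is an assumption about where these options live.

```shell
#!/bin/sh
# Print the value of KEY from the [mysqld] section of a my.cnf-style file.
# Hypothetical helper: dashes and underscores in option names are treated
# as equivalent, so "pid-file" and "pid_file" both match key "pid_file".
mysqld_cnf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/); next }
        in_mysqld {
            name = $1
            gsub(/[ \t]/, "", name)
            gsub(/-/, "_", name)
            if (name == key) {
                val = $0
                sub(/^[^=]*=[ \t]*/, "", val)   # drop "name =" prefix
                sub(/[ \t]+$/, "", val)         # drop trailing blanks
                print val
            }
        }' "$1"
}

# Compare the pid and socket values from the CIB (given as arguments)
# with my.cnf; returns non-zero and names the mismatch if they differ.
check_cib_matches_cnf() {
    cnf=$1 cib_pid=$2 cib_socket=$3 status=0
    [ "$(mysqld_cnf_value "$cnf" pid_file)" = "$cib_pid" ] ||
        { echo "pid mismatch"; status=1; }
    [ "$(mysqld_cnf_value "$cnf" socket)" = "$cib_socket" ] ||
        { echo "socket mismatch"; status=1; }
    return $status
}
```

For the configuration posted earlier, the call would be `check_cib_matches_cnf /etc/my.cnf /var/lib/mysql/mysqld.pid /var/lib/mysql/mysql.sock`. An option that is missing from my.cnf is reported as a mismatch, since the sketch does not know mysqld's built-in defaults.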

Regards,

Yves

Thorn Roby

Jan 23, 2014, 7:11:32 PM

to prm-d...@googlegroups.com

I tried that but no change. Then I tried to simplify things but that didn't help either.

I think there's something wrong with how the OCF variables are being populated by the mysql_prm56 script. I remember having similar issues when I tried to set it up last year. What seems to happen is that the mysqld command line the script executes fails because it's not getting all the variables out of my.cnf. In particular, I have the mysql datadir set to /data/mysql/data5615, and that should be the default innodb data directory also, but the mysqld command line started by the script uses /var/lib/mysql instead (it creates a new ibdata1 file there, then dies because it doesn't find an existing one). I tried symlinking my real directory to /var/lib/mysql but it still dies, with a different error, which leads me to suspect some OCF variable is unset. I tried forcing OCF_ROOT to /usr/lib/ocf in root's environment but that didn't help.

The mysql error I'm getting that suggests a problem with the OCF_ROOT (or some other) variable (this is without forcing it in root's env) :

2014-01-23 16:58:11 7f12c981e7e0 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2014-01-23 16:58:11 27911 [ERROR] InnoDB: File .//data/mysql/data5615/ibdata1: 'create' returned OS error 71. Cannot continue operation

The .//data prefix should be just /data.

If I try to run the script manually, I get

/bin/bash /usr/lib/ocf/resource.d/percona/mysql start

/usr/lib/ocf/resource.d/percona/mysql: line 61: /lib/heartbeat/ocf-shellfuncs: No such file or directory

(missing the /usr/lib/ocf prefix which I think OCF_ROOT should be set to)

Thorn Roby

Jan 23, 2014, 7:14:27 PM

to prm-d...@googlegroups.com

Sorry, by "simplify things" I meant I reassigned everything to a single network (the MM.NN.32.0 network where the replication would normally run).

Yves Trudeau

Jan 24, 2014, 9:17:05 AM

to prm-d...@googlegroups.com

hmmm, have you followed the instructions from:

https://github.com/percona/percona-pacemaker-agents/blob/master/doc/PRM-setup-guide.rst

Before you even start, you need replication to work. The agent is not supposed to be called directly; it needs numerous variables set by Pacemaker. Next, in your config you have this:

primitive p_mysql ocf:rootpass:mysql \

and based on what I see in your email, it should be:

primitive p_mysql ocf:percona:mysql \

For the nodes, it _must_ be like I wrote, otherwise, it will not work.

Regards,

Yves

Thorn Roby

Jan 24, 2014, 12:23:28 PM

to prm-d...@googlegroups.com

Yes, I followed the setup guide. And, as I said, replication is working fine. The configuration does have

primitive p_mysql ocf:percona:mysql

(the other text was an error resulting from overzealous redaction on my part).

The nodes are configured as you suggested:

node eng-mysqlha-p1.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.180"
node eng-mysqlha-p2.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.181"
node eng-mysqlha-p3.mydomain.net \
        attributes p_mysql_mysql_master_IP="MM.NN.32.182"

I realize that running the script from the shell requires environment variables, in particular OCF_ROOT, to be set. I tried that, and also tried ocf-tester, with no luck. However, my observations about the apparent failure to read variables from my.cnf (specifically datadir, which is what causes the restarted mysql instance to fail) came from watching the actual arguments in use during the attempted startup, not from running the script manually. Here is the command as the script starts it. Note that datadir is forced to /var/lib/mysql, not /data/mysql/data5615 as in my.cnf. I tried explicitly adding the innodb-path-to-datadir variable (which defaults to the mysql datadir if it is not specified, so that should also work), but that was also ignored. I also tried symlinking the real datadir to /var/lib/mysql, which also failed, but for other reasons. Here are the startup arguments as run by the mysqld script (mysqld_safe is not running; I turned off automatic startup via chkconfig):

/root/PS5615/bin/mysqld --defaults-file=/etc/my.cnf --enforce_gtid_consistency=1 --gtid_mode=on --pid-file=/var/lib/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --user=mysql --skip-slave-start --read-only

With the real datadir symlinked to /var/lib/mysql, the failure looks like this (which is what suggests to me that some of the OCF variables are not getting set right, independent of the problem of not reading datadir from my.cnf):

grep datadir /etc/my.cnf

datadir=/data/mysql/data5615

2014-01-23 17:39:40 35056 [Note] InnoDB: Completed initialization of buffer pool
2014-01-23 17:39:42 7f8fcfd0c7e0 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2014-01-23 17:39:42 35056 [ERROR] InnoDB: File .//data/mysql/data5615/ibdata1: 'create' returned OS error 71. Cannot continue operation

If the working directory of the mysqld script happens to be "/", this directory syntax would be OK, but in fact ibdata1 already exists in that directory and does not need to be recreated.

lrwxrwxrwx 1 root root 20 Jan 23 16:38 /var/lib/mysql -> /data/mysql/data5615

ls -ld /data/mysql/data5615
drwxr-xr-x 5 mysql mysql 104 Jan 24 09:59 /data/mysql/data5615

ls -l /data/mysql/data5615/ibdata1
-rw-rw---- 1 mysql mysql 12582912 Jan 23 16:13 /data/mysql/data5615/ibdata1

Yves Trudeau

Jan 24, 2014, 1:53:44 PM

to prm-d...@googlegroups.com

Hi Thorn,
Hmm, it would be simpler if you just set datadir=/data/mysql/data5615 in the p_mysql CIB configuration instead of pointing to /var/lib/mysql. Maybe I am wrong, but have you set something like this in your my.cnf?

innodb_data_file_path=/data/mysql/data5615/ibdata1

If so, keep in mind that MySQL will interpret that path as:

/var/lib/mysql/.//data/mysql/data5615/ibdata1

See here:

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_data_file_path

That looks like your error. I suggest you simply remove this line from your my.cnf and retry.
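
The path handling described above can be illustrated with a short sketch. InnoDB joins its data home directory (./ inside the datadir when innodb_data_home_dir is unset) and the file name as plain text, which is how an absolute file path ends up as .//data/... in the error log. Both function names below are hypothetical.

```shell
#!/bin/sh
# Show the path InnoDB would try to open: the data home directory and the
# configured file name are concatenated textually.
# Hypothetical helper; $2 defaults to "./", the documented default when
# innodb_data_home_dir is not set.
innodb_effective_path() {
    printf '%s%s\n' "${2:-./}" "$1"
}

# Succeed (exit 0) when the my.cnf-style file sets innodb_data_file_path
# to a value that starts with "/", the case that triggers the problem.
has_absolute_innodb_path() {
    grep -Eq '^[[:space:]]*innodb[-_]data[-_]file[-_]path[[:space:]]*=[[:space:]]*/' "$1"
}
```

`innodb_effective_path /data/mysql/data5615/ibdata1` prints `.//data/mysql/data5615/ibdata1`, the same string that appears in the InnoDB error above.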

Regards,

Yves

Thorn Roby

Jan 27, 2014, 4:10:59 PM

to prm-d...@googlegroups.com

By adding the "datadir" parameter to the CIB I was able to get the cluster to start up and start mysql. However, only 2 of the 3 nodes (P2, P3) successfully join the cluster and start replication. Replication is running on all 3 nodes and is caught up, but the cluster reports that P1 is not accessible, although it also assigns all 3 reader_vips to it. The cluster sees P2 as the current master, and P3 agrees, but P1 still sees the master as P3 (which it was earlier) instead of P2. Manually changing the master on P1 to P2 succeeds, but the cluster still reports that P1's p_mysql is not started.

Current DC: eng-mysqlha-p1.mydomain.net - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ eng-mysqlha-p1.lanxtra.net eng-mysqlha-p2.lanxtra.net eng-mysqlha-p3.mydomain.net ]

 reader_vip_1 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p1.mydomain.net
 reader_vip_2 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p1.mydomain.net
 reader_vip_3 (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p1.mydomain.net
 writer_vip (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p2.mydomain.net
 Master/Slave Set: ms_MySQL [p_mysql]
     Masters: [ eng-mysqlha-p2.mydomain.net ]
     Slaves: [ eng-mysqlha-p3.mydomain.net ]
     Stopped: [ p_mysql:2 ]

Failed actions:
    p_mysql_start_0 (node=eng-mysqlha-p1.mydomain.net, call=31, rc=1, status=Timed Out): unknown error

I've also noticed that "service pacemaker stop" shuts down the mysql instances on P2 and P3, but not on P1 (I have to run "service mysql stop" to shut it down).

Thorn Roby

Jan 29, 2014, 5:34:51 PM

to prm-d...@googlegroups.com

After increasing the "start" timeout from 60 to 120 seconds I am able to get the 3 node cluster running. For a while I was able to connect to reader VIPs. However, now the pattern is that the cluster comes up, briefly shows all reader vips assigned to a single node (the one most recently started), then after about one minute all reader VIPs disappear, and no mysql connection can be made to them. The writer VIP remains intact, and all instances of mysql are running. Testing the readable parameter results in all nodes reporting readable:

cibadmin -Q |grep readable |grep nvpair

Restarting all pacemaker processes produces the same end result. Replication is intact, mysql connections can be made via the physical nic addresses and the writer VIP, but no reader VIPs exist.
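
The readable check earlier in this message can be narrowed to just the attribute values with a small filter. This is a sketch that assumes each nvpair element sits on its own line with name="readable" and a value="..." attribute, as in typical cibadmin output; the function name readable_values is hypothetical.

```shell
#!/bin/sh
# Read CIB XML on stdin and print the value of every nvpair whose name is
# "readable", one per line.
# Hypothetical helper; assumes one nvpair element per input line.
readable_values() {
    grep 'name="readable"' | sed -n 's/.*value="\([^"]*\)".*/\1/p'
}
```

Typical use would be `cibadmin -Q | readable_values`, which should print one value per node that has the attribute set.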

Yves Trudeau

Jan 29, 2014, 7:03:42 PM

to prm-d...@googlegroups.com

Hi Thorn,
At least there's progress. Please send me the output of 'crm_mon -A1' when the reader VIPs are gone.

Regards,

Yves

Thorn Roby

Jan 30, 2014, 12:12:59 PM

to prm-d...@googlegroups.com

Last updated: Thu Jan 30 10:09:47 2014
Last change: Wed Jan 29 17:05:40 2014 via crmd on eng-mysqlha-p2.mydomain.net
Stack: classic openais (with plugin)
Current DC: eng-mysqlha-p3.mydomain.net - partition with quorum
Version: 1.1.8-7.el6-394e906
3 Nodes configured, 3 expected votes
7 Resources configured.

Online: [ eng-mysqlha-p1.mydomain.net eng-mysqlha-p2.mydomain.net eng-mysqlha-p3.mydomain.net ]

 writer_vip (ocf::heartbeat:IPaddr2): Started eng-mysqlha-p3.mydomain.net
 Master/Slave Set: ms_MySQL [p_mysql]
     Masters: [ eng-mysqlha-p3.mydomain.net ]
     Slaves: [ eng-mysqlha-p1.mydomain.net eng-mysqlha-p2.mydomain.net ]

Node Attributes:
* Node eng-mysqlha-p1.mydomain.net:
    + master-p_mysql : 60
    + nic : em2
    + p_mysql_mysql_master_IP : MM.NN.32.180
    + readable : 1
* Node eng-mysqlha-p2.mydomain.net:
    + master-p_mysql : 60
    + nic : em2
    + p_mysql_mysql_master_IP : MM.NN.32.181
    + readable : 1
* Node eng-mysqlha-p3.mydomain.net:
    + master-p_mysql : 1060
    + nic : em2
    + p_mysql_mysql_master_IP : MM.NN.32.182
    + readable : 1

Yves Trudeau

Jan 30, 2014, 4:38:04 PM

to prm-d...@googlegroups.com

Hi Thorn,

I think I found your problem...

location loc-No-reader-vip-2 reader_vip_2 \
        rule $id="rule-no-reader-vip-2" -inf: readable gt 0
location loc-No-reader-vip-3 reader_vip_3 \
        rule $id="rule-no-reader-vip-3" -inf: readable gt 0
location loc-no-reader-vip-1 reader_vip_1 \
        rule $id="rule-no-reader-vip-1" -inf: readable gt 0

If readable is 1, these rules forbid the resource from starting. You should have:

location loc-No-reader-vip-2 reader_vip_2 \
        rule $id="rule-no-reader-vip-2" -inf: readable eq 0
location loc-No-reader-vip-3 reader_vip_3 \
        rule $id="rule-no-reader-vip-3" -inf: readable eq 0
location loc-no-reader-vip-1 reader_vip_1 \
        rule $id="rule-no-reader-vip-1" -inf: readable eq 0
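
A quick way to spot the inverted rule in an existing configuration is to filter the output of "crm configure show" for it. The sketch below is only a text match on the rule lines shown above; the function name find_inverted_readable_rules is hypothetical.

```shell
#!/bin/sh
# Read "crm configure show" output on stdin and print every rule line that
# gives a -inf score where readable is greater than 0, which keeps a
# reader VIP off exactly the nodes that are readable.
# Hypothetical helper; exits non-zero when no such rule is found.
find_inverted_readable_rules() {
    grep -E -- '-inf:[[:space:]]*readable[[:space:]]+gt[[:space:]]+0'
}
```

Typical use would be `crm configure show | find_inverted_readable_rules`; no output means no rule of that form is present.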

Regards,

Yves

Thorn Roby

Jan 30, 2014, 6:02:52 PM

to prm-d...@googlegroups.com

Great, thanks. It looks like I cut-and-pasted the summary of all CIB entries at the bottom of the setup guide, which has "gt 0", but looking back at the original entry further up in the document, it is "eq 0". I'll go over the rest of them and see if there are any other discrepancies.

Thorn Roby

Feb 3, 2014, 7:56:52 PM

to prm-d...@googlegroups.com

The cluster runs but continually resets the slave. I'm not sure if this has been happening all along. It's possible I just didn't notice, because the status looks OK unless I tail the mysql log, and replication does work (I tested a few inserts). I'm attaching mysql, corosync and pacemaker logs covering one cycle: starting up corosync, then pacemaker, waiting until the master/slave status is established, then shutting everything down.

harlandlogs.tgz

Thorn Roby

Feb 4, 2014, 12:03:29 PM

to prm-d...@googlegroups.com

I forgot to mention that with pacemaker off, replication functions normally with no thread restarts.

Yves Trudeau

Feb 4, 2014, 1:14:03 PM

to prm-d...@googlegroups.com

Hi Thorn,

The pace.log was the way to go! In the merge process, a "return" was missing, causing the continuous reconfiguration. I pushed the updated agent; please update yours.

wget https://github.com/percona/percona-pacemaker-agents/raw/master/agents/mysql_prm56
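
After the download, the file still has to replace the installed agent on every node. A minimal sketch of that step follows; the function name install_agent is hypothetical, and the default destination is an assumption taken from the agent path that appears earlier in this thread, so adjust it to wherever the agent is installed on your nodes.

```shell
#!/bin/sh
# Copy a downloaded agent file into place, keeping a backup of the old copy.
# Hypothetical helper; the default destination is assumed from the path
# quoted earlier in this thread, not from the project documentation.
install_agent() {
    src=$1
    dest=${2:-/usr/lib/ocf/resource.d/percona/mysql}
    # Refuse to install an empty or missing download.
    [ -s "$src" ] || { echo "empty or missing: $src" >&2; return 1; }
    mkdir -p "$(dirname "$dest")" || return 1
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest" && chmod 755 "$dest"
}
```

For example, `install_agent mysql_prm56` run as root on each node, once the wget above has completed.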

Regards,

Yves
