PRM does not prevent failover to a broken replica


aurelien lemaire

Jul 25, 2014, 6:09:20 AM
to prm-d...@googlegroups.com, hos...@smile.fr
Hi folks,

I hope you are all doing as great as the work you produce.

Short Story
While testing your Percona resource agent, I noticed an odd behavior when a replica gets broken (for instance by a duplicate key error): this slave can still be promoted to Master if the current master crashes (power outage, kernel panic, whatever the reason is). The result is a database with a lot of missing data becoming the Master. Ouch!

IMHO, a broken slave should never be promotable until its replication is fixed.


Long story/HOW TO reproduce

Given how rigid MySQL replication is, we often get integrity differences (thanks to pt-table-checksum, by the way) that sometimes end in a broken replica due to a 1062 duplicate key error.

1- Set up a 2-node Master <-> Slave cluster with Percona 5.5, SBR replication, 1 VIP, corosync 1.4.2, pacemaker 1.1.7, and the Percona RA from today's GitHub branch.
Here is the conf:
pacemaker-prm2:~# crm configure show
node pacemaker-prm1 \
    attributes IP="x.x.x.1"
node pacemaker-prm2 \
    attributes IP="x.x.x.2"
primitive p_mysql ocf:percona:mysql3 \
    params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user="MYREPUSER" replication_passwd="MYREPPWD" max_slave_lag="60" evict_outdated_slaves="true" binary="/usr/bin/mysqld_safe" test_user="MYUSER" test_passwd="MYPASSWORD" test_table="test.example" \
    op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="3s" role="Slave" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="300s" \
    op stop interval="0" timeout="300s"
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="x.x.x.3" cidr_netmask="25" nic="eth0" iflabel="vip-mysql" \
    op monitor interval="10s"
ms ms_mysql p_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master"
colocation c_vip_on_master inf: p_vip ms_mysql:Master
order o_ms_mysql_promote_before_p_vip inf: ms_mysql:promote p_vip:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    stonith-enabled="false" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1406209717"
property $id="mysql_replication" \
    p_mysql_REPL_INFO="pacemaker-prm1|mysql-bin.000298|107" \
    p_mysql_REPL_STATUS="mysql-bin.000298|4334|104857600"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"



From here, prm1 is the Master and prm2 is the slave.

2- Break the replication
Note: I use the duplicate key method to break my replication.
pacemaker-prm2 > mysql test -e "insert into example_autoincrement(id,data) values('5','test5')"
pacemaker-prm1 > mysql test -e "insert into example_autoincrement(id,data) values('5','test5')"



3- Notice the broken replication:
MySQL level:
pacemaker-prm2:~# mysql -e "show slave status\G" | egrep "Last_Errno|Last_Error|Seconds_Behind_Master|Running"
     Slave_IO_Running: Yes
    Slave_SQL_Running: No
           Last_Errno: 1062
           Last_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into example_autoincrement(id,data) values('5','test5')'
Seconds_Behind_Master: NULL


Pacemaker level: crm_mon -A1
Online: [ pacemaker-prm1 pacemaker-prm2 ]

p_vip  (ocf::heartbeat:IPaddr2):       Started pacemaker-prm1
 Master/Slave Set: ms_mysql [p_mysql]
     Masters: [ pacemaker-prm1 ]
     Slaves: [ pacemaker-prm2 ]

Node Attributes:
* Node pacemaker-prm1:
    + IP                                : x.x.x.1
    + master-p_mysql:0                  : 2147483647
    + readable                          : 1
* Node pacemaker-prm2:
    + IP                                : x.x.x.2
    + master-p_mysql:1                  : 0
    + readable                          : 0


Notice that readable on prm2 is now "0". That is good for keeping readers off the broken replica, but not enough to prevent failover to it.

crm_simulate -sL
Allocation scores:
clone_color: ms_mysql allocation score on pacemaker-prm1: 100
clone_color: ms_mysql allocation score on pacemaker-prm2: 0
clone_color: p_mysql:0 allocation score on pacemaker-prm1: INFINITY
clone_color: p_mysql:0 allocation score on pacemaker-prm2: 0
clone_color: p_mysql:1 allocation score on pacemaker-prm1: 0
clone_color: p_mysql:1 allocation score on pacemaker-prm2: 100
native_color: p_mysql:0 allocation score on pacemaker-prm1: INFINITY
native_color: p_mysql:0 allocation score on pacemaker-prm2: 0
native_color: p_mysql:1 allocation score on pacemaker-prm1: -INFINITY
native_color: p_mysql:1 allocation score on pacemaker-prm2: 100
p_mysql:0 promotion score on pacemaker-prm1: INFINITY
p_mysql:1 promotion score on pacemaker-prm2: 0
native_color: p_vip allocation score on pacemaker-prm1: INFINITY
native_color: p_vip allocation score on pacemaker-prm2: -INFINITY


Nothing here prevents the broken slave, which may be missing 4 hours of unreplicated data, from taking the Master role if the Master crashes. A disaster waiting to happen!
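For anyone who wants to watch for this condition, here is a minimal sketch that pulls the promotion scores out of the crm_simulate -sL output. The function names promotion_scores and not_excluded are made up for illustration, and treating "anything other than -INFINITY" as "not explicitly excluded from promotion" is an assumption based on the behavior described in this thread, not a statement about Pacemaker internals.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Read crm_simulate -sL output on stdin and print "<node> <score>"
# for every "promotion score" line.
promotion_scores() {
    awk '/ promotion score on / {
        node = $(NF - 1)       # e.g. "pacemaker-prm2:"
        sub(/:$/, "", node)    # drop the trailing colon
        print node, $NF        # the last field is the score
    }'
}

# Print the nodes whose promotion score is anything other than -INFINITY,
# i.e. nodes that are not explicitly barred from promotion (assumption).
not_excluded() {
    promotion_scores | awk '$2 != "-INFINITY" { print $1 }'
}
```

Running crm_simulate -sL | not_excluded on the cluster above would still list pacemaker-prm2, since its promotion score is 0 rather than -INFINITY.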

Fix suggestion
Could you please add an automatic colocation or a -INFINITY score on the broken replica?


Overall:
Because of this issue, I cannot use your Percona RA in a production environment.
I am keen to provide any further info, debugging or tests.

Regards, Aurélien







Yves Trudeau

Jul 29, 2014, 4:05:55 PM
to prm-d...@googlegroups.com
Hi Aurélien,
  Good catch. I'll modify the code to set the master-score of a node whose replication is broken to -INF. That will prevent its promotion.
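
A rough sketch of the idea, not the agent's actual code: the function name replica_master_score, the default healthy score of 1, and the exact test (SQL thread stopped or a non-zero Last_Errno) are all assumptions made for illustration. It reads the output of mysql -e "show slave status\G" on stdin and prints the score it would assign.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not taken from the PRM agent).
# Usage: mysql -e "show slave status\G" | replica_master_score [healthy_score]
replica_master_score() {
    healthy_score=${1:-1}   # score for a healthy node (assumed default)
    awk -F': *' -v healthy="$healthy_score" '
        {
            key = $1
            sub(/^[ \t]+/, "", key)   # \G output right-aligns the field names
        }
        key == "Slave_SQL_Running" { seen = 1; sql = $2 }
        key == "Last_Errno"        { last_errno = $2 }
        END {
            # Only the SQL thread and its last error are checked. The IO
            # thread is deliberately ignored: it also stops when the master
            # dies, which is exactly when a healthy slave must stay promotable.
            if (seen && (sql != "Yes" || last_errno + 0 != 0))
                print "-INFINITY"
            else
                print healthy   # no slave status at all: treated as not a replica
        }'
}
```

Inside a resource agent the result would presumably be passed to crm_master (for example via its -v option) during the monitor action. How that is wired in here is an assumption, so check the commit itself for the real logic.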

Regards,

Yves











aurelien lemaire

Aug 13, 2014, 9:55:22 AM
to prm-d...@googlegroups.com
Hi Yves,

Thanks a bunch for the reply.

I saw your git commit related to my bug report: https://github.com/percona/percona-pacemaker-agents/commit/f9236d59b8d0d748ee646b27e4eb5cae9e3d96d7#diff-76ee4ad3b12ba3471b5273d26f48b51b

I found some other (less disruptive) bugs in your RA. Would you prefer that I post them here or send them directly to you?

Regards, Aurélien

Yves Trudeau

Aug 13, 2014, 10:39:40 AM
to prm-d...@googlegroups.com

Hi Aurélien,
  Here is fine.

Regards,

Yves
