PRM newsletter

Yves Trudeau

May 24, 2013, 4:54:29 PM

to prm-d...@googlegroups.com

Hi,
I have not been diligent about keeping people informed about PRM. Here's the latest status.

PRM repository moved to Percona github
--------------------------------------

We decided to stop being a complete copy of the Pacemaker resource agents repository and go our own way. The new home is:

https://github.com/percona/percona-pacemaker-agents

The PRM agent is in the agents directory, renamed to mysql_prm. The rename is to ease the distinction from the regular Pacemaker agent.

Geo-DR
------

The latest agent includes support for Geo-DR type setups, namely two distinct Pacemaker clusters linked for replication using the booth daemon (Paxos protocol). This is fairly new; I'll add documentation for it in the upcoming weeks.

Master crash
------------

One of the odd behaviors of PRM was that it did not try to restart in place a master that had hard crashed. The hard crash could have been caused by the OOM killer or by something else that is not permanent. In terms of data consistency, the crashed master is likely the best candidate to be the next master if it restarts. In the repo, you'll find the branch "fix_crash_master", which restarts the master if it failed (no running process for the pid, while the pid file and socket file still exist). There's a timer in the restart process: such restarts can't happen more than once per hour. This is hard-coded, but I am open to discussion if someone gives me good reasons to do otherwise.
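To make the crash check and the one-hour limit concrete, here is a minimal Python sketch of that logic. It is an illustration only, not the agent's code: the function names, the file arguments and the way the last restart time is passed in are all hypothetical.

```python
import os

# "Once per hour" comes from the post; the constant name is an assumption.
RESTART_INTERVAL_SECONDS = 3600


def process_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def looks_like_hard_crash(pid_file, socket_file):
    """Crash signature: pid file and socket file remain, but the process is gone."""
    if not (os.path.exists(pid_file) and os.path.exists(socket_file)):
        return False
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Unreadable or malformed pid file: do not guess.
        return False
    return not process_alive(pid)


def restart_allowed(last_restart, now, interval=RESTART_INTERVAL_SECONDS):
    """Allow an in-place restart only if none happened within the interval."""
    return last_restart is None or now - last_restart >= interval
```

A caller would restart the master in place only when both `looks_like_hard_crash(...)` and `restart_allowed(...)` return true, and otherwise fall back to the normal failover path.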

New doc
-------

I updated the setup guide (still some work needed) and added an operational guide. I am planning a Geo-DR guide and a migration guide shortly.

MHA-like behavior
-----------------

MHA is currently much better than PRM regarding data consistency. Since Pacemaker gives us a distributed framework, architecturally speaking, we should be able to do the same in a much more elegant way. This kind of feature requires processing the binlogs. For that purpose, I modified ybinlogp, a tool developed by Yelp, to get the XID values (from InnoDB), the position and the md5sum of the data. The md5sum takes into account the host-specific items that would break the hash. The goal here is to relate the XID values from the slave's relay log to the master's XID values in its binlog, or in its relay log if it is a newly promoted master. Then, slaves will be able to resync with a bit of logic. Look at prm_binlog_parser.c for more details (in /tools/ybinlogp). I should have updates soon on this new feature.
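As a rough illustration of the matching idea, the Python sketch below looks up the last transaction a slave has in its relay log among the XID records taken from the master's log, and returns the master position to resume from. The record layout (`XidEvent`), the function name, and the rule of resuming at the end position of the matching event are assumptions made for the example; they do not describe the actual output of prm_binlog_parser.

```python
from typing import NamedTuple, Optional, Sequence


class XidEvent(NamedTuple):
    """Hypothetical record for one committed transaction found in a log."""
    xid: int      # XID value of the transaction
    end_pos: int  # position in that log right after the event
    md5: str      # md5sum of the data, host-specific items neutralized


def find_resync_position(master_events: Sequence[XidEvent],
                         slave_last: XidEvent) -> Optional[int]:
    """Return the master log position following the slave's last transaction.

    Scans from the newest master event backwards and requires both the XID
    and the md5sum to agree. Returns None when no match is found, in which
    case an automatic resync should not be attempted.
    """
    for event in reversed(master_events):
        if event.xid == slave_last.xid and event.md5 == slave_last.md5:
            return event.end_pos
    return None
```

Requiring the md5sum to match as well as the XID is a deliberate choice in this sketch: it guards against an XID value that happens to coincide without referring to the same data.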

Comments are welcome.

Regards,

Yves Trudeau
