Error on switchover in mysql_prm_gtid

83 views
Skip to first unread message

Neef Roel

unread,
Oct 15, 2015, 11:29:52 AM10/15/15
to PRM-discuss
Hi,

I'm currently testing a failover scenario with 3 hosts (1 master, 2 slaves).
When I try to failover the master from node01 to node02, i get error messages in the corosync logfile..



On node02:
Oct 15 17:17:01 [5385] NODE02 pacemaker_remoted:   notice: operation_finished:   p_mysql_monitor_2000:15006:stderr [ /usr/lib/ocf/resource.d/percona/mysql_prm_gtid: line 1061: 1000000000+35e67a77: value too great for base (error token is "35e67a77") ]


On node03
Oct 15 17:21:12 [7609] NODE03 pacemaker_remoted:   notice: operation_finished:   p_mysql_notify_0:13561:stderr [ /usr/lib/ocf/resource.d/percona/mysql_prm_gtid: line 1061: 1000000000+35e67a77: value too great for base (error token is "35e67a77") ]


I'm not sure, but it almost looks like it isn't parsing the GTIDs properly..
I've tried googling on this message, but I'm not getting many hits on it.


René

Yves Trudeau

unread,
Oct 15, 2015, 3:17:12 PM10/15/15
to prm-d...@googlegroups.com

Hi Neef,
  Indeed, like if the transaction id was in hexa, the documentation claims it is decimal.  Could send the content of the gtid_executed variable on your servers? 

Regards,

Yves

--
You received this message because you are subscribed to the Google Groups "PRM-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prm-discuss...@googlegroups.com.
To post to this group, send email to prm-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/prm-discuss.
For more options, visit https://groups.google.com/d/optout.

Rene Loef

unread,
Oct 15, 2015, 3:28:38 PM10/15/15
to prm-d...@googlegroups.com

Hi Yves,

 

The gtid executed:

 

Node01;

mysql> select @@global.gtid_executed;

+------------------------------------------+

| @@global.gtid_executed                   |

+------------------------------------------+

| 35e67a77-20c4-11e5-a330-00505689325d:1-2 |

+------------------------------------------+

1 row in set (0.00 sec)

 

Node02

mysql> select @@global.gtid_executed;

+------------------------------------------+

| @@global.gtid_executed                   |

+------------------------------------------+

| 35e67a77-20c4-11e5-a330-00505689325d:1-2 |

+------------------------------------------+

1 row in set (0.00 sec)

 

Node03

mysql> select @@global.gtid_executed;

+------------------------------------------+

| @@global.gtid_executed                   |

+------------------------------------------+

| 35e67a77-20c4-11e5-a330-00505689325d:1-2 |

+------------------------------------------+

1 row in set (0.00 sec)

 

Neef

--
You received this message because you are subscribed to a topic in the Google Groups "PRM-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/prm-discuss/gDkWo5hB2Pw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to prm-discuss...@googlegroups.com.

Yves Trudeau

unread,
Oct 15, 2015, 3:52:42 PM10/15/15
to prm-d...@googlegroups.com
Hi,
   I am puzzled:

root@PRM-1:~# master_uuid='35e67a77-20c4-11e5-a330-00505689325d'                                                                              
root@PRM-1:~# echo '35e67a77-20c4-11e5-a330-00505689325d:1-2' | sed "s/.*$master_uuid:[0-9]*-\([0-9]*\).*/\1/"
2

which is correct.  Can you confirm the version of the agent you are using has this at line 1048:

executed_gtid=`cat $slave_status_file | sed "s/.*$master_uuid:[0-9]*-\([0-9]*\).*/\1/"`

Regards,

Yves

Rene Loef

unread,
Oct 15, 2015, 4:24:46 PM10/15/15
to prm-d...@googlegroups.com

I was puzzled as well..

 

Yes, it says;

executed_gtid=`cat $slave_status_file | sed "s/.*$master_uuid:[0-9]*-\([0-9]*\).*/\1/"`

 

On emore weird thing, when I echo some of the variables from line 1061 to a text file;

 

  echo "$executed_gtid - $last_gtid" > /tmp/tempfile.txt

 

this is what I get in the file;

 

35e67a77-20c4-11e5-a330-00505689325d:1-2 - GTID_EXECUTED 35e67a77-20c4-11e5-a330-00505689325d

 

One would expect that the executed_gtid has the value of 2, because when I do echo “35e67a77-20c4-11e5-a330-00505689325d:1-2” | sed …. It returns “2”

 

René

Yves Trudeau

unread,
Oct 15, 2015, 4:31:16 PM10/15/15
to prm-d...@googlegroups.com

How interesting :)

Could you add, just before:

cat $slave_status_file > /tmp/tempstatus.txt
cat $master_uuid > /tmp/tempmasteruuid.txt

And copy/paste the content.  Maybe the error is elsewhere.

Yves

Rene Loef

unread,
Oct 15, 2015, 4:34:27 PM10/15/15
to prm-d...@googlegroups.com

I was also busy doing things like that.

I exported the $master_uuid variable and this is probably the culprit;

 

The sed command;

sed s/.*$master_uuid:[0-9]*-\([0-9]*\).*/\1/"

 

translates in :

sed "s/.*40337b1d-734b-11e5-bd55-00505689230a:[0-9]*-\([0-9]*\).*/\1/"

whilst the gtid_executed is: 35e67a77-20c4-11e5-a330-00505689325d:1-2

 

different prefix, and it doesn’t match…

Rene Loef

unread,
Oct 15, 2015, 4:46:24 PM10/15/15
to prm-d...@googlegroups.com
I was just thinking... could it be because no changes have been made since the failover and the master gtid or gtid executed hasn't been updated to reflect the new master? 

René 

Yves Trudeau

unread,
Oct 15, 2015, 4:49:18 PM10/15/15
to prm-d...@googlegroups.com

That's the purpose of the substraction.  I suspect the error is elsewhere in the code, do you have the content for the 2 files?

--Yves

Rene Loef

unread,
Oct 15, 2015, 5:31:03 PM10/15/15
to prm-d...@googlegroups.com

cat /tmp/tempstatus.txt

35e67a77-20c4-11e5-a330-00505689325d:1-2

 

cat /tmp/tempmasteruuid.txt

40337b1d-734b-11e5-bd55-00505689230a

Yves Trudeau

unread,
Oct 15, 2015, 5:35:11 PM10/15/15
to prm-d...@googlegroups.com
Here we go :)

root@PRM-1:~# master_uuid='40337b1d-734b-11e5-bd55-00505689230a'                                                                              
root@PRM-1:~# echo '35e67a77-20c4-11e5-a330-00505689325d:1-2' | sed "s/.*$master_uuid:[0-9]*-\([0-9]*\).*/\1/"
35e67a77-20c4-11e5-a330-00505689325d:1-2

I think I hate sed.  I am done for the day but I'll check for something else.  Maybe there's a sed option.

Regards,

Yves

Yves Trudeau

unread,
Oct 15, 2015, 5:37:41 PM10/15/15
to prm-d...@googlegroups.com
need "sed -n"  I'll update the code tomorrow.

Regards,

Yves  

Yves Trudeau

unread,
Oct 15, 2015, 5:37:56 PM10/15/15
to prm-d...@googlegroups.com
BTW, thanks a lot

--Yves

Neef Roel

unread,
Oct 19, 2015, 3:27:23 PM10/19/15
to PRM-discuss
Yves,

Did you have a chance to look at the code?
(I don't want to rush you.. but our client is eager to use corosync in front of their databases)

René

Yves Trudeau

unread,
Oct 19, 2015, 5:14:44 PM10/19/15
to prm-d...@googlegroups.com

Hi René,
   Apparently I forgot to push to git.  I just did.

Regards,

Yves

Rene Loef

unread,
Oct 20, 2015, 3:16:07 PM10/20/15
to prm-d...@googlegroups.com

Yves,

 

I downloaded the new version (from the issue_init_packet branch), but I still got the error:

 

mysql_prm_gtid(p_mysql)[15564]: 2015/10/20_21:13:22 INFO: post-demote notification for NODE03.

mysql_prm_gtid(p_mysql)[15564]: 2015/10/20_21:13:23 INFO: Slave is not running - not waiting to finish

mysql_prm_gtid(p_mysql)[15564]: 2015/10/20_21:13:23 INFO: Slave not runnig - not waiting to finish

Oct 20 21:13:23 [2927] BLX179-TEST01 pacemaker_remoted:   notice: operation_finished:   p_mysql_notify_0:15564:stderr [ /usr/lib/ocf/resource.d/percona/mysql_prm_gtid: line 1066: 1000000000+-GTID_EXECUTED 525bb2c8: value too great for base (error token is "525bb2c8") ]

Oct 20 21:13:23 [2927] BLX179-TEST01 pacemaker_remoted:     info: log_finished:         finished - rsc:p_mysql action:notify call_id:25 pid:15564 exit-code:1 exec-time:489ms queue-time:0ms

Oct 20 21:13:23 [2927] BLX179-TEST01 pacemaker_remoted:     info: log_execute:  executing - rsc:p_mysql action:notify call_id:26

 

I’ve already added the debugging rules (and some), here are my current values;

 

cat $slave_status_file > /tmp/tempstatus.txt

echo $master_uuid > /tmp/tempmasteruuid.txt

echo $server_uuid > /tmp/tempserveruuid.txt

echo $executed_gtid > /tmp/tempexecuted_gtid.txt

 

 

cat /tmp/tempstatus.txt

525bb2c8-73f2-11e5-8198-00505689325d:1-5,72936af1-73f7-11e5-81b9-00505689230a:1

 

cat /tmp/tempmasteruuid.txt

72936af1-73f7-11e5-81b9-00505689230a

cat /tmp/tempserveruuid.txt

(empty)

 

cat /tmp/tempexecuted_gtid.txt

(empty)

Yves Trudeau

unread,
Oct 21, 2015, 7:44:45 AM10/21/15
to prm-d...@googlegroups.com

My bad, use the branch Issus_init_packet, I haven't merge yet.

--Yves

Rene Loef

unread,
Oct 21, 2015, 1:42:22 PM10/21/15
to prm-d...@googlegroups.com

Could it be that I’m looking at the wrong branch?

This is the URL in which I downloaded the newer version yesterday;

https://github.com/percona/percona-pacemaker-agents/tree/Issue_init_packet

or this file;

https://raw.githubusercontent.com/percona/percona-pacemaker-agents/Issue_init_packet/agents/mysql_prm_gtid

Yves Trudeau

unread,
Oct 21, 2015, 3:32:24 PM10/21/15
to prm-d...@googlegroups.com

Hi,
   so apparently there's more... I wrongly put the server_uuid in the sed expression instead of master_uuid, I fixed and pushed to git, same branch.

Regards,

Yves

Rene Loef

unread,
Oct 21, 2015, 6:21:17 PM10/21/15
to prm-d...@googlegroups.com

I did notice that this line was changed, but I assumed this was as it should be..

 

I’ve downloaded the new version.. first failover goes perfect.

Then I manually added the old master back into the cluster as a slave, is in sync

And then the Second failover not so good..

 

Again with the error..

 

I’ve added the same debug lines ;

/tmp/tempstatus.txt;

525bb2c8-73f2-11e5-8198-00505689325d:1-7,72936af1-73f7-11e5-81b9-00505689230a:1

 

/tmp/tempmasteruuid.txt

4b1e0a76-73f7-11e5-81b8-005056895aa4

 

/tmp/templastgtid.txt

GTID_EXECUTED 525bb2c8-73f2-11e5-8198-00505689325d

 

/tmp/tempexecutedgtid.txt

Empty line

Rene Loef

unread,
Oct 22, 2015, 2:50:20 PM10/22/15
to prm-d...@googlegroups.com

Yves,

 

I’ve been testing with the gtid behaviour..

When the master/slave failover has occurred, the master_uuid is not always already in the gtid_executed list. This mostly happens when there have not been any updates to the database.

 

say I have 3 database servers;

Node 1: 525bb2c8-73f2-11e5-8198-00505689325d

Node2: 4b1e0a76-73f7-11e5-81b8-005056895aa4

Node3: 72936af1-73f7-11e5-81b9-00505689230a

 

Node1 is the master, and there have been some database updates.

All slaves have master_uuid set to “525bb2c8-73f2-11e5-8198-00505689325d”

All slaves are in sync and have the gtid-executed set to: 525bb2c8-73f2-11e5-8198-00505689325d:1-7

 

Now when a failover occurs, the mysql_prm_gtid script does sed over the slavestatus file;

cat $slave_status_file | sed -n "s/.*${master_uuid}:[0-9]*-\([0-9]*\).*/\1/"

which would be like

echo “525bb2c8-73f2-11e5-8198-00505689325d:1-7” | sed "s/.*525bb2c8-73f2-11e5-8198-00505689325d:[0-9]*-\([0-9]*\).*/\1/"

this would be returning the integer value of 7

 

and the script would continue..

 

Now Node 02 has been promoted to master.

Node 3 has been changed to reflect the new master and has server_uuid set to 4b1e0a76-73f7-11e5-81b8-005056895aa4.

There have been no updates to the databases on the master, so there is no newer gtid_executed available

 

Node 1 will be added again is in sync, same status as Node 3.

 

Consider the next few scenarios;

No updates to the master;

Now when a failover would occur the master_uuid would be “4b1e0a76-73f7-11e5-81b8-005056895aa4 “

The gtid_executed would be : “525bb2c8-73f2-11e5-8198-00505689325d:1-7”

The mysql_prm_gtid would try;

echo “525bb2c8-73f2-11e5-8198-00505689325d:1-7” | sed "s/.*4b1e0a76-73f7-11e5-81b8-005056895aa4:[0-9]*-\([0-9]*\).*/\1/"

this would return an empty result..

 

One update to the master;

Now the failover would occur. The master_uuid would be “4b1e0a76-73f7-11e5-81b8-005056895aa4 “

The gtid_executed would be: “525bb2c8-73f2-11e5-8198-00505689325d:1-7, 4b1e0a76-73f7-11e5-81b8-005056895aa4:1”

The mysql_prm_gtid would try;

echo “525bb2c8-73f2-11e5-8198-00505689325d:1-7, 4b1e0a76-73f7-11e5-81b8-005056895aa4:1” | sed "s/.*4b1e0a76-73f7-11e5-81b8-005056895aa4:[0-9]*-\([0-9]*\).*/\1/"

this would return an empty result..

 

Two updates to the master:

Now the failover would occur. The master_uuid would be “4b1e0a76-73f7-11e5-81b8-005056895aa4 “

The gtid_executed would be: “525bb2c8-73f2-11e5-8198-00505689325d:1-7, 4b1e0a76-73f7-11e5-81b8-005056895aa4:1-2”

The mysql_prm_gtid would try;

echo “525bb2c8-73f2-11e5-8198-00505689325d:1-7, 4b1e0a76-73f7-11e5-81b8-005056895aa4:1-2” | sed "s/.*4b1e0a76-73f7-11e5-81b8-005056895aa4:[0-9]*-\([0-9]*\).*/\1/"

this would return the value 2

 

It looks like the script would be functioning OK if there have been at least two replicated events..

 

Once the GTID executed exists for the master_uuid with a suffix of their position.. the script seems to work fine.

It looks like it is only happening when failing over from a master on which less than 2 updates have been performed on the database…

 

Now that I have failed over a couple of times it keeps working for me..

Yves Trudeau

unread,
Oct 22, 2015, 4:33:28 PM10/22/15
to prm-d...@googlegroups.com

Hi René,
   Obviously the logic to figure which slave is the most up to date needs to be revisited.  I'll check when I have time, maybe tomorrow morning.

Regards,

Yves

Reply all
Reply to author
Forward
0 new messages