Relay log read failure w/ GTID


Caio James

Jul 16, 2017, 10:44:57 AM
to Percona Discussion
Greetings -

Last night one of our servers ran out of space in its logs partition, causing a problem downloading binary logs from its replication master.

Once we cleared the space, we now see the following error in our slave:

> Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.


We’ve confirmed there is no issue with the master’s binary log. In the past, prior to when we started using GTID replication, we would have just reset the slave to the last executed log position and replication would have resumed.
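(For reference, the pre-GTID procedure was roughly the following sketch - the file name and position are placeholders; the real values come from the Relay_Master_Log_File and Exec_Master_Log_Pos fields of SHOW SLAVE STATUS on the slave:

STOP SLAVE;
-- Re-point the slave at the last event it actually executed,
-- using the coordinates reported by SHOW SLAVE STATUS:
CHANGE MASTER TO
  MASTER_LOG_FILE='mysql-bin.000123',  -- placeholder: Relay_Master_Log_File
  MASTER_LOG_POS=4567;                 -- placeholder: Exec_Master_Log_Pos
START SLAVE;

This discards the corrupted relay logs and re-fetches them from the master.)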

However, in this case we have auto-position enabled, which prevents us from doing the same thing.

Wondering if there is some incantation where we can simply reset to the last executed GTID? We'd rather not have to dump and restore from the master, as the data set is rather large. Auto-position seems to be the issue here - although if it were truly "auto" it would recognize the error and automatically retry the binary log download from the last good position ;)

Thanks!

Caio

Lorraine Pocklington

Oct 13, 2017, 5:31:01 AM
to Percona Discussion
Hi
There are 3 posts on the Percona Performance blog that might help work through your options in this scenario:
There's also a discussion on Stack Overflow - it's not for GTID, but I understand the process is the same, except you use MASTER_AUTO_POSITION instead of file and position coordinates:
This answer is a little late I know, apologies, but a good question to answer in case others have the same experience... Hope this helps (in the future!)

Jervin R

Oct 13, 2017, 1:43:10 PM
to Percona Discussion
Lorraine is right: the SE link has the solution, though it's not straightforward. Most likely the relay logs were corrupted and not closed properly when the partition filled up. The fix is simply to reset replication and re-download the relay logs:

STOP SLAVE;
RESET SLAVE;   -- deletes the (corrupted) relay logs; the executed GTID set is preserved
CHANGE MASTER TO MASTER_AUTO_POSITION=1;
START SLAVE;   -- the slave re-requests events from the master starting at the first GTID it is missing
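Afterwards you can verify that replication has recovered - a quick check, assuming a standard GTID setup:

SHOW SLAVE STATUS\G
-- Slave_IO_Running and Slave_SQL_Running should both be 'Yes',
-- Last_Error should be empty, and Retrieved_Gtid_Set /
-- Executed_Gtid_Set should be advancing as the slave catches up.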