making a storage engine crash safe on a slave in MySQL 5.6

Zardosht Kasheff

unread,

Oct 3, 2012, 11:15:52 AM10/3/12

to

Hello all,

I read that InnoDB is now crash safe on slaves in MySQL 5.6. I
understand that a way they do this is on committing a transaction on a
slave, they store the binary log position in an InnoDB table (which
makes that information transactionally maintained). I also understand
that this position is stored for each database, as databases may apply
the replication log in parallel.

Upon recovery of a slave, how does InnoDB report to MySQL where the
replication log should resume?

Can another storage engine similarly make itself crash safe on a
slave? Will there be any issues with multiple storage engines doing
so?

Thanks
-Zardosht

--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals

Rick James

unread,

Oct 3, 2012, 11:42:09 AM10/3/12

to

As I understand it, it is not quite that...

The transaction must get into the Slave's relay log before returning from the COMMIT. (That's not the same as "committing".)

Zardosht Kasheff

unread,

Oct 3, 2012, 12:00:31 PM10/3/12

to

Thanks Rick. So my understanding is muddled.

So how are slaves crash safe? What does MySQL and InnoDB do upon recovery?

Rick James

unread,

Oct 3, 2012, 12:32:24 PM10/3/12

to

There are edge cases where MySQL can lose data, but they are rare.
You can tune things to minimize the chance of loss, but performance suffers.
There is no automatic failover when a Master fails. Attempts to automate it have led to disaster -- see the recent lengthy outage of GIT.

The design of MySQL's replication is to send a copy of all writes from the Master to the Slave, then _asynchronously_ apply the updates on the Slave.
Some of the "improvements" move that toward _sychronous_.

Old Replication: COMMIT returns to client before Slave necessarily has the write.
5.6 / Percona Cluster / Galera / etc: COMMIT returns after the Slave says "yes I have the write (in the "relay log"), but I have not applied it yet."
Full synchronous (not implemented) would wait for the write to be committed on all slave(s) before returning from the COMMIT.

Why not do sync?
How far apart are Master and Slave? -- this is a lag.
Is the Slave busy and cannot get around to doing the write? -- another lag
The replication stream is (was) "single-threaded" (for good reason); so it is the query stuck in this 'queue'? -- another lag
All these lags would mean High Availability --> Low Performance.

The Parallelism in 5.6 _partially_ fixes the single-threadedness.

I don't mean to be knocking Replication as implemented (and improved recently); I am trying to explain why it is a tough problem.

Another feature you did not mention is the GTID (Global Transaction ID). This has been needed for many years, was implemented at Google, and is now (5.6) mainstream. In complex and subtle ways it makes promoting an arbitrary Slave to Master easily scriptable. The problem before had to do with the _other_ slaves -- there was no easy way to have them point to the new Master and know where they "left off".

Workbench has such scripts.
A guy invented MHA to do the work without the GTID; he provides equivalent scripts.

What does InnoDB do upon recovery? There are many aspects to that.
On a single machine, transactions that were started but not committed need to be rolled back; InnoDB's "log" assists in that.
A transaction is not sent to the slaves until you COMMIT.
etc.
The "doublewrite" buffer protects you (at a cost) from writing part of a 16KB block then crashing. (InnoDB normally works in units of 16KB, but most file systems talk in units of 4KB.)

This is only a quick, cursory, summary of all the stuff that goes on in InnoDB an Replication.
MySQL+InnoDB is rather "crash safe", but not perfect.
XtraDB is essentially the same as InnoDB in this area of discussion.
MyISAM is much less crash safe, in many ways.
Aria (in MariaDB) touts being crash safe.

MARK CALLAGHAN

unread,

Oct 3, 2012, 12:46:01 PM10/3/12

to

On Wed, Oct 3, 2012 at 8:15 AM, Zardosht Kasheff <zard...@gmail.com> wrote:
> Hello all,
>
> I read that InnoDB is now crash safe on slaves in MySQL 5.6. I
> understand that a way they do this is on committing a transaction on a
> slave, they store the binary log position in an InnoDB table (which
> makes that information transactionally maintained). I also understand
> that this position is stored for each database, as databases may apply
> the replication log in parallel.
>
> Upon recovery of a slave, how does InnoDB report to MySQL where the
> replication log should resume?

I don't do much with 5.6 source. Grep the sql directory for
"repository" and then start by looking at rpl_info_factory.cc. The
vague answer is that the file name offset are read from either a table
or the info file. I am sure the replication team will respond soon
with a less vague answer.

--
Mark Callaghan
mdca...@gmail.com

Zardosht Kasheff

unread,

Oct 3, 2012, 2:22:57 PM10/3/12

to

Thanks Mark and Rich for the feedback.

So let me take a step back. As I understand it, something was done in
MySQL 5.6 to ensure that should a slave crash, the relay log is in
sync with the state of InnoDB. I would like to understand how this is
done and what another storage engine can do to ensure this.

Is the following correct?

With each InnoDB write transaction on a slave, the slave's relay log
info is updated with that transaction before commit? Because the
slave's relay log is now an InnoDB table, upon recovery, whatever the
slave relay log recovers to will be correct. If this is correct, where
is the code that is doing this?

How does this work with a slave running parallel threads? If multiple
threads are processing different parts of the relay log, how do these
multiple threads manage updating the relay log?

-Zardosht

Kristian Nielsen

unread,

Oct 4, 2012, 4:55:24 AM10/4/12

to

Zardosht Kasheff <zard...@gmail.com> writes:

> Is the following correct?
>
> With each InnoDB write transaction on a slave, the slave's relay log
> info is updated with that transaction before commit? Because the
> slave's relay log is now an InnoDB table, upon recovery, whatever the
> slave relay log recovers to will be correct. If this is correct, where
> is the code that is doing this?

As I understand it, the issue is not the relay log. If the relay log is cut
short after a crash, it can just be fetched anew from the master.

The issue is the position in the master binlog that the SQL thread has
reached, ie. the end of the last event applied by the slave. This used to be
stored in relay-log.info in a non-crash-safe way, and can now optionally be
stored in an InnoDB table so it can be recovered in a transactionally
consistent way along with applied changes.

> sync with the state of InnoDB. I would like to understand how this is
> done and what another storage engine can do to ensure this.

In principle, this should work for any other XA-capable engine as well. Since
transactions that span two or more such engines are XA-committed and recovered
consistently after a crash. So putting the replication info in InnoDB and
replicating tables in your engine should be ok, or vice versa. However, note
that XA between multiple engines is rather costly, and also there used to be
several severe bugs for this in MySQL that may or may not be fixed yet.

> How does this work with a slave running parallel threads? If multiple
> threads are processing different parts of the relay log, how do these
> multiple threads manage updating the relay log?

I do not know. I would assume there will be multiple rows in the table, one
per SQL thread?

Hope this helps,

- Kristian.

Zardosht Kasheff

unread,

Oct 4, 2012, 3:41:58 PM10/4/12

to

Thanks for the feedback Kristian. With this information, I now need to
experiment to investigate this behavior.