How the new recovery patch works


Vadim Tkachenko

Jul 17, 2009, 12:54:35 AM
to percona-d...@googlegroups.com, MARK CALLAGHAN, Domas Mituzas, Yasufumi Kinoshita
Mark, Domas,

Yasufumi created a page where he explains how the patch for InnoDB recovery works:

http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_recovery_patches#is_the_buf_flush_insert_sorted_into_flush_list_hack_correct

Your opinion on this is very welcome.


Thanks,
Vadim

--
Vadim Tkachenko, CTO
Percona Inc.
ICQ: 369-510-335, Skype: vadimtk153, Phone +1-888-401-3403
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.percona.com/

MARK CALLAGHAN

Jul 17, 2009, 12:42:24 PM
to Vadim Tkachenko, percona-d...@googlegroups.com, Domas Mituzas, Yasufumi Kinoshita
On Thu, Jul 16, 2009 at 9:54 PM, Vadim Tkachenko <va...@percona.com> wrote:
> Mark, Domas,
>
> Yasufumi created a page where he explains how the patch for InnoDB recovery works:
>
> http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_recovery_patches#is_the_buf_flush_insert_sorted_into_flush_list_hack_correct
>
> Your opinion on this is very welcome.

Thank you. It is on my TODO list of things to review.

--
Mark Callaghan
mdca...@gmail.com

Dimitri

Jul 20, 2009, 3:53:00 PM
to percona-d...@googlegroups.com, Vadim Tkachenko, Domas Mituzas, Yasufumi Kinoshita
Folks,

If I understand correctly, the patch helps when the redo log contains
several different copies of the same block(s): instead of replaying
every image change, it applies only the last one for each block.
(Please correct me if that's not so.)
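
A toy sketch of the mechanism as I understand it (hypothetical types
and names, nothing like the real InnoDB structures; whether the patch
really works this way is exactly my question):

#include <stdio.h>

#define N_PAGES 2

typedef struct {
    int  page_no;  /* page the record modifies */
    long lsn;      /* log sequence number of the record */
    char image;    /* stand-in for the logged page image */
} log_rec;

int main(void)
{
    /* a toy log holding several images of the same two pages */
    log_rec log[] = { {0,100,'a'}, {1,110,'b'}, {0,120,'c'}, {1,130,'d'} };
    const log_rec *newest[N_PAGES] = { 0 };
    size_t i;

    /* keep only the newest image per page ... */
    for (i = 0; i < sizeof log / sizeof log[0]; i++)
        if (!newest[log[i].page_no] || log[i].lsn > newest[log[i].page_no]->lsn)
            newest[log[i].page_no] = &log[i];

    /* ... and apply one image per page instead of replaying all of them */
    for (i = 0; i < N_PAGES; i++)
        printf("page %zu: apply image '%c' (lsn %ld)\n",
               i, newest[i]->image, newest[i]->lsn);
    return 0;
}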

That makes me think that if the redo log contains only a single copy
of each block, the recovery process will still take an hour or so (I
mean the initial time Vadim observed in his test).
Do you agree, or am I missing something?..

So the current patch is absolutely great, BUT I think the recovery
process needs to be sped up in any case, and I believe that if
recovery were processed in *parallel* by several threads it would
give a huge gain!
What do you think?..

Rgds,
-Dimitri

Vadim Tkachenko

Jul 20, 2009, 7:46:36 PM
to Dimitri, percona-d...@googlegroups.com, Domas Mituzas, Yasufumi Kinoshita
Dimitri,

I'll leave the single-copy vs. several-copies question for Yasufumi
to comment on; he understands it much better (though he is on vacation
this week).

As for parallel recovery: that would be good, but it seems very hard
to implement. I can't see how to resolve the dependencies. It may be
doable, but the patch would end up quite complex.

We actually made changes so that recovery can use several write I/O
threads (innodb_write_io_threads); this may also be the default
behavior in 5.4.
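
For reference, a my.cnf fragment that enables several I/O threads
(option names as in XtraDB / the InnoDB plugin; the values here are
only an example, tune them for your hardware):

[mysqld]
# more background I/O threads also help during crash recovery
innodb_read_io_threads  = 8
innodb_write_io_threads = 8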

Yasufumi Kinoshita

Jul 20, 2009, 8:42:09 PM
to percona-d...@googlegroups.com, dimit...@gmail.com, Vadim Tkachenko, Domas Mituzas, Yasufumi Kinoshita
Dimitri,

The InnoDB recovery process has 2 phases:

1. Scanning the transaction log and storing the changes in an internal
hash table, keyed by page offset, until the table nearly fills the
buffer pool. (Progress is printed as an LSN.)

2. Recovering each page by applying its hashed log records. (Progress
is printed as a %.)
(* The current patch speeds up phase 2 with a large buffer pool.)

If more transaction log still remains, the two phases are repeated.
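
In rough code, the loop looks like this (toy structures and sizes,
not the real InnoDB code, which hashes by space/page id into a table
sized against the buffer pool):

#include <stdio.h>

#define N_LOG    8   /* records in the (toy) transaction log      */
#define N_PAGES  4   /* distinct pages touched by the log         */
#define HASH_CAP 3   /* records the "buffer pool" holds per round */

typedef struct { int page_no; int change; } log_rec;

int main(void)
{
    log_rec log[N_LOG] = {
        {0,1},{1,2},{0,3},{2,4},{3,5},{1,6},{2,7},{3,8}
    };
    int next = 0;

    while (next < N_LOG) {
        /* phase 1: scan the log and hash the changes by page offset
           until the hash (i.e. the buffer pool) is nearly full */
        log_rec hashed[HASH_CAP];
        int n_hashed = 0;
        while (next < N_LOG && n_hashed < HASH_CAP)
            hashed[n_hashed++] = log[next++];

        /* phase 2: apply the hashed records page by page; in the
           server this is done by the read I/O threads */
        for (int p = 0; p < N_PAGES; p++)
            for (int i = 0; i < n_hashed; i++)
                if (hashed[i].page_no == p)
                    printf("apply change %d to page %d\n",
                           hashed[i].change, p);

        /* if transaction log remains, the two phases repeat */
    }
    return 0;
}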


About parallelizing: phase 1 may be difficult to parallelize, because
the transaction log is contiguous (it is divided into blocks, but
individual records can cross block boundaries). In phase 2, each page
is recovered by the read I/O thread that read it, so using several
read I/O threads already amounts to parallelizing phase 2.

So it can be considered parallelized enough already, I think, for now.
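
A sketch of why several read I/O threads already parallelize phase 2
(toy code, not the InnoDB internals): each worker claims the next
page, "reads" it, and then applies that page's hashed records.

#include <pthread.h>
#include <stdio.h>

#define N_PAGES   8
#define N_THREADS 4

static int next_page = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *io_worker(void *arg)
{
    int id = *(int *)arg;
    for (;;) {
        /* claim the next page to recover */
        pthread_mutex_lock(&lock);
        int page = next_page++;
        pthread_mutex_unlock(&lock);
        if (page >= N_PAGES)
            return NULL;
        /* "read" the page, then apply its hashed log records */
        printf("read thread %d: recover page %d\n", id, page);
    }
}

int main(void)
{
    pthread_t tid[N_THREADS];
    int ids[N_THREADS];
    for (int i = 0; i < N_THREADS; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, io_worker, &ids[i]);
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}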

Best regards,
Yasufumi
--
Yasufumi Kinoshita,
Performance Engineer
Percona Inc.

Dimitri

Jul 21, 2009, 2:39:37 AM
to Yasufumi Kinoshita, percona-d...@googlegroups.com, Vadim Tkachenko, Domas Mituzas, Yasufumi Kinoshita
Yasufumi, Vadim,

Thanks a lot for the detailed explanation!
Things are much clearer now :-)

That said, a few more questions/comments:

- Phase 2 must originally take much more time than phase 1 (otherwise
we would not see such an improvement :-)), but what is the ratio
between the two phases now, in the patched version?..

- It seems to me a good recommendation for any production database
would be to set the buffer pool large enough to hold all the log data
(>4G); at the same time, how efficient is the hash table here?..
Wouldn't an AVL tree be better?..

- In phase 2, what is the ratio between CPU and I/O time spent? Also,
how many I/O threads were *really* used in your case?..

- You did not answer about block copies (what if there is only a
single copy of each block?) :-)

Sorry, I'm also leaving on vacation, so there's no need to hurry with
the answers; take your time :-))

The point is very interesting and critical at the same time (and also
ties into Mark's question about recovery of partially written pages),
but it looks very promising too! :-))

Best regards!
-Dimitri