Galera and ec2-consistent-snapshot

Michiel van Vlaardingen

Feb 21, 2012, 2:10:22 PM
to codership
As we are deploying Galera on AWS, I was wondering the following: Is
it safe to use ec2-consistent-snapshot with Galera? I assume it
should be ok for the data itself as it seems to use the same flush/
read lock approach as rsync_sst.

Now the more interesting part: would it be possible to initialize a
new node from such a snapshot? If it has reasonably new data, I would
assume that using IST, it would be able to catch up and allow for easy
recovery of broken volumes, etc. What I worry about are the
Galera-specific files that are also in the MySQL data directory.
Is there a possibility for them to be snapshotted in an inconsistent
or outdated state? Or would it require the snapshot to be taken when
the server is properly shut down? It seems that grastate.dat is only
written on shutdown.

A step further: could this work to initialize additional nodes, or
would adding an exact clone of a node (as the snapshot would also copy
grastate.dat and galera.cache) give problems? I realize that SST is
meant for this purpose, but simply creating a new node from a snapshot
would be easier to do than writing an SST script that utilizes EBS
snapshots.

I could imagine that doing this properly would require adapting the
snapshot script to also store $UUID:$SEQNO at the time of the snapshot
and maybe implementing an SST script that ignores the donor, but works
in a way similar to the rsync SST script on the joiner side.

Thanks,

Michiel

Alex Yurchenko

Feb 21, 2012, 5:07:03 PM
to codersh...@googlegroups.com
Hi Michiel,

You have correctly identified the main problem: we do indeed get a
consistent snapshot, but we don't know which global transaction ID it
is consistent with.

Luckily ec2-consistent-snapshot has a --pre-freeze-cmd option. With
this option you can write a fairly simple script that would query the
server for wsrep_local_state_uuid and wsrep_last_committed status
variables and use them to fake a grastate.dat file. However, remember
to invalidate it (e.g. remove it) in --post-thaw-cmd.
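
A minimal sketch of the file-generation step of such a script, in Python.
It assumes the two status values have already been fetched (for example
with SHOW STATUS LIKE 'wsrep_%') and that no commits happen between
reading them and taking the snapshot. The helper names and the
grastate.dat layout below are assumptions for illustration; compare the
output with a grastate.dat written by a clean shutdown of your own Galera
version before relying on it.

```python
import re

# Galera state UUIDs use the usual 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)


def status_to_dict(rows):
    """Convert (Variable_name, Value) rows from SHOW STATUS into a dict."""
    return {name: value for name, value in rows}


def render_grastate(status):
    """Build grastate.dat text from the two wsrep status variables."""
    uuid = status["wsrep_local_state_uuid"]
    seqno = int(status["wsrep_last_committed"])
    if not UUID_RE.match(uuid):
        raise ValueError(f"unexpected state UUID: {uuid!r}")
    if seqno < 0:
        raise ValueError(f"unexpected seqno: {seqno}")
    # ASSUMPTION: header and field layout; verify against your Galera version.
    return (
        "# GALERA saved state\n"
        "version: 2.1\n"
        f"uuid:    {uuid}\n"
        f"seqno:   {seqno}\n"
    )
```

The pre-freeze script would write this text to grastate.dat in the
datadir, and the post-thaw script would remove the file again.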

And yes, from this snapshot you can start as many new nodes as you wish
(just make sure you clone the volume). The trick is to have gcache.size
big enough.
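
As a rough illustration of what "big enough" means: a node started from
the snapshot can only catch up via IST if a donor still holds every write
set committed after the snapshot. The sketch below is a back-of-the-envelope
check, not part of Galera. The function names are hypothetical, the sizing
rule is a simple heuristic, and it assumes the donor exposes the
wsrep_local_cached_downto status variable, which may not be available in
every Galera release.

```python
def ist_possible(snapshot_seqno, donor_cached_downto, donor_last_committed):
    """True if the donor's gcache still covers everything after the snapshot."""
    if snapshot_seqno >= donor_last_committed:
        return True  # snapshot is already current; nothing to replay
    # The joiner needs write sets snapshot_seqno+1 .. donor_last_committed.
    return donor_cached_downto <= snapshot_seqno + 1


def min_gcache_bytes(write_set_bytes_per_sec, max_snapshot_age_sec, safety_factor=2.0):
    """Heuristic lower bound for gcache.size (an assumption, not a Galera rule)."""
    return int(write_set_bytes_per_sec * max_snapshot_age_sec * safety_factor)
```

For example, with roughly 1 MB/s of replicated writes and snapshots that
may be up to an hour old, min_gcache_bytes(1_000_000, 3600) suggests a
gcache of about 7.2 GB.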

Speaking of gcache: by default it is placed in the MySQL datadir, i.e.
on the snapshotted volume. Since it is not reusable, you may want to put
it elsewhere (e.g. on local storage) for the following reasons:

1) I have not tested it yet, but it is possible that the more data there
is on the volume, the longer the snapshot takes. I may be wrong about
this, and the volume may simply be copied bit for bit, but I hope it is
smarter than that.
2) gcache files are constantly written to, and this can adversely
affect how long the snapshot takes and how fast the filesystem runs out
of snapshot buffer.
3) Spreading I/O across devices generally improves performance.

The gcache.dir option controls where gcache is placed.
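
For illustration, a small Python helper that assembles the corresponding
wsrep_provider_options value. The directory path and the size are
made-up examples, and the helper itself is hypothetical; only the
gcache.dir and gcache.size option names come from the discussion above.

```python
def provider_options(options):
    """Join Galera provider options into the 'key=value; key=value' form."""
    return "; ".join(f"{key}={value}" for key, value in options.items())


# Hypothetical values: gcache on local (non-snapshotted) storage, 2 GB in size.
opts = provider_options({
    "gcache.dir": "/mnt/ephemeral/galera",
    "gcache.size": "2G",
})
print(f'wsrep_provider_options="{opts}"')
```

The printed line is what would go into the [mysqld] section of my.cnf.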

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011
