WAL-e chef cookbook

Jesse House

Jun 2, 2013, 7:51:56 PM
to wa...@googlegroups.com

WAL-e chef cookbook

I have been working on a WAL-e chef cookbook off and on for the last few weeks, and it is finally ready for some review from the WAL-e experts.

Disclaimer: I have not used this cookbook in production yet; in fact, I have not used wal-e in production yet either, but I plan to soon.

So far it has been tested on Ubuntu 12.04 on Vagrant boxes.

There are two recipes so far:

  • master
    • builds a postgres box with all wal-e dependencies
    • configures postgres with wal_level = archive and wal-e wal-push as the archive_command
    • sets up the S3 settings in the environment
    • creates the initial base backup with wal-e backup-push
    • runs a nightly cron job that pushes a new base backup once a day with wal-e backup-push
  • recover
    • builds a postgres box with the wal-e dependencies
    • sets up the S3 settings in the environment
    • stops postgres
    • deletes a bunch of the postgres data directories
    • pulls down the base backup with wal-e backup-fetch
    • creates a recovery.conf whose restore_command uses wal-e wal-fetch
    • starts postgres back up
    • after that, verifying that all data has been restored and that the server is up and running is a manual process; at that point a configuration change is needed to start writing new WAL files to S3 again
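As a rough illustration of what the two recipes configure, here is a minimal Python sketch that renders the relevant postgres settings. It assumes the S3 settings live in an envdir at the hypothetical path `/etc/wal-e.d/env` and that `envdir` and `wal-e` are on the PATH; `master_settings` and `recovery_conf` are illustrative names, not part of the cookbook.

```python
"""Sketch of the config fragments the master and recover recipes write.

Assumption (not taken from the cookbook): the S3 settings live in an
envdir at ENVDIR, and the `envdir` and `wal-e` binaries are on PATH.
"""

ENVDIR = "/etc/wal-e.d/env"  # hypothetical location of the S3 settings


def master_settings(envdir: str = ENVDIR) -> dict:
    """postgresql.conf settings for archiving WAL with wal-e wal-push."""
    return {
        "wal_level": "archive",
        "archive_mode": "on",  # archive_command only runs when this is on
        # postgres replaces %p with the path of the segment to archive
        "archive_command": f"'envdir {envdir} wal-e wal-push %p'",
    }


def recovery_conf(envdir: str = ENVDIR) -> str:
    """recovery.conf body that replays archived WAL with wal-e wal-fetch."""
    # %f is the name of the segment postgres wants, %p where to put it
    return f"restore_command = 'envdir {envdir} wal-e wal-fetch \"%f\" \"%p\"'\n"


if __name__ == "__main__":
    for key, value in master_settings().items():
        print(f"{key} = {value}")
    print(recovery_conf(), end="")
```

Keeping the credentials in an envdir rather than in the config files means the archive and restore commands can be identical on every box; only the envdir contents differ.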

I plan to add a hot standby recipe at some point, but I'm not 100% sure of the details for it yet.

All feedback is appreciated, especially if you see places the recipes can be improved

Thanks,
- Jesse

Daniel Farina

Jun 2, 2013, 8:37:42 PM
to jesse...@gmail.com, wal-e
On Sun, Jun 2, 2013 at 4:51 PM, Jesse House <jesse...@gmail.com> wrote:
> WAL-e chef cookbook
>
> I have been working on a WAL-e chef cookbook off and on the last few weeks,
> it is finally ready for some review from the WAL-e experts

Very cool.
I have one warning. I think you might be taking care of it here, but
it isn't entirely clear. It hasn't been written down in full many
times, so bear with me:

When converting a standby into a (writable) primary, it is somewhat
dangerous to archive back into the same prefix that one originates
from. Consider the following topology:

primary1 (writing to prefix1 on timeline1)
standby1 (reading from prefix1)
standby2 (reading from prefix1)

At some point, one decides to promote standby1, now called primary2:

primary1 (writing to prefix1 on timeline1)
standby2 (reading from prefix1)
primary2 (writing to prefix1, on timeline2)

Note how primary2 is on another timeline: a timeline is an integer
that postgres increments when a server is told to diverge from its
primary.
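Part of why the timeline matters is that it is baked into every WAL segment file name: 8 hex digits of timeline, then 8 for the log ID and 8 for the segment number. A small Python sketch (the function name is hypothetical) shows that two servers on timeline 2 would produce identical names for the same position, while timeline 1 stays distinct:

```python
def wal_segment_name(timeline: int, log: int, seg: int) -> str:
    """WAL segment file name: timeline, log ID, segment, 8 uppercase hex digits each."""
    return f"{timeline:08X}{log:08X}{seg:08X}"


if __name__ == "__main__":
    # primary1 keeps writing on timeline 1
    print(wal_segment_name(1, 0, 5))  # 000000010000000000000005
    # any promoted server that lands on timeline 2 uses the same name here
    print(wal_segment_name(2, 0, 5))  # 000000020000000000000005
```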

This is sort of okay, but precarious, because the next step, promoting
standby2, introduces corruption:

primary1 (writing to prefix1 on timeline1)
primary2 (writing to prefix1 on timeline2)
primary3 (writing to prefix1 on timeline2)

Now we can see that the two latter primaries are on the same timeline
and in the same prefix, which can result in WAL overwriting and hence
corruption. An ugly solution to this, which I use, is to give every
standby a separate read and write prefix. The same situation would
then evolve as follows:

primary1 (writing to prefix1 on timeline1)
standby1 (reading from prefix1)
standby2 (reading from prefix1)

primary1 (writing to prefix1 on timeline1)
standby2 (reading from prefix1)
primary2 (writing to prefix2, on timeline2)

primary1 (writing to prefix1 on timeline1)
primary2 (writing to prefix2 on timeline2)
primary3 (writing to prefix3 on timeline2)
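The hazard in the two final topologies can be checked mechanically: any two writers that share both a prefix and a timeline can overwrite each other's WAL. A minimal Python sketch, modelling each primary as a (name, prefix, timeline) tuple copied from the topologies above; `Writer` and `conflicts` are hypothetical names, and nothing here talks to S3:

```python
from collections import defaultdict
from typing import NamedTuple


class Writer(NamedTuple):
    name: str
    prefix: str
    timeline: int


def conflicts(writers):
    """Return writers grouped by (prefix, timeline) wherever more than one shares a slot."""
    by_slot = defaultdict(list)
    for w in writers:
        by_slot[(w.prefix, w.timeline)].append(w.name)
    return {slot: names for slot, names in by_slot.items() if len(names) > 1}


shared_prefix = [
    Writer("primary1", "prefix1", 1),
    Writer("primary2", "prefix1", 2),
    Writer("primary3", "prefix1", 2),
]
separate_prefixes = [
    Writer("primary1", "prefix1", 1),
    Writer("primary2", "prefix2", 2),
    Writer("primary3", "prefix3", 2),
]

print(conflicts(shared_prefix))      # {('prefix1', 2): ['primary2', 'primary3']}
print(conflicts(separate_prefixes))  # {}
```

Note that primary2 and primary3 are still both on timeline 2 in the second layout; it is only the separate prefixes that keep them apart.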

The ease of making this mistake is one of the worst things about
wal-e's design to date (it doesn't occur if wal-e is only used in
simple cases, which is what it was originally used for). But nobody
has made a proposal that seems easy to use, is general (particularly
with regard to the treatment of credentials), and sands off this
rough edge, and my interest is merely a moral one since I've worked
around it in production. So, until then, I think the 'come out of
hot standby' step needs as input a wal-e prefix that is intended to
be the new spot to write things to, and if possible it'd be nice to
check that it isn't the same as the read URL.
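A minimal sketch of that recheck, assuming the promotion step receives the read and write prefixes as plain S3 URLs and hands the write prefix to WAL-E through the `WALE_S3_PREFIX` environment variable. `promotion_env` is a hypothetical helper, and it only catches an exact match (ignoring a trailing slash), not every way two prefixes could overlap:

```python
def _normalize(prefix: str) -> str:
    # Treat "s3://bucket/path" and "s3://bucket/path/" as the same location.
    return prefix.rstrip("/")


def promotion_env(read_prefix: str, write_prefix: str) -> dict:
    """Environment for archiving after promotion; refuses to reuse the read prefix."""
    if _normalize(read_prefix) == _normalize(write_prefix):
        raise ValueError(
            f"write prefix {write_prefix!r} is the same as the read prefix; "
            "pick a new prefix for the promoted server"
        )
    return {"WALE_S3_PREFIX": write_prefix}


if __name__ == "__main__":
    print(promotion_env("s3://example-bucket/prefix1", "s3://example-bucket/prefix2"))
    try:
        promotion_env("s3://example-bucket/prefix1", "s3://example-bucket/prefix1/")
    except ValueError as exc:
        print("refused:", exc)
```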

va...@spivak.net

Jan 26, 2014, 3:29:56 AM
to wa...@googlegroups.com, jesse...@gmail.com

Daniel Farina

Jan 26, 2014, 4:15:03 AM
to va...@spivak.net, wal-e, Jesse House
On Sun, Jan 26, 2014 at 12:29 AM, <va...@spivak.net> wrote:
> Is the WAL-overwriting still an issue for 9.3?
>
> http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-timeline-switch-of-slave-node-without-archives/

Yes. Timeline switching over streaming is neat in that it makes the
switch faster and doesn't require falling back on archives (many
people don't have any, although anyone using WAL-E would...), but
doesn't change the semantics of the problem.