Need some help getting WAL-E set up

epic...@gmail.com

Jun 15, 2013, 8:12:13 PM
to wa...@googlegroups.com
These are my notes so far on how I set up WAL-E: https://epicserve-docs.readthedocs.org/en/latest/sys_admin/postgres_backup_with_wal-e.html.

However, when I run the base backup, it just keeps printing the following ...

NOTICE:  pg_stop_backup cleanup done, waiting for required WAL segments to be archived
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (120 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (240 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (480 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.

Here is the full output with paths anonymized: https://gist.github.com/epicserve/79f5eb758c34206117ad.

--Brent

Cody Caughlan

Jun 15, 2013, 8:48:03 PM
to epic...@gmail.com, wa...@googlegroups.com
Try using absolute paths in your archive_command:

archive_command = '/usr/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-push %p'
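A small sketch of why absolute paths can matter here: PostgreSQL runs archive_command from the server process, whose PATH may not match your login shell's (for example, /usr/local/bin may be missing), and envdir also resolves the program it launches through PATH. The Python 3 helper below (`relative_programs` is a hypothetical name, not part of WAL-E) flags program names in an archive_command string that are not absolute. It assumes the `envdir DIR PROGRAM ...` shape used above and does not check that the files exist or are executable.

```python
import os
import shlex


def relative_programs(archive_command):
    """Return program tokens in archive_command that are not absolute paths.

    Assumes an optional `envdir DIR` wrapper followed by the real program.
    Only argv[0] and, when argv[0] is envdir, the program it launches
    (argv[2]) are checked.
    """
    argv = shlex.split(archive_command)
    if not argv:
        return []
    programs = [argv[0]]
    if os.path.basename(argv[0]) == "envdir" and len(argv) > 2:
        # envdir DIR CHILD ...: CHILD is looked up on PATH as well
        programs.append(argv[2])
    return [p for p in programs if not os.path.isabs(p)]


if __name__ == "__main__":
    good = "/usr/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-push %p"
    bad = "envdir /etc/wal-e.d/env wal-e wal-push %p"
    print(relative_programs(good))  # []
    print(relative_programs(bad))   # ['envdir', 'wal-e']
```

An empty result only means the paths are absolute; the user running the server still needs permission to execute those files.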

epic...@gmail.com

Jun 15, 2013, 9:16:20 PM
to wa...@googlegroups.com, epic...@gmail.com
No change. I'm running the base backup manually from the command line with ...

$ sudo envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-push /var/lib/postgresql/9.1/main

If I try ...

$ sudo -u postgres bash -c "envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-push /var/lib/postgresql/9.1/main"

I get ...

[sudo] password for oconnor: 
Traceback (most recent call last):
  File "/usr/local/bin/wal-e", line 9, in <module>
    load_entry_point('wal-e==0.6.2', 'console_scripts', 'wal-e')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 337, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2278, in load_entry_point
    raise ImportError("Entry point %r not found" % ((group,name),))
ImportError: Entry point ('console_scripts', 'wal-e') not found

Daniel Farina

Jun 15, 2013, 9:20:04 PM
to epic...@gmail.com, wa...@googlegroups.com
Your base backup invocation looks fine. It looks like your
archiving is either behind (in which case you just need to wait) or
not working (which is scary, because eventually it'll backlog and
overflow, crashing the server).

You can use this query (as superuser) to monitor how many segments are
'ready' for archiving but have not yet been sent (segments already sent are marked 'done'):

SELECT suffix, count(*)
  FROM (
    SELECT (regexp_matches(name, E'\[0-9A-F]+\.([^\.]*)$'))[1] AS suffix
      FROM pg_ls_dir('pg_xlog/archive_status') name
  ) AS matches
  GROUP BY suffix;
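The same 'ready' versus 'done' count can also be taken straight from the filesystem, which may be handier in a cron job or monitoring check run as the postgres user. A minimal Python 3 sketch, assuming the 9.1 layout used in this thread (so the directory would be something like /var/lib/postgresql/9.1/main/pg_xlog/archive_status). The function names and the max_ready threshold of 10 are illustrative choices, not WAL-E settings.

```python
import os
import re
from collections import Counter

# Mirrors the regular expression in the query above: a hex WAL file name,
# a dot, then a dot-free suffix such as "ready" or "done".
STATUS_RE = re.compile(r"^[0-9A-F]+\.([^.]*)$")


def archive_status_counts(status_dir):
    """Count archive_status entries by suffix ('ready' vs 'done')."""
    counts = Counter()
    for name in os.listdir(status_dir):
        match = STATUS_RE.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts


def ready_backlog_ok(status_dir, max_ready=10):
    """True if no more than max_ready segments are waiting to be archived."""
    # A Counter returns 0 for a missing key, so "no .ready files" passes.
    return archive_status_counts(status_dir)["ready"] <= max_ready
```

A ready count that keeps growing means archive_command is failing or falling behind, which is the situation behind the pg_stop_backup warnings in the first post.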

epic...@gmail.com

Jun 15, 2013, 10:06:55 PM
to wa...@googlegroups.com, epic...@gmail.com
I think I may have fixed it ... I had some permission problems with my python packages. I fixed those and then ran the backup command using the following.

$ sudo -u postgres bash -c "envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-push /var/lib/postgresql/9.1/main"
wal_e.operator.s3_operator INFO     MSG: start upload postgres version metadata
        DETAIL: Uploading to s3://my-bucket/wal-e/my-server/basebackups_005/base_000000010000000400000087_00000032/extended_version.txt.
        STRUCTURED: time=2013-06-16T01:38:47.330910-00 pid=7297
...

NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
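The thread does not say which permissions were wrong. One plausible cause of the earlier ImportError under `sudo -u postgres` (an assumption, not confirmed above) is that package files installed as root were not readable by other users, for example because of a restrictive umask at install time. A minimal Python 3 sketch that lists such files and directories; `not_world_readable` is a hypothetical helper, and the directory to scan (likely somewhere like /usr/local/lib/python2.7/dist-packages on this system) is a guess. It only checks the "other" permission bits, not group membership or ACLs.

```python
import os
import stat


def _mode(path):
    """Return st_mode for path, or None if it cannot be stat'ed."""
    try:
        return os.stat(path).st_mode
    except OSError:
        return None  # broken symlink or unreadable entry; skip it


def not_world_readable(root):
    """List paths under root that 'other' users cannot read.

    Files need the o+r bit; directories need o+r and o+x so another
    user (such as postgres) can list and enter them.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            mode = _mode(path)
            if mode is not None and not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = _mode(path)
            if mode is not None and not mode & stat.S_IROTH:
                problems.append(path)
    return sorted(problems)
```

Anything it reports could then be opened up with chmod (for example `chmod -R o+rX` on the affected paths), after checking that widening read access is acceptable.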

I ran the following SQL command ...

template1=# SELECT suffix, count(*)
  FROM (
    SELECT (regexp_matches(name, E'\[0-9A-F]+\.([^\.]*)$'))[1] AS suffix
      FROM pg_ls_dir('pg_xlog/archive_status')
    name
  ) AS matches
  GROUP BY suffix;
 suffix | count 
--------+-------
 done   |    11
(1 row)

So how do I know if I'm good to go now for sure?

Daniel Farina

Jun 15, 2013, 10:19:45 PM
to epic...@gmail.com, wa...@googlegroups.com
Run "wal-e backup-list" if you think your backup completed: it is
intended to only list complete backups.

epic...@gmail.com

Jun 15, 2013, 10:31:29 PM
to wa...@googlegroups.com, epic...@gmail.com
That shows ...

$ sudo envdir /etc/wal-e.d/env wal-e backup-list
name last_modified expanded_size_bytes wal_segment_backup_start wal_segment_offset_backup_start wal_segment_backup_stop wal_segment_offset_backup_stop
base_000000010000000400000087_00000032 2013-06-16T01:56:37.000Z 000000010000000400000087 00000032

But what I mean is: how do you verify that it keeps working? Shouldn't I audit things every once in a while? I'm seeing a new file (e.g. 0000000100000004000000B4.lzo) created in the wal_005 directory every minute.
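One way to audit the archive itself is to check that the uploaded segment names form an unbroken sequence. WAL segment names are 24 hex digits: 8 for the timeline, 8 for the log id and 8 for the segment within that log. A minimal Python 3 sketch, assuming you already have the list of file names under wal_005 (however you list the bucket), that plain segments end in .lzo as in the example above, and a single timeline. With the default 16 MB segment size, PostgreSQL 9.2 and earlier never use segment FF of a log id, hence `segs_per_log=255`; pass 256 for 9.3 and later. `missing_segments` is a hypothetical name.

```python
import re

# Plain WAL segment as stored under wal_005/, e.g. 0000000100000004000000B4.lzo:
# timeline, log id and segment, 8 hex digits each.
SEGMENT_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})\.lzo$")


def missing_segments(names, segs_per_log=255):
    """Return segment names missing between the lowest and highest seen.

    segs_per_log=255 assumes PostgreSQL 9.2 or earlier with 16 MB
    segments (segment ...FF of each log id is never used); use 256
    for 9.3 and later. Only a single timeline is handled.
    """
    seen = set()
    timeline = None
    for name in names:
        m = SEGMENT_RE.match(name.rsplit("/", 1)[-1])  # accept full S3 keys
        if not m:
            continue  # skip .backup history files and anything else
        tli, log, seg = (int(g, 16) for g in m.groups())
        if timeline is None:
            timeline = tli
        elif tli != timeline:
            raise ValueError("multiple timelines are not handled")
        seen.add(log * segs_per_log + seg)
    if not seen:
        return []
    gaps = []
    for n in range(min(seen), max(seen) + 1):
        if n not in seen:
            log, seg = divmod(n, segs_per_log)
            gaps.append("%08X%08X%08X" % (timeline, log, seg))
    return gaps
```

A gap older than the oldest base backup you still rely on is harmless; a gap after it would stop point-in-time recovery at that point.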

Daniel Farina

Jun 15, 2013, 10:48:08 PM
to epic...@gmail.com, wa...@googlegroups.com
To check that archiving is continuing unimpeded, I currently monitor
the number of 'ready' segments and make sure it stays small.

I also scan the backup-list data to make sure that a new backup is
taken frequently enough. My recent spate of work on reducing wal-e
memory use for certain kinds of workloads was instigated by this
monitoring.
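A sketch of the backup-list freshness check described above, in Python 3. It assumes the output layout shown earlier in the thread: a header line, then rows whose first two whitespace-separated fields are the backup name and a last_modified timestamp such as 2013-06-16T01:56:37.000Z. The function names and the one-day default for max_age are illustrative choices, not WAL-E options.

```python
from datetime import datetime, timedelta, timezone


def newest_backup_age(backup_list_output, now=None):
    """Return (name, age) of the most recent backup, or None if there is none."""
    if now is None:
        now = datetime.now(timezone.utc)
    newest = None
    for line in backup_list_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 2:
            continue
        modified = datetime.strptime(fields[1], "%Y-%m-%dT%H:%M:%S.%fZ")
        modified = modified.replace(tzinfo=timezone.utc)
        if newest is None or modified > newest[1]:
            newest = (fields[0], modified)
    if newest is None:
        return None
    return newest[0], now - newest[1]


def backup_is_fresh(backup_list_output, max_age=timedelta(days=1), now=None):
    """True if the newest listed backup is no older than max_age."""
    result = newest_backup_age(backup_list_output, now)
    return result is not None and result[1] <= max_age
```

The text could come from the same `envdir /etc/wal-e.d/env wal-e backup-list` command used above, captured by cron or a monitoring agent.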

You can also test restoration. I like to run pg_dumpall > /dev/null after
performing the restore to at least touch all the relation heaps and form their
tuples (indexes are not verified that way). If the server crashes or
reports an error suggesting a media defect, then one is most likely dealing
with corruption, which in theory could be introduced by wal-e (check the
primary too in that case; to date there are no reported cases of wal-e
mangling a database).
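The `pg_dumpall > /dev/null` check can be scripted so a scheduled restore test reports success or failure. A minimal Python 3 sketch using subprocess; `dump_reads_cleanly` is a hypothetical name, and the command should point at the restored server rather than the primary. As noted above, this reads every table's heap but does not verify indexes.

```python
import subprocess


def dump_reads_cleanly(cmd=("pg_dumpall",)):
    """Run a dump command, discard its output, and report success.

    Mirrors `pg_dumpall > /dev/null`: stdout is thrown away, stderr is
    kept so any error the server reports can be inspected.
    """
    proc = subprocess.run(
        list(cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stderr
```

For example, if the restored copy listened on a hypothetical port 5433, `dump_reads_cleanly(["pg_dumpall", "-p", "5433", "-U", "postgres"])` would return `(True, "")` on a clean read.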

raana...@gmail.com

Aug 28, 2013, 8:28:33 AM
to wa...@googlegroups.com, epic...@gmail.com
Hi,
when you say "I had some permission problems with my python packages",
what exactly do you mean? I have the exact same issue.

Thanks!