WAL segments are not being archived anymore


rjoh...@industrynewsletters.com

Sep 27, 2016, 5:42:26 PM
to wal-e
I've had WAL-E set up and running fine for months. It runs a base backup every Friday night and has been archiving to Rackspace files without any issues.

Today I received a file system warning saying the box was running low on disk space, even though we normally average about 120GB free on that box. After investigating, I found that the pg_xlog directory had over 5000 files sitting in it.

This looks like it started sometime this past Saturday, immediately after the last base backup.

I did some troubleshooting and found an older thread mentioning a similar issue, so I manually ran the wal-push command, and it was able to process all of the backlog. But now, no matter what I try, the archive command doesn't process any of the files in the xlog dir, and the archive_status folder just keeps filling up with .ready files.

I've checked the pg logs as well as the syslog, and I see no errors at all from WAL-E.

Any thoughts on why archiving would just stop? I know the process has to be running, because when I manually ran the wal-push command it would convert the .ready files to .done, and the process itself would remove those files in groups as it ran on its own. However, it just won't ship any more segments.

Any thoughts on where to look for more issues?

Thanks in advance!

Daniel Farina

Sep 27, 2016, 7:57:35 PM
to rjoh...@industrynewsletters.com, wal-e
On Tue, Sep 27, 2016 at 2:42 PM <rjoh...@industrynewsletters.com> wrote:
> I've had WAL-E set up and running fine for months. It runs a base backup every Friday night and has been archiving to Rackspace files without any issues.
>
> Today I received a file system warning saying the box was running low on disk space, even though we normally average about 120GB free on that box. After investigating, I found that the pg_xlog directory had over 5000 files sitting in it.
>
> This looks like it started sometime this past Saturday, immediately after the last base backup.

The last base backup is probably coincidental...

> I did some troubleshooting and found an older thread mentioning a similar issue, so I manually ran the wal-push command, and it was able to process all of the backlog. But now, no matter what I try, the archive command doesn't process any of the files in the xlog dir, and the archive_status folder just keeps filling up with .ready files.
>
> I've checked the pg logs as well as the syslog, and I see no errors at all from WAL-E.

This is very suspicious, because WAL-E prints a message when it starts up, before it has done anything interesting. I'd look around very carefully to make sure it has started correctly: check your archive_command closely, make sure $PATH is the same as Postgres's, and things like that...
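
One way to act on the $PATH suggestion is to resolve the command name against the PATH that the Postgres server process actually has, rather than the PATH of an interactive shell. The sketch below is only an illustration under assumptions: it relies on Linux's /proc/<pid>/environ file, the helper names are made up for this example, and "wal-e" stands in for whatever executable your archive_command really invokes.

```python
import shutil


def parse_environ(raw):
    """Parse the NUL-separated bytes of a Linux /proc/<pid>/environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def resolve_under_path(command, path_value):
    """Return the executable `command` resolves to under path_value, or None."""
    return shutil.which(command, path=path_value)


# Hypothetical usage (1234 is a placeholder for the postmaster PID; reading
# the file needs the postgres user or root):
#
#   with open("/proc/1234/environ", "rb") as f:
#       env = parse_environ(f.read())
#   print(resolve_under_path("wal-e", env.get("PATH", "")))
```

If this prints None while the same lookup succeeds in your login shell, the archive command is probably failing before WAL-E ever starts, which would be consistent with seeing no WAL-E startup message.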

rjoh...@industrynewsletters.com

Sep 28, 2016, 10:38:23 AM
to wal-e, rjoh...@industrynewsletters.com
Woke up this morning to well over 1000 WAL segments sitting in the xlog dir and archive_status.

Still not seeing ANYTHING in any of the logs at all. I've been through every log under /var/log, including the postgres logs for the last several days, and found nothing.

I did just spot something that I didn't notice yesterday. In the process list I see an entry like this:
00:05:55 postgres: archiver process   archiving 00000002000005F000000027

But when I check the xlog and archive_status directories, there is no file by that name. Could this just be a hung archive process? Is it safe to kill it off?

Thanks again for the help
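
For anyone comparing the archiver's process title against the backlog, a small script can summarize the archive_status directory. This is a rough sketch, not part of WAL-E or Postgres: the function name is invented for illustration, and it assumes the usual convention that a segment waiting on archive_command has a .ready file that is renamed to .done once the command succeeds.

```python
from collections import Counter
from pathlib import Path


def archive_backlog(archive_status_dir):
    """Summarize the .ready/.done status files in an archive_status directory."""
    names = [p.name for p in Path(archive_status_dir).iterdir() if p.is_file()]
    counts = Counter(Path(name).suffix for name in names)
    ready = sorted(name[: -len(".ready")] for name in names if name.endswith(".ready"))
    return {
        "ready": counts[".ready"],   # segments still waiting to be archived
        "done": counts[".done"],     # segments already archived, not yet recycled
        # Lowest-sorting pending name, or None when nothing is waiting.
        "oldest_ready": ready[0] if ready else None,
    }


# Hypothetical usage; the path depends on your installation:
#   print(archive_backlog("/path/to/PGDATA/pg_xlog/archive_status"))
```

If the segment named in the archiver's process title sorts well below "oldest_ready", or has no status file at all, that would fit the theory of an archiver stuck on a command that never returned.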

Daniel Farina

Sep 28, 2016, 10:41:56 AM
to rjoh...@industrynewsletters.com, wal-e
On Wed, Sep 28, 2016 at 7:38 AM <rjoh...@industrynewsletters.com> wrote:
> Woke up this morning to well over 1000 WAL segments sitting in the xlog dir and archive_status.
>
> Still not seeing ANYTHING in any of the logs at all. I've been through every log under /var/log, including the postgres logs for the last several days, and found nothing.
>
> I did just spot something that I didn't notice yesterday. In the process list I see an entry like this:
> 00:05:55 postgres: archiver process   archiving 00000002000005F000000027
>
> But when I check the xlog and archive_status directories, there is no file by that name. Could this just be a hung archive process? Is it safe to kill it off?
>
> Thanks again for the help

Have you checked Postgres's pg_log directory? Maybe stderr is telling you something syslog would not.

rjoh...@industrynewsletters.com

Sep 28, 2016, 10:48:12 AM
to wal-e, rjoh...@industrynewsletters.com
I've been through several days' worth of logs under the /var/log/postgresql directory, and the only entries I see in there are the same ones I see when I try to run a base backup manually:

2016-09-27 20:44:02 UTC WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
2016-09-27 20:44:02 UTC HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.


Daniel Farina

Sep 28, 2016, 12:12:58 PM
to rjoh...@industrynewsletters.com, wal-e
On Wed, Sep 28, 2016, 7:48 AM <rjoh...@industrynewsletters.com> wrote:
> I've been through several days' worth of logs under the /var/log/postgresql directory, and the only entries I see in there are the same ones I see when I try to run a base backup manually:

/var/log is typically targeted by syslog, no? I don't remember Debian's default config that well, I'm afraid.

The thing that is fishy about all of this is that Postgres should also (in addition to WAL-E) be complaining about something, yet you are seeing no evidence of this.

Check very carefully what your log directory is and that the logging collector is enabled:

=> show logging_collector;
 logging_collector 
-------------------
 on
(1 row)

=> show log_directory ;
 log_directory 
---------------
 pg_log
(1 row)

And then check the path $PGDATA/$log_directory. 

The thing is that the logging collector can pick up stderr output, e.g. that emitted by the "sh" that Postgres invokes through "system()". I don't think all such output is forwarded to syslog.
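
As a rough illustration of the "check $PGDATA/$log_directory" step: log_directory may be an absolute path or a path relative to the data directory, so the two SHOW results need to be combined before looking for files. The helper names below are invented for this sketch, and the example paths are placeholders rather than values from this thread.

```python
from pathlib import Path


def resolve_log_dir(data_directory, log_directory):
    """Combine the data directory with log_directory (absolute or relative)."""
    log_dir = Path(log_directory)
    return log_dir if log_dir.is_absolute() else Path(data_directory) / log_dir


def newest_logs(log_dir, limit=3):
    """Return up to `limit` files in log_dir, most recently modified first."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)[:limit]


# Hypothetical usage, with a placeholder data directory:
#   log_dir = resolve_log_dir("/path/to/PGDATA", "pg_log")
#   for path in newest_logs(log_dir):
#       print(path)
```

The newest files in that directory are where stderr from a failing archive_command would be expected to show up, assuming logging_collector is on.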
