BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg

boh...@visionlink.org

unread,

Apr 5, 2013, 3:27:29 PM4/5/13

to

The following bug has been logged on the website:

Bug reference: 8043
Logged by: Jeff Bohmer
Email address: boh...@visionlink.org
PostgreSQL version: 9.2.4
Operating system: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5
Description:

Hi pgsql-bugs,

PG version: 9.2.4, from yum.postgresql.org
OS: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5

Upgrading from 9.2.3 to 9.2.4 has broken archive recovery for me. Probably
related to this 9.2.4 change:

Ensure we do crash recovery before entering archive recovery,
if the database was not stopped cleanly and a recovery.conf
file is present (Heikki Linnakangas, Kyotaro Horiguchi,
Mitsumasa Kondo)

When starting the PostgreSQL 9.2.4 service on a base backup, I get this:

2013-04-05 12:49:04 MDT [10302]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-04-05 10:02:01 MDT
2013-04-05 12:49:04 MDT [10302]: [2-1] user=,db= LOG: starting archive
recovery
2013-04-05 12:49:04 MDT [10302]: [3-1] user=,db= LOG: could not open file
"pg_xlog/000000010000002F0000002D" (log file 47, segment 45): No such file
or directory
2013-04-05 12:49:04 MDT [10302]: [4-1] user=,db= LOG: invalid primary
checkpoint record
2013-04-05 12:49:04 MDT [10302]: [5-1] user=,db= LOG: could not open file
"pg_xlog/000000010000002F0000002C" (log file 47, segment 44): No such file
or directory
2013-04-05 12:49:04 MDT [10302]: [6-1] user=,db= LOG: invalid secondary
checkpoint record
2013-04-05 12:49:04 MDT [10302]: [7-1] user=,db= PANIC: could not locate a
valid checkpoint record
2013-04-05 12:49:04 MDT [10297]: [2-1] user=,db= LOG: startup process (PID
10302) was terminated by signal 6: Aborted
2013-04-05 12:49:04 MDT [10297]: [3-1] user=,db= LOG: aborting startup due
to startup process failure

The WAL file 000000010000002F0000002D does exist in my WAL archive, but not
in pg_xlog. I exclude pg_xlog files when taking the base backup, per the
instructions from
<http://www.postgresql.org/docs/9.2/static/continuous-archiving.html>:

You can, however, omit from the backup dump the files within
the cluster's pg_xlog/ subdirectory. This slight adjustment
is worthwhile because it reduces the risk of mistakes when
restoring.

I use a custom base backup script to call pg_start/stop_backup() and make
the backup with rsync.

The restore_command in recovery.conf is never called by PG 9.2.4 during
startup. I confirmed this by adding a "touch /tmp/restore_command.`date
+%H:%M:%S`" line at the beginning of the shell script I use for my
restore_command. No such files are created when starting PG 9.2.4.

After downgrading back to 9.2.3, archive recovery works using the very same
base backup, recovery.conf file, and restore_command. The log indicates that
PG 9.2.3 begins recovery by pulling WAL files from the archive instead of
pg_xlog:

2013-04-05 13:01:14 MDT [16981]: [1-1] user=,db= LOG: database system was
interrupted; last known up at 2013-04-05 10:02:01 MDT
2013-04-05 13:01:14 MDT [16981]: [2-1] user=,db= LOG: starting archive
recovery
2013-04-05 13:01:14 MDT [16981]: [3-1] user=,db= LOG: restored log file
"000000010000002F0000002D" from archive
2013-04-05 13:01:14 MDT [16981]: [4-1] user=,db= LOG: consistent recovery
state reached at 2F/2D000080
2013-04-05 13:01:14 MDT [16981]: [5-1] user=,db= LOG: redo starts at
2F/2D000080
2013-04-05 13:01:15 MDT [16981]: [6-1] user=,db= LOG: restored log file
"000000010000002F0000002E" from archive
2013-04-05 13:01:15 MDT [16981]: [7-1] user=,db= LOG: restored log file
"000000010000002F0000002F" from archive
<snip>
2013-04-05 13:01:17 MDT [16981]: [25-1] user=,db= LOG: redo done at
2F/3F07B4D0
2013-04-05 13:01:17 MDT [16981]: [26-1] user=,db= LOG: last completed
transaction was at log time 2013-04-05 12:53:01.069826-06
2013-04-05 13:01:17 MDT [16981]: [27-1] user=,db= LOG: restored log file
"000000010000002F0000003F" from archive
2013-04-05 13:01:17 MDT [16981]: [28-1] user=,db= LOG: selected new
timeline ID: 2
2013-04-05 13:01:17 MDT [16981]: [29-1] user=,db= LOG: archive recovery
complete
2013-04-05 13:01:17 MDT [16991]: [1-1] user=,db= LOG: checkpoint starting:
end-of-recovery immediate wait
2013-04-05 13:01:17 MDT [16991]: [2-1] user=,db= LOG: checkpoint complete:
wrote 327 buffers (0.1%); 0 transaction log file(s) ad
ded, 0 removed, 0 recycled; write=0.015 s, sync=0.000 s, total=0.035 s; sync
files=0, longest=0.000 s, average=0.000 s
2013-04-05 13:01:17 MDT [16978]: [2-1] user=,db= LOG: database system is
ready to accept connections

So, the behavior has definitely changed between 9.2.3 and 9.2.4. Is this a
bug in 9.2.4?

Or, must I now include pg_xlog files when taking base backups with 9.2.4,
contrary to the documentation?

Or, is there a way to make PG 9.2.4 use restore_command for recovery, as
9.2.3 does?

Thank you,
- Jeff

--
Sent via pgsql-bugs mailing list (pgsql...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Jeff Janes

unread,

Apr 5, 2013, 6:02:28 PM4/5/13

to

...

I use a custom base backup script to call pg_start/stop_backup() and make
the backup with rsync.

The restore_command in recovery.conf is never called by PG 9.2.4 during
startup. I confirmed this by adding a "touch /tmp/restore_command.`date
+%H:%M:%S`" line at the beginning of the shell script I use for my
restore_command. No such files are created when starting PG 9.2.4.

After downgrading back to 9.2.3, archive recovery works using the very same
base backup, recovery.conf file, and restore_command. The log indicates that
PG 9.2.3 begins recovery by pulling WAL files from the archive instead of
pg_xlog:

I can reproduce the behavior you report only if I remove the "backup_label" file from the restored data directory before I begin recovery. Of course, doing that renders the backup invalid, as without it recovery is very likely to begin from the wrong WAL recovery location.

I think it is appropriate that 9.2.4 refuses to cooperate in those circumstances, and it was a bug that 9.2.3 did allow it.

Do you have a "backup_label" file?

Or, must I now include pg_xlog files when taking base backups with 9.2.4,
contrary to the documentation?

You do not need to include pg_xlog, but you do need to include backup_label. And you always did need to include it--if you were not including it in the past, then you were playing with fire and is only due to luck that your database survived.

Cheers,

Jeff

Heikki Linnakangas

unread,

Apr 6, 2013, 4:24:31 AM4/6/13

to

On 06.04.2013 01:02, Jeff Janes wrote:
> On Fri, Apr 5, 2013 at 12:27 PM,<boh...@visionlink.org> wrote:
>> I use a custom base backup script to call pg_start/stop_backup() and make
>> the backup with rsync.
>>
>> The restore_command in recovery.conf is never called by PG 9.2.4 during
>> startup. I confirmed this by adding a "touch /tmp/restore_command.`date
>> +%H:%M:%S`" line at the beginning of the shell script I use for my
>> restore_command. No such files are created when starting PG 9.2.4.
>>
>> After downgrading back to 9.2.3, archive recovery works using the very same
>> base backup, recovery.conf file, and restore_command. The log indicates
>> that
>> PG 9.2.3 begins recovery by pulling WAL files from the archive instead of
>> pg_xlog:
>
> I can reproduce the behavior you report only if I remove the "backup_label"
> file from the restored data directory before I begin recovery. Of course,
> doing that renders the backup invalid, as without it recovery is very
> likely to begin from the wrong WAL recovery location.

Yeah, if you use pg_start/stop_backup(), there definitely should be a
backup_label present.

But there is a point here, if you use an atomic filesystem snapshot
instead of pg_start/stop_backup(), or just a plain copy of the data
directory while the system is shut down. The problem in that case is
that if pg_xlog is empty, we have no idea how far we need to recover
until the system is consistent. Actually, if the system was shut down,
then the system is consistent immediately and we could allow that, but
the problem still remains for an online backup using an atomic
filesystem snapshot.

I don't think there's much we can do about that case. We could start up
and recover all the WAL from the archive before we declare consistency,
but that gets pretty complicated, and it would still not work if you
tried to do that in a standby that uses streaming replication without a
restore_command.

So, I think what we need to do is to update the documentation to make it
clear that you must not zap pg_xlog if you take a backup without
pg_start/stop_backup(). The documentation that talks about filesystem
snapshots and offline backups doesn't actually say that you can zap
pg_xlog - that is only mentioned in the section on
pg_start/stop_backup(). But perhaps that could be made more explicit.

>> Or, must I now include pg_xlog files when taking base backups with 9.2.4,
>> contrary to the documentation?
>
> You do not need to include pg_xlog, but you do need to include
> backup_label. And you always did need to include it--if you were not
> including it in the past, then you were playing with fire and is only due
> to luck that your database survived.

Incidentally, I bumped into another custom backup script just a few
weeks back that also excluded backup_label. I don't know what the author
was thinking when he wrote that, but it seems to be a surprisingly
common mistake. Maybe it's the "label" in the filename that makes people
think it's not important. Perhaps we should improve the documentation to
make it more explicit that backup_label must be included in the backup.
The docs already say that, though, so I suspect that people making this
mistake have not read the docs very carefully anyway.

Perhaps a comment in the beginning of backup_label would help:

# NOTE: This file MUST be included in the backup. Otherwise, the backup
# is inconsistent, and restoring it may result in a corrupt database.

Jeff B., assuming that you excluded backup_label from the backup for
some reason, do you have any thoughts on what would've helped you to
avoid that mistake? Would a comment like above have helped - did you
look inside backup_label at any point?

- Heikki

Jeff Bohmer

unread,

Apr 8, 2013, 11:58:37 AM4/8/13

to

On Apr 6, 2013, at 1:24 PM, Jeff Janes <jeff....@gmail.com> wrote:

> On Sat, Apr 6, 2013 at 1:24 AM, Heikki Linnakangas
> <hlinna...@vmware.com>wrote:

>
>>
>> Incidentally, I bumped into another custom backup script just a few weeks
>> back that also excluded backup_label. I don't know what the author was
>> thinking when he wrote that, but it seems to be a surprisingly common
>> mistake. Maybe it's the "label" in the filename that makes people think
>> it's not important.
>
>
>

> I think part of it is the name "label', and part of it is that this file is
> similar to and hence easily confused with the .history files, which (as far
> as I know) truly are there only for human information and not for system
> operation.

While the backup_label file was included in all base backups by the custom backup script, it was not present in the cluster directory when starting PG 9.2.4. Because the data directory was synced from the base backup without backup_label. Sure enough, including backup_label in the data directory fixes this for me.

I think these custom scripts were written around 8.1. That the base backups have worked all this time (each base backup is tested) is probably because the backup script, immediately after calling pg_start_backup(), rsyncs pg_control before rsync'ing everything else in the data directory. Although, it seems this quirk allows for problems if WAL files are generated between the time pg_start_backup() is called and pg_control is rsync'd.

>> Perhaps we should improve the documentation to make it more explicit that
>> backup_label must be included in the backup. The docs already say that,
>> though, so I suspect that people making this mistake have not read the docs
>> very carefully anyway.
>>
>
>

> I don't think the docs are very clear on that. They say "This file will of
> course be archived as a part of your backup dump file", but "will be" does
> not imply "must be". Elsewhere it emphasizes that the label you gave to
> pg_start_backup is written into the file, but doesn't really say what the
> file itself is there for. To me it seems to imply that the file is there
> for your convenience, to hold that label, and not as a critical part of the
> system.
>
> Patch attached, which I hope can be back-patched. I'll also add it to
> commitfest-Next.
>
> Cheers,
>
> Jeff
> <backup_label_warning_v1.patch>

I think this documentation update would be helpful.

Thank you for your help,
- Jeff
--
Jeff Bohmer | Corporate Technology Manager | VisionLink, Inc.
First National Center | 3101 Iris Avenue, Suite 240 | Boulder CO, 80301

Office 303.402.0170 x121

Other ways to stay in touch - Blog | Twitter | Facebook | LinkedIn | Web

This message and any attachments may contain information that is privileged,
confidential or exempt from disclosure under applicable law or agreement. If you
have received this message in error, please reply and delete the message and
any attachments without opening the attachment. Thank you.

Jeff Bohmer

unread,

Apr 8, 2013, 12:06:13 PM4/8/13

to

On Apr 6, 2013, at 2:24 AM, Heikki Linnakangas <hlinna...@vmware.com> wrote:

> Incidentally, I bumped into another custom backup script just a few weeks back that also excluded backup_label. I don't know what the author was thinking when he wrote that, but it seems to be a surprisingly common mistake. Maybe it's the "label" in the filename that makes people think it's not important. Perhaps we should improve the documentation to make it more explicit that backup_label must be included in the backup. The docs already say that, though, so I suspect that people making this mistake have not read the docs very carefully anyway.
>
> Perhaps a comment in the beginning of backup_label would help:
>
> # NOTE: This file MUST be included in the backup. Otherwise, the backup
> # is inconsistent, and restoring it may result in a corrupt database.
>
> Jeff B., assuming that you excluded backup_label from the backup for some reason, do you have any thoughts on what would've helped you to avoid that mistake? Would a comment like above have helped - did you look inside backup_label at any point?

I think such a comment could be helpful. The base backups did contain backup_label. That is how another script would determine what WAL files to remove. It looked at the oldest base backup's backup_label file to get the oldest WAL file to keep.

My problem was that backup_label was excluded from the data directory when it was sync'd from the base backup. (We keep base backups separate from data directories, so they can be reused.) I guess it was excluded because it was assumed to not be used by archive recovery. So, clarifying backup_label's importance there seems helpful.

Thank you for your help,
- Jeff
--
Jeff Bohmer | Corporate Technology Manager | VisionLink, Inc.
First National Center | 3101 Iris Avenue, Suite 240 | Boulder CO, 80301

Office 303.402.0170 x121

Other ways to stay in touch - Blog | Twitter | Facebook | LinkedIn | Web

This message and any attachments may contain information that is privileged,
confidential or exempt from disclosure under applicable law or agreement. If you
have received this message in error, please reply and delete the message and
any attachments without opening the attachment. Thank you.

Christoph Berg

unread,

Apr 9, 2013, 7:59:25 AM4/9/13

to

Re: Jeff Janes 2013-04-06 <CAMkU=1zvioQvMgbWssHv3DnXMNe...@mail.gmail.com>

> I don't think the docs are very clear on that. They say "This file will of
> course be archived as a part of your backup dump file", but "will be" does
> not imply "must be". Elsewhere it emphasizes that the label you gave to
> pg_start_backup is written into the file, but doesn't really say what the
> file itself is there for. To me it seems to imply that the file is there
> for your convenience, to hold that label, and not as a critical part of the
> system.

This was also my impression, tbh :-/. And the problem just popped up
again independently on #postgresql.

> Patch attached, which I hope can be back-patched. I'll also add it to
> commitfest-Next.

+1. It makes the documentation clearer on this.

Christoph
--
c...@df7cb.de | http://www.df7cb.de/

BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog

boh...@visionlink.org

Jeff Janes

Heikki Linnakangas

Jeff Bohmer

Jeff Bohmer

Christoph Berg