missing recovery.conf, postgresql.conf, and pg_subtrans dir on backup-fetch?

Joe Van Dyk

Apr 4, 2014, 4:33:12 PM
to wa...@googlegroups.com
wal-e doesn't seem to make a recovery.conf, a postgresql.conf, and a pg_subtrans directory on a backup-fetch. I had to create each of those before I could restore.

Any reason for that?

Joe

Daniel Farina

Apr 4, 2014, 4:36:40 PM
to Joe Van Dyk, wal-e
On Fri, Apr 4, 2014 at 1:33 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> wal-e doesn't seem to make a recovery.conf, a postgresql.conf, and a
> pg_subtrans directory on a backup-fetch. I had to create each of those
> before I could restore.

Well, it explicitly avoids postgresql.conf, and doesn't make a
recovery.conf. But, pg_subtrans should show up if the primary had it.
Any errors in the logs of the download?

Joe Van Dyk

Apr 4, 2014, 4:42:26 PM
to wa...@googlegroups.com, Joe Van Dyk
On Friday, April 4, 2014 1:36:40 PM UTC-7, Daniel Farina wrote:
> On Fri, Apr 4, 2014 at 1:33 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> > wal-e doesn't seem to make a recovery.conf, a postgresql.conf, and a
> > pg_subtrans directory on a backup-fetch. I had to create each of those
> > before I could restore.
>
> Well, it explicitly avoids postgresql.conf, and doesn't make a
> recovery.conf.

Why?

> But, pg_subtrans should show up if the primary had it.
> Any errors in the logs of the download?

No errors. Primary has it. 

Joe Van Dyk

Apr 4, 2014, 4:45:50 PM
to wa...@googlegroups.com, Joe Van Dyk

Daniel Farina

Apr 4, 2014, 5:32:25 PM
to Joe Van Dyk, wal-e
On Fri, Apr 4, 2014 at 1:42 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> On Friday, April 4, 2014 1:36:40 PM UTC-7, Daniel Farina wrote:
>>
>> On Fri, Apr 4, 2014 at 1:33 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>> > wal-e doesn't seem to make a recovery.conf, a postgresql.conf, and a
>> > pg_subtrans directory on a backup-fetch. I had to create each of those
>> > before I could restore.
>>
>> Well, it explicitly avoids postgresql.conf, and doesn't make a
>> recovery.conf.

The latter it never did make, because there are recovery.conf settings
in there that WAL-E as-is doesn't know anything about, like hot
standby or where the primary_conninfo ought to be or even how to load
configuration for WAL-E. I agree it'd be nice to get to one-less-step
without losing power somehow, but it's a new surface area that to date
WAL-E has never had.

As for avoiding postgresql.conf: my rationale is that it is full of
absolute paths and settings that can be dangerous upon restore.

The paradigm I have been using since the get-go is to always configure
a new postgresql.conf upon cluster set-up, including WAL-E restores.
This perhaps makes a bit more sense in Heroku's case, where one can
create standbys on both bigger and smaller database plan sizes where
the memory settings are totally different all the time. Heroku also
uses unique (file system) paths for every database install, so it
never occurred to me to try to keep postgresql.conf the same looking in
the archives from the beginning, and its presence only represented a
foot-gun.
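To make the "full of absolute paths" concern concrete, here is a minimal sketch that flags the settings in an old postgresql.conf whose values are, or contain, absolute paths, so they can be reviewed before the file is reused on a new host. It is a deliberately simplified parser and an assumption on my part, not anything WAL-E ships: it handles `name = value` and `name value` lines with single-quoted or bare values, skips comment lines, and ignores backslash escapes and `include` semantics. The sample values in the usage below are illustrative only.

```python
import re
from typing import List, Tuple

# Matches "name = value" or "name value"; the value is either a
# single-quoted string (with '' as an escaped quote) or a bare token.
SETTING_RE = re.compile(
    r"^\s*([A-Za-z_][\w.]*)\s*(?:=\s*|\s+)('(?:[^']|'')*'|[^\s#]+)"
)


def absolute_path_settings(conf_text: str) -> List[Tuple[str, str]]:
    """Return (name, value) pairs whose value is or contains an absolute path."""
    flagged = []
    for line in conf_text.splitlines():
        match = SETTING_RE.match(line)
        if not match:
            continue  # blank line, comment, or a form this sketch does not parse
        name, value = match.groups()
        if value.startswith("'"):
            value = value[1:-1].replace("''", "'")
        # A "/" at the start of the value or after whitespace starts an absolute path.
        if re.search(r"(?:^|\s)/", value):
            flagged.append((name, value))
    return flagged
```

For example, a file containing `data_directory = '/var/lib/postgresql/9.3/main'` and `archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'` would have both settings flagged, while `shared_buffers = 128MB` and commented-out lines would not. The commented-out memory settings that differ between plan sizes are a separate hazard this check does not cover.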

But, the other side of this is not lost on me. You can see the
trade-offs in detail in this thread raised some time ago:
http://comments.gmane.org/gmane.comp.db.postgresql.wal-e/239. There's
a workaround in there too.

>> But, pg_subtrans should show up if the primary had it.
>> Any errors in the logs of the download?
>
>
> No errors. Primary has it.

Well that's disturbing. Uh. What's the exact path relative to
$PGDATA that's missing? I think to catch this one we'll need some
instrumentation, and if you can reproduce this then I don't want to
let this go if you have the time.

Joe Van Dyk

Apr 4, 2014, 5:38:36 PM
to wa...@googlegroups.com, Joe Van Dyk


On Friday, April 4, 2014 2:32:25 PM UTC-7, Daniel Farina wrote:
> On Fri, Apr 4, 2014 at 1:42 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>> On Friday, April 4, 2014 1:36:40 PM UTC-7, Daniel Farina wrote:
>>>
>>> On Fri, Apr 4, 2014 at 1:33 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>>> > wal-e doesn't seem to make a recovery.conf, a postgresql.conf, and a
>>> > pg_subtrans directory on a backup-fetch. I had to create each of those
>>> > before I could restore.
>>>
>>> Well, it explicitly avoids postgresql.conf, and doesn't make a
>>> recovery.conf.
>
> The latter it never did make, because there are recovery.conf settings
> in there that WAL-E as-is doesn't know anything about, like hot
> standby or where the primary_conninfo ought to be or even how to load
> configuration for WAL-E. I agree it'd be nice to get to one-less-step
> without losing power somehow, but it's a new surface area that to date
> WAL-E has never had.
>
> As for avoiding postgresql.conf: my rationale is that it is full of
> absolute paths and settings that can be dangerous upon restore.

Maybe make recovery.conf.sample and postgresql.conf.old files?

recovery.conf.sample could use the sample recovery line in the documentation, with some information gathered from how the current wal-e backup-fetch process was started.

And postgresql.conf.old could contain the original master postgresql.conf file.

Then, at the end of backup-fetch, tell users about the two files and indicate that they should restore them or make new ones?
 

> The paradigm I have been using since the get-go is to always configure
> a new postgresql.conf upon cluster set-up, including WAL-E restores.
> This perhaps makes a bit more sense in Heroku's case, where one can
> create standbys on both bigger and smaller database plan sizes where
> the memory settings are totally different all the time. Heroku also
> uses unique (file system) paths for every database install, so it
> never occurred to me to try to keep postgresql.conf the same looking in
> the archives from the beginning, and its presence only represented a
> foot-gun.
>
> But, the other side of this is not lost on me. You can see the
> trade-offs in detail in this thread raised some time ago:
> http://comments.gmane.org/gmane.comp.db.postgresql.wal-e/239. There's
> a workaround in there too.
>
>>> But, pg_subtrans should show up if the primary had it.
>>> Any errors in the logs of the download?
>>
>>
>> No errors. Primary has it.
>
> Well that's disturbing. Uh. What's the exact path relative to
> $PGDATA that's missing? I think to catch this one we'll need some
> instrumentation, and if you can reproduce this then I don't want to
> let this go if you have the time.

./pg_subtrans 

I'll see if it can be reproduced.

Daniel Farina

Apr 4, 2014, 6:33:32 PM
to Joe Van Dyk, wal-e
On Fri, Apr 4, 2014 at 2:38 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> On Friday, April 4, 2014 2:32:25 PM UTC-7, Daniel Farina wrote:
>> The latter it never did make, because there are recovery.conf settings
>> in there that WAL-E as-is doesn't know anything about, like hot
>> standby or where the primary_conninfo ought to be or even how to load
>> configuration for WAL-E. I agree it'd be nice to get to one-less-step
>> without losing power somehow, but it's a new surface area that to date
>> WAL-E has never had.
>>
>>
>> As for avoiding postgresql.conf: my rationale is that it is full of
>> absolute paths and settings that can be dangerous upon restore.
>
>
> Maybe make recovery.conf.sample and postgresql.conf.old files?
>
> recovery.conf.sample could use the sample recovery line in the
> documentation, with some information gathered from how the current wal-e
> backup-fetch process was started.
>
> And postgresql.conf.old could contain the original master postgresql.conf
> file.
>
> Then, at the end of backup-fetch, tell users about two files and indicate
> that they should restore them or make new ones?

That sounds like a kernel of a good idea to me. There's some fine
tuning about the file names (so one can overwrite or, alternatively,
not back up old sample files) after a failover, but I like the
convenience to mechanism to power ratio in general. I'll think on it.
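A minimal sketch of the recovery.conf.sample half of Joe's proposal, assuming the helper is handed the command prefix that was used to invoke WAL-E (for example `envdir /etc/wal-e.d/env wal-e`, the envdir layout used in WAL-E's README). The function names are hypothetical and not part of WAL-E. The generated restore_command follows WAL-E's documented `wal-fetch "%f" "%p"` form; everything WAL-E cannot infer (standby_mode, primary_conninfo, recovery targets) is left commented out for the operator, which is the "power" Daniel does not want to lose.

```python
import os
from pathlib import Path
from typing import Sequence


def render_recovery_conf_sample(wal_e_prefix: Sequence[str]) -> str:
    """Build the text of a recovery.conf.sample (hypothetical helper)."""
    # Arguments are joined with single spaces, so this assumes none of them
    # contain whitespace or shell metacharacters.
    command = " ".join([*wal_e_prefix, "wal-fetch", '"%f"', '"%p"'])
    # recovery.conf values are single-quoted; an embedded quote is doubled.
    command = command.replace("'", "''")
    return "\n".join([
        "# Sample written after backup-fetch; review it, then rename to recovery.conf.",
        f"restore_command = '{command}'",
        "",
        "# Settings WAL-E cannot infer. Uncomment and fill in as needed:",
        "#standby_mode = 'on'",
        "#primary_conninfo = 'host=... port=... user=...'",
        "#recovery_target_time = '...'",
        "",
    ])


def write_recovery_conf_sample(pgdata: str, wal_e_prefix: Sequence[str]) -> Path:
    """Write the sample into the data directory and tell the user about it."""
    target = Path(pgdata) / "recovery.conf.sample"
    target.write_text(render_recovery_conf_sample(wal_e_prefix))
    os.chmod(target, 0o600)  # keep it private, like the rest of PGDATA
    print(f"Wrote {target}; copy it to recovery.conf after reviewing it.")
    return target


if __name__ == "__main__":
    # Example prefix only; substitute however WAL-E is actually invoked.
    print(render_recovery_conf_sample(["envdir", "/etc/wal-e.d/env", "wal-e"]), end="")
```

Writing a `.sample` rather than `recovery.conf` itself means a restore never starts recovery by accident, and the file-name question Daniel raises (overwriting versus not backing up old samples after a failover) reduces to whether `write_recovery_conf_sample` may replace an existing file.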

>> >> But, pg_subtrans should show up if the primary had it.
>> >> Any errors in the logs of the download?
>> >
>> >
>> > No errors. Primary has it.
>>
>> Well that's disturbing. Uh. What's the exact path relative to
>> $PGDATA that's missing? I think to catch this one we'll need some
>> instrumentation, and if you can reproduce this then I don't want to
>> let this go if you have the time.
>
>
> ./pg_subtrans
>
> I'll see if it can be reproduced..

Thanks a lot for the help on that one.

Joe Van Dyk

Apr 4, 2014, 6:41:22 PM
to wa...@googlegroups.com, Joe Van Dyk
wal_e.worker.s3.s3_worker INFO     MSG: beginning partition download
        DETAIL: The partition being downloaded is part_99.tar.lzo.
        HINT: The absolute S3 key is wal-e/9.3/basebackups_005/base_0000000100000657000000A4_00678264/tar_partitions/part_99.tar.lzo.
        STRUCTURED: time=2014-04-04T21:58:18.805505-00 pid=12123

  [mon...@db.tanga.com:/disk/scratch]  $ ls -l wal-e-test/pg_
pg_clog/       pg_ident.conf  pg_notify/     pg_snapshots/  pg_stat_tmp/   pg_twophase/
pg_hba.conf    pg_multixact/  pg_serial/     pg_stat/       pg_tblspc/     pg_xlog/ 

pg_subtrans isn't there.

On the master:

$ sudo ls -l /mnt/postgresql/9.3/pg_subtrans
total 5024
-rw------- 1 postgres postgres 262144 Apr  4 09:35 18BE
-rw------- 1 postgres postgres 262144 Apr  4 10:00 18BF
-rw------- 1 postgres postgres 262144 Apr  4 10:20 18C0
-rw------- 1 postgres postgres 262144 Apr  4 10:40 18C1
-rw------- 1 postgres postgres 262144 Apr  4 11:00 18C2
-rw------- 1 postgres postgres 262144 Apr  4 11:25 18C3
-rw------- 1 postgres postgres 262144 Apr  4 11:40 18C4
-rw------- 1 postgres postgres 262144 Apr  4 12:05 18C5
-rw------- 1 postgres postgres 262144 Apr  4 12:20 18C6
-rw------- 1 postgres postgres 262144 Apr  4 12:40 18C7
-rw------- 1 postgres postgres 262144 Apr  4 12:55 18C8
-rw------- 1 postgres postgres 262144 Apr  4 13:15 18C9
-rw------- 1 postgres postgres 262144 Apr  4 13:30 18CA
-rw------- 1 postgres postgres 262144 Apr  4 13:50 18CB
-rw------- 1 postgres postgres 262144 Apr  4 14:10 18CC
-rw------- 1 postgres postgres 262144 Apr  4 14:25 18CD
-rw------- 1 postgres postgres 262144 Apr  4 14:50 18CE
-rw------- 1 postgres postgres 262144 Apr  4 15:10 18CF
-rw------- 1 postgres postgres 262144 Apr  4 15:30 18D0
-rw------- 1 postgres postgres  81920 Apr  4 15:35 18D1

$ sudo ls -l /mnt/postgresql/9.3
...
drwx------ 2 postgres postgres 4096 Apr  4 15:30 pg_subtrans

I don't know much about the internals of pg_subtrans -- does it exist on all postgresql installations? Or is it created as needed?

All the subtrans directories on the master look like they were recently created, what would happen if that directory was created after the wal-e basebackup was taken? The wal-e base backup was taken before 9 am today.

Daniel Farina

Apr 4, 2014, 7:10:59 PM
to Joe Van Dyk, wal-e
On Fri, Apr 4, 2014 at 3:41 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> $ sudo ls -l /mnt/postgresql/9.3
> ...
> drwx------ 2 postgres postgres 4096 Apr 4 15:30 pg_subtrans
>
> I don't know much about the internals of pg_subtrans -- does it exist on all
> postgresql installations? Or is it created as needed?
>
> All the subtrans directories on the master look like they were recently
> created, what would happen if that directory was created after the wal-e
> basebackup was taken? The wal-e base backup was taken before 9 am today.

It *should* always exist. initdb makes one. The files within are
bitmaps IIRC. Is your leader database's (the thing being backed up)
pg_subtrans directory new somehow?

Is your pg_subtrans newer than, say, the "base" directory, or the
other slew of directories seen in "initdb" ?
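Since initdb always creates these directories, a restored data directory can be checked against them before starting the server. A minimal sketch, assuming the expected list is taken from the 9.3 master's `ls` output shown later in this thread (pg_xlog is left out because it is a symlink to another disk there); the helper names are hypothetical. It only reports what is missing and creates an empty directory on request, mirroring Joe's manual workaround for pg_subtrans. As far as I understand, pg_subtrans contents are rebuilt at server start, which is why an empty directory was enough; a missing base or global, by contrast, means the restore itself is broken and should not be papered over.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Top-level directories in the 9.3 data directory listing from this thread.
EXPECTED_93 = (
    "base", "global", "pg_clog", "pg_multixact", "pg_notify", "pg_serial",
    "pg_snapshots", "pg_stat", "pg_stat_tmp", "pg_subtrans", "pg_tblspc",
    "pg_twophase",
)


def missing_dirs(pgdata: str, expected: Iterable[str] = EXPECTED_93) -> List[str]:
    """Return the expected subdirectories that are absent from pgdata."""
    root = Path(pgdata)
    return [name for name in expected if not (root / name).is_dir()]


def create_empty_dir(pgdata: str, name: str) -> Path:
    """Create one empty subdirectory with the 0700 mode PostgreSQL uses."""
    path = Path(pgdata) / name
    path.mkdir(mode=0o700)
    # mkdir's mode is filtered through the umask, so set it explicitly.
    os.chmod(path, 0o700)
    return path
```

A typical use after `backup-fetch` would be to print `missing_dirs(pgdata)`, call `create_empty_dir(pgdata, "pg_subtrans")` only if pg_subtrans is the sole gap, and stop to investigate otherwise.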

Joe Van Dyk

Apr 4, 2014, 8:24:42 PM
to wa...@googlegroups.com, Joe Van Dyk
I'm not sure how to figure out when a directory was created -- I know how to get last access, modified, and changed dates.

But I haven't done anything weird with the pg_subtrans or postgresql directories, other than having pg_xlog be a symlink.

Joe 

Daniel Farina

Apr 4, 2014, 8:39:54 PM
to Joe Van Dyk, wal-e
On Fri, Apr 4, 2014 at 5:24 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> On Friday, April 4, 2014 4:10:59 PM UTC-7, Daniel Farina wrote:
>>
>> On Fri, Apr 4, 2014 at 3:41 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>> > $ sudo ls -l /mnt/postgresql/9.3
>> > ...
>> > drwx------ 2 postgres postgres 4096 Apr 4 15:30 pg_subtrans
>> >
>> > I don't know much about the internals of pg_subtrans -- does it exist on
>> > all postgresql installations? Or is it created as needed?
>> >
>> > All the subtrans directories on the master look like they were recently
>> > created, what would happen if that directory was created after the wal-e
>> > basebackup was taken? The wal-e base backup was taken before 9 am today.
>>
>> It *should* always exist. initdb makes one. the files within are
>> bitmaps IIRC. Is your leader database (thing being backed-up)
>> pg_subtrans directory new somehow?
>>
>> Is your pg_subtrans newer than, say, the "base" directory, or the
>> other slew of directories seen in "initdb" ?
>
> I'm not sure how to figure out when a directory was created -- I know how to
> get last access, modified, and changed dates.

The change time is probably best.

Maybe a dump of this would help:

ls -ldc $PGDATA/*

Joe Van Dyk

Apr 4, 2014, 9:27:01 PM
to wa...@googlegroups.com, Joe Van Dyk
$ ls -ldc /mnt/postgresql/9.3/*
-rw------- 1 postgres postgres  213 Nov 10 12:00 /mnt/postgresql/9.3/backup_label.old
drwx------ 8 postgres postgres 4096 Nov 12 14:10 /mnt/postgresql/9.3/base
drwx------ 2 postgres postgres 4096 Apr  4 04:51 /mnt/postgresql/9.3/global
drwx------ 2 postgres postgres 4096 Apr  4 18:08 /mnt/postgresql/9.3/pg_clog
-rw------- 1 postgres postgres 1915 Apr  4 11:05 /mnt/postgresql/9.3/pg_hba.conf
-rw------- 1 postgres postgres 1636 Nov 10 11:59 /mnt/postgresql/9.3/pg_ident.conf
drwx------ 4 postgres postgres 4096 Nov 10 11:59 /mnt/postgresql/9.3/pg_multixact
drwx------ 2 postgres postgres 4096 Mar 24 22:05 /mnt/postgresql/9.3/pg_notify
drwx------ 2 postgres postgres 4096 Nov 10 11:59 /mnt/postgresql/9.3/pg_serial
drwx------ 2 postgres postgres 4096 Nov 10 11:59 /mnt/postgresql/9.3/pg_snapshots
drwx------ 2 postgres postgres 4096 Mar 24 22:05 /mnt/postgresql/9.3/pg_stat
drwx------ 2 postgres postgres 4096 Apr  4 18:26 /mnt/postgresql/9.3/pg_stat_tmp
drwx------ 2 postgres postgres 4096 Apr  4 18:25 /mnt/postgresql/9.3/pg_subtrans
drwx------ 2 postgres postgres 4096 Nov 10 11:59 /mnt/postgresql/9.3/pg_tblspc
drwx------ 2 postgres postgres 4096 Nov 10 11:59 /mnt/postgresql/9.3/pg_twophase
-rw------- 1 postgres postgres    4 Nov 10 11:59 /mnt/postgresql/9.3/PG_VERSION
lrwxrwxrwx 1 postgres postgres   18 Nov 10 11:59 /mnt/postgresql/9.3/pg_xlog -> /mnt1/pg_xlogs/9.3
-rw------- 1 postgres postgres 1051 Jan  1 23:30 /mnt/postgresql/9.3/postgresql.conf
-rw------- 1 postgres postgres   73 Mar 24 22:05 /mnt/postgresql/9.3/postmaster.opts
-rw------- 1 postgres postgres   73 Mar 24 22:05 /mnt/postgresql/9.3/postmaster.pid 

Daniel Farina

Apr 9, 2014, 6:10:35 PM
to Joe Van Dyk, wal-e
On Fri, Apr 4, 2014 at 6:27 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>
> $ ls -ldc /mnt/postgresql/9.3/*
> [...]
> drwx------ 2 postgres postgres 4096 Apr 4 18:25 /mnt/postgresql/9.3/pg_subtrans
> [...]


Heh. The last change time is recent, but so is that of several other
directories with thrashing contents, which is expected. Unfortunately
this diagnostic is not as useful as I had hoped because I forgot that
change time is useless on directories with changing entries.
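The reason the timestamp says nothing about creation: on a POSIX file system, creating, removing or renaming an entry inside a directory updates that directory's modification and change times. pg_subtrans segments rotate constantly, so its timestamps always look fresh. A small sketch demonstrating this, assuming a POSIX system. ctime cannot be rewound from user space, so it uses mtime, which the kernel bumps together with ctime when entries change. The file name `18D2` is just a hypothetical next segment.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def dir_mtime_around_new_entry() -> Tuple[float, float]:
    """Return a directory's mtime before and after an entry is created in it."""
    with tempfile.TemporaryDirectory() as d:
        os.utime(d, (0, 0))          # rewind atime and mtime to the epoch
        before = os.stat(d).st_mtime
        Path(d, "18D2").touch()      # hypothetical next pg_subtrans segment
        after = os.stat(d).st_mtime  # now the current time, not the epoch
    return before, after


if __name__ == "__main__":
    before, after = dir_mtime_around_new_entry()
    print(f"before={before} after={after}")
```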

To make matters worse, I've never seen this bug in spite of all the
backups/restores I've done (automatically) which have fingered several
other obscure bugs.

All in all, I'd like to pin down two things:

1) If one pokes at the tar files in S3 using a program like "tar", is
there no pg_subtrans directory entry in any of them?

This would isolate the problem to the back-up routine rather than
the restore routines.

2) Presuming the first fact yields "yes, it's a problem in taking the
backup", if you take another base backup, does it routinely produce
such problems (i.e. this is workload-dependent, or even
deterministic)?

Alternatively, if it's a problem in restore, well, life is much easier
because the specific manifest in each tar can be put under a
microscope and the extraction routines can be fixed for everyone in a
point release.
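Step 1 can be scripted without unpacking hundreds of gigabytes to disk. A minimal sketch, assuming the partitions (like the `part_99.tar.lzo` named in Joe's log) have already been downloaded with whatever S3 tool is at hand, that the `lzop` command-line tool is on PATH, and that member names are relative to $PGDATA, possibly with a leading `./`. The script name and helpers are hypothetical. Each `.lzo` partition is streamed through `lzop -dc` into Python's tarfile in sequential `r|` mode, and the script reports which partitions contain `pg_subtrans` or anything under it.

```python
import subprocess
import sys
import tarfile
from typing import IO, List

ENTRY = "pg_subtrans"


def matching_members(stream: IO[bytes], entry: str = ENTRY) -> List[str]:
    """Names of tar members that are `entry` itself or live under it."""
    found = []
    # "r|" reads the archive sequentially, so it works on pipes as well as files.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            name = name.rstrip("/")
            if name == entry or name.startswith(entry + "/"):
                found.append(member.name)
    return found


def scan_partition(path: str) -> List[str]:
    """Scan one partition: plain .tar read directly, .lzo decompressed by lzop."""
    if not path.endswith(".lzo"):
        with open(path, "rb") as f:
            return matching_members(f)
    # Assumes the lzop CLI: -d decompresses, -c writes to stdout.
    proc = subprocess.Popen(["lzop", "-dc", path], stdout=subprocess.PIPE)
    try:
        found = matching_members(proc.stdout)
        # Drain trailing padding so lzop is not killed by SIGPIPE.
        while proc.stdout.read(1 << 16):
            pass
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, proc.args)
    return found


if __name__ == "__main__":
    # Usage: python scan_partitions.py part_*.tar.lzo
    for path in sys.argv[1:]:
        hits = scan_partition(path)
        print(f"{path}: {', '.join(hits) if hits else 'no pg_subtrans entries'}")
```

If no partition reports a hit, the entry never made it into the backup, which points at the backup routine rather than the restore routines.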

jeff....@gmail.com

May 6, 2014, 1:04:22 PM
to wa...@googlegroups.com, Joe Van Dyk

Just ran into this with a customer.

In our case pg_subtrans exists but is empty on the primary.

Unfortunately, the base backup is about 600GB in size, so it will be quite tedious to manually pull down all the segments.

Does anyone have a script handy for this?

Joe Van Dyk

May 8, 2014, 3:50:02 PM
to wa...@googlegroups.com, Joe Van Dyk
On Friday, April 4, 2014 3:33:32 PM UTC-7, Daniel Farina wrote:
> On Fri, Apr 4, 2014 at 2:38 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>> On Friday, April 4, 2014 2:32:25 PM UTC-7, Daniel Farina wrote:
>>> The latter it never did make, because there are recovery.conf settings
>>> in there that WAL-E as-is doesn't know anything about, like hot
>>> standby or where the primary_conninfo ought to be or even how to load
>>> configuration for WAL-E. I agree it'd be nice to get to one-less-step
>>> without losing power somehow, but it's a new surface area that to date
>>> WAL-E has never had.
>>>
>>> As for avoiding postgresql.conf: my rationale is that it is full of
>>> absolute paths and settings that can be dangerous upon restore.
>>
>> Maybe make recovery.conf.sample and postgresql.conf.old files?
>>
>> recovery.conf.sample could use the sample recovery line in the
>> documentation, with some information gathered from how the current wal-e
>> backup-fetch process was started.
>>
>> And postgresql.conf.old could contain the original master postgresql.conf
>> file.

BTW, 9.4 will also have a 'postgresql.auto.conf' file; we probably want to back that up as well.

Joe

Daniel Farina

May 22, 2014, 12:52:23 AM
to Joe Van Dyk, wal-e
On Thu, May 8, 2014 at 12:50 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> BTW, 9.4 will also have a 'postgresql.auto.conf' file, probably want to
> back that up as well.

That should come for free if WAL-E doesn't change I think.

linux...@gmail.com

Oct 24, 2014, 12:19:23 PM
to wa...@googlegroups.com, joev...@gmail.com
Has there been any movement on this? We just ran into this ourselves: pg_subtrans is not transferred.

Joe Van Dyk

Oct 24, 2014, 12:27:14 PM
to linux...@gmail.com, wal-e
I haven't seen the error recently, for what it's worth. 