Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Hot standby with streaming replication under PgSQL 9.1.x : failover when master crashes?

91 views
Skip to first unread message

David Guyot (English)

unread,
Apr 2, 2013, 10:29:34 AM4/2/13
to
Hello, world!

I'm trying to figure out how to use fail-over with 2 PgSQL 9.1 servers;
I already succeeded in exchanging roles (ie. making standby master and
vice versa), but only when both master and slave are up and running. I
tried to simulate a master crash by killing its virtual machine, but,
when I try to promote the standby server, even if
/usr/lib/postgresql/9.1/bin/pg_ctl promote -D
/var/lib/postgresql/9.1/main/ answers that promote is in progress, it
looks like the standby tries to connect to the no-longer-running master
to get some WAL files (despite the streaming replication; I noticed it
when I saw PgSQL launching SCP attempts on the crashed master), and, as
the master no longer runs, fails and stop. Here is the corresponding
(CSV) log output :

2013-04-02 15:13:56.983
CEST,,,3523,"[local]",515ad994.dc3,1,"",2013-04-02 15:13:56
CEST,,0,LOG,08P01,"incomplete startup
packet",,,,,,,,"ProcessStartupPacket, postmaster.c:1604",""
2013-04-02 15:13:56.985 CEST,,,3522,,515ad994.dc2,1,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"database system was interrupted; last known up at
2013-04-02 14:46:17 CEST",,,,,,,,"StartupXLOG, xlog.c:6098",""
2013-04-02 15:13:56.985 CEST,,,3522,,515ad994.dc2,2,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"creating missing WAL directory
""pg_xlog/archive_status""",,,,,,,,"ValidateXLOGDirectoryStructure,
xlog.c:3514",""
2013-04-02 15:13:57.589
CEST,"postgres","postgres",3529,"[local]",515ad995.dc9,1,"",2013-04-02
15:13:57 CEST,,0,FATAL,57P03,"the database system is starting
up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858",""
2013-04-02 15:13:57.842 CEST,,,3522,,515ad994.dc2,3,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG,
xlog.c:6164",""
2013-04-02 15:13:58.106
CEST,"postgres","postgres",3536,"[local]",515ad996.dd0,1,"",2013-04-02
15:13:58 CEST,,0,FATAL,57P03,"the database system is starting
up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858",""
2013-04-02 15:13:58.620
CEST,"postgres","postgres",3539,"[local]",515ad996.dd3,1,"",2013-04-02
15:13:58 CEST,,0,FATAL,57P03,"the database system is starting
up",,,,,,,,"ProcessStartupPacket, postmaster.c:1858",""
2013-04-02 15:13:58.663 CEST,,,3522,,515ad994.dc2,4,,2013-04-02 15:13:56
CEST,,0,LOG,00000,"restored log file ""0000000200000001000000D5"" from
archive",,,,,,,,"RestoreArchivedFile, xlog.c:3082",""
2013-04-02 15:13:59.005 CEST,,,3522,,515ad994.dc2,5,,2013-04-02 15:13:56
CEST,1/1,0,LOG,00000,"redo starts at 1/D5000020",,,,,,,,"StartupXLOG,
xlog.c:6563",""
2013-04-02 15:13:59.006 CEST,,,3522,,515ad994.dc2,6,,2013-04-02 15:13:56
CEST,1/1,0,LOG,00000,"consistent recovery state reached at
1/D6000000",,,,,,,,"CheckRecoveryConsistency, xlog.c:7023",""
2013-04-02 15:13:59.006 CEST,,,3520,,515ad992.dc0,1,,2013-04-02 15:13:54
CEST,,0,LOG,00000,"database system is ready to accept read only
connections",,,,,,,,"sigusr1_handler, postmaster.c:4268",""
2013-04-02 15:13:59.493 CEST,,,3548,,515ad997.ddc,1,,2013-04-02 15:13:59
CEST,,0,LOG,00000,"streaming replication successfully connected to
primary",,,,,,,,"libpqrcv_connect, libpqwalreceiver.c:171",""
2013-04-02 15:15:46.940 CEST,,,3522,,515ad994.dc2,7,,2013-04-02 15:13:56
CEST,1/1,0,LOG,00000,"received promote
request",,,,,,,,"CheckForStandbyTrigger, xlog.c:10578",""
2013-04-02 15:15:46.940 CEST,,,3548,,515ad997.ddc,2,,2013-04-02 15:13:59
CEST,,0,FATAL,57P01,"terminating walreceiver process due to
administrator command",,,,,,,,"ProcessWalRcvInterrupts,
walreceiver.c:150",""
2013-04-02 15:16:08.044 CEST,,,3522,,515ad994.dc2,8,,2013-04-02 15:13:56
CEST,1/1,0,FATAL,XX000,"could not restore file
""0000000200000001000000D6"" from archive: return code
65280",,,,,,,,"RestoreArchivedFile, xlog.c:3132",""
2013-04-02 15:16:08.044 CEST,,,3520,,515ad992.dc0,2,,2013-04-02 15:13:54
CEST,,0,LOG,00000,"startup process (PID 3522) exited with exit code
1",,,,,,,,"LogChildExit, postmaster.c:2878",""
2013-04-02 15:16:08.044 CEST,,,3520,,515ad992.dc0,3,,2013-04-02 15:13:54
CEST,,0,LOG,00000,"terminating any other active server
processes",,,,,,,,"HandleChildCrash, postmaster.c:2712",""

As you can see, it tries to download an archive file from the old
master, but, obviously, it can't as the master has stopped.

Here is my recovery.conf :

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.80 port=5433 user=replika
password=replika'
restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'

And my postgresql.conf (only the non-default values, same config on both
servers; note that all locales but lc_messages are in French because I'm
French and so are the servers [no froggies jokes, please ;)], but the
problem remains even if locales are all set to fr_FR.UTF-8; lc_messages
is set to en_GB.UTF-8 to allow me to send this mesage with English logs) :

data_directory = '/var/lib/postgresql/9.1/main'
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.1-main.pid'
listen_addresses = '*'
port = 5433
max_connections = 10
unix_socket_directory = '/var/run/postgresql'
ssl = true
shared_buffers = 24MB
wal_level = hot_standby
wal_buffers = 8MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/WAL/%f && cp %p
/var/lib/postgresql/WAL/%f'
max_wal_senders = 5
hot_standby = on
log_destination = 'csvlog'
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-requests.log'
log_min_messages = info
log_error_verbosity = verbose
log_line_prefix = '%t '
lc_messages = 'en_GB.UTF-8'
lc_monetary = 'fr_FR.UTF-8'
lc_numeric = 'fr_FR.UTF-8'
lc_time = 'fr_FR.UTF-8'
default_text_search_config = 'pg_catalog.french'

I assume I missed something there because, as this is a replication
mechanism, in my opinion, the standby should simply be promoted after
master failure, or at least I should have the possibility to ask this to
the standby while asking not to connect to the old master and make its
own way with the local data and whose the replication stream already
sent. Or is it the standby promotion procedure in case of master failure
which misses something? I can't tell because the documentation isn't
verbose about the standby promotion after a master failure (or I wasn't
able to find it by myself). If there is a detailed procedure available
for the event of a master crash, where is it? Else, where is my error?

Thank you in advance.

Regards.

--
David Guyot
Sysadmin
Europe Camions Interactive
Moulin Collot
F-88500 Ambacourt

Tel: +33 (0)3 29 30 47 85
Fax : +33 (0)3 29 31 31 31



--
Sent via pgsql-admin mailing list (pgsql...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Douglas J Hunley

unread,
Apr 2, 2013, 10:47:46 AM4/2/13
to

On Tue, Apr 2, 2013 at 10:29 AM, David Guyot (English) <david...@europecamions-interactive.com> wrote:
restore_command = 'rsync -av 192.168.0.80:/var/lib/postgresql/WAL/%f %p'

You're telling the recovery process to get the files off the (now dead) master here...


--
Douglas J Hunley (doug....@gmail.com)
Twitter: @hunleyd                                               Web: douglasjhunley.com
G+: http://goo.gl/sajR3

David Guyot

unread,
Apr 2, 2013, 10:53:12 AM4/2/13
to
Mister,

Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?


Thank you in advance.

Regards.

-- 
David Guyot
Administrateur système, réseau et télécommunications
Europe Camions Interactive
Moulin Collot
F-88500 Ambacourt

Tel: +33 (0)3 29 30 47 85
signature.asc

Douglas J Hunley

unread,
Apr 2, 2013, 12:51:48 PM4/2/13
to

On Tue, Apr 2, 2013 at 10:53 AM, David Guyot <david...@europecamions-interactive.com> wrote:
Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?

If you're going to do WAL shipping, your 'archive_command' on the master needs to be copying over to the standby (rsync or whatever). Then the standby's recovery.conf would be a simple 'cp /path/to/wal-archive' type of command. 

Douglas J Hunley

unread,
Apr 2, 2013, 12:54:02 PM4/2/13
to
On Tue, Apr 2, 2013 at 10:53 AM, David Guyot <david...@europecamions-interactive.com> wrote:
Mister,

Thank you for your answer. I had in mind that this line was for the initial start of the standby; should I just comment it before promoting the standby, or is this line useless for the day-to-day standby service?

This might help (just stop before the cascade stuff) http://bartek.im/blog/2012/12/04/postgresql-92-streaming-primer.html

AJ Weber

unread,
Apr 10, 2013, 2:15:30 PM4/10/13
to
AFAIK, you have to use stunnel to do it (which is not hard to setup, but it almost makes you wonder whether you should go to the trouble of using pgbouncer at all).

I just went through this and I ended up just testing direct connections through the tunnel without pgbouncer in the middle.  It worked for the most part.  I had some unexplained issues with my app that I couldn't pinpoint so I stopped testing it for a while.  (And my app already does connection pooling, so pgbouncer wasn't going to help that much in that regard.)


On 4/10/2013 2:06 PM, Bhanu Murthy wrote:
Hi all,
 
Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresql database (version 8.4.3)? 
 
Thanks in advance!
 
Bhanu M. Gandikota
Cell: (415) 420-7740
 

k...@rice.edu

unread,
Apr 10, 2013, 2:16:56 PM4/10/13
to
On Wed, Apr 10, 2013 at 11:06:32AM -0700, Bhanu Murthy wrote:
> Hi all,
> �
> Can someone please point me to detailed documentation on how to secure/encrypt connections between PGBouncer and Postgresql database (version 8.4.3)?�
> �
> Thanks in advance!
> �
> Bhanu M. Gandikota
> Cell: (415) 420-7740

Hi Bhanu,

You will need to use your link encryption process of choice to tunnel the connections
from pgbouncer to the backend. SSH and STunnel are two that we have used successfully
in the past.

Regards,
Ken
0 new messages