wal-push caused PG crash?

Steve V

Aug 3, 2016, 11:43:44 AM

to wal-e

I started using WAL-E, and things went fine for about 4 hours, when Postgres suddenly appeared to shut down. Connections show in the logs as "terminating connection due to administrator command". Nothing in the Postgres logs preceding that stands out, just:

Aug 3 04:19:07 prd-pg1 postgres[12577]: [3-1] 2016-08-03 00:19:07 EDT [12577-4] LOG: received fast shutdown request

Aug 3 04:19:07 prd-pg1 postgres[12577]: [4-1] 2016-08-03 00:19:07 EDT [12577-5] LOG: aborting any active transactions

Aug 3 04:19:07 prd-pg1 postgres[12582]: [3-1] 2016-08-03 00:19:07 EDT [12582-2] LOG: autovacuum launcher shutting down

No one issued any command to PG that would have initiated a shutdown. This server had been running for 8 months without any unscheduled downtime, and now, 4 hours after starting to use wal-e, there is an event. I don't know whether it's just a coincidence. I also saw a message in my logs from repmgr, which we use for our clustering:

Aug 3 04:20:07 prd-pg1 postgres[25349]: [3-1] 2016-08-03 00:20:07 EDT [25349-1] repmgr@[unknown] WARNING: terminating connection because of crash of another server process

Aug 3 04:20:07 prd-pg1 postgres[25349]: [3-2] 2016-08-03 00:20:07 EDT [25349-2] repmgr@[unknown] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

I'm wondering if the abnormal exit was the archive_command? I have pretty much the archive_command verbatim from the README:

archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

It seems to be working just fine, except for that one event so far, but I would like to avoid random issues if possible. Would it be unsafe for the archiving process to wrap the archive_command in a script, and capture any errors that occur?
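One way such a wrapper could look, as a minimal sketch rather than anything taken from the WAL-E README: it runs the same wal-push command, appends stderr to a separate log when the command fails, and hands a nonzero exit status back to PostgreSQL so the segment is retried. The log path and the `run_and_log` name are assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around `wal-e wal-push` that records failures."""
import subprocess
import sys
import time

# Assumed log location; use any path the postgres user can write to.
LOG_PATH = "/var/log/postgresql/wal-push-wrapper.log"


def run_and_log(cmd, log_path=LOG_PATH):
    """Run cmd; on failure append its stderr to log_path. Return an exit code."""
    result = subprocess.run(cmd, stderr=subprocess.PIPE)
    # Pass stderr through so PostgreSQL's own logging still sees it.
    sys.stderr.buffer.write(result.stderr)
    sys.stderr.flush()
    if result.returncode == 0:
        return 0
    stamp = time.strftime("%Y-%m-%d %H:%M:%S %Z")
    with open(log_path, "a") as log:
        log.write("%s exit=%d cmd=%r\n" % (stamp, result.returncode, cmd))
        log.write(result.stderr.decode("utf-8", "replace"))
        log.write("\n")
    # A negative returncode means the child was killed by a signal;
    # report that as a plain failure. Never turn a failure into 0.
    return result.returncode if result.returncode > 0 else 1


# Only acts when given exactly one argument: the path substituted for %p.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(run_and_log(
        ["envdir", "/etc/wal-e.d/env", "wal-e", "wal-push", sys.argv[1]]))
```

Saved as, say, /usr/local/bin/wal_push_wrapper.py (a hypothetical path), it would be referenced as archive_command = '/usr/local/bin/wal_push_wrapper.py %p'. The important property is that the wrapper never reports success for a failed push, since PostgreSQL treats a zero exit status as meaning the segment was archived.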

Thanks,

Steve

Christophe Pettus

Aug 3, 2016, 11:46:37 AM

to st...@digitalnothing.com, wal-e

On Aug 3, 2016, at 8:43 AM, Steve V <st...@digitalnothing.com> wrote:
> I'm wondering if the abnormal exit was the archive_command?

The archive_command doesn't actually connect to a PostgreSQL backend, so it wouldn't produce a backend crash. (If an archive_command failure writes anything to stderr, that output shows up in the .log file.)

It's probably a coincidence; some other connection caused a backend crash which caused a PostgreSQL restart.

--
-- Christophe Pettus
x...@thebuild.com

Daniel Farina

Aug 3, 2016, 2:03:28 PM

to x...@thebuild.com, st...@digitalnothing.com, wal-e

Agreed. WAL-E doesn't send signals or do anything like what the poster describes anyway.

If one must venture a guess, perhaps memory pressure may be in play? Check for OOMs in syslog.
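A quick way to do that check, sketched under assumptions: the patterns below match wording the Linux kernel commonly logs when the OOM killer fires, but the exact text varies by kernel version, and the syslog path differs between distributions. `find_oom_lines` and `scan_file` are names made up for this example.

```python
import re

# Assumed wording of kernel OOM-killer messages; adjust for your kernel.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process",
    re.IGNORECASE,
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

For example, `scan_file("/var/log/syslog")` on Debian-style systems, or `/var/log/messages` on Red Hat-style ones. Any hits with timestamps near 04:19 on Aug 3 would support the memory-pressure guess.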
