Different grain shutdown options

Troy Farrell

Jan 25, 2021, 9:10:59 PM
to Sandstorm Development
Hi Sandstormers,

I have a project that uses PostgreSQL.  My observation is that PostgreSQL is not happy with being killed all of the time.

My grain logs show this when the grain starts:

    waiting for PostgreSQL to be available at /var/run/postgresql/.s.PGSQL.5432
    LOG:  database system was interrupted; last known up at 2021-01-24 10:34:07 GMT
    LOG:  database system was not properly shut down; automatic recovery in progress
    LOG:  redo starts at 0/14EF198
    LOG:  invalid record length at 0/15D9620: wanted 24, got 0
    LOG:  redo done at 0/15D95F8
    LOG:  last completed transaction was at log time 2021-01-24 10:35:14.122739+00
    LOG:  MultiXact member wraparound protections are now enabled
    LOG:  autovacuum launcher started
    LOG:  database system is ready to accept connections

The PostgreSQL documentation on shutting down the server suggests that a more reasonable approach might be sending SIGINT, waiting several seconds, then sending SIGKILL if the server has not exited.

https://www.postgresql.org/docs/current/server-shutdown.html

I wonder if having Sandstorm send a SIGINT before a SIGKILL might be a reasonable thing to try.  If you agree, I am happy to attempt to solve this problem.  Perhaps packages could request a shutdownStrategy via sandstorm-pkgdef.capnp.

Also, I don't know how best to get the signal to the database server, so I'll have to work that out.  I suspect that the Sandstorm HTTP bridge would receive the signal, and not pass it along to the other processes.
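The escalation described above can be sketched as follows. This is only an illustration of the SIGINT-then-SIGKILL sequence from the PostgreSQL docs, not an existing Sandstorm API; the helper name and grace period are made up for the example.

```python
import signal
import subprocess

def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Send SIGINT (postgres 'fast shutdown'), escalating to SIGKILL."""
    proc.send_signal(signal.SIGINT)
    try:
        # Give the server a window to flush and exit cleanly.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still running: SIGKILL cannot be caught or ignored.
        proc.kill()
        return proc.wait()

# Demonstration with a stand-in child process; a real postgres would
# perform a fast shutdown on SIGINT instead of simply dying.
child = subprocess.Popen(["sleep", "60"])
code = stop_gracefully(child, grace_seconds=5.0)
```

On POSIX, `Popen.wait()` reports death-by-signal as a negative return code, so `code` here is `-2` (SIGINT).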

Thanks for your feedback.
Troy

Ian Denhardt

Jan 25, 2021, 10:05:12 PM
to Sandstorm Development, Troy Farrell
Hey Troy,

Are you actually seeing a *problem*, or is it just the log noise that's
a concern?

Quoting Troy Farrell (2021-01-25 21:10:59)

> Also, I don't know how best to get the signal to the database server,
> so I'll have to work that out. I suspect that the Sandstorm HTTP
> bridge would receive the signal, and not pass it along to the other
> processes.

With the current situation you can't; SIGKILL is special in that it
cannot be caught, and will kill the entire grain instantly, giving it no
chance to respond.
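The special status of SIGKILL is visible from userspace: a process may install a handler for SIGINT, but the kernel rejects any attempt to handle SIGKILL. A quick demonstration in Python:

```python
import signal

def on_sigint(signum, frame):
    # A SIGINT handler like this is how a server would trigger
    # its own orderly shutdown.
    pass

# Installing a SIGINT handler succeeds.
signal.signal(signal.SIGINT, on_sigint)

# Installing a SIGKILL handler fails at the OS level (EINVAL):
# the process gets no chance to respond before it is destroyed.
try:
    signal.signal(signal.SIGKILL, on_sigint)
    sigkill_catchable = True
except OSError:
    sigkill_catchable = False
```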

> I wonder if having Sandstorm send a SIGINT before a SIGKILL might be a
> reasonable thing to try.

This has come up a couple times before, and the consensus is it is not a
good idea. We should probably have an FAQ somewhere, but:

Sandstorm subscribes to a school of thought wrt the design of server
software that usually goes under the heading "crash only software" --
the idea is, if you include a "clean shutdown" command or the like
for your server, all you've really achieved is making sure the recovery
code path never gets tested. Ultimately there's nothing Sandstorm can do
to save apps from unclean shutdowns -- power outages happen, and so an
app *must* be able to recover. Not giving apps a way to "clean up"
before shutdown means that recovery after a crash is likely to be well
tested.

Fortunately any database worth its salt can recover just fine after a
crash -- postgres included -- so unless the app is actually misbehaving
in some way I would just not worry about it.

-Ian

Troy Farrell

Jan 25, 2021, 10:29:01 PM
to Sandstorm Development
Ian, thanks for the quick reply.

> Are you actually seeing a *problem*, or is it just the log noise that's
> a concern?

I haven't seen data loss, but I'm not stress-testing the database yet.
> > I wonder if having Sandstorm send a SIGINT before a SIGKILL might be a
> > reasonable thing to try.
>
> This has come up a couple times before, and the consensus is it is not a
> good idea. We should probably have an FAQ somewhere, but:
>
> Sandstorm subscribes to a school of thought wrt the design of server
> software that usually goes under the heading "crash only software" --
> the idea is, if you include a "clean shutdown" command or the like
> for your server, all you've really achieved is making sure the recovery
> code path never gets tested. Ultimately there's nothing Sandstorm can do
> to save apps from unclean shutdowns -- power outages happen, and so an
> app *must* be able to recover. Not giving apps a way to "clean up"
> before shutdown means that recovery after a crash is likely to be well
> tested.
>
> Fortunately any database worth its salt can recover just fine after a
> crash -- postgres included -- so unless the app is actually misbehaving
> in some way I would just not worry about it.

I'm familiar with crash-only software, though I hadn't heard the potential for data loss justified as thorough testing of the recovery code path.  Some of us have enough power-related problems that the recovery code path is well-tested regardless of our best efforts.

Perhaps you or Kenton can answer this: does fsync() work inside the sandbox?  If it works, then PostgreSQL should be fine.  I suppose that I should try some stress tests to make sure that my configuration of PostgreSQL is sound.
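One minimal check for the question above is to write a record and force it to stable storage the way postgres does for its WAL; if the sandbox blocked fsync(), the call would raise an error. This is a sketch of such a probe, not a full durability test:

```python
import os
import tempfile

# Write a record and flush it to disk, as a database would for a
# committed transaction.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"commit record\n")
    os.fsync(fd)  # raises OSError if the kernel/sandbox refused it
    fsync_ok = True
finally:
    os.close(fd)
    os.remove(path)
```

A real stress test would also pull power (or SIGKILL the grain) mid-write and verify recovery, which this probe does not attempt.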

Ian Denhardt

Jan 25, 2021, 11:09:35 PM
to Troy Farrell, sandst...@googlegroups.com
Forgot to Cc the list.

Quoting Ian Denhardt (2021-01-25 23:08:40)
> Quoting Troy Farrell (2021-01-25 22:29:01)
>
> > Perhaps you or Kenton can answer this: does fsync() work inside the
> > sandbox? If it works, then PostgreSQL should be fine.
>
> Yes, fsync() works. I'd be pretty surprised if postgres's ACID
> guarantees didn't work properly inside of Sandstorm.
>
> -Ian