True graceful restarts for mod_wsgi daemon mode

Tomi Belan

May 7, 2022, 7:17:52 PM
to mod...@googlegroups.com
How much work would it take to have true graceful restarts for the mod_wsgi daemon processes?

Current behavior:
When "apache2ctl graceful" aka "httpd -k graceful" runs, the Apache parent process sends a SIGTERM to each mod_wsgi daemon process, waits up to 3 seconds (hardcoded maximum), and sends a SIGKILL to any that are still alive. After they're all dead, it spawns new wsgi processes. This is mentioned in various issues like #383 and #124, and in the documentation of WSGIDaemonProcess shutdown-timeout.
In contrast, Apache sends SIGUSR1 to its own worker processes, and whenever one of them exits, Apache spawns a new one. So there should almost always be enough processes ready to serve new connections. (https://httpd.apache.org/docs/2.4/stopping.html#graceful)
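
To make the current behavior concrete, here is a minimal standalone Python sketch of the "SIGTERM, wait, then SIGKILL" pattern described above. It is not mod_wsgi or APR code. The names stop_with_timeout, demo and STUBBORN are made up for illustration, and the child process simply ignores SIGTERM to stand in for a request that cannot finish in time.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a busy daemon process: it ignores SIGTERM,
# reports that it is ready, and then keeps "working" for a long time.
STUBBORN = (
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "print('ready', flush=True); "
    "time.sleep(60)"
)


def stop_with_timeout(proc, timeout):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; the child cannot catch or ignore it
        proc.wait()
    return proc.returncode


def demo(timeout=0.5):
    """Start the stubborn child and stop it; returns the child's return code."""
    child = subprocess.Popen([sys.executable, "-c", STUBBORN],
                             stdout=subprocess.PIPE)
    child.stdout.readline()  # block until the child has installed its handler
    code = stop_with_timeout(child, timeout)
    child.stdout.close()
    return code


if __name__ == "__main__":
    print("child return code:", demo())
```

On POSIX, a negative return code from subprocess means the child was terminated by that signal number, so this shows that the child never got a chance to clean up.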

My wishlist for "true" graceful restarts would be:
1. Make the shutdown timeout configurable.
2. Don't wait until *all* old daemon processes exit. Either spawn 1 new process whenever 1 old process exits, or spawn all N new processes immediately and let the old processes exit when they want.
3. Add another signal between the SIGTERM and SIGKILL which throws a Python exception, so that "finally:" blocks have a chance to run.
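
As a rough illustration of item 3, this is how a plain Python process can turn a signal into an exception so that "finally:" blocks run. It is only a sketch under assumptions: SIGUSR2 is an arbitrary choice for the intermediate signal, ShutdownRequested and write_lines are made-up names, and a real mod_wsgi daemon process manages its own signal handling, so this would not work there as-is.

```python
import os
import signal


class ShutdownRequested(Exception):
    """Raised in the main thread when the (hypothetical) warning signal arrives."""


def _raise_shutdown(signum, frame):
    raise ShutdownRequested(f"received signal {signum}")


def write_lines(path, lines, interrupt_after=None):
    """Write lines to path; return True if finished, False if interrupted.

    interrupt_after exists only for the demo: after that many lines the
    process signals itself, imitating a warning sent by the parent.
    """
    previous = signal.signal(signal.SIGUSR2, _raise_shutdown)
    out = open(path, "w")
    try:
        for count, line in enumerate(lines, start=1):
            out.write(line + "\n")
            if count == interrupt_after:
                os.kill(os.getpid(), signal.SIGUSR2)
        return True
    except ShutdownRequested:
        return False
    finally:
        out.close()                              # flushes buffered data
        signal.signal(signal.SIGUSR2, previous)  # restore the old handler
```

Python runs signal handlers in the main thread, so the exception surfaces in whatever code the main thread happens to be executing. That is the property this wishlist item relies on.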

Current code:
The linked github issues did mention that this behavior is hardcoded deep in Apache and there is nothing mod_wsgi can do, but I wanted to see it myself.
Actually, the logic is not anywhere in https://github.com/apache/httpd (in particular, it's NOT in server/mpm_unix.c), but in https://github.com/apache/apr. Specifically, the SIGKILL is sent at apr/memory/unix/apr_pools.c#L2810 and the 3-second timeout is hardcoded at apr/memory/unix/apr_pools.c#L98. Any subprocess registered with apr_pool_note_subprocess(..., APR_KILL_AFTER_TIMEOUT) will use that timeout. mod_wsgi calls that function at server/mod_wsgi.c#L10566.
The pool where the subprocesses are registered is the pconf pool given to wsgi_hook_init. I guess they are probably killed when Apache calls apr_pool_clear(process->pconf) in reset_process_pconf() in main.c, but I haven't verified this.
The normal worker process logic is implemented in each mpm. E.g. prefork replaces dead children with new live children at server/mpm/prefork/prefork.c#L1145, I think.

My thoughts: (please correct me if I'm wrong)
This seems pretty hard. I definitely see why it wasn't done yet. And maybe it's not worth the complexity even if it is possible.
Originally I hoped I could just write an Apache patch to replace the hardcoded timeout value with a config file option. But the logic is in a library (apr) so I can't read Apache config directly, and there might be API/ABI concerns with extending apr_pool_note_subprocess(). And anyway, *only* making the timeout configurable wouldn't be enough because the server would just wait without any mod_wsgi process accepting new connections.
I think the best chance of success would be to stop using apr_pool_t and apr_pool_note_subprocess() for process management in mod_wsgi. After all, it's not the only way: Either use fork() etc directly, like the mpm modules, or at least, keep apr_pool_t but use our own custom pool rather than "pconf" - most likely saved with ap_retained_data_get(). That way mod_wsgi would have more control. When it learns the server is gracefully restarting, it will spawn new daemon processes immediately with a new socket name, and timeout/kill the old processes later in the background. When it learns the server is stopping, it will block until the children are terminated.
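
The restart strategy proposed above can be sketched as a toy process manager in Python. This is only a model of the idea, not a design for mod_wsgi: DaemonGroup, WORKER_CODE and the generation counter are hypothetical, the workers just sleep instead of serving requests, and the socket handling is left out entirely.

```python
import subprocess
import sys
import threading
import time

# Stand-in for a daemon process; a real one would serve requests on a socket.
WORKER_CODE = "import time; time.sleep(60)"


class DaemonGroup:
    def __init__(self, size, grace):
        self.size = size
        self.grace = grace        # seconds old workers get before SIGKILL
        self.generation = 0
        self._reapers = []
        self.workers = self._spawn()

    def _spawn(self):
        self.generation += 1
        return [subprocess.Popen([sys.executable, "-c", WORKER_CODE])
                for _ in range(self.size)]

    def _retire(self, old):
        """SIGTERM every old worker, then SIGKILL any that outlive the grace period."""
        for proc in old:
            proc.terminate()
        deadline = time.monotonic() + self.grace
        for proc in old:
            try:
                proc.wait(timeout=max(0.0, deadline - time.monotonic()))
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()

    def graceful_restart(self):
        """Start the new generation first, then retire the old one in the background."""
        old, self.workers = self.workers, self._spawn()
        reaper = threading.Thread(target=self._retire, args=(old,))
        reaper.start()
        self._reapers.append(reaper)
        return old

    def stop(self):
        """Full shutdown: block until every worker of every generation is gone."""
        for reaper in self._reapers:
            reaper.join()
        self._reapers.clear()
        self._retire(self.workers)
        self.workers = []
```

The point of the sketch is the ordering: graceful_restart() returns as soon as the new workers are spawned, while stop() blocks until everything is terminated.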

Does this make sense? Are there any glaring issues I've overlooked?

If the strategy sounds sensible, and if I have enough time, I might try to code this. Is it something you would be potentially interested in merging? (not too much code review burden, maintenance burden, or risk of new bugs)

Just for completeness, the backstory of why I want this:
My Python app writes files to disk. Sadly, some requests take more than 3 seconds. If it is killed with SIGKILL, the file buffer data is not written, resulting in a corrupted empty/truncated file. A later batch process fails when it tries to read every file in the output directory. I know there are many workarounds, such as using a temporary file and atomically renaming it, but I became curious about the root cause.
The server gracefully restarts every day because of log rotation, using Ubuntu's default logrotate config. After reading #383 I also looked at Apache's rotatelogs, but it doesn't support compression, so I'd rather stay with logrotate.
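
For reference, the temporary-file-and-rename workaround mentioned above could look roughly like this in Python. It is a minimal sketch: atomic_write and the ".tmp-" prefix are made-up names, and it assumes the temporary file lives on the same filesystem as the target so that the rename is atomic.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a temporary file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the data to disk before renaming
        os.replace(tmp_path, path)   # atomic on POSIX within one filesystem
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process gets a SIGKILL in the middle, a stray ".tmp-" file may be left behind, but the final path never holds a truncated file. A batch job that reads every file in the directory would have to skip that prefix, or the temporary files could go into a sibling directory on the same filesystem.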

Versions: Apache 2.4.41 with mpm_prefork, mod_wsgi 4.6.8 in daemon mode, Python 3.8.10, Ubuntu 20.04. (old but I don't think this matters)

Tomi

Graham Dumpleton

May 7, 2022, 7:27:38 PM
to mod...@googlegroups.com
It definitely is an annoying problem. To be honest, I don't think I have ever really considered writing my own subprocess manager instead of using the Apache other-processes management code. I will need to think about why I never considered doing that and how complicated it would be to replicate.

As to an interim solution, have a read of:


Reason I am pointing at that is that if there is only one URL of your application which is writing these files, then you could consider delegating just that one URL to be handled under mod_wsgi embedded mode, rather than in the daemon mode process with the rest of your application code and aren't using prefork MPM. As long as the request handler for that doesn't drag in too much code, the memory cost in Apache child processes may be manageable. By having that one URL be handled in embedded mode, the processes it runs in will be handled under the graceful restart mode of the main Apache child processes.

Graham

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modwsgi+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/CACUV5oemMwr1YzKe%3D0JrBTma%2BwQcvyaN5Jzc5uz_Kf31mK12ng%40mail.gmail.com.

Graham Dumpleton

May 7, 2022, 7:31:33 PM
to mod...@googlegroups.com
Fixing my bad edit at the end so it makes proper sense:

Reason I am pointing at that is that if there is only one URL of your application which is writing these files, then you could consider delegating just that one URL to be handled under mod_wsgi embedded mode, rather than in the daemon mode process with the rest of your application code. As long as the request handler for that doesn't drag in too much code, and you aren't using the prefork MPM, the memory cost in Apache child processes may be manageable. By having that one URL be handled in embedded mode, the processes it runs in will be handled under the graceful restart mode of the main Apache child processes.

Tomi Belan

May 7, 2022, 11:01:35 PM
to modwsgi
I didn't expect such a fast answer! Thank you!

I'm definitely interested if you have any other thoughts about writing a custom process manager. Especially any potential issues or edge cases that must be taken care of.

I will probably try my hand at it just for fun, but I'm not at all familiar with Apache and mod_wsgi internals, so it's pretty daunting. It probably won't go anywhere.

Looking at the code, MPM modules do have some superpowers, such as access to struct ap_unixd_mpm_retained_data. Normal modules will have a harder time distinguishing between a graceful restart and full shutdown. Maybe by registering one cleanup function on pconf and another one on ap_pglobal...? Who knows.

As for my app:
Partitioning by URL is an interesting idea. Sadly it won't work for my app, because almost every request can write these files, and the URL doesn't reveal which requests may be slow. Plus we're forced to use prefork because we need a certain ancient single-sign-on module which is not thread safe. Plus we probably can't use embedded mode anyway, because the server runs two wsgi apps with different virtualenvs, and it needs "WSGIApplicationGroup %{GLOBAL}" for the lxml library. As I understand it, embedded mode can't do that. Currently they are two daemon process-groups.
If I'm being honest with myself, the most pragmatic solution might be to switch to Gunicorn. ;) But even if it comes to that, this puzzle still interests me. It would be neat to find a proper solution, whether I ultimately use it in production or not.

Tomi Belan

May 17, 2022, 11:45:52 AM
to mod...@googlegroups.com
I didn't get far:

The main obstacle I found is that Apache uses dlclose() and dlopen() to unload and reload all module .so files during graceful reload. So registering a cleanup function on a long lived pool such as ap_pglobal or any similar trick just won't work. Any function pointers from mod_wsgi.so may become invalid. Normal data can be stored with ap_retained_data_get(), but not function pointers. See also.

It is even possible to add or remove LoadModule directives during graceful reload. So we might be dealing with a graceful reload where mod_wsgi should nevertheless shut down immediately. If we were to blindly assume that an Apache graceful reload means mod_wsgi is also about to reload, it could lead to dangling child processes. But there is most likely no way to find out during the old mod_wsgi's cleanup, because the new config hasn't been parsed yet.

I glanced at mod_fcgid. Unlike its modern replacement mod_proxy_fastcgi, it can spawn FastCGI services directly. I don't know if mod_fcgid handles all this stuff correctly, but I noticed it works by spawning a "mod_fcgid process manager" process, which then spawns all other children as needed. I guess something like that could work. Spawning a separate "mod_wsgi manager" process just once on first init and registering it with apr_pool_note_subprocess(ap_pglobal, ...) might do the trick -- to make sure that it gets cleaned up and avoid all the issues with function pointers or unloading/reloading of mod_wsgi. But I feel it's too big a change, with too many moving pieces and too much that can go wrong.

In conclusion I'd say mod_wsgi is at a local maximum. Its handling of graceful reloads is not the best, but it's good enough for most users, and given Apache's design and public API I don't think any easy fix exists.

That's probably all from me on this topic. It's a pity I didn't succeed, but I still had fun. So long. :)
