Random segfaults of quiescent mod_wsgi processes

stuart mcgraw

Oct 16, 2022, 6:08:21 PM
to modwsgi
I am the author of a Flask application running under Linux/Apache mod_wsgi that is experiencing intermittent, random segmentation faults.

What is unusual is that the mod_wsgi process segfaults are occurring not at startup when mod_wsgi is loaded, nor when an incoming request accesses the app, but when the wsgi processes are just sitting there, quiescent.

From a user's point of view, everything looks fine: the mod_wsgi processes and the app respond with the right results, with no sign of trouble at the client's browser.  But looking at the Apache logs shows the wsgi processes periodically segfaulting and getting restarted with no correlated incoming requests.  They die sometimes after running for a few minutes, sometimes after a few hours.  There are no incoming requests to the wsgi app logged near the time of these crashes.

For example:
[Mon May 30 22:35:43.040387 2022] [wsgi:info] [pid 2575903:tid 139929303559104] mod_wsgi (pid=2575903): Initializing Python.
[Mon May 30 22:35:43.099053 2022] [wsgi:info] [pid 2575903:tid 139929303559104] mod_wsgi (pid=2575903): Attach interpreter ''.
[Tue May 31 01:29:06.434000 2022] [core:notice] [pid 2876203:tid 139929303559104] AH00052: child pid 2511562 exit signal Segmentation fault (11)
[Tue May 31 01:29:07.466268 2022] [wsgi:info] [pid 2605661:tid 139929303559104] mod_wsgi (pid=2605661): Initializing Python.
[Tue May 31 01:29:07.517413 2022] [wsgi:info] [pid 2605661:tid 139929303559104] mod_wsgi (pid=2605661): Attach interpreter ''.
[Tue May 31 04:14:59.405491 2022] [core:notice] [pid 2876203:tid 139929303559104] AH00052: child pid 2575903 exit signal Segmentation fault (11)

My wsgi app is still being tested, so other than infrequent requests generated by me and a few other people there is very little traffic to it.  However, the web server itself is handling a continuous, moderate volume of traffic to other apps, including C, Python and PHP CGI apps.

What I know about the environment (if any other info would be useful I'll try and dig it up):

$ cat /etc/*release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

Apache, mod_wsgi, python were all built from source by the site's administrator.

There are (at least) two Pythons on the system:
 /usr/bin/python3 -- 3.9.2
 /usr/local/bin/python3 -- 3.10.1

Apache/mod_wsgi was supposedly built against Python 3.10.  From
the HTTP server header:
  Apache/2.4.54 (Unix) OpenSSL/1.1.1n mod_wsgi/4.9.4 Python/3.10 PHP/7.4.23

The Apache .conf file uses:
  WSGIDaemonProcess myapp processes=2 threads=10 \
    display-name=apache2-myapp locale=en_US.UTF-8 lang=en_US.UTF-8

$ /usr/local/apache2/bin/httpd -V
Server version: Apache/2.4.54 (Unix)
Server built:   Oct 13 2022 00:07:38
Server's Module Magic Number: 20120211:124
Server loaded:  APR 1.6.5, APR-UTIL 1.6.1, PCRE 10.36 2020-12-04
Compiled using: APR 1.6.5, APR-UTIL 1.6.1, PCRE 10.36 2020-12-04
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="/usr/local/apache2"
 -D SUEXEC_BIN="/usr/local/apache2/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

$ bin/httpd -M
Loaded Modules:
 core_module (static)
 so_module (static)
 http_module (static)
 mpm_event_module (static)
 authz_core_module (shared)
 authz_host_module (shared)
 unixd_module (shared)
 dir_module (shared)
 access_compat_module (shared)
 env_module (shared)
 alias_module (shared)
 log_config_module (shared)
 ssl_module (shared)
 mime_module (shared)
 socache_shmcb_module (shared)
 setenvif_module (shared)
 cgid_module (shared)
 userdir_module (shared)
 headers_module (shared)
 rewrite_module (shared)
 autoindex_module (shared)
 negotiation_module (shared)
 dav_module (shared)
 deflate_module (shared)
 info_module (shared)
 status_module (shared)
 wsgi_module (shared)
 evasive24_module (shared)
 php7_module (shared)

Graham Dumpleton

Oct 16, 2022, 6:16:09 PM
to mod...@googlegroups.com
What other mod_wsgi configuration is there besides the WSGIDaemonProcess directive? That alone only creates a mod_wsgi daemon process group, but does not tell mod_wsgi to use it, so I cannot tell whether you are using embedded mode or daemon mode. The logs are also odd in that I would expect to see other messages in there around when processes are created if using daemon mode, plus an indication of whether a message is being generated from an Apache child process or a mod_wsgi daemon process.

So can you supply the other parts of the mod_wsgi configuration so I can see whether you are properly using daemon mode or not? Also look for logs from mod_wsgi in any per-virtual-host error log file, not just the main Apache error log, if you separate them. Finally, if you are only intending to use mod_wsgi daemon mode, ensure you add the directive:

    WSGIRestrictEmbedded On

outside of all VirtualHost definitions so that any attempt to initialise/use Python in the main Apache child processes is disabled.
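
For example, at global scope in httpd.conf (a placement sketch only; the vhost shown is illustrative):

    # Global server context, outside every <VirtualHost>:
    WSGIRestrictEmbedded On

    <VirtualHost *:443>
        ServerName www.example.org        # illustrative
        # WSGIDaemonProcess / WSGIScriptAlias etc. can go here or stay global
    </VirtualHost>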

Graham


stuart mcgraw

Oct 21, 2022, 11:35:52 PM
to modwsgi
My apologies for the delayed response, I thought I had my google email forwarded to my main email account but... :-(

My intent was that the processes run in daemon mode.  I had missed the info about the WSGIRestrictEmbedded directive when I went through the docs; I'll ask the admin there to add that.  The full configuration for wsgi is:

  WSGIDaemonProcess jmwsgi processes=2 threads=10 \
      display-name=apache2-jmwsgi locale=en_US.UTF-8 lang=en_US.UTF-8
  WSGIProcessGroup jmwsgi
  WSGIScriptAlias /jmwsgi /usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi \
      process-group=jmwsgi
  # Serve static files directly without using the app.
  Alias /jmwsgi/web/ /usr/local/apache2/jmdictdb/
  <Directory /usr/local/apache2/jmdictdb>
      DirectoryIndex disabled
      Require all granted
  </Directory>

The server has a number of virtual hosts, and there were a few mod_wsgi "Loading Python" messages in the error log for one of them (the SSL one), but nothing that looked like an error, and only a few, nowhere near the number of segfault messages:

  [Sat Oct 01 07:50:12.090697 2022] [wsgi:info] [pid 731154:tid 140442461062912] [remote *.*.*.*:40566] mod_wsgi (pid=731154, process='jmwsgi', application='www.edrdg.org|/jmwsgi'): Loading Python script file '/usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi'.

But the wsgi configuration stuff is outside all the virtual hosts.

When the server starts, there are a couple messages in the main error log file like:

  [Sat Oct 01 06:42:26.499086 2022] [wsgi:info] [pid 731041:tid 140442622753728] mod_wsgi (pid=731041): Starting process 'jmwsgi' with uid=33, gid=33 and threads=10.
  [Sat Oct 01 06:42:26.499518 2022] [wsgi:info] [pid 731039:tid 140442622753728] mod_wsgi (pid=731039): Starting process 'jmwsgi' with uid=33, gid=33 and threads=10.

and these are followed/interleaved with the "Initializing Python" and "Attach interpreter" messages, but after server startup the messages are limited to the sets of three I showed: "Initializing Python" and "Attach interpreter", followed some time later by the segmentation fault.

Does any of that help?

Graham Dumpleton

Oct 22, 2022, 1:48:51 AM
to mod...@googlegroups.com
Try changing it to:

WSGIScriptAlias /jmwsgi /usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi \
      process-group=jmwsgi application-group=%{GLOBAL}

You are possibly using a third-party Python module which isn't designed to work in Python sub-interpreters. That application group value forces the main Python interpreter context to be used, which can avoid problems with crashes or thread deadlocks when such broken modules are used.


That option on WSGIScriptAlias has the same effect as WSGIApplicationGroup but is more specific. For the same reason, your use of WSGIProcessGroup is redundant, as the process-group setting on WSGIScriptAlias takes precedence.

Graham

stuart mcgraw

Oct 23, 2022, 12:51:53 PM
to modwsgi
Thanks for that suggestion.  I passed it on to the site admin and he made the "application-group=%{GLOBAL}" change, but unfortunately it made no difference; the segfaults are still occurring as before.  Is there anything else I can look at?  The current configuration is:

WSGIDaemonProcess jmwsgi processes=2 threads=10 \
    display-name=apache2-jmwsgi locale=en_US.UTF-8 lang=en_US.UTF-8
WSGIScriptAlias /jmwsgi /usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi \
    process-group=jmwsgi application-group=%{GLOBAL}

Would changing to "processes=N threads=1" or "processes=1 threads=N" provide any useful info?  Apache, mod_wsgi and the other web server components were all built there (ie, they are not from distro-supplied packages).  Are the symptoms consistent with a mismatched library or some other build configuration issue?  Or conversely, maybe they make that unlikely?

Graham Dumpleton

Oct 23, 2022, 4:12:02 PM
to mod...@googlegroups.com
How much memory do the processes use? Maybe the system OOM killer is killing the processes because they consume lots of memory and the system thinks it is running low. There were also some potential problems introduced with Python 3.9 with how processes are shut down, which caused embedded systems to fail on shutdown.

See:


You can try setting:

WSGIDestroyInterpreter Off

as mentioned in those change notes and see if it goes away.

Other than that, if you are confident that no new requests are arriving, I can only suggest you work out whether there are background threads running in Python.

You can do that by adding code as described in:


and triggering a dump of running threads by touching a file in the file system.
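
For anyone following along, a minimal sketch of that kind of monitor, assuming the general approach from the mod_wsgi debugging documentation (the trigger path and names here are illustrative), added to the WSGI script file:

    import os, sys, threading, time, traceback

    DUMP_TRIGGER = '/tmp/dump-stacks'   # illustrative path: touch this file to request a dump
    _last_mtime = 0

    def _dump_stacks():
        # Write the Python-level stack of every thread to stderr,
        # which ends up in the Apache error log.
        for thread_id, frame in sys._current_frames().items():
            print('Thread %s' % thread_id, file=sys.stderr)
            traceback.print_stack(frame, file=sys.stderr)

    def _monitor():
        global _last_mtime
        while True:
            try:
                mtime = os.path.getmtime(DUMP_TRIGGER)
            except OSError:
                mtime = 0
            if mtime > _last_mtime:
                _last_mtime = mtime
                _dump_stacks()
            time.sleep(5)

    threading.Thread(target=_monitor, daemon=True).start()

Touching /tmp/dump-stacks should then write each thread's Python stack to the Apache error log, which would show whether anything beyond the mod_wsgi housekeeping threads is running.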

It might also be helpful if you can work out how to have the system preserve core dumps from Apache, so they can be used to extract a true process stack trace, as that may give a clue.
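
The usual knobs for preserving cores are along these lines (a sketch only; the directory is illustrative and must be writable by the Apache child-process user):

    # httpd.conf: directory Apache changes into before dumping core
    CoreDumpDirectory /tmp/apache-cores

    # shell that starts httpd: allow core files to be written
    ulimit -c unlimited

    # kernel: send cores to a predictable location (run as root)
    sysctl -w kernel.core_pattern=/tmp/apache-cores/core.%e.%p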

Graham

stuart mcgraw

Oct 25, 2022, 10:12:32 PM
to modwsgi
Again, thanks for those suggestions.

The OOM killer seems not to be an issue.  I've been told there are no signs of it in the system logs and no signs of memory problems via monitoring during normal operations.

Nor did "WSGIDestroyInterpreter Off" have any effect; the segfaults are still occurring after that was added and Apache restarted.

My understanding of how mod_wsgi works is pretty sketchy.  IIUC you are saying that the mod_wsgi processes are sitting there, waiting on a select() call or the like to receive a request from the mod_wsgi code within Apache, and in that state they cannot simply spontaneously crash -- it must be either that the process received a request from Apache (via the mod_wsgi module), or that there is some independent thread running in the Python part of the mod_wsgi process (which is running my wsgi app) that is causing the crash?

I based my claim that there were no requests coincident with the segfaults on the lack of log messages within a second or two of some of the segfaults.  (It's a moderately busy server, so of course there were also some requests close in time, but for seemingly unrelated pages: eg, Python, PHP or C CGI, or HTML.)  Is it possible that the mod_wsgi processes are getting woken up by something that does not produce an Apache access log entry?

I'm still working on the python thread hypothesis (this is a production server so changes aren't easy.)

Graham Dumpleton

Oct 25, 2022, 10:35:07 PM
to mod...@googlegroups.com
I know you said you were using mod_wsgi/4.9.4, but are you absolutely sure?  Apache/2.4.54 made a breaking change to the default for the LimitRequestBody directive, which would cause the mod_wsgi daemon process to crash when request bodies larger than 1GiB were sent. This was fixed in version 4.9.4, but I am wondering whether your production system has an older version than your development systems use and you just aren't aware of that.

https://modwsgi.readthedocs.io/en/master/release-notes/version-4.9.4.html#bugs-fixed

As to background threads, mod_wsgi has a couple of background threads which check for idle activity, deadlocks and the like, but they touch so little that they have never caused issues in the past. Beyond that, the request handler threads themselves should be stuck on a select loop if no requests are happening.

stuart mcgraw

Oct 26, 2022, 12:33:26 AM
to modwsgi
I didn't compile mod_wsgi myself so I can't say 100%, but the person who did said so, and there is a source directory on the machine named mod_wsgi-4.9.4 with a mod_wsgi.so file whose SHA-1 checksum matches that of the file in the Apache modules/ directory, so I'd say I'm 99.9% sure.

But the chances of >1GiB requests being made seem pretty small.  The URLs haven't been publicized, there are only a handful of known users accessing them infrequently as testers, and nothing in the application would generate requests of that magnitude.

This may be out of scope for you, but are you aware of any (reasonably normal) circumstances under which a mod_wsgi process could receive a request that wasn't logged by Apache?  Or perhaps I could modify the mod_wsgi source code to print a message to a file when a request was received (which I could then correlate with the Apache logs to answer the question.)  Because usage is very light and this is only for short term debugging, I don't think locking or anything fancy would be needed?

And I am still wondering about library mismatches or conflicts, since Apache, Python, mod_wsgi and the C-based Python modules (eg psycopg2) used by the app were all built from source.  Is it possible that some version mismatch there causes memory corruption that only manifests later, when one of the mod_wsgi housekeeping threads runs?  I would like, if possible, to rule this out or at least put it at the bottom of the list.

Graham Dumpleton

Oct 26, 2022, 1:05:28 AM
to mod...@googlegroups.com
The only way I can think of that you might get a request which wasn't logged is if an internal Apache request was triggered via an internal redirect from another Apache module. There still has to be an original request, but it would be logged with a different request URL to the one it got internally redirected to.

Since you are using mod_wsgi daemon mode you can likely see better evidence of all requests being handled if you turn on verbose debugging mode, but it would be quite noisy.

    LogLevel debug
    WSGIVerboseDebugging On

Graham

stuart mcgraw

Oct 26, 2022, 1:34:19 AM
to modwsgi
I was grepping all the log files for any messages within a minute or two before the segfaults, so if a request was logged anywhere I should have seen it.

I'll mention the LogLevel debug setting to them, but there were complaints before that LogLevel info was too noisy, so I'm not sure that will fly.

I'll look into the possibility of errant app threads and post back if anything turns up.  Thanks very much for your help with this.

Graham Dumpleton

Oct 26, 2022, 2:10:38 AM
to mod...@googlegroups.com
LogLevel can be set in just a VirtualHost context if the app is under a separate host. If you are then using separate log files for the different VirtualHosts, the extra output should at least be semi-segregated from everything else.
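
Roughly like this, assuming the app's vhost has its own error log (the names here are illustrative):

    <VirtualHost *:443>
        ServerName www.example.org          # illustrative
        ErrorLog logs/example-error_log     # per-vhost error log
        LogLevel debug                      # Apache 2.4 also allows per-module levels, eg "LogLevel info wsgi:debug"
        # ... rest of the vhost configuration ...
    </VirtualHost>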

stuart mcgraw

Oct 30, 2022, 4:38:01 PM
to modwsgi
Thanks for the vhost suggestion.  I hadn't thought of that, but it turned out we didn't need it.

After more testing it turns out that a .wsgi script containing just the simple hello-world application per


is exhibiting the problem: with no incoming requests at all, the mod_wsgi process, after sitting there for anywhere from a few minutes to a few hours, dies with a segmentation fault.  Any idea what else I could look at?
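
(For context, the script in question is essentially the stock mod_wsgi hello-world application, something along these lines:)

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!'
        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)
        return [output]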

Graham Dumpleton

Oct 30, 2022, 4:52:57 PM
to mod...@googlegroups.com
Enabling capture of core dump files and then using gdb to work out what the C-level stack trace is when the process crashes is all I can think of.
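
For reference, once a core file has been captured, something along these lines (with the paths adjusted for this install) should show the C-level trace of every thread:

    # Load the httpd binary plus the captured core (paths are illustrative):
    $ gdb /usr/local/apache2/bin/httpd /tmp/apache-cores/core.httpd.12345
    # Then dump the C stack of every thread:
    (gdb) thread apply all bt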

stuart mcgraw

Oct 30, 2022, 5:05:56 PM
to modwsgi
I was afraid of that.  I'll look into that further.  Thanks again for your help.

stuart mcgraw

Nov 3, 2022, 11:08:25 AM
to modwsgi
Just wanted to add some closure... it turned out that mod_wsgi had nothing to do with the segmentation faults.  After removing all traces of mod_wsgi they were still occurring, and it was the Apache child processes segfaulting, not the mod_wsgi ones.  (We did not notice that initially because the failed process was gone and had been replaced by the time we noticed a segfault, and we were focused on mod_wsgi, that being the last change made; live and learn!)