okay, finally got a segfault, after 3 days uptime, somewhere around
6:30am this morning. it coincided with a MASSIVE spike in the average
response time (webmaster tools is telling me that the average load
time skyrocketed to NINETEEN seconds). prior to that it had been doing
extremely well - at around 1.3 seconds average page load time.
interestingly, this is occurring at the exact same time every day.
ah! it's down to the logrotate, which is set for 6:25 am.
sooo... whenever the logrotate reload occurs (which is done with an
/etc/init.d/apache reload, and shows up in the log as a SIGUSR1
graceful restart rather than an actual HUP) apache2 instead segfaults.
it's particularly noteworthy that it's *not* immediate: it can take
several days of reloads to get the segfault:
[Sat Aug 19 06:25:37.746265 2017] [mpm_event:notice] [pid 21345:tid 3074504512] AH00493: SIGUSR1 received. Doing graceful restart
[Sat Aug 19 06:25:39.647852 2017] [core:notice] [pid 21345] AH00060: seg fault or similar nasty error detected in the parent process
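
for anyone wanting to check their own error logs for the same pattern, here is a rough python sketch. it only assumes the two message codes visible above (AH00493 for the graceful restart, AH00060 for the parent segfault) and the timestamp layout of those two lines. the function name and the 10-second window are made up for illustration, and it expects each log entry on a single line, as in the real log file rather than as wrapped in an email.

```python
import re
from datetime import datetime

# Matches the leading timestamp and the AH message code of an error-log line
# shaped like the two entries quoted above.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\].*?\b(?P<code>AH\d{5}):")
TS_FORMAT = "%a %b %d %H:%M:%S.%f %Y"  # e.g. Sat Aug 19 06:25:37.746265 2017

GRACEFUL = "AH00493"  # "SIGUSR1 received. Doing graceful restart"
SEGFAULT = "AH00060"  # "seg fault or similar nasty error detected in the parent process"


def find_crashed_restarts(lines, window_seconds=10.0):
    """Return (restart_time, crash_time) pairs where a parent segfault
    notice follows a graceful restart within window_seconds.

    The window is an arbitrary illustrative default, not an Apache setting.
    """
    hits = []
    last_restart = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a line in the expected format
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        code = m.group("code")
        if code == GRACEFUL:
            last_restart = ts
        elif code == SEGFAULT and last_restart is not None:
            if (ts - last_restart).total_seconds() <= window_seconds:
                hits.append((last_restart, ts))
            last_restart = None
    return hits
```

run against the two lines above it reports one pair, roughly 1.9 seconds apart, which matches the timestamps by hand.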
BUT...
even *more* interesting: there's absolutely no sign of any crash
handler output. i've checked all error log files: nothing.
otto i'm not really going to be able to do this repeatedly, it's too
risky for a live server to keep using mpm_event. i'm giving serious
consideration to switching to nginx (i have it all set up). i hope
the above is enough for the team to create a repro case: the segfault
occurs when SIGUSR1 (for a log rotation) is sent to apache2, and
mpm_event is in use on a thread-safe python wsgi and cgi based
application.
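
as a possible starting point for a repro, here's a minimal sketch that just sends SIGUSR1 to the apache2 parent process repeatedly and stops once the process has gone away. caveats: the function name and parameters are hypothetical, the pid has to come from whatever PidFile the config uses, and hammering the signal in a loop is not the same condition as one reload a day on a loaded server, so this may well not trigger the fault. it needs to run as a user allowed to signal the parent (otherwise os.kill raises PermissionError).

```python
import os
import signal
import time


def send_graceful_restarts(pid, count, interval_seconds=1.0,
                           kill=os.kill, sleep=time.sleep):
    """Send SIGUSR1 to pid up to count times, pausing between signals.

    Returns how many signals were delivered before the process
    disappeared (or count if it survived them all). kill and sleep are
    injectable only so the loop can be exercised without a real server.
    """
    sent = 0
    for _ in range(count):
        try:
            kill(pid, signal.SIGUSR1)
        except ProcessLookupError:
            # The parent is gone, e.g. because it crashed on an earlier signal.
            break
        sent += 1
        sleep(interval_seconds)
    return sent
```

a return value lower than count means the parent died part-way through, which is the point at which to go and look at the error log.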
l.