apache2 mpm_event mod_pagespeed [core:notice] [pid 394] AH00060: seg fault or similar nasty error detected in the parent process

Luke Kenneth Casson Leighton

Aug 13, 2017, 2:48:23 AM

to mod-pagesp...@googlegroups.com

ok so i decided to switch a (stable) mpm_prefork setup over to mpm_event,
and it segfaulted within a few hours (leaving the customer's website
down overnight). mpm_worker had also previously segfaulted so was off
the table, but i thought that mpm_event would be okay to try.

there is nothing thread-unsafe about this setup.
mod-php is *not* being used: php is instead set up as a (rather
unusual) cgi-bin arrangement, for the back office admin, which was
*not* accessed overnight. the main web service is a python 2.7 wsgi
(mod_fcgid) bare-bones application with *no framework of any kind* and
absolutely *no threading whatsoever*. sql data access is performed
using python-mysqldb *with no threading*.

bottom line of the previous paragraph: there is no threading of any
kind in the main web served pages.

in previous discussions i mentioned i was going to try setting the
number of mod_pagespeed threads down to 1: that's been done. so the
segfault occurred whilst the number of threads (and expensive threads)
was set to ONE.

main apache2 debian version: 2.4.10-10+deb8u10
libapache2-mod-fcgid 1:2.3.9-1+b1
libapache2-mod-wsgi 4.3.0-1
mod-pagespeed-stable 1.12.34.2-r0 i386

experimentation-wise... i can't risk trying mpm_event or mpm_worker on
this live system any more: it was extremely bad that the site was
completely down overnight.

i *might* be able to help you replicate the exact setup so that you
can investigate: i own the source code of the python application
running the site.

l.

[attachment: pagespeed.conf]

Otto van der Schaaf

Aug 13, 2017, 2:47:42 PM

to mod-pagesp...@googlegroups.com

You could try adding this to mod_pagespeed's configuration:

ModPagespeedInstallCrashHandler on

When you do that, and the error happens again, chances are a stack trace will be written to the logs (even when the cause isn't related to mod_pagespeed).

Otto


Luke Kenneth Casson Leighton

Aug 13, 2017, 7:42:15 PM

to mod-pagesp...@googlegroups.com

On Sun, Aug 13, 2017 at 7:47 PM, Otto van der Schaaf <osc...@we-amp.com> wrote:
> You could try adding this to mod_pagespeed's configuration:
>
> ModPagespeedInstallCrashHandler on
>
> When you do that, and the error happens again, chances are a stack trace
> will be written to the logs (even when the cause isn't related to
> mod_pagespeed).

ha! nice! *thinks*.... aaaand.... if i write a little program which
counts the number of apache2 processes and if zero restarts it, i can
get away with running that live.

... don't tell my client :)

l.

Luke Kenneth Casson Leighton

Aug 13, 2017, 7:52:50 PM

to mod-pagesp...@googlegroups.com

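-----
#!/bin/sh
# print any running apache2 processes; grep exits non-zero when none match
ps ax | grep /usr/sbin/apache2 | grep -v grep

-----
#!/usr/bin/env python
# watchdog: restart apache2 whenever no apache2 processes are running

import subprocess
import os
from time import sleep

while True:
    try:
        x = subprocess.check_output(["/usr/local/bin/apacheprocesses.sh"])
    except subprocess.CalledProcessError:
        # the grep pipeline matched nothing, i.e. apache2 is not running
        x = ''
    print(x)
    if not x:
        os.system("/etc/init.d/apache2 restart")
    sleep(5)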

let's hope _that_ doesn't crash...
:)
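The shell helper above greps `ps ax` output. A minimal pure-Python alternative, sketched here assuming a Linux `/proc` filesystem and that each apache2 process's first argument is exactly `/usr/sbin/apache2` (as the `ps ax` output the helper greps for suggests), counts matching processes directly, so there is no separate script and no `grep -v grep` step. `count_processes` is a hypothetical helper name, not part of any library.

```python
import os


def count_processes(name, proc_root="/proc"):
    """Count processes whose first argument (argv[0]) is exactly `name`.

    Linux-specific: reads <proc_root>/<pid>/cmdline, in which the
    arguments are separated by NUL bytes.
    """
    wanted = name.encode()
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
        except (IOError, OSError):
            continue  # the process exited between listdir() and open()
        if argv[0] == wanted:
            count += 1
    return count
```

In the watchdog loop, `if count_processes("/usr/sbin/apache2") == 0:` could then replace the `check_output` call. The `proc_root` parameter exists only so the function can be pointed at a fake directory for testing.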

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

Luke Kenneth Casson Leighton

Aug 14, 2017, 5:46:03 AM

to mod-pagesp...@googlegroups.com

hiya otto,

ok that's... interesting. the mistake i made yesterday in interpreting
mod_status caused me to add these nokeepalive exceptions (many of them
are irrelevant: it was the RemoteAddr ones that eventually worked):

<IfModule mod_headers.c>
# added already
Header set Connection Keep-Alive
# added yesterday
SetEnvIf Host "localhost" nokeepalive
SetEnvIf Host "127.0.0.1" nokeepalive
SetEnvIf Host "::1" nokeepalive

SetEnvIf RemoteAddr "::1" nokeepalive
SetEnvIf RemoteAddr "127.0.0.1" nokeepalive
</IfModule>
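`SetEnvIf` treats its pattern as a regular expression matched anywhere in the attribute value, so an unescaped, unanchored pattern such as `127.0.0.1` can match more addresses than intended. The sketch below approximates that behaviour with Python's `re.search` (close enough for these simple patterns, though Apache itself uses PCRE). The anchored variants (`^::1$`, `^127\.0\.0\.1$`) are a hypothetical stricter alternative, not something tested on this server, and `sets_nokeepalive` is a hypothetical helper.

```python
import re

# Patterns exactly as written in the SetEnvIf RemoteAddr lines above.
UNANCHORED = ["::1", "127.0.0.1"]
# Hypothetical stricter versions: dots escaped, anchored at both ends.
ANCHORED = [r"^::1$", r"^127\.0\.0\.1$"]


def sets_nokeepalive(remote_addr, patterns):
    """Approximate SetEnvIf: true if any pattern matches anywhere in the value."""
    return any(re.search(p, remote_addr) for p in patterns)


for addr in ["127.0.0.1", "::1", "10.127.0.0.15", "2001:db8::1", "203.0.113.7"]:
    print(addr, sets_nokeepalive(addr, UNANCHORED), sets_nokeepalive(addr, ANCHORED))
```

With the unanchored patterns, `10.127.0.0.15` and `2001:db8::1` (addresses chosen purely for illustration) would also have keep-alive disabled; with the anchored ones only the two loopback addresses would.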

and... it could be a total coincidence, but... no mpm_event-related
segfault so far after 10 hours of operation. the two earlier segfaults
both happened within a couple of hours, so this is... a
particularly interesting but not-yet-statistically-significant
datapoint.

i'll leave it running for several days, to see if it's going to stay
up during peak times.

l.

Luke Kenneth Casson Leighton

Aug 16, 2017, 2:42:01 AM

to mod-pagesp...@googlegroups.com

On Mon, Aug 14, 2017 at 10:45 AM, Luke Kenneth Casson Leighton
<lk...@lkcl.net> wrote:

> Header set Connection Keep-Alive # added already

> SetEnvIf RemoteAddr "::1" nokeepalive
> SetEnvIf RemoteAddr "127.0.0.1" nokeepalive

> i'll leave it running for several days, to see if it's going to stay
> up during peak times.

Server uptime: 2 days 6 hours 39 minutes 27 seconds

hmmmm.... still going strong. survived at least two logrotate
reloads... this is still with the loopback keepalives switched off.
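Converting the human-readable uptime reported by mod_status into seconds makes it easier to compare against other intervals, such as the daily 06:25 logrotate reload. A minimal sketch, assuming the line has exactly the `N days N hours N minutes N seconds` shape quoted above; `uptime_seconds` is a hypothetical helper.

```python
import re

UNITS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def uptime_seconds(line):
    """Convert mod_status's human-readable 'Server uptime:' line to seconds."""
    total = 0
    # each "<number> <unit>" pair, with the unit in singular or plural form
    for value, unit in re.findall(r"(\d+)\s+(day|hour|minute|second)s?", line):
        total += int(value) * UNITS[unit]
    return total


print(uptime_seconds("Server uptime: 2 days 6 hours 39 minutes 27 seconds"))  # 196767
```

For the line above this gives 196767 seconds, roughly 2.3 days.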

btw i did notice that mod_pagespeed sends HTTP/1.0 requests.

https://stackoverflow.com/questions/10723812/if-a-http-1-0-client-requests-connection-keep-alive-will-it-understand-chunked

... mod_pagespeed wouldn't _happen_ to be trying to keep the HTTP/1.0
requests open, when it connects to the server on loopback, would it?
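For reference, the rules the Stack Overflow question is about are spelled out in RFC 7230: an HTTP/1.1 connection stays open unless `Connection: close` is sent, while an HTTP/1.0 connection stays open only if the client explicitly sends `Connection: keep-alive` (section 6.3); and a server must not send a `Transfer-Encoding` (e.g. chunked) response to an HTTP/1.0 request (section 3.3.1). The sketch below encodes those rules as two hypothetical helpers. It ignores proxies and protocol versions other than 1.0 and 1.1, and it says nothing about what mod_pagespeed's fetcher actually sends.

```python
def connection_persists(http_version, connection_header):
    """Whether a connection stays open after the current response
    (simplified RFC 7230 section 6.3 rules; proxies are ignored)."""
    tokens = {t.strip().lower()
              for t in (connection_header or "").split(",") if t.strip()}
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True  # persistent by default
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens  # only when explicitly requested
    return False


def may_use_chunked(http_version):
    """Transfer-Encoding (e.g. chunked) must not be sent to an HTTP/1.0 client."""
    return http_version == "HTTP/1.1"
```

So an HTTP/1.0 request without `Connection: keep-alive` should be answered and then closed, and even a kept-alive HTTP/1.0 connection has to delimit responses with `Content-Length` rather than chunking.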

l.

Luke Kenneth Casson Leighton

Aug 19, 2017, 7:13:23 AM

to mod-pagesp...@googlegroups.com

okaay, finally got a segfault, after 3 days uptime, somewhere around
6:30am this morning. it coincided with a MASSIVE spike in the average
response time (webmaster tools is telling me that the average load
time skyrocketed to NINETEEN seconds). prior to that it had been doing
extremely well, at around 1.3 seconds average page load time.

interestingly, this is occurring at the exact same time every day.
ah! it's down to the logrotate, which is set for 6:25 am.

sooo... whenever the logrotate reload occurs (done with
/etc/init.d/apache2 reload, which sends apache2 a SIGUSR1 for a
graceful restart), apache2 sometimes segfaults instead. it's
particularly noteworthy that it's *not* immediate: it can take several
days of reloads before the segfault happens.

[Sat Aug 19 06:25:37.746265 2017] [mpm_event:notice] [pid 21345:tid 3074504512] AH00493: SIGUSR1 received. Doing graceful restart
[Sat Aug 19 06:25:39.647852 2017] [core:notice] [pid 21345] AH00060: seg fault or similar nasty error detected in the parent process
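Since the crash lines share the standard Apache 2.4 error-log prefix, the gap between each graceful restart (`AH00493`) and a parent crash (`AH00060`) can be pulled out of the logs mechanically, which helps confirm whether every crash follows a reload. A minimal sketch, assuming unwrapped log lines in the format shown above and an English (C) locale for the day and month names; `crash_delays` is a hypothetical helper.

```python
import re
from datetime import datetime

# Apache 2.4 error-log prefix, e.g.
# [Sat Aug 19 06:25:37.746265 2017] [mpm_event:notice] [pid 21345:tid 3074504512] AH00493: ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[[^\]]+\] \[pid \d+[^\]]*\] (?P<code>AH\d{5}):"
)
TS_FORMAT = "%a %b %d %H:%M:%S.%f %Y"


def crash_delays(lines):
    """For every AH00060 (parent crash) line, return the seconds elapsed since
    the most recent AH00493 (SIGUSR1 graceful restart), or None if none preceded it."""
    last_restart = None
    delays = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an error-log entry that starts with an AHxxxxx code
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if m.group("code") == "AH00493":
            last_restart = ts
        elif m.group("code") == "AH00060":
            delays.append(None if last_restart is None
                          else (ts - last_restart).total_seconds())
    return delays
```

For the two lines above the result is a single delay of about 1.9 seconds. The function accepts an open file directly, since iterating a file yields lines; on Debian the default log is `/var/log/apache2/error.log` (an assumption about this particular server).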

BUT...

even *more* interesting: there's absolutely no sign of any crash
handler output. i've checked all error log files: nothing.

otto, i'm not really going to be able to do this repeatedly: it's too
risky for a live server to keep using mpm_event. i'm giving serious
consideration to switching to nginx (i have it all set up). i hope
the above is enough for the team to create a repro case: the segfault
occurs (intermittently) when SIGUSR1 (for a log rotation) is sent to
apache2 while mpm_event is in use with a thread-safe python wsgi and
cgi based application.
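One way to attempt a repro on a non-production replica is to send the apache2 parent SIGUSR1 repeatedly (as the logrotate reload does) and check whether it survives. A graceful restart keeps the same parent pid (both log lines above show pid 21345), so a vanished pid means the parent died. A minimal sketch, assuming Debian's default pid file path `/var/run/apache2/apache2.pid` and root privileges; all helper names are hypothetical.

```python
import os
import signal
import time

# Assumption: Debian's default APACHE_PID_FILE; adjust for other layouts.
PIDFILE = "/var/run/apache2/apache2.pid"


def read_pid(path=PIDFILE):
    with open(path) as f:
        return int(f.read().strip())


def is_alive(pid):
    """True if a process with this pid exists (signal 0 performs only the check)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, but belongs to another user
    return True


def graceful_restart_loop(pid, rounds, interval, send=os.kill):
    """Send SIGUSR1 up to `rounds` times, `interval` seconds apart.

    Returns the round after which the parent was gone, or None if it survived.
    """
    for n in range(1, rounds + 1):
        send(pid, signal.SIGUSR1)
        time.sleep(interval)
        if not is_alive(pid):
            return n
    return None
```

For example, `graceful_restart_loop(read_pid(), rounds=500, interval=30)`; the interval should comfortably exceed the roughly two seconds between the SIGUSR1 and the crash in the log above, and since the crash took several days of daily reloads to appear, many rounds may be needed. The `send` parameter is only there so the loop can be exercised without signalling a real server.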

l.
