Random segmentation fault errors

inaf

Jan 6, 2010, 11:02:53 PM
to modwsgi
I think I read pretty much whatever there is to read about these
errors in this group but still cannot understand what is causing this
problem in my case..

Every once in a while (very very rarely, compared to the amount of
traffic being served) I get the following error:

[Wed Jan 06 22:38:31 2010] [warn] (14)Bad address: mod_wsgi
(pid=18735): Unable to stat target WSGI script '(null)'.
[Wed Jan 06 22:38:31 2010] [alert] (14)Bad address: mod_wsgi
(pid=18735): Request origin could not be validated.
[Wed Jan 06 22:38:31 2010] [error] [client 3.181.52.54] Premature end
of script headers: <script_name>.wsgi
[Wed Jan 06 22:38:32 2010] [notice] child pid 2710 exit signal
Segmentation fault (11)

I have the SiteMinder agent running as well, and I notice that a bunch of
segfault errors associated with it follow, along with malloc errors...

Here's my configuration:

Apache/2.0.59 (Unix) mod_jk/1.2.18 mod_wsgi/2.6 Python/2.5.4
configured

WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess wsgi processes=1 threads=1 display-name=%{GROUP}
WSGIProcessGroup wsgi

Any help and insight would be much appreciated..

-Cem

Graham Dumpleton

Jan 7, 2010, 5:51:57 AM
to mod...@googlegroups.com
2010/1/7 inaf <cem.e...@gmail.com>:

> I think I read pretty much whatever there is to read about these
> errors in this group but still cannot understand what is causing this
> problem in my case..
>
> Every once in a while (very very rarely, compared to the amount of
> traffic being served) I get the following error:

Your selection of error log messages is confusing.

> [Wed Jan 06 22:38:31 2010] [warn] (14)Bad address: mod_wsgi
> (pid=18735): Unable to stat target WSGI script '(null)'.
> [Wed Jan 06 22:38:31 2010] [alert] (14)Bad address: mod_wsgi
> (pid=18735): Request origin could not be validated.

Both of the above error messages are generated within the mod_wsgi
daemon mode process.

The first error message, because it says '(null)', indicates that
SCRIPT_FILENAME was missing from the data passed to the mod_wsgi daemon
process; however, I don't understand how that could occur at this point.

The second error message indicates that a mod_wsgi validation check,
which protects against malicious attempts to execute arbitrary scripts
as the user the mod_wsgi daemon process runs as, has failed. This can be
a side effect of the corruption indicated by the message above, or
technically could indicate an attempt by external code to connect to
the mod_wsgi listener sockets directly and fake up requests for execution.

> [Wed Jan 06 22:38:31 2010] [error] [client 3.181.52.54] Premature end
> of script headers: <script_name>.wsgi

This though now is a likely indicator that mod_wsgi daemon process crashed.

> [Wed Jan 06 22:38:32 2010] [notice] child pid 2710 exit signal
> Segmentation fault (11)

The problem now is that the pid of the process that crashed doesn't
match the one in which the original error messages occurred, which is
why it is a bit confusing.
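To check mechanically whether the pid in a mod_wsgi alert ever matches the pid in the segfault notice that follows, a short script can scan the error log. This is a hypothetical sketch (the function names and the 5-second window are illustrative choices), assuming the standard Apache 2.0 error_log layout with one entry per line, as in the raw file; the entries quoted in this thread were wrapped by the mail client.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of an Apache 2.0 error_log entry, e.g. "[Wed Jan 06 22:38:31 2010]".
TS_RE = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})\]")
# mod_wsgi messages carry the daemon pid, e.g. "mod_wsgi (pid=18735): ...".
WSGI_RE = re.compile(r"mod_wsgi \(pid=(\d+)\)")
# Apache's notice when a child dies, e.g. "child pid 2710 exit signal Segmentation fault (11)".
SEGV_RE = re.compile(r"child pid (\d+) exit signal Segmentation fault")


def parse_ts(line):
    """Return the entry's timestamp as a datetime, or None if the line has none."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y") if m else None


def correlate(lines, window=timedelta(seconds=5)):
    """Pair each segfault notice with the mod_wsgi pids logged up to `window` before it.

    Returns a list of (crashed_pid, sorted list of nearby mod_wsgi pids).
    """
    recent = []  # (timestamp, pid) for every mod_wsgi message seen so far
    results = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines and non-Apache entries (e.g. SiteMinder)
        m = WSGI_RE.search(line)
        if m:
            recent.append((ts, int(m.group(1))))
            continue
        m = SEGV_RE.search(line)
        if m:
            nearby = sorted({pid for t, pid in recent if ts - t <= window})
            results.append((int(m.group(1)), nearby))
    return results


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], errors="replace") as f:
        for crashed, wsgi_pids in correlate(f):
            print(f"segfault in pid {crashed}; mod_wsgi pids just before: {wsgi_pids}")
```

Fed the first excerpt in this thread (with each entry joined onto one line), it pairs crashed pid 2710 with mod_wsgi pid 18735, i.e. two different processes.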

> I have siteminder agent running as well and I notice that bunch of seg
> fault errors associated to that follows along with malloc errors...
>
> Here's my configuration:
>
>  Apache/2.0.59 (Unix) mod_jk/1.2.18 mod_wsgi/2.6 Python/2.5.4
> configured
>
> WSGIApplicationGroup %{GLOBAL}
> WSGIDaemonProcess wsgi processes=1 threads=1 display-name=%{GROUP}

You don't need 'processes=1', as it will default to a single process,
and using 'processes=1' instead of allowing it to default has the subtle
side effect of setting 'wsgi.multiprocess' to True. You should only use
'processes=1' if load balancing across many Apache instances where
each has only a single process in the daemon process group for that
application.
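To see this effect on a given setup, a throwaway diagnostic script can be mounted in the same daemon process group and asked to report what it receives. A minimal, hypothetical sketch written for Python 3 (the thread uses Python 2.5, where the body would be a plain str). `wsgi.multiprocess` and `wsgi.multithread` are standard WSGI environ keys; `mod_wsgi.process_group` is a key mod_wsgi adds, which is an empty string in embedded mode.

```python
def application(environ, start_response):
    """Report the WSGI concurrency flags that mod_wsgi set for this process."""
    lines = [
        "wsgi.multiprocess = %r" % environ.get("wsgi.multiprocess"),
        "wsgi.multithread = %r" % environ.get("wsgi.multithread"),
        # Added by mod_wsgi; '' when running in embedded mode.
        "mod_wsgi.process_group = %r" % environ.get("mod_wsgi.process_group"),
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Per the explanation above, with `processes=1` given explicitly the first line should read `True`, and with the option omitted it should read `False`.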

> WSGIProcessGroup wsgi
>
> Any help and insight would be much appreciated..

I can only suggest trying mod_wsgi 3.1.

Other than that, I don't really have an answer. It looks like memory
corruption, but whether the source is mod_wsgi, another Apache module
or a Python C extension module, I don't know.

What third party Python modules do you use which may have a C
extension module component?

Anyway, I will have a think about it some more and see if I can come up
with any suggestions of things to look for or try. A snippet of the log
file covering a longer period of time may be a good starting point.

Graham

inaf

Jan 7, 2010, 12:16:14 PM
to modwsgi
Graham,

Thank you very much for your quick response and help... very much
appreciated.. see below for my replies..

>
> > I think I read pretty much whatever there is to read about these
> > errors in this group but still cannot understand what is causing this
> > problem in my case..
>
> > Every once in a while (very very rarely, compared to the amount of
> > traffic being served) I get the following error:
>
> Your selection of error log messages is confusing.

I tried to select the lines that show the specific errors that I am
getting from mod_wsgi daemon so that it is not confusing.. I guess I
was wrong :)

>
> > [Wed Jan 06 22:38:31 2010] [warn] (14)Bad address: mod_wsgi
> > (pid=18735): Unable to stat target WSGI script '(null)'.
> > [Wed Jan 06 22:38:31 2010] [alert] (14)Bad address: mod_wsgi
> > (pid=18735): Request origin could not be validated.
>
> Both the above error messages are generated within mod_wsgi daemon
> mode process.
>
> The first error message, because it says '(null)' indicates that
> SCRIPT_FILENAME was missing in data passed to mod_wsgi daemon process
> however I don't understand how that could occur at this point.
>
> The second error message indicates that some mod_wsgi validation to
> protect against malicious attempts to execute arbitrary script as user
> that mod_wsgi daemon process runs has failed. This can be a side
> effect of corruption indicated from message above, or technically
> could indicate an attempt by external code to connect to mod_wsgi
> listener sockets directly and try and fake up requests for execution.
>
> > [Wed Jan 06 22:38:31 2010] [error] [client 3.181.52.54] Premature end
> > of script headers: <script_name>.wsgi
>
> This though now is a likely indicator that mod_wsgi daemon process crashed.
>
> > [Wed Jan 06 22:38:32 2010] [notice] child pid 2710 exit signal
> > Segmentation fault (11)
>
> Problem now is that the pid of the process that crashed doesn't match
> that in which original error messages occurred, thus why it is a bit
> confusing.

Yes, I see it every time the error occurs.. the pids are always
different..

>
> > I have siteminder agent running as well and I notice that bunch of seg
> > fault errors associated to that follows along with malloc errors...
>
> > Here's my configuration:
>
> >  Apache/2.0.59 (Unix) mod_jk/1.2.18 mod_wsgi/2.6 Python/2.5.4
> > configured
>
> > WSGIApplicationGroup %{GLOBAL}
> > WSGIDaemonProcess wsgi processes=1 threads=1 display-name=%{GROUP}
>
> You don't need 'processes=1' as will default to single process and
> using 'processes=1' instead of allow it to default has subtle side
> affect of setting 'wsgi.multiprocess' to True. You should only use
> 'processes=1' if load balancing across many Apache instances where
> each has only single process in daemon process group for that
> application.

I have 4 Apache instances running on the box with the same WSGI
configuration.. the box has 4 cores, hence 4 Apaches. I have only 3
simple WSGI scripts running: one of them is used for testing, another
is actively used in production, and the third is only hit by a back-end
script to refresh data in a singleton object, which the others only
read. So I guess it is OK to keep processes=1?

>
> > WSGIProcessGroup wsgi
>
> > Any help and insight would be much appreciated..
>
> I can only suggest trying mod_wsgi 3.1.

Just did.. monitoring to see if I get any errors..

>
> Other than that don't really have an answer. It looks like memory
> corruption but whether the source is mod_wsgi, another Apache module
> or a Python C extension module, don't know.
>
> What third party Python modules do you use which may have a C
> extension module component?
>
> Anyway, will have a think about it some more and see if can come up
> with any suggestions of things to look for or try. A snippet of log
> file covering a longer amount of time may be a good point.
>
> Graham

Another question I had was whether slow network connections might
cause this issue.. what are your thoughts on that?

-Cem

inaf

Jan 7, 2010, 2:07:31 PM
to modwsgi

No luck.. got the following error just now..

[Thu Jan 07 13:57:05 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 13:57:05 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi
[Thu Jan 07 13:57:06 2010] [notice] child pid 31237 exit signal
Segmentation fault (11)

Graham Dumpleton

Jan 7, 2010, 4:35:49 PM
to mod...@googlegroups.com
2010/1/8 inaf <cem.e...@gmail.com>:

Odd. Can you post a larger snippet of the error log from around when the event occurs?

>> > I have siteminder agent running as well and I notice that bunch of seg
>> > fault errors associated to that follows along with malloc errors...
>>
>> > Here's my configuration:
>>
>> >  Apache/2.0.59 (Unix) mod_jk/1.2.18 mod_wsgi/2.6 Python/2.5.4
>> > configured
>>
>> > WSGIApplicationGroup %{GLOBAL}
>> > WSGIDaemonProcess wsgi processes=1 threads=1 display-name=%{GROUP}
>>
>> You don't need 'processes=1' as will default to single process and
>> using 'processes=1' instead of allow it to default has subtle side
>> affect of setting 'wsgi.multiprocess' to True. You should only use
>> 'processes=1' if load balancing across many Apache instances where
>> each has only single process in daemon process group for that
>> application.
>
> I have 4 apaches running on the box with the same wsgi configuration..
> the box has 4 cores hence 4 apaches.. I have only 3 simple wsgi
> scripts running.. one of them is used for testing, another one is
> actively used in production and the third one is only hit by a back
> end script to refresh data in a singleton object, which is used by
> others for only read.. so I guess it is ok to keep processes=1?

Just because you have four cores doesn't mean you need to run multiple
Apache instances. Apache is already multiprocess, and within one
mod_wsgi daemon process group you can specify multiple processes as
well. So running multiple Apache instances on the same box with the
same configuration is not necessary to make the most of those cores.
Have a read of:

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html

But then, if you are running multiple Apache instances so that you can
restart each without interfering with the others, that is a different
issue. Even then, make sure you read:

http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode

For mod_wsgi-hosted Python applications at least, there are various
ways you can trigger reloading of the application without needing to
restart the whole of Apache.
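One mechanism described on that page is that, in daemon mode and with the default `WSGIScriptReloading On`, changing the modification time of the WSGI script file makes the daemon process reload the application on its next request. A minimal sketch of doing that from Python, for example from a deployment job; the script path is a placeholder, and the exact reload behaviour should be confirmed against that page for the mod_wsgi version in use.

```python
import os
import sys
import time


def touch(path):
    """Set the modification time of `path` to now and return the new mtime.

    In mod_wsgi daemon mode, a changed mtime on the WSGI script file is what
    prompts the daemon process to reload the application, without a restart
    of Apache itself.
    """
    now = time.time()
    os.utime(path, (now, now))
    return os.stat(path).st_mtime


if __name__ == "__main__":
    # Placeholder usage: python touch_wsgi.py /path/to/app.wsgi
    print(touch(sys.argv[1]))
```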

>> > WSGIProcessGroup wsgi
>>
>> > Any help and insight would be much appreciated..
>>
>> I can only suggest trying mod_wsgi 3.1.
>
> Just did.. monitoring to see if I get any errors..
>
>>
>> Other than that don't really have an answer. It looks like memory
>> corruption but whether the source is mod_wsgi, another Apache module
>> or a Python C extension module, don't know.
>>
>> What third party Python modules do you use which may have a C
>> extension module component?
>>
>> Anyway, will have a think about it some more and see if can come up
>> with any suggestions of things to look for or try. A snippet of log
>> file covering a longer amount of time may be a good point.
>>
>> Graham
>
> Another question I had was whether slow network connections might
> cause this issue.. what are your thoughts on that?

It wouldn't cause a process crash like the one you are seeing.

Graham

inaf

Jan 7, 2010, 6:41:36 PM
to modwsgi

On Jan 7, 4:35 pm, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> 2010/1/8 inaf <cem.ezbe...@gmail.com>:


Regarding the pids not matching, I found out that the segfault is for
the SiteMinder agent's pid.. but there is an interesting coincidence
where the mod_wsgi daemon throws these errors and shortly afterwards the
segfault comes for the SiteMinder agent. I also confirmed that this is
not always the case, so the mod_wsgi errors are not always followed by
a segfault error.

As far as the error logs are concerned, these are pretty much the only
entries, along with the SiteMinder lines. Please see below:


After upgrade:

(nothing before these lines for a while)
[Thu Jan 07 13:57:05 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 13:57:05 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi
[Thu Jan 07 13:57:06 2010] [notice] child pid 31237 exit signal
Segmentation fault (11)
(nothing after the lines above for a while)

..........


[07/Jan/2010:14:46:28] [Information] SiteMinder Agent
SiteMinder agent is enabled.
[07/Jan/2010:14:46:28] [Information] SiteMinder Agent
Configuration file path:
'/appl/apache1/conf/WebAgent.conf'.
[Thu Jan 07 14:46:28 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 14:46:28 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi

[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676200496 using key 0x6b000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSharedSegment::smalloc] Attached to shared memory segment
550469672 using key 0x6c000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 681279577 using key 0xc8000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 681279577 using key 0xc8000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676397111 using key 0x66000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676429880 using key 0x67000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSharedSegment::smalloc] Attached to shared memory segment
550633517 using key 0x65000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676462649 using key 0x68000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676495418 using key 0x69000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676266034 using key 0x32000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSharedSegment::smalloc] Attached to shared memory segment
550502441 using key 0x61000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676331573 using key 0x33000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSharedSegment::smalloc] Attached to shared memory segment
550567979 using key 0x62000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676364342 using key 0x34000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSharedSegment::smalloc] Attached to shared memory segment
550600748 using key 0x63000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSem::getSem] Attached to semaphore 676298804 using key 0x6a000dd5
[07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627]
[CSmSharedSegment::smalloc] Attached to shared memory segment
550535210 using key 0x69000dd5
[07/Jan/2010:14:46:29] [Information] SiteMinder Agent
SiteMinder agent is running.
[Thu Jan 07 14:49:06 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 14:49:06 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi

........

[Thu Jan 07 15:00:03 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 15:00:03 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi
[Thu Jan 07 15:00:08 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 15:00:08 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi
[Thu Jan 07 15:00:45 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 15:00:45 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi
[Thu Jan 07 15:00:50 2010] [alert] mod_wsgi (pid=21148): Request
origin could not be validated.
[Thu Jan 07 15:00:50 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi

.....

[Thu Jan 07 16:33:42 2010] [alert] mod_wsgi (pid=14480): Request
origin could not be validated.
[Thu Jan 07 16:33:42 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi


.........


[Thu Jan 07 17:15:03 2010] [alert] mod_wsgi (pid=27655): Request
origin could not be validated.
[Thu Jan 07 17:15:03 2010] [error] [client 3.49.42.185] Premature end
of script headers: <script_name>.wsgi

*** glibc detected *** malloc(): memory corruption: 0x08435ab8 ***

I know it looks like I am cherry-picking these lines, but there is
nothing above or below them in the logs other than the
SiteMinder-specific lines in the first snippet.

The log level is debug now, so we will see what can be captured..

>  http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modws...


>
> But then, if running multiple Apache instances so you can restart each
> without interfering with the others, then that is a different issue.
> Even then, ensure you read:
>
>  http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode
>
> Because for mod_wsgi hosted Python applications at least, there are
> various ways you can trigger reloading of application without needing
> to restart whole of Apache.
>
>

The reason for multiple Apache instances is to be able to isolate
issues.. it is a long story :)

I followed your advice on processes and changed the configuration to
the following:

WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess wsgi threads=1 display-name=%{GROUP}
WSGIProcessGroup wsgi

Did not seem to help (not sure if it was expected to help)..



Thanks,
-Cem

inaf

Jan 7, 2010, 6:50:37 PM
to modwsgi



After enabling debug, the following lines seem to stand out, but I am
not sure if they are related to this. It seems like Apache starts new
child processes and Python gets initialized by mod_wsgi right after...
there are many occurrences of this in the logs. Why would Python be
initialized and an interpreter be attached?

[07/Jan/2010:17:47:23] [Information] SiteMinder Agent
SiteMinder agent is running.
[Thu Jan 07 17:47:23 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:23 2010] [debug] mod_deflate.c(468): [client
3.49.42.185] Zlib: Compressed 0 to 2 : URL
[Thu Jan 07 17:47:23 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:23 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:23 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:23 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:23 2010] [debug] mod_deflate.c(468): [client
3.49.42.185] Zlib: Compressed 0 to 2 : URL
[Thu Jan 07 17:47:24 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:24 2010] [error] DL not available for SSO ID:
501798069
[Thu Jan 07 17:47:24 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[Thu Jan 07 17:47:24 2010] [debug] mod_headers.c(527): headers:
ap_headers_output_filter()
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
SiteMinder agent is enabled.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
Configuration file path:
'/appl/apache1/conf/WebAgent.conf'.
[Thu Jan 07 17:47:24 2010] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 8 children,
there are 11 idle, and 12 total children
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2711): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2712): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2713): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2714): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2715): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2717): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2718): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2711): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2712): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2714): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2716): Initializing
Python.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2718): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2715): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2717): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2713): Attach
interpreter ''.
[Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2716): Attach
interpreter ''.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
SiteMinder agent is enabled.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
Configuration file path:
'/appl/apache1/conf/WebAgent.conf'.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
SiteMinder agent is enabled.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
Configuration file path:
'/appl/apache1/conf/WebAgent.conf'.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
SiteMinder agent is enabled.
[07/Jan/2010:17:47:24] [Information] SiteMinder Agent
Configuration file path:
'/appl/apache1/conf/WebAgent.conf'.

Thanks,
-Cem

Graham Dumpleton

Jan 10, 2010, 8:01:33 PM
to mod...@googlegroups.com
2010/1/8 inaf <cem.e...@gmail.com>:

> Regarding the pids not matching, I found out that the seg fault is for
> siteminder agent pid.. but there is an interesting coincidence where
> mod_wsgi daemon throws these errors and shortly after the seg fault
> comes for siteminder agent..  I also confirmed that it is not always
> the case.. so mod_wsgi errors are not always followed by a seg fault
> error..

For this SiteMinder agent, you have a mod_siteminder module loaded
into Apache, correct?

This still stinks of memory corruption. Specifically, some Apache
module is keeping a pointer to memory contained in an Apache memory
pool after the memory pool is released. That memory area is then being
given out to mod_wsgi as per-request scratch space, and the other
Apache module is then scribbling in the memory.

What doesn't make sense is why it is always replacing it with
'<script_name>'. Are you doing that to protect what the actual paths
are or is that really what the error log files say?

If it is always that, then perhaps it is not memory corruption but
rather some Apache module deliberately updating r->filename in the
Apache request structure and replacing it with a new string
encompassing that string.

Can you clarify whether you are modifying the logs or whether that is
the actual value.

Graham


Graham Dumpleton

Jan 10, 2010, 8:06:18 PM
to mod...@googlegroups.com
2010/1/8 inaf <cem.e...@gmail.com>:

> After enabling debug, following line seem to stand out but not sure if
> they are related to this.. it seems like Apache stars new threads and
> Python gets initizalized by mod_wsgi right after... there are many
> occurences of this in the logs.. why would Python be initialized and
> interpreter would be attached?:
>
>
> [Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2718): Initializing
> Python.
> [Thu Jan 07 17:47:24 2010] [info] mod_wsgi (pid=2711): Attach
> interpreter ''.

Don't worry about that. It is a mod_wsgi message for my debugging
purposes, simply saying that the main Python interpreter which was
created has been registered in a table of interpreters for later
lookup.

That is, it attached to the already existing interpreter, as opposed to
messages you may see later saying that a new sub interpreter had to be
created instead because one by the required name didn't exist.
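To confirm which interpreter a request actually ran in, mod_wsgi also exposes the application group in the WSGI environ as `mod_wsgi.application_group`. A small, hypothetical helper (the name `interpreter_info` is illustrative) that a diagnostic script could call; with `WSGIApplicationGroup %{GLOBAL}` the group should be the empty string, i.e. the same main interpreter named `''` in the "Attach interpreter" messages.

```python
import os


def interpreter_info(environ):
    """Summarise which process and mod_wsgi interpreter handled this request."""
    # mod_wsgi puts the application group name in the environ; with
    # 'WSGIApplicationGroup %{GLOBAL}' it is '', i.e. the main interpreter.
    # Outside mod_wsgi the key is absent, so main_interpreter comes out False.
    group = environ.get("mod_wsgi.application_group")
    return {
        "pid": os.getpid(),
        "application_group": group,
        "main_interpreter": group == "",
    }
```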

Graham

inaf

Jan 11, 2010, 12:01:52 AM
to modwsgi

On Jan 10, 8:01 pm, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> 2010/1/8 inaf <cem.ezbe...@gmail.com>:
>


> > Regarding the pids not matching, I found out that the seg fault is for
> > siteminder agent pid.. but there is an interesting coincidence where
> > mod_wsgi daemon throws these errors and shortly after the seg fault
> > comes for siteminder agent..  I also confirmed that it is not always
> > the case.. so mod_wsgi errors are not always followed by a seg fault
> > error..
>
> For this site minder, you have a mod_siteminder module loaded into
> Apache. Correct?

I believe so.. this is the line in the config:

LoadModule sm_module "/usr/netegrity/siteminder6qmr5/webagent/bin/libmod_sm20.so"

>
> This still stinks of memory corruption. Specifically, some Apache
> module is keeping a pointer to memory contained in an Apache memory
> pool after the memory pool is released. That memory area is then being
> given out mod_wsgi as per request scratch space. The other Apache
> module is then scribbling in the memory.
>
> What doesn't make sense is why it is always replacing it with
> '<script_name>'. Are you doing that to protect what the actual paths
> are or is that really what the error log files say?
>
> If it is always that, then perhaps not memory corruption but some
> Apache module deliberately updating r->filename in Apache request
> structure and replacing it with a new string encompassing that string.
>
> Can you clarify whether you are modifying the logs or whether that is
> the actual value.

I am replacing the script name when I paste the lines to protect the
actual paths.

As I mentioned in one of the previous posts, this script accesses a
singleton object (well, not quite a singleton object, but close) and
reads from a class's variables (I'm not sure if synchronization is
required, but I even tried that with a lock.. still no luck). The
problem is that this script is SSI'ed on a page 3 times with different
parameters, so it does not entirely break the page, but in these 3
places where it displays content I think we are getting SSI error
messages, although I have not experienced it myself (I took out the
"unable to include .." SSI errors from the log lines I pasted earlier
so as not to confuse you, as I don't think they have anything to do
with it). We are putting a script in place to hit the page every so
often and scrape the content, hoping that in one of those runs we will
get this error and be able to see what a user would see. The reason
for doing this is that I was able to reproduce the error with another
WSGI script that refreshes these class variables every once in a while
via a job. I continuously hit this URL in my browser and got the error
in the logs, but the response was not always a 500; in some cases it
got the result back successfully. So I want to see whether the users
really see a problem, or whether Apache somehow tries to include the
script again and is able to do so.
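For read-mostly shared data like this, one common pattern is to build the refreshed data completely and then swap a single reference, so a reader sees either the old state or the new one, never a mix of partially updated class variables. A minimal sketch; `SharedData` is a hypothetical stand-in for the object described. Pure Python code like this cannot by itself cause a segmentation fault, so it addresses data consistency, not the crash. Also, the data is only shared within one daemon process: with several processes, a refresh request updates only whichever process served it.

```python
import threading
from types import MappingProxyType


class SharedData:
    """Hypothetical stand-in for the 'almost singleton' holding shared data."""

    _lock = threading.Lock()
    # Read-only view; replaced wholesale on refresh, never mutated in place.
    _snapshot = MappingProxyType({})

    @classmethod
    def read(cls, key, default=None):
        # Grab one snapshot reference; the lookup uses that same snapshot.
        snapshot = cls._snapshot
        return snapshot.get(key, default)

    @classmethod
    def refresh(cls, new_items):
        # Build the complete new state first, outside the lock.
        fresh = MappingProxyType(dict(new_items))
        # The lock only serialises concurrent refreshes; readers never block.
        with cls._lock:
            cls._snapshot = fresh
```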

The other approach I have in mind is to see what is going through the
UNIX socket between the mod_wsgi daemon and Apache. I am not quite sure
if that is possible, but I am still investigating.

And another approach is to disable the SiteMinder agent and test it in
a pre-production environment with a load generator to see whether we
get these errors or not. This would (hopefully) help rule out
SiteMinder.
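For both the page-scraping check and a pre-production load test, the thing to look for in the returned HTML is the text mod_include substitutes when an include fails. By default that is `[an error occurred while processing this directive]` (it can be changed with `<!--#config errmsg="..." -->`). A minimal standard-library sketch; the URL, attempt count and delay are placeholders, and an HTTP error status for the page itself is counted as a failure too.

```python
import time
import urllib.error
import urllib.request

# Apache mod_include's default replacement text for a failed directive.
SSI_ERROR = "[an error occurred while processing this directive]"


def count_ssi_errors(html):
    """Number of failed SSI directives visible in a rendered page."""
    return html.count(SSI_ERROR)


def probe(url, attempts=100, delay=1.0, timeout=10):
    """Fetch `url` repeatedly; return how many responses showed a failure."""
    failures = 0
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                html = resp.read().decode("utf-8", "replace")
            if count_ssi_errors(html):
                failures += 1
        except urllib.error.HTTPError:
            failures += 1  # e.g. a 500 for the page itself
        time.sleep(delay)
    return failures


if __name__ == "__main__":
    # Placeholder URL for the page that includes the WSGI script three times.
    print(probe("http://localhost/page-with-includes.shtml", attempts=10))
```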


Nonetheless, it is quite a strange problem, and having these errors in
the logs is not good since it means we are not defect free. I really
do not want to throw away what I have implemented, as it takes
advantage of the unique strengths of Python and is unbelievably fast
and not resource intensive at all, thanks to mod_wsgi. I really
appreciate your work and have so much respect for what you are doing.
Thanks once again for making my idea a reality.. well, almost there :)

Regards,
-Cem


Graham Dumpleton

Jan 11, 2010, 5:56:08 PM
to mod...@googlegroups.com
2010/1/11 inaf <cem.e...@gmail.com>:

> As I mentioned in one of the previous post, this script accesses a
> singleton object (well not quite a singleton object but close) where
> it reads from a class' variables (not sure if synchronization is
> required but I even tried that with lock.. still no luck). The problem
> is that this script is SSI'ed on a page 3 times with different
> parameters so it does not entirely break the page but these 3 places
> where it displays content, I think we are getting SSI error messages
> however I have not experienced it myself (I took out the "unable to
> include .." SSI errors from the log lines I pasted earlier not to
> confuse you as I don't think it has anything to do with it). We are
> putting a script in place to hit the page every so often and scrape
> the content hoping that in one of those runs, we will get this error
> and able to see what a user would see. The reason for doing this is
> that I was able to reproduce the error with another WSGI script that
> refreshes these class variables every once in a while via a job. I
> continously hit this URL in my browser and got the error in the logs
> but response was not always 500. It was able to get the result back
> successfully in some cases. So I want to see if the users really see a
> problem or somehow Apache tries to SSI the script again and is able to
> do so.

Are you saying you are using Apache server side includes in static
files to virtual include a URL which corresponds to the WSGI script
file containing this code? Or when you say SSI are you meaning it in
the context of some specific Python web templating system?

Is any of this stuff using a custom Python C extension module?

> The other approach I have in mind is to see what is going through the
> unix socket between mod_wsgi daemon and apache. I am not quite sure if
> it is possible but still investigating.

Only really practical by modifying mod_wsgi source code and having it
dump out to log files. I can give instructions on what code to add
later if it comes to that.

> Nonetheless, it is quite a strange problem, and having these errors in
> the logs is not good since it means we are not defect free. I really
> do not want to throw away what I have implemented, as it takes
> advantage of the unique strengths of Python and is unbelievably fast and
> not resource intensive at all thanks to mod_wsgi. I really appreciate
> your work and have so much respect for what you are doing. Thanks once
> again for making my idea a reality.. well, almost there :)

Can you possibly use mod_wsgi 3.1 instead of mod_wsgi 2.6?

The code where those error messages are generated was changed
around in mod_wsgi 3.0. I don't believe the change addressed any specific
problem, but it would be good to see whether the issue resolves itself by
doing that.

Graham

inaf

Jan 11, 2010, 6:50:16 PM
to modwsgi

On Jan 11, 5:56 pm, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> Are you saying you are using Apache server side includes in static
> files to virtual include a URL which corresponds to the WSGI script
> file containing this code? Or when you say SSI are you meaning it in
> the context of some specific Python web templating system?
>

Yes, though it's a bit more complicated than that.. the request goes through
this path: Lighttpd -> Apache --via mod_jk--> Tomcat, and comes back
the same way. The content produced by Tomcat gets intercepted by
Apache (via Location directives) so that it executes the virtual include
lines, which reference a relative path corresponding to the WSGI script. At
that point mod_wsgi takes over and executes my WSGI script, and the response
gets injected into the page before being returned to Lighttpd (which does
nothing to it) and finally to the user.
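For the scraping idea mentioned earlier in the thread, one minimal check (written for Python 3, not the Python 2.5 in this setup) is to look for the text mod_include puts in place of a failed include. The marker below is Apache's default `SSIErrorMsg`; if the server configures a custom message, use that instead. The URL passed to `check_page` is hypothetical.

```python
import urllib.error
import urllib.request

# Apache mod_include's default SSIErrorMsg text; change it if
# SSIErrorMsg is customised in the server configuration.
SSI_ERROR_MARKER = "[an error occurred while processing this directive]"


def count_ssi_errors(html):
    """Return how many failed includes left the error text in the page."""
    return html.count(SSI_ERROR_MARKER)


def check_page(url, timeout=10):
    """Fetch the assembled page and return (status, failed_include_count)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code and body.
        status, raw = err.code, err.read()
    return status, count_ssi_errors(raw.decode("utf-8", errors="replace"))
```

Calling `check_page` from a loop with `time.sleep` between runs, and logging any non-zero count with a timestamp, should make it easier to line up what users saw with the mod_wsgi entries in the error log. A failed include usually still leaves the outer page with a 200 status, which is why the body is checked rather than only the status code.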

> Is any of this stuff using a custom Python C extension module?

None, as far as I know. The only third-party module in this environment is
python-memcached (http://www.tummy.com/Community/software/python-memcached/),
which I think is pure Python; I don't need anything else. I do, however,
use cPickle, but I'm not sure whether that would cause an error like this. I
did read your explanation of that issue, but I am not using cPickle in the
WSGI script itself. It is used in a Python file (methods and classes I coded)
that I import as a module at the top of my WSGI script, so I'm not sure if
the same restrictions apply. I only get some values from memcached and
unpickle them (after decompressing them with the zlib module).
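Since Graham asked about C extensions: `cPickle` and `zlib` are both C extension modules, although they ship with the standard library rather than being custom ones. Below is a minimal sketch of the read path described here. It is written for Python 3, where `pickle` uses its C accelerator automatically, and it assumes the writer stores `zlib.compress(pickle.dumps(...))`. `encode_value` and `decode_value` are hypothetical helper names, and the python-memcached call that would supply `blob` is left out.

```python
import pickle
import zlib


def encode_value(obj):
    """Pickle then compress (assumed to mirror what the writer side does)."""
    return zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def decode_value(blob):
    """Reverse of encode_value: decompress the cached payload, then unpickle it."""
    if blob is None:
        # A cache miss from the memcached client comes back as None.
        return None
    return pickle.loads(zlib.decompress(blob))
```

In the WSGI module this would be used as `decode_value(mc.get(key))`, where `mc` is a python-memcached `Client`.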

>
> > The other approach I have in mind is to see what is going through the
> > unix socket between mod_wsgi daemon and apache. I am not quite sure if
> > it is possible but still investigating.
>
> Only really practical by modifying mod_wsgi source code and having it
> dump out to log files. I can give instructions on what code to add
> later if it comes to that.

We might need to try that but I think there are a few things I want to
try before that.


> Can you possibly use mod_wsgi 3.1 instead of mod_wsgi 2.6?
>
> The code where those error messages are generated was changed
> around in mod_wsgi 3.0. I don't believe the change addressed any specific
> problem, but it would be good to see whether the issue resolves itself by
> doing that.
>

I already did that a few days back on one node, and the errors I pasted
(if I am not mistaken) were from that node.. so the problem is still
there even after the upgrade.

I tried disabling the SiteMinder module and could not reproduce this error
in pre-prod.. so it might be the memory allocation problem you
described earlier. I've logged a case with SiteMinder to see if they can
provide any insight.

-Cem

