PHP 8.1.7

Paul Smedley

Jun 10, 2022, 4:43:04 AM
to eCS ISP Mailing List, Apache HTTP Server for OS/2

Hi All,

There were some security fixes released today for PHP.

A build of 8.1.7 is available from
https://smedley.id.au/tmp/php-8.1.7-os2-debug-20220610.zip

Details of the changes are at https://www.php.net/ChangeLog-8.php#8.1.7

Source code is at https://github.com/psmedley/php-os2/tree/php-8.1

Cheers,

Paul

David McKenna

Jun 11, 2022, 1:10:55 PM
to Apache for OS/2
Thanks Paul, 8.1.7 is working well here so far...

Regards,

Steven Levine

Jun 11, 2022, 3:51:41 PM
to apa...@googlegroups.com
In <a465d488-0e4d-4b9b...@googlegroups.com>, on 06/11/22
at 10:10 AM, David McKenna <davidmc...@gmail.com> said:

Hi,

>Thanks Paul, 8.1.7 is working well here so far...

FWIW, it's installed and working here too, but only lightly tested.

David, if you could figure out how to stress your setup sufficiently to
trigger some of the issues Massimo is reporting, it might be helpful. The
important part of his httpd.conf is:

<IfModule mpm_mpmt_os2_module>
StartServers 5
MinSpareThreads 28
MaxSpareThreads 50
MaxRequestsPerChild 115
MaxThreads 59
</IfModule>

IMO, these settings, especially the MaxRequestsPerChild setting, do not
make much sense. One result is that httpd children are stopping and
starting every few seconds for no good reason. Process startup and
termination is expensive. The failures seem to continue to occur.
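
As a rough illustration of why a MaxRequestsPerChild of 115 leads to constant restarts, the sketch below estimates how long a child lives before it hits its limit. The request rates are made-up numbers, not measurements from Massimo's server, and it assumes one request per connection and a steady rate per child.

```python
def child_lifetime_seconds(max_requests_per_child, requests_per_second):
    """Approximate seconds before one httpd child reaches its request limit."""
    if max_requests_per_child == 0:
        return float("inf")  # 0 means the child is never recycled
    return max_requests_per_child / requests_per_second

# Hypothetical per-child load levels, for illustration only.
for rate in (10, 30, 60):
    lifetime = child_lifetime_seconds(115, rate)
    print(f"{rate} req/s -> child recycled after about {lifetime:.1f}s")
```

Under those assumptions a busy child is replaced every few seconds, which matches the stop/start churn described above.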

What I think is happening is that one of the php threads is intermittently
using up all the available address space. The php code base pretty much
assumes it's running on a 64-bit system and omits a large number of
allocation error checks, assuming they can never occur.

As I find the ones that cause us to trap, I avoid the trap and try to die
gracefully.

There's also an intermittent issue where libcx dies while holding the
global lock. We know the PID/TID from the logs, but so far have not been
able to capture data about what the thread was doing when this occurs.
For some reason, there are no popuplog or exceptq reports for this
PID/TID.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <ste...@earthlink.net> Warp/DIY/BlueLion etc.
www.scoug.com www.arcanoae.com www.warpcave.com
----------------------------------------------------------------------

David McKenna

Jun 12, 2022, 9:31:43 AM
to Apache for OS/2
Hi Steven,

  I'll use those settings and try to 'siege' the server and see what happens. One thing I have noticed is that Massimo uses VIRTUALADDRESSLIMIT=1536, but I always use 3072 for that. I'll try lowering that too.

Regards,

David McKenna

Jun 12, 2022, 9:40:57 AM
to Apache for OS/2
Steven,

  FWIW - here is what I have been using in httpd-mpm.conf:

<IfModule mpm_mpmt_os2_module>
    ThreadStackSize 65536
    StartServers 2
    MinSpareThreads  5
    MaxSpareThreads 10
    MaxConnectionsPerChild 1000
</IfModule>

 I didn't know that MaxRequestsPerChild was even a thing that could be used...is there a list of valid directives for this somewhere?

Steven Levine

Jun 12, 2022, 12:44:08 PM
to apa...@googlegroups.com
In <5b81104e-a9a7-4bf2...@googlegroups.com>, on 06/12/22
at 06:40 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> FWIW - here is what I have been using in httpd-mpm.conf:

><IfModule mpm_mpmt_os2_module>
> ThreadStackSize 65536
> StartServers 2
> MinSpareThreads 5
> MaxSpareThreads 10
> MaxConnectionsPerChild 1000
></IfModule>

These are reasonable. For a production server, I would use Peter
Moylan's formula and data derived from the logs.

Currently, I'm using

<IfModule mpm_mpmt_os2_module>
ThreadStackSize 65536
StartServers 2
MinSpareThreads 5
MaxSpareThreads 10
MaxRequestsPerChild 0
</IfModule>

on the SCOUG server. These agree pretty well with some estimates from the
logs and Peter's formula.

Barney is using the same settings, except for the more normal

MaxRequestsPerChild 1000

The SCOUG server is using

MaxRequestsPerChild 0

because I am currently doing some uptime studies. With this setting, the
httpd child will eventually run out of memory and report

[Fri Mar 25 12:21:57.470000 2022] [cgi:error] [pid 100:tid 6] (8)Invalid
executable file format: [client 194.165.16.27:37016] couldn't create child
process: 8: docsearch.cmd

or something similar. The deadman daemon detects this and kills the
process and reports something like:

2022-06-09 09:16:11 httpd process with PID 51304 (c868) could not create
child process - will try to kill (#731).
2022-06-09 09:16:11 DosKillProcess successfully killed process 51304
(c868) (#740).

The uptime as of this morning is:

There are 57 Processes with 272 Threads.
This machine's uptime is 10d 3h 24m 10s 195ms.

The true uptime was longer because cron does a once a month reboot just
because.

The SCOUG server does a lot of SSI, but very little php, so it does not
have any of the failures Massimo sees.

> I didn't know that MaxRequestsPerChild was even a thing that could be
>used...

This changed to MaxConnectionsPerChild in 2.4, although the old name is
still accepted. See
http://httpd.apache.org/docs/2.4/upgrading.html

The 2.2 access control directives (Order, Allow, Deny) are supported in 2.4
if you load access_compat_module, but of course, you only want to load this
module until you get httpd.conf fully updated to 2.4 standards.

>is there a list of valid directives for this somewhere?

https://httpd.apache.org/docs/2.4/mod/directives.html

is complete except for two directives we recently added (MaxThreads and
BeginLibPath). The plan is to get readme.os2 into the repo so that it can
list the modifications that have not made it into the httpd docs.
Currently we have no process in place for getting the httpd docs updated
or getting our patches into the upstream sources. Maybe someday. The
apache httpd devs do not seem inclined to remove the OS/2 code and it
would be nice to be able to build out of the box from unpatched sources.

The httpd command line provides a number of useful options for listing the
capabilities of a given build:

httpd -d.. -v    show version number
httpd -d.. -V    show compile settings
httpd -d.. -h    list available command line options (this page)
httpd -d.. -l    list compiled in modules
httpd -d.. -L    list available configuration directives

The -d.. means these are intended to be run from the bin subdirectory.
The -L output includes the new directives:

MaxThreads (mpmt_os2.c)
Maximum number children
Allowed in *.conf only outside <Directory>, <Files>, <Location>,
or <If>

BeginLibPath (mod_so.c)
path list to apply to OS/2 BEGINLIBPATH
Allowed in *.conf only outside <Directory>, <Files>, <Location>,
or <If>

As you can see the directives are organized by source file.

Steven Levine

Jun 12, 2022, 1:16:09 PM
to apa...@googlegroups.com
In <2e33f6fa-e11e-4b5d...@googlegroups.com>, on 06/12/22
at 06:31 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> I'll use those settings and try to 'siege' the server and see what
>happens. One thing I have noticed is that Massimo uses
>VIRTUALADDRESSLIMIT=1536, but I always use 3072 for that. I'll try
>lowering that too.

Thanks,

The VAL setting is a tradeoff. Most systems work fine with 3072.
However, some systems will run out of address space in the system arena
with VAL set to 3072. What VAL does is change the dividing line between
the user arenas and the system arena.

Before the days of above512 (aka upper or high) memory, the system arena
extended from 512MB to 4GB, so it was almost impossible to run out of
address space in the system arena before running out of address space in
the user arenas.

With VAL set to 3072, the kernel, the drivers, the page tables, the kernel
heaps and all the other kernel control tables need to fit in the address
space between 3072MB and 4GB.
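
To put numbers on this tradeoff, here is a minimal sketch of the arithmetic. It assumes only what is described above: a 4GB (4096MB) linear address space, with the system arena occupying everything from VAL up to 4GB. The function name is made up for illustration.

```python
ADDRESS_SPACE_MB = 4096  # 32-bit linear address space, 4GB

def system_arena_mb(virtual_address_limit_mb):
    """Address space left for the system arena for a given VIRTUALADDRESSLIMIT."""
    return ADDRESS_SPACE_MB - virtual_address_limit_mb

for val in (512, 1536, 3072):
    print(f"VAL={val}: system arena has {system_arena_mb(val)}MB of address space")
```

With VAL=3072 the system arena gets 1024MB; with VAL=1536 it gets 2560MB, at the cost of less address space for the user arenas.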

To see what's in the system arena, use Theseus's System->Kernel
information->System arena.

David McKenna

Jun 12, 2022, 2:56:36 PM
to Apache for OS/2
Hi Steven,

  Thanks for the explanations! I'll have to play around with the httpd switches a little bit to get more knowledgeable about the directives.

  I have been running with Massimo's settings and trying to siege the server for a couple hours now, but so far all I have gotten is TrapE's in siege (when sieging on a single server - www.davemckenna.com. If I use the local IP address instead - 192.168.21.2 - then no trap), and traps in AFINETK on the server (which I already described and reported to ArcaNoae). No POPUPLOGS or exceptq logs yet. Same behaviour whether using 1536 or 3072 for VIRTUALADDRESSLIMIT.

Regards,

Steven Levine

Jun 12, 2022, 4:01:22 PM
to apa...@googlegroups.com
In <30482411-9b5e-4e50...@googlegroups.com>, on 06/12/22
at 11:56 AM, David McKenna <davidmc...@gmail.com> said:

Hi,

> Thanks for the explanations!

You're welcome. I don't expect folks to automatically understand the
differences between RAM and address space, but when doing this kind of
debugging the differences matter.

>I'll have to play around with the httpd
>switches a little bit to get more knowledgable about the directives.

I find that in most cases the defaults are good. They are probably based
on lots of user feedback over the years.

> I have been running with Massimo's settings and trying to siege the
>server for a couple hours now, but so far all I have gotten is TrapE's in
> siege (when sieging on a single server - www.davemckenna.com.

Please post an exceptq report for the siege trap E, if you have one, or a
popuplog entry. I may see something useful. I'm guessing it's an out of
memory problem, but we shall see. Our siege build is pretty old, so this
might give us some incentive to do a rebuild with debug information and
patch to avoid the traps.

>If I use
>the local IP address instead - 192.168.21.2 - then no trap), and traps
>in AFINETK on the server (which I already described and reported to
>ArcaNoae).

You might want to post a query to the ticket. There have been some NIC
driver updates that might have an effect on the AFINETK traps.

>No POPUPLOGS or exceptq logs yet. Same behaviour whether
>using 1536 or 3072 for VIRTUALADDRESSLIMIT.

That's good. Based on what I see, it seems there needs to be a couple of
memory hungry php scripts running before the problems show up.

I'm trying to convince Lewis to carve out some time to upgrade
www.arcanoae.com which is lots of wordpress with lots of plugins. When he
gets a testbed set up, this should help us track down and resolve more of
the edge cases.

You might notice that Massimo does not set ThreadStackSize, so each thread gets
a 128KB stack which must live in the lower user private arena. Each php
script starts with a 2MB php heap, which can live in the upper user
private arena.
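
As a back-of-the-envelope sketch of what this means for Massimo's MaxThreads 59, the snippet below multiplies out the per-thread figures mentioned above (128KB default stack, 64KB with ThreadStackSize 65536, 2MB initial php heap). It assumes the worst case where every thread is running a php script at once, and the function name is purely illustrative.

```python
KB = 1024
MB = 1024 * KB

def per_process_footprint(threads, stack_bytes, php_heap_bytes=2 * MB):
    """Return (total stack bytes, total initial php heap bytes) for one httpd child."""
    return threads * stack_bytes, threads * php_heap_bytes

for label, stack in (("default 128KB stack", 128 * KB), ("ThreadStackSize 65536", 65536)):
    stacks, heaps = per_process_footprint(59, stack)
    print(f"{label}: {stacks // KB}KB of stacks, {heaps // MB}MB of php heaps")
```

The stacks compete for the lower private arena, while the php heaps can go high, so the two totals press on different parts of the address space.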

Have fun,

David McKenna

Jun 12, 2022, 5:23:57 PM
to Apache for OS/2
Hi Steven,

  I do have the latest MultiMac drivers applied on both server and desktop. When siege traps, I just get the trap screen, but no POPUPLOG or exceptq file created. The weird thing is the trap screen will show either 'SOFFICE' or 'DOOBLE' as the offending process if either are running. If I make sure neither is running, then I see 'SIEGE' on the trap screen. I can take a pic if you think it's helpful. I could even get a dump if it's worth your while...

Regards,

Paul Smedley

Jun 12, 2022, 6:57:32 PM
to apa...@googlegroups.com
Hey Guys,

On 13/6/22 05:03, Steven Levine wrote:
> Please post an exceptq report for the siege trap E, if you have one, or a
> popuplog entry. I may see something useful. I've guessing it's an out of
> memory problem, but we shall see. Our siege build is pretty old, so this
> might give us some incentive to do a rebuild with debug information and
> patch to avoid the traps.

Consider me incentivised -
https://smedley.id.au/tmp/siege-4.1.1-os2-20220613.zip :) Should also
have debug symbols

Cheers,

Paul

Steven Levine

Jun 12, 2022, 7:18:54 PM
to apa...@googlegroups.com
In <ee58212c-cfb6-4233...@googlegroups.com>, on 06/12/22
at 02:23 PM, David McKenna <davidmc...@gmail.com> said:

Hi,



> When siege traps, I just get the trap screen, but no POPUPLOG
>or exceptq file created.

This can happen when the user (aka ring3) stack overflows. The exception
handlers run on the user stack. It can also happen if there are
insufficient resources to write to the popuplog file.

>The weird thing is the trap screen will show
>either 'SOFFICE' or 'DOOBLE" as the offending process if either are
>running.

This is a function of how the kernel locates the module name for some kinds
of traps. It's a cosmetic error.

FWIW, I see a similar defect in pmdf when analyzing system dumps. To
avoid the display error, I unload the symbols for the spurious process.

>trap screen. I can take a pic if you think it's helpful. I could even
>get a dump if it's worth your while...

I would like to see a picture of the trap screen. No need for a dump file
yet.

Thanks,

Steven Levine

Jun 12, 2022, 7:32:34 PM
to apa...@googlegroups.com
In <8d366491-08cf-1760...@smedley.id.au>, on 06/13/22
at 08:27 AM, Paul Smedley <pa...@smedley.id.au> said:

Hi Paul,

>Consider me incentivised -

:-)

>https://smedley.id.au/tmp/siege-4.1.1-os2-20220613.zip :) Should also
>have debug symbols

Thanks. Let's see how this one works for us. The maintainer is up to
4.1.3, but I don't see any changes that would affect us much.

David McKenna

Jun 12, 2022, 8:53:06 PM
to Apache for OS/2
Hi Paul,

  Thanks for the new build of siege. This one made my server blow up REAL good - it seems to be more intensive than the old one. I crashed the server in about 2 minutes using default settings of siege, and Massimo's httpd settings and VAL = 3072. I still need to avoid 'www.davemckenna.com' in favor of 192.168.21.2 to avoid traps on the siege machine.

 Attached are all the files created by the crash - the 2 TRP files were in the apache24 directory, the httpd traps were in the \var\log\app directory. Siege.txt is the result from the siege (didn't have a log file configured yet).

0070_01.TRP
httpd traps.zip
siege.txt
POPUPLOG.OS2
006D_01.TRP
apache error_log

David McKenna

Jun 12, 2022, 9:24:25 PM
to Apache for OS/2
Hi Steven,

 Attached is an image of the trap screen when trying to use a URL to run siege (specifically, I ran 'siege www.davemckenna.com' from a command line). It happens if I use URL's in the 'urls.txt' file too. Always need to use the IP address.

Regards,
siege trap.jpg

Paul Smedley

Jun 12, 2022, 9:32:00 PM
to apa...@googlegroups.com
Interesting.... using:
u:\siege\bin\siege https://smedley.id.au

is working for me...

On 13/6/22 10:54, David McKenna wrote:
> Hi Steven,
>
>  Attached is an image of the trap screen when trying to use a URL to
> run siege (specifically, I ran 'siege www.davemckenna.com' from a
> command line). It happens if I use URL's in the 'urls.txt' file too.
> Always need to use the IP address.
>
> Regards,

Steven Levine

Jun 12, 2022, 10:43:25 PM
to apa...@googlegroups.com
In <badc5845-06ba-4309...@googlegroups.com>, on 06/12/22
at 06:24 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> Attached is an image of the trap screen when trying to use a URL to run
>siege (specifically, I ran 'siege www.davemckenna.com' from a command
>line).

The trap screen explains why the module name varies. The cs:eip of 168:0
means some code in the kernel jumped to location 0, possibly because of a
bad pointer in a control structure.

I'll need a system dump to say more.

>It happens if I use URL's in the 'urls.txt' file too. Always need
>to use the IP address.

I need to review my notes, but I dimly recall I may have thought the
afinetk trap was related to DNS queries.

Steven Levine

Jun 12, 2022, 11:38:32 PM
to apa...@googlegroups.com
In <4cb390fd-9e1e-4330...@googlegroups.com>, on 06/12/22
at 05:53 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

Nice collection of trap and log files. :-)

It seems you may have replicated Massimo's failure or something very
similar. At some point a process dies while it owns the global lock and
bad things happen.

62a644b3-0078-HTTPD-libcx.log reports:

mutex handle: 800100b6
owner state: dead
owner PID: 0078 (120) <current>
owner TID: 48
request #: 1

which means TID 48 died while it held the lock. I was hoping one of the
exceptq reports would tell us what this thread was doing when it croaked,
but I don't think we got so lucky. I'm not finding it in your collection.
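
For anyone who wants to pull these fields out of several libcx logs at once, here is a small sketch that parses the excerpt quoted above. Only the five lines shown are known; the layout of the rest of a libcx log is not assumed, and the function name is hypothetical.

```python
import re

SAMPLE = """\
mutex handle: 800100b6
owner state: dead
owner PID: 0078 (120) <current>
owner TID: 48
request #: 1
"""

def parse_libcx_owner(text):
    """Extract the lock owner's state, PID and TID from a libcx log excerpt."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # The PID is printed as hex followed by decimal in parentheses.
    pid_match = re.match(r"([0-9a-fA-F]+)\s+\((\d+)\)", fields.get("owner PID", ""))
    return {
        "state": fields.get("owner state"),
        "pid_hex": pid_match.group(1) if pid_match else None,
        "pid": int(pid_match.group(2)) if pid_match else None,
        "tid": fields.get("owner TID"),  # kept as text; the excerpt does not state its base
    }

print(parse_libcx_owner(SAMPLE))
```

The hex form of the PID is kept alongside the decimal one because the trap and exceptq file names in this thread use the hex form.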

Most of the other reports are all cascading errors.

I do see a couple of reports that imply php may not be catching as many
out of memory cases as we would like. We know this is a work in progress.

Since you are running with 4 CPUs, if you have not already done so, I
recommend trying with MAXCPU=1 and see if that is sufficient to make the
AFINETK traps go away. My notes for ticket 3177 do not indicate that we
ever tried this.

When looking at the trap files, the SIGTRAPs are libcx's response to the
owner died semaphore issue. These should all have corresponding libcx log
files.

The SIGABRT reports are libcn's response to the same issue. Dmitriy
planned to align the libcx and libcn reporting, but Putin's war got in the
way of Dmitriy's life.

62a6449b-006d_01-HTTPD-exceptq.txt looks like stack corruption of some
sort.

62a6447a-006c_01-HTTPD-exceptq.txt looks like an out of memory issue that
needs to be better handled.

Try limiting MaxThreads to 20 and see if this eliminates the owner died
issue. TID 48 means that the PID had 48 threads running and that can
imply a lot of memory usage, especially for PHP apps.

You might also want to get familiar with Theseus's Linear Usage by
Process. It will give you a useful picture of how much memory the httpd
processes are using.

If you don't have a copy of
http://www.warpcave.com/os2diags/theseus-how-to.txt, I recommend you grab
one. It's a cookbook for the Theseus operations you will use most often.

Steven Levine

Jun 13, 2022, 12:31:43 AM
to apa...@googlegroups.com
In <de8aa032-fb2b-e09b...@smedley.id.au>, on 06/13/22
at 11:01 AM, Paul Smedley <pa...@smedley.id.au> said:

Hi,

>Interesting.... using:
>u:\siege\bin\siege https://smedley.id.au

This goes to prove the generalization that every OS/2 user's system is
different. :-) Recall I have the "slow" localhost on one of the systems I
use for testing. It appears to be something to do with my firefox config,
but wget and other operations are as fast as expected.

David is running a relatively fast setup. It's a Gigabyte Technology Co.,
Ltd. H110M-S2H with an Intel Core i5-7400 CPU @ 3.00GHz. I would expect
it to perform quite differently than your virtualized setup.

Steven

Steven Levine

Jun 13, 2022, 1:03:00 AM
to apa...@googlegroups.com
In <4cb390fd-9e1e-4330...@googlegroups.com>, on 06/12/22
at 05:53 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

Nice collection of trap and log files. :-)

I neglected to mention that your httpd error log will contain error
reports written by php to stderr. Some of them might be useful to review.
Some of the php errors are detected before the php runtime is fully
initialized, so stderr is the only possible destination for the reports.

I suspect you will find some "heap corrupt" reports.

It seems you may have replicated Massimo's failure or something very
similar. At some point a process dies while it owns the global lock and
bad things happen. 62a644b3-0078-HTTPD-libcx.log reports:

mutex handle: 800100b6
owner state: dead
owner PID: 0078 (120) <current>
owner TID: 48
request #: 1

which means TID 48 died while it held the lock. I was hoping one of the
exceptq reports would tell us what this thread was doing when it croaked,
but I don't think we got so lucky. I'm not finding this report in your
collection.

While testing you might want to modify your httpd common log config to
include the pid and tid associated with the connection. Use %{pid}P and
%{tid}P. See

https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#formats

for the gory details. We did this with Massimo's setup, but the problem
tid has not shown up yet in the logs. It's possible that the thread is
failing before httpd is ready to log anything about the request.
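
Here is a small sketch of how such a log could be searched once the pid and tid are being recorded. It assumes, purely for illustration, that `%{pid}P %{tid}P` were appended to the end of the LogFormat so they are the last two fields on each line. The sample lines are invented, and the real field positions depend on your own LogFormat.

```python
def requests_for_thread(log_lines, pid, tid):
    """Return the log lines whose trailing pid and tid fields match."""
    wanted = [str(pid), str(tid)]
    return [line for line in log_lines if line.split()[-2:] == wanted]

# Invented access log lines with pid and tid as the last two fields.
sample_log = [
    '192.168.21.5 - - [12/Jun/2022:20:40:01 -0400] "GET /index.html HTTP/1.1" 200 512 120 47',
    '192.168.21.5 - - [12/Jun/2022:20:40:02 -0400] "GET /Wordpress/index.php HTTP/1.1" 200 9000 120 48',
    '192.168.21.5 - - [12/Jun/2022:20:40:02 -0400] "GET /index.html HTTP/1.1" 200 512 121 48',
]

for line in requests_for_thread(sample_log, 120, 48):
    print(line)
```

If the problem thread never appears in the output, that is consistent with the thread failing before httpd gets as far as logging the request.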

We might need to take the heavy handed approach and use the set EXCEPTQ=Z
feature which is documented in exceptq-shl.txt. This will generate an
exceptq report for every process termination. All we need to find is the
report(s) for the process that the libcx logs indicate died while holding the
mutex. The rest of the normal termination reports can be discarded
because they will not tell us anything we need to know.
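
A sketch of that filtering step follows. It assumes the report names keep the pattern seen in this thread, `<hex prefix>-<PID in hex>_<TID>-<NAME>-exceptq.txt` (for example 62a6447a-006c_01-HTTPD-exceptq.txt). That pattern is inferred from the file names quoted here rather than taken from the exceptq documentation, and the function name is made up.

```python
import re

# Pattern inferred from names like 62a6447a-006c_01-HTTPD-exceptq.txt (an assumption).
REPORT_NAME = re.compile(
    r"^[0-9a-f]+-([0-9a-f]{4})_([0-9a-f]+)-.+-exceptq\.txt$", re.IGNORECASE
)

def reports_for_pid(file_names, pid):
    """Return the exceptq report names whose hex PID field equals pid."""
    matches = []
    for name in file_names:
        m = REPORT_NAME.match(name)
        if m and int(m.group(1), 16) == pid:
            matches.append(name)
    return matches

reports = [
    "62a6447a-006c_01-HTTPD-exceptq.txt",
    "62a6449b-006d_01-HTTPD-exceptq.txt",
]
print(reports_for_pid(reports, 0x78))  # PID 0078 from the libcx log
print(reports_for_pid(reports, 0x6C))
```

Run against the two reports named in this thread, nothing matches PID 0078, which agrees with the report for the dead lock owner not being in the collection.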

David McKenna

Jun 13, 2022, 5:25:27 PM
to Apache for OS/2
Hi Steven,

  A lot to digest... I'll update the logging in httpd and set MAXCPU=1 to try and get an AFINETK trap (or not). I do have Theseus, but not on the server - I'll install that. I'll also set EXCEPTQ=Z, unless you say no, and capture a dump (memlimited) of the siege trap. If/when I get another blow-up, I'll inquire about what files need to be uploaded.

Regards,

Steven Levine

Jun 13, 2022, 7:11:44 PM
to apa...@googlegroups.com
In <636e3654-ec71-47a9...@googlegroups.com>, on 06/13/22
at 02:25 PM, David McKenna <davidmc...@gmail.com> said:


Hi,

> A lot to digest...

:-)

>I'll update the logging in httpd and set MAXCPU=1 to
> try and get an AFINETK trap (or not). I do have Theseus, but not on the
>server - I'll install that. I'll also set EXCEPTQ=Z, unless you say no,
>and capture a dump (memlimited) of the siege trap. If/when I get another
> blow-up, I'll inquire about what files need to be uploaded.

We have two, mostly unrelated issues. While the AFINETK trap is
triggered by running siege, anything that generates lots of network
activity could trigger it.

These kinds of kernel traps are often caused by missing serialization
logic which only shows up in SMP setups. If the AFINETK system traps go
away when running with MAXCPU=1, it gives us a better idea where to look. I
recommend you do this test first. Once we have determined whether or not
running multiple cores is what allows the trap to occur, you can update
the ticket.

I don't think we need another system dump for this yet. The two I have
show the issue pretty clearly. Of course, as is typical for these kinds
of traps, understanding the sequence of events that allow the failure to
trigger is going to take more work.

The second issue is the user level traps and failures in httpd and
php. There are several flavors of these, but the owner died issue is
the one we probably want to solve first. As you have seen, this failure
results in numerous cascading errors. Once this issue is resolved, we
will be left with a small number of remaining issues to resolve.

David McKenna

Jun 14, 2022, 6:03:09 PM
to Apache for OS/2
Hi Steven,

  I set the server to use MAXCPU=1 and also memlimited to 1846 (in case of a dump). Also memlimited the desktop and ran siege using 'siege www.davemckenna.com' and the desktop trapped (as usual) and I was able to capture a dump on the desktop from that if you want to see it.

  Ran siege using 'siege -f c:/siege/etc/urls.txt' (which has http://192.168.21.2/index.html, https://192.168.21.2/phpMyAdmin/index.php, https://192.168.21.2/Wordpress/index.php, and http://192.168.21.2/filegator/dist/index.php in it). Apache crashed in about 4 minutes, almost exactly like it did the last time (which was full SMP), but no AFINETK trap yet. I'll attach the POPUPLOG and apache error log. If there are any exceptq files listed in the error log that might be interesting let me know.

  A new wrinkle is that on the desktop I have gotten a trap in SOCKETSK twice when running siege, but haven't caught a dump for that yet...

Regards,
POPUPLOG.OS2
apache error_log

Steven Levine

Jun 14, 2022, 10:07:52 PM
to apa...@googlegroups.com
In <ac677af4-d08d-437a...@googlegroups.com>, on 06/14/22
at 03:03 PM, David McKenna <davidmc...@gmail.com> said:

Hi,

> I set the server to use MAXCPU=1 and also memlimited to 1846 (in case
>of a dump). Also memlimited the desktop

What does memlimited the Desktop mean to you?

>and ran siege using 'siege
>www.davemckenna.com' and the desktop trapped (as usual) and I was able to
> capture a dump on the desktop from that if you want to see it.

Do you mean a process dump or a system dump?

> SMP), but no AFINETK trap yet.

This is some evidence that we have an SMP issue with your setup.

>I'll attach the POPUPLOG

The POPUPLOG

06-14-2022 16:34:06 SYS3175 PID 007c TID 0001 Slot 00e0
C:\PROGRAMS\APACHE24\BIN\HTTPD.EXE
c0000005
7d36dd80
P1=00000002 P2=0081ff1c P3=XXXXXXXX P4=XXXXXXXX
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:7d36dd80 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0081ff20 SSACC=d0f3 SSLIM=7fffffff
EBP=00000000 FLG=00010206

EXCEPTQ.DLL 0001:0000dd80

implies that EXCEPTQ is loading in upper memory again. Is this true? If
so, please mark exceptq to load in lower memory. If not, I guess I'll
have to figure out what the POPUPLOG entry is trying to tell us.

>and apache error
> log.

The apache error log contains pretty much what I expected. I notice that the
heap corruption reports are shown as:

zend_mm_heap corrupted

We need to cherry pick commit ccd4a01272e1c94804467e539a8224e72ee67a81.
If Paul is lurking a new 8.1.7 should be available in the fullness of
time.

We are still left with the problem of figuring out why and how:

owner PID: 00c9 (201)
owner TID: 10

died.

I took a quick look at the previous exceptq reports that were SIGSEGVs.
These would be:

62a6447a-006c_01-HTTPD-exceptq.txt
62a6449b-006d_01-HTTPD-exceptq.txt

The traps occurred because the php termination logic is not sufficiently
robust when there is insufficient memory for the php instance to fully
initialize. We know php is attempting to terminate because _zend_shutdown
is in the call path.

To avoid these traps, I may have to change the out of memory handling
logic in zend_mm_init. What happens is that when we are out of memory
for the php heap:

alloc_globals->mm_heap = zend_mm_init();

sets alloc_globals->mm_heap to NULL. This is fine, but none of the other
code checks the pointer. I attempted to add the missing NULL pointer
checking code with the idea that we could just use the normal thread
termination code. The NULL pointer checks I put in worked as expected,
but I never got to the point where all the required checks were in place.
I may make another attempt at this.

The other option is to terminate the thread in zend_mm_init(). This might
produce some small, interim leaks, but we are already out of memory and
this memory will be reclaimed when the parent process terminates when the
MaxConnectionsPerChild limit is reached.

> A new wrinkle is that on the desktop I have gotten a trap in SOCKETSK
>twice when running siege,

Was this when running MAXCPU=1? If so, it's just a variant of the
AFINETK trap. AFINETK, SOCKETSK and the NIC drivers are all tied at the hip
and share common data structures.

Have fun,

David McKenna

Jun 15, 2022, 6:21:27 AM
to Apache for OS/2
Hi Steven,

  I memlimited the desktop to capture system dumps for the siege trap and socketsk trap. I have a system dump for the siege trap. Not running the desktop (where I run siege) with MAXCPU=1, only the apache server.

  I checked exceptq on the server with 'highmem -u exceptq.dll' and it indicated that it was not modified, so don't think it was set to load high, but re-installed it anyway, just in case. Also looked in a couple of the exceptq trap files and under 'DLL's accessable from this process' EXCEPTQ is shown to be in lower memory (1e0b0000).

Regards,

Paul Smedley

Jun 15, 2022, 6:34:13 AM
to apa...@googlegroups.com

Hi Steven

On 15/6/22 09:51, Steven Levine wrote:
> The apache error contains pretty much what I expected. I notice that the
> heap corruption reports are shown as:
>
> zend_mm_heap corrupted
>
> We need to cherry pick commit ccd4a01272e1c94804467e539a8224e72ee67a81.
> If Paul is lurking a new 8.1.7 should be available in the fullness of
> time.

I hadn't cherry-picked the 'debugging' type commits given we were
working on those with 7.4.x with Max, but we can also work on
debugging with 8.1.7 :)

I've been flat out, but
https://smedley.id.au/tmp/php-8.1.7-os2-dll-debug-20220615.zip should be
available shortly.

I'd been playing around (unsuccessfully) with trying to move to libtool
rather than aplibtool.exe on the weekend, so needed to do a full
reconfigure/rebuild hence just the DLL for now, whilst the rest finishes
building.

Cheers,

Paul

Paul Smedley

Jun 15, 2022, 6:38:23 AM
to apa...@googlegroups.com

Steven Levine

Jun 15, 2022, 3:04:44 PM
to apa...@googlegroups.com
In <de3b8743-e05e-430e...@googlegroups.com>, on 06/15/22
at 03:21 AM, David McKenna <davidmc...@gmail.com> said:


Hi David,

> I memlimited the desktop to capture system dumps for the siege trap and
> socketsk trap. I have a system dump for the siege trap. Not running the
>desktop (where I run siege) with MAXCPU=1, only the apache server.

Oh, I think I understand now. You are really talking about booted
systems, not just the Desktop. One booted system is running the siege
client and the other is running the apache server. I can be a bit
literal, at times.

> I checked exceptq on the server with 'highmem -u exceptq.dll' and it
>indicated that it was not modified, so don't think it was set to load
>high, but re-installed it anyway, just in case. Also looked in a couple
>of the exceptq trap files and under 'DLLs accessible from this process'
>EXCEPTQ is shown to be in lower memory (1e0b0000).

I figured it was a red herring. Something is confused. We have the
popuplog:

06-14-2022 16:34:06 SYS3175 PID 007c TID 0001 Slot 00e0
C:\PROGRAMS\APACHE24\BIN\HTTPD.EXE
c0000005
7d36dd80
P1=00000002 P2=0081ff1c P3=XXXXXXXX P4=XXXXXXXX
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:7d36dd80 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0081ff20 SSACC=d0f3 SSLIM=7fffffff
EBP=00000000 FLG=00010206

EXCEPTQ.DLL 0001:0000dd80

and CS:EIP=005b:7d36dd80 is definitely in upper memory so this appears to
be another case of a misidentified module. The reason for this may be
that the DLL that was loaded at CS:EIP=005b:7d36dd80 was unloaded before the
kernel attempted to identify the module.
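
For anyone who wants to repeat the address check, here is a minimal sketch.
It assumes the boundary between the low and high arenas sits at 512 MiB
(0x20000000); that constant and the function name are illustrative
assumptions, not something taken from the popuplog or from exceptq.

```python
# Assumption: addresses below 512 MiB belong to the low arena and
# anything at or above it counts as "high" memory.
HIGH_MEM_BASE = 0x20000000


def arena(addr_hex: str) -> str:
    """Classify a hex address string as 'low' or 'high' memory."""
    addr = int(addr_hex, 16)
    return "high" if addr >= HIGH_MEM_BASE else "low"


# Trap address from the popuplog (CS:EIP=005b:7d36dd80)
print("7d36dd80 ->", arena("7d36dd80"))
# EXCEPTQ load address reported in the exceptq trap files
print("1e0b0000 ->", arena("1e0b0000"))
```

Under that assumption the trap address and the reported EXCEPTQ load
address land in different arenas, which is why the module name in the
popuplog looks suspect.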

David McKenna

Jun 15, 2022, 4:20:51 PM6/15/22
to Apache for OS/2
Hi Steven,

  Sorry to be confusing - you are right, I should not describe one of my machines as just 'the desktop' but add 'computer' to it.

  I do load the Apache modules (but not httpd.dll) high - I wonder if one of them is being referred to in the popuplog you show?

Regards,

Steven Levine

Jun 15, 2022, 10:05:43 PM6/15/22
to apa...@googlegroups.com
In <9ee53c1c-5a51-4d84...@googlegroups.com>, on 06/15/22
at 01:20 PM, David McKenna <davidmc...@gmail.com> said:


Hi,

> I do load the Apache modules (but not httpd.dll) high - I wonder if one
> of them is being referred to in the popuplog you show?

Not directly. I need to do a bit more analysis to confirm or deny what I
suspect. It's on my list, but not with a high priority.

My current plan is to implement patches to avoid the known traps.

In parallel we are going to keep looking for the thread that is dying
while holding the lock.

Steven Levine

Jun 24, 2022, 12:33:10 AM6/24/22
to apa...@googlegroups.com
at 01:20 PM, David McKenna <davidmc...@gmail.com> said:


Hi all,

Several days ago, I said:

In parallel we are going to keep looking for the thread that is dying
while holding the lock.

It seems that David's siege testing has produced positive results. We
find

62a6449c-0070_01-HTTPD-exceptq.txt

which reports

62a6449c-0070_01-HTTPD-exceptq.txt:176
007FFB48 1E45497B LIBCN0 0001:0009497B ifree.c#150
__um_free_maybe_lock + 7B 0001:00094900 (ifree.obj)

which is libc telling us that the assert at

ifree.c:65
assert (crumb->x.used.size <= crate->crumb_size);

failed. This assert is triggered while the thread is holding the global
lock.

Eventually, we get to

62a6449c-0070_01-HTTPD-exceptq.txt:140
007FFA58 1E4389B2 LIBCN0 0001:000789B2 kill.c#76 __std_kill + 22
0001:00078990 (kill.obj)

and the process dies while holding the global lock, which causes all the
other processes to eventually fail with 'owner died'.
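
For anyone reading these reports by hand, the call stack lines can be
sanity checked mechanically. The sketch below is only my reading of the
two frames quoted above: the field names (including treating the first
column as the frame pointer) and the regular expression are assumptions,
not an official description of the exceptq report format. It rejoins a
wrapped line and checks that symbol offset plus displacement equals the
frame's Obj:Offset.

```python
import re

# Hypothetical field layout, inferred from the frames quoted above.
FRAME_RE = re.compile(
    r"(?P<frame_ptr>[0-9A-F]{8})\s+(?P<addr>[0-9A-F]{8})\s+(?P<module>\S+)\s+"
    r"(?P<obj>[0-9A-F]{4}):(?P<offset>[0-9A-F]{8})\s+"
    r"(?P<file>\S+)#(?P<line>\d+)\s+"
    r"(?P<symbol>\S+)\s+\+\s+(?P<disp>[0-9A-F]+)\s+"
    r"(?P<sym_obj>[0-9A-F]{4}):(?P<sym_offset>[0-9A-F]{8})"
)


def parse_frame(text):
    """Parse one (possibly line-wrapped) call stack entry into a dict."""
    match = FRAME_RE.search(" ".join(text.split()))  # undo mail wrapping
    if match is None:
        return None
    frame = match.groupdict()
    # Symbol start + displacement should land on the frame's offset.
    frame["consistent"] = (
        int(frame["sym_offset"], 16) + int(frame["disp"], 16)
        == int(frame["offset"], 16)
    )
    return frame


FRAMES = [
    """007FFB48 1E45497B LIBCN0 0001:0009497B ifree.c#150
    __um_free_maybe_lock + 7B 0001:00094900 (ifree.obj)""",
    """007FFA58 1E4389B2 LIBCN0 0001:000789B2 kill.c#76 __std_kill + 22
    0001:00078990 (kill.obj)""",
]

for raw in FRAMES:
    f = parse_frame(raw)
    print(f["module"], f["file"], f["line"], f["symbol"], f["consistent"])
```

A frame where the consistency check fails would suggest the line was
mangled in transit or that the pattern does not match that report
version.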

As to why I never got to see an exceptq report like this from Massimo with
the many reports he has uploaded to his tickets, I can only guess. The
symptoms have always indicated that this had occurred.

It's not obvious which bit of php code trashed the heap, but the exceptq
report seems to imply that php failed during what is called sapi init.
This code may have vectored to the php shutdown hook before all the
pointers the shutdown hook might access were properly initialized.

Lewis G Rosenthal

Jun 24, 2022, 8:40:42 AM6/24/22
to apa...@googlegroups.com
Hi...
This is truly amazing and welcome news. This thing has eluded us for ages.
Bravo, and thanks Dave, for a good exceptq report.

--
Lewis
-------------------------------------------------------------
Lewis G Rosenthal, CNA, CLP, CLE, CWTS, EA
Rosenthal & Rosenthal, LLC www.2rosenthals.com
visit my IT blog www.2rosenthals.net/wordpress
-------------------------------------------------------------

Steven Levine

Jun 24, 2022, 12:32:33 PM6/24/22
to apa...@googlegroups.com
In <62B5B0C1...@2rosenthals.com>, on 06/24/22
at 08:40 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>This is truly amazing and welcome news. This thing has eluded us for
>ages. Bravo, and thanks Dave, for a good exceptq report.

Definitely. If I had to guess, it's the result of Dave's good testing
methodology. Rather than attempting to select what to test, he saved
away all the old data. That way he had a complete, fresh set of related
data to upload when the time came.

We actually know where in the heap the corruption occurred:

62a6449c-0070_01-HTTPD-exceptq.txt:168
007FFB08 1E454F8A LIBCN0 0001:00094F8A ifree.c#65
__um_crumb_free_maybe_lock + 2DE 0001:00094CAC (ifree.obj)

Offset  Name   Type                   Hex Value
     8  crate  pointer to type 0x238  20764000
    12  crumb  pointer to type 0x244  20764154
    16  lock   32 bit signed          1

Unfortunately, exceptq does not know I want to inspect the content of
20764000, so the memory block is not included in the report.

With a bit of luck, it should be relatively easy to insert a bit of
temporary heap check logic in zend_hash_destroy and force a process dump
when the corruption is detected.

There should be a fresh 8.1.7 build available once Paul has time to
cherry-pick the latest 7.4 patch and do an 8.1.7 build. This patch will
avoid recursion in the php error handler when it runs out of heap.

Paul Smedley

Jun 24, 2022, 5:26:48 PM6/24/22
to apa...@googlegroups.com
Hey All,

On 25/6/22 01:41, Steven Levine wrote:
> There should be a fresh 8.1.7 build available once Paul has time to
> cherry-pick the latest 7.4 patch and do an 8.1.7 build. This patch will
> avoid recursion in the php error handler when it runs out of heap.

https://smedley.id.au/tmp/php-8.1.7-os2-debug-20220625.zip

Cheers,

Paul

David McKenna

Jun 24, 2022, 7:19:45 PM6/24/22
to Apache for OS/2
Thanks Paul and Steven for your great detective work! Getting the logs would not have been possible without Paul's newest 'Siege' too.

 I installed the latest (20220625) version of php8.1, set httpd-mpm to my defaults, CPU=1, cleared the logs, and ran 'siege -f C:\siege\etc\urls.txt' (same entries as above). I let it run until I started seeing an alert about 'Can't determine chunk size', then stopped it (about 20 minutes). The apache server was still running, and there were no POPUPLOG or exceptq trap files anywhere; only the apache error log showed some messages - I'll attach it.

 I do have some odd problems with siege. Besides the URL issue, one is that it does not seem to honor the 'time' directive in siege.conf - it runs forever until it either errors out (due to the 'failures' directive), or I reboot because I cannot stop or kill it. If I hit <ctrl>C, a message says 'Lifting the siege...' but it never stops. Trying to kill it with 'Top' doesn't work either.

 Oh - one other thing - this new drop of php8.1 is missing the opcache module... maybe an oversight?

Regards,