PHP 8.1.7

Paul Smedley

Jun 10, 2022, 4:43:04 AM
to eCS ISP Mailing List, Apache HTTP Server for OS/2

Hi All,

There were some security fixes released today for PHP.

A build of 8.1.7 is available from
https://smedley.id.au/tmp/php-8.1.7-os2-debug-20220610.zip

Details of the changes are at https://www.php.net/ChangeLog-8.php#8.1.7

Source code is at https://github.com/psmedley/php-os2/tree/php-8.1

Cheers,

Paul

David McKenna

Jun 11, 2022, 1:10:55 PM
to Apache for OS/2
Thanks Paul, 8.1.7 is working well here so far...

Regards,

Steven Levine

Jun 11, 2022, 3:51:41 PM
to apa...@googlegroups.com
In <a465d488-0e4d-4b9b...@googlegroups.com>, on 06/11/22
at 10:10 AM, David McKenna <davidmc...@gmail.com> said:

Hi,

>Thanks Paul, 8.1.7 is working well here so far...

FWIW, it's installed and working here too, but only lightly tested.

David, if you could figure out how to stress your setup sufficiently to
trigger some of the issues Massimo is reporting, it might be helpful. The
important part of his httpd.conf is:

<IfModule mpm_mpmt_os2_module>
StartServers 5
MinSpareThreads 28
MaxSpareThreads 50
MaxRequestsPerChild 115
MaxThreads 59
</IfModule>

IMO, these settings, especially the MaxRequestsPerChild setting, do not
make much sense. One result is that httpd children are stopping and
starting every few seconds for no good reason. Process startup and
termination are expensive. The failures seem to continue to occur.
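As a rough illustration of why a low MaxRequestsPerChild causes churn, the sketch below estimates how long a child lives before it is recycled. The request rate of 25 per second is a made-up figure for illustration, not a measurement from Massimo's server, and the function name is hypothetical.

```python
def child_lifetime_seconds(max_requests_per_child, requests_per_second):
    """Approximate seconds a child serves before it is recycled.

    A limit of 0 means the child is never recycled, so the lifetime
    is unbounded.
    """
    if max_requests_per_child == 0:
        return float("inf")
    return max_requests_per_child / requests_per_second


# Hypothetical load: 25 requests per second handled by one child.
for limit in (115, 1000, 0):
    print(limit, child_lifetime_seconds(limit, 25.0))
```

Under that assumed load, a limit of 115 recycles the child every few seconds, while 1000 stretches the interval to tens of seconds.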

What I think is happening is that one of the php threads is intermittently
using up all the available address space. The php code base pretty much
assumes it's running on a 64-bit system and omits a large number of
allocation error checks, assuming the failures can never occur.

As I find the ones that cause us to trap, I avoid the trap and try to die
gracefully.

There's also an intermittent issue where libcx dies while holding the
global lock. We know the PID/TID from the logs, but so far have not been
able to capture data about what the thread was doing when this occurs.
For some reason, there are no popuplog or exceptq reports for this
PID/TID.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <ste...@earthlink.net> Warp/DIY/BlueLion etc.
www.scoug.com www.arcanoae.com www.warpcave.com
----------------------------------------------------------------------

David McKenna

Jun 12, 2022, 9:31:43 AM
to Apache for OS/2
Hi Steven,

  I'll use those settings and try to 'siege' the server and see what happens. One thing I have noticed is that Massimo uses VIRTUALADDRESSLIMIT=1536, but I always use 3072 for that. I'll try lowering that too.

Regards,

David McKenna

Jun 12, 2022, 9:40:57 AM
to Apache for OS/2
Steven,

  FWIW - here is what I have been using in httpd-mpm.conf:

<IfModule mpm_mpmt_os2_module>
    ThreadStackSize 65536
    StartServers 2
    MinSpareThreads  5
    MaxSpareThreads 10
    MaxConnectionsPerChild 1000
</IfModule>

 I didn't know that MaxRequestsPerChild was even a thing that could be used...is there a list of valid directives for this somewhere?

Steven Levine

Jun 12, 2022, 12:44:08 PM
to apa...@googlegroups.com
In <5b81104e-a9a7-4bf2...@googlegroups.com>, on 06/12/22
at 06:40 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> FWIW - here is what I have been using in httpd-mpm.conf:

><IfModule mpm_mpmt_os2_module>
> ThreadStackSize 65536
> StartServers 2
> MinSpareThreads 5
> MaxSpareThreads 10
> MaxConnectionsPerChild 1000
></IfModule>

These are reasonable. For a production server, I would use Peter
Moylan's formula and data derived from the logs.

Currently, I'm using

<IfModule mpm_mpmt_os2_module>
ThreadStackSize 65536
StartServers 2
MinSpareThreads 5
MaxSpareThreads 10
MaxRequestsPerChild 0
</IfModule>

on the SCOUG server. These agree pretty well with some estimates from the
logs and Peter's formula.

Barney is using the same settings, except for the more normal:

MaxRequestsPerChild 1000

The scoug server is using

MaxRequestsPerChild 0

because I am currently doing some uptime studies. With this setting, the
httpd child will eventually run out of memory and report

[Fri Mar 25 12:21:57.470000 2022] [cgi:error] [pid 100:tid 6] (8)Invalid
executable file format: [client 194.165.16.27:37016] couldn't create child
process: 8: docsearch.cmd

or something similar. The deadman daemon detects this and kills the
process and reports something like:

2022-06-09 09:16:11 httpd process with PID 51304 (c868) could not create
child process - will try to kill (#731).
2022-06-09 09:16:11 DosKillProcess successfully killed process 51304
(c868) (#740).

The uptime as of this morning is:

There are 57 Processes with 272 Threads.
This machine's uptime is 10d 3h 24m 10s 195ms.

The true uptime was longer because cron does a once a month reboot just
because.

The SCOUG server does a lot of SSI, but very little php, so it does not
have any of the failures Massimo sees.

> I didn't know that MaxRequestsPerChild was even a thing that could be
>used...

This was renamed to MaxConnectionsPerChild in 2.4; the old name is still
accepted. See
http://httpd.apache.org/docs/2.4/upgrading.html

The 2.2 access control directives are supported in 2.4 if you load
access_compat_module, but of course, you only want to load this module
until you get httpd.conf fully updated to 2.4 standards.

>is there a list of valid directives for this somewhere?

https://httpd.apache.org/docs/2.4/mod/directives.html

is complete except for two directives we recently added (MaxThreads and
BeginLibPath). The plan is to get readme.os2 into the repo so that it can
list the modifications that have not made it into the httpd docs.
Currently we have no process in place for getting the httpd docs updated
or getting our patches into the upstream sources. Maybe someday. The
apache httpd devs do not seem inclined to remove the OS/2 code and it
would be nice to be able to build out of the box from unpatched sources.

httpd command line provides a number of useful options for listing the
capabilities of a given build:

httpd -d.. -v    show version number
httpd -d.. -V    show compile settings
httpd -d.. -h    list available command line options (this page)
httpd -d.. -l    list compiled in modules
httpd -d.. -L    list available configuration directives

The -d.. means these are intended to be run from the bin subdirectory.
The -L output includes the new directives:

MaxThreads (mpmt_os2.c)
Maximum number children
Allowed in *.conf only outside <Directory>, <Files>, <Location>,
or <If>

BeginLibPath (mod_so.c)
path list to apply to OS/2 BEGINLIBPATH
Allowed in *.conf only outside <Directory>, <Files>, <Location>,
or <If>

As you can see the directives are organized by source file.

Steven Levine

Jun 12, 2022, 1:16:09 PM
to apa...@googlegroups.com
In <2e33f6fa-e11e-4b5d...@googlegroups.com>, on 06/12/22
at 06:31 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> I'll use those settings and try to 'siege' the server and see what
>happens. One thing I have noticed is that Massimo uses
>VIRTUALADDRESSLIMIT=1536, but I always use 3072 for that. I'll try
>lowering that too.

Thanks,

The VAL setting is a tradeoff. Most systems work fine with 3072.
However, some systems will run out of address space in the system arena
with VAL set to 3072. What VAL does is change the dividing line between
the user arenas and the system arena.

Before the days of above512 (aka upper or high) memory, the system arena
extended from 512MB to 4GB, so it was almost impossible to run out of
address space in the system arena before running out of address space in
the user arenas.

With VAL set to 3072, the kernel, the drivers, the page tables, the kernel
heaps and all the other kernel control tables need to fit in the address
space between 3072MB and 4GB.

To see what's in the system arena, use Theseus's System->Kernel
information->System arena.
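The tradeoff can be put in numbers. The sketch below only does the arithmetic implied above: with a 4GB linear address space, whatever lies above the VAL boundary is what the system arena gets. The function name is hypothetical, and 512 stands in for the pre-high-memory boundary.

```python
FOUR_GB_IN_MB = 4096


def system_arena_mb(virtual_address_limit_mb):
    """Address space left above the VAL boundary, in MB."""
    return FOUR_GB_IN_MB - virtual_address_limit_mb


# 512 models the old layout; 1536 and 3072 are the values in this thread.
for val in (512, 1536, 3072):
    print(f"boundary at {val} MB -> system arena spans {system_arena_mb(val)} MB")
```

With VAL=3072 the kernel, drivers and tables share 1024 MB; with VAL=1536 they get 2560 MB, at the cost of user address space.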

David McKenna

Jun 12, 2022, 2:56:36 PM
to Apache for OS/2
Hi Steven,

  Thanks for the explanations! I'll have to play around with the httpd switches a little bit to get more knowledgeable about the directives.

  I have been running with Massimo's settings and trying to siege the server for a couple hours now, but so far all I have gotten is TrapE's in siege (when sieging on a single server - www.davemckenna.com. If I use the local IP address instead - 192.168.21.2 - then no trap), and traps in AFINETK on the server (which I already described and reported to ArcaNoae). No POPUPLOGS or exceptq logs yet. Same behaviour whether using 1536 or 3072 for VIRTUALADDRESSLIMIT.

Regards,

Steven Levine

Jun 12, 2022, 4:01:22 PM
to apa...@googlegroups.com
In <30482411-9b5e-4e50...@googlegroups.com>, on 06/12/22
at 11:56 AM, David McKenna <davidmc...@gmail.com> said:

Hi,

> Thanks for the explanations!

You're welcome. I don't expect folks to automatically understand the
differences between RAM and address space, but when doing this kind of
debugging the differences matter.

>I'll have to play around with the httpd
>switches a little bit to get more knowledgable about the directives.

I find that in most cases the defaults are good. They are probably based
on lots of user feedback over the years.

> I have been running with Massimo's settings and trying to siege the
>server for a couple hours now, but so far all I have gotten is TrapE's in
> siege (when sieging on a single server - www.davemckenna.com.

Please post an exceptq report for the siege trap E, if you have one, or a
popuplog entry. I may see something useful. I'm guessing it's an out of
memory problem, but we shall see. Our siege build is pretty old, so this
might give us some incentive to do a rebuild with debug information and
patch to avoid the traps.

>If I use
>the local IP address instead - 192.168.21.2 - then no trap), and traps
>in AFINETK on the server (which I already described and reported to
>ArcaNoae).

You might want to post a query to the ticket. There have been some NIC
driver updates that might have an effect on the AFINETK traps.

>No POPUPLOGS or exceptq logs yet. Same behaviour whether
>using 1536 or 3072 for VIRTUALADDRESSLIMIT.

That's good. Based on what I see, it seems there needs to be a couple of
memory hungry php scripts running before the problems show up.

I'm trying to convince Lewis to carve out some time to upgrade
www.arcanoae.com which is lots of wordpress with lots of plugins. When he
gets a testbed set up, this should help us track down and resolve more of
the edge cases.

You might notice that Massimo does not set ThreadStackSize, so each thread
gets a 128KB stack which must live in the lower user private arena. Each php
script starts with a 2MB php heap, which can live in the upper user
private arena.
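To get a feel for what those numbers add up to in one httpd child, here is a back-of-the-envelope sketch. It uses MaxThreads 59 from Massimo's config and the 128KB and 2MB figures above; the function names are hypothetical, and the worst case assumes every thread runs a php script at the same moment.

```python
KB = 1024
MB = 1024 * KB


def stack_bytes(threads, stack_size_bytes):
    """Address space reserved for thread stacks (lower private arena)."""
    return threads * stack_size_bytes


def initial_php_heap_bytes(concurrent_scripts, heap_bytes=2 * MB):
    """Address space taken by the initial php heaps (may live in upper memory)."""
    return concurrent_scripts * heap_bytes


max_threads = 59
print(stack_bytes(max_threads, 128 * KB) / MB)   # default stack size
print(stack_bytes(max_threads, 65536) / MB)      # ThreadStackSize 65536
print(initial_php_heap_bytes(max_threads) / MB)  # before any script grows its heap
```

The stacks are small next to the php heaps, and the heaps only grow from their 2MB starting point once the scripts begin allocating.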

Have fun,

David McKenna

Jun 12, 2022, 5:23:57 PM
to Apache for OS/2
Hi Steven,

  I do have the latest MultiMac drivers applied on both server and desktop. When siege traps, I just get the trap screen, but no POPUPLOG or exceptq file created. The weird thing is the trap screen will show either 'SOFFICE' or 'DOOBLE' as the offending process if either are running. If I make sure neither is running, then I see 'SIEGE' on the trap screen. I can take a pic if you think it's helpful. I could even get a dump if it's worth your while...

Regards,

Paul Smedley

Jun 12, 2022, 6:57:32 PM
to apa...@googlegroups.com
Hey Guys,

On 13/6/22 05:03, Steven Levine wrote:
> Please post an exceptq report for the siege trap E, if you have one, or a
> popuplog entry. I may see something useful. I've guessing it's an out of
> memory problem, but we shall see. Our siege build is pretty old, so this
> might give us some incentive to do a rebuild with debug information and
> patch to avoid the traps.

Consider me incentivised -
https://smedley.id.au/tmp/siege-4.1.1-os2-20220613.zip :) Should also
have debug symbols

Cheers,

Paul

Steven Levine

Jun 12, 2022, 7:18:54 PM
to apa...@googlegroups.com
In <ee58212c-cfb6-4233...@googlegroups.com>, on 06/12/22
at 02:23 PM, David McKenna <davidmc...@gmail.com> said:

Hi,



> When siege traps, I just get the trap screen, but no POPUPLOG
>or exceptq file created.

This can happen when the user (aka ring3) stack overflows. The exception
handlers run on the user stack. It can also happen if there are
insufficient resources to write to the popuplog file.

>The weird thing is the trap screen will show
>either 'SOFFICE' or 'DOOBLE" as the offending process if either are
>running.

This is a function of how the kernel locates the module name for some kinds
of traps. It's a cosmetic error.

FWIW, I see a similar defect in pmdf when analyzing system dumps. To
avoid the display error, I unload the symbols for the spurious process.

>trap screen. I can take a pic if you think it's helpful. I could even
>get a dump if it's worth your while...

I would like to see a picture of the trap screen. No need for a dump file
yet.

Thanks,

Steven Levine

Jun 12, 2022, 7:32:34 PM
to apa...@googlegroups.com
In <8d366491-08cf-1760...@smedley.id.au>, on 06/13/22
at 08:27 AM, Paul Smedley <pa...@smedley.id.au> said:

Hi Paul,

>Consider me incentivised -

:-)

>https://smedley.id.au/tmp/siege-4.1.1-os2-20220613.zip :) Should also
>have debug symbols

Thanks. Let's see how this one works for us. The maintainer is up to
4.1.3, but I don't see any changes that would affect us much.

David McKenna

Jun 12, 2022, 8:53:06 PM
to Apache for OS/2
Hi Paul,

  Thanks for the new build of siege. This one made my server blow up REAL good - it seems to be more intensive than the old one. I crashed the server in about 2 minutes using default settings of siege, and Massimo's httpd settings and VAL = 3072. I still need to avoid 'www.davemckenna.com' in favor of 192.168.21.2 to avoid traps on the siege machine.

 Attached are all the files created by the crash - the 2 TRP files were in the apache24 directory, the httpd traps were in the \var\log\app directory. Siege.txt is the result from the siege (didn't have a log file configured yet).

0070_01.TRP
httpd traps.zip
siege.txt
POPUPLOG.OS2
006D_01.TRP
apache error_log

David McKenna

Jun 12, 2022, 9:24:25 PM
to Apache for OS/2
Hi Steven,

 Attached is an image of the trap screen when trying to use a URL to run siege (specifically, I ran 'siege www.davemckenna.com' from a command line). It happens if I use URL's in the 'urls.txt' file too. Always need to use the IP address.

Regards,
siege trap.jpg

Paul Smedley

Jun 12, 2022, 9:32:00 PM
to apa...@googlegroups.com
Interesting.... using:
u:\siege\bin\siege https://smedley.id.au

is working for me...

On 13/6/22 10:54, David McKenna wrote:
> Hi Steven,
>
>  Attached is an image of the trap screen when trying to use a URL to
> run siege (specifically, I ran 'siege www.davemckenna.com' from a
> command line). It happens if I use URL's in the 'urls.txt' file too.
> Always need to use the IP address.
>
> Regards,

Steven Levine

Jun 12, 2022, 10:43:25 PM
to apa...@googlegroups.com
In <badc5845-06ba-4309...@googlegroups.com>, on 06/12/22
at 06:24 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> Attached is an image of the trap screen when trying to use a URL to run
>siege (specifically, I ran 'siege www.davemckenna.com' from a command
>line).

The trap screen explains why the module name varies: the cs:eip of 168:0
means some code in the kernel jumped to location 0, possibly because of a
bad pointer in a control structure.

I'll need a system dump to say more.

>It happens if I use URL's in the 'urls.txt' file too. Always need
>to use the IP address.

I need to review my notes, but I dimly recall I may have thought the
afinetk trap was related to DNS queries.

Steven Levine

Jun 12, 2022, 11:38:32 PM
to apa...@googlegroups.com
In <4cb390fd-9e1e-4330...@googlegroups.com>, on 06/12/22
at 05:53 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

Nice collection of trap and log files. :-)

It seems you may have replicated Massimo's failure or something very
similar. At some point a process dies while it owns the global lock and
bad things happen.

62a644b3-0078-HTTPD-libcx.log reports:

mutex handle: 800100b6
owner state: dead
owner PID: 0078 (120) <current>
owner TID: 48
request #: 1

which means TID 48 died while it held the lock. I was hoping one of the
exceptq reports would tell us what this thread was doing when it croaked,
but I don't think we got so lucky. I'm not finding it in your collection.
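When there are many libcx logs to go through, pulling out the dead owner can be scripted. The sketch below parses only the five-line excerpt quoted above; it assumes the first PID field is hexadecimal (0078 matches the decimal 120 shown in parentheses) and that the TID is printed in decimal, which the excerpt does not confirm. The function names are hypothetical.

```python
def parse_libcx_mutex_report(text):
    """Return the 'key: value' fields of a libcx mutex report as a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def dead_owner(fields):
    """Return (pid, tid) as integers if the owner is reported dead, else None."""
    if fields.get("owner state") != "dead":
        return None
    pid_hex = fields["owner PID"].split()[0]
    # Assumption: the TID field is decimal.
    return int(pid_hex, 16), int(fields["owner TID"])


report = """\
mutex handle: 800100b6
owner state: dead
owner PID: 0078 (120) <current>
owner TID: 48
request #: 1
"""
print(dead_owner(parse_libcx_mutex_report(report)))
```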

Most of the other reports are all cascading errors.

I do see a couple of reports that imply php may not be catching as many
out of memory cases as we would like. We know this is a work in progress.

Since you are running with 4 CPUs, if you have not already done so, I
recommend trying with MAXCPU=1 and see if that is sufficient to make the
AFINETK traps go away. My notes for ticket 3177 do not indicate that we
ever tried this.

When looking at the trap files, the SIGTRAPs are libcx's response to the
owner died semaphore issue. These should all have corresponding libcx log
files.

The SIGABRT reports are libcn's response to the same issue. Dmitriy
planned to align the libcx and libcn reporting, but Putin's war got in the
way of Dmitriy's life.

62a6449b-006d_01-HTTPD-exceptq.txt looks like stack corruption of some
sort.

62a6447a-006c_01-HTTPD-exceptq.txt looks like an out of memory issue that
needs to be better handled.

Try limiting MaxThreads to 20 and see if this eliminates the owner died
issue. TID 48 means that the PID had 48 threads running and that can
imply a lot of memory usage, especially for PHP apps.

You might also want to get familiar with Theseus's Linear Usage by
Process. It will give you a useful picture of how much memory the httpd
processes are using.

If you don't have a copy of
http://www.warpcave.com/os2diags/theseus-how-to.txt, I recommend you grab
one. It's a cookbook for the Theseus operations you will use most often.

Steven Levine

Jun 13, 2022, 12:31:43 AM
to apa...@googlegroups.com
In <de8aa032-fb2b-e09b...@smedley.id.au>, on 06/13/22
at 11:01 AM, Paul Smedley <pa...@smedley.id.au> said:

Hi,

>Interesting.... using:
>u:\siege\bin\siege https://smedley.id.au

This goes to prove the generalization that every OS/2 user's system is
different. :-) Recall I have the "slow" localhost on one of the systems I
use for testing. It appears to be something to do with my firefox config,
but wget and other operations are as fast as expected.

David is running a relatively fast setup. It's Gigabyte Technology Co.,
Ltd. H110M-S2H with an Intel Core i5-7400 CPU @ 3.00GHz. I would expect
it to perform quite differently than your virtualized setup.

Steven


Steven Levine

Jun 13, 2022, 1:03:00 AM
to apa...@googlegroups.com
In <4cb390fd-9e1e-4330...@googlegroups.com>, on 06/12/22
at 05:53 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

Nice collection of trap and log files. :-)

I neglected to mention that your httpd error log will contain error
reports written by php to stderr. Some of them might be useful to review.
Some of the php errors are detected before the php runtime is fully
initialized, so stderr is the only possible destination for the reports.

I suspect you will find some "heap corrupt" reports.

It seems you may have replicated Massimo's failure or something very
similar. At some point a process dies while it owns the global lock and
bad things happen. 62a644b3-0078-HTTPD-libcx.log reports:

mutex handle: 800100b6
owner state: dead
owner PID: 0078 (120) <current>
owner TID: 48
request #: 1

which means TID 48 died while it held the lock. I was hoping one of the
exceptq reports would tell us what this thread was doing when it croaked,
but I don't think we got so lucky. I'm not finding this report in your
collection.

While testing you might want to modify your httpd common log config to
include the pid and tid associated with the connection. Use %{pid}P and
%{tid}P. See

https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#formats

for the gory details. We did this with Massimo's setup, but the problem
tid has not shown up yet in the logs. It's possible that the thread is
failing before httpd is ready to log anything about the request.
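Once the access log carries the pid and tid, finding the requests a given thread handled is a simple filter. The sketch below assumes %{pid}P and %{tid}P were appended as the last two fields of the log format and that both print as decimal numbers; the sample lines, client address and function name are invented for illustration.

```python
def requests_for_thread(log_lines, pid, tid):
    """Return the log lines whose trailing pid/tid fields match."""
    matches = []
    for line in log_lines:
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        _, line_pid, line_tid = parts
        if line_pid.isdigit() and line_tid.isdigit():
            if (int(line_pid), int(line_tid)) == (pid, tid):
                matches.append(line)
    return matches


# Invented sample lines in "common" format plus trailing pid and tid.
sample = [
    '192.168.21.50 - - [12/Jun/2022:20:40:01 -0400] "GET /index.html HTTP/1.1" 200 512 120 47',
    '192.168.21.50 - - [12/Jun/2022:20:40:02 -0400] "GET /Wordpress/index.php HTTP/1.1" 200 20480 120 48',
]
for hit in requests_for_thread(sample, 120, 48):
    print(hit)
```

An empty result for the PID/TID named in a libcx log would be consistent with the thread failing before httpd logged anything for the request.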

We might need to take the heavy handed approach and use the set EXCEPTQ=Z
feature, which is documented in exceptq-shl.txt. This will generate an
exceptq report for every process termination. All we need to find is the
report(s) for the process that the libcx logs indicate died while holding
the mutex. The rest of the normal termination reports can be discarded
because they will not tell us anything we need to know.
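With a report for every termination, picking out the interesting ones by file name saves time. The sketch below guesses the naming scheme from the two names seen in this thread (for example 62a6447a-006c_01-HTTPD-exceptq.txt): a hex stamp, a 4-digit hex PID, then a TID. Treating the TID field as hexadecimal is an assumption, the third file name in the sample is invented, and the function name is hypothetical.

```python
import re

# Assumed layout: <stamp>-<pid hex>_<tid hex>-<exe>-exceptq.txt
REPORT_NAME = re.compile(
    r"^(?P<stamp>[0-9a-f]{8})-(?P<pid>[0-9a-f]{4})_(?P<tid>[0-9a-f]{2,})-"
    r"(?P<exe>[^-]+)-exceptq\.txt$",
    re.IGNORECASE,
)


def reports_for(names, pid, tid=None):
    """Return the report names matching a PID and, optionally, a TID."""
    hits = []
    for name in names:
        m = REPORT_NAME.match(name)
        if m is None:
            continue
        if int(m.group("pid"), 16) != pid:
            continue
        if tid is not None and int(m.group("tid"), 16) != tid:
            continue
        hits.append(name)
    return hits


names = [
    "62a6447a-006c_01-HTTPD-exceptq.txt",
    "62a6449b-006d_01-HTTPD-exceptq.txt",
    "62a644b3-0078_30-HTTPD-exceptq.txt",  # invented: PID 0x78, TID 0x30 (48)
]
print(reports_for(names, 0x78, 48))
```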

David McKenna

Jun 13, 2022, 5:25:27 PM
to Apache for OS/2
Hi Steven,

  A lot to digest... I'll update the logging in httpd and set MAXCPU=1 to try and get an AFINETK trap (or not). I do have Theseus, but not on the server - I'll install that. I'll also set EXCEPTQ=Z, unless you say no, and capture a dump (memlimited) of the siege trap. If/when I get another blow-up, I'll inquire about what files need to be uploaded.

Regards,

Steven Levine

Jun 13, 2022, 7:11:44 PM
to apa...@googlegroups.com
In <636e3654-ec71-47a9...@googlegroups.com>, on 06/13/22
at 02:25 PM, David McKenna <davidmc...@gmail.com> said:


Hi,

> A lot to digest...

:-)

>I'll update the logging in httpd and set MAXCPU=1 to
> try and get an AFINETK trap (or not). I do have Theseus, but not on the
>server - I'll install that. I'll also set EXCEPTQ=Z, unless you say no,
>and capture a dump (memlimited) of the siege trap. If/when I get another
> blow-up, I'll inquire about what files need to be uploaded.

We have two mostly unrelated issues. While the AFINETK trap is
triggered by running siege, anything that generates lots of network
activity could trigger it.

These kinds of kernel traps are often caused by missing serialization
logic which only shows up in SMP setups. If the AFINETK system traps go
away when running with MAXCPU=1, it gives us a better idea where to look. I
recommend you do this test first. Once we have determined whether or not
running multiple cores is what allows the trap to occur, you can update
the ticket.

I don't think we need another system dump for this yet. The two I have
show the issue pretty clearly. Of course, as is typical for these kinds
of traps, understanding the sequence of events that allow the failure to
trigger is going to take more work.

The second issue is the user level traps and failures in httpd and
php. There are several flavors of these, but the owner died issue is
the one we probably want to solve first. As you have seen, this failure
results in numerous cascading errors. Once this issue is resolved, we
will be left with a small number of remaining issues to resolve.

David McKenna

Jun 14, 2022, 6:03:09 PM
to Apache for OS/2
Hi Steven,

  I set the server to use MAXCPU=1 and also memlimited to 1846 (in case of a dump). Also memlimited the desktop and ran siege using 'siege www.davemckenna.com' and the desktop trapped (as usual) and I was able to capture a dump on the desktop from that if you want to see it.

  Ran siege using 'siege -f c:/siege/etc/urls.txt' (which has http://192.168.21.2/index.html, https://192.168.21.2/phpMyAdmin/index.php, https://192.168.21.2/Wordpress/index.php, and http://192.168.21.2/filegator/dist/index.php in it). Apache crashed in about 4 minutes, almost exactly like it did the last time (which was full SMP), but no AFINETK trap yet. I'll attach the POPUPLOG and apache error log. If there are any exceptq files listed in the error log that might be interesting let me know.

  A new wrinkle is that on the desktop I have gotten a trap in SOCKETSK twice when running siege, but haven't caught a dump for that yet...

Regards,
POPUPLOG.OS2
apache error_log

Steven Levine

Jun 14, 2022, 10:07:52 PM
to apa...@googlegroups.com
In <ac677af4-d08d-437a...@googlegroups.com>, on 06/14/22
at 03:03 PM, David McKenna <davidmc...@gmail.com> said:

Hi,

> I set the server to use MAXCPU=1 and also memlimited to 1846 (in case
>of a dump). Also memlimited the desktop

What does memlimited the Desktop mean to you?

>and ran siege using 'siege
>www,davemckenna.com' and the desktop trapped (as usual) and I was able to
> capture a dump on the desktop from that if you want to see it.

Do you mean a process dump or a system dump?

> SMP), but no AFINETK trap yet.

This is some evidence that we have an SMP issue with your setup.

>I'll attach the POPUPLOG

The POPUPLOG

06-14-2022 16:34:06 SYS3175 PID 007c TID 0001 Slot 00e0
C:\PROGRAMS\APACHE24\BIN\HTTPD.EXE
c0000005
7d36dd80
P1=00000002 P2=0081ff1c P3=XXXXXXXX P4=XXXXXXXX
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:7d36dd80 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0081ff20 SSACC=d0f3 SSLIM=7fffffff
EBP=00000000 FLG=00010206

EXCEPTQ.DLL 0001:0000dd80

implies that EXCEPTQ is loading in upper memory again. Is this true? If
so, please mark exceptq to load in lower memory. If not, I guess I'll
have to figure out what the POPUPLOG entry is trying to tell us.

>and apache error
> log.

The apache error log contains pretty much what I expected. I notice that the
heap corruption reports are shown as:

zend_mm_heap corrupted

We need to cherry pick commit ccd4a01272e1c94804467e539a8224e72ee67a81.
If Paul is lurking, a new 8.1.7 should be available in the fullness of
time.

We are still left with the problem of figuring out why and how:

owner PID: 00c9 (201)
owner TID: 10

died.

I took a quick look at the previous exceptq reports that were SIGSEGVs.
These would be:

62a6447a-006c_01-HTTPD-exceptq.txt
62a6449b-006d_01-HTTPD-exceptq.txt

The traps occurred because the php termination logic is not sufficiently
robust when there is insufficient memory for the php instance to fully
initialize. We know php is attempting to terminate because _zend_shutdown
is in the call path.

To avoid these traps, I may have to change the out of memory handling
logic in zend_mm_init. What happens is that when we are out of memory
for the php heap:

alloc_globals->mm_heap = zend_mm_init();

sets alloc_globals->mm_heap to NULL. This is fine, but none of the other
code checks the pointer. I attempted to add the missing NULL pointer
checking code with the idea that we could just use the normal thread
termination code. The NULL pointer checks I put in worked as expected,
but I never got to the point where all the required checks were in place.
I may make another attempt at this.

The other option is to terminate the thread in zend_mm_init(). This might
produce some small, interim leaks, but we are already out of memory and
this memory will be reclaimed when the parent process terminates when the
MaxConnectionsPerChild limit is reached.

> A new wrinkle is that on the desktop I have gotten a trap in SOCKETSK
>twice when running siege,

Was this when running MAXCPU=1? If so, it's just a variant of the
AFINETK trap. AFINETK, SOCKETSK and the NIC drivers are all tied at the hip
and share common data structures.

Have fun,

David McKenna

Jun 15, 2022, 6:21:27 AM
to Apache for OS/2
Hi Steven,

  I memlimited the desktop to capture system dumps for the siege trap and socketsk trap. I have a system dump for the siege trap. Not running the desktop (where I run siege) with MAXCPU=1, only the apache server.

  I checked exceptq on the server with 'highmem -u exceptq.dll' and it indicated that it was not modified, so don't think it was set to load high, but re-installed it anyway, just in case. Also looked in a couple of the exceptq trap files and under 'DLL's accessable from this process' EXCEPTQ is shown to be in lower memory (1e0b0000).

Regards,

Paul Smedley

Jun 15, 2022, 6:34:13 AM
to apa...@googlegroups.com

Hi Steven

On 15/6/22 09:51, Steven Levine wrote:
> The apache error contains pretty much what I expected. I notice that the
> heap corruption reports are shown as:
>
> zend_mm_heap corrupted
>
> We need to cherry pick commit ccd4a01272e1c94804467e539a8224e72ee67a81.
> If Paul is lurking a new 8.1.7 should be available in the fullness of
> time.

I hadn't cherry-picked the 'debugging' type commits given we were
working on those with 7.4.x with Max, but given we can also work on
debugging with 8.1.7 :)

I've been flat out, but
https://smedley.id.au/tmp/php-8.1.7-os2-dll-debug-20220615.zip should be
available shortly.

I'd been playing around (unsuccessfully) with trying to move to libtool
rather than aplibtool.exe on the weekend, so needed to do a full
reconfigure/rebuild hence just the DLL for now, whilst the rest finishes
building.

Cheers,

Paul

Paul Smedley

Jun 15, 2022, 6:38:23 AM
to apa...@googlegroups.com

Steven Levine

Jun 15, 2022, 3:04:44 PM
to apa...@googlegroups.com
In <de3b8743-e05e-430e...@googlegroups.com>, on 06/15/22
at 03:21 AM, David McKenna <davidmc...@gmail.com> said:


Hi David,

> I memlimited the desktop to capture system dumps for the siege trap and
> socketsk trap. I have a system dump for the siege trap. Not running the
>desktop (where I run siege) with MAXCPU=1, only the apache server.

Oh, I think I understand now. You are really talking about booted
systems, not just the Desktop. One booted system is running the siege
client and the other is running the apache server. I can be a bit
literal, at times.

> I checked exceptq on the server with 'highmem -u exceptq.dll' and it
>indicated that it was not modified, so don't think it was set to load
>high, but re-installed it anyway, just in case. Also looked in a couple
>of the exceptq trap files and under 'DLL's accessable from this process'
>EXCPETQ is shown to be in lower memory (1e0b0000).

I figured it was a red herring. Something is confused. We have the
popuplog:

06-14-2022 16:34:06 SYS3175 PID 007c TID 0001 Slot 00e0
C:\PROGRAMS\APACHE24\BIN\HTTPD.EXE
c0000005
7d36dd80
P1=00000002 P2=0081ff1c P3=XXXXXXXX P4=XXXXXXXX
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0000 GSACC=**** GSLIM=********
CS:EIP=005b:7d36dd80 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0081ff20 SSACC=d0f3 SSLIM=7fffffff
EBP=00000000 FLG=00010206

EXCEPTQ.DLL 0001:0000dd80

and CS:EIP=005b:7d36dd80 is definitely in upper memory, so this appears to
be another case of a misidentified module. The reason for this may be
that the DLL that was loaded at CS:EIP=005b:7d36dd80 was unloaded before the
kernel attempted to identify the module.
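
The address comparison above can be sketched in a few lines. This is only a rough illustration, assuming the usual OS/2 layout where the low arenas end at the 512 MB line (0x20000000) and anything above that is high memory. The function names are made up for this example.

```python
import re

# Assumption: the low (private + shared) arenas end at the 512 MB line.
HIGH_MEMORY_START = 0x20000000


def is_high_memory(address: int) -> bool:
    """Return True when a linear address lies at or above the 512 MB line."""
    return address >= HIGH_MEMORY_START


def eip_from_popuplog(line: str) -> int:
    """Extract the EIP offset from a POPUPLOG 'CS:EIP=ssss:oooooooo' field."""
    match = re.search(r"CS:EIP=[0-9a-fA-F]{4}:([0-9a-fA-F]{8})", line)
    if match is None:
        raise ValueError("no CS:EIP field found")
    return int(match.group(1), 16)


popuplog_line = "CS:EIP=005b:7d36dd80 CSACC=d0df CSLIM=7fffffff"
trap_address = eip_from_popuplog(popuplog_line)
exceptq_load_address = 0x1E0B0000  # load address David found in the trap files

print(hex(trap_address), is_high_memory(trap_address))
print(hex(exceptq_load_address), is_high_memory(exceptq_load_address))
```

The trap address lands in high memory while the reported EXCEPTQ load address does not, which is why the module name in the popuplog looks wrong.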

David McKenna

unread,
Jun 15, 2022, 4:20:51 PM6/15/22
to Apache for OS/2
Hi Steven,

  Sorry to be confusing - you are right I should not describe one of my machines as just 'the desktop' but add 'computer' to it.

  I do load the Apache modules (but not httpd.dll) high - I wonder if one of them is being referred to in the popuplog you show?

Regards,

Steven Levine

unread,
Jun 15, 2022, 10:05:43 PM6/15/22
to apa...@googlegroups.com
In <9ee53c1c-5a51-4d84...@googlegroups.com>, on 06/15/22
at 01:20 PM, David McKenna <davidmc...@gmail.com> said:


Hi,

> I do load the Apache modules (but not httpd.dll) high - I wonder if one
> of them is being referred to in the popuplog you show?

Not directly. I need to do a bit more analysis to confirm or deny what I
suspect. It's on my list, but not with a high priority.

My current plan is to implement patches to avoid the known traps.

In parallel we are going to keep looking for the thread that is dying
while holding the lock.

Steven Levine

unread,
Jun 24, 2022, 12:33:10 AM6/24/22
to apa...@googlegroups.com
at 01:20 PM, David McKenna <davidmc...@gmail.com> said:


Hi all,

Several days ago, I said:

In parallel we are going to keep looking for the thread that is dying
while holding the lock.

It seems that David's siege testing has produced positive results. We
find

62a6449c-0070_01-HTTPD-exceptq.txt

which reports

62a6449c-0070_01-HTTPD-exceptq.txt:176
007FFB48 1E45497B LIBCN0 0001:0009497B ifree.c#150
__um_free_maybe_lock + 7B 0001:00094900 (ifree.obj)

which is libc telling us that the assert at

ifree.c:65
assert (crumb->x.used.size <= crate->crumb_size);

failed. This assert is triggered while the thread is holding the global
lock.

Eventually, we get to

62a6449c-0070_01-HTTPD-exceptq.txt:140
007FFA58 1E4389B2 LIBCN0 0001:000789B2 kill.c#76 __std_kill + 22
0001:00078990 (kill.obj)

and the process dies while holding the global lock which causes all the
other processes to eventually fail with owner died.

As to why I never got to see an exceptq report like this from Massimo with
the many reports he has uploaded to his tickets, I can only guess. The
symptoms have always indicated that this had occurred.

It's not obvious which bit of php code trashed the heap, but the exceptq
report seems to imply that php failed during what is called sapi init.
This code may have vectored to the php shutdown hook before all the
pointers the shutdown hook might access were properly initialized.
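
When several exceptq reports need triage, it can help to pull the module, source file and line number out of the call stack entries automatically. A minimal sketch, assuming the entries always follow the layout of the two lines quoted above (EBP, return address, module, object:offset, source#line). The `Frame` type and `parse_frame` helper are hypothetical names for this example.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    ebp: int
    address: int
    module: str
    source: str
    line: int


# EBP, return address, module, obj:offset, then "file#line".
FRAME_RE = re.compile(
    r"^([0-9A-F]{8})\s+([0-9A-F]{8})\s+(\S+)\s+"
    r"[0-9A-F]{4}:[0-9A-F]{8}\s+([^\s#]+)#(\d+)"
)


def parse_frame(text: str) -> Optional[Frame]:
    """Parse one call stack entry, or return None if it does not match."""
    match = FRAME_RE.match(text.strip())
    if match is None:
        return None
    ebp, address, module, source, line = match.groups()
    return Frame(int(ebp, 16), int(address, 16), module, source, int(line))


stack = [
    "007FFB48 1E45497B LIBCN0 0001:0009497B ifree.c#150",
    "007FFA58 1E4389B2 LIBCN0 0001:000789B2 kill.c#76 __std_kill + 22",
]
frames = [parse_frame(entry) for entry in stack]
for frame in frames:
    print(frame.module, frame.source, frame.line)
```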

Lewis G Rosenthal

unread,
Jun 24, 2022, 8:40:42 AM6/24/22
to apa...@googlegroups.com
Hi...
This is truly amazing and welcome news. This thing has eluded us for ages.
Bravo, and thanks Dave, for a good exceptq report.

--
Lewis
-------------------------------------------------------------
Lewis G Rosenthal, CNA, CLP, CLE, CWTS, EA
Rosenthal & Rosenthal, LLC www.2rosenthals.com
visit my IT blog www.2rosenthals.net/wordpress
-------------------------------------------------------------

Steven Levine

unread,
Jun 24, 2022, 12:32:33 PM6/24/22
to apa...@googlegroups.com
In <62B5B0C1...@2rosenthals.com>, on 06/24/22
at 08:40 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>This is truly amazing and welcome news. This thing has eluded us for
>ages. Bravo, and thanks Dave, for a good exceptq report.

Definitely. If I had to guess, it's the result of Dave's good testing
methodology. Rather than attempting to select what to test, he saved
away all the old data. That way he had a complete, fresh set of related
data to upload when the time came.

We actually know where in the heap the corruption occurred:

62a6449c-0070_01-HTTPD-exceptq.txt:168
007FFB08 1E454F8A LIBCN0 0001:00094F8A ifree.c#65
__um_crumb_free_maybe_lock + 2DE 0001:00094CAC (ifree.obj)

Offset Name Type Hex Value

8 crate pointer to type 0x238 20764000
12 crumb pointer to type 0x244 20764154
16 lock 32 bit signed 1

Unfortunately exceptq does not know I want to inspect the content of
20764000, so the memory block is not included in the report.

With a bit of luck, it should be relatively easy to insert a bit of
temporary heap check logic in zend_hash_destroy and force a process dump
when the corruption is detected.

There should be a fresh 8.1.7 build available once Paul has time to
cherry-pick the latest 7.4 patch and do an 8.1.7 build. This patch will
avoid recursion in the php error handler when it runs out of heap.

Paul Smedley

unread,
Jun 24, 2022, 5:26:48 PM6/24/22
to apa...@googlegroups.com
Hey All,

On 25/6/22 01:41, Steven Levine wrote:
> There should be a fresh 8.1.7 build available once Paul has time to
> cherry-pick the latest 7.4 patch and do an 8.1.7 build. This patch will
> avoid recursion in the php error handler when it runs out of heap.

https://smedley.id.au/tmp/php-8.1.7-os2-debug-20220625.zip

Cheers,

Paul

David McKenna

unread,
Jun 24, 2022, 7:19:45 PM6/24/22
to Apache for OS/2
Thanks Paul and Steven for your great detective work! Getting the logs would not have been possible without Paul's newest 'Siege' too.

 I installed the latest (20220625) version of php8.1, set httpd-mpm to my defaults, CPU=1, cleared the logs, and ran 'siege -f C:\siege\etc\urls.txt' (same entries as above). I let it run until I started seeing an alert about 'Can't determine chunk size', then stopped it (about 20 minutes). The apache server was still running, and there were no POPUPLOG or exceptq trap files anywhere, only the apache error log showed some messages - I'll attach.

 I do have some odd problems with siege. Beside the URL issue, one is it does not seem to honor the 'time' directive in siege.conf - it runs forever until it either errors out (due to the 'failures' directive), or I reboot because I can not stop it, or kill it. If I hit <ctrl>C, a message says 'Lifting the siege...' but it never stops. Trying to kill with 'Top' doesn't work either.

 Oh - one other thing - this new drop of php8.1 is missing the opcache module... maybe an oversight?

Regards,

apache error_log

Steven Levine

unread,
Jun 24, 2022, 9:31:00 PM6/24/22
to apa...@googlegroups.com
In <81926abd-18e4-4771...@googlegroups.com>, on 06/24/22
at 04:19 PM, David McKenna <davidmc...@gmail.com> said:

Hi guys,

>Thanks Paul and Steven for your great detective work! Getting the logs
>would not have been possible without Paul's newest 'Siege' too.

Definitely.

>Let it run until I
>started seeing an alert about 'Can't determine chunk size', then stopped
>it (about 20 minutes).

How many requests occurred during this time period? I'm curious as to what
the error rate was. If you are using the siege defaults, we could be
looking at 10K requests or more over 19 minutes.

FWIW, this looks pretty good overall. You only had 8 heap corrupt errors
on what was probably a heavily stressed system.

There is a pattern to the heap corrupt errors. Notice that all occurred
on TID 1. However, php requests are not run on TID 1 or TID 2. TID 1 is
the child's main thread. It listens for connection requests and passes them
on to the worker threads for processing. TID 2 is the server maintenance
thread. It manages the spare worker threads. All http requests are
processed by TID 3 and up.

TID 1 is special in that it handles signals. However, it should never be
running PHP code.
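
For anyone who wants to check the TID pattern in their own error log, here is a rough sketch that groups matching entries by thread ID. It assumes the standard Apache 2.4 "[pid N:tid N]" field. The two sample lines are ordinary startup and shutdown notices in that format, not heap corruption entries, and the helper names are made up.

```python
import re
from collections import Counter

# Matches the "[pid 504:tid 1]" field of an Apache 2.4 error_log line.
PID_TID = re.compile(r"\[pid (\d+):tid (\d+)\]")


def count_by_tid(lines, needle):
    """Count log lines containing `needle`, grouped by thread ID."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = PID_TID.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def thread_role(tid):
    """Map a TID to the role described above for an OS/2 MPM child."""
    if tid == 1:
        return "child main thread"
    if tid == 2:
        return "maintenance thread"
    return "worker thread"


sample = [
    "[Sat Jun 25 16:28:02.598000 2022] [mpm_mpmt_os2:notice] [pid 504:tid 1] "
    "AH00206: Apache/2.4.53 (OS/2) OpenSSL/1.1.1o PHP/8.1.7 configured "
    "-- resuming normal operations",
    "[Sat Jun 25 16:33:20.242000 2022] [mpm_mpmt_os2:notice] [pid 504:tid 1] "
    "AH00201: caught SIGTERM, shutting down",
]
notice_counts = count_by_tid(sample, "mpm_mpmt_os2:notice")
for tid, count in sorted(notice_counts.items()):
    print(tid, thread_role(tid), count)
```

Passing "zend_mm_heap corrupted" as the needle against a real error_log would show whether the reports really are confined to TID 1.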

> I do have some odd problems with siege. Beside the URL issue, one is it
>does not seem to honor the 'time' directive in siege.conf - it runs
>forever until it either errors out (due to the 'failures' directive), or
>I reboot because I can not stop it, or kill it. If I hit <ctrl>C, a
>message says 'Lifting the siege...' but it never stops. Trying to kill
>with 'Top' doesn't work either.

That says siege is getting stuck in the exit list. If you have top set up
to support the Ctrl-F hard kill, and you remember to use it before Ctrl-C,
you can avoid getting stuck in the exit list.

Paul and I will look at the siege issues and get them resolved.

A quick look at the siege code tells me that --time probably fails for the
same reason that Ctrl-C fails. Both are signaling the handler thread to
terminate and something is going wrong. Try running siege with --debug
and see if any useful messages appear.

Paul perhaps it's time to put your siege sources on github?

David McKenna

unread,
Jun 25, 2022, 12:37:19 AM6/25/22
to Apache for OS/2
Hi Steven,

  Unfortunately, because I can't stop the siege cleanly, I don't get a report, so don't know how many requests occurred. Previously, it would always end by hitting the number of failures, then give a report - but it actually works without failures now :-)

 I tried running siege marked for single-processor mode, and that allows running with URL's instead of IP addresses, so some SMP issue there. Didn't help with the inability to stop it though.

Regards,

Paul Smedley

unread,
Jun 25, 2022, 1:55:00 AM6/25/22
to apa...@googlegroups.com
Hi Dave,

On 25/6/22 14:07, David McKenna wrote:
>   Unfortunately, because I can't stop the siege cleanly, I don't get a
> report, so don't know how many requests occurred. Previously, it would
> always end by hitting the number of failures, then give a report - but
> it actually works without failures now :-)

I've seen this as well, but have generally found that running Top, and
using 'force-kill' on the PID will work - and I still get the report.

Cheers,

Paul

Paul Smedley

unread,
Jun 25, 2022, 2:23:33 AM6/25/22
to apa...@googlegroups.com
Hey Steven,

On 25/6/22 09:24, Steven Levine wrote:
> In <81926abd-18e4-4771...@googlegroups.com>, on 06/24/22
> at 04:19 PM, David McKenna <davidmc...@gmail.com> said:
>> I do have some odd problems with siege. Beside the URL issue, one is it
>> does not seem to honor the 'time' directive in siege.conf - it runs
>> forever until it either errors out (due to the 'failures' directive), or
>> I reboot because I can not stop it, or kill it. If I hit <ctrl>C, a
>> message says 'Lifting the siege...' but it never stops. Trying to kill
>> with 'Top' doesn't work either.
>
> That says siege is getting stuck in the exit list. If you have top set up
> to support the Ctrl-F hard kill, and you remember to use it before Ctrl-C,
> you can avoid getting stuck in the exit list.
>
> Paul and I will look at the siege issues and get them resolved.
>
> A quick look at the siege code tells me that --time probably fails for the
> same reason that Ctrl-C fails. Both are signaling the handler thread to
> terminate and something is going wrong. Try running siege with --debug
> and see if any useful messages appear.
>
> Paul perhaps it's time to put your siege sources on github?

I'll look to do this ASAP - as I recall, I didn't have to change anything.

Cheers,

Paul

Steven Levine

unread,
Jun 25, 2022, 11:42:14 AM6/25/22
to apa...@googlegroups.com
In <f9f7e6c7-a53c-4c1a...@googlegroups.com>, on 06/24/22
at 09:37 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> Unfortunately, because I can't stop the siege cleanly, I don't get a
>report, so don't know how many requests occurred.

What you can do is run siege as either

siege -f urls.txt 2>&1 | tee tmp.out

or

siege -f urls.txt >tmp.out 2>&1
tail -f tmp.out

I am the right kind of color blind so I cannot read the hardcoded colored
output and require these kinds of workarounds.
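
If a script is handier than shell redirection, the same idea can be sketched as follows: merge stderr into stdout, strip the colour escape sequences, and save plain text. This is only an illustration. `run_and_log` is a made-up helper, and the demo uses a tiny Python child process as a stand-in for siege so that it runs anywhere.

```python
import os
import re
import subprocess
import sys
import tempfile

# CSI sequences such as ESC[1;32m (colour on) and ESC[0m (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def run_and_log(command, log_path):
    """Run `command`, merge stderr into stdout, and save colour-free text."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
        check=False,
    )
    clean = ANSI_CSI.sub("", result.stdout)
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(clean)
    return clean


# Stand-in for siege: coloured text on stdout plus a line on stderr.
demo_command = [
    sys.executable,
    "-c",
    "import sys; print('\\x1b[1;32malert\\x1b[0m'); "
    "print('oops', file=sys.stderr)",
]
with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "tmp.out")
    text = run_and_log(demo_command, log_file)
    with open(log_file, encoding="utf-8") as saved:
        saved_text = saved.read()
print(text)
```

Unlike tee, this only shows the output after the command has finished, so it suits unattended runs better than interactive ones.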

Steven Levine

unread,
Jun 25, 2022, 1:25:59 PM6/25/22
to 'Paul Smedley' via Apache for OS/2
In <a6581086-1853-88f7...@smedley.id.au>, on 06/25/22
at 03:24 PM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi Paul,

>I've seen this as well,but have generally found that running Top, and
>using 'force-kill' on the PID will work - and I still get the report.

The debugger tells us that while siege is trying to lift itself, the
threads are stuck in http_read_headers or possibly one of its callers.
The v4.1.1 git sources do not seem to match the binaries as well as I
would expect. Of course, this could be the debugger.

When you get a moment, please check if you have any source mods.

However, I don't think this has anything to do with the failure to stop,
but check my reading. As I read the pthreads code, pthread_cancel() calls
pthread_kill() which is effectively a nop for our needs.

Siege is using pthread_kill to send SIGUSR1 to the specific thread, so
that this thread can invoke pthread_exit for itself. We should be able to
come up with a workaround, although it might require a global shutdown
flag.
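
To make the global shutdown flag idea concrete, here is the general pattern sketched in Python rather than C. This is not siege code, just an illustration: each worker polls a shared flag between units of work, so no per-thread signal is needed. A real client thread blocked in a socket read would also need a timeout on that read before it could notice the flag.

```python
import threading
import time

shutdown = threading.Event()  # the "global shutdown flag"


def worker(results, index):
    """Simulated client loop that polls the flag between short waits."""
    requests = 0
    while not shutdown.is_set():
        requests += 1
        # wait() returns early once the flag is set, so no signal is needed.
        shutdown.wait(timeout=0.01)
    results[index] = requests


results = [0] * 4
threads = [
    threading.Thread(target=worker, args=(results, i)) for i in range(4)
]
for thread in threads:
    thread.start()

time.sleep(0.05)   # let the workers run briefly
shutdown.set()     # ask every worker to stop
for thread in threads:
    thread.join(timeout=2)

print(all(not thread.is_alive() for thread in threads))
```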

Thanks,

David McKenna

unread,
Jun 25, 2022, 4:42:55 PM6/25/22
to Apache for OS/2
Hi Steven,

  I tried your method of 'siege -f urls.txt 2>&1 |tee tmp.out'. This time it hit the 'failures' limit (256) and ended cleanly anyway, but also created the tmp.out file. The apache log is essentially empty:

[Sat Jun 25 16:28:02.598000 2022] [mpm_mpmt_os2:notice] [pid 504:tid 1] AH00206: Apache/2.4.53 (OS/2) OpenSSL/1.1.1o PHP/8.1.7 configured -- resuming normal operations
[Sat Jun 25 16:33:20.242000 2022] [mpm_mpmt_os2:notice] [pid 504:tid 1] AH00201: caught SIGTERM, shutting down

  The tmp.out contains:

[alert] Zip encoding disabled; siege requires zlib support to enable it
** SIEGE 4.1.1
** Preparing 25 concurrent users for battle.
The server is now under siege...siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:               13125 hits
Availability:               98.09 %
Elapsed time:              152.13 secs
Data transferred:          752.35 MB
Response time:                0.28 secs
Transaction rate:           86.27 trans/sec
Throughput:                4.95 MB/sec
Concurrency:               24.41
Successful transactions:       13125
Failed transactions:             256
Longest transaction:            8.02
Shortest transaction:            0.00
 
LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

 Siege.log says:

**** siege aborted due to excessive socket failure. ****

No trap files present on the server computer anywhere.  Makes me wonder if this is not an apache/php issue, but a siege issue...

Regards,

Paul Smedley

unread,
Jun 25, 2022, 5:32:48 PM6/25/22
to apa...@googlegroups.com

Hi Steven,

On 26/6/22 01:14, Steven Levine wrote:
> In <a6581086-1853-88f7...@smedley.id.au>, on 06/25/22
> at 03:24 PM, "'Paul Smedley' via Apache for OS/2"
> <apa...@googlegroups.com> said:
>
> Hi Paul,
>
>> I've seen this as well,but have generally found that running Top, and
>> using 'force-kill' on the PID will work - and I still get the report.
>
> The debugger tells us that while siege is trying to lift itself, the
> threads are stuck in http_read_headers or possibly one of its callers.
> The v4.1.1 git sources do not seem to match the binaries as well as I
> would expect. Of course, this could be the debugger.
These weren't intended to be ready for perusal yet :) I'm battling
trying to re-generate configure for those sources.

> When you get a moment, please check if you have any source mods.
Will ping when the sources match the build - most likely this evening my
time.

Cheers,

Paul

Steven Levine

unread,
Jun 25, 2022, 6:28:12 PM6/25/22
to apa...@googlegroups.com
In <bd32caab-875a-4461...@googlegroups.com>, on 06/25/22
at 01:42 PM, David McKenna <davidmc...@gmail.com> said:

Hi David.

> I tried your method of 'siege -f urls.txt 2>&1 |tee tmp.out'. This time
> it hit the 'failures' limit (256) and ended cleanly anyway, but also
>created the tmp.out file. The apache log is essentially empty:

>[Sat Jun 25 16:28:02.598000 2022] [mpm_mpmt_os2:notice] [pid 504:tid 1]
>AH00206: Apache/2.4.53 (OS/2) OpenSSL/1.1.1o PHP/8.1.7 configured --
>resuming normal operations
>[Sat Jun 25 16:33:20.242000 2022] [mpm_mpmt_os2:notice] [pid 504:tid 1]
>AH00201: caught SIGTERM, shutting down

> The tmp.out contains:

This is what I would expect to see.

>The server is now under siege...siege aborted due to excessive socket
>failure; you
>can change the failure threshold in $HOME/.siegerc

I've seen this too, but only intermittently. You can override this limit in %HOME%\.siege\siege.conf using the failures keyword.

Note that it seems that the code is not 100% in sync regarding the config file name. Some of the comments mention .siegerc, but this file appears to be deprecated. There is code in siege.config that will delete it along with the comment

# Moved $HOME/.siegerc to .siege/siege.conf

The code that determines the configure file name is

init.c:83
/**
 * Check if we were passed the -R switch to use a different siegerc file.
 * If not, check for the presence of the SIEGERC variable, otherwise
 * use default of ~/.siege/siege.conf
 */
if(strcmp(my.rc, "") == 0){
  if((e = getenv("SIEGERC")) != NULL){
    snprintf(my.rc, sizeof(my.rc), "%s", e);
  } else {
    snprintf(my.rc, sizeof(my.rc), "%s/.siege/siege.conf", getenv("HOME"));
    if (stat(my.rc, &buf) < 0 && errno == ENOENT) {
      snprintf(my.rc, sizeof(my.rc), CNF_FILE);
    }
  }
}

where configure defines

#define CNF_FILE "$sysconfdir/siegerc"
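
The lookup order in that excerpt can be restated as a small sketch, which may make it easier to see which file siege will actually read. This is an approximation: `resolve_siege_rc` is a made-up name, `fallback` stands in for the compiled-in CNF_FILE, and `os.path.exists` is used where the C code checks stat() for ENOENT specifically.

```python
import os
import tempfile


def resolve_siege_rc(rc_switch, environ, fallback):
    """Mirror the lookup order of the init.c excerpt above.

    rc_switch -- value of the -R switch, or "" when it was not given
    environ   -- mapping of environment variables
    fallback  -- stand-in for the compiled-in CNF_FILE path
    """
    if rc_switch:
        return rc_switch
    if "SIEGERC" in environ:
        return environ["SIEGERC"]
    candidate = environ.get("HOME", "") + "/.siege/siege.conf"
    # Approximation: the C code falls back only when stat() reports ENOENT.
    if not os.path.exists(candidate):
        return fallback
    return candidate


with tempfile.TemporaryDirectory() as home:
    env = {"HOME": home}
    before = resolve_siege_rc("", env, "fallback/siegerc")

    os.makedirs(os.path.join(home, ".siege"))
    conf = home + "/.siege/siege.conf"
    with open(conf, "w", encoding="utf-8") as handle:
        handle.write("failures = 5\n")
    after = resolve_siege_rc("", env, "fallback/siegerc")

    from_env = resolve_siege_rc("", {"HOME": home, "SIEGERC": "my.rc"}, "x")
    from_switch = resolve_siege_rc("other.rc", env, "x")

print(before, after == conf, from_env, from_switch)
```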

> Siege.log says:

Now that I've read siege.conf, it seems we can eliminate the need for redirecting stdout and stderr.

>No trap files present on the server computer anywhere. Makes me wonder if
>this is not an apache/php issue, but a siege issue...

The socket timeouts are a system issue. siege is going to report its socket timeouts, as will http.

Steven Levine

unread,
Jun 25, 2022, 6:36:15 PM6/25/22
to 'Paul Smedley' via Apache for OS/2
In <9884054f-ab56-86df...@smedley.id.au>, on 06/26/22
at 07:02 AM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi Paul,

>These weren't intended to be ready for perusal yet :)

As you should realize, I can be literal minded. :-) When you said siege
built out of the box, I pulled a copy of the sources from the siege git
repo figuring that the sources would match what you used.

>I'm battling
>trying to re-generate configure for those sources.

Oh well.

Paul Smedley

unread,
Jun 25, 2022, 6:41:26 PM6/25/22
to apa...@googlegroups.com
Hey guys,

On 26/6/22 07:58, Steven Levine wrote:
> In <9884054f-ab56-86df...@smedley.id.au>, on 06/26/22
> at 07:02 AM, "'Paul Smedley' via Apache for OS/2"
> <apa...@googlegroups.com> said:
>
>> These weren't intended to be ready for perusal yet :)
>
> As you should realize, I can be literal minded. :-) When you said siege
> built out of the box, I pulled a copy of the sources from the siege git
> repo figuring that the sources would match what you used.
Don't forget, we're on 4.1.1 - which isn't the latest.

Anyway - https://smedley.id.au/tmp/siege-4.1.1-os2-20220626.zip is built
from https://github.com/psmedley/siege-os2.git

>> I'm battling
>> trying to re-generate configure for those sources.
>
> Oh well.
For some reason, the generated configure doesn't quite work out of the
box - we need sys/types.h included for some of the defines to work.

Hacked around it for now :P

Back to the subject at large, I do see in src/browser.c:
#if defined(hpux) || defined(__hpux) || defined(WINDOWS)
# define SIGNAL_CLIENT_PLATFORM
#endif

I wonder what this code path does differently? I might explore it later
today. About to pack up the house down here and then we're having lunch
with Dad, then will be home later this afternoon.

Cheers,

Paul

David McKenna

unread,
Jun 25, 2022, 6:58:40 PM6/25/22
to Apache for OS/2
 Hi Paul,

  Thanks for the new build... I'll give it a try tonight - crystal clear skies, so will be out 'till dawn with the telescope. Enjoy lunch!

Regards,

David McKenna

unread,
Jun 25, 2022, 7:05:30 PM6/25/22
to Apache for OS/2
Hi Steven,

  According to: http://www.edm2.com/index.php/SOCKETSK.SYS there are 2 parameters that can be adjusted on the SOCKETSK.SYS device line in CONFIG.SYS. Do you think maybe these can be optimized for an apache server beyond the defaults? Any docs you know of cover network optimization on OS/2?

Regards,

Steven Levine

unread,
Jun 25, 2022, 8:24:37 PM6/25/22
to 'Paul Smedley' via Apache for OS/2
In <45dcca85-0e3b-6aa5...@smedley.id.au>, on 06/26/22
at 08:11 AM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi,

>Don't forget, we're on 4.1.1 - which isn't the latest.

That part I did not miss. :-)

>g br
* (HEAD detached at v4.1.1)
master
I'll switch over when I get back to testing.

>Back to the subject at large, I do see in src/browser.c:
>#if defined(hpux) || defined(__hpux) || defined(WINDOWS)
># define SIGNAL_CLIENT_PLATFORM
>#endif

I saw that but did not review it at the time. HPUX is an old-ish unix.
Looking now, I don't think this will be a solution for us. There's also
the fact that this code references signal_handler, which does not appear
to be defined in the sources.

>About to pack up the house down here and then we're having lunch
>with Dad, then will be home later this afternoon.

Enjoy,

Steven Levine

unread,
Jun 26, 2022, 1:03:30 AM6/26/22
to 'Paul Smedley' via Apache for OS/2
at 08:11 AM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi all,
Looks good.

Steven Levine

unread,
Jun 26, 2022, 1:06:56 AM6/26/22
to apa...@googlegroups.com
In <1a1fa8f2-d73f-4269...@googlegroups.com>, on 06/25/22
at 03:58 PM, David McKenna <davidmc...@gmail.com> said:

Hi,

> Thanks for the new build... I'll give it a try tonight - crystal clear
>skies, so will be out 'till dawn with the telescope.

You get to watch the planets line up. Enjoy. I'm too close to the beach
for seeing much, with the combination of city lights and fog. If I want to
see anything, I have to drive up into the mountains.

Steven Levine

unread,
Jun 26, 2022, 1:25:25 AM6/26/22
to apa...@googlegroups.com
In <d0394b0e-2cdb-4627...@googlegroups.com>, on 06/25/22
at 04:05 PM, David McKenna <davidmc...@gmail.com> said:

Hi,

> According to: http://www.edm2.com/index.php/SOCKETSK.SYS there are 2
>parameters that can be adjusted on the SOCKETSK.SYS device line in
>CONFIG.SYS.

That list is far from complete.

FWIW, at one time Lars recommended /mbuf:512 /mem:3600, but I've never
done the analysis to know if I agree with this. The defaults are /mbuf:30
/mem:75.

Bumping these values will avoid losing packets, but is not going to
improve overall system throughput.

The best TCP/IP performance tweaks I am aware of come from tuning
INETCFG.INI. There's an old thread that discusses results:

http://www.os2world.com/forum/index.php?topic=1486.0

David McKenna

unread,
Jun 26, 2022, 9:36:19 AM6/26/22
to Apache for OS/2
Hi Steven,

  Tried using Lars recommended values you mention on both the server computer and the siege computer, but it doesn't seem to help with siege - got virtually identical results as I last reported. I'll just stick with the defaults for now...

Regards,

Steven Levine

unread,
Jun 26, 2022, 12:05:58 PM6/26/22
to apa...@googlegroups.com
In <cea78f4f-3ef7-4d6f...@googlegroups.com>, on 06/26/22
at 06:36 AM, David McKenna <davidmc...@gmail.com> said:


Hi,

> Tried using Lars recommended values you mention on both the server
>computer and the siege computer, but it doesn't seem to help with siege -
> got virtually identical results as I last reported.

This is pretty much what I expected. We would be getting messages if
socketsk was running out of resources.

The window size mods I mentioned will probably do more to help production
servers with GET requests that transfer large amounts of data. The window
size mods are not going to do much for siege because the responses to the
links in your urls.txt are going to be relatively small.

David McKenna

unread,
Jun 26, 2022, 5:09:41 PM6/26/22
to Apache for OS/2
 Well, duh! Found the source of my 'socket failures' - I stupidly disabled the 'root' account in MySQL (didn't think I needed it) and as a result, Wordpress wouldn't work (couldn't access the database), so every time siege tried to ping Wordpress, it would fail. Re-instated 'root' and now siege just keeps going and going...

Regards,

Steven Levine

unread,
Jun 26, 2022, 6:46:34 PM6/26/22
to apa...@googlegroups.com
In <008a8726-3dbe-45eb...@googlegroups.com>, on 06/26/22
at 02:09 PM, David McKenna <davidmc...@gmail.com> said:

Hi,

> Well, duh! Found the source of my 'socket failures' - I stupidly
>disabled the 'root' account in MySQL (didn't think I needed it) and as a
>result, Wordpress wouldn't work (couldn't access the database), so every
>time siege tried to ping Wordpress, it would fail. Re-instated 'root'
>and now siege just keeps going and going...

Oops. :-)

What's the rate of heap corrupted messages?

It might be worthwhile to crank up Theseus and look at memory usage.

Also, you might want to try with 50 threads.

For php requests, the default usage will be 2MB per thread for the heap and
128KB for the stack (unless overridden). How much more heap a given
script might require depends on the complexity of the php script. There
should be no problem starting with this many threads.
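
As a quick back-of-the-envelope check of those defaults, the baseline works out as below. The numbers are the 2MB heap and 128KB stack figures from above. Whatever the scripts allocate on top of that is not modelled, and the function name is made up.

```python
KB = 1024
MB = 1024 * KB


def baseline_bytes(threads, heap_per_thread=2 * MB, stack_per_thread=128 * KB):
    """Memory reserved before any script allocates more than the defaults."""
    return threads * (heap_per_thread + stack_per_thread)


for thread_count in (25, 50):
    print(thread_count, "threads:", baseline_bytes(thread_count) / MB, "MB")
```

So 50 threads start at a little over 100MB, before counting whatever the individual php scripts need.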

Ian Manners

unread,
Jun 26, 2022, 11:20:21 PM6/26/22
to apa...@googlegroups.com
Hi Everyone,

I would like to thank everyone for the work that's happening
on all these problems. It will be very much appreciated when
I get my server out of storage and back on the internet :)

Cheers
Ian


David McKenna

unread,
Jun 30, 2022, 6:57:27 PM6/30/22
to Apache for OS/2
 A little while back, Steven asked how many transactions occurred during a 20 minute siege I ran, but I couldn't say because I didn't get a report. With the latest drop of siege, I set 'time = 20M' and let it rip, then got this report:

[error] stack: 1408F119 : error:1408F119:SSL routines:ssl3_get_record:decryption
 failed or bad record mac
[error] Failed to make an SSL connection: 5
[error] SSL_write() failed (syscall)

siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:                 117726 hits
Availability:                  99.53 %
Elapsed time:                1203.71 secs
Data transferred:            4391.23 MB
Response time:                  0.25 secs
Transaction rate:              97.80 trans/sec
Throughput:                     3.65 MB/sec
Concurrency:                   24.38
Successful transactions:      112844
Failed transactions:             553
Longest transaction:            6.68
Shortest transaction:           0.00

LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

  Apache was still ticking after the siege with no trouble! No traps or POPUPLOG.  I'll attach the apache log for the time of the siege - overall, I'd say this thing is working great now!

Regards,
apache error_log

Steven Levine

unread,
Jul 1, 2022, 12:31:55 AM7/1/22
to apa...@googlegroups.com
In <df08b1da-9d93-407a...@googlegroups.com>, on 06/30/22
at 03:57 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

>With the latest drop of siege, I set 'time = 20M' and let it
>rip, then got this report:

>[error] stack: 1408F119 : error:1408F119:SSL
>routines:ssl3_get_record:decryption
> failed or bad record mac
>[error] Failed to make an SSL connection: 5
>[error] SSL_write() failed (syscall)
>siege aborted due to excessive socket failure; you
>can change the failure threshold in $HOME/.siegerc

>Transactions: 117726 hits
>Availability: 99.53 %
>Elapsed time: 1203.71 secs
>Data transferred: 4391.23 MB
>Response time: 0.25 secs
>Transaction rate: 97.80 trans/sec
>Throughput: 3.65 MB/sec
>Concurrency: 24.38
>Successful transactions: 112844
>Failed transactions: 553
>Longest transaction: 6.68
>Shortest transaction: 0.00

If I am reading this correctly, you have an interesting set of
coincidences. The siege aborted due to excessive socket failures message
means you hit the error count limit because:

main.c:530
if (my.failures > 0 && my.failed >= my.failures) {
  fprintf(stderr, "%s aborted due to excessive socket failure; you\n",
          program_name);
  fprintf(stderr, "can change the failure threshold in $HOME/.%src\n",
          program_name);
}

However, 1203 / 60 = 20.05 minutes so it's easy to assume that siege
stopped because the timer expired.

The calculated error rate is 553 / 112844 * 100 = 0.49%, which is pretty
good for a loaded system. The actual error rate is a bit better because
the version of siege you have counts each thread exit as an error.
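
The arithmetic can be reproduced straight from the report text. A rough sketch: `parse_report` is a made-up helper, and the availability formula (hits over hits plus failures) is inferred from the reports posted in this thread rather than taken from the siege sources.

```python
import re

REPORT = """\
Transactions: 117726 hits
Availability: 99.53 %
Elapsed time: 1203.71 secs
Successful transactions: 112844
Failed transactions: 553
"""


def parse_report(text):
    """Turn 'Label: value unit' lines into a {label: float} dict."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+):\s+([0-9.]+)", line)
        if match:
            values[match.group(1).strip()] = float(match.group(2))
    return values


report = parse_report(REPORT)
failed = report["Failed transactions"]
error_rate = failed / report["Successful transactions"] * 100
# Assumption: availability = hits / (hits + failures), as the reports suggest.
availability = report["Transactions"] / (report["Transactions"] + failed) * 100
minutes = report["Elapsed time"] / 60

print(round(error_rate, 2), round(availability, 2), round(minutes, 2))
```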

The

[Thu Jun 30 18:26:29.437000 2022] [mpm_mpmt_os2:notice] [pid 2103:tid 1]
(OS 10035)Resource temporarily unavailable: apr_socket_accept

reports would not show up on a production system, where loglevel would be
set to warn.

David McKenna

unread,
Jul 1, 2022, 6:18:36 AM7/1/22
to Apache for OS/2
Hi Steven,

  I always get the 'excessive socket failure message' no matter what 'time' is set to. Here is the result of 'time = 2M':

[error] stack: 1408F119 : error:1408F119:SSL routines:ssl3_get_record:decryption
 failed or bad record mac
[error] Failed to make an SSL connection: 5
[error] SSL_write() failed (syscall)
siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:                  11666 hits
Availability:                  95.33 %
Elapsed time:                 125.04 secs
Data transferred:             434.34 MB
Response time:                  0.25 secs
Transaction rate:              93.30 trans/sec
Throughput:                     3.47 MB/sec
Concurrency:                   23.66
Successful transactions:       11204
Failed transactions:             572
Longest transaction:            5.33
Shortest transaction:           0.00

LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

 Can the errors be occurring after the time set is reached? I am also using 'loglevel warn' in httpd.conf (attached). I vaguely remember discussing this issue before...

Regards,
httpd.conf

Steven Levine

unread,
Jul 1, 2022, 11:39:15 AM7/1/22
to apa...@googlegroups.com
In <c26bd08e-cc3f-487a...@googlegroups.com>, on 07/01/22
at 03:18 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> I always get the 'excessive socket failure message' no matter what
>'time' is set to. Here is the result of 'time = 2M':

Take a look at your failures setting in siege.conf. This:

>siege aborted due to excessive socket failure; you
>can change the failure threshold in $HOME/.siegerc

means you have exceeded the error limit. This:

>Failed transactions: 572

means you were getting countable errors.

> Can the errors be occuring after the time set is reached?

Until the build based on pull request #4, this is what was happening. The
pr#4 build partially fixed this and the forthcoming pr#5 build should avoid
the rest of the spurious counting.

However, there was never more than one extra count per thread and your
error count is much larger, so these are real errors.
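
The reasoning can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the figure of 25 concurrent users is an assumption taken from the siege banner of a later run in this thread, not from the failing run itself.

```python
def classify_failures(failed, concurrent_users):
    """Bound how many reported failures could be shutdown artifacts.

    Premise from the discussion: the counting bug adds at most one
    spurious failure per thread (one thread per concurrent user).
    """
    spurious_max = min(failed, concurrent_users)
    real_min = failed - spurious_max
    return spurious_max, real_min


# 572 failures reported; 25 users is an assumed value (see lead-in).
spurious_max, real_min = classify_failures(failed=572, concurrent_users=25)
print(spurious_max, real_min)
```

Under that assumption, no more than 25 of the 572 failures can be blamed on the counting bug, which leaves at least 547 to be explained some other way.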

Let's see what changes with the pr#5 build, which should arrive after Paul
wakes up.

>I am also
>using 'loglevel warn' in httpd.conf (attached). I vaguely remember
>discussing this issue before...

That's fine, but a bit confusing. Loglevel warn is supposed to suppress
the "notice" level log entries. See

https://httpd.apache.org/docs/2.4/mod/core.html#loglevel

Something for Paul and me to think about, I guess.

Paul Smedley

Jul 1, 2022, 5:16:58 PM7/1/22
to apa...@googlegroups.com

Hey Guys,

On 2/7/22 00:43, Steven Levine wrote:
> Let's see what changes with the pr#5 build which should arrive after Paul
> wakes up.
https://smedley.id.au/tmp/siege-4.1.3-os2-20220702.zip

Cheers,

Paul

David McKenna

Jul 1, 2022, 6:43:22 PM7/1/22
to Apache for OS/2
Thanks Paul (and Steven), this one is the best yet:

[C:\siege\bin]siege -f c:\siege\etc\urls.txt
** SIEGE 4.1.3
** Preparing 25 concurrent users for battle.
The server is now under siege...
Transactions:                  59058 hits
Availability:                 100.00 %
Elapsed time:                 603.85 secs
Data transferred:            2201.14 MB
Response time:                  0.24 secs
Transaction rate:              97.80 trans/sec
Throughput:                     3.65 MB/sec
Concurrency:                   23.89
Successful transactions:       56622
Failed transactions:               1
Longest transaction:            9.21
Shortest transaction:           0.00

LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

 Set for 'time=10M' and 'failures=5'. Awesome!

Regards,

Steven Levine

Jul 1, 2022, 8:27:21 PM7/1/22
to apa...@googlegroups.com
In <aa530682-c181-44d9...@googlegroups.com>, on 07/01/22
at 03:43 PM, David McKenna <davidmc...@gmail.com> said:

Hi all,

>Thanks Paul (and Steven), this one is the best yet:

Siege does seem to know how to count now.

>[C:\siege\bin]siege -f c:\siege\etc\urls.txt
>** SIEGE 4.1.3
>** Preparing 25 concurrent users for battle.
>The server is now under siege...
>Transactions: 59058 hits
>Availability: 100.00 %
>Elapsed time: 603.85 secs
>Data transferred: 2201.14 MB
>Response time: 0.24 secs
>Transaction rate: 97.80 trans/sec
>Throughput: 3.65 MB/sec
>Concurrency: 23.89
>Successful transactions: 56622
>Failed transactions: 1
>Longest transaction: 9.21
>Shortest transaction: 0.00

> Set for 'time=10M' and 'failures=5'. Awesome!

We can live with one failure every 10 minutes. I'm still a bit curious
about the run that reported

>Failed transactions: 572

Were these real failures, or is my analysis of the count issues not quite
right yet? With the server up and running, I never saw an error count
higher than the number of concurrent users.

David McKenna

Jul 2, 2022, 7:07:43 AM7/2/22
to Apache for OS/2
Hi Steven,

  With the previous drop (20220701b) I would always end up with ~550 failures and the 'siege aborted due to excessive socket failure' message no matter how long it ran: 2, 5, 10 or 20 minutes (BTW, on those tests, 'failures' was set to 256). That led me to believe most failures were occurring after the code ended the test. With the new drop (20220702) I haven't got more than 3 failures in 20 minutes, and sometimes get 0. I guess I prefer to believe the new drop is correct, but you would have better insight into that. In case it is relevant, I've been using this in httpd-mpm.conf:

<IfModule mpm_mpmt_os2_module>
    ThreadStackSize 65536
    StartServers 2
    MinSpareThreads  5
    MaxSpareThreads 10
    MaxRequestsPerChild 1000
    MaxThreads 50
</IfModule>

 and my urls.txt is:


   I still need to do some testing using my URL instead of the local IP address (which needs siege.exe set to single processor mode). Doing that seems to be much slower (fewer transactions over time), so it is less aggressive on the server, presumably because of the time taken by DNS lookups.
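
For what it is worth, the MPM settings quoted above can be related to the measured load with a rough estimate of how often a child gets recycled. This is a back-of-envelope Python sketch under loud assumptions: the function name is hypothetical, every transaction is assumed to count toward MaxRequestsPerChild (keep-alive connections may be counted differently), and the load is assumed to spread evenly over the StartServers children.

```python
def child_lifetime_secs(max_requests_per_child, requests_per_sec, active_children):
    """Estimate seconds until a child reaches MaxRequestsPerChild.

    Assumes even load across children and one counted request per
    transaction; both are simplifications, not documented behaviour.
    """
    if requests_per_sec <= 0 or active_children <= 0:
        raise ValueError("rate and child count must be positive")
    per_child_rate = requests_per_sec / active_children
    return max_requests_per_child / per_child_rate


rate = 97.80   # trans/sec reported by the 10-minute siege run
children = 2   # assumption: StartServers children share the load
lifetime = child_lifetime_secs(1000, rate, children)
print(f"{lifetime:.1f} s")
```

With MaxRequestsPerChild 1000 this comes out at roughly 20 seconds per child at that transaction rate, so the children are still being recycled fairly often during a siege run.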

Regards,

David McKenna

Jul 2, 2022, 7:15:45 AM7/2/22
to Apache for OS/2
   I should emphasize that with the previous version (20220701b) it would always run for the amount of time I set, even though it claimed to abort due to socket failures...

 Regards,

Steven Levine

Jul 2, 2022, 2:00:59 PM7/2/22
to apa...@googlegroups.com
In <f5ad2780-59b8-4cf4...@googlegroups.com>, on 07/02/22
Regards,