Siege 4.1.3


Paul Smedley

Jun 26, 2022, 5:07:45 AM
to Apache HTTP Server for OS/2
I managed to get 4.1.3 building....
https://smedley.id.au/tmp/siege-4.1.3-os2-20220626.zip

David McKenna

Jun 26, 2022, 9:38:20 AM
to Apache for OS/2
Thanks Paul! This one works as well as the last, with all the same quirks...

Regards,

Steven Levine

Jun 26, 2022, 12:12:54 PM
to apa...@googlegroups.com
In <9e31d806-6acf-4d9e...@googlegroups.com>, on 06/26/22
at 06:38 AM, David McKenna <davidmc...@gmail.com> said:

Hi,

>Thanks Paul! This one works as well as the last, with all the same
>quirks...

I pushed an interim patch that should convince siege to respond better to
Ctrl-C kill requests.

Your ticket #3177 has not gotten much attention recently. You might want
to post a note to the ticket asking for a status update.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <ste...@earthlink.net> Warp/DIY/BlueLion etc.
www.scoug.com www.arcanoae.com www.warpcave.com
----------------------------------------------------------------------

Paul Smedley

Jun 27, 2022, 5:03:34 AM
to apa...@googlegroups.com

Hey all,

On 27/6/22 01:36, Steven Levine wrote:
> In <9e31d806-6acf-4d9e...@googlegroups.com>, on 06/26/22
> at 06:38 AM, David McKenna <davidmc...@gmail.com> said:
>
> Hi,
>
>> Thanks Paul! This one works as well as the last, with all the same
>> quirks...
>
> I pushed an interim patch that should convince siege to respond better to
> Ctrl-C kill requests.
https://smedley.id.au/tmp/siege-4.1.3-os2-20220627.zip includes this
patch- seems to work OK here.

Cheers,

Paul

David McKenna

Jun 27, 2022, 6:54:35 AM
to Apache for OS/2
Thanks Paul! This one is a mixed bag here - <CTRL>C does work, but I can't get it to use my URLS.TXT file - when I try 'siege -f C:\siege\etc\urls.txt' (like I always do) it comes back with a list of options as if I typed something wrong. It also still does not honor the 'time' directive (I set to 2 minutes) and I wonder if it is not honoring the 'failures' directive now too. Here is the result of 'siege 192.168.21.2' after running about 4 minutes I hit <CTRL>C:

[C:\siege\bin]siege 192.168.21.2
** SIEGE 4.1.3
** Preparing 25 concurrent users for battle.
The server is now under siege...
Lifting the server siege...siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:                 460718 hits
Availability:                  99.94 %
Elapsed time:                 236.22 secs
Data transferred:             172.23 MB
Response time:                  0.01 secs
Transaction rate:            1950.38 trans/sec
Throughput:                     0.73 MB/sec
Concurrency:                   24.64
Successful transactions:      460718
Failed transactions:             280
Longest transaction:            1.03
Shortest transaction:           0.00

LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

 Notice it says 'siege aborted due to excessive socket failure' AFTER I hit <CTRL>C. Maybe the <CTRL>C is the cause of the failures?...

Regards,

David McKenna

Jun 27, 2022, 7:11:58 AM
to Apache for OS/2
 Failures set to '512':

[C:\siege\bin]siege 192.168.21.2
** SIEGE 4.1.3
** Preparing 25 concurrent users for battle.
The server is now under siege...
Lifting the server siege...siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:                  11565 hits
Availability:                  95.57 %
Elapsed time:                   6.27 secs
Data transferred:               4.32 MB
Response time:                  0.01 secs
Transaction rate:            1844.50 trans/sec
Throughput:                     0.69 MB/sec
Concurrency:                   24.66
Successful transactions:       11565
Failed transactions:             536
Longest transaction:            1.17
Shortest transaction:           0.00

LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

   and failures set to 128:-)

[C:\siege\bin]siege 192.168.21.2
** SIEGE 4.1.3
** Preparing 25 concurrent users for battle.
The server is now under siege...
Lifting the server siege...siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:                   8746 hits
Availability:                  98.29 %
Elapsed time:                   5.00 secs
Data transferred:               3.27 MB
Response time:                  0.01 secs
Transaction rate:            1749.20 trans/sec
Throughput:                     0.65 MB/sec
Concurrency:                   21.91
Successful transactions:        8746
Failed transactions:             152
Longest transaction:            1.10
Shortest transaction:           0.00

LOG FILE: /var/log/siege.log
You can disable this log file notification by editing
D:\HOME/.siege/siege.conf and changing 'show-logfile' to false.

 So it appears that 'failures' is what is actually stopping the siege...
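
For anyone reading these reports: the figures appear to be internally consistent with two simple ratios, availability as transactions over transactions plus failures, and transaction rate as transactions over elapsed time. A minimal sketch in C follows. The helper names are hypothetical and this is not siege's own code; returning 0 when nothing was attempted is an assumption.

```c
#include <assert.h>

/* Hypothetical helpers, not taken from siege's sources. */

/* Availability in percent: the share of attempts that did not fail. */
double availability_pct(long transactions, long failed)
{
    assert(transactions >= 0 && failed >= 0);
    long attempts = transactions + failed;
    if (attempts == 0)
        return 0.0;                 /* assumption: nothing attempted -> 0 */
    return 100.0 * (double)transactions / (double)attempts;
}

/* Transactions per second over the elapsed wall-clock time. */
double transaction_rate(long transactions, double elapsed_secs)
{
    assert(transactions >= 0 && elapsed_secs >= 0.0);
    if (elapsed_secs == 0.0)
        return 0.0;
    return (double)transactions / elapsed_secs;
}
```

With the numbers from the failures = 512 run, availability_pct(11565, 536) comes to about 95.57 and transaction_rate(11565, 6.27) to about 1844.50, which matches the report.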

Regards,

Steven Levine

Jun 27, 2022, 12:25:21 PM
to apa...@googlegroups.com
In <5c4029f6-3b8a-4a53...@googlegroups.com>, on 06/27/22
at 04:11 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> So it appears that 'failures' is what is actually stopping the siege...

As I mentioned to Paul, this is just the 1st cut at getting the
termination code to work without a working pthread_cancel.

It's sorta working. In response to the Ctrl-C all but one of the crew
threads terminated as expected. The stuck thread prevents the process
from terminating when failures = 0 is configured.
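
As background, a common substitute for a missing pthread_cancel is cooperative cancellation: each crew thread polls a shared flag between units of work and ends itself. The sketch below shows only that general pattern. The names cancel_requested, crew_worker and run_crew are hypothetical and the real siege patch may look quite different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_CREW 64

static atomic_bool cancel_requested;   /* set once, read by every worker */

/* One crew thread: do a unit of work, then check the flag. */
static void *crew_worker(void *arg)
{
    atomic_long *done = arg;
    struct timespec pause = { 0, 1000000L };   /* 1 ms stand-in for a request */

    while (!atomic_load(&cancel_requested)) {
        nanosleep(&pause, NULL);
        atomic_fetch_add(done, 1);
    }
    return NULL;   /* the thread ends itself; nobody has to cancel it */
}

/* Start `crew` workers, let them run for `run_ms` milliseconds, request a
 * stop, and join them. Returns the number of threads that were joined. */
int run_crew(int crew, long run_ms)
{
    assert(crew > 0 && crew <= MAX_CREW && run_ms >= 0);
    pthread_t tid[MAX_CREW];
    atomic_long done = 0;
    int started = 0, joined = 0;

    atomic_store(&cancel_requested, false);
    for (int i = 0; i < crew; i++)
        if (pthread_create(&tid[started], NULL, crew_worker, &done) == 0)
            started++;

    struct timespec run = { run_ms / 1000, (run_ms % 1000) * 1000000L };
    nanosleep(&run, NULL);

    atomic_store(&cancel_requested, true);   /* a Ctrl-C path would do this */
    for (int i = 0; i < started; i++)
        if (pthread_join(tid[i], NULL) == 0)
            joined++;
    return joined;
}
```

The limitation of this design is that a thread blocked inside a long call does not see the flag until that call returns, so the join can wait on it.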

Paul Smedley

Jun 29, 2022, 3:45:12 AM
to apa...@googlegroups.com
Hey Guys,

On 28/6/22 01:46, Steven Levine wrote:
> In <5c4029f6-3b8a-4a53...@googlegroups.com>, on 06/27/22
> at 04:11 AM, David McKenna <davidmc...@gmail.com> said:
>
> Hi David,
>
>> So it appears that 'failures' is what is actually stopping the siege...
>
> As I mentioned to Paul, this is just the 1st cut at getting the
> termination code to work without a working pthread_cancel.
>
> It's sorta working. In response to the Ctrl-C all but one of the crew
> threads terminated as expected. The stuck thread prevents the process
> from terminating when failures = 0 is configured.

The following build has additional fixes from Steven -
https://github.com/psmedley/siege-os2/pull/2 plus some fixes to make it
link from me.

https://smedley.id.au/tmp/siege-4.1.3-os2-20220629.zip

Cheers,

Paul

Steven Levine

Jun 29, 2022, 10:42:13 AM
to 'Paul Smedley' via Apache for OS/2
In <0ce48cdd-5175-e25a...@smedley.id.au>, on 06/29/22
at 05:15 PM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi Paul,

>The following build has additional fixes from Steven -
>https://github.com/psmedley/siege-os2/pull/2 plus some fixes to make it
>link from me.

>https://smedley.id.au/tmp/siege-4.1.3-os2-20220629.zip

Sorry about the sloppy coding.

We are getting closer. This build stops on Ctrl-C, unless all the socket
connects fail. I thought I had the needed patches in place for this, but
I guess not. pr#3 coming. :-)

Paul, when you get a moment, please add a copy of readme.os2 to the repo
along with a reference copy of your config.h.

Thanks,

Steven Levine

Jun 29, 2022, 11:18:53 AM
to 'Paul Smedley' via Apache for OS/2
In <0ce48cdd-5175-e25a...@smedley.id.au>, on 06/29/22
at 05:15 PM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi all,

Looks like the required code is in place, but the signal is getting lost
when there's too much output to the console.

I'll try pulling a sleep call after the socket connect failures and see if
this is sufficient.

Steven Levine

Jun 29, 2022, 5:09:18 PM
to 'Paul Smedley' via Apache for OS/2
In <0ce48cdd-5175-e25a...@smedley.id.au>, on 06/29/22
at 05:15 PM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi all,

Turns out the lost Ctrl-C requests are only a problem when running
concurrent = 1.

Even two crew threads allows the Ctrl-C to be recognized. I was running
with one thread because it makes it vastly easier to step through the code
in the debugger. This is not a standard configuration. With one thread, I
can avoid the Ctrl-C failures with:

delay = 0.001

so, I don't see the need for additional patches at this time.

It's not clear exactly why Ctrl-C requests are not getting passed to the
signal handler thread (thread 2), but I have some ideas. I suspect the
signal gets discarded as a side effect of some other kernel processing
related to the almost continuous screen output.

Running

siege -f urls.txt >nul 2>&1

suppresses all screen output and also allows Ctrl-C to be effective.

So siege-on and let us know if you run into any issues we need to look at.
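
For readers unfamiliar with the idea of a dedicated signal handler thread: on POSIX systems it is typically built by blocking the signals in every thread and collecting them synchronously with sigwait in one thread. The sketch below shows that general pattern only. It is not siege's handler code, it says nothing about how OS/2 delivers Ctrl-C, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int last_signal;   /* 0 until the handler thread sees a signal */

/* Handler thread: waits synchronously for one of the blocked signals. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig = 0;

    if (sigwait(set, &sig) == 0)
        atomic_store(&last_signal, sig);
    return NULL;
}

/* Block SIGINT/SIGTERM, start the handler thread, send `sig` to the
 * process, and return the signal the handler thread recorded (0 on error). */
int deliver_to_handler_thread(int sig)
{
    assert(sig == SIGINT || sig == SIGTERM);
    sigset_t set, old;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    /* Block before creating threads so every new thread inherits the mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0)
        return 0;

    atomic_store(&last_signal, 0);
    if (pthread_create(&tid, NULL, signal_thread, &set) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return 0;
    }
    kill(getpid(), sig);             /* stands in for the user's Ctrl-C */
    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return atomic_load(&last_signal);
}
```

Because the signal stays pending while it is blocked, it is not lost even if it arrives before the handler thread reaches sigwait.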

Paul Smedley

Jun 30, 2022, 6:00:46 AM
to apa...@googlegroups.com
Hey Steven,

On 30/6/22 00:03, Steven Levine wrote:
> In <0ce48cdd-5175-e25a...@smedley.id.au>, on 06/29/22
> at 05:15 PM, "'Paul Smedley' via Apache for OS/2"
> <apa...@googlegroups.com> said:
>> The following build has additional fixes from Steven -
>> https://github.com/psmedley/siege-os2/pull/2 plus some fixes to make it
>> link from me.
>
>> https://smedley.id.au/tmp/siege-4.1.3-os2-20220629.zip
>
> Sorry about the sloppy coding.
No problem:)

> Paul, when you get a moment, please add a copy of readme.os2 to the repo
> along with a reference copy of your config.h.
Done.

Cheers,

Paul

David McKenna

Jun 30, 2022, 6:29:26 AM
to Apache for OS/2
 Thanks guys for the new version and all the work you put into siege! This one seems to work well - only issue is the 'time' directive is still ignored.

  The latest Apache and php8.1 builds hold up against siege very well here - haven't had them crash yet. Can't say the same about AFINETK and SOCKETSK, it seems siege really finds their weaknesses and crashes the system. Running ACPI.PSD /MAXCPU=1 cures that, but otherwise I get a trap in one or the other (on either the server computer or the siege computer) at least once every time running siege with SMP. Setting siege itself to single-processor mode helps, but still get traps even that way occasionally.

Regards,

Paul Smedley

Jun 30, 2022, 6:52:24 AM
to apa...@googlegroups.com
Hi Dave,

On 30/6/22 19:59, David McKenna wrote:
>  Thanks guys for the new version and all the work you put into siege!
> This one seems to work well - only issue is the 'time' directive is
> still ignored.

You mean this parameter?
-t NUMm, --time=NUMm
This option is similar to --reps but instead of specifying the number
of times each user should run, it specifies the amount of time each
should run.

The value format is “NUMm”, where “NUM” is an amount of time and the “m”
modifier is either S, M, or H for seconds, minutes and hours. To run
siege for an hour, you could select any one of the following
combinations: -t3600S, -t60M, -t1H. The modifier is not case sensitive,
but it does require no space between the number and itself.
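
To make the format concrete, here is a rough sketch of a parser for the "NUMm" strings described above. It is an illustration only, not siege's actual parser. The name parse_duration is hypothetical, and rejecting a bare number without a modifier is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Convert "NUMm" (for example "3600S", "60M", "1H") to seconds.
 * Returns -1 for anything malformed, including a space before the
 * modifier. Hypothetical helper, not siege's own code. */
long parse_duration(const char *text)
{
    assert(text != NULL);
    char *end = NULL;

    if (!isdigit((unsigned char)text[0]))
        return -1;
    long num = strtol(text, &end, 10);
    if (end == text || num < 0 || num > LONG_MAX / 3600)
        return -1;
    /* Exactly one modifier character must follow the number directly. */
    if (end[0] == '\0' || end[1] != '\0')
        return -1;

    switch (toupper((unsigned char)end[0])) {   /* not case sensitive */
    case 'S': return num;
    case 'M': return num * 60;
    case 'H': return num * 3600;
    default:  return -1;
    }
}
```

With this sketch, "3600S", "60M" and "1H" all come out as 3600 seconds, and "15 s" is rejected because of the space.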

I'll try to investigate.

Cheers,

Paul

David McKenna

Jun 30, 2022, 7:06:04 AM
to Apache for OS/2
Hi Paul,

  I have to admit I didn't try it on the command line. I have 'time = 2M' in my siege.conf file, and when running siege, it never stops at 2 minutes (or any other value I try). Not a big deal, but would be nice if it worked.

Regards,

Steven Levine

Jun 30, 2022, 11:09:13 AM
to 'Paul Smedley' via Apache for OS/2
In <82f7bc8e-f9ee-caca...@smedley.id.au>, on 06/30/22
at 08:22 PM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi guys,

>The value format is NUMm , where NUM is an amount of time and the m
>modifier is either S, M, or H for seconds, minutes and hours. To run
>siege for an hour, you could select any one of the following
>combinations: -t3600S, -t60M, -t1H. The modifier is not case sensitive,
> but it does require no space between the number and itself.

>I'll try to investigate.

This fails for the same reason as the Ctrl-C kill failed.

With debug enabled, you will get the message:

if(my.debug){ printf("TIMED OUT!!\n"); fflush(stdout); }

but since siege_timer also uses the unimplemented

pthread_kill(handler, SIGTERM);

nothing useful happens. Patch coming, or more accurately, a modified patch
coming that sets os2_pthread_cancel_requested as well as testing it.
If this is not sufficient, we will have to expose timer_mutex so that
crew_cancel can post it if needed.

We will try the simpler solution first.
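
A rough sketch of the idea, for readers following along: instead of signalling the handler thread with pthread_kill, the timer sets a shared cancel flag when the time limit expires, and another thread can wake the timer early. All names below are stand-ins chosen for illustration (cancel_requested plays the role of os2_pthread_cancel_requested, timer_lock the role of timer_mutex). The actual patch is not shown in this thread and will differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Stand-ins for illustration only. */
static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  timer_cond = PTHREAD_COND_INITIALIZER;
static bool cancel_requested;   /* crew threads would test this flag */
static bool stop_early;         /* set when something else ends the run */

/* Wait up to `ms` milliseconds, then flag the crew to stop.
 * Returns true if the time limit expired, false if woken early. */
bool timer_wait_then_cancel(long ms)
{
    assert(ms >= 0);
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&timer_lock);
    while (!stop_early && rc == 0)
        rc = pthread_cond_timedwait(&timer_cond, &timer_lock, &deadline);
    cancel_requested = true;    /* set the flag instead of sending a signal */
    pthread_mutex_unlock(&timer_lock);
    return rc == ETIMEDOUT;
}

/* What a Ctrl-C path could call to end the timer wait early. */
void timer_stop_early(void)
{
    pthread_mutex_lock(&timer_lock);
    stop_early = true;
    pthread_cond_signal(&timer_cond);
    pthread_mutex_unlock(&timer_lock);
}

/* Read the flag under the same lock that protects its writer. */
bool is_cancel_requested(void)
{
    pthread_mutex_lock(&timer_lock);
    bool value = cancel_requested;
    pthread_mutex_unlock(&timer_lock);
    return value;
}
```

The early-wake function corresponds to the fallback mentioned above, where another thread posts the timer rather than waiting for it to expire.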

Paul Smedley

Jun 30, 2022, 5:13:10 PM
to apa...@googlegroups.com
Hey All,

On 1/7/22 00:31, Steven Levine wrote:
> In <82f7bc8e-f9ee-caca...@smedley.id.au>, on 06/30/22
> at 08:22 PM, "'Paul Smedley' via Apache for OS/2"
> <apa...@googlegroups.com> said:
>
> Hi guys,
>
>> The value format is NUMm , where NUM is an amount of time and the m
>> modifier is either S, M, or H for seconds, minutes and hours. To run
>> siege for an hour, you could select any one of the following
>> combinations: -t3600S, -t60M, -t1H. The modifier is not case sensitive,
>> but it does require no space between the number and itself.
>
>> I'll try to investigate.
>
> This fails for the same reason as the Ctrl-C kill failed.
>
> With debug enabled, you will get the message:
>
> if(my.debug){ printf("TIMED OUT!!\n"); fflush(stdout); }
>
> but since siege_timer also uses the unimplemented
>
> pthread_kill(handler, SIGTERM);
>
> nothing useful happens. Patch coming, or more accurately, a modified patch
> coming that sets os2_pthread_cancel_requested as well as testing it.
> If this is not sufficient, we will have to expose timer_mutex so that
> crew_cancel can post it if needed.
>
> We will try the simpler solution first.

Simple seems to work -
https://smedley.id.au/tmp/siege-4.1.3-os2-20220701.zip

Using:
siege.exe -t 15s https://os2ports.smedley.id.au

I get:
Transactions: 1003 hits
Availability: 83.93 %
Elapsed time: 17.49 secs
Data transferred: 6.24 MB
Response time: 0.34 secs
Transaction rate: 57.35 trans/sec
Throughput: 0.36 MB/sec
Concurrency: 19.75
Successful transactions: 418
Failed transactions: 192
Longest transaction: 1.29
Shortest transaction: 0.01

Cheers,

Paul

David McKenna

Jun 30, 2022, 5:49:15 PM
to Apache for OS/2
Yup... time=2M in siege.conf:

[C:\siege\bin]siege -f c:\siege\etc\urls.txt

** SIEGE 4.1.3
** Preparing 25 concurrent users for battle.
The server is now under siege...siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions:                  11712 hits
Availability:                  93.83 %
Elapsed time:                 126.89 secs
Data transferred:             434.81 MB
Response time:                  0.25 secs
Transaction rate:              92.30 trans/sec
Throughput:                     3.43 MB/sec
Concurrency:                   23.03
Successful transactions:       11238
Failed transactions:             770
Longest transaction:            6.20
Shortest transaction:           0.00

 Although it says aborted due to socket failure, I tried different times, and they all worked. Thanks!

Regards,

Steven Levine

Jun 30, 2022, 6:24:59 PM
to 'Paul Smedley' via Apache for OS/2
In <9c8f3e88-4c41-bf01...@smedley.id.au>, on 07/01/22
at 06:43 AM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi,

>Using:
>siege.exe -t 15s https://os2ports.smedley.id.au

>I get:
>Transactions: 1003 hits
>Availability: 83.93 %
>Elapsed time: 17.49 secs
>Data transferred: 6.24 MB
>Response time: 0.34 secs
>Transaction rate: 57.35 trans/sec
>Throughput: 0.36 MB/sec
>Concurrency: 19.75
>Successful transactions: 418
>Failed transactions: 192
>Longest transaction: 1.29
>Shortest transaction: 0.01

Looks good here too. There's a minor nit with the Failed transactions
count. Running with 50 threads, I always see:

Failed transactions: 50

regardless of the duration. What is probably happening is that when
os2_pthread_cancel_requested is TRUE, __request correctly returns FALSE
here:

browser.c:312
  if ((ret = __request(this, tmp))==FALSE) {
    __increment_failures();
  }

So, the request that never happened is counted as an error. Patching this
to

  if ((ret = __request(this, tmp))==FALSE &&
      !os2_pthread_cancel_requested) {
    __increment_failures();
  }

will correct the counting.
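
The effect of that extra condition can be shown with a small single-threaded simulation. This is only a sketch: fake_request stands in for __request, cancel_requested for os2_pthread_cancel_requested, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool cancel_requested;   /* stand-in for os2_pthread_cancel_requested */
static int  failures;

/* Stand-in for a request: it refuses to start once a cancel is pending. */
static bool fake_request(bool server_ok)
{
    if (cancel_requested)
        return false;
    return server_ok;
}

/* One attempt. With `patched` set, a refusal caused by a pending cancel
 * is not counted as a failure. */
static void attempt(bool server_ok, bool patched)
{
    if (!fake_request(server_ok) && !(patched && cancel_requested))
        failures++;
}

/* Simulate `threads` workers each making one last attempt after the
 * cancel was requested. Returns the number of failures counted. */
int failures_after_cancel(int threads, bool patched)
{
    assert(threads >= 0);
    failures = 0;
    cancel_requested = true;
    for (int i = 0; i < threads; i++)
        attempt(true, patched);
    cancel_requested = false;
    return failures;
}
```

With 50 simulated threads, the unpatched path counts 50 failures and the patched path counts none, which is consistent with the constant count of 50 described above.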

Now, I guess it's back to trying to expose the remaining httpd/php issues.

If anyone gets the urge, they might want to see how well ab.exe, which
ships with httpd, is working these days. It's always good to have an
alternate testing tool available.

It may also be time for me to understand what "guru meditation" means in
Massimo's world.

On a somewhat related note, I have an update to deadman.exe v0.6 almost
ready to release. It adds the ability to monitor multiple log files,
which may be useful if running multiple vhosts. v0.5 added support for
rebooting on request. This should be more robust than the typical setboot
/b method which can fail if the system is low or out of resources. A
deadman requested reboot probably can still fail, but the probability is
much lower than other reboot methods.

On a less related note, I finally got the urge to take another look at
cvs2git. Cvs2git converts a CVS repository to a git repository. It's a
python app, and at one time, our python port was not up to running the
code. Once the python port issues got resolved, I was unable to
configure the cvs2svn options to export the files with proper DOS line
endings. In cvs speak, we need to do cvs co -kkv for text files and cvs co
-kb for binary files. This gives the expected CR/LF line endings for text
files, expands the keywords, and leaves CRs in binary files unsullied.
Expanding the keywords during conversion makes sense because git does not
do keyword expansion. The expanded keywords serve as a historical
comment.

I'm still not able to set the options to make this happen, so I added some
temporary code to override the selected options to do what I wanted them to
do. Perhaps Michael Haggerty, the cvs2svn maintainer, can tell me what
options I should be using.

Steven Levine

Jun 30, 2022, 9:55:38 PM
to apa...@googlegroups.com
In <70d4b3d2-52b0-4870...@googlegroups.com>, on 06/30/22
at 02:49 PM, David McKenna <davidmc...@gmail.com> said:

Hi,

>Yup... time=2M in siege.conf:

>[C:\siege\bin]siege -f c:\siege\etc\urls.txt
>** SIEGE 4.1.3
>** Preparing 25 concurrent users for battle.
>The server is now under siege...siege aborted due to excessive socket
>failure; you
>can change the failure threshold in $HOME/.siegerc

>Transactions: 11712 hits
>Availability: 93.83 %
>Elapsed time: 126.89 secs
>Data transferred: 434.81 MB
>Response time: 0.25 secs
>Transaction rate: 92.30 trans/sec
>Throughput: 3.43 MB/sec
>Concurrency: 23.03
>Successful transactions: 11238
>Failed transactions: 770
>Longest transaction: 6.20
>Shortest transaction: 0.00

> Although it says aborted due to socket failure, I tried different times,
> and they all worked. Thanks!

That's because the timer was ignored. The message means that siege died
because the failure limit was exceeded, rather than the timer expiring.

The message is misleading because the counter increments for any error,
not just socket errors.

Paul Smedley

Jul 1, 2022, 2:26:09 AM
to apa...@googlegroups.com

Paul Smedley

Jul 1, 2022, 2:30:26 AM
to apa...@googlegroups.com
Latest results for a 2 minute test here (apache2 running on Ubuntu FYI)

Transactions: 12280 hits
Availability: 100.00 %
Elapsed time: 121.41 secs
Data transferred: 29.39 MB
Response time: 0.24 secs
Transaction rate: 101.14 trans/sec
Throughput: 0.24 MB/sec
Concurrency: 24.33
Successful transactions: 2013
Failed transactions: 0
Longest transaction: 4.39
Shortest transaction: 0.01

Steven Levine

Jul 1, 2022, 11:00:34 AM
to 'Paul Smedley' via Apache for OS/2
In <21f6f9ac-5b1f-0096...@smedley.id.au>, on 07/01/22
at 03:56 PM, "'Paul Smedley' via Apache for OS/2"
<apa...@googlegroups.com> said:

Hi all,
This build counts better, but there's one more patch coming.

With -t5s and with the server not running I get the expected:

...
[error] socket: unable to connect sock.c:282: Invalid argument

Transactions: 0 hits
Availability: 0.00 %
Elapsed time: 4.99 secs
Data transferred: 0.00 MB
Response time: 0.00 secs
Transaction rate: 0.00 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 0.00
Successful transactions: 0
Failed transactions: 2456
Longest transaction: 0.00
Shortest transaction: 0.00

Fun fact. Running with the stderr redirected, we get:

[ [1;33merror [0m] socket: unable to connect sock.c:282: Invalid argument
siege aborted due to excessive socket failure; you
can change the failure threshold in $HOME/.siegerc

Transactions: 0 hits
Availability: 0.00 %
Elapsed time: 0.53 secs
Data transferred: 0.00 MB
Response time: 0.00 secs
Transaction rate: 0.00 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 0.00
Successful transactions: 0
Failed transactions: 10049
Longest transaction: 0.00
Shortest transaction: 0.00

which shows how much excessive screen output can affect performance. It
also tells me that I am still not suppressing all the spurious error
counts.

Redirecting output with

>siege -t5s -f urls.txt 2>tmp.out

and failures = 0, we get:

[ [1;33merror [0m] socket: unable to connect sock.c:282: Invalid argument

Transactions: 0 hits
Availability: 0.00 %
Elapsed time: 4.25 secs
Data transferred: 0.00 MB
Response time: 0.00 secs
Transaction rate: 0.00 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 0.00
Successful transactions: 0
Failed transactions: 84922
Longest transaction: 0.00
Shortest transaction: 0.00

which is significantly more failures per second. :-)
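
For the record, the failures-per-second figures can be worked out from the "Failed transactions" and "Elapsed time" lines of the three reports above. The helper below is a trivial illustration with a hypothetical name, not part of siege.

```c
#include <assert.h>

/* Failures per second, from the "Failed transactions" and
 * "Elapsed time" lines of a report. Hypothetical helper. */
double failures_per_second(long failed, double elapsed_secs)
{
    assert(failed >= 0 && elapsed_secs > 0.0);
    return (double)failed / elapsed_secs;
}
```

Applied to the three runs: 2456 / 4.99 is roughly 492 per second with screen output, 10049 / 0.53 is roughly 18960 per second with stderr redirected, and 84922 / 4.25 is roughly 19982 per second with stderr redirected and failures = 0.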

Steven Levine

Jul 16, 2022, 1:18:39 PM
to apa...@googlegroups.com
In <2023fc86-fd07-4da4...@googlegroups.com>, on 06/30/22
at 03:29 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

It's time to collect my notes for your ticket 3177.

You get intermittent trap E's in afinetk or socketsk if accessing via
www.davemckenna.com and not running /MAXCPU=1.

Running /MAXCPU=1 or accessing via 192.168.21.2 avoids the traps.

Is this correct?

BTW, there should be a new E1000B build available soon. When it appears,
you might as well test it. It's unlikely to avoid the traps, but when/if
we get back to debugging this issue, we will want to be testing against
current binaries.

Another BTW, does your MB have a usable serial port connector and do you
happen to have a 2nd machine with a usable serial port connector? If so,
kernel debugging the issue is a possibility.

David McKenna

Jul 16, 2022, 5:34:58 PM
to Apache for OS/2
Hi Steven,

  Ticket 3177 (at ArcaNoae Mantis) is for my server computer, which is running Apache 2.4.53 server with php 8.1. It has an Intel NIC. If I run full SMP, then 'siegeing' the apache server (whether www.davemckenna.com or 192.168.21.2) from another computer hooked to the same router switch (usually my desktop computer) will eventually produce a trap in AFINETK; I uploaded an example of this to ArcaNoae. If I set that computer to run /MAXCPU=1, I don't seem to get the traps in AFINETK. I have never got a trap in SOCKETSK on that computer.

  On the other hand, my desktop computer where I run 'siege' from, has seen traps in both AFINETK and SOCKETSK (and other things too) when I 'siege' using www.davemckenna.com, but rarely (but not never) when using 192.168.21.2. It has a Realtek NIC. If I set 'siege' to run in single processor mode using 'Execmode -sp' then I can use www.davemckenna.com as well as 192.168.21.2 (but even then will very occasionally get a trap). Never tried the desktop computer using /MAXCPU=1. I never reported this to ArcaNoae because of the experimental nature of getting siege working here.

  Hope this clarifies my setup/situation and sorry if there is any confusion. I do have a serial port on both the server computer and desktop computer, and can collect debugging data from them (with a laptop). Also already updated the Intel NIC driver from the ArcaNoae 'Experimental Builds' download link.

Regards,

Steven Levine

Jul 18, 2022, 1:08:52 AM
to apa...@googlegroups.com
In <66bc229c-be3e-411e...@googlegroups.com>, on 07/16/22
at 02:34 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

Thanks. I think I now have a better understanding of the moving parts.

>Never tried the desktop computer using /MAXCPU=1.

It might be a worthwhile test. Execmode -sp is functionally quite
different from /MAXCPU=1. Execmode -sp forces all the process's threads
to run on the same CPU, but network traffic for the process can still
occur on another CPU.

>I never reported this
>to ArcaNoae because of the experimental nature of getting siege working
>here.

We had porting issues to resolve, but we did nothing I would call
experimental. Siege is a typical TCP/IP application that follows the
standards.

I have copies of both Ticket3177-20210828-dump.7z and
Ticket3177-10152021.7z. The latter is a bit easier to work with, but both
traps occurred for the same reason - the edx content is borked. The trap
is at:

# %ln -m %f2523443 (eip)
!afinetk ip_insertoptions + F

# u %f2523443
%f2523443 8b4a08 mov ecx,dword ptr [edx+08] ; trap here

The traps occur because EDX=0000836c, which is not a valid 32-bit pointer.
I have some ideas how this might have occurred, but I need to spend more
time with the dump file and try to figure out how and when the pointer
went bad.
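
For readers who do not read x86 assembly: a load like mov ecx,dword ptr [edx+08] is the kind of instruction a compiler emits to read a 32-bit field at offset 8 from a structure pointer held in EDX. The sketch below uses a made-up structure to show the arithmetic. Which field the driver was actually reading is not known from this thread, so every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: any struct whose third 32-bit field sits at offset 8. */
struct record {
    uint32_t first;    /* offset 0 */
    uint32_t second;   /* offset 4 */
    uint32_t third;    /* offset 8: what [edx+08] would address */
};

/* The C-level equivalent of the faulting load when `r` is held in EDX. */
uint32_t load_third(const struct record *r)
{
    assert(r != NULL);
    return r->third;
}

/* Address the load touches for a given pointer value: base plus 8. */
uintptr_t load_address(uintptr_t base)
{
    return base + offsetof(struct record, third);
}
```

With the reported EDX value, load_address(0x836c) gives 0x8374, so the instruction tried to read from that offset rather than from a real structure.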

The afinetk and socketsk drivers differ in a number of ways from the legacy
afinet and socket drivers. The K drivers are mostly 32-bit and use the
KEE interface. When the KEE drivers are in use, the ring3 part of the
TCP/IP stack sets up a dynamic call gate and calls directly into the
socketsk (IIRC) driver, passing ready-to-use 32-bit pointers.

At the time of the trap, the ring3 state of the active thread was:

Current slot number: 00a5
Slot Pid Ppid Csid Ord Sta Pri pTSD pPTDA pTCB Disp SG Name
*00a5# 0185 0183 0185 0007 run 0200 f8ca4000 f9502228 f944ebac 0e88 14 HTTPX

eax=00000079 ebx=00000185 ecx=00000000 edx=02a1f970 esi=02a1fa18 edi=02a1f9d8
eip=1e480027 esp=02a1f930 ebp=02a1fa34 iopl=0 -- -- -- nv up ei pl zr na pe nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=00000000 cr3=00211000 p=00
005b:1e480027 c3 retd

# %ln -m %1e480027
CallGate + 8

CallGate is a function in tcpip32.dll which, as its name implies, is how we
entered the driver.

The bad news is we don't have sources for the socketsk and afinetk
drivers. The good news is that the freebsd sources are available and they
are close enough to what's in the OS/2 drivers to be useful.

David McKenna

Jul 18, 2022, 6:51:24 AM
to Apache for OS/2
Hi Steven,

  Thanks for looking at the trap file - hope it contains the key to a fix, although it's not clear to me how you can fix it without the OS/2 source...

  I do have a system dump from the desktop computer running siege, so I guess I'll create a ticket at ArcaNoae once I've done a little more testing (especially with /MAXCPU=1). Hope you don't mind if I drop your name :-)

Regards,

Steven Levine

Jul 22, 2022, 1:10:42 PM
to apa...@googlegroups.com
In <b14a05fe-71b1-49f2...@googlegroups.com>, on 07/18/22
at 03:51 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> Thanks for looking at the trap file - hope it contains the key to a
>fix, although it's not clear to me how you can fix it without the OS/2
>source...

It's a skill set kind of thing. I'm in my 70's so I come from the days
when source code did not exist as we think of it today. While my machine
language skills are rusty compared to what they were back when I started
with computers, they are still good enough for this kind of debugging.

PM coming.

David McKenna

Jul 22, 2022, 6:36:08 PM
to Apache for OS/2
Hi Steven,

  Got your PM and downloaded the file. Installed it on both the Apache (server) computer and the siege (desktop) computer. Both computers are running full SMP. In no case did I get a system trap on the Apache computer. I did get one on the siege computer.
Did these tests:

1st run: siege is set for single processor mode and use 192.168.21.2 to address the apache server. Set up siege for 15 minutes, but it stopped due to errors (set to 512) before then with about a 98% success rate. The apache server was still running, although there were a couple exceptq files there - attached. No system traps.

2nd run: siege is set for multi-processor mode and use 192.168.21.2 to address the Apache server. Set for 15 minutes, but it stopped due to the Apache server crashing after about 10 minutes. Many exceptq files and POPUPLOG on the server. Attached. No system traps.

3rd run: siege is set for multi-processor mode and use davemckenna.com to address the apache server (with davemckenna.com defined in the HOSTS file of the siege computer). Set for 15 minutes, but it stopped due to the Apache server crashing after about 10 minutes. Many exceptq files and POPUPLOG on the server. Attached. No system traps.

4th run: siege is set for multi-processor mode and davemckenna.com is used to address the Apache server (davemckenna.com NOT defined in the HOSTS file of the siege computer). Siege computer traps after about 30 seconds in 'SOFFICE', Apache computer continues running with no errors. I have a (old) dump file from this situation if needed.

Regards,
2nd run.7z
3rd run.7z
1st run.7z

Steven Levine

Jul 22, 2022, 9:05:50 PM
to apa...@googlegroups.com
In <465c2468-925f-40b2...@googlegroups.com>, on 07/22/22
at 03:36 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> Got your PM and downloaded the file. Installed it on both the Apache
>(server) computer and the siege (desktop) computer. Both computers are
>running full SMP. In no case did I get a system trap on the Apache
>computer. I did get one on the siege computer.

Thanks for the testing. The results sound promising. Without the system
traps, we have more opportunities for other residual issues to show up.

>1st run: siege is set for single processor mode and use 192.168.21.2 to
>address the apache server. Set up siege for 15 minutes, but it stopped
>due to errors (set to 512) before then with about a 98% success rate.
>The apache server was still running, although there were a couple
>exceptq files there - attached. No system traps.

Offhand these look like unhandled malloc failures in the php code base.
Nothing platform specific. We should be able to detect these and avoid
the traps.
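
As a generic illustration of what detecting such a failure looks like in C (this is not php code, and the helper name is hypothetical): check the allocator's return value and report the failure to the caller instead of dereferencing a null pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy `len` bytes into a fresh NUL-terminated buffer.
 * Returns NULL if the allocation cannot be made; the caller must check. */
char *dup_bytes(const char *src, size_t len)
{
    assert(src != NULL);
    if (len == SIZE_MAX)
        return NULL;            /* len + 1 would overflow */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;            /* report the failure, do not dereference */
    memcpy(copy, src, len);
    copy[len] = '\0';
    return copy;
}
```

A caller that tests the result for NULL can log the condition and fail the one request, rather than trapping the whole process.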

>2nd run: siege is set for multi-processor mode and use 192.168.21.2 to
>address the Apache server. Set for 15 minutes, but it stopped due to the
>Apache server crashing after about 10 minutes. Many exceptq files and
>POPUPLOG on the server. Attached. No system traps.

These were mostly libc lock issues. We have seen these before. The libc heap
code exits while holding the lock. I need to check my notes and see what
the status is. I may need to submit a ticket to bitwiseworks for this.
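
For readers who have not met this class of bug, the general shape is an error path that leaves a function without releasing a lock, so every later caller blocks. The sketch below illustrates that shape with a plain pthread mutex. It is not libc's heap code, the names are hypothetical, and the real mechanism in libc may be different (for example, an exception unwinding past the unlock).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;  /* hypothetical */

/* Buggy shape: the error path returns with the lock still held. */
int op_leaky(bool fail)
{
    pthread_mutex_lock(&heap_lock);
    if (fail)
        return -1;              /* lock is never released on this path */
    pthread_mutex_unlock(&heap_lock);
    return 0;
}

/* Safer shape: every path goes through the single unlock. */
int op_safe(bool fail)
{
    int rc = 0;
    pthread_mutex_lock(&heap_lock);
    if (fail)
        rc = -1;
    pthread_mutex_unlock(&heap_lock);
    return rc;
}

/* True if the lock is currently held by any thread. */
bool heap_lock_is_held(void)
{
    int rc = pthread_mutex_trylock(&heap_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&heap_lock);
        return false;
    }
    assert(rc == EBUSY);
    return true;
}
```

After op_leaky(true) the lock stays held, so any other thread that needs it would wait forever.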

>3rd run: siege is set for multi-processor mode and use davemckenna.com to
> address the apache server (with davemckenna.com defined in the HOSTS
>file of the siege computer). Set for 15 minutes, but it stopped due to
>the Apache server crashing after about 10 minutes. Many exceptq files
>and POPUPLOG on the server. Attached. No system traps.

This is more libc heap corruption followed by libc exiting while holding
the heap lock.

>4th run: siege is set for multi-processor mode and davemckenna.com is
>used to address the Apache server (davemckenna.com NOT defined in the
>HOSTS file of the siege computer). Siege computer traps after about 30
>seconds in 'SOFFICE', Apache computer continues running with no errors.
>I have a (old) dump file from this situation if needed.

Let's open an Arca Noae Mantis ticket for this one and upload the dump to
the AN FTP. I need to look at the dump file to decide what's next for this one.

Speaking of tickets, perhaps we should open a siege testing ticket on
Paul's mantis? This will avoid cluttering the list with file attachments that
are going to bounce when sent to the other gmail users. It's possible
Paul's zoho provider bounced your message. If they reject zip files, they
may reject .7z files too.

Paul Smedley

Jul 22, 2022, 9:20:24 PM7/22/22
to apa...@googlegroups.com
Hey guys,

On 23/7/22 09:51, Steven Levine wrote:
> Speaking of tickets, perhaps we should open a siege testing ticket on
> Paul's mantis? This will avoid cluttering the list with file attachments that
> are going to bounce when sent to the other gmail users. It's possible
> Paul's zoho provider bounced your message. If they reject zip files, they
> may reject .7z files too.

Happy either way - FYI I did get the 7z attachments.

Cheers,

Paul

David McKenna

Jul 23, 2022, 9:14:40 AM7/23/22
to Apache for OS/2
OK, I created the ticket (3300) at ArcaNoae and uploaded the dump file. 

Regards,

Steven Levine

Jul 23, 2022, 3:00:12 PM7/23/22
to apa...@googlegroups.com
In <14df4ce2-35d8-4ffc...@googlegroups.com>, on 07/23/22
at 06:14 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

>OK, I created the ticket (3300) at ArcaNoae and uploaded the dump file.

The dump file tells us that the trap is at the same location as the other
afinetk traps. The code path is slightly different so the patch does not
get to do its thing.

I will tweak the patch to handle this.

David A surprised me by jumping in immediately on the ticket. I
suggested he wait until I had a chance to get some answers from the dump
file.

Steven Levine

Jul 24, 2022, 9:23:16 PM7/24/22
to apa...@googlegroups.com
In <14df4ce2-35d8-4ffc...@googlegroups.com>, on 07/23/22
at 06:14 AM, David McKenna <davidmc...@gmail.com> said:

Hi David,

>OK, I created the ticket (3300) at ArcaNoae and uploaded the dump file.

A closer look at the dump file tells us it was supposed to trap. The code
path looks a bit different because the trap occurred before you installed
the patched afinetk.sys on the siege client.

I will mark the ticket as a duplicate of 3177.

David McKenna

Jul 24, 2022, 10:59:55 PM7/24/22
to Apache for OS/2
Hi Steven,

  OK, I just uploaded a new dump file from a trap while using the test afinetk.sys you gave me. This trap happened about 4 minutes into a siege using URLs in SMP mode. The server was unaffected.

Regards,

Steven Levine

Jul 25, 2022, 2:52:59 PM7/25/22
to apa...@googlegroups.com
In <12629d98-4734-4e55...@googlegroups.com>, on 07/24/22
at 07:59 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

> OK, I just uploaded a new dump file from a trap while using the test
>afinetk.sys you gave me. This trap happened about 4 minutes into a siege
>using URLs in SMP mode.

As expected, this trap occurred for a different reason. Initial analysis
implies an SMP-related failure, but I need to spend some more time to
better understand what the running threads were doing at the time of the trap.

Can you check if running the client with the patched afinetk.sys and
/MAXCPU=1 avoids the trap?

We should probably do the same check and determine if just execmode -sp is
sufficient to avoid the trap.

To avoid uploading duplicate dump files, you should probably check if the
traps are different. Do you know how to do this?

Thanks,

Steven Levine

Jul 25, 2022, 3:54:30 PM7/25/22
to apa...@googlegroups.com
at 03:36 PM, David McKenna <davidmc...@gmail.com> said:

Hi David,

I've reviewed the data in your *run*.7z files.

The initial cause of each set of failures is reported in the error log files
as:

[Fri Jul 22 15:44:16 2022] zend_mm_free_heap detected heap corrupted for
pid:120 (78) tid:1 chunk->heap 0x25200040 heap 0x21400040

The error reports and traps are simply side-effects of this initial
failure.

Each chunk is owned by a heap and the chunk header points to the heap that
owns the chunk. This relationship is created when the chunk is allocated.
For some as yet unknown reason and at some as yet unknown location in the
code, the code attempts to free the chunk from the wrong heap. Both the
heap and the chunk pointers look valid so I don't think the problem is
that something clobbered the pointer in the chunk header.
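
To make the ownership relationship concrete, here is a minimal C sketch. The struct and function names are hypothetical and are not the actual Zend memory manager definitions; the sketch only assumes what is described above, namely that the owner is recorded in the chunk header at allocation time and compared against the heap at free time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not the Zend definitions. */
struct heap {
    int live_chunks;        /* number of chunks currently owned */
};

struct chunk {
    struct heap *heap;      /* owning heap, recorded once at allocation */
};

/* Allocate a chunk and record its owner in the chunk header. */
struct chunk *chunk_alloc(struct heap *owner)
{
    struct chunk *c;

    assert(owner != NULL);

    c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;

    c->heap = owner;
    owner->live_chunks++;
    return c;
}

/* Free a chunk from the given heap.
 * Returns 0 on success, or -1 if the chunk is owned by a different heap.
 * The -1 case corresponds to the logged condition chunk->heap != heap. */
int chunk_free(struct heap *from, struct chunk *c)
{
    if (c->heap != from)
        return -1;          /* wrong heap: refuse to free, leave chunk intact */

    from->live_chunks--;
    free(c);
    return 0;
}
```

In this sketch both pointers can be perfectly valid and the check still fails, which matches the log line above: the chunk was simply handed to a heap that does not own it.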

Can we check if this error can still occur with /MAXCPU=1? My notes don't
say one way or another.

Thanks,