apache crash

Massimo

Dec 6, 2011, 7:56:14 AM

to apa...@googlegroups.com

Hi all,

Apache 2.2.17, PHP 5.2.8, OS: eCS 2.1 GA

[Tue Dec 06 13:50:51 2011] [error] caught exception (XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown pid=4384
[Tue Dec 06 13:50:51 2011] [error] caught exception in worker thread, initiating child shutdown pid=4384
[Tue Dec 06 13:50:51 2011] [error] caught exception (XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown pid=4384
[Tue Dec 06 13:50:51 2011] [error] caught exception in worker thread, initiating child shutdown pid=4384

Killed by SIGSEGV
pid=0x1120 ppid=0x111b tid=0x0001 slot=0x011d pri=0x0200 mc=0x0001
D:\APACHE\BIN\HTTPD.EXE
cs:eip=005b:1d8e35d7 ss:esp=0053:0022fc8c ebp=0022fcb8
ds=0053 es=0053 fs=150b gs=0000 efl=00210206
eax=1d8e35d7 ebx=030ea8f4 ecx=0000001b edx=00ccd700 edi=0000000a esi=00b59608
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
[Tue Dec 06 13:50:58 2011] [notice] caught SIGTERM, shutting down

any idea?
any help or suggestion?

thanks
bye

massimo s.

Lewis G Rosenthal

Dec 6, 2011, 12:19:51 PM

to apa...@googlegroups.com

Sounds familiar:

On 12/06/11 07:56 am, Massimo thus wrote :

Have a look at:

http://mantis.smedley.info/view.php?id=299
http://groups.google.com/group/apache2/browse_thread/thread/d7f954e36b36ecf9

Some of the above is quite general:

[error] caught exception (XCPT_ACCESS_VIOLATION) in worker thread,
initiating child shutdown pid=

(however, note the ds, es, fs, gs, and efl addresses in your log and
compare to the ones in the Mantis bug.)

What I *am* seeing, though, is that aside from PHP, we're all using
MySQL, and in bug # 299, Jim mentions the higher frequency of that crash
when using a remote MySQL server (as I do, and I see several of these
per day or even per hour). I have *no* local MySQL process running on
the web server, and instead, run MySQL on a NetWare box on the same LAN.

Can you track back anything in your access logs which happened around
the same time as the crash? In one instance, Steve and I tracked back a
particular AMP app with a connection logging table in the db which was
getting *huge*. Apparently, the delay while posting a hit to this db was
causing the MySQL module considerable distress. Once I added a cron job
to empty the table every night, that immediate problem subsided, and I
returned to my "usual" several restarts per hour behavior.

Just some random thoughts.

--
Lewis
-------------------------------------------------------------
Lewis G Rosenthal, CNA, CLP, CLE, CWTS
Rosenthal & Rosenthal, LLC www.2rosenthals.com
Need a managed Wi-Fi hotspot? www.hautspot.com
visit my IT blog www.2rosenthals.net/wordpress
please do not add my address to any non-bcc mass mailings
-------------------------------------------------------------

Steven Levine

Dec 6, 2011, 3:40:17 PM

to apa...@googlegroups.com

In <4EDE4EB7...@2rosenthals.com>, on 12/06/11
at 12:19 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi guys,

>Sounds familiar:

Well, sorta. :-)

>> [Tue Dec 06 13:50:51 2011] [error] caught exception (XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown pid=4384
>> [Tue Dec 06 13:50:51 2011] [error] caught exception in worker thread, initiating child shutdown pid=4384
>>
>> Killed by SIGSEGV
>> pid=0x1120 ppid=0x111b tid=0x0001 slot=0x011d pri=0x0200 mc=0x0001
>> D:\APACHE\BIN\HTTPD.EXE
>> cs:eip=005b:1d8e35d7 ss:esp=0053:0022fc8c ebp=0022fcb8
>> ds=0053 es=0053 fs=150b gs=0000 efl=00210206
>> eax=1d8e35d7 ebx=030ea8f4 ecx=0000001b edx=00ccd700 edi=0000000a esi=00b59608
>> Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
>> [Tue Dec 06 13:50:58 2011] [notice] caught SIGTERM, shutting down

All this says is that a trap occurred in some DLL and the exception
handler shut down the trapping thread.

This is similar to saying the weather is not nice. There's not enough
information here to know whether or not you need an umbrella or a snow
plow.

>(however, note the ds, es, fs, gs, and efl addresses in your log and
>compare to the ones in the Mantis bug.)

That's because the segment registers are constants in 32-bit applications.
:-)

>Just some random thoughts.

I can probably tell you a lot more about exactly why the trap occurred,
but you guys need to do some of the work. You need to work with Paul to
acquire copies of the maps and diffs for the versions of the binaries you
are running. If I had more free time, I'd do this myself, but I don't.
Some of the distributions include maps in the zip files, some don't. The
same is true for the source code diffs.

One thing you should already know to do without being asked is to convert
the cs:eip to a specific DLL, object and offset. Theseus can do this if
you don't have a process dump available.

Steven

--
----------------------------------------------------------------------
"Steven Levine" <ste...@earthlink.net> eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
----------------------------------------------------------------------

Lewis G Rosenthal

Dec 6, 2011, 6:17:26 PM

to apa...@googlegroups.com

Hi there...

On 12/06/11 03:40 pm, Steven Levine thus wrote :

> In <4EDE4EB7...@2rosenthals.com>, on 12/06/11
> at 12:19 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi guys,
>
>> Sounds familiar:
> Well, sorta. :-)
>

A crash by any other name...

>>> [Tue Dec 06 13:50:51 2011] [error] caught exception (XCPT_ACCESS_VIOLATION) in worker thread, initiating child shutdown pid=4384
>>> [Tue Dec 06 13:50:51 2011] [error] caught exception in worker thread, initiating child shutdown pid=4384
>>>
>>> Killed by SIGSEGV
>>> pid=0x1120 ppid=0x111b tid=0x0001 slot=0x011d pri=0x0200 mc=0x0001
>>> D:\APACHE\BIN\HTTPD.EXE
>>> cs:eip=005b:1d8e35d7 ss:esp=0053:0022fc8c ebp=0022fcb8
>>> ds=0053 es=0053 fs=150b gs=0000 efl=00210206
>>> eax=1d8e35d7 ebx=030ea8f4 ecx=0000001b edx=00ccd700 edi=0000000a esi=00b59608
>>> Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
>>> [Tue Dec 06 13:50:58 2011] [notice] caught SIGTERM, shutting down
> All this says is that a trap occurred in some DLL and the exception
> handler shut down the trapping thread.
>
> This is similar to saying the weather is not nice. There's not enough
> information here to know whether or not you need an umbrella or a snow
> plow.
>

LOL!! Great analogy!

>> (however, note the ds, es, fs, gs, and efl addresses in your log and
>> compare to the ones in the Mantis bug.)
> That's because the segment registers are constants in 32-bit applications.
> :-)
>

Indeed. Brain was on lunch break, and came back sometime later, filling
the space between my ears once more. Apologies for the statement of the
obvious...

>> Just some random thoughts.
> I can probably tell you a lot more about exactly why the trap occurred,
> but you guys need to do some of the work. You need to work with Paul to
> acquire copies of the maps and diffs for the versions of the binaries you
> are running. If I had more free time, I'd do this myself, but I don't.
> Some of the distributions include maps in the zip files, some don't. The
> same is true for the source code diffs.
>

Right-o.

> One thing you should already know to do without being asked is to convert
> the cs:eip to a specific DLL, object and offset. Theseus can do this if
> you don't have a process dump available.
>

In my case, this doesn't do much, as they're all in DOSCALL1 shared
space. I've tried locating the area in the system arena table, but can't
seem to get any more useful info (yet). In my case, at least, all of
this is consistent (same address, time after time).

Cheers/2

Massimo

Dec 6, 2011, 7:38:58 PM

to apa...@googlegroups.com

On 06/12/2011 18.19, Lewis G Rosenthal wrote:
>
> What I *am* seeing, though, is that aside from PHP, we're all using MySQL, and in bug # 299,
> Jim mentions the higher frequency of that crash when using a remote MySQL server (as I do, and
> I see several of these per day or even per hour). I have *no* local MySQL process running on
> the web server, and instead, run MySQL on a NetWare box on the same LAN.

MySQL here is running on localhost, on the same machine.

massimo s.

Steven Levine

Dec 6, 2011, 10:10:15 PM

to apa...@googlegroups.com

In <4EDEA28...@2rosenthals.com>, on 12/06/11
at 06:17 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>In my case, this doesn't do much, as they're all in DOSCALL1 shared
>space.

That's useful because it reminds us that this may be something we have
seen before. I recall we were working on an exception stack corruption
issue and then you wandered off to be busy with other tasks.

IAC, what I said applies. You need to collect what I need before I am
going to be really interested in cranking up pmdf and doing any analysis.

>I've tried locating the area in the system arena table, but can't
>seem to get any more useful info (yet).

That's not going to give you what you are looking for.

Once you know the trap is in doscall1.dll, then you want to display the
doscall1.dll module table and find the eip in the object table listing at
the bottom of the page. Once you have this you can calculate the
object:offset value and look up the nearest symbol using a text dump of
doscall1.dll. However, you are better off letting pmdf do the work for
you.
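The object:offset arithmetic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not what pmdf or Theseus actually does internally: it assumes you have already copied each object's number, linear base address and size out of the module's object table. The base 1FFB0000 and the trap address 1FFB1B65 come from the Theseus arena record and trap report posted in this thread; the object size of 0x10000 is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ModuleObject:
    number: int  # object number as listed in the module's object table
    base: int    # linear address the object is loaded at
    size: int    # object size in bytes (hypothetical value below)


def to_object_offset(eip: int, objects: Sequence[ModuleObject]) -> Optional[Tuple[int, int]]:
    """Return (object number, offset into object) for the object containing eip."""
    for obj in objects:
        if obj.base <= eip < obj.base + obj.size:
            return obj.number, eip - obj.base
    return None  # eip is not inside any listed object


def format_object_offset(number: int, offset: int) -> str:
    """Format as object:offset, e.g. 0002:00001B65."""
    return f"{number:04X}:{offset:08X}"


# Base taken from the Theseus arena record (1FFB0000); the size is an assumption.
DOSCALL1_OBJECTS = [ModuleObject(number=2, base=0x1FFB0000, size=0x10000)]

hit = to_object_offset(0x1FFB1B65, DOSCALL1_OBJECTS)
if hit is not None:
    print("DOSCALL1.DLL", format_object_offset(*hit))
```

The result has the same object:offset form as the "DOSCALL1.DLL 0002:00001b65" line in the 2010 trap report. The next step is a symbol lookup in the matching map file.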

If you don't have a working pmdf setup, I can help with this.

>In my case, at least, all of
>this is consistent (same address, time after time).

This may make it easier to fix.

Lewis G Rosenthal

Dec 7, 2011, 1:27:59 AM

to apa...@googlegroups.com

Hi again...

On 12/06/11 10:10 pm, Steven Levine thus wrote :

> In <4EDEA28...@2rosenthals.com>, on 12/06/11
> at 06:17 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi,
>
>> In my case, this doesn't do much, as they're all in DOSCALL1 shared
>> space.
> That's useful because it reminds us that this may be something we have
> seen before. I recall we were working on an exception stack corruption
> issue and then you wandered off to be busy with other tasks.
>
> IAC, what I said applies. You need to collect what I need before I am
> going to be really interested in cranking up pmdf and doing any analysis.
>
>> I've tried locating the area in the system arena table, but can't
>> seem to get any more useful info (yet).
> That's not going to give you what you are looking for.
>

True enough. I don't do this often, so it takes me a bit of thinking to
wrap my head around the tools, how to use them, and what we're really
trying to find...

> Once you know the trap is in doscall1.dll, then you want to display the
> doscall1.dll module table and find the eip in the object table listing at
> the bottom of the page. Once you have this you can calculate the
> object:offset value and look up the nearest symbol using a text dump of
> doscall1.dll. However, you are better off letting pmdf do the work for
> you.
>

Theseus does a decent job of most of the heavy lifting:

Description of Linear Object 1FFB1B65, PID = 08E7, name = 'HTTPD':
It is (shared data) object # 0000 of module DOSCALL1.
The address is at offset 00001B65 into the memory object.

Arena Record:
har pages linear flg next prev link hash hob hal hco / Decoded flags
0149 00000010 1FFB0000 3D9 0148 014A 0000 0000 0164 0000 1943 / Mapped Reload User Exec Read Hco

Object Record:
hob har next flgs ownr hmte sown,cnt lt st / owner / decoded_flags
0164 0149 0000 0838 015B 015B 0000 00 00 00 / m-DOSCALL1 / shared exec read user

Then, hoping that we haven't swapped, I go back to the module table and
get a formatted display of the MTE for DOSCALL1:

Resident portion of MTE (@ FD40BFCC):
Use count (EXE only) = 0
Number of entries in Imp Mod Name Tbl = 2
Module name DOSCALL1
Handle of the mte = 015B
File system number for open file = 004C
Link to next mte = FFEBD7D3
Link to swappable mte = FBC34998
flags2 = 0004
flags1 = 00008000
Class = Nonspecific
File Media does not permit discarding
DLL module

Swappable portion of MTE (@ FBC34998):
Module # pages = 00000033
Initial instruction pointer = 0002:000009D8
Initial stack pointer = 0000:00000000
Fixup section size = 00002DA0
Object table offset = FBC34A34
Number of objects in module = 0000000B
Object page map offset = FBC34B3C
Object iterated data map offset = 00003F90
Offset of Resource Table = 00000000
Number of resource entries = 00000000
Offset of resident name table = FBC34CD4
Offset of Entry Table = FBC34EBC
Offset of Fixup Page Table = FBC35AE0
Offset of Fixup Record Table = FBC35BB0
Offset of Import Module Name Table = FBC38870
Offset of Imp Procedure Name Table = FBC38880
Offset of Enumerated Data Pages = 00003F90
Offset of Non-resident Names Table = 00000000
Size of Non-resident Name Table = 00000000
Object # for automatic data object = 00000006
Offset of the debugging info = 00000000
The length of the debug info in bytes = 00000000
Use for converted 16-bit modules = 00000000 (heapsize)
Full pathname = FBC34A1C -> C:\OS2\DLL\DOSCALL1.DLL
Length of pathname = 0017
Count of threads waiting on MTE semaphore = 0
Slot number of the owner of the MTE semaphore = 0000
Pointer to file cache for Dos32CacheModule= 00000000
Use for converted 16-bit modules = 0000 (alignshift)
Use for converted 16-bit modules = 0000 (stacksize)
expver from NE header = 0000
exetype from NE header = 0000

Segment Table:
file file ram
Seg# offset size flag size hob sel fixups / flags...
0001 0010 0000 0000 1FFA 9025 8000 00000001 / code
0002 0001 0000 FFD6 0165 C1F4 0000 1FFB0000 / execute-only code packed conforming
0003 2025 8000 0002 0000 000D 0000 0164FFDF / code packed
0004 FDF9 0000 0000 1FFC D025 8000 0000000F / code
0005 0010 0000 FFE6 0163 E500 0000 1FFD0000 / execute-only code packed shared conforming
0006 D025 8000 001F 0000 000F 0000 0162FFEE / data packed iterated
0007 1180 0000 0000 1300 1023 8000 0000002E / code
0008 0002 0000 9807 0161 1DC0 0000 13010000 / data packed ring=2
0009 1003 8000 0030 0000 0002 0000 0000980F / code shared
000A 0F24 0000 0000 1302 1023 8000 00000032 / code
000B 0001 0000 9817 015F 05D0 0000 13030000 / data packed ring=2

> If you don't have a working pmdf setup, I can help with this.
>

No, I still do have most of what we set up before in place and ready to
go. My real issue is that I need to do this on a non-production system,
if at all possible (or migrate production processes to a standby box
before chancing hangs and whatnot while tracing this down), and that has
always stalled me in these endeavors. Thus, I've just made myself
somewhat comfortable with the monthly to bi-monthly restarts, and have
put this off for so long.

>> In my case, at least, all of
>> this is consistent (same address, time after time).
> This may make it easier to fix.
>

Indeed, it should help. Consistency is golden.
When I do have this set up (finally), I'll post back.

Steven Levine

Dec 7, 2011, 3:00:23 AM

to apa...@googlegroups.com

In <4EDF076F...@2rosenthals.com>, on 12/07/11
at 01:27 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>True enough. I don't do this often, so it takes me a bit of thinking to
>wrap my head around the tools, how to use them, and what we're really
>trying to find...

The first thing you need to find is where the trap occurred. The next
thing is to convert this to a symbolic name. This will give you a hint as
to what the code was doing and what needs to be done next.

>Theseus does a decent job of most of the heavy lifting:

Theseus is OK for getting started, but loading the dump file in a working
pmdf setup will almost always be more efficient and provide better data.

>Description of Linear Object 1FFB1B65, PID = 08E7, name = 'HTTPD': It is
>(shared data) object # 0000 of module DOSCALL1.
>The address is at offset 00001B65 into the memory object.

It appears that you have upgraded to doscall1.dll 14.105. If so the
offset resolves to something between

16 Bit Symbol <DOS32UNSETEXCEPTIONHANDLER> Address 2:1b44
16 Bit Symbol <postDOS32UNSETEXCEPTIONHANDLER> Address 2:1b8a

and this is indeed a trap we have seen before.
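Resolving an offset to "something between" two symbols is a search for the last symbol that starts at or before the offset. The Python sketch below assumes the public symbols for one object have already been pulled out of a map file or text dump as (offset, name) pairs. Only the two doscall1.dll 14.105 symbols quoted above are included, so it is an illustration, not a real symbol table. The nearest preceding public symbol is only a hint: the trapping code could belong to a static routine that the map does not list.

```python
import bisect
from typing import List, Optional, Tuple

# (offset within object 2, public symbol) pairs for doscall1.dll 14.105, as
# quoted above. A real lookup would load every symbol from the map file.
OBJECT2_SYMBOLS: List[Tuple[int, str]] = [
    (0x1B44, "DOS32UNSETEXCEPTIONHANDLER"),
    (0x1B8A, "postDOS32UNSETEXCEPTIONHANDLER"),
]


def nearest_symbol(offset: int, symbols: List[Tuple[int, str]]) -> Optional[Tuple[str, int]]:
    """Return (name, distance) of the last symbol starting at or before offset."""
    table = sorted(symbols)
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return None  # offset precedes every known symbol
    start, name = table[i]
    return name, offset - start


result = nearest_symbol(0x1B65, OBJECT2_SYMBOLS)
if result is not None:
    name, delta = result
    print(f"2:{0x1B65:x} is {name}+0x{delta:x}")
```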

>Then, hoping that we haven't swapped, I go back to the module table and
>get a formatted display of the MTE for DOSCALL1:

Swapping is irrelevant in this case. DOSCALL1.DLL is going to load in the
same place pretty much forever for a given system.

>> If you don't have a working pmdf setup, I can help with this.
>>
>No, I still do have most of what we set up before in place and ready to
>go. My real issue is that I need to do this on a non-production system,
>if at all possible (or migrate production processes to a standby box
>before chancing hangs and whatnot while tracing this down), and that has
>always stalled me in these endeavors.

Yes and no. For analysis of the hangs that do occur, all that's needed is a
working pmdf setup and this does not even need to be on the same system.

If we want to start banging on the configuration then a test system is the
way to go although it might be difficult to put sufficient load on the
test system so that the hangs occur.

>Thus, I've just made myself
>somewhat comfortable with the monthly to bi-monthly restarts, and have
>put this off for so long.

I understand. It's difficult to get a lot of priority to something that
only occurs 6 times a year.

Massimo's case is a bit different. He seems to get more hangs and traps,
but he seems less able to do the things I require of him for me to help
him effectively.

Regards,

Massimo

Dec 7, 2011, 5:40:32 AM

to apa...@googlegroups.com

Steve, I'm sorry about my limited know-how with debugging; let me know what
I can do to better track the crashes.

E.g., if you need something to catch the dumps, like what you sent me for
stunnel, let me know or send me the package.

Another question: I've heard from Roderick that someone is fixing the monster
in the gcc libs that heavily locks sem32 with MySQL.

thanks
bye

massimo s.

p.s. any news about stunnel?

Lewis G Rosenthal

Dec 7, 2011, 1:21:23 PM

to apa...@googlegroups.com

Good morning...

On 12/07/11 03:00 am, Steven Levine thus wrote :

> In <4EDF076F...@2rosenthals.com>, on 12/07/11
> at 01:27 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi,
>
>> True enough. I don't do this often, so it takes me a bit of thinking to
>> wrap my head around the tools, how to use them, and what we're really
>> trying to find...
> The first thing you need to find is where the trap occurred. The next
> thing is to convert this to a symbolic name. This will give you a hint as
> to what the code was doing and what needs to be done next.
>
>> Theseus does a decent job of most of the heavy lifting:
> Theseus is OK for getting started, but loading the dump file in a working
> pmdf setup will almost always be more efficient and provide better data.
>

Makes sense.

FWIW, I have maps for httpd & httpddll. If we need diffs, I'll need to
request them from Paul.

>> Description of Linear Object 1FFB1B65, PID = 08E7, name = 'HTTPD': It is
>> (shared data) object # 0000 of module DOSCALL1.
>> The address is at offset 00001B65 into the memory object.
> It appears that you have upgraded to doscall1.dll 14.105. If so the
> offset resolves to something between
>
> 16 Bit Symbol <DOS32UNSETEXCEPTIONHANDLER> Address 2:1b44
> 16 Bit Symbol <postDOS32UNSETEXCEPTIONHANDLER> Address 2:1b8a
>
> and this is indeed a trap we have seen before.
>

We surely have.

From your email to me of 1/12/2010:

> 01-12-2010 06:30:15 SYS3175 PID 0f7d TID 000b Slot 00de
> J:\APPS\APACHE2\BIN\HTTPD.EXE
> c0000005
> 1ffb1b65
> P1=00000001 P2=00000b14 P3=XXXXXXXX P4=XXXXXXXX
> EAX=00000b14 EBX=200394a0 ECX=0246ffd4 EDX=0246ffd4
> ESI=0246ffd4 EDI=0000000b
> DS=0053 DSACC=f0f3 DSLIM=ffffffff
> ES=0053 ESACC=f0f3 ESLIM=ffffffff
> FS=150b FSACC=00f3 FSLIM=00000030
> GS=0000 GSACC=**** GSLIM=********
> CS:EIP=005b:1ffb1b65 CSACC=f0df CSLIM=ffffffff
> SS:ESP=0053:0246ff94 SSACC=f0f3 SSLIM=ffffffff
> EBP=0246ff94 FLG=00010213
>
> DOSCALL1.DLL 0002:00001b65

> 16 Bit Symbol <postDOS32UNSETEXCEPTIONHANDLER> Address 2:1b32
> 16 Bit Symbol <DOS32ENTERMUSTCOMPLETE> Address 2:1b34
>
> This is the trap where the exit list is munged. If you go back through
> your notes, you will find that SITR was capable of causing this.

SITR is no longer running on the system (that was the PHP-driven
knowledgebase app which posted to the MySQL table for every hit we got).

ISTR that we updated DOSCALL1 at some point in 2009, while we were
troubleshooting either the above or an rsync issue. Presently, I have:

12-29-04 11:25 144,631 0 DOSCALL1.DLL

>> Then, hoping that we haven't swapped, I go back to the module table and
>> get a formatted display of the MTE for DOSCALL1:
> Swapping is irrelevant in this case. DOSCALL1.DLL is going to load in the
> same place pretty much forever for a given system.
>

I know that an earlier issue surrounded the MySQL module for PHP, which
was getting swapped. This necessitated restarting Apache and catching
the failure early on, because by the time it happened the hundredth time
or so, the module code had already been swapped out to disk. I recall
thinking it odd that the system had swapped anything, because it is
really fairly lean, for a 2GB server (and Theseus seemed to confirm
that). We may have concluded that the system had prepared it to swap but
had never actually done it...I don't remember the specifics, and would
have to look back through our notes.

>>> If you don't have a working pmdf setup, I can help with this.
>>>
>> No, I still do have most of what we set up before in place and ready to
>> go. My real issue is that I need to do this on a non-production system,
>> if at all possible (or migrate production processes to a standby box
>> before chancing hangs and whatnot while tracing this down), and that has
>> always stalled me in these endeavors.
> Yes and no. For analysis of the hangs that do occur, all that's needed is a
> working pmdf setup and this does not even need to be on the same system.
>
> If we want to start banging on the configuration then a test system is the
> way to go although it might be difficult to put sufficient load on the
> test system so that the hangs occur.
>

Right. I figured if we needed to get a dump file that I would need to
put the system through the mill. I had a JFS chkdsk pass about a month
ago which offlined the box for over an hour on the way back up from a
reboot triggered by a crash. I can explain to people why their mail and
web services hiccup for five minutes every now and again, but over an
hour of downtime during hours when I *feel* like working is another story.

From the Integrated Management Log:

Severity:       Critical
Class:          OS
Last Update:    10/20/2011 14:30
Initial Update: 10/20/2011 14:30
Count:          1
Description:    Abnormal Program Termination ()

POPUPLOG.OS2 recorded the same trap we've been discussing approximately
4 minutes earlier (it had been happening approximately every 5-15
minutes for the entire preceding week), and the Apache error log (last
entry, followed by the restart) shows our familiar friend:

LIBC PANIC!!
fmutex deadlock: Owner died!
0x0026013c: Owner=0x344b0005 Self=0x344b0001 fs=0x3 flags=0x0 hev=0x00010012
Desc="LIBC Heap"
pid=0x344b ppid=0x24dd tid=0x0001 slot=0x00fe pri=0x0200 mc=0x0000
J:\APPS\APACHE2\BIN\HTTPD.EXE

Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

[Thu Oct 20 15:56:24 2011] [warn] pid file J:/APPS/apache2/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Thu Oct 20 15:56:25 2011] [notice] Apache/2.2.21 (OS/2) PHP/5.2.11 configured -- resuming normal operations

>> Thus, I've just made myself
>> somewhat comfortable with the monthly to bi-monthly restarts, and have
>> put this off for so long.
> I understand. It's difficult to get a lot of priority to something that
> only occurs 6 times a year.
>

Well, it "occurs" much more frequently. However, the mechanisms we have
in place to *usually* deal with the problem in a semi-automated way only
seem to need attention (or cause disruption) about a half dozen times
per year. Still, it should be mended.

> Massimo's case is a bit different. He seems to get more hangs and traps,
> but he seems less able to do the things I require of him for me to help
> him effectively.
>

I think we both see about the same in terms of stability, but mine
happens to recover itself a bit better.

Present uptime on this box is now 7 days, 3 hours, and the HTTPD.EXE is
up to PID 4706, from somewhere around 100 when the system boots. I've
seen it up around 60,000 or so, and I know I'm due for a server bounce
(or it will do it itself, sometimes with catastrophic consequences -
that last one happened when rsync was running its hourly mail backup).

Hopefully, some of what we find out about my system will have some
positive impact on Massimo and other people experiencing similar issues.
Thanks, as always, for your assistance and guidance.

Massimo

Dec 7, 2011, 1:38:05 PM

to apa...@googlegroups.com

On 07/12/2011 19.21, Lewis G Rosenthal wrote:
> Hopefully, some of what we find out about my system will have some positive impact on Massimo
> and other people experiencing similar issues. Thanks, as always, for your assistance and guidance.
> Cheers/2

I guess, but in my opinion the problems are in the gcc libs, since all gcc ports
have a lot of problems (e.g. sem32 locks, instability, zombie processes).

E.g., another problem with Apache is that we have no graceful shutdown,
so we must kill it with repeated go -k httpd.

When you kill them, the parent httpd process starts restarting the children,
so the PID numbers climb, and I guess this "on and off" is a sort of abuse of the OS
and system resources.

massimo s.

Lewis G Rosenthal

Dec 7, 2011, 2:48:02 PM

to apa...@googlegroups.com

On 12/07/11 01:38 pm, Massimo thus wrote :

>
> On 07/12/2011 19.21, Lewis G Rosenthal wrote:
>> Hopefully, some of what we find out about my system will have some positive impact on Massimo
>> and other people experiencing similar issues. Thanks, as always, for your assistance and guidance.
>> Cheers/2
> I guess, but in my opinion the problems are in the gcc libs, since all gcc ports
> have a lot of problems (e.g. sem32 locks, instability, zombie processes).
>

What libc version(s) do you have on that box? On this one, I currently have:

C:\ECS\DLL:

4-14-04 16:37 356,330 0 libc05.dll

C:\OS2\DLL:

6-11-07 22:53 48,142 0 libc06.dll
6-11-07 22:53 48,142 0 libc061.dll
6-11-07 22:53 157,124 0 libc062.dll
6-11-07 22:53 1,349,060 0 libc063.dll

For gcc, I have:

C:\OS2\DLL:

4-12-11 19:51 24,270 0 gcc321s.dll
4-12-11 19:51 15,544 0 gcc322.dll
4-12-11 19:51 28,718 0 gcc322.old
6-11-07 22:35 32,801 0 gcc335.dll
4-12-11 19:51 24,416 0 gcc433.dll
4-12-11 19:51 22,647 0 gcc444.dll
4-12-11 19:51 21,899 0 gcc452.dll
4-12-11 19:51 21,742 0 gcc29166.dll

...and what do you know? I found these outdated ones in C:\ECS\DLL just now:

2-23-04 15:40 28,718 0 gcc322.dll
8-06-09 10:39 23,266 0 gcc434.dll
11-16-09 23:45 22,569 0 gcc442.dll
5-01-10 8:47 22,647 0 gcc444.dll
12-29-10 0:27 21,899 0 gcc452.dll

which I'm going to remove *right now* (\ECS\DLL *precedes* \OS2\DLL in
LIBPATH...ugh...). Luckily, a quick check with Theseus shows *none* of
these gcc modules in use at all (in either place), so none of my
current/standard apps require them.
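Lewis did this check by hand: finding DLL names that exist in more than one LIBPATH directory, where the earlier directory wins. A script can do the same thing. The Python sketch below assumes you pass it the LIBPATH value from CONFIG.SYS. It compares file names only, treats them case-insensitively, and does not account for any extended LIBPATH settings or for modules that are already loaded in memory. The example listing is abbreviated from the directory listings above.

```python
import os
from typing import Dict, Iterable, List, Mapping


def libpath_dirs(libpath: str) -> List[str]:
    """Split a LIBPATH value into directories, keeping order and dropping duplicates."""
    dirs, seen = [], set()
    for entry in libpath.split(";"):
        entry = entry.strip()
        if entry and entry.upper() not in seen:
            seen.add(entry.upper())
            dirs.append(entry)
    return dirs


def find_shadowed_dlls(libpath: str, listing: Mapping[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each DLL name present in more than one directory to those directories,
    in search order; the first directory is the copy that is found first."""
    found: Dict[str, List[str]] = {}
    for d in libpath_dirs(libpath):
        for name in listing.get(d, ()):
            if name.upper().endswith(".DLL"):
                found.setdefault(name.upper(), []).append(d)
    return {name: dirs for name, dirs in found.items() if len(dirs) > 1}


def list_libpath(libpath: str) -> Dict[str, List[str]]:
    """Read the real directory contents for each LIBPATH entry that exists."""
    return {d: os.listdir(d) for d in libpath_dirs(libpath) if os.path.isdir(d)}


# Abbreviated from the listings posted above.
example = {
    r"C:\ECS\DLL": ["libc05.dll", "gcc322.dll", "gcc434.dll", "gcc442.dll",
                    "gcc444.dll", "gcc452.dll"],
    r"C:\OS2\DLL": ["libc063.dll", "gcc321s.dll", "gcc322.dll", "gcc322.old",
                    "gcc444.dll", "gcc452.dll"],
}
for name, dirs in find_shadowed_dlls(r"C:\ECS\DLL;C:\OS2\DLL", example).items():
    print(name, "->", " shadows ".join(dirs))
```

On a live system, `find_shadowed_dlls(libpath, list_libpath(libpath))` runs the same check against the actual directories.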

> E.g., another problem with Apache is that we have no graceful shutdown,
> so we must kill it with repeated go -k httpd.
>

Steve has a nice script, but it has never worked for me (even under
4OS2) for some reason. Still, I typically kill Apache with Ctrl-C in the
VIO.

> When you kill them, the parent httpd process starts restarting the children,
> so the PID numbers climb, and I guess this "on and off" is a sort of abuse of the OS
> and system resources.
>

It would appear so (particularly if the memory remains fragmented after
exiting). I normally run with just two children in my current setup. How
many do you have?

Massimo

Dec 7, 2011, 2:50:57 PM

to apa...@googlegroups.com

1 parent and 6 child httpd processes;
with fewer than 5 or 6 I run into problems.

I have a lot of vhosts.

max

Lewis G Rosenthal

Dec 7, 2011, 4:32:47 PM

to apa...@googlegroups.com

On 12/07/11 02:50 pm, Massimo thus wrote :

> 1 parent and 6 child httpd processes;
> with fewer than 5 or 6 I run into problems.
>
> I have a lot of vhosts.
>

Hmmm... What's a lot? I've got 43 at present. What do you have in your
mpm conf? I've got:

ThreadStackSize 2097152
StartServers 2
MinSpareThreads 5
MaxSpareThreads 10
MaxRequestsPerChild 0

Just curious.

--

Steven Levine

Dec 7, 2011, 5:47:02 PM

to apa...@googlegroups.com

In <4EDF42A0...@ecomstation.it>, on 12/07/11
at 11:40 AM, Massimo <m...@ecomstation.it> said:

Hi Massimo,

>Steve, I'm sorry about my limited know-how with debugging; let me know what
>I can do to better track the crashes.

Tracking the crashes is not the problem. Tracking what you currently have
installed when the crashes occur is where you need to help. This means
you need to keep accurate logs of the versions of what you have installed
and be ready to provide me maps and diffs for what you have installed if I
request them.
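One low-tech way to keep the kind of installed-version log asked for here is to take periodic snapshots of file names, sizes and timestamps, the same fields shown in the directory listings elsewhere in this thread, and then record what changed between snapshots. The Python sketch below is an assumption, not a procedure anyone in the thread specified. Size plus timestamp is only a proxy for a version, and it does not replace keeping the matching maps and diffs.

```python
import os
from typing import Dict, Iterable, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # path -> (size in bytes, mtime in whole seconds)


def take_snapshot(dirs: Iterable[str], suffixes: Tuple[str, ...] = (".DLL", ".EXE")) -> Snapshot:
    """Record size and modification time of every matching file in dirs."""
    snap: Snapshot = {}
    for d in dirs:
        for name in sorted(os.listdir(d)):
            path = os.path.join(d, name)
            if os.path.isfile(path) and name.upper().endswith(suffixes):
                st = os.stat(path)
                snap[path] = (st.st_size, int(st.st_mtime))
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Return the added, removed and changed paths between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

For example, you could snapshot the Apache BIN directory and \OS2\DLL before and after each upgrade, then save the diff together with the date and the package names that were installed.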

>E.g. if you need something to catch the dumps like you sent me for
>stunnel let me know or send me the package

You already have everything you need to capture the dumps.

To repeat, what you need to do is make sure that you have the maps and
diffs for what you have installed. I'll ask for them if I need them.

>Another question: I've heard from Roderick that someone is fixing the
>monster in the gcc libs that heavily locks sem32 with MySQL.

I suspect the someone Roderick is referring to is me. Resolving the known
libc issues is working its way to the top of the list.

Knut released libc 0.6.4, but there have been some questions about its
stability. The Mozilla folks have not yet been able to build a working
Firefox with libc064.

>p.s. any news about stunnel?

It too is working its way up the list.

Steven Levine

Dec 7, 2011, 5:54:48 PM

to apa...@googlegroups.com

In <4EDFAEA3...@2rosenthals.com>, on 12/07/11
at 01:21 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>Good morning...

Good afternoon now. :-)

>FWIW, I have maps for httpd & httpddll. If we need diffs, I'll need to
>request them from Paul.

That would be useful. I prefer to be able to look at sources that match
the binaries. When you get diffs from Paul make sure to note what sources
they are based on. This could be an svn changeset number or the name of a
source tarball.

> This is the trap where the exit list is munged. If you go back through
> your notes, you will find that SITR was capable of causing this.

I recall. I was under the impression that removing SITR stopped this
particular exception. However, I guess it just makes it occur less
frequently.

>I know that an earlier issue surrounded the MySQL module for PHP, which
>was getting swapped.

That's not quite what was happening. What was happening was that the code
that contained the Apache runtime was getting unloaded while it was still
needed to process some callbacks. Paul fixed this by linking the APR into
httpd.dll, which is the last DLL to unload.

>This necessitated restarting Apache and catching
>the failure early on, because by the time it happened the hundredth time
>or so, the module code had already been swapped out to disk.

The reason we restarted Apache was so that we could figure out what code
was supposed to be at the trap address when all the DLLs were loaded.

>I recall thinking it odd that the system had swapped anything, because it is
>really fairly lean, for a 2GB server (and Theseus seemed to confirm
>that).

It would have been odd if the system was swapping.

>We may have concluded that the system had prepared it to swap but
>had never actually done it...I don't remember the specifics, and would
>have to look back through our notes.

This much is true. :-)

>Right. I figured if we needed to get a dump file that I would need to
>put the system through the mill.

Or just enable trapdumps and wait a couple of months for the exception to
occur. If space is an issue, you would need to clean up the spurious dumps
that might be captured while you are waiting.

>I had a JFS chkdsk pass about a month
>ago which offlined the box for over an hour on the way back up from a
>reboot triggered by a crash.

JFS chkdsks are great when the transaction log is not borked. FWIW, it's a
good idea to schedule a full chkdsk every couple of months because there
are things that the fast autocheck will miss.

>LIBC PANIC!!
>fmutex deadlock: Owner died!
>0x0026013c: Owner=0x344b0005 Self=0x344b0001 fs=0x3 flags=0x0
>hev=0x00010012
> Desc="LIBC Heap"

This is all familiar.

>Well, it "occurs" much more frequently. However, the mechanisms we have
>in place to *usually* deal with the problem in a semi-automated way only
>seem to need attention (or cause disruption) about a half dozen times
>per year. Still, it should be mended.

Once you have the maps and diffs in place, I'll schedule some time to look
at a process dump.

>I think we both see about the same in terms of stability, but mine
>happens to recover itself a bit better.

Massimo has a different kind of problem. He runs a lot of stunnel
connections and when they go zombie, it's reboot time. Mensys runs a lot
of stunnel connections too, but they configure for one connection per
instance and this prevents most of the zombies.

The stunnel zombies are definitely a libc issue and I think I know how to
fix it, but it's got to hit the top of the list.

Steven Levine

Dec 7, 2011, 6:52:17 PM12/7/11

to apa...@googlegroups.com

In <4EDFB28D...@ecomstation.it>, on 12/07/11
at 07:38 PM, Massimo <m...@ecomstation.it> said:

Hi Massimo,

>i guess, but about me the problems are in the gcc libs, since all gcc
>portings have a lot of problems (e.g. sem32 lock, instability, zombi
>processes)

A word of warning. It's annoying when you make statements like this.
On one hand you are continually making statements about your lack of
debugging and programming skill, while on the other you are attributing
specific problems to libc.

There are times when I think to myself, well Massimo thinks he knows so
much about the problem, let him fix it himself.

>e.g. another problem of apache is that we have no graceful shutdown so we
>must kill it with a lot of go -k httpd

That's not true. You are again making incorrect assumptions based on
what you think is true. Httpd understands the USR1 and TERM signals just
fine. You need to use a tool, such as apache_kill, that can generate
these signals.
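For example, here is a minimal Classic REXX sketch of a clean shutdown, modeled on what the apachectl script attached later in this thread does in DoStop and RunKiller: read the parent pid from httpd.pid and send it TERM. The D:\APACHE base directory is an assumption taken from the HTTPD.EXE path in the original trap report; apache_kill is assumed to be in PATH and to take the signal before the pid, the way RunKiller calls it.

```rexx
/* stophttpd.cmd - sketch only: ask a running httpd to shut down via SIGTERM.
   Assumptions (hypothetical, adjust to your install): httpd lives under
   D:\APACHE, writes its parent pid to logs\httpd.pid, and apache_kill is
   in PATH and accepts "-SIGNAL pid" as apachectl's RunKiller passes it. */
pidFile = 'D:\APACHE\logs\httpd.pid'
if stream(pidFile, 'c', 'query exists') == '' then do
  say pidFile 'not found; httpd is probably not running'
  exit 1
end
pid = strip(linein(pidFile))          /* first line holds the decimal pid */
call stream pidFile, 'c', 'close'
if \ datatype(pid, 'W') then do
  say 'Unexpected contents in' pidFile':' pid
  exit 1
end
say 'Sending TERM to httpd pid' pid
'@apache_kill -TERM' pid              /* handed to the command processor */
exit rc                               /* return code from apache_kill */
```

Swapping -TERM for -USR1 sends the same signal that apachectl's (currently disabled) DoGraceful code path would use.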

>when you kill the mother httpd process start to restart the childs so the
>pid number raise and i guess this "on and off" is a sort of rape for the
>OS and system resources

Yawn. :-)

Lewis G Rosenthal

Dec 7, 2011, 7:41:02 PM12/7/11

to apa...@googlegroups.com

And good afternoon (evening, here) to you, too... :-)

On 12/07/11 05:54 pm, Steven Levine thus wrote :

> In <4EDFAEA3...@2rosenthals.com>, on 12/07/11
> at 01:21 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi,
>
>> Good morning...
> Good afternoon now. :-)
>
>> FWIW, I have maps for httpd & httpddll. If we need diffs, I'll need to
>> request them from Paul.
> That would be useful. I prefer to be able to look at sources that match
> the binaries. When you get diffs from Paul make sure to note what sources
> they are based on. This could be an svn changeset number or the name of a
> source tarball.
>

Request sent, in case he hasn't been following the list recently.

>> This is the trap where the exit list is munged. If you go back
>> through your notes, you will find that sitr was capable of causing this.
> I recall. I was under the impression that removing SITR stopped this
> particular exception. However, I guess it just makes it occur less
> frequently.
>

Yes. We alleviated the worst of it by cleaning out that table in the db,
which seemed to make life easier for mysql.dll. Since then, however, I
have completely removed that directory and the sitr app, so there are no
more requests for any of that code.

>> I know that an earlier issue surrounded the MySQL module for PHP, which
>> was getting swapped.
> That's not quite what was happening. What was happening was that the code
> that contained the Apache runtime was getting unloaded while it was still
> needed to process some callbacks. Paul fixed this by linking the APR into
> httpd.dll, which is the last DLL to unload.
>

I meant to imply that the problem with the swapping was finding the code
we wanted to trace, not that swapping was a problem running the code. :-)

>> This necessitated restarting Apache and catching
>> the failure early on, because by the time it happened the hundredth time
>> or so, the module code had already been swapped out to disk.
> The reason we restarted Apache was so that we could figure out what code
> was supposed to be at the trap address when all the DLLs were loaded.
>

Right.

>> I recall thinking it odd that the system had swapped anything, because it is
>> really fairly lean, for a 2GB server (and Theseus seemed to confirm
>> that).
> It would have been odd if the system was swapping.
>

As was my supposition at the time. You were the one IIRC who suggested I
double check the memory usage with Theseus vs taking the Xcenter
widget's view of the memory usage for gospel. In this case, though, they
seemed to agree, and even now (more virtuals; busier mail; busier FTP),
the system rarely "hogs" more than 400MB. Obviously, I keep my desktop
work to a bare minimum on it.

>> We may have concluded that the system had prepared it to swap but
>> had never actually done it...I don't remember the specifics, and would
>> have to look back through our notes.
> This much is true. :-)
>
:-)
>> Right. I figured if we needed to get a dump file that I would need to
>> put the system through the mill.
> Or just enable trapdumps and what a couple months for the exception to
> occur. If space is an issue you would need to clean up the spurious dumps
> that might be capture while you are waiting.
>

Space is indeed becoming something of a concern. I'm soon going to need
to expand the array again (though this chassis has plenty of room). I
have one mail domain using almost 15GB, and they are transitioning off
of the server in the next couple of months; that will free up some room,
as well.

I'll see about setting this up again.

>> I had a JFS chkdsk pass about a month
>> ago which offlined the box for over an hour on the way back up from a
>> reboot triggered by a crash.
> JFS chkdsks are great when the tranaction log is not borked. FWIW, it's a
> good idea to schedule a full chkdsk every couple of months because there
> are things that the fast autocheck will miss.
>

Yes, the wonder of journalling is only a wonder when the journal is
usable. :-) In this case, the HPFS chkdsk ran quickly and the JFS took a
long, long time (I forgot; this was when we had the conversation about
the chklog being a fixed length, as I was unable to get all of the
messages stored during that painful run). Luckily, the majority of the
trashed files were in the backup directories, and the next rsync passes
filled in the gaps again. I do, however, need to reconfigure for an
additional volume (not the boot and not the data volume, where Apache,
CGP, and FTP run, but a separate, isolated one) to store these rsync
backups, and then have chkdsk run *after* boot, so that I can get
services up and running first, in case this one (which will be taking
the biggest "writing" hit during production) needs one of those
hour-long passes. More to figure out...

>> LIBC PANIC!!
>> fmutex deadlock: Owner died!
>> 0x0026013c: Owner=0x344b0005 Self=0x344b0001 fs=0x3 flags=0x0
>> hev=0x00010012
>> Desc="LIBC Heap"
> This is all familar.
>

Yep. Same junk. I can go back several Apache versions and several PHP
versions, and this is all the same. I have these in logs from 2009.

>> Well, it "occurs" much more frequently. However, the mechanisms we have
>> in place to *usually* deal with the problem in a semi-automated way only
>> seem to need attention (or cause disruption) about a half dozen times
>> per year. Still, it should be mended.
> Once you have the maps and diffs in place, I'll schedule some time to look
> at a process dump.
>

Thanks. I'll let you know.

>> I think we both see about the same in terms of stability, but mine
>> happens to recover itself a bit better.
> Massimo is has a different kind of problem. He runs a lot of stunnel
> connections and when they go zombie, it's reboot time. Mensys runs a lot
> a stunnel connections too, but they configure for one connection per
> instance and this prevents most of the zombies.
>
> The stunnel zombies are definitely a libc issue and I think I know how to
> fix it, but it's got to hit the top of the list.
>

Stunnel is one of those things I've wanted to test (for securing FTP
transactions), but have been reluctant until we get these other issues
under control.

Cheers/2

Steven Levine

Dec 7, 2011, 9:26:02 PM12/7/11

to apa...@googlegroups.com

In <4EE0079E...@2rosenthals.com>, on 12/07/11
at 07:41 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi/2,

>I meant to imply that the problem with the swapping was finding the code
>we wanted to trace,

My point is that swapping had nothing to do with the code being gone from
memory.

>Yes, the wonder of journalling is only a wonder when the journal is
>usable. :-) In this case, the HPFS chkdsk ran quickly and the JFS took a
>long, long time

This is probably an apples vs. oranges comparison. The HPFS volume is
probably tiny relative to the JFS volumes. My experience is that an HPFS
chkdsk is horribly slow compared to JFS chkdsk for the volume sizes we
care about.

>additional volume (not the boot and not the data volume, where Apache,
>CGP, and FTP run, but a separate, isolated one) to store these rsync
>backups, and then have chkdsk run *after* boot, so that I can get
>services up and running first,

This is a good idea. I used to do this when I had more HPFS volumes.
Another technique I used with HPFS is to reboot to a command line
maintenance volume and run the chkdsks from the command line. The
maintenance volume autochecked only itself. The command line checks
run much faster because more memory is available to them than chkdsk.sys
provides.

I drove the chkdsks with a script that did

set DRV=f g h i
for %%X in ( %DRV% ) do ( cls & dir %%X: & echo Ready to check %%X: & pause & chkdsk %%X: /c & pause )

This required manual intervention between volumes but that's what I wanted
at the time. It would have been easy enough to write the script to run
through all the volumes automatically, log the results to a file and
reboot to the production volume.
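The unattended variant could look roughly like this Classic REXX sketch. It is an illustration under stated assumptions, not the script actually used: the drive list and log path are placeholders, the chkdsk switch is simply the /c from the 4OS2 loop above, and the final reboot is left out because how to get back into the production volume depends on the boot manager installed.

```rexx
/* chkall.cmd - sketch only: check several volumes unattended and log results.
   Hypothetical values: the drive list and the log file location. The log
   must live on a volume that is not being checked, e.g. the maintenance
   volume this runs from. */
drives  = 'F G H I'
logFile = 'D:\chkall.log'
do i = 1 to words(drives)
  drv = word(drives, i)':'
  call lineout logFile, '=== chkdsk' drv 'started' date() time()
  call lineout logFile                /* close so the redirect can append */
  '@chkdsk' drv '/c >>' logFile '2>&1'
  chkRc = rc                          /* keep chkdsk's return code */
  call lineout logFile, '=== chkdsk' drv 'ended, rc' chkRc
  call lineout logFile
end
/* Rebooting into the production volume is deliberately left out; how to
   do that depends on the boot manager installed. */
exit 0
```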

>in case this one (which will be taking
>the biggest "writing" hit during production) needs one of those
>hour-long passes. More to figure out...

It will be interesting to see which volumes end up with the corrupted
journals.

>Stunnel is one of those things I've wanted to test (for securing FTP
>transactions),

Stunnel can't be used for this. Stunnel needs to know which ports to
serve. What we really need is a working sshd port which would support
sftp.

Massimo

Dec 7, 2011, 11:27:13 PM12/7/11

to apa...@googlegroups.com

On 08/12/2011 1:41, Lewis G Rosenthal wrote:
> Space is indeed becoming something of a concern. I'm soon going to need to expand the array
> again (though this chassis has plenty of room). I have one mail domain using almost 15GB,

my respect ;)
15GB for one mail domain is even more than all the email on my eCS desktop
from 1997 to now, about 11GB

> and
> they are transitioning off of the server in the next couple of months; that will free up some
> room, as well.

you know, as an array controller the number 1 under eCS is the LSI Logic 300-8x SATA2 hardware RAID,
a real monster in performance; i use it on all of my servers

massimo

Massimo

Dec 7, 2011, 11:30:00 PM12/7/11

to apa...@googlegroups.com

i haven't run any HPFS file systems for about 2 years now

i see no reason at all to run HPFS
anyway, with JFS partitions i prefer to have the "+" set for checking
at boot; it takes about 5 minutes to check all partitions, but i don't
like bad surprises ;)

massimo s.

Lewis G Rosenthal

Dec 8, 2011, 1:20:53 AM12/8/11

to apa...@googlegroups.com

Getting a bit far afield for this list, so I figure it may be time to at
least change the subject header...

On 12/07/11 09:26 pm, Steven Levine thus wrote :

> In <4EE0079E...@2rosenthals.com>, on 12/07/11
> at 07:41 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi/2,
>

<snip>

>> Yes, the wonder of journalling is only a wonder when the journal is
>> usable. :-) In this case, the HPFS chkdsk ran quickly and the JFS took a
>> long, long time
> This is probably an apples vs. oranges comparison. The HPFS volume is
> probably tiny relative to the JFS volumes. My experience is that an HPFS
> chkdsk is horribly slow compared to JFS chkdsk for the volume sizes we
> care about.
>

No comparison drawn, other than time. My reference was to that
particular chkdsk pass. On this server, the HPFS volume is 20GB (65%
free) and the JFS volume is 150GB (13% free). Surely, the HPFS chkdsk
would be way slower per block, but there are fewer blocks! :-) On this
system, when the journal is in good condition, the JFS chkdsk - despite
the disparity in used blocks - flies, and the HPFS chkdsk plods
along (as expected).

>> additional volume (not the boot and not the data volume, where Apache,
>> CGP, and FTP run, but a separate, isolated one) to store these rsync
>> backups, and then have chkdsk run *after* boot, so that I can get
>> services up and running first,
> This is a good idea. I used to do this when I had more HPFS volumes.
> Another technique I used with HPFS is to reboot to a command line
> maintenance volume and run the chkdsks from the command line. The
> maintenance volume autochecked only itself. The command line autochecks
> run much faster because more available memory than provided by chkdsk.sys.
>

This is how I do it on the ThinkPad (boot to maintenance, and run
concurrent chkdsk passes on C: & J:; MAINT is D:).

> I drove the chkdsks with a script that did
>
> set DRV=f g h i
> for %%X in ( %DRV% ) do ( cls & dir %%X: & echo Ready to check %%X: & pause & chkdsk %%X: /c & pause )
>
> This required manual intervention between volumes but that's what I wanted
> at the time. It would have been easy enough to write the script to run
> through all the volumes automatically, log the results to a file and
> reboot to the production volume.
>

This sounds good. In my case, my reasoning is that I want to get the
blasted machine back on its feet as quickly as possible. Knowing that
all of those rsync passes make for lots of disk writes, which will
likely be the messiest things in the event of a crash during one of them
(and they do run pretty much back-to-back, between backing up Apache web
spaces, mail domains, and FTP spaces), it's silly to delay the machine
booting to clean up after a bunch of botched backups, when the live data
is intact.

Of course, I could back up over the network to another box via rsync,
and I've got the room. I just haven't wanted to bog the network with
more traffic, so backing up locally seemed to make more sense with such
a fat local pipe (RAID 5 on a battery-backed caching controller and 15K
RPM drives). The system also has CPU cycles to spare, so that wasn't an
issue.

>> in case this one (which will be taking
>> the biggest "writing" hit during production) needs one of those
>> hour-long passes. More to figure out...
> It will be interesting to see which volumes end up with the corrupted
> journals.
>

True. I've been lucky, I guess. For the most part, even when the system
reboots itself, it tends to clean up nicely before it goes down
(apparently). Still, there have been times when one or more CGP
mailboxes end up with a corrupted settings file, resulting in the
dreaded "inbox not found" message (or some such). Having ready backups
(i.e., local, not on tape, and not compressed, but simply extra copies)
gets things back together quickly. The firewall does an adequate job of
holding off before actually bouncing an incoming message for one of
these, too, giving me the few minutes to copy back the file(s) before it
tries again to deliver.

>> Stunnel is one of those things I've wanted to test (for securing FTP
>> transactions),
> Stunnel can't be used for this. Stunnel needs to know which ports to
> serve. What we really need is a working sshd port which would support
> sftp.
>

Ah. I know I can do it on Linux with sshd. I don't know why I was
thinking of stunnel for this, instead. I didn't know that our current
sshd port would not support sftp, though.

Lewis G Rosenthal

Dec 8, 2011, 1:28:16 AM12/8/11

to apa...@googlegroups.com

On 12/07/11 11:27 pm, Massimo thus wrote :

>
> On 08/12/2011 1:41, Lewis G Rosenthal wrote:
>> Space is indeed becoming something of a concern. I'm soon going to need to expand the array
>> again (though this chassis has plenty of room). I have one mail domain using almost 15GB,
> my respect ;)
> 15GB for a mail domain is even more than all my emails on my ecs desktop
> since 1997 to now, about 11GB
>

All told, I probably have 40 or 50GB of mail on the system. Using
CommuniGate Pro, over 50% of my clients are using IMAP, which increases
my storage requirements considerably.

>> and
>> they are transitioning off of the server in the next couple of months; that will free up some
>> room, as well.
> you know as array the number 1 under eCS is LSI Logic 300-8x sata2 hw raid,
> a real monster in performances i use it on all servers of mine
>

HP Proliant ML370 G3 with onboard SmartArray 5i Ultra320 SCSI.

I'm not a big fan of SATA for things other than workstations and
notebooks. I've never seen SATA properly/consistently hot plug (under
*any* OS nor using *any* controller), as opposed to SCSI RAID, where I
just hot swapped a failing drive in this exact server last week.

Still, it's good to know that there is a SATA RAID controller we can
drive under eCS. Thanks for mentioning it!

Lewis G Rosenthal

Dec 8, 2011, 1:35:51 AM12/8/11

to apa...@googlegroups.com

On 12/07/11 11:30 pm, Massimo thus wrote :

> i don't run anymore any hpfs fs since about 2 years ago
>
> i see no reason at all to run hpfs
>

Not all servers can boot JFS from SCSI subsystems. Have a look at:

http://bugs.ecomstation.nl/view.php?id=1862
http://bugs.ecomstation.nl/view.php?id=2401

> anyway with JFS partitions i prefere to have the "+" set on the check
> while boot, it take about 5 minutes to check all partitions, but i don't
> like to have bad surprises ;)
>

Always good practice, but 5 minutes is not always the case (I wish it
were). ;-)

<snip>

Steven Levine

Dec 8, 2011, 2:53:42 AM12/8/11

to apa...@googlegroups.com

In <4EE05745...@2rosenthals.com>, on 12/08/11
at 01:20 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>particular chkdsk pass. On this server, the HPFS volume is 20GB (65%
>free)

A lot depends on the number of files. On fast drives without too many
files, I'd expect the chkdsk time to be somewhere between 5 and 10
minutes.

>and the JFS volume is 150GB (13% free).

This will take a bit even with really fast drives.

>Surely, the HPFS chkdsk
>would be way slower per block,

Worse than that, as I've mentioned before, the design is sure to maximize
head movement.

>This sounds good. In my case, my reasoning is that I want to get the
>blasted machine back on its feet as quickly as possible.

Given your setup, it makes sense to leave as much as possible for later.
The longer the recovery time, the more likely that the recovery time will
include a couple of phone calls. It's just the way it is.

>True. I've been lucky, I guess. For the most part, even when the system
>reboots itself, it tends to clean up nicely before it goes down
>(apparently).

It depends on what triggers the reboot. If it is a system exception,
there's no cleanup at all. The theory is that if the system is this
messed up, trying to do any cleanup has a higher probability of doing
damage than just discarding everything in memory. What you are seeing is
the fact that there are a lot more disk reads than disk writes and that,
in relative terms, disk writes occur infrequently.

>Ah. I know I can do it on Linux with sshd. I don't know why I was
>thinking of stunnel for this, instead. I didn't know that our current
>sshd port would not support sftp, though.

That's not the problem. The problem is that our sshd port is not terribly
stable. Dave and I were working on a new port, but then family matters
got in his way.

The sftp and ssh clients are OK. I've used them a lot with *ix boxes.

Steven Levine

Dec 8, 2011, 3:17:12 AM12/8/11

to apa...@googlegroups.com

In <4EDFC2F2...@2rosenthals.com>, on 12/07/11
at 02:48 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi guys,

>What libc version(s) do you have on that box? On this one, I currently
>have:

This is pretty much irrelevant. A gcc-based application is going to use
the libc it was linked with, and there is only one build of each libc
version, TTBOMK. I'm ignoring the private libc06x versions, which Paul
does not seem to be using.

>...and what do you know? I found these outdated ones in C:\ECS\DLL just
>now:

> 2-23-04 15:40 28,718 0 gcc322.dll
> 8-06-09 10:39 23,266 0 gcc434.dll
>11-16-09 23:45 22,569 0 gcc442.dll
> 5-01-10 8:47 22,647 0 gcc444.dll
>12-29-10 0:27 21,899 0 gcc452.dll

They are not outdated. They just have different timestamps. Try a
binary compare.
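One way to do that binary compare is a small Classic REXX script. This is a sketch that assumes RexxUtil is installed and that the duplicates sit in C:\ECS\DLL and C:\OS2\DLL (both defaults are hypothetical and can be overridden on the command line). It reads each pair of files whole and compares them with REXX's strict == operator, so timestamps play no part in the result.

```rexx
/* dllcmp.cmd - sketch only: report whether same-named gcc*.dll files in two
   directories are byte-identical, ignoring timestamps.
   Assumed defaults: C:\ECS\DLL and C:\OS2\DLL; pass others as arguments. */
parse arg dirA dirB .
if dirA == '' then dirA = 'C:\ECS\DLL'
if dirB == '' then dirB = 'C:\OS2\DLL'
call RxFuncAdd 'SysFileTree', 'RexxUtil', 'SysFileTree'
call SysFileTree dirA || '\gcc*.dll', 'found.', 'FO'   /* files, full names */
do i = 1 to found.0
  name  = filespec('name', found.i)
  other = dirB || '\' || name
  if stream(other, 'c', 'query exists') == '' then
    say name 'exists only in' dirA
  else if ReadAll(found.i) == ReadAll(other) then
    say name 'identical'
  else
    say name 'DIFFERENT'
end
exit 0

/* Return the whole file as one string; == then compares it byte for byte */
ReadAll: procedure
  parse arg fn
  data = charin(fn, 1, chars(fn))
  call stream fn, 'c', 'close'
  return data
```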

>which I'm going to remove *right now* (\ECS\DLL *precedes* \OS2\DLL in
>LIBPATH...ugh...).

You might as well get rid of the duplicates because it reduces clutter.

>Steve has a nice script, but it has never worked for me (even under
>4OS2) for some reason.

We never did get around to figuring this out, did we? I broke down and
rewrote the script in Classic REXX long ago. Let's see if the text
attachment gets through Google's protections.

It didn't. :)

Regards,

Steven

--
----------------------------------------------------------------------
"Steven Levine" <ste...@earthlink.net> eCS/Warp/DIY etc.
www.scoug.com www.ecomstation.com
----------------------------------------------------------------------

/* apachectl - control httpd 2.2.x
Keep in sync with apachectl for other httpd versions

Return 0 if OK else error code

This program is free software licensed under the terms of the GNU
General Public License. The GPL Software License can be found in
gnugpl2.txt or at http://www.gnu.org/licenses/licenses.html#GPL

2006-05-05 SHL Baseline
2008-03-06 SHL Add kill debug logic
2008-03-11 SHL Use pkill - emxkill seems to kill script
2008-04-30 SHL Turn off debug messages
2009-02-06 SHL Sync with standards
2009-02-23 SHL Rewrite from 4OS2 to REXX
2009-02-25 SHL Start minimized, allow 457
2009-04-08 SHL Correct usage display; drop -d
2009-04-09 SHL Support foreground startup for debugging
2009-04-11 SHL Use FindBaseDir
2009-09-28 SHL Reword help
2009-12-30 SHL Comments
2010-12-20 SHL Correct signal name to number typo
*/

signal on Error
signal on FAILURE name Error
signal on Halt
signal on NOTREADY name Error
signal on NOVALUE name Error
signal on SYNTAX name Error

call SetLocal

call Initialize

Gbl.!Version = '0.1'

Gbl.!AppName = 'Apache server'
Gbl.!Exe = 'httpd'

call FindBaseDir

call directory MakePath(Gbl.!BaseDir, 'bin')

if \ ChkExeInPath(Gbl.!Exe) then
call Fatal Gbl.!AppName 'not found in PATH'

Gbl.!CfgFile = ''
Gbl.!Editor = ''
Gbl.!Err = 0
Gbl.!Killer = ''
Gbl.!WrkFile = ''

Main:

parse arg cmdLine
call ScanArgs cmdLine
drop cmdLine

do iArg = 1 to Gbl.!ArgList.0
curArg = Gbl.!ArgList.iArg
call DoArg curArg
end /* iArg */

call CleanUp

exit Gbl.!Err

/* end main */

/*=== DoArg(action) Process action request; return rc ===*/

DoArg: procedure expose Gbl.
parse arg curArg
select
when curArg == 'start' then
call DoStart
when curArg == 'stop' then
call DoStop
when curArg == 'graceful' then
call DoGraceful
when curArg == 'kill' then
call DoKill
when curArg == 'restart' then
call DoRestart
when curArg == 'status' then
call DoStatus
when curArg == 'config' then
call DoConfig
when curArg == 'test' then
call DoTest
when curArg == 'lynx' then
call DoLynx
when curArg == 'lynx2' then
call DoLynx2
when curArg == 'help' then
call ScanArgsHelp
otherwise
call ScanArgsUsage 'Request' curArg 'unexpected'
end
return

/* end DoArg */

/*=== DoStart() Start server ===*/

DoStart: procedure expose Gbl.

if ChkRunning() then do
say Gbl.!AppName 'already running'
end
else do
if Gbl.!Verbose then
say 'Starting' Gbl.!AppName 'from' directory()
if Gbl.!Foreground then do
signal off Error
Gbl.!Exe '-d..'
signal on Error
if RC \= 0 then do
say Gbl.!AppName 'exited with rc ' RC
Gbl.!Err = 1
end
end
else do
signal off Error
'start "' || Gbl.!AppName '2.2.x" /min /c' Gbl.!Exe '-d..'
signal on Error
if RC \= 0 & RC \= 457 then
signal Error
If \ ChkRunning(1) then do
say 'Can not start' Gbl.!AppName
Gbl.!Err = 1
end
else do
say Gbl.!AppName 'started'
end
end
end

return

/* end DoStart */

/*=== DoStop() Stop ===*/

DoStop: procedure expose Gbl.
if \ ChkRunning() then
say Gbl.!AppName 'is not running'
else do
call ReadPidFromFile
call RunKiller '-TERM'
if ChkRunning(0) then do
say Gbl.!AppName 'did not stop'
Gbl.!Err = 1
end
else
say Gbl.!AppName 'has stopped'
end
return

/* end DoStop */

/*=== DoGraceful() Graceful restart ===*/

DoGraceful: procedure expose Gbl.
say 'httpd does not support graceful restart'
return
if \ ChkRunning() then
say Gbl.!AppName 'is not running'
else do
call ReadPidFromFile
call RunKiller '-USR1'
if \ ChkRunning(0) then do
say Gbl.!AppName 'did not restart'
Gbl.!Err = 1
end
else
say Gbl.!AppName 'has restarted'
end
return

/* end DoGraceful */

/*=== DoKill() Kill ===*/

DoKill: procedure expose Gbl.
if \ ChkRunning() then
say Gbl.!AppName 'is not running'
else do
call ReadPidFromFile
call RunKiller '-KILL'
if ChkRunning(0) then do
say Gbl.!AppName 'will not die'
Gbl.!Err = 1
end
else
say Gbl.!AppName 'has stopped'
end
return

/* end DoKill */

/*=== DoRestart() Restart ===*/

DoRestart: procedure expose Gbl.
/* 27 Feb 09 SHL fixme to use HUP */
call DoStop
if Gbl.!Err = 0 then
call DoStart
else
say Gbl.!AppName 'stop failed'
return

/* end DoRestart */

/*=== DoStatus() Status ===*/

DoStatus: procedure expose Gbl.

if \ ChkRunning() then do
say Gbl.!AppName 'is not running'
Gbl.!Err = 1
end
else do
if \ Gbl.!Verbose then do
call ReadPidFromFile
if Gbl.!Pid == '' then do
/* Assume restarting */
call SysSleep 1
call ReadPidFromFile
end
if Gbl.!Pid == '' then
say Gbl.!AppName 'is running but' Gbl.!PidFile 'does not yet exist'
else
say Gbl.!AppName 'is running as Pid' Gbl.!Pid'('d2x(Gbl.!Pid)')'
end
end
return

/* end DoStatus */

/*=== DoConfig() Config ===*/

DoConfig: procedure expose Gbl.

call FindCfg
call FindEditor
say 'Starting' Gbl.!AppName 'configurator'
Gbl.!Editor Gbl.!CfgFile
say 'Restart' Gbl.!AppName 'to apply changes'
return

/* end DoConfig */

/*=== DoTest() Test ===*/

DoTest: procedure expose Gbl.
call FindCfg
Gbl.!Exe '-d' Gbl.!BaseDir '-t'
return

/* end DoTest */

/*=== DoLynx() Lynx ===*/

DoLynx: procedure expose Gbl.
say 'lynx is not yet supported'
LYNX = 'lynx -dump'
STATUSURL = 'http://localhost:80/server-status'
/* $LYNX $STATUSURL | awk ' /process$/ { print; exit } { print } ' */
return

/* end DoLynx */

/*=== DoLynx2() Lynx2 ===*/

DoLynx2: procedure expose Gbl.
say 'lynx is not yet supported'
LYNX = 'lynx'
STATUSURL = 'http://localhost:80/server-status'
/* $LYNX $STATUSURL | awk ' /process$/ { print; exit } { print } ' */
return

/* end DoLynx2 */

/*=== ChkRunning() Return TRUE if exe running, optionally waits for expected state ===*/

ChkRunning: procedure expose Gbl.

/* 23 Feb 09 SHL fixme to use GetPIDForProcess */
parse arg waitFor
if Gbl.!WrkFile == '' then do
if \ ChkExeInPath('grep') then
call Fatal 'grep not in PATH'
s = MakePath(Gbl.!TmpDir, Gbl.!CmdName || '_???.wrk')
Gbl.!WrkFile = SysTempFileName(s)
end
do c = 1 to 10
signal off Error
'@pstat /c | grep -i' Gbl.!Exe'.exe >'Gbl.!WrkFile
signal on Error
running = RC == 0
if waitfor == '' | waitFor == running then leave
call SysSleep 1
end
if Gbl.!Verbose & running then do
'@type' Gbl.!WrkFile
do i = 1 to 10
call ReadPidFromFile
if Gbl.!Pid \== '' then do
say Gbl.!AppName 'is running as Pid' Gbl.!Pid'('d2x(Gbl.!Pid)')'
leave
end
call SysSleep 1
end
end
return running

/* end ChkRunning */

/*=== Cleanup() Cleanup work files ===*/

Cleanup: procedure expose Gbl.

if Gbl.!WrkFile \== '' then do
'@del /q' Gbl.!WrkFile
Gbl.!WrkFile = ''
end
return

/* end Cleanup */

/*=== FindBaseDir() Find base directory and set Gbl.!BaseDir or die ===*/

FindBaseDir: procedure expose Gbl.

if symbol('Gbl.!BaseDir') == 'LIT' then do
s = GetEnv('HOSTNAME')
select
when s = 'slamain' then
d = 'd:\Internet\Apache22'
when s = 'slat42-1' then
d = 'd:\Internet\Apache22'
when s = 'acru' then
d = 'D:\Apps\apache2'
otherwise
d = ''
end
if d == '' then
call Fatal 'Can not determine base directory'
if \ IsDir(d) then
call Fatal d 'directory not found'
Gbl.!BaseDir = d
end
return

/* end FindBaseDir */

/*=== FindCfg() Find httpd.conf ===*/

FindCfg: procedure expose Gbl.

if Gbl.!CfgFile == '' then do
s = '..\conf\httpd.conf'
if IsFile(s) then
Gbl.!CfgFile = s
if Gbl.!CfgFile == '' then
call Fatal 'Can not locate' s
end
return

/* end FindCfg */

/*=== FindEditor() Find editor ===*/

FindEditor: procedure expose Gbl.

if Gbl.!Editor == '' then do
s = GetEnv('EDITOR')
if s \== '' then
Gbl.!Editor = s
else if ChkExeInPath('vim') then
Gbl.!Editor = 'vim'
else if ChkExeInPath('vimx.cmd') then
Gbl.!Editor = '4os2 /c vimx'
else if ChkExeInPath('tedit') then
Gbl.!Editor = 'tedit'
end
if Gbl.!Editor == '' then
call Fatal 'EDITOR not defined'
return

/* end FindEditor */

/*=== Initialize() Initialize globals ===*/

Initialize: procedure expose Gbl.
call GetCmdName
call LoadRexxUtil
Gbl.!Env = 'OS2ENVIRONMENT'
call GetTmpDir
return

/* end Initialize */

/*=== ReadPidFromFile() Set Gbl.!Pid from pidfile if file exists, die if pidfile corrupt ===*/

ReadPidFromFile: procedure expose Gbl.

Gbl.!PidFile = Gbl.!BaseDir || '\logs\httpd.pid'
Gbl.!Pid = ReadLineFromFile(Gbl.!PidFile)

if Gbl.!Pid == '' & IsFile(Gbl.!PidFile) then
call Fatal 'Can not read PID from' Gbl.!PidFile

return

/* end ReadPidFromFile */

/*=== RunKiller(signal) Run process killer, default signal to kill if omitted ===*/

RunKiller: procedure expose Gbl.

parse arg sig

if sig == '' then
sig = '-KILL'

if Gbl.!Pid == '' then
call Fatal 'Pid not set'

if Gbl.!Killer == '' then do
if ChkExeInPath('apache_kill') then
Gbl.!Killer = 'apache_kill'
else if ChkExeInPath('emxkill') then
Gbl.!Killer = 'emxkill'
if Gbl.!Killer == '' then
call Fatal 'Gbl.!Killer not defined'
end
select
when Gbl.!Killer == 'apache_kill' then nop
when Gbl.!Killer == 'emxkill' then nop
otherwise
/* Translate signal name to number */
sigTbl = '-HUP 1 -KILL -9 -TERM 15 -USR1 16'
do i = 1 to words(sigTbl) by 2
if translate(sig) == word(sigTbl, i) then do
sig = word(sigTbl, i + 1)
leave
end
end
end
cmd = Gbl.!Killer sig Gbl.!Pid
say Gbl.!Killer sig Gbl.!Pid'('d2x(Gbl.!Pid)')'
signal off Error
'@'Gbl.!Killer sig Gbl.!Pid
signal on Error

return

/* end RunKiller */

/*=== ScanArgsInit() ScanArgs initialization exit routine ===*/

ScanArgsInit: procedure expose Gbl. cmdTail swCtl keepQuoted

if cmdTail == '' then
call ScanArgsHelp

/* Preset defaults */
Gbl.!Foreground = 0 /* Start in foreground */
Gbl.!Verbose = 0 /* Verbose messages */
Gbl.!ArgList.0 = 0 /* Reset arg count */

/* Configure scanner */
swCtl = '' /* Switches that take args, append ? if arg optional */
keepQuoted = 0 /* Set to 1 to keep arguments quoted */

return

/* end ScanArgsInit */

/*=== ScanArgsSwitch() ScanArgs switch option exit routine ===*/

ScanArgsSwitch: procedure expose Gbl. curSw curSwArg

select
when curSw == 'f' then
Gbl.!Foreground = 1
when curSw == 'h' | curSw == '?' then
call ScanArgsHelp
when curSw == 'v' then
Gbl.!Verbose = 1
when curSw == 'V' then do
say Gbl.!CmdName Gbl.!Version
exit
end
otherwise
call ScanArgsUsage 'switch '''curSw''' unexpected'
end /* select */

return

/* end ScanArgsSwitch */

/*=== ScanArgsArg() ScanArgs argument option exit routine ===*/

ScanArgsArg: procedure expose Gbl. curArg

i = Gbl.!ArgList.0 + 1
Gbl.!ArgList.i = curArg
Gbl.!ArgList.0 = i

return

/* end ScanArgsArg */

/*=== ScanArgsTerm() ScanArgs scan end exit routine ===*/

ScanArgsTerm: procedure expose Gbl.

if Gbl.!ArgList.0 = 0 then do
call ScanArgsUsage 'action required (e.g. start, stop)'
end
return

/* end ScanArgsTerm */

/*=== ScanArgsHelp() Display ScanArgs usage help exit routine ===*/

ScanArgsHelp:
say
say 'Control httpd.'
say
say 'Usage:' Gbl.!CmdName '[-f] [-h] [-v] [-V] [-?] action...'
say
say ' -f Start in foreground'
say ' -h -? Display this message'
say ' -v Enable verbose output'
say ' -V Display version'
say
say ' action start, stop, restart, kill, graceful, status, config, test, help'
exit 255

/* end ScanArgsHelp */

/*=== ScanArgsUsage(message) Report ScanArgs usage error exit routine ===*/

ScanArgsUsage:
parse arg msg
say
if msg \== '' then
say msg
say 'Usage:' Gbl.!CmdName '[-f] [-h] [-v] [-V] [-?] action...'
exit 255

/* end ScanArgsUsage */

/*============================================================================== */
/*=== SkelRexxFunc standards - Delete unused - Move modified above this mark === */
/*============================================================================== */

/*=== ChkExeInPath(exe) return true if executable is in PATH; supplies .exe if no extension ===*/

ChkExeInPath: procedure expose Gbl.
parse arg exe
if exe == '' then
inPath = 0
else do
i = lastpos('.', exe)
j = lastpos('\', exe)
if i = 0 | i < j then exe = exe || '.exe' /* No extension */
inPath = SysSearchPath('PATH', exe) \== ''
end
return inPath

/* end ChkExeInPath */

/*=== ChopDirSlash(directory) Chop trailing \ from directory name unless root ===*/

ChopDirSlash: procedure
parse arg dir
if right(dir, 1) == '\' & right(dir, 2) \== ':\' & dir \== '\' then
dir = substr(dir, 1, length(dir) - 1)
return dir

/* end ChopDirSlash */

/*=== GetEnv(var) Return value for environment variable or empty string ===*/

GetEnv: procedure expose Gbl.
parse arg var
if var = '' then
call Fatal 'GetEnv requires an argument'
return value(var,, Gbl.!Env)

/* end GetEnv */

/*=== GetPidForProcess(procname) Return decimal pid for named process or empty string ===*/

GetPidForProcess: procedure expose Gbl.
parse arg procName
if procName = '' then
call Fatal 'Process name required'
signal off Error
'@pstat /c | rxqueue'
signal on Error
procpid = ''
procName = translate(procName) /* pstat always outputs uppercase */
do while queued() <> 0
pull line
if procpid \== '' then iterate /* Optimize */
parse var line pid ppid sess name blkid state misc1 misc2
if pos('\', name) = 0 then iterate /* Not a process definition */
if pos(procName, name) \= 0 then
procpid = x2d(strip(pid))
end
return procpid

/* end GetPidForProcess */

/*=== IsDir(dirName) return 1 if arg is valid directory name ===*/

IsDir: procedure expose Gbl.
/* requires ChopDirSlash
* wildcards OK in last component if wildcard resolves to single directory
*/
parse arg dir
if dir == '' then
yes = 0
else do
dir = ChopDirSlash(dir)
if right(dir, 1) == ':' then do
dir = dir'NUL'
call SysFileTree dir, 'f'
end
else do
do forever
i = lastpos('\', dir)
if i = 0 then
leave
s = substr(dir, i + 1)
if s \== '.' then
leave
dir = left(dir, i) /* Chop trailing . */
dir = ChopDirSlash(dir)
end
call SysFileTree dir, 'f', 'D'
end
if RESULT \= 0 then
call Fatal 'IsDir' dir 'failed'
if f.0 = 1 then
yes = 1 /* Matched 1 or matched . or .. with one subdir */
else do
/* Matched more than 1 - make sure really directory */
i = lastpos('\', dir)
s = substr(dir, i + 1)
yes = s = '.' | (f.0 >= 1 & (s = '..' | right(dir, 2) == ':\'))
end
end
return yes

/* end IsDir */

/*=== IsFile(fileSpec) return true if arg is file ===*/

IsFile: procedure expose Gbl.
parse arg fileSpec
if fileSpec == '' then
yes = 0
else do
call SysFileTree fileSpec, 'fileList', 'F'
if RESULT \= 0 then
call Fatal 'IsFile' fileSpec 'failed'
/* Assume caller knows if arg contains wildcards */
yes = fileList.0 \= 0
end
return yes

/* end IsFile */

/*=== MakePath(drv, dir, name, ext) Make pathname from parts ===*/

MakePath: procedure expose Gbl.

/* All parts optional - code guesses what caller means
MakePath(path, filename) will work
MakePath(path, name, ext) will not work
Automatically converts Unix slashes to DOS backslashes
*/

parse arg drv, dir, name, ext

if name == '' & ext == '' then do
name = dir
dir = drv
drv = ''
end
if drv == '' then
path = '' /* Drive omitted */
else do
drv = translate(drv, '\', '/') /* Ensure DOS */
path = drv
/* If just one character assume drive letter */
if length(drv) = 1 then
path = path':'
/* If leading \\ assume UNC */
else if left(drv, 2) == '\\' & right(drv, 1) \== '\' then
path = path'\'
/* Otherwise figure out later */
end

if dir \== '' then do
dir = translate(dir, '\', '/') /* Ensure DOS */
c = right(path, 1)
c2 = left(dir, 1)
if path \== '' & c \== ':' & c \== '\' & c2 \== '\' then
path = path || '\' || dir
else if c == '\' & c2 == '\' then
path = path || substr(dir, 2)
else
path = path || dir
end

if name \== '' then do
c = right(path, 1)
if path \== '' & c \= '\' & c \== ':' then
path = path || '\' || name
else
path = path || name
end

if ext \== '' then do
if left(ext, 1) \== '.' then
path = path'.'ext
else
path = path || ext
end

return path

/* end MakePath */

/*=== ReadLineFromFile(file) Read first line from file, return line or empty string ===*/

ReadLineFromFile: procedure expose Gbl.
parse arg fileName
if fileName = '' then
call Fatal 'File name argument required'
line = ''
/* OK for file to not exist */
if stream(fileName, 'C', 'QUERY EXISTS') \== '' then do
if lines(fileName) <> 0 then
line = strip(linein(fileName))
call stream fileName, 'C', 'CLOSE'
end
return line

/* end ReadLineFromFile */

/*========================================================================== */
/*=== SkelRexx standards - Delete unused - Move modified above this mark === */
/*========================================================================== */

/*=== Error() Report ERROR, FAILURE etc., trace and exit or return if called ===*/

Error:
say
parse source . . cmd
say 'CONDITION'('C') 'signaled at' cmd 'line' SIGL'.'
if 'CONDITION'('D') \= '' then say 'REXX reason =' 'CONDITION'('D')'.'
if 'CONDITION'('C') == 'SYNTAX' & 'SYMBOL'('RC') == 'VAR' then
say 'REXX error =' RC '-' 'ERRORTEXT'(RC)'.'
else if 'SYMBOL'('RC') == 'VAR' then
say 'RC =' RC'.'
say 'Source =' 'SOURCELINE'(SIGL)

if 'CONDITION'('I') \== 'CALL' | 'CONDITION'('C') == 'NOVALUE' | 'CONDITION'('C') == 'SYNTAX' then do
trace '?A'
say 'Enter REXX commands to debug failure. Press enter to exit script.'
call 'SYSSLEEP' 2
if 'SYMBOL'('RC') == 'VAR' then exit RC; else exit 255
end

return

/* end Error */

/*=== Fatal(message) Report fatal error and exit ===*/

Fatal:
parse arg msg
call 'LINEOUT' 'STDERR', ''
call 'LINEOUT' 'STDERR', Gbl.!CmdName':' msg 'at script line' SIGL
call 'BEEP' 200, 300
call 'SYSSLEEP' 2
exit 254

/* end Fatal */

/*=== GetCmdName() Get script name; set Gbl.!CmdName ===*/

GetCmdName: procedure expose Gbl.
parse source . . cmd
cmd = filespec('N', cmd) /* Chop path */
c = lastpos('.', cmd)
if c > 1 then
cmd = left(cmd, c - 1) /* Chop extension */
Gbl.!CmdName = translate(cmd, xrange('a', 'z'), xrange('A', 'Z')) /* Lowercase */
return

/* end GetCmdName */

/*=== GetTmpDir() Get TMP dir name with trailing backslash, set Gbl.!TmpDir ===*/

GetTmpDir: procedure expose Gbl.
tmpDir = value('TMP',,Gbl.!Env)
if tmpDir \= '' & right(tmpDir, 1) \= ':' & right(tmpDir, 1) \== '\' then
tmpDir = tmpDir'\' /* Stuff backslash */
Gbl.!TmpDir = tmpDir
return

/* end GetTmpDir */

/*=== Halt() Report HALT condition and exit ===*/

Halt:
say
parse source . . cmd
say 'CONDITION'('C') 'signaled at' cmd 'line' SIGL'.'
say 'Source = ' 'SOURCELINE'(SIGL)
call 'SYSSLEEP' 2
say 'Exiting.'
exit 253

/* end Halt */

/*=== LoadRexxUtil() Load RexxUtil functions ===*/

LoadRexxUtil:
if RxFuncQuery('SysLoadFuncs') then do
call RxFuncAdd 'SysLoadFuncs', 'REXXUTIL', 'SysLoadFuncs'
if RESULT then
call Fatal 'Cannot load SysLoadFuncs'
call SysLoadFuncs
end
return

/* end LoadRexxUtil */

/*=== ScanArgs(cmdLine) Scan command line ===*/

ScanArgs: procedure expose Gbl.

/* Calls user exits to process arguments and switches */

parse arg cmdTail
cmdTail = strip(cmdTail)

call ScanArgsInit

/* Scan */
curArg = '' /* Current arg string */
curSwList = '' /* Current switch list */
/* curSwArg = '' */ /* Current switch argument, if needed */
noMoreSw = 0 /* End of switches */

do while cmdTail \== '' | curArg \== '' | curSwList \== ''

if curArg == '' then do
/* Buffer empty, refill */
qChar = left(cmdTail, 1) /* Remember quote */
if \ verify(qChar,'''"', 'M') then do
parse var cmdTail curArg cmdTail /* Not quoted */
end
else do
/* Arg is quoted */
curArg = ''
do forever
/* Parse dropping quotes */
parse var cmdTail (qChar)quotedPart(qChar) cmdTail
curArg = curArg || quotedPart
/* Check for escaped quote within quoted string (i.e. "" or '') */
if left(cmdTail, 1) \== qChar then
leave /* No, done */
curArg = curArg || qChar /* Append quote */
if keepQuoted then
curArg = curArg || qChar /* Append escaped quote */
parse var cmdTail (qChar) cmdTail
end /* do */
if keepQuoted then
curArg = qChar || curArg || qChar /* requote */
end /* if quoted */
end

/* If switch buffer empty, refill */
if curSwList == '' then do
if left(curArg, 1) == '-' & curArg \== '-' then do
if noMoreSw then
call ScanArgsUsage 'switch '''curArg''' unexpected'
else if curArg == '--' then
noMoreSw = 1
else do
curSwList = substr(curArg, 2) /* Remember switch string */
curArg = '' /* Mark empty */
iterate /* Refill arg buffer */
end
parse var cmdTail curArg cmdTail
end
end

/* If switch in progress */
if curSwList \== '' then do
curSw = left(curSwList, 1) /* Next switch */
curSwList = substr(curSwList, 2) /* Drop from pending */
/* Check switch allows argument, avoid matching ? */
if pos(curSw, translate(swCtl,,'?')) \= 0 then do
if curSwList \== '' then do
curSwArg = curSwList /* Use rest of switch string for switch argument */
curSwList = ''
end
else if curArg \== '' & left(curArg, 1) \== '-' then do
curSwArg = curArg /* Arg string is switch argument */
curArg = '' /* Mark arg string empty */
end
else if pos(curSw'?', swCtl) = 0 then
call ScanArgsUsage 'Switch' curSw 'requires argument'
else
curSwArg = '' /* Optional arg omitted */
end

call ScanArgsSwitch /* Passing curSw and curSwArg */
drop curSwArg /* Must be used by now */
end /* if switch */

/* If arg */
else if curArg \== '' then do
noMoreSw = 1
call ScanArgsArg /* Passing curArg */
curArg = ''
end

end /* while not done */

call ScanArgsTerm

return

/* end ScanArgs */

/* The end */

Joachim Benjamins

Dec 8, 2011, 3:47:05 AM12/8/11

to apa...@googlegroups.com

Massimo wrote:
>
> On 08/12/2011 1:41, Lewis G Rosenthal wrote:
>> Space is indeed becoming something of a concern. I'm soon going to need to expand the array
>> again (though this chassis has plenty of room). I have one mail domain using almost 15GB,
>
> my respect ;)
> 15GB for one mail domain is even more than all the email on my eCS desktop
> from 1997 until now, about 11GB

Seriously? Our mail domains (okay, there are a lot of them) easily add up
to over 100 GB...

>> and
>> they are transitioning off of the server in the next couple of months; that will free up some
>> room, as well.
>
> as you know, the number one array controller under eCS is the LSI Logic 300-8x SATA2 hardware RAID,
> a real monster in performance; I use it on all of my servers

We use the IBM ServeRAID 6i controller with 15K UW320 SCSI disks (in
RAID5 mode), but performance is not exceptionally good.

> massimo

--
Kind regards / met vriendelijke groet,

Joachim Benjamins

Massimo

Dec 8, 2011, 8:37:21 AM12/8/11

to apa...@googlegroups.com

3 partitions here

1 - jfs C 2GB
2 - jfs D 320GB
3 - jfs E 152GB
4 - sadump F 2GB

With the "+" in the check option, it takes about 4-5 minutes...

I repeat, this speed is thanks to the LSI Logic RAID controller.

massimo

Lewis G Rosenthal

Dec 8, 2011, 11:26:52 AM12/8/11

to apa...@googlegroups.com

Good morning...

On 12/08/11 02:53 am, Steven Levine thus wrote :

> In <4EE05745...@2rosenthals.com>, on 12/08/11
> at 01:20 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi,
>

<snip>

>> Surely, the HPFS chkdsk
>> would be way slower per block,
> Worse than that, as I've mentioned before, the design is sure to maximize
> head movement.
>

Though that is somewhat mitigated in a RAID 5 configuration with 6
physical drives. Elevator seeking and TCQ come into play to help
alleviate such bottlenecks (though admittedly, HP's documentation on the
full capabilities of the CPQCISS and CPQARRAY drivers is rather sparse).

>> This sounds good. In my case, my reasoning is that I want to get the
>> blasted machine back on its feet as quickly as possible.
> Given your setup, it makes sense to leave as much as possible for later.
> The longer the recovery time, the more likely that the recovery time will
> include a couple of phone calls. It's just the way it is.
>

Exactly.

>> True. I've been lucky, I guess. For the most part, even when the system
>> reboots itself, it tends to clean up nicely before it goes down
>> (apparently).
> It depends on what triggers the reboot. If it is a system exception,
> there's no cleanup at all. The theory is that if the system is this
> messed up, trying to do any cleanup has a higher probability of doing
> damage than just discarding everything in memory. What you are seeing is
> the fact that there are a lot more disk reads than disk writes and that in
> relative terms disk writes occur infrequently.
>

...unless we're running an rsync pass... But indeed, most of the writes
(aside from incoming mail) are done to the MySQL database, and that runs
on NetWare, across the wire.

>> Ah. I know I can do it on Linux with sshd. I don't know why I was
>> thinking of stunnel for this, instead. I didn't know that our current
>> sshd port would not support sftp, though.
> That's not the problem. The problem is that our sshd port is not terribly
> stable. Dave and I were working on a new port, but then family matters
> got in his way.
>
> The sftp and ssh clients are OK. I've used them a lot with *ix boxes.
>

As do I. I was not aware, though, that that server component was not as
stable. I'd love to be able to offer sftp to my clients at some point.

Cheers/2

Lewis G Rosenthal

Dec 8, 2011, 11:30:08 AM12/8/11

to apa...@googlegroups.com

On 12/08/11 03:17 am, Steven Levine thus wrote :

> In <4EDFC2F2...@2rosenthals.com>, on 12/07/11
> at 02:48 PM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:
>
> Hi guys,
>
>> What libc version(s) do you have on that box? On this one, I currently
>> have:
> This is pretty much irrelevant. A gcc based application is going to use
> the libc it was linked with and there is only one build of each libc
> version, TTBOMK. I'm ignoring the private libc06x versions which Paul
> does not seem to be using.
>

Good information; thanks.

>> ...and what do you know? I found these outdated ones in C:\ECS\DLL just
>> now:
>> 2-23-04 15:40 28,718 0 gcc322.dll
>> 8-06-09 10:39 23,266 0 gcc434.dll
>> 11-16-09 23:45 22,569 0 gcc442.dll
>> 5-01-10 8:47 22,647 0 gcc444.dll
>> 12-29-10 0:27 21,899 0 gcc452.dll
> They are not outdated. They just have different timestamps. Try a
> binary compare.
>

Too late to play. Zapped, and gone. :-)

>> which I'm going to remove *right now* (\ECS\DLL *precedes* \OS2\DLL in
>> LIBPATH...ugh...).
> You might as well get rid of the duplicates because it reduces clutter.
>

Indeed.

>> Steve has a nice script, but it has never worked for me (even under
>> 4OS2) for some reason.
> We never did get around to figuring this out, did we? I broke down and
> rewrote the script in Classic REXX long ago. Let's see if the text
> attachment gets through Google's protections.
>
> It didn't. :)
>

Oh, well. :-)

Steven Levine

Dec 8, 2011, 5:51:40 PM12/8/11

to apa...@googlegroups.com

In <4EE0E610...@2rosenthals.com>, on 12/08/11
at 11:30 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>> It didn't. :)
>>
>Oh, well. :-)

I was just testing to see if Google was being consistent. I resent it
pasted after the tear bar so it's available.

I probably should build an outbound filter to do the rename automatically.

Steven Levine

Dec 8, 2011, 6:24:05 PM12/8/11

to apa...@googlegroups.com

In <4EE0E54C...@2rosenthals.com>, on 12/08/11
at 11:26 AM, Lewis G Rosenthal <lgros...@2rosenthals.com> said:

Hi,

>Though that is somewhat mitigated in a RAID 5 configuration with 6
>physical drives. Elevator seeking and TCQ come into play to help
>alleviate such bottlenecks (though admittedly, HP's documentation on the
>full capabilities of the CPQCISS and CPQARRAY drivers is rather sparse).

This will help, but when you have up to 256 threads all trying to read
different parts of the drive, this strains the abilities of almost any
optimization algorithm.

For example, a chkdsk of a 1GB HPFS boot volume results in the following
pstat output:

08B1 0039 14 F:\OS2\CHKDSK.COM 01 0200 FD837648 Block
                               02 0200 0A001CE4 Ready
                               03 0200 0A001D44 Block
                               04 0200 0A001D04 Block
                               05 0200 0A001D24 Block
                               06 0200 FD83765C Block
                               07 0200 FD83765C Block
                               08 0200 FD83765C Block
                               09 0200 FD83765C Block

Large drives will create more threads.

>As do I. I was not aware, though, that that server component was not as
>stable.

There's that and it requires Security/2 for the password database.

Paul Smedley

Dec 8, 2011, 7:04:49 PM12/8/11

to apa...@googlegroups.com

For the record, I do follow the list, but spare time has been at a premium
recently, so my participation has been limited.

I have 3 weeks of vacation over Christmas, so I will try to catch up on a
few projects during that time.

Cheers

Paul

Sent from my iPhone

On 08/12/2011, at 11:11 AM, Lewis G Rosenthal
<lgros...@2rosenthals.com> wrote:

> --
> You received this message because you are subscribed to the Google Groups "Apache for OS/2" group.
> To post to this group, send email to apa...@googlegroups.com.
> To unsubscribe from this group, send email to apache2+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/apache2?hl=en.
>

Lewis G Rosenthal

Dec 8, 2011, 9:02:52 PM12/8/11

to apa...@googlegroups.com

On 12/08/11 06:24 pm, Steven Levine thus wrote :

No doubt that it's resource intensive. I think my point of contrast is
that the disk I/O bottleneck is significantly reduced under an efficient
storage bus, such as SCSI, and particularly with a RAID 5 configuration.
Still, your point is well taken; chkdsk on HPFS is simply not very
efficient (as compared to JFS).

>> As do I. I was not aware, though, that that server component was not as
>> stable.
> There's that and it requires Security/2 for the password database.
>

Right. I forgot that there was some major reason I held off setting it
up... :-)

Dave Saville

Dec 9, 2011, 3:01:46 AM12/9/11

to apa...@googlegroups.com

On Thu, 08 Dec 2011 21:02:52 -0500 Lewis G Rosenthal wrote:

>>>As do I. I was not aware, though, that that server component was not as
>>>stable.
>>There's that and it requires Security/2 for the password database.
>>
>Right. I forgot that there was some major reason I held off setting it
>up... :-)
>

There are two issues here. My server is a twin CPU board and I, or rather
Steven looking at the dumps, found that Security/2 does not appear to be
thread safe on a multi-engined box. All the SSHD crashes were in Security/2
modules.

The porter is unreachable or does not care to respond. I tried to recompile
from the tarball with some help from Paul and Steven and found that the TTY
code is not included in, or does not work with, the current libc. (It is
included and works in EMX, but putting the source through EMX produced so
many errors that I gave up.)

I have binaries that do not depend on Security/2 and work fine for using
SSH in tunnel mode, but terminal mode and SFTP do not. Steven and I were
about to track it down but then Zena was taken ill. As she is due home on
the 19th I hope to get back to it. Along with a load of other projects. :-)

--
Kind regards

Dave Saville

Lewis G Rosenthal

Dec 9, 2011, 9:42:54 AM12/9/11

to apa...@googlegroups.com

Hey, Dave!

On 12/09/11 03:01 am, Dave Saville thus wrote :

> On Thu, 08 Dec 2011 21:02:52 -0500 Lewis G Rosenthal wrote:
>
>>>> As do I. I was not aware, though, that that server component was
>>>> not as
>>>> stable.
>>> There's that and it requires Security/2 for the password database.
>>>
>> Right. I forgot that there was some major reason I held off setting
>> it up... :-)
>>
>
> There are two issues here. My server is a twin CPU board and I, or
> rather Steven looking at the dumps, found that Security/2 does not
> appear to be thread safe on a multi-engined box. All the SSHD crashes
> were in Security/2 modules.
>

Very interesting; thanks for the warning (my Proliant is a dual Xeon,
and I run SMP).

> The porter is unreachable or does not care to respond. I tried to
> recompile from the tar ball with some help from Paul and Steven and
> found that TTY stuff is not included/work on the current libc. (It
> is/does in EMX but putting the source through EMX produced so many
> errors I gave up.)
> I have binaries that do not depend on Security/2 and work fine for
> using SSH in tunnel mode but terminal mode and SFTP do not. Steven and
> I were about to track it down but then Zena was taken ill. As she is
> due home on the 19th I hope to get back to it. Along with a load of
> other projects. :-)
>

Thanks for the clarification and the additional status info. Yet another
reason to hope for Zena's speedy recovery! :-) Again, glad she's coming
home.

Cheers/2
