
Server reboot?


Dave Olson

Dec 3, 2000, 3:00:00 AM
This message/question isn't about an abend, but I'm not sure where else
to put it.

My main server rebooted yesterday morning about 5:30 am. It's a Dell
PowerEdge 4200, with 190MB RAM. Never had a problem like this before
with it. For the past week or so, it's been slow to respond to GW and
other apps that run off the network, like our Help Desk program. Even
just using Explorer to get a list of files in a folder, we get the
"flashlight" while it "collects" the files before displaying them on the
Win95 screen. When this happens, I've looked at the Monitor screen on
the server and found Utilization consistently between 30-60% for
several minutes at a time. I don't look at that very often, but when I
do, it's almost always between 0-10%, with a few spikes into the 20s or
30s for only a few seconds.

Here's what I found in the SYS$LOG.ERR file:

12-02-00 5:07:58 am: SERVER-4.11-2324
Severity = 5 Locus = 19 Class = 2
Cache memory allocator out of available memory.

12-02-00 5:07:58 am: SERVER-4.11-830
Severity = 5 Locus = 1 Class = 1
Short term memory allocator is out of memory.
1 attempts to get more memory failed.

Then, 20 minutes later:

12-02-00 5:27:48 am: SERVER-4.11-830
Severity = 5 Locus = 1 Class = 1
Short term memory allocator is out of memory.
8624 attempts to get more memory failed.

12-02-00 5:27:49 am: SERVER-4.11-2324
Severity = 5 Locus = 19 Class = 2
Cache memory allocator out of available memory.

12-02-00 5:32:37 am: DS-6.9-28
Severity = 1 Locus = 17 Class = 19
Bindery open requested by the SERVER
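
The entries above follow a regular three-part layout (timestamp and module code, a Severity/Locus/Class line, then the message), so the memory errors can be pulled out and lined up against other logs. The following is only a minimal sketch, assuming that layout holds for the whole file; the function names and the sample text are illustrative and not part of any Novell tool.

```python
import re
from datetime import datetime

# Header line, e.g. "12-02-00 5:07:58 am: SERVER-4.11-2324"
HEADER = re.compile(
    r"^(\d{1,2}-\d{1,2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2})\s+([ap]m):\s+(\S+)\s*$",
    re.IGNORECASE,
)

SAMPLE = """\
12-02-00 5:07:58 am: SERVER-4.11-2324
Severity = 5 Locus = 19 Class = 2
Cache memory allocator out of available memory.

12-02-00 5:07:58 am: SERVER-4.11-830
Severity = 5 Locus = 1 Class = 1
Short term memory allocator is out of memory.
1 attempts to get more memory failed.

12-02-00 5:32:37 am: DS-6.9-28
Severity = 1 Locus = 17 Class = 19
Bindery open requested by the SERVER
"""


def parse_entries(text):
    """Split log text into (timestamp, module_code, message) tuples."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            date, time, ampm, code = match.groups()
            stamp = datetime.strptime(
                f"{date} {time} {ampm.upper()}", "%m-%d-%y %I:%M:%S %p"
            )
            current = [stamp, code, []]
            entries.append(current)
        elif current is not None and line and not line.startswith("Severity"):
            # Everything except the Severity/Locus/Class line is message text.
            current[2].append(line)
    return [(stamp, code, " ".join(msg)) for stamp, code, msg in entries]


def memory_errors(entries):
    """Keep only entries whose message reports an allocator out of memory."""
    return [
        e for e in entries
        if "out of" in e[2].lower() and "memory" in e[2].lower()
    ]


if __name__ == "__main__":
    for stamp, code, message in memory_errors(parse_entries(SAMPLE)):
        print(stamp.isoformat(sep=" "), code, message)
```

Sorting the resulting timestamps next to the POA log times makes it easier to see what was running when the first error appeared.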

Nothing should have been running at that time, to request more memory,
except a GW PO and a GW MTA. But there couldn't have been much activity
at that hour. I checked the POA log file, and it's got no activity at
that time except for "lack of memory" errors. The MTA log file is
empty, for some reason, but that seems to indicate the MTA wasn't
routing any messages at all that morning.

There is nothing in the Abend file for that morning - last entry is from
August.

Cache Utilization on Monitor shows (right now) 100% on 3 of the top 4
lines; only Long Term Cache Hits is lower, at 99%. I think I remember
looking at this screen Friday, and Long Term Cache Hits was at 96%.

LRU Sitting Time is now at 15 hours, and I think it was there on Friday
also. Original Cache Buffers is 64,900 or so. Total Cache Buffers is at
48,000, same as Friday, I believe.
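
As a quick worked check, the two Monitor figures can be turned into a single percentage that is easy to compare from day to day. This is just arithmetic on the approximate numbers quoted above; the function name is made up for illustration.

```python
def cache_buffer_percent(total, original):
    """Return Total Cache Buffers as a percentage of Original Cache Buffers."""
    if original <= 0:
        raise ValueError("original must be positive")
    return 100.0 * total / original


# Approximate values read off the Monitor screen.
print(round(cache_buffer_percent(48_000, 64_900), 1))  # prints 74.0
```

So roughly three quarters of the original buffers were still available as cache at the time of this reading.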

Any ideas as to what happened? Or where else I should look for
information and an explanation?

Can a lack of memory like this cause a server re-boot???

Thanks, Dave Olson

Andrew C Taubman

Dec 4, 2000, 5:55:11 PM
Looks like a memory or utilization problem. Please download confg9.exe from
the File Finder on support.novell.com. Extract it into the SYSTEM directory
and, from the console, run Load Config /s. Post the resulting config.txt file
here, deleting your serial number and any public IP addresses first. Thanks.
--
Andrew C Taubman
Novell Support Connection Volunteer SysOp
http://support.novell.com/forums
(Sorry, support is not provided via e-mail)
Opinions expressed in the above text are not
necessarily those of Novell Inc.


Dave Olson

Dec 5, 2000, 3:00:00 AM
Here's the config.txt file.
DO_InfoConfig.txt

Andrew C Taubman

Dec 5, 2000, 5:23:26 PM
Ahh, you have the famous 1997 aic7870 driver that causes all sorts of
problems. Update to the latest one from Adaptec; there is at least a 1999
one. Once it's updated, edit the startup.ncf line that loads it to read:

LOAD AIC7870.DSK SLOT=10002 TAG_DISABLE=FFFF READ_AFTER_WRITE=0

The disk is on IRQ 15, so make sure your secondary IDE controller (if any)
is disabled in the BIOS. Move the LAN card from IRQ 9 (as this is often used
by video cards) to something unused like 5.

Dave Olson

Dec 6, 2000, 3:00:00 AM
OK. I have a '99 version of both the AIC7870.DSK driver, plus a '99 version of
the AHA2940.HAM driver. I talked to Dell about these, and they confirmed my
thought that Novell is pushing the HAM drivers now. So do you agree I should
go to the AHA2940 and forget the DSK driver? If so, Dell says I should replace
the PEDGE4xx driver - that runs the hardware RAID card - with a HAM equivalent
also.

Comments?

Thanks, do

Andrew C Taubman

Dec 6, 2000, 10:48:42 PM
I have no strong opinion either way. Certainly the HAM/CDM model is the only
supported one these days, and NW5 uses nothing else. However, if the DSK
driver works for you, that's fine.

Dave Olson

Dec 10, 2000, 12:26:35 PM
Yesterday morning I found the same server in a "searching for cache memory"
state again. Total Cache Buffers was down to about 350. LRU Sitting Time was
about 10-15 seconds. I have other stats, but I left them at work. Can give you
more on Monday.

When I tried to down it, it abended and hung. Got it back up with a power
cycle, then downed it the right way and copied the '99 AHA2940 files, plus an
updated PEDGE.HAM file from Dell, and loaded them instead of the DSK drivers. I
hadn't had a chance to load them during the week.

1) I didn't add the TAG_DISABLE=FFFF READ_AFTER_WRITE=0 parameter you had
specified for the DSK driver because I didn't know if it was also for the HAM
driver. Should I use it? What does it do and why are you recommending it?

2) Do you think the extremely low cache buffers is related to that '97 version
of the DSK driver? Or should we also be looking elsewhere for that problem?

Thanks, do

Andrew C Taubman

Dec 10, 2000, 7:46:35 PM
1/ No, they are only for the aic7870.dsk driver. They help the write speed
and also the reliability of the card/driver.
2/ It is possible, but it's likely to be elsewhere.

Dave Olson

Dec 18, 2000, 2:45:09 PM
Well, here's more info. Hope you are still monitoring this thread.

12/9/00 - early Saturday morning - got more cache utilization errors, like
previous Saturday, but server did not reboot. Total Cache Buffers was down to
about 250!!! I got an abend when trying to down the server, but got it back
up OK. Replaced the DSK driver with '99 version of HAM driver.

12/15/00 - In trying to further troubleshoot the High Utilization problem, I
found 26,000+ 3KB files in the GroupWise PO folder on the server that is
having the problems, in the OFFILES\FDF folder, one of the folders that holds
attachments. (I had the same type of problem at another PO a couple of
months ago. I opened an incident with Novell at that time, and ended up
deleting the bogus attachment files.) These 26,000 files have a date of 11/27,
starting about 3:00 am, going to about 3:30 am, so they "got born" about the
same time that my High Utilization problem started, and just before the first
reboot of the server early Saturday morning on 12/2.
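
A burst of files like this can be confirmed by grouping a folder's contents by modification date and hour. The sketch below is one way to do that from a workstation that has the volume mapped; it is not a GroupWise utility, the function name is made up, and the folder to scan is whatever path is passed on the command line.

```python
import os
import sys
from collections import Counter
from datetime import datetime


def count_files_by_hour(folder):
    """Count regular files in `folder`, grouped by modification date and hour."""
    counts = Counter()
    with os.scandir(folder) as it:
        for entry in it:
            if entry.is_file():
                mtime = datetime.fromtimestamp(entry.stat().st_mtime)
                counts[mtime.strftime("%Y-%m-%d %H:00")] += 1
    return counts


if __name__ == "__main__":
    # Folder to scan; defaults to the current directory if none is given.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for hour, n in sorted(count_files_by_hour(folder).items()):
        print(hour, n)
```

An hour bucket holding thousands of files, next to buckets holding a handful, points at the time the bogus attachments were created.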

12/16/00 - Saturday - again got cache memory errors, starting about 5:00 am.
Same problems trying to DOWN server as last Saturday. Got it up and running.
Then realized that THERE IS something running on that server that early in the
morning. I have a GWCheck / Contents start at 4:00 am on all servers, and on
this one it takes about an hour. Checked the log file and found the cache
memory errors started just about the same time that the GWCheck ended.

Then I looked at the log file from 12/9 and found the same thing - the errors
began just as the GWCheck was ending. On 12/2, the day the server rebooted at
about that time, there is no GWCheck log file. So I guess the server rebooted
during the GWCheck, so the file didn't get written.

So the question, I guess, is this: Could the GWCheck process - because of all
the extra, "bogus" attachment files - have caused the server to reboot on
12/2, and then caused the cache memory errors the next two weeks??? The
timing seems to say Yes and No: Yes because the memory errors began right at
the end of the GWCheck process; No because the process was either totally or
almost finished - not right in the middle - when the memory errors began.

Any ideas???

Thanks, do
