Dual-processor Dell PowerEdge running SCO OpenServer 5.0.6
Patches - RS506A
Until recently this server had been running fine; then the air con
unit leaked onto it. This caused a problem with one of the PSUs.
After switching the server off and leaving the cover off overnight,
we were able to power it up again.
It lasted about 2 hours before the users started getting errors in the
software, and then 3 disks started to beep. On reboot the server
came up with 1 logical drive failed.
There are 6 disks in this server
1 x mirrored pair
1 x RAID 5
3 of the disks in the RAID 5 were flashing. The server would boot to the
SCO login prompt but wouldn't mount the data array.
We were able to go into the SCSI utilities (CTRL M), select these 3
disks and force them on-line. The beeping and flashing went away and
we were now able to mount the data array. Users were able to log back
on - all seemed fine - no further errors reported.
The above happened on a Friday and then on the Monday morning the
server panicked with the following:
PANIC:HTFS block 119110532 epi already allocated on HTFS dev hd (1/104)
cannot dump 524127 pages th dump hd(1/41) space for only 131072 pages .
Dump not complete .
When the server was rebooted it reported no logical drives on
the host adapter and 0 logical drives handled by the BIOS. A hardware
engineer went to site and tried to replace the PERC card, but with the
new card the server wouldn't see the O/S disks. He then put the old card
back in and managed to get it working.
The server has been up and running for a couple of weeks with no
problems, apart from the following error message, which every now and
again gets written to syslog and messages:
CPU2: NOTICE: Sdsk: Unrecoverable error writing SCSI disk 1 dev 1/104
(ha=0 bus=0 id=0 lun=1) block=11066764
It is not always the same block number listed.
Why does this error only occur for CPU2?
How do I rectify this problem?
Any help would be greatly appreciated
Having seen moisture get into computers before sometimes you just
need to start over with a fresh motherboard, and perhaps
Since it ran fine before and now runs only a day or so before you
have the problem, I suspect there is some hardware failure.
Moisture can cause shorts - and perhaps something was stressed
enough to make it unreliable, so that it will run a few hours/days before
something makes it fail and give errors.
Water will NOT hurt computers/electronics IF they are turned off and
then left to dry thoroughly. Of course that pre-supposes 'clean'
A very long time ago when I worked in broadcast the station I was
at bought an FM transmitter that had been put on line in the late
1940s when FM was 'the coming thing'. It turned out FM
practically died except in major cities.
The transmitter sat in a chicken coop for over 10 years.
It was covered with a lot of chicken excrement.
The solution was to take a hose and thoroughly wash everything, and
then let it sit and completely dry out for about a week.
But moisture on a running piece of equipment usually means you can
not trust that equipment any more - at least in critical
Bill Vermillion - bv @ wjv . com
How old is the server? How important is your data to you?
That machine is obviously unreliable at this point. Is it still under
warranty? Does Dell cover things like moisture incursion?
If so I would be putting pressure on them to replace EVERYTHING. If it's
no longer under warranty or covered - think real hard about getting a
That message indicates unrecoverable hard disk errors when data is read
or written - you are losing data.
I hope you are getting verified backups via Backup Edge or Lone-tar. I
would not trust backups via tar or cpio.
Pat Welch, UBB Computer Services, a WCS Affiliate
SCO Authorized Partner
(209) 745-1401 Cell: (209) 251-9120
Lesson learned I hope. When a machine, ANY MACHINE, has water integrity
issues, just shut it down, pull every stinking board that is plugged in, dry
them by wiping, blowing, and in some instances using alcohol to make sure
that any and all visible water has been dried. Then use the old hair drier
to blow the port slots dry. Once this is done, let it sit for at least 24
hours with a fan blowing on it. Now you can re-insert the boards, boot it
up, make a good backup, shut it down, and replace the whole thing with a new
The warning is this. WATER conducts electricity and infiltrates in ways we
can not predict. Therefore, even by drying, we can not be sure that the
water has not caused unseen damage. When dealing with data integrity,
customer files, security related information, we can not and should not ever
assume things will be OK. We must KNOW they will. Thus, replacement of
equipment is the only reasonable answer.
I could tell numerous horror stories of water damaged machines (I've been
involved with many), and none of them ends in story-book fashion.
I hope you have good backups. Good luck on the recovery.
JP - piperent
The server is probably just over a year old - not sure what the
warranty is on it - will check that out.
Why would only CPU2 report the errors?
Do you know of any diagnostics that can be run to determine whether
CPU2 is having problems?
I believe that this customer does have insurance for this type of thing
and so will discuss this issue with him.
Thanks for your reply.
Will see whether it is still under warranty with Dell on Monday.
If not, the customer said that he was insured.
Will get this sorted sooner rather than later as I don't want to go
through the pain of this again.
Use displayintr to show which CPUs are handling which hardware driver
interrupts. Depending on how the driver is written, the interrupt may
be fixed to a specific CPU, shareable among all CPUs, or assignable
at boot time.
See the displayintr(ADM) manual page for more details.
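Before reading too much into "CPU2", it may be worth checking whether every
one of these notices really carries the same CPU prefix and the same device.
The following is only a rough sketch in Python, meant to be run on any
machine that has a copy of the log. The pattern is modelled on the single
message quoted in this thread, so the "reading" variant, the function name
tally_sdsk_errors and the assumption that a wrapped message continues on the
next line with no timestamp in front are all guesses, not documented
behaviour of the Sdsk driver.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this thread. The "reading"
# alternative is an assumption; other Sdsk messages may be worded differently.
# \s+ also matches a newline, so a message wrapped onto two lines still matches.
SDSK_RE = re.compile(
    r"(?P<cpu>CPU\d+):\s+NOTICE:\s+Sdsk:\s+Unrecoverable error\s+"
    r"(?P<op>reading|writing)\s+SCSI disk\s+(?P<disk>\d+)\s+"
    r"dev\s+(?P<dev>\d+/\d+)\s+"
    r"\(ha=(?P<ha>\d+)\s+bus=(?P<bus>\d+)\s+id=(?P<id>\d+)\s+lun=(?P<lun>\d+)\)\s+"
    r"block=(?P<block>\d+)"
)


def tally_sdsk_errors(log_text):
    """Count matching Sdsk notices per CPU and per device; collect block numbers."""
    by_cpu = Counter()
    by_dev = Counter()
    blocks = []
    for match in SDSK_RE.finditer(log_text):
        by_cpu[match.group("cpu")] += 1
        by_dev[match.group("dev")] += 1
        blocks.append(int(match.group("block")))
    return by_cpu, by_dev, sorted(blocks)
```

To use it, read the whole log file into a string (for example with
open(path, errors="replace").read(), where path is wherever your copy of the
messages file lives) and pass that string to tally_sdsk_errors. If every
notice shows the same CPU and the same dev but the blocks are scattered,
that fits the interrupt for that controller being handled by one CPU rather
than a fault in the CPU itself.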
> Lesson learned I hope. When a machine, ANY MACHINE, has water integrity
> issues, just shut it down, pull every stinking board that is plugged in, dry
> them by wiping, blowing, and in some instances using alcohol to make sure
> that any and all visible water has been dried. Then use the old hair drier
> to blow the port slots dry. Once this is done, let it sit for at least 24
> hours with a fan blowing on it. Now you can re-insert the boards, boot it
> up, make a good backup, shut it down, and replace the whole thing with a new
> The warning is this. WATER conducts electricity and infiltrates in ways we
> can not predict. Therefore, even by drying, we can not be sure that the
> water has not caused unseen damage. When dealing with data integrity,
> customer files, security related information, we can not and should not ever
> assume things will be OK. We must KNOW they will. Thus, replacement of
> equipment is the only reasonable answer.
> I could tell numerous horror stories of water damaged machines (I've been
> involved with many), and none of them ends in story-book fashion.
Just last week I was helping someone by phone in adding a DVD drive to
a system. We had gotten to the point where I had verified everything I
needed to know before the physical install, so he said he'd call me
back once it was installed.
So he did.
Because the machine was dirty when he opened it up, he decided to blow
it out.
Conveniently, this machine was located in an auto repair shop, which of
course always has a nice supply of quite high pressure, oil and grease
laden air. The high pressure is great for driving the oil and dirt
deep into otherwise inaccessible places. Oil and grease are pretty
good conductors, which is always helpful.
Oh my. Error upon error, panic upon panic. That's what was going on
when he called me back. I wasn't particularly surprised, expressed my
condolences, and suggested new hardware.
To my surprise, he called back several hours later, having somehow
managed to get the machine working. I wouldn't ever trust the thing
again, but we finished up the project and I left it at that.
(Unix/Linux/Mac OSX resources etc.)
I guess the only thing worse would be blowing it out with a sand
blaster... oh, wait, no I think they make something that blows tiny
metal balls.. there, that ought to clean 'er out!
Oil and grease generally do not conduct electricity, and the air compressors
in mechanics' shops generally do not have oil and grease in the air.
However water does conduct and all air compressors have water in their air.
The act of compressing air squeezes water out of it and so water collects
everywhere that air gets compressed. This includes all along the air lines
not just in the tank.
There are "dryers" on the air lines to try to remove it, but they are often
not very effective. A dryer is just a little tank where the air being
compressed into the line is supposed to lose its water before
reaching the line. But really, every part of the system that has compressed
air in it, also has air _being compressed_ in it, and wherever air gets
compressed, water condenses out of it. So water collects all along the lines
and blows out with the air.
That also explains why it started working again after a while.
And as the air expands it will cool and can cause condensation