OpenServer 6.0.0 hangs randomly. Please help!


Alberto Rodriguez

Aug 21, 2007, 3:39:13 PM

Hi all,

I have a customer with a big problem in a very critical server. SCO
OpenServer 6.0.0 hangs randomly without any error message...

The scenario:

Server:
-----------------------------------------------------

- Hewlett Packard ProLiant ML370-G4
- Two Intel Xeon @ 3.6 GHz, 3 GB RAM
- Dual channel on-board LSI U320-SCSI controller
- HP SmartArray 6404, four channel U320-SCSI RAID adapter
- Two disks HP, 146 GB, RAID-1, installed in the server's hot-swap
cabinet, connected to 6404's channel #1 (Logical Unit 1)
- External cabinet Hewlett Packard StorageWorks Modular Smart Array 30
(MSA30), dual bus, 14 bays. First bus (bays 1 to 7) connected to
6404's channel #3. Second bus (bays 8 to 14) connected to 6404's
channel #4
- 6404's channel #2, not used
- Four HP disks, 76 GB, U320-SCSI, RAID 0+1, located in bays 1, 2, 8,
9 of MSA30 (Logical Unit 2)
- Five HP disks, 300 GB, U320-SCSI, RAID 5, located in bays 3, 4, 5,
6, 7 of MSA30 (Logical Unit 3)
- Three HP disks, 300 GB, U320-SCSI, RAID 5, located in bays 10, 11,
12 of MSA30 (Logical Unit 4)
- Bays 13 and 14 are empty
- HP StorageWorks Ultrium 460 SCSI Tape connected to LSI HBA
- Redundant Power Supplies in the ML370 and MSA30
- ML370 and MSA30 plugged into an on-line Merlin Gerin UPS with two
power cords each
- All firmware and BIOS (motherboard, HBAs, disks) updated to the
latest version (HP FW Maintenance CD 7.90, 03/aug/2007)


Operating System and Patches (custom & pkginfo)
----------------------------------------------------

- SCO OpenServer 6.0.0, Enterprise Edition
- Maintenance Pack 2 (MP2)
- OSS703A
- OSS706C
- Network Drivers (nd 8.0.6e)
- mtools 3.9.10Sa
- Graphical User Interface (Qt)
- Mozilla 1.7.13Ba
- KDE 3.5.2
- KDE i18n Language Support 3.5.2 (set to Spanish)
- Java 2 Standard Edition Runtime Environment 1.4.2.06
- Java 2 Standard Edition Runtime Environment 1.5.0.06
- mpt driver (LSI) 8.0.2 (from OSR6 CD)
- ciss driver (SmartArray 6404) 8.0.2 (from OSR6 CD)
- Generic IDE/ATAPI Driver
- HP ProLiant Support Pack 7.770a for SCO OpenServer 6 (latest at HP
website), including:
--- HP Proliant Extended Feature Supplement
--- HP Proliant EFS Documentation Package 7.770a
- Samba 3.0.20

Logical Units
--------------------------------------------

Logical Unit 1 (RAID 1) contains root, swap and /u filesystems. /u -->
TransTOOLs MultiBase 3.0 Database Engine
Logical Unit 2 (RAID 0+1) contains one filesystem --> Progress 8.3B
Database Engine
Logical Unit 3 (RAID 5) contains three filesystems --> Files
Logical Unit 4 (RAID 5) contains one filesystem --> Files


About 50 PCs (Windows 2000/XP) running a Client-Server application
with Progress Runtime
A few PCs (Windows 2000/XP) running a Client-Server application with
MultiBase Runtime
About 20 HP Network Printers driven with "netcat"
scologin is disabled and only runs from the shell with "startx".
Graphical environment not used in daily work.

And now the weird problem...


System running flawlessly, and suddenly hangs. No Notices, Warnings or
Panics in /usr/adm/messages and /usr/adm/syslog...

No keyboard at console, no telnet connections... NO NOTHING! It just
hangs!

The only option is to reset or power-cycle the server, enter
maintenance mode, run a full fsck on all filesystems (about 45 min.) and
bring it back up... until the next crash.

The system was installed in February 2007. The problems have been
happening since the beginning.

Hewlett Packard replaced motherboard and power supplies, with no
success.

There is no pattern to the hangs. They may occur once in 15 days or
twice in the same day.

The same software and applications ran without problems for years on
an old HP NetServer LH 6000 with OSR 5.0.5.

The only strange messages in /usr/adm/messages and /usr/adm/syslog are
related to Samba. There are A LOT of error messages, but this happens
in all OSR6 boxes I've installed and I think this must be treated in
another thread.


What can I do? I'm desperate, not to mention my customer...

My apologies for the long post.

Any help will be greatly appreciated!!!

Many thanks in advance and best regards.


Greetings from Spain.
--
Alberto Rodriguez Rodriguez
alberto----@----unilogic.es

Boyd Lynn Gerber

Aug 21, 2007, 4:44:40 PM
to Alberto Rodriguez
On Tue, 21 Aug 2007, Alberto Rodriguez wrote:
> Hi all,
>
> I have a customer with a big problem in a very critical server. SCO
> OpenServer 6.0.0 hangs randomly without any error message...
...

I have seen this happen with bad memory. That is why I have customers
use only parity memory modules, with parity checking enabled. I use a
Linux utility to check memory over 96 hours, and I often find that random
memory problems cause what you are seeing.

--
Boyd Gerber <ger...@zenez.com>
ZENEZ 1042 East Fort Union #135, Midvale Utah 84047

Bill Vermillion

Aug 21, 2007, 5:11:56 PM
In article <1187725153.0...@r29g2000hsg.googlegroups.com>,

Alberto Rodriguez <alb...@unilogic.es> wrote:
>
>Hi all,
>
>I have a customer with a big problem in a very critical server. SCO
>OpenServer 6.0.0 hangs randomly without any error message...

I'll second what Boyd says.

The HW vendor did NOT follow my specs and had a machine that would
not recognize nor use ECC memory.

We tried several things including putting in an AC recording line
monitor. We looked around the area to see if anyone had a
high-power transmitter that could be causing the problem.

And - since I really thought it had ECC we never suspected that.

Got the HW vendor to replace the MB with one that supported
ECC and put in ECC and problem solved.

I've also seen places where the computer is on a circuit that
has some devices that make the line unstable. Such things
as a new refrigerator in the lunch room next to the computer room,
setting up PCs while on the other side of the wall was an incoming
power panel, microwave ovens, and anything else that may be
on the power line.

And the strangest one I had was where we put in an excellent BEST
power supply for a critical office [they thought they were critical
and it was the secretaries of the President that were using this].

Things were good for a while and then unexpected reboots - and there
were power-sags in that area that the BEST should have taken care
of. I monitored its output on install and we could see less than
.1 V difference in output with over a 20V swing on input.

It turns out the secretaries re-arranged their office.

The computer was plugged into the wall socket. Their cheap
transistor radio was the only thing plugged into the $$$ BEST.

As Boyd says - it really sounds like memory.

If that doesn't fix it, go through the litany I posted above [
which is far from complete]

Bill
--
Bill Vermillion - bv @ wjv . com

Bill Campbell

Aug 21, 2007, 7:12:04 PM
to sco-...@lists.celestial.com
On Tue, Aug 21, 2007, Bill Vermillion wrote:
>In article <1187725153.0...@r29g2000hsg.googlegroups.com>,
>Alberto Rodriguez <alb...@unilogic.es> wrote:
>>
>>Hi all,
>>
>>I have a customer with a big problem in a very critical server. SCO
>>OpenServer 6.0.0 hangs randomly without any error message...
>
>I'll second what Boyd says.
>
>The HW vendor did NOT follow my specs and had a machine that would
>not recognize nor use ECC memory.

Bela and others went into long rants when Intel and others were pushing the
idea that we don't need parity checking RAM.

>We tried several things including putting in an AC recording line
>monitor. We looked around the area to see if anyone had a
>high-power transmitter that could be causing the problem.
>
>And - since I really thought it had ECC we never suspected that.
>
>Got the HW vendor to replace the MB with one that supported
>ECC and put in ECC and problem solved.
>
>I've also seen places where the computer is on a cirtuit that
>has some devices that make the line unstable. Such things
>as a new refrigerator in the lunch room next to the computer room,
>setting up PCs while on the other side of the wall was an incoming
>power panel, microwave ovens, and anything else that may be
>on the power line.

Welding and machine shops are fun too.

>And the strangest one I had was where we put in an excellent BEST
>power supply for a critical office [they thought they were critical
>and it was the secretaries of the President that were using this].
>
>Things were good for awhile and then unexpected reboots - and there
>were power-sags in that area that the BEST should have taken care
>of. I monitored it's output on install and we could see less than
>.1 V difference in outpu with over a 20V swing on input.
>
>It turns out the secretaries re-arranged their office.
>
>The computer was plugged into the wall socket. Their cheap
>transistor radio was the only thing plugged into the $$$ BEST.

Did the secretaries also have floppies stored on the side of filing
cabinets with refrigerator magnets?

One of the strangest problems I came across was a Radio Shack Model II that
was intermittently failing. We finally traced the problem to a bad ballast
in the lighting fixture above the computer.

In 1981 or 1982 I got a bunch of calls about strange problems, which turned
out to be the result of major sun spot activity.

Bill
--
INTERNET: bi...@celestial.com Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676

Lord, the money we do spend on Government and it's not one bit better
than the government we got for one third the money twenty years ago.
Will Rogers

Bill Vermillion

Aug 22, 2007, 12:52:03 AM
In article <mailman.5.1187738...@lists.celestial.com>,

Bill Campbell <bi...@celestial.com> wrote:
>On Tue, Aug 21, 2007, Bill Vermillion wrote:
>>In article <1187725153.0...@r29g2000hsg.googlegroups.com>,
>>Alberto Rodriguez <alb...@unilogic.es> wrote:

>>>Hi all,

>>>I have a customer with a big problem in a very critical server. SCO
>>>OpenServer 6.0.0 hangs randomly without any error message...

>>I'll second what Boyd says.

>>The HW vendor did NOT follow my specs and had a machine that would
>>not recognize nor use ECC memory.

>Bela and others went into long rants when Intel and others were
>pushing the idea that we don't need parity checking RAM.

Yup. And when I was running several SGI machines they really
had error correcting RAM, and every now and then I'd see a message
in the log file about a correction being made. That was much nicer
than just parity checking to tell you that there was a problem.

>>We tried several things including putting in an AC recording line
>>monitor. We looked around the area to see if anyone had a
>>high-power transmitter that could be causing the problem.

>>And - since I really thought it had ECC we never suspected that.

>>Got the HW vendor to replace the MB with one that supported
>>ECC and put in ECC and problem solved.

>>I've also seen places where the computer is on a cirtuit that
>>has some devices that make the line unstable. Such things
>>as a new refrigerator in the lunch room next to the computer room,
>>setting up PCs while on the other side of the wall was an incoming
>>power panel, microwave ovens, and anything else that may be
>>on the power line.

>Welding and machine shops are fun too.

The only problem I had was with a machine shop - which had HUGE
Poreba lathes from Poland, as nothing that big was being made in the
US anymore. One lathe would take a piece 23 inches in diameter
and FORTY FEET LONG - and cut threads on it - used for feedscrews
in the plastic injection industry. He also had a gun-barrel
drilling machine to put a hole through these 40 foot pieces, so
that device had to have 80 feet of space to be able to get
the drilling tool back far enough to get to the end.

He's gotten out of that and is now running web-sites geared toward
that industry and I have about 10 of those on a dedicated server
he has in our Level 3 rack.

But back to the story. I had just gotten a new UPS installed so
the owner decided to test it by pulling the plug from the wall.

Instantly everything crashed.

The building was fairly large and the Wyse terminals in the far
end were on a different circuit - and the serial board had
pin 1 - frame ground - connected. [Better quality multi-port
serial boards had this pin disconnected.]

Since the terminals were in another part of the building they were
on a different leg of the circuit and there was 110V coming down
the pin 1 wire from the terminal back to the computer.

Luckily nothing was damaged, but the next step was to make SURE
that no frame grounds were EVER connected to the serial cables.

>>And the strangest one I had was where we put in an excellent BEST
>>power supply for a critical office [they thought they were critical
>>and it was the secretaries of the President that were using this].

>>Things were good for awhile and then unexpected reboots - and there
>>were power-sags in that area that the BEST should have taken care
>>of. I monitored it's output on install and we could see less than
>>.1 V difference in outpu with over a 20V swing on input.

>>It turns out the secretaries re-arranged their office.

>>The computer was plugged into the wall socket. Their cheap
>>transistor radio was the only thing plugged into the $$$ BEST.

>Did the secretaries also have floppies stored on the side of filing
>cabinets with refrigerator magnets?

Not really. These two ladies had been there since the college
started - about 25 years before. It took one lady a LONG time
to stop using 'l' instead of '1' on the keyboard - as she grew up
in the typewriter days when there was no '1' on the keyboard.

But they were running Wyse 160 terminals - and I had two ports
on those set up as the apps wanted entirely different terminal
setups, and that was the best and easiest way to run things.
They also liked being able to hot-key between applications.
This was on SCO's Xenix and I had the Specialix RIO installed
with a huge loop running through about 6 offices. I really liked
that system as it was pretty much self healing. Lose a machine in
the middle and it would re-route. Sort of like a mini sonet ring.

>One of the strangest problems I came across was a Radio Shack
>Model II that was intermittently failing. We finally traced
>the problem to a bad ballast in the lighting fixture above the
>computer.

Argh. And I've had a problem - in the place above - where whoever
ran the serial cables from room to room ran them over and
alongside the fluorescent lights.

>In 1981 or 1982 I got a bunch of calls about strange problems,
>which turned out to be the result of major sun spot activity.

Ah yes. That was back when chips were quite susceptible to outside
interference and the best chips were the ones in ceramic not
plastic.

I'm really glad we don't have problems like that anymore - at least
not very often.

jbo...@sco.com

Aug 22, 2007, 3:55:34 AM
On 21 Aug, 21:39, Alberto Rodriguez <albe...@unilogic.es> wrote:
> Hi all,
>
> I have a customer with a big problem in a very critical server. SCO
> OpenServer 6.0.0 hangs randomly without any error message...
[ Detail removed]

> What can I do? I'm desperate, not to mention my customer...

Alberto,

As a start check out:

http://wdb1.sco.com/kb/showta?taid=116163

If the server is critical and the issue has been happening since last
February, I would recommend that you also escalate the issue to SCO
Support via your support provider.

John


mbennett

Aug 22, 2007, 9:39:10 AM
Alberto,

I suggest you try the procedure here:
http://osr600doc.sco.com/en/SM_trouble/CTOC-using_crash.html

It's been many years since I tried this myself, and it's going to
lengthen the time to reboot after the next crash. But you will likely
be able to isolate the issue.

Or you can just replace the memory.

Mark

scoace

Aug 22, 2007, 2:20:26 PM
> alberto-...@----unilogic.es

Check the file /stand/boot. Make sure a line "KHZ=100" is there, or
add it and reboot.
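
A minimal sketch of that check, written in Python for illustration only and meant to be run wherever Python is available, against a copy of the file if need be. It assumes the boot file is plain text with one NAME=value setting per line and that lines starting with "#" are comments; both are assumptions, the helper name khz_setting is hypothetical, and the sample contents are invented rather than copied from a real /stand/boot.

```python
def khz_setting(boot_text):
    """Return the value of the last KHZ= line in boot_text, or None if absent."""
    value = None
    for line in boot_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (the comment syntax is an assumption).
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == "KHZ":
            value = rest.strip()
    return value


# Invented sample contents, for illustration only.
sample = "TIMEOUT=5\nKHZ=1000\n"
print(khz_setting(sample))         # prints 1000
print(khz_setting("TIMEOUT=5\n"))  # prints None
```

Against the real file this would be called as khz_setting(open("/stand/boot").read()). A result of None means the line is missing and, per the advice above, should be added before rebooting.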


Mike

Alberto Rodriguez

Aug 22, 2007, 3:52:53 PM
Hi all,

THANK YOU to all of you guys for your interest and fast answers. You
are GREAT!!!

John wrote:
>
> As a start check out:
>
> http://wdb1.sco.com/kb/showta?taid=116163

Mike wrote:
>
> Check the file /stand/boot. Make sure a line "KHZ=100" is there, or
> add it and reboot.
>
> Mike


Following the link supplied by John, there is another link
(http://www.sco.com/ta/126735) relating to UnixWare 7.1.4, which
explains the behaviour of the KHZ parameter (also valid for
OSR6).

This parameter is changed from KHZ=100 to KHZ=1000 in OpenServer
6.0.0 when patch OSS706B or OSS706C is loaded (not OSS706A).

It is known that this change "... may be related to some system hangs
or reboots reported to SCO, as described above. These issues are under
investigation."

The latest ptf9052g for UW 7.1.4 resets this parameter to its original
value (KHZ=100).

So I agree with Mike. This may be the origin of the trouble. I've
added KHZ=100 to /etc/default/boot, and the server will be rebooted
at midnight tonight. I must wait a few days for results. I'll post
them.

To all that speak about memory problems, I forgot to say that memory
in this server is ECC memory from HP, 6 DIMMs, 512 MB each, DDR2.
Memory, motherboard, RAID controllers, etc. have been tested with test
programs supplied by HP without errors (not tested 96 hours as Boyd
proposed...)

Thank you all again. I'll keep you informed.

Best regards.
--
Alberto Rodriguez Rodriguez


scoace

Aug 23, 2007, 2:48:14 PM

Hi Alberto,

The ML370 G4 has a very robust advanced ECC memory subsystem.
I suppose hardware problems are possible, but they would be recorded
in the HW log. If you have the Management Agents loaded you can
access the log through a web browser on port 2301. I think it is the
KHZ tunable; I have seen the same issue on other servers.

Here is a 10 page PDF on the ProLiant 300 series memory features:

http://h20000.www2.hp.com/bizsupport/TechSupport/CoreRedirect.jsp?redirectReason=DocIndexPDF&prodSeriesId=397646&targetPage=http%3A%2F%2Fh20000.www2.hp.com%2Fbc%2Fdocs%2Fsupport%2FSupportManual%2Fc00218059%2Fc00218059.pdf

Mike

Brian K. White

Aug 24, 2007, 2:37:29 AM

----- Original Message -----
From: "Bill Campbell" <bi...@celestial.com>
Newsgroups: comp.unix.sco.misc
To: <sco-...@lists.celestial.com>
Sent: Tuesday, August 21, 2007 7:12 PM
Subject: Re: OpenServer 6.0.0 hangs randomly. Please help!


> On Tue, Aug 21, 2007, Bill Vermillion wrote:
>>In article <1187725153.0...@r29g2000hsg.googlegroups.com>,
>>Alberto Rodriguez <alb...@unilogic.es> wrote:
>>>
>>>Hi all,
>>>
>>>I have a customer with a big problem in a very critical server. SCO
>>>OpenServer 6.0.0 hangs randomly without any error message...
>>
>>I'll second what Boyd says.
>>
>>The HW vendor did NOT follow my specs and had a machine that would
>>not recognize nor use ECC memory.
>
> Bela and others went into long rants when Intel and others were pushing
> the
> idea that we don't need parity checking RAM.

Actually, why weren't they right?
Yes, you need more robust memory solutions, but why couldn't you implement
the parity checking and/or error correcting in the chipset using extra
sticks of any old RAM instead of building it into each stick of RAM?
RAM-RAID, as it were. Which I think they have, actually. That would be quite a
big chunk of overhead off our backs and better for everyone if a whole type
of RAM and its design and manufacture chain went completely away and those
resources just went into making more regular RAM.

Brian K. White br...@aljex.com http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!

Bela Lubkin

Aug 24, 2007, 1:58:58 PM
Brian K. White wrote:

> From: "Bill Campbell" <bi...@celestial.com>
>
> > On Tue, Aug 21, 2007, Bill Vermillion wrote:
> >>
> >>The HW vendor did NOT follow my specs and had a machine that would
> >>not recognize nor use ECC memory.
> >
> > Bela and others went into long rants when Intel and others were pushing
> > the idea that we don't need parity checking RAM.
>
> Actually why weren't they right?
> Yes you need more robust memory solutions, but why couldn't you impliment
> the parity checking and-or error correcting in the chipset using extra
> sticks of any-old ram instead of building it into each stick of ram?
> Ram-raid as it were. Which I think they have actually. That would be quite a
> big chunk of overhead off our backs and better for everyone if a whole type
> of ram and it's design and manufacture chain went completely away and those
> resources just went into making more regular ram.

Such a scheme would have met my requirements -- it's still a parity or
ECC scheme, even if the memory sticks you install into the machine don't
have extra parity bits.

But... your scheme basically wouldn't work. It would limit the
amount of RAM in the machine to the amount that could be covered by
the parity/ECC RAM bits in the chipset (so you would have to buy a
fancier chipset model for a larger machine). Worse, you would need
a different speed of chipset internal parity/ECC RAM for each speed
of external memory stick (look how much trouble ensues when you use
mismatched sticks in a single machine, even if they're the _same_
speed and possibly even the same SKU, but different batches from the
manufacturer...) Worst of all, the timing for it just wouldn't work
very well and you'd end up with a flaky design.

And all that just to save 1 bit in 9. It's not "quite a big chunk of
overhead". If parity/ECC RAM is more expensive than regular RAM, it
certainly isn't because of the 1/9 extra hardware, it's due to low
demand because the chipset and system designers don't use it.
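
To make the 1-in-9 figure concrete, here is a minimal Python sketch of even parity over a single byte. The function names are hypothetical and real memory controllers do this in hardware; the sketch only shows the arithmetic of one extra bit per eight data bits, and it detects a single flipped bit without correcting it.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the count of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def check(byte, stored_parity):
    """True if the byte still agrees with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010              # four bits set, so the parity bit is 0
p = parity_bit(data)
corrupted = data ^ 0b00001000  # flip a single bit

print(check(data, p))          # True
print(check(corrupted, p))     # False: the single-bit error is detected
print(round(1 / 9 * 100, 1))   # 11.1 (percent of stored bits spent on parity)
```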

(I was going to say something here about how a certain large
hardware-spec-setting OS vendor should push parity or ECC for all
systems (down to desktops & laptops) so they could show us how much of
the instability really isn't their fault at all, it's the memory -- then
it occurred to me that perhaps it wouldn't make much difference and
would only further highlight the true source of instability...)

>Bela<

David C. Moody

Aug 29, 2007, 2:58:53 PM
This is very similar to my issue.

I have a ML350 G5, put into service around the first of June.

It randomly locks up, I still have access to the terminal, but network
shuts down. Nothing can come in nothing can go out.

I currently have a new motherboard sitting on my floor. I plan to swap
it in tomorrow and see what that does.

Hate to say this, but I'm glad I'm not the only one with issues.

Alberto, please keep us/me informed as I will also update on my
situation when I know something

-David

Alberto Rodriguez

Aug 29, 2007, 4:21:23 PM

David,

The server is now seven days up and running without problems, from the
day that I set the parameter KHZ=100 in /stand/boot.

But I think it's too soon to celebrate. I should wait for at
least three or four weeks without hangs before concluding that things
are OK.

Anyway, I'll keep you informed weekly.

Best regards
--
Alberto Rodriguez Rodriguez

Alberto Rodriguez

Sep 9, 2007, 3:23:52 AM
Eighteen days without hangs!!!

--
Alberto Rodriguez Rodriguez

David C. Moody

Sep 10, 2007, 9:22:55 PM
Just an update from me..

I tried the KHZ=100 parameter and my problem remains the same. Every
3-5 days my network connection shuts down and I cannot access anything
via the NIC card.

I've already replaced the motherboard (built-in NIC), now HP is
sending me through another array of tests.

This is just ridiculous.

Any other help would be greatly appreciated.

-David

Bill Vermillion

Sep 11, 2007, 8:02:27 AM
In article <1189473775.7...@w3g2000hsg.googlegroups.com>,

How about some more information. I had a client with
a Sonic-Wall [I think that was it - I get some clients confused]
who would set the system on the weekend so the owner could log in
from home, and then on Monday they could not access the network.

So they would reboot the machine. After doing this several times
they called me, and the fix for them was 1) don't do this [which
was not a solution] or 2) just restart the network.

I just wrote a 2 or 3 line script so they did not have to remember
the CLI other than just logging in and typing one command.

What do the network stats show when you can't connect, e.g. IP
number and any other messages? Check them again after you
restart and see what happens.
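
One way to compare the two sets of output is to save each to a text file and list only the lines that differ. The Python sketch below assumes exactly that; the function name changed_lines is hypothetical and the sample snapshots are invented placeholders, not real netstat output.

```python
import difflib


def changed_lines(before, after):
    """Lines removed ("- ") or added ("+ ") between two text snapshots."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Invented snapshots standing in for saved command output.
working = "interface up\naddress 192.168.1.10\n"
failed = "interface up\naddress 0.0.0.0\n"

for line in changed_lines(working, failed):
    print(line)
```

Lines prefixed with "- " appear only in the first snapshot and lines prefixed with "+ " only in the second, so anything that changed between the working and failed states stands out.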

What is your network connection type? Direct link such
as shared or dedicated T1, ?DSL, cable modem, whatever?

I really expect some external changes are causing this and not
anything internal to SCO.

David C. Moody

Sep 13, 2007, 1:31:56 PM
On Sep 11, 8:02 am, b...@wjv.com (Bill Vermillion) wrote:
> In article <1189473775.705261.268...@w3g2000hsg.googlegroups.com>,

Hi Bill,

I have a Sonicwall Pro3060 on my network and have had it for years. I do
allow VPN access, but the SCO box loses network connectivity
randomly. I will have to try a netstat command next time to see what
it is doing. Any other commands you would like to see?

This has nothing to do with VPN access, VPN access is provided 24/7
and is used very frequently by several users. So I'm at a loss, this
is just becoming annoying.

I have a ML350 G3, that is running SCO6 with NO problems at all. It's
just my G5, and they are configured exactly the same, software, etc.

Now HP wants me to uninstall all the EFS drivers and reinstall them
thinking that I got a screwed up version. Guess I will try that this
weekend.

I'm looking for any help anyone can give me. Can anyone tell me what
command to issue to see what driver the network interface is using?
Or where to go look for it? That's another thing HP wanted me to look
at, but the commands they were giving me were all linux commands and
not avail on SCO.

Thanks,
-David

ThreeStar

Sep 14, 2007, 1:58:38 PM

HP's off-source help desk often doesn't even know their own products,
much less OS's that run on them. SCO recommends using their NIC
drivers rather than HP's; see TA 116163, which despite its title also
applies to OS6. So I don't think re-installing EFS is going to help,
and it may hurt, given how finicky OpenServer is about un-installs.

The OS6 netconfig utility is pretty good at auto-detecting NICs and
applying the right driver. It's only important that you have
installed the latest SCO ND's.

I had reports of an OS 6 box developing periodic network problems.
Looking into it I found that the problems began when the Network Admin
decided to segment the local network using M$ ISA's. The problems
went away when he put all boxes back in the same segment. So, yeah,
I'm in the camp that suspects something external to the box.

--RLR

Bill Vermillion

Sep 14, 2007, 2:52:21 PM
In article <1189704716.2...@d55g2000hsg.googlegroups.com>,

David C. Moody <infr...@gmail.com> wrote:

Run your netstat commands and run the arp commands. Do it when all
works and then when it fails. It could be that something is
usurping the IP of the SCO machine.

>This has nothing to do with VPN access, VPN access is provided 24/7
>and is used very frequently by several users. So I'm at a loss, this
>is just becoming annoying.

On the VPN using the Sonic people could still log in, but the SCO
was disconnected. Restarting the tcp daemon fixed it. No need to
reboot. I just built a small script with an easy name so that
when things went away they could just login and type <program-name>

>I have a ML350 G3, that is running SCO6 with NO problems at all. It's
>just my G5, and they are configured exactly the same, software, etc.

Something is different.

>Now HP wants me to uninstall all the EFS drivers and reinstall them
>thinking that I got a screwed up version. Guess I will try that this
>weekend.

That makes no sense. Software/settings should not change
dynamically unless there is some outside influence.


>I'm looking for any help anyone can give me. Can anyone tell me what
>command to issue to see what driver the network interface is using?
>Or where to go look for it? That's another thing HP wanted me to look
>at, but the commands they were giving me were all linux commands and
>not avail on SCO.

If it works part of the time then I would not suspect the driver.

I really suspect something is grabbing the SCO's IP, or that
there is something that may be disconnecting the SCO from the
network.

Again - look at the network status before and after and also
look at the ARP commands. And perform the arp on the SCO
machine >>AND<< other machines on the network and compare the
output. Note the MAC addresses to make sure they are the same.
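
The comparison can be sketched in Python once the arp output from two machines has been saved as text. This assumes each line looks like "name (ip) at mac ...", which is an assumption about the output format and may need adjusting for a particular system; the function names and the sample data are hypothetical.

```python
import re

# Assumed line shape: "name (192.168.1.20) at 0:b:cd:1:2:3 ..."
ARP_LINE = re.compile(
    r"\((\d+\.\d+\.\d+\.\d+)\)\s+at\s+([0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})"
)


def normalize_mac(mac):
    """Lower-case a MAC address and zero-pad each octet."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def parse_arp(text):
    """Map each IP address to the MAC address reported for it."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group(1)] = normalize_mac(match.group(2))
    return table


def mac_conflicts(view_a, view_b):
    """IPs listed in both tables but with different MAC addresses."""
    shared = view_a.keys() & view_b.keys()
    return {ip: (view_a[ip], view_b[ip]) for ip in shared if view_a[ip] != view_b[ip]}


# Invented sample output as seen from two machines on the same network.
view_a = "pc1 (192.168.1.20) at 0:b:cd:1:2:3\nsrv (192.168.1.10) at 0:b:cd:9:9:9\n"
view_b = "pc1 (192.168.1.20) at 00:0B:CD:01:02:03\nsrv (192.168.1.10) at 00:0b:cd:aa:bb:cc\n"
print(mac_conflicts(parse_arp(view_a), parse_arp(view_b)))
```

An IP address that shows up in the result is being answered for by two different MAC addresses, which is what you would expect to see if something else is grabbing the server's IP.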

Also check what the SCO machine is connected to, as if
that goes away, loses power, or becomes inaccessible in any way, that
could also cause the problems.

I can't give you the exact commands as I don't have access to an
SCO machine at the moment.

And just what commands were being given to you by HP?
Post them and we can give you real Unix equivalents :-)

Alberto Rodriguez

Sep 16, 2007, 4:29:29 AM

Twenty five days without hangs!!!

(I'm becoming a happy man...)

--
Alberto Rodriguez Rodriguez

Alberto Rodriguez

Sep 24, 2007, 2:18:42 AM
Thirty two days up and running!

No more hangs until now! :-)

--
Alberto Rodriguez Rodriguez


David C. Moody

Oct 2, 2007, 4:22:07 PM

I am very happy as well. I found an extra download on HP's site, the HP
Networking Pack. I downloaded the pack, looked in my NIC driver's
folders and found a new driver version.

The driver version included with EFS was 2.8.7; the driver version in
the pack was 3.4.0.

The new driver has been installed and I have an uptime of 12 days
now. I haven't been online for 12 solid days since the first month of
the machine being online.

I'll continue to update after I have 1 month uptime.

-David

Joe Chasan

Oct 18, 2007, 9:44:11 AM
I have a customer with SCO OpenServer 6/mp2 also with random hangs.
It is an HP Proliant ML-350G5 w/3GB & SAS Raid - i've setup similar
a few times, yet this one in particular randomly hangs as well.

the only thing different between this install and others i've done with
identical hardware/software is their reliance on terminals (Digiboard C/X,
maybe 30 to 40 terminals & serial printers) and mscreen. because of this i
contacted digi tech support, who insist it's a sco problem.

users log out & never get back to login, who shows them still logged in,
eventually all are hung. could take anywhere from 1 day to 3 weeks after
a reboot before this happens.

no error messages to console, nothing in syslog/messages, no indication
of running out of any resources, etc. built-in hp diagnostics run & don't
find any errors.

I tried the KHZ=100 thing and it did not help in my case at all.

ideas?

-joe

--- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ---
-Joe Chasan- Magnatech Business Systems, Inc.
j...@magnatechonline.com Hicksville, NY - USA
http://www.MagnatechOnline.com Tel.(516) 931-4444/Fax.(516) 931-1264

Alberto Rodriguez

Oct 20, 2007, 4:36:51 AM
Hi all,

Two months after setting the parameter KHZ=100 in /etc/default/boot, the
system has not crashed at all. It's up and running without problems!!!

I want to give a big "Thank You" to all of you who shared your
knowledge and ideas. Your help has been very valuable.

Joe Chasan wrote:

> I tried the KHZ=100 thing and it did not help in my case at all.
> ideas?

Joe,

I had the same problem (random hangs) with two other servers with
very different hardware. Both of them had one hang after changing to
KHZ=100, but I ran a full fsck of the filesystems and from that moment
on, they have worked flawlessly. No hangs.

As single user:

# /etc/fsck -y -ofull /dev/root

# /etc/fsck -y -ofull /dev/u ...or whichever other filesystems

Try to assign a dedicated (not shared) IRQ to the Digiboard card in
the server's BIOS.

Hope this helps. Best regards.
--
Alberto Rodriguez Rodriguez

