We upgraded a couple of days ago from 3.1 to 4.0. The process itself
was trivial (a matter of hours), but the machine now appears to be twice
as slow, even though it should be somewhat faster, since 4.0 should make
use of both CPUs (they are configured in and started at boot time).
Actually, my feeling is that it is YP and/or NFS accesses that tremendously
slow the system (based on the fact that "ls -l" seems MUCH slower --
retrieval of login/group names?). Since we did not change anything yet
from 3.1 (in the YP setup), I am puzzled.
Any pointers that might help me?
Thanks in advance,
Michael Fingerhut
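One way to check the suspicion above (that "ls -l" is slow because of
login/group name retrieval) is to time the name lookups on their own. The
following is only a sketch in present-day Python, not something from the
original posts; the helper names owner_ids and time_lookups are made up for
illustration. It collects the numeric owner and group ids in a directory and
measures how long the system's name service takes to resolve them.

```python
import grp
import os
import pwd
import time


def owner_ids(directory):
    """Collect the distinct numeric uids and gids of entries in a directory."""
    uids, gids = set(), set()
    for name in os.listdir(directory):
        try:
            st = os.lstat(os.path.join(directory, name))
        except OSError:
            continue  # entry vanished between listdir() and lstat()
        uids.add(st.st_uid)
        gids.add(st.st_gid)
    return uids, gids


def time_lookups(uids, gids):
    """Resolve ids to names; return (elapsed seconds, number of unknown ids)."""
    misses = 0
    start = time.perf_counter()
    for uid in uids:
        try:
            pwd.getpwuid(uid)  # goes through the configured name service
        except KeyError:
            misses += 1
    for gid in gids:
        try:
            grp.getgrgid(gid)
        except KeyError:
            misses += 1
    return time.perf_counter() - start, misses


if __name__ == "__main__":
    uids, gids = owner_ids(".")
    elapsed, misses = time_lookups(uids, gids)
    print(f"{len(uids)} uids, {len(gids)} gids, "
          f"{misses} unknown, {elapsed:.3f}s")
```

If the lookups account for most of the time "ls -l" takes in the same
directory, the name service is a more likely culprit than the file system.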
There have been several articles in the computer press here about problems
with the 58{234}0 under ULTRIX 4.0. E.g., multiprocessor support ain't
what it is supposed to be. I would push DEC for a refund on the extra cpu
(maybe the whole machine...)
greg pavlov, fstrf, amherst, ny
pav...@stewart.fstrf.org
1. What are the problems as known to DEC today (so that we're less
pissed off when we encounter them). That would be *some* help.
I am rather upset that the local support tells me, when I call them
with this problem "oh yeah we knew this from the start, why don't
you just turn all but one CPUs off until further notice...". If
they knew it from the start, don't send us 4.0 or warn us.
2. What they do or intend to do in order to solve them (other than
suggesting we buy another machine and/or go to another vendor, as
has already been suggested here).
Michael Fingerhut
"DEC" is a very big company and many of us that are
listening don't have access to the wide variety of
hardware needed to test customer problems. Many of
those that respond do so because we happen to know
the answer. Please don't confuse us with the people
in the development group whose job it is to test and
fix these sorts of problems. Occasionally someone in
Engineering does respond, but usually they're working
on the bug fixes and new features of the next version.
If you have a problem, the appropriate way to report it is
to submit a Software Performance Report and/or go through
the Customer Support Center nearest you.
>
> 1. What are the problems as known to DEC today (so that we're less
> pissed off when we encounter them). That would be *some* help.
> I am rather upset that the local support tells me, when I call them
> with this problem "oh yeah we knew this from the start, why don't
> you just turn all but one CPUs off until further notice...". If
> they knew it from the start, don't send us 4.0 or warn us.
>
Most problems that we know about go into the release notes,
but sometimes the problems aren't found until after the
release notes have been printed. It would be nice if there
were a nice easy way to report verified problems back to
you.
> 2. What they do or intend to do in order to solve them (other than
> suggesting we buy another machine and/or go to another vendor, as
> has already been suggested here).
Hopefully fix the problem once we know what's wrong. Of
course until the people that own the problem know about
it they can't do anything. If they don't happen to be
reading this newsgroup then it might be a while before
they find out about it. I won't report a problem to them
until >>>I<<< can verify it. Since I don't have a 58xx
to test with there isn't much I can do.
One thing that would help is a better description of "slow".
What is the program doing? Lots of system calls, disk I/O,
network I/O, lots of memory use, paging? I suggest looking
at cpustat(1), iostat(1), netstat(1) and vmstat(1). One of
these days I'll see if I can put a source archive of monitor
for V4 on gatekeeper.dec.com.
>
> Michael Fingerhut
--
Alan Rollow al...@nabeth.enet.dec.com
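The request above for a better description of "slow" can be made concrete
by splitting a command's elapsed time into user CPU, system CPU, and time
spent waiting (on disk, network, or the scheduler). This is a rough sketch
in present-day Python for a Unix-like system, not a replacement for the
tools named above; the function name profile_command is hypothetical.

```python
import os
import subprocess
import time


def profile_command(argv):
    """Run argv once and split its elapsed time into user, sys and waiting."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - start
    after = os.times()
    # CPU time charged to child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    # Whatever is left was spent blocked rather than computing.
    waiting = max(0.0, wall - user - system)
    return {"wall": wall, "user": user, "sys": system, "waiting": waiting}


if __name__ == "__main__":
    result = profile_command(["ls", "-l"])
    for key in ("wall", "user", "sys", "waiting"):
        print(f"{key:8s}{result[key]:.3f}s")
```

A large "waiting" share points at I/O, the network, or scheduling delays; a
large "sys" share points at system call overhead.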
Well, this was not the case. So *either* the software was not tried on
such configurations (hard to believe, but this would not be the first
time, eh? Remember the GT62?) or *else* customers were not informed
(which I believe is the case, since the support center was well aware
of the problem when I called them).
As to reporting the problem: we can do it only by phone, are given a
call number and most of the time hear that it will get in the next
release, hopefully.
But to this particular problem, I was also told it was a much more
serious problem, namely design flaws in the 58n0. That was DEC's
response. So you bet I'm worried.
>1. What are the problems as known to DEC today (so that we're less
> pissed off when we encounter them).
Here in Aberystwyth we are running 2x DEC 5830's with Rev 179 Ultrix 4.0.
We have observed that the Symmetric Multi-Processing behaves badly under
a low machine load and are therefore permanently running 2 low-priority
cpu-burning jobs which sleep for bursts of 15 seconds if the load
average goes >4.0.
When these jobs are running, the performance is greatly improved; we
believe this is because the presence of the two jobs (1 per spare CPU)
solves some sort of ordering problem in the scheduler. DEC definitely
DO know about this, it has appeared on a list of SPR's sent to us. No
solution is yet forthcoming, but we live in hope...
It's not the perfect solution, because the two jobs tend to eat away at
the cpu, and if some user puts a heavily i/o bound job up as well, the
machine starts to groan. Then we just kill them fast and put them back
later...
So, DEC have given us the ultimate reciprocal machine... the more load
you put on it, the faster it goes... 8)
alec (and robert :-) )
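The workaround described above (low-priority CPU-burning jobs that sleep
for 15 seconds when the load average goes above 4.0) could look roughly
like the sketch below. It is written in present-day Python and is not the
posters' actual job. The 4.0 threshold and 15-second sleep come from the
post; the one-second burn interval, the nice value of 19, and all function
names are assumptions made for illustration.

```python
import os
import time

LOAD_LIMIT = 4.0     # from the post: back off when load average exceeds this
SLEEP_SECONDS = 15   # from the post: sleep in bursts of 15 seconds
BURN_SECONDS = 1.0   # assumption: how long to spin between load checks


def should_back_off(load_average, limit=LOAD_LIMIT):
    """True when the machine is busy enough that the burner should yield."""
    return load_average > limit


def burn(seconds):
    """Spin the CPU for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def run_forever():
    """Keep one CPU busy at the lowest priority, yielding under real load."""
    os.nice(19)  # assumption: lowest priority, so real work always wins
    while True:
        one_minute_load = os.getloadavg()[0]
        if should_back_off(one_minute_load):
            time.sleep(SLEEP_SECONDS)
        else:
            burn(BURN_SECONDS)


# To start a burner, call run_forever(); the post runs one per spare CPU.
```

Note that each burner itself adds about 1 to the load average, so the 4.0
threshold is only reached once real work is queued on top of the burners.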
We suffer from the same restrictions, except for the fact that normally
we don't even hear such pleasant news as "fixed in next release".
In addition, "official" DEC channels refuse to comment on problems
mentioned in places like this: they ask you to submit a bug report if
you have a problem.
It is ridiculous that there is (apparently) no data base of known
problems and, where possible, available patches. I know that there are
various patches available, some of which seem to be considered "mandatory".
However, access seems only to be for those people having wasted their
time identifying a known problem on their own systems.
--
_ _ o | __ | j...@cernvax.uucp
| | | | _ / \ _ __ _ __ _| j...@cernvax.bitnet
| | | | |_) /_) | __/_) | (___\ | (_/ | J. M. Gerard, Div. DD, CERN,
| | |_|_| \_/\___ \__/ \___| (_|_| \_|_ 1211 Geneva 23, Switzerland
I got a call from DEC today, the person said that he will forward this to
the local office. However, he said there are two things to look for:
Is the configuration on an HSC, and if so, how many disks/requestors are
configured? The second was a suggestion: change bufcache in your config
to 25% instead of 10%. If you make the change, let me know how it goes.
Gary J. Rosenblum
UNIX Systems Manager rose...@nyu.edu
New York University ga...@nyu.edu
Gary
... so please don't say it did not occur. DEC acknowledges the problem
occurs, and that one of the problems is a design flaw in the scheduler which
causes lousy response time for small interactive jobs or commands (such
as ls) but which makes the 5820 a great machine for batch. Too bad. Should
have gotten an IBM.
> So, please provide us with as much information as possible to
> help us solve the problem.
Response to all the information I gave was "next release".
> The official reporting channel for these things is an SPR.
No, at least not this side of the ocean.
> (*) My personal opinion is that we should allow submitting SPRs
> via e-mail
Yeah, mine too, with automatic acknowledgment and the possibility to consult
an online database of bug reports.
> but I'm only a system manager
What about some responses from people at DEC who KNOW what's happening?
Michael Fingerhut