R6000 systems, anyone?

bruce allen ediger

Jun 28, 1991, 11:52:56 AM
I'm posting this for a friend who doesn't have usenet access, so email
replies to me, and I'll get them to him. Thanks.


We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
like yo-yos, have poor network performance (especially NFS -- gag!),
disk controller and ethernet board failures, CPU board failures, and so on
and on and on. CDC claims they have a total of 48 systems installed and these
are the only FOUR that have problems. We actually received shipment of SIX
systems, but TWO were irreparably DOA and had to be shipped back. A THIRD
was not DOA but had to be swapped out with a new box after about two months
of failed efforts to make it work. SO, since our experience is that SEVEN
consecutive systems shipped to us from MIPS, and installed by CDC with their
variant of the MIPS OS, were utter dogs, it seems hard to take CDC at their
word that the other 40-some-odd systems are happy-go-lucky.

Charlie Price

Jun 28, 1991, 5:47:56 PM

1) When one posts an article saying something so critical of a product,
it seems only fair that one not do so anonymously.
I realize that the poster did not write the comment, nor the writer
create the article -- but I believe the original
writer's name and organization should appear with the comments.

2) What "replies" does this article solicit?
There are no question marks in the body of the article
so I don't see any explicit questions.
Does the original writer want information
or does he/she just want to complain?

I'll take a guess about what is wanted and make some general comments.

We have a number of R6000-based machines in engineering
(running MIPS OS and not a CDC-modified version).
Though I don't keep track of reliability explicitly I have some
anecdotal experience with real machines and through talking to
various hardware and software folks working on the boxes.

The RC6280 and RC6260 (MIPS product numbers)
are our most complicated products
(fastest, most memory, most controllers, most disks, most nets, ...)
and are not our most reliable products.
However, they can't generally be described as "yo-yos".

However:
1) The machines are complicated and are somewhat sensitive to the
detailed nature of the work load.
Over time, the computer center here and people in the field have
turned up hardware problems by using machines in a new way.
Some very-heavy loads don't provoke any problems and other,
seemingly less stressful loads, can trigger them.
If an application mix happens to discover a new hardware bug
then the machines probably trip over it a lot.

I don't find it hard to believe that one installation has a lot
of problems that other installations just never see.

The machines *are* getting globally better over time
as is the OS software.

2) That means it is important to have the latest Engineering Changes
to hardware and revisions of software.
Presumably the CDC field people know that.

3) No matter what the machine, you can always find an individual
machine that just has problems. Having 4 of them does seem
like rather a lot, so I would guess that you have a new bug.
(Lucky you :-) )

4) (Point-of-comment disclaimer: Remember; I speak for myself alone).
I wouldn't encourage you to accept wretched reliability from
MIPS or any other vendor.
It seems to me there are 3 main choices, and the one you choose
depends on the skills, time you have, relationship with the
vendor, alternatives available, and other stuff.

1) Complain, put up with it, and hope somebody (else) finds
the problem. This doesn't take any special effort,
but also doesn't necessarily make anything run better.

2) Rip the machines out and send them back.
You don't have the reliability problem any more,
but you don't have a working computer system either.

3) Actively "help" the vendor discover the problem by
*making* them pay attention to you and then giving them
all the assistance that you can.
The nature of load-dependent or environment-dependent
reliability problems means that the users and administrators
are often an important part of identifying the problem area.
This can be (very) painful and not every installation
has the time/skills/resources to do it.
If you can do it, you stand a much better chance of getting
your particular problems fixed.
--
Charlie Price cpr...@mips.mips.com (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. MS 1-03 / Sunnyvale, CA 94088-3650

Gustavo E. Scuseria

Jun 30, 1991, 6:44:28 PM
In article <52...@spim.mips.COM> cpr...@mips.com (Charlie Price) writes:
>In article <1991Jun28.1...@mnemosyne.cs.du.edu> bed...@isis.cs.du.edu writes:
>>
>> [stuff deleted]

>>
>>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>>like yo-yos, have poor network performance (especially NFS -- gag!),
>>disk controller and ethernet board failures, CPU board failures, and so on
>>and on and on.
>> [more stuff deleted]

My RC6280 had pretty much the same problems ...
I was going to buy it (trading in one of my m-2000s)
but gave up after 6 months and innumerable board exchanges.

Charlie Price's recommendations in such a case are:


>
>
> 1) Complain, put up with it, and hope somebody (else) finds
> the problem. This doesn't take any special effort,
> but also doesn't necessarily make anything run better.
>
> 2) Rip the machines out and send them back.
> You don't have the reliability problem any more,
> but you don't have a working computer system either.
>
> 3) Actively "help" the vendor discover the problem by
> *making* them pay attention to you and then giving them
> all the assistance that you can.
> The nature of load-dependent or environment-dependent
> reliability problems means that the users and administrators
> are often an important part of identifying the problem area.
> This can be (very) painful and not every installation
> has the time/skills/resources to do it.
> If you can do it, you stand a much better chance of getting
> your particular problems fixed.

which sounds very good ...

In my case, MIPS demanded a maintenance contract on the machine
to continue an unsuccessful effort to keep it up longer than
24 hours! It would crash for any reason, anytime.
Of course, I sent the 6280 back and bought an IBM 6000/530.
With the money left, I'm also buying an IBM 550 or an HP 730.
Have not made up my mind yet ... Either of them easily beats the
6280 in floating point speed, not to mention that they cost only
a fraction of the MIPS's box price. IMHO, that's the way you get
your problems fixed.

--
Gustavo E. Scuseria | gus...@katzo.rice.edu
Department of Chemistry |
Rice University | office: (713) 527-4082
Houston, Texas 77251-1892 | fax : (713) 285-5155

Klaus Steinberger

Jun 30, 1991, 1:20:47 PM
bed...@isis.cs.du.edu (bruce allen ediger) writes:

>I'm posting this for a friend who doesn't have usenet access, so email
>replies to me, and I'll get them to him. Thanks.


>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>like yo-yos, have poor network performance (especially NFS -- gag!),
>disk controller and ethernet board failures, CPU board failures, and so on

[something deleted]


>word that the other 40-some-odd systems are happy-go-lucky.

We have one CDC CD4680, and we are very happy with the system, after
some initial problems (we triggered a bug in the floating point chip).

The system is really fast, and runs reliably. We don't experience
poor network performance; are you sure your network is OK?
(be sure you don't run EP/IX 1.2.1 which has some trouble with YP)

We experienced some failures with the CPU Board, but they were all
related to heavy board swapping, during the search for the floating
point chip bug.

CDC's support is very good, hardware as well as software.

Sincerely,
Klaus Steinberger

--
Klaus Steinberger Beschleunigerlabor der TU und LMU Muenchen
Phone: (+49 89)3209 4287 Hochschulgelaende
FAX: (+49 89)3209 4280 D-8046 Garching, Germany
BITNET: K2@DGABLG5P Internet: k...@bl.physik.tu-muenchen.de

Masataka Ohta

Jul 2, 1991, 4:28:31 AM
In article <52...@spim.mips.COM> cpr...@mips.com (Charlie Price) writes:

>>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>>like yo-yos, have poor network performance (especially NFS -- gag!),

We have four RC6280s running RISC/os 4.52. From our experience, the MTBF of an
RC6280 is about 1 month, though it has been getting better recently.

Network performance is not bad, though NIS is still somewhat buggy in its
response speed. FDDI has just begun to work (I hope).

>4) (Point-of-comment disclaimer: Remember; I speak for myself alone).

Yes, but you don't know much about RC6280s.

> 3) Actively "help" the vendor discover the problem by
> *making* them pay attention to you and then giving them
> all the assistance that you can.

That is very difficult because the RC6280 always crashes with a double
panic, which means no post-mortem dump is available to us. Thus, it is
very difficult for us to analyze the cause of the crash.

> This can be (very) painful and not every installation
> has the time/skills/resources to do it.

The problem is that the currently shipped RISC/os won't cooperate with us.

Masataka Ohta

Lori Corrin

Jul 2, 1991, 2:47:01 PM


Well, there are definitely other systems out there with problems. We have a
CDC 4680 (aka MIPS 6280) that has had its share of trouble too. We've
had the box for about 5 months and have had to replace a number of
boards in the system due to hardware failures. The first such board
swap was shortly after we received the box. The system started
crashing after the installation of an expresslink board. The system
is currently stable but we're definitely keeping our fingers crossed.

In all of this the local CDC support people have been great (thanks
George and Dave), and have spent some long hours in here tracking the
problems. It doesn't excuse CDC but since our box isn't in a critical
role yet it does make it easier to live with.


Lori Corrin Email: <pur...@uwo.ca>
Computing and Communication Services Voice: (519) 661-2151
University of Western Ontario Fax: (519) 661-3486
London Ontario Canada

Wolfgang Hecht

Jul 3, 1991, 5:44:05 AM
pur...@ria.ccs.uwo.ca (Lori Corrin) writes:

>>We have FOUR CDC/4680 (MIPS ECL R6000) systems and all four go up and down
>>like yo-yos, have poor network performance (especially NFS -- gag!),

...
We have had a CDC/4680 since October 1990, running fast and well and with
very good network performance (FTP and NFS). I suspect you have trouble
on your network, which can lead to all sorts of bad effects.

Wolfgang Hecht
