
Octane Stability Issues

matt

Mar 18, 2002, 6:32:47 PM
Hi, I'm having some issues with my Octane. It's an R10K@250MHz/512MB/
SSE machine, and it doesn't seem to be able to stay up more than 2 weeks.
When it crashes, it typically goes without provocation (sometimes I'll happen
to be using it, sometimes it goes at night), and the whole machine freezes (can't
ping it externally). The display always freezes the same way: the current
image on the screen holds indefinitely regardless of peripheral input,
but the colors look like film negatives of what they are supposed to be.

Sometimes this happens every few days, sometimes the machine will stay up
2 weeks, maybe a little more, but it always eventually crashes. It's done
this since I bought it last August.

The only way to "fix" it, meaning get it running and ready to crash again, is
to pull the plug, peripherals, and yank the motherboard. Sometimes it only
takes a couple motherboard reseats to get the thing going, sometimes I will
fight with it for over an hour, reseating ram, trying to make sure there isn't
any dust anywhere, applying pressure to chips on the mainboard, etc...
So far I've been able to bring it back to life every time it's died, though
it almost got chucked out the window a few times.

I'm *really* tired of this, to the point where I'm thinking of just selling
off the bloody thing. It's running IRIX 6.5.13f, all peripherals are
authentic granite SGI originals, except for 256MB of RAM which I believe is
3rd party (SEC or something like that).

Anyone have similar experiences? Anyone solve the problem? I've heard
plenty of horror stories about Octanes failing, does anyone have an Octane
that has a better track record for uptime? (Is it for sale?!)

Thanks for reading,

-m

David W. Stadden

Mar 18, 2002, 7:40:59 PM

There could be a problem with the motherboard or the graphics board.
Is there anything logged in the syslog file, /var/adm/SYSLOG? If there
is, it may give you a hint whether it's the graphics board or the motherboard.
If there's nothing in there that looks like it pertains to the problem, then,
if you have access to another graphics board, you should try installing
it to see if the crash still happens. This will narrow down
which board has the problem.
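
As a rough illustration of the log check suggested above, here is a minimal Python sketch that pulls suspicious-looking lines out of the SYSLOG text. The keyword list and the function name `suspicious_lines` are assumptions made for this example, not an official list of IRIX message tags, so adjust them to whatever the log actually contains.

```python
import re

# Assumed keywords for illustration only; not an official list of IRIX tags.
SUSPECT = re.compile(r"\b(panic|error|warning|alert|crit)", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the lines of log_text that contain one of the SUSPECT keywords."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def scan_file(path="/var/adm/SYSLOG"):
    """Read a log file (default path taken from the post) and filter it."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle.read())
```

Calling `scan_file()` on the machine itself, or on a copy of the log moved elsewhere, gives a short list to compare against the times of the freezes.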

RE:


>Anyone have similar experiences? Anyone solve the problem? I've heard
>plenty of horror stories about Octanes failing, does anyone have an Octane
>that has a better track record for uptime? (Is it for sale?!)

I've supported a lot of Octanes that have had no problems. I have had a
couple of lemons though.

Hope this helps.

Dave

Colin Anderson

Mar 19, 2002, 2:14:48 AM
Just my $0.02...

There are plenty of Octane horror stories (more so than Indigo2
anyway), but for the most part I have found the Octane to be a *very*
sturdy machine. In working with many Octanes in a variety of
configurations over the past four years (R10K/195 MXI, R12K/300
SE+Tex, R12K/400 V12) I have run across no stability problems.

Your problems don't sound like any others I have heard of. I am
curious, though, if you have found heat to be an issue (usually not
too common with e series mgras gfx). Are you able to "fix" the problem
by simply cutting the power and waiting an hour or two for the machine
to cool down? Or does it always require board reseating and other
physical futzing?

Unless someone speaks up with a quick and easy fix (or more info as to
the problem), you might want to find a low spec Octane ($500 R10K/195
SI) to use for parts swapping until the problem is resolved.

- Colin



Per Ekman

Mar 19, 2002, 4:25:37 AM
ma...@dqc.org (matt) writes:

> Anyone have similar experiences? Anyone solve the problem? I've heard
> plenty of horror stories about Octanes failing, does anyone have an Octane
> that has a better track record for uptime? (Is it for sale?!)

Dual R10K@250Mhz, 1GB memory, MXI graphics. Current uptime 118 days
(upgraded the OS). Has crashed maybe 3-4 times in 4 years despite
intensive use. It is standing in a real computer room though, I guess
that can make a difference.

*p

Valdis Liseks

Mar 19, 2002, 6:39:08 AM
Hi Matt,

Anything unusual in the SYSLOG file - errors, warnings? I've seen
something like this on an Octane with a dead motherboard - not quite dead -
it also worked for some 1 - 2 days (or even 1 - 2 hours) and then the whole
system froze. Check sgi.hardware - this problem was discussed before.

Best,

Valdis Liseks

www.vilks.com

Hans van der Voort

Mar 19, 2002, 8:23:39 AM
Hi Matt,

> Hi, I'm having some issues with my Octane. It's an R10K@250MHz/512MB/
> SSE machine, and it doesn't seem to be able to stay up more than 2 weeks.
> When it crashes, it typically goes without provocation (sometimes I'll happen
> to be using it, sometimes it goes at night), and the whole machine freezes (can't
> ping it externally). The display always freezes the same way: the current
> image on the screen holds indefinitely regardless of peripheral input,
> but the colors look like film negatives of what they are supposed to be.
>
> Sometimes this happens every few days, sometimes the machine will stay up
> 2 weeks, maybe a little more, but it always eventually crashes. It's done
> this since I bought it last August.
Our Octane has problems like the ones you describe. They started after some
problems with the mains: the machine freezes without warning. If I reset it, the
screen blanks but nothing further happens. At first I thought it was
something with the RAM, so I threw out the most suspicious RAM. That
helped a lot, but the machine still froze now and then. When it does, I
press reset together with the power button. Maybe it's magic, but it works for me.
The funny thing is that the machine seems to be becoming more stable: no freeze
for a month now.
BTW, one of the freezes cost me some files; they were all zeroed. It seems XFS
does this when a file is beyond recovery.

-- Hans

Ralf Beyer

Mar 19, 2002, 9:28:57 AM
matt wrote:

>
> Hi, I'm having some issues with my Octane. It's an R10K@250MHz/512MB/
> SSE machine, and it doesn't seem to be able to stay up more than 2 weeks.
> When it crashes, it typically goes without provocation (sometimes I'll happen
> to be using it, sometimes it goes at night), and the whole machine freezes (can't
> ping it externally). The display always freezes the same way: the current
> image on the screen holds indefinitely regardless of peripheral input,
> but the colors look like film negatives of what they are supposed to be.
>
> Sometimes this happens every few days, sometimes the machine will stay up
> 2 weeks, maybe a little more, but it always eventually crashes. It's done
> this since I bought it last August.
>
> The only way to "fix" it, meaning get it running and ready to crash again, is
> to pull the plug, peripherals, and yank the motherboard. Sometimes it only
> takes a couple motherboard reseats to get the thing going, sometimes I will
> fight with it for over an hour, reseating ram, trying to make sure there isn't
> any dust anywhere, applying pressure to chips on the mainboard, etc...
> So far I've been able to bring it back to life every time it's died, though
> it almost got chucked out the window a few times.
>
> I'm *really* tired of this, to the point where I'm thinking of just selling
> off the bloody thing. It's running IRIX 6.5.13f, all peripherals are
> authentic granite SGI originals, except for 256MB of RAM which I believe is
> 3rd party (SEC or something like that).
>
> Anyone have similar experiences? Anyone solve the problem? I've heard
> plenty of horror stories about Octanes failing, does anyone have an Octane
> that has a better track record for uptime? (Is it for sale?!)
>
> Thanks for reading,
>
> -m

Reports and recommendations from earlier postings:

- It may be a bad IP30 board version -003 which must be replaced by an
IP30 board with a higher version number (-005) or by a new design of
that board (part 030-1467-001 or later).

Report the version number of the board you have (for instance, IP30
board, barcode HHL594, part 030-0887-004 rev C).

The version number is shown on the IP30 board but can also be
read under IRIX by 'hinv -mv'.

Look for a line similar to:

Location: /hw/node/xtalk/15
IP30 Board: barcode HHL594 part 030-0887-003 rev C

If it is IP30 board 030-0887-003 or -004, remove the power cord,
ground yourself, remove the system module, put it on a flat surface
with the components facing you, and press the module with the black
heat sink in the center of the board firmly against the surface.
Reassemble and restart the machine.

This succeeded in 10 of 10 cases with 10 different Octanes I know
of. However, in some other cases the defect showed up again after
some time.

- I removed the system boards and reseated the memory, cleaned the
dust off the connections and it booted up right away.

- In some cases reseating the system module did the trick. It may
be necessary to repeat the procedure several times.

However, some system modules shut off and could not be brought back
by this method.

- Removed the system module and installed it into another Octane: no
success, only fan and light on. Installed the system module back
into the original Octane: no success, only fan and light on. Then
unplugged the power cable, held down the power button, and while it
was held down plugged the power cable in again: the machine came
up as usual.

- Let the Octane cool down long enough and turn it on again.

- Look for a sticking plastic power switch lever. Remove the front
panel, start the machine with the chassis button, and then replace
the front panel. This helped more than once.

- Keep the air intakes clean, cooling is critical.

- Ingo Fellner <i.fe...@puz.de> once wrote:

Try taking out the mainboard (and possibly the power supply), clean
it and its connectors of dust (with compressed air), put it all
together and try again - we have had many strange errors with Octanes
that could be solved like this.

- Alexis Cousein <a...@brussels.sgi.com> once wrote:

DO *NOT* CLEAN CPOP CONNECTORS WITH COMPRESSED AIR.
At least not if you don't want to ruin them. There are special cans
with just the right kind of pressure (and inert gases) to clean
these, but normal air coming at much higher pressure out of a
compressor is *not* a good idea.

- Simon Pigot <si...@dpiwe.tas.gov.au> once wrote:

I have 0887-003 (large central black heat sink) and 0887-005
(smaller central silver heat sink) IP30s and the problems occur/have
occurred with both, i.e. the power supply fan starts when the machine is
connected to the wall socket but nothing else happens - no LED, no
response to the power button. Don't know about the 0887-004 boards or
the later 1467-00x boards.

The only hints I can get are that it's the HEART and SPIDER chips under
the central heat sink on the IP30 which cause most of the problems
(see Ralf Beyer's earlier messages about pressing on the heat sink -
a friend just emailed me with success by doing this but it hasn't
worked on my 005 board so maybe it has something else wrong :-() and
that you'll definitely get this problem if you don't keep the air
intakes clean, so it looks like cooling is critical too (not terribly
surprising). Lastly, as mentioned earlier, it looks like
transporting them doesn't help either as they seem to like being
dead on arrival - I don't think I'd buy one without a warranty!

- Les Sharp <artp...@cyberone.com.au> once wrote:

The IP30 board is 030-0887-004 Rev A. I took out the system module,
placed it on a flat surface a la the instructions, pressed firmly
on the central silver heat sink, reassembled the unit and bingo, the
Octane came straight back up. Amazing!! Initially I thought I would
end up with a pretty green door stop. Thanks a lot for the help.
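
For the board-revision check near the top of this list, a minimal Python sketch of reading the `IP30 Board:` line out of saved `hinv -mv` output. The regular expression is based only on the single sample line quoted above, so the exact spacing and field order on other machines are assumptions; `parse_ip30` and `SUSPECT_PARTS` are names made up for this example.

```python
import re

# Pattern modelled on the one sample line in this post; other hinv output
# may be laid out differently (assumption).
IP30_LINE = re.compile(
    r"IP30 Board:\s+barcode\s+(\S+)\s+part\s+(\d{3}-\d{4}-\d{3})\s+rev\s+(\S+)"
)

# Part numbers this post singles out for the heat-sink procedure.
SUSPECT_PARTS = {"030-0887-003", "030-0887-004"}


def parse_ip30(hinv_text):
    """Return barcode, part and revision of the IP30 board, or None if absent."""
    match = IP30_LINE.search(hinv_text)
    if match is None:
        return None
    barcode, part, rev = match.groups()
    return {
        "barcode": barcode,
        "part": part,
        "rev": rev,
        "suspect": part in SUSPECT_PARTS,
    }


sample = """Location: /hw/node/xtalk/15
IP30 Board: barcode HHL594 part 030-0887-003 rev C"""
print(parse_ip30(sample))
```

The `suspect` flag only reflects the part numbers named in the reports above; a board outside that set can still be faulty.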


What is the status of the LEDs? Ben Drago <bdr...@sgi.com> once wrote:

The LEDs are simply link status lights. There are actually seven
LEDs:

   BaseIO X
       QA X  X PCI Expansion
       QD X  X QB
       QC X  X Heart

The BaseIO and Heart are connected internally and are part of the
IP30, so these will always be lit. The QA-QD refer to the quad
module, which is labeled like this when facing the rear of the
Octane:

QA | QB
-------
QC | QD

QA should always be lit as the first graphics card is installed in
Quadrant A. The other LEDs will be lit depending on what XIO options
are installed.

The PCI Expansion LED will be lit if there is a PCI shoebox
installed.


Please report how you resolved the case.

Regards
Ralf Beyer
--
beyer.bra...@freenet.de

Wolfgang Szoecs

Mar 19, 2002, 1:56:07 PM
In article <3C973BD...@svi.nl>,

Hans van der Voort <ha...@svi.nl> writes:

> BTW, one of the freezes cost me some files, they were all zeroed. Seems XFS
> does this when a file is beyond recovery.

That's not XFS misbehaving.
XFS guarantees only metadata integrity, not file-data integrity.
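
In practice this means an application that needs its file contents to survive a freeze has to flush them itself. A minimal Python sketch of that idea, assuming a POSIX-style system; the helper name `write_durably` is hypothetical, and `os.fsync` only asks the operating system to write its cached data out, so it narrows the window for loss rather than removing every risk.

```python
import os


def write_durably(path, data):
    """Write bytes to path and ask the OS to flush them to stable storage."""
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()             # push Python's own buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to write its cache to disk
```

Data that is still sitting in a buffer when the machine locks up is the kind that can come back zeroed, as described in the quoted post.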

Wolfgang

matt

Mar 20, 2002, 12:12:46 PM
Thanks to everyone who has posted follow-ups and sent me e-mail with
suggestions. I haven't resolved the problem yet, but here's what has happened
so far (let's hope this time I can post this before the Octane crashes
again!), plus a few more details.

- Although not recently, I have tried reinstalling IRIX on a separate
drive. It never helped, and from what I've heard of 6.5.13f (plus
this morning's experiences), I don't think it's a software problem. Neither the
syslog nor the crash report thing reports anything when a crash
happens. I did set the chkconfig verbose mode as per a good suggestion I
received; it didn't seem to affect anything. I have tried replacing the
keyboard/mouse with spares (authentic SGI granite stuff, of course) from one
of the Indigo2s here, still no change. And now for this morning's fun...

I let the Octane sit for a day and a half so as to let it (and me) cool down
sufficiently. It has good ventilation, so I didn't think heat was an issue,
and sure enough the unit still wouldn't start when it was cold. I tried
reseating the XIO stuff (I just have an SSE in the Octane at the moment), no
dice. Then I tried reseating the motherboard, no dice. I took the motherboard
out again and examined the compression connectors; they looked fine, but I gently
blew them off anyhow. Still no dice. At this point I pulled the 2 128MB
DIMMs that comprised the 3rd party memory I was using, and eureka! (or so I
thought), the machine came to life again. 45 minutes later it crashed again,
so I yanked the 2 128MB genuine SGI DIMMs and replaced them with the 3rd party
memory, and once again the machine jumped back to life. Instead of starting
IRIX this time, I ran the diagnostics from the boot menu. They took 10 minutes or
something like that, pretty normal, and the tests indicated everything had passed
and everything was good. I typed "quit" and the monitor went into
APM mode; the machine was frozen/crashed again. I played with the memory a bit,
and it finally started again with the 2 SGI-branded DIMMs replacing the
3rd party memory. I ran the diagnostics again, everything came back normal,
and the machine is booted and running. The logs in /var/adm still show nothing
from the crash that happened this morning after the 45 minutes of uptime.
One odd thing I noticed: when I booted into IRIX after running the diagnostics
the last time, the time was reset to 1969, and I received some warnings
about the clock losing its battery backup or something like that.
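
One likely reading of the 1969 date (an assumption, not something confirmed in this thread): a clock that loses its backup and restarts from zero is counting from the Unix epoch, and zero seconds shown in a US time zone falls on the evening of December 31, 1969. A small Python sketch, with the UTC-5 offset chosen purely for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumed offset for illustration: US Eastern standard time (UTC-5).
EASTERN = timezone(timedelta(hours=-5))

# Zero seconds since the Unix epoch, displayed in that zone.
epoch_local = datetime.fromtimestamp(0, tz=EASTERN)
print(epoch_local.isoformat())
```

So the date by itself points at the time-of-day clock having lost its contents rather than at anything IRIX did.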

Also, the link lights seem to light up in strange configurations after most
reseats, regardless of whether a reseat was successful or not. The two I noticed
that are definitely not the norm:

0
X X
0 X
X X

and

X
0 0
0 X
0 0

Where X = on, 0 = off
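
To read these patterns against the LED layout Ben Drago described earlier in the thread, here is a minimal Python sketch. It assumes the patterns are drawn in the same orientation as that layout (one LED on top, then three rows of two), which this post does not confirm; `LAYOUT`, `EXPECTED_ON` and `decode` are names made up for the example.

```python
# Layout as quoted earlier in the thread; the orientation is an assumption.
LAYOUT = [
    ["BaseIO"],
    ["QA", "PCI Expansion"],
    ["QD", "QB"],
    ["QC", "Heart"],
]

# Links that the quoted description says should always be lit.
EXPECTED_ON = {"BaseIO", "Heart", "QA"}


def decode(rows):
    """Map rows such as ["X", "0 X", ...] to (lit links, expected-but-dark links)."""
    lit = set()
    for names, row in zip(LAYOUT, rows):
        for name, mark in zip(names, row.split()):
            if mark.upper() == "X":  # anything other than X counts as off
                lit.add(name)
    return lit, EXPECTED_ON - lit


for pattern in (["0", "X X", "0 X", "X X"], ["X", "0 0", "0 X", "0 0"]):
    lit, missing = decode(pattern)
    print("lit:", sorted(lit), "expected but dark:", sorted(missing))
```

Under that assumption the first pattern has BaseIO dark and the second has both QA and Heart dark, so neither matches the state the description calls normal.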

Any other suggestions given this new information? BTW, thanks so much, everyone,
for being so helpful so far. I'm open to just about any idea, though I'm
really starting to think about getting rid of this darn thing.

- Matt


matt

Mar 20, 2002, 12:15:10 PM

Hrm, forgot to fix my e-mail address, it's actually ma...@dqc.org - that dyndns
one probably won't work.

-m


Walther Mathieu

Mar 20, 2002, 2:39:03 PM
to matt
matt wrote:

Hi matt,

With all the effects you describe, this surely is a hardware problem.
Since you have not yet been able to locate a single defective component...

Whenever it comes to hardware defects, you know the usual suspects are
- contacts/connectors,
- cables,
- components
(in that order). PCBs tend to belong to the cable category.

But with all that cleaning, reseating, component swapping: no healing?

So I would at least suspect the integrity of the power supply (PS)
on its secondary side; I don't know its voltages.
The machine doesn't start until you change something...
Sometimes PSs on Indigo2s (R10K/Impact) behaved that way;
I've experienced that twice. An Indigo of mine with Impressario RIP
sometimes got a kernel panic when it was time to print:
too much current for the parallel interface.
SGI PSs sense current and switch off, or don't even start, when it's too high.
And as for the freezes... some spikes or glitches on the DC side:
most likely the VLSI components get disturbed by that!

This is not easy to catch without an auto-triggering storage oscilloscope.
Maybe you should give another (trusted) power supply a try
instead of frequently reseating boards and components;
connectors are made to withstand only a certain number of mating cycles!

Good luck - and please post if you solve the problem!

Walther.
