Cache Chip Bug

Anthony DeLorenzo

Jul 6, 2000, 3:00:00 AM

Searching the archives revealed this output in a few dmesg dumps, so I'm
assuming that this is normal, but nevertheless.... My SS2 w/PowerUp
reports this when it boots:

cpu0 at mainbus0: cache chip bug; trap page uncached: W8601/8701 or
MB86903 @ 40Mhz, on-chip FPU

Obviously anything with the word 'bug' in it isn't all that
comforting. Is this normal, and if it is what's the story?

Regards,

Tony

# Anthony DeLorenzo <go...@vex.net>
# Whitehorse, YT
# http://www.vex.net/~gonzo/
# mojo wire: 209-391-8932

Rick Kelly

Jul 7, 2000, 3:00:00 AM

Anthony DeLorenzo said:

>Searching the archives revealed this output in a few dmesg dumps, so I'm
>assuming that this is normal, but nevertheless.... My SS2 w/PowerUp
>reports this when it boots:
>
>cpu0 at mainbus0: cache chip bug; trap page uncached: W8601/8701 or
>MB86903 @ 40Mhz, on-chip FPU

This is from my SS2 with standard processor, 64 megs. etc...

mainbus0 (root): SUNW,Sun 4/75
cpu0 at mainbus0: cache chip bug; trap page uncached: CY7C601 @ 40 MHz, TMS390C602A FPU
cpu0: 64K byte write-through, 32 bytes/line, hw flush: cache enabled

It's normal. The box is as solid as a rock.

This particular box is the "Big Brother" server for my network. It hits
2.8-4.0 load average when doing CGI stuff to update the Big Brother pages.

It's running:

NetBSD seahag 1.4.1 NetBSD 1.4.1 (SEAHAG) #1: Sat Mar 4 00:05:04 MST 2000
rmk@seahag:/usr/src/sys/arch/sparc/compile/SEAHAG sparc

And it just keeps grinding along...
--
Rick Kelly r...@rmkhome.com www.rmkhome.com

James Sharp

Jul 7, 2000, 3:00:00 AM

On Fri, 7 Jul 2000, Rick Kelly wrote:

> Anthony DeLorenzo said:
>
> >Searching the archives revealed this output in a few dmesg dumps, so I'm
> >assuming that this is normal, but nevertheless.... My SS2 w/PowerUp
> >reports this when it boots:
> >
> >cpu0 at mainbus0: cache chip bug; trap page uncached: W8601/8701 or
> >MB86903 @ 40Mhz, on-chip FPU
>

from sys/arch/sparc/sparc/cpu.c

/* Machines with "buserr-type" 1 have a bug in the cache chip that affects
traps. */

The "cache chip bug; trap page uncached" message is the kernel activating
a workaround for this bug.
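To make the decision that comment describes concrete, here is a minimal standalone sketch in C. It is not the NetBSD code: `struct hypo_cpu`, `hypo_has_cache_bug()`, `hypo_fix_trap_page()` and the `HYPO_PTE_NOCACHE` bit are hypothetical stand-ins. The real kernel reads the "buserr-type" property from the PROM and changes the actual MMU entry for the trap page; the only thing this sketch takes from the source is the rule "buserr-type 1 means the cache bug is present, so make the trap page uncached".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "do not cache" bit; the real sun4c PTE layout is not modeled. */
#define HYPO_PTE_NOCACHE 0x1u

/* Hypothetical per-CPU record. */
struct hypo_cpu {
    int buserr_type;      /* value of the PROM "buserr-type" property */
    bool cache_chip_bug;  /* set when the workaround is applied */
};

/* Assumption from the quoted comment: buserr-type 1 identifies affected CPUs. */
static bool hypo_has_cache_bug(const struct hypo_cpu *cpu)
{
    assert(cpu != NULL);
    return cpu->buserr_type == 1;
}

/* If the bug is present, set the no-cache bit on the (single) trap-table
 * page's entry; otherwise return the entry unchanged. */
static uint32_t hypo_fix_trap_page(struct hypo_cpu *cpu, uint32_t trap_pte)
{
    cpu->cache_chip_bug = hypo_has_cache_bug(cpu);
    if (cpu->cache_chip_bug)
        trap_pte |= HYPO_PTE_NOCACHE;
    return trap_pte;
}
```

Only the trap-table page is touched, which is why the boot message says "trap page uncached" rather than disabling the cache outright.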

Chris Torek

Jul 7, 2000, 3:00:00 AM

>from sys/arch/sparc/sparc/cpu.c
>
>/* Machines with "buserr-type" 1 have a bug in the cache chip that affects
>traps. */
>
>The "cache chip bug; trap page uncached" message is the kernel activating
>a workaround for this bug.

For whatever it is worth, this comment and code are mine. I put them in
on, as I recall, advice from John Gilmore, who said that some sun4c
CPUs had this bug. The "buserr-type" test was my best guess at how
such CPUs were identified. I have never seen any official document
saying whether this is actually right, but it did seem to work.

Chris

David Brownlee

Jul 7, 2000, 3:00:00 AM

How much of a performance penalty is the workaround and what sort
of symptoms does the bug exhibit otherwise?
(Just curious if it's worth testing without the workaround on some
sun4c boxes :)

David/absolute
-- www.netbsd.org: No hype required --

Chris Torek

Jul 7, 2000, 3:00:00 AM

>How much of a performance penalty is the workaround ...

Depends how many traps you take, and how often they would have been
in the cache? :-) (Note that the trap table occupies exactly one
page, so it is really just the 4 instructions at the beginning of
the trap that are not cached -- all traps just set up a few things
and jump to cached code.)

The original sparc chips have something like a 4-stage pipeline,
and a trap has to flush out the pipe anyway. CPU-to-cache i-fetch
bandwidth is probably not much different from CPU-to-main-memory
i-fetch bandwidth at 20 MHz (though the gap is certainly larger at
40 MHz on SS2s, and larger still on the Weitek PowerUP). The main
penalty will be going through the MMU, which I think relies on the
cache to avoid table walks. At
a (very rough) guess, figure those 4 instructions from the trap page
run 2x to 3x slower than they would if they were from cache. Assuming
a typical "fast" trap (rwindow save/load) takes at least 120 cycles,
and that those 4 uncached instructions go from 4 cycles to 12 cycles,
you have gone from 120 to 128 cycles, or about 6.7%.

A typical "slow" trap (e.g., syscall and return) of course runs
well into the thousands of cycles, so the penalty there drops way below
1%. Of course it is exactly those "fast" (e.g., rwindow) traps that
you care about here ... luckily (?) rwindow save/load is already badly
memory-bandwidth limited, so the faster the CPU, the more cycles it
spends waiting in the (cached) rwindow code anyway, hence the smaller
the effect of the (uncached) 4 instructions in the trap page.
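The back-of-the-envelope estimate above can be written out as a small C function. It is a sketch of that arithmetic only, under the same rough model: the trap-page instructions cost `cached_cpi` cycles each when cached and `slowdown` times that when uncached. The function name and parameters are hypothetical, and the memory-bandwidth effect on the rwindow code is not modeled.

```c
#include <assert.h>

/* Percentage overhead added to a trap of base_cycles cycles when
 * n_insns trap-entry instructions, normally cached_cpi cycles each,
 * run slowdown times slower because the trap page is uncached. */
static double trap_overhead_pct(double base_cycles, int n_insns,
                                double cached_cpi, double slowdown)
{
    assert(base_cycles > 0.0 && n_insns >= 0 && slowdown >= 1.0);
    double extra_cycles = n_insns * cached_cpi * (slowdown - 1.0);
    return 100.0 * extra_cycles / base_cycles;
}
```

With the figures from the text, `trap_overhead_pct(120.0, 4, 1.0, 3.0)` gives 8 extra cycles, about 6.67%, while a slow trap of a couple of thousand cycles, `trap_overhead_pct(2000.0, 4, 1.0, 3.0)`, gives 0.4%: the same fixed 8-cycle cost shrinks as the trap gets longer.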

>and what sort of symptoms does the bug exhibit otherwise?

If I recall correctly, the cache simply delivers wrong data. Often
this turns into an illegal instruction, so that you get a trap
during a trap, which causes a reset. This condition can only be
caught by the ROM, and by then it is too late to do anything about
it. In other cases, the wrong data might be a valid instruction;
who knows what will happen then.

Chris

David Brownlee

Jul 11, 2000, 3:00:00 AM

Many thanks for the very complete answer :)
Do you mind if we add something based on this to the NetBSD/sparc
FAQ?

David/absolute
-- www.netbsd.org: No hype required --