Cache Flushing on PowerPC ISA

Roberto Brega

Jun 19, 1996

Hi -

I need to flush the caches for an address range, let's say [a, b],
where a and b are the start and end addresses of the memory block I would
like to flush from the cache. Should I "dcbst" every word between
the two addresses, with something like:

    loop: dcbst adr
          adr := adr + 4
          b loop

Or is there a smarter way? (Without calls to libraries, as always)

Thank you.
--Roberto

Alan Booker

Jun 20, 1996

> I need to flush the caches from, let's say, [a, b].
> Where a and b are the start and end position of the memory block I would
> like to flush from the cache. Should I "dcbst" every position between
> the two addresses with something like:
> loop: dcbst adr
> adr:=adr+4
> b loop
>

You need only dcbst one address for every cache line. Also use a
"sync" at the end of your loop to ensure the operation completes
with respect to external memory.
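
A minimal sketch of that loop in C, for illustration only. The names store_range, dcbst_line and sync_memory are hypothetical, the inline assembly assumes a GCC-style compiler, and the line size is passed in by the caller because the architecture does not fix it. On a non-PowerPC host the helpers only count the lines they would have touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines visited so far (bookkeeping aid, not required). */
static unsigned long lines_touched;

/* Write the data-cache block containing p back to memory. */
static void dcbst_line(const void *p)
{
#if defined(__powerpc__) || defined(__ppc__)
    __asm__ volatile ("dcbst 0,%0" : : "r"(p) : "memory");
#else
    (void)p;                    /* host build: nothing to store back */
#endif
    lines_touched++;
}

/* Wait until the preceding cache operations have completed. */
static void sync_memory(void)
{
#if defined(__powerpc__) || defined(__ppc__)
    __asm__ volatile ("sync" : : : "memory");
#endif
}

/* Store back every cache line that overlaps [start, end] (end inclusive).
 * line_size is an assumption supplied by the caller; it must be a power
 * of two. */
void store_range(const void *start, const void *end, size_t line_size)
{
    uintptr_t p, last;

    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    assert((uintptr_t)start <= (uintptr_t)end);

    p = (uintptr_t)start & ~((uintptr_t)line_size - 1);  /* align down */
    last = (uintptr_t)end;

    for (;;) {
        dcbst_line((const void *)p);      /* one dcbst per line, not per word */
        if (last - p < line_size)         /* this line already contains end */
            break;
        p += line_size;
    }
    sync_memory();                        /* single sync after the loop */
}
```

Called as store_range(a, b, 32) on a part with 32-byte blocks, this issues one dcbst per 32 bytes instead of one per word, followed by one sync.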
--

Alan Booker
IBM PowerPC Embedded Processors
"I work for IBM, but I don't speak for IBM"

Roberto Brega

Jun 21, 1996

Alan Booker wrote:

> You need only dcbst one address for every cache line. Also use a
> "sync" at the end of your loop to insure the operation completes
> with respect to external memory
>

> Alan Booker
> IBM PowerPC Embedded Processors

OK, I suspected as much. Is there a way to find out the cache-line
length at run time? I need to write a general I-cache flush routine, but I
do not have an underlying OS that could tell me this information.

--Roberto

Alan Booker

Jun 21, 1996

Roberto Brega wrote:
>
> . Is there a way to find out during run-time the
> cache-line length. I should write a general I-cache Flush routine, but I
> do not have an underlying OS, which could tell me this information.

I know of no straightforward mechanism to do this. Anyone else?
There is one goofy idea that should work. Take an area larger than
any expected cache line size (aligned appropriately) and set it to
some nonzero value. Use dcbz to zero one cache line's worth of bytes,
then see how many zero bytes you get. (I said it was goofy.)
--

Alan Booker
IBM PowerPC Embedded Processors

Cliff Click

Jun 21, 1996

Roberto Brega <rbr...@iiic.ethz.ch> writes:

> Alan Booker wrote:
>
> > You need only dcbst one address for every cache line. Also use a
> > "sync" at the end of your loop to insure the operation completes
> > with respect to external memory
>
> Ok. I imagined that. Is there a way to find out during run-time the
> cache-line length. I should write a general I-cache Flush routine, but I
> do not have an underlying OS, which could tell me this information.

I believe a dcbst writes a "block", not a cache line. A block is 32
bytes long, which just happens to match the cache line size for some
caches.

Cliff
--
Cliff Click, Ph.D. Compiler Researcher & Designer
RISC Software, Motorola PowerPC Compilers
cli...@risc.sps.mot.com (512) 891-7240
http://members.aol.com/mjclick1

Dale Elson

Jun 21, 1996

Alan Booker <ala...@raleigh.ibm.com> writes:

> Roberto Brega wrote:
> >
> > . Is there a way to find out during run-time the
> > cache-line length.

> I know of no straightfoward mechanism to do this. Anyone else?

Hi guys :-)

This is so simple that there's got to be a problem with it, but here
goes: Read the processor version number to find out what CPU it is, and
then use a table to find the cache line size from there?
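
A rough sketch of such a table lookup in C, under stated assumptions. The names pvr_entry, pvr_table, cache_block_from_pvr and read_pvr are hypothetical. The version numbers in the table are my assumption of the upper 16 bits of the PVR for each part and should be checked against the respective user's manuals. The block sizes are the figures quoted later in this thread. Reading the PVR (SPR 287) is a supervisor-level operation, so read_pvr only works in privileged code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per known processor. */
struct pvr_entry {
    uint16_t version;       /* upper 16 bits of the PVR (assumed values) */
    uint16_t block_bytes;   /* granule the dcbX instructions operate on */
};

static const struct pvr_entry pvr_table[] = {
    { 0x0001, 32 },   /* 601: 64-byte line made of two 32-byte sectors */
    { 0x0003, 32 },   /* 603 */
    { 0x0004, 32 },   /* 604 */
    { 0x0014, 64 },   /* 620 */
};

/* Return the cache block size for a PVR value, or 0 if the part is unknown. */
unsigned cache_block_from_pvr(uint32_t pvr)
{
    uint16_t version = (uint16_t)(pvr >> 16);   /* drop the revision field */
    size_t i;

    for (i = 0; i < sizeof pvr_table / sizeof pvr_table[0]; i++) {
        if (pvr_table[i].version == version) {
            unsigned b = pvr_table[i].block_bytes;
            assert(b != 0 && (b & (b - 1)) == 0);  /* entries are powers of two */
            return b;
        }
    }
    return 0;   /* unknown processor: caller needs a fallback */
}

#if defined(__powerpc__) || defined(__ppc__)
/* Read the processor version register; privileged, GCC-style inline asm. */
static uint32_t read_pvr(void)
{
    uint32_t pvr;
    __asm__ volatile ("mfspr %0,287" : "=r"(pvr));
    return pvr;
}
#endif
```

The obvious weakness of this design is the return value 0: a processor released after the table was written is not recognized, so the caller still needs some other way to find the size.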

Cheers...............Dale

Don North

Jun 22, 1996

In article <4qesq0$20...@ausnews.austin.ibm.com>, el...@austin.ibm.com
(Dale Elson) wrote:

Here's a little chunk of code that finds the cache line size (or at least
the size of the granule that dcbX controls). An additional asm routine
that does nothing but a 'dcbz' needs to be added as well:

-----asm-code-----------------------------------------------------------
dcbz:   li      r4, 0           ; generate a zero offset
        dcbz    r3, r4          ; zero the cache block containing (r3 + r4)
        blr                     ; return via the link register
-----c-code-------------------------------------------------------------
#include <Memory.h>     // Mac OS Toolbox header assumed to declare NewPtr/DisposePtr

// size of a block that is (much) larger than a cache line (i.e., a page)
#define SIZE 4096

// external references
extern void dcbz(char *);

// determine cache line size; returns 0 if the buffer cannot be allocated
long PPClineSize(void)
{
    char *block = (char *) NewPtr(SIZE);
    char *p;
    long i;
    long size;

    if (!block) return 0;

    // fill entire block with all ones
    for (i = SIZE, p = block; --i >= 0; ) *p++ = -1;

    // now clear a cache line in the middle...
    dcbz(&block[SIZE/2]);

    // now count all the zero bytes in the block... it's the line size
    for (i = SIZE, size = 0, p = block; --i >= 0; ) if (*p++ == 0) ++size;

    DisposePtr(block);
    return size;
}
------------------------------------------------------------------------

----------------------------------------------------------------------
Don North Apple Computer, Inc
internet: no...@apple.com System Architecture Group
AppleLink: NORTH One Infinite Loop / MS 301-4G
KD6JTT Cupertino, CA 95014
----------------------------------------------------------------------
{ Facts are facts, but any opinions expressed are my own, and do not }
{ represent any viewpoint, official or otherwise, of Apple Computer }
----------------------------------------------------------------------

Michael Meissner

Jun 23, 1996

In article <4qesq0$20...@ausnews.austin.ibm.com> el...@austin.ibm.com (Dale
Elson) writes:

| This is so simple that there's got to be a problem with it, but here
| goes: Read the processor version number to find out what CPU it is, and
| then use a table to find the cache line size from there?

Except, of course, for the nasty little fact that you can't read the PVR
(processor version register) from user mode, unless the OS cooperates.
--
Michael Meissner, Cygnus Support (East Coast)
Suite 105, 48 Grove Street, Somerville, MA 02144, USA
meis...@cygnus.com, 617-629-3016 (office), 617-629-3010 (fax)

Bruce Hoult

Jun 24, 1996

Alan Booker <ala...@raleigh.ibm.com> writes:
> Roberto Brega wrote:
> >
> > . Is there a way to find out during run-time the

> > cache-line length. I should write a general I-cache Flush routine, but I
> > do not have an underlying OS, which could tell me this information.

> I know of no straightfoward mechanism to do this. Anyone else?

> There is one goofy idea that should work. Take an area larger than
> any expected cache line size (aligned appropriately) and set it to
> some non zero value. Use dcbz to set a cache line's worth of zeros.
> See how many zero bytes you get..... (I said it was goofy)

It varied all over the place on the POWER RS/6000s, but there was an
instruction (clcs, deleted in PowerPC) that returned the cache line size.

Doesn't the PowerPC architecture define it to always be 16? (I don't
have a reference for that)

-- Bruce

Zalman Stern

Jun 24, 1996

In article <donald-2206...@204.75.62.47>, don...@ppp.ablecom.net
(Don North) wrote:
> Here's a little chunk of code that finds the cache line size (or at least
> the size of the granule that dcbX controls). An additional asm routine
> that does nothing but a 'dcbz' needs to be added as well:

[Code deleted.]

I wrote a similar routine at a previous job. However, mine used a stack
local buffer that was twice the maximum block size supported and then
computed an address aligned to the maximum size needed. (You have to
allocate the bigger buffer to ensure you can get an aligned address that
fits.) I then did a dcbz on the aligned address and counted the number of
zeros stored at that address. This will run very quickly.
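
A minimal sketch of this variant in C, as an illustration rather than the routine described above. The names probe_block_size, zero_block and MAX_BLOCK are hypothetical, and the inline assembly assumes a GCC-style compiler. On a non-PowerPC host, zero_block is a stand-in that pretends the granule is 32 bytes, so that the alignment and counting logic can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCK 4096u   /* largest granule this probe can detect (assumption) */

/* Zero the cache block containing p. */
static void zero_block(unsigned char *p)
{
#if defined(__powerpc__) || defined(__ppc__)
    __asm__ volatile ("dcbz 0,%0" : : "r"(p) : "memory");
#else
    /* Host stand-in only: simulate a 32-byte granule. */
    unsigned char *base = (unsigned char *)((uintptr_t)p & ~(uintptr_t)31);
    memset(base, 0, 32);
#endif
}

/* Return the number of bytes one dcbz clears, up to MAX_BLOCK. */
size_t probe_block_size(void)
{
    unsigned char buf[2 * MAX_BLOCK];   /* twice the maximum, on the stack */
    unsigned char *aligned;
    volatile unsigned char *v;
    size_t n = 0;

    /* Round up to a MAX_BLOCK boundary inside buf. */
    aligned = (unsigned char *)(((uintptr_t)buf + MAX_BLOCK - 1)
                                & ~(uintptr_t)(MAX_BLOCK - 1));
    /* The doubled buffer guarantees the aligned window fits. */
    assert((size_t)(aligned - buf) < MAX_BLOCK);

    memset(aligned, 0xFF, MAX_BLOCK);   /* fill the window with nonzero bytes */
    zero_block(aligned);                /* clear one block at its start */

    /* Count the zero bytes from the aligned address onward. */
    v = aligned;
    while (n < MAX_BLOCK && v[n] == 0)
        n++;
    return n;
}
```

Because the probe address is aligned to MAX_BLOCK, the zeroed block always starts at that address, so counting can stop at the first nonzero byte instead of scanning the whole buffer.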

The reason for using this approach over a processor version mapping table
is that your code will run on new PowerPC processors without modification.
(I was once told that there is such code in MacOS, though it is
significantly more complicated because it also determines the size and
geometry of the first two levels of cache. If you are looking for code to
do this, I think there is some in Larry McVoy's lmbench package.) It is
also possible that the line size could change for a processor with the
same revision number. (Notably for chips with off-chip L1 caches, though
this is unlikely to be seen in the PowerPC realm.)

The downside of this approach for embedded systems is that the dcbz
instruction can fault in certain circumstances. In particular, you must make
sure that the memory being dealt with is cached. (That really shouldn't
be a problem, unless you want to support processors with no cache at
all, in which case you should install an exception handler to detect that
case.)

I suppose there is also a problem with cache blocks bigger than the max
size wired in to the routine. (That is, the code can overwrite other
memory if the dcbz instruction writes more data than you planned for.) I
don't expect to ever see 4K-byte L1 cache blocks, so I'm not too worried
about this. (My routine allocated an 8K stack buffer and computed a
4K-aligned address.)

Zalman Stern, Caveman Programmer, Macromedia Video Products, (415) 378 4539
3 Waters Dr. #100, San Mateo CA, 94403, zal...@macromedia.com
If you squeeze my lizard, I'll put my snake on you -- Lemmy

Tim Olson

Jun 25, 1996

In article <29184...@hoult.actrix.gen.nz>
Br...@hoult.actrix.gen.nz (Bruce Hoult) writes:

> It varied all over the place on the Power RS/6000's, but there was an
> instruction (deleted on PowerPC) that returned the cache line size.
>
> Doesn't the PowerPC architecture define it to always be 16? (I don't
> have a reference for that)

Nope -- the PowerPC architecture is silent on any particular
implementation's cache characteristics such as split i/d vs unified
i/d, block/line size, etc. Some examples:

    601         64-byte block, 2 32-byte sectors, DCBZ works on a sector
    603, 604    32-byte block
    620         64-byte block

-- Tim Olson
Apple Computer, Inc.
(t...@apple.com)

Wolfgang Solfrank

Jun 25, 1996

In article <zalman-2406...@198.95.245.190> zal...@macromedia.com (Zalman Stern) writes:

> I suppose there is also a problem with cache blocks bigger than the max
> size wired in to the routine. (That is, the code can overwrite other
> memory if the dbcz instruction writes more data than you planned for.) I
> don't expect to ever see 4k byte L1 cache blocks so I'm not too worried
> about this. (My routine allocated an 8k stack buffer and computed a 4k
> aligned address.)

Since caching can be controlled with page granularity in the mmu, cache
lines larger than 4k, while theoretically possible, wouldn't make too
much sense.

Just my $0.02.
--
w...@TooLs.DE (Wolfgang Solfrank, TooLs GmbH) +49-228-985800

Dale Elson

Jun 26, 1996

t...@apple.com (Tim Olson) writes:

> 601 64-byte block, 2 32-byte sectors, DCBZ works on a sector

The PowerPC architecture defines the cacheable unit as a block. In the
case of the 601, each sector is a block. The granularity of the
cacheable unit remains 32 bytes for the 601/3/4 families. That's why you
are correct in saying that DataCacheBlockZero works on the sector :-).

> 603, 604 32-byte block
> 620 64-byte block

There is such a massive difference between the 60x and the 620 that I
wonder if any code written for the 604 will be running on the 620.

Cheers..............Dale

Zalman Stern

Jun 27, 1996

In article <4qrera$2n...@ausnews.austin.ibm.com>, el...@austin.ibm.com
(Dale Elson) wrote:
> There is such a massive difference between the 60x and the 620 that I
> wonder if any code written for the 604 will be running on the 620.

Are you referring to operating systems? If not: if 99.9% of the Power
Macintosh code I've written doesn't work on the 620, then the 620 is
broken. (And the little bit that won't, won't run on a 604 either. Well,
actually, it will with Apple's OS because there is emulation for a lot of
601 stuff. Hence it will likely work on a 620 if they ever build a
620-based machine.)

Not that I ever expect to see the original 620 at this point. However, it
is important to realize that PowerPC is an architecture spec and it says
what works and what doesn't. Almost all the code running on Power Macs
adheres to that architecture spec and so it should run on a 620. Of course,
it will not take advantage of 64-bit operation.

Lawson English

Jun 27, 1996

Zalman Stern <zal...@macromedia.com> wrote:
: In article <4qrera$2n...@ausnews.austin.ibm.com>, el...@austin.ibm.com
: (Dale Elson) wrote:
: > There is such a massive difference between the 60x and the 620 that I
: > wonder if any code written for the 604 will be running on the 620.

[snipt]
: Not that I ever expect to see the original 620 at this point. However it
: is important to realize that PowerPC is an archtiecture spec and it says
: what works and what doesn't. Almost all the code running on Power Mac's
: adheres to that architecture spec and so it should run on a 620. Of course
: it will not take advantage of 64-bit operation.

I believe that there is a PowerPC variant that doesn't run the 32-bit ISA.

I don't know if this is an actual product or a proposed one, but by
definition, it couldn't be a PowerPC, since all PPCs allow at least the
32-bit ISA to run.

Could he be confusing the 620 with this 64-bit only version?

--
-------------------------------------------------------------------------------
Lawson English Toolbars are evidence that
eng...@primenet.com Windows really is a cargo cult...
-Kevin Killion
-------------------------------------------------------------------------------

Tim Olson

Jul 1, 1996

In article <zalman-2706...@198.95.245.190>
zal...@macromedia.com (Zalman Stern) writes:

> In article <4qrera$2n...@ausnews.austin.ibm.com>, el...@austin.ibm.com
> (Dale Elson) wrote:
> > There is such a massive difference between the 60x and the 620 that I
> > wonder if any code written for the 604 will be running on the 620.
>

> Not that I ever expect to see the original 620 at this point. However it
> is important to realize that PowerPC is an archtiecture spec and it says
> what works and what doesn't. Almost all the code running on Power Mac's
> adheres to that architecture spec and so it should run on a 620. Of course
> it will not take advantage of 64-bit operation.

One big difference is the cache block size (620 uses a 64-byte cache
block). And unfortunately there are a number of places where system and
application code *assume* that the cache block size is 32 bytes. See,
for example, the discussion of the memset routine in this newsgroup...