
MC68030 Cache Organization

From: ag...@ccvaxa.uucp
Date: Sep 29, 1986, 11:05:00 PM

Motorola 68030 Cache Organization
---------------------------------

Can someone explain to me why the change in cache organization between
the 68020 and the 68030 is such a win? I don't need numbers; I'd just like
a rationalization that explains the mechanism.

NB. I'm not talking about the separate address/data lines to the I-cache
- that's obviously an improvement. What I refer to is the comment in
_Electronics_ that goes like this:

To improve the likelihood of cache hits, Motorola is also reorganizing
the 256-byte instruction cache into 16 entries of four long words each
with 4 bytes per word. The 68020 instruction cache consists of 64 entries
each of one long word... The reorganized instruction cache, along with
the new burst mode addressing methods, should double the cache hit ratio
and reduce the number of times the 68030 must access the system bus.

First off, reducing the number of entries that can be independently
associated seems to be a loss, not a win. But have they changed the cache
structure - is it fully associative now, where it wasn't before? Maybe they
just needed fewer entries so that they could do it fast enough for a 1-cycle
access with the separate A/D buses.

Do the entries have to be strictly aligned on 16-byte boundaries, or can
they be skewed? I'd suspect the former. If so, there will be an advantage
to aligning the tops of your inner loops on 16-byte boundaries. NOPs,
anybody?
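
If the lines are strictly aligned, the padding arithmetic is simple. A toy
C fragment (the loop-top address here is made up) that computes how many
NOP bytes you'd need:

    #include <stdio.h>

    int main(void)
    {
        unsigned long loop_top = 0x1006;  /* hypothetical loop-top address */
        /* bytes of padding to reach the next multiple of 16, assuming
           cache lines are aligned on 16-byte boundaries */
        unsigned long pad = (16 - (loop_top & 15)) & 15;

        printf("pad with %lu NOP bytes to reach 0x%lx\n",
               pad, loop_top + pad);
        return 0;
    }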

Why the emphasis on "four long words each with 4 bytes per word"? I assume
that the four long words reflect how the cache line is filled, by a modulo-4
burst-mode memory access. That's probably one of the big advantages of this
cache organization - it doesn't increase the cache hit ratio so much as
decrease the time needed to make good on a cache miss, so that you can get
back to work sooner. Also, if you are accessing memory sequentially, as you
do in a linear instruction stream, a burst-mode line fill may have fetched
the next word before the processor asks for it - whereas without prefetching
you'd take another miss, and even with prefetching over a slower memory
access you might not have it ready in time.
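
Here is the fill order I'm assuming, as a toy C fragment (the miss address
is made up): the long word that missed comes back first, and the rest of
the line wraps around modulo 4.

    #include <stdio.h>

    int main(void)
    {
        unsigned long miss = 0x1008;        /* hypothetical miss address */
        unsigned long line = miss & ~15UL;  /* base of the 16-byte line */
        int first = (int)((miss >> 2) & 3); /* long word within the line */
        int i;

        /* burst fill: requested word first, then wrap modulo 4 */
        for (i = 0; i < 4; i++)
            printf("fetch long word at 0x%lx\n",
                   line + 4 * (unsigned long)((first + i) & 3));
        return 0;
    }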

The emphasis on `bytes' in the instruction cache probably means that it is
easier for the execution unit to pull funny-sized instructions out of the
cache. Ahh, the joys of variable-length instruction sets!

The orientation toward longer lines, filled faster by burst mode, is
probably a good thing for an instruction cache, but one wonders whether it
is as good for a data cache. It probably is for floating-point numbers,
which can fill a cache line by themselves, or for matrix processing or
graphics, where you do a lot of sequential access to data. But it is maybe
not so good for systems that use a lot of pointer accesses to random fields
in structures, picking out, say, only one byte from every cache line
filled. Could Motorola have given us a 64-entry, one-word-per-line data
cache, like the 68020's instruction cache?
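
A crude back-of-the-envelope in C (the figure of 4 useful bytes per line
for pointer chasing is pure invention) to show why line length cuts both
ways:

    #include <stdio.h>

    int main(void)
    {
        int lines[] = { 4, 16 };  /* 68020-style vs. 68030-style line size */
        int i;

        for (i = 0; i < 2; i++)
            printf("%2d-byte line: sequential uses 100%%, "
                   "pointer chasing uses %3.0f%%\n",
                   lines[i], 100.0 * 4 / lines[i]);
        return 0;
    }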

(Oh, another thing: TLB address translation is done in parallel with
cache access. Does this mean that the cache is virtual? Does it do
invalidations according to physical addresses off the external bus,
or what?)

Summing up, I see these as the tradeoffs that came into the 68030 cache:

  LOSS  fewer independent entries
  GAIN  faster association on the fewer entries?
  GAIN  faster filling using burst mode
  Longer cache lines:
    GAIN  for instructions
    GAIN  for numerical and sequentially accessed data
    LOSS  for pointer/structure-oriented programs?

Am I missing or confused about anything?

Andy "Krazy" Glew. Gould CSD-Urbana. USEnet: ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801 ARPAnet: aglew@gswd-vms

From: Richard Simoni
Date: Oct 1, 1986, 8:09:24 PM

In article <5100146@ccvaxa> ag...@ccvaxa.UUCP writes:
>
>Motorola 68030 Cache Organization
>---------------------------------
>Oh, another thing: TLB address translation is done in parallel with
>cache access. Does this mean that the cache is virtual?

This doesn't necessarily follow. Address translation is often done in
parallel with cache access by using only the low-order bits of the virtual
address (i.e., the bits that indicate the offset within the page) to address
the cache. This is possible because these offset bits do not change in
the virtual-to-physical mapping. When the cache access is complete, the
tag (which is a physical page number) is compared with the result of the
address translation (which happened in parallel with the cache access) to
see if a hit occurred in the cache.
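
A toy C fragment may make the bit slicing clearer. The parameters below
(4K pages, a 256-byte direct-mapped cache of 16-byte lines) are plausible
guesses for illustration, not Motorola's published numbers:

    #include <stdio.h>

    int main(void)
    {
        unsigned long vaddr = 0x12345678;     /* hypothetical virtual address */
        unsigned long offset = vaddr & 0xfff; /* bits 0-11: page offset,
                                                 unchanged by translation */
        unsigned long index = (offset >> 4) & 15; /* selects 1 of 16 lines */
        unsigned long vpn = vaddr >> 12;      /* goes to the TLB meanwhile */

        /* the line's stored tag (a physical page number) is then compared
           with the TLB output to decide hit or miss */
        printf("read cache line %lu while translating page 0x%lx\n",
               index, vpn);
        return 0;
    }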

The problem with this scheme is that it can be difficult to build a large
cache since the page size limits the number of bits that can be used to
address the cache. The size of the cache can be increased by making the
cache set-associative and/or by increasing the page size (thereby
increasing the number of bits that can address the cache). Of course, an
on-chip cache (as in the 68030 case) will not be very large, anyway.
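
In numbers, assuming a 4K page size:

    #include <stdio.h>

    int main(void)
    {
        unsigned long page = 4096;  /* assumed page size */
        int assoc;

        /* index bits must fit in the page offset, so the cache can be
           at most (page size * degree of set associativity) bytes */
        for (assoc = 1; assoc <= 4; assoc *= 2)
            printf("%d-way: largest such cache = %luK\n",
                   assoc, page * (unsigned long)assoc / 1024);
        return 0;
    }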

Rich Simoni
Center for Integrated Systems
Stanford University
sim...@sonoma.stanford.edu
...!decwrl!glacier!shasta!simoni

From: John R. Levine
Date: Oct 3, 1986, 7:11:35 PM

In article <5100146@ccvaxa> ag...@ccvaxa.UUCP writes:
>
>Motorola 68030 Cache Organization
>---------------------------------
>
>Can someone explain to me why the change in cache organization between
>the 68020 and the 68030 is such a win? ...

>
> To improve the likelihood of cache hits, Motorola is also reorganizing
> the 256-byte instruction cache into 16 entries of four long words each
> with 4 bytes per word. The 68020 instruction cache consists of 64 entries
> each of one long word... The reorganized instruction cache, along with
> the new burst mode addressing methods, should double the cache hit ratio
> and reduce the number of times the 68030 must access the system bus.

According to an article in Digital Design, the big win with this kind of
cache design is that it takes advantage of nibble-mode RAM chips, which can
cycle four sequential bits out very fast. It means you can get four times
the data in a bus transaction in much less than four times the time. Since
much read access is sequential anyway (instruction execution, or scanning a
string or a table), it's a big win.
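
The cycle counts below are invented, but they show the shape of the win:

    #include <stdio.h>

    int main(void)
    {
        int first = 4, next = 1;  /* invented memory timings, in cycles */

        printf("four single reads: %d cycles\n", 4 * first);
        printf("one 4-word burst:  %d cycles\n", first + 3 * next);
        return 0;
    }
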
--
John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Lev...@YALE.EDU
The opinions expressed herein are solely those of a 12-year-old hacker
who has broken into my account and not those of any person or organization.

From: jo...@polaris.uucp
Date: Oct 13, 1986, 7:07:35 PM

In article <8...@Shasta.STANFORD.EDU> sim...@Shasta.UUCP (Richard Simoni) writes:
>This doesn't necessarily follow. Address translation is often done in
>parallel with cache access by using only the low-order bits of the virtual
>address (i.e., the bits that indicate the offset within the page) to address
>the cache. This is possible because these offset bits do not change in
>the virtual-to-physical mapping. When the cache access is complete, the
>tag (which is a physical page number) is compared with the result of the
>address translation (which happened in parallel with the cache access) to
>see if a hit occurred in the cache.
>
>The problem with this scheme is that it can be difficult to build a large
>cache since the page size limits the number of bits that can be used to
>address the cache. The size of the cache can be increased by making the
>cache set-associative and/or by increasing the page size (thereby
>increasing the number of bits that can address the cache). ...

The largest cache that can be implemented in this way is the product
of the degree of set associativity and the page size. Larger caches
can be implemented but you will sometimes have to take some extra
cycles if you've "guessed" wrong (by using some of the virtual
address bits before translation) about where things are. In the
IBM 3090, the page size is 4K, the degree of set associativity is
4, and the cache size is 64K. Therefore the problem arises and
is addressed (oops, sorry 'bout that) by comparing all 16 (!) real
address tags and taking some small amount of extra time when the
thing you want isn't exactly where you expected it to be. See the
article on the 3090 by Stu Tucker in a recent issue of the IBM
Systems Journal ("The IBM 3090 System: An Overview," Vol. 25,
No. 1, 1986, pp. 4-19).
--

Josh Knight, IBM T.J. Watson Research
jo...@ibm.com, jo...@yktvmh.bitnet, ...!philabs!polaris!josh
