Addressing modes (long)

John R. Mashey

Mar 5, 1993, 6:47:23 PM

Several people (correctly) questioned my comment about lack of
autodecrement mode in RISCs, and others wanted to see the big chart of addressing modes.

I.
1) When I said I didn't know of a clear RISC with auto-decrement, what I
meant by auto-decrement was the specific STACK-PUSH (or equivalent) instruction,
i.e. usually done by:
STORE --(REG) or equivalent
such that:
a) effective address = (REG) - sizeof(data object)
b) REG <- effective address

2) HP PA, POWER, i860 (in various combinations) have the more general
a) effective address = (REG)+displacement (or other forms)
b) REG <- effective address
which is usually more general, and subsumes auto-decrement if you allow
negative displacements,
although sometimes only done for FP load/store to avoid extra integer
register write port.

Form 1 was often used in dense-encoded instruction sets to get a very
short instruction for pushing registers onto down-growing stacks, of course.
In the big table I was looking at, that I did several years ago, I called
1) and 2) different addressing modes, which is what I was thinking of.
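
A minimal sketch of the two forms, written in Python rather than any real instruction set. The function names, the dict-based register file and memory, and the 4-byte operand size are assumptions made purely for illustration:

```python
def store_autodecrement(regs, mem, base, value, size=4):
    """Form 1: STORE --(REG). The step is implied by the operand size."""
    ea = regs[base] - size   # a) effective address = (REG) - sizeof(data object)
    regs[base] = ea          # b) REG <- effective address
    mem[ea] = value
    return ea


def store_base_update(regs, mem, base, disp, value):
    """Form 2: store with an explicit displacement and base-register update."""
    ea = regs[base] + disp   # a) effective address = (REG) + displacement
    regs[base] = ea          # b) REG <- effective address
    mem[ea] = value
    return ea


# Pushing one word onto a down-growing stack, both ways.
regs1, mem1 = {"sp": 0x1000}, {}
regs2, mem2 = {"sp": 0x1000}, {}
store_autodecrement(regs1, mem1, "sp", 42)     # step of 4 is implied
store_base_update(regs2, mem2, "sp", -4, 42)   # same push via displacement -4
```

With a negative displacement equal to the operand size, form 2 leaves the registers and memory in the same state as form 1, which is the sense in which it subsumes auto-decrement.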

II. THE GIANT ADDRESSING MODE TABLE (Corrections happily accepted)

Address mode summary
r register
r+ autoincrement (post) [by size of data object]
-r autodecrement (pre) [by size,...and this was the one I meant]
>r modify base register [generally, effective address -> base]

d displacement d1 & d2 if 2 different displacements
x index register
s scaled index
a absolute [as a separate mode, as opposed to displacement+(0)]
I Indirect
Shown below are 22 distinct addressing modes [you can argue whether
these are right categories]. In the table are the *number* of different
encodings/variations [and this is a little fuzzy; you can especially argue about
the 4 in the HP PA column, I'm not even sure that's right]. For example,
I counted as different variants on a mode the case where the structure was the same, but there were different-sized displacements that had to be decoded.
Note that meaningfully counting addressing modes is *at least as bad* as meaningfully counting opcodes; I did the best I could, and I spent a lot of
hours looking at manuals for the chips I hadn't programmed much, and in
some cases, even after hours, it was hard for me to figure out
meaningful numbers...

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
r r
r r r +d1 +d1
r r r | | r r | r r+ +d +d1 I +s
r r r +d +x +s| s+ s+|s+ +d +d|r+ +d I I I +s I
r +d +x +s >r >r >r|r+ -r a a r+|-r +x +s|I I +s +s +d2 +d2 +d2
-- -- -- -- -- -- --|-- -- -- -- --|-- -- --|-- -- -- -- --- --- ---
AMD 29K 1 | | |
Rxxx 1 | | |
SPARC 1 1 | | |
88K 1 1 1 | | |
HP PA 2 1 1 4 1 1| | |
ROMP 1 | | |
POWER 1 1 1 1 | | |
i860 1 1 1 1 | | |
Swrdfish 1 1 1 | 1 | |

Clipper 1 3 1 | 1 1 2 | |
i960KB 1 1 1 1 | 2 2 | 1 |

S/360 1 | 1 |
i486 1 3 1 1 | 1 1 2 | 2 3|
NSC32K 3 | 1 1 3 3 | 3| 9
MC68000 1 1 | 1 1 2 | 2 |
MC68020 1 1 | 1 1 2 | 2 4| 16 16
VAX 1 3 1 | 1 1 1 1 1| 1 3| 1 3 1 3

COLUMN NOTES:

1) Columns 1-7 are addressing modes used by many machines, but very few,
if any clearly-RISC architectures use anything else. They are all
characterized by what they don't have:
2 adds needed before generating the address
indirect addressing
variable-sized decoding

2) Columns 13-15 include fairly simple-looking addressing modes, which, however,
*may* require 2 back-to-back adds before the address is available. [*may*
because some of them use index-register=0 or something to avoid
indexing, and usually in such machines, you'll see variable timing figures,
depending on use of indexing.]

3) Columns 16-22 use indirect addressing.

ROW NOTES
1) Clipper & i960, of current chips, are more on the RISC-CISC border,
or are "modern CISCs" than most.

2) ROMP has a number of characteristics different from the rest of the RISCs,
you might call it "early RISC", and it is of course no longer made.

3) You might consider HP PA a little odd, as it appears to have more addressing
modes, in the same way that CISCs do, but I don't think this is the case: it's an issue of whether you call something several modes or one mode with a modifier, just as there is trouble counting opcodes (with & without modifiers).
From my view, neither PA nor POWER have truly "CISCy" addressing modes.

4) Notice difference between 68000 and 68020 (and later 68Ks): a bunch of
incredibly-general & complex modes got added...

5) Note that the addressing on the S/360 is actually pretty simple.

6) A dimension *not* shown on this particular chart, but also highly
relevant, is that this chart shows the different *types* of modes, *not*
how many can be found in each instruction. That may be worth noting also:
AMD - i960 1
S/360 - MC68020 2
VAX 6
By looking at alignment, indirect addressing, and looking only at those
chips that have MMUs,
consider the number of times an MMU *might* be used per instruction for
data address translations:
AMD - Clipper 2 [Swordfish & i960KB: no TLB]
S/360 - NSC32K 4
MC68Ks (all) 8
VAX 24

When RS/6000 does unaligned, it must be in the same cache line
(and thus also in same MMU page), and traps to software otherwise, thus
avoiding numerous ugly cases.

(Note: in some sense, S/360s & VAXen can use an arbitrary number of translations
per instruction, with MOVE CHARACTER LONG or similar operations, but I don't
count them as more, because they're defined to be interruptible/restartable,
saving state in general-purpose registers, rather than hidden internal state.)

SUMMARY:
1) It should be clear from this, that computer design styles mostly changed from
machines with:
2-6 addresses per instruction, with variable sized encoding
address specifiers were usually "orthogonal", so that any could go
anywhere in an instruction
sometimes indirect addressing
sometimes need 2 adds *before* effective address is available
sometimes with many potential MMU accesses (and possible exceptions)
per instruction, often buried in the middle of the instruction,
and often *after* you'd normally want to commit state because
of auto-increment or other side effects.
to machines with:
1 address per instruction
address specifiers encoded in small # of bits in 32-bit instruction
no indirect addressing
never need 2 adds before address available
use MMU once per data access

and we usually call the latter group RISCs.

2) Now, ignoring any other features, but looking at this single attribute
(architectural addressing features and implementation effects thereof),
it ought to be clear to anybody with half a brain that the machines in
the first part of the table are doing something technically different
from those in the second part of the table. Thus, people may sometimes
call something RISC that isn't, for marketing reasons, but the people
calling the first batch RISC really did have some serious technical issues at
heart.

Hence, whether RISC is better than CISC need not be debated here, but I hope
this table, and the earlier observations about implementation trickiness,
make it clear that anyone who says "RISC is just a marketing term":
a) Is ignorant of serious computer design issues.
b) Is happy to spread their ignorance to others...

3) Maybe this should be a FAQ; I'm not going to type it in again :-)
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: ma...@sgi.com
DDD: 415-390-3090
USPS: Silicon Graphics 7U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

John F Carr

Mar 7, 1993, 3:10:32 PM

In article <1993Mar5.2...@odin.corp.sgi.com>

ma...@mash.wpd.sgi.com (John R. Mashey) writes:

>II. THE GIANT ADDRESSING MODE TABLE (Corrections happily accepted)

> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
> r r
> r r r +d1 +d1
> r r r | | r r | r r+ +d +d1 I +s
> r r r +d +x +s| s+ s+|s+ +d +d|r+ +d I I I +s I
> r +d +x +s >r >r >r|r+ -r a a r+|-r +x +s|I I +s +s +d2 +d2 +d2
> -- -- -- -- -- -- --|-- -- -- -- --|-- -- --|-- -- -- -- --- --- ---

>ROMP 1 | | |

>VAX 1 3 1 | 1 1 1 1 1| 1 3| 1 3 1 3

Actually ROMP has 3 addressing modes (register, register + 4 bit, register +
16 bit) and should read:

ROMP 1 2

However it is still fundamentally different from the VAX, which also has
multiple forms of r+d addressing. On the ROMP, unlike the VAX, the instruction opcode
selects the displacement size. Instructions with the first byte between 00
and 7F are 16 bit instructions; 1x, 2x, 3x, 4x, 5x, and 7x are all decoded
the same way:

opcode displacement data register address register

Compare to the VAX where you have to read the operand specifier byte to
determine how many displacement bytes follow (you can speculatively read the
displacement in parallel and ignore the value if you don't need it, but it's
still harder than decoding RISC instructions).
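
A rough sketch of the decoding difference. The VAX half uses the architecture's operand-specifier mode nibbles (A/B byte, C/D word, E/F longword displacement). The ROMP half uses only the rule stated above for first bytes 00-7F, and treating every other first byte as a 32-bit instruction is an assumption of this sketch. Function and table names are made up for illustration:

```python
# VAX: the high nibble of each operand specifier byte selects the mode, and
# only then do you know how many displacement bytes follow that specifier.
VAX_DISPLACEMENT_BYTES = {
    0xA: 1, 0xB: 1,   # byte displacement, byte displacement deferred
    0xC: 2, 0xD: 2,   # word displacement, word displacement deferred
    0xE: 4, 0xF: 4,   # longword displacement, longword displacement deferred
}


def vax_displacement_length(specifier_byte):
    """Displacement bytes following one VAX operand specifier (0 if none).

    Other trailing data (e.g. immediates via PC-relative modes) is ignored here.
    """
    mode = (specifier_byte >> 4) & 0xF
    return VAX_DISPLACEMENT_BYTES.get(mode, 0)


def romp_style_length(first_byte):
    """Instruction length in bytes, decided by the opcode byte alone.

    00-7F -> 16-bit instruction (from the post); anything else is assumed
    here to be a 32-bit instruction.
    """
    return 2 if first_byte <= 0x7F else 4
```

The point of the contrast: in the ROMP-style case one byte fixes the whole instruction's length, while in the VAX case the length is only known after each operand specifier has been examined in turn.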

The RT had some bugs associated with variable length instructions:

. on exiting a loop with certain instruction alignment, 16 bit
blocks could be fetched out of order under certain (rare) conditions

. when an instruction in a branch delay slot spans a page boundary
and the second page isn't valid, bad things happen

Does anyone know of similar bugs in other machines with variable length
instructions?

--
John Carr (j...@athena.mit.edu)

Jonathan Thornburg

Mar 7, 1993, 3:35:48 PM

In article <1ndkro...@senator-bedfellow.MIT.EDU> j...@athena.mit.edu
(John F Carr) writes:
>The [IBM] RT had some bugs associated with variable length instructions:

>
> . on exiting a loop with certain instruction alignment, 16 bit
> blocks could be fetched out of order under certain (rare) conditions
>
> . when an instruction in a branch delay slot spans a page boundary
> and the second page isn't valid, bad things happen
>
>Does anyone know of similar bugs in other machines with variable length
>instructions?

"The Soul of a New Machine" (book by Tracy Kidder, should be in
comp.arch FAQ) describes a bug of this type in the prototype of
what later became the Data General MV/8000.

I understand that the Vax 11/750 had similar problems, which managed
to break the inner-loop code for printf(3) ! Could someone who knows
what really happened here please post the true story? (It really belongs
in the alt.folklore.computers FAQ, but still, it has at least *something*
to do with architecture -- the risks of very intricate instruction sets.)

The Sun 3/50 had a bug of this type: under certain circumstances, if
a multibyte (68020) instruction crossed a page boundary, had a DMA
memory access between the 1st-page and the 2nd-page ifetches, and then
got a page fault on the 2nd-page ifetch (I think I have that right,
this is from memory of a Usenet posting ~3-5 years ago), the
2nd-page ifetch would erroneously return all zeros.

The thing that made the tracking down of this bug truly awe-inspiring
is that the person who found it did so based on a 6 megabyte lisp
program whose only symptom of the failure was sporadic core dumps!
I know *I* could never track down such a bug, let alone dive into
the hardware and find the problem with the dual-porting of the memory
between CPU & DMA, and I'm utterly astonished that any mortal human
could!

- Jonathan Thornburg
<jona...@hermes.chpc.utexas.edu> or <jona...@einstein.ph.utexas.edu>
[until 31/Aug/93] U of Texas at Austin / Physics Dept / Center for Relativity
and [until ~Apr/93] U of British Columbia / {Astronomy,Physics}

Torben AEgidius Mogensen

Mar 8, 1993, 7:16:07 AM

ma...@mash.wpd.sgi.com (John R. Mashey) writes:

>II. THE GIANT ADDRESSING MODE TABLE (Corrections happily accepted)

>Address mode summary
>r register
>r+ autoincrement (post) [by size of data object]
>-r autodecrement (pre) [by size,...and this was the one I meant]
>>r modify base register [generally, effective address -> base]

>d displacement d1 & d2 if 2 different displacements
>x index register
>s scaled index
>a absolute [as a separate mode, as opposed to displacement+(0)]
>I Indirect
>Shown below are 22 distinct addressing modes [you can argue whether
>these are right categories]. In the table are the *number* of different

>encodings/variations.

> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
> r r
> r r r +d1 +d1
> r r r | | r r | r r+ +d +d1 I +s
> r r r +d +x +s| s+ s+|s+ +d +d|r+ +d I I I +s I
> r +d +x +s >r >r >r|r+ -r a a r+|-r +x +s|I I +s +s +d2 +d2 +d2
> -- -- -- -- -- -- --|-- -- -- -- --|-- -- --|-- -- -- -- --- --- ---
>AMD 29K 1 | | |
>Rxxx 1 | | |
>SPARC 1 1 | | |
>88K 1 1 1 | | |
>HP PA 2 1 1 4 1 1| | |

....

You can add ARM:
ARM 2 2 2 1 1| 1 1

Since the scale for the index is given as a shift/rotate by any
constant number of bits, not by the size of the object moved this is a
bit more general than normal scaled indexing. Setting the scaling to
shift 0 bits gives unscaled indexing, but I haven't added this to the
table as this is just a special case of scaled indexing (just like
register addressing is a special case of indexed addressing). ARM also
has some addressing modes NOT mentioned in the table. The idea is that
you can modify the base register to some "effective address", but use
the unmodified base address for accessing memory. This corresponds to
post-incremented (or decremented) addressing. The "2"'s mentioned in
the table are because post (in/de)crement has no effect unless
write-back is specified.

The entries for register, post- and pre- (in/de)cremented are for
load/store multiple registers. There is no displacement on the
addressing modes for these, but before OR after each load/store the
base address is modified by +/- 4. Note that it is possible to have
pre-increment and post-decrement, so again the table is insufficient.
If write-back is specified, the final modified base address is written
back to the base register. Note that post modification now matters
even when there is no write-back, as it controls the modification of
the base address between each load/store.
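
As a rough illustration of the single-register case described above (not the load/store multiple case), here is a small Python sketch. It follows the description in this post, where post-modification only has an effect when write-back is specified. The function names and the dict register file are assumptions for illustration, not ARM terminology:

```python
def scaled(index_value, shift):
    """Index scaled by a constant shift; shift 0 gives plain indexing."""
    return index_value << shift


def arm_style_address(regs, base, offset, pre_indexed, write_back):
    """Return the address used for the access, updating the base if asked."""
    modified = regs[base] + offset
    if pre_indexed:
        address = modified          # access uses the modified address
    else:
        address = regs[base]        # access uses the unmodified base
    if write_back:
        regs[base] = modified       # base register receives the modified address
    return address


regs = {"r1": 100, "r2": 3}
# Pre-indexed, scaled index (3 << 2 = 12), no write-back: base unchanged.
a1 = arm_style_address(regs, "r1", scaled(regs["r2"], 2), True, False)
# Post-indexed with write-back: access at the old base, then base += 12.
a2 = arm_style_address(regs, "r1", scaled(regs["r2"], 2), False, True)
```

Either way only one add is needed to form the address, which is the property the next paragraph relies on.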

Some may argue that this is a lot of addressing modes for a supposed
RISC, but they all share the property that the effective address can
be calculated in one cycle using the standard ALU. Load/store multiple
registers require an extra register for a temporary copy of the base
address, and must be able to do a write to this register at the same
time it does a write to a loaded register. But the advantage of
multiple register load/store outweighs the extra cost, so this is just
an application of the RISC principle rather than a violation.

The very first (pre-production) ARM had an addressing mode where the
scaling of the index could be controlled by a register. This required
little extra hardware, as the ALU already had (and still has) this
capability. It was, however, found that this mode was rarely used, so
it was taken out of the specification and is not found on any present
ARMs. In an earlier posting I mistakenly stated that ARM still has
this addressing mode. The bit that specifies this is still there, but
it has no effect. I don't know of any plans to reintroduce this mode
or use the bit for other purposes.

Torben Mogensen (tor...@diku.dk)

John R. Mashey

Mar 8, 1993, 2:56:07 PM

In article <1ndkro...@senator-bedfellow.MIT.EDU>, j...@athena.mit.edu (John F Carr) writes:

|> Actually ROMP has 3 addressing modes (register, register + 4 bit, register +
|> 16 bit) and should read:
|>
|> ROMP 1 2

Thanx; I'll repost the big chart after any other corrections (there
have been some others) come in.

Steve Hobbs

Mar 8, 1993, 5:57:55 PM

When John Mashey posts his updated address mode table it would be interesting
if knowledgeable people could post restrictions on these address modes.

For example, John posts that the Intel i860 has 4 address modes,
r+d, r+x, r+d<r and r+x<r. However, only the floating point loads
and stores can use all four modes.

The integer loads can only use r+d and r+x. The integer loads do not
use r+d<r and r+x<r because that would require another write port into
the integer register file.

The integer stores can only use r+d. The integer stores do not use r+x
because this would do three reads from the integer registers and would require
another read port in the integer register file.

I do not know why i860 integer stores cannot use the r+d<r addressing mode.
It seems that integer register write port is available during this
operation. Perhaps, someone from Intel can explain.
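
The port arithmetic behind these restrictions can be sketched as follows. This is only a counting model under simple assumptions (one read for the base, one for an index register, one for store data; one write for loaded data, one for an updated base), and the function name is made up for illustration:

```python
def integer_register_ports(op, mode):
    """Integer-register-file (reads, writes) needed by one integer load/store.

    op   -- "load" or "store"
    mode -- "r+d", "r+x", "r+d<r" or "r+x<r" (notation as used above)
    """
    reads = 1                    # base register
    if "x" in mode:
        reads += 1               # index register
    if op == "store":
        reads += 1               # data to be stored
    writes = 0
    if op == "load":
        writes += 1              # loaded data
    if mode.endswith("<r"):
        writes += 1              # updated base register
    return reads, writes


for op in ("load", "store"):
    for mode in ("r+d", "r+x", "r+d<r", "r+x<r"):
        print(op, mode, integer_register_ports(op, mode))
```

Under this model an integer load with base update needs two writes and an integer store with r+x needs three reads, matching the explanations above. A store with r+d<r needs only two reads and one write, which is why its absence is puzzling.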

However, I suspect that other architectures also limit address modes. For
example, do any of the architectures that support r+x<r allow this on integer
load? If it is allowed, are there two write ports to the integer register
file or does the machine stall if a long sequence of such loads are executed?
Does any machine allow r+x address mode on integer store and does such a
machine require 3 read ports on the integer register file?

Josh Osborne

Mar 8, 1993, 7:29:26 PM

In article <1993Mar8.2...@nntpd.lkg.dec.com> ho...@steven.enet.dec.com (Steve Hobbs) writes:
>When John Mashey posts his updated address mode table it would be interesting
>if knowledgeable people could post restrictions on these address modes.
>
>For example, John posts that the Intel i860 has 4 address modes,
>r+d, r+x, r+d<r and r+x<r. However, only the floating point loads
>and stores can use all four modes.
>
>The integer loads can only use r+d and r+x. The integer loads do not
>use r+d<r and r+x<r because that would require another write port into

>the integer register file. [...]

If I remember correctly the i860 has one register file (not N I regs and N
FP regs). It does have a load and fload, the fload does not enter data into
the cache (useful for walking through vectors), I don't remember if they
normalise, but I don't think they do.

--
Disclaimer:
Speaking for myself, not UUNET.

Dik T. Winter

Mar 8, 1993, 8:58:20 PM

In article <1ngod...@biolante.UU.NET> str...@biolante.UU.NET (Josh Osborne) writes:
> If I remember correctly the i860 has one register file (not N I regs and N
> FP regs).

You remember wrong.

> It does have a load and fload, the fload does not enter data into
> the cache (useful for walking through vectors), I don't remember if they
> normalise, but I don't think they do.

There are two floads, one pipelined, one not pipelined. One enters data in
the cache, one does not. They do not normalize.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland
home: bovenover 215, 1025 jn amsterdam, nederland; e-mail: d...@cwi.nl

Dale Morris

Mar 9, 1993, 3:15:37 PM

Steve Hobbs (ho...@gemmax.enet.dec.com) wrote:
| When John Mashey posts his updated address mode table it would be interesting
| if knowledgeable people could post restrictions on these address modes.
|

| [...]

|
| However, I suspect that other architectures also limit address modes. For
| example, do any of the architectures that support r+x<r allow this on integer
| load? If it is allowed, are there two write ports to the integer register
| file or does the machine stall if a long sequence of such loads are executed?
| Does any machine allow r+x address mode on integer store and does such a
| machine require 3 read ports on the integer register file?

PA-RISC supports indexed integer loads with base register modification
(r+x>r). Generally this is implemented with 2 write ports, so there are no
stalls for sequences of loads.

------------------------------------------------------------------------------
Dale Morris | Now is the time, and now is the record...
mor...@nsa.hp.com | of the time.
------------------------------------------------------------------------------

Mike Albaugh

Mar 10, 1993, 10:39:17 AM

Jonathan Thornburg (jona...@chpc.utexas.edu) wrote:
: In article <1ndkro...@senator-bedfellow.MIT.EDU> j...@athena.mit.edu

: (John F Carr) writes:
: >The [IBM] RT had some bugs associated with variable length instructions:
: >
: > . on exiting a loop with certain instruction alignment, 16 bit
: > blocks could be fetched out of order under certain (rare) conditions
: >
: > . when an instruction in a branch delay slot spans a page boundary
: > and the second page isn't valid, bad things happen
: >
: >Does anyone know of similar bugs in other machines with variable length
: >instructions?

:
: I understand that the Vax 11/750 had similar problems, which managed

: to break the inner-loop code for printf(3) ! Could someone who knows
: what really happened here please post the true story? (It really belongs
: in the alt.folklore.computers FAQ, but still, it has at least *something*
: to do with architecture -- the risks of very intricate instruction sets.)

Not sure about that particular 750 horror story, but the one that
was widely reported in the media at the time had to do with a certain DRAM
manufacturer shipping chips with shorter-than-spec refresh interval. Under
VMS, the normal "background" access pattern was sufficient to make up the
difference, but Unix had different access characteristics, and the result
was corrupted page tables and failure.

My personal favorite was the 780 bug with RTE (or whatever DEC
calls it) back to emulation mode where the stack frame crossed a page
boundary where the lower page was valid but the upper wasn't. This would
"suck dust" into the PSW and proceed :-) My boss actually found it in a
home-brew RT-11 emulator, but after reporting it (and getting a polite
"go away boy, you bother me" from DEC) we also found that it could hit
stand-alone BACKUP. Our "solution" was to have the trap handler "touch"
the first and last words of the frame just before returning, but we don't
know whether DEC ever made similar changes to BACKUP.

And then there was the Amdahl manual which said, paraphrased,
"We meet the IBM/370 spec as detailed in the documentation except that
when a transfer in channel instruction transfers to an invalid address,
the error report will contain the invalid address, rather than the
address of the erroneous transfer. That's what a real 370 does". In other
words, bug-for-bug compatible :-)

[Sun bug deleted]
: The thing that made the tracking down of this bug truly awe-inspiring

: is that the person who found it did so based on a 6 megabyte lisp
: program whose only symptom of the failure was sporadic core dumps!

A friend of mine found an anomaly in the CDC7600 cache based on
core dumps from a Fortran program that only failed when it was the only
process on the machine, i.e. in the dead of night on holidays :-)

But the real question is, why restrict to boundary cases? There
are plenty of bugs in machines that are simply "documented away" as
implementation-defined behavior, like the inability of an LSI/11 to do
a multiply with one operand in ROM :-) Not to pick on DEC, but I've
been using their machines for a while. Anyway, the "boundary case" bugs
are simply what is left over after eliminating:

1) Bugs that are caught during design review.
2) Bugs that are noticed early in testing, and get fixed in Rev 2
3) Bugs that occur before very many systems have been shipped, and get
"documented away"

Mike

| Mike Albaugh (alb...@agames.com || netcom.com!agames.com!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035 voice: (408)434-1709
| The opinions expressed are my own (Boy, are they ever)

Rich Stewart

Mar 10, 1993, 6:01:30 PM

In article <1ngod...@biolante.UU.NET> str...@biolante.UU.NET (Josh Osborne) writes:

You do NOT remember correctly.

fld -> f2-f31; ld -> r1-r31. There are instructions to move between
float and integer registers. They cost a few clocks.
Floating point loads just load the data, nothing else. They are only 32-, 64-,
and 128-bit loads (128 takes 2 bus accesses).

Integer loads are 8,16, and 32 bit.

-Rich
--
rste...@megatek.com

John R. Levine

Mar 10, 1993, 10:59:59 PM

>I understand that the Vax 11/750 had similar problems, which managed
>to break the inner-loop code for printf(3) !

No kidding. A group of us at Yale brought up 4.1BSD (I think it was 4.1,
might have been 4.0) on an early 750 in about 1981. Our host machine was
a PDP-11/45, so the cross-compiled kernel rebuilds were painful.

We found two separate microcode bugs in the MOVTUC instruction which was
in the inner loop of printf. MOVTUC copied a string from one place to
another, looking up each character in a translation table, stopping when
it hit a given translated value. It was the fastest way to scan a string
for a null or a percent sign, or at least would have been if it worked.
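
A simplified Python model of what the instruction is described as doing, and how printf could use it to find the next NUL or percent sign. It ignores the register side effects discussed below and the real operand encoding; the function name, the return convention, and the 64-byte destination limit are assumptions for illustration:

```python
def movtuc(src, table, escape, dst_len):
    """Copy src through a 256-entry translation table.

    Stops when a translated byte equals `escape`, or when either the source
    or the destination (dst_len bytes) is exhausted.
    Returns (translated bytes copied, number of source bytes consumed).
    """
    dst = bytearray()
    for consumed, byte in enumerate(src):
        if len(dst) == dst_len:
            return bytes(dst), consumed     # destination full
        translated = table[byte]
        if translated == escape:
            return bytes(dst), consumed     # hit the stop character
        dst.append(translated)
    return bytes(dst), len(src)             # source exhausted


# Identity table, except that '%' translates to the escape value 0.
# NUL (byte 0) already maps to 0, so it stops the scan as well.
table = bytearray(range(256))
table[ord("%")] = 0

copied, consumed = movtuc(b"x = %d\n", table, 0, 64)
print(copied, consumed)
```

Here the literal text before the first format directive is copied in one operation, and the consumed count tells the caller where the directive starts.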

One bug was that it was supposed to update some of the registers to note
how much of the source and destination strings had been scanned, but the
750 didn't update the destination registers like it was supposed to. We
hacked around that, and found that if the source and destination strings
appeared to overlap, based on the length counts, it stored garbage in
random places in memory. We eventually gave up and waited for Bill Joy,
who had a 780 to host from, to get a working version.

For many years, in the inner loop of Vax printf there was Bill's comment
"comet sucks" where he diked out the MOVTUC. (Comet was the internal
project name for the 750. The 780 was starlet.) For all I know, the
comment is still there. We found that DEC had no interest in fixing the
microcode bugs because no VMS software depended on MOVTUC working
correctly.

The 750 had lots of other microcode bugs as well. The best known is that
read-only pages in the stack segment didn't work reliably, which meant
that copy-on-write stacks didn't work. This is the main reason that BSD
was so slow to do copy-on-write fork. In retrospect, copy-on-touch would
have worked, since VMS did page the user stack, and for stack pages c-o-t
performs just as well as c-o-w.

Regards,
John Levine, jo...@iecc.cambridge.ma.us, {spdcc|ima|world}!iecc!johnl

Torben AEgidius Mogensen

Mar 11, 1993, 5:39:03 AM

Jonathan Thornburg (jona...@chpc.utexas.edu) wrote:
> In article <1ndkro...@senator-bedfellow.MIT.EDU> j...@athena.mit.edu
> (John F Carr) writes:
> >The [IBM] RT had some bugs associated with variable length instructions:
> >
> > . on exiting a loop with certain instruction alignment, 16 bit
> > blocks could be fetched out of order under certain (rare) conditions
> >
> > . when an instruction in a branch delay slot spans a page boundary
> > and the second page isn't valid, bad things happen
> >
> >Does anyone know of similar bugs in other machines with variable length
> >instructions?

The 6502 has an indirect jump: the instruction specifies a 16 bit
immediate address, where the actual destination is fetched. If the 16
bit word at that address spans a page (256 bytes) boundary, the destination
is composed of the last byte and the first byte of the same page
rather than the last byte of that page and the first byte
of the next. This behaviour is well documented, so some might not call
it a bug. It is also easy to avoid, as it can be detected at
compilation/assembly time.
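
A small Python model of this page-wrap behaviour as it is commonly described for the original 6502: the high byte of the destination is fetched without carrying into the next page. The function name, the `page_wrap_quirk` flag, and the dict used as memory are assumptions for illustration:

```python
def jmp_indirect_target(mem, pointer, page_wrap_quirk=True):
    """Destination of an indirect jump through the 16-bit word at `pointer`."""
    lo = mem[pointer]
    if page_wrap_quirk and (pointer & 0xFF) == 0xFF:
        hi_addr = pointer & 0xFF00            # wraps to the start of the same page
    else:
        hi_addr = (pointer + 1) & 0xFFFF      # the "expected" next byte
    hi = mem[hi_addr]
    return (hi << 8) | lo


# Pointer stored at $10FF/$1100, with an unrelated byte sitting at $1000.
mem = {0x10FF: 0x34, 0x1100: 0x12, 0x1000: 0x56}
with_quirk = jmp_indirect_target(mem, 0x10FF)
without_quirk = jmp_indirect_target(mem, 0x10FF, page_wrap_quirk=False)
print(hex(with_quirk), hex(without_quirk))
```

An assembler can flag the case simply by checking whether the low byte of the pointer address is $FF, which is the compile-time detection mentioned above.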

The ARM processors have several documented cases of "unexpected
behavior". Most of these only apply to non-user modes, so mainly OS
writers are affected. The list below is (much abbreviated) from an
article Alasdair Thomas of ARM Ltd. posted to comp.sys.acorn. Note
that these bugs have nothing to do with variable length instructions.

On the ARM2, the "banked" registers can not be accessed in the first
cycle after a mode change.

On both ARM2 and ARM3, write-back of resulting address when storing
user-mode registers in non-user mode (by STM) can cause write-back to
the user-mode equivalent register instead of the specified non-user
mode register. Also, when loading (with LDM) into user-mode registers
from non-user mode, the banked registers can not be accessed in the
next instruction.

There are also some special error conditions (such as illegal
addresses or undefined instructions) that in special cases might not
be caught or give wrong trap numbers.

I don't know the status of these "bugs" in the ARM6 core. I expect
that they have been fixed.

Torben Mogensen (tor...@diku.dk)

Jan Vorbrueggen

Mar 11, 1993, 3:17:30 PM

In article <1993Mar10.1...@dms.agames.com>
alb...@dms.agames.com (Mike Albaugh) writes:

My personal favorite was the 780 bug with RTE (or whatever DEC
calls it) back to emulation mode where the stack frame crossed a page
boundary where the lower page was valid but the upper wasn't. This would
"suck dust" into the PSW and proceed :-) My boss actually found it in a
home-brew RT-11 emulator, but after reporting it (and getting a polite
"go away boy, you bother me" from DEC) we also found that it could hit
stand-alone BACKUP. Our "solution" was to have the trap handler "touch"
the first and last words of the frame just before returning, but we don't
know whether DEC ever made similar changes to BACKUP.

As far as I know, it was a bug in REI (return from exception or
interrupt) when checking for pending ASTs during the AST exit code.
Standalone Backup was hit first because it was the first (and only?)
programme to consist almost entirely of AST-driven code going as
quickly as the machine's peripherals allow.

I believe that, to this day, the piece of code doing AST exit contains
a sanity check to reset a rogue FP register (caused by the bug) to a
"sensible" value. Or is the 780 no longer supported under current versions
of VMS :-)?

Jan

Chris Torek

Mar 12, 1993, 7:06:44 AM

In article <930310225...@iecc.cambridge.ma.us>
jo...@iecc.cambridge.ma.us (John R. Levine) writes:
>[We] found that if the source and destination strings [for the VAX MOVTUC
>(move translated until character) instruction] appeared to overlap, based
>on the length counts, it stored garbage in random places in memory. ...

>For many years, in the inner loop of Vax printf there was Bill's comment

>"comet sucks" where he diked out the MOVTUC. ...

To be fair to DEC, this is documented in the VAX architecture
handbook. It specifically says that `if the destination string
overlaps the translation table, the destination string is
unpredictable.' (I believe this was the real problem.)

(This is not to say that CISC machines, including the VAX, have been
bug-free.)
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
Berkeley, CA Domain: to...@ee.lbl.gov

John Redford

Mar 12, 1993, 3:50:06 AM

In article <930310225...@iecc.cambridge.ma.us> jo...@iecc.cambridge.ma.us (John R. Levine) writes:

For many years, in the inner loop of Vax printf there was Bill's comment
"comet sucks" where he diked out the MOVTUC. (Comet was the internal
project name for the 750. The 780 was starlet.) For all I know, the
comment is still there. We found that DEC had no interest in fixing the
microcode bugs because no VMS software depended on MOVTUC working
correctly.

I never worked on the 750, but I did work on other VAX designs, and
this was absolutely not our attitude. If there was a bug, it got
fixed. It didn't matter whether VMS used it or not. We knew that
there were dusty executables out there, and that people depended on every
nit working correctly. In fact, VAX designs were (and are) verified
by doing literally millions of cases of random instruction sequences
and making sure that the new design's behavior matched the old. The
fact that you can run 15-year-old .exe files on the latest (and now
11th) implementation of the VAX architecture speaks well for how
carefully they've been debugged.

/jlr (John Redford, jred...@bbn.com)

Alan Christiansen

Mar 16, 1993, 1:03:21 AM

alb...@dms.agames.com (Mike Albaugh) writes:

>Jonathan Thornburg (jona...@chpc.utexas.edu) wrote:
>: In article <1ndkro...@senator-bedfellow.MIT.EDU> j...@athena.mit.edu
>: (John F Carr) writes:
>: >The [IBM] RT had some bugs associated with variable length instructions:
>: >
>: > . on exiting a loop with certain instruction alignment, 16 bit
>: > blocks could be fetched out of order under certain (rare) conditions
>: >
>: > . when an instruction in a branch delay slot spans a page boundary
>: > and the second page isn't valid, bad things happen
>: >
>: >Does anyone know of similar bugs in other machines with variable length
>: >instructions?
>:

[.........]

> But the real question is: why restrict to boundary cases? There
>are plenty of bugs in machines that are simply "documented away" as
>implementation-defined behavior, like the inability of an LSI/11 to do
>a multiply with one operand in ROM :-) Not to pick on DEC, but I've
>been using their machines for a while. Anyway, the "boundary case" bugs
>are simply what is left over after eliminating:

>1) Bugs that are caught during design review.
>2) Bugs that are noticed early in testing, and get fixed in Rev 2
>3) Bugs that occur before very many systems have been shipped, and get
> "documented away"

The DSP96002 has a whole bunch of documented exceptional behavior patterns.
It might be easy to describe these as bugs "documented away". I, however,
believe that all the documented behaviors I have read about are natural
consequences of the architecture that were not fixed because the solutions
would have had an impact on performance. They may have been fixed to the
extent that the naturally implied internal bus collisions don't happen, but
I have not read this anywhere; it is simply stated that these particular
instruction sequences are illegal.

The point is: be careful about laying "documented away" bugs at the door
of the processor designer.

The "documented away" bugs I have seen in the DSP96002 were, I am sure, known
before silicon was made. Thus they are not really documented-away bugs.

Alan

PS This in no way implies that I believe processor manufacturers
do not document away bugs that they find late in the design.

A Myles

Mar 16, 1993, 5:28:00 AM

al...@saturn.cs.swin.OZ.AU (Alan Christiansen) writes:

>alb...@dms.agames.com (Mike Albaugh) writes:

>PS This in no way implies that I believe processor manufacturers
>do not document away bugs that they find late in the design.

How were the "extra" Z80 opcodes found - did Zilog announce them, or
did someone with knowledge of the design work them out?

i.e. the sls, and IXH, IXL, IYH, IYL instruction and operand types.

Do Z80s still have these? They used to be popular for a while in commercial
(read: games) code to fool hackers' disassemblers, despite the possibility
that Zilog may rectify them at a moment's notice.

Andy.
--
Andrew Myles aj...@ee.ed.ac.uk| Newer, brighter
Integrated Systems Group PG office. 031 650 5665 | whiter, sadder
----- | .signature...
Yes, I am into flagellation. My favourite is the American one with 51 stars...

Michael Hermann

Mar 16, 1993, 7:14:47 AM

aj...@festival.ed.ac.uk (A Myles) writes:

>al...@saturn.cs.swin.OZ.AU (Alan Christiansen) writes:

>How were the "extra" Z80 opcodes found - did Zilog announce them, or
>did someone with knowledge of the design work them out?

>i.e. the sls, and IXH, IXL, IYH, IYL instruction and operand types.

I think by trial and error. They are fairly regular and easy to find once
you actually get the idea that there MAY be undocumented opcodes.

>Do Z80s still have these? They used to be popular for a while in commercial

To my knowledge, yes.
In the Z280 they were regular instructions (most of them).
In the Z180 you bite the dust with them. Some of my programs actually
used a few of them, and I had to recode for the Z180.

>Andy.

If you want to see the really nice bugs, get a bug report for the 32000 series.
Sometimes the buglist actually *decreased* after a new revision, but not
as a general rule. Sometimes you had the impression of bug swapping.

Michael

--
Dipl.-Ing. Michael Hermann m...@regent.e-technik.tu-muenchen.de
Lehrstuhl fuer Rechnergestuetztes Entwerfen, Postfach 202420
Technische Universitaet Muenchen 089/55174331

David Hembrow

Mar 17, 1993, 7:03:57 AM

A Myles (aj...@festival.ed.ac.uk) wrote:

: al...@saturn.cs.swin.OZ.AU (Alan Christiansen) writes:
:
: >alb...@dms.agames.com (Mike Albaugh) writes:
:
: >PS This in no way implies that I believe processor manufacturers
: >do not document away bugs that they find late in the design.
:
: How were the "extra" Z80 opcodes found - did Zilog announce them, or
: did someone with knowledge of the design work them out?
:
: i.e. the sls, and IXH, IXL, IYH, IYL instruction and operand types.

There were quite obvious holes where the "sls" fitted. The IXL etc.
instructions were the result of obvious experiments of adding the IX/IY
prefix byte to an existing instruction which dealt with H or L.
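
The prefix experiment can be illustrated with a small decoder sketch in
Python. It covers only the 8-bit register-to-register load group (opcodes
0x40-0x7F) and is a sketch under stated assumptions: the function name
`decode_ld` and the mnemonic spellings are chosen here for illustration, and
displacement bytes and all other opcode groups are ignored.

```python
REGS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]  # 3-bit register field order


def decode_ld(opcode, prefix=None):
    """Decode the 8-bit 'LD r,r2' group (0x40-0x7F), optionally prefixed.

    prefix is None, 0xDD (IX) or 0xFD (IY). Hypothetical helper for
    illustration; it returns a mnemonic string and nothing else.
    """
    if not 0x40 <= opcode <= 0x7F or opcode == 0x76:  # 0x76 is HALT
        raise ValueError("not an LD r,r2 opcode")
    dst = REGS[(opcode >> 3) & 7]
    src = REGS[opcode & 7]
    if prefix is not None:
        index = {0xDD: "IX", 0xFD: "IY"}[prefix]
        if "(HL)" in (dst, src):
            # Documented form: only the memory operand changes.
            mem = "(%s+d)" % index
            dst = mem if dst == "(HL)" else dst
            src = mem if src == "(HL)" else src
        else:
            # Undocumented form: H and L become the halves of IX or IY.
            halves = {"H": index + "H", "L": index + "L"}
            dst = halves.get(dst, dst)
            src = halves.get(src, src)
    return "LD %s,%s" % (dst, src)


print(decode_ld(0x7C))        # LD A,H
print(decode_ld(0x7C, 0xDD))  # LD A,IXH
print(decode_ld(0x66, 0xDD))  # LD H,(IX+d)
```

The point of the sketch is that the prefix simply redirects references to H,
L or (HL), so the "new" instructions fall out of the existing opcode map
without any extra encoding.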

--
David Hembrow EO Europe Ltd.,
email: dhem...@eoe.co.uk Abberley House, Granhams Road,
Great Shelford, Cambridge CB2 5LQ, England