Some facts about the Acorn RISC Machine

RWi...@acorn.co.uk

Nov 2, 1988, 1:03:47 PM

There have now been enough partially correct postings about the Acorn RISC
Machine (ARM) to justify semi-official comment.

History:

ARM is a key member of a 4 chip set designed by Acorn, beginning in 1984, to
make a low cost, high performance personal computer. Our slogan was/is "MIPs
for the masses". The casting vote in each design decision was to make the
final computer economic.

The chips are (1) ARM: a 32 bit RISC Microprocessor; (2) MEMC: a MMU and
DRAM/ROM controller; (3) VIDC: a video CRTC with on chip DACs and sound; and
(4) IOC: a chip containing I/O bus and interrupt control logic, real time
clocks, serial keyboard link, etc.

The first ARM (that referred to by David Chase @ Menlo Park) was designed at
Acorn and built using VLSI Technology Inc's (VTI) 3 micron double level metal
CMOS process using full custom techniques; samples, working first time, were
obtained on 26th April 1985. The target clock was 4MHz, but it ran at 8. The
timings that David gives are for the ARM Evaluation System, where ARM was run
at 3.3MHz and 6.6MHz (20/3) for initial and page-mode DRAM cycles,
respectively. The ARM comprises 24,000 transistors (circa 8,000 gates). Every
instruction is conditional, but there are neither delayed loads/stores nor
delayed branches (sorry, Martin Hanley). Call is via Branch and Link (same
timing as Branch). All instructions are abortable, to support virtual memory.

The first VIDC was obtained on 22nd Oct 1985, the first MEMC on 25th Feb 1986,
and the first IOC 30th Apr 1986. All were "right first time".

We then redesigned ARM to make it go faster (since, by this time, Acorn had
decided roughly what market to aim the completed machines at and 8MHz minimum
capability was required - but we did continue to develop software on the 3
micron part!). Some more FIQ registers were added, bringing the total to 27
(some of our "must go as fast as possible for real time reasons" code didn't
manage with the smaller set). A multiply instruction (2 bits per cycle,
terminate when multiplier exhausted so that 8xn multiply takes 4 cycles max)
and a set of coprocessor interfaces were added. Scaled indexed by register
shifted by register (i.e. effective address was ra+rb<<rc) was removed from
the instruction set (too hard to compile for) [scaled indexed by register
shifted by constant was NOT removed!].

The new, 2 micron ARM was right first time on 19th Feb 1987. Its peak
performance was 18MHz; its die size 230x230 mil^2; 25,000 transistors.

VTI were given a license to sell the chips to anyone. They renamed the chips:
VL86C010 (ARM), VL86C110 (MEMC), VL86C310 (VIDC), VL86C410 (IOC).

Acorn released volume machines "Acorn Archimedes" in June 1987. Briefly:
    A305:  1/2 MByte, 1MByte floppy, graphics to 640x514x16 colours
    A310:  ditto, 1MByte
    A310M: ditto with PC software emulator (circa a PC XT, if you're interested)
    A440:  4MByte, 20MByte hard disc, 1152x896 graphics also.
All machines have ARM at 4/8MHz (circa 5000 dhrystones 1.1), 8 channel sound
synthesiser, proprietary OS, 6502 software emulator, software.... Prices
between 800 and 3000 pounds UK with monitor and mouse and all other useful
bits. Not available in the US, but try Olivetti Canada.

VTI make ARM available as an ASIC cell. Sanyo have taken a second source
license (in April 1988) for the chip set, and make a 32 bit microcomputer
(single chip controller). In "VLSI Systems Design" July 1988, the following
statements are made by VTI: ARM in 1.5 micron (18-20MHz clock), 180x180 mil^2;
future shrink to 1 micron (they are expecting "perhaps 40MHz" and 150 mil
square with the price dropping from $50 to $15); expected sales in 1988
90-100,000 units.

Contact Ron Cates, VTI Application Specific Logic Products Division,
Tempe, Arizona for details (e.g. the "VL86C010 RISC Family Data Manual").

Plug in boards for PCs are available. A controller for Laser printers
with ARM, MEMC, VIDC and 4MBytes DRAM has been sold to Olivetti [Acorn's
parent company as of 1985-6] (contact SWoo...@acorn.co.uk if you want to
know more).

In the Near Future:

We have a Floating Point Coprocessor interface chip working "in the lab" - the
fifth member of the four chip set. It interfaces an ATT WE32206 to ARM's
coprocessor bus. It benchmarks at 95.5 KFlops LINPACK DP FORTRAN Rolled BLAS
(slowest) (11KFlops with a floating point emulator) on an A310. Definitely
have to make our own, some time...

Acorn is about to release UNIX 4.3BSD including TCP/IP, NFS, X Windows and
IXI's X.desktop on the A440. Contact MJe...@acorn.co.uk or
DSl...@acorn.co.uk for more info (and to be told that it isn't available in
the US {yet}).

Operating Systems:

Acorn's proprietary OS "Arthur" is written in machine code: it fills 1/2MByte
of ROM! (yes, writing in RISC machine code is truly wonderful as others have
noted on comp.arch). Its main features are windows, anti-aliased fonts
(wonderful at 90 pixels per inch - I use 8 point all the time) and sound
synthesis. It runs on all Archimedes machines. A 2nd release is due real soon
now and features multitasking, a better desktop and a name change to RISC OS.

VTI are porting VRTX to the ARM; Cambridge (UK) Computer Lab's Tripos has been
ported to A310/A440. UNIX has been ported by Acorn: see above. There are MINIX
ports everywhere one looks (try querying the net...).

Software:

C Compiler: ANSI/pcc; register allocation by graph colouring; code motion;
dead code elimination; tail call elimination; very good local code generation;
CSE and cross-jumping work and will be in the next release. No peepholing (yet
- not much advantage, I'm afraid). Can't turn off most optimisation features.
Also FORTRAN 77, ISO PASCAL, interpreted BASIC (structured BBC BASIC, very
fast), Forth, Algol, APL, Smalltalk 80 (as seen at OOPSLA 88: on an A440 it
approximates a Dorado) and others (LISP, Prolog, ML, Ponder, BCPL....).

Specific applications for Archimedes computers are too numerous to mention!
(though the high speed Mandelbrot calculation has to be seen to be believed:
one iteration of the set in 28 clock ticks [32 bit fixed point], and a real
time scroll across the set [calculate row/column in a frame time and move the
picture]).
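
For readers who have not met a fixed point Mandelbrot loop, here is a rough sketch in Python of what "one iteration" computes. The Q4.28 format, the function names and the escape test are all assumptions made for illustration; the post only says "32 bit fixed point". Python integers do not wrap at 32 bits, so overflow behaviour is not modelled.

```python
FRAC_BITS = 28  # hypothetical Q4.28 format: 4 integer bits, 28 fraction bits

def to_fixed(x):
    """Convert a float to the assumed fixed point representation."""
    return int(round(x * (1 << FRAC_BITS)))

def fixed_mul(a, b):
    """Multiply two fixed point values and rescale the result."""
    return (a * b) >> FRAC_BITS

def iterate(zr, zi, cr, ci):
    """One step of z := z*z + c on fixed point real/imaginary parts."""
    new_zr = fixed_mul(zr, zr) - fixed_mul(zi, zi) + cr
    new_zi = 2 * fixed_mul(zr, zi) + ci
    return new_zr, new_zi

def escape_count(cr, ci, max_iter=64):
    """Iterations completed before |z|^2 exceeds 4, or max_iter if it never does."""
    zr = zi = 0
    limit = to_fixed(4.0)
    for n in range(max_iter):
        if fixed_mul(zr, zr) + fixed_mul(zi, zi) > limit:
            return n
        zr, zi = iterate(zr, zi, cr, ci)
    return max_iter
```

Each call to `iterate` needs three multiplies and a handful of adds, which is the work the 28 clock tick figure refers to.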

There is a part of the net that talks about Archimedes machines:
(eunet.micro.acorn).

Random Info:

Code density is approximately that of 80x86/68020. Occasionally 30% worse
(usually on very small programs).

The average number of ticks per instruction is 1.895 (claims VTI - we've never
bothered to measure it).

DRAM page mode is controlled by the MEMC, but there is a prediction signal
from the ARM saying "I will use a sequential address in the next cycle" which
helps the timing a great deal! S=125nS, N=250nS with current MEMC and DRAM
(see David Chase's article for instruction timing). Static RAM ARM systems
have been implemented up to 18MHz - S=N=1/18 with these systems.
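
The S and N figures follow directly from the 8MHz and 4MHz clocks. A small sketch in Python, where the function names and the simple burst model (one N cycle followed by S cycles, ignoring page boundaries) are assumptions for illustration:

```python
def cycle_ns(clock_mhz):
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz

S_NS = cycle_ns(8)  # sequential (page mode) cycle: 125 ns
N_NS = cycle_ns(4)  # non-sequential cycle: 250 ns

def burst_ns(n_accesses, s_ns=S_NS, n_ns=N_NS):
    """Time for a run of accesses: the first is non-sequential, the rest sequential."""
    if n_accesses <= 0:
        return 0.0
    return n_ns + (n_accesses - 1) * s_ns
```

Under this model four consecutive words cost 250 + 3 * 125 = 625 ns, against 1000 ns if every access were non-sequential.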

Approximately 1000 dhrystones 1.1 per MHz if N=S; about 1000/1.895 dhrystones
per MHz if N=2S (i.e. 5K dhrystones for a 4/8MHz system; 18K dhrystones for
an 18/18MHz system).

Most recent features: Electronic Design Jul 28 1988, VLSI Systems Design July
1988.

We had a competition to see who would use "ra := rb op rc shifted by rd" with
all of ra, rb, rc and rd actually different registers, but the graphics people
won it too easily!

ARM's byte sex is as VAX and NS32000 (little endian). The byte sex of a 32 bit
word can be changed in 4 clock ticks by:
EOR R1,R0,R0,ROR #16
BIC R1,R1,#&FF0000
MOV R0,R0,ROR #8
EOR R0,R0,R1,LSR #8
which reverses R0's bytes. Shifting and operating in one instruction is fun.
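
To see why the four instructions work, it helps to step through them on a concrete value. The following is a sketch in Python that mimics each instruction on 32 bit values; the helper names are mine, and the comments give the instruction each line stands for.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32 bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def reverse_bytes(r0):
    """Model of the four-instruction byte reversal, one line per instruction."""
    r1 = r0 ^ ror32(r0, 16)        # EOR R1,R0,R0,ROR #16
    r1 &= ~0x00FF0000 & MASK32     # BIC R1,R1,#&FF0000
    r0 = ror32(r0, 8)              # MOV R0,R0,ROR #8
    r0 ^= r1 >> 8                  # EOR R0,R0,R1,LSR #8
    return r0
```

Writing the bytes of R0 as A B C D, R1 ends up as (A^C) 0 (A^C) (B^D); after the rotate R0 is D A B C, and the final EOR with 0 (A^C) 0 (A^C) turns that into D C B A.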

Shifted 8bit constants (see David Chase's article) catch virtually everything.
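
As a rough illustration of which constants are reachable, here is a Python sketch that assumes the usual ARM immediate form: an 8 bit value rotated right by an even number of bit positions. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rol32(value, amount):
    """Rotate a 32 bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (imm8, rotate_right) such that imm8 ROR rotate_right == value, or None."""
    value &= MASK32
    for rotate_right in range(0, 32, 2):
        # Undo a right rotation by rotating left; a valid encoding leaves 8 bits.
        imm8 = rol32(value, rotate_right)
        if imm8 < 256:
            return imm8, rotate_right
    return None
```

For example, `encode_immediate(0xFF0000)` succeeds (it is the mask used in the byte reversal above), while `encode_immediate(0x101)` returns `None` because its set bits span nine positions.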

Major use of block register load/save (via bitmask) is procedure entry/exit.
And graphics - you just can't keep those boys down. The C and BCPL compilers
turn some multiple ordinary loads into single block loads.
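
A sketch of the bitmask idea in Python, for anyone who has not seen it: one bit per register selects what is transferred, and the selected registers go to consecutive words. The ascending-register, ascending-address order and all names here are assumptions made for illustration.

```python
def registers_in_mask(mask):
    """List the register numbers (0..15) whose bit is set in the mask."""
    return [r for r in range(16) if (mask >> r) & 1]

def store_multiple(regs, mask, memory, base):
    """Store the selected registers to consecutive words; return the next address."""
    address = base
    for r in registers_in_mask(mask):
        memory[address] = regs[r]
        address += 4
    return address

def load_multiple(regs, mask, memory, base):
    """Reload the selected registers from consecutive words; return the next address."""
    address = base
    for r in registers_in_mask(mask):
        regs[r] = memory[address]
        address += 4
    return address
```

A procedure entry in this model is a single `store_multiple` of the registers the callee will clobber plus the return address, and the exit is the matching `load_multiple`.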

MEMC's Content Addressable Memory inverted page table contains 128 entries.
This gives rather large pages (32KBytes with 4MBytes of RAM) and one can't
have the same page at two virtual addresses. Our UNIX hackers revolted, but
are now learning to love it (there's a nice bit in the standard kernel which
goes "allocate 31 pages to start a new process"....)
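
The page size arithmetic and the "one virtual address per page" restriction both fall out of having exactly one table entry per physical page. A minimal sketch in Python; the class and its linear search are illustrative assumptions (a CAM compares all entries at once), not MEMC's actual interface.

```python
CAM_ENTRIES = 128  # from the post: one entry per physical page

def page_size_bytes(ram_bytes):
    """Page size when all of RAM is divided across the 128 entries."""
    return ram_bytes // CAM_ENTRIES

class InvertedPageTable:
    """Hypothetical model: entry i holds the virtual page mapped to physical page i."""

    def __init__(self):
        self.virtual_page_of = [None] * CAM_ENTRIES

    def map(self, physical_page, virtual_page):
        # A physical page has a single slot, so remapping replaces the old address.
        self.virtual_page_of[physical_page] = virtual_page

    def translate(self, virtual_page):
        for physical_page, mapped in enumerate(self.virtual_page_of):
            if mapped == virtual_page:
                return physical_page
        return None  # no match: treated here as a fault
```

`page_size_bytes(4 * 1024 * 1024)` gives 32768, so "allocate 31 pages" asks for 992KBytes out of 4MBytes.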

Data types: byte, word aligned word, and multi-word (usually with a
coprocessor e.g. single, double, double extended floating point).

Neatest trick: compressing all binary images by around a factor of 2. The
decompression is done FASTER than reading the extra data from a 5MBit
winchester!
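
The break-even point for that trick is easy to estimate. A sketch in Python, assuming the raw 5MBit/s figure is the sustained transfer rate (no seek or controller overhead) and that decompression is not overlapped with reading; both function names are mine.

```python
def read_seconds(n_bytes, disc_bits_per_second):
    """Time to read n_bytes from the disc at the given raw bit rate."""
    return n_bytes * 8 / disc_bits_per_second

def break_even_output_rate(disc_bits_per_second, compression_ratio):
    """Decompressor output rate (bytes/s) at which decompressing takes exactly
    as long as reading the data that compression saved."""
    disc_bytes_per_second = disc_bits_per_second / 8
    fraction_saved = 1 - 1 / compression_ratio
    return disc_bytes_per_second / fraction_saved
```

With a factor of 2 and a 5MBit/s disc, the decompressor wins once it produces more than 1.25 million bytes of output per second.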

Enough! (too much?) Specific questions to me, general brickbats to the net.

.....Roger Wilson (RWi...@Acorn.co.uk)

DISCLAIMER: (I speak for me only, etc.)

The above is all a fiction constructed by an outline processor, a thesaurus
and a grammatical checker. It wasn't even my computer, nor was I near it at
the time.

Der Tynan

Nov 4, 1988, 5:37:04 PM

In article <5...@acorn.UUCP>, RWi...@acorn.co.uk writes:

> Operating Systems:

> Acorn's proprietary OS "Arthur" is written in machine code: it fills 1/2MByte
> of ROM! (yes, writing in RISC machine code is truly wonderful as others have
> noted on comp.arch).

> .....Roger Wilson (RWi...@Acorn.co.uk)

Why write THAT MUCH code? I mean, C ain't that bad, and it certainly would've
been a *lot* easier. The speed penalty and added size (although my guess is,
it wouldn't be much) surely couldn't justify 100,000 lines (roughly) of
machine code, or am I missing something...
- Der
--
Reply: dty...@sultra.UUCP (Der Tynan @ Tynan Computers)
{mips,pyramid}!sultra!dtynan
Cast a cold eye on life, on death. Horseman, pass by... [WBY]

John Mashey

Nov 4, 1988, 9:13:06 PM

In article <5...@acorn.UUCP> RWi...@acorn.co.uk writes:
...
>Acorn's proprietary OS "Arthur" is written in machine code: .....

>now and features multitasking, a better desktop and a name change to RISC OS.

You will want to pick another name: that's a bit close:

RISC/os is a Trademark of MIPS Computer Systems.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR ma...@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

Hugh LaMaster

Nov 16, 1988, 3:44:13 PM

In article <5...@acorn.UUCP> RWi...@acorn.co.uk writes:

>Acorn and built using VLSI Technology Inc's (VTI) 3 micron double level metal
>CMOS process using full custom techniques; samples, working first time, were

>respectively. The ARM comprises 24,000 transistors (circa 8,000 gates). Every

I know this must be obvious to many hardware types out there, but, is this
ratio of transistors/gates typical for CMOS?

--
Hugh LaMaster, m/s 233-9, UUCP ames!lamaster
NASA Ames Research Center ARPA lama...@ames.arc.nasa.gov
Moffett Field, CA 94035
Phone: (415)694-6117

F. L. Charles Seeger III

Nov 21, 1988, 8:53:32 PM

Well, I haven't seen a posted answer to this:

In article <18...@ames.arc.nasa.gov> lama...@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
|In article <5...@acorn.UUCP> RWi...@acorn.co.uk writes:
|>respectively. The ARM comprises 24,000 transistors (circa 8,000 gates).
|

|I know this must be obvious to many hardware types out there, but, is this
|ratio of transistors/gates typical for CMOS?

I can't speak much about real world practice, but can give a brief
background, since no one else has answered publicly. Of course, counting
these things is a bit open to interpretation, e.g. what is a typical
gate? how to count a PLA or memory cell? etc. I prefer to count
transistors and IO pads, realizing, of course, that yield and die area
are the most important economic measures, for a given process.

A fully static CMOS gate will typically have two transistors
per input signal. This is why CMOS transistors are sometimes counted
in pairs (one n-type and one p-type). If we assume as our reference
gates with two inputs, then we should expect 4 transistors/gate.
However, CMOS is very rich in the variety of logic circuit techniques
that can be used, most of which reduce the transistor count but may
decrease speed, increase circuit complexity and create debugging
problems (i.e. it tends to be easier to debug fully static designs).
Good simulation tools make these trade-offs more predictable. For
example, these logic structures include pseudo-nmos, dynamic, clocked,
domino, pass transistors and cascade voltage switches. The use of
dynamic logic is signified by a *minimum* clock rate for a part, which
is common for microprocessors. Weste and Eshraghian's book,
_Principles_of_CMOS_VLSI_Design_A_Systems_Perspective_ ('85), is a good
introduction to these logic families.
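
The "two transistors per input" rule can be made concrete with a toy model. This is a sketch in Python of a fully static CMOS NAND, assuming the textbook arrangement of series n-type pull-downs and parallel p-type pull-ups; the function names are mine.

```python
def static_cmos_transistors(n_inputs):
    """Fully static CMOS gate: one n-type and one p-type transistor per input."""
    return 2 * n_inputs

def static_cmos_nand(inputs):
    """Evaluate a NAND as complementary pull-up and pull-down networks."""
    pull_down = all(inputs)                # series n-type: all inputs must be high
    pull_up = any(not x for x in inputs)   # parallel p-type: any input low
    # In a static gate exactly one of the two networks conducts.
    assert pull_up != pull_down
    return 1 if pull_up else 0

def transistors_per_gate(transistors, gates):
    """Average ratio, for comparing a quoted chip against the 2-per-input rule."""
    return transistors / gates
```

The figures quoted for the ARM, 24,000 transistors for about 8,000 gates, come to 3 per gate, somewhat under the 4 that fully static two-input gates would give.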

I hope this wasn't too far off from the answer that you were seeking.
Chuck

--
Charles Seeger 216 Larsen Hall
Electrical Engineering University of Florida
see...@iec.ufl.edu Gainesville, FL 32611

Hugh LaMaster

Nov 22, 1988, 10:26:24 PM

In article <19...@uflorida.cis.ufl.EDU> see...@beach.cis.ufl.edu (F. L. Charles Seeger III) writes:

>I can't speak much about real world practice, but can give a brief
>background, since no one else has answered publicly. Of course, counting
>these things is a bit open to interpretation, e.g. what is a typical
>gate? how to count a PLA or memory cell? etc. I prefer to count

Well, what I am really driving at is "4 input nand gate equivalents".
(Or 2 input nand gate equivalents, if that makes more sense...)
The reason is just that there is no convenient way to judge CPU
complexity by "number of transistors" without knowing a lot about the
technology. I am looking for a ROUGH estimate of complexity independent
of the number of transistors.

I may be wrong, but my guess is that there is some easy and CONSISTENT
way to measure CPU complexity which is ROUGHLY correct (within a factor
of two). Am I the only person out there wondering how much logic can be
packed onto one of those new GaAs micros, and would like to compare that
to a uVAX or MIPS chip? Since this isn't going to be of use to
marketeers, a simple measure which is USUALLY only slightly misleading
would be of use.

F. L. Charles Seeger III

Nov 23, 1988, 12:27:08 AM

In article <19...@uflorida.cis.ufl.EDU> I wrote:
|>I can't speak much about real world practice, but can give a brief
|>background, since no one else has answered publicly. Of course, counting
|>these things is a bit open to interpretation, e.g. what is a typical
|>gate? how to count a PLA or memory cell? etc. I prefer to count

In article <18...@ames.arc.nasa.gov> lama...@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
|Well, what I am really driving at is "4 input nand gate equivalents".
|(Or 2 input nand gate equivalents, if that makes more sense...)

It doesn't necessarily make any more sense, but "equivalent 2-input
nand gates" has been used, at least by some ASIC vendors.

|The reason is just that there is no convenient way to judge CPU
|complexity by "number of transistors" without knowing a lot about the
|technology. I am looking for a ROUGH estimate of complexity independent
|of the number of transistors.
|
|I may be wrong, but my guess is that there is some easy and CONSISTENT
|way to measure CPU complexity which is ROUGHLY correct (within a factor
|of two). Am I the only person out there wondering how much logic can be
|packed onto one of those new GaAs micros, and would like to compare that
|to a uVAX or MIPS chip?

Modulo 2, transistor count may be as good as anything, especially if you
account for the technology. It is certainly easier to make this count
than that of gates, though we should restrict ourselves to counting
logic gates/transistors to measure complexity, i.e. not drive inverters/
transistors. We might want to count memory/register cells separately.
If roughness is OK, the number of transistors in an N-input gate rarely
falls outside the (1.25-2.0)*N range.

The only other way to make this measurement, that springs to my feeble
mind, is to do some sort of analysis of the logic equations and state
machines (automata, for you CS guys) that are implemented. I won't
even think about trying to measure the dynamic complexity of the chip,
though this is probably the most important. However, this sort of
approach is open to the same problems as regular benchmarking. It
certainly wouldn't be "easy and CONSISTENT." Then again, I only
vaguely have an idea as to what complexity is. If I don't know how
to measure it ...

|Since this isn't going to be of use to
|marketeers, a simple measure which is USUALLY only slightly misleading
|would be of use.

If many people start quoting it, the marketeers will, too. Transistor
counts used to be mentioned as new microprocessors were introduced. It's
probably progress to quote MIPS/dhrystones per buck rather than the
gate count (i.e. the dynamic utility vs. static existence). Either way,
there's a lot more to life than can be stuffed into a checksum.

Regards

ag...@mcdurb.urbana.gould.com

Nov 25, 1988, 4:03:00 PM

|The reason is just that there is no convenient way to judge CPU
|complexity by "number of transistors" without knowing a lot about the
|technology. I am looking for a ROUGH estimate of complexity independent
|of the number of transistors.
|
|I may be wrong, but my guess is that there is some easy and CONSISTENT
|way to measure CPU complexity which is ROUGHLY correct (within a factor
|of two). Am I the only person out there wondering how much logic can be
|packed onto one of those new GaAs micros, and would like to compare that
|to a uVAX or MIPS chip?

One of the trade mags (I think it's VLSI Design) is flogging a set
of "benchmark circuits". The idea is that you take, say, MSI components,
and then see how many devices and how much area it takes to implement
that component in a given technology. The purpose is mainly to compare
gate arrays and not-quite-full-custom logic families -- in gate arrays
especially, the way in which the manufacturer lays the array out
can make a great difference in how costly it is to implement a
given function. It may be possible to customize these benchmarks
in order to compare full-custom technologies.

Andy "Krazy" Glew ag...@urbana.mcd.mot.com uunet!uiucdcs!mcdurb!aglew
Motorola Microcomputer Division, Champaign-Urbana Design Center
1101 E. University, Urbana, Illinois 61801, USA.

My opinions are my own, and are not the opinions of my employer, or
any other organisation. I indicate my company only so that the reader
may account for any possible bias I may have towards our products.

ag...@mcdurb.urbana.gould.com

Nov 26, 1988, 12:43:00 AM

>If many people start quoting it, the marketeers will, too. Transistor
>counts used to be mentioned as new microprocessors were introduced. It's
>probably progress to quote MIPS/dhrystones per buck rather than the
>gate count (i.e. the dynamic utility vs. static existence). Either way,
>there's a lot more to life than can be stuffed into a checksum.

>--
> Charles Seeger

Now there's a thought. Maybe we should Gödelize a complete
description of each system, and let the marketeers play with that.