ROMP

Thomas Koenig

Jun 3, 2022, 4:38:25 PM
Looking around for a bit of computer history, I stumbled across
the ROMP, the 801 version that IBM commercialized in 1986 as
the RT PC.

http://bitsavers.org/pdf/ibm/pc/rt/aix/SC23-0802-0_AIX_2.1_Assembler_Language_Reference_198701.pdf
has the ISA. It has 16 32-bit general purpose registers.
Instructions are either 16 or 32 bits, and generally two-operand
(with one exception: they actually managed to squeeze ra = rb +
rc into a 16-bit instruction). Branches can be either with or
without a delay slot. There are many instructions which either have
a four-bit immediate (in two-byte instructions) or a 16-bit immediate
(in four-byte instructions). There is also an MQ register (shadows
of the IBM 704) and instructions for performing multiplications
two bits at a time and a "divide step" instruction for doing one
bit of division (with some postprocessing).
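
To make the idea of step instructions concrete, here is a minimal Python
sketch of a multiply that consumes two multiplier bits per step and of a
one-bit "divide step" that keeps the dividend/quotient in an MQ-style
register. This is a generic unsigned shift-and-add / restoring-division
model, not the ROMP's actual instruction semantics (sign handling and the
postprocessing mentioned above are left out); all function names are
made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def multiply_2bits_per_step(a, b):
    """Unsigned 32x32 -> 64-bit multiply, two multiplier bits per step."""
    product = 0
    for step in range(16):                 # 16 steps x 2 bits = 32 bits
        digit = (b >> (2 * step)) & 0b11   # next two multiplier bits (0..3)
        partial = 0
        if digit & 1:
            partial += a                   # add 1 x multiplicand
        if digit & 2:
            partial += a << 1              # add 2 x multiplicand
        product += partial << (2 * step)
    return product


def divide_step(remainder, mq, divisor):
    """Produce one quotient bit of an unsigned restoring division."""
    # Shift the (remainder, MQ) pair left by one bit; the top bit of MQ
    # moves into the remainder.
    remainder = (remainder << 1) | (mq >> 31)
    mq = (mq << 1) & MASK32
    if remainder >= divisor:               # trial subtraction succeeds
        remainder -= divisor
        mq |= 1                            # record a quotient bit of 1
    return remainder, mq


def divide(dividend, divisor):
    """32 divide steps give (quotient, remainder); MQ ends up as quotient."""
    remainder, mq = 0, dividend & MASK32
    for _ in range(32):
        remainder, mq = divide_step(remainder, mq, divisor)
    return mq, remainder
```

A call such as divide(100, 7) runs 32 steps and leaves the quotient in
the MQ value and the remainder in the other half of the pair.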

By and large, it was a commercial failure. Apparently,
IBM had the chip ready by 1981, and it appears to have had a
similar performance at 5.88 MHz to the 68000 at 12.5 MHz, at
least if one is to believe the Dhrystone benchmark figures at
https://www.tech-insider.org/unix/research/1986/0219.html , but
it could not compete in speed against Sun and other workstations
running at 16.67 MHz, let alone against the 68030 and later RISC
processors. Later, of course, IBM started developing the RS/6000.

So, an interesting historical architecture.

John Levine

Jun 3, 2022, 5:04:53 PM
According to Thomas Koenig <tko...@netcologne.de>:
>Looking around for a bit of computer history, I stumbled across
>the ROMP, the 801 version that IBM commercialized in 1986 as
>the RT PC. ...

>By and large, it was a commercial failure. Apparently,
>IBM had the chip ready by 1981, ...

I wrote a fair amount of the system software for RT AIX. If IBM
had shipped the RT PC when they planned to, it would have been
competitive, but there were internal fights about positioning
and whether it would cannibalize other product lines. AIX had
poor performance because it sat on top of the VRM, an extended
virtual machine monitor that was supposed to allow other operating
systems to coexist but never did. There was a native BSD port
that performed a lot better.

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Andy Valencia

Jun 3, 2022, 7:50:26 PM
Thomas Koenig <tko...@netcologne.de> writes:
> Looking around for a bit of computer history, I stumbled across
> the ROMP, the 801 version that IBM commercialized in 1986 as
> the RT PC.

I had one of these. They pushed this out with AIX, and absolutely
nobody in academia wanted one. So IBM did a BSD port, and that's
what I had running on it.

> By and large, it was a commercial failure. Apparently,
> IBM had the chip ready by 1981, and it appears to have had a
> similar performance at 5.88 MHz to the 68000 at 12.5 MHz,

That sounds about right. A few years later, I dealt with
HP/UX on the 68000. My memory is the HP 9000/310 (68010) was
comparable to an RT PC, and the 9000/320 was well ahead of it.

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

Anne & Lynn Wheeler

Jun 3, 2022, 9:56:38 PM
Thomas Koenig <tko...@netcologne.de> writes:
> Looking around for a bit of computer history, I stumbled across
> the ROMP, the 801 version that IBM commercialized in 1986 as
> the RT PC.

ROMP was originally going to be for a displaywriter followon, running
CP.r implemented in PL.8. When displaywriter followon was canceled, they
decided to retarget to the unix workstation market. They got the company
that had done PC/IX AT&T unix port for IBM/PC ... to do one for ROMP
("AIX"). They also had all these IBM PL.8 programmers ... and so had
the PL.8 programmers do the VRM abstract "virtual machine" ... and
directed the company doing the unix port, to implement to the VRM
abstract "virtual machine" ... claiming that the combined effort to do
VRM plus the unix port to VRM ... would be much less than having the
unix company port directly to the real hardware.
https://en.wikipedia.org/wiki/IBM_RT_PC

There was an IBM group out in Palo Alto working on BSD port to IBM (370)
mainframe ... who got redirected to do BSD port to ROMP (instead)
... which required drastically less resources than the VRM+AIX effort
and was done in significantly less elapsed time ... which shipped as "AOS"
for the PC/RT.

one of the side effects of the VRM+AIX was UNIX tradition of being able
to easily do new device drivers ... then required a new VRM PL.8 device
driver and an AIX device driver in "C".

The Palo Alto group had also been working with UCLA and had done a Locus
port to the IBM Series/1 and were working on a 370 port ... which eventually
ships as AIX/370 (along with AIX/386)
https://dl.acm.org/doi/10.1145/773379.806615
https://www.cs.princeton.edu/courses/archive/fall03/cs518/papers/locus.pdf
https://en.wikipedia.org/wiki/Locus_Computing_Corporation

some topic drift:
https://en.wikipedia.org/wiki/IBM_RT_PC#As_part_of_the_NSFNET_backbone
In 1987, "The NSF starts to implement its T1 backbone between the
supercomputing centers with 24 RT-PCs in parallel implemented by IBM as
"parallel routers". The T1 idea is so successful that proposals for T3
speeds in the backbone begin. Internet History of 1980.

...

Early 80s, I had HSDT project doing T1 and faster computer links (both
terrestrial and satellite) and was working with the NSF director and was
supposed to get $20M to interconnect the NSF supercomputer centers; then
congress cuts the budget, some other things happen and eventually
releases an RFP (in part based on what we already had running, like
requiring T1 links). Preliminary announce (Mar1986)
http://www.garlic.com/~lynn/2002k.html#12

The OASC has initiated three programs: The Supercomputer Centers Program
to provide Supercomputer cycles; the New Technologies Program to foster
new supercomputer software and hardware developments; and the Networking
Program to build a National Supercomputer Access Network - NSFnet.

...

Internal politics prevent us from bidding ... the NSF director tries to
help by writing the company a letter (with support from other agencies)
but that just makes the internal politics worse (as well as comments
that what we already had running was at least 5yrs ahead of winning
bid). Note the winning bid had (PC/RT) 440kbit links and to give it
appearance of meeting the RFP, had T1 trunks with telco multiplexors
running multiple 440kbit links over the T1 trunks. I would ridicule why
didn't they call it a T5 network ... since possibly some of the T1
trunks were multiplexed in turn over T5 trunks someplace.

They did ask me to be the "red team" for the T3 response .... the "blue
team" had couple dozen people from half dozen labs around the world. I
presented 1st then the blue team presented. Five mins into the blue team
presentation, the executive in charge, pounded on the table saying he
would lay down in front of a garbage truck before he let any but the
blue team proposal go forward.

--
virtualization experience starting Jan1968, online at home since Mar1970

nemo

Jun 4, 2022, 10:42:29 AM
On 2022-06-03 21:56, Anne & Lynn Wheeler wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
>> Looking around for a bit of computer history, I stumbled across
>> the ROMP, the 801 version that IBM commercialized in 1986 as
>> the RT PC.
>
> ROMP was originally going to be for a displaywriter followon, running
> CP.r implemented in PL.8. When displaywriter followon was canceled, they
> decided to retarget to the unix workstation market.


Was ROMP not the 801 with the FPU torn out and later bolted back on for
the workstation?

N.

[... Really interesting history clipped ...]

John Levine

Jun 4, 2022, 11:15:06 AM
According to nemo <inv...@invalid.invalid>:
>Was ROMP not the 801 with the FPU torn out and later bolted back on for
>the workstation?

The ROMP was certainly based on the 801 but it wasn't the same. ROMP had 32 bit
addresses and registers rather than 24, it had an MMU, and it was one chip.

Anton Ertl

Jun 4, 2022, 12:54:21 PM
Andy Valencia <van...@vsta.org> writes:
>Thomas Koenig <tko...@netcologne.de> writes:
>> By and large, it was a commercial failure. Apparently,
>> IBM had the chip ready by 1981, and it appears to have had a
>> similar performance at 5.88 MHz to the 68000 at 12.5 MHz,
>
>That sounds about right. A few years later, I dealt with
>HP/UX on the 68000. My memory is the HP 9000/310 (68010) was
>comparable to an RT PC, and the 9000/320 was well ahead of it.

What was wrong with the ROMP for it to be so slow? The early RISCs
that I experienced (HP 9000/825, 88100, and MIPS R2000 and R3000) all
were significantly faster than an 68020/030 at similar clock rates
(and the 68020/030 was quite a bit faster than an 68000/010). E.g.,
the HP 9000/825 with a 25MHz NS-1 CPU was rated at 9MIPS
<https://www.openpa.net/systems/hp_early-systems.html>, while the
HP9000/360 with a 25MHz 68030 was rated at 5MIPS
<https://www.hpmuseum.net/display_item.php?hw=204>.

So, even at half the clock speed, a competent RISC should beat the
68000; which leads to the question: What was wrong with the ROMP for
it to be so slow?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

John Levine

Jun 4, 2022, 1:10:46 PM
According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
>What was wrong with the ROMP for it to be so slow? The early RISCs
>that I experienced (HP 9000/825, 88100, and MIPS R2000 and R3000) ...

Those were late 1980s chips. The 9000/825 was in 1987, 88100 in 1988,
but the ROMP was designed no later than 1980. They screwed around
until 1986 before shipping it.

It also didn't help that AIX+VRM had huge overhead. Perhaps the VRM
could have been tuned to be faster but why bother.

R's,
John

Anne & Lynn Wheeler

Jun 4, 2022, 1:29:02 PM
nemo <inv...@invalid.invalid> writes:
> Was ROMP not the 801 with the FPU torn out and later bolted back on
> for the workstation?

ROMP was going to be the processor for the displaywriter (8086) followon
which was canceled (likely because of the spreading success of the
IBM/PC for word processing)
https://en.wikipedia.org/wiki/IBM_Displaywriter_System

most prevalent 801 would have been Iliad chips ... late 70s was effort to
replace the myriad of internal microprocessors used in controllers,
low&mid range 370s (i.e. 4361 & 4381 followon to 4331&4341 was supposed
to be risc instead of cisc), the as/400 followon to s/38, etc. for
various reasons they all floundered and company went back to cisc
microprocessors (and some number of risc engineers left and went to risc
efforts at other vendors).

Los Gatos VLSI lab had been doing "Blue Iliad" ... would have been the
first 32-bit 801 chip (predating multi-chip RIOS for RS/6000, claimed
150 million ops, 60 million flops, 7million transistors) ... but never
completely debugged ... it was a really large, hot, single chip. trivia:
when the Iliad efforts were abandoned ... one of the primary people
working on Blue Iliad left for HP & snake (also later on Itanium)

disclaimer: I contributed to white paper that showed that cisc had
gotten to a point where nearly the whole 370 instruction set could
be implemented directly in circuits (for 4361 & 4381) rather than
microcode simulation (which had been running around avg of ten
native instruction for each emulated 370 instruction ... i.e. a
1mip mid-range 370 had required a 10mip cisc microprocessor)

Anne & Lynn Wheeler

Jun 4, 2022, 1:33:23 PM
John Levine <jo...@taugh.com> writes:
> It also didn't help that AIX+VRM had huge overhead. Perhaps the VRM
> could have been tuned to be faster but why bother.

trivia: I did some work with somebody that claimed he was the person
that had tuned the VRM that got it from totally unacceptable to
barely acceptable ... or otherwise they would have scrapped the
VRM (and all the PL.8 IBM programmers) and done unix directly to
the bare hardware (like the BSD "AOS" port).

Thomas Koenig

Jun 4, 2022, 5:13:58 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> Andy Valencia <van...@vsta.org> writes:
>>Thomas Koenig <tko...@netcologne.de> writes:
>>> By and large, it was a commercial failure. Apparently,
>>> IBM had the chip ready by 1981, and it appears to have had a
>>> similar performance at 5.88 MHz to the 68000 at 12.5 MHz,
>>
>>That sounds about right. A few years later, I dealt with
>>HP/UX on the 68000. My memory is the HP 9000/310 (68010) was
>>comparable to an RT PC, and the 9000/320 was well ahead of it.
>
> What was wrong with the ROMP for it to be so slow? The early RISCs
> that I experienced (HP 9000/825, 88100, and MIPS R2000 and R3000) all
> were significantly faster than an 68020/030 at similar clock rates
> (and the 68020/030 was quite a bit faster than an 68000/010). E.g.,
> the HP 9000/825 with a 25MHz NS-1 CPU was rated at 9MIPS
><https://www.openpa.net/systems/hp_early-systems.html>, while the
> HP9000/360 with a 25MHz 68030 was rated at 5MIPS
><https://www.hpmuseum.net/display_item.php?hw=204>.
>
> So, even at half the clock speed, a competent RISC should beat the
> 68000; which leads to the question: What was wrong with the ROMP for
> it to be so slow?

ROMP was not the uncompromising design of the later RISCs. Only
16 registers and (mostly) R1 = R1 op R2 instructions probably made
for more spills to memory. It had a three-stage pipeline, I think
classic RISCs had more.

Interesting challenge, though - try to design something within the
budget of a ROMP or a 68000 which is clearly superior to both :-)

MitchAlsup

Jun 4, 2022, 5:18:42 PM
On Saturday, June 4, 2022 at 11:54:21 AM UTC-5, Anton Ertl wrote:
> Andy Valencia <van...@vsta.org> writes:
> >Thomas Koenig <tko...@netcologne.de> writes:
> >> By and large, it was a commercial failure. Apparently,
> >> IBM had the chip ready by 1981, and it appears to have had a
> >> similar performance at 5.88 MHz to the 68000 at 12.5 MHz,
> >
> >That sounds about right. A few years later, I dealt with
> >HP/UX on the 68000. My memory is the HP 9000/310 (68010) was
> >comparable to an RT PC, and the 9000/320 was well ahead of it.
<
> What was wrong with the ROMP for it to be so slow? The early RISCs
> that I experienced (HP 9000/825, 88100, and MIPS R2000 and R3000) all
> were significantly faster than an 68020/030 at similar clock rates
<
The MC68020 had a minimum instruction time of 2 cycles, with 1-cycle
increments. So, a RISC at 20 MHz was essentially equivalent to a '020 at
40 MHz.
<
> (and the 68020/030 was quite a bit faster than an 68000/010). E.g.,
> the HP 9000/825 with a 25MHz NS-1 CPU was rated at 9MIPS
> <https://www.openpa.net/systems/hp_early-systems.html>, while the
> HP9000/360 with a 25MHz 68030 was rated at 5MIPS
> <https://www.hpmuseum.net/display_item.php?hw=204>.
>
> So, even at half the clock speed, a competent RISC should beat the
> 68000; which leads to the question: What was wrong with the ROMP for
> it to be so slow?
<
I don't think ROMP fits into the RISC category, with 16-bit and 32-bit
instructions and some other weirdities.

Andy Valencia

Jun 4, 2022, 6:31:44 PM
John Levine <jo...@taugh.com> writes:
> It also didn't help that AIX+VRM had huge overhead. Perhaps the VRM
> could have been tuned to be faster but why bother.

On the BSD side, it's quite possible they were using pcc (I honestly
don't remember, except that the whole system "felt" BSD through and
through, so it wasn't some exotic compiler group's rethink of C).
A mediocre code generator would represent a pervasive tax, at least
for the BSD port.

Stefan Monnier

Jun 4, 2022, 8:28:18 PM
> Interesting challenge, though - try to design something within the
> budget of a ROMP or a 68000 which is clearly superior to both :-)

IIUC the ARM did fit comfortably within the 68000's transistor budget.
Not sure how it'd have fared within the limits of the 68000's
packaging, OTOH: it's probably hard to get much benefit from a pipeline
when you're stuck with a "4bit/cycle" memory bus and no on-chip cache.


Stefan

Anne & Lynn Wheeler

Jun 4, 2022, 11:29:27 PM
Andy Valencia <van...@vsta.org> writes:
> On the BSD side, it's quite possible they were using pcc (I honestly
> don't remember, except that the whole system "felt" BSD through and
> through, so it wasn't some exotic compiler group's rethink of C).
> A mediocre code generator would represent a pervasive tax, at least
> for the BSD port.

The VRM was implemented in PL.8 by all the Austin PL.8 programmers that
had been working on ROMP
https://en.wikipedia.org/wiki/PL/8
for displaywriter followon
https://en.wikipedia.org/wiki/IBM_Displaywriter_System

When the followon was canceled (looked like word processing had moved to
PCs) they decided to retarget for the workstation market; they created
the "VRM" for the PL.8 programmers to implement and the company that had
done the AT&T UNIX IBM/PC PC/IX port was contracted to do a port to the VRM
abstract virtual layer for PC/RT AIX.

palo alto group was originally working on BSD port for (mainframe) 370
and needed a compiler.

Los Gatos lab had done lots of VLSI mainframe tools and a Pascal
compiler for mainframe using MetaWare's TWS (before pascal became IBM
mainframe product). I was then talking to one of the LSG people (that
had done the mainframe Pascal) about doing a C language front-end for
the 370 pascal compiler ... I was then in europe for 6 weeks giving
classes and lectures and when I got back the person had left IBM and
gone to work for MetaWare. I suggested to palo alto group that they
contract with MetaWare for a 370 C compiler (as part of their 370 BSD
port). when palo alto got redirected to do the BSD port to PC/RT (ROMP
bare hardware) they kept the MetaWare C compiler and had MetaWare do a
ROMP backend.

some metaware trivia

Efficient Computation of LALR(1) Look-Ahead Sets FRANK DeREMER and
THOMAS PENNELLO University of California, Santa Cruz, and MetaWare TM
Incorporated
https://dl.acm.org/doi/pdf/10.1145/69622.357187

Anton Ertl

Jun 5, 2022, 6:47:43 AM
John Levine <jo...@taugh.com> writes:
>According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
>>What was wrong with the ROMP for it to be so slow? The early RISCs
>>that I experienced (HP 9000/825, 88100, and MIPS R2000 and R3000) ...
>
>Those were late 1980s chips. The 9000/825 was in 1987, 88100 in 1988,
>but the ROMP was designed no later than 1980. They screwed around
>until 1986 before shipping it.

Yes, that would have created an impression of slowness compared to the
competition of the time.

But my question was why the 5.88MHz ROMP was as slow as a 12.5MHz
68000 (introduced in 1979, although the 12.5MHz speed only became
available in 1982). The 68000 takes at least 4 cycles per instruction
(and many instructions take more), while 84 of the 118 ROMP
instructions have single-cycle latency
<https://en.wikipedia.org/wiki/IBM_ROMP>. The 68000 also used a 3.5um
process and the ROMP a 2um process.
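
The arithmetic behind the question can be written out as a rough
sketch. It uses only the figures quoted above (clock rates, at least 4
cycles per 68000 instruction, single-cycle latency for most ROMP
instructions) and treats them as best-case bounds. It ignores memory
wait states, multi-cycle instructions, and the fact that the two
instruction sets do different amounts of work per instruction, so the
results are upper bounds on native instruction rate, not benchmark
predictions. The function name is made up for illustration.

```python
def peak_mips(clock_mhz, min_cycles_per_instruction):
    """Upper bound on native instructions per second, in millions."""
    return clock_mhz / min_cycles_per_instruction


# 68000 at 12.5 MHz, at least 4 cycles per instruction.
m68000_peak = peak_mips(12.5, 4)

# ROMP at 5.88 MHz, assuming every instruction were single-cycle.
romp_peak = peak_mips(5.88, 1)

# Average cycles per instruction at which the ROMP's native instruction
# rate would only equal the 68000's best case.
break_even_cpi = 5.88 / m68000_peak

print(f"68000 peak: {m68000_peak:.3f} MIPS")
print(f"ROMP peak:  {romp_peak:.3f} MIPS")
print(f"ROMP break-even average CPI: {break_even_cpi:.2f}")
```

On these assumptions the ROMP would have to average close to two cycles
per instruction just to fall back to the 68000's best-case rate, which
is why the similar Dhrystone figures look surprising.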

Anton Ertl

Jun 5, 2022, 7:01:16 AM
Thomas Koenig <tko...@netcologne.de> writes:
>ROMP was not the uncompromising design of the later RISCs. Only
>16 registers and (mostly) R1 = R1 op R2 instructions probably made
>for more spills to memory.

16 registers is plenty for reducing spilling. Sure, 32 registers
reduces spilling more if you have a good register allocator, but it's
not like it's a factor of 2 in most applications.

>It had a three-stage pipeline, I think
>classic RISCs had more.

The first ARMs have 3 stages (and there are even recent ARMs with 3
stages). HP's NS-1 also has 3 stages (how did HP manage to get 30MHz
out of that in 1987 while MIPS with its 5-stage pipeline without
interlocks took until 1988 to reach similar speed grades?).

>Interesting challenge, though - try to design something within the
>budget of a ROMP or a 68000 which is clearly superior to both :-)

To do that, one would have to know why ROMP was not so great. ARM
certainly fits the budget, but it's not clear to me why it should
perform better than ROMP.

Anton Ertl

Jun 5, 2022, 7:13:47 AM
84 out of 118 instructions are single-cycle, so whether it is
categorized as a RISC or not, one would expect it to perform
significantly better than a 68000 that has twice the clock.

As for whether it is RISC, I don't think that a single instruction
size really is a relevant criterion: There is ROMP, ARM T32, MIPS16,
and the RISC-V C extension, all of which support mixing 16-bit and
32-bit instructions.

Anton Ertl

Jun 5, 2022, 7:18:57 AM
Andy Valencia <van...@vsta.org> writes:
>John Levine <jo...@taugh.com> writes:
>> It also didn't help that AIX+VRM had huge overhead. Perhaps the VRM
>> could have been tuned to be faster but why bother.
>
>On the BSD side, it's quite possible they were using pcc (I honestly
>don't remember, except that the whole system "felt" BSD through and
>through, so it wasn't some exotic compiler group's rethink of C).
>A mediocre code generator would represent a pervasive tax, at least
>for the BSD port.

Easily possible, but at the time, this tax was paid in a lot of
places (except probably MIPS and PL.8). I was a summer intern at HP
in 1988 and 1989, and one person there commented that on first release
gcc beat HP's compiler (probably based on pcc) by a large margin, and
that they had worked a lot to catch up with gcc (but IIRC were not
quite there yet at the time).

Anton Ertl

Jun 5, 2022, 7:33:29 AM
A 16-bit memory bus would have slowed ARM down by probably a factor of
2. But ROMP had a 32-bit bus AFAIK.

Anton Ertl

Jun 5, 2022, 8:04:35 AM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>Thomas Koenig <tko...@netcologne.de> writes:
>>ROMP was not the uncompromising design of the later RISCs. Only
>>16 registers and (mostly) R1 = R1 op R2 instructions probably made
>>for more spills to memory.
>
>16 registers is plenty for reducing spilling. Sure, 32 registers
>reduces spilling more if you have a good register allocator, but it's
>not like it's a factor of 2 in most applications.

Forgot to write: Two-address instructions don't increase the number of
spills, but may increase the number of executed instructions: Just
replace

r1 = r2 op r3

with

r1 = r2
r1 = r1 op r3

However, even if you have three-address instructions, it's quite
frequent that the destination is the same as one source, so the
slowdown from only having that is not that big (and may be recovered
by having fewer instruction fetch cycles).
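
The rewrite above can be sketched as a small lowering pass in Python.
It is only an illustration of the instruction-count argument: the tuple
format, the scratch register name "rtmp" and the function name are all
hypothetical, and a real code generator would pick the destination
register to avoid the copies where it can.

```python
COMMUTATIVE = {"+", "*", "&", "|", "^"}


def lower_to_two_address(ops, scratch="rtmp"):
    """Lower (dest, src1, op, src2) tuples to 'dest = dest op src' form."""
    out = []
    for dest, src1, op, src2 in ops:
        if dest == src1:
            # Already two-address: no extra instruction needed.
            out.append(f"{dest} = {dest} {op} {src2}")
        elif dest == src2 and op in COMMUTATIVE:
            # Swap the sources instead of copying.
            out.append(f"{dest} = {dest} {op} {src1}")
        elif dest == src2:
            # Copying src1 into dest would clobber src2, so go through a
            # scratch register (assumed to be free).
            out.append(f"{scratch} = {src1}")
            out.append(f"{scratch} = {scratch} {op} {src2}")
            out.append(f"{dest} = {scratch}")
        else:
            # The general case from the text: one extra register copy.
            out.append(f"{dest} = {src1}")
            out.append(f"{dest} = {dest} {op} {src2}")
    return out


for line in lower_to_two_address([("r1", "r2", "+", "r3"),
                                  ("r4", "r4", "-", "r5")]):
    print(line)
```

The first tuple costs one extra copy; the second, where the destination
is already one of the sources, costs nothing.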

EricP

Jun 5, 2022, 11:04:56 AM
John Levine wrote:
> According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
>> What was wrong with the ROMP for it to be so slow? The early RISCs
>> that I experienced (HP 9000/825, 88100, and MIPS R2000 and R3000) ...
>
> Those were late 1980s chips. The 9000/825 was in 1987, 88100 in 1988,
> but the ROMP was designed no later than 1980. They screwed around
> until 1986 before shipping it.
>
> It also didn't help that AIX+VRM had huge overhead. Perhaps the VRM
> could have been tuned to be faster but why bother.
>
> R's,
> John

There is a whole issue of HP Journal from Sep-1987 devoted
to the PA-RISC NS-1 chip used in the HP 9000/825S,
and in particular 2 articles on its uArch.

https://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1987-09.pdf

Apparently it had an off-chip TLB with 2048 to 4096 entries!
At 144,000 transistors it is more than double the size of a
Motorola 68000 which was ~68,000.

https://www.openpa.net/pa-risc_processor_pa-early.html#ns-1


MitchAlsup

Jun 5, 2022, 11:05:07 AM
On Sunday, June 5, 2022 at 6:01:16 AM UTC-5, Anton Ertl wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
> >ROMP was not the uncompromising design of the later RISCs. Only
> >16 registers and (mostly) R1 = R1 op R2 instructions probably made
> >for more spills to memory.
> 16 registers is plenty for reducing spilling. Sure, 32 registers
> reduces spilling more if you have a good register allocator, but it's
> not like it's a factor of 2 in most applications.
<
The difference between 16 registers and 32 registers should be on the
order of 3%

MitchAlsup

Jun 5, 2022, 11:07:32 AM
On Sunday, June 5, 2022 at 7:04:35 AM UTC-5, Anton Ertl wrote:
> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >Thomas Koenig <tko...@netcologne.de> writes:
> >>ROMP was not the uncompromising design of the later RISCs. Only
> >>16 registers and (mostly) R1 = R1 op R2 instructions probably made
> >>for more spills to memory.
> >
> >16 registers is plenty for reducing spilling. Sure, 32 registers
> >reduces spilling more if you have a good register allocator, but it's
> >not like it's a factor of 2 in most applications.
> Forgot to write: Two-address instructions don't increase the number of
> spills, but may increase the number of executed instructions: Just
> replace
>
> r1 = r2 op r3
>
> with
>
> r1 = r2
> r1 = r1 op r3
>
> However, even if you have three-address instructions, it's quite
> frequent that the destination is the same as one source, so the
> slowdown from only having that is not that big (and may be recovered
> by having fewer instruction fetch cycles).
<
Without disagreeing with anything you said::
<
I dislike the term 3-address when you really mean 2-operand.
3-address implies:: m[1234] = m[4321] + m[7632]

John Levine

Jun 5, 2022, 11:40:29 AM
According to Thomas Koenig <tko...@netcologne.de>:
>ROMP was not the uncompromising design of the later RISCs. Only
>16 registers and (mostly) R1 = R1 op R2 instructions probably made
>for more spills to memory. It had a three-stage pipeline, I think
>classic RISCs had more.

Remember that ROMP was designed by the people who designed the 801.
They left out anything that could be done as well in software, and
also only included instructions that their PL.8 compiler generated.
They took this to an extreme level, 24 bit registers and no memory
management.

ROMP made some compromises with practicality. Memory was still
expensive in the late 1970s so they added 16 bit instructions.
Virtual memory was useful so they added a minimalist MMU with
reverse mapping. They also added back a few instructions that
turned out to be useful, like load/store multiple which were
in every subroutine call sequence.
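
Assuming "reverse mapping" here means an inverted page table (one
entry per physical page frame, found by hashing the virtual page
number), the lookup can be sketched in Python as below. This is a
generic model, not the ROMP MMU's actual table format: the 2048-byte
page size, the modulo hash and all names are assumptions chosen for
illustration, and remapping or unmapping a frame is not handled.

```python
class InvertedPageTable:
    """One entry per physical frame, searched via hash chains."""

    def __init__(self, num_frames, page_size=2048):
        self.page_size = page_size
        self.frame_owner = [None] * num_frames    # frame -> virtual page
        self.hash_head = [None] * num_frames      # bucket -> first frame
        self.next_in_chain = [None] * num_frames  # frame -> next frame

    def _bucket(self, vpn):
        # Placeholder hash; a real MMU would mix the address bits.
        return vpn % len(self.hash_head)

    def map(self, vpn, frame):
        """Record that virtual page vpn lives in the given free frame."""
        self.frame_owner[frame] = vpn
        bucket = self._bucket(vpn)
        self.next_in_chain[frame] = self.hash_head[bucket]
        self.hash_head[bucket] = frame

    def translate(self, vaddr):
        """Return the physical address, or None to signal a page fault."""
        vpn, offset = divmod(vaddr, self.page_size)
        frame = self.hash_head[self._bucket(vpn)]
        while frame is not None:
            if self.frame_owner[frame] == vpn:
                return frame * self.page_size + offset
            frame = self.next_in_chain[frame]
        return None


table = InvertedPageTable(num_frames=8)
table.map(vpn=5, frame=3)
print(table.translate(5 * 2048 + 17))   # physical address in frame 3
```

The point of the design is that the table size follows the amount of
physical memory rather than the size of the virtual address space.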

It's ahistorical to ask whether it was a RISC. It came out of
a different group with different goals than the people on
the west coast who coined the term RISC.

Anton Ertl

Jun 5, 2022, 12:34:15 PM
EricP <ThatWould...@thevillage.com> writes:
>There is a whole issue of HP Journal from Sep-1987 devoted
>to the PA-RISC NS-1 chip used in the HP 9000/825S,
>and in particular 2 articles on its uArch.
>
>https://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1987-09.pdf
>
>Apparently it had an off-chip TLB with 2048 to 4096 entries!
>At 144,000 transistors it is more than double the size of a
>Motorola 68000 which was ~68,000.

Another number I have seen for the 68000 was in the 40,000-50,000
transistor range (and seems more plausible given the size and
implementation process).

But the 68000 has no MMU, so what do you want to tell us with that
comparison?

The 68851 designed to go with the 68020 has 210,000 transistors
<https://patpend.net/technical/68000/68000faq.txt>.

It seems to me that both the NS-1 TLB and the 68851 have so many
transistors because they could. The 68030 has an MMU and a 256-byte
I-cache, but only 83,000 transistors more than the 68020.

Anton Ertl

Jun 5, 2022, 12:49:21 PM
John Levine <jo...@taugh.com> writes:
>It's ahistorical to ask whether it was a RISC. It came out of
>a different group with different goals than the people on
>the west coast who coined the term RISC.

The "people on the west coast" certainly considered the IBM 801 to be
a RISC. E.g., Patterson lists it with the Berkeley RISC-I and
Stanford MIPS as a RISC. And given that ROMP is a productized 801,
like SPARC is a productized Berkeley RISC, and commercial MIPS is a
productized Stanford MIPS, I would call ROMP a RISC, too.

EricP

Jun 5, 2022, 2:40:21 PM
Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
>> There is a whole issue of HP Journal from Sep-1987 devoted
>> to the PA-RISC NS-1 chip used in the HP 9000/825S,
>> and in particular 2 articles on its uArch.
>>
>> https://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1987-09.pdf
>>
>> Apparently it had an off-chip TLB with 2048 to 4096 entries!
>> At 144,000 transistors it is more than double the size of a
>> Motorola 68000 which was ~68,000.
>
> Another number I have seen for the 68000 was in the 40,000-50,000
> transistor range (and seems more plausible given the size and
> implementation process).
>
> But the 68000 has no MMU, so what do you want to tell us with that
> comparison?
>
> The 68851 designed to go with the 68020 has 210,000 transistors
> <https://patpend.net/technical/68000/68000faq.txt>.
>
> It seems to me that both the NS-1 TLB and the 68851 have so many
> transistors because they could. The 68030 has an MMU and a 256-byte
> I-cache, but only 83,000 transistors more than the 68020.
>
> - anton

This article written by Nick Tredennick, one of the 68000 designers,
is paywalled but in the Google extract it says

"... but the MC68000 was a large processor for its time,
containing 68,000 transistors..."

Microprocessor-based computers, 1996
https://ieeexplore.ieee.org/abstract/document/539718

Stefan Monnier

Jun 5, 2022, 5:11:34 PM
Anton Ertl [2022-06-05 11:19:29] wrote:
> Stefan Monnier <mon...@iro.umontreal.ca> writes:
>>> Interesting challenge, though - try to design something within the
>>> budget of a ROMP or a 68000 which is clearly superior to both :-)
>> IIUC the ARM did fit comfortably within the 68000's transistor budget.
>> Not sure how it'd have fared within the limits of the 68000's
>> packaging, OTOH: it's probably hard to get much benefit from a pipeline
>> when you're stuck with a "4bit/cycle" memory bus and no on-chip cache.
> A 16-bit memory bus would have slowed ARM down by probably a factor of
> 2. But ROMP had a 32-bit bus AFAIK.

Extending the 68k's 16bit bus to 32bit would only have sped it up to
8bit/cycle, so the issue is not just the bus's width but also
its protocol.
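
The bits-per-cycle figures follow from the 68000 needing at least four
clocks per bus transfer. A small sketch of that arithmetic, with a
made-up function name and a hypothetical one-transfer-per-clock 32-bit
bus added for contrast:

```python
def bits_per_cycle(bus_width_bits, clocks_per_transfer):
    """Peak memory bandwidth in bits per clock cycle."""
    return bus_width_bits / clocks_per_transfer


m68k_16bit = bits_per_cycle(16, 4)        # 68000 as built
m68k_32bit = bits_per_cycle(32, 4)        # wider bus, same protocol
one_per_clock = bits_per_cycle(32, 1)     # hypothetical 32-bit bus

print(m68k_16bit, m68k_32bit, one_per_clock)
```

Widening the bus doubles the figure, while a protocol that completes a
transfer every clock multiplies it by four at the same width.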

IIUC, the ARM relied on a somewhat more complex (off-chip)
memory controller that was important to let it handle one memory access
per cycle at 8MHz without relying on SRAM.


Stefan

MitchAlsup

Jun 5, 2022, 6:38:23 PM
On Sunday, June 5, 2022 at 4:11:34 PM UTC-5, Stefan Monnier wrote:
> Anton Ertl [2022-06-05 11:19:29] wrote:
> > Stefan Monnier <mon...@iro.umontreal.ca> writes:
> >>> Interesting challenge, though - try to design something within the
> >>> budget of a ROMP or a 68000 which is clearly superior to both :-)
> >> IIUC the ARM did fit comfortably within the 68000's transistor budget.
> >> Not sure how it'd have fared within the limits of the 68000's
> >> packaging, OTOH: it's probably hard to get much benefit from a pipeline
> >> when you're stuck with a "4bit/cycle" memory bus and no on-chip cache.
> > A 16-bit memory bus would have slowed ARM down by probably a factor of
> > 2. But ROMP had a 32-bit bus AFAIK.
<
> Extending the 68k's 16bit bus to 32bit would only have sped it up to
> 8bit/cycle, so the issue is not just the bus's width but also
> its protocol.
<
True--which led to the sub-topic of DTAK-grounded.........

Marcus

Jun 7, 2022, 9:23:17 AM
On 2022-06-05, MitchAlsup wrote:
> On Sunday, June 5, 2022 at 7:04:35 AM UTC-5, Anton Ertl wrote:
>> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>> Thomas Koenig <tko...@netcologne.de> writes:
>>>> ROMP was not the uncompromising design of the later RISCs. Only
>>>> 16 registers and (mostly) R1 = R1 op R2 instructions probably made
>>>> for more spills to memory.
>>>
>>> 16 registers is plenty for reducing spilling. Sure, 32 registers
>>> reduces spilling more if you have a good register allocator, but it's
>>> not like it's a factor of 2 in most applications.
>> Forgot to write: Two-address instructions don't increase the number of
>> spills, but may increase the number of executed instructions: Just
>> replace
>>
>> r1 = r2 op r3
>>
>> with
>>
>> r1 = r2
>> r1 = r1 op r3
>>
>> However, even if you have three-address instructions, it's quite
>> frequent that the destination is the same as one source, so the
>> slowdown from only having that is not that big (and may be recovered
>> by having fewer instruction fetch cycles).
> <
> Without disagreeing with anything you said::
> <
> I dislike the terms 3-address when you really mean 2-operand.
> 3-address implies:: m[1234] = m[4321] + m[7632]
> <

To counter this: I have seen the term "operand" used both for sources
and destinations. So:

r1 = r2 op r3

...would have two source operands and one destination operand, for a
total of three operands.

Is that a common notation?

In your wording that would be a two-operand instruction. What is the
result called (if not an instruction operand)?


>> - anton
>> --
>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

/Marcus

MitchAlsup

unread,
Jun 7, 2022, 12:09:59 PMJun 7
to
While I have seen this: I prefer operands as sources only and results as
destinations only.
>
> In your wording that would be a two-operand instruction. What is the
> result called (if not an instruction operand)?
<
Result.

Ivan Godard

unread,
Jun 7, 2022, 12:26:21 PMJun 7
to
We call it a single-result dyadic operation. The ISA includes
multi-result instructions (divrem(a,b)->{q,r}) and polyadic instructions
(call(a,b,c)); the nomenclature seems to handle all of those.

Stephen Fuld

unread,
Jun 7, 2022, 12:40:56 PMJun 7
to
While I agree that the result(s) should be called "Results", this
doesn't capture the essential difference Tom was trying to show. In a
register machine, any instruction like Add has two source operands
(pretty much by the definition of addition). The question is how our
naming convention should distinguish between the case where one of the
sources must be the same register as the result and the case where the
result can go to a different, i.e. third, register.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

MitchAlsup

unread,
Jun 7, 2022, 5:41:25 PMJun 7
to
The word "destructive" comes to mind.
IBM 360 is based on a 2-operand destructive instruction set model.
RISC is based on a 2-operand non-destructive instruction set model.
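
A minimal sketch of what "destructive" costs, following the replacement
rule Anton gave earlier in the thread (copy, then operate in place).
The tuple format, the function name and the scratch register "rt" are
hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# (dest, src1, op, src2) stands for: dest = src1 op src2
ThreeAddr = Tuple[str, str, str, str]

COMMUTATIVE = frozenset({"+", "*", "&", "|", "^"})


def lower_to_destructive(insn: ThreeAddr, scratch: str = "rt") -> List[str]:
    """Rewrite a non-destructive instruction using only 'rX = rX op rY' forms."""
    dest, src1, op, src2 = insn
    if dest == src1:
        # Already destructive: the destination is the first source.
        return [f"{dest} = {dest} {op} {src2}"]
    if dest == src2 and op in COMMUTATIVE:
        # Swap the sources so the destination comes first.
        return [f"{dest} = {dest} {op} {src1}"]
    if dest == src2:
        # Copying src1 into dest would clobber src2, so go through a scratch register.
        return [f"{scratch} = {src1}",
                f"{scratch} = {scratch} {op} {src2}",
                f"{dest} = {scratch}"]
    # General case: one extra register copy, no extra spill.
    return [f"{dest} = {src1}", f"{dest} = {dest} {op} {src2}"]


for line in lower_to_destructive(("r1", "r2", "+", "r3")):
    print(line)
```

In the common case the price is one extra register copy, and nothing at
all when the destination already matches a source.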

MitchAlsup

unread,
Jun 7, 2022, 5:42:12 PMJun 7
to
Poly-operand and poly-result work just as well.

Anton Ertl

unread,
Jun 9, 2022, 5:34:11 AMJun 9
to
EricP <ThatWould...@thevillage.com> writes:
>Anton Ertl wrote:
>> Another number I have seen for the 68000 was in the 40,000-50,000
>> transistor range (and seems more plausible given the size and
>> implementation process).
...
>This article written by Nick Tredennick, one of the 68000 designers,
>is paywalled but in the Google extract it says
>
>"... but the MC68000 was a large processor for its time,
>containing 68,000 transistors..."
>
>Microprocessor-based computers, 1996
>https://ieeexplore.ieee.org/abstract/document/539718

Yes, the 68,000 transistor number seems to originate with Motorola.

Let's contrast it with some other numbers:

68000: 3.5um HMOS I 6.1mm x 7.3mm = 44.53mm^2 (according to
<https://books.google.at/books?id=NUz1CAAAQBAJ&pg=PA282&lpg=PA282&ots=U7-9smOQsY&focus=viewport&dq=68000+die+size+area&hl=de>)

8086: 3.2um NMOS 33mm^2 29,000 transistors 878t/mm^2
<https://en.wikipedia.org/wiki/Intel_8086>

16032: 3.5um NMOS 7.5mm * 7.3mm = 54.75mm^2 60,000 transistors 1095t/mm^2
<http://cpu-ns32k.net/CPUs.html>

ARM1: 3um CMOS ~7mm * ~7mm = 50mm^2 24,800 transistors 496t/mm^2
<https://en.wikichip.org/wiki/acorn/microarchitectures/arm1>

If the 68000 has 68000 transistors, that would be 1527t/mm^2. This
does not look particularly plausible compared to the competition.
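
The density arithmetic above can be reproduced with a short sketch.
The areas and transistor counts are the ones quoted in this post; the
variable names and the last step (scaling the competitors' densities
to the 68000's die area) are only illustrative.

```python
# (die area in mm^2, transistor count), as quoted above
chips = {
    "8086": (33.0, 29_000),
    "16032": (7.5 * 7.3, 60_000),
    "ARM1": (50.0, 24_800),
}

M68K_AREA = 6.1 * 7.3      # about 44.53 mm^2
M68K_CLAIMED = 68_000      # Motorola's number


def density(area_mm2: float, transistors: int) -> float:
    """Transistors per square millimetre."""
    return transistors / area_mm2


for name, (area, count) in chips.items():
    d = density(area, count)
    # What the 68000 die would hold at this chip's density.
    implied = d * M68K_AREA
    print(f"{name}: {d:.0f} t/mm^2, implies {implied:.0f} transistors for the 68000")

print(f"68000 as claimed: {density(M68K_AREA, M68K_CLAIMED):.0f} t/mm^2")
```

At the 8086 and 16032 densities the 68000's die area would hold
roughly 39,000 and 49,000 transistors respectively, which is where the
40,000-50,000 range comes from.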

EricP

unread,
Jun 9, 2022, 9:44:32 AMJun 9
to
Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
>> Anton Ertl wrote:
>>> Another number I have seen for the 68000 was in the 40,000-50,000
>>> transistor range (and seems more plausible given the size and
>>> implementation process).
> ....
>> This article written by Nick Tredennick, one of the 68000 designers,
>> is paywalled but in the Google extract it says
>>
>> "... but the MC68000 was a large processor for its time,
>> containing 68,000 transistors..."
>>
>> Microprocessor-based computers, 1996
>> https://ieeexplore.ieee.org/abstract/document/539718
>
> Yes, the 68,000 transistor number seems to originate with Motorola.
>
> Let's contrast it with some other numbers:
>
> 68000: 3.5um HMOS I 6.1mm x 7.3mm = 44.53mm^2 (according to
> <https://books.google.at/books?id=NUz1CAAAQBAJ&pg=PA282&lpg=PA282&ots=U7-9smOQsY&focus=viewport&dq=68000+die+size+area&hl=de>)
>
> 8086: 3.2um NMOS 33mm^2 29,000 transistors 878t/mm^2
> <https://en.wikipedia.org/wiki/Intel_8086>
>
> 16032: 3.5um NMOS 7.5mm * 7.3mm = 54.75mm^2 60,000 transistors 1095t/mm^2
> <http://cpu-ns32k.net/CPUs.html>
>
> ARM1: 3um CMOS ~7mm * ~7mm = 50mm^2 24,800 transistors 496t/mm^2
> <https://en.wikichip.org/wiki/acorn/microarchitectures/arm1>
>
> If the 68000 has 68000 transistors, that would be 1527t/mm^2. This
> does not look particularly plausible compared to the competition.
>
> - anton

https://en.wikipedia.org/wiki/HMOS#Intel_HMOS

"Intel's own depletion-load NMOS process was known as HMOS,
for High density, short channel MOS."
...
"According to Intel, HMOS II (1979) provided twice the density and
four times the speed/power product over other typical contemporary
depletion-load nMOS processes. This version was widely licensed by
3rd parties, including (among others) Motorola who used it for
their Motorola 68000..."

So that "twice the density" is about right compared to 8086.


mac

unread,
Jun 9, 2022, 10:12:55 AMJun 9
to

> What was wrong with the ROMP for it to be so slow?

Word at the time was that they had implementation issues, and had to
disable cache.

EricP

unread,
Jun 9, 2022, 10:26:21 AMJun 9
to
> ....
> "According to Intel, HMOS II (1979) provided twice the density and
> four times the speed/power product over other typical contemporary
> depletion-load nMOS processes. This version was widely licensed by
> 3rd parties, including (among others) Motorola who used it for
> their Motorola 68000..."
>
> So that "twice the density" is about right compared to 8086.

Searching for "68000" "HMOS" finds this article by a fellow who
was at Motorola in the microprocessor design team at that time.

"The “father” of the 6805 MCU relates his role in the history of
the microprocessor."

A participant's perspective, RG Daniels, 1996
https://scholar.archive.org/work/3ao6h2o7cjhaljwgcvqspya2ey/access/wayback/http://ada.computer.org:80/pubs/micro/articles/m60021.pdf

"Introduced in 1979, the 68000 was fabricated on 5-inch wafers in
4-micron HMOS technology. Coincidentally, the transistor count was
about 68,000."




MitchAlsup

unread,
Jun 9, 2022, 12:58:09 PMJun 9
to
Most of the transistors were ROM not logic.

David Schultz

unread,
Jun 9, 2022, 1:36:21 PMJun 9
to
On 6/9/22 9:26 AM, EricP wrote:

> "The “father” of the 6805 MCU relates his role in the history of
> the microprocessor."
>
> A participant's perspective, RG Daniels, 1996
> https://scholar.archive.org/work/3ao6h2o7cjhaljwgcvqspya2ey/access/wayback/http://ada.computer.org:80/pubs/micro/articles/m60021.pdf
>
>
> "Introduced in 1979, the 68000 was fabricated on 5-inch wafers in
> 4-micron HMOS technology. Coincidentally, the transistor count was
> about 68,000."
>
Or something a bit more contemporary, from a BYTE magazine series in 1983:

Photo 1: The MC68000 microprocessor chip, which contains more than
68,000 transistors, is 246 by 281 mils (6.24 by 7.14 mm) in
size.


If the photo in the article isn't detailed enough for you:
http://www.visual6502.org/images/pages/Motorola_68000.html


--
http://davesrocketworks.com
David Schultz

John Levine

unread,
Jun 9, 2022, 3:25:53 PMJun 9
to
According to mac <aco...@efunct.com>:
>
>> What was wrong with the ROMP for it to be so slow?
>
>Word at the time was that they had implementation issues, and had to
>disable cache.

When I was working on AIX I don't ever recall them talking about a cache. It would
have been pretty aggressive for a single-chip CPU designed in 1979.

Looking at the diagrams and pictures of the processor card in the RT
PC technology book I don't see where a cache would have gone.

MitchAlsup

unread,
Jun 9, 2022, 4:07:21 PMJun 9
to
It was never clear to me (and I worked there at the time) whether the
68,000 figure counted the transistors in the ROM that were not
connected to anything. The ROM contained a transistor at
each crossing point of diffusion and poly. A good many of these
transistors were not connected to VDD, GND, or a signal*--that is the
way we wired up a ROM to have a given set of values.
<
(*) don't cares.

Anton Ertl

unread,
Jun 10, 2022, 4:39:48 AMJun 10
to
Interesting that he mentions all kinds of microprocessors (both from
Motorola and the competition), but not Motorola's 88100 and 88110.

Anton Ertl

unread,
Jun 10, 2022, 5:11:17 AMJun 10
to
But it's about HMOS II, while the first 68000 was done in HMOS I
according to the reference above.

Looking at
<http://www.righto.com/2020/06/die-shrink-how-intel-scaled-down-8086.html>,
it says that HMOS I was 3.0um channel length, while HMOS II has 2.0um
channel length. HMOS I fits the 3.5um feature size for the 68000
better. The 8086 was originally made in HMOS I according to
<http://www.righto.com/2020/06/die-shrink-how-intel-scaled-down-8086.html>,
so that would be the same process as the original 68000.

Mitch Alsup apparently wants to tell us that the larger proportion of
ROM leads to higher transistor density, but does the 68000 really have
a significantly larger proportion of ROM than the 8086 with its
significant number of irregular instructions?

EricP

unread,
Jun 10, 2022, 10:23:08 AMJun 10
to
The question is which came first, the project's
transistor budget or the product name?

The original 6800 microprocessor was so well known
that the name 68000 would have been a fait accompli
to ride on its marketing brand recognition coattails.
Cuz everyone knows that more zeros means bigger and better!

Maybe later someone noticed that, if you squinted,
the number of transistors was close to the same number,
with some arm waving and rounding.


MitchAlsup

unread,
Jun 10, 2022, 12:02:09 PMJun 10
to
68000 had more data path stuff--3 sections:: 1 for PC and fetching, 1 for
Address and memory reference, 1 for Data and calculations
Whereas I think original x86 had but 2.

MitchAlsup

unread,
Jun 10, 2022, 12:02:59 PMJun 10
to
This ^^^
>
> Maybe later someone noticed that, if you squinted,
> the number of transistors was close to the same number,
> with some arm waving and rounding.
<
With a bit of this mixed in ^^^

Stefan Monnier

unread,
Jun 10, 2022, 12:53:04 PMJun 10
to
>> Cuz everyone knows that more zeros means bigger and better!
> <
> This ^^^
>>
>> Maybe later someone noticed that, if you squinted,
>> the number of transistors was close to the same number,
>> with some arm waving and rounding.
> <
> With a bit of this mixed in ^^^

And here I was, thinking they had first chosen the name/number and then
designed the ISA and implementation to try and get as close as possible
to this target.


Stefan

MitchAlsup

unread,
Jun 10, 2022, 1:57:18 PMJun 10
to
That would have required reasoning and forethought. Both of these are
banished and punished inside a design team.
>
>
> Stefan

Thomas Koenig

unread,
Jun 10, 2022, 3:04:33 PMJun 10
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:

> Mitch Alsup apparently wants to tell us that the larger proportion of
> ROM leads to higher transistor density, but does the 68000 really have
> a significantly larger proportion of ROM than the 8086 with its
> significant number of irregular instructions?

Compare the die shot of the 8086 at

http://www.righto.com/2020/06/a-look-at-die-of-8086-processor.html

with the die shots at

http://www.visual6502.org/images/pages/Motorola_68000.html

and compare the area of the ROMs. For the 8086, I'd say it's
around 16% of the total die area, for the 68000 probably a bit
more, more like 22-23% (but my identification may be off).

David Schultz

unread,
Jun 10, 2022, 3:20:19 PMJun 10
to
On 6/10/22 3:59 AM, Anton Ertl wrote:

> Mitch Alsup apparently wants to tell us that the larger proportion of
> ROM leads to higher transistor density, but does the 68000 really have
> a significantly larger proportion of ROM than the 8086 with its
> significant number of irregular instructions?
>
> - anton

I turned up another interesting article on the 68000. This one claimed
around 40,000 transistors but probably left off the ones in the 31Kbits
of ROM. Mostly it discussed debugging the design using a hardware (wire
wrapped TTL) and software simulation (IBM 370).

https://dl.acm.org/doi/pdf/10.1145/1014188.803015

MitchAlsup

unread,
Jun 10, 2022, 3:24:11 PMJun 10
to
Agreed, x86 has more random logic and less data-path logic and
somewhat less ROM. 68000 has more structure in the Data path,
in the control machine(s) and in the ROM.

Anne & Lynn Wheeler

unread,
Jun 10, 2022, 4:45:40 PMJun 10
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Interesting that he mentions all kinds of microprocessors (both from
> Motorola and the competition), but not Motorola's 88100 and 88110.

801/risc didn't have cache consistency (for multiprocessor) plus
some number of other deficiencies ... rios/power was large six chip
implementation. When AIM (apple, ibm, motorola) was formed
https://en.wikipedia.org/wiki/AIM_alliance

the executive we reported to (when we were doing ha/cmp), went over to
head up somerset ... for power/pc doing single chip design, also
multiprocessor cache consistency and some number of other features
... I claimed a lot of it came from 88k.
https://en.wikipedia.org/wiki/PowerPC_600

.. then well before ibm sells somerset, our former boss had gone over to
be president of mips.

POWERPC ALLIANCE FRACTURED AS IBM SELLS SOMERSET
https://techmonitor.ai/technology/powerpc_alliance_fractured_as_ibm_sells_somerset

--
virtualization experience starting Jan1968, online at home since Mar1970

EricP

unread,
Jun 11, 2022, 2:02:55 AMJun 11
to
The amount of ROM is confirmed by a patent on the 68000 microprogram
controller. It shows the two-level sequencer: a microcode address selecting
from a 640 * 12 bit microcode ROM, feeding a nanocode sequencer selecting
from a 280 * 70 bit nanocode ROM, for a total of 7680 + 19600 = 27,280 bits.

Microprogrammed control apparatus having a
two-level control store for data processor, 1978
https://patents.google.com/patent/US4307445A

