Not heard of it.
But, yeah, for something like I had imagined here (with a ROM), it
might make sense to try to cram nearly all of the opcode bits into a
shared 17-bit blob, say:
ZZZZ-ZZZZ-ZZZn-nnnn ssss-sttt-ttZZ-ZZZZ (3R)
ZZZZ-ZZZZ-ZZZn-nnnn ssss-siii-iiZZ-ZZZZ (3RI, Imm5)
Maybe (if resources allow):
ZZZZ-ZZZZ-ZZZn-nnnn ssss-siii-iiii-iiii (3RI, Imm11)
ZZZZ-ZZZZ-ZZZn-nnnn iiii-iiii-iiii-iiii (2RI, Imm16)
With Rn/Rs/Rt mostly unable to participate in opcode selection.
Though, one may find that 3R encodings are often overkill (when only a
2R or 1R encoding is needed). But, decoding these separately increases
the number of bits that need to be considered for the opcode.
Or, possibly, 6b reg fields:
ZZZZ-ZZZZ-ZZnn-nnnn ssss-sstt-tttt-ZZZZ (3R)
ZZZZ-ZZZZ-ZZnn-nnnn ssss-ssii-iiii-ZZZZ (3RI, Imm6)
ZZZZ-ZZZZ-ZZnn-nnnn ssss-ssii-iiii-iiii (3RI, Imm10)
ZZZZ-ZZZZ-ZZnn-nnnn iiii-iiii-iiii-iiii (2RI, Imm16)
Using a 14-bit ROM space for opcodes.
Where, say:
R0 ..R31: Pure GPRs
R32..R47: SPRs (ZR, LR, GBR, SP, ...)
R48..R63: CRs
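As a rough C sketch of the 6-bit-register-field case (the exact bit
positions are assumed here, taking the first 16 bits shown as the high
half of the word):

```c
#include <stdint.h>
#include <assert.h>

/* Sketch: gather the 14 opcode ('Z') bits of the layout
 *   ZZZZ-ZZZZ-ZZnn-nnnn ssss-sstt-tttt-ZZZZ
 * into a dense ROM index (assumed bit order: first half = bits 31..16). */
static inline uint32_t opc_index(uint32_t insn)
{
    uint32_t hi = (insn >> 22) & 0x3FF;  /* top 10 Z bits      */
    uint32_t lo =  insn        & 0xF;    /* bottom 4 Z bits    */
    return (hi << 4) | lo;               /* 14-bit ROM address */
}

/* 6-bit register fields (R0..R31 GPRs, R32..R47 SPRs, R48..R63 CRs). */
static inline uint32_t reg_n(uint32_t insn) { return (insn >> 16) & 63; }
static inline uint32_t reg_s(uint32_t insn) { return (insn >> 10) & 63; }
static inline uint32_t reg_t(uint32_t insn) { return (insn >>  4) & 63; }
```

With this sort of layout, the register fields never feed the ROM
address, so the ROM stays at 2^14 entries regardless of form.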
At least for BJX2:
Though, I have noted that in my case, cost and complexity (at least on
an FPGA) seems to be more dominated by the function units and EX stages
than by the decoder. The code for the decoder is kinda bulky (due to
lots of instructions), but seemingly not too terrible (mostly maps input
instructions to the output opcode numbers and some other numbers which
encode how to decode the register arguments).
Both the C (emulator) version and Verilog version have a loosely similar
structure for the front-end part:
A split between decoding 16 and 32 bit instructions;
64 and 96 bit merely "extend" the 32-bit format.
Main decoding is pattern/matching via nested switch/case.
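The nested switch/case shape looks roughly like this (a toy example;
the opcode numbers and block assignments here are made up):

```c
#include <stdint.h>

/* Toy example of the nested switch/case decoder shape: an outer switch
 * on the major block, inner switches on the remaining opcode bits.
 * Opcode numbers and block assignments here are made up. */
enum { OP_UNK, OP_ADD, OP_SUB, OP_LDI };

static int decode_op(uint32_t insn)
{
    switch ((insn >> 28) & 0xF) {   /* major block nibble */
    case 0x0:                       /* pretend: 3R ALU block */
        switch (insn & 0xF) {       /* minor opcode bits */
        case 0x0: return OP_ADD;
        case 0x1: return OP_SUB;
        default:  return OP_UNK;
        }
    case 0xA:                       /* pretend: Imm-heavy block */
        return OP_LDI;
    default:
        return OP_UNK;
    }
}
```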
Decoding 16-bit ops via a ROM would require mapping all 16 bits through
a ROM, partly because the 30zz block uses all 16 bits. If we ignore the
1R block (3xxx), the ROM would drop to around 8 bits.
It is possible that two cascaded 10b ROMs could also deal with the
decoding space.
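One possible shape for the cascaded-ROM idea (the split of address bits
and the table contents below are just placeholders):

```c
#include <stdint.h>

/* Sketch of two cascaded 10-bit ROMs for 16-bit ops (assumption: the
 * first ROM classifies insn[15:6], and its 4-bit output is combined
 * with insn[5:0] to address the second ROM). Toy table entries only. */
static uint8_t  rom1[1 << 10] = { [0x3FF] = 0x3 };
static uint16_t rom2[1 << 10] = { [(0x3 << 6) | 0x2A] = 123 };

static uint16_t decode16(uint16_t insn)
{
    uint32_t cls = rom1[(insn >> 6) & 0x3FF];         /* first lookup  */
    return rom2[((cls & 0xF) << 6) | (insn & 0x3F)];  /* second lookup */
}
```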
Register field mapping depends some on the instruction:
0..F may map to, say:
R0..R15
R16..R31
R1:R0, R17:R16, R3:R2, R19:R18, ...
C0..C15
Imm4
...
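Something like (hypothetical sketch; only the pair mapping follows the
listing above, the other numeric bases are illustrative):

```c
/* Hypothetical sketch of per-form mapping of a 4-bit register field in
 * the 16-bit ops; the pair mapping follows the listing above, while
 * the other numeric bases are illustrative. */
enum regform { RF_LO, RF_HI, RF_PAIR, RF_CREG, RF_IMM4 };

static int map_reg4(int f4, enum regform form)
{
    switch (form) {
    case RF_LO:   return f4;        /* R0..R15  */
    case RF_HI:   return 16 + f4;   /* R16..R31 */
    case RF_PAIR: /* R1:R0, R17:R16, R3:R2, R19:R18, ... (pair base) */
        return (f4 & ~1) + ((f4 & 1) ? 16 : 0);
    case RF_CREG: return 64 + f4;   /* C0..C15 (base 64 assumed)     */
    case RF_IMM4: return f4;        /* field is the immediate itself */
    }
    return -1;
}
```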
For 32-bit ops:
The register fields and immediate values are at least more consistent;
Organized into major blocks (0/1/2/3, 8/9/A/B);
F0, effectively uses 8 bits for 3R blocks;
increases to 12 for 2R blocks, and 16 for the 1R block.
F1, uses 4 bits.
F2, uses 4 bits for 3RI, and 8 for 2RI.
F3, reserved for now.
F8, uses 3 bits (2RI), 2 bits for Op96 (LDI/ADD/-/-)
F9, reserved for now.
FA, 0 bits (Imm24)
FB, 0 bits (Imm24)
Many of these blocks add 1 bit as E.Q is used as an opcode bit.
Register types:
R0..R31 or R0..R63 (XGPR)
Though, the F8 block uses a different encoding for Rn.
C0..C31
Excluding the 1R block, the 32-bit BJX2 decoder frontend could likely
fit into a 20-bit ROM:
xxxx-ZxZZ-xxxx-ZZZZ-ZZZZ-Zxxx-ZZZZ-ZZZZ
Where:
x=N/A for opcode, Z=needed for opcode.
Could possibly use smaller ROMs if the blocks were split up based on
major encoding blocks.
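The Z-bit gathering step could be sketched like so (generic bit-compress
by mask; the mask constant is the one quoted above):

```c
#include <stdint.h>

/* Sketch: compress only the opcode-relevant ('Z') bits of a 32-bit
 * word into a dense ROM index, lowest mask bit first. */
static uint32_t gather_bits(uint32_t insn, uint32_t mask)
{
    uint32_t out = 0, bit = 1;
    for (uint32_t m = mask; m; m &= m - 1) {
        uint32_t lowest = m & (0u - m);   /* isolate lowest set bit */
        if (insn & lowest)
            out |= bit;
        bit <<= 1;
    }
    return out;
}

/* xxxx-ZxZZ-xxxx-ZZZZ-ZZZZ-Zxxx-ZZZZ-ZZZZ -> 20 set bits */
#define OPC_MASK 0x0B0FF8FFu
```

In hardware this "gather" is free (just wire routing); the loop form is
only for the C side.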
The 32-bit decoder doesn't itself care as much about F/E/7/9, as their
scope is mostly limited to the register fields and predication mode
(thus mostly independent of the main part of the decoder).
The outer "bundle decoder" mostly cares about this, to select output
from the correct decoder (and, with RV mode enabled, needs to deal with
both BJX2 and RISC-V).
The situation isn't quite as bad for LUTs, as LUTs can mostly ignore
"not relevant" bits for "sparse" parts of the ISA, so it seems to make
more sense to divide stuff up into "sparse" and "dense" areas, and to
group similar encodings together (so, densely pack all the 2R spaces,
and all the 1R space into a single block, ...).
So, if a new instruction in the listing uses existing logic, it mostly
"just disappears into the LUTs". But, if it needs new logic in the EX
stages, ..., this is where cost comes in.
Despite seeming minor, adding new SPRs or new immediate-decoding cases
can be unexpectedly expensive.
So, for example, had looked into adding a "BRcc Rn, Imm5, Label" case;
but this required a hack to allow passing the immediate in a register
port. This in turn required a new internal SPR which (while only
applied to the Rs and Rt ports) managed to add ~2k LUTs to the size of
the core (which led me to reconsider it). A similar issue also exists
with adding instructions for FPU immediate values.
Though, costs seem to multiply in areas near "hot path" parts of the
code, which basically touches the ID2 stage, ALU, and the SR.T bit,
pretty hard (for SR.T, a fair chunk of "weight" hangs off a single
status-flag bit, so stuff affecting this bit can really "rock the boat"
as it were).
Well, along with the L1 pipeline-stall flag, which is tangled up in all
of this and is a frequent source of timing failures. The only way to
sidestep this one, though, would be to redesign things such that L1
misses no longer stall the pipeline (but this would be a non-trivial
redesign).
Things like register and immediate fields are mostly dealt with in the
second stage.
The first stage produces a few 3- and 4-bit fields which are used to
select the instruction form and similar; these drive the rest of the
decoding.
For BGBCC's disassembler, had used pattern matching over a listing
table ((OpBits&PatternMask)==Pattern), which also basically works;
though, for the CPU core and emulator, nested switch/case works well.
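The listing-table style looks roughly like this (masks, patterns, and
names below are made up; the real tables would come from the
instruction listing):

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the listing-table style: an entry matches when
 * (OpBits&PatternMask)==Pattern. Entries here are illustrative. */
struct oplist {
    uint32_t mask, pattern;
    const char *name;
};

static const struct oplist ops[] = {
    { 0xF000000Fu, 0x00000000u, "ADD" },
    { 0xF000000Fu, 0x00000001u, "SUB" },
    { 0xF0000000u, 0xA0000000u, "BRA" },
    { 0, 0, 0 }   /* end of table */
};

static const char *disasm_name(uint32_t opbits)
{
    for (const struct oplist *p = ops; p->name; p++)
        if ((opbits & p->mask) == p->pattern)
            return p->name;
    return "?";
}
```

The linear scan is what makes it slower than switch/case, but adding or
removing an encoding is just one table row.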
The logic in the WEXifier is also driven mostly by nested switch and
if/else logic.
Listing tables are more compact, but generally slower to work with (than
nested switch and if/else). However, the multiple levels of ISA encoding
changes have turned some of this into a bit of a mess (whereas listing
tables and/or wrappers deal more gracefully with ISA changes).
While seemingly simpler up-front, RISC-V effectively shares more bits
between the immediate fields and opcode.
So, it would look more like:
ZZZZZZZ-ZZZZZ-xxxxx-ZZZ-ZZZZZ-ZZ-ZZZxx
Or, ~ 23 bits needed for opcode.
Some operations are different if X0 is used as Rd, ...
The encoding of many ops in RISC-V seems to treat the fields as sparse
bit-flags, which is not necessarily ideal for either ROM or LUTs.
Encodings seem to be more organized in terms of category than in terms
of how the instructions are encoded.
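For comparison, the standard RISC-V R-type field extraction (this
layout is from the RISC-V spec; for R-type the opcode-relevant bits are
opcode+funct3+funct7):

```c
#include <stdint.h>

/* RISC-V R-type fields, per the spec: opcode[6:0], rd[11:7],
 * funct3[14:12], rs1[19:15], rs2[24:20], funct7[31:25]. */
static uint32_t rv_opcode(uint32_t i) { return  i        & 0x7F; }
static uint32_t rv_rd    (uint32_t i) { return (i >>  7) & 0x1F; }
static uint32_t rv_funct3(uint32_t i) { return (i >> 12) & 0x07; }
static uint32_t rv_rs1   (uint32_t i) { return (i >> 15) & 0x1F; }
static uint32_t rv_rs2   (uint32_t i) { return (i >> 20) & 0x1F; }
static uint32_t rv_funct7(uint32_t i) { return (i >> 25) & 0x7F; }
```

E.g., 0x003100B3 is "ADD x1, x2, x3" (opcode 0x33, funct3 0, funct7 0).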
In my attempt, had used a similar 2-step decoding process for RISC-V
mode as had been used for BJX2.
One can also argue that RISC-V's immediate fields being chewed up may
help reduce the "bit inertia" cost needed when decoding.
Both ISA designs have the annoyance of sign/one/zero extension.
Though, RISC-V primarily uses sign extension, whereas BJX2 mostly uses
zero-extension (with one-extension as a separate case; except for branch
displacements which use "native" sign extension).
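The three extension cases for an n-bit immediate field (n < 32) could
be sketched as:

```c
#include <stdint.h>

/* Sketch of the three extension cases: zero-extend, one-extend, and
 * sign-extend an n-bit immediate field (n < 32). */
static uint32_t zext(uint32_t v, int n) { return v & ((1u << n) - 1); }
static uint32_t oext(uint32_t v, int n) { return v | ~((1u << n) - 1); }
static uint32_t sext(uint32_t v, int n)
{
    uint32_t sign = 1u << (n - 1);   /* sign bit of the field   */
    v &= (1u << n) - 1;              /* keep only the field     */
    return (v ^ sign) - sign;        /* flip-and-subtract trick */
}
```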