Could we build a better 6502?

Thomas Koenig

Jul 23, 2021, 6:59:30 AM

Another direction for retro-architectures... I've been looking
at the 6502 a bit, and it really is quite an interesting design.
Squeezing the functionality of a CPU into ~3500 transistors (plus
~1000 transistors used as resistors) was quite an achievement.

Could we do better knowing what we know now?

"Better" could of course mean different things - more instructions
per cycle, possibility of higher frequency, higher code density,
easier programming (programming the 6502 was not easy, especially
on the C-64 where Commodore had used up almost all of the zero
page for its Basic - I hardly ever used the X register).

The boundary conditions were of course severe. 16 bit address
bus, combined program and data bus of 8 bit. At least memory
was rather fast and could be accessed once per cycle without
problems (and even with the possibility of another, interleaved
access for graphics). Plus, any more transistors were bound to
increase the size and decrease the yield, leading to much
higher cost and erosion of the competitive advantage that the
6502 and its derivatives had at the time.

Quadibloc

Jul 23, 2021, 8:00:15 AM

On Friday, July 23, 2021 at 4:59:30 AM UTC-6, Thomas Koenig wrote:
> Another direction for retro-architectures... I've been looking
> at the 6502 a bit, and it really is quite an interesting design.
> Squeezing the functionality of a CPU into ~3500 transistors (plus
> ~1000 transistors used as resistors) was quite an achievement.

Indeed.

However, even the 6502, let alone the 6800 or the 8080, seemed
to me to have very complicated instruction sets compared to the
PDP-8.

But interestingly enough, the 12-bit SDS 92 had an instruction set
that in many ways resembled those of the early 8-bit micros. (A
manual describing its instruction set is, of course, on Bitsavers.)

John Savard

Marcus

Jul 23, 2021, 8:26:50 AM

On 2021-07-23 12:59, Thomas Koenig wrote:
> Another direction for retro-architectures... I've been looking
> at the 6502 a bit, and it really is quite an interesting design.
> Squeezing the functionality of a CPU into ~3500 transistors (plus
> ~1000 transistors used as resistors) was quite an achievement.

One of my favorite aspects of the 6502/6510 CPU was that the logic
was so "compressed" that they didn't even bother to define the
semantics for undefined opcodes, so people later discovered that
many of the undefined opcodes actually had a useful meaning (while
some were less useful).

This is a very nice document that gives a detailed description of
all of the "illegal" opcodes (or "unintended", as the author calls
them):

https://codebase64.org/lib/exe/fetch.php?media=base:nomoresecrets-nmos6510unintendedopcodes-20202412.pdf

>
> Could we do better knowing what we know now?
>
> "Better" could of course mean different things - more instructions
> per cycle, possibility of higher frequency, higher code density,
> easier programming (programming the 6502 was not easy, especially
> on the C-64 where Commodore had used up almost all of the zero
> page for its Basic - I hardly ever used the X register).

I have a love-hate relationship with the 6502. It's the first machine
that I programmed assembler on (actually I mostly used a machine code
monitor in the beginning), and as you say it's remarkable in many ways.
Yet it's also horrible to code in many ways.

I have sometimes thought about various ways to do a "modern" 6502.

At one end of the spectrum you could try to make a binary compatible
machine that tries to go as far as possible with pipelining and
high clock speeds etc (it's an interesting problem since 6502 was
not designed for pipelining, and ugly things like self-modifying code
were very common - at least on the C=64).

At another end of the spectrum you could try to make a 6502:ish
ISA, but with easier to use addressing modes and registers etc. It
wouldn't be compatible, but it would be interesting to see if you
could design something with similar (low) complexity that's more
programmer friendly.

Thomas Koenig

Jul 23, 2021, 10:00:25 AM

Marcus <m.de...@this.bitsnbites.eu> schrieb:

> On 2021-07-23 12:59, Thomas Koenig wrote:
>> Another direction for retro-architectures... I've been looking
>> at the 6502 a bit, and it really is quite an interesting design.
>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>> ~1000 transistors used as resistors) was quite an achievement.
>
> One of my favorite aspects of the 6502/6510 CPU was that the logic
> was so "compressed" that they didn't even bother to define the
> semantics for undefined opcodes, so people later discovered that
> many of the undefined opcodes actually had a useful meaning (while
> some were less useful).

That is a consequence of their logic - they generated the internal
CPU with a PLA with a lot of "don't care" values. You can see the
PLA on the top of the die shots on http://www.visual6502.org/ .
Apparently, this was quite popular in NMOS, less so in CMOS.
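
A toy model can show how don't-care bits end up giving undefined opcodes a behavior. This is only a sketch: the two product terms and the signal names `load_A` and `load_X` are assumptions made up for illustration, not the real 6502 PLA term list. They are chosen so that the documented opcodes $A5 (LDA zero page) and $A6 (LDX zero page) each fire one term, while the undocumented $A7 (usually called LAX, which loads both A and X) fires both.

```python
# Toy PLA model: each product term is (mask, value, control_signal).
# A term fires when (opcode & mask) == value; bits cleared in the mask
# are "don't care". These terms are illustrative, not the real 6502 PLA.
TERMS = [
    (0xE1, 0xA1, "load_A"),  # pattern 101x xxx1: ignores bit 1
    (0xE2, 0xA2, "load_X"),  # pattern 101x xx1x: ignores bit 0
]


def decode(opcode):
    """Return the set of control signals asserted for an 8-bit opcode."""
    return {signal for mask, value, signal in TERMS
            if (opcode & mask) == value}


if __name__ == "__main__":
    for op in (0xA5, 0xA6, 0xA7):
        print(f"${op:02X}: {sorted(decode(op))}")
```

Nobody had to design the combined behavior: it falls out of two terms that each ignore the bit the other one tests.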

We could probably do better now because our PLA optimization
algorithms are better now than in 1974 (if they even used automated
methods for the design in that particular team, I see no mention
of it anywhere).

[...]

>> Could we do better knowing what we know now?
>>
>> "Better" could of course mean different things - more instructions
>> per cycle, possibility of higher frequency, higher code density,
>> easier programming (programming the 6502 was not easy, especially
>> on the C-64 where Commodore had used up almost all of the zero
>> page for its Basic - I hardly ever used the X register).
>
> I have a love-hate relationship with the 6502. It's the first machine
> that I programmed assembler on (actually I mostly used a machine code
> monitor in the beginning), and as you say it's remarkable in many ways.
> Yet it's also horrible to code in many ways.

That pretty much expresses my feelings, too.

> I have sometimes thought about various ways to do a "modern" 6502.
>
> At one end of the spectrum you could try to make a binary compatible
> machine that tries to go as far as possible with pipelining and
> high clock speeds etc (it's an interesting problem since 6502 was
> not designed for pipelining, and ugly things like self-modifying code
> were very common - at least on the C=64).

You would probably need to run this off 64 kbyte of static RAM
(basically use a modern L1 cache as RAM), or every memory access
would cost a relative eternity by the standards of a modern CPU.

> At another end of the spectrum you could try to make a 6502:ish
> ISA, but with easier to use addressing modes and registers etc. It
> wouldn't be compatible, but it would be interesting to see if you
> could design something with similar (low) complexity that's more
> programmer friendly.

That's what I was thinking of - what could they have built with
a similar transistor number at the time?

One severe problem for code density was 16-bit arithmetic. This was
so bad that Steve Wozniak wrote what we would today call a "Soft CPU"
in 6502 assembler, the https://en.wikipedia.org/wiki/SWEET16 ,
a sixteen-register accumulator architecture. This has to be one
of the shortest CPU emulators in history, with around 300 bytes.
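
To put a rough number on the code density problem, the sketch below tallies the usual 6502 sequence for one 16-bit addition with all operands in the zero page. The operand labels (`lo_a`, `hi_a` and so on) are hypothetical; the byte and cycle figures are the standard NMOS 6502 values for these zero-page instructions.

```python
# (mnemonic, bytes, cycles) for a 16-bit add r = a + b on an NMOS 6502,
# all operands in the zero page. Operand labels are hypothetical.
ADD16_ZP = [
    ("CLC",      1, 2),
    ("LDA lo_a", 2, 3),
    ("ADC lo_b", 2, 3),
    ("STA lo_r", 2, 3),
    ("LDA hi_a", 2, 3),
    ("ADC hi_b", 2, 3),
    ("STA hi_r", 2, 3),
]


def cost(sequence):
    """Return (total_bytes, total_cycles) for an instruction sequence."""
    total_bytes = sum(size for _, size, _ in sequence)
    total_cycles = sum(cycles for _, _, cycles in sequence)
    return total_bytes, total_cycles


if __name__ == "__main__":
    size, cycles = cost(ADD16_ZP)
    print(f"{size} bytes, {cycles} cycles")  # 13 bytes, 20 cycles
```

With absolute instead of zero-page operands, each of the six memory instructions grows by one byte, which is the kind of overhead an interpreted 16-bit instruction set trades speed to avoid.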

So... probably a load-store architecture to save complexity, and
lose the more complicated addressing modes. With an 8 bit machine,
you'd want 8 and 16 bit immediates for load and store, jump and
branch. Opcodes should be limited to single bytes. There are so
few bits in a byte that you cannot have many operations between
registers, so the accumulator architecture stays, but something
like the 8080 or Z80 register pairs could make sense.

You could still have special treatment of the zero page by allowing
load and store absolute with an immediate byte, or relative to
an 8-bit register.

Hm... this would probably look more like an 8080 than a 6502.

MitchAlsup

Jul 23, 2021, 11:21:04 AM

On Friday, July 23, 2021 at 5:59:30 AM UTC-5, Thomas Koenig wrote:
> Another direction for retro-architectures... I've been looking
> at the 6502 a bit, and it really is quite an interesting design.
> Squeezing the functionality of a CPU into ~3500 transistors (plus
> ~1000 transistors used as resistors) was quite an achievement.
>
> Could we do better knowing what we know now?
<

Yes, it was called the PDP-11.

>
> "Better" could of course mean different things - more instructions
> per cycle, possibility of higher frequency, higher code density,
> easier programming (programming the 6502 was not easy, especially
> on the C-64 where Commodore had used up almost all of the zero
> page for its Basic - I hardly ever used the X register).
>
> The boundary conditions were of course severe. 16 bit address
> bus, combined program and data bus of 8 bit.
<

The bus needs to be widened.

Quadibloc

Jul 23, 2021, 12:08:16 PM

On Friday, July 23, 2021 at 9:21:04 AM UTC-6, MitchAlsup wrote:
> On Friday, July 23, 2021 at 5:59:30 AM UTC-5, Thomas Koenig wrote:
> > Another direction for retro-architectures... I've been looking
> > at the 6502 a bit, and it really is quite an interesting design.
> > Squeezing the functionality of a CPU into ~3500 transistors (plus
> > ~1000 transistors used as resistors) was quite an achievement.

> > Could we do better knowing what we know now?

> Yes, it was called the PDP-11.

The PDP-11 used even fewer transistors?

I could imagine the PDP-8 would, as it was *very* simple.

John Savard

EricP

Jul 23, 2021, 12:33:05 PM

Thomas Koenig wrote:
> Marcus <m.de...@this.bitsnbites.eu> schrieb:
>> On 2021-07-23 12:59, Thomas Koenig wrote:
>>> Another direction for retro-architectures... I've been looking
>>> at the 6502 a bit, and it really is quite an interesting design.
>>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>>> ~1000 transistors used as resistors) was quite an achievement.
>> One of my favorite aspects of the 6502/6510 CPU was that the logic
>> was so "compressed" that they didn't even bother to define the
>> semantics for undefined opcodes, so people later discovered that
>> many of the undefined opcodes actually had a useful meaning (while
>> some were less useful).
>
> That is a consequence of their logic - they generated the internal
> CPU with a PLA with a lot of "don't care" values. You can see the
> PLA on the top of the die shots on http://www.visual6502.org/ .
> Apparently, this was quite popular in NMOS, less so in CMOS.

In nmos they could build equivalent to wired-OR logic with a
depletion mode pull-up (effectively a resistor pulling the wire high)
and a pull down for each input signal. Essentially each logic input
term to a large fan-in NAND (a, b, c, ...) cost just 1 transistor.

It wastes power, but it works.

Static CMOS doesn't have any equivalent to wire-OR so there is little
saving making static PLAs. Instead PLAs are built out of dynamic logic.
But I don't see them explicitly referenced in designs much
so I think the tricky dynamic aspect keeps designers away.
Also tools like Verilog don't generate PLAs.

> We could probably do better now because our PLA optimization
> algorithms are better now than in 1974 (if they even used automated
> methods for the design in that particular team, I see no mention
> of it anywhere).

IIRC 6502 used a PLA for the first part of the decoding and sequencing,
but also had gobs of random logic after it.

EricP

Jul 23, 2021, 12:58:56 PM

Also the whole processor would have been designed using brain power
on paper and pencil. There were no CAD development tools.

Then a prototype is wired up, probably using TTL on wire-wrap boards,
and debugged using oscilloscopes.

Then that logic is translated by hand to individual transistors,
again using pencil and paper.
Then a second prototype is wired up using individual transistors
and debugged using oscilloscopes.

Then the transistor level design is translated by hand to
giant colored-tape-on-clear-mylar sheets, about 6 ft by 6 ft.
One sheet for each mask layer.

Oh and you only have 2 layers of interconnect, one polysilicide
and one metal. Use them wisely.

These masks are then photo reduced to produce the reticle masks.

Then you can make your prototype IC and 8-12 weeks later
you can start testing it to try to figure out the mistakes
you made in any of the above.

Guillaume

Jul 23, 2021, 1:16:39 PM

Le 23/07/2021 à 12:59, Thomas Koenig a écrit :
> Another direction for retro-architectures... I've been looking
> at the 6502 a bit, and it really is quite an interesting design.
> Squeezing the functionality of a CPU into ~3500 transistors (plus
> ~1000 transistors used as resistors) was quite an achievement.
>
> Could we do better knowing what we know now?

If you mean with the same number of transistors, it would be very hard
to do much better.

I guess using a completely different architecture, based on a very
simple RISC instruction set, you might be able to do something a bit
more "powerful" with the same amount of logic, and being able to run at
much higher frequencies even on old processes. But using such a CPU with
simple dev tools as they were available at the time would be very
clunky. Only decent compilers can make efficient use of such simple RISC
architectures while not making the developers' life a hell. I guess.

MitchAlsup

Jul 23, 2021, 1:37:51 PM

Err, no........
All of the 68K family (at least the ones before 040) used PLAs.
The 88Ks used a PLA--well, actually ½ a PLA; we used a single NOR
plane as the decoder in 88100.
Lots of other designs used PLAs or NOR planes because they were
DENSE--as dense as DRAMs with mask programmable bit patterns.
I bet most of the microprocessors, prior to the RISC revolution, used
PLAs (or NOR planes).
<
One of the clever things 68020 did was to place an XOR gate between
the two NOR planes of the PLA which vastly increases the kinds of
patterns one could decode !!

<
> Also tools like Verilog don't generate PLAs.
<

NO but microcode assemblers do !

MitchAlsup

Jul 23, 2021, 1:43:32 PM

Silicide did not occur until 2µ and we are talking about the 5µ-3µ era.
But then again, logic speeds were not high enough that silicide bought
much gain (it was instrumental in getting 68020 to 33 MHz BTW)....

>
> These masks are then photo reduced to produce the reticle masks.
>
> Then you can make your prototype IC and 8-12 weeks later
> you can start testing it to try to figure out the mistakes
> you made in any of the above.
<

In the 1.5µ era we went from full layer tapeout, to prototype wafers
in 35 hours..........it was a rush job, and we had a FAB guy walk the
wafers between stations so there was no manufacturing latency,
only the latency of each job step.

MitchAlsup

Jul 23, 2021, 1:48:10 PM

On Friday, July 23, 2021 at 12:16:39 PM UTC-5, Guillaume wrote:
> Le 23/07/2021 à 12:59, Thomas Koenig a écrit :
> > Another direction for retro-architectures... I've been looking
> > at the 6502 a bit, and it really is quite an interesting design.
> > Squeezing the functionality of a CPU into ~3500 transistors (plus
> > ~1000 transistors used as resistors) was quite an achievement.
> >
> > Could we do better knowing what we know now?
<
> If you mean with the same number of transistors, that would be very hard
> to do really better.
<

It would be impossible today. 3× transistors is the minimum you could even
conceive of !!
<
Consider the select line driver on the 68020:: This is a logic block that
took a lightly loaded control signal, a clock, and drove a select line
across a 32-bit data path with 32 loads on this wire. Today we call this
an AND-clock. We built these with 3 transistors !! and one inversion.
I can't imagine you could build this with fewer than 12 transistors today
and 2 logic inversions.

>
> I guess using a completely different architecture, based on a very
> simple RISC instruction set, you might be able to do something a bit
> more "powerful" with the same amount of logic, and being able to run at
> much higher frequencies even on old processes. But using such a CPU with
> simple dev tools as they were available at the time would be very
> clunky. Only decent compilers can make efficient use of such simple RISC
> architectures while not making the developers' life a hell. I guess.
<

In other words--don't bother.

Thomas Koenig

Jul 23, 2021, 2:02:19 PM

MitchAlsup <Mitch...@aol.com> schrieb:

> On Friday, July 23, 2021 at 5:59:30 AM UTC-5, Thomas Koenig wrote:
>> Another direction for retro-architectures... I've been looking
>> at the 6502 a bit, and it really is quite an interesting design.
>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>> ~1000 transistors used as resistors) was quite an achievement.
>>
>> Could we do better knowing what we know now?
><
> Yes, it was called the PDP-11.

The PDP-11 had considerably more transistors, looking at the die
shot of the https://en.wikipedia.org/wiki/DEC_T-11 vs. the die
shot at https://en.wikipedia.org/wiki/MOS_Technology_6502 will
tell you (or a look at https://en.wikipedia.org/wiki/DEC_J-11
from 1979).

Totally impossible to start the home computer revolution with
such a chip launched in 1975, whose main selling point was that
it undercut the more 6809 on price.

No 6502, no Apple I or II, no C-64.

>> "Better" could of course mean different things - more instructions
>> per cycle, possibility of higher frequency, higher code density,
>> easier programming (programming the 6502 was not easy, especially
>> on the C-64 where Commodore had used up almost all of the zero
>> page for its Basic - I hardly ever used the X register).
>>
>> The boundary conditions were of course severe. 16 bit address
>> bus, combined program and data bus of 8 bit.
><
> The bus needs to be widened.

And so they did, with their 16-bit chips. For the embedded market
in the mid-1970s that they originally developed the chip for,
this was out of the question.

Thomas Koenig

Jul 23, 2021, 2:08:13 PM

EricP <ThatWould...@thevillage.com> schrieb:

> Thomas Koenig wrote:

> Static CMOS doesn't have any equivalent to wire-OR so there is little
> saving making static PLAs. Instead PLAs are built out of dynamic logic.
> But I don't see them explicitly referenced in designs much
> so I think the tricky dynamic aspect keeps designers away.
> Also tools like Verilog don't generate PLAs.

If you want to, it is quite possible to use Espresso (see
https://github.com/classabbyamp/espresso-logic) to generate a Verilog
or VHDL truth table (you may have to edit the formulas, but that
is easily doable by using a small sed script).
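
As a rough illustration of that post-processing step (written in Python here rather than sed), the sketch below turns cubes in the Berkeley PLA text format that Espresso reads and writes into Verilog continuous assignments. The example table and the signal names `a`, `b`, `c` and `f` are made up for illustration; the table is hand-written, not actual tool output.

```python
# Hand-written example in Berkeley PLA format: 3 inputs, 1 output, 2 cubes.
EXAMPLE = """\
.i 3
.o 1
.p 2
1-0 1
011 1
.e
"""


def pla_to_verilog(pla_text, inputs, outputs):
    """Convert PLA-format cubes into Verilog sum-of-products assignments."""
    terms = {name: [] for name in outputs}
    for line in pla_text.splitlines():
        line = line.strip()
        if not line or line.startswith((".", "#")):
            continue  # skip directives (.i, .o, .p, .e) and comments
        in_part, out_part = line.split()
        # '1' -> plain literal, '0' -> inverted literal, '-' -> don't care
        literals = [name if bit == "1" else f"~{name}"
                    for name, bit in zip(inputs, in_part) if bit != "-"]
        product = "(" + " & ".join(literals) + ")" if literals else "1'b1"
        for name, bit in zip(outputs, out_part):
            if bit == "1":
                terms[name].append(product)
    result = []
    for name in outputs:
        rhs = " | ".join(terms[name]) if terms[name] else "1'b0"
        result.append(f"assign {name} = {rhs};")
    return result


if __name__ == "__main__":
    for statement in pla_to_verilog(EXAMPLE, ["a", "b", "c"], ["f"]):
        print(statement)
```

A synthesis tool will map such equations to ordinary gates, not to a physical PLA structure, so this only carries over the minimized logic.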

>
>> We could probably do better now because our PLA optimization
>> algorithms are better now than in 1974 (if they even used automated
>> methods for the design in that particular team, I see no mention
>> of it anywhere).
>
> IIRC 6502 used a PLA for the first part of the decoding and sequencing,
> but also had gobs of random logic after it.

Easily identifiable on the chip, yep :-)

EricP

Jul 23, 2021, 2:11:31 PM

The cpu's I worked with were the Motorola 6800, Intel 8080,
and CMOS RCA 1802. Electronics for 1802 was the
easiest to design with but it had a horrible ISA.

I had a paper-design hobby project a while back to design the best
ISA possible within the 1802's budget of 5500 CMOS transistors.
Gates have a maximum of 3 inputs, and there are 2 layers of interconnect.

My design had 2 accumulators, A and B, a Flag register,
8 16-bit address/data registers, a link register for Branch And Link,
a link register for interrupted IP, and an IP register.
In the Mark-I model A and B were 8 bits. In the Mark-II they were 16 bits.

Internally there was a 16 bit temp register for loading an immediate,
a Memory Address Register (MAR), ALU input registers,
and an 8-bit instruction register.

Instruction opcodes are 8 bits, with 1 or 2 bytes of immediate,
and can do 8 and a few 16 bit operations.
Two accumulators means operate opcodes are of the form A = A op B
eliminating register fields for most instructions.
The rest of the instructions have a single 3-bit register field.

The decoder was one 2:4 and two 3:8 decoders (remember, gates are
limited to 3 inputs so no 4:16 decoders) followed by random logic,
and the instruction sequencer a 16 state Johnson counter.
ALU performed standard operations but also did 16-bit
increment, decrement, and add.
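
A behavioral sketch of the decoder and sequencer described above. The post does not say which opcode bits feed which decoder, so the 2/3/3 split used here (top two bits, middle three, low three) is an assumption, as are all function names. A Johnson counter with n stages walks through 2n states, so 16 states take 8 flip-flops.

```python
def split_opcode(opcode):
    """Split an 8-bit opcode into 2-, 3- and 3-bit fields (assumed layout)."""
    return (opcode >> 6) & 0x3, (opcode >> 3) & 0x7, opcode & 0x7


def one_hot(value, width):
    """Model an n:2**n decoder: exactly one output line is high."""
    return [1 if i == value else 0 for i in range(1 << width)]


def johnson_states(stages):
    """Yield the 2*stages successive states of a Johnson counter."""
    state = [0] * stages
    for _ in range(2 * stages):
        yield tuple(state)
        # Shift by one and feed the inverted last bit into the first stage.
        state = [1 - state[-1]] + state[:-1]


if __name__ == "__main__":
    group, middle, low = split_opcode(0b10101011)
    print(one_hot(group, 2), one_hot(middle, 3), one_hot(low, 3))
    for state in johnson_states(8):
        print("".join(map(str, state)))
```

Successive Johnson states differ in a single bit, and any one state can be recognized by looking at just two adjacent stages, which suits a design where gates are limited to 3 inputs.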

There is a 16 bit bus running from the register file to MAR, ALU
and data bus interface, and a 16 bit result bus that runs back to
the register file but also to MAR, so a register can be read into MAR,
incremented, and sent back to the register file in 1 cycle.
16 bit adds took 2 cycles.

Bus timing had its own sequencer logic (just a few FF)
so to do a bus read or write one stuffed a value in the MAR and
pulsed the bus sequencer. That shut off the instruction sequencer
until the bus cycle finished and presented the data value on the
internal data bus, and then re-enabled the main sequencer.

Thomas Koenig

Jul 23, 2021, 2:15:11 PM

Guillaume <mes...@bottle.org> schrieb:

> Le 23/07/2021 à 12:59, Thomas Koenig a écrit :
>> Another direction for retro-architectures... I've been looking
>> at the 6502 a bit, and it really is quite an interesting design.
>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>> ~1000 transistors used as resistors) was quite an achievement.
>>
>> Could we do better knowing what we know now?
>
> If you mean with the same number of transistors, that would be very hard
> to do really better.
>
> I guess using a completely different architecture, based on a very
> simple RISC instruction set, you might be able to do something a bit
> more "powerful" with the same amount of logic, and being able to run at
> much higher frequencies even on old processes.

The Nova paved the way for that, I think - it was very good on the
price / performance curve for its day.

I have not found a transistor count for the Nova, unfortunately, and
the schematics on Bitsavers give just codes for the ICs that they
built in that I am unable to decipher.

> But using such a CPU with
> simple dev tools as they were available at the time would be very
> clunky.

Ugh...

I have programmed the 6502 in assembler, it was a PITA. A few
more registers would have done wonders and would have saved me a
few torn-out hairs (but I had enough in those days).

The first time I browsed through a handbook on the 68000, I thought
"This is not assembler, this is a high-level language!"

> Only decent compilers can make efficient use of such simple RISC
> architectures while not making the developers' life a hell. I guess.

Hardly a decent compiler around for the 6502 in those days, at
least I knew none. It was Basic or Assembler (or machine code
via a monitor, with a hard reset if you made a programming mistake).

MitchAlsup

Jul 23, 2021, 2:24:41 PM

And they took most of it from the PDP-11

Thomas Koenig

Jul 23, 2021, 3:17:58 PM

MitchAlsup <Mitch...@aol.com> schrieb:

> On Friday, July 23, 2021 at 1:15:11 PM UTC-5, Thomas Koenig wrote:

>> The first time I browsed through a handbook on the 68000, I thought
>> "This is not assembler, this is a high-level language!"
><
> And they took most of it from the PDP-11

Certainly a lot.

What mostly dazzled me, though, was the sheer number and
width of registers.

For somebody who had only seen 6502 and Z-80 up to that point,
that was quite an experience.

Of course, there was also some frustration involved. I had done
some Mandelbrot calculations (who hadn't :) and sped things up by
a factor of 10 or more by using the floating point routines
in the C-64's BASIC interpreter.

A friend, who had one of the first Amigas, was _much_ faster
just using BASIC. I then gave up on 6502 assembler.

Timothy McCaffrey

Jul 23, 2021, 3:27:41 PM

16 bit Stack pointer.

Some optimizations that found their way into follow on products (I think these are documented on the Wikipedia page).
IIRC, there was a sequencer optimization to one of the addressing modes that sped things up by ~15%.

16 bit X & Y would have been great, especially if they supported some 16 bit math.
Get rid of the (mostly) unused addressing modes.
Direct Page register like the 6809.

I really liked the ISA of the General Instrument CP1600, but I cannot find out what the transistor count was.
(Except for the stupid encoding which wasted 4 bits for every 16 bits of memory).

- Tim

Timothy McCaffrey

Jul 23, 2021, 3:29:27 PM

On Friday, July 23, 2021 at 2:02:19 PM UTC-4, Thomas Koenig wrote:

> Totally impossible to start the home computer revolution with
> such a chip launched in 1975, whose main selling point was that
> it undercut the more 6809 on price.
>

I think you meant 6800 (or 8080), 6809 didn't show up until several years later
(with 3x the number of transistors).

- Tim

JimBrakefield

Jul 23, 2021, 3:41:25 PM

On Friday, July 23, 2021 at 5:59:30 AM UTC-5, Thomas Koenig wrote:

There are several open source attempts at 6502 with more 16-bit capabilities.
Only one that attempts 64-bit capability:
http://www.6502.org/users/andre/65k/index.html
It does not fare well in FPGA LUT counts, about a 4X to 8X increase over 6502 open source designs.
But then you are going from an 8-bit ALU to a 64-bit ALU. Perhaps a two clock
32-bit ALU would be best?

Timothy McCaffrey

Jul 23, 2021, 3:45:47 PM

On Friday, July 23, 2021 at 2:15:11 PM UTC-4, Thomas Koenig wrote:

> Hardly a decent compiler around for the 6502 in those days, at
> least I knew none. It was Basic or Assembler (or machine code
> via a monitor, with a hard reset if you made a programming mistake).

The only one I knew of was UCSD Pascal, which used the P-System
interpreter.

- Tim

George Neuner

Jul 23, 2021, 4:10:54 PM

On Fri, 23 Jul 2021 18:15:09 -0000 (UTC), Thomas Koenig
<tko...@netcologne.de> wrote:

>I have programmed the 6502 in assembler, it was a PITA. A few
>more registers would have done wonders and would have saved me a
>few torn-out hairs (but I had enough in those days).

I also programmed 6502 in assembler. I didn't find it particularly
troublesome, but no doubt it depended on what you were trying to do.

>Hardly a decent compiler around for the 6502 in those days, at
>least I knew none. It was Basic or Assembler (or machine code
>via a monitor, with a hard reset if you made a programming mistake).

Manx, DeSmet, and Orca all had native 6502 C compilers. Orca also had
a native Pascal compiler.

UCSD and Apple both had pcode Pascal compilers. I never tried the
Apple compiler, but UCSD was pretty useless.

I used the Manx C compilers on Apple //e. There was both a native and
a bytecode compiler [their own, not based on UCSD pcode or Sweet16].
You could mix native and bytecode in the same program, and even write
in 'assembler' for the bytecode interpreter. Manx supported nested
code overlays, and could use the extra 16KB "language" card on the
Apple ][, and the extra 64KB available on the Apple //e and //c.

With careful coding, it was possible to pack a whole lot of function
into a C program ... at least on the 128KB Apple //e or //c ... and
the resulting program could be made fast enough to use.

YMMV,
George

Anssi Saari

Jul 23, 2021, 4:21:37 PM

Thomas Koenig <tko...@netcologne.de> writes:

> Another direction for retro-architectures... I've been looking
> at the 6502 a bit, and it really is quite an interesting design.
> Squeezing the functionality of a CPU into ~3500 transistors (plus
> ~1000 transistors used as resistors) was quite an achievement.
>
> Could we do better knowing what we know now?

Wasn't it already done with the 65816 or perhaps more interestingly the
CSG4510? That was intended to go in the Commodore 65 which never
happened but lives on in the FPGA-based MEGA65 project. Information is a
little hard to find for the moment though but there's some documentation
from the MEGA65 project at
https://files.mega65.org/manuals-upload/mega65-chipset-reference.pdf

AFAIK, mostly the 4510 improved on the 6502 on speed, some instructions
executed faster and some new ones were introduced. I have no idea of the
transistor count.

John Dallman

Jul 23, 2021, 5:36:33 PM

In article <sde7eg$hgb$1...@newsreader4.netcologne.de>,
tko...@netcologne.de (Thomas Koenig) wrote:

> ... programming the 6502 was not easy, especially on the C-64
> where Commodore had used up almost all of the zero page for
> its Basic - I hardly ever used the X register.

It gets easier with practice. The style of assembler code that works well
is very different from normal Z80 or x86 styles. The 6502 is quite good
at running interpreters, but compiling HLLs for it is horrible. I had a
couple of years of employment writing CAD for 6502 in the mid-eighties,
and we implemented an interpreter for a virtual machine that did the
floating point.

The early ARMs were inspired by the 6502, but only in terms of being
simple and avoiding microcode.

John

Andy Valencia

Jul 23, 2021, 7:53:55 PM

Timothy McCaffrey <timca...@aol.com> writes:
> I think you meant 6800 (or 8080), 6809 didn't show up until several years later
> (with 3x the number of transistors).

I was trying to design page faults for the 6809. Exercise for the reader:
What is the worst case number of page faults which need to be resolved
to complete a single instruction?

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

Quadibloc

Jul 24, 2021, 1:56:08 AM

On Friday, July 23, 2021 at 2:21:37 PM UTC-6, Anssi Saari wrote:

> Wasn't it already done with the 65816

That is a better 6502, but the way he phrased his post, I thought
he meant a more optimized 6502, one that fit the 6502 into an
even smaller package with fewer logic gates.

That is something I don't think we could improve on much, if at
all... nor is there any motivation to make the effort to try.

John Savard

Thomas Koenig

Jul 24, 2021, 3:48:00 AM

Quadibloc <jsa...@ecn.ab.ca> schrieb:

> On Friday, July 23, 2021 at 2:21:37 PM UTC-6, Anssi Saari wrote:
>
>> Wasn't it already done with the 65816
>
> That is a better 6502, but the way he phrased his post, I thought
> he meant a more optimized 6502, one that fit the 6502 into an
> even smaller package with fewer logic gates.

It was more along the lines of "what could be the best
microprocessor that could have fitted the transistor and 8 bit
bus limit", with "best" having different quality metrics,
of course.

Anton Ertl

Jul 24, 2021, 7:14:53 AM

Thomas Koenig <tko...@netcologne.de> writes:
>Another direction for retro-architectures... I've been looking
>at the 6502 a bit, and it really is quite an interesting design.
>Squeezing the functionality of a CPU into ~3500 transistors (plus
>~1000 transistors used as resistors) was quite an achievement.

[Additional boundary conditions from further down:]

|16 bit address bus, combined program and data bus of 8 bit.

>Could we do better knowing what we know now?

I think so.

Maybe something like the small variant of b16 (first called b16-small,
later renamed into b16, while the original (large) b16 was renamed
into b16-dsp):

https://bernd-paysan.de/b16-presentation.pdf

Having the stacks on-chip would be great, but probably does not fit
the transistor budget, so one would only keep TOS and P on-chip, and
replace the rest with stack pointers (maybe 5 bits each) and one
16-bit buffer (not per stack) for keeping one other stack item after
loading it or before storing it.

The b16 feature of having 3+ instructions (without immediates) per
16-bit word could be good for speed, but requires a 16-bit shift
register for holding the instruction, which we may not want due to its
transistor cost. The alternative is to have a 5-bit and a 3-bit
instruction in an 8-bit byte. One would have to analyse the usage to
decide which instructions to make available through the three bits.
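
A minimal sketch of the 5+3 split, in case it helps to see it. The slot order (5-bit slot in the high bits, executed first) and all names here are my assumptions for illustration, not anything taken from the b16:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 7..3 hold a 5-bit opcode (any of 32
   instructions), bits 2..0 hold a 3-bit opcode (one of the 8
   instructions chosen by usage analysis). Slot order is an assumption. */
typedef struct {
    uint8_t first;   /* 5-bit slot, 0..31 */
    uint8_t second;  /* 3-bit slot, 0..7  */
} slot_pair;

static uint8_t pack_slots(uint8_t first, uint8_t second)
{
    assert(first < 32 && second < 8);
    return (uint8_t)((first << 3) | second);
}

static slot_pair unpack_slots(uint8_t byte)
{
    slot_pair p;
    p.first  = (uint8_t)(byte >> 3);
    p.second = (uint8_t)(byte & 0x07u);
    return p;
}
```

Only 8 of the 32 instructions can go in the second slot, so the payoff depends entirely on how often those 8 actually follow something else.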

The b16 design uses a 16-bit ALU. One can replace that with several
passes through an 8-bit or 4-bit ALU, at an increase in control logic.
Not sure if that would pay off wrt transistors. The first Nova
certainly took the 4-bit-ALU approach. Given that you need a two-pass
approach for 16-bit memory accesses anyway, the additional cost for a
two-pass approach through an 8-bit ALU may be minor.

I don't think that the b16 has interrupts, so you would need to add
that to be on feature-parity with the 6502.

>"Better" could of course mean different things - more instructions
>per cycle, possibility of higher frequency, higher code density,
>easier programming (programming the 6502 was not easy, especially
>on the C-64 where Commodore had used up almost all of the zero
>page for its Basic - I hardly ever used the X register).

About half of the zero page was used for the kernal (OS), and yes,
BASIC used the other half. But if you did not call into BASIC, you
could use that other half. The X register can be used with 16-bit
base addresses, so even if you don't use the zero page, X is useful.
I did find the (...,X) addressing mode useless, though (even though I
had enough zero-page places for my usage).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Anton Ertl

Jul 24, 2021, 11:50:31 AM

Guillaume <mes...@bottle.org> writes:
>I guess using a completely different architecture, based on a very
>simple RISC instruction set, you might be able to do something a bit
>more "powerful" with the same amount of logic, and being able to run at
>much higher frequencies even on old processes. But using such a CPU with
>simple dev tools as they were available at the time would be very
>clunky. Only decent compilers can make efficient use of such simple RISC
>architectures while not making the developers' life a hell. I guess.

You guess wrong. RISCs are easy to program in assembly. It's more
the other way round: Assembly programmers can cope with irregular
register sets like the one of the 6502, while compilers have a hard
time with them; they benefit from the regularity of register machines,
both CISC (e.g., PDP-11, VAX, AMD64), and RISC (all of them).

Concerning a "very simple RISC instruction set", the NOVA has a
load/store architecture. The National Semiconductor IPC-16A/520 PACE
seems to be the first single-chip implementation of an instruction set
similar to the Nova (and the first single-chip 16-bit CPU), announced
in 1974 <https://www.cpu-world.com/CPUs/PACE/index.html>, so maybe
with a similar transistor budget as the 6502. It took 10us per
instruction on average (1MHz 6502 2-7us), somewhat negating the 16-bit
advantage. That's apparently also the case for the other
Nova-inspired microprocessors (microNova and Fairchild 9440), and is
probably the reason why they were not successful; it's unclear to me
why they came out this way, but if it is to be a "better 6502", it
must be avoided.

Concerning the "much higher frequencies", a 6502 in a computer of its
time does not really benefit from a higher clock frequency, because it
accesses memory in most cycles, so a higher-clocked 6502 would
effectively run at the same speed.

So if we want to design something better, we need to design something
that needs fewer memory accesses. The 6502 needs many instructions
and many memory accesses because it has so few registers. It also
needs many instructions because one needs to synthesize 16-bit
operations from several 8-bit instructions.

OTOH, unlike load/store architectures the 6502 has load-op
instructions (and even a few read-modify-write instructions) that
reduce the number of instructions executed compared to an otherwise
similar load/store instruction set.

Still, if I wanted to rise to the challenge, my first pick would be a
16-bit load/store architecture, with as many registers as fit in the
transistor budget. Use an 8-bit ALU to save transistors (you already
need the sequencing logic for 16-bit memory accesses anyway); or
alternatively, use a 16-bit ALU, and perform 8-bit loads and 8-bit
stores programmatically to avoid the sequencing.
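
To make the 8-bit-ALU option concrete, here is a rough sketch of a 16-bit add sequenced as two passes through an 8-bit adder. The function names are made up, and this models only the arithmetic, not the control logic that would cost the transistors:

```c
#include <assert.h>
#include <stdint.h>

/* One pass through a hypothetical 8-bit adder: returns the 8-bit sum
   and replaces *carry (0 or 1) with the carry out. */
static uint8_t alu8_add(uint8_t a, uint8_t b, unsigned *carry)
{
    assert(*carry <= 1);
    unsigned sum = (unsigned)a + b + *carry;
    *carry = sum >> 8;
    return (uint8_t)sum;
}

/* 16-bit add: low bytes first, then high bytes plus the carry
   produced by the first pass. */
static uint16_t add16_two_pass(uint16_t x, uint16_t y)
{
    unsigned carry = 0;
    uint8_t lo = alu8_add((uint8_t)(x & 0xFF), (uint8_t)(y & 0xFF), &carry);
    uint8_t hi = alu8_add((uint8_t)(x >> 8), (uint8_t)(y >> 8), &carry);
    return (uint16_t)((hi << 8) | lo);
}
```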

My guess is that 3 general-purpose registers should fit (can be
addressed in 2 bits, and have one option for zero/discard-result).
That's of course very tight, especially because you have no additional
stack pointer. If we can fit more registers, it would be better.
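
One way to read "3 registers in 2 bits plus zero/discard", sketched under my own assumptions (field value 0 is the zero/discard encoding, 1..3 are the real registers; the names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: field values 1..3 select r1..r3,
   field value 0 reads as zero and discards writes. */
typedef struct {
    uint16_t r[4];   /* r[0] is never used for storage */
} regfile;

static uint16_t reg_read(const regfile *rf, unsigned field)
{
    assert(field < 4);
    return field == 0 ? 0 : rf->r[field];
}

static void reg_write(regfile *rf, unsigned field, uint16_t value)
{
    assert(field < 4);
    if (field != 0)
        rf->r[field] = value;
}
```

With a discard destination, a compare can be a subtract that writes to field 0, which saves an opcode.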

Instructions should fit into 8 bits (plus immediate operands/offsets),
so we can afford only one or two register addresses in each
instruction. Would be an interesting exercise for the instruction-set
designers here. Auto-increment/decrement would be cool, but probably
does not fit in the instruction size or transistor size limit.

Calls store the return address in a register, and return is an
indirect jump through that register.

An extra special-purpose register would be needed for holding the
interrupt return address.

Anton Ertl

Jul 24, 2021, 11:51:53 AM

MitchAlsup <Mitch...@aol.com> writes:
>On Friday, July 23, 2021 at 1:15:11 PM UTC-5, Thomas Koenig wrote:
>> The first time I browsed through the handbook on the 68000, I thought
>> "This is not assembler, this is a high-level language!"
><
>And they took most of it from the PDP-11

Except the best feature of the PDP-11: general-purpose registers.

Anton Ertl

Jul 24, 2021, 11:53:11 AM

j...@cix.co.uk (John Dallman) writes:
>The early ARMs were inspired by the 6502, but only in terms of being
>simple and avoiding microcode.

But the 6502 was microcoded (with the microprogram encoded in the
PLA).

MitchAlsup

Jul 24, 2021, 1:17:19 PM

On Saturday, July 24, 2021 at 10:51:53 AM UTC-5, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Friday, July 23, 2021 at 1:15:11 PM UTC-5, Thomas Koenig wrote:
> >> The first time I browsed through the handbook on the 68000, I thought
> >> "This is not assembler, this is a high-level language!"
> ><
> >And they took most of it from the PDP-11
> Except the best feature of the PDP-11: general-purpose registers.
<

They thought they were doing better with A and D registers.

EricP

Jul 24, 2021, 1:19:57 PM

MitchAlsup wrote:
> On Friday, July 23, 2021 at 11:33:05 AM UTC-5, EricP wrote:
>>
>> Static CMOS doesn't have any equivalent to wire-OR so there is little
>> saving making static PLAs. Instead PLAs are built out of dynamic logic.
>> But I don't see them explicitly referenced in designs much
>> so I think the tricky dynamic aspect keeps designers away.
> <
> Err, no........
> All of the 68K family (at least the ones before 040) used PLAs.
> The 88Ks used a PLA--well, actually ½ a PLA; we used a single NOR
> plane as the decoder in 88100.
> Lots of other designs used PLAs or NOR planes because they were
> DENSE--as dense as DRAMs with mask programmable bit patterns.
> I bet most of the microprocessors, prior to the RISC revolution, used
> PLAs (or NOR planes).
> <
> One of the clever things 68020 did was to place an XOR gate between
> the two NOR planes of the PLA which vastly increases the kinds of
> patterns one could decode !!
> <

Yes but those were 30 to 35 years ago.
I haven't seen PLA's mentioned in a design in at least 20 years,
though I would imagine that Intel and AMD use them for the decoders.

The 88100's single NOR plane would have less strict timing than a
PLA with two planes. From what I have read, ideally the first plane
drives the second directly - one doesn't want a layer of latches between.
However before the first plane signals stabilize, they can glitch the
second plane and cause it to erroneously discharge.
I can see this would be finicky circuitry and possibly susceptible
to timing changes due to process variation.

I looked at the 88100 instruction encodings -
I still haven't figured out how you got away with just 1 NOR plane.

>> Also tools like Verilog don't generate PLAs.
> <
> NO but microcode assemblers do !
> <

Yes but how would one do a simulation test of a whole design?

I suppose the PLA assembler could generate a Verilog UNIQUE CASEX statement
with ? don't cares which implements the equivalent logic to the PLA.
That could be used to test the logic functionality but not the timing.
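
For what it's worth, the same functional model is easy to write down in C: each product term is a care mask plus a value (the don't-cares are the cleared mask bits), and the second plane ORs together the outputs of every term that matches. This is a sketch with a made-up table and made-up names, and like the CASEX idea it says nothing about timing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One product term: the input matches when (input & care) == value.
   Bits cleared in 'care' are don't-cares. 'outputs' lists the output
   lines this term drives in the second plane. */
typedef struct {
    uint8_t care;
    uint8_t value;
    uint8_t outputs;
} pla_term;

/* Evaluate both planes: find the matching terms, then OR together
   the outputs of every term that matched. */
static uint8_t pla_eval(const pla_term *terms, size_t n, uint8_t input)
{
    assert(terms != NULL || n == 0);
    uint8_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if ((input & terms[i].care) == terms[i].value)
            out |= terms[i].outputs;
    }
    return out;
}

/* Made-up example table, not taken from any real decoder. */
static const pla_term example_terms[] = {
    { 0xE0, 0xA0, 0x01 },  /* 101x xxxx -> output line 0 */
    { 0x03, 0x01, 0x02 },  /* xxxx xx01 -> output line 1 */
};
```

Note that several terms may match at once here, which a UNIQUE CASEX would flag; a generated case statement would need the terms merged per output first.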

John Levine

Jul 24, 2021, 2:06:33 PM

According to Quadibloc <jsa...@ecn.ab.ca>:

>On Friday, July 23, 2021 at 4:59:30 AM UTC-6, Thomas Koenig wrote:
>> Another direction for retro-architectures... I've been looking
>> at the 6502 a bit, and it really is quite an interesting design.
>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>> ~1000 transistors used as resistors) was quite an achievement.
>

>Indeed.
>
>However, even the 6502, let alone the 6800 or the 8080, seemed
>to me to have very complicated instruction sets compared to the
>PDP-8.

The original PDP-8 had only 1409 transistors, each one in a separate can,
so it's not surprising. The first computer I programmed was a PDP-8 and
it was a fantastically well-done tradeoff between extreme simplicity and
usability.

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

MitchAlsup

Jul 24, 2021, 2:11:25 PM

On Saturday, July 24, 2021 at 12:19:57 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Friday, July 23, 2021 at 11:33:05 AM UTC-5, EricP wrote:
> >>
> >> Static CMOS doesn't have any equivalent to wire-OR so there is little
> >> saving making static PLAs. Instead PLAs are built out of dynamic logic.
> >> But I don't see them explicitly referenced in designs much
> >> so I think the tricky dynamic aspect keeps designers away.
> > <
> > Err, no........
> > All of the 68K family (at least the ones before 040) used PLAs.
> > The 88Ks used a PLA--well, actually ½ a PLA; we used a single NOR
> > plane as the decoder in 88100.
> > Lots of other designs used PLAs or NOR planes because they were
> > DENSE--as dense as DRAMs with mask programmable bit patterns.
> > I bet most of the microprocessors, prior to the RISC revolution, used
> > PLAs (or NOR planes).
> > <
> > One of the clever things 68020 did was to place an XOR gate between
> > the two NOR planes of the PLA which vastly increases the kinds of
> > patterns one could decode !!
> > <
> Yes but those were 30 to 35 years ago.
> I haven't seen PLA's mentioned in a design in at least 20 years,
> though I would imagine that Intel and AMD use them for the decoders.
<

A lot of microcode moved towards ROM territory:: a ROM is a PLA whose
first layer happens to be a decoder (which could be done with a NOR
plane, but the decoder is faster and uses fewer gates.) IIRC: Athlon had a bit
more than 3K µwords and Opteron had a bit more than 4K µwords.

>
> The 88100's single NOR plane would have less strict timing than a
> PLA with two planes. From what I have read, ideally the first plane
> drives the second directly - one doesn't want a layer of latches between.
<

Back in the 88100 there was a latch before the NOR plane and a latch
at the output of the NOR plane and the NOR plane carefully biased
and the latch input carefully sized so that the latch acted as a sense
amplifier.
<
The latch directly drove the first ½ cycle of the datapath select
lines. This corresponds to the first ½ of integer execution or the
delivery of operands to the MUL or FADD units. The second ½
of the cycle might come from the NOR plane (int) or from the
sequencers (AGEN, MUL, and FADD).

<
> However before the first plane signals stabilize, they can glitch the
> second plane and cause it to erroneously discharge.
> I can see this would be finicky circuitry and possibly susceptible
> to timing changes due to process variation.
>
> I looked at the 88100 instruction encodings -
> I still haven't figured out how you got away with just 1 NOR plane.
<

Dig deeper......

<
> >> Also tools like Verilog don't generate PLAs.
> > <
> > NO but microcode assemblers do !
> > <
> Yes but how would one do a simulation test of a whole design?
>
> I suppose the PLA assembler could generate a Verilog UNIQUE CASEX statement
> with ? don't cares which implements the equivalent logic to the PLA.
> That could be used to test the logic functionality but not the timing.
<

Nah:: just have the assembler spit out a VERILOG table directly--there is
no more reason to be able to read it than there is for the average coder
to read assembly language.

MitchAlsup

Jul 24, 2021, 2:13:29 PM

On Saturday, July 24, 2021 at 1:06:33 PM UTC-5, John Levine wrote:
> According to Quadibloc <jsa...@ecn.ab.ca>:
> >On Friday, July 23, 2021 at 4:59:30 AM UTC-6, Thomas Koenig wrote:
> >> Another direction for retro-architectures... I've been looking
> >> at the 6502 a bit, and it really is quite an interesting design.
> >> Squeezing the functionality of a CPU into ~3500 transistors (plus
> >> ~1000 transistors used as resistors) was quite an achievement.
> >
> >Indeed.
> >
> >However, even the 6502, let alone the 6800 or the 8080, seemed
> >to me to have very complicated instruction sets compared to the
> >PDP-8.
>
> The original PDP-8 had only 1409 transistors, each one in a separate can,
> so it's not surprising. The first computer I programmed was a PDP-8 and
> it was a fantastically well-done tradeoff between extreme simplicity and
> usability.
<

Not quite a fair comparison: In the logic family of the PDP-8, 1 transistor
could make a 5-input NAND gate whereas in the 6502 logic family this
would take 6 transistors (5 pull downs (N-ch) and 1 pull up (depletion).)

Thomas Koenig

Jul 24, 2021, 2:20:00 PM

Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:

> Guillaume <mes...@bottle.org> writes:

> Concerning a "very simple RISC instruction set", the NOVA has a
> load/store architecture. The National Semiconductor IPC-16A/520 PACE
> seems to be the first single-chip implementation of an instruction set
> similar to the Nova (and the first single-chip 16-bit CPU), announced
> in 1974 <https://www.cpu-world.com/CPUs/PACE/index.html>, so maybe
> with a similar transistor budget as the 6502.

https://www.cpu-world.com/CPUs/INS8900/index.html states that

# The fastest INS8900 instruction can be executed in 4 machine
# cycles, and require 1 read cycle. Since each machine cycle takes
# 4 clock cycles, it takes at least 17 clock cycles to execute one
# instruction. INS8900 CPU, running at 2 MHz, has 0.5 microsecond
# cycle, and thus can execute at most 117,600 instructions per
# second. That is much slower than instruction per second rate of
# many popular 8-bit processors of the time - Intel 8080, MOS 6502,
# or Motorola 6800.
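
The arithmetic behind that figure, as a small sketch (the function name is mine; the inputs are the numbers quoted above plus the 2-cycle minimum for a 1 MHz 6502 mentioned earlier in the thread):

```c
#include <assert.h>

/* Upper bound on instructions per second, given the clock rate and
   the minimum number of clock cycles per instruction. */
static double max_instr_per_sec(double clock_hz, unsigned min_clocks)
{
    assert(min_clocks > 0);
    return clock_hz / (double)min_clocks;
}
```

max_instr_per_sec(2.0e6, 17) comes out at roughly 117,600, matching the quote, while max_instr_per_sec(1.0e6, 2) for the 6502 gives 500,000.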

Having "machine cycles" that each took several clock cycles seems
to have been common; the Z80 also did this (I assume to
deal with its 4-bit ALU). This is also why the Z80 was not much
faster than the 6502 despite its much higher clock rates.

I think the 6502 had its clock cycles (only? partially?) determined
by its 8-bit ripple carry adder(s), which is why many instructions
where the address + index crossed a page boundary took an extra
cycle.

George Neuner

Jul 24, 2021, 2:54:14 PM

That's the 65c802: 16-bit registers, direct page, pin compatible with
65c02.

65c816 adds 24 bit address space.

WDC supposedly also was working on a 32-bit version that included
floating point ... but AFAIK, nothing came of it.

Quadibloc

Jul 24, 2021, 4:42:51 PM

On Saturday, July 24, 2021 at 12:06:33 PM UTC-6, John Levine wrote:
> According to Quadibloc <jsa...@ecn.ab.ca>:

> >However, even the 6502, let alone the 6800 or the 8080, seemed
> >to me to have very complicated instruction sets compared to the
> >PDP-8.

> The original PDP-8 had only 1409 transistors, each one in a separate can,
> so it's not surprising. The first computer I programmed was a PDP-8 and
> it was a fantastically well-done tradeoff between extreme simplicity and
> usability.

I wonder what the transistor count of the PDP-5 was.

The PDP-5 was the predecessor of the PDP-8; its instruction set
was almost identical. However, it used memory location 0 as the
program counter.

When the program counter was added to the PDP-8 as a separate
register, they changed the locations used to save return data for
interrupts to use location 0, making the two computers incompatible.

Also, the PDP-5 was normally connected to a 5-level Teletype rather
than an ASCII one.

John Savard

Anton Ertl

Jul 24, 2021, 5:56:57 PM

Thomas Koenig <tko...@netcologne.de> writes:
>I think the 6502 had its clock cycles (only? partially?) determined
>by its 8-bit ripple carry adder(s), which is why many instructions
>where the address + index crossed a page boundary took an extra
>cycle.

The extra cycle is necessary because the high byte then has to run
through the 8-bit ALU to get incremented. See the data path at
<http://www.weihenstephan.org/~michaste/pagetable/6502/6502.jpg>. The
only other parts that can increment are only connected to the program
counter.
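
A sketch of that address arithmetic, modelling only the two ALU passes and not the actual bus cycles (the names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Model of base+index address formation: the low byte goes through
   the 8-bit ALU first; only a carry out forces a second pass to
   increment the high byte. Returns the effective address and sets
   *extra_cycle to 1 when the second pass was needed. */
static uint16_t abs_indexed(uint16_t base, uint8_t index, int *extra_cycle)
{
    assert(extra_cycle != 0);
    unsigned lo = (base & 0xFFu) + index;   /* first ALU pass */
    uint8_t  hi = (uint8_t)(base >> 8);
    *extra_cycle = (lo > 0xFFu);
    if (*extra_cycle)
        hi = (uint8_t)(hi + 1);             /* second ALU pass */
    return (uint16_t)((hi << 8) | (lo & 0xFFu));
}
```

For example, base 0x20F0 with index 0x20 gives 0x2110 with the extra pass, while base 0x2000 with the same index gives 0x2020 without it.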

Bernd Paysan

Jul 24, 2021, 7:40:46 PM

Am Sat, 24 Jul 2021 10:22:08 GMT schrieb Anton Ertl:

> Thomas Koenig <tko...@netcologne.de> writes:
>>Could we do better knowing what we know now?
>
> I think so.
>
> Maybe something like the small variant of b16 (first called b16-small,
> later renamed into b16, while the original (large) b16 was renamed into
> b16-dsp):
>
> https://bernd-paysan.de/b16-presentation.pdf
>
> Having the stacks on-chip would be great, but probably does not fit the
> transistor budget, so one would only keep TOS and P on-chip, and replace
> the rest with stack pointers (maybe 5 bits each) and one 16-bit buffer
> (not per stack) for keeping one other stack item after loading it or
> before storing it.

Dynamic structures (DRAM-style storage elements) are ok with an NMOS
CPU. A DRAM cell is 2 transistors (one used as switch, one used as
capacitor). 2*256 transistors for 2*8 cells stack looks ok.
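
Spelled out, that count is stacks x entries x bits per entry x transistors per bit. The sketch below assumes 16-bit entries, which is what makes the 2*256 figure come out; the function name is made up:

```c
#include <assert.h>

/* Storage transistors for on-chip stacks built from DRAM-style cells.
   Counts only the cells, not decoders, sense logic or refresh. */
static unsigned stack_cell_transistors(unsigned stacks, unsigned depth,
                                       unsigned width_bits,
                                       unsigned transistors_per_bit)
{
    assert(stacks > 0 && depth > 0 && width_bits > 0);
    return stacks * depth * width_bits * transistors_per_bit;
}
```

stack_cell_transistors(2, 8, 16, 2) gives 512, that is 256 per stack.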

As long as the stack is used push/pull only (meaning NOS is kept in a
register), reads can be destructive (you read when you pull, so the data
is no longer needed). Refresh logic could be a software requirement (don't
keep things on the on-chip stack for more than a few 1000 instructions),
or a refresh interrupt could read out the stacks into main memory and
read them in again once every few 1000 instructions.

> The b16 design uses a 16-bit ALU. One can replace that with several
> passes through an 8-bit or 4-bit ALU, at an increase in control logic.
> Not sure if that would pay off wrt transistors. The first Nova
> certainly took the 4-bit-ALU approach. Given that you need a two-pass
> approach for 16-bit memory accesses anyway, the additional cost for a
> two-pass approach through an 8-bit ALU may be minor.

Probably. ALUs are not that big after all.

> I don't think that the b16 has interrupts, so you would need to add that
> to be on feature-parity with the 6502.

Yes. The use cases of the b16 didn't require interrupts, a “wait for an
event” feature was sufficient. The downside of interrupts is that you
need some stack space for them; the absolute minimum is one return and
one data stack item.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=uy@X}1GWr!+0qqp_Cn176t4(dQ*
https://bernd-paysan.de/

John Levine

Jul 24, 2021, 9:01:18 PM

According to Quadibloc <jsa...@ecn.ab.ca>:

>I wonder what the transistor count of the PDP-5 was.
>
>The PDP-5 was the predecessor of the PDP-8; its instruction set
>was almost identical. However, it used memory location 0 as the
>program counter.

Probably not that much fewer than the PDP-8. The main difference was that
the PDP-8 was reimplemented in flip chips rather than the older system
modules. As you note, it put the PC in a register which would mean a few
more flip flops but the architecture was otherwise the same.

>Also, the PDP-5 was normally connected to a 5-level Teletype rather
>than an ASCII one.

No, the price list and manuals say it came with an ASR-33.

Michael Barry

Jul 25, 2021, 12:36:12 AM

On Friday, July 23, 2021 at 3:59:30 AM UTC-7, Thomas Koenig wrote:
> Another direction for retro-architectures... I've been looking
> at the 6502 a bit, and it really is quite an interesting design.
> Squeezing the functionality of a CPU into ~3500 transistors (plus
> ~1000 transistors used as resistors) was quite an achievement.
>

> Could we do better knowing what we know now?
>

How about wider "bytes"? You could discard the silly 8-bit convention
and go with 9 or 10. You don't lose the classic flavor, but you increase
your opcode space by a power of two and your addressing space by a
power of four. Add a few more registers and a few more addressing
modes, add without carry, subtract without borrow, signed comparisons,
bsr, brl, conditional rts ... whatever floats your boat.

I grew up with the 6502 and the Z80, and they are both brilliant little gems
from the mid-70s. I gravitated toward the 6502 because it was more
accessible to me, and it fit better into my thought processes. I still post
frequently on 6502.org because I love how its assembly language feels in
my tired old brain. The 65816 with its width mode bits ... not so much.

Mike B.

Thomas Koenig

Jul 25, 2021, 4:55:58 AM

Michael Barry <barry...@yahoo.com> schrieb:

> On Friday, July 23, 2021 at 3:59:30 AM UTC-7, Thomas Koenig wrote:
>> Another direction for retro-architectures... I've been looking
>> at the 6502 a bit, and it really is quite an interesting design.
>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>> ~1000 transistors used as resistors) was quite an achievement.
>>
>> Could we do better knowing what we know now?
>>
> How about wider "bytes"?

(Of course what I am now writing is shifting the goalpost on "8 bit
data bus").

One possibility would be a 16-bit RISC chip simplified down to the
bone, like what Helmut Neemann has done for his Digital simulator.

This is a 16-register, two-operand machine where instructions take
two bytes (one byte opcode, one byte operand) where short immediates
go into the instructions and long immediates are possible. It is
designed for two-cycle operation, one cycle of memory operation
and one cycle of execution.

If you throw out the multiplier and replace the control ROM by a
PLA, this would come to around 6000 transistors (I have assumed
20 transistors for a full adder, which may be off). Add some more
logic for interrupt handling, and you are higher than the 6502,
but lower than the Z80, and any such CPU would have run rings around
any of the 8-bit CPUs at the time even when clocked a bit lower
than the 6502.

So, for that time machine to go back and tell people about the RISC
revolution in the 1970s - maybe Edson De Castro or Ken Olsen would
have been the wrong people to talk to, Chuck Peddle or Federico
Faggin would have been better.

Terje Mathisen

Jul 25, 2021, 8:24:48 AM

Thomas Koenig wrote:
> Guillaume <mes...@bottle.org> schrieb:

>> Only decent compilers can make efficient use of such simple RISC
>> architectures while not making the developers' life a hell. I guess.
>

> Hardly a decent compiler around for the 6502 in those days, at
> least I knew none. It was Basic or Assembler (or machine code
> via a monitor, with a hard reset if you made a programming mistake).

Let me fix that documentation bug for you:

"with a hard reset WHEN you made a programming mistake"

:-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Terje Mathisen

Jul 25, 2021, 8:35:18 AM

Andy Valencia wrote:
> Timothy McCaffrey <timca...@aol.com> writes:
>> I think you meant 6800 (or 8080), 6809 didn't show up until several years later
>> (with 3x the number of transistors).
>
> I was trying to design page faults for the 6809. Exercise for the reader:
> What is the worst case number of page faults which need to be resolved
> to complete a single instruction?

I did write some 6809 asm many years ago, but I don't remember enough of
the details now. :-(

In particular, any system with unlimited memory indirect can also cause
an unlimited number of page faults, even without any hardware looping, but
I'm pretty sure the 6809 didn't allow this?

Thomas Koenig

Jul 25, 2021, 8:53:03 AM

Terje Mathisen <terje.m...@tmsw.no> schrieb:

> Thomas Koenig wrote:
>> Guillaume <mes...@bottle.org> schrieb:
>>> Only decent compilers can make efficient use of such simple RISC
>>> architectures while not making the developers' life a hell. I guess.
>>
>> Hardly a decent compiler around for the 6502 in those days, at
>> least I knew none. It was Basic or Assembler (or machine code
>> via a monitor, with a hard reset if you made a programming mistake).
>
> Let me fix that documentation bug for you:
>
> "with a hard reset WHEN you made a programming mistake"
>
>:-)

While better than the original, this is still not quite correct.
There were some programming mistakes (maybe 10% or so) which only
led to incorrect results and not to a freeze.

So, maybe

"with a hard reset most of the time when you made a programming
mistake" ?

The longest-running program I ever ran on that machine was searching
for Golomb rulers after reading an article on them in the German
version of Scientific American (using Basic for I/O), that ran
for three weeks; the lack of peripherals meant that I had to
take care that nobody unplugged the machine during that time.

Hm. I just saw that Martin Gardner's article in Scientific American
appeared in 1972, far too early for me or for the C-64. They must have
reprinted it more than a decade later in the German edition.

Andy Valencia

Jul 25, 2021, 9:51:32 AM

Terje Mathisen <terje.m...@tmsw.no> writes:
> In particular, any system with unlimited memory indirect can also cause
> unlimited number of page faults, even without any hardware looping, but
> I'm pretty sure the 6809 didn't allow this?

We were kicking it around in private email (my memory is fuzzy, too).
Given the 6809's unaligned everything, you can have the instruction
straddle pages, the memory ref, the indirect ref, and the result
store all be unaligned memory refs. So... 8?

Of course, bulk memory move/set would also be unbounded, but that
has its own family of fixes to address it.

I vaguely remember a CPU where you could have unlimited indirect
memory references, and you could lock it up with an infinite
loop of them (until they fixed the implementation).

John Dallman

Jul 25, 2021, 10:08:23 AM

In article <162722089396.5736....@media.vsta.org>,

van...@vsta.org (Andy Valencia) wrote:

> I vaguely remember a CPU where you could have unlimited indirect
> memory references, and you could lock it up with an infinite
> loop of them (until they fixed the implementation).

The DEC PDP-10 had this, although there may have been others.

John

Anton Ertl

Jul 25, 2021, 11:21:26 AM

Thomas Koenig <tko...@netcologne.de> writes:
[Golomb Rulers]

>Hm. I just saw that Martin Gardner's article in Scientific American
>appeared 1972, far too early for me or for the C-64. They must have
>reprinted it more than a decade later in the German edition.

Thomas Koenig

Jul 25, 2021, 11:41:47 AM

Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:

> Thomas Koenig <tko...@netcologne.de> writes:
> [Golomb Rulers]
>>Hm. I just saw that Martin Gardner's article in Scientific American
>>appeared 1972, far too early for me or for the C-64. They must have
>>reprinted it more than a decade later in the German edition.
>
><https://www.csplib.org/Problems/prob006/references/> says:
>
>|Golomb rulers have been featured twice in the Computer Recreations
>|column of Scientific American, which did much to popularize them.
> ...
>|A. K. Dewdney
>|Computer Recreations
>|Scientific American, 21, mar 1986

That must have been the one, then, just before Chernobyl.

Thanks!

Michael Barry

Jul 25, 2021, 12:12:28 PM

On Friday, July 23, 2021 at 4:53:55 PM UTC-7, Andy Valencia wrote:
>
> I was trying to design page faults for the 6809. Exercise for the reader:
> What is the worst case number of page faults which need to be resolved
> to complete a single instruction?
>

I must have been a bit sheltered, because I never knew of a 6809 system
with virtual memory.

But for page crossings, I suppose a multi-byte instruction and its multi-byte
operand could both straddle page boundaries. Make that an indirect mode
and arrive at a worst case of three? I don't think the 6809 made as much of
a deal about page crossings as the 6502 did.

Anton Ertl

Jul 25, 2021, 12:17:47 PM

Bernd Paysan <be...@net2o.de> writes:
>Am Sat, 24 Jul 2021 10:22:08 GMT schrieb Anton Ertl:
>
>> Thomas Koenig <tko...@netcologne.de> writes:
>>>Could we do better knowing what we know now?
>>
>> I think so.
>>
>> Maybe something like the small variant of b16 (first called b16-small,
>> later renamed into b16, while the original (large) b16 was renamed into
>> b16-dsp):
>>
>> https://bernd-paysan.de/b16-presentation.pdf
>>
>> Having the stacks on-chip would be great, but probably does not fit the
>> transistor budget, so one would only keep TOS and P on-chip, and replace
>> the rest with stack pointers (maybe 5 bits each) and one 16-bit buffer
>> (not per stack) for keeping one other stack item after loading it or
>> before storing it.
>
>Dynamic structures (DRAM-style storage elements) are ok with an NMOS
>CPU. A DRAM cell is 2 transistors (one used as switch, one used as
>capacitor). 2*256 transistors for 2*8 cells stack looks ok.

The 8008 fits an 8x14bits stack in its 3500 (PMOS) transistors, and
has a more complex architecture. And it probably was static rather
than dynamic RAM, because AFAIK there is no way to refresh it by
storing it to memory at regular intervals.

>Refresh logic could be software requirement (don't
>keep things on the on-chip stack for more than a few 1000 instructions),
>or a refresh interrupt would read out the stacks into main memory and
>read them in again once every few 1000 instructions.

Given that we implement interrupts, the interrupt approach looks like
a winner to me. Maybe have an internal cycle counter that generates
this interrupt at regular intervals.

>> The b16 design uses a 16-bit ALU. One can replace that with several
>> passes through an 8-bit or 4-bit ALU, at an increase in control logic.
>> Not sure if that would pay off wrt transistors. The first Nova
>> certainly took the 4-bit-ALU approach. Given that you need a two-pass
>> approach for 16-bit memory accesses anyway, the additional cost for a
>> two-pass approach through an 8-bit ALU may be minor.
>
>Probably. ALUs are not that big after all.

Today I lean more towards having a 16-bit ALU and letting the software
synthesize 16-bit memory accesses out of 8-bit accesses, to avoid the
need for a sequencer. Maybe I will revert that again when I think
more about how to deal with immediate arguments.

>The downside of interrupts is that you
>need some stack space for them; the absolute minimum is one return and
>one data stack item.

Yes. That does not look like a deal-breaker to me. The interrupt
handler would first dump as many data and return stack items to memory
as it needs.

Stephen Fuld

Jul 25, 2021, 12:31:56 PM

Yes, as I have mentioned here before, the Univac 1108, and its
descendants can have "unlimited" memory references, through indirect
addressing or through an Execute instruction chain, but lockup is
prevented by a hardware timer preventing any of these from taking too
long by causing an illegal operation interrupt.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

EricP

Jul 25, 2021, 2:16:47 PM

DG Nova too.
Nova only had 4 general registers AC0..AC3, plus a PC.
AC2, AC3 and PC could be used as index registers.
I believe the intent was to put a bank of virtual registers in memory,
say 16 or 32, and point one of the HW registers at that.

If the Indirect bit in an instruction == 0
the effective address is the address of the operand.
If Indirect bit == 1 the effective address is the address of the address.
Furthermore, if the msb of that second address == 1 then that
is also an address, looping until msb == 0.

if (Indirect)
{ do { addr = *addr; }          /* fetch the next pointer in the chain */
  while ((int16_t)addr < 0);    /* msb set: that word is indirect too */
}
operand = *addr;

Early Nova models could get into operand read loops that
require a hard reset.
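The loop above, combined with the kind of limit mentioned elsewhere in this thread (a hardware timer or an iteration limit), might look like the following in C. It is a sketch under assumptions: memory is modelled as an array of 16-bit words, the limit is a simple hop count, and resolve_indirect and the two demo functions are names invented for this example rather than anything from DG documentation.

```c
#include <assert.h>
#include <stdint.h>

#define IND_BIT   0x8000u   /* msb of a 16-bit word: "this is another pointer" */
#define ADDR_MASK 0x7FFFu   /* remaining 15 bits: word address */
#define MEM_WORDS 32768u

/* Follows an indirect chain.  Returns the effective address, or -1 if the
   chain needs more than max_hops memory fetches (the lockup case). */
int32_t resolve_indirect(const uint16_t *mem, uint16_t addr, int indirect,
                         unsigned max_hops)
{
    uint16_t word = (uint16_t)((addr & ADDR_MASK) | (indirect ? IND_BIT : 0u));
    unsigned hops = 0;

    while (word & IND_BIT) {
        if (hops == max_hops)
            return -1;                  /* give up instead of hanging */
        word = mem[word & ADDR_MASK];   /* fetch the next pointer */
        hops++;
    }
    return (int32_t)word;
}

static uint16_t demo_mem[MEM_WORDS];

/* 100 -> 200 -> 300 -> 400: three fetches, ends at a word with msb clear. */
int32_t demo_three_level_chain(void)
{
    demo_mem[100] = IND_BIT | 200;
    demo_mem[200] = IND_BIT | 300;
    demo_mem[300] = 400;
    return resolve_indirect(demo_mem, 100, 1, 16);
}

/* A word that points at itself with the indirect bit set never terminates
   without the hop limit. */
int32_t demo_self_loop(void)
{
    demo_mem[500] = IND_BIT | 500;
    return resolve_indirect(demo_mem, 500, 1, 16);
}
```

Without the max_hops check, demo_self_loop is exactly the operand read loop that needed a hard reset.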

Ivan Godard

Jul 25, 2021, 2:35:35 PM

I wrote a compiler for the Nova 1200, that had an iteration limit IIRC.
It could compile itself, one pass, in 64kb that you shared with the OS.
That compiler was how I first met Mitch.

EricP

Jul 25, 2021, 4:00:41 PM

Do you remember ever using infinite indirect?
It strikes me as a feature that sounded good
over beers in the bar but in the cold light
of day is more trouble than it helps.

John Levine

Jul 25, 2021, 4:24:55 PM

According to John Dallman <j...@cix.co.uk>:

>In article <162722089396.5736....@media.vsta.org>,
>van...@vsta.org (Andy Valencia) wrote:
>
>> I vaguely remember a CPU where you could have unlimited indirect
>> memory references, and you could lock it up with an infinite
>> loop of them (until they fixed the implementation).

DG Nova, perhaps?

>The DEC PDP-10 had this, although there may have been others.

The PDP-10 didn't have lockup problems because it took interrupts
during each address calculation cycle. This led to one wart, the "first part done"
flag used by the ILDB and IDPB instructions which first did the standard address
calculation to fetch a byte pointer, then updated the pointer to refer to the next
byte, did a second address calculation with the address in the pointer, and then
loaded or stored the byte. The FPD flag told the processor not to update the
pointer when resuming from an interrupt between the two halves of the instruction.
The BLT block transfer instruction was interruptible, and stored the updated
to/from pointer word so it could continue when restarted.
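The role of the "first part done" flag can be shown with a toy model in C. This is a deliberate simplification and not PDP-10 semantics: the byte pointer is reduced to a plain index, toy_cpu and toy_ildb are invented names, and the honour_fpd switch exists only to show what would go wrong without the flag.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned byte_ptr;  /* index of the current byte (stand-in for a byte pointer) */
    int      fpd;       /* "first part done" flag */
} toy_cpu;

/* One increment-and-load-byte instruction.  If interrupt_between_halves is
   set, the instruction is abandoned after the pointer update and returns -1;
   the caller restarts it from the top later. */
int toy_ildb(toy_cpu *cpu, const uint8_t *bytes, int interrupt_between_halves,
             int honour_fpd)
{
    /* First half: advance the pointer, unless a previous attempt did so. */
    if (!(honour_fpd && cpu->fpd)) {
        cpu->byte_ptr++;
        cpu->fpd = 1;
    }
    if (interrupt_between_halves)
        return -1;

    /* Second half: fetch the byte and clear the flag. */
    cpu->fpd = 0;
    return bytes[cpu->byte_ptr];
}

/* Runs the instruction once with an interrupt in the middle, then restarts
   it, and returns the byte that finally gets loaded. */
int demo_interrupted_ildb(int honour_fpd)
{
    static const uint8_t bytes[] = { 'a', 'b', 'c', 'd' };
    toy_cpu cpu = { 0, 0 };
    int first = toy_ildb(&cpu, bytes, 1, honour_fpd);
    (void)first;                                   /* -1: interrupted */
    return toy_ildb(&cpu, bytes, 0, honour_fpd);   /* restarted */
}
```

With the flag honoured the restarted instruction loads 'b', the byte after the starting position; with the flag ignored the pointer is advanced twice and 'c' is loaded instead.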

One time when I was bored I wrote a little program that made an ever longer chain
of indirect addresses until it stalled because the time to compute the address was
longer than a clock tick.

The GE 635 had a no-interrupt flag in some instructions that you could use to
construct critical section code and a timer that killed the job if the flag was
on too long.

Ivan Godard

Jul 25, 2021, 4:26:49 PM

Certainly not infinite, but finite indirect could be used to get around
the paucity of index registers. I think the OS used it for interrupt
vectoring. I don't remember ever using it in the generated code, nor the
compiler itself. But data-indirect is an asm kind of thing, and I was
only using a HLL. Do you know if they repeated the feature in the Eclipse?

John Levine

Jul 25, 2021, 4:28:58 PM

According to EricP <ThatWould...@thevillage.com>:

>Do you remember ever using infinite indirect?
>It strikes me as a feature that sounded good
>over beers in the bar but in the cold light
>of day is more trouble than it helps.

It could be handy when passing arguments to subroutines, e.g.

subroutine foo(x)

... z = bar(x)

The code in foo() passes an indirect pointer to its incoming argument,
which might in turn be indirect. In PDP-10 Algol they used chains of
XCT instructions to handle call-by-name.

I agree it was hard to find useful examples of indirection more than 2 or 3 deep,
but once you get past 1 level, it's easier not to have a limit.

MitchAlsup

Jul 25, 2021, 7:00:05 PM

On Sunday, July 25, 2021 at 3:28:58 PM UTC-5, John Levine wrote:
> According to EricP <ThatWould...@thevillage.com>:
> >Do you remember ever using infinite indirect?
> >It strikes me as a feature that sounded good
> >over beers in the bar but in the cold light
> >of day is more trouble than it helps.
> It could be handy when passing arguments to subroutines, e.g.
>
> subroutine foo(x)
>
> ... z = bar(x)
>
> The code in foo() passes an indirect pointer to its incoming argument,
> which might in turn be indirect. In PDP-10 Algol they used chains of
> XCT instructions to handle call-by-name.
>
> I agree it was hard to find useful examples of indirection more than 2 or 3 deep,
> but once you get past 1 level, it's easier not to have a limit.
<

Could you find (access) the end of a linked list by infinite indirection ?

Ivan Godard

Jul 25, 2021, 7:06:17 PM

You could just access address zero directly without bothering to go
through the chain.

EricP

Jul 26, 2021, 11:24:12 AM

Anton Ertl wrote:
>
> The 8008 fits an 8x14bits stack in its 3500 (PMOS) transistors, and
> has a more complex architecture. And it probably was static rather
> than dynamic RAM, because AFAIK there is no way to refresh it by
> storing it to memory at regular intervals.

8008 (circa 1972) and 8080 (circa 1974) were both dynamic internally.

8008 required an externally generated 2 phase non overlapping
pulsed clock with some rather strict timing requirements.
Cycle time of each phase had minimum of 2 us, max of 3 us
taking 2 clock cycles for 1 machine state cycle.
First cycle, phase 1 is the precharge to internal memory and buses,
phase 2 is read, second cycle phase 1 is precharge, phase 2 is write.

8080 is similar but different phase and pulse width timings.

Motorola 6800 was also dynamic internally but it generated any
multiphase clocking it needed internally from the external clock,
within a min and max frequency of 0.1 to 1.0 MHz.

RCA 1802 was static cmos so you could run it at any frequency,
or turn the clock off and it took just 0.1 (typical) uAmps
to retain its state.

EricP

Jul 26, 2021, 12:05:35 PM

MitchAlsup wrote:
> On Sunday, July 25, 2021 at 3:28:58 PM UTC-5, John Levine wrote:
>> According to EricP <ThatWould...@thevillage.com>:
>>> Do you remember ever using infinite indirect?
>>> It strikes me as a feature that sounded good
>>> over beers in the bar but in the cold light
>>> of day is more trouble than it helps.
>> It could be handy when passing arguments to subroutines, e.g.
>>
>> subroutine foo(x)
>>
>> ... z = bar(x)
>>
>> The code in foo() passes an indirect pointer to its incoming argument,
>> which might in turn be indirect. In PDP-10 Algol they used chains of
>> XCT instructions to handle call-by-name.
>>
>> I agree it was hard to find useful examples of indirection more than 2 or 3 deep,
>> but once you get past 1 level, it's easier not to have a limit.
> <
> Could you find (access) the end of a linked list by infinite indirection ?

Sure but such a chain has to be constructed with all but the last
pointer having its indirect bit set and the last pointer clear.
For a dynamic chain, managing the indirect bits is more expensive
than the list management. A circular single linked list with a
tail pointer is cheapest.

To use this auto-indirect feature, the chain of links between
data structures would have to be planned in advance.
That limits it to pretty much what John Levine said, 2 or 3 levels max.

Note also that to get this indirect bit on every pointer they give
up byte addresses. Using little-endian bit numbering here
(Nova uses big-endian), bits [14:0] hold the address of a 16-bit word
- there is no bit to hold a byte selector. Sure, you could play
games right-rotating char addresses, but that all costs.
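The cost of those games can be seen in a small C sketch of byte pointers on a word-addressed machine. It assumes one common software convention (word address in the upper 15 bits, byte selector in the lowest bit, selector 0 meaning the high byte); that convention and the function names are assumptions for illustration, not taken from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a 16-bit byte pointer from a 15-bit word address and a selector
   (0 = high byte, 1 = low byte; an assumed convention). */
uint16_t byte_ptr_make(uint16_t word_addr, unsigned low_byte)
{
    assert(word_addr <= 0x7FFF && low_byte <= 1);
    return (uint16_t)((word_addr << 1) | low_byte);
}

/* Shifting the byte pointer right recovers the word address; the bit that
   falls off is the byte selector.  Note that the result has its msb clear,
   so it cannot carry an indirect bit. */
uint16_t byte_ptr_word_addr(uint16_t bp)
{
    return (uint16_t)(bp >> 1);
}

/* Picks the addressed byte out of the word fetched from memory. */
uint8_t byte_select(uint16_t word, uint16_t bp)
{
    return (bp & 1) ? (uint8_t)(word & 0xFF) : (uint8_t)(word >> 8);
}
```

Every byte access therefore costs a shift, a word fetch, and a select, which is the overhead alluded to above.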

Thomas Koenig

Jul 26, 2021, 2:17:18 PM

EricP <ThatWould...@thevillage.com> schrieb:

> RCA 1802 was static cmos so you could run it at any frequency,
> or turn the clock off and it took just 0.1 (typical) uAmps
> to retain its state.

Sounds like an attractive property for spacecraft (together with
the radiation-hardening properties from Silicon-on-Sapphire
that they apparently used).

David Schultz

Jul 26, 2021, 2:59:19 PM

The C2L design (MOSFETs were always closed loops with the drain in the
center) provided some intrinsic radiation hardening.

Some extra was applied via special process steps. See:
https://www.osti.gov/biblio/6054040.

SOS was just icing on the cake.

And of course the 1802 was used in spacecraft with the most famous use
probably being the Galileo probe.

Some code from Galileo turned up a while back. They had a problem with
the memory and needed to patch the code. But the ability to run the
Forth Inc. MicroFORTH system had vanished. So a system was whipped up in
LISP to allow the task to be done.

https://github.com/rongarret/gll-mag-patch/

--
http://davesrocketworks.com
David Schultz

Andy Valencia

Jul 26, 2021, 7:49:10 PM

Thomas Koenig <tko...@netcologne.de> writes:
> > RCA 1802 was static cmos so you could run it at any frequency,
> > or turn the clock off and it took just 0.1 (typical) uAmps
> > to retain its state.

> Sounds like an attractive property for spacecraft (together with
> the radiation-hardening properties from Silicon-on-Sapphire
> that they apparently used).

If you debounced a switch, you could actually advance it cycle by cycle
manually. My Super Elf's display would follow along as the CPU (slowly)
progressed. A poor man's single step debugger.

MitchAlsup

Jul 26, 2021, 8:40:14 PM

On Monday, July 26, 2021 at 6:49:10 PM UTC-5, Andy Valencia wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
> > > RCA 1802 was static cmos so you could run it at any frequency,
> > > or turn the clock off and it took just 0.1 (typical) uAmps
> > > to retain its state.
> > Sounds like an attractive property for spacecraft (together with
> > the radiation-hardening properties from Silicon-on-Sapphire
> > that they apparently used).
<
> If you debounced a switch, you could actually advance it cycle by cycle
> manually. My Super Elf's display would follow along as the CPU (slowly)
> progressed. A poor man's single step debugger.
<

Debouncing a switch is the original job of the S-R flip-flop.

JimBrakefield

Jul 26, 2021, 9:50:49 PM

Putting on my low cost embedded systems hat:
every wire and every switch contact has a cost:
a SPST switch and one wire is lower cost than SPDT and two wires,
so the job falls to the interrupt handler or a polling routine.
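A polling routine of that kind is often written as a small counter-based filter. The following C sketch is one possible version under stated assumptions: the switch is sampled at a fixed interval, a new level is accepted only after DEBOUNCE_TICKS consecutive samples disagree with the current stable level, and all names (debouncer, debounce_poll, debounce_trace) are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define DEBOUNCE_TICKS 4u   /* assumed: samples a new level must persist */

typedef struct {
    uint8_t stable;   /* last accepted switch state (0 or 1) */
    uint8_t count;    /* consecutive samples that differ from stable */
} debouncer;

/* Call once per polling interval with the raw SPST reading (0 or 1).
   Returns the debounced state. */
uint8_t debounce_poll(debouncer *d, uint8_t raw)
{
    assert(raw <= 1);
    if (raw == d->stable) {
        d->count = 0;                        /* bounce ended, start over */
    } else if (++d->count >= DEBOUNCE_TICKS) {
        d->stable = raw;                     /* new level held long enough */
        d->count = 0;
    }
    return d->stable;
}

/* Test helper: feeds a trace such as "0101111" (one character per sample)
   into a fresh debouncer and returns the final debounced state. */
uint8_t debounce_trace(const char *samples)
{
    debouncer d = { 0, 0 };
    uint8_t out = 0;
    for (; *samples; samples++)
        out = debounce_poll(&d, (uint8_t)(*samples == '1'));
    return out;
}
```

The trade is two bytes of state and a few instructions per poll against the second switch contact, the second wire, and the two gates of the S-R latch.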

> Debouncing a switch is the original job of the S-R flip-flop.

And 2X two-input nand or nor gates is about as low cost as it gets.

Given Musk's and others' emphasis on no parts being
more desirable or reliable, one wonders how the cost/reliability
of software affects the trade-off?

a...@littlepinkcloud.invalid

Jul 27, 2021, 4:49:32 AM

MitchAlsup <Mitch...@aol.com> wrote:
> On Saturday, July 24, 2021 at 1:06:33 PM UTC-5, John Levine wrote:
>>
>> The original PDP-8 had only 1409 transistors, each one in a separate can,
>> so it's not surprising.

And about 10,000 diodes.

>> The first computer I programmed was a PDP-8 and it was a
>> fantastically well-done tradeoff between extreme simplicity and
>> usability.

I hated it, but by then we were well into the 1970s and it looked like
something out of the stone age.

> Not quite a fair comparison: In the logic family of the PDP-8, 1 transistor
> could make a 5-input NAND gate whereas in the 6502 logic family this
> would take 6 transistors (5 pull downs (N-ch) and 1 pull up (depletion).)

Indeed, and when the PDP-8 was actually implemented as a single
(IM6100) IC it took 4000 transistors, which given the awfulness of
the instruction set no longer looked like such a good deal.

Andrew.

Anton Ertl

Jul 27, 2021, 1:53:02 PM

a...@littlepinkcloud.invalid writes:

>MitchAlsup <Mitch...@aol.com> wrote:
>> Not quite a fair comparison: In the logic family of the PDP-8, 1 transistor
>> could make a 5-input NAND gate whereas in the 6502 logic family this
>> would take 6 transistors (5 pull downs (N-ch) and 1 pull up (depletion).)
>
>Indeed, and when the PDP-8 was actually implemented as a single
>(IM6100) IC it took 4000 transistors, which given the awfulness of
>the instruction set no longer looked like such a good deal.

However, the IM6100 was in CMOS (compared to NMOS for 6502), which
costs extra transistors (I think 10 transistors for the 5-input NAND
gate). Also, the IM6100 included the "Extended Arithmetic Element"
(multiply and divide instructions, and the MQ register). So I expect
that an NMOS implementation of the basic PDP-8 instruction set would
take fewer transistors than a 6502. I would not consider it a better
6502, though, and it certainly does not fit the "8-bit data bus"
requirement.
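The per-gate numbers used in this exchange follow from two simple rules of thumb, written out here in C. They are first-order counts only (no buffering or sizing), and the function names are invented for illustration: a depletion-load NMOS NAND needs one pull-down per input plus one load transistor, while a static CMOS NAND needs an n-channel and a p-channel transistor per input.

```c
#include <assert.h>

/* Depletion-load NMOS: n series pull-downs plus one depletion pull-up. */
unsigned nand_transistors_nmos(unsigned inputs)
{
    assert(inputs >= 1);
    return inputs + 1;
}

/* Static CMOS: n series n-channel plus n parallel p-channel transistors. */
unsigned nand_transistors_cmos(unsigned inputs)
{
    assert(inputs >= 1);
    return 2 * inputs;
}
```

For the 5-input gate discussed above this gives 6 transistors in NMOS and 10 in CMOS, which is why a CMOS transistor count overstates what an NMOS version of the same logic would need.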

MitchAlsup

Jul 27, 2021, 6:56:42 PM

On Tuesday, July 27, 2021 at 12:53:02 PM UTC-5, Anton Ertl wrote:
> a...@littlepinkcloud.invalid writes:
> >MitchAlsup <Mitch...@aol.com> wrote:
> >> Not quite a fair comparison: In the logic family of the PDP-8, 1 transistor
> >> could make a 5-input NAND gate whereas in the 6502 logic family this
> >> would take 6 transistors (5 pull downs (N-ch) and 1 pull up (depletion).)
> >
> >Indeed, and when the PDP-8 was actually implemented as a single
> >(IM6100) IC it took 4000 transistors, which given the awfulness of
> >the instruction set no longer looked like such a good deal.
> However, the IM6100 was in CMOS (compared to NMOS for 6502), which
> costs extra transistors (I think 10 transistors for the 5-input NAND
> gate). Also, the IM6100 included the "Extended Arithmetic Element"
> (multiply and divide instructions, and the MQ register). So I expect
> that an NMOS implementation of the basic PDP-8 instruction set would
> take fewer transistors than a 6502. I would not consider it a better
> 6502, though, and it certainly does not fit the "8-bit data bus"
> requirement.
<

Is the requirement to have only a 16-bit address bus and a 8-bit data bus?
<
If so, why not just use something that is already in a library as a CPU and
not bother ?
<
The utility of reinventing this wheel in today's world and market is so close
to zero it nears exponent underflow.

Thomas Koenig

Jul 28, 2021, 1:25:38 AM

MitchAlsup <Mitch...@aol.com> schrieb:

> Is the requirement to have only a 16-bit address bus and a 8-bit data bus?

The question was of the "what if" type, what could have been in the
mid 1970s to build a better chip with what we know about chips now,
instead of building a 6502, with similar boundary conditions.

> If so, why not just use something that is already in a library as a CPU and
> not bother ?
><
> The utility of reinventing this wheel in today's world and market is so close
> to zero it nears exponent underflow.

Nobody suggested developing such a beast for today's market; it
would be a hobby project at most.

Anton Ertl

Jul 28, 2021, 3:39:48 AM

MitchAlsup <Mitch...@aol.com> writes:
>On Tuesday, July 27, 2021 at 12:53:02 PM UTC-5, Anton Ertl wrote:
>> a...@littlepinkcloud.invalid writes:
>> >MitchAlsup <Mitch...@aol.com> wrote:
>> >> Not quite a fair comparison: In the logic family of the PDP-8, 1 transistor
>> >> could make a 5-input NAND gate whereas in the 6502 logic family this
>> >> would take 6 transistors (5 pull downs (N-ch) and 1 pull up (depletion).)
>> >
>> >Indeed, and when the PDP-8 was actually implemented as a single
>> >(IM6100) IC it took 4000 transistors, which given the awfulness of
>> >the instruction set no longer looked like such a good deal.
>> However, the IM6100 was in CMOS (compared to NMOS for 6502), which
>> costs extra transistors (I think 10 transistors for the 5-input NAND
>> gate). Also, the IM6100 included the "Extended Arithmetic Element"
>> (multiply and divide instructions, and the MQ register). So I expect
>> that an NMOS implementation of the basic PDP-8 instruction set would
>> take fewer transistors than a 6502. I would not consider it a better
>> 6502, though, and it certainly does not fit the "8-bit data bus"
>> requirement.
><
>Is the requirement to have only a 16-bit address bus and a 8-bit data bus?

The IM6100 also does not satisfy the 16-bit address bus requirement.

The other requirement given by the OP
<sde7eg$hgb$1...@newsreader4.netcologne.de> was:

|Squeezing the functionality of a CPU into ~3500 transistors (plus
|~1000 transistors used as resistors) was quite an achievement.

>If so, why not just use something that is already in a library as a CPU and
>not bother ?
><
>The utility of reinventing this wheel in today's world and market is so close
>to zero it nears exponent underflow.

The problem is not about utility, but about insight:

|Could we do better knowing what we know now?

Anton Ertl

Jul 28, 2021, 6:02:49 AM

EricP <ThatWould...@thevillage.com> writes:
>Anton Ertl wrote:
>>
>> The 8008 fits an 8x14bits stack in its 3500 (PMOS) transistors, and
>> has a more complex architecture. And it probably was static rather
>> than dynamic RAM, because AFAIK there is no way to refresh it by
>> storing it to memory at regular intervals.
>
>8008 (circa 1972) and 8080 (circa 1974) were both dynamic internally.

Now I remember that the 6502 was dynamic, too, and the 65C02 was
static.

However, it was never really clear to me what this "dynamic" means,
other than having a lower limit on the clock rate.

Does it mean that these dynamic CPUs used DRAM for their registers?
If so, how were they refreshed?

>8008 required an externally generated 2 phase non overlapping
>pulsed clock with some rather strict timing requirements.
>Cycle time of each phase had minimum of 2 us, max of 3 us
>taking 2 clock cycles for 1 machine state cycle.
>First cycle, phase 1 is the precharge to internal memory and buses,
>phase 2 is read, second cycle phase 1 is precharge, phase 2 is write.

This sounds like each bit would require more transistors than the two
that Bernd Paysan assumed for the b16 variant (and for which he
suggested software refresh).

Anton Ertl

Jul 28, 2021, 12:09:46 PM

an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>Bernd Paysan <be...@net2o.de> writes:
>>Am Sat, 24 Jul 2021 10:22:08 GMT schrieb Anton Ertl:

>>> https://bernd-paysan.de/b16-presentation.pdf
...

>Today I lean more towards having a 16-bit ALU and letting the software
>synthesize 16-bit memory accesses out of 8-bit accesses, to avoid the
>need for a sequencer. Maybe I will revert that again when I think
>more about how to deal with immediate arguments.

For immediate arguments one could take a transputer-like approach (or
RISC-like, but the transputer seems more relevant given that we are
discussing a stack machine):

Immediates are synthesized on the top of return stack (that works out
better for conditional branches that deal with data on the top of the
data stack): Have an instruction that has a 7-bit immediate field and
pushes it sign-extended on the return stack; have another instruction
that has a 6-bit immediate field, shifts the top of return stack to
the left by 6 bits and puts the immediate field in the bottom 6 bits.

This means that in many cases we can have an immediate operand and the
instruction consuming the operand in the same number of bytes as the
6502. It leaves 64 encodings for other instructions, which should be
ample (the 6502 has 56).
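A sketch of this immediate scheme in C may make the byte counts concrete. The opcode layout below is an assumption chosen to match the description (128 encodings for the 7-bit push, 64 for the 6-bit extend, 64 left over); the post does not specify actual bit patterns, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding:
     0xxxxxxx  push the 7-bit field, sign-extended to 16 bits
     10xxxxxx  shift top of stack left by 6, insert the 6-bit field
     11xxxxxx  the 64 remaining opcodes (not modelled here) */
#define OP_PUSH7(field) ((uint8_t)((field) & 0x7F))
#define OP_EXT6(field)  ((uint8_t)(0x80 | ((field) & 0x3F)))

/* Effect of the push instruction: returns the new top of stack. */
uint16_t imm_push(uint8_t opcode)
{
    uint16_t v = opcode & 0x7F;
    return (v & 0x40) ? (uint16_t)(v | 0xFF80) : v;
}

/* Effect of the extend instruction on the current top of stack. */
uint16_t imm_extend(uint16_t top, uint8_t opcode)
{
    return (uint16_t)(((uint32_t)top << 6) | (opcode & 0x3Fu));
}

/* Number of instruction bytes needed for a signed 16-bit immediate. */
unsigned imm_length(int16_t value)
{
    if (value >= -64 && value <= 63)     return 1;   /* fits in 7 bits  */
    if (value >= -4096 && value <= 4095) return 2;   /* fits in 13 bits */
    return 3;                                        /* up to 19 bits   */
}

/* Emits the shortest push/extend sequence for value; returns its length. */
unsigned imm_encode(int16_t value, uint8_t out[3])
{
    uint32_t bits = (uint32_t)(int32_t)value;   /* sign-extended bit pattern */
    unsigned n = imm_length(value);
    out[0] = OP_PUSH7(bits >> (6 * (n - 1)));
    for (unsigned i = 1; i < n; i++)
        out[i] = OP_EXT6(bits >> (6 * (n - 1 - i)));
    return n;
}

/* Encodes value, then executes the sequence and returns the resulting
   top of stack. */
uint16_t imm_roundtrip(int16_t value)
{
    uint8_t code[3];
    unsigned n = imm_encode(value, code);
    uint16_t top = imm_push(code[0]);
    for (unsigned i = 1; i < n; i++)
        top = imm_extend(top, code[i]);
    return top;
}
```

Under this layout, values from -64 to 63 take one byte, so a small immediate plus the instruction consuming it fits in two bytes, like a 6502 instruction with an 8-bit immediate operand; a full 16-bit constant takes three bytes before the consuming instruction.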

It also means that I have now departed from the c18/b16 way of
encoding more than one instruction per word, and having immediate
arguments in extra words. I think that way would require more
transistors than just treating each byte as one instruction.

Concerning the need for multi-cycle instructions (and thus a
sequencer; but that could be cheap to implement), at least load and
store instructions need to be multi-cycle (one cycle for loading the
instruction, and at least one for loading or storing the data). And
given that multi-cycle instructions are needed anyway, having a
two-byte load and two-byte store may add few transistors.

EricP

Jul 28, 2021, 8:50:08 PM

Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
>> Anton Ertl wrote:
>>> The 8008 fits an 8x14bits stack in its 3500 (PMOS) transistors, and
>>> has a more complex architecture. And it probably was static rather
>>> than dynamic RAM, because AFAIK there is no way to refresh it by
>>> storing it to memory at regular intervals.
>> 8008 (circa 1972) and 8080 (circa 1974) were both dynamic internally.
>
> Now I remember that the 6502 was dynamic, too, and the 65C02 was
> static.
>
> However, it was never really clear to me what this "dynamic" means,
> other than having a lower limit on the clock rate.
>
> Does it mean that these dynamic CPUs used DRAM for their registers?
> If so, how were they refreshed?

I don't know of any reason that any device would have a minimum clock
rate unless its internal state required a dynamic refresh to be retained.
However the storage cells may not be identical to an actual DRAM,
i.e. not a single capacitor.

The only reason I can think of, and I don't think this applies here,
that a device might be static logic internally and not require periodic
refresh but still has a minimum clock, is if some of the substrate
voltages are generated internally by charge pumps driven by the clock.
If the clock stops then the charge bleeds off, the voltages disappear,
and the logic loses its state.

The 4004 is pMOS and the manual explicitly says the register file
is dynamic ram. The internal functional diagram has a refresh counter
muxed into the register file.

The 8008 is pMOS and has the same timing as 4004 and the functional
diagram has the same refresh counter attached to the register file.

The 8080 is nMOS and has similar clocking to 4004 and 8008.
The manual does not show a refresh counter on the internal functional
diagram but it does state that it is "a dynamic device" that requires
a clock for refresh and timing control.

Anton Ertl

Jul 29, 2021, 8:19:49 AM

EricP <ThatWould...@thevillage.com> writes:
>The 4004 is pMOS and the manual explicitly says the register file
>is dynamic ram. The internal functional diagram has a refresh counter
>muxed into the register file.
>
>The 8008 is pMOS and has the same timing as 4004 and the functional
>diagram has the same refresh counter attached to the register file.

My current understanding from this description is that the counter
counts through the various registers, and at some point during
instruction execution one register is read and written back.
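That reading of the refresh counter can be modelled in a few lines of C. The model is hypothetical and not taken from the 4004 or 8008 manuals: it assumes exactly one register is read and written back per instruction, in round-robin order, and it measures "age" in instructions since a register was last rewritten.

```c
#include <assert.h>

#define MAX_REGS 64u

/* Simulates `steps` instructions on a file of `nregs` dynamic registers.
   Each instruction ages every register by one, then the refresh counter
   selects one register to rewrite (age back to 0) and advances.
   Returns the largest age any register reached. */
unsigned max_age_with_refresh_counter(unsigned nregs, unsigned steps)
{
    unsigned age[MAX_REGS] = { 0 };
    unsigned counter = 0, worst = 0;

    assert(nregs >= 1 && nregs <= MAX_REGS);
    for (unsigned s = 0; s < steps; s++) {
        for (unsigned r = 0; r < nregs; r++) {
            age[r]++;
            if (age[r] > worst)
                worst = age[r];
        }
        age[counter] = 0;                   /* read and write back */
        counter = (counter + 1) % nregs;    /* next register next time */
    }
    return worst;
}
```

In this model no register goes longer than nregs instructions without a rewrite, so the minimum clock rate follows from the number of registers, the longest instruction, and how long a cell holds its charge.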

Looking at the 6502 microarchitecture as described in
https://i.imgur.com/BkZ9o.png, with its individual registers on 4 or
so buses, and the way it is clocked, at first it seemed implausible to
me that it could do automatic refreshing of DRAM registers.

But thinking about it, it's not that outlandish: When loading the next
instruction, the SB and DB buses are unused and can be used for
reading from one of the registers that can keep data for a long time
without being necessarily updated (in all bits): A, X, Y, S, P; and
the control logic can produce the right signals to do it. I don't see
anything like a refresh counter for counting through these registers,
but then not all of the signals from the timing generation logic are
clear to me, nor the control flip-flops; so the refresh counter might
hide in there.

It's interesting that despite the hardware limitations none of these
architectures fell back on software assist for refresh.

MitchAlsup

Jul 29, 2021, 2:22:14 PM

On Wednesday, July 28, 2021 at 7:50:08 PM UTC-5, EricP wrote:
> Anton Ertl wrote:
> > EricP <ThatWould...@thevillage.com> writes:
> >> Anton Ertl wrote:
> >>> The 8008 fits an 8x14bits stack in its 3500 (PMOS) transistors, and
> >>> has a more complex architecture. And it probably was static rather
> >>> than dynamic RAM, because AFAIK there is no way to refresh it by
> >>> storing it to memory at regular intervals.
> >> 8008 (circa 1972) and 8080 (circa 1974) were both dynamic internally.
> >
> > Now I remember that the 6502 was dynamic, too, and the 65C02 was
> > static.
> >
> > However, it was never really clear to me what this "dynamic" means,
> > other than having a lower limit on the clock rate.
> >
> > Does it mean that these dynamic CPUs used DRAM for their registers?
> > If so, how were they refreshed?
<
> I don't know of any reason that any device would have a minimum clock
> rate unless its internal state required a dynamic refresh to be retained.
> However the storage cells may not be identical to an actual DRAM,
> i.e. not a single capacitor.
<

The MC68020 had a minimum clock rate. There were latches between stages
that were a capacitor followed by an inverter driving the next stage (there
were also pairs of capacitors driving a sense amplifier driving true-comp
to the next stage).

EricP

Jul 29, 2021, 3:04:05 PM

The wikipedia article on 6501/6502 is an interesting read.
https://en.wikipedia.org/wiki/MOS_Technology_6501

The 650x designers left Motorola after designing the nMOS 6800.
6800 nMOS required 3 voltages -5V, (gnd), +5V, +12V.
One of 6800 features was the on-chip voltage inverter for generating -5V,
and the voltage doubler for +12V invented and patented by John Buchanan
(the charge pumps I referred to earlier.)

Patent 3942047 MOS DC Voltage booster circuit, Buchanan, 1974
https://patents.google.com/patent/US3942047A/en?oq=3942047

On 8080 these voltages were externally supplied from DC-to-DC converters,
which are expensive, inefficient, and get quite hot.

Buchanan left Motorola for MOS Technology with the others
and designed the 650x's voltage generators.

6501 was pin compatible with 6800 and had the same 2-phase clock
(phase 2 was just the complement of phase 1, so it's not clear why
6800 did that - maybe current draw for the voltage generators).
At any rate, the 6502 used just a single 5V clock.

So I'm guessing that 6800 and 650x used static latches for their registers,
the phi-1 and phi-2 clocks referred to on the JPG are generated internally
from clock edge-triggered, self-timed circuits to precharge the bus,
and the minimum clock was for the voltage generators as I suggested.

The nMOS 8080 had no such on-chip voltage generators so I have no idea
why it has a minimum clock of 0.5 MHz.

Thomas Koenig

Jul 29, 2021, 3:07:22 PM

EricP <ThatWould...@thevillage.com> schrieb:

> So I'm guessing that 6800 and 650x used static latches for their registers,

https://downloads.reactivemicro.com/Electronics/CPU/6502%20Schematic.pdf
has the schematics, if you can read them, you could look it up :-)

David Schultz

Jul 29, 2021, 4:53:54 PM

On 7/29/21 1:48 PM, EricP wrote:
> 6501 was pin compatible with 6800 and had the same 2-phase clock
> (phase 2 was just the complement of phase 1, so it's not clear why
> 6800 did that - maybe current draw for the voltage generators).
> At any rate, the 6502 used just a single 5V clock.

The MC6800 clocks were required to be nonoverlapping. Usually supplied
by a MC6875.

Anton Ertl

Jul 30, 2021, 4:55:51 AM

EricP <ThatWould...@thevillage.com> writes:
>The wikipedia article on 6501/6502 is an interesting read.
>https://en.wikipedia.org/wiki/MOS_Technology_6501
>
>The 650x designers left Motorola after designing the nMOS 6800.
>6800 nMOS required 3 voltages -5V, (gnd), +5V, +12V.
>One of 6800 features was the on-chip voltage inverter for generating -5V,
>and the voltage doubler for +12V invented and patented by John Buchanan
>(the charge pumps I referred to earlier.)
>
>Patent 3942047 MOS DC Voltage booster circuit, Buchanan, 1974
>https://patents.google.com/patent/US3942047A/en?oq=3942047

...

>So I'm guessing that 6800 and 650x used static latches for their registers,
>the phi-1 and phi-2 clocks referred to on the JPG are generated internally
>from clock edge-triggered, self-timed circuits to precharge the bus,
>and the minimum clock was for the voltage generators as I suggested.

According to the article you linked above, the 6501/6502 used
depletion-load NMOS:

|depletion-load NMOS is a form of digital logic family that uses only a
|single power supply voltage, unlike earlier nMOS (n-type metal-oxide
|semiconductor) logic families that needed more than one different
|power supply voltage.

<https://en.wikipedia.org/wiki/Depletion-load_NMOS>

So the 6501/6502 do not need such voltage generators.

>The nMOS 8080 had no such on-chip voltage generators so I have no idea
>why it has a minimum clock of 0.5 MHz.

Your earlier theory that Intel used DRAM for registers is plausible
given that they already had experience with that.

Quadibloc

Jul 30, 2021, 12:22:46 PM

On Tuesday, July 27, 2021 at 4:56:42 PM UTC-6, MitchAlsup wrote:

> The utility of reinventing this wheel in today's world and market is so close
> to zero it nears exponent underflow.

Oh, that may be, but I never thought that this was what the OP's question
was about.
The original 6502 was a masterpiece of economical design.
Could it be bettered by a designer today?
I would suspect:
- not by much, because it was done so well;
- but it is not the case that modern designers have forgotten how to design
circuits efficiently. They still want to pack *more* on each die, even if that
more is 11 cores instead of 10, and those cores are GBOoO.

John Savard

Bernd Linsel

Jul 30, 2021, 12:40:36 PM

On 28.07.2021 11:46, Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
>

> Now I remember that the 6502 was dynamic, too, and the 65C02 was
> static.
>
> However, it was never really clear to me what this "dynamic" means,
> other than having a lower limit on the clock rate.
>
> Does it mean that these dynamic CPUs used DRAM for their registers?
> If so, how were they refreshed?
>
>

> This sounds like each bit would require more transistors than the two
> that Bernd Paysan assumed for the b16 variant (and for which he
> suggested software refresh).
>
> - anton
>

See https://en.wikipedia.org/wiki/Dynamic_logic_(digital_electronics) .

However, the distinction made in this article is not clear enough, as
there is clocked static logic as well.

To put it short, signals in dynamic logic circuits are always "in
flight" and only buffered by (small) capacitors between the gates, thus
the need for a minimal clock rate specification (roughly 5 tau of the
buffers), while static logic has latches between functional blocks that
keep the signals up as long as Vcc is applied, so that even if the clock
is stopped, the circuit state is held.

Besides, the maximum clock rate for dynamic logic may also be
constrained to a substantially lower clock rate than resulting from
switching times, because the capacitors can only be charged/discharged
with rates depending on gate fan-out.

On the other hand, the clock rate for static logic is restrained by the
additional latch switching times.

In summary, dynamic logic needs less silicon and less energy (if
designed correctly), but static logic is less dependent on clock phase
and stability, and has the advantage of a wide frequency range.

Today's big chips usually incorporate a mixture of both.

--
Regards,
Bernd

MitchAlsup

Jul 30, 2021, 2:03:42 PM

On Friday, July 30, 2021 at 11:40:36 AM UTC-5, Bernd Linsel wrote:
> On 28.07.2021 11:46, Anton Ertl wrote:
> > EricP <ThatWould...@thevillage.com> writes:
> >
> > Now I remember that the 6502 was dynamic, too, and the 65C02 was
> > static.
> >
> > However, it was never really clear to me what this "dynamic" means,
> > other than having a lower limit on the clock rate.
> >
> > Does it mean that these dynamic CPUs used DRAM for their registers?
> > If so, how were they refreshed?
> >
> >
> > This sounds like each bit would require more transistors than the two
> > that Bernd Paysan assumed for the b16 variant (and for which he
> > suggested software refresh).
> >
> > - anton
> >
> See https://en.wikipedia.org/wiki/Dynamic_logic_(digital_electronics) .
>
> However, the distinction made in this article is not clear enough, as
> there is also clocked static logic.
>
> To put it short, signals in dynamic logic circuits are always "in
> flight" and only buffered by (small) capacitors between the gates, thus
<

During the precharge time period, the signals are not "in flight"; they are
solidly driven in one direction (usually high). It is only in the evaluation
period that the signals can be pulled low and then float until the precharge
period comes back around.
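
The precharge/evaluate behaviour can be sketched as a toy model of a single dynamic NOR output node. This is a simplified illustration, not a circuit simulation; the class name and the `driven` flag are hypothetical, and a node that has been pulled low is simply treated as driven.

```python
class DynamicNor:
    """Precharge/evaluate model of one dynamic NOR gate output node."""

    def __init__(self):
        self.value = None    # unknown until the first precharge
        self.driven = False  # True while something actively drives the node

    def precharge(self):
        # Precharge phase: the node is solidly driven high.
        self.value = 1
        self.driven = True

    def evaluate(self, *inputs):
        # Evaluate phase: any high input turns on a pull-down transistor.
        if any(inputs):
            self.value = 0
            self.driven = True
        else:
            # No pull-down path: the node floats and only its charge keeps it high.
            self.driven = False
        return self.value

gate = DynamicNor()
gate.precharge()
print(gate.evaluate(0, 0), gate.driven)  # output stays high but is floating
gate.precharge()
print(gate.evaluate(0, 1), gate.driven)  # output is pulled low
```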

<
> the need for a minimal clock rate specification (roughly 5 tau of the
> buffers), while static logic has latches between functional blocks that
> keep the signals up as long as Vcc is applied, so that even if the clock
> is stopped, the circuit state is held.
>
> Besides, the maximum clock rate for dynamic logic may also be
> constrained to a substantially lower clock rate than resulting from
> switching times, because the capacitors can only be charged/discharged
> with rates depending on gate fan-out.
<

Some circuits such as integer adders are actually faster and lower power
in static implementations than in dynamic implementations.

>
> On the other hand, the clock rate for static logic is restrained by the
> additional latch switching times.
>
> In summary, dynamic logic needs less silicon and less energy (if
> designed correctly), but static logic is less dependent on clock phase
> and stability, and has the advantage of a wide frequency range.
>
> Today's big chips usually incorporate a mixture of both.
<

Mostly, the dynamic parts are hidden from designers inside macros that
the designer is not allowed to even look at. So, while there are dynamic
parts to modern chips, 99% of the designers work in purely static design
capacities. SRAM macros have internal dynamic components (bit lines,
sense amplifiers, ...) but the interface given to the designers using said
macro smells distinctly static (except that some libraries mandate that
the read-out from the sense amplifiers be latched (or flopped) before
being used, i.e. seen, by other signals). Very few things "on the data path"
are dynamic these days; essentially nothing in the control path is dynamic.
>
> --
> Regards,
> Bernd

Thomas Koenig

Jul 30, 2021, 2:55:59 PM

Quadibloc <jsa...@ecn.ab.ca> schrieb:

If you take the basic concept of a 6502 (single-byte instructions,
heavy on sequencing, but not as heavy as the Z80) I think they
were pretty close to optimum on the cost/performance curve.

To expand a little on a previous post, I still think that a
proto-RISC design could have been superior. A design with

- All 16 bit instructions only
- Load/store architecture
- 16 registers of 16 bits each
- Eight bit opcode, eight bit data
- Possibility of 16-bit immediates for all arithmetic
  operations, load and store for the address
- Format: op Ra, Ra or op Ra, #immediate (4 bits)

can probably be made to fit on a 6502-sized die using the
technology of the day.
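
To make the proposed format concrete, here is a minimal Python sketch of one possible encoding: an 8-bit opcode in the high byte, then a 4-bit register field and a 4-bit second register or immediate. The bit layout and the function names are assumptions for illustration; the list above only fixes the field widths.

```python
def encode(opcode, ra, rb_or_imm4):
    """Pack opcode (8 bits), Ra (4 bits) and Rb or a 4-bit immediate into 16 bits."""
    if not (0 <= opcode < 256 and 0 <= ra < 16 and 0 <= rb_or_imm4 < 16):
        raise ValueError("field out of range")
    return (opcode << 8) | (ra << 4) | rb_or_imm4

def decode(word):
    """Split a 16-bit instruction word back into its three fields."""
    return (word >> 8) & 0xFF, (word >> 4) & 0xF, word & 0xF

word = encode(0x12, 3, 15)   # hypothetical opcode 0x12, Ra = r3, second field = 15
print(hex(word), decode(word))
```

Whether the last field names a register or holds an immediate would have to be determined by the opcode in this layout.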

Of course, such a design would be screaming to have a 16-bit
data bus. Power, Ground, Memory ready, IRQ, Reset, Read/Write,
NMI, Clock. Would that be enough?
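
A quick pin tally, as a sketch in Python. It assumes the same 40-pin package the 6502 used and a non-multiplexed bus with 16 address pins; the constant and function names are made up for this example.

```python
PACKAGE_PINS = 40   # the 6502 came in a 40-pin DIP
ADDRESS_PINS = 16
CONTROL_PINS = ["Power", "Ground", "Memory ready", "IRQ",
                "Reset", "Read/Write", "NMI", "Clock"]

def spare_pins(data_pins):
    """Pins left over after address, data and the control signals listed above."""
    return PACKAGE_PINS - ADDRESS_PINS - data_pins - len(CONTROL_PINS)

print("8-bit data bus: ", spare_pins(8), "pins spare")
print("16-bit data bus:", spare_pins(16), "pins spare")
```

Under these assumptions the eight signals listed fit exactly, with nothing left over.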

EricP

Jul 30, 2021, 3:18:29 PM

Thanks.

At the top right, latches for stack pointer, X and Y -
two inverters back to back, with the 2 pass gates cp2 and x6 being a mux,
and pass gate x7 output onto the bus.
To set the latch, put the data on the bus, set cp2=0 and x6=1.
Then x6=0 and cp2=1 to hold that value.

Since cp2 is clock pulse 2, it goes 0 every 1/2 cycle which leaves
its inverter gate input floating. I was taught to never do this.
Anyway, that might be a source of the minimum clock - if cp2 is
missing the gate charge bleeds off and loses its feedback state.
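
The load/hold sequence can be sketched in Python as a behavioural model. It is only an illustration of the description above: the class and method names are hypothetical, bus contention (cp2 and x6 both high) is not modelled, and charge leakage is reduced to a `floating` flag.

```python
class PassGateLatch:
    """Two back-to-back inverters with pass gates x6 (load) and cp2 (feedback)."""

    def __init__(self):
        self.node = 0          # charge on the input of the first inverter
        self.floating = False  # True when neither pass gate is conducting

    def half_cycle(self, bus, cp2, x6):
        if x6:
            # Load: the bus drives the storage node.
            self.node = bus
            self.floating = False
        elif cp2:
            # Hold: feedback through both inverters restores the node.
            self.node = int(not (not self.node))
            self.floating = False
        else:
            # Neither gate conducts: the value survives only as gate charge.
            self.floating = True
        return self.node

latch = PassGateLatch()
latch.half_cycle(bus=1, cp2=0, x6=1)         # set: data on bus, cp2=0, x6=1
latch.half_cycle(bus=0, cp2=1, x6=0)         # hold: x6=0, cp2=1
print(latch.node, latch.floating)
latch.half_cycle(bus=0, cp2=0, x6=0)         # cp2 low: node left floating
print(latch.node, latch.floating)
```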

EricP

Jul 30, 2021, 3:45:05 PM

Anton Ertl wrote:
> EricP <ThatWould...@thevillage.com> writes:
>> The wikipedia article on 6501/6502 is an interesting read.
>> https://en.wikipedia.org/wiki/MOS_Technology_6501
>>
>> The 650x designers left Motorola after designing the nMOS 6800.
>> 6800 nMOS required 3 voltages -5V, (gnd), +5V, +12V.
>> One of 6800 features was the on-chip voltage inverter for generating -5V,
>> and the voltage doubler for +12V invented and patented by John Buchanan
>> (the charge pumps I referred to earlier.)
>

> According to the article you linked above, the 6501/6502 used
> depletion-load NMOS:
>
> |depletion-load NMOS is a form of digital logic family that uses only a
> |single power supply voltage, unlike earlier nMOS (n-type metal-oxide
> |semiconductor) logic families that needed more than one different
> |power supply voltage.
>
> <https://en.wikipedia.org/wiki/Depletion-load_NMOS>
>
> So the 6501/6502 do not need such voltage generators.

Oops, right. Sorry about that. I was multitasking too much with
the 6800 article and mixed them up.

Never mind.

Brian G. Lucas

Jul 30, 2021, 3:53:54 PM

I think you just described the Motorola MCore except for 16-bit immediates.

Quadibloc

Jul 30, 2021, 6:36:44 PM

On Friday, July 30, 2021 at 1:53:54 PM UTC-6, Brian G. Lucas wrote:
> On 7/30/21 1:55 PM, Thomas Koenig wrote:

> > To expand a little on a previous post, I still think that a
> > proto-RISC design could have been superior. A design with

> > - All 16 bit instructions only
> > - Load/store architecture
> > - 16 registers of 16 bits each
> > - Eight bit opcode, eight bit data
> > - Possibility of 16-bit immediates for all arithmetic
> > operations, load and store for the address
> > - Format: op Ra, Ra or op Ra, #immediate (4 bits)

> I think you just described the Motorola MCore except for 16-bit immediates.

If somebody designed a bare-bones 16-bit processor that fit on
an 8-bit sized die, what prevented it from becoming an amazing success
that made people forget all about the 6502, the 8080, and even the 6800?

Oh. Introduced in late 1997. Unfortunately, it was a little late to the party;
and so in real life, the TI 9900 was the only minimalist 16-bit chip that
tried to hasten the transition from 8-bit to 16-bit.

John Savard

Tim Rentsch

Jul 30, 2021, 7:16:49 PM

John Levine <jo...@taugh.com> writes:

> According to Quadibloc <jsa...@ecn.ab.ca>:
>
>> On Friday, July 23, 2021 at 4:59:30 AM UTC-6, Thomas Koenig wrote:
>>
>>> Another direction for retro-architectures... I've been looking
>>> at the 6502 a bit, and it really is quite an interesting design.
>>> Squeezing the functionality of a CPU into ~3500 transistors (plus
>>> ~1000 transistors used as resistors) was quite an achievement.
>>
>> Indeed.
>>
>> However, even the 6502, let alone the 6800 or the 8080, seemed
>> to me to have very complicated instruction sets compared to the
>> PDP-8.
>
> The original PDP-8 had only 1409 transistors, each one in a separate can,
> so it's not surprising. The first computer I programmed was a PDP-8 and
> it was a fantastically well-done tradeoff between extreme simplicity and
> usability.

Amusing data point: the LGP-30 had 19 flip-flops.

Brian G. Lucas

Jul 30, 2021, 8:28:58 PM

The MCore is a 32-bit processor with 16-bit instructions.

> John Savard
>

Stephen Fuld

Jul 31, 2021, 1:48:47 AM

Yes. With only an 8 bit bus, every instruction will require two memory
reads to fetch the instruction, which will substantially hurt
performance. 8088 anyone? :-(
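
The arithmetic behind that, as a small Python sketch (the function name is made up for this example):

```python
def bus_reads(instruction_bits, bus_bits):
    """Memory reads needed to fetch one instruction (ceiling division)."""
    return -(-instruction_bits // bus_bits)

print("16-bit instruction,  8-bit bus:", bus_reads(16, 8), "reads")
print("16-bit instruction, 16-bit bus:", bus_reads(16, 16), "read")
```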

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Anton Ertl

Jul 31, 2021, 4:57:49 AM

Quadibloc <jsa...@ecn.ab.ca> writes:
>If somebody designed a bare-bones 16-bit processor that fit on
>an 8-bit sized die, what prevented it from becoming an amazing success
>that made people forget all about the 6502, the 8080, and even the 6800?
>
>Oh. Introduced in late 1997. Unfortunately, it was a little late to the party;
>and so in real life, the TI 9900 was the only minimalist 16-bit chip that
>tried to hasten the transition from 8-bit to 16-bit.

There are all the Nova-like microprocessors (National Semiconductor
IPC-16A/520 PACE, microNOVA mN601, Fairchild 9440), but from what I
read, they tended to be not particularly fast, more expensive than the
6502 (6502: $25, mN601: $95 in lots of 100, 9440: $395 in lots of
100), and not marketed as aggressively as the 6502.

Anton Ertl

Jul 31, 2021, 6:10:22 AM

Thomas Koenig <tko...@netcologne.de> writes:
>If you take the basic concept of a 6502 (single-byte instructions,
>heavy on sequencing, but not as heavy as the Z80) I think they
>were pretty close to optimum on the cost/performance curve.

What makes you think so? Admittedly, we have not seen anything
guaranteed to be better, but then it's expensive (maybe a person-year)
and commercially pointless to design a CPU with 1975 technology today.

>To expand a little on a previous post, I still think that a
>proto-RISC design could have been superior. A design with
>
>- All 16 bit instructions only
>- Load/store architecture
>- 16 registers of 16 bits each
>- Eight bit opcode, eight bit data
>- Possibility of 16-bit immediates for all arithmetic
> operations, load and store for the address
>- Format: op Ra, Ra or op Ra, #immediate (4 bits)
>
>can probably be made to fit on a 6502-sized die using the
>technology of the day.

I see lots of things that make it more expensive:

16-bit ALU
16 registers of 16 bits (in addition to PC and status? or are they included?)
Bernd Paysan's suggestion of using DRAM for registers might help here.
longer instruction buffer (especially with the 16-bit immediates)

What might make it cheaper:

Depending on the instruction set: A simpler decoder with fewer cycles
to generate signals for (although 6502 was ingenious in minimizing the
microcode for its instruction set). Looking at the 6502 die shot
description in
<https://en.wikipedia.org/wiki/MOS_Technology_6502#Technical_description>,
apparently half of the 6502 was dedicated to control (microcode PLA
and random), so there is substantial room for improvement here.

>Of course, such a design would be screaming to have a 16-bit
>data bus. Power, Ground, Memory ready, IRQ, Reset, Read/Write,
>NMI, Clock. Would that be enough?

The 6502 has an extra VSS pin, 3 N.C. (not connected?) pins, a sync
pin, an SO pin and two clock-out pins in addition. Not sure what the
sync and SO pins are good for, but if you can do without all these
pins, a 16-bit data bus would be technically possible.

But the result does not fit your original requirements, and in 1975
would have been a commercial failure. Remember that for the 1981 IBM
PC, IBM chose the 8088 over the 8086, because a system with a 16-bit
data bus is significantly more expensive than one with an 8-bit data
bus.

If we stick with an 8-bit data bus as you originally required, your
16-bit instructions may be suboptimal. I also thought about something
RISC-like, but this aspect steered me to something b16-like.

Thomas Koenig

Jul 31, 2021, 8:23:02 AM

Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:

> Thomas Koenig <tko...@netcologne.de> writes:
>>If you take the basic concept of a 6502 (single-byte instructions,
>>heavy on sequencing, but not as heavy as the Z80) I think they
>>were pretty close to optimum on the cost/performance curve.
>
> What makes you think so? Admittedly, we have not seen anything
> guaranteed to be better, but then it's expensive (maybe a person-year)
> and commercially pointless to design a CPU with 1975 technology today.
>
>>To expand a little on a previous post, I still think that a
>>proto-RISC design could have been superior. A design with
>>
>>- All 16 bit instructions only
>>- Load/store architecture
>>- 16 registers of 16 bits each
>>- Eight bit opcode, eight bit data
>>- Possibility of 16-bit immediates for all arithmetic
>> operations, load and store for the address
>>- Format: op Ra, Ra or op Ra, #immediate (4 bits)
>>
>>can probably be made to fit on a 6502-sized die using the
>>technology of the day.
>
> I see lots of things that make it more expensive:
>
> 16-bit ALU

The ALU should perform:

- ADD, SUB, ADDC, SUBC
- OR, AND, NOT, XOR
- ROL, ROR, ASL

The most expensive part is the adder (used as an adder/subtractor,
of course). A 16-bit ripple-carry adder (have to leave _some_
room for improvement) including the XOR is about 500 transistors.

Shift instructions are just wiring. AND, OR and NOT are cheap, for
XOR, the XOR from the adder/subtractor can be re-used.
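
A bit-level sketch in Python of the adder/subtractor idea: one chain of full adders, with the second operand passed through an XOR that inverts it for subtraction. The function names are invented for this example, and it models behaviour only, not transistor counts.

```python
def full_adder(a, b, carry_in):
    """One bit slice: sum and carry-out from two XORs, two ANDs and an OR."""
    partial = a ^ b
    return partial ^ carry_in, (a & b) | (carry_in & partial)

def add_sub16(a, b, subtract=False):
    """16-bit ripple-carry add or subtract; returns (result, carry_out)."""
    invert = 1 if subtract else 0
    carry = invert              # subtracting is a + ~b + 1
    result = 0
    for i in range(16):         # the carry ripples from bit 0 up to bit 15
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ invert   # the XOR that is reusable for XOR ops
        s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << i
    return result, carry

print(add_sub16(1000, 234))
print(add_sub16(1000, 234, subtract=True))
```

For subtraction the carry-out is 1 when no borrow occurred, the same convention the 6502 uses for its carry flag.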

The processor should probably be set up so that the ALU also
does branch calculations.

So, ~ 700-800 transistors for the ALU including the multiplexers.

> 16 registers of 16 bits (in addition to PC and status? or are they included?)

16 registers of 16 bits is 256 bits, four transistors each (like they
did on the actual 6502), with multiplexers, let's say 1200 transistors.

2000 so far.

> Bernd Paysan's suggestion of using DRAM for registers might help here.
> longer instruction buffer (especially with the 16-bit immediates)
>
> What might make it cheaper:
>
> Depending on the instruction set: A simpler decoder with fewer cycles
> to generate signals for (although 6502 was ingenious in minimizing the
> microcode for its instruction set). Looking at the 6502 die shot
> description in
><https://en.wikipedia.org/wiki/MOS_Technology_6502#Technical_description>,
> apparently half of the 6502 was dedicated to control (microcode PLA
> and random), so there is substantial room for improvement here.

That was the point I was trying to get across. A proto-RISC 16-bit
processor could get by without much of the sequencing and control
logic on the 6502. Let's say the additional logic is implemented
in 1500 transistors (not unreasonable), and we're up to 3500.
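
The running tally, written out as a short Python sketch. The figures are the estimates from this post (taking the upper end of the ALU range); the dictionary keys are just labels.

```python
budget = {
    "ALU incl. multiplexers": 800,      # upper end of the 700-800 estimate
    "16 x 16-bit registers": 1200,
    "sequencing and control": 1500,
}

total = sum(budget.values())
for block, transistors in budget.items():
    print(f"{block:26s} {transistors:5d}")
print(f"{'total':26s} {total:5d}")
```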

>>Of course, such a design would be screaming to have a 16-bit
>>data bus. Power, Ground, Memory ready, IRQ, Reset, Read/Write,
>>NMI, Clock. Would that be enough?
>
> The 6502 has an extra VSS pin, 3 N.C. (not connected?) pins, a sync
> pin, an SO pin and two clock-out pins in addition. Not sure what the
> sync and SO pins are good for, but if you can do without all these
> pins, a 16-bit data bus would be technically possible.

> But the result does not fit your original requirements, and in 1975
> would have been a commercial failure. Remember that for the 1981 IBM
> PC, IBM chose the 8088 over the 8086, because a system with a 16-bit
> data bus is significantly more expensive than one with an 8-bit data
> bus.

You're right, this could be a significant bottleneck, but maybe still
better than a 6502.

Terje Mathisen

Jul 31, 2021, 9:39:30 AM

The only good thing about 8088 was the fact that you could statically
calculate the number of cycles a given code construct would use, i.e.
"number of bytes touched/read/written by code or data" multiplied by 4.

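That rule of thumb as a minimal Python sketch. The function name and the example byte counts are hypothetical; the factor of 4 is the one given above.

```python
CYCLES_PER_BUS_BYTE = 4  # rule of thumb from this post: 4 cycles per byte on the bus

def estimate_cycles(code_bytes, data_bytes_read=0, data_bytes_written=0):
    """Estimate 8088 cycles from the number of bytes touched by code and data."""
    touched = code_bytes + data_bytes_read + data_bytes_written
    return CYCLES_PER_BUS_BYTE * touched

# Hypothetical sequence: 3 bytes of code that read 2 bytes of data.
print(estimate_cycles(code_bytes=3, data_bytes_read=2))
```
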
Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

MitchAlsup

Jul 31, 2021, 12:48:27 PM

Shift instructions require multiplexers.

>
> The processor should probably be set up so that the ALU also
> does branch calculations.
>
> So, ~ 700-800 transistors for the ALU including the multiplexers.
> > 16 registers of 16 bits (in addition to PC and status? or are they included?)
> 16 registers of 16 bits is 256 bits, four transistors each (like they
> did on the actual 6502), with multiplexers, let's say 1200 transistors.
<

The 4 transistors make up the storage unit, but you have not added the
read (mux) or write ports, so the count is a bit higher.
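
To see how the ports move the number, here is a rough Python sketch. It assumes one pass transistor per bit per port and ignores address decoders and drivers; both the cost model and the function name are assumptions for illustration, not figures from a real design.

```python
def register_file_transistors(registers, bits, cell=4,
                              read_ports=1, write_ports=1, per_port=1):
    """Return (storage, ports, total) transistor counts under the stated model."""
    cells = registers * bits
    storage = cells * cell
    ports = cells * (read_ports + write_ports) * per_port
    return storage, ports, storage + ports

storage, ports, total = register_file_transistors(16, 16)
print(f"storage {storage}, ports {ports}, total {total}")
```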

>
> 2000 so far.
> > Bernd Paysan's suggestion of using DRAM for registers might help here.
> > longer instruction buffer (especially with the 16-bit immediates)
> >
> > What might make it cheaper:
> >
> > Depending on the instruction set: A simpler decoder with fewer cycles
> > to generate signals for (although 6502 was ingenious in minimizing the
> > microcode for its instruction set). Looking at the 6502 die shot
> > description in
> ><https://en.wikipedia.org/wiki/MOS_Technology_6502#Technical_description>,
> > apparently half of the 6502 was dedicated to control (microcode PLA
> > and random), so there is substantial room for improvement here.
<

Like the MC88100, the 6502 used a single NOR plane as microcode.