On Sunday, January 1, 2023 at 3:42:05 PM UTC-6, Russell Wallace wrote:
> From the 90s onward, all high-performance CPUs have been heavily pipelined. Before that, this was not necessarily the case; early microprocessors tended not to be pipelined, presumably due to not having the transistor budget. The earliest pipelined microprocessors were typically RISC, presumably because this kind of design is easier with a simpler architecture.
<
Pipelining is expensive (flip-flops, clock loading, and fanout). Pipelining goes back to at least Stretch
and possibly earlier. Even in the world of microcoded mainframes, various pieces of instruction
execution were pipelined (Fetch and Memory Access) because these paid off, while a complete
pipeline did not.
>
> I'm wondering about the even earlier, 8-bit microprocessors. Take for example the Z80. Is this simpler or more complex than an early RISC like MIPS? In some ways it is simpler (smaller transistor count); in others, it is more complex. Probably overall harder to pipeline.
<
Most of the 8-bit CPUs were entirely bus limited, thus pipelining the CPU would not have been
"worth it". RISC came along in the time period where one had two choices: build a microcoded CPU
(like we had been doing for decades) and include the microcode on the chip with the execution
hardware, or build a pipelined CPU with a very tiny control sequencer.
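
To put rough numbers on the bus-limited point, here is a minimal sketch. It takes the documented
T-state counts of a few common Z80 instructions (no wait states) and compares them with a floor
set by the bus alone. The 3 T-states per byte figure is the standard Z80 memory read/write cycle;
treating opcode fetches as if they could also shrink to 3 T-states is my assumption, as are the
names `INSTRUCTIONS` and `bus_bound_speedup`.

```python
# (mnemonic, documented T-states, bytes moved over the 8-bit bus)
INSTRUCTIONS = [
    ("ADD A,r",    4, 1),   # opcode fetch only
    ("LD A,(HL)",  7, 2),   # opcode fetch + one data read
    ("LD A,(nn)", 13, 4),   # opcode + two address bytes + one data read
    ("JP nn",     10, 3),   # opcode + two address bytes
]

# Assumption: every byte, opcode fetches included, costs a plain 3 T-state memory cycle.
T_STATES_PER_BYTE = 3


def bus_bound_speedup(t_states, bus_bytes, per_byte=T_STATES_PER_BYTE):
    """Best-case speedup if the core were free and only bus transfers cost time."""
    return t_states / (bus_bytes * per_byte)


for name, t_states, bus_bytes in INSTRUCTIONS:
    floor = bus_bytes * T_STATES_PER_BYTE
    speedup = bus_bound_speedup(t_states, bus_bytes)
    print(f"{name:10s} {t_states:2d} T-states, bus floor {floor:2d}, speedup <= {speedup:.2f}x")
```

Under these assumptions the ceiling is somewhere between about 1.08x and 1.33x for these four
instructions, no matter how deep the pipeline behind the same memory port.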
>
> Just what would it take to make a pipelined implementation of the Z80 architecture? (i.e. you can lay out the chip any way you like, but it has to run existing Z80 code unchanged except for being faster, and it has to stay within process technology available in the heyday of the Z80, say not later than mid-80s.) How many transistors would such an implementation need, compared to the ~8000 of the original Z80?
<
In the Z80 time frame, flip-flops were 6 gates big (10-12 today), and if you pipelined the entire
Z80 CPU, attempting to get 0.7 instructions per cycle, the CPU would be around 2.5×-3.0× the size
of the Z80 CPU. While doable, until you have a memory port (that is, enough pins) big enough
and cache-like enough to support 1 memory reference per clock, you would gain only 10%-20%
over the Z80 as it was built, at the cost of 2.5×-3.0× the size, making the effort "not worth it".
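
The trade-off above can be put as performance per unit of die area. This is only a back-of-envelope
sketch using the figures quoted in this post (10%-20% faster, 2.5×-3.0× larger); the function name
`perf_per_area` is made up for illustration, and it assumes cost scales linearly with area, which
understates the penalty since yield falls as dies grow.

```python
def perf_per_area(speedup, area_ratio):
    """Relative performance per unit of die area, with the original Z80 as 1.0."""
    return speedup / area_ratio


# Figures from the post: 10%-20% speedup at 2.5x-3.0x the size.
best = perf_per_area(1.20, 2.5)    # most favorable combination
worst = perf_per_area(1.10, 3.0)   # least favorable combination

print(f"performance per area: {worst:.2f} to {best:.2f} of the original Z80")
```

By that measure the pipelined part delivers well under half the performance per unit area of the
original design.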