16 bit risc instruction set

Mark T

Nov 10, 2022, 2:10:49 PM

to retro-comp

This is an outline of the instruction set planned for a 16 bit pipelined cpu. I was originally considering an eight bit instruction set and data bus with 16 bit internal registers, but the lack of instruction bits puts too low a limit on the number of registers that could be addressed and would probably end up as an accumulator plus source register type ISA.

Original plan was 3 bits for op code, 5 for destination and first source, 5 for second source and 3 for ALU function.

5 bits for register address would allow 32 registers. Plan was to use 74AS870 or IS61C256 cache ram. I had considered 74AC670, but I don’t have enough of those for more than 8 registers. The 74AS870 contains two 16x4 register files, which seems at first to provide 32x4 registers, but the decoding is a bit odd. There are two ports to access registers and select lines to choose which of the two 16x4 register files is accessed by each port. Unfortunately the register files each have separate address control, so it’s not possible to have each port select any one of the 32 registers while the other port does the same. There is a 12ns setup time on the select lines before write enable is activated and a 12ns hold time on the select lines after write enable is deactivated. To avoid these setup and hold times I’ll hardwire the select lines to a single 16x4 register file; there will be a common address control for read and write but separate data ports for writing to and reading from registers. There will be two banks of 74AS870, one for each source to the ALU, and a common input to both banks so both banks will hold a mirror of the data.
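
The mirrored two-bank arrangement can be modelled roughly as below. This is only a behavioural sketch in Python, not a description of the 74AS870 itself; the class and method names are made up for illustration, and chip timing is ignored.

```python
class MirroredRegisterFile:
    """Two identical 16 x 16-bit banks: written together, read independently."""

    NUM_REGS = 16

    def __init__(self):
        self.bank_a = [0] * self.NUM_REGS  # feeds the first ALU source
        self.bank_b = [0] * self.NUM_REGS  # feeds the second ALU source

    def write(self, reg, value):
        # The common data input drives both banks, so they stay mirrored.
        value &= 0xFFFF
        self.bank_a[reg & 0xF] = value
        self.bank_b[reg & 0xF] = value

    def read(self, reg_a, reg_b):
        # Each bank supplies one operand, which gives two read ports in effect.
        return self.bank_a[reg_a & 0xF], self.bank_b[reg_b & 0xF]


regs = MirroredRegisterFile()
regs.write(3, 0x1234)
regs.write(7, 0xBEEF)
operands = regs.read(3, 7)
```

The cost of getting a second read port this way is that the storage is duplicated, which is why two banks of chips are needed.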

Mark T

Nov 10, 2022, 2:46:30 PM

to retro-comp

With only 16 registers addressable I don’t want to follow common RISC practice of dedicating R0 to 0. For a similar reason, and to simplify instruction decoding, the program counter and condition codes will not be mapped to a register address. The program counter is planned to use 74ACT163s so a separate incrementer ALU is not needed. This introduces some complication for jump and link, which I’ll describe later.

Instruction register is planned as follows.

IR15 - 1 second source is IR7..0, sign extended for ADDI (add immediate) or shifted to 15..8 for LUI (load upper immediate)

IR15 - 0 second source is register IR7..4

IR14 - 1 destination for ALU result is PC or Status word

IR14 - 0 destination for ALU/MEM result is register IR11..8

IR13 - 1 first source for ALU is PC or Status word

IR13 - 0 first source for ALU or source for store MEM is register IR11..8

If both IR14 and IR13 select register then IR12 selects ALU or MEM operation

Otherwise IR12 - 1/0 to select between PC or Status word

IR11..8 - Register address for destination or MEM/first source

(If PC is both source and destination IR11..8 selects condition from Status word flags)

IR7..0 - 8 bits immediate data, sign extended or shifted and 0 filled for ADDI or LUI

IR7..4 - Register address for second source or MEM address

IR3..0 - MEM operation selects Read/Write (maybe memory/input/output, code, stack etc)

IR3..0 - ALU operation select (3 bits for 74F381/2, additional bit for carry selection)
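
As a rough illustration of the field layout above, here is a small Python sketch that pulls the fields out of a 16 bit instruction word. The function and key names are invented for the example. The post does not say how ADDI and LUI are told apart or which value of IR12 means what, so the sketch only extracts raw fields and shows the two immediate forms separately.

```python
def sign_extend8(value):
    """Interpret the low 8 bits as a signed two's-complement number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def decode_fields(ir):
    """Split a 16-bit instruction word into the fields listed above."""
    ir &= 0xFFFF
    return {
        "second_src_is_imm": bool((ir >> 15) & 1),     # IR15
        "dest_is_pc_or_status": bool((ir >> 14) & 1),  # IR14
        "src1_is_pc_or_status": bool((ir >> 13) & 1),  # IR13
        "ir12": (ir >> 12) & 1,    # ALU/MEM or PC/Status select (polarity not given)
        "reg_a": (ir >> 8) & 0xF,  # IR11..8
        "reg_b": (ir >> 4) & 0xF,  # IR7..4
        "op": ir & 0xF,            # IR3..0
        "imm8": ir & 0xFF,         # IR7..0
    }


def addi_operand(imm8):
    """Immediate as a 16-bit value, sign extended (the ADDI case)."""
    return sign_extend8(imm8) & 0xFFFF


def lui_operand(imm8):
    """Immediate moved to bits 15..8 with the low byte zero filled (the LUI case)."""
    return (imm8 & 0xFF) << 8


fields = decode_fields(0x1234)
```

Note that IR7..4 and IR3..0 overlap the immediate, so when IR15 is 1 the register and operation fields in the dictionary are just the two halves of the immediate byte.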

Mark T

Nov 10, 2022, 3:01:54 PM

to retro-comp

The common pipeline sequence for risc processors is

instruction fetch,

instruction decode and register read

ALU operation

Memory operation

Register write back.

With this sequence a load from memory prior to an ALU operation requires a null operation or bubble to be inserted.

With only 16 bit instructions it’s not possible to encode an offset to the memory address, so there is no ALU operation to perform prior to the memory access. To avoid the hardware needed to insert a bubble, the plan is to modify the pipeline sequence to:

Instruction fetch

instruction decode and register read

Either ALU operation or Memory operation

Register write back

This avoids the need to insert a bubble but any memory operation with an address calculation now requires two separate instructions.

This also complicates operand forwarding logic and multiplexing.
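
The load-use bubble argument can be checked with a very simple model. The Python sketch below assumes full operand forwarding, back-to-back dependent instructions, and that a result is usable in the cycle after the stage that produces it; the stage names and the function are illustrative only.

```python
def dependent_stalls(stages, produce_stage, consume_stage):
    """Bubbles needed when the next instruction consumes a result.

    The producer enters the pipeline at cycle 0 and has its result at the end
    of cycle p. The consumer enters at cycle 1 and needs the value at the start
    of cycle 1 + c + stalls, so stalls must be at least p - c.
    """
    p = stages.index(produce_stage)
    c = stages.index(consume_stage)
    return max(0, p - c)


CLASSIC = ["IF", "ID", "EX", "MEM", "WB"]
MODIFIED = ["IF", "ID", "EX_OR_MEM", "WB"]

# Load followed immediately by an ALU operation that uses the loaded value.
classic_load_use = dependent_stalls(CLASSIC, "MEM", "EX")
modified_load_use = dependent_stalls(MODIFIED, "EX_OR_MEM", "EX_OR_MEM")

# ALU operation followed by a dependent ALU operation, for comparison.
classic_alu_alu = dependent_stalls(CLASSIC, "EX", "EX")
```

In this model the classic sequence needs one bubble for load-use and the modified sequence needs none, because the load finishes in the same stage position as an ALU operation would.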

Mark T

Nov 10, 2022, 3:28:38 PM

to retro-comp

Clock cycle time of the processor is limited by the slowest pipeline stage.

This was first thought to be the ALU and register setup and hold times. Using the 74F381/2 with 74F182 lookahead and the 74ACT399 as input registers to the ALU, it might just be possible to reach a 33ns cycle time. Unfortunately I also need to allow for the longer setup time of the 74ACT163, adding a further 4-6ns to the cycle time (unless I pipeline the PC update, but that increases branch delay).

Register read and write is planned to follow the common risc practice of write to register during the first half of each cycle and read from register during the second half of the cycle, as this reduces the number of operand forwarding paths.

74AC670 has separate read and write control and would be ideal for this, but would need too many chips to support 16 x 16 bit registers with two output ports: a total of 64 x 74AC670.

74AS870 with select controls hardwired for a 16x4 bit register file could possibly reach 20ns for write followed by 20ns for read. The control circuit for address selection between read and write will be critical, but a 40ns cycle time may be possible. This will need 8 x 74AS870.

74F219 might have been an alternative for the register file in a smaller 16 pin DIL package, but these are a little slower than the 74AS870 and I already have the 74AS870s.

Target clock speed for the processor will be 20-25MHz, generated from a master clock running at 40-50MHz to split the register access cycles.
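
For reference, the cycle times quoted above convert to clock frequencies as follows. This is plain arithmetic on the figures in this post; the variable names are just for the example.

```python
def cycle_to_mhz(cycle_ns):
    """Convert a cycle time in nanoseconds to a clock frequency in MHz."""
    return 1000.0 / cycle_ns


alu_cycle_ns = 33            # ALU path with lookahead and input registers
pc_setup_extra_ns = (4, 6)   # extra allowance for the 74ACT163 setup time
regfile_cycle_ns = 20 + 20   # register write half cycle plus read half cycle

estimates = {
    "ALU path only": alu_cycle_ns,
    "ALU path + 4ns PC setup": alu_cycle_ns + pc_setup_extra_ns[0],
    "ALU path + 6ns PC setup": alu_cycle_ns + pc_setup_extra_ns[1],
    "register write then read": regfile_cycle_ns,
}

for label, ns in estimates.items():
    print(f"{label}: {ns}ns -> {cycle_to_mhz(ns):.1f}MHz")

# The master clock runs at twice the processor clock to split each cycle in two.
master_clock_mhz = 2 * cycle_to_mhz(regfile_cycle_ns)
```

On these figures the 40ns register file cycle is the slowest path, which corresponds to the 25MHz end of the target range with a 50MHz master clock.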

Alan Cox

Nov 10, 2022, 4:08:34 PM

to Mark T, retro-comp

Most of the simpler 16bit processors I've fiddled with didn't have
that many registers but some were certainly risc-like (by necessity of
the times).

The DG Nova basically has 4 registers of which one you use as the
stack pointer and two cannot be used for indexing as one combination
is "no index" and another is "PC relative". PC not being a
generic register on Nova. The encoding was pretty tight although it
wasn't until the Nova 4 that byte pointers, hardware stack (as opposed
to software conventions) and the Eclipse that load immediate got added
(Nova you load PC relative which means the assembler has to know how
to spill constants and jump to keep them in range or hide them in skip
slots). ARM draws a bunch of stuff from the Nova including some of the
conditional execution and "operation with extra effect bit" mindset.

The indexing form is nice because it's basically two bits into a four
way mux of 0, PC and register 2 or 3.
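
A minimal Python sketch of that four way mux is below. Which two-bit value selects which input is an assumption made for the example, as are the function and parameter names.

```python
def index_base(mode, pc, r2, r3):
    """Two mode bits select the base for the address: 0, PC, register 2 or 3."""
    # Assumed ordering: 0 = no index, 1 = PC relative, 2 = register 2, 3 = register 3.
    return (0, pc, r2, r3)[mode & 0b11]


bases = [index_base(mode, pc=0x0100, r2=0x2000, r3=0x3000) for mode in range(4)]
```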

RISC16 is another interesting one - it's lacking a few essentials
(shifts) you'd really need to add for any actual use but it's
basically a teaching ultra simple 16bit RISC CPU intended to teach
simple, pipelined and out of order CPU design in US university
courses. https://user.eng.umd.edu/~blj/risc/ . Code density is quite
poor though which is a big deal for 16bit IMHO.

The most beautiful 16bit simple instruction set I know is probably the
SEL System72 (apart from the fact it has special I/O instructions).
It really was an encoding thing of elegance - single instruction
format and totally consistent set of flag bits on the instructions. It
does however require that registers also map as memory word 0-7. Like
many period systems of this form it had options for fast SRAM (very
pricy at the time) in the lowest memory locations and ran much faster
that way if fitted rather than just core. The SEL encoding is

[R][I][X][op][S][displacement]

with 5 operation bits. R I X S are relative, indirect, xreg and signed
and generate an address.

You could also just use a 12ns RAM and not bother with separate
registers. It's how core based machines often worked in the days of
slow cycle times, and it's also how some more modern minicomputers
like the Gould systems did but with cache RAMs. They avoided all the
cache complexity by just using fast SRAM back when fast SRAM although
pricey and power hungry was basically CPU clock speed.

Anyway that was more of a diversion on "you probably only need four or
eight registers" 8)

Alan

Mark T

Nov 10, 2022, 5:57:55 PM

to retro-comp

I had considered trying to implement something like the INS8060 with a pipeline in TTL, perhaps even supporting a subset of the INS8060 instructions, probably excluding the delay instruction and BCD addition. I wanted to build something to equal a 20MHz Z80, but the instruction set of the 8060 is very limited.

I had looked at the RISC-16 pages and decided on 32 registers limited to a combined source/destination and second source structure, before I realised the limitation of the 74AS870. Then the choice came down to 16 x 16 bit registers using 8 x 74AS870 or 8 x 16 bit registers using 32 x 74AC670. The lower chip count for more registers seemed like a better tradeoff, though the timing of the 74AS870 will be a bit more complicated to control. Reducing the number of registers to only 4 seemed to be a reduction too far: by the time you have a stack pointer and a link register, that leaves only two registers and makes a block memory copy slower.

Alan Cox

Nov 10, 2022, 6:49:10 PM

to Mark T, retro-comp

> timing. Reducing the number of registers to only 4 seemed to be a reduction too far, by the time you have a stack pointer and a link register, leaves only two registers left and makes a block memory copy slower.

Only if you can't stack/borrow any link register. Otherwise I agree
entirely - 6803 has exactly that problem which is one reason the
68HC11 added a Y register so you had two indexes. Ditto a lot of 8051
clones have dual data pointer additions to the base architecture.

Mark T

Nov 10, 2022, 6:50:55 PM

to retro-comp

For memory I think I’ll be using a pair of 128k x 8 cache ram chips, though this is a bit of a cheat for something that could otherwise be a late 1980s design.

For a pipelined processor the natural configuration seems to be a Harvard model, with separate program and data memory, usually via separate program and data caches. I think I’ll avoid the complexity of cache for now and am also planning a combined code and data memory. Memory access will be at twice the speed of the processor clock, similar to how the register interface uses the first half cycle for write and the second half for read. For memory, the first half cycle will be for data read or write operations and the second half will be for instruction fetch.

Memory access will be 16 bit word only for now, but maybe with provision for byte access to be added at a later date.

Maximum direct memory access will be 32k 16 bit words, the program counter 74ACT163s will be offset by one bit so the address is incremented by 2 on each fetch.

A0 will be ignored for now, possibly used for high/low byte selection later.
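
The addressing scheme can be sketched as a 15 bit word counter whose outputs sit one bit up from A0. The Python below is a behavioural model only, with made-up names; it assumes the counter wraps at 32k words and that a loaded address simply has A0 dropped.

```python
class ProgramCounter:
    """Word counter wired to A15..A1, so the byte address steps by 2."""

    WORDS = 32 * 1024  # 32k 16-bit words

    def __init__(self, word_index=0):
        self.word_index = word_index % self.WORDS

    @property
    def address(self):
        # Counter outputs are offset by one bit; A0 is always 0 for now.
        return self.word_index << 1

    def increment(self):
        self.word_index = (self.word_index + 1) % self.WORDS

    def load(self, address):
        # A0 of the loaded value (for example an ALU result) is ignored.
        self.word_index = (address >> 1) % self.WORDS


pc = ProgramCounter()
fetch_addresses = []
for _ in range(3):
    fetch_addresses.append(pc.address)
    pc.increment()
```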

I’m not going to attempt any branch prediction, branches will be delayed due to the pipeline and there will be two branch delay slots to be filled as the program counter will be set by the output of the ALU.

Using the 74ACT163s as a program counter makes it a bit more complicated to save the return address instead of the last instruction address for the jump and link instruction for subroutine calls. I’ll need a multiplexer or tri state buffer to select between memory address or program counter to address memory. Plan is to have 74ACT574 register the instruction address and increment the 74ACT163s on the beginning of the second half of the cycle. At the beginning of the first half of the cycle the 74ACT163s might be loaded with the output of the ALU.

The 74ACT163s need to be clocked at the beginning and middle of each cycle but without a delay between this double clock and the regular instruction cycle clock. I think I can do this with a 74ACT74: one half divides the master clock by two, the second half is used as a delay on the master clock with an inverter from the Q output to the clear input. The inverter from Q will asynchronously reset the second half of the 74ACT74 after every rising edge of the master clock.

The goal is to avoid any clock gating that might introduce race hazards so all registers will be updated on every cycle or half cycle while the inputs to the registers are selected by multiplexers or tri state outputs from a previous pipeline stage. The majority of the pipeline and control registers will be 74ACT574 with a few 74ACT399 where a multiplexer from non-tri state outputs is needed.
