Dear RISC-V Community,
Folks might remember me from a fast binary translator named rv8:
- https://github.com/michaeljclark/rv8
In the meantime, I have been thinking a lot about geometric coding
schemes for compressing instructions using super regular and simple
instruction coding schemes designed for fast binary translation to
riscv64, aarch64, or x86_64. I am designing an instruction encoding for
a virtual machine target that could also be a physical architecture.
# RISC-IB - a super regular RISC with immediate blocks
Here are some of the design constraints:
- 16, 32, 64, and 128-bit variable length instruction packets.
- minimizes bit twiddling for vector decoding by CPU translators.
- separates instruction stream into instructions and constants.
- adds immediate base register next to the program counter.
- linkage uses 32-bit relative displacements in constant islands.
- geometric coding scheme uses field extension in successive packets.
- opcode bits are never used as an immediate so the wires never cross;
instead, register slots are bonded together and the maximum embedded
immediate value in 16-bit instruction packets is 9 bits. this reduces the
total number of instruction formats in a super regular scheme.
- there's lots of room for static vector SIMD configurations in the
larger opcodes for the larger packets but they have not been coded.
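As a rough illustration of the bonded register slot idea, here is a minimal Python sketch. The bit positions are hypothetical and are not taken from the format PDF: it assumes a 16-bit packet with a 7-bit opcode field and three 3-bit register slots, which is consistent with 8 addressable registers and a 9-bit maximum immediate but is otherwise a guess. The names `fields`, `bond`, and `sign_extend` are made up for the example.

```python
# Hypothetical 16-bit packet layout (an assumption, not the RISC-IB spec):
#   bits  0..6   opcode, never reused as immediate bits
#   bits  7..9   slot a
#   bits 10..12  slot b
#   bits 13..15  slot c
SLOT_BITS = 3
OPCODE_BITS = 7

def fields(packet):
    """Split a 16-bit packet into its opcode and three 3-bit slots."""
    opcode = packet & ((1 << OPCODE_BITS) - 1)
    slots = [(packet >> (OPCODE_BITS + SLOT_BITS * i)) & 0b111 for i in range(3)]
    return opcode, slots

def bond(slots):
    """Concatenate adjacent slots into one unsigned immediate (first slot lowest)."""
    value = 0
    for i, s in enumerate(slots):
        value |= s << (SLOT_BITS * i)
    return value

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

packet = 0b101_010_001_0000011          # slot c = 5, slot b = 2, slot a = 1, opcode = 3
opcode, (a, b, c) = fields(packet)
imm9 = bond([a, b, c])                  # all three slots bonded: 9-bit immediate
imm6 = bond([b, c])                     # two slots bonded: 6-bit immediate
```

Because the opcode field is never part of an immediate, every format is decoded with the same shifts and masks; only the number of slots that are bonded together changes.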
The instruction format is here:
https://metaparadigm.com/~mclark/VLI.pdf
This new encoding is a RISC with immediate blocks (constant islands),
although we make more extensive use of them compared to GP-Relative
loads that come in via the load-store unit. We split the instruction
stream into two streams, one for instructions and the other for
constants. Constant load instructions are not new: Argonaut Games had a
RISC GPU with constant load instructions in the early 1990s. The key
innovation is that we pair the program counter with an immediate base
register (pc,ib), and add IBS (immediate block
switch), LIB (load from immediate block), and a new procedure call
instruction JALIB (jump and link immediate block) that links the program
counter and immediate base register at the same time into two adjacent
register slots using 32-bit PC-Relative addresses from the constant
stream. This might sound a little like TOC pointers on PowerPC, except
that the JALIB instruction adds two 32-bit PC-Relative addresses from a
constant island to the (pc,ib) pair, linking the old program counter and
old immediate base register into two adjacent registers at the same time.
Note that the (pc,ib) registers must be spilled to the stack as a pair.
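To make the JALIB semantics concrete, here is a small Python model of the (pc,ib) pair. It is a sketch under several assumptions that the description above does not pin down: little-endian memory, the two 32-bit displacements packed into one 64-bit constant slot with the pc displacement first, the slot addressed as ib + 8 * disp, and the linked pc being the address of the following instruction. The class and method names are hypothetical.

```python
import struct

class Machine:
    def __init__(self, memory, pc, ib):
        self.mem = memory            # bytearray holding code and constant islands
        self.pc = pc
        self.ib = ib
        self.regs = [0] * 8

    def load_i32(self, addr):
        """Read a signed little-endian 32-bit value (endianness is assumed)."""
        return struct.unpack_from("<i", self.mem, addr)[0]

    def jalib(self, rd, disp, length=2):
        """Link (pc,ib) into (rd, rd+1), then add two 32-bit displacements."""
        slot = self.ib + 8 * disp          # assumed: one 64-bit slot per call site
        d_pc = self.load_i32(slot)         # assumed: first word moves pc
        d_ib = self.load_i32(slot + 4)     # assumed: second word moves ib
        self.regs[rd] = self.pc + length   # assumed: link the return address
        self.regs[rd + 1] = self.ib        # old immediate base, adjacent register
        self.pc += d_pc
        self.ib += d_ib

mem = bytearray(256)
struct.pack_into("<ii", mem, 128 + 8 * 3, 64, -32)   # constant slot 3 of the island at 128
m = Machine(mem, pc=16, ib=128)
m.jalib(rd=1, disp=3)
```

After the call, `regs[1]` and `regs[2]` hold the linked pair, which is why a callee that makes further calls has to spill both registers together.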
Here is a first cut at the compressed opcode space:
https://gist.github.com/michaeljclark/8f9b81e5e40488035dc252c9da3ecc2e
- 16-bit compressed instruction packet can access 8 x 64-bit registers.
- 16-bit compressed instruction packet can access 64 x 64-bit constants.
- (pc,ib) is a special program counter and immediate base register pair.
- c.ibs adds 32-bit disp +/-1KiB (512*4) to switch immediate blocks.
- c.lib uses unsigned 6-bit disp to access 64 x 64-bit constants (64*8).
- c.jalib uses unsigned 6-bit disp to add 32-bit disp to (pc,ib) linking
program counter and immediate base into adjacent registers.
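The reach figures in this list can be checked with a few lines of arithmetic. This Python sketch assumes that c.lib scales its unsigned 6-bit displacement by 8 bytes and that c.ibs takes a signed 9-bit immediate scaled by 4 bytes; the second point is my reading of "+/-1KiB (512*4)" rather than something stated outright. The function names are made up for the example.

```python
def lib_address(ib, disp6):
    """c.lib: unsigned 6-bit index of a 64-bit constant relative to ib."""
    assert 0 <= disp6 < 64
    return ib + 8 * disp6

def ibs_target(ib, imm9):
    """c.ibs: signed 9-bit immediate scaled by 4 bytes (assumed reading)."""
    assert -256 <= imm9 < 256
    return ib + 4 * imm9

ib = 0x1000
# Size of the window of constants visible from one immediate base value.
window = lib_address(ib, 63) + 8 - lib_address(ib, 0)
lowest = ibs_target(ib, -256)    # furthest backward switch
highest = ibs_target(ib, 255)    # furthest forward switch
```

Under these assumptions a single immediate base exposes a 512-byte window of constants, and one c.ibs can move that window by up to 1 KiB backward or just under 1 KiB forward.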
From the perspective of a compiler and linker, this has advantages:
- large immediate values are all packed into constant islands.
- constant synthesis is replaced by constants from constant islands.
- 9/18-bit displacements fit inside 16/32-bit instruction packets.
- linking larger displacements uses aligned relative 32-bit constants.
From the perspective of a microarchitecture, this has several
implications for the instruction and constant data path and how a branch
predictor might work with a potential physical architecture.
- (pc,ib) are updated as a double-wide register pair to simplify the
branch predictor for instruction and constant fetch.
- there are three memory ports versus two: control, constant, data
- constants should most likely be fetched with X permissions.
- the branch predictor and instruction fetch treat the immediate base
register like the program counter so that the immediate base can be
predicted in the same way as the program counter. it is designed to be a
front-end register next to the program counter (pc,ib) and could be
copied with it for referencing constants in the current constant block
which will be fetched at the same time as the instruction block. the
branch unit is going to be switching and linking both at the same time.
- there will be some forwarding latency to populate constants but it
will be less than populating constants via the load-store unit.
- could work with two memory ports or a separate constant data path
bypassing registers/constants using a dedicated operand caching bus.
We might keep far indirect branches and base register updates 64-byte
aligned but those details must be figured out. There is no indirect
branch instruction in the 16-bit opcode space because the branch and
immediate base register displacements come from constant islands. This
is deliberate. While I guess it is an indirect branch, it is indirect
from the constant stream which would have RX permissions.
It also uses separate compare and branch instructions, putatively with
branch predicate registers where the compressed set implicitly uses
pred0. Only the 3-operand ALU instructions use two read ports and one
write port. This is because we want to optimize the virtual machine
target for translation to AArch64 and x86-64, which use separate compare
and branch instructions, but it is still in essence pretty close to RISC-V.
The 16-bit instructions are free-standing in that it would be possible
to make a soft-float target that uses only the 16-bit instructions.
It's not so dissimilar to RISC-V that an existing design couldn't be
modified, although the instruction coding is unique. A modification to a RISC-V
design could put constant fetches over the instruction fetch port or via
the load-store unit, however, it is designed so that a physical
instantiation would use a specialized branch predictor that predicts the
immediate base register for fast constant fetches. The immediate base
register should be treated more like the program counter in the front
end so that instruction fetch is beside constant fetch.
Having built decoders and translators, I find the current RISC-V
encoding is not ideal as a virtual target. This is where RISC-IB fits as
a virtual target with considerations one puts into a physical target.
I haven't made a compiler, linker, or interpreter yet but I have a
feeling it will compress well and decode quickly as it is simple. It has
been designed from scratch as a virtual target with what should be
decent compression, potentially with vectorized instruction decode as
well as being optimized for a potential physical instantiation.
# Highlighted differences
- instruction decode is a little different from RISC-V.
- has program counter and immediate base register pair (pc,ib).
- load-imm-ib loads a constant from an immediate block.
- jalib links (pc,ib) pair and adds two 32-bit constants to (pc,ib).
- doesn't yet have an indirect branch instruction (wink wink).
- uses separate compare and branch instructions like aarch64/x86.
- ABI should use input canonicalization as opposed to returned value.
I like the idea of 1-bit branch predicate registers for compare
instructions to allow renaming branches. You could unify them if, for
example, you made the renamer allocate 1-bit registers for labelled
branches. The 32-bit compare instruction will have a predicate register
but the 16-bit compare instruction implicitly uses pred0 with a 9-bit
displacement for an effective 10-bit reach for compressed near branches.
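Here is a minimal Python sketch of the compressed compare and branch pair. The mnemonics `c_cmp_lt` and `c_branch`, the number of predicate registers, and the choice to scale the 9-bit displacement by 2 bytes (one reading of how 9 bits give a 10-bit reach when the smallest packet is 16 bits) are all assumptions for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

class Core:
    def __init__(self):
        self.pc = 0
        self.regs = [0] * 8
        self.pred = [False] * 4      # hypothetical count of 1-bit predicate registers

    def c_cmp_lt(self, ra, rb):
        """16-bit compare: the result implicitly lands in pred0."""
        self.pred[0] = self.regs[ra] < self.regs[rb]
        self.pc += 2

    def c_branch(self, imm9):
        """16-bit branch on pred0: 9-bit field, assumed scaled by 2 bytes."""
        offset = 2 * sign_extend(imm9 & 0x1FF, 9)
        self.pc += offset if self.pred[0] else 2

core = Core()
core.regs[1], core.regs[2] = 3, 7
core.c_cmp_lt(1, 2)              # 3 < 7, so pred0 is set and pc advances to 2
core.c_branch(0x1FF)             # imm9 = -1, so the taken branch moves pc back by 2
```

With this scaling the branch covers byte offsets from -512 to +510, and a translator can map the pair directly onto a compare followed by a conditional jump on AArch64 or x86-64.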
Fetch from constant memory is now more than 33 years old, and it has
curious implications for microarchitecture research, specifically in
branch prediction and fetch for interleaved instruction and constant streams.
The more I think about it, the more I like it so I'm about to start
writing an interpreter. Unfortunately, it is a tremendous amount of work
to make a compiler, assembler, linker, and translator, thus it might
begin as a research interpreter. At this point, it is a sketch.
I'm using the RISC-IB moniker because this is presently very unofficial.
Regards,
Michael Clark