Misc: Tweak for usefulness of large instruction encodings?...

BGB

Dec 19, 2025, 7:28:24 PM

to isa...@groups.riscv.org

Thought:
Would keep the existing scheme for 16/32/48/64 bit instructions in
RISC-V as-is;
Maybe tweak the scheme for potential larger encodings.

As can be noted, the existing scheme is:
...-xx-xxx-00: 16-bit (C extension)
...-xx-xxx-01: 16-bit (C extension)
...-xx-xxx-10: 16-bit (C extension)
...-xx-xxx-11: 32+ bits
...-x0-111-11: 48 bits
...-01-111-11: 64 bits
...-11-111-11: 80+ bits (uses Func3 for 16-bit count)
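
As a rough sketch of the length rule in the table above (Python; the function name is made up for illustration, and treating Func3=111 as reserved is an assumption that goes beyond what the table says):

```python
def insn_length_bits(parcel: int) -> int:
    """Instruction length in bits, from the first 16-bit parcel.

    Hypothetical helper that mirrors the table above.
    """
    parcel &= 0xFFFF
    if (parcel & 0b11) != 0b11:
        return 16                      # ...-xx-xxx-00/01/10
    if (parcel & 0b11100) != 0b11100:
        return 32                      # ...-xx-xxx-11, xxx != 111
    if (parcel & 0b0100000) == 0:
        return 48                      # ...-x0-111-11
    if (parcel & 0b1000000) == 0:
        return 64                      # ...-01-111-11
    func3 = (parcel >> 12) & 0b111     # ...-11-111-11: 80+ bits
    if func3 == 0b111:
        # Assumption: this case is not one of the 16-bit-count sizes.
        raise ValueError("Func3=111 treated as reserved in this sketch")
    return 80 + 16 * func3
```

For example, insn_length_bits(0x0013) gives 32 (the low bits of an ADDI), and insn_length_bits(0x107F) gives 96.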

But, say:
80 bits doesn't seem like a terribly useful size as-is;
Not really enough bits to fit an Imm64 or similar;
There isn't likely a need for that many fine-grain size buckets;
80 bits has the further disadvantage that it messes with the alignment
of 32-bit ops (say on an implementation where keeping 32-bit alignment
for 32-bit instructions is relevant for performance).

So, it makes sense to partly skip over using 80 bits, and focus more on
96 bits. May make sense to keep 80 bits around in case it is still
needed (just probably for some other, non-Imm64 use case).

While I have a scheme for 64-bit encodings (which works by extending the
32-bit ops; Imm12/Disp12 to Imm33/Disp33), as-is it doesn't usefully
extend to larger immediate sizes (such as Imm64).

One possible thought here is:
Next size up used is 96 bits instead of 80;
The Func3 encoding scheme is then modified:
000: 80 bit (kept, partly skipped over)
001: 96 bit (A Block)
010: 128 bit
011: 192 bit (or 192+)
1xx: 96 bit (B Block)

In this case, there are 2 more bits for 96 bit ops (at the expense of
granularity for the larger sizes).

Within 192 bits, possibly there is an escape case for bigger-still ops,
or just leave 192 as the biggest size.

This scheme loses 112, 144, 160, and 176 bit instruction sizes though.
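
A quick way to check which sizes drop out, as a sketch in Python (the helper name is hypothetical; the "existing" side assumes the 80 + 16*Func3 rule for Func3 = 0..6 only):

```python
# Sizes (in bits) reachable in the 80+ block under the existing rule,
# 80 + 16 * func3 for func3 = 0..6 (func3 = 7 is left out here).
EXISTING = {f: 80 + 16 * f for f in range(7)}

def proposed_length_bits(func3: int) -> int:
    """Length under the modified Func3 table (hypothetical helper)."""
    if func3 & 0b100:
        return 96                      # 1xx: 96 bit (B Block)
    return {0b000: 80, 0b001: 96, 0b010: 128, 0b011: 192}[func3]

proposed = {proposed_length_bits(f) for f in range(8)}
lost = sorted(set(EXISTING.values()) - proposed)
print(lost)  # [112, 144, 160, 176]
```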

May make sense to define part of the encoding space relative to the
basic 32-bit encoding scheme just with some of the bits moved around.

wwww-zzz-yyxxx-sssss-001-nnnnn-11-111-11 =>
0ww0w0w-TTTTT-sssss-zzz-nnnnn-yy-xxx-11 (depends on block)

wwww-zzz-yyxxx-sssss-1sn-nnnnn-11-111-11 =>
1wwswnw-TTTTT-sssss-zzz-nnnnn-yy-xxx-11 (depends on block)

This being to make it easier to leverage an existing 32-bit decoder when
implementing support for larger encodings.
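
A minimal sketch of that bit shuffle in Python. Several things here are assumptions and not part of the proposal text: the function name, the order in which the four w bits are dropped into the funct7 positions (most significant first), and leaving the TTTTT field as zero since its contents depend on the block.

```python
def remap_to_32bit_layout(word: int) -> int:
    """Rearrange the first word of a 96-bit A/B Block op into 32-bit layout."""
    if (word & 0x7F) != 0x7F:
        raise ValueError("not in the 80+ bit block")
    w     = (word >> 28) & 0xF         # wwww
    zzz   = (word >> 25) & 0x7         # becomes funct3
    yyxxx = (word >> 20) & 0x1F        # becomes opcode bits 6..2
    rs    = (word >> 15) & 0x1F        # sssss
    f3    = (word >> 12) & 0x7         # 001 (A Block) or 1sn (B Block)
    rn    = (word >> 7) & 0x1F         # nnnnn
    if f3 == 0b001:
        blk, s_ext, n_ext = 0, 0, 0
    elif f3 & 0b100:
        blk, s_ext, n_ext = 1, (f3 >> 1) & 1, f3 & 1
    else:
        raise ValueError("not a 96-bit A/B Block encoding")
    # Assumed placement: 0ww0w0w / 1wwswnw, filled MSB-first from wwww.
    w3, w2, w1, w0 = (w >> 3) & 1, (w >> 2) & 1, (w >> 1) & 1, w & 1
    funct7 = ((blk << 6) | (w3 << 5) | (w2 << 4) | (s_ext << 3)
              | (w1 << 2) | (n_ext << 1) | w0)
    opcode = (yyxxx << 2) | 0b11       # yy-xxx-11
    # TTTTT (bits 24..20) is left as zero in this sketch.
    return (funct7 << 25) | (rs << 15) | (zzz << 12) | (rn << 7) | opcode
```

With yyxxx=00100 the rebuilt low 7 bits come out as 0010011, which is the normal OP-IMM opcode, so an existing 32-bit decoder would see something shaped like an ALU-immediate op.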

Possible example encoding (B Block):
wwww-zzz-00000-sssss-1sn-nnnnn-11-111-11 LOAD Rn, Disp64(Rs)
wwww-zzz-01000-sssss-1st-ttttt-11-111-11 STORE Rt, Disp64(Rs)
Rs==X0: Abs64

wwww-zzz-11000-sssss-1st-ttttt-11-111-11 Bcc Rs, Rt, Abs64

wwww-zzz-00001-sssss-1sn-nnnnn-11-111-11 FLD Rn, Disp64(Rs)
wwww-zzz-01001-sssss-1st-ttttt-11-111-11 FST Rt, Disp64(Rs)

0000-000-11011-00000-10n-nnnnn-11-111-11 JAL Rn, Abs64

wwww-zzz-00100-sssss-1sn-nnnnn-11-111-11 ALUI Rn, Rs, Imm64

Then:
0000-000-00100-sssss-1sn-nnnnn-11-111-11 ADDI Rn, Rs, Imm64
0000-010-00100-sssss-1sn-nnnnn-11-111-11 SLTI Rn, Rs, Imm64
0000-011-00100-sssss-1sn-nnnnn-11-111-11 SLTIU Rn, Rs, Imm64
0000-100-00100-sssss-1sn-nnnnn-11-111-11 XORI Rn, Rs, Imm64
0000-110-00100-sssss-1sn-nnnnn-11-111-11 ORI Rn, Rs, Imm64
0000-111-00100-sssss-1sn-nnnnn-11-111-11 ANDI Rn, Rs, Imm64

Non-standard, but maybe makes sense to include:
0100-010-00100-sssss-1sn-nnnnn-11-111-11 SGEI Rn, Rs, Imm64
1000-010-00100-sssss-1sn-nnnnn-11-111-11 SEQI Rn, Rs, Imm64
1100-010-00100-sssss-1sn-nnnnn-11-111-11 SNEI Rn, Rs, Imm64

...

Possible:
0m00-000-10100-sssss-1SN-nnnnn-11-111-11 FADD.D Fn, Fs, Binary64
0m00-001-10100-sssss-1SN-nnnnn-11-111-11 FSUB.D Fn, Fs, Binary64
0m00-010-10100-sssss-1SN-nnnnn-11-111-11 FMUL.D Fn, Fs, Binary64
0m00-011-10100-sssss-1SN-nnnnn-11-111-11 FDIV.D Fn, Fs, Binary64
...

Where, in this case the Imm64 is a raw 64-bit value following the
initial 32-bit instruction word.
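
As an illustration of that layout, a sketch in Python that packs the ADDI Imm64 form from the list above into 12 bytes. The little-endian byte order of the immediate and the helper names are assumptions; the proposal only says the raw 64-bit value follows the first word.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def encode_addi_imm64(rn: int, rs: int, imm: int) -> bytes:
    """Pack 'ADDI Rn, Rs, Imm64' (B Block) as word + raw Imm64."""
    s_ext = (rs >> 5) & 1              # 6th register bit, if used
    n_ext = (rn >> 5) & 1
    word = ((0b00100 << 20)            # yyxxx = 00100 (ALU-immediate)
            | ((rs & 0x1F) << 15)
            | (1 << 14) | (s_ext << 13) | (n_ext << 12)   # 1sn
            | ((rn & 0x1F) << 7)
            | 0x7F)                    # 11-111-11
    # Assumed: 32-bit word first, then the 64-bit value, little-endian.
    return struct.pack("<IQ", word, imm & MASK64)

def split_insn(raw: bytes):
    """Split 12 bytes back into (first word, raw Imm64)."""
    return struct.unpack("<IQ", raw)

raw = encode_addi_imm64(10, 11, 0x123456789ABCDEF0)
print(len(raw) * 8)  # 96
```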

The two bits here (the s and n bits in the 1sn field) could possibly be
used to extend the register fields to 6 bits, in this case (if
supported) serving to merge the X and F registers. Within the 001
encoding, both would be understood as 0 (X registers).

Though, in this case, it could make sense to decode the ALU-Imm64 block
partly by internally mapping it to the 3R ALU block, just with Rs2
replaced by the Imm64 value.

Likely, many of the other instruction blocks would be left reserved.

For the FPU block, these could encode Binary64 immediate values for some
instructions (some alternate 64-bit encodings could be used for Binary32
immediate values). The SN bits would invert the values (0=Fn, 1=Xn). The
rounding mode could be reduced to 1 bit (0=RNE, 1=DYN).

As noted, Bcc and JAL could become Abs64 rather than PC+Disp.
Some of my existing 64-bit Disp33 encodings would be used for Disp33
branches.

The exact meaning of full 64-bit addresses may depend on implementation
(for example, my CPU core uses a 48 bit address space with tag bits in
the high 16 bits, and the LSB for branches used to encode an implicit
mode change).

Such an encoding could allow for direct branches between ISA modes
(rather than using a multiple-instruction sequence to load the target
address from memory and then branching to it).

...

Any thoughts?...

BGB

Dec 21, 2025, 6:19:35 AM

to isa...@groups.riscv.org

On 12/19/2025 6:28 PM, BGB wrote:
> Thought:
> Would keep the existing scheme for 16/32/48/64 bit instructions in RISC-
> V as-is;
> Maybe tweak the scheme for potential larger encodings.
>
> As can be noted, the existing scheme is:
>    ...-xx-xxx-00: 16-bit (C extension)
>    ...-xx-xxx-01: 16-bit (C extension)
>    ...-xx-xxx-10: 16-bit (C extension)
>    ...-xx-xxx-11: 32+ bits
>    ...-x0-111-11: 48 bits
>    ...-01-111-11: 64 bits
>    ...-11-111-11: 80+ bits (uses Func3 for 16-bit count)
>
>

...

>
> One possible thought here is:
> Next size up used is 96 bits instead of 80;
> The Func3 encoding scheme is then modified:
>     000: 80 bit (kept, partly skipped over)
>     001: 96 bit (A Block)
>     010: 128 bit
>     011: 192 bit (or 192+)
>     1xx: 96 bit (B Block)
>
> In this case, there are 2 more bits for 96 bit ops (at the expense of
> granularity for the larger sizes).
>
> Within 192 bits, possibly there is an escape case for bigger-still ops,
> or just leave 192 as the biggest size.
>
> This scheme loses 112, 144, 160, and 176 bit instruction sizes though.
>

Will note that my current implementation just assumes that anything in
the 80+ block is 96 bits for now.

My current CPU core can only express a limited range of instruction sizes:
16, 32, 48, 64, and 96.

Where:
80: currently no way to express it easily;
128/192: both are larger than my current instruction fetch.

>
> May make sense to define part of the encoding space relative to the
> basic 32-bit encoding scheme just with some of the bits moved around.
>
>    wwww-zzz-yyxxx-sssss-001-nnnnn-11-111-11 =>
>    0ww0w0w-TTTTT-sssss-zzz-nnnnn-yy-xxx-11 (depends on block)
>
>    wwww-zzz-yyxxx-sssss-1sn-nnnnn-11-111-11 =>
>    1wwswnw-TTTTT-sssss-zzz-nnnnn-yy-xxx-11 (depends on block)
>
> This being to make it easier to leverage an existing 32-bit decoder when
> implementing support for larger encodings.
>
>
>
> Possible example encoding (B Block):
> wwww-zzz-00000-sssss-1sn-nnnnn-11-111-11 LOAD Rn, Disp64(Rs)
> wwww-zzz-01000-sssss-1st-ttttt-11-111-11 STORE Rt, Disp64(Rs)
>     Rs==X0: Abs64
>
> wwww-zzz-11000-sssss-1st-ttttt-11-111-11 Bcc   Rs, Rt, Abs64
>
> wwww-zzz-00001-sssss-1sn-nnnnn-11-111-11 FLD   Rn, Disp64(Rs)
> wwww-zzz-01001-sssss-1st-ttttt-11-111-11 FST   Rt, Disp64(Rs)
>
> 0000-000-11011-00000-10n-nnnnn-11-111-11 JAL   Rn, Abs64
>
>
> wwww-zzz-00100-sssss-1sn-nnnnn-11-111-11 ALUI Rn, Rs, Imm64
>

...

I decided to go a different route, more conservative with my existing
implementation (thus less decoder logic needed; but more dog-chewed).

Going the previously described route would have required both an
entirely different chunk of logic for the Immediate handling, and also
more logic for moving bits around in the decoder (and individual
handling for various instruction blocks, etc).

My existing implementation basically deals with this stuff by using 3
32-bit instruction decoders and some glue logic; and a design goal is to
try to minimize the need for new logic or special cases here (and
transparently deal with more of the ISA).

Ended up adding a 64-bit J52I prefix, currently:
spppppp-ppppp-qqqqq-110-rrrrr-01-11111 -
tiiiiii-iiiii-jjjjj-uuu-kkkkk-vv-vvv11 J52I

Which is decoded partly as two J21I prefixes:
0iiiiii-iiiii-jjjjj-100-kkkkk-01-11111 J21I

So, low 32-bits of immediate are decoded the same as the J21I+Imm12
scenario.

Where, for example:
0iiiiii-iiiii-jjjjj-100-kkkkk-01-11111 -
qoooooo-ooooo-mmmmm-000-nnnnn-00-10011 ADDI Rn, Rm, Imm33

Would decode the 33-bit immediate as:
q-kkkkk-jjjjj-iiiiii-iiiii-oooooo-ooooo
The immediate is then sign-extended to 64 bits.
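
The Imm33 assembly above can be sketched in Python roughly as follows. The function name is hypothetical, and it assumes each field is read most-significant-bit first from the positions shown in the two patterns.

```python
def decode_imm33(prefix: int, base: int) -> int:
    """Build q-kkkkk-jjjjj-iiiiiiiiiii-ooooooooooo and sign-extend it."""
    i = (prefix >> 20) & 0x7FF         # prefix bits 30..20
    j = (prefix >> 15) & 0x1F          # prefix bits 19..15
    k = (prefix >> 7) & 0x1F           # prefix bits 11..7
    q = (base >> 31) & 1               # sign bit of the base op's Imm12
    o = (base >> 20) & 0x7FF           # low 11 bits of the base op's Imm12
    imm = (q << 32) | (k << 27) | (j << 22) | (i << 11) | o
    # Sign-extend from bit 32 (Python ints are unbounded, so subtract).
    return imm - (1 << 33) if q else imm

J21I = (0b100 << 12) | 0x3F            # func3=100, opcode 01-11111
ADDI = (11 << 15) | (10 << 7) | 0x13   # ADDI X10, X11, ...
print(decode_imm33(J21I | (1 << 20), ADDI | (5 << 20)))  # 2053
```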

So, for the J52I prefix, the low 32 bits are the same, and the high
32 bits are decoded as if it were another J21I prefix, just filling in
the remaining bits by scavenging:
spppppp-ppppp-qqqqq-110-rrrrr-01-11111 -
tiiiiii-iiiii-jjjjj-uuu-kkkkk-vv-vvv11 -
qoooooo-ooooo-mmmmm-000-nnnnn-00-10011 ADDI Rn, Rm, Imm64

Would decode the 64-bit immediate as:
rrrrr-qqqqq-pppppp-ppppp-vvvvvuuustq -
kkkkk-jjjjj-iiiiii-iiiii-ooooooooooo
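
A sketch of that 64-bit assembly in Python, under stated assumptions: the function name is made up, fields are read most-significant-bit first, and the trailing q in the high half is taken to be bit 31 of the base op (as in the Imm33 case), not one of the qqqqq bits from the first prefix word.

```python
def decode_imm64(w0: int, w1: int, base: int) -> int:
    """Assemble the Imm64 from the two J52I prefix words and the base op."""
    s  = (w0 >> 31) & 1
    p  = (w0 >> 20) & 0x7FF            # pppppp-ppppp
    q5 = (w0 >> 15) & 0x1F             # qqqqq
    r  = (w0 >> 7) & 0x1F              # rrrrr
    t  = (w1 >> 31) & 1
    i  = (w1 >> 20) & 0x7FF            # iiiiii-iiiii
    j  = (w1 >> 15) & 0x1F             # jjjjj
    u  = (w1 >> 12) & 0x7              # uuu
    k  = (w1 >> 7) & 0x1F              # kkkkk
    v  = (w1 >> 2) & 0x1F              # vv-vvv
    qb = (base >> 31) & 1              # assumed: base op bit 31
    o  = (base >> 20) & 0x7FF          # low 11 bits of the base Imm12
    high = ((r << 27) | (q5 << 22) | (p << 11) | (v << 6)
            | (u << 3) | (s << 2) | (t << 1) | qb)
    low = (k << 27) | (j << 22) | (i << 11) | o
    return (high << 32) | low          # raw 64-bit pattern

W0 = (0b110 << 12) | 0x3F              # first prefix word, fields zero
W1 = 0b11                              # second prefix word, fields zero
ADDI = (11 << 15) | (10 << 7) | 0x13   # ADDI X10, X11, ...
print(hex(decode_imm64(W0 | (1 << 7), W1, ADDI | (1 << 20))))  # 0x800000000000001
```

The result is left as an unsigned bit pattern; no sign extension is needed since all 64 bits are supplied.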

I also decided to define the JAL Abs64 case as JALR with Disp64 and Rs1
as X0. I have run into a case where I feel a need for Abs64 branches:
For some scenarios, it would be preferable to be able to hot-patch code
so that "trap-and-emulate" scenarios can be replaced with branches into
handlers;
Normal JAL only has a range of around +/-1MB, and LUI+JALR needs to
stomp a register (but, one doesn't necessarily know which registers are
safe to use when hot patching).
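
For reference, the JAL range limit can be checked with something like the following Python sketch (hypothetical helper; it relies on JAL's offset being a 21-bit signed value in multiples of 2 bytes):

```python
def jal_reaches(pc: int, target: int) -> bool:
    """True if a plain JAL at 'pc' can branch directly to 'target'."""
    off = target - pc
    # 21-bit signed offset, always even: -2^20 .. 2^20 - 2.
    return off % 2 == 0 and -(1 << 20) <= off < (1 << 20)
```

A patch site where jal_reaches() is false for the handler address is where the chain of intermediate JALs (or a full-width branch) comes in.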

So, for example, in a normal hot-patch scenario, one might have to, say:
Replace target instruction with a JAL;
Potentially find somewhere else to put another JAL;
...
Until one can get to a place to put a blob, like, say:
SD X1, -8(SP)
SD X5, -16(SP)
AUIPC X5, BrAddrHi
LD X5, BrAddrLo(X5)
JALR X1, 0(X5) //to actual handler
LD X5, -16(SP)
LD X1, -8(SP)
JAL X0, RetAddr //we need to get back where we came from

Which kinda sucks...

So, for example, an Abs64 JAL helps here (though, a 96-bit encoding is
less ideal for hot-patching; one still needs either intermediate JALs
or to patch over multiple instructions).

Reason for this is mostly that hot patching can be better for
performance than emulating instructions in an exception handler every
time (the patched site instead branches to a blob of code that mimics
the behavior of the offending instruction and then branches back to the
location of the following instruction).

Well, in my implementation it is also possible to branch between ISA
modes, but doing this in a single instruction will require a full-width
address.

But, there are some use cases for also addressing the matter of 64-bit
immediate values, even if the need for them is statistically infrequent.

Annoyingly, this does not allow for a unified register space within the
RV encoding space.
But, it is a tradeoff.

Note that this is separate from my XG3 thing, I am more just trying to
figure a sensible way to deal with these cases in a way that is also
compatible with the existence of the 'C' extension.

>
> Any thoughts?...
>

Well, still open to any thoughts.

But, am operating within the limits of what seems viable to try to
implement (and the idea from the previous message was probably too
ambitious here; and I had thought about it and realized that it was in
fact possible to squeeze the needed bits into a bigger jumbo prefix...).
