On 12/19/2025 6:28 PM, BGB wrote:
> Thought:
> Would keep the existing scheme for 16/32/48/64 bit instructions in
> RISC-V as-is;
> Maybe tweak the scheme for potential larger encodings.
>
> As can be noted, the existing scheme is:
> ...-xx-xxx-00: 16-bit (C extension)
> ...-xx-xxx-01: 16-bit (C extension)
> ...-xx-xxx-10: 16-bit (C extension)
> ...-xx-xxx-11: 32+ bits
> ...-x0-111-11: 48 bits
> ...-01-111-11: 64 bits
> ...-11-111-11: 80+ bits (uses Func3 for 16-bit count)
>
>
...
>
> One possible thought here is:
> Next size up used is 96 bits instead of 80;
> The Func3 encoding scheme is then modified:
> 000: 80 bit (kept, partly skipped over)
> 001: 96 bit (A Block)
> 010: 128 bit
> 011: 192 bit (or 192+)
> 1xx: 96 bit (B Block)
>
> In this case, there are 2 more bits for 96 bit ops (at the expense of
> granularity for the larger sizes).
>
> Within 192 bits, possibly there is an escape case for bigger-still ops,
> or just leave 192 as the biggest size.
>
> This scheme loses 112, 144, 160, and 176 bit instruction sizes though.
>
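As a quick sanity check on the quoted size table, a minimal Python
sketch. It assumes the existing 80+ block encodes its length as
80 + 16*Func3 for Func3 = 0..6 (my reading of "uses Func3 for 16-bit
count"); the function names are made up for illustration.

```python
def standard_size_bits(func3):
    """Existing 80+ block, ASSUMING length = 80 + 16*Func3 (Func3 = 0..6)."""
    if not 0 <= func3 <= 6:
        raise ValueError("Func3=7 is treated as reserved in this sketch")
    return 80 + 16 * func3


def proposed_size_bits(func3):
    """Proposed remapping from the quoted table; 1xx is the 96-bit B Block."""
    if func3 & 0b100:
        return 96
    return {0b000: 80, 0b001: 96, 0b010: 128, 0b011: 192}[func3]


# Sizes reachable in each scheme, and what the remapping gives up.
standard = {standard_size_bits(f) for f in range(7)}
proposed = {proposed_size_bits(f) for f in range(8)}
lost_sizes = sorted(standard - proposed)
print(lost_sizes)
```

Under that assumption, the set difference comes out as the 112, 144,
160, and 176 bit sizes mentioned above.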
Will note that my current implementation just assumes, for now, that
anything in the 80+ block is 96 bits.
My current CPU core can only express a limited range of instruction sizes:
16, 32, 48, 64, and 96 bits.
Where:
80: currently no easy way to express it;
128/192: both are larger than my current instruction fetch.
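The length rules above (the existing low-bits table plus the
"assume 96" simplification) can be sketched roughly as follows. This is
an illustrative Python model, not my actual decoder logic; the function
name is hypothetical, and it only looks at the low bits of the first
16-bit parcel, so any prefix-based handling is out of scope.

```python
def insn_length_bits(low16):
    """Instruction length in bits, from the low 16 bits of the first parcel.

    Follows the quoted table; the 80+ block is ASSUMED to always be 96 bits.
    """
    if low16 & 0b11 != 0b11:
        return 16  # ...-xx-xxx-00/01/10: C extension
    if (low16 >> 2) & 0b111 != 0b111:
        return 32  # ...-xx-xxx-11 with xxx != 111
    if (low16 >> 5) & 1 == 0:
        return 48  # ...-x0-111-11
    if (low16 >> 6) & 1 == 0:
        return 64  # ...-01-111-11
    return 96      # ...-11-111-11: 80+ block, treated as 96 here


for parcel in (0x0001, 0x0013, 0x001F, 0x003F, 0x007F):
    print(hex(parcel), insn_length_bits(parcel))
```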
>
> May make sense to define part of the encoding space relative to the
> basic 32-bit encoding scheme just with some of the bits moved around.
>
> wwww-zzz-yyxxx-sssss-001-nnnnn-11-111-11 =>
> 0ww0w0w-TTTTT-sssss-zzz-nnnnn-yy-xxx-11 (depends on block)
>
> wwww-zzz-yyxxx-sssss-1sn-nnnnn-11-111-11 =>
> 1wwswnw-TTTTT-sssss-zzz-nnnnn-yy-xxx-11 (depends on block)
>
> This being to make it easier to leverage an existing 32-bit decoder when
> implementing support for larger encodings.
>
>
>
> Possible example encoding (B Block):
> wwww-zzz-00000-sssss-1sn-nnnnn-11-111-11 LOAD Rn, Disp64(Rs)
> wwww-zzz-01000-sssss-1st-ttttt-11-111-11 STORE Rt, Disp64(Rs)
> Rs==X0: Abs64
>
> wwww-zzz-11000-sssss-1st-ttttt-11-111-11 Bcc Rs, Rt, Abs64
>
> wwww-zzz-00001-sssss-1sn-nnnnn-11-111-11 FLD Rn, Disp64(Rs)
> wwww-zzz-01001-sssss-1st-ttttt-11-111-11 FST Rt, Disp64(Rs)
>
> 0000-000-11011-00000-10n-nnnnn-11-111-11 JAL Rn, Abs64
>
>
> wwww-zzz-00100-sssss-1sn-nnnnn-11-111-11 ALUI Rn, Rs, Imm64
>
...
I decided to go a different route, more conservative with respect to my
existing implementation (thus less decoder logic needed; but more
dog-chewed).
Going the previously described route would have required both an
entirely different chunk of logic for the immediate handling and more
logic for moving bits around in the decoder (plus individual handling
for various instruction blocks, etc).
My existing implementation basically deals with this stuff by using
three 32-bit instruction decoders and some glue logic; a design goal is
to minimize the need for new logic or special cases here (and to
transparently deal with more of the ISA).
Ended up adding a 64-bit J52I prefix, currently:
spppppp-ppppp-qqqqq-110-rrrrr-01-11111 -
tiiiiii-iiiii-jjjjj-uuu-kkkkk-vv-vvv11 J52I
Which is decoded partly as two J21I prefixes:
0iiiiii-iiiii-jjjjj-100-kkkkk-01-11111 J21I
So, the low 32 bits of the immediate are decoded the same as in the
J21I+Imm12 scenario.
Where, for example:
0iiiiii-iiiii-jjjjj-100-kkkkk-01-11111 -
qoooooo-ooooo-mmmmm-000-nnnnn-00-10011 ADDI Rn, Rm, Imm33
Would decode the 33-bit immediate as:
q-kkkkk-jjjjj-iiiiii-iiiii-oooooo-ooooo
The immediate is then sign-extended to 64 bits.
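A rough Python model of the Imm33 assembly, for illustration. The bit
positions are read off the field diagrams above, assuming each field is
written MSB-first (i = prefix bits 30:20, j = 19:15, k = 11:7; q = op
bit 31, o = op bits 30:20). Helper names are made up.

```python
def bits(word, hi, lo):
    """Extract word[hi:lo] (inclusive) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret the low `width` bits of value as two's complement."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_imm33(prefix, op):
    """Imm33 = q : kkkkk : jjjjj : i(11 bits) : o(11 bits), sign-extended."""
    i = bits(prefix, 30, 20)
    j = bits(prefix, 19, 15)
    k = bits(prefix, 11, 7)
    q = bits(op, 31, 31)   # sign bit comes from the base instruction
    o = bits(op, 30, 20)   # remaining 11 bits of the Imm12 field
    raw = (q << 32) | (k << 27) | (j << 22) | (i << 11) | o
    return sign_extend(raw, 33)


# J21I prefix with all payload bits set, plus ADDI with Imm12 all ones.
prefix = (0x7FF << 20) | (31 << 15) | (0b100 << 12) | (31 << 7) | 0x3F
op = (0xFFF << 20) | 0x13
print(decode_imm33(prefix, op))
```

The result is a plain Python int; masking it with `(1 << 64) - 1`
would give the 64-bit register pattern.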
So, for the J52I prefix, the low 32 bits are the same, and the high
32 bits are decoded as if there were another J21I prefix, with the
remaining bits filled in by scavenging:
spppppp-ppppp-qqqqq-110-rrrrr-01-11111 -
tiiiiii-iiiii-jjjjj-uuu-kkkkk-vv-vvv11 -
qoooooo-ooooo-mmmmm-000-nnnnn-00-10011 ADDI Rn, Rm, Imm64
Would decode the 64-bit immediate as:
rrrrr-qqqqq-pppppp-ppppp-vvvvvuuustq -
kkkkk-jjjjj-iiiiii-iiiii-ooooooooooo
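The same thing for the Imm64 case, again as an illustrative Python
sketch rather than the real decoder. Assumptions: fields are MSB-first
as drawn, the trailing "q" in the high half is the base instruction's
bit 31 (not the qqqqq field of the first prefix word), and the helper
and argument names are invented here.

```python
def bits(word, hi, lo):
    """Extract word[hi:lo] (inclusive) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_imm64(w0, w1, op):
    """Assemble the 64-bit immediate from J52I (w0, w1) and the base op."""
    # First prefix word: s pppppp-ppppp qqqqq 110 rrrrr 01-11111
    s = bits(w0, 31, 31)
    p = bits(w0, 30, 20)
    q5 = bits(w0, 19, 15)
    r = bits(w0, 11, 7)
    # Second prefix word: t iiiiii-iiiii jjjjj uuu kkkkk vv-vvv 11
    t = bits(w1, 31, 31)
    i = bits(w1, 30, 20)
    j = bits(w1, 19, 15)
    u = bits(w1, 14, 12)
    k = bits(w1, 11, 7)
    v = bits(w1, 6, 2)
    # Base instruction: q ooooooooooo ...
    q1 = bits(op, 31, 31)
    o = bits(op, 30, 20)
    # High half mirrors the J21I layout, low 11 bits are scavenged.
    high = ((r << 27) | (q5 << 22) | (p << 11) |
            (v << 6) | (u << 3) | (s << 2) | (t << 1) | q1)
    # Low half is the same as the J21I+Imm12 case (minus the sign bit).
    low = (k << 27) | (j << 22) | (i << 11) | o
    return (high << 32) | low


base_w0 = (0b110 << 12) | 0x3F   # J52I first word, payload bits clear
base_w1 = 0b11                   # second word, payload bits clear
print(hex(decode_imm64(base_w0 | (0b10000 << 7), base_w1, 0x13)))
```

All 64 bits come directly from the encoding here, so no sign extension
step is needed; the function returns the unsigned 64-bit pattern.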
I also decided to define the JAL Abs64 case as JALR with Disp64 and Rs1
as X0. I have run into a case where I feel a need for Abs64 branches:
For some scenarios, it would be preferable to be able to hot-patch code
so that "trap-and-emulate" scenarios can be replaced with branches into
handlers;
Normal JAL only has a range of about +/-1MB, and LUI+JALR needs to stomp
a register (but one doesn't necessarily know which registers are safe to
use when hot patching).
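To make the JAL range limit concrete, a small Python sketch of what a
hot-patcher would need to check before dropping in a plain JAL. It uses
the standard RV JAL layout (21-bit signed offset, multiple of 2, opcode
1101111); the function names are hypothetical.

```python
JAL_MIN = -(1 << 20)       # -1 MiB
JAL_MAX = (1 << 20) - 2    # +1 MiB - 2 (offsets are multiples of 2)


def jal_reachable(pc, target):
    """True if `target` can be reached by a single JAL placed at `pc`."""
    off = target - pc
    return off % 2 == 0 and JAL_MIN <= off <= JAL_MAX


def encode_jal(rd, pc, target):
    """Encode JAL rd, target for an instruction located at pc."""
    if not jal_reachable(pc, target):
        raise ValueError("target out of JAL range; needs another route")
    off = (target - pc) & 0x1FFFFF
    imm20 = (off >> 20) & 0x1
    imm10_1 = (off >> 1) & 0x3FF
    imm11 = (off >> 11) & 0x1
    imm19_12 = (off >> 12) & 0xFF
    return ((imm20 << 31) | (imm10_1 << 21) | (imm11 << 20) |
            (imm19_12 << 12) | (rd << 7) | 0x6F)


print(hex(encode_jal(1, 0, 8)))            # JAL X1, +8
print(jal_reachable(0x1000, 0x80000000))   # far handler: not reachable
```

When `jal_reachable` fails, one is stuck with the chained-JAL or
register-stomping options described here.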
So, for example, in a normal hot-patch scenario, one might have to, say:
Replace target instruction with a JAL;
Potentially find somewhere else to put another JAL;
...
Until one can get to a place to put a blob, like, say:
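SD X1, -8(SP) //save link register
SD X5, -16(SP) //save scratch register
AUIPC X5, BrAddrHi //PC-relative high part
LD X5, BrAddrLo(X5) //load handler address
JALR X1, 0(X5) //to actual handler
LD X5, -16(SP) //restore scratch register
LD X1, -8(SP) //restore link register
JAL X0, RetAddr //we need to get back where we came from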
Which kinda sucks...
So, for example, an Abs64 JAL helps here (though a 96-bit encoding is
less ideal for hot-patching; one still needs either intermediate JALs
or to patch over multiple instructions).
The reason for this is mostly that hot patching can perform better than
emulating instructions in an exception handler every time (the patched
code needs to branch to a blob of code that mimics the behavior of the
offending instruction and then branches back to the location of the
following instruction).
Well, in my implementation it is also possible to branch between ISA
modes, but doing this in a single instruction will require a full-width
address.
But, there are some use cases for also addressing the matter of 64-bit
immediate values, even if the need for them is statistically infrequent.
Annoyingly, this does not allow for a unified register space within the
RV encoding space.
But, it is a tradeoff.
Note that this is separate from my XG3 thing; I am more just trying to
figure out a sensible way to deal with these cases in a way that is also
compatible with the existence of the 'C' extension.
>
> Any thoughts?...
>
Well, still open to any thoughts.
But, am operating within the limits of what seems viable to try to
implement (and the idea from the previous message was probably too
ambitious here; and I had thought about it and realized that it was in
fact possible to squeeze the needed bits into a bigger jumbo prefix...).