On 11/7/2024 9:43 AM, Guy Lemieux wrote:
> it's complicated because it has to be. what we're doing is actually
> minimal and practical.
>
> what you're trying to do is virtually impossible without setting some
> ground rules, like what happens to the register contents of the old
> ISA, what happens to anything running in the "background", and any
> interrupt responses etc. your ground rules might be pretty harsh which
> is more like "shutdown this processor and its ISA completely, and boot
> this other processor ISA". that's pretty easy to do on an FPGA --
> probably easier by having two actual processors. good luck :-)
>
In my case, I have a mechanism to call between different ISA modes in my
CPU core, but granted:
  Both ISAs share the same register space, with defined mapping rules;
  There is enough overlap between them that things aren't too weird;
  All other significant parts of the architecture are shared;
  It is essentially a single CPU core with multiple sets of decoders.
In my case, the inter-ISA call/return was implemented by using tag bits
in the function-pointer register and link-register (conceptually also
tied to the PC register, sorta).
Can't claim to be strictly original in this approach.
Granted, this would be a much uglier issue if the ISAs were entirely
dissimilar or ran on different processors. I am not really sure how that
would be approached.
The cost of an inter-ISA call is essentially that it requires a pipeline
flush if the mode changes (it may be branch-predicted if no mode change
will happen). There is some tagging on cache lines in the L1 instruction
cache, but this should not matter unless the same line were being run
as two different ISAs.
Setting up a function pointer for the target mode often has to be done
manually. In my ISA, it is possible to use "LEAT.B (PC, Disp), Rn" to
capture a tagged function pointer, but currently no direct equivalent
exists in RV land (AUIPC+ADD will give an untagged pointer).
However, for the link register it is implicit (code in RV64 mode doesn't
poke at the link-register contents, so it doesn't notice that mode
tagging is being used). Jumping to the link register implicitly
restores the mode captured in the link register.
So:
  ( 0   ): 0=Same ISA, 1=Inter-ISA (always 1 for Link-Register values)
  (47: 1): PC Address
  (63:48): Mode (6-bit) and Status (T, S, and U bits).
High bits are ignored if LSB is clear.

With Modes:
  000000: Baseline (No WEX)
  000001: Baseline (WEX)
  000010: RV64GC
  000011: RV64G + XG3RV (new)
  000100: XG2 (No WEX)
  000101: XG2 (WEX)
  000110: XG2RV (No WEX)
  000111: XG2RV (WEX)
  rest:   Unused/Reserved for now.
Where:
  Baseline:
    Original ISA mode;
    16/32/64/96 bit instruction encodings;
    Nominally 32 GPRs.
    Has 16-bit instructions:
      For the most part, can only access R0..R15;
      Encoding and layout kinda resembles the SuperH ISA.
  RV64GC:
    Should be obvious enough
  RV64G + XG3RV:
    Newer mode (still very experimental).
    Has the RV64G encoding space.
    XG3 is a bit-repacked version of my ISA.
      Mostly to make the encoding less ugly;
      And to be able to exist in the same encoding space.
    Like with RV, XG3 relies on hardware superscalar.
    Internally, is repacked into a modified XG2RV during Fetch.
      Allowed reusing the existing instruction decoders.
  XG2:
    A modified form of Baseline
    Expanded register fields to 6 bits via XOR trickery;
    Expanded immediate fields in similar ways;
    The 16-bit instructions are N/E in this mode.
  XG2RV:
    XG2 encoding, but using the RISC-V register space.
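
As a rough illustration of the tagged-pointer layout and mode table above, here is a minimal Python sketch of how a jump through a register value might be interpreted. The helper names (make_tagged, jump_target) are made up for illustration, and the exact position of the 6-bit mode inside the (63:48) field is an assumption (low 6 bits of that field); the post does not say where the T/S/U status bits sit, so they are treated as an opaque remainder.

```python
# Sketch of the tagged-pointer layout described above.
# ASSUMPTION (not stated in the post): the 6-bit mode occupies bits 53:48;
# the T/S/U status bits are an opaque value in the rest of bits 63:54.

MODES = {
    0b000000: "Baseline (No WEX)",
    0b000001: "Baseline (WEX)",
    0b000010: "RV64GC",
    0b000011: "RV64G + XG3RV",
    0b000100: "XG2 (No WEX)",
    0b000101: "XG2 (WEX)",
    0b000110: "XG2RV (No WEX)",
    0b000111: "XG2RV (WEX)",
}

ADDR_MASK = 0x0000_FFFF_FFFF_FFFE  # bits 47:1 hold the PC address


def make_tagged(addr, mode, status=0):
    """Build an inter-ISA pointer: LSB set, mode/status in bits 63:48."""
    hi16 = ((status << 6) | (mode & 0x3F)) & 0xFFFF
    return (hi16 << 48) | (addr & ADDR_MASK) | 1


def jump_target(value, cur_mode):
    """Return (pc, mode, needs_flush) for a jump through a register value."""
    pc = value & ADDR_MASK
    if value & 1 == 0:
        # LSB clear: same ISA, high bits are ignored.
        return pc, cur_mode, False
    new_mode = (value >> 48) & 0x3F
    # A pipeline flush is only needed when the mode actually changes.
    return pc, new_mode, new_mode != cur_mode
```

For example, jump_target(make_tagged(0x1000, 0b000010), 0b000100) models a call from XG2 into RV64GC code at 0x1000, and reports that a flush is needed.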
As is, register mappings:
  RV:
    X0: ZZR (Zero)
    X1: LR (RA)
    X2: SP
    X3: GP (GBR)
    X4: TP (R4)
    X5..X13: R5..R13
    X14/X15: R2/R3
    X16..X31: R16..R31
    F0..F31: R32..R63
  Baseline / XG2:
    R0: DLR (N/E in RV)
    R1: DHR (N/E in RV)
    R2/R3 (X14/X15)
    R4..R13 (X4..X13)
    R14 (N/E in RV)
    R15 (SP)
    ...
  XG2RV/XG3RV:
    X0..X31: Same as RV
    X32..X63: R32..R63 / F0..F31
Both of these modes use the RV64 LP64 ABI.
Earlier on, X4 was mapped to TBR, but I ended up changing this when
trying to get RV64 Linux ELF binaries to work, realizing in this case
that the libc implementations try to set up and manage TP themselves.
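
The RV side of the register mapping above can be written out as a small lookup. This is only a sketch of the table as listed; the function name rv_to_native is hypothetical, and registers are represented by their names as plain strings.

```python
def rv_to_native(reg):
    """Map an RV register name ('X0'..'X31', 'F0'..'F31') to the
    native register name, following the mapping table in the post."""
    kind, num = reg[0].upper(), int(reg[1:])
    if not 0 <= num <= 31:
        raise ValueError(f"bad register: {reg}")
    if kind == "F":
        return f"R{32 + num}"      # F0..F31 -> R32..R63
    if kind != "X":
        raise ValueError(f"bad register: {reg}")
    # Registers that do not map 1:1 by number.
    special = {0: "ZZR", 1: "LR", 2: "SP", 3: "GP", 14: "R2", 15: "R3"}
    if num in special:
        return special[num]
    return f"R{num}"               # X4..X13 and X16..X31 map 1:1
```

Note that X4 (TP) comes out as plain R4 here, per the current mapping rather than the earlier TBR mapping.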
XG2RV has not seen much use, as by itself it lacks a compelling use-case:
  There has been little real reason to use it by itself over XG2;
  The need for mode changing to interoperate with RV64 code was still a
    hassle;
  ...

XG3RV seems a bit more promising:
  Direct inter-operation with RV64G is possible without the use of
    function-pointer tagging;
  Both RV64G and XG3RV can coexist in the same encoding space;
  The compiler can freely mix/match instructions from both ISAs (at
    present, the output from my compiler is a confetti mix of both ISAs).

Also, XG3RV seems to have succeeded in giving "performance that doesn't
suck", while also giving interop with RV64G that doesn't suck.
Can note, the general instruction encoding scheme I ended up going with:
  XXXX-oooooo-mmmmmm-ZZZZ-nnnnnn-QY-YYPw  (3R)
  iiii-iiiiii-mmmmmm-ZZZZ-nnnnnn-QY-YYPw  (3RI)
  iiii-iiiiii-iiiiii-aZZZ-nnnnnn-bY-YYPw  (2RI Imm16)
  iiii-iiiiii-iiiiii-aZZZ-jjjjjj-bY-YYPw  (~ JAL, +/- 16MB)
Where:
  X/Y/Z: Opcode
  n=Rd, m=Rs1, o=Rs2
  i=Immed
  Pw: 00/01: Predicated (currently unused), 10=XG3 Op, 11=RV Op
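
To make the 3R layout concrete, here is a minimal Python sketch of pulling the fields out of a 32-bit instruction word. It assumes the pattern strings above are written most-significant bit first (so the leftmost X is bit 31 and Pw is bits 1:0), which is consistent with RV ops having 11 in the low two bits. The function names are hypothetical, and no meaning is assigned to the Q bit since the post does not define it.

```python
def decode_3r(word):
    """Split a 32-bit word per XXXX-oooooo-mmmmmm-ZZZZ-nnnnnn-QY-YYPw.
    ASSUMPTION: the pattern is MSB-first (X = bits 31:28, Pw = bits 1:0)."""
    word &= 0xFFFFFFFF
    return {
        "X":   (word >> 28) & 0xF,   # opcode bits (X)
        "Rs2": (word >> 22) & 0x3F,  # o field
        "Rs1": (word >> 16) & 0x3F,  # m field
        "Z":   (word >> 12) & 0xF,   # opcode bits (Z)
        "Rd":  (word >> 6) & 0x3F,   # n field
        "Q":   (word >> 5) & 0x1,
        "Y":   (word >> 2) & 0x7,    # opcode bits (Y)
        "Pw":  word & 0x3,
    }


def op_class(word):
    """Classify a word by its low two bits (the Pw field)."""
    pw = word & 0x3
    if pw == 0b11:
        return "RV"
    if pw == 0b10:
        return "XG3"
    return "Predicated"  # 00/01, currently unused per the post
```

With this, a fetch stage could check op_class() first and route the word to either the RV decoder or the XG3 repacking path.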
This ended up replacing the idea I posted about a little over a month
ago; I ended up redesigning things in a way that allowed me to leverage
my existing decoders.
It lacks the concept of WEX (explicit bundle encoding), as for XG3 I
went over to in-order superscalar. It does still have jumbo prefixes,
but uses a different set of prefixes than those used for RV64+Jx
encodings (and thus far, the two types of jumbo prefixes are not
interchangeable).
XG3RV would still have interop hassles with RV64GC though.
This hassle can be avoided via the RV64+Jumbo scheme, which has full
compatibility with RV64GC, but isn't quite as good in terms of
performance. Trying to use it as a generic 64-register ISA is worse on
average than using it as a 32-register ISA (one generally needs to keep
integer values in X registers and FPU values in F registers, otherwise
code density is negatively affected).
In contrast, XG3RV, which has native 6-bit register fields, gets
better code density and performance when used as a flat 64-register
space (relative gains over the prior Jumbo-Prefix extension are much
smaller if used as a 32-register ISA). I suspect this is the major
"practical" difference here (otherwise, the functionality from my own
ISA that is being used in the case of Doom is largely already present
in the 'B' extension).
Can note that XG3RV is within an 8% performance delta relative to XG2,
so there may not be much more to gain here. Thus far, my compiler isn't
using the full feature set of this mode; and some features (such as
predication) I am considering leaving as optional.
Where, I can note for ".text" size in Doom (along with fps at start of
E1M1, at 50 MHz):
  XG2      (BGBCC): 289K, 25 fps
  XG3RV    (BGBCC): 320K, 23 fps
  RV64G+Jx (BGBCC): 360K, 20 fps (*1)
  RV64GC   (GCC  ): 393K, --     (*2)
  RV64G    (BGBCC): 438K, 12 fps (*3)
  RV64G    (GCC  ): 445K, 17 fps
*1: Jumbo prefixes (expanded immediate and displacement fields),
register-indexed load/store, Zba instructions, and load/store pair.
*2: Builds but Doom crashes on start-up, may need more debugging.
*3: My compiler isn't doing so hot here...
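
For comparing these rows, a small Python sketch that normalizes ".text" size and fps against the XG2 row. The numbers are copied from the table above; the dictionary layout and the relative_to helper are just for illustration.

```python
# (.text size in K, fps at start of E1M1 at 50 MHz); None = no fps figure.
results = {
    "XG2 (BGBCC)":      (289, 25),
    "XG3RV (BGBCC)":    (320, 23),
    "RV64G+Jx (BGBCC)": (360, 20),
    "RV64GC (GCC)":     (393, None),
    "RV64G (BGBCC)":    (438, 12),
    "RV64G (GCC)":      (445, 17),
}


def relative_to(base, table):
    """Return {name: (size_ratio, fps_ratio)} relative to the base row."""
    base_kb, base_fps = table[base]
    out = {}
    for name, (kb, fps) in table.items():
        fps_ratio = None if fps is None else fps / base_fps
        out[name] = (kb / base_kb, fps_ratio)
    return out


rel = relative_to("XG2 (BGBCC)", results)
for name, (size_ratio, fps_ratio) in rel.items():
    fps_txt = "--" if fps_ratio is None else f"{fps_ratio:.2f}x"
    print(f"{name:18s} size {size_ratio:.2f}x  fps {fps_txt}")
```

The XG3RV row works out to 23/25 = 0.92 of the XG2 frame rate, which is where the 8% performance delta mentioned earlier comes from.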
I suspect I may be near the limits of how much speed I can get out
of this (at least, for programs like Doom and similar).
Might also make sense to do similar comparisons for Quake.
Side note: in the past I had also experimented with faster-clocked scalar
cores, but have generally had better results at slightly lower
clock speeds (which allowed for things like bigger and better-performing
L1 caches). Say: a 32K L1 cache with consistent access latency beats a
4K L1 cache that needs to stall whenever there is a memory RAW hazard or
similar. What MHz gives, L1 misses and RAW hazards take away.
Most attempts to move to more MHz have almost invariably hurt
performance more than was gained.
But, that said, going much bigger than 32K of L1 cache also doesn't gain
much.
> guy
>
>
> On Thu, Nov 7, 2024 at 1:39 AM Robert Finch <robf...@gmail.com> wrote:
>
> That is in the direction I was thinking. Thanks for the references.
> However, it appears to be somewhat RISCV centric. It also seems to
> be pretty complex, I have not read the whole thing yet though. I was
> thinking that for instance, that multiple CPU types could be stored
> on a flash memory device and selected in an FPGA via a call
> instruction. One of my current projects has an i386 similar core
> along with a native core. The i386 does not know anything about
> CSRs, neither does the native core. Because the cores do not know
> anything about the structure of another core, the interface has to
> be somewhat abstract / bland.
>
>
> On Thursday, November 7, 2024 at 3:59:19 AM UTC-5 Guy Lemieux wrote:
>
> A TG has been formed to do this for the custom opcodes only:
>
> https://github.com/riscv-admin/composable-custom-extensions
>
> which is being built out of prior work done by the SoftCPU SIG,
> written up here:
>
> https://github.com/grayresearch/CX
>
> Guy
>
>
> On Thu, Nov 7, 2024 at 12:56 AM Robert Finch <robf...@gmail.com> wrote:
>
> Does RISCV have anything resembling an architecture call /
> return instruction? It would allow using the instruction set
> of a different architecture, or allow RISCV to be used from
> another architecture. I think this is just two instructions
> ( call and return) that do not use a lot of opcode space.
> The exact mechanics of switching architectures would not
> need to be fully defined. One approach might be to use a
> buffer for storing register contents.
>
>
> --
> You received this message because you are subscribed to the
> Google Groups "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails
> from it, send an email to isa-dev+u...@groups.riscv.org.
> To view this discussion visit
> https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/6064aa44-db22-4e12-9eaa-a75eaaddaf52n%40groups.riscv.org