Compiler impact of RVA20/22/23U64 adoption

L Peter Deutsch

Dec 3, 2024, 1:28:06 PM

to isa...@groups.riscv.org

I'm writing a compiler one of whose targets is RV64IMDC, and I'm wondering
what code generation approach is recommended for extensions that are
mandatory in newer RVA profiles but optional or not included in older ones.
My main concerns are Zbb (mandatory in RVA22, not included in RVA20) and Zcb
(mandatory in RVA23, not included in RVA22), but others are being considered
at lower priority (e.g., Zbs). What is current thinking on the alternatives
of:

1) Providing for command-line (or equivalent) selection of individual
extensions, which I believe is what gcc does?

2) Providing only for selection of target profile (RVA20/22/23U64)?

3) Settling on a single RVA target with no selection?

This project has very limited development resources, so #1 would have to be
limited to a small number of extensions.

Is there a graph or table posted somewhere that shows the adoption curve (in
design, production, and deployment) of parts conforming to each profile?
That would be helpful in planning for the future.

Thanks -

L Peter Deutsch :: Aladdin Enterprises :: Healdsburg, CA & Burnaby, BC

BGB

Dec 3, 2024, 6:42:04 PM

to isa...@groups.riscv.org

On 12/3/2024 12:27 PM, L Peter Deutsch wrote:
> I'm writing a compiler one of whose targets is RV64IMDC, and I'm wondering
> what code generation approach is recommended for extensions that are
> mandatory in newer RVA profiles but optional or not included in older ones.
> My main concerns are Zbb (mandatory in RVA22, not included in RVA20) and Zcb
> (mandatory in RVA23, not included in RVA22), but others are being considered
> at lower priority (e.g., Zbs). What is current thinking on the alternatives
> of:
>
> 1) Providing for command-line (or equivalent) selection of individual
> extensions, which I believe is what gcc does?
>
> 2) Providing only for selection of target profile (RVA20/22/23U64)?
>
> 3) Settling on a single RVA target with no selection?
>
> This project has very limited development resources, so #1 would have to be
> limited to a small number of extensions.
>

For my compiler, I took an approach along the lines of:
-m<ARCH> specifies the major target as a string.
-f<flag> may enable a feature flag.
-fno<flag> may disable a feature flag.

So, say, if one wanted RV64G with Zba, theoretically, one could do, say:
-mRV64G -frvzba

Though, just had to go and add RV64G as previously my RV64G target was
being identified as XRVA. Using the more typical names makes more sense
though.

Here, I had ended up using 'rv' as an option name prefix for RV
extensions, as this lessens the chance of conflict.

So, for example, if 'C' were supported:
-frvc //enable 'C' extension.
-fnorvc //disable C extension
(My compiler does not yet support compressed instructions).

In these cases, the compiler's context has a bunch of integer flag
fields, with bits in the various flag fields assigned to each relevant
optional feature.
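
As a rough sketch of the option scheme and flag fields described above
(a minimal illustration in C, not the actual source of the compiler;
the names CompilerCtx, FEAT_*, feat_lookup and ctx_apply_option are all
hypothetical, and only a handful of flags are shown):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits: one bit per optional extension. */
enum {
    FEAT_RVC     = 1 << 0,   /* 'C' compressed instructions */
    FEAT_RVZBA   = 1 << 1,   /* Zba address generation */
    FEAT_RVZBB   = 1 << 2,   /* Zbb basic bit manipulation */
    FEAT_RVJUMBO = 1 << 3    /* non-standard jumbo prefix */
};

typedef struct {
    char     arch[16];   /* major target from -m<ARCH>, e.g. "RV64G" */
    uint32_t feat;       /* currently enabled feature bits */
} CompilerCtx;

static const struct { const char *name; uint32_t bit; } feat_names[] = {
    { "rvc",     FEAT_RVC     },
    { "rvzba",   FEAT_RVZBA   },
    { "rvzbb",   FEAT_RVZBB   },
    { "rvjumbo", FEAT_RVJUMBO }
};

/* Map a flag name to its bit; returns 0 for an unknown name. */
static uint32_t feat_lookup(const char *name) {
    for (size_t i = 0; i < sizeof feat_names / sizeof feat_names[0]; i++)
        if (strcmp(name, feat_names[i].name) == 0)
            return feat_names[i].bit;
    return 0;
}

/* Apply one command-line option; returns 1 if recognized, else 0. */
int ctx_apply_option(CompilerCtx *ctx, const char *opt) {
    uint32_t bit;
    if (strncmp(opt, "-m", 2) == 0 && opt[2] != '\0') {
        if (strlen(opt + 2) >= sizeof ctx->arch) return 0;
        strcpy(ctx->arch, opt + 2);
        return 1;
    }
    if (strncmp(opt, "-f", 2) != 0) return 0;
    /* -f<flag>: enable the named feature. */
    if ((bit = feat_lookup(opt + 2)) != 0) {
        ctx->feat |= bit;
        return 1;
    }
    /* -fno<flag>: disable the named feature. */
    if (strncmp(opt + 2, "no", 2) == 0 && (bit = feat_lookup(opt + 4)) != 0) {
        ctx->feat &= ~bit;
        return 1;
    }
    return 0;
}
```

With this, "-mRV64G -frvzba" would leave arch set to "RV64G" and only
the Zba bit set, and a later "-fnorvc" would clear the 'C' bit again if
something had set it.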

The extension-flag checks mostly live in the instruction-emitter logic.
It will not emit an encoding if its flag is absent, and ideally the
code-generation logic should not try to use instructions for which the
required flag is absent.
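
To illustrate the two levels of checking (again only a sketch under
assumed names; Emitter, FEAT_RVZBA, emit_* and gen_shl1_add are
hypothetical): the emitter refuses the Zba encoding when the flag is
absent, while the code generator checks the flag first and falls back
to a base-ISA sequence. The SH1ADD, SLLI and ADD encodings follow the
standard R-type and I-type layouts.

```c
#include <stddef.h>
#include <stdint.h>

#define FEAT_RVZBA (1u << 1)   /* hypothetical feature bit for Zba */

typedef struct {
    uint32_t feat;       /* enabled feature bits */
    uint32_t code[64];   /* emitted instruction words */
    size_t   ncode;      /* number of words emitted so far */
} Emitter;

/* Append one instruction word; returns 1 on success, 0 if the buffer is full. */
static int emit_word(Emitter *em, uint32_t w) {
    if (em->ncode >= sizeof em->code / sizeof em->code[0]) return 0;
    em->code[em->ncode++] = w;
    return 1;
}

/* Pack the fields of an R-type instruction. */
static uint32_t enc_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                          uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* ADD rd, rs1, rs2: base ISA, always allowed. */
int emit_add(Emitter *em, uint32_t rd, uint32_t rs1, uint32_t rs2) {
    return emit_word(em, enc_rtype(0x00, rs2, rs1, 0x0, rd, 0x33));
}

/* SLLI rd, rs1, shamt (RV64, 6-bit shift amount): base ISA. */
int emit_slli(Emitter *em, uint32_t rd, uint32_t rs1, uint32_t shamt) {
    return emit_word(em, ((shamt & 0x3Fu) << 20) | (rs1 << 15) |
                         (0x1u << 12) | (rd << 7) | 0x13u);
}

/* SH1ADD rd, rs1, rs2 (Zba): not emitted unless the Zba flag is set. */
int emit_sh1add(Emitter *em, uint32_t rd, uint32_t rs1, uint32_t rs2) {
    if (!(em->feat & FEAT_RVZBA)) return 0;
    return emit_word(em, enc_rtype(0x10, rs2, rs1, 0x2, rd, 0x33));
}

/* rd = rs2 + (rs1 << 1): one SH1ADD with Zba, else SLLI into tmp plus ADD. */
int gen_shl1_add(Emitter *em, uint32_t rd, uint32_t rs1, uint32_t rs2,
                 uint32_t tmp) {
    if (em->feat & FEAT_RVZBA)
        return emit_sh1add(em, rd, rs1, rs2);
    return emit_slli(em, tmp, rs1, 1) && emit_add(em, rd, tmp, rs2);
}
```

The check in emit_sh1add is the safety net; the check in gen_shl1_add
is what keeps the code generator from relying on it.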

My compiler currently assumes RV64G (understood more as RV64IMFD) as the
baseline. Might consider adding RV64I or RV64IM modes (for now, it
always assumes the FPU exists).

For profiles, if supported, would likely make sense to set up a base ISA
for the profile and then enable/disable any relevant feature flags.

Say, in my scheme, "RVA22" could select "RV64GC" as a base and then
add 'Zba'/'Zbb'/... as feature flags on top of this.
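
A sketch of how such a profile table might look (hypothetical names
throughout: Profile, profile_find, profile_resolve, FEAT_*; the table
deliberately lists only the few extensions discussed in this thread,
not the full mandatory set of each profile):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits for the extensions discussed in this thread. */
enum {
    FEAT_RVZBA = 1 << 0,
    FEAT_RVZBB = 1 << 1,
    FEAT_RVZBS = 1 << 2,
    FEAT_RVZCB = 1 << 3
};

typedef struct {
    const char *name;   /* profile name given on the command line */
    const char *base;   /* base ISA string the profile expands to */
    uint32_t    feat;   /* feature flags the profile turns on */
} Profile;

/* Partial table: only the compiler-relevant subset of each profile. */
static const Profile profiles[] = {
    { "RVA20U64", "RV64GC", 0 },
    { "RVA22U64", "RV64GC", FEAT_RVZBA | FEAT_RVZBB | FEAT_RVZBS },
    { "RVA23U64", "RV64GC", FEAT_RVZBA | FEAT_RVZBB | FEAT_RVZBS | FEAT_RVZCB }
};

/* Find a profile by name; returns NULL if it is not in the table. */
const Profile *profile_find(const char *name) {
    for (size_t i = 0; i < sizeof profiles / sizeof profiles[0]; i++)
        if (strcmp(name, profiles[i].name) == 0)
            return &profiles[i];
    return NULL;
}

/* Start from the profile's flags, then apply explicit -f / -fno overrides.
   Disables are applied last, so they win over the profile defaults. */
uint32_t profile_resolve(const Profile *p, uint32_t enable, uint32_t disable) {
    return (p->feat | enable) & ~disable;
}
```

A profile-only interface (option 2 in the original question) would
then be the same table with the override arguments always zero.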

Probably, a compiler need not support/use every feature of the
profile, but maybe only the relevant ones (the profiles ask a bit more
of a CPU implementation though, since if a compiled program uses an ISA
feature and the CPU doesn't support it, well, ...).

Where, say, I am using '-frvjumbo' for my jumbo-prefix extension.

Also '-frvjumbo96' for 96-bit encodings (mostly things like loading a
full 64-bit constant via LUI). Decided to partly split up these cases,
as full 64-bit constants are relatively infrequent (so the effect of the
96-bit encodings on performance or code density is fairly small; but
could potentially have a bigger impact on the cost of the instruction
fetch/decode in a CPU).

Well, say, if one has a CPU designed for 2-wide superscalar, it is
likely to already have a lot of the needed mechanisms in place to
support jumbo prefixes (apart from some hackery in the instruction
decoders).

Things are a little wonky as in my compiler, RV64 has mostly ended up
sharing the same backend as my own ISA, as well as my newer "XG3"
sub-ISA (essentially being treated as an extension to the RV sub-mode).

Originally, this was because I had intended to do mixed-ISA binaries
(where, say, some parts of the program were using my ISA, other parts
RISC-V). Apart from some very limited scenarios, have mostly ended up
not doing this though (and probably would have been cleaner to fork the
backend, but alas).

It makes more sense (and is less complicated) to compile the whole
binary for a single ISA or ISA variant.

For now, XG3RV is '-mX3RV', also implicitly assumes rvjumbo is present
as well.

Though, will note that XG3 has likely somewhat higher implementation
cost vs smaller gains, if compared with the RV64G+Jumbo. So may not make
sense to argue for it as a general-purpose ISA feature (plus also needs
the CPU to support multiple operating modes if one wants it to be able
to coexist with the 'C' extension...). For now, I still consider it
"experimental".

In case anyone cares (unlikely), I had also recently re-licensed my
compiler from the MIT license to MIT-0 / MIT-NA, which is more
permissive than the normal "MIT Expat" license.

Not sure how many people would find it useful (even for RV64G) as it
currently only supports PE/COFF (and a few other tweaked PE variants) as
its output format (doesn't support ELF as I am not currently using ELF
in my projects).

Well, and also it handles the F/D extensions in ways that are "kinda
broken" for now (just made them work, sorta, but this part is still not
really "good" yet though). Mostly it is a side effect of the backend
being extended from my ISA, which doesn't really have a concept of
separate integer and floating-point registers.

...

> Is there a graph or table posted somewhere that shows the adoption curve (in
> design, production, and deployment) of parts conforming to each profile?
> That would be helpful in planning for the future.
>

Dunno on this part...

Personally I would like it if there were a list somewhere of what all
encodings have been assigned for known extensions. As is, one needs to
do web searches and look at PDFs or similar for each extension.

And can seemingly pick up incorrect information, such as thinking that
LQ/SQ already had encodings defined. Though, I guess for me this saved
some effort, as I don't have a particular need to change my Load/Store
Pair implementation from my current encodings at the moment (overloading
LDU / SDU).

Also still not gotten any feedback as to whether or not I should move my
jumbo prefix encoding to not conflict with the previously-dropped
ADDIWU instruction. With this sort of thing, the longer one leaves an
encoding one way, the harder it is to change later (so, if I should
change it, better to know sooner than later).

But, if ADDIWU is good and solidly in the grave, it may make more sense
for me to leave the prefix where it is (and the prefix can itself
recreate what was "lost" here via a 64-bit encoding, albeit not with
a 33-bit immediate as could be done by prefix-extending ADDW).

Can note that rationale for dropping it was that it was infrequent,
which also seems reasonably true for its equivalent in my ISA. In this
case, the 3-register variants appear more common than the immediate-form
variant (and when they do appear, much of the time the immediate would
not fit within 12 bits).

My personal leaning is to semi-revive ADDWU/SUBWU, albeit no longer
in the 'B' extension; and, to whatever extent one needs the semantics of
ADDIWU, to fake it with a Jumbo+ADDWU or similar (using the prefix to
fake an Imm17s variant as a 64-bit encoding). If the Imm17s form fails,
one can use "Jumbo+ADD; ADD.UW" (or, basically, do a 64-bit add with a
33-bit immediate and then zero-extend the result).
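
The equivalence relied on here can be modeled in a few lines of C.
This is a sketch of the semantics only, not of any encoding: ADD.UW
follows the ratified Zba definition, while the ADDWU/ADDIWU behavior
(32-bit add with a zero-extended result) is assumed from the old 'B'
drafts; the function names are illustrative.

```c
#include <stdint.h>

/* ADDWU (assumed draft-'B' semantics): add, keep the low 32 bits,
   zero-extend the result to 64 bits. */
uint64_t addwu(uint64_t a, uint64_t b) {
    return (uint32_t)(a + b);
}

/* ADDIWU modeled as ADDWU with a sign-extended immediate operand. */
uint64_t addiwu(uint64_t a, int64_t imm) {
    return addwu(a, (uint64_t)imm);
}

/* ADD.UW (Zba): zero-extend the low 32 bits of rs1, then add rs2. */
uint64_t add_uw(uint64_t rs1, uint64_t rs2) {
    return (uint64_t)(uint32_t)rs1 + rs2;
}

/* Fallback sequence: full 64-bit add with the immediate, then
   ADD.UW against zero (x0) to zero-extend the low 32 bits. */
uint64_t addiwu_fallback(uint64_t a, int64_t imm) {
    uint64_t sum = a + (uint64_t)imm;   /* Jumbo+ADD */
    return add_uw(sum, 0);              /* ADD.UW rd, sum, x0 */
}
```

Both paths truncate the same 64-bit sum to 32 bits, so they agree for
every input; the difference is only in instruction count and latency.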

Where, RV64G+Zba would give this case more like "LUI; ADD; ADD; ADD.UW",
which has a 4-word cost and 6 cycle latency (assuming 2 cycle latency
for 64-bit ADD); vs 2 words and 1 clock cycle for the Imm17s case.

Not really sure why not to have ADDWU / SUBWU if compared with a lot of
the other somewhat more complex and expensive seeming parts of the 'B'
extension though.

Theoretically, cheap / simple / general-purpose should win out over
expensive / complex / overly-niche. So, some parts of 'B' seem a bit odd
by this metric...

Will note that I still have no current plans to support the V extension,
mostly on the basis that (beyond implementation effort), it would likely
accomplish little beyond blowing the FPGA resource budget (I still have
doubts it would be particularly viable for soft-processors on an Artix-7
or similar). Also likely of limited direct relevance to typical
integer-code dominant workloads.

The 'B' extension seems relevant, but also contains a lot of stuff that
seems "overly niche" (so, for now, I have only really implemented Zba
and similar).
