Proposal to remove the need for binary translation for RV32G to RV64G

Kelly Dean

Mar 1, 2018, 2:10:15 PM

to RISC-V ISA Dev

It appears that binary translation of 32-bit-dependent RV32I programs to run (correctly) on RV64I processors requires three steps:

Translate the nine instructions ADD[I], SUB, SLL[I], SRL[I], and SRA[I], to their W counterparts. (This just requires flipping one bit (inst[3]) in the major opcode.)

Following «[rdcycle | rdtime | rdinstret] rd», add the instruction:
addiw rd, rd, 0

Then replace «[rdcycleh | rdtimeh | rdinstreth] rd» by the sequence:
[rdcycle | rdtime | rdinstret] rd; srai rd, rd, 32

(Unlike the arithmetic and logic translations, the read-counter translations require expansion of the program.)
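The one-bit rewrite in the first step can be sketched as follows. This is only an illustration of the encoding argument, assuming the standard RV32I/RV64I opcode map (OP = 0110011, OP-IMM = 0010011, and their -32 variants with inst[3] set); the helper names `encode_r` and `to_w_form` are hypothetical and not part of any existing tool.

```python
# Illustrative sketch only; helper names are hypothetical, not from the thread.
OP_IMM = 0b0010011                  # ADDI, SLLI, SRLI, SRAI
OP = 0b0110011                      # ADD, SUB, SLL, SRL, SRA
W_FUNCT3 = {0b000, 0b001, 0b101}    # add/sub, shift left, shift right


def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Assemble an R-type instruction word from its fields."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def to_w_form(inst):
    """Return the W counterpart of one of the nine instructions, else inst unchanged."""
    opcode = inst & 0x7F
    funct3 = (inst >> 12) & 0x7
    funct7 = inst >> 25
    if funct3 not in W_FUNCT3:
        return inst
    # For OP, only funct7 0000000/0100000 are base-ISA; 0000001 is the M extension.
    if opcode == OP and funct7 in (0b0000000, 0b0100000):
        return inst | (1 << 3)      # OP -> OP-32
    if opcode == OP_IMM:
        return inst | (1 << 3)      # OP-IMM -> OP-IMM-32
    return inst


add_x3 = encode_r(0b0000000, 2, 1, 0b000, 3, OP)   # add x3, x1, x2
print(hex(add_x3), hex(to_w_form(add_x3)))          # 0x2081b3 0x2081bb
```

The sketch deliberately leaves M-extension encodings alone; the thread treats those separately.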

Is that all? If so, then why doesn't RV32I allow those W instructions? The advantage would be that no binary translation would be necessary (except for the gratuitous incompatibility of the read-counter instructions).

If it's just a historical accident, then I propose to remove this wart in the following way.

In section 4.2 of riscv-spec-v2.2.pdf:
Remove the sentence “They [the W instructions] cause an illegal instruction exception in RV32I.” and replace it by “They're equivalent to the non-W instructions in RV32I.” This adds no complexity to RV32I decoders; it might even simplify them, since they no longer need to discriminate inst[3] when the major opcode is 0x1x011 and funct3 is 000, 001, or 101.

In table 19.3 (CSRs):
Remove the gratuitous restriction “RV32I only” from cycleh, timeh, and instreth.
Add three new CSRs: cyclel, timel, and instretl, that map to the lower 32 bits of cycle, time, and instret.

In section 2.8:
Add new pseudo-instructions RDCYCLEL, RDTIMEL, and RDINSTRETL, to read the new CSRs. On RV32I, they do the same thing as RDCYCLE, RDTIME, and RDINSTRET.
Specify that when CSRR[S|C][I] reads [cycle | time | instret][l|h] on RV64I and RV128I, it sign-extends (to maintain the invariant described in section 4.2, page 29) to XLEN bits instead of zero-extending. This is backward compatible, because in the current spec these CSRs are either nonexistent or illegal to access in RV64I and RV128I.
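As a rough model of the counter handling, the sketch below compares the translated sequences from the start of this message (`addiw rd, rd, 0` and `srai rd, rd, 32` applied to a 64-bit counter value) with what the proposed L- and H-suffixed CSRs would return on RV64 if they sign-extend. The function names are hypothetical, and the CSR behavior shown is the proposal's, not anything in the 2.2 spec.

```python
# Sketch under the proposal's assumptions; none of these helpers exist in a real tool.
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addiw_zero(reg):
    """addiw rd, rd, 0 on RV64: sign-extend the low 32 bits."""
    return sext(reg, 32) & MASK64


def srai_32(reg):
    """srai rd, rd, 32 on RV64: arithmetic shift right by 32."""
    return (sext(reg, 64) >> 32) & MASK64


def read_low(counter):
    """Proposed cyclel/timel/instretl read on RV64 (sign-extended)."""
    return sext(counter, 32) & MASK64


def read_high(counter):
    """Proposed cycleh/timeh/instreth read on RV64 (sign-extended)."""
    return sext(counter >> 32, 32) & MASK64


counter = 0x8000_0001_FFFF_FFFE
print(hex(addiw_zero(counter)), hex(read_low(counter)))   # both 0xfffffffffffffffe
print(hex(srai_32(counter)), hex(read_high(counter)))     # both 0xffffffff80000001
```

In both cases the low 32 bits equal what an RV32 processor would return, and the upper bits follow the sign-extension invariant.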

In section 4.4:
Remove the sentence “Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are not necessary and are illegal in RV64I.”

Then to write RV32I programs that are portable to RV64I processors, just use the W instead of the non-W instructions for algorithms that are intolerant of 64-bit operations, and use the new L-suffixed read-counter instructions instead of the non-suffixed ones. (Continue using the current H-suffixed instructions.)

This is fully backward compatible; RV32I and RV64I programs written to the 2.2 spec run unchanged on processors that implement this revised spec. If it's politically impossible to modify the frozen spec, then just define these new compatibility features to be an optional standard extension.

RV32IM on RV64IM would also work by adding MULH[[S]U]W instructions and using them (along with the current MULW, DIV[U]W, and REM[U]W) instead of their non-W counterparts. And this would give full RV32G on RV64G, since the A, F, and D extensions are already compatible.
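To see why a W form of MULH would be needed, here is a small model. It assumes MULH returns the upper XLEN bits of the signed 2×XLEN-bit product, and it models the hypothetical MULHW as "the RV32 MULH result, sign-extended to 64 bits"; that definition is an assumption made for illustration.

```python
# Sketch only: mulhw() models a hypothetical instruction that does not exist in the 2.2 spec.
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulh(a, b, xlen):
    """MULH: upper xlen bits of the signed 2*xlen-bit product."""
    product = sext(a, xlen) * sext(b, xlen)
    return (product >> xlen) & ((1 << xlen) - 1)


def mulhw(a, b):
    """Hypothetical MULHW: RV32 MULH result, sign-extended to 64 bits."""
    return sext(mulh(a, b, 32), 32) & MASK64


a = b = 0x10000                       # 65536 * 65536 = 2**32
print(mulh(a, b, 32))                 # RV32 MULH: 1
print(mulh(a, b, 64))                 # RV64 MULH on the same register values: 0
print(mulhw(a, b))                    # hypothetical MULHW on RV64: 1
```

Unlike ADD versus ADDW, the RV64 MULH result does not even agree with RV32 in its low 32 bits, so no fix-up after the fact recovers it.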

RV32IC on RV64IC can't be done without breaking backward compatibility, due to their differing repertoires of compressed instructions, and harmonizing them would necessarily worsen one of them. But at least RV32G on RV64G could be done.

Bruce Hoult

Mar 1, 2018, 2:27:39 PM

to Kelly Dean, RISC-V ISA Dev

I absolutely agree. I've pointed out here several times before that allowing the W instructions as aliases for the non-W ones in RV32 is not only trivial but, as you say, even simplifies the decoder.

As well as the arithmetic instructions, you also want to allow the LWU (Load Word Unsigned) as an alias for LW in RV32. Again this simply requires ignoring one bit in the instruction (though a different bit).

After that, you'd want to modify the compilers to start actually using those instructions in 32 bit mode -- optionally at first, as existing hardware doesn't accept them, but in a few years it could perhaps become the default.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/64Gn11pyCTCo9nARXanQovr2p8xOs08KAodpwjIliUp%40local.

Michael Clark

Mar 1, 2018, 3:28:40 PM

to Kelly Dean, RISC-V ISA Dev

> On 2/03/2018, at 8:09 AM, Kelly Dean <ke...@prtime.org> wrote:
>
> It appears that binary translation of 32-bit-dependent RV32I programs to run (correctly) on RV64I processors requires three steps:
>
> Translate the nine instructions ADD[I], SUB, SLL[I], SRL[I], and SRA[I], to their W counterparts. (This just requires flipping one bit (inst[3]) in the major opcode.)
>
> Following «[rdcycle | rdtime | rdinstret] rd», add the instruction:
> addiw rd, rd, 0

If you need to insert instructions during binary translation, then you need a shadow area containing the translated code, along with special handling for indirect unconditional branches (i.e. JALR) to handle the remapping of program counters. Static binary translation would only be possible if you didn’t alter the code size, because after linkage most of the relocation information is omitted from the linked binary. You could do static binary translation for simple statically linked programs that don’t use true indirect branches (e.g. GOT/PLT for shlibs) if you consider the AUIPC+JALR pair as a direct unconditional branch. If you have any lone JALRs in the code, you will need to do dynamic binary translation with translated code caches et al. (loads from TEXT, for example, should return the untranslated code for the translation to be accurate).
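The program-counter remapping problem can be illustrated with a toy table. The sketch assumes every instruction is 4 bytes (no compressed code) and that the translator knows how many instructions each original instruction expands to; `build_pc_map` is a hypothetical helper, not something from rv8 or QEMU.

```python
# Toy model of the address shift caused by inserting instructions; purely illustrative.
def build_pc_map(expansions):
    """Map each original instruction address to its address in the translated code.

    expansions[i] is the number of translated instructions emitted for original
    instruction i (1 for an unchanged instruction, 2 for e.g. rdcycle + addiw).
    """
    pc_map = {}
    new_pc = 0
    for index, count in enumerate(expansions):
        pc_map[index * 4] = new_pc
        new_pc += count * 4
    return pc_map


# Original instructions at 0, 4, 8, 12; the second and fourth expand to two each.
print(build_pc_map([1, 2, 1, 2]))   # {0: 0, 4: 4, 8: 12, 12: 16}
```

Direct branches can be patched from such a table at translation time, but a lone JALR computes its target at run time, which is why it needs a lookup like this while the program executes.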

It’s possible. I could pretty easily add an RV64 backend to rv8, which handled RV32. It could also be done in QEMU.

> Then replace «[rdcycleh | rdtimeh | rdinstreth] rd» by the sequence:
> [rdcycle | rdtime | rdinstret] rd; srai rd, rd, 32
>
> (Unlike the arithmetic and logic translations, the read-counter translations require expansion of the program.)
>
> Is that all? If so, then why doesn't RV32I allow those W instructions? The advantage would be that no binary translation would be necessary (except for the gratuitous incompatibility of the read-counter instructions).
>
>
> If it's just a historical accident, then I propose to remove this wart in the following way.
>
> In section 4.2 of riscv-spec-v2.2.pdf:
> Remove the sentence “They [the W instructions] cause an illegal instruction exception in RV32I.” and replace it by “They're equivalent to the non-W instructions in RV32I.” This adds no complexity to RV32I decoders; it might even simplify them, since they no longer need to discriminate inst[3] when the major opcode is 0x1x011 and funct3 is 000, 001, or 101.
>
> In table 19.3 (CSRs):
> Remove the gratuitous restriction “RV32I only” from cycleh, timeh, and instreth.
> Add three new CSRs: cyclel, timel, and instretl, that map to the lower 32 bits of cycle, time, and instret.
>
> In section 2.8:
> Add new pseudo-instructions RDCYCLEL, RDTIMEL, and RDINSTRETL, to read the new CSRs. On RV32I, they do the same thing as RDCYCLE, RDTIME, and RDINSTRET.
> Specify that when CSRR[S|C][I] reads [cycle | time | instret][l|h] on RV64I and RV128I, it sign-extends (to maintain the invariant described in section 4.2, page 29) to XLEN bits instead of zero-extending. This is backward compatible, because in the current spec these CSRs are either nonexistent or illegal to access in RV64I and RV128I.
>
> In section 4.4:
> Remove the sentence “Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are not necessary and are illegal in RV64I.”
>
>
> Then to write RV32I programs that are portable to RV64I processors, just use the W instead of the non-W instructions for algorithms that are intolerant of 64-bit operations, and use the new L-suffixed read-counter instructions instead of the non-suffixed ones. (Continue using the current H-suffixed instructions.)
>
> This is fully backward compatible; RV32I and RV64I programs written to the 2.2 spec run unchanged on processors that implement this revised spec. If it's politically impossible to modify the frozen spec, then just define these new compatibility features to be an optional standard extension.
>
>
> RV32IM on RV64IM would also work by adding MULH[[S]U]W instructions and using them (along with the current MULW, DIV[U]W, and REM[U]W)) instead of their non-W counterparts. And this would give full RV32G on RV64G, since the A, F, and D extensions are already compatible.
>
> RV32IC on RV64IC can't be done without breaking backward compatibility, due to their differing repertoires of compressed instructions, and harmonizing them would necessarily worsen one of them. But at least RV32G on RV64G could be done.

I agree with the sentiment but not with the timing. Given there are already many RV32 designs including shipping silicon it just may not be practical.

There would need to be a new RV32X ABI that is somewhat like x32 on x86_64. A 32-bit ABI on RV64 may be practical but RV32 is likely not going to change as the Base ISA is frozen. As you mentioned, the compressed extension is different.

I can see ilp32 for rv64 being a distinct possibility, but that would use 64-bit instructions for fast handling of long long as does x32. i.e. the primary benefit is in pointer size for smaller memory systems. x32 supports the 64-bit instructions for 64-bit scalars.

Bruce Hoult

Mar 1, 2018, 4:02:02 PM

to Michael Clark, Kelly Dean, RISC-V ISA Dev

Note that making an RVC binary that runs correctly on both rv32 and rv64 only requires avoiding:

C.FLW, C.FSW, C.FLWSP, C.FSWSP -- single precision FP loads/stores. DP is fine.

C.JAL -- subroutine call. Of limited use in non-embedded programs as it only has a +/- 2KB range.

This would have very little effect on the size savings in most programs -- especially as most don't use FP at all.

Kelly Dean

Mar 1, 2018, 10:10:17 PM

to Bruce Hoult, RISC-V ISA Dev

Bruce Hoult writes:

> As well as the arithmetic instructions, you also want to allow the LWU
> (Load Word Unsigned) as an alias for LW in RV32.

Why?

If address n contains FFFF_FFFF and register x1 contains n, and you do
lw x1, 0(x1)
slt x1, x1, x0

Then you get 1 in x1, on both RV32 and RV64. No binary translation necessary.

But if you use LWU instead of LW, then you'd get 1 on RV32 but 0 on RV64. Adding support for LWU to RV32 would be counterproductive, because code using it would run incorrectly on RV64.
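The claim can be checked with a small register-level model. It assumes LW sign-extends the loaded word to XLEN, LWU zero-extends it (which on a 32-bit register is the same bit pattern as LW, as the proposed alias would have it), and SLT compares as signed XLEN-bit values; the function names are hypothetical.

```python
# Minimal behavioral model, not an ISA simulator; names are illustrative.
def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lw(word, xlen):
    """LW: load a 32-bit word, sign-extended to xlen bits."""
    return sext(word, 32) & ((1 << xlen) - 1)


def lwu(word, xlen):
    """LWU: load a 32-bit word, zero-extended (identical to LW when xlen is 32)."""
    return word & 0xFFFFFFFF


def slt(a, b, xlen):
    """SLT: signed less-than on xlen-bit register values."""
    return int(sext(a, xlen) < sext(b, xlen))


def run(load, xlen):
    x1 = load(0xFFFFFFFF, xlen)       # word at address n
    return slt(x1, 0, xlen)           # slt x1, x1, x0


for name, load in (("lw", lw), ("lwu", lwu)):
    print(name, run(load, 32), run(load, 64))
# lw 1 1
# lwu 1 0
```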

Bruce Hoult

Mar 1, 2018, 10:27:29 PM

to Kelly Dean, RISC-V ISA Dev

Well, yes, because there's a bug in your code.

If the FFFF_FFFF at address n is semantically an unsigned value (i.e. 4294967295 not -1) then you need to use...

lwu x1, 0(x1)
sltu x1, x1, x0

... which will produce the same results on RV32 and RV64 if RV32 is modified, as suggested, to accept the lwu opcode and execute it exactly the same as it executes lw.

Of course no unsigned values are less than 0, so you might want to use something other than x0 to compare against.

Kelly Dean

Mar 1, 2018, 11:26:40 PM

to Michael Clark, RISC-V ISA Dev

Michael Clark writes:

> I agree with the sentiment but not with the timing. Given there are already many RV32 designs including shipping silicon it just may not be practical.

Then as I said, just define it as an optional standard extension. Name it e.g. “Z”. RV32I silicon shipping and the spec being frozen doesn't mean new standard extensions can't be added; e.g. B will be, and RV32IB silicon will ship in the future. As with B, and any other extension, software that doesn't use Z will still run correctly on processors that do implement Z.

Of course, “Z” is a silly name for it. Section 22.4 of the spec v2.2 already explains exactly what the right names are:
“RV32I2p1M2p1”, “RV32G2p1”, “RV64I2p1M2p1”, and “RV64G2p1”.

> There would need to be a new RV32X ABI that is somewhat like x32 on x86_64. A 32-bit ABI on RV64 may be practical

RV64 already has a standard 32-bit ABI: it's the *W instructions.

Samuel Falvo II

Mar 1, 2018, 11:31:45 PM

to Kelly Dean, Michael Clark, RISC-V ISA Dev

On Thu, Mar 1, 2018 at 8:26 PM, Kelly Dean <ke...@prtime.org> wrote:
> Michael Clark writes:
>> I agree with the sentiment but not with the timing. Given there are already many RV32 designs including shipping silicon it just may not be practical.
>
> Then as I said, just define it as an optional standard extension. Name it e.g. “Z”. RV32I silicon shipping and the spec being frozen doesn't mean new standard extensions can't be added; e.g. B will be, and RV32IB silicon will ship in the future. As with B, and any other extension, software that doesn't use Z will still run correctly on processors that do implement Z.

I think this is what, in part, the "X" non-standard extensions were
intended for.

I'll be happy to support this extension for my future-planned revision
to my Kestrel project's processor design. It's not formally
documented, but I already support LDU and *all* ALU instructions have
corresponding -W variants (not just the 4 listed in the standard),
purely because I got lazy with the decoder and just didn't see the
point. ;)

--
Samuel A. Falvo II

Samuel Falvo II

Mar 1, 2018, 11:35:31 PM

to Kelly Dean, Michael Clark, RISC-V ISA Dev

On Thu, Mar 1, 2018 at 8:31 PM, Samuel Falvo II <sam....@gmail.com> wrote:
> I'll be happy to support this extension for my future-planned revision
> to my Kestrel project's processor design. It's not formally

What I meant was to apply this core idea to RV64->RV128 instructions.
My CPU design is already 64-bit. Heh.

Kelly Dean

Mar 2, 2018, 10:43:00 AM

to Bruce Hoult, RISC-V ISA Dev

Bruce Hoult writes:

> Well, yes, because there's a bug in your code

Yes, intentionally; my point is that LWU, itself, in RV32 software would (always) be a latent bug, masked when run on an RV32 processor but potentially exposed when run on an RV64 processor. The bug is not merely the combination of LWU and SLT.

> If the FFFF_FFFF at address n is semantically an unsigned value (i.e.
> 4294967295 not -1) then you need to use...
>
> lwu x1, x1, 0
> sltu x1, x1, x0
>
> ... which will produce the same results on RV32 and RV64 if RV32 is as
> suggested modified to accept the lwu opcode and execute it exactly the same
> as it executes lw.
>
> Of course no unsigned values are less than 0, so you might want to use
> something other than x0 to compare against.

Using LW with SLTU correctly produces the same results on RV32 and RV64. Even for unsigned values, you must use LW in 32-bit software (i.e. software that operates on 32-bit values in registers), regardless of whether the software runs on an RV32 processor (with 32-bit registers) or RV64 processor (where the *W instructions emulate 32-bit registers).

LWU doesn't work (not even for unsigned values) for 32-bit software, because LWU fails to emulate a 32-bit destination register. It violates the invariant described in section 4.2 of the spec v2.2, which says “all 32-bit values are held in a sign-extended format in 64-bit registers. Even 32-bit unsigned integers extend bit 31 into bits 63 through 32”. Consider another example:

i+0=i, for any value of i, whether signed or unsigned.

If address n contains FFFF_FFFF and register x1 contains n, and you do
lw x1, 0(x1)
addiw x2, x1, 0
beq x2, x1, target

Then it would branch (as it should), on both RV32 and RV64. But if you use LWU instead of LW, it would branch on RV32 but not on RV64. This is 32-bit software, because it uses ADDIW; therefore, LWU itself would be the bug.
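Both points in this message (LW plus SLTU is correct for unsigned 32-bit values, and LWU breaks the i+0=i identity under ADDIW) can be modeled in a few lines. The sketch assumes registers are held as unsigned XLEN-bit patterns and that ADDIW on RV32 behaves like ADDI, as the proposal specifies; all function names are hypothetical.

```python
# Behavioral sketch of the sign-extension invariant; not an ISA simulator.
def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lw(word, xlen):
    """LW: 32-bit load, sign-extended to xlen bits."""
    return sext(word, 32) & ((1 << xlen) - 1)


def lwu(word, xlen):
    """LWU: 32-bit load, zero-extended."""
    return word & 0xFFFFFFFF


def addiw(reg, imm, xlen):
    """ADDIW: 32-bit add, result sign-extended to xlen bits."""
    return sext(reg + imm, 32) & ((1 << xlen) - 1)


def sltu(a, b):
    """SLTU: unsigned less-than on register bit patterns."""
    return int(a < b)


def branches(load, xlen):
    x1 = load(0xFFFFFFFF, xlen)
    x2 = addiw(x1, 0, xlen)           # addiw x2, x1, 0
    return x2 == x1                   # beq x2, x1, target


print(branches(lw, 32), branches(lw, 64))     # True True
print(branches(lwu, 32), branches(lwu, 64))   # True False

# Sign-extended loads keep the unsigned ordering of 32-bit values on RV64.
words = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
print(all(sltu(lw(a, 64), lw(b, 64)) == int(a < b) for a in words for b in words))  # True
```

The last check works because sign extension maps the upper half of the 32-bit range to the top of the 64-bit range, so the relative order under an unsigned comparison is unchanged.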

Alex Elsayed

Mar 2, 2018, 2:50:27 PM

to isa...@groups.riscv.org

On Thursday, 1 March 2018 12:28:19 PST Michael Clark wrote:
> > On 2/03/2018, at 8:09 AM, Kelly Dean <ke...@prtime.org> wrote:
> >

<snip>

> > Is that all? If so, then why doesn't RV32I allow those W instructions? The
> > advantage would be that no binary translation would be necessary (except
> > for the gratuitous incompatibility of the read-counter instructions).
> >
> >
> > If it's just a historical accident, then I propose to remove this wart in
> > the following way.

<snip>

> I agree with the sentiment but not with the timing. Given there are already
> many RV32 designs including shipping silicon it just may not be practical.

I'll note that these changes take things that trap and give them non-trap
semantics. As a result, this can be implemented for existing silicon in M- or
S-mode, albeit with a performance penalty.

> There would need to be a new RV32X ABI that is somewhat like x32 on x86_64.
> A 32-bit ABI on RV64 may be practical but RV32 is likely not going to
> change as the Base ISA is frozen. As you mentioned, the compressed
> extension is different.

This is notably different from x32, in that the new ABI will run on 32-bit
processors, while x32 runs only in long mode. If anything, this is bringing
RISC-V closer to SPARC's approach (of making 32-bit programs execute
unmodified under 64-bit mode). In addition, this does not change the calling
convention _at all_ compared to RV32. x32 does.

It stops short of that (due to the C extension), but IMO it's a very
interesting design point.

It would be a new ABI though, yes, simply due to the *L variants of the time
CSRs.

> I can see ilp32 for rv64 being a distinct possibility, but that would use
> 64-bit instructions for fast handling of long long as does x32. i.e. the
> primary benefit is in pointer size for smaller memory systems. x32 supports
> the 64-bit instructions for 64-bit scalars.

I don't think that would be anywhere near as beneficial. The benefits of an
ILP32 ABI along those lines on x86 (larger register file, new instructions,
etc) are much larger than on RISC-V (larger registers, all else unchanged). In
addition, the main benefit of the proposed change is _not_ performance: it's
compatibility.

Bruce Hoult

Mar 2, 2018, 4:11:09 PM

to Alex Elsayed, RISC-V ISA Dev

On Fri, Mar 2, 2018 at 10:49 PM, Alex Elsayed <etern...@gmail.com> wrote:

> On Thursday, 1 March 2018 12:28:19 PST Michael Clark wrote:
>> There would need to be a new RV32X ABI that is somewhat like x32 on x86_64.
>> A 32-bit ABI on RV64 may be practical but RV32 is likely not going to
>> change as the Base ISA is frozen. As you mentioned, the compressed
>> extension is different.
>
> This is notably different from x32, in that the new ABI will run on 32-bit
> processors, while x32 runs only in long mode. If anything, this is bringing
> RISC-V closer to SPARC's approach (of making 32-bit programs execute
> unmodified under 64-bit mode). In addition, this does not change the calling
> convention _at all_ compared to RV32. x32 does.

And PowerPC. Right from the start, 32 bit PowerPC processors provided both lw and lwz (zero extend) instructions that acted identically on 32 bit CPUs. Compilers right from the start used lw for signed values and lwz for unsigned values.

When the 64 bit G5 Macs came out, all legacy 32 bit PowerPC code kept right on working.

> It stops short of that (due to the C extension), but IMO it's a very
> interesting design point.

As I showed previously in this thread, you can use *almost* all of RV32C. You only have to avoid the four single precision FP load and store instructions, and C.JAL (which can only address +/- 2 KB anyway).

Kelly Dean

Mar 3, 2018, 12:39:33 AM

to Bruce Hoult, RISC-V ISA Dev

Bruce Hoult writes:

> Note that making an RVC binary that runs correctly on both rv32 and rv64
> only requires avoiding:
>
> C.FLW, C.FSW, C.FLWSP, C.FSWSP -- single precision FP loads/stores. DP is
> fine.
> C.JAL -- subroutine call. Of limited use in non-embedded programs as it
> only has a +/- 2KB range.
>
> This would have very little effect on the size savings in most programs --
> especially as most don't use FP at all.

You'd also have to avoid C.ADDI4SPN, C.ADDI, C.ADDI16SP, C.SRLI, C.SUB, C.SLLI, and C.ADD, for the same reason that you have to avoid the full instructions that those represent. They operate on 32-bit values on RV32 but 64-bit values on RV64.

Adding insult to injury, the RV32/64 encodings for C.ADDW and C.SUBW are the same, but you can't use them in RV32 software as a substitute for C.ADD and C.SUB (which you must avoid, in order to be portable to RV64) because they're capriciously reserved on RV32 (consistent with the current prohibition of all *W computation instructions on RV32).

That all adds up to a very large effect on the size savings.

Richard Herveille

Mar 5, 2018, 4:04:35 AM

to Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

Not being able to execute RV32 code natively in RV64 (and RV128 for that matter) is a big drawback.

I’ve voiced my concerns about this from the beginning. I am mostly concerned about legacy software that will emerge.

Richard

Richard Herveille

Managing Director

Phone +31 (45) 405 5681

Cell +31 (6) 5207 2230

richard....@roalogic.com

Tommy Thorn

Mar 5, 2018, 4:12:24 PM

to Richard Herveille, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev

Why would that be? Wouldn't you just run RV32 apps in RV32 mode on Linux, like you currently can run 32-bit apps on a 64-bit processor? I fail to understand what the big deal is here and when this would ever be useful.

Tommy

On Mar 5, 2018, at 01:04, Richard Herveille <richard....@roalogic.com> wrote:

> Not being able to execute RV32 code natively in RV64 (and RV128 for that matter) is a big drawback.
> I’ve voiced my concerns about this from the beginning. I am mostly concerned about legacy software that will emerge.
>
> Richard

Richard Herveille

Mar 6, 2018, 3:05:29 AM

to Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

Besides, there are systems other than Linux …

The RISC-V ISA has the potential of natively running RV32I (probably RV32IMA) on an RV64 or RV128 CPU.

However, due to some minor choices this is not possible. For example, the ADD instruction is XLEN-sized, so ADD on an RV32 CPU behaves differently than on an RV64 CPU. However, RV32 ADD and RV64 ADDW behave the same.

So why not fix the opcode such that RV32 always uses ADDW instead of ADD? If that’s done consistently (i.e. for the other conflicting opcodes), then RV32 code runs on an RV64 CPU (and an RV128 CPU, for that matter).
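A short model of the ADD/ADDW difference, for illustration. It assumes registers are unsigned XLEN-bit patterns, that ADD wraps at XLEN bits, and that ADDW wraps at 32 bits and sign-extends; the helper names are hypothetical.

```python
# Behavioral sketch of ADD vs. ADDW on overflow past bit 31; names are illustrative.
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add(a, b, xlen):
    """ADD: wraps at xlen bits."""
    return (a + b) & ((1 << xlen) - 1)


def addw(a, b):
    """ADDW on RV64: wraps at 32 bits, then sign-extends to 64."""
    return sext(a + b, 32) & MASK64


def is_negative(reg, xlen):
    """What a following signed comparison against zero would observe."""
    return sext(reg, xlen) < 0


rv32 = add(0x7FFFFFFF, 1, 32)
rv64 = add(0x7FFFFFFF, 1, 64)
rv64w = addw(0x7FFFFFFF, 1)
print(hex(rv32), is_negative(rv32, 32))     # 0x80000000 True
print(hex(rv64), is_negative(rv64, 64))     # 0x80000000 False
print(hex(rv64w), is_negative(rv64w, 64))   # 0xffffffff80000000 True
```

The low 32 bits agree in all three cases; the difference only shows up once a later instruction looks at the full register, such as a signed compare or branch.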

Richard

Jacob Bachmeyer

Mar 6, 2018, 10:34:39 PM

to Alex Elsayed, isa...@groups.riscv.org

Alex Elsayed wrote:
> I don't think that would be anywhere near as beneficial. The benefits of an
> ILP32 ABI along those lines on x86 (larger register file, new instructions,
> etc) are much larger than on RISC-V (larger registers, all else unchanged). In
> addition, the main benefit of the proposed change is _not_ performance: it's
> compatibility.

For compatibility, we already have the UXL field in sstatus, which
allows a supervisor to configure an RV64-capable processor to act as an
RV32 processor in U-mode.

Why are we concerned about running RV32 code on RV64 when we have the
option for RV64 processors to directly support RV32, including RV32C?

-- Jacob

Richard Herveille

Mar 7, 2018, 2:52:50 AM

to jcb6...@gmail.com, Alex Elsayed, isa...@groups.riscv.org, Richard Herveille

Implementing the UXL field adds complexity, which is bad for embedded CPUs.

Besides, there are systems without HSU mode that might benefit from running RV32 code natively.

Not being able to execute RV32I(MA) natively is a mistake.

Richard

Jacob Bachmeyer

Mar 7, 2018, 10:47:29 PM

to Richard Herveille, Alex Elsayed, isa...@groups.riscv.org, Andrew Waterman

Richard Herveille wrote:
>
> Implementing the UXL field adds complexity, which is bad for embedded
> CPUs.
>

How much complexity does UXL add, given that RV64 is already supported?
Also, embedded systems are not running "just any" software, but a
unified and known firmware image, so build-time translation (or simply
"always compile for the actual processor you are targeting") is not
unreasonable.

> Besides, there are systems without HSU mode that might benefit from
> running RV32 code natively.
>

While slightly trickier, for this there is the MXL field in misa.

> Not being able to execute RV32I(MA) natively is a mistake.
>

While I agree that it looks like a poor choice, I suspect that some
other constraint drove that decision. To Dr. Waterman: why was the
choice made to use OP for "varying width" instructions rather than have
distinct OP-{32,64,128} for all XLEN? (This was, AFAIK, part of the PhD
thesis that became RISC-V.)

-- Jacob

Andrew Waterman

Mar 8, 2018, 1:53:58 AM

to Jacob Bachmeyer, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

We're writing up an explanation of this design decision as part of
revisions to the commentary, and we will send it out when we're done.

Separately, and this is just my opinion, I don't find this to be a
particularly pressing issue. My reasons include the following, some
of which have already been pointed out in this thread:

- For embedded systems, it's hard to see why running RV32 binaries on
RV64 systems is compelling. For these systems, you can nearly always
just recompile the code. Furthermore, the memory map and other
platform details tend to be baked into the binary, and these will
differ between RV32 and RV64 systems; the instruction encoding is the
least of one's concerns. Finally, if you know you want to deploy RV32
software in a power constrained system, you should just use an RV32
core.

- For Unixy systems, the processors are already sufficiently complex
that supporting UXL is a blip on the complexity radar. (And note, the
hardware cost is very low. It may be a pain to implement, but it
takes few gates.)

- For Unixy systems, there is no legacy software base for RISC-V
(yet), so it is not even clear this will be a concern. And there
won't be a legacy RV32 software base for some time, because RV32 isn't
even supported in upstream Linux/glibc!

To this point, consider also that there are AArch64 server processors
incapable of natively executing ARMv7 programs. AFAIK this has not
proven problematic, since there isn't a long history of ARMv7 Linux
servers.

- We will eventually provide an x32-style ABI for applications that
want 4-byte pointers on RV64. I'm well aware that some people don't
like the complexity of this approach, but it does address the main
technical reason to deliberately run RV32 code on RV64 systems.

>
>
> -- Jacob

Richard Herveille

Mar 8, 2018, 4:06:32 AM

to Andrew Waterman, Jacob Bachmeyer, Alex Elsayed, isa...@groups.riscv.org, Richard Herveille

> - For embedded systems, it's hard to see why running RV32 binaries on
> RV64 systems is compelling. For these systems, you can nearly always
> just recompile the code. Furthermore, the memory map and other
> platform details tend to be baked into the binary, and these will
> differ between RV32 and RV64 systems; the instruction encoding is the
> least of one's concerns. Finally, if you know you want to deploy RV32
> software in a power constrained system, you should just use an RV32
> core.

[rih] This is simply not true.

RISC-V is a young architecture and there isn’t much legacy software out there at the moment, but there will be.

In the embedded world it is very common to buy libraries from third parties.

If a user, at some point in the future, wants to upgrade from an RV32 to an RV64 CPU, then the incompatibility makes this impossible.

> - For Unixy systems, the processors are already sufficiently complex
> that supporting UXL is a blip on the complexity radar. (And note, the
> hardware cost is very low. It may be a pain to implement, but it
> takes few gates.)

[rih] This argument holds no value in my opinion.

This argument was about embedded systems. Everybody keeps talking about Unix, but in the embedded world there are many other systems.
UXL (and the like) is a kludge to make RV32 code work on an RV64 system.

If RV64 natively ran RV32 code, none of this would be necessary in the first place. Hence even fewer gates and no pain to implement.

> To this point, consider also that there are AArch64 server processors
> incapable of natively executing ARMv7 programs. AFAIK this has not
> proven problematic, since there isn't a long history of ARMv7 Linux
> servers.

[rih] You’re comparing apples and pears.

RISC-V is supposed to be unique in that the architecture supports small microcontroller type implementations all the way up to mega servers.

If that’s not the case, then we need to split.

Richard Herveille

Mar 8, 2018, 4:17:36 AM

to jcb6...@gmail.com, Alex Elsayed, isa...@groups.riscv.org, Andrew Waterman, Richard Herveille

Richard Herveille wrote:

Implementing the UXL field adds complexity, which is bad for embedded
CPUs.

How much complexity does UXL add, given that RV64 is already supported?

[rih] Well,

1. there’s the registers with their encoding in CSR
2. muxes for illegal-opcode detection
3. muxes for the actual instruction execution (i.e. the ALU block)

I don’t have an exact number and it won’t be a major contributor. But when every gate counts, this just seems a waste while it could have been avoided.

Also, embedded systems are not running "just any" software, but a
unified and known firmware image, so build-time translation (or simply
"always compile for the actual processor you are targeting") is not
unreasonable.

[rih] Yes and no. This same argument is used over and over.

My counter-argument is 3rd party libraries. Paying (sometimes a lot of money) for a library and not being able to use it when upgrading the processor (from RV32 to RV64) seems wrong and limits the use case for RISC-V.

Besides, there are systems without HSU mode that might benefit from
running RV32 code natively.

While slightly trickier, for this there is the MXL field in misa.

[rih] Isn’t that just a fudge?!

IMO MXL, UXL are just kludges to fix the issue. Running RV32 natively on RV64 voids all of this.

Not being able to execute RV32I(MA) natively is a mistake.

While I agree that it looks like a poor choice, I suspect that some

other constraint drove that decision

[rih] I am pretty sure there are. But we’re trying to move the RISC-V architecture from the academic world into the commercial world. Different decisions matter.

Richard Herveille

Mar 8, 2018, 4:30:57 AM

to Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

Why would that be? Wouldn't you just run RV32 apps in RV32 mode on Linux

[rih] There are other systems besides Unix.

like you currently can run 32-bit apps on a 64-bit processor. I fail to understand

what the big deal is here and when this would ever be useful.

[rih] RISC-V is different in this aspect than other 64bit processors.

For x86 the 64bit opcodes are an extension of the 32bit CPU (which is an extension of the 16bit 8086).

MIPS32 code can run on MIPS64 and behaves the same as it would on an MIPS32-CPU.

There are issues when calling MIPS32 code from a MIPS64 CPU, because the 32bit code only saves the lower part of the registers.

This is what I hoped would be fixed by RISC-V, but instead it went the exact opposite way.

Load/Store instructions specify the width, meaning RV32 uses LW to load a word and RV64 uses LD to load a double, whereas ADD behaves differently on RV32 and RV64.

Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And specifying the width for ALU operations (i.e. ADDW for RV32) would have ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.
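
To make the ADD difference concrete, here is a minimal Python sketch. It is not from the thread: the helper names (`sext32_to_64`, `add_rv32`, `add_rv64`, `addw_rv64`) are made up for illustration, and registers are modelled as plain unsigned integers.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sext32_to_64(value):
    """Sign-extend the low 32 bits of value into a 64-bit register image."""
    value &= MASK32
    return value | (MASK64 ^ MASK32) if value & 0x80000000 else value

def add_rv32(a, b):
    """ADD on RV32: the result wraps at 32 bits."""
    return (a + b) & MASK32

def add_rv64(a, b):
    """ADD on RV64: the result wraps at 64 bits."""
    return (a + b) & MASK64

def addw_rv64(a, b):
    """ADDW on RV64: add, keep the low 32 bits, sign-extend to 64 bits."""
    return sext32_to_64(a + b)

a, b = 0x7FFFFFFF, 1
rv32_result = add_rv32(a, b)   # 0x80000000 (negative as a 32-bit signed value)
rv64_add = add_rv64(a, b)      # 0x0000000080000000 (positive as a 64-bit value)
rv64_addw = addw_rv64(a, b)    # 0xFFFFFFFF80000000 (negative, matches RV32)
```

The low 32 bits agree in all three cases; what differs is the upper half, which is what a later signed comparison or a 64-bit store would observe.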

Cesar Eduardo Barros

Mar 8, 2018, 6:30:00 AM

to Richard Herveille, Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev

Em 08-03-2018 06:30, Richard Herveille escreveu:
> There are issues when calling MIPS32 code from a MIPS64 CPU, because the
> 32bit code only save the lower part of the registers.
>
> This is what I hoped would be fixed by RISC-V, but instead it went the
> exact opposite way.
>
> Load/Store use specify the width, meaning RV32 uses LW to load a word.
> And RV64 uses LD to load a double. Whereas ADD has a different behavior
> on RV32 and RV64.
>
> Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits
> would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And
> specifying the width for ALU operations (i.e. ADDW for RV32) would have
> ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.

I apologize if I'm missing something obvious, but how would a load/store
of XLEN bits help in the scenario where 64-bit code calls 32-bit code?

Suppose the 64-bit code has something in register s0, and 32-bit code
wants to save that value. If XLEN is 32 bits, it needs a 4-byte save
area; if XLEN is 64 bits, it needs an 8-byte save area. If the 32-bit
code always allocates a 4-byte save area, a XLEN-sized store will
overflow it when run on a 64-bit processor; if the 32-bit code always
allocates an 8-byte save area, it will be wasting space when run on a
32-bit processor (and 32-bit processors usually have less memory).

Also, what about pointers? The 32-bit code might receive a pointer from
the 64-bit code; to work well for that case, the 32-bit code would need
to use 64-bit pointers on every data structure, once again wasting space
in a 32-bit processor.

The more I think about it, the less the "mixed 32-bit and 64-bit code in
the same process" scenario makes sense. In a separate process (or
process-like entity), it makes more sense: the process/task switch code
is responsible for saving and restoring the registers, and normally
pointers aren't shared between processes.

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Richard Herveille

Mar 8, 2018, 9:12:11 AM

to Cesar Eduardo Barros, Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

Em 08-03-2018 06:30, Richard Herveille escreveu:

There are issues when calling MIPS32 code from a MIPS64 CPU, because the
32bit code only save the lower part of the registers.
This is what I hoped would be fixed by RISC-V, but instead it went the
exact opposite way.
Load/Store use specify the width, meaning RV32 uses LW to load a word.
And RV64 uses LD to load a double. Whereas ADD has a different behavior
on RV32 and RV64.
Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits
would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And
specifying the width for ALU operations (i.e. ADDW for RV32) would have
ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.

I apologize if I'm missing something obvious, but how would a load/store

of XLEN bits help in the scenario where 64-bit code calls 32-bit code?

Suppose the 64-bit code has something in register s0, and 32-bit code

wants to save that value. If XLEN is 32 bits, it needs a 4-byte save

area; if XLEN is 64 bits, it needs an 8-byte save area. If the 32-bit

code always allocates a 4-byte save area, a XLEN-sized store will

overflow it when run on a 64-bit processor; if the 32-bit code always

allocates an 8-byte save area, it will be wasting space when run a

32-bit processor (and 32-bit processors usually have less memory).

[rih] XLEN is determined by the CPU, not the program code.

When passing values from 64bit code to 32bit code, the value can only be 32bits large of course. Also the value must be located in the lower 32bits of the register.

That’s not where an XLEN bits load/store helps. It helps when saving registers during the pre-/post-amble.

Suppose S0 contains some value that must be restored after the function call. XLEN=64bits, so a STORE would write all 64bits to memory, since the CPU is an RV64. The 32bit program only uses the 32LSBs of S0. After the call completes, S0 is restored using a LOAD. All 64bits are loaded from memory, thus restoring the original value.

When the 32bit function is called on an RV32, 32bits are stored and loaded, again restoring the original value.

Also, what about pointers? The 32-bit code might receive a pointer from

the 64-bit code; to work well for that case, the 32-bit code would need

to use 64-bit pointers on every data structure, once again wasting space

in a 32-bit processor.

[rih] No, 32bit code can only address 32bit space. So an RV64 CPU calling 32bit code must ensure the pointers fit in the ±2GB address space.

Richard

Alex Elsayed

Mar 8, 2018, 10:58:39 AM

to RISC-V ISA Dev

On Wednesday, March 7, 2018 10:53:35 PM PST Andrew Waterman wrote:

<snip>

> - For Unixy systems, there is no legacy software base for RISC-V
> (yet), so it is not even clear this will be a concern. And there
> won't be a legacy RV32 software base for some time, because RV32 isn't
> even supported in upstream Linux/glibc!

<snip>

Yes, but the same argument applies to RV64/RV128.

Just as history has shown that address space expansion converges on a flat
address space with twice as many pointer bits (and the spec notes this, using
it as justification for RV128 existing at all), it has also shown that legacy
software _does_ rapidly become a differentiating factor, albeit not equally in
all spaces. IA64 learned this the hard way.

Companies and users buy proprietary software, and companies that _make_
proprietary software are not-infrequently outlived by their users.

An architecture that resulted in RV32 programs running unmodified on RV64
would almost certainly also allow both to run unmodified on RV128, where the
problem of legacy software will have had plenty of time to grow unchecked.

Alex Elsayed

Mar 8, 2018, 10:58:43 AM

to RISC-V ISA Dev

On Thursday, March 8, 2018 6:12:04 AM PST Richard Herveille wrote:

> Cesar Eduardo Barros wrote:
>> Em 08-03-2018 06:30, Richard Herveille escreveu:
>>
>>> There are issues when calling MIPS32 code from a MIPS64 CPU, because the
>>> 32bit code only save the lower part of the registers.
>>> This is what I hoped would be fixed by RISC-V, but instead it went the
>>> exact opposite way.
>>>
>>> Load/Store use specify the width, meaning RV32 uses LW to load a word.
>>> And RV64 uses LD to load a double. Whereas ADD has a different behavior
>>> on RV32 and RV64.
>>>
>>> Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits
>>> would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And
>>> specifying the width for ALU operations (i.e. ADDW for RV32) would have
>>> ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.
>>
>> I apologize if I'm missing something obvious, but how would a load/store
>> of XLEN bits help in the scenario where 64-bit code calls 32-bit code?
>>
>> Suppose the 64-bit code has something in register s0, and 32-bit code
>> wants to save that value. If XLEN is 32 bits, it needs a 4-byte save
>> area; if XLEN is 64 bits, it needs an 8-byte save area. If the 32-bit
>> code always allocates a 4-byte save area, a XLEN-sized store will
>> overflow it when run on a 64-bit processor; if the 32-bit code always
>> allocates an 8-byte save area, it will be wasting space when run a
>> 32-bit processor (and 32-bit processors usually have less memory).
>

> XLEN is determined by the CPU, not the program code.
>
> When passing values from 64bit code to 32bit code, the value can only be
> 32bits large of course. Also the value must be located in the lower 32bits
> of the register.
>
> That’s not where an XLEN bits load/store helps. It helps when saving
> registers during the pre-/post-amble.
>
> Support S0 contains some value that must be restored after the function
> call. XLEN=64bits, so a STORE would write all 64bits to memory, since the
> CPU is an RV64. The 32bit program only uses the 32LSBs of S0. After the
> call completes, S0 is restored using a LOAD. All 64bits are loaded from
> memory, thus restoring the original value.
>
> When the 32bit function is called on an RV32, 32bits are stored and loaded,
> again restoring the original value.

This doesn't work, though, because the RV32 code's ABI would not know about
the wider registers, and thus would only reserve 32-bit _stack slots_. The
wider stores then clobber adjacent stack slots, and everything goes badly
wrong. This design cannot work as written - unknown registers would have to
always be assumed to be as wide as the widest RISC-V variant _in existence_,
i.e. RV128, when reserving stack slots for spilling them.
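
The clobbering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `store_xlen` and `load_xlen` stand in for the hypothetical XLEN-wide STORE and LOAD from the proposal, memory is a little-endian byte array, and the slot offsets (12 for ra, 8 for s0) are chosen for the example as a frame laid out by code compiled for RV32.

```python
def store_xlen(memory, address, value, xlen):
    """Hypothetical XLEN-wide STORE: write xlen // 8 bytes, little-endian."""
    nbytes = xlen // 8
    memory[address:address + nbytes] = (value & ((1 << xlen) - 1)).to_bytes(nbytes, "little")

def load_xlen(memory, address, xlen):
    """Hypothetical XLEN-wide LOAD: read xlen // 8 bytes, little-endian."""
    nbytes = xlen // 8
    return int.from_bytes(memory[address:address + nbytes], "little")

def save_and_restore(xlen, ra, s0):
    """Spill ra and s0 into adjacent 4-byte slots, then reload both."""
    memory = bytearray(32)
    RA_SLOT, S0_SLOT = 12, 8   # 4 bytes apart: the layout 32-bit code reserves
    store_xlen(memory, RA_SLOT, ra, xlen)
    store_xlen(memory, S0_SLOT, s0, xlen)   # with xlen=64 this also overwrites bytes 12..15
    return load_xlen(memory, RA_SLOT, xlen), load_xlen(memory, S0_SLOT, xlen)

on_rv32 = save_and_restore(32, 0x10000, 0xCAFEF00D)
on_rv64 = save_and_restore(64, 0x10000, 0xDEADBEEFCAFEF00D)
# on_rv32 == (0x10000, 0xCAFEF00D): both values survive
# on_rv64 == (0xDEADBEEF, 0xDEADBEEFCAFEF00D): ra now holds the upper half of s0
```

With 4-byte slots the same instruction sequence round-trips correctly at XLEN=32 but returns a corrupted return address at XLEN=64.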

>> Also, what about pointers? The 32-bit code might receive a pointer from
>> the 64-bit code; to work well for that case, the 32-bit code would need
>> to use 64-bit pointers on every data structure, once again wasting space
>> in a 32-bit processor.
>

> No, 32bit code can only address 32bit space. So an RV64 CPU calling
> 32bit code must ensure the pointers fit in the ±2GB address space.

Except that the userspace RV64 code doing the calling _does not control this_
- this is controlled by `malloc`, which itself usually boils down to `mmap`,
which has _zero_ awareness of the code's intent to _eventually_ pass the
pointer to RV32 code.

In addition, Richard, your nonstandard quoting style is _intensely_
problematic. It is difficult to read (hard to tell where your comments end and
a subsequent piece you are responding to begins), nonstandard (and thus
requires much more conscious parsing, and is unsupported by mail clients'
display logic), and worst of all _does not list the name of the person you are
responding to_.

Please, please use a more standard quoting style.

Richard Herveille

Mar 8, 2018, 11:12:33 AM

to Alex Elsayed, RISC-V ISA Dev, Richard Herveille

From: Alex Elsayed <etern...@gmail.com>
Date: Thursday, 8 March 2018 at 16:58
To: RISC-V ISA Dev <isa...@groups.riscv.org>

Subject: Re: [isa-dev] Proposal to remove the need for binary translation for RV32G to RV64G

On Thursday, March 8, 2018 6:12:04 AM PST Richard Herveille wrote:

That sounds like a reasonable argument.

However since ABIs are software I am sure we can come up with a way to handle this without wasting loads of stack space.

Assuming now that RV128 is the final variant is shortsighted too.

Also, what about pointers? The 32-bit code might receive a pointer from
the 64-bit code; to work well for that case, the 32-bit code would need
to use 64-bit pointers on every data structure, once again wasting space
in a 32-bit processor.
No, 32bit code can only address 32bit space. So an RV64 CPU calling
32bit code must ensure the pointers fit in the ±2GB address space.

Except that the userspace RV64 code doing the calling _does not control this_

- this is controlled by `malloc`, which itself usually boils down to `mmap`,

which has _zero_ awareness of the code's intent to _eventually_ pass the

pointer to RV32 code.

Well I was initially referring to embedded code, where one has some stricter control over what goes where.

And again, there are other systems besides Linux.

It is not unknown for 32bit code to run on 64bit machines (other CPUs support this). I am sure they solved this particular issue.

In addition, Richard, your nonstandard quoting style is _intensely_

problematic. It is difficult to read (hard to tell where your comments end and

a subsequent piece you are responding to begins), nonstandard (and thus

requires much more conscious parsing, and is unsupported by mail clients'

display logic), and worst of all _does not list the name of the person you are

responding to_.

Sorry about that. I am fighting my email client. Too many accounts with all different requirements.

Is this better??

Richard

Alex Elsayed

Mar 8, 2018, 2:43:20 PM

to RISC-V ISA Dev

The problem here is that code needs to know how much stack space a spilled
register will take up at _compile_ time. There are thus three options:

1. Fit the stack space to the target ISA (i.e. RV32). RV64 code calling RV32
code must be responsible for spilling any 64-bit values before the call. This
avoids wasting stack space.
2. Fit the stack space to the worst-case ISA (i.e. RV128 currently). RV64 code
can safely call RV32 code, which stores full registers blindly. This wastes
stack space.
3. Use relocations to patch the stack slot size at load, in many places, all
over the library.

Currently, RISC-V uses (1). It'd be _plausible_ to use (2), but would be a
significant stack-size hit on any machine except RV128. (3) would be very
burdensome on the loader, and would also absolutely destroy any chance of
sharing memory between processes for library code, since it'd always be
modified.
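
A rough sense of the stack-size hit for option (2) versus option (1) can be had with simple arithmetic. The sketch below is an illustration only: `spill_area_bytes` is a made-up helper, it ignores stack alignment and locals, and the count of 13 registers assumes a function that saves ra plus all of s0 through s11.

```python
def spill_area_bytes(num_saved_registers, slot_bits):
    """Bytes needed to spill the given number of registers at a given slot width."""
    return num_saved_registers * slot_bits // 8

SAVED_REGISTERS = 13   # assumption: ra plus s0..s11, the worst case for one frame

option_1 = spill_area_bytes(SAVED_REGISTERS, 32)    # slots sized for RV32: 52 bytes
option_2 = spill_area_bytes(SAVED_REGISTERS, 128)   # slots sized for RV128: 208 bytes
```

Under these assumptions the spill area grows fourfold on an RV32 machine, and it would grow again if a wider base ISA were ever defined.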

>>>> Also, what about pointers? The 32-bit code might receive a pointer from
>>>> the 64-bit code; to work well for that case, the 32-bit code would need
>>>> to use 64-bit pointers on every data structure, once again wasting space
>>>> in a 32-bit processor.
>>>
>>> No, 32bit code can only address 32bit space. So an RV64 CPU calling
>>> 32bit code must ensure the pointers fit in the ±2GB address space.
>>
>> Except that the userspace RV64 code doing the calling _does not control
>> this_ - this is controlled by `malloc`, which itself usually boils down to
>> `mmap`, which has _zero_ awareness of the code's intent to _eventually_
>> pass the pointer to RV32 code.
>
> Well I was initially referring to embedded code, where one has some stricter
> control over what goes where.
> And again, there are other systems besides Linux.
> It is not unknown for 32bit code to run on 64bit machines (other CPUs
> support this). I am sure they solved this particular issue.

Yes, it's not unknown for 32-bit code to run on 64-bit machines. You are
proposing something _completely_ different, though, which is for 32-bit code
to run inside of 64-bit _processes_. Basically nothing does this, and the
problems I describe are a large part of why.

Another large part of why is that the syscall ABI almost always differs
between the two, and the kernel knows which to use based on the process'
architecture. However, if a 32-bit library is loaded into a 64-bit program,
and both make syscalls, this becomes impossible.

What you are describing is unrealistic, and while there are ways to make it
work, they are overwhelmingly unlikely to be worthwhile. In the vanishingly
few cases where a 64-bit program _does_ have reason to call into 32-bit code,
treating all 64-bit registers as caller-saved is a pretty small price to pay.

>> In addition, Richard, your nonstandard quoting style is _intensely_
>> problematic. It is difficult to read (hard to tell where your comments end
>> and a subsequent piece you are responding to begins), nonstandard (and thus
>> requires much more conscious parsing, and is unsupported by mail clients'
>> display logic), and worst of all _does not list the name of the person you
>> are responding to_.
>>

>> Please, please use a more standard quoting style.
>

> Sorry about that. I am fighting my email client. Too many accounts with all
> different requirements.
> Is this better??

Somewhat; your emails are more legible now, but they don't format properly
when replied to because the plain-text part is malformed (among other issues).

Inspecting the headers, you seem to use Outlook Web Access - OWA is known to
have serious quoting issues (especially on mailing lists); using absolutely
any _desktop_ client to send email avoids the issue. For example, in Outlook,
this guide describes how to enable "internet-style quoting":
https://www.slipstick.com/outlook/email/to-use-internet-style-quoting/

Other clients, such as Thunderbird, KMail, or Apple Mail, do so by default.

Guy Lemieux

Mar 8, 2018, 4:25:03 PM

to RISC-V ISA Dev

What about power consumption?

A CPU that is 64b or 128b capable will likely toggle the upper 32b/96b
of the data path needlessly when running 32b code, unless special
measures are taken in the microarchitecture.

Running only 32b processes to save power is a nice thought, but unrealistic.

On many 64b systems, you can't even compile 32b code (missing 32b
libraries or compiler). In the future, if there are savings to be had,
I can imagine structuring a program so most of the code runs in 32b
mode, with possibly some data access portions using 64b pointers. This
would save both power and memory (64b pointers are twice as big; 128b
pointers are far worse).

Can software jump in/out of 32b/64b mode on the fly to save power? Is
that a reasonable thing to expect? Or should we allow 32b/64b
instructions to freely intermingle at a fine grain? Or should the
microarchitecture somehow figure out which mode to use on the fly,
erring on the side of caution?

Guy

Jacob Bachmeyer

Mar 8, 2018, 7:44:23 PM

to Alex Elsayed, RISC-V ISA Dev

Alex Elsayed wrote:
> On Wednesday, March 7, 2018 10:53:35 PM PST Andrew Waterman wrote:
>
>> - For Unixy systems, there is no legacy software base for RISC-V
>> (yet), so it is not even clear this will be a concern. And there
>> won't be a legacy RV32 software base for some time, because RV32 isn't
>> even supported in upstream Linux/glibc!
>>

> Yes, but the same argument applies to RV64/RV128.
>
> Just as history has shown that address space expansion converges on a flat
> address space with twice as many pointer bits (and the spec notes this, using
> it as justification for RV128 existing at all), it has also shown that legacy
> software _does_ rapidly become a differentiating factor, albeit not equally in
> all spaces. IA64 learned this the hard way.
>
> Companies and users buy proprietary software, and companies that _make_
> proprietary software are not-infrequently outlived by their users.
>
> An architecture that resulted in RV32 programs running unmodified on RV64
> would almost certainly also allow both to run unmodified on RV128, where the
> problem of legacy software will have had plenty of time to grow unchecked.
>

The UXL field solves this for general purpose systems. User-space
programs can be run with the base ISA that they expect, and all you need
are full sets of dynamic libraries (i.e. disk space to store the extra
libraries, which is cheap and (still) getting cheaper) for each base ISA
your CPU can run.

-- Jacob

Jacob Bachmeyer

Mar 8, 2018, 9:48:58 PM

to Richard Herveille, Alex Elsayed, isa...@groups.riscv.org, Andrew Waterman

Richard Herveille wrote: [edited into Internet quoting style]

>
> Jacob Bachmeyer wrote:
>
>> Richard Herveille wrote:
>>
>>> Implementing the UXL field adds complexity, which is bad for
>>> embedded CPUs.
>>>
>> How much complexity does UXL add, given that RV64 is already supported?
>>
>>
>>

> Well,
>
> 1. there’s the registers with their encoding in CSR
> 2. muxes for illegal-opcode detection
> 3. muxes for the actual instruction execution (i.e. the ALU block)

>
> I don’t have an exact number and it won’t be a major contributor. But
> when every gate counts, this just seems a waste while it could have
> been avoided.
>

Item 1 is probably the most involved. Item 2 is easily handled by
producing an "illegal opcode in RV32 mode" signal and masking that while
in RV64 mode. Similarly, item 3 is easily handled by (internally)
making OP-32 fully orthogonal to OP and simply mapping OP to OP-32 while
in RV32 mode. The only MUXes needed are those that distinguish
execution of ADD/ADDW and the other similar pairs in RV64 and those are
needed anyway to implement RV64.
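
One way to read items 2 and 3 is sketched below in Python. It is an interpretation, not a description of any real core: the function names are hypothetical, only the base integer encodings are handled (M-extension encodings with funct7 = 0000001 are left untouched), and other RV64-only instructions such as LD/SD are ignored. The opcode values are the standard major opcodes, where OP and OP-32 (and OP-IMM and OP-IMM-32) differ only in inst[3].

```python
OP_IMM    = 0b0010011
OP_IMM_32 = 0b0011011
OP        = 0b0110011
OP_32     = 0b0111011

# funct3 values with a W counterpart: ADD/SUB (000), SLL (001), SRL/SRA (101)
W_FUNCT3 = {0b000, 0b001, 0b101}
BASE_FUNCT7 = {0b0000000, 0b0100000}

def illegal_w_opcode(inst, rv32_mode):
    """Item 2: an 'illegal in RV32 mode' signal that is masked in RV64 mode."""
    illegal_in_rv32 = (inst & 0x7F) in (OP_32, OP_IMM_32)
    return illegal_in_rv32 and rv32_mode

def effective_opcode(inst, rv32_mode):
    """Item 3: in RV32 mode, steer OP/OP-IMM arithmetic to the 32-bit datapath."""
    opcode = inst & 0x7F
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    if not rv32_mode or funct3 not in W_FUNCT3:
        return opcode
    if opcode == OP_IMM or (opcode == OP and funct7 in BASE_FUNCT7):
        return opcode | 0b0001000   # set inst[3]: OP -> OP-32, OP-IMM -> OP-IMM-32
    return opcode

ADD_X1_X2_X3 = 0x003100B3    # add  x1, x2, x3
ADDW_X1_X2_X3 = 0x003100BB   # addw x1, x2, x3
AND_X1_X2_X3 = 0x003170B3    # and  x1, x2, x3
ADDI_X1_X2_5 = 0x00510093    # addi x1, x2, 5
```

In this reading, `effective_opcode(ADD_X1_X2_X3, True)` yields OP-32, while AND is passed through unchanged because it gives the same low 32 bits on sign-extended operands either way.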

If every gate counts, why would you be using RV64? Surely RV32 would
be adequate? I would expect that hardware sufficient to need RV64 would
be far more complex than the incremental cost of implementing UXL.

>> Also, embedded systems are not running "just any" software, but a
>>
>> unified and known firmware image, so build-time translation (or simply
>>
>> "always compile for the actual processor you are targeting") is not
>>
>> unreasonable.
>>
>
>

> Yes and no. This same argument is used over and over.
>

> My counter-argument is 3^rd party libraries. Paying (sometimes a lot

> of money) for a library and not being able to use it when upgrading
> the processor (from RV32 to RV64) seems wrong and limits the use case
> for RISC-V.
>

In an embedded environment, why would you "upgrade" from RV32 to RV64?
Presumably there was some reason to use RV32 in the first place.

I think that the Stallman crowd may actually be right here -- these
kinds of problems are simply an inherent cost of using someone else's
proprietary software. The correct answer is to include such costs in
your budget estimates when you license a 3rd-party library.

>>> Besides, there are systems without HSU mode that might benefit from
>>> running RV32 code natively.
>>
>>
>> While slightly trickier, for this there is the MXL field in misa.
>>
>>
>>

> Isn’t that just a fudge?!
>
> IMO MXL, UXL are just kludges to fix the issue. Running RV32 natively
> on RV64 voids all of this.
>

Except that running RV32 code natively on RV64 is not actually possible,
since the register widths are different, which means that registers
spilled onto the stack take up different amounts of space on the stack.
The only real solution to that problem is hardware stack support, which
RISC-V explicitly eschews -- the stack is a software structure in RISC-V
and there are no PUSH and POP opcodes. Hardware stack operations are
omitted because they inhibit instruction-level parallelism.

>>> Not being able to execute RV32I(MA) natively is a mistake.
>>
>>
>> While I agree that it looks like a poor choice, I suspect that some
>>
>> other constraint drove that decision
>>
>>
>>

> I am pretty sure there are. But we’re trying to move the RISC-V
> architecture from the academic world into the commercial world.
> Different decisions matter.
>

If we want to talk about commercial practicalities, MXL/SXL/UXL are
essentially the RISC-V equivalent to how x86 handles multiple ISA
widths: put a field in a register somewhere that defines the current
ISA width. (The x86 architecture uses the hidden segment register for
the code segment to store this value, ever since the 80386 needed to
support both 16-bit and 32-bit code. AMD64 uses a previously-reserved
bit in the segment descriptor to indicate 64-bit code segments.)

-- Jacob

Richard Herveille

Mar 9, 2018, 12:23:40 AM

to Alex Elsayed, RISC-V ISA Dev, Richard Herveille

On 08/03/2018, 20:43, "Alex Elsayed" <etern...@gmail.com> wrote:

What you are describing is unrealistic, and while there are ways to make it

work, they are overwhelmingly unlikely to be worthwhile. In the vanishingly

few cases where a 64-bit program _does_ have reason to call into 32-bit code,

treating all 64-bit registers as caller-saved is a pretty small price to pay.

Ok. From a HW point of view it sounded easy (easier). But the SW side makes this impractical. I guess we can put this to rest. I learned a lot here!

Anyways, the initial thread was about running RV32 programs natively in RV64. Let’s focus the discussion on that argument again.

Please, please use a more standard quoting style.

Sorry about that. I am fighting my email client. Too many accounts with all
different requirements.
Is this better??

Somewhat; your emails are more legible now, but they don't format properly

when replied to because the plain-text part is malformed (among other issues).

Inspecting the headers, you seem to use Outlook Web Access - OWA is known to

have serious quoting issues (especially on mailing lists); using absolutely

any _desktop_ client to send email avoids the issue. For example, in Outlook,

this guide describes how to enable "internet-style quoting": https://www.slipstick.com/outlook/email/to-use-internet-style-quoting/

Other clients, such as Thunderbird, KMail, or Apple Mail, do so by default.

Thanks for the pointer. Unfortunately I am forced to use Outlook for Mac. And guess what …. It doesn’t support a prefix.

The indent is the best I can do. I wish I could still use iMail, but each time M$ updates their API either iMail or iCal gets stuck ☹

Thanks,

Alex Elsayed

Mar 9, 2018, 12:42:36 AM

to RISC-V ISA Dev

On Thursday, March 8, 2018 9:23:34 PM PST Richard Herveille wrote:
> On 08/03/2018, 20:43, "Alex Elsayed" <etern...@gmail.com> wrote:

<snip>

>> Somewhat; your emails are more legible now, but they don't format properly
>> when replied to because the plain-text part is malformed (among other
>> issues).
>>
>> Inspecting the headers, you seem to use Outlook Web Access - OWA is known
>> to have serious quoting issues (especially on mailing lists); using
>> absolutely any _desktop_ client to send email avoids the issue. For
>> example, in Outlook,
>>
>> this guide describes how to enable "internet-style quoting":
>> https://www.slipstick.com/outlook/email/to-use-internet-style-quoting/
>>
>> Other clients, such as Thunderbird, KMail, or Apple Mail, do so by default.
>
> Thanks for the pointer. Unfortunately I am forced to use Outlook for Mac.
> And guess what …. It doesn’t support a prefix.
>
> The indent is the best I can do. I wish I could still use iMail, but each
> time M$ updates their API either iMail or iCal gets stuck ☹

<snip>

That's very odd; the link I gave includes instructions for Outlook for Mac
2016. Did a subsequent version remove the settings it describes?

Namely, Preferences -> Composing -> Replies and Forwards -> Indent each line
of the original message

Richard Herveille

Mar 9, 2018, 12:46:00 AM

to Alex Elsayed, RISC-V ISA Dev, Richard Herveille

of the original message

That option is still there and that is what I enabled. This email is what it ends up looking like …

Richard

Rogier Brussee

Mar 9, 2018, 12:05:19 PM

to RISC-V ISA Dev

Op donderdag 8 maart 2018 20:43:20 UTC+1 schreef Alex Elsayed:

On Thursday, March 8, 2018 8:12:25 AM PST Richard Herveille wrote:
> From: Alex Elsayed <etern...@gmail.com>
> Date: Thursday, 8 March 2018 at 16:58
> To: RISC-V ISA Dev <isa...@groups.riscv.org>
> Subject: Re: [isa-dev] Proposal to remove the need for binary translation
> for RV32G to RV64G
>

[snip]

The problem here is that code needs to know how much stack space a spilled
register will take up at _compile_ time. There are thus three options:

1. Fit the stack space to the target ISA (i.e. RV32). RV64 code calling RV32
code must be responsible for spilling any 64-bit values before the call. This
avoids wasting stack space.
2. Fit the stack space to the worst-case ISA (i.e. RV128 currently). RV64 code
can safely call RV32 code, which stores full registers blindly. This wastes
stack space.
3. Use relocations to patch the stack slot size at load, in many places, all
over the library.

4. Introduce instructions that allow efficiently dealing with units of XLEN bits e.g. with xadd / xaddi instructions that do something like

xadd rd rs1 rs2 : rd <- rs1 + rs2 << log2(XLEN/8)

xaddi rd rs1 imm12 : rd <- rs1 + sext(imm12, XLEN) << log2(XLEN/8)

and use that in bitsize portable libs.
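
A small Python model of these two proposed instructions, as a sketch only. It assumes the shift binds tighter than the addition, i.e. rd <- rs1 + (operand << log2(XLEN/8)), which is one reading of the notation above; `xadd` and `xaddi` here are ordinary functions taking XLEN as a parameter, not real instructions.

```python
def xadd(rs1, rs2, xlen):
    """Model of the proposed xadd: rs1 plus rs2 scaled by the register size in bytes."""
    shift = (xlen // 8).bit_length() - 1   # log2(XLEN/8): 2, 3, 4 for XLEN 32, 64, 128
    return (rs1 + (rs2 << shift)) & ((1 << xlen) - 1)

def xaddi(rs1, imm12, xlen):
    """Model of the proposed xaddi: rs1 plus a sign-extended 12-bit immediate, scaled."""
    imm = imm12 & 0xFFF
    if imm & 0x800:            # sign-extend the 12-bit immediate
        imm -= 0x1000
    shift = (xlen // 8).bit_length() - 1
    return (rs1 + (imm << shift)) & ((1 << xlen) - 1)

sp = 0x1000
slot3_rv32 = xaddi(sp, 3, 32)     # 0x100C: third register-sized slot above sp
slot3_rv64 = xaddi(sp, 3, 64)     # 0x1018
slot3_rv128 = xaddi(sp, 3, 128)   # 0x1030
frame_rv64 = xaddi(sp, -2, 64)    # 0x0FF0: sp lowered by two register-sized slots
```

The same immediate therefore addresses "slot 3" on every base ISA, which is what would let one binary size its spill area by the XLEN of the machine it runs on.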

lkcl .

Mar 9, 2018, 1:02:38 PM

to Richard Herveille, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev

On Mon, Mar 5, 2018 at 9:04 AM, Richard Herveille <richard.herveille@roalogic.com> wrote:

Not being able to execute RV32 code natively in RV64 (and RV128 for that matter) is a big drawback.
I’ve voiced my concerns about this from the beginning. I am mostly concerned about legacy software that will emerge.

the debian ia32 multiarch port was specifically added because, despite the limitation of only being able to address 32 bit memory spaces, using 32-bit x86 instructions on a *64-bit* x86 system was found to give massive memory usage reductions (30%) as well as modest efficiency gains:

https://wiki.debian.org/X32Port

that page shows clearly how to create such a strange (working) hybrid, including detection of X32

the problem (possibly addressing tommy's concern) will be if there *is* no discernible difference between RV32 and RV64 instructions that a run-time execution engine (RISC-V core) can detect. i.e. if the exact same assembly-code instruction is utilised, how the heck at any given clock-cycle can the CPU tell if it is to treat the operands as 32-bit or whether to treat them as 64-bit [or 128]?

what i think you are saying, tommy, is that RV32 apps would be executed in an [emulated] RV32 mode on an RV64 [or RV128] processor.

what specifically distinguishes that scenario from the one that i believe richard would like to see (as outlined in the debian x32 port) is that RV32 instructions are **MIXED IN** with RV64 instructions **DIRECTLY IN THE SAME EXECUTABLE**.

for that to work, you would either need to ensure that [identical assembly-level binary-codes for] RV32 instructions may be easily distinguished from RV64 instructions... *or* that registers are specifically tagged as being either 32-bit [or 64-bit... or 128-bit] and thus *imply* that the operations should be directed to a 32-bit ALU as opposed to a 64-bit one.

both of these, i suspect, would be *massive* disruptive architectural changes and require a heck of a lot of work to analyse. still, a 30% benefit (code size reduction) if it could also be achieved with RISC-V as it has been with x86, that would be... enormous. the only thing is: x86 is so different from RISC-V (much larger numbers of registers being one of them) that it's not really possible to say that the gains from x86 will definitely be replicated.

are RV32 instructions more efficiently packed than RV64 ones, particularly on explicit operands (by value)? does Compression mean that the same program when compiled for RV64 is as efficiently stored as the exact same RV32 one? such things strike me as being important to know.

lkcl .

Mar 9, 2018, 1:06:29 PM

to jcb6...@gmail.com, Alex Elsayed, RISC-V ISA Dev

On Wed, Mar 7, 2018 at 3:34 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Alex Elsayed wrote:
>>
>> I don't think that would be anywhere near as beneficial. The benefits of
>> an ILP32 ABI along those lines on x86 (larger register file, new
>> instructions, etc) are much larger than on RISC-V (larger registers, all
>> else unchanged). In addition, the main benefit of the proposed change is
>> _not_ performance: it's compatibility.
>
>
> For compatibility, we already have the UXL field in sstatus, which allows a
> supervisor to configure an RV64-capable processor to act as an RV32
> processor in U-mode.

that's not what richard is describing. that's running native RV32
instructions effectively in a sandboxed (virtual) environment.

> Why are we concerned about running RV32 code on RV64 when we have the option
> for RV64 processors to directly support RV32, including RV32C?

richard is talking about *mixing* RV32 instructions *with* RV64 ones
*in an RV64 executable* because certain RV32 instructions may turn out
to be more efficient memory-wise and code-size-wise than their RV64
equivalent [if the debian x32 port is anything to go by, that could
well be the case].

l.

lkcl .

Mar 9, 2018, 1:26:26 PM

to Andrew Waterman, Jacob Bachmeyer, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

On Thu, Mar 8, 2018 at 6:53 AM, Andrew Waterman
<wate...@eecs.berkeley.edu> wrote:
> On Wed, Mar 7, 2018 at 7:47 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> - For embedded systems, it's hard to see why running RV32 binaries on
> RV64 systems is compelling. For these systems, you can nearly always
> just recompile the code.

the lines between what constitutes an "embedded" system and one which
is capable of running at least a cut-down [off-the-shelf /
distro-maintained] GNU/Linux OS is becoming very blurred. Ingenic's
M150 processor had 128mb of DDR2 RAM on-board: their X1000 processor
now has only 32mb, it's used in mass-produced smart-watches in china
and it runs android and debian-mips32 extremely well. that's pretty
much exactly the amount of RAM that all the handhelds.org systems had,
10 years ago, and you could perfectly well run x11r6, qt3 and a full
PDA-tuned GUI on them (angstrom linux, opie / familiar).

with 300mbyte/sec 32mbyte HyperBus memory ICs coming online (5x5mm
and around $2.50 at the moment) that distinction is only going to become
even greyer.

what *that* means is that there's a strong possibility of
off-the-shelf binary GNU/Linux distros coming online for both RV32 and
RV64 and being used (to save on software development time and costs)
in situations where people indeed would formerly have "rolled their
own", and for those situations it's really _not_ okay to say to people
that they can and should recompile everything (richard jones explained
why, in the case of fedora-riscv, a few months ago, and the same logic
applies to e.g. debian).

only by the time you are forced to go with 16mbyte or even 8mbyte do
you have to use openwrt and derivatives, which are so heavily
resource-constrained you *have* to compile from source.

> - We will eventually provide an x32-style ABI for applications that
> want 4-byte pointers on RV64. I'm well aware that some people don't
> like the complexity of this approach, but it does address the main
> technical reason to deliberately run RV32 code on RV64 systems.

ah! that's the one. yay. good to see you've a handle on this,
andrew :) question: are there any *architectural* changes needed to
RV64 implementations to support x32-style ABIs?

secondly: are the proposed x32-style ABIs *exactly* what richard is
talking about? as in, are you familiar with the *full* internal
workings of the debian x32 port? i ask because i'm not sure if the
debian x32 port *only* does x32-style ABIs, i believe it actually
mixes 32-bit arithmetic into 64-bit code (and saves code-space as a
result). honestly, though, i'm not sure: you may have a much better
understanding of this.

l.

lkcl .

Mar 9, 2018, 1:31:50 PM

to Richard Herveille, Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev

On Thu, Mar 8, 2018 at 9:30 AM, Richard Herveille <richard....@roalogic.com> wrote:

There are issues when calling MIPS32 code from a MIPS64 CPU, because the 32bit code only saves the lower part of the registers.
This is what I hoped would be fixed by RISC-V, but instead it went the exact opposite way.
Load/Store instructions specify the width, meaning RV32 uses LW to load a word and RV64 uses LD to load a double, whereas ADD has a different behavior on RV32 and RV64.
Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits would have solved the ‘calling 32bit code on a 64bit CPU’ issue.

that does seem to be an extremely odd... "lack of symmetry" is the best phrase i could use. why does ADD have different behaviour? what direct benefits, if any, are brought about by having such a discrepancy?

And specifying the width for ALU operations (i.e. ADDW for RV32) would have ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.

is there room to do so within the available space of the RISC-V binary-code? are there any down-sides to such a proposal?

l.

Michael Clark

Mar 9, 2018, 2:12:51 PM

to lkcl ., Andrew Waterman, Jacob Bachmeyer, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

x32 is a 32-bit pointer ABI using x86_64 in long-mode with SSE. It requires a special kernel ABI because, unlike the regular 32-bit compatibility ABI that is supported in 64-bit Linux kernels, the x32 process is running in long-mode, not compatibility mode [1]. Also they were able to remove some cruft from the older 32-bit ABIs. It has a benefit over x86_64 in SPECint due to the smaller pointer sizes, even though both have the same number of registers. x32 has native support for 64-bit math, unlike i686. It’s just the pointer size, and hence the per-process virtual address space, that is limited.

For RISC-V an X32 style ABI would be a feature of an RV64 kernel and the full RV64 instruction set would be available, including 64-bit scalar math, however LW/SW would be used for pointers instead of LD/SD and of course all of the syscalls would take 32-bit pointers. For RISC-V the kernel ABI would be very similar to the RV32 ABI however the ISA would be RV64 and UXL/SXL would be set to RV64. It’s likely the RV32 calling convention would be used but the instruction set would be RV64. I don’t know what the arch would be called in RISC-V land given we have riscv32 and riscv64.
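
To illustrate why smaller pointers can matter, here is a rough Python sketch that computes the size of a pointer-heavy structure under 8-byte and 4-byte pointers. The node layout and the natural-alignment rule are assumptions made for the example, not taken from any RISC-V ABI document.

```python
def struct_size(fields):
    """Size of a C-style struct whose fields are given as byte sizes.

    Each field is aligned to its own size and the total is padded to the
    largest alignment (the usual natural-alignment rule).
    """
    offset = 0
    max_align = 1
    for size in fields:
        offset = (offset + size - 1) // size * size  # align this field
        offset += size
        max_align = max(max_align, size)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


def list_node(pointer_bytes):
    """Hypothetical node: struct { node *next; node *prev; int32_t key; }."""
    return [pointer_bytes, pointer_bytes, 4]


lp64 = struct_size(list_node(8))   # 8-byte pointers
ilp32 = struct_size(list_node(4))  # 4-byte pointers (x32-style ABI)
print(lp64, ilp32)  # 24 12
```

For this particular node the footprint halves; how much a real program gains depends on how much of its data is pointers.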

In fact it would need GCC changes to support -march=rv64gc -mabi=ilp32

[1] https://lwn.net/Articles/456731/

Palmer Dabbelt

Mar 9, 2018, 3:46:57 PM

to michae...@mac.com, luke.l...@gmail.com, wate...@eecs.berkeley.edu, jcb6...@gmail.com, richard....@roalogic.com, etern...@gmail.com, isa...@groups.riscv.org

This will be called rv64gc-ilp32d (or rv64gc-ilp32 or rv64imac-ilp32, depending
on what you want). It'll fall directly into the pattern for our other ISA/ABI
pairs, and would be natural to add to the default multilib target set as it's a
useful pair.

I don't know much about Debian, but this would be a very natural fit for
multilib-based distributions. Last I heard the plan for Debian was to
eventually support multilib inside multiarch for RISC-V, so this would all fall
into place. Of course, the decision of which multilib targets to support would
be up to distros.

As Michael identified here, there's a lot of work to be done. Most of the work
here is in the GCC port, but there's some work in the glibc and Linux ports as
well that will be necessary to bring up the new ABI.

lkcl .

Mar 9, 2018, 10:30:35 PM

to Michael Clark, Andrew Waterman, Jacob Bachmeyer, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

On Fri, Mar 9, 2018 at 7:12 PM, Michael Clark <michae...@mac.com> wrote:

> For RISC-V an X32 style ABI would be a feature of an RV64 kernel and the
> full RV64 instruction set would be available, including 64-bit scalar math,
> however LW/SW would be used for pointers instead of LD/SD and of course all
> of the syscalls would take 32-bit pointers.

ok very cool, you clearly have an extremely in-depth understanding of
this. so i notice now that x32 isn't what i originally thought it
was: i thought x32 was not just about having 32-bit pointer
instructions and ABI calls within RV64 executables, i thought it also
had part of richard's idea as well, actual RV32 *ALU* operations (or
the x86 equivalent) mixed in with RV64 ALU operations (or the x86_64
equivalent).

yes i would expect gcc to have to be modified. looks like glibc
would need to be modified as well, if this article is anything to go
by, luckily that work's been done already [on x86 at least].
https://en.wikipedia.org/wiki/X32_ABI

all of which is extremely fascinating and very valuable... but even
more interestingly still doesn't answer richard's original question :)
why is there no symmetry and discernment between RV32 and RV64
instructions such that RV32 *Arithmetic* Operations (not pointer
operations) may easily be called and used within an RV64 binary? and
are there any actual benefits (or down-sides) for doing so?

l.

Jacob Bachmeyer

Mar 10, 2018, 12:34:31 AM

to lkcl ., Andrew Waterman, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

lkcl . wrote:
> what *that* means is that there's a strong possibility of
> off-the-shelf binary GNU/Linux distros coming online for both RV32 and
> RV64 and being used (to save on software development time and costs)
> in situations where people indeed would formerly have "rolled their
> own", and for those situations it's really _not_ okay to say to people
> that they can and should recompile everything (richard jones explained
> why, in the case of fedora-riscv, a few months ago, and the same logic
> applies to e.g. debian).
>

In this case, only the custom application code (if any) must be
recompiled: for the distro, simply switch from the RV32 port to the
RV64 port or vice versa. But if Linux is involved, the situation is
exactly what UXL was intended to solve and RV32 user programs should
work on either processor.

Only when Linux is *not* in the picture is there even a problem here.

-- Jacob

Jacob Bachmeyer

Mar 10, 2018, 12:49:34 AM

to lkcl ., Alex Elsayed, RISC-V ISA Dev

lkcl . wrote:
> On Wed, Mar 7, 2018 at 3:34 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Alex Elsayed wrote:
>>
>>> I don't think that would be anywhere near as beneficial. The benefits of
>>> an ILP32 ABI along those lines on x86 (larger register file, new
>>> instructions, etc) are much larger than on RISC-V (larger registers, all
>>> else unchanged). In addition, the main benefit of the proposed change is
>>> _not_ performance: it's compatibility.
>>>
>> For compatibility, we already have the UXL field in sstatus, which allows a
>> supervisor to configure an RV64-capable processor to act as an RV32
>> processor in U-mode.
>>
>
> that's not what richard is describing. that's running native RV32
> instructions effectively in a sandboxed (virtual) environment.
>

*All* applications run in a sandboxed (virtual) environment -- that is
what U-mode *is*. UXL simply enables an RV64 supervisor to offer both
RV64 and RV32 U-mode sandboxes.

>> Why are we concerned about running RV32 code on RV64 when we have the option
>> for RV64 processors to directly support RV32, including RV32C?
>>
>
> richard is talking about *mixing* RV32 instructions *with* RV64 ones
> *in an RV64 executable* because certain RV32 instructions may turn out
> to be more efficient memory-wise and code-size-wise than their RV64
> equivalent [if the debian x32 port is anything to go by, that could
> well be the case].
>

Since RISC-V uses fixed-length instructions, I strongly doubt that there
would ever be a code size improvement, although RVC appears to
complicate matters, but mixing RV32C and RV64C is impossible -- the
processor has no way to determine if a particular RVC instruction should
be decoded as RV32C or RV64C other than UXL.
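
A small Python sketch of that ambiguity, covering only two encoding slots of the RVC opcode map: the slot that holds C.JAL on RV32 holds C.ADDIW on RV64, and the slot that holds C.FLW on RV32 holds C.LD on RV64. The function name and the "other" fallback are invented for the example; a real decoder handles far more cases.

```python
def decode_rvc_name(halfword, xlen):
    """Name a 16-bit RVC encoding for two slots whose meaning depends on XLEN.

    Everything outside those two slots is reported as "other".
    """
    quadrant = halfword & 0b11          # inst[1:0]
    funct3 = (halfword >> 13) & 0b111   # inst[15:13]
    if quadrant == 0b01 and funct3 == 0b001:
        return "c.jal" if xlen == 32 else "c.addiw"
    if quadrant == 0b00 and funct3 == 0b011:
        return "c.flw" if xlen == 32 else "c.ld"
    return "other"


# The same 16 bits decode differently depending on the current XLEN.
print(decode_rvc_name(0x2505, 32))  # c.jal
print(decode_rvc_name(0x2505, 64))  # c.addiw
```

Nothing inside the 16-bit word says which reading is intended, so the decoder has to take XLEN from state outside the instruction.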

The x32 ABI is simply x86-64 with a 4GiB address space and therefore
32-bit pointers. In RISC-V this will create a small problem due to
sign-extension, but that can be addressed by mapping the upper 2GiB at
0xFFFFFFFF'80000000 in the 64-bit page tables and possibly aliasing it
at 0x80000000. This of course will play havoc with systems that want to
use the sign bit to distinguish user/supervisor addresses, but that just
requires better supervisor programming or limiting RISC-V "x32"
processes to 2GiB of address space as if by setrlimit(2) using RLIMIT_AS.
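
The sign-extension issue can be shown with a few lines of Python. The sketch assumes 32-bit pointers are loaded with LW, which sign-extends into the 64-bit register; the function name is made up for illustration.

```python
def lw_result(word):
    """Value an RV64 register holds after LW loads the 32-bit `word`."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:            # bit 31 set: upper 32 bits become ones
        word |= 0xFFFFFFFF00000000
    return word


for ptr in (0x7FFFF000, 0x80000000, 0xFFFFF000):
    print(f"{ptr:#010x} -> {lw_result(ptr):#018x}")
# 0x7ffff000 -> 0x000000007ffff000
# 0x80000000 -> 0xffffffff80000000
# 0xfffff000 -> 0xfffffffffffff000
```

Pointers in the lower 2GiB are unchanged, while pointers in the upper 2GiB end up at the top of the 64-bit address space, which is why the upper half would have to be mapped at 0xFFFFFFFF'80000000.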

-- Jacob

lkcl .

Mar 10, 2018, 2:58:53 AM

to Jacob Bachmeyer, Andrew Waterman, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

On Sat, Mar 10, 2018 at 5:34 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> In this case, only the custom application code (if any) must be recompiled:
> for the distro, simply switch from the RV32 port to the RV64 port or vice
> versa. But if Linux is involved, the situation is exactly what UXL was
> intended to solve and RV32 user programs should work on either processor.

not quite, jacob: you've misunderstood. the idea here is not to run
pure RV32 user programs but to run hybrid mixed executables that have
*both* RV64 *and* RV32 instructions in them. later in the thread
you'll see that michael describes one of those scenarios where the
intention is (at a later date) to add the equivalent of x32 ABI
calling conventions to RV64 glibc6 and gcc, allowing RV64 to call RV32
ABI.

hmm... michael, am i right in thinking, that would mean that during
the call the UXL would need to switch down to RV32 just for the 32-bit
call and then switch back again after the RV32 function call
finished... is that right?

... but even there, jacob, that's *not* what richard is referring to:
he is effectively referring to actually being able to utilise RV32
Arithmetic Logic assembly instructions *in* an RV64 binary *without*
requiring a UXL switch from RV64 to RV32. is that clearer as to what
he's asking? (and richard, did i get it right?)

l.

lkcl .

Mar 10, 2018, 3:26:15 AM

to Jacob Bachmeyer, Alex Elsayed, RISC-V ISA Dev

On Sat, Mar 10, 2018 at 5:49 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> *All* applications run in a sandboxed (virtual) environment -- that is what
> U-mode *is*. UXL simply enables an RV64 supervisor to offer both RV64 and
> RV32 U-mode sandboxes.

ok. so it's that global "tag" switch concept i thought might exist.
but unfortunately it's not in the actual instruction(s) themselves,
it's in a register / state.

>> richard is talking about *mixing* RV32 instructions *with* RV64 ones
>> *in an RV64 executable* because certain RV32 instructions may turn out
>> to be more efficient memory-wise and code-size-wise than their RV64
>> equivalent [if the debian x32 port is anything to go by, that could
>> well be the case].
>>
>
>
> Since RISC-V uses fixed-length instructions, I strongly doubt that there
> would ever be a code size improvement,

yehyeh. under x86 where the continuous usage of escape-sequencing
has left (for the most part) older instructions being more compact
than newer ones, intuitively one would expect the use of ILP32 to be
more compact.

> although RVC appears to complicate matters,

or perhaps it would mean that

> but mixing RV32C and RV64C is impossible -- the processor has no
> way to determine if a particular RVC instruction should be decoded as RV32C
> or RV64C other than UXL.

exactly, and that i think is richard's point. he's asking *why* that
is the case, when there's space in the instruction set *to* properly
identify and clarify RV32 instructions as being clearly RV32 [where it
matters]. he outlines which ones would make that possible, here:

https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/E0ED201E-5066-4FB3-B859-8EBEAD0DB117%40roalogic.com?utm_medium=email&utm_source=footer

(richard, are those the _only_ ones?)

> The x32 ABI is simply x86-64 with a 4GiB address space and therefore 32-bit
> pointers. In RISC-V this will create a small problem due to sign-extension,
> but that can be addressed by mapping the upper 2GiB at 0xFFFFFFFF'80000000
> in the 64-bit page tables and possibly aliasing it at 0x80000000. This of
> course will play havoc with systems that want to use the sign bit to
> distinguish user/supervisor addresses, but that just requires better
> supervisor programming or limiting RISC-V "x32" processes to 2GiB of address
> space as if by setrlimit(2) using RLIMIT_AS.

... yuck :) that sounds like a windows-esque kludge! windows did
that: subdivided the top/bottom 2GB/2GB so that COM applications had
somewhere to put fixed-address / global (shared) memory, and you could
then switch from caller to callee and still refer to the same data
structures across a COM (DCE/RPC) remote procedure call boundary. bit
drastic but it worked.

fast-forward 10/20/30 years and with applications like firefox failing
to even compile / link in 7 GB of resident RAM i don't feel halving
the memory space would fly :)

this does seem to be getting quite complicated, quite quickly!

l.

lkcl .

Mar 10, 2018, 3:44:09 AM

to Richard Herveille, Alex Elsayed, RISC-V ISA Dev

On Fri, Mar 9, 2018 at 5:23 AM, Richard Herveille <richard....@roalogic.com> wrote:

Thanks for the pointer. Unfortunately I am forced to use Outlook for Mac. And guess what …. It doesn’t support a prefix.
The indent is the best I can do. I wish I could still use iMail, but each time M$ updates their API either iMail or iCal gets stuck ☹

richard, for this specific mailing list you could consider utilising the google groups web-interface online to form replies. it does however require logging in to google's servers to do so, which may not be possible for you.

l.

Richard Herveille

Mar 10, 2018, 8:01:31 AM

to lkcl ., Jacob Bachmeyer, Andrew Waterman, Alex Elsayed, isa...@groups.riscv.org, Richard Herveille

Richard Herveille

Managing Director

Phone +31 (45) 405 5681

Cell +31 (6) 5207 2230

richard....@roalogic.com

From experience I’ve seen customers struggle with (1) legacy (3rd party) code that was not portable and could not be upgraded, since the 3rd party went out of business, (2) existing software targeted to a specific CPU that could not be migrated and hence required multiple CPUs in the same SoC, thereby increasing the NRE by millions. Both these issues can potentially be addressed by RISC-V. From the discussions on this thread I realize there’s more to it than just the hardware, but it’s worthwhile discussing.

The biggest pain point is the arithmetic instructions, because they can overflow. This causes different behavior for 32-, 64-, and 128-bit registers. Therefore the instruction set defines (for example) ADD, ADDW, and ADDD; RV32 has ADD, RV64 has ADDW and ADD, and RV128 has ADDW, ADDD, and ADD. The compiler issues ADD when an XLEN-wide operation is required.

But if the encoding (or compiler usage) was slightly different we could have ADDW, ADDD, and ADDQ, where ADDW would always be used for 32bit operations, ADDD for 64bit and ADDQ for 128bit. In that case the code’s behavior would always be the same, independent of whether the code runs on RV32, RV64, or RV128.
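
The overflow difference can be made concrete with a short Python model. It is only a sketch: registers are modelled as unsigned XLEN-bit integers, and the function names mirror the instruction mnemonics for readability.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def add(a, b, xlen):
    """ADD: XLEN-bit wrapping add, returned as an unsigned XLEN-bit value."""
    return (a + b) & ((1 << xlen) - 1)


def addw(a, b, xlen):
    """ADDW on RV64/RV128: add the low 32 bits, sign-extend the result to XLEN."""
    return sext((a + b) & 0xFFFFFFFF, 32) & ((1 << xlen) - 1)


a, b = 0x7FFFFFFF, 1
rv32_add = add(a, b, 32)    # 0x80000000: wraps to the most negative 32-bit value
rv64_add = add(a, b, 64)    # 0x0000000080000000: no wrap, a positive number
rv64_addw = addw(a, b, 64)  # 0xFFFFFFFF80000000: same low 32 bits as on RV32
print(hex(rv32_add), hex(rv64_add), hex(rv64_addw))
```

The same ADD encoding gives a negative result on RV32 and a positive one on RV64, whereas ADDW on RV64 reproduces the RV32 result in sign-extended form.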

Richard

l.

lkcl .

Mar 10, 2018, 8:51:10 AM

to Richard Herveille, Jacob Bachmeyer, Andrew Waterman, Alex Elsayed, isa...@groups.riscv.org

On Sat, Mar 10, 2018 at 1:01 PM, Richard Herveille
<richard....@roalogic.com> wrote:

> From experience I’ve seen customers struggle with (1) legacy (3rd party)
> code that was not portable and could not be upgraded, since the 3rd party
> went out of business,

literally the only instance i know where code that was written 20+
years ago, the company went out of business, and the executables are
*still useable today*, is COM / Active-X Components for microsoft
windows. this is utterly amazing and down to the run-time
self-describing capability of COM - nothing to do with x86 at all.

and that's the point: it's unnnbelievably rare for 3rd party
proprietary (binary-only) software to be useable even a few years into
the future.

> But if the encoding (or compiler usage) was slightly different we could have
> ADDW, ADDD, and ADDQ. Where ADDW would always be used for 32bit
> operations, ADDD for 64bit and ADDQ for 128bit.

makes sense to me. puzzles the heck out of me when things aren't
symmetrical / balanced. and, i assume it would make ADD redundant,
and thus free up an instruction binary-code slot for alternate or
future use. ADDH. 16-bit add. something like that.

> In that case the code’s behavior would always be the same, independent of
> whether the code runs on RV32, RV64, or RV128.

ok so the question would be: if the envisioned use-case you raise is
*not* x32 ABI interoperability (or RV64/32 equivalent of), but is more
"pure proprietary RV32 executables that could become legacy
applications very very quickly", and you're going to have to upgrade
from an RV32 to an RV64 processor, *presumably* at that point the RV64
processor is far more complex anyway, it would surprise me if having
the UXL field was not a mandatory requirement: surely switching to
RV32 with the UXL field would do the job, right?

l.

Jacob Bachmeyer

Mar 10, 2018, 11:29:59 PM

to lkcl ., Andrew Waterman, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

lkcl . wrote:
> On Sat, Mar 10, 2018 at 5:34 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> In this case, only the custom application code (if any) must be recompiled:
>> for the distro, simply switch from the RV32 port to the RV64 port or vice
>> versa. But if Linux is involved, the situation is exactly what UXL was
>> intended to solve and RV32 user programs should work on either processor.
>>
>
> not quite, jacob: you've misunderstood. the idea here is not to run
> pure RV32 user programs but to run hybrid mixed executables that have
> *both* RV64 *and* RV32 instructions in them. later in the thread
> you'll see that michael describes one of those scenarios where the
> intention is (at a later date) to add the equivalent of x32 ABI
> calling conventions to RV64 glibc6 and gcc, allowing RV64 to call RV32
> ABI.
>

Running hybrid mixed executables like this is something not officially
supported on any x86-64 operating system I know about. It is possible
(via certain hacks involving the x86 FAR CALL instruction), but
definitely *not* supported.

> hmm... michael, am i right in thinking, that would mean that during
> the call the UXL would need to switch down to RV32 just for the 32-bit
> call and then switch back again after the RV32 function call
> finished... is that right?
>

No, this is incorrect: x32 is x86-64, in 64-bit mode, with the address
space artificially limited to 32-bits so that 32-bit pointers can be
used. In a RISC-V equivalent, an "RVx32" process would run with
UXL==RV64, in 64-bit mode, but the supervisor would dispatch system
calls from that process to a special handler that knows that pointers
are 32-bit values instead of 64-bit values. On x86-64, long mode has
some major advantages (such as doubling the number of available
registers to 16) that RV64 will not have over RV32. I am uncertain
whether an "RVx32" ABI would have similar advantages to x32 or not.

> ... but even there, jacob, that's *not* what richard is referring to:
> he is effectively referring to actually being able to utilise RV32
> Arithmetic Logic assembly instructions *in* an RV64 binary *without*
> requiring a UXL switch from RV64 to RV32. is that clearer as to what
> he's asking? (and richard, did i get it right?)

Currently, the RV32 ALU instructions are interpreted as 64-bit ALU
instructions in RV64. Changing this would be a major change to the
application ISA spec.
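
For reference, the opening post of this thread notes that the W forms differ from the plain forms only in inst[3] of the major opcode. A minimal Python sketch of the R-type encoding shows this; the helper name is made up, and only ADD/ADDW are encoded.

```python
OP = 0b0110011     # major opcode of ADD, SUB, SLL, SRL, SRA
OP_32 = 0b0111011  # major opcode of ADDW, SUBW, SLLW, SRLW, SRAW


def encode_r(opcode, rd, rs1, rs2, funct3=0, funct7=0):
    """Pack the fields of an R-type instruction into a 32-bit word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


add_a0 = encode_r(OP, rd=10, rs1=10, rs2=11)      # add  a0, a0, a1
addw_a0 = encode_r(OP_32, rd=10, rs1=10, rs2=11)  # addw a0, a0, a1
print(hex(add_a0), hex(addw_a0), hex(add_a0 ^ addw_a0))  # 0xb50533 0xb5053b 0x8
```

On RV64 the first word performs a 64-bit add and the second a 32-bit add; making the first behave as a 32-bit add is the spec change being discussed.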

-- Jacob

Jacob Bachmeyer

Mar 10, 2018, 11:42:29 PM

to lkcl ., Alex Elsayed, RISC-V ISA Dev

lkcl . wrote:
> On Sat, Mar 10, 2018 at 5:49 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> *All* applications run in a sandboxed (virtual) environment -- that is what
>> U-mode *is*. UXL simply enables an RV64 supervisor to offer both RV64 and
>> RV32 U-mode sandboxes.
>>
>
> ok. so it's that global "tag" switch concept i thought might exist.
> but unfortunately it's not in the actual instruction(s) themselves,
> it's in a register / state.
>

Embedding a width in every instruction would cost an additional two bits
of encoding space, so that is a non-starter at this point. Having a
"current ISA width" in a control register somewhere is more-or-less what
other ISAs do and does not seem to cause significant problems.

>>> richard is talking about *mixing* RV32 instructions *with* RV64 ones
>>> *in an RV64 executable* because certain RV32 instructions may turn out
>>> to be more efficient memory-wise and code-size-wise than their RV64
>>> equivalent [if the debian x32 port is anything to go by, that could
>>> well be the case].
>>>
>> Since RISC-V uses fixed-length instructions, I strongly doubt that there
>> would ever be a code size improvement,
>>
>
> yehyeh. under x86 where the continuous usage of escape-sequencing
> has left (for the most part) older instructions being more compact
> than newer ones, intuitively one would expect the use of ILP32 to be
> more compact.
>

This is exactly the difference from x86 in RISC-V: *all* RVI
instructions are 32 bits long, regardless of XLEN.

>> although RVC appears to complicate matters,
>>
>
> or perhaps it would mean that
>
>
>> but mixing RV32C and RV64C is impossible -- the processor has no
>> way to determine if a particular RVC instruction should be decoded as RV32C
>> or RV64C other than UXL.
>>
>
> exactly, and that i think is richard's point. he's asking *why* that
> is the case, when there's space in the instruction set *to* properly
> identify and clarify RV32 instructions as being clearly RV32 [where it
> matters]. he outlines which ones would make that possible, here:
>
> https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/E0ED201E-5066-4FB3-B859-8EBEAD0DB117%40roalogic.com?utm_medium=email&utm_source=footer
>
> (richard, are those the _only_ ones?)
>

There is no such space in RVC, and RVC is the only place where the
available instructions change non-trivially as XLEN changes.
(Obviously, 64-bit ALU ops are not available when XLEN is 32.)

>> The x32 ABI is simply x86-64 with a 4GiB address space and therefore 32-bit
>> pointers. In RISC-V this will create a small problem due to sign-extension,
>> but that can be addressed by mapping the upper 2GiB at 0xFFFFFFFF'80000000
>> in the 64-bit page tables and possibly aliasing it at 0x80000000. This of
>> course will play havoc with systems that want to use the sign bit to
>> distinguish user/supervisor addresses, but that just requires better
>> supervisor programming or limiting RISC-V "x32" processes to 2GiB of address
>> space as if by setrlimit(2) using RLIMIT_AS.
>>
>
> ... yuck :) that sounds like a windows-esque kludge! windows did
> that: subdivided the top/bottom 2GB/2GB so that COM applications had
> somewhere to put fixed-address / global (shared) memory, and you could
> then switch from caller to callee and still refer to the same data
> structures across a COM (DCE/RPC) remote procedure call boundary. bit
> drastic but it worked.
>
> fast-forward 10/20/30 years and with applications like firefox failing
> to even compile / link in 7 GB of resident RAM i don't feel halving
> the memory space would fly :)
>
> this does seem to be getting quite complicated, quite quickly!
>

Which is probably why no one seems to have pursued this earlier. It is
a bit of a rabbit hole. :-)

-- Jacob

lkcl .

Mar 10, 2018, 11:58:47 PM

to Jacob Bachmeyer, Andrew Waterman, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org

On Sun, Mar 11, 2018 at 4:29 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> Running hybrid mixed executables like this is something not officially
> supported on any x86-64 operating system I know about. It is possible (via
> certain hacks involving the x86 FAR CALL instruction), but definitely *not*
> supported.

interesting: i didn't know that.

>> hmm... michael, am i right in thinking, that would mean that during
>> the call the UXL would need to switch down to RV32 just for the 32-bit
>> call and then switch back again after the RV32 function call
>> finished... is that right?
>>
>
>
> No, this is incorrect: x32 is x86-64, in 64-bit mode, with the address
> space artificially limited to 32-bits so that 32-bit pointers can be used.
> In a RISC-V equivalent, an "RVx32" process would run with UXL==RV64, in
> 64-bit mode, but the supervisor would dispatch system calls from that
> process to a special handler that knows that pointers are 32-bit values
> instead of 64-bit values.

ok. so UXL's not involved. cool. thanks for making that clear, jacob.

>> he is effectively referring to actually being able to utilise RV32
>> Arithmetic Logic assembly instructions *in* an RV64 binary *without*
>> requiring a UXL switch from RV64 to RV32. is that clearer as to what
>> he's asking? (and richard, did i get it right?)

> Currently, the RV32 ALU instructions are interpreted as 64-bit ALU
> instructions in RV64. Changing this would be a major change to the
> application ISA spec.

yes it would. so that would be a major down-side (as in, i don't
know what the consensus is but my feeling is, it's getting a bit late
in the game to make major ISA changes). are there any major up-sides
[that are not covered by running an entire RV32 executable in
UXL==RV32 mode]?

... sorry, going into "insight / rueful" mode here - i worked for
aspex semiconductors around 2003. it was the most arcane and
ambitious massively-parallel SIMD processor, and its major feature was
content-addressable memory: 4096 2-bit ALUs with 256 bits of CAM per
processor. certain classes of algorithms could be executed a HUNDRED
times faster than a standard processor, most of them involving
pattern-recognition. it was amazing and arcane and developing
algorithms was measured in DAYS per assembly-level instruction (yes,
really: not instructions per day, but days per instruction).

i worked for this company for six months. in the first three months
i learned their arcane architecture. in the following three months,
pretty much every few weeks i would go, "y'know... for this specific
application which this company has never had customers for, before,
but we've got one now, the architecture could *really* do with an
instruction that allows [for example] the CAM to be addressed
individually by one of the registers on each ALU".

their reply (virtually every time)? "oh we had that in a previous
version of the architecture but nobody used it so we took it out".

ngggggggh! :)

the point of mentioning this story is: despite asking the question
above, just because the answer might be "we, collectively, right now,
cannot think of anything" does *not* mean that someone else in the
future will not think of anything.

l.

lkcl .

Mar 11, 2018, 1:51:14 AM

to Jacob Bachmeyer, Alex Elsayed, RISC-V ISA Dev

On Sun, Mar 11, 2018 at 4:42 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> Embedding a width in every instruction would cost an additional two bits of
> encoding space, so that is a non-starter at this point. Having a "current
> ISA width" in a control register somewhere is more-or-less what other ISAs
> do and does not seem to cause significant problems.

okay. got it. i like that concept [state-information / tagging,
changing the meaning of an instruction]. i came up with an extreme
version of that back in.. mm... 1991? got it down to an 8-bit
instruction width, with compounding state information changing
register banks (16 registers, divided into 4). 2-bits to say if
future instructions are to be an arithmetic, a boolean, a float or a
comparison operation; 2-bits to set the width of future operands
(8,16,32,64); 2-bits to... you get the idea.

>> this does seem to be getting quite complicated, quite quickly!
>>
>
>
> Which is probably why no one seems to have pursued this earlier. It is a
> bit of a rabbit hole. :-)

:)

well, the question's been asked, which is great. i learned from
dealing with questions on the EOMA68 standard, you really really do
have to go through things with innnfinite patience, because it's the
explanations to people who don't quite understand that often allow you
to spot the [legitimate] mistakes.

if nothing else, a way / roadmap to implement x32-ABI seems to have
been thoroughly explored, and (despite the supervisor issues for which
you describe some good workarounds) i'm not seeing anything that's a
genuine show-stopper.

all good :)

l.

Christian Brunschen

Mar 11, 2018, 3:18:50 PM

to lkcl ., Richard Herveille, Jacob Bachmeyer, Andrew Waterman, Alex Elsayed, isa...@groups.riscv.org

On 10 March 2018 at 13:51, lkcl . <luke.l...@gmail.com> wrote:

> literally the only instance i know where code that was written 20+
> years ago, the company went out of business, and the executables are
> *still useable today*, is COM / Active-X Components for microsoft
> windows. this is utterly amazing and down to the run-time
> self-describing capability of COM - nothing to do with x86 at all.

Are you familiar with the IBM 360 series of mainframes?

https://en.wikibooks.org/wiki/360_Assembly

This book is about assembly language programming for the Fujitsu BS2000 Mainframe and the IBM 360, 370, ESA/390, 93xx and z/System family of mainframe computers. The same assembly language was also used on the Univac 90/60, 90/70, and 90/80 mainframes. The machines generally provided upward compatibility for user programs (a program such as a payroll program written for MVS on a 24-bit 360 in the 1960s or early 1970s will run unmodified on 31-bit VS/1 in the 1980s. While a later program (say an accounts payable application) written for 31-bit VS/1 in the 1980s wouldn't run on a 360 if it used 31-bit operations, both programs will run unmodified under z/OS on a 64-bit z/System in 2015.)

A well-designed architecture can survive and be useful for a long time.

> and that's the point: it's unnnbelievably rare for 3rd party
> proprietary (binary-only) software to be useable even a few years into
> the future.

There are many reasons why software may only be available in binary form – sometimes, even whoever initially wrote a piece of software may lose the source code.

Best wishes,

// Christian Brunschen

Kelly Dean

Mar 13, 2018, 7:36:09 AM

to Andrew Waterman, isa...@groups.riscv.org

Andrew Waterman writes:

> On Wed, Mar 7, 2018 at 7:47 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

>> While I agree that it looks like a poor choice, I suspect that some other
>> constraint drove that decision. To Dr. Waterman: why was the choice made
>> to use OP for "varying width" instructions rather than have distinct
>> OP-{32,64,128} for all XLEN? (This was, AFAIK, part of the PhD thesis that
>> became RISC-V.)
>
> We're writing up an explanation of this design decision as part of
> revisions to the commentary, and we will send it out when we're done.

I eagerly await. Also please include an explanation of why the load/store opcodes are fixed while the computation opcodes slide [1].

Thanks!

[1] https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/oXzp1xDFJ-E/2fQulDLJAgAJ
Inconsistency of load/store vs. computation opcode assignments
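
As a rough illustration of the asymmetry being asked about, here is a minimal Python sketch. It reflects one reading of "fixed" versus "slide": a load's access width is encoded in funct3 and does not depend on XLEN, while OP and OP-IMM operate on XLEN bits, so their width changes with the base ISA. The opcode and funct3 values are from the base opcode map; the function names (alu_width, load_width) are hypothetical, and the sketch does not model which opcodes are legal for a given XLEN or the RV128 OP-64 opcodes.

```python
# Major opcodes (inst[6:0]) from the RISC-V base opcode map.
OP_IMM, OP_IMM_32 = 0b0010011, 0b0011011
OP, OP_32 = 0b0110011, 0b0111011
LOAD = 0b0000011

# funct3 values that select the access width of a LOAD.
LOAD_FUNCT3 = {
    "LB": 0b000, "LH": 0b001, "LW": 0b010, "LD": 0b011,
    "LBU": 0b100, "LHU": 0b101, "LWU": 0b110,
}


def alu_width(opcode, xlen):
    """Operand width in bits for a computation opcode (hypothetical helper).

    OP and OP-IMM follow XLEN; OP-32 and OP-IMM-32 are always 32 bits.
    Legality for a given XLEN is deliberately not modeled here.
    """
    if opcode in (OP, OP_IMM):
        return xlen
    if opcode in (OP_32, OP_IMM_32):
        return 32
    raise ValueError("not a computation opcode in this sketch")


def load_width(funct3):
    """Access width in bits for a LOAD; independent of XLEN."""
    return 8 << (funct3 & 0b011)


# The W variants differ from the XLEN-wide ones only in inst[3].
W_BIT = 1 << 3
```

Under these assumptions, `alu_width(OP, 32)` and `alu_width(OP, 64)` differ, while `load_width(LOAD_FUNCT3["LW"])` is 32 regardless of XLEN.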
