Proposal to remove the need for binary translation for RV32G to RV64G


Kelly Dean

Mar 1, 2018, 2:10:15 PM
to RISC-V ISA Dev
It appears that binary translation of 32-bit-dependent RV32I programs to run (correctly) on RV64I processors requires three steps:

Translate the nine instructions ADD[I], SUB, SLL[I], SRL[I], and SRA[I], to their W counterparts. (This just requires flipping one bit (inst[3]) in the major opcode.)

Following «[rdcycle | rdtime | rdinstret] rd», add the instruction:
addiw rd, rd, 0

Then replace «[rdcycleh | rdtimeh | rdinstreth] rd» by the sequence:
[rdcycle | rdtime | rdinstret] rd; srai rd, rd, 32

(Unlike the arithmetic and logic translations, the read-counter translations require expansion of the program.)
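The first step can be sketched in Python as a pure bit manipulation on the 32-bit instruction word. This is only an illustration: the name `to_w_form` is made up here, the input is assumed to be a valid RV32IM instruction, and the funct7 check is an addition so that M-extension encodings sharing the OP major opcode are left alone.

```python
OP_IMM = 0b0010011  # major opcode of ADDI, SLLI, SRLI, SRAI
OP = 0b0110011      # major opcode of ADD, SUB, SLL, SRL, SRA
W_FUNCT3 = {0b000, 0b001, 0b101}      # add/sub, shift left, shift right
BASE_FUNCT7 = {0b0000000, 0b0100000}  # rules out M-extension encodings


def to_w_form(inst: int) -> int:
    """Set inst[3] if inst is one of the nine width-dependent instructions.

    Assumes inst is a valid 32-bit RV32IM instruction word (hypothetical helper).
    """
    opcode = inst & 0x7F
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    if funct3 not in W_FUNCT3:
        return inst
    if opcode == OP_IMM:
        return inst | 0x8  # ADDI, SLLI, SRLI, SRAI -> ADDIW, SLLIW, SRLIW, SRAIW
    if opcode == OP and funct7 in BASE_FUNCT7:
        return inst | 0x8  # ADD, SUB, SLL, SRL, SRA -> ADDW, SUBW, SLLW, SRLW, SRAW
    return inst
```

For example, `add x1, x2, x3` (0x003100B3) becomes `addw x1, x2, x3` (0x003100BB), while XOR and MUL words pass through unchanged.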

Is that all? If so, then why doesn't RV32I allow those W instructions? The advantage would be that no binary translation would be necessary (except for the gratuitous incompatibility of the read-counter instructions).


If it's just a historical accident, then I propose to remove this wart in the following way.

In section 4.2 of riscv-spec-v2.2.pdf:
Remove the sentence “They [the W instructions] cause an illegal instruction exception in RV32I.” and replace it by “They're equivalent to the non-W instructions in RV32I.” This adds no complexity to RV32I decoders; it might even simplify them, since they no longer need to discriminate inst[3] when the major opcode is 0x1x011 and funct3 is 000, 001, or 101.

In table 19.3 (CSRs):
Remove the gratuitous restriction “RV32I only” from cycleh, timeh, and instreth.
Add three new CSRs: cyclel, timel, and instretl, that map to the lower 32 bits of cycle, time, and instret.

In section 2.8:
Add new pseudo-instructions RDCYCLEL, RDTIMEL, and RDINSTRETL, to read the new CSRs. On RV32I, they do the same thing as RDCYCLE, RDTIME, and RDINSTRET.
Specify that when CSRR[S|C][I] reads [cycle | time | instret][l|h] on RV64I and RV128I, it sign-extends (to maintain the invariant described in section 4.2, page 29) to XLEN bits instead of zero-extending. This is backward compatible, because in the current spec these CSRs are either nonexistent or illegal to access in RV64I and RV128I.
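As a rough model of what the proposed CSR reads would return on RV64, the sketch below treats a counter as a 64-bit integer. The function names are invented for illustration, and the behavior of the L and H CSRs is only my reading of the proposal above, not anything in the 2.2 spec.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32(value: int) -> int:
    """Sign-extend the low 32 bits of value to a 64-bit register pattern."""
    value &= MASK32
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value


def read_low(counter: int) -> int:
    """Assumed cyclel/timel/instretl read on RV64: low half, sign-extended."""
    return sext32(counter)


def read_high(counter: int) -> int:
    """Assumed cycleh/timeh/instreth read on RV64: high half, sign-extended."""
    return sext32(counter >> 32)


def srai64(reg: int, shamt: int) -> int:
    """RV64 SRAI applied to a 64-bit register pattern."""
    signed = reg - (1 << 64) if reg >> 63 else reg
    return (signed >> shamt) & MASK64
```

Under this model, `read_high(c)` equals `srai64(c, 32)`, which is the two-instruction sequence from the translation steps, and `read_low(c)` equals the result of `rdcycle` followed by `addiw rd, rd, 0`.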

In section 4.4:
Remove the sentence “Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are not necessary and are illegal in RV64I.”


Then to write RV32I programs that are portable to RV64I processors, just use the W instead of the non-W instructions for algorithms that are intolerant of 64-bit operations, and use the new L-suffixed read-counter instructions instead of the non-suffixed ones. (Continue using the current H-suffixed instructions.)

This is fully backward compatible; RV32I and RV64I programs written to the 2.2 spec run unchanged on processors that implement this revised spec. If it's politically impossible to modify the frozen spec, then just define these new compatibility features to be an optional standard extension.


RV32IM on RV64IM would also work by adding MULH[[S]U]W instructions and using them (along with the current MULW, DIV[U]W, and REM[U]W) instead of their non-W counterparts. And this would give full RV32G on RV64G, since the A, F, and D extensions are already compatible.
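To see why the existing MULH is not a drop-in substitute on RV64, here is a small Python model. `mulhw_rv64` is a guess at what the proposed (currently nonexistent) MULHW would compute; only the signed variant is modeled, and all names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulh_rv32(a: int, b: int) -> int:
    """RV32 MULH: upper 32 bits of the signed 32x32 product."""
    return ((to_signed(a, 32) * to_signed(b, 32)) >> 32) & MASK32


def mulh_rv64(a: int, b: int) -> int:
    """RV64 MULH: upper 64 bits of the signed 64x64 product."""
    return ((to_signed(a, 64) * to_signed(b, 64)) >> 64) & MASK64


def mulhw_rv64(a: int, b: int) -> int:
    """Hypothetical MULHW: upper 32 bits of the 32x32 product, sign-extended."""
    high = (to_signed(a, 32) * to_signed(b, 32)) >> 32
    return to_signed(high, 32) & MASK64
```

With both operands equal to 0x40000000, RV32 MULH yields 0x10000000, RV64 MULH yields 0, and the hypothetical MULHW yields 0x10000000 again.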

RV32IC on RV64IC can't be done without breaking backward compatibility, due to their differing repertoires of compressed instructions, and harmonizing them would necessarily worsen one of them. But at least RV32G on RV64G could be done.

Bruce Hoult

Mar 1, 2018, 2:27:39 PM
to Kelly Dean, RISC-V ISA Dev
I absolutely agree. I've pointed out here several times before that allowing the W instructions as aliases for the non-W instructions in RV32 is not only trivial but, as you say, even simplifies the decoder.

As well as the arithmetic instructions, you also want to allow the LWU (Load Word Unsigned) as an alias for LW in RV32. Again this simply requires ignoring one bit in the instruction (though a different bit).

After that, you'd want to modify the compilers to start actually using those instructions in 32 bit mode -- optionally at first, as existing hardware doesn't accept them, but in a few years it could perhaps become the default.



--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/64Gn11pyCTCo9nARXanQovr2p8xOs08KAodpwjIliUp%40local.

Michael Clark

Mar 1, 2018, 3:28:40 PM
to Kelly Dean, RISC-V ISA Dev


> On 2/03/2018, at 8:09 AM, Kelly Dean <ke...@prtime.org> wrote:
>
> It appears that binary translation of 32-bit-dependent RV32I programs to run (correctly) on RV64I processors requires three steps:
>
> Translate the nine instructions ADD[I], SUB, SLL[I], SRL[I], and SRA[I], to their W counterparts. (This just requires flipping one bit (inst[3]) in the major opcode.)
>
> Following «[rdcycle | rdtime | rdinstret] rd», add the instruction:
> addiw rd, rd, 0

If you need to insert instructions during binary translation, then you need a shadow area containing the translated code, along with special handling for indirect unconditional branches (i.e., JALR) to remap program counters. Static binary translation would only be possible if you didn’t alter the code size, because after linking, most of the relocation information is omitted from the binary. You could do static binary translation for simple statically linked programs that don’t use true indirect branches (e.g., GOT/PLT for shared libraries) if you treat the AUIPC+JALR pair as a direct unconditional branch. If the code contains any lone JALRs, you will need dynamic binary translation with translated code caches and so on (for example, loads from the text segment should return the untranslated code for the translation to be accurate).

It’s possible. I could pretty easily add an RV64 backend to rv8 that handles RV32. It could also be done in QEMU.
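The program-counter remapping can be illustrated with a toy address table in Python. This is a sketch under simplifying assumptions (every original instruction is 4 bytes, no compressed code, and the table covers every possible target); the function names are hypothetical and are not taken from rv8 or QEMU.

```python
def build_pc_map(old_base, new_base, expansion):
    """Map each original instruction address to its translated address.

    expansion[i] is the number of 4-byte instructions emitted for the i-th
    original 4-byte instruction (1 means the instruction kept its size).
    """
    pc_map = {}
    new_pc = new_base
    for index, count in enumerate(expansion):
        pc_map[old_base + 4 * index] = new_pc
        new_pc += 4 * count
    return pc_map


def remap_jalr_target(pc_map, old_target):
    """Translate a run-time JALR target expressed as an original address."""
    return pc_map[old_target]
```

If the second and fourth instructions of a four-instruction block each expand to two instructions, a JALR to the original address of the third instruction must land 4 bytes further into the shadow area than a naive offset would suggest.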

> Then replace «[rdcycleh | rdtimeh | rdinstreth] rd» by the sequence:
> [rdcycle | rdtime | rdinstret] rd; srai rd, rd, 32
>
> (Unlike the arithmetic and logic translations, the read-counter translations require expansion of the program.)
>
> Is that all? If so, then why doesn't RV32I allow those W instructions? The advantage would be that no binary translation would be necessary (except for the gratuitous incompatibility of the read-counter instructions).
>
>
> If it's just a historical accident, then I propose to remove this wart in the following way.
>
> In section 4.2 of riscv-spec-v2.2.pdf:
> Remove the sentence “They [the W instructions] cause an illegal instruction exception in RV32I.” and replace it by “They're equivalent to the non-W instructions in RV32I.” This adds no complexity to RV32I decoders; it might even simplify them, since they no longer need to discriminate inst[3] when the major opcode is 0x1x011 and funct3 is 000, 001, or 101.
>
> In table 19.3 (CSRs):
> Remove the gratuitous restriction “RV32I only” from cycleh, timeh, and instreth.
> Add three new CSRs: cyclel, timel, and instretl, that map to the lower 32 bits of cycle, time, and instret.
>
> In section 2.8:
> Add new pseudo-instructions RDCYCLEL, RDTIMEL, and RDINSTRETL, to read the new CSRs. On RV32I, they do the same thing as RDCYCLE, RDTIME, and RDINSTRET.
> Specify that when CSRR[S|C][I] reads [cycle | time | instret][l|h] on RV64I and RV128I, it sign-extends (to maintain the invariant described in section 4.2, page 29) to XLEN bits instead of zero-extending. This is backward compatible, because in the current spec these CSRs are either nonexistent or illegal to access in RV64I and RV128I.
>
> In section 4.4:
> Remove the sentence “Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are not necessary and are illegal in RV64I.”
>
>
> Then to write RV32I programs that are portable to RV64I processors, just use the W instead of the non-W instructions for algorithms that are intolerant of 64-bit operations, and use the new L-suffixed read-counter instructions instead of the non-suffixed ones. (Continue using the current H-suffixed instructions.)
>
> This is fully backward compatible; RV32I and RV64I programs written to the 2.2 spec run unchanged on processors that implement this revised spec. If it's politically impossible to modify the frozen spec, then just define these new compatibility features to be an optional standard extension.
>
>
> RV32IM on RV64IM would also work by adding MULH[[S]U]W instructions and using them (along with the current MULW, DIV[U]W, and REM[U]W) instead of their non-W counterparts. And this would give full RV32G on RV64G, since the A, F, and D extensions are already compatible.
>
> RV32IC on RV64IC can't be done without breaking backward compatibility, due to their differing repertoires of compressed instructions, and harmonizing them would necessarily worsen one of them. But at least RV32G on RV64G could be done.

I agree with the sentiment but not with the timing. Given there are already many RV32 designs including shipping silicon it just may not be practical.

There would need to be a new RV32X ABI that is somewhat like x32 on x86_64. A 32-bit ABI on RV64 may be practical but RV32 is likely not going to change as the Base ISA is frozen. As you mentioned, the compressed extension is different.

I can see ILP32 for RV64 being a distinct possibility, but that would use 64-bit instructions for fast handling of long long, as x32 does; i.e., the primary benefit is in pointer size for smaller memory systems. x32 supports the 64-bit instructions for 64-bit scalars.

Bruce Hoult

Mar 1, 2018, 4:02:02 PM
to Michael Clark, Kelly Dean, RISC-V ISA Dev
Note that making an RVC binary that runs correctly on both rv32 and rv64 only requires avoiding:

C.FLW, C.FSW, C.FLWSP, C.FSWSP -- single precision FP loads/stores. DP is fine.
C.JAL -- subroutine call. Of limited use in non-embedded programs as it only has a +/- 2KB range.

This would have very little effect on the size savings in most programs -- especially as most don't use FP at all.


Kelly Dean

Mar 1, 2018, 10:10:17 PM
to Bruce Hoult, RISC-V ISA Dev

Bruce Hoult writes:

> As well as the arithmetic instructions, you also want to allow the LWU
> (Load Word Unsigned) as an alias for LW in RV32.

Why?

If address n contains FFFF_FFFF and register x1 contains n, and you do
lw x1, 0(x1)
slt x1, x1, x0

Then you get 1 in x1, on both RV32 and RV64. No binary translation necessary.

But if you use LWU instead of LW, then you'd get 1 on RV32 but 0 on RV64. Adding support for LWU to RV32 would be counterproductive, because code using it would run incorrectly on RV64.
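The claim can be checked with a tiny register-level model in Python. It assumes, as in the suggestion being discussed, that an RV32 processor would execute LWU exactly like LW; the helper names are invented for this sketch.

```python
MASK32 = 0xFFFFFFFF


def load_word(word, xlen, unsigned=False):
    """Register contents after LW (or LWU) of a 32-bit memory word."""
    word &= MASK32
    if xlen == 64 and not unsigned and word & 0x80000000:
        return word | 0xFFFFFFFF00000000  # LW on RV64 sign-extends
    return word  # RV32 (LWU assumed to alias LW), or LWU on RV64 (zero-extends)


def slt(a, b, xlen):
    """SLT: signed less-than on XLEN-bit register patterns."""
    def signed(v):
        return v - (1 << xlen) if v >> (xlen - 1) else v
    return int(signed(a) < signed(b))


for xlen in (32, 64):
    lw_result = slt(load_word(0xFFFFFFFF, xlen), 0, xlen)
    lwu_result = slt(load_word(0xFFFFFFFF, xlen, unsigned=True), 0, xlen)
    print(xlen, lw_result, lwu_result)
```

In this model the LW column is 1 for both register widths, while the LWU column is 1 for XLEN=32 and 0 for XLEN=64.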

Bruce Hoult

Mar 1, 2018, 10:27:29 PM
to Kelly Dean, RISC-V ISA Dev
Well, yes, because there's a bug in your code.

If the FFFF_FFFF at address n is semantically an unsigned value (i.e. 4294967295 not -1) then you need to use...

lwu x1, 0(x1)
sltu x1, x1, x0

... which will produce the same results on RV32 and RV64 if RV32 is modified, as suggested, to accept the lwu opcode and execute it exactly as it executes lw.

Of course no unsigned values are less than 0, so you might want to use something other than x0 to compare against.



Kelly Dean

Mar 1, 2018, 11:26:40 PM
to Michael Clark, RISC-V ISA Dev

Michael Clark writes:

> I agree with the sentiment but not with the timing. Given there are already many RV32 designs including shipping silicon it just may not be practical.

Then as I said, just define it as an optional standard extension. Name it e.g. “Z”. RV32I silicon shipping and the spec being frozen doesn't mean new standard extensions can't be added; e.g. B will be, and RV32IB silicon will ship in the future. As with B, and any other extension, software that doesn't use Z will still run correctly on processors that do implement Z.

Of course, “Z” is a silly name for it. Section 22.4 of the spec v2.2 already explains exactly what the right names are:
“RV32I2p1M2p1”, “RV32G2p1”, “RV64I2p1M2p1”, and “RV64G2p1”.


> There would need to be a new RV32X ABI that is somewhat like x32 on x86_64. A 32-bit ABI on RV64 may be practical

RV64 already has a standard 32-bit ABI: it's the *W instructions.

Samuel Falvo II

Mar 1, 2018, 11:31:45 PM
to Kelly Dean, Michael Clark, RISC-V ISA Dev
On Thu, Mar 1, 2018 at 8:26 PM, Kelly Dean <ke...@prtime.org> wrote:
> Michael Clark writes:
>> I agree with the sentiment but not with the timing. Given there are already many RV32 designs including shipping silicon it just may not be practical.
>
> Then as I said, just define it as an optional standard extension. Name it e.g. “Z”. RV32I silicon shipping and the spec being frozen doesn't mean new standard extensions can't be added; e.g. B will be, and RV32IB silicon will ship in the future. As with B, and any other extension, software that doesn't use Z will still run correctly on processors that do implement Z.

I think this is what, in part, the "X" non-standard extensions were
intended for.

I'll be happy to support this extension for my future-planned revision
to my Kestrel project's processor design. It's not formally
documented, but I already support LDU and *all* ALU instructions have
corresponding -W variants (not just the 4 listed in the standard),
purely because I got lazy with the decoder and just didn't see the
point. ;)

--
Samuel A. Falvo II

Samuel Falvo II

Mar 1, 2018, 11:35:31 PM
to Kelly Dean, Michael Clark, RISC-V ISA Dev
On Thu, Mar 1, 2018 at 8:31 PM, Samuel Falvo II <sam....@gmail.com> wrote:
> I'll be happy to support this extension for my future-planned revision
> to my Kestrel project's processor design. It's not formally

What I meant was to apply this core idea to RV64->RV128 instructions.
My CPU design is already 64-bit. Heh.

Kelly Dean

Mar 2, 2018, 10:43:00 AM
to Bruce Hoult, RISC-V ISA Dev

Bruce Hoult writes:

> Well, yes, because there's a bug in your code

Yes, intentionally; my point is that LWU, itself, in RV32 software would (always) be a latent bug, masked when run on an RV32 processor but potentially exposed when run on an RV64 processor. The bug is not merely the combination of LWU and SLT.

> If the FFFF_FFFF at address n is semantically an unsigned value (i.e.
> 4294967295 not -1) then you need to use...
>
> lwu x1, 0(x1)
> sltu x1, x1, x0
>
> ... which will produce the same results on RV32 and RV64 if RV32 is as
> suggested modified to accept the lwu opcode and execute it exactly the same
> as it executes lw.
>
> Of course no unsigned values are less than 0, so you might want to use
> something other than x0 to compare against.

Using LW with SLTU correctly produces the same results on RV32 and RV64. Even for unsigned values, you must use LW in 32-bit software (i.e. software that operates on 32-bit values in registers), regardless of whether the software runs on an RV32 processor (with 32-bit registers) or RV64 processor (where the *W instructions emulate 32-bit registers).

LWU doesn't work (not even for unsigned values) for 32-bit software, because LWU fails to emulate a 32-bit destination register. It violates the invariant described in section 4.2 of the spec v2.2, which says “all 32-bit values are held in a sign-extended format in 64-bit registers. Even 32-bit unsigned integers extend bit 31 into bits 63 through 32”. Consider another example:

i+0=i, for any value of i, whether signed or unsigned.
If address n contains FFFF_FFFF and register x1 contains n, and you do
lw x1, 0(x1)
addiw x2, x1, 0
beq x2, x1, target

Then it would branch (as it should), on both RV32 and RV64. But if you use LWU instead of LW, it would branch on RV32 but not on RV64. This is 32-bit software, because it uses ADDIW; therefore, LWU itself would be the bug.
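Both points (the ADDIW/BEQ example and the earlier remark that LW with SLTU gives matching results) can be modeled in a few lines of Python. The sketch assumes that on RV32 both LWU and ADDIW would be accepted as aliases of LW and ADDI, as proposed in this thread; none of the helper names come from the spec.

```python
MASK32 = 0xFFFFFFFF


def sext32(value):
    """Sign-extend the low 32 bits of value to a 64-bit register pattern."""
    value &= MASK32
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value


def load_word(word, xlen, unsigned=False):
    """LW or LWU into an XLEN-bit register (LWU assumed to alias LW on RV32)."""
    if xlen == 32 or unsigned:
        return word & MASK32
    return sext32(word)


def addiw(reg, imm, xlen):
    """ADDIW (assumed to alias ADDI on RV32)."""
    return (reg + imm) & MASK32 if xlen == 32 else sext32(reg + imm)


def branch_taken(word, xlen, unsigned_load):
    """Model of: l{w,wu} x1, 0(x1); addiw x2, x1, 0; beq x2, x1, target."""
    x1 = load_word(word, xlen, unsigned_load)
    x2 = addiw(x1, 0, xlen)
    return x2 == x1


def sltu(a, b):
    """SLTU on non-negative register patterns."""
    return int(a < b)
```

Here `branch_taken(0xFFFFFFFF, 64, unsigned_load=True)` is the only one of the four combinations that returns False, and `sltu` on two LW-loaded values agrees between XLEN=32 and XLEN=64 because sign extension preserves the unsigned ordering of 32-bit values.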

Alex Elsayed

Mar 2, 2018, 2:50:27 PM
to isa...@groups.riscv.org
On Thursday, 1 March 2018 12:28:19 PST Michael Clark wrote:
> > On 2/03/2018, at 8:09 AM, Kelly Dean <ke...@prtime.org> wrote:
> >

<snip>

> > Is that all? If so, then why doesn't RV32I allow those W instructions? The
> > advantage would be that no binary translation would be necessary (except
> > for the gratuitous incompatibility of the read-counter instructions).
> >
> >
> > If it's just a historical accident, then I propose to remove this wart in
> > the following way.

<snip>

> I agree with the sentiment but not with the timing. Given there are already
> many RV32 designs including shipping silicon it just may not be practical.

I'll note that these changes take things that trap and give them non-trap
semantics. As a result, this can be implemented for existing silicon in M- or
S-mode, albeit with a performance penalty.
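A trap-and-emulate handler of this kind might look roughly like the Python model below. It is only a sketch of the idea: the register file is a plain list, the function name is invented, and a real M-mode handler would be written against mcause/mepc/mtval and would also have to advance mepc past the emulated instruction.

```python
MASK32 = 0xFFFFFFFF
OP_IMM_32 = 0b0011011
OP_32 = 0b0111011


def emulate_w_on_rv32(inst, regs):
    """Emulate one trapped W instruction on an RV32 register file.

    Returns True if inst was handled, False if it is genuinely illegal.
    """
    opcode = inst & 0x7F
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    if opcode not in (OP_32, OP_IMM_32) or funct3 not in (0b000, 0b001, 0b101):
        return False
    is_addiw = opcode == OP_IMM_32 and funct3 == 0b000
    if not is_addiw and funct7 not in (0b0000000, 0b0100000):
        return False  # e.g. MULW, or a shift with a 6-bit shamt
    if funct3 == 0b001 and funct7 != 0:
        return False  # there is no "alternate" left shift
    a = regs[(inst >> 15) & 0x1F]
    if opcode == OP_32:
        b = regs[(inst >> 20) & 0x1F]
    elif is_addiw:
        imm = (inst >> 20) & 0xFFF
        b = imm - 0x1000 if imm & 0x800 else imm  # sign-extend 12-bit immediate
    else:
        b = (inst >> 20) & 0x1F  # 5-bit shift amount
    alt = funct7 == 0b0100000 and not is_addiw  # SUBW or arithmetic shift
    if funct3 == 0b000:
        result = a - b if alt else a + b
    elif funct3 == 0b001:
        result = a << (b & 31)
    elif alt:
        result = (a - (1 << 32) if a & 0x80000000 else a) >> (b & 31)
    else:
        result = a >> (b & 31)
    rd = (inst >> 7) & 0x1F
    if rd != 0:
        regs[rd] = result & MASK32  # x0 stays hard-wired to zero
    return True
```

The performance penalty mentioned above comes from taking a trap for every such instruction, so this would suit compatibility on existing silicon rather than hot loops.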

> There would need to be a new RV32X ABI that is somewhat like x32 on x86_64.
> A 32-bit ABI on RV64 may be practical but RV32 is likely not going to
> change as the Base ISA is frozen. As you mentioned, the compressed
> extension is different.

This is notably different from x32, in that the new ABI will run on 32-bit
processors, while x32 runs only in long mode. If anything, this is bringing
RISC-V closer to SPARC's approach (of making 32-bit programs execute
unmodified under 64-bit mode). In addition, this does not change the calling
convention _at all_ compared to RV32. x32 does.

It stops short of that (due to the C extension), but IMO it's a very
interesting design point.

It would be a new ABI though, yes, simply due to the *L variants of the time
CSRs.

> I can see ilp32 for rv64 being a distinct possibility, but that would use
> 64-bit instructions for fast handling of long long as does x32. i.e. the
> primary benefit is in pointer size for smaller memory systems. x32 supports
> the 64-bit instructions for 64-bit scalars.

I don't think that would be anywhere near as beneficial. The benefits of an
ILP32 ABI along those lines on x86 (larger register file, new instructions,
etc) are much larger than on RISC-V (larger registers, all else unchanged). In
addition, the main benefit of the proposed change is _not_ performance: it's
compatibility.


Bruce Hoult

Mar 2, 2018, 4:11:09 PM
to Alex Elsayed, RISC-V ISA Dev
On Fri, Mar 2, 2018 at 10:49 PM, Alex Elsayed <etern...@gmail.com> wrote:
> On Thursday, 1 March 2018 12:28:19 PST Michael Clark wrote:
>> There would need to be a new RV32X ABI that is somewhat like x32 on x86_64.
>> A 32-bit ABI on RV64 may be practical but RV32 is likely not going to
>> change as the Base ISA is frozen. As you mentioned, the compressed
>> extension is different.
>
> This is notably different from x32, in that the new ABI will run on 32-bit
> processors, while x32 runs only in long mode. If anything, this is bringing
> RISC-V closer to SPARC's approach (of making 32-bit programs execute
> unmodified under 64-bit mode). In addition, this does not change the calling
> convention _at all_ compared to RV32. x32 does.

And PowerPC. Right from the start, 32 bit PowerPC processors provided both lw and lwz (zero extend) instructions that acted identically on 32 bit CPUs. Compilers right from the start used lw for signed values and lwz for unsigned values.

When the 64 bit G5 Macs came out, all legacy 32 bit PowerPC code kept right on working.

> It stops short of that (due to the C extension), but IMO it's a very
> interesting design point.

As I showed previously in this thread, you can use *almost* all of RV32C. You only have to avoid the four single precision FP load and store instructions, and C.JAL (which can only address +/- 2 KB anyway).

Kelly Dean

Mar 3, 2018, 12:39:33 AM
to Bruce Hoult, RISC-V ISA Dev

Bruce Hoult writes:

> Note that making an RVC binary that runs correctly on both rv32 and rv64
> only requires avoiding:
>
> C.FLW, C.FSW, C.FLWSP, C.FSWSP -- single precision FP loads/stores. DP is
> fine.
> C.JAL -- subroutine call. Of limited use in non-embedded programs as it
> only has a +/- 2KB range.
>
> This would have very little effect on the size savings in most programs --
> especially as most don't use FP at all.

You'd also have to avoid C.ADDI4SPN, C.ADDI, C.ADDI16SP, C.SRLI, C.SUB, C.SLLI, and C.ADD, for the same reason that you have to avoid the full instructions that those represent. They operate on 32-bit values on RV32 but 64-bit values on RV64.

Adding insult to injury, the RV32/64 encodings for C.ADDW and C.SUBW are the same, but you can't use them in RV32 software as a substitute for C.ADD and C.SUB (which you must avoid, in order to be portable to RV64) because they're capriciously reserved on RV32 (consistent with the current prohibition of all *W computation instructions on RV32).

That all adds up to a very large effect on the size savings.

Richard Herveille

Mar 5, 2018, 4:04:35 AM
to Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

Not being able to execute RV32 code natively in RV64 (and RV128 for that matter) is a big drawback.

I’ve voiced my concerns about this from the beginning. I am mostly concerned about legacy software that will emerge.

 

Richard

 

 

 


 

Richard Herveille

Managing Director

Phone +31 (45) 405 5681

Cell +31 (6) 5207 2230

richard....@roalogic.com

 


Tommy Thorn

Mar 5, 2018, 4:12:24 PM
to Richard Herveille, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev
Why would that be? Wouldn't you just run RV32 apps in RV32 mode on Linux, as you currently can run 32-bit apps on a 64-bit processor? I fail to understand what the big deal is here and when this would ever be useful.

Tommy


On Mar 5, 2018, at 01:04 , Richard Herveille <richard....@roalogic.com> wrote:

Not being able to execute RV32 code natively in RV64 (and RV128 for that matter) is a big drawback.
I’ve voiced my concerns about this from the beginning. I am mostly concerned about legacy software that will emerge.
 
Richard
 
 
 

Richard Herveille

Mar 6, 2018, 3:05:29 AM
to Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

Besides, there are systems other than Linux …

The RISC-V ISA has the potential to natively run RV32I (probably RV32IMA) on an RV64 or RV128 CPU.

However, due to some minor choices, this is not possible. For example, the ADD instruction is XLEN-sized, so ADD on an RV32 CPU behaves differently than on an RV64 CPU. However, RV32 ADD and RV64 ADDW behave the same.

So why not fix the opcode such that RV32 always uses ADDW instead of ADD? If that’s done consistently (i.e., for the other conflicting opcodes), then RV32 code runs on an RV64 CPU (and on an RV128 CPU, for that matter).
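A small Python model makes the ADD versus ADDW difference concrete. The function names are illustrative only; registers are modeled as unsigned integers of XLEN bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add_rv32(a, b):
    """ADD on a 32-bit register file."""
    return (a + b) & MASK32


def add_rv64(a, b):
    """ADD on a 64-bit register file."""
    return (a + b) & MASK64


def addw_rv64(a, b):
    """ADDW on RV64: 32-bit add, result sign-extended to 64 bits."""
    low = (a + b) & MASK32
    return (low | 0xFFFFFFFF00000000) if low & 0x80000000 else low


def is_negative(reg, xlen):
    """True if the XLEN-bit register pattern has its sign bit set."""
    return bool(reg >> (xlen - 1))
```

Adding 1 to 0x7FFFFFFF overflows to a negative value with RV32 ADD and with RV64 ADDW, but RV64 ADD produces the positive value 0x80000000, so a following signed comparison or branch would go the other way.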

 

Richard

 


Jacob Bachmeyer

Mar 6, 2018, 10:34:39 PM
to Alex Elsayed, isa...@groups.riscv.org
Alex Elsayed wrote:
> I don't think that would be anywhere near as beneficial. The benefits of an
> ILP32 ABI along those lines on x86 (larger register file, new instructions,
> etc) are much larger than on RISC-V (larger registers, all else unchanged). In
> addition, the main benefit of the proposed change is _not_ performance: it's
> compatibility.

For compatibility, we already have the UXL field in sstatus, which
allows a supervisor to configure an RV64-capable processor to act as an
RV32 processor in U-mode.

Why are we concerned about running RV32 code on RV64 when we have the
option for RV64 processors to directly support RV32, including RV32C?

-- Jacob

Richard Herveille

Mar 7, 2018, 2:52:50 AM
to jcb6...@gmail.com, Alex Elsayed, isa...@groups.riscv.org, Richard Herveille

Implementing the UXL field adds complexity, which is bad for embedded CPUs.

Besides, there are systems without HSU mode that might benefit from running RV32 code natively.

Not being able to execute RV32I(MA) natively is a mistake.

 

Richard

 

 

 


Jacob Bachmeyer

Mar 7, 2018, 10:47:29 PM
to Richard Herveille, Alex Elsayed, isa...@groups.riscv.org, Andrew Waterman
Richard Herveille wrote:
>
> Implementing the UXL field adds complexity, which is bad for embedded
> CPUs.
>

How much complexity does UXL add, given that RV64 is already supported?
Also, embedded systems are not running "just any" software, but a
unified and known firmware image, so build-time translation (or simply
"always compile for the actual processor you are targeting") is not
unreasonable.

> Besides, there are systems without HSU mode that might benefit from
> running RV32 code natively.
>

While slightly trickier, for this there is the MXL field in misa.

> Not being able to execute RV32I(MA) natively is a mistake.
>

While I agree that it looks like a poor choice, I suspect that some
other constraint drove that decision. To Dr. Waterman: why was the
choice made to use OP for "varying width" instructions rather than have
distinct OP-{32,64,128} for all XLEN? (This was, AFAIK, part of the PhD
thesis that became RISC-V.)


-- Jacob

Andrew Waterman

Mar 8, 2018, 1:53:58 AM
to Jacob Bachmeyer, Richard Herveille, Alex Elsayed, isa...@groups.riscv.org
We're writing up an explanation of this design decision as part of
revisions to the commentary, and we will send it out when we're done.

Separately, and this is just my opinion, I don't find this to be a
particularly pressing issue. My reasons include the following, some
of which have already been pointed out in this thread:

- For embedded systems, it's hard to see why running RV32 binaries on
RV64 systems is compelling. For these systems, you can nearly always
just recompile the code. Furthermore, the memory map and other
platform details tend to be baked into the binary, and these will
differ between RV32 and RV64 systems; the instruction encoding is the
least of one's concerns. Finally, if you know you want to deploy RV32
software in a power constrained system, you should just use an RV32
core.

- For Unixy systems, the processors are already sufficiently complex
that supporting UXL is a blip on the complexity radar. (And note, the
hardware cost is very low. It may be a pain to implement, but it
takes few gates.)

- For Unixy systems, there is no legacy software base for RISC-V
(yet), so it is not even clear this will be a concern. And there
won't be a legacy RV32 software base for some time, because RV32 isn't
even supported in upstream Linux/glibc!

To this point, consider also that there are AArch64 server processors
incapable of natively executing ARMv7 programs. AFAIK this has not
proven problematic, since there isn't a long history of ARMv7 Linux
servers.

- We will eventually provide an x32-style ABI for applications that
want 4-byte pointers on RV64. I'm well aware that some people don't
like the complexity of this approach, but it does address the main
technical reason to deliberately run RV32 code on RV64 systems.


Richard Herveille

Mar 8, 2018, 4:06:32 AM
to Andrew Waterman, Jacob Bachmeyer, Alex Elsayed, isa...@groups.riscv.org, Richard Herveille

 

> - For embedded systems, it's hard to see why running RV32 binaries on
> RV64 systems is compelling. For these systems, you can nearly always
> just recompile the code. Furthermore, the memory map and other
> platform details tend to be baked into the binary, and these will
> differ between RV32 and RV64 systems; the instruction encoding is the
> least of one's concerns. Finally, if you know you want to deploy RV32
> software in a power constrained system, you should just use an RV32
> core.

[rih] This is simply not true.
RISC-V is a young architecture and there isn’t much legacy software out there at the moment, but there will be.
In the embedded world it is very common to buy libraries from third parties.
If a user, at some point in the future, wants to upgrade from an RV32 to an RV64 CPU, then the incompatibility makes this impossible.

> - For Unixy systems, the processors are already sufficiently complex
> that supporting UXL is a blip on the complexity radar. (And note, the
> hardware cost is very low. It may be a pain to implement, but it
> takes few gates.)

[rih] This argument holds no value in my opinion.
  1. This argument was about embedded stuff. Everybody keeps talking about Unix, but in the embedded world there are many other systems.
  2. UXL (and the like) is a kludge to make RV32 code work on an RV64 system.
If RV64 ran RV32 code natively, none of this would be necessary in the first place. Hence even fewer gates and no pain to implement.

> To this point, consider also that there are AArch64 server processors
> incapable of natively executing ARMv7 programs. AFAIK this has not
> proven problematic, since there isn't a long history of ARMv7 Linux
> servers.

[rih] You’re comparing apples and pears.
RISC-V is supposed to be unique in that the architecture supports everything from small microcontroller-type implementations all the way up to mega servers.
If that’s not the case, then we need to split.

Richard Herveille

Mar 8, 2018, 4:17:36 AM
to jcb6...@gmail.com, Alex Elsayed, isa...@groups.riscv.org, Andrew Waterman, Richard Herveille

 

 

Implementing the UXL field adds complexity, which is bad for embedded

CPUs.

 

 

How much complexity does UXL add, given that RV64 is already supported?  

 

[rih] Well,

  1. there’s the registers with their encoding in CSR
  2. muxes for illegal-opcode detection
  3. muxes for the actual instruction execution (i.e. the ALU block)

I don’t have an exact number and it won’t be a major contributor. But when every gate counts, this just seems a waste while it could have been avoided.

 

 

Also, embedded systems are not running "just any" software, but a

unified and known firmware image, so build-time translation (or simply

"always compile for the actual processor you are targeting") is not

unreasonable.

 

[rih] Yes and no. This same argument is used over and over.

My counter-argument is 3rd party libraries. Paying (sometimes a lot of money) for a library and not being able to use it when upgrading the processor (from RV32 to RV64) seems wrong and limits the use case for RISC-V.

 

 

Besides, there are systems without HSU mode that might benefit from running RV32 code natively.

 

 

While slightly trickier, for this there is the MXL field in misa.

 

[rih] Isn’t that just a fudge?!

IMO MXL, UXL are just kludges to fix the issue. Running RV32 natively on RV64 voids all of this.

 

 

 

Not being able to execute RV32I(MA) natively is a mistake.

 

 

While I agree that it looks like a poor choice, I suspect that some other constraint drove that decision.

 

[rih] I am pretty sure there are. But we’re trying to move the RISC-V architecture from the academic world into the commercial world. Different decisions matter.

 


Richard Herveille

Mar 8, 2018, 4:30:57 AM3/8/18
to Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

 

Why would that be?  Wouldn't you just run RV32 apps in RV32 mode on Linux

[rih] There are other systems besides Unix.

 

 

like you currently can run 32-bit apps on a 64-bit processor.  I fail to understand what the big deal is here and when this would ever be useful.

[rih] RISC-V differs in this respect from other 64-bit processors.

For x86, the 64-bit opcodes are an extension of the 32-bit CPU (which is itself an extension of the 16-bit 8086).

MIPS32 code can run on MIPS64 and behaves the same as it would on a MIPS32 CPU.

 

There are issues when calling MIPS32 code from a MIPS64 CPU, because the 32-bit code only saves the lower part of the registers.

This is what I hoped would be fixed by RISC-V, but instead it went the exact opposite way.

Loads and stores specify the width: RV32 uses LW to load a word, and RV64 uses LD to load a doubleword. ADD, on the other hand, behaves differently on RV32 and RV64.

Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits would have solved the ‘calling 32-bit code on a 64-bit CPU’ issue. And specifying the width for ALU operations (i.e. ADDW for RV32) would have ensured that RV32 code behaves the same on an RV32 and an RV64 CPU.
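
The ADD/ADDW difference mentioned above can be sketched in a few lines of Python (used here only as a neutral modelling language). The helper names `sign_extend`, `add` and `addw` are illustrative assumptions; the behaviour modelled is ADD wrapping at XLEN bits and the RV64 ADDW adding the low 32 bits and sign-extending the 32-bit result:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add(rs1, rs2, xlen):
    """ADD: addition that wraps around at XLEN bits."""
    return sign_extend(rs1 + rs2, xlen)


def addw(rs1, rs2, xlen=64):
    """ADDW (RV64): add, keep the low 32 bits, sign-extend the result to XLEN."""
    return sign_extend(sign_extend(rs1 + rs2, 32), xlen)


a, b = 0x7FFFFFFF, 1
rv32_add = add(a, b, 32)   # overflows into the sign bit on RV32
rv64_add = add(a, b, 64)   # no overflow when the same ADD runs with XLEN=64
rv64_addw = addw(a, b)     # ADDW on RV64 reproduces the RV32 result

print(rv32_add, rv64_add, rv64_addw)
```

The same operands give different ADD results at XLEN=32 and XLEN=64, while ADDW on RV64 matches the RV32 ADD; this is why the translation in the original proposal rewrites ADD to ADDW.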

 


Cesar Eduardo Barros

Mar 8, 2018, 6:30:00 AM3/8/18
to Richard Herveille, Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev
On 08-03-2018 06:30, Richard Herveille wrote:
> There are issues when calling MIPS32 code from a MIPS64 CPU, because the
> 32bit code only save the lower part of the registers.
>
> This is what I hoped would be fixed by RISC-V, but instead it went the
> exact opposite way.
>
> Load/Store use specify the width, meaning RV32 uses LW to load a word.
> And RV64 uses LD to load a double. Whereas ADD has a different behavior
> on RV32 and RV64.
>
> Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits
> would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And
> specifying the width for ALU operations (i.e. ADDW for RV32) would have
> ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.

I apologize if I'm missing something obvious, but how would a load/store
of XLEN bits help in the scenario where 64-bit code calls 32-bit code?

Suppose the 64-bit code has something in register s0, and 32-bit code
wants to save that value. If XLEN is 32 bits, it needs a 4-byte save
area; if XLEN is 64 bits, it needs an 8-byte save area. If the 32-bit
code always allocates a 4-byte save area, an XLEN-sized store will
overflow it when run on a 64-bit processor; if the 32-bit code always
allocates an 8-byte save area, it will be wasting space when run on a
32-bit processor (and 32-bit processors usually have less memory).

Also, what about pointers? The 32-bit code might receive a pointer from
the 64-bit code; to work well for that case, the 32-bit code would need
to use 64-bit pointers on every data structure, once again wasting space
in a 32-bit processor.

The more I think about it, the less the "mixed 32-bit and 64-bit code in
the same process" scenario makes sense. In a separate process (or
process-like entity), it makes more sense: the process/task switch code
is responsible for saving and restoring the registers, and normally
pointers aren't shared between processes.

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Richard Herveille

Mar 8, 2018, 9:12:11 AM3/8/18
to Cesar Eduardo Barros, Tommy Thorn, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev, Richard Herveille

 

 

There are issues when calling MIPS32 code from a MIPS64 CPU, because the 32bit code only save the lower part of the registers.

This is what I hoped would be fixed by RISC-V, but instead it went the exact opposite way.

Load/Store use specify the width, meaning RV32 uses LW to load a word. And RV64 uses LD to load a double. Whereas ADD has a different behavior on RV32 and RV64.

Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And specifying the width for ALU operations (i.e. ADDW for RV32) would have ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.

I apologize if I'm missing something obvious, but how would a load/store of XLEN bits help in the scenario where 64-bit code calls 32-bit code?

Suppose the 64-bit code has something in register s0, and 32-bit code wants to save that value. If XLEN is 32 bits, it needs a 4-byte save area; if XLEN is 64 bits, it needs an 8-byte save area. If the 32-bit code always allocates a 4-byte save area, a XLEN-sized store will overflow it when run on a 64-bit processor; if the 32-bit code always allocates an 8-byte save area, it will be wasting space when run a 32-bit processor (and 32-bit processors usually have less memory).

 

[rih] XLEN is determined by the CPU, not the program code.

When passing values from 64-bit code to 32-bit code, the value can of course only be 32 bits wide. The value must also be located in the lower 32 bits of the register.

That’s not where an XLEN-bit load/store helps. It helps when saving registers in the function pre-/post-amble.

Suppose S0 contains some value that must be restored after the function call. XLEN is 64 bits, so a STORE writes all 64 bits to memory, since the CPU is an RV64. The 32-bit program only uses the 32 LSBs of S0. After the call completes, S0 is restored using a LOAD. All 64 bits are loaded from memory, thus restoring the original value.

When the 32-bit function is called on an RV32, 32 bits are stored and loaded, again restoring the original value.

 

 

 

Also, what about pointers? The 32-bit code might receive a pointer from the 64-bit code; to work well for that case, the 32-bit code would need to use 64-bit pointers on every data structure, once again wasting space in a 32-bit processor.

[rih] No, 32bit code can only address 32bit space. So an RV64 CPU calling 32bit code must ensure the pointers fit in the ±2GB address space.

 

Richard

Alex Elsayed

Mar 8, 2018, 10:58:39 AM3/8/18
to RISC-V ISA Dev
On Wednesday, March 7, 2018 10:53:35 PM PST Andrew Waterman wrote:

<snip>

> - For Unixy systems, there is no legacy software base for RISC-V
> (yet), so it is not even clear this will be a concern. And there
> won't be a legacy RV32 software base for some time, because RV32 isn't
> even supported in upstream Linux/glibc!

<snip>

Yes, but the same argument applies to RV64/RV128.

Just as history has shown that address space expansion converges on a flat
address space with twice as many pointer bits (and the spec notes this, using
it as justification for RV128 existing at all), it has also shown that legacy
software _does_ rapidly become a differentiating factor, albeit not equally in
all spaces. IA64 learned this the hard way.

Companies and users buy proprietary software, and companies that _make_
proprietary software are not-infrequently outlived by their users.

An architecture that resulted in RV32 programs running unmodified on RV64
would almost certainly also allow both to run unmodified on RV128, where the
problem of legacy software will have had plenty of time to grow unchecked.

Alex Elsayed

Mar 8, 2018, 10:58:43 AM3/8/18
to RISC-V ISA Dev
On Thursday, March 8, 2018 6:12:04 AM PST Richard Herveille wrote:
> Cesar Eduardo Barros wrote:
>> On 08-03-2018 06:30, Richard Herveille wrote:
>>
>>> There are issues when calling MIPS32 code from a MIPS64 CPU, because the
>>> 32bit code only save the lower part of the registers.
>>> This is what I hoped would be fixed by RISC-V, but instead it went the
>>> exact opposite way.
>>>
>>> Load/Store use specify the width, meaning RV32 uses LW to load a word.
>>> And RV64 uses LD to load a double. Whereas ADD has a different behavior
>>> on RV32 and RV64.
>>>
>>> Having a LOAD which loads XLEN bits and a STORE which stores XLEN bits
>>> would have solved the ‘calling 32bit code on a 64bit CPU’ issue. And
>>> specifying the width for ALU operations (i.e. ADDW for RV32) would have
>>> ensured RV32 bit code would behave the same on an RV32 and RV64 CPU.
>>
>> I apologize if I'm missing something obvious, but how would a load/store
>> of XLEN bits help in the scenario where 64-bit code calls 32-bit code?
>>
>> Suppose the 64-bit code has something in register s0, and 32-bit code
>> wants to save that value. If XLEN is 32 bits, it needs a 4-byte save
>> area; if XLEN is 64 bits, it needs an 8-byte save area. If the 32-bit
>> code always allocates a 4-byte save area, a XLEN-sized store will
>> overflow it when run on a 64-bit processor; if the 32-bit code always
>> allocates an 8-byte save area, it will be wasting space when run a
>> 32-bit processor (and 32-bit processors usually have less memory).
>
> XLEN is determined by the CPU, not the program code.
>
> When passing values from 64bit code to 32bit code, the value can only be
> 32bits large of course. Also the value must be located in the lower 32bits
> of the register.
>
> That’s not where an XLEN bits load/store helps. It helps when saving
> registers during the pre-/post-amble.
>
> Support S0 contains some value that must be restored after the function
> call. XLEN=64bits, so a STORE would write all 64bits to memory, since the
> CPU is an RV64. The 32bit program only uses the 32LSBs of S0. After the
> call completes, S0 is restored using a LOAD. All 64bits are loaded from
> memory, thus restoring the original value.
>
> When the 32bit function is called on an RV32, 32bits are stored and loaded,
> again restoring the original value.

This doesn't work, though, because the RV32 code's ABI would not know about
the wider registers, and thus would only reserve 32-bit _stack slots_. The
wider stores then clobber adjacent stack slots, and everything goes badly
wrong. This design cannot work as written - unknown registers would have to
always be assumed to be as wide as the widest RISC-V variant _in existence_,
i.e. RV128, when reserving stack slots for spilling them.
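
A minimal Python sketch of the clobbering described above, assuming a hypothetical frame layout with two adjacent 4-byte slots and a hypothetical XLEN-wide store (the `spill` helper and the slot names are illustrative, not part of any ABI):

```python
def spill(stack, offset, value, width):
    """Store `value` little-endian in `width` bytes at `offset`."""
    stack[offset:offset + width] = value.to_bytes(width, "little")


# Frame as laid out by an RV32 compiler: two adjacent 4-byte slots.
stack = bytearray(8)
S0_SLOT, LOCAL_SLOT = 0, 4

# The callee keeps a local variable in the slot next to the s0 save slot.
spill(stack, LOCAL_SLOT, 0x11223344, 4)

# On an RV64 core, an XLEN-wide store of s0 writes 8 bytes into a 4-byte slot.
spill(stack, S0_SLOT, 0xAAAABBBBCCCCDDDD, 8)

local_after = int.from_bytes(stack[LOCAL_SLOT:LOCAL_SLOT + 4], "little")
print(hex(local_after))  # the local now holds the upper half of s0
```

The local variable is silently replaced by the upper 32 bits of the saved register, which is the failure mode being pointed out.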

>> Also, what about pointers? The 32-bit code might receive a pointer from
>> the 64-bit code; to work well for that case, the 32-bit code would need
>> to use 64-bit pointers on every data structure, once again wasting space
>> in a 32-bit processor.
>
> No, 32bit code can only address 32bit space. So an RV64 CPU calling
> 32bit code must ensure the pointers fit in the ±2GB address space.

Except that the userspace RV64 code doing the calling _does not control this_
- this is controlled by `malloc`, which itself usually boils down to `mmap`,
which has _zero_ awareness of the code's intent to _eventually_ pass the
pointer to RV32 code.

In addition, Richard, your nonstandard quoting style is _intensely_
problematic. It is difficult to read (hard to tell where your comments end and
a subsequent piece you are responding to begins), nonstandard (and thus
requires much more conscious parsing, and is unsupported by mail clients'
display logic), and worst of all _does not list the name of the person you are
responding to_.

Please, please use a more standard quoting style.

Richard Herveille

Mar 8, 2018, 11:12:33 AM3/8/18
to Alex Elsayed, RISC-V ISA Dev, Richard Herveille

From: Alex Elsayed <etern...@gmail.com>
Date: Thursday, 8 March 2018 at 16:58
To: RISC-V ISA Dev <isa...@groups.riscv.org>


Subject: Re: [isa-dev] Proposal to remove the need for binary translation for RV32G to RV64G

That sounds like a reasonable argument.

However, since ABIs are software, I am sure we can come up with a way to handle this without wasting loads of stack space.

Assuming now that RV128 is the final variant is shortsighted too.

 

 

 

Also, what about pointers? The 32-bit code might receive a pointer from the 64-bit code; to work well for that case, the 32-bit code would need to use 64-bit pointers on every data structure, once again wasting space in a 32-bit processor.

No, 32bit code can only address 32bit space. So an RV64 CPU calling 32bit code must ensure the pointers fit in the ±2GB address space.

Except that the userspace RV64 code doing the calling _does not control this_ - this is controlled by `malloc`, which itself usually boils down to `mmap`, which has _zero_ awareness of the code's intent to _eventually_ pass the pointer to RV32 code.

 

 

Well, I was initially referring to embedded code, where one has somewhat stricter control over what goes where.

And again, there are other systems besides Linux.

It is not unknown for 32bit code to run on 64bit machines (other CPUs support this). I am sure they solved this particular issue.

 

 

In addition, Richard, your nonstandard quoting style is _intensely_ problematic. It is difficult to read (hard to tell where your comments end and a subsequent piece you are responding to begins), nonstandard (and thus requires much more conscious parsing, and is unsupported by mail clients' display logic), and worst of all _does not list the name of the person you are responding to_.

 

Sorry about that. I am fighting my email client. Too many accounts with all different requirements.

Is this better??

 

Richard

Alex Elsayed

Mar 8, 2018, 2:43:20 PM3/8/18
to RISC-V ISA Dev
The problem here is that code needs to know how much stack space a spilled
register will take up at _compile_ time. There are thus three options:

1. Fit the stack space to the target ISA (i.e. RV32). RV64 code calling RV32
code must be responsible for spilling any 64-bit values before the call. This
avoids wasting stack space.
2. Fit the stack space to the worst-case ISA (i.e. RV128 currently). RV64 code
can safely call RV32 code, which stores full registers blindly. This wastes
stack space.
3. Use relocations to patch the stack slot size at load, in many places, all
over the library.

Currently, RISC-V uses (1). It'd be _plausible_ to use (2), but would be a
significant stack-size hit on any machine except RV128. (3) would be very
burdensome on the loader, and would also absolutely destroy any chance of
sharing memory between processes for library code, since it'd always be
modified.
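
To put rough numbers on the stack cost of options (1) and (2), here is a small arithmetic sketch. The register count (ra plus s0-s11, i.e. 13 callee-saved integer registers) is an illustrative assumption about a worst-case frame, and `frame_bytes` is a hypothetical helper:

```python
def frame_bytes(saved_registers, slot_bits):
    """Bytes of stack needed to spill `saved_registers` at `slot_bits` each."""
    return saved_registers * slot_bits // 8


SAVED = 13  # ra plus s0-s11, an illustrative worst case

option1_rv32 = frame_bytes(SAVED, 32)    # slots sized for the target ISA
native_rv64 = frame_bytes(SAVED, 64)     # what RV64 code needs for itself
option2_any = frame_bytes(SAVED, 128)    # slots sized for the widest ISA

print(option1_rv32, native_rv64, option2_any)
```

Under these assumptions option (2) spends four times the spill space of option (1) on an RV32 machine and twice the natural amount on RV64.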

>>>> Also, what about pointers? The 32-bit code might receive a pointer from
>>>> the 64-bit code; to work well for that case, the 32-bit code would need
>>>> to use 64-bit pointers on every data structure, once again wasting space
>>>> in a 32-bit processor.
>>>
>>> No, 32bit code can only address 32bit space. So an RV64 CPU calling
>>> 32bit code must ensure the pointers fit in the ±2GB address space.
>>
>> Except that the userspace RV64 code doing the calling _does not control
>> this_ - this is controlled by `malloc`, which itself usually boils down to
>> `mmap`, which has _zero_ awareness of the code's intent to _eventually_
>> pass the pointer to RV32 code.
>
> Well I was initially referring to embedded code, where one has some stricter
> control over what goes where.
> And again, there are other systems besides Linux.
> It is not unknown for 32bit code to run on 64bit machines (other CPUs
> support this). I am sure they solved this particular issue.

Yes, it's not unknown for 32-bit code to run on 64-bit machines. You are
proposing something _completely_ different, though, which is for 32-bit code
to run inside of 64-bit _processes_. Basically nothing does this, and the
problems I describe are a large part of why.

Another large part of why is that the syscall ABI almost always differs
between the two, and the kernel knows which to use based on the process'
architecture. However, if a 32-bit library is loaded into a 64-bit program,
and both make syscalls, this becomes impossible.

What you are describing is unrealistic, and while there are ways to make it
work, they are overwhelmingly unlikely to be worthwhile. In the vanishingly
few cases where a 64-bit program _does_ have reason to call into 32-bit code,
treating all 64-bit registers as caller-saved is a pretty small price to pay.

>> In addition, Richard, your nonstandard quoting style is _intensely_
>> problematic. It is difficult to read (hard to tell where your comments end
>> and a subsequent piece you are responding to begins), nonstandard (and thus
>> requires much more conscious parsing, and is unsupported by mail clients'
>> display logic), and worst of all _does not list the name of the person you
>> are responding to_.
>>
>> Please, please use a more standard quoting style.
>
> Sorry about that. I am fighting my email client. Too many accounts with all
> different requirements.
> Is this better??

Somewhat; your emails are more legible now, but they don't format properly
when replied to because the plain-text part is malformed (among other issues).

Inspecting the headers, you seem to use Outlook Web Access - OWA is known to
have serious quoting issues (especially on mailing lists); using absolutely
any _desktop_ client to send email avoids the issue. For example, in Outlook,
this guide describes how to enable "internet-style quoting":
https://www.slipstick.com/outlook/email/to-use-internet-style-quoting/

Other clients, such as Thunderbird, KMail, or Apple Mail, do so by default.

Guy Lemieux

Mar 8, 2018, 4:25:03 PM3/8/18
to RISC-V ISA Dev
What about power consumption?

A CPU that is 64b or 128b capable will likely toggle the upper 32b/96b
of the data path needlessly when running 32b code, unless special
measures are taken in the microarchitecture.

Running only 32b processes to save power is a nice thought, but unrealistic.

On many 64b systems, you can't even compile 32b code (missing 32b
libraries or compiler). In the future, if there are savings to be had,
I can imagine structuring a program so most of the code runs in 32b
mode, with possibly some data access portions using 64b pointers. This
would save both power and memory (64b pointers are twice as big; 128b
pointers are far worse).

Can software jump in/out of 32b/64b mode on the fly to save power? Is
that a reasonable thing to expect? Or should we allow 32b/64b
instructions to freely intermingle at a fine grain? Or should the
microarchitecture somehow figure out which mode to use on the fly,
erring on the side of caution?

Guy

Jacob Bachmeyer

Mar 8, 2018, 7:44:23 PM3/8/18
to Alex Elsayed, RISC-V ISA Dev
Alex Elsayed wrote:
> On Wednesday, March 7, 2018 10:53:35 PM PST Andrew Waterman wrote:
>
>> - For Unixy systems, there is no legacy software base for RISC-V
>> (yet), so it is not even clear this will be a concern. And there
>> won't be a legacy RV32 software base for some time, because RV32 isn't
>> even supported in upstream Linux/glibc!
>>
> Yes, but the same argument applies to RV64/RV128.
>
> Just as history has shown that address space expansion converges on a flat
> address space with twice as many pointer bits (and the spec notes this, using
> it as justification for RV128 existing at all), it has also shown that legacy
> software _does_ rapidly become a differentiating factor, albeit not equally in
> all spaces. IA64 learned this the hard way.
>
> Companies and users buy proprietary software, and companies that _make_
> proprietary software are not-infrequently outlived by their users.
>
> An architecture that resulted in RV32 programs running unmodified on RV64
> would almost certainly also allow both to run unmodified on RV128, where the
> problem of legacy software will have had plenty of time to grow unchecked.
>

The UXL field solves this for general purpose systems. User-space
programs can be run with the base ISA that they expect, and all you need
are full sets of dynamic libraries (i.e. disk space to store the extra
libraries, which is cheap and (still) getting cheaper) for each base ISA
your CPU can run.
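
As a rough illustration of what "a field that selects the base ISA" means, the sketch below decodes a UXL value from an `mstatus` word. It assumes the layout in my reading of the privileged spec 1.10 for RV64 (UXL in bits 33:32, with the same 1/2/3 encoding as MXL in `misa`); the function name `uxl_width` is hypothetical:

```python
# XL field encoding shared by MXL, SXL and UXL (assumed from the privileged spec).
XL_ENCODING = {1: 32, 2: 64, 3: 128}

UXL_SHIFT = 32  # assumed position of UXL in mstatus on RV64 (bits 33:32)


def uxl_width(mstatus):
    """Return the user-mode XLEN selected by the UXL field of `mstatus`."""
    return XL_ENCODING[(mstatus >> UXL_SHIFT) & 0b11]


print(uxl_width(1 << UXL_SHIFT))  # user mode runs RV32 code
print(uxl_width(2 << UXL_SHIFT))  # user mode runs RV64 code
```

A kernel on such a system would set this field per process when switching to user mode, choosing the matching set of libraries at load time.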


-- Jacob

Jacob Bachmeyer

Mar 8, 2018, 9:48:58 PM3/8/18
to Richard Herveille, Alex Elsayed, isa...@groups.riscv.org, Andrew Waterman
Richard Herveille wrote: [edited into Internet quoting style]
>
> Jacob Bachmeyer wrote:
>
>> Richard Herveille wrote:
>>
>>> Implementing the UXL field adds complexity, which is bad for
>>> embedded CPUs.
>>>
>> How much complexity does UXL add, given that RV64 is already supported?
>>
>>
>>
> Well,
>
> 1. there’s the registers with their encoding in CSR
> 2. muxes for illegal-opcode detection
> 3. muxes for the actual instruction execution (i.e. the ALU block)
>
> I don’t have an exact number and it won’t be a major contributor. But
> when every gate counts, this just seems a waste while it could have
> been avoided.
>

Item 1 is probably the most involved. Item 2 is easily handled by
producing an "illegal opcode in RV32 mode" signal and masking that while
in RV64 mode. Similarly, item 3 is easily handled by (internally)
making OP-32 fully orthogonal to OP and simply mapping OP to OP-32 while
in RV32 mode. The only MUXes needed are those that distinguish
execution of ADD/ADDW and the other similar pairs in RV64 and those are
needed anyway to implement RV64.
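
A minimal sketch of the decode handling described for items 2 and 3, written in Python rather than RTL. The major opcode values are the standard ones (OP and OP-32 differ only in inst[3], as do OP-IMM and OP-IMM-32); extending the mapping to OP-IMM and the function name `effective_opcode` are my assumptions, and the internal OP-32 space is assumed to be "fully orthogonal" as suggested above:

```python
OP, OP_32 = 0b0110011, 0b0111011
OP_IMM, OP_IMM_32 = 0b0010011, 0b0011011


def effective_opcode(inst, rv32_mode):
    """Return the major opcode the execution stage should act on.

    In RV32 mode the W opcodes are illegal, and OP / OP-IMM are steered
    onto their 32-bit counterparts by setting inst[3].
    """
    opcode = inst & 0x7F
    if rv32_mode:
        if opcode in (OP_32, OP_IMM_32):
            raise ValueError("illegal instruction in RV32 mode")
        if opcode in (OP, OP_IMM):
            return opcode | 0b1000  # set inst[3]: OP -> OP-32, OP-IMM -> OP-IMM-32
    return opcode


ADD_X1_X2_X3 = 0x003100B3   # add  x1, x2, x3
ADDW_X1_X2_X3 = 0x003100BB  # addw x1, x2, x3

print(bin(effective_opcode(ADD_X1_X2_X3, rv32_mode=True)))
print(bin(effective_opcode(ADD_X1_X2_X3, rv32_mode=False)))
```

In hardware the `raise` corresponds to the "illegal opcode in RV32 mode" signal, which is simply masked when the core runs in RV64 mode.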

If every gate counts, why would you be using RV64? Surely RV32 would
be adequate? I would expect that hardware sufficient to need RV64 would
be far more complex than the incremental cost of implementing UXL.

>> Also, embedded systems are not running "just any" software, but a
>> unified and known firmware image, so build-time translation (or simply
>> "always compile for the actual processor you are targeting") is not
>> unreasonable.
>>
>
>
> Yes and no. This same argument is used over and over.
>
> My counter-argument is 3rd party libraries. Paying (sometimes a lot
> of money) for a library and not being able to use it when upgrading
> the processor (from RV32 to RV64) seems wrong and limits the use case
> for RISC-V.
>

In an embedded environment, why would you "upgrade" from RV32 to RV64?
Presumably there was some reason to use RV32 in the first place.

I think that the Stallman crowd may actually be right here -- these
kinds of problems are simply an inherent cost of using someone else's
proprietary software. The correct answer is to include such costs in
your budget estimates when you license a 3rd-party library.



>>> Besides, there are systems without HSU mode that might benefit from
>>> running RV32 code natively.
>>
>>
>> While slightly trickier, for this there is the MXL field in misa.
>>
>>
>>
> Isn’t that just a fudge?!
>
> IMO MXL, UXL are just kludges to fix the issue. Running RV32 natively
> on RV64 voids all of this.
>

Except that running RV32 code natively on RV64 is not actually possible,
since the register widths are different, which means that registers
spilled onto the stack take up different amounts of space on the stack.
The only real solution to that problem is hardware stack support, which
RISC-V explicitly eschews -- the stack is a software structure in RISC-V
and there are no PUSH and POP opcodes. Hardware stack operations are
omitted because they inhibit instruction-level parallelism.



>>> Not being able to execute RV32I(MA) natively is a mistake.
>>
>>
>> While I agree that it looks like a poor choice, I suspect that some
>> other constraint drove that decision
>>
>>
>>
> I am pretty sure there are. But we’re trying to move the RISC-V
> architecture from the academic world into the commercial world.
> Different decisions matter.
>

If we want to talk about commercial practicalities, MXL/SXL/UXL are
essentially the RISC-V equivalent to how x86 handles multiple ISA
widths: put a field in a register somewhere that defines the current
ISA width. (The x86 architecture uses the hidden segment register for
the code segment to store this value, ever since the 80386 needed to
support both 16-bit and 32-bit code. AMD64 uses a previously-reserved
bit in the segment descriptor to indicate 64-bit code segments.)



-- Jacob

Richard Herveille

Mar 9, 2018, 12:23:40 AM3/9/18
to Alex Elsayed, RISC-V ISA Dev, Richard Herveille

On 08/03/2018, 20:43, "Alex Elsayed" <etern...@gmail.com> wrote:

 

 

What you are describing is unrealistic, and while there are ways to make it work, they are overwhelmingly unlikely to be worthwhile. In the vanishingly few cases where a 64-bit program _does_ have reason to call into 32-bit code, treating all 64-bit registers as caller-saved is a pretty small price to pay.

 

OK. From a HW point of view it sounded easy (or at least easier), but the SW side makes this impractical. I guess we can put this to rest. I learned a lot here!

Anyways, the initial thread was about running RV32 programs natively in RV64. Let’s focus the discussion on that argument again.

 

 

Please, please use a more standard quoting style.

 

Sorry about that. I am fighting my email client. Too many accounts with all different requirements.

Is this better??

Somewhat; your emails are more legible now, but they don't format properly when replied to because the plain-text part is malformed (among other issues).

Inspecting the headers, you seem to use Outlook Web Access - OWA is known to have serious quoting issues (especially on mailing lists); using absolutely any _desktop_ client to send email avoids the issue. For example, in Outlook, this guide describes how to enable "internet-style quoting": https://

 

Other clients, such as Thunderbird, KMail, or Apple Mail, do so by default.

 

Thanks for the pointer. Unfortunately I am forced to use Outlook for Mac. And guess what …. It doesn’t support a prefix.

The indent is the best I can do. I wish I could still use iMail, but each time M$ updates their API either iMail or iCal gets stuck

 

Thanks,

Alex Elsayed

Mar 9, 2018, 12:42:36 AM3/9/18
to RISC-V ISA Dev
On Thursday, March 8, 2018 9:23:34 PM PST Richard Herveille wrote:
> On 08/03/2018, 20:43, "Alex Elsayed" <etern...@gmail.com> wrote:

<snip>

>> Somewhat; your emails are more legible now, but they don't format properly
>> when replied to because the plain-text part is malformed (among other
>> issues).
>>
>> Inspecting the headers, you seem to use Outlook Web Access - OWA is known
>> to have serious quoting issues (especially on mailing lists); using
>> absolutely any _desktop_ client to send email avoids the issue. For
>> example, in Outlook,
>>
>> this guide describes how to enable "internet-style quoting":
>> https://www.slipstick.com/outlook/email/to-use-internet-style-quoting/
>>
>> Other clients, such as Thunderbird, KMail, or Apple Mail, do so by default.
>
> Thanks for the pointer. Unfortunately I am forced to use Outlook for Mac.
> And guess what …. It doesn’t support a prefix.
>
> The indent is the best I can do. I wish I could still use iMail, but each
> time M$ updates their API either iMail or iCal gets stuck ☹

<snip>

That's very odd; the link I gave includes instructions for Outlook for Mac
2016. Did a subsequent version remove the settings it describes?

Namely, Preferences -> Composing -> Replies and Forwards -> Indent each line
of the original message

Richard Herveille

Mar 9, 2018, 12:46:00 AM3/9/18
to Alex Elsayed, RISC-V ISA Dev, Richard Herveille

of the original message

 

That option is still there and that is what I enabled. This email is what it ends up looking like …

 

Richard

 

Rogier Brussee

Mar 9, 2018, 12:05:19 PM3/9/18
to RISC-V ISA Dev


On Thursday, 8 March 2018 at 20:43:20 UTC+1, Alex Elsayed wrote:
On Thursday, March 8, 2018 8:12:25 AM PST Richard Herveille wrote:
> From: Alex Elsayed <etern...@gmail.com>
> Date: Thursday, 8 March 2018 at 16:58
> To: RISC-V ISA Dev <isa...@groups.riscv.org>
> Subject: Re: [isa-dev] Proposal to remove the need for binary translation
> for RV32G to RV64G
>


[snip]
 
The problem here is that code needs to know how much stack space a spilled
register will take up at _compile_ time. There are thus three options:

1. Fit the stack space to the target ISA (i.e. RV32). RV64 code calling RV32
code must be responsible for spilling any 64-bit values before the call. This
avoids wasting stack space.
2. Fit the stack space to the worst-case ISA (i.e. RV128 currently). RV64 code
can safely call RV32 code, which stores full registers blindly. This wastes
stack space.
3. Use relocations to patch the stack slot size at load, in many places, all
over the library.



4. Introduce instructions that deal efficiently with units of XLEN bits, e.g. xadd / xaddi instructions that do something like

xadd    rd rs1 rs2   : rd <- rs1 + (rs2 << log2(XLEN/8))
xaddi   rd rs1 imm12 : rd <- rs1 + (sext(imm12, XLEN) << log2(XLEN/8))

and use that in bitsize-portable libs.
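
A small Python model of the suggested semantics, to show how the same instruction would adjust the stack pointer by a different number of bytes on each base ISA. The instructions are only a suggestion in this thread, so `xadd` and `xaddi` here are hypothetical, and the scaling is read as applying to the second operand only:

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of `value`."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def xlen_shift(xlen):
    """log2(XLEN/8): 2 for RV32, 3 for RV64, 4 for RV128."""
    return (xlen // 8).bit_length() - 1


def xadd(rs1, rs2, xlen):
    """rd <- rs1 + (rs2 << log2(XLEN/8)), wrapped to XLEN bits."""
    return (rs1 + (rs2 << xlen_shift(xlen))) & ((1 << xlen) - 1)


def xaddi(rs1, imm12, xlen):
    """rd <- rs1 + (sext(imm12) << log2(XLEN/8)), wrapped to XLEN bits."""
    return (rs1 + (sext(imm12, 12) << xlen_shift(xlen))) & ((1 << xlen) - 1)


sp = 0x1000
minus3 = -3 & 0xFFF  # 12-bit immediate encoding of -3 (reserve three register slots)

print(hex(xaddi(sp, minus3, 32)))  # sp - 12 on RV32
print(hex(xaddi(sp, minus3, 64)))  # sp - 24 on RV64
print(hex(xadd(sp, 2, 128)))       # sp + 32 on RV128
```

With such an instruction the frame size would be expressed in register-sized units rather than bytes, so one binary could reserve correctly sized spill slots on any XLEN.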
 

lkcl .

Mar 9, 2018, 1:02:38 PM3/9/18
to Richard Herveille, Bruce Hoult, Alex Elsayed, RISC-V ISA Dev
On Mon, Mar 5, 2018 at 9:04 AM, Richard Herveille <richard.herveille@roalogic.com> wrote:

Not being able to execute RV32 code natively in RV64 (and RV128 for that matter) is a big drawback.

I’ve voiced my concerns about this from the beginning. I am mostly concerned about legacy software that will emerge.


 the debian ia32 multiarch port was specifically added because, despite the limitation of only being able to address 32 bit memory spaces, using 32-bit x86 instructions on a *64-bit* x86 system was found to give massive memory usage reductions (30%) as well as modest efficiency gains:


that page shows clearly how to create such a strange (working) hybrid, including detection of X32

the problem (possibly addressing tommy's concern) will arise if there *is* no discernible difference between RV32 and RV64 instructions that a run-time execution engine (RISC-V core) can detect.  i.e. if the exact same assembly-code instruction is utilised, how the heck at any given clock-cycle can the CPU tell whether it is to treat the operands as 32-bit or as 64-bit [or 128]?

what i think you are saying, tommy, is that RV32 apps would be executed in an [emulated] RV32 mode on an RV64 [or RV128] processor.

what specifically distinguishes that scenario from the one that i believe richard would like to see (as outlined in the debian x32 port) is that RV32 instructions are **MIXED IN** with RV64 instructions **DIRECTLY IN THE SAME EXECUTABLE**.

for that to work, you would either need to ensure that [identical assembly-level binary-codes for] RV32 instructions may be easily distinguished from RV64 instructions... *or* that registers are specifically tagged as being either 32-bit [or 64-bit... or 128-bit] and thus *imply* that the operations should be directed to a 32-bit ALU as opposed to a 64-bit one.

both of these, i suspect, would be *massive* disruptive architectural changes and require a heck of a lot of work to analyse.  still, a 30% benefit (code size reduction) if it could also be achieved with RISC-V as it has been with x86, that would be... enormous.  the only thing is: x86 is so different from RISC-V (much larger numbers of registers being one of them) that it's not really possible to say that the gains from x86 will definitely be replicated.

are RV32 instructions more efficiently packed than RV64 ones, particularly on explicit operands (by value)?  does Compression mean that the same program when compiled for RV64 is as efficiently stored as the exact same RV32 one?  such things strike me as being important to know.

lkcl .

Mar 9, 2018, 1:06:29 PM3/9/18