Proposal: Xcondensed, an alternative to Compact that can be used as a 16 bit standalone G-ISA

Rogier Brussee

Oct 6, 2016, 12:44:53 PM
to RISC-V ISA Dev
The condensed extension (Xcondensed) is an alternative to the compact (C) extension that uses the three quadrants reserved for a 16 bit encoding. Like the C extension, it can be used in combination with the full >= 32 bit instruction ISA. Unlike the standard C extension, it can also be used as a standalone instruction set; in fact it contains a condensed version of the full G instruction set. By design it contains most of the C extension unaltered, namely the part of the C extension responsible for >90% of the savings, so that the performance characteristics should be very similar to those of the standard C extension (possibly even improved). Also by design, as for the C extension, every instruction translates to a single 32 bit instruction. Finally, by design it tries to stay as close as possible to the original ISA. It is a mostly mechanical exercise in translating one into the other once the proper tradeoff has been chosen about where to be stingy with the encoding: sacrificing immediate versions of instructions, restricting registers to only the 8 most common ones, and restricting the range of immediates (this is the hard part). The part that deviates most from the original ISA (memory mapped CSRs and privileged instructions) is also the messiest part.
Rationale
* If the 16 bit instruction set is a standalone instruction set, an implementation can choose to use a simpler fixed length ISA. It should also be simpler than the full ISA together with the C extension. The combination should be more performant, but the question becomes whether it is worth the extra complexity.
* It allows fused ops to be composed from condensed ones for essentially all basic instructions. E.g. fusing Xc.auipc_ra with Xc.jalr_ra_ra immediately gives a 22 bit version of JAL in just 32 bit, and likewise fusing auipc ra with Xc.jalr_ra_ra gives a 32 bit JAL in 48 bit.
* VLIW with bundles of 16 bit instructions seems a bit more reasonable (although such an architecture probably still wants the full ISA).
* An RV16 architecture drops out if somebody needs it.
* To explore what can be done if one is stingy in the encoding of the C extension. It was when dabbling with how an RV64IM could be accommodated as a strict superset of the RV32IM that I realised there was room for a condensed version of the AMOs and that there was just enough room for a basic but complete set of floating point instructions.
Of course something has to give.
* Compared to the C extension, some of the instructions that give the least benefit are replaced by versions that use 3 bit registers (i.e. the 8 used in the C extension), and immediates are reduced to 5 bit. The support for floating point load and store is greatly reduced.
* Instructions exist only in a 2 register version or in a 1 register plus immediate version (like the compact encoding).
* Like the C extension, the Xcondensed extension can be used in combination with the full ISA by only teaching the assembler, but as a standalone instruction set it will need compiler support. However, it is very much an instruction set in the RV family, so that should be relatively easy to add.
* Compared to the 32 bit encoding, there are fewer AMOs and the fused floating point operations are not supported.
* CSRs are supposed to be memory mapped. This more or less forces one to have AMOs in order to get essentially equivalent semantics.
* The complexity is similar to that of the C extension, but more instructions will inevitably be more complex than fewer.
* RV128 should "just work" because the lx/sx instructions are aligned to xlen and load xlen bits of memory, i.e. they work as lq/sq. That should work well with an LP128 memory model. However, even on 128 bit machines one might use LP64, where loading 64 bits at a time may be more useful. In any case, if you can afford 128 bit you can afford the full ISA (and there is still a little room in the ISA).
* RV16 just drops out of the ISA. No special provision is made and I didn't think it through very carefully. I am not sure it is even useful, and using just 3/4 of the encoding space for an architecture that is obviously restricted seems inefficient, although such an architecture could not reasonably use a 32 bit ISA and so could use the first quadrant for other purposes without portability worries.


This spreadsheet lists all instructions, their encodings, and how they map to the 32 bit ISA.
https://docs.google.com/spreadsheets/d/1rray4sbhGarasDS6acnWyAlOjLvDqBXX3s1LrBLtFs8/edit?usp=sharing


Below is the list of the IM64 instructions.


instruction R1 R2    imm      semantics


lxsp     5rd*  ___     6imm     l[hwdq] rd (imm<<x)(sp)

lwsp    5rd*  ___     6imm     lw rd (imm<<2)(sp)

lx         3rd     3rs1  5imm     l[hwdq] rd (imm<<x)(rs1)

lw        3rd     3rs1  5imm     lw rd (imm<<2)(rs1)

l.b       3rd     3rs1 ____       lb rd 0(rs1)

l.h       3rd     3rs1 ____       lh rd 0(rs1)

l.bu     3rd     3rs1 ____       lbu rd 0(rs1)

l.hu     3rd     3rs1 ____       lhu rd 0(rs1)

l.wu    3rd     3rs1 ____       lwu rd 0(rs1)


sxsp    5rs1* ___     6imm    s[hwdq] rs1 (imm<<x)(sp)

swsp   5rs1* ___     6imm    sw rs1 (imm<<2)(sp)

sx        3rs1   3rs2  5imm     s[hwdq] rs1  (imm<<x)(rs2)

sw       3rs1   3rs2  5imm     sw rs1 (imm<<2)(rs2)

s.b      3rs1   3rs2 ____       sb rs1 0(rs2)

s.h      3rs1   3rs2 ____       sh rs1 0(rs2)


li         5rd*   ___  6imm       li rd imm

lui       3rd    ___  5imm*      lui rd imm

auipc_ra     ___   11imm     auipc ra imm


addi   5rsd* ___     6imm*   addi  rsd  rsd imm

add6i 5rsd* ___     6imm*   addi rsd rsd imm<<6

add4i_sp     _ _     6imm*   addi sp sp imm<<4

addxi 3rsd  ___     5imm*   addi rsd rsd imm<<x 

axisp 3rd    ___     5imm*   addi rd sp  imm<<x

a7isp 3rd    ___     5imm*   addi rd  sp imm<<7

addi.w 3rsd ___    5imm     addiw rsd rsd imm


andi 3rsd   ___      5imm**  andi rsd rsd imm


slli   5rsd* ___       6imm*   slli rsd rsd  imm

srli  3rsd   ___       5imm*   srli rsd rsd imm

srai 3rsd   ___       5imm*   srai rsd rsd imm

srli16 3rsd ___      ____      srli rsd rsd 16

srai16 3rsd ___     ____      srai rsd rsd 16


mv      5rd*   5rs2* ____      add rd zero rs2


add     5rsd* 5rs2* ____      add rsd rsd rs2

sub     3rsd  3rs2   ____      sub rsd rsd rs2

neg     5rsd* ___   ____      sub rsd zero  rsd

add.w 3rsd 3rs2 ____         addw rsd rsd rs2

sub.w 3rsd 3rs2 ____         subw rsd rsd rs2


and     3rsd 3rs2 ____        and rsd rsd rs2

or        3rsd 3rs2 ____        or  rsd rsd rs2

xor      3rsd 3rs2 ____        xor rsd rsd rs2

not      3rd   3rs1 ____        xori rd rs1 -1


sll        3rsd 3rs2 ____       sll rsd rsd rs2

srl       3rsd 3rs2 ____       srl rsd rsd rs2

sra      3rsd 3rs2 ____       sra rsd rsd rs2

sll.w    3rsd 3rs2 ____       sllw rsd rsd rs2

srl.w   3rsd 3rs2 ____       srlw rsd rsd rs2

sra.w  3rsd 3rs2 ____       sraw rsd rsd rs2


slt       3rsd 3rs2 ____       slt  rsd rsd rs2

sltu     3rsd 3rs2 ____       sltu rsd rsd rs2

seqz   3rd 3rs1  ____        sltiu rd rs1 1

sltz     3rd 3rs1  ____        slt rd rs1 zero

slez    3rd 3rs1  ____        slti rd rs1 1


mult   3rsd 3rs2 ____       mul rsd rsd rs2

div     3rsd 3rs2 ____       div rsd rsd rs2

divu   3rsd 3rs2 ____       divu rsd rsd rs2

rem   3rsd 3rs2 ____       rem rsd rsd rs2

remu 3rsd 3rs2 ____       remu rsd rsd rs2

multh 3rsd 3rs2 ____       mulh rsd rsd rs2

multhu 3rsd 3rs2 ____     mulhu rsd rsd rs2

multhsu 3rsd 3rs2 ____   mulhsu rsd rsd rs2

mult.w 3rsd 3rs2 ____     mulw rsd rsd rs2

div.w 3rsd 3rs2 ____       divw rsd rsd rs2

divu.w 3rsd 3rs2 ____     divuw rsd rsd rs2

rem.w 3rsd 3rs2 ____     remw  rsd rsd rs2

remu.w 3rsd 3rs2 ____   remuw  rsd rsd rs2


beqz  3rs1 ___ 8imm*     beq rs1 zero imm<<1

bnez  3rs1 ___ 8imm*     bne  rs1 zero imm<<1


j         ___ ___  11imm*   jal zero imm<<1

jal ___ ___       11imm*   jal ra imm<<1

jalr_ra_ra  _ _  11imm*    jalr ra ra imm<<1

jalr     3rd    3rs1 ____     jalr rd rs1 0x0

jr        5rs1 ____  ___      jalr zero rs1 0x0

jalr_ra 5rs1 __  ____       jalr ra rs1 0x0

ret        __ __ ___            jalr zero ra 0x0 and pop the return stack


illegal  __ __ __               illegal

uret     __ __ __               uret

sret     __ __ __               sret

hret     __ __ __               hret

mret    __ __ __               mret

sfence.vm_zero _ _ _      sfence.vm zero

sfence.vm_ra    _ _ _       sfence.vm x1

csrrw_sp_mscratch_sp _ _ _ csrrw sp mscratch sp

csrrw_sp_hscratch_sp _ _ _ csrrw sp hscratch sp

csrrw_sp_sscratch_sp _ _ _ csrrw sp sscratch sp
csrrw_sp_uscratch_sp _ _ _ csrrw sp uscratch sp

ebreak __ __ __              ebreak

nop      __ __ __              nop

ecall    __ __ __              ecall

fence.i __ __ __              fence.i

wfi       __ __ __              wfi

rdcycle_ra __ __ __       csrr x1 cycle

rdinstret_ra __ _ __       csrr x1 instret

rdtime_ra    __ __ __     csrr x1 time


fence.mem  __ __ 4imm fence --imm[3:2]-- imm[1:0]

fence.io     __ __   4imm fence imm[3:2] -- imm[1:0]--
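
The "every instruction translates to a single 32 bit instruction" rule can be illustrated at the mnemonic level for a few rows of the table above. This is only a sketch: the function names are made up for illustration, the bit-level field encodings (which live in the spreadsheet) are not modelled, and it assumes that `x` in the table means log2 of the register width in bytes and that the operand order follows the table's own columns.

```python
# Hypothetical mnemonic-level expander for a handful of Xcondensed rows.
# Assumption: x = log2(XLEN / 8), and lx/sx pick h/w/d/q from XLEN.

def xlen_shift(xlen):
    """Shift amount written as 'x' in the table."""
    return {16: 1, 32: 2, 64: 3, 128: 4}[xlen]

def xlen_suffix(xlen):
    """Load/store width suffix selected by the xlen-based instructions."""
    return {16: "h", 32: "w", 64: "d", 128: "q"}[xlen]

def expand(mnemonic, ops, xlen=64):
    """Return the 32 bit assembly text that one condensed instruction maps to."""
    x = xlen_shift(xlen)
    sfx = xlen_suffix(xlen)
    if mnemonic == "lxsp":            # l[hwdq] rd (imm<<x)(sp)
        rd, imm = ops
        return f"l{sfx} {rd}, {imm << x}(sp)"
    if mnemonic == "sx":              # s[hwdq] rs1 (imm<<x)(rs2), rs1 is the data register
        rs1, rs2, imm = ops
        return f"s{sfx} {rs1}, {imm << x}({rs2})"
    if mnemonic == "add6i":           # addi rsd rsd imm<<6
        rsd, imm = ops
        return f"addi {rsd}, {rsd}, {imm << 6}"
    if mnemonic == "axisp":           # addi rd sp imm<<x
        rd, imm = ops
        return f"addi {rd}, sp, {imm << x}"
    if mnemonic == "neg":             # sub rsd zero rsd
        (rsd,) = ops
        return f"sub {rsd}, zero, {rsd}"
    raise ValueError(f"row not covered by this sketch: {mnemonic}")

print(expand("lxsp", ("a0", 3), xlen=64))   # ld a0, 24(sp)
print(expand("sx", ("a1", "a0", 2), xlen=32))  # sw a1, 8(a0)
```

Because the scaled instructions depend only on XLEN, the same condensed instruction expands to `lw`, `ld` or `lq` depending on the machine, which is the property the RV128 and RV16 remarks rely on.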



Jacob Bachmeyer

Oct 7, 2016, 8:07:23 PM
to Rogier Brussee, RISC-V ISA Dev
Rogier Brussee wrote:
> * A RV16 architecture drops out if somebody needs it.
> *RV16 just drops out of the ISA. No special provision is made and I
> didn't think it through very carefully. Not sure if it is even useful
> and just using 3/4 of the encoding space for an architecture that is
> obviously restricted seems inefficient, although such an architecture
> could not reasonably use a 32 bit ISA so could use the first quadrant
> for other purposes without portability worries.
>
>

An RV16EC ISA was one of the first ideas I had after realizing that
RV32I and RV64I use essentially equivalent instructions. (Note that
RV32E code can run unmodified on RV32I.) The idea that a larger RISC-V
system could run RV16 code for testing seemed like the icing on the
cake. Then I actually read the C extension spec and that quickly became
obviously infeasible.

Overall, I think that RV16E would need to be an independent base ISA,
possibly with only eight registers total and probably with separate code
and data address spaces, with code addressed in word-size units to make
better use of a 16-bit instruction address space. RV16E would only make
sense for extremely tiny embedded systems, on par with current 8-bit
microcontrollers. Floating point, privilege modes, AMOs, and possibly
even CSRs would all be omitted. A hardware FPU would be larger than an
entire RV16E core, so FP is out. Implementing privilege modes on a
16-bit MCU is a bit insane, so RV16E would only have M-mode. AMOs are
completely unneeded, because these types of MCUs are uniprocessor. The
various FENCE instructions are unneeded, because embedded systems this
small use SRAM for main memory.

The programming model ends up resembling an 8086, or possibly an AVR. I
admit some curiosity about just how small an RV16E core would be, but I
have yet to see a good use for 16-bit RISC-V.

-- Jacob

Rogier Brussee

Oct 8, 2016, 6:40:42 AM
to RISC-V ISA Dev, rogier....@gmail.com, jcb6...@gmail.com


On Saturday, October 8, 2016 at 02:07:23 UTC+2, Jacob Bachmeyer wrote:
I certainly think that Xcondensed used as a standalone ISA is mostly useful for RV32EI to RV32IMA class processors. RV32XcEI has 59 fixed length instructions, including fences and counting srli16 and srai16 as separate instructions. I very much agree that a 16 bit processor would be very, very minimal and would probably only be RV16XcEI, with likewise 59 instructions and 16 registers.

The point is that the C extension is really not all that far from allowing a 16 bit standalone ISA that closely mimics the 32 bit ISA. Once you do that, IM is easy, and if you want to have memory mapped CSRs (which you want if you want CSRs at all, because there is just no room for CSRs with immediate addressing in a 16 bit ISA) you are forced to also have (some) AMOs. The floating point instructions are mostly a joke, but they would provide code compression and a standardised and complete RISC-V like ISA for a simple FPU.

On the other hand, making the combined 16/32 bit ISA denser should remain the primary goal of the C extension, and that means catering for higher end processors, including 64 and (fictitious) 128 bit ones. Xcondensed tries to do so by adopting by far the most important part of the C extension unaltered, by making sure that the remaining instructions are functionally available (although in a slightly less powerful form, sacrificing mostly the load and store of floating point values), and by having more instructions that should combine well with the full ISA. To save on encoding (and maybe to make 32/64 portable executables, but that is probably a pipe dream) it was useful to have instructions that are xlen based as much as possible. That conveniently gets rid of having to worry about 128 bit and gives you, for free, a RISC-V ISA for 16 bit processors, where a 32 bit ISA would be horrible.

Jacob Bachmeyer

Oct 9, 2016, 11:02:36 PM
to Rogier Brussee, RISC-V ISA Dev
> I certainly think that Xcondensed used /as a stand alone ISA/ is
> mostly useful for RV32EI to RV32IMA class processors. RV32XcEI has 59
> fixed length instructions including fences, and counting slli16 and
> sri16 as seperate instructions. I very much agree that a 16 bit
> processor would be very very minimal and would probably only be
> RV16XcEI with likewise 59 instructions and 16 registers.
>
> The point is that the C extension is really not all that far from
> allowing a 16 bit stand alone ISA that closely mimics the 32 bit ISA.
> Once you do that IM is easy and if you want to have memory mapped
> CSR's (which you want if you want CSR's because there is just no room
> for CSR's with immediate addressing in a 16 bit ISA) you are forced to
> also have (some) AMO's. The floating point instructions are mostly a
> joke, but they would be provide code compression and a standardised
> and complete RISCV like ISA for a simple fpu.

Another probable difference in an RV16E instruction set would be literal
immediate values--instructions that use an immediate would use the
following instruction word as a 16-bit immediate value.

The problem with adding extensions to RV16E is that the reduction in
area that justifies a 16-bit variant quickly gets eaten back up by the
extensions. The M extension (using software or microcode for division)
might be reasonable, but other extensions make very little or no sense
on a 16-bit core. RV32E itself was reduced from RV32I because it was
found that about half of a small RV32I core was the register file.

An RV16EM variant makes sense, but AFD do not. Hardware floating point
does not make sense for the same reason it is not available with
RV32E--an FPU is much larger than the area savings from the reduced base
ISA. AMOs do not make sense because RV16E would be intended for
embedded applications on par with 8-bit MCUs and these are all
uniprocessors with SRAM as main memory. AMOs solve race conditions, on
a uniprocessor microcontroller, what is there to race with?


-- Jacob

Andrew Waterman

Oct 9, 2016, 11:13:08 PM
to Jacob Bachmeyer, Rogier Brussee, RISC-V ISA Dev
Reentrancy issues still exist on uniprocessors because of interrupts.
Disabling interrupts suffices to achieve atomicity in privileged code,
but is often undesirable.

Also, AMOs can be useful for twiddling bits in device registers with
only a single bus transaction. This effect is significant when the
bus is running much slower than the processor.


Michael Clark

Oct 10, 2016, 1:18:30 AM
to jcb6...@gmail.com, Rogier Brussee, RISC-V ISA Dev
Hi Jacob,

I had this DRAFT open on my home machine and hadn’t hit send yet.

What about same idea in the opposite direction?

A 64-bit encoding would allow the compiler to emit code in SSA form for deferred register scheduling.

A 32-bit base instruction requires:

- 2-bits for the length encoding
- 20 bits for 4 x 5 bit register slots
- 10 bits for opcode (funct5+opcode7-length2 base=111)

A 64-bit instruction with “expanded register encoding" requires:

- 7-bits for the length encoding
- 10-bits for opcode
- 3 bits ‘111’ for Base ISA in 64-bit
- 40 bits for 4 x 10 bit register slots
- 4 bits left over for some form of compiler metadata (branch target, address load, constant load, etc)

A 128-bit instruction with “expanded register encoding" requires:

- 10-bits for the length encoding
- 10-bits for opcode
- 3 bits ‘111’ for Base ISA in 128-bit
- 96 bits for 4 x 24 bit register slots
- 9-bits left over for some form of compiler metadata (branch target, address load, constant load, etc)
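
The three bit budgets listed above can be checked with a few lines of arithmetic. This is just a sanity check of the figures as posted; the dictionary keys are labels chosen here, not names from any specification.

```python
# Bit budgets exactly as listed in the post; the key names are illustrative only.
budgets = {
    32:  {"length": 2,  "opcode": 10, "registers": 4 * 5},
    64:  {"length": 7,  "opcode": 10, "base_isa_marker": 3,
          "registers": 4 * 10, "metadata": 4},
    128: {"length": 10, "opcode": 10, "base_isa_marker": 3,
          "registers": 4 * 24, "metadata": 9},
}

def total_bits(width):
    """Sum of all fields for an instruction of the given width."""
    return sum(budgets[width].values())

def addressable_registers(width):
    """Registers reachable through one of the four register slots."""
    bits_per_slot = budgets[width]["registers"] // 4
    return 2 ** bits_per_slot

for width in budgets:
    print(width, total_bits(width), addressable_registers(width))
```

Each budget sums to its instruction width, and the slot widths give 32, 1024 and 16777216 registers, which is where the "1024 registers" and "~16 million registers" figures in the following paragraphs come from.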

I wonder if the compiler would ever spill scalars with 1024 registers? Arrays, of course, have to go into a line/box/cube of RAM somewhere.

If one had a function that didn’t spill into more than ~16 million registers, and had constraints that all loops could be unrolled and conditional statements predicated, e.g. used RVI only instructions; then one could coax the compiler into producing something that can run on an FPGA.

Oh right. We need RV4I with “expanded registers” to compose something out of many small LUTS that have 4-bit widths, and then feed them into a place and route algorithm.

Would RV4I with ~16 million registers be useful? LOAD, STORE, FENCE and ECALL would be kind of tough

Michael.

Rogier Brussee

Oct 10, 2016, 3:25:57 AM
to RISC-V ISA Dev, rogier....@gmail.com, jcb6...@gmail.com


On Monday, October 10, 2016 at 05:02:36 UTC+2, Jacob Bachmeyer wrote:
In case that was not clear yet, I did _not_ intend Xcondensed as an ISA for a hypothetical RV16, but as an alternative to Compact V1.9 (or, if you like, a somewhat radical proposal for a Compact V2.0) that _can_ (not must or should) be used as a standalone ISA. Just as with the base ISA, you would opt in to various extensions: M, A, S, D and U, S (or H) as privileged instructions. It also seems quite reasonable to _not_ opt in to an extension in the Xcondensed ISA, say S and D or memory mapped CSRs, yet have it in the full base ISA. The point of the exercise is to use an encoding that allows you to cram enough instructions into the 3/4 of the opcode space reserved for the C extension that you end up with a useful RV-like standalone 16 bit instruction set, but without sacrificing (too much of) the code compression properties of the C extension. Unfortunately, the current V1.9 proposal for Compact makes that impossible. In other words, if Xcondensed (or some version thereof) were adopted, it would make RV32EXc the minimal instruction set in the RV family. A hypothetical RV16EXc would simply be a bonus, and I certainly don't see a 16 bit processor adopting D.

Incidentally, the ISA is set up such that the preferred way to get at a <=17 bit immediate would be:

lui a0 5imm_1; add6i a0 6imm_2; addi  a0 6imm_3
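
How a constant might be split over those three fields can be sketched as follows. The sketch rests on assumptions that the thread does not confirm: that the condensed lui places its 5 bit field at bit 12 as the base lui does, that add6i adds its field shifted left by 6 as in the table, and that all three fields are signed (the meaning of the `*` marker is not spelled out here). The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def split_imm17(value):
    """Split value into (lui_field, add6i_field, addi_field), all assumed signed."""
    lo = sign_extend(value, 6)        # 6 bit field consumed by addi
    rest = (value - lo) >> 6          # exact: value - lo is a multiple of 64
    mid = sign_extend(rest, 6)        # 6 bit field consumed by add6i
    hi = (rest - mid) >> 6            # 5 bit field consumed by lui
    if not -16 <= hi <= 15:
        raise ValueError(f"{value} does not fit the assumed 5+6+6 bit fields")
    return hi, mid, lo

def materialize(hi, mid, lo):
    """Value left in the register after lui; add6i; addi under the assumptions above."""
    return (hi << 12) + (mid << 6) + lo

fields = split_imm17(0x1234)
print(fields, hex(materialize(*fields)))
```

With signed fields the lower fields borrow from the upper ones, so the reachable range under these assumptions is -67616 to 63455 rather than a symmetric 17 bit range.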

See my first post and
 

Rogier Brussee

Oct 10, 2016, 8:06:43 AM
to RISC-V ISA Dev, jcb6...@gmail.com, rogier....@gmail.com

Xcondensed-A could conceivably be reduced to just lr and sc, which is enough, for example, to run the MUSL libc.

However, I don't know how that would work out with memory mapped CSRs (if these are needed) and, as Andrew
Waterman wrote, with bit twiddling in device registers. It seemed more generally useful to have the aq.rl versions of
amo.l, amo.s, amo.or, amo.and and amo.swap to "emulate" the csr instructions with memory mapped
CSRs than to have specialised instructions that pass the CSR address in a register rather than as an immediate.
The latter would also violate the "every instruction maps to an instruction in the base 32 bit ISA" constraint.
Having gone down that road, leaving out amo.add seemed illogical. Having these in both w and x = XLEN versions does
add up a bit in terms of encoding space, but it fits.
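
The idea of emulating the csr instructions with AMOs on memory mapped CSRs can be sketched as a small software model. Everything here is an assumption made for illustration: the class and function names, the address 0x300, and above all the premise that a memory mapped CSR behaves like an ordinary memory word (real CSRs can have read-only fields and side effects that this ignores).

```python
class MappedCsrBus:
    """Toy model: memory mapped CSRs stored as plain words, accessed only via AMOs."""

    def __init__(self):
        self.mem = {}

    def amoswap(self, addr, value):
        old = self.mem.get(addr, 0)
        self.mem[addr] = value
        return old

    def amoor(self, addr, value):
        old = self.mem.get(addr, 0)
        self.mem[addr] = old | value
        return old

    def amoand(self, addr, value):
        old = self.mem.get(addr, 0)
        self.mem[addr] = old & value
        return old

def csrrw(bus, addr, value):
    """csrrw: swap in the new value, return the old one."""
    return bus.amoswap(addr, value)

def csrrs(bus, addr, mask):
    """csrrs: set the masked bits, return the old value."""
    return bus.amoor(addr, mask)

def csrrc(bus, addr, mask, xlen=32):
    """csrrc: clear the masked bits by AND-ing with the inverted mask."""
    return bus.amoand(addr, ~mask & ((1 << xlen) - 1))

bus = MappedCsrBus()
csrrw(bus, 0x300, 0b1010)
csrrs(bus, 0x300, 0b0101)
print(bin(csrrc(bus, 0x300, 0b0011)), bin(bus.mem[0x300]))
```

Each csr operation becomes one atomic read-modify-write that returns the old value, which is why swap, or and and cover csrrw, csrrs and csrrc in this model, at the cost of computing the inverted mask in software for the clear case.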

https://docs.google.com/spreadsheets/d/1rray4sbhGarasDS6acnWyAlOjLvDqBXX3s1LrBLtFs8/edit?usp=sharing
 

Ciao

Rogier

On Monday, October 10, 2016 at 05:13:08 UTC+2, waterman wrote:

Andrew Waterman

Oct 10, 2016, 2:24:25 PM
to Rogier Brussee, RISC-V ISA Dev, Jacob Bachmeyer
On Mon, Oct 10, 2016 at 4:47 AM, Rogier Brussee
<rogier....@gmail.com> wrote:
> I think RIch Felker of MUSL fame, thought it was a mistake not to have at
> least lr, sc in the base instruction set, and the MUSL libc actually needs

Many people want the base ISA to be exactly what their application
demands, but that misses the point. Some applications obviously don't
need LR/SC, and in those cases mandating LR/SC is an undue burden.

It's perfectly reasonable for MUSL to require the A extension.

> it. Xcondensed-A might conceivably be reduced to just lr and sc as the bare
> minimum. However, I don't know how that would work out with memory mapped
> CSR's (If those are needed in the first place) or as you wrote, twiddling
> device registers. It seemed more generally useful to have the aq.rl
> versions of amo.l . amo.s, amo.or, amo.and, and amo.swap to "emulate" csr
> instructions with memory mapping, than having analogues of the csr
> instructions with the CSR passed in register instead of as an immediate. It
> would also obviously violate the "every instruction maps to a an instruction
> in the 32 bit ISA" constraint. Having gone that route, amo.add seemed like
> a natural addition. And it fits, see
>
> https://docs.google.com/spreadsheets/d/1rray4sbhGarasDS6acnWyAlOjLvDqBXX3s1LrBLtFs8/edit?usp=sharing
>
> Ciao
>
> Rogier
>
>
> On Monday, October 10, 2016 at 05:13:08 UTC+2, waterman wrote:

Jacob Bachmeyer

Oct 10, 2016, 6:56:01 PM
to Andrew Waterman, Rogier Brussee, RISC-V ISA Dev
Andrew Waterman wrote:
> On Sun, Oct 9, 2016 at 8:02 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Another probable difference in an RV16E instruction set would be literal
>> immediate values--instructions that use an immediate would use the following
>> instruction word as a 16-bit immediate value.
>>
>> The problem with adding extensions to RV16E is that the reduction in area
>> that justifies a 16-bit variant quickly gets eaten back up by the
>> extensions. The M extension (using software or microcode for division)
>> might be reasonable, but other extensions make very little or no sense on a
>> 16-bit core. RV32E itself was reduced from RV32I because it was found that
>> about half of a small RV32I core was the register file.
>>
>> An RV16EM variant makes sense, but AFD do not. Hardware floating point does
>> not make sense for the same reason it is not available with RV32E--an FPU is
>> much larger than the area savings from the reduced base ISA. AMOs do not
>> make sense because RV16E would be intended for embedded applications on par
>> with 8-bit MCUs and these are all uniprocessors with SRAM as main memory.
>> AMOs solve race conditions, on a uniprocessor microcontroller, what is there
>> to race with?
>>
>
> Reentrancy issues still exist on uniprocessors because of interrupts.
> Disabling interrupts suffices to achieve atomicity in privileged code,
> but is often undesirable.
>

You are correct, but on an embedded microcontroller, predictable
interrupt latency is a desirable feature. One simple way to gain that
is for interrupts to "propagate down the pipeline" with a constant delay
between asserting an interrupt and taking the trap, which allows an
"interrupt barrier" instruction to be introduced. If no interrupt is
pending, interrupt-barrier is a simple NOP, but the next N instructions
are guaranteed to not be interrupted. If an interrupt *is* pending, at
any pipeline stage or countdown value, interrupt-barrier is a series of
NOP until the trap is taken.

> Also, AMOs can be useful for twiddling bits in device registers with
> only a single bus transaction. This effect is significant when the
> bus is running much slower than the processor.

This is a good reason to have AMOs even on a microcontroller. I stand
corrected.

-- Jacob

Xan Phung

Dec 12, 2016, 7:57:33 AM
to RISC-V ISA Dev
Hi Rogier,

I thought the previous responses to your proposal got too bogged down in AMOs (when they were just one small element of your overall proposal).

Instead I found it very useful to compare the major (5 bit) opcode map in your spreadsheet to the RV64GC opcode map in Table 14.3 of the User Level spec (RV v2.1)

What it shows is that your Xcondensed proposal contains a multiplicity of good ideas, each of which (in my opinion) should be considered individually for incorporation into the standard "C" extension.  In fact, I hope RVC 1.9 won't get finalised too quickly and that more time is instead spent examining the alternative choices you outline in Xcondensed.

Here are the ideas I like especially, and these should be key decisions to discuss & debate:

1(a).  Splitting the RVC "MISC-ALU" opcode into "ALU register" and "ALU-immediate" opcodes.  The RVC "MISC-ALU" instruction format is quite messy, being a complex mix of immediate shifts and register based arithmetic instructions, each group having a different instruction sub-format.  Having "ALU" and "ALU-I" opcodes creates a much nicer and tidier instruction format organisation.

1(b).  "Paying" for the new ALU-I opcode by downgrading "C.LUI" to a subopcode within "ALU-I". C.LUI is extremely rarely used, <0.1% of dynamic Linux or SPEC code according to Table 14.8 in the RV v2.1 spec.  I'd much rather have "ALU-I" with all the extra orthogonality & simplicity it creates, with an 8 register version of LUI as one of the eight immediate ALU operations offered by ALU-I.  This also more closely matches the uncompressed ISA's distinction between OP-32 and OP-32IMM.

2.  Downgrading ADDI4SPN into an instruction within the ALU-I group, thereby freeing an extra major opcode.  This major opcode was then used (notionally, in my comparison) to add back the C.JAL instruction that was present in RV32C but lost in RV64C.  This improves the orthogonality of RV32C and RV64C, i.e. the only difference between these two instruction sets is the load/stores rather than the branch instructions.  C.JAL has slightly higher usage (0.59% vs 0.44%) so this is a choice that favours reduction of dynamic & static instruction count.

3.  Rebalancing the opcodes allocated to floating point load/stores versus floating point ops.  RV64GC has a massive 4 opcodes for C.FLD, C.FSD, C.FLDSP, C.FSDSP (versus none for FPU operations).  This choice was justified by analysis of SPEC2006 code only (as Linux kernel code does not have any floating point at all).  Even in SPEC2006, floating point is still typically only ~0.5% or less of instruction counts, although FLD is more common at 1.6% of dynamic instructions.  Is it wise to make such an extreme decision (biased towards load/store vs ops) based purely on optimising for a single benchmark set?  Admittedly, Xcondensed goes to the other extreme of downgrading FPU load/store to 1/4 of the opcode space of the "load"/"store" major opcodes, i.e. effectively 1/2 of one major opcode in total.  It shows though that FPU ops can fit nicely into one major opcode, so maybe a better balance can be achieved with two FPU load/store opcodes and 1-2 major opcodes for FPU operations (I'd argue one opcode should be reserved for decimal FP).

Rogier Brussee

Dec 12, 2016, 6:56:12 PM
to RISC-V ISA Dev


On Monday, December 12, 2016 at 13:57:33 UTC+1, Xan Phung wrote:
> Hi Rogier,
>
> I thought the previous responses to your proposal got too bogged down in AMOs (when they were just one small element of your overall proposal).


I agree. The main thrust of the proposal is the IM part, although it is nice that there is room for the AMO part. 
 
> Instead I found it very useful to compare the major (5 bit) opcode map in your spreadsheet to the RV64GC opcode map in Table 14.3 of the User Level spec (RV v2.1)
>
> What it shows is that your Xcondensed proposal contains a multiplicity of good ideas, each of which (in my opinion) should be considered individually for incorporation into the standard "C" extension.  In fact, I hope RVC 1.9 won't get finalised too quickly and that more time is instead spent examining the alternative choices you outline in Xcondensed.


Thanks :-)! 

In fact it is a tribute to how well designed the RV instruction set is, as I was just trying to mimic the 32 bit wide basic RISC-V ISA within the inevitable restrictions imposed by the restricted encoding space.  I am surprised, though, that the C extension gets so few comments despite the fact that it occupies 3/4 of the major opcode space and there is simply no room for having different variations.
 
> Here are the ideas I like especially, and these should be key decisions to discuss & debate:
>
> 1(a).  Splitting the RVC "MISC-ALU" opcode into "ALU register" and "ALU-immediate" opcodes.  The RVC "MISC-ALU" instruction format is quite messy, being a complex mix of immediate shifts and register based arithmetic instructions, each group having a different instruction sub-format.  Having "ALU" and "ALU-I" opcodes creates a much nicer and tidier instruction format organisation.

That was indeed my starting point.

> 1(b).  "Paying" for the new ALU-I opcode by downgrading "C.LUI" to a subopcode within "ALU-I". C.LUI is extremely rarely used, <0.1% of dynamic Linux or SPEC code according to Table 14.8 in the RV v2.1 spec.  I'd much rather have "ALU-I" with all the extra orthogonality & simplicity it creates, with an 8 register version of LUI as one of the eight immediate ALU operations offered by ALU-I.  This also more closely matches the uncompressed ISA's distinction between OP-32 and OP-32IMM.

I agree 
 
> 2.  Downgrading ADDI4SPN into an instruction within the ALU-I group, thereby freeing an extra major opcode.  This major opcode was then used (notionally, in my comparison) to add back the C.JAL instruction that was present in RV32C but lost in RV64C.  This improves the orthogonality of RV32C and RV64C, i.e. the only difference between these two instruction sets is the load/stores rather than the branch instructions.  C.JAL has slightly higher usage (0.59% vs 0.44%) so this is a choice that favours reduction of dynamic & static instruction count.


Actually, what frees up room for JAL and makes the ISA 32/64 bit orthogonal is downgrading addi.w from a 5rsd 6imm instruction to a 3rsd 5imm instruction for 64 bit processors. I think the small loss of range and the restriction to the 8 main registers is worth it, and of course it is made possible by having room in ALU-I (but clearly it is the overall opcode budget that counts).

For a standalone instruction set, a JAL instruction with a range of ±1024 (16 bit) instructions does not seem to be enough. Therefore I proposed the pair Xc.auipc_ra and Xc.jalr_ra_ra, whose combination is effectively a JAL with a 22 bit offset, i.e. a range of ±2^21 (16 bit) instructions or ±2^22 bytes (so it can jump freely within an 8 MB space). This costs 2 major opcodes, so it is costly, but for code compression it offers 4 times the range of a regular JAL instruction in 32 bit, and in combination with the regular auipc it gives a JAL instruction with the full range of ±2 Gbyte (i.e. ±1G (16 bit) instructions), so it may be worth it.
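
The reach of these jump forms follows directly from the width of the signed offset. A small calculation, assuming every offset is a signed count of 16 bit parcels and that the fused Xc.auipc_ra / Xc.jalr_ra_ra pair behaves like one signed 22 bit offset (11 bits from each instruction); the helper names are illustrative only.

```python
def reach_parcels(offset_bits):
    """Furthest backward distance, in 16 bit parcels, of a signed offset."""
    return 1 << (offset_bits - 1)

def reach_bytes(offset_bits):
    """Same distance in bytes (each parcel is 2 bytes)."""
    return 2 * reach_parcels(offset_bits)

def window_bytes(offset_bits):
    """Total size of the window reachable forwards and backwards."""
    return 2 * reach_bytes(offset_bits)

MIB = 1 << 20
forms = {
    "Xc.jal (11 bit)": 11,
    "Xc.auipc_ra + Xc.jalr_ra_ra (22 bit)": 22,
    "base JAL (20 bit)": 20,
}
for name, bits in forms.items():
    print(f"{name}: +/-{reach_parcels(bits)} parcels, "
          f"window {window_bytes(bits) / MIB} MiB")
```

Under these assumptions the fused pair reaches ±2^21 parcels (±4 MiB, an 8 MiB window), four times the window of the base 32 bit JAL, while the single Xc.jal covers only ±1024 parcels.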

 
3.  Rebalancing the opcodes allocated to floating point load/stores versus floating point ops.  RV64GC has a massive 4 opcodes for C.FLD, C.FSD, C.FLDSP, C.FSDSP (versus none for FPU operations).  This choice was justified by analysis of SPEC2006 code only (as Linux kernel code does not have any floating point at all).  Even in SPEC2006, floating point is still only typically ~0.5% or less of instruction counts, although FLD is more common at 1.6% of dynamic instructions.  Is it wise to make such an extreme decision (biased towards load/store vs ops) based purely on optimising for a single benchmark set?  Admittedly, Xcondensed goes to the other extreme of downgrading FPU load/store to 1/4 of the opcode space of "load"/"store" major opcodes, ie: effectively 1/2 of one major opcode in total.  It shows though that FPU ops can fit nicely into one major opcode, so maybe a better balance can be achieved with two FPU load/store opcodes and 1-2 major opcodes for FPU operations (I'd argue one opcode should be reserved for decimal FP).



Letting go of the floating point load/store instructions is simply necessary to fit things in, especially if opcodes are not overloaded. But the idea is that to load 3 doubles from the stack (say at an offset of 10 xlen words) you would use something like

Xc.axisp a5 10
Xc.l.fd0 fa0 (a5)
Xc.l.fd1 fa1 (a5)
Xc.l.fd2 fa2 (a5)

and something similar for stores while for the typical  inner product floating point computation you would have something like

Loop:
Xc.l.fd0 fa0 (a0)
Xc.l.fd0 fa1 (a1)
madd  fa2 fa0 fa1
Xc.addi a0 8
Xc.addi a1 8
bneq a0 a2 Loop

if you have the full ISA available, and something similar if you buy into the Xc version of D. In any case, for loads and stores in long fp loops the important case is an immediate of 0, and many vectors in graphics are length 4 or shorter.
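To put a number on the inner-product loop above, here is a small Python sketch of its code size per iteration. It assumes that each Xc.* instruction occupies one 16-bit parcel (2 bytes) and that madd and bneq are ordinary 32-bit instructions, which is the "full ISA available" case; the mnemonics are the ones used in the post, not standard assembler names.

```python
# (mnemonic, size in bytes) for one pass through the loop body.
LOOP_BODY = [
    ("Xc.l.fd0", 2),  # load double from (a0)
    ("Xc.l.fd0", 2),  # load double from (a1)
    ("madd", 4),      # multiply-accumulate, assumed 32-bit
    ("Xc.addi", 2),   # advance first pointer
    ("Xc.addi", 2),   # advance second pointer
    ("bneq", 4),      # compare-and-branch, assumed 32-bit
]

mixed_bytes = sum(size for _, size in LOOP_BODY)
uncompressed_bytes = 4 * len(LOOP_BODY)  # every instruction 32 bits wide

print("mixed 16/32-bit loop body:", mixed_bytes, "bytes")
print("all 32-bit loop body:     ", uncompressed_bytes, "bytes")
print("size ratio:", mixed_bytes / uncompressed_bytes)
```

Under these assumptions the loop body shrinks from 24 to 16 bytes, with the two instructions that need three register operands or a branch offset staying at 32 bits.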

As for the Xcondensed fp instructions, they are mostly there to show that doing basic floating point in one opcode is quite reasonable, but probably the more realistic and right thing to do at the moment is to keep the opcode reserved.

Xan Phung

Dec 13, 2016, 2:52:29 AM12/13/16
to RISC-V ISA Dev

Hi (to anyone on RISC V Foundation):

Can anyone outline the process required for providing input into RISC V Foundation standards setting/review?

The fact (as pointed out by Rogier) that RV Compressed uses 3/4 of all opcode space was what made me think RVC v1.9 should have more extensive analysis before it gets "frozen" as a v2.0 spec...

Very close and intense review of RVC v1.9 is therefore warranted, perhaps even more than uncompressed RV itself... not because RVC designers haven't done great work (they have!) but because RVC2.0 will be the critical step that will lock-away the vast portion (~90%) of the RISC V ISA that isn't yet frozen.

I believe the current RVC v1.9 is generally very high quality and the majority of it (but not all of it) is ready for "freezing".  It is highly optimised along one dimension, ie: for the perspective of providing compression for SPEC2006 code & Linux kernel code.  However, I think there is a very strong case that it can be improved along other dimensions (eg. robustness for general purpose computing & versatility across broader range of use-cases), **without** reducing the existing optimisation for SPEC2006/Linux.

(Wasn't a key lesson from the MIPS/SPARC era that over-optimising for a single implementation strategy or targeting for too narrow a set of use-cases is the reason for mistakes made in those architectures?)

A key example is the "C.ADDI4SPN" instruction. On its own this instruction consumes ~2.5% of entire RISC V opcode space, ie: just 40 similar instructions will use up nearly the entire opcode space.  It requires its own dedicated instruction format (not used by any other instruction).  Yet ADDI4SPN caters for (in my opinion) a specialised use case, that is not highly used.  (Even in the benchmark code that is its optimisation target, it is only 0.07% of SPEC2006 dynamic count, and 0.05% of Linux dynamic count).
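The opcode-space shares discussed here can be sanity-checked by counting fixed bits. The Python sketch below assumes the usual RVC layout, in which a quadrant is selected by the 2 low bits and a major opcode by those 2 bits plus a 3-bit funct3; the helper name is made up for illustration. This simple count gives 1/32 (about 3.1%) of the 16-bit pattern space per major opcode, the same order as the ~2.5% quoted above, whose exact accounting is not spelled out in the thread.

```python
from fractions import Fraction


def share_of_parcel_space(fixed_bits: int) -> Fraction:
    """Fraction of all 16-bit patterns matched when `fixed_bits` bits are fixed."""
    return Fraction(1, 2 ** fixed_bits)


quadrant = share_of_parcel_space(2)          # op[1:0] only
major_opcode = share_of_parcel_space(2 + 3)  # op[1:0] plus funct3

print("three compressed quadrants:", 3 * quadrant)
print("one major opcode:", major_opcode, "=", float(major_opcode))
print("major opcodes needed to fill the space:", 1 / major_opcode)
```

The 3/4 printed on the first line is the "3/4 of all opcode space" figure that motivates the call for closer review.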

In comparison, C.JAL has higher usage than C.ADDI4SPN in non-SPEC benchmarks, of up to 0.59%, yet gets left out of RV64C.  Also, JAL has merit from the perspective of 16 bit instruction general purpose computing use cases (beyond just the compression use case).  Alternatively, use the extra RVC opcode taken from ADDI4SPN to divide the C.MISC-ALU instructions amongst two opcodes, "C.ALU register" and "C.ALU immediate".  This will tidy up what is currently a messy "mixed" opcode and create a much nicer & orthogonal set of integer ALU 16 bit instructions.

I also note that RVC v1.9 is only optimised for assembler stage 32->16bit compression, and there may be other compression opportunities if the whole compilation stage is redesigned for the 16 bit instructions - this may mean that having a nice orthogonal Compressed instruction set may make it a better target for compilers, and perhaps better compiler stage register allocation may mean the 32 register forms of the "C" instructions (see C.LUI/C.ADDIW below) provide even less marginal benefit beyond what 8 registers can provide.

In addition to C.ADDI4SPN, there are also other lower-usage opcodes like C.LUI, C.ADDIW, and floating point opcodes (up to 20% of entire opcode space, for just load/store!), which I think should also be reviewed.  (LUI/ADDIW can still be provided in an 8 register form within a new ALU-I opcode)

Anyway, congrats to RISC V designers for their great progress to date!  But I hope they will look more closely at some ideas from Rogier's Xcondensed instruction set in the spreadsheet link below.

Michael Clark

Dec 13, 2016, 3:14:55 AM12/13/16
to Xan Phung, RISC-V ISA Dev

On 13 Dec 2016, at 8:52 PM, Xan Phung <xan....@gmail.com> wrote:

> Anyway, congrats to RISC V designers for their great progress to date!  But I hope they will look more closely at some ideas from Rogier's Xcondensed instruction set in the spreadsheet link below.


I’m not a member of the RISC-V Foundation but I just wanted to reiterate the positive sentiment of the great progress to date. I have a personal tendency to chomp at the bit (interested in the interrupts specification most recently which I understand is still forming). I am also always keen to see progress but I understand it takes time.  As far as I can tell the projects have progressed a long way in a short period of time.

Regards,
Michael.

Rogier Brussee

Dec 13, 2016, 4:18:15 AM12/13/16
to RISC-V ISA Dev
Dear Xan,

thank you very much for taking this up and your excellent comments on the Xcondensed proposal (aka proposal for Cv2.0). Sparking discussion in this direction was exactly what I hoped for. 

I completely share your sentiment, the designers did a very good job, but 75% of the opcode space is a very valuable resource and I believe there is room for improvement especially in the direction of providing a fixed width very basic 16 bit general purpose ISA that doubles as code compression.

Because this thread has been renamed (thanks Xan!) let me for future reference link once more to the spreadsheet of the Xcondensed proposal that implements the points Xan made.

Bruce Hoult

Dec 13, 2016, 4:37:49 AM12/13/16
to Xan Phung, RISC-V ISA Dev
Yes, I agree that the designers have done an excellent job! To have what is promising to be a viable open source ISA and hardware project is something very exciting to have at *all*, but to have such a nicely designed ISA as well ... fantastic.

I think there could be a real opportunity to get some big market share, especially with RV64, with ARM unaccountably not doing compressed instructions for 64 bit code.

Of course I have niggles and there are things I'd like to see in there that aren't. For example an option to negate one operand to AND/OR/XOR (like PPC and ARM64). Or bitfield extract/insert (again PPC and ARM64). Or conditional select. But those are all pretty minor things that can be done with two instructions instead of one with what *is* there. And none of these are very frequent in most code -- except maybe detecting integer overflow in safe languages can become quite a common operation, and can really use a negated operation. I've already mentioned compressed function prologue and epilogue, without having to pollute the ROM with lots of millicode. A push/pop multiple instruction might well use less area.

On the other hand, things like the ability to use all 32 registers in 2-address instructions in compressed code is fantastic. That means RVC doesn't suffer in code density compared to AVR on embedded controller code that uses lots of byte-valued variables in registers and may not use RAM at all.

I'm sure there are some tweaks that could be made to RVC -- there always will be -- but as it stands right now I'm already super excited!


--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Andrew Waterman

Dec 13, 2016, 5:51:23 AM12/13/16
to Rogier Brussee, RISC-V ISA Dev


On Tuesday, December 13, 2016, Rogier Brussee <rogier....@gmail.com> wrote:
> Dear Xan,
>
> thank you very much for taking this up and your excellent comments on the Xcondensed proposal (aka proposal for Cv2.0). Sparking discussion in this direction was exactly what I hoped for.
>
> I completely share your sentiment, the designers did a very good job, but 75% of the opcode space is a very valuable resource and I believe there is room for improvement especially in the direction of providing a fixed width very basic 16 bit general purpose ISA that doubles as code compression.

This is an interesting design exercise, but a functional 16-bit ISA is an explicit non-goal of the C extension. Xcondensed remains the right name for this effort, because it is outside the scope of the standard RISC-V ISA, which mandates the presence of the 32-bit base.


> Because this thread has been renamed (thanks Xan!) let me for future reference link once more to the spreadsheet of the Xcondensed proposal that implements the points Xan made.
>
> https://docs.google.com/spreadsheets/d/1rray4sbhGarasDS6acnWyAlOjLvDqBXX3s1LrBLtFs8/edit?usp=sharing




On Friday, 7 October 2016 03:44:53 UTC+11, Rogier Brussee wrote:

> This spreadsheet lists all instructions their encoding and how they map to 32 bit ISA. 
https://docs.google.com/spreadsheets/d/1rray4sbhGarasDS6acnWyAlOjLvDqBXX3s1LrBLtFs8/edit?usp=sharing




Bruce Hoult

Dec 13, 2016, 6:42:37 AM12/13/16
to Andrew Waterman, Rogier Brussee, RISC-V ISA Dev
If nothing else, there's no way to fit two full sized register fields AND a reasonable jump offset into a 16 bit instruction, so compare-and-branch can only be 32 bit.
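The bit budget behind this point can be written out explicitly. The Python sketch below assumes the RVC convention of a 2-bit quadrant field plus a 3-bit funct3 to select the instruction, 5 bits for a full register specifier and 3 bits for one of the 8 common registers; the constant names are illustrative only.

```python
PARCEL_BITS = 16
QUADRANT_BITS = 2    # op[1:0]
FUNCT3_BITS = 3      # major opcode within the quadrant
FULL_REG_BITS = 5    # any of 32 registers
SHORT_REG_BITS = 3   # one of the 8 common registers

opcode_bits = QUADRANT_BITS + FUNCT3_BITS

# Compare-and-branch on two arbitrary registers.
two_full_regs_offset = PARCEL_BITS - opcode_bits - 2 * FULL_REG_BITS

# Compare-with-zero-and-branch on one short register (the C.BEQZ shape).
one_short_reg_offset = PARCEL_BITS - opcode_bits - SHORT_REG_BITS

print("offset bits with two full registers:", two_full_regs_offset)
print("offset bits with one short register:", one_short_reg_offset)
```

With two full register fields only a single offset bit remains, whereas comparing one short register against zero leaves 8 bits, which is the trade-off the compressed branch instructions make.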

Replacing it with 16 bit instructions is pretty horrible. You can't even do a subtract and then compare-with-zero-and-branch unless you know the operands have a restricted range. If the operands aren't both in the common 8 registers then you need a move and then subtract. If overflow is a possibility then you need another four or so instructions to detect that.
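The restricted-range caveat for "subtract, then compare with zero" can be demonstrated in a few lines. This Python sketch models 32-bit two's-complement wraparound (XLEN = 32 is an assumption for illustration) and shows a pair of operands for which the sign of the wrapped difference disagrees with the true signed comparison.

```python
XLEN = 32  # register width assumed for this illustration


def wrap_signed(value: int, xlen: int = XLEN) -> int:
    """Reduce an integer to an xlen-bit two's-complement signed value."""
    value &= (1 << xlen) - 1
    return value - (1 << xlen) if value >> (xlen - 1) else value


def less_than_via_sub(a: int, b: int) -> bool:
    """Emulate 'sub t, a, b' followed by 'branch if t < 0'."""
    return wrap_signed(a - b) < 0


# Small operands: the shortcut agrees with a real signed compare.
print(less_than_via_sub(5, 7), 5 < 7)

# Operands far apart: the subtraction overflows and the shortcut is wrong.
a, b = 2**31 - 1, -1
print(less_than_via_sub(a, b), a < b)
```

The second pair overflows because the true difference, 2^31, does not fit in a signed 32-bit register, so a correct sequence needs the extra overflow handling mentioned above.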

I'm all for making as many instructions as possible be 16 bit in most code, but insisting that they *all* are is a losing proposition.

What successful pure 16 bit instruction sets have there been?

PDP11 and 68000 used extra words for offsets and immediates. Thumb isn't a stand-alone instruction set.

AVR is I think pure fixed length 16 bit instructions -- and 32 registers, load/store, 2-address arithmetic. It's quite pleasant if you're dealing with 8 bit data, and don't need many pointers.

I think DG Nova may have been pure fixed length 16 bit instructions. But only four registers!

And both AVR and Nova use condition codes.


Rogier Brussee

Dec 13, 2016, 6:54:56 AM12/13/16
to RISC-V ISA Dev, rogier....@gmail.com
Dear Andrew,

thanks for your reaction. 

[snip]


>> I completely share your sentiment, the designers did a very good job, but 75% of the opcode space is a very valuable resource and I believe there is room for improvement especially in the direction of providing a fixed width very basic 16 bit general purpose ISA that doubles as code compression.
>
> This is an interesting design exercise, but a functional 16-bit ISA is an explicit non-goal of the C extension. Xcondensed remains the right name for this effort, because it is outside the scope of the standard RISC-V ISA, which mandates the presence of the 32-bit base.

I know that a 16 bit ISA is a non-goal of the C extension, but as it is, the C extension makes it impossible to have something like Xcondensed as an extension of C or even something that can co-exist with C. Something like an RV32EMXcondensed that only has a fixed width 16 bit decoder seems a reasonable design point, but you seem to suggest that you (and/or the RISC-V Foundation) explicitly don't want this to be RISC-V but something that is at best RISC-V inspired.  Is that a deliberate decision or just something that drops out of the Cv1.9 design?

Rogier Brussee

Dec 13, 2016, 7:15:05 AM12/13/16
to RISC-V ISA Dev, and...@sifive.com, rogier....@gmail.com, br...@hoult.org
Hitachi SuperH, which has been rebooted as Jcore, has a 16 bit base  instruction set.



On Tuesday, 13 December 2016 at 12:42:37 UTC+1, Bruce Hoult wrote:

David PATTERSON

Dec 13, 2016, 12:31:26 PM12/13/16
to Rogier Brussee, RISC-V ISA Dev
I try and stay out of these discussions, but you may be missing a philosophically important point about RISC-V. 

As the only path forward for improved energy-cost-performance is domain-specific co-processors (DSC) given the end of Moore's Law and Dennard scaling, an important requirement for 21st-century ISAs is to PRESERVE opcode space for future DSCs.

20th-century ISA architects didn't have such a goal, so they used up virtually all the opcodes to have slightly larger branch or immediate fields or slightly better compression.

It would not be a wise long-term decision to abandon future expansibility for 5%-10% reduction in code size.

Dave


Andrew Waterman

Dec 13, 2016, 2:23:20 PM12/13/16
to Rogier Brussee, RISC-V ISA Dev
Hi Rogier,

On Tue, Dec 13, 2016 at 3:54 AM, Rogier Brussee <rogier....@gmail.com> wrote:
> Dear Andrew,
>
> thanks for your reaction.
>
> [snip]
>
>>> I completely share your sentiment, the designers did a very good job, but 75% of the opcode space is a very valuable resource and I believe there is room for improvement especially in the direction of providing a fixed width very basic 16 bit general purpose ISA that doubles as code compression.
>
>> This is an interesting design exercise, but a functional 16-bit ISA is an explicit non-goal of the C extension. Xcondensed remains the right name for this effort, because it is outside the scope of the standard RISC-V ISA, which mandates the presence of the 32-bit base.
>
> I know that a 16 bit ISA is a non-goal of the C extension, but as it is, the C extension makes it impossible to have something like Xcondensed as an extension of C or even something that can co-exist with C. Something like an RV32EMXcondensed that only has a fixed width 16 bit decoder seems a reasonable design point, but you seem to suggest that you (and/or the RISC-V Foundation) explicitly don't want this to be RISC-V but something that is at best RISC-V inspired.  Is that a deliberate decision or just something that drops out of the Cv1.9 design?

I can't speak for the foundation at large, of course, but that does accurately summarize the opinion of the RISC-V instigators.

There are applications for which a purely 16-bit encoding is appropriate, but it really is a different beast.  To make it efficient, you'd make several ISA decisions quite a bit differently, some of which would follow from designing around the 16-register variant.  Designing the C extension to interoperate with both RV32I and the standalone 16-bit extension is inherently suboptimal for both.





Alex Bradbury

Dec 14, 2016, 8:36:18 AM12/14/16
to Xan Phung, RISC-V ISA Dev
On 13 December 2016 at 07:52, Xan Phung <xan....@gmail.com> wrote:
>
>
> Hi (to anyone on RISC V Foundation):
>
> Can anyone outline the process required for providing input into RISC V Foundation standards setting/review?

Posting comments to the isa-dev mailing list is the best way to go.
The proposed model for finalising new specifications includes an
explicit public comment period, but this process hasn't actually been
used yet.

Xan, Rogier and others who have been making compressed proposals -
thank you for sharing, I think these sorts of contributions demonstrate
the value of open standards with development also taking place out in
the open.

> The fact (as pointed out by Rogier) that RV Compressed uses 3/4 of all opcode space was what made me think RVC v1.9 should have more extensive analysis before it gets "frozen" as a v2.0 spec...
>
> Very close and intense review of RVC v1.9 is therefore warranted, perhaps even more than uncompressed RV itself... not because RVC designers haven't done great work (they have!) but because RVC2.0 will be the critical step that will lock-away the vast portion (~90%) of the RISC V ISA that isn't yet frozen.
>
> I believe the current RVC v1.9 is generally very high quality and the majority of it (but not all of it) is ready for "freezing". It is highly optimised along one dimension, ie: for the perspective of providing compression for SPEC2006 code & Linux kernel code. However, I think there is a very strong case that it can be improved along other dimensions (eg. robustness for general purpose computing & versatility across broader range of use-cases), **without** reducing the existing optimisation for SPEC2006/Linux.
>
> (Wasn't a key lesson from the MIPS/SPARC era that over-optimising for a single implementation strategy or targeting for too narrow a set of use-cases is the reason for mistakes made in those architectures?)

I think this is a good point. It also raises an important question -
is the C extension meant to be optimised for a specific workload? e.g.
the nebulously defined "embedded" space. My understanding is that it's
intended to be a win across all workloads (see e.g.
https://arxiv.org/pdf/1607.02318v1.pdf for the potential advantage
when paired with opcode fusion). With JVM RISC-V support in the works,
I think it would be valuable to analyse the application of the
compressed specification to instruction traces from the JIT. There is
also the opportunity for others to share numbers from their own
proprietary embedded workloads.

> I also note that RVC v1.9 is only optimised for assembler stage 32->16bit compression, and there may be other compression opportunities if the whole compilation stage is redesigned for the 16 bit instructions - this may mean that having a nice orthogonal Compressed instruction set may make it a better target for compilers, and perhaps better compiler stage register allocation may mean the 32 register forms of the "C" instructions (see C.LUI/C.ADDIW below) provide even less marginal benefit beyond what 8 registers can provide.

I think the fact every compressed instruction translates simply to a
standard RV32/RV64 instruction is a big advantage for implementation
complexity and verification, even if it potentially sacrifices some
opportunities for further optimisations.

Best,

Alex

Stefan O'Rear

Dec 14, 2016, 9:38:58 AM12/14/16
to Alex Bradbury, Xan Phung, RISC-V ISA Dev
On Wed, Dec 14, 2016 at 5:36 AM, Alex Bradbury <a...@asbradbury.org> wrote:
> I think this is a good point. It also raises an important question -
> is the C extension meant to be optimised for a specific workload? e.g.
> the nebulously defined "embedded" space. My understanding is that it's
> intended to be a win across all workloads (see e.g.
> https://arxiv.org/pdf/1607.02318v1.pdf for the potential advantage
> when paired with opcode fusion). With JVM RISC-V support in the works,
> I think it would be valuable to analyse the application of the
> compressed specification to instruction traces from the JIT. There is
> also the opportunity for others to share numbers from their own
> proprietary embedded workloads.

The biggest question for me in this thread is: do we intend to make
-mrvc the default in the future? Either in gcc itself, or at the
distribution level.

If I were to make a RV64G chip with an opcode-hungry custom
accelerator by removing the C extension as described in the user spec
§10.3 "Available 30-bit instruction encoding spaces", would I be able
to use standard precompiled software or should we just focus on making
poky as useful as possible?

-s

Alex Bradbury

Dec 14, 2016, 10:11:43 AM12/14/16
to Stefan O'Rear, Xan Phung, RISC-V ISA Dev
On 14 December 2016 at 14:38, Stefan O'Rear <sor...@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 5:36 AM, Alex Bradbury <a...@asbradbury.org> wrote:
>> I think this is a good point. It also raises an important question -
>> is the C extension meant to be optimised for a specific workload? e.g.
>> the nebulously defined "embedded" space. My understanding is that it's
>> intended to be a win across all workloads (see e.g.
>> https://arxiv.org/pdf/1607.02318v1.pdf for the potential advantage
>> when paired with opcode fusion). With JVM RISC-V support in the works,
>> I think it would be valuable to analyse the application of the
>> compressed specification to instruction traces from the JIT. There is
>> also the opportunity for others to share numbers from their own
>> proprietary embedded workloads.
>
> The biggest question for me in this thread is: do we intend to make
> -mrvc the default in the future? Either in gcc itself, or at the
> distribution level.

I had a brief discussion about this with a few people at a previous
RISC-V workshop (along the lines of "if C is so good, why shouldn't it
be considered part of RV32G/RV64G"). The feeling was that the
implementation cost for various cores hadn't been fully determined,
and there's nothing stopping Linux distributions from standardising on
RV64GC. I personally feel that moving distros to RV64GC would be a
worthy goal. We know that for some workloads there's very heavy
pressure on the instruction cache (I expect the move to a 48KiB I$ in
the Cortex-A57 and A72 is motivated by more than just increased
associativity), and standardising system software on a format that
might allow implementers to ship an SoC with a small instruction cache
without sacrificing performance seems a useful thing to do.

> If I were to make a RV64G chip with an opcode-hungry custom
> accelerator by removing the C extension as described in the user spec
> §10.3 "Available 30-bit instruction encoding spaces", would I be able
> to use standard precompiled software or should we just focus on making
> poky as useful as possible?

That's a good point. My personal inclination is that if the benefits
of C are demonstrated for typical Linux workloads (for instance, in
allowing SoC designers to get away with smaller caches) and there is
sufficient uptake then it would be more valuable to the wider
ecosystem for distributions to support C by default than to worry about
specialised use cases that might want to define non-standard
extensions.

Alex

Alex Bradbury

Dec 14, 2016, 10:20:35 AM
to Stefan O'Rear, Xan Phung, RISC-V ISA Dev
And, as I've just been reminded, a 64KiB instruction cache on the A73!

Alex

Stefan O'Rear

Dec 14, 2016, 10:25:46 AM
to Alex Bradbury, Xan Phung, RISC-V ISA Dev
Yeah, I do wonder sometimes about how full RV64GC is though, and
whether it's painting us into a corner. (The 50% proposal is at least
superficially appealing.)

I'd be happier about it if we had more implementation experience with
48-bit instructions. I don't know of anyone that supports them yet.

-s

Andrew Waterman

Dec 14, 2016, 11:36:06 AM
to Stefan O'Rear, Alex Bradbury, Xan Phung, RISC-V ISA Dev


On Wednesday, December 14, 2016, Stefan O'Rear <sor...@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 7:11 AM, Alex Bradbury <a...@asbradbury.org> wrote:
>> On 14 December 2016 at 14:38, Stefan O'Rear <sor...@gmail.com> wrote:
>>> If I were to make a RV64G chip with an opcode-hungry custom
>>> accelerator by removing the C extension as described in the user spec
>>> §10.3 "Available 30-bit instruction encoding spaces", would I be able
>>> to use standard precompiled software or should we just focus on making
>>> poky as useful as possible?
>>
>> That's a good point. My personal inclination is that if the benefits
>> of C are demonstrated for typical Linux workloads (for instance, in
>> allowing SoC designers to get away with smaller caches) and there is
>> sufficient uptake then it would be more valuable to the wider
>> ecosystem for distributions to support C by default than to worry about
>> specialised use cases that might want to define non-standard
>> extensions.
>
> Yeah, I do wonder sometimes about how full RV64GC is though, and
> whether it's painting us into a corner.  (The 50% proposal is at least
> superficially appealing.)

There is some FUD going around the mailing list to this effect lately, and it's a little specious. The 16-bit encoding space is very tight, but still has room for dozens of instructions with e.g. two register operands and a tiny immediate.

In the GC space, I count room for about 80 more I-type or similar instructions, without even using the space reserved for RV128 and >32-bit instructions, which add about 48 more.  That is quite abundant: there are nowhere near 80 I-type instructions in the base plus extensions.

There is also room for thousands of R-type instructions.  Many instructions don't benefit from immediate variants, which consume the most opcode space.


> I'd be happier about it if we had more implementation experience with
> 48-bit instructions.  I don't know of anyone that supports them yet.

Ask Intel ;)
 

> -s


--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Rogier Brussee

Dec 15, 2016, 6:11:48 AM
to RISC-V ISA Dev, rogier....@gmail.com
Dear Andrew and Dave,

Thanks a lot for your reaction. 

I am still glad I posted my "design exercise" as it has sparked good discussions.

The point of the Xcondensed design exercise was to give a proof of concept that, with only slightly different encodings and design choices, a lot more can be put in the 16 bit encoding space (a fixed width dense ISA within the RISC-V ecosystem) without (really) sacrificing compression or extensibility on the >16 bit wide ISA front.

Xan pushed that in the direction of freeing up a quadrant, which may very well be a better direction and fits in very well with Professor Patterson's remarks.



On Tuesday, 13 December 2016 at 20:23:20 UTC+1, andrew wrote:
Hi Rogier,


Rogier Brussee

Dec 15, 2016, 10:38:17 AM
to RISC-V ISA Dev, rogier....@gmail.com
I posted the following reaction to Xan's 2 quadrant compact proposal:

Hi Xan,

you really seem to have run with the idea of rethinking Compact with Xcondensed as input,  thanks for that!  

I like your idea to free up a whole quadrant a lot. It actually makes Xcondensed possible as an extension of Compact (but see below on instructions of width >32) but that is not the main reason. Even though I can see that in theory using a variable length encoding there is an infinite amount of encoding space, using the 32 bit opcode space is nicer. I really agree with your point that it is prudent to not use up so much of the encoding space for so little extra benefit in RVC and the prudent thing to do is to keep it reserved while RV has a chance to mature. 

Maybe, however, it is an idea to move the instructions of width > 32 to this opcode space. This would have to be done ASAP: technically it breaks the v2.0 standard, but since no one seems to actually use them yet, it would not actually break anything in practice.

I will give technical comments inline below.

Yours,

Rogier



On Wednesday, 14 December 2016 at 13:33:55 UTC+1, Xan Phung wrote:

At the moment, the RV64G/128G opcode space for 32 bit instructions is almost fully utilised (only 3 major opcodes = 9% are reserved for standard extensions beyond RV128G).  Also, the 16 bit opcode space will be ~95% utilised by RV Compressed v1.9.  So there is very little free opcode space left in RISC V once RVC is frozen.  This will constrict RISC V's future potential extension.

How about reducing RV Compressed's opcode allocation from 75% to 50% (ie: 16 opcodes)?  This would then double the opcode space available for 32 bit instructions from ~25% to ~50%, ie: 60 major opcodes instead of 28.
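The opcode counts quoted above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: a "major opcode" is taken to be instruction bits [6:2], the standard 32 bit quadrant excludes the patterns with bits [4:2] = 111 (reserved for >32 bit instructions), and the freed quadrant is assumed to be usable in full. The function name is made up for illustration.

```python
# Count 32-bit major opcodes (instruction bits [6:2]) per quadrant.
# Assumption: in the standard quadrant (bits [1:0] == 0b11) the patterns
# with bits [4:2] == 0b111 are reserved for instructions longer than 32 bits.

def major_opcodes_per_quadrant(reserve_long_prefix):
    """Return how many of the 32 five-bit major opcodes are usable."""
    count = 0
    for major in range(32):                      # major = bits [6:2]
        if reserve_long_prefix and (major & 0b111) == 0b111:
            continue                             # bits [4:2] == 111: not a 32-bit opcode
        count += 1
    return count

standard = major_opcodes_per_quadrant(reserve_long_prefix=True)
# Assumption: a quadrant handed back by RVC is usable without that reservation.
freed = major_opcodes_per_quadrant(reserve_long_prefix=False)

# RVC opcodes: 8 funct3 values per 16-bit quadrant.
rvc_three_quadrants = 3 * 8
rvc_two_quadrants = 2 * 8

print(standard, standard + freed)                # 28 60
print(rvc_three_quadrants, rvc_two_quadrants)    # 24 16
```

Under these assumptions the numbers match the 28 and 60 major opcodes, and the 16 RVC opcodes, mentioned in the proposal.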

RVC's compression performance is only reduced slightly when restricted to 16 opcodes (eg. Linux is 30.5% size reduction instead of 31% size reduction, SPEC2006 is 24% reduction instead of ~25% reduction).

The full list of 16 opcodes for RV64C is shown in table below, as well as the static & dynamic compression stats (first 4 columns are cumulative total size reduction, next 4 columns are size reduction attributable to each individual instruction).

What's lost:
- ADDI4SPN
see below on a 5 bit immediate
- LWSP
- SWSP

LWSP and SWSP are rarely used on 64 bit processors but are used a lot on 32 bit processors. That is one reason why I like having LX/LXSP and SX/SXSP that load/store XLEN-wide data, and LW/LWSP and SW/SWSP that load/store 32 bit wide data. However, if one sticks with word and doubleword, effectively the same thing can be achieved by reusing the same opcode for LWSP/SWSP on 32 bit processors and LDSP/SDSP on 64 bit processors, and dropping LWSP/SWSP on the latter.
 
- LUI gets reduced to 8 register format (instead of 32 registers) - becomes same format as other ALU-immediate operations like ANDI/right shifts
- 4 floating point memory ops (in the case of RV64).
- code is 1~2% bigger than for 23 opcode RVC proposal

This is under the assumption that the ALUI instruction opcode uses a 6 bit immediate staying as close as possible to the original Compact proposal. In my Xcondensed proposal, which is no more difficult to encode, I use a 5 bit immediate allowing 8 ALUI instructions. For reference:

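class | name   | fn | reg  | imm     | mark | expands to          | remarks
alui  | andi   | 0  | 3rsd | 5imm    | **   | andi rsd rsd imm    |
addi  | addxi  | 1  | 3rsd | 5imm    | *    | addi rsd rsd imm<<x | addi has only three bits of range for additions to pointers if x = 3
alui  | srli   | 2  | 3rsd | 5imm    | *    | srli rsd rsd imm    |
alui  | srai   | 3  | 3rsd | 5imm    | *    | srai rsd rsd imm    | Is this one really worth it ????? Arithmetic shift right considered harmful.
li    | lui    | 4  | 3rd  | 5imm    | *    | lui rd imm          |
li    | axisp  | 5  | 3rd  | 5imm    | *    | addi rd sp imm<<x   | (for naming, compare auipc = a12ipc)
li    | a7isp  | 6  | 3rd  | 5imm    | *    | addi rd sp imm<<7   | (for naming, compare auipc = a12ipc)
addi  | addi.w | 7  | 3rsd | 5imm    |      | addiw rsd rsd imm   | addi.w rsd 0x0 is sign extension from 32 bit
      |        | 0  | 3rsd | 0x0     |      |                     |
      |        | 1  | 3rsd | 0x0     |      |                     |
alui  | srli16 | 2  | 3rsd | 0x0     |      | srli rsd rsd 16     |
alui  | srai16 | 3  | 3rsd | 0x0     |      | sra rsd rsd 16      |
      |        | 4  | 3rd  | 0x0     |      |                     |
      |        | 5  | 3rd  | 0x0     |      |                     |
      |        | 6  | 3rd  | 0x0     |      |                     |
      |        | 1  | 3rsd | 0b11111 |      |                     | use for andi0xFFFFFFFF = "andi rsd rsd 0xFFFFFFFF"???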

Here the immediates are assumed to be signed.



This of course means that the immediate has less range, but has advantages:

axisp should be usable for most of the uses of ADDI4SPN for making temporaries on stack (say with C++'s const&), while a7isp in combination with lw and ld (or lx) should allow access to larger structs on stack.
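To get a feel for how far these two instructions reach from sp, here is a small sketch. It assumes the 5 bit immediate is signed (as stated above) and that x = log2(XLEN / 8), i.e. x = 3 on a 64 bit processor; the helper names are made up for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def reach(bits, shift):
    """Smallest and largest byte offset of a signed immediate scaled by 1 << shift."""
    lo = sign_extend(1 << (bits - 1), bits) << shift        # most negative encoding
    hi = sign_extend((1 << (bits - 1)) - 1, bits) << shift  # most positive encoding
    return lo, hi

XLEN = 64
x = (XLEN // 8).bit_length() - 1   # assumed scale: log2 of bytes per register

axisp = reach(5, x)   # addi rd, sp, imm << x
a7isp = reach(5, 7)   # addi rd, sp, imm << 7

print(axisp)   # (-128, 120)
print(a7isp)   # (-2048, 1920)
```

So on RV64, under these assumptions, axisp addresses 32 doubleword slots around sp, and a7isp steps in 128 byte blocks that a following load or store offset can then index into.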

room for ADDI.W and therefore room for JAL 
While addiw is relatively often used, my hunch is that for the addition of proper constant integers (as opposed to offsets in address calculations) the distribution is much more skewed to small integers with -16 <= n < 16, <handwaving> and that a lot of that is in preparing arguments for function calls, so forced to use the 3 bit registers (better name?) anyway</handwaving>. Most importantly, it frees up opcode space to allow JAL to be used for both 32 and 64 bit processors. In your table JAL has both static and dynamic frequencies in the 0.4 % range so this should more than make up for this loss.


room for ADDXI:

For address computations most additions are in units of XLEN / BYTE_SIZE. Therefore, on 64 bit processors, ADDI effectively has only 3 bits of range for address computations. In fact, table 14.7 in the RVC1.9 spec shows that ADDI is markedly less efficient for 64 bit even for the same benchmark (spec 2006 1.87% vs 1.19%), so this instruction gets you more range.
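The "only 3 bits of range" remark can be illustrated by counting how many values of an unscaled byte immediate are multiples of the register width. This is a sketch, assuming a 6 bit signed immediate for the unscaled ADDI (as in C.ADDI) and a 5 bit signed immediate for the scaled ADDXI; the function name is hypothetical.

```python
def aligned_slots(bits, bytes_per_slot):
    """Count values of a signed `bits`-bit byte immediate that are multiples of bytes_per_slot."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return sum(1 for imm in range(lo, hi + 1) if imm % bytes_per_slot == 0)

for xlen in (32, 64):
    bytes_per_slot = xlen // 8
    unscaled = aligned_slots(6, bytes_per_slot)  # ADDI with a 6-bit byte immediate
    scaled = 1 << 5                              # ADDXI: every 5-bit encoding is a usable slot
    print(xlen, unscaled, scaled)
# 32 16 32
# 64 8 32
```

On RV64 only 8 of the 64 encodings (3 bits' worth) land on a doubleword boundary, whereas the scaled form uses all 32 of its encodings for pointer-sized steps.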

A more radical approach is to switch the roles of C.ADDI and C.ADDXI  i.e. have a C.ADDXI 5rsd 6imm  (--> addi rsd rsd imm<<XLEN)  and C.ADDI 3rsd 5imm (-->addi rsd rsd imm)

 
What's gained:
- 25% of opcode space becomes available for future extensions
- ALU-REGOPS are now in their own major opcode (separate from ALU-immediate major opcode) - tidies up a currently messy opcode in RVC v1.9
- ALU-REGOPS has 5 bit function field, up to 32 register to register operations are possible - if desired, can put in full set of multiply/divide/add/sub/logical as per Xcondensed

For the Xcondensed proposal I threw in an analogue of everything in the RVIM spec to make things as similar as possible. For compression you don't need that so it is probably wise to forget about the shifts and division and possibly the comparison operators and create extension space in the 16 bit opcodes. Multiplication might work well with instruction fusion though!
 




You have to find a place for C.NOP and C.BREAK and opcode 00000000 has to remain  illegal, but I don't think that is a problem. Certainly you don't have the messy problems with CSR's.

In Xcondensed I managed to tuck away fences in the SXSP with RS1 = zero. I don't really know if the approach to fences I took is valid but if the memory model requires lots of fences that may be something to think about.

Bruce Hoult

Jan 5, 2017, 10:39:11 AM
to Rogier Brussee, RISC-V ISA Dev, Andrew Waterman
Yes. Like AVR and Nova (and unlike RISC-V) it uses condition codes to link a compare with a conditional branch (8 bit displacement = 256 instructions = 512 bytes)

Other characteristics for reference: 16 registers, load/store, 2 address arithmetic. Addressing is predecrement, postincrement, or 4 bit offset (multiplied by operand size) from register.

SH2A adds larger addressing and branch offsets, but in a 32-bit instruction.
