Proposal: RV16E

Cesar Eduardo Barros

Apr 1, 2018, 8:23:17 AM
to RISC-V ISA Dev
The RISC-V instruction set architecture is available in a 32-bit
variant, for small microcontrollers, a 64-bit variant, for servers and
workstations, and a future-proof 128-bit variant. For even smaller
microcontrollers, there is a reduced 32-bit variant (RV32E) which omits
half of the register set. But what if you have a need for an even
smaller microcontroller?

I present here a proposal for a RISC-V variant that's even smaller than
RV32E, yet still usable. I call it RV16E, and going in the opposite
direction of RV128I, it extends RISC-V downwards, with sixteen 16-bit
integer registers.

That is, XLEN=16, and like RV32E, only x0-x15 are available. Immediates
are also 16-bit only: for instructions like LUI, AUIPC or jumps, the
encoded immediate must equal the sign extension of its low 16 bits;
otherwise the instruction is invalid.

Going in the order the instructions are described in the manual:

- Registers x16-x31 are not available;
- For SLLI/SRLI/SRAI, the shamt field is reduced to 4 bits, the leftover
bit being always zero;
- For LUI/AUIPC, bits [31:16] of the immediate must be a copy of bit 15
of the immediate;
- For SLL/SRL/SRA, the shift amount is in the lower 4 bits of the register;
- For JAL, bits [20:16] of the offset must be a copy of bit 15 of the
offset;
- For LOAD/STORE, the available widths are only LB/SB, LH/SH, and LBU;
- Like with RV32E, counter instructions are optional, and floating point
not allowed.
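
The immediate constraints in the list above can be checked mechanically. A
minimal sketch in Python, assuming the rule is exactly "every bit from 16
upward repeats bit 15"; the function name fits_rv16e and its width
parameter are made up for illustration and are not part of the proposal:

```python
def fits_rv16e(imm: int, width: int) -> bool:
    """Return True if bits [width-1:16] of imm are all copies of bit 15.

    width is the size of the decoded immediate: 32 for LUI/AUIPC,
    21 for the JAL offset (bits [20:0]).
    """
    imm &= (1 << width) - 1               # keep only the decoded bits
    upper = imm >> 15                     # bit 15 and everything above it
    all_ones = (1 << (width - 15)) - 1    # pattern when bit 15 is set
    return upper == 0 or upper == all_ones


# LUI/AUIPC: bits [31:16] must repeat bit 15
print(fits_rv16e(0x00007000, 32))   # positive value, upper bits clear
print(fits_rv16e(0xFFFF8000, 32))   # negative value, upper bits set
print(fits_rv16e(0x00018000, 32))   # bit 16 set only: rejected

# JAL: bits [20:16] must repeat bit 15
print(fits_rv16e(0x1F8000, 21))
```

An assembler or decoder for the proposal would presumably apply the same
test before accepting or executing the instruction.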

Using compressed instructions with RV16E is clearly desirable, since for
instance C.LUI can replace nearly all uses of LUI. The RVC extension for
RV16E is based on RV32C, with the following modifications:

- C.LHSP replaces C.LWSP, and scales by 2 (imm is offset[5] and
offset[4:1|6])
- C.SHSP replaces C.SWSP, and scales by 2 (imm is offset[5:1|6])
- C.LH replaces C.LW, and scales by 2 (imm is offset[5:3] and offset[2:1])
- C.SH replaces C.SW, and scales by 2 (imm is offset[5:3] and offset[2:1])
- C.J, C.JAL, C.JR, C.JALR, C.BEQZ, C.BNEZ, C.LI stay the same
- C.LUI must have bits 17 and 16 of nzuimm identical to bit 15
- C.ADDI stays the same
- C.ADDI16SP is replaced by C.ADDI4SP (TODO: immediate encoding)
- C.ADDI4SPN is replaced by C.ADDI2SPN (TODO: immediate encoding)
- C.SLLI, C.SRLI, C.SRAI must have shamt[5] zero
- C.ANDI, integer register-register, illegal, C.NOP, C.EBREAK stay the same

The stack is aligned to 4 bytes, instead of 16 bytes. (TODO: check
immediate encodings)


The obvious disadvantage of RV16E is being able to address only 65536
bytes of memory, which has to be shared between the large 4-byte
instructions, data, and memory-mapped I/O. The traditional solution for
this is banking. I propose, therefore, a set of four BANKn CSRs, each
having up to 16 bits. The top two bits of the memory address would
select which CSR contains the bank number, while the lower 14 bits would
be the offset within the bank.
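
The address translation described in this paragraph could be sketched as
follows. This is only an illustration in Python, assuming the bank number
is simply concatenated above the 14-bit offset; the names translate and
banks are hypothetical:

```python
OFFSET_BITS = 14
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # 0x3FFF


def translate(addr16: int, banks: list[int]) -> int:
    """Map a 16-bit RV16E address to a physical address.

    banks holds the four BANKn CSR values (each up to 16 bits).
    The top two address bits pick the CSR; the low 14 bits are the
    offset inside the selected 16 KiB bank.
    """
    select = (addr16 >> OFFSET_BITS) & 0x3
    offset = addr16 & OFFSET_MASK
    return (banks[select] << OFFSET_BITS) | offset


banks = [0x0000, 0x0001, 0x0007, 0xFFFF]
print(hex(translate(0x4000, banks)))   # start of the window using BANK1
print(hex(translate(0x8005, banks)))   # offset 5 in the window using BANK2
print(hex(translate(0xC001, banks)))   # highest bank number
```

With 16-bit bank numbers and 14-bit offsets, the physical address is 30
bits wide.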

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Luke Kenneth Casson Leighton

Apr 1, 2018, 10:53:38 AM
to Cesar Eduardo Barros, RISC-V ISA Dev
On Sun, Apr 1, 2018 at 1:23 PM, Cesar Eduardo Barros
<ces...@cesarb.eti.br> wrote:

> The obvious disadvantage of RV16E is being able to address only 65536 bytes
> of memory, which has to be shared between the large 4-byte instructions,
> data, and memory-mapped I/O. The traditional solution for this is banking. I
> propose, therefore, a set of four BANKn CSRs, each having up to 16 bits. The
> top two bits of the memory address would select which CSR contains the bank
> number, while the lower 14 bits would be the offset within the bank.

ha, very cool: i was just going to ask about this (but in the context
of RV32* as i was considering investigating adding NUMA RV32* cores to
operate along-side SMP RV64GC cores, for multimedia video and 3D
graphics processing).

what you propose with BANKn CSRs reminds me of the Z80. that had
memory banks that could (shock, gasp) address up to 1MB of RAM (!).
from what i remember of the Z80 it was a pain: half the 64k memory was
bank-addressable and the other half not, and i don't believe you could
address multiple banks at once. this in turn meant that if you wished
to operate on two sets of bank-addressed memory you simply couldn't:
you had to *copy* one bank into the bottom 32k and then change the
bank address to refer to the other, do the operations and then copy
the results *back* to the first bank.

total pain.

i note however that you are proposing 4 BANK addresses. 2^14=..
16384. so the addressable memory range would be 16384. and with 2
bits in the top selecting which CSR you could... yes! simultaneously
address 4 separate different areas in memory. smart. i like it.

however.... presumably the BANKn CSR would need to be 18 bits not 16
in order to address the full 2^32 memory range? otherwise the memory
range is limited to 2^30 = 1GB of memory, not 4GB. it might not make
sense in a traditional micro-controller environment however i am used
to some really weird architectures: 2D grids of 4-bit ALUs (a company
in bristol, UK), 1D strings of 1-bit and later 2-bit ALUs (Aspex
Microelectronic Array-String Processor: massively wide SIMD: one
processor with 4096 ALUs with 256 bits of content-addressable RAM *per
ALU*). also, esperantotech (*waves to Allen*) have 4096 RV32 cores,
they might well have considered 8192 or 16384 RV16 cores, perhaps
fitting into the same die area if they are really that much smaller,
who knows.

so with that in mind, Cesar, had you considered BANK0 applying to the
first memory-address (a read) and BANK1 applying to the stores, and so
on? i don't know if it's possible to issue 2 reads (or two writes) in
a single RISC-V instruction.

or, having the BANKn CSRs be 32-bit (would require 2 16-bit
instructions to set each, i realise) and be *added* to the load/store,
turning all instructions into *relative* addresses, what about that?
developers could then choose to set the lower 14 bits to zero and
choose not to issue the 2nd of the bank-setting instructions, thus
effectively being functionally-identical to the idea that you propose,
and save on one instruction... but the advantage is, relative
addressing would allow inter-bank boundaries to be crossed without
needing to mess about with extra manual memory copying [and detection
of when such boundaries occur. yuck!]
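
a rough sketch of the relative-addressing variant, in Python. it assumes
(this is a guess at the details, not something spelled out above) that
the top two address bits still select the BANKn CSR and that the 32-bit
CSR value is added to the 14-bit offset; translate_relative is a made-up
name:

```python
MASK32 = 0xFFFFFFFF


def translate_relative(addr16: int, bases: list[int]) -> int:
    """Add a 32-bit base (BANKn) to the 14-bit offset of a 16-bit address."""
    select = (addr16 >> 14) & 0x3
    offset = addr16 & 0x3FFF
    return (bases[select] + offset) & MASK32


# a base with its low 14 bits clear behaves like plain banking
aligned = [0, 5 << 14, 0, 0]
print(hex(translate_relative(0x4123, aligned)))

# an unaligned base lets one window straddle a 16 KiB boundary
unaligned = [0, 0x3FF0, 0, 0]
print(hex(translate_relative(0x4020, unaligned)))
```

the second call lands just past a 16 KiB boundary without any copying,
which is the advantage being argued for.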

oooor.... is there 2 bits spare somewhere in the BANKn CSR setting
instruction which would allow the top 2 bits (17 and 18) to be written
to at the time that the other 16 bits were being loaded? bit of a
hack that... :)

really like the idea of tiny cores, cesar. love to see the BANKn idea
added to RV32 as well in some fashion.

l.

Cesar Eduardo Barros

Apr 1, 2018, 2:49:49 PM
to Luke Kenneth Casson Leighton, RISC-V ISA Dev
Em 01-04-2018 11:53, Luke Kenneth Casson Leighton escreveu:
> On Sun, Apr 1, 2018 at 1:23 PM, Cesar Eduardo Barros
> <ces...@cesarb.eti.br> wrote:
>
>> The obvious disadvantage of RV16E is being able to address only 65536 bytes
>> of memory, which has to be shared between the large 4-byte instructions,
>> data, and memory-mapped I/O. The traditional solution for this is banking. I
>> propose, therefore, a set of four BANKn CSRs, each having up to 16 bits. The
>> top two bits of the memory address would select which CSR contains the bank
>> number, while the lower 14 bits would be the offset within the bank.
>
> ha, very cool: i was just going to ask about this (but in the context
> of RV32* as i was considering investigating adding NUMA RV32* cores to
> operate along-side SMP RV64GC cores, for multimedia video and 3D
> graphics processing).

If you want to make RV32 access more memory, something like x86's PAE
(adding more levels to the page table) would be a better idea. Or, since
they are auxiliary cores, something like an IOMMU managed by the RV64
side, using the RV64 page table formats.

> what you propose with BANKn CSRs reminds me of the Z80. that had
> memory banks that could (shock, gasp) address up to 1MB of RAM (!).
> from what i remember of the Z80 it was a pain: half the 64k memory was
> bank-addressable and the other half not, and i don't believe you could
> address multiple banks at once. this in turn meant that if you wished
> to operate on two sets of bank-addressed memory you simply couldn't:
> you had to *copy* one bank into the bottom 32k and then change the
> bank address to refer to the other, do the operations and then copy
> the results *back* to the first bank.
>
> total pain.

Yes, it was directly inspired by the Z80.

> i note however that you are proposing 4 BANK addresses. 2^14=..
> 16384. so the addressable memory range would be 16384. and with 2
> bits in the top selecting which CSR you could... yes! simultaneously
> address 4 separate different areas in memory. smart. i like it.

I tried to balance the number of CSRs (which are a limited resource)
with the convenience of having multiple mappable ranges. Two would be
the minimum, I chose 4 to be more flexible.

> however.... presumably the BANKn CSR would need to be 18 bits not 16
> in order to address the full 2^32 memory range? otherwise the memory
> range is limited to 2^30 = 1GB of memory, not 4GB. it might not make
> sense in a traditional micro-controller environment however i am used
> to some really weird architectures: 2D grids of 4-bit ALUs (a company
> in bristol, UK), 1D strings of 1-bit and later 2-bit ALUs (Aspex
> Microelectronic Array-String Processor: massively wide SIMD: one
> processor with 4096 ALUs with 256 bits of content-addressable RAM *per
> ALU*). also, eperantotech (*waves to Allen*) have 4096 RV32 cores,
> they might well have considered 8192 or 16384 RV16 cores, perhaps
> fitting into the same die area if they are really that much smaller,
> who knows.

If XLEN is 16, each CSR can hold up to 16 bits. No exceptions. And 1GB
of memory ought to be enough for anybody ;-)

> so with that in mind, Cesar, had you considered BANK0 applying to the
> first memory-address (a read) and BANK1 applying to the stores, and so
> on? i don't know if it's possible to issue 2 reads (or two writes) in
> a single RISC-V instruction.
>
> or, having the BANKn CSRs be 32-bit (would require 2 16-bit
> instructions to set each, i realise) and be *added* to the load/store,
> turning all instructions into *relative* addresses, what about that?
> developers could then choose to set the lower 14 bits to zero and
> choose not to issue the 2nd of the bank-setting instructions, thus
> effectively being functionally-identical to the idea that you propose,
> and save on one instruction... but the advantage is, relative
> addressing would allow inter-bank boundaries to be crossed without
> needing to mess about with extra manual memory copying [and detection
> of when such boundaries occur. yuck!]
>
> oooor.... is there 2 bits spare somewhere in the BANKn CSR setting
> instruction which would allow the top 2 bits (17 and 18) to be written
> to at the time that the other 16 bits were being loaded? bit of a
> hack that... :)

That would be too much complexity for a joke proposal (check the date).
A more serious proposal could use a separate BANK_TOP CSR to hold the
top bits of the BANKn registers. That would give 20-bit bank numbers,
which should be way beyond plenty.
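
One way the BANK_TOP idea could work, sketched in Python. The layout is an
assumption made for illustration: a single 16-bit BANK_TOP CSR split into
four 4-bit fields, with field n supplying bits [19:16] of bank n. The
names bank_number, bank_csrs and bank_top are hypothetical:

```python
def bank_number(n: int, bank_csrs: list[int], bank_top: int) -> int:
    """Combine BANKn (16 bits) with its 4-bit field from BANK_TOP."""
    top = (bank_top >> (4 * n)) & 0xF     # field n of BANK_TOP
    return (top << 16) | (bank_csrs[n] & 0xFFFF)


bank_csrs = [0x0000, 0xABCD, 0x0000, 0x0001]
bank_top = 0x50F0                          # fields: bank3=5, bank1=0xF
print(hex(bank_number(1, bank_csrs, bank_top)))
print(hex(bank_number(3, bank_csrs, bank_top)))
```

Under this layout, 4 + 16 = 20 bits of bank number plus a 14-bit offset
would give a 34-bit physical address.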

> really like the idea of tiny cores, cesar. love to see the BANKn idea
> added to RV32 as well in some fashion.

I'm glad someone liked it ;-) Even though it's just an April 1st joke
proposal, I do believe the idea of "extending down" RISC-V into the
16-bit land might have some merit.

Christopher Celio

Apr 1, 2018, 3:01:40 PM
to Cesar Eduardo Barros, RISC-V ISA Dev
I have to applaud you for resisting the urge to add condition codes to RV16 (as overflow would be much more likely now!). Otherwise, it would be much harder to implement RV16 in a higher-performance environment like say an out-of-order core.

-Chris

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/0b7c0d18-a5e4-0696-fe0b-e3a25c07f8c7%40cesarb.eti.br.

Liviu Ionescu

Apr 1, 2018, 3:12:43 PM
to Cesar Eduardo Barros, Luke Kenneth Casson Leighton, RISC-V ISA Dev
On 1 April 2018 at 21:49:49, Cesar Eduardo Barros (ces...@cesarb.eti.br) wrote:

> Even though it's just an April 1st joke
> proposal,

a nice one, I would say. ;-)

> I do believe the idea of "extending down" RISC-V into the
> 16-bit land might have some merit.

come on!

from a software point of view, the ideal embedded core would be a
64-bit one (if you don't believe this, take a look at the recommended
method to access 64-bit timer registers on a 32-bit core). not to
mention multiply/divide instructions, more consistent with
double-precision floating point, etc.
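
for readers who have not seen it, one commonly used sequence for reading
a 64-bit counter through two 32-bit halves is "high, low, high again,
retry if the high half changed". a small Python simulation of that idea
follows; Timer64 is a toy model invented for the example, not a real
device interface:

```python
class Timer64:
    """Toy 64-bit timer that advances by one tick on every register read."""

    def __init__(self, start: int) -> None:
        self.value = start

    def read_lo(self) -> int:
        lo = self.value & 0xFFFFFFFF
        self.value += 1
        return lo

    def read_hi(self) -> int:
        hi = (self.value >> 32) & 0xFFFFFFFF
        self.value += 1
        return hi


def read_timer64(timer: Timer64) -> int:
    """Read both halves, retrying if the low half wrapped in between."""
    while True:
        hi = timer.read_hi()
        lo = timer.read_lo()
        if timer.read_hi() == hi:          # high half unchanged: consistent
            return (hi << 32) | lo


# start just below a carry into the high word to force one retry
print(hex(read_timer64(Timer64(0x1_FFFF_FFFE))))
```

a 64-bit core reads the same register with a single load, which is the
point being made.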

hopefully the 16-bit land will be a thing of the past, and remain so.


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 1, 2018, 4:36:11 PM
to Cesar Eduardo Barros, RISC-V ISA Dev
On Sun, Apr 1, 2018 at 7:49 PM, Cesar Eduardo Barros
<ces...@cesarb.eti.br> wrote:

> If you want to make RV32 access more memory, something like x86's PAE
> (adding more levels to the page table) would be a better idea.

oh ok. yes. nice.

> Or, since they are auxiliary cores, something like an IOMMU managed by
> the RV64 side, using the RV64 page table formats.

i was thinking along the lines of them being self-managing, so that
code synchronisation would not be needed.

> Yes, it was directly inspired by the Z80.

cool!

> If XLEN is 16, each CSR can hold up to 16 bits. No exceptions. And 1GB of
> memory ought to be enough for anybody ;-)

:)

>> oooor.... is there 2 bits spare somewhere in the BANKn CSR setting
>> instruction which would allow the top 2 bits (17 and 18) to be written
>> to at the time that the other 16 bits were being loaded? bit of a
>> hack that... :)
>
>
> That would be too much complexity for a joke proposal (check the date).

awww! i was just warming to the idea!

> A more serious proposal could use a separate BANK_TOP CSR to hold the
> top bits of the BANKn registers. That would give 20-bit bank numbers,
> which should be way beyond plenty.

yes. something like that. that's what i meant about having 2 CSRs
per bank, i just expressed it badly enough for you not to be able to
recognise it as such.

>> really like the idea of tiny cores, cesar. love to see the BANKn idea
>> added to RV32 as well in some fashion.
>
>
> I'm glad someone liked it ;-) Even though it's just an April 1st joke
> proposal, I do believe the idea of "extending down" RISC-V into the 16-bit
> land might have some merit.

for student projects (easier to implement, smaller resource FPGAs),
for washing machine processors, SIM cards, and sub-micro-amp power
scenarios, hell yes. the STM8S003 for example is a 20-pin TSSOP, it's
$0.24 in quantity *ONE* even from digikey (so imagine what the volume
price is in Shenzhen), it has 256 bytes of RAM and i believe 1k of
NAND and it's *awesome*. gets used in microwaves, washing machines,
fridges, the works. how many of _those_ are sold world-wide? just
because RV32 and above are sexy and modern doesn't mean that RISC-V as
a concept has to stop there. hell, 10 years ago i heard of a company
doing extremely well with an 8-bit fully-functioning processor that
only had *140 gates*.

also (liviu), as Aspex Microelectronics showed, when the processor
core is small enough such that it can be efficiently embedded as part
of a massively-replicable array that also has a small
Content-Addressable RAM in each element of the array, very very
interesting things start to become possible. certain applications
become literally a hundred times faster: pattern recognition, network
routing, video processing, neural networks and so on. unfortunately
they're also a couple orders of magnitude more of a bitch to program
but hey you can't have everything.

l.

Liviu Ionescu

Apr 1, 2018, 4:57:55 PM
to Cesar Eduardo Barros, Luke Kenneth Casson Leighton, RISC-V ISA Dev
On 1 April 2018 at 23:36:11, Luke Kenneth Casson Leighton (lk...@lkcl.net) wrote:

> also (liviu), as Aspex Microelectronics showed, when the processor
> core is small enough such that it can be efficiently embedded as part
> of a massively-replicable array that also has a small
> Content-Addressable RAM in each element of the array, very very
> interesting things start to become possible. ... unfortunately
> they're also a couple orders of magnitude more of a bitch to program
> but hey you can't have everything.

I have nothing against experimentation and research.

however, personally I'm more interested in improving the quality of
life for today's software guys, so that they can become more productive.
thus my proposal for a C/C++ friendly RISC-V microcontroller
architecture, where I am willing to trade some transistors for ease of
use.


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 1, 2018, 5:07:07 PM
to Liviu Ionescu, Cesar Eduardo Barros, RISC-V ISA Dev
On Sun, Apr 1, 2018 at 9:57 PM, Liviu Ionescu <i...@livius.net> wrote:

> I have nothing against experimentation and research.
>
> however, personally I'm more interested in improving the quality of
> life for the today software guys, in order to become more productive.
> thus my proposal for a C/C++ friendly RISC-V microcontroller
> architecture, where I accept to trade some transistors for ease of
> use.

totally get it, liviu. the embedded worlds (STM32F, STM8S, PICs)
which are absolutely enormous volumes (literally billions of units)
and the programs extremely small and often written in assembler or are
targets of sdcc or avr-utils (specialist subset c compilers) and the
general-purpose worlds (MIPS32/MIPS64, x86_64, ARM) are completely...
alien to each other. there's almost nothing in common.

l.

Tommy Murphy

Apr 1, 2018, 5:19:00 PM
to RISC-V ISA Dev, lk...@lkcl.net, ces...@cesarb.eti.br
Ha ha - well played - caught me out anyway! :-)

Ray Van De Walker

Apr 2, 2018, 3:37:15 PM
to RISC-V ISA Dev
I realize that this is an April-1st joke, but it is well-conceived, and sometimes there are not enough transistors.
For any newcomers, here is a link to an earlier set of RV16 proposals:
https://groups.google.com/a/groups.riscv.org/forum/#!searchin/isa-dev/16-bit/isa-dev/iK3enKGb5bw/cuVAq0J8EAAJ

Rogier Brussee

Apr 4, 2018, 10:27:43 AM
to RISC-V ISA Dev, i...@livius.net, jc...@gmail.com, jhaus...@gmail.com, k...@prtime.org, micha...@mac.com, xan...@gmail.com
(CC'ing Liviu Ionescu and Jacob Bachmeyer as something like this might be useful for the microcontroller spec, John Hauser because he proposed modifications for RV32E and RV32EC to better use halfword and byte-sized data, Kelly Dean because he proposed ideas on binary portability between RV32 and RV64, Michael Clark because of the RV8 JIT, and Xan Phung in recognition of his ideas on Xcondensed).

Thanks for linking to the Xcondensed proposal. My proposal was not so much a proposal for RV16 as an alternative to the C extension, using 16-bit wide instructions, with code compression characteristics comparable to those of the standard C extension when used in combination with the full 32-bit wide instructions. It has the following properties:


* Every 16-bit wide instruction maps to a 32-bit wide RV instruction.
* Reuse the parts of C that give 98% of the compression of RV code.
* Stand alone: just having the fixed 16-bit width instructions gives a simple but complete ISA. Without the 32-bit wide ISA, it would _not_ be RV (even with the 32-bit wide ISA, Xcondensed would still be a non-standard extension, so YMMV), but it is close enough that I think it could reuse most of the compiler infrastructure and experience with RV, simply because it is a fairly straightforward translation to a two-register form of RVIMA, just like, and mostly overlapping with, the C extension.

I then remarked that such an ISA would be a natural choice for a hypothetical RV16. I didn't really pursue this, although I thought it might be useful, but Waterman and others thought RV16 was a non-goal and an alternative to C was a bad idea because it was already done (although at the time C was still in review). In any case, this might have given the impression that Xcondensed was a proposal for RV16.

Xan Phung (https://groups.google.com/a/groups.riscv.org/forum/#!searchin/isa-dev/Xan$20Phung/isa-dev/YG80MMHMWis/y5TTkHcADwAJ) came up with a modification of the proposal that could be used for an alternative for C that uses only 2 quadrants and frees a quadrant for 32 bit instructions (which may well be the best use of the proposal). In particular he pointed out that register spilling is/could be done with XLEN sized load/store instructions, drastically reducing the need for L*SP and S*SP instructions.  

Targeting RVE (i.e. 16 registers) and using Xan's ideas makes things a little less cramped, even leaving room for a few CSRs. Note that this does not preclude using registers x16-x31 if the 32-bit ISA is available: you just cannot use them with condensed instructions. Xcondensed should give an ISA that allows binaries to be efficiently binary compatible between RV32Xcondensed and RV64Xcondensed processors by simply using w instructions where ints are meant rather than pointers or longs, and throwing in the occasional sextw or zextw sign or zero extension, which would be nops on RV32. I will indicate how I think it should be changed to get an RV16EXcondensed and portability between RV16EXcondensed/RV32EXcondensed (or 16/32 portability for short).

A flat address space is assumed, including for the RV16 case!

notation by example:

4rsd = registers x0-x15 used as rd and rs1

4rs2* = registers x1-x15 used as rs2

3rs1 = registers x8-x15  used as rs1 encoded as in the C extension

5imm = 5bit immediate.

7imm* = 7bit nonzero immediate

# = comment

#instruction = instruction that does not currently exist in RVIMA but might become one in B or small integer extension 

lx = lh for rv16, lw for rv32, ld for rv64, lq for rv128 (or maps to a hypothetical lx instruction with an immediate in units of XLEN/8 bytes). Also, x is used to indicate a shift by log2(XLEN/8).

@auipc zero imm: break 5imm =  sneak in the break instruction by reusing the li opcode with rsd = zero 

newline  = new 5bit opcode.
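
The XLEN-dependent scaling in the notation above (a shift by log2(XLEN/8))
can be illustrated with a few lines of Python. This is only a sketch of
the arithmetic; the helper names xlen_shift and lx_offset are invented for
the example:

```python
def xlen_shift(xlen: int) -> int:
    """log2(XLEN/8): 1 for RV16, 2 for RV32, 3 for RV64, 4 for RV128."""
    return (xlen // 8).bit_length() - 1


def lx_offset(imm: int, xlen: int) -> int:
    """Byte offset encoded by an lx/sx immediate counted in XLEN-sized units."""
    return imm << xlen_shift(xlen)


for xlen in (16, 32, 64, 128):
    print(xlen, xlen_shift(xlen), lx_offset(5, xlen))
```

The same encoded immediate therefore reaches the fifth register-sized slot
on every XLEN, which is what makes lx and sx suitable for register spills.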


li         4rd*   5imm     addi rd zero sext(imm)

li_7     4rd*   5imm     addi rd zero sext(imm<<7)       

lui       4rd*   5imm     lui    rd sext(imm)                      #aka li_12

auipc  4rd*   5imm     auipc rd sext(imm)                    #aka ai_12pc


addi    4rsd* 7imm*     addi rsd rsd sext(imm, XLEN)


ai_xsp 4rd* 7imm*       add rd sp sext(imm<<7)   #for stack adjustment and pointers into stack. Assume stack is 2*XLEN/8 aligned. Alternatively define addi16sp  imm add sp sp imm<<4  (aka addi_4sp) for rd = sp


addwi     4rsd* 5imm    addwi rsd rsd sext(imm<<7)                                 #replace with addhi for 16/32 portability

addi_7    4rsd* 5imm*  addi rsd rsd sext(5imm << 7)

addi_x    4rsd* 5imm*  addi  rsd rsd sext(5imm << log2(XLEN/8))

addi_5x  4rsd* 5imm*  addi rsd rsd sext(5imm << (log2(XLEN/8)+5))


beqz   4rs1*  7imm*    beq rs1 zero imm


bnez   4rs1*  7imm*    bne rs1 zero imm


jalri     4rsd 7imm      jalr rsd rsd sext(imm)   #mainly useful for millicode: use li_7 t0 MILICODE_BASE; jalr t0 imm for a millicode call, and li_7 t1 MILICODE_BASE; jalr t1 imm for a tail call

 

lxsp    4rd*  7imm      l[h/w]  rd sext(imm<<log2(XLEN/8))(sp)


lx        3rd  3rs1 5imm  l[h/w] rd  zext(imm<<log2(XLEN/8))(rs1)


lw        3rd  3rs1 5imm  lh      rd  zext(imm<<1)(rs1)        #replace with lh 3rd 3rs1 5imm    for 16/32 portability


lbu      3rd  3rs1 3imm  lbu    rd  zext(imm)(rs1)              # replace all with lbu, 3rd 3rs1 5imm   for 16/32  portability

lh        3rd  3rs1 3imm  lh      rd  zext(imm<<1)(rs1)        # replace with ld for 64/128 portability

flw       3rd 3rs1 3imm  flw     rd  zext(imm<<2)(rs1)

fld       3rd 3rs1 3imm  fld      rd  zext(imm<<3)(rs1)


sxsp   4rs1* 7imm         s[h/w] rs1 sext(imm <<log2(XLEN/8))(sp) 


sx       3rd 3rs1 5imm  s[h/w]  rs1 zext(imm<<log2(XLEN/8))(sp)


sw       3rd 3s1  5imm  sh       rs1 zext(imm<<1)(sp)          #replace with sh  3rs1 3rs2 5imm  for 16/32 portability


sb      3rs1  3rs2 3imm  sb     rs2 zext(imm)(rs1)                # replace with sb, 3rd 3rs1 5imm   for 16/32  portability

sh       3rs1  3rs2 3imm  sh      rs2 zext(imm<<1)(rs1)

fsw     3rs1  3rs2  3imm fsw    rs2 zext(imm<<2)(rs1)

fsd     3rs1  3rs2 3imm   fsd     rs2 zext(imm<<3)(rs1)

  

auipc_ra   11imm        auipc ra sext(imm) for 32/64        


jalr_rara    11imm         jalr ra ra imm<<1                             #use in combination with auipc_ra.  Fusable to effectively jal ra 22imm


j               11imm*         jal zero sext(imm)


jal            11imm*         jal ra sext(imm)                       


andi        4rsd* 5imm    andi rsd rsd sext(imm)                     #imm == 0 and imm == -1 are both useless.

slli           4rsd* 5imm    slli rsd rsd   sext(imm)                     #imm ==0 encodes 32 for 32/64

srli          4rsd* 5imm     srli rsd rsd  sext(imm)                     #likewise

srai         4rsd* 5imm     srai rsd rsd sext(imm)                     #likewise


add           4rsd* 4rs2*   add rsd rsd rs2 

sub           4rsd* 4rs2*   sub rsd rsd rs2

addw        4rsd* 4rs2*   addw rsd rsd rs2                             #replace with addh for 16/32,  to follow the letter of the RV spec simply map to add instead of addw in RV32

subw        4rsd* 4rs2*   subw rsd rsd rs2                             #replace with subh for 16/32,  to follow the letter of the RV spec simply map to sub instead of subw in RV32

slt             4rsd* 4rs2    slt rsd rsd rs2

sltu           4rsd* 4rs2    sltu rsd rsd rs2

mv           4rd*  4rs1*    add rd rs1 zero

jalr           4rd   4rs1*    jalr rd rs1 0

                                                       

and          3rsd 3rs2    and rsd rsd rs2

or             3rsd 3rs2    or rsd rsd rs2

xor           3rsd 3rs2    xor rsd rsd rs2

#addh      3rsd 3rs2                                                                          #superfluous for 16/32

sll            3rsd 3rs2    sll rsd rsd rs2

srl            3rsd 3rs2    srl rsd rsd rs2

sra           3rsd 3rs2    sra rsd rsd rs2

#rll           3rsd 3rs2                                                                          #rs2 is taken mod XLEN, therefore negative values rotate right.

mul          3rsd 3rs2    mul rsd rsd rs2

mulh        3rsd 3rs2    mulh rsd rsd rs2

mulhsu    3rsd 3rs2    mulhsu rsd rsd rs2

mulhu      3rsd 3rs2    mulhu rsd rsd rs2

div           3rsd 3rs2    div rsd rsd rs2

divu         3rsd 3rs2    divu rsd rsd rs2

rem          3rsd 3rs2   rem rsd rsd rs2

remu        3rsd 3rs2   remu rsd rsd rs2

not           3rd 3rs1     xori rd rs1 -1

sllx           3rd 3rs1     slli rd rs1  x 

#sextb     3rd 3rs1

#sexth     3rd 3rs1

#sextw    3rd 3rs1

#zextb     3rd 3rs1

#zexth     3rd 3rs1

#zextw    3rd 3rs1

#popc     3rd 3rs1

#clz        3rd 3rs1

#bswap  3rd 3rs1

   

lr       rd rs1        lr rd rs1

sc     rsd rs1       sc rsd rs1 rsd

lrw    rd rs1         lrw rsd rs1 rsd                                                   #for 16/32 portability just drop

scw   rsd rs2       scw rsd rs1 rsd                                                 #for 16/32 portability just drop

amoadd         3rsd 3rs1         amoadd.aqrl rsd rs1 rsd

amoaddw       3rsd 3rs1        amoaddw.aqrl rsd rs1 rsd              #for 16/32 portability just drop

amoswap       3rsd 3rs1         amoswap.aqrl rsd rs1 rsd         

amoand         3rsd 3rs1         amoand.aqrl rsd rs1 rsd

amoor            3rsd 3rs1         amoor.aqrl   rsd rs1 rsd

amoxor          3rsd 3rs1         amoxor.aqrl    rsd rs1 rsd

memadd        3rsd 3rs1         amoadd.        rsd rs1 rsd              #no ordering, but indivisible 

memaddw      3rsd 3rs1         amoaddw.     rsd rs1 rsd              #no ordering, but indivisible; for 16/32 portability just drop

memswap      3rsd 3rs1         amoswap.        rsd rs1 rsd           #no ordering, but indivisible

memand        3rsd 3rs1         amoand.        rsd rs1 rsd              #no ordering, but indivisible

memor           3rsd 3rs1         amoor.          rsd rs1 rsd               #no ordering, but indivisible

memxor         3rsd 3rs1         amoxor.         rsd rs1 rsd              #no ordering, but indivisible

csrrw              3rsd imm7      csrrw rsd rsd   map(imm7)            #mapping TBD                                

csrrs               3rsd imm7      csrrs  rsd rsd  map(imm7)     

csrrc               3rsd imm7      csrrc  rsd rsd  map(imm7)

csrr                 3rd  imm7      csrrc  rd zero map(imm7)



@li zero      0                :  designated illegal

@li zero      1                :  ecall

@li zero      1imm[3:0]  :  break 4imm                                         #different breaks are useful for hosted environments

@li_7 zero  0imm[3:0]  :  mfence 4imm   fence.imm0000       

@li_7 zero  1imm[3:0]  :  iofence 4imm   fence 0000imm

@lui   zero  0                :  ifence    

      


@addi zero 0               : designated nop
@beq  zero 0               : wfi 

On Monday, 2 April 2018 at 21:37:15 UTC+2, ray.vandewalker wrote:

Luke Kenneth Casson Leighton

Apr 4, 2018, 11:22:54 AM
to Rogier Brussee, RISC-V ISA Dev, Liviu Ionescu, jcb...@gmail.com, jhaus...@gmail.com, k...@prtime.org, micha...@mac.com, xan...@gmail.com
On Wed, Apr 4, 2018 at 3:27 PM, Rogier Brussee <rogier....@gmail.com> wrote:
> (CC'ing Liviu Ionescu and Jacob Bachmeyer as something like this might be
> useful for the microcontroller spec, John Hauser because he proposed
> modifications for RV32E and RV32EC to better use halfword and bytesize
> data, Kelly Dean because he proposed ideas on binary portability between
> RV32 and RV64, Michael Clark because of RV8 JIT, and Xan Phung in
> recognition of his ideas on Xcondensed).
>
> Thanks for linking to the Xcondensed proposal. My proposal was not so much a
> proposal for RV16 as for an alternative for the C extension using 16bit wide
> instructions with a comparable code compression characteristics as the
> standard C extension when used in combination with the full 32bit wide
> instructions with the following properties:

if you (collectively) don't mind me throwing in a curveball, i've
been on these lists for only a couple of months so have been catching
up, and i've seen quite a fair share of proposals and questions about
support for 8 and 16-bit integer operations, as well as some 16-bit FP
justifications.

such operations appear to be misaligned (haha) sorry *ma*ligned,
perplexingly with the justification "the world is going 64-bit, we
tolerate 32-bit, why on earth would you want 16 and 8 bit arithmetic"
and as a result there is load and store with zero and sign-extend into
the *full* extent of the 32/64-bit registers.

where such operations (8 and 16 bit) make sense is when you perform
multiples of those in parallel (Vector/SIMD), or you need to do
bit-level manipulation. so let's take a look...

* B Extension: place-holder
* V-Extension: i love it for its power and potential: sadly it's so
complex and comprehensive and all-or-nothing that there is only one
implementation, and that's not been published.

at this point we can quietly say to ourselves a single word:

"...oops".

to address the problem of V-Extension being too complex and
comprehensive, i raised the following topic / question a couple of
days ago:
https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/GuukrSjgBH8

borrowing from the V-Extension, the proposal basically boils down to a
single instruction: "please implicitly tag register N as being a
vector of length M, such that operations on register N implicitly
actually are carried out simultaneously across registers N through
N+M-1".
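
A minimal C sketch of the tagging idea above, for illustration only. The names `RegFile`, `set_vlen` and `exec_add` are hypothetical, and two details are assumptions rather than part of the proposal: that the tag on the destination register decides the length, and that elements are processed one after another.

```c
#include <stdint.h>

#define NUM_REGS 32

/* Hypothetical model: every register carries a vector-length tag.
 * A tag of 0 or 1 means "plain scalar register". */
typedef struct {
    uint32_t x[NUM_REGS];    /* integer register file */
    uint8_t  vlen[NUM_REGS]; /* tag: number of consecutive registers in the vector */
} RegFile;

/* Hypothetical "tag register n as a vector of length m" instruction. */
static void set_vlen(RegFile *rf, unsigned n, unsigned m)
{
    rf->vlen[n] = (uint8_t)m;
}

/* An ordinary ADD rd, rs1, rs2. If rd is tagged with length m, the add is
 * applied to rd..rd+m-1, reading rs1..rs1+m-1 and rs2..rs2+m-1.
 * Assumption: the destination's tag decides the length. */
static void exec_add(RegFile *rf, unsigned rd, unsigned rs1, unsigned rs2)
{
    unsigned m = rf->vlen[rd] ? rf->vlen[rd] : 1;
    for (unsigned i = 0; i < m; i++) {
        if (rd + i >= NUM_REGS || rs1 + i >= NUM_REGS || rs2 + i >= NUM_REGS)
            break;           /* this sketch simply stops at the end of the file */
        if (rd + i != 0)     /* x0 stays hard-wired to zero */
            rf->x[rd + i] = rf->x[rs1 + i] + rf->x[rs2 + i];
    }
}
```

With x4 tagged as length 3, a single `exec_add(&rf, 4, 8, 12)` behaves like three scalar adds into x4, x5 and x6. A real design would also have to define what happens when source and destination ranges overlap, which this sketch does not address.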

the basic assumption of this proposal was that it would be possible to
use the RV32 instructions to say "i'd like to do 32-bit vector
operations", and the RV64 instructions to say "i'd like to do 64-bit
vector operations", and i hadn't quite thought through how to do 8 and
16-bit operations completely.

i guess i kinda assumed that 16-bit operations were possible... but
was then shocked to find that they weren't *anywhere* in the spec: you
*have* to use the load/store with zero/sign-extend operations, and, in
a vector/SIMD world that's not going to fly as it wastes huge amounts
of space (and cycles).

with apologies at not quite being able to remember who it was (was it
richard herveille? i think it was you, wasn't it?), someone previously
raised the puzzling lack of symmetry in the instruction set: there is
an "implicit-sized" add, a 32-bit add and a 64-bit add.

the implicit-sized operations *might* be the saving grace by which
it's possible to extricate from this hole, by borrowing (again) from
the V-Extension, by being able to say "please implicitly tag register
N as being of width M" where M is 8, 16, "original size" or
"future-reserved" (2 bits to store that). this would over-rule the
default "implicit-sized" operations to be of size M, indefinitely.
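
One possible reading of the width tag, sketched in C under stated assumptions. The 2-bit code values and the name `tagged_add` are hypothetical. Leaving the upper bits of the destination untouched follows what a later message in this thread describes; the reserved code is simply treated like "original size" here.

```c
#include <stdint.h>

/* Hypothetical 2-bit width tag; the numeric codes are an assumption. */
enum width_tag { W_8 = 0, W_16 = 1, W_XLEN = 2, W_RESERVED = 3 };

/* "Implicit-sized" add on a 32-bit register whose tag is `tag`.
 * Only the low 8 or 16 bits of the destination are written; the bits
 * above the tagged width keep their old value. */
static uint32_t tagged_add(uint32_t old_rd, uint32_t a, uint32_t b,
                           enum width_tag tag)
{
    uint32_t mask;
    switch (tag) {
    case W_8:  mask = 0xFFu;       break;
    case W_16: mask = 0xFFFFu;     break;
    default:   mask = 0xFFFFFFFFu; break; /* W_XLEN; W_RESERVED treated alike */
    }
    return (old_rd & ~mask) | ((a + b) & mask);
}
```

For example, a 16-bit tagged add of 0xFFFF and 1 wraps to 0 in the low half while the upper half of the old destination value survives.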

so in this way, rogier, there are a couple of possibilities:

(A) rather than add RV16 (and even RV8!) to the existing instruction
set, registers are "tagged" into a CSR with a size (exactly as is
proposed in V-Extension, right now). by setting the Vector length
equal to one, you have the means to use the "implicit-sized"
operations.

(B) you *still* add RV16 (and possibly even RV8) *not* so much because
someone might want to implement stand-alone 16-bit or 8-bit processors
(they might) but because those instructions would become *part of
RV32/64/128*.

so in the case that you describe, rogier, of condensed instructions in
the proposed RV16, they might not actually matter as much. to make
that clear: the counter-arguments against RV16 *did not take into
account* the fact that RV16 (or RV8) operations would be accessible to
RV32/64/128, and as such could reduce the burden of implementing a B
extension proposal, and also a simplified V extension proposal.

if Bit-wise operations are *forced* to be carried out on the full
(default) bit-width, how on earth do you do 16-bit rotate when it's
needed? or 8-bit rotate? it has to be *explicitly* coded into the
actual Bit-wise instruction, doesn't it?

and in some cases you *really cannot* do full (default, 32/64/128)
bit-wise operations. for example here:
https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/zi_7B15kj6s/w9y_KHM0AwAJ
clifford points out that the proposed BGS (bit gatherer and shuffler)
instruction is limited deliberately to only 16 bits, because the
amount of control/bit-manipulation information required to arbitrarily
swap greater than 16 bits is simply too large.

.... but if there was an instruction which explicitly allowed the
operand size to be reduced to 16 or 8 bits, that problem is solved, is
it not?

anyway apologies to you, rogier, i'm not ignoring what you described :)

l.

Liviu Ionescu

Apr 4, 2018, 11:22:58 AM
to Rogier Brussee, RISC-V ISA Dev
On 4 April 2018 at 17:27:44, Rogier Brussee (rogier....@gmail.com) wrote:

> CC'ing Liviu Ionescu ... as something like
> this might be useful for the microcontroller spec
...
> an alternative for the C extension using 16bit wide instructions
> with a comparable code compression characteristics as the standard
> C extension

Thank you, Rogier.

As far as the microcontroller proposal is concerned, I already
mentioned that it does not focus on changes to the instruction set,
but to making the architecture more C/C++ friendly.

So, any instruction sets and encodings that will be agreed for the
privileged profile will probably be ok for the microcontroller profile
too, except the ABI, which needs a redesign to reduce the number of
registers saved by the caller and possibly be consistent with the
RV32E reduced number of registers.

In my opinion, in a well designed architecture, the actual user
should have nothing to do with the instruction set at all, the
toolchain must deal with these details, not the end user.

This does not mean that the instruction set is not important, it
obviously is, but it is not the cornerstone of the microcontroller
profile.

In addition, although I agree that there may be use cases that I did
not think about, I would not go below 32-bit registers and memory
space.


Regards,

Liviu

lk...@lkcl.net

Apr 4, 2018, 11:56:02 AM
to RISC-V ISA Dev, jha...@gmail.com, i...@livius.net, jcb6...@gmail.com, ke...@prtime.org, xan....@gmail.com, michae...@mac.com
oof, whoops rogier, the cc list was borked! :)  also i tracked down some cross-references to the various discussions you mention, for the benefit of people who may not have seen them (or wish to re-read and refresh their memories).


On Wednesday, April 4, 2018 at 3:27:43 PM UTC+1, Rogier Brussee wrote:
(CC'ing Liviu Ionescu and Jacob Bachmeyer as  something like this might be useful for the microcontroller spec,

 
John Hauser because he proposed modifications for RV32E  and RV32EC to better use halfword and bytesize data,

 
Kelly Dean because he proposed ideas on binary portability between RV32 and RV64,

 i *think* it's this link (kelly, rogier, can you confirm?)

Michael Clark because of RV8 JIT,

this looks relevant (also link to http://rv8.io)

ok whoops that's important to note that it's "rv8 as in like google v8" *NOT* repeat *NOT* "RV8" as in "RV8, RV16, RV32, RV64, RV128...."

and Xan Phung in recognition of his ideas on Xcondensed). 

i believe you may be referring to this, rogier?

for completeness, and for the benefit of the people who were (attempted to be!) cc'd, an archive link to the full message that rogier sent is here:

l.

Ray Van De Walker

Apr 4, 2018, 12:40:18 PM
to RISC-V ISA Dev
The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.
But, 32 registers are too many for most register allocators to use well, so I have always thought
this wasted some bits, and was a real opportunity for improvement.
If R-format instructions were recast for 16 registers, 3 bits are freed for an orthogonal size field.
Other formats could extend the immediate fields.
Five of the sizes are obvious: 8, 16, 32, 64, 128. The three unused sizes could handle the misty future.
The float set's R-mode instructions can then encode the D and Q R-format instructions.
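
To make the bit counting concrete, here is a hedged C sketch. The standard R-type field positions (opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, rs2 in 24:20, funct7 in 31:25) come from the base ISA. Everything else is an assumption made for illustration: reusing the freed top bit of each register field (bits 11, 19 and 24) as a 3-bit size field, the size codes 0..4 for 8..128 bits, and the names `RFields16`, `pack_r16`, `unpack_r16` and `size_bits`.

```c
#include <stdint.h>

/* Hypothetical recast R-format: 4-bit register numbers, 3-bit size code. */
typedef struct {
    unsigned rd, rs1, rs2; /* x0..x15 only */
    unsigned size_code;    /* assumed: 0=8, 1=16, 2=32, 3=64, 4=128 bits */
} RFields16;

static uint32_t pack_r16(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                         RFields16 f)
{
    return ((funct7 & 0x7Fu) << 25)
         | (((f.size_code >> 2) & 1u) << 24) /* freed top bit of rs2 */
         | ((f.rs2 & 0xFu) << 20)
         | (((f.size_code >> 1) & 1u) << 19) /* freed top bit of rs1 */
         | ((f.rs1 & 0xFu) << 15)
         | ((funct3 & 0x7u) << 12)
         | ((f.size_code & 1u) << 11)        /* freed top bit of rd */
         | ((f.rd & 0xFu) << 7)
         | (opcode & 0x7Fu);
}

static RFields16 unpack_r16(uint32_t insn)
{
    RFields16 f;
    f.rd  = (insn >> 7)  & 0xFu;
    f.rs1 = (insn >> 15) & 0xFu;
    f.rs2 = (insn >> 20) & 0xFu;
    f.size_code = (((insn >> 24) & 1u) << 2)
                | (((insn >> 19) & 1u) << 1)
                |  ((insn >> 11) & 1u);
    return f;
}

/* Operand width in bits for a size code; 0 for the three unused codes. */
static unsigned size_bits(unsigned size_code)
{
    return size_code <= 4 ? 8u << size_code : 0u;
}
```

With size code 0 and registers below x16, the packed word is bit-identical to the existing encoding of the same instruction, which is exactly why such a recast would clash with the frozen ISA unless it lived in a different opcode space.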

Another way to handle sizes and types would have load instructions tag registers.
The tags then become part of the instruction decoding. That's a very different ISA, however.
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAPweEDzK_P4KdV7unEhrS9ojsNFsHbQLLJP3dzYX_UP5QPNAsA%40mail.gmail.com.

Luke Kenneth Casson Leighton

Apr 4, 2018, 1:46:31 PM
to Ray Van De Walker, RISC-V ISA Dev
On Wed, Apr 4, 2018 at 5:40 PM, Ray Van De Walker
<ray.van...@silergy.com> wrote:

> The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.

the V extension marks (tags) registers with a size field. section
17.5. it's termed "element width". sizes are 8 16 32 64 128 and
"disabled". oh... i hadn't noticed before, there's... don't quite
understand the latter paragraphs (table 17.6)

17.12 then goes on to describe the vector instruction format(s),
17.14 describes the polymorphism feature (implicit type-casting from
int to float including automatic zero/sign-extension).

so there are two divergent aspects:

(1) what i proposed does not need a size field to be added to the
ISA. it *implicitly* marks registers as containing 8-bit (or 16-bit)
values, where the top bits would (implicitly) be left unaltered.

(2) i was asking if RV16 (and RV8?) were practical to add, with their
own complete ISA, such that there becomes now a separate add, separate
div, separate mul and so on, each carrying out 16-bit (or 8-bit)
operations respectively, *such that*, when added to the ISA, they
*augment* the RV32 and RV64 ISAs in an *identical* way to that which
the RV32 ISA augments the RV64 ISA.

> If R-format instructions were recast for 16 registers, 3 bits are freed for an orthogonal size field.

so am i correct in understanding that this would (because I set is
frozen) be a hypothetical but alternative way to gain 8-bit and 16-bit
operations, and that (1) or (2) above (as separate and distinct from
the hypothetical R-format recast) would still be feasible and/or worth
exploring?

l.

Richard Herveille

Apr 4, 2018, 3:01:00 PM
to Ray Van De Walker, RISC-V ISA Dev


Sent from my iPhone

> On 4 Apr 2018, at 18:40, Ray Van De Walker <ray.van...@silergy.com> wrote:
>
> The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.

The ISA is frozen. Any changes must ensure backward compatibility.
Adding a register size control register would not break the ISA and would be fully backward compatible.

Richard
> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/DM5PR2001MB1019E3765F62C55BD3B25071F0A40%40DM5PR2001MB1019.namprd20.prod.outlook.com.

lkcl .

Apr 4, 2018, 3:20:51 PM
to Richard Herveille, Ray Van De Walker, RISC-V ISA Dev
On Wed, Apr 4, 2018 at 8:00 PM, Richard Herveille
<richard....@roalogic.com> wrote:

>> On 4 Apr 2018, at 18:40, Ray Van De Walker <ray.van...@silergy.com> wrote:
>>
>> The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.
>
> The ISA is frozen. Any changes must ensure backward compatibility.
> Adding a register size control register would not break the ISA and be fully backward compatible.

thanks for clarifying, richard.

l.

Rogier Brussee

Apr 5, 2018, 6:21:15 AM
to RISC-V ISA Dev, rogier....@gmail.com, i...@livius.net, jcb...@gmail.com, jhaus...@gmail.com, k...@prtime.org, micha...@mac.com, xan...@gmail.com


On Wednesday, 4 April 2018 at 17:22:54 UTC+2, lk...@lkcl.net wrote:
First, Xcondensed is _primarily_ about an _alternative for the C extension_ for RV32/RV64 that you _could_ use as a standalone 16-bit ISA, and that would give a natural binary-compatible upgrade path from 32 to 64 bit.
If you leave out the 32-bit-wide instructions it would be merely "RV inspired", though. But indeed I also indicated how it could be modified to do something similar for a hypothetical RV16/RV32, in a thread that literally started as an April Fools' joke. However, people noted that 16 bit is not quite dead, because running the clock of a microwave works fine in 16 bit and might cost less. _If_ you want a 16-bit CPU that is "RV16", then it is going to be constrained, fixed-length 16-bit instructions will be a natural fit, and standalone Xcondensed may be close enough.


(A) rather than add RV16 (and even RV8!) to the existing instruction
set, registers are "tagged" into a CSR with a size (exactly as is
proposed in V-Extension, right now).  by setting the Vector length
equal to one, you have the means to use the "implicit-sized"
operations.

(B) you *still* add RV16 (and possibly even RV8) *not* so much because
someone might want to implement stand-alone 16-bit or 8-bit processors
(they might) but because those instructions would become *part of
RV32/64/128*.

That was basically what John Hauser proposed: adding addh, subh and addhi. But he proposed to use the space reserved for the w instructions, so that would not work well with your vector instructions.
I would think that addh rd rs1 rs2 (and perhaps subh), which includes sexth, and something that gives zexth, like addhu rd rs1 rs2  :  add rs1 zext(rs2, 16), and similarly for b, should be enough. Immediate instructions are expensive in encoding space.
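
A small C model of these two halfword adds on a 32-bit register, as I read the description above. This is a sketch of the semantics only: `addh` sign-extends the 16-bit sum, `addhu` adds the zero-extended low half of rs2 to rs1. No encoding is implied and the function names simply mirror the mnemonics.

```c
#include <stdint.h>

/* addh rd, rs1, rs2: 16-bit add whose result is sign-extended to 32 bits. */
static uint32_t addh(uint32_t rs1, uint32_t rs2)
{
    uint32_t sum = (rs1 + rs2) & 0xFFFFu;
    return (sum ^ 0x8000u) - 0x8000u; /* copy bit 15 into bits 31:16 */
}

/* addhu rd, rs1, rs2: rs1 plus the zero-extended low halfword of rs2. */
static uint32_t addhu(uint32_t rs1, uint32_t rs2)
{
    return rs1 + (rs2 & 0xFFFFu);
}
```

Under this reading, `addh(x, 0)` acts as sexth and `addhu(0, x)` acts as zexth, which is why no separate extend instructions would be needed.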

 
 

so in the case that you describe, rogier, of condensed instructions in
the proposed RV16, they might not actually matter as much.  to make
that clear: the counter-arguments against RV16 *did not take into
account* the fact that RV16 (or RV8) operations would be accessible to
RV32/64/128, and as such could reduce the burden of implementing a B
extension proposal, and also a simplified V extension proposal.

if Bit-wise operations are *forced* to be carried out on the full
(default) bit-width, how on earth do you do 16-bit rotate when it's
needed?  or 8-bit rotate? it has to be *explicitly* coded into the
actual Bit-wise instruction, doesn't it?

use shifts or add  sll[h/b] srad[h/b] srl[h/b] and rll[h/b] instructions.
(note no immediates!) I don't know if they are worth it though.
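
For reference, the "use shifts" route for a 16-bit rotate on a wider register looks roughly like this in C. The name `rotl16` is illustrative, and returning the result zero-extended is a choice made for this sketch.

```c
#include <stdint.h>

/* Rotate the low 16 bits of x left by n (taken modulo 16).
 * The upper 16 bits of the result are zero. */
static uint32_t rotl16(uint32_t x, unsigned n)
{
    uint32_t h = x & 0xFFFFu; /* isolate the halfword */
    n &= 15u;
    return ((h << n) | (h >> ((16u - n) & 15u))) & 0xFFFFu;
}
```

On plain RV32I this costs several instructions (mask, two shifts, or, mask), which is the overhead a dedicated halfword rotate would remove.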

 
and in some cases you *really cannot* do full (default, 32/64/128)
bit-wise operations.  for example here:
https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/zi_7B15kj6s/w9y_KHM0AwAJ
clifford points out that the proposed BGS (bit gatherer and shuffler)
instruction is limited deliberately to only 16 bits, because the
amount of control/bit-manipulation information required to arbitrarily
swap greater than 16 bits is simply too large.

....  but if there was an instruction which explicitly allowed the
operand size to be reduced to 16 or 8 bits, that problem is solved, is
it not?

anyway apologies to you, rogier, i'm not ignoring what you described :)

No need. Thanx for your reaction. 

l.

Rogier Brussee

Apr 5, 2018, 6:33:32 AM
to RISC-V ISA Dev, rogier....@gmail.com


On Wednesday, 4 April 2018 at 17:22:58 UTC+2, Liviu Ionescu wrote:
On 4 April 2018 at 17:27:44, Rogier Brussee (rogier....@gmail.com) wrote:

> CC'ing Liviu Ionescu ... as something like
> this might be useful for the microcontroller spec
...
> an alternative for the C extension using 16bit wide instructions
> with a comparable code compression characteristics as the standard
> C extension

Thank you, Rogier.

As far as the microcontroller proposal is concerned, I already
mentioned that it does not focus on changes to the instruction set,
but to making the architecture more C/C++ friendly.

Which seems eminently sensible. 


So, any instruction sets and encodings that will be agreed for the
privileged profile will probably be ok for the microcontroller profile
too, except the ABI, which needs a redesign to reduce the number of
registers saved by the caller and possibly be consistent with the
RV32E reduced number of registers.


Yes. That inspired me to tinker with my original proposal to target 16 registers and
retain a few CSRs despite their cost in encoding. I already had MMIO-based registers.

 
In my opinion, in a well designed architecture, the actual user
should have nothing to do with the instruction set at all, the
toolchain must deal with these details, not the end user.


Agreed completely.
 
This does not mean that the instruction set is not important, it
obviously is, but it is not the cornerstone of the microcontroller
profile.


And rightly so, RV32EC should work fine. The C extension is a bit wasteful
if you only use 16 registers however. 
 
In addition, although I agree that there may be use cases that I did
not think about, I would not go below 32-bit registers and memory
space.


 
Xcondensed is primarily about RV32/RV64. 
 
Regards,


Thanx for your time!

Rogier
 
Liviu

Michael Chapman

Apr 5, 2018, 6:45:53 AM
to Rogier Brussee, RISC-V ISA Dev, i...@livius.net, jcb6...@gmail.com, jhause...@gmail.com, ke...@prtime.org, michae...@mac.com, xan....@gmail.com

Our proprietary 32 bit CPUs are often used to replace 8 and 16 bit cores and usually run code which originates from a code base which was created for an 8 or 16 bit cpu.

We have only 32 bit operations, but we do have [s|z]ext[b|h] instructions. However, we find that these are actually rarely required and are required even less in well written code.

The biggest use for the extend instructions is when a loop counter has been declared as something like uint8_t or uint16_t instead of just int and there is a possibility that the value could wrap around. In many cases it is possible for the compiler to avoid generating the extend instructions as it is easy enough to ascertain that the value will not ever wrap around.
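
Two illustrative C loops showing the difference. In the first, the 8-bit counter may wrap, so on a machine with only 32-bit arithmetic the compiler has to truncate the counter back to 8 bits after each increment. In the second, the bound guarantees no wrap, so a compiler can in principle drop the extend. Function names are made up for this sketch.

```c
#include <stdint.h>

/* Counts how many increments take an 8-bit counter from `start` to `limit`.
 * The counter can pass 255 and wrap to 0, so each i++ needs a zero-extend
 * (or equivalent mask) when registers are 32 bits wide. */
static unsigned steps_until(uint8_t start, uint8_t limit)
{
    unsigned steps = 0;
    for (uint8_t i = start; i != limit; i++)
        steps++;
    return steps;
}

/* Sums the first n bytes of buf. Because i < n <= 255 holds before every
 * increment, i + 1 never exceeds 255 and no wrap-around is possible. */
static unsigned sum_first(const uint8_t *buf, uint8_t n)
{
    unsigned s = 0;
    for (uint8_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```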

My opinion is that you should fix the compiler rather than adding half word or byte signed and unsigned add instructions to the ISA. There are very few occasions where they will be useful - even on code written for 8 or 16 bit cores.

We do have an option to support unaligned accesses on all our cores. On our smallest cores, customers very rarely use this option - even when their code is coming from an 8 or 16 bit processor.

I think unaligned accesses should be prohibited and dropped from the specification. At the moment the spec says that an unaligned access could be very slow, in which case code will avoid ever using it, and then there is no point in having it in the spec at all.

A bit field insert instruction is often useful for deeply embedded code. I.e. an instruction which can take the n least significant bits from a register and insert them at an arbitrary position in another register without upsetting the other bits. This can be used for coding rotates as well.
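
A C model of such a bit field insert, as a sketch only. The operand order and the name `bfi` are assumptions, not the instruction of any particular core. The second function shows one way a rotate can be composed from two inserts plus a shift.

```c
#include <stdint.h>

/* Insert the n least significant bits of src into dst at bit position pos,
 * leaving all other bits of dst unchanged. */
static uint32_t bfi(uint32_t dst, uint32_t src, unsigned pos, unsigned n)
{
    if (n == 0 || pos >= 32)
        return dst;
    uint32_t field = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t mask = field << pos; /* bits that will be overwritten */
    return (dst & ~mask) | ((src << pos) & mask);
}

/* 32-bit rotate left by r, built from two inserts and one right shift. */
static uint32_t rotl32_via_bfi(uint32_t x, unsigned r)
{
    r &= 31u;
    if (r == 0)
        return x;
    uint32_t out = bfi(0, x, r, 32u - r);  /* low 32-r bits move up by r */
    return bfi(out, x >> (32u - r), 0, r); /* top r bits wrap to the bottom */
}
```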

16 registers is plenty for most applications we see. For RV32E I would still allow the possibility to have single precision floating point, but would not encode them into 16 bit instructions. I would also not have a separate floating point register file but use the same registers as for integer instructions. This reduces the context size required for each task in a small embedded RTOS and again, in practice for most code we see - even with floating point, 16 registers is enough.

Even in floating point intensive applications, there is little point in using up 16 bit instruction space with the floating point instructions. Leave them all as 32 bits.

Liviu Ionescu

Apr 5, 2018, 6:53:41 AM
to Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 13:33:34, Rogier Brussee (rogier....@gmail.com) wrote:

> > As far as the microcontroller proposal is concerned, I already
> > mentioned that it does not focus on changes to the instruction
> set,
> > but to making the architecture more C/C++ friendly.
>
> Which seems eminently sensible.

Can you elaborate? I'm open to any suggestions.

Regards,

Liviu

Rogier Brussee

Apr 5, 2018, 7:00:03 AM
to RISC-V ISA Dev, jha...@gmail.com, i...@livius.net, jcb6...@gmail.com, ke...@prtime.org, xan....@gmail.com, michae...@mac.com


On Wednesday, 4 April 2018 at 17:56:02 UTC+2, lk...@lkcl.net wrote:
oof, whoops rogier, the cc list was borked! :)  also i tracked down some cross-references to the various discussions you mention, for the benefit of people who may not have seen them (or wish to re-read and refresh their memories).


Oops.
 
On Wednesday, April 4, 2018 at 3:27:43 PM UTC+1, Rogier Brussee wrote:
(CC'ing Liviu Ionescu and Jacob Bachmeyer as  something like this might be useful for the microcontroller spec,

 
John Hauser because he proposed modifications for RV32E  and RV32EC to better use halfword and bytesize data,

 

Yep.
 
Kelly Dean because he proposed ideas on binary portability between RV32 and RV64,

 i *think* it's this link (kelly, rogier, can you confirm?)


Yep.
 
Michael Clark because of RV8 JIT,

this looks relevant (also link to http://rv8.io)

Yep.
 
 
Yes, that thread resulted in making the  auipc ra high(imm); jalr ra ra low(imm)  pattern officially blessed. In fact it is so useful that I consider C.jalr_ra_ra 11imm:   jalr  ra ra imm<<1 a good candidate for using the remaining reserved slot in C, but that would need more data (e.g. a Linux distribution with and without this instruction).
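
For readers unfamiliar with the pattern: auipc takes a 20-bit upper immediate and jalr adds a sign-extended 12-bit immediate, so a 32-bit pc-relative offset has to be split with a rounding adjustment. A C sketch of that split follows; the names `HiLo`, `split_offset` and `recombine` are illustrative only.

```c
#include <stdint.h>

typedef struct {
    uint32_t hi20; /* value for the auipc immediate field (20 bits) */
    int32_t  lo12; /* value for the jalr immediate, in -2048..2047  */
} HiLo;

/* Split a 32-bit pc-relative offset into high(imm) and low(imm).
 * Adding 0x800 before the shift compensates for jalr sign-extending
 * its 12-bit immediate. */
static HiLo split_offset(int32_t off)
{
    HiLo r;
    uint32_t u = (uint32_t)off;
    r.hi20 = (u + 0x800u) >> 12;
    r.lo12 = (int32_t)((u & 0xFFFu) ^ 0x800u) - 0x800;
    return r;
}

/* What the hardware computes relative to pc: (hi << 12) + sext(lo). */
static uint32_t recombine(HiLo p)
{
    return (p.hi20 << 12) + (uint32_t)p.lo12;
}
```

For example, an offset of 0x12345FFF splits into hi20 = 0x12346 and lo12 = -1.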


ok whoops that's important to note that it's "rv8 as in like google v8" *NOT* repeat *NOT* "RV8" as in "RV8, RV16, RV32, RV64, RV128...."


Correct.
 
and Xan Phung in recognition of his ideas on Xcondensed). 

i believe you may be referring to this, rogier?


Rogier Brussee

Apr 5, 2018, 7:01:40 AM
to RISC-V ISA Dev


On Wednesday, 4 April 2018 at 18:40:18 UTC+2, ray.vandewalker wrote:
The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.
But, 32 registers are too many for most register allocators to use well, so I have always thought
this wasted some bits, and was a real opportunity for improvement.
If R-format instructions were recast for 16 registers, 3 bits are freed for an orthogonal size field.
Other formats could extend the immediate fields.
Five of the sizes are obvious: 8, 16, 32, 64, 128. The three unused sizes could handle the misty future.

size  = XLEN 

Rogier Brussee

Apr 5, 2018, 8:04:25 AM
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, 5 April 2018 at 12:53:41 UTC+2, Liviu Ionescu wrote:
What you already know and stated: people are more expensive and important than transistors. They choose the path of least resistance and least cost. Therefore, focusing on usability trumps (comparatively small) technical advantages or savings for success in the market. C/C++ code is obviously far easier to use, less expensive and less bug-prone than assembler. Since, as you point out, in the embedded world you have to interact with (and debug) low-level features like interrupt handling, an easy-to-understand, reliable, and easy-to-debug programming model for low-level features will be a deciding factor. Also, if assembler is needed, RV, at least the non-privileged part, is about as easy to understand as it gets, and will be (relatively) widely taught*. This means that fewer guru points are required to use assembler, but also that the intricacies of low-level control features (CSRs, cough) become relatively more important stumbling blocks for using that skill effectively.

*This would be the one reason why I could imagine a 16-bit RV processor having some success: if there is reason enough to bear the pain of a 16-bit processor, then at least make it as easy as possible to deal with.


Regards,

Liviu

Liviu Ionescu

Apr 5, 2018, 8:50:33 AM
to Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 15:04:27, Rogier Brussee (rogier....@gmail.com) wrote:

> ... people are more expensive and important than transistors.

nicely said. probably it should be the motto of the microcontroller profile.

> ... an easy to understand,
> reliable, and easy to debug programming model for low level features will
> be a deciding factor.

I hope it will.

unfortunately I am not aware of any formal efforts to advance the
RISC-V microcontroller profile proposal :-(


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 5, 2018, 9:12:39 AM
to Michael Chapman, Rogier Brussee, RISC-V ISA Dev, Liviu Ionescu, Jacob Bachmeyer, John Hauser, Kelly Dean, Michael Clark, xan....@gmail.com
On Thu, Apr 5, 2018 at 11:46 AM, Michael Chapman <michael.c...@gmail.com> wrote:


My opinion is that you should fix the compiler rather than adding half word or byte signed and unsigned add instructions to the ISA. There are very few occasions where they will be useful - even on code written for 8 or 16 bit cores.


if the only use-case was general-purpose computing i would absolutely agree with you.  however would it not be reasonable to want to use RISC-V for 3D Graphics, Video Processing, Cryptographic algorithms, Audio processing, and many many more applications for which you would normally easily spend $500k for 3D, $200k for video, $100k for a cryptographic co-processor and $50k-$100k for an audio DSP and so on?  these are not uncommon use-case scenarios [understatement: Xtensa were proud to announce their *billionth* license of their Audio DSP hard macro, about ten years ago!]

that's an awful lot of money to be spending when you _could_ .... if RISC-V supported it... use a B-Extended SIMD/Simple-Vector extended 8/16-bit-capable RISC-V processor instead [hell, the cost of licensing the above hard macros *alone* justifies putting a team together to make that happen!].

looking at jeff bush's nyuzi 3D GPU analysis [1], he points out that the reason why software-defined GPUs have failed is because it's not the amount of processing that's so much the issue, it's the amount of power needed for the SRAM / L1 cache.  once you've got the data into the ALU,  it's *really* important to do as much work as possible before writing it back out of the registers.

if we want RISC-V to be successful in really rather high-profile mass-volume uses (cryptography, DSP work, Video, 3D, Tensors for AI), we *really* need to think beyond just the "general-purpose" scenario.


We do have an option to support unaligned accesses on all our cores. On our smallest cores, customers very rarely use this option - even when their code is coming from an 8 or 16 bit processor.

I think unaligned accesses should be prohibited and dropped from the specification. At the moment the spec says that an unaligned access could be very slow, in which case code will avoid ever using it, and then there is no point in having it in the spec at all.

i also considered suggesting the same thing (to prohibit unaligned memory access).  however... how would you then do audio processing of data that comes in from a DMA buffer, in 8, 16, 24 or 32-bit configurations (back-to-back samples with no word-alignment)?  someone buys an off-the-shelf AC97 hard macro... they pay $50k to $100k for it and they *can't read the data*???  or they have to jump through insane hoops to get at it, by doing a multiply (shift by 8 or 16), then & to mask out unwanted bits, then divide (shift by 16 or 24) to get the lower bits?  and do that on almost every single or every other audio sample?
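
To illustrate the kind of hoop-jumping meant here, a C sketch that reads back-to-back 24-bit samples using byte loads only, which is what software is left with when misaligned word loads are unavailable or slow. The little-endian sample layout and the name `read_s24le` are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read signed 24-bit sample number idx from a buffer of packed samples
 * (3 bytes each, no padding, assumed little-endian). Three byte loads,
 * two shifts, two ORs and a sign-extension per sample. */
static int32_t read_s24le(const uint8_t *buf, size_t idx)
{
    const uint8_t *p = buf + 3 * idx;
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16);
    return (int32_t)(v ^ 0x800000u) - 0x800000; /* sign-extend bit 23 */
}
```

Every second sample in such a buffer starts at an odd address, so a word-aligned load can never fetch one in a single access.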

so at audio data rates one might imagine that doing that would be fine... but for video processing (1080p60, which is nearly 500 mbytes per second of bandwidth for 32-bit pixels), you might think that going to 24-bit or 16-bit would save on bandwidth, but in CPU cycles the above hoops to jump through would... you get the idea.
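
The bandwidth figure can be checked with a few lines of C. This assumes an uncompressed 1920x1080 frame at 60 frames per second; the function name is made up for the sketch.

```c
#include <stdint.h>

/* Raw video bandwidth in bytes per second for uncompressed frames. */
static uint64_t video_bytes_per_second(uint32_t width, uint32_t height,
                                       uint32_t fps, uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * fps * bytes_per_pixel;
}
```

With 4 bytes per pixel this gives 497,664,000 bytes per second, i.e. the "nearly 500 mbytes" quoted above. With 3 bytes per pixel it is 373,248,000 and with 2 bytes per pixel 248,832,000.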



A bit field insert instruction is often useful for deeply embedded code. I.e. an instruction which can take the n least significant bits from a register and insert them at an arbitrary position in another register without upsetting the other bits. This can be used for coding rotates as well.

 

16 registers is plenty for most applications we see. For RV32E I would still allow the possibility to have single precision floating point, but would not encode them into 16 bit instructions. I would also not have a separate floating point register file but use the same registers as for integer instructions.

yes: i was quite surprised to see that FP has a separate register file.  it makes sense from a perspective of an optimised implementation where the FPU runs separately from an ALU (and those FENCE instructions are used to keep stuff in order).  or.... no actually it doesn't make sense at all :)

l.

Luke Kenneth Casson Leighton

Apr 5, 2018, 9:18:07 AM
to Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev
On Thu, Apr 5, 2018 at 1:50 PM, Liviu Ionescu <i...@livius.net> wrote:
> On 5 April 2018 at 15:04:27, Rogier Brussee (rogier....@gmail.com) wrote:
>
>> ... people are more expensive and important than transistors.
>
> nicely said. probably it should be the motto of the microcontroller profile.

:)

>> ... an easy to understand,
>> reliable, and easy to debug programming model for low level features will
>> be a deciding factor.
>
> I hope it will.
>
> unfortunately I am not aware of any formal efforts to advance the
> RISC-V microcontroller profile proposal :-(

*and* the B-Extension working group was shut down (annoying its
external contributors) *and* V-Extension is stalled (i learned that
Hwacha was terminated in 2017, it's listed as a "former project" here:
http://people.eecs.berkeley.edu/~krste/)

whaaat's gooing ooon?

l.

Richard Herveille

Apr 5, 2018, 9:53:14 AM
to Luke Kenneth Casson Leighton, Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev, Richard Herveille

> http://people.eecs.berkeley.edu/~krste/)

The V-extensions are not stalled.
Hwacha is an implementation of a vector processor, but it is not compatible with the proposed V-extensions.
See Esperanto Technology’s presentations.

Richard

> whaaat's gooing ooon?
>
> l.

 


Liviu Ionescu

Apr 5, 2018, 10:00:56 AM
to Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 16:18:01, Luke Kenneth Casson Leighton (lk...@lkcl.net) wrote:

> ... whaaat's gooing ooon?

well, on the Linux front, lots of things.

outside the Linux world...  nothing. :-(

and don't expect any change soon, Krste clearly stated that the
official position is to maintain compatibility with the privileged
specs. which will probably discourage any use of RISC-V in
microcontrollers.

regards,

Liviu


On 17 March 2018 at 02:16:23, kr...@berkeley.edu (kr...@berkeley.edu) wrote:

> ... The
> new task group is looking at extending interrupt behavior, but
> with a
> view to maintaining backwards compatibility and to support
> dual-use
> cores that run either real-time or virtual-memory code.

Richard Herveille

Apr 5, 2018, 10:06:48 AM
to Liviu Ionescu, Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev, Richard Herveille

 

> > ... whaaat's gooing ooon?
>
> well, on the Linux front, lots of things.
>
> outside the Linux world...  nothing. :-(
>
> and don't expect any change soon, Krste clearly stated that the
> official position is to maintain compatibility with the privileged
> specs. which will probably discourage any use of RISC-V in
> microcontrollers.

There’s nothing stopping us from writing a microcontroller spec that does not comply with the privileged spec but still complies with the user spec and the other extensions.

Richard

> regards,
>
> Liviu
>
> On 17 March 2018 at 02:16:23, kr...@berkeley.edu (kr...@berkeley.edu) wrote:
>
> > ... The
> > new task group is looking at extending interrupt behavior, but
> > with a
> > view to maintaining backwards compatibility and to support
> > dual-use
> > cores that run either real-time or virtual-memory code.

 


Luke Kenneth Casson Leighton

Apr 5, 2018, 10:12:57 AM