Proposal: RV16E


Cesar Eduardo Barros

Apr 1, 2018, 8:23:17 AM
to RISC-V ISA Dev
The RISC-V instruction set architecture is available in a 32-bit
variant, for small microcontrollers, a 64-bit variant, for servers and
workstations, and a future-proof 128-bit variant. For even smaller
microcontrollers, there is a reduced 32-bit variant (RV32E) which omits
half of the register set. But what if you have a need for an even
smaller microcontroller?

I present here a proposal for a RISC-V variant that's even smaller than
RV32E, yet still usable. I call it RV16E, and going in the opposite
direction of RV128I, it extends RISC-V downwards, with sixteen 16-bit
integer registers.

That is, XLEN=16, and like RV32E, only x0-x15 are available. Immediates
are also 16-bit only: for instructions like LUI, AUIPC or jumps, the
immediate must be sign-extended before being encoded into the
instruction, otherwise it's an invalid instruction.
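The immediate rule can be made concrete with a small validity check. This is a sketch, not part of the proposal: `is_rv16e_immediate` is a hypothetical helper that takes the immediate as it would be encoded for RV32 (for LUI/AUIPC, the 20-bit field placed in bits [31:12] of a 32-bit value; for JAL, the 21-bit offset) and accepts it only if every bit above bit 15 is a copy of bit 15, i.e. the value is a sign-extended 16-bit quantity.

```python
def is_rv16e_immediate(imm: int, field_bits: int) -> bool:
    """Hypothetical check: does a field_bits-wide two's-complement immediate
    hold a sign-extended 16-bit value (all bits above bit 15 copy bit 15)?"""
    if field_bits < 1:
        raise ValueError("field_bits must be positive")
    imm &= (1 << field_bits) - 1
    # Interpret the raw field as a signed integer.
    if imm >> (field_bits - 1):
        imm -= 1 << field_bits
    return -(1 << 15) <= imm < (1 << 15)


# LUI/AUIPC: the 20-bit field lands in bits [31:12] of a 32-bit immediate.
print(is_rv16e_immediate(0xFFFF8 << 12, 32))  # True: 0xFFFF8000 sign-extends 0x8000
print(is_rv16e_immediate(0x00008 << 12, 32))  # False: bit 15 set, bits above clear
# JAL: 21-bit offset, so bits [20:16] must copy bit 15.
print(is_rv16e_immediate(0x1F8000, 21))       # True: this is -32768
```

Branch offsets (13 bits) and I-type immediates (12 bits) always pass, which is why the proposal only has to call out LUI, AUIPC and jumps.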

Going in the order the instructions are described in the manual:

- Registers x16-x31 are not available;
- For SLLI/SRLI/SRAI, the shamt field is reduced to 4 bits, the leftover
bit (shamt[4]) always being zero;
- For LUI/AUIPC, bits [31:16] of the immediate must be a copy of bit 15
of the immediate;
- For SLL/SRL/SRA, the shift amount is in the lower 4 bits of the register;
- For JAL, bits [20:16] of the offset must be a copy of bit 15 of the
offset;
- For LOAD/STORE, the available widths are only LB/SB, LH/SH, and LBU;
- Like with RV32E, counter instructions are optional, and floating point
not allowed.
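Taking the shift amount from the low 4 bits of the register makes the register-register shifts behave as in the following sketch, which models 16-bit registers as Python ints. It only illustrates the rules listed above; it is not a reference model from the proposal.

```python
XLEN = 16
MASK = (1 << XLEN) - 1


def sll(rs1: int, rs2: int) -> int:
    """RV16E SLL: shift amount is rs2[3:0]; result truncated to 16 bits."""
    return ((rs1 & MASK) << (rs2 & 0xF)) & MASK


def srl(rs1: int, rs2: int) -> int:
    """RV16E SRL: logical right shift by rs2[3:0]."""
    return (rs1 & MASK) >> (rs2 & 0xF)


def sra(rs1: int, rs2: int) -> int:
    """RV16E SRA: arithmetic right shift by rs2[3:0] (sign bit is bit 15)."""
    value = rs1 & MASK
    if value & 0x8000:
        value -= 1 << XLEN  # reinterpret as a negative number
    return (value >> (rs2 & 0xF)) & MASK


print(hex(sll(0x0001, 17)))  # 0x2, because 17 & 0xF == 1
print(hex(srl(0x8000, 4)))   # 0x800
print(hex(sra(0x8000, 4)))   # 0xf800
```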

Using compressed instructions with RV16E is clearly desirable, since for
instance C.LUI can replace nearly all uses of LUI. The RVC extension for
RV16E is based on RV32C, with the following modifications:

- C.LHSP replaces C.LWSP, and scales by 2 (imm is offset[5] and
offset[4:1|6])
- C.SHSP replaces C.SWSP, and scales by 2 (imm is offset[5:1|6])
- C.LH replaces C.LW, and scales by 2 (imm is offset[5:3] and offset[2:1])
- C.SH replaces C.SW, and scales by 2 (imm is offset[5:3] and offset[2:1])
- C.J, C.JAL, C.JR, C.JALR, C.BEQZ, C.BNEZ, C.LI stay the same
- C.LUI must have bits 17 and 16 of nzuimm identical to bit 15
- C.ADDI stays the same
- C.ADDI16SP is replaced by C.ADDI4SP (TODO: immediate encoding)
- C.ADDI4SPN is replaced by C.ADDI2SPN (TODO: immediate encoding)
- C.SLLI, C.SRLI, C.SRAI must have shamt[5:4] zero
- C.ANDI, integer register-register, illegal, C.NOP, C.EBREAK stay the same
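As an illustration of the C.LHSP immediate layout (offset[5] in one field, offset[4:1|6] in the other), here is a sketch of an encoder and decoder. Since the proposal only says that C.LHSP "replaces" C.LWSP, it assumes C.LHSP keeps C.LWSP's CI-format bit positions (offset[5] in bit 12, the 5-bit field in bits [6:2]), quadrant 2 (op = 0b10), funct3 = 0b010, and rd = x0 being reserved; `encode_c_lhsp` and `decode_c_lhsp` are hypothetical names.

```python
def encode_c_lhsp(rd: int, offset: int) -> int:
    """Encode the proposed C.LHSP rd, offset(sp) as a 16-bit word.

    Assumes C.LWSP's layout: funct3=010 in [15:13], offset[5] in bit 12,
    rd in [11:7], offset[4:1|6] in [6:2], op=10 in [1:0].
    """
    if not 1 <= rd <= 15:
        raise ValueError("rd must be x1-x15 (x0 reserved, x16-x31 absent)")
    if offset % 2 or not 0 <= offset <= 126:
        raise ValueError("offset must be even and in 0..126")
    bit12 = (offset >> 5) & 1        # offset[5]
    bits6_3 = (offset >> 1) & 0xF    # offset[4:1]
    bit2 = (offset >> 6) & 1         # offset[6]
    return (0b010 << 13) | (bit12 << 12) | (rd << 7) | (bits6_3 << 3) | (bit2 << 2) | 0b10


def decode_c_lhsp(instr: int) -> tuple:
    """Inverse of encode_c_lhsp: return (rd, offset)."""
    rd = (instr >> 7) & 0x1F
    offset = (((instr >> 12) & 1) << 5) | (((instr >> 3) & 0xF) << 1) | (((instr >> 2) & 1) << 6)
    return rd, offset


print(hex(encode_c_lhsp(8, 2)))              # 0x440a
print(decode_c_lhsp(encode_c_lhsp(8, 126)))  # (8, 126)
```

Scaling by 2 with the same six immediate bits gives C.LHSP a reach of 0..126 bytes, half of C.LWSP's 0..252.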

The stack is aligned to 4 bytes, instead of 16 bytes. (TODO: check
immediate encodings)


The obvious disadvantage of RV16E is being able to address only 65536
bytes of memory, which has to be shared between the large 4-byte
instructions, data, and memory-mapped I/O. The traditional solution for
this is banking. I propose, therefore, a set of four BANKn CSRs, each
having up to 16 bits. The top two bits of the memory address would
select which CSR contains the bank number, while the lower 14 bits would
be the offset within the bank.
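A minimal sketch of the address translation this banking scheme implies, assuming the bank number is simply concatenated with the 14-bit offset. The CSR values are passed in as a plain list; `translate` is a hypothetical name, not part of the proposal.

```python
from typing import Sequence

WINDOW_BITS = 14                      # each window is 2**14 = 16 KiB
WINDOW_MASK = (1 << WINDOW_BITS) - 1


def translate(addr: int, banks: Sequence[int]) -> int:
    """Map a 16-bit RV16E address to a physical address using the values
    of four hypothetical BANK0..BANK3 CSRs (16 bits each)."""
    if len(banks) != 4:
        raise ValueError("expected exactly four BANKn values")
    addr &= 0xFFFF
    select = addr >> WINDOW_BITS      # top two bits choose BANKn
    offset = addr & WINDOW_MASK       # low 14 bits are the offset in the bank
    return ((banks[select] & 0xFFFF) << WINDOW_BITS) | offset


banks = [0x0000, 0x0001, 0x0002, 0x1234]
print(hex(translate(0xC010, banks)))  # window 3 -> bank 0x1234, offset 0x10: 0x48d0010
```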

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Luke Kenneth Casson Leighton

Apr 1, 2018, 10:53:38 AM
to Cesar Eduardo Barros, RISC-V ISA Dev
On Sun, Apr 1, 2018 at 1:23 PM, Cesar Eduardo Barros
<ces...@cesarb.eti.br> wrote:

> The obvious disadvantage of RV16E is being able to address only 65536 bytes
> of memory, which has to be shared between the large 4-byte instructions,
> data, and memory-mapped I/O. The traditional solution for this is banking. I
> propose, therefore, a set of four BANKn CSRs, each having up to 16 bits. The
> top two bits of the memory address would select which CSR contains the bank
> number, while the lower 14 bits would be the offset within the bank.

ha, very cool: i was just going to ask about this (but in the context
of RV32* as i was considering investigating adding NUMA RV32* cores to
operate along-side SMP RV64GC cores, for multimedia video and 3D
graphics processing).

what you propose with BANKn CSRs reminds me of the Z80. that had
memory banks that could (shock, gasp) address up to 1MB of RAM (!).
from what i remember of the Z80 it was a pain: half the 64k memory was
bank-addressable and the other half not, and i don't believe you could
address multiple banks at once. this in turn meant that if you wished
to operate on two sets of bank-addressed memory you simply couldn't:
you had to *copy* one bank into the bottom 32k and then change the
bank address to refer to the other, do the operations and then copy
the results *back* to the first bank.

total pain.

i note however that you are proposing 4 BANK addresses. 2^14=..
16384. so the addressable memory range would be 16384. and with 2
bits in the top selecting which CSR you could... yes! simultaneously
address 4 separate different areas in memory. smart. i like it.

however.... presumably the BANKn CSR would need to be 18 bits not 16
in order to address the full 2^32 memory range? otherwise the memory
range is limited to 2^30 = 1GB of memory, not 4GB. it might not make
sense in a traditional micro-controller environment however i am used
to some really weird architectures: 2D grids of 4-bit ALUs (a company
in Bristol, UK), 1D strings of 1-bit and later 2-bit ALUs (Aspex
Microelectronic Array-String Processor: massively wide SIMD: one
processor with 4096 ALUs with 256 bits of content-addressable RAM *per
ALU*). also, Esperanto Technologies (*waves to Allen*) have 4096 RV32 cores,
they might well have considered 8192 or 16384 RV16 cores, perhaps
fitting into the same die area if they are really that much smaller,
who knows.
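The arithmetic behind the 1GB versus 4GB figures: a physical address is the bank number concatenated with the 14-bit in-bank offset, so its width is the BANKn width plus 14.

```latex
\begin{aligned}
16 + 14 &= 30 \text{ bits} &\Rightarrow\quad 2^{30}\ \text{bytes} &= 1\ \text{GiB},\\
18 + 14 &= 32 \text{ bits} &\Rightarrow\quad 2^{32}\ \text{bytes} &= 4\ \text{GiB}.
\end{aligned}
```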

so with that in mind, Cesar, had you considered BANK0 applying to the
first memory-address (a read) and BANK1 applying to the stores, and so
on? i don't know if it's possible to issue 2 reads (or two writes) in
a single RISC-V instruction.

or, having the BANKn CSRs be 32-bit (would require 2 16-bit
instructions to set each, i realise) and be *added* to the load/store,
turning all instructions into *relative* addresses, what about that?
developers could then choose to set the lower 14 bits to zero and
choose not to issue the 2nd of the bank-setting instructions, thus
effectively being functionally-identical to the idea that you propose,
and save on one instruction... but the advantage is, relative
addressing would allow inter-bank boundaries to be crossed without
needing to mess about with extra manual memory copying [and detection
of when such boundaries occur. yuck!]
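The "added, not concatenated" variant suggested here could look like the sketch below. The 32-bit base values and the function name are assumptions for illustration. With the low 14 bits of every base cleared it reduces to the concatenation scheme; an unaligned base lets a 16 KiB window start at any byte address.

```python
from typing import Sequence

WINDOW_BITS = 14
WINDOW_MASK = (1 << WINDOW_BITS) - 1


def translate_relative(addr: int, bases: Sequence[int]) -> int:
    """Hypothetical variant: physical = 32-bit BANKn base + 14-bit offset."""
    addr &= 0xFFFF
    select = addr >> WINDOW_BITS
    offset = addr & WINDOW_MASK
    return (bases[select] + offset) & 0xFFFFFFFF


# With a 16 KiB-aligned base this matches "bank number << 14 | offset".
print(hex(translate_relative(0x4010, [0, 0x5 << 14, 0, 0])))    # 0x14010
# An unaligned base places the window at an arbitrary byte address.
print(hex(translate_relative(0x4010, [0, 0x12345678, 0, 0])))   # 0x12345688
```

The cost of the flexibility is a 32-bit adder in the address path instead of plain wiring, which matters on a core this small.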

oooor.... is there 2 bits spare somewhere in the BANKn CSR setting
instruction which would allow the top 2 bits (17 and 18) to be written
to at the time that the other 16 bits were being loaded? bit of a
hack that... :)

really like the idea of tiny cores, cesar. love to see the BANKn idea
added to RV32 as well in some fashion.

l.

Cesar Eduardo Barros

Apr 1, 2018, 2:49:49 PM
to Luke Kenneth Casson Leighton, RISC-V ISA Dev
Em 01-04-2018 11:53, Luke Kenneth Casson Leighton escreveu:
> On Sun, Apr 1, 2018 at 1:23 PM, Cesar Eduardo Barros
> <ces...@cesarb.eti.br> wrote:
>
>> The obvious disadvantage of RV16E is being able to address only 65536 bytes
>> of memory, which has to be shared between the large 4-byte instructions,
>> data, and memory-mapped I/O. The traditional solution for this is banking. I
>> propose, therefore, a set of four BANKn CSRs, each having up to 16 bits. The
>> top two bits of the memory address would select which CSR contains the bank
>> number, while the lower 14 bits would be the offset within the bank.
>
> ha, very cool: i was just going to ask about this (but in the context
> of RV32* as i was considering investigating adding NUMA RV32* cores to
> operate along-side SMP RV64GC cores, for multimedia video and 3D
> graphics processing).

If you want to make RV32 access more memory, something like x86's PAE
(adding more levels to the page table) would be a better idea. Or, since
they are auxiliary cores, something like an IOMMU managed by the RV64
side, using the RV64 page table formats.

> what you propose with BANKn CSRs reminds me of the Z80. that had
> memory banks that could (shock, gasp) address up to 1MB of RAM (!).
> from what i remember of the Z80 it was a pain: half the 64k memory was
> bank-addressable and the other half not, and i don't believe you could
> address multiple banks at once. this in turn meant that if you wished
> to operate on two sets of bank-addressed memory you simply couldn't:
> you had to *copy* one bank into the bottom 32k and then change the
> bank address to refer to the other, do the operations and then copy
> the results *back* to the first bank.
>
> total pain.

Yes, it was directly inspired by the Z80.

> i note however that you are proposing 4 BANK addresses. 2^14=..
> 16384. so the addressable memory range would be 16384. and with 2
> bits in the top selecting which CSR you could... yes! simultaneously
> address 4 separate different areas in memory. smart. i like it.

I tried to balance the number of CSRs (which are a limited resource)
with the convenience of having multiple mappable ranges. Two would be
the minimum, I chose 4 to be more flexible.

> however.... presumably the BANKn CSR would need to be 18 bits not 16
> in order to address the full 2^32 memory range? otherwise the memory
> range is limited to 2^30 = 1GB of memory, not 4GB. it might not make
> sense in a traditional micro-controller environment however i am used
> to some really weird architectures: 2D grids of 4-bit ALUs (a company
> in Bristol, UK), 1D strings of 1-bit and later 2-bit ALUs (Aspex
> Microelectronic Array-String Processor: massively wide SIMD: one
> processor with 4096 ALUs with 256 bits of content-addressable RAM *per
> ALU*). also, Esperanto Technologies (*waves to Allen*) have 4096 RV32 cores,
> they might well have considered 8192 or 16384 RV16 cores, perhaps
> fitting into the same die area if they are really that much smaller,
> who knows.

If XLEN is 16, each CSR can hold up to 16 bits. No exceptions. And 1GB
of memory ought to be enough for anybody ;-)

> so with that in mind, Cesar, had you considered BANK0 applying to the
> first memory-address (a read) and BANK1 applying to the stores, and so
> on? i don't know if it's possible to issue 2 reads (or two writes) in
> a single RISC-V instruction.
>
> or, having the BANKn CSRs be 32-bit (would require 2 16-bit
> instructions to set each, i realise) and be *added* to the load/store,
> turning all instructions into *relative* addresses, what about that?
> developers could then choose to set the lower 14 bits to zero and
> choose not to issue the 2nd of the bank-setting instructions, thus
> effectively being functionally-identical to the idea that you propose,
> and save on one instruction... but the advantage is, relative
> addressing would allow inter-bank boundaries to be crossed without
> needing to mess about with extra manual memory copying [and detection
> of when such boundaries occur. yuck!]
>
> oooor.... is there 2 bits spare somewhere in the BANKn CSR setting
> instruction which would allow the top 2 bits (17 and 18) to be written
> to at the time that the other 16 bits were being loaded? bit of a
> hack that... :)

That would be too much complexity for a joke proposal (check the date).
A more serious proposal could use a separate BANK_TOP CSR to hold the
top bits of the BANKn registers. That would give 20-bit bank numbers,
which should be way beyond plenty.
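One way the BANK_TOP idea might be laid out is sketched below. The layout is an assumption, not something specified in the thread: BANK_TOP is taken to hold four 4-bit fields, with BANKn's top bits in BANK_TOP[4n+3:4n], which gives 16 + 4 = 20-bit bank numbers and therefore 34-bit physical addresses.

```python
from typing import Sequence

WINDOW_BITS = 14
WINDOW_MASK = (1 << WINDOW_BITS) - 1


def bank_number(n: int, banks: Sequence[int], bank_top: int) -> int:
    """20-bit bank number: assumed BANK_TOP[4n+3:4n] on top of BANKn[15:0]."""
    top = (bank_top >> (4 * n)) & 0xF
    return (top << 16) | (banks[n] & 0xFFFF)


def translate_with_top(addr: int, banks: Sequence[int], bank_top: int) -> int:
    """Map a 16-bit address to a 34-bit physical address."""
    addr &= 0xFFFF
    select = addr >> WINDOW_BITS
    return (bank_number(select, banks, bank_top) << WINDOW_BITS) | (addr & WINDOW_MASK)


# BANK3 = 0x0001 with top nibble 0xA selects bank 0xA0001.
print(hex(bank_number(3, [0, 0, 0, 0x0001], 0xA000)))  # 0xa0001
```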

> really like the idea of tiny cores, cesar. love to see the BANKn idea
> added to RV32 as well in some fashion.

I'm glad someone liked it ;-) Even though it's just an April 1st joke
proposal, I do believe the idea of "extending down" RISC-V into the
16-bit land might have some merit.

Christopher Celio

Apr 1, 2018, 3:01:40 PM
to Cesar Eduardo Barros, RISC-V ISA Dev
I have to applaud you for resisting the urge to add condition codes to RV16 (as overflow would be much more likely now!). Otherwise, it would be much harder to implement RV16 in a higher-performance environment like say an out-of-order core.

-Chris

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/0b7c0d18-a5e4-0696-fe0b-e3a25c07f8c7%40cesarb.eti.br.

Liviu Ionescu

Apr 1, 2018, 3:12:43 PM
to Cesar Eduardo Barros, Luke Kenneth Casson Leighton, RISC-V ISA Dev
On 1 April 2018 at 21:49:49, Cesar Eduardo Barros (ces...@cesarb.eti.br) wrote:

> Even though it's just an April 1st joke proposal,

a nice one, I would say. ;-)

> I do believe the idea of "extending down" RISC-V into the
> 16-bit land might have some merit.

come on!

from a software point of view, the ideal embedded core would be a
64-bit one (if you don't believe this, take a look at the recommended
method to access 64-bit timer registers on a 32-bit core). not to
mention multiply/divide instructions, more consistent with double
floating point, etc.
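The "recommended method" referred to here is presumably the read-high / read-low / re-read-high loop that the RISC-V unprivileged spec gives for reading a 64-bit counter on RV32 (rdcycleh, rdcycle, rdcycleh, retry if the two high halves differ). The loop is needed because the low half can wrap between the two reads. A sketch of the same logic, where `read_hi` and `read_lo` are hypothetical callables standing in for the two 32-bit reads and `FakeCounter` is a toy stand-in for the hardware:

```python
from typing import Callable


def read_counter64(read_hi: Callable[[], int], read_lo: Callable[[], int]) -> int:
    """Read a 64-bit counter exposed as two 32-bit halves without tearing.

    If the low half wraps between the reads, the high half changes and
    the loop retries, so the result is never off by 2**32.
    """
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:
            return (hi << 32) | lo


class FakeCounter:
    """Toy counter that advances by one on every half-read."""

    def __init__(self, start: int):
        self.value = start

    def _tick(self) -> int:
        v = self.value
        self.value += 1
        return v

    def hi(self) -> int:
        return self._tick() >> 32

    def lo(self) -> int:
        return self._tick() & 0xFFFFFFFF


c = FakeCounter(0xFFFFFFFE)
print(hex(read_counter64(c.hi, c.lo)))  # 0x100000002 (first attempt is retried)
```

On a 64-bit core the same read is a single instruction, which is the point being made.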

hopefully the 16-bit land will be a thing of the past, and remain so.


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 1, 2018, 4:36:11 PM
to Cesar Eduardo Barros, RISC-V ISA Dev
On Sun, Apr 1, 2018 at 7:49 PM, Cesar Eduardo Barros
<ces...@cesarb.eti.br> wrote:

> If you want to make RV32 access more memory, something like x86's PAE
> (adding more levels to the page table) would be a better idea.

oh ok. yes. nice.

> Or, since
> they are auxiliary cores, something like an IOMMU managed by the RV64 side,
> using the RV64 page table formats.

i was thinking along the lines of them being self-managing, so that
code synchronisation would not be needed.

> Yes, it was directly inspired by the Z80.

cool!

> If XLEN is 16, each CSR can hold up to 16 bits. No exceptions. And 1GB of
> memory ought to be enough for anybody ;-)

:)

>> oooor.... is there 2 bits spare somewhere in the BANKn CSR setting
>> instruction which would allow the top 2 bits (17 and 18) to be written
>> to at the time that the other 16 bits were being loaded? bit of a
>> hack that... :)
>
>
> That would be too much complexity for a joke proposal (check the date).

awww! i was just warming to the idea!

> A
> more serious proposal could use a separate BANK_TOP CSR to hold the top bits
> of the BANKn registers. That would give 20-bit bank numbers, which should be
> way beyond plenty.

yes. something like that. that's what i meant about having 2 CSRs
per bank, i just expressed it badly enough for you not to be able to
recognise it as such.

>> really like the idea of tiny cores, cesar. love to see the BANKn idea
>> added to RV32 as well in some fashion.
>
>
> I'm glad someone liked it ;-) Even though it's just an April 1st joke
> proposal, I do believe the idea of "extending down" RISC-V into the 16-bit
> land might have some merit.

for student projects (easier to implement, smaller resource FPGAs),
for washing machine processors, SIM cards, and sub-micro-amp power
scenarios, hell yes. the STM8S003 for example is a 20-pin TSSOP, it's
$0.24 in quantity *ONE* even from digikey (so imagine what the volume
price is in Shenzhen), it has 256 bytes of RAM and i believe 1k of
NAND and it's *awesome*. gets used in microwaves, washing machines,
fridges, the works. how many of _those_ are sold world-wide? just
because RV32 and above are sexy and modern doesn't mean that RISC-V as
a concept has to stop there. hell, 10 years ago i heard of a company
doing extremely well with an 8-bit fully-functioning processor that
only had *140 gates*.

also (liviu), as Aspex Microelectronics showed, when the processor
core is small enough such that it can be efficiently embedded as part
of a massively-replicable array that also has a small
Content-Addressable RAM in each element of the array, very very
interesting things start to become possible. certain applications
become literally a hundred times faster: pattern recognition, network
routing, video processing, neural networks and so on. unfortunately
they're also a couple orders of magnitude more of a bitch to program
but hey you can't have everything.

l.

Liviu Ionescu

Apr 1, 2018, 4:57:55 PM
to Cesar Eduardo Barros, Luke Kenneth Casson Leighton, RISC-V ISA Dev
On 1 April 2018 at 23:36:11, Luke Kenneth Casson Leighton (lk...@lkcl.net) wrote:

> also (liviu), as Aspex Microelectronics showed, when the processor
> core is small enough such that it can be efficiently embedded as part
> of a massively-replicable array that also has a small
> Content-Addressable RAM in each element of the array, very very
> interesting things start to become possible. ... unfortunately
> they're also a couple orders of magnitude more of a bitch to program
> but hey you can't have everything.

I have nothing against experimentation and research.

however, personally I'm more interested in improving the quality of
life for today's software guys, so that they can become more productive.
thus my proposal for a C/C++ friendly RISC-V microcontroller
architecture, where I am willing to trade some transistors for ease of
use.


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 1, 2018, 5:07:07 PM
to Liviu Ionescu, Cesar Eduardo Barros, RISC-V ISA Dev
On Sun, Apr 1, 2018 at 9:57 PM, Liviu Ionescu <i...@livius.net> wrote:

> I have nothing against experimentation and research.
>
> however, personally I'm more interested in improving the quality of
> life for today's software guys, so that they can become more productive.
> thus my proposal for a C/C++ friendly RISC-V microcontroller
> architecture, where I am willing to trade some transistors for ease of
> use.

totally get it, liviu. the embedded world (STM32F, STM8S, PICs),
which ships absolutely enormous volumes (literally billions of units)
and whose programs are extremely small and often written in assembler
or targeted by sdcc or avr-utils (specialist subset c compilers), and
the general-purpose world (MIPS32/MIPS64, x86_64, ARM) are
completely... alien to each other. there's almost nothing in common.

l.

Tommy Murphy

Apr 1, 2018, 5:19:00 PM
to RISC-V ISA Dev, lk...@lkcl.net, ces...@cesarb.eti.br
Ha ha - well played - caught me out anyway! :-)

Ray Van De Walker

Apr 2, 2018, 3:37:15 PM
to RISC-V ISA Dev
I realize that this is an April-1st joke, but it is well-conceived, and sometimes there are not enough transistors.
For any newcomers, here is a link to an earlier set of RV16 proposals:
https://groups.google.com/a/groups.riscv.org/forum/#!searchin/isa-dev/16-bit/isa-dev/iK3enKGb5bw/cuVAq0J8EAAJ

Rogier Brussee

Apr 4, 2018, 10:27:43 AM
to RISC-V ISA Dev, i...@livius.net, jc...@gmail.com, jhaus...@gmail.com, k...@prtime.org, micha...@mac.com, xan...@gmail.com
(CC'ing Liviu Ionescu and Jacob Bachmeyer as something like this might be useful for the microcontroller spec, John Hauser because he proposed modifications for RV32E and RV32EC to better use halfword and byte-sized data, Kelly Dean because he proposed ideas on binary portability between RV32 and RV64, Michael Clark because of the RV8 JIT, and Xan Phung in recognition of his ideas on Xcondensed).

Thanks for linking to the Xcondensed proposal. My proposal was not so much a proposal for RV16 as an alternative to the C extension: 16-bit wide instructions with code-compression characteristics comparable to the standard C extension when used in combination with the full 32-bit wide instructions, with the following properties:


* Every 16-bit wide instruction maps to a 32-bit wide RV instruction.
* Reuse the parts of C that give 98% of the compression of RV code.  
* Stand-alone: the fixed 16-bit wide instructions alone give a simple but complete ISA. Without the 32-bit wide ISA it would _not_ be RV (even with the 32-bit wide ISA, Xcondensed would still be a non-standard extension, so YMMV), but it is close enough that I think it could reuse most of the compiler infrastructure and experience with RV, simply because it is a fairly straightforward translation to a two-register form of RVIMA, much like (and mostly overlapping with) the C extension. I then remarked that such an ISA would be a natural choice for a hypothetical RV16. I didn't really pursue this, although I thought it might be useful, but Waterman and others thought RV16 was a non-goal and that an alternative to C was a bad idea because C was already done (although at the time C was still in review). In any case, this might have given the impression that Xcondensed was a proposal for RV16.

Xan Phung (https://groups.google.com/a/groups.riscv.org/forum/#!searchin/isa-dev/Xan$20Phung/isa-dev/YG80MMHMWis/y5TTkHcADwAJ) came up with a modification of the proposal that could be used as an alternative to C that uses only 2 quadrants and frees a quadrant for 32-bit instructions (which may well be the best use of the proposal). In particular, he pointed out that register spilling is (or could be) done with XLEN-sized load/store instructions, drastically reducing the need for L*SP and S*SP instructions.

Targeting RVE (i.e. 16 registers) and using Xan's ideas makes things a little less cramped, even leaving room for a few CSR instructions. Note that this does not preclude using registers x16-x31 if the 32-bit ISA is available: you just cannot use them with condensed instructions. Xcondensed should give an ISA that allows binaries to be efficiently binary compatible between RV32Xcondensed and RV64Xcondensed processors, simply by using w instructions where ints (rather than pointers or longs) are meant, and throwing in the occasional sextw or zextw (sign or zero extension) that would be nops on RV32. I will indicate how I think it should be changed to get an RV16EXcondensed, and portability between RV16EXcondensed and RV32EXcondensed (or 16/32 portability for short).

A flat address space is assumed, including for the RV16 case!

notation by example:

4rsd = registers x0-x15 used as rd and rs1

4rs2* = registers x1-x15 used as rs2

3rs1 = registers x8-x15  used as rs1 encoded as in the C extension

5imm = 5bit immediate.

7imm* = 7bit nonzero immediate

# = comment

#instruction = instruction that does not currently exist in RVIMA but might become one in B or small integer extension 

lx = lh for RV16, lw for RV32, ld for RV64, lq for RV128 (or maps to a hypothetical lx instruction with an immediate in units of XLEN/8 bytes). Also, x is used to indicate a shift by log2(XLEN/8)

@li zero imm: break 5imm = sneak in the break instruction by reusing the li opcode with rd = zero

newline  = new 5bit opcode.
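To make the `lx` / "x" notation concrete: the 3-bit register fields select x8-x15 as in the C extension, and the offset is scaled by XLEN/8 bytes, i.e. shifted left by log2(XLEN/8). Below is a hypothetical expander for the `lx 3rd 3rs1 5imm` row that prints RV assembly text; `expand_lx` and `x_shift` are illustrative names only.

```python
LX_MNEMONIC = {16: "lh", 32: "lw", 64: "ld", 128: "lq"}


def x_shift(xlen: int) -> int:
    """log2(XLEN/8): 1 for RV16, 2 for RV32, 3 for RV64, 4 for RV128."""
    return (xlen // 8).bit_length() - 1


def expand_lx(rd3: int, rs1_3: int, uimm5: int, xlen: int) -> str:
    """Expand `lx 3rd 3rs1 5imm` to `l[h/w/d/q] rd, offset(rs1)`."""
    if not (0 <= rd3 < 8 and 0 <= rs1_3 < 8 and 0 <= uimm5 < 32):
        raise ValueError("field out of range")
    rd, rs1 = 8 + rd3, 8 + rs1_3          # 3-bit fields map to x8-x15
    offset = uimm5 << x_shift(xlen)       # zext(imm << log2(XLEN/8))
    return f"{LX_MNEMONIC[xlen]} x{rd}, {offset}(x{rs1})"


for xlen in (16, 32, 64):
    print(expand_lx(0, 1, 3, xlen))
# lh x8, 6(x9)
# lw x8, 12(x9)
# ld x8, 24(x9)
```

The same encoding therefore always loads one native register width, which is what makes the RV32/RV64 (and 16/32) portability argument work.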


li         4rd*   5imm     addi rd zero sext(imm)

li_7     4rd*   5imm     addi rd zero sext(imm<<7)       

lui       4rd*   5imm     lui    rd sext(imm)                      #aka li_12

auipc  4rd*   5imm     auipc rd sext(imm)                    #aka ai_12pc


addi    4rsd* 7imm*     addi rsd rsd sext(imm, XLEN)


ai_xsp 4rd* 7imm*       add rd sp sext(imm<<7)   #for stack adjustment and pointers into stack. Assume stack is 2*XLEN/8 aligned. Alternatively define addi16sp  imm add sp sp imm<<4  (aka addi_4sp) for rd = sp


addwi     4rsd* 5imm    addwi rsd rsd sext(imm<<7)                                 #replace with addhi for 16/32 portability

addi_7    4rsd* 5imm*  addi rsd rsd sext(5imm << 7)

addi_x    4rsd* 5imm*  addi  rsd rsd sext(5imm << log2(XLEN/8))

addi_5x  4rsd* 5imm*  addi rsd rsd sext(5imm << (log2(XLEN/8)+5))


beqz   4rs1*  7imm*    beq rs1 zero imm


bnez   4rs1*  7imm*    bne rs1 zero imm


jalri     4rsd 7imm      jalr rsd rsd sext(imm)   #mainly useful for millicode: use li_7 t0 MILICODE_BASE; jalr t0 imm to make a millicode call to imm, and li_7 t1 MILICODE_BASE; jalr t1 imm for a tail call

 

lxsp    4rd*  7imm      l[h/w]  rd sext(imm<<log2(XLEN/8))(sp)


lx        3rd  3rs1 5imm  l[h/w] rd  zext(imm<<log2(XLEN/8))(rs1)


lw        3rd  3rs1 5imm  lw      rd  zext(imm<<2)(rs1)        #replace with lh 3rd 3rs1 5imm    for 16/32 portability


lbu      3rd  3rs1 3imm  lbu    rd  zext(imm)(rs1)              # replace all with lbu, 3rd 3rs1 5imm   for 16/32  portability

lh        3rd  3rs1 3imm  lh      rd  zext(imm<<1)(rs1)        # replace with ld for 64/128 portability

flw       3rd 3rs1 3imm  flw     rd  zext(imm<<2)(rs1)

fld       3rd 3rs1 3imm  fld      rd  zext(imm<<3)(rs1)


sxsp   4rs1* 7imm         s[h/w] rs1 sext(imm <<log2(XLEN/8))(sp) 


sx       3rs1 3rs2 5imm  s[h/w]  rs2 zext(imm<<log2(XLEN/8))(rs1)


sw       3rs1 3rs2 5imm  sw       rs2 zext(imm<<2)(rs1)          #replace with sh  3rs1 3rs2 5imm  for 16/32 portability


sb      3rs1  3rs2 3imm  sb     rs2  zext(imm)(rs1)                # replace with sb 3rs1 3rs2 5imm   for 16/32  portability

sh       3rs1  3rs2 3imm  sh      rs2  zext(imm<<1)(rs1)

fsw     3rs1  3rs2  3imm fsw    rs2  zext(imm<<2)(rs1)

fsd     3rs1 3rs2 3imm     fsd     rs2  zext(imm<<3)(rs1)

  

auipc_ra   11imm        auipc ra sext(imm) for 32/64        


jalr_rara    11imm         jalr ra ra imm<<1                             #use in combination with auipc_ra.  Fusable to effectively jal ra 22imm


j               11imm*         jal zero sext(imm)


jal            11imm*         jal ra sext(imm)                       


andi        4rsd* 5imm    andi rsd rsd sext(imm)                     #imm == 0 and imm == -1 are both useless.

slli           4rsd* 5imm    slli rsd rsd   sext(imm)                     #imm ==0 encodes 32 for 32/64

srli          4rsd* 5imm     srli rsd rsd  sext(imm)                     #likewise

srai         4rsd* 5imm     srai rsd rsd sext(imm)                     #likewise


add           4rsd* 4rs2*   add rsd rsd rs2 

sub           4rsd* 4rs2*   sub rsd rsd rs2

addw        4rsd* 4rs2*   addw rsd rsd rs2                             #replace with addh for 16/32,  to follow the letter of the RV spec simply map to add instead of addw in RV32

subw        4rsd* 4rs2*   subw rsd rsd rs2                             #replace with subh for 16/32,  to follow the letter of the RV spec simply map to sub instead of subw in RV32

slt             4rsd* 4rs2    slt rsd rsd rs2

sltu           4rsd* 4rs2    sltu rsd rsd rs2

mv           4rd*  4rs1*    add rd rs1 zero

jalr           4rd   4rs1*    jalr rd rs1 0

                                                       

and          3rsd 3rs2    and rsd rsd rs2

or             3rsd 3rs2    or rsd rsd rs2

xor           3rsd 3rs2    xor rsd rsd rs2

#addh      3rsd 3rs2                                                                          #superfluous for 16/32

sll            3rsd 3rs2    sll rsd rsd rs2

srl            3rsd 3rs2    srl rsd rsd rs2

sra           3rsd 3rs2    sra rsd rsd rs2

#rll           3rsd 3rs2                                                                          #rs2 is taken mod XLEN, therefore negative values rotate right.

mul          3rsd 3rs2    mul rsd rsd rs2

mulh        3rsd 3rs2    mulh rsd rsd rs2

mulhsu    3rsd 3rs2    mulhsu rsd rsd rs2

mulhu      3rsd 3rs2    mulhu rsd rsd rs2

div           3rsd 3rs2    div rsd rsd rs2

divu         3rsd 3rs2    divu rsd rsd rs2

rem          3rsd 3rs2   rem rsd rsd rs2

remu        3rsd 3rs2   remu rsd rsd rs2

not           3rd 3rs1     xori rd rs1 -1

sllx           3rd 3rs1     slli rd rs1  x 

#sextb     3rd 3rs1

#sexth     3rd 3rs1

#sextw    3rd 3rs1

#zextb     3rd 3rs1

#zexth     3rd 3rs1

#zextw    3rd 3rs1

#popc     3rd 3rs1

#clz        3rd 3rs1

#bswap  3rd 3rs1

   

lr       rd rs1        lr rd rs1

sc     rsd rs1       sc rsd rs1 rsd

lrw    rd rs1         lrw rd rs1                                                        #for 16/32 portability just drop

scw   rsd rs1       scw rsd rs1 rsd                                                 #for 16/32 portability just drop

amoadd         3rsd 3rs1         amoadd.aqrl rsd rs1 rsd

amoaddw       3rsd 3rs1        amoaddw.aqrl rsd rs1 rsd              #for 16/32 portability just drop

amoswap       3rsd 3rs1         amoswap.aqrl rsd rs1 rsd         

amoand         3rsd 3rs1         amoand.aqrl rsd rs1 rsd

amoor            3rsd 3rs1         amoor.aqrl   rsd rs1 rsd

amoxor          3rsd 3rs1         amoxor.aqrl    rsd rs1 rsd

memadd        3rsd 3rs1         amoadd.        rsd rs1 rsd              #no ordering, but indivisible 

memaddw      3rsd 3rs1         amoaddw.     rsd rs1 rsd              #no ordering, but indivisible; for 16/32 portability just drop

memswap      3rsd 3rs1         amoswap.        rsd rs1 rsd           #no ordering, but indivisible

memand        3rsd 3rs1         amoand.        rsd rs1 rsd              #no ordering, but indivisible

memor           3rsd 3rs1         amoor.          rsd rs1 rsd               #no ordering, but indivisible

memxor         3rsd 3rs1         amoxor.         rsd rs1 rsd              #no ordering, but indivisible

csrrw              3rsd imm7      csrrw rsd rsd   map(imm7)            #mapping TBD                                

csrrs               3rsd imm7      csrrs  rsd rsd  map(imm7)     

csrrc               3rsd imm7      csrrc  rsd rsd  map(imm7)

csrr                 3rd  imm7      csrrc  rd zero map(imm7)



@li zero      0                :  designated illegal

@li zero      1                :  ecall

@li zero      1imm[3:0]  :  break 4imm                                         #different breaks are useful for hosted environments

@li_7 zero  0imm[3:0]  :  mfence 4imm   fence.imm0000       

@li_7 zero  1imm[3:0]  :  iofence 4imm   fence 0000imm

@lui   zero  0                :  ifence    

      


@addi zero 0               : designated nop
@beq  zero 0               : wfi 
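Several rows differ only in how the 5-bit immediate is scaled before it reaches a base ADDI. As a sketch of the `li`, `li_7` and `addi_x` rows (mnemonics and scalings from the table; the function, its name and its range checks are hypothetical), the expansion to RV assembly text could be written as:

```python
def x_shift(xlen: int) -> int:
    """log2(XLEN/8)."""
    return (xlen // 8).bit_length() - 1


def expand_imm_form(op: str, rd: int, imm5: int, xlen: int = 32) -> str:
    """Expand li / li_7 / addi_x (4-bit rd, signed 5-bit imm) to an ADDI."""
    if not 1 <= rd <= 15:
        raise ValueError("rd must be x1-x15 (4rd*/4rsd*)")
    if not -16 <= imm5 <= 15:
        raise ValueError("imm5 must be a signed 5-bit value")
    if op == "li":          # addi rd zero sext(imm)
        src, imm = 0, imm5
    elif op == "li_7":      # addi rd zero sext(imm << 7)
        src, imm = 0, imm5 << 7
    elif op == "addi_x":    # addi rsd rsd sext(imm << log2(XLEN/8)), imm nonzero
        if imm5 == 0:
            raise ValueError("addi_x needs a nonzero immediate")
        src, imm = rd, imm5 << x_shift(xlen)
    else:
        raise ValueError(f"unsupported op {op!r}")
    # Every case above stays inside ADDI's signed 12-bit range.
    assert -2048 <= imm <= 2047
    return f"addi x{rd}, x{src}, {imm}"


print(expand_imm_form("li", 5, -3))          # addi x5, x0, -3
print(expand_imm_form("li_7", 5, 1))         # addi x5, x0, 128
print(expand_imm_form("addi_x", 2, -4, 64))  # addi x2, x2, -32
```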

On Monday, 2 April 2018 21:37:15 UTC+2, ray.vandewalker wrote:

Luke Kenneth Casson Leighton

Apr 4, 2018, 11:22:54 AM
to Rogier Brussee, RISC-V ISA Dev, Liviu Ionescu, jcb...@gmail.com, jhaus...@gmail.com, k...@prtime.org, micha...@mac.com, xan...@gmail.com
On Wed, Apr 4, 2018 at 3:27 PM, Rogier Brussee <rogier....@gmail.com> wrote:
> (CC'ing Liviu Ionescu and Jacob Bachmeyer as something like this might be
> useful for the microcontroller spec, John Hauser because he proposed
> modifications for RV32E and RV32EC to better use halfword and bytesize
> data, Kelly Dean because he proposed ideas on binary portability between
> RV32 and RV64, Michael Clark because of RV8 JIT, and Xan Phung in
> recognition of his ideas on Xcondensed).
>
> Thanks for linking to the Xcondensed proposal. My proposal was not so much a
> proposal for RV16 as for an alternative for the C extension using 16bit wide
> instructions with a comparable code compression characteristics as the
> standard C extension when used in combination with the full 32bit wide
> instructions with the following properties:

if you (collectively) don't mind me throwing in a curveball, i've
been on these lists for only a couple of months so have been catching
up, and i've seen quite a fair share of proposals and questions about
support for 8 and 16-bit integer operations, as well as some 16-bit FP
justifications.

such operations appear to be misaligned (haha) sorry *ma*ligned,
perplexingly with the justification "the world is going 64-bit, we
tolerate 32-bit, why on earth would you want 16 and 8 bit arithmetic"
and as a result there is load and store with zero and sign-extend into
the *full* extent of the 32/64-bit registers.

where such operations (8 and 16 bit) make sense is when you perform
multiples of those in parallel (Vector/SIMD), or you need to do
bit-level manipulation. so let's take a look...

* B Extension: place-holder
* V-Extension: i love it for its power and potential: sadly it's so
complex and comprehensive and all-or-nothing that there is only one
implementation, and that's not been published.

at this point we can quietly say to ourselves a single word:

"...oops".

to address the problem of V-Extension being too complex and
comprehensive, i raised the following topic / question a couple of
days ago:
https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/GuukrSjgBH8

borrowing from the V-Extension, the proposal basically boils down to a
single instruction: "please implicitly tag register N as being a
vector of length M, such that operations on register N implicitly
actually are carried out simultaneously across registers N through
N+M-1".
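A minimal C sketch of the register-tagging idea, assuming a per-register vector-length table that the decoder consults on every ordinary scalar instruction. The names `regs`, `vl` and `exec_add` are hypothetical, and stepping rs1/rs2 in lock-step with rd is one possible reading of the proposal, not something it specifies.

```c
#include <stdint.h>

#define NREGS 32

uint32_t regs[NREGS];   /* integer register file; regs[0] is x0 */
unsigned vl[NREGS];     /* hypothetical vector-length tag per register (0/1 = scalar) */

/* An ordinary ADD rd, rs1, rs2. If rd is tagged with length M, the same
 * operation is repeated over rd..rd+M-1, with rs1 and rs2 advancing in
 * lock-step (an assumption of this sketch). */
void exec_add(unsigned rd, unsigned rs1, unsigned rs2)
{
    if (rd >= NREGS)
        return;
    unsigned m = vl[rd] > 1 ? vl[rd] : 1;   /* untagged: plain scalar op */
    for (unsigned i = 0; i < m; i++) {
        unsigned d = rd + i, a = rs1 + i, b = rs2 + i;
        if (d >= NREGS || a >= NREGS || b >= NREGS)
            break;                          /* never run off the register file */
        if (d != 0)                         /* x0 stays hard-wired to zero */
            regs[d] = regs[a] + regs[b];
    }
}
```

With `vl[16] = 4`, a single `exec_add(16, 8, 12)` performs four element additions into x16..x19. What should happen when the range would run past x31 is left open by the proposal; this sketch simply stops.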

the basic assumption of this proposal was that it would be possible to
use the RV32 instructions to say "i'd like to do 32-bit vector
operations", and the RV64 instructions to say "i'd like to do 64-bit
vector operations", and i hadn't quite thought through how to do 8 and
16-bit operations completely.

i guess i kinda assumed that 16-bit operations were possible... but
was then shocked to find that they weren't *anywhere* in the spec: you
*have* to use the load/store with zero/sign-extend operations, and, in
a vector/SIMD world that's not going to fly as it wastes huge amounts
of space (and cycles).

with apologies at not quite being able to remember who it was (was it
richard herveille? i think it was you, wasn't it?), someone previously
raised the puzzling lack of symmetry in the instruction set: there is
an "implicit-sized" add, a 32-bit add and a 64-bit add.

the implicit-sized operations *might* be the saving grace by which
it's possible to extricate from this hole, by borrowing (again) from
the V-Extension, by being able to say "please implicitly tag register
N as being of width M" where M is 8, 16, "original size" or
"future-reserved" (2 bits to store that). this would over-rule the
default "implicit-sized" operations to be of size M, indefinitely.
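A sketch of how a 2-bit element-width tag might change an implicit-sized ADD on a 64-bit register. The enum values, the function name, and the choice to leave the upper bits of rd unaltered are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical 2-bit element-width tag stored per register. */
enum ewidth { EW_8 = 0, EW_16 = 1, EW_DEFAULT = 2, EW_RESERVED = 3 };

/* Implicit-sized ADD into a 64-bit rd whose tag is w.
 * Assumption: only the low 8/16 bits are written and the upper bits of
 * rd keep their old value; EW_DEFAULT means full width. A real design
 * would presumably trap on EW_RESERVED; this sketch treats it as full width. */
uint64_t tagged_add(uint64_t old_rd, uint64_t rs1, uint64_t rs2, enum ewidth w)
{
    uint64_t mask;

    switch (w) {
    case EW_8:
        mask = 0xFFull;
        break;
    case EW_16:
        mask = 0xFFFFull;
        break;
    default:                 /* EW_DEFAULT (and EW_RESERVED, see above) */
        return rs1 + rs2;
    }
    return (old_rd & ~mask) | ((rs1 + rs2) & mask);
}
```

Leaving the upper bits untouched lets a narrow result live inside a wider register without separate extend instructions; zero- or sign-extending the result instead would be an equally plausible design choice.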

so in this way, rogier, there are a couple of possibilities:

(A) rather than add RV16 (and even RV8!) to the existing instruction
set, registers are "tagged" into a CSR with a size (exactly as is
proposed in V-Extension, right now). by setting the Vector length
equal to one, you have the means to use the "implicit-sized"
operations.

(B) you *still* add RV16 (and possibly even RV8) *not* so much because
someone might want to implement stand-alone 16-bit or 8-bit processors
(they might) but because those instructions would become *part of
RV32/64/128*.

so in the case that you describe, rogier, of condensed instructions in
the proposed RV16, they might not actually matter as much. to make
that clear: the counter-arguments against RV16 *did not take into
account* the fact that RV16 (or RV8) operations would be accessible to
RV32/64/128, and as such could reduce the burden of implementing a B
extension proposal, and also a simplified V extension proposal.

if Bit-wise operations are *forced* to be carried out on the full
(default) bit-width, how on earth do you do 16-bit rotate when it's
needed? or 8-bit rotate? it has to be *explicitly* coded into the
actual Bit-wise instruction, doesn't it?
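For reference, this is roughly what a 16-bit (or 8-bit) rotate costs when only XLEN-wide shifts are available: mask, shift both ways, OR, mask again. A plain C sketch (the function names are illustrative):

```c
#include <stdint.h>

/* Rotate the low 16 bits of x left by s, using only full-width shifts,
 * OR and AND. Bits above bit 15 of x are ignored. */
uint32_t rotl16(uint32_t x, unsigned s)
{
    x &= 0xFFFFu;            /* isolate the 16-bit operand */
    s &= 15u;                /* rotate amount modulo 16 */
    /* for s == 0, x >> 16 is 0 because x <= 0xFFFF */
    return ((x << s) | (x >> (16u - s))) & 0xFFFFu;
}

/* Same idea for an 8-bit rotate. */
uint32_t rotl8(uint32_t x, unsigned s)
{
    x &= 0xFFu;
    s &= 7u;
    return ((x << s) | (x >> (8u - s))) & 0xFFu;
}
```

On RV32I this becomes several instructions per rotate, made worse by the fact that 0xFFFF does not fit in an ANDI immediate, which is the overhead a width-aware rotate would remove.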

and in some cases you *really cannot* do full (default, 32/64/128
bit-wise operations). for example here:
https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/zi_7B15kj6s/w9y_KHM0AwAJ
clifford points out that the proposed BGS (bit gatherer and shuffler)
instruction is limited deliberately to only 16 bits, because the
amount of control/bit-manipulation information required to arbitrarily
swap greater than 16 bits is simply too large.

.... but if there was an instruction which explicitly allowed the
operand size to be reduced to 16 or 8 bits, that problem is solved, is
it not?
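A rough way to see why operand width matters here, assuming the control information is simply a source index for every destination bit (the actual BGS encoding may well differ):

```latex
c(n) = n \,\lceil \log_2 n \rceil, \qquad
c(8) = 24,\quad c(16) = 64,\quad c(32) = 160,\quad c(64) = 384
```

Under that assumption a 16-bit permutation's control fits in one 64-bit register, while a 32-bit one already needs more than two.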

anyway apologies to you, rogier, i'm not ignoring what you described :)

l.

Liviu Ionescu

Apr 4, 2018, 11:22:58 AM4/4/18
to Rogier Brussee, RISC-V ISA Dev
On 4 April 2018 at 17:27:44, Rogier Brussee (rogier....@gmail.com) wrote:

> CC'ing Liviu Ionescu ... as something like
> this might be useful for the microcontroller spec
...
> an alternative for the C extension using 16bit wide instructions
> with a comparable code compression characteristics as the standard
> C extension

Thank you, Rogier.

As far as the microcontroller proposal is targeted, I already
mentioned that it does not focus on changes to the instruction set,
but to making the architecture more C/C++ friendly.

So, any instruction sets and encodings that will be agreed for the
privileged profile will probably be ok for the microcontroller profile
too, except the ABI, which needs a redesign to reduce the number of
registers saved by the caller and possibly be consistent with the
RV32E reduced number of registers.

In my opinion, in a well-designed architecture, the actual user
should have nothing to do with the instruction set at all, the
toolchain must deal with these details, not the end user.

This does not mean that the instruction set is not important, it
obviously is, but it is not the cornerstone of the microcontroller
profile.

In addition, although I agree that there may be use cases that I did
not think about, I would not go below 32-bit registers and memory
space.


Regards,

Liviu

lk...@lkcl.net

Apr 4, 2018, 11:56:02 AM4/4/18
to RISC-V ISA Dev, jha...@gmail.com, i...@livius.net, jcb6...@gmail.com, ke...@prtime.org, xan....@gmail.com, michae...@mac.com
oof, whoops rogier, the cc list was borked! :)  also i tracked down some cross-references to the various discussions you mention, for the benefit of people who may not have seen them (or wish to re-read and refresh their memories).


On Wednesday, April 4, 2018 at 3:27:43 PM UTC+1, Rogier Brussee wrote:
(CC'ing Liviu Ionescu and Jacob Bachmeyer as  something like this might be useful for the microcontroller spec,

 
John Hauser because he proposed modifications for RV32E  and RV32EC to better use halfword and bytesize data,

 
Kelly Dean because he proposed ideas on binary portability between RV32 and RV64,

 i *think* it's this link (kelly, rogier, can you confirm?)

Michael Clark because of RV8 JIT,

this looks relevant (also link to http://rv8.io)

ok whoops that's important to note that it's "rv8 as in like google v8" *NOT* repeat *NOT* "RV8" as in "RV8, RV16, RV32, RV64, RV128...."

and Xan Phung in recognition of his ideas on Xcondensed). 

i believe you may be referring to this, rogier?

for completeness, and for the benefit of the people who were (attempted to be!) cc'd, an archive link to the full message that rogier sent is here:

l.

Ray Van De Walker

Apr 4, 2018, 12:40:18 PM4/4/18
to RISC-V ISA Dev
The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.
But, 32 registers are too many for most register allocators to use well, so I have always thought
this wasted some bits, and was a real opportunity for improvement.
If R-format instructions were recast for 16 registers, 3 bits are freed for an orthogonal size field.
Other formats could extend the immediate fields.
Five of the sizes are obvious: 8, 16, 32, 64, 128. The three unused sizes could handle the misty future.
The float set's R-mode instructions can then encode the D and Q R-format instructions.
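To make the bit budget concrete: the standard R-type format spends 15 bits on rd, rs1 and rs2, and cutting each field to 4 bits frees exactly 3. The C sketch below is a hypothetical layout that reuses the freed top bit of each register field for the size code; the bit placement and function names are assumptions, while the five size codes follow the list above with codes 5-7 left reserved.

```c
#include <stdint.h>

/* Size codes for the hypothetical 3-bit field; 5..7 stay reserved. */
enum size_code { SZ_8 = 0, SZ_16 = 1, SZ_32 = 2, SZ_64 = 3, SZ_128 = 4 };

/* Hypothetical R-type variant: same field positions as the standard
 * R-type, but rd/rs1/rs2 only name x0..x15, and the freed top bit of
 * each register field holds one size bit:
 *   size[2] -> bit 24, size[1] -> bit 19, size[0] -> bit 11. */
uint32_t encode_r16(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                    uint32_t funct3, uint32_t rd, uint32_t opcode,
                    uint32_t size)
{
    return (funct7 & 0x7Fu) << 25
         | ((size >> 2) & 1u) << 24 | (rs2 & 0xFu) << 20
         | ((size >> 1) & 1u) << 19 | (rs1 & 0xFu) << 15
         | (funct3 & 0x7u) << 12
         | (size & 1u) << 11 | (rd & 0xFu) << 7
         | (opcode & 0x7Fu);
}

/* Recover the 3-bit size code from an encoded instruction. */
uint32_t decode_size(uint32_t insn)
{
    return ((insn >> 24) & 1u) << 2
         | ((insn >> 19) & 1u) << 1
         | ((insn >> 11) & 1u);
}

/* Operand size in bytes, or 0 for a reserved code. */
unsigned size_bytes(uint32_t size)
{
    return size <= SZ_128 ? 1u << size : 0u;
}
```

With size code 0 every bit matches the standard encoding, so `encode_r16(0, 3, 2, 0, 1, 0x33, SZ_8)` yields the same word as `add x1, x2, x3` (0x003100B3). Existing binaries would therefore silently become 8-bit operations, which suggests a real layout would want code 0 to mean XLEN.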

Another way to handle sizes and types would have load instructions tag registers.
The tags then become part of the instruction decoding. That's a very different ISA, however.
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAPweEDzK_P4KdV7unEhrS9ojsNFsHbQLLJP3dzYX_UP5QPNAsA%40mail.gmail.com.

Luke Kenneth Casson Leighton

Apr 4, 2018, 1:46:31 PM4/4/18
to Ray Van De Walker, RISC-V ISA Dev
On Wed, Apr 4, 2018 at 5:40 PM, Ray Van De Walker
<ray.van...@silergy.com> wrote:

> The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.

the V extension marks (tags) registers with a size field. section
17.5. it's termed "element width". sizes are 8 16 32 64 128 and
"disabled". oh... i hadn't noticed before, there's... don't quite
understand the latter paragraphs (table 17.6)

17.12 then goes on to describe the vector instruction format(s),
17.14 describes the polymorphism feature (implicit type-casting from
int to float, including automatic zero/sign-extension).

so there are two divergent aspects:

(1) what i proposed does not need a size field to be added to the
ISA. it *implicitly* marks registers as containing 8-bit (or 16-bit)
values, where the top bits would (implicitly) be left unaltered.

(2) i was asking if RV16 (and RV8?) were practical to add, with their
own complete ISA, such that there becomes now a separate add, separate
div, separate mul and so on, each carrying out 16-bit (or 8-bit)
operations respectively, *such that*, when added to the ISA, they
*augment* the RV32 and RV64 ISAs in an *identical* way to that which
the RV32 ISA augments the RV64 ISA.

> If R-format instructions were recast for 16 registers, 3 bits are freed for an orthogonal size field.

so am i correct in understanding that this would (because I set is
frozen) be a hypothetical but alternative way to gain 8-bit and 16-bit
operations, and that (1) or (2) above (as separate and distinct from
the hypothetical R-format recast) would still be feasible and/or worth
exploring?

l.

Richard Herveille

Apr 4, 2018, 3:01:00 PM4/4/18
to Ray Van De Walker, RISC-V ISA Dev



> On 4 Apr 2018, at 18:40, Ray Van De Walker <ray.van...@silergy.com> wrote:
>
> The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.

The ISA is frozen. Any changes must ensure backward compatibility.
Adding a register size control register would not break the ISA and be fully backward compatible.

Richard

lkcl .

Apr 4, 2018, 3:20:51 PM4/4/18
to Richard Herveille, Ray Van De Walker, RISC-V ISA Dev
On Wed, Apr 4, 2018 at 8:00 PM, Richard Herveille
<richard....@roalogic.com> wrote:

>> On 4 Apr 2018, at 18:40, Ray Van De Walker <ray.van...@silergy.com> wrote:
>>
>> The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.
>
> The ISA is frozen. Any changes must ensure backward compatibility.
> Adding a register size control register would not break the ISA and be fully backward compatible.

thanks for clarifying, richard.

l.

Rogier Brussee

Apr 5, 2018, 6:21:15 AM4/5/18
to RISC-V ISA Dev, rogier....@gmail.com, i...@livius.net, ", jcb...@gmail.com, ", jhaus...@gmail.com, ", k...@prtime.org, ", micha...@mac.com, ", xan...@gmail.com


On Wednesday, 4 April 2018 17:22:54 UTC+2, lk...@lkcl.net wrote:
First, Xcondensed is _primarily_ about an _alternative for the C extension_ for RV32/RV64 that you _could_ use as a stand alone 16 bit ISA, and that would give a natural binary compatible upgrade path from 32 to 64 bit. 
If you leave out the 32-bit wide instructions it would be merely "RV inspired", though. But indeed I also indicated how it could be modified to do something similar for
a hypothetical RV16/RV32 in a thread literally started as an April Fools' joke. However, people noted that 16 bit is not quite dead, because running the clock of a microwave works fine in 16 bit and might cost less. _If_ you want a 16-bit CPU that is "RV16", then it is going to be constrained, fixed-length 16-bit instructions will be a natural fit, and standalone Xcondensed may be close enough.


(A) rather than add RV16 (and even RV8!) to the existing instruction
set, registers are "tagged" into a CSR with a size (exactly as is
proposed in V-Extension, right now).  by setting the Vector length
equal to one, you have the means to use the "implicit-sized"
operations.

(B) you *still* add RV16 (and possibly even RV8) *not* so much because
someone might want to implement stand-alone 16-bit or 8-bit processors
(they might) but because those instructions would become *part of
RV32/64/128*.

That was basically what John Hauser proposed: adding addh subh and addhi, but he proposed to use the space reserved for the w instructions so that would not work well with your vector instructions.
I would think that addh rd rs1 rs2 (and perhaps subh)  which includes sexth  and  something that gives zexth like addhu rd rs1 rs2  :  add rs1 zext(rs2, 16)  and similar for b should be enough. Immediate instructions are expensive in encoding space.
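One possible reading of the suggested semantics, written as C on a 64-bit register: `addh` sign-extends the low 16 bits of the sum (by analogy with how ADDW treats 32-bit results) and `addhu` adds a zero-extended rs2. These semantics and the helper name `sext16` are assumptions for illustration.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v to 64 bits using only unsigned
 * arithmetic (well defined in C). */
uint64_t sext16(uint64_t v)
{
    v &= 0xFFFFu;
    return (v ^ 0x8000u) - 0x8000u;
}

/* Hypothetical addh rd, rs1, rs2: 16-bit add, result sign-extended. */
uint64_t addh(uint64_t rs1, uint64_t rs2)
{
    return sext16(rs1 + rs2);
}

/* Hypothetical addhu rd, rs1, rs2: rs1 + zext(rs2, 16). */
uint64_t addhu(uint64_t rs1, uint64_t rs2)
{
    return rs1 + (rs2 & 0xFFFFu);
}
```

With x0 as an operand these give the extends for free: `sexth rd, rs` would be `addh rd, rs, x0` and `zexth rd, rs` would be `addhu rd, x0, rs`. Byte variants follow the same pattern with 8-bit masks.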

 
 

so in the case that you describe, rogier, of condensed instructions in
the proposed RV16, they might not actually matter as much.  to make
that clear: the counter-arguments against RV16 *did not take into
account* the fact that RV16 (or RV8) operations would be accessible to
RV32/64/128, and as such could reduce the burden of implementing a B
extension proposal, and also a simplified V extension proposal.

if Bit-wise operations are *forced* to be carried out on the full
(default) bit-width, how on earth do you do 16-bit rotate when it's
needed?  or 8-bit rotate? it has to be *explicitly* coded into the
actual Bit-wise instruction, doesn't it?

use shifts or add  sll[h/b] srad[h/b] srl[h/b] and rll[h/b] instructions.
(note no immediates!) I don't know if they are worth it though.

 
and in some cases you *really cannot* do full (default, 32/64/128
bit-wise operations).  for example here:
https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/zi_7B15kj6s/w9y_KHM0AwAJ
clifford points out that the proposed BGS (bit gatherer and shuffler)
instruction is limited deliberately to only 16 bits, because the
amount of control/bit-manipulation information required to arbitrarily
swap greater than 16 bits is simply too large.

....  but if there was an instruction which explicitly allowed the
operand size to be reduced to 16 or 8 bits, that problem is solved, is
it not?

anyway apologies to you, rogier, i'm not ignoring what you described :)

No need. Thanx for your reaction. 

l.

Rogier Brussee

Apr 5, 2018, 6:33:32 AM4/5/18
to RISC-V ISA Dev, rogier....@gmail.com


On Wednesday, 4 April 2018 17:22:58 UTC+2, Liviu Ionescu wrote:
On 4 April 2018 at 17:27:44, Rogier Brussee (rogier....@gmail.com) wrote:

> CC'ing Liviu Ionescu ... as something like
> this might be useful for the microcontroller spec
...
> an alternative for the C extension using 16bit wide instructions
> with a comparable code compression characteristics as the standard
> C extension

Thank you, Rogier.

As far as the microcontroller proposal is targeted, I already
mentioned that it does not focus on changes to the instruction set,
but to making the architecture more C/C++ friendly.

Which seems eminently sensible. 


So, any instruction sets and encodings that will be agreed for the
privileged profile will probably be ok for the microcontroller profile
too, except the ABI, which needs a redesign to reduce the number of
registers saved by the caller and possibly be consistent with the
RV32E reduced number of registers.


Yes. That inspired me to tinker with my original proposal to target 16 registers and
retain a few CSRs despite their cost in encoding. I already had MMIO-based registers.

 
In my opinion, in a well-designed architecture, the actual user
should have nothing to do with the instruction set at all, the
toolchain must deal with these details, not the end user.


Agreed completely.
 
This does not mean that the instruction set is not important, it
obviously is, but it is not the cornerstone of the microcontroller
profile.


And rightly so, RV32EC should work fine. The C extension is a bit wasteful
if you only use 16 registers however. 
 
In addition, although I agree that there may be use cases that I did
not think about, I would not go below 32-bit registers and memory
space.


 
Xcondensed is primarily about RV32/RV64. 
 
Regards,


Thanx for your time!

Rogier
 
Liviu

Michael Chapman

Apr 5, 2018, 6:45:53 AM4/5/18
to Rogier Brussee, RISC-V ISA Dev, i...@livius.net, jcb6...@gmail.com, jhause...@gmail.com, ke...@prtime.org, michae...@mac.com, xan....@gmail.com

Our proprietary 32 bit CPUs are often used to replace 8 and 16 bit cores and usually run code which originates from a code base which was created for an 8 or 16 bit cpu.

We have only 32 bit operations, but we do have [s|z]ext[b|h] instructions. However, we find that these are actually rarely required and are required even less in well written code.

The biggest use for the extend instructions is when a loop counter has been declared as something like uint8_t or uint16_t instead of just int and there is a possibility that the value could wrap around. In many cases it is possible for the compiler to avoid generating the extend instructions as it is easy enough to ascertain that the value will not ever wrap around.
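The wrap-around case is easy to demonstrate in plain C. The function below is only an illustration: `i` must behave as an 8-bit value, so a compiler keeping it in a 32-bit register has to re-truncate it (a zero-extend such as ANDI 0xFF) after each increment, unless it can prove the counter never passes 255.

```c
#include <stdint.h>

/* Count the iterations of a loop whose counter is declared uint8_t.
 * i++ is computed in int and converted back to uint8_t, so it wraps
 * from 255 to 0 (well defined for unsigned types). */
unsigned count_u8(uint8_t start, uint8_t end)
{
    unsigned n = 0;
    for (uint8_t i = start; i != end; i++)
        n++;
    return n;
}
```

`count_u8(250, 4)` runs 10 times because the counter wraps from 255 to 0. With an `int` counter and a bound like `i < n` for a small constant `n`, the compiler can usually prove no wrap occurs and drop the extension entirely.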

My opinion is that you should fix the compiler rather than adding halfword or byte signed and unsigned add instructions to the ISA. There are very few occasions where they will be useful - even on code written for 8 or 16 bit cores.

We do have an option to support unaligned accesses on all our cores. On our smallest cores, customers very rarely use this option - even when their code is coming from an 8 or 16 bit processor.

I think unaligned accesses should be prohibited and dropped from the specification. At the moment the spec says that an unaligned access could be very slow, in which case code will avoid ever using it, and then there is no point in having it in the spec at all.

A bit field insert instruction is often useful for deeply embedded code. I.e. an instruction which can take the n least significant bits from a register and insert them at an arbitrary position in another register without upsetting the other bits. This can be used for coding rotates as well.
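A C sketch of the bit-field insert described above, plus a 16-bit rotate built from it. The names `bfi` and `rotl16_bfi` and the argument order are illustrative only.

```c
#include <stdint.h>

/* Bit-field insert: put the n least significant bits of src into dst at
 * bit position pos, leaving every other bit of dst unchanged.
 * Requires 1 <= n <= 32 and pos + n <= 32. */
uint32_t bfi(uint32_t dst, uint32_t src, unsigned pos, unsigned n)
{
    uint32_t field = (n >= 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t mask = field << pos;
    return (dst & ~mask) | ((src & field) << pos);
}

/* A 16-bit rotate-left expressed with bfi: the top s bits of x move to
 * the bottom, and the low 16 - s bits are inserted just above them. */
uint32_t rotl16_bfi(uint32_t x, unsigned s)
{
    x &= 0xFFFFu;
    s &= 15u;
    return bfi(x >> (16u - s), x, s, 16u - s);
}
```

As a single instruction, the masking and shifting inside `bfi` collapses to one operation, which is why it also makes narrow rotates cheap.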

16 registers is plenty for most applications we see. For RV32E I would still allow the possibility to have single precision floating point, but would not encode them into 16 bit instructions. I would also not have a separate floating point register file but use the same registers as for integer instructions. This reduces the context size required for each task in a small embedded RTOS and again, in practice for most code we see - even with floating point, 16 registers is enough.

Even in floating point intensive applications, there is little point in using up 16 bit instruction space with the floating point instructions. Leave them all as 32 bits.

Liviu Ionescu

Apr 5, 2018, 6:53:41 AM4/5/18
to Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 13:33:34, Rogier Brussee (rogier....@gmail.com) wrote:

> > As far as the microcontroller proposal is targeted, I already
> > mentioned that it does not focus on changes to the instruction
> set,
> > but to making the architecture more C/C++ friendly.
>
> Which seems eminently sensible.

Can you elaborate? I'm open to any suggestions.

Regards,

Liviu

Rogier Brussee

Apr 5, 2018, 7:00:03 AM4/5/18
to RISC-V ISA Dev, jha...@gmail.com, i...@livius.net, jcb6...@gmail.com, ke...@prtime.org, xan....@gmail.com, michae...@mac.com


On Wednesday, 4 April 2018 17:56:02 UTC+2, lk...@lkcl.net wrote:
oof, whoops rogier, the cc list was borked! :)  also i tracked down some cross-references to the various discussions you mention, for the benefit of people who may not have seen them (or wish to re-read and refresh their memories).


Oops.
 
On Wednesday, April 4, 2018 at 3:27:43 PM UTC+1, Rogier Brussee wrote:
(CC'ing Liviu Ionescu and Jacob Bachmeyer as  something like this might be useful for the microcontroller spec,

 
John Hauser because he proposed modifications for RV32E  and RV32EC to better use halfword and bytesize data,

 

Yep.
 
Kelly Dean because he proposed ideas on binary portability between RV32 and RV64,

 i *think* it's this link (kelly, rogier, can you confirm?)


Yep.
 
Michael Clark because of RV8 JIT,

this looks relevant (also link to http://rv8.io)

Yep.
 
 
Yes, that thread resulted in making the auipc ra high(imm); jalr ra ra low(imm) pattern officially blessed. In fact it is so useful that I consider C.jalr_ra_ra 11imm: jalr ra ra imm<<1 a good candidate for using the remaining reserved slot in C, but that would need more data (e.g. a Linux distribution with and without this instruction).
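The hi/lo split behind the `auipc`/`jalr` pattern is worth spelling out: JALR sign-extends its 12-bit immediate, so the AUIPC part has to be rounded up whenever bit 11 of the offset is set. This is the same split the standard `call` pseudo-instruction expansion needs; the C function names below are illustrative.

```c
#include <stdint.h>

/* Split a 32-bit PC-relative offset into the 20-bit AUIPC immediate
 * and the signed 12-bit JALR immediate. Adding 0x800 before taking
 * bits [31:12] compensates for JALR sign-extending the low part. */
void split_hi_lo(int32_t off, uint32_t *hi20, int32_t *lo12)
{
    uint32_t u = (uint32_t)off;
    *hi20 = ((u + 0x800u) >> 12) & 0xFFFFFu;
    uint32_t lo = u & 0xFFFu;                 /* low 12 bits */
    *lo12 = (int32_t)(lo ^ 0x800u) - 0x800;   /* sign-extend, range -2048..2047 */
}

/* What the pair computes: pc + (hi20 << 12) + sext(lo12), mod 2^32. */
uint32_t target(uint32_t pc, uint32_t hi20, int32_t lo12)
{
    return pc + (hi20 << 12) + (uint32_t)lo12;
}
```

For example an offset of 0x12345FFF splits into hi20 = 0x12346 and lo12 = -1, and the pair still lands on pc + 0x12345FFF.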


ok whoops that's important to note that it's "rv8 as in like google v8" *NOT* repeat *NOT* "RV8" as in "RV8, RV16, RV32, RV64, RV128...."


Correct.
 
and Xan Phung in recognition of his ideas on Xcondensed). 

i believe you may be referring to this, rogier?


Rogier Brussee

Apr 5, 2018, 7:01:40 AM4/5/18
to RISC-V ISA Dev


On Wednesday, 4 April 2018 18:40:18 UTC+2, ray.vandewalker wrote:
The user-mode I set is frozen, (and honestly, quite well-designed) so it's too late for a size field to be added to the ISA.
But, 32 registers are too many for most register allocators to use well, so I have always thought
this wasted some bits, and was a real opportunity for improvement.
If R-format instructions were recast for 16 registers, 3 bits are freed for an orthogonal size field.
Other formats could extend the immediate fields.
Five of the sizes are obvious: 8, 16, 32, 64, 128. The three unused sizes could handle the misty future.

size  = XLEN 

Rogier Brussee

Apr 5, 2018, 8:04:25 AM4/5/18
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, 5 April 2018 12:53:41 UTC+2, Liviu Ionescu wrote:
What you already know and stated: people are more expensive and important than transistors. They choose the path of least resistance and least cost. Therefore, focussing on the usability part trumps (comparatively small) technical advantages or savings for success in the market. C/C++ code is obviously far easier to use, less expensive and less bug-prone than assembler. Since, as you point out, in the embedded world you have to interact with (and debug) low level features like interrupt handling, an easy to understand, reliable, and easy to debug programming model for low level features will be a deciding factor. Also, if assembler is needed, RV, at least the non-privileged part, is about as easy to understand as it gets, and will be (relatively) widely taught*. This means that fewer guru points are required to use assembler, but also that the intricacies of low level control features (CSRs, cough) become relatively more important stumbling blocks for using that skill effectively.

*this would be the one reason why I could imagine a 16-bit RV processor having some success: if there is reason enough to bear the pain of a 16-bit processor, then at least make it as easy as possible to deal with it.


Regards,

Liviu

Liviu Ionescu

Apr 5, 2018, 8:50:33 AM4/5/18
to Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 15:04:27, Rogier Brussee (rogier....@gmail.com) wrote:

> ... people are more expensive and important than transistors.

nicely said. probably it should be the motto of the microcontroller profile.

> ... an easy to understand,
> reliable, and easy to debug programming model for low level features will
> be a deciding factor.

I hope it will.

unfortunately I am not aware of any formal efforts to advance the
RISC-V microcontroller profile proposal :-(


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 5, 2018, 9:12:39 AM4/5/18
to Michael Chapman, Rogier Brussee, RISC-V ISA Dev, Liviu Ionescu, Jacob Bachmeyer, John Hauser, Kelly Dean, Michael Clark, xan....@gmail.com
On Thu, Apr 5, 2018 at 11:46 AM, Michael Chapman <michael.c...@gmail.com> wrote:


My opinion is that you should fix the compiler rather than adding halfword or byte signed and unsigned add instructions to the ISA. There are very few occasions where they will be useful - even on code written for 8 or 16 bit cores.


if the only use-case was general-purpose computing i would absolutely agree with you.  however would it not be reasonable to want to use RISC-V for 3D Graphics, Video Processing, Cryptographic algorithms, Audio processing, and many many more applications which normally you would easily spend $500k on for 3D, $200k for video, $100k for a cryptographics co-processor and $50k-$100k for an audio DSP and so on?  these are not uncommon use-case scenarios [understatement: Xtensa were proud to announce their *billionth* license of their Audio DSP hard macro, about ten years ago!]

that's an awful lot of money to be spending when you _could_ .... if RISC-V supported it... use a B-Extended SIMD/Simple-Vector extended 8/16-bit-capable RISC-V processor instead [hell, the cost of licensing the above hard macros *alone* justifies putting a team together to make that happen!].

looking at jeff bush's nyuzi 3D GPU analysis [1], he points out that the reason why software-defined GPUs have failed is because it's not the amount of processing that's so much the issue, it's the amount of power needed for the SRAM / L1 cache.  once you've got the data into the ALU,  it's *really* important to do as much work as possible before writing it back out of the registers.

if we want RISC-V to be successful in really rather high-profile mass-volume uses (cryptography, DSP work, Video, 3D, Tensors for AI), we *really* need to think beyond just the "general-purpose" scenario.


We do have an option to support unaligned accesses on all our cores. On our smallest cores, customers very rarely use this option - even when their code is coming from an 8 or 16 bit processor.

I think unaligned accesses should be prohibited and dropped from the specification. At the moment the spec says that an unaligned access could be very slow, in which case code will avoid ever using it, and then there is no point in having it in the spec at all.

i also considered suggesting the same thing (to prohibit unaligned memory access).  however... how would you then do audio processing of data that comes in from a DMA buffer, in 8, 16, 24 or 32-bit configurations (back-to-back samples with no word-alignment)?  someone buys an off-the-shelf AC97 hard macro... they pay $50k to $100k for it and they *can't read the data*???  or they have to jump through insane hoops to get at it, by doing a multiply (shift by 8 or 16), then & to mask out unwanted bits, then divide (shift by 16 or 24) to get the lower bits?  and do that on almost every single or every other audio sample?
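For comparison, here is what reading back-to-back packed 24-bit samples looks like using byte loads only, so no misaligned word access is ever issued. Little-endian packing and the function name are assumptions for illustration; it uses shifts and ORs rather than the multiply/divide mentioned above, but the per-sample overhead is the point.

```c
#include <stddef.h>
#include <stdint.h>

/* Read sample k from a buffer of back-to-back 24-bit little-endian
 * signed samples, using three byte loads plus shifts and ORs. */
int32_t read_s24le(const uint8_t *buf, size_t k)
{
    const uint8_t *p = buf + 3 * k;
    uint32_t v = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16;
    /* sign-extend bit 23; (v ^ 0x800000) fits in int32_t, so this is
     * fully defined arithmetic */
    return (int32_t)(v ^ 0x800000u) - 0x800000;
}
```

That is three loads and roughly half a dozen ALU operations per sample, against a single load if misaligned word access were cheap.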

so at the data rate of audio one might imagine that doing that would be fine... but for video processing (1080p60, which is nearly 500 mbytes per second of bandwidth for 32-bit pixels), you might think that going to 24-bit or 16-bit would save on bandwidth, but in CPU cycles the above hoops to jump through would... you get the idea.
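The 1080p60 figure works out as follows, assuming active pixels only (no blanking) and decimal megabytes:

```latex
1920 \times 1080 \times 60 = 124\,416\,000\ \text{pixels/s}
\\
124\,416\,000 \times 4\,\text{B} \approx 498\ \text{MB/s},\quad
\times 3\,\text{B} \approx 373\ \text{MB/s},\quad
\times 2\,\text{B} \approx 249\ \text{MB/s}
```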



A bit field insert instruction is often useful for deeply embedded code. I.e. an instruction which can take the n least significant bits from a register and insert them at an arbitrary position in another register without upsetting the other bits. This can be used for coding rotates as well.

 

16 registers is plenty for most applications we see. For RV32E I would still allow the possibility to have single precision floating point, but would not encode them into 16 bit instructions. I would also not have a separate floating point register file but use the same registers as for integer instructions.

yes: i was quite surprised to see that FP has a separate register file.  it makes sense from a perspective of an optimised implementation where the FPU runs separately from an ALU (and those FENCE instructions are used to keep stuff in order).  or.... no actually it doesn't make sense at all :)

l.

Luke Kenneth Casson Leighton

Apr 5, 2018, 9:18:07 AM4/5/18
to Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev
On Thu, Apr 5, 2018 at 1:50 PM, Liviu Ionescu <i...@livius.net> wrote:
> On 5 April 2018 at 15:04:27, Rogier Brussee (rogier....@gmail.com) wrote:
>
>> ... people are more expensive and important than transistors.
>
> nicely said. probably it should be the motto of the microcontroller profile.

:)

>> ... an easy to understand,
>> reliable, and easy to debug programming model for low level features will
>> be a deciding factor.
>
> I hope it will.
>
> unfortunately I am not aware of any formal efforts to advance the
> RISC-V microcontroller profile proposal :-(

*and* the B-Extension working group was shut down (annoying its
external contributors) *and* V-Extension is stalled (i learned that
Hwacha was terminated in 2017, it's listed as a "former project" here:
http://people.eecs.berkeley.edu/~krste/)

whaaat's gooing ooon?

l.

Richard Herveille

Apr 5, 2018, 9:53:14 AM4/5/18
to Luke Kenneth Casson Leighton, Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev, Richard Herveille

> http://people.eecs.berkeley.edu/~krste/)

The V-extensions are not stalled.

Hwacha is an implementation of a vector processor, but it is not compatible with the proposed V-extensions.

See Esperanto Technology’s presentations.

Richard

> whaaat's gooing ooon?
>
> l.

 


Liviu Ionescu

Apr 5, 2018, 10:00:56 AM4/5/18
to Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 16:18:01, Luke Kenneth Casson Leighton (lk...@lkcl.net) wrote:

> ... whaaat's gooing ooon?

well, on the Linux front, lots of things.

outside the Linux world...  nothing. :-(

and don't expect any change soon, Krste clearly stated that the
official position is to maintain compatibility with the privileged
specs. which will probably discourage any use of RISC-V in
microcontrollers.

regards,

Liviu


On 17 March 2018 at 02:16:23, kr...@berkeley.edu (kr...@berkeley.edu) wrote:

> ... The new task group is looking at extending interrupt behavior, but
> with a view to maintaining backwards compatibility and to support
> dual-use cores that run either real-time or virtual-memory code.

Richard Herveille

Apr 5, 2018, 10:06:48 AM4/5/18
to Liviu Ionescu, Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev, Richard Herveille

 

... whaaat's gooing ooon?

 

well, on the Linux front, lots of things.

 

outside the Linux world...  nothing. :-(

 

and don't expect any change soon, Krste clearly stated that the

official position is to maintain compatibility with the privileged

specs. which will probably discourage any use of RISC-V in

microcontrollers.

 

 

There’s nothing from stopping us from writing a microcontroller spec which does not comply to the privilege spec, but still comply to the user spec and the other extensions.

 

Richard

 

 

> regards,
>
> Liviu
>
> On 17 March 2018 at 02:16:23, kr...@berkeley.edu (kr...@berkeley.edu) wrote:
>
> > ... The new task group is looking at extending interrupt behavior,
> > but with a view to maintaining backwards compatibility and to
> > support dual-use cores that run either real-time or virtual-memory
> > code.

 


Luke Kenneth Casson Leighton

Apr 5, 2018, 10:12:57 AM4/5/18
to Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev
On Thu, Apr 5, 2018 at 3:00 PM, Liviu Ionescu <i...@livius.net> wrote:

> On 5 April 2018 at 16:18:01, Luke Kenneth Casson Leighton (lk...@lkcl.net) wrote:
>
>> ... whaaat's gooing ooon?
>
> well, on the Linux front, lots of things.
>
> outside the Linux world... nothing. :-(
>
> and don't expect any change soon, Krste clearly stated that the
> official position is to maintain compatibility with the privileged
> specs.

why?

Liviu Ionescu

Apr 5, 2018, 10:25:31 AM4/5/18
to Richard Herveille, Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 17:06:46, Richard Herveille
(richard....@roalogic.com) wrote:

> There’s nothing stopping us from writing a microcontroller
> spec which does not comply with the privilege spec, but still complies
> with the user spec and the other extensions.

right. I have already taken a first step towards this.

those who took the time to read it generally had positive feedback.

Krste also acknowledged that 'we also need a standard "rich"
microcontroller profile and that this should support C ISRs and
preemption/nesting efficiently'.

but apart from this... there were no further contributions. :-(


regards,

Liviu

Liviu Ionescu

Apr 5, 2018, 10:28:40 AM4/5/18
to Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 17:12:51, Luke Kenneth Casson Leighton (lk...@lkcl.net) wrote:

> > official position is to maintain compatibility with the privileged
> > specs.
>
> why?

no idea.

I already analysed this concern, and did not find it realistic:

https://github.com/emb-riscv/specs-markdown/blob/develop/improvements-upon-privileged.md#fragmentation-would-break-upward-compatibility


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 5, 2018, 10:58:11 AM4/5/18
to Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev
agreed. cost-wise, 48-pin 0.5mm pitch QFPs can be put onto
single-sided double-layer PCBs that cost $50 to prototype and come
back in 24-48 hours from shenzhen factories (and about 7-10 days from
Eurocircuits for about the same price).

you're absolutely right about the embedded NAND and RAM (but didn't
mention the PMIC issue). *nobody* replaces a $0.32 STM32F030 or a
$1.50 STM32F072 (which can be powered from discrete components costing
$0.15, sits on a PCB costing $0.40 with assembly costing even less, and
can be prototyped all-in for under $200) with a $4+ 400-pin
processor that takes up 8x the PCB space, needs a PMIC with *four*
inductors surrounding it, external DRAM, external NAND and a 4-layer
to 8-layer PCB that costs two THOUSAND dollars to get 10 samples made
up.

any company that tried that would quickly go out of business.

plus, the software that fits on embedded cores (1k, 8k, 16k... gosh, you
have how much... you have 128k RAM on that micro-controller? wow,
that's amazing!) is so tiny and so tied to the actual hardware that
you just... you _just_ don't port it, you rewrite it. the libraries
are so hard-core specialist (even libopencm3, which is an exception
in that it provides abstracted APIs common to many many different ECUs,
is still so far from POSIX you might as well be talking Klatchian)
that a total rewrite to a linux-based OS - even if the SoC supported
all the same peripherals - would be about your only option.

so yeah i'd agree you pretty much nailed it.

the rest... i feel... yeah, the people in the RISC-V Foundation who
were controlling the specification for micro-controllers really were
out of their depth, with not enough knowledge and expertise on how
MCUs are deployed in commercial real-world applications.... and their
response was... to shut down the development of that part of the
specification.

i'm seeing a pattern, here.

l.

Michael Chapman

Apr 5, 2018, 11:18:04 AM4/5/18
to Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev, Liviu Ionescu, Jacob Bachmeyer, John Hauser, Kelly Dean, Michael Clark, xan....@gmail.com

On 05-Apr-18 15:12, Luke Kenneth Casson Leighton wrote:
> On Thu, Apr 5, 2018 at 11:46 AM, Michael Chapman <michael.c...@gmail.com> wrote:
>
> > My opinion is that you should fix the compiler rather than adding half word or byte signed and unsigned add instructions to the ISA. There are very few occasions where they will be useful - even on code written for 8 or 16 bit cores.
>
> if the only use-case was general-purpose computing i would absolutely agree with you.  however would it not be reasonable to want to use RISC-V for 3D Graphics, Video Processing, Cryptographic algorithms, Audio processing, and many many more applications which normally you would easily spend $500k on for 3D, $200k for video, $100k for a cryptographics co-processor and $50k-$100k for an audio DSP and so on?  these are not uncommon use-case scenarios [understatement: Xtensa were proud to announce their *billionth* license of their Audio DSP hard macro, about ten years ago!]

I am talking about low cost micro-controllers for deeply embedded devices such as SIM cards, Java cards etc, where the complete SOC (with NVM, peripherals and accelerators) sells for 10c including profit margin. One of our clients has made billions of those.

We have other deeply embedded devices with similar costs (including one where the number of IOs and IO pad size determines the choice of process node and foundry rather than what is actually inside the chip - basically an IO pad costs more in smaller geometries).



> that's an awful lot of money to be spending when you _could_ .... if RISC-V supported it... use a B-Extended SIMD/Simple-Vector extended 8/16-bit-capable RISC-V processor instead [hell, the cost of licensing the above hard macros *alone* justifies putting a team together to make that happen!].


This is SIMD/Vector processing. Nothing to do with supporting wrap around 16 bit arithmetic for low cost microcontrollers running old code.
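To illustrate the "fix the compiler" position, here is a minimal C sketch of what a compiler targeting RV32I (which has no half-word add) could emit for 16-bit wrap-around arithmetic: a full-width add followed by zero- or sign-extension from bit 15. The function names are hypothetical, and the instruction sequences named in the comments are typical choices rather than what any particular compiler is guaranteed to produce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 16-bit wrap-around add done with
   32-bit operations. On RV32I the final mask is typically a SLLI/SRLI
   pair by 16, since 0xFFFF does not fit ANDI's 12-bit signed immediate. */
uint32_t add_u16_wrap(uint32_t a, uint32_t b)
{
    assert(a <= 0xFFFFu && b <= 0xFFFFu);   /* inputs are 16-bit values */
    return (a + b) & 0xFFFFu;
}

/* Hypothetical helper: signed 16-bit wrap-around add done with 32-bit
   operations. On RV32I the sign extension is typically SLLI 16 then SRAI 16. */
int32_t add_s16_wrap(int32_t a, int32_t b)
{
    assert(a >= -32768 && a <= 32767 && b >= -32768 && b <= 32767);
    uint32_t low = ((uint32_t)a + (uint32_t)b) & 0xFFFFu;
    /* Portable sign extension from bit 15: flip the sign bit, then
       subtract its weight. */
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}
```

For example, `add_s16_wrap(32767, 1)` wraps to -32768, exactly as a native 16-bit core would, at the cost of one or two extra shift instructions.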



> looking at jeff bush's nyuzi 3D GPU analysis [1], he points out that software-defined GPUs have failed not so much because of the amount of processing required as because of the amount of power needed for the SRAM / L1 cache.  once you've got the data into the ALU, it's *really* important to do as much work as possible before writing it back out of the registers.
>
> if we want RISC-V to be successful in really rather high-profile mass-volume uses (cryptography, DSP work, Video, 3D, Tensors for AI), we *really* need to think beyond just the "general-purpose" scenario.

I agree. And this is vector processing. A proper vector processor will be much better than any SIMD hack. The basic Cray kind of model is not bad - providing you have the right data precisions (fixed point and integer). There are many interesting implementation trade-offs possible which are entirely SW compatible for such machines and allow for low cost very high bandwidth/processing power implementations.




> > We do have an option to support unaligned accesses on all our cores. On our smallest cores, customers very rarely use this option - even when their code is coming from an 8 or 16 bit processor.
> >
> > I think unaligned accesses should be prohibited and dropped from the specification. At the moment the spec says that an unaligned access could be very slow, in which case code will avoid ever using it. And then there is no point in having it in the spec at all.
>
> i also considered suggesting the same thing (to prohibit unaligned memory access).  however... how would you then do audio processing of data that comes in from a DMA buffer, in 8, 16, 24 or 32-bit configurations (back-to-back samples with no word-alignment)?  someone buys an off-the-shelf AC97 hard macro... they pay $50k to $100k for it and they *can't read the data*???  or they have to jump through insane hoops to get at it, by doing a multiply (shift by 8 or 16), then & to mask out unwanted bits, then divide (shift by 16 or 24) to get the lower bits?  and do that on almost every single or every other audio sample?

Any C compiler will generate the proper code to access unaligned data out of a packed structure without doing an unaligned load or store. And that code is not that inane either. However, thought should be given to fixing the peripheral (or putting a wrapper around it) to present the data in a sensible format.
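As an illustration of reading packed data without any unaligned word access, here is a minimal C sketch. It assumes a hypothetical DMA layout of back-to-back, little-endian, signed 24-bit samples with no padding (not any specific AC97 configuration); the helper name is hypothetical. Only byte loads are used, so the alignment of the buffer never matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return signed 24-bit little-endian sample number
   `index` from a packed buffer (3 bytes per sample, no padding). */
int32_t read_s24le(const uint8_t *buf, size_t len, size_t index)
{
    size_t off = index * 3;
    assert(off + 3 <= len);               /* sample must lie inside the buffer */

    /* Assemble the 24-bit value from three byte loads. */
    uint32_t raw = (uint32_t)buf[off]
                 | ((uint32_t)buf[off + 1] << 8)
                 | ((uint32_t)buf[off + 2] << 16);

    /* Sign-extend from bit 23: flip the sign bit, then subtract its weight. */
    return (int32_t)(raw ^ 0x800000u) - 0x800000;
}
```

This costs three byte loads plus a few shifts/ORs per sample, which is roughly what a compiler generates for a field in a packed structure.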


> so at audio data rates one might imagine that doing that would be fine... but for video processing (1080p60, which is nearly 500 mbytes per second of bandwidth for 32-bit pixels), you might think that going to 24-bit or 16-bit would save on bandwidth, but in CPU cycles the above hoops to jump through would... you get the idea.
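The "nearly 500 mbytes per second" figure can be checked with a short C sketch. It assumes raw, uncompressed frames and ignores blanking intervals and bus overhead. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: raw video bandwidth in bytes per second,
   ignoring blanking intervals, compression and bus overhead. */
uint64_t video_bytes_per_sec(uint32_t width, uint32_t height,
                             uint32_t fps, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);      /* whole bytes per pixel only */
    return (uint64_t)width * height * fps * (bits_per_pixel / 8);
}
```

For 1920x1080 at 60 frames per second, the function gives the following raw bandwidths:

- 497,664,000 bytes/s at 32 bits per pixel
- 373,248,000 bytes/s at 24 bits per pixel
- 248,832,000 bytes/s at 16 bits per pixel

So packed 24-bit pixels would save a quarter of the bandwidth, at the cost of the unpacking work discussed above.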


Fix the peripheral. If you are designing anything for video today, you should be doing it for 8K.
You are already far too late to market for HD!




> > A bit field insert instruction is often useful for deeply embedded code, i.e. an instruction which can take the n least significant bits from a register and insert them at an arbitrary position in another register without upsetting the other bits. This can be used for coding rotates as well.
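The bit field insert operation described above, and its use for rotates, can be sketched in C as follows. The function names are hypothetical and the sketch assumes 32-bit registers. A rotate right by k becomes one logical shift plus one insert of the k bits that fell off the bottom.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: take the n least significant bits of src and
   insert them into dst at bit position pos, leaving all other bits of
   dst unchanged. Requires n >= 1 and pos + n <= 32. */
uint32_t bitfield_insert(uint32_t dst, uint32_t src, unsigned pos, unsigned n)
{
    assert(n >= 1 && pos + n <= 32);
    uint32_t field = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t mask  = field << pos;
    return (dst & ~mask) | ((src << pos) & mask);
}

/* Rotate right by k (1..31): shift right, then re-insert the k low bits
   of x (which the shift discarded) at the top of the result. */
uint32_t rotr32(uint32_t x, unsigned k)
{
    assert(k >= 1 && k <= 31);
    return bitfield_insert(x >> k, x, 32 - k, k);
}
```

With a single-instruction insert, the rotate would be two instructions instead of the usual shift/shift/OR sequence.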

> clifford kindly elaborated on this very recently:
>
> > 16 registers is plenty for most applications we see. For RV32E I would still allow the possibility to have single precision floating point, but would not encode them into 16 bit instructions. I would also not have a separate floating point register file but use the same registers as for integer instructions.
>
> yes: i was quite surprised to see that FP has a separate register file.  it makes sense from a perspective of an optimised implementation where the FPU runs separately from an ALU

Once you get to high performance implementations, this no longer makes a difference. It is only an advantage for the way they have built the Rocket CPU.


> (and those FENCE instructions are used to keep stuff in order).  or.... no actually it doesn't make sense at all :)

Indeed - it is a shame that one implementation seems to have forced the ISA definition on this point.

On https://github.com/emb-riscv/specs-markdown/blob/develop/improvements-upon-privileged.md

The hardware stack limit register is expensive

"The stack limit register needs to be read and compared on every store via the stack register so it should have dedicated read circuit and comparator.

Yes, it is a small price to pay, but by far the most common cause of crashes in a multi-threaded device is stack overflow, so detecting this exception should be worth the extra price."

It is true that by far the most common cause of crashes in a multi-threaded device is stack overflow. However, it is very easy for the compiler to add code to check for stack overflow and the overhead is really not great. There is no need for a HW register.

The RTOS has to context switch the stack limit register/global variable in any case.
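The software alternative to a hardware stack-limit register can be sketched in C. Assumptions, all hypothetical and for illustration only:

- the stack grows downwards;
- the compiler would emit the comparison in each function prologue;
- the RTOS reloads a global limit on every context switch;
- the explicit `sp` parameter stands in for the real stack pointer, and `struct thread` is an invented layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread control block. */
struct thread {
    uintptr_t stack_base;    /* lowest valid address of this thread's stack */
};

/* Limit consulted by every compiler-inserted prologue check;
   the RTOS reloads it on each context switch. */
static uintptr_t current_stack_limit;

/* What a prologue check would compute: would allocating frame_size
   bytes below sp cross the current limit? Written as an addition to
   avoid unsigned underflow of sp - frame_size. */
bool stack_would_overflow(uintptr_t sp, uintptr_t frame_size)
{
    return sp < current_stack_limit + frame_size;
}

/* Part of the RTOS context switch: install the next thread's limit. */
void context_switch_stack_limit(const struct thread *next)
{
    assert(next != 0);
    current_stack_limit = next->stack_base;
}
```

In practice the prologue check is one load, one add/compare and one rarely-taken branch per function, and the context-switch cost is a single store.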




Luke Kenneth Casson Leighton

Apr 5, 2018, 12:21:21 PM4/5/18
to Michael Chapman, Rogier Brussee, RISC-V ISA Dev, Liviu Ionescu, Jacob Bachmeyer, John Hauser, Kelly Dean, Michael Clark, xan....@gmail.com
On Thu, Apr 5, 2018 at 4:18 PM, Michael Chapman <michael.c...@gmail.com> wrote:
 
> I am talking about low cost micro-controllers

understood [now].  there's still quite a bit of cross-over (shared purpose)

> for deeply embedded devices such as SIM cards, Java cards etc, where the complete SOC (with NVM, peripherals and accelerators) sells for 10c including profit margin. One of our clients has made billions of those.


niiiice.
 
> We have other deeply embedded devices with similar costs (including one where the number of IOs and IO pad size determines the choice of process node and foundry rather than what is actually inside the chip - basically an IO pad costs more in smaller geometries).

180nm: $600 for a 10in wafer.  much more commonly-used than people expect.


> This is SIMD/Vector processing. Nothing to do with supporting wrap around 16 bit arithmetic for low cost microcontrollers running old code.

true... buuut if you don't have the 16-bit arithmetic (and 8-bit arithmetic) you don't have a base to build on to [efficiently] do many of the SIMD operations needed in those computationally-expensive scenarios.  meaning, you *can* do it but you'd be looking at 10x to 100x more power and die area than any commercial GPU / VPU / Crypto engine uses today.

so surprisingly there is common ground with the requirements for a micro-controller.

> Any C compiler will generate the proper code to access unaligned data out of a packed structure without doing an unaligned load or store.

yyyeah ok :)

> And that code is not that inane either. However, thought should be given to fixing the peripheral (or putting a wrapper around it) to present the data in a sensible format.

yyeah that means it's now not a standard off-the-shelf peripheral, and, also, the zero-padding's now consuming extra bus bandwidth (which in a SoC tends to be precious as it's usually a shared bus).

> Fix the peripheral. If you are designing anything for video today, you should be doing it for 8K.
> You are already far too late to market for HD!


 :)  sigh the memory requirements for 4k are so high for an SoC that i just... didn't want to go there yet.   a single lane of 32-bit-wide DDR3 can cope with 1080p60 H.264 decode (and display on a framebuffer) even when running at a 350mhz clock rate (DDR 700mhz).  and you can also *just* about utilise a 300mhz RGB/TTL interface (hello richard herveille) with an off-the-shelf converter IC (to HDMI, or eDP) etc.   to support 4k you have to go to at *least* double that width (twin 32-bit DDR3), with a clock rate of at least 800mhz, and it would be necessary to license a proprietary HDMI 2.0 hard macro and/or a multi-lane eDP interface.
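Luke's DDR3 figures can be turned into a rough peak-bandwidth estimate with a short C sketch. The estimate assumes peak bandwidth = bus width in bytes × 2 transfers per clock × clock rate. It ignores refresh, bank conflicts and protocol overhead, so it is an upper bound only. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: theoretical peak DDR bandwidth in bytes/s.
   DDR transfers data on both clock edges, hence the factor of 2. */
uint64_t ddr_peak_bytes_per_sec(uint32_t bus_width_bits, uint64_t clock_hz)
{
    assert(bus_width_bits % 8 == 0);
    return (uint64_t)(bus_width_bits / 8) * 2u * clock_hz;
}
```

The configurations from the message give these peak figures:

- One 32-bit lane at a 350 MHz clock gives 2,800,000,000 bytes/s.
- Two 32-bit lanes at 800 MHz, treated here as a single 64-bit bus, give 12,800,000,000 bytes/s.

For comparison, merely scanning out a 3840x2160, 60 Hz, 32-bit framebuffer needs 1,990,656,000 bytes/s, four times the 1080p60 figure.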


> Indeed - it is a shame that one implementation seems to have forced the ISA definition on this point.


this seems to be the general theme, which i'm slowly picking up on (only been on these lists for a couple of months).  and, that efforts from outsiders are... shut down or ignored.  not good.

 


oink?? i didn't read further down until you referenced that again (liviu also posted it in another thread):

> Microcontrollers should not be on networks
>
> "Generally, microcontrollers should probably not be on networks, except
> possibly for multi-core versions that can handle real-time tasks on
> one core and network latency on the other."

wtf??? try telling that to the people who sell ethernet, WIFI and BT Shields for Arduino-compatible devices!!  3D printer controller developers (dc42 and the developers of the smoothieboard) are going to be *pissed*!  and ST Micro, "sorry, you know the high-end STM32* Cortex M-series devices - all 90 of them [1] - that you sell with MII ethernet PHY support, i'm sorry, you're going to have to stop selling those because microcontrollers should not be on networks" ???

yes it's true that most arduino shields are SPI-based and contain their own embedded SoC running their own TCP/IP and even HTTP stack but that doesn't cover the high-end STM32 cases where they *will* be powerful enough for vendors to consider having an event-driven [RTOS-based or light-weight] TCP/IP stack - hell, there's even an arduino port of one! [2]

sorry for getting a bit sarcastic / low-grade humour there but i'm really quite taken aback at what i'm learning.

l.


Rogier Brussee

Apr 5, 2018, 1:12:47 PM4/5/18
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday 5 April 2018 14:50:33 UTC+2, Liviu Ionescu wrote:
You seem to be doing a good job. I like your proposals. 
   

 
regards,

Liviu

Liviu Ionescu

Apr 5, 2018, 1:21:14 PM4/5/18
to Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 20:12:49, Rogier Brussee (rogier....@gmail.com) wrote:

> You seem to be doing a good job. I like your proposals.

Thank you, Rogier, I appreciate it.

If you have any proposals on how to improve them, please let me know.


At some point, when RISC-V QEMU becomes more usable outside
Linux, I'll try to emulate a device based on this proposal, so we'll
have a platform to test the microcontroller software.


Liviu

Xan Phung

Apr 5, 2018, 11:12:38 PM4/5/18
to Rogier Brussee, RISC-V ISA Dev, jha...@gmail.com, i...@livius.net, jcb6...@gmail.com, ke...@prtime.org, michae...@mac.com
> ... and Xan Phung in recognition of his ideas on Xcondensed).

> No this (unfortunately Xan seems to have removed his image)


The image Rogier refers to above was accidentally deleted.  It contained spreadsheet data analysing instruction counts & operand bit size requirements (for GCC, BZIP and Ogg Vorbis assembly code)

I haven't been following this thread closely so I can't comment on how useful my data is for RV16E. 

For those interested in seeing the data, I've recreated a new snapshot of my spreadsheet data for GCC & Ogg Vorbis (compiled into RISC-V assembly, by GCC itself).  Unfortunately, I no longer have the data for BZIP assembly code.

Table 1: my 16 opcode version of RV Compressed
  Notes:
  (a) red opcodes are those deleted from "official" 23 opcode RV Compressed
  (b) 16 major opcodes (with 11 bits for operands/minor opcode) = 50% of 32 bit instruction encoding space


Table 2: GCC assembly code stats


Table 3: Ogg Vorbis assembly code stats




EXPLANATION OF ABOVE DATA (the BZIP data mentioned below are no longer available):

It turns out it's possible in my 16 opcode (50% opcode space) version of RVC to add back in some additional load/store opcodes, and in fact get **better** compression (compared to the 23 opcode/72% opcode space version of RVC).
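The 72% and 50% figures follow from counting major-opcode slots. This worked equation assumes the standard RVC layout, which works as follows:

- Bits [1:0] and [15:13] of a 16-bit parcel together select one of 32 major-opcode slots.
- Quadrant 11 is reserved for 32-bit and longer instructions.
- Each slot leaves 11 bits for operands and minor opcodes.

Here n is the number of major opcodes a compressed encoding uses:

```latex
\[
  \text{share of encoding space} = \frac{n \cdot 2^{11}}{2^{16}} = \frac{n}{32},
  \qquad \frac{23}{32} \approx 71.9\,\%,
  \qquad \frac{16}{32} = 50\,\%
\]
```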

As a recap of my 16 opcode RVC proposal (mark1), the following major opcodes are deleted from the 23 opcode RVC:
- ADDI4SPN
- LWSP
- SWSP
- LUI gets reduced to 8 register format (instead of 32 registers) - becomes same format as other ALU-immediate operations like ANDI/right shifts
- 4 floating point memory ops (in the case of RV64).

In a "mark2" 16 opcode RVC outlined below, I then "add back" three load/store instructions (by reducing LW/SW to only 3 bit offsets).  This produces better compression or highly competitive compression compared to the full 23 opcode RVC, but uses only 16 opcodes... See the following step by step analysis:

Step 1: compilation of benchmarks into RV64 assembly source files
I used the RISC-V GCC cross compiler to obtain assembly language output by compiling the BZIP2, Ogg Vorbis and GCC sources (using the single compilation unit versions was incredibly useful for this - see http://people.csail.mit.edu/smcc/projects/single-file-programs/)

BZIP2 and GCC together comprise approx 60% of the SPEC2006 subset previously used to benchmark RVC.

The compiler options used were "gcc -Os -S <src_code.c>"
I compiled using RV64 bit architecture and lp64d ABI

Step 2: import assembly language files into Excel as a csv format file.
This allowed me to analyse counts of instructions which can fit into a range of possible instruction formats (i.e. destructive/non-destructive, and various immediate operand sizes from 12 bits down to 6, 5 or 3 bits). For example, the column "d5rt6im" provides counts for instructions which can fit into a destructive 32-register format with 6 immediate bits (or 32 registers for REG-REG ops), whereas the "3rt3rs5im" column gives the instruction counts that can fit into an instruction format with an 8-register source (3rs), an 8-register destination (3rt), and 5-bit immediates.
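The column classification described above can be sketched in C. Assumptions (not taken from Xan's spreadsheet):

- a "3rt"/"3rs" register is one of the eight RVC-addressable registers x8-x15;
- an n-bit immediate is a signed two's-complement field, as Xan describes it. The standard C.LW/C.SW offsets are actually unsigned and scaled by 4, so this models his columns rather than RVC itself.

All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does imm fit in an n-bit signed two's-complement field? */
bool fits_signed(int32_t imm, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    int32_t lo = -(INT32_C(1) << (bits - 1));
    int32_t hi = (INT32_C(1) << (bits - 1)) - 1;
    return imm >= lo && imm <= hi;
}

/* Is reg one of the eight registers reachable with a 3-bit field (x8..x15)? */
bool is_3bit_reg(unsigned reg)
{
    return reg >= 8 && reg <= 15;
}

/* "3rt3rs5im"-style test: 8-register destination and source,
   5-bit signed immediate. */
bool fits_3rt3rs5im(unsigned rd, unsigned rs1, int32_t imm)
{
    return is_3bit_reg(rd) && is_3bit_reg(rs1) && fits_signed(imm, 5);
}

/* "d5rt6im"-style test: destructive form (rd == rs1), any of the
   32 registers, 6-bit signed immediate. */
bool fits_d5rt6im(unsigned rd, unsigned rs1, int32_t imm)
{
    return rd == rs1 && rd < 32 && fits_signed(imm, 6);
}
```

Counting, for each instruction in the assembly listing, which of these predicates hold yields one column per candidate format.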

Limitations
- can't analyse branch offset sizes, as can't extract this from assembly language files
- can't analyse LUI immediate sizes, as these aren't resolved until linking stage.  (Data is thus shown for LUI with 20 bit immediates, and varying register encoding bits)

Step 3: instruction by instruction analysis
As seen in the tables below, the "core" (top 11 non branch) instructions account for ~64-68% of all instructions.  The instruction formats used by the 23 opcode RVC for these "core" instructions are highlighted in pink.  What's interesting is that for LW/SW, the 3-bit signed offsets are virtually just as good as the signed 5-bit versions.  By shrinking the offset bits it's possible to encode 4 sets of load/store instructions with 3-bit offsets in place of the LW/SW instructions.

The best set of load/stores to choose would probably vary between RV64 and RV32.
For RV64, LD/SD would retain their dedicated opcodes with 5 bit offsets.  Then the other four sets of load/store instructions (with 3 bit offsets) would comprise the following:
1. LW/SW
2. LBU/SB
3. FLD/FSD
4. FLW/FSW

For RV32, one could replace the LW/SW with LHU/SH, as LW/SW would take the place of the LD/SD dedicated opcodes and get a full 5 bit signed offset encoding.

The byte & half word load/store instruction counts are shown in the line "MEM total".  The instruction counts for the 8 register/8 register/3 bit offset version are highlighted in green and comprise approx. 1.9% of total instructions in GCC's data.  Note also that LUI can be encoded in 8 register form (also highlighted in green), instead of 32 register form, with only a slight loss in instruction count coverage.

There is a gain of around 1/2 x 1.9% additional compression by using 4 load/store pairs with 3-bit offsets (the halving factor comes from replacing a 32-bit instruction with a 16-bit instruction).  Offsetting this is the loss of LWSP/SWSP, which contribute negligible compression in GCC and around 1/2 x 1.2% compression in BZIP2.  So in fact, compression is improved in GCC and nearly identical in BZIP2.  (The ADDI4SPN opcode is also lost, but this opcode contributes virtually negligible compression once its overlapping function with other compressed forms of ADD is taken out.)

4. Other issues:
(i). Floating point code: BZIP2 and GCC are mainly integer code, but floating point code also looks very good in terms of compression performance of 3 bit offset load/stores - will post some data for Ogg Vorbis code (approx 60,000 lines of C source) in the new year.
(ii). Instruction format complexity: one new format is added for the new load/store instructions. But on the other hand, I delete one instruction format by deleting the ADDI4SPN instruction, so overall there is no increase in the number of formats.
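The halving factor in the compression estimate can be made explicit. Assume compression is measured as code size saved, and that each affected instruction moves between a 32-bit and a 16-bit encoding. Then a class of instructions making up a fraction $f$ of all instructions contributes roughly $f/2$. Using only the figures quoted for GCC and BZIP2:

```latex
\[
  \Delta_{\text{GCC, new 3-bit ld/st}} \approx \tfrac{1}{2} \times 1.9\,\% \approx 0.95\,\%,
  \qquad
  \Delta_{\text{BZIP2, lost LWSP/SWSP}} \approx -\tfrac{1}{2} \times 1.2\,\% = -0.6\,\%
\]
```

The BZIP2 gain from the new load/stores is not quoted, so the net BZIP2 figure cannot be reproduced from the text alone.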


Alex Bradbury

Apr 6, 2018, 5:28:12 AM4/6/18
to Richard Herveille, Liviu Ionescu, Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev
On 5 April 2018 at 15:06, Richard Herveille
<richard....@roalogic.com> wrote:
>
> There’s nothing stopping us from writing a microcontroller spec which
> does not comply with the privilege spec, but still complies with the user spec and
> the other extensions.

Although the separation between privileged and unprivileged is
intended to make them cleanly separable, it's not clear to me that it
will be possible to jettison the privilege spec, implement a
non-standard alternative, and still have a core that can use the
'RISC-V' name. I do strongly hope this is possible, as it only seems
consistent with the flexibility available for the standard extensions.
Either way, it would be good to have clarity.

There was some discussion on this issue here
https://github.com/riscv/riscv-isa-manual/commit/a439dada57fe6c1ed426351742a5ba7dd2cace37#commitcomment-27447508

Best,

Alex

Richard Herveille

Apr 6, 2018, 7:36:08 AM4/6/18
to Alex Bradbury, Liviu Ionescu, Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev, Richard Herveille

 

On 06/04/2018, 11:28, "Alex Bradbury" <a...@asbradbury.org> wrote:

> > There’s nothing stopping us from writing a microcontroller spec which
> > does not comply with the privilege spec, but still complies with the user spec and
> > the other extensions.
>
> Although the separation between privileged and unprivileged is
> intended to make them cleanly separable, it's not clear to me that it
> will be possible to jettison the privilege spec, implement a
> non-standard alternative, and still have a core that can use the
> 'RISC-V' name.

Note that I specifically omitted that claim. Given previous replies/emails I am afraid it won’t be recognized as RISC-V anymore.

We could call it R5M, but that leads to fragmentation, which should be avoided if possible.

> I do strongly hope this is possible, as it only seems
> consistent with the flexibility available for the standard extensions.
> Either way, it would be good to have clarity.

Agreed. I hope the foundation shows some flexibility and leeway here.

 

Richard

 

 

Liviu Ionescu

Apr 6, 2018, 8:28:40 AM4/6/18
to Richard Herveille, Alex Bradbury, RISC-V ISA Dev, Rogier Brussee, Luke Kenneth Casson Leighton
On 6 April 2018 at 14:36:07, Richard Herveille
(richard....@roalogic.com) wrote:

> I hope the foundation shows some flexibility and leeway here.

hope is good, action is better.

https://github.com/riscv/riscv-isa-manual/blob/a9d7704765360679c1a5e3fa06e0b0e41d6c5f26/src/intro.tex#L57-L63

as long as these lines are not changed, the privileged specs
remain part of the mandatory requirements.


and the first manual should cover only the instruction set (The RISC-V
Architecture: Instruction Set), without any mention of 'unprivileged',
'user', or anything reminiscent of the 'privileged' specs.


regards,

Liviu

Luke Kenneth Casson Leighton

Apr 6, 2018, 11:35:12 AM4/6/18
to Richard Herveille, Alex Bradbury, Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev, Rick O'Connor
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


On Fri, Apr 6, 2018 at 12:36 PM, Richard Herveille
<richard....@roalogic.com> wrote:
>
>
> On 06/04/2018, 11:28, "Alex Bradbury" <a...@asbradbury.org> wrote:

> Note that I specifically omitted that claim. Given previous replies/emails I
> am afraid it won’t be recognized as RISC-V anymore.
>
> We could call it R5M, but that leads to fragmentation, which should be
> avoided if possible.

VR16. and V-CSIR.

>
>
> I do strongly hope this is possible, as it only seems
>
> consistent with the flexibility available for the standard extensions.
>
> Either way, it would be good to have clarity.
>
>
>
> Agreed. I hope the foundation shows some flexibility and leeway here.

There is a different tactic available, and i am deliberately cc'ing
the Director of the RISC-V Foundation for that purpose.

Rick: there appears to be a general consensus, and a sense of
disenfranchisement, regarding the unintentional exclusion of libre and
open hardware contributors (shutting down of WGs without warning being
one of many examples), who for various reasons (financial and ethical)
cannot or will not join the RISC-V Foundation, and yet would have an
extraordinary amount to contribute to the development of RISC-V *if*
they were actually empowered and enabled to do so.

That they have not been able to do so is resulting in them wishing to
take matters into their own hands and to go ahead with their own
initiative, effectively forking the RISC-V ISA.

If we do not hear from you with a proposal that allows libre and open
hardware contributors to take a more active role in RISC-V's
development and steering, in a way that is both financially and
ethically respectful of our independent sovereign status as separate
from Corporate interests that primarily make up the members of the
RISC-V Foundation, we will ASSUME that it is perfectly acceptable to
proceed, without the RISC-V Foundation taking any action, to
effectively fork the development of RISC-V under a different name.

If that is not clear please do not hesitate to *publicly* discuss
this in an open fashion on the RISC-V mailing lists. We look forward
to hearing from you but if we do not, we will ASSUME that our proposed
direction is perfectly acceptable and compatible with the RISC-V
Foundation and that no action can or will be taken which prevents and
prohibits us from exploring the options that the RISC-V Foundation has
closed to us without wider consultation and consideration.

thanks.

l.

lkcl

Apr 6, 2018, 11:59:00 AM4/6/18
to Richard Herveille, Alex Bradbury, Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev, Rick O'Connor
On Fri, Apr 6, 2018 at 4:34 PM, Luke Kenneth Casson Leighton
<lk...@lkcl.net> wrote:


> If that is not clear please do not hesitate to *publicly* discuss
> this in an open fashion on the RISC-V mailing lists. We look forward
> to hearing from you but if we do not, we will ASSUME that our proposed
> direction is perfectly acceptable and compatible with the RISC-V
> Foundation and that no action can or will be taken which prevents and
> prohibits us from exploring the options that the RISC-V Foundation has
> closed to us without wider consultation and consideration.

oh, apologies, i forgot to say: we ASSUME that, unless we hear
otherwise, it will be perfectly acceptable to discuss, AND MANUFACTURE
AND SELL commercial implementations of, any fork or enhancement /
extension to RISC-V under a different name (yet to be decided),
without license, restriction or impediment, as long as the
RISC-V Trademark is not utilised in any such commercially-sold
libre-licensed implementations.

l.

Samuel Falvo II

Apr 6, 2018, 12:43:21 PM4/6/18
to Richard Herveille, Alex Bradbury, Liviu Ionescu, Luke Kenneth Casson Leighton, Rogier Brussee, RISC-V ISA Dev
On Fri, Apr 6, 2018 at 4:36 AM, Richard Herveille
<richard....@roalogic.com> wrote:
> Although the separation between privileged and unprivileged is
>
> intended to make them cleanly separable, it's not clear to me that it
>
> will be possible to jettison the privilege spec, implement a
>
> non-standard alternative, and still have a core that can use the
>
> 'RISC-V' name.
>
>
>
> Note that I specifically omitted that claim. Given previous replies/emails I
> am afraid it won’t be recognized as RISC-V anymore.

I disagree, at least conditionally.

There are a plurality of ARM variants (some very small, some large
enough to run server workloads), not all of which are binary
compatible with each other. Yet they are all still recognized as
being ARM variants, and I seem quite well insulated from any
vociferous concerns over whether or not a variant is Linux-compatible,
etc. In fact, my ONLY exposure to concerns about how fragmented Linux
support for ARM devices is has come from this very mailing list.

I think the industry is smarter than you give it credit for.

Once upon a time, if memory serves me right, support for the
privileged specification was denoted as RV64S or RV32S. Then, it was
changed so that U and S denoted support for user-mode and
supervisor-mode. I think, then, all one needs to do is just allocate
an additional letter to denote the final privilege specification:
machine-mode. For example, I'm always very careful to state that my
KCP53000 processor supports the RV64I instruction set <<and version
1.9 of the M-mode only privileged subset>>, but I have no convenient
way of denoting the bracketed part of that phrase. I'd love to be
able to label my ISA support level as RV64I_1.9 (where _ is some
officially sanctioned letter indicating machine-mode per the draft
specs).

Looking at the current, online version of the draft privilege
specifications, the misa register documentation reserves no bit nor
provides any indication of a letter to indicate machine mode is
supported. I'm thinking the framers thought it superfluous, as if
you're reading misa, you must obviously support the rest of M-mode
too. But this need not be true: CSR instructions are not M-mode
specific; they're actually defined in the user ISA now, although,
except for fcsr, no specific CSRs have been defined.

Perhaps it's time to isolate misa from M-mode, and allocate a bit for
the draft proposed machine-mode, as has been done for S and U?

Given that, this issue becomes moot, an exercise in labeling the
correct compliance level (e.g., RV64IMXmyCustomMachineMode vs
RV32IM_SU).

--
Samuel A. Falvo II

lkcl

Apr 6, 2018, 1:18:49 PM
to Samuel Falvo II, Richard Herveille, Alex Bradbury, Liviu Ionescu, Rogier Brussee, RISC-V ISA Dev
On Fri, Apr 6, 2018 at 5:43 PM, Samuel Falvo II <sam....@gmail.com> wrote:
> On Fri, Apr 6, 2018 at 4:36 AM, Richard Herveille

> There are a plurality of ARM variants (some very small, some large
> enough to run server workloads), not all of which are binary
> compatible with each other. Yet they are all still recognized as
> being ARM variants, and I seem quite well insulated from any
> vociferous concerns over whether or not a variant is Linux-compatible,
> etc. In fact, my ONLY exposure to concerns about how fragmented Linux
> support for ARM devices is has come from this very mailing list.

Even before I had heard of RISC-V I was keenly aware of how
ridiculous the ARM situation is, and have been writing - publicly -
for the past TWELVE YEARS - about it, many, many times... not that
anybody cared what the hell I said because, hey, My Name's Not Linus
Or {insert N.E.Other well-known prominent free software guru}. Ten
years ago I heard that there were *OVER SEVEN HUNDRED* separate
licensees of some ARM design. Since then, with Cortex-M and many more
designs, that number will have gone through the roof.

Efforts by ARM to standardise on AXI-Bus internal identifiers (similar
to USB IDs), that would have made hard macro interfaces much much
easier to identify in a dynamic fashion, went COMPLETELY IGNORED by
implementors licensing ARM's processors, who decided in their infinite
wisdom to set that AXI-Bus ID field to *ZERO*.

Now... do you think that ARM would tolerate having people discuss the
*disadvantages* of their eco-system on forums under their own control
(hint: ARM *actively* censors discussions on their forums, in case you
were ever wondering), such that you would find it easy to *find* such
disadvantages in an easily documented and clear fashion?

So the fact that you've only just heard - on here - that ARM's
eco-system is a huge gelatinous mess is probably because it's
attracted people who are willing and feel comfortable discussing such
limitations... and then wish to learn from them.

Whether that actually happens (the "Learning") remains to be seen.
I'm seeing evidence which tends to suggest that despite having a clear
goal and clear foundational guiding principles, which have resulted in
an absolutely amazing and astoundingly well assembled ISA so far,
there is clear cognitive dissonance in how the RISC-V Foundation is
run: a feedback mechanism that is either non-existent, extremely
restrictive or completely broken, for *both* how the RISC-V Foundation
itself is run *and* how it makes technical decisions. That is going to
have consequences, and will affect whether the RISC-V Foundation can
achieve its clearly-stated and extremely laudable mission if those
issues are not addressed.

l.

Liviu Ionescu

Apr 6, 2018, 1:27:24 PM
to Richard Herveille, Samuel Falvo II, RISC-V ISA Dev, Rogier Brussee, Alex Bradbury, Luke Kenneth Casson Leighton
On 6 April 2018 at 19:43:19, Samuel Falvo II (sam....@gmail.com) wrote:

> Perhaps it's time to isolate misa from M-mode,

I think that the idea of 'modes', as they are known now,
M-mode/S-mode/U-mode, should be decoupled from the instruction set,
and moved from Volume I to Volume II.

The microcontroller profile proposal gains nothing from keeping any
compatibility with the current M-mode; the few CSRs kept were
reorganised, mainly to make context switching easier.

Maybe I'm wrong, but if we take a look at the current Linux context
switch routine, there are 5 CSRs saved:

https://github.com/torvalds/linux/blob/38c23685b273cfb4ccf31a199feccce3bdcb5d83/arch/riscv/kernel/entry.S#L90-L100

The microcontroller profile uses only one word for the hart status.

---

The point is that the modes, as defined now, are probably fine for the
privileged profile, but a different profile (like the microcontroller
profile) can be designed to make better use of them.


Regards,

Liviu

Samuel Falvo II

Apr 6, 2018, 1:36:39 PM