Instruction formats >32 bit, 2nd attempt

Clifford Wolf

May 14, 2019, 6:06:45 AM
to RISC-V ISA Dev
Hi,

a while back I posted a proposal for instruction formats >32 bit. Unfortunately that discussion got sidetracked, as I feel, by the instructions I proposed alongside the formats.

So here is a 2nd attempt. This is an updated proposal for the instruction formats. All concrete instructions are just examples for how the formats could be used:

Most of the differences between these formats and the previous ones are based on feedback I got from here for the last proposal. So please keep the feedback coming.

regards,
 - Clifford

lk...@lkcl.net

May 14, 2019, 7:10:32 AM
to RISC-V ISA Dev


On Tuesday, May 14, 2019 at 11:06:45 AM UTC+1, clifford wrote:
Hi,

a while back I posted a proposal for instruction formats >32 bit. Unfortunately that discussion got sidetracked, as I feel, by the instructions I proposed alongside the formats.

So here is a 2nd attempt. This is an updated proposal for the instruction formats. All concrete instructions are just examples for how the formats could be used:

neat.  makes it clear how it matches up with the 32-bit format.  comments:

* there's a spelling correction "simply an additional" in place of "simply and additional"
* funct7 is specifically 7-bits, i added a dividing line on the prefix format to make that clear, left in the ellipsis to make it clear that the remaining 2 bits can be something else.
* between bits 15 and 16 in the header i added a divider (matching those between 31 and 32)
* the 32-bit format i space-indented so that it matches vertically precisely with the corresponding fields in the 48+ version
* along the same theme, and following on from how RV formats tend to make sure that identical fields in other formats (R-Type, etc.) do not have to be decoded differently, complicating the hardware, i lined up the "prefix" format and "packed" format, and split the immediate into bits 0..15, from bit 7 through 22, and the remainder of the immediate *after* funct7.

also, as can be seen from how the 32-bit format has to be shifted along by 2 bits compared to the 48-bit format (due to the 2 bits normally used for the 32-bit format not being needed in the 48-bit format), a case could be made for barrel-shifting the *entirety* of the 4 formats, from bits 16 to 47, by 2 bits.

this would place the 48 bit format in *exactly* the same position(s) as the 32-bit format, just offset by 16 bits instead of 14, which kinda just feels better, (even if it means that bits 16 and 17 are extensions of funct7 or something else)

a case could probably be made for leaving them offset by 14 bits.

overall, looks great.

i cannot help but wonder though if a trick is being missed by not using the same 48b=011111, 64b=0111111 (etc.) format that's in the original spec, and wonder what the prefix, load-immediate JAL and packed formats would look like if the original spec 48b/64b prefixes were kept.

l.

proposal_2.txt

Jacob Lifshay

May 14, 2019, 1:34:07 PM
to Clifford Wolf, RISC-V ISA Dev, Luke Kenneth Casson Leighton
On Tue, May 14, 2019, 03:06 Clifford Wolf <cliffor...@gmail.com> wrote:
Hi,

a while back I posted a proposal for instruction formats >32 bit. Unfortunately that discussion got sidetracked, as I feel, by the instructions I proposed alongside the formats.

So here is a 2nd attempt. This is an updated proposal for the instruction formats. All concrete instructions are just examples for how the formats could be used:
Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.

Most of the differences between these formats and the previous ones are based on feedback I got from here for the last proposal. So please keep the feedback coming.
Will try my best.

Jacob


lk...@lkcl.net

May 14, 2019, 2:21:52 PM
to RISC-V ISA Dev, cliffor...@gmail.com, lk...@lkcl.net


On Tuesday, May 14, 2019 at 6:34:07 PM UTC+1, Jacob Lifshay wrote:
On Tue, May 14, 2019, 03:06 Clifford Wolf <cliffor...@gmail.com> wrote:
Hi,

a while back I posted a proposal for instruction formats >32 bit. Unfortunately that discussion got sidetracked, as I feel, by the instructions I proposed alongside the formats.

So here is a 2nd attempt. This is an updated proposal for the instruction formats. All concrete instructions are just examples for how the formats could be used:
Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats.

if len is in the same place, the spread of immediate across 4 locations is "just some wires".

if len is *not* in the same place, imm requires gates to decode, lots of other things need gates to decode... etc. etc.

which tends to suggest that it may be beneficial to use the exact same format for "packed" as closely as possible to the 32-bit opcode it's intended to mirror... then add the extra bits of immediate onto the end (extending imm from 12? bits to 17 and beyond).



     |              4                    |  3                   2        |          1                    |
     |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
     |-----------------------------------|-------------------------------|-------------------------------|
...  |    funct7   |   rs2   |   rs1   |  f3 |    rd   | opcode (8bit) |f89| len | 00|page | 00|  11111  | prefix
...                               immediate                              |f| len | f2| rd' | op|  11111  | LI
...                               immediate                              |f| len | f2| rd  | op|  11111  | JALR
imm18|    funct7   |   rs2   |   rs1   |imm15|    rd   | imm[5..14]        | len | imm[0.4]| op|  11111  | packed
..NN                                     ..17

For comparison, the standard 32-bit format (indentation added for clarity, to line up with the above):

     |  3                   2                   1                    |
     |1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
     |---------------------------------------------------------------|
     |    funct7   |   rs2   |   rs1   |  f3 |    rd   | opcode(7bit)| 32-bit format
     | imm[5..11]  |   rs2   |   rs1   |  f3 | imm[0.4]| opcode(7bit)| 32-bit S-format
     | imm[0..11]            |   rs1   |  f3 |    rd   | opcode(7bit)| 32-bit I-format
     | imm[12..31]                           |    rd   | opcode(7bit)| 32-bit U-format



hmmmm.... just looking at that... what's the reason for having opcode be 8 bit and funct7 be 9 bit?  why not make opcode be 9 bit and funct7 8 bit, then only 1 bit is needed to extend funct7, and it happens to match precisely with the field named "f" in bit 15 of the other formats.

again.... less gates....

l.
 
proposal_2.txt

Clifford Wolf

May 14, 2019, 2:27:40 PM
to Jacob Lifshay, RISC-V ISA Dev
Hi,

On Tue, May 14, 2019 at 7:34 PM Jacob Lifshay <program...@gmail.com> wrote:
Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.

I don't understand what you mean. It is always in the same position. len is always instr[4:2]. 

regards,
 - Clifford

Clifford Wolf

May 14, 2019, 2:32:50 PM
to Jacob Lifshay, RISC-V ISA Dev
Hi,

On Tue, May 14, 2019 at 8:27 PM Clifford Wolf <cliffor...@gmail.com> wrote: 
One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.

I don't understand what you mean. It is always in the same position. len is always instr[4:2]. 

I meant inst[14:12] of course. :)

Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

btw, it's 3 more bits. when inst[14:12]=111 then instr[6:5] also becomes available to the custom extension.
In my previous proposal those two bits were used as part of the length encoding.
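
The field positions mentioned in this message can be sketched in a few lines of Python. This is only an illustration: the attached proposal text is not reproduced in the thread, so the sketch assumes nothing beyond what is stated here (len in inst[14:12], and inst[6:5] freed up for custom use when len=111) plus the 11111 pattern in inst[4:0] that the diagrams elsewhere in the thread show. The function name and dictionary keys are made up for the example.

```python
def decode_first_parcel(inst: int) -> dict:
    """Split the first 16-bit parcel of a >32-bit instruction into fields.

    Assumed layout (taken from statements in this thread, not from the
    proposal text itself): inst[4:0] = 11111, inst[6:5] = op,
    inst[14:12] = len.
    """
    if inst & 0x1F != 0b11111:
        raise ValueError("inst[4:0] is not 11111, so this is not a >32-bit format")
    len_field = (inst >> 12) & 0b111  # len = inst[14:12]
    op = (inst >> 5) & 0b11           # inst[6:5]
    return {
        "len": len_field,
        "op": op,
        # With len=111 the whole encoding, including inst[6:5], is custom space.
        "custom": len_field == 0b111,
    }
```

For instance, `decode_first_parcel(0x705F)` reports `len=7`, `op=2` and flags the parcel as custom, while a parcel whose low five bits are not all ones is rejected.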

regards,
 - Clifford

Jacob Lifshay

May 14, 2019, 4:41:32 PM
to Clifford Wolf, RISC-V ISA Dev, Luke Kenneth Casson Leighton
On Tue, May 14, 2019, 11:32 Clifford Wolf <cliffor...@gmail.com> wrote:
Hi,

On Tue, May 14, 2019 at 8:27 PM Clifford Wolf <cliffor...@gmail.com> wrote: 
One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.

I don't understand what you mean. It is always in the same position. len is always instr[4:2]. 

I meant inst[14:12] of course. :)
Sorry, I think I read the version modified by Luke and thought that that was in the original version.

Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

btw, it's 3 more bits. when inst[14:12]=111 then instr[6:5] also becomes available to the custom extension.
In my previous proposal those two bits were used as part of the length encoding.
the more the better :)

Jacob

Jonathan

May 14, 2019, 10:21:07 PM
to RISC-V ISA Dev
Am I missing something or does the prefix format only use 15 bits (rather than 16 bits)? Thinking about it, it might actually even make sense to disallow mixing compressed instructions and prefixes, which would unlock 2 more bits.

It also feels really unfortunate that the LI instruction type would be limited to so few destination registers. This seems to be a consequence of making the length field 3 bits for {48, 64, 80}-bit instructions. However, that could be mitigated by using multiple len values for 48-bit and 80-bit instructions (where floating point immediate values would be in play, and thus more space is needed):

len | op | meaning
------------------
0xx | 00 | prefix
0xx | 01 | packed format
0xx | 10 | Load integer immediate
0xx | 11 | JALR immediate
100 | ?? | Load 32-bit FP immediate (48-bit instruction)
101 | xx | reserved for custom
110 | ?? | Load 64-bit FP immediate (80-bit instruction)
111 | xx | reserved for longer instructions
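
As a rough illustration, the table above can be written as a small lookup. This is a sketch of the suggested allocation only, not of the posted proposal; the function name `classify` is hypothetical, and for the rows where the table shows `??` or `xx` for op the sketch simply ignores op.

```python
def classify(len_field: int, op: int) -> str:
    """Map a (len, op) pair to the meaning listed in the table above."""
    if not (0 <= len_field <= 0b111 and 0 <= op <= 0b11):
        raise ValueError("len is a 3-bit field and op is a 2-bit field")
    if len_field < 0b100:
        # Rows 0xx: op alone selects the format.
        return ("prefix", "packed format",
                "load integer immediate", "JALR immediate")[op]
    # Rows 1xx: the table leaves op open ('??' or 'xx'), so it is ignored here.
    return {
        0b100: "load 32-bit FP immediate (48-bit instruction)",
        0b101: "reserved for custom",
        0b110: "load 64-bit FP immediate (80-bit instruction)",
        0b111: "reserved for longer instructions",
    }[len_field]
```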




To me, this sort of proposal feels really hard to evaluate without sketching out how everything is going to fit into the encoding space. How much leftover space is there in the load-imm and JAL format opcode spaces? Any chance of allowing all 32 destination registers?

lk...@lkcl.net

May 15, 2019, 1:01:50 AM
to RISC-V ISA Dev, cliffor...@gmail.com, lk...@lkcl.net


On Tuesday, May 14, 2019 at 9:41:32 PM UTC+1, Jacob Lifshay wrote:
On Tue, May 14, 2019, 11:32 Clifford Wolf <cliffor...@gmail.com> wrote:
Hi,

On Tue, May 14, 2019 at 8:27 PM Clifford Wolf <cliffor...@gmail.com> wrote: 
One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.

I don't understand what you mean. It is always in the same position. len is always instr[4:2]. 

I meant inst[14:12] of course. :)
Sorry, I think I read the version modified by Luke and thought that that was in the original version.

yep that's corrected above (2nd revision, posted yesterday), and funct7, rs2, rs1 and rd are all in exactly the same bit-location, by spreading the immediate out across multiple locations.

that will mean that, compared to the original, it will not be necessary to have special logic for decoding funct7, rs2, rs1 and rd, for prefix and packed format.

l.

Rogier Brussee

May 15, 2019, 4:31:50 AM
to RISC-V ISA Dev
You seem to have dropped the "everything can have an immediate" bit but I guess you just do fusion on things like

LLI rd imm64
C.add rd rs2

I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so: 

    |              4                    |  3                   2                   1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------|---------------------------------------------------------------|
   ...                               immediate                          |0 0 0|len  | rd  | 00|  11111  | jump-and-link format 
   ...                               immediate                          |func3|len  | rd' | 00|  11111  | load-immediate format
   ...        immediate                 |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len  | rd' | 01|  11111  | prefix16 format
   ...       funct7   |   rs2   |   rs1   |  f3 |    rd   |   opcode5|op|func3|len  | page| 10|  11111  | prefix32 format
   ...*********************************TBD******************************|func3|len  | page| 11|  11111  | reserved (prefix48 format ?????)
   

(unfortunately this looks horrible in proportional font)  The idea is to reuse the RVC decoder for RVC-R-type instructions (like C.SUB) for the  prefixing bit 0..15  which depending on bit[5:6] is then  followed by 
00:  an immediate directly,
01: an instruction in restricted RVC--S-type (like C.sd) RVC format 
10: a regular 32bit R-type RV instruction (like ADD).
11: reserved or a hypothetical 48 bit instruction format

 In this scheme it may be easier to have Len encode the length of the immediate: length(imm) = 16 * ((len != 0b111) ? 2^len : 0) bits, which gives a maximum immediate length of 64*16 bits = 128 bytes.
Also in the load immediate format I have assumed func3 = 001 .. 111 but if you are allowing rd (registers x0..x7) in the jump_and_link format anyway (which is a good idea), one _could_ also decide to use rd (registers x0..x7 rather than x8..15)  for loading immediates, using 1 bit of func3 but little extra cost to the encoder, effectively increasing the register range to x0...x15. 
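
The immediate-length rule from the previous paragraph is small enough to check by hand. A minimal sketch, assuming the formula is meant to be read as 16 * 2^len bits for len = 0..6 and 0 bits for len = 0b111 (the function name is hypothetical):

```python
def immediate_bits(len_field: int) -> int:
    """Immediate length in bits for a 3-bit len field, per the rule above."""
    if not 0 <= len_field <= 0b111:
        raise ValueError("len is a 3-bit field")
    if len_field == 0b111:
        return 0  # len = 111 carries no immediate under this rule
    return 16 * (1 << len_field)
```

The largest value is reached at len = 0b110, where the result is 1024 bits, i.e. the 128 bytes quoted above.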


 In fact you probably don't have to restrict restricted RVC--S-type  or R-type instructions and still be able to reuse decoders.  

    |              4                    |  3                   2                   1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------|---------------------------------------------------------------|
   ...                               immediate                          |0 0 0|len  | rd  | 00|  11111  | jump-and-link format 
   ...                               immediate                          |func3|len  | rd' | 00|  11111  | load-immediate format
   ...        immediate                 |func3 |*****RVC-FMT*********|op|func3|len  | rd' | 01|  11111  | prefix16 format
   ...******************************RV_FMT****************|   opcode5|op|func3|len  |page | 10|  11111  | prefix32 format



On Tuesday, May 14, 2019 at 12:06:45 UTC+2, clifford wrote:

Clifford Wolf

May 15, 2019, 4:38:35 AM
to Jonathan, RISC-V ISA Dev
Hi,

On Wed, May 15, 2019 at 4:21 AM Jonathan <fint...@gmail.com> wrote:
Am I missing something or does the prefix format only use 15 bits (rather than 16 bits)?

In the prefix format the MSB of the  first 16-bit word is the LSB of the 8-bit opcode field.
 
Thinking about it, it might actually even make sense to disallow mixing compressed instructions and prefixes, which would unlock 2 more bits.

Disallowing compressed instructions sounds like a bad idea. 
 
It also feels really unfortunate that the LI instruction type would be limited to so few destination registers.

load-immediates are infrequent enough that this should never be an issue.
 
This seems to be a consequence of making the length field 3 bits for {48, 64, 80}-bit instructions. However, that could be mitigated by using multiple len values for 48-bit and 80-bit instructions (where floating point immediate values would be in play, and thus more space is needed):

I don't see what this would gain you, considering load-immediates are sufficiently infrequent that it shouldn't be a problem to allocate the registers in a way that is compatible with this restriction.
 
len | op | meaning
------------------
0xx | 00 | prefix
0xx | 01 | packed format
0xx | 10 | Load integer immediate
0xx | 11 | JALR immediate
100 | ?? | Load 32-bit FP immediate (48-bit instruction)
101 | xx | reserved for custom
110 | ?? | Load 64-bit FP immediate (80-bit instruction)
111 | xx | reserved for longer instructions
 
So you are blowing through the entire reserved op space for len=0xx and then some more just because you want long load-immediates to work with rd instead of rd'?

I seriously doubt this would be worth it.

Also, reserving the entire len=111 for longer instructions is very wasteful. I've now reserved len=111, op=11 and added an appendix that outlines how larger instructions (up to 592 bits length = 74 bytes) could be encoded.
 
To me, this sort of proposal feels really hard to evaluate without sketching out how everything is going to fit into the encoding space. How much leftover space is there in the load-imm and JAL format opcode spaces?

It depends on how you want to use them, but the scheme proposed in Appendix II would fit all immediate loads into op=00, with 1 or 2 encodings spared, depending on whether there's a load-float-immediate instruction for that length encoding.
 
Any chance of allowing all 32 destination registers?

Could easily be done in a prefix-format instruction that's 16-bit longer.

regards,
 - Clifford

Clifford Wolf

May 15, 2019, 5:03:44 AM
to Rogier Brussee, RISC-V ISA Dev
Hi,

On Wed, May 15, 2019 at 10:31 AM Rogier Brussee <rogier....@gmail.com> wrote:
I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so: 

    |              4                    |  3                   2                   1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------|---------------------------------------------------------------|
   ...                               immediate                          |0 0 0|len  | rd  | 00|  11111  | jump-and-link format 
   ...                               immediate                          |func3|len  | rd' | 00|  11111  | load-immediate format
   ...        immediate                 |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len  | rd' | 01|  11111  | prefix16 format
   ...       funct7   |   rs2   |   rs1   |  f3 |    rd   |   opcode5|op|func3|len  | page| 10|  11111  | prefix32 format
   ...*********************************TBD******************************|func3|len  | page| 11|  11111  | reserved (prefix48 format ?????)
   
(unfortunately this looks horrible in proportional font)  The idea is to reuse the RVC decoder for RVC-R-type instructions (like C.SUB) for the  prefixing bit 0..15  which depending on bit[5:6] is then  followed by 
00:  an immediate directly,
01: an instruction in restricted RVC--S-type (like C.sd) RVC format 
10: a regular 32bit R-type RV instruction (like ADD).
11: reserved or a hypothetical 48 bit instruction format

Interesting idea, however, I am not sure if there is an application for it. I'd assume that in most cases one would want a prefix32 instruction as well, and then the prefix16 would be the "compressed version"? So you'd have for example a 64-bit prefix32 instruction and then a compressed 48-bit prefix16 instruction?

I'd assume that most instructions >32-bit will be infrequent enough so that the additional decoder cost would not be worth the decrease in code size.
 
In this scheme it may be easier to have Len encode the length of the immediate: length(imm) <-- 16* (len != 111)?2^len: 0 bits, which gives a maximum immediate length of 64*16 bit = 128  byte

Not all instructions have an immediate that's a power of two.

Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding of the length, never the other way around.
 
Also in the load immediate format I have assumed func3 = 001 .. 111 but if you are allowing rd (registers x0..x7) in the jump_and_link format anyway (which is a good idea), one _could_ also decide to use rd (registers x0..x7 rather than x8..15)  for loading immediates, using 1 bit of func3 but little extra cost to the encoder, effectively increasing the register range to x0...x15. 

I'd argue one would never want to use load-immediate with x0..x4.

However, I think it might be interesting to allow load-immediate to address t1/t2 (x6/x7) instead of s0/s1 (x8/x9).

But I have not proposed this because I didn't want to make it more complex than necessary.

regards,
 - Clifford

Rogier Brussee

May 15, 2019, 7:27:15 AM
to RISC-V ISA Dev, rogier....@gmail.com


On Wednesday, May 15, 2019 at 11:03:44 UTC+2, clifford wrote:
Hi,

On Wed, May 15, 2019 at 10:31 AM Rogier Brussee <rogier...@gmail.com> wrote:
I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so: 

    |              4                    |  3                   2                   1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------|---------------------------------------------------------------|
   ...                               immediate                          |0 0 0|len  | rd  | 00|  11111  | jump-and-link format 
   ...                               immediate                          |func3|len  | rd' | 00|  11111  | load-immediate format
   ...        immediate                 |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len  | rd' | 01|  11111  | prefix16 format
   ...       funct7   |   rs2   |   rs1   |  f3 |    rd   |   opcode5|op|func3|len  | page| 10|  11111  | prefix32 format
   ...*********************************TBD******************************|func3|len  | page| 11|  11111  | reserved (prefix48 format ?????)
   
(unfortunately this looks horrible in proportional font)  The idea is to reuse the RVC decoder for RVC-R-type instructions (like C.SUB) for the  prefixing bit 0..15  which depending on bit[5:6] is then  followed by 
00:  an immediate directly,
01: an instruction in restricted RVC--S-type (like C.sd) RVC format 
10: a regular 32bit R-type RV instruction (like ADD).
11: reserved or a hypothetical 48 bit instruction format

Interesting idea, however, I am not sure if there is an application for it.

The prefix16 is an alternative approach to your "packed format" with some register bits traded for some additional immediate (or function selector) bits, and the prefix32 format is what you call prefix format.

I'd assume that in most cases one would want a prefix32 instruction as well, and then the prefix16 would be the "compressed version"? So you'd have for example a 64-bit prefix32 instruction and then a compressed 48-bit prefix16 instruction?  

I'd assume that most instructions >32-bit will be infrequent enough so that the additional decoder cost would not be worth the decrease in code size.
 

It would certainly be reasonable to demand that there is a prefix32 version of every prefix16 instruction. However, that has some problems: if you use the immediate bits contained in bit[16..31], you would also need an (even) longer immediate field. Moreover, as you rightly point out, the longer instructions are going to be rare, and the point of these rare long instructions is that you can use them if you need them. The trade-off would be a shorter instruction versus being usable with all registers. E.g. an attractive option seems to be prefix16 with len=2, using the 5 bits of immediate in bits 16..31 as an additional function selector for 256 3-register + 32-bit-immediate instructions in a single 64-bit instruction. There is even more room for 2-register + 32-bit-immediate instructions in a 64-bit instruction using other instruction formats in RVC, including the possibility of using a standard 5-bit register for rs1.
If you insist on a prefix32 version, that would be an at least 96 bit instruction, and that seems noticeably more of a burden than a 64 bit instruction for a largely theoretical benefit.

So, although it is a bit unconventional, one could argue prefix16 is just a particular (regular 3 register!) instruction encoding that happens to put mild restrictions on its use that may or may not be reflected internally. Therefore, I am not sure a full version is actually needed. In any case the point is to reuse decoders as much as possible certainly not to make decoding more difficult.


In this scheme it may be easier to have Len encode the length of the immediate: bitlength(imm)  = 16* (len != 0b111)?(2^ len): 0 bits, which gives a maximum immediate length of 64*16 bit = 128  byte

Not all instructions have an immediate that's a power of two.


The hypothetical bfxp instruction with 3 registers and  21 bit immediate would actually fit as a 48 bit instruction of type prefix16 with len= 0b000 (length(immediate) = 16) which gives 3 (popular) registers + 16 bits immediate in the immediate field + 5 bits of immediate in bits 16...31.


Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding of the length, never the other way around.

bit 5..6 and the len field bit 10.. 12 determine the length of the instruction uniquely

length(instruction) = 16*(bit[5:6] + ((1<<len)  & 127 ))
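
A literal transcription of this formula, as a sketch only: bit[5:6] is taken as a 2-bit number and len as bits 10..12 of the first 16-bit parcel, as in the diagram earlier in this message, and the function name is hypothetical. Note that for prefix16 (bit[5:6] = 01) with len = 0b000 it yields 32, which is 16 less than the 48 bits quoted for the bfxp example above; this suggests the formula counts only what follows the first 16-bit parcel, but that interpretation is an assumption of this sketch.

```python
def length_by_formula(first_parcel: int) -> int:
    """Evaluate 16 * (bit[5:6] + ((1 << len) & 127)) exactly as written above."""
    op = (first_parcel >> 5) & 0b11          # bit[5:6] read as a 2-bit number
    len_field = (first_parcel >> 10) & 0b111  # len field, bits 10..12
    # (1 << 7) & 127 == 0, so len = 0b111 contributes no immediate parcels.
    return 16 * (op + ((1 << len_field) & 127))
```

Because both inputs sit in the first parcel, the result is available without looking at any later bits, which is the point being made about length decoding.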
 
 
Also in the load immediate format I have assumed func3 = 001 .. 111 but if you are allowing rd (registers x0..x7) in the jump_and_link format anyway (which is a good idea), one _could_ also decide to use rd (registers x0..x7 rather than x8..15)  for loading immediates, using 1 bit of func3 but little extra cost to the encoder, effectively increasing the register range to x0...x15. 

I'd argue one would never want to use load-immediate with x0..x4.

However, I think it might be interesting to allow load-immediate to address t1/t2 (x6/x7)

Agreed
 
instead of s0/s1 (x8/x9).

But I have not proposed this because I didn't want to make it more complex than necessary.

 
Agreed,  I probably shouldn't have bothered pointing that out. 

Regards
Rogier

regards,
 - Clifford

lk...@lkcl.net

May 15, 2019, 9:05:09 AM
to RISC-V ISA Dev


On Wednesday, May 15, 2019 at 9:31:50 AM UTC+1, Rogier Brussee wrote:
You seem to have dropped the "everything can have an immediate" bit but I guess you just do fusion on things like

LLI rd imm64
C.add rd rs2

I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so: 

rogier if you happen to be using the web interface to google groups you can hit the "{}" button after selecting the text and it will set "code" mode, like this:
 
    |              4                    |  3                   2                   1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------|---------------------------------------------------------------|
   ...                               immediate                          |0 0 0|len  | rd  | 00|  11111  | JALI
   ...                               immediate                          |func3|len  | rd' | 00|  11111  | LDI
   ...        immediate                 |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len  | rd' | 01|  11111  | PX16
   ...       funct7   |   rs2   |   rs1   |  f3 |    rd   |   opcode5|op|func3|len  | page| 10|  11111  | PX32
   ...*********************************TBD******************************|func3|len  | page| 11|  11111  | PX48

(i changed the names at the end of each line so that that stands a chance of not line-wrapping on web browsers at below 1200 pixels in width)

l.

lk...@lkcl.net

May 15, 2019, 9:13:23 AM
to RISC-V ISA Dev, fint...@gmail.com


On Wednesday, May 15, 2019 at 9:38:35 AM UTC+1, clifford wrote:
 
Could easily be done in a prefix-format instruction that's 16-bit longer.

can i just check, so that i'm not misunderstanding: are the prefix instructions intended to be simply extensions of the 32-bit J-format (where one instruction extends the immediate of the next one by specifying the HI bits), or are they intended to be stand-alone instructions (just with very large immediates), or both?

is there a use-case for extending immediate HI-bits with very long immediates?

l.

Rogier Brussee

May 15, 2019, 9:14:38 AM
to RISC-V ISA Dev


On Wednesday, May 15, 2019 at 15:05:09 UTC+2, lk...@lkcl.net wrote:
Thanks! That looks the way I intended it to look!

Ciao
Rogier
 

Clifford Wolf

May 15, 2019, 11:54:45 AM
to Rogier Brussee, RISC-V ISA Dev
Hi,

On Wed, May 15, 2019 at 1:27 PM Rogier Brussee <rogier....@gmail.com> wrote:
Interesting idea, however, I am not sure if there is an application for it.

The prefix16 is an alternative approach to your "packed format" with some register bits traded for some additional immediate (or function selector) bits, and the prefix32 format is what you call prefix format.

I don't see how any of that answers the question whether there's an application for it.

In this scheme it may be easier to have Len encode the length of the immediate: bitlength(imm)  = 16* (len != 0b111)?(2^ len): 0 bits, which gives a maximum immediate length of 64*16 bit = 128  byte

Not all instructions have an immediate that's a power of two.

The hypothetical bfxp instruction with 3 registers and  21 bit immediate would actually fit as a 48 bit instruction of type prefix16 with len= 0b000 (length(immediate) = 16) which gives 3 (popular) registers + 16 bits immediate in the immediate field + 5 bits of immediate in bits 16...31.

Again, not sure how that addresses the point that not all instructions have an immediate that's a power of two.

Regarding the implication that 3 registers and  21 bit immediate wouldn't fit in a 48-bit encoding with my formats: That's simply not true!

Here it is, using the regular 48-bit packed format:

    |7 6 5 4 3|2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------------------------------------------------------------------|
    |    start    |    length   |     dest    | f2|   rs2   |   rs1   | len |    rd   | op|  11111  | BFXP-48

It's just a question of what percentage of the encoding space you would want to spend on it.

And for the bulk of instructions >32-bit I'd assume that being able to just easily address all registers, and not having to fight for each and every bit of encoding space,
would be more valuable than occasionally saving 16 bits, at the cost of more complex instruction decoders and a far more cluttered encoding space.

Consider the BFXP instruction as described in my Appendix II:

    |      6                   5    |              4                |  3                   2        |          1                    |
    |3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-------------------------------------------------------------------------------------------------------------------------------|
    |    start    |    length   |     dest    | f2|   rs2   |   rs1   |  f3 |    rd   |     opcode    | len | 00|page | 00|  11111  | BFXP

There's enough encoding space here to define 65536 such instructions, only in the standard prefix encoding field in op=00. (If I wanted to fill the entire reserved space op=01 and op=10
space with those instructions I could squeeze 4M of them in there.)

Of course you can squeeze it in 48-bit. But even removing 6 bits (2 bits for each of the 3 registers) will only give you back 64x the encoding space. So in effect, by making it
a 48-bit instruction, even though you reduced the space needed for the registers, you just made the instruction 1000x more expensive in terms of the relative encoding space it
occupies.
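
A minimal sketch of that arithmetic, assuming the field widths are exactly as drawn in the two BFXP diagrams above (each bit column is two characters wide, so the "opcode" field in the 64-bit version comes out as 8 bits). The names `BFXP_48`, `BFXP_64`, `SELECTORS_64`, `zero_a`, `zero_b` and `major` are made up for illustration, and treating f2, f3, opcode and page as the "which instruction is this" bits is an assumption:

```python
# Field widths in bits, MSB first, read off the BFXP diagrams above.
# "major" is the fixed 11111 pattern; zero_a/zero_b are the two 00 fields.
BFXP_48 = [("start", 7), ("length", 7), ("dest", 7), ("f2", 2),
           ("rs2", 5), ("rs1", 5), ("len", 3), ("rd", 5),
           ("op", 2), ("major", 5)]
BFXP_64 = [("start", 7), ("length", 7), ("dest", 7), ("f2", 2),
           ("rs2", 5), ("rs1", 5), ("f3", 3), ("rd", 5),
           ("opcode", 8), ("len", 3), ("zero_a", 2), ("page", 3),
           ("zero_b", 2), ("major", 5)]


def total_bits(layout):
    """Sum of all field widths in a layout."""
    return sum(width for _, width in layout)


def selector_bits(layout, names):
    """Bits available for choosing among instructions (assumed field set)."""
    return sum(width for name, width in layout if name in names)


# Assumption: these fields select the instruction within the prefix space.
SELECTORS_64 = {"f2", "f3", "opcode", "page"}

slots_64 = 2 ** selector_bits(BFXP_64, SELECTORS_64)
regained = 2 ** 6  # 2 bits saved on each of the 3 registers

print(total_bits(BFXP_48), total_bits(BFXP_64))  # 48 64
print(slots_64)                                  # 65536
print(slots_64 // regained)                      # 1024, the "1000x" above
```

Both layouts add up to their nominal lengths, and 16 selector bits give the 65536 figure; 65536 / 64 = 1024 is where the "1000x" comes from.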

A fringe instruction that occupies 1/65536 of the space it's sitting in, and is doing the thing it's supposed to do well: No discussion about encoding space.

A fringe instruction that occupies 1/64 of the space it's sitting in, _and_ has weird limitations: Not really an interesting proposal, I'd say.

Of course, 1/65536 is extreme. But we can only increase instruction sizes in steps of 16 bits. So with some instructions, such as this one, we only have the choice between taking a huge chunk of an encoding space, or a tiny chunk of the next larger encoding space. And I think in this case tiny chunk is the right choice.

But even assuming one really wants the prefix16 format with its limitations, I still think that it would probably be better to just use my packed format, just with rs2', rs1', and rd' instead of rs2, rs1, rd:

    |              4                    |  3                   2        |          1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    ----------------------------------------------------------------------------------------------------|
   ...           immediate              |    funct7   |   rs2   |   rs1   | len |    rd   | op|  11111  | regular packed format
   ...           immediate              |        funct9   | rs2'| f2| rs1'| len |  f2| rd'| op|  11111  | modified packed format

yes, the two f2 fields are placed a bit awkwardly, but I think this would still be a better encoding. For a few reasons, but the main one is that then it can share the same "op" space as regular packed instructions, and use bits in funct7 to distinguish the details of the instruction format.

In your proposal, by using different op encodings for the two formats, you are either assuming that both formats will see an equal encoding space pressure, or you are knowingly throwing away encoding space. Using funct7 is much more flexible. For example, if 1/4th of the encoding space should be used for the modified format, one could say instr[31:30]=11 is the modified format and other values is using the unmodified format. Or, one could reserve instr[31:30]=01 and instr[31:30]=10 and whenever we run out of encoding space in one of the categories _then_ we know for which one is greater demand and can define one of the reserved values accordingly.

Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding of the length, never the other way around.

bit 5..6 and the len field bit 10.. 12 determine the length of the instruction uniquely 

length(instruction) = 16*(bit[5:6] + ((1<<len)  & 127 ))

Whereas in my proposal, ignoring reserved spaces, the length is 16*instr[3:2]+48.

I'm proposing a function of 2 bits whereas you are proposing a function of 5 bits.

Which one is simpler to decode, assuming you already know instr[4:0] = 11111?

If you have a dual-issue pipeline, or do instruction fusing, you need to be able to decode multiple instructions in parallel. And the entire decode for the 2nd instruction depends on the length for the first instruction. So that better be as quick as you can possibly make it.
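
To make the "2 bits versus 5 bits" comparison concrete, here is a rough sketch that transcribes both length formulas literally as quoted above. The function names are hypothetical, and the fields are passed in as already-extracted values because the exact bit positions are not pinned down here; nothing below is a statement about which lengths either proposal actually intends to assign:

```python
def length_linear(len2):
    """16*field + 48 with a 2-bit field (the linear formula as quoted)."""
    assert 0 <= len2 < 4
    return 16 * len2 + 48


def length_log(op, len3):
    """16*(op + ((1 << len) & 127)) with a 2-bit op and a 3-bit len
    (the logarithmic formula as quoted)."""
    assert 0 <= op < 4 and 0 <= len3 < 8
    return 16 * (op + ((1 << len3) & 127))


# Enumerate every input combination each decoder has to distinguish.
linear_table = [length_linear(f) for f in range(4)]
log_table = [[length_log(op, l) for l in range(8)] for op in range(4)]

print(linear_table)                                        # [48, 64, 80, 96]
print(len(linear_table), sum(len(r) for r in log_table))   # 4 32
```

The linear form is a 4-entry function; the logarithmic form has 32 input combinations, and that lookup sits on the critical path to finding the start of the next instruction.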

regards,
 - Clifford

Jonathan

May 15, 2019, 12:14:20 PM5/15/19
to RISC-V ISA Dev
I'm realizing that I don't quite know what the goals are for >32bit instructions. I can think of a couple:

  1. Add larger JAL & LI instructions
  2. Allow the bit manipulation or other extensions to define longer instructions 
  3. Add encoding pages for new "32-bit" instructions (which would actually be 48-bits because of the prefix)
  4. Produce versions of shorter instructions with additional immediate space
I think this covers everything proposed so far, but I may be missing something.

The first two points make sense to me. (3) doesn't feel like an immediate priority, but is probably worthwhile while we're at it. (4) I'm more skeptical of: the current designs would allow longer versions of any instruction even though many don't take immediates at all. Even among the ones that do, there are tons of nonsensical combinations: 48.C.JAL would have 27 immediate bits compared to 48.JAL which would have 32. At the 64-bit instruction length, the extended version of nearly all 32-bit instructions would have 16+12=28 bits of immediate, less than the 32 they'd get from LUI+xxx or 48.LI+C.xxx. There are a couple sweet spots like where the 18bits from C.LUI + xxx is less than the 21-24 you'd have from 48.C.xxx, but if we focused in on those we might be able to come up with even better solutions there too.

Jonathan

Clifford Wolf

May 15, 2019, 12:45:56 PM5/15/19
to Jonathan, RISC-V ISA Dev
Hi,

On Wed, May 15, 2019 at 6:14 PM Jonathan <fint...@gmail.com> wrote:
I'm realizing that I don't quite know what the goals are for >32bit instructions. I can think of a couple:
  1. Add larger JAL & LI instructions
  2. Allow the bit manipulation or other extensions to define longer instructions 
  3. Add encoding pages for new "32-bit" instructions (which would actually be 48-bits because of the prefix)
  4. Produce versions of shorter instructions with additional immediate space
I think this covers everything proposed so far, but I may be missing something.

I might steal 1-3 for a short introduction section for the proposal.

The first two points make sense to me. (3) doesn't feel like an immediate priority, but is probably worthwhile while we're at it.

My main motivation for working on this is that I feel like none of the potential users of the larger instructions is doing anything because they are all waiting for others to first come up with encodings and formats.

I think that's potentially especially true for potential users of those additional 32-bit spaces. They just end up doing a custom extension within one of the custom-[0123] opcodes instead.

(And notice that the 64-bit BFXP instruction is using the prefix format plus an additional 16 bits of immediate. 48-bit prefix instructions are just the special case where there's no extra immediate available in the instruction word.)
 
(4) I'm more skeptical of: the current designs would allow longer versions of any instruction even though many don't take immediates at all.

I've read this sentence multiple times and I just can't parse it. I also don't know what "Produce versions of shorter instructions with additional immediate space" means exactly. Can you rephrase that?
 
Even among the ones that do, there are tons of nonsensical combinations: 48.C.JAL would have 27 immediate bits compared to 48.JAL which would have 32. At the 64-bit instruction length, the extended version of nearly all 32-bit instructions would have 16+12=28 bits of immediate, less than the 32 they'd get from LUI+xxx or 48.LI+C.xxx. There are a couple sweet spots like where the 18bits from C.LUI + xxx is less than the 21-24 you'd have from 48.C.xxx, but if we focused in on those we might be able to come up with even better solutions there too.

I have no idea how any of that relates to my text. Sorry. Are we talking about the same proposal?
 
regards,
 - Clifford

Jonathan Behrens

May 15, 2019, 1:13:25 PM5/15/19
to Clifford Wolf, RISC-V ISA Dev

(4) I'm more skeptical of: the current designs would allow longer versions of any instruction even though many don't take immediates at all.

I've read this sentence multiple times and I just can't parse it. I also don't know what "Produce versions of shorter instructions with additional immediate space" means exactly. Can you rephrase that?
 
Even among the ones that do, there are tons of nonsensical combinations: 48.C.JAL would have 27 immediate bits compared to 48.JAL which would have 32. At the 64-bit instruction length, the extended version of nearly all 32-bit instructions would have 16+12=28 bits of immediate, less than the 32 they'd get from LUI+xxx or 48.LI+C.xxx. There are a couple sweet spots like where the 18bits from C.LUI + xxx is less than the 21-24 you'd have from 48.C.xxx, but if we focused in on those we might be able to come up with even better solutions there too.

I have no idea how any of that relates to my text. Sorry. Are we talking about the same proposal?

This was referring to the "prefix format" instructions in the "everything can have an immediate" world. In other words, instructions which are a concatenation of: prefix + existing 16/32-bit instruction + more immediate bits. Reading more closely, I realize you've dropped that idea since the last version and that now "prefix instruction" doesn't actually mean a prefix tacked on to any existing instruction.

Now that I understand the proposal a bit better, I think I quite like the general contours: One format for LI/JAL, another for instructions with very large immediates, and a third that has a massive amount of opcode space.

Jonathan

Rogier Brussee

May 15, 2019, 5:16:16 PM5/15/19
to RISC-V ISA Dev, rogier....@gmail.com


Op woensdag 15 mei 2019 17:54:45 UTC+2 schreef clifford:
Hi,

On Wed, May 15, 2019 at 1:27 PM Rogier Brussee <rogier...@gmail.com> wrote:
Interesting idea, however, I am not sure if there is an application for it.

The prefix16 is an alternative approach to your "packed format" with some register bits traded for some additional immediate (or function selector) bits, and the prefix32 format is what you call prefix format.

I don't see how any of that answers the question whether there's an application for it.

The same applications as your packed format, i.e. as you already point out, you would have to think twice before using it, but it gives a fair number of 48/64/96 bit instructions with 3 registers and 16, 32 or 64 bits of immediate.


In this scheme it may be easier to have Len encode the length of the immediate: bitlength(imm)  = 16* (len != 0b111)?(2^ len): 0 bits, which gives a maximum immediate length of 64*16 bit = 128  byte

Not all instructions have an immediate that's a power of two.

The hypothetical bfxp instruction with 3 registers and  21 bit immediate would actually fit as a 48 bit instruction of type prefix16 with len= 0b000 (length(immediate) = 16) which gives 3 (popular) registers + 16 bits immediate in the immediate field + 5 bits of immediate in bits 16...31.

Again, not sure how that addresses the point that not all instructions have an immediate that's a power of two.


????
 To illustrate that the number of bits of the immediate of the instruction does not have to coincide with the 16* ((1 <<len) % 128) number of bits following bit 32 I used the bfxp instruction with 21 bits of immediate.
 
On second thought, perhaps you mean you cannot encode e.g. a "load a 48 bit integer" LLI48 instruction of length 64 bit ? 
That is true: with the "logarithmic encoding" of the length you would only have an LLI32 instruction in 48 bits and an LLI64 (load 64-bit integer) in a 96-bit instruction.

On the other hand a BXP instruction with 2*128bit  worth of immediate would work just fine :-)  
 
 
Regarding the implication that 3 registers and  21 bit immediate wouldn't fit in a 48-bit encoding with my formats: That's simply not true!


I don't think I implied that,  
 
Here it is, using the regular 48-bit packed format:

    |7 6 5 4 3|2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------------------------------------------------------------------|
    |    start    |    length   |     dest    | f2|   rs2   |   rs1   | len |    rd   | op|  11111  | BFXP-48

It's just a question of what percentage of the encoding space you would want to spend on it.

And for the bulk of instructions >32-bit I'd assume that being able to just easily address all registers, and not having to fight for each and every bit of encoding space,


Agreed completely.
 
would be more valuable than occasionally saving 16 bits, at the cost of more complex instruction decoders and a far more cluttered encoding space.

Consider the BFXP instruction as described in my Appendix II:

    |      6                   5    |              4                |  3                   2        |          1                    |
    |3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-------------------------------------------------------------------------------------------------------------------------------|
    |    start    |    length   |     dest    | f2|   rs2   |   rs1   |  f3 |    rd   |     opcode    | len | 00|page | 00|  11111  | BFXP

There's enough encoding space here to define 65536 such instructions, only in the standard prefix encoding field in op=00. (If I wanted to fill the entire reserved space op=01 and op=10
space with those instructions I could squeeze 4M of them in there.)

Of course you can squeeze it in 48-bit. But even removing 6 bits (2 bits for each of the 3 registers) will only give you back 64x the encoding space. So in effect, by making it
a 48-bit instruction, even though you reduced the space needed for the registers, you just made the instruction 1000x more expensive in terms of the relative encoding space it
occupies.


no argument there. 
 
A fringe instruction that occupies 1/65536 of the space it's sitting in, and is doing the thing it's supposed to do well: No discussion about encoding space.

A fringe instruction that occupies 1/64 of the space it's sitting in, _and_ has weird limitations: Not really an interesting proposal, I'd say.


It was not even a proposal. I gave an example with 21 bit non power of two number of bits in the immediate. 
Actually the whole thing is not even a proposal but feedback on your proposal. The feedback was: you seem to be using a scheme with a 16 bit prefix that is using reduced registers for the load immediate and jump. You also have a packed format where the prefix is organised a little differently, mainly because it uses a full register. If you reorganise a few bits and use reduced registers for the packed format as well, you can reuse the RVC decoder! And if you follow the idea through you can use the RVC decoder for the second 16 bits too, and if rd already uses a reduced register set, why not have rs1 and rs2 use a reduced register set as well.
There are plenty of usages of the full registerset in RVC and reusing the RVC is actually the more relevant message. 


 
Of course, 1/65536 is extreme. But we can only increase instruction sizes in steps of 16 bits. So with some instructions, such as this one, we only have the choice between taking a huge chunk of an encoding space, or a tiny chunk of the next larger encoding space. And I think in this case tiny chunk is the right choice.

But even assuming one really wants the prefix16 format with its limitations, I still think that it would probably be better to just use my packed format, just with rs2', rs1', and rd' instead of rs2, rs1, rd:

    |              4                    |  3                   2        |          1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    ----------------------------------------------------------------------------------------------------|
   ...           immediate              |    funct7   |   rs2   |   rs1   | len |    rd   | op|  11111  | regular packed format
   ...           immediate              |        funct9   | rs2'| f2| rs1'| len |  f2| rd'| op|  11111  | modified packed format

yes, the two f2 fields are placed a bit awkwardly, but I think this would still be a better encoding. For a few reasons, but the main one is that then it can share the same "op" space as regular packed instructions, and use bits in funct7 to distinguish the details of the instruction format.
 
In your proposal,

I.e.
 
    |              4                    |  3                   2                   1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    |-----------------------------------|---------------------------------------------------------------|
   ...                               immediate                          |0 0 0|len  | rd  | 00|  11111  | JALI
   ...                               immediate                          |func3|len  | rd' | 00|  11111  | LDI
   ...        immediate                 |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len  | rd' | 01|  11111  | PX16
   ...       funct7   |   rs2   |   rs1   |  f3 |    rd   |   opcode5|op|func3|len  | page| 10|  11111  | PX32
   ...*********************************TBD******************************|func3|len  | page| 11|  11111  | PX48


 
by using different op encodings for the two formats, you are either assuming that both formats will see an equal encoding space pressure, or you are knowingly throwing away encoding space.

The latter, I guess: using imm3 and i2 as additional function selectors rather than immediate, there are 32 * 8 * 4 * 8 = 8192 Prefix16 instructions with 3 registers + immediate,  8 times Using the 


   ...        immediate                 |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len  | rd' | 01|  11111  | PX16
 
 
Using funct7 is much more flexible. For example, if 1/4th of the encoding space should be used for the modified format, one could say instr[31:30]=11 is the modified format and other values is using the unmodified format. Or, one could reserve instr[31:30]=01 and instr[31:30]=10 and whenever we run out of encoding space in one of the categories _then_ we know for which one is greater demand and can define one of the reserved values accordingly.

Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding of the length, never the other way around.

bit 5..6 and the len field bit 10.. 12 determine the length of the instruction uniquely 

length(instruction) = 16*(bit[5:6] + ((1<<len)  & 127 ))

Whereas in my proposal, ignoring reserved spaces, the length is 16*instr[3:2]+48.

I'm proposing a function of 2 bits whereas you are proposing a function of 5 bits.


I think that the biggest difference is that I have a logarithmic encoding of the immediate length +  prefix part of the instruction, and you have a linear encoding of the whole instruction. 
Anyway, whatever floats your/everybody's boat. 

Which one is simpler to decode, assuming you already know instr[4:0] = 11111?

If you have a dual-issue pipeline, or do instruction fusing, you need to be able to decode multiple instructions in parallel. And the entire decode for the 2nd instruction depends on the length for the first instruction. So that better be as quick as you can possibly make it.

OK

Clifford Wolf

May 15, 2019, 10:04:34 PM5/15/19
to Rogier Brussee, RISC-V ISA Dev
Hi,

I've now added a compressed-packed format, reworded the proposal a bit (I hope for the better :),
and changed the load-immediate format to be able to write to t1/t2.

I've now added the nomenclature of "spaces" (spc) and "subspaces" (ssp) to avoid confusion with
two fields that are both called "opcode" and/or "f2" within the same instruction.

The nice thing about the compressed-packed format is that it can share a space with load-immediate,
jump-and-link and prefix instructions.

I'm still not convinced this would be a very attractive instruction format, but it doesn't hurt to have it in the mix,
and the way it is laid out it should be pretty easy on the decoder, should an instruction opt to use it.

On Wed, May 15, 2019 at 11:16 PM Rogier Brussee <rogier....@gmail.com> wrote:
Again, not sure how that addresses the point that not all instructions have an immediate that's a power of two.

????
 To illustrate that the number of bits of the immediate of the instruction does not have to coincide with the 16* ((1 <<len) % 128) number of bits following bit 32 I used the bfxp instruction with 21 bits of immediate.

You used length(immediate) = 16 for your example, which is a power of two. You can't do length(immediate) = 48 with your encoding because it only supports powers of two.
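
A small sketch of this point, assuming the immediate-length rule is exactly the one quoted earlier in the thread, bitlength(imm) = 16 * 2^len with len = 0b111 meaning "no immediate". The function name `imm_bits` is made up for illustration:

```python
def imm_bits(len_field):
    """Immediate length in bits for a 3-bit len field, per the quoted rule."""
    assert 0 <= len_field < 8
    return 0 if len_field == 0b111 else 16 * (2 ** len_field)


encodable = [imm_bits(l) for l in range(8)]

print(encodable)            # [16, 32, 64, 128, 256, 512, 1024, 0]
print(48 in encodable)      # False: no 48-bit immediate
print(max(encodable) // 8)  # 128 bytes, the stated maximum
```

Every non-zero entry is a power of two, so a 48-bit immediate (or any other multiple of 16 that is not a power of two) has no len value under that rule.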

On second thought, perhaps you mean you cannot encode e.g. a "load a 48 bit integer" LLI48 instruction of length 64 bit ? 

Yes.

Regarding the implication that 3 registers and  21 bit immediate wouldn't fit in a 48-bit encoding with my formats: That's simply not true!

I don't think I implied that,  

Sorry, I understood it that way.
 
regards,
 - Clifford

lk...@lkcl.net

May 15, 2019, 11:41:02 PM5/15/19
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, May 16, 2019 at 3:04:34 AM UTC+1, clifford wrote:
Hi,

I've now added a compressed-packed format, reworded the proposal a bit (I hope for the better :),
and changed the load-immediate format to be able to write to t1/t2.

lining up the fields as much as possible will save decode logic: 

     |              4                    |  3                   2        |          1                    |
     |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
     ----------------------------------------------------------------------------------------------------|
   ...  |    funct7   |   rs2   |   rs1   |  f3 |    rd   |     opcode    | len | 00|page | 00|  11111  | PFX
   ...                               immediate                          |f| len |ssp| rd^ |spc|  11111  | LDI
   ...                               immediate                          |f| len |ssp| rd  |spc|  11111  | JAL
imm16... |    funct7   | rs2'| f2| rs1'|f89| immediate[15:0]               | len |ssp| rd' |spc|  11111  | C#1?
imm16... |    funct7   |f89| rs2'| f2| rs1'| immediate[15:0]               | len |ssp| rd' |spc|  11111  | C#2?
imm16... |    funct7   |   rs2   |   rs1   | immediate[15:0]               | len |    rd   |spc|  11111  | PACK

For comparison, the standard 32-bit format (indentation added for clarity, to line up with the above):

         
|  3                   2                   1                    |

         
|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|

         
|---------------------------------------------------------------|
         
|    funct7   |   rs2   |   rs1   |  f3 |    rd   | opcode(7bit)| 32-bit format
         
| imm[5..11]  |   rs2   |   rs1   |  f3 | imm[0.4]| opcode(7bit)| 32-bit S-format
         
| imm[0..11]            |   rs1   |  f3 |    rd   | opcode(7bit)| 32-bit I-format
         
| imm[12..31                            |    rd   | opcode(7bit)| 32-bit U-format


note in particular that funct7 is lined up in *all* cases (reducing decode logic for funct7).  keeping to how the original 32-bit decode is done is particularly beneficial, it's extremely effective.
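
a minimal sketch of why the lined-up fields help, using only the standard 32-bit base layout shown above (funct7 at bits 31:25, rs2 at 24:20, rs1 at 19:15, funct3 at 14:12, rd at 11:7, opcode at 6:0). the helper names `bits` and `decode_common` are hypothetical; the point is that one set of field extractors serves both an R-type and an S-type instruction, and only the *interpretation* of two slots changes:

```python
def bits(word, hi, lo):
    """Extract word[hi:lo] (inclusive, LSB-0 numbering)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_common(instr):
    """Slice a 32-bit instruction at the fixed base-ISA field positions."""
    return {
        "funct7": bits(instr, 31, 25),   # imm[11:5] in S-format
        "rs2":    bits(instr, 24, 20),
        "rs1":    bits(instr, 19, 15),
        "funct3": bits(instr, 14, 12),
        "rd":     bits(instr, 11, 7),    # imm[4:0] in S-format
        "opcode": bits(instr, 6, 0),
    }


ADD = 0x002081B3  # add x3, x1, x2   (R-type)
SW = 0x0020A423   # sw  x2, 8(x1)    (S-type)

r = decode_common(ADD)
s = decode_common(SW)
s_imm = (s["funct7"] << 5) | s["rd"]  # reassemble the split S immediate

print(r["rs1"], r["rs2"], r["rd"], hex(r["opcode"]))   # 1 2 3 0x33
print(s["rs1"], s["rs2"], s_imm, hex(s["opcode"]))     # 1 2 8 0x23
```

rs1 and rs2 come out of the same wires in both cases, which is the property the >32-bit formats above are being lined up to preserve.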

C#1 and C#2, i do not know if it would be better to have 2 extra bits to extend funct7 (f89) in bit positions 31-32 or positions 39-49.  i have not explored side-by-side comparisons of C and 32-bit to say.

for comparison, the original (revised):

    |              4                    |  3                   2        |          1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
    ----------------------------------------------------------------------------------------------------|
   ...       funct7   |   rs2   |   rs1   |  f3 |    rd   |     opcode    | len | 00|page | 00|  11111  | PFX
   ...                               immediate                          |f| len |ssp| rd^ |spc|  11111  | LDI
   ...                               immediate                          |f| len |ssp| rd  |spc|  11111  | JAL
   ...           immediate              |      funct9     | rs2'| f2| rs1'| len |ssp| rd' |spc|  11111  | C
   ...           immediate              |    funct7   |   rs2   |   rs1   | len |    rd   |spc|  11111  | PACK


the prior version that i sent had all of those shifted over by 2 bits, so that they lined up @ a 16-bit boundary, such that (hypothetically) 32-bit opcode decoding could be shared (in parallel) without needing a conditional bit-shift to do so.

however with the inclusion of C format, that becomes a bit more challenging: this however may work:

     |              4                    |  3                   2        |          1                    |
     |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
     ----------------------------------------------------------------------------------------------------|
...  |    funct7   |   rs2   |   rs1   |  f3 |    rd   |     opcode    |f89| len | 00|page | 00|  11111  | PFX
    ...                               immediate                          |f| len |ssp| rd^ |spc|  11111  | LDI
    ...                               immediate                          |f| len |ssp| rd  |spc|  11111  | JAL
imm18|    funct7   | rs2'| f2| rs1'|f89| immediate[15:0]               |imm| len |ssp| rd' |spc|  11111  | C#1?
imm18|    funct7   |f89| rs2'| f2| rs1'| immediate[15:0]               |imm| len |ssp| rd' |spc|  11111  | C#2?
imm18|    funct7   |   rs2   |   rs1   | immediate[15:0]               |imm| len |    rd   |spc|  11111  | PACK
^^^ imm[NN..18]                                                         ^^^imm[17:16] 

* funct8-9 are moved to bits 15 and 16 (or may be used for "other purposes", exactly as original-revised except original-revised allocated bits 46-47 for "other purposes")
* in the pack format, bits 15 and 16 are bits 16 and 17 of the immediate, respectively
* in both C formats and pack, immediate bits beyond 18 are optional and are the same *number* of immediate bits (as in original-revised)
* funct7 lines up everywhere
* rs2 and rs1 line up in PACK and PFX formats.  C is a little more complex (as in original-revised)
* the 32-bit format is now on a 16-bit boundary
* comparing PFX to C and PACK: f3, rd, opcode and f89 are on the same boundary which has those same bits decoded as immediate, learning from (and copying) how the original 32-bit format does things.
* again, not enough knowledge of C to say if C#1 or C#2 (or even a hypothetical alternative that better lines up bits of rs1/2' and rs1/2) would be preferable.

questions:

* would it be worthwhile extending the "f" field of LDI and JAL to 2 bits, not just for aesthetic reasons but to get bits 15:0 of the immediate to line up with C and PACK formats? 
* if not, would it instead be worthwhile making bit 16 of the format equal to bit 17 of the immediate (again, to achieve the same purpose of making bits 15:0 to line up with C and PACK formats, thereby further reducing decode logic)

l.


Rogier Brussee

May 16, 2019, 10:18:40 AM5/16/19
to RISC-V ISA Dev, rogier....@gmail.com


Op donderdag 16 mei 2019 05:41:02 UTC+2 schreef lk...@lkcl.net:
I think there is a miscount somewhere, since opcode has 7 bits and so there is no room for an f89.  I think to align everything with existing decoders it should be something like this (bit numbers changed to hex):

 
      |                4                 |  3                   2        |          1                    |
      |  F E D C B A 9 8 7 6 5 4 3 2 1 |F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|
      |--------------------------------------------------------------------------------------------------|
immN0 |    funct7     |   rs2   |   rs1    |  f3 |    rd   |     opcode7 |f| len | 00|page | 00|  11111  | PFX
     ...                               immediateN0                       |f| len |ssp| rd^ | sp|  11111  | LDI
     ...                               immediateN0                       |f| len |ssp| rd  | sp|  11111  | JAL
immN16|    funct7     | rs2'| f2| rs1'|f1|  immediate[15:0             |f| len |ssp| rd' | sp|  11111  | C#1?
immN16|    funct7     |f89| rs2'| f2| rs1' | immediate[14:0]             |i| len |ssp| rd' | sp|  11111  | C#2?
immN16|    funct7     |   rs2   |   rs1    | immediate[14:0]             |i| len |    rd   | sp|  11111  | PACK
^^^ imm[N..16]                                                            ^imm[15]

Remarks:

1) It seems C#1 is slightly more regular than C#2, but C#2 is closer to PACK.
2) Opcode7  and f3 are just as important as func7. In particular bit 16 and 17 are special.
3) Without reusing RVC the C versions look rather less attractive.

 
* in the pack format, bits 15 and 16 are bits 16 and 17 of the immediate, respectively
* in both C formats and pack, immediate bits beyond 18 are optional and are the same *number* of immediate bits (as in original-revised)
* funct7 lines up everywhere
* rs2 and rs1 line up in PACK and PFX formats.  C is a little more complex (as in original-revised)
* the 32-bit format is now on a 16-bit boundary
* comparing PFX to C and PACK: f3, rd, opcode and f89 are on the same boundary which has those same bits decoded as immediate, learning from (and copying) how the original 32-bit format does things.
* again, not enough knowledge of C to say if C#1 or C#2 (or even a hypothetical alternative that better lines up bits of rs1/2' and rs1/2) would be preferable.

questions:

* would it be worthwhile extending the "f" field of LDI and JAL to 2 bits, not just for aesthetic reasons but to get bits 15:0 of the immediate to line up with C and PACK formats? 

They must be reduced to 1 bit!

lk...@lkcl.net

May 16, 2019, 10:24:44 AM5/16/19
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, May 16, 2019 at 3:18:40 PM UTC+1, Rogier Brussee wrote:

I think there is a miscount somewhere, since opcode has 7 bits and so there is no room for an f89.  

vvvv
" * --->funct8-9<---- are moved to --->bits 15 and 16<---- (or may be used for "other purposes", exactly as original-revised except original-revised allocated bits 46-47 for "other purposes")
^^^^

     |              4                    |  3                   2        |          1                    |
     
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
     
----------------------------------------------------------------------------------------------------|
                                                                        v v
...  |    funct7   |   rs2   |   rs1   |  f3 |    rd   |     opcode    |f89| len | 00|page | 00|  11111  | PFX
                                                                        ^ ^


lk...@lkcl.net

May 16, 2019, 10:40:51 AM5/16/19
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, May 16, 2019 at 3:18:40 PM UTC+1, Rogier Brussee wrote:
err.... errr.... oh whoops, clifford, there's 2 extra digits

    |              4                    |  3                   2        |          1                    |
    |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
                 ^ ^ ^ ^

so that's: 38, 39, 40, 41, 40, 42...

whew, good catch, rogier, i totally missed it as well.
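
a ruler like this can also be checked mechanically rather than by eye. the following is only a sketch (the function name `bit_ruler` is made up here, it is not part of any proposal): it prints the units digit of every bit number of a 48-bit word, MSB first, in groups of 16, so a duplicated digit shows up as a mismatch against the hand-typed line.

```python
def bit_ruler(nbits, group=16):
    """Return the units-digit ruler for an nbits-wide word, MSB first."""
    groups = []
    for hi in range(nbits - 1, -1, -group):
        lo = max(hi - group + 1, 0)
        # one digit per bit, separated by a space, so each bit is 2 characters wide
        groups.append(" ".join(str(b % 10) for b in range(hi, lo - 1, -1)))
    return "|" + "|".join(groups) + "|"

ruler = bit_ruler(48)
print(ruler)
```

the output has exactly 48 digits; the ruler quoted above has 50, which is the miscount.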

l.

lk...@lkcl.net

May 16, 2019, 11:29:17 AM5/16/19
to RISC-V ISA Dev, rogier....@gmail.com


On Thursday, May 16, 2019 at 3:18:40 PM UTC+1, Rogier Brussee wrote:
i took a look at the RVC format, and it's nothing like the 32-bit format.  the key difference is: rs2 is where rs1 is placed, and rd/rs1 is placed where rs2 is placed, if you line up the bits so that at least funct and the 5-bit boundaries line up.

so... there's no point trying to optimise the format to fit the rs/rd registers, however there _is_ a point to lining up the immediate to match imm[15:0].

adding in some of the C formats (the ones that make sense):

      |              4                |  3                   2        |          1                    |
      |7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
      |F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|
      |-----------------------------------------------------------------------------------------------|
immN0 |  funct7     |   rs2   |   rs1   |  f3 |    rd   |     opcode7 |f| len | 00|page | 00|  11111  | PFX
   ...             immediate[N..15]  ...  immediate[14:0]             |f| len |ssp| rd^ | sp|  11111  | LDI
   ...             immediate[N..15]  ...  immediate[14:0]             |f| len |ssp| rd  | sp|  11111  | JAL
imm15 |  funct7     |   rs2   |   rs1   | immediate[14:0]             |f| len |    rd   |spc|  11111  | PACK
imm15 |fn789| funct6    | rs1'| f2| rs2'| immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CArith
imm15 |fn456|func3|imm?which? |   rs2   | immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CStackST
imm15 |fn567| func4 |     rs1 |   rs2   | immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CReg
^^^ imm[N..15]


* CReg is actually identical to PACK (just with rs2 and rs1 reversed) once funct bits 5, 6 and 7 are added, so is redundant.

* CStackST (Stack-relative store) is too awkward: the immediate from stack-relative store messes with the positioning (and dynamic extendability) if you try to keep imm16 in the same place as well as immed[15:0] in the same place.  how do you extend the instruction to use imm16+ and place the bits between funct3 and rs2 in that?  too much of a mess.  scrap it.

* CArith sort-of makes sense as long as rs2' and rs1' are put back in their original slots (clifford you swapped them).  rs2' lines up with the 32-bit decoder, the 3 bits for x8-x15 still match up with their 32-bit equivalents, just with a hard-coded 0b01 for the top 2 bits.  however.... rd is *already* specified in bits 7-9 of the extended format, funct6 doesn't line up... 

* CWideImmediate doesn't make *any* sense (unmodified): rd' already exists in the proposed 48+ bit format, the imm field already exists in PACK... 

basically, fitting C in here is going to take a lot more thought.
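
for reference, the "hard-coded 0b01 for the top 2 bits" remark about the primed registers can be sketched like this. it only illustrates the standard RVC 3-bit to 5-bit register mapping (x8-x15); the function name is made up for the example.

```python
def expand_compressed_reg(r3):
    """Map a 3-bit RVC register field (rd', rs1', rs2') to a 5-bit register number."""
    if not 0 <= r3 <= 0b111:
        raise ValueError("compressed register field is 3 bits wide")
    # top two bits are fixed at 0b01, low three bits come from the field: x8..x15
    return 0b01000 | r3

for r3 in range(8):
    print(f"{r3:03b} -> x{expand_compressed_reg(r3)}")
```

so the low three bits can be routed to the same wires as the 32-bit register fields, which is why lining up rs1'/rs2' with rs1/rs2 matters.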

personally, i would feel much happier with a completely revised C format (with a reservation made to be able to do that over time), that better fits the fact that rd' is already part of the proposed 48+ bit format.

especially given that... hmmm... who was it... xan phung? sorry, it was over a year ago now: the guy who did that study and came up with RV16, an alternative Compressed format that saved a whopping *25%* code-space *compared to RVC*.

so that leaves:

      |              4                |  3                   2        |          1                    |
      |7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
      |F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|
      |-----------------------------------------------------------------------------------------------|
immN0 |  funct7     |   rs2   |   rs1   |  f3 |    rd   |     opcode7 |f| len | 00|page | 00|  11111  | PFX
   ...             immediate[N..15]  ...  immediate[14:0]             |f| len |ssp| rd^ | sp|  11111  | LDI
   ...             immediate[N..15]  ...  immediate[14:0]             |f| len |ssp| rd  | sp|  11111  | JAL
imm15 |  funct7     |   rs2   |   rs1   | immediate[14:0]             |f| len |    rd   |spc|  11111  | PACK
imm15 |fn789| funct6    | rs1'| f2| rs2'| immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CArith
^^^ imm[N..15]

with room in CArith for fn789 (and bit 15) to decide what to do with immediate[14:0].  having rd' *and* rs2' *and* rs1' *and* a 15-bit immediate is pretty neat.
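
to make the PFX row concrete, here is a rough sketch of pulling the fields out of a 48-bit word laid out as in the diagram. the bit positions are read straight off the PFX row; everything else is an assumption for illustration: the names `PFX_FIELDS`, `bits` and `decode_pfx` are invented, no trailing immediate beyond bit 47 is handled, and nothing is implied about what f, len or page mean.

```python
# (name, msb, lsb) within the 48-bit word, as read off the PFX row
PFX_FIELDS = [
    ("funct7", 47, 41), ("rs2", 40, 36), ("rs1", 35, 31), ("f3", 30, 28),
    ("rd", 27, 23), ("opcode7", 22, 16), ("f", 15, 15), ("len", 14, 12),
    ("zero_hi", 11, 10), ("page", 9, 7), ("zero_lo", 6, 5), ("marker", 4, 0),
]

def bits(word, msb, lsb):
    """Extract word[msb:lsb] as an unsigned integer."""
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)

def decode_pfx(word48):
    fields = {name: bits(word48, msb, lsb) for name, msb, lsb in PFX_FIELDS}
    # bits 47..16 have exactly the shape of a 32-bit R-type instruction
    fields["embedded32"] = bits(word48, 47, 16)
    return fields

# "add x3, x1, x2" (0x002081B3) placed in the upper 32 bits, with f/len/page all zero
word = (0x002081B3 << 16) | 0b11111
d = decode_pfx(word)
print(d)
```

the point of the layout is visible in `embedded32`: a 32-bit decoder can be fed bits 47..16 unchanged.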

l.

lk...@lkcl.net

May 16, 2019, 11:36:10 AM5/16/19
to RISC-V ISA Dev, rogier....@gmail.com
[well google groups royally screwed _that_ up... *sigh*, let's try again, and in case google screws up a 2nd time, it's attached in plain text.  remember to use a fixed-width font editor]

----
proposal_3.txt

Rogier Brussee

May 17, 2019, 3:52:14 AM5/17/19
to RISC-V ISA Dev, rogier....@gmail.com


Op donderdag 16 mei 2019 17:36:10 UTC+2 schreef lk...@lkcl.net:
[well google groups royally screwed _that_ up... *sigh*, let's try again, and in case google screws up a 2nd time, it's attached in plain text.  remember to use a fixed-width font editor]

----

i took a look at the RVC format, and it's nothing like the 32-bit format.  the key difference is: rs2 is where rs1 is placed, and rd/rs1 is placed where rs2 is placed, if you line up the bits so that at least funct and the 5-bit boundaries line up.

so... there's no point trying to optimise the format to fit the rs/rd registers, however there _is_ a point to lining up the immediate to match imm[15:0].

adding in some of the C formats (the ones that make sense):

      |              4                |  3                   2        |          1                    |
      |7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
      |F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|
      |-----------------------------------------------------------------------------------------------|
immN0 |  funct7     |   rs2   |   rs1   |  f3 |    rd   |     opcode7 |f| len | 00|page | 00|  11111  | PFX
   ...             immediate[N..15]  ...  immediate[14:0]             |f| len |ssp| rd^ | sp|  11111  | LDI
   ...             immediate[N..15]  ...  immediate[14:0]             |f| len |ssp| rd  | sp|  11111  | JAL
imm15 |  funct7     |   rs2   |   rs1   | immediate[14:0]             |f| len |    rd   |spc|  11111  | PACK
imm15 |fn789| funct6    | rs1'| f2| rs2'| immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CArith
imm15 |fn456|func3|imm?which? |   rs2   | immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CStackST
imm15 |fn567| func4 |     rs1 |   rs2   | immediate[14:0]             |f| len |ssp| rd' |spc|  11111  | CReg
^^^ imm[N..15]

* CReg is actually identical to PACK (just with rs2 and rs1 reversed) once funct bits 5, 6 and 7 are added, so is redundant.

* CStackST (Stack-relative store) is too awkward: the immediate from stack-relative store messes with the positioning (and dynamic extendability) if you try to keep imm16 in the same place as well as immed[15:0] in the same place.  how do you extend the instruction to use imm16+ and place the bits between funct3 and rs2 in that?  too much of a mess.  scrap it.

* CArith sort-of makes sense as long as rs2' and rs1' are put back in their original slots (clifford you swapped them).  rs2' lines up with the 32-bit decoder, the 3 bits for x8-x15 still match up with their 32-bit equivalents, just with a hard-coded 0b01 for the top 2 bits.  however.... rd is *already* specified in bits 7-9 of the extended format, funct6 doesn't line up... 

* CWideImmediate doesn't make *any* sense (unmodified): rd' already exists in the proposed 48+ bit format, the imm field already exists in PACK... 

basically, fitting C in here is going to take a lot more thought.
 

imm[N:0]  |  funct7     |   rs2   |   rs1   |  f3 |    rd   |opcode5|op |len |f | 00|page | 00|  11111  | PFX
 
This looks like a phoney 16 bit instruction (phoney because its op field = 11)  followed by a 32 bit instruction for the decoder which it will "fuse" (except of course there _are_ no separate instructions but the decoder does not have to know this) followed by more immediate. 

 ...                               ...  immediate[N:0]               |len |f |ssp| rd^ | sp|  11111  | LDI
 
...                               ...  immediate[N:0]               |len |f |ssp| rd  | sp|  11111  | JAL

This looks like a phoney 16 bit instruction (phoney because its op field = 11)  followed by more immediate

imm[N:0]                               |fn789| fn3| rs1'| f2| rs2'|f|op|len |f |ssp| rd' |spc|  11111  | CArith
imm[N:0]                               |fn789|f1|  rs1  |   func5   |op|len |f |     rd  |spc|  11111  | CADDI
imm[N:0]                               |fn567|f1|  rs1  |   rs2     |op|len |f |     rd  |spc|  11111  | CADD
 
This looks like a phoney 16 bit instruction (phoney because its op field = 11)  followed by a 16 bit instruction (especially if op != 11)  that will be "fused"  followed by more immediate. The CADD is of course equivalent to PACK . 




personally, i would feel much happier with a completely revised C format (with a reservation made to be able to do that over time), that better fits the fact that rd' is already part of the proposed 48+ bit format.

especially given that... hmmm... who was it... xan phung? sorry, it was over a year ago now: the guy who did that study and came up with RV16,


That would be me, based on Xan Phung, who in turn based himself on my Xcompressed proposal. :-)
 
an alternative Compressed format that saved a whopping *25%* code-space *compared to RVC*.

so that leaves:

      |              4                |  3                   2        |          1                    |
      
|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|

lk...@lkcl.net

May 17, 2019, 4:34:23 AM5/17/19
to RISC-V ISA Dev, rogier....@gmail.com


On Friday, May 17, 2019 at 8:52:14 AM UTC+1, Rogier Brussee wrote:

imm[N:0]  |  funct7     |   rs2   |   rs1   |  f3 |    rd   |opcode5|op |len |f | 00|page | 00|  11111  | PFX
 
This looks like a phoney 16 bit instruction (phoney because its op field = 11)  followed by a 32 bit instruction for the decoder

... [same for CADD/PACK, and for LDI and JAL]
 
that's right, they do, which is why, we can surmise, clifford concluded that adding C-based instructions would be redundant/unnecessary.

or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?

why are they not more like the full 32-bit instructions (with the optional opportunity to extend those, e.g. funct7-->funct9)

with that in mind, i feel it necessary to throw in a "word of caution" about choosing a format that dedicates the entirety of the future opcode space to a *subset* of the register space as the core fundamental basis of all future RISC-V instruction encodings.

in particular, this would make it... challenging to extend register files to beyond 32 entries, such as is a mandatory requirement for efficient GPUs.

any GPU that does not have a minimum of 128 64-bit registers (256 32-bit registers) is forced to pass data back and forth through the L1/L2 cache barrier, resulting in such a huge increase in power consumption that users will punish the implementor that chooses such a path by simply not buying the product (because it will be too inefficient).

 
especially given that... hmmm... who was it... xan phung? sorry, it was over a year ago now: the guy who did that study and came up with RV16,


That would be me, based on Xan Phung, who in turn based himself on my Xcompressed proposal. :-)

cool!


Rogier Brussee

May 17, 2019, 5:49:17 AM5/17/19
to RISC-V ISA Dev, rogier....@gmail.com


Op vrijdag 17 mei 2019 10:34:23 UTC+2 schreef lk...@lkcl.net:


On Friday, May 17, 2019 at 8:52:14 AM UTC+1, Rogier Brussee wrote:

imm[N:0]  |  funct7     |   rs2   |   rs1   |  f3 |    rd   |opcode5|op |len |f | 00|page | 00|  11111  | PFX
 
This looks like a phoney 16 bit instruction (phoney because its op field = 11)  followed by a 32 bit instruction for the decoder

... [same for CADD/PACK, and for LDI and JAL]
 
that's right, they do, which is why, we can surmise, clifford concluded that adding C-based instructions would be redundant/unnecessary.

or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?

 
Because the load immediate versions already use the restricted register for rd (I think to have room for  LI  of len bit version of  integers, and  JAL,  single and double precision floats).
and it is sort of a 16 bit prefix to something fitting in 16 bit and an existing  32 bit  formats followed by a variable length standard bitorder immediate. Therefore it makes sense 
to reuse a _decoder_ that can already handle the 16 bit chunks, and (have the option of)  pretending to the _decoder_ that you do some sort of instruction fusion.  
If you want 128 registers, which I understand, it makes perfect sense to have one of these 16 bit PREFIX32 prefixes encode for being followed by an entirely non standard 32 bit format with  7 bit rd rs1 rs2 register specifications 11 bits of opcode, and a variable length standard bit order immediate. You just would not be able to reuse the standard decoder, which is just fine.  

It is also entirely reasonable to think that using the reduced registers is not worth the trouble. 

lk...@lkcl.net

May 18, 2019, 3:36:11 AM5/18/19
to RISC-V ISA Dev, rogier....@gmail.com
On Friday, May 17, 2019 at 10:49:17 AM UTC+1, Rogier Brussee wrote:

or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?

 
Because the load immediate versions already use the restricted register for rd (I think to have room for  LI  of len bit version of  integers, and  JAL,  single and double precision floats).
and it is sort of a 16 bit prefix to something fitting in 16 bit and an existing  32 bit  formats followed by a variable length standard bitorder immediate. Therefore it makes sense 
to reuse a _decoder_ that can already handle the 16 bit chunks, and (have the option of)  pretending to the _decoder_ that you do some sort of instruction fusion.  

ok so yes, it makes sense... *for those instructions* [the extended JAL and extended IMM].

that leaves the case for imposing the 16-bit reduced format on the rest - the entirety - of the 48+ encoding space.  what is the case (justification) for that?

to make that clear: are all future instructions (not JAL, not IMM) going to be forced to have that reduced rd' field, in perpetuity?

this seems to be a serious (unacceptable) imposition.


If you want 128 registers, which I understand, it makes perfect sense to have one of these 16 bit PREFIX32 prefixes encode for being followed by an entirely non standard 32 bit format with  7 bit rd rs1 rs2 register specifications 11 bits of opcode, and a variable length standard bit order immediate. You just would not be able to reuse the standard decoder, which is just fine.  

this is not what we are doing. what we are doing instead is keeping the 32-bit instruction format identical (not just because of the reduced decoder logic but because we need the concept of extending *ALL* 32-bit instructions UNMODIFIED), embedding it into the 48-bit format, and utilising what precious few bits are remaining to *extend* rs1, rs2 and rd.

turns out that we only have one bit spare per register (due to the need to add predication as well), so that one bit is "if zero, register is a scalar in range 0b00000..0b11111" and "if set, register is a vector in range 0b0000000..0b1111100" i.e. the register number is shifted by 2 bits.
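
as a sketch of that rule (assuming the scalar case is simply the plain 5-bit number, and with a made-up function name):

```python
def extend_reg(reg5, is_vector):
    """Combine a 5-bit register field with the one spare bit into a 7-bit register number."""
    if not 0 <= reg5 <= 0b11111:
        raise ValueError("register field is 5 bits wide")
    # spare bit clear: scalar, number used as-is (0..31)
    # spare bit set: vector, number shifted up by 2 (0, 4, 8, ... 124)
    return (reg5 << 2) if is_vector else reg5

print(extend_reg(0b11111, False))  # highest scalar register
print(extend_reg(0b11111, True))   # highest vector start register
```

the vector case can only name every 4th register out of 128, which is the price of having a single extra bit.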

it's by no means perfect, however the space is so extremely limited there's not much else can be done.  going to 64 bit isn't really acceptable as the penalty for doing so is a 25% increase in the instruction cache size (with associated appx 40% increase in power consumption of the same)

l.
 

Rogier Brussee

May 20, 2019, 12:01:20 PM5/20/19
to RISC-V ISA Dev, rogier....@gmail.com


Op zaterdag 18 mei 2019 09:36:11 UTC+2 schreef lk...@lkcl.net:
On Friday, May 17, 2019 at 10:49:17 AM UTC+1, Rogier Brussee wrote:

or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?

 
Because the load immediate versions already use the restricted register for rd (I think to have room for  LI  of len bit version of  integers, and  JAL,  single and double precision floats).
and it is sort of a 16 bit prefix to something fitting in 16 bit and an existing  32 bit  formats followed by a variable length standard bitorder immediate. Therefore it makes sense 
to reuse a _decoder_ that can already handle the 16 bit chunks, and (have the option of)  pretending to the _decoder_ that you do some sort of instruction fusion.  

ok so yes, it makes sense... *for those instructions* [the extended JAL and extended IMM].

that leaves the case for imposing the 16-bit reduced format on the rest - the entirety - of the 48+ encoding space.  what is the case (justification) for that?



It would be just one _option_ to use for a small bit  of the 48 bit instruction space.
   
 
to make that clear: are all future instructions (not JAL, not IMM) going to be forced to have that reduced rd' field, in perpetuity?


No. (not to mention that nobody seems to like my suggestion)
 
this seems to be a serious (unacceptable) imposition.


If you want 128 registers, which I understand, it makes perfect sense to have one of these 16 bit PREFIX32 prefixes encode for being followed by an entirely non standard 32 bit format with  7 bit rd rs1 rs2 register specifications 11 bits of opcode, and a variable length standard bit order immediate. You just would not be able to reuse the standard decoder, which is just fine.  

this is not what we are doing. what we are doing instead is keeping the 32-bit instruction format identical (not just because of the reduced decoder logic but because we need the concept of extending *ALL* 32-bit instructions UNMODIFIED), embedding it into the 48-bit format, and utilising what precious few bits are remaining to *extend* rs1, rs2 and rd.

turns out that we only have one bit spare per register (due to the need to add predication as well), so that one bit is "if zero, register is a scalar in range 0b00000..0b11111" and "if set, register is a vector in range 0b0000000..0b1111100" i.e. the register number is shifted by 2 bits.

makes sense. You divide up your registers in blocks of 1 or 4. 
 
it's by no means perfect, however the space is so extremely limited there's not much else can be done.  going to 64 bit isn't really acceptable as the penalty for doing so is a 25% increase in the instruction cache size (with associated appx 40% increase in power consumption of the same)



Maybe you can use the last remaining reserved slot in RVC (Inst[1:0] = 0b00, Inst[15:13] = 0b100) as a 16 bit "prefix" for how to interpret the next instruction, including 16 bit, 48 bit or longer ones. That gives 11 bits to play with. Of course, being reserved, you are practically guaranteed that some version of RVC v2 will eventually trample over your prefix and not be compatible. However, assemblers and linkers will have to deal with RVC vs RVC v2 anyway, and all that not supporting RVC v2 means is that some 32 bit instructions would not get compressed, so the pain you get for having more bits might be worth it. And yes, it is a hack.
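
A minimal sketch of what recognising such a prefix parcel could look like, assuming the slot is quadrant 0 (low two bits 00) with funct3 = 100 in the top three bits. The function names are hypothetical, and what the 11 payload bits would mean is left entirely open:

```python
def is_reserved_rvc_slot(half):
    """True if a 16-bit parcel has inst[1:0] == 0b00 and inst[15:13] == 0b100."""
    return (half & 0b11) == 0b00 and ((half >> 13) & 0b111) == 0b100

def prefix_payload(half):
    """Return the 11 free bits inst[12:2] of a parcel in that slot."""
    if not is_reserved_rvc_slot(half):
        raise ValueError("parcel is not in the reserved RVC slot")
    return (half >> 2) & 0x7FF

parcel = (0b100 << 13) | (0x5A5 << 2)  # arbitrary example payload
print(is_reserved_rvc_slot(parcel), hex(prefix_payload(parcel)))
```

16 bits minus 2 quadrant bits minus 3 funct3 bits leaves the 11 bits mentioned above.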


l.
 

lkcl

Jun 3, 2019, 10:24:47 PM6/3/19
to RISC-V ISA Dev, rogier....@gmail.com
On Tuesday, May 21, 2019 at 12:01:20 AM UTC+8, Rogier Brussee wrote:

> to make that clear: are all future instructions (not JAL, not IMM) going to be forced to have that reduced rd' field, in perpetuity?
>
>
>
>
> No.

Ok good then that needs to be made very clear in the proposal (Clifford).

Before any decisions get made I think it would be an extremely sensible idea to attempt to create the key JAL and LONG IMM instruction encodings *within* the existing 48 and 64 and beyond format, discussed publicly, even if it is clearly unworkable.

By demonstrating that such an effort will fail the reasons for making such far reaching strategic changes to RISCV encodings will be more palatable.

It may not actually fail!

> makes sense. You divide up your registers in blocks of 1 or 4. 


Yes. Now why did I not have such simple words??


>
>
>
>
> Maybe you can use the last remaining reserved slot in RVC (Inst[1:0] = 0b00, Inst[15:13] = 0b100) as a 16 bit "prefix" for how to interpret the next instruction, including 16 bit, 48 bit or longer ones. That gives 11 bits to play with.


I like it.

> Of course being reserved, you are practically guaranteed that some version of RVC v2  will eventually trample over your prefix and not be compatible.

Or xBitManip.

Btw just to check, there is no private cartelled discussion of creating RVCv2 without a wider public real time discussion, is there?

> However, assemblers and linkers will have to deal with RVC vs RVC v2 anyway,

This *has* to be dealt with by having the mvendorid-marchid-isamux scheme implemented BEFORE that happens.

Otherwise, and this is not a joke or something that can be dismissed as "not likely", if there is even *one* official RISCV instruction that changes meaning, even by one bit, in a way that is not dynamically detectable, RISCV is DEAD.

This is not a joke. It is just reality that needs to be recognised and accepted, not dismissed lightly.

Look at Altivec, how sharing the same opcodes between two incompatible extensions had gcc developers and users rejecting PowerPC outright.

I cannot emphasise enough how serious this is. RISCV simply will not recover from such a shortsighted mistake, even if it is a single bit change in the meaning of a single official instruction.



> and all that not supporting RVC v2 means is that some 32 bit instructions would not get compressed, so the pain you get for having more bits might be worth it. And yes, it is a hack.

If we did not need the RVC space for potential xBitManip, which also needs the same transparent Vectorisation, to give us Video and ARGB and YUV pixel conversion as parallel ops, I would advocate our team consider pursuing the idea, seriously, even though decode would be messier.

L.

Allen Baum

Jun 4, 2019, 2:06:51 PM6/4/19
to lkcl, RISC-V ISA Dev, Rogier Brussee
I'm going to have to disagree, probably because I don't understand which scenario you're thinking of.
If there is an "official" RISC-V Foundation sanctioned change to an existing instruction, then that change would trivially be detected dynamically, simply by executing some instruction that behaves differently and seeing what the result is. Compilers, etc., must be told which version of the architecture they are compiling for, since the mvendorid-marchid-isamux proposal won't work. If you have a custom extension that steps on reserved opcodes, then you're kind of on your own. If you have a custom extension that steps on opcodes used by some other unimplemented extension, then you need to tell the compilers you don't have the extension so they don't try to generate instructions that use it.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/a5e91093-a9b7-4e17-91a3-ca9f84bcf322%40groups.riscv.org.

lk...@lkcl.net

Jun 4, 2019, 4:59:54 PM6/4/19
to RISC-V ISA Dev, luke.l...@gmail.com, rogier....@gmail.com


On Tuesday, June 4, 2019 at 7:06:51 PM UTC+1, Allen Baum wrote:
I'm going to have to disagree, probably because I don't understand which scenario you're thinking of.

if you recall it took jacob bachmeyer and i (plus others) several weeks, if not a good couple of months, to go over this.

 
If there is an "official" RISC-V Foundation sanctioned change to an existing instruction, then that change would trivially be detected dynamically, simply by executing some instruction that behaves differently and seeing what the result is.

and what then?

what about binaries that cannot be recompiled, as they are binary-only distributions?

what about proprietary binaries that are distributed widely, become mission-critical, and the company that made the software goes out of business?

relying on the *compiler* to be the location at which the decision is made to support INCOMPATIBLE hardware is a myopic perspective, one that is *absolutely* guaranteed to cause vendors to throw their arms up in absolute horror and run away screaming from RISC-V.

if on the other hand there is no desire to have rock-solid stability in RISC-V's future, then of course it is absolutely fine to **** things up by making drastic incompatible changes to instructions.

 
Compilers, etc, must be told which version of the architecture they are compiling for,

of course... except that you have forgotten that the compiler is not and cannot be the answer to everything, due to the "legacy binaries" scenario, without which the faith and trust in RISC-V as a reliable long-term platform is undermined.

permanently.

 
since the mvendorid-marchid-isamux proposal won't work.

why not?  (bear in mind, you said above that you don't understand it).

 
If you have a custom extension that steps on reserved opcodes, then you're kind of on your own.

actually, with the isamux proposal, the hardware may DYNAMICALLY switch OFF the custom extensions, entirely.  that is ENTIRELY the point and purpose of it.  is this aspect of the isamux proposal something that you understood?

thus, "legacy" binaries will always successfully run... WITHOUT RECOMPILATION... because they are given their own dedicated "isamux" ID.

thus, it provides a smooth non-disruptive transition path over a 5-20 year period and beyond.

thus, a vendor that has an implementation that tramples all over reserved opcodes may STILL BE COMPLIANT.  all they do is run the compliance suite software with "isamux=0".

... you see how that works?

now, if there is a drastic change to the spec (one that is an emergency and absolutely unavoidable), then that would become an OFFICIAL "isamux=1" requirement, i.e. it would require that vendors OFFICIALLY set "isamux=1".

i.e. NEW vendors with a NEW implementation may choose:

* whether to support "legacy" binaries (isamux=0) and to apply for Certification of the "old" RISC-V standard
* whether to support the (emergency, incompatible, OFFICIAL) new format (through isamux=1), and to apply for Certification of the NEW RISC-V standard
* whether to support both.
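
as a toy model of the idea (not the actual isamux proposal text; the class, the table contents and the choice of opcode are all invented for illustration), the same opcode bits simply decode through a different table depending on the currently selected isamux value, and a hart only accepts the values it implements:

```python
# hypothetical decode tables: identical opcode bits, two different meanings
DECODERS = {
    0: {0b0001011: "legacy meaning"},
    1: {0b0001011: "new official meaning"},
}

class Hart:
    def __init__(self, supported):
        self.supported = set(supported)  # isamux values this implementation provides
        self.isamux = 0                  # assumed default: the original standard

    def set_isamux(self, value):
        if value not in self.supported:
            raise ValueError(f"isamux={value} not implemented on this hart")
        self.isamux = value

    def decode(self, opcode7):
        return DECODERS[self.isamux].get(opcode7, "illegal instruction")

both = Hart({0, 1})
print(both.decode(0b0001011))
both.set_isamux(1)
print(both.decode(0b0001011))
```

a legacy binary keeps running under isamux=0 without recompilation, and a hart built as `Hart({0})` corresponds to the first bullet above.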



If you have a custom extension that steps on opcodes used by some other unimplemented extension, then you need to tell the compilers you don't have the extension so they don't try to generate instructions that use it.

there is more to it, and yet it is far simpler than it seems, and also much better understood than you may have been led to believe.

if you think through the well-known case of the dynamic little-endian / big-endian synergistic hardware-and-compiler support, mvendorid-marchid-isamux is that EXACT same concept, generalised and formalised.

if you understand fully how hardware can switch the meaning of the instruction set dynamically between little-endian and big-endian, you have also understood the isamux proposal.

l.

Allen Baum

Jun 4, 2019, 6:17:50 PM6/4/19
to lk...@lkcl.net, RISC-V ISA Dev, lkcl, Rogier Brussee
OK, now I know the scenario(s) you're thinking over. I was thinking of a different one.

I would agree that making a backwards incompatible change to the architecture without some way to execute old code (and there are many ways to do that, not just isamux) is, um, short-sighted, shall we say. Simply trapping on ops that are either changed or no longer supported is the easy, and most likely, fix for that. You can even automatically patch the code when it is encountered to avoid performance penalties.

I don't agree that a generalized isamux is the best way to go. I think point solutions (as my example above and your little/big endian switch) are far easier to implement and validate.
In fact the architected solution (XS, FS) probably works just fine for any custom extension that you might want to turn on and off.

Far uglier architectures have survived for far longer than 25 years with less.

The real solution is not to ship binaries, but only ship intermediate representations that can be compiled to the final HW.
That has been a very profitable business for some. We will see more of it...


lkcl

Jun 4, 2019, 7:19:40 PM6/4/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
On Wednesday, June 5, 2019 at 6:17:50 AM UTC+8, Allen Baum wrote:
> OK, now I know the scenario(s) you're thinking over. I was thinking of a different one

Ok :) whew

>
>
> I would agree that making a backwards incompatible change to the architecture without some way to execute old code (and there are many ways to do that, not just isamux) is, um short-sighted, shall we say. Simply trapping on ops that are either changed or no longer supported is the easy, and most likely, fix for that.

Think it through: a "legacy" design compatible only with the "old" official spec would potentially have no way of trapping, particularly on "Base" extensions.

Some vendors are *not* supporting the "please disable this extension" capability, which, even if used, would generate an absolutely awful number of traps as the *entire* official extension's opcode space would now be effectively software emulated.

And if the change was in say RV64I, we are *really* in trouble as far as trap volumes are concerned!

This is why binary incompatibility - even one single bit change in an "official" extension - is an absolute unmitigated disaster for the entire RISCV ecosystem.

I would go so far as to say that it would be better to start again from scratch, with RISC-VI (RISC-6 for those people not familiar with Roman numerals). At least then the full implications of the binary incompatibility would be properly appreciated.

> You can even automatically patch the code when it is encountered, to avoid performance penalties.

Yes that would actually work. A full decompilation or full RISCV to RISCV JIT engine, translating either statically or dynamically to the "new" format.

I would however expect most vendors to hit the roof if this was made a mandatory requirement to support both "legacy" and "new" official RISCV binaries.

Plus, someone would need to develop the software tool, and the "new" meaning could NOT be ratified until it was absolutely known that the software JIT / decompile-recompile approach worked 100% reliably.

Again, hairy nightmare basically.


>
>
> I don't agree that a generalized isamux is the best way to go. I think point solutions (as my example above and your little/big endian switch) are far easier to implement and validate.

The thing is that if the big/little dynamic switch is acceptable, that will go in 1 bit of a dedicated official "switching" CSR, yes?

Now let us suppose that there is another problem that needs solving (either emergency or just "a damn good idea, so good that the performance enhancement cannot be ignored" such as RVCv2).

Where would the logical place be to have the CSR bit that switches the meaning of RVC opcodes from v1 meaning to v2 meaning?

Right next to the bit that switches LE/BE mode, of course!

This *is* the isamux concept.

Thus, in the very incremental fashion that you describe, more bits over the years (decades) accumulate, none of them disruptive, all of them very carefully planned and managed. One bit added at a time.

The next logical progression of this is to split the CSR into "officially reserved" space and "custom useable space", and that is the full extent of the isamux proposal.
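
To make the incremental picture above a little more concrete, here is a minimal Python sketch of such a switching CSR. The bit positions, the names `BIT_BIG_ENDIAN` and `BIT_RVC_V2`, and the 16/16 split between officially reserved and custom bits are all assumptions invented for this illustration; none of them come from a ratified spec.

```python
# Hypothetical layout of a 32-bit "isamux" switching CSR.
# All bit positions and the reserved/custom split are assumptions.
BIT_BIG_ENDIAN = 1 << 0   # 0 = little-endian, 1 = big-endian
BIT_RVC_V2 = 1 << 1       # 0 = RVC v1 meaning, 1 = RVC v2 meaning

OFFICIAL_MASK = 0x0000_FFFF  # bits managed centrally, added one at a time
CUSTOM_MASK = 0xFFFF_0000    # bits left to vendors for custom use


def describe(isamux: int) -> dict:
    """Split a CSR value into the fields this sketch defines."""
    return {
        "big_endian": bool(isamux & BIT_BIG_ENDIAN),
        "rvc_v2": bool(isamux & BIT_RVC_V2),
        "official": isamux & OFFICIAL_MASK,
        "custom": (isamux & CUSTOM_MASK) >> 16,
    }


if __name__ == "__main__":
    # Big-endian selected, plus a vendor-defined custom value of 5.
    print(describe(BIT_BIG_ENDIAN | (0x5 << 16)))
```

The two masks are disjoint, so an official bit added later can never collide with a bit a vendor has already claimed in the custom half.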

Really very simple and, when the alternatives are fully evaluated and found to be unfortunately unacceptable or just plain dangerous, we are left reluctantly with isamux as the only palatable option.

That is not to say that it should be abused! It is designed for really, really serious situations (or advantages so compelling that they cannot be ignored), because the work needed particularly on the binutils and gcc side is simple enough but needs to be really carefully architected and thought through.

About that: Jacob Bachmeyer and I went over it. You have to emit the full triple (mvendorid-marchid-isamux) from the gcc assembler backend, using a table managed atomically by the FSF (*NOT* the RISCV Foundation), and binutils picks that up and generates the required isamux CSR switching (in and out).

This means that precompiled binaries with eg bigendian in one and littleendian in another will actually work and run on the same system that supports BE/LE isamuxing.
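
A rough Python sketch of the "switching in and out" idea, under heavy assumptions: each code section is tagged with the isamux value it needs, and a toy "linker" inserts a CSR write whenever the required value changes. The function name `emit_with_switching`, the `isamux` CSR name, and the emitted pseudo-assembly strings are all hypothetical; this is not how binutils is actually structured.

```python
from typing import List, Tuple

# Each section is (required_isamux_value, list_of_instructions).
Section = Tuple[int, List[str]]


def emit_with_switching(sections: List[Section], default: int = 0) -> List[str]:
    """Concatenate sections, inserting an isamux write whenever the
    required value changes, and restoring the default at the end."""
    out: List[str] = []
    current = default
    for required, body in sections:
        if required != current:
            # Hypothetical CSR name; pseudo-assembly for illustration only.
            out.append(f"csrwi isamux, {required}")
            current = required
        out.extend(body)
    if current != default:
        out.append(f"csrwi isamux, {default}")  # switch back out
    return out
```

Adjacent sections that need the same value share one switch, so a little-endian object followed by two big-endian ones costs exactly one switch in and one switch out.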

It's kinda simple and kinda complex at the same time. Takes getting used to.

> In fact the architected solution (XS, FS) probably works just fine for any custom extension that you might want to turn on and off.

Slightly confused, apologies. Custom extensions are partly on their own, partly given "safe breathing space" by isamux. Of much greater concern, as described above, is switching "official" extensions on or off, particularly as doing so is entirely optional. I know of no system that has even done it, probably precisely because it is optional and complicates the instruction decode phase quite a lot.

Remember I warned 18 months ago about what happens when a Standard makes things optional? Sigh.

>
> Far uglier architectures have survived for far longer than 25 years with less.

:)

>
> The real solution is not to ship binaries, but only ship intermediate representations that can be compiled to the final HW.

Sigh I agree, if LLVM (and other JIT approaches) were 100% the norm, right across the board, we would not be having this discussion.

Reality is however that the most critical piece of the puzzle - the linux kernel, being the highest volume of systems out there - has only just had the last c variable dynamic data structure removed about 6 months ago, which was preventing and prohibiting it from being compiled by clang-llvm.

Point being, it is far too early and with RTOS vendors having to go through the same pain barrier, LLVM and other JITs are just not going to see the level of adoption any time soon, if at all, that would make the need for isamux go away.

Sigh.

L.


Jonathan Behrens

Jun 4, 2019, 8:52:14 PM6/4/19
to lkcl, RISC-V ISA Dev, lk...@lkcl.net, rogier....@gmail.com
Could you explain how the isamux idea is different than the current misa? From your description they both seem to just be a collection of bits indicating what extensions are enabled. It sounds like you are concerned about how current implementations are choosing to make that register read only. However, although none of the current extensions require that their feature bits in misa are writable, a future extension could totally say that "this extension must start disabled and be turned on only by writing to misa". In fact, it is likely that the UNIX-class platform spec will require all extensions with user-visible state to start disabled. Extensions being mutually exclusive also wouldn't be a problem because the misa register is already WARL.
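
For readers unfamiliar with the term, WARL means "Write Any values, Reads Legal values": a write never traps, but bits the implementation does not support simply keep their hard-wired value. A minimal Python sketch follows, assuming the simplest possible model where legality is a per-bit writable mask (real WARL fields can have more complicated legal sets); `warl_write` is a name made up for this example.

```python
def warl_write(current: int, value: int, writable_mask: int) -> int:
    """Return the new register value after a WARL-style write.

    Bits outside writable_mask keep their current (hard-wired) value;
    the write itself never traps.
    """
    return (current & ~writable_mask) | (value & writable_mask)


if __name__ == "__main__":
    # A fully read-only register ignores the write entirely.
    print(bin(warl_write(0b0101, 0b1111, writable_mask=0)))
    # With bit 1 writable, only that bit changes.
    print(bin(warl_write(0b0101, 0b1010, writable_mask=0b0010)))
```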

I'm also not following why there would ever have to be an "emergency update" that changed the meaning of existing RV64I instructions. What is wrong with adding instructions with new opcodes using the normal extension mechanism? Are there any examples of these emergency changes for prior ISAs?

--
You received this message because you are subscribed to a topic in the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/a/groups.riscv.org/d/topic/isa-dev/x-uFZDXiOxY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to isa-dev+u...@groups.riscv.org.

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

lkcl

Jun 4, 2019, 11:42:03 PM6/4/19
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com
On Wednesday, June 5, 2019 at 8:52:14 AM UTC+8, Jonathan wrote:
> Could you explain how the isamux idea is different than the current misa?

Sure. Let's change the subject line too, should have done that 5 msgs ago.

> From your description they both seem to just be a collection of bits indicating what extensions are enabled.

Not quite.

Two key issues,

1. isamux is designed to allow EXISTING instructions, even ones in RATIFIED extensions, to have ONE (or more) opcodes change meaning.

In the case of LE/BE the dynamic change of the CSR bit associated would actually affect *multiple* instructions on *multiple* extensions, past present *and future* (RV128, QUAD FP LD/ST).

MISA was most definitely and clearly never intended for that purpose.

2. MISA can be readonly (WARL is it called?) and the choice of whether to be read only or writable is up to the vendor.

isamux it is ABSOLUTELY CRITICAL that it be properly implemented as both readable and writable.

ie if the bit is written to with a 0, the instruction encoding MUST switch to LE, and if 1, encoding MUST switch to BE, for example.

> It sounds like you are concerned about how current implementations are choosing to make that register read only.

This is misstating things. I am not personally concerned, I can however perceive that, from my experience in writing long-lasting standards, letting MISA be readonly OR writeable was a costly mistake.

Extensibility in standards *has* to be managed with some very specific rules, that the RISCV Foundation violated in this instance.

> However, although none of the current extensions require that their feature bits in misa are writable, a future extension could totally say that "this extension must start disabled and be turned on only by writing to misa".

It could.. that does not help at all with existing hardware. Also, the source of the problem is that it is *not mandatory* to have extensions be disableable.

Oh. I just remembered. It has been so long.

3. The other key difference is that MISA requires DESTRUCTION of extension state information. It is LITERALLY a kill-switch.

isamux DOES NOT DO THAT. isamux is an INSTRUCTION ENCODING "lengthener", that may be viewed as "adding hidden bits to the instruction" using escape-sequence methodology (a CSR).

The example LE/BE CSR bit for example is as if you now had a 33 bit instruction.

Thus, hypothetically (or more like actually) a major extension behind isamux would ALSO need a MISA bit, because whilst the MISA bit would disable the extension (and destroy any state info), isamux would NOT.

This has implications for context switching. MISA state must STILL BE SAVED for BOTH extensions behind an isamux because BOTH EXTENSIONS ARE STILL ACTIVE.

So it really is truly a different purpose. And has no associated delays on switching. isamux bits *literally* plug directly into the instruction decode because they are literally the 33rd (and 34th, and 35th) bits of the instruction.

NOT an on/off kill switch for an extension.
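
The contrast drawn above can be sketched as a toy Python model. It takes the description in this post at face value (MISA modelled as a kill switch that discards extension state, isamux modelled as extra bits concatenated onto the instruction); the class `ToyHart`, its fields, and the extension names "A" and "B" are all hypothetical.

```python
class ToyHart:
    """Toy model contrasting the two mechanisms as described in this post."""

    def __init__(self) -> None:
        self.isamux = 0  # extra opcode bits
        self.ext_enabled = {"A": True, "B": True}
        self.ext_state = {"A": [1, 2, 3], "B": [4, 5, 6]}

    def decode_key(self, instr32: int) -> int:
        # The isamux bits are concatenated above bit 31 of the instruction.
        return (self.isamux << 32) | (instr32 & 0xFFFF_FFFF)

    def misa_disable(self, ext: str) -> None:
        # Modelled as a kill switch: the extension's state is discarded.
        self.ext_enabled[ext] = False
        self.ext_state[ext] = []

    def set_isamux(self, value: int) -> None:
        # Only decode changes; no extension state is touched.
        self.isamux = value
```

In this model a context switch still has to save the state of both "A" and "B" after `set_isamux`, because neither has been disabled.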


> In fact, it is likely that the UNIX-class platform spec will require all extensions with user-visible state to start disabled.

If there exists even one hardware vendor that has spent millions on hardware that does not support that, it is not possible. It's far too late.

Are we going to have disparate kernel binaries, one for "legacy" hardware and one for "clean extension state"? Suggest it on sw-dev and see how far the idea gets (hint: wear fireproof jacket) :)

Changes of that nature would be disastrous to existing hardware that was expected to be backwards compatible with such a drastic spec change.

I have to remind people, this is why it is such a seriously bad idea to hold secretive cartelled discussions such as having a closed UNIX WG list.

> Extensions being mutually exclusive also wouldn't be a problem because the misa register is already WARL.
>
>
>
> I'm also not following why there would ever have to be an "emergency update"
> that changed the meaning of existing RV64I instructions.

Conflation in your mind allowed you to believe I said that. I did not. However I will follow up with a separate post on the topic.

A likely emergency scenario would be.. I dunno... hmmm, perhaps that there was not enough analysis done of a particular instruction, or that insights came to light only well after public consultation and were ignored even then, and, far too late, ratified and silicon released...

... and then it turns out that yes, a mistake was indeed made that had far reaching damaging consequences. A CSR needed changing from WARL to something else (the privspec 1.11 notes precisely such a change. I have not analysed it).


> What is wrong with adding instructions with new opcodes using the normal extension mechanism?

This was again discussed 18 months ago, it was actually the reason for the discussion in the first place.

Count the number of spare major 32 bit opcodes available for official extension usage. (answer: none. There are only brownfield spaces left. 2 major opcodes are reserved for RV128. 2 for custom space).

After the brownfield is officially used up, we are forced to go into 48 bit.

Or... there is always the possibility of using isamux.

isamux could be used for example to declare that the entirety of the RVV opcode space is OFFICIALLY available either for custom extensions or for other official RISCV extensions.

All that would be needed: set isamux "bit 2" equal to zero, and the changes are MUXed out: the RVV opcode space HAS to be treated as "raises an unimplemented trap" (thus allowing software emulation of RVV), and if the bit is 1 then it HAS to be implemented as whatever-official-MUXed-in-opcodes-sit-in-that-space.

What is fascinating to me is that BOTH options may result in unimplemented traps, allowing vendors the option to software emulate BOTH isamuxed instruction encodings!


> Are there any examples of these emergency changes for prior ISAs?

Prior ISAs? I don't know my ISA history that well.

The examples that I know were quoted from the discussion 18 months back were

Intel, as THE absolute rock solid canonical benchmark / example of how to stick to your guns on the ISA. They have taken backwards compatibility to the absolute inviolate limit, even when 8086, 186, 286, 386 and 486 compatibility makes an absolute pig's ear of the layout.

You should see the ASIC photos for the bit that covers legacy instructions, compared to the rest of the design, it's hilarious.

The other example I already mentioned: it's the Altivec / SPE conflict in PowerPC. There were people with experience of PPC who confirmed what should be blindingly obvious but clearly wasn't to the muppets that decided to reuse the same opcodes to create utterly incompatible binaries.

MIPS bless 'em probably have something similar, although because of the more proprietary and embedded nature of MIPS it is far less impact because, well, embarrassingly, there isn't a public ecosystem to speak of.

ARM have not made the mistake surprisingly. They're big enough and ugly enough. The switch to hardfloat 10 years ago was painful for distros but was executed cleanly precisely because there were no incompatibilities, only emulation needed.

Others with more historical knowledge will know some examples, I'd love to hear as I do find them both useful and also funny.

L.

lkcl

Jun 5, 2019, 12:17:53 AM6/5/19
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com
I promised a separate post about RV32I / RV64I. I will be brief.

Several times there have been complaints that RV32 integer is binary incompatible with RV64.

This is a costly mistake given that it has been demonstrated that RV32 binaries are smaller than their directly-compiled RV64 counterparts.

"solutions" to this have been offered by way of switching the hardware to 32 bit mode... except of course *no actual hardware exists* that supports that as an option because, surpriiise, it's optional.

I know *why* the meaning of the ADD opcode changes in the 64 and 32 bit execution environment, it's because otherwise you need Pascal Triangle opcode proliferation to interface between 32 and 64 bit, and 32 and 64 and 128 bit, and 64 and 128 bit, just as exists in the FP opcode space.

With so many more options in the INT space for such proliferation (ANDing, ORing, xBitManip in future), it made some sense to save opcodes by allowing the meaning of the opcode to change.

At the cost of binary incompatibility, with an associated increase in executable size.

If RVV had been put into the 48 bit space (where it really belongs: given that predication is limited, and so is the total number of vector registers, a future version is going to *have* to exist in 48b anyway), and if this had been properly openly discussed, this would potentially have come up, the space reserved for RVV freed, and the INT ops allowed to mirror the strategy used in FP, giving us binary RV32/64 compatibility in the process.

There does however exist a way in which, HYPOTHETICALLY, we as a community may actually be empowered to realistically discuss such an optional and drastic yet COEXISTING beneficial change to RISCV ISA at such a fundamental level, yet still in a nondisruptive fashion:

isamux.

With for example "bit 4" of isamux ratified officially for such an endeavour, exploration of enhancements to RISCV that would allow RV32 binaries to execute on an RV64 system could actually be seriously considered and evaluated.

However, Jonathan, this is not an "emergency" spec change; that was a conflation that can be cleared up by rereading my post more carefully.

It is however an example of a beneficial change that may prove to be sufficiently compelling as to be worthwhile pursuing, where as of right now, without isamux, it is absolutely off the table and not even worth pursuing except as an academic exercise.

Other examples include RV16 (bit 5 of isamux), adding FP clip (missing from base but present in RVV - bit 6 of isamux), SIMD INT clipping and rounding suitable for audio usage as outlined by AndesStar (bit 7 of isamux), auto switching of opcodes to SIMD, in both the FP and INT spaces (bits 8 to 9 of isamux, to cover 8/16/32/None SIMD splits of standard RV register files)

The possibilities are endless and *do not happen* without isamux, not in the 16 bit or 32 bit space, because there is no more room.

No, not getting Ratification because of using standard RISCV encodings for custom usage, and more to the point having to maintain a hard fork of the entire software ecosystem, is NOT a viable or realistic option.

Yes it was already mentioned that Redhat would take a dim view of anyone forking the Fedora distro for the purposes of supporting their own custom (incompatible) ISA extensions. This for the damage and harm it would do to the Fedora community, and as Redhat is a trademark which would be harmed by such attempts, they would be justified in flexing some legal muscle.

isamux is a way out of a huge number of corners that the whole RISCV community has accidentally been backed into.

L.

Allen Baum

Jun 5, 2019, 1:12:55 PM6/5/19
to lkcl, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Rogier Brussee
You said:
 for "isamux it is ABSOLUTELY CRITICAL that it be properly implemented as both readable and writable.
ie if the bit is written to with a 0, the instruction encoding MUST switch to LE, and if 1, encoding MUST switch to BE, for example."

This sounds completely wrong. If I have an implementation that only implements BE or LE, then it can't switch, and the bit should be RO. You might trap if you try to write it to the value that isn't implemented, of course...
Alternatively, it is writable, and if written, it disables all Ld/St/Atomic ops and traps if you attempt to execute them.

What are you thinking here? It can't be that you require all implementations always implement both LE or BE or any other ops that is covered by isamux.


Allen Baum

Jun 5, 2019, 1:28:50 PM6/5/19
to lkcl, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Rogier Brussee
Oh, an Intel example:
Because of the variable length nature of the encoding, ugly as it is, they are able to fit new instructions in without stepping on old ones - so they don't have that issue.
They also have an architected method of discovering which architectural features exist or don't. RISC-V has that to some extent, though (besides MISA), it's very ad-hoc (e.g. try it, and if you trap - it isn't implemented. Write a CSR, read it back, and if it's different, some bits aren't implemented or that combination isn't legal). I'm not real happy with that...
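
The "write a CSR, read it back" probing described above can be sketched in Python as follows. The helper `probe_writable_bits` and its `read_csr` / `write_csr` callbacks are hypothetical stand-ins for real CSR access, and the sketch only detects independently writable bits; it does not cover the case where a particular combination of bits is illegal.

```python
from typing import Callable


def probe_writable_bits(
    read_csr: Callable[[], int],
    write_csr: Callable[[int], None],
    width: int = 32,
) -> int:
    """Return a mask of bits that read back as written for both 0 and 1."""
    original = read_csr()
    all_ones = (1 << width) - 1

    write_csr(all_ones)
    ones = read_csr()    # writable bits now read 1

    write_csr(0)
    zeros = read_csr()   # writable bits now read 0

    write_csr(original)  # restore the value we found
    return ones & ~zeros & all_ones
```

A hard-wired bit reads the same after both writes, so it drops out of `ones & ~zeros`; only bits that actually followed the written value remain in the mask.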

But, Intel has actually turned off functionality that has existed for 40 years. It's a slow, multi-year sequence to do that (this isn't the sequence, but it will give you a taste):
 - announce you are deprecating some instruction/feature
 - add some bit in a control register or fuse that turns it off
 - ship the next generation with the feature turned off by default (but a uCode patch can re-enable it)
 - ship further generations that don't implement the feature at all
I'm probably missing some steps there.

Jonathan Behrens

Jun 5, 2019, 1:53:13 PM6/5/19
to lkcl, RISC-V ISA Dev, lk...@lkcl.net, rogier....@gmail.com
> What is fascinating to me is that BOTH options may result in unimplemented traps, allowing vendors the option to software emulate BOTH isamuxed instruction encodings!

This would be a very cool capability but it wouldn't be all that usable in practice. If the unimplemented traps occur frequently (like they would from doing trap-and-emulate on every compressed instruction) then you are going to get something like a 100x slowdown. You'd be way better off using binary translation to "JIT compile" your program to the ISA your processor actually supports. QEMU can do this with a very moderate performance slowdown. And if the traps are going to occur rarely, then the extension isn't that impactful to code size and probably shouldn't be entitled to prime opcode real-estate in the first place...
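
A back-of-the-envelope Python sketch of where a figure like 100x can come from. The model is deliberately crude: native instructions cost 1 unit, a trapped-and-emulated instruction costs `trap_cost` units, and the 200-unit trap cost used below is an assumed number for illustration, not a measurement.

```python
def slowdown(trap_fraction: float, trap_cost: float) -> float:
    """Average cost per instruction relative to native execution (= 1.0)."""
    return (1.0 - trap_fraction) * 1.0 + trap_fraction * trap_cost


if __name__ == "__main__":
    # Half of all instructions trap (e.g. heavy use of compressed ops).
    print(slowdown(0.5, 200.0))
    # Only one instruction in a thousand traps.
    print(slowdown(0.001, 200.0))
```

Under these assumptions the slowdown is dominated by the product of the two inputs, which is why frequent traps are ruinous while rare ones are barely visible.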

> 1 isamux is designed to allow EXISTING instructions, even ones in RATIFIED extensions, to have ONE (or more) opcodes change meaning.
> In the case of LE/BE the dynamic change of the CSR bit associated would actually affect *multiple* instructions on *multiple* extensions, past present *and future* (RV128, QUAD FP LD/ST).
> MISA was most definitely and clearly never intended for that purpose.

I agree that a BE extension would have a larger impact than the existing ones and may not have been considered when designing misa, but I don't see why that would actually be a problem. Let's say we decided to add some new extension called "H" with the following semantics:

* all current processors hard-wire misa bit 7 to zero and thus are always in little endian mode
* processors which support our extension must support writing both 0 and 1 to misa bit 7, with the bit starting out as 0 on boot. We further assert that if they do anything else they are in violation of the spec and non-conformant
* whenever bit 7 is set the processor executes all loads/stores in big endian mode

What actually goes wrong with this proposal? All legacy LE code will continue to work, and new code can choose either to continue using LE code or to use BE code when running on newer processors (and either refuse to support old processors or have a separate fallback path to run in LE). There is also no state that would be lost by switching modes with misa.
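
The third bullet above (loads and stores following a mode bit) can be illustrated with a small Python sketch. The function `load` and its `big_endian` parameter are hypothetical; the parameter stands in for whichever control bit ends up selecting the mode.

```python
def load(memory: bytes, addr: int, size: int, big_endian: bool) -> int:
    """Load `size` bytes from `memory` at `addr`, interpreting them
    according to the current endianness mode bit."""
    raw = memory[addr:addr + size]
    if len(raw) != size:
        raise IndexError("load out of bounds")
    return int.from_bytes(raw, "big" if big_endian else "little")


if __name__ == "__main__":
    mem = bytes([0x11, 0x22, 0x33, 0x44])
    print(hex(load(mem, 0, 4, big_endian=False)))
    print(hex(load(mem, 0, 4, big_endian=True)))
```

The same bytes in memory and the same load instruction yield different register values, which is the sense in which the mode bit changes the meaning of existing opcodes.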


Jacob Lifshay

Jun 5, 2019, 4:12:38 PM6/5/19
to Jonathan Behrens, Luke Kenneth Casson Leighton, RISC-V ISA Dev, lk...@lkcl.net, rogier....@gmail.com


On Wed, Jun 5, 2019, 10:53 Jonathan Behrens <fint...@gmail.com> wrote:
> I agree that a BE extension would have a larger impact than the existing ones and may not have been considered when designing misa, but I don't see why that would actually be a problem. Let's say we decided to add some new extension called "H" with the following semantics:
>
> * all current processors hard-wire misa bit 7 to zero and thus are always in little endian mode
> * processors which support our extension must support writing both 0 and 1 to misa bit 7, with the bit starting out as 0 on boot. We further assert that if they do anything else they are in violation of the spec and non-conformant
> * whenever bit 7 is set the processor executes all loads/stores in big endian mode
>
> What actually goes wrong with this proposal? All legacy LE code will continue to work, and new code can choose either to continue using LE code or to use BE code when running on newer processors (and either refuse to support old processors or have a separate fallback path to run in LE). There is also no state that would be lost by switching modes with misa.

One problem with that is you would need to pick a different misa bit since H/bit-7 is used for the upcoming revamped hypervisor proposal (It may be complete by now though, I didn't check).

Jacob

lk...@lkcl.net

Jun 5, 2019, 4:22:39 PM6/5/19
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com


On Wednesday, June 5, 2019 at 6:12:55 PM UTC+1, Allen Baum wrote:
> You said:
>  for "isamux it is ABSOLUTELY CRITICAL that it be properly implemented as both readable and writable.
> ie if the bit is written to with a 0, the instruction encoding MUST switch to LE, and if 1, encoding MUST switch to BE, for example."
>
> This sounds completely wrong.

i know!  as i was writing it, i was thinking, "hang on, this can't be true", however it's... well, you go through the options below
 
> If I have an implementation that only implements BE or LE, then it can't switch, and the bit should be RO.

 if it only implements BE or LE but not both, such that the bit is RO, then isamux is not the correct thing to use.

 isamux is a "hidden opcode bit" (bit 33, bit 34), it's *not* an "Extension Enablement" or an "Extension Capability Declaration".

 so to have the bit be read-only makes no sense.  it's as if say... an implementation tried to declare that bit 15 of the 32-bit RV opcode was read-only!

 
> You might trap if you try to write it to the value that isn't implemented, of course...

ahh now we're getting somewhere.  this makes more sense, because if it generates a trap, it allows the software to go into "emulation" mode.  however, what would need to be done would be, because the rules are that the new (hidden) bit is added to ALL instructions (in that privilege mode), ALL instructions from that point onwards (in that privilege mode) would need to be trapped-and-emulated, until such time as the emulator saw an attempt to set the bit back to the "supported" value.

obviously this would be bad :)
 
> Alternatively, it is writable, and if written, it disables all Ld/St/Atomic ops and traps if you attempt to execute them.

*now* we are getting somewhere.  *this* is a viable option.  basically, it's like this:

(isamux=0) LD xxx    # actually a 33-bit operation, mapping to standard RVC LD
(isamux=0) CSR ISAMUX=1 # from this point onwards, the 33rd bit is now 1.
(isamux=1) LD xxx    # this is an unknown unsupported instruction, detected at decode! trap!
(isamux=1) ...
(isamux=1) CSR ISAMUX=0 # from this point onwards, the 33rd bit is now 0.
(isamux=0) ....
 
so this illustrates that it genuinely is and does have to be considered part of the *opcode*, going *directly* into the instruction decoder.
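
The listing above can be replayed with a toy Python model. Everything here is an assumption made for illustration: the `IMPLEMENTED` table, the `run` function, and the choice that only the isamux=0 form of LD exists in "hardware".

```python
from typing import List, Tuple

# Toy decode table keyed by (isamux bit, mnemonic). In this sketch only
# the isamux=0 form of LD is implemented in "hardware".
IMPLEMENTED = {(0, "LD")}


def run(trace: List[Tuple[str, int]]) -> List[str]:
    """Execute a toy trace of ("LD", 0) and ("CSR", new_value) entries and
    report, per instruction, whether it executed or trapped."""
    isamux = 0
    log: List[str] = []
    for op, arg in trace:
        if op == "CSR":
            isamux = arg  # takes effect for the following instructions
            log.append(f"isamux<-{arg}")
        elif (isamux, op) in IMPLEMENTED:
            log.append(f"{op} ok")
        else:
            log.append(f"{op} trap")
    return log


if __name__ == "__main__":
    print(run([("LD", 0), ("CSR", 1), ("LD", 0), ("CSR", 0), ("LD", 0)]))
```

The same LD mnemonic executes or traps purely as a function of the CSR value in force when it is decoded, which is the point of treating that value as part of the opcode.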

it is perfectly ok even to throw traps on *both* isamux LE/BE bit = 1/0!  of course, that would mean emulating all LD/ST operations (both BE or LE), but that would be madness.  if it were even possible.

however on another example, say bit 2, which would, say, represent RVCv1 and RVCv2, *now* it makes more sense that CSRRS ISAMUX |=0b10 or CSRRC ISAMUX &=~0b10 would be trapped-and-emulatable by a system that didn't support *either* RVCv1 *or* RVCv2.
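
For reference, CSRRS and CSRRC are the standard read-and-set-bits and read-and-clear-bits CSR instructions: each returns the old CSR value and then ORs in, or masks out, the given bits. A small Python sketch of just that arithmetic (the function names mirror the mnemonics, and returning the pair `(old, new)` is a convenience of this example rather than anything architectural):

```python
from typing import Tuple


def csrrs(csr: int, mask: int) -> Tuple[int, int]:
    """Read-and-set: returns (old value, new value with mask bits set)."""
    return csr, csr | mask


def csrrc(csr: int, mask: int) -> Tuple[int, int]:
    """Read-and-clear: returns (old value, new value with mask bits cleared)."""
    return csr, csr & ~mask


if __name__ == "__main__":
    old, new = csrrs(0b01, 0b10)   # switch the hypothetical RVC bit on
    print(bin(old), bin(new))
    old, new = csrrc(new, 0b10)    # and off again
    print(bin(old), bin(new))
```

Because only the masked bit changes, other isamux bits (such as an LE/BE bit) are left untouched by either operation.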


> What are you thinking here? It can't be that you require all implementations always implement both LE or BE or any other ops that is covered by isamux.

i hope the above illustrates that yes, because it is quite literally a (hidden) extension of the opcode (33rd bit, 34th bit), it is a mandatory requirement that the instruction *decode* phase accept (and respond to) the additional isamux bit.

however what you've very usefully helped clarify is that trap-and-emulate is permitted on both *or* either *or* none of the options... but they *do* have to be supported, because the isamux bits go *directly* into the instruction decode logic, no question about it.

l.

lk...@lkcl.net

Jun 5, 2019, 4:31:15 PM6/5/19
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com


On Wednesday, June 5, 2019 at 6:28:50 PM UTC+1, Allen Baum wrote:
> Oh, an Intel example:
> Because of the variable length nature of the encoding, ugly as it is, they are able to fit new instructions in without stepping on old ones - so they don't have that issue.

 yehyeh.  it gets more and more unwieldy as time goes by, but it works.
 
> They also have an architected method of discovering which architectural features exist or don't. RISC-V has that to some extent, though (besides MISA), it's very ad-hoc (e.g. try it, and if you trap - it isn't implemented. Write a CSR, read it back, and if it's different, some bits aren't implemented or that combination isn't legal). I'm not real happy with that...

i can see why :)
 

> But, Intel has actually turned off functionality that has existed for 40 years. It's a slow, multi-year sequence to do that (this isn't the sequence, but it will give you a taste):
>  - announce you are deprecating some instruction/feature
>  - add some bit in a control register or fuse that turns it off
>  - ship the next generation with the feature turned off by default (but a uCode patch can re-enable it)
>  - ship further generations that don't implement the feature at all
> I'm probably missing some steps there.

it sounds eminently sensible to me, and exactly the kind of process that a responsible Standards Implementor would follow.  i'm guessing that this process takes literally decades, and that they respond to customer feedback.  if enough customers contact them and complain that they're still using that feature, they might actually delay the deprecation by a significant number of years.

RISC-V - as a Foundation that just does not have that level of control over implementations - doesn't really have this process as an option to follow.  the coordination required to get the feedback from hundreds of implementors' customers... blech :)

l.

Allen Baum

Jun 5, 2019, 4:45:28 PM6/5/19
to lk...@lkcl.net, RISC-V ISA Dev, luke.l...@gmail.com, rogier....@gmail.com
I'm reminded of an extra step that helps this along. Before turning off the feature, they make it run painfully slow...

-Allen
lk...@lkcl.net

Jun 5, 2019, 4:46:52 PM6/5/19
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com


On Wednesday, June 5, 2019 at 6:53:13 PM UTC+1, Jonathan wrote:
> > What is fascinating to me is that BOTH options may result in unimplemented traps, allowing vendors the option to software emulate BOTH isamuxed instruction encodings!
>
> This would be a very cool capability but it wouldn't be all that usable in practice. If the unimplemented traps occur frequently (like they would from doing trap-and-emulate on every compressed instruction) then you are going to get something like a 100x slowdown. You'd be way better off using binary translation to "JIT compile" your program to the ISA your processor actually supports.

perfect.  yes, good call.  you get the idea, perfectly.  simplest option trap-and-emulate, alternative option, JIT-recompile.  either way works.

it's basically no different from not supporting any other extension... except now rather than an implementation not supporting *one* option (in hardware) and having to trap-and-do-something, now it has *two* options to trap-and-emulate.

basically, in the trap, the isamux CSR *must* be examined in the trap and taken into consideration *literally* as if it was part of the instruction to decode.  this being mandatory in the hardware, it is also mandatory in the trap.  whatever that trap happens to do.
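
A minimal Python sketch of a trap handler that treats the isamux value as part of the instruction, as described above. The function `make_trap_handler`, the dispatch-table shape, and the example handlers are hypothetical; the only RISC-V fact relied on is that the major opcode of a 32-bit instruction sits in its low 7 bits.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[int], str]


def make_trap_handler(
    table: Dict[Tuple[int, int], Handler],
) -> Callable[[int, int], str]:
    """Build a handler that dispatches on (isamux, major opcode)."""

    def handle(isamux: int, instr: int) -> str:
        opcode = instr & 0x7F  # low 7 bits: major opcode field
        emulate = table.get((isamux, opcode))
        if emulate is None:
            return "illegal instruction"
        return emulate(instr)

    return handle


if __name__ == "__main__":
    LOAD = 0x03  # major opcode of the LOAD group
    handler = make_trap_handler({
        (0, LOAD): lambda instr: "emulate LE load",
        (1, LOAD): lambda instr: "emulate BE load",
    })
    print(handler(0, 0x00003003))
    print(handler(1, 0x00003003))
```

The identical instruction word reaches a different emulation routine depending on the CSR value, and anything absent from the table for that value falls through to an illegal-instruction result.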

 
> I agree that a BE extension would have a larger impact than the existing ones and may not have been considered when designing misa, but I don't see why that would actually be a problem. Let's say we decided to add some new extension called "H" with the following semantics:
>
> * all current processors hard-wire misa bit 7 to zero and thus are always in little endian mode
> * processors which support our extension must support writing both 0 and 1 to misa bit 7, with the bit starting out as 0 on boot. We further assert that if they do anything else they are in violation of the spec and non-conformant
> * whenever bit 7 is set the processor executes all loads/stores in big endian mode
>
> What actually goes wrong with this proposal?
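
For readers following along, the hypothetical above can be modelled in a few lines. This is only a sketch under stated assumptions: the `Hart` class, its 16-byte memory and the `ENDIAN_BIT` name are invented for illustration, and only misa bit 7 is modelled.

```python
ENDIAN_BIT = 1 << 7  # the hypothetical misa bit 7 from the proposal above


class Hart:
    """Toy hart: 16 bytes of memory and a misa register modelling only bit 7."""

    def __init__(self, supports_extension: bool):
        self.supports_extension = supports_extension
        self.misa = 0  # bit 7 starts out as 0 on boot
        self.memory = bytearray(16)

    def write_misa(self, value: int) -> None:
        # Harts without the extension hard-wire bit 7 to zero and ignore the write.
        if self.supports_extension:
            self.misa = value & ENDIAN_BIT

    def _byte_order(self) -> str:
        return "big" if self.misa & ENDIAN_BIT else "little"

    def store_word(self, addr: int, value: int) -> None:
        self.memory[addr:addr + 4] = value.to_bytes(4, self._byte_order())

    def load_word(self, addr: int) -> int:
        return int.from_bytes(self.memory[addr:addr + 4], self._byte_order())
```

A word stored while bit 7 is clear and loaded after bit 7 is set comes back byte-swapped, which is the whole observable effect of the proposed bit in this model.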

apart from what jacob says in a follow-up proposal, you missed that MISA quite literally destroys the state information.  MISA is a *full-on* "off" switch.  it destroys state that is reinitialised to known-good on re-enablement.

for complex extensions, that could have huge ramifications, cause significant latency as the re-initialisation could take several cycles to complete.

by contrast ISAMUX is *literally* an extra 33rd (or 34th, or 35th) instruction bit [or the 49th or 50th bit in the case of 48-bit instructions, or the 65th or 66th bit in 64-bit isa space, you get the idea].

MISA most definitely and categorically is *NOT* an extension to the opcode space.  it is most definitely *NOT* a 33rd bit that tacks - quite literally - onto the end of the instruction, to be inserted into the decoder (at the hardware level or in the trap).

it is perfectly legitimate (and may be useful) for example to set ISAMUX=1 (the 33rd opcode bit) and then set the corresponding MISA bit to DISABLE THE EXTENSION that was accessible only by setting ISAMUX=1 (the 33rd instruction bit) in the first place.

what would happen here?  oh look!  the instruction decode phase would find that MISA had disabled the Extension - just like any other extension - and that means it has to be treated, by the decode phase, as an illegal instruction.  what do we do when an illegal instruction encoding is found?  we throw a trap!

so in the case where CSRRS ISAMUX |=1 is called, and then MISA disables the Extension, you are *still required to trap* [and can choose to emulate]
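
The interaction described above (isamux selects the encoding, MISA can still disable the extension, and the result is a trap) can be sketched as follows. Everything here is an assumption made for illustration: the decode table, the opcode value 0x0B, the use of the "X" letter and the mnemonics are invented, and the exception stands in for an illegal-instruction trap.

```python
class IllegalInstruction(Exception):
    """Raised where real hardware would take an illegal-instruction trap."""


def misa_bit(letter: str) -> int:
    # misa extension bits: bit 0 is "A", bit 25 is "Z".
    return 1 << (ord(letter) - ord("A"))


# Decode table keyed on (isamux, opcode). The opcode value, the extension
# letter and the mnemonics are invented for illustration.
DECODE_TABLE = {
    (0, 0x0B): ("X", "legacy.op"),
    (1, 0x0B): ("X", "revised.op"),
}


def decode(isamux: int, opcode: int, misa: int) -> str:
    entry = DECODE_TABLE.get((isamux, opcode))
    if entry is None:
        raise IllegalInstruction(f"no encoding for isamux={isamux}, opcode={opcode:#x}")
    extension, mnemonic = entry
    if not misa & misa_bit(extension):
        # isamux selected the encoding, but MISA has switched the extension off.
        raise IllegalInstruction(f"extension {extension} disabled in misa")
    return mnemonic
```

In this model isamux only ever contributes to the lookup key, while the misa check is a separate gate applied afterwards, which is the distinction the message is drawing.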

bear in mind also that isamux covers the option to affect (change the meaning of) MULTIPLE instructions across MULTIPLE Extensions (past, present and future).  it is NOT PERMITTED to affect the state OF the extension (other than that as required by actioning the 33/34/35-bit-long instruction itself of course).

MISA disables (or enables) only *one* Extension (per bit).... and destroys state information in the process.

ISAMUX *does not permit that* because it is *not ISAMUX's job*.  it is *literally* - plain and simple - hooked directly, permanently and irrevocably, into the instruction decoder, at the hardware level.

l.




Jonathan Behrens

Jun 5, 2019, 6:05:01 PM
to lk...@lkcl.net, RISC-V ISA Dev, lkcl, rogier....@gmail.com
> apart from what jacob says in a follow-up proposal, you missed that MISA quite literally destroys the state information.  MISA is a *full-on* "off" switch.  it destroys state that is reinitialised to known-good on re-enablement.
> for complex extensions, that could have huge ramifications, cause significant latency as the re-initialisation could take several cycles to complete.

Jacob correctly points out that we'd need a different letter than "H" but this is a trivial change. However, your concern about destroying state doesn't apply in this case because my proposed extension has no state.

> by contrast ISAMUX is *literally* an extra 33rd (or 34th, or 35th) instruction bit [or the 49th or 50th bit in the case of 48-bit instructions, or the 65th or 66th bit in 64-bit isa space, you get the idea].
> MISA most definitely and categorically is *NOT* an extension to the opcode space.  it is most definitely *NOT* a 33rd bit that tacks - quite literally - onto the end of the instruction, to be inserted into the decoder (at the hardware level or in the trap).
> it is perfectly legitimate (and may be useful) for example to set ISAMUX=1 (the 33rd opcode bit) and then set the corresponding MISA bit to DISABLE THE EXTENSION that was accessible only by setting ISAMUX=1 (the 33rd instruction bit) in the first place.
> what would happen here?  oh look!  the instruction decode phase would find that MISA had disabled the Extension - just like any other extension - and that means it has to be treated, by the decode phase, as an illegal instruction.  what do we do when an illegal instruction encoding is found?  we throw a trap!
> so in the case where CSRRS ISAMUX |=1 is called, and then MISA disables the Extension, you are *still required to trap* [and can choose to emulate]
> bear in mind also that isamux covers the option to affect (change the meaning of) MULTIPLE instructions across MULTIPLE Extensions (past, present and future).  it is NOT PERMITTED to affect the state OF the extension (other than that as required by actioning the 33/34/35-bit-long instruction itself of course).
> MISA disables (or enables) only *one* Extension (per bit).... and destroys state information in the process.
> ISAMUX *does not permit that* because it is *not ISAMUX's job*.  it is *literally* - plain and simple - hooked directly, permanently and irrevocably, into the instruction decoder, at the hardware level.

This seems like just a really convoluted way of supporting longer instructions. Instead of adding a 32-bit instruction which requires isamux=0x1, just add a brand new 48-bit instruction. Your design also sounds like it would wreak havoc on disassemblers and static analysis tools because they wouldn't be able to know what isamux would be set to when the relevant code was run.


Guy Lemieux

Jun 5, 2019, 6:32:05 PM
to Jonathan Behrens, lk...@lkcl.net, RISC-V ISA Dev, lkcl, Rogier Brussee
Long ago, I started the discussion with the suggestion that we add
something like ISAMUX to switch between custom instruction set
extensions.

The reason? Because of software portability.

Consider custom extension from vendor A, called Ax, and another
extension from vendor B, called Bx. Both vendors chose to use the
exact same opcode space to implement their extensions.

This works great for a while, as software that uses Ax is developed
and proliferates by some subset of users of vendor A, and other
software that uses Bx is developed and proliferates by a completely
different subset of users of vendor B.

Eventually, vendors A and B merge, and they want to develop a single,
new RISC-V processor that supports both Ax and Bx. Also, they want to
encourage software to be written by the open source community that
simultaneously uses Ax and Bx. However, because of the extensive
proliferation of Ax and Bx in separate software stacks, they cannot
alienate those users by dropping those instruction encodings. Instead,
they must somehow allow both Ax and Bx to simultaneously exist on a
single RISC-V processor core, but software needs to switch between Ax
decode mode and Bx decode mode.

Using MISA is inappropriate here. ISAMUX is the answer.
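
A minimal sketch of the Ax/Bx scenario, assuming a toy instruction stream in which a "setmode" step stands in for whatever write selects the decode mode. The opcode value and the `ax.op` / `bx.op` mnemonics are invented for illustration.

```python
AX_DECODE = {0x0B: "ax.op"}  # vendor A's meaning for the shared opcode
BX_DECODE = {0x0B: "bx.op"}  # vendor B's meaning for the same opcode
DECODE_MODES = {0: AX_DECODE, 1: BX_DECODE}


def run(program):
    """Decode a list of ("setmode", n) and ("insn", opcode) steps in order."""
    mode = 0
    trace = []
    for kind, value in program:
        if kind == "setmode":
            mode = value
        elif kind == "insn":
            trace.append(DECODE_MODES[mode].get(value, "illegal"))
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return trace
```

The same opcode decodes as Ax or Bx depending only on the mode in force when it is reached, so both legacy software stacks keep their existing encodings.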

There was a lot of discussion months back on how useful this would be
etc etc. I suggested the solution is simple, but didn't actually give
one because I first wanted others in the community to understand the
need for this feature.

It seems people still don't understand the need yet. That's ok, I'm
patient. It will happen eventually...

Ciao,
Guy

lk...@lkcl.net

Jun 5, 2019, 9:27:28 PM
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
On Wednesday, June 5, 2019 at 11:05:01 PM UTC+1, Jonathan wrote:
 
> Jacob correctly points out that we'd need a different letter than "H" but this is a trivial change. However, your concern about destroying state doesn't apply in this case because my proposed extension has no state.


the hypothetical case that you mention does not, however another hypothetical case might.  isamux has to cover both.

>> ISAMUX *does not permit that* because it is *not ISAMUX's job*.  it is *literally* - plain and simple - hooked directly, permanently and irrevocably, into the instruction decoder, at the hardware level.

> This seems like just a really convoluted way of supporting longer instructions. Instead of adding a 32-bit instruction which requires isamux=0x1, just add a brand new 48-bit instruction.

that would work fine if it were likely to be palatable and acceptable to do so, and if isamux were just a way to give some extra instruction bits for custom-extension use.

however, to repeat: it's designed not just for custom extension use, it's designed for emergency situations and for *official* use to get RISC-V out of several corners that it has been backed into.
 
> Your design also sounds like it would wreak havoc on disassemblers and static analysis tools because they wouldn't be able to know what isamux would be set to when the relevant code was run.

true... if certain conventions are not properly followed.

this was also raised for SV, which is also an escape-sequence-based system.

it's also true for RVV, which also has the "state" of VL covering a range of instructions.

it's also true for other CSRs that have far-reaching implications that change the state and behaviour of instructions.

it would also be true if N.E.Other mechanism was chosen that used a CSR to change BE/LE.

if, unlike the usual "global setting" of BE/LE CSRs, the isamux settings are carried out in a limited local scope (and restored to a known-good default setting at the end of that scope), the disassemblers and static analysis tools have everything that they need.

following this convention would be the responsibility of compiler writers.

given that the team that manages binutils implements the assembler *and* the disassembler, they will have the skills and expertise to advise accordingly and create and get those conventions right.
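
The "limited local scope" convention can be sketched with a scope guard. This is an illustration only: the `Hart` class and `isamux_scope` helper are hypothetical names, and on real hardware the set and restore would be a pair of CSR writes emitted by the compiler rather than a Python context manager.

```python
from contextlib import contextmanager


class Hart:
    """Toy hart holding only the isamux value."""

    def __init__(self):
        self.isamux = 0  # assumed known-good default


@contextmanager
def isamux_scope(hart: Hart, value: int):
    saved = hart.isamux
    hart.isamux = value
    try:
        yield hart
    finally:
        # Always put the former value back, even if the body fails.
        hart.isamux = saved
```

Because every change is paired with a restore at the end of the same scope, a tool reading the code linearly can tell which isamux value applies to each instruction inside the scope.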

l.

lk...@lkcl.net

Jun 5, 2019, 9:40:28 PM
to RISC-V ISA Dev, fint...@gmail.com, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
On Wednesday, June 5, 2019 at 11:32:05 PM UTC+1, glemieux wrote:
> Long ago, I started the discussion with the suggestion that we add
> something like ISAMUX to switch between custom instruction set
> extensions.

 i remember.  welcome back, guy :)
 
> The reason? Because of software portability.
>
> Consider custom extension from vendor A, called Ax, and another
> extension from vendor B, called Bx. Both vendors chose to use the
> exact same opcode space to implement their extensions.
>
> This works great for a while, as software that uses Ax is developed
> and proliferates by some subset of users of vendor A, and other
> software that uses Bx is developed and proliferates by a completely
> different subset of users of vendor B.
>
> Eventually, vendors A and B merge, and they want to develop a single,
> new RISC-V processor that supports both Ax and Bx. Also, they want to
> encourage software to be written by the open source community that
> simultaneously uses Ax and Bx. However, because of the extensive
> proliferation of Ax and Bx in separate software stacks, they cannot
> alienate those users by dropping those instruction encodings.

in effect, without isamux, the first vendor that releases a publicly and widely-adopted custom extension becomes, de-facto, the dominator of that custom opcode.  by "force majeure" it becomes the de-facto standard "owner" of that custom opcode.

there are only 2 major 32-bit custom opcodes available.

further discussions pointed out that it would be unacceptable and unrealistic to expect adopters to move to the 48-bit space *especially when a custom instruction had already been dominated de-facto in the 32-bit space*, and especially given that they would need a major overhaul of their hardware (putting in an instruction queue that could *cope* with 32-bit and 48-bit), which would be quite unreasonable to expect, for example, a simple embedded RV32 system to have to do.

not to mention: if they're a softcore running from an embedded FPGA, and this was an in-the-field upgrade, the cost of the additional LUTs to add a mixed 32/48 decoder might be so high that it actually could not be deployed.

further discussions also involved what happens when one vendor adopts extension A and maps it to custom32 opcode 1, another vendor adopts extension A and maps it to custom32 opcode 2, another does something different... *and those all become public and commonly-adopted*...

that creates absolute hell and merry mayhem for gcc and binutils because now opcode1 has *two* publicly and widely-adopted meanings?? wtf??

> Instead,
> they must somehow allow both Ax and Bx to simultaneously exist on a
> single RISC-V processor core, but software needs to switch between Ax
> decode mode and Bx decode mode.

and many many other scenarios.  it was a looong discussion.
 

> Using MISA is inappropriate here.

because, most of all, it switches off the extension entirely (destroying state information in the process).
 
> ISAMUX is the answer.
>
> There was a lot of discussion months back on how useful this would be
> etc etc. I suggested the solution is simple, but didn't actually give
> one because I first wanted others in the community to understand the
> need for this feature.
>
> It seems people still don't understand the need yet.

a lot of time has passed since.  there's new people.
 
> That's ok, I'm
> patient. It will happen eventually...

:)


Rogier Brussee

Jun 6, 2019, 9:19:00 AM
to RISC-V ISA Dev, rogier....@gmail.com


On Tuesday, June 4, 2019 at 04:24:47 UTC+2, lkcl wrote:
On Tuesday, May 21, 2019 at 12:01:20 AM UTC+8, Rogier Brussee wrote:

[]

> makes sense. You divide up your registers in blocks of 1 or 4. 


Yes. Now why did I not have such simple words??

 
It seems you think in terms of how things work rather than what things do (which is quite understandable if you are hard at work trying to make them work)
 

>
>
>
>
> Maybe you can use the last remaining reserved slot in RVC (Inst[0:1] = 0b00, Inst[13:15] = 0b100) as a 16 bit "prefix" for how to interpret the next instruction including 16bit or 48bit or longer ones. Gives 11 bits to play with.


I like it.

> Of course being reserved, you are practically guaranteed that some version of RVC v2  will eventually trample over your prefix and not be compatible.

Or xBitManip.


I don't think xBitManip uses that slot (AFAIK the only bit of C extension left in the xBitManip proposal is snuggling in C.NOT with C.LUI with imm == 0). Personally I think even that would be better left out, and moreover I think it would make more sense for both xBitManip and the ISA overall to use that tiny space for C.LPC rd --> AUIPC rd 0x0 to compress PC-relative loads and stores.
 
Btw just to check, there is no private cartelled discussion of creating RVCv2 without a wider public real time discussion, is there? 


 
> However, assemblers and linkers will have to deal with RVC vs RVC v2 anyway,


[] 
> and all that not supporting RVC v2 means is that some 32 bit instruction would not get compressed, so the pain you get for having more bits might be worth it. And yes it is a hack.

If we did not need the RVC space for potential xBitManip,


See above. 

lkcl

Jun 9, 2019, 9:30:54 AM
to RISC-V ISA Dev, fint...@gmail.com, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
A summary of the isamux discussion of the past couple of days

* Backwards binary compatibility if modifications to an official RISCV Extension prove necessary or desirable is just not possible without a way for newer systems to dynamically support legacy *and* revised official ISA Standards.

* The burden of backwards and forwards binary compatibility has to be on the newer designs: efforts to do software traps (JIT or static translation) were shown to be undesirable or unworkable.

* Whilst Intel does do "retirement" of legacy ISAs on a very long timescale, this is only possible because of the monopoly position Intel holds. RISCV does not have customers, therefore "retirement" as deployed by Intel is just not an option.

* The expectation that everything will be fine in an unregulated use of the precious last 2 remaining 32 bit custom opcodes is completely unrealistic. Three independent high profile custom extensions that require upstream *mainline* gcc and binutils to forcibly accept patches due to sheer overwhelming demand is all it takes to create merry hell and mayhem, as two of them are guaranteed to clash.

* isamux adds 33rd, 34th, 35th and more actual bits to the instruction in a hidden fashion. It is *NOT* the same as MISA, which entirely disables an extension and destroys (resets) internal state.

* isamux is intended for potential use in changing meaning of opcodes in *multiple* Extensions, custom *or official*, right across the board (good example is BigEndian/LittleEndian) where MISA applies to just one Extension and one extension alone.

* Therefore isamux is *not* to be ignored during context switches: it has to be saved and restored.

* isamux is not optional, must be writable, and given that it is the 33rd, 34th etc. bit of the opcode, traps or hardware implementations *must* be implemented, for *all* permutations.

* Trying to treat isamux as read only is the same as trying to create an ISA encoding where bit 30 of the opcode is always set to 1. It makes no sense because isamux is literally a mandatory extension of the length of all opcodes.

* by the same logic, traps *must* respect isamux. It cannot be ignored. Ignoring isamux is directly equivalent to e.g. ignoring bit 11 of any given opcode.

* isamux is not to be implemented lightly or treated as a tool that can be abused: it is intended for emergency or compelling or strictly absolutely necessary circumstances.

* through the FSF, gcc and binutils are the atomic point of registration of mvendorid-marchid-isamux triples, matching actual meanings of opcodes to their actual functionality, NOT the RISCV Foundation. Coordination with LLVM and other compilers will also prove necessary.

* static disassembly issues due to the escape-sequence-like nature of isamux changing opcode meanings will require some strict discipline in conventions to be defined and enforced by binutils, to set isamux in a LOCAL scope and to return it to its former value as quickly as possible. Actual conventions to be defined by gcc and binutils developers.

* adding new bits to isamux can be done piecemeal, incrementally, one bit at a time. That's what it's for.

* having official bits reserved and having some bits available for custom extensions is just common sense.

* the custom bits do not clash because the bits are to be recognised as triples (mvendorid-marchid-customisamux) in the FSF-managed atomic database.

Realistic acceptance of the reality of the need for an isamux solution will come when the first crisis hits RISCV. It would be good, instead, to plan ahead and have isamux already in place.

L.

Jonathan Behrens

Jun 9, 2019, 1:43:28 PM
to lkcl, RISC-V ISA Dev, lk...@lkcl.net, rogier....@gmail.com
>  * The burden of backwards and forwards binary compatibility has to be on the newer designs: efforts to do software traps (JIT or static translation) were shown to be undesirable or unworkable.

When and by whom? QEMU's binary translation of RISC-V to x86 (a radically different ISA than RISC-V) is quite likely the fastest implementation in existence.

> * isamux is intended for potential use in changing meaning of opcodes in *multiple* Extensions, custom *or official*, right across the board (good example is BigEndian/LittleEndian) where MISA applies to just one Extension and one extension alone.

This seems to be purely semantics. There is no limit to how invasive any extension can be, and you can absolutely change the setting of multiple settings simultaneously just by setting multiple bits in MISA.

> * Therefore isamux is *not* to be ignored during context switches: it has to be saved and restored.

Adding cycles to every context switch ever done.

> * through the FSF, gcc and binutils are the atomic point of registration of mvendorid-marchid-isamux triples, matching actual meanings of opcodes to their actual functionality, NOT the RISCV Foundation.  Coordination with LLVM and other compilers will also prove necessary.

Neither mvendorid nor marchid are visible to user mode software. Even if they were, there is no way programs are going to have a table of every processor ever made to try and determine which features they can use. Also the idea that the RISC-V foundation / foundation members are just going to surrender control to the FSF sounds improbable to say the least.

> * having official bits reserved and having some bits available for custom extensions is just common sense.
> * the custom bits do not clash because the bits are to be recognised as triples (mvendorid-marchid-customisamux) in the FSF-managed atomic database.

Custom bits in isamux is crazy. Some bits will have conflicting meanings on different processors and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.

lk...@lkcl.net

Jun 9, 2019, 5:35:44 PM
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com


On Sunday, June 9, 2019 at 6:43:28 PM UTC+1, Jonathan wrote:
>> * The burden of backwards and forwards binary compatibility has to be on the newer designs: efforts to do software traps (JIT or static translation) were shown to be undesirable or unworkable.

> When and by whom?

when: four days ago 
by what: logical reasoning and deduction
by whom: me.
where: here on this list.
 
> QEMU's binary translation of RISC-V to x86 (a radically different ISA than RISC-V) is quite likely the fastest implementation in existence.


are you recommending that QEMU be made an official hard, critical and absolutely essential dependency of all and any non-backwards-compatible modifications to the RISC-V Standard?

can you see that that (or the hard requirement that any kind of JIT emulator be deployed as a two-way compatibility layer) would be both ridiculed and flat-out rejected if officially proposed?

>> * isamux is intended for potential use in changing meaning of opcodes in *multiple* Extensions, custom *or official*, right across the board (good example is BigEndian/LittleEndian) where MISA applies to just one Extension and one extension alone.

> This seems to be purely semantics.

not at all.

> There is no limit to how invasive any extension can be, and you can absolutely change the setting of multiple settings simultaneously just by setting multiple bits in MISA.

jonathan: study it properly.  accept that MISA disables extensions entirely, is WARL, and is completely unsuited to the task that isamux is suited to.
 

>> * Therefore isamux is *not* to be ignored during context switches: it has to be saved and restored.

> Adding cycles to every context switch ever done.

tough.  saving the register files dwarfs that by over an order of magnitude.
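
To put a rough number on that claim, here is a sketch of a context save and restore that includes isamux. It assumes an integer-only hart with pc and x1..x31 as the saved state (floating-point or vector state would only add to the register side); the `Hart` dataclass and helper names are invented for illustration, and it counts words rather than cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Hart:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)  # x0..x31, x0 is always zero
    isamux: int = 0


def save_context(hart: Hart) -> dict:
    # x0 is hard-wired to zero, so only x1..x31 need saving.
    return {"pc": hart.pc, "regs": list(hart.regs[1:]), "isamux": hart.isamux}


def restore_context(hart: Hart, ctx: dict) -> None:
    hart.pc = ctx["pc"]
    hart.regs[1:] = ctx["regs"]
    hart.isamux = ctx["isamux"]


def words_without_isamux(ctx: dict) -> int:
    return 1 + len(ctx["regs"])  # pc plus x1..x31
```

Under these assumptions the existing state is 32 words and isamux adds one more, roughly 3% extra in this model.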
 

>> * through the FSF, gcc and binutils are the atomic point of registration of mvendorid-marchid-isamux triples, matching actual meanings of opcodes to their actual functionality, NOT the RISCV Foundation.  Coordination with LLVM and other compilers will also prove necessary.

> Neither mvendorid nor marchid are visible to user mode software. Even if they were, there is no way programs are going to have a table of every processor ever made to try and determine which features they can use.

jonathan: you're really not getting it: you keep misunderstanding.  the binary programs do not have that table: if that were the intent, i would have explicitly spelled it out, "the binary programs when compiled need a triplet table".

I SAID THAT GCC AND BINUTILS NEED THE TRIPLET TABLE.

please pay proper attention on this complex topic.

> Also the idea that the RISC-V foundation / foundation members are just going to surrender control to the FSF sounds improbable to say the least.


it's not their responsibility, and cannot be their responsibility.  recognising and accepting this is crucial.
 
>> * having official bits reserved and having some bits available for custom extensions is just common sense.
>> * the custom bits do not clash because the bits are to be recognised as triples (mvendorid-marchid-customisamux) in the FSF-managed atomic database.

> Custom bits in isamux is crazy.

not having custom bits is crazy.  you've not thought it through.
 
> Some bits will have conflicting meanings on different processors

not if the compiler - the COMPILER - is the central atomic world-wide global registration point for mvendorid-marchid-isamux translation points.  this was discussed and comprehensively analysed eighteen months ago.
 
> and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.


that's what the COMPILE TIME mvendorid-marchid-isamux table is for.  the one that you misunderstood.
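
One way to picture the compile-time table is a registry keyed on the full triple, so that two vendors can reuse the same custom isamux value without colliding. This is a hypothetical sketch: the `TripleRegistry` class is invented, and no such table is claimed to exist in gcc or binutils.

```python
class DuplicateTriple(Exception):
    """Raised when a (mvendorid, marchid, isamux) triple is registered twice."""


class TripleRegistry:
    def __init__(self):
        self._meanings = {}

    def register(self, mvendorid: int, marchid: int, isamux: int, meaning: str) -> None:
        key = (mvendorid, marchid, isamux)
        if key in self._meanings:
            raise DuplicateTriple(key)
        self._meanings[key] = meaning

    def lookup(self, mvendorid: int, marchid: int, isamux: int):
        # Returns None when the triple has never been registered.
        return self._meanings.get((mvendorid, marchid, isamux))
```

The toolchain, not the running binary, would consult such a table when assembling or disassembling for a given target; the ids used with it here would be whatever the vendors registered.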

please be more careful, attentive, and open-minded in your approach and questions.

l.

lk...@lkcl.net

Jun 9, 2019, 5:54:49 PM
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com
jonathan: the message that you just wrote was a cascading series of misunderstandings, misreadings and belief-judgements, based on which you made closed-minded judgements.

following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.

can you please be more careful and open-minded in how you approach postings on this list?

thanks.

l.

Dan Petrisko

Jun 9, 2019, 7:50:24 PM
to lk...@lkcl.net, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Rogier Brussee
Hi Luke --

I think you have a misunderstanding of what WARL means in the context of RISC-V CSRs.  Here is the snippet from the privileged spec 1.11:

> Some read/write CSR fields are only defined for a subset of bit encodings, but allow any value to be written while guaranteeing to return a legal value whenever read. Assuming that writing the CSR has no other side effects, the range of supported values can be determined by attempting to write a desired setting then reading to see if the value was retained. These fields are labeled WARL in the register descriptions. Implementations will not raise an exception on writes of unsupported values to an WARL field. Implementations must always deterministically return the same legal value after a given illegal value is written.

Here's an example of how this would be used by software/hardware.  I have two machines FOO64-32 and BAR-32.  As the names imply, FOO64-32 supports both RV64I and RV32I code, while BAR-32 only supports 32 bit code.  Now, I have a program which runs faster on RV64I, but can run slower in an RV32I fallback mode. The instruction sequence for this might be something like:

<write misa with MXL = 64>
<read misa>
if MXL == 64
    run_64_bit_mode()
else if MXL == 32
    run_32_bit_mode()

This exact code can run on both machines, because only legal values can be written. Thus, WARL can be used by software to check for features on platforms which may or may not support them as well change modes in platforms which do.
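
The write-then-read-back probe can be modelled as below. This is a sketch under assumptions: the `Machine` class is invented, and falling back to the widest supported width on an unsupported write is just one deterministic legal choice, not something the spec mandates. The MXL encodings (1 for 32, 2 for 64, 3 for 128) follow the privileged spec.

```python
MXL_ENCODING = {32: 1, 64: 2, 128: 3}  # misa.MXL field encodings
MXL_WIDTH = {enc: width for width, enc in MXL_ENCODING.items()}


class Machine:
    def __init__(self, name, supported_widths):
        self.name = name
        self.supported = set(supported_widths)
        # Assumption: the fallback legal value is the widest supported width.
        self.fallback = MXL_ENCODING[max(self.supported)]
        self.mxl = self.fallback

    def write_mxl(self, encoding: int) -> None:
        # WARL: never traps; an unsupported value is replaced by a legal one.
        legal = MXL_WIDTH.get(encoding) in self.supported
        self.mxl = encoding if legal else self.fallback

    def read_mxl(self) -> int:
        return self.mxl


def choose_mode(machine: Machine) -> str:
    machine.write_mxl(MXL_ENCODING[64])    # ask for 64-bit mode
    width = MXL_WIDTH[machine.read_mxl()]  # read back to see what was retained
    return "run_64_bit_mode" if width == 64 else "run_32_bit_mode"
```

Calling `choose_mode` on a model of FOO64-32 and of BAR-32 runs the same probe on both, with neither taking a trap.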

The reason I bring this up is that by mandating that all bits of isamux must be read/writable, you are losing this ability.  If software writes to isamux, changing the effective instruction to one that the processor does not support, you must trap. There's no mechanism for machine mode software to determine what is supported so as to not call those instructions.  Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves, so then all current RISC-V hardware needs mandatory machine-mode emulation code in order to be compliant with the new spec.

> The other key difference is that MISA requires DESTRUCTION of extension state information.  It is LITERALLY a kill-switch.

You've mentioned this several times in this thread but I've been unable to find anything confirming this.  Please point me to where in the spec this behavior is specified. From my reading, it is perfectly legal to implement a custom extension which overrides the default behavior of other standard/non-standard extensions. That is, using the custom extension space in misa as your 'isamux bits'.  

Here's an alternative, less disruptive proposal to solve the problem of multiple conflicting extensions that Guy mentioned.  encoding 1 in misa means A is supported, encoding 2 means B is supported, encoding 3 means A+B=C is supported. C can also specify a 1 bit CSR which muxes between the two conflicting extensions.  At some point, isamux or no, the two conflicting bodies will need to decide on how to arbitrate access to their extensions.
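
A minimal sketch of this alternative, assuming a two-bit support field and a one-bit mux CSR that only exists when both extensions are present. The `Core` class and constant names are invented for illustration.

```python
SUPPORTS_A = 1
SUPPORTS_B = 2
SUPPORTS_BOTH = SUPPORTS_A | SUPPORTS_B  # 3, "C" in the message above


class Core:
    def __init__(self, support: int):
        self.support = support
        self.mux = 0  # only meaningful when both extensions are present

    def write_mux(self, value: int) -> None:
        # On cores with a single extension the mux bit does not exist; ignore writes.
        if self.support == SUPPORTS_BOTH:
            self.mux = value & 1

    def active_extension(self):
        if self.support == SUPPORTS_BOTH:
            return "B" if self.mux else "A"
        if self.support == SUPPORTS_A:
            return "A"
        if self.support == SUPPORTS_B:
            return "B"
        return None
```

Here the arbitration bit belongs to the combined extension itself rather than to a global CSR, which is what keeps the change local to the vendors involved.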

As far as expanding the standard extension space, the privileged spec suggests a mechanism for that.
> The “G” bit is used as an escape to allow expansion to a larger space of standard extension names. G is used to indicate the combination IMAFD, so is redundant in the misa CSR, hence we reserve the bit to indicate that additional standard extensions are present.
 
So there is an 'escape hatch' for if we need to override extension behavior.  However, the hope is that since we're a RISC ISA, there's very little chance of modifying the behavior of user-mode extensions once frozen.  I've yet to hear a compelling case for doing so.

> Some bits will have conflicting meanings on different processors and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.

This is true. The problem you're addressing is that custom behavior may become standardized and conflict.  Having custom bits in your isamux presents the same opportunity for conflict. Unless...

> not if the compiler - the COMPILER - is the central atomic world-wide global registration point for mvendorid-marchid-isamux translation points

So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it? There have been many heated discussions about why we don't even have a gcc multilib configuration for both softfloat and hardfloat, because the cross-product of _current_ possible configurations is too large.  To expand this to every possible RISC-V platform/revision ever to be made is infeasible.

Let's be clear: this is CISC-V that you're proposing.  Variable, even dynamic-length instructions.  It's a very cool concept, but it is not in line with RISC principles. It's also just a plain extremely disruptive change for a problem which it only partially solves.
The most likely path for isamux is to itself become a custom extension which simply provides this CSR (See the 'Counters' extension in the most recent unprivileged ISA for an example of this).  Then you're free to use it exactly as you've described.  If it is useful, it may get adopted as a standard extension.   (But more likely, conflicting extensions will implement their own arbitration CSRs, if necessary).

following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.

can you please be more careful and open-minded in how you approach postings on this list?
 
This is unwarranted and paints us as an unwelcoming community. Jonathon's points are completely reasonable critiques of some very strong claims that you have made.  open-minded != agrees 100% with your implementation.

Best,
Dan Petrisko


--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

lk...@lkcl.net

Jun 9, 2019, 9:34:55 PM6/9/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
dan many thanks for these insights, i'll come back to them shortly.  

On Monday, June 10, 2019 at 12:50:24 AM UTC+1, Dan Petrisko wrote:
Hi Luke --

following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.

can you please be more careful and open-minded in how you approach postings on this list?
 
This is unwarranted

not at all.  what you mean to say is: you weren't the target of his message, you don't understand how *i* felt on receiving them.
 
and paints us as an unwelcoming community.
 
indeed.  i could feel the scorn in his words. it wasn't very nice, and i didn't feel like i was being made to feel welcome.  i don't have to tolerate it, and i have every right to say so.

Jonathon's points are completely reasonable critiques of some very strong claims that you have made. 

no, what he's done is: not properly read the messages, not been part of the original analysis 18 months ago, not taken the time to read that analysis, and consequently has no understanding of the seriousness of the situation.

they're not "strong claims", they're simply reality and based on logical deduction.  look at what happened to PowerPC with Altivec.  if we want that to happen, we can ignore reality.

open-minded != agrees 100% with your implementation.

i'll rewrite what he wrote to illustrate what i meant.  it will take up my time to do so.  i am not being paid to do this.  i am already annoyed and am not happy to be forced to address this.  if this was a welcoming and respectful community that accepted reality there would be no need.

l.

lk...@lkcl.net

Jun 9, 2019, 10:03:57 PM6/9/19
to RISC-V ISA Dev, luke.l...@gmail.com, lk...@lkcl.net, rogier....@gmail.com
johnathon: as i am addressing dan, here, i'm referring to you in the third person.  this does not imply (disrespectfully) that i am unaware that you are reading.


On Sunday, June 9, 2019 at 6:43:28 PM UTC+1, Jonathan wrote:
>  * The burden of backwards and forwards binary compatibility has to be on the newer designs: efforts to do software traps (JIT or static translation) were shown to be undesirable or unworkable.

When and by whom?

Johnathon meant to write, in a respectful way: "I appreciate that you are writing a summary, here.  I must have missed the original analysis where that was concluded, and who was involved in it.  Could you perhaps provide a pointer or more explanation?"

This much more respectful rewritten question, which conforms to basic netiquette rules of internet communication, could have been answered as follows: "Thank you Johnathon, yes, it's a summary, so has to miss things out.  The logical deductions were done four or five days ago, when Allen mentioned JIT translation as a potential option.  I went through the logical reasoning there, asking the simple and pertinent question as to whether RISC-V implementors would accept that a JIT *software* translator would become a hard critical dependency of RISC-V compliance.  in particular, if you appreciate that some implementors are planning to run RISC-V systems for the control of fast breeder nuclear reactors, JIT software translation is an absolute "no".  for the full analysis and discussion you can look back in the archives, I hope that's ok, let me know if you can't find it".


> * isamux is intended for potential use in changing meaning of opcodes in *multiple* Extensions, custom *or official*, right across the board (good example is BigEndian/LittleEndian) where MISA applies to just one Extension and one extension alone.

This seems to be purely semantics. There is no limit to how invasive any extension can be, and you can absolutely change the setting of multiple settings simultaneously just by setting multiple bits in MISA.

(nothing wrong with this assertion.  it was covered in the original reply, however i was pretty exasperated by johnathon's curtness, above)
 

> * Therefore isamux is *not* to be ignored during context switches: it has to be saved and restored.

Adding cycles to every context switch every done.

again: this is bordering on a form of sarcasm.  a better way for it to be stated would be simply, "This would add cycles to every context switch".  even if he had not said "every done" (spelt correctly as "ever done"), it would have been better.


> * through the FSF, gcc and binutils are the atomic point of registration of mvendorid-marchid-isamux triples, matching actual meanings of opcodes to their actual functionality, NOT the RISCV Foundation.  Coordination with LLVM and other compilers will also prove necessary.

Neither mvendorid nor marchid are visible to user mode software. Even if they were, there is no way programs are going to have a table of every processor ever made to try and determine which features they can use.

here, he is correctly aware that mvid/marchid are not visible to user mode software, however he assumes - without asking the actual question - that the proposal *needs* to access mvid/marchid.

a much better way to have put it was, "Am I right in thinking that you are proposing that programs need access to mvendorid/marchid at runtime?  That does not seem right because they're not visible to user mode software"

and the answer to this much more respectful way of speaking, which conforms to basic netiquette rules, could have been, "no, it definitely wasn't: only the compiler (gcc and binutils) needs the mvendorid-marchid-isamux triple.  this was discussed back in the original discussion.  gcc emits the triple as a prefix / context into the .S file, and binutils looks that up in the table and REMOVES it, inserting the correct assembly sequence relevant for that triple".
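
The flow described above (the compiler emits a triple as a context marker, the assembler resolves it against a registry and strips it) might be sketched roughly as follows. Everything here is hypothetical: the `.isamux_context` directive, the registry contents, and the CSR name `isamux` are invented for illustration and are not part of gcc, binutils, or any RISC-V specification.

```python
# Hypothetical sketch: resolve (mvendorid, marchid, isamux) context markers
# in assembly text against a registry, then strip the markers.
# The directive name, registry entries and CSR name are all invented.

REGISTRY = {
    # (mvendorid, marchid, isamux value) -> instructions selecting that encoding
    (0x1234, 0x1, 0b01): ["csrwi isamux, 1"],
    (0x1234, 0x1, 0b00): ["csrwi isamux, 0"],
}

def resolve_contexts(lines):
    """Replace each '.isamux_context v, a, m' marker with its registered sequence."""
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(".isamux_context"):
            args = stripped[len(".isamux_context"):].split(",")
            key = tuple(int(a.strip(), 0) for a in args)
            if key not in REGISTRY:
                raise KeyError(f"unregistered triple: {key}")
            out.extend(REGISTRY[key])  # marker removed, real sequence inserted
        else:
            out.append(line)
    return out
```

Note that in this sketch only the toolchain consults the registry; nothing in the emitted instruction stream reads mvendorid or marchid at runtime.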


Also the idea that the RISC-V foundation / foundation members are just going to surrender control to the FSF sounds improbable to say the least.


there's nothing wrong with this assertion, however yet again he'd already pissed me off (twice, now).

the RISC-V Foundation has *already* abdicated responsibility for the custom opcode space.  they cannot now take back control or management of it.

this is just reality that needs to be accepted.

 
> * having official bits reserved and having some bits available for custom extensions is just common sense.
> * the custom bits do not clash because the bits are to be recognised as triples (mvendorid-marchid-customisamux) in the FSF-managed atomic database.

Custom bits in isamux is crazy.

this is an extremely disrespectful assertion.  it's the third in this message.
 
Some bits will have conflicting meanings on different processors and you'll be telling us there

this is extremely rude.  it's called "putting words into people's mouths".  the tactic is as follows:

* you make up something completely stupid (that the person didn't say)
* you assign it to them without their consent (and, for full effect, without their knowledge)
* you then RIDICULE them (har har, what an idiot for making such a stupid assertion)
* thus, having undermined them, now and in the future it is easier to dismiss anything and everything that they say.

i hope and trust, dan, that by highlighting the areas which broke netiquette and basic communications rules on treating people with dignity and respect that this illustrates better how i may have been made to feel belittled and extremely unwelcome by johnathon's message.

i am keenly aware that words that i write may be viewed as ... not very respectful to the recipient, sometimes.  i play a different role from most: that of speaking truth.  i have not yet learned the way to do so whilst at the same time being "diplomatic", shall we say, and if i had the time, money and energy (rather than operating near close to exhaustion for most of the time) i would have more energy to focus on being so.  with apologies that this is not so.

l.

Dan Petrisko

Jun 9, 2019, 11:00:45 PM6/9/19
to lk...@lkcl.net, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Rogier Brussee
i hope and trust, dan, that by highlighting the areas which broke netiquette and basic communications rules on treating people with dignity and respect that this illustrates better how i may have been made to feel belittled and extremely unwelcome by johnathon's message.

I am truly sorry that participating in this forum made you feel so.  I cannot speak for anyone but myself, but I hope that perceived rudeness stems from terseness rather than malice.

However, this is what I was responding to:
 
johnathon: the message that you just wrote was a cascading series of misunderstandings, misreadings and belief-judgements, based on which you made closed-minded judgements.
following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.

can you please be more careful and open-minded in how you approach postings on this list?

You did not address violations in netiquette. Instead, you sent a scathing message and then insinuated that someone is close-minded and wasting everyone's time by virtue of participating.  It comes off as condescending, and if I were a newcomer to this mailing list I would think twice about contributing, lest I be told I'm doing readers an eternal disservice.

i have not yet learned the way to do so whilst at the same time being "diplomatic"

And yet you expect impeccable phrasing from the rest of us.

i don't have to tolerate it, and i have every right to say so.

That's absolutely true. I encourage you to speak up when you feel the community is being unwelcoming, as I did.

Best,
Dan Petrisko



lk...@lkcl.net

Jun 9, 2019, 11:05:55 PM6/9/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Monday, June 10, 2019 at 12:50:24 AM UTC+1, Dan Petrisko wrote:
Hi Luke --

I think you have a misunderstanding of what WARL means in the context of RISC-V CSRs. 

possibly.  appreciate the clarification.

This exact code can run on both machines, because only legal values can be written. Thus, WARL can be used by software to check for features on platforms which may or may not support them as well change modes in platforms which do.

The reason I bring this up is that by mandating that all bits of isamux must be read/writable, you are losing this ability. 
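
The WARL probing idea in the quoted text can be modelled in a few lines. This is a toy model rather than real CSR access: the `WarlCsr` class and the choice of which bits are writable are assumptions made for illustration, and masking off unsupported bits is only one of the behaviours WARL permits. The same probe routine runs unchanged on both modelled machines, which is the property being described.

```python
class WarlCsr:
    """Toy WARL register: writes never trap; unsupported bits are dropped."""
    def __init__(self, writable_mask, value=0):
        self.writable_mask = writable_mask
        self.value = value & writable_mask

    def write(self, new_value):
        # Keep only the bits this implementation supports (one legal WARL behaviour).
        self.value = new_value & self.writable_mask

    def read(self):
        return self.value

def probe_feature(csr, bit):
    """Try to set one bit, read it back, then restore the old value."""
    old = csr.read()
    csr.write(old | (1 << bit))
    supported = bool(csr.read() & (1 << bit))
    csr.write(old)
    return supported
```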

the point is moot.  by raising the issue, it allows me to understand that you've fundamentally misunderstood what isamux is.  i mentioned it before, and it is worth repeating: isamux is LITERALLY a fixed quantity of extra instruction opcode bits that goes DIRECTLY into the instruction decode phase.

* if a vendor has only one isamux bit, they now have 33 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.

* if a vendor has two isamux bits, they now have 34 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.

and so on.
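
One way to picture the claim in the two bullets is to treat the isamux bits as a prefix concatenated onto the 32-bit instruction word before decode. The sketch below is purely illustrative: the decode table, the `ISAMUX_BITS` width, and the two meanings assigned to the same word are invented, and a real decoder matches on fields rather than on whole words.

```python
ISAMUX_BITS = 1  # assumed width for this vendor; fixed at design time

def decode_key(isamux, instr):
    """Concatenate the isamux bits above the 32-bit instruction word."""
    assert 0 <= isamux < (1 << ISAMUX_BITS)
    assert 0 <= instr < (1 << 32)
    return (isamux << 32) | instr

# Hypothetical table: the same 32-bit word means different things
# depending on the extra bit.
CUSTOM_WORD = 0x0000000B  # a word whose low 7 bits are the custom-0 major opcode
DECODE = {
    decode_key(0, CUSTOM_WORD): "vendor op, old encoding",
    decode_key(1, CUSTOM_WORD): "vendor op, new encoding",
}

def decode(isamux, instr):
    """Look up the effective (1 + 32)-bit instruction; unknown words are illegal."""
    return DECODE.get(decode_key(isamux, instr), "illegal instruction")
```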

does that make it clear, now, how isamux is radically and utterly different from (say) MISA?

does it also illustrate why having WARL capabilities on the isamux is... well... "silly"?  and why i gave examples (earlier and in the summary) which said "making isamux WARL is as if you decided that bit 11 of the 32-bit opcode was set to 1"?

 
If software writes to isamux, changing the effective instruction to one that the processor does not support, you must trap.

yes.
 
There's no mechanism for machine mode software to determine what is supported so as to not call those instructions.

*sigh* except by flipping the entire processor into JIT emulation / dynamic recompilation mode, until such time as a CSRRW (now emulated) flips back to a zero.  at that point, the processor may exit JIT mode and go back to "real" execution mode.

i remember hearing that DEC Alpha used this part-JIT, part-real-execution to create a binary translation of x86 which, on 2nd execution, was just as quick as x86.

this is one of the reasons why i said that when deployed in an "official" capacity, it is an "emergency" measure.

think it through about what would happen if isamux was not deployed, yet an *official* change to a RISC-V extension was published.  both legacy *and* new binaries would be hopelessly incompatible, wouldn't they?

without isamux:

* legacy hardware can run legacy binaries.
* legacy hardware cannot run newer binaries without JIT
* new hardware cannot run legacy binaries without JIT
* new hardware can run new binaries.
* vendors, not knowing what to do, get seriously pissed off and abandon RISC-V entirely.

at least with isamux:

* legacy hardware can run legacy binaries
* legacy hardware cannot run newer binaries without JIT
* new hardware can run legacy binaries by setting the relevant isamux bit to "0"
* new hardware can run new binaries by setting the relevant isamux bit to "1".
* vendors, at least not very happy about the situation, have a clearly-defined upgrade path.
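
The two lists amount to a small compatibility table. A minimal sketch, using invented labels (`"legacy"`, `"new"`) and encoding only what the lists state:

```python
def runs_natively(hardware, binary, isamux_deployed):
    """Return True if the binary runs without JIT, per the two lists above."""
    if hardware == binary:
        return True  # legacy-on-legacy and new-on-new always work
    if hardware == "new" and binary == "legacy":
        # New hardware can select the legacy encoding only if isamux exists.
        return isamux_deployed
    return False  # legacy hardware cannot run newer binaries without JIT
```

The only cell that changes between the two scenarios is new hardware running legacy binaries.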

can you see that one of those paths leads to the destruction of the RISC-V community, and the other leads to a bit of grumbling and a stable long-lasting trustworthy RISC-V ISA?

 
  Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves,

good.  that's precisely what's needed for legacy hardware to do JIT of newer binaries.

so then all current RISC-V hardware needs mandatory machine-mode emulation code in order to be compliant with the new spec.


i do not perceive that to be a problem, as by this point we are talking legacy hardware.  legacy hardware: legacy binaries.  trying to get legacy hardware re-certified with an evolving spec is... silly :)


The other key difference is that MISA requires DESTRUCTION of extension state information.  It is LITERALLY a kill-switch.

You've mentioned this several times in this thread but I've been unable to find anything confirming this. 

yes, i took a look, and it appears to be missing.  it's been 18 months.  if it's been removed, that is a serious and costly mistake.
 
Please point me to where in the spec this behavior is specified.

 i've raised a separate query about it.

From my reading, it is perfectly legal to implement a custom extension which overrides the default behavior of other standard/non-standard extensions.

except that every single bit of MISA is allocated and reserved.  plus, think it through: what mechanism would be used to switch on and off that custom extension, such that the vendor could apply for RISC-V Conformance and receive a "pass"?

 
That is, using the custom extension space in misa as your 'isamux bits'.  

they are not "my" isamux bits.  i very deliberately do not use personal pronouns to refer to ideas.  i do not take "personal ownership" of ideas, particularly ones that are intended for wider benefit of a community.  coming _up_ with the ideas (and being credited for them), coming up with the ideas and bringing them to other peoples' attention (and being credited for doing so), yes.  saying "THIS IS MIIINE, ALL MIIINE"... mmmm.... no :)

moving swiftly on: unfortunately, there aren't any MISA bits allocated for custom use.

btw you prompted me to re-read section 3.1.1 (V20190405-Priv-MSU-Ratification) and i noted something that immediately illustrates why MISA cannot be used as a substitute for isamux.

Writing misa may increase IALIGN, e.g., by disabling the "C" extension. If an instruction that
would write misa increases IALIGN, and the subsequent instruction's address is not IALIGN-bit
aligned, the write to misa is suppressed, leaving misa unchanged.

and, earlier:

If an ISA feature x depends on an ISA feature y, then attempting to enable feature x but disable
feature y results in both features being disabled. For example, setting "F" =0 and "D" =1 results
in both F and D being cleared.
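
The two quoted rules can be modelled as a small function. This is a simplified sketch, not the specification's full behaviour: it handles only the F/D dependency and the C/IALIGN case, and the function name and arguments are invented for illustration.

```python
C_BIT, D_BIT, F_BIT = 1 << 2, 1 << 3, 1 << 5  # misa bit positions for C, D, F

def write_misa(current, requested, next_pc):
    """Return the misa value after an attempted write (simplified model)."""
    # Rule 1: D depends on F; enabling D while disabling F clears both.
    if (requested & D_BIT) and not (requested & F_BIT):
        requested &= ~(D_BIT | F_BIT)
    # Rule 2: disabling C raises IALIGN from 16 to 32; if the next
    # instruction is not 4-byte aligned, the write is suppressed.
    disabling_c = (current & C_BIT) and not (requested & C_BIT)
    if disabling_c and next_pc % 4 != 0:
        return current
    return requested
```

In this model the value that ends up in misa can differ from the value written, so software has to read it back to learn what actually happened.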

these are not things where you can reasonably do high-speed execution of alternate encodings!  isamux, by being effectively the 33rd (34th etc) bits of the opcode, and thus being mandatory, you *know* with confidence that the compiler can emit the CSR to change an isamux bit, follow up with an opcode, emit a CSR to change another bit, follow up with another opcode, one after the other, bang, bang, bang.

the above scenarios are *awful*!  emitting code that has to conditionally check if multiple features were enabled/disabled?  moo? :)  

 
Here's an alternative, less disruptive proposal to solve the problem of multiple conflicting extensions that Guy mentioned.  encoding 1 in misa means A is supported, encoding 2 means B is supported, encoding 3 means A+B=C is supported. C can also specify a 1 bit CSR which muxes between the two conflicting extensions. 

 you realise that that 1 bit CSR *is* isamux? :)
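
The alternative in the quoted proposal can be sketched as follows. The encoding values 1, 2 and 3 come from the quoted text; the names `ext_select` and `active_extension` are invented, and how the encoding would actually be laid out in misa is left open.

```python
SUPPORTS_A, SUPPORTS_B = 1, 2
SUPPORTS_C = SUPPORTS_A | SUPPORTS_B  # 3: both present, arbitrated by a mode bit

def active_extension(encoding, ext_select=0):
    """Return which conflicting extension owns the shared opcodes."""
    if encoding == SUPPORTS_A:
        return "A"
    if encoding == SUPPORTS_B:
        return "B"
    if encoding == SUPPORTS_C:
        # The 1-bit CSR (hypothetical name ext_select) muxes between the two.
        return "B" if ext_select & 1 else "A"
    return None  # neither extension implemented
```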
 
As far as expanding the standard extension space, the privileged spec suggests a mechanism for that.
The “G” bit is used as an escape to allow expansion to a larger space of standard extension names. G is used to indicate the combination IMAFD, so is redundant in the misa CSR, hence we reserve the bit to indicate that additional standard extensions are present.
 
So there is an 'escape hatch' for if we need to override extension behavior.  However, the hope is that since we're a RISC ISA, there's very little chance of modifying the behavior of user-mode extensions once frozen. 

the little-endian / big-endian debate was one such scenario, discussed.... last year.  Japan - the entire industry of Japan - is running PowerPC.  converting software specifically written for the *opposite* encoding used in the rest of the industrial world is... hopelessly impractical.

they *need* LE/BE, and they need RISC-V.  PowerPC is running out of time (and increasing in cost).
 
I've yet to hear a compelling case for doing so.

isamux is envisaged to be the RISC-V community having a ready-to-deploy "preparedness" system, should precisely such a compelling case arise.


Some bits will have conflicting meanings on different processors and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.

This is true. The problem you're addressing is that custom behavior may become standardized and conflict.

yes.
 
  Having custom bits in your isamux

again: it is not "my" isamux.  it is an idea for consideration and refinement by the whole RISC-V community, for the benefit and stability *of* the whole RISC-V community.
 
presents the same opportunity for conflict. Unless...

not if the compiler - the COMPILER - is the central atomic world-wide global registration point for mvendorid-marchid-isamux translation points

So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it?

que?  i'm lost.  there's a cognitive break where some logical deductive reasoning hasn't been spelled out.  can you elaborate on how you reached that conclusion (and perhaps tone down the disbelief somewhat? the phrase "surely you did not mean to imply that" would have helped)

There have been many heated discussions about why we don't even have a gcc multilib configuration for both softfloat and hardfloat, because the cross-product of _current_ possible configurations is too large.  To expand this to every possible RISC-V platform/revision ever to be made is infeasible.

let's revisit this conclusion in a few years' time when RISC-V needs to evolve beyond its current vision (which is already several years old, and young at that).

[or... as a community we can plan ahead and not be caught out by the need to evolve]
 

Let's be clear: this is CISC-V that you're proposing. 

no, it is not.  that is absolutely and categorically not the case.
 
Variable, even dynamic-length instructions.

NO.  this is PLAIN WRONG.

isamux is very very specifically a FIXED number of bits that go DIRECTLY into the instruction decode phase AT THE HARDWARE LEVEL.  i have made this clear any number of times.

(caveat: if someone really wants to use those bits to indicate variable-length instructions, that's up to them: i wish them good luck and look forward to seeing how far they get).

  It's a very cool concept, but it is not in line with RISC principles.

why have other RISC systems in the past adopted it, then?


It's also just a plain extremely disruptive change for a problem which it only partially solves.

as i said, many times, and emphasised above: (a) it is an emergency provision, not to be abused and (b) the alternatives (as illustrated above) are far, far worse.

 
The most likely path for isamux is to itself become a custom extension which simply provides this CSR (See the 'Counters' extension in the most recent unprivileged ISA for an example of this). 

yes... and that will result in whatever CSR is used for the purpose becoming a de-facto "dominator" of whatever CSR is utilised for the purpose.  it is far better that this is recognised, accepted, planned for and adopted as an official RISC-V extension.
 
Then you're free to use it exactly as you've described.  If it is useful, it may get adopted as a standard extension.   (But more likely, conflicting extensions will implement their own arbitration CSRs, if necessary).

precisely.  at which point, the hell-on-earth moves from the conflicting extensions to the conflicting "arbitration CSRs".

can you therefore see that it would be best to nip that in the bud and make it an official scheme?

l.

lk...@lkcl.net

Jun 9, 2019, 11:20:24 PM6/9/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Monday, June 10, 2019 at 4:00:45 AM UTC+1, Dan Petrisko wrote:
i hope and trust, dan, that by highlighting the areas which broke netiquette and basic communications rules on treating people with dignity and respect that this illustrates better how i may have been made to feel belittled and extremely unwelcome by johnathon's message.

I am truly sorry that participating in this forum made you feel so. 

being excluded from RISC-V Membership due to the business requirement to operate fully transparently (where RISC-V WGs operate behind closed forums) has forced me to waste significant time reverse-engineering intent and discussions from snippets of conversations, often leading to misunderstandings, and the very people "inside" the cartel going, "well if you knew what WE knew, if you weren't such a high-and-mighty total loser, you wouldn't waste our time by being utterly wrong, har har".  publicly and repeatedly.

so it's not specifically this forum that is unwelcoming, it's the wider RISC-V community (RISC-V Membership giving privileged access to secretive discussions about the future and direction of RISC-V) that has people automatically feel utterly excluded and unwelcome.

over the past 3+ years, many of the people who felt so excluded, those who were initially excited by the use of the word "open" in reference to RISC-V, also felt betrayed at their constructive feedback being devalued and even ignored.  they've long since given up and have walked away entirely from RISC-V.


I cannot speak for anyone but myself, but I hope that perceived rudeness stems from terseness rather than malice.


i did notice that.  i do appreciate that it wasn't johnathon's intent (with apologies for speaking about you in the third person again).  however at the same time i could not let it go without comment.
 
However, this is what I was responding to:
 
can you please be more careful and open-minded in how you approach postings on this list?

You did not address violations in netiquette.

in the followup, i had calmed down, and did so.  you make a good point about being "scathing", having an effect on others.  i'm not sure how to address that, and also make things clear to johnathon at the same time.


i have not yet learned the way to do so whilst at the same time being "diplomatic"

And yet you expect impeccable phrasing from the rest of us.


the reason for mentioning my own limitations is to make people aware that *i* am aware of them, and that i do expect people to point them out to me (as you did).  to do otherwise would be hypocritical.
 
i don't have to tolerate it, and i have every right to say so.

That's absolutely true. I encourage you to speak up when you feel the community is being unwelcoming, as I did.


appreciated.

l.

Dan Petrisko

Jun 10, 2019, 2:49:00 AM6/10/19
to lk...@lkcl.net, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Rogier Brussee
the point is moot.  by raising the issue, it allows me to understand that you've fundamentally misunderstood what isamux is.  i mentioned it before, and it is worth repeating: isamux is LITERALLY a fixed quantity of extra instruction opcode bits that goes DIRECTLY into the instruction decode phase.


This is an ISA forum.  The microarchitectural implementation is an orthogonal concern.  And for good reason; as long as the uarch is compliant with the ISA, it's free to perform all sorts of optimizations which will baffle and infuriate ISA writers. That's why a proposal which says that the definition of the entire ISA can change with a single instruction scares me.  High performance things like pre-decoding instructions, uop sequencing, memory consistency, LSQs, even something as simple as early branch prediction will be touched by this proposal (after all, how can you identify a branch if the definition of RV64I can change by the time the instruction is fully decoded?).  Because of this separation, there is no concept of a "decoder" in the ISA.

From an ISA perspective, isamux is a WLRL CSR which switches out active extensions.  So how many bits is isamux at the ISA level? It seems like it's not constant based on the following:

* if a vendor has only one isamux bit, they now have 33 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.
* if a vendor has two isamux bits, they now have 34 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.

What happens when software writes 1-bit-vendor's isamux with a 2-bit value?

think it through about what would happen if isamux was not deployed, yet an *official* change to a RISC-V extension was published.

That's the point of 'freezing' the base ISA + extensions.  So further extensions can build on top, using the standard mechanisms.  Like I say, I cannot picture a single scenario which would require overriding bits in a standard, frozen extension. That's why I disagree with the argument that we should optimize for it at the cost of imposing software requirements on past and hardware requirements on future implementations.

does it also illustrate why having WARL capabilities on the isamux is... well... "silly"?  and why i gave examples (earlier and in the summary) which said "making isamux WARL is as if you decided that bit 11 of the 32-bit opcode was set to 1"?
Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves,
 
It is not silly.  It's totally legal for a processor which only supports user-mode spec to not have a trap mechanism at all.  If an isamux bit 0 determines LE vs BE and my system does not support BE, then software should be able to determine this and refuse to set the bit.  If my system has hardware emulation support for an alternate mode, then it's of course allowed to set the bit and trap.  Mandating a JIT-mode with hardware emulation support to be spec-compliant is a non-starter for minimal RISC-V implementations.   So unless isamux is WARL, it fundamentally can't work for overriding user-mode standard or custom extensions.
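
The distinction being argued here can be shown with a toy model contrasting a write that traps on an unsupported value with a WARL write that software can read back. The classes and the `IllegalInstruction` exception are invented, and bit 0 selecting big-endian mode follows the example in the paragraph above; none of this is specified behaviour for isamux.

```python
class IllegalInstruction(Exception):
    """Stands in for a trap; a core with no trap mechanism cannot raise it."""

class TrappingIsamux:
    """Model where writing an unsupported value traps."""
    def __init__(self, supported_mask):
        self.supported_mask, self.value = supported_mask, 0

    def write(self, v):
        if v & ~self.supported_mask:
            raise IllegalInstruction(hex(v))
        self.value = v

class WarlIsamux:
    """Model where unsupported bits are silently dropped and read back as 0."""
    def __init__(self, supported_mask):
        self.supported_mask, self.value = supported_mask, 0

    def write(self, v):
        self.value = v & self.supported_mask

def try_big_endian(csr):
    """Software-side check: request bit 0 (BE) and see whether it stuck."""
    csr.write(1)
    return bool(csr.value & 1)
```

With the WARL model, `try_big_endian` returns `False` on a little-endian-only core and software carries on; with the trapping model, the same call needs a trap handler to exist at all.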

i remember hearing that DEC Alpha used this part-JIT, part-real-execution to create a binary translation of x86 which, on 2nd execution, was just as quick as x86.
 
These are massive, high performance chips that are able to provide such a mechanism. 

the above scenarios are *awful*!  emitting code that has to conditionally check if multiple features were enabled/disabled?  moo? :) 

This should only be done once at boot time, saving the results.  So while annoying, it's not a performance hit.

The little-endian / big-endian debate was one such scenario, discussed.... last year.  Japan - the entire industry of Japan - is running PowerPC.  converting software specifically written for the *opposite* encoding used in the rest of the industrial world is... hopelessly impractical.

RISC-V already has had a similar situation. It now has Ztso in addition to its normal memory model.  They mention LE/BE as another scenario for such an extension, without requiring changes at the individual instruction level.  As far as I know, there's no need (except for academic interest) for a system which can switch LE/BE instruction by instruction.
 
So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it?
que?  i'm lost.  there's a cognitive break where some logical deductive reasoning hasn't been spelled out.  can you elaborate on how you reached that conclusion (and perhaps tone down the disbelief somewhat? the phrase "surely you did not mean to imply that" would have helped)

Say I have a processor that I'm developing. I do not have a compiler sanctioned vendorid or marchid.  Therefore I need to add my custom triple to gcc, no? Whereas current state of the art is just to use off-the-shelf gcc.

why have other RISC systems in the past adopted it, then?

Perhaps I missed this in another thread, but I have never seen a comparable structure in another RISC ISA?

you realise that that 1 bit CSR *is* isamux? :)
precisely.  at which point, the hell-on-earth moves from the conflicting extensions to the conflicting "arbitration CSRs".
 
Yes, I do. That's why I'm saying that a spec-mandated isamux is unnecessary.  When there is a conflict between two vendors and they want to support each other's extensions simultaneously, there is already a pathway to doing so (through a third extension with a mode bit).  It's much simpler to deal with this on a case-by-case basis than to impose additional compiler, software and hardware requirements on all implementations moving forward.
 

Best,
Dan Petrisko




lk...@lkcl.net

Jun 10, 2019, 3:30:53 AM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Monday, June 10, 2019 at 7:49:00 AM UTC+1, Dan Petrisko wrote:
the point is moot.  by raising the issue, it allows me to understand that you've fundamentally misunderstood what isamux is.  i mentioned it before, and it is worth repeating: isamux is LITERALLY a fixed quantity of extra instruction opcode bits that goes DIRECTLY into the instruction decode phase.


This is an ISA forum.  The microarchitectural implementation is an orthogonal concern.

not in this case.  ignoring opcode instruction bits is not an option.  anyone that suggested that a given microarchitectural implementation was going to ignore bit 17 of all opcodes would cause quite a few laughs.  and isamux is llliiiitittterrrrrraallllllyyyy the addition of a fixed quantity of extra instruction opcode bits into the decode phase.

there is no way for any microarchitecture to ignore those additional bits.

 
And for good reason; as long as the uarch is compliant with the ISA, it's free to perform all sorts of optimizations which will baffle and infuriate ISA writers. That's why a proposal which says that the definition of the entire ISA can change with a single instruction scares me.

tell me about it.  now you understand why i said it's an emergency provision that gets the RV ISA out of several corners it's been backed into (past and future).
 
High performance things like pre-decoding instructions, uop sequencing, memory consistency, LSQs, even something as simple as early branch prediction will be touched by this proposal (after all, how can you identify a branch if the definition of RV64I can change by the time the instruction is fully decoded?).  Because of this separation, there is no concept of a "decoder" in the ISA.

From an ISA perspective, isamux is a WLRL CSR

yes.
 
which switches out active extensions. 

NO. it does NOT "switch out" anything.  how many times do i have to say it?  it's LITERALLY the 33rd, 34th and so on opcode bit.  it is LITERALLY as if RISC-V opcodes were now 33 bits long.  or 34 bits long.

what is chosen to be done *with* those bits is entirely up to us.
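a minimal sketch of the "33rd, 34th opcode bit" view, assuming a 2-bit isamux purely for illustration (the width, the function name and the constant below are hypothetical, not part of any proposal text): the CSR value is concatenated above the 32-bit instruction word, and the decoder sees the combined value.

```python
# Hypothetical model: the isamux CSR value is treated as extra high-order
# opcode bits, concatenated above the 32-bit instruction word.
ISAMUX_BITS = 2  # assumed width, for illustration only


def extended_opcode(isamux: int, insn: int) -> int:
    """Return the (32 + ISAMUX_BITS)-bit value the decoder would see."""
    if not 0 <= isamux < (1 << ISAMUX_BITS):
        raise ValueError("isamux value out of range")
    if not 0 <= insn < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    return (isamux << 32) | insn


# The same 32-bit word produces two different decoder inputs.
word = 0x00B50533  # add a0, a0, a1 in the base encoding
legacy = extended_opcode(0b00, word)      # isamux == 0: today's RISC-V
alternate = extended_opcode(0b01, word)   # isamux bit 0 set
```

with isamux at zero the decoder input is bit-for-bit the existing 32-bit encoding; any other value is simply a different, wider opcode.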



So how many bits is isamux at the ISA level?

that's down to the standards process, on a case-by-case basis.  five days ago i gave several examples, and 18 months ago many more were discussed.
 
It seems like it's not constant based on the following:

each vendor may choose which bits to implement, just as they may choose which extensions to implement.  what is the problem?
 
* if a vendor has only one isamux bit, they now have 33 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.
* if a vendor has two isamux bits, they now have 34 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.

What happens when software writes 1-bit-vendor's isamux with a 2-bit value?

the question is exactly the same as, "what happens when software writes only 17 bits of opcode to a processor that decodes 32-bits".  or tries to write 33 bits.

it won't work, will it?

so let's give some concrete examples.

* in 2022 the new ratified LE/BE isamux standard comes out.  ISAMUX bit 0 is chosen to represent it.
* in 2023 the new ratified RVC2 isamux standard comes out.  ISAMUX bit 1 is chosen to represent it.
* a vendor chooses to implement *only* LE/BE but has not implemented RVC2.
* the customers try to run a binary that is compiled with **BOTH** LE/BE **AND** RVC2.
* instructions with LE/BE work perfectly well
* instructions with RVC2 *TRAP* on the writing / setting of ISAMUX bit 1, and require JIT emulation, just as any "legacy" processor would require.

this is absolutely no different a situation from if the *actual* RISC-V instruction set were *already* 34 bits.
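the scenario above can be sketched as follows, assuming LE/BE on bit 0 and RVC2 on bit 1 as in the 2022/2023 bullets. the class and exception names are hypothetical, and the exception merely stands in for a hardware trap.

```python
# Hypothetical bit assignments taken from the example above.
ISAMUX_LEBE = 1 << 0   # LE/BE standard, bit 0
ISAMUX_RVC2 = 1 << 1   # RVC2 standard, bit 1


class IllegalInstruction(Exception):
    """Stands in for the trap taken on an unsupported CSR write."""


class IsamuxCsr:
    def __init__(self, implemented_mask: int):
        self.implemented_mask = implemented_mask
        self.value = 0

    def write(self, new_value: int) -> None:
        # Any bit outside the implemented set traps instead of being ignored,
        # so a trap handler can emulate the missing feature in software.
        if new_value & ~self.implemented_mask:
            raise IllegalInstruction(f"unsupported isamux bits: {new_value:#b}")
        self.value = new_value


# A vendor that implements only LE/BE.
vendor_csr = IsamuxCsr(implemented_mask=ISAMUX_LEBE)
vendor_csr.write(ISAMUX_LEBE)          # accepted
try:
    vendor_csr.write(ISAMUX_LEBE | ISAMUX_RVC2)
except IllegalInstruction:
    pass  # this is where soft / JIT emulation would take over
```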




think it through about what would happen if isamux was not deployed, yet an *official* change to a RISC-V extension was published.

That's the point of 'freezing' the base ISA + extensions. 

and what happens if a mistake is made?

So further extensions can build on top, using the standard mechanisms. 

and what happens if that is not enough?
 
Like I say, I cannot picture a single scenario which would require overriding bits in a standard, frozen extension.

then you did not see the examples that were given 5 days ago, nor the ones that were given 18 months ago.
 
That's why I disagree with the argument that we should optimize for it at the cost of imposing software requirements on past and hardware requirements on future implementations.

not being able to envisage and think through future needs seems to be a common argument for not taking strategic action.  i do not understand this perspective, at all.  it seems to be extremely short-sighted.

and, also, you didn't read what i wrote 4-5 days ago.  please don't make me repeat it, my RSI is getting particularly bad.  can i leave it to you to review the thread?



does it also illustrate why having WARL capabilities on the isamux is... well... "silly"?  and why i gave examples (earlier and in the summary) which said "making isamux WARL is as if you decided that bit 11 of the 32-bit opcode was set to 1"?
Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves,
 
It is not silly.  It's totally legal for a processor which only supports user-mode spec to not have a trap mechanism at all. 

ah.  in the "embedded" platform it is, however in the UNIX platform space, it is categorically NOT acceptable.  at all.  failure to trap on illegal instructions will result in non-compliance.
 
If an isamux bit 0 determines LE vs BE and my system does not support BE, then software should be able to determine this and refuse to set the bit. 

NO.  this is a complete failure to comprehend the nature of isamux and what it solves.

if a system wanted to not implement LE/BE (in bit 0), it would be better that that system simply not implement isamux at all.

at least then the system could choose - correctly - to soft-JIT-emulate isamux on trap of the write to the isamux CSR.

 
If my system has hardware emulation support for an alternate mode, then it's of course allowed to set the bit and trap.  Mandating a JIT-mode with hardware emulation support to be spec-compliant is a non-starter for minimal RISC-V implementations.

you've fundamentally misunderstood.  please re-read what i wrote when outlining the two cases where isamux does exist, and where isamux does not exist.
 
   So unless isamux is WARL, it fundamentally can't work for overriding user-mode standard or custom extensions.


you've fundamentally misunderstood.
 
i remember hearing that DEC Alpha used this part-JIT, part-real-execution to create a binary translation of x86 which, on 2nd execution, was just as quick as x86.
 
These are massive, high performance chips that are able to provide such a mechanism. 

the above scenarios are *awful*!  emitting code that has to conditionally check if multiple features were enabled/disabled?  moo? :) 

This should only be done once at boot time, saving the results.  So while annoying, it's not a performance hit.

it's also not an acceptable mandatory option.
 

The little-endian / big-endian debate was one such scenario, discussed.... last year.  Japan - the entire industry of Japan - is running PowerPC.  converting software specifically written for the *opposite* encoding used in the rest of the industrial world is... hopelessly impractical.

RISC-V already has had a similar situation. It now has Ztso in addition to its normal memory model.  They mention LE/BE as another scenario for such an extension, without requiring changes at the individual instruction level.  As far as I know, there's no need (except for academic interest) for a system which can switch LE/BE instruction by instruction.
 
So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it?
que?  i'm lost.  there's a cognitive break where some logical deductive reasoning hasn't been spelled out.  can you elaborate on how you reached that conclusion (and perhaps tone down the disbelief somewhat? the phrase "surely you did not mean to imply that" would have helped)

Say I have a processor that I'm developing. I do not have a compiler sanctioned vendorid or marchid. 

then my understanding is that such a system, by not having applied for a JEDEC mvendorid, would *not* pass RISC-V Conformance tests.

 
Therefore I need to add my custom triple to gcc, no? Whereas current state of the art is just to use off-the-shelf gcc.

why have other RISC systems in the past adopted it, then?

Perhaps I missed this in another thread, but I have never seen a comparable structure in another RISC ISA?


PowerPC and many others have dynamic runtime CSR-driven LE/BE switching.  this *is* isamux.

 
you realise that that 1 bit CSR *is* isamux? :)
precisely.  at which point, the hell-on-earth moves from the conflicting extensions to the conflicting "arbitration CSRs".
 
Yes, I do. That's why I'm saying that a spec-mandated isamux is unnecessary. 

oh, dan, please read in full what i write.  i *already told you* what the consequences of non-spec-mandated isamux would be, only a couple of paragraphs away.

When there is a conflict between two vendors and they want to support each other's extensions simultaneously, there is already a pathway to doing so (through a third extension with a mode bit). 

dan: that mode bit **IS** isamux.
 
It's much simpler to deal with this on a case-by-case basis

yes it is... and where would those bits accumulate?  *IN THE ISAMUX CSR*!  i said exactly these words only 5 days ago!  please do not make me repeat them time and time again!

it is becoming very painful to type for such prolonged periods of time.

than to impose additional compiler, software and hardware requirements on all implementations moving forward.
 

dan: what do you think happens when all those "case-by-case" occurrences accumulate over time, and interact?  you get *precisely and exactly* what has been described and outlined, and thought through over several months, discussed here only 5 days ago and 18 months ago in considerable detail!

l.

Dan Petrisko

Jun 10, 2019, 12:31:45 PM6/10/19
to lk...@lkcl.net, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Rogier Brussee
I have been reading the threads. All of them. It is incredibly disrespectful to insinuate otherwise.

llliiiitittterrrrrraallllllyyyy LITERALLY LITERALLY

Absolutely unacceptable. I am trying to have a technical discussion and I won't tolerate any further condescension.

not in this case.  ignoring opcode instruction bits is not an option.

Yes in this case. And in all cases.  A proposal to change the ISA cannot neglect specifying what is changing in the ISA.  That is nonsense.  That the proposal is 'non-optional' does not change the fact that there needs to be a software mechanism in the ISA to access this. From the ISA perspective, it is writing bits to a CSR. If isamux is guaranteed to accept those writes, as you've repeatedly said must be a requirement, then there are two options.

1) isamux needs to be large enough to support all future bits, with unused bits reserved. Therefore, the question of what happens when you write to the reserved bits arises and needs to be answered. The normal RISC-V solution is to make them WARL

2) Have a separate bit address for each CSR.  However, the address space for CSRs is small and mostly reserved.  Non-standard read/write CSRs must live in 0x7C0-0x7FF or 0xBC0-0xBFF = 128 total.  Not to mention the additional hardware cost of muxing/demuxing at a bit granularity, which is unacceptable for small targets.
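A minimal sketch of option 1, assuming the simplest WARL behaviour (unsupported bits are dropped on write; the spec only requires that reads return a legal value, so this is one possible choice). The class and helper names are hypothetical. The last line just checks the CSR-count arithmetic from option 2.

```python
class WarlCsr:
    """Write-any, read-legal register: unsupported bits are silently dropped."""

    def __init__(self, supported_mask: int):
        self.supported_mask = supported_mask
        self.value = 0

    def write(self, new_value: int) -> None:
        self.value = new_value & self.supported_mask

    def read(self) -> int:
        return self.value


def supports(csr: WarlCsr, bit_mask: int) -> bool:
    """Probe for a feature by writing the bits and reading them back."""
    old = csr.read()
    csr.write(old | bit_mask)
    present = (csr.read() & bit_mask) == bit_mask
    csr.write(old)  # restore the previous setting
    return present


# Size of the two address ranges quoted in option 2.
custom_rw_csrs = (0x7FF - 0x7C0 + 1) + (0xBFF - 0xBC0 + 1)
```

With this behaviour software can discover at boot which bits exist without ever taking a trap, which is the property I am asking for on minimal implementations.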
 
anyone that suggested that a given microarchitectural implementation was going to ignore bit 17 of all opcodes would cause quite a few laughs.  and isamux is llliiiitittterrrrrraallllllyyyy the addition of a fixed quantity of extra instruction opcode bits into the decode phase.
 
If I don't support C-extension, then I am ignoring bit 0 of the opcode. No one is laughing about that choice.

which switches out active extensions. 
NO. it does NOT "switch out" anything.

actually, with the isamux proposal, the hardware may DYNAMICALLY switch OFF the custom extensions, entirely.  that is ENTIRELY the point and purpose of it.  is this aspect of the isamux proposal something that you understood?

- You, several days ago.

* in 2022 the new ratified LE/BE isamux standard comes out.  ISAMUX bit 0 is chosen to represent it.
* in 2023 the new ratified RVC2 isamux standard comes out.  ISAMUX bit 1 is chosen to represent it.
* a vendor chooses to implement *only* LE/BE but has not implemented RVC2.
* the customers try to run a binary that is compiled with **BOTH** LE/BE **AND** RVC2.
* instructions with LE/BE work perfectly well
* instructions with RVC2 *TRAP* on the writing / setting of ISAMUX bit 1, and require JIT emulation, just as any "legacy" processor would require.
this is absolutely no different a situation from if the *actual* RISC-V instruction set were *already* 34 bits.

It is unthinkable that RISC-V is going to change its instruction length every year.  From both a hardware and software design perspective.  It's a huge disruption to the ecosystem and it will never be done.

Intel, as THE absolute rock solid canonical benchmark / example of how to stick to your guns on the ISA. They have taken backwards compatibility to the absolute inviolate limit, even when 8086, 186, 286, 386 and 486 compatibility makes an absolute pig's ear of the layout.
You should see the ASIC photos for the bit that covers legacy instructions, compared to the rest of the design, it's hilarious.
The other example I already mentioned, it's Altivec / SSE conflict in Power PC. There were people with experience of PPC who confirmed what should be blindingly obvious but clearly wasn't to the muppets that decided to reuse the same opcodes to create utterly incompatible binaries.
MIPS bless 'em probably have something similar, although because of the more proprietary and embedded nature of MIPS it is far less impact because, well, embarrassingly, there isn't a public ecosystem to speak of.
ARM have not made the mistake surprisingly. They're big enough and ugly enough.  The switch to hardfloat 10 years ago was painful for distros but was executed cleanly precisely because there were no incompatibilities, only emulation needed.
 
None of these are comparable to isamux. The only one that is close is Altivec, which from the wikipedia page: "There are also overloaded intrinsic functions such as "vec_add" that emit the appropriate op code based on the type of the elements within the vector, and very strong type checking is enforced." Completely different mechanism.

then you did not see the examples that were given 5 days ago, nor the ones that were given 18 months ago.
and what happens if a mistake is made?

I am talking about an example of a reason to change the existing definition of an existing standard extension, which has not been given anywhere in this thread, despite your exasperation. LE/BE does not count, since I pointed out where it is described in the spec to be handled as a new extension, not a modification of an old one.

ah.  in the "embedded" platform it is, however in the UNIX platform space, it is categorically NOT acceptable.  at all.  failure to trap on illegal instructions will result in non-compliance.

Okay, so you agree that this is not a solution for RV64ABCDEFGIJKLMOPQRTVWXYZ?  This is not a solution for embedded processors?  The vast, vast majority of RISC-V devices sold will not be UNIX processors (2 billion WD procs shipping soon).  Additionally, processors most likely to use custom extensions in the 30-bit encoding space are embedded DSP processors and academic research projects. This is described in the unprivileged ISA spec.  So isamux as proposed does not solve the issue in 99.9999% of cases.

then my understanding is that such a system, by not having applied for a JEDEC mvendorid, would *not* pass RISC-V Conformance tests.

I'm not talking about certifying a processor. I'm talking about using an off-the-shelf compiler for developing a processor.  Surely you don't expect hobbyists to pay money to register their toy? Or alternatively, fork gcc as I suggested.
 
yes it is... and where would those bits accumulate?  *IN THE ISAMUX CSR*!  i said exactly these words only 5 days ago!  please do not make me repeat them time and time again!

I agree that the mechanism is similar.  That's why I suggested it.  There does not need to be a centralized body which governs the arbitration bits. It is by far not the common case, it will complicate systems, and it will slow down adoption. We should not disrupt the existing ecosystem to support it.


Best,
Dan Petrisko





Rogier Brussee

Jun 10, 2019, 6:06:58 PM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


Op maandag 10 juni 2019 18:31:45 UTC+2 schreef Dan Petrisko:
I have been reading the threads. All of them. It is incredibly disrespectful to insinuate otherwise.

[]
 
* in 2022 the new ratified LE/BE isamux standard comes out.  ISAMUX bit 0 is chosen to represent it.
* in 2023 the new ratified RVC2 isamux standard comes out.  ISAMUX bit 1 is chosen to represent it.
* a vendor chooses to implement *only* LE/BE but has not implemented RVC2.
* the customers try to run a binary that is compiled with **BOTH** LE/BE **AND** RVC2.
* instructions with LE/BE work perfectly well
* instructions with RVC2 *TRAP* on the writing / setting of ISAMUX bit 1, and require JIT emulation, just as any "legacy" processor would require.
this is absolutely no different a situation from if the *actual* RISC-V instruction set were *already* 34 bits.

It is unthinkable that RISC-V is going to change its instruction length every year.  From both a hardware and software design perspective.  It's a huge disruption to the ecosystem and it will never be done.


I think some of the confusion is due to the name ISAMUX. Can I suggest using the name ISANS, for ISA namespace?  For every value of this CSR you should have a different clean ISA namespace of all 16, 32, 48, ... bit instructions (I think this is the point Luke is trying to make with bits 33, 34, ... of the ISA). Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).

Now 2^32 namespaces may be slightly exaggerated: I imagine you would have "global namespaces", like a hypothetical RVCv2, that people will want to run for a whole program and that people want to extend. That would mean they are essentially just feature bits. But it should also be possible to set and unset ISANS locally in a function, e.g. for using a hypothetical on-chip hardware RDMA interface with a long namespace number (probably the most prized interface numbers are those in the top 20 bits, because LUI can provide a 20-bit immediate in one instruction). In theory, after setting ISANS, every bit in the ISA is yours, but in practice you would, say, reuse IMAC, because networking needs integer processing, and, say, the FMADD major opcode of the ISA, because you wanted to reuse the decoder for 3 input registers and you don't care about FMADD while networking. Perhaps even more likely, you could use custom major opcodes, safe in the knowledge that, because of namespacing, this does not trample over anybody else's custom opcode.
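A small sketch of the LUI remark, assuming a 32-bit ISANS value and the RV32 view of LUI (the 20-bit immediate lands in bits 31..12, the low 12 bits are zero). The function and constant names are hypothetical.

```python
def loadable_with_single_lui(namespace_id: int) -> bool:
    """True if a 32-bit namespace number has its low 12 bits clear."""
    if not 0 <= namespace_id < (1 << 32):
        raise ValueError("namespace number must fit in 32 bits")
    return namespace_id & 0xFFF == 0


def lui_immediate(namespace_id: int) -> int:
    """The 20-bit immediate LUI would need for such a namespace number."""
    if not loadable_with_single_lui(namespace_id):
        raise ValueError("needs more than one instruction to materialise")
    return namespace_id >> 12


ALL_NAMESPACES_32 = 2 ** 32        # every value of a 32-bit ISANS
SINGLE_LUI_NAMESPACES = 2 ** 20    # those reachable with one LUI
```

So of the 2^32 possible numbers, 2^20 can be put in a register with a single instruction before the CSR write; everything else costs at least one more.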

Note1: If such an ISANS scheme is adopted by the standard, I see no particular reason why being a standardised instruction would _necessarily_ imply  being in the default global namespace. The main advantage of being in the global namespace would be, not having to switch  namespaces back and forth with a CSR instruction. The disadvantage would be, taking up precious space in the default global namespace that can no longer be used for other purposes.
Note2: I can imagine namespaces being per privilege level, but it seems ISANS will have to be saved and restored with a change of privilege level anyway. Likewise one would have to think about the impact on (user level) interrupts.
Note3:   Perhaps in such a scheme, MISA should be set per namespace. In the RDMA example: while in the RDMA networking extension namespace you would no longer fully support F and D and MISA would reflect that.

  
[]
ah.  in the "embedded" platform it is, however in the UNIX platform space, it is categorically NOT acceptable.  at all.  failure to trap on illegal instructions will result in non-compliance.

Okay, so you agree that this is not a solution for RV64ABCDEFGIJKLMOPQRTVWXYZ?  This is not a solution for embedded processors?  The vast, vast majority of RISC-V devices sold will not be UNIX processors (2 billion WD procs shipping soon).  Additionally, processors most likely to use custom extensions in the 30-bit encoding space are embedded DSP processors and academic research projects. This is described in the unprivileged ISA spec.  So isamux as proposed does not solve the issue in 99.9999% of cases.

then my understanding is that such a system, by not having applied for a JEDEC mvendorid, would *not* pass RISC-V Conformance tests.

I'm not talking about certifying a processor. I'm talking about using an off-the-shelf compiler for developing a processor.  Surely you don't expect hobbyists to pay money to register their toy? Or alternatively, fork gcc as I suggested.
 
yes it is... and where would those bits accumulate?  *IN THE ISAMUX CSR*!  i said exactly these words only 5 days ago!  please do not make me repeat them time and time again!

I agree that the mechanism is similar.  That's why I suggested it.  There does not need to be a centralized body which governs the arbitration bits. It is by far not the common case, it will complicate systems, and it will slow down adoption. We should not disrupt the existing ecosystem to support it.


One could easily imagine the namespaces being partitioned into "to be used by the standard", "registered", and "free for experimentation with a random namespace number".

Ciao

Rogier

Dan Petrisko

Jun 10, 2019, 7:54:15 PM6/10/19
to Rogier Brussee, RISC-V ISA Dev, Luke Kenneth Casson Leighton, Luke Kenneth Casson Leighton
Thanks for the clarification Rogier!

I do like 'isa namespaces' better.  As long as ISANS is WARL so that implementations which only support the default namespace do not trap, it seems reasonable.

Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).  

I very much hope it is :).  One of the nice things about RISC-V is its smallness.  In the RISC-V Reader, they espouse the virtues of having the entire ISA fit in a small book, rather than the several volumes of say, Power.  As a community we should try to avoid "kitchen-sink" ISA proliferation.  It would be much better if a few custom extensions become widely adopted and then enshrined in the standard, rather than tons and tons of custom extensions become relatively popular, conflicting with each other, causing toolchain complications and software developer confusion.

Note2: I can imagine namespaces being per privilege level, but it seems ISANS will have to be saved and restored with a change of privilege level anyway. Likewise one would have to think about the impact on (user level) interrupts.

Potentially very cool idea, has interesting implications for virtualization.

One could easily imagine the namespaces being partitioned into "to be used by the standard", "registered", and "free for experimentation with a random namespace number".

My opinion: The extension discovery/enable/disable mechanism is a platform issue -- embedding it into the user-level spec is a mistake.  A standard mechanism could have a place in a separate platform specification. However, many systems which handle some complicated set of non-standard extensions may want their own mechanisms.

I could be wrong, I don't have a ton of experience in BSP/ucontroller architectures.  My impression is that the toolchains are very custom.

Best,
Dan Petrisko

 


lk...@lkcl.net

Jun 10, 2019, 8:18:57 PM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
dan,

i apologise for saying that you had not read the discussion: you had made several comments that had been repeated, and demonstrated more than once that you'd not understood - or missed - some of the logical reasoning, some of it even within a couple of paragraphs, and one idea repeated near-verbatim from allen.

from this repeated pattern it is quite reasonable to conclude that you had not read the discussion.

in particular, you missed that the suggestion that a custom extension create a "discernment bit" is literally the same as an isamux bit: when combined with the next extension that does the same thing, you have precisely and exactly the same circumstances that you stated would be "fatal" to the entire isamux scheme.

in addition, you're making completely logical deductions, "X has happened which invalidates the need for Y, therefore in the future the need for dealing with a similar Y occurrence is completely unnecessary", which is clearly not true.

this leaves me to conclude that you are not equipped to make a sufficiently comprehensive and in-depth analysis of this space.

i trust that you will not be involved in the decision-making of RISC-V extensions, as you are unable to deal with the complexities and unable to strategically think ahead.

l.

Jacob Lifshay

Jun 10, 2019, 8:25:24 PM6/10/19
to Luke Kenneth Casson Leighton, RISC-V ISA Dev, Luke Kenneth Casson Leighton, rogier....@gmail.com
Luke, if you're saying that someone is unable to do something ever, that's extremely rude, and I would suggest apologising. Pretty much all people are quite capable of learning in the future even if they don't understand something now.

Sincerely,
Jacob Lifshay


lk...@lkcl.net

Jun 10, 2019, 9:06:31 PM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Monday, June 10, 2019 at 11:06:58 PM UTC+1, Rogier Brussee wrote:
 
I think some of the confusion is due to the name ISAMUX. Can I suggest using the name ISANS, for ISA namespace?

the name doesn't matter.  it may be called ISANS.  it is exactly the same as both dan and allen's suggestion to have "discernment" bits, which, when accumulated, become ISAMUX / ISANS.
 
For every value of this CSR you should have a different clean ISA namespace of all 16, 32, 48, ... bit instructions

yes, where:

* when ISANS[0..N] == 0b0000000...., representing N bits added to the instruction decode, that is the "legacy" (present) RISC-V ISA.
* there are massive overlaps between the permutations, leaving the vast majority of opcodes unmodified in the vast majority of use-cases.
* however there are even "MIPS" ISANS bits, and "ARM" ISANS bits, and "x86" ISANS bits

(I think this is the point Luke is trying to make with bits 33, 34, ... of the ISA). Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).

this is intended as an "emergency" provision.  if the community has to go beyond that, 32 times or 64 times, it would be... quite eyebrow-raising.  i'd expect it to take... 70+ years to reach that point.



Now 2^32 namespaces may be slightly exaggerated:

not if they're allocated for particular purposes, they're not, because of the huge overlaps.  there's been several examples already:

* RDMA ISANS (see rogier's example, which re-uses large portions of the standard RV opcode space)
* LE/BE ISANS - only the LD/ST namespace changes yet the rest *has* to stay the same
* RVCv2 ISANS - only the RVC namespace changes yet the rest *has* to stay the same.
* fixing (redesigning) the INT opcodes to be more like the FP ones, so that RV32 binaries run directly on RV64 without requiring a switch to RV32 mode
* MIPS ISANS
* ARM ISANS
* x86 ISANS
* SimpleV ISANS - *NONE* of the opcodes change, however the number of register bits is extended (from 32 to 128) and several other things

so the overlap very quickly reduces the namespace.


I imagine you would have "global namespaces", like a hypothetical RVCv2, that people will want to run for a whole program and that people want to extend. That would mean they are essentially just feature bits.

it's critically important to view them as going directly and mandatorially into the instruction decoder, as 33rd, 34th and 35th (and so on) instruction bits.
 
But it should also be possible to set and unset ISANS locally in a function

absolutely.  you understand it perfectly.  so you understand also that it is important to have a convention that the function push the former ISANS onto the stack, and restore it back to the prior value as quickly as possible.
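a minimal sketch of that save/restore convention, assuming ISANS is modelled as a plain attribute on a hypothetical `Hart` object (on real hardware this would be CSR reads and writes around the function body, with the saved value on the stack).

```python
from contextlib import contextmanager


class Hart:
    """Hypothetical stand-in for one hardware thread's ISANS CSR."""

    def __init__(self):
        self.isans = 0  # 0 = default ("legacy") namespace


@contextmanager
def isans_scope(hart: Hart, namespace: int):
    """Switch namespace for a block and always restore the previous one."""
    saved = hart.isans          # "push the former ISANS onto the stack"
    hart.isans = namespace
    try:
        yield hart
    finally:
        hart.isans = saved      # restored even if the body raises


hart = Hart()
with isans_scope(hart, 0x5):        # 0x5 is an arbitrary example number
    inside = hart.isans             # custom namespace active here
after = hart.isans                  # back to the caller's namespace
```

because the restore happens on every exit path, nested scopes unwind in the right order and the caller never sees a foreign namespace.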

i hesitate to recommend that ISANS be allowed to be set and function calls permitted, except that i can foresee circumstances where that may actually be genuinely needed.

in the case of other ISAs that have a LE/BE dynamic switch, usually what they do is: they actually just set the bit globally during the boot phase, and have the *entire* suite of packages statically recompiled for LE or statically compiled for BE.

thus we get debian distros compiled completely separately for LE and BE yet they run on the exact same hardware.


e.g. for using a hypothetical on-chip hardware RDMA interface

can we assume that you are referring to a "Remote Direct Memory Access" scheme? https://en.wikipedia.org/wiki/Remote_direct_memory_access
 
with a long namespace number (probably the most prized interface numbers are those in the top 20 bits, because LUI can provide a 20-bit immediate in one instruction). In theory, after setting ISANS, every bit in the ISA is yours, but in practice you would, say, reuse IMAC, because networking needs integer processing, and, say, the FMADD major opcode of the ISA, because you wanted to reuse the decoder for 3 input registers and you don't care about FMADD while networking. Perhaps even more likely, you could use custom major opcodes, safe in the knowledge that, because of namespacing, this does not trample over anybody else's custom opcode.

eeexactlyyyy.
 

Note1: If such an ISANS scheme is adopted by the standard, I see no particular reason why being a standardised instruction would _necessarily_ imply  being in the default global namespace. The main advantage of being in the global namespace would be, not having to switch  namespaces back and forth with a CSR instruction. The disadvantage would be, taking up precious space in the default global namespace that can no longer be used for other purposes.

you're suggesting the ISAMUX/ISANS setting be made an actual RV32 opcode?  that would be unnecessary because CSRRW/S/C (etc.) perform the required task perfectly well, and, yes, RV32 opcode space is extremely precious.

plus, remember: if one RV32 opcode space is taken up, it's taken up across *all* namespaces (pretty much) in some form.  it gets particularly interesting when switching to foreign ISAs.  the foreign ISA has to provide a mechanism for switching back to RISC-V (or other namespaces).
 
Note2: I can imagine namespaces being per privilege level,

yes.  this was part of the original discussion 18 months ago.
 
but it seems ISANS will have to be saved and restored with a change of privilege level anyway.

it's a little more complex than that.  each privilege level needs to know that whatever it previously set, when it comes back to that privilege level, it's not going to be asked to execute completely alien instructions.

the example that was given 18 months ago was x86 or MIPS ISAMUX/ISANS.  a trap occurs in the MIPS ISANS: do you let the trap be handled in the RISC-V ISANS? no, not unless you want garbled instructions!

this is a more extreme case, however it makes the point that even for the less extreme cases (RVCv2, LE/BE) some considerable care and thought is needed, as the privileged level has to know that it can safely execute all and any instructions.
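
One way to picture the requirement is a toy model in which each privilege level keeps its own ISANS value and the decoder always uses the value belonging to the current level. This is only a sketch under that assumption: the `Hart` class, the per-level dictionary, and the namespace number `FOREIGN` are hypothetical and not taken from the original 18-month-old proposal.

```python
class Hart:
    """Toy hart with one (hypothetical) ISANS value per privilege level."""
    def __init__(self):
        self.isans = {"U": 0, "S": 0, "M": 0}  # assumption: 0 = plain RISC-V
        self.level = "U"
        self._stack = []

    def active_namespace(self):
        """Namespace the instruction decoder would use right now."""
        return self.isans[self.level]

    def trap(self, to_level):
        """Enter a handler: decoding switches to the handler level's ISANS."""
        self._stack.append(self.level)
        self.level = to_level

    def ret(self):
        """Return to the interrupted level, and with it to that level's ISANS."""
        self.level = self._stack.pop()

FOREIGN = 0x7F000000          # made-up number standing in for a foreign ISA
hart = Hart()
hart.isans["U"] = FOREIGN     # user code runs in the foreign namespace
hart.trap("M")
in_handler = hart.active_namespace()
hart.ret()
back_in_user = hart.active_namespace()
```

In this model the M-mode handler never sees the user's foreign namespace, so it is never asked to decode instructions it did not choose.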

Likewise one would have to think about the impact on (user level) interrupts.

yes.  again, this was part of the original discussion 18 months ago.
 
Note3:   Perhaps in such a scheme, MISA should be set per namespace. In the RDMA example: while in the RDMA networking extension namespace you would no longer fully support F and D and MISA would reflect that.


mmmm yyyeahhh, you're right.  certain types of ISAMUX/ISANS switches will indeed need to actually switch/redirect to entire new banks of alternate CSRs.

the extreme cases (MIPS ISAMUX/ISANS) demonstrate this particularly clearly, as they switch the *entirety* of the CSRs, register files - everything - out, whilst they are active.

 
One could easily imagine the namespaces be partitioned in  "to be used by the standard", "registered", and "free for experimentation with a random namespace number".

exactly.  so dan's suggestion, which is exactly and precisely the same as the ISAMUX / ISANS concept, is to not have reserved, registered *or* experimentation, dan's suggestion is to "leave it entirely to the custom space, with no organisation of any kind" - clearly this suggestion is unworkable as it creates chaos.

it's very involved.  there is a lot going on.

l.



 

lk...@lkcl.net

Jun 10, 2019, 9:06:58 PM6/10/19
to RISC-V ISA Dev, rogier....@gmail.com, lk...@lkcl.net, luke.l...@gmail.com


On Tuesday, June 11, 2019 at 12:54:15 AM UTC+1, Dan Petrisko wrote:
 
I do like 'isa namespaces' better.  As long as ISANS is WARL

no. 

Schuyler Eldridge

Jun 10, 2019, 9:13:32 PM6/10/19
to lk...@lkcl.net, RISC-V ISA Dev, luke.l...@gmail.com, rogier....@gmail.com
On Mon, Jun 10, 2019 at 05:18:57PM -0700, lk...@lkcl.net wrote:
> this leaves me to conclude that you are not equipped to make a sufficiently
> comprehensive and in-depth analysis of this space.
>
> i trust that you will not be involved in the decision-making of RISC-V
> extensions, as you are unable to deal with the complexities and unable to
> strategically think ahead.

This type of language is toxic, unprofessional, and needs to stop.

lk...@lkcl.net

Jun 10, 2019, 9:24:18 PM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Tuesday, June 11, 2019 at 1:25:24 AM UTC+1, Jacob Lifshay wrote:
Luke, if you're saying that someone is unable to do something ever, that's extremely rude,

and also not true (as you know).  so if it was implied that he (yes, i'm aware that i'm referring to you in the 3rd person, dan: no disrespect implied by doing so) was pathologically incapable of learning, that is clearly false / absurd / plain wrong / unintentional.
 
and I would suggest apologising.

or clarifying.

Pretty much all people are quite capable of learning in the future even if they don't understand something now.

indeed.  it's why i refrained from explicitly saying that i was going to give up and abandon this discussion, despite being seriously tempted to do so, as i have hope that, after he gets over the shock, he'll actually think through the logical reasoning/deduction mistakes that he's made, take more care in the future, and will make valuable contributions.

l.

Dan Cross

Jun 10, 2019, 9:24:28 PM6/10/19
to Schuyler Eldridge, lk...@lkcl.net, RISC-V ISA Dev, luke.l...@gmail.com, rogier....@gmail.com
Indeed. The constant bullying from this one particular list participant makes for a very unpleasant forum.

        - Dan C.

lk...@lkcl.net

Jun 10, 2019, 9:26:00 PM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com
it's just reality that needs to be accepted.  you can say whatever you like that helps you to reject and deny reality, if you so choose.

l. 

lk...@lkcl.net

Jun 10, 2019, 9:38:24 PM6/10/19
to RISC-V ISA Dev, schuyler...@gmail.com, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Tuesday, June 11, 2019 at 2:24:28 AM UTC+1, Dan Cross wrote:
 
Indeed. The constant bullying from this one particular list participant makes for a very unpleasant forum.


"he" is right here, dan.  did you notice that i took great care, when referring to people in the third person, to make a note that i was aware that they were being so discussed, and to explicitly point out to them that there was no disrespect intended by doing so?

referring to someone in the third person excludes them from the conversation and makes them unwelcome.  it is terribly insulting, and so, hypocritically, you have criticised *me* for being a "bully"... using techniques that are known to be used by bullies.

can you see that that's what you did, and how it's just as unacceptable as the misconception that i am "out to be a bully".  i've *been* subjected to sustained bullying (at a boarding school that i attended for six years) - they were the worst years of my life.

i *know* what it's like.

do you *really* think that i am sitting here, typing this, looking to utilise it as a way to *deliberately* subject someone else to pain, in order to "get a kick out of seeing them suffer"??

if so, i have to say, "what the ****???"

so you cannot possibly be using that word "bully" to refer to me.  either that, or you do not understand the true meaning of the word.


this is an exceptionally complex area which took *months* to go over, and the only reason i'm pursuing it, despite being hated by all of you for doing so, is because the consequences for RISC-V if it is gotten wrong are too severe to let happen.

so let's all take a step back, calm down, and go over this carefully and in a respectful fashion, ok?  no more "reductio ad absurdum".

l.


Jacob Lifshay

Jun 10, 2019, 9:41:42 PM6/10/19
to Luke Kenneth Casson Leighton, RISC-V ISA Dev, schuyler...@gmail.com, luke.l...@gmail.com, rogier....@gmail.com
On Mon, Jun 10, 2019, 18:38 lk...@lkcl.net <lk...@lkcl.net> wrote:
so let's all take a step back, calm down, and go over this carefully and in a respectful fashion, ok?  no more "reductio ad absurdum".
Sounds good to me. :)

Schuyler Eldridge

Jun 10, 2019, 10:40:11 PM6/10/19
to lk...@lkcl.net, RISC-V ISA Dev, luke.l...@gmail.com, rogier....@gmail.com
On Mon, Jun 10, 2019 at 06:26:00PM -0700, lk...@lkcl.net wrote:
> On Tuesday, June 11, 2019 at 2:13:32 AM UTC+1, Schuyler Eldridge wrote:
> > On Mon, Jun 10, 2019 at 05:18:57PM -0700, lk...@lkcl.net
> > wrote:
> > > this leaves me to conclude that you are not equipped to make a sufficiently
> > > comprehensive and in-depth analysis of this space.
> > >
> > > i trust that you will not be involved in the decision-making of RISC-V
> > > extensions, as you are unable to deal with the complexities and unable to
> > > strategically think ahead.
> >
> > This type of language is toxic, unprofessional, and needs to stop.
> >
>
> it's just reality that needs to be accepted. you can say whatever you like
> that helps you to reject and deny reality, if you so choose.

The reality is that the RISC-V community is a welcoming, courteous
environment for technical discussion and development doing a whole lot
of amazing things.

The *state* of these mailing lists is too frequently the opposite.

Bullying students? Stating that they will not be involved in the
decision making process? Insinuating that their intelligence is
insufficient for analytical reasoning?

That's naked harassment. Plain and simple.

Harassment and bullying are antithetical to the RISC-V community. That
is not, I repeat *NOT*, the reality that needs to be accepted.

I repeat, more strongly, this *MUST* stop.

lk...@lkcl.net

Jun 10, 2019, 11:15:43 PM6/10/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com


On Tuesday, June 11, 2019 at 3:40:11 AM UTC+1, Schuyler Eldridge wrote:
 
The reality is that the RISC-V community is a welcoming, courteous
environment for technical discussion and development doing a whole lot
of amazing things.


schuyler: this is a statement of desire (a wish and a goal), not a reflection of reality.  i trust that you recognise the difference...
 
The *state* of these mailing lists is too frequently the opposite.

 ... which you demonstrate here.


Bullying students? Stating that they will not be involved in the
decision making process? Insinuating that their intelligence is
insufficient for analytical reasoning?

That's naked harassment.

no, it's a statement of fact.  saying "you're a f****g moron because you're incapable of basic rational thought, get off these lists before you do even more damage than you already have", *that* is what's called harassment.

can you see the difference?  i made a statement of fact, based on the fact that Dan (again, apologies for referring to you in the third person) said that he read everything, yet he made near word-for-word the exact same suggestion that Allen had made only five days beforehand... and yet claimed to have read everything.  as this was not the only example, it is quite reasonable to make the logical deduction that i did, and said so.

now, i appreciate that it's not very _nice_ to say what i said: that doesn't make it any the less true.  this is what i mean when i say that reality needs to be accepted.

reality can indeed not be very nice!  once it is accepted, the opportunity exists to make the changes needed to progress towards the goal - the desired (and we assume much better) outcome, through a process of corrective feedback.

if however we do not ACCEPT reality as it is, how on earth can the process of corrective feedback even be applied??

you see how that works?


Harassment and bullying are antithetical to the RISC-V community.

you should have said that when you saw the harassment and bullying that i have been repeatedly subjected to on this list, over the past eighteen+ months.  one person has actually followed me to other lists and forums, and subjected me to sustained harassment and abuse there as well.  he sees it as a "personal crusade".  it's really quite scary.

That 
is not, I repeat *NOT*, the reality that needs to be accepted.


you misunderstand completely, as you've seen certain trigger-words and are upset by them.  this is quite common.

I repeat, more strongly, this *MUST* stop.

with apologies - and making it clear that you've misunderstood that i am operating faithfully with intent here throughout everything that i've said - i am compelled to point out that you don't have the right to tell me what to do.

sorry, schuyler.

give it a few days to reflect, ok?

in the meantime, can i recommend that people review the original discussion, bear in mind that it was... exceptionally long, and just as contentious.  i documented it as best that i could, from feedback from a wide range of contributors, and included links to the discussions.


this really is an extremely complex (involved) area that has huge ramifications.

with that: i apologise, but i am in such overwhelming physical pain at the moment that i'm going to have to take a break for a few days.  normally someone in the level of pain that i am in would seek immediate hospital treatment, however as i am in a foreign country, do not have medical insurance, and have insufficient financial resources, that's just not possible.

l.

Dan Cross

Jun 11, 2019, 12:17:51 AM6/11/19
to lk...@lkcl.net, RISC-V ISA Dev, luke.l...@gmail.com
This will be my only response in the matter. I do not wish to engage with the toxic bullying behavior displayed by this abusive poster.

On Mon, Jun 10, 2019 at 9:38 PM lk...@lkcl.net <lk...@lkcl.net> wrote:
On Tuesday, June 11, 2019 at 2:24:28 AM UTC+1, Dan Cross wrote:
Indeed. The constant bullying from this one particular list participant makes for a very unpleasant forum.

[snip]
referring to someone in the third person excludes them from the conversation and makes them unwelcome.

And yet you replied, so clearly you did not feel excluded. Truthfully though, I have no interest in engaging you in conversation -- technical or otherwise -- because you are objectively abusive and, yes, a bully. Beyond the one-time act of adding mine to the chorus of voices protesting your behavior, I do not feel the need to further give you the satisfaction.

it is terribly insulting, and so, hypocritically, you have criticised *me* for being a "bully"... using techniques that are known to be used by bullies.

can you see that that's what you did, and how it's just as unacceptable as the misconception that i am "out to be a bully".  i've *been* subjected to sustained bullying (at a boarding school that i attended for six years) - they were the worst years of my life.

This is typical bully behavior: attempt to silence others through aggression and then, when someone stands up to you, retreat to a sob story in a further attempt at silencing your target. It is an abusive power-play; a pattern that you often publicly repeat.

i *know* what it's like.

Then why do you do it to others? You act as though being treated poorly as a school boy gives you a license to behave poorly towards others as an adult. It does not.

do you *really* think that i am sitting here, typing this, looking to utilise it as a way to *deliberately* subject someone else to pain, in order to "get a kick out of seeing them suffer"??

Yes.

I've long lurked on these lists. From you I've seen repeated ad hominem attacks, unprofessional displays of extreme emotion, bizarre legal threats, and strangely entitled demands for others' otherwise reasonable behavior to change to satisfy some elastic sense of having been wronged (but being incapable of doing such wrong yourself) and a narcissistic sense of superiority feeding delusions of grandeur. All of which are on objective display, even if obscured in walls of unreadable text.

Ordinarily I ignore it, (note that I almost never post here) but really, it lessens the quality of the lists to such a degree that it ought to be addressed. It is sufficiently pervasive that I strongly suspect people leave the RISC-V community rather than put up with the toxicity; valuable contributors already say that they feel the RISC-V lists are unsalvageable because of antics like yours, in posts telling you to keep the behavior away from lists where real work actually gets done.

if so, i have to say, "what the ****???"

Bluntly, your behavior is abusive and toxic and you present yourself as the archetypical unbridled bully.

so you cannot possibly be using that word "bully" to refer to me.  either that, or you do not understand the true meaning of the word.

No. I use that word because it precisely and succinctly describes your behavior. If you don't like it, then don't behave that way.

this is an exceptionally complex area which took *months* to go over, and the only reason i'm pursuing it, despite being hated by all of you for doing so, is because the consequences for RISC-V if it is gotten wrong are too severe to let happen.

Let me state it explicitly, then.

Whatever modicum of technical acumen you bring to these lists is vastly outweighed by the disruption and harm you do through your strange attitude, extreme sense of entitlement and overall poor behavior. When someone calls you on your repeated pattern of abuse and bully antics, you cry foul and indulge in such a public display of self-pity that Lady Godiva would be green with envy.

Further, while there is a long tradition in the computer industry of individuals cultivating pretentious but harmless eccentricities, whatever charm there is in that ends when the eccentricity veers into abuse: you are way, way past that line. It is unfortunate that you evidently suffered through your time in boarding school but, and while this may be a bitter pill to swallow, no one owes you anything for that experience and no one is under any obligation to suborn themselves to your cruelty as a result.

If you do not wish to be categorized as I (and undoubtedly others) have categorized you, then behave like a reasonable professional adult. You claim to care about this community: prove that with your actions.

It's really as simple as that.

        - Dan C.

lkcl

Jun 11, 2019, 2:40:58 AM6/11/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com
On Tuesday, June 11, 2019 at 12:17:51 PM UTC+8, Dan Cross wrote:
> This will be my only response to in the matter.

Dan, your honesty is appreciated, I absolutely mean that. You've noticed that I simply do not operate according to conventional psychological behavioural patterns, and I have genuine difficulty identifying both when people have acted offensively towards me, and vice versa.

This lack leaves me with little option but to be plainly and completely honest, and unfortunately, as you've witnessed, rather than help a situation, such brutal honesty is frequently misunderstood and misinterpreted as speaking with vicious, spiteful, vengeful, callous and intentionally malicious intent.

I have absolutely no idea how to deal with this misperception.

Additionally, the lack of ability to spot when others have acted towards me in ways that would be immediately spotted by others as being offensive has a quite serious consequence: it allows people to get away with and continue to use completely inappropriate behaviour, whilst allowing them to accuse me of that very same offensive behaviour.

This isn't deliberate: it's simply an unintended side-effect.

The only thing that I can do is apologise to all concerned: beyond that, I can only apologise in advance that I have set a goal that needs to be completed, and there is nothing that I will let stand in the way of that (including my own limitations). Yes, this is not normal behaviour, it is quite pathological.

Lastly, it's worth saying, Dan (and Johnathon), neither of you did anything "wrong". You spoke your minds, as you should. Again, Dan, thank you, sincerely and genuinely, for speaking up and being so honest, particularly in such a public way. I'm sorry I have such limited understanding of human behaviour.

Warmest,

L.

Rogier Brussee

Jun 11, 2019, 5:01:28 AM6/11/19
to RISC-V ISA Dev, rogier....@gmail.com, lk...@lkcl.net, luke.l...@gmail.com


Op dinsdag 11 juni 2019 01:54:15 UTC+2 schreef Dan Petrisko:
Thanks for the clarification Rogier!

I do like 'isa namespaces' better.  As long as ISANS is WARL so that implementations which only support the default namespace do not trap, it seems reasonable.

I think it would be better to not be WARL, but to trap to M-mode if an instruction in that namespace happens to be unsupported just like any other unknown instruction traps and is deferred to M-mode. M-mode may emulate the instruction or decide the namespaced-instruction is an unrecognised/illegal instruction and take the usual measures. Some form of discovery mechanism/device tree description is probably useful just like for instructions in the basic global ISA namespace.


Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).  

I very much hope it is :). 

A bit of tongue in cheek. But I can imagine people could be using slews of namespaces parametrised by feature bits (e.g. RDMA namespace with and without RVCv2), which would mean that 2^32 may be less than it seems. Still I agree.  
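
A quick back-of-the-envelope sketch of that concern: if some bits of the namespace number are spent on feature flags, only the remaining bits distinguish base namespaces. The function name and the choice of 8 feature bits are illustrative assumptions, not part of any proposal in this thread.

```python
def base_namespaces(csr_bits, feature_bits):
    """Distinct base namespaces left once `feature_bits` are used as flags."""
    if not 0 <= feature_bits <= csr_bits:
        raise ValueError("feature_bits must be between 0 and csr_bits")
    return 2 ** (csr_bits - feature_bits)

plain_32 = base_namespaces(32, 0)      # every value is its own namespace
with_flags_32 = base_namespaces(32, 8) # 8 bits reserved for feature variants
```

With 8 flag bits a 32-bit CSR still leaves about 16.7 million base namespaces, so the space shrinks quickly but remains large.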

 
One of the nice things about RISC-V is its smallness.  In the RISC-V Reader, they espouse the virtues of having the entire ISA fit in a small book, rather than the several volumes of say, Power.  As a community we should try to avoid "kitchen-sink" ISA proliferation.  It would be much better if a few custom extensions become widely adopted and then enshrined in the standard, rather than tons and tons of custom extensions become relatively popular, conflicting with each other, causing toolchain complications and software developer confusion.

Note2: I can imagine namespaces being per privilege level, but it seems ISANS will have to be saved and restored with a change of privilege level anyway. Likewise one would have to think about the impact on (user level) interrupts.

Potentially very cool idea, has interesting implications for virtualization.

One could easily imagine the namespaces be partitioned in  "to be used by the standard", "registered", and "free for experimentation with a random namespace number".

My opinion: The extension discovery/enable/disable mechanism is a platform issue -- embedding it into the user-level spec is a mistake.  A standard mechanism could have a place in a separate platform specification. However, many systems which handle some complicated set of non-standard extensions may want their own mechanisms.


It would be more of a convention to use and a statement similar to claiming parts of the ISA bits reserved for use by the standard. 
 

I could be wrong, I don't have a ton of experience in BSP/ucontroller architectures.  My impression is that the toolchains are very custom.


Nor have I (in fact none). I just tried to isolate and clarify what seems like a potentially useful and natural idea that seemed to drown in overheated discussion. But enough has been said on that matter.

Ciao

Rogier

 
Best,
Dan Petrisko

 

To unsubscribe from this group and stop receiving emails from it, send an email to isa...@groups.riscv.org.

Daniel Petrisko

Jun 11, 2019, 9:36:18 AM6/11/19
to Rogier Brussee, RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com
I think it would be better to not be WARL, but to trap to M-mode if an instruction in that namespace happens to be unsupported just like any other unknown instruction traps and is deferred to M-mode.

WARL gives you 3 options:
1) Don’t allow the bit to be set. Then U-mode software knows to emulate. 
2) Allow the bit to be set, but trap as illegal and emulate in M-mode. (The behavior you’re describing)
3) Allow the bit to be set, implement the namespace. 

It allows systems which don’t have a trap mechanism (e.g. only implement the unprivileged spec) to use it. 

Making it WLRL only allows option 2 or 3, and mandates a trap mechanism. 

Can you explain why WLRL is the better option?
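
The WARL behaviour in the three options above can be sketched with a toy model: software writes the namespace value, reads the CSR back, and treats a mismatch as "not available here". The `WarlIsans` class, the `probe` helper, and `RDMA_BIT` are hypothetical names; the model only captures the write-any/read-legal rule, not a real implementation.

```python
class WarlIsans:
    """Toy WARL CSR: any value may be written, only legal bits read back."""
    def __init__(self, writable_mask):
        self.writable_mask = writable_mask
        self.value = 0

    def write(self, v):
        self.value = v & self.writable_mask   # unsupported bits stay zero

    def read(self):
        return self.value

def probe(csr, ns):
    """Write-then-read discovery: True if the namespace bits stuck."""
    old = csr.read()
    csr.write(ns)
    ok = csr.read() == ns
    csr.write(old)                            # leave the CSR as we found it
    return ok

RDMA_BIT = 1 << 12                                 # made-up namespace bit
default_only = WarlIsans(writable_mask=0)          # option 1: bit cannot be set
with_rdma = WarlIsans(writable_mask=RDMA_BIT)      # option 2 or 3: bit can be set
```

Note that a successful probe cannot distinguish option 2 from option 3: the bit sticks whether the namespace is implemented natively or trapped and emulated in M-mode.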

Best,
Dan Petrisko

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

lkcl

Jun 11, 2019, 9:49:55 AM6/11/19
to RISC-V ISA Dev, rogier....@gmail.com, lk...@lkcl.net, luke.l...@gmail.com
On Tuesday, June 11, 2019 at 9:36:18 PM UTC+8, Dan Petrisko wrote:
> I think it would be better to not be WARL, but to trap to M-mode if an instruction in that namespace happens to be unsupported just like any other unknown instruction traps and is deferred to M-mode.
>
>
>
> WARL gives you 3 options:
> 1) Don’t allow the bit to be set. Then U-mode software knows to emulate. 
> 2) Allow the bit to be set, but trap as illegal and emulate in M-mode. (The behavior you’re describing)
> 3) Allow the bit to be set, implement the namespace. 
>
>
> It allows systems which don’t have a trap mechanism (e.g. only implement the unprivileged spec) to use it. 
>
>
> Making it WLRL only allows option 2 or 3, and mandates a trap mechanism. 
>
>
> Can you explain why WLRL is the better option?

Daniel: this is an extremely well thought out, respectful and open minded approach. Why on earth did you not take this approach before??

I will leave it to Rogier to explain, as you respond well to his insights.

Once he has done so I will be able to highlight a crucial caveat that only makes sense once the reason for WLRL has been made clear.

L.

Rogier Brussee

Jun 11, 2019, 10:05:33 AM6/11/19
to RISC-V ISA Dev, lk...@lkcl.net, luke.l...@gmail.com, rogier....@gmail.com

Note1: If such an ISANS scheme is adopted by the standard, I see no particular reason why being a standardised instruction would _necessarily_ imply  being in the default global namespace. The main advantage of being in the global namespace would be, not having to switch  namespaces back and forth with a CSR instruction. The disadvantage would be, taking up precious space in the default global namespace that can no longer be used for other purposes.

you're suggesting the ISAMUX/ISANS setting be made an actual RV32 opcode?  that would be unnecessary because CSRRW/S/C (etc.) perform the required task perfectly well, and, yes, RV32 opcode space is extremely precious.

plus, remember: if one RV32 opcode space is taken up, it's taken up across *all* namespaces (pretty much) in some form.  it gets particularly interesting when switching to foreign ISAs.  the foreign ISA has to provide a mechanism for switching back to RISC-V (or other namespaces).


No I was not suggesting that: as you wrote, using CSR instructions  should work fine (at least as long as the namespace has these instructions or any other cheap way to escape back out of that namespace).

I was probably not so clear however. I mean that _if_ ISANS is part of the standard, then whoever defines the standard has three options.
* Find room in the 32 bit ISA, making the instruction cheap to access but taking up precious bits assuming they are available in the first place.
* Define an instruction with more bits, typically 48 or 64 taking up more but less precious bits in the instruction stream.
* Define an instruction in a separate namespace, using even less precious bits and perhaps following de facto standard practice. Also there _can_ be eventually fewer bits in the instruction stream than if the new instruction is 48 or 64 bit, but you have an amortised cost of switching namespaces.
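
The amortised cost in the third option can be put into rough numbers. The sketch below compares the instruction-stream bits for a run of n new instructions encoded as wide (48 or 64 bit) instructions against n 32-bit instructions in a separate namespace plus the switching overhead. It assumes, purely for illustration, that entering and leaving the namespace each cost one 32-bit CSR instruction and that the namespace number is already in a register.

```python
def bits_wide(n, width=48):
    """Instruction-stream bits for n instructions in a wide encoding."""
    return n * width

def bits_namespaced(n, switch_bits=2 * 32):
    """Bits for n 32-bit namespaced instructions plus switch in and out."""
    return n * 32 + switch_bits

def break_even(width=48, switch_bits=2 * 32):
    """Smallest run length at which the namespace is strictly smaller."""
    n = 1
    while bits_namespaced(n, switch_bits) >= bits_wide(n, width):
        n += 1
    return n

run_vs_48 = break_even(width=48)
run_vs_64 = break_even(width=64)
```

Under these assumptions the namespace only pays off once several of the new instructions are used back to back; isolated uses are cheaper in the wide encoding.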



 


 

Rogier Brussee

Jun 11, 2019, 11:06:44 AM6/11/19
to RISC-V ISA Dev, rogier....@gmail.com, lk...@lkcl.net, luke.l...@gmail.com


Op dinsdag 11 juni 2019 15:36:18 UTC+2 schreef Dan Petrisko:
I think it would be better to not be WARL, but to trap to M-mode if an instruction in that namespace happens to be unsupported just like any other unknown instruction traps and is deferred to M-mode.

WARL gives you 3 options:
1) Don’t allow the bit to be set. Then U-mode software knows to emulate. 
2) Allow the bit to be set, but trap as illegal and emulate in M-mode. (The behavior you’re describing)
3) Allow the bit to be set, implement the namespace. 

It allows systems which don’t have a trap mechanism (e.g. only implement the unprivileged spec) to use it. 

Making it WLRL only allows option 2 or 3, and mandates a trap mechanism. 

Can you explain why WLRL is the better option?

The better option, I think, would be WARA (Write Any Read Any if I get that right). It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made. I _think_ (but correct me if I am wrong) that if a processor cannot trap to M-mode because M-mode is all there is, you are supposed to just not use unknown instructions (which using an instruction in an unsupported namespace would be) and if you do it anyway, you must expect your program to terminate.
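
A toy model of the WARA idea, assuming a hart whose ISANS accepts any value and whose M-mode software may gain emulation handlers after the silicon ships. The `Hart` class, the `emulators` table, and the namespace number 0x5000 are all hypothetical.

```python
class IllegalInstruction(Exception):
    """Raised when neither hardware nor M-mode handles the namespace."""

class Hart:
    def __init__(self, native, emulators):
        self.isans = 0
        self.native = set(native)          # namespaces decoded in hardware
        self.emulators = dict(emulators)   # M-mode handlers, patchable later

    def write_isans(self, ns):
        self.isans = ns                    # WARA: any value is accepted

    def execute(self, operand):
        """Run one instruction of the current namespace on `operand`."""
        if self.isans in self.native:
            return ("native", operand)
        handler = self.emulators.get(self.isans)
        if handler is None:
            raise IllegalInstruction(hex(self.isans))
        return ("emulated", handler(operand))

hart = Hart(native={0}, emulators={})
hart.write_isans(0x5000)                   # namespace unknown to this hart
try:
    hart.execute(41)
    trapped = False
except IllegalInstruction:
    trapped = True
hart.emulators[0x5000] = lambda x: x + 1   # a later M-mode patch
result = hart.execute(41)
```

The write to ISANS never fails in this model; the cost of an unsupported namespace only shows up when an instruction is actually executed.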

Having said that, I can see the usefulness of being able to test if a namespace is recognised, so that you can test whether there is an _advantage_ to using the instruction in the namespace, as opposed to being slow but not blowing up, in portable software. Maybe (just maybe, I make this up as I write) if MISA has to change when you change ISANS then it can indicate whether the current ISA namespace is supported in hardware. Or maybe it is better to have two CSRs that shadow each other: a WARA version and a WLRL version. But maybe both of these ideas are well known, gaping virtualisation holes that will ruin world peace. I really don't know.


Ciao

Rogier


Rogier Brussee

Jun 11, 2019, 11:16:06 AM6/11/19
to RISC-V ISA Dev, rogier....@gmail.com, lk...@lkcl.net, luke.l...@gmail.com


Op dinsdag 11 juni 2019 17:06:44 UTC+2 schreef Rogier Brussee:
make that: WARA and WARL version.
 

Ciao

Rogier


Samuel Falvo II

Jun 11, 2019, 11:26:43 AM6/11/19
to Daniel Petrisko, Rogier Brussee, RISC-V ISA Dev, Luke Kenneth Casson Leighton, lkcl
On Tue, Jun 11, 2019 at 6:36 AM Daniel Petrisko
<petr...@cs.washington.edu> wrote:
> WARL gives you 3 options:
> 1) Don’t allow the bit to be set. Then U-mode software knows to emulate.
> 2) Allow the bit to be set, but trap as illegal and emulate in M-mode. (The behavior you’re describing)
> 3) Allow the bit to be set, implement the namespace.

There's a fourth option which is available:

4) Provide an ISANS register to actually select which ISA profile you
wish to run under; provide also an ISANSDETECT register which is WIRL
and is a bitmask indicating which bits are supported.

I find zero value in burdening the software engineer with busy-work
when the hardware can just answer the question directly and do so with
practically no cost in hardware. There once was a time when MISA was
itself a read-only register; but now that it *can be* read-write, the
only way to reliably "auto-detect" what ISA extensions the processor
truly supports is to shadow the cold-boot value of MISA in RAM
somewhere, captured as closely to cold-boot startup as is feasible,
and hope that it never gets overwritten.

Another reason why option 4 might be preferable is that, because the
preferred embodiment of ISANS is to feed the instruction decoders
directly, it's entirely possible that one processor implementation may
select one combination of ISANS bits which result in a completely
alternative instruction set becoming active (e.g., a processor
designed to support native retro-gaming which switches from RISC-V to
65816 instruction sets, for example). Once this ISA is selected, you
need a "thunk" of some kind to switch back to the RISC-V ISA. Instead
of polling the register and risk having to implement auto-detect code
in N possible ISAs, exposing a read-only bitmask of supported ISANS
fields is far cheaper in both hardware and software terms, I think.
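
Option 4 can be sketched as a pair of registers: a writable ISANS and a read-only bitmask that software consults before switching. The register names follow the message above, but the `Hart` class is a toy model, and rejecting unsupported bits with an error is an assumption, since the message does not say what such a write should do.

```python
class Hart:
    """Toy hart with ISANS plus a read-only ISANSDETECT bitmask."""
    def __init__(self, supported_mask):
        self._detect = supported_mask
        self.isans = 0

    @property
    def isansdetect(self):
        return self._detect        # read-only: no setter is defined

    def select(self, bits):
        """Select an ISA profile after checking it against the bitmask."""
        unsupported = bits & ~self._detect
        if unsupported:
            raise ValueError("unsupported ISANS bits: %#x" % unsupported)
        self.isans = bits

hart = Hart(supported_mask=0b101)
hart.select(0b100)                 # allowed: bit 2 is advertised
```

The point of the design is that detection never requires leaving the current ISA: software reads one register and knows which ISANS bits exist, without trial writes.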


--
Samuel A. Falvo II

Dan Petrisko

Jun 11, 2019, 11:54:32 AM6/11/19
to Samuel Falvo II, Rogier Brussee, RISC-V ISA Dev, Luke Kenneth Casson Leighton, lkcl
The better option, I think, would be WARA (Write Any Read Any if I get that right). It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made.

Ah, I think I understand.  Your concern is that if one hardcodes all future extension bits to 0 (as permitted in WARL), then you will not be able to hardware emulate those instructions for future software on the same silicon?  
  • If your processor supports emulation, then the correct thing to do is allow all bits of ISANS to be written.  Then it is able to use all ISANS currently specified or in the future, either natively or in emulation (with an M-mode patch).  
  • If your processor does not support emulation, then the correct thing to do is hardcode unsupported bits to zero.  Then it is able to use all extensions that it natively supports. It will never be able to be upgraded because there's no concept of an M-mode patch.
WARL gives you the flexibility to do either, with the benefit of a built-in discovery mechanism.  It is strictly better than WARA in this case.
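
The built-in discovery mechanism can be sketched as follows. This is a toy model, not spec text: `WarlField`, `hw_mask` and `probe_supported_bits` are invented names, and masking off unsupported bits is just one legal WARL behaviour among several.

```python
class WarlField:
    """Toy WARL field: any write is accepted, reads return a legal value."""

    def __init__(self, hw_mask):
        self.hw_mask = hw_mask  # bits that are actually writable
        self.value = 0

    def write(self, value):
        # Never traps; unsupported bits are silently dropped (hardwired to 0).
        self.value = value & self.hw_mask

    def read(self):
        return self.value


def probe_supported_bits(field, width=32):
    """Write all ones, read back, then restore the previous value."""
    saved = field.read()
    field.write((1 << width) - 1)
    supported = field.read()
    field.write(saved)
    return supported
```

One caveat the model ignores: on real hardware, writing all ones to a register that feeds the decoder would change the active instruction set in the middle of the probe.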
 
I _think_ (but correct me if I am wrong) that if a processor cannot trap to M-mode because M mode is all there is, you are supposed to just not use unknown instructions (which using an instruction in an unsupported namespace would be) and if you do it anyway, you must expect your program to terminate. 

Yep, exactly.  
 
Having said that, I can see the usefulness of being able to test if a namespace is recognised, so that  you can test whether there is an _advantage_ to using the instruction in the namespace, as opposed to being slow but not blowing up, in portable software.

Note that neither WLRL nor WARL actually tells you this information.  If a bit is supported it is _either_ allowed to execute natively or trap and emulate in M-mode, with the user-level software having no idea about which way it's implemented.  A 'hardware-supported extension query' must be implemented as a platform mechanism.

There once was a time when MISA was
itself a read-only register; but now that it *can be* read-write, the
only way to reliably "auto-detect" what ISA extensions the processor
truly supports is to shadow the cold-boot value of MISA in RAM
somewhere, captured as closely to cold-boot startup as is feasible,
and hope that it never gets overwritten.

My understanding is that this behavior is intentional, to support virtualization.
 


Samuel Falvo II

Jun 11, 2019, 12:05:07 PM6/11/19
to Dan Petrisko, Rogier Brussee, RISC-V ISA Dev, Luke Kenneth Casson Leighton, lkcl
On Tue, Jun 11, 2019 at 8:54 AM Dan Petrisko <petr...@cs.washington.edu> wrote:
> My understanding is that this behavior is intentional, to support virtualization.

Also to support running RV32 code on RV64 systems. While I know why
it was done, I pointed it out as a source of perhaps inadvertent
complexity that change introduced. My goal was to remind contributors
to this thread that sometimes the simplest solution is the most
obvious. We don't always need to be so clever. ;)

Allen Baum

Jun 11, 2019, 5:10:04 PM6/11/19
to Daniel Petrisko, Rogier Brussee, RISC-V ISA Dev, Luke Kenneth Casson Leighton, lkcl
No, from sec 2.3 of the priv spec:
    Implementations will not raise an exception on writes of unsupported values to a WARL field. 
If the value written is legal, it writes it.
If the value written is illegal, it instead writes an (implementation defined) deterministic function of the value attempted to be written, and the previous contents.
The compliance group will be standardizing on 2 major options with variations 

WLRL is permitted to, but not required to, raise an exception if the write of an illegal value is attempted.
If it doesn't raise an exception, the result is undefined, and what is read back is undefined.
It is the only defined field type that allows traps.
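
To make the difference concrete, here is a small model of the two field types as described above. The function names are invented, and keeping the previous value as the WARL fallback is an assumption for illustration; per the description above, the result only has to be a deterministic function of the attempted value and the previous contents.

```python
class IllegalValueTrap(Exception):
    """Stands in for the exception a WLRL write is permitted to raise."""


def warl_write(previous, attempted, legal_values):
    # WARL: never raises. An illegal write yields some deterministic
    # function of (attempted, previous); here we simply keep `previous`.
    return attempted if attempted in legal_values else previous


def wlrl_write(previous, attempted, legal_values, traps=True):
    # WLRL: an implementation may (but need not) raise on an illegal write.
    if attempted in legal_values:
        return attempted
    if traps:
        raise IllegalValueTrap(hex(attempted))
    # Without a trap the result is undefined; None marks that here.
    return None
```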

On Tue, Jun 11, 2019 at 6:36 AM Daniel Petrisko <petr...@cs.washington.edu> wrote:

Daniel Petrisko

Jun 11, 2019, 5:17:42 PM6/11/19
to Allen Baum, Rogier Brussee, RISC-V ISA Dev, Luke Kenneth Casson Leighton, lkcl
Sorry, I realize what I wrote is unclear. 

You are correct, it will not raise an exception upon a write to CSR. 

I meant that the processor is still allowed to trap upon illegal instruction, not upon the CSR write itself. 

To rephrase:

WARL gives you 3 options:
1) Don’t allow the bit to be set. Then U-mode software knows to emulate. 
2) Allow the bit to be set, but trap as illegal when encountering instructions that the processor does not support and emulate in M-mode. (The behavior you’re describing)
3) Allow the bit to be set, implement the namespace. 

Luke Kenneth Casson Leighton

Jun 12, 2019, 12:16:26 AM6/12/19
to Allen Baum, Daniel Petrisko, Rogier Brussee, RISC-V ISA Dev


On Wednesday, June 12, 2019, Allen Baum <allen...@esperantotech.com> wrote:
 

WLRL is permitted to, but not required to, raise an exception if the write of an illegal value is attempted.
If it doesn't raise an exception, the result is undefined, and what is read back is undefined.
It is the only defined field type that allows traps.

Ok so this is the one that is needed.

For the embedded platform, it would not be sensible but would be permitted to ignore illegal values.

For the UNIX platform, it *has* to be made mandatory that the trap is raised, otherwise chaos ensues as software emulation becomes impossible.

Thank you Allen.




--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

Luke Kenneth Casson Leighton

Jun 12, 2019, 12:16:27 AM6/12/19
to Daniel Petrisko, Allen Baum, Rogier Brussee, RISC-V ISA Dev


On Wednesday, June 12, 2019, Daniel Petrisko <petr...@cs.washington.edu> wrote:
Sorry, I realize what I wrote is unclear. 

You are correct, it will not raise an exception upon a write to CSR. 

I meant that the processor is still allowed to trap upon illegal instruction, not upon the CSR write itself. 

How would a missing namespace be detected and emulated?

Luke Kenneth Casson Leighton

Jun 12, 2019, 12:16:30 AM6/12/19
to Daniel Petrisko, Allen Baum, Rogier Brussee, RISC-V ISA Dev
Ok so I trust it's now clear why WLRL (thanks Allen) is needed.

When Dan initially raised the WARL concern, the conflict masked a situation that, had it gone unnoticed, would have jeopardised ISAMUX/ISANS entirely. Actually, two separate errors. So thank you for raising the question.

The situation arises when foreign archs are to be given their own NS bit. MIPS is allocated bit 8, x86 bit 9, whilst LE/BE is given bit 0, RVCv2 bit 1 and so on. All of this potential rather than actual, clearly.

Imagine then that software tries to write and set not just bit 8 and bit 9, it also tries to set bit 0 and 1 as well.

This *IS* on the face of it a legitimate reason to make ISAMUX/ISANS WARL.

However it masks a fundamental flaw that has to be addressed, which brings us back much closer to the original design of 18 months ago, and it's highlighted thus:

x86 and simultaneous RVCv2 modes are total nonsense in the first place!

The solution instead is to have a NS bit (bit0) that SPECIFICALLY determines if the arch is RV or not.  If 0, the rest of the ISAMUX/ISANS is very specifically RV *only*, and if 1, the ISAMUX/ISANS is a *binary* table of foreign architectures and foreign architectures only.

Exactly how many bits are used for the foreign arch table, is to be determined. 7 bits, one of which is reserved for custom usage, leaving a whopping 64 possible "official" foreign instruction sets to be hardware-supported/JIT-emulated seems to be sufficiently gratuitous, to me.
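
A sketch of how such a split might decode. Only "bit 0 selects RV versus foreign" and "7 bits, one reserved for custom" come from the text above; where the index sits and which of its bits marks custom usage are assumptions made up for this example.

```python
FOREIGN_FLAG = 1 << 0     # bit 0: 0 = RISC-V namespaces, 1 = foreign arch
FOREIGN_SHIFT = 1         # assumed position of the foreign table index
FOREIGN_BITS = 7          # 7-bit index, as floated above
CUSTOM_FLAG = 1 << (FOREIGN_BITS - 1)  # assumed: top index bit = custom


def decode_isans(value):
    """Return a (kind, payload) pair for a raw ISANS value."""
    if (value & FOREIGN_FLAG) == 0:
        # RV mode: remaining bits are RISC-V-only namespace bits.
        return ("rv", value >> 1)
    index = (value >> FOREIGN_SHIFT) & ((1 << FOREIGN_BITS) - 1)
    if index & CUSTOM_FLAG:
        return ("foreign-custom", index & (CUSTOM_FLAG - 1))
    return ("foreign-official", index)


# Six index bits remain once one of the seven is set aside for custom use.
OFFICIAL_SLOTS = 1 << (FOREIGN_BITS - 1)
```

Because the foreign field is a binary index rather than a bit vector, selecting two foreign architectures at once is simply not expressible.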

One of those could even be Java Bytecode!

Now, it could *hypothetically* be argued that the permutation of setting LE/BE and MIPS for example is desirable. A simple analysis shows this not to be the case: once in the MIPS foreign NS, it is the MIPS hardware implementation that should have its own way of setting and managing its LE/BE mode, because to do otherwise drastically interferes with MIPS binary compatibility.

Thus, it is officially Not Our Problem: only flipping into one foreign arch at a time makes sense, thus this has to be reflected in the ISAMUX/ISANS CSR itself, completely side-stepping the (apparent) need to make the NS CSR WARL (which would not work anyway, as previously mentioned).

So, thank you, again, Dan, for raising this. It would have completely jeopardised ISAMUX/NS if not spotted.

The second issue is: how does any hardware system cope, whether or not it supports ISANS, and how does future hardware that supports some namespaces cope when, in a transitive fashion, it has to support *more* future namespaces through JIT emulation, if this is not planned properly in advance?

Let us take the simple case first: a current 2019 RISCV fully compliant RV64GC UNIX capable system (with mandatory traps on all unsupported CSRs).

Fast forward 20 years, there are now 5 ISAMUX/NS unary bits, and 3 foreign arch binary table entries.

Such a system is perfectly capable of software JIT emulating ALL of these options because the write to the (illegal, for that system) ISAMUX/NS CSR generates the trap that is needed for that system to begin JIT mode.

(This again emphasises exactly why the trap is mandatory).

Now let us take the case of a hypothetical system from say 2021 that implements RVCv2 at the hardware level.

Fast forward 20 years: if the CSR were made WARL, that system would be absolutely screwed. The implementor would be under the false impression that ignoring setting of "illegal" bits was acceptable, making the transition to JIT mode flat-out impossible to detect.

When this is considered transitively, considering all future additions to the NS, and all permutations, it can be logically deduced that there is a need to reserve a *full* set of bits in the ISAMUX/NS CSR *in advance*.

i.e. that *right now*, in the year 2019, the ISAMUX/NS CSR cannot be added to piecemeal: the full 32 (or 64) bits *have* to be reserved, and the reserved bits set to zero.

Furthermore, if any software attempts to write to those reserved bits, it *must* be treated just as if those bits were distinct and nonexistent CSRs, and a trap raised.
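
A sketch of the behaviour being argued for, assuming that a write touching any unimplemented bit traps and that the trap handler switches to JIT emulation. The class names, `emulated_ns` and the `jit_active` flag are all hypothetical.

```python
class IsansTrap(Exception):
    """Raised when a write touches bits this hardware does not implement."""


class Hart:
    def __init__(self, implemented_mask):
        self.implemented_mask = implemented_mask
        self.isans = 0          # value the hardware decoder is using
        self.emulated_ns = 0    # namespace being JIT-emulated, if any
        self.jit_active = False

    def csr_write_isans(self, value):
        if value & ~self.implemented_mask:
            raise IsansTrap(value)  # reserved/unknown bit: mandatory trap
        self.isans = value
        self.jit_active = False     # back on natively supported ground


def write_isans_with_emulation(hart, value):
    """What trap-handling software might do: catch the trap, then emulate."""
    try:
        hart.csr_write_isans(value)
    except IsansTrap:
        hart.emulated_ns = value
        hart.jit_active = True
```

If `csr_write_isans` silently dropped the unknown bits instead of raising, nothing would ever set `jit_active`, which is the failure mode described above.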

It makes more sense to consider each NS as having its own completely separate CSR. If that CSR does not exist then, as with any unsupported CSR, it should be obvious that a trap has to be raised (and JIT emulation activated).

However given that only the one bit is needed (in RV NS Mode, not Foreign NS Mode), it would be terribly wasteful of the CSRs to do this, despite it being technically correct and much easier to understand why trap raising is so essential (mandatory).

This again should emphasise how to mentally get one's head round this mind-bendingly complex problem space: think of each NS bit as its own totally separate CSR that every implementor is free and clear to implement (or leave to JIT Emulation) as they see fit.

Only then does the mandatory need to trap on write really start to hit home, as does the need to preallocate a full set of reserved zero values in the RV ISAMUX/NS.

Lastly, I *think* it's ok to only reserve say 32 bits, and, in 50 years time if that genuinely is not enough, start the process all over again with a new CSR.  ISAMUX2/NS2.

Subdivision of the RV NS (support for RVCv3/4/5/RV16 without wasting precious CSR bits) is best left for discussion another time; the above is a heck of a lot to absorb already.

L.

Luke Kenneth Casson Leighton

Jun 12, 2019, 12:16:30 AM6/12/19
to Dan Petrisko, Samuel Falvo II, Rogier Brussee, RISC-V ISA Dev


On Tuesday, June 11, 2019, Dan Petrisko <petr...@cs.washington.edu> wrote:
The better option, I think, would be WARA (Write Any Read Any if I get that right). 

Allen clarified: it's called WLRL and it's the only one that's allowed to raise exceptions.

This exception is what allows the processor to enter JIT emulation mode until such time as the software (all of it) writes to the ISAMUX/ISANS with a value that the hardware *can* continue further execution in a namespace it *does* have the hardware capability to execute.

This is very different from WARL, where due to the loss of context (the namespace), the processor has absolutely no way of determining the meaning of an opcode.

Effectively, ISAMUX aka ISANS is identical to the c++ "using namespace { ....}" construct.

Pauses to reflect a bit.  Yeah. Whether done globally or in scopes, it's an exceptionally clear and precise analogy, in pretty much every way.


It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made.

Ah, I think I understand.  Your concern is that if one hardcodes all future extension bits to 0 (as permitted in WARL), then you will not be able to hardware emulate those instructions for future software on the same silicon?  

Partly other way round.  Not be able to *software* emulate those instructions (the ones that have the same opcode) for future software running on the same silicon.

Is that clear now, with the c++ analogy? The need in c++ to be able to distinguish between shorter names that clash globally is very well understood, and very clear.

The situation here is no different.
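
In code terms, the same bit pattern resolves to different instructions depending on which namespace is active, much like an unqualified name in different scopes. This is a toy illustration only: the namespace names, opcode value and mnemonics are invented, not taken from any real encoding.

```python
# One decode table per namespace; the opcode 0x0B is deliberately reused.
DECODE_TABLES = {
    "vendor_a": {0x0B: "a.dotprod"},
    "vendor_b": {0x0B: "b.crc32"},
}


def decode(namespace, opcode):
    """Resolve an opcode within the currently selected namespace."""
    table = DECODE_TABLES.get(namespace)
    if table is None or opcode not in table:
        # Unknown namespace or opcode: hardware would trap here so that
        # software can emulate.
        raise LookupError((namespace, opcode))
    return table[opcode]
```

Lose track of `namespace` and `decode` has nothing left to go on, which is the loss of context described above.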

Rogier Brussee

Jun 12, 2019, 5:51:21 AM6/12/19
to RISC-V ISA Dev, petr...@cs.washington.edu, sam....@gmail.com, rogier....@gmail.com


Op woensdag 12 juni 2019 06:16:30 UTC+2 schreef lk...@lkcl.net:


On Tuesday, June 11, 2019, Dan Petrisko <petr...@cs.washington.edu> wrote:
The better option, I think, would be WARA (Write Any Read Any if I get that right). 

Allen clarified: it's called WLRL and it's the only one that's allowed to raise exceptions.

This exception is what allows the processor to enter JIT emulation mode until such time as the software (all of it) writes to the ISAMUX/ISANS with a value that the hardware *can* continue further execution in a namespace it *does* have the hardware capability to execute.

This is very different from WARL, where due to the loss of context (the namespace), the processor has absolutely no way of determining the meaning of an opcode.

Effectively, ISAMUX aka ISANS is identical to the c++ "using namespace { ....}" construct.
 
Pauses to reflect a bit.  Yeah. Whether done globally or in scopes, it's an exceptionally clear and precise analogy, in pretty much every way. 


Exactly. Which is why I suggested the name ISANS for ISA namespace, and think it is helpfully easier to communicate about than ISAMUX.  It suggests what the mechanism _does_ rather than how it is implemented (which is also important, but not necessarily the right starting point for thinking about it, especially if you are communicating with someone who has not gone through the same process of figuring out how to make it work).


 

It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made.

Ah, I think I understand.  Your concern is that if one hardcodes all future extension bits to 0 (as permitted in WARL), then you will not be able to hardware emulate those instructions for future software on the same silicon?  

Partly other way round.  Not be able to *software* emulate those instructions (the ones that have the same opcode) for future software running on the same silicon.

Is that clear now, with the c++ analogy? The need in c++ to be able to distinguish between shorter names that clash globally is very well understood, and very clear.

The situation here is no different.




>x86 and simultaneous RVCv2 modes are total nonsense in the first place!
>
>The solution instead is to have a NS bit (bit0) that SPECIFICALLY determines if the arch is RV or not.  If 0, the rest of the ISAMUX/ISANS is very specifically RV *only*, and if 1, the ISAMUX/ISANS is a *binary* table of foreign architectures and foreign architectures only.

>Exactly how many bits are used for the foreign arch table, is to be determined. 7 bits, one of which is reserved for custom usage, leaving a whopping 64 possible "official" foreign instruction sets to be hardware-supported/JIT-emulated seems to be sufficiently gratuitous, to me.

>One of those could even be Java Bytecode!

ISANS contains a 32 (or 64) bit _number_, rather similar to a major opcode.

Now for decoder simplicity it may be useful to reserve a stupidly simple 1-bit flag like 1 << 12 for things like RVC vs RVCv2, so that two namespaces that are effectively identical, except that one uses RVC and the other RVCv2, can differ in just that bit. However, for namespaces that just give access to some custom functionality, or for completely different ISAs where RVC vs RVCv2 is obviously not an issue, the same bit may be used for different purposes. E.g. it seems perfectly reasonable (warning: completely made-up toy example!) to have encodings like

ISANS = 0x86<<20           for x86-32
ISANS = 0x86<<20 | 1<<12   for x86-64
ISANS = 0x87<<20           for MIPS32
ISANS = 0x87<<20 | 1<<12   for MIPS64
ISANS = 0x87<<20 | 3<<12   for MIPS64R6
ISANS = 0x87<<20 | 7<<12   for microMIPS
ISANS = 0x88<<20           for Java Bytecode
ISANS = 0x89<<20           for ARM thumb
ISANS = 0x8A<<20           for ARM64
......
etc etc.
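
For what it's worth, the toy encodings above can be built and taken apart mechanically. This sketch just follows the made-up field positions in the example (family from bit 20 up, variant from bit 12 up); the variant mask width and the helper names are assumptions.

```python
FAMILY_SHIFT = 20
VARIANT_SHIFT = 12
VARIANT_MASK = 0xFF  # assumed width; the example only uses values up to 7


def make_isans(family, variant=0):
    """Pack a family number and variant into a toy ISANS value."""
    return (family << FAMILY_SHIFT) | (variant << VARIANT_SHIFT)


def split_isans(value):
    """Inverse of make_isans: return (family, variant)."""
    family = value >> FAMILY_SHIFT
    variant = (value >> VARIANT_SHIFT) & VARIANT_MASK
    return family, variant


TOY_NAMESPACES = {
    make_isans(0x86): "x86-32",
    make_isans(0x86, 1): "x86-64",
    make_isans(0x87): "MIPS32",
    make_isans(0x87, 1): "MIPS64",
    make_isans(0x87, 3): "MIPS64R6",
    make_isans(0x87, 7): "microMIPS",
    make_isans(0x88): "Java Bytecode",
    make_isans(0x89): "ARM thumb",
    make_isans(0x8A): "ARM64",
}
```

A decoder (or a JIT emulator) would look the written value up in a table like `TOY_NAMESPACES` and trap on anything it does not find.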

lkcl

Jun 12, 2019, 11:22:34 AM6/12/19
to RISC-V ISA Dev, petr...@cs.washington.edu, sam....@gmail.com, rogier....@gmail.com
Record being kept here, will evolve into standard plus FAQ over time.
https://libre-riscv.org/isa_conflict_resolution/isamux_isans/

> > Pauses to reflect a bit.  Yeah. Whether done globally or in scopes, it's an exceptionally clear and precise analogy, in pretty much every way.
>
> Exactly. Which is why I suggested the name ISANS for ISA namespace, and think it is helpfully easier to communicate about than ISAMUX.

Started using both words.

> It suggests what the mechanism _does_ rather than how it is implemented (which is also important, but not necessarily the right starting point for thinking about it, especially if you are communicating with someone who has not gone through the same process of figuring out how to make it work).
>

Good point. Standards need to be understandable without context or special inside knowledge.

>
> >One of those could even be Java Bytecode!
>
>
> ISANS contains a 32 (or 64) bit _number_, rather similar to a major opcode.

Sort-of... yes. It actually makes more sense to think of each namespace as its own completely separate WLRL CSR, of 1 to 32 bits per "purpose" (namespace). One CSR for LEBE, one for foreign arch selection and so on.

>
> Now for decoder simplicity it may be useful to reserve a stupidly simple 1-bit flag like 1 << 12 for things like RVC vs RVCv2, so that two namespaces that are effectively identical, except that one uses RVC and the other RVCv2, can differ in just that bit.

Ok, here it is appropriate to raise an idea of how to cover RVC and future variants, including RV16.

Just as with foreign archs, and as you quite rightly highlight above, it makes absolutely no sense to try to select RVCv1, v2, v3 and so on, all simultaneously. A unary bit vector for RVC modes, changing the meaning of the 16-bit opcode space, is wasteful and again has us believe that WARL is the "solution".

The correct thing to do is, again, just like with foreign archs, to treat RVCs as a *binary* namespace selector. Bits 1 thru 3 would give 8 possible completely new alternative meanings, just like how the Z80 and the 286 and 386 used to do bank switching.

All zeros is clearly reserved for the present RVC. 0b001 for RVCv2. 0b010 for RV16 (look it up) and there should definitely be room reserved here for custom reencodings of the 16 bit opcode space.
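
A minimal sketch of that binary selector, assuming it sits in bits 1 through 3 as suggested above. The helper names are invented, and which of the remaining codes would be custom is left open, as in the text.

```python
RVC_SHIFT = 1
RVC_MASK = 0b111  # bits 1 through 3: eight possible 16-bit encodings

RVC_MODES = {
    0b000: "RVC",    # the present compressed extension
    0b001: "RVCv2",
    0b010: "RV16",
}


def rvc_mode(isans):
    """Name of the 16-bit opcode space selected by an ISANS value."""
    selector = (isans >> RVC_SHIFT) & RVC_MASK
    # Unlisted selectors are unallocated in this sketch.
    return RVC_MODES.get(selector, "unallocated")


def with_rvc_mode(isans, selector):
    """Return isans with the 3-bit RVC selector replaced."""
    cleared = isans & ~(RVC_MASK << RVC_SHIFT)
    return cleared | ((selector & RVC_MASK) << RVC_SHIFT)
```

With a binary field there is exactly one active 16-bit encoding at any time, so the "RVCv1 and v2 simultaneously" combination cannot even be written.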



> However, for namespaces that just give access to some custom functionality, or for completely different ISAs where RVC vs RVCv2 is obviously not an issue, the same bit may be used for different purposes.

Some care has to be taken not to make the decoder too complex.

> E.g. it seems perfectly reasonable (warning completely made up toy example!) to have encodings like
>
>
> ISANS = 0x86<<20           for x86-32
> ISANS = 0x86<<20 | 1<<12   for x86-64
> ISANS = 0x87<<20           for MIPS32
> ISANS = 0x87<<20 | 1<<12   for MIPS64
> ISANS = 0x87<<20 | 3<<12   for MIPS64R6
> ISANS = 0x87<<20 | 7<<12   for microMIPS
> ISANS = 0x88<<20           for Java Bytecode
> ISANS = 0x89<<20           for ARM thumb
> ISANS = 0x8A<<20           for ARM64
> ......
> etc etc.

Time for one of those ASCII Art bitfields, I feel.

L.