Hi,

a while back I posted a proposal for instruction formats >32 bit. Unfortunately that discussion got sidetracked, I feel, by the instructions I proposed alongside the formats.

So here is a 2nd attempt. This is an updated proposal for the instruction formats. All concrete instructions are just examples of how the formats could be used:
Most of the differences between these formats and the previous ones are based on feedback I got from here for the last proposal. So please keep the feedback coming.
On Tue, May 14, 2019, 03:06 Clifford Wolf <cliffor...@gmail.com> wrote:
> Hi,
> a while back I posted a proposal for instruction formats >32 bit. Unfortunately that discussion got sidetracked, as I feel, by the instructions I proposed alongside the formats.
> So here is a 2nd attempt. This is an updated proposal for the instruction formats. All concrete instructions are just examples for how the formats could be used:

Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats.
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------|-------------------------------|-------------------------------|
... | funct7 | rs2 | rs1 | f3 | rd | opcode (8bit) |f89| len | 00|page | 00| 11111 | prefix
... immediate |f| len | f2| rd' | op| 11111 | LI
... immediate |f| len | f2| rd | op| 11111 | JALR
imm18| funct7 | rs2 | rs1 |imm15| rd | imm[5..14] | len | imm[0.4]| op| 11111 | packed..NN ..17
For comparison, the standard 32-bit format (indentation added for clarity, to line up with the above):
| 3 2 1 |
|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|---------------------------------------------------------------|
| funct7 | rs2 | rs1 | f3 | rd | opcode(7bit)| 32-bit format
| imm[5..11] | rs2 | rs1 | f3 | imm[0.4]| opcode(7bit)| 32-bit S-format
| imm[0..11] | rs1 | f3 | rd | opcode(7bit)| 32-bit I-format
| imm[12..31 | rd | opcode(7bit)| 32-bit U-format
On Tue, May 14, 2019, 03:06 Clifford Wolf <cliffor...@gmail.com> wrote:

Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.
> One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.

I don't understand what you mean. It is always in the same position. len is always instr[4:2].
Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.
Hi,

On Tue, May 14, 2019 at 8:27 PM Clifford Wolf <cliffor...@gmail.com> wrote:
>> One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.
> I don't understand what you mean. It is always in the same position. len is always instr[4:2].

I meant inst[14:12] of course. :)
> Looks quite good so far. Quite happy that there is 1 more bit available in the custom extension space.

btw, it's 3 more bits. When inst[14:12]=111 then instr[6:5] also becomes available to the custom extension.

In my previous proposal those two bits were used as part of the length encoding.
On Tue, May 14, 2019, 11:32 Clifford Wolf <cliffor...@gmail.com> wrote:
> Hi,
> On Tue, May 14, 2019 at 8:27 PM Clifford Wolf <cliffor...@gmail.com> wrote:
>>> One thing I think would be good to change is to move the len field so it's in the same position (counting from LSB) in all instruction formats. This will facilitate decoding and make the encoding of custom instructions actually well defined, since otherwise it's unknown which bits should be set to 111 to have a custom instruction, since setting both locations to 111 seems like an unviable solution.
>> I don't understand what you mean. It is always in the same position. len is always instr[4:2].
> I meant inst[14:12] of course. :)

Sorry, I think I read the version modified by Luke and thought that was in the original version.
Am I missing something or does the prefix format only use 15 bits (rather than 16 bits)?
Thinking about it, it might actually even make sense to disallow mixing compressed instructions and prefixes, which would unlock 2 more bits.
It also feels really unfortunate that the LI instruction type would be limited to so few destination registers.
This seems to be a consequence of making the length field 3 bits for {48, 64, 80}-bit instructions. However, that could be mitigated by using multiple len values for 48-bit and 80-bit instructions (where floating point immediate values would be in play, and thus more space is needed):
len | op | meaning
------------------
0xx | 00 | prefix
0xx | 01 | packed format
0xx | 10 | Load integer immediate
0xx | 11 | JALR immediate
100 | ?? | Load 32-bit FP immediate (48-bit instruction)
101 | xx | reserved for custom
110 | ?? | Load 64-bit FP immediate (80-bit instruction)
111 | xx | reserved for longer instructions
To me, this sort of proposal feels really hard to evaluate without sketching out how everything is going to fit into the encoding space. How much leftover space is there in the load-imm and JAL format opcode spaces?
Any chance of allowing all 32 destination registers?
I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so:

| 4 | 3 2 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------|---------------------------------------------------------------|
... immediate |0 0 0|len | rd | 00| 11111 | jump-and-link format
... immediate |func3|len | rd' | 00| 11111 | load-immediate format
... immediate |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len | rd' | 01| 11111 | prefix16 format
... funct7 | rs2 | rs1 | f3 | rd | opcode5|op|func3|len | page| 10| 11111 | prefix32 format
...*********************************TBD******************************|func3|len | page| 11| 11111 | reserved (prefix48 format ?????)

(unfortunately this looks horrible in proportional font)

The idea is to reuse the RVC decoder for RVC-R-type instructions (like C.SUB) for the prefixing bits 0..15, which, depending on bit[5:6], are then followed by
00: an immediate directly,
01: an instruction in a restricted RVC S-type format (like C.sd),
10: a regular 32-bit R-type RV instruction (like ADD),
11: reserved, or a hypothetical 48-bit instruction format.
In this scheme it may be easier to have len encode the length of the immediate: length(imm) = 16 * ((len != 0b111) ? 2^len : 0) bits, which gives a maximum immediate length of 64*16 bits = 128 bytes.
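The rule above can be written out as a small function. This is only a sketch of the formula as stated; `imm_bits` is a hypothetical helper name, and `len_field` is taken to be the raw 3-bit value of the len field.

```python
def imm_bits(len_field: int) -> int:
    """Immediate length in bits for a 3-bit len field, per the rule above."""
    if not 0 <= len_field <= 0b111:
        raise ValueError("len is a 3-bit field")
    # len == 0b111 means "no immediate"; otherwise 2**len chunks of 16 bits.
    chunks = 0 if len_field == 0b111 else 2 ** len_field
    return 16 * chunks


for len_field in range(8):
    print(f"len={len_field:03b}: {imm_bits(len_field):4d} bits")
```

The largest value comes from len=0b110, which gives 64 chunks of 16 bits, i.e. the 128 bytes mentioned above.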
Also in the load immediate format I have assumed func3 = 001 .. 111 but if you are allowing rd (registers x0..x7) in the jump_and_link format anyway (which is a good idea), one _could_ also decide to use rd (registers x0..x7 rather than x8..15) for loading immediates, using 1 bit of func3 but little extra cost to the encoder, effectively increasing the register range to x0...x15.
Hi,
On Wed, May 15, 2019 at 10:31 AM Rogier Brussee <rogier...@gmail.com> wrote:
> I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so:
>
> | 4 | 3 2 1 |
> |7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
> |-----------------------------------|---------------------------------------------------------------|
> ... immediate |0 0 0|len | rd | 00| 11111 | jump-and-link format
> ... immediate |func3|len | rd' | 00| 11111 | load-immediate format
> ... immediate |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len | rd' | 01| 11111 | prefix16 format
> ... funct7 | rs2 | rs1 | f3 | rd | opcode5|op|func3|len | page| 10| 11111 | prefix32 format
> ...*********************************TBD******************************|func3|len | page| 11| 11111 | reserved (prefix48 format ?????)
>
> (unfortunately this looks horrible in proportional font)
>
> The idea is to reuse the RVC decoder for RVC-R-type instructions (like C.SUB) for the prefixing bit 0..15 which depending on bit[5:6] is then followed by
> 00: an immediate directly,
> 01: an instruction in restricted RVC--S-type (like C.sd) RVC format
> 10: a regular 32bit R-type RV instruction (like ADD).
> 11: reserved or a hypothetical 48 bit instruction format

Interesting idea, however, I am not sure if there is an application for it.
I'd assume that in most cases one would want a prefix32 instruction as well, and then the prefix16 would be the "compressed version"? So you'd have for example a 64-bit prefix32 instruction and then a compressed 48-bit prefix16 instruction?
I'd assume that most instructions >32-bit will be infrequent enough so that the additional decoder cost would not be worth the decrease in code size.
> In this scheme it may be easier to have Len encode the length of the immediate: bitlength(imm) = 16 * ((len != 0b111) ? 2^len : 0) bits, which gives a maximum immediate length of 64*16 bit = 128 byte
Not all instructions have an immediate that's a power of two.
Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding on the length, never the other way around.
> Also in the load immediate format I have assumed func3 = 001 .. 111 but if you are allowing rd (registers x0..x7) in the jump_and_link format anyway (which is a good idea), one _could_ also decide to use rd (registers x0..x7 rather than x8..15) for loading immediates, using 1 bit of func3 but little extra cost to the encoder, effectively increasing the register range to x0...x15.

I'd argue one would never want to use load-immediate with x0..x4.

However, I think it might be interesting to allow load-immediate to address t1/t2 (x6/x7) instead of s0/s1 (x8/x9).
But I have not proposed this because I didn't want to make it more complex than necessary.
regards,
 - Clifford
You seem to have dropped the "everything can have an immediate" bit but I guess you just do fusion on things like
LLI rd imm64
C.add rd rs2
I really like how you reuse existing (i.e. necessary anyway) decoders. So how about reusing the RVC decoder for the prefix like so:
| 4 | 3 2 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------|---------------------------------------------------------------|
... immediate |0 0 0|len | rd | 00| 11111 | JALI
... immediate |func3|len | rd' | 00| 11111 | LDI
... immediate |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len | rd' | 01| 11111 | PX16
... funct7 | rs2 | rs1 | f3 | rd | opcode5|op|func3|len | page| 10| 11111 | PX32
...*********************************TBD******************************|func3|len | page| 11| 11111 | PX48
Could easily be done in a prefix-format instruction that's 16-bit longer.
> Interesting idea, however, I am not sure if there is an application for it.

The prefix16 is an alternative approach to your "packed format" with some register bits traded for some additional immediate (or function selector) bits, and the prefix32 format is what you call prefix format.
>> In this scheme it may be easier to have Len encode the length of the immediate: bitlength(imm) = 16 * ((len != 0b111) ? 2^len : 0) bits, which gives a maximum immediate length of 64*16 bit = 128 byte
> Not all instructions have an immediate that's a power of two.

The hypothetical bfxp instruction with 3 registers and 21 bit immediate would actually fit as a 48 bit instruction of type prefix16 with len=0b000 (length(immediate) = 16), which gives 3 (popular) registers + 16 bits immediate in the immediate field + 5 bits of immediate in bits 16...31.
> Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding on the length, never the other way around.

Bits 5..6 and the len field (bits 10..12) determine the length of the instruction uniquely:
length(instruction) = 16*(bit[5:6] + ((1<<len) & 127 ))
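As a sketch, the formula as written can be evaluated like this. `length_as_written` is a hypothetical name, and the inputs are assumed to be the raw field values (the 2-bit value from bits 5..6 and the 3-bit len). The function returns exactly what the formula says. For prefix16 with len=0b000 that is 32, while the bfxp example in this message is described as a 48-bit instruction, so the 16-bit prefix halfword itself may be meant to be counted on top; that reading is an assumption.

```python
def length_as_written(op: int, len_field: int) -> int:
    """Evaluate 16 * (bit[5:6] + ((1 << len) & 127)) for raw field values.

    op        -- value of bits 5..6 (0..3), assumed already extracted
    len_field -- value of the 3-bit len field (0..7)
    """
    chunks = (1 << len_field) & 127  # len == 7 gives 128 & 127 == 0
    return 16 * (op + chunks)


for op in range(4):
    print(op, [length_as_written(op, n) for n in range(8)])
```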
I'm realizing that I don't quite know what the goals are for >32bit instructions. I can think of a couple:
- Add larger JAL & LI instructions
- Allow the bit manipulation or other extensions to define longer instructions
- Add encoding pages for new "32-bit" instructions (which would actually be 48-bits because of the prefix)
- Produce versions of shorter instructions with additional immediate space
I think this covers everything proposed so far, but I may be missing something.
The first two points make sense to me. (3) doesn't feel like an immediate priority, but is probably worthwhile while we're at it.
(4) I'm more skeptical of: the current designs would allow longer versions of any instruction even though many don't take immediates at all.
Even among the ones that do, there are tons of nonsensical combinations: 48.C.JAL would have 27 immediate bits compared to 48.JAL which would have 32. At the 64-bit instruction length, the extended version of nearly all 32-bit instructions would have 16+12=28 bits of immediate, less than the 32 they'd get from LUI+xxx or 48.LI+C.xxx. There are a couple of sweet spots, like where the 18 bits from C.LUI + xxx is less than the 21-24 you'd have from 48.C.xxx, but if we focused in on those we might be able to come up with even better solutions there too.
> (4) I'm more skeptical of: the current designs would allow longer versions of any instruction even though many don't take immediates at all.

I've read this sentence multiple times and I just can't parse it. I also don't know what "Produce versions of shorter instructions with additional immediate space" means exactly. Can you rephrase that?

> Even among the ones that do, there are tons of non-nonsensical combinations: 48.C.JAL would have 27 immediate bits compared to 48.JAL which would have 32. At the 64-bit instruction length, the extended version of nearly all 32-bit instructions would have 16+12=28 bits of immediate, less than the 32 they'd get from LUI+xxx or 48.LI+C.xxx. There are a couple sweet spots like where the 18bits from C.LUI + xxx is less than the 21-24 you'd have from 48.C.xxx, but if we focused in on those we might be able to come up with even better solutions there too.

I have no idea how any of that relates to my text. Sorry. Are we talking about the same proposal?

This is my text: http://svn.clifford.at/handicraft/2019/rvlonginsn/proposal_2.txt
Hi,
On Wed, May 15, 2019 at 1:27 PM Rogier Brussee <rogier...@gmail.com> wrote:
>> Interesting idea, however, I am not sure if there is an application for it.
> The prefix16 is an alternative approach to your "packed format" with some register bits traded for some additional immediate (or function selector) bits, and the prefix32 format is what you call prefix format.

I don't see how any of that answers the question whether there's an application for it.
>>> In this scheme it may be easier to have Len encode the length of the immediate: bitlength(imm) = 16 * ((len != 0b111) ? 2^len : 0) bits, which gives a maximum immediate length of 64*16 bit = 128 byte
>> Not all instructions have an immediate that's a power of two.
> The hypothetical bfxp instruction with 3 registers and 21 bit immediate would actually fit as a 48 bit instruction of type prefix16 with len= 0b000 (length(immediate) = 16) which gives 3 (popular) registers + 16 bits immediate in the immediate field + 5 bits of immediate in bits 16...31.

Again, not sure how that addresses the point that not all instructions have an immediate that's a power of two.
Regarding the implication that 3 registers and 21 bit immediate wouldn't fit in a 48-bit encoding with my formats: That's simply not true!
Here it is, using the regular 48-bit packed format:

|7 6 5 4 3|2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------------------------------------------------------------------|
| start | length | dest | f2| rs2 | rs1 | len | rd | op| 11111 | BFXP-48

It's just a question of what percentage of the encoding space you would want to spend on it.

And for the bulk of instructions >32-bit I'd assume that being able to just easily address all registers, and not having to fight for each and every bit of encoding space, would be more valuable than occasionally saving 16 bits, at the cost of more complex instruction decoders and a far more cluttered encoding space.

Consider the BFXP instruction as described in my Appendix II:

| 6 5 | 4 | 3 2 | 1 |
|3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|-------------------------------------------------------------------------------------------------------------------------------|
| start | length | dest | f2| rs2 | rs1 | f3 | rd | opcode | len | 00|page | 00| 11111 | BFXP

There's enough encoding space here to define 65536 such instructions, only in the standard prefix encoding field in op=00. (If I wanted to fill the entire reserved op=01 and op=10 space with those instructions I could squeeze 4M of them in there.)

Of course you can squeeze it in 48-bit. But even removing 6 bits (2 bits for each of the 3 registers) will only give you back 64x the encoding space. So in effect, by making it a 48-bit instruction, even though you reduced the space needed for the registers, you just made the instruction 1000x more expensive in terms of the relative encoding space it occupies.
A fringe instruction that occupies 1/65536 of the space it's sitting in, and is doing the thing it's supposed to do well: No discussion about encoding space.
A fringe instruction that occupies 1/64 of the space it's sitting in, _and_ has weird limitations: Not really an interesting proposal I'd say.
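The figures above follow from counting bits. A minimal sketch of the arithmetic, using only numbers quoted in this message (the variable names are illustrative):

```python
from fractions import Fraction

share_64bit = Fraction(1, 65536)  # BFXP's share of the 64-bit prefix space
bits_lost = 16                    # going from a 64-bit to a 48-bit encoding
bits_saved = 6                    # 2 bits fewer for each of the 3 registers

# Each lost bit halves the available space; each saved bit halves the
# space one instruction needs.
share_48bit = share_64bit * 2**bits_lost / 2**bits_saved

print(share_48bit)                # 1/64
print(share_48bit / share_64bit)  # 1024, the "1000x" mentioned above
```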
Of course, 1/65536 is extreme. But we can only increase instruction sizes in steps of 16 bits. So with some instructions, such as this one, we only have the choice between taking a huge chunk of an encoding space, or a tiny chunk of the next larger encoding space. And I think in this case the tiny chunk is the right choice.

But even assuming one really wants the prefix16 format with its limitations, I still think that it would probably be better to just use my packed format, just with rs2', rs1', and rd' instead of rs2, rs1, rd:

| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
----------------------------------------------------------------------------------------------------|
... immediate | funct7 | rs2 | rs1 | len | rd | op| 11111 | regular packed format
... immediate | funct9 | rs2'| f2| rs1'| len | f2| rd'| op| 11111 | modified packed format

Yes, the two f2 fields are placed a bit awkwardly, but I think this would still be a better encoding. For a few reasons, but the main one is that then it can share the same "op" space as regular packed instructions, and use bits in funct7 to distinguish the details of the instruction format.
In your proposal,
| 4 | 3 2 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------|---------------------------------------------------------------|
... immediate |0 0 0|len | rd | 00| 11111 | JALI
... immediate |func3|len | rd' | 00| 11111 | LDI
... immediate |func3 |imm3 |rs1' |i2 |rs2' |op|func3|len | rd' | 01| 11111 | PX16
... funct7 | rs2 | rs1 | f3 | rd | opcode5|op|func3|len | page| 10| 11111 | PX32
...*********************************TBD******************************|func3|len | page| 11| 11111 | PX48
by using different op encodings for the two formats, you are either assuming that both formats will see an equal encoding space pressure, or you are knowingly throwing away encoding space.
Using funct7 is much more flexible. For example, if 1/4th of the encoding space should be used for the modified format, one could say instr[31:30]=11 is the modified format and other values use the unmodified format. Or, one could reserve instr[31:30]=01 and instr[31:30]=10, and whenever we run out of encoding space in one of the categories _then_ we know for which one there is greater demand and can define one of the reserved values accordingly.

>> Also, generally you want to make it as simple as possible to determine the length of the instruction. So if one depends on the other, you want the decoding of the instruction type to depend on the decoding on the length, never the other way around.
> bit 5..6 and the len field bit 10.. 12 determine the length of the instruction uniquely
> length(instruction) = 16*(bit[5:6] + ((1<<len) & 127 ))

Whereas in my proposal, ignoring reserved spaces, the length is 16*instr[3:2]+48.

I'm proposing a function of 2 bits whereas you are proposing a function of 5 bits.
Which one is simpler to decode, assuming you already know instr[4:0] = 11111?

If you have a dual-issue pipeline, or do instruction fusing, you need to be able to decode multiple instructions in parallel. And the entire decode for the 2nd instruction depends on the length for the first instruction. So that better be as quick as you can possibly make it.
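To make the dependency concrete, here is a rough sketch of splitting a stream of 16-bit parcels into instructions using the 48 + 16*x rule. The function names are hypothetical. The message writes the 2-bit selector as instr[3:2]; since instr[4:0] is fixed at 11111 in these formats, this sketch assumes the low two bits of the len field (instr[13:12]) are meant, and it ignores reserved encodings.

```python
def insn_length_bits(first_parcel: int) -> int:
    """Instruction length in bits, looking only at the first 16-bit parcel."""
    if first_parcel & 0b11 != 0b11:
        return 16                        # compressed instruction
    if first_parcel & 0b11111 != 0b11111:
        return 32                        # standard 32-bit instruction
    # ASSUMPTION: 2-bit selector taken from instr[13:12]; reserved
    # encodings are not handled.
    selector = (first_parcel >> 12) & 0b11
    return 48 + 16 * selector


def split_stream(parcels):
    """Group parcels into instructions; each start depends on the previous length."""
    groups, i = [], 0
    while i < len(parcels):
        n = insn_length_bits(parcels[i]) // 16
        groups.append(parcels[i:i + n])
        i += n
    return groups


stream = [0x0001, 0x0013, 0x0000, 0x001F, 0x0000, 0x0000,
          0x101F, 0x0000, 0x0000, 0x0000]
print([len(g) for g in split_stream(stream)])  # [1, 2, 3, 4]
```

The loop cannot look at parcel `i + n` until `n` is known, which is why the length function sits on the critical path when decoding several instructions at once.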
> Again, not sure how that addresses the point that not all instructions have an immediate that's a power of two.

????

To illustrate that the number of bits of the immediate of the instruction does not have to coincide with the 16 * ((1 << len) % 128) bits following bit 32, I used the bfxp instruction with 21 bits of immediate.
On second thought, perhaps you mean you cannot encode e.g. a "load a 48 bit integer" LLI48 instruction of length 64 bit ?
> Regarding the implication that 3 registers and 21 bit immediate wouldn't fit in a 48-bit encoding with my formats: That's simply not true!

I don't think I implied that,
Hi,

I've now added a compressed-packed format, reworded the proposal a bit (I hope for the better :), and changed the load-immediate format to be able to write to t1/t2.
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
----------------------------------------------------------------------------------------------------|
... | funct7 | rs2 | rs1 | f3 | rd | opcode | len | 00|page | 00| 11111 | PFX
... immediate |f| len |ssp| rd^ |spc| 11111 | LDI
... immediate |f| len |ssp| rd |spc| 11111 | JAL
imm16... | funct7 | rs2'| f2| rs1'|f89| immediate[15:0] | len |ssp| rd' |spc| 11111 | C#1?
imm16... | funct7 |f89| rs2'| f2| rs1'| immediate[15:0] | len |ssp| rd' |spc| 11111 | C#2?
imm16... | funct7 | rs2 | rs1 | immediate[15:0] | len | rd |spc| 11111 | PACK
For comparison, the standard 32-bit format (indentation added for clarity, to line up with the above):
| 3 2 1 |
|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|---------------------------------------------------------------|
| funct7 | rs2 | rs1 | f3 | rd | opcode(7bit)| 32-bit format
| imm[5..11] | rs2 | rs1 | f3 | imm[0.4]| opcode(7bit)| 32-bit S-format
| imm[0..11] | rs1 | f3 | rd | opcode(7bit)| 32-bit I-format
| imm[12..31 | rd | opcode(7bit)| 32-bit U-format
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
----------------------------------------------------------------------------------------------------|
... funct7 | rs2 | rs1 | f3 | rd | opcode | len | 00|page | 00| 11111 | PFX
... immediate |f| len |ssp| rd^ |spc| 11111 | LDI
... immediate |f| len |ssp| rd |spc| 11111 | JAL
... immediate | funct9 | rs2'| f2| rs1'| len |ssp| rd' |spc| 11111 | C
... immediate | funct7 | rs2 | rs1 | len | rd |spc| 11111 | PACK
* in the pack format, bits 15 and 16 are bits 16 and 17 of the immediate, respectively
* in both C formats and pack, immediate bits beyond 18 are optional and are the same *number* of immediate bits (as in original-revised)
* funct7 lines up everywhere
* rs2 and rs1 line up in PACK and PFX formats. C is a little more complex (as in original-revised)
* the 32-bit format is now on a 16-bit boundary
* comparing PFX to C and PACK: f3, rd, opcode and f89 are on the same boundary which has those same bits decoded as immediate, learning from (and copying) how the original 32-bit format does things.
* again, not enough knowledge of C to say if C#1 or C#2 (or even a hypothetical alternative that better lines up bits of rs1/2' and rs1/2) would be preferable.

questions:
* would it be worthwhile extending the "f" field of LDI and JAL to 2 bits, not just for aesthetic reasons but to get bits 15:0 of the immediate to line up with C and PACK formats?
I think there is a miscount somewhere since opcode has 7 bits and so there is no room for an f89.
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
^ ^ ^ ^
so that's: 38, 39, 40, 41, 40, 42...
whew, good catch, rogier, i totally missed it as well.
l.
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------------------------------------------------------------------|
immN0 | funct7 | rs2 | rs1 | f3 | rd | opcode7 |f| len | 00|page | 00| 11111 |
PFX
... immediate[N..15] ... immediate[14:0] |f| len |ssp| rd^ | sp| 11111 | LDI
... immediate[N..15] ... immediate[14:0] |f| len |ssp| rd | sp| 11111 | JAL
imm15 | funct7 | rs2 | rs1 | immediate[14:0] |f| len | rd |spc| 11111 | PACK
imm15 |fn789| funct6 | rs1'| f2| rs2'| immediate[14:0] |f| len |ssp| rd' |spc| 11111 | CArith
imm15 |fn456|func3|imm?which? | rs2 | immediate[14:0] |f| len |ssp| rd' |spc| 11111 | CStackST
imm15 |fn567| func4 | rs1 | rs2 | immediate[14:0] |f| len |ssp| rd' |spc| 11111 | CReg
^^^ imm[N..15]
[well google groups royally screwed _that_ up... *sigh*, let's try again, and in case google screws up a 2nd time, it's attached in plain text. remember to use a fixed-width font editor]
----
i took a look at the RVC format, and it's nothing like the 32-bit format. the key difference is: rs2 is where rs1 is placed, and rsd/rs1 is placed where rs2 is placed, if you line up the bits so that at least funct and the 5-bit boundaries line up.

so... there's no point trying to optimise the format to fit the rs/rd registers, however there _is_ a point to lining up the immediate to match imm[15:0].

adding in some of the C formats (the ones that make sense):
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|F E D C B A 9 8 7 6 5 4 3 2 1 0|
|-----------------------------------------------------------------------------------------------|
immN0 | funct7 | rs2 | rs1 | f3 | rd | opcode7 |f| len | 00|page | 00| 11111 | PFX
... immediate[N..15] ... immediate[14:0] |f| len |ssp| rd^ | sp| 11111 | LDI
... immediate[N..15] ... immediate[14:0] |f| len |ssp| rd | sp| 11111 | JAL
imm15 | funct7 | rs2 | rs1 | immediate[14:0] |f| len | rd |spc| 11111 | PACK
imm15 |fn789| funct6 | rs1'| f2| rs2'| immediate[14:0] |f| len |ssp| rd' |spc| 11111 | CArith
imm15 |fn456|func3|imm?which? | rs2 | immediate[14:0] |f| len |ssp| rd' |spc| 11111 | CStackST
imm15 |fn567| func4 | rs1 | rs2 | immediate[14:0] |f| len |ssp| rd' |spc| 11111 | CReg
^^^ imm[N..15]
* CReg is actually identical to PACK (just with rs2 and rs1 reversed) once funct bits 5, 6 and 7 are added, so is redundant.
* CStackST (Stack-relative store) is too awkward: the immediate from stack-relative store messes with the positioning (and dynamic extendability) if you try to keep imm16 in the same place as well as immed[15:0] in the same place. how do you extend the instruction to use imm16+ and place the bits between funct3 and rs2 in that? too much of a mess. scrap it.
* CArith sort-of makes sense as long as rs2' and rs1' are put back in their original slots (clifford you swapped them). rs2' lines up with the 32-bit decoder, the 3 bits for x8-x15 still match up with their 32-bit equivalents, just with a hard-coded 0b01 for the top 2 bits. however.... rd is *already* specified in bits 7-9 of the extended format, funct6 doesn't line up...
* CWideImmediate doesn't make *any* sense (unmodified): rd' already exists in the proposed 48+ bit format, the imm field already exists in PACK...

basically, fitting C in here is going to take a lot more thought.
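for reference, the CArith point about the "hard-coded 0b01 for the top 2 bits" can be sketched as follows. `expand_rvc_reg` is a hypothetical name; the mapping itself is the usual RVC one, where a 3-bit register field selects x8..x15.

```python
def expand_rvc_reg(r3: int) -> int:
    """Map a 3-bit RVC register field (rs1', rs2', rd') to a 5-bit register number."""
    if not 0 <= r3 <= 0b111:
        raise ValueError("expected a 3-bit register field")
    # Top two bits are fixed at 0b01, the low three bits come from the field.
    return 0b01000 | r3


print([f"x{expand_rvc_reg(r)}" for r in range(8)])
```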
personally, i would feel much happier with a completely revised C format (with a reservation made to be able to do that over time), that better fits the fact that rd' is already part of the proposed 48+ bit format.

especially given that... hmmm... who was it... xan phung? sorry, it was over a year ago now: the guy who did that study and came up with RV16, an alternative Compressed format that saved a whopping *25%* code-space *compared to RVC*.

so that leaves:
| 4 | 3 2 | 1 |
|7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6|5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
imm[N:0] | funct7 | rs2 | rs1 | f3 | rd |opcode5|op |len |f | 00|page | 00| 11111 | PFX

This looks like a phoney 16 bit instruction (phoney because its op field = 11) followed by a 32 bit instruction for the decoder
> especially given that... hmmm... who was it... xan phung? sorry, it was over a year ago now: the guy who did that study and came up with RV16,

That would be me, based on Xan Phung, who in turn based himself on my Xcompressed proposal. :-)
On Friday, May 17, 2019 at 8:52:14 AM UTC+1, Rogier Brussee wrote:
> imm[N:0] | funct7 | rs2 | rs1 | f3 | rd |opcode5|op |len |f | 00|page | 00| 11111 | PFX
> This looks like a phoney 16 bit instruction (phoney because its op field = 11) followed by a 32 bit instruction for the decoder... [same for CADD/PACK, and for LDI and JAL]

that's right, they do, which is why, we can surmise, clifford concluded that adding C-based instructions would be redundant/unnecessary.

or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?
> or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?

Because the load immediate versions already use the restricted register for rd (I think to have room for LI of len bit version of integers, and JAL, single and double precision floats).

And it is sort of a 16 bit prefix to something fitting in 16 bit and an existing 32 bit formats followed by a variable length standard bitorder immediate. Therefore it makes sense to reuse a _decoder_ that can already handle the 16 bit chunks, and (have the option of) pretending to the _decoder_ that you do some sort of instruction fusion.
If you want 128 registers, which I understand, it makes perfect sense to have one of these 16 bit PREFIX32 prefixes encode for being followed by an entirely non standard 32 bit format with 7 bit rd rs1 rs2 register specifications 11 bits of opcode, and a variable length standard bit order immediate. You just would not be able to reuse the standard decoder, which is just fine.
On Friday, May 17, 2019 at 10:49:17 AM UTC+1, Rogier Brussee wrote:
> > or... the other question to ask is: why are the proposed 48+ bit formats being restricted to subsets of the registers (pseudo-Compressed) at all?
> Because the load immediate versions already use the restricted register for rd (I think to have room for LI of len bit version of integers, and JAL, single and double precision floats). and it is sort of a 16 bit prefix to something fitting in 16 bit and an existing 32 bit formats followed by a variable length standard bitorder immediate. Therefore it makes sense to reuse a _decoder_ that can already handle the 16 bit chunks, and (have the option of) pretending to the _decoder_ that you do some sort of instruction fusion.
ok so yes, it makes sense... *for those instructions* [the extended JAL and extended IMM].
that leaves the case for imposing the 16-bit reduced format on the rest - the entirety - of the 48+ encoding space. what is the case (justification) for that?
to make that clear: are all future instructions (not JAL, not IMM) going to be forced to have that reduced rd' field, in perpetuity?
this seems to be a serious (unacceptable) imposition.
> If you want 128 registers, which I understand, it makes perfect sense to have one of these 16 bit PREFIX32 prefixes encode for being followed by an entirely non standard 32 bit format with 7 bit rd rs1 rs2 register specifications 11 bits of opcode, and a variable length standard bit order immediate. You just would not be able to reuse the standard decoder, which is just fine.
this is not what we are doing. what we are doing instead is keeping the 32-bit instruction format identical (not just because of the reduced decoder logic but because we need the concept of extending *ALL* 32-bit instructions UNMODIFIED), embedding it into the 48-bit format, and utilising what precious few bits are remaining to *extend* rs1, rs2 and rd.
turns out that we only have one bit spare per register (due to the need to add predication as well), so that one bit is "if zero, register is a scalar in range 0b00000..0b11111" and "if set, register is a vector in range 0b0000000..0b1111100" i.e. the register number is shifted by 2 bits.
it's by no means perfect, however the space is so extremely limited there's not much else that can be done. going to 64 bit isn't really acceptable as the penalty for doing so is a 25% increase in the instruction cache size (with associated approx 40% increase in power consumption of the same)
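To make the one-spare-bit register scheme concrete, here is a minimal sketch. It assumes a 5-bit register field plus a single extension bit per register, as described above; the name `decode_reg` and its arguments are illustrative only and do not come from any proposal text.

```python
def decode_reg(field5, is_vector):
    """Map a 5-bit register field plus one extension bit to a register number.

    Assumed encoding, following the description in the message above:
      is_vector == 0 -> scalar register, number = field5        (0..31)
      is_vector == 1 -> vector register, number = field5 << 2   (0, 4, ..., 124)
    """
    if not 0 <= field5 < 32:
        raise ValueError("register field must fit in 5 bits")
    if is_vector not in (0, 1):
        raise ValueError("extension bit must be 0 or 1")
    return (field5 << 2) if is_vector else field5
```

With this mapping the vector case can only name every fourth register, which is the "blocks of 4" trade-off that buys a 7-bit range out of 6 encoded bits.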
l.
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/a5e91093-a9b7-4e17-91a3-ca9f84bcf322%40groups.riscv.org.
I'm going to have to disagree, probably because I don't understand which scenario you're thinking of.
If there is an "official" RISC-V Foundation sanctioned change to an existing instruction, then that change would trivially be detected dynamically, simply by executing some instruction that behaves differently and seeing what the result is.
Compilers, etc, must be told which version of the architecture they are compiling for,
since the mvendorid-marchid-isamux proposal won't work.
If you have a custom extension that steps on reserved opcodes, then you're kind of on your own.
If you have a custom extension that steps on opcodes used by some other unimplemented extension, then you need to tell the compilers you don't have the extension so they don't try to generate instructions that use it.
Ok :) whew
>
>
> I would agree that making a backwards incompatible change to the architecture without some way to execute old code (and there are many ways to do that, not just isamux) is, um short-sighted, shall we say. Simply trapping on ops that are either changed or no longer supported is the easy, and most likely, fix for that.
Think it through: a "legacy" design compatible only with the "old" official spec would potentially have no way of trapping, particularly on "Base" extensions.
Some vendors are *not* supporting the "please disable this extension" capability, which, even if used, would generate an absolutely awful number of traps as the *entire* official extension's opcode space would now be effectively software emulated.
And if the change was in say RV64I, we are *really* in trouble as far as trap volumes are concerned!
This is why binary incompatibility - even one single bit change in an "official" extension - is an absolute unmitigated disaster for the entire RISCV ecosystem.
I would go so far as to say that it would be better to start again from scratch, with RISC-VI (RISC-6 for those people not familiar with Roman numerals). At least then the full implications of the binary incompatibility would be properly appreciated.
> You can even automatically patch the code when it is encountered to avoid performance penalties.
Yes that would actually work. A full decompilation or full RISCV to RISCV JIT engine, translating either statically or dynamically to the "new" format.
I would however expect most vendors to hit the roof if this was made a mandatory requirement to support both "legacy" and "new" official RISCV binaries.
Plus, someone would need to develop the software tool, and the "new" meaning could NOT be ratified until it was absolutely known that the software JIT / decompile-recompile approach worked 100% reliably.
Again, hairy nightmare basically.
>
>
> I don't agree that a generalized isamux is the best way to go. I think point solutions (as my example above and your little/big endian switch) are far easier to implement and validate.
The thing is that if the big/little dynamic switch is acceptable, that will go in 1 bit of a dedicated official "switching" CSR, yes?
Now let us suppose that there is another problem that needs solving (either emergency or just "a damn good idea, so good that the performance enhancement cannot be ignored" such as RVCv2).
Where would the logical place be to have the CSR bit that switches the meaning of RVC opcodes from v1 meaning to v2 meaning?
Right next to the bit that switches LE/BE mode, of course!
This *is* the isamux concept.
Thus, in the very incremental fashion that you describe, more bits over the years (decades) accumulate, none of them disruptive, all of them very carefully planned and managed. One bit added at a time.
The next logical progression of this is to split the CSR into "officially reserved" space and "custom useable space", and that is the full extent of the isamux proposal.
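As a rough illustration of the incremental picture above, the sketch below models the CSR as a plain integer with one bit per switch. Every bit position and the 16/16 official/custom split are assumptions made up for the example; the thread does not fix any of them.

```python
# Hypothetical bit assignments: the thread only says an LE/BE bit and an
# RVC v1/v2 bit would sit next to each other in one switching CSR.
BIT_BIG_ENDIAN = 0   # 0 = little-endian, 1 = big-endian (assumed position)
BIT_RVC_V2 = 1       # 0 = RVC v1 meaning, 1 = RVC v2 meaning (assumed position)

# Assumed split: low 16 bits officially reserved, high 16 bits custom-usable.
OFFICIAL_MASK = 0x0000FFFF
CUSTOM_MASK = 0xFFFF0000


def set_bit(csr, bit, value):
    """Return csr with the given bit forced to value (0 or 1)."""
    return (csr & ~(1 << bit)) | ((value & 1) << bit)


def get_bit(csr, bit):
    """Return the value (0 or 1) of one bit of csr."""
    return (csr >> bit) & 1


def split(csr):
    """Return (official_part, custom_part) of a 32-bit CSR value."""
    return csr & OFFICIAL_MASK, (csr & CUSTOM_MASK) >> 16
```

Adding a new switch in this model is just a new constant; nothing already allocated moves, which is the "one bit added at a time" property argued for above.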
Really very simple and, when the alternatives are fully evaluated and found to be unfortunately unacceptable or just plain dangerous, we are left reluctantly with isamux as the only palatable option.
That is not to say that it should be abused! It is designed for really, really serious situations (or advantages so compelling that they cannot be ignored), because the work needed particularly on the binutils and gcc side is simple enough but needs to be really carefully architected and thought through.
About that: Jacob Bachmeyer and I went over it, you have to emit the full triple, the mvendorid-marchid-isamux from the gcc assembler backend (a table managed atomically by the FSF *NOT* the RISCV Foundation), and binutils picks that up and generates the required CSR isamux switching (in and out).
This means that precompiled binaries with eg bigendian in one and littleendian in another will actually work and run on the same system that supports BE/LE isamuxing.
It's kinda simple and kinda complex at the same time. Takes getting used to.
> In fact the architected solution (XS, FS) probably works just fine for any custom extension that you might want to turn on and off.
Slightly confused, apologies. Custom extensions are partly on their own, partly given "safe breathing space" by isamux. Of much greater concern, as described above, is switching "official" extensions on or off, particularly as doing so is entirely optional and I know of no system that has even done it, probably precisely because it is optional and complicates the instruction decode phase quite a lot.
Remember I warned 18 months ago about what happens when a Standard makes things optional? Sigh.
>
> Far uglier architectures have survived for far longer than 25 years with less.
:)
>
> The real solution is not to ship binaries, but only ship intermediate representations that can be compiled to the final HW.
Sigh I agree, if LLVM (and other JIT approaches) were 100% the norm, right across the board, we would not be having this discussion.
Reality is however that the most critical piece of the puzzle - the linux kernel, being the highest volume of systems out there - has only just had the last c variable dynamic data structure removed about 6 months ago, which was preventing and prohibiting it from being compiled by clang-llvm.
Point being, it is far too early and with RTOS vendors having to go through the same pain barrier, LLVM and other JITs are just not going to see the level of adoption any time soon, if at all, that would make the need for isamux go away.
Sigh.
L.
Sure. Let's change the subject line too, should have done that 5 msgs ago.
> From your description they both seem to just be a collection of bits indicating what extensions are enabled.
Not quite.
Two key issues,
1. isamux is designed to allow EXISTING instructions, even ones in RATIFIED extensions, to have ONE (or more) opcodes change meaning.
In the case of LE/BE the dynamic change of the CSR bit associated would actually affect *multiple* instructions on *multiple* extensions, past present *and future* (RV128, QUAD FP LD/ST).
MISA was most definitely and clearly never intended for that purpose.
2. MISA can be readonly (WARL is it called?) and the choice of whether to be read only or writable is up to the vendor.
For isamux it is ABSOLUTELY CRITICAL that it be properly implemented as both readable and writable.
ie if the bit is written to with a 0, the instruction encoding MUST switch to LE, and if 1, encoding MUST switch to BE, for example.
> It sounds like you are concerned about how current implementations are choosing to make that register read only.
This is misstating things. I am not personally concerned, I can however perceive that, from my experience in writing long-lasting standards, letting MISA be readonly OR writeable was a costly mistake.
Extensibility in standards *has* to be managed with some very specific rules, that the RISCV Foundation violated in this instance.
> However, although none of the current extensions require that their feature bits in misa are writable, a future extension could totally say that "this extension must start disabled and be turned on only by writing to misa".
It could.. that does not help at all with existing hardware. Also, the source of the problem is that it is *not mandatory* to have extensions be disableable.
Oh. I just remembered. It has been so long.
3. The other key difference is that MISA requires DESTRUCTION of extension state information. It is LITERALLY a kill-switch.
isamux DOES NOT DO THAT. isamux is an INSTRUCTION ENCODING "lengthener", that may be viewed as "adding hidden bits to the instruction" using escape-sequence methodology (a CSR).
The LE/BE CSR bit, for example, is as if you now had a 33 bit instruction.
Thus, hypothetically (or more like actually) a major extension behind isamux would ALSO need a MISA bit, because whilst the MISA bit would disable the extension (and destroy any state info), isamux would NOT.
This has implications for context switching. MISA state must STILL BE SAVED for BOTH extensions behind an isamux because BOTH EXTENSIONS ARE STILL ACTIVE.
So it really is truly a different purpose. And has no associated delays on switching. isamux bits *literally* plug directly into the instruction decode because they are literally the 33rd (and 34th, and 35th) bits of the instruction.
NOT an on/off kill switch for an extension.
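One way to picture the "hidden extra instruction bits" view is a decoder keyed on the isamux value together with the 32-bit word. The following is only a toy sketch: the instruction word, the two meanings and all names are invented for illustration and are not real encodings.

```python
# Arbitrary 32-bit pattern used for the demo; not a claim about a real encoding.
SAME_WORD = 0x0000000B

# Toy decode table keyed on (isamux bits, 32-bit instruction word).
DECODE_TABLE = {
    (0, SAME_WORD): "meaning-A",
    (1, SAME_WORD): "meaning-B",
}


def widened_key(isamux, insn):
    """Concatenate isamux above bit 31, giving a 33+ bit 'instruction'."""
    return (isamux << 32) | (insn & 0xFFFFFFFF)


def decode(isamux, insn):
    """Look up the operation; unknown combinations raise, modelling a trap."""
    try:
        return DECODE_TABLE[(isamux, insn)]
    except KeyError:
        raise LookupError("illegal instruction trap") from None
```

Note that nothing in this model is disabled or reset when `isamux` changes; only the lookup key differs, which is the distinction from a MISA-style kill switch drawn above.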
> In fact, it is likely that the UNIX-class platform spec will require all extensions with user-visible state to start disabled.
If there exists even one hardware vendor (that has spent millions on hardware that does not support that), that is not possible. It's far too late.
Are we going to have disparate kernel binaries, one for "legacy" hardware and one for "clean extension state"? Suggest it on sw-dev and see how far the idea gets (hint: wear fireproof jacket) :)
Changes of that nature would be disastrous to existing hardware that was expected to be backwards compatible with such a drastic spec change.
I have to remind people, this is why it is such a seriously bad idea to hold secretive cartelled discussions such as having a closed UNIX WG list.
> Extensions being mutually exclusive also wouldn't be a problem because the misa register is already WARL.
>
>
>
> I'm also not following why there would ever have to be an "emergency update"
> that changed the meaning of existing RV64I instructions.
Conflation in your mind allowed you to believe I said that. I did not. However I will follow up with a separate post on the topic.
A likely emergency scenario would be.. I dunno... hmmm, perhaps that there was not enough analysis done of a particular instruction, or that insights came to light only well after public consultation and were ignored even then, and, far too late, ratified and silicon released...
... and then it turns out that yes, a mistake was indeed made that had far reaching damaging consequences. A CSR needed changing from WARL to something else (the privspec 1.11 notes precisely such a change. I have not analysed it).
> What is wrong with adding instructions with new opcodes using the normal extension mechanism?
This was again discussed 18 months ago, it was actually the reason for the discussion in the first place.
Count the number of spare major 32 bit opcodes available for official extension usage. (answer: none. There are only brownfield spaces left. 2 major opcodes are reserved for RV128. 2 for custom space).
After the brownfield is officially used up, we are forced to go into 48 bit.
Or... there is always the possibility of using isamux.
isamux could be used for example to declare that the entirety of the RVV opcode space is OFFICIALLY available either for custom extensions or for other official RISCV extensions.
All that would be needed: set isamux "bit 2" equal to zero, and the changes are MUXed out: the RVV opcode space HAS to be treated as "raises an unimplemented trap" (thus allowing software emulation of RVV), and if the bit is 1 then it HAS to be implemented as whatever-official-MUXed-in-opcodes-sit-in-that-space.
What is fascinating to me is that BOTH options may result in unimplemented traps, allowing vendors the option to software emulate BOTH isamuxed instruction encodings!
> Are there any example of these emergency changes for prior ISAs?
Prior ISAs? I don't know my ISA history that well.
The examples that I know were quoted from the discussion 18 months back were
Intel, as THE absolute rock solid canonical benchmark / example of how to stick to your guns on the ISA. They have taken backwards compatibility to the absolute inviolate limit, even when 8086, 186, 286, 386 and 486 compatibility makes an absolute pig's ear of the layout.
You should see the ASIC photos for the bit that covers legacy instructions, compared to the rest of the design, it's hilarious.
The other example I already mentioned, it's Altivec / SSE conflict in Power PC. There were people with experience of PPC who confirmed what should be blindingly obvious but clearly wasn't to the muppets that decided to reuse the same opcodes to create utterly incompatible binaries.
MIPS bless 'em probably have something similar, although because of the more proprietary and embedded nature of MIPS it is far less impact because, well, embarrassingly, there isn't a public ecosystem to speak of.
ARM have not made the mistake surprisingly. They're big enough and ugly enough. The switch to hardfloat 10 years ago was painful for distros but was executed cleanly precisely because there were no incompatibilities, only emulation needed.
Others with more historical knowledge will know some examples, I'd love to hear as I do find them both useful and also funny.
L.
Several times there have been complaints that RV32 integer is binary incompatible with RV64.
This is a costly mistake given that it has been demonstrated that RV32 binaries are smaller than their directly-compiled RV64 counterparts.
"solutions" to this have been offered by way of switching the hardware to 32 bit mode... except of course *no actual hardware exists* that supports that as an option because, surpriiise, it's optional.
I know *why* the meaning of the ADD opcode changes in the 64 and 32 bit execution environment, it's because otherwise you need Pascal Triangle opcode proliferation to interface between 32 and 64 bit, and 32 and 64 and 128 bit, and 64 and 128 bit, just as exists in the FP opcode space.
With so many more options in the INT space for such proliferation (ANDing, ORing, xBitManip in future), it made some sense to save opcodes by allowing the meaning of the opcode to change.
At the cost of binary incompatibility, with an associated increase in executable size.
If RVV had been put into the 48 bit space (where it really belongs, given that predication is limited, and so is the total number of vector registers, a future version is going to *have* to exist in 48b anyway), if this had been properly openly discussed, this would potentially have come up, the space reserved for RVV freed, and the INT ops allowed to mirror the strategy used in FP, giving us binary RV32/64 compatibility in the process.
There does however exist a way in which, HYPOTHETICALLY, we as a community may actually be empowered to realistically discuss such an optional and drastic yet COEXISTING beneficial change to RISCV ISA at such a fundamental level, yet still in a nondisruptive fashion:
isamux.
With for example "bit 4" of isamux ratified officially for such an endeavour, exploration of enhancements to RISCV that would allow RV32 binaries to execute on an RV64 system could actually be seriously considered and evaluated.
However Johnathon this is not an "emergency" spec change, that was a conflation that can be cleared up by rereading my post more carefully.
It is however an example of a beneficial change that may prove to be sufficiently compelling as to be worthwhile pursuing, where as of right now, without isamux, it is absolutely off the table and not even worth pursuing except as an academic exercise.
Other examples include RV16 (bit 5 of isamux), adding FP clip (missing from base but present in RVV - bit 6 of isamux), SIMD INT clipping and rounding suitable for audio usage as outlined by AndesStar (bit 7 of isamux), auto switching of opcodes to SIMD, in both the FP and INT spaces (bits 8 to 9 of isamux, to cover 8/16/32/None SIMD splits of standard RV register files)
The possibilities are endless and *do not happen* without isamux, not in the 16 bit or 32 bit space, because there is no more room.
No, not getting Ratification because of using standard RISCV encodings for custom usage, and more to the point having to maintain a hard fork of the entire software ecosystem, is NOT a viable or realistic option.
Yes it was already mentioned that Redhat would take a dim view of anyone forking the Fedora distro for the purposes of supporting their own custom (incompatible) ISA extensions. This for the damage and harm it would do to the Fedora community, and as Redhat is a trademark which would be harmed by such attempts, they would be justified in flexing some legal muscle.
isamux is a way out of a huge number of corners that the whole RISCV community has accidentally been backed into.
L.
I agree that a BE extension would have a larger impact than the existing ones and may not have been considered when designing misa, but I don't see why that would actually be a problem. Let's say we decided to add some new extension called "H" with the following semantics:
* all current processors hard-wire misa bit 7 to zero and thus are always in little endian mode
* processors which support our extension must support writing both 0 and 1 to misa bit 7, with the bit starting out as 0 on boot. We further assert that if they do anything else they are in violation of the spec and non-conformant
* whenever bit 7 is set the processor executes all loads/stores in big endian mode
What actually goes wrong with this proposal? All legacy LE code will continue to work, and new code can choose either to continue using LE code or to use BE code when running on newer processors (and either refuse to support old processors or have a separate fallback path to run in LE). There is also no state that would be lost by switching modes with misa.
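The semantics in that list can be modelled in a few lines. This is a sketch only: `load_word` and the flat `bytes` memory are stand-ins invented here, and bit 7 is simply the position used in the example above.

```python
MISA_BIT_H = 7  # bit position from the example above ('A' = 0 ... 'H' = 7)


def load_word(memory, addr, misa):
    """Load a 32-bit word; big-endian if the example bit is set, else little-endian."""
    byteorder = "big" if (misa >> MISA_BIT_H) & 1 else "little"
    return int.from_bytes(memory[addr:addr + 4], byteorder)
```

The same four bytes yield two different register values depending on one CSR bit, which is why the bit affects every load/store encoding at once rather than a single extension.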
You said:
> isamux it is ABSOLUTELY CRITICAL that it be properly implemented as both readable and writable.
> ie if the bit is written to with a 0, the instruction encoding MUST switch to LE, and if 1, encoding MUST switch to BE, for example.
This sounds completely wrong.
If I have an implementation that only implements BE or LE, then it can't switch, and the bit should be RO.
You might trap if you try to write it to the value that isn't implemented, of course...
Alternatively, it is writable, and if written, it disables all Ld/St/Atomic ops and traps if you attempt to execute them.
What are you thinking here? It can't be that you require all implementations to always implement both LE and BE, or any other ops that are covered by isamux.
Oh, an Intel example:
Because of the variable length nature of the encoding, ugly as it is, they are able to fit new instructions in without stepping on old ones - so they don't have that issue.
They also have an architected method of discovering which architectural features exist or don't. RISC-V has that to some extent, though (besides MISA) it's very ad hoc (e.g. try it, and if you trap, it isn't implemented; write a CSR, read it back, and if it's different, some bits aren't implemented or that combination isn't legal). I'm not real happy with that...
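The "write a CSR, read it back" probing style mentioned above can be sketched as follows. The `WarlCsr` class is a toy model invented for this example (real WARL fields may map illegal writes to other legal values rather than simply ignoring them), so treat it as an assumption-laden illustration.

```python
class WarlCsr:
    """Toy CSR: only bits in writable_mask change on a write (WARL-like)."""

    def __init__(self, reset_value, writable_mask):
        self.value = reset_value
        self.writable_mask = writable_mask

    def write(self, new_value):
        # Read-only bits keep their current value; writable bits take the new one.
        keep = self.value & ~self.writable_mask
        self.value = keep | (new_value & self.writable_mask)

    def read(self):
        return self.value


def probe_bit(csr, bit):
    """Return True if the bit can be toggled; restores the original value."""
    original = csr.read()
    csr.write(original ^ (1 << bit))
    changed = csr.read() != original
    csr.write(original)
    return changed
```

The probe needs a write, a read and a restore per bit, and tells you nothing about which combinations of bits are legal, which is part of why this style of discovery feels ad hoc.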
But Intel has actually turned off functionality that has existed for 40 years. It's a slow, multi-year sequence to do that (this isn't the sequence, but it will give you a taste):
- announce you are deprecating some instruction/feature
- add some bit in a control register or fuse that turns it off
- ship the next generation with the feature turned off by default (but a uCode patch can re-enable it)
- ship further generations that don't implement the feature at all
I'm probably missing some steps there,
> What is fascinating to me is that BOTH options may result in unimplemented traps, allowing vendors the option to software emulate BOTH isamuxed instruction encodings!
This would be a very cool capability but it wouldn't be all that usable in practice. If the unimplemented traps occur frequently (like they would from doing trap-and-emulate on every compressed instruction) then you are going to get something like a 100x slowdown. You'd be way better off using binary translation to "JIT compile" your program to the ISA your processor actually supports.
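A back-of-envelope model shows how a figure of that order can arise. The trap fraction and per-trap cost used below are assumed round numbers chosen for illustration, not measurements of any implementation.

```python
def emulation_slowdown(trap_fraction, trap_cost_cycles, native_cpi=1.0):
    """Average slowdown when a fraction of instructions trap and are emulated.

    trap_fraction    -- share of executed instructions that trap (0.0 to 1.0)
    trap_cost_cycles -- assumed cycles to take the trap, emulate and return
    native_cpi       -- cycles per instruction when nothing traps
    """
    emulated_cpi = (1.0 - trap_fraction) * native_cpi + trap_fraction * trap_cost_cycles
    return emulated_cpi / native_cpi
```

For instance, `emulation_slowdown(0.5, 200.0)` models half of all instructions trapping at an assumed 200 cycles each and comes out at roughly a hundredfold slowdown.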
Jacob correctly points out that we'd need a different letter than "H" but this is a trivial change. However, your concern about destroying state doesn't apply in this case because my proposed extension has no state.
> ISAMUX *does not permit that* because it is *not ISAMUX's job*. it is *literally* - plain and simple - hooked directly, permanently and irrevocably, into the instruction decoder, at the hardware level.
This seems like just a really convoluted way of supporting longer instructions. Instead of adding a 32-bit instruction which requires isamux=0x1, just add a brand new 48-bit instruction.
Your design also sounds like it would wreak havoc on disassemblers and static analysis tools because they wouldn't be able to know what isamux would be set to when the relevant code was run.
Long ago, I started the discussion with the suggestion that we add
something like ISAMUX to switch between custom instruction set
extensions.
The reason? Because of software portability.
Consider custom extension from vendor A, called Ax, and another
extension from vendor B, called Bx. Both vendors chose to use the
exact same opcode space to implement their extensions.
This works great for a while, as software that uses Ax is developed
and proliferates by some subset of users of vendor A, and other
software that uses Bx is developed and proliferates by a completely
different subset of users of vendor B.
Eventually, vendors A and B merge, and they want to develop a single,
new RISC-V processor that supports both Ax and Bx. Also, they want to
encourage software to be written by the open source community that
simultaneously uses Ax and Bx. However, because of the extensive
proliferation of Ax and Bx in separate software stacks, they cannot
alienate those users by dropping those instruction encodings.
Instead, they must somehow allow both Ax and Bx to simultaneously exist on a
single RISC-V processor core, but software needs to switch between Ax
decode mode and Bx decode mode.
Using MISA is inappropriate here.
ISAMUX is the answer.
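The merged-vendor scenario can be sketched as a core with two decode tables and a scoped mode switch. Everything here (the `Core` class, the shared opcode value, the mode names as strings) is hypothetical and only meant to show software switching into a decode mode and back out again.

```python
from contextlib import contextmanager

SHARED_OPCODE = 0x0B  # assumed: both vendors picked the same opcode value

# One decode table per vendor extension; contents are placeholders.
DECODERS = {
    "Ax": {SHARED_OPCODE: "Ax custom op"},
    "Bx": {SHARED_OPCODE: "Bx custom op"},
}


class Core:
    def __init__(self):
        self.mode = "Ax"  # current decode mode (stand-in for an isamux field)

    def decode(self, opcode):
        """Decode an opcode under whichever vendor mode is currently active."""
        return DECODERS[self.mode][opcode]

    @contextmanager
    def decode_mode(self, mode):
        """Switch decode mode for a block of code, then switch back."""
        previous, self.mode = self.mode, mode
        try:
            yield self
        finally:
            self.mode = previous
```

Used as `with core.decode_mode("Bx"): ...`, the old mode is restored on exit even if the block raises, so legacy Ax code called afterwards still decodes as before.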
There was a lot of discussion months back on how useful this would be
etc etc. I suggested the solution is simple, but didn't actually give
one because I first wanted others in the community to understand the
need for this feature.
It seems people still don't understand the need yet. That's ok, I'm
patient. It will happen eventually...
On Tuesday, May 21, 2019 at 12:01:20 AM UTC+8, Rogier Brussee wrote:
[]
> makes sense. You divide up your registers in blocks of 1 or 4.
Yes. Now why did I not have such simple words??
>
>
>
>
> Maybe you can use the last remaining reserved slot in RVC (Inst[0:1] = 0b00, Inst[13:15] = 0b100) as a 16 bit "prefix" for how to interpret the next instruction including 16bit or 48bit or longer ones. Gives 11 bits to play with.
I like it.
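For reference, the arithmetic behind "11 bits to play with" is 16 bits minus the 2 fixed low bits and the 3 fixed high bits. A small sketch of matching and unpacking such a prefix halfword follows; the helper names are made up here, and how the 11 payload bits would be used is left open.

```python
PREFIX_MASK = 0xE003   # selects bits 15:13 and bits 1:0
PREFIX_MATCH = 0x8000  # bits 15:13 = 0b100, bits 1:0 = 0b00


def is_prefix(halfword):
    """True if a 16-bit halfword sits in the reserved RVC slot."""
    return (halfword & PREFIX_MASK) == PREFIX_MATCH


def prefix_payload(halfword):
    """Extract the 11 free bits (bits 12:2) from a prefix halfword."""
    if not is_prefix(halfword):
        raise ValueError("not a prefix halfword")
    return (halfword >> 2) & 0x7FF


def make_prefix(payload):
    """Build a prefix halfword carrying an 11-bit payload."""
    if not 0 <= payload < (1 << 11):
        raise ValueError("payload must fit in 11 bits")
    return PREFIX_MATCH | (payload << 2)
```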
> Of course being reserved, you are practically guaranteed that some version of RVC v2 will eventually trample over your prefix and not be compatible.
Or xBitManip.
Btw just to check, there is no private cartelled discussion of creating RVCv2 without a wider public real time discussion, is there?
> However, assemblers and linkers will have to deal with RVC vs RVC v2 anyway,
> and all that not supporting RVC v2 means is that some 32 bit instruction would not get compressed, so the pain you get for having more bits might be worth it. And yes it is a hack.
If we did not need the RVC space for potential xBitManip,
* Backwards binary compatibility if modifications to an official RISCV Extension prove necessary or desirable is just not possible without a way for newer systems to dynamically support legacy *and* revised official ISA Standards.
* The burden of backwards and forwards binary compatibility has to be on the newer designs: efforts to do software traps (JIT or static translation) were shown to be undesirable or unworkable.
* Whilst Intel does do "retirement" of legacy ISAs on a very long timescale, this is only possible because of the monopoly position Intel holds. RISCV does not have customers, therefore "retirement" as deployed by Intel is just not an option.
* The expectation that everything will be fine in an unregulated use of the precious last 2 remaining 32 bit custom opcodes is completely unrealistic. Three independent high profile custom extensions that require upstream *mainline* gcc and binutils to forcibly accept patches due to sheer overwhelming demand is all it takes to create merry hell and mayhem, as two of them are guaranteed to clash.
* isamux adds 33rd, 34th, 35th and more actual bits to the instruction in a hidden fashion. It is *NOT* the same as MISA, which entirely disables an extension and destroys (resets) internal state.
* isamux is intended for potential use in changing meaning of opcodes in *multiple* Extensions, custom *or official*, right across the board (good example is BigEndian/LittleEndian) where MISA applies to just one Extension and one extension alone.
* Therefore isamux is *not* to be ignored during context switches: it has to be saved and restored.
* isamux is not optional, must be writable, and given that it is the 33rd, 34th etc. bit of the opcode, traps or hardware implementations *must* be implemented, for *all* permutations.
* Trying to treat isamux as read only is the same as trying to create an ISA encoding where bit 30 of the opcode is always set to 1. It makes no sense because isamux is literally a mandatory extension of the length of all opcodes.
* by the same logic, traps *must* respect isamux. It cannot be ignored. Ignoring isamux is directly equivalent to e.g ignoring bit 11 of any given opcode.
* isamux is not to be implemented lightly or treated as a tool that can be abused: it is intended for emergency or compelling or strictly absolutely necessary circumstances.
* through the FSF, gcc and binutils are the atomic point of registration of mvendorid-marchid-isamux triples, matching actual meanings of opcodes to their actual functionality, NOT the RISCV Foundation. Coordination with LLVM and other compilers will also prove necessary.
* static disassembly issues due to the escape-sequence-like nature of isamux changing opcode meanings will require some strict discipline in conventions to be defined and enforced by binutils, to set isamux in a LOCAL scope and to return it to its former value as quickly as possible. Actual conventions to be defined by gcc and binutils developers.
* adding new bits to isamux can be done piecemeal, incrementally, one bit at a time. That's what it's for.
* having official bits reserved and having some bits available for custom extensions is just common sense.
* the custom bits do not clash because the bits are to be recognised as triples (mvendorid-marchid-customisamux) in the FSF-managed atomic database.
Realistic acceptance of the reality of the need for an isamux solution will come when the first crisis hits RISCV. It would be good, instead, to plan ahead and have isamux already in place.
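To make the "LOCAL scope" convention from the list above a little more concrete, here is a minimal software model, assuming isamux is a single integer field per hart. The `Hart` class and `isamux_scope` helper are invented for illustration; the real convention would be defined by the toolchain developers.

```python
from contextlib import contextmanager


class Hart:
    """Toy hart state: only the (hypothetical) isamux field is modelled."""

    def __init__(self):
        self.isamux = 0


@contextmanager
def isamux_scope(hart, value):
    """Set isamux for a local region, then restore the previous value."""
    saved = hart.isamux
    hart.isamux = value
    try:
        yield hart
    finally:
        # Restore even if the region raises, mirroring a save/restore
        # across a trap or context switch.
        hart.isamux = saved


def demo():
    """Record isamux at each nesting level to show it is restored."""
    hart = Hart()
    seen = []
    with isamux_scope(hart, 0b01):
        seen.append(hart.isamux)
        with isamux_scope(hart, 0b11):
            seen.append(hart.isamux)
        seen.append(hart.isamux)
    seen.append(hart.isamux)
    return seen
```

The same save-and-restore shape is what a context switch would have to do with the field.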
L.
> * The burden of backwards and forwards binary compatibility has to be on the newer designs: efforts to do software traps (JIT or static translation) were shown to be undesirable or unworkable.
When and by whom?
QEMU's binary translation of RISC-V to x86 (a radically different ISA than RISC-V) is quite likely the fastest implementation in existence.
> * isamux is intended for potential use in changing meaning of opcodes in *multiple* Extensions, custom *or official*, right across the board (good example is BigEndian/LittleEndian) where MISA applies to just one Extension and one extension alone.
This seems to be purely semantics. There is no limit to how invasive any extension can be, and you can absolutely change multiple settings simultaneously just by setting multiple bits in MISA.
> * Therefore isamux is *not* to be ignored during context switches: it has to be saved and restored.
Adding cycles to every context switch ever done.
> * through the FSF, gcc and binutils are the atomic point of registration of mvendorid-marchid-isamux triples, matching actual meanings of opcodes to their actual functionality, NOT the RISCV Foundation. Coordination with LLVM and other compilers will also prove necessary.
Neither mvendorid nor marchid are visible to user mode software. Even if they were, there is no way programs are going to have a table of every processor ever made to try and determine which features they can use.
Also the idea that the RISC-V foundation / foundation members are just going to surrender control to the FSF sounds improbable to say the least.
> * having official bits reserved and having some bits available for custom extensions is just common sense.
> * the custom bits do not clash because the bits are to be recognised as triples (mvendorid-marchid-customisamux) in the FSF-managed atomic database.
Custom bits in isamux is crazy. Some bits will have conflicting meanings on different processors and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.
Some read/write CSR fields are only defined for a subset of bit encodings, but allow any value to be written while guaranteeing to return a legal value whenever read. Assuming that writing the CSR has no other side effects, the range of supported values can be determined by attempting to write a desired setting then reading to see if the value was retained. These fields are labeled WARL in the register descriptions. Implementations will not raise an exception on writes of unsupported values to an WARL field. Implementations must always deterministically return the same legal value after a given illegal value is written.
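The write-then-read-back probe described in that passage can be modelled in a few lines. This is a sketch, not hardware behaviour: the `WarlCsr` class is invented, and dropping unsupported bits is only one of the legal responses a WARL field may choose, picked here because it is deterministic.

```python
class WarlCsr:
    """Toy WARL field: any value may be written, reads are always legal."""

    def __init__(self, legal_mask, reset=0):
        self.legal_mask = legal_mask
        self.value = reset & legal_mask

    def write(self, value):
        # Assumption: unsupported bits are silently dropped, no exception.
        self.value = value & self.legal_mask

    def read(self):
        return self.value


def supports(csr, setting):
    """Probe a setting: write it, read it back, then restore the old value."""
    old = csr.read()
    csr.write(setting)
    retained = csr.read() == setting
    csr.write(old)
    return retained
```

For example, with `WarlCsr(legal_mask=0b01)` the probe `supports(csr, 0b01)` succeeds while `supports(csr, 0b10)` fails, and neither probe traps.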
The other key difference is that MISA requires DESTRUCTION of extension state information. It is LITERALLY a kill-switch.
The “G” bit is used as an escape to allow expansion to a larger space of standard extension names. G is used to indicate the combination IMAFD, so is redundant in the misa CSR, hence we reserve the bit to indicate that additional standard extensions are present.
Some bits will have conflicting meanings on different processors and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.
not if the compiler - the COMPILER - is the central atomic world-wide global registration point for mvendorid-marchid-isamux translation points
following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.
can you please be more careful and open-minded in how you approach postings on this list?
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/922f93a3-dd50-4bcb-8c6d-29dab44c48e6%40groups.riscv.org.
Hi Luke --
> following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.
> can you please be more careful and open-minded in how you approach postings on this list?
This is unwarranted and paints us as an unwelcoming community.
Jonathon's points are completely reasonable critiques of some very strong claims that you have made.
open-minded != agrees 100% with your implementation.
i hope and trust, dan, that by highlighting the areas which broke netiquette and basic communications rules on treating people with dignity and respect that this illustrates better how i may have been made to feel belittled and extremely unwelcome by johnathon's message.
johnathon: the message that you just wrote was a cascading series of misunderstandings, misreadings and belief-judgements, based on which you made closed-minded judgements.
following this kind of pattern does you a disservice, it does me a disservice, and it does other readers, now and forever in the archives of this list, a disservice.
can you please be more careful and open-minded in how you approach postings on this list?
i have not yet learned the way to do so whilst at the same time being "diplomatic"
i don't have to tolerate it, and i have every right to say so.
Hi Luke --
I think you have a misunderstanding of what WARL means in the context of RISC-V CSRs.
This exact code can run on both machines, because only legal values can be written. Thus, WARL can be used by software to check for features on platforms which may or may not support them, as well as to change modes on platforms which do.
The reason I bring this up is that by mandating that all bits of isamux must be read/writable, you are losing this ability.
If software writes to isamux, changing the effective instruction to one that the processor does not support, you must trap.
There's no mechanism for machine mode software to determine what is supported so as to not call those instructions.
Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves, so then all current RISC-V hardware needs mandatory machine-mode emulation code in order to be compliant with the new spec.
> The other key difference is that MISA requires DESTRUCTION of extension state information. It is LITERALLY a kill-switch.
You've mentioned this several times in this thread but I've been unable to find anything confirming this.
Please point me to where in the spec this behavior is specified.
From my reading, it is perfectly legal to implement a custom extension which overrides the default behavior of other standard/non-standard extensions.
That is, using the custom extension space in misa as your 'isamux bits'.
Writing misa may increase IALIGN, e.g., by disabling the "C" extension. If an instruction that would write misa increases IALIGN, and the subsequent instruction's address is not IALIGN-bit aligned, the write to misa is suppressed, leaving misa unchanged.
If an ISA feature x depends on an ISA feature y, then attempting to enable feature x but disable feature y results in both features being disabled. For example, setting "F"=0 and "D"=1 results in both F and D being cleared.
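The two misa rules quoted above can be sketched as a pure function. This is a simplified model covering only C, D and F; the order in which the two rules are applied is my assumption, and `write_misa` is not a real API.

```python
# misa extension bits: bit i corresponds to letter chr(ord("A") + i).
C_BIT = 1 << (ord("C") - ord("A"))
D_BIT = 1 << (ord("D") - ord("A"))
F_BIT = 1 << (ord("F") - ord("A"))


def write_misa(misa, new, next_pc):
    """Return the misa value after attempting to write `new`.

    next_pc is the address of the instruction following the write.
    """
    # Dependency rule: D depends on F, so enabling D while disabling F
    # clears both.
    if (new & D_BIT) and not (new & F_BIT):
        new &= ~(D_BIT | F_BIT)
    # IALIGN rule: disabling C raises IALIGN from 16 to 32 bits; if the
    # next instruction is not 4-byte aligned, the write is suppressed.
    if (misa & C_BIT) and not (new & C_BIT) and next_pc % 4 != 0:
        return misa
    return new
```

Nothing in either rule refers to extension register state, which is the point being argued in the surrounding messages.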
Here's an alternative, less disruptive proposal to solve the problem of multiple conflicting extensions that Guy mentioned. encoding 1 in misa means A is supported, encoding 2 means B is supported, encoding 3 means A+B=C is supported. C can also specify a 1 bit CSR which muxes between the two conflicting extensions.
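A minimal sketch of that alternative, assuming the encodings 1, 2 and 3 given above and a single mux bit that is only meaningful when both extensions are present. The function name and return values are illustrative.

```python
EXT_A = 1              # only extension A is supported
EXT_B = 2              # only extension B is supported
EXT_C = EXT_A | EXT_B  # both are supported, selected by a 1-bit CSR


def active_extension(supported, mux_bit=0):
    """Return which conflicting extension decodes the shared opcodes."""
    if supported == EXT_C:
        return "B" if mux_bit else "A"
    if supported == EXT_A:
        return "A"
    if supported == EXT_B:
        return "B"
    return None  # neither extension is present
```

On a core that reports only A or only B, the mux bit has no effect, so existing hardware would need no change.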
As far as expanding the standard extension space, the privileged spec suggests a mechanism for that.
The “G” bit is used as an escape to allow expansion to a larger space of standard extension names. G is used to indicate the combination IMAFD, so is redundant in the misa CSR, hence we reserve the bit to indicate that additional standard extensions are present.
So there is an 'escape hatch' for if we need to override extension behavior. However, the hope is that since we're a RISC ISA, there's very little chance of modifying the behavior of user-mode extensions once frozen.
I've yet to hear a compelling case for doing so.
> Some bits will have conflicting meanings on different processors and you'll be telling us there needs to be an isamuxmux for when two vendors with different mvendorid's merge and want to combine their extensions.
This is true. The problem you're addressing is that custom behavior may become standardized and conflict. Having custom bits in your isamux presents the same opportunity for conflict. Unless...
> not if the compiler - the COMPILER - is the central atomic world-wide global registration point for mvendorid-marchid-isamux translation points
So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it?
There have been many heated discussions about why we don't even have a gcc multilib configuration for both softfloat and hardfloat, because the cross-product of _current_ possible configurations is too large. To expand this to every possible RISC-V platform/revision ever to be made is infeasible.
Let's be clear: this is CISC-V that you're proposing.
Variable, even dynamic-length instructions.
It's a very cool concept, but it is not in line with RISC principles.
It's also just a plain extremely disruptive change for a problem which it only partially solves.
The most likely path for isamux is to itself become a custom extension which simply provides this CSR (See the 'Counters' extension in the most recent unprivileged ISA for an example of this).
Then you're free to use it exactly as you've described. If it is useful, it may get adopted as a standard extension. (But more likely, conflicting extensions will implement their own arbitration CSRs, if necessary).
> i hope and trust, dan, that by highlighting the areas which broke netiquette and basic communications rules on treating people with dignity and respect that this illustrates better how i may have been made to feel belittled and extremely unwelcome by johnathon's message.
I am truly sorry that participating in this forum made you feel so.
I cannot speak for anyone but myself, but I hope that perceived rudeness stems from terseness rather than malice.
However, this is what I was responding to:
> can you please be more careful and open-minded in how you approach postings on this list?
You did not address violations in netiquette.
> i have not yet learned the way to do so whilst at the same time being "diplomatic"
And yet you expect impeccable phrasing from the rest of us.
> i don't have to tolerate it, and i have every right to say so.
That's absolutely true. I encourage you to speak up when you feel the community is being unwelcoming, as I did.
the point is moot. by raising the issue, it allows me to understand that you've fundamentally misunderstood what isamux is. i mentioned it before, and it is worth repeating: isamux is LITERALLY a fixed quantity of extra instruction opcode bits that goes DIRECTLY into the instruction decode phase.
* if a vendor has only one isamux bit, they now have 33 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.
* if a vendor has two isamux bits, they now have 34 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.
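One way to picture the "extra opcode bits" claim is a decode table keyed on isamux concatenated above the 32-bit instruction word. This is a toy model: the table entries and mnemonics are made up, and only one isamux bit is modelled in `decode`.

```python
def decode_key(isamux, isamux_bits, insn):
    """Concatenate isamux above a 32-bit instruction word."""
    if not 0 <= isamux < (1 << isamux_bits):
        raise ValueError("isamux value does not fit in the implemented bits")
    return (isamux << 32) | (insn & 0xFFFF_FFFF)


def effective_opcode_width(isamux_bits):
    """Number of bits the decoder sees: 33 for one isamux bit, 34 for two."""
    return 32 + isamux_bits


CUSTOM0 = 0x0000000B  # a word in the custom-0 major opcode (0b0001011)

# Hypothetical table: the same 32-bit word means different things
# depending on the isamux bit.
DECODE_TABLE = {
    decode_key(0, 1, CUSTOM0): "vendor_op_original",
    decode_key(1, 1, CUSTOM0): "vendor_op_revised",
}


def decode(isamux, insn):
    """Look up an instruction under a 1-bit isamux."""
    return DECODE_TABLE.get(decode_key(isamux, 1, insn), "illegal instruction")
```

Seen this way, ignoring isamux is the same as dropping the top bit of the key.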
think it through about what would happen if isamux was not deployed, yet an *official* change to a RISC-V extension was published.
does it also illustrate why having WARL capabilities on the isamux is... well... "silly"? and why i gave examples (earlier and in the summary) which said "making isamux WARL is as if you decided that bit 11 of the 32-bit opcode was set to 1"?
> Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves,
i remember hearing that DEC Alpha used this part-JIT, part-real-execution to create a binary translation of x86 which, on 2nd execution, was just as quick as x86.
the above scenarios are *awful*! emitting code that has to conditionally check if multiple features were enabled/disabled? moo? :)
The little-endian / big-endian debate was one such scenario, discussed.... last year. Japan - the entire industry of Japan - is running PowerPC. converting software specifically written for the *opposite* encoding used in the rest of the industrial world is... hopelessly impractical.
> So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it?
que? i'm lost. there's a cognitive break where some logical deductive reasoning hasn't been spelled out. can you elaborate on how you reached that conclusion (and perhaps tone down the disbelief somewhat? the phrase "surely you did not mean to imply that" would have helped)
why have other RISC systems in the past adopted it, then?
you realise that that 1 bit CSR *is* isamux? :)
precisely. at which point, the hell-on-earth moves from the conflicting extensions to the conflicting "arbitration CSRs".
> the point is moot. by raising the issue, it allows me to understand that you've fundamentally misunderstood what isamux is. i mentioned it before, and it is worth repeating: isamux is LITERALLY a fixed quantity of extra instruction opcode bits that goes DIRECTLY into the instruction decode phase.
This is an ISA forum. The microarchitectural implementation is an orthogonal concern.
And for good reason; as long as the uarch is compliant with the ISA, it's free to perform all sorts of optimizations which will baffle and infuriate ISA writers. That's why a proposal which says that the definition of the entire ISA can change with a single instruction scares me.
High performance things like pre-decoding instructions, uop sequencing, memory consistency, LSQs, even something as simple as early branch prediction will be touched by this proposal (after all, how can you identify a branch if the definition of RV64I can change by the time the instruction is fully decoded?). Because of this separation, there is no concept of a "decoder" in the ISA.
From an ISA perspective, isamux is a WLRL CSR which switches out active extensions.
So how many bits is isamux at the ISA level?
It seems like it's not constant based on the following:
> * if a vendor has only one isamux bit, they now have 33 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.
> * if a vendor has two isamux bits, they now have 34 instruction bits going PERMANENTLY and MANDATORIALLY into the instruction decoder AT THE HARDWARE LEVEL.
What happens when software writes 1-bit-vendor's isamux with a 2-bit value?
> think it through about what would happen if isamux was not deployed, yet an *official* change to a RISC-V extension was published.
That's the point of 'freezing' the base ISA + extensions.
So further extensions can build on top, using the standard mechanisms.
Like I say, I cannot picture a single scenario which would require overriding bits in a standard, frozen extension.
That's why I disagree with the argument that we should optimize for it at the cost of imposing software requirements on past and hardware requirements on future implementations.
> does it also illustrate why having WARL capabilities on the isamux is... well... "silly"? and why i gave examples (earlier and in the summary) which said "making isamux WARL is as if you decided that bit 11 of the 32-bit opcode was set to 1"?
>> Not only that, but hardware which is currently built without isamux will trap on isamux accesses themselves,
It is not silly. It's totally legal for a processor which only supports user-mode spec to not have a trap mechanism at all.
If an isamux bit 0 determines LE vs BE and my system does not support BE, then software should be able to determine this and refuse to set the bit.
If my system has hardware emulation support for an alternate mode, then it's of course allowed to set the bit and trap. Mandating a JIT-mode with hardware emulation support to be spec-compliant is a non-starter for minimal RISC-V implementations.
So unless isamux is WARL, it fundamentally can't work for overriding user-mode standard or custom extensions.
> i remember hearing that DEC Alpha used this part-JIT, part-real-execution to create a binary translation of x86 which, on 2nd execution, was just as quick as x86.
These are massive, high performance chips that are able to provide such a mechanism.
> the above scenarios are *awful*! emitting code that has to conditionally check if multiple features were enabled/disabled? moo? :)
This should only be done once at boot time, saving the results. So while annoying, it's not a performance hit.
> The little-endian / big-endian debate was one such scenario, discussed.... last year. Japan - the entire industry of Japan - is running PowerPC. converting software specifically written for the *opposite* encoding used in the rest of the industrial world is... hopelessly impractical.
RISC-V already has had a similar situation. It now has Ztso in addition to its normal memory model. They mention LE/BE as another scenario for such an extension, without requiring changes at the individual instruction level. As far as I know, there's no need (except for academic interest) for a system which can switch LE/BE instruction by instruction.
>> So every single new commercial, embedded, academic, and toy processor now needs to fork GCC/LLVM/intel's compilers/my research university's compilers just to be able to use it?
> que? i'm lost. there's a cognitive break where some logical deductive reasoning hasn't been spelled out. can you elaborate on how you reached that conclusion (and perhaps tone down the disbelief somewhat? the phrase "surely you did not mean to imply that" would have helped)
Say I have a processor that I'm developing. I do not have a compiler sanctioned vendorid or marchid. Therefore I need to add my custom triple to gcc, no? Whereas current state of the art is just to use off-the-shelf gcc.
> why have other RISC systems in the past adopted it, then?
Perhaps I missed this in another thread, but I have never seen a comparable structure in another RISC ISA?
> you realise that that 1 bit CSR *is* isamux? :)
> precisely. at which point, the hell-on-earth moves from the conflicting extensions to the conflicting "arbitration CSRs".
Yes, I do. That's why I'm saying that a spec-mandated isamux is unnecessary.
When there is a conflict between two vendors and they want to support each other's extensions simultaneously, there is already a pathway to doing so (through a third extension with a mode bit).
It's much simpler to deal with this on a case-by-case basis than to impose additional compiler, software and hardware requirements on all implementations moving forward.
llliiiitittterrrrrraallllllyyyy LITERALLY LITERALLY
not in this case. ignoring opcode instruction bits is not an option.
anyone that suggested that a given microarchitectural implementation was going to ignore bit 17 of all opcodes would cause quite a few laughs. and isamux is llliiiitittterrrrrraallllllyyyy the addition of a fixed quantity of extra instruction opcode bits into the decode phase.
> which switches out active extensions.
NO. it does NOT "switch out" anything.
actually, with the isamux proposal, the hardware may DYNAMICALLY switch OFF the custom extensions, entirely. that is ENTIRELY the point and purpose of it. is this aspect of the isamux proposal something that you understood?
* in 2022 the new ratified LE/BE isamux standard comes out. ISAMUX bit 0 is chosen to represent it.
* in 2023 the new ratified RVC2 isamux standard comes out. ISAMUX bit 1 is chosen to represent it.
* a vendor chooses to implement *only* LE/BE but has not implemented RVC2.
* the customers try to run a binary that is compiled with **BOTH** LE/BE **AND** RVC2.
* instructions with LE/BE work perfectly well
* instructions with RVC2 *TRAP* on the writing / setting of ISAMUX bit 1, and require JIT emulation, just as any "legacy" processor would require.
this is absolutely no different a situation from if the *actual* RISC-V instruction set were *already* 34 bits.
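The scenario in the list above can be modelled as a CSR write that traps whenever it tries to set a bit the hart does not implement. The bit assignments follow the hypothetical 2022/2023 example; the exception type and function names are invented for this sketch.

```python
LE_BE_BIT = 1 << 0  # hypothetical: ratified LE/BE selector
RVC2_BIT = 1 << 1   # hypothetical: ratified RVC2 selector


class IsamuxTrap(Exception):
    """Stands in for the trap taken on an unimplemented isamux bit."""


def write_isamux(implemented_mask, value):
    """Return the new isamux value, or trap on unimplemented bits."""
    unsupported = value & ~implemented_mask
    if unsupported:
        raise IsamuxTrap(f"unimplemented isamux bits: {unsupported:#b}")
    return value


def try_write(implemented_mask, value):
    """Return 'ok' if the write succeeds, 'trap' if emulation is needed."""
    try:
        write_isamux(implemented_mask, value)
        return "ok"
    except IsamuxTrap:
        return "trap"
```

A hart implementing only LE/BE gives `try_write(LE_BE_BIT, LE_BE_BIT) == "ok"`, while a binary that also sets the RVC2 bit gets `"trap"` and falls back to whatever emulation the trap handler provides.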
Intel, as THE absolute rock solid canonical benchmark / example of how to stick to your guns on the ISA. They have taken backwards compatibility to the absolute inviolate limit, even when 8086, 186, 286, 386 and 486 compatibility makes an absolute pig's ear of the layout.
You should see the ASIC photos for the bit that covers legacy instructions, compared to the rest of the design, it's hilarious.
The other example I already mentioned, it's Altivec / SSE conflict in Power PC. There were people with experience of PPC who confirmed what should be blindingly obvious but clearly wasn't to the muppets that decided to reuse the same opcodes to create utterly incompatible binaries.
MIPS bless 'em probably have something similar, although because of the more proprietary and embedded nature of MIPS it is far less impact because, well, embarrassingly, there isn't a public ecosystem to speak of.
ARM have not made the mistake surprisingly. They're big enough and ugly enough. The switch to hardfloat 10 years ago was painful for distros but was executed cleanly precisely because there were no incompatibilities, only emulation needed.
> vec_add" that emit the appropriate op code based on the type of the elements within the vector, and very strong type checking is enforced." Completely different mechanism.
then you did not see the examples that were given 5 days ago, nor the ones that were given 18 months ago.
and what happens if a mistake is made?
ah. in the "embedded" platform it is, however in the UNIX platform space, it is categorically NOT acceptable. at all. failure to trap on illegal instructions will result in non-compliance.
then my understanding is that such a system, by not having applied for a JEDEC mvendorid, would *not* pass RISC-V Conformance tests.
yes it is... and where would those bits accumulate? *IN THE ISAMUX CSR*! i said exactly these words only 5 days ago! please do not make me repeat them time and time again!
I have been reading the threads. All of them. It is incredibly disrespectful to insinuate otherwise.
> * in 2022 the new ratified LE/BE isamux standard comes out. ISAMUX bit 0 is chosen to represent it.
> * in 2023 the new ratified RVC2 isamux standard comes out. ISAMUX bit 1 is chosen to represent it.
> * a vendor chooses to implement *only* LE/BE but has not implemented RVC2.
> * the customers try to run a binary that is compiled with **BOTH** LE/BE **AND** RVC2.
> * instructions with LE/BE work perfectly well
> * instructions with RVC2 *TRAP* on the writing / setting of ISAMUX bit 1, and require JIT emulation, just as any "legacy" processor would require.
> this is absolutely no different a situation from if the *actual* RISC-V instruction set were *already* 34 bits.
It is unthinkable that RISC-V is going to change its instruction length every year. From both a hardware and software design perspective. It's a huge disruption to the ecosystem and it will never be done.
> ah. in the "embedded" platform it is, however in the UNIX platform space, it is categorically NOT acceptable. at all. failure to trap on illegal instructions will result in non-compliance.
Okay, so you agree that this is not a solution for RV64ABCDEFGIJKLMOPQRTVWXYZ? This is not a solution for embedded processors? The vast, vast majority of RISC-V devices sold will not be UNIX processors (2 billion WD procs shipping soon). Additionally, processors most likely to use custom extensions in the 30-bit encoding space are embedded DSP processors and academic research projects. This is described in the unprivileged ISA spec. So isamux as proposed does not solve the issue in 99.9999% of cases.
> then my understanding is that such a system, by not having applied for a JEDEC mvendorid, would *not* pass RISC-V Conformance tests.
I'm not talking about certifying a processor. I'm talking about using an off-the-shelf compiler for developing a processor. Surely you don't expect hobbyists to pay money to register their toy? Or alternatively, fork gcc as I suggested.
> yes it is... and where would those bits accumulate? *IN THE ISAMUX CSR*! i said exactly these words only 5 days ago! please do not make me repeat them time and time again!
I agree that the mechanism is similar. That's why I suggested it. There does not need to be a centralized body which governs the arbitration bits. It is by far not the common case, it will complicate systems, and it will slow down adoption. We should not disrupt the existing ecosystem to support it.
Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).
Note2: I can imagine namespaces being per privilege level, but it seems ISANS will have to be saved and restored with a change of privilege level anyway. Likewise one would have to think about the impact on (user level) interrupts.
One could easily imagine the namespaces be partitioned in "to be used by the standard", "registered", and "free for experimentation with a random namespace number".
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/da6cd2b0-db82-4d50-bc78-314921a4147c%40groups.riscv.org.
I think some of the confusion is due to the name ISAMUX. Can I suggest to use the name ISANS for ISA name space?
For every value of this CSR you should have a different clean ISA namespace of all 16, 32, 48, ... bit instructions
(I think this is the point Luke is trying to make with bits 33, 34, .. of the isa). Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).
Now 2^32 namespaces may be slightly exaggerated:
I imagine you would have "global namespaces", like a hypothetical RVCv2, that people will want to run for a whole program and that people want to extend. That would mean they are essentially just feature bits.
But it should also be possible to set and unset ISANS locally in a function, e.g. for using a hypothetical on-chip hardware RDMA interface with a long namespace number (probably the most prized interface numbers are those in the top 20 bits, because LUI can provide a 20-bit immediate in one instruction). In theory, after setting ISANS, every bit in the ISA is yours, but in practice you would, say, reuse IMAC because networking needs integer processing, and, say, the FMADD major opcode of the ISA, because you wanted to reuse the decoder for 3 input registers and you don't care about FMADD while networking. Perhaps even more likely, you could use custom major opcodes, safe in the knowledge that, because of namespacing, this does not trample over anybody else's custom opcode.
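To make the LUI remark concrete: LUI writes a 20-bit immediate into bits 31:12 of the destination register and clears the low 12 bits, so a namespace number whose low 12 bits are zero can be built with a single instruction before the CSR write. The following is only a sketch; the function name is made up for illustration, and a 32-bit ISANS value is an assumption (ISANS itself is the proposal under discussion, not an existing CSR).

```python
from typing import Optional


def lui_immediate(namespace_id: int) -> Optional[int]:
    """Return the 20-bit LUI immediate that produces namespace_id,
    or None if the value has non-zero low 12 bits and therefore
    cannot be produced by LUI alone."""
    if not 0 <= namespace_id < 2**32:
        raise ValueError("this sketch assumes a 32-bit ISANS value")
    if namespace_id & 0xFFF:
        return None
    return namespace_id >> 12


# Hypothetical namespace numbers, chosen only for illustration.
print(lui_immediate(0x12345000))  # fits in one LUI
print(lui_immediate(0x12345001))  # needs more than LUI
```

Under this assumption there are 2^20 namespace numbers that are cheap to select, which is why those would be the sought-after ones.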
Note1: If such an ISANS scheme is adopted by the standard, I see no particular reason why being a standardised instruction would _necessarily_ imply being in the default global namespace. The main advantage of being in the global namespace would be, not having to switch namespaces back and forth with a CSR instruction. The disadvantage would be, taking up precious space in the default global namespace that can no longer be used for other purposes.
Note2: I can imagine namespaces being per privilege level, but it seems ISANS will have to be saved and restored with a change of privilege level anyway. Likewise one would have to think about the impact on (user level) interrupts.
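One way to picture the save/restore in Note2 is the toy model below. It is a sketch under stated assumptions, not part of any proposal text: the `Hart` class, the choice of 0 as the default namespace, and the idea that a trap handler always starts in the default namespace are all made up for illustration. Whether hardware or the trap handler would do the saving is left open in the thread.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_NS = 0  # assumption: 0 selects the default (standard) namespace


@dataclass
class Hart:
    isans: int = DEFAULT_NS
    saved: List[int] = field(default_factory=list)  # stack, so traps can nest

    def trap_enter(self) -> None:
        # Remember the interrupted code's namespace, then run the
        # handler in the default namespace.
        self.saved.append(self.isans)
        self.isans = DEFAULT_NS

    def trap_return(self) -> None:
        # Restore the namespace the interrupted code was using.
        self.isans = self.saved.pop()


hart = Hart(isans=0x12345000)  # hypothetical RDMA namespace number
hart.trap_enter()
print(hex(hart.isans))
hart.trap_return()
print(hex(hart.isans))
```

The same shape would apply to user-level interrupts: without the save, a handler would decode its own instructions in whatever namespace the interrupted function happened to have selected.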
Note3: Perhaps in such a scheme, MISA should be set per namespace. In the RDMA example: while in the RDMA networking extension namespace you would no longer fully support F and D and MISA would reflect that.
One could easily imagine the namespaces being partitioned into "to be used by the standard", "registered", and "free for experimentation with a random namespace number".
I do like 'isa namespaces' better. As long as ISANS is WARL
Luke, if you're saying that someone is unable to do something ever, that's extremely rude,
and I would suggest apologising.
Pretty much all people are quite capable of learning in the future even if they don't understand something now.
Indeed. The constant bullying from this one particular list participant makes for a very unpleasant forum.
so let's all take a step back, calm down, and go over this carefully and in a respectful fashion, ok? no more "reductio ad absurdum".
The reality is that the RISC-V community is a welcoming, courteous
environment for technical discussion and development doing a whole lot
of amazing things.
The *state* of these mailing lists is too frequently the opposite.
Bullying students? Stating that they will not be involved in the
decision making process? Insinuating that their intelligence is
insufficient for analytical reasoning?
That's naked harassment.
Harassment and bullying are antithetical to the RISC-V community.
That is not, I repeat *NOT*, the reality that needs to be accepted.
I repeat, more strongly, this *MUST* stop.
On Tuesday, June 11, 2019 at 2:24:28 AM UTC+1, Dan Cross wrote:
Indeed. The constant bullying from this one particular list participant makes for a very unpleasant forum.
[snip]
referring to someone in the third person excludes them from the conversation and makes them unwelcome.
it is terribly insulting, and so, hypocritically, you have criticised *me* for being a "bully"... using techniques that are known to be used by bullies.
can you see that that's what you did, and how it's just as unacceptable as the misconception that i am "out to be a bully"? i've *been* subjected to sustained bullying (at a boarding school that i attended for six years) - they were the worst years of my life.
i *know* what it's like.
do you *really* think that i am sitting here, typing this, looking to utilise it as a way to *deliberately* subject someone else to pain, in order to "get a kick out of seeing them suffer"??
if so, i have to say, "what the ****???"
so you cannot possibly be using that word "bully" to refer to me. either that, or you do not understand the true meaning of the word.
this is an exceptionally complex area which took *months* to go over, and the only reason i'm pursuing it, despite being hated by all of you for doing so, is because the consequences for RISC-V if it is gotten wrong are too severe to let happen.
Dan, your honesty is appreciated, I absolutely mean that. You've noticed that I simply do not operate according to conventional psychological behavioural patterns, and I have genuine difficulty identifying both when people have acted offensively towards me, and vice versa.
This lack leaves me with little option but to be plainly and completely honest, and unfortunately, as you've witnessed, rather than help a situation, such brutal honesty is frequently misunderstood and misinterpreted as speaking with vicious, spiteful, vengeful, callous and intentionally malicious intent.
I have absolutely no idea how to deal with this misperception.
Additionally, the lack of ability to spot when others have acted towards me in ways that would be immediately spotted by others as being offensive has a quite serious consequence: it allows people to get away with and continue to use completely inappropriate behaviour, whilst allowing them to accuse me of that very same offensive behaviour.
This isn't deliberate: it's simply an unintended side-effect.
The only thing that I can do is apologise to all concerned: beyond that, I can only apologise in advance that I have set a goal that needs to be completed, and there is nothing that I will let stand in the way of that (including my own limitations). Yes, this is not normal behaviour, it is quite pathological.
Lastly, it's worth saying, Dan (and Johnathon), neither of you did anything "wrong". You spoke your minds, as you should. Again, Dan, thank you, sincerely and genuinely, for speaking up and being so honest, particularly in such a public way. I'm sorry I have such limited understanding of human behaviour.
Warmest,
L.
Thanks for the clarification Rogier!
I do like 'isa namespaces' better. As long as ISANS is WARL so that implementations which only support the default namespace do not trap, it seems reasonable.
Thinking about such a CSR as an ISA namespace, means that for a 32 (64) bit CSR there are 2^32 (2^64) namespaces which should be enough for everybody (famous last words).
I very much hope it is :).
One of the nice things about RISC-V is its smallness. In the RISC-V Reader, they espouse the virtues of having the entire ISA fit in a small book, rather than the several volumes of, say, Power. As a community we should try to avoid "kitchen-sink" ISA proliferation. It would be much better if a few custom extensions become widely adopted and then enshrined in the standard, rather than tons and tons of custom extensions becoming relatively popular, conflicting with each other, causing toolchain complications and software developer confusion.
Note2: I can imagine namespaces being per privilege level, but it seems ISANS will have to be saved and restored with a change of privilege level anyway. Likewise one would have to think about the impact on (user level) interrupts.
Potentially very cool idea, has interesting implications for virtualization.
One could easily imagine the namespaces be partitioned in "to be used by the standard", "registered", and "free for experimentation with a random namespace number".
My opinion: The extension discovery/enable/disable mechanism is a platform issue -- embedding it into the user-level spec is a mistake. A standard mechanism could have a place in a separate platform specification. However, many systems which handle some complicated set of non-standard extensions may want their own mechanisms.
I could be wrong, I don't have a ton of experience in BSP/ucontroller architectures. My impression is that the toolchains are very custom.
Best,
Dan Petrisko
I think it would be better to not be WARL, but to trap to M-mode if an instruction in that namespace happens to be unsupported just like any other unknown instruction traps and is deferred to M-mode.
Note1: If such an ISANS scheme is adopted by the standard, I see no particular reason why being a standardised instruction would _necessarily_ imply being in the default global namespace. The main advantage of being in the global namespace would be, not having to switch namespaces back and forth with a CSR instruction. The disadvantage would be, taking up precious space in the default global namespace that can no longer be used for other purposes.
you're suggesting the ISAMUX/ISANS setting be made an actual RV32 opcode? that would be unnecessary because CSRRW/S/C (etc.) perform the required task perfectly well, and, yes, RV32 opcode space is extremely precious.
plus, remember: if one RV32 opcode space is taken up, it's taken up across *all* namespaces (pretty much) in some form. it gets particularly interesting when switching to foreign ISAs. the foreign ISA has to provide a mechanism for switching back to RISC-V (or other namespaces).
I think it would be better to not be WARL, but to trap to M-mode if an instruction in that namespace happens to be unsupported just like any other unknown instruction traps and is deferred to M-mode.
WARL gives you 3 options:
1) Don’t allow the bit to be set. Then U-mode software knows to emulate.
2) Allow the bit to be set, but trap as illegal and emulate in M-mode. (The behavior you’re describing)
3) Allow the bit to be set, implement the namespace.
It allows systems which don’t have a trap mechanism (e.g. only implement the unprivileged spec) to use it.
Making it WLRL only allows option 2 or 3, and mandates a trap mechanism.
Can you explain why WLRL is the better option?
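The WARL behaviour behind option 1 can be sketched as a toy model: a write never traps, and software finds out what was accepted by reading the register back. Everything named here is hypothetical (`WarlIsans`, `probe`, the namespace numbers), and falling back to namespace 0 on an unsupported write is just one legal choice an implementation could make, picked for illustration.

```python
class WarlIsans:
    """Toy WARL register: any value may be written, only legal values read back."""

    def __init__(self, supported):
        # Assumption: namespace 0 (the default) is always supported.
        self.supported = frozenset(supported) | {0}
        self.value = 0

    def write(self, requested: int) -> None:
        # WARL: the write itself never raises. An unsupported request
        # leaves the register at a legal value (here: the default).
        self.value = requested if requested in self.supported else 0

    def read(self) -> int:
        return self.value


def probe(csr: WarlIsans, namespace_id: int) -> bool:
    """Write-then-read-back test: did the namespace 'stick'?"""
    csr.write(namespace_id)
    return csr.read() == namespace_id


csr = WarlIsans(supported={0x12345000})
print(probe(csr, 0x12345000))  # supported: use the hardware path
print(probe(csr, 0xABCDE000))  # unsupported: software knows to emulate
```

Options 2 and 3 would both make `probe` return True; the difference between them only shows up later, when an instruction in that namespace either executes or traps as illegal.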
Ciao
Rogier
The better option, I think, would be WARA (Write Any Read Any if I get that right). It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made.
I _think_ (but correct me if I am wrong) that if a processor cannot trap to M-mode because M mode is all there is, you are supposed to just not use unknown instructions (which using an instruction in an unsupported namespace would be) and if you do it anyway, you must expect your program to terminate.
Having said that, I can see the usefulness of being able to test if a namespace is recognised, so that you can test whether there is an _advantage_ to using the instruction in the namespace, as opposed to being slow but not blowing up, in portable software.
There once was a time when MISA was
itself a read-only register; but now that it *can be* read-write, the
only way to reliably "auto-detect" what ISA extensions the processor
truly supports is to shadow the cold-boot value of MISA in RAM
somewhere, captured as closely to cold-boot startup as is feasible,
and hope that it never gets overwritten.
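As a small illustration of the MISA shadowing idea: the low 26 bits of MISA map to extension letters (bit 0 is "A", bit 25 is "Z"), so a value captured at cold boot can be compared against the current one. This is a sketch only; the function names and the example values are made up, and actually reading the CSR is outside what Python can show.

```python
def misa_extensions(misa: int) -> str:
    """Decode the Extensions field (bits 0..25) of a MISA value into letters."""
    return "".join(chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1)


def disabled_since_boot(boot_misa: int, current_misa: int) -> str:
    """Extensions present in the cold-boot shadow but cleared now."""
    return misa_extensions(boot_misa & ~current_misa)


# Hypothetical values: A, C, D, F, I, M at boot; F and D cleared later.
BOOT_SHADOW = 0x112D
CURRENT = 0x1105

print(misa_extensions(BOOT_SHADOW))
print(misa_extensions(CURRENT))
print(disabled_since_boot(BOOT_SHADOW, CURRENT))
```

The shadow only helps if it really was taken before anything wrote MISA, which is the "hope that it never gets overwritten" caveat above.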
WLRL is permitted to, but not required to, raise an exception if the write of an illegal value is attempted.
If it doesn't raise an exception, the result is undefined, and what is read back is undefined.
It is the only defined field type that allows traps.
Sorry, I realize what I wrote is unclear.
You are correct, it will not raise an exception upon a write to CSR.
I meant that the processor is still allowed to trap upon illegal instruction, not upon the CSR write itself.
The better option, I think, would be WARA (Write Any Read Any if I get that right).
It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made.
Ah, I think I understand. Your concern is that if one hardcodes all future extension bits to 0 (as permitted in WARL), then you will not be able to hardware emulate those instructions for future software on the same silicon?
On Tuesday, June 11, 2019, Dan Petrisko <petr...@cs.washington.edu> wrote:
The better option, I think, would be WARA (Write Any Read Any if I get that right).
Allen clarified: it's called WLRL and it's the only one that's allowed to raise exceptions.
This exception is what allows the processor to enter JIT emulation mode until such time as the software (all of it) writes to the ISAMUX/ISANS with a value that the hardware *can* continue further execution in a namespace it *does* have the hardware capability to execute.
This is very different from WARL, where due to the loss of context (the namespace), the processor has absolutely no way of determining the meaning of an opcode.
Effectively, ISAMUX aka ISANS is identical to the c++ "using namespace { ....}" construct.
Pauses to reflect a bit. Yeah. Whether done globally or in scopes, it's an exceptionally clear and precise analogy, in pretty much every way.
It allows for the option that all or just some instructions in an ISA namespace are trapped and emulated in software. Note that this emulation may be for instructions in a namespace that didn't even exist when the processor was made.
Ah, I think I understand. Your concern is that if one hardcodes all future extension bits to 0 (as permitted in WARL), then you will not be able to hardware emulate those instructions for future software on the same silicon?
Partly other way round. Not be able to *software* emulate those instructions (the ones that have the same opcode) for future software running on the same silicon.
Is that clear now, with the c++ analogy? The need in c++ to be able to distinguish between shorter names that clash globally is very well understood, and very clear.
The situation here is no different.
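The point about clashing opcodes can be pictured as a decode table keyed on (namespace, opcode) rather than on the opcode alone. The sketch below is a toy model with made-up namespace numbers and instruction names; only the value 0x43 (binary 1000011, the major opcode FMADD uses in the standard encoding) is taken from the real ISA.

```python
DEFAULT_NS = 0
RDMA_NS = 0x12345000    # hypothetical namespace implemented in this silicon
FUTURE_NS = 0x54321000  # hypothetical namespace defined after tape-out

MADD_OPCODE = 0x43  # major opcode used by FMADD in the default namespace

# What the hardware decoder knows: the same opcode, two different meanings.
HARDWARE = {
    (DEFAULT_NS, MADD_OPCODE): "fmadd",
    (RDMA_NS, MADD_OPCODE): "rdma.op3",  # made-up 3-input RDMA instruction
}

# What a trap handler could emulate, possibly installed years later.
SOFTWARE = {
    (FUTURE_NS, MADD_OPCODE): "future.op3",  # made-up instruction
}


def execute(isans: int, opcode: int) -> str:
    """Resolve an opcode in the currently selected namespace."""
    key = (isans, opcode)
    if key in HARDWARE:
        return "hardware:" + HARDWARE[key]
    if key in SOFTWARE:
        # The namespace value survived the CSR write, so the trap handler
        # still knows which meaning of the opcode was intended.
        return "emulated:" + SOFTWARE[key]
    return "illegal"


print(execute(DEFAULT_NS, MADD_OPCODE))
print(execute(RDMA_NS, MADD_OPCODE))
print(execute(FUTURE_NS, MADD_OPCODE))
```

If the write of FUTURE_NS had been silently replaced by the default value, the third lookup would resolve to "fmadd" instead, which is the loss of context described above.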