This is feedback from IAR on the proposed Zicond extension.
Summary: With Zicond, we see no gain in code size and (if the hardware
* Avoid small extensions (general feedback)
Lately, the number of RISC-V extensions has exploded. This allows
hardware designers to pick and choose from a large smorgasbord of extensions.
Unfortunately, this puts tool vendors in a horrible situation.
Currently, runtime libraries are provided in a number of flavors.
However, each additional extension puts tool vendors in a situation
where they have to choose whether to increase the number of libraries
or provide suboptimal libraries for some devices.
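The arithmetic behind this is simple: if n extensions are independently optional, a vendor who wants an optimal prebuilt library for every device needs one flavor per subset, i.e. 2^n flavors, so each new optional extension doubles the count. A minimal C sketch, assuming fully independent extensions (in practice some combinations are invalid or merged, so real counts are lower); `library_flavors` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: every optional extension is independent, so an
 * optimal library set needs one flavor per subset of the extensions. */
static uint64_t library_flavors(unsigned optional_extensions)
{
    assert(optional_extensions < 64);              /* keep the shift defined */
    return UINT64_C(1) << optional_extensions;     /* 2^n subsets */
}
```

For example, going from 3 to 4 independent optional extensions takes the count from 8 to 16 flavors.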
In fact, this doesn't just apply to tool vendors; it applies to
anybody who provides prebuilt libraries or executable files.
A better solution would be to group related instructions into larger
extensions and to enforce that devices implement all instructions in
the selected extensions.
This is not the first time I have provided this feedback to the RISC-V
community, so hopefully it will not come as a surprise.
* No gain in code size (when Rd = Rs1)
If the compressed extension is present, there is no gain in code size
for the czero instructions, when Rd and Rs1 are the same register.
    czero.nez a0, a0, a1    -- 4 bytes.
versus
    c.beqz    a1, skip      -- 2 bytes.
    c.li      a0, 0         -- 2 bytes (c.mv cannot take x0 as its source).
skip:
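To check that the two forms really are interchangeable, here is a small C model of their register-level semantics: czero.nez on one side, and on the other a branch over a single instruction that clears a0. It assumes RV32 (32-bit registers) and the Zicond definition czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1. The function names are illustrative, not from any library.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t reg_t;   /* RV32 register width, assumed for this sketch */

/* Zicond: czero.nez rd, rs1, rs2  ->  rd = (rs2 != 0) ? 0 : rs1 */
static reg_t czero_nez(reg_t rs1, reg_t rs2)
{
    return (rs2 != 0) ? 0 : rs1;
}

/* Branch-based form with rd == rs1 == a0 and the condition in a1:
 * skip the clear when a1 == 0, otherwise set a0 to zero. */
static reg_t branch_clear(reg_t a0, reg_t a1)
{
    if (a1 != 0)      /* branch not taken: fall into the clear */
        a0 = 0;
    return a0;        /* skip: */
}

/* Both forms must agree for every input pair. */
static void check_equivalent(reg_t a0, reg_t a1)
{
    assert(czero_nez(a0, a1) == branch_clear(a0, a1));
}
```

Both functions return a0 unchanged when a1 is zero and 0 otherwise, so the 4-byte czero.nez buys no size advantage over the 2 + 2 byte branch form in this case.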
Of course, this doesn't apply when Rd and Rs1 are different registers.
However, modern compilers are very good at register allocation, so
this limitation will have very little impact in the real world.
* No gain in speed (when Rd = Rs1)
If this is considered a common pattern, hardware can easily decode the
two instructions as one larger instruction and execute it without breaking the
pipeline. This is known as "instruction fusing".
In fact, if I understand correctly, there is RISC-V hardware that
already does this for all conditional branches over a single instruction.
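A decoder that fuses such pairs only needs to recognize a conditional branch whose target is exactly one instruction ahead. The C sketch below shows that check on a hypothetical decoded-instruction record (`struct insn` and its fields are invented for illustration); a real implementation would also restrict which instructions may be predicated and would work on raw encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded-instruction record; names are illustrative only. */
enum op { OP_BEQZ, OP_BNEZ, OP_LI, OP_ADD, OP_OTHER };

struct insn {
    enum op op;
    int rd;     /* destination register number (unused for branches) */
    int rs1;    /* source or condition register number */
    int imm;    /* branch offset in bytes, or immediate value */
    int size;   /* encoded size in bytes: 2 (compressed) or 4 */
};

static bool is_branch(const struct insn *i)
{
    return i->op == OP_BEQZ || i->op == OP_BNEZ;
}

/* True if `br` is a conditional branch whose target lies just past `next`,
 * the instruction that immediately follows it, so the pair can be treated
 * as one predicated operation. RISC-V branch offsets are relative to the
 * branch's own address, hence offset == size(br) + size(next). */
static bool can_fuse_skip_one(const struct insn *br, const struct insn *next)
{
    assert(br != NULL && next != NULL);
    return is_branch(br)
        && !is_branch(next)
        && br->imm == br->size + next->size;
}
```

For example, a 2-byte c.beqz with offset 4 followed by a 2-byte c.li qualifies; with offset 8 it would skip more than one instruction and does not.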
* The examples
I noticed that the provided examples can be translated to smaller code
when the czero extension isn't used!
(rc == 0) ? (rs1 + rs2) : rs1
The czero specification suggests using the 6-byte sequence:
    czero.nez rd, rs2, rc
    add       rd, rs1, rd    -- compressible to c.add rd, rs1
However, when the result should be placed in the same register as Rs1,
only 4 bytes are needed (here rs1 = a0, rs2 = a1, rc = a2):
    c.bnez a2, skip
    c.add  a0, a1
skip:
When Rd and Rs1 are different registers, 6 bytes are sufficient (here
rd = rs2 = a0, rs1 = a1, rc = a2; rs1 may be overwritten):
    c.bnez a2, skip
    c.add  a1, a0
skip:
    c.mv   a0, a1
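A C model of the three "add" sequences makes it easy to confirm they all compute (rc == 0) ? (rs1 + rs2) : rs1. It assumes RV32 registers and, for the rd != rs1 case, that the skip label sits just before c.mv, which is the only placement that gives the right result. The function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t reg_t;   /* RV32 width assumed; unsigned so sums wrap like hardware */

/* Zicond: czero.nez rd, rs1, rs2 -> rd = (rs2 != 0) ? 0 : rs1 */
static reg_t czero_nez(reg_t v, reg_t c) { return c != 0 ? 0 : v; }

/* What every sequence must compute: (rc == 0) ? (rs1 + rs2) : rs1 */
static reg_t reference(reg_t rs1, reg_t rs2, reg_t rc)
{
    return rc == 0 ? rs1 + rs2 : rs1;
}

/* czero.nez rd, rs2, rc ; add rd, rs1, rd   (6 bytes with c.add) */
static reg_t zicond_add(reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t rd = czero_nez(rs2, rc);   /* rs2, or 0 when rc != 0 */
    return rs1 + rd;
}

/* rd == rs1: a0 = rs1, a1 = rs2, a2 = rc   (4 bytes)
 *     c.bnez a2, skip
 *     c.add  a0, a1
 *   skip:                                    */
static reg_t branch_add_same(reg_t a0, reg_t a1, reg_t a2)
{
    if (a2 == 0)
        a0 += a1;
    return a0;
}

/* rd != rs1: a0 = rs2 and rd, a1 = rs1, a2 = rc   (6 bytes)
 *     c.bnez a2, skip
 *     c.add  a1, a0        # overwrites rs1
 *   skip:
 *     c.mv   a0, a1                          */
static reg_t branch_add_diff(reg_t a0, reg_t a1, reg_t a2)
{
    if (a2 == 0)
        a1 += a0;
    a0 = a1;
    return a0;
}

static void check_add(reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t want = reference(rs1, rs2, rc);
    assert(zicond_add(rs1, rs2, rc) == want);
    assert(branch_add_same(rs1, rs2, rc) == want);
    assert(branch_add_diff(rs2, rs1, rc) == want);   /* a0 = rs2, a1 = rs1 */
}
```

Note the trade-off the model exposes: the 6-byte branch form for rd != rs1 clobbers rs1, so it only applies when rs1 is dead afterwards.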
For the bitwise "and" example, "(rc == 0) ? (rs1 & rs2) : rs1", the
situation is even worse. The specification suggests using an 8- or 10-byte
sequence, whereas code that doesn't use the czero extension needs 4
or 6 bytes.
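The "and" case can be modeled the same way. The sequences below are assumptions, reconstructed to match the byte counts quoted above rather than copied from the specification: a czero version that stays correct even when rd == rs1 (czero.eqz runs first, before rs1 can be overwritten), and a branch version using c.mv, c.bnez and c.and. The compressed forms assume all registers are in x8-x15, as c.and and c.or require.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t reg_t;   /* RV32 width assumed */

/* Zicond: czero.eqz rd, rs1, rs2 -> rd = (rs2 == 0) ? 0 : rs1 */
static reg_t czero_eqz(reg_t v, reg_t c) { return c == 0 ? 0 : v; }

/* What every sequence must compute: (rc == 0) ? (rs1 & rs2) : rs1 */
static reg_t reference_and(reg_t rs1, reg_t rs2, reg_t rc)
{
    return rc == 0 ? (rs1 & rs2) : rs1;
}

/* Assumed czero sequence: 8 bytes when rd == rs1 (c.and), else 10.
 *     czero.eqz rtmp, rs1, rc    # 4 bytes
 *     and       rd, rs1, rs2     # c.and rd, rs2 when rd == rs1
 *     or        rd, rd, rtmp     # c.or, 2 bytes                */
static reg_t zicond_and(reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t rtmp = czero_eqz(rs1, rc);   /* 0 when rc == 0, else rs1 */
    reg_t rd = rs1 & rs2;
    return rd | rtmp;                  /* rc != 0: (rs1 & rs2) | rs1 == rs1 */
}

/* Assumed branch sequence: 6 bytes, or 4 when rd == rs1 (no c.mv).
 *     c.mv   rd, rs1
 *     c.bnez rc, skip
 *     c.and  rd, rs2
 *   skip:                                                   */
static reg_t branch_and(reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t rd = rs1;
    if (rc == 0)
        rd &= rs2;
    return rd;
}

static void check_and(reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t want = reference_and(rs1, rs2, rc);
    assert(zicond_and(rs1, rs2, rc) == want);
    assert(branch_and(rs1, rs2, rc) == want);
}
```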
Hence, the "motivating examples" provided by the czero specification
fail to provide proof that this extension provides any real benefits.
* An alternative: Conditional move
If branch-free conditional instructions are of interest to RISC-V,
maybe we should consider a conditional move instead, for example:
mv.eqz rd, rs1, rc
This would have the following semantic:
if rc == 0 then rd = rs1
This could use the same encoding space as the proposed czero instructions.
When using x0 as rs1, this would behave almost like czero, with the
exception that it doesn't contain an implicit move.
mv.eqz rd, x0, rc // Conditionally clear RD.
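The proposed semantics can be pinned down with a short C model. mv.eqz and mv.nez are hypothetical instructions from this proposal, not part of any ratified extension, and the model assumes RV32 registers. It also checks the claim that mv.eqz rd, x0, rc behaves like czero.eqz with rs1 = rd.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t reg_t;   /* RV32 width assumed */

/* Hypothetical mv.eqz rd, rs1, rc: if rc == 0 then rd = rs1.
 * rd is an input too: when the condition fails it keeps its old value. */
static reg_t mv_eqz(reg_t rd_old, reg_t rs1, reg_t rc)
{
    return rc == 0 ? rs1 : rd_old;
}

/* Hypothetical mv.nez rd, rs1, rc: if rc != 0 then rd = rs1. */
static reg_t mv_nez(reg_t rd_old, reg_t rs1, reg_t rc)
{
    return rc != 0 ? rs1 : rd_old;
}

/* Zicond czero.eqz rd, rs1, rc: rd = (rc == 0) ? 0 : rs1 */
static reg_t czero_eqz(reg_t rs1, reg_t rc) { return rc == 0 ? 0 : rs1; }

/* mv.eqz rd, x0, rc behaves like czero.eqz rd, rd, rc. */
static void check_conditional_clear(reg_t rd, reg_t rc)
{
    assert(mv_eqz(rd, 0, rc) == czero_eqz(rd, rc));
}
```

One design consequence is visible in the signatures: because rd keeps its old value when the condition is false, each mv.* instruction reads three registers (old rd, rs1 and rc), whereas czero reads only two.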
For example, conditional select (from the czero specification):
rd = (rc == 0) ? rs1 : rs2
With czero (10 bytes, needs a temporary register):
    czero.nez rd, rs1, rc
    czero.eqz rtmp, rs2, rc
    or        rd, rd, rtmp    -- compressible to c.or rd, rtmp
With conditional moves (8 bytes, no temporary register needed):
    mv.eqz rd, rs1, rc
    mv.nez rd, rs2, rc
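Both select sequences can be checked against rd = (rc == 0) ? rs1 : rs2 with a C model. It assumes RV32 registers and that rd is a different register from rc: in either sequence, writing rd first would otherwise change the condition seen by the second instruction. The mv_* helpers model the hypothetical conditional moves proposed above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t reg_t;   /* RV32 width assumed */

/* Zicond primitives */
static reg_t czero_eqz(reg_t v, reg_t c) { return c == 0 ? 0 : v; }
static reg_t czero_nez(reg_t v, reg_t c) { return c != 0 ? 0 : v; }

/* Hypothetical conditional moves from the proposal above */
static reg_t mv_eqz(reg_t rd, reg_t v, reg_t c) { return c == 0 ? v : rd; }
static reg_t mv_nez(reg_t rd, reg_t v, reg_t c) { return c != 0 ? v : rd; }

/* czero.nez rd, rs1, rc ; czero.eqz rtmp, rs2, rc ; or rd, rd, rtmp */
static reg_t select_czero(reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t rd = czero_nez(rs1, rc);     /* rs1 if rc == 0, else 0 */
    reg_t rtmp = czero_eqz(rs2, rc);   /* rs2 if rc != 0, else 0 */
    return rd | rtmp;                  /* one side is always zero */
}

/* mv.eqz rd, rs1, rc ; mv.nez rd, rs2, rc  (no temporary needed) */
static reg_t select_mv(reg_t rd_old, reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t rd = mv_eqz(rd_old, rs1, rc);
    return mv_nez(rd, rs2, rc);
}

/* Both must equal (rc == 0) ? rs1 : rs2, whatever rd held before. */
static void check_select(reg_t rd_old, reg_t rs1, reg_t rs2, reg_t rc)
{
    reg_t want = rc == 0 ? rs1 : rs2;
    assert(select_czero(rs1, rs2, rc) == want);
    assert(select_mv(rd_old, rs1, rs2, rc) == want);
}
```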
Anders Lindgren, Lead Engineer for the IAR compiler for RISC-V
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/adbad487-e265-0844-4499-02e953f41dab%40iar.com.
If the libraries from IAR are closed source (even to paying customers), then perhaps you could consider a service where customers tell you exactly which extensions their CPU core has, and IAR builds a custom set of libraries (and maybe a compiler) for them?
I expect such a request could be fulfilled within hours, or even minutes with an automated system.
You could of course cache the generated libraries in case another customer asked for the same combination in future.
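Such a cache only needs a canonical key for "this exact set of extensions". A minimal C sketch, assuming a bitmask key with invented bit assignments (not an official RISC-V encoding) and a stub in place of the actual library build:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; not an official RISC-V encoding. */
enum ext_bit {
    EXT_M      = 1 << 0,
    EXT_A      = 1 << 1,
    EXT_F      = 1 << 2,
    EXT_C      = 1 << 3,
    EXT_ZBA    = 1 << 4,
    EXT_ZBB    = 1 << 5,
    EXT_ZICOND = 1 << 6,
};

#define CACHE_SLOTS 16

/* Prebuilt libraries keyed by the exact set of extensions they target. */
struct lib_cache {
    uint32_t key[CACHE_SLOTS];    /* extension bitmask of each entry */
    int      used[CACHE_SLOTS];   /* nonzero if the slot holds a build */
    int      builds;              /* how many (simulated) builds ran */
};

/* Return the slot holding the library for `exts`, "building" it on a miss.
 * A bitmask is order-independent, so "M+C+Zicond" and "Zicond+C+M"
 * map to the same entry. Sketch only: no eviction when the cache fills. */
static int get_library(struct lib_cache *c, uint32_t exts)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->used[i] && c->key[i] == exts)
            return i;                           /* hit: reuse earlier build */

    for (int i = 0; i < CACHE_SLOTS; i++)
        if (!c->used[i]) {
            c->used[i] = 1;
            c->key[i] = exts;
            c->builds++;                        /* stand-in for a real build */
            return i;
        }

    assert(0 && "cache full");
    return -1;
}
```

A second request for the same extension set, listed in any order, returns the cached entry without another build.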
IAR provides everything needed for customers to rebuild the libraries, including source files and IAR Embedded Workbench projects. You can enable any extension, change optimization settings, etc., so for IAR this isn't a big problem.
However, hopefully there will be many other organizations providing prebuilt RISC-V libraries and applications in the future, and divergence will make this more difficult for them.
Why is a 3-input extension a non-starter? Many other common architectures support 3-input instructions. This is an optional extension. If you don’t want to support it because it has 3 sources, you don’t have to. If you support F, you already have the HW support for it. What is driving the decision to say it is a non-starter? Opcode space?
I come from a Zfinx world, so the HW to support 3-source ops is the same for integer and float. My only point is that 3-source should not be a “non-starter”. In cases where it makes sense, we should absolutely do it. In cases where it doesn’t make sense, we absolutely should not do it.
Also, the PowerPC base ISA had 3-source-operand instructions, and there were many OoO implementations of it. So it may be more of a headache, but it can be done.
Tommy, I don’t disagree, but remember, these are optional extensions. They may not make sense in HPC, but they may make sense in the embedded MCU world. Let’s not neglect the needs of the MCU space. I am not talking specifically about Zicond here; this is a general statement about 3-source instructions being a “non-starter”.
My statements are not about “this extension”.
I have to agree with Tommy here. 3-operand integer instructions are very expensive for superscalar cores. While the register file and execution datapaths can be sized to support the "80th-percentile" use case, the decode/dependency-check/rename stages at the front of the machine have to be sized for the worst case (all integer instructions needing 3 operands).
Yes, ARM and friends show it is possible, but it is a fundamental critical path that limits machine width. Considering average register operand needs are <1.5 per instruction, throwing down 3 ports for every instruction and having that be the limiting width factor feels a bit off.
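A back-of-the-envelope version of this argument, as a C sketch. It assumes the front end provisions one rename lookup per possible source operand and compares each source against every older destination in the same group; the 1.5 figure comes from the message above, and the helper names are invented for illustration.

```c
#include <assert.h>

/* Rename map-table lookups per cycle if every instruction in a group of
 * `width` may need `max_srcs` source registers (worst-case sizing). */
static unsigned rename_lookups(unsigned width, unsigned max_srcs)
{
    return width * max_srcs;
}

/* Intra-group dependency checks: each source of instruction i is compared
 * with the destinations of the i earlier instructions in the same group,
 * giving max_srcs * (0 + 1 + ... + (width - 1)) comparators. */
static unsigned dep_check_comparators(unsigned width, unsigned max_srcs)
{
    return max_srcs * width * (width - 1) / 2;
}

/* Average share of the provisioned source slots actually used, in percent. */
static double source_slot_utilization(double avg_srcs, unsigned max_srcs)
{
    assert(max_srcs > 0);
    return 100.0 * avg_srcs / max_srcs;
}
```

For a 4-wide machine, moving from 2 to 3 sources raises lookups from 8 to 12 and comparators from 12 to 18, while an average of 1.5 sources per instruction uses only half of the 3-source provisioning (versus 75% of the 2-source one).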
It's nice if 3-operand instructions cut down your program's "critical path length", but not if you added effective path length elsewhere in your design via cracking or sequencing.
Also, once you have 3-operand int instructions, you probably need to double down and add a lot more dense/CISC operations to keep the arithmetic intensity up, to make up for the width limit that 3-operand support just caused (and those are then likely cracked into multiple uops after making it past the rename pinch point).