Zicond does not meaningfully improve the RISC-V ISA; at best, it is reducing a sequence of N instructions to a sequence of N-1 instructions. Conditional operations are typically used to eliminate branches in hard-to-predict code. As we show below, Zicond does not significantly advance that goal. Unless there is data showing how it would improve performance, we do not support Zicond as a ratified extension.
Consider a function that picks the minimum of two numbers, i.e., `(rs1 < rs2) ? rs1 : rs2`. Without branches or Zicond, the function can be implemented in 5 instructions:
slt rt1, rs1, rs2 # rt1 = (rs1 < rs2) ? 1 : 0
mul rt2, rs1, rt1 # rt2 = (rs1 < rs2) ? rs1 : 0
neg rt3, rt1 # rt3 = (rs1 < rs2) ? 0 : ~0
or rt4, rs2, rt3 # rt4 = (rs1 < rs2) ? 0 : rs2
or rd, rt4, rt5 # rd = (rs1 < rs2) ? rs1 : rs2
Using Zicond, that improves to 4 instructions:
slt rt1, rs1, rs2 # rt1 = (rs1 < rs2) ? 1 : 0
czero.eqz rt2, rs1, rt1 # rt2 = (rs1 < rs2) ? rs1 : 0
czero.nez rt3, rs2, rt1 # rt3 = (rs1 < rs2) ? 0 : rs2
or rd, rt2, rt3 # rd = (rs1 < rs2) ? rs1 : rs2
Without heroic macro-op fusion, the only real benefit is one fewer instruction.
Generally, the above sequence can be applied to any ternary operation `(C) ? A : B`. If the condition C or results A or B take multiple instructions to evaluate, then the benefit of Zicond diminishes further.
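To make the general `(C) ? A : B` pattern concrete, here is a minimal Python sketch that models the instructions as functions on 64-bit values. It assumes XLEN=64 and the operand order of the ratified Zicond specification (value in rs1, condition in rs2); the helper names `select` and `min_zicond` are illustrative only and not part of any toolchain.

```python
MASK = (1 << 64) - 1  # assumption: XLEN=64, registers modeled as unsigned ints


def to_signed(x):
    """Interpret a 64-bit register value as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def slt(rs1, rs2):
    """slt rd, rs1, rs2: rd = 1 if rs1 < rs2 (signed), else 0."""
    return 1 if to_signed(rs1) < to_signed(rs2) else 0


def czero_eqz(rs1, rs2):
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0, else rs1."""
    return 0 if (rs2 & MASK) == 0 else rs1 & MASK


def czero_nez(rs1, rs2):
    """czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0, else rs1."""
    return 0 if (rs2 & MASK) != 0 else rs1 & MASK


def select(cond, a, b):
    """(cond) ? a : b built from czero.eqz + czero.nez + or."""
    t1 = czero_eqz(a, cond)  # a when cond is nonzero, else 0
    t2 = czero_nez(b, cond)  # b when cond is zero, else 0
    return t1 | t2


def min_zicond(a, b):
    """Signed min: one slt plus the three-instruction select."""
    return select(slt(a, b), a, b)
```

The select itself always costs three instructions in this model, so whatever work is needed to compute C, A, or B is added on top of that fixed cost.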
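Looked at another way, consider that `czero.eqz rd, rs1, rs2` (which writes 0 to rd if rs2 is zero and rs1 otherwise) can itself be implemented with two existing instructions, provided rd is a different register from rs1:
snez rd, rs2
mul rd, rs1, rd
Ditto for `czero.nez`, using `seqz` in place of `snez`.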
So at best Zicond is saving a single instruction per choice.
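The two-instruction emulation can be checked against the instruction definitions with a small Python model. This is only a sketch: it assumes XLEN=64 and the ratified Zicond operand order (condition in rs2), and the `*_emulated` function names are made up for illustration.

```python
MASK = (1 << 64) - 1  # assumption: XLEN=64


def czero_eqz(rs1, rs2):
    """Reference semantics: rd = 0 if rs2 == 0, else rs1."""
    return 0 if (rs2 & MASK) == 0 else rs1 & MASK


def czero_nez(rs1, rs2):
    """Reference semantics: rd = 0 if rs2 != 0, else rs1."""
    return 0 if (rs2 & MASK) != 0 else rs1 & MASK


def snez(rs):
    """snez rd, rs: rd = 1 if rs != 0, else 0."""
    return 1 if (rs & MASK) != 0 else 0


def seqz(rs):
    """seqz rd, rs: rd = 1 if rs == 0, else 0."""
    return 1 if (rs & MASK) == 0 else 0


def mul(rs1, rs2):
    """mul rd, rs1, rs2: low 64 bits of the product."""
    return (rs1 * rs2) & MASK


def czero_eqz_emulated(rs1, rs2):
    """snez + mul: multiply the value by a 0/1 flag derived from rs2."""
    return mul(rs1, snez(rs2))


def czero_nez_emulated(rs1, rs2):
    """seqz + mul: same idea with the flag inverted."""
    return mul(rs1, seqz(rs2))
```

Multiplying by a 0/1 flag either keeps the value or zeroes it, which is why each conditional-zero instruction replaces exactly two base instructions in this model.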
To have a meaningful conditional operation extension, RISC-V will have to add more real instructions, potentially with three source operands or more built-in comparison operations. For example, a true conditional select instruction would reduce the min example above to two instructions.
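As a rough illustration of the two-instruction claim, the Python sketch below models a three-source conditional select. The `csel` name and its operand order are purely hypothetical and are not taken from any RISC-V extension; XLEN=64 is also an assumption.

```python
MASK = (1 << 64) - 1  # assumption: XLEN=64


def to_signed(x):
    """Interpret a 64-bit register value as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def slt(rs1, rs2):
    """slt rd, rs1, rs2: rd = 1 if rs1 < rs2 (signed), else 0."""
    return 1 if to_signed(rs1) < to_signed(rs2) else 0


def csel(cond, rs1, rs2):
    """Hypothetical 'csel rd, cond, rs1, rs2': rd = rs1 if cond != 0 else rs2."""
    return rs1 & MASK if (cond & MASK) != 0 else rs2 & MASK


def min_csel(a, b):
    """Signed min in two modeled instructions: slt, then csel."""
    return csel(slt(a, b), a, b)
```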
Anticipating constructive criticism:
Sorry about the confusion. You are correct that the example is incorrect. Your snippet correctly illustrates the point.
From: Bruce Hoult <br...@hoult.org>
Sent: Thursday, May 18, 2023 11:10 AM
To: Derek Hower <dho...@qti.qualcomm.com>
Cc: RISC-V ISA Dev <isa...@groups.riscv.org>
Subject: Re: [isa-dev] Qualcomm feedback on Zicond
On Fri, May 19, 2023 at 1:56 AM Derek Hower <dho...@qti.qualcomm.com> wrote:
Zicond does not meaningfully improve the RISC-V ISA; at best, it is reducing a sequence of N instructions to a sequence of N-1 instructions. Conditional operations are typically used to eliminate branches in hard-to-predict code. As we show below, Zicond does not significantly advance that goal. Unless there is data showing how it would improve performance, we do not support Zicond as a ratified extension.
Consider a function that picks the minimum of two numbers, i.e., `(rs1 < rs2) ? rs1 : rs2`. Without branches or Zicond, the function can be implemented in 5 instructions:
slt rt1, rs1, rs2 # rt1 = (rs1 < rs2) ? 1 : 0
mul rt2, rs1, rt1 # rt2 = (rs1 < rs2) ? rs1 : 0
neg rt3, rt1 # rt3 = (rs1 < rs2) ? 0 : ~0
or rt4, rs2, rt3 # rt4 = (rs1 < rs2) ? 0 : rs2
or rd, rt4, rt5 # rd = (rs1 < rs2) ? rs1 : rs2

Looks very wrong to me. First assuming you mean rt2 not rt5 in the last instruction...
or rt4, rs2, rt3 # rt4 = (rs1 < rs2) ? ~0 : rs2
or rd, rt4, rt2 # rd = (rs1 < rs2) ? ~0 : rs2
Maybe you mean an AND instead of the first OR, but it's still wrong. That just makes the final result (rs1 < rs2) ? 0 : rs2
Using ANDN (Zbb) instead of the first OR makes it correct.
But you don't need the "MUL" at all. Just use AND for that one. Here is working code, tested on a VisionFive 2:
.globl min
min:
slt t1,a0,a1 # 1 : 0
neg t2,t1 # ~0 : 0
and t3,a0,t2 # a0 : 0
andn t4,a1,t2 # 0 : a1
or a0,t4,t3 # a0 : a1
ret
You can also use "addi t2,t1,-1" and reverse the "and" and "andn" as this allows a C instruction for the "addi", which "neg" doesn't until Zcb exists.
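Both the `neg` sequence and the `addi` variant can be modeled in a few lines of Python to see why swapping `and` and `andn` is needed when the mask polarity flips. This is only a sketch assuming XLEN=64; the function names `min_neg` and `min_addi` are illustrative, and `andn` follows the Zbb definition (rs1 AND NOT rs2).

```python
MASK = (1 << 64) - 1  # assumption: XLEN=64


def to_signed(x):
    """Interpret a 64-bit register value as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def slt(rs1, rs2):
    """slt rd, rs1, rs2: rd = 1 if rs1 < rs2 (signed), else 0."""
    return 1 if to_signed(rs1) < to_signed(rs2) else 0


def andn(rs1, rs2):
    """Zbb andn rd, rs1, rs2: rd = rs1 & ~rs2."""
    return rs1 & ~rs2 & MASK


def min_neg(a0, a1):
    """The five-instruction sequence using neg."""
    t1 = slt(a0, a1)      # 1 : 0
    t2 = (-t1) & MASK     # neg: ~0 : 0
    t3 = a0 & t2          # a0 : 0
    t4 = andn(a1, t2)     # 0 : a1
    return t4 | t3        # a0 : a1


def min_addi(a0, a1):
    """Variant using addi t2,t1,-1, which inverts the mask polarity."""
    t1 = slt(a0, a1)      # 1 : 0
    t2 = (t1 - 1) & MASK  # addi: 0 : ~0
    t3 = andn(a0, t2)     # a0 : 0 (andn and and trade places)
    t4 = a1 & t2          # 0 : a1
    return t4 | t3        # a0 : a1
```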
The better way to reduce this sequence by one instruction might be to define new versions of SLT and SLTU that produce 0 and ~0 directly. Maybe call them MLT and MLTU, for Mask Less Than. Maybe want MEQ and MNE too.
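A quick Python model shows how such a mask-producing compare would shorten the sequence to four instructions. MLT is only the name proposed here, not an existing instruction, so `mlt` below is a hypothetical model; XLEN=64 is likewise assumed.

```python
MASK = (1 << 64) - 1  # assumption: XLEN=64


def to_signed(x):
    """Interpret a 64-bit register value as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def mlt(rs1, rs2):
    """Hypothetical 'mlt rd, rs1, rs2': rd = ~0 if rs1 < rs2 (signed), else 0."""
    return MASK if to_signed(rs1) < to_signed(rs2) else 0


def andn(rs1, rs2):
    """Zbb andn rd, rs1, rs2: rd = rs1 & ~rs2."""
    return rs1 & ~rs2 & MASK


def min_mlt(a0, a1):
    """Signed min in four modeled instructions: mlt, and, andn, or."""
    m = mlt(a0, a1)       # ~0 : 0, no separate neg needed
    t3 = a0 & m           # a0 : 0
    t4 = andn(a1, m)      # 0 : a1
    return t4 | t3        # a0 : a1
```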