Qualcomm feedback on Zicond

120 views
Skip to first unread message

Derek Hower

unread,
May 18, 2023, 9:56:51 AM5/18/23
to RISC-V ISA Dev

Zicond does not meaningfully improve the RISC-V ISA; at best, it is reducing a sequence of N instructions to a sequence of N-1 instructions. Conditional operations are typically used to eliminate branches in hard-to-predict code. As we show below, Zicond does not significantly advance that goal. Unless there is data showing how it would improve performance, we do not support Zicond as a ratified extension.

Consider a function that picks the minimum of two numbers, i.e., `(rs1 < rs2) ? rs1 : rs2`. Without branches or Zicond, the function can be implemented in 5 instructions:


  slt rt1, rs1, rs2   # rt1 = (rs1 < rs2) ? 1 : 0
  mul rt2, rs1, rt1   # rt2 = (rs1 < rs2) ? rs1 : 0
  neg rt3, rt1        # rt3 = (rs1 < rs2) ? 0 : ~0
  or  rt4, rs2, rt3   # rt4 = (rs1 < rs2) ? 0 : rs2
  or  rd, rt4, rt5    # rd  = (rs1 < rs2) ? rs1 : rs2

 
using Zicond, that improves to 4 instructions:

  slt       rt1, rs1, rs2    # rt1 = (rs1 < rs2) ? 1 : 0
  czero.nez rt2, rt1, rs1    # rt2 = (rs1 < rs2) ? rs1 : 0
  czero.eqz rt3, rt1, rs2    # rt3 = (rs1 < rs2) ? 0 : rs2
  or        rd, rt2, rt3     # rd  = (rs1 < rs2) ? rs1 : rs1

 
Without heroic macro-op fusion, the only real benefit is one less instruction.

Generally, the above sequence can be applied to any ternary operation `(C) ? A : B`. If the condition C or results A or B take multiple instructions to evaluate, then the benefit of Zicond diminishes further.

Looked at another way, consider that `czero.eqz rd, rs1, rs2` can itself be implemented with two existing instructions:

  seqz rd, rs1
  mul  rd, rs2, rd

Ditto for `czero.nez`.

So at best Zicond is saving a single instruction per choice.

To have a meaningful conditional operation extension, RISC-V will have to add more real instructions, potentially with three source operands or more built-in comparison operations. For example, a true conditional select instruction would reduce the min example above to two instructions.


Anticipating constructive criticism:

  • Multiply is not required. The combination of neg/and has the same effect, with one additional instruction. Regardless, a machine that has expensive branch mispredictions (when conditional operations are helpful) is very likely to also have a reasonable multiplier. Also consider that the input to the multiply is always 0 or 1 and could be optimized if desired.
  • The examples above are written in an SSA style so that they are easier to follow. They do not actually require 3/4 different temporary register numbers.

Bruce Hoult

unread,
May 18, 2023, 11:10:02 AM5/18/23
to Derek Hower, RISC-V ISA Dev
On Fri, May 19, 2023 at 1:56 AM Derek Hower <dho...@qti.qualcomm.com> wrote:

Zicond does not meaningfully improve the RISC-V ISA; at best, it is reducing a sequence of N instructions to a sequence of N-1 instructions. Conditional operations are typically used to eliminate branches in hard-to-predict code. As we show below, Zicond does not significantly advance that goal. Unless there is data showing how it would improve performance, we do not support Zicond as a ratified extension.

Consider a function that picks the minimum of two numbers, i.e., `(rs1 < rs2) ? rs1 : rs2`. Without branches or Zicond, the function can be implemented in 5 instructions:


  slt rt1, rs1, rs2   # rt1 = (rs1 < rs2) ? 1 : 0
  mul rt2, rs1, rt1   # rt2 = (rs1 < rs2) ? rs1 : 0
  neg rt3, rt1        # rt3 = (rs1 < rs2) ? 0 : ~0
  or  rt4, rs2, rt3   # rt4 = (rs1 < rs2) ? 0 : rs2
  or  rd, rt4, rt5    # rd  = (rs1 < rs2) ? rs1 : rs2


Looks very wrong to me. First assuming you mean rt2 not rt5 in the last instruction...

  or  rt4, rs2, rt3   # rt4 = (rs1 < rs2) ? ~0 : rs2
  or  rd, rt4, rt2    # rd  = (rs1 < rs2) ? ~0 : rs2

Maybe you mean an AND instead of the first OR, but it's still wrong. That just makes the final result (rs1 < rs2) ? 0 : rs2

Using ANDN (Zbb) instead of the first OR makes it correct.

But you don't need the "MUL" at all. Just use AND for that one. Here is working code, tested on a VisionFive 2:

.globl min

min:

slt  t1,a0,a1 # 1  : 0

neg  t2,t1    # ~0 : 0

and  t3,a0,t2 # a0 : 0

andn t4,a1,t2 # 0  : a1

or   a0,t4,t3 # a0 : a1

ret


You can also use "addi t2,t1,-1" and reverse the "and" and "andn" as this allows a C instruction for the "addi", which "neg" doesn't until Zcb exists.

The better way to reduce this sequence by one instruction might be to define new versions of SLT and SLTU that produce 0 and ~0 directly. Maybe call them MLT and MLTU, for Mask Less Than. Maybe want MEQ and MNE too.


Derek Hower

unread,
May 18, 2023, 11:37:23 AM5/18/23
to Bruce Hoult, RISC-V ISA Dev

Sorry about the confusion. You are correct that the example is incorrect. Your snippet correctly illustrates the point.

 

  • Derek

 

From: Bruce Hoult <br...@hoult.org>
Sent: Thursday, May 18, 2023 11:10 AM
To: Derek Hower <dho...@qti.qualcomm.com>
Cc: RISC-V ISA Dev <isa...@groups.riscv.org>
Subject: Re: [isa-dev] Qualcomm feedback on Zicond

 

WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.

Philipp Tomsich

unread,
May 18, 2023, 12:04:49 PM5/18/23
to Bruce Hoult, Derek Hower, RISC-V ISA Dev
On Thu, 18 May 2023 at 17:10, Bruce Hoult <br...@hoult.org> wrote:
On Fri, May 19, 2023 at 1:56 AM Derek Hower <dho...@qti.qualcomm.com> wrote:

Zicond does not meaningfully improve the RISC-V ISA; at best, it is reducing a sequence of N instructions to a sequence of N-1 instructions. Conditional operations are typically used to eliminate branches in hard-to-predict code. As we show below, Zicond does not significantly advance that goal. Unless there is data showing how it would improve performance, we do not support Zicond as a ratified extension.

Consider a function that picks the minimum of two numbers, i.e., `(rs1 < rs2) ? rs1 : rs2`. Without branches or Zicond, the function can be implemented in 5 instructions:


  slt rt1, rs1, rs2   # rt1 = (rs1 < rs2) ? 1 : 0
  mul rt2, rs1, rt1   # rt2 = (rs1 < rs2) ? rs1 : 0
  neg rt3, rt1        # rt3 = (rs1 < rs2) ? 0 : ~0
  or  rt4, rs2, rt3   # rt4 = (rs1 < rs2) ? 0 : rs2
  or  rd, rt4, rt5    # rd  = (rs1 < rs2) ? rs1 : rs2


Looks very wrong to me. First assuming you mean rt2 not rt5 in the last instruction...

  or  rt4, rs2, rt3   # rt4 = (rs1 < rs2) ? ~0 : rs2
  or  rd, rt4, rt2    # rd  = (rs1 < rs2) ? ~0 : rs2

Maybe you mean an AND instead of the first OR, but it's still wrong. That just makes the final result (rs1 < rs2) ? 0 : rs2

Using ANDN (Zbb) instead of the first OR makes it correct.

But you don't need the "MUL" at all. Just use AND for that one. Here is working code, tested on a VisionFive 2:

.globl min

min:

slt  t1,a0,a1 # 1  : 0

neg  t2,t1    # ~0 : 0

and  t3,a0,t2 # a0 : 0

andn t4,a1,t2 # 0  : a1

or   a0,t4,t3 # a0 : a1

ret


You can also use "addi t2,t1,-1" and reverse the "and" and "andn" as this allows a C instruction for the "addi", which "neg" doesn't until Zcb exists.

I know this is meant as a demonstrative example only, but it's still worth to point out that if andn (Zbb) is available, so are min/max (also in Zbb).

That said, consider czero a contraction of seqz/snez + neg + and (or seqz + neg + and/andn).   Also consider that it is easy to fuse with an or (and any of the basic arithmetic instructions).  It serves two simple purposes: to (sometimes) eliminate one architectural temporary and create a sequence that allows fusion into conditional-logical and conditional-arithmetic.

While Zicond helps with the conditional select, Derek's observation (that it can not replace an instruction w/ 3 input operands — but it lessens the pain somewhat) for that case is spot-on.  Zicond is a lightweight extension (enabling an incremental improvement, especially in the presence of macro-fusion) that uses a small number of code-points (<64K code points, wedged into an already fragmented corner of the opcode space) and can be implemented as a very regular r-type instruction.   Zicond sidesteps the larger discussion (on adding operations with 3 input operands).
 
The better way to reduce this sequence by one instruction might be to define new versions of SLT and SLTU that produce 0 and ~0 directly. Maybe call them MLT and MLTU, for Mask Less Than. Maybe want MEQ and MNE too.

Note that we already have a canonical way of expressing a MGT/MGTU in RISC-V (as used in your example): chaining a NEG on the output operand of the SLT/SLTU.
The intermediate register will be overwritten, so it should make for an easy fusion case.

--Philipp. 

Rogier Brussee

unread,
May 18, 2023, 1:01:31 PM5/18/23
to RISC-V ISA Dev, Philipp Tomsich, Derek Hower, RISC-V ISA Dev, Bruce Hoult
Since the "it does not make anything shorter" thing  keeps coming up and nobody mentions this :
there is one conditional instruction that conveniently fits in a regular R type instruction

ccond.eqz  rd rs1 rs2 # rd = rs2?:rs1 ==  (rs2 != 0)?rs2 : rs1  ==  (rs2 == 0)?rs1 : rs2

we can also do 
ccond.gez,, ccond.gtz,   ccond.lez, ccond ltz
(ccond.nez is czero.nez)

this could even be compressed instruction although if that would be deemed a good idea rd = (rs1)?:rs2   which for rs1 == rd  gives rsd =(rsd)?rs2 might be nicer to have.  

Rogier

Ps. ?: is also called the Elvis operator because it looks like Elvis when viewed sideway, but I would totally not suggest ever  to call the instruction Elvis. Totally not Ever. 
Op donderdag 18 mei 2023 om 18:04:49 UTC+2 schreef Philipp Tomsich:

Bruce Hoult

unread,
May 18, 2023, 1:49:15 PM5/18/23
to Rogier Brussee, RISC-V ISA Dev, Philipp Tomsich, Derek Hower
In C terms, the Elvis operator is x ? x : y except that x is only evaluated once. Both gcc (since 2001!) and clang allow you to write x ? : y (with or without space between the ? and : )

If x and y evaluate to boolean values (i.e. restricted to 0 or 1) then it is also the same as x || y and in fact many languages with dynamic types use || in this way, or in the case of Perl "or".

As a machine code instruction, elvis would have the same property as czero.nez and czero.eqz of not having to wait for the 2nd operand to be computed / loaded if the first operand is non-zero.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/d8053fd5-35e1-451a-bd56-d3aa7808e336n%40groups.riscv.org.
Reply all
Reply to author
Forward
0 new messages