What is a bool/_Bool


Rogier Brussee

Jul 25, 2017, 9:40:06 AM
to RISC-V ISA Dev
The ABI spec, https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md, mentions the representation of the fundamental C/C++ data types, e.g. for RV64:

Type          Size (Bytes)
char          1
short         2
int           4
long          8
long long     8
void *        8
float         4
double        8
long double   16


However, it does not mention the representation of _Bool/bool. I would assume a _Bool/bool is 1 byte, that its values are restricted to 0 and 1, and in particular that _Bool/bool values are passed in registers.


Rogier

P.s. 

Alex Bradbury

Jul 25, 2017, 10:00:25 AM
to Rogier Brussee, RISC-V ISA Dev
Hi Rogier, I noticed the same and at least have an issue tracking this
(and _Complex) here
https://github.com/riscv/riscv-elf-psabi-doc/issues/28. I think your
assumptions are correct, though of course a bool would be passed on
the stack if the argument registers were exhausted.

Best,

Alex

Rogier Brussee

Jul 25, 2017, 2:16:16 PM
to RISC-V ISA Dev, rogier....@gmail.com
Dear Alex,

Very well!

The gcc and llvm ports must already have made a choice, so it should just be a matter of writing it in the spec.

Thanx!

Rogier

 


On Tuesday, 25 July 2017 at 16:00:25 UTC+2, asb wrote:

Michael Clark

Jul 25, 2017, 4:45:06 PM
to Rogier Brussee, RISC-V ISA Dev
Hi Rogier,

What is interesting is that, in addition to the storage being equal to char and the value being 0 or 1, it appears that in the x86-64 ABI bits[63:8] of the register are undefined, as the _Bool test is just for a non-zero byte, ignoring the upper bits in the register. On x86, SETcc only sets the low 8 bits, so the rest of the register may have contents from another operation:


I’m not sure what RISC-V defines as the register contents for the upper bits of a char, as RISC-V can only reference a register as 32 bits or 64 bits. Because of this, I believe canonical false needs to be zero-extended to 32 bits or 64 bits on RV32 and RV64 respectively. I think the non-zero test has to cover the full register width.

Note: while true is defined to be 1 (rewriting other values to 1), the code is guarded and tends to test either zero or not zero, so a non-canonical true (not 0 and not 1) could be true.

Michael.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/40cb2678-1f52-4b34-a97d-8eff74846c47%40groups.riscv.org.

Andrew Waterman

Jul 25, 2017, 4:49:25 PM
to Michael Clark, Rogier Brussee, RISC-V ISA Dev

Cesar Eduardo Barros

Jul 25, 2017, 6:11:16 PM
to Michael Clark, Rogier Brussee, RISC-V ISA Dev
On 25-07-2017 17:44, Michael Clark wrote:
> Hi Rogier,
>
> What is interesting, in addition to the storage being equal to char and
> the value being 0 or 1, it appears that on x86-64 ABI that bits[63:8] of
> the register are undefined, as the _Bool test is just for non-zero byte,
> ignoring the upper bits in the register. On x86 SETcc only sets the low
> 8-bits, so the rest of the register may have contents from another
> operation:

On RISC-V, SLT/SLTU/SLTI/SLTIU/LB/LBU all set the whole register, so it
makes sense and is simpler to require the whole register to be defined
when used to store a _Bool. That is: a _Bool in a register is whatever
SLT/SLTU/SLTI/SLTIU would set it to. A _Bool in memory is the lowest
byte of a _Bool in a register. Any other value is invalid.
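
As a rough illustration of that rule (a sketch only, assuming the ABI adopts it; the helper names to_bool and bool_register_image are made up for this example), converting any integer to _Bool in C already yields exactly 0 or 1, which is the value snez/sltu would leave in the register:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converting an integer to _Bool compares it with 0,
   so the result is always exactly 0 or 1 (what snez/sltu would produce). */
bool to_bool(long v)
{
    return v;               /* same as v != 0 */
}

/* Hypothetical helper: the full-width register image of a valid _Bool.
   The upper bits are zero; the lowest byte is what would be stored in memory. */
uint64_t bool_register_image(bool b)
{
    return (uint64_t)b;
}
```

Under this reading, to_bool(0x100) is 1, not 0, because the conversion looks at the whole value rather than only the low byte.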

> I’m not sure what RISC-V defines as the register contents for the upper
> bits in a char as RISC-V can only reference a register as 32-bits or
> 64-bits. Due to this I believe canonical false needs to be zero-extended
> to 32-bits or 64-bits on RV32 and RV64 respectively. I think not zero
> has to test the full register width.
>
> Note: while true is defined to be 1 (rewriting other values to 1), the
> code is guarded and tends to test either zero or not zero, so a
> non-canonical true (not 0 and not 1) could be true.

A "non-canonical true" could also be false. Or worse, it could make the
program jump into nowhere (for instance, if used to index into a jump
table). There's a reason the compiler converts a cast to _Bool into a
"snez" (sltu with x0) instruction.

Yeah, most of the time you'll be comparing a _Bool with zero, but that's
only because it's so convenient due to the x0 register. Don't count on it.

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Michael Clark

Jul 25, 2017, 7:23:29 PM
to Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev

> On 26 Jul 2017, at 10:11 AM, Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:
>
> On 25-07-2017 17:44, Michael Clark wrote:
>> Hi Rogier,
>>
>> What is interesting, in addition to the storage being equal to char and
>> the value being 0 or 1, it appears that on x86-64 ABI that bits[63:8] of
>> the register are undefined, as the _Bool test is just for non-zero byte,
>> ignoring the upper bits in the register. On x86 SETcc only sets the low
>> 8-bits, so the rest of the register may have contents from another
>> operation:
>
> On RISC-V, SLT/SLTU/SLTI/SLTIU/LB/LBU all set the whole register, so it makes sense and is simpler to require the whole register to be defined when used to store a _Bool. That is: a _Bool in a register is whatever SLT/SLTU/SLTI/SLTIU would set it to. A _Bool in memory is the lowest byte of a _Bool in a register. Any other value is invalid.

Agree. I was mostly pointing out how _Bool values in registers are implementation defined, and thus should be defined in the ABI doc, given Rogier spotted its absence.

>> I’m not sure what RISC-V defines as the register contents for the upper
>> bits in a char as RISC-V can only reference a register as 32-bits or
>> 64-bits. Due to this I believe canonical false needs to be zero-extended
>> to 32-bits or 64-bits on RV32 and RV64 respectively. I think not zero
>> has to test the full register width.
>>
>> Note: while true is defined to be 1 (rewriting other values to 1), the
>> code is guarded and tends to test either zero or not zero, so a
>> non-canonical true (not 0 and not 1) could be true.
>
> A "non-canonical true" could also be false. Or worse, it could make the program jump into nowhere (for instance, if used to index into a jump table). There’s a reason the compiler converts a cast to _Bool into a "snez" (sltu with x0) instruction.

What the C standard says and what implementations define are different things. On x86, 0x1 through 0xff are true, i.e. all non-zero char values. 0x100 might not be safe on x86.

> Yeah, most of the time you'll be comparing a _Bool with zero, but that's only because it's so convenient due to the x0 register. Don't count on it.

On RISC-V I believe we can count on non-zero being true, based on what has just been stated. The only completely safe thing is for the compiler to emit snez, which is what it does. GCC and LLVM seem to be consistent in this regard, as they test for non-zero on other architectures (however, with 8-bit width on x86).

Rogier Brussee

Jul 27, 2017, 4:42:21 AM
to RISC-V ISA Dev, ces...@cesarb.eti.br, rogier....@gmail.com


On Wednesday, 26 July 2017 at 01:23:29 UTC+2, michaeljclark wrote:

> On 26 Jul 2017, at 10:11 AM, Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:
>
> On 25-07-2017 17:44, Michael Clark wrote:
>> Hi Rogier,
>>
>> What is interesting, in addition to the storage being equal to char and
>> the value being 0 or 1, it appears that on x86-64 ABI that bits[63:8] of
>> the register are undefined, as the _Bool test is just for non-zero byte,
>> ignoring the upper bits in the register. On x86 SETcc only sets the low
>> 8-bits, so the rest of the register may have contents from another
>> operation:
>
> On RISC-V, SLT/SLTU/SLTI/SLTIU/LB/LBU all set the whole register, so it makes sense and is simpler to require the whole register to be defined when used to store a _Bool. That is: a _Bool in a register is whatever SLT/SLTU/SLTI/SLTIU would set it to. A _Bool in memory is the lowest byte of a _Bool in a register. Any other value is invalid.

Agree. I was mostly pointing out how _Bool values in registers are implementation defined, and thus should be defined in the ABI doc, given Rogier spotted its absence.

Not just in registers: the ABI COULD define that any nonzero byte stored in a _Bool should be interpreted as true. (I am not arguing for that, just pointing out that it is also a possibility.)

In registers it is not even clear whether the most significant bits of the register should be zero.
The typical example is:

_Bool is_nullptr(const void* p)
{
        return p;
}

should that compile to

ret

or

snez a0, a0
ret

and a related question is how to compile

_Bool are_nullptrs(const void* p, const void* q)
{
_Bool a = is_nullptr(p);
_Bool b = is_nullptr(q);
return a && b;
}

Should that be

jal is_nullptr
mv a5, a0
mv a0, a1
jal is_nullptr
and a0, a0, a5
ret

or 

jal is_nullptr
snez a5, a0
mv a0, a1
jal is_nullptr
snez a0, a0
and a0, a0, a5
ret

[snip]
>> Note: while true is defined to be 1 (rewriting other values to 1), the
>> code is guarded and tends to test either zero or not zero, so a
>> non-canonical true (not 0 and not 1) could be true.
>
> A "non-canonical true" could also be false.

????? 
Or worse, it could make the program jump into nowhere (for instance, if used to index into a jump table). There’s a reason the compiler converts a cast to _Bool into a "snez" (sltu with x0) instruction.

 
So it seems that the compiler normalises a _Bool to 0 or 1 (which is what the comparison instructions give), but optimises the normalisation away if it can prove it is superfluous because the value is only used to test for non-zero-ness in a branch instruction. FWIW that seems sane to me, and might as well be documented.

Michael Clark

Jul 27, 2017, 6:31:31 AM
to Rogier Brussee, RISC-V ISA Dev, ces...@cesarb.eti.br
On 27 Jul 2017, at 8:42 PM, Rogier Brussee <rogier....@gmail.com> wrote:



On Wednesday, 26 July 2017 at 01:23:29 UTC+2, michaeljclark wrote:

> On 26 Jul 2017, at 10:11 AM, Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:
>
> On 25-07-2017 17:44, Michael Clark wrote:
>> Hi Rogier,
>>
>> What is interesting, in addition to the storage being equal to char and
>> the value being 0 or 1, it appears that on x86-64 ABI that bits[63:8] of
>> the register are undefined, as the _Bool test is just for non-zero byte,
>> ignoring the upper bits in the register. On x86 SETcc only sets the low
>> 8-bits, so the rest of the register may have contents from another
>> operation:
>
> On RISC-V, SLT/SLTU/SLTI/SLTIU/LB/LBU all set the whole register, so it makes sense and is simpler to require the whole register to be defined when used to store a _Bool. That is: a _Bool in a register is whatever SLT/SLTU/SLTI/SLTIU would set it to. A _Bool in memory is the lowest byte of a _Bool in a register. Any other value is invalid.

Agree. I was mostly pointing out how _Bool values in registers are implementation defined, and thus should be defined in the ABI doc, given Rogier spotted its absence.

Not just in registers: the ABI COULD define that any nonzero byte stored in a _Bool should be interpreted as true (I am not arguing to do that just pointing out that this is also a  possibility)

In registers it is not even clear if the MSB of the register should be zero
The typical example is

_Bool is_nullptr(const void* p)
{
        return p;
}

should that compile to

ret

or

snez a0 a0
ret

The latter.

The compiler is strict in that it always emits 0 or 1 for _Bool, and tolerant in that it always tests against 0.

and a related question is how to compile

_Bool are_nullptrs(const void* p, const void* q)
{
_Bool a = is_nullptr(p);
_Bool b = is_nullptr(q);
return a && b;
}

Should that be

jal is_nullptr
mv a5, a0
mv a0, a1
jal is_nullptr
and a0 a0 a5
ret

or 

jal is_nullptr
snez a5, a0
mv a0, a1
jal is_nullptr
snez a0 a0
and a0 a0 a5
ret

The compiler inlines the test if it has the definition; otherwise, as you can see in the link above, it saves registers, perhaps unnecessarily. Note this is with -O3. It’s quite illuminating. I guess the compiler has to save callee-saved registers so it can stash the volatile argument values before calling an external function.


so either this:

  beqz a0,.L5
  snez a0,a1
  ret
.L5:
  li a0,0
  ret

or this:

  add sp,sp,-32
  sd ra,24(sp)
  sd s0,16(sp)
  sd s1,8(sp)
  mv s1,a1
  call _is_nullptr(void const*)
  mv s0,a0
  mv a0,s1
  call _is_nullptr(void const*)
  snez a5,s0
  ld ra,24(sp)
  ld s0,16(sp)
  sub a5,zero,a5
  ld s1,8(sp)
  and a0,a0,a5
  add sp,sp,32
  jr ra

You can see that it negates the return value of the first call, which depends on the value being 1 to generate -1, and uses the result as an AND mask. The negate seems unnecessary.

It appears -Os uses a branch:


[snip]
>> Note: while true is defined to be 1 (rewriting other values to 1), the
>> code is guarded and tends to test either zero or not zero, so a
>> non-canonical true (not 0 and not 1) could be true.
>
> A "non-canonical true" could also be false.

????? 
Or worse, it could make the program jump into nowhere (for instance, if used to index into a jump table). There’s a reason the compiler converts a cast to _Bool into a "snez" (sltu with x0) instruction.

 
So it seems that the compiler normalises a _Bool to 0 or 1 (which is what the comparison instructions give), but optimises the normalisation away if it can prove it is superfluous because the value is only used to test for non-zero-ness in a branch instruction. FWIW that seems sane to me, and might as well be documented.
 
What the C standard says and what implementations define are different things. On x86, 0x1 through 0xff are true, i.e. all non-zero char values. 0x100 might not be safe on x86.

> Yeah, most of the time you'll be comparing a _Bool with zero, but that's only because it's so convenient due to the x0 register. Don't count on it.

On RISC-V I believe we can count on non-zero being true, based on what has just been stated. The only completely safe thing is for the compiler to emit snez, which is what it does. GCC and LLVM seem to be consistent in this regard, as they test for non-zero on other architectures (however, with 8-bit width on x86).


Krste Asanovic

Jul 27, 2017, 6:31:40 AM
to Rogier Brussee, RISC-V ISA Dev, ces...@cesarb.eti.br
With inlining, ideally,

are_nullptrs:
  or a0, a0, a1
  seqz a0, a0
  ret

Krste



Cesar Eduardo Barros

Jul 27, 2017, 7:19:03 AM
to Rogier Brussee, RISC-V ISA Dev
On 27-07-2017 05:42, Rogier Brussee wrote:
>
>
> On Wednesday, 26 July 2017 at 01:23:29 UTC+2, michaeljclark wrote:
>
>
> > On 26 Jul 2017, at 10:11 AM, Cesar Eduardo Barros
Yes, there are two possibilities for _Bool:

- true is one, false is zero, anything else is invalid
- true is non-zero, false is zero

The first option seems to be more common. (The x86 variant discussed in
this thread, which is the first option but with the high bytes as don't
care, is an anomaly that only works for x86 because its instructions can
address the lowest byte of a register as if it were a separate register.)

The first option is more efficient: since the compiler knows a _Bool
returned from a function or read from memory can only ever be 0 or 1, it
can elide a "snez" or equivalent in many cases.

There are also two possibilities for storage of _Bool in memory: as a
single byte, or as the same size of a full register. The first option
(single byte) is less wasteful and more efficient, as long as the ISA
has instructions to read and write a single byte in memory (which is the
case for RISC-V).

> In registers it is not even clear if the MSB of the register should be zero
> The typical example is
>
> _Bool is_nullptr(const void* p)
> {
> return p;
> }
>
> should that compile to
>
> ret
>
> or
>
> snez a0 a0
> ret

It should compile to a "snez".

> and a related question is how to compile
>
> _Bool are_nullptrs(const void* p, const void* q)
> {
> _Bool a = is_nullptr(p);
> _Bool b = is_nullptr(q);
> return a && b;
> }
>
> Should that be
>
> jal is_nullptr
> mv a5, a0
> mv a0, a1
> jal is_nullptr
> and a0 a0 a5
> ret
>
> or
>
> jal is_nullptr
> snez a5, a0
> mv a0, a1
> jal is_nullptr
> snez a0 a0
> and a0 a0 a5
> ret

Neither: since a5 is caller-saved, the second call to is_nullptr can
overwrite a5. Assuming you save it in a callee-saved register instead,
the answer is: the compiler can do either of these. The "snez" is
redundant but harmless, since a _Bool can only be 1 or 0.

> [snip]
>
> >> Note: while true is defined to be 1 (rewriting other values to
> 1), the
> >> code is guarded and tends to test either zero or not zero, so a
> >> non-canonical true (not 0 and not 1) could be true.
> >
> > A "non-canonical true" could also be false.
>
>
> ?????

Suppose you have a value which is supposed to be a _Bool but has the
invalid value "2". A compiler can treat "a && b", where both are _Bool,
as "a & b" (binary instead of logical AND), and it will work for all
valid values of _Bool; but if one of these booleans has an invalid value
with the LSB clear, like "2", while the other one has a valid value, the
result will always be zero, even if both input values are non-zero.
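
The effect can be modelled in plain C without invoking undefined behaviour, by treating the register contents as ordinary unsigned integers. This is only a sketch: and_trusting and and_defensive are hypothetical names, standing for a lowering that trusts its operands to be 0 or 1 and one that re-normalises each operand first.

```c
/* Model of a compiler that trusts both operands to be 0 or 1 and lowers
   "a && b" to a single bitwise AND on the raw register values. */
unsigned long and_trusting(unsigned long a, unsigned long b)
{
    return a & b;
}

/* Model of a defensive lowering that re-normalises each operand
   (one snez per operand), so any non-zero value counts as true. */
unsigned long and_defensive(unsigned long a, unsigned long b)
{
    return (unsigned long)(a != 0) & (unsigned long)(b != 0);
}
```

With the invalid value 2 and the valid value 1, and_trusting(2, 1) gives 0 while and_defensive(2, 1) gives 1; for valid inputs the two always agree.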

That is, a "non-canonical" (actually invalid) true can sometimes act as
a false. The only way to defend against that is to introduce redundant
"snez" instructions everywhere, which is wasteful.

> Or worse, it could make the program jump into nowhere (for instance,
> if used to index into a jump table). There’s a reason the compiler
> converts a cast to _Bool into a "snez" (sltu with x0) instruction.
>
>
> So It seems that the compiler normalises a _Bool to 0 or 1 (which is
> what the comparison instructions give) but optimises the normalisation
> away if the compiler can prove it is superfluous because it is only used
> to test for nonzero-ness in a branch instruction. FWIW that seems sane
> to me, and might as well be documented.

I agree that it should be documented that only one and zero are valid
values for a boolean (_Bool) variable, and that its storage size in
memory is a single byte. What the compiler does with that information is
an implementation detail, and can change as its optimizer gets better.

Paolo Bonzini

Jul 27, 2017, 12:47:19 PM
to Michael Clark, Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev
On 26/07/2017 01:23, Michael Clark wrote:
> What the C standard says and what implementations define are
> different things. On X86 0x1 through 0xff are true. i.e. all non-zero
> char values. 0x100 might not be safe on x86.

This is not correct.

The compiler tests against 0 when converting into bool. However, if you
take some memory whose value is not 0 or 1 and treat it as a bool, that
is undefined behavior and the compiler treats it as such. Try this
program:

#include <stdio.h>
#include <stdbool.h>

/* Reads a byte through a bool pointer; undefined behaviour unless the byte is 0 or 1. */
void test_boolptr(bool *p)
{
    printf("%d %d %d\n", *p, !*p, *p == 0);
}

int main(void)
{
    char c = 0; test_boolptr((bool *)&c);
    char d = 1; test_boolptr((bool *)&d);
    char e = 254; test_boolptr((bool *)&e);
    char f = 255; test_boolptr((bool *)&f);
    return 0;
}

The expected output, if the compiler tested against 0 on bool uses,
would have been

0 1 1
1 0 0
1 0 0
1 0 0

On x86, when compiled without optimization, the program prints:

0 1 1
1 0 0
254 255 255
255 254 254

It is clearly visible that the compiler implements ! and
equal-to-zero as "xor 1" for bool variables.

When compiled with optimization, the program prints

0 1 1
1 0 0
0 1 1
1 0 0

because the compiler is now computing the condition at compile time. It
is still doing a "xor 1", but it is also throwing away everything except
the least significant bit (bit 0) when folding "*(bool *)&" at compile time.
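
The two sets of numbers can be reproduced with a small model that works on raw bytes instead of reading through a bool pointer, so it stays within defined behaviour. The function names are hypothetical and the model only mirrors the behaviour described here; it is not what any particular compiler is guaranteed to emit.

```c
/* Model of the unoptimised x86 code: "!b" and "b == 0" on a bool are both
   lowered to "b xor 1", applied to the raw byte. */
unsigned char not_as_xor(unsigned char raw)
{
    return raw ^ 1u;
}

/* Model of the optimised build: only the least significant bit of the
   stored byte survives when the load is folded at compile time. */
unsigned char fold_to_bit0(unsigned char raw)
{
    return raw & 1u;
}
```

For the byte 254, not_as_xor gives 255 (the unoptimised row), while fold_to_bit0 gives 0 and not_as_xor of that gives 1 (the optimised row).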

Paolo

Paolo Bonzini

Jul 27, 2017, 12:49:30 PM
to Rogier Brussee, RISC-V ISA Dev, ces...@cesarb.eti.br
On 27/07/2017 10:42, Rogier Brussee wrote:
> In registers it is not even clear if the MSB of the register should be zero
> The typical example is
>
> _Bool is_nullptr(const void* p)
> {
> return p;
> }
>
> should that compile to
>
> ret
>
> or
>
> snez a0 a0
> ret

The latter.

> and a related question is how to compile
>
> _Bool are_nullptrs(const void* p, const void* q)
> {
> _Bool a = is_nullptr(p);
> _Bool b = is_nullptr(q);
> return a && b;
> }
>
> Should that be
>
> jal is_nullptr
> mv a5, a0
> mv a0, a1
> jal is_nullptr
> and a0 a0 a5
> ret
>
> or
>
> jal is_nullptr
> snez a5, a0
> mv a0, a1
> jal is_nullptr
> snez a0 a0
> and a0 a0 a5
> ret

The former. is_nullptr must return canonical values (0/1).

> [snip]
>
> >> Note: while true is defined to be 1 (rewriting other values to
> 1), the
> >> code is guarded and tends to test either zero or not zero, so a
> >> non-canonical true (not 0 and not 1) could be true.
> >
> > A "non-canonical true" could also be false.
>
> ?????

See my other message. !*b could be either true or false if *b points to
a value that is neither 0 nor 1.

> So It seems that the compiler normalises a _Bool to 0 or 1 (which is
> what the comparison instructions give) but optimises the normalisation
> away if the compiler can prove it is superfluous because it is only used
> to test for nonzero-ness in a branch instruction. FWIW that seems sane
> to me, and might as well be documented.

No, the compiler optimizes the normalization away every time it works
with a bool type. The normalization is done when converting *another
type* to bool, as is the case in is_nullptr.

Paolo

Michael Clark

Jul 27, 2017, 4:05:27 PM
to Paolo Bonzini, Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev

> On 28 Jul 2017, at 4:47 AM, Paolo Bonzini <bon...@gnu.org> wrote:
>
> On 26/07/2017 01:23, Michael Clark wrote:
>> What the C standard says and what implementations define are
>> different things. On X86 0x1 through 0xff are true. i.e. all non-zero
>> char values. 0x100 might not be safe on x86.
>
> This is not correct.

The observation is absolutely correct for that codegen. I can test the function, as I’ve done.

It’s plainly visible from the asm that toBool only looks at bits[7:0]:

toBool:
test %dil,%dil
setne %al
retq

e.g.

cat > a1.c <<EOF
#include <stdio.h>
#include <stdbool.h>

_Bool toBool(char c) { return c; }

_Bool testToBool(int b)
{
asm volatile(
"call toBool \n"
);
}
EOF

cat > a2.c <<EOF
#include <stdio.h>
#include <stdbool.h>

extern _Bool testToBool(int b);

int main() {
int a = 0, b = 1, c = 0xff, d = 0x100;
printf("0x%x=%s\n", a, testToBool(a) ? "true" : "false");
printf("0x%x=%s\n", b, testToBool(b) ? "true" : "false");
printf("0x%x=%s\n", c, testToBool(c) ? "true" : "false");
printf("0x%x=%s\n", d, testToBool(d) ? "true" : "false");
}
EOF

gcc -O3 -c a1.c -o a1.o
gcc -O3 -c a2.c -o a2.o
gcc a1.o a2.o -o a
./a
0x0=false
0x1=true
0xff=true
0x100=false

I understand it is undefined behaviour.

As is already clear in this thread:

- sizeof(_Bool) == sizeof(char) == 1
- char/_Bool 1 is true
- char/_Bool 0 is false
- x86 supports register aliases with 8-bit widths and uses them for char/_Bool
- RISC-V needs to use a full width register

The big question is whether it is acceptable, for RISC-V, to perform optimisations that only look at bit[0], versus treating a non-zero register as true. I prefer the latter.

I can see the utility of testing only bit[0] as a truth value, for a number of reasons. It could allow a micro-architectural optimisation that exploits this for a low-cost SELECT instruction that ignores bits[XLEN-1:1]. However, either choice, a full-width zero comparator on the register file or a third read port on bit 0 only (both require only a 1-bit read port), costs much less than the reorder buffer structures in an OoO core, i.e. both allow a 1-bit third read port at relatively low cost.

Compilers can be more cavalier with undefined behaviour. The ISA/CPU has to be more conservative than the compiler, which can take shortcuts that lead to subtle bugs that can later be patched with package updates. Patching silicon doesn’t work the same way. The ISA behaviour has to be absolutely defined, assuming there were to be any instructions that take booleans. After reflecting on this a bit, I believe a non-zero register as the truth value is much safer than bit[0].

Michael Clark

Jul 27, 2017, 4:13:52 PM
to Paolo Bonzini, Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev
I’m type punning to test generated code to show that my statements are absolutely correct with respect to the particular x86 codegen. 

Sure, there may be cases when the compiler is even more cavalier, but given sizeof(_Bool) == sizeof(char) one would expect 0 to be false and 0x1 to 0xff to test as true (on x86, which has register aliases of multiple sizes). I understand it is “implementation defined” behaviour. “Undefined behaviour” has no place in a heterogeneous interoperable ISA. “Implementation defined” is acceptable, but “specification defined” is obviously much better.

Bruce Hoult

Jul 27, 2017, 4:50:21 PM
to Michael Clark, Paolo Bonzini, Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev
I've always (for decades) liked all-1s as the canonical true. It works with testing bit[0]. It works with testing for not-zero. It works with AND and ANDC then OR to synthesise select().

It fits into the llvm paradigm of "bool is a 1 bit int" together with the RISC-V paradigm of "narrow values are always sign-extended in wide registers". 

About the only thing it's not good for is indexing into a zero-based array.

Of course it's trivial to convert between true=1 and true=-1 with a neg.
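
A minimal C sketch of that idea, with hypothetical function names: a canonical 0/1 bool becomes an all-ones or all-zeros mask with one negation, and the mask then drives a branch-free select built from AND, AND-with-complement and OR.

```c
#include <stdbool.h>

/* Turn a canonical bool (0 or 1) into an all-zeros or all-ones mask: one neg. */
long mask_from_bool(bool b)
{
    return -(long)b;
}

/* select(): AND keeps y where the mask is set, AND-with-complement keeps z
   where it is clear, and OR merges the two halves. */
long select_by_mask(long mask, long y, long z)
{
    return (mask & y) | (~mask & z);
}
```

For example, select_by_mask(mask_from_bool(1), 7, 9) yields 7 and select_by_mask(mask_from_bool(0), 7, 9) yields 9.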



Michael Clark

Jul 27, 2017, 4:54:39 PM
to Bruce Hoult, Paolo Bonzini, Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev
-1 is useful for emulating select too, but given conservative code we should test for non-zero

snez a0,a0
neg a0,a0

If we had a SELECT instruction, non-zero would probably be safest for true, but I can see an argument for bit[0].

Michael Clark

Jul 27, 2017, 5:33:28 PM
to Rogier Brussee, Bruce Hoult, Paolo Bonzini, Cesar Eduardo Barros, RISC-V ISA Dev
BTW here are the 4 forms of constant time select:


using this identity:

mix(x,y,z) == (x&y) | ((~x)&z) 
           == x&y ^ (~x)&z
           == x&y ^ ((-1) ^ x)&z 
           == x&y ^ x&z ^ (-1)&z 
           == x&(y^z) ^ z
           == ((y^z)&x)^z 

Both forms are 5 instructions if defining true to be non-zero and 4 instructions if the compiler knows the value is 0 or 1.
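
The first and last lines of the identity can be checked mechanically. The sketch below uses hypothetical names (mix_or, mix_xor, mix_forms_agree) and treats x as an arbitrary bit mask; it compares the OR form with the XOR form over a small range of masks.

```c
/* Form 1: AND / AND-with-complement / OR. */
unsigned long mix_or(unsigned long x, unsigned long y, unsigned long z)
{
    return (x & y) | (~x & z);
}

/* Form 2: the XOR rewrite from the last line of the identity. */
unsigned long mix_xor(unsigned long x, unsigned long y, unsigned long z)
{
    return ((y ^ z) & x) ^ z;
}

/* Returns 1 if both forms agree for every mask x in [0, 255]
   with the given y and z, and 0 otherwise. */
int mix_forms_agree(unsigned long y, unsigned long z)
{
    for (unsigned long x = 0; x < 256; x++) {
        if (mix_or(x, y, z) != mix_xor(x, y, z))
            return 0;
    }
    return 1;
}
```

The identity is bitwise, so agreement on these masks follows from agreement on each individual bit; the loop is a sanity check, not a proof.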


Michael Clark

Jul 27, 2017, 5:53:41 PM
to Rogier Brussee, Bruce Hoult, Paolo Bonzini, Cesar Eduardo Barros, RISC-V ISA Dev
BTW if the ABI defines true to be bit[0], then in the type-punned register value, true is odd and false is even. That definition allows some optimisations in some places but requires other code to be more guarded. My opinion is that a non-zero register value being true and zero being false, at the ABI level, is probably the safest, even in the presence of canonical true == 1. It requires a snez to canonicalise a “supposed” boolean before use. There are 3 options:

- (bit[0] == 1) == true
- (bits[XLEN-1:0] > 0) == true
- (bits[XLEN-1:0] == 1) == true

I guess non-canonical true would be undefined behaviour but the thought experiment is a SELECT rd,rs1,rs2,rs3. The ISA would need to clearly define what was true. It emits 1 from SLT, however having register values greater than 1 equate to false would be a bit weird. non-zero as true for a type-punned register value makes the most sense to me at least.
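
To make the difference between the three options concrete, here is a small C sketch with hypothetical predicate names, each taking a 64-bit value that stands in for an RV64 register:

```c
#include <stdint.h>

/* Option 1: only bit 0 decides (odd is true, even is false). */
int true_if_bit0(uint64_t reg)
{
    return (int)(reg & 1u);
}

/* Option 2: any non-zero register value is true. */
int true_if_nonzero(uint64_t reg)
{
    return reg != 0;
}

/* Option 3: only the canonical value 1 is true. */
int true_if_one(uint64_t reg)
{
    return reg == 1;
}
```

All three agree on the canonical values 0 and 1. They disagree on everything else: the value 2 is true only under option 2, and the value 3 is true under options 1 and 2 but not option 3.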

Michael Clark

Jul 28, 2017, 6:39:32 PM
to Rogier Brussee, Bruce Hoult, Paolo Bonzini, Cesar Eduardo Barros, RISC-V ISA Dev
On 28 Jul 2017, at 9:33 AM, Michael Clark <michae...@mac.com> wrote:

BTW here are the 4 forms of constant time select:


I made a mistake with signed vs unsigned in the register width vs the compiler _Bool promotion to register width version; unsigned c; (c > 0). The former version had sgt zero (SLT) instead of snez (SLTU). This revised version uses uintptr_t as a proxy for a register width and uses the compiler promotion rules for _Bool to register width (snez a0,a0).


Nevertheless (c > 0) generates equivalent code when using unsigned long.

So constant time SELECT is 4 instructions if we know the register contains canonical _Bool (0, 1), but canonicalisation adds snez a0,a0. In fact, once there are deep pipelines with RVC, macro-op fusion coalescing, and OoO, then branch mis-predict might make constant time select profitable in some cases. I can also use this pattern match to emit MOV+CMOV in binary translation. i.e. detect the branchless predicated move pattern.

BTW “An Approach for Implementing Efficient Superscalar CISC Processors” paper describes an interesting approach of macro-op fused dependent pairs that I believe is equally applicable to RISC. In fact the RISC encoding is very similar to RISC-V, with 32-bit 3 operand instructions and 16-bit 2 operand destructive variants. The primary difference is the presence of a fuse bit. I’m still curious as to why Figure 4. “Macro-op execution overview” doesn’t split the 2 ALUs into two different pipeline stages, with the second ALU receiving one of its operands from the first ALU, eliding the temporary register write, to retire two fused dependent ops every cycle instead of one, assuming the first ALU can start on the head of another macro-op while the second ALU is working on the tail of the previous.


The idea of lowering the cost of OoO in this way is quite interesting, i.e. a 4-issue OoO that could retire up to 8 fused operations per cycle. OoO exploits dependency-free parallelism, whereas the paper shows a design that exploits temporal parallelism in dependent fused operations; from my analysis of RISC-V code, these 2-op patterns appear to be very frequent. It also reduces the cost of the OoO structures.

I’m still trying to figure out whether it is possible for this design to emit 2 dependent ops per cycle, at the cost of higher latency, with an additional pipeline stage versus 2 ops every 2 cycles (rotate the pair of ALUs in Fig 4 and split into two pipeline stages). I don’t understand why the 2 ALUs have been put in the same pipeline stage.

Alex Elsayed

Jul 29, 2017, 12:26:59 AM7/29/17
to isa...@groups.riscv.org

The problem with this is that the user-level ISA is already stabilized with SLT[I][U] setting the destination register to 1 or 0. As a result, using anything other than 1 as the canonical true seems very suboptimal.

 

If people need to turn a boolean into a bitmask, then they can use NEG.
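As a small sketch of that point: with 1 as canonical true, a single negation gives the all-ones mask. The function name bool_to_mask is hypothetical, and uint64_t is just an assumed stand-in for an RV64 register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Negating a canonical bool (0 or 1) gives 0 or all ones. */
uint64_t bool_to_mask(bool b)
{
    return (uint64_t)0 - (uint64_t)b;
}

void check_mask(void)
{
    assert(bool_to_mask(false) == 0);
    assert(bool_to_mask(true) == UINT64_MAX);
    /* The mask either keeps or clears a value. */
    assert((bool_to_mask(true) & 0xABCDu) == 0xABCDu);
    assert((bool_to_mask(false) & 0xABCDu) == 0);
}
```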


Paolo Bonzini

Jul 30, 2017, 3:06:19 PM7/30/17
to Michael Clark, Cesar Eduardo Barros, Rogier Brussee, RISC-V ISA Dev
On 27/07/2017 22:13, Michael Clark wrote:
> I’m type punning to test generated code to show that my statements are
> absolutely correct with respect to the particular x86 codegen.

That just happens to be the cheapest in this particular occasion.

If you write

char f(bool x)
{
return x ? 2 : 0;
}

the compiler can compile it to just a left shift by one.
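To spell out why that is valid: the two functions below agree only because a bool is guaranteed to hold 0 or 1. The names f_ternary and f_shift are hypothetical, and whether a given compiler actually emits the shift is not something this sketch checks.

```c
#include <assert.h>
#include <stdbool.h>

/* The source-level form from the example above. */
char f_ternary(bool x)
{
    return x ? 2 : 0;
}

/* What it can be reduced to when x is known to be 0 or 1. */
char f_shift(bool x)
{
    return (char)(x << 1);
}

void check_shift(void)
{
    assert(f_ternary(false) == f_shift(false));
    assert(f_ternary(true) == f_shift(true));
    assert(f_shift(true) == 2);
}
```

If the register could hold some other non-zero value for true, the shift form would give a different answer, which is the reason the ABI has to pin down the representation.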

Paolo

Rogier Brussee

Jul 31, 2017, 5:44:59 AM7/31/17
to RISC-V ISA Dev, rogier....@gmail.com, br...@hoult.org, bon...@gnu.org, ces...@cesarb.eti.br


On Thursday, 27 July 2017 at 23:33:28 UTC+2, michaeljclark wrote:
BTW here are the 4 forms of constant time select:



The functions select_1() and select_2() contain a bug.

long select_1(long c, long y, long z)
{
long x = -(c > 0);
return (-x&y) | ((~x)&z);
}

e.g. in select_1 the body should be

long x = -(c > 0);
return (x&y) | ((~x)&z);

i.e. there is one minus sign too many.
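A quick worked check of the difference, with the two variants side by side. The names select_buggy and select_fixed are mine; the bodies are the two versions quoted above. This only covers the extra minus sign and keeps the signed (c > 0) test as posted.

```c
#include <assert.h>

/* Version with the extra minus sign: -x is 0 or 1, not a full mask. */
long select_buggy(long c, long y, long z)
{
    long x = -(c > 0);
    return (-x & y) | ((~x) & z);
}

/* Version with the minus sign removed: x is 0 or all ones. */
long select_fixed(long c, long y, long z)
{
    long x = -(c > 0);
    return (x & y) | ((~x) & z);
}

void check_minus_sign(void)
{
    /* c = 1, y = 6, z = 9: the buggy form keeps only bit 0 of y. */
    assert(select_buggy(1, 6, 9) == 0);
    assert(select_fixed(1, 6, 9) == 6);
    /* Both forms agree when the condition is false. */
    assert(select_buggy(0, 6, 9) == 9);
    assert(select_fixed(0, 6, 9) == 9);
}
```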

The compiler explorer is great though! With the minus sign fixed we get

select_1(long, long, long):
sgt a0,a0,zero
subw a5,zero,a0
addw a0,a0,-1
and a1,a1,a5
and a0,a2,a0
or a0,a1,a0
ret 

and 

select_3(bool, long, long):
sub a5,zero,a0
add a0,a0,-1
and a1,a5,a1
and a0,a0,a2
or a0,a1,a0
ret

(sidenote: the explorer still generates add a0, a0, -1 instead of addi a0, a0, -1)


Michael Clark

Jul 31, 2017, 4:55:08 PM7/31/17
to Rogier Brussee, RISC-V ISA Dev, br...@hoult.org, bon...@gnu.org, ces...@cesarb.eti.br
On 31 Jul 2017, at 9:44 PM, Rogier Brussee <rogier....@gmail.com> wrote:

> On Thursday, 27 July 2017 at 23:33:28 UTC+2, michaeljclark wrote:
> > BTW here are the 4 forms of constant time select:
>
> The functions select_1()  and select_2 contain a bug.
>
> long select_1(long c, long y, long z)
> {
> {
> long x = -(c > 0);
> return (-x&y) | ((~x)&z);
> }
>
> e.g select_1 that should be
>
> long x = -(c > 0);
> return (x&y) | ((~x)&z);
>
> i.e. there is a minus sign to much.
>
> The compiler explorer is great though!  with the minus sign fixed we get

Two bugs, as we needed to use unsigned to promote to bool. i.e. snez. It needs some test cases :-D

Ya. compiler explorer is quite neat. Here is the version that uses unsigned types and you can see the aarch64 versions too: 


For the first two, gcc is able to detect that it is the SELECT pattern (for both the and/or and xor versions) and lower it to csel. However, I think we've found a compiler bug in aarch64, because it's using floating point instructions for one of the _Bool versions. :-D
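For reference, a sketch of what an xor form of the select can look like. The exact code behind the link is not reproduced here, so this is an assumption based on the common formulation, with a hypothetical name select_xor and the unsigned (c > 0) test from above.

```c
#include <assert.h>

/* xor form: z ^ (mask & (y ^ z)) yields y when mask is all ones
   and z when mask is zero. */
unsigned long select_xor(unsigned long c, unsigned long y, unsigned long z)
{
    unsigned long mask = 0UL - (unsigned long)(c > 0);
    return z ^ (mask & (y ^ z));
}

void check_select_xor(void)
{
    assert(select_xor(1, 6, 9) == 6);
    assert(select_xor(0, 6, 9) == 9);
    /* Any non-zero condition selects y, since c is unsigned. */
    assert(select_xor(5, 6, 9) == 6);
}
```

It needs one and plus two xors instead of two ands, a not and an or, which is presumably why both shapes are worth recognising as the same SELECT pattern.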
