Carting int to bool and the reverse for bit hacking ?

202 views
Skip to first unread message

christoph...@gmail.com

unread,
Nov 20, 2020, 4:26:57 AM11/20/20
to golang-nuts
Go has a strict type separation between int and bool. For a normal usage this is OK. I guess it also ease assembly portability because there is not always a machine instruction to do so.

For bit hacking, this is an unfortunate limitation. A handy feature of bool <-> int conversion is that true is converted to 1 and any integer different of zero to true. As a consequence, when we write int(bool(x)) we get 0 when x is 0, and 1 when x is not 0. I checked the math/bits package and there is not an equivalent function.

Is there a way around this that avoids the conditional branching ?

The use case I have to test if an integer is in a quite small constant set of integers. With bool <-> int conversion, we can use binary operation like this for instance which would be valid in C

r := int(bool(v^v1))&int(bool(v^v2))&int(bool(v^v3))

This is to be compared with

r := v==v1 || v==v2 || v==v3

The second form is of course more readable, but will also have a conditional branch at each ||. If most tested integers are different from v1, v2 and v3, the first form seam to me more efficient due to pipelining. Though branching prediction can mitigate the penalty.

The best I could do as valid Go alternative is this

r := (v^v1)*(v^v2)*(v^v3)

but the multiplication is obviously not efficient.

Did I miss something ?

b.ca...@pobox.com

unread,
Nov 20, 2020, 5:37:50 AM11/20/20
to golang-nuts
On Friday, 20 November 2020 at 09:26:57 UTC christoph...@gmail.com wrote:
A handy feature of bool <-> int conversion is that true is converted to 1 and any integer different of zero to true. As a consequence, when we write int(bool(x)) we get 0 when x is 0, and 1 when x is not 0.

I don't think you've tried this with Go.

./prog.go:9:15: cannot convert x (type int) to type bool

 It doesn't work the other way either: e.g. int(v ^ v1 != 0)

./go1.go:9:13: cannot convert v ^ v1 != 0 (type untyped bool) to type int



This is to be compared with

r := v==v1 || v==v2 || v==v3

The second form is of course more readable, but will also have a conditional branch at each ||.


You should compare the generated machine code before saying one is better than the other.  If int(bool(x)) worked, it might also have a conditional branch; that is, it would presumably be the same as int(x != 0). In that case, your code becomes int(v^v1 != 0) which may or may not compile to the same code as int(v != v1).

To see the assembly language:  go build -gcflags -S myprog.go

Note that

> r := (v^v1)*(v^v2)*(v^v3)

risks integer overflow and may give the wrong answer. Playground

Apart from this, I think you're trying to micro-optimise.  Do you have code which is measurably affected by the problem you describe, i.e. by actual profiling?  Modern processors have optimisations to deal with branches, e.g. speculative execution. Also, if you're dealing with data which is not in cache, then DRAM access is orders of magnitude slower than the CPU - i.e. the CPU cost is almost free compared to the DRAM access cost.

b.ca...@pobox.com

unread,
Nov 20, 2020, 6:01:01 AM11/20/20
to golang-nuts
On Friday, 20 November 2020 at 09:26:57 UTC christoph...@gmail.com wrote:
The use case I have to test if an integer is in a quite small constant set of integers. With bool <-> int conversion, we can use binary operation like this for instance which would be valid in C

r := int(bool(v^v1))&int(bool(v^v2))&int(bool(v^v3))


FWIW, here's some complete C:

#include <stdio.h>
#include <stdbool.h>

int Test0(int v, int v1, int v2, int v3)
{
     return v == v1 || v == v2 || v == v3;
}

int Test1(int v, int v1, int v2, int v3)
{
    int r = (int)((bool)(v^v1)) & (int)((bool)(v^v2)) & (int)((bool)(v^v3));
    return !r;
}

int Test2(int v, int v1, int v2, int v3)
{
    int r = !!(v^v1) & !!(v^v2) & !!(v^v3);
    return !r;
}

int main(void)
{
    printf("%d\n", Test0(5, 1, 2, 4));
    printf("%d\n", Test0(5, 1, 2, 5));
    printf("%d\n", Test1(5, 1, 2, 4));
    printf("%d\n", Test1(5, 1, 2, 5));
    printf("%d\n", Test2(5, 1, 2, 4));
    printf("%d\n", Test2(5, 1, 2, 5));
    return 0;
}


Test environment:

$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Using gcc -s, I see that Test1 and Test2 do indeed compile without branches (using setne) whilst Test0 has branches. However with flag -O3, all three functions compile to identical highly-compact code without branches.

I think the moral is: don't waste your time trying to outsmart the compiler.

It's true that Go's compiler doesn't currently optimise this case, but new optimisations are being added over time.

silas poulson

unread,
Nov 20, 2020, 8:04:15 PM11/20/20
to golang-nuts
On Friday, November 20, 2020 at 11:01:01 AM UTC b.ca...@pobox.com wrote:
>I think the moral is: don't waste your time trying to outsmart the compiler.

Would like to reiterate this - Godblot <https://godbolt.org/> is an
amazing tool for showing just how much you're compiler does for you

b.ca...@pobox.com

unread,
Nov 21, 2020, 4:30:13 AM11/21/20
to golang-nuts
That's fantastic.  Paste in

int Test0(int v, int v1, int v2, int v3)
{
     return v == v1 || v == v2 || v == v3;
}

and then the the compiler options to -O3.

christoph...@gmail.com

unread,
Nov 21, 2020, 5:36:26 AM11/21/20
to golang-nuts
Thank you for the responses. Unfortunately, they all comment on the given use case. 
The real question is if we should add a function in math/bits that performs the operation. 
The function would be 

func notZero(x int) int {
    if x == 0 {
        return 0
    }
    return 1
}

Regarding code optimization, do they also optimize range testing ?

if x >= a && x <= b ...

can be translated into the following when b > a. 

if uint(x - a) <= uint(b-a) ...

This replaces two conditional branches with one. I checked go assembly and the later is indeed simpler. 

I benchmarked decode rune with both and didn't see much difference. This is probably because in a tight loop of benchmark tests, the branching prediction does well its job. 

Aleksey Tulinov

unread,
Nov 21, 2020, 5:41:03 AM11/21/20
to christoph...@gmail.com, golang-nuts
To me your example appears somewhat confusing, int(bool(int())) is the
fishiest part IMO. I assume bool(int()) is just (v^v1 != 0) in
disguise and this is essentially

(v^v1 != 0) & (v^v2 != 0) & (v^v3 != 0)

Am i right?

Go can't & bools, so

func bool2int(b bool) int { // This is what Go compiler can optimize well
if b {
return 1
}
return 0
}

And this leaves us with

bool2int(v^v1 != 0) & bool2int(v^v2 != 0) & bool2int(v^v3 != 0)

Is that correct?

https://godbolt.org/z/jq368G

I don't see branching in relevant parts. v == v1 || v == v2 will of
course branch because || is a condition.

Does that answer your question or maybe I am missing something?

пт, 20 нояб. 2020 г. в 11:27, christoph...@gmail.com
<christoph...@gmail.com>:
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/a2b743d7-011d-481f-9a0f-3f00f4507328n%40googlegroups.com.

christoph...@gmail.com

unread,
Nov 22, 2020, 4:54:09 AM11/22/20
to golang-nuts
Thank you Aleksey. That is indeed a working solution, and it works well.
 Here are the two functions I wrote as suggested:

func bool2int(b bool) int {

        if b {
                return 1
        }
        return 0
}

func testBitHack(v int) bool {
        return (bool2int(v==10) & bool2int(v==5) & bool2int(v==15)) == 0
}

Here is the Go assembly code of testBitHack

"".testBitHack STEXT nosplit size=47 args=0x10 locals=0x0
        0x0000 00000 (main.go:12)       TEXT    "".testBitHack(SB), NOSPLIT|ABIInternal, $0-16
        0x0000 00000 (main.go:12)       FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (main.go:12)       FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (main.go:13)       MOVQ    "".v+8(SP), AX
        0x0005 00005 (main.go:13)       CMPQ    AX, $10
        0x0009 00009 (main.go:13)       SETEQ   CL
        0x000c 00012 (main.go:13)       CMPQ    AX, $5
        0x0010 00016 (main.go:13)       SETEQ   DL
        0x0013 00019 (main.go:13)       CMPQ    AX, $15
        0x0017 00023 (main.go:13)       SETEQ   AL
        0x001a 00026 (main.go:13)       MOVBLZX DL, DX
        0x001d 00029 (main.go:13)       MOVBLZX CL, CX
        0x0020 00032 (main.go:13)       ANDQ    DX, CX
        0x0023 00035 (main.go:13)       MOVBLZX AL, AX
        0x0026 00038 (main.go:13)       TESTQ   AX, CX
        0x0029 00041 (main.go:13)       SETEQ   "".~r1+16(SP)
        0x002e 00046 (main.go:13)       RET

The function bool2int and its condition were effectively optimized away by the Go compiler.  That’s awesome. Good job. It’s a nice trick.

Aleksey Tulinov

unread,
Nov 23, 2020, 7:09:48 AM11/23/20
to christoph...@gmail.com, golang-nuts
Yeah, it's not the first time this question appears and it's not
immediately obvious that Go compiler can do this optimization.

IIRC someone asked for this optimization on Github and it's there, but
Go doesn't allow bool to int conversion therefore it can't do this
optimization implicitly, which is quite unfortunate because some
people might take branching in bitwise operations for granted even
though there is no branching when integers are used. You might also
want to check what GCC is doing, maybe it can optimize code better.

https://godbolt.org/z/Eod5Ye

Assembly looks smaller and when xor is removed it appears like the
entire function is being optimized away perhaps because it's a
constant expression that can be evaluated at compile time. But i'm not
sure how good this assembly is without measuring actual performance.

Hope this helps.

вс, 22 нояб. 2020 г. в 11:54, christoph...@gmail.com
<christoph...@gmail.com>:
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/4381cff4-a79e-4b91-bf04-c7a2c95af309n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages