I recently stumbled over the discussion about possibly adding conversions from bool to various numerical types in go to avoid branches.
In my own code I solved this by using the unsafe package to convert bool to byte, but I noticed that this seems to be slower than just using an if statement in many cases.
So I wrote a very simple test case to investigate the behavior:
#####
package main
import "unsafe"
func main(){
s:=byte(0)
for i:=0;i<10000000000;i++{
if i<100 { s++ }
b:=i<100; s+=*(*byte)(unsafe.Pointer(&b))
}
}
#####
which includes minimal examples of equivalent branching and (supposedly) non-branching solutions inside the for loop (upper line branching, lower line no branching).
The corresponding assembly is the following (see end of message for the whole main):
--- branching part?
0011 (bool.go:8) CMPQ AX,$100 //Comparison (first instruction of branching part)
0012 (bool.go:8) JGE ,17 //Conditional jump to avoid increment instruction
0013 (bool.go:8) INCQ ,CX //Increment
0014 (bool.go:8) JMP ,17 //Unconditional jump to non-branching part (17)
--- non branching part?
0015 (bool.go:9) MOVB $1,b+-1(SP) //Store true on stack
0016 (bool.go:9) JMP ,20 //Unconditional jump to
0017 (bool.go:9) CMPQ AX,$100 //Comparison (first instruction of non-branching part)
0018 (bool.go:9) JLT ,15 //Conditional jump to creation of true value
0019 (bool.go:9) MOVB $0,b+-1(SP) //Store false on stack
0020 (bool.go:9) LEAQ b+-1(SP),BX //Get pointer to address on stack where true/false is stored
0021 (bool.go:9) NOP , //(?) Aligning jump target for end of loop? (RET is at 25 and target of 10) Or timing?
0022 (bool.go:9) MOVBQZX (BX),BX //Dereferencing pointer to true false
0023 (bool.go:9) ADDQ BX,CX //Addition
I commented the assembly as best as I could while trying to understand it (this is the first time I ever dealt with any assembly, so even nitpicky corrections are very welcome, I couldn't even find some instructions on google and had to wildly guess).
To me it looks like there are two issues with the non-branching part.
The first is that the chain of referencing and dereferencing is not optimized (20-22) and could be replaced by "MOVB b+-1(SP), BX" (I made that up and it might be ridiculously wrong or invalid).
The second issue would be that the true and false value are created by jumping around. Is there no way get it directly by copying a flag (I read somewhere that CMP does a signed subtract and which stores stuff in flags but throws away the result). The flag we would want would be ZF if I understood what CMP does correctly and there are two ways to do that (LAHF and something about pushing something onto the stack which I didn't understand at all).
LAHF can't be used because we might be running on an early AMD64 or EMT64 CPU that doesn't have it? What about the other option?
If both of these are impractical, can't we just actually do a subtraction and a few shift and & operations to convert everything non-zero to 1, or am I overestimating how much jumps can cost? There must be some way to do a comparison and utilize the results somehow, right?
At least the first optimization issue (reference, dereference) could be interesting, since this kind of casting is probably the most popular usecase of the unsafe package (even the math package does it), or would it be appropriate to make a direct conversions of values of a certain length available in unsafe (unsafe.Value or Value8, Value16, Value32, Value64 which are convertible to values of the same length [similar to unsafe.Pointer]) and not optimize this?
As I mentioned this is the first time I saw assembly in my life, so have mercy. I am fully expecting to be wrong about most stuff I wrote.
Best,
Florian
--- prog list "main" ---
0000 (bool.go:5) TEXT main+0(SB),$8-0
0001 (bool.go:5) FUNCDATA $0,gcargs·0+0(SB)
0002 (bool.go:5) FUNCDATA $1,gclocals·0+0(SB)
0003 (bool.go:5) TYPE b+-1(SP){bool},$1
0004 (bool.go:6) MOVQ $0,CX
0005 (bool.go:7) MOVQ $0,AX
0006 (bool.go:7) JMP ,8
0007 (bool.go:7) INCQ ,AX
0008 (bool.go:7) MOVQ $10000000000,BP
0009 (bool.go:7) CMPQ AX,BP
0010 (bool.go:7) JGE $0,25
0011 (bool.go:8) CMPQ AX,$100
0012 (bool.go:8) JGE ,17
0013 (bool.go:8) INCQ ,CX
0014 (bool.go:8) JMP ,17
0015 (bool.go:9) MOVB $1,b+-1(SP)
0016 (bool.go:9) JMP ,20
0017 (bool.go:9) CMPQ AX,$100
0018 (bool.go:9) JLT ,15
0019 (bool.go:9) MOVB $0,b+-1(SP)
0020 (bool.go:9) LEAQ b+-1(SP),BX
0021 (bool.go:9) NOP ,
0022 (bool.go:9) MOVBQZX (BX),BX
0023 (bool.go:9) ADDQ BX,CX
0024 (bool.go:7) JMP ,7
0025 (bool.go:11) RET ,