Possible case of missed CMOV optimization ?

313 views
Skip to first unread message

Agniva De Sarker

unread,
Sep 16, 2018, 10:20:45 AM9/16/18
to golan...@googlegroups.com, Giovanni Bajo
Hi,

I discovered a very interesting optimization when I received a PR here.

Consider this code- 

package main

func min2(a, b int) int {
if a < b {
return a
}
return b
}

func min3(a, b, c int) int {
if a < b {
if a < c {
return a
}
} else if b < c {
return b
}
return c
}

var a, b, c int
var d int

func main() {
d = min2(min2(a, b), c)  --- L25
d = min3(a, b, c)  -- L26
}

We are trying to find the min of 3 integers. Both L25 and L26 are doing the same thing, albeit in slightly different manners. First one performs min of 2 integers twice, second one performs them in a single function. One would think that the compiler would be smart enough so that the generated code would be equivalent. Or atleast, not be 14% faster than the other.

On investigating the assembly on L25 and L26, we see interesting results - 

First one - 

0x0021 00033 (funcreg.go:25)    PCDATA  $2, $0
0x0021 00033 (funcreg.go:25)    PCDATA  $0, $0
0x0021 00033 (funcreg.go:25)    MOVQ    "".c(SB), AX
0x0028 00040 (funcreg.go:25)    MOVQ    "".a(SB), CX
0x002f 00047 (funcreg.go:25)    MOVQ    "".b(SB), DX
0x0036 00054 (funcreg.go:25)    CMPQ    CX, DX
0x0039 00057 (funcreg.go:25)    CMOVQLT CX, DX
0x003d 00061 (funcreg.go:25)    CMPQ    DX, AX
0x0040 00064 (funcreg.go:25)    CMOVQLT DX, AX
0x0044 00068 (funcreg.go:25)    MOVQ    AX, "".d(SB)

Next one - 

0x002a 00042 (funcreg.go:26)    MOVQ    "".c(SB), AX
0x0031 00049 (funcreg.go:26)    MOVQ    "".a(SB), CX
0x0038 00056 (funcreg.go:26)    MOVQ    "".b(SB), DX
0x003f 00063 (funcreg.go:26)    CMPQ    CX, DX
0x0042 00066 (funcreg.go:25)    JGE     86
0x0044 00068 (funcreg.go:26)    CMPQ    CX, AX
0x0047 00071 (funcreg.go:26)    JGE     81
0x0049 00073 (funcreg.go:26)    MOVQ    CX, "".d(SB)
0x0050 00080 (funcreg.go:27)    RET
0x0051 00081 (funcreg.go:26)    MOVQ    AX, CX
0x0054 00084 (funcreg.go:26)    JMP     73
0x0056 00086 (funcreg.go:26)    CMPQ    DX, AX
0x0059 00089 (funcreg.go:26)    JGE     81
0x005b 00091 (funcreg.go:26)    MOVQ    DX, CX
0x005e 00094 (funcreg.go:26)    JMP     73

Notice that there is no CMOV generated in the 2nd case.

Only if we change the min3 code to behave like min2 like this - 

func min3(a, b, c int) int {
min := b
if a < b {
min = a
}
if c < min {
min = c
}
return min
}

then we get CMOV generated. But this is only after I realized how the compiler is converting the code to explicitly generate the CMOV instruction.

Is this something possible for the compiler to detect and automatically apply ?

-Agniva

Keith Randall

unread,
Sep 16, 2018, 11:08:38 AM9/16/18
to agniva.qu...@gmail.com, golang-dev, Giovanni Bajo
Sure, open an issue.
The tricky part about such optimizations is deciding when to apply them.  It's easy to make code worse...


--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Giovanni Bajo

unread,
Sep 16, 2018, 2:48:52 PM9/16/18
to agniva.qu...@gmail.com, golan...@googlegroups.com
The branchelim pass does some pattern matching on basic CFG patterns (see the comment at the top in branchelim.go). Your second pattern is not currently recognized. As Keith says, the complex thing is not writing the code to detect the pattern, but rather tune the heuristics to make sure that we're not actually slowing down code.

Giovanni
--
Giovanni Bajo   ::  ra...@develer.com
Develer S.r.l.  ::  http://www.develer.com

Tel: +39-055-3984627 (ext: 206)

Agniva De Sarker

unread,
Sep 17, 2018, 12:21:03 AM9/17/18
to Giovanni Bajo, golan...@googlegroups.com
Got it. Thanks for the clarification !

toc...@gmail.com

unread,
Sep 17, 2018, 3:23:35 PM9/17/18
to golang-dev
IIRC branchelim doesn't convert any returns. 
E. g. 

func f(a, b int) int {
        if a < b {
                return a
        }
        return b
}

Generates:
v16    00006 (4)       MOVQ    "".a(SP), AX
 v10    00007 (4)       MOVQ    "".b+8(SP), CX
 v15    00008 (4)       CMPQ    AX, CX
 b1     00009 (4)       JGE     12
 v14    00010 (5)       MOVQ    AX, "".~r2+16(SP)
 b2     00011 (5)       RET
 v18    00012 (7)       MOVQ    CX, "".~r2+16(SP)
 b3     00013 (7)       RET

Agniva De Sarker

unread,
Sep 19, 2018, 3:41:21 AM9/19/18
to golang-dev
Yes, the code for the function does not change. But if you call the function, then it gets inlined at the call site and at that point CMOVs are inserted. I don't know why though.

Philip Hofer

unread,
Sep 20, 2018, 12:53:29 PM9/20/18
to golang-dev
Author of the original branchelim code here.

The reason the example with returns doesn't get optimized is because the compiler doesn't do return block unification.
Keith wrote a patch once to implement it, but it was never merged.

The optimization has no trouble rewriting that function once it is inlined because the target of the 'return' branch
is simply another basic block in the function. (In some sense, inlining has to implement return block unification.)

Agniva De Sarker

unread,
Sep 20, 2018, 1:53:02 PM9/20/18
to golang-dev
Filed https://github.com/golang/go/issues/27780. Let us continue further discussion there.
Reply all
Reply to author
Forward
0 new messages