Why does for-range behave differently depending on the slice's struct size?

Yulrizka

Apr 24, 2020, 8:26:08 AM
to golang-nuts
Hello,


I was playing around with this code 

main_var.go
package main

func main() {
    const size = 1000000

    slice := make([]SomeStruct, size)
    for _, s := range slice { // line 7
        _ = s
    }
}


type_small.go
package main

type SomeStruct struct {
    ID0 int64
    ID1 int64
    ID2 int64
    ID3 int64
    ID4 int64
    ID5 int64
    ID6 int64
    ID7 int64
    ID8 int64
}


I noticed that if I add another int64 field, `ID9`, to the struct (10 * 8 bytes = 80 bytes in total), the for-loop becomes slower.
And if I compare the assembly, the compiler adds an instruction to copy the element:

// with 9 int64 (72 bytes)
0x001d 00029 (main_var.go:6)    LEAQ    type."".SomeStruct(SB), AX
0x0024 00036 (main_var.go:6)    MOVQ    AX, (SP)
0x0028 00040 (main_var.go:6)    MOVQ    $1000000, 8(SP)
0x0031 00049 (main_var.go:6)    MOVQ    $1000000, 16(SP)
0x003a 00058 (main_var.go:6)    CALL    runtime.makeslice(SB)
0x003f 00063 (main_var.go:6)    XORL    AX, AX
0x0041 00065 (main_var.go:7)    INCQ    AX
0x0044 00068 (main_var.go:7)    CMPQ    AX, $1000000
0x004a 00074 (main_var.go:7)    JLT    65
0x004c 00076 (main_var.go:7)    MOVQ    32(SP), BP
0x0051 00081 (main_var.go:7)    ADDQ    $40, SP
0x0055 00085 (main_var.go:7)    RET
0x0056 00086 (main_var.go:7)    NOP
0x0056 00086 (main_var.go:3)    CALL    runtime.morestack_noctxt(SB)
0x005b 00091 (main_var.go:3)    JMP    0

// with 10 int64 (80 bytes), it added a DUFFCOPY instruction
0x001d 00029 (main_var.go:6)    LEAQ    type."".SomeStruct(SB), AX
0x0024 00036 (main_var.go:6)    MOVQ    AX, (SP)
0x0028 00040 (main_var.go:6)    MOVQ    $1000000, 8(SP)
0x0031 00049 (main_var.go:6)    MOVQ    $1000000, 16(SP)
0x003a 00058 (main_var.go:6)    CALL    runtime.makeslice(SB)
0x003f 00063 (main_var.go:6)    MOVQ    24(SP), AX
0x0044 00068 (main_var.go:6)    XORL    CX, CX
0x0046 00070 (main_var.go:7)    JMP    76
0x0048 00072 (main_var.go:7)    ADDQ    $80, AX
0x004c 00076 (main_var.go:7)    LEAQ    ""..autotmp_7+32(SP), DI
0x0051 00081 (main_var.go:7)    MOVQ    AX, SI
0x0054 00084 (main_var.go:7)    DUFFCOPY    $826 # <-- copy the element
0x0067 00103 (main_var.go:7)    INCQ    CX
0x006a 00106 (main_var.go:7)    CMPQ    CX, $1000000
0x0071 00113 (main_var.go:7)    JLT    72
0x0073 00115 (main_var.go:7)    MOVQ    112(SP), BP
0x0078 00120 (main_var.go:7)    ADDQ    $120, SP
0x007c 00124 (main_var.go:7)    RET
0x007d 00125 (main_var.go:7)    NOP
0x007d 00125 (main_var.go:3)    CALL    runtime.morestack_noctxt(SB)
0x0082 00130 (main_var.go:3)    JMP    0


I am just curious why the behavior is different for the larger struct (80 bytes), even though in both cases the slice element is not being used.
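
One way to measure the slowdown outside of `go test` is testing.Benchmark. The following is a rough sketch under stated assumptions, not the code used for the observation above: small9, large10, rangeSmall, rangeLarge and sink are hypothetical names, and the loops count iterations so that they have a result to check. Timings will vary with the Go version and hardware, and a newer compiler may no longer show any gap.

```go
package main

import (
	"fmt"
	"testing"
)

// Hypothetical stand-ins for the 72-byte and 80-byte versions of SomeStruct.
type small9 struct {
	ID0, ID1, ID2, ID3, ID4, ID5, ID6, ID7, ID8 int64
}

type large10 struct {
	ID0, ID1, ID2, ID3, ID4, ID5, ID6, ID7, ID8, ID9 int64
}

const size = 1000000

// sink keeps the benchmarked calls from being discarded as unused.
var sink int

// rangeSmall ranges by value without using the element and
// returns the number of iterations.
func rangeSmall(slice []small9) int {
	n := 0
	for _, s := range slice {
		_ = s
		n++
	}
	return n
}

// rangeLarge is the same loop over the 80-byte element type.
func rangeLarge(slice []large10) int {
	n := 0
	for _, s := range slice {
		_ = s
		n++
	}
	return n
}

func main() {
	small := make([]small9, size)
	large := make([]large10, size)

	rs := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink += rangeSmall(small)
		}
	})
	rl := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink += rangeLarge(large)
		}
	})

	fmt.Println("72-byte elements:", rs)
	fmt.Println("80-byte elements:", rl)
}
```
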

Thank you

Tamás Gulácsi

Apr 24, 2020, 10:25:51 AM
to golang-nuts
After grepping the sources for DUFFCOPY, ./cmd/compile/internal/ssa/rewriteAMD64.go suggests that it is a DUFFCOPY before SSA rewrites it to something faster, and that this is a size-dependent operation.

But I may be totally wrong...

Yulrizka

Apr 24, 2020, 4:31:25 PM
to golang-nuts
Most definitely interesting. https://golang.org/src/cmd/compile/internal/ssa/rewriteAMD64.go#L54447
Am I correct to assume that DUFFCOPY is an optimization for MOVE?

If so, I wonder why the first example did not generate any MOV instruction at all.

Ian Lance Taylor

Apr 24, 2020, 4:44:26 PM
to Yulrizka, golang-nuts
On Fri, Apr 24, 2020 at 1:32 PM Yulrizka <yulr...@gmail.com> wrote:
>
> Most definitely interesting. https://golang.org/src/cmd/compile/internal/ssa/rewriteAMD64.go#L54447
> Am I correct to assume that DUFFCOPY is an optimization for MOVE?
>
> If so, I wonder why the first example did not generate any MOV instruction at all.

In the first case, there was a move, but it was discarded in the SSA
pass because it was dead.

I would guess that, in the second case, the SSA pass is not smart enough
to see that the call to the builtin function (the DUFFCOPY) is dead, so
it doesn't remove it.

If I'm right, it's basically a missing optimization in the compiler.
But it doesn't seem like a very important one; not many people write
range loops that name the value but don't use it.

Ian