gc: optimize JMP to RET instructions

Arseny Samoylov

unread,
Aug 13, 2024, 1:10:58 PM8/13/24
to golang-nuts
Hello community, I recently found that gc generates a lot of JMP-to-RET instructions and that there is no optimization for this pattern. Consider this example:

```
// asm_arm64.s

#include "textflag.h"

TEXT ·jmp_to_ret(SB), NOSPLIT, $0-0
    JMP ret
ret:
    RET
```

This compiles to:

```
TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
  asm_arm64.s:4         0x77530                 14000001                JMP 1(PC)
  asm_arm64.s:6         0x77534                 d65f03c0                RET
```

Obviously, this can be optimized to a single RET instruction.

So I made a patch that replaces a JMP to a RET with a RET instruction (at the Prog representation level):

```
diff --git a/src/cmd/internal/obj/pass.go b/src/cmd/internal/obj/pass.go
index 066b779539..87f1121641 100644
--- a/src/cmd/internal/obj/pass.go
+++ b/src/cmd/internal/obj/pass.go
@@ -174,8 +174,16 @@ func linkpatch(ctxt *Link, sym *LSym, newprog ProgAlloc) {
                        continue
                }
                p.To.SetTarget(brloop(p.To.Target()))
-               if p.To.Target() != nil && p.To.Type == TYPE_BRANCH {
-                       p.To.Offset = p.To.Target().Pc
+               if p.To.Target() != nil {
+                       if p.As == AJMP && p.To.Target().As == ARET {
+                               p.As = ARET
+                               p.To = p.To.Target().To
+                               continue
+                       }
+
+                       if p.To.Type == TYPE_BRANCH {
+                               p.To.Offset = p.To.Target().Pc
+                       }
                }
        }
 }
```

You can find this patch on my GitHub.


I encountered a few problems:

* Increase in code size, because a RET instruction can translate into multiple machine instructions (ldp, add, and ret on arm64, for example). The .text section of a simple Go program that calls the function above grows by 0x3D0 bytes; the .text section of the go binary itself grows by 0x2570 bytes (almost 10 KB). (These numbers are for arm64 binaries.)

* The optimization happens too late at the Prog representation level, and the example above translates to:

```
TEXT main.jmp_to_ret.abi0(SB) asm_arm64.s
  asm_arm64.s:4         0x77900                 d65f03c0                RET
  asm_arm64.s:6         0x77904                 d65f03c0                RET
```

(no dead-code elimination was performed =( )


So I am looking for ideas. Maybe this optimization should be done on SSA form, with some heuristics to avoid the increase in code size.

I would also like suggestions on which benchmarks to run for this optimization. The bent benchmark suite takes too long =(.
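For a quicker signal than a full bent run, a self-contained microbenchmark around a call-heavy loop might help. This is only a hypothetical sketch (the `countEqual` workload and data shape are my assumptions, not a real benchmark from the toolchain); `testing.Benchmark` runs a closure with an auto-scaled `b.N`, so no `go test` harness is needed to compare a patched vs. unpatched toolchain:

```go
package main

import (
	"fmt"
	"testing"
)

// countEqual is a hypothetical call-heavy workload: each string
// comparison goes through runtime.memequal, whose caller contains a
// JMP-to-RET tail pattern (see the runtime.strequal disassembly below).
func countEqual(haystack []string, needle string) int {
	n := 0
	for _, s := range haystack {
		if s == needle {
			n++
		}
	}
	return n
}

func main() {
	data := make([]string, 1024)
	for i := range data {
		data[i] = fmt.Sprintf("key-%04d-padding-padding", i%7)
	}

	// testing.Benchmark picks b.N automatically; run the same binary
	// built with both toolchains and compare ns/op.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			countEqual(data, "key-0003-padding-padding")
		}
	})
	fmt.Println(res)
}
```

The numbers will vary by machine, and a difference this small may be lost in noise, so pinning the CPU frequency and running several iterations of each build would be advisable.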


PS: an example of a JMP to RET from the runtime:

```
TEXT runtime.strequal(SB) a/go/src/runtime/alg.go
  alg.go:378            0x12eac                 14000004                JMP 4(PC) // JMP to RET in Prog
  alg.go:378            0x12eb0                 f9400000                MOVD (R0), R0
  alg.go:378            0x12eb4                 f9400021                MOVD (R1), R1
  alg.go:378            0x12eb8                 97fffc72                CALL runtime.memequal(SB)
  alg.go:378            0x12ebc                 a97ffbfd                LDP -8(RSP), (R29, R30)
  alg.go:378            0x12ec0                 9100c3ff                ADD $48, RSP, RSP
  alg.go:378            0x12ec4                 d65f03c0                RET
...
```

Keith Randall

unread,
Aug 13, 2024, 8:59:55 PM8/13/24
to golang-nuts
We generally don't do optimizations like that directly on assembly. In fact, we used to do some like that, but they have been removed.
We want the generated machine code to faithfully mirror the assembly input. People writing assembly have all kinds of reasons for laying out instructions in particular ways (better for various caches, etc.) that we don't want to disrupt.

If the Go compiler is generating such a pattern, we can optimize that. There's some discussion here https://github.com/golang/go/issues/24936 but nothing substantive came of it. It would need benchmarks demonstrating it is worth it, and concerns about debuggability (can you set a breakpoint on each return in the source?) also matter.

> Ps: example of JMP to RET from runtime:

That is a JMP to the LDP instruction, not directly to the RET.

Arseny Samoylov

unread,
Aug 14, 2024, 12:31:55 PM8/14/24
to golang-nuts
Thank you for your answer!

> We generally don't do optimizations like that directly on assembly.
I definitely agree. But this pattern also appears in compiler-generated code.

> and concerns about debuggability (can you set a breakpoint on each return in the source?) also matter
This is an interesting problem that I haven't thought about, thank you!

> That is a JMP to the LDP instruction, not directly to the RET.
Yes, but at the Prog representation level it is a JMP to a RET. I mentioned this when I pointed out the code-size problem (RET translates to multiple instructions).

>  There's some discussion here https://github.com/golang/go/issues/24936
I am grateful for the link to the discussion. In it, you mentioned your abandoned CL, which actually does the opposite of my optimization =).

>  It would need benchmarks demonstrating it is worth it
Can you please suggest some benchmarks? I tried bent, but I would like to test on some others as well.

Thank you in advance!

robert engels

unread,
Aug 14, 2024, 12:40:22 PM8/14/24
to Arseny Samoylov, golang-nuts
Won’t the speculative/parallel execution by most processors make the JMP essentially a no-op?


Arseny Samoylov

unread,
Aug 14, 2024, 12:46:26 PM8/14/24
to golang-nuts
> Won’t the speculative/parallel execution by most processors make the JMP essentially a no-op?
I guess you are right, but this only holds when the JMP destination is already in the instruction buffer. Most of these cases are a JMP to a RET within the same function, so there the optimization would indeed have almost zero effect. But if the RET instruction is far enough away, the optimization could be meaningful.

robert engels

unread,
Aug 14, 2024, 1:21:01 PM8/14/24
to Arseny Samoylov, golang-nuts
My understanding is that optimizations like this are almost never worth it on modern processors: the increased code size works against the branch predictor and speculative execution, whereas with a single shared piece of code there are fewer possibilities, and thus fewer instructions to preload.

Arseny Samoylov

unread,
Aug 15, 2024, 4:00:59 AM8/15/24
to golang-nuts
I guess you are right.
Thank you very much for the discussion!