What is the long term cost of closures in Golang

Alexander Shopov

unread,

Mar 8, 2026, 1:05:52 PMMar 8

to golang-nuts

Hi all,

Closures have some cost and we are advised to not use them in code
that strives to be performant.
However the quick benchmark I did showed this is not so currently but
I suppose the benchmark is way too trivial.

What optimisation will the official golang compiler try to do and what
is the long term plan for this?

AFAIK there are two types of costs for closures:
1. Some data may need to be allocated in heap and will be GC collected
rather than stack allocated and autorelease.
2. Typically inner functions are assigned to a variable. So at the
moment the body is called via the variable, the runtime needs to check
what the variable points to.

Note that both of these can be optimized away depending on things such
as the definition, call and size of function body, number of free
variables.

In the absolute trivial benchmark I created - it seems all calls are
practically the same (difference slowest to fastest is 2%). However I
am not sure how well this corresponds to typical behavior.

- When will a closure call be inlined?
- How long can the body be or how many free variables can be for the
compiler to optimize this?
- What can break the optimization?

Kind regards:
al_shopov

Benchmark here: https://go.dev/play/p/2Z7cg0tJVAe
Results:

BenchmarkNoclosurecall-12 92 12593210 ns/op
BenchmarkClosurecall-12 91 12327950 ns/op
BenchmarkClosurecallimmediate-12 92 12359107 ns/op

Jason E. Aten

unread,

Mar 8, 2026, 4:42:14 PMMar 8

to golang-nuts

> we are advised to not use them in code that strives to be performant

We are also advised to always profile before guessing about performance. I might

advise that this second piece of advice should almost always be given more weight.

:)

Best wishes,

Jason

Ugorji Nwoke

unread,

Mar 8, 2026, 8:42:47 PMMar 8

to golang-nuts

> However the quick benchmark I did showed this is not so currently but I suppose the benchmark is way too trivial.

Seems OP did. Just answer and help him if you can.

Brian Candler

unread,

Mar 9, 2026, 5:28:50 AMMar 9

to golang-nuts

Microbenchmarks are often unrepresentative of real world behaviour. The OP's microbenchmarks found no significant difference, but I think they are very poor examples.

For one, a good compiler might have optimised out the whole loops - although this appears not to have happened here. For another, these are compute-bound functions - each function takes about 12ms to run - so do not measure the cost of call and return from a closure versus a "normal" function. But even with step=int32(500_000_000) I see no difference. More importantly, these functions are so trivial that they almost certainly run entirely in registers. The initial "d := step" line copies the closure variable to a local variable, and I'd say that's highly likely to use a register. However again, changing the loops to use "step" directly doesn't appear to make a difference.

You can use godbolt if you want to look at the compiled code, which will answer your questions about inlining.

Whether the cost is significant in a particular real-world application is a very different, and much more relevant question; that will be specific to the OP's real-world problem. The only general feedback I can offer is: "I've not heard anybody complain here about the cost of closures in Go".

The OP has also identified that if there is a problem, it is likely to be related to garbage collection. Go's GC is continuously improving, and is not a problem for many real-world workloads. If their application really has a critical bottleneck in this area, then maybe a non-GC language would be more suitable. But to me, this sounds like a severe case of premature optimisation.

Jason E. Aten

unread,

Mar 9, 2026, 6:21:23 AMMar 9

to golang-nuts

Brian's words are wise. My point was its almost impossible to generalize.

But to provide a specific workflow to analyze your actual production code in place, in practice:

a) write a Benchmark that focuses on your area of interest in your actual code.

b) compile the test binary, "go test -c". You do this so pprof can show you the disassembly in step e) below.

c) run the benchmark with -benchtime=10s and -cpuprofile cpu.prof

For example, from my run

two minutes ago, the full benchmark run line was:

go test -v -tags memfs -run=xxx -bench Benchmark_Iter_YogaDB_Ascend -benchtime=10s -cpuprofile cpu.prof

d) open pprof; it will show you the flame graph; look for the longest horizontal bar, click it, then click it again

once it expands; then select View -> Source from the upper left menu. It will show you the source

code with the time spent on each hot line, out of the 10 seconds that you ran.

go tool pprof -http :7777 yogadb.test cpu.prof

It will look something like this:

e) if need be, choose View -> Disassembly instead of View -> Source

f) highlight the entire section that contains the hottest line, such as line 1252 above,

g) paste the text into your favorite LLM and ask it for ideas to help you optimize that code.

It will give you amazingly good ideas 90% of the time. try them one by one, running

your benchmark after each one and observing if the timing improved. Sometimes you

need to redirect it to just try to eliminate function calls, or to manually inline hot parts

of functions to avoid function calls. My ability to read assembly -- horrible. Does

not matter because the LLM speaks it fluently. Leverage that.

There is an example of what this process can do here, in this next link. It took a 340 nsec hot iteration path down to

currently less than 10 nsec, which is about an L3 cache load on my 2020 vintage Intel CPU.

https://pkg.go.dev/github.com/glycerine/yogadb#hdr-The_Iterator_Optimization_History_

Finally, move over to Linux and run "perf" to analyze and optimze your L1 hit rate. Something like:

sudo perf stat -e L1-dcache-loads,L1-dcache-load-misses ./drwmutex-bench -strat 1

Enjoy the process.

- Jason

p.s. there is a nice video that illustrates optimization in Go, here, in case some of the steps were unclear,

from Prashant V: "Profiling and Optimizing Go"

https://www.youtube.com/watch?v=N3PWzBeLX2M

slides: https://docs.google.com/presentation/d/1n6bse0JifemG7yve0Bb0ZAC-IWhTQjCNAclblnn2ANY/present?slide=id.g3a3e2af65_029

demo code https://github.com/prashantv/go_profiling_talk

Alexander Shopov

unread,

Mar 11, 2026, 6:11:42 AMMar 11

to golang-nuts

Thanx for all the responses but they are not what I need.

What I want is the mental model to look at the cost of closures rather
than solving a particular performance problem.
Pointers to the specific ways closures are implemented plus the basics
of usual go compiler optimisations (gc rather than gcc) would be more
helpful.
Tickets or outlines for future plans for optimisation would be nice too.

A benchmark does not give me that. Benchmarks can at most do
comparison of two or more particular code bases when run with a
particular toolchain (and on a particular processor).

The whole reason I had any benchmark in the initial question is to
prove I am willing to walk the walk when it comes to comparing
implementations.

Jason, thanx for bringing up the work of Prashant who is a former
colleague though this particular presentation is a bit dated as it was
done 9 years ago.

Note that it also does measurements at runtime - so the code has
already been written and deployed to production.
I am not expecting magic bullets but rules of thumb. I agree that when
you need exact results and actually doing the optimisation work you
need to find the hot spots and then do benchmarks.

I also don't buy the argument that any efforts for such rules of thumb
would be premature optimisations.
We tend to gravitate to using O(ln(x)) algorithms rather than O(n2)
even at the moment we write the initial code.

Anyway - I get I'll need to do more homework and delve in:
- src/cmd/compile/internal/walk/closure.go
- src/cmd/compile/internal/inline/inl.go
- src/cmd/compile/internal/escape/expr.go

Kind regards:
al_shopov

Ian Lance Taylor

unread,

Mar 12, 2026, 6:21:36 PMMar 12

to Alexander Shopov, golang-nuts

On Wed, Mar 11, 2026 at 3:11 AM Alexander Shopov <a...@kambanaria.org> wrote:
>
> What I want is the mental model to look at the cost of closures rather
> than solving a particular performance problem.
> Pointers to the specific ways closures are implemented plus the basics
> of usual go compiler optimisations (gc rather than gcc) would be more
> helpful.

https://go.dev/s/go11func

Ian

Alexander Shopov

unread,

Mar 13, 2026, 5:26:08 AMMar 13

to Ian Lance Taylor, golang-nuts

Thank you, Ian.
Will check it out and compare with the current state.
Kind regards:
al_shopov

Reply all

Reply to author

Forward