insane idea to eliminate CGO latency


Jason E. Aten

Mar 14, 2021, 1:57:12 AM
to golang-nuts
I was noodling about how to minimize the cost of crossing the CGO barrier from Go code into C code and back.

Then I thought, what if I look at this the other way around.

Instead of teaching the Go compiler how to better run C code, what if a C compiler (e.g. clang) was taught to generate code that used the Go stack and calling conventions.

Theoretically, the cc output "in Go convention" could be linked with Go code without paying the CGO penalty, no?

How crazy is this? :)

ma...@eliasnaur.com

Mar 14, 2021, 5:39:16 AM
to golang-nuts
I'm no authority here, but I believe a large (major?) part of the Cgo overhead is caused by scheduling overhead. As I understand it, a C function call is non-preemptible and the Go runtime doesn't know whether the call will block. I believe there have been proposals to mark Cgo calls as "non-blocking" or similar, in effect promising the runtime that the call will return quickly enough to elide scheduling. The problem is that if a "non-blocking" C call turns out to block, your program may deadlock.

I also think that it's been mentioned by authorities (probably Ian) that there are still gains to be made from optimizing the existing trampolines and calling convention adapters.

Elias




Wojciech S. Czarnecki

Mar 14, 2021, 7:30:51 AM
to golan...@googlegroups.com
On Sat, 13 Mar 2021 22:57:11 -0800 (PST)
"Jason E. Aten" <j.e....@gmail.com> wrote:

> I was noodling about how to minimize the cost of crossing the CGO barrier from Go code into C code and back.

I personally regard this cost as godsend ;) It incentivises us to rewrite C codebases into proper Go.

> Instead of teaching the Go compiler how to better run C code, what if a C
> compiler (e.g. clang) was taught to generate code that used the Go stack
> and calling conventions.

Someone would need to do it, then maintain it.

> Theoretically, the cc output "in Go convention" could be linked with Go
> code without paying the CGO penalty, no?

Not that easy. Elias Naur already said why.

> How crazy is this? :)

There are more "crazy" solutions that actually work:

E.g. if you really need to hook into battle-tested C code that needs no further maintenance, you may try https://github.com/minio/c2goasm
It lets you bootstrap fast, then you may aim at a proper rewrite.

Caveats: https://github.com/golang/go/issues/40724 https://github.com/golang/proposal/blob/master/design/27539-internal-abi.md

Hope this helps,

--
Wojciech S. Czarnecki
<< ^oo^ >> OHIR-RIPE

Elias Naur

Mar 14, 2021, 7:57:13 AM
to Wojciech S. Czarnecki, golan...@googlegroups.com
On Sun Mar 14, 2021 at 12:30, Wojciech S. Czarnecki wrote:
> On Sat, 13 Mar 2021 22:57:11 -0800 (PST)
> "Jason E. Aten" <j.e....@gmail.com> wrote:
>
> > I was noodling about how to minimize the cost of crossing the CGO barrier from Go code into C code and back.
>
> I personally regard this cost as godsend ;) It incentivises us to rewrite C codebases into proper Go.
>

Rewriting (and maintaining!) mature and well-maintained libraries is impractical.
Rewriting platform-bound APIs (OpenGL, Cocoa) is impossible.

> > How crazy is this? :)
>
> There are more "crazy" solutions that actually work:
>
> Eg. if you really need to hook at battle-tested C code that needs no further maintenance, you may try https://github.com/minio/c2goasm
> It lets you bootstrap fast, then you may aim at a proper rewrite.
>

A rewrite is one thing, maintenance is another. Do you have the
resources to match those poured into SQLite or Harfbuzz?

Elias

Jan Mercl

Mar 14, 2021, 8:04:34 AM
to Elias Naur, Wojciech S. Czarnecki, golang-nuts
On Sun, Mar 14, 2021 at 12:57 PM Elias Naur <ma...@eliasnaur.com> wrote:

> > Eg. if you really need to hook at battle-tested C code that needs no further maintenance, you may try https://github.com/minio/c2goasm
> > It lets you bootstrap fast, then you may aim at a proper rewrite.
> >
>
> A rewrite is one thing, maintenance is another. Do you have the
> resources to match those poured into SQLite or Harfbuzz?

Harfbuzz is C++ so until someone helps with converting it into pure C
it's off limits to ccgo. But it can handle SQLite already:
https://pkg.go.dev/modernc.org/sqlite. Passing more than 900k Tcl
tests that SQLite includes.

Elias Naur

Mar 14, 2021, 8:27:53 AM
to Jan Mercl, Wojciech S. Czarnecki, golang-nuts
Indeed, I didn't mean to ignore ccgo. My point was aimed at the "then
you may aim at a proper rewrite" part which seems to argue that
rewriting is always the better approach.

Elias

Sebastien Binet

Mar 14, 2021, 9:29:42 AM
to ma...@eliasnaur.com, 0xj...@gmail.com, oh...@fairbe.org, golan...@googlegroups.com
There's something to be said for the notion of being able to better inter-op with C code bases, though.
If we had a C compiler written in Go (like modernc.org/cc perhaps), integrated with the Go compiler (yes, that's an additional maintenance issue), that would spit out Go-compatible ASM, then we could have a better inter-op story and retain a nice cross-compilation feature (assuming that C compiler could cross-compile as the Go one does).

A bit like what the Zig compiler can do.

In that scheme, one wouldn't have to maintain both a C and a Go code base. ("Just" the C compiler one).

-s
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/C9X2NM4AAJ4C.3UHAPTBAIMZQY%40themachine.

Wojciech S. Czarnecki

Mar 14, 2021, 9:35:50 AM
to Elias Naur, golang-nuts
On Sun, 14 Mar 2021 12:56:43 +0100
"Elias Naur" <ma...@eliasnaur.com> wrote:

> > It incentivises us to rewrite C codebases into proper Go.

> Rewriting (and maintaining!) mature and well-maintained libraries is impractical.
> Rewriting platform-bound APIs (OpenGL, Cocoa) is impossible.
> A rewrite is one thing, maintenance is another. Do you have the
> resources to match those poured into SQLite or Harfbuzz?

Where a rewrite is impractical, the cost of a CGO call (and of writing a C/Go wrapper for things like Harfbuzz) is likely offset by the amount of CPU work done on the C side, or by the amount of human work not done. Where it nears impossible, we are using wrappers and will keep using them :) Nor did I advocate rewriting everything, but for small to mid-sized things it can be done, and it pays off almost immediately with lower entry and usage costs (e.g. https://www.gonum.org/).

> Elias

Jason E. Aten

Mar 14, 2021, 12:37:13 PM
to golang-nuts
> I'm no authority here, but I believe a large (major?) part of the Cgo overhead is caused by scheduling overhead. As I understand it, a C function call is non-preemptible and the Go runtime doesn't know whether the call will block.

But that part would be handled by the C-compiler-that-knows-Go inserting the pre-emption points just like the Go compiler does into the generated code. Or the same checks for blocking.

Robert Engels

Mar 14, 2021, 3:01:05 PM
to Jason E. Aten, golang-nuts
Based on two decades of Java FFI: the overhead comes from type mapping, not the housekeeping to control GC. The latter can be as simple as a volatile read and 2 writes per call, and can usually be coalesced in tight loops. Since Go already has easy native C type mapping, the FFI should be very efficient, depending on the types used.

On Mar 14, 2021, at 11:37 AM, Jason E. Aten <j.e....@gmail.com> wrote:

> > I'm no authority here, but I believe a large (major?) part of the Cgo overhead is caused by scheduling overhead. As I understand it, a C function call is non-preemptible and the Go runtime don't know whether the call will block.
>
> But that part would be handled by the C-compiler-that-knows-Go inserting the pre-emption points just like the Go compiler does into the generated code. Or the same checks for blocking.


Ian Lance Taylor

Mar 14, 2021, 4:04:57 PM
to Robert Engels, Jason E. Aten, golang-nuts
On Sun, Mar 14, 2021 at 12:00 PM Robert Engels <ren...@ix.netcom.com> wrote:
>
> Based on two decades of Java FFI - the overhead comes from type mapping not the housekeeping to control GC. The latter can be as simple as a volatile read and 2 writes per call and can usually be coalesced in tight loops. Since Go already has easy native C type mapping the FFi should be very efficient depending on types used.

Go and Java are pretty different here. The type mapping overhead from
Go to C is effectively non-existent--or, to put it another way, it's
pushed entirely onto the programmer. The GC housekeeping is, as you
say, low. The heaviest cost is the scheduling housekeeping: notifying
the scheduler that the goroutine is entering a new scheduling regime,
so that a blocking call in C does not block the entire program. A
minor cost is the change in the calling convention.

As Jason says, if all of the C code--and I really do mean all--can be
compiled by a Go-aware C compiler, then the scheduling overhead can be
largely eliminated, and pushed into the system call interface much as
is done for Go code. But that is a heavy lift. Compiling only some
of the C code with a Go-aware C compiler seems unlikely to provide any
significant benefit.

Ian



> On Mar 14, 2021, at 11:37 AM, Jason E. Aten <j.e....@gmail.com> wrote:
>
> > I'm no authority here, but I believe a large (major?) part of the Cgo overhead is caused by scheduling overhead. As I understand it, a C function call is non-preemptible and the Go runtime don't know whether the call will block.
>
> But that part would be handled by the C-compiler-that-knows-Go inserting the pre-emption points just like the Go compiler does into the generated code. Or the same checks for blocking.

Robert Engels

Mar 14, 2021, 4:46:46 PM
to Ian Lance Taylor, Jason E. Aten, golang-nuts
That was my point: based on Java, it is possible to make the GC coordination extremely efficient, a read and two writes per complete Go-to-C call round trip, and this can often be eliminated in tight loops.

So if the scheduling is the source of inefficiency, there are simpler ways to tackle it than this proposal.


Ian Lance Taylor

Mar 14, 2021, 10:38:11 PM
to Robert Engels, Jason E. Aten, golang-nuts
On Sun, Mar 14, 2021 at 1:46 PM Robert Engels <ren...@ix.netcom.com> wrote:
>
> That was my point, based on Java, there is the ability to make the GC coordination extremely efficient a read and two writes per Go to C complete call trip - and this can often be eliminated in tight loops.

I don't mean to drag out the conversation but I'm not sure I
understand the point. I think you were the first person to mention GC
coordination. I don't think there is any GC coordination issue here.
There is a scheduler coordination issue, specifically the need to
inform Go's goroutine scheduler that the goroutine is changing
behavior.

Ian

Robert Engels

Mar 14, 2021, 10:54:43 PM
to Ian Lance Taylor, Jason E. Aten, golang-nuts
True. I was collapsing the two because why does Go care? If the routine is in a C native call, don’t switch the routine assigned to the thread. Similarly, if the thread is in C native it can’t affect stacks / heap structures, so routines that make C calls only need to ensure a C minimum stack size. The state I was referring to supports the determination of “is running native” and, if so, “leave it alone” until it returns to Go code. As long as the pointers passed to the C code are either native (non heap) or tracked, the C code is “safe”.

So to that point, it’s confusing as to why the scheduler is the bottleneck in calling C code.


Jason E. Aten

Mar 14, 2021, 10:58:30 PM
to golang-nuts
Interesting. I have no idea how the GC would interact with "Go-aware" C code.

I suppose the hardest thing about the new compiler would be to emulate what Go does at blocking system calls to not actually block the whole process. (Notice somehow, and start a new thread; not sure if this is still true, I think I read it years ago). 

drc...@google.com

Mar 15, 2021, 1:28:16 PM
to golang-nuts
Go "cares" because in Go it's common for a single OS thread to correspond to 25-100% of runnable goroutines.

So the accounting for "how many OS threads are available to run goroutines" tends to be fine-grained,
otherwise weird failure-to-schedule bugs can occur. It's likely it could be improved, but it's not at all easy,
especially if we factor in the problem of testing it, and any weird performance effects on other Go programs.



Ian Lance Taylor

Mar 15, 2021, 1:58:38 PM
to Robert Engels, Jason E. Aten, golang-nuts
On Sun, Mar 14, 2021 at 7:54 PM Robert Engels <ren...@ix.netcom.com> wrote:
>
> True. I was collapsing the two because why does Go care. If the routine is in a C native call don’t switch the routine assigned to the thread. Similarly. If the thread is in C native it can’t affect stacks / heap structures - so routines that make C calls only need to ensure a C minimum stack size. The state I was referring to supports the determination of “is running native” and if so “leave it alone” until it returns to Go code. As long as the pointers passed to the C code are either native (non heap) or tracked the C code is “safe”.
>
> So to that point, it’s confusing as to why the scheduler is the bottleneck in calling C code.

Go uses a cooperative goroutine scheduler; even the signal based
preemption that we use now amounts to a mechanism for telling the
goroutine to cooperate. A goroutine that is running C code is not
cooperating with the scheduler. If the scheduling code is not aware
of that, it is easy for a blocking C function to block the entire
program, even if there are other runnable goroutines.

I think it is too strong to say that the scheduler is the "bottleneck"
in calling C code, but I believe that the operations required to tell
the scheduler what is happening to the goroutine are the most costly
parts of a call into C code.

I should clarify that I am not saying that this is some inherent
problem that can't be fixed. I'm saying that this is true in today's
implementation. If we want to speed up calls to C code--and, of
course, we do--then we should be looking at reducing this overhead of
communicating with the scheduler. We should not be looking at the
different calling convention or the change of stacks, because those,
while not entirely free, are not the heaviest cost in the current
implementation.

Ian

Jason E. Aten

Mar 15, 2021, 3:16:52 PM
to golang-nuts
On Monday, March 15, 2021 at 12:58:38 PM UTC-5 Ian Lance Taylor wrote:
> I think it is too strong to say that the scheduler is the "bottleneck"
> in calling C code, but I believe that the operations required to tell
> the scheduler what is happening to the goroutine are the most costly
> parts of a call into C code.

Thanks Ian! I looked through https://github.com/golang/go/tree/master/src/cmd/cgo but couldn't
locate where the CGO communication with the scheduler happens. Could you point out the code?

 

Ian Lance Taylor

Mar 15, 2021, 3:24:07 PM
to Jason E. Aten, golang-nuts
It is in runtime.cgocall, notably the calls to entersyscall and
exitsyscall. Also pay attention to runtime.cgocallbackg, which is
invoked when calling back from C to Go. Both functions are in
src/runtime/cgocall.go. It will help to review the long comment at
the start of that file.
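
For readers who want the shape of that bracketing before opening the file, here is a toy model in plain Go. It is only a sketch: the type toyG, its state strings, and these method bodies are invented for illustration and are not the runtime's code. The one thing taken from the description above is the ordering: notify the scheduler, run the foreign function, notify the scheduler again.

```go
package main

import "fmt"

// toyG is a hypothetical stand-in for a goroutine's scheduling state.
// It is not the runtime's g struct; it only records what the scheduler
// has been told about this goroutine.
type toyG struct {
	state string
	trace []string
}

// entersyscall models telling the scheduler that this goroutine is about
// to leave Go code and may block without cooperating.
func (g *toyG) entersyscall() {
	g.state = "in foreign call"
	g.trace = append(g.trace, "entersyscall")
}

// exitsyscall models telling the scheduler that the goroutine is back in
// Go code and can be scheduled cooperatively again.
func (g *toyG) exitsyscall() {
	g.state = "running Go"
	g.trace = append(g.trace, "exitsyscall")
}

// cgocall brackets the foreign function with the two notifications.
// In this model those notifications are free; in the thread above they
// are described as the most costly part of a real call into C.
func (g *toyG) cgocall(fn func() int) int {
	g.entersyscall()
	g.trace = append(g.trace, "foreign function runs")
	ret := fn()
	g.exitsyscall()
	return ret
}

func main() {
	g := &toyG{state: "running Go"}
	fmt.Println(g.cgocall(func() int { return 42 }))
	fmt.Println(g.trace)
}
```

The model makes one point only: the foreign function itself is untouched, and all of the added work sits in the two calls around it.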

Ian

thepudds

Mar 15, 2021, 3:57:14 PM
to Ian Lance Taylor, Jason E. Aten, golang-nuts
Hello fellow gophers,

Here is a helpful link that gives an overview of some of what impacts cgo performance; the slides include pointers into the code for anyone interested in going deeper:


https://speakerdeck.com/filosottile/why-cgo-is-slow-at-capitalgo-2018 


That was a 2018 presentation “Why cgo is Slow” from Filippo Valsorda from the core Go team. (To my knowledge, I don’t know that there is a video of that talk, but I’d be curious if anyone has a pointer to a video, including even shaky handheld mobile video). 


And here are some quick pointers to some older related issues:


https://github.com/golang/go/issues/42469


https://github.com/golang/go/issues/16051


https://github.com/golang/go/issues/9704


If anyone is feeling curious, benchmarking performance of cgo across Go releases could be helpful to spot any slowdowns. For example, someone could run this trivial benchmark across recent releases:


https://github.com/golang/go/issues/9704#issuecomment-498812185


Or pick something from here to run across releases:


https://github.com/golang/go/issues/42469#issuecomment-746947396


Or some other benchmark across releases. 
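
If it helps anyone get started, below is a minimal harness for that kind of comparison. It is a sketch under assumptions, not one of the benchmarks linked above: nsPerCall and goAdd are names made up here, and the example times only a plain Go call as a baseline. To measure cgo, you would pass nsPerCall a closure that calls a C function through import "C" in your own package, then build and run the same program with each Go release.

```go
package main

import (
	"fmt"
	"testing"
)

// nsPerCall reports the average wall-clock cost of one call to f, using
// the standard library's benchmark driver outside of "go test".
func nsPerCall(f func()) float64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			f()
		}
	})
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

// goAdd is a hypothetical baseline: a trivial Go function kept out of line
// so that the measurement includes an actual call.
//
//go:noinline
func goAdd(a, b int) int { return a + b }

// sink keeps the result live so the call is not optimized away.
var sink int

func main() {
	base := nsPerCall(func() { sink = goAdd(sink, 1) })
	fmt.Printf("plain Go call: %.2f ns/op\n", base)
	// Assumption: a cgo variant would look like
	//   nsPerCall(func() { C.add(1, 2) })
	// with the C function declared in the cgo preamble of this file.
}
```

The numbers include the cost of calling the closure, so they are only meaningful relative to each other, for example the cgo figure against the plain Go figure on the same release.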


On the scheduler front, I would be curious about this older comment from Ian:


———

“In Go 1.8 when a goroutine calls into C, it is still holding a GOMAXPROCS slot and blocking other goroutines from running. Only if it is running in C for more than 20 microseconds or so will the system monitor thread decide that it is blocked in C code and activate another goroutine. The fact that your system performs better when you increase GOMAXPROCS makes me suspect that there is something to improve in that area of the code.”

———


.... where that comment was later used as part of an explanation FAQ on why dqlite moved from Go to C (https://dqlite.io/docs/faq), and whether or not that explanation is currently accurate:


———

“The first prototype implementation of dqlite was in Go, leveraging the hashicorp/raft implementation of the Raft algorithm. The project was later rewritten entirely in C because of performance problems due to the way Go interoperates with C: Go considers a function call into C that lasts more than ~20 microseconds as a blocking system call, in that case, it will put the goroutine running that C call in waiting queue and resuming it will effectively cause a context switch, degrading performance (since there were a lot of them happening).”

———


Regards,

thepudds 



Robert Engels

Mar 15, 2021, 3:57:35 PM
to Ian Lance Taylor, Jason E. Aten, golang-nuts
Totally agree, and that was my point. If the desire is to speed up calls to C there are many options: explicit (like marking calls non-blocking), or implicit, where once a routine makes an “unknown” C native call that routine is always bound to a dedicated thread. Clearly you are trading performance for other resources in this case.

I was trying to communicate that if you don’t have to worry about type mapping you have some simpler options available than a new C compiler :)
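
The dedicated-thread idea can be approximated in user code today with runtime.LockOSThread, which is a real standard-library call. The sketch below is a user-level analogue only, not a change to how the runtime treats C calls, and the names call and startDedicated are invented for it. Plain Go closures stand in for the foreign calls. It does not remove the per-call scheduler notifications discussed in this thread, and it adds a channel hop per call, so it shows the shape of the resource trade rather than a speedup.

```go
package main

import (
	"fmt"
	"runtime"
)

// call is one unit of work to run on the dedicated thread.
type call struct {
	fn   func() int
	done chan int
}

// startDedicated launches a goroutine pinned to a single OS thread and
// runs every submitted function on it. It returns a submit function, which
// blocks until the work has run, and a stop function that shuts the
// worker down.
func startDedicated() (submit func(func() int) int, stop func()) {
	calls := make(chan call)
	go func() {
		// Pin this goroutine to its OS thread for its whole lifetime.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for c := range calls {
			c.done <- c.fn()
		}
	}()
	submit = func(fn func() int) int {
		c := call{fn: fn, done: make(chan int)}
		calls <- c
		return <-c.done
	}
	stop = func() { close(calls) }
	return submit, stop
}

func main() {
	submit, stop := startDedicated()
	defer stop()
	// In real use the closure would wrap a call into C.
	fmt.Println(submit(func() int { return 6 * 7 }))
}
```

One OS thread is held for as long as the worker lives, which is the "other resources" side of the trade.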


Andy Balholm

Mar 15, 2021, 4:58:35 PM
to Jason E. Aten, golang-nuts

By the way, this existed at one point. Early versions of the Go toolchain included C compilers (6c, 8c, etc.) designed to work together nicely with Go code. If I remember right, most of the Go runtime was written in C, and compiled with these compilers. But they used an unusual dialect of C (which came from Plan 9) instead of ANSI C, so they couldn't compile most C libraries.

When the Go runtime was translated from C to Go, these compilers were dropped.

If you wanted to revive them and make them ANSI compliant, you would need to write a new libc that calls into the Go standard library for its system calls—because C code compiled with this compiler would not be able to call into the system libc without CGo!

Andy


Robert Engels

Mar 15, 2021, 6:48:34 PM
to Andy Balholm, Jason E. Aten, golang-nuts
I think it is more of using a specialized compiler on unmodified C code and expecting it to work. 
