What is the overhead of calling a C function from Go?


adam_smith

Nov 30, 2009, 5:23:05 PM
to golang-nuts
Can C functions be called as efficiently from Go as native Go
functions? E.g. in C#, as I understand it, there is some overhead in
calling native code because you have to switch from a managed
environment to an unmanaged environment (whatever that entails).

Could you replace Go functions with C implementations and expect
performance gains, or will calling out to foreign functions limit
performance optimization by the compiler (e.g. by reducing the
possibility of inlining)?

Adam Langley

Nov 30, 2009, 5:26:14 PM
to adam_smith, golang-nuts
On Mon, Nov 30, 2009 at 2:23 PM, adam_smith <erik.e...@gmail.com> wrote:
> Can C functions be called as efficiently from Go as native Go
> functions? E.g. in C#, as I understand it, there is some overhead in
> calling native code because you have to switch from a managed
> environment to an unmanaged environment (whatever that entails).

Calling C functions via cgo involves a thread-switch, which is
reasonably expensive.

Calling C functions that are compiled with 6c (and family), like the
runtime functions, is the same as calling a Go function.


AGL

Ostsol

Nov 30, 2009, 8:37:55 PM
to golang-nuts
Does that mean that calling a C function via a C wrapper written in
the cgo file is less expensive than calling the C function directly?
For example:

//#include <stdio.h>
//#include <stdlib.h>
//int GoPrintf(const char* str) { return printf("%s", str); }
import "C"

import "unsafe"

func PrintStuff(stuff string) {
	str := C.CString(stuff)
	defer C.free(unsafe.Pointer(str)) // C.CString allocates with malloc
	C.GoPrintf(str)
}

-Ostsol


Adam Langley

Nov 30, 2009, 8:45:32 PM
to Ostsol, golang-nuts
On Mon, Nov 30, 2009 at 5:37 PM, Ostsol <ost...@gmail.com> wrote:
> Does that mean that calling a C function via a C wrapper written in
> the cgo file is less expensive than calling the C function directly?

No. Any time the process is running code with the native ABI, it needs
to do the context switch.


AGL

Russ Cox

Dec 1, 2009, 3:59:06 AM
to adam_smith, golang-nuts
There's always going to be some overhead.
It's more expensive than a simple function call but
significantly less expensive than a context switch
(agl is remembering an earlier implementation;
we cut out the thread switch before the public release).
Right now the expense is basically just having to
do a full register set switch (no kernel involvement).
I'd guess it's comparable to ten function calls.

You could always write some programs to measure
it and let us know what you find. ;-)

Russ

Charlie

Dec 1, 2009, 6:30:07 AM
to golang-nuts
Both functions just return

Averaged time over 1,000,000 iterations
Go subroutine execution time (ns): 10.585000
Extern C execution time (ns): 161.750000

for loop factored out of Go subroutine
for loop + wrapping go func factored out of Extern C (assumed to be
same as cost of calling simplefunc)
Using nanosecond (valid to microsecond)

Linux (Fedora 11 x64), amd be2300, 6x
Note: not seeing context switching while running
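
The measurement method described above (average over many iterations, with the loop cost factored out) can be sketched roughly as follows. This is not Charlie's program, which was not posted; `perCall` and `simpleFunc` are hypothetical names, and the sketch times only a pure-Go call so that it runs without a C toolchain. To time the cgo path you would pass in a Go wrapper around the C function instead.

```go
package main

import (
	"fmt"
	"time"
)

// simpleFunc just returns, like the functions in the measurement above.
//
//go:noinline
func simpleFunc() {}

// perCall returns the average wall-clock time of one call to f over n
// iterations, after subtracting the time of an empty loop of the same length.
func perCall(n int, f func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
	}
	loop := time.Since(start)

	start = time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	total := time.Since(start)

	if total < loop {
		return 0 // timer noise; treat as unmeasurably small
	}
	return (total - loop) / time.Duration(n)
}

func main() {
	const n = 1000000
	fmt.Println("Go call:", perCall(n, simpleFunc))
	// Assumption: in a file that imports "C" and declares simplefunc in its
	// preamble, the cgo path would be timed with
	//     perCall(n, func() { C.simplefunc() })
}
```

Because `f` is called through a function value, both paths pay the same indirect-call cost, so the difference between the two results is what matters, not either absolute number.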

adam_smith

Dec 1, 2009, 9:43:05 AM
to golang-nuts
I got to say I am a bit confused by what has been said in this thread.
At the moment it is not clear to me whether Adam Langley was right
about the thread switch. And whether Charlie's test was with thread
switch or not. Adam says that when using 6c compiler it is cheap but
not with cgo. But the cgo documentation says it produces two files one
for 6c and one for gcc (not that I understand the difference. I
thought a C file was a C file).

Further it is not clear to me the difference between regular C
functions compiled with 6c and those made in the same style as the
runtime C functions. If I understand correctly the runtime C functions
use a number of structs (e.g. String) which correspond to Go types.
But I am not sure what the implications of this are. Are the runtime
functions compiled / linked in another way? Could I basically get low
overhead on calling C functions if I wrote the C code like they are
written in the runtime? Following the same conventions and using the
structs representing standard Go types? Would that avoid a thread
switch?

I am sorry if these are a lot of stupid questions; I just can't find a
single place where all these differences are explained properly. At the
moment my hunch is that this is a bit like the difference between
calling through an FFI and registering functions directly with the
runtime. In Mono, for example, you can use an FFI to call C code, but you
could also register a C function with the runtime if it followed certain
conventions (function signature and types of objects in the argument
list).

Russ Cox

Dec 1, 2009, 11:59:15 AM
to adam_smith, golang-nuts
On Tue, Dec 1, 2009 at 06:43, adam_smith <erik.e...@gmail.com> wrote:
> I got to say I am a bit confused by what has been said in this thread.
> At the moment it is not clear to me whether Adam Langley was right
> about the thread switch. And whether Charlie's test was with thread
> switch or not.

Adam Langley was remembering an earlier version of the system.
In my reply I said:

(agl is remembering an earlier implementation;
we cut out the thread switch before the public release).

(I wrote agl instead of Adam there because your mails
say "adam_smith" and I was trying to avoid confusion.)

> Adam says that when using 6c compiler it is cheap but
> not with cgo. But the cgo documentation says it produces two files one
> for 6c and one for gcc (not that I understand the difference. I
> thought a C file was a C file).

The 6c file is just glue: all the work happens in the gcc-compiled
file. The difference is that code has different calling conventions
depending on which compiler gets invoked. When writing cgo
extensions you're trying to call other C libraries that have been
built with gcc, so you need to compile your C code with gcc.
Cgo provides the bridge between the two different worlds.

> Further it is not clear to me the difference between regular C
> functions compiled with 6c and those made in the same style as the
> runtime C functions. If I understand correctly the runtime C functions
> use a number of structs (e.g. String) which correspond to Go types.
> But I am not sure what the implications of this are. Are the runtime
> functions compiled / linked in another way? Could I basically get low
> overhead on calling C functions if I wrote the C code like they are
> written in the runtime? Following the same conventions and using the
> structs representing standard Go types? Would that avoid a thread
> switch?

It would avoid a few register operations. Unless you need to
be able to do very high frequency calls, it's probably easier
to use cgo than to make your code too familiar with the
runtime data structures.

> I am sorry if these are a lot of stupid questions; I just can't find a
> single place where all these differences are explained properly. At the
> moment my hunch is that this is a bit like the difference between
> calling through an FFI and registering functions directly with the
> runtime. In Mono, for example, you can use an FFI to call C code, but you
> could also register a C function with the runtime if it followed certain
> conventions (function signature and types of objects in the argument
> list).

That's about right. One complicating factor in Go is that if
you want to call any code that has been compiled with gcc
(e.g., any C library already installed on your system) you
have to use cgo.

Russ

Russ Cox

Dec 1, 2009, 5:13:26 PM
to inspector_jouve, golang-nuts
On Tue, Dec 1, 2009 at 13:36, inspector_jouve <kaush...@gmail.com> wrote:
> Any suggestions about assembler?

The first rule of writing assembly is don't.

> If I write a .s file, can it be compiled and linked in the regular way?
> What is the overhead in this case?

It's an ordinary function call.

> Are platform-optimized versions of low-level functions welcome?

Assembly is really only appropriate if two things are true:

1. There is a fundamental reason you can't get the
same performance out of portable Go.
2. The performance difference, in real programs,
is significant (say, 2x or more).

(More generally, added complexity always has to pay for itself.)

The math package has an assembly square root, because
otherwise you can't get at the hardware implementation,
and there's a big difference between the software and
hardware versions. Switching to assembly made an
actual program doing a reasonable computation
(test/bench/nbody) 3x faster.
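
To make the software-versus-hardware contrast concrete, here is a rough sketch of what a software square root can look like, using Newton's method. This is an illustration only: `softSqrt` is a hypothetical helper, it is not the routine the math package used, and it makes no attempt to reproduce the 3x figure. Whether `math.Sqrt` maps to a single hardware instruction depends on the platform and Go version.

```go
package main

import (
	"fmt"
	"math"
)

// softSqrt approximates the square root of x with Newton's method,
// using only ordinary floating-point arithmetic.
func softSqrt(x float64) float64 {
	if x < 0 {
		return math.NaN()
	}
	if x == 0 {
		return 0
	}
	// Start at or above the root so the iteration decreases towards it.
	z := x
	if z < 1 {
		z = 1
	}
	// Generous iteration cap so unusual inputs cannot loop forever.
	for i := 0; i < 1100; i++ {
		next := (z + x/z) / 2
		if math.Abs(next-z) <= 1e-15*z {
			return next
		}
		z = next
	}
	return z
}

func main() {
	for _, x := range []float64{2, 9, 1e6} {
		fmt.Println(x, softSqrt(x), math.Sqrt(x))
	}
}
```

Each Newton step costs a division, an addition and a comparison, and several steps are needed per result, which is the kind of work a dedicated instruction avoids.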

The big package has assembly versions of the basic
bignum operations, because otherwise you can't get
at the special hardware instructions (like double-word
multiplications and division), and the software equivalent
slows down real programs using bignums. When we added
386 assembly to bignum, it made RSA operations 7x faster.

> But, for example, string conversions? I played with Itoa method (just
> to get a sense of performance) - I can see it can be easily optimized
> by about 15% just by writing specific version for uints in go (by
> simply cloning existing code with "uints" instead of uint64). A lot
> more can be done with assembler, of course. Question is whether it
> should be done at all or not.

I think strconv fails on both criteria: you can do perfectly well
enough in Go, and well-written real world programs aren't
bottlenecked by itoa speed.

Russ
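
For readers wondering what the "specific version for uints" mentioned in the quoted question might look like, here is a minimal sketch. `itoa32` is a hypothetical helper, not part of strconv, and the 15% figure from the thread is not verified here; the point is only that the specialization is a few lines of portable Go with no assembly involved.

```go
package main

import (
	"fmt"
	"strconv"
)

// itoa32 formats u in base 10 using 32-bit arithmetic only.
// Hypothetical helper for illustration; not part of strconv.
func itoa32(u uint32) string {
	var buf [10]byte // the largest uint32, 4294967295, has 10 digits
	i := len(buf)
	for u >= 10 {
		i--
		buf[i] = byte('0' + u%10)
		u /= 10
	}
	i--
	buf[i] = byte('0' + u)
	return string(buf[i:])
}

func main() {
	for _, u := range []uint32{0, 7, 1234, 4294967295} {
		// Compare against the general-purpose library routine.
		fmt.Println(itoa32(u), strconv.FormatUint(uint64(u), 10))
	}
}
```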

Bob Cunningham

Dec 1, 2009, 8:24:37 PM
to golang-nuts
On 12/01/2009 02:13 PM, Russ Cox wrote:
> On Tue, Dec 1, 2009 at 13:36, inspector_jouve<kaush...@gmail.com> wrote:
>> Any suggestions about assembler?
>
> The first rule of writing assembly is don't.

Except, of course, when there is no alternative!

> ...
> Assembly is really only appropriate if two things are true:
>
> 1. There is a fundamental reason you can't get the
> same performance out of portable Go.
> 2. The performance difference, in real programs,
> is significant (say, 2x or more).

3. You need access to instructions that cannot
be emitted by the compiler (common in deeply
embedded programming). The Go compilers
don't (yet) generate SSE and MMX instructions.

> (More generally, added complexity always has to pay for itself.)
>...
>> But, for example, string conversions? I played with Itoa method (just
>> to get a sense of performance) - I can see it can be easily optimized
>> by about 15% just by writing specific version for uints in go (by
>> simply cloning existing code with "uints" instead of uint64). A lot
>> more can be done with assembler, of course. Question is whether it
>> should be done at all or not.
>
> I think strconv fails on both criteria: you can do perfectly well
> enough in Go, and well-written real world programs aren't
> bottlenecked by itoa speed.

Then again, it wasn't that long ago that CPUs contained dedicated string conversion instructions. Some mainframes still do!

-BobC

Dave Cheney

Feb 27, 2013, 11:25:56 PM
to c...@chrisbolton.me, golan...@googlegroups.com, erik.e...@gmail.com
Yes, there will be some overhead. 

On 28/02/2013, at 15:15, c...@chrisbolton.me wrote:

> Would creating bindings to a physics engine via cgo see any significant decrease in performance?

Philipp Schumann

Feb 28, 2013, 12:14:50 AM
to golan...@googlegroups.com, erik.e...@gmail.com
Do it anyway! Would love to see one. Thinking of Bullet?

No matter what overhead there is, it cannot be as "bad" as rewriting it in Go, or writing your game in C/C++ would be...

In a simple OpenGL+GLFW graphics loop I have, there are some 400-500 CGO calls per frame right now. While I'm certainly going to optimize that down to a fraction of this (not doing any proper batching yet, for instance), whatever tiny overhead there is doesn't seem to bottleneck me. In a 4ms frame, on average 3.9ms are currently spent GPU-side drawing (glfw.SwapBuffers). I haven't properly timed only the CGO "overhead" but it must be minuscule -- here's why:

Look at the stats posted above over 3 years ago:

Averaged time over 1,000,000 iterations 
Go subroutine execution time (ns): 10.585000 
Extern C execution time (ns): 161.750000

So say the budget for physics is 1 or 2 ms. Since 1 million cgo calls cost 160ns (on 3-years-ago hardware with a much older Go/CGo implementation), that would be well within the budget.

And if the physics engine required 1 million calls per frame, it would be a terrible API design...


On Thursday, February 28, 2013 12:15:31 PM UTC+8, Chris Bolton wrote:
> would creating bindings to a physics engine via cgo see any significant decrease in performance?

Chris Bolton

Feb 28, 2013, 12:51:45 AM
to golan...@googlegroups.com, erik.e...@gmail.com
Good to hear! And thinking of chipmunk actually :P.

Mikael Gustavsson

Mar 1, 2013, 7:28:22 AM
to golan...@googlegroups.com, erik.e...@gmail.com
I'm pretty sure 160ns is the time per call.

Philipp Schumann

Mar 1, 2013, 10:29:15 PM
to golan...@googlegroups.com, erik.e...@gmail.com
Well right now I do avg 396 cgo calls in avg 0.329ms.

That would be a shocking 830ns but I'm not really benchmarking here --- I'm doing some fair amounts of Go stuff in those 0.329ms and the CGO functions do some "actual work" (GPU driver / OS windowing) rather than being noops.

Not sure if that would be OK for a physics binding; it depends on the API, on how much data is being calculated, etc.

I highly doubt a 160ns vs. 10ns "overhead" would be the bottleneck anyway, again depending on how the API works. Ideally one should be able to batch and minimize calls, but not sure if Chipmunk or Bullet support that as well as OpenGL does...
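
The arithmetic in the last few messages is easy to check. The sketch below takes the 2009 figure as a per-call cost, as suggested above, and uses only numbers quoted in the thread; the helper names `perCallNs` and `callsWithin` are made up for this example.

```go
package main

import "fmt"

// perCallNs returns the average cost of one call in nanoseconds,
// given a total elapsed time in nanoseconds and a call count.
func perCallNs(totalNs float64, calls int) float64 {
	return totalNs / float64(calls)
}

// callsWithin returns how many calls of the given cost fit into a budget.
func callsWithin(budgetNs, costNs float64) int {
	return int(budgetNs / costNs)
}

func main() {
	const cgoNs = 161.75 // the 2009 measurement, read as time per call

	// One million calls at that cost, in milliseconds.
	fmt.Printf("1e6 calls: %.2f ms\n", 1e6*cgoNs/1e6)
	// Calls that fit into a 1 ms physics budget.
	fmt.Println("calls in 1 ms:", callsWithin(1e6, cgoNs))
	// 500 calls per frame, in microseconds.
	fmt.Printf("500 calls: %.1f µs\n", 500*cgoNs/1e3)
	// The observed frame: 396 calls in 0.329 ms.
	fmt.Printf("observed: %.0f ns per call\n", perCallNs(0.329e6, 396))
}
```

Read this way, a million calls would take roughly 162 ms, far over a 1 or 2 ms budget, while a few hundred calls per frame cost well under a tenth of a millisecond.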

minux

Mar 2, 2013, 11:27:44 AM
to Robert Zaremba, golan...@googlegroups.com, adam_smith, r...@golang.org

On Sat, Mar 2, 2013 at 5:42 PM, Robert Zaremba <robert....@zoho.com> wrote:
> And what happens when I'm using gccgo compiler?
> Do functions from the go compiled code and C libraries have the same calling conventions, thus the overhead is zero?

the overhead for translating between calling conventions and switching stacks is small;
the major part of cgo overhead comes from coordination with the goroutine scheduler
(for example, the scheduler might need to create a new OS thread to run other ready
goroutines).

thus, even with gccgo, the overhead should be roughly the same.

Erwin

Mar 2, 2013, 12:13:47 PM
to minux, Robert Zaremba, golan...@googlegroups.com, adam_smith, r...@golang.org

> the overhead for translating between calling convention and switch stacks is small,
> the major part of cgo overhead comes from coordination with the goroutine scheduler.
> (for example, the scheduler might need to create new OS thread to run other ready
> goroutines)

has it been considered to allow a fast path for calling C functions?  no idea if it is possible at all, but suppose one could label a C function as allowed to block the goroutine it is called from, so that the C call need not be run in its own thread.  and/or having part of the runtime made so that it forms a zero overhead environment to call C code from?

i agree that with OpenGL and its facilities to reduce the number of library calls, the GO/C overhead isn't worrisome, 
but wouldn't it be nice to be able to cooperate with C libraries very efficiently, even those that require many tiny function calls to do things.
  

bryanturley

Mar 2, 2013, 1:11:45 PM
to golan...@googlegroups.com, minux, Robert Zaremba, adam_smith, r...@golang.org
I think the problem is calling standard c on a goroutine stack. You need to switch to a c stack first.

Dave Cheney

Mar 2, 2013, 3:53:18 PM
to Erwin, minux, Robert Zaremba, golan...@googlegroups.com, adam_smith, r...@golang.org
You have to switch stacks during a cgo crosscall as there is no way to know the stack requirements of the C function you are calling. 

Dave

Maxim Khitrov

Mar 2, 2013, 4:02:45 PM
to minux, Robert Zaremba, golan...@googlegroups.com, adam_smith, r...@golang.org
On Sat, Mar 2, 2013 at 11:27 AM, minux <minu...@gmail.com> wrote:
> the major part of cgo overhead comes from coordination with the goroutine
> scheduler.
> (for example, the scheduler might need to create new OS thread to run other
> ready goroutines)

I don't really understand why this is necessary. The current scheduler
is non-preemptive, so even in a pure Go program one goroutine can
prevent others from running by not executing any channel operations,
system calls, ... not sure what else may cause a switch. Why not do
the absolute minimum work necessary for a C call? If there are no
threads to run other goroutines, then that's a problem that should be
addressed via runtime.GOMAXPROCS. Am I missing something?

- Max

bryanturley

Mar 2, 2013, 4:08:36 PM
to golan...@googlegroups.com, minux, Robert Zaremba, adam_smith, r...@golang.org

From the FAQ

"To make the stacks small, Go's run-time uses segmented stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time allocates (and frees) extension segments automatically. The overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number."

Go's segmented stacks != standard C stacks
The C code won't know how to grow the stack if it needs more than "a few kilobytes".

I think the current OS thread that the goroutine is running on has to be locked while you jump in and out of c code for signals as well.
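
The FAQ's claim that very large numbers of goroutines are practical can be tried directly. The sketch below is an illustration under assumptions: `spawn` is a hypothetical helper and 100,000 is an arbitrary count. Note that Go releases from 1.3 onwards replaced segmented stacks with contiguous stacks that are copied when they grow, but goroutines still start with a small stack, which is the property being shown.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawn starts n goroutines that all block until released, records how many
// goroutines exist at that moment, then lets each one report its index.
// It returns the observed goroutine count and the sum of the indices.
func spawn(n int) (peak int, sum int64) {
	var wg sync.WaitGroup
	release := make(chan struct{})
	results := make(chan int64, n)

	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			<-release // stay alive until every goroutine has been created
			results <- int64(i)
		}(i)
	}

	peak = runtime.NumGoroutine()
	close(release)
	wg.Wait()
	close(results)

	for v := range results {
		sum += v
	}
	return peak, sum
}

func main() {
	peak, sum := spawn(100000)
	fmt.Println("goroutines alive at once:", peak)
	fmt.Println("sum of indices:", sum)
}
```

None of these goroutines calls into C, so none of them needs the larger, conventionally managed stack that a cgo call switches to.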

Maxim Khitrov

Mar 2, 2013, 4:14:40 PM
to bryanturley, golan...@googlegroups.com, minux, Robert Zaremba, adam_smith, r...@golang.org
I understand that the stack needs to be switched, but minux implied
that this isn't where most of the overhead comes from. The call to a C
function may also spawn a new thread, which would be used for running
other goroutines. Erwin and I are just wondering whether this is
really necessary.

- Max

minux

Mar 2, 2013, 4:37:17 PM
to Maxim Khitrov, bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
On Sun, Mar 3, 2013 at 5:14 AM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> I understand that the stack needs to be switched, but minux implied
> that this isn't where most of the overhead comes from. The call to a C
> function may also spawn a new thread, which would be used for running
> other goroutines. Erwin and I are just wondering whether this is
> really necessary.

basically, once a goroutine enters cgo, it's considered blocking, so it is not counted
against the $GOMAXPROCS limit, and the goroutine scheduler might need to create
a new OS thread to host other ready goroutines.

see also:

Erwin

Mar 2, 2013, 5:38:23 PM
to minux, Maxim Khitrov, bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
that's an interesting read.  a significant performance boost after russ applied the diffs that remove parts of the cgo work that need not be done when the c functions are known to return quickly and don't call back into go.  again, what if one could tag such c functions, and have a fast path for them in cgo?  seems doable?

Maxim Khitrov

Mar 2, 2013, 6:11:51 PM
to minux, bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org, Erwin
I tried to test the impact of removing entersyscall()/exitsyscall()
around asmcgocall(fn, arg) in cgocall.c (Russ's first patch), but
all.bat gets stuck when building runtime/cgo. No errors or any other
messages, it just sits there not doing anything.

To get past this, I added a new function to the runtime package that
can disable these calls at run time. The diff is below (for go 1.0.3).
I then ran a quick test that just called a C function, which didn't do
any real work:

http://play.golang.org/p/0Nc6Dlj6lU

Here are the results:

Before the patch: 820.0469ms
After the patch (syscall on): 841.0481ms
After the patch (syscall off): 326.0186ms

That's a pretty big difference. I don't think Russ's second patch can
be applied in the general case, since callbacks have to be supported,
but I would seriously consider either disabling
entersyscall()/exitsyscall() by default, or providing a function like
the one I added in the runtime package for disabling these calls at
run time.

The scheduler interaction makes sense for system calls, but I don't
think this behavior is appropriate when executing any and all C
functions.

- Max

diff -r 2d8bc3c94ecb src/pkg/runtime/cgocall.c
--- a/src/pkg/runtime/cgocall.c Fri Sep 21 17:10:44 2012 -0500
+++ b/src/pkg/runtime/cgocall.c Sat Mar 02 18:05:01 2013 -0500
@@ -84,6 +84,8 @@

void *initcgo; /* filled in by dynamic linker when Cgo is available */

+static bool cgosyscall = true;
+
static void unlockm(void);
static void unwindm(void);

@@ -131,9 +133,11 @@
* so it is safe to call while "in a system call", outside
* the $GOMAXPROCS accounting.
*/
- runtime·entersyscall();
+ if (cgosyscall)
+ runtime·entersyscall();
runtime·asmcgocall(fn, arg);
- runtime·exitsyscall();
+ if (cgosyscall)
+ runtime·exitsyscall();

if(d.nofree) {
if(g->defer != &d || d.fn != (byte*)unlockm)
@@ -151,6 +155,12 @@
}

void
+runtime·CgoAsSyscall(bool enable)
+{
+ cgosyscall = enable;
+}
+
+void
runtime·NumCgoCall(int64 ret)
{
M *m;
diff -r 2d8bc3c94ecb src/pkg/runtime/debug.go
--- a/src/pkg/runtime/debug.go Fri Sep 21 17:10:44 2012 -0500
+++ b/src/pkg/runtime/debug.go Sat Mar 02 18:05:01 2013 -0500
@@ -32,6 +32,9 @@
// NumGoroutine returns the number of goroutines that currently exist.
func NumGoroutine() int

+// CgoAsSyscall determines whether cgo calls are treated the same as syscalls.
+func CgoAsSyscall(enable bool)
+
// MemProfileRate controls the fraction of memory allocations
// that are recorded and reported in the memory profile.
// The profiler aims to sample an average of

Maxim Khitrov

Mar 2, 2013, 6:20:54 PM
to minux, bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org, Erwin
Forgot to mention that calling the regular Go test() function takes
35.002ms in my example (with inlining disabled), so cgo is still 9x
slower, but that's a lot better than 24x.

- Max
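
The 9x and 24x figures follow from the timings posted above; a quick check, using only those numbers, is sketched below. The iteration count of the test program is not given in the thread, so the per-call cost is not derived here, and `slowdown` is a name made up for this example.

```go
package main

import "fmt"

// slowdown returns how many times longer a run took than the baseline.
func slowdown(ms, baselineMs float64) float64 {
	return ms / baselineMs
}

func main() {
	const goMs = 35.002 // plain Go call, inlining disabled

	runs := []struct {
		name string
		ms   float64
	}{
		{"before the patch", 820.0469},
		{"patched, syscall on", 841.0481},
		{"patched, syscall off", 326.0186},
	}
	for _, r := range runs {
		fmt.Printf("%-22s %5.1fx\n", r.name, slowdown(r.ms, goMs))
	}
}
```

The ratios come out at roughly 23x, 24x and 9x, so skipping entersyscall/exitsyscall removes a little over 60% of the measured time in this particular test.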

Archos

Mar 2, 2013, 6:54:33 PM
to golan...@googlegroups.com, minux, bryanturley, Robert Zaremba, adam_smith, r...@golang.org, Erwin

On Saturday, March 2, 2013 23:20:54 UTC, Maxim Khitrov wrote:
> http://play.golang.org/p/0Nc6Dlj6lU
>
> Here are the results:
>
> Before the patch:              820.0469ms
> After the patch (syscall on):  841.0481ms
> After the patch (syscall off): 326.0186ms
>
> Forgot to mention that calling the regular Go test() function takes
> 35.002ms in my example (with inlining disabled), so cgo is still 9x
> slower, but that's a lot better than 24x.

With those data in mind, I would argue that a Go library is always going to be faster than a binding to a C library, even for maths/graphics libraries, since Go is only about 3x slower than C (http://benchmarksgame.alioth.debian.org/u64/which-programs-are-best.php).

bryanturley

Mar 2, 2013, 7:58:16 PM
to golan...@googlegroups.com, minux, bryanturley, Robert Zaremba, adam_smith, r...@golang.org, Erwin

From that page:

> Selected and weighted 'how-many-times-more compared to the-program-that-used-least' scores are compressed into one number, the weighted geometric mean, at the risk of being "neat, plausible, and wrong".

They are claiming to be at the risk of being "neat, plausible, and wrong".
Therefore they have a risk of both
* plausible: having an appearance of truth or reason
* wrong: deviating from truth or fact; erroneous
And being neat

So...  They might have the appearance of truth (appearances being deceiving), definitely not true, or clean/amusing...
The lack of confidence *the authors* of that data express, leads me to think I should completely ignore that data.


Robert Zaremba

Mar 2, 2013, 8:11:37 PM
to golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
Thanks,
The patch sounds great. I think it is better to move responsibility for thread blocking to the user's C code. If the code does a lot of computation, then the user needs to consider running it in a separate OS thread.

Still didn't get an answer to this: do functions from gccgo-compiled code and gcc-compiled C libraries have the same calling conventions?

bryanturley

Mar 2, 2013, 8:26:48 PM
to golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
On Saturday, March 2, 2013 7:11:37 PM UTC-6, Robert Zaremba wrote:
> Thanks,
> The patch sounds great. I think it is better to move responsibility for thread blocking to the user's C code. If the code does a lot of computation, then the user needs to consider running it in a separate OS thread.


I think it is just simpler to assume the worst case, when you jump into dynamically linked code you never really know what is on the other side.
I try to stay away from cgo with the exception of hardware specific code like opengl.
If you need to have an inner loop calling tons of short c functions perhaps write that inner loop in c as well?
Once you jump into c you are free to jump around c/other without penalty, assuming good code...
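
The advice to move the inner loop across the boundary can be modelled without any C at all. The sketch below is a pure-Go model, not cgo code: the `boundary` type and its methods are hypothetical stand-ins that simply count how many times the Go/C boundary would be crossed under each API shape.

```go
package main

import "fmt"

// boundary models a foreign-function boundary: every call through it
// counts as one crossing, however much work the call carries.
type boundary struct{ crossings int }

// scaleOne stands in for a tiny C function called once per element.
func (b *boundary) scaleOne(x, k float64) float64 {
	b.crossings++
	return x * k
}

// scaleAll stands in for a C function that contains the inner loop itself
// and is handed the whole slice in a single call.
func (b *boundary) scaleAll(xs []float64, k float64) {
	b.crossings++
	for i := range xs {
		xs[i] *= k
	}
}

func main() {
	perElement := []float64{1, 2, 3, 4}
	batched := []float64{1, 2, 3, 4}

	var a, c boundary
	for i := range perElement {
		perElement[i] = a.scaleOne(perElement[i], 2)
	}
	c.scaleAll(batched, 2)

	fmt.Println(perElement, "crossings:", a.crossings)
	fmt.Println(batched, "crossings:", c.crossings)
}
```

Both variants compute the same result, but the per-element version pays the fixed crossing cost once per element while the batched version pays it once per slice.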

> Still didn't get an answer to this: do functions from gccgo-compiled code and gcc-compiled C libraries have the same calling conventions?


No and partially.  C code can't return multiple values for instance.

Ian Lance Taylor

Mar 2, 2013, 8:36:09 PM
to Robert Zaremba, adam_smith, golang-nuts, r...@golang.org


On Mar 2, 2013 5:11 PM, "Robert Zaremba" <robert....@zoho.com> wrote:
>
> Still didn't get an answer to this: do functions from gccgo-compiled code and gcc-compiled C libraries have the same calling conventions?

Yes, they do.  Multiple return values are handled as a struct.

The cgo code for gccgo is simpler (take a look).  But it still has to switch stacks.

Ian
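
As a loose analogy for "multiple return values are handled as a struct", the sketch below writes the same function both ways in Go. It shows only the shape of the idea; the names are hypothetical and nothing here describes the actual struct layout gccgo uses.

```go
package main

import "fmt"

// divmod uses Go's native multiple return values.
func divmod(a, b int) (q, r int) {
	return a / b, a % b
}

// divmodResult bundles both results into one value, which is the form a
// C-style calling convention can return.
type divmodResult struct {
	Q, R int
}

// divmodStruct returns the same information as divmod, packed in a struct.
func divmodStruct(a, b int) divmodResult {
	q, r := divmod(a, b)
	return divmodResult{Q: q, R: r}
}

func main() {
	q, r := divmod(17, 5)
	fmt.Println(q, r)
	fmt.Println(divmodStruct(17, 5))
}
```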

Philipp Schumann

Mar 2, 2013, 8:39:17 PM
to golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
Here's a vote for a new runtime.CgoSyscallOff bool option... who's with me?



> If you need to have an inner loop calling tons of short c functions perhaps
> write that inner loop in c as well?

That sounds... painful to some of us not-so-hardcore Go-lovers-but-C-haters ;)

bryanturley

Mar 2, 2013, 8:52:47 PM
to golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
So in a perfect world of C code running on segmented stacks without thread-local storage, cgo is almost unnecessary?
I do know that is how the C in the runtime is written.

Dmitry Vyukov

Mar 3, 2013, 12:21:07 AM
to Maxim Khitrov, minux, bryanturley, golang-nuts, Robert Zaremba, adam_smith, Russ Cox, Erwin
What hardware are you using? I guess it's an old processor. On newer
processors the difference should be smaller.
Cgo call overhead with and w/o entersyscall is fine for episodic
and/or heavy C functions. And it is big for a C function returning 42
called in a tight loop both with and w/o entersyscall. So the
conclusion seems to be: do not call a C function returning 42 in a
tight loop.

bryanturley

Mar 3, 2013, 12:23:19 AM
to golan...@googlegroups.com, Maxim Khitrov, minux, bryanturley, Robert Zaremba, adam_smith, Russ Cox, Erwin

At least not without a towel.
 

Ian Lance Taylor

Mar 3, 2013, 12:36:43 AM
to bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
Even in that perfect world cgo is currently necessary, because you
need to tell the Go scheduler that you might be about to block in a
way that the Go scheduler does not understand.

When and if the Go scheduler gets smarter, then it is possible to
imagine that cgo would not be necessary for gccgo if you were able to
compile all of your C libraries with -fsplit-stack.

Ian

Archos

Mar 3, 2013, 3:34:10 AM
to golan...@googlegroups.com, minux, bryanturley, Robert Zaremba, adam_smith, r...@golang.org, Erwin
Sure, it is only a reference, but it is useful for getting an idea of the performance of each language there.
In my case, when I have to port a graphics library from Rust to Go (when it can be embedded into other software), I will run tests to check whether a pure Go library performs better than a binding to the Rust one.

minux

Mar 3, 2013, 8:42:31 AM
to Maxim Khitrov, bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org, Erwin
On Sun, Mar 3, 2013 at 7:11 AM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> The scheduler interaction makes sense for system calls, but I don't
> think this behavior is appropriate when executing any and all C
> functions.

if you agree that the scheduler interaction makes sense for syscalls, and
also agree that a foreign C function could make syscalls (the runtime just
won't know), why is the interaction not appropriate when executing a C
function?

in theory, even if you set GOMAXPROCS high enough that there are always
free OS threads to run the goroutines, you still risk blocking the garbage collector
for an indefinite amount of time while code is blocked in the C world (the GC needs
to stop the world).

in summary, removing the scheduler interaction might actually work and
increase performance for you, but we can't make that the default, nor can we
make it optional, as making use of it requires non-trivial prior thought,
and people will tend to abuse such a feature to increase performance without
knowing its consequences.

minux

Mar 3, 2013, 8:45:01 AM
to Ian Lance Taylor, bryanturley, golan...@googlegroups.com, Robert Zaremba, adam_smith, r...@golang.org
On Sun, Mar 3, 2013 at 1:36 PM, Ian Lance Taylor <ia...@golang.org> wrote:
> Even in that perfect world cgo is currently necessary, because you
> need to tell the Go scheduler that you might be about to block in a
> way that the Go scheduler does not understand.
>
> When and if the Go scheduler gets smarter, then it is possible to
> imagine that cgo would not be necessary for gccgo if you were able to
> compile all of your C libraries with -fsplit-stack.

so the real solution to this problem is to adopt Dmitry's syscall/blocking monitor
approach so that the scheduler can know that a goroutine has blocked
an OS thread and that it needs to start new OS threads to host other goroutines.