Calling C function is super slow


Stefano Casillo

Apr 26, 2014, 10:54:22 PM
to golan...@googlegroups.com
Hello,

I am a big fan and supporter of Go and try to evangelize everywhere and anytime I can.
I am using Go to write the race server side for Assetto Corsa (www.assettocorsa.net) with good results, BTW multiplayer (thus the Go race server) is going live next week so wish me luck on that :P .. however...

Trying to explore the use of Go for the client side of a 3D application (game or simulation) I started measuring the cost of calls made to C (tried both standard CGO and syscall on windows) and the results are discouraging to say the least:
I tried an "old style" OpenGL immediate mode benchmark and used a number of calls similar to the one I see in Assetto Corsa.. well, the bad news is that where my control application in C is taking 1ms to issue a couple of clear calls and tiny triangles (to avoid having the GPU influencing the results) the equivalent Go version takes about 9ms! That's more than half of the available time for a 60fps rendering application gone in just overhead for calling into C... a showstopper.
.NET performs much better on the same test taking about 1.42ms . In different tests (not using OpenGL but other calls to C libraries) the new .Net Native is halfway between C and traditional .Net .. Rust and D are on par with C, so 0 cost, but that's understandable considering they don't have a GC and goroutines infrastructure to manage.
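
For reference, the frame-budget arithmetic behind the "more than half" claim can be sketched in a few lines of Go. The timings are the ones quoted in this post; the function names are made up for illustration and the result is only as good as those measurements.

```go
package main

import "fmt"

// frameBudgetMs returns the time available per frame, in milliseconds,
// at the given refresh rate.
func frameBudgetMs(fps float64) float64 {
	return 1000.0 / fps
}

// budgetShare returns the fraction of the frame budget consumed by costMs.
func budgetShare(costMs, fps float64) float64 {
	return costMs / frameBudgetMs(fps)
}

func main() {
	const fps = 60.0
	// Figures quoted in the post: C about 1 ms, .NET about 1.42 ms, Go about 9 ms.
	for _, m := range []struct {
		name string
		ms   float64
	}{{"C", 1.0}, {".NET", 1.42}, {"Go", 9.0}} {
		fmt.Printf("%-5s %5.2f ms = %4.1f%% of a %.2f ms frame\n",
			m.name, m.ms, 100*budgetShare(m.ms, fps), frameBudgetMs(fps))
	}
}
```

At 60 fps the budget is roughly 16.7 ms per frame, so 9 ms of pure call overhead is a bit over half of it.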

This is on Go 1.2.1 windows/386
now to the questions:

1) Is there a margin for Go to get better at this? Or should we just give up on it for any kind of application that needs a high rate of communication with a native API? I am thinking GUI, 3D graphics, physics, audio.
2) I am also wondering if this cost is somehow lower on platforms that traditionally get more love from Go devs, such as Linux and Mac? I don't have a Linux install handy to give it a go right now.
3) I haven't dug into the Go standard library but I suppose there are parts (sockets?) where it needs to call C somehow. Is this overhead hurting Go in other places where perhaps Go devs might be more sensitive?
4) Is it possible I am doing something wrong? How would I go about improving the situation?

Ian Lance Taylor

Apr 26, 2014, 11:07:28 PM
to Stefano Casillo, golang-nuts
On Sat, Apr 26, 2014 at 7:54 PM, Stefano Casillo
<stefano...@gmail.com> wrote:
>
> 1) Is there a margin for Go to get better at this? Or should we just give up
> on it for any kind of application that needs a high rate of communication
> with a native API? I am thinking GUI, 3D graphics, physics, audio.

There is probably some room to get faster, but I don't think there is
very much. If you need to call into a C API often, it's going to work
better if you can batch your calls to reduce the number of times you
have to cross the language boundary.


> 2) I am also wondering if this cost is somehow lower on platforms that
> traditionally get more love from Go devs, such as Linux and Mac? I don't have a
> Linux install handy to give it a go right now.

It's expensive on GNU/Linux also. I don't know if there is a
significant difference. I don't know why there would be one. My rule
of thumb is that a function call from Go to C costs about 10 function
calls, and that seems consistent with what you are seeing.


> 3) I haven't dug into the Go standard library but I suppose there are parts
> (sockets?) where it needs to call C somehow. Is this overhead hurting Go in
> other places where perhaps Go devs might be more sensitive?

The Go standard library only calls into C in two places: to look up
user names in the os/user package, and to resolve network names in the
net package. For things like sockets the Go standard library issues
system calls directly.


> 4) Is it possible I am doing something wrong? How would I go about improving the
> situation?

It is of course possible that you are doing something wrong, but the
slowness of calls from Go to C is a known issue.

Ian

Stefano Casillo

Apr 26, 2014, 11:19:38 PM
to golan...@googlegroups.com, Stefano Casillo

> There is probably some room to get faster, but I don't think there is
> very much. If you need to call into a C API often, it's going to work
> better if you can batch your calls to reduce the number of times you
> have to cross the language boundary.

This is very unfortunate and sad. It basically means Go will never be competitive in any real-world application where performance matters.. it's going to be a step above things like Python but that's it.. all the stories about building a language to replace C++ are basically.. just stories. You can make a language as fast as you want, but if you hit a 10x penalty every single time you have to interact with the OS it's all a waste of time... until we get an OS written in Go :P

> The Go standard library only calls into C in two places: to look up
> user names in the os/user package, and to resolve network names in the
> net package. For things like sockets the Go standard library issues
> system calls directly.

I went looking at fd_windows.go and ya, it's doing a syscall to WSASendTo of course.. so that's a 10x penalty right there, every single time. I am already too depressed so I won't even look into synchronization primitives.. I know I am going to find these 10x all over the place.

> It is of course possible that you are doing something wrong, but the
> slowness of calls from Go to C is a known issue.



thx Ian for the answer.. I now have a good reason to call it a day and get drunk until it all feels right again :P
 

Ian Lance Taylor

Apr 26, 2014, 11:29:39 PM
to Stefano Casillo, golang-nuts
On Sat, Apr 26, 2014 at 8:19 PM, Stefano Casillo
<stefano...@gmail.com> wrote:
>>
>> There is probably some room to get faster, but I don't think there is
>> very much. If you need to call into a C API often, it's going to work
>> better if you can batch your calls to reduce the number of times you
>> have to cross the language boundary.
>>
>
> this is very unfortunate and sad. It basically means Go will never be
> competitive in any real-world application where performance matters.. it's
> going to be a step above things like Python but that's it.. all the stories
> about building a language to replace C++ are basically.. just stories. You
> can make a language as fast as you want, but if you hit a 10x penalty every
> single time you have to interact with the OS it's all a waste of time...
> until we get an OS written in Go :P

I think you have misunderstood. There is a penalty for calling from
Go to C. There is no penalty for interacting with the OS.


>> The Go standard library only calls into C in two places: to look up
>> user names in the os/user package, and to resolve network names in the
>> net package. For things like sockets the Go standard library issues
>> system calls directly.
>>
>
> I went looking at fd_windows.go and ya, it's doing a syscall to WSASendTo of
> course.. so that's a 10x penalty right there, every single time. I am
> already too depressed so I won't even look into synchronization primitives..
> I know I am going to find these 10x all over the place.

There is no penalty for the call to WSASendTo.

Ian

Ian Lance Taylor

Apr 26, 2014, 11:31:57 PM
to Stefano Casillo, golang-nuts
Oh, sorry, I may be wrong about that. On Windows there does seem to
be a penalty that does not exist on GNU/Linux. I don't know how
serious it is.

Ian

Stefano Casillo

Apr 26, 2014, 11:36:47 PM
to golan...@googlegroups.com, Stefano Casillo

> There is no penalty for the call to WSASendTo.

WSASendTo is using a syscall. I tested calling a Windows API through #include <windows.h> and then C.APIName(), and also by using syscall, opening the containing kernel32.dll and issuing a syscall.. the time for both is the same, both very slow.

Are you suggesting that if I try to get hold of all the OpenGL calls I do by syscall.LoadLibrary and syscall.GetProcAddress I should get better results? Because I am not seeing that at all.

 

minux

Apr 26, 2014, 11:37:06 PM
to Ian Lance Taylor, Stefano Casillo, golang-nuts
On Windows, basically every call into a DLL is a cgo call.

Stefano Casillo

Apr 26, 2014, 11:44:01 PM
to golan...@googlegroups.com, Ian Lance Taylor, Stefano Casillo


> On Windows, basically every call into a DLL is a cgo call.


what about Linux then? Are syscalls' cost 0? Sadly I am not familiar with Linux so.. is it possible to call OpenGL via syscall on Linux? It is possible on Windows.
If it is possible, I wonder why none of the existing OpenGL bindings are doing it instead of using CGO.
 

minux

Apr 26, 2014, 11:52:26 PM
to Stefano Casillo, golang-nuts, Ian Lance Taylor
On Sat, Apr 26, 2014 at 11:44 PM, Stefano Casillo <stefano...@gmail.com> wrote:
>> On Windows, basically every call into a DLL is a cgo call.
> what about Linux then? Are syscalls' cost 0?

Yes. Calling a real syscall has zero cost for Go.

> Sadly I am not familiar with Linux so.. is it possible to call OpenGL via syscall on Linux?

No. OpenGL is not implemented by the Linux kernel directly, so you can't use syscall to make OpenGL calls.

> It is possible on Windows.

They're not real syscalls. Windows doesn't expose the real "system call".
The so-called "syscall" on Windows is still calling into a function exported by a DLL.

Gustavo Niemeyer

Apr 27, 2014, 12:00:43 AM
to Stefano Casillo, golan...@googlegroups.com
On Sun, Apr 27, 2014 at 12:36 AM, Stefano Casillo
<stefano...@gmail.com> wrote:
> Are you suggesting that if I try to get hold of all the OpenGL calls I do by
> syscall.LoadLibrary and syscall.GetProcAddress I should get better results?
> Because I am not seeing that at all.

There's a technique that may be used to pretty much eliminate that
cost with certain APIs, and OpenGL is especially adequate for it:
instead of having Go call C on every single API, you can buffer a
number of calls in memory, and then cross the bridge just once for
flushing. This will eliminate the 10x pretty much entirely. I'm
planning to do that with the qml package when the time comes.
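
A minimal sketch of that buffering idea, in pure Go so it runs without a C toolchain. Everything here is hypothetical: the `command` and `batcher` types are invented for illustration, and the `submit` callback stands in for the single C function that would receive the whole batch through cgo.

```go
package main

import "fmt"

// command is one buffered API call: an opcode plus its arguments.
type command struct {
	op   int
	args [4]float32
}

// batcher queues commands in Go memory and hands them over in one step.
type batcher struct {
	queue     []command
	crossings int // how many times flush crossed the (simulated) boundary
	// submit stands in for the one C function that would receive the batch.
	// It must not keep the slice, because the backing array is reused.
	submit func([]command)
}

// add records a call without crossing the language boundary.
func (b *batcher) add(op int, args ...float32) {
	c := command{op: op}
	copy(c.args[:], args)
	b.queue = append(b.queue, c)
}

// flush hands all queued commands over in a single crossing.
func (b *batcher) flush() {
	if len(b.queue) == 0 {
		return
	}
	b.submit(b.queue)
	b.crossings++
	b.queue = b.queue[:0] // keep the backing array for the next frame
}

func main() {
	executed := 0
	b := &batcher{submit: func(cs []command) { executed += len(cs) }}
	for i := 0; i < 3000; i++ {
		b.add(1, float32(i), 0, 0)
	}
	b.flush()
	fmt.Println("commands executed:", executed, "boundary crossings:", b.crossings)
}
```

The point of the design is that the per-call boundary cost is paid once per flush rather than once per command; whether that recovers most of the lost time in a real renderer would still need measuring.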

Another relevant detail which perhaps you already have in mind is that
with modern OpenGL you can load most of the content in the graphics
card memory, and then manipulate it there, instead of reloading it for
every frame.


gustavo @ http://niemeyer.net

Stefano Casillo

Apr 27, 2014, 12:13:00 AM
to golan...@googlegroups.com, Stefano Casillo

> There's a technique that may be used to pretty much eliminate that
> cost with certain APIs, and OpenGL is especially adequate for it:
> instead of having Go call C on every single API, you can buffer a
> number of calls in memory, and then cross the bridge just once for
> flushing. This will eliminate the 10x pretty much entirely. I'm
> planning to do that with the qml package when the time comes.

Ya that means writing a layer in C/C++ with high level concepts for things like Material, Meshes, Models, Textures and so on... at that point it becomes a 2 languages project and frankly I don't see the point of it.

> Another relevant detail which perhaps you already have in mind is that
> with modern OpenGL you can load most of the content in the graphics
> card memory, and then manipulate it there, instead of reloading it for
> every frame.

Sure but once you start adding passes for shadows, reflections and whatnot you're still looking at 3000-10000 calls to the API per frame for anything beyond simple demos.. and that's what I've benchmarked for, and the results say that Go simply isn't able to handle that kind of traffic. Actually it is able, but you have to give up half of your frame time to the cost of calling CGO.

Stephen Gutekanst

Apr 27, 2014, 12:25:03 AM
to golan...@googlegroups.com, Stefano Casillo
I actually wrote an OpenGL wrapper generator in Go [1] which can be used with QML. It allows for batching of OpenGL calls into a small stack and on the C side of things executes them in large batches using a jump table.

Now, there is the cost of that additional stack and I've not bench-marked the performance cost of that.
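
To make the "small stack plus jump table" idea concrete, here is a rough sketch written entirely in Go. It is not the generator's actual output: the opcodes, the `stack` layout and the `state` struct are invented for illustration, and in the design described above the `execute` loop would live on the C side.

```go
package main

import "fmt"

// Opcodes for the hypothetical commands in this sketch.
const (
	opClear = iota
	opColor
	opVertex
)

// entry is one row of the jump table: how many argument words the
// command consumes and the handler that executes it.
type entry struct {
	nargs int
	run   func(s *state, args []float32)
}

// state records what the handlers did, standing in for real GL state.
type state struct {
	clears, vertices int
	color            [3]float32
}

var jumpTable = [...]entry{
	opClear:  {0, func(s *state, _ []float32) { s.clears++ }},
	opColor:  {3, func(s *state, a []float32) { copy(s.color[:], a) }},
	opVertex: {2, func(s *state, _ []float32) { s.vertices++ }},
}

// stack holds the batched calls: one opcode per call, arguments packed flat.
type stack struct {
	ops  []uint8
	args []float32
}

// push appends a call; the caller must pass exactly jumpTable[op].nargs values.
func (st *stack) push(op uint8, args ...float32) {
	st.ops = append(st.ops, op)
	st.args = append(st.args, args...)
}

// execute walks the stack once, dispatching each opcode through the table.
func execute(s *state, st *stack) {
	next := 0
	for _, op := range st.ops {
		e := jumpTable[op]
		e.run(s, st.args[next:next+e.nargs])
		next += e.nargs
	}
}

func main() {
	var st stack
	st.push(opClear)
	st.push(opColor, 1, 0, 0)
	for i := 0; i < 3; i++ {
		st.push(opVertex, float32(i), float32(i))
	}
	var s state
	execute(&s, &st)
	fmt.Println(s.clears, s.vertices, s.color)
}
```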

Stephen

Gustavo Niemeyer

Apr 27, 2014, 12:56:27 AM
to Stephen Gutekanst, golan...@googlegroups.com, Stefano Casillo
Oh, nice. Curious to see what's the performance impact of it.

The stack shouldn't be much of a burden if it's not re-allocated all
the time. sync.Pool might also help there.
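
As a sketch of the sync.Pool suggestion: the pool below hands out reusable argument buffers so a per-frame command stack does not have to be reallocated each time. The buffer size and the helper names `getBuf` and `putBuf` are assumptions made for this example, and sync.Pool gives no guarantee that a given buffer is actually reused.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable argument buffers for batched calls.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]float32, 0, 4096) // capacity chosen arbitrarily
		return &b
	},
}

// getBuf returns an empty buffer, either recycled or freshly allocated.
func getBuf() *[]float32 {
	b := bufPool.Get().(*[]float32)
	*b = (*b)[:0] // drop old contents, keep the capacity
	return b
}

// putBuf returns a buffer to the pool once the batch has been flushed.
func putBuf(b *[]float32) {
	bufPool.Put(b)
}

func main() {
	for frame := 0; frame < 3; frame++ {
		b := getBuf()
		*b = append(*b, 1, 2, 3)
		fmt.Println("frame", frame, "len", len(*b), "cap", cap(*b))
		putBuf(b)
	}
}
```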

gustavo @ http://niemeyer.net

Gustavo Niemeyer

Apr 27, 2014, 12:58:51 AM
to Stefano Casillo, golan...@googlegroups.com
On Sun, Apr 27, 2014 at 1:13 AM, Stefano Casillo
<stefano...@gmail.com> wrote:
> Ya that means writing a layer in C/C++ with high level concepts for things
> like Material, Meshes, Models, Textures and so on... at that point it
> becomes a 2 languages project and frankly I don't see the point of it.

It's much simpler than that, as it's just parameter batching, and the
point is to overcome the overhead you were complaining about.

>> Another relevant detail which perhaps you already have in mind is that
>> with modern OpenGL you can load most of the content in the graphics
>> card memory, and then manipulate it there, instead of reloading it for
>> every frame.
>>
>
> Sure but once you start adding passes for shadows, reflections and whatnot
> you're still looking into 3000-10000 calls to the API per frame for anything
> beyond simple demos.. and that's what I've benchmarked for and the results
> say that simply Go isn't able to handle that kind of traffic. Actually it is
> able, but you have to give up half of your frame time to the cost of calling
> CGO.

Yes, there is a cost, and yes, there are possible solutions, and
indeed, you'd have to spend time implementing them.


gustavo @ http://niemeyer.net

Stefano Casillo

Apr 27, 2014, 1:19:55 AM
to golan...@googlegroups.com, Stefano Casillo


On Sunday, April 27, 2014 12:25:03 PM UTC+8, Stephen Gutekanst wrote:
> I actually wrote an OpenGL wrapper generator in Go [1] which can be used with QML. It allows for batching of OpenGL calls into a small stack and on the C side of things executes them in large batches using a jump table.
>
> Now, there is the cost of that additional stack and I've not bench-marked the performance cost of that.
>
> Stephen


nice! Very very clever.
But the very fact something like this becomes necessary should be a sign that something is deeply wrong in Go. If this is the "solution" the result is that it will simply send any programmer interested in interacting with native API running away to better alternatives.

Also the std Go libs do not use this system, so, as it is at the moment, on Windows any Go program is paying a huge 10x price for every interaction with the OS.

Definitely something Go devs should have a look at if they really want to target C++ devs and not (only) Python devs.
  
 

Stefano Casillo

Apr 27, 2014, 1:24:55 AM
to golan...@googlegroups.com, Stefano Casillo

> Yes, there is a cost, and yes, there are possible solutions, and
> indeed, you'd have to spend time implementing them.


ya true.. but I think I'll simply pass on this one until somebody in the Go team will realize this cannot work.

Gustavo Niemeyer

Apr 27, 2014, 1:51:18 AM
to Stefano Casillo, golan...@googlegroups.com
On Sun, Apr 27, 2014 at 2:19 AM, Stefano Casillo
<stefano...@gmail.com> wrote:
> But the very fact something like this becomes necessary should be a sign
> that something is deeply wrong in Go. If this is the "solution" the result
> is that it will simply send any programmer interested in interacting with
> native API running away to better alternatives.

It's fine that you don't want to do that, and it's fine that you want
to use something else, but the generalization is simply not true. I'm
not running away; that issue has never bothered me although I've been
wrapping native APIs for several years now, and there are multiple
ways to solve the problem you describe in the packages I care
about, when the time comes.


gustavo @ http://niemeyer.net

minux

Apr 27, 2014, 2:30:04 AM
to Stefano Casillo, golang-nuts
On Sun, Apr 27, 2014 at 1:19 AM, Stefano Casillo <stefano...@gmail.com> wrote:
>> On Sunday, April 27, 2014 12:25:03 PM UTC+8, Stephen Gutekanst wrote:
>> I actually wrote an OpenGL wrapper generator in Go [1] which can be used with QML. It allows for batching of OpenGL calls into a small stack and on the C side of things executes them in large batches using a jump table.
>>
>> Now, there is the cost of that additional stack and I've not bench-marked the performance cost of that.
> nice! Very very clever.
> But the very fact something like this becomes necessary should be a sign that something is deeply wrong in Go. If this is the "solution" the result is that it will simply send any programmer interested in interacting with native API running away to better alternatives.
>
> Also the std Go libs do not use this system, so, as it is at the moment, on Windows any Go program is paying a huge 10x price for every interaction with the OS.

I think you misunderstand, or exaggerate, what the 10x overhead mentioned by iant means.

it means the function call costs ~10x as much as a normal Go call, but the time to actually
do the work should be the same. If the function call overhead is too big, then it simply means
the function is doing too little.

For most windows "syscalls" used by Go, that is never a problem, because doing a single IO
will require much more time than 10 function calls.
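
The amortization argument can be put into numbers with a tiny model. The 100 ns overhead below is a made-up figure purely for illustration, not a measurement of cgo; only the shape of the result matters.

```go
package main

import "fmt"

// overheadFraction returns the share of total time spent on call overhead
// when each call costs overheadNs to make and workNs to execute.
func overheadFraction(overheadNs, workNs float64) float64 {
	return overheadNs / (overheadNs + workNs)
}

func main() {
	const overheadNs = 100.0 // hypothetical fixed cost per Go-to-C crossing
	for _, workNs := range []float64{10, 1000, 100000} {
		fmt.Printf("work %8.0f ns -> overhead %5.1f%% of the call\n",
			workNs, 100*overheadFraction(overheadNs, workNs))
	}
}
```

With the same fixed overhead, a call that does almost nothing is dominated by the crossing, while a call that does real I/O barely notices it.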

Stefano Casillo

Apr 27, 2014, 3:02:55 AM
to golan...@googlegroups.com, Stefano Casillo


On Sunday, April 27, 2014 2:30:04 PM UTC+8, minux wrote:
> I think you misunderstand, or exaggerate, what the 10x overhead mentioned by iant means.
>
> it means the function call costs ~10x as much as a normal Go call, but the time to actually
> do the work should be the same. If the function call overhead is too big, then it simply means
> the function is doing too little.
>
> For most windows "syscalls" used by Go, that is never a problem, because doing a single IO
> will require much more time than 10 function calls.

ya of course I understand that the performance of the function itself is the same. And ya, you are right, for most applications the overhead cost isn't going to be an issue at all.. which, actually, makes things worse for game programmers willing to use Go because they are pretty much guaranteed that this situation isn't going to be improved in any way because it's "good enough" for the other fields where Go is used.

I wonder if things are better with gccgo.. although, gccgo isn't supported on Windows.. so we're back at square one.. it's such a shame: Go is my fav language and still, I can't use it in my daily work.

Stephen Gutekanst

Apr 27, 2014, 6:08:16 AM
to Stefano Casillo, golan...@googlegroups.com
I'm sorry but I don't think that the function call overhead is too bad for game development either. Immediate mode is deprecated in OpenGL anyways, if you're using immediate mode you should be expecting bad performance period.

There are very few game development C libraries that require such a large amount of function calls, the main offenders are OpenGL and perhaps a few physics engines, but again most of the time the cost here is negligible (and when it isn't: make fewer C calls). I don't think it's as big of a deal as you make it out to be.

Yes, you will need to work around it (kind of like how you will need to work around the garbage collector which might give you random pauses) but in the end only you can decide if the benefits of Go outweigh the downsides.

Go uses a different calling convention than C and C++ and honestly I don't think it's outrageous at all that there is an overhead for calling C functions, especially given that the reason (AFAIK) the overhead is there is that Go uses split stacks, which are needed for goroutines to not consume lots of memory.

And there are other areas where Go sacrifices some performance for convenience and overall clarity, e.g. garbage collection. 

There are ways to work around these issues today and Go still looks like a better alternative than C for my game development projects. Maybe the situation will improve in the future (perhaps Go could do batching of C calls auto-magically or something), but it's not the case today and I still don't think it's a deal-breaker.

Stephen


Stefano Casillo

Apr 27, 2014, 6:43:29 AM
to golan...@googlegroups.com, Stefano Casillo


On Sunday, April 27, 2014 6:08:16 PM UTC+8, Stephen Gutekanst wrote:
> I'm sorry but I don't think that the function call overhead is too bad for game development either. Immediate mode is deprecated in OpenGL anyways, if you're using immediate mode you should be expecting bad performance period.


Stephen, thanks for your input. Do you think I wrote the game I linked in immediate mode? :P The point I am trying to make is that I KNOW how many calls you need to do to a graphics API every frame and I used OpenGL immediate mode just to test what happens when you feed Go with a similar amount of calls.
The fact that the C version on the same machine (same code line by line, same driver) runs at 900-something fps (around 1.something ms per frame) and the Go version runs at 100 fps (around 10x), both drawing basically nothing, means that a dev using Go for a 3D engine who is looking at making that number of calls into an API (ANY call, ANY API) has to accept the fact that half of his CPU budget goes into Go's overhead.
That has NOTHING to do with OpenGL nor immediate or not immediate mode.
Perhaps in the game you are planning to do that overhead is acceptable.. for my kind of stuff or any other serious game dev I am familiar with it just isn't.. no sane game programmer would give away 8-9ms of his CPU time just like that.
 
> There are very few game development C libraries that require such a large amount of function calls, the main offenders are OpenGL and perhaps a few physics engines, but again most of the time the cost here is negligible (and when it isn't: make fewer C calls). I don't think it's as big of a deal as you make it out to be.


Well it's a big deal if your company and the people working for it depend on your software hitting 60Hz on the graphics and 300-400Hz on the physics side.... by looking and feeling better than the competition, it's simply undoable. 
 
> Go uses a different calling convention than C and C++ and honestly I don't think it's outrageous at all that there is an overhead for calling C functions, especially given that the reason (AFAIK) the overhead is there is that Go uses split stacks, which are needed for goroutines to not consume lots of memory.

I am sure there is a very good reason for having this overhead...I don't write compilers nor runtimes and I respect those who do.
Still, this just stops Go from being viable for serious game development. The only alternative I see is to create the core engine in C++ and use Go to drive the high level logic.. so you would call a "DrawModel" C function from Go that draws the entire model with submeshes, material and what not.. but, it doesn't make sense to write a core in a language like C++ and then write the game code with a language that offers less abstractions like Go... it's doable, but it doesn't make sense at all.

> There are ways to work around these issues today and Go still looks like a better alternative than C for my game development projects. Maybe the situation will improve in the future (perhaps Go could do batching of C calls auto-magically or something), but it's not the case today and I still don't think it's a deal-breaker.


If Go works for you, that's great, I am happy for you because it's a great language. Clearly we have different performance requirements for our software.
I look forward to see your engine grow up and I will follow it with interest.


 

Gustavo Niemeyer

Apr 27, 2014, 7:35:34 AM
to Stefano Casillo, golan...@googlegroups.com
On Sun, Apr 27, 2014 at 7:43 AM, Stefano Casillo
<stefano...@gmail.com> wrote:
> That has NOTHING to do with OpenGL nor immediate or not immediate mode.
> Perhaps in the game you are planning to do that overhead is acceptable.. for
> my kind of stuff or any other serious game dev I am familiar with it just
> isn't.. no sane game programmer would give away 8-9ms of his CPU time just
> like that.

It's fine that you're not interested in pursuing the ideas offered,
but it's uncandid to continue discoursing as if they were not given
and the problem was completely unavoidable. Besides the batching idea,
there are also ideas I'd try, but all of them require interest and
dedication towards optimization, which is somewhat of an expected
trait in serious game development either way.


gustavo @ http://niemeyer.net

Stefano Casillo

Apr 27, 2014, 7:57:16 AM
to golan...@googlegroups.com, Stefano Casillo


On Sunday, April 27, 2014 7:35:34 PM UTC+8, Gustavo Niemeyer wrote:

> It's fine that you're not interested in pursuing the ideas offered,
> but it's uncandid to continue discoursing as if they were not given
> and the problem was completely unavoidable.

Ya, I have to admit that the "offered" idea of writing 15k lines of code just to be able to call into OpenGL doesn't really strike me as something I want to do. :)
I think it's better to use the right tool for the job.
I'll use Go for things that seem more natural for the way the language is designed and remain at the window hoping something will change.
 

Gustavo Niemeyer

Apr 27, 2014, 8:10:31 AM
to golan...@googlegroups.com

Not 15k lines either, but sure, having no interest in things is completely fine.

gustavo @ http://niemeyer.net


Stephen Gutekanst

Apr 27, 2014, 8:18:48 AM
to golan...@googlegroups.com, Stefano Casillo
As I mentioned earlier the code was generated, not written by hand. The suggestion is not to write a large amount of code, the suggestion is to work around the issue one of two ways:
  1. create your renderer (up until acceptable performance is found) in C and call it from Go.
  2. make several calls to OpenGL through one C call (which I mentioned you can have already with my OpenGL wrappers).
I did not think that you made the game in immediate mode, no. I was simply trying to state that the penalty would be significantly less without immediate mode, but it's clear to me now you have already considered that. I do completely think however that making a game similar or better than the one you have linked would be possible in Go.

It would be great to have an implementation of Go that allows calling C functions with no penalty at all, sure, but it's also not practical. Calling C functions is slow for a reason; you might not have interest in it, but there is a reason behind it. If you're not interested then accept the cost, if you won't accept the cost then work around it. It is clear that you have weighed the benefits of Go versus the negatives. It sounds like Go is not the right choice for your project combined with your programming style or mindset.

I really did not intend to derail the conversation or upset any party involved, and I truly do apologize if that is the way that I came across.
Stephen

Stefano Casillo

Apr 27, 2014, 8:31:08 AM
to golan...@googlegroups.com, Stefano Casillo
On Sunday, April 27, 2014 8:18:48 PM UTC+8, Stephen Gutekanst wrote:
> I do completely think however that making a game similar or better than the one you have linked would be possible in Go.


well what can I say? talk is cheap.
 
 

David Arroyo

Apr 27, 2014, 8:35:51 AM
to Stephen Gutekanst, golan...@googlegroups.com, Stefano Casillo

On Apr 27, 2014, at 8:18 AM, Stephen Gutekanst <stephen....@gmail.com> wrote:

> It would be great to have an implementation of Go that allows calling C functions with no penalty at all, sure, but it's also not practical.

I wonder about that. As far as I understand, there are a few reasons
for the cgo overhead:

- The Go stack is different from the C stack (for light-weight goroutines)
- The Go calling convention is different (with the gc compiler, at least)
- The Go scheduler needs to prevent a cgo call from blocking other
goroutines on the same thread.

You could create a modified Go implementation that did N:N scheduling
between goroutines and OS threads. It is a big tradeoff (goroutines
become much more expensive), but there are a lot of great things about
Go besides lightweight goroutines. They would still be easier to
use than libpthread :). It would certainly make OpenGL and other
game libs easier to deal with if we knew a goroutine kept its own
OS thread.

It would be a significant amount of work on the scheduler and cgo
itself. But I think there would be a lot of interest in such a Go
implementation.
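
Short of a modified implementation, the standard runtime already lets a single goroutine keep its own OS thread via runtime.LockOSThread, which is what thread-affine libraries such as OpenGL usually need. The sketch below is only that: the `startRenderThread` helper is invented for this example, and pinning a goroutine does not by itself remove the cost of each cgo call.

```go
package main

import (
	"fmt"
	"runtime"
)

// startRenderThread starts a goroutine pinned to one OS thread. It returns
// run, which executes a job on that thread and waits for it to finish, and
// stop, which shuts the goroutine down.
func startRenderThread() (run func(func()), stop func()) {
	jobs := make(chan func())
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread() // this goroutine now keeps its OS thread
		defer runtime.UnlockOSThread()
		for job := range jobs {
			job()
			done <- struct{}{}
		}
	}()
	run = func(job func()) {
		jobs <- job
		<-done
	}
	stop = func() { close(jobs) }
	return run, stop
}

func main() {
	run, stop := startRenderThread()
	defer stop()
	frames := 0
	for i := 0; i < 3; i++ {
		// In a real program the job would issue the frame's API calls.
		run(func() { frames++ })
	}
	fmt.Println("frames rendered on the pinned thread:", frames)
}
```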

-David

jonathan....@gmail.com

Apr 27, 2014, 8:46:19 AM
to golan...@googlegroups.com
You should profile it. I did a benchmark of Go OpenGL performance that I haven't got around to publishing, and running it through the Go profiler I found that the use of immediate mode did a bunch of stupid memory allocation stuff. Changing it to avoid immediate mode however resulted in performance comparable to C, so I highly recommend you don't judge Go OpenGL performance by immediate mode.

If you're interested, the Go implementation is at https://github.com/logicchains/ParticleBench/blob/master/Go.go, and the C one is at https://github.com/logicchains/ParticleBench/blob/master/C.c; they both run at around the same speed, with the majority of the runtime being spent in glDrawArrays.

If you use Go with modern OpenGL and object pooling I'd be really surprised if it's more than 5% slower than C.
 

Gustavo Niemeyer

Apr 27, 2014, 8:57:41 AM
to jonathan....@gmail.com, golan...@googlegroups.com


Thanks!

That's what actual interest looks like.

gustavo @ http://niemeyer.net

Stefano Casillo

Apr 27, 2014, 9:24:12 AM
to golan...@googlegroups.com, jonathan....@gmail.com


On Sunday, April 27, 2014 8:46:19 PM UTC+8, jonathan....@gmail.com wrote:

> they both run at around the same speed, with the majority of the runtime being spent in glDrawArrays.


That means you are GPU bound in your example. To measure raw call cost you should call the API without drawing anything, or draw tiny 1-pixel triangles, or even draw triangles outside the framebuffer. That way you are sure you become CPU bound, and then you can really see the difference, and it is huge.
Again, this has nothing to do with immediate mode, we're not benchmarking OpenGL here.. we're benchmarking Go calling into an API.
You can do the exact same test with any CGO call.. and get the exact same result of Go being at about 10x in calling into the function. Just as Ian Lance Taylor explained before.
 

Gustavo Niemeyer

Apr 27, 2014, 9:50:11 AM
to golan...@googlegroups.com, jonathan....@gmail.com

That implies all your code ever does is to call empty C functions.

If you'd like to see people collaborating to solve that problem, a good start would be an actual benchmark.
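
One possible starting point for such a benchmark is a small timing harness like the one below. It only measures a plain Go call, so that it runs without a C compiler; `emptyCall` and `nsPerCall` are names made up for this sketch, and the cgo side is left as a comment because it depends on the C function being tested.

```go
package main

import (
	"fmt"
	"time"
)

// sink keeps the compiler from optimizing the measured call away.
var sink int

// emptyCall is a trivial function used as the plain-Go baseline.
//
//go:noinline
func emptyCall(x int) int { return x + 1 }

// nsPerCall runs f n times and returns the average wall-clock cost per call
// in nanoseconds. The closure call itself is included in the figure.
func nsPerCall(n int, f func()) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	return float64(time.Since(start).Nanoseconds()) / float64(n)
}

func main() {
	const n = 10000000
	goCost := nsPerCall(n, func() { sink = emptyCall(sink) })
	fmt.Printf("plain Go call: %.2f ns/call\n", goCost)
	// To measure the boundary, pass a closure that calls an empty C function
	// through cgo (for example a hypothetical C.noop()) and compare the two.
}
```

Comparing the two per-call figures on the same machine, for both an empty function and one that does realistic work, would show how much of a frame is really lost to the crossing.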

gustavo @ http://niemeyer.net


Stefano Casillo

Apr 27, 2014, 9:52:10 AM
to golan...@googlegroups.com, jonathan....@gmail.com
On Sunday, April 27, 2014 8:46:19 PM UTC+8, jonathan....@gmail.com wrote:
> You should profile it. I did a benchmark of Go OpenGL performance that I haven't got around to publishing, and running it through the Go profiler I found that the use of immediate mode did a bunch of stupid memory allocation stuff. Changing it to avoid immediate mode however resulted in performance comparable to C, so I highly recommend you don't judge Go OpenGL performance by immediate mode.


I actually tried your benchmarks and the results are closer.. average 600fps for the C++ version vs. 400fps for Go with more than 100k API calls per frame.. which is, VERY GOOD news indeed!
 