C to Go calls taking a long time - is this cgo overhead or my mistake?


Tom Larsen

May 27, 2020, 1:08:17 PM
to golang-nuts
I am attempting to build a Golang SDK for the Alteryx analytic application.  Alteryx provides a C API for interacting with the engine, so I thought I would use cgo to build a bridge between Alteryx and Go.

The basic flow-of-control looks something like this:
  1. The engine pushes a record of data (a C pointer to a blob of bytes) to my SDK by calling a cgo function (iiPushRecord). So, C is calling Go here. My cgo function looks like this:
    //export iiPushRecord
    func iiPushRecord(handle unsafe.Pointer, record unsafe.Pointer) C.long {
        incomingInterface := pointer.Restore(handle).(IncomingInterface)
        if incomingInterface.PushRecord(record) {
            return C.long(1)
        }
        return C.long(0)
    }
  2. My SDK calls a method on an interface that does something with the data.  For my basic example, I'm just copying the data to some outgoing buffers (theoretically, a best case scenario).
  3. The interface object pushes the data back to the engine by calling my SDK's PushRecord function, which in turn calls a similar C function on the engine.  The PushRecord function in my SDK looks like this:
    func PushRecord(connection *ConnectionInterfaceStruct, record unsafe.Pointer) error {
        result := C.callPushRecord(connection.connection, record)
        if result == C.long(0) {
            return fmt.Errorf(`error calling pII_PushRecord`)
        }
        return nil
    }

    and the callPushRecord function in C looks like this:
    long callPushRecord(struct IncomingConnectionInterface * connection, void * record) {
        return connection->pII_PushRecord(connection->handle, record);
    }
When I execute my base code 10 million times (simulating 10 million records) in a unit test, it will execute in 20-30 seconds.  This test does not include the cgo calls.  However, when I package the tool and execute it in Alteryx with 10 million records, it takes about 1 minute 20 seconds to execute.  I benchmarked against an equivalent tool I built using Alteryx's own Python SDK, which takes 1 minute.  My goal is to be faster than Python.

I ran a CPU profile while Alteryx was running.  Of the 1.38 minute runtime, the profile samples covered 42.95 seconds.  The profile starts out like this:

crosscall2 (0%) -> _cgoexp_89e40a732b6d_iiPushRecord (0%) -> runtime cgocallback (0%) -> runtime cgocallback_gofunc (0.14%)

At this point, the profile branches into 3:
  1. runtime cgocallback, which eventually calls all of my SDK code.  This branch accounts for 17.06 seconds in total
  2. runtime needm, which accounts for 8.21 seconds in total
  3. runtime dropm, which accounts for 17.43 seconds in total
If you want a graphical display of the profile, it's here: https://i.stack.imgur.com/CphbG.png

It looks like the C to Go overhead is responsible for ~60% of the total execution time?  Is this the correct way to interpret the profile?  If so, is it because of something I did wrong, or is this overhead inherent to the runtime?  There isn't noticeable overhead when my Go code calls C, so the upfront overhead from C to Go really surprised me.  Is there anything I can do here?
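As a quick arithmetic check of the ~60% figure, here is a minimal sketch that only restates the profile totals quoted above. The helper name `share` is made up for illustration, and the snippet says nothing about whether the interpretation of the profile is right; it only confirms the proportions.

```go
package main

import "fmt"

// share returns part as a percentage of total.
// Hypothetical helper, used only for this check.
func share(part, total float64) float64 {
	return 100 * part / total
}

func main() {
	// Seconds, copied from the profile summary in this post.
	const (
		total       = 42.95 // all samples
		cgocallback = 17.06 // branch that runs the SDK code
		needm       = 8.21
		dropm       = 17.43
	)
	fmt.Printf("needm + dropm: %.1f%%\n", share(needm+dropm, total))
	fmt.Printf("SDK code:      %.1f%%\n", share(cgocallback, total))
}
```

The two runtime branches together come to a little under 60% of the sampled time, and the branch that runs the SDK code to a little under 40%.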

I am running Go 1.14.3 on windows/amd64.  It's actually a Windows 10 VM on my MacBook, if that makes any difference.

All of the code is on GitHub: https://github.com/tlarsen7572/goalteryx

Note: I asked this on SO a few days ago, but got no answers, so I thought I would try here.  I hope that's ok.

Ian Lance Taylor

May 27, 2020, 6:44:18 PM
to Tom Larsen, golang-nuts
I haven't looked at your code in detail. But a plausible rule of
thumb is that a call from Go to C takes as long as ten function calls,
and calling from C to Go is worse. There are several reasons for
this, and there is certainly interest in making it faster, but it's a
hard problem.

This unfortunately means that you should not design your program to
casually call between Go and C. Where possible you should batch calls
and you should try to build data structures entirely in one language
before passing them to the other language.

Sorry for the difficulties.

Ian

Tom Larsen

May 27, 2020, 8:10:43 PM
to golang-nuts
Thanks Ian, no need to apologize, I know this stuff is hard.  I'll adjust my approach and see if I can conjure up enough working C code to try buffering the incoming data, and then call into Go in batches.  Even buffering as few as 10 records at a time should significantly speed up the execution if my issue truly is cgo overhead.
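The buffering idea can be sketched as follows. This is an illustration only, written in plain Go so it runs without a C toolchain; in the real tool the buffer would have to live on the C side, and the `flush` callback here merely stands in for the single C-to-Go call. `Batcher`, `NewBatcher`, and `countCrossings` are hypothetical names, not part of the goalteryx SDK or the Alteryx API.

```go
package main

import "fmt"

// Batcher is a hypothetical helper. It collects records and hands them to
// flush in groups of up to size, so the expensive boundary crossing happens
// once per batch instead of once per record.
type Batcher struct {
	size    int
	pending [][]byte
	flush   func(batch [][]byte) // stands in for the one C-to-Go call per batch
}

func NewBatcher(size int, flush func([][]byte)) *Batcher {
	return &Batcher{size: size, pending: make([][]byte, 0, size), flush: flush}
}

// Push buffers one record and flushes when the batch is full.
func (b *Batcher) Push(record []byte) {
	b.pending = append(b.pending, record)
	if len(b.pending) == b.size {
		b.Flush()
	}
}

// Flush delivers any buffered records. Call it once more at end of input so a
// partial batch is not lost. The callback must not retain the batch slice,
// because its backing array is reused for the next batch.
func (b *Batcher) Flush() {
	if len(b.pending) == 0 {
		return
	}
	b.flush(b.pending)
	b.pending = b.pending[:0]
}

// countCrossings pushes n records through a Batcher and reports how many
// times flush ran and how many records it received in total.
func countCrossings(n, batchSize int) (crossings, records int) {
	b := NewBatcher(batchSize, func(batch [][]byte) {
		crossings++
		records += len(batch)
	})
	for i := 0; i < n; i++ {
		b.Push([]byte{byte(i)})
	}
	b.Flush()
	return crossings, records
}

func main() {
	crossings, records := countCrossings(1005, 10)
	fmt.Println(crossings, records) // prints "101 1005"
}
```

With a batch size of 10, 1005 records cross the boundary 101 times (100 full batches plus one partial batch of 5), which is the roughly tenfold reduction in crossings the plan relies on.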

Tom

Tom Larsen

May 30, 2020, 9:07:42 PM
to golang-nuts
I've managed to batch the incoming data, so I thought I would provide an update:

Batching 10 records at a time reduced runtime to just under 40 seconds (beating Python!), so the slowdown I am seeing is overhead.  In my case the overhead equates to about 5 microseconds per call from C to Go.

Ian Lance Taylor

May 30, 2020, 10:24:23 PM
to Tom Larsen, golang-nuts
On Sat, May 30, 2020, 6:08 PM Tom Larsen <larsen...@gmail.com> wrote:
> I've managed to batch the incoming data, so I thought I would provide an update:
>
> Batching 10 records at a time reduced runtime to just under 40 seconds (beating Python!), so the slowdown I am seeing is overhead.  In my case the overhead equates to about 5 microseconds per call from C to Go.

Thanks for the update.

It's interesting that needm and extram are so high.  Are these calls from threads started by C to Go, as opposed to calls from Go to C to Go?  In the current implementation that is the worst case.

Ian




--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/67cf04eb-ea28-4ac9-b341-ee8d33af992a%40googlegroups.com.

Tom Larsen

May 31, 2020, 8:58:55 AM
to golang-nuts
The threads are started by C.  The C application starts by calling into Go/cgo to fill some structs with function pointers, which get returned back to C.  C then calls those function pointers as it needs to.