Why is runtime.Gosched needed in this single-threaded, no-goroutines cgo program?

332 views
Skip to first unread message

Marko Bencun

unread,
Feb 15, 2023, 11:42:24 AM2/15/23
to golan...@googlegroups.com
Dear Gophers

I am running into a a weird error in C code that only materializes
when calling it from Go using cgo. I hope you could help me understand
this behaviour and why calling runtime.Gosched() seems to resolve the
issue. I worry that this may not be the right fix, so I want to
understand what is going on.

I communicate to a USB HID device and get an error every once in a
while, only when using cgo, in around 5% of the messages sent to the
device, non-deterministically. The messages sent and received are all
the same and work most of the time.

The C functions I am calling are hid_write and hid_read from here:
https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c

If I call hid_write and hid_read in a loop in a C program, everything
works as expected:

```
#include "hidapi.h"
#include "hidapi_darwin.h"

int run(hid_device *handle) {
int res;

const uint8_t write[64] = "<some 64 byte msg>";
const uint8_t expected[64] = "<expected 64 byte response>";
uint8_t read[64] = {0};

while(1) {
res = hid_write(handle, write, sizeof(write));
if(res < 0) return -1;
res = hid_read(handle, read, sizeof(read));
if(res < 0) return -1;
if (memcmp(read, expected, 64)) return -1;
}

return 0;
}
```

I ported the above to Go using cgo like below. I link `hid.o`, the
same object file used in the C program, to make sure the same code is
running to rule out differences in compilation:

```
package main

import (
"bytes"
"encoding/hex"
)

/*
#cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation
-framework AppKit hid.o
#include "hidapi.h"
#include "hidapi_darwin.h"
*/
import "C"

func main() {
dev := C.hid_open(...)
if dev == nil {
panic("no device")
}
write := []byte("<some 64 byte msg>")
expected := []byte("<expected 64 bytes response>")
b := make([]byte, 64)
for {
written := C.hid_write(dev, (*C.uchar)(&write[0]), C.size_t(len(write)))
if written < 0 {
panic("error write")
}
read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b)))
if read < 0 {
panic("error read")
}
if !bytes.Equal(b, expected) {
panic("not equal")
}
}
}
```

The Go program errors on hid_write with a "generic error" error code
non-deterministically in about 5% of the messages sent. The USB device
receives the message and responds normally however.

I randomly tried adding `runtime.Gosched()` at the bottom of the
Go-loop, and the issue disappeared completely. I am very confused why
that would help, as the Go program has, as far as I can tell, no
threads and no goroutines (except main).

Other things I have checked:

- If I move the loop from Go to C by calling `C.run(dev)` (the looping
function defined above) from Go, there is no issue.
- LockOSThread: if the loop runs in a goroutine and that goroutine
switches OS threads, the issue reappears after some time (not
immediately after a thread switch - the error happens
non-deterministically) even if `runtime.Gosched()` is called.
`runtime.LockOSThread()` is needed to fix it in that case. Since the
goroutine is locked to an OS thread during the execution of a C
function call anyway, this indicates that either hidapi or the macOS
HID functions rely on thread-local vars across multiple C calls in
some way, which seems a bit crazy.
- In the above code (no goroutines), I checked that the OS thread ID
(pthread_self()) is constant for all calls, and yet the issue appears
unless runtime.Gosched() is called, which seems to contradict the
previous point
- I tried multiple Go versions between 1.16 and 1.20 and multiple
macOS SDK versions between 10.13 and 13.1, all with the same problem
(and same working fix).
- only macOS is affected - on linux and Windows, there is no similar
issue (these platforms use different C code to interact with the
device).

Does anyone have any insight into why invoking the scheduler could be
necessary here or what could be going on in general? My worry is that
using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper
fixes.

Best,
Marko

Ian Lance Taylor

unread,
Feb 15, 2023, 4:05:39 PM2/15/23
to Marko Bencun, golan...@googlegroups.com
I didn't try to look at this in detail, but I did glance at the C code
you are calling, and it uses pthread_mutex_lock and friends. In
general pthread mutexes must be unlocked by the thread that locked
them, so it is quite possible that LockOSThread is required for
correct operation. I don't have an explanation for why Gosched would
help, though.

Ian

Marko Bencun

unread,
Feb 15, 2023, 5:37:17 PM2/15/23
to Ian Lance Taylor, golan...@googlegroups.com
That is a good hint, thanks. However, in the Go code above, there are
no goroutines and everything runs in one thread as far as I can tell,
but the issue appears anyway.

Ian Lance Taylor

unread,
Feb 15, 2023, 5:53:14 PM2/15/23
to Marko Bencun, golan...@googlegroups.com
On Wed, Feb 15, 2023 at 2:36 PM Marko Bencun <be...@shiftcrypto.ch> wrote:
>
> That is a good hint, thanks. However, in the Go code above, there are
> no goroutines and everything runs in one thread as far as I can tell,
> but the issue appears anyway.

There are always goroutines, even if your program doesn't start them.
For example, the garbage collector normally runs four goroutines in
the background. The main goroutine can and will run on different
threads even without any actual "go" statements.

Ian

David Finkel

unread,
Feb 16, 2023, 10:44:34 AM2/16/23
to Ian Lance Taylor, Marko Bencun, golan...@googlegroups.com
I wonder if Gosched is preventing signal-based preemption from happening by calling into the scheduler often enough.
(it may not be fixing the issue that Ian identifies above 100% , but may make it extremely unlikely under some conditions)


Ian

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAOyqgcXYYvAYzDzhduz-dtWP0Xqgo24DQMzp4zMhDRYp6YvH8Q%40mail.gmail.com.

Marko Bencun

unread,
Feb 22, 2023, 8:19:00 AM2/22/23
to Ian Lance Taylor, golan...@googlegroups.com
That's good to know, thanks. I did log the current OS thread ID using
`pthread_self()` as well as with `pthread_threadid_np()` and the
thread ID is constant for all hid_write and hid_read calls. The issue
appears anyway, unless the scheduler is invoked.
Reply all
Reply to author
Forward
0 new messages