I suppose I don't understand how this could work. A goroutine that is
separate from the scheduler wouldn't be able to do anything that
requires scheduling interaction. That includes allocating memory,
storing a pointer into Go memory (which may require a write barrier),
sending on or receiving from a channel, and so forth. It would be an
extremely limited version of Go, and it would be essentially
impossible for any non-expert to write such code.
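
To make this concrete, here is a small illustrative sketch of ordinary-looking Go in which each step may call into the runtime. The `node` type and `work` function are hypothetical and exist only for illustration. Code running on a goroutine detached from the scheduler could not safely do any of these things.

```go
package main

import "fmt"

// node is a hypothetical linked-list element used only for illustration.
type node struct {
	next *node
	val  int
}

// work looks like plain Go, but each step may need runtime support.
func work(head *node, ch chan *node) *node {
	// n escapes (it is stored into head and sent on ch), so this
	// allocation goes through the runtime's heap allocator.
	n := &node{val: head.val + 1}

	// Storing a pointer into Go memory may execute a write barrier
	// if the garbage collector is running.
	head.next = n

	// A channel send or receive may block, in which case the
	// scheduler must park this goroutine and later wake it.
	ch <- n
	return <-ch
}

func main() {
	ch := make(chan *node, 1) // buffered, so this demo never actually blocks
	head := &node{val: 1}
	got := work(head, ch)
	fmt.Println(got.val, head.next == got) // prints "2 true"
}
```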

As far as I can tell, the cgo calling sequence does not acquire the
scheduler lock. I ran perf on a simple cgo call, and in my
measurements the hottest line is the atomic.Cas in
runtime.casgstatus. In my measurements it's not 80% of the time but
more like 19%. But still. Another 9% or so seems to be taken by
the `atomic.Store(&pp.status, _Psyscall)` in runtime.reentersyscall.
I don't know why these seem slow, as I would expect these atomic
operations to be uncontended.
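
For comparison, here is a rough sketch that measures uncontended atomic operations in isolation. It uses sync/atomic on a variable that only one goroutine touches, not the runtime's internal status words, so it is only a baseline and not a reproduction of the cgo path. The benchmark function names are hypothetical, and calling testing.Benchmark from a plain program assumes Go 1.13 or later, where testing.Init must be called first.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// sink is a package-level target so the compiler keeps the plain stores.
var sink uint32

// benchPlain is a baseline: an ordinary, non-atomic store.
func benchPlain(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = uint32(i)
	}
}

// benchCAS performs b.N compare-and-swaps on a variable that no other
// goroutine touches, so every CAS is uncontended.
func benchCAS(b *testing.B) {
	var status uint32
	for i := 0; i < b.N; i++ {
		// Alternate 0 -> 1 -> 0 ... so the expected old value always
		// matches and every CAS succeeds, like a status transition.
		atomic.CompareAndSwapUint32(&status, uint32(i&1), uint32((i+1)&1))
	}
}

// benchStore performs b.N uncontended atomic stores.
func benchStore(b *testing.B) {
	var status uint32
	for i := 0; i < b.N; i++ {
		atomic.StoreUint32(&status, uint32(i))
	}
}

func main() {
	// Registers the testing flags; needed when calling Benchmark
	// outside of "go test". It has no effect if already called.
	testing.Init()

	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{
		{"plain store", benchPlain},
		{"atomic CAS", benchCAS},
		{"atomic Store", benchStore},
	} {
		r := testing.Benchmark(bm.fn)
		// Compute ns/op as a float: NsPerOp rounds to whole nanoseconds.
		nsPerOp := float64(r.T.Nanoseconds()) / float64(r.N)
		fmt.Printf("%-12s %6.2f ns/op\n", bm.name, nsPerOp)
	}
}
```

The numbers are machine-dependent. Comparing the atomic rows against the plain store gives a rough floor for what an uncontended atomic costs on a given machine, which can then be set against the percentages perf reports for runtime.casgstatus and runtime.reentersyscall.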

Ian