Hi Robert,
First, note that the contention profile for runtime-internal locks doesn't correctly blame the part of the code that _caused_ the delay: with Go 1.22 and 1.23, the call stacks are those of the (runtime-internal) lock users that _experienced_ delay. That's
https://go.dev/issue/66999, and those odd semantics are the reason those stacks aren't shown by default (and instead require the GODEBUG). That was briefly fixed in late May, but the fix caused some slowdowns and so couldn't be part of the Go 1.23 release.
Contention within findRunnable is probably on sched.lock, which protects the global portion of the scheduler. (Often, scheduling only needs to involve each P's local run queue.)
> Is this a side effect of the scheduler, and it is really blocked in the unlock call waiting for another routine to be runnable?
There really is a thread (an M, with a P) that is blocked. It's not necessarily waiting for a G to become runnable; depending on the exact part of findRunnable, it might be checking whether any goroutine is runnable, or the M might be trying to go to sleep because it has determined that there's no work. The profile also shows runtime.goschedImpl, which means some G is calling runtime.Gosched, and its M (and P) are blocked while trying to add that G to the global run queue.
As for what's causing it: consider how much work the program's goroutines do each time they run (maybe "only tens of microseconds"), how and how often the program uses Gosched (maybe "a bit too much", though I can't quantify that), and the setting for GOMAXPROCS (maybe "lots", though GOOS=darwin implies it won't be huge). Summing the runtime/metrics buckets for "/sched/latencies:seconds" and multiplying by 8 (runtime.gTrackingPeriod) gives an estimate of how many times the scheduler started running a goroutine. Or a runtime/trace can show all of those details at once.
Rhys