Help with sync.Cond failing to signal

153 views
Skip to first unread message

Kevin Burke

unread,
Jun 11, 2022, 10:27:21 AM6/11/22
to golang-nuts
Hi,
Recently I inherited some code in production that was hitting an error case that I didn't think should be possible to hit. I reduced it down to this test case. To be clear, there are several different ways to improve the code here, but I'd like to understand why it's behaving the way it does first. 

You should be able to just do "go test ." here to reproduce the error: https://github.com/kevinburke/sync-cond-experiment

What the code is doing:
  • Multiple different goroutines are taking a sync.Cond lock and then appending data to a shared buffer.
  • A "flush" goroutine calls sync.Cond.Wait() to wait for an incoming signal that data has been appended
  • Each goroutine that appends to the buffer calls Signal() after the write, to try to wake up the "flush" goroutine
expect that the flush goroutine will wake up after each call to Signal(), check whether the batch is ready to be flushed, and if not go back to sleep. 

What I see instead is that lots of other goroutines are taking out the lock before the flush goroutine can get to it, and as a result we're dropping data.

I didn't expect that to happen based on my reading of the docs for sync.Cond, which (to me) indicate that Signal() will prioritize a goroutine that calls Wait() (instead of any other goroutines that are waiting on sync.Cond.L). Instead it looks like it's just unlocking any goroutine? Maybe this is because the thread that is calling Signal() holds the lock?

Thanks for your help,
Kevin

Sean Liao

unread,
Jun 11, 2022, 12:36:59 PM6/11/22
to golang-nuts
sync.Cond does not affect goroutine scheduling priority, and Signal only makes the waiting goroutine available to be scheduled, but not force it to be.
After a Signal() (and Unlock()), every other waiting worker and the flusher then contends (fairly) for the lock.
What you want appears a better fit for either channels (send everything to the flusher) or just inlining the check+flush logic into writeEvent,
essentially a proper serialization of events

See also: https://github.com/golang/go/issues/21165 
> On top of that, condition variables are fiendishly difficult to use: they are prone to either missed or spurious signals [citation needed]

- sean


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/d7e17b2f-159c-4124-a023-eb2cdb8ba423n%40googlegroups.com.

Kevin Burke

unread,
Jun 12, 2022, 12:06:47 AM6/12/22
to golang-nuts
That's a very clear explanation, it's obvious what the problem is now. Thank you!

Brian Candler

unread,
Jun 12, 2022, 5:46:58 AM6/12/22
to golang-nuts
And just as an aside, I think you would be interested in the talk "Rethinking Classical Concurrency Patterns" by Bryan C. Mills
https://www.youtube.com/watch?v=5zXAHh5tJqQ

Kevin Burke

unread,
Jun 15, 2022, 12:41:18 AM6/15/22
to Brian Candler, golan...@googlegroups.com
I've added https://go-review.googlesource.com/c/go/+/412237 to try to make this obvious to more people.




You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/J-E3VR0UEYA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/8d7ff74a-a623-4cce-b5fc-870ea8c1495en%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages