playground - time.Sleep causes deadlock


robert engels

Aug 31, 2019, 6:13:26 PM
to golang-nuts
The code at https://play.golang.org/p/9ZdPVvwyaYK

should not deadlock. But it reports:

fatal error: all goroutines are asleep - deadlock!

But if you look at the stack traces, you see multiple routines (sampled) like:

goroutine 1 [sleep]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:307
time.Sleep(0x3e8, 0x0)
	/usr/local/go/src/runtime/time.go:105 +0x1c0
main.main()
	/tmp/sandbox327672725/prog.go:81 +0x140

goroutine 5 [sleep]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:307
time.Sleep(0x1, 0x0)
	/usr/local/go/src/runtime/time.go:105 +0x1c0
main.producer(0x430150, 0x40a0e0)
	/tmp/sandbox327672725/prog.go:49 +0x60
created by main.main
	/tmp/sandbox327672725/prog.go:73 +0xe0

These routines are asleep, not blocked. They will wake from sleep (eventually), so there cannot be a deadlock.

I am assuming this is a limitation of the Go playground back-end, but I can't see it documented anywhere. Also, I'm not sure why this limitation would exist, since if all the routines are asleep or blocked, the program is consuming no resources.

Ideas?

burak serdar

Aug 31, 2019, 6:18:42 PM
to robert engels, golang-nuts
On Sat, Aug 31, 2019 at 4:13 PM robert engels <ren...@ix.netcom.com> wrote:
>
> The code at https://play.golang.org/p/9ZdPVvwyaYK
>
> should not deadlock. But it reports:

I had the same problem the other day. The code runs locally, but not on the playground. It also had a sleep in a goroutine.
> --
> You received this message because you are subscribed to the Google Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/C7B46199-05EA-47C1-9594-200E2DD36F99%40ix.netcom.com.

robert engels

Aug 31, 2019, 6:22:53 PM
to burak serdar, golang-nuts
Yes, the code runs fine locally. Reviewing the playground docs, it seems that at one point time.Sleep() was a no-op and that was later changed. I am thinking that in the course of that change, they forgot to update the 'deadlock' detector.

andrey mirtchovski

Aug 31, 2019, 7:04:53 PM
to robert engels, burak serdar, golang-nuts
see the "faking time" section here: https://blog.golang.org/playground

not sure if anything has changed since that article

roger peppe

Sep 1, 2019, 4:06:50 AM
to robert engels, golang-nuts
The problem seems to be that you're using runtime.GOMAXPROCS. The fact that using GOMAXPROCS causes a deadlock is probably a bug, but it's an easy one to work around - just don't call it :)

FWIW here's a version of your code that I'd consider a little more idiomatic.

- it reports true when an operation succeeds rather than when it fails (think reading from a channel or from a map).
- it removes the unnecessary (and probably wrong) atomic load
- it allows direct access to the channel for reading (and select etc) - no need to hide it behind the Read method.
- mutex fields are very often named "mu"

  cheers,
    rog.


Robert Engels

Sep 1, 2019, 7:47:16 AM
to roger peppe, golang-nuts
As I pointed out in another reply, I am fairly certain the atomic operations must be valid in these cases due to the happens-before relationship between them and the mutexes.

I don’t agree that returning the channel is a proper design, it breaks the encapsulation. Technically mine could use multiple channels behind the scenes - yours cannot. 

robert engels

Sep 1, 2019, 8:06:03 AM
to roger peppe, golang-nuts
The point about 'mu' is valid. Rather than 'closed', it should really be an 'error', given the situation, as that is more applicable to operation failures (treating this as a black-box IO component).


Robert Engels

Sep 1, 2019, 9:20:11 AM
to roger peppe, golang-nuts
Btw, removing the GOMAXPROCS call causes things to execute serially - which is fine according to the spec - but it does make demonstrating certain concurrency structs/problems pretty difficult.

roger peppe

Sep 2, 2019, 12:17:26 PM
to Robert Engels, golang-nuts

> Btw, the removing of the GOMAXPROCS causes things to execute serially- which is fine according to the spec - but it does make demonstrating certain concurrency structs/problems pretty difficult.

Unless something's changed recently, all programs in the playground execute without parallelism, so setting GOMAXPROCS shouldn't have any effect. The fact that it does have an effect appears to be a bug. I'd suggest reporting it as an issue.

> As I pointed out in another reply, I am fairly certain the atomic operations must be valid in these cases due to the happens before relationship of them and mutexes.

In that other reply:

> You can simply validate it by run: go run -race main.go for you program: https://play.golang.org/p/JRSEPU3Uf17
>
> Not true. The race detector does not detect certain cases and can overreport.

Firstly, AFAIK there is no implied happens-before relationship between a non-atomic write to a variable and an atomic read from a variable, because there is no synchronization event associated with the atomic read.

Secondly, I am aware that the race detector can give false negatives in some cases, but it should not provide false positives AFAIK. There's only one such case that I'm aware of, and that's quite a different issue.

Given that the race detector reports a race in this very clear and simple case, ISTM that your program is wrong.

From https://golang.org/ref/mem:

> If you must read the rest of this document to understand the behavior of your program, you are being too clever.

Don't be clever.

Also, there's really no point in the atomic read. You could implement the Read method as follows:

func (c *MultiWriterIntChan) Read() (int, bool) {
	x, ok := <-c.ch
	return x, ok  
}

That is, let the close state flow naturally downstream via the channel. That way it will still work correctly if you start to use a buffered channel, for example.

> I don’t agree that returning the channel is a proper design, it breaks the encapsulation. Technically mine could use multiple channels behind the scenes - yours cannot.

There's a trade-off here. By encapsulating in a method, you make it harder to wait for values in combination with other events (you'd need to create a goroutine to call your Read method and send the result to a channel, adding buffering, context switch overhead, and more state that needs to be explicitly shut down). 


robert engels

Sep 2, 2019, 4:32:45 PM
to roger peppe, golang-nuts
(your comments were quoted, but I think I am replying correctly).

The memory barriers provided by the write lock and its release will force a flush to memory of all mutations made before the release (including the 'closed' mutation), meaning the atomic read will read the correct value.

There is a chance that a fully specified memory model could make that not be the case, but under the current de-facto memory model it is going to hold on any CPU architecture I'm aware of. Some transactional memory systems might be able to avoid the flush on 'closed', but that's unlikely.

I would welcome you writing a test case to demonstrate this not being the case and failing.

The reason the race detector reports a race is that it expects all accesses to be guarded by the same guard; the Lock/Unlock must perform the same memory barriers on the SMP systems I'm aware of.

The comment on the better Read method is definitely the way to go. It still encapsulates the channel(s), and avoids any ambiguity in the memory model (while probably improving performance a bit).

Agreed that the encapsulation can cause issues like you describe, but the response was originally related to "priority channels", and I was demonstrating a technique (with additional channels behind the struct) that would allow easy implementation of priorities. But the code I provided certainly simplifies the case of multiple writers/readers with nondeterministic behavior/shutdown (without using panic/recover, which is probably easiest for the simple case).



roger peppe

Sep 2, 2019, 7:03:10 PM
to robert engels, golang-nuts


On Mon, 2 Sep 2019, 21:32 robert engels, <ren...@ix.netcom.com> wrote:
> (you’re comments were quoted, but I think I am replying correctly).
>
> The memory barriers provided by the Write Lock, and release will force the flush to memory of all mutations before the flush (the closed mutation) - meaning the atomic read will read the correct value.

In this case, I'm much more interested in writing Go code that's correct according to the spec than Go code that happens to work on some current architecture. Let's try to make code that's "obviously correct" rather than "not obviously incorrect". The memory model has no mention of "memory barriers" or "flushing".


> There is a chance that a fully specified memory model could make that not be possible, but in the current de-facto memory model it is going to be the case on any CPU architecture I’m aware of. Some transactional memory systems might be able to avoid the flush on the ‘closed’, but unlikely.
>
> I would welcome you writing a test case to demonstrate this not being the case and failing.

On a 32-bit architecture, the 64-bit write could easily be split across two non-atomic operations. AFAIK there's nothing about the atomic read that says it couldn't observe a partial write from one of the write operations inside the mutex. It may well be possible to write a test case demonstrating the issue on a 32-bit ARM machine. And even if it isn't, I would always flag code like this in a code review as being too subtle and prone to potential errors.


> The reason the race detector reports a race is because it expects all access to be guarded by the same guard, the Lock/Release must perform the same memory barriers on the SMP systems I’m aware of.

Yes. That's what the memory model expects, and how Go code should be written. If code fails when run with the race detector, it should be fixed, IMHO.

Robert Engels

Sep 2, 2019, 7:53:17 PM
to roger peppe, golang-nuts
As far as I know, you are attributing properties to the memory model that don't yet exist - they're working on it. Even "happens before" in relation to atomics is not yet defined. The position from the Go "team" so far has been: whatever behavior you see, that's the spec (in terms of the memory model).

Still, I would hope the memory model will cover this case, as it would (probably) avoid unnecessary atomic writes. If there were atomic booleans or bytes, they wouldn't be subject to tearing; maybe int32 on most platforms.

Still, like I said, if you want to add the atomic writes, go ahead. This is definitely safer. But I think when the full memory model spec arrives it won't be necessary.

Robert Engels

Sep 2, 2019, 9:00:39 PM
to roger peppe, golang-nuts
Actually, in thinking about this some more, the atomics are completely unnecessary. Your solution is still the best, but even without it the channels create a happens-before relationship between the write of closed, the channel close, and the read of closed. That is, for a channel to be reported as closed, all writes made prior to the close must be visible.