Busy synchronization wait seems to behave differently on 1.13 and 1.14


Groups Discussion

Jul 25, 2020, 3:38:29 AM
to golang-nuts
Hi all,

While writing a stress test case for one of my apps I noticed a very strange thing: my test case works well on Go 1.14 but it doesn't work on Go 1.13.

I wrote a minimal reproducer


To make it work on Go 1.13 I have to add the sleep at line 41. In the Go Playground it times out without the sleep, even on 1.14.6. On real hardware I ran it 1000 times without the sleep: it had no issues on Go 1.14, while it fails every time on Go 1.13 (tested 1.13.12 and 1.13.14).

I'm just curious to understand whether there is something wrong with my code, even if it is not idiomatic, and whether such a busy synchronization wait is expected to work in 1.14 only.

Thanks

Martin Schnabel

Jul 25, 2020, 5:35:43 AM
to golang-nuts
I am not certain, but the reason is probably the change to goroutine
preemption in 1.14. From https://golang.org/doc/go1.14#runtime:


Goroutines are now asynchronously preemptible. As a result, loops
without function calls no longer potentially deadlock the scheduler or
significantly delay garbage collection. This is supported on all
platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.


Before that, busy loops in goroutines needed a function call to be
preemptible.
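
As a rough illustration of that pre-1.14 workaround, here is a minimal sketch of a busy wait that yields on every iteration. The original reproducer is not shown here, so the function name waitForWorkers and the counter are made up for this example; runtime.Gosched stands in for the sleep mentioned in the question.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// waitForWorkers (a hypothetical name) starts n goroutines that each
// increment a shared counter, then busy-waits until all of them have done so.
func waitForWorkers(n int32) int32 {
	var done int32
	for i := int32(0); i < n; i++ {
		go func() { atomic.AddInt32(&done, 1) }()
	}
	for atomic.LoadInt32(&done) < n {
		// Explicit yield: lets the scheduler run the workers even on
		// Go versions that cannot preempt a call-free loop.
		runtime.Gosched()
	}
	return atomic.LoadInt32(&done)
}

func main() {
	// A single P makes the dependence on yielding easy to observe.
	runtime.GOMAXPROCS(1)
	fmt.Println("workers finished:", waitForWorkers(4))
}
```

Because the loop yields by itself, it should not depend on how the runtime preempts goroutines.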

Hope that helps!
> --
> You received this message because you are subscribed to the Google
> Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to golang-nuts...@googlegroups.com
> <mailto:golang-nuts...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/0dbc0ba3-0b4d-42bd-8edf-076e92507251o%40googlegroups.com
> <https://groups.google.com/d/msgid/golang-nuts/0dbc0ba3-0b4d-42bd-8edf-076e92507251o%40googlegroups.com?utm_medium=email&utm_source=footer>.

Jesper Louis Andersen

Jul 25, 2020, 6:17:51 AM
to Groups Discussion, golang-nuts
This is probably due to improvements in preemption.

Garbage collectors often need some linearizable checkpoint (or an atomic commit point) where every CPU core agrees on a state. For instance, enabling a write barrier on the heap.

Back in the day, the checkpoint was reached only at communication points: channel operations, network communication, system calls and so on. In particular, a goroutine doing pure computation could defer the invocation of the checkpoint. That meant every other CPU core would hang and not do any productive work.

A later version of Go improved this. Every function call needs to check if the stack needs extension. By manipulating the extension point, the GC could signal that a checkpoint was needed: the stack extension check fails, and the goroutine enters the stack extension routine. But then it first checks if this is due to a GC signal. If it is, it enters the checkpoint.
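
A toy model of that idea, to make the control flow concrete. This is not the runtime's actual code: the type toyGoroutine, its fields, and the method names are all invented for illustration, and the real check lives in the compiler-generated function prologue.

```go
package main

import "fmt"

// toyGoroutine is a hypothetical stand-in for a running goroutine.
type toyGoroutine struct {
	preemptRequested bool // set by the "GC" to ask for a checkpoint
	checkpoints      int  // number of times a checkpoint was entered
}

// prologue models the check performed on function entry: if a checkpoint
// was requested, enter it and clear the request.
func (g *toyGoroutine) prologue() {
	if g.preemptRequested {
		g.preemptRequested = false
		g.checkpoints++
	}
}

// loopWithCalls runs n iterations, each of which enters a function and
// therefore passes through the prologue check.
func (g *toyGoroutine) loopWithCalls(n int) {
	for i := 0; i < n; i++ {
		g.prologue()
	}
}

// loopWithoutCalls runs n iterations of pure computation. It never reaches
// a prologue, so a pending request stays pending.
func (g *toyGoroutine) loopWithoutCalls(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

func main() {
	a := &toyGoroutine{preemptRequested: true}
	a.loopWithCalls(3)
	fmt.Println("with calls, checkpoints entered:", a.checkpoints)

	b := &toyGoroutine{preemptRequested: true}
	b.loopWithoutCalls(3)
	fmt.Println("without calls, request still pending:", b.preemptRequested)
}
```

The second case is the one that matters for this thread: a loop with no function calls never sees the request.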

With Go 1.14, preemption has been further improved to use OS signals. This means even loops with no function calls (such as the one you have gathering logins) can now be preempted.
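
A minimal sketch of the kind of loop this affects. It is not the original reproducer; the name spinUntilSet is made up, and the sketch assumes that the compiler treats the atomic load as an intrinsic rather than a real function call, which I believe holds on common platforms. With a single P, the loop is expected to finish on Go 1.14 or later and to spin forever on Go 1.13, or on newer versions when run with GODEBUG=asyncpreemptoff=1.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// spinUntilSet (a hypothetical name) starts a goroutine that sets a flag,
// then busy-waits on that flag with an empty loop body.
func spinUntilSet() int32 {
	var flag int32
	go func() { atomic.StoreInt32(&flag, 1) }()
	for atomic.LoadInt32(&flag) == 0 {
		// Deliberately empty: no function call and no yield, so the
		// setter can only run here if the runtime preempts this loop
		// or another P picks the setter up.
	}
	return atomic.LoadInt32(&flag)
}

func main() {
	// With one P the setter goroutine has nowhere else to run, so
	// progress depends entirely on the spinning goroutine being preempted.
	runtime.GOMAXPROCS(1)
	fmt.Println("flag:", spinUntilSet())
}
```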

Your example is the worst possible outcome. But there are other situations which are almost equally bad in production systems. You can have sudden productivity halts where the program isn't able to continue for several hundred milliseconds. These look like GC pauses, but whether they really are is a bit of a philosophical discussion, since they live in the limbo between GC and preemption.

As an interesting observation: functional languages use recursion for loops, so in theory they don't have to preempt loops, as every loop has a function call in it. However, many functional languages also compile tail calls into loops for efficiency reasons, so the world becomes a bit more blurry. The usual way to get good preemption is to check on memory allocation, which is common in functional languages, and is also the major reason they tend to use generational GCs with a two-space copying allocator in the young generation. However, the memory allocation check is also quickly becoming blurry, as you can often use escape/liveness analysis to move many of these allocations onto the stack.




--
J.