Inquiry on TSAN implementation and memory model for Golang

14 views
Skip to first unread message

Shi Zheng

unread,
Oct 22, 2024, 6:45:03 AM10/22/24
to thread-s...@googlegroups.com, Andreas Pavlogiannis, Umang Mathur, yvan...@cs.au.dk
Dear TSAN Developers,

I hope this message finds you well. We are a research team from National University of Singapore and Aarhus University, and we are currently exploring the Go memory model and the TSAN race detector. We have encountered some issues regarding these topics and would greatly appreciate your insights.

  1. First, we understand that the Go memory model defines an execution as racy if there are two conflicting events that are unordered by the Happens-Before (HB) relation. However, we are curious why the memory model doesn't define an execution as racy, if two conflicting events occur simultaneously in it. While these two definitions may be equivalent in shared memory communication, we discovered an example program (see attached figures and description in the next paragraph) where a racy execution can occur according to the HB definition, but not under the alternative definition. In fact, the entire program is considered racy only with respect to the HB definition.

    In the attached example program, we can generate an execution where the Go memory model reports a race between the write and read of variable x (specifically, at lines 11 and 18 of the unsound.go program). We confirmed this behaviour by using TSAN (Go version 1.22.7) and manually injecting time delays using the time.Sleep() function. However, it’s important to note that the program can never generate an execution in which these two conflicting events occur simultaneously. This is because the read of x at line 11 can only occur if the receive operation at line 10 successfully receives the value 2. Given that the send operation at line 16 is ordered before line 17, there must be another receive operation that pops the value 1 out of the channel. Consequently, the receive operation at line 19 must have executed, which implies that the write at line 18 has also been executed. That is, in any executions where both the write and read to x occur, the write is always ordered before the read.

    We understand that in order to lower the runtime overhead, TSAN is not designed in the most predictive way and a better race detector may discover this problem and not report it as a race. However, if our observation is correct, this is also a race according to the current Go memory model, which may imply that Go memory model can report a non-racy program as racy. 

  2. Additionally, we also have a question regarding the HB edges defined for channels. Since Go appears to use mutexes/locks for channel operations, why does the Go memory model synchronize only accesses to the same slot of the channel buffer, rather than synchronizing all pairs of accesses on the same channel? If one synchronizes all accesses to the channel, then the w(x) and r(x) would not be regarded as races, which solves the problem. 

We would be grateful for any clarification you could provide on these matters. Thank you for your time and assistance!

Best regards,

Zheng Shi

exec.png
unsound.go
unsound-with-delay.go

Dmitry Vyukov

unread,
Oct 22, 2024, 10:24:32 AM10/22/24
to Shi Zheng, thread-s...@googlegroups.com, Andreas Pavlogiannis, Umang Mathur, yvan...@cs.au.dk
Hi Zheng,

The "why" question may be better answered on the golang-dev@ mailing
list. TSAN is just a tool that checks the existing memory model, it
does not define/introduce rules.

By "simultaneous" model you seem to refer to the so-called
"sequentially consistent" model. It's commonly rejected by all
practical languages (C, C++, Go, Java, LKMM) b/c it's too expensive to
implement (will require too much unneeded synchronization).

It's not a problem with HB model per-se. HB model can handle that as
well if we add enough HB edges between everything (e.g. what you
mention in bullet 2). So it's only a question of what exactly HB edges
are defined by a memory model. Sequentially consistent MM can also be
expressed as HB model.

Re bullet 2. There are 2 levels: external guarantees vs current
implementation details. It's generally a bad idea to provide external
guarantees for all current implementation details. E.g. you promised
users that channels provide sequentially consistent guarantees, but
tomorrow we want to optimize channels and replace mutexes with
finer-grained atomic operations, but now we can't because we
over-promised things (you can never remove/weaken guarantees b/c it
will break existing correct programs).
Reply all
Reply to author
Forward
0 new messages