Patterns for detecting data race with TLA+?

105 views
Skip to first unread message

Shuhao Wu

unread,
May 27, 2022, 5:13:46 PM5/27/22
to tlaplus
Hello all:

I'm currently working on some lock-less algorithms with TLA+
(specifically, with PlusCal). One of the things that needs to be checked
is that the algorithm is data-race-free, where no two processes (in
PlusCal terms) are reading and writing to the same variable at the same
time. Here's a sample spec with a data race on the Global2 variable:

variables Global1 = 0, Global2 = 0;

fair process A = "A"
variable local;
begin
A1:
local := Global1;
A2:
Global2 := 10;
end process;

fair process B = "B"
variable local;
begin
B1:
local := Global1;
B2:
local := Global2;
end process;

Checking if a data race occurred in TLC is technically as simple as
checking the invariant ~(pc["A"] = "A2" /\ pc["B"] = "B2"). However, I
suspect this doesn't scale very well to specifications with more
variables and states, as you need to manually enumerate every write (and
read) step for the all "global" variables.

Is there a known pattern somewhere to check for this class of problems?
My quick googling doesn't show anything. Am I approaching this problem
from an incorrect angle?

Thanks,

Shuhao

Hillel Wayne

unread,
May 27, 2022, 5:45:24 PM5/27/22
to tla...@googlegroups.com

Hi Shuhao,

Your spec doesn't necessarily mean the two variables are  reading and writing the same variable at the same time. Remember, each label represents an abstract amount of time: for all you know the two actions could be two years apart. It's a good proxy for bad behavior, but it's still an "indirect" property.

I find that "direct properties" scale better. What is the bad state that both processes writing at the same time would cause? If you check that as an invariant, race conditions will naturally appear in the error trace. You can then write indirect properties for the specific race conditions you see to make them trigger earlier in the model checking.

(One useful tool for this is Action Properties, which let you assert properties on how variables change in a step.)

H

Shuhao Wu

unread,
May 30, 2022, 12:59:08 PM5/30/22
to tla...@googlegroups.com
Thanks Hillel for the suggestion. Your blog post is also very
enlightening, especially on the forbidden state transition checks that I
also struggled with[1].

For my problem though, the race conditions that occur in real-life seem
difficult to model with TLA directly. For example, if I read/write to a
large (multi-word) variable in something like C/C++ without explicit
synchronization, it can result in a torn-read. Without synchronization,
the compiler may even re-order and/or optimize the statements in
unpredictable (undefined) manners. Since in TLA+, the variable values
are usually abstract values, it seems difficult to me to model the
actual "bad" event.

I suppose what I could do is to create an auxiliary variable that
increments and then decrements with each read and write operation, which
may require splitting the read/write operations into multiple steps for
the aux variable. Then I can use an invariant to assert that the aux
variable never increases above 1 to ensure synchronization is properly
implement. Not 100% sure if this is the simplest approach as I have yet
to test it, but it might be an acceptable work around for this purpose.

Thanks again,
Shuhao

[1] I've worked on an unfinished post detailing my journey on
discovering how to write temporaral properties that specifying
conditions like A leads to B and remains at B, or A leads to B at least
once. Once I get it done, I may ask for more feedback here.

On 2022-05-27 17:45, Hillel Wayne wrote:
> Hi Shuhao,
>
> Your spec doesn't /necessarily/ mean the two variables are reading and
> writing the same variable at the same time. Remember, each label
> represents an /abstract/ amount of time: for all you know the two
> actions could be two years apart. It's a good proxy for bad behavior,
> but it's still an "indirect" property.
>
> I find that "direct properties" scale better. What is the bad state that
> both processes writing at the same time would /cause/? If you check
> /that/ as an invariant, race conditions will naturally appear in the
> error trace. You can then write indirect properties for the specific
> race conditions you see to make them trigger earlier in the model checking.
>
> (One useful tool for this is Action Properties
> <https://www.hillelwayne.com/post/action-properties/>, which let you
> --
> You received this message because you are subscribed to the Google
> Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to tlaplus+u...@googlegroups.com
> <mailto:tlaplus+u...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com
> <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com?utm_medium=email&utm_source=footer>.

Calvin Loncaric

unread,
May 30, 2022, 4:46:25 PM5/30/22
to tla...@googlegroups.com
> What is the bad state that both processes writing at the same time would cause?

Normally this is a great idea, but as Shuhao points out, in C/C++ the data race itself is the bad state, since data races are undefined behavior. Even if the data race is benign (in the sense that it does not lead to any bad states in the TLA+ model) you would still want to know about it, since it could cause bad behavior in a real C/C++ implementation.

> splitting the read/write operations into multiple steps

Yes, that is exactly how you would check such a thing in TLA+. PlusCal does not provide a way to express the statement "process P is about to do a read/write to variable X". However, by splitting reads and writes into multiple steps, we can make it possible to write such a statement.

Here's a spec I wrote that uses that idea. It uses auxiliary variables "op" and "addr" to record what each process is about to do before it does it. The extra steps for each read/write do unfortunately increase the state space, but this feels like the right approach to me.


---- MODULE DataRace ----

EXTENDS Naturals, Sequences

(*

--algorithm my_processes

    variables
        mem   = <<0, 0>>,
        addr  = [p \in ProcSet |-> 0],
        op    = [p \in ProcSet |-> "none"],
        local = [p \in ProcSet |-> 0]; \* return value from read(a)

    define
        Addr1 == 1
        Addr2 == 2
    end define

    procedure read(a)
    begin
        start_read:
            addr[self] := a;
            op[self] := "read";
        do_read:
            local[self] := mem[a];
            op[self] := "none";
            addr[self] := 0; \* optional; reduces the state space
            return;
    end procedure

    procedure write(a, value)
    begin
        start_write:
            addr[self] := a;
            op[self] := "write";
        do_write:
            mem[a] := value;
            op[self] := "none";
            addr[self] := 0; \* optional; reduces the state space
            return;
    end procedure


    fair process A = "A"
    begin
        A1:
            call read(Addr1);
        A2:
            call write(Addr2, 10);

    end process

    fair process B = "B"
    begin
        B1:
            call read(Addr1);
        B2:
            call write(Addr1, 4);
    end process

end algorithm

*)

\* BEGIN TRANSLATION
\* END TRANSLATION

\* A data race happens when
\*   (1) there are two concurrent operations to the same address and
\*   (2) at least one of those operations is a write.
DataRace ==
    \E x, y \in ProcSet:
        /\ x /= y
        /\ op[x] /= "none"
        /\ op[y] /= "none"
        /\ addr[x] = addr[y]
        /\ op[x] = "write" \/ op[y] = "write"

NoDataRace == ~DataRace

====



To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/923db76b-4cb2-1256-9130-0a86641d6c6f%40shuhaowu.com.

Shuhao Wu

unread,
May 30, 2022, 8:57:15 PM5/30/22
to tla...@googlegroups.com
Thanks Calvin for the excellent example. You're right the state-space
explosion is a concern for TLC. My approach is also similar to yours,
with procedures and multi-steps. Except all I'm doing is incrementing
and decrementing a counter:

CONSTANT N
CONSTANT Uninitialized

(*--algorithm DataRaceDetection

variables
Memory = [i \in 1..N |-> Uninitialized],
MemAccCnt = [i \in 1..N |-> [reads |-> 0, writes |-> 0]],
ReadReturnValue; \* Return value of the Read procedure

procedure Write(idx, value) begin
mem_write1:
Memory[idx] := value;
MemAccCnt[idx].writes := MemAccCnt[idx].writes + 1;
mem_write2:
MemAccCnt[idx].writes := MemAccCnt[idx].writes - 1;
return;
end procedure;

procedure Read(idx) begin
mem_read1:
ReadReturnValue := Memory[idx];
MemAccCnt[idx].reads := MemAccCnt[idx].reads + 1;
mem_read2:
MemAccCnt[idx].reads := MemAccCnt[idx].reads - 1;
return;
end procedure;

fair process A = "A" begin
A1:
call Read(1);
A2:
call Write(2, 2);
end process;

fair process B = "B" begin
B1:
call Read(1);
B2:
call Read(2);
end process;

end algorithm; *)

WriteDataRace == \E i \in DOMAIN MemAccCnt: MemAccCnt[i].writes > 1
ReadDataRace == \E i \in DOMAIN MemAccCnt: MemAccCnt[i].writes >= 1 /\
MemAccCnt[i].reads >= 1

DataRace == WriteDataRace \/ ReadDataRace
\* Maybe better to check ~WriteDataRace and ~ReadDataRace in TLC for
more detailed err msg
NoDataRace == ~DataRace

This approach is a bit cumbersome (especially since procedures cannot
directly return a value), but should get the job done. It also is a bit
tedious to parse in the error trace, as variables setting up the
procedure can clutter up the error trace window a bit.

Thanks all,
Shuhao

On 2022-05-30 16:46, Calvin Loncaric wrote:
> > What is the bad state that both processes writing at the same time
> would /cause/?
>
> Normally this is a great idea, but as Shuhao points out, in C/C++ the
> data race itself /is/ the bad state, since data races are undefined
> <mailto:tlaplus%2Bunsu...@googlegroups.com>
> > <mailto:tlaplus+u...@googlegroups.com
> <mailto:tlaplus%2Bunsu...@googlegroups.com>>.
> <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com?utm_medium=email&utm_source=footer
> <https://groups.google.com/d/msgid/tlaplus/6c3f1b9d-7000-1c53-0087-f6882d8afdc9%40gmail.com?utm_medium=email&utm_source=footer>>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to tlaplus+u...@googlegroups.com
> <mailto:tlaplus%2Bunsu...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tlaplus/923db76b-4cb2-1256-9130-0a86641d6c6f%40shuhaowu.com
> <https://groups.google.com/d/msgid/tlaplus/923db76b-4cb2-1256-9130-0a86641d6c6f%40shuhaowu.com>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tlaplus" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to tlaplus+u...@googlegroups.com
> <mailto:tlaplus+u...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tlaplus/CABf5HMj4YG91_qSGjgqCqtpGqt1GwuH9WJTAQQ879xAy7wWrfw%40mail.gmail.com
> <https://groups.google.com/d/msgid/tlaplus/CABf5HMj4YG91_qSGjgqCqtpGqt1GwuH9WJTAQQ879xAy7wWrfw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
Reply all
Reply to author
Forward
0 new messages