Verifying concurrent data structures

Jones Martins

unread,

Dec 21, 2021, 6:29:17 PM12/21/21

to tlaplus

Hello, everyone

Since TLA+ can also verify (and prove) correctness of algorithms, protocols, etc., I wonder how to verify data structures. When modelling a sequential data structure compared to a concurrent data structure, how should one go about specifying it?

For example, let's say I implemented an abstract Set specification with "Add", "Remove" and "Contains" actions. How do I prove correctness? What about a concurrent Set?

Thanks

Jones

Calvin Loncaric

unread,

Dec 22, 2021, 12:28:36 PM12/22/21

to tla...@googlegroups.com

Here's how I approach this:

Write a spec for the abstract datatype (e.g. set) and then show that your implementation refines it.
Your spec for the abstract datatype should model the client in addition to the data structure. For a single-threaded data structure, clients serialize their calls; for a concurrent data structure, they do not.

A while back I did this for a concurrent bloom filter I wrote, which is a perfect example since you asked about concurrent sets.

Here's the spec for the abstract datatype:

https://github.com/Calvin-L/calbloom/blob/main/design/ConcurrentSet.tla

The three variables "pc", "values_to_insert" and "output" model the clients. Each client first picks a value to insert into the set (PickValue) then hands off control to the data structure to do the actual insert (DoInsert). The data structure returns output to the client in the "output" variable. Since inserts are modeled as a single action, this spec says that inserts should appear to happen atomically.

Here's the spec for the implementation:

https://github.com/Calvin-L/calbloom/blob/main/design/ConcurrentBloomFilter.tla

There's a wrinkle here: since the bloom filter doesn't store the set explicitly, it's not possible to write a refinement mapping. So instead, my implementation refines a different form of the set specification that only includes the log of inputs/outputs that clients observe. I maintain that this form is equivalent to the simpler form in my first link, although I haven't proven it.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/667639f2-93b3-4d8d-b99f-e99dec4cbae5n%40googlegroups.com.

Markus Kuppe

unread,

Dec 22, 2021, 1:10:44 PM12/22/21

to tla...@googlegroups.com

https://github.com/tlaplus/tlaplus/blob/master/tlatools/org.lamport.tlatools/src/tlc2/tool/fp/OpenAddressing.tla
is a spec of a concurrent, lock-free set (no deletion).

Markus

Jones Martins

unread,

Dec 22, 2021, 7:19:15 PM12/22/21

to tlaplus

Thank you, Markus for the example, and Calvin, for your examples and explanations!

I looked for data structure specs in the tlaplus/examples repo on Github, but I couldn't find any. Maybe I didn't search right?

Also, Markus, OpenAddressing.tla seems like a nice addition to the tlaplus/examples repository, don't you think?

Regards,

Jones

Dmytro Ivanov

unread,

Dec 23, 2021, 5:05:30 AM12/23/21

to tlaplus

I've implemented something similar to one of my employers:
- We had existing lockfree MPMC stack/queue/etc data structures in C.
- I've converted existing code to PlusCal almost 1-to-1 to best of my ability. There were some nitpicks regarding how PlusCal implements return values, so I took some liberties. But important part is from a view of sequential consistency, most if not all sequence points were preserved. That is a bit weak guarantee as obviously every C sequence point potentially compiles to multiple instructions, so one could go and model instructions directly if that is what required. But most lockfree algorithms are not relying on specific ordering of instructions between threads, they more relying on ordering of sequence points, so overall it's fairly ok.
- On top of that I've added a "test program" written also in PlusCal that emulates 2-3+ threads and tries to add/remove elements from the data structure, test program keeps a separate set of "correct" elements. And it compares reference to result of algorithm.
- Then I've limited generation counters (common feature in lockfree algorithms to resolve ABA) to something small like 100.
- Running TLC then finds all possible states, and proves that algorithm is correct, with exception of generation counter wrap around (if one thread is stuck and another wraps around generation counter, then you still arrive at ABA).

Generally it's great for sequential consistency code. My next goal was to try different relaxed memory models, but I've moved to other stuff ..

Jones Martins

unread,

Dec 23, 2021, 2:03:46 PM12/23/21

to tla...@googlegroups.com

Thank you, Dmytro!

Your insight about sequence points and sequentially consistency is something of which I should've reminded myself.

Regards,

Jones

--
You received this message because you are subscribed to a topic in the Google Groups "tlaplus" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tlaplus/4a8vbujvAZg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/d1e62261-5e38-47c8-95f0-ecb526f45bb8n%40googlegroups.com.

Reply all

Reply to author

Forward