Yet another stuttering question

Luca Tumedei

unread,

Apr 13, 2022, 5:41:40 AM4/13/22

to tlaplus

Hello everyone,

I'm a PHP developer that recently got very interested into TLA+, I'm trying to learn to use it and ran into a blocker while trying to write my first specification.

I'm trying to model a system composed of a Loop that will spawn workers to execute jobs; I'm not reinventing the wheel and would like to try my hand at writing the specification in TLC.

I've read the Practical TLA+ book and watched a number of videos online in my spare time, so my TLC is influenced by that.

The first version of the specification is simple enough: 1 Loop, 3 workers, a parallelism of 2.

I _think_ I got the part where the Loop starts 2 workers correctly, but now I'm stuck on a Stuttering error I cannot understand the cause of.

The Loop and Worker actions all are weakly fair, and it's my understanding the system should move on, but it doesn't. It might be glaringly obvious to experts, but I've been staring at it for too long to understand what is (not) going on.

Here is the spec:

---- MODULE loop_and_workers ----
EXTENDS TLC, Integers, FiniteSets

VARIABLES pc, started_workers, running_workers, work_done

Parallelism == 2
Workers == {"w1", "w2", "w3"}

vars == << pc, started_workers, running_workers, work_done >>
all_vars_but_pc == << started_workers, running_workers, work_done >>

NotStartedWorker == CHOOSE x \in Workers: pc[x] = "W_Not_Started"

Init ==
/\ started_workers = 0
/\ running_workers = 0
/\ work_done = 0
/\ pc = [self \in {"loop"} \cup Workers |->
CASE self = "loop" -> "L_Start"
[] self \in Workers -> "W_Not_Started"]

L_Start ==
/\ pc["loop"] = "L_Start"
/\ IF started_workers /= Cardinality(Workers) /\ started_workers < Parallelism
THEN LET worker == NotStartedWorker IN
pc' = [pc EXCEPT !["loop"] = "L_Start", ![worker] = "W_Work"]
/\ started_workers' = started_workers + 1
/\ running_workers' = running_workers + 1
/\ UNCHANGED work_done
ELSE pc' = [pc EXCEPT !["loop"] = "L_Wait"]
/\ UNCHANGED all_vars_but_pc

L_Wait ==
/\ pc["loop"] = "L_Wait"
/\ IF work_done /= Cardinality(Workers) /\ started_workers < Parallelism
THEN LET worker == NotStartedWorker IN
/\ pc' = [pc EXCEPT !["loop"] = "L_Wait", ![worker] = "W_Work"]
/\ started_workers' = started_workers + 1
/\ running_workers' = running_workers + 1
/\ UNCHANGED work_done
ELSE UNCHANGED vars

L_Complete ==
/\ pc["loop"] = "L_Complete"
/\ work_done = Cardinality(Workers)
/\ pc' = [pc EXCEPT !["loop"] = "Done"]

W_Work(self) ==
/\ pc[self] = "W_Work"
/\ work_done = work_done + 1
/\ pc' = [pc EXCEPT ![self] = "W_Exit"]

W_Exit(self) ==
/\ pc[self] = "W_Exit"
/\ running_workers' = running_workers - 1
/\ pc' = [pc EXCEPT ![self] = "Done"]

loop == L_Start \/ L_Wait \/ L_Complete
worker(self) == W_Work(self) \/ W_Exit(self)

Next == loop
\/ (\E self \in Workers: worker(self))

Spec == /\ Init /\ [][Next]_vars
/\ WF_vars(loop)
/\ \A self \in Workers: WF_vars(worker(self))

ParallelismRespected == running_workers <= Parallelism
====

And here is the output of the model checking:

Temporal properties were violated.

The following behavior constitutes a counter-example:

1: <Initial predicate>

/\ work_done = 0

/\ running_workers = 0

/\ pc = ( w1 :> "W_Not_Started" @@

w2 :> "W_Not_Started" @@

w3 :> "W_Not_Started" @@

"loop" :> "L_Start" )

/\ started_workers = 0

2: <L_Start line 23, col 5 to line 31, col 40 of module loop_and_workers>

/\ work_done = 0

/\ running_workers = 1

/\ pc = ( w1 :> "W_Work" @@

w2 :> "W_Not_Started" @@

w3 :> "W_Not_Started" @@

"loop" :> "L_Start" )

/\ started_workers = 1

3: <L_Start line 23, col 5 to line 31, col 40 of module loop_and_workers>

/\ work_done = 0

/\ running_workers = 2

/\ pc = ( w1 :> "W_Work" @@

w2 :> "W_Work" @@

w3 :> "W_Not_Started" @@

"loop" :> "L_Start" )

/\ started_workers = 2

4: <L_Start line 23, col 5 to line 31, col 40 of module loop_and_workers>

/\ work_done = 0

/\ running_workers = 2

/\ pc = ( w1 :> "W_Work" @@

w2 :> "W_Work" @@

w3 :> "W_Not_Started" @@

"loop" :> "L_Wait" )

/\ started_workers = 2

5: Stuttering

I will keep reading the group conversations and try to find a solution on my own.

Should that be the case, I will reply to it to provide my discoveries and solution.

Thanks in advance to anyone that took the time to read this or reply, much appreciated.

All the best,

Luca

Stephan Merz

unread,

Apr 13, 2022, 7:23:42 AM4/13/22

to tla...@googlegroups.com

Hello,

it may be helpful to know what property you are trying to verify.

However, I was curious why, say, w1 wouldn't be required to take a step in the final state of your execution. One would imagine that it should be able to perform W_Work, but in fact the definition of that action is

W_Work(self) ==
/\ pc[self] = "W_Work"
/\ work_done = work_done + 1
/\ pc' = [pc EXCEPT ![self] = "W_Exit"]

I am sure you meant to write

/\ work_done' = work_done + 1 \* mind the prime

Independently of this, I would recommend avoiding the CHOOSE expression in the definition of NotStartedWorker since CHOOSE is deterministic choice, but probably you'd rather run any worker that hasn't started yet. You could change the definition of L_Start as follows:

L_Start ==
/\ pc["loop"] = "L_Start"
/\ IF started_workers /= Cardinality(Workers) /\ started_workers < Parallelism

THEN \E worker \in Workers :

/\ pc[worker] = "W_Not_Started"
/\ pc' = [pc EXCEPT !["loop"] = "L_Start", ![worker] = "W_Work"]

/\ started_workers' = started_workers + 1
/\ running_workers' = running_workers + 1
/\ UNCHANGED work_done
ELSE pc' = [pc EXCEPT !["loop"] = "L_Wait"]
/\ UNCHANGED all_vars_but_pc

As a minor issue, note that the update of pc["loop"] in the THEN branch is a no-op and can be removed (same for action L_Wait).

In fact, the actions L_Start and L_Wait look very similar: it is not clear to me why you want to distinguish them (the information about the number of running workers is available separately from the pc value), and the action L_Complete can never be enabled because no action sets pc["loop"] to "L_Complete".

Hope this helps,

Stephan

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/6f31d511-b6cf-488b-ac30-bcfbb42bda85n%40googlegroups.com.

Luca Tumedei

unread,

Apr 14, 2022, 1:08:11 PM4/14/22

to tlaplus

Thanks Stephan for taking the time to read and reply.

I've read your answer again and again and am still wrapping my head around what "stuttering" means and how to debug it.

I've been using the VSCode extension to write and test the specification... and got different results.

This is the specification I ended up with:

---- MODULE loop ----
EXTENDS TLC, Integers, FiniteSets

VARIABLES worker_status, work_done, loop_done

Parallelism == 2
Workers == {"w1", "w2", "w3"}

WorkersCount == Cardinality(Workers)
Processes == {"loop"} \cup Workers
RunningCount == IF \E x \in Workers: worker_status[x] = "running"
THEN Cardinality({x \in Workers: worker_status[x] = "running"})
ELSE 0
ExitedCount == IF \E x \in Workers: worker_status[x] = "exited"
THEN Cardinality({x \in Workers: worker_status[x] = "exited"})
ELSE 0

vars == << worker_status, work_done, loop_done >>

Init ==
/\ worker_status = [x \in Workers |-> "not_started"]
/\ work_done = 0
/\ loop_done = FALSE

L_Start_Worker ==
/\ RunningCount < Parallelism
/\ \E worker \in Workers:
/\ worker_status[worker] = "not_started"
/\ worker_status' = [worker_status EXCEPT ![worker] = "running"]
/\ UNCHANGED << work_done, loop_done >>

L_Complete ==
/\ ExitedCount = WorkersCount
/\ loop_done' = TRUE
/\ UNCHANGED << worker_status, work_done >>

W_Work(self) ==
/\ worker_status[self] = "running"

/\ work_done' = work_done + 1

/\ worker_status' = [worker_status EXCEPT ![self] = "exiting"]
/\ UNCHANGED << loop_done >>

W_Exit(self) ==
/\ worker_status[self] = "exiting"
/\ worker_status' = [worker_status EXCEPT ![self] = "exited"]
/\ UNCHANGED << work_done, loop_done >>

loop == L_Start_Worker \/ L_Complete

worker(self) == W_Work(self) \/ W_Exit(self)

Next == loop
\/ (\E self \in Workers: worker(self))

\/ (loop_done = TRUE /\ UNCHANGED vars)

Spec == /\ Init /\ [][Next]_vars
/\ WF_vars(loop)
/\ \A self \in Workers: WF_vars(worker(self))

ParallelismRespected == RunningCount <= Parallelism

AllWorkDone == <>[](work_done = WorkersCount)
====

I've simplified it a bit and removed what redundancy there was.

Running the above specification with the ParallelismRespected invariant and AllWorkDone property does not fail in the TLA+ toolbox, but keeps failing with stuttering in VSCode with the following .cfg file:

```

INIT Init
NEXT Next

INVARIANT ParallelismRespected
PROPERTY AllWorkDone

```

The TLA+ Toolbox Model settings

I might be doing something different for the two environments not to match ... I will check more.

I'm posting this for my future reference and in hopes it could help someone else.

Markus Kuppe

unread,

Apr 14, 2022, 1:21:21 PM4/14/22

to tla...@googlegroups.com

Hi Luca,

replace `INIT Init NEXT Next` with `SPECIFICATION Spec`.

Markus

Luca Tumedei

unread,

Apr 15, 2022, 3:56:02 AM4/15/22

to tlaplus

Thanks Markus for your reply, that did it: now the specification is passing the model check without issues.

I'm good now to iterate over the specification and complicate it in incremental steps.

Luca

Reply all

Reply to author

Forward