> the
one obvious question is how TLA ever escapes the while [true] loop to explore more states or terminate
Yep, the model checker "escapes" the loop, even though your program cannot "escape" the loop. Crazy, right? But: When the model checker visits a state it has seen before, it can safely abandon its current line of execution, even if the program being checked would continue running!
To understand the model checker better, think deeply about this very simple non-terminating program:
variable x = 1;
loop:
while (TRUE) {
x := IF x = 1 THEN 2 ELSE 1;
};
after:
x := -1;
It runs forever, but it only has two states:
/-----------\ /-----------\
| x = 1 | ----> | x = 2 |
| pc = loop | <---- | pc = loop |
\-----------/ \-----------/
(initial state)
When the model checker revisits the state (x=1 /\ pc=loop), it can stop because the only possible outcome is an infinite loop. It has already seen every possible state of the program. The program never actually "escapes" the loop: there are no reachable states where pc=after or x=-1. However, the model checker doesn't need to wait for the program to terminate before it can stop and tell you that it has seen all of the reachable states.
You might be concerned that stopping at duplicate states means the model checker has missed some possible behaviors. Don't worry, it hasn't missed anything. I'll try to respond to some concerns:
*** The example program runs forever. Therefore, isn't it an infinite-state program?
No. It can take arbitrarily many steps, but along the way it only visits a finite number of states. If you replace the body of the loop in my example program with x := x + 1; then the program becomes an infinite-state program, and the model checker will not terminate. Note Stephen's comment: "The power of model checking is that it lets you explore even infinite executions over finite state spaces."
*** If my program is nondeterministic or multi-threaded, isn't it wrong to stop at an already-visited state? After all, a nondeterministic program might do something different on the next iteration.
Great question. Don't worry: the model checker still won't miss any states. While stepping through the loop body, the model checker queues all nondeterministic behaviors to "visit later". Stephan's code shows how this queue works. The model checker will "abandon this line of execution"... but then it goes back to its queue to look at other possible executions.
If you like thinking in terms of invariants, the model checker's invariant is that all reachable states of the program are either
- in the visited set -OR-
- queued to be visited later -OR-
- reachable from some queued state.
Therefore, when the queue is empty, all reachable states have been visited. :)
*** I still think the model checker missed something, because my program has billions of possible behaviors but TLC reported only "278 states generated, 178 distinct states found".
It's possible that there is a bug in your program, and that some states you thought were reachable... aren't. If you can describe an interesting state, you can check an invariant like "~ Interesting" to see if the model checker can find such a state. A violation of "~ Interesting" corresponds to a trace where an interesting state was reached.
For what it's worth: I think 178 is a reasonable number of distinct states for your program, even if there are indeed billions of ways to reach those states. Think deeply about the difference between the number of reachable states and the number of paths through the state graph (i.e. behaviors).
More concretely, it can be hard to predict the effect of "diamonds" in the state graph. For example, the programs [put(1, 100); put(1, 200);] and [put(1, 101); put(1, 200);] arrive at the same final state:
(empty)
/ \
/ \
{1:100} {1:101}
\ /
\ /
{1:200}
As a result, with one single-threaded client I expect the number of states in your spec would be roughly (# states of the key-value store) * (# of labels in the client program). With 2 keys and 2 values, I think (# states of the key-value store)=9, so 178 states is pretty reasonable.
Stephan simplified reality slightly, I think. TLC will explore all possible states, and it constructs a state graph that captures all possible sequences of operation calls. It does this without actually "exploring" all possible sequences of operation calls (most of the time).
Fortunately, that state graph is all it needs to check correctness properties.
You should actually be more impressed now! TLC verifies your program over all possible sequences of operation calls without actually "exploring" all possible sequences of operation calls. :)