TLC simulation mode


A. Jesse Jiryu Davis

Apr 13, 2026, 9:53:31 PM
to tla...@googlegroups.com
Hello. Igor Konnov wrote a terrific post. He shows that running TLC in simulation mode for a long time doesn't find bugs reliably; it's no substitute for exhaustive model checking. Markus wrote a very interesting comment on it. I'm bringing the discussion over here to the mailing list, where more TLA+ users might see it.

1. Markus wrote,

enumerating all successors is useful for more than just choosing the next step: TLC can also check invariants on all generated successor states, not only on the one that ends up being sampled. That is a meaningful benefit when the goal is to catch bugs, not just drive a walk.

That's interesting. I checked with Claude and confirmed against the source code: at each step of the simulation, TLC picks a random next action, then generates all the successor states that action can produce. It checks invariants on all of them, then picks one at random as the next state. So TLC checks invariants on a "wide path" through the state space.
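To make that loop concrete, here's a minimal sketch of the "wide path" walk in Python. This is my illustration of the behavior described above, not TLC's actual code; the toy `inc`/`dec` actions and the invariant are invented, and actions are modeled as functions from a state to its list of successors.

```python
import random

def simulate_trace(init, actions, invariant, max_len, rng):
    """One random walk: pick a random enabled action, enumerate ALL of its
    successor states, check the invariant on every one ("wide path"), then
    sample a single successor to continue the walk."""
    state = init
    checked = 0
    for _ in range(max_len):
        candidates = [(a, a(state)) for a in actions]
        enabled = [(a, succs) for a, succs in candidates if succs]
        if not enabled:
            break                        # deadlock: no action has successors
        _, succs = rng.choice(enabled)   # random action first ...
        for s in succs:                  # ... then check EVERY successor,
            assert invariant(s), s       #     not just the sampled one
            checked += 1
        state = rng.choice(succs)        # sample the next state
    return checked

# Toy "spec": a counter that can jump up by 1 or 2, or step down by 1.
inc = lambda x: [x + 1, x + 2] if x < 10 else []
dec = lambda x: [x - 1] if x > 0 else []
checked = simulate_trace(0, [inc, dec], lambda x: x <= 12, 50, random.Random(1))
```

The point of the sketch is the inner loop: per step, the invariant is evaluated on every generated successor, so a 100-step trace can check far more than 100 states.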

2. He also wrote:

for specs that explicitly model faults, it’s not too hard to correct for “faulty transitions dominate the search” by adjusting the probabilities of the fault actions themselves. That’s the approach we’ve used in practice; SIMccfraft.tla is one example (https://github.com/microsoft/CCF/blob/main/tla/consensus/SIMccfraft.tla). Recent TLC versions also have extended simulator coverage statistics (-Dtlc2.tool.Simulator.extendedStatistics=true), which make this kind of tuning easier to reason about.

It’s also worth mentioning that TLC supports reinforcement learning / Q-learning in addition to random walk. In our experience, though, getting Q-learning to work well is not especially straightforward, because picking the right variables for the hash/input space is non-trivial. Compared to that, manually adjusting action probabilities is much easier to understand and explain. We’ve also found Q-learning’s wall-clock performance to be significantly worse than random simulation with hand-tuned action probabilities.

IIUC, a human spec author can tune the action probabilities to drive the system into interesting states more often (see the file he linked for example).

It's also possible to use Q-learning: you define a "reward" function that says how interesting a state transition is, and TLC learns to favor the states or actions that led to high-reward transitions. There are only two granularities: favoring states (probably too fine-grained) or favoring actions (probably too coarse). Q-learning appears to be an undocumented, experimental feature.
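To illustrate the coarser "favor actions" granularity, here is a toy bandit-style sketch in Python. This is my own simplification of the idea, not TLC's implementation; the `up`/`down` actions and the reward function are invented.

```python
import random

def q_walk(init, actions, reward, steps, alpha, eps, rng):
    """Learn one weight per ACTION from transition rewards, then bias
    action selection toward high-reward actions (epsilon-greedy)."""
    q = {name: 0.0 for name, _ in actions}
    state = init
    for _ in range(steps):
        candidates = [(n, a(state)) for n, a in actions]
        enabled = [(n, s) for n, s in candidates if s]
        if not enabled:
            break
        if rng.random() < eps:                             # explore
            name, succs = rng.choice(enabled)
        else:                                              # exploit best action
            name, succs = max(enabled, key=lambda p: q[p[0]])
        nxt = rng.choice(succs)
        q[name] += alpha * (reward(state, nxt) - q[name])  # running estimate
        state = nxt
    return q

# Toy example: reward any transition that increases the state.
actions = [("up", lambda x: [x + 1]), ("down", lambda x: [x - 1])]
q = q_walk(0, actions, lambda s, t: 1.0 if t > s else 0.0,
           steps=2000, alpha=0.1, eps=0.2, rng=random.Random(7))
```

After the walk, `q["up"]` ends up well above `q["down"]`, so the walk keeps drifting toward high-reward transitions. The coarseness is visible here: the weight attaches to the action name, regardless of which state it fired in.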

Markus LMK what I got wrong. =)

Igor Konnov

Apr 14, 2026, 3:28:35 AM
to tla...@googlegroups.com
Thanks Jesse!

Just to clarify for future readers who look at the figures without going through the rest of the text: the figures in the blog post are not from TLC simulations. The experiments are closer to Quint's random simulator. It would be interesting to see how TLC's random simulation behaves w.r.t. state coverage on these standard benchmarks.

Q-learning is an interesting direction, for sure. The main issue with
it is that it optimizes towards a single direction (dynamically
computed via the rewards). This is somewhat in conflict with
non-determinism in concurrent and distributed systems.


Cheers,
Igor

Ovidiu Marcu

Apr 14, 2026, 8:08:53 AM
to tla...@googlegroups.com
Thanks for these highlights.
Would it be possible to pre-generate all states in memory/on disk (maybe get a count) before running any model checking or simulation?
For a few trillion states I would be motivated to explore those alternatives.

Ovidiu

A. Jesse Jiryu Davis

Apr 14, 2026, 9:21:14 AM
to tla...@googlegroups.com
Thanks Igor, I missed that sentence in your post: "In the experiments, I am using a custom framework to represent the above specifications-as-code that makes it easy to experiment with different search procedures."

Ovidiu, I've experimented with full state-space dumps. You can do it with the "dot" format for GraphViz and parse that, but it's a hack.  Will Schultz's fork of TLC can dump the full state space as JSON. Install it with this script and run it like:

```
java -jar dist/tla2tools.jar YourSpec.tla -dump json YourSpec
```

Outputs:
- YourSpec-states.json --- one entry per reachable state with fingerprint fp and variable values val.
- YourSpec-edges.json --- one entry per transition with from, to, and action metadata.

It also outputs an empty JSON file YourSpec.json, I guess that's a bug in Will's code. There's a discussion about officially adding this feature to TLC.
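If you end up consuming those dumps, the loader can be tiny. Here's a sketch in Python that assumes each file is a JSON array and uses the field names from the description above (`fp`, `val`, `from`, `to`); the exact top-level layout is my assumption, so check the real dump format before relying on it.

```python
import json

def load_state_graph(states_path, edges_path):
    """Read the JSON state-space dump (assumed layout: one JSON array per
    file) into a fingerprint -> value map and a list of (from, to) edges."""
    with open(states_path) as f:
        states = {s["fp"]: s["val"] for s in json.load(f)}
    with open(edges_path) as f:
        edges = [(e["from"], e["to"]) for e in json.load(f)]
    return states, edges
```

From there, the distinct-state count is just `len(states)`, and the edge list feeds directly into GraphViz or a graph library.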

Ovidiu Marcu

Apr 14, 2026, 12:32:07 PM
to tla...@googlegroups.com
Thanks Jesse, very helpful.

One note: I had to comment out temporal properties/liveness checking to avoid an issue. I got some performance improvements; see attached (+1 Claude).

Ovidiu

JsonStateWriter.java

Igor Konnov

Apr 15, 2026, 3:50:15 AM
to tla...@googlegroups.com
I thought of repeating the experiments with TLC and TLC -simulate.
However, simulate does not report distinct states. It only counts the
generated states:

Starting... (2026-04-15 09:43:16)
Computed 1 initial states...
Finished computing initial states: 1 distinct state generated at
2026-04-15 09:43:16.
Progress: 2067418 states checked, 10000 traces generated (trace
length: mean=100, var(x)=0, sd=0)
The number of states generated: 2067418
Simulation using seed 3886328521748051410 and aril 0
Finished in 03s at (2026-04-15 09:43:19)

Is there an easy way to count how many distinct states the simulator visited?

Best,
Igor


Markus Kuppe

Apr 16, 2026, 1:35:28 PM
to tla...@googlegroups.com
You can get distinct state counts with a JVM system property and a small amount of TLA+.

Step 1: Enable extended statistics

Pass `-Dtlc2.tool.Simulator.extendedStatistics=true` to the JVM.

`java -Dtlc2.tool.Simulator.extendedStatistics=true -jar tla2tools.jar -simulate num=10000 MySpec`

This uses a HyperLogLog (constant space, approximate count) per worker to track distinct states. If you want exact counts and don't mind even more overhead, also pass `-Dtlc2.tool.Simulator.extendedStatistics.naive=true`.
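For intuition about why a HyperLogLog counter is constant space: it keeps a small fixed array of registers, each recording the maximum run of leading zero bits seen among the hashed states routed to it, and derives a cardinality estimate from those maxima. A toy sketch in Python (illustrative only, not TLC's implementation):

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog: m = 2^p registers, each storing the maximum "rank"
    (position of the leftmost 1-bit in the hash remainder) observed.
    Space stays fixed no matter how many items are added."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(repr(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)               # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining bits give the rank
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:      # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw
```

Because registers only keep maxima, re-visiting the same state never inflates the estimate, which is exactly the property you want when counting distinct states along overlapping random walks.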

Step 2: Read the count via TLCGet("stats").distinct

With extended statistics enabled, TLCGet("stats").distinct returns the number of distinct states seen (instead of -1). You can use _PERIODIC to monitor it during simulation and POSTCONDITION to report it at the end.

Add to your spec:

```tla
SimStats ==
    LET s == TLCGet("stats")
    IN  PrintT(<<"traces", s.traces,
                 "generated", s.generated,
                 "distinct", s.distinct>>)
```

Both need to be declared in the .cfg file:

```
_PERIODIC SimStats
POSTCONDITION SimStats
```

This will print something like the following at every progress interval and once at the end.

<<"traces", 10000, "generated", 2067418, "distinct", 8374>>

Caveats: The distinct count is per worker thread, not globally deduplicated across workers. With -workers 1 (the default) this doesn't matter. With multiple workers, each maintains its own counter. The HyperLogLog approximation is usually within a few percent of the true count. Use the naive flag if you need exactness.

M.

PS: TLCGet("stats") also includes per-action counts, per-variable distinct values, and trace length statistics. I've used this (https://github.com/microsoft/CCF/issues/6537) to serialize the full record to ndjson and plot coverage over time.