Is Dempster-Shafer/probabilistic reasoning possible in TLA+?


Matthew Thompson

Aug 29, 2025, 10:06:22 AM
to tlaplus
Hello group,
 
I am new to formal methods and TLA+. I have followed Leslie Lamport's excellent and beautifully idiosyncratic tutorial, and I now think that what I am trying to achieve is not actually possible with TLA+.

My objective is to learn whether probabilistic reasoning or something similar can be done with formal methods. Would anybody be able to point me to resources where I can learn, from the ground up, how to do this?

Please excuse this email if it is out of scope for this group.

Many thanks,
Matthew Thompson



Andrew Helwer

Aug 29, 2025, 11:26:59 AM
to tla...@googlegroups.com
I don't know what Dempster-Shafer reasoning is, but for formalizing and model-checking probabilistic properties of state machines I wrote an introductory blog post you might like, covering Discrete-Time Markov Chains (DTMCs) and Markov Decision Processes (MDPs): How do you reason about a probabilistic distributed system?

The premier probabilistic model-checking platform is currently PRISM. The FizzBee specification language also checks DTMC properties. TLA+ itself does not easily check these properties. For info on TLA+ capabilities in this domain, see the talk Obtaining Statistical Properties via TLC Simulation from the 2022 TLA+ Conference by Jack Vanlightly and Markus A. Kuppe.

You might be interested in how these systems actually formalize and check probabilistic properties over state machines, in which case the best introduction I've found is Model Checking Meets Probability: A Gentle Introduction by Joost-Pieter Katoen. It covers DTMCs and MDPs, and also Continuous-Time Markov Chains (CTMCs). CTMCs are often used in queueing-theory analyses, with varying probability distributions generating arrival rates. One densely theoretical resource I've read on this topic is the textbook Performance Modeling and Design of Computer Systems: Queueing Theory in Action by Mor Harchol-Balter. The same author has published a newer textbook called Introduction to Probability for Computing which might be more approachable; I don't know, I haven't read it.

Andrew Helwer


Matthew Thompson

Aug 29, 2025, 11:38:34 PM
to tla...@googlegroups.com
Thanks for these links 🙏🏼

I've read your article and some of the PRISM tutorials. My initial read is that both take a frequentist view of probability; it'll take me a bit more time to absorb them and get to the dynamic/belief probabilistic aspects.

I thought I'd share more info on why I was asking about Dempster-Shafer. Dempster introduced, and Shafer refined, belief and plausibility functions, which take values in the range [0,1]; the full interval [0,1] itself represents no knowledge at all.

It was an advance on Shortliffe's certainty factors in expert systems, specifically the MYCIN system. To be fair, it looked like a nice development to help with the "penguins can't fly" non-monotonicity problem, apparently a significant part of what killed expert systems back in the day and took certainty factors down with them.

Here's an extract from Norvig's Paradigms of Artificial Intelligence Programming.


"Before MYCIN, most reasoning with uncertainty was done using probability theory. The laws of probability-in particular, Bayes's law-provide a well-founded mathematical formalism that is not subject to the inconsistencies of certainty factors. Indeed, probability theory can be shown to be the only formalism that leads to rational behavior, in the sense that if you have to make a series of bets on some uncertain events, combining information with probability theory will give you the highest expected value for your bets. Despite this, probability theory was largely set aside in the mid-1970s. The argument made by Shortliffe and Buchanan (1975) was that probability theory required too many conditional probabilities, and that people were not good at estimating these. They argued that certainty factors were intuitively easier to deal with. Other researchers of the time shared this view. Shafer, with later refinements by Dempster, created a theory of belief functions that, like certainty factors, represented a combination of the belief for and against an event. Instead of representing an event by a single probability or certainty, Dempster-Shafer theory maintains two numbers, which are analogous to the lower and upper bound on the probability. Instead of a single number like .5, Dempster-Shafer theory would have an interval like [.4,.6] to represent a range of probabilities. A complete lack of knowledge would be represented by the range [0,1]. A great deal of effort in the late 1970s and early 1980s was invested in these and other nonprobabilistic theories. Another example is Zadeh's fuzzy set theory, which is also based on intervals."

And here's a video that nicely shows how it differs from probability theory; the point that the mass for the car being *either* black or brown is not included in the mass of the car being *only* black or *only* brown really helped me understand further: https://youtu.be/51ssBAp_i5Y?si=aZEybAL5EOGrUqLD
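
As a sketch of that example in TLA+ (the numbers here are mine, not the video's): mass can sit on the whole set {black, brown} without committing to either colour alone, and belief/plausibility fall out as sums.

---- MODULE CarMasses ----
(* Dempster-Shafer masses for the car example, in integer percentage
   points so TLC can evaluate them. The numbers are illustrative. *)
EXTENDS Naturals

mBlack  == 40   \* mass on {black}
mBrown  == 30   \* mass on {brown}
mEither == 30   \* mass on {black, brown}: committed to neither alone

BelBlack == mBlack            \* Bel({black}): evidence forcing "black"
PlBlack  == mBlack + mEither  \* Pl({black}): everything not against "black"

\* The pair bounds the unknown probability: 40 <= P(black) <= 70.
ASSUME /\ mBlack + mBrown + mEither = 100
       /\ BelBlack <= PlBlack
====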

Matt 





A. Jesse Jiryu Davis

Aug 30, 2025, 2:29:12 AM
to tla...@googlegroups.com
That's very interesting, Matt.

I reviewed the shortcomings of TLA+ for probabilistic modeling and discussed some other formal languages here:

mohan radhakrishnan

Aug 30, 2025, 2:31:15 AM
to tla...@googlegroups.com
So many interesting resources.

Thanks


Rob Fielding

Aug 30, 2025, 5:51:35 AM
to tla...@googlegroups.com
I also tried to do TLA+ with probabilities, because I primarily wanted to model business processes. Markov chains don't quite work, as most state changes are not probabilistic. But an MDP does work, and it's amazing in a lot of ways. Messages passed between the MDP processes are effectively the same as rewards. It would be worth starting from scratch to build a system that models systems as communicating Markov decision processes. A normal Markov process assumes a single transition matrix exponentiated n times; but imagine a possibly different matrix on each step, where every row still sums to 1.0. That is what happens when you try to mix determinism and probability.

You either need a Markov decision process, where actors passing messages are the "rewards"; or you can use fully deterministic message-passing processes, where one of the actors is a random number generator, like /dev/random. In particular, a stochastic process where on each step you might have a DIFFERENT transition matrix whose rows all add to 1.0 seems the way to go. You can fruitfully combine determinism and randomness that way, i.e. some state changes from a.x > 0, or a.x <= 0, or a.x++, or a.ch[0] ? a.v; a.v++. And these are distinct from chance nodes, as seen in the MDPs used for reinforcement learning. A "bipartite" graph that alternates between deterministic changes and chance nodes like ((0.0 <= r0 < 0.3) /\ (x' = x+1)) \/ ((0.3 <= r0) /\ (x' = x-1)), i.e. chance nodes that update state based on a pseudo-random r0, is the heart of an MDP. That is HOW you get around Markov chains only changing on randomness: by also including deterministic and message-driven changes.
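
As a rough sketch, here is one such chance node in plain TLA+ (names are mine; TLA+ itself only records the nondeterminism, the 0.3/0.7 weights live outside the spec, and r0 is drawn in integer permilles since TLC has no reals):

---- MODULE ChanceNode ----
EXTENDS Integers

VARIABLES x, r0

Init == x = 0 /\ r0 \in 0..999   \* r0 stands in for a pseudo-random draw

Next ==
    /\ \/ /\ r0 < 300            \* "probability 0.3" branch: increment
          /\ x' = x + 1
       \/ /\ r0 >= 300           \* "probability 0.7" branch: decrement
          /\ x' = x - 1
    /\ r0' \in 0..999            \* redraw for the next step
                                 \* (bound x with a state constraint when model checking)
====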

Trying to represent business processes without chance nodes is hard. A Python library that implements the temporal operators "possibly", "eventually", and "always/necessarily" (the two kinds of box/diamond CTL logic), together with chance nodes, seems to be what you need. Describe a system in English to an LLM. If it makes a set of MDPs to model the system, then you can trivially generate state machine and interaction diagrams, line and pie charts, and CTL* temporal logic statements. Check it in as executable requirements that tell the LLM how to write the full code.

E.g. a "bakery" has processes "production", "truck", "store", and "customers". Product and money exchanges are handled with normal process-calculus message passing (e.g. "b.ch[0] ! msg" and "b.ch[0] ? b.v" to send and receive messages asynchronously). Deterministic and message-driven state changes are explicit; internal (unexplained) changes come from chance nodes.

For use with an LLM, it's better to represent this as a library in an existing language (e.g. Python) than to need a totally new compiler.


Matthew Thompson

Sep 1, 2025, 2:27:41 AM
to tla...@googlegroups.com
Thanks for the replies, all. There is a lot for me to think about.

For more context, I'm looking to model metacognition for a Master's project, and Dempster-Shafer is (in a way) a more tractable version of Bayes' theorem that allows for ignorance/uncertainty in evidence and outcome. To my understanding, the closest we get to ignorance in Bayes is a uniform prior.

I've gone back to what I'm thinking about modeling, especially the invariant(s) that I want to remain true. I endeavour to remain on point, correct, and not waffle!

For now, the belief (Bel(A)) and plausibility (Pl(A)) are calculated in a similar way to Self-Consistency (https://matt.thompson.gr/2025/06/26/ia-series-n-building-a.html), and ignorance comes from actual evals for a given domain.

Each hypothesis has a mass value, which can be thought of as something like a probability:

Mass of evidence supporting A: m(A)
Mass of evidence not supporting A:  m(¬A)
Mass of ignorance: m(Θ)

The masses are normalised to add to 1. 

Ignorance, m(Θ), is relatively static, as it comes from evaluations: it is the inaccuracy of a model in the given domain. E.g. if it is a maths question and the model scores 67% on maths questions, then the ignorance is 1 - 0.67 = 0.33.

The remaining mass (0.67) gets allocated according to the response distribution (effectively the Self-Consistency count):

- If 70% of responses support A: 

m(A) = 0.469 (70% of 0.67)
m(¬A) = 0.201 (30% of 0.67)
m(Θ) = 0.33
Pl(A) = 1 - Bel(¬A) = 1 - 0.201 = 0.799
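
As a sketch, here is the same arithmetic as a TLA+ module with masses scaled to integer permilles (x1000), since the TLC model checker only handles integers; the operator names are illustrative:

---- MODULE MassArithmetic ----
(* The worked example above in integer permilles (x1000). *)
EXTENDS Naturals

Accuracy  == 670                         \* model scores 67% on this domain's evals
Ignorance == 1000 - Accuracy             \* m(Theta) = 330

MassA    == (Accuracy * 700) \div 1000   \* 70% of responses support A: m(A) = 469
MassNotA == Accuracy - MassA             \* m(notA) = 201

BelA == MassA                            \* Bel(A) = 469
PlA  == 1000 - MassNotA                  \* Pl(A) = 1 - Bel(notA) = 799

ASSUME MassA + MassNotA + Ignorance = 1000
====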

In terms of invariants:

- Pl(A) ≥ Bel(A) (to keep rationality)
- m(A) + m(¬A) + m(Θ) = 1 (to ensure the mass is always normalized)
- Bel(A) ≥ min_belief ∧ Pl(A) ≥ min_plausibility (an operational threshold - could be reversed to find the point where there is enough evidence to meet a threshold)
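
And, again only as a sketch, those invariants as TLA+ state predicates over permille-scaled masses (variable and constant names are mine):

---- MODULE MassInvariants ----
EXTENDS Integers

CONSTANTS MinBelief, MinPlausibility
VARIABLES mA, mNotA, mTheta

BelA == mA
PlA  == 1000 - mNotA

Rationality   == BelA <= PlA                 \* Pl(A) >= Bel(A)
Normalization == mA + mNotA + mTheta = 1000  \* masses sum to 1
Operational   == BelA >= MinBelief /\ PlA >= MinPlausibility
====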

I hope that is clear; if it is, does it make sense to use TLA+?

To be clear, part of my motivation is being able to write an algo and have something outside of me (spell-)check it 😉. Another part is that I think the next step would be distributed decision making, and TLA+ seems like a solid base to build on.

Hope your weeks are starting well; thanks for reading,
Matt



Andrew Helwer

Sep 2, 2025, 5:59:14 PM
to tla...@googlegroups.com
You could use TLA+ for this, probably PlusCal if you are talking about an algorithm. For model checking, TLC (the TLA+ model checker) does not support non-integer numbers, so you could either divide your mass spectrum up into integral units or use dyadic rationals. The value you get from using the TLA+ model checker here is probably proportional to how much concurrency there is in your algorithm. A few years ago I evaluated the relative merits of using PlusCal vs. Python + Hypothesis for analyzing algorithms; you can read that post here. TLA+ would also allow you to prove your algorithm correct using TLAPS, although that is certainly not trivial, especially for reasoning about the real numbers. An extension to TLAPS handling reasoning over the reals was developed in a research project but has not (yet?) been merged into mainline TLAPS.
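
As a rough illustration of the integral-units idea (a sketch only; all names are mine), you could have TLC enumerate every split of the mass into percentage points and check your invariants across all of them:

---- MODULE MassCheck ----
EXTENDS Naturals

VARIABLES mA, mNotA, mTheta

\* Every way to split 100 percentage points across m(A), m(notA), m(Theta).
Init == /\ mA \in 0..100
        /\ mNotA \in 0..(100 - mA)
        /\ mTheta = 100 - mA - mNotA

Next == UNCHANGED <<mA, mNotA, mTheta>>  \* masses are static in this sketch

Rationality   == mA <= 100 - mNotA       \* Bel(A) <= Pl(A)
Normalization == mA + mNotA + mTheta = 100

Spec == Init /\ [][Next]_<<mA, mNotA, mTheta>>
====

With Rationality and Normalization listed as invariants in the model's config, TLC will enumerate all 5,151 initial states and check both predicates in each.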

Andrew Helwer
