Best way to handle a probabilistic data structure.

thomas...@gmail.com

unread,

Oct 18, 2024, 11:31:46 AM10/18/24

to tlaplus

I wanted to simulate something like a Bloom Filter. For those unaware, a Bloom Filter is kind of like a set, but when looking up a value in a set, there's a risk of a false positive, but never a risk of a false negative.

I'm not terribly concerned with the actual implementation of this, so I just wanted to make an operator like "IsInFilter(value, filter)", where filter is actually just a set, and if value is in the set it might return TRUE or FALSE, but if it's not in the set it always returns FALSE.

I'm having a little trouble expressing the non-determinism with this. My initial implementation was something like this:

IsInSet(value, mySet) ==
\E result \in {TRUE, FALSE} :
(value \in mySet => result) /\ (value \notin mySet => FALSE)

But I think the existential quantifier in this case is tautological. Does anyone have any ideas on the best way of handling this?

thomas...@gmail.com

unread,

Oct 18, 2024, 12:17:05 PM10/18/24

to tlaplus

I just realized that this might be a malformed question; it probably makes sense to express this in the "next-state" level instead of the operator level.

Andrew Helwer

unread,

Oct 18, 2024, 1:16:40 PM10/18/24

to tlaplus

If you want to analyze the actual probability properties of something like this, you may be more interested in PRISM. It supports modeling both Discrete-Time Markov Chains (DTMCs) which have a probabilistic state transition out of every state, and a mixed nondeterminism/probabilistic transition model called a Markov Decision Process (MDP). I wrote about the differences between these two models here: https://ahelwer.ca/post/2020-09-11-probabilistic-distsys/

If you are more interested in exploring this problem with TLA+ I recommend taking a look at this spec authored by Markus Kuppe and Ron Pressler: https://github.com/tlaplus/Examples/tree/master/specifications/KnuthYao

Andrew Helwer

A. Jesse Jiryu Davis

unread,

Oct 18, 2024, 1:32:24 PM10/18/24

to tla...@googlegroups.com

Thomas, if you're interested in the actual probabilities (i.e. what percent of the time do you get false positives?) then I agree with Andrew's answer. If, on the other hand, you don't care about the probabilities, and you just want to specify a system that works correctly in the face of false positives, you might do something like this:

\* Some action that relies on the Bloom filter

Search(value) ==

\* Either the correct result or a false positive

\E bloomResult \in { (value \in mySet), TRUE} :

\* Do something with the result

latestResult' = bloomResult

This spec explores both the correct-result and false-positive paths.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/5f6efaa9-10bd-48a6-aa00-be4975d75741n%40googlegroups.com.

thomas...@gmail.com

unread,

Oct 18, 2024, 2:01:52 PM10/18/24

to tlaplus

I don't care about the probabilities, I just want to express that false positives are possible to ensure my model works correctly when dealing with them, so I don't think I need to use PRISM or anything like that. As I mentioned, I think trying to avoid any kind of "next-state" semantics on this was probably the wrong approach anyway.

Most of my model is using PlusCal so I think I can express what I need with an `either` or a `with`.

Reply all

Reply to author

Forward