TLA+ model of OneThirdRule algorithm from "A Reduction Theorem" paper


Evgeniy Shishkin

Apr 18, 2017, 9:04:39 AM
to tlaplus

Good day!

I have a question regarding a rather special TLA+ model from the paper "A Reduction Theorem for the Verification of Round-Based Distributed Algorithms" (2010)
by Mouna Chaouch-Saad, Bernadette Charron-Bost, and Stephan Merz.

I am trying to reproduce the results stated in the "verification with TLC" table on page 13.
For that reason I have tried to check the OneThirdRule model from the paper.

I have the following configuration file for the model:
CONSTANTS N = 3
SPECIFICATION Safety

My model-checking results are very different from those shown in the table (Fig. 3).
In particular, for N=3 I get the following result:
[shishkin@oberon model]$ tlc OneThirdRule.tla
TLC2 Version 2.09 of 28 January 2016
Running in Model-Checking mode with 3 workers.
Parsing file OneThirdRule.tla
Parsing file /home/shishkin/tla2tools/tla/tla2sany/StandardModules/Naturals.tla
Parsing file /home/shishkin/tla2tools/tla/tla2sany/StandardModules/FiniteSets.tla
Parsing file HeardOf.tla
Parsing file /home/shishkin/tla2tools/tla/tla2sany/StandardModules/Sequences.tla
Semantic processing of module Naturals
Semantic processing of module Sequences
Semantic processing of module FiniteSets
Semantic processing of module HeardOf
Semantic processing of module OneThirdRule
Starting... (2017-04-18 15:35:38)
Computing initial states...
Finished computing initial states: 1 distinct state generated.
Progress(3) at 2017-04-18 15:35:42: 478721 states generated (478,721 s/min), 1920 distinct states found (1,920 ds/min), 982 states left on queue.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 6.3E-10
  based on the actual fingerprints:  val = 4.3E-13
2448897 states generated, 4783 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 4.
Finished in 10s at (2017-04-18 15:35:49)

For N=3 it took 10 seconds on my Intel Core i7 with 3 TLC workers running, versus the 1.87 seconds (on a less powerful CPU) shown in the table. The numbers of states also differ.
For N=4 I gave up waiting after 16 minutes of checking time on the same machine.
Can anyone suggest why this is so? I have attached the model files and config.

Thanks a lot!


HeardOf.tla
OneThirdRule.cfg
OneThirdRule.tla

Stephan Merz

Apr 18, 2017, 12:59:11 PM
to tla...@googlegroups.com
Hi Evgeniy,

thanks for your interest in this paper. The modules that I used are attached to this message, and I think they are equivalent to what you have, modulo comments, renaming, and some auxiliary definitions. The difference in performance comes from the fact that I used a VIEW specification so that TLC identifies any two states that only differ in the value of the auxiliary (history) variable heardof, as indicated in the last sentence of section 4 of the paper (p.13): this variable is useful essentially for interpreting counter-examples displayed by TLC but does not affect the behavior of the algorithm. In particular, it is not referred to by any predicate that is used in the specification. I am sorry if the explanation/hint in the paper is a little cryptic.
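For concreteness, the VIEW mechanism amounts to projecting heardof out of the state that TLC fingerprints. A minimal sketch follows (the variable names other than heardof are illustrative assumptions; the attached modules are authoritative). In the module:

```tla
VARIABLES round, x, decide, heardof   \* heardof is the auxiliary history variable

(* A view that omits heardof: TLC then fingerprints identically any
   two states that differ only in the value of heardof. *)
View == <<round, x, decide>>
```

and in the TLC configuration file:

```
CONSTANTS N = 3
SPECIFICATION Safety
VIEW View
```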

Anyway, we only included these experiments in order to demonstrate the interest of the theorem shown earlier in the paper: they are extremely naive as far as model checking fault-tolerant algorithms goes. If you really want to learn about model checking this kind of algorithm, I suggest that you look at the recent work by Igor Konnov et al. [1].

Best regards,

Stephan


HeardOf.tla
OneThirdRule.cfg
OneThirdRule.tla

Evgeniy Shishkin

Apr 19, 2017, 5:24:00 AM
to tlaplus
Hello Stephan,

Thanks so much for your clarification! After excluding heardof from the state view, I got exactly the results stated in the table.


> Anyway, we only included these experiments in order to demonstrate the interest of the theorem shown earlier in the paper
Yeah.
As far as I understand, the main idea of the paper is that an algorithm expressed in the HO model can be model-checked with respect to
a "local" property in the coarse-grained model without losing any meaningful states compared to the fine-grained model.

Not sure if this forum is the right place for such a side note, but anyway:

As far as I understand, the HO model is for some reason especially well suited to consensus algorithms, but for some mutual exclusion algorithms it is not.
I was wondering whether the HO model could be extended with N, the number of alive processes as seen by the environment's fault detector
in the current round, as a parameter giving the model more expressive power.

It may resemble the RRFD model by Gafni, but it still abstracts the concrete set of suspected/alive processes into just a number.
Potentially this could lead to a bigger class of algorithms being checkable/provable in reasonable time with the help of an (extended) reduction theorem.
It would then sit somewhere in the middle between the rather concrete RRFD and the rather abstract HO model.

Maybe you have investigated something similar along this line of thought?


On Tuesday, April 18, 2017, at 19:59:11 UTC+3, Stephan Merz wrote:

Stephan Merz

Apr 19, 2017, 12:31:42 PM
to tla...@googlegroups.com
Hello again, Evgeniy,

On 19 Apr 2017, at 11:24, Evgeniy Shishkin <evgeniy....@gmail.com> wrote:

> Hello Stephan,
>
> Thanks so much for your clarification! After excluding heardof from the state view, I got exactly the results stated in the table.
>
>> Anyway, we only included these experiments in order to demonstrate the interest of the theorem shown earlier in the paper
> Yeah.
> As far as I understand, the main idea of the paper is that an algorithm expressed in the HO model can be model-checked with respect to
> a "local" property in the coarse-grained model without losing any meaningful states compared to the fine-grained model.

Exactly (but the theorem is independent of the verification technology and applies just as well to model checking as to deductive verification, where it leads to simpler proofs).

> Not sure if this forum is the right place for such a side note, but anyway:
>
> As far as I understand, the HO model is for some reason especially well suited to consensus algorithms, but for some mutual exclusion algorithms it is not.

Indeed, the Heard-Of computational model is intended for representing fault-tolerant distributed algorithms; it will not be appropriate for mutual exclusion algorithms.

> I was wondering whether the HO model could be extended with N, the number of alive processes as seen by the environment's fault detector
> in the current round, as a parameter giving the model more expressive power.

I don't understand this: first, HO has no notion of fault detector, but of course the Heard-Of set HO(p,r) can be seen as the complement of the set D(p,r) of faulty processes signaled to process p in round r in the RRFD model by Gafni. Thus, the number of processes that p considers alive in round r is precisely the cardinality of HO(p,r), so "extending" the model by this information would not add any information?
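Spelled out as a formula, with Π denoting the set of all N processes, the correspondence reads:

```latex
HO(p,r) \;=\; \Pi \setminus D(p,r)
\qquad\Longrightarrow\qquad
|HO(p,r)| \;=\; N - |D(p,r)|
```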


> It may resemble the RRFD model by Gafni, but it still abstracts the concrete set of suspected/alive processes into just a number.
> Potentially this could lead to a bigger class of algorithms being checkable/provable in reasonable time with the help of an (extended) reduction theorem.
> It would then sit somewhere in the middle between the rather concrete RRFD and the rather abstract HO model.
>
> Maybe you have investigated something similar along this line of thought?

No, I haven't. We restricted attention to the HO model of algorithms.

Regards,
Stephan




Evgeniy Shishkin

Apr 20, 2017, 11:29:37 AM
to tlaplus
Hello Stephan!

I am sorry for not being precise enough to communicate my idea clearly.
Lest you think my previous post is complete nonsense, I will try to describe the idea with greater clarity.

Just to put us on the same page: as I understand it, the HO model, among other things, gives us:
* the ability to abstract away process crashes and message losses, concentrating on the control flow of an algorithm using only
HO(p,r), the round number, and the set of received messages as parameters;
* a way to specify the communication predicate under which an algorithm is going to operate correctly (in both the safety and liveness sense).
  This predicate is defined over HO(p,r) and the fixed number of participants N.

The number N does not change during the system's lifetime. If some process dies, it will never be put into any "future" HO set, but N stays the same.
An algorithm in the HO model is expressed in such a way that having the HO set of a given round is sufficient to make a control decision.
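As an illustration of that point, the round step of a OneThirdRule-style algorithm can be sketched in TLA+ so that a process's next value depends only on what it heard from its HO set (a sketch with illustrative names; the real algorithm's tie-breaking rule among equally frequent values is omitted):

```tla
Vals(ho)    == {x[q] : q \in ho}                   \* values heard from ho
Freq(ho, v) == Cardinality({q \in ho : x[q] = v})  \* how often v was heard

Update(p, ho) ==
  /\ Cardinality(ho) > (2 * N) \div 3              \* heard from more than 2N/3
  /\ x' = [x EXCEPT ![p] =
             CHOOSE v \in Vals(ho) :               \* pick a most frequent value
               \A w \in Vals(ho) : Freq(ho, v) >= Freq(ho, w)]
```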

This HO-set abstraction removes the distinction between process crashes and late messages, and generally seems helpful.
But there are cases where one might wish for alternative models:

* By "squeezing" node failures, message losses, and even plain process silence into the HO set of a given round, we actually reduce the scope of "HO-compatible" algorithms.

For OneThirdRule it is enough to receive replies from more than 2N/3 of the processes, and you do not care about the others.
But for at least some distributed mutex algorithms, making them fault-tolerant requires more, since each process has to know why someone is keeping silent:
is it because the quiet process is still executing its critical section, or because it fail-stopped?

* An algorithm expressed in HO cannot adapt to a decreasing number of alive participants N.

For OneThirdRule to reach consensus, we need no fewer than 2N/3 replies. At some point the set of alive processes can become smaller than that, and
then the algorithm will not be able to reach consensus, despite the fact that it is actually achievable if only N could change "dynamically". We have a kind of
N/3-fault-tolerance, but this fault tolerance falls short of what is theoretically achievable (I guess that to reach a consensus we need at least 3 alive nodes, right?).

To overcome these difficulties, one might consider different round-based models.

One might try to extend the HO model with N(p,r), the number of processes for which p has not received a failure notice from a reliable fault detector (RFD).
So we move from HO(p,r) to {HO(p,r), N(p,r)}. Note that N(p,r) is a non-increasing function, and that it is certainly possible to have |HO(p,r)| <= N(p,r).
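For what it is worth, the intended constraints of this extended model could be written down as a TLA+ assumption roughly like this (all names here are hypothetical, not from the paper):

```tla
(* Hypothetical {HO(p,r), N(p,r)} model: NAlive(p, r) is the number of
   processes that the reliable fault detector has not reported failed
   to p by round r. *)
ExtendedModelAssumptions ==
  \A p \in Proc, r \in Nat :
    /\ Cardinality(HO(p, r)) <= NAlive(p, r)   \* hear at most from the alive
    /\ NAlive(p, r + 1) <= NAlive(p, r)        \* reliability: non-increasing
```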

If N = const, we have standard HO. If N = f(p,r), then hopefully such a model gives more expressive power
(having a chance to express distributed mutual exclusion, for example? That is just what comes to mind at the moment, but I believe there are more species to consider).

One could argue that the proposed hypothetical model is just RRFD, but I do not agree, because:
 * using a reliable fault detector instead of an unreliable one makes this model more practical, since, as far as I know,
FDs used in real systems try to mimic RFD semantics;
 * D(p,r) is a set, so an algorithm is obliged to work with particular process identifiers, while the proposed model abstracts them away;
 * N(p,r) is non-increasing for all p and r, while D(p,r) could be increasing on some interval, possibly making verification harder.

Considering that we have squeezed D(p,r) into a natural number N(p,r), and that the fault detector is reliable, there is hope that an algorithm expressed in such a model
could be model-checked/verified rather efficiently by a similar reduction principle (one would have to prove a corresponding "extended" reduction theorem).

Do you see any fallacies in my reasoning? :-)


On Wednesday, April 19, 2017, at 19:31:42 UTC+3, Stephan Merz wrote:

Stephan Merz

Apr 21, 2017, 11:03:14 AM
to tla...@googlegroups.com
Thank you very much for your explanation; I now understand better what you mean. Again, I have never considered any such extended model. I'd be delighted if an analogue of our "reduction" theorem for HO proved to be useful for your applications.

Best,
Stephan