randomized vs deterministic adversary

gabrie...@gmail.com

Jun 3, 2016, 5:41:58 PM
to PRISM model checker, jvc...@gmail.com
Hi all,
I have a question about the adversary generated for an MDP. To briefly explain the attached model: it has a clock that starts at time=0. Before the clock ticks for the first time, and thus before time becomes 1, the modeled system can non-deterministically choose none, either, or both of the actions GoTight_start and IncAlt_start.

I generate an adversary as follows:

    prism model.prism -pctl 'multi(Rmax=? [ C ], P>=0.87 [ G !failed ])' -exportadv r.adv -exportstates r.sta -exportlabels r.lab -s -lp

The adversary generated is randomized: with probability 0.928735 it will take actions IncAlt_start and GoTight_start before tick, and with probability 0.0712649 it will take action GoTight_start and then tick (i.e., no IncAlt_start before tick).

Next, I change the model so that it is not possible to choose IncAlt_start before tick, that is, enforcing the choice that the previous adversary makes with low probability. That is done by changing line 53
from:
    formula IncAlt_applicable = a < MAX_ALT_LEVEL;
to:
    formula IncAlt_applicable = time > 0 & a < MAX_ALT_LEVEL;

If I try to generate an adversary for this model, PRISM cannot find a solution.

Just to check, my understanding of the meaning of the multi-objective property I'm using is "What is the maximum reward that can be accumulated such that the probability of never failing is >= 0.87?"
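
In symbols, I read it as something like this (my own notation, so take it with a grain of salt):

    \max_{\sigma}\; \mathbb{E}^{\sigma}[\text{total reward}] \quad \text{s.t.} \quad \Pr{}^{\sigma}(G\,\neg failed) \ge 0.87

where \sigma ranges over the adversaries of the MDP.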

So, I have two questions:
1) why is PRISM generating a randomized adversary that includes the possibility of making a choice that would not satisfy P>=0.87 [ G !failed ] ?  (or perhaps I'm misinterpreting those results)
2) is it possible to force PRISM to generate a deterministic adversary? (I haven't looked into the theory behind the adversary generation, so I'm not sure if there is some fundamental limitation)

Thanks,
Gabe

model.prism

Matthias Güdemann

Jun 4, 2016, 4:35:01 AM
to PRISM model checker, jvc...@gmail.com, gabrie...@gmail.com
Hi Gabe,

the problem in the second model is that "Pmax=? [ G !failed ]" is approximately 0.839, i.e., no adversary with P>=0.87 exists. In the first model the maximal probability is just above 0.87, and therefore an adversary exists.
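
You can check this directly with a plain Pmax query, e.g. something like:

    prism model.prism -pctl 'Pmax=? [ G !failed ]'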


On Friday, June 3, 2016 at 11:41:58 PM UTC+2, gabrie...@gmail.com wrote:
> Just to check, my understanding of the meaning of the multi-objective property I'm using is "What is the maximum reward that can be accumulated such that the probability of never failing is >= 0.87?"

Yes, but as Pmax < 0.87, the property is not fulfilled by any adversary.

best regards
Matthias

gabrie...@gmail.com

Jun 5, 2016, 2:20:31 PM
to PRISM model checker, jvc...@gmail.com, gabrie...@gmail.com
Hi Matthias,
Thanks for replying.

That's exactly my point: it is necessary to take the IncAlt_start action when time=0 to satisfy P>=0.87. So, back to my question: why does the randomized adversary not take the IncAlt_start action when time=0, with probability 0.0712649? Doesn't that mean that, with probability 0.0712649, it won't be able to satisfy P>=0.87 [ G !failed ]?

Thanks,
Gabe

Matthias Güdemann

Jun 6, 2016, 7:03:49 AM
to PRISM model checker, jvc...@gmail.com, gabrie...@gmail.com
Hi Gabe,

ok, I understand your point better now.

In my opinion, taking the action is not probabilistic, but a non-deterministic choice to use a specific probability distribution in that state. Therefore, by requiring time > 0 you are limiting the probability distributions available in the first state. So in your original model, the distribution selected by the action IncAlt_start leads (together with the other non-deterministic choices of probability distributions) to a maximal probability >= 0.87 of never reaching a "failed" state. On the other hand, if you prevent the choice of this probability distribution, no choice achieving that bound remains.

But to be honest, I am not sure about this.

Best regards,
Matthias

gabrie...@gmail.com

Jun 9, 2016, 11:35:29 AM
to PRISM model checker, jvc...@gmail.com, gabrie...@gmail.com
I'm still confused, and I don't see how your answer explains my question about the randomized adversary allowing a choice that leads to not satisfying the property.

I have simplified the model (attached), so that there is a single non-deterministic choice to be made. If action IncAlt is taken, then the probability of failing is 0.03, so P [ G !failed ] = 0.97. If action IncAlt is not taken then the probability of failing is 0.07, so P [ G !failed ] = 0.93.

I generate an adversary with this command line:
    prism simple.prism -pctl 'multi(Rmax=? [ C ], P>=0.95 [ G !failed ])' -exportadv r.adv -exportstates r.sta -exportlabels r.lab -s -lp
and get the following results:

r.lab
------
0="init" 1="deadlock"
1: 0
9: 1
10: 1

r.sta
------
(time,a,IncAlt_go,failed)
0:(0,0,false,false)
1:(0,0,true,false)
2:(0,1,false,false)
3:(1,0,true,false)
4:(1,0,true,true)
5:(1,1,false,false)
6:(1,1,false,true)
7:(1,1,true,false)
8:(1,1,true,true)
9:(2,1,false,false)
10:(2,1,false,true)
11:(2,1,true,false)
12:(2,1,true,true)

r.adv
------
13 ?
0 3 0.93 tick
0 8 0.07 tick
1 0 0.5 -
1 2 0.5 IncAlt
2 5 0.97 tick
2 10 0.03 tick
3 4 1 IncAlt
4 7 1 tick
5 4 1 -
6 6 1 _ec
7 6 1 -
8 9 1 IncAlt
9 12 1 tick
10 9 1 -
12 11 1 -

If I'm interpreting this correctly, from the initial state 1 the adversary takes action IncAlt with prob 0.5 and does not take IncAlt with prob 0.5. In the latter case, it goes to state 0, from which there's only a 0.93 probability of not failing, thus not satisfying the P>=0.95 [ G !failed ] part of the multi property. Again, why does the adversary include the possibility of taking an action that does not satisfy the property, if an adversary that takes IncAlt with prob 1 would satisfy it?

But it gets weirder. From state 5, in which a=1, it goes to state 4 where a=0. There is nothing in the model that could make a=0 after it has become 1. In fact, if I simulate the model, taking the IncAlt action (which is the only way to make a=1), there is no way to make a=0 again.

I'm really confused. Am I interpreting the generated files correctly?

Thanks,
Gabe
simple.prism

Joachim Klein

Jun 10, 2016, 1:46:59 PM
to prismmod...@googlegroups.com, jvc...@gmail.com, gabrie...@gmail.com
Hi Gabe,

> I have simplified the model (attached), so that there is a single
> non-deterministic choice to be made. If action IncAlt is taken, then the
> probability of failing is 0.03, so P [ G !failed ] = 0.97. If action
> IncAlt is not taken then the probability of failing is 0.07, so P [ G
> !failed ] = 0.93.
>
> I generate an adversary with this command line:
> prism simple.prism -pctl 'multi(Rmax=? [ C ], P>=0.95 [ G !failed
> ])' -exportadv r.adv -exportstates r.sta -exportlabels r.lab -s -lp
> and get the following results:
>
> [...]
> r.adv
> ------
> 13 ?
> 0 3 0.93 tick
> 0 8 0.07 tick
> 1 0 0.5 -
> 1 2 0.5 IncAlt
> 2 5 0.97 tick
> 2 10 0.03 tick
> [...]
>
> If I'm interpreting this correctly from the initial state 1, it takes
> action IncAlt with prob 0.5 and does not take IncAlt with prob 0.5. In
> the latter case, it goes to state 0, from which there's only a 0.93 prob
> of not failing, thus not satisfying the P>=0.95 [ G !failed ] part of
> the multi property. Again, why does the adversary include the
> possibility of taking an action that does not satisfy the property if an
> adversary that would take IncAlt with prob 1 would satisfy it?

The adversary has two choices, taking IncAlt or not taking it. By taking
IncAlt with 0.5, as you said, from then on G !failed is satisfied with
0.97. Overall, this contributes 0.485 (0.5*0.97). When the adversary
doesn't take IncAlt (with 0.5 again), from then on G !failed is
satisfied with 0.93. From the initial state, this contributes 0.465
(0.5*0.93). So, overall, when the adversary is followed, G !failed is
satisfied with 0.465 + 0.485 = 0.95, which is exactly the lower bound
you specified in your multi-objective specification. Intuitively, the
adversary can afford to take the worse action to a certain extent, as it
is balanced by the higher probability for satisfying G !failed when
taking the better choice (IncAlt). If you increase the bound of P>=x in
your multi-objective to, say, 0.97, you'll see that the adversary has
to choose IncAlt with probability 1.
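
Spelled out as one equation:

    0.5 \cdot 0.97 + 0.5 \cdot 0.93 = 0.485 + 0.465 = 0.95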

Algorithmically, this arises because the adversary is obtained from the
solution of the linear program, and there is no additional
"optimization" between the various possible adversaries. I.e., a human
would see that it makes sense to choose IncAlt in all cases "to be on
the safe side", while the LP simply yields that a valid adversary can
choose both actions with 0.5 and still satisfy the objective.


> But it gets weirder. From state 5, in which a=1, it goes to state 4
> where a=0. There is nothing in the model that could make a=0 after it
> has become 1. In fact, if I simulate the model, taking the IncAlt action
> (which is the only way to make a=1), there is no way to make a=0 again.

I'm not an expert on the multi-objective engine in PRISM, but my best
guess would be the following: To be able to handle the "G ! failed" path
formula, a deterministic Rabin automaton is generated for the formula
and the actual multi-objective computations are then carried out on the
product of the MDP and the automaton. Note that this product has a
different state space than the original MDP (due to the automaton
states). The exported adversary (a memoryless, randomized scheduler for
the product MDP) then uses the state space of the product. In effect,
the automaton states serve as memory for the adversary (e.g., "have I
already seen a failed state" etc).

You can use the (hidden) option
    -exportprodstates prod.sta
to get the states in the product. The first index is the automaton
state, the rest are the state variables in the original MDP. From a
cursory look, it looks plausible that this would explain your observation.
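
For example, just extending your earlier command line (which should otherwise work unchanged):

    prism simple.prism -pctl 'multi(Rmax=? [ C ], P>=0.95 [ G !failed ])' -exportadv r.adv -exportstates r.sta -exportlabels r.lab -exportprodstates prod.sta -s -lp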



Cheers,
Joachim

gabrie...@gmail.com

Jun 12, 2016, 8:32:59 AM
to PRISM model checker, jvc...@gmail.com, gabrie...@gmail.com
Hi Joachim,
Thanks for your explanation! Now I understand why that randomized adversary is being generated.

What I wanted to compute was a resolution of the non-deterministic choices such that the reward is maximized and the probability of not failing is not less than 0.95 for any possible realization of the probabilistic transitions in the model. I'm not sure how to express what PRISM is computing, but it's not "for any possible realization of the probabilistic transitions in the model".

Is there a way to force PRISM to compute a deterministic adversary? In that case, all the probabilistic transitions in the adversary would be those originally in the model, and as such the solution would be "for any possible realization of the probabilistic transitions in the model".

Regarding the second question and your answer about the seemingly infeasible transitions: it is unfortunate that the output is like that, because I cannot import the adversary back and check, for example, P=? [ G !failed ], since I get 0. However, this is not that critical; I was just trying to use this to understand the properties of the generated adversary.

Thanks,
Gabe

Joachim Klein

Jun 12, 2016, 4:51:44 PM
to prismmod...@googlegroups.com, jvc...@gmail.com, gabrie...@gmail.com
Hi Gabe,

> Is there a way to force PRISM to compute a deterministic adversary? In
> that case, all the probabilistic transitions in the adversary would be
> those originally in the model, and as such the solution would be "for
> any possible realization of the probabilistic transitions in the model".

Well, if I recall correctly from the theory, there are situations where
randomization is needed to achieve multiple objectives. So, in general
you can't expect deterministic adversaries. As I've said, I'm not an
expert on the multi-objective part, but I haven't seen any option for
restricting to deterministic adversaries and my guess would be that the
theory/algorithms would look quite different.
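
To see why randomization can be needed at all, here is a minimal toy example (hypothetical, nothing to do with your model): an MDP with a single non-deterministic choice between two actions,

    mdp

    module M
      s : [0..2] init 0;
      [a] s=0 -> (s'=1);
      [b] s=0 -> (s'=2);
      [loop] s>0 -> true;
    endmodule

    label "left" = s=1;
    label "right" = s=2;

Under multi(P>=0.5 [ F "left" ], P>=0.5 [ F "right" ]), each of the two deterministic adversaries achieves probability 1 for one objective and 0 for the other, so neither satisfies both bounds; randomizing 0.5/0.5 between a and b in state s=0 satisfies both.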


> Regarding the second part question and answer, about the seemingly
> infeasible transitions, it is unfortunate that the output is like that,
> because I cannot import back the adversary and check P=? [ G !failed],
> for example, because I get 0. However, this is not that critical. I just
> was trying to use this to understand the properties of the adversary
> generated.

Yes, it would be nice to have a more convenient output. However, using the information from -exportprodstates and from the adversary, it should be possible, with a bit of effort, to construct a DTMC that has the required label information, e.g., a label "failed" defined for those states in the product that would be labeled "failed" when ignoring the DA automaton states. That would allow you to verify that the adversary behaves as expected.
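
If it helps, here is a rough sketch of what I mean (Python, untested; it assumes the file formats exactly as they appear in this thread, that the "failed" flag is the last variable in prod.sta, and that the initial state is 1 as in your r.lab):

    # Rebuild the DTMC induced by the exported adversary, with a "failed"
    # label lifted from the MDP part of the product states, so that it can
    # be re-imported into PRISM and checked.

    def parse_product_states(path):
        """Map product-state index -> tuple of variable values (strings)."""
        states = {}
        with open(path) as f:
            next(f)  # skip the header, e.g. "(q,time,a,IncAlt_go,failed)"
            for line in f:
                idx, rest = line.strip().split(':', 1)
                states[int(idx)] = tuple(rest.strip()[1:-1].split(','))
        return states

    def parse_adversary(path):
        """Header is e.g. "13 ?"; rows are "src dst prob [action]"."""
        with open(path) as f:
            num_states = int(next(f).split()[0])
            rows = [line.split() for line in f if line.strip()]
        return num_states, [(int(r[0]), int(r[1]), float(r[2])) for r in rows]

    states = parse_product_states('prod.sta')
    num_states, transitions = parse_adversary('r.adv')

    # Explicit transition matrix of the induced DTMC.
    with open('adv.tra', 'w') as f:
        f.write('%d %d\n' % (num_states, len(transitions)))
        for src, dst, prob in transitions:
            f.write('%d %d %g\n' % (src, dst, prob))

    # Label "failed" wherever the underlying MDP state is failed,
    # ignoring the automaton component (assumed: last variable).
    with open('adv.lab', 'w') as f:
        f.write('0="init" 1="failed"\n')
        f.write('1: 0\n')  # initial state, per the r.lab posted above
        for idx, vals in sorted(states.items()):
            if vals[-1] == 'true':
                f.write('%d: 1\n' % idx)

Something along the lines of

    prism -importtrans adv.tra -importlabels adv.lab -dtmc -pctl 'P=? [ G !"failed" ]'

should then give you the probability under the adversary (again, untested, so the usual caveats apply).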


Cheers,
Joachim

gabrie...@gmail.com

Jun 12, 2016, 7:30:38 PM
to PRISM model checker, jvc...@gmail.com
Thanks for your reply!