Hi Gabe,
> I have simplified the model (attached), so that there is a single
> non-deterministic choice to be made. If action IncAlt is taken, then the
> probability of failing is 0.03, so P [ G !failed ] = 0.97. If action
> IncAlt is not taken then the probability of failing is 0.07, so P [ G
> !failed ] = 0.93.
>
> I generate an adversary with this command line:
> prism simple.prism -pctl 'multi(Rmax=? [ C ], P>=0.95 [ G !failed
> ])' -exportadv r.adv -exportstates r.sta -exportlabels r.lab -s -lp
> and get the following results:
>
> [...]
> r.adv
> ------
> 13 ?
> 0 3 0.93 tick
> 0 8 0.07 tick
> 1 0 0.5 -
> 1 2 0.5 IncAlt
> 2 5 0.97 tick
> 2 10 0.03 tick
> [...]
>
> If I'm interpreting this correctly, from the initial state 1 it takes
> action IncAlt with prob 0.5 and does not take IncAlt with prob 0.5. In
> the latter case, it goes to state 0, from which there's only a 0.93 prob
> of not failing, thus not satisfying the P>=0.95 [ G !failed ] part of
> the multi property. Again, why does the adversary include the
> possibility of taking an action that does not satisfy the property if an
> adversary that would take IncAlt with prob 1 would satisfy it?
The adversary has two choices: taking IncAlt or not taking it. When it
takes IncAlt with probability 0.5, as you said, from then on G !failed
is satisfied with probability 0.97, which contributes 0.485 (0.5*0.97)
overall. When it does not take IncAlt (again with probability 0.5),
from then on G !failed is satisfied with probability 0.93, which
contributes 0.465 (0.5*0.93) from the initial state. So, overall, when
the adversary is followed, G !failed is satisfied with probability
0.465 + 0.485 = 0.95, which is exactly the lower bound you specified in
your multi-objective specification. Intuitively, the adversary can
afford to take the worse action to a certain extent, as this is
balanced by the higher probability of satisfying G !failed when it
takes the better choice (IncAlt). If you increase the bound of P>=x in
your multi-objective to, say, 0.97, you'll see that the adversary has
to choose IncAlt with probability 1.
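For example, re-running your command with only the threshold raised
(everything else unchanged) should force the deterministic choice:

  prism simple.prism -pctl 'multi(Rmax=? [ C ], P>=0.97 [ G !failed ])' -exportadv r.adv -exportstates r.sta -exportlabels r.lab -s -lp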
Algorithmically, this arises because the adversary is obtained from the
solution of the linear program, and there is no additional
"optimization" step that picks between the various valid adversaries.
That is, a human would see that it makes sense to choose IncAlt in all
cases "to be on the safe side", while the LP simply yields that a valid
adversary can choose both actions with probability 0.5 and still
satisfy the objective.
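To make that concrete, here is a toy sketch in Python (this is *not*
PRISM's actual LP, which is set up over the whole product MDP; it just
illustrates the feasibility reasoning): if p is the probability of
choosing IncAlt in the initial state, the constraint is
0.97*p + 0.93*(1-p) >= 0.95, i.e. p >= 0.5, so a solver is free to
return p = 0.5 rather than p = 1:

  from scipy.optimize import linprog

  # Constraint 0.97*p + 0.93*(1-p) >= 0.95  <=>  0.04*p >= 0.02,
  # rewritten for linprog's "A_ub @ x <= b_ub" convention.
  res = linprog(c=[1.0],             # stand-in objective: minimize p
                                     # (the real LP maximizes reward)
                A_ub=[[-0.04]],      # -0.04*p <= -0.02  <=>  p >= 0.5
                b_ub=[-0.02],
                bounds=[(0.0, 1.0)]) # p is a probability
  print(res.x)                       # [0.5] -- feasible, so "good enough"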
> But it gets weirder. From state 5, in which a=1, it goes to state 4
> where a=0. There is nothing in the model that could make a=0 after it
> has become 1. In fact, if I simulate the model, taking the IncAlt action
> (which is the only way to make a=1), there is no way to make a=0 again.
I'm not an expert on the multi-objective engine in PRISM, but my best
guess is the following: to handle the "G !failed" path formula, a
deterministic Rabin automaton is generated for it, and the actual
multi-objective computations are then carried out on the product of the
MDP and that automaton. Note that this product has a different state
space than the original MDP (due to the automaton states). The exported
adversary (a memoryless, randomized scheduler for the product MDP) thus
refers to the state space of the product, not that of the original
model, which would explain the seemingly impossible a=1 -> a=0 step. In
effect, the automaton states serve as memory for the adversary (e.g.,
"have I already seen a failed state?", etc.).
You can use the (hidden) option
-exportprodstates prod.sta
to export the states of the product. The first component is the
automaton state; the remaining ones are the state variables of the
original MDP. From a cursory look, this seems a plausible explanation
for what you are seeing.
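If it helps, here is a small sketch for inspecting that file. It
assumes prod.sta follows PRISM's usual .sta layout (a header line with
the variable names, then one "id:(v1,v2,...)" line per state); the
automaton variable name in the comment is a guess:

  # Print each product state; the first value should be the
  # automaton component, the rest the original MDP variables.
  with open("prod.sta") as f:
      header = f.readline().strip()  # e.g. "(_da,a,failed)" -- the
                                     # automaton variable name is a guess
      print(header)
      for line in f:
          sid, vals = line.strip().split(":", 1)
          print(int(sid), vals)      # vals like "(0,1,false)"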
Cheers,
Joachim