about costs calculation in DTMC with rewards


Terry Zhou

Sep 21, 2017, 5:44:37 PM
to prismmod...@googlegroups.com, Dave Parker
Hi Dave,

I have two questions about computing costs on a PRISM DTMC model with a rewards structure (see the example .pm file and image attached).

1) Is there any way to compute a cost just for a path in the DTMC? I want the cost from the initial state "0" to a particular state, say "7", where the reachability probability (0 -> 7) is < 1. If I write R=?[F s=7], it returns infinity, even if I try to set the rewards on the other paths (those leading to non-7 states) to zero.

2) Does PRISM always require the initial state as the starting point for cost reasoning? Can I specify a non-initial state as the starting point, say from state "7" to state "10 (10, false,1,0)"? (Would it make sense to extract the sub-model that takes state "7" as its initial state and compute the "sub-costs" on that?)

Thanks in advance for your advice.


Regards,
Terry
fm_trival-1.pm
fm_trival-1.dot

Joachim Klein

Sep 22, 2017, 3:32:37 AM
to prismmod...@googlegroups.com, Terry Zhou, Dave Parker
Hi Terry,

> I have two questions about computing costs on a PRISM DTMC model with
> a rewards structure (see the example .pm file and image attached).
>
> 1) Is there any way to compute a cost just for a path in the DTMC? I
> want the cost from the initial state "0" to a particular state, say
> "7", where the reachability probability (0 -> 7) is < 1. If I write
> R=?[F s=7], it returns infinity, even if I try to set the rewards on
> the other paths (those leading to non-7 states) to zero.

I think what you want to compute is the conditional accumulated reward
for the paths that reach s=7? We have implemented support for computing
this in an extension of PRISM (we are currently working on integrating
it into the main PRISM release), available here:

https://wwwtcs.inf.tu-dresden.de/ALGI/PUB/SEFM17/prism-conditional.zip

For theoretical / mathematical background, see
https://wwwtcs.inf.tu-dresden.de/ALGI/PUB/TACAS14/conditional-tacas14-extended.pdf

With that you can ask:

R=?[ F s=7 ][ F s=7 ]

i.e., "what is the accumulated reward until reaching state 7, under the
condition that state 7 is reached". First [] is the objective, the
second [] the condition. Essentially, this operator (for DTMCs) filters
out all paths that don't satisfy the condition. In your example, the
result would be 2 (those paths that actually reach 7 will have always
exactly 2 steps).
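
For illustration (this is not your model, just a minimal sketch of the
same kind of situation), here is a small DTMC where the goal is only
reached with probability 0.5 and the other branch loops forever:

dtmc

module chain
  s : [0..3] init 0;

  [] s=0 -> 0.5 : (s'=1) + 0.5 : (s'=2); // branch towards goal or trap
  [] s=1 -> 1.0 : (s'=3);                // one more step to the goal
  [] s=2 -> 1.0 : (s'=2);                // trap state, never reaches goal
  [] s=3 -> 1.0 : (s'=3);                // goal state (absorbing)
endmodule

rewards "cost"
  s=0 : 1;
  s=1 : 1;
endrewards

Here the plain query R{"cost"}=? [ F s=3 ] returns infinity, because the
goal is reached with probability < 1, while the conditional query
R{"cost"}=? [ F s=3 ][ F s=3 ] in the extension returns 2, the reward
accumulated along the path 0 -> 1 -> 3.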


> 2) Does PRISM always require the initial state as the starting point
> for cost reasoning? Can I specify a non-initial state as the starting
> point, say from state "7" to state "10 (10, false,1,0)"? (Would it
> make sense to extract the sub-model that takes state "7" as its
> initial state and compute the "sub-costs" on that?)

You can use the filter mechanism described at
http://www.prismmodelchecker.org/manual/PropertySpecification/Filters
to get information about non-initial states:

E.g.:

filter(state, R=?[ F s=10 ][ F s=10 ], s=7)

Result: 3.25 (value in the filter state)


You can use the "state" filter if there is exactly one state matching
the filter condition in the reachable state space; otherwise, you can
use the print/printall filters to get information for all matching
states.
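
For example, to see the value for every reachable state at once (the
filter expression "true" here is just for illustration; you can use any
state predicate of your model):

filter(printall, R=? [ F s=10 ][ F s=10 ], true)

The values for all reachable states are then printed in the log.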


Cheers,
Joachim

Colin Paterson

Oct 23, 2017, 12:16:23 PM
to PRISM model checker
Thanks for this,
   I have a similar situation but with MDPs. I have downloaded the development version and I am looking to evaluate a reward property on a very simple model (available if it helps). Using a reward property based on the example you gave:

    R{"cost"}min=?[F s=2][F s=2]

Unfortunately I get the following error:

    Model checking: R{"cost"}min=? [ F s=2 ][ F s=2 ]

    Error: java.lang.ClassCastException: parser.ast.ExpressionReward cannot be cast to parser.ast.ExpressionProb
    This is an unexpected error, it might be a good idea to restart PRISM.


If I use the reward formulation on a DTMC it works correctly, and if I use Pmin=?[F s=2] or Pmin=?[F s=2][F s=2] on the MDP it returns what I expect.

I have to admit that I have only scanned the paper so far, but thought it was easier to ask the experts before pushing on too far.

Thank you

Colin.

Joachim Klein

Oct 23, 2017, 1:51:16 PM
to prismmod...@googlegroups.com, Colin Paterson
Hi Colin,

>    I have a similar situation but with MDPs. I have downloaded the
> development version and I am looking to evaluate a Reward property on a
> very simple model (available if it helps).  Using a reward based on the
> example you gave:
>
>     R{"cost"}min=?[F s=2][F s=2]
>
> Unfortunately I get the following error:
>
>     Model checking: R{"cost"}min=? [ F s=2 ][ F s=2 ]
>
>     Error: java.lang.ClassCastException: parser.ast.ExpressionReward
> cannot be cast to parser.ast.ExpressionProb
>     This is an unexpected error, it might be a good idea to restart PRISM.

Unfortunately, for MDPs the computation of conditional rewards is much
more complicated than for DTMCs and is not supported by the PRISM
extension presented in the SEFM'17 paper. Admittedly, the error message
could be worded a bit more clearly :)

We have addressed the computation of conditional rewards in MDPs in our
TACAS'17 paper "Maximizing the Conditional Expected Reward for Reaching
the Goal" (extended version: https://arxiv.org/pdf/1701.05389v1.pdf),
which provides an algorithm for computing Rmax=? [ F goal ][ F goal ].
We also have a prototypical implementation, if you'd like to try that,
but it's much less polished and probably does not scale as well as the
implementation for DTMCs.


Just to give a high-level intuition for why it's much more difficult:
for DTMCs, it suffices to rescale the transition probabilities in each
state of the Markov chain, depending on the probability of satisfying
the condition. The reward computation can then be carried out
"normally" in the rescaled DTMC.
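
Roughly (this glosses over some details of the construction in the
TACAS'14 paper), for a condition "F goal" the rescaled chain uses

P'(s,t) = P(s,t) * Pr_t(F goal) / Pr_s(F goal)

in the states that have not yet satisfied the condition, so successors
from which the condition is more likely to hold get proportionally more
weight.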

For MDPs, one also has to take into account that the scheduler can
choose between maximising the reward itself and trying to influence the
probability of satisfying the condition, which affects the value as
well. Furthermore, the scheduler's decisions can depend on the reward
accumulated so far and additionally have to take the "global picture"
into account, i.e., it is not possible to just make locally optimal
choices in each state. Thus, the algorithms are quite different from
the standard approaches for computing maximal rewards.


Cheers,
Joachim Klein