Hi Colin,
> I have a similar situation but with MDPs. I have downloaded the
> development version and I am looking to evaluate a Reward property on a
> very simple model (available if it helps). Using a reward based on the
> example you gave:
>
> R{"cost"}min=?[F s=2][F s=2]
>
> Unfortunately I get the following error:
>
> Model checking: R{"cost"}min=? [ F s=2 ][ F s=2 ]
>
> Error: java.lang.ClassCastException: parser.ast.ExpressionReward
> cannot be cast to parser.ast.ExpressionProb
> This is an unexpected error, it might be a good idea to restart PRISM.

Unfortunately, for MDPs the computation of conditional rewards is much
more complicated than for DTMCs and is not supported by the PRISM
extension presented in the SEFM'17 paper. Admittedly, the error message
could be worded a bit more clearly :)
We have addressed the computation of conditional rewards in MDPs in our
TACAS'17 paper "Maximizing the Conditional Expected Reward for Reaching
the Goal" (extended version: https://arxiv.org/pdf/1701.05389v1.pdf),
which provides an algorithm for computing Rmax=? [ F goal ][ F goal ].
We also have a prototype implementation, if you'd like to try it, but
it is much less polished and probably does not scale as well as the
implementation for DTMCs.
Just to give some high-level intuition for why it is much more
difficult: For DTMCs, it suffices to rescale the transition
probabilities of the Markov chain according to the probability of
satisfying the condition. Writing p(s) for the probability of
satisfying the condition from state s, each transition probability
P(s,t) is replaced by P(s,t) * p(t) / p(s). The reward computation can
then be carried out "normally" in the rescaled DTMC.
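Here is a minimal, self-contained sketch of that idea in Python (my own
illustration, not PRISM's actual implementation), assuming state
rewards and the common case from your query where objective and
condition are the same reachability property F goal:

  import numpy as np

  def conditional_expected_reward(P, rew, goal, init, tol=1e-12):
      """E[accumulated reward until goal | F goal] for a DTMC,
      computed by rescaling the transition probabilities.
      P: (n, n) transition matrix; rew: per-state rewards, accumulated
      in non-goal states; goal: boolean mask; init: initial state."""
      n = P.shape[0]
      # 1) p(s) = Pr_s(F goal), via value iteration; converging to the
      #    least fixed point gives p = 0 for states that cannot reach goal
      p = goal.astype(float)
      for _ in range(100000):
          p_new = np.where(goal, 1.0, P @ p)
          if np.max(np.abs(p_new - p)) < tol:
              break
          p = p_new
      if p[init] == 0.0:
          raise ValueError("condition has probability 0, value undefined")
      # 2) rescale: P'(s,t) = P(s,t) * p(t) / p(s) where p(s) > 0;
      #    transitions into states with p(t) = 0 get probability 0
      keep = p > 0
      Pc = P * p[None, :]
      Pc[keep] /= p[keep, None]
      # 3) expected reward to reach goal in the rescaled chain,
      #    restricted to the relevant states: solve x = rew + Pc x
      rel = ~goal & keep
      A = np.eye(rel.sum()) - Pc[np.ix_(rel, rel)]
      x = np.zeros(n)
      x[rel] = np.linalg.solve(A, rew[rel])
      return x[init]

For example, for a toy chain 0 -> {1, goal, fail} with probabilities
0.5/0.25/0.25, 1 -> goal, and rewards 1 and 2 in states 0 and 1, the
unconditional expected reward to the goal is infinite because of the
fail sink, while the conditional value is finite:

  P = np.array([[0.0, 0.5, 0.25, 0.25],
                [0.0, 0.0, 1.0,  0.0 ],
                [0.0, 0.0, 1.0,  0.0 ],
                [0.0, 0.0, 0.0,  1.0 ]])
  rew  = np.array([1.0, 2.0, 0.0, 0.0])
  goal = np.array([False, False, True, False])
  print(conditional_expected_reward(P, rew, goal, init=0))  # 7/3 = 2.333...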
For MDPs, one also has to take into account that the scheduler can
choose between maximising the reward itself and trying to influence the
probability of satisfying the condition, which affects the value as
well. Furthermore, the scheduler's optimal decisions can depend on the
reward accumulated so far: for instance, it can be beneficial to
deliberately violate the condition on paths that have accumulated
little reward, so that they are discarded from the conditional
expectation. The scheduler additionally has to take the "global
picture" into account, i.e., it is not possible to just make locally
optimal choices in each state. The algorithms are thus quite different
from the standard approaches for computing maximal expected rewards.
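To make the reward-dependence concrete, here is a tiny toy scenario in
Python (my own illustration, not an example from the paper): a coin
flip first accumulates reward 0 or 10, and afterwards, in one and the
same state, the scheduler chooses to either "finish" (reach the goal,
so the condition F goal holds) or "abort" (miss the condition, so the
path is discarded from the conditional expectation). A memoryless
scheduler cannot see the accumulated reward and achieves conditional
value at most 5, while aborting exactly the low-reward paths yields 10:

  # (probability, reward accumulated so far) after the coin flip
  branches = [(0.5, 0.0), (0.5, 10.0)]

  def conditional_value(decide):
      """E[reward | F goal] under a scheduler mapping the accumulated
      reward to "finish" or "abort"."""
      num = den = 0.0
      for prob, rew in branches:
          if decide(rew) == "finish":
              num += prob * rew   # path contributes its reward ...
              den += prob         # ... and its probability to Pr(F goal)
      return num / den if den > 0 else None  # None: Pr(condition) = 0

  print(conditional_value(lambda r: "finish"))                          # 5.0
  print(conditional_value(lambda r: "finish" if r >= 10 else "abort"))  # 10.0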
Cheers,
Joachim Klein