Rewards & Properties Expressions

50 views
Skip to first unread message

Bilal Sarwar

unread,
Nov 28, 2024, 2:51:45 AM11/28/24
to PRISM model checker
Hi,
I am conducting research on path planning for autonomous vehicles and have formalized a model using PRISM Model Checker. The system has been modeled as a Markov Decision Process (MDP) with uniform probabilities. As part of my analysis, I am attempting to achieve the following objectives:

  1. Calculate Total Lane Changes:
    I have written reward expressions and properties to measure the total number of lane changes in the model. However, upon verification, the results consistently yield infinity, which appears to be incorrect. I would appreciate guidance on how to structure a proper reward expression to accurately capture the cumulative count of lane changes.

  2. Calculate Total Acceleration Cost with Negative Rewards:
    I am also looking to calculate the total acceleration cost by assigning negative rewards during states or transitions where acceleration occurs. I seek confirmation on whether this approach is appropriate or if there are alternative strategies to model and calculate acceleration costs effectively.

  3. Calculate Time Steps Before Entering a Critical State:
    Additionally, I aim to compute the number of time steps before the ego vehicle transitions into a critical state (e.g., from s=0 to s=7). However, when I use the PRISM simulator to observe the rewards, it only displays values at s=0 and s=7. I need guidance on how to properly define and capture rewards or properties to account for the intermediate states and the total time steps elapsed during this transition.

I have attached the properties and reward expressions I have implemented so far for your review. Any suggestions or advice on how to address these challenges would be greatly appreciated.

Thank you for your time

Screenshot from 2024-11-28 12-45-50.pngScreenshot from 2024-11-28 12-46-16.pngScreenshot from 2024-11-28 12-46-41.pngScreenshot from 2024-11-28 12-47-28.png

Dave Parker

unread,
Dec 3, 2024, 5:09:11 AM12/3/24
to PRISM model checker, Bilal Sarwar
Hi Bilal,

> *Calculate Total Lane Changes:*
> I have written reward expressions and properties to measure the
> total number of lane changes in the model. However, upon
> verification, the results consistently yield infinity, which appears
> to be incorrect. I would appreciate guidance on how to structure a
> proper reward expression to accurately capture the cumulative count
> of lane changes.

Using Rmax=?[C] is fine. If the result is infinite, it's because there
is a policy under which the Lane_Changed action is taken infinitely
often, within some loop. As an aside, assuming your model does have a
natural end point, you might it better to use Rmax=?[F "end"].

> *Calculate Total Acceleration Cost with Negative Rewards:*
> I am also looking to calculate the total acceleration cost by
> assigning negative rewards during states or transitions where
> acceleration occurs. I seek confirmation on whether this approach is
> appropriate or if there are alternative strategies to model and
> calculate acceleration costs effectively.

We don't usually allow negative rewards (because mixtures of positive
and negative rewards in cumulative properties are problematic. The
typical approach is to also model this cost as a (positive) cost and
then use an Rmin property to minimise it.

> *Calculate Time Steps Before Entering a Critical State:*
> Additionally, I aim to compute the number of time steps before the
> ego vehicle transitions into a critical state (e.g., from s=0 to
> s=7). However, when I use the PRISM simulator to observe the
> rewards, it only displays values at s=0 and s=7. I need guidance on
> how to properly define and capture rewards or properties to account
> for the intermediate states and the total time steps elapsed during
> this transition.

The reward structure looks fine, although note that the [] means it will
only accumulate rewards on transitions with no action labels. If this is
not what you want, either add additional rewards for the other actions,
or just remove the [] and make this a state rewards.

Hope that helps,

Dave

Reply all
Reply to author
Forward
0 new messages