POMDPs.jl: Using Belief State in Reward Function


mmc...@gmail.com

Dec 4, 2017, 4:43:59 PM
to julia-pomdp-users

We're working on our final project for AA228, and we'd like to find a way to make our reward a function of the current belief state. Is there any way to do this in POMDPs.jl?


As background, our problem is robot scent search, where we are trying to localize a stationary person in a grid world. We currently have our reward function implemented as the inverse of the distance between the robot and the person. Instead, we want to encode the fact that the robot does not actually know where the person is, so we want our reward function to be based on the distance between the robot and several samples from the belief state of where the person might be.

Zachary Sunberg

Dec 4, 2017, 5:56:48 PM
to julia-pomdp-users
Hi, thanks for posting this issue here!

Reward as a function of the belief state is not supported in POMDPs.jl because the mathematical definition of a POMDP specifies that the reward is a function of the state and action. The reward for a belief is implicitly defined as the expectation of the state-action reward with the states distributed according to the belief, i.e. r(b, a) = sum over s of b(s) r(s, a).

I don't think this will be an issue for you, though. If you define the reward as the inverse of the distance between the robot and the person in the current state, then the solver will attempt to maximize the expectation of this with respect to the belief, so it will automatically try to reduce uncertainty in the belief so that it can get close to the target. If you run enough Monte Carlo simulations*, you should be able to evaluate how well the solver is reducing uncertainty.
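
For concreteness, here is a minimal sketch of what a purely state-based reward might look like in POMDPs.jl. The ScentSearchPOMDP and ScentState names are made-up stand-ins for your problem definition, not anything from this thread:

using POMDPs

# Hypothetical problem type: state = (robot position, person position) on a grid.
struct ScentState
    robot::Tuple{Int,Int}
    person::Tuple{Int,Int}
end

struct ScentSearchPOMDP <: POMDP{ScentState, Symbol, Tuple{Int,Int}} end

# Reward depends only on the (hidden) state and the action; the solver
# maximizes its expectation with respect to the belief automatically.
function POMDPs.reward(p::ScentSearchPOMDP, s::ScentState, a::Symbol)
    d = sqrt((s.robot[1] - s.person[1])^2 + (s.robot[2] - s.person[2])^2)
    return 1.0 / (1.0 + d)  # inverse distance, capped at 1 when d == 0
end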

If you want to use something other than the expectation as your objective, you have a couple of options, but they are outside the realm of standard POMDPs:

1) Switch to a rho-POMDP formulation (https://papers.nips.cc/paper/3971-a-pomdp-extension-with-belief-dependent-rewards). Your CA Louis Dressel is an expert on this.
2) Switch to a belief MDP**. The state of a belief MDP *is* the belief from the original POMDP, so you can define the reward as any function of the belief - see the sketch after the footnotes below.

- Zach

* e.g. with the parallel simulator - search for "parallel" in the POMDPToolbox README
** you can use GenerativeBeliefMDP from POMDPToolbox for this if you want to use MCTS to solve it
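
As a rough illustration of option 2, a belief-dependent reward along the lines you describe might average over samples from the belief. This is only a sketch; belief_reward, n_samples, and the s.person field are made up for illustration, and it assumes the belief type supports rand:

# Mean inverse distance over n_samples draws from the current belief b.
function belief_reward(rng, b, robot_pos; n_samples=100)
    total = 0.0
    for i in 1:n_samples
        s = rand(rng, b)  # sample a hypothesis of where the person is
        d = sqrt((robot_pos[1] - s.person[1])^2 + (robot_pos[2] - s.person[2])^2)
        total += 1.0 / (1.0 + d)
    end
    return total / n_samples
end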

ganlu...@gmail.com

Dec 4, 2017, 11:55:31 PM
to julia-pomdp-users
How do we address this issue with the parallel simulator in POMDPToolbox? Is Sim not defined in parallel.jl?

using POMDPs, POMDPToolbox, BasicPOMCP  # assuming BasicPOMCP provides POMCPSolver; Sim should come from POMDPToolbox

solver = POMCPSolver()
policy = solve(solver, pomdp)
q = [] # vector of the simulations to be run
push!(q, Sim(pomdp, policy))

UndefVarError: Sim not defined

Stacktrace:
 [1] include_string(::String, ::String) at .\loading.jl:515

Zachary Sunberg

Dec 5, 2017, 12:05:35 AM
to julia-pomdp-users
Oops - I forgot, the parallel simulation stuff has not been included in a registered release of POMDPToolbox yet. There should be a new registered release soon, but you can get the latest master version with the parallel simulator now by running Pkg.checkout("POMDPToolbox") (see https://docs.julialang.org/en/stable/manual/packages/#Checkout,-Pin-and-Free-1 for more info).

ganlu...@gmail.com

Dec 7, 2017, 12:55:33 AM
to julia-pomdp-users
Thanks! Is there a way to set the seed for the random number generator so that it samples the same initial state for each simulation in run_parallel? I set rng=MersenneTwister(1), but it still sampled a different initial state for each simulation. We want to use the same one so we can compare the simulations.

ganlu...@gmail.com

Dec 7, 2017, 1:00:02 AM
to julia-pomdp-users
Never mind, I found the initial_state argument in the code. Thanks!
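
For anyone who finds this later, a minimal sketch of what that looks like, assuming the initial_state keyword mentioned above and the run_parallel function from POMDPToolbox master (pomdp and policy are from the snippet earlier in the thread):

# Draw one initial state and reuse it for every simulation so runs are comparable.
s0 = rand(MersenneTwister(1), initial_state_distribution(pomdp))

q = []
for i in 1:100
    # Different rng per run, same starting state for all of them.
    push!(q, Sim(pomdp, policy, initial_state=s0, rng=MersenneTwister(i)))
end

results = run_parallel(q)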

Zachary Sunberg

Dec 7, 2017, 1:28:47 PM
to julia-pomdp-users
Hmm... this shouldn't have happened - it seems to be a bug. If you have a chance, can you post a link to your code so that I can fix it? Thanks!

ganlu...@gmail.com

Dec 7, 2017, 5:51:16 PM
to julia-pomdp-users
https://gist.github.com/ganlucia/36a49427c337b870e908893dc00409e3

Line 221 was the issue, but I was able to solve it with Jayesh this morning. Thanks!