Transition function/Reward function dependent on next states


hifza javed

Jan 17, 2019, 3:18:50 PM
to julia-pomdp-users
Hi,
I see from the sample code here and here that the state transition functions in the library are independent of s'. Can you point me to relevant documentation or sample code where the two functions are defined as functions of the previous state and action as well as the next state s'?
I hope to use this library to create a model of human behavior when interacting with a robot (using a POMDP). In this case, any of the 7 robot actions I'm currently working with can result in any given transition from s to s' (where the number of states is also 7) with equal initial likelihood. Do you have any suggestions for which model type (explicit vs. generative) to use in such a case to make the simulation easier to achieve?
Thanks!

Zachary Sunberg

Jan 17, 2019, 3:27:46 PM
to julia-pomdp-users
Hi Hifza,


I think if you have a small discrete state and action space, it will be best to implement an explicit model making use of the SparseCat distribution: https://juliapomdp.github.io/POMDPModelTools.jl/latest/distributions.html . See here for a usage example of SparseCat: https://github.com/JuliaPOMDP/POMDPModels.jl/blob/master/src/gridworld.jl (that is an MDP, but you should be able to extrapolate to a POMDP setting).
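
To make this concrete, here is a minimal sketch (not from the original message) of an explicit transition function using SparseCat. The type name HRIPOMDP, the integer state/action/observation encoding, and the uniform probabilities are placeholder assumptions for illustration only:

using POMDPs, POMDPModelTools

# hypothetical POMDP with Int states, actions, and observations
struct HRIPOMDP <: POMDP{Int, Int, Int} end

POMDPs.states(m::HRIPOMDP) = 1:7
POMDPs.actions(m::HRIPOMDP) = 1:7

# T(. | s, a): return an explicit distribution over next states s'
function POMDPs.transition(m::HRIPOMDP, s::Int, a::Int)
    # e.g. uniform over all 7 next states, as in the model described above
    return SparseCat(1:7, fill(1/7, 7))
end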
 
Hope that helps! Let us know if you have more questions.

- Zach

hifza javed

Jan 18, 2019, 3:09:16 AM
to julia-pomdp-users
Thanks for the information.

What happens when, say, the state and action spaces expand to 20 states and 30 actions? Would it still be suitable to use the same solver and the explicit model?

Zachary Sunberg

Jan 18, 2019, 12:30:17 PM
to pomdps...@googlegroups.com
Many solvers should be able to handle 20 states.* I have not experimented much with larger action spaces, so I don't know how well they would perform with 30 actions. Probably some can handle it well.

I still think implementing an explicit model would be the way to go. Since POMDPs.jl automatically synthesizes the appropriate generative methods if it is provided with an explicit definition, an explicit model can be used with *any* solver, while a generative model can only be used with some.


*Probably the best solver for a problem this size would be SARSOP, for which you'd have to calculate R(s,a). You can calculate the expected R(s, a) from R(s,a,s') automatically using the following code:

const rdict = Dict{Tuple{S,A}, Float64}() # S and A are your state and action types

for s in states(m)
    for a in actions(m)
        r = 0.0
        td = transition(m, s, a) # transition distribution for s, a
        for sp in support(td)
            r += pdf(td, sp)*reward(m, s, a, sp) # expectation over next states
        end
        rdict[(s, a)] = r
    end
end

POMDPs.reward(m, s, a) = rdict[(s, a)]


hifza javed

Jan 18, 2019, 2:51:31 PM
to julia-pomdp-users
That makes sense, thanks.
If I were to choose an online solver, do you think BasicPOMCP would be suitable?

I'm new to Julia; I actually only started looking into it so I could use this library. I defined my own POMDP model and placed it inside the ./POMDPModels/oR8C3/src directory, where the rest of the model files are placed. I modified the POMDPModels.jl file to export my custom model. In another file, I wrote the following code to run a simulation and obtain a policy, but I get this error:

LoadError: MethodError: objects of type Module are not callable
in expression starting at /Users/*****/Desktop/test1:3
top-level scope at none:0

using POMDPs, POMDPModels, POMDPSimulators, BasicPOMCP, POMDPPolicies, POMDPModelTools, POMDPSolve
m = HRIModel()
solver = BasicPOMCP()
policy = solve(solver, m)
belief_updater = updater(policy)
history = simulate(HistoryRecorder(max_steps=10), m, policy, belief_updater)
for (s, b, a, o) in eachstep(history, "sbao")
     println("State was $s,")
     println("belief was $b,")
     println("action $a was taken,")
     println("and observation $o was received.\n")
end
println("Discounted reward was $(discounted_reward(history)).")


I'm not sure if this is the right way to go about things. My apologies if this is too basic a question or if instructions about this are already included in the documentation. I was unable to find anything in the tutorials though, so any help would be appreciated!

Zachary Sunberg

Jan 18, 2019, 3:32:06 PM
to julia-pomdp-users
BasicPOMCP is the easiest to get up and running, but ARDESPOT will probably yield the best results. (Good performance of POMCP requires at least the tuning of the UCB exploration constant c).

The reason for the error is this: BasicPOMCP is a module; the solver type from that module is POMCPSolver. So, you need to replace

solver = BasicPOMCP()

with

solver = POMCPSolver()

I think that should run, but you will probably have to tune the parameters to get good performance. I'd recommend starting with a c parameter of twice the maximum reward you would expect to see in the future. Make sure to look at the tree (https://github.com/JuliaPOMDP/BasicPOMCP.jl#tree-visualization) to see what it is thinking.
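
For reference, a minimal sketch of that change in context (not from the original reply; the numeric values for c and tree_queries are placeholder assumptions you would tune for your own problem):

using POMDPs, BasicPOMCP

# c is the UCB exploration constant (rule of thumb above: roughly twice the
# maximum reward you expect to see in the future);
# tree_queries is the number of tree-search simulations run per action selection.
solver = POMCPSolver(c=20.0, tree_queries=1000)
policy = solve(solver, m)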