Multivariate State Space

Leo Schirokauer

Nov 17, 2020, 12:24:03 PM
to julia-pomdp-users
Hi!

I'm trying to implement a generative POMDP model to solve using the POMCP solver. My state space is a vector (representing proportion of people in different categories). I am having some trouble implementing the state space and belief state in terms of random vectors. 

My gen function:

function gen(m::try1, s::Array{Int64,1}, a::Int, rng::AbstractRNG)

The state is an integer array. If the "wait" action is chosen, the vector changes according to a natural progression; if the "test" action is chosen, the state does not change and an observation of the number of positive tests is returned; if the "treat" action is chosen, a certain proportion of sick people are moved to the immune category.
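
Roughly, the body has this structure (simplified, with placeholder rates; in this sketch actions are encoded as 1 = wait, 2 = test, 3 = treat, and try1 is the problem type defined elsewhere):

using Random
using Distributions: Hypergeometric

function gen(m::try1, s::Array{Int64,1}, a::Int, rng::AbstractRNG)
    # s = [susceptible, infected, immune, dead] (illustrative labels)
    sp = copy(s)
    o = 0
    if a == 1       # wait: natural progression (placeholder rule)
        new_infections = min(s[1], round(Int, 0.1 * s[2]))
        sp[1] -= new_infections
        sp[2] += new_infections
    elseif a == 2   # test: state unchanged, observe positives among 10 people sampled
        o = rand(rng, Hypergeometric(s[2], s[1] + s[3], 10))
    elseif a == 3   # treat: move a fraction of the infected to the immune category
        treated = round(Int, 0.5 * s[2])
        sp[2] -= treated
        sp[3] += treated
    end
    r = -sp[2]      # e.g. penalize the number of infected
    return (sp=sp, o=o, r=r)
end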

I have defined an observation distribution as:

using Distributions: Hypergeometric

struct Try1Dist
    s::Array{Int,1}
end

function POMDPs.pdf(d::Try1Dist, x::Int)
    # x is the number of positive tests out of 10 people sampled
    if x > 10 || x < 0
        return 0.0
    end

    n = 10  # number of people tested
    # distribution of the number of infected people among the n sampled
    h = Hypergeometric(d.s[2], d.s[1] + d.s[3], n)

    p = 0.0
    for i = 0:min(n, d.s[2])
        # bin is a helper (defined elsewhere) for the probability of x positive
        # tests when i infected and n - i uninfected people are in the sample
        p += pdf(h, i) * bin(x, i, n - i)
    end
    return p
end

function POMDPs.observation(p::try1, sp::Array{Int,1})
    return Try1Dist(sp)
end

which gives the probability of an observation (number of positive tests) given a state (number of infected/uninfected people). I'm not completely sure how the observation distribution works, so this may not be correct.
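
As far as I understand, the solver and the particle filter end up using this roughly as follows (please correct me if I have the interface wrong):

# o_dist is my Try1Dist wrapping the next state sp
o_dist = POMDPs.observation(pomdp, sp)
o = rand(rng, o_dist)       # how an observation gets sampled during simulation/search
w = POMDPs.pdf(o_dist, o)   # how the particle filter weights a particle that produced sp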

I am providing an initial belief state as a discrete multinomial, for example:

p = [.8,.1,.1,0]
mtn2 = Multinomial(100,p) 
POMDPs.initialstate_distribution(m::try1) = mtn2
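
Sampling this gives an integer count vector with the same shape as my state, e.g.:

rand(mtn2)  # e.g. [78, 12, 10, 0] -- a Vector{Int} of category counts summing to 100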

Using the particle filter code:

using ParticleFilters
using POMDPSimulators: stepthrough

i = 0
filter = SIRParticleFilter(pomdp, 1000)
for (s, a, r, sp, o, b) in stepthrough(pomdp, planner, filter, "s,a,r,sp,o,b")
    @show (s, a, r, sp, o, b)
    i += 1
    if i == 30
        break
    end
end


The problem I am having is that the solver does not seem to account for the value of making observations and never chooses the "test" option. The solver is able to "treat" when there are more infected people, and generally follows a policy of "waiting" until there are a certain number of infected people and then "treating". However, the solver never selects the "test" option, even when the test gives perfect information about the number of infected people. I also don't understand how the model is able to know when there are more infected people if it is never testing.

I'm not really sure how to troubleshoot this, or why the solver might never take the "test" option.

Thanks!

Zachary Sunberg

Nov 17, 2020, 5:39:25 PM
to julia-pomdp-users
Hi Leo,

Interesting. I think there could be several problems happening.

Here are some questions to help me understand if there might be problems with the code:
1. Are you using the POMCP solver from BasicPOMCP.jl?
2. What happens if you call `rand(d)` where `d` is a Try1Dist?
3. What entries does the `NamedTuple` returned by `gen` have? (In particular, since you have implemented `POMDPs.observation`, you probably should not return an observation in `gen`. That is redundant and may result in errors.)

Troubleshooting tips:
1. Try looking at one of the trees using D3Trees.jl: https://github.com/JuliaPOMDP/BasicPOMCP.jl#tree-visualization (there is a rough snippet below this list). In particular, see how many observation branches there are after the first action. If there are a lot, there may be a particle depletion problem in POMCP.
2. Try implementing hand-designed heuristic policies with and without testing to see if testing actually does result in more reward.
3. Try looking at the beliefs calculated by the particle filter to make sure they make sense.
(Feel free to ask for additional pointers)
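
For (1), pulling up a tree looks roughly like this (following the README linked above; I am writing the keyword names from memory, so double-check them there):

using D3Trees
using POMDPModelTools: action_info

a, info = action_info(planner, b, tree_in_info=true)  # b is the current belief
inchrome(D3Tree(info[:tree], init_expand=3))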

Things that could be going wrong:
1. MCTS-based techniques like POMCP typically cannot perform that well from scratch. They usually need a decent rollout policy. Are you using a heuristic or just a random rollout policy?
2. If the observation space is large, the POMCP tree will have too large of a branching factor. This can be reduced with POMCPOW: https://arxiv.org/abs/1709.06196

Also, there is a `max_steps` keyword argument for `stepthrough` :)
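
For example:

for (s, a, r, sp, o, b) in stepthrough(pomdp, planner, filter, "s,a,r,sp,o,b", max_steps=30)
    @show (s, a, r, sp, o, b)
end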

Good luck!

- Dr. Sunberg

Leo Schirokauer

Nov 18, 2020, 10:45:16 AM
to julia-pomdp-users
Hi Dr. Sunberg,

Thanks so much for the help!

To answer your questions:
1. Yes, I am using the POMCP solver from BasicPOMCP.
2. When I call rand(d) I get the error "ArgumentError: Sampler for this object is not defined".
3. My gen function is returning (sp=sp, o=o, r=r).

I based my implementation on the minimal Light-Dark 1D example to get the solver running. I was also wondering if you could briefly outline how to implement a hand-designed policy? I'm not quite sure how a policy is structured in JuliaPOMDP, or where I would write one myself.

I have tried looking at the belief states, but when I show the particle collection I get something like this:
"ParticleCollection{Array{Int64,1}}([[89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0], [89, 11, 0, 0] ..." where every particle seems to be the same. I was wondering if this is the right way to look at the belief states, and how I can see the weights of the particles that are making up the belief.

Thanks again!
-Leo

Zachary Sunberg

Nov 18, 2020, 12:49:48 PM
to julia-pomdp-users
Ok, cool.

Here are my recommendations:

1. I would recommend just returning (sp=sp, r=r) from gen (POMDPs.jl will automatically get o from the observation function, and then you will never have to worry about whether gen and the observation function are consistent).
2. You should implement Base.rand(rng::AbstractRNG, s::Random.SamplerTrivial{Try1Dist}). See https://docs.julialang.org/en/v1/stdlib/Random/index.html#Generating-random-values-of-custom-types for more info. (There is a minimal sketch below this list.)
3. To make a heuristic policy, do something like:

using POMDPPolicies: FunctionPolicy

function my_heuristic(b)
    # b is the belief
    # return an action
end

heuristic_policy = FunctionPolicy(my_heuristic)

Then you can use heuristic_policy wherever you would use the POMCP planner.
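
For (2), a minimal sketch, assuming for the moment that the test is perfect so that the observation is just the number of infected among the 10 people sampled (adjust the sampling to match your actual test model):

using Random
using Distributions: Hypergeometric

function Base.rand(rng::AbstractRNG, s::Random.SamplerTrivial{Try1Dist})
    d = s[]  # unwrap the Try1Dist
    return rand(rng, Hypergeometric(d.s[2], d.s[1] + d.s[3], 10))
end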

Hmm. That is the correct way to look at the belief. In a ParticleCollection, all of the particles have the same weight (that is just because ParticleFilters.jl always does resampling for every update). I do not know why they all have the same value - I would look into that and try to debug. If you call rand on your initial state distribution, does it always return that same state?

- Dr. Sunberg

Zachary Sunberg

Nov 18, 2020, 12:51:43 PM
to julia-pomdp-users
(see https://juliapomdp.github.io/ParticleFilters.jl/latest/beliefs/#Interface-1 for a list of functions that you can call on a ParticleCollection to see more info about it)
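
For example, a few of the functions from that page:

using ParticleFilters
using Statistics: mean

particles(b)   # the raw particles (your Vector{Int} states)
weights(b)     # their weights (all equal for a ParticleCollection)
mean(b)        # the weighted mean state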

Leo Schirokauer

Nov 25, 2020, 4:34:25 PM
to julia-pomdp-users
Dear Dr. Sunberg,

Thanks so much for the advice! Sorry for the delayed reply;  I wanted to have a chance to spend some time debugging before I got back to you and also had several things due right before break.

I was able to resolve some of the problems, but there are still a couple of things I'm not sure are working exactly as intended. The belief states are no longer all the same and seem to be updating reasonably. However, after looking at the belief states and trying some custom heuristics, it seems that the reason the solver never takes the "test" action is that it is able to figure out the state quite well without using observations. Even when I break the observation distribution so it returns no meaningful information, the mean of the belief particle collection is quite close to the actual state. It seems that just from the history of actions the updater can track the actual state relatively well, and that observations are just not that helpful. Do you have any suggestions on how to deal with this? Could you briefly clarify how the belief updater can be so accurate without observations, and what I might adjust so that the solver relies more on the observations?

Thanks again!

All best,
Leo

Zachary Sunberg

Nov 27, 2020, 11:15:36 PM
to julia-pomdp-users
Hi Leo,

Two possibilities come to my mind:

1) It may be that the dynamics of the system are fairly predictable without any observations.
2) You may be accidentally revealing information you didn't intend to with your observations, even when the "wait" action is taken.

To disambiguate between these two, I would recommend running a bunch of simulations with an open-loop policy (one that doesn't depend on the belief at all) and seeing whether the states resulting from those simulations have roughly the same distribution as your belief updated with observations under the same open-loop policy. If they do match your belief, then the dynamics are predictable without any observations. One of my students has been doing some simulations of virus spread, and he has noted that, for large populations, the trajectories are surprisingly predictable without any tests.
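
For example, roughly (with a hypothetical always-wait open-loop policy, assuming action 1 is "wait"):

using POMDPPolicies: FunctionPolicy
using POMDPSimulators: stepthrough
using ParticleFilters: SIRParticleFilter
using Statistics: mean

open_loop = FunctionPolicy(b -> 1)  # always choose the "wait" action, ignoring the belief

# final state from each of 100 open-loop simulations
finals = [last(collect(stepthrough(pomdp, open_loop, SIRParticleFilter(pomdp, 100), "sp", max_steps=30)))
          for _ in 1:100]

mean(finals)  # compare to mean(b) from a run where the belief is updated with observations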

I included (2) just because I have accidentally done it myself.

If you want to change the disease model so that observations matter more, you can add more noise to the state transitions in the generative model.

Hope that helps! Let us know what you figure out.

- Zach

Leo Schirokauer

Dec 3, 2020, 4:26:30 PM
to julia-pomdp-users
Dear Dr. Sunberg,

Thanks so much! After looking into it a bit more, I determined that the model dynamics are just relatively predictable. However, if I make the tests perfect, then the mean of the belief state always has exactly the correct number of cases, whereas when I make the tests less accurate, the belief is pretty close but usually a little off.

Now that I think things are basically running correctly, I was hoping you could make some suggestions about how to improve the performance of POMCP. Right now basic POMCP is doing OK, but I'm not sure what to tune, since I haven't worked with any online solvers before. I also remember you suggested earlier that I implement a decent rollout policy, and I was hoping you could go into a bit more detail about how I can implement that?

Thank you so much!

All best,
Leo

Leo Schirokauer

Dec 3, 2020, 9:14:35 PM
to julia-pomdp-users
I also wanted to ask quickly about how to implement finite-horizon POMDPs that have a fixed number of timesteps instead of a discount factor. Thanks!

Zachary Sunberg

Dec 4, 2020, 1:38:18 PM
to julia-pomdp-users
Hi Leo,

> Now that I think things are basically running correctly, I was hoping you could make some suggestions about how to improve the performance of POMCP. Right now basic POMCP is doing OK, but I'm not sure what to tune, since I haven't worked with any online solvers before. I also remember you suggested earlier that I implement a decent rollout policy, and I was hoping you could go into a bit more detail about how I can implement that?

Yes, the most important things are (1) to have a good rollout policy or value estimate and (2) to choose the exploration constant (c in the UCB equation) well. For (2), I usually set c to 2*(V_max - V_min), where V_max and V_min are rough estimates of the values of the optimal policy and the worst policy, and that seems to work pretty well.

To accomplish (1), you need to use the `estimate_value` option of the `POMCPSolver` in BasicPOMCP. You can either (a) provide a function that takes in the pomdp, state, belief node, and number of remaining steps (until the tree search limit) and returns an estimate of the value, or (b) use a rollout policy. To use a rollout policy, you need to create a policy object. I would recommend using `POMDPPolicies.FunctionPolicy`, which takes a function that receives a state and outputs an action. Then you can use an `FORollout` (FO stands for fully observable because it uses a state-based policy, which is a bit optimistic, but works well in practice) with that policy for the `estimate_value` option.
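
Putting (1) and (2) together, the solver setup could look roughly like this (the heuristic and the value bounds here are placeholders you would replace with something sensible for your problem):

using POMDPs: solve
using BasicPOMCP
using POMDPPolicies: FunctionPolicy

# state-based rollout heuristic: e.g. treat when more than 20 people are infected, otherwise wait
rollout = FunctionPolicy(s -> s[2] > 20 ? 3 : 1)

V_max, V_min = 0.0, -1000.0  # rough guesses at the values of the best and worst policies
solver = POMCPSolver(estimate_value = FORollout(rollout),
                     c = 2 * (V_max - V_min),
                     tree_queries = 1000)
planner = solve(solver, pomdp)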

> I also wanted to ask quickly about how to implement finite-horizon POMDPs that have a fixed number of timesteps instead of a discount factor. Thanks!

There is currently not an automated way to do that (a stub is here: https://github.com/JuliaPOMDP/FiniteHorizonPOMDPs.jl. Hopefully someone (maybe you! ;)) will have time to implement it). For now, you have to encode the time in the state and make the problem terminate after a certain amount of time, but then you will also have to create custom distributions that include time, etc.
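
A rough sketch of that workaround, assuming you append the time step as the last element of your state vector and want, say, a 30-step horizon:

using POMDPs

# s[end] holds the current time step; gen would need to return sp with sp[end] = s[end] + 1
POMDPs.isterminal(m::try1, s::Array{Int,1}) = s[end] >= 30
POMDPs.discount(m::try1) = 1.0  # no discounting needed once there is a hard horizon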

- Zach