General questions regarding the definition of a POMDP for solving the decision-making task of an autonomous vehicle


Moritz Cremer

Jan 9, 2020, 10:36:23 PM1/9/20
to julia-pomdp-users
Hey there,

first of all: Thanks for this awesome framework! The examples I tried work well, and now I want to implement the decision-making task of an autonomous vehicle as a POMDP and solve it via one of the implemented solvers. I am quite new to Julia and POMDPs, so apologies if I ask trivial questions or have misunderstood any concepts.

The result should be something similar to this master's thesis, this doctoral thesis, or this paper, but also with lateral control and integrated Lanelet2 information.

Here are the key points regarding this:
  • Continuous state and observation spaces are preferred over discrete spaces
    • State space: Ego-state (distance along route, distance perpendicular to route, velocity in x, velocity in y) + 1-k traffic-object-state (distance along route, distance perpendicular to route, velocity in x, velocity in y, [hidden]route)
    • Observation space: Ego-state (distance along route, distance perpendicular to route, velocity in x, velocity in y) + 1-k traffic-object-state (x, y, velocity in x, velocity in y)
  • Therefore an online solver is needed; looking at the available solvers, BasicPOMCP, ARDESPOT or POMCPOW should do the job, correct?
  • A discrete action space is sufficient (I think/hope)
    • e.g. longitudinal control: longitudinal acceleration, e.g. {−4 m/s², −2 m/s², 0 m/s², 2 m/s²}
    • e.g. lateral control: lateral acceleration, e.g. {−4 m/s², −2 m/s², 0 m/s², 2 m/s², 4 m/s²}
    • the planned actions will get converted to a rough (waypoint-)trajectory, which gets fed into a lower-level optimizer that executes the trajectory
  • Planning horizon should be around 5-20 seconds and planning frequency around 1 Hz, so 5-20 sequential actions (tree/search depth?) need to be considered
  • Framework Lanelet2 is used to get infrastructure data and the route from the current position to the target position (a list of x,y waypoints), and also possible routes for traffic objects (a hidden variable in their state)
  • Communication with simulation software (CarMaker) via ROS
    • The Julia package RobotOS is currently used to get the ego pose, traffic-object poses, possible routes, etc. into Julia
  • Prediction of traffic objects is currently done outside of Julia via lanelet matching and prediction along possible routes
If I use the generative interface (documentation), I'm limited to ARDESPOT and BasicPOMCP, correct?

Is it better to implement the problem first as an MDP to simplify it?

Is there maybe another example of the generative interface (documentation) that deals with a more complicated case (continuous state/observation space)?

I had a look at https://github.com/sisl/AutomotivePOMDPs.jl, but it was too complex for me to understand in a reasonable amount of time.

Are there any hurdles that you see with this approach?

Zachary Sunberg

Jan 10, 2020, 6:11:41 PM1/10/20
to julia-pomdp-users
Hi Moritz, thanks for the post.

> Are there any hurdles that you see with this approach?

I think this should all be possible! (though it will take some work to get it all set up). I used POMDPs.jl for something similar a few years ago. This video, paper, and chapter of my thesis that address a similar driving problem were all done using POMDPs.jl. However, I would not spend too much time looking at the code for this, because it is definitely not release quality :)

If you get to any challenges that you think might be show-stoppers, let me know and I'll be happy to discuss them honestly with you, but I don't think it will be a waste of your time to try implementing this with POMDPs.jl.

Moreover, I think there is a lot of opportunity for improving the solvers (e.g. with low level parallelization in Julia 1.3), so if you find ways to create improved solvers, it can help all of us and easily be shared with anyone who wants to use the package.

The biggest question in my mind is the integration with the lanelet code. Everything needs to be very fast if it is used in the generative model. If you can integrate directly with the C++ code (e.g. with https://github.com/JuliaInterop/Cxx.jl), then it may be fast enough, but if you have to go through python, it will likely be too slow.
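For a rough idea of what direct integration could look like, here is a minimal sketch with Cxx.jl (dist_along_route is a made-up stand-in for a real Lanelet2 query, not an actual Lanelet2 function):

using Cxx

cxx"""
#include <cmath>
// hypothetical stand-in for a Lanelet2 query
double dist_along_route(double x, double y) { return std::hypot(x, y); }
"""

d = @cxx dist_along_route(3.0, 4.0) # calls the C++ function directly, no Python layer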

> Prediction of traffic objects is currently done outside of Julia via lanelet matching and prediction along possible routes

Can you elaborate on this a bit? Will the generative model need to make a call to an external library at every timestep?

I'll respond with some more details about your other questions later, but just wanted to give you some initial thoughts.

- Zach

Zachary Sunberg

Jan 14, 2020, 4:44:06 PM1/14/20
to julia-pomdp-users
Hi Moritz,

To answer a few more of your questions after my previous response:

> Therefore an online solver is needed; looking at the available solvers, BasicPOMCP, ARDESPOT or POMCPOW should do the job, correct?

Yes, probably an online solver would be best. BasicPOMCP and ARDESPOT will converge to a QMDP-like solution when there is a continuous observation space, see https://arxiv.org/abs/1709.06196. This is probably fine, but you should be aware that it will not take expensive info-gathering actions.

POMCPOW (or DESPOT-alpha, which unfortunately does not have a POMDPs.jl implementation yet) may be able to do better in the continuous observation space, but you will have to do some modification since it is a mixed-observability model. Ask if you are interested in the details of this.

> Discrete Action Space...

In my research, we used action spaces of size 10 - any larger than that and the trees might start to get too wide. Is the action space the cross product of the lateral and longitudinal? in that case it may be too big.

> Planning horizon should be around 5-20 seconds and planning frequency around 1 Hz, so 5-20 sequential actions (tree/search depth?) need to be considered

I think this will be fine as long as you have a reasonable heuristic policy for the rollouts.

> If I use the generative interface (documentation), I'm limited to ARDESPOT and BasicPOMCP, correct?

Strictly speaking, yes. You need a generative transition model and an explicit observation model (or at least POMDPModelTools.obs_weight) for POMCPOW, and you will probably have to customize the filtering a little bit because your model has mixed observability (so the observation model will have Dirac deltas).
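To make the obs_weight part concrete, here is a minimal sketch (MyPOMDP and the Gaussian sensor with sigma = 0.1 are illustrative assumptions, not Moritz's model):

using POMDPs, POMDPModelTools, Distributions

struct MyPOMDP <: POMDP{Float64, Int, Float64} end # state, action, observation types

# likelihood weight of observation o given the transition (s, a) -> sp;
# an assumed scalar Gaussian sensor, just to show the required signature
POMDPModelTools.obs_weight(::MyPOMDP, s, a, sp, o) = pdf(Normal(sp, 0.1), o)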

> Framework Lanelet2 is used to get infrastructure data and the route from the current position to the target position (a list of x,y waypoints), and also possible routes for traffic objects (a hidden variable in their state)

This is the biggest question in my mind. It sounds like these waypoints will need to be accessed from within the generative model. If that is the case, they need to be extremely fast because that will be called many thousands of times per second.

> Is it better to implement the problem first as a MDP to simplify the problem?

Yes, I would implement an MDP first.

> Is there maybe another example of the generative interface (documentation) that deals with a more complicated case (continuous state/observation space)?

I would recommend getting started with the Quick(PO)MDPs interface (https://github.com/JuliaPOMDP/QuickPOMDPs.jl). It is much quicker to try things out on and can support almost everything that the base POMDPs.jl interface can. You should start with the simplest models you can think of and build up step by step. Maybe also take a look at https://github.com/zsunberg/VDPTag2.jl?
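As a starting point, a minimal continuous-state sketch with the QuickPOMDPs keyword constructor might look like this (a 1-D point with noisy measurements; all names and numbers are placeholders, not from any existing example):

using QuickPOMDPs, POMDPModelTools

m = QuickPOMDP(
    actions = [-1.0, 0.0, 1.0],     # discrete accelerations
    obstype = Float64,              # continuous observation space
    discount = 0.95,
    initialstate = Deterministic(0.0),
    gen = function (s, a, rng)
        sp = s + a + 0.1*randn(rng) # continuous state transition
        o = sp + 0.5*randn(rng)     # continuous, noisy observation
        r = -abs(sp)                # reward for staying near the origin
        return (sp=sp, o=o, r=r)
    end,
)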

Hope that helps! Let us know if you have any other questions.


Moritz Cremer

Jan 22, 2020, 1:53:16 PM1/22/20
to julia-pomdp-users
Hey Zachary, thanks for the response!

> I think this should all be possible! (though it will take some work to get it all set up). I used POMDPs.jl for something similar a few years ago. This video, paper, and chapter of my thesis that address a similar driving problem were all done using POMDPs.jl.
Yes, I already had a look at your video and your thesis; that's what made me interested in JuliaPOMDP in the first place :) Especially the ability to handle a continuous observation space properly is really interesting.

> Moreover, I think there is a lot of opportunity for improving the solvers (e.g. with low level parallelization in Julia 1.3), so if you find ways to create improved solvers, it can help all of us and easily be shared with anyone who wants to use the package.
The low level parallelization sounds really interesting; I can imagine that would boost performance quite a bit. Sometime in the future I will see what I can do.

> If you get to any challenges that you think might be show-stoppers, let me know and I'll be happy to discuss them honestly with you, but I don't think it will be a waste of your time to try implementing this with POMDPs.jl.
I think my biggest problem is my lack of knowledge of Julia and the POMDP packages...

> Yes, probably an online solver would be best. BasicPOMCP and ARDESPOT will converge to a QMDP-like solution when there is a continuous observation space, see https://arxiv.org/abs/1709.06196. This is probably fine, but you should be aware that it will not take expensive info-gathering actions.

> POMCPOW (or DESPOT-alpha, which unfortunately does not have a POMDPs.jl implementation yet) may be able to do better in the continuous observation space, but you will have to do some modification since it is a mixed-observability model. Ask if you are interested in the details of this.
In your paper "Online algorithms for POMDPs with continuous state, action, and observation spaces" you seem to manage to run DESPOT with a continuous observation space (see Table 1 - Multilane - DESPOT without superscript D). This isn't the standard DESPOT implementation, because there only discrete observations (uint64_t) are possible, which is quite inconvenient. Did you modify the DESPOT code to run with a continuous observation space? DESPOT-alpha can handle a continuous observation space, but as you said yourself it does not have a POMDPs.jl implementation, or for that matter any openly accessible implementation (there are only the original DESPOT and HyP-DESPOT available).

> In my research, we used action spaces of size 10 - any larger than that and the trees might start to get too wide. Is the action space the cross product of the lateral and longitudinal? In that case it may be too big.
The action space can be reduced to 10 actions (3×3 accelerations lat/long relative to the planned route + 1 neutral) if a bigger action space would cause problems. Complexity is exponential in the action space size, correct? So my action space would be very similar to the action space in your thesis, but all relative to the planned route (Frenet-based).

I'm very interested in the POMCPOW solver! Right now I'm at the point where I'm thinking about switching from the C++ DESPOT solver to the POMCPOW solver, because the discrete observation space is quite a hindrance.
Can you elaborate on the mixed-observability model part?

I'm now pretty sure I'll define my problem similarly to what is described in this paper and this paper (without occlusions for now). For example, my state and observation spaces should be something like this (in C++):

#include <vector>

struct obj_state_space
{
    double s;              // Frenet coordinate s along the route
    double d;              // Frenet coordinate d perpendicular to the route
    double v;              // velocity of object
    unsigned int route_id; // (hidden)
};

struct ego_state_space
{
    double s; // Frenet coordinate s along our planned route
    double d; // Frenet coordinate d perpendicular to the planned route
    double v; // ego velocity
};

class State
{
public:
    ego_state_space ego_state;
    std::vector<obj_state_space> obj_states;
};

////////////////////

struct obj_observation_space
{
    double x; // observed/measured x coordinate of object in the ego frame
    double y; // observed/measured y coordinate of object in the ego frame
    double v; // observed/measured velocity of object
    // route is a latent variable and cannot be observed directly
};

struct ego_observation_space
{
    double s; // observed/measured Frenet coordinate s along our planned route
    double d; // observed/measured Frenet coordinate d perpendicular to the planned route
    double v; // observed/measured ego velocity
};

class Observation
{
public:
    ego_observation_space ego_observation;
    std::vector<obj_observation_space> obj_observations;
};

Then, in the observation model, the observed/measured values are mapped to the state (ego-frame coordinates to Frenet coordinates) and the probability of a route is calculated according to some features (v_ref and pos_error).
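For illustration, a rough sketch of that route weighting in Julia (only the feature names v_ref and pos_error come from the description above; the Gaussian likelihood forms and the sigma values are my assumptions):

using Distributions

# unnormalized likelihood that an object follows a candidate route, based on how well
# its measured position and speed match that route's centerline and reference speed
function route_likelihood(pos_error, v_meas, v_ref)
    L_pos = pdf(Normal(0.0, 1.0), pos_error) # lateral offset from the route centerline [m]
    L_vel = pdf(Normal(v_ref, 2.0), v_meas)  # measured speed vs. route reference speed [m/s]
    return L_pos * L_vel                     # normalize over all candidate routes afterwards
end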

Lanelet2 in C++ is quite fast (it builds on the Boost libraries), so this should not be a big problem, I think.

Another question I have: Is it possible to get all actions along the branch with the highest accumulated reward in one planning step? So not just the immediate action for the current state, but all actions up to the planning horizon of the best branch. Background: once a suitable action sequence is found by the solver (as mentioned, every 1 or 2 seconds), it gets converted to a physical trajectory (x, y, v points in the ego frame), which is sent to a trajectory optimizer that controls the vehicle actuators. This is also a problem I have with the DESPOT solver, which only outputs the immediate action.

I will try to come up with an example in Julia, maybe you can have a look at it.

Greetings,
Mo

Zachary Sunberg

Jan 23, 2020, 3:32:34 PM1/23/20
to julia-pomdp-users
Hi Mo,

> This isn't the standard DESPOT implementation, because there only discrete observations (uint64_t) are possible, which is quite inconvenient. Did you modify the DESPOT code to run with a continuous observation space?

Yes, I used the Julia implementation https://github.com/JuliaPOMDP/ARDESPOT.jl. It does not require observations to map to integers, so it will run with any observation representation (it has been a while, but if I remember correctly, the integer observations in DESPOT have something to do with memory management and comparison. Julia handles the memory part quite well, especially when the types are immutable and can be stack allocated, and we use hash maps for quick comparison). To convince ourselves that the Julia implementation was about as good as the standard C++ implementation, we tested on the LaserTag problem and got similar performance.

> Can you elaborate on the mixed-observability model part?

The issue is that POMCPOW (and DESPOT-alpha) use O(o | a, s') to weight the particles. If part of the observation is an exact measurement of the continuous state, then O will be a Dirac delta function, so the weight of essentially all particles will be zero. Possible solutions are 1) add artificial Gaussian noise to the observations to smooth things out (this is the easiest to implement) or 2) slightly modify how POMCPOW handles beliefs - an example (that I will need to explain more if you decide to use it :)) is here: https://github.com/sisl/Multilane.jl/blob/master/src/pow_filter.jl

Note: this is the same issue you would have with a particle filter in a mixed-observability model - if you can make a particle filter for it, you can make POMCPOW work.
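A sketch of option 1 in POMDPs.jl terms (the scalar state and the noise scale are placeholders): report the exactly-observed quantity through a narrow Gaussian so that O(o | a, s') is a proper density rather than a Dirac delta:

using POMDPs, Distributions

struct ToyPOMDP <: POMDP{Float64, Int, Float64} end

# the state would be observed exactly, but we smear it with small artificial noise
# so that every particle gets a nonzero weight
POMDPs.observation(::ToyPOMDP, a, sp) = Normal(sp, 0.1)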

> For example, my state and observation spaces should be something like this (in C++):

Nice! Yeah, in Julia it would just be:
struct ObjState # you may want to make this a StaticArrays.FieldVector
    s::Float64
    d::Float64
    v::Float64
    route_id::UInt
end

struct EgoState
    s::Float64
    d::Float64
    v::Float64
end

struct State
    ego::EgoState
    obj_states::Vector{ObjState} # you might be able to speed this up further by using a StaticArrays.SVector
end

...
I think that should work great

> Another question I have is: Is it possible to get all actions for the branch with the highest accumulated reward in one planning step?

Yes! You can get the entire tree structure with the action_info function. The code below will show the tree and the best sequence for POMCPOW on the tiger problem (sorry, it is a little messy; we didn't really write a public API for interacting with the tree yet). To do this, you have to decide which observation branches to take, of course. This snippet takes the most likely observation. You could do something similar for ARDESPOT. Keep in mind that this version of the tiger problem starts over if you open a door, so the action sequence is longer than just opening one door.

using POMDPs, POMDPModels, POMDPSimulators, POMCPOW, POMDPModelTools, Random

m = TigerPOMDP()

solver = POMCPOWSolver(tree_in_info=true)
planner = solve(solver, m)

b = initialstate_distribution(m)

function show_tree_and_best_sequence(planner, b)
    a, info = action_info(planner, b)

    tree = info[:tree]
    show(stdout, MIME("text/plain"), tree)

    bindex = 1 # start at the root belief node
    a_sequence = actiontype(m)[]
    while !isempty(tree.tried[bindex])
        bnode = POWTreeObsNode(tree, bindex)
        best_anode = POMCPOW.select_best(MaxQ(), bnode, MersenneTwister(1))
        push!(a_sequence, tree.a_labels[best_anode])
        children = [pair[2] for pair in tree.generated[best_anode]]
        # follow the most visited observation branch as a proxy for the most likely one
        bindex = children[argmax([tree.total_n[c] for c in children])]
    end
    @show a_sequence
end

show_tree_and_best_sequence(planner, b)

> I will try to come up with an example in Julia, maybe you can have a look at it.

Cool. I would advise making the first version as simple as you possibly can and then building more complexity into it.

- Zach

Mo Cremer

Jan 26, 2020, 4:20:18 PM1/26/20
to julia-pomdp-users
Hey Zachary,

thank you again for the helpful answer! Now I'm sure I will use the POMDPs.jl framework in my master's thesis to solve the described problem.

> The issue is that POMCPOW (and DESPOT-alpha) use O(o | a, s') to weight the particles. If part of the observation is an exact measurement of the continuous state, then O will be a Dirac delta function, so the weight of essentially all particles will be zero. Possible solutions are 1) add artificial Gaussian noise to the observations to smooth things out (this is the easiest to implement) or 2) slightly modify how POMCPOW handles beliefs - an example (that I will need to explain more if you decide to use it :)) is here: https://github.com/sisl/Multilane.jl/blob/master/src/pow_filter.jl

 
Ah okay, I think I understand. I'll certainly go with option 1) if and when I use POMCPOW; it is a lot easier and makes more sense in my mind :) Applied to my problem: making not just the route of an object uncertain, but also its location and the ego location, seems like a good idea. If I use a particle filter as a belief updater, how can I control how much uncertainty each variable in the state has / how much I trust each measurement (observation) value?

The sequence function works great, thanks! Once I have all the other things sorted out, I'll certainly use it to feed the action sequence to the trajectory planner in the end.

If you don't mind, there are a few more questions:
  • Is it necessary or advantageous to define the boundaries of my continuous spaces? And if so, how would I do it if the spaces (state, observation) are structs?
  • How do I convert the observations/measurements that I get from the CarMaker simulator to a belief? Probably with the belief updater, correct?
  • How do I get the first belief? Do I use the initialize_belief function for that?
Have a look at this script. There are a lot of things to do, but the crucial steps that are missing, in my opinion, start at line 296. There I want to convert my observation to a belief, or rather update my belief with the incoming observation; how would I do that? Do I have to define my belief with a PDF over the states?

Greetings Mo

Mo Cremer

Jan 26, 2020, 5:48:27 PM1/26/20
to julia-pomdp-users
So my problem seems to be working with (and properly understanding) the distributions - http://juliapomdp.github.io/POMDPs.jl/latest/interfaces/#Distributions-1

Maybe you have a hint for how to get it working?

Mo Cremer

Jan 27, 2020, 5:20:04 AM1/27/20
to julia-pomdp-users
By the way, is it possible to edit posts here? The link for line 296 is for an old version.

Zachary Sunberg

Jan 28, 2020, 2:32:27 AM1/28/20
to julia-pomdp-users
Hi Mo,

Sorry for not answering earlier! I was quite busy today. Here are some quick answers.

> Is it necessary or advantageous to define the boundaries of my continuous spaces? And if so, how would I do it if the spaces (state, observation) are structs?

The boundaries are not needed by POMCPOW.

> How do I convert the observations/measurements that I get from the CarMaker simulator to a belief? Probably with the belief updater, correct?

Yes. You can either do your belief updates in Julia with a belief Updater, or, if you have existing code for belief updates in another language, you can use that, convert the result to something Julia will understand, and then call action(planner, b), where b is the object representing the belief.

> How do I get the first belief? Do I use the initialize_belief function for that?

No, you should only write initialize_belief if you are writing a new Updater. That function converts a distribution into an object representing that distribution that is optimized for the updater. For example, if the updater is a particle filter, it just takes samples from the distribution and creates a collection of particles. You can specify the initial belief for a problem with initialstate_distribution(::POMDP), or just specify it manually whenever you run a simulation.

> So my problem seems to be working with (and properly understanding) the distributions
> Do I have to define my belief with a PDF over the states?

No, you should not need to define the pdf. For POMCPOW to work, the distribution object only needs rand(rng, ::YourDistributionType) so it can be sampled from.

Maybe the easiest thing to start off with would be to just use a ParticleCollection from ParticleFilters.jl. Just fill it with some states and it should work as a distribution wherever you need one. You can also use the particle filter for belief updates if you want.
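A minimal sketch of that, assuming m is your POMDP and planner came from solve (the particle count is a placeholder):

using ParticleFilters, POMDPs

up = BootstrapFilter(m, 1000)                           # SIR particle filter with 1000 particles
b = initialize_belief(up, initialstate_distribution(m)) # sample particles from the initial distribution

a = action(planner, b) # plan directly on the particle belief
# after executing a and receiving a measurement o from the simulator:
# b = update(up, b, a, o)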

Hope that helps - this was a rushed reply, so it is probably not perfectly clear :/

I haven't looked at your script yet, but I may have time at some point if you link to the latest version.

Zachary Sunberg

Jan 28, 2020, 2:35:22 AM1/28/20
to julia-pomdp-users
> By the way, is it possible to edit posts here? The link for line 296 is for an old version.

It seems like you can edit them as long as you are logged in with the same account as you originally posted with (maybe you have two different accounts registered?)

Moritz Cremer

Jan 31, 2020, 11:00:52 AM1/31/20
to julia-pomdp-users
Hi Zachary,

No problem! I was busy with some other things as well. I've worked on the problem a bit more, and this is the temporary result.

At the moment I'm using the generative interface and the ARDESPOTSolver. Single actions are generated and they seem valid, but I'm generating a dummy action sequence, because that's what's needed by the trajectory optimizer. So the question is: how do I get the action sequence with the ARDESPOT solver?
Your "show_tree_and_best_sequence" function works well with the POMCPOWSolver, but ARDESPOT doesn't have POWTreeObsNode and select_best, so this does not work.

The "POMDPModelTools.obs_weight()"-method needs some work (pretty useless at the moment), as well as the observation generation in the "POMDPs.gen()". After that I will see how I can change to the POMCPOWSolver.

Is there anything you would add or change?

Greetings
Mo

sunbe...@gmail.com

Feb 5, 2020, 10:47:46 AM2/5/20
to julia-pomdp-users
Hi Mo,

Sorry again for the delayed reply! I had a vacation last weekend and have been traveling at the beginning of this week.

I think what you have so far looks pretty good - glad to see that you're making use of StaticArrays!

The DESPOT solver has a different internal tree structure, so the function I wrote for POMCPOW will not work. I will take a stab at writing one for DESPOT today, but I might not have time to finish.

- Zach

Mo Cremer

Feb 10, 2020, 11:45:45 AM2/10/20
to julia-pomdp-users
Hey Zachary, I hope you had a nice vacation! 

There are some big changes in my plans. Unfortunately, the package RobotOS.jl (which relies on Python/PyCall.jl at the moment), which handles the messaging from and to the ROS server, seems not to be very reliable. I got the POMCPOW solver working (implemented the obs_weight function, among other things), but the best decision making is useless if the actions are transmitted with a delay of up to 3 seconds.

So I'm now switching to C++, for this reason and a couple of others:
  • easy integration with the given software packages (e.g. Lanelet2, Boost)
  • all other code is written in C++
  • familiarity with C++ instead of Julia
By the way, how many tree queries are expected to be evaluated with the gen function you see in my code on a moderate PC at 1 Hz (1 second max_time)? I got 5000-7000, so with a planning step of 0.5 seconds and the rather large action space (parameters mostly standard or tuned for tree depth rather than tree width), the solver never got more than 4-5 steps into the future.

I started to translate the source code of the POMCPOW solver to C++ and adapted it a bit to fit my problem. I also left some things out, for example the check for repeated observations, because observations are continuous in my case.


The only big thing I'm missing is the estimate_value function in solver2.jl. This refers to the standard BasicPOMCP (rollout.jl), correct? Can you maybe elaborate on what exactly this function does and with what inputs (where does "pomcp.solved_estimate" come from, for example)? I know what the inputs (sp, POWTreeObsNode(tree, hao), d-1) mean... By default it performs a rollout simulation to estimate the value, correct? What exactly does a "rollout simulation" mean in this context? Do we simulate from our belief one step ahead?

Otherwise the C++ source code builds, and it would be very nice to get POMCPOW working in C++, don't you think?

Mo Cremer

Feb 10, 2020, 1:27:41 PM2/10/20
to julia-pomdp-users
Ah, there are some more questions:
  1. The type CategoricalVector in belief.jl is a struct with members items (which is of type pair (State, reward)) and cdf (Float64/double weight), correct?
  2. What does "anode = length(tree.n)" in line 34 of solver2.jl do?

Zachary Sunberg

Feb 11, 2020, 6:14:26 PM2/11/20
to pomdps...@googlegroups.com
Hi Mo,

Cool! Glad to hear you are trying it out in C++. Disappointed that RobotOS.jl didn't work - I thought the PyCall stuff was only for setting up connections and the rest was pure Julia, but yeah, reliable communication is definitely absolutely necessary for this.

By the way, after publishing the paper about POMCPOW, I began to realize that there are things I would have done differently. First, I would get rid of the action progressive widening - it is just not very effective. It is better to consider a fixed number of actions. Second, instead of having progressive widening on the observations, I would just use sparse sampling and fix the number of observation children. This is essentially what POMCPOW did in our tests, because the best alpha_o was something like 1/100: https://arxiv.org/pdf/1709.06196.pdf#page=11.
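In POMCPOWSolver terms, that advice might look something like this (the parameter values are placeholders to show the idea, not tuned recommendations):

using POMCPOW

solver = POMCPOWSolver(
    enable_action_pw = false, # consider every action at each node instead of widening
    k_observation = 5.0,      # cap the number of observation children per action node...
    alpha_observation = 0.0,  # ...and keep it fixed, approximating sparse sampling
    max_depth = 20,
    max_time = 1.0,
)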

> By the way, how many tree queries are expected to be evaluated with the gen function you see in my code on a moderate PC at 1 Hz (1 second max_time)? I got 5000-7000, so with a planning step of 0.5 seconds and the rather large action space (parameters mostly standard or tuned for tree depth rather than tree width), the solver never got more than 4-5 steps into the future.

This seems consistent with the performance I was seeing. In fact, it looks like I only used 1000 iterations and it took about 0.5 seconds to plan each step. And yeah, the actual tree only gets 4 or 5 steps deep. This means that you have to have a good rollout policy / value estimate to do well. POMCPOW takes care of dealing with the uncertainty, but it can't make complicated plans far out into the future, which is a big weakness. DESPOT may be better for that (see the comparison here: https://arxiv.org/pdf/1709.06196.pdf#page=7).

> The only big thing I'm missing is the estimate_value function in solver2.jl. This refers to the standard BasicPOMCP (rollout.jl), correct? Can you maybe elaborate on what exactly this function does and with what inputs (where does "pomcp.solved_estimate" come from, for example)? I know what the inputs (sp, POWTreeObsNode(tree, hao), d-1) mean... By default it performs a rollout simulation to estimate the value, correct? What exactly does a "rollout simulation" mean in this context? Do we simulate from our belief one step ahead?

Yes, it refers to the same thing as BasicPOMCP. `solved_estimate` is just named that way because you can give a `POMDPs.Solver` to the `estimate_value` keyword argument in `POMCPOWSolver`; then `solved_estimate` performs a rollout with the `Policy` that results from running `solve` with that solver and the POMDP model. To understand rollouts, first it's important to note that when each leaf belief node is created, it only has one initial state particle in it. A rollout is just a simulation of that state particle with a rollout policy. That policy might "cheat" and observe the true state, or it might be based on the observation.
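A sketch of plugging a rollout policy into POMCPOW (MyHeuristicPolicy is a placeholder; FORollout from BasicPOMCP runs a fully-observable rollout from the leaf node's state particle):

using POMDPs, POMCPOW, BasicPOMCP

struct MyHeuristicPolicy <: Policy end    # placeholder heuristic driving policy
POMDPs.action(::MyHeuristicPolicy, s) = 0 # e.g. always coast / hold speed

solver = POMCPOWSolver(estimate_value = FORollout(MyHeuristicPolicy()))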

> 1. The type CategoricalVector in belief.jl is a struct with members items (which is of type pair (State, reward)) and cdf (Float64/double weight), correct?

Yes

> 2. What does "anode = length(tree.n)" in line 34 of solver2.jl do?

It just means that the index of the action node we are considering is the last element of the tree (I think we had just added to the tree by increasing the length of tree.n).

Hope that helps!




