Transition and observation probabilities that depend on decision epoch, action history, and observation history

anirud...@gmail.com

Jan 2, 2019, 11:07:47 AM
to julia-pomdp-users
Dear All,

I am new to POMDPs and I am trying to use one for making biopsy decisions in prostate cancer screening schedules. From what I understood so far from the examples given on GitHub, transition probability matrices need to be defined on the basis of the current state and action pair. However, in my case I need to know the time at which the action is taken, as well as the history of actions and observations. That is, my transition probability function is not just p(s_{t+1} | s_t, a_t), but rather p(s_{t+1} | s_t, a_0, a_1, ..., a_{t-1}, a_t, y_0, y_1, ..., y_t), where s_{t+1} is the state at the next decision epoch t+1, s_t is the state at the current decision epoch t, a_0, a_1, ..., a_{t-1} is the action history up to time t, a_t is the potential new action, and y_0, y_1, ..., y_t is the observation history up to time t.

Is it possible to get access to these histories while defining my function for the transition probability matrix or the matrix of observation probabilities? Currently, as in the TigerPOMDP example (below), it seems only the current state and action pair are available.

function POMDPs.transition(pomdp::TigerPOMDP, s::Bool, a::Symbol)
    if a == :open_left || a == :open_right
        # problem resets
        return BoolDistribution(0.5) 
    elseif s
        # tiger on the left stays on the left 
        return BoolDistribution(1.0)
    else
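        # tiger on the right stays on the right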
        return BoolDistribution(0.0)
    end
end

Thank you,
Ani


Mykel Kochenderfer

Jan 2, 2019, 1:40:48 PM
to julia-pomdp-users
Hi Ani,

Sounds like a neat application! You can add time as a state variable and update it appropriately in your transition model. You can even add your full transition history to the state, but that will likely make your state space explode, making it infeasible to use many of the offline methods. However, if you are using an online method like POMCP, this should be okay.
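
For concreteness, here is a minimal sketch of what that could look like; the names (ScreeningState, ScreeningPOMDP, onset_prob) and numbers are placeholders, not your actual model:

using POMDPs
using POMDPModelTools   # provides Deterministic and SparseCat

struct ScreeningState
    cancer::Bool   # hidden disease status
    t::Int         # decision epoch, added to the state so the dynamics can depend on time
end

struct ScreeningPOMDP <: POMDP{ScreeningState, Symbol, Bool}
    onset_prob::Vector{Float64}   # hypothetical per-epoch probability of cancer onset
end

function POMDPs.transition(m::ScreeningPOMDP, s::ScreeningState, a::Symbol)
    if s.cancer
        return Deterministic(ScreeningState(true, s.t + 1))   # cancer persists; the clock advances
    else
        p = m.onset_prob[s.t + 1]   # the transition now depends on the epoch carried in the state
        return SparseCat([ScreeningState(true, s.t + 1),
                          ScreeningState(false, s.t + 1)],
                         [p, 1 - p])
    end
end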

Best wishes,
Mykel

Zachary Sunberg

Jan 2, 2019, 1:55:19 PM
to julia-pomdp-users
Hi Ani,

Thanks for posting your issue. Glad to hear that you're considering using our software for your research! We have chosen to stick to a strict problem definition where the transition distribution depends only on the state and action in the core POMDPs.jl interface.

Of course, you can augment the state to include the action and observation history, but, as Mykel said, this will make the state space larger. Another researcher at Stanford, Iakovos Toumazis, has solved a similar problem with Julia; he elected to use a custom value iteration algorithm that took advantage of the structure (similar to yours), so he may be a good person to talk to. My feeling is that, for these problems, the quickest way to an offline solution would be to augment the state space and use SARSOP. I am fairly confident that it could actually handle the larger state space if the time horizon is not too long. If the problem is too big for SARSOP, I think the best online solution technique would be ARDESPOT, since it gives upper and lower bounds, so you can verify that it has been solved to optimality for the current belief.

I have noticed that healthcare problems like this often have a similar action- and observation-dependent structure, so, if you are able to solve it by converting to the standard POMDPs.jl interface, it might be useful to others to contribute a package that automates this - just a thought.

Best of luck!

- Zach

Zachary Sunberg

Jan 2, 2019, 2:00:01 PM
to julia-pomdp-users
Another note: with the state augmented with the action and observation history, the problem would be an MOMDP (mixed observability MDP). It would be extremely valuable to add MOMDP support to POMDPs.jl - we just have not had the bandwidth to add it and properly support it yet. I think SARSOP could almost certainly handle the augmented-state problem if it were represented as an MOMDP. Contact us if you're interested in working on this.

anirud...@gmail.com

Jan 3, 2019, 7:41:12 AM
to julia-pomdp-users
Thank you Zachary and Mykel for answering my queries. 

I would like to briefly explain my problem before I respond to your suggestions. I want to decide whether or not to biopsy a patient. The patient can be in a state of cancer or no cancer, neither of which is directly observable. Biopsies need to be done repeatedly until cancer is detected. Although cancer cannot be observed, what can be observed is the blood PSA level (a positive real number) and the time of the last biopsy. The observation (PSA) function depends on the time of the last biopsy as well as on previous PSA values. I need to know the action history because it gives me the time of the last biopsy.

I thought over your suggestion of adding time as a state variable. That indeed helps in expressing my problem in Julia's existing format. I read a bit more and, like you said, the state will be a tuple of the unobservable cancer/no-cancer status and the observable current time. This will be an MOMDP. I have a few more caveats in my problem, like no two biopsies unless there is a gap of at least one year between them. So it seems I may have to extend what is available in julia-pomdp to fulfill my requirements. I guess this amounts to extending your existing solvers? I do not wish to implement everything from scratch in R (I am a biostatistician), as that may not be as fast and efficient as the existing implementations in julia-pomdp. I am of course more than happy to contribute to julia-pomdp, so that other medical problems with a similar structure can utilize the code. Hopefully I can also write an R interface for julia-pomdp if one is not currently available (there seems to be one for the Julia language itself, though).

I am, however, worried about two things. Firstly, my observations are positive real numbers, which makes the belief space infinitely large, and I am not very sure how julia-pomdp will handle this. My second question is: how does remembering the action history complicate things? Does it lead to any computational issues? Can it be done easily in julia-pomdp?

Regarding Mykel's comment about online solution methods, I am not aware of them, so I will have to read up a bit more to understand what he means.

Thank you once again for all the suggestions,
Ani


Zachary Sunberg

Jan 3, 2019, 1:11:29 PM
to julia-pomdp-users
Hi Ani,

> I have a few more caveats in my problem, like no two biopsies unless there is a gap of at least one year between them.
> My second question is: how does remembering the action history complicate things? Does it lead to any computational issues?

I don't think this would actually be a disastrous computational burden - you don't need to remember the entire action history, just the gap since the last biopsy. 

> So it seems I may have to extend what is available in julia-pomdp to fulfill my requirements. I guess this amounts to extending your existing solvers?

If you want to actually change the action space based on the history, that is not currently implemented (https://github.com/JuliaPOMDP/POMDPs.jl/issues/226), but I would be happy to work with you on adding it. It should be easy to add this to the online solvers; I'm not as sure about the offline ones.
A quick hack that would not require any modification would be to remember the time since the last biopsy in the state and define the reward for an invalid action to be -Inf, or a large negative value. I think this should work out of the box with most solvers.
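
A rough sketch of that hack, with placeholder names and numbers (PatientState, BiopsyPOMDP, and the payoffs below are invented, not from your actual model):

using POMDPs

struct PatientState
    cancer::Bool        # hidden disease status
    t::Int              # current decision epoch (observable)
    since_biopsy::Int   # months since the last biopsy (observable)
end

struct BiopsyPOMDP <: POMDP{PatientState, Symbol, Float64} end

function POMDPs.reward(m::BiopsyPOMDP, s::PatientState, a::Symbol)
    if a == :biopsy && s.since_biopsy < 12
        return -1e6                      # penalize a biopsy that comes too soon
    elseif a == :biopsy
        return s.cancer ? 10.0 : -1.0    # placeholder payoffs for detection vs. an unnecessary biopsy
    else                                 # :wait
        return 0.0
    end
end

A large finite penalty is often safer than a literal -Inf, because some solvers do arithmetic (e.g., weighted averages) on rewards.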

> My observations are positive real numbers, which makes the belief space infinitely large, and I am not very sure how julia-pomdp will handle this.

This is a somewhat nuanced issue. Our paper here addresses the issue for *online* solvers: https://arxiv.org/abs/1709.06196, and you may be able to find some leads about offline solvers in the prior work section. The most similar problem to yours in that paper is the Light Dark problem (Section 5.2). The bottom line is this: Since your observation space is one-dimensional, the results for Light Dark in Table 1 suggest that discretization of the observation space (POMCP^D and DESPOT^D) will work well for the problem (I believe that this result will also be valid for offline solutions).
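
For example, a trivial way to discretize a scalar PSA observation into bins might look like this (the cut points are invented for illustration, not clinically meaningful):

const PSA_CUTS = [1.0, 2.5, 4.0, 10.0]   # hypothetical bin edges in ng/mL

# Map a raw PSA value to a bin index in 1:(length(PSA_CUTS) + 1)
psa_bin(psa::Real) = searchsortedfirst(PSA_CUTS, psa)

The observation space then becomes the finite set of bin indices, and the discrete observation distribution assigns each bin the probability mass that the continuous PSA model places on it.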

The picture for *offline* solvers for problems with small state and action spaces but continuous observation spaces is much less clear in my mind. I think that, since most offline solvers use belief-space sampling anyway, it may be easy to extend them to continuous observation spaces. There may be an opportunity for research there. See also https://www.ijcai.org/Proceedings/05/Papers/1376.pdf and https://www.aaai.org/ocs/index.php/ICAPS/ICAPS15/paper/view/10491/10422 (neither of those approaches is currently implemented in JuliaPOMDP, though).

If you want to talk more in depth about this, I would be happy to Skype (I am usually on west coast time in the US).

- Zach

P.S.

> Hopefully I can also write an R interface for julia-pomdp if one is not currently available (there seems to be one for the Julia language itself, though).

That would be cool! I'm assuming you would limit it to tabular discrete POMDPs, or do you think you could accommodate a more general class?


anirud...@gmail.com

Jan 4, 2019, 9:36:14 AM
to julia-pomdp-users
Thank you for the links and the explanation, Zachary.

I went through your paper and chapters 4 and 6 of Mykel's book. POMCP and its derivative/extension algorithms seem quite powerful. I do have a generative model available for my data, with a joint probability distribution of the observation and the next state. Before I start in a particular direction, say an online or offline solution, I need to define and translate my model into a POMDP correctly. Thank you for your offer to talk in more depth over Skype; I will take you up on it. Is it ok if I contact you in a couple of weeks? I need the time to get a better understanding of your online Monte Carlo approaches, run some pomdp-julia code, and discuss with my advisor.

Regarding this:

> A quick hack that would not require any modification would be to remember the time since the last biopsy in the state and define the reward for an invalid action to be -Inf, or a large negative value. I think this should work out of the box with most solvers.

You mean like an MOMDP, where the time since the last biopsy and the current time are the observable part of the state? So a state would then be a tuple of three items: current time, time since last biopsy, and unobservable cancer status. Am I thinking about this wrong?

Thank you once again,
Ani

Zachary Sunberg

Jan 5, 2019, 11:59:44 AM
to julia-pomdp-users
Hi Ani,

> Is it ok if I contact you in a couple of weeks?

Sounds good.

> You mean like an MOMDP, where the time since the last biopsy and the current time are the observable part of the state? So a state would then be a tuple of three items: current time, time since last biopsy, and unobservable cancer status. Am I thinking about this wrong?

Yeah, this is the formulation of the state that I was imagining. The "hack" that I was talking about is to make the reward for taking an invalid action -Inf. This is an alternative to making the actual action space depend on the history.
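
In case it helps, here is a sketch of how the observable parts of that three-field state could be advanced in the transition. It reuses the placeholder PatientState and BiopsyPOMDP types from the reward snippet above, and the onset probability is a made-up constant (Deterministic and SparseCat are from POMDPModelTools):

using POMDPModelTools   # Deterministic and SparseCat distributions

function POMDPs.transition(m::BiopsyPOMDP, s::PatientState, a::Symbol)
    gap = a == :biopsy ? 0 : s.since_biopsy + 1   # taking a biopsy resets the counter
    if s.cancer
        return Deterministic(PatientState(true, s.t + 1, gap))   # cancer persists
    else
        p = 0.01   # placeholder per-epoch probability of cancer onset
        return SparseCat([PatientState(true, s.t + 1, gap),
                          PatientState(false, s.t + 1, gap)],
                         [p, 1 - p])
    end
end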

- Zach