Example where a POMDP is not the natural dynamical model?


Pedro A Ortega

Nov 14, 2014, 12:16:11 AM
to magic...@googlegroups.com

Dear all,

Today in a conversation with a colleague we were discussing the usefulness of
partially observable Markov decision processes (POMDPs) to model environments
in reinforcement learning problems.

One can argue that the most general way of modelling environments is in terms
of a controllable stochastic process (implemented, say, by a probabilistic
Turing machine with an appropriate input/output protocol), and that the POMDP
model is a standard that was adopted by the control and reinforcement learning
communities largely for historical reasons (e.g. Bellman's original approach based
on MDPs).

But another argument, which is perhaps the more widely accepted one, is that
every environment can ultimately be reduced to a POMDP (possibly with
an infinite number of states).

While such a reduction might be possible in principle, I feel that somehow this
view is missing the main point, but I fail to pinpoint exactly what it is.
For instance, a POMDP does not seem to be a natural model for representing
the prediction of the next prime number.

So my question is: do you know about any interesting example in the literature
where a POMDP is *not* the natural model to represent a complex environment?
(That is, I would like to exclude simple environments like bandits and their variants.)

Thanks!

Pedro



Bill Hibbard

Nov 14, 2014, 6:19:33 AM
to magic...@googlegroups.com
Hi Pedro,

I discuss this issue in Section 4 (i.e., the introduction
to Chapter 4) of my book:
http://arxiv.org/abs/1411.1373

POMDPs generally do not support the idea of creating
functions or subroutines to express logic common to
multiple environmental states, and hence cannot adequately
express algorithmic complexity. For example, the shortest
program generalizing from a list of prime numbers will
include a function for multiplying integers and other
functions useful for defining primes, and use these to
predict the next prime.
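[A toy sketch of this point, added for illustration (my own code, not from the book): the "next prime" predictor is a few lines of program that reuse a primality subroutine across all inputs, whereas a tabular state-transition representation would need one explicit entry per prime, with no shared structure.]

```python
# Toy sketch (illustration only): "predict the next prime" as a short
# program that reuses a shared subroutine, rather than as an
# enumerated table of state transitions.

def is_prime(n):
    """Trial division; logic shared across all 'states'."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:   # uses multiplication, as noted above
        if n % d == 0:
            return False
        d += 1
    return True

def next_prime(n):
    """Predict the observation that follows the prime n."""
    m = n + 1
    while not is_prime(m):
        m += 1
    return m

# A tabular (PO)MDP representation would instead need a separate
# transition per prime: 2 -> 3, 3 -> 5, 5 -> 7, ...
```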

Best wishes,
Bill

Laurent

Nov 14, 2014, 6:35:33 AM
to magic...@googlegroups.com
Finite POMDPs are basically Hidden Markov Models with an action-observation loop [1]. As such, their expressive power is very limited.
They sit at the lowest level of Chomsky's hierarchy [2], only with uncertainty and actions added. This means that they can't model context-free languages and above. For example, a (finite) POMDP solver cannot learn to take action UP as many times as it has seen GREEN in the past, or to repeat an unbounded pattern in reverse (this requires a stack).

Sure, if you require a memory bound on these languages, a POMDP solver can learn all the different cases up to that bound, but in general it will not be able to generalize well based on a subset of these, i.e., it is not data efficient (at all).
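[A small sketch of the GREEN/UP example above, added for illustration (the environment protocol and names are mine): a policy with one unbounded counter solves the task for every n, while any fixed finite state set can only memorize cases up to its bound.]

```python
# Hypothetical environment: the agent observes some number of GREENs,
# then on each GO query must answer UP exactly that many times.
# One unbounded counter suffices for a program; no finite-state
# policy handles all n.

def counting_policy():
    """Returns a policy closure with a single unbounded counter."""
    count = 0
    def act(obs):
        nonlocal count
        if obs == "GREEN":
            count += 1
            return "WAIT"
        # obs == "GO": emit UP while the counter is positive
        if count > 0:
            count -= 1
            return "UP"
        return "DONE"
    return act

def episode(n):
    """Run the toy protocol: n GREENs, then GO queries until DONE.
    Returns the agent's action sequence after the GREEN phase."""
    act = counting_policy()
    for _ in range(n):
        act("GREEN")
    actions = []
    while True:
        a = act("GO")
        if a == "DONE":
            break
        actions.append(a)
    return actions
```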

Laurent


--
Before posting, please read this: https://groups.google.com/forum/#!topic/magic-list/_nC7PGmCAE4
--- You received this message because you are subscribed to the Google Groups "MAGIC" group.
To unsubscribe from this group and stop receiving emails from it, send an email to magic-list+unsubscribe@googlegroups.com.
To post to this group, send email to magic...@googlegroups.com.
Visit this group at http://groups.google.com/group/magic-list.

Joel Veness

Nov 14, 2014, 6:37:23 AM
to magic...@googlegroups.com
Hi Pedro,

I think playing Atari games using ALE (http://arxiv.org/pdf/1207.4708.pdf) is a good example. I think the PSR literature has some examples too.

My opinions:

I think the POMDP model really only works well for formulating control problems in *known environments* with *limited feedback/sensors*, provided you *know the initial state distribution*.

If these three conditions aren't met, I think the POMDP model is unhelpful or unnecessary. In particular, I am not at all a fan of the quite commonly used Bayes Adaptive MDP approach, which essentially views the task of learning an MDP as a kind of POMDP, where your initial state distribution plays the role of a prior over your parameters. For learning+planning, I much prefer the controllable stochastic process view (as defined by Marcus in his AIXI book) -- in my opinion, the minimal assumptions let you be more flexible in terms of what modelling approaches you can consider.
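[A minimal sketch of the controllable-stochastic-process interface being referred to, added for illustration (my reading of the setting; the class and method names are mine): an environment is just a map from the full action-percept history to a distribution over the next percept, with no state space or initial-state distribution assumed.]

```python
import random

class ControllableProcess:
    """Sketch of an environment as a controllable stochastic process:
    a conditional distribution over the next percept given the whole
    interaction history. Nothing Markovian is assumed."""

    def __init__(self, percept_dist):
        # percept_dist: history tuple -> {percept: probability}
        self.percept_dist = percept_dist
        self.history = []  # alternating actions and percepts

    def step(self, action):
        """Issue an action, sample and return the next percept."""
        self.history.append(action)
        dist = self.percept_dist(tuple(self.history))
        percepts, probs = zip(*sorted(dist.items()))
        percept = random.choices(percepts, weights=probs)[0]
        self.history.append(percept)
        return percept

# Usage: a deterministic process that simply echoes the last action.
echo = ControllableProcess(lambda h: {h[-1]: 1.0})
```

A POMDP is then the special case where the history dependence factors through a hidden state with a known transition model and a known initial distribution.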

Cheers,
Joel








Bill Hibbard

Nov 14, 2014, 9:06:31 AM
to magic...@googlegroups.com
Just to make clear: a memory bound and
data efficiency are two different issues.

A memory bounded program in a procedural
language can efficiently represent data up
to its memory bound (e.g., 10^124 bits).
But it cannot represent data that exceeds
its memory bound and will not be able to
generalize beyond its bound.

Translating a memory bounded program (in a
procedural language) into an MDP
representation as a set of states and
probabilities of state transitions will be
data inefficient.
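[A back-of-the-envelope sketch of this inefficiency, added for illustration (the numbers and names are mine): a procedural program with b bits of memory flattens into an MDP over 2**b states, so even a two-line counter program becomes an exponentially large transition table.]

```python
# Sketch: flattening a memory-bounded procedural program into a
# tabular MDP representation blows up exponentially in memory.

def mdp_states(memory_bits):
    """Number of MDP states needed for b bits of program memory."""
    return 2 ** memory_bits

# An 8-bit counter program is one line of procedural code...
def counter_step(state):
    return (state + 1) % 256

# ...but its flattened (deterministic) MDP is a table with one
# row per state, and it says nothing about states beyond 255.
table = {s: counter_step(s) for s in range(mdp_states(8))}
```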

Cheers,
Bill

Pedro A Ortega

Nov 16, 2014, 5:56:58 PM
to magic...@googlegroups.com
Dear all,

Thank you very much for your insightful answers and the excellent references.

@Laurent: I was also thinking about an argument along the lines of the
Chomsky hierarchy, but the fact that the number of states of a POMDP is
allowed to be infinite is somewhat confusing, because I didn't know where
to place this computational model within the hierarchy.

@Bill: Thank you for the link to your book, it is very interesting.

@Joel: I agree with your observations about the requirements for the POMDP
model to make sense. BTW, I just installed ALE and it works great!

Best wishes,

Pedro