Marcus,
Thanks for sharing this - it looks like a lot of careful thinking went into your book!
I think your framework is quite different from mine. My framework does not compete with the vast literature from RL (or stochastic programming, optimal control, stochastic search, simulation-optimization, decision trees, bandit problems, etc.) - it brings everything in these fields together. I show a single way of modeling any sequential decision problem, and claim (this is not a proof) that *any* method that might be used to design a policy for making decisions falls into one of four classes of policies (or possibly a hybrid of two or more). These four classes cover whatever is currently being used in practice.
You note that "The major drawback of the AIXI framework is that it is incomputable". My framework is inherently computable, since its starting point is whatever someone (or a company) is doing now in the field. However, it also formalizes many ad hoc approaches, such as a class of policies I call "cost function approximations" (CFAs), which are parameterized optimization problems. UCB policies are a very simple class of CFA; CFAs are also used to schedule airlines and to plan energy generation for power grids. I can often take an ad hoc approach that is being used in the field, formalize it, and describe an implementable path to making it better (typically by tuning parameters that were previously set at some fixed value).
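To make the CFA idea concrete, here is a tiny sketch (not code from my book - the function names, the form of the bonus term, and the tuning grid are just illustrative assumptions) of a UCB-style policy treated as a parameterized CFA, where the exploration coefficient theta is tuned by simulating cumulative reward rather than being fixed at a textbook constant:

    import numpy as np

    def ucb_policy(counts, means, t, theta):
        # Parameterized UCB: pick the arm maximizing mean + theta * bonus.
        # theta is the tunable CFA parameter (classical UCB fixes it at a constant).
        bonus = np.sqrt(np.log(t + 1.0) / np.maximum(counts, 1.0))
        untried = np.where(counts == 0)[0]
        return int(untried[0]) if len(untried) else int(np.argmax(means + theta * bonus))

    def simulate(theta, true_means, horizon, rng):
        # Cumulative reward earned by the policy over one simulated run.
        k = len(true_means)
        counts, means, total = np.zeros(k), np.zeros(k), 0.0
        for t in range(horizon):
            x = ucb_policy(counts, means, t, theta)
            r = rng.normal(true_means[x], 1.0)       # noisy reward W_{t+1}
            counts[x] += 1
            means[x] += (r - means[x]) / counts[x]   # running sample mean
            total += r
        return total

    # Tune theta offline by averaging simulated cumulative reward - this is the
    # step that turns a fixed "ad hoc" constant into a tuned parameter.
    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.5, 0.7])
    best = max((np.mean([simulate(th, true_means, 200, rng) for _ in range(50)]), th)
               for th in [0.5, 1.0, 2.0, 4.0])
    print("best theta:", best[1])

The same pattern scales up: the policy becomes a full optimization model (like the airline scheduling or power grid problems above), with tunable penalty or buffer parameters playing the role of theta.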
I offer a number of other insights, such as differentiating pure learning problems from general dynamic problems (which may or may not include a belief state); different objectives, such as final and cumulative reward; and the use of different uncertainty operators (such as expectations versus risk measures). I think my treatment of lookahead policies offers some fresh ideas, and I show a new way of thinking about learning problems using a multiagent framework, which fixes what I believe is a major error being made by the POMDP community.
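In case it is helpful, here is roughly what I mean by the two objectives, written in LaTeX shorthand (this is a compressed version of the notation in the book: C is the one-period contribution, X^\pi(S_t) is the decision returned by policy \pi, x^{\pi,N} is the final design produced after N learning iterations, and \widehat{W} is the random variable used to evaluate that design):

    \max_\pi \; \mathbb{E} \left\{ \sum_{t=0}^{T} C\bigl(S_t, X^\pi(S_t), W_{t+1}\bigr) \,\Big|\, S_0 \right\}    (cumulative reward)

    \max_\pi \; \mathbb{E} \left\{ F\bigl(x^{\pi,N}, \widehat{W}\bigr) \,\Big|\, S_0 \right\}                     (final reward)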
I think that anyone working on some form of sequential decision problem will find a place in my framework to represent their methods (and that should apply to your framework as well).
Note that in my book, Bellman's equation is the basis of just one of the four classes of policies. And while I treat policies based on value functions in considerable depth, I actually think it is likely to be the method that is least used in practice. I do not present Bellman in any depth until chapter 14 (but then I spend 5 chapters on it, so it is not as if I have overlooked it).
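For reference, the object that this class of policies is built around is Bellman's equation, which in its simplest form (again in LaTeX shorthand) reads

    V_t(S_t) = \max_{x_t} \Bigl( C(S_t, x_t) + \mathbb{E}\bigl\{ V_{t+1}(S_{t+1}) \,\big|\, S_t, x_t \bigr\} \Bigr),

and the practical difficulty is, of course, computing or approximating V_{t+1}.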
Final note - I have taught this material to undergraduates, as well as graduate students from very applied domains. I think sequential decision problems should be taught to everyone!
Warren
------------------------------
Warren B. Powell
Chief Analytics Officer, Optimal Dynamics
Professor Emeritus, Princeton University