a question about IPPC 2014

Ping Hou

Oct 26, 2013, 2:36:45 PM
to ippc-2014...@googlegroups.com
Dear Dr. Marek Grzes and Dr. Jesse Hoey,

I have a question about IPPC 2014. I noticed that all IPPC 2011 domains are finite-horizon MDPs (FH). Because the time-augmented state space (s, t) of an FH MDP cannot contain cycles, the augmented state space is actually a directed acyclic graph rather than a general graph, and this is apparently unfair to iterative-style algorithms such as VI, TVI, ILAO*, and LRTDP. I am wondering whether IPPC 2014 will use infinite-horizon discounted-reward MDPs (IFH) or stochastic shortest-path MDPs (SSP)? Frankly, I think SSP domains are the most suitable, because FH and IFH are just subclasses of SSP.
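For example (a toy sketch, not an actual IPPC domain): even when the underlying state graph is cyclic, every augmented transition moves from layer t to layer t+1, so no cycle is possible.

    # Hypothetical underlying MDP state graph with a cycle: s0 -> s1 -> s0.
    successors = {"s0": ["s1"], "s1": ["s0"]}
    H = 3  # horizon

    # Augmented transitions (s, t) -> (s', t + 1): t strictly increases
    # along every edge, so the augmented state space is acyclic (a DAG).
    edges = [((s, t), (s2, t + 1))
             for t in range(H)
             for s in successors
             for s2 in successors[s]]
    assert all(t2 == t1 + 1 for ((_, t1), (_, t2)) in edges)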
Thank you very much!
 
Ping Hou

Scott Sanner

Oct 26, 2013, 11:30:45 PM
to ippc-2014...@googlegroups.com
Hi Ping,

Thanks for your post... you have good questions.

>  unfair to iterative-style algorithms

I agree that any finite-horizon dynamic programming solution has to use more storage than its infinite-horizon counterpart to keep track of the horizon, but this did not keep such competitors from performing well in IPPC 2011.

Note that VI inherently derives a finite-horizon, t-stage-to-go value and policy at iteration t.  The only complication over infinite-horizon VI is that you need to "keep" the value and policy for all iterations.
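In sketch form (a minimal finite-horizon VI, assuming a hypothetical tabular interface; this is not competition code), it is exactly infinite-horizon VI except that V[t] and pi[t] are retained for every t rather than overwritten:

    # Finite-horizon value iteration: one backward sweep, keeping the value
    # and policy for every stage-to-go t (the extra O(H) storage above).
    # Assumed interface: S = states, A = actions,
    # P(s, a) -> list of (next_state, prob), R(s, a) -> float.
    def finite_horizon_vi(S, A, P, R, H):
        V = [{s: 0.0 for s in S}]   # V[0]: zero stages to go
        pi = []
        for t in range(1, H + 1):
            Vt, pit = {}, {}
            for s in S:
                q = {a: R(s, a) + sum(p * V[t - 1][s2] for s2, p in P(s, a))
                     for a in A}
                pit[s] = max(q, key=q.get)
                Vt[s] = q[pit[s]]
            V.append(Vt)
            pi.append(pit)
        return V, pi  # pi[t-1] is the optimal action with t stages to go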

Kolobov and Mausam, who placed 2nd in IPPC 2011 with Glutton (quite close to the 1st-place competitor), have a nice discussion of finite-horizon LRTDP along with a "reverse" iterative-deepening approach for improving it (see their ICAPS 2012 paper on Glutton).
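The "reverse" idea, very roughly (a simplified sketch reusing the finite_horizon_vi sketch above; Glutton actually runs LRTDP, not full sweeps):

    # Reverse iterative deepening (simplified): solve horizon 1, then 2,
    # ... up to H. Interrupted at any point, the planner still holds a
    # complete policy for the deepest horizon finished so far. Glutton's
    # LR2TDP additionally reuses horizon-(h-1) values to seed horizon h;
    # this sketch recomputes from scratch for clarity.
    def reverse_iterative_deepening(S, A, P, R, H, out_of_time=lambda: False):
        best = None
        for h in range(1, H + 1):
            if out_of_time():
                break
            best = finite_horizon_vi(S, A, P, R, h)
        return best  # (values, policy) for the largest horizon completed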
  

Kolobov and Mausam also have some nice follow-on work to Glutton that I highly recommend reading.

===

The reason we do not use SSPs alone for IPPC 2014 is that they are goal-oriented, and we want a more general notion of reward (consider traffic, elevators, and many other domains from IPPC 2011).  I am aware of FH and IFH translations to SSPs, but a second problem with SSPs is their infinite (or indefinite) horizon nature.

The problem with infinite-horizon objectives is that it is not clear how to evaluate them through simulation in the competition setting.

Finite-time evaluation requires some cutoff in time or decision steps, which inherently translates to a finite horizon.  So a planner that is aware of the horizon cutoff can always do better than one that is not.
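Concretely, competition evaluation boils down to something like the following (a hypothetical sketch; policy and simulate stand in for a competitor and the competition server):

    # Monte Carlo policy evaluation with a hard cutoff of H decision steps.
    # Any simulation-based evaluation needs such a cutoff, so the effective
    # objective is finite-horizon; a policy that sees the remaining steps
    # (H - t) can exploit that.
    def evaluate(policy, simulate, s0, H, n_rollouts=30):
        total = 0.0
        for _ in range(n_rollouts):
            s, ret = s0, 0.0
            for t in range(H):
                a = policy(s, H - t)   # horizon-aware policy
                s, r = simulate(s, a)  # sampled next state and reward
                ret += r
            total += ret
        return total / n_rollouts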

So this is why finite horizon was chosen for IPPC 2011... it aligned exactly with the only way we can evaluate in practice.  If evaluation has to be finite-horizon, then the objective should be as well.

Cheers,
Scott

P.S. Competition planning is getting underway at this time... if there are other suggestions for what people would like to see in IPPC 2014 (evaluation, domains, etc.), please post!



Ping Hou

Oct 30, 2013, 6:50:57 PM
to ippc-2014...@googlegroups.com
Thank you for your explanation!

