All,
One major purpose of the IPPC 2014 is to track progress between competitions. It's taken me a little while but I've finally gotten around to this analysis.
I've been looking at the raw (unnormalized average cumulative reward) results posted on the IPPC 2014 website to try to understand:
(1) How the best results improved from IPPC 2011 to 2014 on the four problems (elevators, traffic, crossing_traffic, skill_teaching) common to both competitions.
(2) How the top two competitors in IPPC 2014 compared to earlier versions of their planners in IPPC 2011 on the four problems common to both competitions.
(3) How the IPPC 2014 MDP planners did on the triangle_tireworld domain reused from IPPC 2008, compared to the planners in IPPC 2008.
Indicative results and some explanations are below, but please do not read *too* much into them for the following reasons:
- The change in time limits (24 hours to complete all instances in 2011 vs. 18 minutes per instance in 2014) makes it hard to compare the 2011 and 2014 results directly.
- Client machines were more powerful in 2014 (EC2 m3.large) than in 2011 (EC2 m1.large), and presumably both were more powerful than the machines used in 2008 (2.4 GHz Intel Core2 Quad Q6600, 4 GB RAM).
- There were considerable differences between the IPPC 2008 and IPPC 2014 settings for triangle_tireworld (detailed discussion at the end of this email).
Some insights regarding the above questions:
(1) Overall results on most domains improved substantially from 2011 to 2014... there has been excellent progress in three years!
(2) Planners generally improved on their own previous versions from 2011 to 2014.
(3) Differences between IPPC 2008 and IPPC 2014 make it difficult to draw firm conclusions from triangle_tireworld. It can be said that some of the IPPC 2008 planners with high success (coverage) on this domain, such as HMDPP (Keyder & Geffner), were solving the larger problems (ones that the IPPC 2014 planners did not solve), but with longer horizons than needed (they were averaging >40 steps even when the problems could have been solved in <40 steps). However, the IPPC 2014 version lacked preconditions, so the problem specifications were massive and most actions were useless, making the IPPC 2014 problems much harder. It is not clear to me whether the IPPC 2014 planners were closer to optimal than the IPPC 2008 planners on the problems they could both solve; that is my hypothesis, but I would have to look deeper at the raw numerical results for IPPC 2008 to confirm it. Obviously this is not an apples-to-apples comparison, so take the above analysis only as indicative. To draw better comparative conclusions, illegal actions should be removed from the IPPC 2014 problems (by adding state-action-constraint preconditions and then re-translating) and the horizon limit increased to account for differences between the two competitions.
And an interesting observation:
- While the POMDP version of a domain should generally be harder than the MDP version, this apparently is *not* the case for traffic_inst_10. Perhaps an easy case was randomly generated for POMDP instance 10. Another interesting possibility, however, is that the POMDP versions only got to observe stopline presence of cars, which is probably the most important observation for traffic signal control; the POMDP planners may therefore have done better than the MDP planners simply by having a relatively small but useful observation space (compared to the large state space). UCT extensions to POMDPs such as POMCP should be able to exploit this since, as I recall, there is no explicit representation of state in the UCT tree: branches just alternate between actions and observations. I should look into this at some point to confirm it. If true, it would suggest an interesting extension for the MDP UCT algorithms: first create a useful subset of observations for an MDP (somehow) and then apply POMCP to the resulting POMDP instead!
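To make the alternating action/observation structure concrete, below is a minimal Python sketch of a POMCP-style search tree. This is not code from any IPPC planner: the generative model step(), the constant UCB_C, and all other names are assumptions for illustration, and the particle filtering and rollout policy of full POMCP are omitted.

```python
# Minimal POMCP-style sketch (illustrative only, not any IPPC planner's code).
# Assumes a black-box generative model step(state, action) -> (next_state, observation, reward).
import math
from collections import defaultdict

UCB_C = 1.0   # exploration constant (assumed value)

class Node:
    def __init__(self):
        self.N = 0                    # total visits to this history node
        self.Na = defaultdict(int)    # per-action visit counts
        self.Q = defaultdict(float)   # per-action value estimates
        self.children = {}            # (action, observation) -> Node

def simulate(node, state, depth, actions, step):
    """One Monte-Carlo trial from a sampled state down a history-indexed tree."""
    if depth == 0:
        return 0.0
    def ucb(a):
        if node.Na[a] == 0:
            return float('inf')       # try untried actions first
        return node.Q[a] + UCB_C * math.sqrt(math.log(node.N) / node.Na[a])
    a = max(actions, key=ucb)
    next_state, obs, reward = step(state, a)
    child = node.children.setdefault((a, obs), Node())   # branch on (action, observation)
    ret = reward + simulate(child, next_state, depth - 1, actions, step)
    node.N += 1
    node.Na[a] += 1
    node.Q[a] += (ret - node.Q[a]) / node.Na[a]           # incremental mean update
    return ret
```

The point to notice is that the tree never stores the state; only the action/observation history indexes the nodes, which is why a small but informative observation space can keep the tree compact even when the underlying state space is large.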
Notes:
- Planners are maximizing reward so higher is better.
- The value -40 appears for stochastic shortest path problems (crossing_traffic and triangle_tireworld in IPPC 2014, both with a cost of -1 per action) when a planner could not achieve the goal within the horizon of 40 steps.
- For reference, all domains can be viewed online here:
- IPPC 2008 results are here (second link contains graphs):
Cheers,
Scott
MDP -- 2011 vs. 2014 for repeat competitors on hardest instance (__10) of four common domains
domain                        | Glutton (2011) | G-Pack* (2014) | PROST** (2011) | PROST (2014)
crossing_traffic_inst_mdp__10 |         -32.07 |         -32.07 |         -36.17 |       -17.1
elevators_inst_mdp__10        |         -69.98 |         -61.51 |         -82.16 |      -58.85
skill_teaching_inst_mdp__10   |        -301.21 |        -132.29 |        -220.56 |     -170.16
traffic_inst_mdp__10          |        -171.6  |        -133.93 |        -187.91 |     -129.97
*G-Pack built on Glutton from 2011.
**For a fair comparison under 2011 settings, these are the results from PROST in the actual IPPC 2011 (not the PROST-2011 entry in IPPC 2014, which contained modifications for time-out management due to the timing rule changes in IPPC 2014).
POMDP -- 2011 vs. 2014 for repeat competitors on hardest instance (__10) of four common domains
domain                          | KAIST-AI (2011) | KAIST-AI (2014) | NUS-POMDPX (2011) | NUS-POMDPX (2014)
crossing_traffic_inst_pomdp__10 |             -40 |             -40 |            -34.67 |            -33.5
elevators_inst_pomdp__10        |         -151.71 |         -127.65 |           -128.16 |          -108.38
skill_teaching_inst_pomdp__10   |         -528.14 |         -447.2  |           -304.8  |          -218.45
traffic_inst_pomdp__10          |          -52.7  |          -86.57 |            -91.8  |           -40.23
MDP -- triangle tireworld vs. IPPC 2008 results for top two MDP competitors
instance                        | G-Pack (2014) | PROST (2014) | notes
triangle_tireworld_inst_mdp__1  |         93.5  |        93.7  |
triangle_tireworld_inst_mdp__2  |         94.27 |        94.23 |
triangle_tireworld_inst_mdp__3  |         86.63 |        76.97 |
triangle_tireworld_inst_mdp__4  |         88.1  |        76.73 |
triangle_tireworld_inst_mdp__5  |         76.13 |        70.93 |
triangle_tireworld_inst_mdp__6  |         79.23 |        72.4  |
triangle_tireworld_inst_mdp__7  |        -36.3  |        -1.57 |
triangle_tireworld_inst_mdp__8  |        -36.43 |        30.13 | optimal is ~72.8 (27.2 steps to goal)
triangle_tireworld_inst_mdp__9  |        -40    |       -40    | optimal is ~72 (28 steps to goal)
triangle_tireworld_inst_mdp__10 |        -40    |       -40    | optimal is ~68 (32 steps to goal)
Some details on triangle_tireworld:
Motivation for triangle_tireworld as a "probabilistically interesting" domain comes from Little and Thiebaux (2007):
I directly translated triangle tireworld instances p01-p05 from IPPC 2008 into these 10 instances: p01 from 2008 was mapped to __1 and __2 in 2014 with different move success probabilities, and similarly p02 -> __3,__4 up to p05 -> __9,__10. The reason for not using p06-p10 from IPPC 2008 was that problems had to be solvable within the 40-step horizon of IPPC 2014, and the driving principle of triangle tireworld demands that the failure probability be reasonably high so that planners have to reason about the probabilities of different paths.
The IPPC 2008 had a very large horizon limit (I think 500 steps?) and action cost was not explicit; shorter plans were simply encouraged since all IPPC 2008 domains were goal-oriented.
I simply forgot to include the state-action constraints that would have prevented illegal moves in the IPPC 2014 RDDL domain... so the IPPC 2014 domain had *tons* of useless noops (illegal moves treated as noops), and this is reflected in the unusually large translation sizes seen for these domains. My apologies for this omission, which makes the IPPC 2014 domains harder, but take it as encouragement for planners to prune useless/identical actions. :)
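For what it's worth, here is a rough Python sketch of what such pruning could look like on a grounded problem. This is purely illustrative: the effects(action, state) hook returning a hashable description of the successor distribution is hypothetical, and comparing effects on a sample of states is only a heuristic unless the sample covers all reachable states (a sound version would compare effects symbolically).

```python
# Illustrative only: prune ground actions whose effects match the noop (or each other)
# on a set of states. effects(action, state) is a hypothetical hook returning a
# hashable description of the successor distribution for that state-action pair.
def prune_redundant_actions(actions, noop, states, effects):
    noop_sig = tuple(effects(noop, s) for s in states)
    keep = {}
    for a in actions:
        sig = tuple(effects(a, s) for s in states)
        if sig == noop_sig:
            continue                      # behaves exactly like noop: drop it
        keep.setdefault(sig, a)           # keep one representative per effect signature
    return list(keep.values())
```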
I've tried to give an estimate of the optimal value achievable for each problem based on the expected number of steps to the goal. This value (100 minus the expected number of steps to the goal) is actually an upper bound, since random variance in the runs will lead some of them to necessarily exceed the horizon limit of 40 and hence score -40 on those runs (rather than 0), which pulls down the average. The upper bounds should be reasonably tight, though, since the fraction of runs for which random chance requires more than 40 steps to reach a solution should be small.
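As a small sanity check on that reasoning, the following Python snippet uses a made-up step-count distribution (not taken from any IPPC instance) to show how runs that exceed the 40-step horizon pull the average below the 100-minus-expected-steps estimate.

```python
# A run that reaches the goal in k <= 40 steps scores 100 - k (goal reward 100, cost -1
# per step); a run that times out scores -40. The distribution below is hypothetical.
def average_score(step_probs, horizon=40, goal_reward=100):
    total = 0.0
    for steps, prob in step_probs.items():
        if steps <= horizon:
            total += prob * (goal_reward - steps)
        else:
            total += prob * (-horizon)          # timed out: -40, not 0
    return total

dist = {27: 0.95, 45: 0.05}                          # made-up step-count distribution
expected_steps = sum(k * p for k, p in dist.items())  # 27.9
print(100 - expected_steps)                           # upper-bound estimate: 72.1
print(average_score(dist))                            # actual average: 67.35, below the bound
```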