Final Competition / Test Competition Summary


Scott Sanner

Apr 5, 2011, 11:01:11 AM
An update on the final competition (in two weeks) and a summary from the test competition are below.  -Scott


Based on the votes, the final competition will begin on **Wed April 20th @ 9pm PDT** and last for 24 hours.  For your local time, see here:
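The start and end times can also be converted by hand. Here is a minimal Python sketch. It assumes PDT is a fixed UTC-7 offset, which holds in late April. It prints both endpoints of the 24-hour window in UTC; calling `.astimezone()` with no argument would use your machine's local zone instead.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT (Pacific Daylight Time) is a fixed UTC-7 offset in April.
PDT = timezone(timedelta(hours=-7), "PDT")

start = datetime(2011, 4, 20, 21, 0, tzinfo=PDT)  # Wed April 20th @ 9pm PDT
end = start + timedelta(hours=24)                 # the competition lasts 24 hours

# Convert both endpoints to UTC (use .astimezone() for your local zone).
start_utc = start.astimezone(timezone.utc)
end_utc = end.astimezone(timezone.utc)

print(start.strftime("%A"))   # Wednesday
print(start_utc.isoformat())  # 2011-04-21T04:00:00+00:00
print(end_utc.isoformat())    # 2011-04-22T04:00:00+00:00
```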

Rules for the final competition are on the IPPC page:

**Note:** Some competitors did not run their clients solely from an EC2 large instance node for the test competition. This was OK, but for the final competition, please note that you *must* run from a single EC2 large instance node.

If you have any competition questions, please do not hesitate to post them to this IPPC list.

We're still evaluating the final competition settings for the number of trials, the horizon length (all problems will use the same horizon in the final competition), and the number of instances per domain.  We'll send an email to the list when these final competition parameters have been set.


For the test competition, we had 11 competitors, as follows:

MDP Competitors:
Andrey Kolobov, Peng Dai, Mausam, Dan Weld
George Zhu, Marek Grzes, Jesse Hoey
Thomas Keller, Patrick Eyerich
Kemal Ure, Tuna Toksoz, Alborz Geramifard, Josh Redding
Aswin Raghavan, Saket Joshi, Prasad Tadepalli, Alan Fern

POMDP Competitors:
Kyle Morrison, Pascal Poupart
Wu Kegui, Wee Sun Lee
Kee-Eung Kim, Dongho Kim, Kanghoon Lee
Shaowei Png, Sylvie Ong, Beomjoon Kim, Joelle Pineau
Alan Olsen, Dan Bryce
Eddy Borera, Arisoa Randrianasolo

Below, for each instance, are listed (1) how many competitors attempted at least one trial (this *includes* the noop and random policies; also, some competitors tried out different clients under different names, and each of these counts as a separate competitor) and (2) the minimum average score over all planners (likely from the noop or random policy).  Because test competition conditions were not controlled, I will not release any results for the best average scores (sorry).
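For readers who want to reproduce these two columns from their own logs, here is a minimal Python sketch of how they could be computed. It assumes a hypothetical list of `(instance, planner, reward)` tuples with one entry per trial. The function name `summarize` and the toy rewards are illustrative only. They are not the competition's actual scoring code or data.

```python
from collections import defaultdict
from statistics import mean


def summarize(trials):
    """trials: iterable of (instance, planner, reward) tuples, one per trial.

    Returns {instance: (num_competitors, min_avg_score)}, where num_competitors
    counts planners with at least one trial on the instance and min_avg_score
    is the lowest per-planner average reward on that instance.
    """
    rewards = defaultdict(lambda: defaultdict(list))
    for instance, planner, reward in trials:
        rewards[instance][planner].append(reward)

    summary = {}
    for instance, by_planner in rewards.items():
        averages = [mean(rs) for rs in by_planner.values()]
        summary[instance] = (len(by_planner), min(averages))
    return summary


# Toy rewards (NOT competition results); noop/random count as competitors.
trials = [
    ("sysadmin_inst_mdp__1", "noop", 100.0),
    ("sysadmin_inst_mdp__1", "noop", 110.0),
    ("sysadmin_inst_mdp__1", "random", 120.0),
    ("sysadmin_inst_mdp__1", "planner_a", 300.0),  # hypothetical planner
]
result = summarize(trials)
print(result)  # {'sysadmin_inst_mdp__1': (3, 105.0)}
```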

instance_name    (1) num_competitors    (2) min_avg_score

elevator_inst_mdp__1 7.0 -259.85
elevator_inst_mdp__2 6.0 -295.22
elevator_inst_mdp__3 6.0 -364.1
elevator_inst_mdp__4 6.0 -393.77
elevator_inst_mdp__5 6.0 -578.4
elevator_inst_mdp__6 6.0 -583.83
elevator_inst_mdp__7 6.0 -524.82
elevator_inst_mdp__8 6.0 -724.17
elevator_inst_mdp__9 6.0 -781.33
elevator_inst_mdp__10 6.0 -726.95

game_of_life_inst_mdp__1 7.0 22.93
game_of_life_inst_mdp__2 6.0 73.4
game_of_life_inst_mdp__3 7.0 96
game_of_life_inst_mdp__4 6.0 138.3
game_of_life_inst_mdp__5 6.0 155.57
game_of_life_inst_mdp__6 6.0 183.63
game_of_life_inst_mdp__7 6.0 103.7
game_of_life_inst_mdp__8 6.0 258.97
game_of_life_inst_mdp__9 6.0 304.1
game_of_life_inst_mdp__10 6.0 383.77

sysadmin_inst_mdp__1 8.0 120.03
sysadmin_inst_mdp__2 7.0 105.4
sysadmin_inst_mdp__3 6.0 267.07
sysadmin_inst_mdp__4 6.0 228.1
sysadmin_inst_mdp__5 6.0 341.77
sysadmin_inst_mdp__6 6.0 306.33
sysadmin_inst_mdp__7 6.0 499
sysadmin_inst_mdp__8 6.0 399.57
sysadmin_inst_mdp__9 6.0 567.3
sysadmin_inst_mdp__10 6.0 508.83

traffic_inst_mdp__1 6.0 -1257.7
traffic_inst_mdp__2 6.0 -1386.63
traffic_inst_mdp__3 6.0 -1871.37
traffic_inst_mdp__4 6.0 -1966.63
traffic_inst_mdp__5 6.0 -2438.5
traffic_inst_mdp__6 6.0 -2483.2
traffic_inst_mdp__7 6.0 -2975.8
traffic_inst_mdp__8 6.0 -3089.8
traffic_inst_mdp__9 6.0 -3298.4
traffic_inst_mdp__10 6.0 -3438.83

elevator_inst_pomdp__1 9.0 -237.97
elevator_inst_pomdp__2 6.0 -340.6
elevator_inst_pomdp__3 6.0 -380.72
elevator_inst_pomdp__4 6.0 -431.27
elevator_inst_pomdp__5 4.0 -547.82
elevator_inst_pomdp__6 4.0 -577.85
elevator_inst_pomdp__7 5.0 -538.63
elevator_inst_pomdp__8 4.0 -690.18
elevator_inst_pomdp__9 4.0 -755.58
elevator_inst_pomdp__10 4.0 -651.97

game_of_life_inst_pomdp__1 10.0 72.4
game_of_life_inst_pomdp__2 7.0 61.63
game_of_life_inst_pomdp__3 7.0 96.43
game_of_life_inst_pomdp__4 4.0 105.9
game_of_life_inst_pomdp__5 4.0 158.43
game_of_life_inst_pomdp__6 4.0 184.37
game_of_life_inst_pomdp__7 4.0 177.17
game_of_life_inst_pomdp__8 4.0 259.27
game_of_life_inst_pomdp__9 4.0 298.7
game_of_life_inst_pomdp__10 4.0 384.67

sysadmin_inst_pomdp__1 11.0 131.23
sysadmin_inst_pomdp__2 8.0 107.67
sysadmin_inst_pomdp__3 5.0 256.63
sysadmin_inst_pomdp__4 4.0 205.3
sysadmin_inst_pomdp__5 4.0 364.17
sysadmin_inst_pomdp__6 4.0 338.47
sysadmin_inst_pomdp__7 4.0 502.17
sysadmin_inst_pomdp__8 4.0 445
sysadmin_inst_pomdp__9 4.0 627.2
sysadmin_inst_pomdp__10 4.0 532.27

traffic_inst_pomdp__1 3.0 -1269.4
traffic_inst_pomdp__2 3.0 -1270.97
traffic_inst_pomdp__3 2.0 -1760.87
traffic_inst_pomdp__4 2.0 -1772.7
traffic_inst_pomdp__5 2.0 -2301.37
traffic_inst_pomdp__6 2.0 -2320.03
traffic_inst_pomdp__7 2.0 -2803.5
traffic_inst_pomdp__8 2.0 -2804.73
traffic_inst_pomdp__9 2.0 -3090.07
traffic_inst_pomdp__10 3.0 -3589

Scott Sanner

Apr 6, 2011, 8:31:56 AM
Dear IPPC Competitors,

95% confidence intervals were requested for the minimum average scores released yesterday; they are listed below.
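The post does not say how these intervals were computed. A common choice is the normal approximation mean +/- 1.96 * s / sqrt(n), where s is the sample standard deviation of the per-trial rewards and n is the number of trials. Here is a minimal Python sketch under that assumption. The function name `ci95` and the toy rewards are hypothetical.

```python
import math
from statistics import mean, stdev


def ci95(rewards):
    """Normal-approximation 95% confidence interval for the mean reward.

    Returns (mean, half_width), so the interval is mean +/- half_width with
    half_width = 1.96 * s / sqrt(n), s being the sample standard deviation.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two trials to estimate variance")
    half_width = 1.96 * stdev(rewards) / math.sqrt(n)
    return mean(rewards), half_width


# Toy per-trial rewards (NOT competition data).
m, h = ci95([10.0, 12.0, 14.0])
print(f"{m:.2f} +/- {h:.2f}")  # 12.00 +/- 2.26
```

With only a handful of trials, replacing 1.96 by the Student-t quantile for n - 1 degrees of freedom gives a more conservative interval. For a few dozen trials the two are close.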

Also, one observation from the test competition results: it is interesting to note that *no* single planner dominates performance on more than one instance, so I think we're shaping up for an interesting final competition.


instance_name    (1) num_competitors    (2) min_avg_score +/- 95% confidence intervals

elevator_inst_mdp__1 7.0 -259.85 +/- 24.31
elevator_inst_mdp__2 6.0 -295.22 +/- 28.19
elevator_inst_mdp__3 6.0 -364.1 +/- 37.23
elevator_inst_mdp__4 6.0 -393.77 +/- 25.73
elevator_inst_mdp__5 6.0 -578.4 +/- 29.7
elevator_inst_mdp__6 6.0 -583.83 +/- 30.24
elevator_inst_mdp__7 6.0 -524.82 +/- 28.79
elevator_inst_mdp__8 6.0 -724.17 +/- 35.02
elevator_inst_mdp__9 6.0 -781.33 +/- 39.77
elevator_inst_mdp__10 6.0 -726.95 +/- 27.62

game_of_life_inst_mdp__1 7.0 22.93 +/- 10.71
game_of_life_inst_mdp__2 6.0 73.4 +/- 7.84
game_of_life_inst_mdp__3 7.0 96 +/- 4.19
game_of_life_inst_mdp__4 6.0 138.3 +/- 22.61
game_of_life_inst_mdp__5 6.0 155.57 +/- 11.25
game_of_life_inst_mdp__6 6.0 183.63 +/- 5.15
game_of_life_inst_mdp__7 6.0 103.7 +/- 20.77
game_of_life_inst_mdp__8 6.0 258.97 +/- 17.81
game_of_life_inst_mdp__9 6.0 304.1 +/- 7.12
game_of_life_inst_mdp__10 6.0 383.77 +/- 16.87

sysadmin_inst_mdp__1 8.0 120.03 +/- 8.37
sysadmin_inst_mdp__2 7.0 105.4 +/- 8.92
sysadmin_inst_mdp__3 6.0 267.07 +/- 17.39
sysadmin_inst_mdp__4 6.0 228.1 +/- 12.31
sysadmin_inst_mdp__5 6.0 341.77 +/- 12.79
sysadmin_inst_mdp__6 6.0 306.33 +/- 13.66
sysadmin_inst_mdp__7 6.0 499 +/- 16.63
sysadmin_inst_mdp__8 6.0 399.57 +/- 15.65
sysadmin_inst_mdp__9 6.0 567.3 +/- 21.88
sysadmin_inst_mdp__10 6.0 508.83 +/- 16.82

traffic_inst_mdp__1 6.0 -1257.7 +/- 28.29
traffic_inst_mdp__2 6.0 -1386.63 +/- 2.25
traffic_inst_mdp__3 6.0 -1871.37 +/- 21.27
traffic_inst_mdp__4 6.0 -1966.63 +/- 4.2
traffic_inst_mdp__5 6.0 -2438.5 +/- 34.77
traffic_inst_mdp__6 6.0 -2483.2 +/- 33.24
traffic_inst_mdp__7 6.0 -2975.8 +/- 17.25
traffic_inst_mdp__8 6.0 -3089.8 +/- 41.52
traffic_inst_mdp__9 6.0 -3298.4 +/- 51.39
traffic_inst_mdp__10 6.0 -3438.83 +/- 43.85

elevator_inst_pomdp__1 9.0 -237.97 +/- 26.99
elevator_inst_pomdp__2 6.0 -340.6 +/- 40.77
elevator_inst_pomdp__3 6.0 -380.72 +/- 33.3
elevator_inst_pomdp__4 6.0 -431.27 +/- 24.59
elevator_inst_pomdp__5 4.0 -547.82 +/- 27.31
elevator_inst_pomdp__6 4.0 -577.85 +/- 29.71
elevator_inst_pomdp__7 5.0 -538.63 +/- 19
elevator_inst_pomdp__8 4.0 -690.18 +/- 33.9
elevator_inst_pomdp__9 4.0 -755.58 +/- 41.11
elevator_inst_pomdp__10 4.0 -651.97 +/- 36.71

game_of_life_inst_pomdp__1 10.0 72.4 +/- 11.82
game_of_life_inst_pomdp__2 7.0 61.63 +/- 8.49
game_of_life_inst_pomdp__3 7.0 96.43 +/- 4.91
game_of_life_inst_pomdp__4 4.0 105.9 +/- 21.64
game_of_life_inst_pomdp__5 4.0 158.43 +/- 12.1
game_of_life_inst_pomdp__6 4.0 184.37 +/- 4.23
game_of_life_inst_pomdp__7 4.0 177.17 +/- 27.24
game_of_life_inst_pomdp__8 4.0 259.27 +/- 14.63
game_of_life_inst_pomdp__9 4.0 298.7 +/- 6.33
game_of_life_inst_pomdp__10 4.0 384.67 +/- 12.5

sysadmin_inst_pomdp__1 11.0 131.23 +/- 9.19
sysadmin_inst_pomdp__2 8.0 107.67 +/- 12.26
sysadmin_inst_pomdp__3 5.0 256.63 +/- 14.26
sysadmin_inst_pomdp__4 4.0 205.3 +/- 12.38
sysadmin_inst_pomdp__5 4.0 364.17 +/- 16.33
sysadmin_inst_pomdp__6 4.0 338.47 +/- 16.37
sysadmin_inst_pomdp__7 4.0 502.17 +/- 19.36
sysadmin_inst_pomdp__8 4.0 445 +/- 16.85
sysadmin_inst_pomdp__9 4.0 627.2 +/- 21.02
sysadmin_inst_pomdp__10 4.0 532.27 +/- 24.84

traffic_inst_pomdp__1 3.0 -1269.4 +/- 10.1
traffic_inst_pomdp__2 3.0 -1270.97 +/- 5.98
traffic_inst_pomdp__3 2.0 -1760.87 +/- 11.79
traffic_inst_pomdp__4 2.0 -1772.7 +/- 13.95
traffic_inst_pomdp__5 2.0 -2301.37 +/- 15.01
traffic_inst_pomdp__6 2.0 -2320.03 +/- 28.88
traffic_inst_pomdp__7 2.0 -2803.5 +/- 6.49
traffic_inst_pomdp__8 2.0 -2804.73 +/- 31.68
traffic_inst_pomdp__9 2.0 -3090.07 +/- 13.62
traffic_inst_pomdp__10 3.0 -3589 +/- 12.32

Andrey Kolobov

Apr 6, 2011, 1:41:41 PM
To: Scott Sanner
Hi Scott,

Thanks for the summary!

Can you please also tell us how the domains rank in the order of
perceived "hardness", judging by the results of test competition? For
instance, the domain on which most competitors got rewards closest to
min_avg would probably be the "hardest", and the one on which pretty
much everyone did significantly better was the "easiest".

There's no need to provide exact numbers (to avoid discouraging anyone),
but such a qualitative assessment would be very useful.



Scott Sanner

Apr 7, 2011, 6:30:13 AM
To: Andrey Kolobov
Hi Andrey,

From "easiest" to "hardest", the ranking of problem difficulty for both MDPs and POMDPs seems to be:

- sysadmin
- game of life
- elevators
- traffic

This is based on a quick qualitative analysis of best planner performance relative to the minimum score.
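The ranking above came from a qualitative look. One possible quantitative proxy for "best planner performance relative to the minimum score" is the relative improvement (best - min) / |min|, averaged over each domain's instances. Here is a minimal Python sketch of that idea. The function name `rank_domains` and all scores are hypothetical toy values, since best scores were not released. The normalization is an assumption as well: it is undefined when the minimum is 0 and exaggerated when the minimum is near 0.

```python
from collections import defaultdict
from statistics import mean


def rank_domains(scores):
    """scores: {instance_name: (min_avg_score, best_avg_score)}.

    Returns domain names sorted from "easiest" to "hardest", using the mean
    relative improvement (best - min) / |min| over each domain's instances.
    """
    improvements = defaultdict(list)
    for instance, (min_avg, best_avg) in scores.items():
        domain = instance.split("_inst_")[0]  # e.g. "game_of_life"
        improvements[domain].append((best_avg - min_avg) / abs(min_avg))
    # A larger average improvement over the minimum suggests an easier domain.
    return sorted(improvements, key=lambda d: mean(improvements[d]), reverse=True)


# Toy (min, best) pairs, NOT competition results.
scores = {
    "sysadmin_inst_mdp__1": (100.0, 300.0),   # improvement 2.0
    "elevator_inst_mdp__1": (-200.0, -150.0),  # improvement 0.25
    "traffic_inst_mdp__1": (-1000.0, -900.0),  # improvement 0.1
}
print(rank_domains(scores))  # ['sysadmin', 'elevator', 'traffic']
```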

It's not clear to me how much these results simply reflect the fact that the sysadmin and game of life domains have been available longer than the elevators and traffic domains.

