Hi,
While I believe that the website is simply outdated, since Scott and I
have already discussed some of these suggestions, I'd still like to
back up Andrey and share my opinion.
On 01/16/2014 10:50 PM, Andrey Kolobov wrote:
> Hi all,
>
>
> Having read the rules on the competition webpage, I'd like to propose
> the following rule changes. Nothing too fundamental, but it would be
> interesting to see what people think about them.
>
>
> 1) The competition webpage says that "there will be a limit of 30s per
> trial". I'd suggest removing this restriction, because it penalizes
> planners that do at least part of their planning offline. As I recall
> from IPPC-2011, once a planner initiates a session for solving a problem
> on a server and receives the problem description, the time counter for
> the first trial immediately starts ticking. This means that a planner
> that wants to spend, for instance, 10 min planning offline and only then
> do actual trials would be essentially forced to fail the first
> 10*60/30 = 20 trials.
>
> Instead, how about simply giving each planner at most N (say, 18) min
> per problem? Basically, this makes different planners' performance on
> any given problem easily comparable, and eliminates any overall
> performance differences due to time allocation schemes. Also, unlike
> the current 30s-per-trial restriction, it doesn't penalize or benefit
> either online or offline planners. Moreover, since 18 min * 80 = 24
> hours, if the competition has 80 problems (e.g., 8 domains of 10
> problems each) there would be no need for an explicit overall 24-hour
> time cap.
I totally agree on this one; we should try to avoid giving one type of
planner an unfair advantage over another (even though our planner would
benefit from the current ruling ;) ). Nevertheless, I wouldn't bother
with the 24-hour restriction, but would instead use a fixed time per
decision to calculate the available total time for each problem (with,
e.g., 1 second per decision, this would have led to 20 minutes per
problem in the last IPPC with a finite horizon of 40 and 30 runs, and
thereby slightly more than a day in total). This makes it possible to
use different total times for problems with different values of the
finite horizon. A good value for the time per decision depends on the
number of runs used. Since confidence intervals for single problems
with 30 runs are just horrible, it'd also be nice to increase the
number of runs. I'd suggest using 60 runs and a time per decision of
0.75 seconds. That way, a problem with a finite horizon of 40 has to be
finished in 30 minutes, which nicely mimics the non-probabilistic
competition.
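To spell out the arithmetic behind these numbers, here is a quick
Python sketch (just an illustration; the helper name is mine, not part
of any competition infrastructure):

    # Total wall-clock budget for one problem, in minutes.
    def total_minutes(seconds_per_decision, horizon, runs):
        return seconds_per_decision * horizon * runs / 60.0

    # Last IPPC example: 1 s/decision, horizon 40, 30 runs.
    print(total_minutes(1.0, 40, 30))              # 20.0 min per problem
    print(total_minutes(1.0, 40, 30) * 80 / 60.0)  # ~26.7 h for 80 problems
    # Suggested setting: 0.75 s/decision, horizon 40, 60 runs.
    print(total_minutes(0.75, 40, 60))             # 30.0 min per problem
    # Going from 30 to 60 runs also shrinks confidence intervals by a
    # factor of about sqrt(2).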
>
>
>
> 2) I'd suggest disallowing parameter tuning after the competition
> begins, and/or designing the competition problem instances to be
> significantly different from the ones that have been released.
>
> Unlike in machine learning, algorithmic parameters in planning don't
> generalize well even within a single domain, but picking good values for
> them can be absolutely essential to good performance (lookahead
> illustrates both of these aspects). Therefore, if problems in the
> competition are the same or very similar to those in the warmup round
> (this was the case in IPPC-2011 for 6 out of 8 domains), allowing
> parameter tuning means that many planners' performance will be
> misleadingly better than if they were presented with a problem without a
> chance to have their parameters tuned.
>
> Disallowing parameter tuning would ultimately force competitors to
> design planners that either don't depend heavily on parameter tuning or
> tune their parameters automatically, encouraging systems that are easy
> to use out-of-the-box.
Regarding your first point (disallowing parameter tuning after the
competition begins): I absolutely agree and think it is crucial for a
meaningful competition. It should not be allowed to manually start the
planner with different parameters on different instances. As this is
hard to check under the competition rules, I'd suggest that the source
code of each participating planner be shared with the organizers before
the competition starts, so that the results can be repeated if the need
arises (I am aware that the results won't be identical in a
probabilistic setting, but they should at least be comparable). Also,
running the same problem more than once, as was possible in the last
IPPC, should not be allowed.
Regarding your second point (problem instances should be significantly
different from the ones that have been released): I also support this
suggestion, but it does of course mean quite a bit of extra work for
the organizers. In a perfect competition setting, I'd also suggest
using the same procedure as in the non-probabilistic competition, where
the planners are run by the organizers themselves, but I suppose the
organizers already have enough on their plate. So if you guys find the
time to model enough domains, that's great, but thank you for your
effort regardless!
Best,
Thomas