Hi,
While I believe that the website is simply outdated, since Scott and I
have already discussed some of these suggestions, I'd still like to
back up Andrey and share my opinion.
On 01/16/2014 10:50 PM, Andrey Kolobov wrote:
> Hi all,
>
>
> Having read the rules on the competition webpage, I'd like to propose
> the following rule changes. Nothing too fundamental, but it would be
> interesting to see what people think about them.
>
>
> 1) The competition webpage says that "there will be a limit of 30s per
> trial". I'd suggest removing this restriction, because it penalizes
> planners that do at least part of their planning offline. As I recall
> from IPPC-2011, once a planner initiates a session for solving a problem
> on a server and receives the problem description, the time counter for
> the first trial immediately starts ticking. This means that a planner
> that wants to spend, for instance, 10 min planning offline and only then
> do actual trials would be essentially forced to fail the first
> 10*60/30 = 20 trials.
>
> Instead, how about simply giving each planner at most N (say, 18) min
> per problem? Basically, this makes different planners' performance on
> any given problem easily comparable, and eliminates any overall
> performance differences due to time allocation schemes. Also, unlike
> the current 30s-per-trial restriction, it doesn't penalize or benefit
> either online or offline planners. Moreover, since 18 min * 80 = 24
> hours, if the competition has 80 problems (e.g., 8 domains of 10
> problems each) there would be no need for an explicit overall 24-hour
> time cap.
I totally agree on this one; we should try to avoid giving one type of
planner an unfair advantage over another (even though our planner would
benefit from the current ruling ;) ). Nevertheless, I wouldn't bother
with the 24-hour restriction, but would instead use a fixed time per
decision to calculate the available total time for each problem (with,
e.g., 1 second per decision, this would have led to 20 minutes per
problem in the last IPPC with a finite horizon of 40 and 30 runs, and
thereby slightly more than a day in total). This makes it possible to
use different total times for problems with different values of the
finite horizon. A good value for the time per decision depends on the
number of runs used. Since confidence intervals for single problems
with 30 runs are just horrible, it'd also be nice to increase the
number of runs. I'd suggest using 60 runs and a time per decision of
0.75 seconds. That way, a problem with a finite horizon of 40 has to be
finished in 30 minutes, which nicely mimics the non-probabilistic
competition.
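To spell out the arithmetic behind these numbers, here is a quick
Python sketch (just an illustration; the helper name is mine, not part
of any competition infrastructure):

    # Total wall-clock budget for one problem, in minutes.
    def total_minutes(seconds_per_decision, horizon, runs):
        return seconds_per_decision * horizon * runs / 60.0

    # Last IPPC example: 1 s/decision, horizon 40, 30 runs.
    print(total_minutes(1.0, 40, 30))              # 20.0 min per problem
    print(total_minutes(1.0, 40, 30) * 80 / 60.0)  # ~26.7 h for 80 problems
    # Suggested setting: 0.75 s/decision, horizon 40, 60 runs.
    print(total_minutes(0.75, 40, 60))             # 30.0 min per problem
    # Going from 30 to 60 runs also shrinks confidence intervals by a
    # factor of about sqrt(2).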
>
>
>
> 2) I'd suggest disallowing parameter tuning after the competition
> begins, and/or designing the competition problem instances to be
> significantly different from the ones that have been released.
>
> Unlike in machine learning, algorithmic parameters in planning don't
> generalize well even within a single domain, but picking good values for
> them can be absolutely essential to good performance (lookahead
> illustrates both of these aspects). Therefore, if problems in the
> competition are the same or very similar to those in the warmup round
> (this was the case in IPPC-2011 for 6 out of 8 domains), allowing
> parameter tuning means that many planners' performance will be
> misleadingly better than if they were presented with a problem without a
> chance to have their parameters tuned.
>
> Disallowing parameter tuning would ultimately force competitors to
> design planners that either don't depend heavily on parameter tuning or
> tune their parameters automatically, encouraging systems that are easy
> to use out-of-the-box.
Regarding your first point (disallowing parameter tuning after the
competition begins): I absolutely agree and think it is crucial for a
meaningful competition. It should not be allowed to manually start the
planner with different parameters on different instances. As this is
hard to check under the competition rules, I'd suggest that the source
code of each participating planner be shared with the organizers before
the competition starts, so that the results can be repeated if the need
arises (I am aware that the results won't be identical in a
probabilistic setting, but they should at least be comparable). Also,
running the same problem more than once, as was possible in the last
IPPC, should not be allowed.
Regarding your second point (problem instances should be significantly
different from the ones that have been released): I also support this
suggestion, but it does of course mean quite a bit of extra work for
the organizers. In a perfect competition setting, I'd also suggest
using the same procedure as in the non-probabilistic competition, where
the planners are run by the organizers themselves, but I suppose the
organizers already have enough on their plate. So if you guys find the
time to model enough domains, that's great, but thank you for your
effort regardless!
Best,
Thomas