I have 2 questions:
> Scores are currently normalized to the interval [0,1]. The lower score for
> normalization purposes is currently min(RandomPolicy, NoopPolicy). This
> will be changed to max(RandomPolicy, NoopPolicy) for the main reason that
> planners should only get a non-zero normalized score if they can beat both
> trivial policies.
Will you somehow make sure that neither of these policies is optimal
on the competition problems? On quite a few currently published
problems, the best policy is actually one of these two, and some work
is required on the part of the planner to "realize" that. Giving
planners a score of 0 in such cases wouldn't be entirely fair.
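To make the concern concrete, here is a minimal sketch of linear [0,1] normalization. It assumes the upper bound is the best average reward achieved by any planner on the instance; the quoted announcement only specifies the lower bound. The function name and all reward values are hypothetical. On an instance where Noop happens to be optimal, a planner that "realizes" this gets full marks under the min{} lower bound but 0 under the max{} lower bound:

```python
def normalized_score(raw: float, lower: float, upper: float) -> float:
    """Linearly map raw onto [0, 1] relative to [lower, upper], clamping at both ends."""
    if upper <= lower:
        # No planner beat the baseline, so every planner gets 0.
        return 0.0
    return min(1.0, max(0.0, (raw - lower) / (upper - lower)))


# Hypothetical average rewards on an instance where Noop happens to be optimal.
noop, random_policy = -10.0, -25.0
planner = -10.0  # a planner that "realizes" doing nothing is best
best = -10.0     # ASSUMED upper bound: best average reward among planners

old = normalized_score(planner, min(noop, random_policy), best)  # lower = -25
new = normalized_score(planner, max(noop, random_policy), best)  # lower = -10
print(old, new)  # 1.0 0.0
```
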
Also, will you count only the last or only the first 30 trials on a
given problem when evaluating performance?
Thanks,
Andrey
> This seems true for 3 of the 10 Game of Life instances, but Game of Life
> will not be used in the final competition.
> Do you believe this holds for other domains / instances? Which ones?
I'm not entirely sure if there are more (the largest sysadmin problems
are suspect, but I can't say for certain), and this is exactly the
issue I'm slightly worried about -- it often seems impossible to know
if there is a "nontrivial" policy that beats best{noop, random} unless
the given problem can be solved optimally. In fact, a slight change of
parameters can turn a problem with a nontrivial best policy into one
with a trivial best policy. For large problems optimal solutions are
clearly infeasible, which means that the trouble with game_of_life
could surface on the new domains you will be introducing. So, I was
wondering whether you have somehow verified, for the problems we (the
competitors) haven't seen yet, that they all have a nontrivial optimal
policy. If this can't be verified, then defining the minimum score to be
max{noop, random} may be dangerous, since one of these two may
actually be the best one.
The other reason why this may be dangerous is that the best policy may
be only slightly better than best{noop, random}, and due to variance
(in turn, due to the small number of rounds) the difference may be
hardly noticeable. Again though, if you managed to verify that this is
not the case with competition problems then this is not an issue.
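The variance point can be illustrated with a rough two-sample check: compare the gap between mean rewards with the standard error of that gap. This is a sketch under stated assumptions, not anything the organizers use: rounds are treated as independent, the threshold z = 2 is a rule-of-thumb choice, and the per-round rewards over 30 rounds are made up for illustration.

```python
import math
import statistics


def advantage_is_detectable(policy: list[float], baseline: list[float], z: float = 2.0) -> bool:
    """Check whether mean(policy) beats mean(baseline) by more than z standard errors.

    Uses the standard error of the difference of two independent sample means.
    """
    diff = statistics.mean(policy) - statistics.mean(baseline)
    se = math.sqrt(statistics.variance(policy) / len(policy)
                   + statistics.variance(baseline) / len(baseline))
    return diff > z * se


# Hypothetical per-round rewards over 30 rounds (not real competition data).
baseline = [-15.0, -5.0] * 15         # mean -10, e.g. best{noop, random}
slightly_better = [-14.0, -4.0] * 15  # mean -9, same spread
clearly_better = [-6.0, -4.0] * 15    # mean -5, much lower spread

print(advantage_is_detectable(slightly_better, baseline))  # False
print(advantage_is_detectable(clearly_better, baseline))   # True
```

With these numbers, a true advantage of 1 reward unit is smaller than two standard errors (about 2.6), so it would be lost in the noise.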
Cheers,
Andrey