Time limits

Andrey Kolobov

Mar 16, 2011, 10:03:11 PM

to ippc...@googlegroups.com, Scott Sanner

Hi Scott,

I have two questions regarding the time limits for the competition:

1) On large problems, many of the competing planners will probably not
be able to run until convergence and will therefore have to stop
prematurely to avoid spending too much time on one problem. There are
several stopping conditions one can think of here, but one of them is
simply a time cutoff.

The issue is that deciding how much time to allocate to a given
problem is very difficult. In particular, one can easily end up with
"extra" time left after going through all the problems, the time that
could be spent improving solutions to some of them. However, since
according to the current rules only the first 30 trials on every
problem count, one can't have a planner work on a given problem for
some time, present the solution to the server, then switch to another
problem, and then come back to the first one if there is time left,
improve the solution, and have another session with the server using
the improved policy.

I'm suggesting that for a given problem, planners should be allowed an
arbitrary number of 30-trial sessions with the server, and only the
best of these 30-trial sessions would be used for judging the
planner's performance. Of course, there could be a problem of
competitors trying, for a given problem, many sessions with the same
policy in the hope that a particular session's reward ends up being
very high simply by luck. However, it seems that overall the results
of evaluating planners this way will be more representative of the
planners' actual performance.

So, would you consider changing the evaluation rules as above?

2) If the above isn't a good idea for some reason, would you consider
imposing a *per domain* time limit (which doesn't necessarily have to
be the same for every domain)? That would make the time allocation
problem somewhat easier. Also, it would make the comparison of
planners' performance on a given domain more meaningful -- right now,
comparing their performance on a domain-by-domain basis would be hard
since their time allocation strategies may be vastly different.

Thanks,

Andrey

Scott Sanner

Mar 16, 2011, 11:02:54 PM

to Andrey Kolobov, ippc...@googlegroups.com

> and only the best of these 30-trial sessions

Here the results would no longer correspond to expected performance of the policy under the transition distribution. Better trials may simply result from a better sequence of randomly sampled transitions.
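To illustrate why a "max" over sessions is a problem, here is a minimal simulation sketch. It is not part of the competition software: the Gaussian reward model, the function names, and the choice of 10 sessions are all assumptions made for illustration. A single fixed policy with true expected reward 0 is evaluated in several 30-trial sessions, and the score of the first session, the last session, and the best session are averaged over many repetitions.

```python
import random
import statistics

def session_mean(rng, n_trials=30, true_mean=0.0, sd=1.0):
    """Average reward of one session (hypothetical Gaussian per-trial rewards)."""
    return statistics.fmean([rng.gauss(true_mean, sd) for _ in range(n_trials)])

def compare_aggregations(n_sessions=10, n_repeats=2000, seed=0):
    """Average score under three rules: first session, last session, best session."""
    rng = random.Random(seed)
    first, last, best = [], [], []
    for _ in range(n_repeats):
        # The same fixed policy is evaluated in every session (an assumption).
        sessions = [session_mean(rng) for _ in range(n_sessions)]
        first.append(sessions[0])
        last.append(sessions[-1])
        best.append(max(sessions))
    return statistics.fmean(first), statistics.fmean(last), statistics.fmean(best)

first_avg, last_avg, best_avg = compare_aggregations()
print(f"first: {first_avg:.3f}  last: {last_avg:.3f}  best: {best_avg:.3f}")
```

Under these assumptions the first-session and last-session scores both average close to the true expected reward, while the best-of-sessions score is systematically higher even though the policy never changes. That is the bias a "max" introduces; taking the *last* session does not have it.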

I might consider taking the *last* 30 trials as opposed to the *first* 30 if that makes it easier for competitors. But I cannot do any aggregation operation that involves a "max".

Aside: it seems we may increase the trials to 100+ to decrease variance as previously recommended... we will determine this after all test competition results are in.
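As a rough sketch of how much more trials would help, assuming independent trials with a fixed per-trial standard deviation (an assumption, not something measured on the competition domains), the standard error of the average reward scales as 1/sqrt(n):

```python
import math

def standard_error(sd, n_trials):
    """Standard error of the mean reward over n independent trials."""
    return sd / math.sqrt(n_trials)

# Hypothetical per-trial standard deviation of 1.0; only the ratio matters.
ratio = standard_error(1.0, 100) / standard_error(1.0, 30)
print(f"standard error with 100 trials relative to 30 trials: {ratio:.2f}")
```

Under that assumption, going from 30 to 100 trials shrinks the standard error to sqrt(30/100) of its previous value, a bit more than half.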

> consider imposing a *per domain* time limit

I agree this is a good idea, but we're simply not doing this because there would be no easy way to verifiably enforce it. The Server can only verify online planning time, not offline planning time, and the format of the competition is that all rules should be externally verifiable.

We could do a timed release of competition domains and use fixed time windows for each domain on the Server, but this would be 24 hours of logistical nightmare for the organizers and competitors, given our staggered locations throughout the world.

The 24-hour time limit is perhaps not ideal, but it seemed the best compromise among many competing needs, so please try to do your best with this format.

Cheers,

Scott

Andrey Kolobov

Mar 16, 2011, 11:13:18 PM

to Scott Sanner, ippc...@googlegroups.com

> I might consider taking the *last* 30 trials as opposed to the *first* 30 if
> that makes it easier for competitors.

This would be awesome!

Best,

Andrey

Ashwin NR

Mar 16, 2011, 11:17:29 PM

to ippc...@googlegroups.com, Andrey Kolobov, Scott Sanner

> I might consider taking the *last* 30 trials as opposed to the *first* 30 if
> that makes it easier for competitors.

This would be awesome!