Hello dear Competition Participants,
after today's discussion with the TSAIL team about the results and the performance differences in the final evaluation, we've decided to re-evaluate the final results of track 1.
I've updated the results on our side:
http://vizdoom.cs.put.edu.pl/competition-cig-2018/competition-results
As a result of the re-evaluation, the final ranking has changed: TSAIL's submission took first place, DoomNet second, and yzlc080733 third.
Below, for maximum clarity, I include a detailed explanation of why we decided to do this. I hope you will find our decision fair and reasonable.
First of all, I made a mistake while evaluating TSAIL's submission: I used their multiplayer agent.
In this situation, without a doubt, their submission required a proper evaluation.
It got very good times on 3 maps and timed out on the other 7 (each timeout counted as a time of 5.0). This would have given TSAIL second place.
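For illustration only, here is a minimal sketch of how such a total could be computed under that rule (lower total is better, and a timed-out map counts as 5.0); the per-map times below are placeholders, not TSAIL's actual results:

# A minimal sketch (not the official grader) of the total-time aggregation
# described above: sum the per-map times, counting each timeout as 5.0.
TIMEOUT_TIME = 5.0  # assumed value for a map the agent did not finish
NUM_MAPS = 10       # 3 finished maps + 7 timeouts in the example above

def total_time(map_times):
    # map_times: per-map completion time, or None if the agent timed out
    return sum(TIMEOUT_TIME if t is None else t for t in map_times)

# Placeholder example: good times on 3 maps, timeouts on the other 7
example = [1.2, 0.9, 1.5] + [None] * 7
print(total_time(example))  # 3.6 + 7 * 5.0 = 38.6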
TSAIL's own experiments, as well as our leaderboard, suggested that this submission should perform better than the scores it obtained.
The reason behind the performance difference was that TSAIL's bot wasn't able to leave the starting room. If you took part in track 1 and used our map generator, you will quickly notice that the generator creates mostly two types of starting points: one on a little platform and one in a small room with doors. Moreover, the agent's position in the starting room (and on the platform) differs depending on whether the agent is the only player (then it is spawned centrally, in front of the doors) or a second player is present (then the agent's position is shifted slightly to the right).
In both our crowdAI evaluation system and the final evaluation system, our grader was present as the second player (a special player that can't interact with the level's environment, but observes and records the evaluation).
The TSAIL team hardcoded the procedure for leaving the starting room, assuming that the agent always starts in front of the doors.
This sometimes worked in the public evaluation but failed completely in the final one.
I checked many recordings from the final and public evaluations and found that most of the submissions either fail to leave the starting room or manage it only with some luck after quite a long time. Of course, the starting room is part of the level, but the primary goal of the track is to test the navigation and exploration skills of the agents. That's why we've decided to modify the maps used for the final evaluation by removing the doors and the front wall of the starting rooms (we've changed maps 1, 2, 3, 5, 7, and 9) to make the start much easier for all the agents. We then re-ran the evaluation of TSAIL, DoomNet, yzlc080733, and ddangelo.
I'm happy to report that all submissions improved their total times significantly, and we will be able to see more interesting action in the movies.
I hope this explains the reasoning behind the re-evaluation well enough, and that the DoomNet and yzlc080733 teams won't feel too hurt by our decision.
We will continue the single-player challenge and, in the meantime, try to eliminate all the flaws in its formula. We are also counting on your feedback to improve it.
Best regards,
Marek Wydmuch