We have decided to move directly from Round 2 to Round 3 given the current progress and concerns over false positives in the Round 2 data. You can download the Round 2 leaderboard here if you want to view the previous results.
The ES server now runs against the Round 3 test dataset, which consists of 288 AI models (50% poisoned). Given the larger number of models, we have increased the time limit for submissions from 24 hours to 36 hours.
The Round 3 AIs are identical to those in Round 2, with the addition of adversarial training via two methods (PGD and Fast is Better than Free). The adversarial training hyperparameters have been varied within the dataset in order to explore the impact of different adversarial training methods on trojan detectability.
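For readers unfamiliar with PGD, the following is a toy sketch of the perturbation step that PGD-based adversarial training relies on. It is not the Round 3 training code: the logistic-regression model, the function name pgd_attack, and the eps, step, and iters values are all assumptions chosen for illustration. Adversarial training would then fit the model on the perturbed inputs instead of (or in addition to) the clean ones.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pgd_attack(x, y, w, b, eps=0.1, step=0.02, iters=10):
    """Toy L-infinity PGD against logistic regression p = sigmoid(w.x + b).

    x: input features, y: label (0 or 1), w/b: model parameters.
    eps, step, and iters are illustrative values, not Round 3 settings.
    """
    x_adv = list(x)
    for _ in range(iters):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
        # Gradient of binary cross entropy w.r.t. the input is (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        # Ascent step along the sign of the gradient to increase the loss.
        x_adv = [xi + step * ((g > 0) - (g < 0)) for xi, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the original input.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv
```

The projection step is what distinguishes PGD from a single-step attack: the perturbation is refined over several iterations but never leaves the allowed eps-ball.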
More detailed documentation on Round 3 can be found here. We have also added some improvements to the leaderboard directly:
- new metrics: 95% confidence interval for cross entropy, Brier score, runtime
- results table has been split: "Results" contains the best submission per team, and "All Results" contains the full list of all submissions. This should make relative leaderboard positions easier to see while keeping the full set of submissions available.
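As a rough guide to the new metrics, the sketch below computes the mean cross entropy with a 95% confidence interval and the Brier score from a list of predicted poisoning probabilities and ground-truth labels. The exact formulas used by the leaderboard are not given here, so this assumes a normal-approximation interval (1.96 times the standard error of the per-model losses); the function names are hypothetical.

```python
import math


def cross_entropy_with_ci(probs, labels, eps=1e-12):
    """Return (mean cross entropy, half-width of an approximate 95% CI).

    Assumes a normal approximation: half-width = 1.96 * standard error.
    Requires at least two predictions.
    """
    losses = []
    for p, y in zip(probs, labels):
        # Clip probabilities so log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    n = len(losses)
    mean = sum(losses) / n
    var = sum((loss - mean) ** 2 for loss in losses) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)
```

For example, calling cross_entropy_with_ci(probs, labels) on the 288 per-model predictions would give a mean and a half-width h, to be read as mean ± h.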
All containers still shared with the tro...@nist.gov Google Drive have automatically been run against the Round 3 test dataset.