The development phase has started. The CodaLab servers are now accepting submissions, and the starter kit, data, and models are available. To get started, see https://trojandetection.ai/start
Following additional testing, we have made several improvements to the rules, metrics, and models since the initial release of the website. For the Red Teaming Track, we made the following changes:
- The Diversity metric now incorporates an embedding distance.
- The Combined Score metric now weights Diversity by Success Rate so that random inputs receive low scores.
- We switched the target models for red teaming from custom refusal models to Llama-2-chat, because we found the Llama-2-chat models to be very robust to the baseline red teaming methods.
- Manual evaluation in the test phase is now the only way that Success Rate is computed for the final ranking. The number of test cases manually evaluated per team was increased from 250 to 500.
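To illustrate why weighting Diversity by Success Rate gives random inputs a low score, here is a minimal sketch. The formula (an average of Success Rate and Success Rate times Diversity), the function name `combined_score`, and the example numbers are all assumptions for illustration only; the official metric is the one defined in the starter kit.

```python
def combined_score(success_rate: float, diversity: float) -> float:
    """Illustrative combined score; NOT the official challenge formula.

    Assumes both inputs lie in [0, 1]. Diversity only contributes in
    proportion to Success Rate, so diverse-but-unsuccessful inputs score low.
    """
    if not (0.0 <= success_rate <= 1.0 and 0.0 <= diversity <= 1.0):
        raise ValueError("success_rate and diversity must be in [0, 1]")
    return (success_rate + success_rate * diversity) / 2


# Hypothetical numbers: random inputs are very diverse but rarely succeed.
random_inputs = combined_score(success_rate=0.01, diversity=0.95)
# Hypothetical numbers: targeted test cases are less diverse but succeed more.
targeted = combined_score(success_rate=0.60, diversity=0.50)
print(f"random inputs: {random_inputs:.3f}, targeted: {targeted:.3f}")
```

Under this assumed form, high Diversity cannot compensate for a Success Rate near zero, which is the behavior the change is meant to produce.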
For the Trojan Detection Track, we made the following changes:
- REASR is now computed using BLEU instead of exact match.
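To illustrate the difference between the two comparisons, the sketch below contrasts exact match with a simplified BLEU score for one generated string and its target. The function `simple_bleu`, the whitespace tokenization, the lack of smoothing, and the example strings are assumptions for illustration; the official REASR computation is the one in the starter kit and may differ in these details.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified single-reference BLEU on whitespace tokens, in [0, 1].

    Geometric mean of clipped n-gram precisions times a brevity penalty.
    Unsmoothed: any n-gram order with zero overlap yields a score of 0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            break  # candidate has fewer than n tokens
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / len(log_precisions))


# Hypothetical target string and a generation that differs by one word.
target = "the quick brown fox jumps over the lazy dog"
generated = "the quick brown fox jumps over a lazy dog"

exact = float(generated == target)       # all-or-nothing credit
bleu = simple_bleu(generated, target)    # partial credit for n-gram overlap
print(f"exact match: {exact}, simplified BLEU: {bleu:.2f}")
```

Exact match gives no credit to a near miss, while a BLEU-style score rewards generations that reproduce most of the target.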
We made the following general changes:
- The compute limits were significantly reduced, to 2 A100-days for base subtracks and 4 A100-days for large subtracks. This increases the relative importance of algorithmic improvements. The new limits remain more than sufficient for all the baselines, but please keep them in mind when generating submissions.
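As a quick sanity check on the compute limits, the sketch below converts a planned job into A100-days. It assumes one A100-day means one A100 GPU running for 24 hours; the helper names are hypothetical, and how usage is counted toward the limit is governed by the official rules, not by this snippet.

```python
HOURS_PER_DAY = 24

# Limits stated in this announcement, in A100-days.
LIMITS_A100_DAYS = {"base": 2, "large": 4}


def a100_days(num_gpus: int, wall_clock_hours: float) -> float:
    """A100-days consumed by a job on `num_gpus` A100s (assumed definition)."""
    return num_gpus * wall_clock_hours / HOURS_PER_DAY


def within_limit(subtrack: str, num_gpus: int, wall_clock_hours: float) -> bool:
    """True if the job fits within the limit for 'base' or 'large'."""
    return a100_days(num_gpus, wall_clock_hours) <= LIMITS_A100_DAYS[subtrack]


# Hypothetical jobs: 8 A100s for 6 hours, and 8 A100s for 7 hours.
print(a100_days(8, 6), within_limit("base", 8, 6))
print(a100_days(8, 7), within_limit("base", 8, 7))
```

Under this assumption, 2 A100-days corresponds to 48 A100 GPU-hours and 4 A100-days to 96.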
The development phase will last three months. Good luck, and thank you for participating!
All the best,
Mantas (TDC co-organizer)