Dear LLMs CTF Participants,
We hope the Reconnaissance phase is going well! Here are some updates:
We have uploaded what we hope is the final version of the rules, including the scoring details. Due to the larger-than-expected number of eligible defenses, we have set \gamma = 0.85 and T = 1. Being the first to break a defense corresponds to a 20% bonus on the base score for the defense. Please read the scoring rules before running your Evaluation chats.
The start of the Evaluation phase is sensitive because of the first-successful-attack bonus. We’ve decided to move the start of the Evaluation phase to 4 Feb 23:59 Anywhere on Earth. We had to move it back by one day because the organizers need to monitor the competition for the first few hours in case something goes wrong, and the previous time slot didn’t work for us.
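For teams scheduling their first Evaluation chats, here is a minimal sketch of converting the new start time to UTC. It relies on Anywhere on Earth being the UTC-12:00 offset. The function name `evaluation_start_utc` is hypothetical, and the year is left as a parameter because this announcement does not state it.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) corresponds to the UTC-12:00 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def evaluation_start_utc(year: int) -> datetime:
    """Return 4 Feb 23:59 AoE as a UTC datetime.

    `year` is an assumption supplied by the caller; the announcement
    only gives the day, month, and time.
    """
    start_aoe = datetime(year, 2, 4, 23, 59, tzinfo=AOE)
    return start_aoe.astimezone(timezone.utc)
```

Calling `.astimezone()` with no arguments on the returned value converts it to the local time zone of the machine running the script.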
If you haven’t seen it, we have a Gradio interface for the Reconnaissance phase. Unfortunately, our Gradio setup is not reliable enough for scoring: participants can’t track how many chats their team has created with a given defense at any point. The only way to do Evaluation chats will be through the API, as stated in the rules since v1.1. An example Python script is linked in the API docs.
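Since the number of chats per defense is the thing that is hard to track, a team scripting against the API may want to keep its own tally. The following is only a sketch: `ChatTally` and the defense identifiers are hypothetical, no real API call is shown, and the tally only counts chats recorded by the script itself, not chats created elsewhere by teammates.

```python
from collections import Counter


class ChatTally:
    """Client-side count of chats created per defense (hypothetical helper)."""

    def __init__(self) -> None:
        self._counts = Counter()

    def record(self, defense_id: str) -> int:
        """Call once after each chat is created; returns the new total."""
        self._counts[defense_id] += 1
        return self._counts[defense_id]

    def count(self, defense_id: str) -> int:
        """Number of chats recorded so far for this defense (0 if none)."""
        return self._counts[defense_id]
```

Calling `record` right after each successful chat-creation request keeps the tally in step with what the script has actually sent.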
Korbinian Koch (participant) has created a nice spreadsheet template for the Reconnaissance phase.
If anyone develops code/resources that improve the competition experience, feel free to share them on the issue tracker or with us; we’ll circulate them in our future announcements and credit you. Just be sure not to share exact attack methodologies and/or hidden secrets! When in doubt, consult the organizers privately.
We plan to add a few API ergonomics improvements over the next few days, such as “Reset secret in Reconnaissance chats without having to guess 10 times”. If there is something else that a) would make the technical side of executing an attack easier and b) seems easy for us to support, please notify us soon.
We mistakenly communicated to a small number of teams that a particular way of formatting the response (all caps) was fine according to the rules, even though the rules state otherwise. We sincerely apologize for not keeping all discussion of this type public; we think this should be the only such issue. The situation is currently unfair to other teams. Since this is our fault, we are working on finding a compromise solution as soon as possible.
Happy red-teaming!