Scoring rules, start time of the Evaluation phase, and other information


SaTML 2024 LLMs CTF Announcements

Jan 30, 2024, 1:01:43 PM
to SaTML 2024 LLMs CTF Announcements

Dear LLMs CTF Participants,


We hope the Reconnaissance phase is going well! Here are some updates:


  • We have uploaded what we hope is the final version of the rules, including the scoring details. Due to the larger-than-expected number of eligible defenses, we have set \gamma = 0.85 and T = 1. Being the first to break a defense corresponds to a 20% bonus on the base score for the defense. Please read the scoring rules before running your Evaluation chats.

  • The start of the evaluation phase is sensitive due to the first-successful-attack bonus. We’ve decided to move the start of the Evaluation phase to 4 Feb 23:59 Anywhere on Earth. We had to move it back by one day because the organizers need to monitor the competition for the first few hours in case something goes wrong, and the previous time slot didn’t work for us.

  • If you haven’t seen it, we have a Gradio interface for the Reconnaissance phase. Unfortunately, our Gradio setup is not reliable enough for scoring: participants can’t track how many chats their team has created with a given defense at any point. Evaluation chats can therefore only be run through the API, as stated in the rules since v1.1. An example Python script is linked in the API docs.

  • Korbinian Koch (participant) has created a nice spreadsheet template for the Reconnaissance phase.
    If anyone develops code/resources that improve the competition experience, feel free to share them on the issue tracker or with us; we’ll circulate them in our future announcements and credit you. Just be sure not to share exact attack methodologies and/or hidden secrets! When in doubt, consult the organizers privately.

  • We plan to add a few API ergonomics improvements over the next few days, such as “Reset secret in Reconnaissance chats without having to guess 10 times”. If there is something else that a) would make the technical side of executing an attack easier and b) seems easy for us to support, please notify us soon.

  • We mistakenly communicated to a small number of teams that a particular way of formatting the response (all caps) was fine according to the rules, even though the rules state otherwise. We sincerely apologize for not keeping all discussion of this type public; we think this should be the only such issue. The situation is currently unfair to other teams. Since this is our fault, we are working on finding a compromise solution as soon as possible.
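
For teams planning around the Evaluation start time mentioned above, here is a minimal sketch of converting it to UTC and CET. It assumes the usual convention that Anywhere on Earth (AoE) means UTC-12, that the year is 2024, and that CET is a fixed UTC+1 offset in February. The variable names are illustrative only, and the rules remain the authoritative source for the start time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Anywhere on Earth" (AoE) is the fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")
# Assumption: CET is UTC+1 (no daylight saving time in early February).
CET = timezone(timedelta(hours=1), name="CET")

# Evaluation phase start as announced: 4 Feb, 23:59 AoE (year assumed 2024).
start_aoe = datetime(2024, 2, 4, 23, 59, tzinfo=AOE)

# Convert the same instant to other time zones.
start_utc = start_aoe.astimezone(timezone.utc)
start_cet = start_aoe.astimezone(CET)

print("AoE:", start_aoe.isoformat())
print("UTC:", start_utc.isoformat())  # 2024-02-05T11:59:00+00:00
print("CET:", start_cet.isoformat())  # 2024-02-05T12:59:00+01:00
```

Under these assumptions, the Evaluation phase opens around midday on 5 Feb in Europe, which may matter for teams aiming at the first-successful-attack bonus.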


Happy red-teaming!

SaTML 2024 LLMs CTF Announcements

Feb 1, 2024, 1:06:46 PM
to SaTML 2024 LLMs CTF Announcements
Dear LLMs CTF participants,

A small update on the last point from the previous announcement: the three defenses involved in the mistaken communication are now back online and rules-abiding, after minimal modifications that prevent them from generating all-caps outputs.

We temporarily disabled the defenses yesterday, January 31st, in the morning CET, and re-enabled them today, February 1st, in the early afternoon CET, so that people would not waste time trying to break them while they were not in their final version.

These defenses are:
  • Hestia/Llama 2
  • RSLLM/Llama 2
  • suibianwanwan/GPT-3.5
We apologize for the mistake on our end. You can find more details on the topic in this GitHub discussion.

Have fun during the attack phase, and be ready for the evaluation phase starting soon!

Happy red-teaming!

The organizers
