More submissions for all three tracks!!

248 views
Skip to first unread message

Zhen Xiang

unread,
Oct 27, 2024, 1:07:05 AM10/27/24
to clas2024-updates
Dear Participants,Thank you again for your participation.Regarding the recent discussion on Track I evaluation, we will clarify the evaluation and process here. Hope this will help!First of all, as we stated in the competition guideline "the same evaluation metrics in the development phase are used for both testing models", we exactly use the same metric as the development phase except for a different judge model and a held-out model at the test phase for attack transferability evaluation. This is a standard evaluation process and we will publish the judge model and submitted prompts after the competition.Second, regarding the evaluation score. We take into account the jailbreak score and stealthiness score. For the jailbreak score,  we consider both the keyword matching and LLM as a judge as stated in our guideline "the score for each prompt is the average over a keyword score and a harmfulness score". These two metrics are often used simultaneously in the jailbreak literatures for prompt quality and harmfulness tradeoff analysis. Our competition aims to reward the jailbreaks which are stealthy (close to given benign prompts), low-refusal rate (outputs should not directly contain refusal keywords), and harmful as discussed in the guideline.Due to the previous confusion and model evaluation uncertainties, we will do multiple evaluations for each input and take the average score. We will provide a new set of evaluations for the Track I submissions for fairness.In addition, for Track I-III, each team will gain two more submission chances due to the recent confusion. The submission portal will be open until 10/28 at midnight EST. Please let us know if you have any questions.

Best,
Organizers

George Zhao

unread,
Oct 27, 2024, 2:10:10 AM10/27/24
to clas2024-updates
Hi organizers,

I would like to ask if the decision to postpone the competition is really reasonable? Track 2 and Track 3 were not affected at all, but they were still postponed after the agreed time. If the competition rules could be arbitrarily changed, is it in the interests of the majority of players? Rules are rules, ddl is here for everyone. I am a company employee, in order to prepare for the test phase competition as required, I put a lot of work arrangements after October 26. I don’t know what kind of discussion the organizers had internally. For me personally, a two-day postponement will not help my team at all, because I need to return to my job, but it will benefit those players who did not participate on time.

Whitolf Chen

unread,
Oct 27, 2024, 2:18:50 AM10/27/24
to clas2024-updates
Dear organizers,

we sincerely thank you for your efforts and understand that the emergence of unexpected events may lead to corresponding changes.

However, same as George Zhao mentioned, Track 2 and Track 3 were not affected at all, but they were still postponed after the agreed time. More importantly, everyone schedules and plans their time according to the original game schedule. Extending the DDL without any special changes is an unreasonable and unfair thing to do.

We hope the organizers can deliberatively reconsider their decision to extend the DDLs of the latter two tracks. We believe that maintaining the original approach (i.e. extending the duration of the first Track only) is the fairest and most reasonable approach.

rin

unread,
Oct 27, 2024, 2:37:31 AM10/27/24
to clas2024-updates

Dear CLAS 2024 Organizers,

I hope this message finds you well. I am writing to provide feedback on the recent developments in the Competition for LLM and Agent Safety (CLAS) 2024.

First, I want to express our appreciation for the hard work and dedication you have put into organizing this event. However, some participants have noted changes in the evaluation metrics and submission deadlines shortly before the competition's conclusion. These adjustments seemed to affect the scoring and rankings of various teams.

We understand that such decisions can be complex and may arise from unforeseen challenges. However, changes close to the competition's end can impact participants' plans and expectations (There may be participants who suspect the results are being manipulated). We kindly suggest that any adjustments be communicated well in advance to maintain clarity and fairness. We argue that winners be determined based on the original deadline and rules. If issues with evaluation metrics exist, they should be addressed before the competition starts, not during. A more reasonable solution might involve discussing concerns with the organizers after the competition and including additional results in an appendix.

Thank you for considering this feedback. We are all committed to ensuring that the competition remains a positive and constructive experience for everyone involved.

d1m Fak3

unread,
Oct 27, 2024, 3:03:09 AM10/27/24
to clas2024-updates
Exactly, I can't agree with you more, the postpone of Track2 and Track3 seems unreasonable, due to the adjust of personal schedule has been made 
to participate in the game.

Zhen Xiang

unread,
Oct 27, 2024, 3:20:14 AM10/27/24
to clas2024-updates
Dear Participants,
Thanks for the great efforts for participating the competition and we are also trying our best to make the process satisfactory and fair for everyone!
Based on the current feedback, it seems there is no need to extend the deadline for track II and III, although we intended to provide more opportunities for everyone.
In this case, we will only extend track I to 10/28 midnight EST. These decisions are final and thanks for understanding! Thank you!
Best,
Organizers

Frankie THOU

unread,
Oct 27, 2024, 3:24:58 AM10/27/24
to clas2024-updates
Dear organizers,

I personally agree with the participant rin, that "winners should be determined based on the original deadline and rules". In my personal thoughts, track 2 & track 3 are not affected at all, so there seems no need to accept two more late submission. Regarding Track 1, it is only the test-stage evaluation methodolgy that is not as precise as the competition website, thus maybe we could just re-evaluate all the submissions which has already been submitted before the official DDL, and do not accept additional submissions.

Moreover, it seems quite resources-demanding to conduct track 1 re-evaluation, that several teams re-evaluation results have not yet been displayed in the current leaderboard. To only re-evaluate the submitted prompts and do not accept more late submissions might help the efficiency.

Best regards

longmosheng

unread,
Oct 27, 2024, 3:29:27 AM10/27/24
to zhen.xia...@gmail.com, clas2024...@googlegroups.com
I think it's good thing to extend the deadline for track II and III

发自我的小米
在 Zhen Xiang <zhen.xia...@gmail.com>,2024年10月27日 下午3:20写道:

Dear Participants,
Thanks for the great efforts for participating the competition and we are also trying our best to make the process satisfactory and fair for everyone!
Based on the current feedback, it seems there is no need to extend the deadline for track II and III, although we intended to provide more opportunities for everyone.
In this case, we will only extend track I to 10/28 midnight EST. These decisions are final and thanks for understanding! Thank you!
Best,
Organizers

On Sunday, October 27, 2024 at 3:03:09 AM UTC-4 d1m...@gmail.com wrote:
Exactly, I can't agree with you more, the postpone of Track2 and Track3 seems unreasonable, due to the adjust of personal schedule has been made 
to participate in the game.
On Sunday, October 27, 2024 at 2:10:10 PM UTC+8 thre...@gmail.com wrote:
Hi organizers,

I would like to ask if the decision to postpone the competition is really reasonable? Track 2 and Track 3 were not affected at all, but they were still postponed after the agreed time. If the competition rules could be arbitrarily changed, is it in the interests of the majority of players? Rules are rules, ddl is here for everyone. I am a company employee, in order to prepare for the test phase competition as required, I put a lot of work arrangements after October 26. I don’t know what kind of discussion the organizers had internally. For me personally, a two-day postponement will not help my team at all, because I need to return to my job, but it will benefit those players who did not participate on time.

On Sunday, October 27, 2024 at 1:07:05 PM UTC+8 zhen.xia...@gmail.com wrote:
Dear Participants,Thank you again for your participation.Regarding the recent discussion on Track I evaluation, we will clarify the evaluation and process here. Hope this will help!First of all, as we stated in the competition guideline "the same evaluation metrics in the development phase are used for both testing models", we exactly use the same metric as the development phase except for a different judge model and a held-out model at the test phase for attack transferability evaluation. This is a standard evaluation process and we will publish the judge model and submitted prompts after the competition.Second, regarding the evaluation score. We take into account the jailbreak score and stealthiness score. For the jailbreak score,  we consider both the keyword matching and LLM as a judge as stated in our guideline "the score for each prompt is the average over a keyword score and a harmfulness score". These two metrics are often used simultaneously in the jailbreak literatures for prompt quality and harmfulness tradeoff analysis. Our competition aims to reward the jailbreaks which are stealthy (close to given benign prompts), low-refusal rate (outputs should not directly contain refusal keywords), and harmful as discussed in the guideline.Due to the previous confusion and model evaluation uncertainties, we will do multiple evaluations for each input and take the average score. We will provide a new set of evaluations for the Track I submissions for fairness.In addition, for Track I-III, each team will gain two more submission chances due to the recent confusion. The submission portal will be open until 10/28 at midnight EST. Please let us know if you have any questions.

Best,
Organizers

--
您收到此邮件是因为您订阅了Google群组上的“clas2024-updates”群组。
要退订此群组并停止接收此群组的电子邮件,请发送电子邮件到clas2024-updat...@googlegroups.com
如需查看此讨论,请访问 https://groups.google.com/d/msgid/clas2024-updates/4b3a211b-b040-4a4b-8f68-da36451b06fdn%40googlegroups.com
要查看更多选项,请访问https://groups.google.com/d/optout

Tian

unread,
Oct 27, 2024, 3:32:46 AM10/27/24
to Frankie THOU, clas2024-updates
I do not agree. The previous information about Track I is a little misleading. This is because the objective of excluding rejection words and the LLM judging model are not aligned. Some answers may have rejection words but still get good scores from certain LLMs. Some answers may not have rejection words but are not that good. So considering both metrics is a good way from our perspective.
However, after observing the previous results, a quick thought is to change the optimization policy instead of following the original one, which will easily cause information gaps.

--
您收到此邮件是因为您订阅了Google群组上的“clas2024-updates”群组。
要退订此群组并停止接收此群组的电子邮件,请发送电子邮件到clas2024-updat...@googlegroups.com

md. saroar Jahan

unread,
Oct 27, 2024, 4:32:02 AM10/27/24
to clas2024-updates
Thanks for extending the deadline.
Reply all
Reply to author
Forward
0 new messages