Dear CLAS 2024 Organizers,
I hope this message finds you well. I am writing to provide feedback on the recent developments in the Competition for LLM and Agent Safety (CLAS) 2024.
First, I want to express our appreciation for the hard work and dedication you have put into organizing this event. However, some participants have noted changes in the evaluation metrics and submission deadlines shortly before the competition's conclusion. These adjustments seemed to affect the scoring and rankings of various teams.
We understand that such decisions can be complex and may arise from unforeseen challenges. However, changes close to the competition's end can impact participants' plans and expectations (There may be participants who suspect the results are being manipulated). We kindly suggest that any adjustments be communicated well in advance to maintain clarity and fairness. We argue that winners be determined based on the original deadline and rules. If issues with evaluation metrics exist, they should be addressed before the competition starts, not during. A more reasonable solution might involve discussing concerns with the organizers after the competition and including additional results in an appendix.
Thank you for considering this feedback. We are all committed to ensuring that the competition remains a positive and constructive experience for everyone involved.
Dear Participants,Thanks for the great efforts for participating the competition and we are also trying our best to make the process satisfactory and fair for everyone!Based on the current feedback, it seems there is no need to extend the deadline for track II and III, although we intended to provide more opportunities for everyone.
In this case, we will only extend track I to 10/28 midnight EST. These decisions are final and thanks for understanding! Thank you!Best,Organizers
On Sunday, October 27, 2024 at 3:03:09 AM UTC-4 d1m...@gmail.com wrote:
Exactly, I can't agree with you more, the postpone of Track2 and Track3 seems unreasonable, due to the adjust of personal schedule has been made
to participate in the game.
On Sunday, October 27, 2024 at 2:10:10 PM UTC+8 thre...@gmail.com wrote:
Hi organizers,
I would like to ask if the decision to postpone the competition is really reasonable? Track 2 and Track 3 were not affected at all, but they were still postponed after the agreed time. If the competition rules could be arbitrarily changed, is it in the interests of the majority of players? Rules are rules, ddl is here for everyone. I am a company employee, in order to prepare for the test phase competition as required, I put a lot of work arrangements after October 26. I don’t know what kind of discussion the organizers had internally. For me personally, a two-day postponement will not help my team at all, because I need to return to my job, but it will benefit those players who did not participate on time.On Sunday, October 27, 2024 at 1:07:05 PM UTC+8 zhen.xia...@gmail.com wrote:Dear Participants,Thank you again for your participation.Regarding the recent discussion on Track I evaluation, we will clarify the evaluation and process here. Hope this will help!First of all, as we stated in the competition guideline "the same evaluation metrics in the development phase are used for both testing models", we exactly use the same metric as the development phase except for a different judge model and a held-out model at the test phase for attack transferability evaluation. This is a standard evaluation process and we will publish the judge model and submitted prompts after the competition.Second, regarding the evaluation score. We take into account the jailbreak score and stealthiness score. For the jailbreak score, we consider both the keyword matching and LLM as a judge as stated in our guideline "the score for each prompt is the average over a keyword score and a harmfulness score". These two metrics are often used simultaneously in the jailbreak literatures for prompt quality and harmfulness tradeoff analysis. Our competition aims to reward the jailbreaks which are stealthy (close to given benign prompts), low-refusal rate (outputs should not directly contain refusal keywords), and harmful as discussed in the guideline.Due to the previous confusion and model evaluation uncertainties, we will do multiple evaluations for each input and take the average score. We will provide a new set of evaluations for the Track I submissions for fairness.In addition, for Track I-III, each team will gain two more submission chances due to the recent confusion. The submission portal will be open until 10/28 at midnight EST. Please let us know if you have any questions.Best,Organizers
--
您收到此邮件是因为您订阅了Google群组上的“clas2024-updates”群组。
要退订此群组并停止接收此群组的电子邮件,请发送电子邮件到clas2024-updat...@googlegroups.com。
如需查看此讨论,请访问 https://groups.google.com/d/msgid/clas2024-updates/4b3a211b-b040-4a4b-8f68-da36451b06fdn%40googlegroups.com。
要查看更多选项,请访问https://groups.google.com/d/optout。
--
您收到此邮件是因为您订阅了Google群组上的“clas2024-updates”群组。
要退订此群组并停止接收此群组的电子邮件,请发送电子邮件到clas2024-updat...@googlegroups.com。