TL;DR
SHROOM-CAP is an Indic-centric shared task co-located with CHOMPS-2025
to advance the SOTA in hallucination detection for scientific content
generated with LLMs. We have annotated hallucinated content in 4*
high-resource languages and 3* surprise low-resource Indic languages
using top-tier LLMs. Participate in as many languages as you like by
accurately detecting the presence of hallucinated content.
Stay informed by joining our Google group!
Full Invitation
We are excited to announce the SHROOM-CAP shared task on cross-lingual hallucination detection for scientific publications (link to website). We invite participants to detect whether there is hallucination in the outputs of instruction-tuned LLMs within a cross-lingual scientific context.
About
This shared task builds upon our previous iteration, SHROOM, with three key highlights: an LLM-centered setup, cross-lingual annotations, and joint hallucination and fluency prediction.
LLMs frequently produce "hallucinations": plausible but incorrect outputs, while existing metrics prioritize fluency over correctness. This is an issue of growing concern as these models are increasingly adopted by the public.
With SHROOM-CAP, we want to advance the state of the art in detecting hallucinated scientific content. This new iteration of the shared task is held in a cross-lingual and multi-model context: we provide data produced by a variety of open-weight LLMs in 4*+3* different high- and low-resource languages (English, French, Spanish, Hindi, and Indic languages to be revealed later).
Participants may take part in any of the available languages and are expected to develop systems that accurately identify hallucinations in generated scientific content.
Additionally, participants will be invited to submit system description papers, with the option to present them in oral/poster format during the CHOMPS workshop (co-located with IJCNLP-AACL 2025, Mumbai, India). Participants who elect to write a system description paper will be asked to review their peers’ submissions (max 2 papers per author).
Key Dates:
All deadlines are “anywhere on Earth” (23:59 UTC-12).
Dev set available by: 31.07.2025
Test set available by: 05.10.2025
Evaluation phase ends: 15.10.2025
System description papers due: 25.10.2025 (TBC)
Notification of acceptance: 05.11.2025 (TBC)
Camera-ready due: 11.11.2025 (TBC)
CHOMPS workshop: 23–24 December 2025 (co-located with IJCNLP-AACL 2025)
Evaluation Metrics:
Participants will be ranked along two criteria, as illustrated by the scoring sketch below:
1. factuality mistakes, measured via macro-F1 between the gold reference and predicted labels;
2. fluency mistakes, measured via macro-F1 between the gold reference and predicted labels, based on our annotations.
Rankings and submissions will be done separately per language: you are welcome to focus only on the languages you are interested in!
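To make the scoring concrete, here is a minimal sketch of how per-language macro-F1 could be computed with scikit-learn. The sample IDs, label names, and data format are illustrative assumptions only; they are not the official evaluation script or submission format.

# Illustrative sketch only; the official evaluation script and label set
# will be specified by the organizers.
from sklearn.metrics import f1_score

def macro_f1(gold: dict[str, str], predicted: dict[str, str]) -> float:
    # Macro-F1 between gold and predicted labels, matched by sample ID.
    ids = sorted(gold)
    y_true = [gold[i] for i in ids]
    y_pred = [predicted[i] for i in ids]
    # zero_division=0 avoids warnings when a class is never predicted.
    return f1_score(y_true, y_pred, average="macro", zero_division=0)

# Hypothetical example: factuality labels for three English samples.
gold = {"en-001": "hallucinated", "en-002": "ok", "en-003": "hallucinated"}
pred = {"en-001": "hallucinated", "en-002": "hallucinated", "en-003": "hallucinated"}
print(f"factuality macro-F1: {macro_f1(gold, pred):.3f}")

The same computation would be applied independently to the factuality and fluency labels for each language.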
How to Participate:
Register: Please register your team at https://forms.gle/hWR9jwTBjZQmFKAE7 and join our Google group: https://groups.google.com/g/shroomcap
Submit results: use our platform to submit your results before 15.10.2025
Submit your system description: system description papers should be submitted by 25.10.2025 (TBC, further details will be announced at a later date).
Want to be kept in the loop?
Join our Google group mailing list! We look forward to your participation and to the exciting research that will emerge from this task.
Best regards,