TL;DR
SHROOM-visions is a shared task to advance model-agnostic evaluation of hallucination detection for Vision-and-Language Models (VLMs). Participate in detecting fine-grained hallucination spans across 4 languages (Chinese, English, French, Italian). Stay informed by joining our Google group or our Slack!
Full Invitation
We are excited to announce the SHROOM-visions shared task on vision-language hallucination detection (link to website). We invite participants to detect and classify hallucination spans in a multilingual, multimodal context, using a dataset designed for enduring evaluation.
About
As new foundation models emerge monthly, how do we create hallucination evaluations that remain relevant? Current benchmarks are often tied to the idiosyncrasies of specific LLMs/VLMs, risking quick obsolescence. This shared task builds upon the *SHROOM series of hallucination detection tasks and datasets, venturing into vision-language multilingual hallucination-span prediction. With this shared task, we aim to advance detection methods that generalize across model generations and focus on the core phenomenon of hallucination.
We provide a dataset of 20,000 samples annotated with a fine-grained, span-level labeling scheme:
A train set of ~15,200 samples from 5 different LVLMs.
A closed test set of 4,800 crafted samples.
A submission platform to evaluate the performance of your systems.
Balanced coverage across 4 languages: Chinese, English, French, Italian.
Each sample annotated by 3 annotators using a four-class taxonomy: Invention, Mischaracterization, OCR Problem, Miscounting.
Participants are invited to develop systems that accurately identify and classify hallucinated text spans in image-conditioned outputs. They will also be invited to submit system description papers, with the option to present them at the UncertaiNLP workshop (co-located with EMNLP 2026). All authors of paper submissions will be asked to review peers' submissions (at most 2 papers per author).
Key Dates:
All deadlines are “anywhere on Earth” (23:59 UTC-12).
Train set available by: 10.05.2026
Submission platform open by: 20.05.2026
Evaluation phase ends: 31.07.2026
System description papers due: 10.08.2026 (TBC)
Notification of acceptance: 10.09.2026 (TBC)
Camera-ready due: 20.09.2026 (TBC)
UncertaiNLP workshop: end of October 2026 (co-located with EMNLP)
Evaluation Metrics:
Participants’ systems must produce spans corresponding to hallucinations in the text, classified into five possible categories (invention, mischaracterization, OCR problems, miscounting, and other hallucinations). The evaluation will rely on two metrics, assessing labelled and unlabelled performance separately. Rankings and submissions are handled separately per language: you are welcome to focus on the languages of your choice!
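The exact metric definitions will be published with the task data; as a rough illustration of the labelled/unlabelled distinction, a character-level span F1 could be computed as below. This is a hypothetical sketch, not the official scorer: the function name, the (start, end, label) span representation, and the character-overlap matching are all our assumptions.

```python
def char_f1(gold_spans, pred_spans, labelled=False):
    """Character-level F1 between gold and predicted hallucination spans.

    NOTE: illustrative only -- this is NOT the official SHROOM-visions scorer.
    Spans are (start, end, label) tuples with `end` exclusive.
    With labelled=True, a character counts as matched only if labels agree.
    """
    def chars(spans):
        # Map each covered character index to its label
        # (or to None when labels are ignored).
        out = {}
        for start, end, label in spans:
            for i in range(start, end):
                out[i] = label if labelled else None
        return out

    g, p = chars(gold_spans), chars(pred_spans)
    matched = sum(1 for i, lab in p.items() if i in g and g[i] == lab)
    precision = matched / len(p) if p else 0.0
    recall = matched / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction overlapping the gold span but with the wrong category would score well unlabelled and zero labelled, which is why the two metrics are reported separately.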
How to Participate:
Register: Please register your team before making a submission on https://shroom.pythonanywhere.com
Submit results: use our platform to submit your results before 31.07.2026
Submit your system description: system description papers should be submitted by 10.08.2026 (TBC, further details will be announced at a later date).
Want to be kept in the loop?
Join our Google group mailing list or the shared task Slack! We are also open to hosting Q&A sessions for groups interested in participating; just send us an email. We look forward to your participation and to the exciting research that will emerge from this task.
Best regards,
Raúl Vázquez and Timothee Mickus
On behalf of all the SHROOM-visions organizers