Dear Colleague,
We are pleased to announce the ImageCLEF 2026 Multimodal Reasoning shared task, organized as part of CLEF 2026. This challenge focuses on advancing the reasoning capabilities of Vision-Language Models (VLMs) across multilingual and multimodal settings.
Motivation
While VLMs perform strongly on tasks such as image captioning and simple Visual Question Answering (VQA), deeper reasoning across languages and subject domains remains a significant challenge. This task encourages participants to develop models capable of robust multilingual and multimodal reasoning.
Task Description
Participants will tackle multimodal reasoning problems involving images paired with multilingual questions in two tracks:
MCQ Track: Select the correct answer from a set of given options
Open-QA Track: Generate free-form answers grounded in the image
The task is supported by high-quality multimodal datasets and promotes transparent, open research.
Evaluation Metrics
MCQ Track: Accuracy
Open-QA Track: METEOR and COMET (see the scoring sketch below)
Why Participate?
Advance research in multilingual and multimodal reasoning
Benchmark your systems on carefully curated datasets
Engage with the international CLEF research community
Present your work at CLEF 2026
Key Dates
Dev Set Release: 7 March 2026
Test Data Release: 14 April 2026
Registration Deadline: 23 April 2026
We warmly invite you to participate in this exciting challenge.
📍 More information and registration details: https://mbzuai-nlp.github.io/ImageCLEF-MultimodalReasoning/2026/
Let’s shape the future of multimodal reasoning in AI together.
Best regards,
Task Organizing Team
ImageCLEF 2026 – Multimodal Reasoning