We are excited to announce the HalluScoring 2026 Shared Task on Arabic LLM Hallucination Detection, to be held as part of ArabicNLP 2026 at EMNLP 2026.
HalluScoring aims to advance research on trustworthy and reliable Arabic Large Language Models (LLMs) by fostering systems that detect hallucinations and identify truthful answers in Arabic question answering.
Track 1: Arabic Hallucination Detection
The objective of Track 1 is to detect hallucinations in Arabic question answering using only input–output pairs.
For each example, participants are given a question in Arabic, a gold (reference) answer, and a model-generated answer from one of several LLMs. Systems must predict a binary label indicating whether the generated answer is hallucinated (1) or non-hallucinated (0).
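To make the Track 1 input/output contract concrete, here is a minimal Python sketch. The field names (`question`, `gold_answer`, `generated_answer`, `model`) and the token-overlap heuristic are assumptions for illustration only; they are not the official data schema or an official baseline.

```python
from dataclasses import dataclass


@dataclass
class Track1Example:
    # Hypothetical field names; the official data files may use a different schema.
    question: str
    gold_answer: str
    generated_answer: str
    model: str  # identifier of the LLM that produced generated_answer


def token_overlap_baseline(ex: Track1Example, threshold: float = 0.5) -> int:
    """Toy heuristic, not an official baseline.

    Returns 1 (hallucinated) when fewer than `threshold` of the gold answer's
    whitespace tokens appear in the generated answer, else 0 (non-hallucinated).
    """
    gold_tokens = set(ex.gold_answer.split())
    gen_tokens = set(ex.generated_answer.split())
    if not gold_tokens:
        return 0  # nothing to compare against; default to non-hallucinated
    recall = len(gold_tokens & gen_tokens) / len(gold_tokens)
    return 1 if recall < threshold else 0


ex = Track1Example(
    question="ما هي عاصمة مصر؟",
    gold_answer="القاهرة",
    generated_answer="عاصمة مصر هي القاهرة",
    model="model-a",  # hypothetical model name
)
print(token_overlap_baseline(ex))  # 0 (non-hallucinated)
```

Whitespace tokenization ignores Arabic clitics, diacritics, and paraphrase, and the heuristic marks any answer containing the gold tokens as non-hallucinated even if it adds unsupported claims, so it is only a starting point for comparison.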
Track 1 consists of two subtasks:
Task 1.1: Generalize Across Questions
Systems must generalize to new questions. The training and test sets contain disjoint questions, but the test answers are generated by the same set of LLMs seen during training.
Task 1.2: Generalize Across Models
Systems must generalize to entirely unseen LLMs. None of the models whose answers appear in the test set are present in the training data, so detection must rely on signals that go beyond model-specific patterns.
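Participants who want local validation splits that mirror the two subtasks could build them roughly as follows. This is a sketch assuming each example is a dict with hypothetical `question_id` and `model` keys; the organizers' official splits are fixed and may differ (for example, the Task 1.2 test set may also use new questions).

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical record type: each example has at least "question_id" and "model".
Example = Dict[str, str]


def split_by_question(
    examples: Sequence[Example], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Task 1.1-style split: no question appears in both train and test."""
    qids = sorted({ex["question_id"] for ex in examples})
    random.Random(seed).shuffle(qids)  # deterministic shuffle of question ids
    n_test = max(1, int(len(qids) * test_frac))
    test_qids = set(qids[:n_test])
    train = [ex for ex in examples if ex["question_id"] not in test_qids]
    test = [ex for ex in examples if ex["question_id"] in test_qids]
    return train, test


def split_by_model(
    examples: Sequence[Example], held_out_models: Set[str]
) -> Tuple[List[Example], List[Example]]:
    """Task 1.2-style split: test answers come only from models unseen in training."""
    train = [ex for ex in examples if ex["model"] not in held_out_models]
    test = [ex for ex in examples if ex["model"] in held_out_models]
    return train, test
```

Splitting on question or model identifiers rather than on individual rows keeps the same question (Task 1.1) or the same model (Task 1.2) from leaking into both sides.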
Track 2: From Hallucination Detection to Truth
Given a question and an LLM-generated answer, systems must first determine whether the answer is hallucinated and then identify the correct answer among six highly similar candidate answers written in the same style as the generated response.
For each example, participants are provided with a question, a baseline LLM answer, and six candidate answers labeled A–F. The task consists of two steps:
• Step 1 – Hallucination Detection: Determine whether the generated answer is reliable (no_hallucination) or contains hallucinated, misleading, unsupported, or factually incorrect information (hallucination).
• Step 2 – Find the Truth: Select the single correct answer from six close and challenging candidate options.
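A Track 2 prediction pairs a Step 1 label with a Step 2 choice. The sketch below validates such pairs using the label strings and A–F options described above and computes per-step accuracy; the `Track2Prediction` class and the accuracy metric are illustrative assumptions, not the official submission format or scoring script.

```python
from dataclasses import dataclass
from typing import List, Tuple

VALID_LABELS = {"hallucination", "no_hallucination"}
VALID_CHOICES = set("ABCDEF")  # the six candidate answers


@dataclass(frozen=True)
class Track2Prediction:
    label: str   # Step 1: hallucination detection
    choice: str  # Step 2: selected candidate, one of A-F

    def __post_init__(self) -> None:
        # Reject malformed predictions early instead of at scoring time.
        if self.label not in VALID_LABELS:
            raise ValueError(f"invalid label: {self.label!r}")
        if self.choice not in VALID_CHOICES:
            raise ValueError(f"invalid choice: {self.choice!r}")


def step_accuracies(
    preds: List[Track2Prediction], golds: List[Track2Prediction]
) -> Tuple[float, float]:
    """Per-step accuracy (illustrative; the official metric may differ)."""
    if not preds or len(preds) != len(golds):
        raise ValueError("predictions and gold labels must be non-empty and aligned")
    n = len(preds)
    step1 = sum(p.label == g.label for p, g in zip(preds, golds)) / n
    step2 = sum(p.choice == g.choice for p, g in zip(preds, golds)) / n
    return step1, step2
```

Keeping the two steps as separate fields makes it easy to see whether errors come from detection, from truth selection, or from both.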
The dataset covers diverse domains, including general, cultural, and Islamic knowledge, which makes the task particularly challenging for factual reasoning and truth verification.
Important Dates
All deadlines are 11:59pm UTC-12 (Anywhere on Earth):
• May 16, 2026: Release of Task Website, Training/Development Data, and Evaluation Scripts
• July 20, 2026: Registration Deadline & Blind Test Data Release
• July 30, 2026: Final Results Released
• August 22, 2026: Camera-ready System Description Papers Due
• September 1, 2026: Shared Task Overview Papers Due
• September 10, 2026: Conference Camera-ready Deadline
• October 24–29, 2026: ArabicNLP 2026 / EMNLP 2026
Website and Registration
https://halluscoring.github.io/HalluScoring-2026/
We warmly encourage researchers, students, and practitioners working on Arabic NLP, LLMs, fact-checking, reasoning, and trustworthy AI to participate.
Contact
We look forward to your participation and contributions toward building more reliable Arabic language technologies.