[CFP] NAKBA NLP 2026: Arabic Manuscript Understanding Shared Task

Ahmad Chamseddine

Jan 15, 2026, 4:10:39 AM
to sig...@googlegroups.com
Dear Colleagues,

We are pleased to announce the NAKBA NLP 2026: Arabic Manuscript Understanding Shared Task, a research competition advancing Arabic manuscript OCR and transcription technologies. We invite researchers, practitioners, and students to participate in this initiative that provides an open, curated benchmark for Arabic manuscript processing.

DATASET OVERVIEW

The shared task features a substantial dataset drawn from the Omar Al-Saleh Memoir Collection, spanning 16 historical documents from 1951 to 1965:

  • 6,395 high-resolution manuscript pages

  • 1,597,025 words

  • 50,685 sentences

  • 50,672 paragraphs

  • Expert-verified, line-level transcriptions


COMPETITION TRACKS

SUBTASK 1: TRANSCRIPTION TRACK

Manual transcription of unseen manuscript pages at line level to enrich the benchmark with high-quality ground truth.

  • Each team receives ~500 line images (mandatory batch)

  • Additional batches available for bonus contributions

  • CodaBench submission: https://acr.ps/1L9F2KW

Evaluation Criteria:

  • Coverage completeness

  • Transcription accuracy (CER, WER)

  • Quality of submitted transcription guidelines

Important: No generative AI tools may be used for transcription in this track.


SUBTASK 2: SYSTEMS TRACK

Development of automatic OCR/HTR systems for Arabic manuscripts.

Dataset splits:

  • Training set: ~15,962 line images with gold transcriptions

  • Development set: ~1,774 line images with gold transcriptions

  • Test set: ~2,095 line images (held out)

CodaBench submission: https://acr.ps/1L9F2LL

Evaluation Metrics:

  • Primary: Character Error Rate (CER)

  • Secondary: Word Error Rate (WER)
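
For teams who want to sanity-check scores locally before submitting, the two metrics above can be sketched as follows. This is a minimal illustration, assuming the common definitions: CER is the character-level edit distance divided by the reference length in characters, and WER is the same ratio over whitespace-separated words. The function names (edit_distance, cer, wer) are hypothetical, and the official CodaBench scorer may apply text normalization (for example, diacritics or whitespace handling) that this sketch does not.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate; assumes no normalization beyond the raw strings."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated tokens (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Both functions score a single line. How per-line scores are aggregated over the test set (for example, averaging per line versus pooling edits over all lines) is not stated in this announcement, so treat any local aggregate as an approximation of the leaderboard score.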

Baseline Resources:


KEY DATES

  • January 1, 2026 – Call for Participation

  • January 10, 2026 – Release of Training Data / Transcription Phase Begins

  • February 10, 2026 – Evaluation Period Begins (Subtask 2)

  • February 17, 2026 – Evaluation Period Ends (Both Subtasks)

  • February 21, 2026 – Results Announcement

  • March 1, 2026 – System Paper Submission Deadline

  • March 15, 2026 – Acceptance Notification

  • March 21, 2026 – Camera-Ready Deadline


SUBMISSION REQUIREMENTS

  • All participating teams must submit a 4-page system description paper

  • Teams may participate in one or both subtasks

  • Registration is required to receive dataset access

  • Submissions will undergo peer review via OpenReview

  • Selected papers will be published in NAKBA NLP 2026 proceedings

  • Contributed transcriptions will be released under CC-BY-4.0 (where licensing permits)


ORGANIZING COMMITTEE

  • Fadi Zaraket (Arab Center for Research and Policy Studies / American University of Beirut)

  • Bilal Shalash (Arab Center for Research and Policy Studies)

  • Hadi Hamoud (Arab Center for Research and Policy Studies)

  • Ahmad Chamseddine (Arab Center for Research and Policy Studies)

  • Firas Ben Abid (Zinki AI)

  • Mustafa Jarrar (Hamad Bin Khalifa University / Birzeit University)

  • Chadi Abou Chakra (Arab Center for Research and Policy Studies)

  • Andrew Naaem (Arab Center for Research and Policy Studies)

  • Bernard Ghanem (King Abdullah University of Science and Technology)

  • Monther Salahat (Birzeit University)


HOW TO PARTICIPATE

  • Register via the application form

  • Access datasets upon approval (via Google Drive)

  • Submit results through CodaBench platforms

  • Submit system description paper by March 1

  • For questions, please contact: ar...@dohainstitute.edu.qa
