NADI 2025 - Multidialectal Arabic Speech Processing


bashartalafha

Jun 11, 2025, 3:19:40 PM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing

📢 Call for Participation – NADI 2025 Shared Task on Multidialectal Arabic Speech Processing

https://nadi.dlnlp.ai/2025


Hosted as part of ArabicNLP 2025, NADI 2025 brings together researchers in speech and language technologies to tackle some of the most pressing challenges in Arabic speech processing across dialects.

With the growing importance of inclusive, dialect-aware AI systems, this shared task offers a structured platform to advance research across three key subtasks, each supported by curated datasets, baseline models, and evaluation tools.


Subtask 1: Spoken Arabic Dialect Identification (ADI)

Objective: Given a short audio clip, predict the spoken Arabic dialect.
This task builds on prior efforts in dialect identification while leveraging recent advances in multilingual speech models (e.g., Whisper, MMS) and robust embedding techniques (e.g., i-vector, x-vector).
Relevance: Dialect ID plays a crucial role in building adaptive ASR systems, conversational agents, and regional NLP pipelines.
Resources: Benchmark dataset, baseline models, and Codabench evaluation.


Subtask 2: Multidialectal Arabic ASR


Objective: Develop Automatic Speech Recognition (ASR) systems capable of transcribing Arabic speech across diverse dialects. Participants will use the Casablanca dataset and are encouraged to explore zero-shot, few-shot, or fine-tuning strategies to build robust models that handle phonetic and dialectal variation.
Relevance: This task supports advancements in generalizable ASR across under-resourced and linguistically diverse varieties of Arabic.
Resources: Labeled training/dev data, blind test set (Codabench), and baseline systems.
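For local sanity checks on the dev data before submitting, a simple scorer can help. The announcement does not say which metric the Codabench scorer uses, so this is only a minimal sketch that assumes word error rate (WER), a common ASR metric. The function name `word_error_rate` is hypothetical, and the official scoring may apply text normalization that is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Calling `word_error_rate(reference_transcript, system_output)` on each dev utterance gives a per-utterance score; note that a corpus-level WER is usually computed by summing edits and reference lengths over all utterances rather than averaging per-utterance values.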


Subtask 3: Diacritic Restoration (DR)


Objective: Restore missing diacritics in Arabic text (and optionally speech) across MSA, Classical Arabic, and dialects.
This task focuses on developing models that generalize beyond MSA to more challenging spoken and code-switched data. Multimodal approaches (speech + text) are encouraged for better supervision.
Relevance: Diacritic restoration improves downstream tasks such as TTS, parsing, and disambiguation in Arabic NLP.
Resources: Annotated test sets, speech/text corpora, and baselines.
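To make the task setup concrete, here is a minimal sketch of how undiacritized input can be derived from diacritized text, and how a prediction can be compared with a reference letter by letter. The helper names and the error-rate definition are assumptions for illustration; the official evaluation may define its metric and its set of diacritics differently.

```python
# Assumption: the diacritics of interest are the Arabic marks
# U+064B..U+0652 (tanween, short vowels, shadda, sukun) plus the
# superscript alef U+0670.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove diacritics, leaving the bare text a model receives as input."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_letters(text: str) -> list[tuple[str, str]]:
    """Pair every base character with the diacritics that follow it."""
    pairs: list[tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference: str, prediction: str) -> float:
    """Fraction of non-space characters whose diacritics differ."""
    ref = split_letters(reference)
    pred = split_letters(prediction)
    if [base for base, _ in ref] != [base for base, _ in pred]:
        raise ValueError("reference and prediction differ in base characters")
    # Compare sorted marks so that shadda+vowel order does not matter.
    marks = [
        (sorted(r), sorted(p))
        for (base, r), (_, p) in zip(ref, pred)
        if not base.isspace()
    ]
    if not marks:
        raise ValueError("no characters to score")
    return sum(r != p for r, p in marks) / len(marks)
```

This sketch scores every non-space character, including punctuation and digits, and requires the prediction to keep the base characters unchanged; both are simplifying choices.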


🛠️ What We Provide:

  • Carefully curated datasets across subtasks

  • Starter code and tutorials

  • Codabench evaluation platforms for submission and scoring

  • Clear task guidelines to support both academic and practical experimentation

Whether your focus is on speech recognition, language identification, or Arabic NLP more broadly, NADI 2025 offers an excellent opportunity to test novel ideas and benchmark your systems on real-world data.


📝 How To Participate?

  1. Fill out this form to register and participate: https://forms.gle/WHsyFMtyaewufN7E8

  2. Participate in Codabench (links can be found on the NADI website and will be provided after form submission)

  3. Access each dataset on Hugging Face (links can be found on the NADI website)


🔗 You can find all relevant information at: https://nadi.dlnlp.ai/2025

📨 Contact us: NadiSha...@gmail.com

🧠 Google Group for announcements, Q&A, and discussion: https://groups.google.com/u/4/g/nadi-shared-task-2025


Join us in advancing robust and inclusive Arabic speech technologies.
#NADI2025 #ArabicNLP #ASR #DialectIdentification #DiacriticRestoration #SpeechTechnology #MultidialectalArabic #ArabicSpeech #NLPResearch #SharedTask


Nate Robinson

Jun 11, 2025, 8:38:46 PM
to bashartalafha, SIGARAB: Special Interest Group on Arabic Natural Language Processing
This is exciting! Can I ask why the subtasks are focused on speech? Is textual dialect identification considered a solved problem at this point? Or is there just less interest in it? Was last year's campaign not as productive as hoped?

Of course I think speech is a great medium that captures more of the diversity of Arabic language varieties, just wondering why the change, and if text-based ADI has been relegated to any other shared task / workshop.

Nate


Abdul-Mageed, Muhammad

Jun 11, 2025, 9:09:30 PM
to Nate Robinson, bashartalafha, sig...@googlegroups.com
Thanks, Nate.

We hope you will participate.

I see text-based dialect ID as an area that will remain important for a long time. This year we simply wanted a change; there are no other reasons. I don’t know of another shared task doing text-based dialect ID.

Best,
Muhammad


Hanan Aldarmaki

Jun 12, 2025, 1:34:26 AM
to Abdul-Mageed, Muhammad, Nate Robinson, bashartalafha, sig...@googlegroups.com
You can still do text-based ADI by applying ASR on the speech first ;) 



Abdul-Mageed, Muhammad

Jun 12, 2025, 1:42:43 AM
to Hanan Aldarmaki, Nate Robinson, bashartalafha, sig...@googlegroups.com
Right, if you have a reasonable dialectal ASR system (though even a not-so-good one may still work for this purpose).

Or employ whatever other approaches you like, combining different modalities.

It is up to the imagination of participants; it would be interesting to see different methods.

Best,
Muhammad 

عبد السلام الفيتوري أحمد النويصري

Jun 13, 2025, 3:29:46 AM
to Nate Robinson, bashartalafha, SIGARAB: Special Interest Group on Arabic Natural Language Processing
I guess this is a good move, since dialects can be identified more easily through accents (لكنة) than through text.



Dr Abdusalam Nwesri,

Associate Professor,

Faculty of Information Technology,

University of Tripoli,

P.O.Box: 5760 Hai Alandalus,

Tripoli - Libya.

Tel: +218922307021

Email: a.nw...@uot.edu.ly


