Proud of the Arabic NLP Community Presence at LREC 2026


Nizar Habash

May 9, 2026, 4:14:18 AM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing

Dear colleagues,

I would like to express how proud I am of the strong participation and visibility of the Arabic NLP community at LREC 2026.

This year’s contributions covered a remarkable range of topics, including speech technologies, dialect processing, multimodal reasoning, morphology, lexicons, essay scoring, translation, cultural QA, and Arabic benchmarks for LLMs. The accepted papers and workshops reflect both the depth and diversity of our growing community. Below is a summary list of the main conference papers that I was able to identify. If I missed any, please respond to this email to let us know about your work.

It was especially exciting to see the continued success of the OSACT workshop series, as well as the organization of Nakba NLP 2026. Equally inspiring was the participation of many members of the Arabic NLP community in the main conference and in workshops on topics other than Arabic NLP.

Congratulations to everyone who participated in making this happen. The Arabic NLP community continues to make an important and visible impact on the global NLP landscape.

For those who will be in Palma, let's get together!

Best regards,

Nizar Habash
President of SIGARAB
Professor of Computer Science
New York University Abu Dhabi
https://www.nizarhabash.com/

  1. ADAB: Arabic Dataset for Automated Politeness Benchmarking - a Large-Scale Resource for Computational Sociopragmatics — Hend Al-Khalifa, Nadia Ghezaiel, Maria Bounnit, Hend Hamed Alhazmi, Noof Abdullah Alfear, Reem Fahad Alqifari, Ameera Masoud Almasoud, and Sharefah Ahmed Al-Ghamdi

  2. Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization — Passant Elchafei and Amany Fashwan

  3. Efficient Adaptation of English Language Models for Morphologically Rich and Underrepresented Languages: The Case of Arabic — Ahmed Samy Eldamaty, Mohamed Maher Zenhom Abdelrahman, Mohamed Mostafa Ibrahim Elbehery, Mariam Ashraf, and Radwa Elshawi

  4. Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study — Hawau Olamide Toyin, Samar Mohamed Magdy, and Hanan Aldarmaki

  5. Corruption-Based Data Augmentation for Arabic Essay Scoring: A Preliminary Study on the Organization Trait — May Saed Bashendy and Tamer Elsayed

  6. Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS — Rania Al-Sabbagh

  7. AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse — Esra'a Ahmad Sharqawi and Wajdi Zaghouani

  8. Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus — Wajdi Zaghouani, Mabrouka Bessghaier, Md. Rafiul Biswas, and Shimaa Amer Ibrahim

  9. Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse — Aisha Ali Al-Athba and Wajdi Zaghouani

  10. ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination — Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, and Houda Bouamor

  11. ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization — Wajdi Zaghouani, Kais Attia, Md. Rafiul Biswas, and Fadhl Eryani

  12. JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media — Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, and Houda Bouamor

  13. Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach — Salim Al Mandhari, Hieu Pham Dinh, Mo El-Haj, and Paul Rayson

  14. TDMulti: A Tunisian Dialect-Modern Standard Arabic Multitask Corpus with a Context-Aware Cross-Attention BERT Model — Roua Torjmen and Kais Haddar

  15. ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark — Sara Ghaboura, Shubham Patle, Ketan More, Wafa Hamad Mohamed Alghallabi, Omkar Thawakar, Jorma Laaksonen, Hisham Cholakkal, Salman Khan, and Rao Anwer

  16. Benchmarking Arabic Authorship Attribution and Style Transfer with Large Language Models — Injy Hamed, Bashar Alhafni, Nizar Habash, and Thamar Solorio

  17. A Comprehensive Full-Form Lexicon for Arabic NLP and Speech Technology — Yannis Haralambous and Jack Halpern

  18. DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models — Malik H. Altakrori, Nizar Habash, Teresa Lynn, Younes Samih, Abed Alhakim Freihat, Kirill Chirkunov, Muhammed AbuOdeh, Radu Florian, Preslav Nakov, and Alham Fikri Aji

  19. WhiteHouse: Translation of the Casablanca Corpus for Multi-dialectal Arabic Speech Translation — Fethi Bougares, Salima Mdhaffar, and Yannick Estève

  20. Mu'jam Arriyadh: A Comprehensive Lexicon for Contemporary Arabic Language — Afrah A. Altamimi, Abdulrahman Alosaimy, Halah Munif Alharbi, Hawra Aljasim, Muneera Alhoshan, Amal Almazrua, Hanan Alharbi, Abdulrahman Saeed Alshehri, Bayan M. Almuqhim, Maryam H. Algarny, Yahya A. Asiri, Abdullah I. Alharbi, Saleh Zaidan Albalawi, Fawziah Mohammed Asiri, Sara Ali Alhifthi, and Abdullah Alfaifi

  21. Saudi ASWAT: A Large-Scale Corpus of Spontaneous Saudi Arabic Speech — Abdullah I. Alharbi, Afrah A. Altamimi, Muneera Alhoshan, Amal Almazrua, Halah Munif Alharbi, Bayan M. Almuqhim, Hawra Aljasim, Abdulrahman Alosaimy, Yahya A. Asiri, and Abdullah Alfaifi

  22. A Bilingual Bimodal Benchmark for Arabic-English NLP across Grammatical Correction, Essay Scoring, Morphological Tagging, and Speech Recognition — Bashar Alhafni, Injy Hamed, Fadhl Eryani, David Palfreyman, and Nizar Habash

  23. A Large and Balanced Multi-Domain Arabic Corpus Annotated for Morphology, Syntax, and Readability — Khalid N. Elmadani, Adel Mahmoud Wizani, Hanada Taha Thomure, and Nizar Habash

  24. Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants — Hunzalah Hassan Bhatti and Firoj Alam

  25. Morphemes without Borders: Evaluating Root–Pattern Morphology in Arabic Tokenizers and LLMs — Yara Yousif Alakeel, Chatrine Qwaider, Hanan Aldarmaki, and Sawsan Alqahtani

  26. Masrad: Arabic Terminology Management Corpora with Semi-Automatic Construction — Mahdi Nasser, Laura Sayah, and Fadi Zaraket

Tamer Elsayed

May 9, 2026, 4:46:45 AM
to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing
Thanks Nizar for your email and list of papers. It is indeed inspiring to see such a strong presence of the community at LREC 2026. Congratulations to all those with accepted papers and workshops at the main conference.
In addition to paper #5 above, we also have 2 other accepted papers at the main conference, not on Arabic in particular, but on the evaluation of automated essay scoring models:
  • Is One Dataset Enough for Evaluation? Studying Generalizability of Automated Essay Scoring Models -- Sohaila Eltanbouly, Marwan Sayed and Tamer Elsayed
  • Quadratic Weighted Kappa Is Not Enough for Evaluating Automated Essay Scoring Models -- Salam Albatarni and Tamer Elsayed
Tamer


Mahmoud Fawzi

May 9, 2026, 5:15:18 AM
to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing
Thanks Nizar for compiling this list and thanks everyone for making this possible.

I would like to point out that in addition to our two contributions to the NakbaNLP workshop, we also have a paper about the same topic at the PoliticalNLP workshop.
We find it important to publish this research not only within the interested communities but also in broader contexts where a wider audience can see it.

The title of the paper is "From Cairo to Cape Town: How African Twitter Shapes the Global Palestine-Israel Narrative", and it will be presented in the first oral session.

Thanks again and enjoy LREC!

Regards
Mahmoud


Zaid Alyafeai

May 9, 2026, 5:24:54 AM
to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing
Thanks Nizar for compiling the list. I encourage the authors to add the datasets to Masader for visibility.

Zaid 


Omar Najar

May 9, 2026, 11:00:50 AM
to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing

Dear Nizar,

Thank you very much for this wonderful message and for highlighting the strong presence of the Arabic NLP community at LREC 2026.

It is truly encouraging to see Arabic NLP continuing to grow in visibility across LREC, ACL, and other major NLP venues. The breadth of contributions this year clearly shows how active and diverse the community has become, from core linguistic resources and dialect processing to multimodal models, retrieval, speech, translation, and Arabic LLM evaluation. We are really proud to be part of this momentum.

I would also like to share some of our recent work from the NAMAA Community team, which will appear through NakbaNLP and OSACT7 at LREC 2026:

  1. GATE-Reranker: A strong Arabic cross-encoder for high-precision document reranking in search and RAG systems.
  2. NAJD-MT: A high-fidelity Saudi Najdi–English dataset for bidirectional machine translation.
  3. ASCAT: An advanced Arabic scientific corpus for rigorous translation evaluation.
  4. Fine-Tashkeel: A comprehensive evaluation of Seq2Seq and multimodal approaches for Arabic speech diacritization.
  5. Ketaba-OCR: Efficient adaptation of vision-language models for Arabic handwritten manuscript recognition.

The NAMAA team has been working actively on several Arabic NLP and Arabic multimodal AI directions, and we would be very happy for colleagues in the community to check out the work, share feedback, and explore possible collaborations.

Congratulations again to everyone contributing to this exciting progress. Looking forward to seeing many of you in Palma.

Best regards,
Omer Nacar



Fadi Zaraket

May 9, 2026, 11:11:38 AM
to Omar Najar, Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing
Thank you Nizar (should I say Community-Baba Nizar ;) ) for this wonderful message, and a great shout-out to all for their contributions and for sharing them here.

May I suggest that we dedicate a session in Arabic NLP to feature the Arabic NLP contributions that are published outside the conference, especially those in top venues? This can help us define the Arabic NLP focus and also keep the momentum of the vibrant community centered in the conference while contributing across the broader NLP and scientific communities.

Cheers, 

Fadi 



Nizar Habash

May 9, 2026, 11:17:52 AM
to Fadi Zaraket, SIGARAB: Special Interest Group on Arabic Natural Language Processing, Abdul-Mageed, Muhammad
Hi Fadi - :-D Thanks!

Do you mean in the next Arabic NLP conference? I think this is a nice idea (@Muhammad Abdul Mageed is the General Chair of Arabic NLP 2026). We can also plan to do this as part of the Arabic NLP Birds of a Feather session (TBD at EMNLP 2026).

Best
Nizar

mustaf...@gmail.com

May 9, 2026, 2:45:22 PM
to sig...@googlegroups.com
Thanks Nizar, all 

I would like to share our datasets and papers at LREC (they will be released shortly).




See you in Palma de Mallorca tomorrow

—Mustafa


Walid Magdy

May 9, 2026, 3:48:04 PM
to mustaf...@gmail.com, sig...@googlegroups.com
What an amazing long list of excellent papers. Amazing work everyone.

I would second Fadi's suggestion of holding a session at ArabicNLP where we invite papers about Arabic NLP published during the year in different venues (journals and conferences) to be presented in a poster session during the conference. It would be a non-archival submission, where people only submit a request to present.

I hope Muhammad Abdul-Mageed and the organising team can take this into consideration for this year.

Walid


Abdul-Mageed, Muhammad

May 9, 2026, 4:02:19 PM
to Walid Magdy, mustaf...@gmail.com, sig...@googlegroups.com
AA,

Thanks everyone.

The organizing team will discuss the suggestion iA.

Best,
Muhammad Abdul-Mageed
(On behalf of the team)



Wajdi Zaghouani

May 9, 2026, 5:05:27 PM
to mustaf...@gmail.com, sig...@googlegroups.com

Thank you Nizar and everyone. Indeed, this looks like a very productive LREC 2026 for the Arabic NLP community.

From our side, MARSAD Lab will be participating in both the LREC main conference and associated workshops with contributions spanning multilingual NLP, computational social science, LLM safety, political discourse, affective computing, low-resource languages, and responsible AI (all papers are listed below). I look forward to meeting many of you attending LREC this year.

Main Conference Papers (LREC 2026)

Large-Scale Datasets and Social Media Resources

• Zaghouani, W.; Biswas, M. R.; Bessghaier, M.; Ibrahim, S. A.; & Mikros, G. (2026). ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Sharqawi, E. A.; & Zaghouani, W. (2026). AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Bessghaier, M.; Biswas, M. R.; & Ibrahim, S. A. (2026). Audience Engagement with Arabic Women’s Social Empowerment and Wellbeing: A Decadal Corpus. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Ali Al-Athba, A.; & Zaghouani, W. (2026). Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Ibrahim, S. A.; & Bouamor, H. (2026). ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Attia, K.; Biswas, M. R.; & Eryani, F. (2026). ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Ibrahim, S. A.; Bouamor, H.; & Bessghaier, M. (2026). JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

Workshop Papers

LLM Safety and Under-Resourced Languages

• Wajdi Zaghouani, Kholoud Khalil Aldous and Isra Fejzullaj. AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian. In Proceedings of the SIGUL 2026 Workshop (Special Interest Group on Under-resourced Languages) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani, Shimaa Amer Ibrahim, Aruzhan Muratbek, Olzhasbek Zhakenov and Adiya Akhmetzhanova. KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models. In Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI and DCLRL at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani, Kholoud Khalil Aldous and Yicheng Gao. Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese. In Proceedings of the RESOURCEFUL 2026 Workshop at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

Nakba-NLP 2026: Datasets, Shared Tasks, and Systems

• Wajdi Zaghouani, Mabrouka Bessghaier and Kais Attia. Nakba Discourse 2025: A Bilingual Social Media Dataset for Collective Trauma Analysis. In Proceedings of Nakba-NLP 2026: The 2nd International Workshop on Nakba Narratives as Language Resources at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Kholoud Khalil Aldous, Md Rafiul Biswas, Mabrouka Bessghaier, Shimaa Amer Ibrahim, Kais Attia and Wajdi Zaghouani. StanceNakba Shared Task: Actor and Topic-Aware Stance Detection in Public Discourse. In Proceedings of Nakba-NLP 2026: The 2nd International Workshop on Nakba Narratives as Language Resources at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Ashhadul Islam, Md Rafiul Biswas, Samir Brahim Belhaouari and Wajdi Zaghouani. Pushing Boundaries at NakbaVirality: Recursive Prompt Improvement for Multimodal Virality Classification. In Proceedings of Nakba-NLP 2026: The 2nd International Workshop on Nakba Narratives as Language Resources at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

LLMs, Ethics, Cognition, and Responsible AI (Some Have Arabic Case Studies)

• Wajdi Zaghouani. Accountable Human-AI Deliberation with LLMs: Scaling Collective Intelligence through Symbiotic Scaffolding. In Proceedings of the 2nd Workshop on Language-driven Deliberation Technology (DELITE 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Beyond the Black Box: Ethical and Theoretical Grounding in Affective Computing. In Proceedings of the Workshop on Computational Affective Science (CAS 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Grounding Information Disorder in NLP: A Theoretical and Operational Framework. In Proceedings of the Workshop on Information Disorder (INDOR 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Cultural Adaptation in Large Language Models for Political Discourse. In Proceedings of the Political Natural Language Processing Workshop (Political NLP 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Toward Cognitive Alignment in Large Language Models: Integrating Linguistic Theory and Human Data. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Toward Responsible and Epistemically Grounded Multilingual LLMs for Computational Social Science and Humanities. In Proceedings of the Workshop on Large Language Models for Social Sciences and Humanities (LLM4SSH 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

Dialect Resources, Linguistic Inequality, and Lexical Innovation (Arabic Focused)

• Wajdi Zaghouani. The Generator–Eraser Paradox: Community Guidelines for Responsible LLM-Assisted Dialect Resource Creation. In Proceedings of the Dialect Resources Workshop (DialRes 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. High Resource Bias in AI-Driven Neology: Structural Inequality in Lexical Innovation. In Proceedings of the Workshop on Neology and Large Language Models (NeoLLM 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

----

Wajdi Zaghouani, Ph.D.
Associate Professor, Communication Program
Northwestern Qatar | Education City
T +974 4454 5232 | M +974 3345 4992


