Proud of the Arabic NLP Community Presence at LREC 2026

Nizar Habash

May 9, 2026, 4:14:18 AM

to SIGARAB: Special Interest Group on Arabic Natural Language Processing

Dear colleagues,

I would like to express how proud I am of the strong participation and visibility of the Arabic NLP community at LREC 2026.

This year’s contributions covered a remarkable range of topics, including speech technologies, dialect processing, multimodal reasoning, morphology, lexicons, essay scoring, translation, cultural QA, and Arabic benchmarks for LLMs. The accepted papers and workshops reflect both the depth and diversity of our growing community. Below is a summary list of the main conference papers that I was able to identify. If I missed any, please respond to this email to let us know about your work.

It was especially exciting to see the continued success of the OSACT workshop series, as well as the organization of Nakba NLP 2026. Equally inspiring was the participation of many members of the Arabic NLP community in the main conference and in workshops on topics other than Arabic NLP.

Congratulations to everyone who participated in making this happen. The Arabic NLP community continues to make an important and visible impact on the global NLP landscape.

For those who will be in Palma, let's get together!

Best regards,

Nizar Habash

President of SIGARAB

Professor of Computer Science

New York University Abu Dhabi
https://www.nizarhabash.com/

https://x.com/NYHabash

ADAB: Arabic Dataset for Automated Politeness Benchmarking - a Large-Scale Resource for Computational Sociopragmatics — Hend Al-Khalifa, Nadia Ghezaiel, Maria Bounnit, Hend Hamed Alhazmi, Noof Abdullah Alfear, Reem Fahad Alqifari, Ameera Masoud Almasoud, and Sharefah Ahmed Al-Ghamdi
Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization — Passant Elchafei and Amany Fashwan
Efficient Adaptation of English Language Models for Morphologically Rich and Underrepresented Languages: The Case of Arabic — Ahmed Samy Eldamaty, Mohamed Maher Zenhom Abdelrahman, Mohamed Mostafa Ibrahim Elbehery, Mariam Ashraf, and Radwa Elshawi
Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study — Hawau Olamide Toyin, Samar Mohamed Magdy, and Hanan Aldarmaki
Corruption-Based Data Augmentation for Arabic Essay Scoring: A Preliminary Study on the Organization Trait — May Saed Bashendy and Tamer Elsayed
Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS — Rania Al-Sabbagh
AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse — Esra'a Ahmad Sharqawi and Wajdi Zaghouani
Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus — Wajdi Zaghouani, Mabrouka Bessghaier, Md. Rafiul Biswas, and Shimaa Amer Ibrahim
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse — Aisha Ali Al-Athba and Wajdi Zaghouani
ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination — Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, and Houda Bouamor
ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization — Wajdi Zaghouani, Kais Attia, Md. Rafiul Biswas, and Fadhl Eryani
JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media — Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, and Houda Bouamor
Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach — Salim Al Mandhari, Hieu Pham Dinh, Mo El-Haj, and Paul Rayson
TDMulti: A Tunisian Dialect-Modern Standard Arabic Multitask Corpus with a Context-Aware Cross-Attention BERT Model — Roua Torjmen and Kais Haddar
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark — Sara Ghaboura, Shubham Patle, Ketan More, Wafa Hamad Mohamed Alghallabi, Omkar Thawakar, Jorma Laaksonen, Hisham Cholakkal, Salman Khan, and Rao Anwer
Benchmarking Arabic Authorship Attribution and Style Transfer with Large Language Models — Injy Hamed, Bashar Alhafni, Nizar Habash, and Thamar Solorio
A Comprehensive Full-Form Lexicon for Arabic NLP and Speech Technology — Yannis Haralambous and Jack Halpern
DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models — Malik H. Altakrori, Nizar Habash, Teresa Lynn, Younes Samih, Abed Alhakim Freihat, Kirill Chirkunov, Muhammed AbuOdeh, Radu Florian, Preslav Nakov, and Alham Fikri Aji
WhiteHouse: Translation of the Casablanca Corpus for Multi-dialectal Arabic Speech Translation — Fethi Bougares, Salima Mdhaffar, and Yannick Estève
Mu'jam Arriyadh: A Comprehensive Lexicon for Contemporary Arabic Language — Afrah A. Altamimi, Abdulrahman Alosaimy, Halah Munif Alharbi, Hawra Aljasim, Muneera Alhoshan, Amal Almazrua, Hanan Alharbi, Abdulrahman Saeed Alshehri, Bayan M. Almuqhim, Maryam H. Algarny, Yahya A. Asiri, Abdullah I. Alharbi, Saleh Zaidan Albalawi, Fawziah Mohammed Asiri, Sara Ali Alhifthi, and Abdullah Alfaifi
Saudi ASWAT: A Large-Scale Corpus of Spontaneous Saudi Arabic Speech — Abdullah I. Alharbi, Afrah A. Altamimi, Muneera Alhoshan, Amal Almazrua, Halah Munif Alharbi, Bayan M. Almuqhim, Hawra Aljasim, Abdulrahman Alosaimy, Yahya A. Asiri, and Abdullah Alfaifi
A Bilingual Bimodal Benchmark for Arabic-English NLP across Grammatical Correction, Essay Scoring, Morphological Tagging, and Speech Recognition — Bashar Alhafni, Injy Hamed, Fadhl Eryani, David Palfreyman, and Nizar Habash
A Large and Balanced Multi-Domain Arabic Corpus Annotated for Morphology, Syntax, and Readability — Khalid N. Elmadani, Adel Mahmoud Wizani, Hanada Taha Thomure, and Nizar Habash
Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants — Hunzalah Hassan Bhatti and Firoj Alam
Morphemes without Borders: Evaluating Root–Pattern Morphology in Arabic Tokenizers and LLMs — Yara Yousif Alakeel, Chatrine Qwaider, Hanan Aldarmaki, and Sawsan Alqahtani
Masrad: Arabic Terminology Management Corpora with Semi-Automatic Construction — Mahdi Nasser, Laura Sayah, and Fadi Zaraket

Tamer Elsayed

May 9, 2026, 4:46:45 AM

to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing

Thanks Nizar for your email and list of papers. It is indeed inspiring to see such a strong presence of the community at LREC 2026. Congratulations to all those with accepted papers and workshops at the main conference.

In addition to paper #5 above, we also have two other accepted papers at the main conference, not on Arabic in particular, but on the evaluation of automated essay scoring models:

Is One Dataset Enough for Evaluation? Studying Generalizability of Automated Essay Scoring Models -- Sohaila Eltanbouly, Marwan Sayed and Tamer Elsayed
Quadratic Weighted Kappa Is Not Enough for Evaluating Automated Essay Scoring Models -- Salam Albatarni and Tamer Elsayed

Tamer

--
You received this message because you are subscribed to the Google Groups "SIGARAB: Special Interest Group on Arabic Natural Language Processing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sigarab+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/sigarab/CAFfBGVnk6dMq35vv%3DAfPkXtjesjkbR-56m6%2BBGivGpG5k-3yjQ%40mail.gmail.com.

Mahmoud Fawzi

May 9, 2026, 5:15:18 AM

to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing

Thanks Nizar for compiling this list and thanks everyone for making this possible.

I would like to point out that in addition to our two contributions to the NakbaNLP workshop, we also have a paper about the same topic at the PoliticalNLP workshop.

We find it important to publish this research not only within the interested communities but also in broader contexts where a wider audience can see it.

The title of the paper is "From Cairo to Cape Town: How African Twitter Shapes the Global Palestine-Israel Narrative", and it will be presented in the first oral session.

Thanks again and enjoy LREC!

Regards

Mahmoud


Zaid Alyafeai

May 9, 2026, 5:24:54 AM

to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing

Thanks Nizar for compiling the list. I encourage the authors to add the datasets to Masader for visibility.

Zaid


Omar Najar

May 9, 2026, 11:00:50 AM

to Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing

Dear Nizar,

Thank you very much for this wonderful message and for highlighting the strong presence of the Arabic NLP community at LREC 2026.

It is truly encouraging to see Arabic NLP continuing to grow in visibility across LREC, ACL, and other major NLP venues. The breadth of contributions this year clearly shows how active and diverse the community has become, from core linguistic resources and dialect processing to multimodal models, retrieval, speech, translation, and Arabic LLM evaluation. We are really proud to be part of this momentum.

I would also like to share some of our recent work from the NAMAA Community team, which will appear through NakbaNLP and OSACT7 at LREC 2026:

GATE-Reranker: A strong Arabic cross-encoder for high-precision document reranking in search and RAG systems.
NAJD-MT: A high-fidelity Saudi Najdi–English dataset for bidirectional machine translation.
ASCAT: An advanced Arabic scientific corpus for rigorous translation evaluation.
Fine-Tashkeel: A comprehensive evaluation of Seq2Seq and multimodal approaches for Arabic speech diacritization.
Ketaba-OCR: Efficient adaptation of vision-language models for Arabic handwritten manuscript recognition.

The NAMAA team has been working actively on several Arabic NLP and Arabic multimodal AI directions, and we would be very happy for colleagues in the community to check out the work, share feedback, and explore possible collaborations.

Congratulations again to everyone contributing to this exciting progress. Looking forward to seeing many of you in Palma.

Best regards,
Omer Nacar


Fadi Zaraket

May 9, 2026, 11:11:38 AM

to Omar Najar, Nizar Habash, SIGARAB: Special Interest Group on Arabic Natural Language Processing

Thank you Nizar (should I say Community-Baba Nizar ;) ) for this wonderful message, and great shout out to all for their contributions and for sharing them here.

May I suggest that we dedicate a session at Arabic NLP to featuring Arabic NLP contributions that appear outside the conference, especially those in top venues? This can help us define the Arabic NLP focus and also keep the momentum of a vibrant community that is centered in the conference and contributes across the broader NLP and scientific communities.

Cheers,

Fadi


Nizar Habash

May 9, 2026, 11:17:52 AM

to Fadi Zaraket, SIGARAB: Special Interest Group on Arabic Natural Language Processing, Abdul-Mageed, Muhammad

Hi Fadi - :-D Thanks!

Do you mean in the next Arabic NLP conference? -- I think this is a nice idea (@Muhammad Abdul Mageed is the General Chair of Arabic NLP 2026). We can also plan to do this as part of the Arabic NLP Birds of a Feather session (TBD in EMNLP 2026).

Best

Nizar


mustaf...@gmail.com

May 9, 2026, 2:45:22 PM

to sig...@googlegroups.com

Thanks Nizar, all

I would like to share our datasets and papers at LREC (to be released shortly):

Tymaa Hammouda, Alaa Aljabari, Nagham Hamad, Mustafa Jarrar. 2026. AraREQ: A Dataset and End-to-End System for Conflict Detection and Resolution in Software Requirements. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2026), Palma de Mallorca, Spain. https://www.jarrar.info/publications/HAHJ26.pdf

Hadi Hamoud, Ahmad Ali Chamseddine, Bilal Shalash, Firas Ben Abid, Mustafa Jarrar, Chadi Abou Chakra, Bernard Ghanem, Fadi A. Zaraket. 2026. NAKBA NLP 2026: Shared Task on Arabic Handwritten Manuscript Understanding (Palestine Memory–Omar Al-Saleh Memoir). In Proceedings of the Second International Workshop on Nakba Narratives as Language Resources, Palma de Mallorca, Spain.

Alexei Abrahams, Shadi Abudalfa, Mustafa Jarrar, George Mikros. 2026. The NakbaArchiveClassifier Shared Task on Nakba Image Classification. In Proceedings of the Second International Workshop on Nakba Narratives as Language Resources, Palma de Mallorca, Spain.

See you in Palma de Mallorca tomorrow

—Mustafa


Walid Magdy

May 9, 2026, 3:48:04 PM

to mustaf...@gmail.com, sig...@googlegroups.com

What an amazing long list of excellent papers. Amazing work everyone.

I would second Fadi's suggestion of holding a session at ArabicNLP where we invite papers about Arabic NLP published during the whole year in different venues (journals and conferences) to be presented in a poster session during the conference. It would be a non-archival submission, where people only submit a request to present.

I hope Muhammad Abdul-Mageed and the organising team can take this into consideration for this year.

Walid


Abdul-Mageed, Muhammad

May 9, 2026, 4:02:19 PM

to Walid Magdy, mustaf...@gmail.com, sig...@googlegroups.com

AA,

Thanks everyone.

The organizing team will discuss the suggestion iA.

Best,

Muhammad Abdul-Mageed

(On behalf of the team)


Wajdi Zaghouani

May 9, 2026, 5:05:27 PM

to mustaf...@gmail.com, sig...@googlegroups.com

Thank you Nizar and everyone. Indeed, this looks like a very productive LREC 2026 for the Arabic NLP community.

From our side, MARSAD Lab will be participating in both the LREC main conference and associated workshops, with contributions spanning multilingual NLP, computational social science, LLM safety, political discourse, affective computing, low-resource languages, and responsible AI (all papers are listed below). I look forward to meeting many of you attending LREC this year.

Main Conference Papers (LREC 2026)

Large-Scale Datasets and Social Media Resources

• Zaghouani, W.; Biswas, M. R.; Bessghaier, M.; Ibrahim, S. A.; & Mikros, G. (2026). ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Sharqawi, E. A.; & Zaghouani, W. (2026). AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Bessghaier, M.; Biswas, M. R.; & Ibrahim, S. A. (2026). Audience Engagement with Arabic Women’s Social Empowerment and Wellbeing: A Decadal Corpus. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Ali Al-Athba, A.; & Zaghouani, W. (2026). Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Ibrahim, S. A.; & Bouamor, H. (2026). ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Attia, K.; Biswas, M. R.; & Eryani, F. (2026). ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

• Zaghouani, W.; Ibrahim, S. A.; Bouamor, H.; & Bessghaier, M. (2026). JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media. In Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026).

Workshop Papers

LLM Safety and Under-Resourced Languages

• Wajdi Zaghouani, Kholoud Khalil Aldous and Isra Fejzullaj. AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian. In Proceedings of the SIGUL 2026 Workshop (Special Interest Group on Under-resourced Languages) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani, Shimaa Amer Ibrahim, Aruzhan Muratbek, Olzhasbek Zhakenov and Adiya Akhmetzhanova. KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models. In Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI and DCLRL at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani, Kholoud Khalil Aldous and Yicheng Gao. Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese. In Proceedings of the RESOURCEFUL 2026 Workshop at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

Nakba-NLP 2026: Datasets, Shared Tasks, and Systems

• Wajdi Zaghouani, Mabrouka Bessghaier and Kais Attia. Nakba Discourse 2025: A Bilingual Social Media Dataset for Collective Trauma Analysis. In Proceedings of Nakba-NLP 2026: The 2nd International Workshop on Nakba Narratives as Language Resources at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Kholoud Khalil Aldous, Md Rafiul Biswas, Mabrouka Bessghaier, Shimaa Amer Ibrahim, Kais Attia and Wajdi Zaghouani. StanceNakba Shared Task: Actor and Topic-Aware Stance Detection in Public Discourse. In Proceedings of Nakba-NLP 2026: The 2nd International Workshop on Nakba Narratives as Language Resources at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Ashhadul Islam, Md Rafiul Biswas, Samir Brahim Belhaouari and Wajdi Zaghouani. Pushing Boundaries at NakbaVirality: Recursive Prompt Improvement for Multimodal Virality Classification. In Proceedings of Nakba-NLP 2026: The 2nd International Workshop on Nakba Narratives as Language Resources at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

LLMs, Ethics, Cognition, and Responsible AI (Some have Arabic Case studies)

• Wajdi Zaghouani. Accountable Human-AI Deliberation with LLMs: Scaling Collective Intelligence through Symbiotic Scaffolding. In Proceedings of the 2nd Workshop on Language-driven Deliberation Technology (DELITE 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Beyond the Black Box: Ethical and Theoretical Grounding in Affective Computing. In Proceedings of the Workshop on Computational Affective Science (CAS 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Grounding Information Disorder in NLP: A Theoretical and Operational Framework. In Proceedings of the Workshop on Information Disorder (INDOR 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Cultural Adaptation in Large Language Models for Political Discourse. In Proceedings of the Political Natural Language Processing Workshop (Political NLP 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Toward Cognitive Alignment in Large Language Models: Integrating Linguistic Theory and Human Data. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. Toward Responsible and Epistemically Grounded Multilingual LLMs for Computational Social Science and Humanities. In Proceedings of the Workshop on Large Language Models for Social Sciences and Humanities (LLM4SSH 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

Dialect Resources, Linguistic Inequality, and Lexical Innovation (Arabic Focused)

• Wajdi Zaghouani. The Generator–Eraser Paradox: Community Guidelines for Responsible LLM-Assisted Dialect Resource Creation. In Proceedings of the Dialect Resources Workshop (DialRes 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

• Wajdi Zaghouani. High Resource Bias in AI-Driven Neology: Structural Inequality in Lexical Innovation. In Proceedings of the Workshop on Neology and Large Language Models (NeoLLM 2026) at the Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, Spain, 11–16 May 2026.

----

Wajdi Zaghouani, Ph.D.
Associate Professor,

Communication Program

Northwestern Qatar | Education City
T +974 4454 5232 | M +974 3345 4992
