Mohamed,
Thank you for this.
We have recently released several vision-language (multimodal) models and datasets that are relevant to your request,
particularly for image captioning and visual question answering in Arabic and its dialects.
Here are the details for our relevant work (PEARL, Peacock, and Dallah), including links to the papers, code, and models:
1. PEARL: A Multimodal Culturally-Aware Arabic Instruction Dataset
PEARL is a large-scale Arabic multimodal dataset and benchmark (over 309K examples spanning ten culturally significant domains across all Arab countries), explicitly designed for cultural understanding and visual question answering.
2. Peacock: A Family of Arabic Multimodal Large Language Models
Peacock is a family of Arabic MLLMs with strong vision and language capabilities. This project also introduces Henna, a benchmark for assessing MLLMs on aspects related to Arabic culture.
3. Dallah: A Dialect-Aware Multimodal Large Language Model
Dallah is an Arabic multimodal assistant built on LLaMA-2, fine-tuned to handle six Arabic dialects in addition to Modern Standard Arabic.
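In case it is useful, here is a minimal sketch of how one might load one of these datasets with the Hugging Face `datasets` library. The dataset identifier and split name below are illustrative assumptions, not confirmed repository IDs; please check the project pages for the exact identifiers.

```python
# Minimal sketch: loading a multimodal Arabic instruction dataset from the
# Hugging Face Hub. NOTE: "UBC-NLP/PEARL" and the split name are hypothetical
# placeholders -- replace them with the actual IDs from the project pages.
from datasets import load_dataset

dataset = load_dataset("UBC-NLP/PEARL", split="train")  # hypothetical ID/split

# Instruction-style multimodal records typically pair an image with an Arabic
# question/instruction and a reference answer; printing the keys reveals the
# actual field names once the real dataset ID is used.
print(dataset[0].keys())
```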
For a complete list of our work, you can also visit our main profiles here:
Bib entries are listed below.
I hope these resources are helpful for your work!
Best regards,
Muhammad Abdul-Mageed
Muhammad Abdul-Mageed,
Canada Research Chair in Natural Language Processing and Machine Learning,
Associate Professor
Chair, Minor in Informatics (iSchool)
Linguistics and School of Information (cross-appointed); Computer Science (courtesy)
The University of British Columbia | Vancouver Campus
BibTeX Citations
@inproceedings{alwajih-etal-2025-pearl,
title = "Pearl: A Multimodal Culturally-Aware {A}rabic Instruction Dataset",
author = "Alwajih, Fakhraddin and
Magdy, Samar M. and
El Mekki, Abdellah and
Nacar, Omer and
Nafea, Youssef and
Abdelfadil, Safaa Taher and
Yahya, Abdulfattah Mohammed and
Luqman, Hamzah and
Almarwani, Nada and
Aloufi, Samah and
Qawasmeh, Baraah and
Atou, Houdaifa and
Sibaee, Serry and
Alsayadi, Hamzah A. and
Al-Dhabyani, Walid and
Al-shaibani, Maged S. and
El aatar, Aya and
Qandos, Nour and
Alhamouri, Rahaf and
Ahmad, Samar and
AL-Ghrawi, Mohammed Anwar and
Yacoub, Aminetou and
AbuHweidi, Ruwa and
Lemin, Vatimetou Mohamed and
Abdel-Salam, Reem and
Bashiti, Ahlam and
Ammar, Adel and
Alansari, Aisha and
Ashraf, Ahmed and
Alturayeif, Nora and
Alcoba Inciarte, Alcides and
Elmadany, AbdelRahim A. and
Tourad, Mohamedou Cheikh and
Berrada, Ismail and
Jarrar, Mustafa and
Shehata, Shady and
Abdul-Mageed, Muhammad",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.1254/",
doi = "10.18653/v1/2025.findings-emnlp.1254",
pages = "23048--23079",
ISBN = "979-8-89176-335-7",
abstract = "Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available."
}
@inproceedings{alwajih-etal-2024-peacock,
title = "Peacock: A Family of {A}rabic Multimodal Large Language Models and Benchmarks",
author = "Alwajih, Fakhraddin and
Nagoudi, El Moatez Billah and
Bhatia, Gagan and
Mohamed, Abdelrahman and
Abdul-Mageed, Muhammad",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.689/",
doi = "10.18653/v1/2024.acl-long.689",
pages = "12753--12776",
abstract = "Multimodal large language models (MLLMs) have proven effective in a wide range of tasks that require complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains relatively limited to English-based settings. This poses significant challenges in developing comparable models for other languages, even those with large speaker populations, such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed *Peacock*, with strong vision and language capabilities. Through comprehensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce *Henna*, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, setting the first stone for culturally-aware Arabic MLLMs. The GitHub repository for the *Peacock* project is available at [https://github.com/UBC-NLP/peacock](https://github.com/UBC-NLP/peacock)."
}
@inproceedings{alwajih-etal-2024-dallah,
title = "Dallah: A Dialect-Aware Multimodal Large Language Model for {A}rabic",
author = "Alwajih, Fakhraddin and
Bhatia, Gagan and
Abdul-Mageed, Muhammad",
editor = "Habash, Nizar and
Bouamor, Houda and
Eskander, Ramy and
Tomeh, Nadi and
Abu Farha, Ibrahim and
Abdelali, Ahmed and
Touileb, Samia and
Hamed, Injy and
Onaizan, Yaser and
Alhafni, Bashar and
Antoun, Wissam and
Khalifa, Salam and
Haddad, Hatem and
Zitouni, Imed and
AlKhamissi, Badr and
Almatham, Rawan and
Mrini, Khalil",
booktitle = "Proceedings of the Second Arabic Natural Language Processing Conference",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.arabicnlp-1.27/",
doi = "10.18653/v1/2024.arabicnlp-1.27",
pages = "320--336",
abstract = "Recent advancements have significantly enhanced the capabilities of Multimodal Large Language Models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is predominantly limited to English due to the scarcity of high-quality multimodal resources in other languages. This limitation impedes the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce an efficient Arabic multimodal assistant, dubbed ***Dallah***, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions. ***Dallah*** demonstrates state-of-the-art performance in Arabic MLLMs. Through fine-tuning six Arabic dialects, ***Dallah*** showcases its capability to handle complex dialectal interactions incorporating both textual and visual elements. The model excels in two benchmark tests: one evaluating its performance on Modern Standard Arabic (MSA) and another specifically designed to assess dialectal responses. Beyond its robust performance in multimodal interaction tasks, ***Dallah*** has the potential to pave the way for further development of dialect-aware Arabic MLLMs."
}
From:
sig...@googlegroups.com <sig...@googlegroups.com> on behalf of Mohamed Khenchouch <khenchouc...@gmail.com>
Date: Thursday, November 20, 2025 at 9:45 AM
To: SIGARAB: Special Interest Group on Arabic Natural Language Processing <sig...@googlegroups.com>
Subject: [SIGARAB] Vision-Language dataset for Arabic language
Hi everyone, I'm looking for an openly available, open-source vision-language dataset (specifically for image captioning).
Thank you for considering my request for help.
Best regards