[Resource] Arabic WordNet 4.0 Released - 109,823 synsets (CC BY 4.0)


Salah Abdo

Jan 22, 2026, 4:39:51 AM
to sig...@googlegroups.com
Dear SIGARAB community,

I am happy to announce the release of Arabic WordNet 4.0, a
comprehensive lexical resource for Arabic NLP research.

Key Features:
- 109,823 synsets (100% OEWN coverage)
- 124,653 lexical entries
- 166,643 senses
- 265,676 synset relations
- 97.2% ILI coverage
- WN-LMF 1.4 format
- CC BY 4.0 license
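
Counts like these can be spot-checked directly against a WN-LMF file. The following is a minimal sketch using only Python's standard library on a tiny inline sample, not the actual release. The element and attribute names follow WN-LMF conventions; the sample IDs, the helper name `count_lmf`, and the definition of ILI coverage as "share of synsets with a non-empty `ili` attribute" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical two-synset lexicon in WN-LMF style (IDs are made up).
SAMPLE = """
<LexicalResource>
  <Lexicon id="demo-ar" label="Demo" language="ar" email="x@example.org"
           license="https://creativecommons.org/licenses/by/4.0/" version="0.1">
    <LexicalEntry id="demo-ar-kalb-n">
      <Lemma writtenForm="كلب" partOfSpeech="n"/>
      <Sense id="demo-ar-kalb-n-1" synset="demo-ar-0001-n"/>
    </LexicalEntry>
    <LexicalEntry id="demo-ar-hayawan-n">
      <Lemma writtenForm="حيوان" partOfSpeech="n"/>
      <Sense id="demo-ar-hayawan-n-1" synset="demo-ar-0002-n"/>
    </LexicalEntry>
    <Synset id="demo-ar-0001-n" ili="i00001" partOfSpeech="n">
      <SynsetRelation relType="hypernym" target="demo-ar-0002-n"/>
    </Synset>
    <Synset id="demo-ar-0002-n" ili="" partOfSpeech="n"/>
  </Lexicon>
</LexicalResource>
"""

COUNTED_TAGS = {"LexicalEntry", "Sense", "Synset", "SynsetRelation"}


def count_lmf(xml_text):
    """Count core WN-LMF elements and the share of synsets that carry an ILI."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.tag for el in root.iter() if el.tag in COUNTED_TAGS)
    synsets = root.findall(".//Synset")
    # Assumption: a synset counts as ILI-covered when its `ili` attribute is non-empty.
    with_ili = sum(1 for s in synsets if s.get("ili"))
    stats = dict(counts)
    stats["ili_coverage"] = with_ili / len(synsets) if synsets else 0.0
    return stats


stats = count_lmf(SAMPLE)
print(stats)
```

For a file the size of a full wordnet, `ET.iterparse` would likely be a better fit than loading the whole tree at once.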

Methodology:
Created using the expand approach, with AI-assisted translation
(Google Gemini 3 Pro Preview).

Links:
- GitHub: https://github.com/Salah-Sal/arabic-wordnet-v4
- DOI: https://doi.org/10.5281/zenodo.18335226


This is roughly 11 times the synset coverage of the previous Arabic
WordNet in OMW (9,916 synsets).

Derived from Open English WordNet (CC BY 4.0), based on Princeton
WordNet 3.0.

Feedback and contributions welcome via GitHub issues.

Best regards,
Salah Abdo
Salah.A...@gmail.com

Nizar Habash

Jan 22, 2026, 5:00:42 AM
to Salah Abdo, sig...@googlegroups.com
Thanks Salah for sharing this. Very useful.

A couple of questions:
(1) Is there a written report on the creation process?
(2) How does this work relate to the original Arabic WordNet or the more recent efforts by Freihat et al. 2024?
(3) Is there a quality evaluation of the generated resources? LLMs are good... but they hallucinate, as you know. It would be helpful to quantify the error rate. For example, what is the degree of overlap with the original Arabic WordNet? Or could you run a manual check of 1,000 synsets?

Best
Nizar



--
You received this message because you are subscribed to the Google Groups "SIGARAB: Special Interest Group on Arabic Natural Language Processing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sigarab+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/sigarab/CAJOy2DY6abtzQm9PfVF8rLznb0f%3Dsp6O6EhGJr2QJAp2ztY_zQ%40mail.gmail.com.


--
Nizar Habash
Professor of Computer Science
New York University Abu Dhabi
https://www.nizarhabash.com/ 

Salah Abdo

Jan 22, 2026, 10:11:03 PM
to Nizar Habash, sig...@googlegroups.com
Dear Dr. Nizar,

Thank you for the thoughtful questions.

(1) Creation Process:
I am currently preparing a preprint that documents the full methodology. The approach uses hierarchical semantic batching to preserve WordNet taxonomy during LLM translation - grouping semantically related concepts together so the model receives proper context (hypernym chains, sibling concepts). I will share the paper once it's ready.
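
The general idea of grouping related concepts with their taxonomic context can be illustrated with a small sketch. This is not the method used for AWN 4.0, which is not yet documented; it is only one plausible reading of "hierarchical semantic batching". The toy hypernym map and the function names `hypernym_chain` and `build_batches` are hypothetical.

```python
from collections import defaultdict

# Toy hypernym map (child -> parent); a real pipeline would read OEWN synsets.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def hypernym_chain(concept, hypernym):
    """Return the ancestors of `concept`, nearest first."""
    chain = []
    while concept in hypernym:
        concept = hypernym[concept]
        chain.append(concept)
    return chain


def build_batches(hypernym):
    """Group sibling concepts and attach their shared hypernym chain as context."""
    siblings = defaultdict(list)
    for child, parent in hypernym.items():
        siblings[parent].append(child)
    batches = []
    for parent in sorted(siblings):
        batches.append({
            # Shared context: the parent followed by its own ancestors.
            "context": [parent] + hypernym_chain(parent, hypernym),
            # Concepts to translate together.
            "concepts": sorted(siblings[parent]),
        })
    return batches


batches = build_batches(HYPERNYM)
for b in batches:
    print(" > ".join(reversed(b["context"])), "|", ", ".join(b["concepts"]))
```

Each batch could then be turned into one translation prompt, so that siblings such as "dog" and "wolf" are translated with "canine > carnivore > animal" in view.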

(2) Relation to Previous Work:
AWN 4.0 is independent of the original Arabic WordNet lineage. It uses the expand approach - translating from Open English WordNet 2024 - rather than the merge approach used in previous AWN versions. The preprint will include a detailed comparison with Freihat et al. 2024 and earlier AWN work.

(3) Quality Evaluation:
You raise an important point. Structural validation passes (0 errors against WN-LMF 1.4), but intrinsic accuracy evaluation is planned for the preprint, including:

- Manual evaluation of a stratified sample
- Overlap analysis with AWN V3 for the ~9,576 shared synsets
- Error categorization

I would welcome any suggestions on evaluation benchmarks you'd recommend for Arabic lexical resources.
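
As an illustration of how a stratified manual check and an error estimate might be set up, here is a minimal sketch. It assumes stratification by part of speech with proportional allocation and a Wilson score interval for the error rate; the synthetic synset list, the sample size, and the error count of 50 out of 1,000 are placeholders, not results.

```python
import math
import random
from collections import defaultdict


def stratified_sample(items, key, n, seed=0):
    """Draw about n items, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    total = len(items)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        k = max(1, round(n * len(group) / total))  # at least one per stratum
        sample.extend(rng.sample(group, min(k, len(group))))
    return sample


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 gives about 95%)."""
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Synthetic stand-in for a synset list: (id, part of speech).
synsets = (
    [(f"n{i}", "n") for i in range(700)]
    + [(f"v{i}", "v") for i in range(200)]
    + [(f"a{i}", "a") for i in range(100)]
)
sample = stratified_sample(synsets, key=lambda s: s[1], n=100)
per_pos = {pos: sum(1 for _, p in sample if p == pos) for pos in "nva"}
print(per_pos)

# Placeholder outcome: suppose annotators flag 50 of 1,000 checked synsets.
low, high = wilson_interval(errors=50, n=1000)
print(f"estimated error rate: 5.0% (95% CI {low:.1%} to {high:.1%})")
```

Other strata, such as taxonomy depth or sense frequency, could be swapped in through the `key` argument.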

Best regards,
Salah


Karim BOUZOUBAA

Jan 23, 2026, 6:38:16 AM
to Salah Abdo, Nizar Habash, sig...@googlegroups.com
Dear Salah,

Just to let you know that we released the second version of AWN, which was approved as the official one and is available from Global WordNet. I suggest you take a look at the papers below. The LREV paper also describes how to evaluate such resources.


Yasser Regragui, Lahsen Abouenour, Fettoum Krieche, Karim Bouzoubaa, Paolo Rosso:
Arabic WordNet: New Content and New Applications. GWC 2016: 333-341


Lahsen Abouenour, Karim Bouzoubaa, Paolo Rosso:
On the evaluation and improvement of Arabic WordNet coverage and usability. Lang. Resour. Evaluation 47(3): 891-917 (2013)

best, karim

--
Karim Bouzoubaa, M.Sc, Ph.D (د. كريم بوزوبع)
Full professor, Department of Computer Science
EMI (Ecole Mohammadia d'Ingénieurs, Mohammadia School of Engineers)
Mohammed V University in Rabat
Avenue Ibnsina B.P. 765 Agdal, Rabat, Morocco

Tel: +212 (0) 537 68.71.50 / +212 (0) 537 77.65.66
Fax: +212 (0) 537 77.88.53
karim.bouzoubaa [at] emi.ac.ma
karim.bouzoubaa [at] um5r.ac.ma
karimbouzoubaa [at] yahoo.com
http://www.emi.ac.ma/bouzoubaa
http://www.emi.ac.ma/alelm
https://www.youtube.com/channel/UCFpBdMiXvofNsSIAxgyaxeA


mustaf...@gmail.com

Jan 23, 2026, 6:35:08 PM
to Karim BOUZOUBAA, Salah Abdo, Nizar Habash, sig...@googlegroups.com

Dear Salah,


Congrats on this work!

It is not surprising that LLMs can now generate WordNets, which arguably signals the end of the need for such resources 😉


This is why we chose a different path with the Arabic Ontology (an Arabic WordNet with ontologically clean content), developed manually and with extreme care. Our goal was to ensure high-quality, reliable semantic resources that are useful beyond IT-focused applications, particularly for philosophical, cultural, and knowledge-driven use cases.


See my keynote at the WordNet conference in 2021: https://www.youtube.com/watch?v=Pgf4MzTHJc4

If you decide to compare with the Arabic Ontology, you can download it here.

Best Regards
Mustafa



Abdelhakim Freihat

Jan 23, 2026, 7:29:28 PM
to mustaf...@gmail.com, Karim BOUZOUBAA, Salah Abdo, Nizar Habash, sig...@googlegroups.com
Dear Mustafa,

I completely agree with you.

Machines are still a long way from building reliable sources of knowledge. In addition to the problem of hallucinations mentioned by Nizar, there are still many issues that machines cannot handle reliably.

To name just one: how can we ensure that a machine-generated lexicon is not biased and truly reflects the cultural values of the language it is supposed to represent?

Of course, such machine-generated lexica can be used in certain computational approaches, particularly in cases where trusted human-created resources are unavailable. However, without an expert in the loop, machine-generated lexical resources will never be trusted as sources of knowledge or culture for humans, which is arguably the most valuable aspect of lexica.

Hakim
