Best List for Arabic Stop words


Ahmed Haj Ahmed

Oct 11, 2024, 10:53:03 AM
to sig...@googlegroups.com
What's the best and most comprehensive list (or library) to remove Arabic stop words? 

Karim BOUZOUBAA

Oct 12, 2024, 7:30:46 AM
to Ahmed Haj Ahmed, sig...@googlegroups.com
Dear Ahmed

You can find a list of Arabic function words (قاموس الكلمات الوظيفية) at the following link: http://arabic.emi.ac.ma/murabaa/
The list contains 17,153 entries.

D. Namly, K. Bouzoubaa, R. Tajmout, Y. Tahir and H. Khamar, "A Complex Arabic stop-words list design", in 2ème Journée Doctorale Nationale sur l’Ingénierie de la Langue Arabe (JDILA’15), Fes, October 2015

D. Namly, K. Bouzoubaa, Y. Tahir and H. Khamar, "Development of Arabic particles lexicon using the LMF framework", Colloque pour les Etudiants Chercheurs en Traitement Automatique du Langage Naturel et ses applications (CEC-TAL 2015), Sousse, Tunisia, March 2015

best, karim

-----------------------------------------------------------------------------------------------
                   Karim Bouzoubaa, M.Sc, Ph.D  د. كريم بوزوبع
                                                Full professor أستاذ جامعي
            Department of Computer Science  قسم علوم الحاسوب
   EMI (Ecole Mohammadia d'Ingénieurs,
          Mohammadia School of Engineers)  المدرسة المحمدية للمهندسين
            Mohammed V University in Rabat  جامعة محمد الخامس
                   Avenue Ibnsina B.P. 765 Agdal  شارع ابن سينا ص ب 765 أكدال
                                            Rabat, Morocco  الرباط المغرب

    Tel: +212 (0) 537 68.71.50 / +212 (0) 537 77.65.66 الهاتف
    Fax: +212 (0) 537 77.88.53 الفاكس
    karim.bouzoubaa [at] emi.ac.ma
    karim.bouzoubaa [at] um5r.ac.ma
    karimbouzoubaa [at] yahoo.com
    http://www.emi.ac.ma/bouzoubaa
    http://www.emi.ac.ma/alelm
    https://www.youtube.com/channel/UCFpBdMiXvofNsSIAxgyaxeA

On Fri, Oct 11, 2024 at 3:53 PM Ahmed Haj Ahmed <ahaj...@haverford.edu> wrote:
What's the best and most comprehensive list (or library) to remove Arabic stop words? 

--
You received this message because you are subscribed to the Google Groups "SIGARAB: Special Interest Group on Arabic Natural Language Processing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sigarab+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sigarab/CABwoo5-diwTyLe1Un6vG%2Br%3Doh1w%3DwBr%3DpDEYqDKvXEPWkA6skg%40mail.gmail.com.

Nizar Habash

Oct 12, 2024, 7:51:35 AM
to Ahmed Haj Ahmed, sig...@googlegroups.com
Hi Ahmed -

Here is a list that assumes initial ATB tokenization and allows access to more details on the words to exclude (like POS).
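
To illustrate what filtering on tokenized, POS-tagged input could look like, here is a minimal sketch. It assumes the text has already been split ATB-style (clitics such as conjunctions separated from the base word) and tagged. The `Token` type, the tag names, and the `EXCLUDED_POS` set are hypothetical and are not taken from the list mentioned above.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str  # surface form after tokenization
    pos: str   # part-of-speech tag (hypothetical tag names)


# Illustrative set of tags to drop; a real tagset will differ.
EXCLUDED_POS = {"CONJ", "PREP", "PRON", "DET"}


def content_tokens(tokens):
    """Keep the forms whose POS tag is not in the excluded set."""
    return [t.form for t in tokens if t.pos not in EXCLUDED_POS]


# Hand-tagged example: conjunction, preposition, two nouns.
sentence = [
    Token("و+", "CONJ"),
    Token("في", "PREP"),
    Token("البيت", "NOUN"),
    Token("كتاب", "NOUN"),
]
kept = content_tokens(sentence)
```

Filtering on the tag rather than on the string means the decision can differ for the same surface form depending on how it was tagged.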


--
Nizar Habash
Professor of Computer Science
New York University Abu Dhabi
https://www.nizarhabash.com/ 

zerrouki

Oct 12, 2024, 1:55:30 PM
to sig...@googlegroups.com

Arabic stop words:

Link: https://github.com/linuxscout/arabicstopwords

The project contains two parts:

  • A data part, which contains classified stop words and all their generated forms, in multiple formats:
    • CSV
    • Python
    • SQL / SQLite
    • a separate list of the most frequent words in corpora such as Wikipedia and the Tashkeela corpus
  • A Python library for handling stop words.

The data is given in two forms:

  • classified words (lemmas) with the features needed to generate inflected forms
  • forms generated from the lemmas by adding affixes.
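
The difference between the two forms can be sketched as follows. This is a simplified illustration and not the library's own code or API: the affix lists, the two-lemma lexicon, and all function names are assumptions made for the example. Plain concatenation also ignores spelling changes that some lemmas undergo when a suffix is attached.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(word):
    """Strip diacritics and tatweel so vocalized text matches the list."""
    return _DIACRITICS.sub("", word)


def generate_forms(lemma,
                   proclitics=("", "و", "ف"),
                   enclitics=("", "ه", "ها", "هم")):
    """Naively build every proclitic + lemma + enclitic combination."""
    return {p + lemma + e for p in proclitics for e in enclitics}


def remove_stop_words(tokens, stop_forms):
    """Drop tokens whose normalized form is in the generated set."""
    return [t for t in tokens if normalize(t) not in stop_forms]


# Hypothetical lexicon with two lemmas: "من" and "في".
stop_forms = set()
for lemma in ("من", "في"):
    stop_forms |= generate_forms(lemma)

tokens = "خرج الطالب من البيت وفيه كتاب".split()
filtered = remove_stop_words(tokens, stop_forms)
```

With pre-generated forms, removal is a plain set lookup on whitespace-split text. With lemmas only, the forms have to be generated (or the input segmented) first.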

Thanks

Taha zerrouki
