Adding African Stopwords

Chris Emezue

Mar 22, 2023, 4:06:52 PM
to nltk-dev
Hello NLTK dev team,

NLTK is a renowned NLP package that supports a host of foundational NLP processes such as stemming and parsing. It is used by many practitioners in both academia and industry.

However, NLTK has little support for African languages. As of my query of the supported languages today, NLTK does not support stopwords for any African language.

We are trying to address this gap with the African Stopwords project, in which we curated and verified the largest collection of African stopwords to date. We currently have stopwords for 13 African languages, and we are reaching out to ask whether these could be included in NLTK, thereby enabling support for many NLP tasks in these languages.

This is just the beginning: the African Stopwords project is an ongoing effort to curate trusted stopwords for African languages. In the Masakhane and Lanfrica communities, we have a team of dedicated African language experts who take pains to curate and verify the stopwords, and to add stopwords for new languages. We are also working on gathering stopwords automatically and then having human evaluators review them (see our paper for more about that). We support an open discussion forum to encourage conversation around African stopwords. In short, we will keep adding more stopwords.

I am proposing a collaboration between NLTK.org, Masakhane and Lanfrica to enable the inclusion of African languages in NLTK, starting with stopwords. While we plan to build our own packages for dedicated support of African languages (like the Preprocessor), I strongly believe that integrating some of our efforts into the widely used NLTK ecosystem will enable wider adoption, thus fostering the inclusion of African languages in language technologies.

Please let me know if this is something the NLTK team would be interested in, and how we could go about it. Also, feel free to schedule a meeting with me to talk more about this.



Chris Emezue