The 1st Workshop on NLP for Languages Using Arabic Script
(AbjadNLP 2025)
Abu Dhabi, UAE
19-20 January 2025
Submission URL: https://softconf.com/coling2025/AbjadNLP25/
Co-located with COLING 2025 Conference, Abu Dhabi, UAE (19-20 January 2025)
AbjadNLP is dedicated to advancing innovation and gaining deeper insights into Natural Language Processing (NLP) for languages that use the Arabic script. Our primary focus is on Abjad and Ajami languages that use the Arabic script or any of its variations. Abjad scripts, traditionally associated with Semitic languages, represent the consonants of each syllable. In contrast, Ajami denotes the alphabetic use of the Arabic script to write non-Arabic languages in various African contexts. We welcome research on any language that falls under either category.
We invite contributions, discussions, and explorations that delve deep into the unique linguistic structures, resources, challenges, and untapped potential presented by Abjad and Ajami languages within the realm of NLP and language resources. Our goal is to create synergies among researchers by addressing the diverse phenomena and challenges inherent in these rich linguistic traditions.
The workshop is proud to highlight our connections with the Masakhane NLP community and collaborations with institutions worldwide, such as COMSATS on Urdu, and the long-standing UCREL NLP Group at Lancaster University, whose work encompasses over 20 languages worldwide, including Abjad and Ajami languages.
Note: We chose the name Abjad for simplicity, but our focus includes Abjad and other languages that have adopted the Arabic and Perso-Arabic scripts, as well as Ajami languages. We acknowledge that Sorani Kurdish, when written in Arabic script, follows an alphabet style rather than an Abjad style.
Workshop Description:
We welcome contributions, discussions, and explorations that thoroughly investigate the distinctive linguistic structures, resources, challenges, and untapped potential of Abjad and Ajami languages within the field of NLP and language resources. Our aim is to foster collaboration among researchers by addressing the varied phenomena and challenges inherent in these rich linguistic traditions.
Ajami languages, that is, African languages that have adopted the Arabic
script, number at least 43 and include Hausa, Fulfulde, Mandingo, Swahili,
Wolof, Kanuri, and Tamazight. The combined number of speakers of these
languages is estimated to exceed 200 million within Africa alone. Although
Abjad has traditionally been associated with Semitic languages such as Arabic,
Hebrew and Syriac, it has been adopted by many other language communities, as
in the Perso-Arabic scripts used for Persian, Urdu, Pashto, Sorani Kurdish,
Azeri Turkish, Sindhi, and Uyghur, whose combined speaker population is
estimated to exceed 500 million. Altogether, these languages have
approximately 1 billion speakers worldwide.
The adoption of the Arabic script across diverse linguistic landscapes
highlights its expansive and varied application, transcending genres such as
governmental correspondences, poetic compositions, religious texts, and
journalistic pursuits. This widespread use underscores the imperative need to
enhance digital infrastructure, tools, and resources for these under-resourced
languages. Advancing such resources is crucial to nurturing linguistic
diversity and resilience in both digital and print media, ensuring the
preservation of linguistic heritage in the digital age.
Currently, there is an increasing interest in various NLP communities, both in
academia and industry, in writing systems. However, there is a lack of
initiatives focusing on the diverse phenomena and challenges of the languages
using an Abjad script. The AbjadNLP workshop aims to fill this gap, fostering
collaboration and innovation in this vital area of study.
Motivation
Languages written in an Abjad script form a significant and diverse part of
the world's linguistic landscape, spanning numerous countries and regions and
a large population of speakers. The linguistic richness and geographical
spread of the languages covered by AbjadNLP make them fertile ground for
exploration and advancement in NLP. Focusing on these languages gives the
field access to a wide and varied range of linguistic constructions,
subtleties, and cultural contexts, which is essential for making NLP models
and applications more versatile and adaptable. The breadth of these languages
offers an opportunity to strengthen multilingualism and multiculturalism in
NLP research, and to address the needs and challenges of a large and diverse
speaker population.
Abjad scripts are used across diverse genres, including governmental
correspondence, poetry, religious texts, and journalism. Their sustained use
underscores the need to enhance digital infrastructure, tools, and resources
that support the varied writing systems of under-resourced languages. Such
advancement is crucial to nurturing linguistic diversity and resilience in
both digital and print media, ensuring that this linguistic heritage does not
diminish in the digital age.
This workshop can contribute to more inclusive and equitable progress in
NLP, accommodating a broader range of languages and dialects and promoting
better understanding and connection among linguistic communities. Including
and prioritising these linguistically rich and diverse languages is
indispensable for the comprehensive progress and universal adaptability of
NLP technologies. While our workshop primarily targets languages using an
Abjad script, we recognise that many historical languages such as Aramaic,
Sogdian, Parthian and Phoenician employed such a writing system. As such, we
believe that our workshop can strengthen links with researchers working on
endangered languages as well.
We are proud to highlight our existing connections with the Masakhane NLP community (www.masakhane.io) and collaborations with institutions worldwide, such as COMSATS on Urdu (www.comsats.edu.pk), and the long-standing UCREL NLP Group at Lancaster University, whose work encompasses over 20 languages worldwide, including Abjad and Ajami languages (http://ucrel-web-dev.lancs.ac.uk/ucrelng/).
Team
Our team is uniquely diverse and gender-balanced, comprising individuals from a wide range of ethnic backgrounds. We represent a spectrum of languages that use the Arabic script and include researchers from both Linguistics and NLP, enriching the ever-needed collaboration between these two fields. With expertise in language technology, Unicode, NLP, resources, and multilingual text analysis, together, we aim to foster a dynamic and inclusive environment for research and collaboration in the field of NLP.
Call for papers
We invite submissions on topics that include, but are not
limited to, the following:
• Enabling core technologies: morphological analysis, disambiguation,
tokenisation, POS tagging, named entity detection, chunking, parsing, semantic
role labelling, sentiment analysis, language modelling, etc.
• Applications: machine translation, speech recognition, speech synthesis,
optical character recognition, pedagogy, assistive technologies, social media,
etc.
• Resources: dictionaries, annotated data, corpora, etc.
In addition, we extend a warm invitation to researchers and stakeholders across the spectrum to contribute papers focusing on, but not limited to, the following dimensions:
Summary of the Call:
We welcome submissions of papers centred around the Abjad and Ajami theme, focusing on supporting NLP language resources for non-Arabic languages utilising Arabic script. We encourage submissions that span a spectrum from theoretical investigations to practical applications, aiming to underscore the distinctive challenges, solutions, and insights that languages using Ajami and Abjad scripts introduce to the field of NLP.
For the submission format and guidelines, we follow the COLING 2025 standards. Authors are encouraged to thoroughly review and adhere to the COLING 2025 submission guidelines and author kit, which can be found at: https://coling2025.org/calls/submission_guidlines/.
If authors are describing an orthography, we request that they include the points recommended in Hosken (2003), https://scripts.sil.org/WP-Encoding. For continuity across the workshop and greater impact across industry applications, authors should also consider the terminological distinctions (orthography, script, writing system, etc.) presented by Constable (2002), https://www.sil.org/resources/publications/entry/7853. The model presented by Constable is the current Unicode model.
Please ensure that all submissions strictly conform to these standards to streamline the review process and maintain uniformity across all contributions. Both long papers (up to 8 pages) and short papers (up to 4 pages) are welcome. All submissions will undergo a rigorous peer-review process, emphasizing originality, relevance, and clarity.
Submissions may be of two types:
Submission URL: https://softconf.com/coling2025/AbjadNLP25/
Submission Guidelines: https://coling2025.org/calls/submission_guidlines/
Provisional Key Dates:
Anti-Harassment Policy:
The workshop supports the COLING anti-harassment policy: https://coling2022.org/policy
Organising Committee:
General Chair:
Programme Chairs:
Review Committee:
Publication Chair:
Publicity Chairs:
Advisory Committee:
Programme Committee*
*We are in the process of forming a linguistically diverse programme committee of experts in languages that use the Arabic script (Abjad and Ajami), with the majority of the list already confirmed to serve as reviewers. As soon as we gain access to SoftConf, we will extend invitations to the remaining committee members. If your name appears in this list and you want it removed, please contact any of the organisers as soon as possible and we will make sure it is removed.