OSACT 2026 Workshop, First Call for Papers
11 May 2026, Palma de Mallorca, Spain
https://osact-lrec.github.io
Hosted by LREC 2026
https://lrec2026.info/
Workshop Description
The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series provides a forum for researchers, practitioners, and students in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) to share and discuss ongoing work on Arabic language resources and technologies. While Arabic remains comparatively resource-poor in relation to English, recent years have seen the emergence of large, freely available classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and processing tools.
Now in its seventh edition, OSACT7 takes an important step forward by celebrating this milestone with seven shared tasks, each addressing timely challenges in Arabic NLP and reflecting broader themes relevant to NLP research in general. OSACT7 builds on its long-standing commitment to open-source contributions that advance accessibility, reproducibility, and fairness, and this year it places inclusivity at the heart of its mission. A key focus is to recognize and support minority dialects and underrepresented varieties of Arabic, ensuring that diverse linguistic voices and resources are not only acknowledged but actively valued within the community.
The workshop will cover general topics in CL, NLP, and IR, with special emphasis on Large Language Models (LLMs) and Generative AI, including pre-trained Arabic language models, corpus design and evaluation, and annotated corpora for tasks such as named entity recognition, machine translation, sentiment analysis, and text classification. Additional areas of focus include crowdsourcing for data annotation, tools for language education, tokenization, normalisation, morphological analysis, part-of-speech tagging, dialect identification and translation, fake news detection, and web and social media analytics. Methodologies for resource creation and annotation, knowledge extraction, ontologies, terminology, knowledge representation, and integration with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be explored.
Workshop Topics
The workshop welcomes submissions on topics including, but not limited to, the following areas:
A) Language Resources:
· Pre-trained Arabic language models.
· Surveys and evaluations of existing Arabic corpora and their associated processing tools.
· Development and release of new annotated corpora for NLP and IR tasks such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
· Assessing the effectiveness of crowdsourcing platforms for Arabic data annotation.
· Arabic text and speech processing toolkits.
B) Tools and Technologies:
· Language education, including first (L1) and second (L2) language learning applications.
· Pre-training & fine-tuning approaches for Arabic.
· Tokenization, normalisation, segmentation, morphology, and POS tagging.
· Sentiment analysis, dialect ID, & classification.
· Web and social media analytics.
· Arabic LRs for text, speech, sign, gesture, image, & multimodal data.
· Best practices for LR interoperability.
· Construction and annotation of LRs.
· Knowledge extraction, acquisition, and representation.
· Ontologies, terminology, and frameworks.
· LRs and the Semantic Web (Linked Data, Knowledge Graphs).
· Data contamination, synthetic data, and quality issues.
Important Dates
· February 18, 2026: Paper submission deadline
· March 12, 2026: Notification of acceptance
· March 30, 2026: Camera-ready deadline
· May 11, 2026: Workshop Date
Submission Instructions
We invite submissions of between 4 and 8 pages of content on the topics of interest. In line with the policy of the main LREC conference, the 8-page limit does not include acknowledgements, references, potential Ethics Statements, or discussion of Limitations. All submissions must follow the LREC stylesheet (https://lrec2026.info/authors-kit/).
All submissions undergo double-blind review. Submissions that are not anonymised, are over-length or poorly formatted, or make excessive use of appendices to circumvent page limits are liable to desk rejection.
At the time of submission, authors are offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map (https://lremap.elra.info/), which provides metadata for the resource.
Organizing Committee
· Hend Al-Khalifa, Professor, King Saud University, Riyadh, Saudi Arabia, he...@ksu.edu.sa
· Mo El-Haj, Reader, VinUniversity, Vietnam, and Lancaster University, UK, elh...@vinuni.edu.vn
· Saad Ezzini, Assistant Professor, King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, saad....@kfupm.edu.sa