Shared task 4 at the AbjadNLP Workshop at EACL 2026

Pranav Gupta

Dec 9, 2025, 9:31:31 PM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing
Greetings! We heartily invite you to participate in the following shared task for medical text classification in Arabic. The website for the shared task is https://balajinaga.github.io/EACL2026-Abjad-NLP-SharedTask/ . 

Description
Participants will develop systems to perform multi-class classification of Arabic medical text into 82 predefined categories. Each text instance must be assigned to exactly one category represented by an integer label between 0 and 81.

Dataset Information
The dataset consists of authentic medical-domain text in Arabic. Each row in the dataset contains:
• text: A medical-domain text segment written in Arabic
• category: The English name of the corresponding medical category
• label: The integer class label (0–81) that participants must predict

There are 82 categories in total, and the dataset exhibits notable class imbalance, making the task both challenging and practically important for real-world healthcare NLP applications.
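
To get a feel for the imbalance, one option is to count the rows per label and derive inverse-frequency class weights. The sketch below is only an illustration: it assumes the training data is a CSV file with a header row naming the three columns above (the file format is not stated in this announcement), and the sample rows, including "Hypothetical category A" and its label 5, are invented placeholders rather than real data.

```python
import csv
import io
from collections import Counter

NUM_CLASSES = 82

# Invented stand-in for the training file; only the column names and the
# "Hematological diseases" / 33 pairing come from the announcement.
SAMPLE_CSV = """text,category,label
placeholder question and answer 1,Hematological diseases,33
placeholder question and answer 2,Hematological diseases,33
placeholder question and answer 3,Hypothetical category A,5
"""


def label_counts(csv_file):
    """Count how many rows carry each integer label."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["label"]) for row in reader)


def inverse_frequency_weights(counts, num_classes=NUM_CLASSES):
    """Weight each observed class by total / (num_classes * count).

    Rare classes get larger weights; classes with no rows get no entry.
    """
    total = sum(counts.values())
    return {label: total / (num_classes * n) for label, n in counts.items()}


counts = label_counts(io.StringIO(SAMPLE_CSV))
weights = inverse_frequency_weights(counts)
print(counts.most_common())  # labels sorted from most to least frequent
```

With the real training file, replacing the `io.StringIO(...)` object with an opened file handle should work the same way, provided the column names match.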

Here is an example from the dataset:
text:
السؤال
-------
السلام عليكم انا مصاب بفقر الدم المنجلي (السكلسل) علمآ بأن نسبة السكلسل 72 فعندما تصبح نسبة الدم 7 فأن الالام تأتي بكثره فما الحل لزيادة نسبة الدم وما الحل لعلاج...

الجواب
-------
الحل بالابتعاد عن الرضرض النفسية وتقوية المناعة وتناول حمية غذائية متوازنة غنية بالحديد وعند حدوث نوبات الام سببها ونقص حاد بالخضاب الدموي لايوجد الا تعويض الدم الناقص..بنقل الدم.

category: Hematological diseases
label: 33

Dataset Links
Training Dataset: Download here
Evaluation Dataset (no labels): Download here

Evaluation Metric
Submissions will be evaluated using the macro-averaged F1 score across all 82 classes. This metric assigns equal weight to each category, encouraging solutions that perform well even on minority classes.
For more details about the macro F1 score, refer to the scikit-learn documentation.
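
For anyone who wants to see the metric spelled out, here is a minimal pure-Python sketch of macro-averaged F1. It is not the official scoring script. In particular, scoring a class with no true and no predicted instances as 0 is an assumption of this sketch, and the toy example uses 3 classes instead of 82 to keep it readable.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores over `labels`."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class that never appears in y_true or y_pred scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data: class 0 dominates, and the single class-1 instance is missed.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case five of six predictions are correct, yet the missed minority class contributes an F1 of 0 and pulls the macro average well below the accuracy. In practice, `sklearn.metrics.f1_score` with `average="macro"` is the usual way to compute this.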

Contact
For questions or clarifications, please contact the organising team.
We look forward to your participation in the AbjadNLP Medical Text Classification shared task and to advancing medical NLP for Arabic and other Abjad-script languages.

How to Register
1. Complete the registration form.
2. Join the Kaggle competition.
3. Download the dataset (train, test without labels) and begin developing your system.

Task Summary
• Input: Arabic medical question–answer pair
• Output: One of 82 predefined category labels (0–81)
• Metric: Macro-averaged F1 score
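
Since every instance needs exactly one integer label in the range 0–81, a quick sanity check on predictions before submitting may save some trouble. The helper below is a hypothetical sketch: the function name and the idea of holding predictions in a plain Python list are assumptions, and it says nothing about the actual submission file format, which is defined on the competition page.

```python
NUM_CLASSES = 82


def validate_predictions(predictions, num_instances):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(predictions) != num_instances:
        problems.append(
            f"expected {num_instances} predictions, got {len(predictions)}"
        )
    for i, label in enumerate(predictions):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            problems.append(f"row {i}: label {label!r} is not an integer")
        elif not 0 <= label < NUM_CLASSES:
            problems.append(f"row {i}: label {label} is outside 0-{NUM_CLASSES - 1}")
    return problems


print(validate_predictions([0, 33, 81], num_instances=3))  # no problems expected
print(validate_predictions([82, -1, "3"], num_instances=3))  # three problems expected
```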

System Description Papers
All participating teams are encouraged to submit a short system description paper. Papers will be included in the ACL Anthology, and a high leaderboard ranking is not required. We welcome creative approaches, analysis, and lessons learned.

Contact details for the organizing team can be found on the shared task website.


Sincerely,
Pranav Gupta