SA Arabic NLP community,
I would like to draw your attention to our paper on Arabic dialects, which has been accepted to the ACL main conference, since I think it is important to anyone working in Arabic NLP and linguistics.
In the paper, we examine some of the most common assumptions about Arabic dialects that have been used in modelling Arabic NLP tasks or building Arabic NLP resources. Examples include the categorisation of dialects at the regional or country level, and the assumption that some clue words are exclusive to certain dialects.
In our paper “Revisiting Common Assumptions about Arabic Dialects in NLP”, we examine four of these assumptions using a controlled quantitative method. We find that they might not be as accurate as thought, and that modelling some of the tasks in the field on them might be sub-optimal.
We hope the paper will help NLP researchers form clearer assumptions about Arabic dialects, and that it will be transformative for future NLP tasks that address them.
You can read our pre-print at the following link:
https://arxiv.org/pdf/2505.21816
We hope to discuss it further here over email, or in person at ACL in Vienna, inshaAllah.
Walid
Best, Mustafa