Hi SIGARAB colleagues and friends,
📣 In case you missed it at ACL this year, I wanted to announce AL-QASIDA (Analyzing LLM Quality & Accuracy Systematically In Dialectal Arabic), a comprehensive evaluation of LLM proficiency in Dialectal Arabic.
🤖 As many of you have experienced, LLMs often struggle to produce Dialectal Arabic (العامية أو اللهجات). As practitioners work to address this, they need new evaluation methods. AL-QASIDA measures proficiency along four axes (dialectal fidelity, understanding, quality, and diglossia) using cross-lingual, monolingual, and translation eval sets.
🔍 As part of the eval suite, we define a new metric, ADI2, which uses logits from the NADI (Abdul-Mageed et al., 2024) and ALDi (Keleg et al., 2023) models to measure whether LLM responses are both sufficiently dialectal and in the desired country-level variety.
💡 Feel free to reach out if you have questions about using AL-QASIDA or ideas about how to improve it!
Best,
--