An empirical evaluation of Arabic text formality transfer: a comparative study
Shadi Abudalfa
Aug 13, 2025, 4:57:49 AM
to sig...@googlegroups.com
Paper Title: An empirical evaluation of Arabic text formality transfer: a comparative study

Abstract: Large language models (LLMs) have demonstrated remarkable performance across various natural language processing (NLP) tasks, including text formality transfer, where the objective is to convert text from an informal to a formal style, a task often framed as machine translation (MT). Over the past few years, researchers have focused on evaluating the translation capabilities of several LLMs, primarily targeting high-resource languages such as English. However, limited work has addressed low-resource languages like Arabic, owing to its morphological complexity, lexical challenges, and the limited availability of parallel corpora. Likewise, most of these studies have examined English-centric LLMs, largely because of the scarcity of LLMs designed for low-resource languages. Additionally, the majority of evaluations have considered translation between English and Modern Standard Arabic (MSA), with very few addressing translation between MSA and Arabic dialects (ADs). To address this gap, this study conducted a comprehensive performance evaluation of Arabic-based LLMs (i.e., Jais, AceGPT, and ArabianGPT) and LLaMA for translating ADs to MSA, using four publicly available datasets: MADAR, MDC, PADIC, and BIBLE. The evaluation covered zero-shot, few-shot (in-context), and fine-tuning scenarios. Experimental results consistently showed that the Jais and AceGPT models achieved the highest performance in terms of BLEU, COMET, ChrF1, and BERTScore metrics compared to baseline transformer-based models across all learning settings. The results suggest that Jais and AceGPT benefit from additional training on Arabic text data, unlike LLaMA, which was primarily trained on English.
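For readers curious about the metrics mentioned above, here is a minimal, illustrative sketch of a chrF-style character n-gram F-score in pure Python. It is an assumption for illustration only, not the implementation used in the paper; published evaluations typically rely on a standard package such as sacrebleu, which adds smoothing and tokenization details omitted here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Character n-grams of a string (whitespace removed)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average character n-gram precision and recall
    over orders 1..max_n, combined into an F-beta score (beta=2 weights
    recall, as in the original chrF metric). A sketch only."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Character-level metrics such as chrF are often preferred for morphologically rich languages like Arabic, since word-level overlap (as in BLEU) penalizes small inflectional differences heavily.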