Excited to share our latest research contribution to the Arabic NLP community!
We are introducing MA’AKS – the first parallel dataset for Arabic sentiment style transfer.
Many datasets exist for sentiment classification of Arabic text, but almost no work has addressed sentiment style transfer in Arabic. Recent advances in generative AI and the capabilities of large language models have transformed the field, yet the lack of Arabic resources for this task has limited progress in developing and evaluating robust models.
What we built:
- A high-quality parallel dataset of 5,000 Modern Standard Arabic sentences annotated with positive/negative sentiment pairs.
- Carefully designed annotation guidelines to ensure semantic preservation while flipping sentiment.
- Balanced coverage of positive and negative sentences to support both supervised and unsupervised learning.
What we did with it:
We benchmarked AceGPT, JAIS, and Llama-3 under different learning scenarios:
· Zero-shot
· Few-shot
· Fine-tuning
Results show that zero-shot performance in Arabic is limited, while fine-tuning on MA’AKS significantly boosts sentiment transfer performance, making the dataset a valuable resource for future research.
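For readers new to these settings, here is a minimal sketch of how the zero-shot and few-shot scenarios listed above differ at the prompt level. It is not the prompt format used in our experiments: the `ParallelPair` class, the `build_prompt` function, the instruction wording, and the Input/Output template are all hypothetical, and the Arabic sentences are invented for illustration rather than taken from MA’AKS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParallelPair:
    """Hypothetical record: the same content written in both polarities."""
    negative: str
    positive: str


# Assumed instruction wording; any real benchmark prompt may differ.
INSTRUCTION = (
    "Rewrite the following Arabic sentence so that its sentiment becomes "
    "{target}, keeping the rest of the meaning unchanged."
)


def build_prompt(source: str, target: str,
                 demos: Optional[List[ParallelPair]] = None) -> str:
    """Build a zero-shot prompt (no demos) or a few-shot prompt (with demos)."""
    if target not in ("positive", "negative"):
        raise ValueError("target must be 'positive' or 'negative'")
    lines = [INSTRUCTION.format(target=target), ""]
    for pair in demos or []:
        # Each demonstration goes from the opposite polarity to the target.
        if target == "positive":
            src, tgt = pair.negative, pair.positive
        else:
            src, tgt = pair.positive, pair.negative
        lines += [f"Input: {src}", f"Output: {tgt}", ""]
    # The sentence to transfer comes last, with the output left open.
    lines += [f"Input: {source}", "Output:"]
    return "\n".join(lines)


# Invented example pair (not from the dataset):
# "The service in this restaurant is bad." / "... is excellent."
demo = ParallelPair(
    negative="الخدمة في هذا المطعم سيئة.",
    positive="الخدمة في هذا المطعم ممتازة.",
)
query = "الفيلم كان مملاً."  # invented: "The film was boring."

zero_shot_prompt = build_prompt(query, "positive")
few_shot_prompt = build_prompt(query, "positive", demos=[demo])
```

In this sketch the only difference between the two settings is whether demonstration pairs are placed before the query. For fine-tuning, the same parallel pairs would serve as input/target training examples instead of in-context demonstrations.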
What we’re sharing with the community:
· The MA’AKS dataset (openly available)
· Detailed annotation guidelines
Why this matters:
MA’AKS fills a crucial gap in low-resource language research, paving the way for innovations in Arabic sentiment-aware applications, from content moderation and personalized chatbots to creative writing and beyond.
We hope this work inspires further exploration in Arabic style transfer and low-resource NLP more broadly.
https://github.com/sabudalfa/ArabicTextSentimentSwap
With best regards,
Shadi Abudalfa
Shadi I. Abudalfa
PhD in Computer Science and Engineering
SDAIA-KFUPM JRC for Artificial Intelligence
http://www.linkedin.com/in/shadiabudalfa