CIDAR: Arabic Instruction Dataset


Zaid Alyafeai

Feb 6, 2024, 11:10:59 AM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing
Introducing CIDAR: the first open Arabic instruction dataset culturally aligned by native Arabic speakers. CIDAR contains 10,000 instructions and outputs capturing the essence of the Arab region and its unique culture. CIDAR can be used to fine-tune Arabic LLMs to follow instructions. Latest work from ARBML.

Walid Magdy

Feb 6, 2024, 11:15:01 AM
to Zaid Alyafeai, SIGARAB: Special Interest Group on Arabic Natural Language Processing
What an amazing contribution to the ArabicNLP community.

Thank you, Zaid, and everyone who participated in this effort.

May God reward you all with goodness.

Walid

 