Current State of Arabic NLP datasets

Zaid Alyafeai

May 9, 2024, 5:53:20 PM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing
Over the past few weeks, I have been working on improving the accessibility of Arabic NLP datasets in the Masader interface. Here are the current numbers:

Total datasets: 650
Total datasets and subsets: 899
Freely accessible datasets: 376
HuggingFace datasets: 311
Upon-request datasets: 66
Paid datasets: 208

If you know of a dataset that is missing, please add it through this form.

Zaid 


