Current State of Arabic NLP datasets

May 9, 2024, 5:53:20 PM

to SIGARAB: Special Interest Group on Arabic Natural Language Processing

Over the past few weeks, I have worked on improving the accessibility of Arabic NLP datasets in the Masader interface. Here are some numbers:

Total datasets: 650

Total datasets and subsets: 899

Freely accessible datasets: 376

HuggingFace datasets: 311

Upon-request datasets: 66

Paid datasets: 208
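The three access categories (freely accessible, upon-request, paid) add up to exactly the 650 total datasets (376 + 66 + 208 = 650), so they appear to partition the catalogue. The minimal sketch below assumes that and turns the counts into percentage shares. The HuggingFace count is left out on purpose, since those 311 datasets presumably overlap with the freely accessible ones. The names `access_counts` and `access_shares` are hypothetical and are not part of Masader.

```python
# Access-category counts as reported for Masader (May 2024).
# Assumption: the three categories are mutually exclusive and cover every dataset.
access_counts = {
    "Freely accessible": 376,
    "Upon-request": 66,
    "Paid": 208,
}
TOTAL_DATASETS = 650


def access_shares(counts, total):
    """Return each category's share of `total` as a percentage rounded to 1 decimal.

    Raises ValueError if the categories do not add up to the total, which would
    mean they overlap or leave some datasets out.
    """
    covered = sum(counts.values())
    if covered != total:
        raise ValueError(f"categories cover {covered} datasets, expected {total}")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = access_shares(access_counts, TOTAL_DATASETS)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

With the reported numbers, this prints 57.8%, 10.2% and 32.0%. In other words, a little under three in five Arabic NLP datasets listed in Masader are freely accessible.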

If you know of a dataset that is missing, please add it through this form.

Zaid