Now that the ArabicNLP 2025 conference is approaching with many published datasets, we are excited to announce Masader Form, a new way to add datasets to Masader. Instead of manual annotation, we rely on a semi-supervised approach in which an LLM extracts the metadata. After submission, the metadata is pushed directly to our GitHub repository, where it can be easily reviewed. We encourage all authors to submit their datasets through the form to make them easily accessible to the research community.
Masader now has more than 730 datasets, and we aim to reach 1,000 by the end of the year.
PS: this approach is based on our new research, MOLE and MeXtract.
Zaid