It is our pleasure to announce the release of the Konooz corpus, a multi-domain, multi-dialect corpus for Named Entity Recognition, presented today at #ACL2025. Konooz comprises 16 dialects across 10 domains, totaling 777K tokens, all manually collected and annotated. Konooz enables rich NER, as well as domain adaptation and transfer learning from one domain or dialect to another.
--
On 29 Jul 2025, at 5:25 PM, Kalmasoft <kalm...@gmail.com> wrote:
My second attempt. Basically, the internal logic considers text formatting when it shouldn't. Whatever the text format is, all input should be converted to one long single string, with all breaks converted to spaces.
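The normalization described above could be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code; the function name flatten_text is hypothetical, and it assumes that runs of consecutive breaks or spaces should collapse into a single space.

```python
def flatten_text(text: str) -> str:
    """Return the input as one single-line string.

    Hypothetical helper (not from the tool under discussion).
    str.split() with no arguments splits on any run of whitespace,
    including line feeds, carriage returns and tabs, so joining the
    pieces with a single space removes all breaks.
    """
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "first line\nsecond line\r\n\r\nthird   line\twith a tab"
    print(flatten_text(sample))
```

With this kind of preprocessing, the same sentence would reach the tagger as the same string whether it was pasted as one paragraph or wrapped over several lines.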
<IMG_20250729_181705.jpg>