--
You received this message because you are subscribed to the Google Groups "SIGARAB: Special Interest Group on Arabic Natural Language Processing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sigarab+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sigarab/50e85daf-fdee-4d85-b611-9f610dba85e5n%40googlegroups.com.
Salam Ali,
We shared 540K tweets classified at the country level here:
https://alt.qcri.org/resources/qadi/
Paper: https://aclanthology.org/2021.wanlp-1.1.pdf
Please let me know if you need the tweet texts, since what we shared are tweet IDs.
Best,
Hamdy
Hamdy S. Hussein
Principal Software Engineer
Qatar Computing Research Institute
+974 445 41679
On Aug 29, 2024, at 7:31 PM, Amr Keleg <amr.k...@gmail.com> wrote:
Agreed. We mentioned this in our QADI paper:
https://aclanthology.org/2021.wanlp-1.1.pdf
“… Similar to the results observed for both the Gulf and Levant regions, the Maghrebi dialects (MA, DZ, LY, TN) exhibit a similar pattern.
MA and DZ account for considerable confusion. For instance, the tweet الله يبارك فيك خويا(God bless you, brother!!), could be used in both dialects.
As for the Nile Basin dialects, Egyptian (EG) and Sudanese (SD) could also be confused with one another.
The tweet التويتة دي معدلة فوتوشوب, (This tweet is modified in Photoshop), is equally valid in both dialects.”
Given only a dialectal text, especially a short one, it is often hard to assign it to a single dialect.
If we add audio, the task becomes easier.
Asking native speakers to pronounce the sentences that have more than one country-label can be very useful for detailed comparative studies.
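One way to make the multi-label point concrete is to let a classifier return every country label whose score is close to the best one, instead of forcing a single label. The sketch below is only illustrative: the function name `plausible_dialects`, the `margin` threshold, and the example probabilities are assumptions, not something taken from QADI or any released model.

```python
from typing import Dict, List


def plausible_dialects(probs: Dict[str, float], margin: float = 0.1) -> List[str]:
    """Return all country labels whose probability is within `margin` of the top one.

    `probs` maps country codes (e.g. "EG", "SD") to classifier probabilities.
    The function and its default margin are hypothetical choices.
    """
    if not probs:
        return []
    top = max(probs.values())
    # Keep every label close enough to the best score.
    kept = [label for label, p in probs.items() if top - p <= margin]
    # Order the surviving labels from most to least probable.
    return sorted(kept, key=lambda label: probs[label], reverse=True)


# Made-up scores for a short tweet that could be Egyptian or Sudanese.
example = {"EG": 0.46, "SD": 0.41, "LY": 0.08, "MA": 0.05}
print(plausible_dialects(example))
```

Sentences that come back with more than one label would be natural candidates for the native-speaker pronunciation study suggested above.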
Best,
Hamdy
From: sig...@googlegroups.com <sig...@googlegroups.com> On Behalf Of Abdul-Mageed, Muhammad
Sent: Thursday, August 29, 2024 6:47 PM
To: Amr Keleg <amr.k...@gmail.com>
Cc: SIGARAB: Special Interest Group on Arabic Natural Language Processing <sig...@googlegroups.com>
Subject: Re: [SIGARAB] Recommendations for Arabic Dialect Classification Models
Adding to the discussion: as people work on this and similar tasks, I believe they should keep in mind issues related to “language production” vs. “language perception”. For example, a text produced by a speaker of a particular dialect can be perceived by an annotator as belonging to another dialect.