Dear Core Developers,
The Bug
When a token carries agglutinated prefixes (CONJC and PREP) before the definite article Al-, the dico module should still find the corresponding lemma in the dictionary.
Example
samaAdi => AlsGamaAdi / insertion of G (gemination) after the first solar consonant.
waAlsGamaAdi / waAl: agglutinated conjunction wa followed by Al-
The lookup does not succeed in all cases (see the table below).
If you are interested, I will file the bug in the Core Bug Tracker.
Thanks for your help in advance,
Let me know,
Alexis
Bug Location
Module: Arabic.cpp. The Arabic.cpp module handles the diacritization rules of Arabic script, which are defined in the configuration files.
Configuration file: arabic_typo_rules.txt ( ... solar assimilation=YES ... ), in the Arabic directory
Explanation
In Arabic, the consonants are divided into two groups, solar and lunar letters, based on whether or not they assimilate the letter 'l' of a preceding definite article Al-.
Solar letters make up half of the alphabet (the list is in Arabic.cpp).
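To make the assimilation rule concrete, here is a minimal sketch. It is not the code from Arabic.cpp: the helper names are hypothetical, and it assumes the transliteration uses Buckwalter-style letter codes for the 14 solar letters, with 'G' as the gemination mark as in the example above.

```cpp
#include <string>

// Hypothetical helper (not taken from Arabic.cpp): the 14 solar letters,
// assumed to be written in Buckwalter-style codes: t v d * r z s $ S D T Z l n.
inline bool isSolarLetter(char c) {
    static const std::string kSolar = "tvd*rzs$SDTZln";
    return kSolar.find(c) != std::string::npos;
}

// Builds the definite form of a transliterated noun: "Al" + noun, with the
// gemination mark 'G' inserted after the first consonant when it is solar.
// An empty noun is returned unchanged.
inline std::string addDefiniteArticle(const std::string& noun) {
    if (noun.empty()) return noun;
    std::string out = "Al";
    out += noun[0];
    if (isSolarLetter(noun[0])) out += 'G';  // solar: the 'l' assimilates
    out += noun.substr(1);
    return out;
}
```

With these definitions, addDefiniteArticle("samaAdi") yields "AlsGamaAdi", while a noun starting with a lunar letter such as 'q' only receives the "Al" prefix.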
Given a partially diacritized Arabic token, the dico program should find the fully diacritized lemma in the dictionary according to the typo rules, in particular when a 'G' has been inserted, and even when the token carries agglutinated prefixes.
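The expected behaviour could be sketched as follows. This is only an illustration under stated assumptions, not the dico implementation: findLemma, peel and dropGemination are hypothetical names, the dictionary is a plain set of transliterated lemmas, and the prefix spellings (wa, fa, bi, ka, li, Al, and the "lo" form of the article seen after li) are taken from the tokens in this report.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// True when `s` begins with `prefix`.
static bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// For each input, keeps the input itself (the prefix is optional) and adds
// the input with each matching prefix removed.
static std::vector<std::string> peel(const std::vector<std::string>& inputs,
                                     const std::vector<std::string>& prefixes) {
    std::vector<std::string> out;
    for (const auto& s : inputs) {
        out.push_back(s);
        for (const auto& p : prefixes)
            if (s.size() > p.size() && startsWith(s, p))
                out.push_back(s.substr(p.size()));
    }
    return out;
}

// Removes the gemination mark 'G' when it directly follows the first letter.
static std::string dropGemination(const std::string& s) {
    if (s.size() > 1 && s[1] == 'G') return s.substr(0, 1) + s.substr(2);
    return s;
}

// Hypothetical lookup: returns the lemma found in `dict`, or an empty string.
// Every candidate is checked against the dictionary, so a word that merely
// starts with "wa" or "bi" is not stripped blindly.
std::string findLemma(const std::string& token,
                      const std::unordered_set<std::string>& dict) {
    const std::vector<std::string> kConj = {"wa", "fa"};
    const std::vector<std::string> kPrep = {"bi", "ka", "li"};
    const std::vector<std::string> kArticle = {"Al", "lo"};

    std::vector<std::string> forms = peel({token}, kConj);  // optional CONJC
    forms = peel(forms, kPrep);                             // optional PREP
    for (const auto& form : forms) {
        if (dict.count(form)) return form;
        for (const auto& art : kArticle) {
            if (form.size() > art.size() && startsWith(form, art)) {
                const std::string stem = dropGemination(form.substr(art.size()));
                if (dict.count(stem)) return stem;
            }
        }
    }
    return "";
}
```

With a dictionary containing only "samaAdi", this sketch resolves waAlsGamaAdi, kaAlsGamaAdi and fakaAlsGamaAdi to the same lemma, which is the behaviour I would expect from dico for every token listed in this report.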
The test cases are listed below:
AR-Token | TB-Token | FOUND (Y/N) |
سَمَادِ | samaAd | |
السَّمَادِ | AlsGamaAdi | Y |
بِالسَّمَادِ | biAlsGamaAdi | Y |
كَالسَّمَادِ | kaAlsGamaAdi | |
لِلسَّمَادِ | lilosGamaAdi | Y |
وَالسَّمَادِ | waAlsGamaAdi | |
فَالسَّمَادِ | faAlsGamaAdi | |
وَبِالسَّمَادِ | wabiAlsGamaAdi | Y |
وَكَالسَّمَادِ | wakaAlsGamaAdi | |
وَلِلسَّمَادِ | waliAlsGamaAdi | Y |
فَبِالسَّمَادِ | fabiAlsGamaAdi | Y |
فَكَالسَّمَادِ | fakaAlsGamaAdi | |
فَلِلسَّمَادِ | faliAlsGamaAdi | Y |