Hello Noureddinne,
Indeed, I applied the DELAF dictionary to "katb", and it does not identify this token.
- Is it necessary? I don't think so.
I never use the dictionary this way: I never apply a dictionary directly.
I always apply my dictionary in morphological mode, using an .fst2 grammar to formalize agglutination, and it then identifies partially diacritized forms (see attached files).
I always use the dictionary in morphological mode because Arabic always needs an agglutination grammar for verbs, nouns and adjectives.
Your compressed DELAF dictionary should be declared
in Preference>morphological mode Dictionary.
Alexis
PS.
I advise working in UTF-16LE, since that is Unitex's native encoding; you can switch to UTF-8 later on.
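If it helps for the later switch, here is a minimal Python sketch of the re-encoding step. The function names and the file paths passed to them are only illustrative (they are not part of Unitex), and the sketch assumes the input really is UTF-16LE, with or without a byte order mark.

```python
from pathlib import Path


def reencode_utf16le_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16LE bytes and return the same text encoded as UTF-8."""
    text = raw.decode("utf-16-le")
    # Drop the byte order mark if the file starts with one.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Re-encode the file at src (assumed UTF-16LE) into dst as UTF-8."""
    # Working on bytes keeps the original line endings untouched.
    dst.write_bytes(reencode_utf16le_to_utf8(src.read_bytes()))
```

For example, convert_file(Path("katb.txt"), Path("katb_utf8.txt")), where both file names are hypothetical.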
Attached files: an example of a grammar using morphological mode
- put the .fst2 grammar in the Dela directory
- put katb.snt in the Corpus directory
- unzip the .7z archive in the Corpus directory
- create a DELAF dictionary containing kataba with your attributes,
compress it in Semitic mode and declare it in the preferences as a morphological dictionary.
- after tokenization, apply the lexical resource prfx_VRB-r.fst2
- check "Filter unknown word with tag.ind" (see fig. below)
- words not identified by your lexical resources (here the .fst2) will be in the tags.err file
- execute File>construct-Fst-txt
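To review the words that were not identified, something like the following Python sketch could be used to read the tags.err file. It assumes the file is UTF-16LE and holds one token per line; both points are assumptions to check against your own output, and the function names are only illustrative.

```python
from pathlib import Path


def parse_unknown_tokens(raw: bytes) -> list[str]:
    """Return the non-empty lines of a UTF-16LE file as a list of tokens."""
    text = raw.decode("utf-16-le")
    # Drop the byte order mark if the file starts with one.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Assumption: one unidentified token per line; blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_unknown_tokens(path: Path) -> list[str]:
    """Read the unidentified tokens from the given file."""
    return parse_unknown_tokens(path.read_bytes())
```

Calling read_unknown_tokens on the path of your tags.err file gives the list of tokens that still need a dictionary entry or a grammar path.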