Hello Eric,
To annotate Arabic text, I use the conventional pipeline with several "Morphological Mode Dictionaries."
By constructing the FST_Text, I obtain the admissible combinations of segments within each token, based on grammatical categories such as verbs, nouns, and adjectives.
To answer your question:
To locate the {S} marker at the beginning of a sentence, I use Locate (Paumier 2003) with the S_CONJC_V graph. An example is shown in the figure below.
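The graph itself is not reproduced here, but its name suggests the sequence {S}, then a conjunction, then a verb. The following is a minimal Python sketch of that linear pattern, assuming the segments are already available as (form, tag) pairs. The tag names CONJC and V, the sample transliterated segments, and the function find_s_conj_verb are hypothetical. The sketch is not Locate and ignores the ambiguity that the FST_Text can encode.

```python
from typing import List, Tuple

# Hypothetical input format: each segment is a (form, tag) pair.
# "{S}" is the sentence delimiter; "CONJC" and "V" are assumed tag names.
Token = Tuple[str, str]


def find_s_conj_verb(tokens: List[Token]) -> List[int]:
    """Return the indices of {S} delimiters followed by a CONJC segment, then a V segment.

    Plain-Python illustration of a linear pattern {S} CONJC V; it does not
    reproduce Locate or the S_CONJC_V graph.
    """
    hits = []
    for i in range(len(tokens) - 2):
        if (tokens[i][0] == "{S}"
                and tokens[i + 1][1] == "CONJC"
                and tokens[i + 2][1] == "V"):
            hits.append(i)
    return hits


# Usage with hypothetical, transliterated segments.
sample = [("{S}", "PUNCT"), ("w", "CONJC"), ("qAl", "V"),
          ("Alrjl", "N"), ("{S}", "PUNCT"), ("Aldrs", "N")]
print(find_s_conj_verb(sample))
```

Only the first delimiter matches here, because the second {S} is followed by a noun rather than a conjunction.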
New Trick:
I've recently found a more efficient way to identify {S} at the beginning of a sentence: Locate with automaton intersection. Instead of inserting only {S}, we insert _End-S_ {S} _Start-S_, so that the _Start-S_ annotation is available in the .snt file.
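To show why the wrapped delimiter helps, here is a minimal Python sketch of its effect on the text. It is a plain string substitution for illustration only. In the workflow above, Locate produces the markers. The function names wrap_sentence_marks and sentence_initial_tokens and the sample text are hypothetical.

```python
import re

# Delimiter forms taken from the description above.
BARE = "{S}"
WRAPPED = "_End-S_ {S} _Start-S_"


def wrap_sentence_marks(snt_text: str) -> str:
    """Replace each bare {S} with the wrapped form (illustration only)."""
    return snt_text.replace(BARE, WRAPPED)


def sentence_initial_tokens(snt_text: str) -> list:
    """Return the first whitespace-delimited token after each _Start-S_."""
    return re.findall(r"_Start-S_\s+(\S+)", snt_text)


# Usage with a hypothetical, transliterated text.
text = "wqAl Alrjl {S} fktb Aldrs {S}"
wrapped = wrap_sentence_marks(text)
print(wrapped)
print(sentence_initial_tokens(wrapped))
```

Because _Start-S_ now sits directly before each sentence-initial token, a pattern can anchor on it instead of on the bare {S}. The final {S} has no following token, so it contributes no match.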
Thanks,