LocateTFST "{S}"

Alexis Neme

Aug 8, 2023, 2:22:19 AM

to Unitex-GramLab

Hello,

I use Locate with {S} [Paumier 2003] and it works, but I can't get it to work with LocateTfst. Are you aware of this? I need to identify the beginning of a sentence after running sentence.fst2. There's always a way to do it, but I prefer to follow the standard.

For LocateTfst, it would be enough for {S} to appear as the first and last box of the automaton for each sentence.

Here is a little bit of my experience, and I've found a way around the issue.

1 - I run Construct FST-Text.
2 - Then I use the morphological mode to match the sequence of Arabic tokens at the beginning of a sentence: I apply Locate with {S} ... $< ... $> in morphological mode (Paumier 2003).

IMPORTANT: Locate is not as flexible as LocateTfst, because:

Left and right contexts as per section 6.3 are forbidden in morphological mode (p. 146).

That is, accessing a PART of a de-agglutinated token (in Arabic, for instance) is not straightforward, or even impossible, whereas being able to apply such contexts would let us better circumscribe and disambiguate the token PART in question.

Any feedback or observation?

Thanks and Happy Holidays

Alexis

eric.laporte

Sep 11, 2023, 10:18:31 AM

to Unitex-GramLab

Hi Alexis,

I don't understand your method. The 'Locate' program does not access or modify the text automaton, so why do you specify that you use 'Construct FST-Text' before 'Locate'?

Thanks,

Eric

Alexis Neme

Sep 11, 2023, 11:47:55 AM

to Unitex-GramLab

Hello Eric,

To label Arabic text, I employ the conventional pipeline with numerous "Morphological Mode Dictionaries."

By constructing the FST-Text, I get the acceptable segment combinations within a token, constrained by grammatical categories such as verbs, nouns, and adjectives.

To answer your question:

To pinpoint {S} at the start of a sentence, I use Locate (Paumier 2003) with the S_CONJC_V graph. An example can be found in the figure below.

New Trick:

I've recently discovered a more efficient method to identify {S} at the beginning.

It uses Locate with automaton intersection. Instead of inserting only {S}, we insert a sequence like _End-S_ {S} _Start-S_, so that the annotation _Start-S_ is available in the .snt file.
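To make the effect of this trick concrete, here is a minimal Python sketch that rewrites each bare {S} in a string as _End-S_ {S} _Start-S_. It only illustrates the resulting text: the function name wrap_sentence_marks is hypothetical, and doing this as post-processing on a plain string (rather than in the output of the sentence graph itself) is an assumption, not part of Unitex.

```python
import re

# Tag names are taken from the post; everything else here is illustrative.
END_TAG = "_End-S_"
START_TAG = "_Start-S_"

# Match a {S} that is not already wrapped by the two tags, so that
# running the function twice does not wrap the same delimiter again.
_BARE_S = re.compile(r"(?<!_End-S_ )\{S\}(?! _Start-S_)")


def wrap_sentence_marks(text: str) -> str:
    """Replace every bare {S} with '_End-S_ {S} _Start-S_'."""
    return _BARE_S.sub(f"{END_TAG} {{S}} {START_TAG}", text)


if __name__ == "__main__":
    sample = "First sentence. {S} Second sentence."
    print(wrap_sentence_marks(sample))
```

With the text in this form, a graph can anchor on the ordinary token sequence _Start-S_ instead of relying on {S} itself being matchable.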

Thanks,
