Exporting texts into the 'flextext' (EMELD) format omits the non-glossed segments of the baseline text

Sylvain Loiseau

Sep 5, 2025, 9:15:17 AM
to FLEx list

Dear all,

In FLEx, texts are first entered in the Baseline tab of the Texts & Words area and then further annotated in the Gloss and Analyze tabs. In the Baseline tab, the running text can be assigned to the various vernacular and analysis languages configured for the project. Segments of the baseline text that are not in a vernacular language cannot be glossed in the Gloss and Analyze tabs.

When exporting a glossed text from FLEx to the FlexText (EMELD) format, the wordforms entered in the baseline are correctly preserved in the output (word/item[@type="txt"]). However, any segments assigned to an analysis language rather than a vernacular one are discarded.
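For reference, the kind of query meant here can be sketched in Python with the standard library `xml.etree.ElementTree` (the choice of tool is an assumption; any XPath-capable processor would do). `baseline_words` is a hypothetical helper, not part of FLEx. It matches only `item` elements that are direct children of `word`, so the morpheme-level items under `morphemes/morph` are not returned.

```python
import xml.etree.ElementTree as ET


def baseline_words(flextext: str) -> list[tuple[str, str]]:
    """Return (lang, text) for every word-level <item type="txt">.

    `flextext` is the XML content of a FlexText export, as a string.
    Only direct <item> children of <word> are matched, so items nested
    under <morphemes>/<morph> are excluded.
    """
    root = ET.fromstring(flextext)
    return [
        (item.get("lang", ""), item.text or "")
        for item in root.iterfind(".//word/item[@type='txt']")
    ]
```

Any word whose form item carries a different @type is silently absent from this result, which is what makes the missing segments look "discarded".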

This is problematic, since corpora may include textual segments in other (non-analyzed) languages, or may mix the running text with commentary or special characters (marking grammaticality, certainty, etc.) in the analysis language. Such material can be scattered across hundreds of texts and is often crucial for interpreting the data. It is frustrating not to be able to export it from FLEx, especially as FlexText is the only format suitable for further generic processing.

Would it be possible to include these segments in the FlexText export as well? Ideally, they could be represented as word elements without morph children, with the language correctly specified in the @lang attribute. I would also be very grateful for any suggestions regarding alternative solutions.
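Whether a given export already contains words in the proposed shape can be checked with a short script. A minimal Python sketch, assuming the FlexText XML has been loaded as a string and using only the standard library `xml.etree.ElementTree`; `words_without_morphemes` is a hypothetical helper, not part of FLEx. It lists every `word` element that has no `morphemes` child, together with the @lang and @type of its form item.

```python
import xml.etree.ElementTree as ET


def words_without_morphemes(flextext: str) -> list[tuple[str, str, str]]:
    """List (lang, type, text) for each <word> lacking a <morphemes> child."""
    root = ET.fromstring(flextext)
    found = []
    for word in root.iter("word"):
        if word.find("morphemes") is not None:
            continue  # analyzed word: it has morpheme children, skip it
        # Take the first word-level item that holds the form itself.
        for item in word.findall("item"):
            if item.get("type") in ("txt", "punct"):
                found.append(
                    (item.get("lang", ""), item.get("type", ""), item.text or "")
                )
                break
    return found
```

Unglossed vernacular words also have no `morphemes` child, so they appear in this list too; their @lang is what distinguishes them from segments in other languages.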

With best regards,
Sylvain Loiseau

sarkipo

Sep 5, 2025, 10:09:59 AM
to flex...@googlegroups.com
Dear Sylvain,

Which version of FLEx are you using?
I don't have the issue you describe and don't remember ever encountering it. I am currently on the latest alpha version (9.3.3), but this behavior has not changed for me since at least 9.0.

Here is a sample I've just made; the second word in the sentence is marked with a different writing system and is not glossed:
[Screenshot of the sample text in FLEx (image.png)]

Now here is what I get in the FlexText export (Export Interlinear > ELAN, SayMore, FLEx | FLEXTEXT):

          <phrase guid="cebc17db-a328-4911-8a32-4612e0ebf66b">
            <item type="txt" lang="ket">qimä usalʼgit, dʌnʼgit ʌŋnʼiŋisʼaŋ dalʼdɔŋɔnʼdɔ. </item>
            <item type="segnum" lang="en">1.11</item>
            <words>
              <word guid="f6b89a83-0f1f-4ed0-9158-d0702cf1c3b8">
                <item type="txt" lang="ket">qimä</item>
                <morphemes>
                  <morph type="root" guid="d7f713e5-e8cf-11d3-9764-00c04f186933">
                    <item type="txt" lang="ket">qima</item>
                    <item type="cf" lang="ket">qima</item>
                    <item type="gls" lang="en">grandmother</item>
                    <item type="gls" lang="ru">бабушка</item>
                    <item type="msa" lang="en">n f anim</item>
                  </morph>
                </morphemes>
                <item type="pos" lang="en">n</item>
                <item type="gls" lang="en">grandmother</item>
                <item type="gls" lang="ru">бабушка</item>
              </word>
              <word>
                <item type="punct" lang="en">usalʼgit</item>
              </word>
              <word>
                <item type="punct" lang="en">,</item>
              </word>
              <word guid="2bd25eb0-77dd-4b4a-b4df-e776695ba1f6">
                <item type="txt" lang="ket">dʌnʼgit</item>
              </word>

Best,
Alexandre

--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/flex-list/21809519-ea62-4eba-9140-5d478cc032a1n%40googlegroups.com.

sarkipo

Sep 5, 2025, 12:10:30 PM
to flex...@googlegroups.com
Dear Sylvain,

I think I understand your question better now. If you are querying specifically with the XPath word/item[@type="txt"], these words indeed won't appear in the result, since they get @type="punct" (in addition to a different @lang value). This is a FLEx-specific convention that is not obvious, but I suspect it is not easy to change.
However, since these words are not actually discarded, you could post-process the export on your side and change them back to @type="txt".
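A minimal Python sketch of that post-processing step, assuming the export is loaded as a string and using only the standard library `xml.etree.ElementTree`. The helper `retag_punct_words` is hypothetical, and the rule it uses to tell real punctuation apart from words in another language (an item is retagged only if its text contains a letter or digit) is an assumption, not FLEx behaviour; adjust it to the conventions of your corpus.

```python
import xml.etree.ElementTree as ET


def retag_punct_words(flextext: str) -> str:
    """Change word-level <item type="punct"> to type="txt" when the text
    contains at least one letter or digit; return the modified XML.

    Heuristic (assumption): genuine punctuation such as "," or "." has no
    alphanumeric character, so it keeps type="punct". The @lang attribute
    is left exactly as exported.
    """
    root = ET.fromstring(flextext)
    # Materialize the matches first, then modify their attributes.
    for item in list(root.iterfind(".//word/item[@type='punct']")):
        if any(ch.isalnum() for ch in (item.text or "")):
            item.set("type", "txt")
    return ET.tostring(root, encoding="unicode")
```

With this rule, purely symbolic markers (e.g. "*" or "?") stay "punct" but keep their @lang, so they can still be recovered separately. `ET.tostring(..., encoding="unicode")` omits the XML declaration; write the result back to disk with `encoding="utf-8"`.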

Hope that helps,
Alexandre

Sylvain Loiseau

Sep 9, 2025, 9:35:44 AM
to FLEx list
Dear Alexandre,
Thank you very much for your answer! I wasn't aware of this and had missed the "punct" encoding, but it's perfectly fine: it even includes the lang information.
Thanks a lot,
Sylvain
