Multi-language, multi-participant ELAN-FLEx-ELAN experiences sought


Tim Bodt

Mar 15, 2021, 3:18:37 AM
to FLEx list
Good day,

I am looking for people who have experience setting up an ELAN-to-FLEx-to-ELAN workflow that allows for multiple writing systems in transcription, multiple languages in translation, and multiple participants. I am aiming for broadly the following setup, but have been unable to get it right, despite multiple attempts and input from experts:

1. transcription - the vernacular language in phonetic IPA (baseline, this will be parsed and glossed);
2. transcription - the vernacular language in phonemic/orthographic IPA (for publication purposes);
3. transcription - the vernacular language in regional script (in my case, Devanāgarī, for teaching purposes);
4. the analysis language English (used for parsing and glossing);
5. a free translation in English;
6. a free translation in Nepali (in Devanāgarī script).

And that for two or more participants.

When assigning the tier for 3. above to a "notes" field, it seems to override any participant information that was assigned to the "notes" field (in that it creates a large number of 'new' participants when re-importing from FLEx to ELAN), but I haven't figured out another way to add this 'extra' tier in FLEx. 

Also, whereas 1. properly exports to FLEx, I haven't been able to figure out how to get another tier for 2. that does not end up conflicting with 1. 

Anyway, anyone who has set up a similar system for any constellation of vernacular/analysis languages is kindly requested to share their experiences.

Thanks, Tim.

Natalia

Mar 28, 2021, 6:26:12 PM
to FLEx list
Dear Tim,

I have experience setting up an ELAN to FLEx workflow in which I have multiple languages in translation. I had no experience with multiple writing systems in transcription but needed to figure out a workflow and stumbled upon your question.

My experience after playing around with a project with multiple writing systems in transcription and translation is that FLEx:
A. Cannot import multiple writing systems for the transcription
B. Has no problem importing multiple free translation tiers corresponding to different languages

However, concerning point A, the transcription in other writing systems is generated from the wordlist component inside FLEx (after you have specified the different forms for each word in that component). This means that you only need to import the transcription line which will be the basis for your morphological segmentation (whether it is your line 1, 2 or 3) and then FLEx will know what the 2 other corresponding forms of each word are (once you generate these forms, maybe using the bulk edit function).

This also means that you would then not need to use your note field for anything other than the participant tag and that would solve the problem you mention about the generation of a large number of "new" participants.

Best,

Natalia

Tim Bodt

Apr 6, 2021, 4:47:09 AM
to flex...@googlegroups.com
Dear Natalia,

Thank you for your response. Yes, I also found out that FLEx cannot handle multiple transcription tiers in multiple orthographic scripts. It just has to be IPA only. I have added an additional notes field for the transcription in local orthography, but I will also further explore the option you mentioned with the wordlist component. However, there is (unfortunately) no clear one-to-one correspondence between the 'ideal' orthographic transcription in the local orthography (i.e., based on the phonology) and the one employed by the local researcher, which is a much cruder and more impressionistic version. So, I want to preserve the latter (in the notes field), while also offering a proposal for a more phonologically consistent orthography (through the method you proposed, perhaps). The ultimate choice will be up to the language consultant and the speakers.

I managed to solve the issue with the multiplication of participants. But to be honest, I lost track of how I did it! The issue is that I have by now tried so many options, making slight adjustments here and there while importing and exporting between ELAN and FLEx, and unfortunately I did not document every adjustment and every step as carefully as I should have (part of it was frustration at not getting things right, I guess...).

My FLEx view is now as follows:

Screenshot (213).png

I am quite okay with this result. In ELAN, in interlinearisation view, I can now send the ELAN file plus a preferences file to my research counterpart, and he can work on the transcription and translation:

Screenshot (243).png

But I have another question. I have noticed, and so has Christina Truong of SIL, that repeated ELAN-to-FLEx-to-ELAN import and export is just not possible. I have tried it several times, but it never works correctly. The worst case was when I imported a flextext file from ELAN and chose the option 'merge' with the existing text in FLEx. This really messed up the existing text, multiplying words and changing the constituent order in phrases, before FLEx crashed completely. Of course (dumb me), I did not have a backup of the original text in FLEx, so I have to do it all over again.

Christina wrote to me that she has observed that issue as well. What they do is work in ELAN, sending the ELAN files back and forth to research counterparts and finalising the segmentation and basic (phrase-level, word-level) transcription and glossing, and only then export from ELAN and import into FLEx. Once this has happened, it is no longer possible to make any changes to the file, e.g., to change the segmentation (split, merge). If you try that and export the flextext file from ELAN to FLEx again, it is no longer rendered correctly. The main problem seems to be that, after exporting from FLEx and importing into ELAN, the initial transcription tier in the segmentation view is replaced by the segnum tier; on the segnum tier, the number of actions that can be performed is very limited, and no additional transcriptions can be added.

Have you also observed that? If you have, how have you dealt with it? If you didn't, and you can effortlessly continue to import and export between ELAN and FLEx no matter how much work you did in either the one or the other (e.g., additional segmentation and transcription in ELAN, or additional parsing and glossing in FLEx), then I am curious to know how you managed that!

Best wishes, Tim.

--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to a topic in the Google Groups "FLEx list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/flex-list/KpF8nbc-ndI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to flex-list+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/54fc9e62-6686-4f77-b4c9-5697f82e71can%40googlegroups.com.

Alexandre Arkhipov

Apr 6, 2021, 10:32:59 AM
to flex...@googlegroups.com

Dear Tim and all,

Some points that I find useful in my workflows:

-- Instead of multiple transcription lines, put all but one (which is ideally in IPA or similar) into "translation" -- e.g. add another writing system for the vernacular language into Analysis writing systems, and add a tier for "Literal translation" in that writing system.
(I avoid using Notes because they behave differently from translation lines.)
Note that in recent versions of FLEx, you can add Custom tiers at sentence level -- perfect for storing the original text in different writing systems. It also exports/imports OK to/from flextext.

-- You're right that all changes involving splitting/merging sentences should be done BEFORE you import from ELAN into FLEx.
If you happen to have done some splitting/merging in FLEx, your flextext export will have lost time and speaker info on the affected sentences. However, this can be fixed manually in the flextext file, e.g. using a plain text editor like Notepad++. It needs some time and attention but is far easier than re-importing and re-analyzing the entire text (unless you have too many fixes to do).

When your text is time-aligned, each sentence (<phrase> element in flextext) should have several attributes, e.g.:
<phrase guid="e6f7b5ec-0a7d-40a8-8acb-665b6af36058" begin-time-offset="60760" end-time-offset="62750" speaker="MPT" media-file="15762501-d42c-4c82-bdf4-ad15fc821a82">
The time offsets are in milliseconds; here they point to a 2 sec fragment from 01:00.760 to 01:02.750.

When the time-alignment is broken in FLEx, only the "guid" will remain:
<phrase guid="e6f7b5ec-0a7d-40a8-8acb-665b6af36058">

You'll need to look up the timestamps (in milliseconds) in ELAN, and then just copy the string with all the attributes from an undamaged <phrase> element and put the right values for "begin-time-offset" and "end-time-offset".
Don't forget to check the time values on the preceding/following <phrase> -- the border adjacent to the one you fixed can be wrong too.
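
Before editing by hand, the damaged sentences described above could be located with a short script. The following is only a minimal Python sketch: the attribute names come from the <phrase> example above, while the surrounding element structure of the sample and the function names (`format_ms`, `phrases_missing_alignment`) are assumptions made for illustration, not part of FLEx or ELAN.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Attributes a time-aligned <phrase> carries, per the example above.
REQUIRED = ("begin-time-offset", "end-time-offset", "speaker", "media-file")


def format_ms(ms: int) -> str:
    """Render a millisecond offset as mm:ss.mmm for comparison with ELAN."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def phrases_missing_alignment(flextext_xml: str) -> list[tuple[str, list[str]]]:
    """Return (guid, missing attribute names) for each <phrase> lacking any of REQUIRED."""
    root = ET.fromstring(flextext_xml)
    damaged = []
    for phrase in root.iter("phrase"):
        missing = [name for name in REQUIRED if name not in phrase.attrib]
        if missing:
            damaged.append((phrase.get("guid", "?"), missing))
    return damaged


# Simplified, hypothetical sample: one intact phrase and one that lost its alignment.
sample = """<document><interlinear-text><paragraphs><paragraph><phrases>
<phrase guid="e6f7b5ec-0a7d-40a8-8acb-665b6af36058" begin-time-offset="60760" end-time-offset="62750" speaker="MPT" media-file="15762501-d42c-4c82-bdf4-ad15fc821a82"/>
<phrase guid="00000000-0000-0000-0000-000000000001"/>
</phrases></paragraph></paragraphs></interlinear-text></document>"""

for guid, missing in phrases_missing_alignment(sample):
    print(guid, "is missing", ", ".join(missing))
print(format_ms(60760), "to", format_ms(62750))
```

For a real export, `ET.parse(path).getroot()` would take the place of `ET.fromstring`. The script only reports what is missing; the correct values still have to be looked up in ELAN and entered by hand.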

-- Since FLEx cannot import any word- or morph-level info, it indeed makes little sense to repeat the ELAN-FLEx-ELAN roundtrip more than once for any given text. So do as much transcription, editing and other preprocessing as possible in ELAN, gloss in FLEx, and do all the rest in ELAN. In particular, make sure there is no sentence-final punctuation (. ! ?) in the middle of annotations before exporting from ELAN; otherwise FLEx will split the sentence and you'll have to fix the time/speaker attributes as described above.
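
The punctuation check could be automated before export. Below is a rough Python sketch that reads an ELAN .eaf file as XML and flags annotations where . ! or ? is followed by more text. The tier name `tx@A`, the function name `risky_annotations`, and the trimmed sample are hypothetical, and the pattern is a heuristic that may also flag abbreviations or decimal numbers.

```python
import re
import xml.etree.ElementTree as ET

# Sentence-final punctuation followed (after optional whitespace) by a word character.
MID_PUNCT = re.compile(r"[.!?]+(?=\s*\w)")


def risky_annotations(eaf_xml: str, tier_id: str) -> list:
    """Return (annotation id, value) pairs on one tier that contain mid-annotation . ! or ?"""
    root = ET.fromstring(eaf_xml)
    hits = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            value = ann.findtext("ANNOTATION_VALUE") or ""
            if MID_PUNCT.search(value):
                hits.append((ann.get("ANNOTATION_ID"), value))
    return hits


# Trimmed, hypothetical sample; a real .eaf also has a header, time slots, etc.
sample = """<ANNOTATION_DOCUMENT>
<TIER TIER_ID="tx@A" PARTICIPANT="A" LINGUISTIC_TYPE_REF="default-lt">
<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2"><ANNOTATION_VALUE>first part. second part</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4"><ANNOTATION_VALUE>one sentence only.</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
</TIER>
</ANNOTATION_DOCUMENT>"""

for ann_id, value in risky_annotations(sample, "tx@A"):
    print(ann_id, repr(value))
```

Each flagged annotation can then be split into two annotations in ELAN, or have its punctuation changed, before the flextext export.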

-- If you have your participants each in his/her own set of tiers in ELAN, and if you set the "Participant" property correctly for all the tiers, you don't need to store the participant in the Note field.

You can still put it into Note if you're more comfortable seeing it in FLEx (otherwise it is there but invisible), but you don't have to, since it will be stored in the "speaker" attribute with each <phrase>, and ELAN will take it from there.
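
Whether the "Participant" property is set on every tier can be checked directly in the .eaf file, where it appears as the PARTICIPANT attribute of each TIER element. A small Python sketch follows; the function names and the trimmed sample with its tier names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def participants_by_tier(eaf_xml: str) -> dict:
    """Map each TIER_ID to its PARTICIPANT value ('' when the attribute is unset)."""
    root = ET.fromstring(eaf_xml)
    return {
        tier.get("TIER_ID"): (tier.get("PARTICIPANT") or "").strip()
        for tier in root.iter("TIER")
    }


def tiers_without_participant(eaf_xml: str) -> list:
    """List the tiers whose Participant property is empty or missing."""
    return [tier_id for tier_id, who in participants_by_tier(eaf_xml).items() if not who]


# Trimmed, hypothetical sample: the third tier has no participant set.
sample = """<ANNOTATION_DOCUMENT>
<TIER TIER_ID="tx@A" PARTICIPANT="A" LINGUISTIC_TYPE_REF="default-lt"/>
<TIER TIER_ID="ft@A" PARTICIPANT="A" LINGUISTIC_TYPE_REF="translation" PARENT_REF="tx@A"/>
<TIER TIER_ID="tx@B" LINGUISTIC_TYPE_REF="default-lt"/>
</ANNOTATION_DOCUMENT>"""

print(tiers_without_participant(sample))
```

Any tier reported here would need its Participant filled in under the tier attributes in ELAN before exporting.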

-- Finally, the inconvenience when going from FLEx back to ELAN, as you said, is that you don't have your main transcription for the whole sentence anymore (you've got the segnum tier instead). While this behaviour is unlikely to change in FLEx, you can copy your new *-word-txt-* tier to a new one, specifying segnum as parent and choosing e.g. phrase-item as type (with Symbolic Association as stereotype). This will concatenate all words and punctuation in a sentence, separated with a whitespace. And it is just one command to get something readable again. Not ideal, but you can further use search&replace e.g. to remove extra spaces before commas.
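
For those who prefer to do this cleanup outside ELAN, the concatenation and the search-and-replace step amount to something like the Python sketch below. The function name `join_words` and the placeholder tokens are illustrative only; the set of punctuation marks to tighten is an assumption and may need adjusting for your orthography.

```python
import re


def join_words(tokens: list) -> str:
    """Join word and punctuation tokens with spaces, then drop the space before punctuation."""
    text = " ".join(tokens)
    # Remove whitespace that ends up in front of , ; : . ! ?
    return re.sub(r"\s+([,;:.!?])", r"\1", text)


# Placeholder tokens standing in for the contents of a *-word-txt-* tier.
tokens = ["ka", ",", "ba", "da", "."]
print(join_words(tokens))
```

The raw join gives "ka , ba da ." and the substitution turns it into "ka, ba da."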

Hope that helps,
Alexandre



Françoise Rose

Apr 6, 2021, 11:00:59 AM
to flex...@googlegroups.com

Very useful, thanks !

 


Tim Bodt

Apr 13, 2021, 9:24:16 AM
to flex...@googlegroups.com
Hi Alexandre,

Thank you very much for your useful tips and advice! I am currently further fine-tuning my workflow, and I am incorporating some of the suggestions you made. 

Are you aware whether it is possible to change a transcription (so not the word or morph level information, but the basic transcription) in ELAN and then, after import, re-gloss only that particular changed item in FLEx without issue? The 'sense' in multiple round trips ELAN-FLEx-ELAN would be to have language consultants check and adapt transcriptions in ELAN based on the interlinearisation that came from FLEx, in a case where consultants won't work on the FLEx project itself.

I already learned the hard way that punctuation marks (. ? !) inside segments in ELAN result in broken segments in FLEx...

"If you have your participants each in his/her own set of tiers in ELAN, and if you set the "Participant" property correctly for all the tiers, you don't need to store the participant in the Note field."

--> The issue seems to be that if participant "A" is not the first participant in the recording (i.e., the recording starts with participant B), then after the ELAN-FLEx-ELAN round trip, ELAN shows participant "B" as participant "A" and vice versa. Have you observed that? I was advised to make a 'dummy segment' at the beginning of each recording with participant "A" if participant "A" is not the first speaker.

Thanks, Tim.

Alexandre Arkhipov

Apr 13, 2021, 10:37:17 AM
to flex...@googlegroups.com

Hi Tim,

"Are you aware whether it is possible to change a transcription (so not the word or morph level information, but the basic transcription) in ELAN and then, after import, re-gloss only that particular changed item in FLEx without issue?"

Generally, not this way.
If you reimport a text with even slight changes into FLEx (or without any changes), all the words will have lost their analysis (unless it happens differently if you try to merge the two versions of the text during import, but I never tried that -- maybe someone can correct me). More precisely, the words that did not change will have their analysis (from the old version of the text) shown as a suggestion, which then will need to be confirmed.
So one option would perhaps be for your consultants to save their changes in ELAN (and mark them in some way, e.g. in the comments), and the person who's in charge of the FLEx project to go through these changes and replicate them in FLEx.
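
To help the person in charge of the FLEx project find what the consultants changed, the two versions of the ELAN file could be compared annotation by annotation. The following Python sketch assumes that ELAN keeps each ANNOTATION_ID stable when only the annotation text is edited, which should be verified on your own files; the function names, the tier name, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def annotation_values(eaf_xml: str, tier_id: str) -> dict:
    """Map ANNOTATION_ID to annotation text for one tier."""
    root = ET.fromstring(eaf_xml)
    values = {}
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") == tier_id:
            for ann in tier.iter("ALIGNABLE_ANNOTATION"):
                values[ann.get("ANNOTATION_ID")] = ann.findtext("ANNOTATION_VALUE") or ""
    return values


def changed_annotations(old_xml: str, new_xml: str, tier_id: str) -> list:
    """Return (id, old text, new text) for annotations whose text differs between versions."""
    old = annotation_values(old_xml, tier_id)
    new = annotation_values(new_xml, tier_id)
    return [(aid, old[aid], new[aid]) for aid in old if aid in new and old[aid] != new[aid]]


def make_sample(second_value: str) -> str:
    """Build a trimmed, hypothetical .eaf fragment with two annotations."""
    return f"""<ANNOTATION_DOCUMENT><TIER TIER_ID="tx@A" PARTICIPANT="A">
<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"><ANNOTATION_VALUE>ka ba</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"><ANNOTATION_VALUE>{second_value}</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
</TIER></ANNOTATION_DOCUMENT>"""


before = make_sample("da ga")
after = make_sample("da gha")
for ann_id, old_text, new_text in changed_annotations(before, after, "tx@A"):
    print(ann_id, repr(old_text), "->", repr(new_text))
```

The resulting list is only a to-do list: each change still has to be replicated manually in FLEx, as described above.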

"--> The issue seems to be that if participant "A" is not the first participant in the recording (i.e., the recording starts with participant B), then after the ELAN-FLEx-ELAN round trip, ELAN shows participant "B" as participant "A" and vice versa. Have you observed that? I was advised to make a 'dummy segment' at the beginning of each recording with participant "A" if participant "A" is not the first speaker."

Well, I didn't notice that, but the dummy segment is probably a good solution.

Best,
Alexandre
