ELAN–FLEx–ELAN workflow problem


Ryan Pennington

Feb 24, 2014, 9:30:24 PM
to flex...@googlegroups.com
Hello all,

I have been working on an ELAN–FLEx–ELAN workflow in order to create a final time-aligned interlinearized product in ELAN. I have been reviewing the document from Gaved & Salffner (2014) for this purpose and customizing it to my own needs. I am writing up my method as well, to help others, with a particular focus on some issues that one would face in PNG.

I am writing because I would like to know whether there is any way, when exporting from FLEx and importing back into ELAN, for the phrase-level vernacular text to remain. For instance, what began as a single intonation unit annotation in ELAN (e.g. "nai walong") is brought back into ELAN as the phrase number ("segnum") from FLEx (e.g. "1"). I would like the final file not to replace the annotations with numbers. I have messed around with the settings quite a bit and haven't found a solution. Any thoughts? As it stands, the final product will need to be two separate ELAN files, which is inefficient and makes searching more difficult.

By the way, the final product will also include a description of how one can use SayMore instead of ELAN, or in addition to it. I'm trying to provide the complete picture for people here.

Thanks for any help,
Ryan Pennington

Alexandre Arkhipov

Feb 25, 2014, 4:55:40 AM
to flex...@googlegroups.com
Hello Ryan,

The reason might be that FLEx does not store the phrase-level text as a whole,
and the most similar tier ELAN finds happens to be the "segnum"s.
Could you share (samples of) the files that have this problem?

Best,
Sasha

Tim Gaved

Feb 25, 2014, 5:58:28 AM
to flex...@googlegroups.com
Ryan
I have since discovered that one step in the ELAN-FLEx-ELAN workflow
that Sophie and I wrote up is not actually necessary - there's no need
to create a tokenised word tier in ELAN before exporting towards FLEx.
FLEx is quite happy to import the phrase tier.
However, as you've noticed, there is currently no way to export the
phrase as a phrase-level unit from FLEx. I believe that it would be
possible, but it hasn't been seen as a requirement until recently. Of
course FLEx would also have to have some extra code on the import side
so that it can handle situations where both phrase-level and word-level
data exist - which level is primary?
I would imagine that in the interim, someone could write an XSLT that
would reconstruct the phrase level from the word level. Maybe Sasha
Arkhipov already has something.
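
The reconstruction idea above can be sketched roughly as follows. This is a minimal sketch in Python rather than XSLT, and it rests on assumptions: that the FLEx interlinear export nests word forms as <item type="txt"> (and punctuation as <item type="punct">) under phrase/words/word, as in the made-up sample below. The function name rebuild_phrases and the sample data are hypothetical, and whether ELAN's importer then treats the added item as the phrase tier is untested here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a FLEx interlinear export.
# Element and attribute names are assumptions; check them against a real file.
SAMPLE = """<document>
  <interlinear-text>
    <paragraphs>
      <paragraph>
        <phrases>
          <phrase>
            <item type="segnum" lang="en">1</item>
            <words>
              <word><item type="txt" lang="xxx">nai</item></word>
              <word><item type="txt" lang="xxx">walong</item></word>
              <word><item type="punct" lang="xxx">.</item></word>
            </words>
          </phrase>
        </phrases>
      </paragraph>
    </paragraphs>
  </interlinear-text>
</document>"""


def rebuild_phrases(root):
    """Give each <phrase> a phrase-level txt item joined from its word forms."""
    for phrase in root.iter("phrase"):
        # Leave phrases alone if they already carry phrase-level text.
        if any(i.get("type") == "txt" for i in phrase.findall("item")):
            continue
        text = ""
        lang = None
        for word in phrase.findall("./words/word"):
            # Only direct children, so morpheme-level items are ignored.
            for item in word.findall("item"):
                kind = item.get("type")
                if kind == "txt":
                    text += (" " if text else "") + (item.text or "")
                    lang = lang or item.get("lang")
                elif kind == "punct":
                    # Simplification: punctuation attaches to the preceding word.
                    text += item.text or ""
        attrs = {"type": "txt"}
        if lang:
            attrs["lang"] = lang
        phrase_item = ET.Element("item", attrs)
        phrase_item.text = text
        phrase.insert(0, phrase_item)
    return root


root = rebuild_phrases(ET.fromstring(SAMPLE))
print(root.find(".//phrase/item").text)
```

Opening quotation marks and similar leading punctuation would need extra handling, since this sketch always attaches punctuation to the word before it.
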
On the subject of SayMore, make sure that you are working with the new
version 3.0.171 available from saymore.palaso.org. The FLEx export has
been enhanced so that the annotation timings are now exported as well.
This means that for simple recordings a SayMore-FLEx-ELAN workflow is
possible (though still with the same problem mentioned above).
Tim

Ryan Pennington

Feb 25, 2014, 11:19:06 AM
to flex...@googlegroups.com
Hi Tim,

Thanks for your response. Yes, I noticed that there was no need to tokenize the phrase tier to produce a word tier, since FLEx does that task for us. Yes, I did switch to the new version of SayMore while I was working on this, which is what convinced me to include SayMore in the workflow.

The paper provides a SayMore–FLEx–ELAN–SayMore workflow. The text is segmented, transcribed, and translated in SayMore. Then the text is pulled into FLEx for incorporation of the lexicon. Then after interlinearization is finished the text is pulled into ELAN for a full time-aligned interlinear product. This final product is then added to the SayMore corpus. However, for dialogues and other multiple-speaker recordings, it's still important to know how to start in ELAN. So I have a couple sections in the paper dealing with this, which largely coincide with what you wrote in your paper. I'll send the current state of the paper to you off-list. I'd be interested to hear any comments you may have.

By the way, is there any way at all to copy tiers from one project to another? I haven't been able to find a way to do so, but that would be a workaround. Someone could just copy the phrase-level tier from the SayMore-created or ELAN-created text into the final interlinear text.
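
One way such a workaround might look outside ELAN is sketched below: overwrite the numbers on the re-imported tier with the text from the original phrase tier, matching annotations by position. This is only a sketch under stated assumptions: both files are .eaf XML, the two tiers hold the same number of annotations in the same order, and the tier names "phrase" and "segnum" as well as the function names are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed .eaf fragments; real files carry a header,
# time slots, and linguistic types as well.
ORIGINAL = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="phrase">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>nai walong</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

FINAL = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="segnum">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def tier_values(root, tier_id):
    """Return the ANNOTATION_VALUE elements of one tier, in document order."""
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") == tier_id:
            return list(tier.iter("ANNOTATION_VALUE"))
    raise KeyError(f"no tier named {tier_id!r}")


def restore_phrases(original_root, final_root, source_tier, target_tier):
    """Copy phrase text onto the numbered tier, pairing annotations by position."""
    source = tier_values(original_root, source_tier)
    target = tier_values(final_root, target_tier)
    if len(source) != len(target):
        # Refuse to guess when the two tiers do not line up one-to-one.
        raise ValueError("tiers differ in length; positional matching is unsafe")
    for src, dst in zip(source, target):
        dst.text = src.text
    return final_root


final_root = restore_phrases(
    ET.fromstring(ORIGINAL), ET.fromstring(FINAL), "phrase", "segnum"
)
print([v.text for v in tier_values(final_root, "segnum")])
```

Matching by position is fragile: if phrases were split or merged in FLEx, the counts differ and the sketch stops rather than misaligning text.
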

Ryan

J V C

Feb 25, 2014, 5:40:57 PM
to flex...@googlegroups.com

On 2/25/2014 10:19 AM, Ryan Pennington wrote:
> The paper provides a SayMore-FLEx-ELAN-SayMore workflow.
Awesome!