Adding a phonetic transcription line in Interlinear text


Gray Plunkett

Jun 7, 2022, 9:10:27 AM
to FLEx list
Hi, FLExers!

I've gotten a good start to configuring FLEx for my Foodo data, but before I go too far, there's a question I have regarding Interlinearizing.

I'd like to know if it's possible in FLEx to add a line between the Word Line and the Morpheme Line. Since my texts are typed up in the Orthography which doesn't mark all tones, I'd like to have a line after the Word (Text) line that would allow me to convert each word to an IPA transcription with all tones marked. Then I'd like that output to be the input of the parser to separate each word into the corresponding morphemes.

Word:                    Ɔnyɩm   ɔkʊ    a  náa  dɩ́gbalɩ    ŋdɛɛlɩ
Phonetic Trans:   ɔ̀ɲɩ́ḿ     ˈɔ́kʊ́    à  ná:  dɪ́g͜bálɪ̀    ǹdɛ̀:lɪ́
Morpheme:          O-ɲɩ́:-ḿ   O-kʊ́  à=ná:  dI-gbà-lÍ   ǹdɛ̀:lɪ́

I saw that in FLEx I can add another Word line with another Writing System, but when I tried that, I had to manually enter each word again. I'd like the Parser to give me choices from what would be a limited number of possible tonal changes, depending on where each word occurs in the phrase.

From the help menu, I thought I could add a Custom Field and then add a new Interlinear line, but that doesn't seem to be the case. Is it possible to do what I'd like to do or not?

Thanks for any help someone can give me with this.

Gray


Andy Black

Jun 7, 2022, 5:26:26 PM
to flex...@googlegroups.com, Gray Plunkett
Gray:

As far as I know, the only way to get something like you want is to add another Word line using another writing system.

You can use Bulk Edit Wordforms to try to fill in the tonal writing system based on what is in the orthographic writing system.

As you may know, there are three ways to interlinearize: manually, using the default XAmple parser, or using the Hermit Crab parser. If by "the Parser" you mean one of the latter two, then for either of them to process the words in the text using the tonal writing system, you'll need to make sure that the default vernacular writing system is set to the tonal one. I think the first Word line should also be for the tonal writing system in all cases. This means you'd get the opposite of the order you want. Maybe someone else can give a more definitive answer on this.

Neither of these two parsers looks beyond the current word for context. There is an experimental tool (Use TonePars with FLEx) that can look at surrounding words within a phonological phrase and apply user-written autosegmental tone rules to disambiguate which parses should be shown to the user when interlinearizing text. TonePars is a legacy program. You can see a description at https://www.sil.org/resources/publications/entry/7850. There is an example in section 3.3.6 of a tone process occurring across words. The user documentation for Use TonePars with FLEx (which basically assumes one is familiar with TonePars) is at https://github.com/sillsdev/pcpatrflex/blob/ToneParsFLEx/ToneParsFLExDll/doc/ToneParsFLExUserDocumentation.pdf.

Let me know off-list if you want to know more about TonePars.

--Andy
--
"FLEx list" messages are public. Only members can post.
flex_d...@sil.org
http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/8d88a067-31c8-44ba-b194-c2c84873bc34n%40googlegroups.com.

Kevin Warfel

Jun 8, 2022, 1:55:49 PM
to flex...@googlegroups.com

Hi, Gray.

 

I recently imported a Toolbox database into FLEx that included two forms for each entry—one with tone marks and the other without. The forms with no diacritics represented the orthographic conventions adopted by the community; those with diacritics contained the details necessary to enable FLEx to distinguish otherwise homographic forms and correctly parse words containing multiple morphemes.

 

As I prepared the data for import into FLEx, I sought counsel from a more-experienced colleague, and together we considered several options for the import of these two forms, which I’ll refer to as “orthographic” (no diacritics) and “phonological” (tone marked via the use of diacritics):

1) Import the phonological (PHON) form into the Lexeme Form field (used by the parser) and the orthographic (ORTH) form into the Citation Form field (used as the headword of the dictionary entry if present).

2) Import ORTH into the Lexeme Form field and treat PHON as a pronunciation, putting it in the Pronunciation field. (Since Citation Form would be empty in this scenario, the Lexeme Form would serve as headword of the dictionary entry.)

3) Import both ORTH and PHON into the Lexeme Form field but into two different writing systems.

 

For reasons that I can only partially recall at this point, we opted for the third of these possibilities, and it worked very well. I think this option would work well for you too. However, as Andy Black pointed out, you might need to accept having the ORTH data displayed below the PHON data in your interlinear in order to enable the parser to access the PHON data.

 

Note: If you choose at some point to also include data in IPA representation, you can add a third writing system and have all three forms in the Lexeme Form field.

 

Best wishes,

 

Kevin Warfel

Associate Dictionary & Lexicography Services Coordinator

Rapid Word Collection workshop consultant


Gray Plunkett

Jun 11, 2022, 2:21:49 PM
to FLEx list
Thanks for the help. I tried adding another Word line. From what you said, from what I saw in trying it out, and from reading section 13 of Ken Zook's FLEx tips, http://downloads.sil.org/FieldWorks/Documentation/Flex_tips.pdf, it seems that whatever is typed into the Phonetic line shows up in every other location where that wordform is used, so it's not possible to have more than one choice per wordform. What I need is to be able to choose the phonetic form that occurs due to tonal rules operating across word boundaries. For example, the indefinite article for Agreement Class 1 is <ɔkʊ>. When that word occurs, it can be manifested phonetically as either [ɔ̀kʊ́] or [ꜜɔ́kʊ́]. So what I'm looking for is for the interlinearizer to offer me those two phonetic forms to choose from when I am interlinearizing a given text with the word <ɔkʊ>. Is this possible? (Note: I'm talking about a line to do this BEFORE the Morpheme line, and I would ideally want the Morpheme line to come from the 2nd Word line, so putting these forms in as allomorphs wouldn't help me.) From a private message from Ken Zook, it doesn't seem possible.

I guess a solution would be to do the interlinearizing in Toolbox to go from Foodo orthography text to IPA text, then take that IPA text line, put it in the Texts area of FLEx, and start from there. I'm just trying to avoid having to completely retype a Foodo text so that I can interlinearize it in FLEx, and I want the interlinear text to be in IPA so that I can include it in any papers.

Maybe I'm just trying to make more work for myself than I need to. What do others using FLEx do? Do most people interlinearize from text that is in the standard orthography? And if they do, and then when writing a linguistic paper and they need an example, do they just re-write the Word line using standard IPA instead of the orthography?

Gray 

Mike Maxwell

Jun 11, 2022, 3:51:10 PM
to flex...@googlegroups.com
> What do others using FLEx do? Do most people interlinearize from text
> that is in the standard orthography? And if they do, and then when
> writing a linguistic paper and they need an example, do they just
> re-write the Word line using standard IPA instead of the
> orthography?

I think what people usually do is to use what they believe to be
phonemic or orthographic text in interlinear. In the early stages of
one's study of a language, before one has figured out the phonology,
that "phonemic" text may show what will turn out later to be predictable
phonetic (allophonic) variants. But once a phonetic variant has been
shown to be predictable, people usually stop explicitly writing it, and
maybe go back and change phonetic text to be more phonemic. I doubt
that most people go the other direction, intentionally writing
phonetic/allophonic variants of phonemic text. I suspect this is true
regardless of whether the conditioning environment is local (within the
same word) or larger (adjacent words, or based on intonation patterns).

One exception would of course be when writing a phonology description,
where allophonic variants would be shown along with the conditioning
environment. And here they probably write the variants by hand.

Orthographies sometimes omit phonemic differences, too, where they don't
bear much functional load. Tone is often one of the omitted things, or
sometimes voicing (like English doesn't distinguish voiced and voiceless
interdental fricatives in modern orthography).

So wanting to write predictable phonetic differences is not what most
people do.


Gray Plunkett

Jun 14, 2022, 6:07:39 AM
to FLEx list
Mmax,
Thanks for your response. The issue is not primarily phonemic vs. phonetic, rather having the text be in standard IPA vs the orthography. For a linguistic paper, don't you generally need to have your examples in IPA and not the orthography? For example, in Foodo [ɪ] is written <ɩ>, [ʧ] is <c>, and [ʤ] is <j>. And having all the tones marked is rather critical, which the orthography only shows partially.

Gray

Beth-docs Bryson

Jun 14, 2022, 9:09:28 PM
to flex...@googlegroups.com
I won’t comment on what is expected in linguistic papers (though if your orthography is roman, as it appears to be, I would think some contexts would be content with using the orthography and having a section to explain how it correlates with IPA).

Regarding representing the text in two scripts, it is possible, but there are some tricks in getting it set up well, and not ending up with duplicates.

And you are right, that for each Form in the Baseline of your text, if you want to represent that in a different WS, then there can be only one corresponding representation.  (That is, whatever WS is used for the Baseline is considered primary.  Then you can add forms in other WSs, and it is okay to have duplicates.  That is, more than one Baseline form can have the same representation in the secondary WS.  It just isn’t possible to have one Baseline form have more than one representation in the secondary WS.)

One way around this is to set this up so you are going in the “many to one” direction, instead of the “one to many”.  That is, make the Baseline of your text be in IPA, then in the Wordforms list, fill in the orthography version.  
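To illustrate the "many to one" direction: several tone-marked IPA forms can share one orthographic form, so a lookup keyed on the IPA form always has a single answer, while a lookup keyed on the orthography may not. The Python sketch below is only an illustration and not anything FLEx does internally. The helper names `toneless_key` and `group_by_key` and the list of tone marks are assumptions made for the example, and the toneless key is just a stand-in for the orthography, which in Foodo still marks some tones.

```python
import unicodedata
from collections import defaultdict

# Assumed set of combining tone diacritics (grave, acute, circumflex, caron, macron).
TONE_MARKS = {"\u0300", "\u0301", "\u0302", "\u030c", "\u0304"}
# Assumed non-combining prosodic symbols to drop: downstep arrow and stress mark.
PROSODIC_SYMBOLS = {"\ua71c", "\u02c8"}


def toneless_key(ipa_form: str) -> str:
    """Strip tone diacritics and prosodic symbols from one IPA wordform."""
    decomposed = unicodedata.normalize("NFD", ipa_form)
    kept = [
        ch for ch in decomposed
        if ch not in TONE_MARKS and ch not in PROSODIC_SYMBOLS
    ]
    return unicodedata.normalize("NFC", "".join(kept))


def group_by_key(ipa_forms):
    """Map each toneless key to the list of IPA forms that reduce to it."""
    groups = defaultdict(list)
    for form in ipa_forms:
        groups[toneless_key(form)].append(form)
    return dict(groups)


if __name__ == "__main__":
    # The two phonetic forms of the indefinite article from earlier in the thread.
    for key, variants in group_by_key(["ɔ̀kʊ́", "ꜜɔ́kʊ́"]).items():
        print(key, variants)
```

Each IPA form reduces to exactly one key, which is why an IPA Baseline fits the one-representation-per-wordform limit, whereas one key can collect several IPA forms.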

Granted, you said your texts are typed in the orthography. So you would first need to convert them to IPA and then paste them into a new text that is set to have IPA as the Baseline instead of the orthography. You could write a TECkit map (or use another approach) and apply it in Bulk Edit/Process.

But mapping from the orthography to IPA is not one-to-one because of the tones.  And I’m not thinking of a good way to automatically determine what tones should be added.
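The segmental part of such a conversion is mechanical, and a small Python sketch may make the limitation concrete. It uses only the three correspondences mentioned earlier in the thread (<ɩ> for [ɪ], <c> for [ʧ], <j> for [ʤ]). The function name `orth_to_ipa_segments` is made up for the example, this is not a TECkit map, and a real mapping would need the full inventory. Tone marks already present in the orthography pass through unchanged, and nothing here can supply the tones the orthography omits.

```python
import unicodedata

# Only the three orthography-to-IPA correspondences stated in the thread.
SEGMENT_MAP = str.maketrans({
    "\u0269": "\u026a",  # <ɩ> -> [ɪ]
    "c": "\u02a7",       # <c> -> [ʧ]
    "j": "\u02a4",       # <j> -> [ʤ]
})


def orth_to_ipa_segments(word: str) -> str:
    """Replace orthographic letters with IPA symbols; tone marks are left as typed."""
    # Decompose first so base letters are separated from any combining tone marks.
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.translate(SEGMENT_MAP))


if __name__ == "__main__":
    # A word from the sample sentence at the top of the thread.
    print(orth_to_ipa_segments("dɩ́gbalɩ"))
```

The output still has to be checked word by word for tone, which is the part that cannot be automated this way.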

Not sure if this is helpful, other than to confirm some of what you have already figured out.

-Beth

