Should IPA be an analysis language or a vernacular language?


Adam

Nov 21, 2012, 5:13:10 AM
to flex...@googlegroups.com

Hi,


I am writing with what feels like a very basic question. Should I be using IPA as a vernacular writing system or an analysis writing system?


I am not analyzing English, but this example serves. Suppose I have English with the Latin orthography as a vernacular writing system and IPA as a second vernacular writing system. Then suppose I create the following text with an orthographic baseline:


I've read a book.

Now I want to read a magazine.


In the glossing tab, first I gloss “read” as [ɹɛd] on the first line. Then I gloss “read” as [ɹid] on the second line. But now “read” on the first line becomes [ɹid]... and back-and-forth, ad infinitum. :-) (And this behavior is the same if I give the words different Word Glosses, etc.)


If, instead, I add IPA as an analysis writing system and enter my phonetic gloss in the Word Gloss field (with the IPA writing system), then I can enter distinct transcriptions in my Word Gloss fields. Although this works, for whatever reason it feels wrong (I suppose because of the definition of “gloss”). Is this the way it is supposed to work, however?


Thanks,

Adam

Beth (work) Bryson

Nov 21, 2012, 12:37:24 PM
to flex...@googlegroups.com
This is a really important concept to understand, so it's really good that you asked.

"Writing System" is a special term that refers to the combination of "language + script".  So "IPA" itself is not a writing system; it is only a script.  IPA can be used to write English or Nahuatl or French or whatever.  If you ever have an IPA writing system, do make sure there is also some language specified as well, and that will tell you if it should be analysis or vernacular.

The terms "vernacular" and "analysis" are referring to the language itself, and the script sort of comes along for the ride.  "vernacular" refers to:  "The language you are studying".  "analysis" refers to "The language of the audience you are writing for."  So, the actual words themselves are in the vernacular language, and any glosses or definitions or part of speech labels etc. are in the analysis language.

Note that with this understanding, then if you are doing a monolingual dictionary, the analysis language would be the same as the vernacular language, since the audience language is the same as the words being studied.  It is still a useful distinction.

So then in FLEx, if Spanish were the vernacular language, you could write Spanish in several different scripts.  Spanish in the official orthography would be one vernacular writing system, and Spanish in IPA would be another.  Whatever language you are glossing it into would be an analysis writing system.
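
As a rough illustration of the terms above (a sketch only, not FLEx's actual data model), a writing system can be pictured as a language paired with a script, with the vernacular/analysis role decided by the language. The class, field, and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritingSystem:
    """Hypothetical model: a writing system is a language plus a script."""
    language: str
    script: str


def role(ws: WritingSystem, vernacular_language: str) -> str:
    """The role follows the language; the script comes along for the ride.

    Simplification: in a monolingual dictionary the same language fills
    both roles, which this sketch does not try to represent.
    """
    return "vernacular" if ws.language == vernacular_language else "analysis"


spanish_orth = WritingSystem("Spanish", "official orthography")
spanish_ipa = WritingSystem("Spanish", "IPA")
english = WritingSystem("English", "official orthography")

for ws in (spanish_orth, spanish_ipa, english):
    print(ws.language, "/", ws.script, "->", role(ws, vernacular_language="Spanish"))
```

Both Spanish writing systems come out as vernacular regardless of script, which is the point: "IPA" on its own does not determine the role.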

In the example you gave below, you are not actually glossing the words--you are simply writing them in a different script.  So in this case, that is not a gloss, it's a transcription. To do that, you would want multiple copies of whatever row you are writing this into:  You could have multiple Word rows, one for English orthography and one for English in IPA.  In Bulk Edit Wordforms, you need multiple Form columns showing, one for each writing system, and then you want to bulk copy from one Form field to another.

At this point I get a little confused by the other things you have written:

 - You mention "the glossing tab".  Are you using the Gloss tab, or the Analyze tab?  
 - You said that when you glossed a word in the second sentence of your text, the gloss in the first sentence changed.  That doesn't sound right.  If you did something in the Word row, that might make more sense.  I probably need a screenshot to understand what you mean by that.

I hope some of this helps, and I can talk about it more if I have a little more detail (or screenshots) about what you are doing.

-Beth



J V C

Nov 21, 2012, 12:49:31 PM
to flex...@googlegroups.com
Just to expand on this slightly, the process for enabling a vernacular
writing system to also be used in analysis fields is:
- Format, Set up Writing Systems...
- Click Add next to the list of analysis WS's.
- Choose your vernacular. (Don't worry, it won't actually add a second
copy of the WS; it will just enable it for analysis use.)

Not that many of us are actually publishing monolingual dictionaries,
but even so it can be nice to store vernacular definitions.

To make an analysis language available as a vernacular, just click Add
next to the list of vernaculars (e.g. if you've been putting in glosses
from another dialect and now want to put them in Lexeme Form instead).
I'll ask whether this is actually a good idea in a separate thread.

Jon

Adam

Nov 22, 2012, 12:55:17 AM
to flex...@googlegroups.com
Hi,

My terminology was indeed confused wrt languages, scripts, writing systems, etc. Let me say things a bit differently (using my real-world situation instead of an example.)

My setup is this:

Vernacular writing systems:
Wakhi (Arabic script)
Wakhi (IPA)

Analysis writing systems:
English
Dari (Arabic script)

My texts have a baseline of Wakhi (Arabic script), because they are being transcribed by my language helper. So I paste that into the Baseline tab. Then I move to the "Gloss" tab. I have it configured to show these rows: Word (base), Word (IPA), Word Gloss (Dari), Word Gloss (English). The idea is to gloss and transcribe at the same time.

There is an orthographic form یرک which does double duty in this orthography, [jærk] 'work' and [jærək] 'for him'. But it doesn't seem possible to have different phonetic transcriptions in the Word (IPA) field. If I change one یرک  to [jærk] or [jærək], it changes them all.

My guess is that FLEx has a single representation for the baseline entity یرک which can be written with several writing systems -- there's no way to have two یرک 's in the "Word" row. In some ways that makes sense and in some ways it doesn't. But the question is: am I supposed instead to be doing my phonetic transcriptions in the "Word Gloss (IPA)" line, since that line allows there to be several different یرک 's?

Thanks,
Adam

Beth (work) Bryson

Nov 22, 2012, 1:06:22 AM
to flex...@googlegroups.com
Okay, yes, that is more clear.

This is the problem of trying to have a phonetic representation, including cases where a single form in the text has more than one phonetic representation.

Others have tried this (e.g., studying tone), but right now I can't remember the solution they came up with. You are right--FLEx isn't really designed for one form to have more than one transcription in the corresponding WSs.

No, I don't think putting it in the Word Gloss field is a good solution.

-Beth


Gordon

Nov 22, 2012, 4:14:35 PM
to flex...@googlegroups.com
Is this really any different than having 'read' PRES/IMPERATIVE and 'read' PAST?
It ought to be possible to choose to Add a new Entry and create a new lexical entry that has the same Wakhi forms and different glosses. Then in the Gloss tab you will need to choose between the two different possibilities for that baseline text.

Am I missing something?

Gordon

Beth (work) Bryson

Nov 22, 2012, 7:13:44 PM
to flex...@googlegroups.com
It is different because he is transcribing the surface form, not glossing it.

If he wanted to put it into the Gloss field he could, but he is right that it is not really a gloss.

The other people who have tried to do something similar were studying tone.  One surface orthographical form had several different phonetic realizations, because the tone changes depending on the words that are around it.  That's exactly what they were trying to study.  But FLEx isn't set up to have a one-to-many relationship among the Forms in different writing systems.
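
The behavior described earlier in the thread can be pictured with a toy model. This is a sketch under the assumption that every occurrence of a baseline spelling shares one wordform record holding at most one form per writing system; it is not FLEx's real code, and the function names and the "en-ipa" label are made up for illustration.

```python
# Toy model (assumption): one wordform record per baseline spelling,
# holding at most one form per writing system.
wordforms: dict[str, dict[str, str]] = {}


def set_form(baseline: str, ws: str, form: str) -> None:
    """Store the form for a writing system on the shared wordform record."""
    wordforms.setdefault(baseline, {})[ws] = form


def word_row(tokens: list[str], ws: str) -> list[str]:
    """Render the Word row in the given writing system for each token."""
    return [wordforms.get(tok, {}).get(ws, "") for tok in tokens]


line1 = ["I've", "read", "a", "book"]
line2 = ["Now", "I", "want", "to", "read", "a", "magazine"]

set_form("read", "en-ipa", "ɹɛd")  # transcribe the word while on line 1
set_form("read", "en-ipa", "ɹid")  # transcribe the word while on line 2

# Both lines share one record, so the second edit overwrites the first.
print(word_row(line1, "en-ipa")[1])
print(word_row(line2, "en-ipa")[4])
```

In this model there is simply nowhere to store a second IPA form for the same spelling, which is the one-to-many gap described above.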

One alternative would be to put some sort of "homograph number" into the form on the baseline.  Then there would be different surface forms in the orthography version of the Word line, and you could have different phonetic forms in the IPA version of the Word.  This is not how it is really written, but if the text is not getting exported, and is simply for study within FLEx (or even would only be exported for an academic audience), then it could work.
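
A minimal sketch of that workaround, assuming the homograph number is typed as digits directly after the word (for example "read1" and "read2"). The tagging convention and the helper name are hypothetical, not a FLEx feature; the helper only shows how the real spelling could be recovered if the text ever did need to leave FLEx.

```python
import re

# Assumption: digits glued to the end of a word mark the homograph.
# The lookbehind requires a letter before the digits, so plain numbers
# such as "2012" are left alone.
HOMOGRAPH_NUMBER = re.compile(r"(?<=[^\W\d_])\d+\b")


def strip_homograph_numbers(text: str) -> str:
    """Recover the ordinary spelling from a tagged baseline."""
    return HOMOGRAPH_NUMBER.sub("", text)


tagged = "I've read1 a book. Now I want to read2 a magazine."

# With distinct baseline spellings, each one can carry its own IPA form.
ipa_forms = {"read1": "ɹɛd", "read2": "ɹid"}

print(strip_homograph_numbers(tagged))
```

The cost is that the baseline no longer matches how the text is really written, so the stripping step would be needed for any non-academic export.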

-Beth

Adam

Nov 23, 2012, 10:12:55 PM
to flex...@googlegroups.com
To give the example going the other way, suppose I have an IPA-baseline text, [ɑj ɹɛd ə bʊk wɪθ ə ɹɛd kəvɹ]. The corresponding question would be where to put the orthographic representation, if I wanted to include that.

Happy thanksgiving to all who observed! :-)

Adam

Beth (work) Bryson

Nov 24, 2012, 1:42:50 AM
to flex...@googlegroups.com
Well, this would be a lot easier, assuming that each phonetic representation was unique.

Just add another Word line, set to the orthography WS.  Then you can either type the word in there, or you could use Bulk Edit Wordforms to begin to populate it (either by typing or by applying a process).

Does that help?

The trick is that the main Baseline has to have word spellings that are unique.  If you want more than one Form in a different WS for a word with a single spelling in the main WS, you have to find some way to make the main WS version different for those different transcriptions.
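
To see which baseline spellings would run into this, one could check a hand-aligned list of (baseline spelling, transcription) pairs for spellings that need more than one transcription. This is a sketch outside FLEx; the pair list and the function name are assumptions for illustration.

```python
from collections import defaultdict


def ambiguous_spellings(pairs):
    """Return baseline spellings paired with more than one transcription."""
    seen = defaultdict(set)
    for spelling, transcription in pairs:
        seen[spelling].add(transcription)
    return {s: t for s, t in seen.items() if len(t) > 1}


# Hypothetical hand-aligned tokens from an orthographic baseline.
pairs = [
    ("read", "ɹɛd"),
    ("a", "ə"),
    ("book", "bʊk"),
    ("read", "ɹid"),
    ("a", "ə"),
]

for spelling, transcriptions in ambiguous_spellings(pairs).items():
    print(spelling, sorted(transcriptions))
```

Only the spellings reported here would need to be made distinct in the main writing system; everything else can keep its ordinary spelling.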

-Beth

Adam

Nov 24, 2012, 2:30:47 AM
to flex...@googlegroups.com
Well, okay. I am surprised that this hasn't come up before. I recognize that FLEx is more oriented towards processing transcribed texts, but even so, homophony is not so rare.

Thanks,
Adam

Alexandre Arkhipov

Nov 24, 2012, 3:13:47 PM
to flex...@googlegroups.com
Beth,

If one uses IPA as the main Baseline, s/he will be forced to use it for parsing as well, right? Then it doesn't look so attractive because there could be much phonetic variation on top of morphology, let alone speech disfluencies etc.

In general, it seems to me quite a heavy limitation to have the same baseline serve both (i) as the input string for grammatical analysis and (ii) as the 'original' text (be it a transcription or a written/published text). Both spontaneous natural discourse and written texts usually need quite some cleanup/preprocessing (we usually call it normalization) before one can do morphology or syntax. And linguists normally need to keep both the original (e.g. transcript) and the normalized versions to arrive at a reliable and stable analysis.

I wonder whether this limitation can be removed in FLEx, at least in the remote future, though it seems to have deep roots in the design.

Best,
Sasha


Sat 24 Nov 2012 00:42:50, from "Beth (work) Bryson" <Beth-wor...@sil.org>:

J V C

Nov 26, 2012, 9:39:39 PM
to flex...@googlegroups.com
Creating two entries in the lexicon seems to work, once I configure the
interlinear view to display a second Lex Entries line (or Morphemes
line) in another WS. (At first, I tried adding a second Word line, but
that remained blank.) And I also had to select the one I wanted from the
Lex Entries line each time. I still ended up with only one wordform, but
with two approved analyses.
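
Roughly, the result can be pictured like this. It is only a sketch of the idea (one wordform, two approved analyses, each pointing at its own entry with forms in both vernacular writing systems), not FLEx's internal model; the class names and the "arabic"/"ipa" keys are invented for the illustration, and the forms are the ones given earlier in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical lexical entry with forms in two vernacular writing systems."""
    forms: dict[str, str]
    gloss: str


@dataclass
class Wordform:
    """One record per baseline spelling, with several approved analyses."""
    baseline: str
    analyses: list[Entry] = field(default_factory=list)


work = Entry({"arabic": "یرک", "ipa": "jærk"}, "work")
for_him = Entry({"arabic": "یرک", "ipa": "jærək"}, "for him")
wordform = Wordform("یرک", [work, for_him])

# Each occurrence in the text records which analysis was picked for it.
occurrences = [(wordform, 0), (wordform, 1)]
for wf, choice in occurrences:
    entry = wf.analyses[choice]
    print(entry.forms["ipa"], entry.gloss)
```

The distinct transcription lives on the entry rather than on the wordform, which is why the choice has to be made again for each occurrence.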

Is this close enough? I've attached a screenshot. If that doesn't come
through, it's also here:
https://www.sugarsync.com/pf/D342747_6399192_6860073

Just to make sure this is clear: wkh is the main vernacular WS in the
lexicon, but I picked the second vernacular WS when creating the
baseline of this text.

Jon

wakhi-ws.png