Interlinear Lookup issue

Craig Kopris

Jan 23, 2023, 11:04:33 PM

to shoeboxtoolbox-fiel...@googlegroups.com

Context: I have a series of texts by a previous researcher who used a more or less phonetic orthography at a level of detail such that any given token of a word may be spelled many different ways. In addition, I have a variety of word lists in a variety of orthographies from a variety of researchers, including the transcriber of the texts.

The texts are linked to a word-level lexicon where the regularized spelling is the record marker, and the orthographic variants are separate fields, each followed by the gloss of that token, and a source citation. In essence, each variant is treated as a subentry. In the marker hierarchy, sources are under glosses, which are under orthographic variants, which are under the regularized spelling. A variant can have multiple glosses from different citations:

regularized spelling
    [various fields, e.g. parse]
    orthographic variant 1
        gloss 1
            source 1
            source 2
        gloss 2
            source 3
    orthographic variant 2
        gloss 3
            source 4
    [etc]

Interlinearization consists solely of lookups, going from the orthographic variant to other data (gloss, citation, etc.).

Problem: While that setup was fine for the original interlinearization, performing an updated interlinearization after extensive additions to the lexicon runs into the problem of the lookups not being restricted to the original data "subentries".

I.e. lookup on orthographic variant 2 will find the correct regularized spelling and e.g. return the parse, but the lookup from orthographic variant 2 to gloss will take the very first gloss in the main entry, gloss 1, instead of gloss 3, often producing errors as those were not necessarily formatted for interlinearization (e.g. characters not allowed). Lookup from orthographic variant 2 to source will pull up an unsortable list of all citations in the whole entry, rather than source 4.
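
As a rough illustration of the two behaviours (a model of the symptom only, not Toolbox's actual lookup code), the Python sketch below contrasts a lookup that takes the first matching field anywhere in the record with one that stays inside the matched variant's subentry. The marker names `lx`, `pa`, `va`, `ge` and `sc`, the sample record, and both function names are hypothetical stand-ins for whatever the real lexicon uses.

```python
# Hypothetical record as (marker, value) pairs. Marker names are assumptions:
# lx = regularized spelling, pa = parse, va = orthographic variant,
# ge = gloss, sc = source citation.
RECORD = [
    ("lx", "regularized spelling"),
    ("pa", "parse"),
    ("va", "orthographic variant 1"),
    ("ge", "gloss 1"),
    ("sc", "source 1"),
    ("sc", "source 2"),
    ("ge", "gloss 2"),
    ("sc", "source 3"),
    ("va", "orthographic variant 2"),
    ("ge", "gloss 3"),
    ("sc", "source 4"),
]


def first_in_record(record, marker):
    """Unscoped lookup: first field with this marker anywhere in the record."""
    for mkr, value in record:
        if mkr == marker:
            return value
    return None


def under_variant(record, variant, marker):
    """Scoped lookup: all fields with this marker inside one variant's subentry."""
    found = []
    inside = False
    for mkr, value in record:
        if mkr == "va":
            # A new variant field opens a new subentry and closes the previous one.
            inside = (value == variant)
        elif inside and mkr == marker:
            found.append(value)
    return found
```

Here `first_in_record(RECORD, "ge")` yields gloss 1 regardless of which variant was matched, which is the unwanted result, while `under_variant(RECORD, "orthographic variant 2", "ge")` yields only gloss 3.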

Is there a way to tell the lookup to use the gloss from the orthographic variant looked up, rather than the first one in the main entry? Currently, I have to go into each entry and re-order the subentries so that the one needed is at the top, and re-order the fields in that subentry, etc. The next time another variant occurs in the text, I have to re-order the lexicon entry fields again.

Thanks,

- Craig

ToolBox SIL

Jan 23, 2023, 11:21:36 PM

to shoeboxtoolbox-fiel...@googlegroups.com

Hi, Craig,

I'm sure there is a way to do what you want. Have you looked at the hierarchy? It might be involved. Be sure the gloss is UNDER the variant.

Right now I was just heading for bed and I don't trust my brain. But I wanted you to know the message has been received and we'll look at it tomorrow (our time). GMail tells me you sent the message 15 minutes ago, so I'm hoping you're still online.

Karen

Toolbox Support

--
You received this message because you are subscribed to the Google Groups "Shoebox/Toolbox Field Linguist's Toolbox" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shoeboxtoolbox-field-ling...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/shoeboxtoolbox-field-linguists-toolbox/CAMw68uEgDvEeoR9EwK%3DHRAFakm7%3Dm8RwKGO4edcd4sxNtzC%2B9g%40mail.gmail.com.

ToolBox SIL

Jan 24, 2023, 7:25:11 PM

to shoeboxtoolbox-fiel...@googlegroups.com

Hi, Craig,

I think the hierarchy might really be worth a try. I see that the default for the MDF dictionary is to have the gloss under the sense number.

In case you aren't familiar with how to specify the hierarchy, here are some notes and screenshots:

1) Place your cursor in your lexicon.

2) Choose Database, Properties and select the gloss marker.

3) Click on Modify. Toolbox will bring up the Marker Properties dialog for that marker.

4) In the Marker Properties dialog for the gloss, drop the list for "Under what in the Hierarchy" and choose the variant marker.

Since you didn't specify, you may be using different markers than those I've used to illustrate. Use your markers!

5) Click OK twice -- until you're back to the main Toolbox window.

6) Now try some interlinear and see if it works. I don't have a good sample to try this with.

If it doesn't work, please send me your project with some data -- both text and lexicon -- where the problem occurs. Note, I don't mind having your whole text and lexicon, but I only need enough to illustrate the problem.

Blessings,

Karen

Toolbox Support

Craig Kopris

Jan 24, 2023, 7:25:45 PM

to shoeboxtoolbox-fiel...@googlegroups.com

Hi Karen,

I was still online but also tired enough to not trust my brain!

>"Be sure the gloss is UNDER the variant"

Ah, the gloss and the variants were both under the regularized spelling!

Fixing that brings up another issue, where there are three fields for variants, depending on character set. The primary field was for a legacy character set that only became covered by Unicode fonts in the last decade or so (the primary data source, of course), a second field for Unicode (most sources), and a third for another set still not covered in Unicode. The first two could be merged, though the reliability of the transcriptions makes keeping them separate practical, but the third isn't Unicode-compatible. Any given text uses a single character set variant field, so that's a minor tweak going from text to text.

Another issue with the gloss is that there are three gloss fields, depending on the source language, although that's also a minor adjustment since most texts are glossed in English.

I adjusted the hierarchy and did a test run, but the chosen gloss was still the first one in the entry. Looking at the word list entry containing a problematic variant, I noticed that there were 42 orthographic variants (yup, forty-two), but some of those were duplicates that could be merged. After merging, so that each variant was unique and each gloss per variant was unique, I retested. Instead of interlinearizing properly, I received the error message

Forced value not in lexicon: [gloss 1]

followed by

Forced value not in lexicon: [source 1]

and *** was output in the interlinearization. [gloss 1] and [source 1] were not the literal field data, but follow the model from my previous email. What I was looking for was the only gloss under the 8th variant.
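
For anyone doing a similar cleanup outside Toolbox, the merge step described above could be sketched in Python as follows. This is only an assumption about one way to do it, not the procedure actually used here; the function name `merge_variants` and the (variant, gloss, source) row layout are hypothetical.

```python
def merge_variants(rows):
    """Group (variant, gloss, source) rows so that each variant appears once
    and each gloss appears once per variant, collecting the sources.

    First-seen order is preserved because dicts keep insertion order
    (Python 3.7+).
    """
    merged = {}
    for variant, gloss, source in rows:
        sources = merged.setdefault(variant, {}).setdefault(gloss, [])
        if source not in sources:  # drop exact duplicate citations
            sources.append(source)
    return merged


# Example rows following the placeholder names used earlier in the thread.
ROWS = [
    ("orthographic variant 1", "gloss 1", "source 1"),
    ("orthographic variant 2", "gloss 3", "source 4"),
    ("orthographic variant 1", "gloss 1", "source 2"),
    ("orthographic variant 1", "gloss 2", "source 3"),
    ("orthographic variant 1", "gloss 1", "source 1"),  # exact duplicate
]
MERGED = merge_variants(ROWS)
```

The result maps each variant to its glosses and each gloss to its list of sources, mirroring the variant, gloss, source hierarchy of the lexicon.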

- Craig

Craig Kopris

Jan 24, 2023, 7:27:37 PM

to shoeboxtoolbox-fiel...@googlegroups.com

We overlapped responses!

I'll put together a manageable chunk of the data.

Thanks,

- Craig

ToolBox SIL

Jan 24, 2023, 9:35:16 PM

to shoeboxtoolbox-fiel...@googlegroups.com, cko...@gmail.com

Hi, Craig,

Feel free to send the info just to Toolbox @ sil.org (no spaces, of course). Most people are somewhat protective of their data, even in relatively small quantities.

It sounds challenging, but Alan and I discussed your situation before I wrote and considered at least one possible option for you if the hierarchy didn't work. We definitely need some data, though, for trying it out.

Blessings,

Karen
