Double diacritics in Toolbox

Rolf Hotz

Mar 4, 2022, 9:59:44 AM

to shoeboxtoolbox-fiel...@googlegroups.com

Dear members of the field ling. Toolbox group

I tried to post a question earlier but I think it didn't reach the forum so I thought I might just as well email you the same question.

My problem is that I would like to be able to put two diacritics on a vowel: tone (<á>, <à>, <â>) and nasalization (<ã>), i.e. something like this: (ã́).

I have already added all the tone-nasalization combinations under 'Language encoding > Sort order > Properties': A a Á á Â â À à Ã ã Ã́ ã́ Ã̂ ã̂ Ã̀ ã̀, and likewise for every vowel, but this does not seem to help Toolbox parse or recognize them. I have made sure that the same language encoding is used throughout.
I have also checked that the symbols (vowels with diacritics) are exactly the same, that Unicode is turned on, and that Toolbox is up to date.

Any hint helps!

best

Rolf

toolbox

Mar 4, 2022, 10:27:58 AM

to Shoebox/Toolbox Field Linguist's Toolbox

First, Rolf, apologies that your message didn't get posted sooner. Sometimes computers and I disagree, and your first message was one of those times.

Interlinear has problems recognizing characters when the text and the Sort order are not exactly compatible. For example, if á is a single character in the text but is composed of a plus the acute accent (i.e., two characters) in the sort order, Interlinear will see the á in the text as punctuation.
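The single-character versus two-character difference can be seen outside Toolbox. The following is a minimal Python sketch, not Toolbox code: it only shows that the two spellings of á are different strings until they are normalized. Whether Toolbox normalizes anything internally is not stated in this thread, and the helper name `same_after_nfc` is made up for the illustration.

```python
import unicodedata

# Two ways to encode what looks like the same letter on screen.
precomposed = "\u00e1"   # á as a single code point
decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT

def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A plain comparison treats them as different strings.
print(precomposed == decomposed)
# After normalization they compare equal.
print(same_after_nfc(precomposed, decomposed))
```

If the text and the sort order were typed with different keyboards, a comparison like the first one is the kind of mismatch to look for.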

I'm assuming something like this is happening in your case. It can happen if the text was typed with a different keyboarding approach than the one you are using to create the sort order. Similarly, different keyboards can create a mismatch between the way characters were formed in the text and how they are formed in the dictionary. In such a case, the words of the text won't be found in the dictionary. (Things like this happen in projects that have been going on for some time.)

One thing you can do is to select the character in the text that seems to be giving problems and paste it into the sort order. If the sort order accepts it, you will know that it was different in some way. The sort order will not allow duplicated characters. Or, paste the word into the dictionary and see if that helps.
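Another way to check whether two characters that look alike are really the same is to list their code points. This is a small Python sketch run outside Toolbox; the function name `describe` is hypothetical, and the sample string is just an a with a combining tilde and a combining acute.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

# a + combining tilde + combining acute, typed as three code points
for entry in describe("a\u0303\u0301"):
    print(entry)
```

Pasting the character from the text and the character from the sort order into the same call shows whether they are built from the same code points in the same order.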

I've assumed the simplest case here. I can imagine further complications, depending on what you are trying to do with parsing. For example, are any of your morphemes single diacritics (tone, etc.)? An example or two would be useful so I don't try to solve problems you aren't having.

Karen

Toolbox Support

Rolf Hotz

Mar 6, 2022, 11:47:42 PM

to Shoebox/Toolbox Field Linguist's Toolbox

Dear Karen

Thank you very much for your comment! I tried creating two test entries in the dictionary (ṹ and ũ) and they are recognized by Toolbox!

That seems to be fine! The problem comes when I try to add them as suffixes, i.e. -ṹ. That does not work for either double or single circumfixes (ṹ, ũ). It does work for diacritics that occur natively on my keyboard, such as á, ô, etc., but those were never a problem. I thought it could be related to 'secondary characters' in 'Sort order properties', so I played around with adding a hyphen ('-') there, but that didn't work either. In sum, there seems to be an incompatibility between hyphens and diacritics that are alien to the keyboard.

Should I add an example or the error that I get?

It would be great to have entries that are composed of one diacritic only! But I first have to figure out more about the language's morphophonology to add them with confidence.

best

Rolf

Rolf Hotz

Mar 7, 2022, 1:54:35 AM

to Shoebox/Toolbox Field Linguist's Toolbox

* when it says circumfixes it should say diacritics!

ToolBox Support

Mar 7, 2022, 5:21:18 PM

to shoeboxtoolbox-fiel...@googlegroups.com

Hi, Rolf,

Yes, please show any error you get. I see nothing wrong in the image you included.

But I'm thinking that you probably need to send me your project so I can poke around in it. Send it to

Toolbox @ sil.org

(no spaces, of course -- I spread it out because the Google Group would hide part of the address). Sending your project means the data, the *.prj file, and all the *.lng and *.typ files that are in the folder with the *.prj file.

Either that or we can get together online. You can Skype me at KB9876. Skype will let me see your screen.

As for your diacritics being alien to the keyboard, people have done interlinear in completely non-Roman scripts. So I'm not sure what is going wrong here. But interlinear has complex interactions, and it's not always easy for me to spot problems until I get ahold of the data and "play" with it for a while.

We will find and solve this problem! I know interlinear can do this.

Karen

Toolbox Support

ToolBox Support

Mar 8, 2022, 10:33:45 AM

to shoeboxtoolbox-fiel...@googlegroups.com

Rolf and I and Alan (the Toolbox programmer) had a good Skype meeting this morning. (It was nice to meet you, Rolf!)

The problem turned out to be the "Punctuation" option in the Language Encoding. He had filled it in, and when he cleared the Punctuation box, things started to work.

The Punctuation option was really designed for scripts like Chinese, which are next to impossible to list fully in the Sort Order. Normally the Interlinear feature discerns what is punctuation by looking in the Sort Order. If a character is not there, Interlinear assumes it is punctuation and discards it when doing its analysis. When something is in the Punctuation box, Toolbox Interlinear assumes only the characters in the Punctuation box are punctuation and that everything else is part of the word. Basically, if you can list your sort order, don't fill in the Punctuation box. (That comment needs to be put on the dialog box.)

Anyway, we discovered he had the diacritics individually in the Punctuation box. But diacritics are part of the word. That would explain why he couldn't do any combined diacritics -- Interlinear was discarding them as not part of the word.
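The rule described above can be pictured with a toy model. This is a rough Python sketch of the behavior as explained in this message, not Toolbox's actual code: the function `word_characters` and its arguments are invented for illustration, and it works on individual code points, whereas real sort order entries can be multi-character.

```python
def word_characters(text, sort_order, punctuation=None):
    """Toy model: keep only what Interlinear would treat as part of the word.

    sort_order and punctuation are sets of single code points (an assumption
    made to keep the sketch short).
    """
    if punctuation:
        # Punctuation box filled in: only those characters are discarded.
        return "".join(ch for ch in text if ch not in punctuation)
    # Punctuation box empty: anything missing from the sort order is discarded.
    return "".join(ch for ch in text if ch in sort_order)

TILDE, ACUTE = "\u0303", "\u0301"
sort_order = {"u", TILDE, ACUTE}
suffix = "u" + TILDE + ACUTE + "."   # u with tilde and acute, then a period

# Empty Punctuation box: the period is dropped, the diacritics stay.
print(word_characters(suffix, sort_order) == "u" + TILDE + ACUTE)
# Diacritics listed as punctuation: they are stripped, leaving a bare "u".
print(word_characters(suffix, sort_order, {TILDE, ACUTE, "."}) == "u")
```

In the second call the combining marks are thrown away before any lookup, which matches the symptom of combined diacritics never being recognized.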

In any case, problem solved.

Karen (& Alan)

Toolbox Support
