word initial special character


Rolf Hotz

Feb 25, 2026, 9:24:39 AM
to Shoebox/Toolbox Field Linguist's Toolbox
Dear members of this group,

I have set up a new Toolbox project and am having difficulty parsing words whose first letter is a special character, specifically an accented vowel (e.g. á, à). I added these characters in the Sort Order Properties tab, but the parser still fails on such words and outputs ***. So far, I have found a workaround by adding a letter before the accented character (e.g. aálowe 'inside' instead of álowe), but this is not an elegant solution. Do you have any hints as to why this error seems to affect only word-initial accented characters?

One example. Note that ak'uás is parsed without a problem.

006
\t  ko  álowe ak'uás     afchár ka   (...)
\m  ko  álowe ak'uá -s   afchár ka  
\g  ??? ***   thus  -GEN fire   like
\ps gr  ***   gr    -gr  n      gr

\f dentro, pues, del fuego (...)
\com 

 cheers
Rolf

Wayne Leman

Feb 25, 2026, 10:50:46 AM
to shoeboxtoolbox-fiel...@googlegroups.com
That's interesting. I use those same characters and have no difficulty parsing. Those special characters can be in any position in the word. 

I don't know how to fix your configuration but Karen will be able to help you. 

Wayne


Tony Naden

Feb 25, 2026, 12:29:51 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Maybe some are entered as precomposed (single code point) characters and some as base character plus combining accent?
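
A minimal Python sketch of how one could check for this outside Toolbox, assuming the data is stored as Unicode (UTF-8). The helper name `describe` and the two sample strings are made up for illustration; they are not taken from the actual project files.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# Two spellings that look identical on screen (hypothetical sample data).
precomposed = "\u00E1lowe"   # á stored as one code point
decomposed = "a\u0301lowe"   # a followed by COMBINING ACUTE ACCENT

# The raw strings differ, so a lookup of one will not match the other.
same_raw = precomposed == decomposed

# After normalizing both to NFC they compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

print(same_raw, same_nfc)
for row in describe(decomposed[:2]):
    print(row)
```

If the text and the dictionary give different code point lists for what looks like the same letter, that difference would be worth following up.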

Karen Buseman

Feb 25, 2026, 4:49:05 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Hi, Rolf,

That's really strange that it happens only when it's the first letter of a word. There are a few things to check in general, but I don't have a lot of confidence in them, since just putting a "normal" letter in front solves your problem.

1) Be sure that the language encoding for your dictionary's headword is the same as the \t and \m lines of your interlinear.

2) Check Tony's suggestion that the special characters might have different encodings. This can be a problem if the text and dictionary were typed at different times and potentially with different keyboards. (If this is an entirely new project including new data, then that probably isn't the issue.)
   One way to do this is to select your acute a, for example, from the interlinear and paste it into the Language Encoding in the Primary Sort sequence. If it's the same as what's already there, Toolbox will complain and not let you close the dialog box. But if it's different, then we have something to deal with. Do the same with the character from the dictionary.

3) Do Project, Language Encodings, and look at the list to make sure that you don't have any duplicated names in the left column. The file names can't be duplicated, but the internal name, which is what Toolbox goes by, can be duplicated without Toolbox complaining. This can happen if you've edited the Language Encoding with, say, Notepad, and saved the new version to a different file name. Toolbox picks up all the *.lng files that it sees in the folder with its *.prj file. If there are duplicates, the choice seems to be somewhat random.

4) Another thing you can do is to sort by the lexeme and then go to the start and then to the end of the dictionary. That's where Toolbox puts odd characters that aren't in the sort sequence. If there's some problem, your acute a and friends should show up there.
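
For anyone who prefers to check outside Toolbox, the idea behind 2) and 4) can be roughly sketched in Python. This is only an illustration under assumptions: the function name `chars_outside_sort_order` is made up, the sort sequence is passed in as a plain string rather than read from a *.lng file, and multi-character sort units (digraphs) are not handled.

```python
def chars_outside_sort_order(words, sort_order_chars):
    """Map each code point not found in the sort order to the words containing it."""
    allowed = set(sort_order_chars)
    report = {}
    for word in words:
        for ch in word:
            if ch not in allowed:
                report.setdefault(f"U+{ord(ch):04X}", set()).add(word)
    return report

# Hypothetical sort sequence containing precomposed á (U+00E1) and à (U+00E0).
sort_order = "a\u00E1\u00E0efklorsuw'"

# The same word typed two ways: precomposed á versus a + combining acute.
words = ["\u00E1lowe", "a\u0301lowe"]

result = chars_outside_sort_order(words, sort_order)
for code_point, found_in in result.items():
    print(code_point, sorted(found_in))
```

With this sample data only the combining accent U+0301 is reported, which is the kind of stray character that would end up at the start or end of the sorted dictionary.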

If you can't find anything that helps, send me the project at Toolbox @ sil.org (no spaces) and I'll frown at it and see what I can find. (Thanks for the vote of confidence, Wayne.) By project, I mean enough of your data to show the problem and all the various files in the same folder with the *.prj file. (If you used our New Project kit, it's the whole folder.)

I agree it's not elegant, but that's a very clever work-around!

Karen
Toolbox Support


Rolf Hotz

Feb 26, 2026, 3:22:15 PM
to Shoebox/Toolbox Field Linguist's Toolbox
Dear Karen, Tony and Wayne, thank you so much for your answers and help!

@Karen, I tried 1) and that did the trick! The \m line had a different language encoding from the dictionary headword, and now that they match, parsing works.

Again, thanks everyone for your help!
cheers
Rolf


Karen Buseman

Feb 26, 2026, 4:12:51 PM
to shoeboxtoolbox-fiel...@googlegroups.com
Delighted to hear that it's solved!

It still seems odd to me that it would only be a problem with that first character! Clearly I don't understand everything.

Karen
Toolbox Support
