effecting an orthography change throughout a FLEx database


Kevin Warfel

Jul 5, 2018, 11:00:44 AM
to flex...@googlegroups.com

Can anyone provide or point me to a methodology for effecting an orthography change throughout a FLEx database? I think I've seen this topic discussed before, but I can't recall or find anything about it now.

 

Here is the sort of thing the user I'm trying to help wants to do:

 

Change all occurrences of è to ɛ, whether they are in the Lexeme Form, Citation Form, Allomorphs, or Variants. Make the same change throughout the texts (which implies the desire for the same changes throughout the Wordforms list), ideally without having to redo the analysis of the texts that have already been interlinearized.

 

When the user and I attempted this task a short while ago, we were able to make comprehensive changes throughout the various fields in the lexicon by using the Bulk Edit Entries - Bulk Replace tool. That changed nothing in the texts or in the Wordform list, however. We saw the Bulk Edit Wordforms option under Texts & Words, but changing wordforms there did not change the texts themselves. In Word List Concordance, we could change the spelling of a particular word throughout all of the selected texts, but we didn't find a way to do a bulk replacement across the entire text corpus. Even if we had, these three operations (bulk edit lexicon, bulk edit wordforms, bulk replacement in texts) appear to be unrelated to each other, so any analysis previously associated with a word affected by the orthography change is no longer recognized and has to be redone.

 

If there is a way to do a wholesale implementation of an orthography change like this in FLEx, I’d like to know the “best practice” way to do it.

 

Thank you,

Kevin

 

Robert Hedinger

Jul 6, 2018, 4:36:53 AM
to flex...@googlegroups.com
I am wondering whether it would work to run the change on the *.fwdata file, assuming that every element that needs to be changed is tagged for that specific language. However, since I don't know enough about these things, somebody like Ken Zook would be better placed to answer that question.
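Robert's idea can be sketched in Python, assuming (as he does) that every vernacular string sits directly inside an element whose ws attribute names its writing system, such as <Run ws="seh"> or <AUni ws="seh">. The helper name and the "seh" code are illustrative only. Note that ElementTree re-serializes the whole document, so the output can differ from the original file in ways unrelated to the replacement (the XML declaration is dropped, for one); treat this as an illustration of the tagging idea, not as a safe way to rewrite a real .fwdata file.

```python
import xml.etree.ElementTree as ET


def replace_in_ws(xml_text, ws, old, new):
    """Replace `old` with `new` only in text directly inside elements whose
    ws attribute equals `ws` (hypothetical helper, for illustration)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.get("ws") == ws and elem.text:
            elem.text = elem.text.replace(old, new)
    # Re-serializing rewrites the whole document, not just the changed text.
    return ET.tostring(root, encoding="unicode")


sample = '<Str><Run ws="seh">akhakha</Run><Run ws="en">khaki</Run></Str>'
print(replace_in_ws(sample, "seh", "kh", "ƙ"))
# <Str><Run ws="seh">aƙaƙa</Run><Run ws="en">khaki</Run></Str>
```

Only the seh Run changes; the English "khaki" is left alone because its ws attribute does not match.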

Robert

--
You are subscribed to the publicly accessible group "FLEx list".
Only members can post but anyone can view messages on the website.
To change your status, please write to flex_d...@sil.org.
You can join this group by going to http://groups.google.com/group/flex-list.
---
You received this message because you are subscribed to the Google Groups "FLEx list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flex-list+unsubscribe@googlegroups.com.
To post to this group, send email to flex...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/flex-list/53e119264d2896bfb89ee911fb036c01%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Kevin Warfel

Jul 6, 2018, 5:16:32 AM
to flex...@googlegroups.com
Thanks, Robert. I was wondering the same thing. I wasn't sure whether there were potential complications I might not be aware of, but it did seem to me that this could be the quickest and cleanest way to make the change everywhere, since nothing else is being altered. However, I always approach editing the fwdata file directly with a great deal of caution, as I know that one unintended change to something not meant to be targeted could ruin the whole project. I will wait to hear from someone like Ken before going that route.

Meanwhile, if anyone knows a way from within the FLEx UI to make the change universally in the project, I'd try that first.

Kevin

Ken Zook

Jul 6, 2018, 9:39:28 AM
to flex...@googlegroups.com

No, there is no way within Flex.

 

It's quite easy to do on the fwdata file with CC (Consistent Changes) or a similar program. You are wise to be cautious about editing the fwdata file. Any time you attempt that, be sure to back up your project so you can restore it if things go awry. Also, if you are collaborating with S/R (Send/Receive), have everyone do S/R and stop working while you make the change. Then do S/R to pass the changes to your colleagues, so they can S/R and continue their work.
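Ken's advice to back up first can be made routine with a few lines of Python. This is only a file-level safety copy, presumably made with FLEx closed; the helper name and the "-backup-<timestamp>" naming are hypothetical, and FLEx's own backup feature (or S/R) remains the normal way to protect a project.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_fwdata(path):
    """Copy a .fwdata file to a timestamped file in the same folder.

    Returns the path of the copy. Hypothetical helper: run it with FLEx closed.
    """
    src = Path(path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.stem}-backup-{stamp}{src.suffix}")
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest
```

For example, backup_fwdata("Sample.fwdata") would create something like Sample-backup-20221215-111703.fwdata next to the original.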

 

Instructions for doing this are in section 6.1 of

https://software.sil.org/fieldworks/wp-content/uploads/sites/38/2016/10/FieldWorks-7-XML-model.pdf

If you need help, let me know.

 

Ken


susie....@gmail.com

Dec 6, 2022, 7:19:21 AM
to FLEx list
Hi all,
It's been over 4 years since this discussion, so I thought I'd reopen it to see whether there is now a way to do this in FLEx.
We just did a spelling reform and I updated all 3500 entries in our lexicon. However, we now have our texts to update, and I don't really want to go through them all and do them all manually, or use "Change Spelling" for each word that needs to be updated. I see that Bulk Edit Wordforms doesn't actually change the words in your texts either.
What I would love to see automated is: putting a tilde on every vowel that comes after a nasal consonant; putting a tilde on a vowel that comes after a nasalized vowel (so both vowels get the tilde); and inserting a nasal stop between a nasal vowel and an oral stop (except g).
Again, I can do it all manually, and I don't mind checking that everything looks OK and cleaning up the last bits and pieces afterwards, but it would be really nice if I didn't have to make all of these changes by hand in every text, or even go through the steps of "Change Spelling" for each of the words affected.
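Susie's three rules are context-sensitive rewrites, which regular expressions can express. Below is a minimal Python sketch for a single word. Everything language-specific in it is a hypothetical placeholder: the vowel and nasal inventories, the assumption that nasality is written with a combining tilde (U+0303), the order in which the rules apply, and the choice of a homorganic nasal (m before p/b, n before t/d, ŋ before k) for the third rule. The output is left in decomposed (NFD) form. Applied to a whole .fwdata file, it would still have to be restricted to vernacular text, as discussed in the replies.

```python
import re
import unicodedata

TILDE = "\u0303"  # combining tilde (assumed way of marking nasality)
VOWELS = "aeiou"  # hypothetical vowel inventory
NASALS = "mnɲŋ"  # hypothetical nasal consonants
# Hypothetical nasal to insert before each oral stop; g is deliberately absent.
STOP_TO_NASAL = {"p": "m", "b": "m", "t": "n", "d": "n", "k": "ŋ"}

V = f"[{VOWELS}]"
NASAL_V = f"{V}{TILDE}"  # a vowel followed by the combining tilde


def nasalize(word):
    """Apply the three spelling rules to one word; returns NFD text."""
    word = unicodedata.normalize("NFD", word)  # "ã" -> "a" + combining tilde
    # Rule 1: a vowel right after a nasal consonant gets a tilde.
    word = re.sub(f"([{NASALS}])({V})(?!{TILDE})", rf"\1\2{TILDE}", word)
    # Rule 2: a vowel right after a nasalized vowel gets a tilde. Repeat until
    # stable so the tilde spreads along a sequence of vowels.
    while True:
        new = re.sub(f"({NASAL_V})({V})(?!{TILDE})", rf"\1\2{TILDE}", word)
        if new == word:
            break
        word = new
    # Rule 3: insert a nasal between a nasalized vowel and a following oral stop.
    stops = "".join(STOP_TO_NASAL)
    return re.sub(f"({NASAL_V})([{stops}])",
                  lambda m: m.group(1) + STOP_TO_NASAL[m.group(2)] + m.group(2),
                  word)
```

With these placeholder settings, "mata" becomes "mãnta" and "mae" becomes "mãẽ", while "maga" only gains the tilde because g is excluded from the third rule.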
Thanks in advance!
Susie

steve...@sil.org

Dec 15, 2022, 11:17:03 AM
to FLEx list
Susie, I don't think the situation has changed in the last four years. It is possible to edit the fwdata file, which is in XML format.
I tested this on a sample database and got regular expressions to work after opening the file in Notepad++.

The key is to restrict changes to text in the vernacular writing system only. In the fwdata file, the writing system is coded with the string ws= followed by the language ID in quotes, for example ws="seh".

In my test I wanted to change every kh sequence in vernacular text to the Unicode character ƙ (k with hook).

I could do this with these regular expressions in Find/Replace:
Find what string: (ws="seh".+)kh(.+\<)

Replace string: \1ƙ\2

(The language code for the vernacular in this project is seh. I also made sure that "dot includes newline" was NOT selected.)

The expression searches for ws="seh", followed by one or more characters on the same line, then kh, then one or more additional characters on the same line, then a <. It writes back everything it matched unchanged, except that the kh becomes ƙ.
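A Python transcription of Steve's expression makes its behaviour easy to try on sample lines before touching a real file. The function name is illustrative. In Notepad++'s regex syntax, `\<` is, as far as I know, a start-of-word anchor rather than a literal "<"; the sketch uses a plain "<", which is what the explanation above describes. Because ".+" is greedy, a single pass changes only one "kh" per line (the last one), which is the issue raised in the next reply.

```python
import re

# Steve's Find/Replace pair transcribed for Python's re module.
# His final "\<" is written here as a plain "<" (see the note above).
STEVE_FIND = r'(ws="seh".+)kh(.+<)'
STEVE_REPLACE = r"\1ƙ\2"


def steve_replace_once(text):
    """One Replace All pass. Without re.DOTALL, "." does not match a newline,
    like leaving "dot includes newline" unticked in Notepad++."""
    return re.sub(STEVE_FIND, STEVE_REPLACE, text)


one_run = '<Run ws="seh">akhakha</Run>'
print(steve_replace_once(one_run))  # <Run ws="seh">akhaƙa</Run>
```

Each line is handled separately, so two lines with one "kh" each are both fixed in one pass; it is a second "kh" on the same line that gets missed.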

For more info, I wrote up my test here (including how to do it in Consistent Changes). 

Wes Peacock

Dec 16, 2022, 3:48:43 AM
to FLEx list
@Steve, I believe there is a bug in your Notepad++ solution: if there's more than one occurrence of kh on the same line of Sena text, only one of them will be changed. (This isn't true of the Consistent Changes version.)
If your instructions tell the user to run the find/replace repeatedly over the entire file until it doesn't find any more occurrences, that should work too.

Note also that repeatedly running over the entire file only works if the replacement doesn't contain the search text.
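Wes's workaround, repeating the replacement until nothing changes, can be sketched as a loop. The helper name and the pass limit are hypothetical; the limit guards against his second point, since a replacement that recreates the search text would keep the file changing forever, so the sketch stops and reports it instead.

```python
import re


def replace_until_stable(text, find, replace, max_passes=100):
    """Repeat re.sub until the text stops changing.

    max_passes is a hypothetical safety limit: if the replacement recreates
    the search text, the text never stops changing, so give up and say so.
    """
    for _ in range(max_passes):
        new = re.sub(find, replace, text)
        if new == text:
            return text
        text = new
    raise RuntimeError("still changing after max_passes; "
                       "does the replacement recreate the search text?")
```

With Steve's pair, '<Run ws="seh">akhakha</Run>' settles as '<Run ws="seh">aƙaƙa</Run>' after two changing passes; a pair such as k → kk never settles and raises the error.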

steve...@sil.org

Dec 16, 2022, 8:19:26 AM
to FLEx list
Thanks, Wes, I did not test those possibilities. 

Susie Locklin

Dec 16, 2022, 8:44:52 AM
to flex...@googlegroups.com
Thanks so much! I will try it out and let you know if I have any problems.
Susie


Jeff Heath

Dec 17, 2022, 8:23:14 AM
to FLEx list
I just wanted to throw in a Regular Expression comment. I see from Steve's Google Doc that the file contains lines like this:
<Run ws="seh">akhabweka</Run>
Steve used ".+" in his search expression and told you to make sure that "dot includes newline" was NOT selected, so it wouldn’t go onto the next line. But ".+" is a greedy expression, and can be dangerous even with this option. Consider if you have multiple Runs on one line like this (whether or not this is actually possible in the FLEx file):
<Run ws="seh">nothing here</Run><Run ws="abc">akhabweka</Run>
Steve’s expression would change the "kh" in the abc writing system string, because ".+" just blithely goes over the Run boundary.
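Jeff's example can be checked directly: applying Steve's expression (transcribed for Python, with a plain "<" at the end) to his two-Run line changes the "kh" in the abc Run.

```python
import re

STEVE_FIND = r'(ws="seh".+)kh(.+<)'

line = '<Run ws="seh">nothing here</Run><Run ws="abc">akhabweka</Run>'
changed = re.sub(STEVE_FIND, r"\1ƙ\2", line)
# The greedy ".+" runs past </Run>, so the abc text is altered:
print(changed)  # <Run ws="seh">nothing here</Run><Run ws="abc">aƙabweka</Run>
```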

I think it's safer to use an expression that stops at the end of that particular Run, by matching only characters that are not "<", since that character must appear at the end of the Run. So I would propose:

(with Regular Expressions activated)
Find: (ws="seh">[^<]+?)kh
Replace: \1ƙ

The magic is in the expression  [^<]+?
This finds one or more (+) of anything that is not a "<" character ([^<]), and turns off the "greediness" (?), which means the first time it sees a "kh" string, it will stop searching. (Actually the "?" to turn off the greediness is optional - it would just run a little faster if you have lots of Runs with multiple "kh" strings. If you don't include "?", it will replace the last occurrence in the Run first.)

Similarly to Steve's solution, this will not change multiple "kh" strings in the Run, so you need to run the Find/Replace more than once, until it doesn't find any more occurrences.
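Jeff's approach can be sketched the same way. There is one deliberate change, labelled in the code: "+?" is relaxed to "*?", because "[^<]+?" needs at least one character before "kh" and would therefore skip a Run that begins with "kh". The loop repeats the pass because each pass changes at most one "kh" per Run. Like Jeff's expression, this only matches ws="seh" immediately followed by ">", so a Run carrying further attributes after ws would not be matched. The names are illustrative.

```python
import re

# Jeff's expression, except "+?" is relaxed to "*?" so that a "kh" at the very
# start of a Run (e.g. <Run ws="seh">khama</Run>) is also matched.
JEFF_FIND = r'(ws="seh">[^<]*?)kh'
JEFF_REPLACE = r"\1ƙ"


def jeff_replace_all(text):
    """Re-run the replacement until stable; each pass fixes one kh per Run.
    This terminates because the replacement ƙ does not contain "kh"."""
    while True:
        new = re.sub(JEFF_FIND, JEFF_REPLACE, text)
        if new == text:
            return text
        text = new
```

On Jeff's two-Run line nothing changes, because "[^<]" cannot cross the "</Run>" boundary into the abc Run.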

I would recommend giving this a try with Notepad++. In the Find dialog, enter the Find what string, then select the Mark tab and click on the Mark All button to highlight all of the strings found.
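For a scripted dry run, a rough equivalent of Mark All is to list every match with its line number before replacing anything. The helper name is hypothetical, and because it works line by line it will not report a match that spans lines.

```python
import re


def preview_matches(text, find):
    """Return (line number, matched text) for every match, without changing anything."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in re.finditer(find, line):
            hits.append((line_no, m.group(0)))
    return hits


sample = "\n".join([
    '<Run ws="seh">akhabweka</Run>',
    '<Run ws="en">khaki</Run>',
    '<Run ws="seh">bweka</Run>',
])
for line_no, matched in preview_matches(sample, r'(ws="seh">[^<]+?)kh'):
    print(line_no, matched)  # 1 ws="seh">akh
```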

Hope that helps...

Joyce Wood

Feb 18, 2026, 11:31:40 PM
to FLEx list
Hi All, 

I have a similar question on this topic. 

Would it work to do two separate (long & tedious) steps:
  1. Use the "Change Spelling" tool in [Texts & Words] Word Analyses on each word that contains the special character, which is now being represented differently.
  2. Use [Lexicon] Bulk Edit to edit the words with the special character.
The hope is that the existing links in the Texts & Words Gloss and Analyze tabs stay intact.

thanks, 
Joyce

Kevin Warfel

Feb 19, 2026, 1:58:11 AM
to flex...@googlegroups.com
I believe that your proposed workflow would leave your links intact, but I would personally consider using Find/Replace in the .fwdata file with a plain text editor (Notepad++ is the one I'm used to using for this sort of thing) as a first attempt to effect the change. That would be much less work, but of course there are risks involved and special skill may be required. 

If you do that, you definitely want to do S/R (or make a backup if your project is not connected to Lexbox) prior to operating on the .fwdata file.
You would need the ability to use regular expressions in order to ensure that the Find/Replace applies only to text encoded in the target writing system.
Before doing S/R again after the Find/Replace operation, you would want to diligently verify that nothing got corrupted during that process.
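Part of that verification can be scripted. The sketch below (helper name hypothetical) confirms that the edited text still parses as XML and lists the lines that changed, so they can be reviewed one by one. It cannot judge whether each change was linguistically correct, and it is no substitute for opening the project in FLEx afterwards.

```python
import difflib
import xml.etree.ElementTree as ET


def check_edit(before, after):
    """Confirm the edited text is still well-formed XML and return its changed lines.

    Raises xml.etree.ElementTree.ParseError if the edit broke the structure.
    """
    ET.fromstring(after)
    diff = list(difflib.unified_diff(before.splitlines(), after.splitlines(),
                                     "before", "after", lineterm=""))
    # Skip the two "---"/"+++" header lines; keep only removed/added lines.
    return [line for line in diff[2:] if line.startswith(("-", "+"))]
```

Each changed line then appears twice in the result, once prefixed with "-" (old) and once with "+" (new).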

Finding and replacing an "unusual" character is a much simpler proposition than if the character needing to be replaced is part of the normal English alphabet. For example, replacing "ŋ" with "ng" is much simpler than replacing "ng" with "ŋ", since "ng" may be part of words that you don't want to change, while "ŋ" is less likely to occur other than where you want to change it.

SIL Global's Dictionary & Lexicography Services does this sort of thing for a fee (hourly rate). You may contact me directly at kevin_...@sil.org if this possibility interests you.

Best wishes,
Kevin
