Duplicate letters search


Hljodulfr

Mar 31, 2025, 10:36:39 AM
to FLEx list
Is there a way to search for words that contain a particular letter more than once?  I'm looking for a number of things (one of them being reduplication).

Bruce Cox

Mar 31, 2025, 11:30:57 AM
to flex...@googlegroups.com
You should be able to use regular expressions for this, but you will have to craft the regular expression for your particular language.
E.g., if you are looking for reduplication which manifests as a CV- prefix reduplicating the first C and V of a word written with English letters and no digraphs, you might filter your lexicon using:
\b([bcdfghjklmnpqrstvwxyz][aeiou])\1
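The \b marks a word boundary, [bcd...xyz] a consonant, [aeiou] a vowel. The parentheses capture whatever is matched -- i.e., one consonant followed by one vowel -- and the \1 then matches that same captured text again (the contents of group 1), so the whole pattern finds a CV sequence that is immediately repeated at the start of a word.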
Trying this on one of my databases (for which this is not particularly meaningful), it matches words like: dodon, kaka, soso, tatasai.
Hope this helps... but you'll still have some work to do tailoring a regular expression to your exact needs.
Cheers, bruce
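
For anyone who wants to try the CV pattern above outside FLEx, here is a minimal Python sketch. It assumes plain lowercase English letters, as the message does, and the word list is made up for illustration (the first four items are the matches reported above). Python's re module is not the engine FLEx uses, so treat this as a way to experiment rather than a guarantee of identical behaviour.

```python
import re

# Pattern from the message above: a consonant-vowel pair at the start of a
# word, immediately followed by the same pair again.
CV_REDUP = re.compile(r"\b([bcdfghjklmnpqrstvwxyz][aeiou])\1")

# Hypothetical word list for illustration only.
words = ["dodon", "kaka", "soso", "tatasai", "banana", "tree"]

# search() scans each string for the pattern; \b restricts it to word starts.
matches = [w for w in words if CV_REDUP.search(w)]
print(matches)
```

"banana" is not matched because its first CV pair "ba" is followed by "na", not by a second "ba". For a language with digraphs or non-English letters, the two character classes would need to be rewritten.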


Hljodulfr

Mar 31, 2025, 11:38:19 AM
to FLEx list
Working to get my head wrapped around that LOL.  What about just a letter?

Claire Bowern

Mar 31, 2025, 11:44:08 AM
to flex...@googlegroups.com
([a-z])\1 should match aa, bb, cc, but not ab, bc, etc.



--

Claire Bowern
Professor
Editor: Diachronica
Department of Linguistics, Yale University

Hljodulfr

Mar 31, 2025, 11:48:39 AM
to FLEx list
Getting close!  Thank you :).  However, I'm not after just two identical letters next to each other.  I also want to find words in which a letter appears more than once anywhere in the word.

Bruce Cox

Mar 31, 2025, 11:50:26 AM
to flex...@googlegroups.com
In that case ([a-z])\w*\1 should do it.
Cheers, bruce

Bruce Cox

Mar 31, 2025, 11:51:10 AM
to flex...@googlegroups.com
(The \w matches any word-forming character; * indicates zero or more of them.)
Cheers, bruce
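
To see how the two patterns in this exchange differ, the following Python sketch runs both over a small made-up word list. The names DOUBLED and REPEATED and the sample words are illustrative assumptions, and Python's re module may differ in details from the regular expressions in FLEx.

```python
import re

# Same letter twice in a row (the adjacent-letters pattern).
DOUBLED = re.compile(r"([a-z])\1")
# Same letter again later in the same word, with any word characters between.
REPEATED = re.compile(r"([a-z])\w*\1")

# Hypothetical word list for illustration only.
words = ["book", "banana", "tree", "dog", "level"]

doubled = [w for w in words if DOUBLED.search(w)]
repeated = [w for w in words if REPEATED.search(w)]

print("doubled: ", doubled)
print("repeated:", repeated)
```

Every word found by the first pattern is also found by the second, since \w* can match zero characters. Because \w does not match spaces, the repeat has to occur within a single word. Note that [a-z] only covers lowercase unaccented letters; other letters would need to be added to the class.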

Hljodulfr

Mar 31, 2025, 11:58:57 AM
to FLEx list
Huzzah!  Looks like that is what I need!  I'm also definitely going to use the CV line from above, too.  Is there a place to learn how this type of code/searching is done?  Seems like a great thing to know :).

Bruce Cox

Mar 31, 2025, 2:11:42 PM
to flex...@googlegroups.com
FLEx has pages in its Help about regular expressions. They are powerful -- but the power comes from having lots of (sometimes obscure) options, so there is a bit of a learning curve and it can be hard sometimes to work out why they aren't working as you hope. There are plenty of other web sites that offer assistance with learning about and testing regular expressions though.
I think they are worth the effort :)
Cheers, bruce

Jeff Heath

Apr 1, 2025, 4:49:53 AM
to FLEx list
To understand and test regular expressions (RegEx), I recommend using the site https://regex101.com/. (I would suggest selecting ECMAScript (JavaScript) as the "FLAVOR" in the panel to the left, as I believe that's the closest to the ICU regular expressions that FieldWorks uses.)

On this site you can enter your RegEx in the field at the top, and enter a test string or text in the large field below that. All of the matches will be highlighted, and in the panel to the right, it gives an explanation of the RegEx, and it provides information on all of the matches that it found.

As an example, I just went to the FieldWorks help, under "Examples of combinations of regular expressions", and copied this RegEx: "\b(\w+)\s+\1\b", which it says finds a word occurring twice in succession. I put the RegEx in the first field, typed a bit of sample text with a couple of repeated words, and this is what I get:

[Attached image: RegEx101.png]

It can be very helpful to get an explanation of your RegEx and see the matches change in real time as you change your RegEx.

Jeff
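
The repeated-word pattern quoted above can also be tried in a few lines of Python. This is only a sketch: the sample sentence is made up, and Python's re module is a different engine from the one FieldWorks uses, so results may not be identical in every case.

```python
import re

# Pattern quoted from the FieldWorks help: a whole word, some whitespace,
# then the same word again, ending at a word boundary.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b")

# Hypothetical sample text with two repeated words.
text = "This is is a test of the the pattern."

# finditer() yields one match object per non-overlapping match.
found = [m.group(0) for m in REPEATED_WORD.finditer(text)]
print(found)
```

The leading \b is what stops the "is" at the end of "This" from pairing with the following "is".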

Hljodulfr

Apr 21, 2025, 11:00:29 AM
to FLEx list
Sorry for the late reply, but thank you!