Regular expressions

Richard Gravina

May 10, 2012, 7:38:41 AM
to flex...@googlegroups.com
Hi,
 
I’m trying to filter for words that do not contain ‘s’ or ‘z’. Is there a way of doing this with regular expressions? If possible I’d like to filter for words that don’t contain any of a set of letters.
 
Richard Gravina

Susanna

May 10, 2012, 11:07:07 AM
to FLEx list
Hi Richard,
Yes, there is a way. It is not very elegant:

^[abcdefghijklmnopqrtuvwxy]*[^sz][abcdefghijklmnopqrtuvwxy]*$

You will need to add any other characters used in the data to the two
long abc lists.
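
To see how this pattern behaves, here is a minimal sketch using Python's re module. This is an assumption made for illustration: FLEx uses its own regular expression engine, which may differ in details. The helper name matches_pattern is hypothetical.

```python
import re

# The two long lists from the pattern above: every lowercase ASCII letter
# except 's' and 'z'.
ALLOWED = "abcdefghijklmnopqrtuvwxy"
PATTERN = re.compile(rf"^[{ALLOWED}]*[^sz][{ALLOWED}]*$")

def matches_pattern(word: str) -> bool:
    """Return True if the whole word matches the pattern (hypothetical helper)."""
    return PATTERN.search(word) is not None

for word in ["flex", "group", "size", "cabs", "FLEX"]:
    print(word, matches_pattern(word))
```

In this sketch "flex" and "group" match, while "size" and "cabs" do not. "FLEX" is also rejected even though it has no s or z, because capital letters are not in the two lists; that is why any other characters in the data have to be added to them.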

For questions on regular expressions, a good place to start is this:
When you have the regular expression check box selected, the down-arrow
button next to the text box becomes enabled. The menu it opens has
an option "Regular Expression help". From that Help topic,
select "Examples of regular expressions" and then "Examples of
combinations of regular expressions". There you'll find some useful
examples that will help you learn how to build these kinds of regular
expressions!

-Susanna



Sargon Hasso

May 10, 2012, 11:10:57 AM
to flex...@googlegroups.com
Try this regular expression:
\b[^sz ]+\b    <-- there is a space character inside [] after z
on this example:
new zcabs sz group all we za sw flex s z
==> matches every word except those that contain s or z, so it finds:
new
group
all
we
flex
 
explanation (optional read):
\b <-- matches at a word boundary (the edge of a word), so each match is a whole word
[^sz ] <-- any character not in the set, in this case anything except 's', 'z' and space
+ <-- one or more of the preceding item
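
The same test can be reproduced outside FLEx. This is a small sketch with Python's re module (an assumption for illustration; the FLEx engine may differ in details), run on the sample line from this message.

```python
import re

text = "new zcabs sz group all we za sw flex s z"

# \b anchors each match to word edges, so a word containing s or z cannot
# match in part: there is no word boundary in the middle of a word.
words = re.findall(r"\b[^sz ]+\b", text)
print(words)  # ['new', 'group', 'all', 'we', 'flex']
```

Note that [^sz ] also accepts punctuation and digits, so data containing those may need more characters added inside the brackets.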


--
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list

Richard Gravina

May 10, 2012, 12:08:16 PM
to flex...@googlegroups.com
Excellent. That did the trick.
 
Thanks

J V C

May 23, 2012, 4:22:35 AM
to flex...@googlegroups.com
This reply is rather late, but one technique I've found useful over and over is to refer to both the beginning (^) and the end ($) of the field in one regex. This forces the whole field to match, so the following example doesn't just match on "one or more occurrences of s or z" somewhere in the field; it forces the entire field to consist of such characters.

^[sz]+$

To force the field to contain only characters that are NOT s or z, you can use this (note the completely different meanings of the two ^ here):
^[^sz]+$

If you want empty fields to match as well, you could use asterisk instead:
^[^sz]*$
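
A quick way to compare the three anchored patterns is to run them over a few sample fields. The sketch below uses Python's re module as a stand-in (an assumption; FLEx's own engine may differ in details), and the sample field values are made up for illustration.

```python
import re

ONLY_SZ = re.compile(r"^[sz]+$")         # whole field is made of s and z
NO_SZ = re.compile(r"^[^sz]+$")          # whole field has no s or z, not empty
NO_SZ_OR_EMPTY = re.compile(r"^[^sz]*$") # same, but empty fields match too

fields = ["flex", "size", "zs", ""]  # hypothetical sample fields
results = {
    field: (
        bool(ONLY_SZ.search(field)),
        bool(NO_SZ.search(field)),
        bool(NO_SZ_OR_EMPTY.search(field)),
    )
    for field in fields
}
for field, flags in results.items():
    print(repr(field), flags)
```

Here "flex" matches only the two negated patterns, "zs" matches only ^[sz]+$, "size" matches none of them because it mixes both kinds of character, and the empty field matches only the asterisk version.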

The \b is somewhat similar to ^ and $ in that it matches on a word boundary (an edge/context rather than an actual character that could be stored/replaced). But it's not quite identical because \b can also match on word boundaries in the middle of the field. So, o\b  will find all fields containing any word ending in o, whereas o$ will only match where the last character in the field is o.
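
The difference between \b and $ can be checked with a short sketch in Python's re module (an assumption for illustration; FLEx's engine may differ in details). The sample fields are hypothetical.

```python
import re

ENDS_WORD_IN_O = re.compile(r"o\b")  # an 'o' at the end of any word
ENDS_FIELD_IN_O = re.compile(r"o$")  # an 'o' as the last character of the field

fields = ["go home", "hello", "stone"]  # hypothetical sample fields
results = {
    field: (
        bool(ENDS_WORD_IN_O.search(field)),
        bool(ENDS_FIELD_IN_O.search(field)),
    )
    for field in fields
}
for field, flags in results.items():
    print(repr(field), flags)
```

"go home" matches o\b (because of "go") but not o$, "hello" matches both, and "stone" matches neither since its o is in the middle of a word.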

The shortcut [0-9] is equivalent to [0123456789]; the shortcuts [a-z] and [A-Z] and [a-zA-Z] are similar (the last one is useful when "Match case" is ticked but you want any letter of either case).
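
The range shortcuts can be tried out in Python's re module, which is case-sensitive by default, roughly like having "Match case" ticked. This is a sketch under that assumption; the sample text is made up.

```python
import re

sample = "Flex 8 list 2012"  # hypothetical sample text

# The range and the spelled-out list find exactly the same digits.
digits_range = re.findall(r"[0-9]", sample)
digits_list = re.findall(r"[0123456789]", sample)

# With case-sensitive matching, [a-z] skips the capital F; [a-zA-Z] does not.
lower_only = re.findall(r"[a-z]+", sample)
either_case = re.findall(r"[a-zA-Z]+", sample)

print(digits_range == digits_list)
print(lower_only)
print(either_case)
```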

Jon