Regular expressions

Richard Gravina

May 10, 2012, 7:38:41 AM

to flex...@googlegroups.com

Hi,

I’m trying to filter for words that do not contain ‘s’ or ‘z’. Is there a way of doing this with regular expressions? If possible I’d like to filter for words that don’t contain any of a set of letters.

Richard Gravina

Susanna

May 10, 2012, 11:07:07 AM

to FLEx list

Hi Richard,
Yes, there is a way. It is not very elegant:

^[abcdefghijklmnopqrtuvwxy]*[^sz][abcdefghijklmnopqrtuvwxy]*$

You will need to add any other characters used in the data to the two
long abc lists.

For questions on regular expressions, a good place to start is this:
When you have the regular expression check box selected, the down-arrow
button next to the text box becomes enabled. On the menu in there,
there is an option "Regular Expression help". From that Help topic
select "Examples of regular expressions" and then "Examples of
combinations of regular expressions". In those you'll find some useful
examples which will help you learn how to make these kinds of regular
expressions!

-Susanna

Sargon Hasso

May 10, 2012, 11:10:57 AM

to flex...@googlegroups.com

Try this regular expression

\b[^sz ]+\b <-- there is a space character inside [] after z

on this example

new zcabs sz group all we za sw flex s z

==> matches all the words except those that contain s or z:

new

group

all

we

flex

explanation (optional read):

\b <-- matches at a word boundary, so each match has to be a whole word

[^sz ] <-- any character not in the set, in this case 's', 'z' and space

+ <-- one or more of the preceding
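
As a quick check of the pattern above outside FLEx, here is a minimal sketch in Python. It assumes Python's `re` module treats `\b` and negated character classes the same way as FLEx's regex engine for this plain-ASCII sample; the variable names are only illustrative.

```python
import re

# Sample text from the post above.
sample = "new zcabs sz group all we za sw flex s z"

# \b       word boundary
# [^sz ]+  one or more characters that are not 's', 'z', or a space
pattern = re.compile(r"\b[^sz ]+\b")

matches = pattern.findall(sample)
print(matches)  # ['new', 'group', 'all', 'we', 'flex']
```

Words such as "zcabs" are skipped entirely because there is no word boundary between "z" and "c", so the match cannot start partway through the word.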

--
You received this message because you are subscribed to the discussion group "FLEx list". This group is hosted by Google Groups and is open for anyone to browse.
To post to this group, send email to flex...@googlegroups.com
To unsubscribe from this group, send email to flex-list+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/flex-list

Richard Gravina

May 10, 2012, 12:08:16 PM

to flex...@googlegroups.com

Excellent. That did the trick.

Thanks

J V C

May 23, 2012, 4:22:35 AM

to flex...@googlegroups.com

This reply is rather late, but one technique I've found useful over and over is to refer to both the beginning (^) and the end ($) of the field in one regex. This forces the whole field to match, so the following example doesn't just match on "one or more occurrences of s or z" somewhere in the field; it forces the entire field to consist of such characters.

^[sz]+$

To force the field to contain only characters that are NOT s or z, you can use this (note the completely different meanings of the two ^ here):
^[^sz]+$

If you want empty fields to match as well, you could use asterisk instead:
^[^sz]*$
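
The three anchored patterns above can be compared side by side with a small sketch in Python. This assumes Python's `re` module behaves like FLEx's engine for these simple patterns. The `excluding` helper and the sample fields are hypothetical, added only to show how the same idea extends to any set of letters.

```python
import re

def excluding(letters):
    """Hypothetical helper: pattern for non-empty fields containing none of `letters`."""
    # re.escape guards against characters that are special inside [...]
    return re.compile("^[^" + re.escape(letters) + "]+$")

# Made-up field values for illustration.
fields = ["szz", "banana", "sabre", "maze", ""]

only_sz = re.compile(r"^[sz]+$")          # whole field consists of s/z only
no_sz = excluding("sz")                   # same as ^[^sz]+$
no_sz_or_empty = re.compile(r"^[^sz]*$")  # as above, but empty fields match too

def matching(pattern, items):
    return [item for item in items if pattern.search(item)]

print(matching(only_sz, fields))         # ['szz']
print(matching(no_sz, fields))           # ['banana']
print(matching(no_sz_or_empty, fields))  # ['banana', '']
```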

The \b is somewhat similar to ^ and $ in that it matches on a word boundary (an edge/context rather than an actual character that could be stored/replaced). But it's not quite identical because \b can also match on word boundaries in the middle of the field. So, o\b will find all fields containing any word ending in o, whereas o$ will only match where the last character in the field is o.
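
The difference between `o\b` and `o$` can be seen in a short Python sketch. The field values are made up, and Python's `re` module is assumed to behave like FLEx's engine for these two patterns.

```python
import re

# Made-up field values for illustration.
fields = ["hello world", "go", "oboe"]

ends_word = re.compile(r"o\b")   # an 'o' at the end of any word in the field
ends_field = re.compile(r"o$")   # an 'o' as the last character of the field

word_hits = [f for f in fields if ends_word.search(f)]
field_hits = [f for f in fields if ends_field.search(f)]

print(word_hits)   # ['hello world', 'go']
print(field_hits)  # ['go']
```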

The shortcut [0-9] is equivalent to [0123456789]; the shortcuts [a-z] and [A-Z] and [a-zA-Z] are similar (the last one is useful when "Match case" is ticked but you want any letter of either case).
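
A small Python sketch of the range shortcuts follows. Python's `re` is case-sensitive by default, which is assumed here to correspond to having "Match case" ticked; the sample strings are made up.

```python
import re

digits_short = re.compile(r"^[0-9]+$")
digits_long = re.compile(r"^[0123456789]+$")
lower_only = re.compile(r"^[a-z]+$")
letters_any_case = re.compile(r"^[a-zA-Z]+$")

# Made-up sample values for illustration.
samples = ["2012", "Flex", "flex", "a1"]

def matching(pattern):
    return [s for s in samples if pattern.search(s)]

print(matching(digits_short))      # ['2012']
print(matching(digits_long))       # ['2012']
print(matching(lower_only))        # ['flex']
print(matching(letters_any_case))  # ['Flex', 'flex']
```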

Jon
