Help with RegEx?

SparkyGuy

Jan 5, 2008, 11:07:58 PM

I want to build a regular expression that will find certain characters in a
field. For example:

i,n,t,u,o,n

all need to be present (at least once) for the RegEx interpreter to label
this search True. The order is not important, and case should be ignored.

I tried

[Ii][Nn][Tt][Uu][Oo][Nn]

and

[Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]

No joy.

I also tried using ^ and $, but I'm not sure how to implement them, or whether
they are required.

So basically I'm stumbling around in the dark. But I want to learn. I've
viewed several tutorials on-line, but this subject is so obtuse to me that
it's difficult even getting started.

Any suggestions would be greatly appreciated.

Thanks!

SparkyGuy

Jan 6, 2008, 1:43:11 AM

> [Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]

It turns out that this does work to find terms with all these letters in this
order even if there are other characters interspersed between them. Such as:

intuition
in3t5u7ition
in tuitio9n
inBBtuCCon
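To see the difference between the two attempts, here is a minimal sketch in Python (the thread never names a language or engine, so Python's `re` module is an assumption). It runs both patterns against the examples above, plus "noiiitunt", one of the any-order strings:

```python
import re

# The two attempts from the first post, compiled as written.
adjacent = re.compile(r"[Ii][Nn][Tt][Uu][Oo][Nn]")          # six letters side by side
ordered = re.compile(r"[Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]")  # gaps allowed, order fixed

samples = ["intuition", "in3t5u7ition", "in tuitio9n", "inBBtuCCon", "noiiitunt"]

for s in samples:
    # search() looks for a match anywhere in the string, not only at the start
    print(f"{s!r:16} adjacent={adjacent.search(s) is not None} "
          f"ordered={ordered.search(s) is not None}")
```

The first pattern needs the six letters next to each other; the second allows anything in between but still fixes the order, which is why "noiiitunt" fails it.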

Now I want to find terms that have all these characters in any order. Such
as:

noiiitunt
tn8no9uitii
otAAnuiBiit
unt ioitni

I guess this has something to do with the ^ and $ parsing metasymbols but I'm
not knowledgeable enough on this topic to know how, exactly.

Any help would be greatly appreciated.

Thanks!

SM Ryan

Jan 6, 2008, 1:43:13 AM

SparkyGuy <spar...@mumcrank.ck> wrote:
# I want to build a regular expression that will find certain characters in a
# field. For example:
#
# i,n,t,u,o,n
#
# all need to be present (at least once) for the RegEx interpreter to label
# this search True. The order is not important, and case should be ignored.
#
# I tried
#
# [Ii][Nn][Tt][Uu][Oo][Nn]
#
# and
#
# [Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]
#
# No joy.

The software really isn't intended to be used this way. It would
be simpler to conjoin a number of searches. Assuming an interface
like
numberofmatches = regexp(pattern,string)
you can do something like
regexp("[Ii]",string)==1
&& regexp("[Nn]",string)==2
&& regexp("[Tt]",string)==1
&& regexp("[Uu]",string)==1
&& regexp("[Oo]",string)==1
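A runnable sketch of the conjunction above, assuming Python. The `regexp` helper is hypothetical: a stand-in for the assumed `numberofmatches = regexp(pattern,string)` interface, built on `re.findall`. It keeps the exact counts used above (one each of i, t, u, o and two n's):

```python
import re

def regexp(pattern: str, string: str) -> int:
    """Hypothetical stand-in for the assumed interface: number of
    non-overlapping matches of pattern in string."""
    return len(re.findall(pattern, string))

def has_letters_exactly(string: str) -> bool:
    # Exact counts, mirroring the conjunction above.
    return (regexp("[Ii]", string) == 1
            and regexp("[Nn]", string) == 2
            and regexp("[Tt]", string) == 1
            and regexp("[Uu]", string) == 1
            and regexp("[Oo]", string) == 1)

print(has_letters_exactly("inBBtuCCon"))  # one of each, two n's
print(has_letters_exactly("intuition"))   # three i's, two t's
```

For "at least" rather than "exactly", change each `==` to `>=`.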

Some interfaces also allow a flag to ignore character case.
regexpnocase("i",string)==1
&& regexpnocase("n",string)==2
&& regexpnocase("t",string)==1
&& regexpnocase("u",string)==1
&& regexpnocase("o",string)==1
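In Python (again an assumption), the ignore-case flag is `re.IGNORECASE`. This sketch drives the same counts from a table and adds a switch between exact and minimum counts; `REQUIRED` and `matches_nocase` are hypothetical names:

```python
import re

# Required count per letter; n appears twice in "i,n,t,u,o,n".
REQUIRED = {"i": 1, "n": 2, "t": 1, "u": 1, "o": 1}

def matches_nocase(string: str, exact: bool = True) -> bool:
    for letter, count in REQUIRED.items():
        # re.IGNORECASE plays the role of the assumed regexpnocase()
        found = len(re.findall(letter, string, flags=re.IGNORECASE))
        wrong = found != count if exact else found < count
        if wrong:
            return False
    return True
```

With `exact=False`, "intuition" passes (it has at least the required letters), while "otAAnuiBiit" still fails because it contains only one n.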

To do this in a single RE, you have to use all 360 permutations
(6!/2!, since N appears twice),
[^IiNnTtUuOo]*[Ii][^IiNnTtUuOo]*[Nn][^IiNnTtUuOo]*[Tt]...
|[^IiNnTtUuOo]*[Ii][^IiNnTtUuOo]*[Uu][^IiNnTtUuOo]*[Nn]...
|...
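Writing the permutations out by hand is error-prone, so they can be generated instead. Because n appears twice, the six letters have 6!/2! = 360 distinct orderings. This Python sketch (names are hypothetical) builds the alternation in the style above, with runs of non-target characters between the letters. It anchors the alternation at both ends with `fullmatch`, so each letter count is exact:

```python
import itertools
import re

LETTERS = "intuon"            # the six required letters, n twice
FILLER = "[^IiNnTtUuOo]*"     # any run of characters that are not target letters

def letter_class(c: str) -> str:
    return f"[{c.upper()}{c}]"  # e.g. "i" -> "[Ii]"

# set() removes duplicates caused by swapping the two identical n's.
perms = sorted(set(itertools.permutations(LETTERS)))

alternatives = [FILLER + FILLER.join(letter_class(c) for c in p) + FILLER
                for p in perms]
exact_pattern = re.compile("(?:" + "|".join(alternatives) + ")")

def exact_match(s: str) -> bool:
    # fullmatch anchors the whole pattern, like wrapping it in ^...$
    return exact_pattern.fullmatch(s) is not None
```

The generated pattern is over 40,000 characters long, which is the practical argument for the conjoined searches.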

--
SM Ryan http://www.rawbw.com/~wyrmwif/
GERBILS
GERBILS
GERBILS

David Empson

Jan 6, 2008, 3:06:06 PM

SparkyGuy <spar...@mumcrank.ck> wrote:

> > [Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]
>
> It turns out that this does work to find terms with all these letters in this
> order even if there are other characters interspersed between them. Such as:
>
> intuition
> in3t5u7ition
> in tuitio9n
> inBBtuCCon
>
> Now I want to find terms that have all these characters in any order. Such
> as:
>
> noiiitunt
> tn8no9uitii
> otAAnuiBiit
> unt ioitni

To clarify: do you want at least one each of "I", "T", "U" and "O", and
at least two "N"s, in any order and mixed with any other characters (or
more of the same ones), ignoring case?

Tricky. You can't do that with a simple regular expression. Perl-compatible
engines can do it with lookahead assertions, but the result is hard to read
and not every tool supports them.
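For comparison, engines with lookahead assertions (Perl, PCRE, and Python's `re`, used here as an assumed example) can pack the five tests below into one pattern. Each `(?=...)` is checked from the start of the string without consuming characters, so the letters may appear in any order:

```python
import re

# One lookahead per requirement; none of them consumes input.
pattern = re.compile(
    r"^(?=.*[Ii])(?=.*[Nn].*[Nn])(?=.*[Tt])(?=.*[Uu])(?=.*[Oo])",
    re.DOTALL,  # let . also match newlines inside the field
)

def has_all_letters(s: str) -> bool:
    return pattern.search(s) is not None
```

It works, but each new requirement makes the pattern longer and harder to read, so separate tests remain the clearer option.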

It would be best done in parallel, testing the string against five
different regular expressions:

[Ii]
[Nn].*[Nn]
[Tt]
[Uu]
[Oo]

The string has to match all of these to pass.
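The parallel approach, sketched in Python (an assumed choice of language; `TESTS` and `passes_all` are hypothetical names):

```python
import re

# The five tests above; a string passes only if every one finds a match.
TESTS = [re.compile(p) for p in (r"[Ii]", r"[Nn].*[Nn]", r"[Tt]", r"[Uu]", r"[Oo]")]

def passes_all(s: str) -> bool:
    return all(t.search(s) for t in TESTS)
```

Of the any-order examples in the quoted post, "otAAnuiBiit" contains only one n, so it fails the second test.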

If you really don't need two "N"s then you can simplify the second test
to be like the others. If you want a certain number of each letter, but
in any position, then use the same general syntax as the second line.
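A sketch of that generalisation in Python: a hypothetical helper turns a letter and a minimum count into the `[Nn].*[Nn]` style of pattern, and a check runs one search per letter:

```python
import re

def count_pattern(letter: str, times: int) -> str:
    """Pattern needing at least `times` copies of `letter`, in either case.

    For example, count_pattern("n", 2) gives "[Nn].*[Nn]".
    """
    letter_class = f"[{letter.upper()}{letter.lower()}]"
    return ".*".join([letter_class] * times)

def passes(s: str, required: dict[str, int]) -> bool:
    # One search per letter; all of them must succeed.
    return all(re.search(count_pattern(c, n), s) for c, n in required.items())
```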

> I guess this has something to do with the ^ and $ parsing metasymbols but I'm
> not knowledgeable enough on this topic to know how, exactly.

Those just mean "start of string" and "end of string" respectively. For
example, if you want to only match a string which starts with "I" or "i"
then your regular expression is "^[Ii]".
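A short Python illustration of the two anchors (the patterns and strings are examples, not from the original post):

```python
import re

starts_with_i = re.compile(r"^[Ii]")       # ^ anchors at the start of the string
ends_with_n = re.compile(r"[Nn]$")         # $ anchors at the end of the string
whole = re.compile(r"^[Ii].*[Nn]$")        # both: the entire string must fit

for s in ["Intuition", "tuition", "intuitions"]:
    print(s, bool(starts_with_i.search(s)), bool(ends_with_n.search(s)),
          bool(whole.search(s)))
```

None of this helps with "any order", which is why the anchors were not needed for the original problem.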

--
David Empson
dem...@actrix.gen.nz

Reinder Verlinde

Jan 6, 2008, 3:25:24 PM

In article <0001HW.C3A53B75...@news.sf.sbcglobal.net>,
SparkyGuy <spar...@mumcrank.ck> wrote:

> I want to build a regular expression that will find certain characters in a
> field.
>
> For example:
>
> i,n,t,u,o,n
>
> all need to be present (at least once) for the RegEx interpreter to label
> this search True. The order is not important, and case should be ignored.

The 'order is not important' part makes a regular expression a 'less
than ideal' solution for your problem.

> I tried
>
> [Ii][Nn][Tt][Uu][Oo][Nn]
>
> and
>
> [Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]
>
> No joy.
>
> I also tried using ^ and $, but I'm not sure how to implement them, or
> whether they are required.
>
> So basically I'm stumbling around in the dark.

The 'I tried', and 'I am not sure' parts already carried that message,
but it is good to hear that you know that you do not really know what
you are doing.

> But I want to learn. I've viewed several tutorials on-line, but this
> subject is so obtuse to me that it's difficult even getting started.

I guess that you are at the stage where 'everything looks like a nail,
even your thumb'. Regular expressions are powerful, but not suited for
every job, and this is one of those jobs. You can create a regular
expression for this, but it would have to list all 360 permutations of
the six letters (that would be 720 if the six letters were all different),
for a regular expression of length around 34 * 360 + 2 * 359.
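The arithmetic can be checked in Python. This sketch (an illustration, not part of the original reply) counts the distinct orderings and builds the alternation with `.*` between letters, joined by a single `|`:

```python
import itertools
import re
from math import factorial

LETTERS = "intuon"
CLASS = {c: f"[{c.upper()}{c}]" for c in set(LETTERS)}  # "i" -> "[Ii]", ...

distinct = set(itertools.permutations(LETTERS))
print(len(distinct))                        # 360
print(factorial(6) // factorial(2))         # 360: 6! divided by 2! for the two n's

# One alternative per ordering, letters separated by ".*" as in the OP's attempt.
alternatives = [".*".join(CLASS[c] for c in p) for p in distinct]
print(len(alternatives[0]))                 # 34: 6 classes * 4 chars + 5 ".*" * 2 chars
pattern = "|".join(alternatives)
print(len(pattern))                         # 12599
print(bool(re.search(pattern, "noiiitunt")))  # True
```

A single `|` separator gives 34 * 360 + 359 = 12,599 characters, the same order of size as the estimate above.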

> Any suggestions would be greatly appreciated.

For me, the #1 rule when building regular expressions is: when your
regular expression does not do what you think it should do, shorten it,
and check (in a simple test program) that the shorter one does what you
think it should do.

In your case, you might want to start with "[Ii].*[Nn]" and work from
there.
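A simple test program of that kind, sketched in Python (the sample strings come from this thread; `PIECES` and `matches_prefix` are hypothetical names). It grows the ordered pattern one letter at a time and shows where each sample stops matching:

```python
import re

# Pieces of the OP's ordered pattern; joining the first k with ".*" gives
# the k-th shortened version, starting from "[Ii].*[Nn]".
PIECES = [r"[Ii]", r"[Nn]", r"[Tt]", r"[Uu]", r"[Oo]", r"[Nn]"]
samples = ["intuition", "in3t5u7ition", "noiiitunt"]

def matches_prefix(k: int, s: str) -> bool:
    return re.search(".*".join(PIECES[:k]), s) is not None

for k in range(2, len(PIECES) + 1):
    pattern = ".*".join(PIECES[:k])
    print(pattern, [s for s in samples if matches_prefix(k, s)])
```

"noiiitunt" drops out as soon as `[Uu]` is added, which points at the fixed order as the problem rather than a syntax error.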

<http://www.regular-expressions.info/> might help you.

If you have access to a Windows machine: have you seen
<http://www.regexbuddy.com/test.html>? I have not used it myself, but
heard positive comments about it.

> Thanks!

In article <0001HW.C3A5A5B8...@news.sf.sbcglobal.net>,
SparkyGuy <spar...@mumcrank.ck> wrote:

> > [Ii].*[Nn].*[Tt].*[Uu].*[Oo].*[Nn]
>

> It turns out that this does work to find terms with all these letters in this
> order even if there are other characters interspersed between them. Such as:
>
> intuition
> in3t5u7ition
> in tuitio9n
> inBBtuCCon

That should work with most, if not all, regular expression libraries.
See for example <http://www.regextester.com/>. Which one are you using?

> I guess this has something to do with the ^ and $ parsing metasymbols

What makes you think that?

Reinder

Paul Floyd

Jan 6, 2008, 3:25:32 PM

On Sat, 05 Jan 2008 23:07:58 -0500, SparkyGuy <spar...@mumcrank.ck> wrote:
> I want to build a regular expression that will find certain characters in a
> field. For example:
>
> i,n,t,u,o,n

Exactly which language/library/regex engine are you using?

In any case, it isn't easy in a single regexp.

See you soon,
Paul
--
Paul Floyd http://paulf.free.fr