Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

regular expression for number validity

93 views
Skip to first unread message

qazmlp

unread,
Feb 20, 2003, 6:02:49 AM2/20/03
to
I have to allow a number only if it satisfies the following conditions:
- it should have 3 digits
- One of the digits in that, must be '2'.
- There should be only one '2' in 3 digits

I need a regular expression to check this validity. Can anybody suggest it?

Erik Max Francis

unread,
Feb 20, 2003, 6:08:07 AM2/20/03
to
qazmlp wrote:

2[0-9][0-9]|[0-9]2[0-9]|[0-9][0-9]2

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ In the little things / And the joy they bring
\__/ India Arie
WebVal / http://www.alcyone.com/pyos/webval/
URL scanner, maintainer, and validator in Python.

Tapani Tarvainen

unread,
Feb 20, 2003, 6:47:14 AM2/20/03
to
Erik Max Francis <m...@alcyone.com> writes:

> qazmlp wrote:
>
> > I have to allow a number only if it satisfies the following
> > conditions:
> > - it should have 3 digits
> > - One of the digits in that, must be '2'.
> > - There should be only one '2' in 3 digits
> >
> > I need a regular expression to check this validity. Can anybody
> > suggest it?
>
> 2[0-9][0-9]|[0-9]2[0-9]|[0-9][0-9]2

That doesn't work, it allows multiple 2's.
Besides it's not a real regular expression; although we
can probably presume extended regexp's would be OK,
but OP should specify exactly what type(s) are acceptable
(egrep-, awk-, or perl-style maybe?), or what tool(s)
can be used. In a shell script I'd use shell globbing,
probably with a 'case' construct.

The fix (in extended regexp) is rather obvious, but I won't post it,
unless OP gives some reason to believe this isn't homework.

--
Tapani Tarvainen

laura fairhead

unread,
Feb 20, 2003, 8:43:32 PM2/20/03
to

What is the context?

You can validate individual numbers easily with "case",

case "$NUMBER" in
2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2) valid=TRUE;;
*) valid=FALSE;;
esac

Or use 'egrep' to filter a list of numbers with the same expression,

egrep '2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2' number_list


byefornow
laura

--
alt.fan.madonna |news, interviews, discussion, writings
|chat, exchange merchandise, meet fans....
|Get into the groove baby you've got to... check us out!

Tapani Tarvainen

unread,
Feb 21, 2003, 1:40:57 AM2/21/03
to
LoveMrs...@madonnaweb.com (laura fairhead) writes:

> On 20 Feb 2003 03:02:49 -0800, qazml...@rediffmail.com (qazmlp) wrote:
>
> >I have to allow a number only if it satisfies the following conditions:
> > - it should have 3 digits
> > - One of the digits in that, must be '2'.
> > - There should be only one '2' in 3 digits

[...]


> Or use 'egrep' to filter a list of numbers with the same expression,
>
> egrep '2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2' number_list

That doesn't quite work: it matches numbers with more than three
digits, too. You'll need to add anchors or option -x.

--
Tapani Tarvainen

mats.bl...@htu.se

unread,
Feb 21, 2003, 8:45:35 AM2/21/03
to
qazmlp <qazml...@rediffmail.com> wrote:
> I need a regular expression to check this validity. Can anybody suggest it?


Here is Laura's expression modyfied and wrapped up in a little silly
script that lets you play around with your number-testing.
//Mats


bash$ cat skript
#!/bin/bash
#
while : ;do
read -p 'Enter a number: ' VAR
if echo "$VAR" | grep -E \
'^(2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2)$' \
2>&1 >/dev/null ;then
echo OK: $VAR
else
echo NOT OK: $VAR
fi
done
bash$


Sample run:

bash$ ./skript
Enter a number: 122
NOT OK: 122
Enter a number: 123
OK: 123
Enter a number: # <-- ( Press ^C to quit )
bash$


--
My code (if any) in this message are Copyright (C) 2003 Mats Blomstrand
and licensed under GNU GPL, http://www.gnu.org/licenses/gpl.html

laura fairhead

unread,
Feb 22, 2003, 12:54:57 PM2/22/03
to
On 21 Feb 2003 08:40:57 +0200, Tapani Tarvainen <tt+gn20030...@it.jyu.fi>
wrote:

Yes, you're absolutely right - that was very quickly transcribed
from the 'case' statement pattern, should be;

egrep '^(2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2)$' number_list

I've been wondering whether this sort of match was possible with
only a single (POSIX) basic regular expression, as a precursor to the
more difficult problem of constructing a single BRE to match
the negation of a word in the thread;

Message-ID: <2548491.1...@dbforums.com>
Subject: regular expressions
From: yan <membe...@dbforums.com>
Newsgroups: comp.unix.shell
Date: Wed, 19 Feb 2003 14:40:59 +0000

All that I can think of thus far is;

^\(2[013-9][013-9]\)\{0,1\}\([013-9]2[013-9]\)\{0,1\}\([013-9][013-9]2\)\{0,1\}$

Which also matches a some (5) invalid combinations. The number of invalid
combinations could be reduced to only one (being an empty line) if
the end of line anchor could be embedded in each sub-expression, but
since a POSIX BRE does not require to support anchors in sub-expressions
that would not be feasible.

I've also been wondering, if it is not possible to construct a BRE to do this,
how one might go about proving it mathematically,

>
>--
>Tapani Tarvainen

byefornow
laura # http://lf.1accesshost.com

Tapani Tarvainen

unread,
Feb 24, 2003, 9:08:35 AM2/24/03
to
LoveMrs...@madonnaweb.com (laura fairhead) writes:

> egrep '^(2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2)$' number_list
>
> I've been wondering whether this sort of match was possible with
> only a single (POSIX) basic regular expression

I don't think it is.

> All that I can think of thus far is;
>
> ^\(2[013-9][013-9]\)\{0,1\}\([013-9]2[013-9]\)\{0,1\}\([013-9][013-9]2\)\{0,1\}$
>
> Which also matches a some (5) invalid combinations.

Five? I can see more...
It only produces a legal combination if one and only one of the \{0,1\}s
evaluates to 1, otherwise you get wrong number of digits.
Each of them can give 81 different values,
if all of them are 1 you get 81^3 alternatives,
if two are 1 you get 3*81^2, if none you get one more so
in total there're 81^3 + 3*81^2 + 1 = 551125 illegal combinations.

Or did I misunderstand something?

> The number of invalid
> combinations could be reduced to only one (being an empty line) if
> the end of line anchor could be embedded in each sub-expression, but
> since a POSIX BRE does not require to support anchors in sub-expressions
> that would not be feasible.

There's a reason for that limitation, by the way: it would bring the
alternation operator in via the back door so to speak, and thereby
radically change the mathematical properties of BREs and algorithms
that can be used with them.

> I've also been wondering, if it is not possible to construct a BRE to do this,
> how one might go about proving it mathematically,

You need to study formal languages and generative grammars.
Incidentally, such proofs are much easier with BREs than
with EREs - perhaps not surprisingly.

--
Tapani Tarvainen

laura fairhead

unread,
Feb 24, 2003, 7:27:45 PM2/24/03
to
On 24 Feb 2003 16:08:35 +0200, Tapani Tarvainen <tt+gn20030...@it.jyu.fi>
wrote:

>LoveMrs...@madonnaweb.com (laura fairhead) writes:


>
>> egrep '^(2[013-9][013-9]|[013-9]2[013-9]|[013-9][013-9]2)$' number_list
>>
>> I've been wondering whether this sort of match was possible with
>> only a single (POSIX) basic regular expression
>
>I don't think it is.

No, me neither, still I'd like to be able to have a mathematical method
to be able to prove it one way or another for any given problem.

>
>> All that I can think of thus far is;
>>
>> ^\(2[013-9][013-9]\)\{0,1\}\([013-9]2[013-9]\)\{0,1\}\([013-9][013-9]2\)\{0,1\}$
>>
>> Which also matches a some (5) invalid combinations.
>
>Five? I can see more...
>It only produces a legal combination if one and only one of the \{0,1\}s
>evaluates to 1, otherwise you get wrong number of digits.
>Each of them can give 81 different values,
>if all of them are 1 you get 81^3 alternatives,
>if two are 1 you get 3*81^2, if none you get one more so
>in total there're 81^3 + 3*81^2 + 1 = 551125 illegal combinations.
>
>Or did I misunderstand something?

Yeah, when I used the word 'combination' I actually meant the number
of regular expressions this would match. Since there are 3 expressions
and each can be either on or off that makes 8 total, then you get
the 3 valid expressions, ie;

^\(2[013-9][013-9]\)\{0\}\([013-9]2[013-9]\)\{0\}\([013-9][013-9]2\)\{0\}$ =
^\(2[013-9][013-9]\)\{1\}\([013-9]2[013-9]\)\{0\}\([013-9][013-9]2\)\{0\}$ = v1
^\(2[013-9][013-9]\)\{0\}\([013-9]2[013-9]\)\{1\}\([013-9][013-9]2\)\{0\}$ = v2
^\(2[013-9][013-9]\)\{1\}\([013-9]2[013-9]\)\{1\}\([013-9][013-9]2\)\{0\}$ =
^\(2[013-9][013-9]\)\{0\}\([013-9]2[013-9]\)\{0\}\([013-9][013-9]2\)\{1\}$ = v3
^\(2[013-9][013-9]\)\{1\}\([013-9]2[013-9]\)\{0\}\([013-9][013-9]2\)\{1\}$ =
^\(2[013-9][013-9]\)\{0\}\([013-9]2[013-9]\)\{1\}\([013-9][013-9]2\)\{1\}$ =
^\(2[013-9][013-9]\)\{1\}\([013-9]2[013-9]\)\{1\}\([013-9][013-9]2\)\{1\}$ =

>
>> The number of invalid
>> combinations could be reduced to only one (being an empty line) if
>> the end of line anchor could be embedded in each sub-expression, but
>> since a POSIX BRE does not require to support anchors in sub-expressions
>> that would not be feasible.
>
>There's a reason for that limitation, by the way: it would bring the
>alternation operator in via the back door so to speak, and thereby
>radically change the mathematical properties of BREs and algorithms
>that can be used with them.

Yeah, and make them just as difficult to program, and because of the complexity
added you couldn't optimise them as much... so why have 2 different sorts of
RE at all?.. Yeah, I understand :-)

>
>> I've also been wondering, if it is not possible to construct a BRE to do this,
>> how one might go about proving it mathematically,
>
>You need to study formal languages and generative grammars.

Well. I like to think about these things and see if I can work them out myself,
so I'm going to continue to keep going over the problem -gives me something
interesting to stew over, I love problems especially tough looking ones and
if nobody sets them for me I must take the ones I can get ;-) And I'm starting
to get some idea's about this now,

I did skim through a paper on regular expressions at some point,hideous number
of references really -I think I sort of jumped right in at the deep end actually
lol .. but somehow I managed to gleen some information from the thing, so I've
got something to go on. Maybe I'll have a look out for some more documentation
if it gets to the weekend and I still haven't got anywhere. Anywhere on-line
you recommend reading?

>Incidentally, such proofs are much easier with BREs than
>with EREs - perhaps not surprisingly.

Not at all surprising :)

>
>--
>Tapani Tarvainen

Tapani Tarvainen

unread,
Feb 25, 2003, 6:41:15 AM2/25/03
to
LoveMrs...@madonnaweb.com (laura fairhead) writes:

>... so why have 2 different sorts of
> RE at all?.. Yeah, I understand :-)

Historically the reason is simply the availability of a simpler
(and in some sense more efficient) algorithm form BREs.
Quoth an old man page:

"The grep command searches for patterns that are limited regular expressions
as described under Regular Expressions. The grep command uses a compact,
nondeterministic algorithm.
[...]
The egrep command uses a deterministic algorithm that needs exponential
space."

> >Incidentally, such proofs are much easier with BREs than
> >with EREs - perhaps not surprisingly.
>
> Not at all surprising :)

Oops. That was a typo: BREs are actually harder in general, due
to the difficulty introduced by backreferences and because
the alternation operator is part of standard RE in formal language
theory, and thus there are some readily available results that can
be used in such impossiblity proofs.

If you disallow backreferences you should be able to come up with
a simple proof for the case at hand without too many difficulties.

>I did skim through a paper on regular expressions [...] Anywhere on-line
>you recommend reading?

Not really... but you can find an example of such proofs here:

http://www.wikipedia.org/wiki/Pumping_lemma

--
Tapani Tarvainen

laura fairhead

unread,
Mar 3, 2003, 3:14:38 AM3/3/03
to
On 25 Feb 2003 13:41:15 +0200, Tapani Tarvainen <tt+gn20030...@it.jyu.fi>
wrote:

>LoveMrs...@madonnaweb.com (laura fairhead) writes:
[]

>> >Incidentally, such proofs are much easier with BREs than
>> >with EREs - perhaps not surprisingly.
>>
>> Not at all surprising :)
>
>Oops. That was a typo: BREs are actually harder in general, due
>to the difficulty introduced by backreferences and because
>the alternation operator is part of standard RE in formal language
>theory, and thus there are some readily available results that can
>be used in such impossiblity proofs.

I forgot about those things; haven't used them in quite a while. It made
me wonder again, whether a back-reference to a sub-expression that
matches more than once should match what was first matched and whether
that was defined by the standard (don't have time to look right now)

>
>If you disallow backreferences you should be able to come up with
>a simple proof for the case at hand without too many difficulties.

I'm getting more familiar with the mathematics behind this stuff at
the moment, I will investigate already done formal proofs soon :-)

>
>>I did skim through a paper on regular expressions [...] Anywhere on-line
>>you recommend reading?
>
>Not really... but you can find an example of such proofs here:
>
>http://www.wikipedia.org/wiki/Pumping_lemma

Okay, thanx for the reference :) I've been thinking a bit more about this
all and one thing I did just find is that although it may be impossible
to contruct a basic regular expression to catch those values the inverse
regular expression is possible to do;

^\([0-9]*[^0-9].*\)\{0,1\}\([^2]*2[^2]*2.*\)\{0,1\}\([^2]\{3\}.*\)\{0,1\}.\{0,2\}\(.....*\)\{0,1\}$

\([0-9]*[^0-9].*\)\{0,1\} matches anything with a non-digit
\([^2]*2[^2]*2.*\)\{0,1\} matches anything with two or more '2's
\([^2]\{3\}.*\)\{0,1\} matches anything with 3 non-'2's
(ie: any 3 digit number with no '2')
.\{0,2\}\(.....*\)\{0,1\} matches 0,1,2,4 or more characters

Now I'm back to wondering if it is possible to match say the inverse
of;

^arbitrary_string$

If all the characters of arbitrary_string are unique it seems to be possible
via a similar technique to the above,

>
>--
>Tapani Tarvainen

bestwishes
laura

Stephane CHAZELAS

unread,
Mar 3, 2003, 4:53:20 AM3/3/03
to
On Mon, 03 Mar 2003 08:14:38 GMT, laura fairhead <LoveMrs...@madonnaweb.com> wrote:
[...]

> ^\([0-9]*[^0-9].*\)\{0,1\}\([^2]*2[^2]*2.*\)\{0,1\}\([^2]\{3\}.*\)\{0,1\}.\{0,2\}\(.....*\)\{0,1\}$

neat.

[...]


> Now I'm back to wondering if it is possible to match say the inverse
> of;
>
> ^arbitrary_string$
>
> If all the characters of arbitrary_string are unique it seems to be possible
> via a similar technique to the above,

Yes, even if characters are not unique. For example, for abc:
^\([^a]..\)\{0,1\}\(.[^b].\)\{0,1\}\(..[^c]\)\{0,1\}.\{0,2\}\(.....*\)\{0,1\}$

Another challenge is to reverse /abcdef/ with extended RE. I
came once with:

^([^a]|a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[abc]*c(a*a[^ab]|[ab]*b(a
*a[^ab]|[^ac])|[^ad])|[abcd]*d(a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[
abc]*c(a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[^ad])|[^ae])|[abcde]*e(a
*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[abc]*c(a*a[^ab]|[ab]*b(a*a[^ab]|
[^ac])|[^ad])|[abcd]*d(a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[abc]*c(a
*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[^ad])|[^ae])|[^af]))*[abcde]*$

(on one line)

But could not prove it worked (characters need to be unique).

--
Stéphane

laura fairhead

unread,
Mar 4, 2003, 3:07:20 PM3/4/03
to
On 03 Mar 2003 09:53:20 GMT, Stephane CHAZELAS <stephane...@yahoo.fr>
wrote:

What do you mean by 'reverse' ? I'm sure you can't mean finding the inversion
of the expression because we covered that above but I can't think what it
could be and breaking down the expression doesn't help me either,

>--
>Stéphane

byefornow

Stephane CHAZELAS

unread,
Mar 4, 2003, 7:21:02 PM3/4/03
to
On Tue, 04 Mar 2003 20:07:20 GMT, laura fairhead <LoveMrs...@madonnaweb.com> wrote:
[...]
>>Another challenge is to reverse /abcdef/ with extended RE. I
>>came once with:
>>
>>^([^a]|a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[abc]*c(a*a[^ab]|[ab]*b(a
>>*a[^ab]|[^ac])|[^ad])|[abcd]*d(a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[
>>abc]*c(a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[^ad])|[^ae])|[abcde]*e(a
>>*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[abc]*c(a*a[^ab]|[ab]*b(a*a[^ab]|
>>[^ac])|[^ad])|[abcd]*d(a*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[abc]*c(a
>>*a[^ab]|[ab]*b(a*a[^ab]|[^ac])|[^ad])|[^ae])|[^af]))*[abcde]*$
>>
>>(on one line)
>>
>>But could not prove it worked (characters need to be unique).
>>
>
> What do you mean by 'reverse' ? I'm sure you can't mean finding the inversion
> of the expression because we covered that above but I can't think what it
> could be and breaking down the expression doesn't help me either,

I mean that '^([^a]|a*a[^ab]|[ab]*b(a*a[...]^af]))*[abcde]*$'
is supposed to match any line that doesn't contain '/abcdef/'

grep -E '^([^a]|a*a[^ab]|[ab]*b(a*a[...]^af]))*[abcde]*$'

would be the same as:

grep -v abcdef

It's much more difficult that finding lines that *are* not (as a
whole) abcdef (or don't start with abcdef).

--
Stéphane

0 new messages