matching exactly a 4 digit number in python

harijay

Nov 21, 2008, 4:46:57 PM

Hi,
I am a few months into Python. I have used regexps before in Perl
and Java, but am a little confused by this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string.

However, the regexp
p = re.compile(r'\d{4}')

matches even sentences that contain numbers longer than 4 digits.
For example, it matches "I have 3324234 and more".

I am very confused. Shouldn't \d{4,} match exactly four digit
numbers, so that a sentence with a 5 digit number is not matched?

Here is my test program's output, with the program itself given below.
Thanks for your help
Harijay

PyMate r8111 running Python 2.5.1 (/usr/bin/python)
>>> testdigit.py

Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED and and as dad
SKIPPED A has 13123123
SKIPPED A 13123
SKIPPED 123 adn
Matched 1312 times I have told you
DONE

#!/usr/bin/python
import re

# Test strings
x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ", "random1341also and also",
     "", "13", " a 1331 saves", " and and as dad", " A has 13123123",
     "A 13123", "123 adn", "1312 times I have told you"]

# Pattern that produced the output above (note the trailing space after {4})
p = re.compile(r'\d{4} ')

for elem in x:
    if p.search(elem):
        print("Matched " + elem)
    else:
        print("SKIPPED " + elem)

print("DONE")
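Note that the program compiles r'\d{4} ' with a trailing space, not the r'\d{4}' quoted earlier in the post. A small Python 3 check against strings from the list suggests how that space accounts for the output: the four digits must be followed immediately by a space.

```python
import re

# Same pattern as the test program: four digits followed by one space.
p = re.compile(r'\d{4} ')

# Only the last four digits of "3324234" are followed by a space,
# so the match is "4234 " rather than a whole number.
m = p.search(" I have 3324234 and more")
print(repr(m.group()))  # '4234 '

# "1341" is followed by "a", and "13123" ends the string with no space,
# so these two strings are SKIPPED.
print(p.search("random1341also and also"))  # None
print(p.search("A 13123"))                  # None
```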

Mr.SpOOn

Nov 21, 2008, 5:07:10 PM
to harijay, pytho...@python.org

2008/11/21 harijay <har...@gmail.com>:

> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits.
> For example, it matches "I have 3324234 and more".

Try with this:

p = re.compile(r'\d{4}$')

The $ character matches the end of the string. It should work.
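A quick Python 3 check of this suggestion on a few strings from the original test list. Because `$` ties the four digits to the end of the string, a trailing space prevents a match, while a longer number at the end still matches on its last four digits.

```python
import re

p = re.compile(r'\d{4}$')

tests = [" I have 2004 rupees ", " A has 13123123",
         "A 13123", "1312 times I have told you"]
for s in tests:
    m = p.search(s)
    # Print the matched text, or None when there is no match.
    print(repr(s), m.group() if m else None)

# ' I have 2004 rupees ' None
# ' A has 13123123' 3123
# 'A 13123' 3123
# '1312 times I have told you' None
```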

Mark Tolonen

Nov 21, 2008, 5:09:56 PM

"harijay" <har...@gmail.com> wrote in message
news:7424ff80-c645-4b30...@j38g2000yqa.googlegroups.com...

> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string

Try:
p = re.compile(r'\b\d{4}\b')

-Mark
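A short Python 3 check on strings from the original list. `\b` marks a boundary between a word character (letter, digit or underscore) and a non-word character or the start/end of the string. Since letters and digits are both word characters, a number glued to letters, as in "random1341also", is not found by this pattern.

```python
import re

p = re.compile(r'\b\d{4}\b')

print(bool(p.search(" I have 2004 rupees ")))      # True
print(bool(p.search(" I have 3324234 and more")))  # False: no boundary inside 3324234
print(bool(p.search("random1341also and also")))   # False: no boundary between "m" and "1"
```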

John Machin

Nov 21, 2008, 5:12:11 PM

On Nov 22, 8:46 am, harijay <hari...@gmail.com> wrote:
> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits.
> For example, it matches "I have 3324234 and more".

No, it doesn't. When used with re.search on that string, it matches
3324; it doesn't "match" the whole sentence.
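In Python 3 terms, `re.search` returns a match object for the first place the pattern fits, and the matched text is just those four digits:

```python
import re

m = re.search(r'\d{4}', "I have 3324234 and more")
print(m.group())  # 3324  (only the four digits, not the sentence)
print(m.span())   # (7, 11)  start and end offsets in the string
```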

>
> I am very confused. Shouldn't \d{4,} match exactly four digit
> numbers, so that a sentence with a 5 digit number is not matched?

{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}
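The difference is easy to see with `re.findall` on the string from the question (Python 3):

```python
import re

s = "I have 3324234 and more"
print(re.findall(r'\d{4}', s))    # ['3324']     exactly four digits
print(re.findall(r'\d{4,4}', s))  # ['3324']     same as {4}
print(re.findall(r'\d{4,}', s))   # ['3324234']  four or more, greedy
```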

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

Some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
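One way to express this, sketched in Python 3 under the assumption that the character before the digits must be checked as well (with `re.search`, the lookahead alone would still find `2345` inside `xxx12345`), combines a negative lookbehind with the lookahead `(?=\D|$)` for "non-digit or end-of-string":

```python
import re

# (?<!\d)   not preceded by a digit
# \d{4}     exactly four digits
# (?=\D|$)  followed by a non-digit or the end of the string
p = re.compile(r'(?<!\d)\d{4}(?=\D|$)')

test_data = [
    "xxx1234",
    "xxx12345",
    "xxx1234xxx",
    "xxx12345xxx",
    "xxx1234xxx1235xxx",
    "xxx12345xxx1234xxx",
]
for s in test_data:
    print(s, p.findall(s))

# xxx1234 ['1234']
# xxx12345 []
# xxx1234xxx ['1234']
# xxx12345xxx []
# xxx1234xxx1235xxx ['1234', '1235']
# xxx12345xxx1234xxx ['1234']
```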

George Sakkis

Nov 21, 2008, 5:25:06 PM

On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:

> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits.
> For example, it matches "I have 3324234 and more".
>
> I am very confused. Shouldn't \d{4,} match exactly four digit
> numbers, so that a sentence with a 5 digit number is not matched?

No, why should it? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
''', re.VERBOSE)


HTH,
George
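A quick Python 3 check of this pattern. The digits come back as group 1, while the whole match also includes the neighbouring non-digit characters that the (?:...) groups consume:

```python
import re

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
''', re.VERBOSE)

m = p.search("random1341also and also")
print(m.group())   # m1341a  (whole match, including the consumed neighbours)
print(m.group(1))  # 1341    (just the captured digits)
print(p.search(" I have 3324234 and more"))  # None
```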

MRAB

Nov 21, 2008, 6:00:15 PM
to pytho...@python.org

You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
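Lookbehind and lookahead are zero-width: they inspect the neighbouring characters without consuming them. One practical consequence, sketched here in Python 3, shows up with `re.findall`, where a pattern that consumes the neighbours (George Sakkis's version above, written on one line) can miss a second number separated by a single letter:

```python
import re

mrab = re.compile(r'(?<!\d)\d{4}(?!\d)')

# The lookarounds only check the neighbours, so both numbers are reported.
print(mrab.findall("1234a5678"))       # ['1234', '5678']
print(mrab.findall("I have 3324234"))  # []

# This pattern consumes the "a" after 1234, so the search for the next
# match starts at "5", where neither \D nor \b can match.
george = re.compile(r'(?:\D|\b)(\d{4})(?:\D|\b)')
print(george.findall("1234a5678"))     # ['1234']
```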

sk...@pobox.com

Nov 21, 2008, 5:18:01 PM
to harijay, pytho...@python.org

>> I want to parse a number of strings and extract only those that
>> contain a 4 digit number anywhere inside a string

>> However, the regexp
>> p = re.compile(r'\d{4}')

>> matches even sentences that contain numbers longer than 4 digits.
>> For example, it matches "I have 3324234 and more".

Try this instead:

>>> pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")
>>> for s in x:
...     m = pat.search(s)
...     print repr(s),
...     print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'123 adn' does not match
'1312 times I have told you' matches

--
Skip Montanaro - sk...@pobox.com - http://smontanaro.dyndns.org/
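Since the original goal was to extract the matching strings, here is a Python 3 sketch that collects them with the same pattern (the list `x` is copied from the first post) and pulls out the captured digits:

```python
import re

# Test strings from the first post in this thread.
x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ", "random1341also and also",
     "", "13", " a 1331 saves", " and and as dad", " A has 13123123",
     "A 13123", "123 adn", "1312 times I have told you"]

pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")

# Keep only the strings containing a standalone four-digit number.
matched = [s for s in x if pat.search(s)]
# Group 1 holds the first such number in each matched string.
numbers = [pat.search(s).group(1) for s in matched]
print(numbers)  # ['2004', '3233', '4444', '1341', '1331', '1312']
```

The six strings kept here are the same ones reported as "matches" in the session above.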

harijay

Nov 21, 2008, 6:20:56 PM

Thanks, John Machin and Mark Tolonen.
So I guess the correct approach is to use the word-boundary
metacharacter "\b",

so r'\b\d{4}\b' is what I need, since it reads:

a 4 digit number between word boundaries

Thanks a tonne. This is my second post to comp.lang.python, and I
am always amazed at how helpful everyone on this group is.

Hari
