I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string
However, the regexp
p = re.compile(r'\d{4}')
matches even sentences that contain numbers longer than 4 digits;
for example, it matches "I have 3324234 and more".
I am very confused. Shouldn't \d{4,} match exactly four-digit
numbers, so that a sentence with a 5-digit number is not matched?
Here is my test program's output, followed by the test program itself.
Thanks for your help
Harijay
PyMate r8111 running Python 2.5.1 (/usr/bin/python)
>>> testdigit.py
Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED and and as dad
SKIPPED A has 13123123
SKIPPED A 13123
SKIPPED 123 adn
Matched 1312 times I have told you
DONE
#!/usr/bin/python
import re

x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves",
     " and and as dad", " A has 13123123", "A 13123", "123 adn",
     "1312 times I have told you"]

p = re.compile(r'\d{4} ')
for elem in x:
    if re.search(p, elem):
        print "Matched " + elem
    else:
        print "SKIPPED " + elem
print "DONE"
Try this:
p = re.compile(r'\d{4}$')
The $ anchor matches at the end of the string, so this finds four
digits only at the very end of the string.
Try:
p = re.compile(r'\b\d{4}\b')
-Mark
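
A quick way to check what the word-boundary pattern does, as a minimal sketch: the sample strings come from the test program above, and the helper name has_four_digit_word is made up for illustration. Note that \b treats letters and underscores as word characters too, so a 4-digit run embedded in letters (as in "random1341also") is not reported.

```python
import re

# Four digits with a word boundary on each side.
pattern = re.compile(r'\b\d{4}\b')

def has_four_digit_word(text):
    """Hypothetical helper: True if text contains a standalone 4-digit run."""
    return pattern.search(text) is not None

samples = [" I have 2004 rupees ", " I have 3324234 and more",
           "random1341also and also", " A has 13123123"]
results = [(s, has_four_digit_word(s)) for s in samples]
for s, found in results:
    print(repr(s) + " -> " + str(found))
```

Whether the "random1341also" case should count as a hit depends on what "a 4 digit number anywhere inside a string" is meant to cover.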
No, it doesn't match the sentence. When used with re.search on that
string, the pattern matches the substring 3324; it doesn't "match" the
whole sentence.
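
To see this concretely, a small sketch (using the unanchored pattern from the original question) can print the text that re.search actually matched:

```python
import re

p = re.compile(r'\d{4}')          # four digits, nothing said about neighbours
m = p.search("I have 3324234 and more")

# search() succeeds as soon as any four consecutive digits are found;
# group() is just that substring, not the whole sentence.
matched_text = m.group()
print(matched_text)               # 3324
print(m.span())                   # (7, 11)
```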
>
> I am very confused. Shouldnt the \d{4,} match exactly four digit
> numbers so a 5 digit number sentence should not be matched .
{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}
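
The difference between the two quantifiers can be checked directly. This is only a sketch: the \Z anchor is added here so that match() has to consume the whole string, and the variable names are made up for illustration.

```python
import re

exactly_four = re.compile(r'\d{4}\Z')    # {4} is shorthand for {4,4}
four_or_more = re.compile(r'\d{4,}\Z')   # {4,} has no upper bound

for digits in ["123", "1234", "12345"]:
    is_exact = exactly_four.match(digits) is not None
    is_at_least = four_or_more.match(digits) is not None
    print(digits + ": exactly four=" + str(is_exact)
          + ", four or more=" + str(is_at_least))
```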
Ignoring {4,}:
You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.
some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
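
One way to explore the hint against this test data, offered as a sketch rather than the intended answer: the first pattern below encodes only "4 digits followed by (non-digit or end-of-string)", and the second adds the mirror-image condition in front. Both pattern names are made up for illustration.

```python
import re

test_data = ["xxx1234", "xxx12345", "xxx1234xxx", "xxx12345xxx",
             "xxx1234xxx1235xxx", "xxx12345xxx1234xxx"]

# Only the trailing condition: 4 digits, then a non-digit or end-of-string.
trailing_only = re.compile(r'\d{4}(?:\D|$)')
# Leading condition added: start-of-string or a non-digit must come first.
both_sides = re.compile(r'(?:^|\D)\d{4}(?:\D|$)')

trailing_hits = [s for s in test_data if trailing_only.search(s)]
both_hits = [s for s in test_data if both_sides.search(s)]

print(trailing_hits)   # every string: "2345" in "xxx12345" still qualifies
print(both_hits)       # only strings holding a run of exactly 4 digits
```

The trailing condition alone is not enough, because the last four digits of a longer run also satisfy it.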
> Hi
> I am a few months new into python. I have used regexps before in perl
> and java but am a little confused with this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However the regexp
> p = re.compile(r'\d{4}')
>
> Matches even sentences that have longer than 4 numbers inside
> strings ..for example it matches "I have 3324234 and more"
>
> I am very confused. Shouldnt the \d{4,} match exactly four digit
> numbers so a 5 digit number sentence should not be matched .
No, why should it ? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:
p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
    ''', re.VERBOSE)
HTH,
George
which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
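
As a rough check that the two spellings agree, the sketch below runs both over a few of the sample strings from the original post. It only compares whether search() succeeds; the selection of samples and the variable names are assumptions made for illustration.

```python
import re

verbose = re.compile(r'''
    (?:\D|\b)   # a non-digit or word boundary
    (\d{4})     # the 4 digits, captured as group 1
    (?:\D|\b)   # a non-digit or word boundary
    ''', re.VERBOSE)
lookaround = re.compile(r'(?<!\d)\d{4}(?!\d)')

samples = [" I have 2004 rupees ", " I have 3324234 and more",
           "random1341also and also", "13", " A has 13123123",
           "1312 times I have told you"]

agree = []
for s in samples:
    a = verbose.search(s) is not None
    b = lookaround.search(s) is not None
    agree.append(a == b)
    print(repr(s) + " verbose=" + str(a) + " lookaround=" + str(b))
```

One practical difference: the lookaround form consumes only the digits, so group() is the number itself, while the verbose form may also consume a neighbouring non-digit and exposes the number as group(1).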
>> I want to parse a number of strings and extract only those that
>> contain a 4 digit number anywhere inside a string
>> However the regexp
>> p = re.compile(r'\d{4}')
>> Matches even sentences that have longer than 4 numbers inside strings
>> ..for example it matches "I have 3324234 and more"
Try this instead:
>>> pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")
>>> for s in x:
...     m = pat.search(s)
...     print repr(s),
...     print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'123 adn' does not match
'1312 times I have told you' matches
--
Skip Montanaro - sk...@pobox.com - http://smontanaro.dyndns.org/
So r'\b\d{4}\b' is what I need, since it reads as
"a 4-digit number between word boundaries".
Thanks a tonne. This is only my second post to comp.lang.python, and I
am always amazed at how helpful everyone on this group is.
Hari