repeating regular expressions in one string

Shane

Nov 16, 2005, 3:09:56 PM

to pytho...@python.org

Hi folks,

I'm new to regular expressions (and a novice at Python) but it seems to be the tool I need for a particular problem. I have a bunch of strings that look like this:

'blahblah_sf1234-sf1238_blahblah'

and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:

myString = re.compile('sf[0-9][0-9][0-9][0-9]')

This works great for finding the first instance of 'sfXXXX'. I hope that was clear :)

Thanks,

Shane

Carl J. Van Arsdall

Nov 16, 2005, 3:18:29 PM

to Shane, pytho...@python.org

Shane wrote:
> Hi folks,
>
> I'm new to regular expressions (and a novice at Python) but it seems to be the tool I need for a particular problem. I have a bunch of strings that look like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>
>

Well, since all your strings come in the same format you might try
something like

myString = re.compile(r'\w+_(sf\d\d\d\d)-(sf\d\d\d\d)_\w+')

then when you do your matching:

extracted = myString.match(originalString)

your two extracted strings would be accessible via:

extracted.group(1)
extracted.group(2)
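
For readers on Python 3, here is a minimal sketch of the capture-group approach above. It assumes every input carries exactly two 'sfXXXX' tokens joined by a hyphen, as in the example string; the helper name extract_pair is made up for illustration. Note that match returns None when the string does not fit the pattern, so the sketch checks for that before calling group.

```python
import re

# Same pattern as above: one capture group per 'sfXXXX' token.
PAIR = re.compile(r'\w+_(sf\d\d\d\d)-(sf\d\d\d\d)_\w+')

def extract_pair(text):
    """Hypothetical helper: return both tokens, or None if the format differs."""
    m = PAIR.match(text)
    if m is None:
        # match() found nothing, so there are no groups to read.
        return None
    return m.group(1), m.group(2)

print(extract_pair('blahblah_sf1234-sf1238_blahblah'))
print(extract_pair('no tokens here'))
```

This only works while the surrounding format stays fixed; a string with one or three tokens would not match at all.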

Inyeol Lee

Nov 16, 2005, 3:25:41 PM

to pytho...@python.org

On Wed, Nov 16, 2005 at 03:09:56PM -0500, Shane wrote:
> Hi folks,
>
> I'm new to regular expressions (and a novice at Python) but it seems to be the tool I need for a particular problem. I have a bunch of strings that look like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>

> This works great for finding the first instance of 'sfXXXX'. I hope that was clear :)
>
> Thanks,
>
> Shane

You can simplify your pattern

myString = re.compile('sf[0-9][0-9][0-9][0-9]')

to

myString = re.compile(r"sf\d{4}")

>>> import re
>>> s = 'blahblah_sf1234-sf1238_blahblah'
>>> pat = re.compile(r"sf\d{4}")
>>> re.findall(pat, s)
['sf1234', 'sf1238']
>>> for m in re.finditer(pat, s):
...     print m.group()
...
sf1234
sf1238
>>>
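
The session above uses the Python 2 print statement. As a rough Python 3 equivalent, the sketch below runs the same pattern over the same string as a script; the variable names tokens and spans are just illustrative. It also shows that the match objects from finditer carry the position of each match, which findall does not give you.

```python
import re

s = 'blahblah_sf1234-sf1238_blahblah'
pat = re.compile(r"sf\d{4}")

# findall returns every non-overlapping match as a list of strings.
tokens = pat.findall(s)
print(tokens)

# finditer yields match objects, so the start offset is available too.
spans = [(m.group(), m.start()) for m in pat.finditer(s)]
print(spans)
```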

Inyeol

Fredrik Lundh

Nov 16, 2005, 3:21:51 PM

to pytho...@python.org

"Shane" wrote

> I have a bunch of strings that look like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts
> of the string. Each 'sfXXXX' needs to be its own string when I am
> through. How do I compile a regular expression that looks for more
> than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>
> This works great for finding the first instance of 'sfXXXX'.

if you want to extract all matches, you can either call the search method
again (with a start offset), or use a method that returns all matches:

>>> s = 'blahblah_sf1234-sf1238_blahblah'

>>> import re
>>> p = re.compile('sf[0-9][0-9][0-9][0-9]')

footnote: you can use \d instead of [0-9]:

p = re.compile(r'sf\d\d\d\d')

and you can use {n} to specify a repeat count:

p = re.compile(r'sf\d{4}')

no matter what form you prefer, you can use findall and finditer to locate all
matching substrings:

>>> print p.findall(s)
['sf1234', 'sf1238']

>>> for m in p.finditer(s):
...     print m, m.group()
...
<_sre.SRE_Match object at 0x00A29058> sf1234
<_sre.SRE_Match object at 0x00A29918> sf1238
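
The first option mentioned above, calling search again with a start offset, is not shown in the session. A minimal Python 3 sketch of it might look like the following; the helper name search_all is hypothetical, and the loop relies on the pattern never matching an empty string, which holds for this pattern.

```python
import re

s = 'blahblah_sf1234-sf1238_blahblah'
p = re.compile(r'sf\d{4}')

def search_all(pattern, text):
    """Hypothetical helper: collect matches by re-running search from an offset."""
    found = []
    pos = 0
    while True:
        # The second argument to search() is the offset to start scanning from.
        m = pattern.search(text, pos)
        if m is None:
            break
        found.append(m.group())
        # Resume just past the previous match (safe here: matches are never empty).
        pos = m.end()
    return found

print(search_all(p, s))
```

In practice findall or finditer does the same job with less code; the manual loop is mainly useful when you need to stop early or change behaviour between matches.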

</F>