repeating regular expressions in one string

Shane

Nov 16, 2005, 3:09:56 PM
to pytho...@python.org
Hi folks,

I'm new to regular expressions (and a novice at Python) but it seems to be the tool I need for a particular problem. I have a bunch of strings that look like this:

'blahblah_sf1234-sf1238_blahblah'

and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:

myString = re.compile('sf[0-9][0-9][0-9][0-9]')

This works great for finding the first instance of 'sfXXXX'. I hope that was clear :)

Thanks,

Shane

Carl J. Van Arsdall

Nov 16, 2005, 3:18:29 PM
to Shane, pytho...@python.org
Shane wrote:
> Hi folks,
>
> I'm new to regular expressions (and a novice at Python) but it seems to be the tool I need for a particular problem. I have a bunch of strings that look like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>
>
Well, since all your strings come in the same format you might try
something like

myString = re.compile(r'\w+_(sf\d\d\d\d)-(sf\d\d\d\d)_\w+')

then when you do your matching:

extracted = myString.match(originalString)

your two extracted strings would be accessible via:

extracted.group(1)
extracted.group(2)
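
For reference, here is a minimal runnable sketch of the approach above. The helper name extract_pair is hypothetical and not part of the original reply; the sketch assumes every input follows the same 'prefix_sfXXXX-sfXXXX_suffix' layout and returns None when it does not.

```python
import re

# Pattern from the reply above: two captured 'sfXXXX' groups in a fixed layout.
pattern = re.compile(r'\w+_(sf\d\d\d\d)-(sf\d\d\d\d)_\w+')

def extract_pair(text):
    """Hypothetical helper: return both 'sfXXXX' parts, or None on no match."""
    m = pattern.match(text)
    if m is None:
        # match() returns None when the string does not fit the layout,
        # so check before calling group().
        return None
    return m.group(1), m.group(2)

pair = extract_pair('blahblah_sf1234-sf1238_blahblah')
print(pair)  # ('sf1234', 'sf1238')
```

Note that this only works when exactly two 'sfXXXX' parts appear in that fixed position; strings with a different number of occurrences need a different approach.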

Inyeol Lee

Nov 16, 2005, 3:25:41 PM
to pytho...@python.org
On Wed, Nov 16, 2005 at 03:09:56PM -0500, Shane wrote:
> Hi folks,
>
> I'm new to regular expressions (and a novice at Python) but it seems to be the tool I need for a particular problem. I have a bunch of strings that look like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>
> This works great for finding the first instance of 'sfXXXX'. I hope that was clear :)
>
> Thanks,
>
> Shane

You can simplify your pattern

myString = re.compile('sf[0-9][0-9][0-9][0-9]')

to

myString = re.compile(r"sf\d{4}")

>>> import re
>>> s = 'blahblah_sf1234-sf1238_blahblah'
>>> pat = re.compile(r"sf\d{4}")
>>> re.findall(pat, s)
['sf1234', 'sf1238']
>>> for m in re.finditer(pat, s):
...     print m.group()
...
sf1234
sf1238
>>>
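
The session above uses the Python 2 print statement. As a rough sketch of the same idea in Python 3 (the variable names matches and spans are illustrative, not from the original post), the two calls can be written as methods on the compiled pattern:

```python
import re

s = 'blahblah_sf1234-sf1238_blahblah'
pat = re.compile(r"sf\d{4}")

# findall returns every non-overlapping match as a list of strings.
matches = pat.findall(s)
print(matches)  # ['sf1234', 'sf1238']

# finditer yields match objects, which also carry the match positions.
spans = [(m.group(), m.start(), m.end()) for m in pat.finditer(s)]
print(spans)  # [('sf1234', 9, 15), ('sf1238', 16, 22)]
```

findall is usually enough when only the matched text is needed; finditer is the better fit when the offsets matter too.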

Inyeol

Fredrik Lundh

Nov 16, 2005, 3:21:51 PM
to pytho...@python.org
"Shane" wrote

> I have a bunch of strings that look like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts
> of the string. Each 'sfXXXX' needs to be its own string when I am
> through. How do I compile a regular expression that looks for more
> than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>
> This works great for finding the first instance of 'sfXXXX'.

if you want to extract all matches, you can either call the search method
again (with a start offset), or use a method that returns all matches:

>>> s = 'blahblah_sf1234-sf1238_blahblah'

>>> import re
>>> p = re.compile('sf[0-9][0-9][0-9][0-9]')

footnote: you can use \d instead of [0-9]:

p = re.compile(r'sf\d\d\d\d')

and you can use {n} to specify a repeat count:

p = re.compile(r'sf\d{4}')

no matter what form you prefer, you can use findall and finditer to locate all
matching substrings:

>>> print p.findall(s)
['sf1234', 'sf1238']

>>> for m in p.finditer(s):
...     print m, m.group()
...
<_sre.SRE_Match object at 0x00A29058> sf1234
<_sre.SRE_Match object at 0x00A29918> sf1238
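
The first option mentioned above, calling the search method again with a start offset, is not shown in the session. A minimal Python 3 sketch of it might look like this; the function name find_all_by_search is hypothetical, and the loop assumes the compiled pattern's search method is given the end of the previous match as its start position:

```python
import re

def find_all_by_search(pattern, text):
    """Hypothetical helper: collect matches by repeated search() calls."""
    results = []
    pos = 0
    while True:
        # search() accepts an optional start offset as its second argument.
        m = pattern.search(text, pos)
        if m is None:
            break
        results.append(m.group())
        # Resume after this match; step one extra character on an empty
        # match so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results

p = re.compile(r'sf\d{4}')
found = find_all_by_search(p, 'blahblah_sf1234-sf1238_blahblah')
print(found)  # ['sf1234', 'sf1238']
```

In practice findall and finditer do this bookkeeping for you, so the manual loop is mainly useful when each match needs extra handling before searching further.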

</F>