I'm new to regular expressions (and a novice at Python), but they seem to be the tool I need for a particular problem. I have a bunch of strings that look like this:
'blahblah_sf1234-sf1238_blahblah'
and I would like to use the re module to parse all the 'sfXXXX' parts of the string. Each 'sfXXXX' needs to be its own string when I am through. How do I compile a regular expression that looks for more than one instance? Currently my expression looks like this:
myString = re.compile('sf[0-9][0-9][0-9][0-9]')
This works great for finding the first instance of 'sfXXXX'. I hope that was clear :)
Thanks,
Shane
myString = re.compile(r'\w+_(sf\d\d\d\d)-(sf\d\d\d\d)_\w+')
then when you do your matching:
extracted = myString.match(originalString)
your two extracted strings would be accessible via:
extracted.group(1)
extracted.group(2)
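One thing to watch with this approach: match() returns None when the string does not fit the layout, so calling group() on the result directly would raise an AttributeError. A minimal sketch of a guarded version follows; the helper name extract_pair is made up for illustration, and it assumes every string has exactly two codes separated by a hyphen.

```python
import re

# Pattern from the reply above: two sfXXXX codes joined by '-',
# with underscore-delimited text on either side.
pattern = re.compile(r'\w+_(sf\d\d\d\d)-(sf\d\d\d\d)_\w+')

def extract_pair(text):
    """Hypothetical helper: return (first, second), or None if text does not fit."""
    m = pattern.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(extract_pair('blahblah_sf1234-sf1238_blahblah'))
print(extract_pair('no codes here'))
```

If the number of codes per string can vary, a fixed two-group pattern like this will not be enough.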
You can simplify your pattern
myString = re.compile('sf[0-9][0-9][0-9][0-9]')
to
myString = re.compile(r"sf\d{4}")
>>> import re
>>> s = 'blahblah_sf1234-sf1238_blahblah'
>>> pat = re.compile(r"sf\d{4}")
>>> re.findall(pat, s)
['sf1234', 'sf1238']
>>> for m in re.finditer(pat, s):
... print m.group()
...
sf1234
sf1238
>>>
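Since you mentioned having a bunch of these strings, here is a rough sketch of running the same pattern over a whole list. The sample strings other than the first are invented for illustration, and the name codes_by_name is just a placeholder.

```python
import re

pat = re.compile(r"sf\d{4}")

# Hypothetical sample data; only the first string comes from the question.
names = [
    'blahblah_sf1234-sf1238_blahblah',
    'other_sf0007_thing',
    'nothing_to_see',
]

# Map each input string to the list of sfXXXX codes found in it.
# findall returns an empty list when there is no match.
codes_by_name = {name: pat.findall(name) for name in names}

for name in names:
    print(name, codes_by_name[name])
```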
Inyeol
> I have a bunch of strings that looks like this:
>
> 'blahblah_sf1234-sf1238_blahblah'
>
> and I would like to use the re module to parse all the 'sfXXXX' parts
> of the string. Each 'sfXXXX' needs to be its own string when I am
> through. How do I compile a regular expression that looks for more
> than one instance? Currently my expression looks like this:
>
> myString = re.compile('sf[0-9][0-9][0-9][0-9]')
>
> This works great for finding the first instance of 'sfXXXX'.
if you want to extract all matches, you can either call the search method
again (with a start offset), or use a method that returns all matches:
>>> s = 'blahblah_sf1234-sf1238_blahblah'
>>> import re
>>> p = re.compile('sf[0-9][0-9][0-9][0-9]')
footnote: you can use \d instead of [0-9] (a raw string keeps the backslash intact):
p = re.compile(r'sf\d\d\d\d')
and you can use {n} to specify a repeat count:
p = re.compile(r'sf\d{4}')
no matter what form you prefer, you can use findall and finditer to locate all
matching substrings:
>>> print p.findall(s)
['sf1234', 'sf1238']
>>> for m in p.finditer(s):
... print m, m.group()
...
<_sre.SRE_Match object at 0x00A29058> sf1234
<_sre.SRE_Match object at 0x00A29918> sf1238
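for completeness, the first option mentioned above (calling search again with a start offset) might look roughly like the sketch below. the function name find_all_by_search is made up for this example; in practice findall or finditer does the same job with less code.

```python
import re

p = re.compile(r'sf\d{4}')

def find_all_by_search(pattern, text):
    """Hypothetical helper: collect matches by calling search repeatedly."""
    results = []
    pos = 0
    while True:
        # A compiled pattern's search method accepts a start offset.
        m = pattern.search(text, pos)
        if m is None:
            break
        results.append(m.group())
        # Resume after this match; safe here because the pattern
        # can never match an empty string.
        pos = m.end()
    return results

s = 'blahblah_sf1234-sf1238_blahblah'
print(find_all_by_search(p, s))
```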
</F>