>>> string = re.escape('123#abc456')
>>> match = re.match('\d+', string)
>>> print match
<_sre.SRE_Match object at 0x00A6A800>
>>> print match.group()
123
The correct result should be:
123456
I've tried to escape the hash symbol in the match string without
result.
Any ideas? Is the answer something I overlooked in my lurching Python
schooling?
> I've encountered a problem with my RegEx learning curve -- how to
> escape hash characters # in strings being matched, e.g.:
>
> >>> string = re.escape('123#abc456')
> >>> match = re.match('\d+', string)
> >>> print match
>
> <_sre.SRE_Match object at 0x00A6A800>
> >>> print match.group()
>
> 123
>
> The correct result should be:
>
> 123456
>>> "".join(re.findall("\d+", "123#abc456"))
'123456'
> I've tried to escape the hash symbol in the match string without
> result.
>
> Any ideas? Is the answer something I overlooked in my lurching Python
> schooling?
re.escape() is used to build the regex from a string that may contain
characters that have a special meaning in regular expressions but that you
want to treat as literals. You can for example search for r"C:\dir" with
>>> re.compile(re.escape(r"C:\dir")).findall(r"C:\dir C:7ir")
['C:\\dir']
Without escaping you'd get
>>> re.compile(r"C:\dir").findall(r"C:\dir C:7ir")
['C:7ir']
Peter
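To tie this back to the original example, here is a minimal sketch, assuming a current Python 3 (the transcripts in this thread use Python 2 print statements). It shows that calling re.escape() on the text being searched does not change what \d+ finds at the start of it, and what re.escape() does when it is used to build the pattern instead. The variable names are only illustrative.

```python
import re

subject = '123#abc456'

# re.escape() is meant for patterns, not for the text being searched.
# Applied to the subject it only inserts a backslash before the '#'.
escaped_subject = re.escape(subject)

# re.match() anchors at the start and \d+ stops at the first non-digit,
# so the result is the same with or without the escaping.
plain = re.match(r'\d+', subject).group()
escaped = re.match(r'\d+', escaped_subject).group()

# Used as intended, re.escape() builds a pattern that matches the
# given text literally, hash included.
literal = re.search(re.escape('#abc'), subject).group()

print(plain, escaped, literal)
```

Either way the match stops at the hash, which is why the findall()/join approach quoted earlier is needed to collect both runs of digits.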
As you weren't clear about what you wanted, I'm guessing it's
something like this:
>>> s = '123#abc456'
>>> re.match('\d+', re.sub('#\D+', '', s)).group()
'123456'
>>> s = '123#this is a comment and is ignored456'
>>> re.match('\d+', re.sub('#\D+', '', s)).group()
'123456'
Sorry I wasn't clearer. I really appreciate your reply. It
provides half of what I'm hoping to learn. The hash character is
actually a useful hook for identifying a data entity in a scraping
routine I'm developing, but not a character I want in the scrubbed
data.
In my application, the hash distinguishes a string of alphanumeric
characters from other alphanumeric strings. The strings I'm looking for
are actually manually-entered identifiers, but a real machine-created
identifier shouldn't contain that hash character. The correct pattern
should be 'A1234509', but is instead often merely entered as '#12345'
when the first character, representing an alphabet sequence for the
month, and the last two characters, representing a two-digit year, can
be assumed. Identifying the hash character in a RegEx match is a way
of trapping the string and transforming it into its correct machine-
generated form.
Other patterns the strings can take in their manually-created
form:
A#12345
#1234509
Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they
provide, but it is, after all, a screen-scraping routine.
I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.
Thanks!
> I'm surprised it's been so difficult to find an example of the hash
> character in a RegEx string -- for exactly this type of situation,
> since it's so common in the real world that people want to put a pound
> symbol in front of a number.
It's a character with no special meaning to the regex engine, so I'm not
in the least surprised that there aren't many examples containing it.
You could just as validly claim that there aren't many examples involving
the letter 'q'.
By the way, I don't know what you're doing but I'm seeing all of your
posts twice, from two different addresses. This is a little confusing,
to put it mildly, and doesn't half break the threading.
--
Rhodri James *-* Wildebeest Herder to the Masses
It depends on whether the re.VERBOSE option is passed. If you're using
a verbose regexp, you can use "#" to comment portions of it:
import re
r = re.compile(r"""
    \d+      # some digits
    [aeiou]  # some vowels
    """, re.VERBOSE)
-tkc
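For the hash itself, a small sketch of the difference, assuming Python 3 and a made-up sample string. Without re.VERBOSE a bare # is an ordinary literal; with re.VERBOSE an unescaped # outside a character class starts a comment, so a literal hash has to be written as \# or [#].

```python
import re

text = 'ticket #12345 filed'

# Default mode: '#' has no special meaning and matches itself.
normal = re.search(r'#(\d+)', text)

# Verbose mode: the unescaped '#' starts a comment, so the rest of the
# line is dropped and the compiled pattern is empty (no groups at all).
commented_out = re.compile(r'#(\d+)', re.VERBOSE)

# Verbose mode with the hash protected, either escaped or in a class.
escaped = re.compile(r'''
    \#      # a literal hash
    (\d+)   # the digits after it
    ''', re.VERBOSE)
in_class = re.compile(r'[#](\d+)', re.VERBOSE)

print(normal.group(1))
print(commented_out.groups)
print(escaped.search(text).group(1))
print(in_class.search(text).group(1))
```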
Perhaps it's like this?
>>> # you can use re.search if that suits better
>>> a = re.match('([A-Z]?)#(\d{5})(\d\d)?', 'A#12345')
>>> b = re.match('([A-Z]?)#(\d{5})(\d\d)?', '#1234509')
>>> a.group(0)
'A#12345'
>>> a.group(1)
'A'
>>> a.group(2)
'12345'
>>> a.group(3)
>>> b.group(0)
'#1234509'
>>> b.group(1)
''
>>> b.group(2)
'12345'
>>> b.group(3)
'09'
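Building on those three groups, here is a rough sketch of the transformation step described earlier in the thread, assuming Python 3. The function name and the default_month and default_year parameters are hypothetical: the thread says the month letter and the two-digit year "can be assumed" but not where they come from, so they are plain arguments here.

```python
import re

# Optional month letter, a hash, five digits, optional two-digit year.
ID_PATTERN = re.compile(r'([A-Z]?)#(\d{5})(\d\d)?')


def normalize(raw, default_month='A', default_year='09'):
    """Return the machine-style identifier for a hand-entered one.

    Returns None when the string does not start with one of the
    hand-entered forms (for example, when it has no hash at all).
    The defaults are placeholders, not values taken from the thread.
    """
    m = ID_PATTERN.match(raw)
    if m is None:
        return None
    month, number, year = m.groups()
    # An absent letter is '' and an absent year is None; both are falsy,
    # so "or" substitutes the assumed values.
    return (month or default_month) + number + (year or default_year)


for sample in ('#12345', 'A#12345', '#1234509'):
    print(sample, '->', normalize(sample))
```

Like the re.match() calls above, this only anchors at the start of the string, so trailing text after the identifier is ignored rather than rejected.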