Raw string substitution problem

Ed Keith

Dec 16, 2009, 9:09:32 AM

to pytho...@python.org

I am having a problem when substituting a raw string. When I do the following:

re.sub('abc', r'a\nb\nc', '123abcdefg')

I get

"""
123a
b
cdefg
"""

what I want is

r'123a\nb\ncdefg'

How do I get what I want?

Thanks,

-EdK

Ed Keith
e_...@yahoo.com

Blog: edkeith.blogspot.com

Chris Hulan

Dec 16, 2009, 9:36:35 AM

Looks like raw strings let you avoid having to escape backslashes when
specifying the literal, but that doesn't carry over into later operations.
Changing your replacement string to r'a\\nb\\nc' seems to give the
desired output.

cheers

Gabriel Genellina

Dec 16, 2009, 9:35:55 AM

to pytho...@python.org

On Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith <e_...@yahoo.com> wrote:

> I am having a problem when substituting a raw string. When I do the
> following:
>
> re.sub('abc', r'a\nb\nc', '123abcdefg')
>
> I get
>
> """
> 123a
> b
> cdefg
> """
>
> what I want is
>
> r'123a\nb\ncdefg'

From http://docs.python.org/library/re.html#re.sub

re.sub(pattern, repl, string[, count])

...repl can be a string or a function; if
it is a string, any backslash escapes in
it are processed. That is, \n is converted
to a single newline character, \r is
converted to a linefeed, and so forth.

So you'll have to double your backslashes:

py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
'123a\\nb\\ncdefg'
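For readers on Python 3 (the session above is Python 2), the same point can be checked with a minimal sketch. It uses only the standard re module; the variable names are illustrative and not from the thread.

```python
import re

# Escapes in the replacement template are processed: \n becomes a real newline.
processed = re.sub('abc', r'a\nb\nc', '123abcdefg')

# Doubling the backslashes yields a literal backslash followed by 'n'.
literal = re.sub('abc', r'a\\nb\\nc', '123abcdefg')

print(repr(processed))  # '123a\nb\ncdefg' (contains two real newlines)
print(repr(literal))    # '123a\\nb\\ncdefg'
```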

--
Gabriel Genellina

Ed Keith

Dec 16, 2009, 12:19:01 PM

to pytho...@python.org

--- On Wed, 12/16/09, Gabriel Genellina <gags...@yahoo.com.ar> wrote:

That is going to be a nontrivial exercise. I have control over the pattern, but the texts to be substituted and substituted into will be read from user supplied files. I need to reproduce the exact text that is read from the file.

Maybe what I should do is use re to break the string into two pieces, the part before the pattern to be replaced and the part after it, then splice the replacement text in between them. Seems like doing it the hard way, but it should work.
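The splice idea described above can be sketched as follows, assuming Python 3 and that only the first match should be replaced. The helper name splice_literal is hypothetical, introduced here only for illustration.

```python
import re

def splice_literal(pattern, replacement, text):
    """Replace the first match of pattern with replacement, taken verbatim."""
    match = re.search(pattern, text)
    if match is None:
        return text  # nothing to replace
    # Plain slicing never passes the replacement through re.sub's template
    # processing, so backslashes in it are left alone.
    return text[:match.start()] + replacement + text[match.end():]

result = splice_literal('abc', r'a\nb\nc', '123abcdefg')
print(result)  # 123a\nb\ncdefg
```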

Thanks,

-EdK

Peter Otten

Dec 16, 2009, 12:51:08 PM

Ed Keith wrote:

There is a helper function re.escape() that you can use to sanitize the
substitution:

>>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
123a\nb\ncdefg

Peter

Gabriel Genellina

Dec 16, 2009, 2:23:59 PM

to pytho...@python.org

On Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <__pet...@web.de> wrote:

> Ed Keith wrote:
>
>> --- On Wed, 12/16/09, Gabriel Genellina <gags...@yahoo.com.ar> wrote:
>>

>>> Ed Keith <e_...@yahoo.com> wrote:

>>>
>>> > I am having a problem when substituting a raw string.
>>> When I do the following:
>>> >
>>> > re.sub('abc', r'a\nb\nc', '123abcdefg')
>>> >
>>> > I get
>>> >
>>> > """
>>> > 123a
>>> > b
>>> > cdefg
>>> > """
>>> >
>>> > what I want is
>>> >
>>> > r'123a\nb\ncdefg'
>>>

>>> So you'll have to double your backslashes:
>>>
>>> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
>>> '123a\\nb\\ncdefg'
>>>

>> That is going to be a nontrivial exercise. I have control over the
>> pattern, but the texts to be substituted and substituted into will be
>> read
>> from user supplied files. I need to reproduce the exact text that is read
>> from the file.
>
> There is a helper function re.escape() that you can use to sanitize the
> substitution:
>
> >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
> 123a\nb\ncdefg

Unfortunately re.escape does much more than that:

py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
123a\.b\.cdefg

I think the string_escape encoding is what the OP needs:

py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"),
'123abcdefg')
123a\n(b.c)\nddefg
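The string_escape codec used above is specific to Python 2 and is not available on Python 3 strings. A rough substitute, offered only as a sketch, is to double the backslashes and nothing else. Note that this is a different technique from string_escape, which also rewrites other characters such as real newlines. The helper name escape_repl is hypothetical.

```python
import re

def escape_repl(text):
    """Double every backslash so re.sub treats the text as a literal."""
    return text.replace('\\', '\\\\')

result = re.sub('abc', escape_repl(r'a\n(b.c)\nd'), '123abcdefg')
print(result)  # 123a\n(b.c)\nddefg
```

Unlike re.escape(), this leaves characters such as '.' and '(' untouched, which is what a replacement string needs.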

--
Gabriel Genellina

Peter Otten

Dec 16, 2009, 2:54:32 PM

Gabriel Genellina wrote:

> On Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <__pet...@web.de> wrote:

>
>> Ed Keith wrote:
>>
>>> --- On Wed, 12/16/09, Gabriel Genellina <gags...@yahoo.com.ar> wrote:
>>>
>>>> Ed Keith <e_...@yahoo.com> wrote:

>>>>
>>>> > I am having a problem when substituting a raw string.
>>>> When I do the following:
>>>> >
>>>> > re.sub('abc', r'a\nb\nc', '123abcdefg')
>>>> >
>>>> > I get
>>>> >
>>>> > """
>>>> > 123a
>>>> > b
>>>> > cdefg
>>>> > """
>>>> >
>>>> > what I want is
>>>> >
>>>> > r'123a\nb\ncdefg'
>>>>
>>>> So you'll have to double your backslashes:
>>>>
>>>> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
>>>> '123a\\nb\\ncdefg'
>>>>
>>> That is going to be a nontrivial exercise. I have control over the
>>> pattern, but the texts to be substituted and substituted into will be
>>> read
>>> from user supplied files. I need to reproduce the exact text that is read
>>> from the file.
>>
>> There is a helper function re.escape() that you can use to sanitize the
>> substitution:
>>
>> >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
>> 123a\nb\ncdefg
>
> Unfortunately re.escape does much more than that:
>
> py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
> 123a\.b\.cdefg

Sorry, I didn't think of that.

> I think the string_escape encoding is what the OP needs:
>
> py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"),
> '123abcdefg')
> 123a\n(b.c)\nddefg

Another possibility:

>>> print re.sub('abc', lambda m: r'a\nb\n.c\a', '123abcdefg')
123a\nb\n.c\adefg
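The same trick in Python 3 form, as a minimal sketch: when repl is a callable, its return value is inserted as-is, with no escape processing. The variable names here are illustrative only.

```python
import re

replacement = r'a\nb\n.c\a'

# The lambda ignores the match object and returns the text verbatim.
result = re.sub('abc', lambda m: replacement, '123abcdefg')
print(result)  # 123a\nb\n.c\adefg
```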

Peter

Ed Keith

Dec 16, 2009, 3:53:26 PM

to pytho...@python.org

--- On Wed, 12/16/09, Peter Otten <__pet...@web.de> wrote:

> Another possibility:
>
> >>> print re.sub('abc', lambda m: r'a\nb\n.c\a',
> '123abcdefg')
> 123a\nb\n.c\adefg

I'm not sure whether that is clever, ugly, or just plain strange!

I think I'll stick with:

>>> m = re.match('^(.*)abc(.*)$', '123abcdefg')
>>> print m.group(1) + r'a\nb\n.c\a' + m.group(2)
123a\nb\n.c\adefg
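One caveat with the match-and-concatenate approach, assuming the text really does come from multi-line files as mentioned earlier in the thread: '.' does not match a newline by default, so the pattern above fails on multi-line input unless re.DOTALL is given. The sketch below (Python 3, with made-up sample text) also uses a non-greedy first group so that the first occurrence is the one replaced.

```python
import re

text = 'line one\n123abcdefg\nline three'

# Without DOTALL, '.' stops at the first newline, so there is no match.
no_match = re.match('^(.*)abc(.*)$', text)

# With DOTALL and a non-greedy first group, the first 'abc' is found.
m = re.match('^(.*?)abc(.*)$', text, re.DOTALL)
result = m.group(1) + r'a\nb\n.c\a' + m.group(2)
print(result)
```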

It's much less likely to fry the poor maintenance programmer's mind.

Alan G Isaac

Dec 17, 2009, 10:08:00 AM

> On Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith <e_...@yahoo.com> wrote:
>
>> I am having a problem when substituting a raw string. When I do the
>> following:
>>
>> re.sub('abc', r'a\nb\nc', '123abcdefg')
>>
>> I get
>>
>> """
>> 123a
>> b
>> cdefg
>> """
>>
>> what I want is
>>
>> r'123a\nb\ncdefg'

On 12/16/2009 9:35 AM, Gabriel Genellina wrote:
> From http://docs.python.org/library/re.html#re.sub
>
> re.sub(pattern, repl, string[, count])
>
> ...repl can be a string or a function; if
> it is a string, any backslash escapes in
> it are processed. That is, \n is converted
> to a single newline character, \r is
> converted to a linefeed, and so forth.
>
> So you'll have to double your backslashes:

I'm not persuaded that the docs are clear. Consider:

>>> 'ab\\ncd' == r'ab\ncd'
True

Naturally enough. So I think the right answer is:

1. this is a documentation bug (i.e., the documentation
fails to specify unexpected behavior for raw strings), or
2. this is a bug (i.e., raw strings are not handled correctly
when used as replacements)

I vote for 2.

Peter's use of a function highlights just how odd this is:
getting the raw string via a function produces a different
result than providing it directly. If this is really the
way things ought to be, I'd appreciate a clear explanation
of why.

Alan Isaac

Richard Brodie

Dec 17, 2009, 11:24:32 AM

"Alan G Isaac" <alan....@gmail.com> wrote in message
news:qemdnRUT0JvJ1LfW...@rcn.net...

> Naturally enough. So I think the right answer is:
>
> 1. this is a documentation bug (i.e., the documentation
> fails to specify unexpected behavior for raw strings), or
> 2. this is a bug (i.e., raw strings are not handled correctly
> when used as replacements)

<neo> There is no raw string. </neo>

A raw string is not a distinct type from an ordinary string
in the same way byte strings and Unicode strings are. It
is merely a notation for constants, like writing integers
in hexadecimal.

>>> (r'\n', u'a', 0x16)
('\\n', u'a', 22)
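A Python 3 sketch of the same observation: the r prefix only affects how the literal is read from source code, and the resulting object is an ordinary str. The variable names are illustrative.

```python
raw = r'\n'       # two characters: backslash, 'n'
escaped = '\\n'   # the same two characters, written with an escaped backslash
newline = '\n'    # one character: a real newline

print(raw == escaped)           # True
print(type(raw) is str)         # True
print(len(raw), len(newline))   # 2 1
```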

Alan G Isaac

Dec 17, 2009, 11:51:26 AM

On 12/17/2009 11:24 AM, Richard Brodie wrote:
> A raw string is not a distinct type from an ordinary string
> in the same way byte strings and Unicode strings are. It
> is merely a notation for constants, like writing integers
> in hexadecimal.
>
> >>> (r'\n', u'a', 0x16)
> ('\\n', u'a', 22)

Yes, that was a mistake. But the problem remains::

>>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a',' 123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
True
>>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
False

Why are the first two strings being treated as if they are the last one?
That is, why isn't '\\' being processed in the obvious way?
This still seems wrong. Why isn't it?

More simply, consider::

>>> re.sub('abc', '\\', '123abcdefg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python26\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python26\lib\re.py", line 260, in _compile_repl
    raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?

Thanks,
Alan Isaac

D'Arcy J.M. Cain

Dec 17, 2009, 12:19:52 PM

to Alan G Isaac, pytho...@python.org

On Thu, 17 Dec 2009 11:51:26 -0500
Alan G Isaac <alan....@gmail.com> wrote:
> >>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a',' 123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
> True

Was this a straight cut and paste or did you make a manual change? Is
that leading space in the middle one a copying error? I get False for
what you actually have there for obvious reasons.

> >>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
> False
>
> Why are the first two strings being treated as if they are the last one?

They aren't. The last string is different.

>>> for x in (r'a\nb\n.c\a', 'a\\nb\\n.c\\a', 'a\nb\n.c\a'): print repr(x)
...
'a\\nb\\n.c\\a'
'a\\nb\\n.c\\a'
'a\nb\n.c\x07'

> That is, why isn't '\\' being processed in the obvious way?
> This still seems wrong. Why isn't it?

What do you think is wrong? What would the "obvious" way of handling
'\\' be?

>
> More simply, consider::
>
> >>> re.sub('abc', '\\', '123abcdefg')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python26\lib\re.py", line 151, in sub
>     return _compile(pattern, 0).sub(repl, string, count)
>   File "C:\Python26\lib\re.py", line 273, in _subx
>     template = _compile_repl(template, pattern)
>   File "C:\Python26\lib\re.py", line 260, in _compile_repl
>     raise error, v # invalid expression
> sre_constants.error: bogus escape (end of line)
>
> Why is this the proper handling of what one might think would be an
> obvious substitution?

Is this what you want? What you have is a re expression consisting of
a single backslash that doesn't escape anything (EOL) so it barfs.

>>> re.sub('abc', r'\\', '123abcdefg')
'123\\defg'
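A Python 3 sketch of the two cases above. It assumes only that a lone trailing backslash in the replacement still raises re.error, without relying on the exact wording of the message, which differs from the Python 2.6 traceback quoted here.

```python
import re

try:
    # The replacement is a single backslash with nothing left to escape.
    re.sub('abc', '\\', '123abcdefg')
    outcome = 'no error'
except re.error:
    outcome = 'error'

# Two backslashes in the template produce one literal backslash.
fixed = re.sub('abc', r'\\', '123abcdefg')

print(outcome)
print(fixed)  # 123\defg
```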

--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

MRAB

Dec 17, 2009, 12:38:34 PM

to pytho...@python.org

Regular expressions and replacement strings have their own escaping
mechanism, which also uses backslashes.

Some of these regex escape sequences are the same as those of string
literals, eg \n represents a newline; others are different, eg \b in a
regex represents a word boundary and not a backspace as in a string
literal.

You can match a newline in a regex by either using an actual newline
character ('\n' in a string literal) or an escape sequence ('\\n' or
r'\n' in a string literal). If you want a regex to match an actual
backslash followed by a letter 'n' then you need to escape the backslash
in the regex and then either use a raw string literal or escape it again
in a non-raw string literal.

Match characters: <newline>
Regex: \n
Raw string literal: r'\n'
Non-raw string literal: '\\n'

Match characters: \n
Regex: \\n
Raw string literal: r'\\n'
Non-raw string literal: '\\\\n'

Replace with characters: <newline>
Replacement: \n
Raw string literal: r'\n'
Non-raw string literal: '\\n'

Replace with characters: \n
Replacement: \\n
Raw string literal: r'\\n'
Non-raw string literal: '\\\\n'
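The four cases listed above can be checked directly. This is a small Python 3 sketch; the sample strings ('a\nb', 'aXb') and variable names are made up for the demonstration.

```python
import re

# Matching a real newline: the regex escape \n and an actual newline both work.
assert re.search(r'\n', 'a\nb')
assert re.search('\\n', 'a\nb')
assert re.search('\n', 'a\nb')

# Matching a backslash followed by 'n': the regex itself must be \\n.
assert re.search(r'\\n', r'a\nb')
assert re.search('\\\\n', r'a\nb')
assert re.search(r'\\n', 'a\nb') is None

# Replacing with a real newline versus a backslash followed by 'n'.
with_newline = re.sub('X', r'\n', 'aXb')
with_backslash_n = re.sub('X', r'\\n', 'aXb')
print(repr(with_newline), repr(with_backslash_n))
```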

Alan G Isaac

Dec 17, 2009, 12:54:07 PM

> Alan G Isaac <alan....@gmail.com> wrote:
>> >>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a','123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
>> True

>> Why are the first two strings being treated as if they are the last one?

On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
> They aren't. The last string is different.

Of course it is different.
That is the basis of my question.
Why is it being treated as if it is the same?
(See the end of this post.)

> Alan G Isaac <alan....@gmail.com> wrote:
>> More simply, consider::
>>
>> >>> re.sub('abc', '\\', '123abcdefg')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\Python26\lib\re.py", line 151, in sub
>>     return _compile(pattern, 0).sub(repl, string, count)
>>   File "C:\Python26\lib\re.py", line 273, in _subx
>>     template = _compile_repl(template, pattern)
>>   File "C:\Python26\lib\re.py", line 260, in _compile_repl
>>     raise error, v # invalid expression
>> sre_constants.error: bogus escape (end of line)
>>
>> Why is this the proper handling of what one might think would be an
>> obvious substitution?

On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
> Is this what you want? What you have is a re expression consisting of
> a single backslash that doesn't escape anything (EOL) so it barfs.
> >>> re.sub('abc', r'\\', '123abcdefg')
> '123\\defg'

Turning again to the documentation:

"if it is a string, any backslash escapes in it are processed.
That is, \n is converted to a single newline character, \r is
converted to a linefeed, and so forth."

So why is '\n' converted to a newline but '\\' does not become a literal
backslash? OK, I don't do much string processing, so perhaps this is where
I am missing the point: how is the replacement being "converted"?
(As Peter's example shows, if you supply the replacement via
a function, this does not happen.) You suggest it is just a matter of
it being an re, but::

>>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
True
>>> re.compile('a\\nc') == re.compile('a\nc')
False

So I have two strings that are not the same, nor do they compile
equivalently, yet apparently they are "converted" to something
equivalent for the substitution. Why? Is my question clearer?

If the answer looks too obvious to state, assume I'm missing it anyway
and please state it. As I said, I seldom use the re module.

Alan Isaac

MRAB

Dec 17, 2009, 2:45:46 PM

to pytho...@python.org

re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').

However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').

However, having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.
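Since equality of compiled pattern objects and the private cache are implementation details that may differ between Python versions, a more robust way to see that the two patterns are equivalent is to compare what they match. A minimal Python 3 sketch, with illustrative names:

```python
import re

p1 = re.compile('a\\nc')  # pattern text: a, backslash, n, c
p2 = re.compile('a\nc')   # pattern text: a, real newline, c

# The source strings differ...
print(p1.pattern == p2.pattern)  # False

# ...but both patterns match the same three-character string.
same_behaviour = bool(p1.fullmatch('a\nc')) and bool(p2.fullmatch('a\nc'))
print(same_behaviour)  # True
```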

Alan G Isaac

Dec 17, 2009, 3:18:12 PM

On 12/17/2009 2:45 PM, MRAB wrote:
> re.compile('a\\nc') _does_ compile to the same regex as
> re.compile('a\nc').
>
> However, regex objects never compare equal to each other, so, strictly
> speaking, re.compile('a\nc') != re.compile('a\nc').
>
> However, having said that, the re module contains a cache (keyed on the
> string and options supplied), so the first re.compile('a\nc') will put
> the regex object in the cache and the second re.compile('a\nc') will
> return that same regex object from the cache. If you clear the cache in
> between the two calls (do re._cache.clear()) you'll get two different
> regex objects which won't compare equal even though they are to all
> intents identical.

OK, this is helpful.
(I did check equality but did not understand
I got True only because re used caching.)
So is the bottom line the following?
A string replacement is not just "converted"
as described in the documentation, essentially
it is compiled?

But that cannot quite be right. E.g., \b will be a backspace,
not a word boundary. So then the question arises
again, why isn't '\\' a backslash? Just because?
Why does it not get the "obvious" conversion?

Thanks,
Alan Isaac

MRAB

Dec 17, 2009, 3:51:04 PM

to pytho...@python.org

If you give the re module a string containing \b, eg. '\\b' or r'\b',
then it will compile it to a word boundary if it's in a regex string or
a backspace if it's in a replacement string. This is different from
giving the re module a string which actually contains a backspace, eg,
'\b'.

Because the re module uses backslashes for escaping, you'll need to
escape a literal backslash with a backslash in the string you give it.
But string literals also use backslashes for escaping, so you'll need to
escape each of those backslashes with a backslash.
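The three \b cases described above, as a short Python 3 sketch. The sample strings and variable names are invented for illustration.

```python
import re

# In a pattern, the two characters backslash + b mean "word boundary".
boundary_result = re.sub(r'\bcat\b', 'dog', 'cat concat cat')

# In a replacement template, the same two characters produce a backspace.
backspace_result = re.sub('X', r'\b', 'aXb')

# A pattern that contains an actual backspace character matches only
# a literal backspace in the text.
literal_hit = re.search('\b', 'a\bb')
literal_miss = re.search('\b', 'a b')

print(boundary_result)         # dog concat dog
print(repr(backspace_result))  # 'a\x08b'
```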

Rhodri James

Dec 17, 2009, 7:59:12 PM

On Thu, 17 Dec 2009 20:18:12 -0000, Alan G Isaac <alan....@gmail.com>
wrote:

> So is the bottom line the following?

> A string replacement is not just "converted"
> as described in the documentation, essentially
> it is compiled?

That depends entirely on what you mean.

> But that cannot quite be right. E.g., \b will be a backspace,
> not a word boundary. So then the question arises
> again, why isn't '\\' a backslash? Just because?
> Why does it not get the "obvious" conversion?

'\\' *is* a backslash. That string containing a single backslash is then
processed by the re module which sees a backslash, tries to interpret it
as an escape, fails and barfs.

"re.compile('a\\nc')" passes a sequence of four characters to re.compile:
'a', '\', 'n' and 'c'. re.compile() then does its own interpretation:
'a' passes through as is, '\' flags an escape which combined with 'n'
produces the newline character (0x0a), and 'c' passes through as is.

"re.compile('a\nc')" by contrast passes a sequence of three characters to
re.compile: 'a', 0x0a and 'c'. re.compile() does its own interpretation,
which happens not to change any of the characters, resulting in the same
regular expression as before.
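The two stages described above can be made visible by listing the characters that actually reach the re module. A minimal Python 3 sketch with illustrative names; the last lines show the same two-stage effect on the replacement side, which is what prompted the question.

```python
import re

four_chars = 'a\\nc'   # after literal processing: a, backslash, n, c
three_chars = 'a\nc'   # after literal processing: a, newline, c

print(list(four_chars))   # ['a', '\\', 'n', 'c']
print(list(three_chars))  # ['a', '\n', 'c']

# Used as replacements, re turns backslash + n into a newline, so both
# strings end up producing identical output.
out_four = re.sub('X', four_chars, '1X2')
out_three = re.sub('X', three_chars, '1X2')
print(out_four == out_three)  # True
```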

Your problem is that you are conflating the compile-time processing of
string literals with the run-time processing of strings specific to re.

--
Rhodri James *-* Wildebeeste Herder to the Masses

Gregory Ewing

Dec 18, 2009, 2:51:56 AM

MRAB wrote:

> Regular expressions and replacement strings have their own escaping
> mechanism, which also uses backslashes.

This seems like a misfeature to me. It makes sense for
a regular expression to give special meanings to backslash
sequences, because it's a sublanguage with its own syntax.
But I can't see any earthly reason to do that with the
*replacement* string, which is just data.

It looks like a feature that's been blindly copied over
from Perl without thinking about whether it makes sense
in Python.

--
Greg

Sion Arrowsmith

Dec 18, 2009, 12:09:41 PM

Gregory Ewing <greg....@canterbury.ac.nz> wrote:
>MRAB wrote:
>> Regular expressions and replacement strings have their own escaping
>> mechanism, which also uses backslashes.
>This seems like a misfeature to me. It makes sense for
>a regular expression to give special meanings to backslash
>sequences, because it's a sublanguage with its own syntax.
>But I can't see any earthly reason to do that with the
>*replacement* string, which is just data.

>>> re.sub('a(.)c', r'\1', "123abcdefg")
'123bdefg'

Still think the replacement string is "just data"?

--
\S

under construction

MRAB

Dec 18, 2009, 12:17:27 PM

to pytho...@python.org

In simple cases you might be replacing with the same string every time,
but in other cases you might want the replacement to contain substrings
captured by the regex.

For example, swapping pairs of words:

>>> re.sub(r'(\w+) (\w+)', r'\2 \1', r'first second third fourth')
'second first fourth third'

Python also allows you to provide a function that returns the
replacement string, but that seems a bit long-winded for those cases
when a simple replacement template would suffice.
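For comparison, here is the word-swapping example written both ways in Python 3, as a sketch. The function form gives the same result but is noticeably wordier, which is the trade-off being described. Variable names are illustrative.

```python
import re

text = 'first second third fourth'

# Template form: \2 and \1 refer to the captured groups.
via_template = re.sub(r'(\w+) (\w+)', r'\2 \1', text)

# Function form: build the replacement from the match object.
via_function = re.sub(r'(\w+) (\w+)',
                      lambda m: m.group(2) + ' ' + m.group(1),
                      text)

print(via_template)  # second first fourth third
print(via_function)  # second first fourth third
```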

Alan G Isaac

Dec 18, 2009, 12:58:08 PM

On 12/17/2009 7:59 PM, Rhodri James wrote:
> "re.compile('a\\nc')" passes a sequence of four characters to
> re.compile: 'a', '\', 'n' and 'c'. re.compile() then does its own
> interpretation: 'a' passes through as is, '\' flags an escape which
> combined with 'n' produces the newline character (0x0a), and 'c' passes
> through as is.

I got that from MRAB's posts. (Thanks.)
What I'm not getting is why the replacement string
gets this particular interpretation. What is the payoff?
(Contrast e.g. Vim's substitution syntax.)

Thanks,
Alan

Alan G Isaac

Dec 18, 2009, 12:59:38 PM

On 12/18/2009 12:17 PM, MRAB wrote:
> In simple cases you might be replacing with the same string every time,
> but other cases you might want the replacement to contain substrings
> captured by the regex.

Of course that "conversion" is needed in the replacement.
But e.g. Vim substitutions handle this fine without the
odd (to non perlers) handling of backslashes in replacement.

Alan Isaac

Lie Ryan

Dec 18, 2009, 1:31:19 PM

Short answer: Python is not Perl, Python's re.sub is not Vim's :s.

Slightly longer answer: Different environments have different needs;
Vim users more often need to replace with plain text. All in all,
decisions about default behavior are usually made so that fewer backslashes
are needed for the more common case in that particular environment.

Gregory Ewing

Dec 18, 2009, 7:21:10 PM

MRAB wrote:

> In simple cases you might be replacing with the same string every time,
> but other cases you might want the replacement to contain substrings
> captured by the regex.

But you can give it a function that has access to the
match object and can produce whatever replacement string
it wants.

You already have a complete programming language at
your disposal. There's no need to invent yet another
mini-language for the replacement string.
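A sketch of what that looks like in practice (Python 3): a replacement function can use captured groups and also emit text containing backslashes, with no template escaping involved. The function name repl and the sample data are only illustrative.

```python
import re

def repl(match):
    # Use the captured character, then append a literal backslash + 'n'.
    return match.group(1).upper() + r'\n'

result = re.sub(r'a(.)c', repl, '123abcdefg')
print(result)  # 123B\ndefg
```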

--
Greg

MRAB

Dec 18, 2009, 9:24:00 PM

to pytho...@python.org

There's no need for list comprehensions either, but they're much-used
shorthand.

Steven D'Aprano

Dec 18, 2009, 10:29:32 PM

The same can't be said for regex replacement strings, which are far more
specialised.

And list comps don't make anything *harder*, they just make things
easier. In contrast, the current behaviour of regex replacements makes it
difficult to use special characters as part of the replacement string.
That's not good.

--
Steven

Rhodri James

Dec 19, 2009, 7:07:02 PM

On Fri, 18 Dec 2009 17:58:08 -0000, Alan G Isaac <alan....@gmail.com>
wrote:

> On 12/17/2009 7:59 PM, Rhodri James wrote:

>> "re.compile('a\\nc')" passes a sequence of four characters to
>> re.compile: 'a', '\', 'n' and 'c'. re.compile() then does its own
>> interpretation: 'a' passes through as is, '\' flags an escape which
>> combined with 'n' produces the newline character (0x0a), and 'c' passes
>> through as is.
>
>
> I got that from MRAB's posts. (Thanks.)
> What I'm not getting is why the replacement string
> gets this particular interpretation. What is the payoff?

So that the substitution escapes \1, \2 and so on work.

Aahz

Jan 2, 2010, 1:56:46 AM

In article <7p2juv...@mid.individual.net>,

Gregory Ewing <greg....@canterbury.ac.nz> wrote:
>MRAB wrote:
>>
>> In simple cases you might be replacing with the same string every time,
>> but other cases you might want the replacement to contain substrings
>> captured by the regex.
>
>But you can give it a function that has access to the match object and
>can produce whatever replacement string it wants.

Assuming I remember correctly, the function capability came after the
replacement capability. I think that breaking replacement would be a
Bad Idea.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.