
Possible re bug when using ".*"


Alexander Richert - NOAA Affiliate

Dec 28, 2022, 1:45:24 PM
In a couple recent versions of Python (including 3.8 and 3.10), the
following code:
import re
print(re.sub(".*", "replacement", "pattern"))
yields the output "replacementreplacement".

This behavior does not occur in 3.6.

Which behavior is the desired one? Perhaps relatedly, I noticed that even
in 3.6, the code
print(re.findall(".*","pattern"))
yields ['pattern',''] which is not what I was expecting.

Thanks,
Alex Richert

--
Alexander Richert, PhD
*RedLine Performance Systems*

Roel Schroeven

Dec 28, 2022, 1:59:37 PM
Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022
at 19:42:
> In a couple recent versions of Python (including 3.8 and 3.10), the
> following code:
> import re
> print(re.sub(".*", "replacement", "pattern"))
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that even
> in 3.6, the code
> print(re.findall(".*","pattern"))
> yields ['pattern',''] which is not what I was expecting.
The documentation for re.sub() and re.findall() has these notes:
"Changed in version 3.7: Empty matches for the pattern are replaced when
adjacent to a previous non-empty match." and "Changed in version 3.7:
Non-empty matches can now start just after a previous empty match."
That probably describes the behavior you're seeing. ".*" first matches
"pattern", which is a non-empty match; then it matches the empty string
at the end, which is an empty match but is replaced because it is
adjacent to a non-empty match.

Seems somewhat counter-intuitive to me, but AFAICS it's the intended
behavior.
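
The two matches described above can be made visible with re.finditer(), which reports the span of each match. This is only a small sketch for Python 3.7 or later; the variable names are illustrative and not from the thread:

```python
import re

# Collect every match of ".*" in "pattern" together with its (start, end) span.
# The first match covers the whole string; the second is the empty match
# at the very end of the string.
matches = [(m.group(), m.span()) for m in re.finditer(".*", "pattern")]
print(matches)  # [('pattern', (0, 7)), ('', (7, 7))]

# On Python 3.7+, re.sub() replaces both matches, so the replacement
# text appears twice.
result = re.sub(".*", "replacement", "pattern")
print(result)  # replacementreplacement
```

The empty match at span (7, 7) is the one that 3.6 skipped in re.sub() because it sits directly after the non-empty match.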

--
"Programming today is a race between software engineers striving to build bigger
and better idiot-proof programs, and the Universe trying to produce bigger and
better idiots. So far, the Universe is winning."
-- Douglas Adams

Roel Schroeven

Dec 28, 2022, 2:03:32 PM
Roel Schroeven wrote on 28/12/2022 at 19:59:
> Alexander Richert - NOAA Affiliate via Python-list wrote on
> 28/12/2022 at 19:42:
>> In a couple recent versions of Python (including 3.8 and 3.10), the
>> following code:
>> import re
>> print(re.sub(".*", "replacement", "pattern"))
>> yields the output "replacementreplacement".
>>
>> This behavior does not occur in 3.6.
>>
>> Which behavior is the desired one? Perhaps relatedly, I noticed that even
>> in 3.6, the code
>> print(re.findall(".*","pattern"))
>> yields ['pattern',''] which is not what I was expecting.
> The documentation for re.sub() and re.findall() has these notes:
> "Changed in version 3.7: Empty matches for the pattern are replaced
> when adjacent to a previous non-empty match." and "Changed in version
> 3.7: Non-empty matches can now start just after a previous empty match."
> That probably describes the behavior you're seeing. ".*" first
> matches "pattern", which is a non-empty match; then it matches the
> empty string at the end, which is an empty match but is replaced
> because it is adjacent to a non-empty match.
>
> Seems somewhat counter-intuitive to me, but AFAICS it's the intended
> behavior.
For what it's worth, there's some discussion about this in this GitHub
issue: https://github.com/python/cpython/issues/76489

--
"I disapprove of what you say, but I will defend to the death your right to
say it."
-- Attributed to Voltaire

MRAB

Dec 28, 2022, 2:10:33 PM
On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list
wrote:
> In a couple recent versions of Python (including 3.8 and 3.10), the
> following code:
> import re
> print(re.sub(".*", "replacement", "pattern"))
> yields the output "replacementreplacement".
>
> This behavior does not occur in 3.6.
>
> Which behavior is the desired one? Perhaps relatedly, I noticed that even
> in 3.6, the code
> print(re.findall(".*","pattern"))
> yields ['pattern',''] which is not what I was expecting.
>
It's not a bug, it's a change in behaviour to bring it more into line
with other regex implementations in other languages.
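
For code that relied on the 3.6 result, here are a few ways to get a single replacement on 3.7 and later. None of these were suggested in the thread; they are a sketch of common options, and the variable names are made up for illustration:

```python
import re

text = "pattern"

# Option 1: require at least one character, so the empty match at the
# end of the string is never a candidate.
with_plus = re.sub(".+", "replacement", text)

# Option 2: keep ".*" but stop after the first replacement.
with_count = re.sub(".*", "replacement", text, count=1)

# Option 3: anchor the pattern at the start of the string, so no match
# can begin at the end of a non-empty string.
with_anchor = re.sub(r"\A.*", "replacement", text)

print(with_plus, with_count, with_anchor)

# The options differ on empty input: ".+" leaves "" untouched,
# while ".*" with count=1 still performs one replacement.
empty_plus = re.sub(".+", "replacement", "")
empty_count = re.sub(".*", "replacement", "", count=1)
print(repr(empty_plus), repr(empty_count))
```

Which option fits depends on whether an empty input string should be replaced or left alone.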

Ethan Furman

Dec 28, 2022, 3:37:45 PM
The new behavior makes no sense to me, but better to be consistent with the other regex engines than not -- I still get
thrown off by vim's regex.

--
~Ethan~

Peter J. Holzer

Jan 1, 2023, 12:54:29 PM
On 2022-12-28 19:07:06 +0000, MRAB wrote:
> On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list
> wrote:
> > print(re.sub(".*", "replacement", "pattern"))
> > yields the output "replacementreplacement".
[...]
> It's not a bug, it's a change in behaviour to bring it more into line with
> other regex implementations in other languages.

Interesting. Perl does indeed behave that way, too. Never noticed that
in 28 years of using it.

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"