This is my first message on this list, so I would like to greet
everybody.
I have a problem with non-greedy quantifiers in regexps. This example
shows it:
set a {qqawertyauiopsaerttyasss}
regsub -all -- {a.*?a} $a {a--a} b
puts $b
regsub -all -- {(a|aw).*?a} $a {\1--a} b
puts $b
regsub -all -- {(aw?).*?a} $a {\1--a} b
puts $b
exit
It prints the following:
qqa--auiopsa--asss
qqaw--asss
qqaw--asss
Shouldn't it be "qqa--auiopsa--asss" in all cases? I noticed that
".*?" works correctly only if the first delimiting pattern contains no
wildcard characters or optional parts. [info tclversion] reports 8.4
(this is the 8.4.2 distribution from ActiveState). Any thoughts?
Thanks,
Marek
Tcl's regex engine works great when the whole regex is either greedy or
non-greedy. Mixed greediness does not work well: the greediness
preference of the whole regex appears to be taken from the first
subexpression that has one
(see http://groups.google.com/groups?hl=en&selm=FIECG4.F75%40spsystems.net)
Here, {a.*?a} is entirely non-greedy and works as expected. Both
{(a|aw).*?a} and {(aw?).*?a} take their greediness preference from the
first subexpression, (a|aw) or (aw?) respectively, and both of those
prefer the longest match (an alternation prefers the longest match, and
w? is a greedy quantifier).
Recoding examples 2 and 3 without non-greedy quantifiers would make the
whole expressions greedy, and they would then operate as expected:
regsub -all -- {(a|aw)[^a]*a} $a {\1--a} b
regsub -all -- {(aw?)[^a]*a} $a {\1--a} b
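A small self-checking sketch of the explanation above, assuming Tcl 8.4 or later (for the string operator ne). regexp -inline returns the whole match followed by any captures, so it shows directly how far each pattern runs from the first "a". The expected values are not new measurements: they are the spans implied by the outputs in the original post (for example, "qqaw--asss" means the match ran to the last "a" with \1 equal to "aw"). The check proc is a hypothetical helper, not part of Tcl.

```tcl
# Hypothetical helper: stop with an error if a result is not what we expect.
proc check {label got want} {
    if {$got ne $want} {
        error "${label}: got {$got}, expected {$want}"
    }
    puts "${label}: $got"
}

set a {qqawertyauiopsaerttyasss}

# Entirely non-greedy: the match stops at the first "a" after the opening "a".
check example1 [regexp -inline -- {a.*?a} $a] {awertya}

# The leading alternation prefers the longest match, so the whole regex turns
# greedy and .*? runs on to the last "a"; the second list element is \1.
check example2 [regexp -inline -- {(a|aw).*?a} $a] {awertyauiopsaerttya aw}

# Same story for (aw?): the greedy w? sets the preference of the whole regex.
check example3 [regexp -inline -- {(aw?).*?a} $a] {awertyauiopsaerttya aw}
```

Using regexp -inline instead of regsub makes the mismatch visible as a span and a capture, rather than having to reverse-engineer it from the substituted string.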
--
Glenn Jackman
NCF Sysadmin
gle...@ncf.ca
Thank you for the explanation. Unfortunately, my real case is more
complex (I am searching for patterns longer than a single character),
but I have checked that the following really works:
set a {awawqqawertyauiopsaerttyasss}
regsub -all -- {a.*?a} $a {a--a} b
puts $b
regsub -all -- {(a|aw){1,2}?.*?a} $a {\1--a} b
puts $b
regsub -all -- {(aw?){1,2}?.*?a} $a {\1--a} b
puts $b
exit
With {1,2}? used artificially to force non-greediness, this prints:
a--awqqa--auiopsa--asss
a--awqqa--auiopsa--asss
a--awqqa--auiopsa--asss
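One way to keep this workaround honest across Tcl upgrades is a small regression script. The sketch below runs the three patterns from this message and fails if any result differs from the string reported above; the check proc is a hypothetical helper, and the expected value is simply copied from the posted output rather than independently derived.

```tcl
# Hypothetical helper: stop with an error if a result is not what we expect.
proc check {label got want} {
    if {$got ne $want} {
        error "${label}: got {$got}, expected {$want}"
    }
    puts "${label}: $got"
}

set a {awawqqawertyauiopsaerttyasss}
# Expected result, copied from the output posted above.
set want {a--awqqa--auiopsa--asss}

# Reference: an entirely non-greedy pattern.
regsub -all -- {a.*?a} $a {a--a} b
check plain $b $want

# {1,2}? makes the first quantified atom non-greedy, so the whole
# regex prefers the shortest match, like the plain pattern.
regsub -all -- {(a|aw){1,2}?.*?a} $a {\1--a} b
check alternation $b $want

regsub -all -- {(aw?){1,2}?.*?a} $a {\1--a} b
check optional $b $want
```

Note that {1,2}? also permits the group to occur twice; with a shortest-match preference one occurrence is tried first, but on other inputs that extra freedom could still change what matches.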
Thanks again,
Marek