
non-greedy operators in regular expressions


Marek Malowidzki

May 13, 2003, 9:37:07 AM
Hi all,

this is my first message on this list so I would like to greet
everybody.

I have a problem with non-greedy operators in regexps. The example
explains everything:

set a {qqawertyauiopsaerttyasss}
regsub -all -- {a.*?a} $a {a--a} b
puts $b
regsub -all -- {(a|aw).*?a} $a {\1--a} b
puts $b
regsub -all -- {(aw?).*?a} $a {\1--a} b
puts $b
exit

prints the following:

qqa--auiopsa--asss
qqaw--asss
qqaw--asss

Shouldn't it be "qqa--auiopsa--asss" in all cases? I noticed that
".*?" works correctly only if the first limiting pattern does not contain
any wildcard characters or optional patterns. [info tclversion] shows 8.4
(this is the 8.4.2 distribution from ActiveState). Any thoughts?

Thanks,

Marek

Glenn Jackman

May 13, 2003, 11:57:30 AM

Tcl's regex engine works great when the whole regex is either greedy or
non-greedy. Mixed greediness does not work well -- the greediness
preference appears to be based on the first subexpression
(see http://groups.google.com/groups?hl=en&selm=FIECG4.F75%40spsystems.net)

Here, {a.*?a} is entirely non-greedy, and works as expected. Both
{(a|aw).*?a} and {(aw?).*?a} take the greediness preference from the
first expression (a|aw) or (aw?), which both prefer the greedy match.

Recoding examples 2 and 3 without non-greedy quantifiers makes the
whole expression greedy, and it should then operate as expected:

regsub -all -- {(a|aw)[^a]*a} $a {\1--a} b
regsub -all -- {(aw?)[^a]*a} $a {\1--a} b
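
For anyone who wants to compare the variants side by side, here is a rough
sketch that runs every pattern from this thread against the same input and
prints each result. The proc name "rewrite" is made up for illustration, and
the results may depend on the Tcl version in use (the thread reports 8.4).

```tcl
# Hypothetical helper (not from the thread): apply regsub -all and
# return the rewritten string instead of the match count.
proc rewrite {re input subst} {
    regsub -all -- $re $input $subst out
    return $out
}

set a {qqawertyauiopsaerttyasss}

# Pattern / substitution pairs taken from the posts in this thread.
set cases {
    {a.*?a}          {a--a}
    {(a|aw).*?a}     {\1--a}
    {(aw?).*?a}      {\1--a}
    {(a|aw)[^a]*a}   {\1--a}
    {(aw?)[^a]*a}    {\1--a}
}

# Print one line per pattern so the differences are easy to spot.
foreach {re subst} $cases {
    puts [format "%-16s -> %s" $re [rewrite $re $a $subst]]
}
```

Keeping the input fixed and varying only the pattern makes it easier to see
which subexpression decides the greediness of the whole match.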


--
Glenn Jackman
NCF Sysadmin
gle...@ncf.ca

Marek Malowidzki

May 14, 2003, 4:10:11 AM
xx...@freenet.carleton.ca (Glenn Jackman) wrote in message news:<slrnbc25f9...@freenet10.carleton.ca>...

Thank you for the explanation. The example I have is unfortunately
more complex (as I am searching for patterns longer than just a single
character) but I have checked that the following really works:

set a {awawqqawertyauiopsaerttyasss}


regsub -all -- {a.*?a} $a {a--a} b
puts $b

regsub -all -- {(a|aw){1,2}?.*?a} $a {\1--a} b
puts $b
regsub -all -- {(aw?){1,2}?.*?a} $a {\1--a} b
puts $b
exit

(with {1,2}? used artificially to enforce non-greediness) prints

a--awqqa--auiopsa--asss
a--awqqa--auiopsa--asss
a--awqqa--auiopsa--asss

Thanks again,

Marek
