
Non-greedy regexp?


Chris Adams

Apr 16, 2008, 8:15:48 PM
Can someone explain the following non-greedy behavior?

$ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
1

I thought that making the .* non-greedy would make it match the fewest
characters possible before the end of the string; I was expecting it to
output "1.2.3".
--
Chris Adams <cma...@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

Marc Girod

Apr 17, 2008, 5:07:34 AM
On Apr 17, 1:15 am, cmad...@hiwaay.net (Chris Adams) wrote:
> Can someone explain the following non-greedy behavior?
>
> $ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
> 1
>
> I thought that making the .* non-greedy would make it match the fewest
> characters possible before the end of the string; I was expecting it to
> output "1.2.3".

Er... don't you want it to work 'backwards',
i.e. 'in hindsight'?
Greediness only applies forwards, so here it has no effect at all.
The first \. matches right after the '1', and the rest of the pattern
then has to match everything that follows.
By then it is too late to apply greediness backwards and pick a
different match for \., which I understand is what you were expecting.

What about:

$ perl -e '$x="1.2.3.4"; $x=~s/\.[^.]*$//; print $x,"\n"'
1.2.3
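
To see why the two patterns differ, it can help to print what each one actually matches instead of what is left after the substitution. The following is only a sketch; the variable names $lazy and $negated are made up for illustration.

```perl
use strict;
use warnings;

my $x = "1.2.3.4";

# Original pattern: the match starts at the first '.', so the lazy .*?
# has to stretch all the way to the end of the string to satisfy $.
my ($lazy) = $x =~ /(\..*?)$/;

# Negated character class: [^.]* cannot cross a '.', so only a match
# starting at the last '.' can reach the end of the string.
my ($negated) = $x =~ /(\.[^.]*)$/;

print "lazy:    $lazy\n";     # prints "lazy:    .2.3.4"
print "negated: $negated\n";  # prints "negated: .4"
```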

Marc

Bruce Van Allen

Apr 17, 2008, 12:36:37 AM

On Apr 16, 2008, at 5:15 PM, Chris Adams wrote:

> Can someone explain the following non-greedy behavior?
>
> $ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
> 1
>
> I thought that making the .* non-greedy would make it match the fewest
> characters possible before the end of the string;

Actually, it does just that. Remember that the regex engine works from
the left. So it finds a '.' just after the '1'; then it looks for the
fewest characters from there to the end, which would be '2.3.4'.

If the '$' is left out, then it just looks for the fewest characters:
# zero chars is the minimum
$ perl -e '$x="1.2.3.4"; $x=~s/\..*?//; print $x,"\n"'
12.3.4

# one char is the minimum
$ perl -e '$x="1.2.3.4"; $x=~s/\..+?//; print $x,"\n"'
1.3.4
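
The same two unanchored patterns can be checked by capturing the matched text directly. This is a small sketch with illustrative variable names ($star, $plus), not part of the original reply.

```perl
use strict;
use warnings;

my $x = "1.2.3.4";

# Without $, nothing forces the lazy quantifier to expand.
my ($star) = $x =~ /(\..*?)/;   # .*? settles for zero characters
my ($plus) = $x =~ /(\..+?)/;   # .+? has to take exactly one character

printf "star matched: %s\n", $star;  # prints "star matched: ."
printf "plus matched: %s\n", $plus;  # prints "plus matched: .2"
```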

> I was expecting it to
> output "1.2.3".

# assume one character at end:
$ perl -e '$x="1.2.3.4"; $x=~s/\..$//; print $x,"\n"'
1.2.3
# OR #
# Assume last character(s) NOT '.' (4 variations):
# a) single character at end
$ perl -e '$x="1.2.3.4"; $x=~s/\.[^.]$//; print $x,"\n"'
1.2.3
# OR #
# b) one or more characters at end
$ perl -e '$x="1.2.3.4"; $x=~s/\.[^.]+$//; print $x,"\n"'
1.2.3
# OR #
# c) zero or more characters at end
$ perl -e '$x="1.2.3.4"; $x=~s/\.[^.]*$//; print $x,"\n"'
1.2.3

# OR #
# d) don't need non-greedy match of non-'.' character(s) at end, but
# it works:
$ perl -e '$x="1.2.3.4"; $x=~s/\.[^.]*?$//; print $x,"\n"'
1.2.3

# OR #
# Another approach:
# first grab and capture as much as possible,
# then match '.' and the rest to the end;
# return captured value ($1):
$ perl -e '$x="1.2.3.4"; $x=~s/(.*)\..*?$/$1/; print $x,"\n"'
1.2.3

HTH

Best,

- Bruce

__bruce__van_allen__santa_cruz__ca__

Vaclav Barta

Apr 17, 2008, 1:07:25 AM
Hi,

On Thursday 17 April 2008 02:15:48 Chris Adams wrote:
> $ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
> 1
>
> I thought that making the .* non-greedy would make it match the fewest
> characters possible before the end of the string; I was expecting it to

The non-greedy star applies only to the dot metacharacter before it; the
literal dot at the beginning of your regexp still matches at the earliest
opportunity (that is, it matches the first dot in "1.2.3.4"), and after that
there's nothing the rest of the regexp can do but match the rest of the
string. You may want to use /\.[^.]*$/ instead.

Bye
Vasek
--
http://www.mangrove.cz/
Open Source integration

Gunnar Hjalmarsson

Apr 17, 2008, 2:37:32 AM
Chris Adams wrote:
> Can someone explain the following non-greedy behavior?
>
> $ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
> 1
>
> I thought that making the .* non-greedy would make it match the fewest
> characters possible before the end of the string;

It does. However, the \. matches the first occurrence of a period, and
making the .* non-greedy does not change that.

Actually, since you have .* right before the $ character, it doesn't
matter if .* is greedy or not.
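
A quick way to check this for the single-line input from the question is to run both forms side by side. This is a sketch only; strings containing newlines, or patterns using the /s modifier, may behave differently.

```perl
use strict;
use warnings;

my $lazy   = "1.2.3.4";
my $greedy = "1.2.3.4";

# Both substitutions start at the first '.' and must reach $,
# so the quantifier has no room to choose a shorter or longer match.
$lazy   =~ s/\..*?$//;
$greedy =~ s/\..*$//;

print "$lazy\n";    # prints "1"
print "$greedy\n";  # prints "1"
```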

> I was expecting it to output "1.2.3".

To achieve that, you probably want something like:

$x =~ s/(.*)\..*/$1/;

Here the \. matches the _last_ occurrence of a period.
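
As a rough illustration of why the last period is chosen, the captured group and its end offset can be inspected after the match. The names $kept and $dot_offset are made up for this sketch; @+ is Perl's built-in array of match end offsets.

```perl
use strict;
use warnings;

my $x = "1.2.3.4";

# Greedy (.*) first takes the whole string, then backtracks one
# character at a time until \. can match; that happens at the last period.
my ($kept) = $x =~ /(.*)\..*/;

# $+[1] is the offset just past group 1, which is where \. matched.
my $dot_offset = $+[1];

print "kept: $kept\n";              # prints "kept: 1.2.3"
print "dot offset: $dot_offset\n";  # prints "dot offset: 5"
```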

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

John W. Krahn

Apr 17, 2008, 1:06:38 AM
Chris Adams wrote:
> Can someone explain the following non-greedy behavior?
>
> $ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
> 1
>
> I thought that making the .* non-greedy would make it match the fewest
> characters possible before the end of the string; I was expecting it to
> output "1.2.3".

It does match the fewest possible, it's just that \. matches the *first*
'.' character and .*? matches everything from there to the end-of-line $
anchor.

If you want the result to be '1.2.3' then you could do this:

$x =~ s/\.[^.]*$//;
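
For what it's worth, here is a small sketch that wraps this substitution in a helper and tries it on a few inputs, including edge cases the thread does not discuss. The helper name strip_last_component and the extra sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the last '.' and everything after it.
sub strip_last_component {
    my ($s) = @_;
    $s =~ s/\.[^.]*$//;   # [^.]* cannot cross a '.', so only the last one fits
    return $s;
}

# "1.2.3.4" is from the question; the other two inputs are made up.
for my $in ("1.2.3.4", "1.2.3.", "1234") {
    printf "%-8s -> %s\n", $in, strip_last_component($in);
}
```

A string with a trailing period loses just that period, and a string with no period at all is returned unchanged because the pattern never matches.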


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Chris Adams

Apr 17, 2008, 8:54:58 AM
Once upon a time, John W. Krahn <kra...@telus.net> said:
>Chris Adams wrote:
>> Can someone explain the following non-greedy behavior?
>>
>> $ perl -e '$x="1.2.3.4"; $x=~s/\..*?$//; print $x,"\n"'
>> 1
>>
>> I thought that making the .* non-greedy would make it match the fewest
>> characters possible before the end of the string; I was expecting it to
>> output "1.2.3".
>
>It does match the fewest possible, it's just that \. matches the *first*
>'.' character and .*? matches everything from there to the end-of-line $
>anchor.

Ok, I see. Thanks (to all replies).

>If you want the result to be '1.2.3' then you could do this:
>
>$x =~ s/\.[^.]*$//;

That's what I ended up with; I had someone asking me over the phone and
he'd tried the non-greedy way and it didn't work. I wanted to
understand why it didn't, and now I do.
