
Regexp: non greedy?


Oliver

May 20, 2008, 6:49:19 AM
Dear all

I know it's probably a little boring for the "pros" to answer
questions about regexps...
...but before I start to cry I'd like to ask for a helping hand.

I have the following regexp:
$message =~ m/(\:59\:(.+)?(\n\:\d\d\:))/s ;

I am expecting it to grab everything between :59: and :70: into
$2.
Unfortunately it matches greedily, so for some examples it stops not
at the first but at the last :70: after :59:.

What can I do so that it matches only up to the first :70: after :59:?

Regards,
Oliver

Example Data:

a)

:21: Hello World
:23A: 12344523423
:59: I Am a line
I am another line
me too
I could be another but mustn't
:70: ABRV
:71A: kkk

b)

:21: Hello World
:23A: 12344523423
:59: I Am a line
I am another line with a number
:70: ABRV
:71G: kkk
:70: ABRV2
:71H: lll
:70: ABRV3


RedGrittyBrick

May 20, 2008, 7:01:58 AM
Oliver wrote:

> I have the following regexp:
> $message =~ m/(\:59\:(.+)?(\n\:\d\d\:))/s ;
>
> I am expecting it to grab everything between :59: and :70: into
> $2.
> Unfortunately it matches greedily, so for some examples it stops not
> at the first but at the last :70: after :59:.
>
> What can I do so that it matches only up to the first :70: after :59:?
>
> Regards,
> Oliver
>
> Example Data:
>

> [snip, see below]

#!/usr/bin/perl
use strict;
use warnings;

my $x=<<END;


:21: Hello World
:23A: 12344523423
:59: I Am a line
I am another line
me too
I could be another but mustn't
:70: ABRV
:71A: kkk

END

my $y=<<END;


:21: Hello World
:23A: 12344523423
:59: I Am a line
I am another line with a number
:70: ABRV
:71G: kkk
:70: ABRV2
:71H: lll
:70: ABRV3

END

for ($x, $y) {
    if (/:59:(.+?):70:/s) {
        print "\n'$1'\n";
    }
}


--
RGB

Jim Gibson

May 20, 2008, 12:49:40 PM
In article
<3cfed297-ddee-46e1...@b1g2000hsg.googlegroups.com>,
Oliver <oli.m...@gmail.com> wrote:

> Dear all
>
> I know it's probably a little boring for the "pros" to answer
> questions about regexps...
> ...but before I start to cry I'd like to ask for a helping hand.
>
> I have the following regexp:
> $message =~ m/(\:59\:(.+)?(\n\:\d\d\:))/s ;

You need to put the '?' directly after the '+', not after the group. If
it is after the ')', it applies to the group ("zero or one of '(.+)'")
and does not modify the '+':

$message =~ m/(\:59\:(.+?)(\n\:\d\d\:))/s ;
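A minimal sketch of the difference, using example b) from the original post. The variable names $greedy and $lazy are illustrative only, and the outer capture groups are dropped for brevity:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data taken from example b) in the original post.
my $message = <<'END';
:21: Hello World
:23A: 12344523423
:59: I Am a line
I am another line with a number
:70: ABRV
:71G: kkk
:70: ABRV2
:71H: lll
:70: ABRV3
END

# '?' after the group: .+ stays greedy and the group as a whole is optional,
# so the match runs up to the LAST "\n:NN:" terminator.
my ($greedy) = $message =~ m/:59:(.+)?\n:\d\d:/s;

# '?' directly after '+': .+ becomes non-greedy and stops at the FIRST
# "\n:NN:" terminator.
my ($lazy) = $message =~ m/:59:(.+?)\n:\d\d:/s;

print "greedy: '$greedy'\n";
print "lazy:   '$lazy'\n";
```

With this data the greedy capture runs on through ":71H: lll", while the non-greedy one ends after "I am another line with a number".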

--
Jim Gibson


xho...@gmail.com

May 20, 2008, 12:52:53 PM
Oliver <oli.m...@gmail.com> wrote:
> Dear all
>
> I know it's probably a little boring for the "pros" to answer
> questions about regexps...
> ...but before I start to cry I'd like to ask for a helping hand.
>
> I have the following regexp:
> $message =~ m/(\:59\:(.+)?(\n\:\d\d\:))/s ;

The ? above is the "0 or 1" quantifier, not the non-greedy modifier, as it
does not immediately follow another quantifier. It is quantifying the entire
(.+) expression. To get what you want, you need to move the ? inside the
parentheses.

(.+?)
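To see the "0 or 1" reading in action, here is a small sketch with a hypothetical input in which the :59: field is empty. The string and variable names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: nothing at all between :59: and the next tag.
my $empty = ":59:\n:70: ABRV\n";

my ($optional_ok, $capture_defined) = (0, 0);

# (.+)? may match zero times, so the pattern can succeed without
# the group taking part in the match at all.
if ($empty =~ m/:59:(.+)?\n:\d\d:/s) {
    $optional_ok     = 1;
    $capture_defined = defined $1 ? 1 : 0;
}

# (.+?) is not optional: it must consume at least one character
# before the "\n:NN:" terminator, which is impossible here.
my $required_ok = $empty =~ m/:59:(.+?)\n:\d\d:/s ? 1 : 0;

print "optional group matched: $optional_ok, capture defined: $capture_defined\n";
print "non-greedy group matched: $required_ok\n";
```

The first pattern matches with an undefined capture, while the second does not match at all, which shows that the two question marks do quite different jobs.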

Xho


Oliver

May 21, 2008, 1:11:51 PM
...

>
> The ? above is the "0 or 1" quantifier, not the non-greedy modifier, as it
> does not immediately follow another quantifier.  It is quantifying the entire
> (.+) expression.  To get what you want, you need to move the ? inside the
> parentheses.
>
> (.+?)
>
> Xho
>
...

Thank you all for your kind replies.
The problem was indeed my wrong interpretation of the question mark.

The main issue, though, was my understanding of the text file:
contrary to my assumption, and contrary to the specs of the so-called MT101
format, :70: (and the fields following it) appears more than once in the text
file. In fact, the messages were concatenated by transaction.
This just by the way.
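For concatenated input like this, one possible approach is to collect every :59: block in a single pass with the /g modifier in list context. This is only a sketch: the sample data is invented, and the assumption that a tag looks like two digits plus an optional capital letter is a guess based on the examples in this thread, not on the MT101 spec.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two transactions concatenated into one string.
my $message = <<'END';
:21: REF1
:59: First beneficiary
Street 1
:70: ABRV
:71A: kkk
:21: REF2
:59: Second beneficiary
:70: ABRV2
END

# In list context with /g, the match returns group 1 of every match.
# /m lets ^ match at each line start, /s lets . match newlines.
# (?=...) is a lookahead, so the terminating tag is not consumed.
my @fields = $message =~ m/^:59:(.+?)(?=\n:\d\d[A-Z]?:)/msg;

print scalar(@fields), " field(s) found\n";
print "'$_'\n" for @fields;
```

Each capture still starts with the space that follows the tag, so it may need trimming before further use.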

Thanks to RedGrittyBrick: you've made my code smoother (by putting the match
directly into the "if" statement). :-)

Regards,
Oliver
