
regex newbie


Greg Carlson

Feb 19, 2004, 1:17:32 AM
I've looked through a number of books and faq's and such and haven't been
able to solve my regex conundrum. I need to find the first match before
another match. For example, with the string 'abcdefgabcdefgfooabcdefg', I
need to match 'foo' and the 'a' previous to but nearest 'foo' (not the one
at the beginning of the string). Also, there's an unknown number of
characters between the 'a' and the 'foo'. Any help would be greatly
appreciated.

Greg Carlson


gabe anzelini

Feb 19, 2004, 1:30:08 AM
"Greg Carlson" <gregfc...@hotmail.com> writes:

I don't think you can do this with a single regex (I could be wrong).
What I would do is split on foo and then use .*a to get the last a.
I think that would do it... maybe not.
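
A minimal sketch of the split-then-match idea, assuming the sample string from the question and literal 'a' and 'foo' delimiters. The variable names are illustrative only, not from the thread.

```perl
use strict;
use warnings;

my $s = 'abcdefgabcdefgfooabcdefg';

my $span;
if (index($s, 'foo') >= 0) {
    # Keep only the text before the first 'foo'; the limit of 2 stops
    # split from cutting the string at any later 'foo'.
    my ($before) = split /foo/, $s, 2;

    # Greedy .* runs to the end of the prefix and backtracks to the
    # last 'a', so $1 starts at the 'a' nearest to 'foo'.
    if (defined $before && $before =~ /.*(a.*)/s) {
        $span = $1 . 'foo';
    }
}
print defined $span ? "$span\n" : "no match\n";
```

Note that this copies the prefix of the string, which may matter for large inputs.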

-----
gabe

Dave Cardwell

Feb 19, 2004, 1:32:01 AM

Normally a regular expression tries to gobble up as much as it can, in this
case it will try to match the 'a' furthest away from 'foo'.

To get round this, you can do:
/a[^a]*foo/
which will match an 'a', any number of anything-but-a, then foo.
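
As a quick check of the negated character class, the sketch below runs it against the sample string from the question and records where the match starts. It assumes 'a' really is a single character; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'abcdefgabcdefgfooabcdefg';

my ($match, $start);
if ($s =~ /(a[^a]*foo)/) {
    $match = $1;      # text from the nearest 'a' through 'foo'
    $start = $-[0];   # offset where the whole match begins
    print "matched <$match> at offset $start\n";
}
else {
    print "no match\n";
}
```

An attempt starting at the first 'a' fails because [^a]* cannot cross the second 'a', so the engine moves on and succeeds from the second 'a'.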

Alternatively you can do:
/a.*?foo/
Here the ? makes the regexp 'not greedy'. That is, it will try to match
across the minimum amount of characters (hence the closest 'a' to 'foo').


Either would work, though I'd wager the second was using the best coding
practice.


Regards,
--
Dave Cardwell.
http://dave.blubbernet.com


Brian McCauley

Feb 19, 2004, 4:00:28 AM
"Dave Cardwell" <dcs...@ntlworld.com> writes:

> "Greg Carlson" <gregfc...@hotmail.com> wrote:

> > I need to match 'foo' and the 'a' previous to but nearest 'foo'
> > (not the one at the beginning of the string). Also, there's an
> > unknown number of characters between the 'a' and the 'foo'.

> /a[^a]*foo/
> which will match an 'a', any number of anything-but-a, then foo.

That's the normal solution, assuming 'a' really is a single character.



> Alternatively you can do:
> /a.*?foo/
> Here the ? makes the regexp 'not greedy'. That is, it will try to match
> across the minimum amount of characters (hence the closest 'a' to 'foo').

Bzzzt! Non-greedy does not trump first-match.
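
To make the point concrete, here is a small sketch using the sample string from the original question. The engine tries start positions from left to right, and the first position at which the pattern can match at all wins; laziness only affects how far the match extends from there. Variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'abcdefgabcdefgfooabcdefg';

# The lazy quantifier keeps the match as short as possible, but only
# from the leftmost starting point where a match exists at all.
my ($lazy) = $s =~ /(a.*?foo)/;
my $lazy_start = $-[0];

print "lazy match <$lazy> starts at offset $lazy_start\n";
```

The match begins at the first 'a' in the string, not at the 'a' nearest to 'foo'.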

Brian McCauley

Feb 19, 2004, 3:58:45 AM
"Greg Carlson" <gregfc...@hotmail.com> writes:

> Subject: regex newbie

Please put the subject of your post in the Subject of your post. If
in doubt try this simple test. Imagine you could have been bothered
to have done a search before you posted. Next imagine you found a
thread with your subject line. Would you have been able to recognise
it as the same subject?

If 'a' really is a single character then see other response.

Otherwise I'd usually use...

/(.*)(a.*foo)/

Note this actually matches both everything before the desired target
and the desired target. Note also this finds the last 'a' before the
_last_ 'foo'.
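
A short sketch of what the two capture groups end up holding, assuming a hypothetical test string with two labelled 'foo' occurrences and no newlines. The variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';

my ($prefix, $target);
if ($s =~ /(.*)(a.*foo)/) {
    # The leading greedy (.*) takes as much as it can, which pushes
    # the second group to the last 'a' that is still followed by 'foo'.
    ($prefix, $target) = ($1, $2);
    print "prefix: <$prefix>\n";
    print "target: <$target>\n";
}
```

Because both .* are greedy, the target ends at the last 'foo' in the string rather than the first.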

Greg Carlson

Feb 19, 2004, 11:01:13 AM
"Brian McCauley" <nob...@mail.com> writes:

> Please put the subject of your post in the Subject of your post....

Oops. I see your point.

> If 'a' really is a single character then see other response.
>
> Otherwise I'd usually use...
>
> /(.*)(a.*foo)/
>
> Note this actually matches both everything before the desired target
> and the desired target. Note also this finds the last 'a' before the
> _last_ 'foo'.

That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
My latest attempt is:

$tmp = 'abcdefgabcdefgfooabcdefgfoo';
$tmp =~ m/(foo)/ogcs;
[do stuff with $1] # this part works as I'd hoped
$tmp = substr($tmp, 0, pos($tmp));
$tmp =~ m/.*(a).+?$/os;

But that still got the first 'a'. Also, $tmp can be rather large so the
substr is a bit distasteful. Is there any way to search backward from the
current pos or something similar? Thanks again.

Greg Carlson


Glenn Jackman

Feb 19, 2004, 11:09:28 AM
Greg Carlson <gregfc...@hotmail.com> wrote:
[...]
> That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
> My latest attempt is:
>
> $tmp = 'abcdefgabcdefgfooabcdefgfoo';

my ($stuff) = $tmp =~ /(a[^a]*foo)/;

--
Glenn Jackman
NCF Sysadmin
gle...@ncf.ca

Glenn Jackman

Feb 19, 2004, 11:22:00 AM
Greg Carlson <gregfc...@hotmail.com> wrote:
[...]
> That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
> My latest attempt is:
>
> $tmp = 'abcdefgabcdefgfooabcdefgfoo';

As Dave Cardwell posted earlier:

Brian McCauley

Feb 19, 2004, 2:01:42 PM
"Greg Carlson" <gregfc...@hotmail.com> writes:

> "Brian McCauley" <nob...@mail.com> writes:
>
> > If 'a' really is a single character then see other response.

I shall assume that, since you are still pursuing this approach, in
your real problem 'a' is not a single character.

> > /(.*)(a.*foo)/
> >
> > Note this actually matches both everything before the desired target
> > and the desired target. Note also this finds the last 'a' before the
> > _last_ 'foo'.
>
> That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
> My latest attempt is:
>
> $tmp = 'abcdefgabcdefgfooabcdefgfoo';
> $tmp =~ m/(foo)/ogcs;

Don't put qualifiers on m// that you don't understand. /os have no
effect in the above line so if you understood them you'd not have used
them. :-)

> [do stuff with $1] # this part works as I'd hoped

Don't ever do stuff with $1 without first checking that the match
succeeded. If you are sure that the match will always succeed then
append "or die" to it. This serves a dual function. Firstly, it acts
as a comment to anyone who reads your program, meaning "I don't think
this match can ever fail". Secondly, if it turns out you were wrong,
Perl will tell you.
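
A minimal sketch of the "or die" idiom, assuming the string from the post above. The failing input and the eval wrapper are hypothetical additions that let the example show the failure path without aborting.

```perl
use strict;
use warnings;

my $tmp = 'abcdefgabcdefgfooabcdefgfoo';

# In list context the match returns its captures; an empty list makes
# the assignment false, so "or die" fires only when the match fails.
my ($first_foo) = $tmp =~ /(foo)/
    or die "no foo in input\n";
print "captured: $first_foo\n";

# Hypothetical failing input, wrapped in eval so the script keeps running.
my $err = '';
eval {
    my ($missing) = 'abcdefg' =~ /(foo)/
        or die "no foo in input\n";
    1;
} or $err = $@;
print "caught: $err";
```

Assigning the capture to a lexical at the same time avoids reading a stale $1 left over from an earlier successful match.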

> $tmp = substr($tmp, 0, pos($tmp));
> $tmp =~ m/.*(a).+?$/os;

> But that still got the first 'a'. Also, $tmp can be rather large so the
> substr is a bit distasteful. Is there any way to search backward from the
> current pos or something similar?

Yes, this is what \G is for - it anchors a regex at the current
pos()ition.

$_ = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';

# I assume pos()==0 initially
# Set pos() to be the end of first 'foo'
/foo/gc or die "no foo";

# Extract everything from the last 'a' before the current position
# to the current position.
/.*(a.*)\G/ or die "no a before first foo";

print "$1\n";

Brian McCauley

Feb 19, 2004, 2:06:08 PM
That well-known clown Brian McCauley <nob...@mail.com> writes:

> Don't put qualifiers on m// that you don't understand.

Advice he'd do well to follow himself :-)

> $_ = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';
> /foo/gc or die "no foo";
> /.*(a.*)\G/ or die "no a before first foo";
> print "$1\n";

The /c above does nothing.

$_ = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';
/foo/g or die "no foo";
/.*(a.*)\G/ or die "no a before first foo";
print "$1\n";

Brian McCauley

Feb 20, 2004, 3:46:08 AM
Showing a worrying trend towards insanity, Brian McCauley
<nob...@mail.com> refers to himself in the third person when he
writes:

> That well-known clown Brian McCauley <nob...@mail.com> writes:
>
> > Don't put qualifiers on m// that you don't understand.
>
> Advice he'd do well to follow himself :-)

Yeah, and like don't remove them from other people's code without
thinking either dude!

> /foo/g or die "no foo";
> /.*(a.*)\G/ or die "no a before first foo";

I suspect in the OP's problem the real target can span newlines so the
OP's use of /s is necessary in the second match.

/.*(a.*)\G/s or die "no a before first foo";
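
When both delimiters are literal strings, the same span can be cross-checked without a regex. The sketch below swaps in Perl's built-in index and rindex in place of the \G approach; it is not the method posted above. The delimiter variables and names are hypothetical, and the code assumes the start token must end before 'foo' begins.

```perl
use strict;
use warnings;

my $s = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';
my ($start_tok, $end_tok) = ('a', 'foo');   # hypothetical literal delimiters

# Offset of the first 'foo'.
my $end_pos = index($s, $end_tok);
die "no foo\n" if $end_pos < 0;

# Last start token that lies entirely before that 'foo'.
my $limit     = $end_pos - length $start_tok;
my $start_pos = $limit < 0 ? -1 : rindex($s, $start_tok, $limit);
die "no a before first foo\n" if $start_pos < 0;

# Extract from the start token through the end of the first 'foo'.
my $span = substr($s, $start_pos, $end_pos + length($end_tok) - $start_pos);
print "$span\n";
```

Since no pattern matching is involved, newlines inside the span need no special handling, and only the final span is copied out of the large string.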

Jim Gibson

Feb 20, 2004, 4:00:03 PM
In article <SLYYb.3492$AQ4.1...@newsfep2-win.server.ntli.net>, Dave
Cardwell <dcs...@ntlworld.com> wrote:

> "Greg Carlson" <gregfc...@hotmail.com> wrote:
> > I've looked through a number of books and faq's and such and haven't been
> > able to solve my regex conundrum. I need to find the first match before
> > another match. For example, with the string 'abcdefgabcdefgfooabcdefg', I
> > need to match 'foo' and the 'a' previous to but nearest 'foo' (not the one
> > at the beginning of the string). Also, there's an unknown number of
> > characters between the 'a' and the 'foo'. Any help would be greatly
> > appreciated.
> >
> > Greg Carlson
> >
> >
>
> Normally a regular expression tries to gobble up as much as it can, in this
> case it will try to match the 'a' furthest away from 'foo'.
>
> To get round this, you can do:
> /a[^a]*foo/
> which will match an 'a', any number of anything-but-a, then foo.
>
> Alternatively you can do:
> /a.*?foo/
> Here the ? makes the regexp 'not greedy'. That is, it will try to match
> across the minimum amount of characters (hence the closest 'a' to 'foo').

No, you can't! Perl will match the FIRST string that matches,
regardless of the greediness of the pattern. Try it first before
posting incorrect advice (we can't all be as infallible as Tad).

#!/usr/local/bin/perl

use strict;
use warnings;

my $s = 'cbabcdcbabcdefoopqrstuv';

if ( $s =~ /a.*?foo/ ) {
    print "matches: <$&>\n";
} else {
    print "nomatch\n";
}

Output:
matches: <abcdcbabcdefoo>
