Regular expression for search first substring

Juan Garcia

Jun 27, 2015, 12:54:01 PM

Hi all,

This is probably a dumb question for any PHP developer, but I have no feel for regular expressions :)

The situation is this string:

The boy love movies [any variable string] and sport but sport is more important.

I would like to match the string "love movies...and sport", but the regular expression I have returns "love movies...and sport but sport".

Which is the correct regular expression for this?

Thanks all!

Sorry for my English.

Norman Peelman

Jun 28, 2015, 7:45:31 AM

You should supply the regex you are currently working with so that
others can help you better. You don't really say exactly what you are
after but this may get you started:

(love movies)(.+)(and sport)

love movies [any variable string] and sport

--
Norman
Registered Linux user #461062
-Have you been to www.php.net yet?-

Thomas 'PointedEars' Lahn

Jun 28, 2015, 6:05:33 PM

Norman Peelman wrote:

> On 06/27/2015 12:53 PM, Juan Garcia wrote:
>> The situation is this string:
>>
>> The boy love movies [any variable string] and sport but sport is more
>> important.
>>
>> I would like to match the string "love movies...and sport", but the
>> regular expression I have returns "love movies...and sport but sport".
>>
>> Which is the correct regular expression for this?
>
> You should supply the regex you are currently working with so that
> others can help you better. You don't really say exactly what you are
> after but this may get you started:
>
> (love movies)(.+)(and sport)
>
> love movies [any variable string] and sport

Without flags, “.+” means “any string of at least one character that does
not contain newline” in ERE/PCRE. “Any variable string” is only matched by
“.*” or “.+” with PCRE and the “s” flag, or, without that flag, by an
all-character class like “[\S\s]”.
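
A minimal PHP sketch of the newline point, assuming the variable part may span lines. The subject string and the variable names are made up for illustration:

```php
<?php
// Hypothetical subject in which the variable part contains a newline.
$subject = "The boy love movies like\nAlien and sport but sport is more important.";

// Without flags, "." does not match "\n", so this does not match.
$plain = preg_match('/love movies.+and sport/', $subject);

// With the "s" flag (PCRE_DOTALL), "." also matches "\n".
$dotall = preg_match('/love movies.+and sport/s', $subject, $m1);

// Without the flag: a character class that matches every character.
$class = preg_match('/love movies[\S\s]+and sport/', $subject, $m2);

var_dump($plain, $dotall, $class);
```

If the variable part never contains a newline, the plain “.+” is sufficient.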

The parentheses are capturing groups: they store the matches for the
subexpressions for later use, e.g. in back-references. They are a waste of
runtime and memory if you only need the overall match.
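
To illustrate what the parentheses store, here is a small sketch using preg_match() on the OP's sample string. The variable names are mine, and the “(?:...)” non-capturing group is an addition that was not mentioned in the thread:

```php
<?php
$subject = 'The boy love movies [any variable string] and sport but sport is more important.';

// Capturing groups: indexes 1 to 3 are filled in addition to index 0.
preg_match('/(love movies)(.+)(and sport)/', $subject, $withGroups);

// No groups: only the overall match at index 0 is stored.
preg_match('/love movies.+and sport/', $subject, $noGroups);

// If grouping is needed but the submatch is not, (?:...) groups without capturing.
preg_match('/love movies(?:.+)and sport/', $subject, $nonCapturing);

print_r($withGroups);
print_r($noGroups);
print_r($nonCapturing);
```

All three patterns produce the same overall match; they differ only in how many entries end up in the result array.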

The OP’s problem is probably that they used

/love movies.*sport/

or

/love movies.+sport/

and did not consider that quantifiers like “*” and “+” are greedy by
default, i.e. they match as much as possible. There are three ways to work
around that:

A) provide more context, as you indicated (by specifying the “and” as
well)

B) make the expression non-greedy:

/love movies.*?sport/

or

/love movies.+?sport/

C) (only in PCRE) use negative lookahead or (here) lookbehind to prevent
the expression from matching where certain substrings occur, here
“sport” directly preceded by “ but ”:

/love movies.+(?<! but )sport/
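
The greedy pattern and the three workarounds can be compared side by side in PHP. This is only a sketch against the OP's sample string; the array keys and variable names are made up, and the pattern for A) is my reading of “specifying the and as well”:

```php
<?php
$subject = 'The boy love movies [any variable string] and sport but sport is more important.';

$patterns = [
    'greedy'       => '/love movies.+sport/',            // the presumed original
    'A context'    => '/love movies.+and sport/',        // more context
    'B non-greedy' => '/love movies.+?sport/',           // lazy quantifier
    'C lookbehind' => '/love movies.+(?<! but )sport/',  // negative lookbehind
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // Store the overall match, or null if the pattern does not match.
    $results[$label] = preg_match($pattern, $subject, $m) ? $m[0] : null;
    echo $label, ': ', var_export($results[$label], true), "\n";
}
```

On this input, A), B) and C) all stop at the first “sport”. They can differ on other input: B) always stops at the first “sport” after “love movies”, even if that one is part of the variable string.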

--
PointedEars
Zend Certified PHP Engineer
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.