Re: Common error with | and ^$ in regexps


Juerd

Feb 7, 2005, 7:21:00 AM
to Nicholas Clark, perl6-l...@perl.org
Nicholas Clark wrote on 2005-02-07 12:10 (+0000):
> Will the relative precedence of grouping versus anchors for beginning and
> end of line remain the same in Perl6 rules?

There currently is no such thing as precedence in regexes. Changing this
would make understanding regexes a lot harder, I think.

And now that (?:) is called [], I think teaching how to just do the
right thing is easy enough.

/^(?:foo|bar|baz)$/

/^ [ foo | bar | baz ] $/

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Nicholas Clark

Feb 7, 2005, 7:10:13 AM
to perl6-l...@perl.org
Will the relative precedence of grouping versus anchors for beginning and
end of line remain the same in Perl6 rules?

The error of writing

/^(?:free|net|open)bsd|bsdos|interix$/

when you mean

/^(?:(?:free|net|open)bsd|bsdos|interix)$/

is rather too easy to make. This is not the first time I've seen this sort
of error, and I think I've made it myself at least once.

My gut feeling is that the need to write expressions that behave as:

/(?:^(?:free|net|open)bsd)|bsdos|(?:interix$)/

is actually very rare.
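
A minimal Perl 5 sketch of the difference between the first two patterns above. The OS names other than freebsd and interix are made up purely to expose the stray matches, and the helper names is_bsd_buggy and is_bsd_fixed are hypothetical:

```perl
use strict;
use warnings;

# The pattern as written in the patch: ^ applies only to the first
# alternative and $ only to the last one.
my $buggy = qr/^(?:free|net|open)bsd|bsdos|interix$/;

# The intended pattern: both anchors apply to the whole alternation.
my $fixed = qr/^(?:(?:free|net|open)bsd|bsdos|interix)$/;

sub is_bsd_buggy { return $_[0] =~ $buggy ? 1 : 0 }
sub is_bsd_fixed { return $_[0] =~ $fixed ? 1 : 0 }

# Hypothetical inputs: the last three are not real $^O values.
for my $os (qw(freebsd interix freebsd-like my-bsdos-fork notinterix)) {
    printf "%-14s buggy=%d fixed=%d\n",
        $os, is_bsd_buggy($os), is_bsd_fixed($os);
}
```

Both patterns accept freebsd and interix. Only the unparenthesized one also accepts the three made-up names, because each of them satisfies one alternative without satisfying both anchors.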

Nicholas Clark

----- Forwarded message from Michael G Schwern <sch...@pobox.com> -----

Date: Mon, 7 Feb 2005 05:35:00 -0500
From: Michael G Schwern <sch...@pobox.com>
To: Rafael Garcia-Suarez <rgarci...@mandrakesoft.com>
Cc: make...@perl.org
Subject: Re: Where were we at?
In-Reply-To: <slrncvk15b.1cp...@grubert.mandrakesoft.com>
User-Agent: Mutt/1.4i

On Fri, Jan 28, 2005 at 09:23:55AM -0000, Rafael Garcia-Suarez wrote:
> Please don't forget to integrate this platform-specific patch from
> bleadperl :
>
> Change 23849 by rgs@grubert on 2005/01/21 15:26:10
>
> Subject: [perl #33892] Add Interix support
> From: Todd Vierling (via RT) <perlbug-...@perl.org>
> Date: 21 Jan 2005 14:36:31 -0000
> Message-ID: <rt-3.0.11-33892-106...@perl.org>

I think the patch is wrong, or adding more wrongness.

-$Is_BSD = $^O =~ /^(?:free|net|open)bsd|bsdos$/;
+$Is_BSD = $^O =~ /^(?:free|net|open)bsd|bsdos|interix$/;

That second pair of bsd|bsdos isn't enclosed in parens. Furthermore,
it's not "freeinterix", it's just "interix".

Here's the right line.

$Is_BSD = $^O =~ /^(?:free|net|open)bsd$/ or
$^O eq 'bsdos' or $^O eq 'interix';
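
One thing to watch in the line above: in Perl 5, "=" binds tighter than "or", so as posted only the regex result is assigned and the two eq tests are evaluated and thrown away. A minimal sketch of the difference, using a parameter $os in place of $^O so it runs on any platform (the sub names are hypothetical, and the parenthesized form is just one possible fix):

```perl
use strict;
use warnings;

# As posted: parsed as ($is_bsd = REGEX_RESULT) or ... or ...,
# so the eq tests never reach $is_bsd.
sub is_bsd_as_posted {
    my ($os) = @_;
    my $is_bsd;
    {
        no warnings 'void';    # the discarded eq test would warn
        $is_bsd = $os =~ /^(?:free|net|open)bsd$/ or
                  $os eq 'bsdos' or $os eq 'interix';
    }
    return $is_bsd ? 1 : 0;
}

# Parenthesized: the whole "or" chain is assigned.
sub is_bsd_parenthesized {
    my ($os) = @_;
    my $is_bsd = ($os =~ /^(?:free|net|open)bsd$/ or
                  $os eq 'bsdos' or $os eq 'interix');
    return $is_bsd ? 1 : 0;
}

for my $os (qw(freebsd bsdos interix linux)) {
    printf "%-8s as-posted=%d parenthesized=%d\n",
        $os, is_bsd_as_posted($os), is_bsd_parenthesized($os);
}
```

The two versions agree on freebsd and linux. They differ on bsdos and interix, which only the parenthesized version reports as BSD. Using || instead of "or" would also work, since || binds tighter than "=".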


----- End forwarded message -----

Patrick R. Michaud

Feb 7, 2005, 9:55:45 AM
to Juerd, Nicholas Clark, perl6-l...@perl.org
On Mon, Feb 07, 2005 at 01:21:00PM +0100, Juerd wrote:
> Nicholas Clark wrote on 2005-02-07 12:10 (+0000):
> > Will the relative precedence of grouping versus anchors for beginning and
> > end of line remain the same in Perl6 rules?
>
> There currently is no such thing as precedence in regexes. Changing this
> would make understanding regexes a lot harder, I think.

A clarification:

P6 rules currently do have precedence. Alternation is one of the
looser bindings. I suspect the original question is really asking
about the "relative precedence of [alternatives] versus anchors for
beginning and end of [string]", since the question arises even in the
absence of (?:...) constructs.

Here's a brief stab at a p6 rule expression "precedence table", at least
for what I've been working with in the grammar engine:

    terms           a . \s \b ^ $ ^^ $$ (...) [...] <...> :: :::
    quantifiers     * + ? *? +? ?? **{...} **{...}?
    backtracking    :
    concatenation
    conjunctive     &
    alternative     |
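
Three of these levels (quantifiers, concatenation, alternation) already exist in the same order in Perl 5 regexes, so the ordering can be checked there. A minimal sketch in Perl 5 syntax, not P6 rules; the helper matched_text is hypothetical and the test strings are arbitrary:

```perl
use strict;
use warnings;

# Returns the text matched by $re in $str, or undef if there is no match.
sub matched_text {
    my ($re, $str) = @_;
    return undef unless $str =~ $re;
    return substr($str, $-[0], $+[0] - $-[0]);
}

# Quantifiers bind tighter than concatenation:
# in /ab+/ the + applies to "b" alone, not to "ab".
printf "%-12s on abbb matches '%s'\n", '/ab+/',     matched_text(qr/ab+/,     'abbb');
printf "%-12s on abbb matches '%s'\n", '/(?:ab)+/', matched_text(qr/(?:ab)+/, 'abbb');

# Concatenation binds tighter than alternation:
# /ab|cd/ means "ab" or "cd", not "a", then "b" or "c", then "d".
printf "%-12s on acd  matches '%s'\n", '/ab|cd/',     matched_text(qr/ab|cd/,     'acd');
printf "%-12s on acd  matches '%s'\n", '/a(?:b|c)d/', matched_text(qr/a(?:b|c)d/, 'acd');
```

Anchors sit at the "terms" level in the table, which is why ^ and $ attach to a single alternative rather than to the whole alternation.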

Thus Nicholas' question is really asking if we can give ^ and $ a
looser binding than alternatives, such that

/ ^ abc | def | ghi $ /

binds as

/ ^ [ abc | def | ghi ] $/

and not

/ [^abc] | def | [ghi$] /

It's certainly technically possible to do this, but I'd then wonder what
to do about ^^ and $$, and if it would then be more confusing that
^ and $ (and possibly ^^ and $$) bind much more loosely than the
other assertions. Personally, I'll let you guys hash out those things
and then set the grammar engine to match. :-)

Pm
