[svn:perl6-synopsis] r13523 - doc/trunk/design/syn

la...@cvs.perl.org

Jan 16, 2007, 2:09:43 PM
to perl6-l...@perl.org
Author: larry
Date: Tue Jan 16 11:09:42 2007
New Revision: 13523

Modified:
doc/trunk/design/syn/S05.pod

Log:
Tweak | to provide longest-token instead of short-circuit semantics.
Now use || for old short-circuit semantics!


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Jan 16 11:09:42 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <pmic...@pobox.com> and
Larry Wall <la...@wall.org>
Date: 24 Jun 2002
- Last Modified: 23 Dec 2006
+ Last Modified: 16 Jan 2007
Number: 5
- Version: 41
+ Version: 42

This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -67,6 +67,29 @@

=back

+While the syntax of C<|> does not change, the default semantics do
+change slightly. Instead of representing temporal alternation, C<|>
+now represents logical alternation with longest-token semantics.
+(You may now use C<||> to indicate the old temporal alternation. That is,
+C<|> and C<||> now work within regex syntax much the same as they
+do outside of regex syntax, where they represent junctional and
+short-circuit OR.) Every regex in Perl 6 is required to be able to
+return its list of initial constant strings (transitively including the
+initial constant strings of any initial subrule called by that regex).
+A logical alternation using C<|> then takes two or more of these lists
+and dispatches to the alternative that advertises the longest matching
+prefix, not necessarily to the alternative that comes first lexically.
+(However, in the case of a tie between alternatives, the first earlier
+alternative does take precedence.)
+
+Initial constants must take into account case sensitivity (or any other
+canonicalization primitives) and do the right thing even when propagated
+up to rules that don't have the same canonicalization. That is, they
+must continue to represent the set of matches that the lower rule would
+match. If and when the optimizer turns such a list of prefixes into,
+say, a trie, the trie must continue to have the appropriate semantics
+for the originating rule.
+
=head1 Modifiers

=over
@@ -1319,6 +1342,10 @@
put an explicit C<!> after the alternation to enable backing into
another alternative if the first pick fails.

+The C<::> also has the effect of hiding any constant string on the right
+from "longest token" processing by C<|>. Only the left side is evaluated
+for initial constancy.
+
=item *

Backtracking over a triple colon causes the current regex to fail
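
Editorial illustration of the "initial constants" paragraph in the diff above, which requires a rule's advertised prefixes to keep their own case handling when a calling rule merges them into a trie. The following is a minimal Python sketch of one possible approach, not the S05 mechanism itself: it assumes a case-insensitive subrule can advertise every upper/lower spelling of its literal. The names `case_variants`, `build_trie` and `longest_prefix` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Set


def case_variants(literal: str) -> Set[str]:
    """All upper/lower spellings of a literal, e.g. 'if' -> {'if', 'iF', 'If', 'IF'}."""
    choices = [{ch.lower(), ch.upper()} for ch in literal]
    return {"".join(combo) for combo in product(*choices)}


def build_trie(prefixes: Iterable[str]) -> Dict:
    """Nested-dict trie; the key '' marks the end of an advertised prefix."""
    root: Dict = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def longest_prefix(trie: Dict, text: str) -> Optional[str]:
    """Longest advertised prefix that `text` starts with, or None."""
    node, best = trie, None
    for i, ch in enumerate(text):
        if ch not in node:
            break
        node = node[ch]
        if "" in node:
            best = text[: i + 1]
    return best


# Hypothetical case-insensitive subrule advertising 'if', merged with a
# case-sensitive literal 'int' from the calling rule.
advertised = case_variants("if") | {"int"}
trie = build_trie(advertised)
print(longest_prefix(trie, "IF x"))   # found through the expanded variants
print(longest_prefix(trie, "INT x"))  # no match: 'int' stays case-sensitive
```

Expanding variants grows exponentially with literal length, so a real optimizer would more likely fold case inside the trie; the sketch only shows that the merged trie must still match exactly what each originating rule would match.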

Larry Wall

Jan 16, 2007, 1:41:03 PM
to perl6-l...@perl.org
Note, in case you don't read synopsis checkins: the previous checkin
majorly changes the semantics of | within regex to support required
longest-token matching semantics rather than left-to-right matching.
This is nearly on the same philosophical level as requiring the
tail-recursion optimization. It will enable us to write parsers
more consistently, and it also opens up normal regexes to better
optimization via tries and such. You can now use || for the old |
semantics, which is majorly consistent with how | and || work outside
of regexen.

Larry
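
Editorial illustration of the difference described in the note above: a rough Python simulation, not Perl 6 and not how any implementation dispatches. It assumes each alternative advertises a single literal prefix and is identified by a label; the function names `short_circuit` and `longest_token` and the labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Alternative = Tuple[str, str]  # (label, advertised literal prefix)


def short_circuit(alternatives: Sequence[Alternative], text: str) -> Optional[str]:
    """Old `|` (now `||`): try alternatives left to right, take the first match."""
    for label, prefix in alternatives:
        if text.startswith(prefix):
            return label
    return None


def longest_token(alternatives: Sequence[Alternative], text: str) -> Optional[str]:
    """New `|`: pick the alternative advertising the longest matching prefix.

    On a tie, the alternative listed earlier wins.
    """
    best_label: Optional[str] = None
    best_len = -1
    for label, prefix in alternatives:
        # Strict '>' keeps the earlier alternative when lengths are equal.
        if text.startswith(prefix) and len(prefix) > best_len:
            best_label, best_len = label, len(prefix)
    return best_label


alts = [("kw_for", "for"), ("ident", "for"), ("kw_foreach", "foreach")]
print(short_circuit(alts, "foreach $x"))  # kw_for: first listed match
print(longest_token(alts, "foreach $x"))  # kw_foreach: longest prefix
print(longest_token(alts, "for $x"))      # kw_for: tie with ident, earlier wins
```

The sketch stops once an alternative is chosen; it does not model what happens if the rest of that alternative then fails to match.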

Patrick R. Michaud

Jan 16, 2007, 3:05:44 PM
to perl6-l...@perl.org

Do we leave C<&> alone (as opposed to introducing a corresponding C<&&>
operator)? I can see arguments both ways.

Pm

Larry Wall

Jan 16, 2007, 3:16:50 PM
to perl6-l...@perl.org
On Tue, Jan 16, 2007 at 02:05:44PM -0600, Patrick R. Michaud wrote:

Good question...

I think let's go ahead and put in && as well to guarantee order.
Then & can evaluate in any order it likes, including even interleaved
if it doesn't want one branch to get too far ahead of the other,
or if it can figure out that one branch can falsify earlier or more
often than the other. Or it could make a dynamic decision which
branch to try first based on past history.

And it could compare prefix sets fore and aft to traverse with a
single trie if it wants to factor out the common prefix, I guess.
Though I suppose that could happen anyway.

But mostly I think we just do it for consistency, and to avoid a FAQ.

Larry
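
Editorial illustration of the `&&` versus `&` distinction discussed above, as a loose Python analogy rather than regex semantics. It assumes each conjunct can be modelled as a predicate on the candidate text, and it picks "most failures so far goes first" as one example of a history-based order; the names `ordered_and`, `unordered_and`, `is_alpha` and `has_no_vowels` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Check = Callable[[str], bool]


def ordered_and(checks: Sequence[Check], text: str, log: List[str]) -> bool:
    """Like `&&`: evaluate strictly left to right, stop at the first failure."""
    for check in checks:
        log.append(check.__name__)
        if not check(text):
            return False
    return True


def unordered_and(checks: Sequence[Check], text: str, log: List[str],
                  failures: Dict[str, int]) -> bool:
    """Like `&`: free to choose an order; here, the most-often-failing check first."""
    # sorted() is stable, so checks with equal failure counts keep listed order.
    for check in sorted(checks, key=lambda c: -failures.get(c.__name__, 0)):
        log.append(check.__name__)
        if not check(text):
            failures[check.__name__] = failures.get(check.__name__, 0) + 1
            return False
    return True


def is_alpha(text: str) -> bool:
    return text.isalpha()


def has_no_vowels(text: str) -> bool:
    return not any(ch in "aeiou" for ch in text)


history: Dict[str, int] = {}
first_log: List[str] = []
second_log: List[str] = []
unordered_and([is_alpha, has_no_vowels], "hello", first_log, history)
unordered_and([is_alpha, has_no_vowels], "world", second_log, history)
print(first_log)   # ['is_alpha', 'has_no_vowels']
print(second_log)  # ['has_no_vowels']: the earlier failure now runs first
```

Both functions return the same truth value for the same input; only the order of evaluation, and therefore the amount of work done, differs.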
