Lookahead and tokens
Date: Mon, 17 May 2010 14:35:25 -0700 (PDT)
Subject: [LEPL] Re: Lookahead and tokens
From: jazg <tazg2...@gmail.com>
To: lepl <lepl@googlegroups.com>
But Eos() means the absolute end of input, right? Consider something
like this...
var = ~Lookahead("while" & Eos()) & Lower()[1:,...]
assign = var + "=1"
var.parse("while") will fail, but assign.parse("while=1") will match
because var looks ahead and sees "=".
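
The same behaviour can be reproduced outside LEPL. What follows is only a sketch using Python's standard `re` module as an analogy, not LEPL matchers; the names `eos_style`, `boundary_style` and `name_at_start` are made up for illustration. A negative lookahead tied to the end of input rejects only a bare "while", while one that checks for "no further lowercase letter" rejects the keyword wherever it ends.

```python
import re

# Rough analogue of ~Lookahead("while" & Eos()) & Lower()[1:,...]:
# reject "while" only when it is followed by the absolute end of input.
eos_style = re.compile(r"(?!while\Z)[a-z]+")

# Alternative: reject "while" whenever the next character is not another
# lowercase letter, so the check no longer depends on what follows the name.
boundary_style = re.compile(r"(?!while(?![a-z]))[a-z]+")


def name_at_start(pattern, text):
    """Return the name matched at the start of text, or None."""
    found = pattern.match(text)
    return found.group(0) if found else None


for text in ("while", "while=1", "whilex=1"):
    print(text, name_at_start(eos_style, text), name_at_start(boundary_style, text))
```

With the end-of-input version, "while=1" still yields the name "while", which is the problem described above. The boundary version rejects it but still accepts "whilex". Whether the second form carries over directly to a LEPL Lookahead is an assumption this sketch does not test.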
On May 17, 8:56 am, andrew cooke <and...@acooke.org> wrote:
> Sorry, that should be:
>
> t_variable = Token(Lower()[1:,...])(~Lookahead(Or(*keywords) & Eos())
>                                     & Lower()[1:,...])
>
> to avoid rejecting things like "whilex"
>
> Andrew
>
> On Mon, May 17, 2010 at 08:40:21AM -0400, Andrew Cooke wrote:
> > On Sun, May 16, 2010 at 06:51:14PM -0700, jazg wrote:
> > > 1 - On second thought I don't only want ~Lookahead(x) because it fails
> > > for anything that begins with x. I want to allow that, and only fail
> > > if it matches x alone. So here's a basic example,
>
> > > token1 = Token(Lower())
> > > token2 = Token(Lower())
> > > m = token1 & token2[1:]
>
> > > Now I also want a specialized version of m that fails specifically on
> > > "ab" but allows any other combination of letters (including things
> > > like "abc" or "cab").
>
> > OK, so what I think you are saying, from above and the rest of your email, is
> > that you have some words, like "while" which are *keywords* and which cannot
> > be used as variable names in your language.
>
> > In that case I would do something like this:
>
> > k_while = "while"
> > k_if = "if"
> > keywords = [k_while, k_if]
>
> > t_while = Token(k_while)
> > t_if = Token(k_if)
> > t_variable = Token(Lower()[1:,...])(~Lookahead(Or(*keywords)) & Lower()[1:,...])
>
> > In this case it doesn't matter that the t_variable Token cannot backtrack,
> > because if someone has "while" as a word in their source, it can only be a
> > keyword.
>
> > > 2 - This is my example translated to not use tokens:
>
> > > name = Lower()[:,...]
> > > while_ = Literal("while")
> > > lparen = Literal("(")
> > > rparen = Literal(")")
> > > with DroppedSpace():
> > >     expression = Delayed()
> > >     loop = (~while_ & expression) > "loop"
> > >     expression += (name | (~lparen & expression & ~rparen)) > "exp"
> > > test = loop | expression
>
> > > When I tried test.parse("whilex") the result is "loop", but it should
> > > be "exp".
> > > I changed the order of "test" to (expression | loop) and it properly
> > > parsed "whilex" as "exp", but "while x" fails at " x". I don't
> > > understand why.
>
> > My guess, from just quickly looking at that, is that you have problems because
> > you have
>
> > name = Lower()[:,...]
>
> > instead of
>
> > name = Lower()[1:,...]
>
> > Andrew
>
> > > 5 - I was afraid of that... I'm not parsing python but I do want
> > > offside parsing.
>
> > > On May 16, 5:22 pm, andrew cooke <and...@acooke.org> wrote:
> > > > 1 - I really meant, an example of why you needed lookahead in tokens.
>
> > > > 2 - I don't think you will have the problem you described, though, with not
> > > > using tokens.
>
> > > > For example (this is untested, so please try to ignore stupid mistakes):
>
> > > >   name = Word()
> > > >   with DroppedSpace():
> > > >       while_ = "while" & name & ":"
> > > >       assignment = name & "=" & name
> > > >   statement = while_ | assignment
>
> > > > now consider statement.parse("whilex = foo")
>
> > > > First, "while" & name will match.  But then ":" will fail.  So then the parser
> > > > will try assignment instead.
>
> > > > 3 - If you want to force spaces, use:
>
> > > >   with Separator(Drop(Space()[1:])):
>
> > > > (there's a couple of errors in the docs - one is that DroppedSpace isn't in
> > > > the index; the other is that it says that it matches one or more spaces when,
> > > > as you have seen, it matches zero or more)
>
> > > > 4 - This (requiring spaces) gets complicated when you have optional values
> > > > separated by spaces (because if the optional thing is missing, you can
> > > > still end up requiring a space on either side of "nothing").  SmartSeparator1()
> > > > and SmartSeparator2() try to address this - see http://www.acooke.org/lepl/operators.html#index-97
>
> > > > 5 - However, getting offside parsing to work (if you are trying to parse
> > > > Python) without the lexer is going to be "interesting".  If you want offside
> > > > parsing, finding a solution with tokens is important.
>
> > > > Hope that helps - handling spaces is complex, and there are many different
> > > > options.  Personally I would recommend (2) - simply going with zero or more
> > > > spaces and relying on the parser backtracking.
>
> > > > Andrew
>
> > > > On Sun, May 16, 2010 at 11:22:26AM -0700, jazg wrote:
> > > > > Well, here is one reason I wanted to use tokens instead of separators.
>
> > > > > For example, "while x" in python is a loop, but "whilex" is not parsed
> > > > > as a loop because it's a valid variable name. Whereas "while(x)" can
> > > > > only be interpreted as a loop because "(" isn't allowed in the middle
> > > > > of a name and "(x)" is an expression.
>
> > > > > I can implement this with tokens:
>
> > > > > name = Token("[a-z]+")
> > > > > while_ = Token("while")
> > > > > lparen = Token("\\(")
> > > > > rparen = Token("\\)")
> > > > > expression = Delayed()
> > > > > loop = (~while_ & expression) > "loop"
> > > > > expression += (name | (~lparen & expression & ~rparen)) > "exp"
>
> > > > > In this case everything works: "whilex" is parsed as a single
> > > > > expression, and both "while x" and "while(x)" are parsed as loops.
>
> > > > > If I try the same thing with regular matchers and DroppedSpace, I end
> > > > > up with "whilex" being considered a loop instead of a name. I can't
> > > > > figure out a nice way to get the same results as I do with tokens. Is
> > > > > there something simple I'm overlooking?
>
> > > > > On May 16, 7:55 am, andrew cooke <and...@acooke.org> wrote:
> > > > > > Nope, not with tokens.  The tokenizer is currently way too simple to support
> > > > > > lookahead.
>
> > > > > > I am working on better regexp support, and that will eventually allow this (I
> > > > > > think), but it's a long, long way from being ready.
>
> > > > > > If you give more details of why you want to do this I may be able to suggest a
> > > > > > workaround, but without more details I can't think of anything apart from
> > > > > > using the lookahead inside the parser.
>
> > > > > > Also, remember that backtracking doesn't work for tokens, so putting Lookahead
> > > > > > inside the token won't work - you will make it fail, but then have no
> > > > > > alternative.
>
> > > > > > In short, the tokenizer can only be used where the grammar is simple enough for
> > > > > > it to work.  That's why it's optional.  If it can't be used then don't use it
> > > > > > - use something like DroppedSpace() instead (it won't make much difference to
> > > > > > the complexity of the grammar).
>
> > > > > > Sorry,
> > > > > > Andrew
>
> > > > > > On Sat, May 15, 2010 at 08:42:00PM -0700, jazg wrote:
> > > > > > > I'd like to do this:
>
> > > > > > > ~Lookahead(x) & m
>
> > > > > > > where "m" is a matcher made up of tokens. This isn't allowed though.
> > > > > > > Is there an easy way to get the same effect?
>
> > > > > > > The only thing I can think of is having special versions of each token
> > > > > > > like (token(~Lookahead(x) & Any()[:])) and a modified "m" that uses
> > > > > > > the specialized versions of the tokens, but that sounds tedious and
> > > > > > > confusing.
>
> > > > > > > --
> > > > > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > > > > To post to this group, send email to lepl@googlegroups.com.
> > > > > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > > > > For more options, visit this group at http://groups.google.com/group/lepl?hl=en.