From: jazg <tazg2...@gmail.com>
Date: Mon, 17 May 2010 16:06:13 -0700 (PDT)
Local: Mon, May 17 2010 7:06 pm
Subject: [LEPL] Re: Lookahead and tokens
It's more like this:

a = Token(Any("abcd"))
b = Token(Any("efgh"))
both = a + b

Now I want "both" to normally match each combination, but also have a
special case where "ae" is disallowed.

But now that you reminded me that this would also match a and b with a
space in between, I think I'm doing this completely wrong. I have
probably made a lot of stupid mistakes because I'm converting code
that originally didn't use tokens, instead of designing it with tokens
from the start.

Looking at everything you posted, I think I have a good solution now:

a = Any("abcd")
b = Any("efgh")
t_a = Token(a)
t_b = Token(b)
t_both = Token(a + b)
both_no_ae = t_both(~Lookahead("ae" & Eos()) & Any()[:,...])

I will try to apply this to my real code and if I still have problems
I will show a more specific example of what I'm doing.
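
For comparison, the same rule can be sketched with Python's standard `re` module instead of LEPL. This is only an illustration of the idea (a negative lookahead that is anchored at the end of the input, so it rejects "ae" alone), not LEPL code; the names `both_no_ae` and `is_allowed` are made up for the example.

```python
import re

# Stand-in for the specialised token: one letter from "abcd" followed by
# one letter from "efgh", but never exactly "ae".
# (?!ae$) is a negative lookahead that only fires when "ae" ends the input.
both_no_ae = re.compile(r"(?!ae$)[abcd][efgh]")

def is_allowed(text):
    """Return True when the whole text is an allowed two-letter pair."""
    return both_no_ae.fullmatch(text) is not None

for sample in ("af", "be", "ae", "a e"):
    print(repr(sample), is_allowed(sample))
```

Because the pattern is matched against the whole text, a pair with a space in between is rejected as well, which is the behaviour a single token over `a + b` is meant to give.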

On May 17, 5:49 pm, andrew cooke <and...@acooke.org> wrote:

> You want
>   (token1 & (token2 | token3)[0:])
> to not match "while"?

> Tokens are longest match, so if you have a token that matches all of "while",
> it won't be possible for smaller tokens to match part of it.

> You can't avoid "wh ile" from matching the above (token1 as "wh" and token2 as
> "ile", for example), but why would you want to?
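
A rough sketch of longest-match tokenizing, written with Python's standard `re` module rather than LEPL's lexer. The token table, the `tokenize` helper and the tie-break (the earlier entry wins when two patterns match the same length) are assumptions made for this illustration.

```python
import re

# Hypothetical token table for the illustration; not LEPL's API.
TOKEN_SPECS = [
    ("WHILE", re.compile(r"while")),
    ("NAME", re.compile(r"[a-z]+")),
]

def tokenize(text):
    """Split text into (kind, value) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # spaces only separate tokens
            continue
        best_kind, best_match = None, None
        for kind, pattern in TOKEN_SPECS:
            match = pattern.match(text, pos)
            # Strict ">" keeps the earlier entry when two lengths tie.
            if match and (best_match is None or match.end() > best_match.end()):
                best_kind, best_match = kind, match
        if best_match is None:
            raise ValueError("no token matches at position %d" % pos)
        tokens.append((best_kind, best_match.group()))
        pos = best_match.end()
    return tokens

for sample in ("while", "whilex", "wh ile", "while x"):
    print(repr(sample), tokenize(sample))
```

In this sketch "whilex" comes out as a single NAME because the longer match wins, while "wh ile" is simply two NAME tokens.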

> Maybe I am not understanding?

> Andrew

> On Mon, May 17, 2010 at 02:45:17PM -0700, jazg wrote:
> > The problem is I can only use that with a single token. What if I want
> > to apply the lookahead to an entire matcher like (token1 & (token2 |
> > token3)[0:])?

> > On May 17, 8:40 am, andrew cooke <and...@acooke.org> wrote:
> > > On Sun, May 16, 2010 at 06:51:14PM -0700, jazg wrote:
> > > > 1 - On second thought I don't only want ~Lookahead(x) because it fails
> > > > for anything that begins with x. I want to allow that, and only fail
> > > > if it matches x alone. So here's a basic example,

> > > > token1 = Token(Lower())
> > > > token2 = Token(Lower())
> > > > m = token1 & token2[1:]

> > > > Now I also want a specialized version of m that fails specifically on
> > > > "ab" but allows any other combination of letters (including things
> > > > like "abc" or "cab").

> > > OK, so what I think you are saying, from above and the rest of your email, is
> > > that you have some words, like "while" which are *keywords* and which cannot
> > > be used as variable names in your language.

> > > In that case I would do something like this:

> > > k_while = "while"
> > > k_if = "if"
> > > keywords = [k_while, k_if]

> > > t_while = Token(k_while)
> > > t_if = Token(k_if)
> > > t_variable = Token(Lower()[1:,...])(~Lookahead(Or(*keywords)) & Lower()[1:,...])

> > > In this case it doesn't matter that the t_variable Token cannot backtrack,
> > > because if someone has "while" as a word in their source, it can only be a
> > > keyword.
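
Outside LEPL, the same policy (a whole word that equals a keyword can only be a keyword) is often sketched as a set lookup once the full word has been matched. The snippet below uses Python's standard `re` module; `KEYWORDS`, `WORD` and `classify` are hypothetical names for this illustration, not part of LEPL.

```python
import re

# Hypothetical keyword list, mirroring the two keywords in the example above.
KEYWORDS = {"while", "if"}
WORD = re.compile(r"[a-z]+")

def classify(word):
    """Label a complete lower-case word as 'keyword' or 'variable'."""
    if WORD.fullmatch(word) is None:
        raise ValueError("not a lower-case word: %r" % word)
    # The decision is made on the whole word, so "whilex" stays a variable.
    return "keyword" if word in KEYWORDS else "variable"

for sample in ("while", "if", "whilex", "x"):
    print(sample, classify(sample))
```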

> > > > 2 - This is my example translated to not use tokens:

> > > > name = Lower()[:,...]
> > > > while_ = Literal("while")
> > > > lparen = Literal("(")
> > > > rparen = Literal(")")
> > > > with DroppedSpace():
> > > >     expression = Delayed()
> > > >     loop = (~while_ & expression) > "loop"
> > > >     expression += (name | (~lparen & expression & ~rparen)) > "exp"
> > > > test = loop | expression

> > > > When I tried test.parse("whilex") the result is "loop", but it should
> > > > be "exp".
> > > > I changed the order of "test" to (expression | loop) and it properly
> > > > parsed "whilex" as "exp", but "while x" fails at " x". I don't
> > > > understand why.

> > > My guess, from just quickly looking at that, is that you have problems because
> > > you have

> > > name = Lower()[:,...]

> > > instead of

> > > name = Lower()[1:,...]
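
The difference between the two definitions can be seen with Python's standard `re` module (used here as a stand-in, not LEPL itself): a zero-or-more pattern succeeds without consuming anything, so a `name` defined that way can match the empty string. Whether that is the actual cause of the failure described above is still only a guess; `matched_text` is a helper invented for the example.

```python
import re

zero_or_more = re.compile(r"[a-z]*")  # roughly analogous to Lower()[:,...]
one_or_more = re.compile(r"[a-z]+")   # roughly analogous to Lower()[1:,...]

def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None on failure."""
    match = pattern.match(text)
    return None if match is None else match.group()

# At the start of " x" there is no letter: the first pattern still
# "succeeds" with an empty match, while the second one fails outright.
print(repr(matched_text(zero_or_more, " x")))
print(repr(matched_text(one_or_more, " x")))
```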

> > > Andrew

> > > > 5 - I was afraid of that... I'm not parsing python but I do want
> > > > offside parsing.

> > > > On May 16, 5:22 pm, andrew cooke <and...@acooke.org> wrote:
> > > > > 1 - I really meant, an example of why you needed lookahead in tokens.

> > > > > 2 - I don't think you will have the problem you described, though, with not
> > > > > using tokens.

> > > > > For example (this is untested, so please try to ignore stupid mistakes):

> > > > >   name = Word()
> > > > >   with DroppedSpace():
> > > > > >       while_ = "while" & name & ":"
> > > > > >       assignment = name & "=" & name
> > > > > >   statement = while_ | assignment

> > > > > now consider statement.parse("whilex = foo")

> > > > > First, "while" & name will match.  But then ":" will fail.  So then the parser
> > > > > will try assignment instead.
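
The fallback described above can be sketched with Python's standard `re` module standing in for the LEPL matchers. The patterns, the `[a-z]+` definition of a name and the `parse_statement` helper are assumptions for this illustration only.

```python
import re

# Hypothetical stand-ins for the two rules; spaces are optional (\s*),
# in the spirit of DroppedSpace matching zero or more spaces.
WHILE_STMT = re.compile(r"while\s*([a-z]+)\s*:")
ASSIGNMENT = re.compile(r"([a-z]+)\s*=\s*([a-z]+)")

def parse_statement(text):
    """Try the while rule first, then fall back to assignment."""
    match = WHILE_STMT.fullmatch(text)
    if match:
        return ("while", match.group(1))
    # The while rule failed as a whole (for example at ":"), so the
    # input is re-read from the start by the next alternative.
    match = ASSIGNMENT.fullmatch(text)
    if match:
        return ("assignment", match.group(1), match.group(2))
    return None

for sample in ("whilex = foo", "while x:", "whilex:"):
    print(repr(sample), parse_statement(sample))
```

Note that with optional spaces "whilex:" is still read as a while statement in this sketch; only inputs where the first rule cannot complete fall through to the second.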

> > > > > 3 - If you want to force spaces, use:

> > > > >   with Separator(Drop(Space()[1:])):

> > > > > (there's a couple of errors in the docs - one is that DroppedSpace isn't in
> > > > > the index; the other is that it says that it matches one or more spaces when,
> > > > > as you have seen, it matches zero or more)

> > > > > 4 - This (requiring spaces) gets complicated when you have optional values
> > > > > separated by spaces (because if the optional thing is missing, you can
> > > > > > still end up requiring a space on either side of "nothing").  SmartSeparator1()
> > > > > > and SmartSeparator2() try to address this - see http://www.acooke.org/lepl/operators.html#index-97

> > > > > 5 - However, getting offside parsing to work (if you are trying to parse
> > > > > Python) without the lexer is going to be "interesting".  If you want offside
> > > > > parsing, finding a solution with tokens is important.

> > > > > Hope that helps - handling spaces is complex, and there are many different
> > > > > options.  Personally I would recommend (2) - simply going with zero or more
> > > > > spaces and relying on the parser backtracking.

> > > > > Andrew

> > > > > On Sun, May 16, 2010 at 11:22:26AM -0700, jazg wrote:
> > > > > > Well, here is one reason I wanted to use tokens instead of separators.

> > > > > > For example, "while x" in python is a loop, but "whilex" is not parsed
> > > > > > as a loop because it's a valid variable name. Whereas "while(x)" can
> > > > > > only be interpreted as a loop because "(" isn't allowed in the middle
> > > > > > of a name and "(x)" is an expression.

> > > > > > I can implement this with tokens:

> > > > > > name = Token("[a-z]+")
> > > > > > while_ = Token("while")
> > > > > > lparen = Token("\\(")
> > > > > > rparen = Token("\\)")
> > > > > > expression = Delayed()
> > > > > > loop = (~while_ & expression) > "loop"
> > > > > > expression += (name | (~lparen & expression & ~rparen)) > "exp"

> > > > > > In this case everything works: "whilex" is parsed as a single
> > > > > > expression, and both "while x" and "while(x)" are parsed as loops.

> > > > > > If I try the same thing with regular matchers and DroppedSpace, I end
> > > > > > up with "whilex" being considered a loop instead of a name. I can't
> > > > > > figure out a nice way to get the same results as I do with tokens. Is
> > > > > > there something simple I'm overlooking?

> > > > > > On May 16, 7:55 am, andrew cooke <and...@acooke.org> wrote:
> > > > > > > Nope, not with tokens.  The tokenizer is currently way too simple to support
> > > > > > > lookahead.

> > > > > > > I am working on better regexp support, and that will eventually allow this (I
> > > > > > > think), but it's a long, long way from being ready.

> > > > > > > If you give more details of why you want to do this I may be able to suggest a
> > > > > > > workaround, but without more details I can't think of anything apart from
> > > > > > > using the lookahead inside the parser.

> > > > > > > Also, remember that backtracking doesn't work for tokens, so putting Lookahead
> > > > > > > inside the token won't work - you will make it fail, but then have no
> > > > > > > alternative.

> > > > > > > > In short, the tokenizer can only be used where the grammar is simple enough for
> > > > > > > it to work.  That's why it's optional.  If it can't be used then don't use it
> > > > > > > - use something like DroppedSpace() instead (it won't make much difference to
> > > > > > > the complexity of the grammar).

> > > > > > > Sorry,
> > > > > > > Andrew

> > > > > > > On Sat, May 15, 2010 at 08:42:00PM -0700, jazg wrote:
> > > > > > > > I'd like to do this:

> > > > > > > > ~Lookahead(x) & m

> > > > > > > > where "m" is a matcher made up of tokens. This isn't allowed though.
> > > > > > > > Is there an easy way to get the same effect?

> > > > > > > > The only thing I can think of is having special versions of each token
> > > > > > > > like (token(~Lookahead(x) & Any()[:])) and a modified "m" that uses
> > > > > > > > the specialized versions of the tokens, but that sounds tedious and
> > > > > > > > confusing.

> > > > > > > > --
> > > > > > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > > > > > To post to this group, send email to lepl@googlegroups.com.
> > > > > > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > > > > > > For more options, visit this group at http://groups.google.com/group/lepl?hl=en.

> > > > > > > --
> > > > > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > > > > To post to this group, send email to lepl@googlegroups.com.
> > > > > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > > > > For more options, visit this group athttp://groups.google.com/group/lepl?hl=en.

> > > > > > --
> > > > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > > > To post to this group, send email to lepl@googlegroups.com.
> > > > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > > > For more options, visit this group athttp://groups.google.com/group/lepl?hl=en.

> > > > > --
> > > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > > To post to this group, send email to lepl@googlegroups.com.
> > > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > > For more options, visit this group athttp://groups.google.com/group/lepl?hl=en.

> > > > --
> > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > To post to this group, send email to lepl@googlegroups.com.
> > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > For more options, visit this group athttp://groups.google.com/group/lepl?hl=en.

> > > --
> > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > To post to this group, send email to lepl@googlegroups.com.
> > > To unsubscribe from this group, send email...

> read more »


 