> You want
> (token1 & (token2 | token3)[0:])
> to not match "while"?
> Tokens are longest match, so if you have a token that matches all of "while",
> it won't be possible for smaller tokens to match part of it.
> You can't prevent "wh ile" from matching the above (token1 as "wh" and token2
> as "ile", for example), but why would you want to?
> Maybe I am not understanding?
> Andrew
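The longest-match rule described in the reply above can be sketched in plain Python. This is only an illustration using the standard re module, not LEPL's lexer: the token table, the tokenize() helper, and the choice to report every label that ties for the longest lexeme are all assumptions made for the example.

```python
import re

# Hypothetical token table for illustration; not LEPL's internal representation.
TOKENS = [
    ("while", re.compile(r"while")),
    ("name", re.compile(r"[a-z]+")),
]
SPACE = re.compile(r"\s*")


def tokenize(text):
    """Split text into (labels, lexeme) pairs using longest match."""
    pos = SPACE.match(text).end()
    out = []
    while pos < len(text):
        # Try every pattern at this position and record what each one matches.
        matches = []
        for label, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m is not None:
                matches.append((label, m.group()))
        if not matches:
            raise ValueError(f"no token at offset {pos}")
        # Keep only the longest lexeme; several labels may tie for it.
        longest = max(len(lexeme) for _, lexeme in matches)
        labels = [label for label, lexeme in matches if len(lexeme) == longest]
        out.append((labels, text[pos:pos + longest]))
        # Skip any whitespace before the next token.
        pos = SPACE.match(text, pos + longest).end()
    return out
```

With this sketch, "whilex" comes out as a single name, "while" ties between the keyword and the name pattern, and "wh ile" is two names, which is the case the reply says cannot be ruled out at the token level.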
> On Mon, May 17, 2010 at 02:45:17PM -0700, jazg wrote:
> > The problem is I can only use that with a single token. What if I want
> > to apply the lookahead to an entire matcher like (token1 & (token2 |
> > token3)[0:])?
> > On May 17, 8:40 am, andrew cooke <and...@acooke.org> wrote:
> > > On Sun, May 16, 2010 at 06:51:14PM -0700, jazg wrote:
> > > > 1 - On second thought I don't only want ~Lookahead(x) because it fails
> > > > for anything that begins with x. I want to allow that, and only fail
> > > > if it matches x alone. So here's a basic example,
> > > > token1 = Token(Lower())
> > > > token2 = Token(Lower())
> > > > m = token1 & token2[1:]
> > > > Now I also want a specialized version of m that fails specifically on
> > > > "ab" but allows any other combination of letters (including things
> > > > like "abc" or "cab").
> > > OK, so what I think you are saying, from above and the rest of your email, is
> > > that you have some words, like "while" which are *keywords* and which cannot
> > > be used as variable names in your language.
> > > In that case I would do something like this:
> > > k_while = "while"
> > > k_if = "if"
> > > keywords = [k_while, k_if]
> > > t_while = Token(k_while)
> > > t_if = Token(k_if)
> > > t_variable = Token(Lower()[1:,...])(~Lookahead(Or(*keywords)) & Lower()[1:,...])
> > > In this case it doesn't matter that the t_variable Token cannot backtrack,
> > > because if someone has "while" as a word in their source, it can only be a
> > > keyword.
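The "fail only if it matches x alone" behaviour asked for earlier in the thread can be illustrated with the standard re module. This is a sketch outside LEPL, not a LEPL recipe: KEYWORDS, VARIABLE and is_variable() are hypothetical names, and variables are assumed to be runs of lowercase letters. The point is that the negative lookahead is anchored at the end of the word, so a word is rejected only when all of it is a keyword.

```python
import re

# Hypothetical keyword list, mirroring the one in the message above.
KEYWORDS = ["while", "if"]

# Negative lookahead anchored at the end of the word: reject a word only
# when the *whole* word is a keyword, not when it merely starts with one.
_alternatives = "|".join(re.escape(k) for k in KEYWORDS)
VARIABLE = re.compile(rf"(?!(?:{_alternatives})\Z)[a-z]+")


def is_variable(word):
    """True if word is all lowercase letters and not exactly a keyword."""
    return VARIABLE.fullmatch(word) is not None
```

Without the \Z anchor inside the lookahead, the same pattern would also reject "whilex" and "iffy", which is the over-eager behaviour the original question wanted to avoid.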
> > > > 2 - This is my example translated to not use tokens:
> > > > name = Lower()[:,...]
> > > > while_ = Literal("while")
> > > > lparen = Literal("(")
> > > > rparen = Literal(")")
> > > > with DroppedSpace():
> > > >     expression = Delayed()
> > > >     loop = (~while_ & expression) > "loop"
> > > >     expression += (name | (~lparen & expression & ~rparen)) > "exp"
> > > > test = loop | expression
> > > > When I tried test.parse("whilex") the result is "loop", but it should
> > > > be "exp".
> > > > I changed the order of "test" to (expression | loop) and it properly
> > > > parsed "whilex" as "exp", but "while x" fails at " x". I don't
> > > > understand why.
> > > My guess, from just quickly looking at that, is that you have problems because
> > > you have
> > > name = Lower()[:,...]
> > > instead of
> > > name = Lower()[1:,...]
> > > Andrew
> > > > 5 - I was afraid of that... I'm not parsing python but I do want
> > > > offside parsing.
> > > > On May 16, 5:22 pm, andrew cooke <and...@acooke.org> wrote:
> > > > > 1 - I really meant, an example of why you needed lookahead in tokens.
> > > > > 2 - I don't think you will have the problem you described, though, with not
> > > > > using tokens.
> > > > > For example (this is untested, so please try to ignore stupid mistakes):
> > > > > name = Word()
> > > > > with DroppedSpace():
> > > > >     while_ = "while" & name & ":"
> > > > >     assignment = name & "=" & name
> > > > > statement = while_ | assignment
> > > > > Now consider statement.parse("whilex = foo")
> > > > > First, "while" & name will match. But then ":" will fail. So then the parser
> > > > > will try assignment instead.
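The fall-back from the failed loop to the assignment can be sketched without LEPL. The helpers below (lit, seq, first_of) are hypothetical stand-ins written for this illustration, and NAME is assumed to be a run of lowercase letters rather than LEPL's Word(). Spaces are optional between parts, in the spirit of DroppedSpace matching zero or more.

```python
import re

SPACES = re.compile(r" *")      # zero or more spaces between parts
NAME = re.compile(r"[a-z]+")    # assumed stand-in for Word()


def lit(text):
    """Compile a pattern that matches the literal text."""
    return re.compile(re.escape(text))


def seq(*patterns):
    """Match the patterns in order, skipping spaces before each one."""
    def parse(text):
        pos, found = 0, []
        for pattern in patterns:
            pos = SPACES.match(text, pos).end()
            m = pattern.match(text, pos)
            if m is None:
                return None     # this alternative fails
            found.append(m.group())
            pos = m.end()
        # Require the whole input to be consumed.
        return found if pos == len(text) else None
    return parse


def first_of(*parsers):
    """Try each alternative in turn; fall back to the next on failure."""
    def parse(text):
        for parser in parsers:
            result = parser(text)
            if result is not None:
                return result
        return None
    return parse


while_ = seq(lit("while"), NAME, lit(":"))
assignment = seq(NAME, lit("="), NAME)
statement = first_of(while_, assignment)
```

For "whilex = foo" the loop alternative matches "while" and "x" but then finds "=" where it needs ":", so first_of moves on and the assignment alternative consumes the whole input.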
> > > > > 3 - If you want to force spaces, use:
> > > > > with Separator(Drop(Space()[1:])):
> > > > > (there's a couple of errors in the docs - one is that DroppedSpace isn't in
> > > > > the index; the other is that it says that it matches one or more spaces when,
> > > > > as you have seen, it matches zero or more)
> > > > > 4 - This (requiring spaces) gets complicated when you have optional values
> > > > > separated by spaces (because if the optional thing is missing, you can
> > > > > still end up requiring a space on either side of "nothing"). SmartSeparator1()
> > > > > and SmartSeparator2() try to address this - see http://www.acooke.org/lepl/operators.html#index-97
> > > > > 5 - However, getting offside parsing to work (if you are trying to parse
> > > > > Python) without the lexer is going to be "interesting". If you want offside
> > > > > parsing, finding a solution with tokens is important.
> > > > > Hope that helps - handling spaces is complex, and there are many different
> > > > > options. Personally I would recommend (2) - simply going with zero or more
> > > > > spaces and relying on the parser backtracking.
> > > > > Andrew
> > > > > On Sun, May 16, 2010 at 11:22:26AM -0700, jazg wrote:
> > > > > > Well, here is one reason I wanted to use tokens instead of separators.
> > > > > > For example, "while x" in python is a loop, but "whilex" is not parsed
> > > > > > as a loop because it's a valid variable name. Whereas "while(x)" can
> > > > > > only be interpreted as a loop because "(" isn't allowed in the middle
> > > > > > of a name and "(x)" is an expression.
> > > > > > I can implement this with tokens:
> > > > > > name = Token("[a-z]+")
> > > > > > while_ = Token("while")
> > > > > > lparen = Token("\\(")
> > > > > > rparen = Token("\\)")
> > > > > > expression = Delayed()
> > > > > > loop = (~while_ & expression) > "loop"
> > > > > > expression += (name | (~lparen & expression & ~rparen)) > "exp"
> > > > > > In this case everything works: "whilex" is parsed as a single
> > > > > > expression, and both "while x" and "while(x)" are parsed as loops.
> > > > > > If I try the same thing with regular matchers and DroppedSpace, I end
> > > > > > up with "whilex" being considered a loop instead of a name. I can't
> > > > > > figure out a nice way to get the same results as I do with tokens. Is
> > > > > > there something simple I'm overlooking?
> > > > > > On May 16, 7:55 am, andrew cooke <and...@acooke.org> wrote:
> > > > > > > Nope, not with tokens. The tokenizer is currently way too simple to support
> > > > > > > lookahead.
> > > > > > > I am working on better regexp support, and that will eventually allow this (I
> > > > > > > think), but it's a long, long way from being ready.
> > > > > > > If you give more details of why you want to do this I may be able to suggest a
> > > > > > > workaround, but without more details I can't think of anything apart from
> > > > > > > using the lookahead inside the parser.
> > > > > > > Also, remember that backtracking doesn't work for tokens, so putting Lookahead
> > > > > > > inside the token won't work - you will make it fail, but then have no
> > > > > > > alternative.
> > > > > > > In short, the tokenizer can only be used where the grammar is simple enough for
> > > > > > > it to work. That's why it's optional. If it can't be used then don't use it
> > > > > > > - use something like DroppedSpace() instead (it won't make much difference to
> > > > > > > the complexity of the grammar).
> > > > > > > Sorry,
> > > > > > > Andrew
> > > > > > > On Sat, May 15, 2010 at 08:42:00PM -0700, jazg wrote:
> > > > > > > > I'd like to do this:
> > > > > > > > ~Lookahead(x) & m
> > > > > > > > where "m" is a matcher made up of tokens. This isn't allowed though.
> > > > > > > > Is there an easy way to get the same effect?
> > > > > > > > The only thing I can think of is having special versions of each token
> > > > > > > > like (token(~Lookahead(x) & Any()[:])) and a modified "m" that uses
> > > > > > > > the specialized versions of the tokens, but that sounds tedious and
> > > > > > > > confusing.
> > > > > > > > --
> > > > > > > > You received this message because you are subscribed to the Google Groups "lepl" group.
> > > > > > > > To post to this group, send email to lepl@googlegroups.com.
> > > > > > > > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com.
> > > > > > > > For more options, visit this group at http://groups.google.com/group/lepl?hl=en.