I can't see the answer at quick glance, but will look at this tonight.
In general, if you hit an infinite loop, it's because you're repeating
something that's empty. So my initial guess would be that when you make the
commas explicit (which I didn't fully follow, so may be misunderstanding) you
are somehow ending up with an empty match for arg.
Andrew
On Mon, Apr 18, 2011 at 09:41:25AM -0700, Tyler Laing wrote:
> Hello all,
>
> I'm working on a very terse language, and I am having some issues
> getting argument lists recognized with Tokens.
>
> I converted the args example over to Tokens because thats what I'm
> using, and now it doesn't work. I am clearly missing something.
>
> Code: http://pastebin.com/i5WTyFrx
>
> Specifically, look for arg_expr.
>
> The idea as I understand it is that we match a number of arguments,
> dropping commas as we go, until we reach the end of the string(which
> too is dropped). An argument can be a list, a tuple or an item. A list
> is made up of successive args, starting with [, ending ], and
> separated by commas, which get dropped. A tuple uses () instead. An
> item can be a string, number, none, bool, or an ident.
>
> I think the issue is the commas, which get consumed and dropped by the
> tokens earlier on, and then the arg_expr can't match because there are
> no more comma separators. If I change comma from ~symbol(',') to
> symbol(',') and put in the ~ where needed, then it seems to be stuck
> in an infinite loop on a very simple input.
>
> So how to do this right?
>
> --
> You received this message because you are subscribed to the Google Groups "lepl" group.
> To post to this group, send email to le...@googlegroups.com.
> To unsubscribe from this group, send email to lepl+uns...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lepl?hl=en.
>
OK, nothing exciting here (no bugs, thankfully :o), just a few mistakes (not
that I am downplaying the difficulty in finding these - it's taken me nearly
an hour). Also, an important point (4 below) that might bite you later.
1 - First, there was no need for the parse_all in the test. I suspect you
added that in frustration, but it just complicates things so I removed it.
2 - There was a possibility of an empty match in ident. I don't know if this
was causing the loop you saw, but I wouldn't be surprised. You need to make
that [1:] at the end.
3 - (The main one) Eos() isn't a character, it's a conceptual thing
(basically, it matches the error you get when there's nothing left to read).
So don't put it in a symbol because that makes no sense in this context
(you're asking for a symbol that is empty, which will never succeed because
tokens aren't "used" unless they have something to match).
With those, it works for me. I can post the code, but I suspect that's all
you really need. The extra warning is
4 - It looks like you also want strings. The way you're heading your allchars
token will match huge chunks of input and break everything else (because with
tokens, longest wins). The simplest fix for that is to include the quotation
marks in your token (so it's better to have tokens, one with single and one
with double quotes, than to have a matcher just for the contents where you
handle the quotes separately).
Hope that helps!
Andrew
PS If you got the idea that Eos() is a character from something in the docs,
and you can remember where, please say because things did kind-of sort-of work
like that before Lepl 5 and I may not have fixed up the docs correctly.
Thanks.
On Mon, Apr 18, 2011 at 09:41:25AM -0700, Tyler Laing wrote:
That should read:
so it's better to have TWO tokens, one with single and one with double quotes,
than to have a TOKEN just for the contents where you handle the quotes
separately
Sorry, Andrew
Andrew
Apart from the issue I already mentioned with ident (did you get my emails?) I
would also add parentheses to be sure grouping was correct. See
http://www.acooke.org/lepl/faq.html#when-i-change-from-to-my-function-isn-t-called
Andrew
But anyway, that sounds like a huge moun of progress. I don't think anyone
else has described such a complete use of Lepl. Are you considering releasing
the code? It would be good to have as a test...
Cheers,
Andrew
PS Don't hold you breath for the next release. It will include a lot of work
(ie time) and I am currently working on soemthing else that doesn't have a
clear end date yet.
Unless you're getting warnings about left recursion (in which case I would
*strongly* suggest rewriting to fix that) you might find that
config.no_memoize() improves performance (see
http://acooke.org/lepl/lepl5.0.html#id12)
Andrew
It's not trivial writing fast regexps in Python itself (Lepl is actually
compiling your regexps to a state machine and executing them).
Andrew
Andrew
Anyway, looking at the grammar, one place it's happening is with expr. For
example, one of the things that expr can be is cond. And of the things that
cond can be is eq_expr, and one of the things that can be is "expr & ~EQ ...".
So, when the system tries to match an expr it can follow that series of
definitions and do a loop without matching any data. That is left recursion -
it's a problem because the loop can continue forever. So Lepl can try to
match expr, leading to cond, leading to eq_expr, leading to expr, leading to
cond, ....
The way to avoid this is to do what I did in the example you started with. If
you look there (still in your code) you have
parens += ~symbol('(') & group3 ...
group1 += ... parens
group2 += ... group1
group3 += ... group2
see how, even though things loop, they are arranged so that there *must* be at
least a "(" in each loop? That blocks the chance of spinning in one place.
Lepl will detect this and stop the spinnning for you, but it's relatively
slow. That's why I suggest rewriting this.
Does that make sense?
What I sould try doing is making a difference between expr on the left and
right of an operator. So you have lexpr and rexpr. And then only recurse
with rexpr. Also, I think you need to add ( and ) in somewhere? Basically,
copy the example you started with (as you are already doing), but in more
detail. Once you get the idea it's not so hard, but it will mean rewriting
quite a lot of rules.
Cheers,
Andrew
Here is what I think would be better (but I have not actually run it - it's
only to give the idea):
group2 = Delayed()
group3 = Delayed()
group4 = Delayed()
parens = ~symbol('(') & group4 & ~symbol(')')
group1 = parens | item | funccall
mul_ = group1 & ~symbol('*') & group2 > Mul
div_ = group1 & ~symbol('/') & group2 > Div
group2 += mul_ | div_ | group1
add_ = group2 & ~symbol('+') & group3 > Add
sub_ = group2 & ~symbol('-') & group3 > Sub
group3 += add_ | sub_ | group2
group4 += Or(
group3,
group3 & ~symbol('&') & group4 > AND,
group3 & ~symbol('?') & group4 > OR,
group3 & ~GT & group4 > GreaterThan,
group3 & ~GE & group4 > GreaterEqual,
group3 & ~LT & group4 > LessThan,
group3 & ~LE & group4 > LessEqual,
group3 & ~EQ & group4 > Eq,
group3 & ~NEQ & group4 > NEq
)
And then replace group3 by group4 in list_tuple_item etc. Also NOTE CHANGE in
"parens" from group3 to group4.
Also, "?" for "or" seems odd?!
(But this is really impressive).
I don't have a good way to break things systematically (I just stared at your
code until something started to worry me and then found the possible problem
with functions above). What I tried to do with my regexp code was write
separate program that generated random regexps, and then check that, say, 1000
of them passed. You could do the same here, but it would be a fair amount
more code...
Cheers,
Andrew
On Wed, Apr 20, 2011 at 11:00:02AM -0700, Tyler Laing wrote:
> I've now updated it to remove the left recursion issues. I also
> cleaned up the text a bit, laid it out better for readability. I
> decided to remove tuples, and only have parens in functions,
> conditionals, and math evaluations, because those are all unambiguous.
> I also made the test case results easier to read and see if they're
> working.
>
> Can you see if you can "break" the grammar?
>
> -Tyler
>
> On Apr 20, 2:35�ソスam, andrew cooke <and...@acooke.org> wrote:
> > Sorry, didn't phrase my previous email very well. �ソスWhen I said "do you know
> > what is left recursing?" I meant "do you know which rule in the grammar is
> > causing the left recursion?" - I wasn't asking if you understood the theory!
> >
> > Anyway, looking at the grammar, one place it's happening is with expr. �ソスFor
> > example, one of the things that expr can be is cond. �ソスAnd of the things that
> > cond can be is eq_expr, and one of the things that can be is "expr & ~EQ ...".
> >
> > So, when the system tries to match an expr it can follow that series of
> > definitions and do a loop without matching any data. �ソスThat is left recursion -
> > it's a problem because the loop can continue forever. �ソスSo Lepl can try to
> > match expr, leading to cond, leading to eq_expr, leading to expr, leading to
> > cond, ....
> >
> > The way to avoid this is to do what I did in the example you started with. �ソスIf
> > you look there (still in your code) you have
> >
> > �ソス parens += ~symbol('(') & group3 ...
> > �ソス group1 += ... parens
> > �ソス group2 += ... group1
> > �ソス group3 += ... group2
> >
> > see how, even though things loop, they are arranged so that there *must* be at
> > least a "(" in each loop? �ソスThat blocks the chance of spinning in one place.
> >
> > Lepl will detect this and stop the spinnning for you, but it's relatively
> > slow. �ソスThat's why I suggest rewriting this.
> >
> > Does that make sense?
> >
> > What I sould try doing is making a difference between expr on the left and
> > right of an operator. �ソスSo you have lexpr and rexpr. �ソスAnd then only recurse
> > with rexpr. �ソスAlso, I think you need to add ( and ) in somewhere? �ソスBasically,
> > copy the example you started with (as you are already doing), but in more
> > detail. �ソスOnce you get the idea it's not so hard, but it will mean rewriting
> > > On Apr 19, 2:34�ソスpm, andrew cooke <and...@acooke.org> wrote:
> > > > Do you know what's left recursing? �ソスIf you post the latest version and I'll
> > > > have a look. �ソスI don't know of a simple rule, you just have to look at how
> > read more �ソス
Sorry. Andrew
Also, since List extends list, you can just write your own Python code.
Finally, you can subclass List to put code there (like with the operator
example). This is probably what you want - make the "if" node evaluate the
condition and call the correect leaf, etc.
Andrew
> > > > On Apr 20, 2:35�am, andrew cooke <and...@acooke.org> wrote:
> > > > > Sorry, didn't phrase my previous email very well. �When I said "do you know
> > > > > what is left recursing?" I meant "do you know which rule in the grammar is
> > > > > causing the left recursion?" - I wasn't asking if you understood the theory!
> >
> > > > > Anyway, looking at the grammar, one place it's happening is with expr. �For
> > > > > example, one of the things that expr can be is cond. �And of the things that
> > > > > cond can be is eq_expr, and one of the things that can be is "expr & ~EQ ...".
> >
> > > > > So, when the system tries to match an expr it can follow that series of
> > > > > definitions and do a loop without matching any data. �That is left recursion -
> > > > > it's a problem because the loop can continue forever. �So Lepl can try to
> > > > > match expr, leading to cond, leading to eq_expr, leading to expr, leading to
> > > > > cond, ....
> >
> > > > > The way to avoid this is to do what I did in the example you started with. �If
> > > > > you look there (still in your code) you have
> >
> > > > > � parens += ~symbol('(') & group3 ...
> > > > > � group1 += ... parens
> > > > > � group2 += ... group1
> > > > > � group3 += ... group2
> >
> > > > > see how, even though things loop, they are arranged so that there *must* be at
> > > > > least a "(" in each loop? �That blocks the chance of spinning in one place.
> >
> > > > > Lepl will detect this and stop the spinnning for you, but it's relatively
> > > > > slow. �That's why I suggest rewriting this.
> >
> > > > > Does that make sense?
> >
> > > > > What I sould try doing is making a difference between expr on the left and
> > > > > right of an operator. �So you have lexpr and rexpr. �And then only recurse
> > > > > with rexpr. �Also, I think you need to add ( and ) in somewhere? �Basically,
> > > > > copy the example you started with (as you are already doing), but in more
> > > > > detail. �Once you get the idea it's not so hard, but it will mean rewriting
> > > > > > On Apr 19, 2:34�pm, andrew cooke <and...@acooke.org> wrote:
> > > > > > > Do you know what's left recursing? �If you post the latest version and I'll
> > > > > > > have a look. �I don't know of a simple rule, you just have to look at how
> > read more �
I don't know if you've ever written an interpreter before, but the simplest
approach is to have every node in the AST have an execute(namespace) method.
That takes a namespace and returns (result, namespace). The namespace binds
names to values (at it's simplest, with global variables, just a dict).
Each node's execute() does the work for that node. So Add would call the two
sub-nodes and then calculate and return the sum. While a binding node would
extend the namespace with the appropriate variable name and value. The
hardest part is getting variable scopes working correctly - you need to have
namespaces that nest, etc.
Andrew