you need to use [...] to join together the fragments, but you only want to joint together the text, not the comments, so need to break things open a little.
> you need to use [...] to join together the fragments, but you only want to joint together the text, not the comments, so need to break things open a little.
is it slow parsing the tiny test case, or slow parsing some larger file? if the latter, can you show me the file that it is having problems with?
more generally, it could be that it contains an error (in which case lepl is backtracking like crazy because it is stuck) or some problem with the grammar that i can't see, but which might be fixed by appropriate use of First().
On Sat, Nov 12, 2011 at 11:19:11PM +0100, Jelle Herold wrote: > Hi Andrew,
> Thanks for answering!
> On Nov 11, 2011, at 1:47 AM, andrew cooke wrote: > > you need to use [...] to join together the fragments, but you only want to joint together the text, not the comments, so need to break things open a little.
> There is a test case for the nested comments: python coqproc.py tests/minimal.v
> Do you see a way to do this?
> Thanks, > Jelle.
> -- > You received this message because you are subscribed to the Google Groups "lepl" group. > To post to this group, send email to lepl@googlegroups.com. > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/lepl?hl=en.
> > > you need to use [...] to join together the fragments, but you > >
> only want to joint together the text, not the comments, so need > > > to break things open a little. > > > > > > instead of nested_comment = Delayed() contents = (nested_comment > > > | ~Lookahead(both) & Any())[:] comment = start & contents & end > > > > Comment nested_comment += comment > > > > > > try something more like: nested_comment = Delayed() word = > > > (~Lookahead(both) & Any())[:,...] contents = (nested_comment | > > > word)[:] comment = start & contents & end > Comment > > > nested_comment += comment > > > > > > (i haven't tried it out, but that shoulf give you the right > > > idea). > > > > This is what I tried initially but the parser then becomes very > > inefficient... to the point where I kill it before it's done > > parsing a file. >
> is it slow parsing the tiny test case, or slow parsing some larger >
file? if the latter, can you show me the file that it is having > problems with? > > more generally, it could be that it contains an error (in which case > lepl is backtracking like crazy because it is stuck) or some problem > with the grammar that i can't see, but which might be fixed by > appropriate use of First().
Well, it had problems with all test files, Here is a simplified version of a parser that uses lookahead but shows the same problem.
---- #! /usr/bin/env python
from lepl import *
start = Literal('(*') end = Literal('*)')
# collect (as many as possible) symbols other than '(*' or '*)' word = (~Lookahead(start|end) & Any()) #[:,...]
parser = (start|end|word)[:].get_parse()
print parser("foo(*bar(*baz*)meh*)teh") ----
If you uncomment the #[:,...] bit, the parser hangs with 100% cpu.
I'm probably doing something stupid here, but I don't see what...
ok, i don't know for sure, but just looking at that raises the following huge red flag:
[:] means repeat any number of times from 0 upwards.
that means that x[:] will successfully match an empty string.
that means that x[:][:] will sit for ever, repeatedly matching empty strings, happy to be making progress.
now obviously you don't have anything that obviously wrong, but if uncommenting a [:] leads to 100% cpu then it suggests that somwhere else that thing (which can be empty) is being repeated, and the system is repetedly matching the empty string.
the usual fix is to change [:] to [1:] (and perhaps that should be the default when i next release a major release).
On Tue, Feb 07, 2012 at 08:16:02PM -0600, Jelle Herold wrote: > On 11/12/2011 04:41 PM, andrew cooke wrote: > > > > you need to use [...] to join together the fragments, but you
> > only want to joint together the text, not the comments, so need > > > > to break things open a little. > > > > > > instead of > nested_comment = Delayed() contents = (nested_comment > > > | > ~Lookahead(both) & Any())[:] comment = start & contents & end > > > > > Comment nested_comment += comment > > > > > > try something more > like: nested_comment = Delayed() word = > > > (~Lookahead(both) & > Any())[:,...] contents = (nested_comment | > > > word)[:] comment = > start & contents & end > Comment > > > nested_comment += comment > > > > > > > (i haven't tried it out, but that shoulf give you the right > > > > idea). > > > > This is what I tried initially but the parser > then becomes very > > inefficient... to the point where I kill it > before it's done > > parsing a file. > > > is it slow parsing the tiny test case, or slow parsing some > >larger > > file? if the latter, can you show me the file that it is having > > problems with? > > more generally, it could be that it contains an > error (in which case > lepl is backtracking like crazy because it is > stuck) or some problem > with the grammar that i can't see, but > which might be fixed by > appropriate use of First().
> Well, it had problems with all test files, > Here is a simplified version of a parser that uses lookahead but > shows the same problem.
> ---- > #! /usr/bin/env python
> from lepl import *
> start = Literal('(*') > end = Literal('*)')
> # collect (as many as possible) symbols other than '(*' or '*)' > word = (~Lookahead(start|end) & Any()) #[:,...]
> parser = (start|end|word)[:].get_parse()
> print parser("foo(*bar(*baz*)meh*)teh") > ----
> If you uncomment the #[:,...] bit, the parser hangs with 100% cpu.
> I'm probably doing something stupid here, but I don't see what...
> Any ideas?
> Thanks again, > Jelle.
> -- > You received this message because you are subscribed to the Google Groups "lepl" group. > To post to this group, send email to lepl@googlegroups.com. > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/lepl?hl=en.
> ok, i don't know for sure, but just looking at that raises the following huge > red flag:
> [:] means repeat any number of times from 0 upwards.
> that means that x[:] will successfully match an empty string.
> that means that x[:][:] will sit for ever, repeatedly matching empty strings, > happy to be making progress.
> now obviously you don't have anything that obviously wrong, but if > uncommenting a [:] leads to 100% cpu then it suggests that somwhere else that > thing (which can be empty) is being repeated, and the system is repetedly > matching the empty string.
> the usual fix is to change [:] to [1:] (and perhaps that should be the default > when i next release a major release).
Ahh! *slap* that's it... so, the following works, (and from here I should be able to fix the rest)
---- #! /usr/bin/env python
from lepl import *
start = Literal('(*') end = Literal('*)')
# collect (as many as possible) symbols other than '(*' or '*)' word = (~Lookahead(start|end)& Any()) #[1:,...]
> ok, i don't know for sure, but just looking at that raises the following huge > red flag: > [...]
So just for the record, should anyone be wondering, here is a parser that deals with nested comments, including empty ones, and dropping the delimiters.
Cheers!
#! /usr/bin/env python
from lepl import *
class Comment(Node): pass
start = Literal('(*') end = Literal('*)')
# a word is any number of symbols not '(*' or '*)' word = (~Lookahead(start|end) & Any())[1:,...]
On Tue, Feb 07, 2012 at 10:05:53PM -0600, Jelle Herold wrote: > On 02/07/2012 02:36 PM, andrew cooke wrote: > >ok, i don't know for sure, but just looking at that raises the following huge > >red flag: > >[...]
> So just for the record, should anyone be wondering, here is a parser > that deals with nested comments, including empty ones, and dropping > the delimiters.
> Cheers!
> #! /usr/bin/env python
> from lepl import *
> class Comment(Node): pass
> start = Literal('(*') > end = Literal('*)')
> # a word is any number of symbols not '(*' or '*)' > word = (~Lookahead(start|end) & Any())[1:,...]
> -- > You received this message because you are subscribed to the Google Groups "lepl" group. > To post to this group, send email to lepl@googlegroups.com. > To unsubscribe from this group, send email to lepl+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/lepl?hl=en.