Nesting and +
From: wires <jelleher...@gmail.com>
Date: Thu, 10 Nov 2011 13:19:35 -0800 (PST)
Local: Thurs, Nov 10 2011 4:19 pm
Subject: Nesting and +
Hello,

I want to parse nested comments like this:

(* aba
   (* boo *)
   baz
   (*bar*)
   foo
*)

So I made the following parser:

    # nested comments
    start    = Literal('(*')
    end      = Literal('*)')
    both     = Or(start, end)

    nested_comment = Delayed()
    contents = (nested_comment | ~Lookahead(both) & Any())[:]
    comment = start & contents & end > Comment
    nested_comment += comment

Which parses, but...

Node
 `- Comment
     +- u'(*'
     +- u' '
     +- u'a'
     +- u'b'
     +- u'a'
     +- u'\n'
     +- u' '
     +- u' '
     +- u' '
     +- Comment
     |   +- u'(*'
     |   +- u' '
     |   +- u'b'
     |   +- u'o'
     |   +- u'o'
     |   +- u' '
     |   `- u'*)'
     +- u'\n'
     +- u' '
     +- u' '
     +- u' '
     +- u'b'
     +- u'a'
     +- u'z'
     +- u'\n'
     +- u' '
     +- u' '
     +- u' '
     +- Comment
     |   +- u'(*'
     |   +- u'b'
     |   +- u'a'
     |   +- u'r'
     |   `- u'*)'
     +- u'\n'
     +- u' '
     +- u' '
     +- u' '
     +- u'f'
     +- u'o'
     +- u'o'
     +- u'\n'
     `- u'*)'

...that is not what I want. What is the easiest way to join the text
together into a single string? Like this:

Node
 `- Comment
     +- u'(*'
     +- u' aba\n   '
     +- Comment
     |   +- u'(*'
     |   +- u' boo '
     |   `- u'*)'
     +- u'\n   baz\n   '
     +- Comment
     |   +- u'(*'
     |   +- u'bar'
     |   `- u'*)'
     +- u'\n   foo\n'
     `- u'*)'

Thanks a lot!


 
From: andrew cooke <and...@acooke.org>
Date: Thu, 10 Nov 2011 16:47:13 -0800 (PST)
Local: Thurs, Nov 10 2011 7:47 pm
Subject: Re: Nesting and +

hi,

you need to use [...] to join together the fragments, but you only want to
join together the text, not the comments, so you need to break things open a
little.

instead of

>     nested_comment = Delayed()
>     contents = (nested_comment | ~Lookahead(both) & Any())[:]
>     comment = start & contents & end > Comment
>     nested_comment += comment

try something more like:

  nested_comment = Delayed()
  word = (~Lookahead(both) & Any())[:,...]
  contents = (nested_comment | word)[:]
  comment = start & contents & end > Comment
  nested_comment += comment

(i haven't tried it out, but that should give you the right idea).

andrew
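
For anyone who already has a per-character result and only wants to tidy it up afterwards, here is a rough sketch of the same idea in plain Python rather than LEPL: merge runs of adjacent characters, but leave the delimiters and the nested comments alone. The nested-list representation and the names DELIMS, is_text and join_text are invented for this illustration; the grammar-level [...] approach suggested above is a different mechanism.

```python
from itertools import groupby

# Assumption for this sketch: a comment is a plain list whose items are
# single-character strings, delimiter strings, or nested lists.
DELIMS = ('(*', '*)')

def is_text(item):
    """True for ordinary text fragments, False for delimiters and nested lists."""
    return isinstance(item, str) and item not in DELIMS

def join_text(children):
    """Merge each run of adjacent text fragments into one string."""
    joined = []
    for text_run, run in groupby(children, key=is_text):
        if text_run:
            joined.append(''.join(run))
        else:
            for item in run:
                # Recurse into nested comments; keep delimiters unchanged.
                joined.append(join_text(item) if isinstance(item, list) else item)
    return joined

tree = ['(*', ' ', 'a', 'b', 'a', '\n',
        ['(*', 'b', 'o', 'o', '*)'],
        'b', 'a', 'z', '*)']
print(join_text(tree))
# ['(*', ' aba\n', ['(*', 'boo', '*)'], 'baz', '*)']
```

Doing the join inside the grammar is usually tidier, since the parser then never builds the per-character list in the first place.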


 
From: Jelle Herold <jelleher...@gmail.com>
Date: Sat, 12 Nov 2011 23:19:11 +0100
Local: Sat, Nov 12 2011 5:19 pm
Subject: Re: [LEPL] Re: Nesting and +

Hi Andrew,

Thanks for answering!

On Nov 11, 2011, at 1:47 AM, andrew cooke wrote:

This is what I tried initially but the parser then becomes very inefficient... to the point where I kill it before it's done parsing a file.

In case it is helpful the file can be found here:
https://github.com/0x01/coqproc/blob/master/coqproc.py#L49

There is a test case for the nested comments: python coqproc.py tests/minimal.v

Do you see a way to do this?

Thanks,
Jelle.


 
From: andrew cooke <and...@acooke.org>
Date: Sat, 12 Nov 2011 19:41:31 -0300
Local: Sat, Nov 12 2011 5:41 pm
Subject: Re: [LEPL] Re: Nesting and +

is it slow parsing the tiny test case, or slow parsing some larger file?  if
the latter, can you show me the file that it is having problems with?

more generally, it could be that it contains an error (in which case lepl is
backtracking like crazy because it is stuck) or some problem with the grammar
that i can't see, but which might be fixed by appropriate use of First().

andrew


 
From: Jelle Herold <jelleher...@gmail.com>
Date: Tue, 07 Feb 2012 20:16:02 -0600
Local: Tues, Feb 7 2012 9:16 pm
Subject: Re: [LEPL] Re: Nesting and +

On 11/12/2011 04:41 PM, andrew cooke wrote:

> > > you need to use [...] to join together the fragments, but you only
> > > want to join together the text, not the comments, so need to break
> > > things open a little.
> > >
> > > instead of
> > >
> > >     nested_comment = Delayed()
> > >     contents = (nested_comment | ~Lookahead(both) & Any())[:]
> > >     comment = start & contents & end > Comment
> > >     nested_comment += comment
> > >
> > > try something more like:
> > >
> > >     nested_comment = Delayed()
> > >     word = (~Lookahead(both) & Any())[:,...]
> > >     contents = (nested_comment | word)[:]
> > >     comment = start & contents & end > Comment
> > >     nested_comment += comment
> > >
> > > (i haven't tried it out, but that should give you the right idea).
> >
> > This is what I tried initially but the parser then becomes very
> > inefficient... to the point where I kill it before it's done parsing
> > a file.
>
> is it slow parsing the tiny test case, or slow parsing some larger
> file? if the latter, can you show me the file that it is having
> problems with?
>
> more generally, it could be that it contains an error (in which case
> lepl is backtracking like crazy because it is stuck) or some problem
> with the grammar that i can't see, but which might be fixed by
> appropriate use of First().

Well, it had problems with all the test files.
Here is a simplified version of the parser that uses lookahead and shows
the same problem.

----
#! /usr/bin/env python

from lepl import *

start = Literal('(*')
end = Literal('*)')

# collect (as many as possible) symbols other than '(*' or '*)'
word = (~Lookahead(start|end) & Any()) #[:,...]

parser = (start|end|word)[:].get_parse()

print(parser("foo(*bar(*baz*)meh*)teh"))
----

If you uncomment the #[:,...] bit, the parser hangs with 100% cpu.

I'm probably doing something stupid here, but I don't see what...

Any ideas?

Thanks again,
Jelle.


 
From: andrew cooke <and...@acooke.org>
Date: Tue, 7 Feb 2012 17:36:59 -0300
Local: Tues, Feb 7 2012 3:36 pm
Subject: Re: [LEPL] Re: Nesting and +

ok, i don't know for sure, but just looking at that raises the following huge
red flag:

 [:] means repeat any number of times from 0 upwards.

that means that x[:] will successfully match an empty string.

that means that x[:][:] will sit for ever, repeatedly matching empty strings,
happy to be making progress.

now obviously you don't have anything that obviously wrong, but if
uncommenting a [:] leads to 100% cpu then it suggests that somewhere else that
thing (which can be empty) is being repeated, and the system is repeatedly
matching the empty string.

the usual fix is to change [:] to [1:] (and perhaps that should be the default
when i next release a major release).

andrew
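
The empty-match trap described above can be reproduced with a toy model in plain Python. This is only a sketch of the reasoning, not how LEPL is implemented (LEPL backtracks and is far more general); the names literal, repeat, Runaway and the max_steps safety cap are all invented for the demonstration.

```python
class Runaway(Exception):
    """Raised when the demo's iteration cap is hit."""

def literal(text):
    """Matcher: return the position after `text`, or None if it is absent."""
    def match(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return match

def repeat(matcher, minimum=0, max_steps=1000):
    """Greedy repetition. Without max_steps, an inner matcher that
    succeeds on the empty string would keep this loop running forever."""
    def match(s, pos):
        count = 0
        while True:
            nxt = matcher(s, pos)
            if nxt is None:
                break
            count += 1
            if count > max_steps:
                raise Runaway('gave up after %d iterations' % max_steps)
            pos = nxt
        return pos if count >= minimum else None
    return match

a_star = repeat(literal('a'))             # plays the role of x[:]
print(a_star('aab', 0))                   # 2: consumed 'aa'
print(a_star('b', 0))                     # 0: succeeded on the empty string

try:
    repeat(a_star)('b', 0)                # plays the role of x[:][:]
except Runaway as exc:
    print('x[:][:]', exc)

a_plus = repeat(literal('a'), minimum=1)  # plays the role of x[1:]
print(repeat(a_plus)('b', 0))             # 0: inner fails, outer stops at once
```

The inner repetition with minimum=1 fails instead of matching nothing, so the outer loop has a reason to stop.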


 
From: Jelle Herold <jelleher...@gmail.com>
Date: Tue, 07 Feb 2012 20:48:47 -0600
Local: Tues, Feb 7 2012 9:48 pm
Subject: Re: [LEPL] Re: Nesting and +
On 02/07/2012 02:36 PM, andrew cooke wrote:

Ahh! *slap* that's it... so, the following works,
(and from here I should be able to fix the rest)

----
#! /usr/bin/env python

from lepl import *

start = Literal('(*')
end = Literal('*)')

# collect (as many as possible) symbols other than '(*' or '*)'
word = (~Lookahead(start|end) & Any()) #[1:,...]

parser = (start|end|word)[1:].get_parse()

print(parser("foo(*bar(*baz*)meh*)teh"))
----

Thanks Andrew!

Jelle.
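
As a cross-check on what the simplified test should produce once the text is joined, the same token stream can be sketched with the standard re module instead of LEPL. The negative lookahead followed by "any character" mirrors ~Lookahead(start|end) & Any(), and the + plays the role of [1:]. TOKEN and tokenize are names made up for this example.

```python
import re

# Either delimiter, or one or more characters that do not begin a delimiter.
TOKEN = re.compile(r'\(\*|\*\)|(?:(?!\(\*|\*\)).)+', re.DOTALL)

def tokenize(text):
    """Split text into '(*', '*)' and the runs of text between them."""
    return TOKEN.findall(text)

print(tokenize("foo(*bar(*baz*)meh*)teh"))
# ['foo', '(*', 'bar', '(*', 'baz', '*)', 'meh', '*)', 'teh']
```

This only tokenizes; it does not check that the delimiters are balanced.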


 
From: Jelle Herold <jelleher...@gmail.com>
Date: Tue, 07 Feb 2012 22:05:53 -0600
Local: Tues, Feb 7 2012 11:05 pm
Subject: Re: [LEPL] Re: Nesting and +
On 02/07/2012 02:36 PM, andrew cooke wrote:

> ok, i don't know for sure, but just looking at that raises the following huge
> red flag:
> [...]

So just for the record, should anyone be wondering, here is a parser
that deals with nested comments, including empty ones, and dropping the
delimiters.

Cheers!

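#! /usr/bin/env python

from lepl import *

# tree node that wraps the contents of each comment
class Comment(Node): pass

# comment delimiters
start = Literal('(*')
end = Literal('*)')

# a word is one or more characters outside the delimiters '(*' and '*)',
# joined into a single string by [1:,...]
word = (~Lookahead(start|end) & Any())[1:,...]

# comments can contain comments, so the rule is declared before it is defined
nested_comment = Delayed()
contents = (nested_comment | word)[1:]
# ~start and ~end drop the delimiters; Empty() allows the empty comment (**)
comment = ~start & (contents|Empty()) & ~end > Comment
nested_comment += comment

parser = nested_comment[1:].get_parse()

print(parser("(*bar(*foo(*toz*)meh*)*)")[0])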
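
For comparison, here is a hand-written recursive-descent sketch in plain Python that aims at the same shape of result (nested comments, text joined, delimiters dropped). It is not LEPL and returns plain nested lists rather than Comment nodes; parse_comment and parse_comments are hypothetical names, and the error handling is an assumption of this sketch.

```python
START, END = '(*', '*)'

def parse_comment(text, pos=0):
    """Parse one comment starting at pos; return (children, next_pos)."""
    if not text.startswith(START, pos):
        raise ValueError('expected %r at offset %d' % (START, pos))
    pos += len(START)
    children = []
    word_start = pos
    while pos < len(text):
        if text.startswith(START, pos):
            if pos > word_start:               # flush the pending word
                children.append(text[word_start:pos])
            nested, pos = parse_comment(text, pos)
            children.append(nested)
            word_start = pos
        elif text.startswith(END, pos):
            if pos > word_start:               # flush the pending word
                children.append(text[word_start:pos])
            return children, pos + len(END)
        else:
            pos += 1                           # ordinary character
    raise ValueError('unterminated comment')

def parse_comments(text):
    """Parse one or more top-level comments that fill the whole text."""
    comments = []
    pos = 0
    while pos < len(text):
        comment, pos = parse_comment(text, pos)
        comments.append(comment)
    if not comments:
        raise ValueError('expected at least one comment')
    return comments

print(parse_comments("(*bar(*foo(*toz*)meh*)*)")[0])
# ['bar', ['foo', ['toz'], 'meh']]
```

An empty comment such as (**) comes back as an empty list, which corresponds to the Empty() alternative in the grammar above.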


 
From: andrew cooke <and...@acooke.org>
Date: Tue, 7 Feb 2012 20:09:31 -0300
Local: Tues, Feb 7 2012 6:09 pm
Subject: Re: [LEPL] Re: Nesting and +

thanks + glad it works, andrew


 