Parser trying to extend with a SyntaxNode subclass weirdness.
James Harton  
From: James Harton <jamesot...@gmail.com>
Date: Wed, 29 Feb 2012 14:46:03 +1300
Local: Tues, Feb 28 2012 8:46 pm
Subject: Parser trying to extend with a SyntaxNode subclass weirdness.

Hi.

I'm throwing together a toy language at https://github.com/jamesotron/CrimsonScript - so far it has only a simple description of nil and integer literals:

    rule literal
       integer_literal / nil_literal <Literal>
    end

    rule integer_literal
      hex_integer_literal / octal_integer_literal / binary_integer_literal / decimal_integer_literal / zero_integer_literal <IntegerLiteral>
    end

    rule nil_literal
      "nil" <NilLiteral>
    end

    rule zero_integer_literal
      '-'? '0'
    end

    rule decimal_integer_literal
      '-'? [1-9] [0-9]*
    end

    rule binary_integer_literal
      '-'? '0b' [0-1]+
    end

    rule octal_integer_literal
      '-'? '0o' [0-7]+
    end

    rule hex_integer_literal
      '-'? '0x' [0-9a-fA-F]+
    end

However when I try and parse `0` I get:

    1.9.3-p0 :001 > Crimson::Parser.parse('0')
    TypeError: wrong argument type Class (expected Module)
    from (eval):72:in `extend'
    from (eval):72:in `_nt_integer_literal'
    from (eval):24:in `_nt_literal'
    from /Users/jnh/.rvm/gems/ruby-1.9.3-p0@crimsonscript/gems/treetop-1.4.10/lib/treetop/runtime/compiled_parser.rb:18:in `parse'
    from /Users/jnh/Dev/Toys/CrimsonScript/lib/crimson/parser.rb:11:in `parse'
    from (irb):1
    from /Users/jnh/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'

I'm not sure what I'm doing wrong, but when I compile the grammar to ruby I can see that it's calling #extend on the result of  _nt_zero_integer_literal.  I'm not sure why that's happening.  Is this a bug?

--
James Harton
sociable.co.nz
@jamesotron
+64226803869


 
markus
From: markus <mar...@reality.com>
Date: Tue, 28 Feb 2012 19:02:21 -0800
Local: Tues, Feb 28 2012 10:02 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

Simple answer: IntegerLiteral, NilLiteral, etc. need to be modules, and
you've most likely defined them as classes.

More complex answers are available if needed.  :)

-- MarkusQ
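
To illustrate the simple answer: Ruby's Object#extend only accepts modules, so handing it a class raises the TypeError shown in the trace above. The following is a minimal sketch in plain Ruby that does not use Treetop at all; FakeNode, IntegerLiteralAsClass and the value method are hypothetical stand-ins, not code from CrimsonScript.

```ruby
# Hypothetical stand-in for a parse node; Treetop's real SyntaxNode is not used here.
class FakeNode
  attr_reader :text_value

  def initialize(text_value)
    @text_value = text_value
  end
end

# Defined as a class: cannot be passed to #extend.
class IntegerLiteralAsClass
end

# Defined as a module: can be mixed into an already existing node.
module IntegerLiteral
  def value
    Integer(text_value)
  end
end

node = FakeNode.new('0')

# Capture the error instead of letting it abort the script.
extend_error =
  begin
    node.extend(IntegerLiteralAsClass)
    nil
  rescue TypeError => e
    e
  end

# The module version mixes in cleanly and adds behaviour to this one node.
node.extend(IntegerLiteral)
literal_value = node.value
```

Here extend_error holds the TypeError, while the module-based version succeeds and gives the node a value method.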


 
Douglas Camata
From: Douglas Camata <d.cam...@gmail.com>
Date: Tue, 28 Feb 2012 20:37:19 -0800 (PST)
Local: Tues, Feb 28 2012 11:37 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

One thing I've learned from my limited experience with Treetop (maybe I'm
doing it the wrong way, but it worked for me): when you define a rule
that's just an alternation of other rules, like some kind of inheritance, you
must wrap the alternatives in parentheses before attaching the class. For
example, let's fix your integer_literal rule:

    rule integer_literal
      (hex_integer_literal / octal_integer_literal / binary_integer_literal / decimal_integer_literal / zero_integer_literal) <IntegerLiteral>
    end

Try it and post some feedback; as a Treetop apprentice, I'd like to share
experiences.


 
Clifford Heath
From: Clifford Heath <clifford.he...@gmail.com>
Date: Wed, 29 Feb 2012 16:47:45 +1100
Local: Wed, Feb 29 2012 12:47 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
Douglas is quite correct. The class specification for a rule binds tighter than the alternation "/" operator:

rule foo
        a <A_In_Foo> / b <B_In_Foo>
end

rule a
        something <A>
end

rule b
        something_else <B>
end

In the case where "foo" matches "something_else", the SyntaxNode for "something_else"
will have two mixed-in modules (both "B" and "B_In_Foo"). Use parentheses around a list
of alternates if you want to mix in the same module regardless of which possibility is matched.

Clifford Heath.
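
The binding rule above can be modelled in plain Ruby. This is only a loose sketch of the behaviour described, not Treetop's generated code: Node, match_a, match_b and the two foo_* methods are hypothetical names, and the module names are CamelCased versions of the ones in the example. It shows that a module written after one alternative is mixed in only when that alternative matches, while parentheses apply it to whichever alternative matched.

```ruby
module A; end
module B; end
module AInFoo; end
module BInFoo; end
module Foo; end

# Hypothetical stand-in for a syntax node.
Node = Struct.new(:text_value)

# Each sub-rule returns a node tagged with its own module, or nil on failure.
def match_a(input)
  Node.new(input).extend(A) if input == 'something'
end

def match_b(input)
  Node.new(input).extend(B) if input == 'something_else'
end

# Models: a <AInFoo> / b <BInFoo>
# Each alternative carries its own module.
def foo_per_alternative(input)
  match_a(input)&.extend(AInFoo) || match_b(input)&.extend(BInFoo)
end

# Models: (a / b) <Foo>
# The module is applied to the result of the whole group.
def foo_parenthesized(input)
  (match_a(input) || match_b(input))&.extend(Foo)
end

per_alternative = foo_per_alternative('something_else')
grouped = foo_parenthesized('something')
```

In this model per_alternative carries B and BInFoo but not AInFoo, and grouped carries both A and Foo. Written without parentheses, a trailing module would belong to the last alternative only.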



 
markus
From: markus <mar...@reality.com>
Date: Tue, 28 Feb 2012 22:03:20 -0800
Local: Wed, Feb 29 2012 1:03 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

> when you define a rule that's just an aggregation of other two, like
> some kind of inheritance, you must use a parenthesis to define the
> class for it.

Good catch; this would likely be the next snag he'd hit, once the
class/module problem is cleared up (the "0" case doesn't hit it because
it's the last alternative, but anything else would).

-- MarkusQ


 
James Harton
From: James Harton <jamesot...@gmail.com>
Date: Wed, 29 Feb 2012 21:30:37 +1300
Local: Wed, Feb 29 2012 3:30 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

Thanks, folks, for the feedback and explanation. I've switched over to modules, so at least it's stopped complaining about that.
I've also wrapped the alternatives in the integer_literal and literal rules in parens.
I'm still a little confused, because the parser still refuses to recognise any of my patterns:

1.9.3p0 :001 > Crimson::Parser.parse('12345')
Exception: Parse error: Expected one of -, 0x, 0o, 0b at line 1, column 1 (byte 1) after .
from /Users/jnh/Dev/Toys/CrimsonScript/lib/crimson/parser.rb:13:in `parse'
from (irb):1
from /Users/jnh/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'
1.9.3p0 :002 > Crimson::Parser.parse('nil')
Exception: Parse error: Expected one of -, 0x, 0o, 0b, 0 at line 1, column 1 (byte 1) after .
from /Users/jnh/Dev/Toys/CrimsonScript/lib/crimson/parser.rb:13:in `parse'
from (irb):2
from /Users/jnh/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'

--
James Harton
sociable.co.nz
@jamesotron
+64226803869


 
Mark Wilden
From: Mark Wilden <m...@mwilden.com>
Date: Wed, 29 Feb 2012 05:55:15 -0800
Local: Wed, Feb 29 2012 8:55 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

On Wed, Feb 29, 2012 at 12:30 AM, James Harton <jamesot...@gmail.com> wrote:
> I'm still a little confused about the fact that the parser still refuses to
> recognise any of my patterns:

This is just general advice: Take a step back. Can you get some
example code to work? If so, you might want to try morphing the
example into the code you want, step by step. The other approach is to
start with the barest possible grammar, get that to work, and
gradually add on to it.

What is unproductive, in my experience, is to write a bunch of code,
throw it at the system, and debug the error messages you get. Try to
start from a happy place, and change that to where you want to get to.

This isn't quite as off-topic as it might seem, as I've found that
parsing (with Citrus rather than Treetop) really requires a go-slow
approach.

///ark
Web Applications Developer
California Academy of Sciences


 
markus
From: markus <mar...@reality.com>
Date: Wed, 29 Feb 2012 07:37:32 -0800
Local: Wed, Feb 29 2012 10:37 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

When I make those modifications to a grammar consisting of the rules you
originally posted, it appears to work:

irb(main):009:0> SimpleParser.new.parse('12345')
=> SyntaxNode+Literal+IntegerLiteral+DecimalIntegerLiteral0 offset=0,
"12345":
  SyntaxNode offset=0, ""
  SyntaxNode offset=0, "1"
  SyntaxNode offset=1, "2345":
    SyntaxNode offset=1, "2"
    SyntaxNode offset=2, "3"
    SyntaxNode offset=3, "4"
    SyntaxNode offset=4, "5"

Did you perhaps change anything else since then?

-- M


 
Bernhard
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Wed, 27 Jun 2012 13:02:26 -0700 (PDT)
Local: Wed, Jun 27 2012 4:02 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

One motivation for using subclasses is the approach
in http://thingsaaronmade.com/blog/a-quick-intro-to-writing-a-parser-usi....

It lets you clean the parse tree so that it keeps only the semantically
relevant information and drops the syntactic sugar.



 
Clifford Heath
From: Clifford Heath <clifford.he...@gmail.com>
Date: Thu, 28 Jun 2012 09:00:01 +1000
Local: Wed, Jun 27 2012 7:00 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
On 28/06/2012, at 6:02 AM, Bernhard wrote:

> One motivation to use subclasses is …
> to clean the parse tree such that it only has the semantically relevant information and no longer keep the syntactical sugar.

This is a really bad idea in general. PEG parsers produce
lots of garbage: all the hidden memoized attempts at
almost every character. If you keep the parse tree, whether
it's semantically structured or not, you keep the garbage.

Sometimes a parse tree is like your semantic model, as in
the s-expressions example, but usually you will write a better
program if you design a semantic model first, then construct
that *after* parsing your input.

Clifford Heath.
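
As a sketch of the "semantic model first" approach, using the integer and nil literals from the start of this thread: the model classes below are designed on their own, and a small builder fills them from the matched text after parsing. Everything here (IntegerValue, NilValue, ParsedLiteral, build_semantic_value) is a hypothetical name chosen for illustration, and ParsedLiteral merely stands in for a parse-tree leaf so the example runs without Treetop.

```ruby
# Hypothetical semantic model, designed independently of any parse tree.
IntegerValue = Struct.new(:value)
NilValue = Class.new

# Stand-in for a parse-tree leaf: only the matched text matters here.
ParsedLiteral = Struct.new(:text_value)

# Build a semantic object from a parsed literal; the parse node can be
# discarded afterwards.
def build_semantic_value(parsed)
  text = parsed.text_value
  return NilValue.new if text == 'nil'

  # Kernel#Integer understands an optional sign and the 0x, 0o and 0b prefixes.
  IntegerValue.new(Integer(text))
end

values = %w[0 -12 0x1f 0o17 0b101].map do |source|
  build_semantic_value(ParsedLiteral.new(source)).value
end

nil_value = build_semantic_value(ParsedLiteral.new('nil'))
```

Once the semantic objects exist, nothing holds a reference to the parse nodes, so the tree and its garbage can be collected.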


 
Bernhard
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Thu, 28 Jun 2012 01:29:27 -0700 (PDT)
Local: Thurs, Jun 28 2012 4:29 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

On Thursday, June 28, 2012 1:00:01 AM UTC+2, Clifford Heath wrote:

> On 28/06/2012, at 6:02 AM, Bernhard wrote:
> > One motivation to use subclasses is …
> > to clean the parse tree such that it only has the semantically relevant
> information and no longer keep the syntactical sugar.

> This is a really bad idea in general. PEG parsers produce
> lots of garbage - all the hidden memorised attempts at
> almost every character. If you keep the parse tree, whether
> it's semantically-structured or not, you keep the garbage.

I fully agree with this. This is the reason why I want to *clean* the parse
tree as it was proposed by Aaron in
http://thingsaaronmade.com/blog/a-quick-intro-to-writing-a-parser-usi....

> Sometimes a parse tree is like your semantic model, as in
> the s-expressions example, but usually you will write a better
> program if you design a semantic model first, then construct
> that *after* parsing your input.

Yes, this is the usual approach. I am still struggling with finding an
appropriate generic strategy to transform the parse tree to my semantic
model. I would like to construct my semantic model and populate it by
retrieving the relevant information from the parse tree (sometimes called
the target driven transformation). But this is very difficult.

But querying the parse tree is also not recommended. Therefore, I don't
really know how to proceed.

One approach I used in another system was to annotate the grammar
to distinguish which parts:

   - should be represented in the result tree as DOM nodes or XML elements;
     in this case, the rule name is the name of the XML element
   - should be ignored
   - should become XML attributes; in this case, the rule name is the name of
     the XML attribute

My task is to query tons of LaTeX - Files in order to generate particular
extracts in XML.

Any suggestions are very welcome.

Bernhard


 
Clifford Heath
From: Clifford Heath <clifford.he...@gmail.com>
Date: Thu, 28 Jun 2012 18:49:06 +1000
Local: Thurs, Jun 28 2012 4:49 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
On 28/06/2012, at 6:29 PM, Bernhard wrote:

> This is the reason why I want to clean the parse tree as it was proposed by Aaron in http://thingsaaronmade.com/blog/a-quick-intro-to-writing-a-parser-usi....

It's pretty ugly digging around inside bits of the Treetop structures
assuming you know how to clean it. In fact, I think it's unlikely that
the example code frees very much garbage anyway.

> I am still struggling with finding an appropriate generic strategy to transform the parse tree to my semantic model.

The normal preferred way is to ask the parse tree to descend itself
and build whatever answer you require. Whether you use custom
classes or just methods depends on how hard that is.

> I would like to construct my semantic model and populate it by retrieving the relevant information from the parse tree (sometimes called the target driven transformation). But this is very difficult.

Right, what I said. Why is it so difficult in your case?

> My task is to query tons of LaTeX - Files in order to generate particular extracts in XML.

I think the only engine that can properly parse TeX is TeX - since
you can define new grammar rules that modify the TeX parser.
But if you assume that isn't being done… well, it's still pretty hard.
LaTeX is a real challenge. I admire you for attempting it.

Clifford Heath.


 
Bernhard
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Fri, 27 Jul 2012 13:59:55 -0700 (PDT)
Local: Fri, Jul 27 2012 4:59 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

On Thursday, 28 June 2012 10:49:06 UTC+2, Clifford Heath wrote:

The difficulty in my case is that Treetop adds nodes where I did
not expect them
and does not provide nodes where I expect them.

I switched to a simpler task: grepping some patterns out of a Markdown file.

My main objective is to find a generic approach to solving this kind of
issue.

With this grammar

grammar TraceInMarkdown

   rule top
      document
   end

   rule document
      ( (noMarkupText / trace / markupAbort)* )
   end

   rule noMarkupText
      [^\[]+
   end

   rule markupAbort
      "["
   end

   rule trace
      traceId s? traceHead s? traceBody traceUpLink
   end

   rule traceId
      "[" label "]"
   end

   rule label
      [a-zA-Z]+ "_" [a-zA-Z]+ "_" [0-9]+
   end

   rule traceHead
      '**' (!'*' . / '\*')+ '**'
   end

   rule traceBody
      "{" (nestedBody / [^{}])+ "}"
   end

   rule nestedBody
      "{" (nestedBody / [^{}])+ "}"
   end

   rule traceUpLink
      "(" (","? s? label)* ")"
   end

   rule s
      [ \t]+
   end
end

applied to
text="text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum}
[nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "

should deliver an XML file such as:

<top>
 <document>
     <noMarkupText>text 1</noMarkupText>
     <trace>
        <traceId>foo_bar_1234</traceId>
        <traceHead>** lorem ipsum header **</traceHead>
        <traceBody>{lorem <nestedBody>ipsum</nestedBody> [nothing] }</traceBody>

etc.

But, for example, the parse tree does not deliver a node for document.

SyntaxNode offset=0, "...234) text2 [ text3] ":
 SyntaxNode offset=0, "text 1 ":
   SyntaxNode offset=0, "t"
   SyntaxNode offset=1, "e"
   SyntaxNode offset=2, "x"
   SyntaxNode offset=3, "t"
   SyntaxNode offset=4, " "
   SyntaxNode offset=5, "1"
   SyntaxNode offset=6, " "
 SyntaxNode+Trace0 offset=7, "..._1234, bar_foo_1234)" (traceUpLink,traceId,traceHead,traceBody):
   SyntaxNode+TraceId0 offset=7, "[foo_bar_1234]" (label):
     SyntaxNode offset=7, "["
     SyntaxNode+Label0 offset=8, "foo_bar_1234":
       SyntaxNode offset=8, "foo":
         SyntaxNode offset=8, "f"
         SyntaxNode offset=9, "o"
         SyntaxNode offset=10, "o"
       SyntaxNode offset=11, "_"
       SyntaxNode offset=12, "bar":
         SyntaxNode offset=12, "b"
         SyntaxNode offset=13, "a"
         SyntaxNode offset=14, "r"
       SyntaxNode offset=15, "_"
       SyntaxNode offset=16, "1234":
         SyntaxNode offset=16, "1"
         SyntaxNode offset=17, "2"
         SyntaxNode offset=18, "3"
         SyntaxNode offset=19, "4"
     SyntaxNode offset=20, "]"
   SyntaxNode offset=21, " ":
     SyntaxNode offset=21, " "
   SyntaxNode+TraceHead1 offset=22, "...orem ipsum header **":

As you can see, the root of the syntax tree does not provide "document".
At the same time, there is no "trace" ...

In order to keep the business logic out of the grammar,
I tried to extend SyntaxNode with helper methods that should
provide what I want: methods such as "child", which should deliver
the child nodes as specified in the production rule.

But even when I am at a given node, I can hardly figure out
the corresponding rule.

I also tried an approach in which each rule had a method
delivering the related information. But there were
additional nodes that did not correspond to a rule
and therefore raised NoMethodError exceptions.

I could somehow hack it to make it work, but it is pure trial and error
without "knowing what I am doing".

> > My task is to query tons of LaTeX - Files in order to generate
> particular extracts in XML.

> I think the only engine that can properly parse TeX is TeX - since
> you can define new grammar rules that modify the TeX parser.
> But if you assume that isn't being done… well, it's still pretty hard.
> LaTeX is a real challenge. I admire you for attempting it.

Yes, difficult enough. But I want to grep out some stuff from LaTeX files.

I therefore do not need a full blown LaTeX-Parser.

My problem as of now is that I am struggling with Treetop _and_ with LaTeX.

So I went to the simpler task with Markdown.

Bernhard


 
markus
From: markus <mar...@reality.com>
Date: Fri, 27 Jul 2012 15:14:13 -0700
Local: Fri, Jul 27 2012 6:14 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
B --

> >> I would like to construct my semantic model and populate it by
> retrieving the relevant information from the parse tree (sometimes
> called the target driven transformation). But this is very difficult.

> > Right, what I said. Why is it so difficult in your case?

> The difficulty in my case is the fact that treetop adds nodes where I
> did not expect them and does not provide nodes where I expect them.

I'm not sure how helpful this will be, but I'll offer it in the hopes
that it at least provides some insight.

It feels as if you are trying to solve the wrong problem, and in the
process making things harder on yourself than need be.  Specifically, it
sounds as if you are trying to have treetop produce the results you want
(and thus you care about where it does/does not produce nodes) rather
than having it produce the AST and then having the AST produce the
results you want.

This is a subtle distinction, but it has powerful consequences, because
it breaks the coupling between the structure of the grammar and the
structure of the results.

I'll try to walk you to the place where I think you ought to be starting
by going through a series of not-right-but-at-least-less-wrong stages.
First, with your grammar in g.tt I write a test rig like so:

require "treetop"

text="text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum}
[nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "

p Treetop.load('g.tt').new.parse(text)

This produces the AST, as you showed in your e-mail.  But now rather
than having the AST as the output, suppose I define a property "as_xml"
on each node, with a (clearly incorrect) default implementation, and
print that as my result, like so:

require "treetop"

text="text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum}
[nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "

class Treetop::Runtime::SyntaxNode
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end
end

p Treetop.load('g.tt').new.parse(text).as_xml

Now the output is just the original input string, but it is being
produced by parsing the input, producing an AST, which we then walk,
reproducing the source.

We can recover some of the structure by adding the option of tagging the
xml like so (this isn't intended as an example of great code, just a
conceptual exercise to lead you to the easier way of thinking of
things):

class Treetop::Runtime::SyntaxNode
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end
  def wrap(tag,body)
    "<#{tag}>#{body}</#{tag}>"
  end
end

...and then mark up the grammar to use it:

   rule top
     document { def as_xml; wrap('top',super); end }
   end

   rule document
      (noMarkupText / trace / markupAbort)* { def as_xml; wrap('document',super); end }
   end

   rule noMarkupText
      [^\[]+ { def as_xml; wrap('noMarkupText',super); end }
   end

...which would give us:

"<top><document><noMarkupText>text 1 </noMarkupText>[foo_bar_1234] **
lorem ipsum header ** {lorem {ipsum} [nothing] }(foo_bar_1234,
bar_foo_1234)<noMarkupText> text2 </noMarkupText>[<noMarkupText> text3]
</noMarkupText></document></top>"

...which is starting to look at least somewhat like your goal:

> <top>
>   <document>
>       <noMarkupText>text 1</noMarkupText>
>       <trace>
>          <traceId>foo_bar_1234</traceId>
>          <traceHead>** lorem ipsum header **</traceHead>
>          <traceBody>{lorem <nestedBody>ipsum<nestedBody> [nothing] }</traceBody>

> etc.

You could get the rest of the way (up to indentation) by decorating the
rest of the grammar, and could even do indentation by fussing with the
contents.

But there's nothing saying these methods have to produce strings; you
could just as well produce a tree of objects that has the structure you
want, and have them emit themselves as xml on demand, or have them
update a dictionary, etc.  Or, going the other way (as in the obligatory
calculator example) you could have the methods produce a single number.

The key is to decouple the AST and the result by adding methods to the
syntax tree, so that the structure of one isn't bound to the structure
of the other.
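
The same decoupling works when the methods return objects rather than strings. Below is a minimal sketch in plain Ruby, under the assumption that the result should be a tree of hashes; SketchNode is a hypothetical stand-in for Treetop::Runtime::SyntaxNode so the example runs without the gem, and the two modules play the role of the blocks attached to grammar rules. The name to_model is likewise invented for illustration.

```ruby
# Hypothetical stand-in for Treetop::Runtime::SyntaxNode.
class SketchNode
  attr_reader :text_value, :elements

  def initialize(text_value, elements = nil)
    @text_value = text_value
    @elements = elements
  end

  # Default: a node contributes whatever its children contribute,
  # and a leaf without its own method contributes nothing.
  def to_model
    elements ? elements.flat_map(&:to_model) : []
  end
end

# Plays the role of a block attached to the noMarkupText rule.
module NoMarkupTextModel
  def to_model
    [{ type: :noMarkupText, text: text_value }]
  end
end

# Plays the role of a block attached to the document rule.
module DocumentModel
  def to_model
    [{ type: :document, children: super }]
  end
end

leaf = SketchNode.new('text 1 ').extend(NoMarkupTextModel)
bracket = SketchNode.new('[')
root = SketchNode.new('text 1 [', [leaf, bracket]).extend(DocumentModel)

model = root.to_model
```

The resulting model contains one document entry with a single noMarkupText child; the undecorated "[" node drops out because the default implementation returns an empty list for it. The shape of the result is set by the methods, not by where the parser happened to create nodes.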

I hope that helps.

-- MarkusQ


 
Bernhard
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Fri, 27 Jul 2012 16:02:44 -0700 (PDT)
Local: Fri, Jul 27 2012 7:02 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

Markus,

thanks a lot for your help.

On Saturday, July 28, 2012 12:14:13 AM UTC+2, Markus wrote:

> B --

> I'm not sure how helpful this will be, but I'll offer it in the hopes
> that it at least provides some insight.

It _is_ helpful. Thanks a lot.

> It feels as if you are trying to solve the wrong problem, and in the
> process making things harder on yourself than need be.  Specifically, it
> sounds as if you are trying to have treetop produce the results you want
> (and thus you care about where it does/does not produce nodes) rather
> than having it produce the AST and then having the AST produce the
> results you want.

That impression might arise, but it is not my intention to have Treetop
produce the intended results. My intention is indeed to get a clean AST as
easily as possible.

> This is a subtle distinction, but it has powerful consequences, because
> it breaks the coupling between the structure of the grammar and the
> structure of the results..

Yes, fully agree with this.

> I'll try to walk you to the place where I think you ought to be starting
> by going through a series of not-right-but-at-least-less-wrong stages.
> First, with your grammar in g.tt I write a test rig like so:

thanks.

> require "treetop"

> text="text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum}
> [nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "

> p Treetop.load('g.tt').new.parse(text)

yes this is the same as my test rig.

ok so far. I get the same and I do understand the approach.

Here we are at the problem: I annotated this single rule as you
describe, and I receive

(rdb:1) p r2.as_xml
"text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum} [nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "

There is no tag for document. The reason is that the parse tree does not have
a node here. And this is what drives me to desperation: why not? This is one
of the cases where Treetop does not behave intuitively.

>    rule noMarkupText
>       [^\[]+ { def as_xml; wrap('noMarkupText',super); end }
>    end

> ...which would give us:

> "<top><document><noMarkupText>text 1 </noMarkupText>[foo_bar_1234] **
> lorem ipsum header ** {lorem {ipsum} [nothing] }(foo_bar_1234,
> bar_foo_1234)<noMarkupText> text2 </noMarkupText>[<noMarkupText> text3]
> </noMarkupText></document></top>"

I tried it and the result is:

(rdb:1) p r2.as_xml
"<noMarkupText>text 1 </noMarkupText>[foo_bar_1234] ** lorem ipsum header
** {lorem {ipsum} [nothing] }(foo_bar_1234, bar_foo_1234)<noMarkupText>
text2 </noMarkupText>[<noMarkupText> text3] </noMarkupText>"

If it were not for the problem with the document rule, I would not be in
trouble, since then there would be a clear strategy. But now, whatever I
define in the rule "document", it will not be used.

> ...which is starting to look at least somewhat like your goal:

yes

> You could get the rest of the way (up to indentation) by decorating the
> rest of the grammar, and could even do indentation by fussing with the
> contents.

yes .. if all nodes were really delivered.

> But there's nothing saying these methods have to produce strings; you
> could just as well produce a tree of objects that has the structure you
> want, and have them emit themselves as xml on demand, or have them
> update a dictionary, etc.  Or, going the other way (as in the obligatory
> calculator example) you could have the methods produce a single number.

Ultimately I would like to push the result to Nokogiri in order to transform
and eventually serialize it to the intended output format.

> The key is to decouple the AST and the result by adding methods to the
> syntax tree, so that the structure of one isn't bound to the structure
> of the other.

Yes ... but hopefully you see my problem with Treetop along the way. BTW, I
have dealt with model transformations for a long time. This is the reason why
I started with Treetop even for these tasks.

> I hope that helps.

oh yes, thanks a lot.

-- B.


 
Clifford Heath
From: Clifford Heath <clifford.he...@gmail.com>
Date: Sat, 28 Jul 2012 09:11:12 +1000
Local: Fri, Jul 27 2012 7:11 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
On 28/07/2012, at 9:02 AM, Bernhard wrote:

> Here we are at the problem: I annotated this single rule as you
> describe. And I receive

> (rdb:1) p r2.as_xml
> "text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum} [nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "

> no tag for documentation. The reason is, that the parse tree does not have a node here. And this is
> what drives me to desperation: Why not?

You do get a node for 'document', but not for 'top', since that only contains one sub-rule.

When a rule contains only a single node, no new node is created.
Any code block associated is put into a module which is mixed in
to the existing node from the sub-rule.

If you defined "top" like this, you'd get a new node which is a sequence of two nodes:

rule top
  document ''
end

This has the null rule '' which matches without consuming any characters.
It means that 'top' is defined as a sequence, so must create a new node.
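
The difference between the two cases can be modelled in a few lines of plain Ruby. This is a loose sketch of the behaviour described above, not Treetop's actual implementation: Node, apply_single and apply_sequence are hypothetical names, and Top0 and Document0 simply mirror the module names Treetop prints in its tree dumps.

```ruby
# Hypothetical stand-in for a syntax node.
Node = Struct.new(:text_value, :elements)

module Top0; end
module Document0; end

# A rule that is just one sub-rule: reuse the sub-rule's node
# and mix the rule's module into it. No new node is created.
def apply_single(sub_node, rule_module)
  sub_node.extend(rule_module)
end

# A rule that is a sequence: build a new parent node holding the parts.
def apply_sequence(parts, rule_module)
  Node.new(parts.map(&:text_value).join, parts).extend(rule_module)
end

# Case 1: top is only "document".
document = Node.new('text 1 ', nil).extend(Document0)
merged = apply_single(document, Top0)

# Case 2: top is "document ''", a sequence of two nodes.
document2 = Node.new('text 1 ', nil).extend(Document0)
empty = Node.new('', nil)
wrapped = apply_sequence([document2, empty], Top0)
```

In the first case merged is the very same object as document and carries both modules. In the second case wrapped is a separate node that carries only Top0 and has the document node as its first element.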

Clifford Heath.


 
Bernhard
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Fri, 27 Jul 2012 16:18:49 -0700 (PDT)
Local: Fri, Jul 27 2012 7:18 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

That does not work for me.

I changed the grammar to:

    rule top
        document '' { def as_xml; wrap('top',super); end }
    end

    rule document
       ( (noMarkupText / trace / markupAbort)* )
        { def as_xml;
            wrap('document',super);
          end
        }
    end

As you can see, there is still no "document" tag.

the result is

"<top><noMarkupText>text 1 </noMarkupText>[foo_bar_1234] ** lorem ipsum
header ** {lorem {ipsum} [nothing] }(foo_bar_1234,
bar_foo_1234)<noMarkupText> text2 </noMarkupText>[<noMarkupText> text3]
</noMarkupText></top>"


 
markus  
From: markus <mar...@reality.com>
Date: Fri, 27 Jul 2012 16:30:18 -0700
Local: Fri, Jul 27 2012 7:30 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
B --

Going back to the test rig that just dumps the AST, I see:

SyntaxNode+Top0+Document0 offset=0, "...234) text2 [ text3] ":
  SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
    SyntaxNode offset=0, "t"
    SyntaxNode offset=1, "e"
    SyntaxNode offset=2, "x"

...which, as I would expect, has the document and top nodes merged;
which is to say, there is one SyntaxNode and it's getting both modules
added to it, with top being "outer" and document being "inner".

If you want, you can force there to be two nodes by adding a null string
to top (so that it isn't just a synonym for document)

   rule top
     document '' { def as_xml; wrap('top',super); end }
   end

...which should give you:

SyntaxNode+Top1+Top0 offset=0, "...234) text2 [ text3] " (document):
  SyntaxNode+Document0 offset=0, "...234) text2 [ text3] ":
    SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
      SyntaxNode offset=0, "t"
      SyntaxNode offset=1, "e"
      SyntaxNode offset=2, "x"

...but before you do that, did you try marking up both top and document
as I showed?  When I do that (without adding a '' to force top to get
its own node) I see the generated xml as I pasted it.  In ruby, super
will go up the module chain, so for instance:

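# A is last in the chain and supplies the base value.
module A
  def foo
    :foo
  end
end

# B calls super twice, reaching A#foo each time.
module B
  def foo
    [super, super]
  end
end

# C includes A and then B, so the lookup order is C, B, A.
class C
  include A
  include B
  def foo
    super.to_s.reverse
  end
end

p C.new.foo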

...produces:

"]oof: ,oof:["

...:foo from A, doubled by B, and converted to a string & reversed by C.

In case it matters, I'm using ruby 1.9.2 & Treetop v1.4.10.

-- M


 
Bernhard  
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Sat, 28 Jul 2012 06:39:46 -0700 (PDT)
Local: Sat, Jul 28 2012 9:39 am
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

On Saturday, July 28, 2012 1:30:18 AM UTC+2, Markus wrote:

> Going back to the test rig that just dumps the AST, I see:

> SyntaxNode+Top0+Document0 offset=0, "...234) text2 [ text3] ":
>   SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
>     SyntaxNode offset=0, "t"
>     SyntaxNode offset=1, "e"
>     SyntaxNode offset=2, "x"

Amazing. I do not get this. I am using Ruby 1.8.7. I tried it with 1.9.2 as
well, but there was no difference. For whatever reason I cannot make
ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7.

I created a gist for the two cases so that you can see the results:

https://gist.github.com/3192266/d1d0244b56f3770005b9026279a7e9d93f47d949  -
the one with the null strings
https://gist.github.com/3192266/2b2f2bbe36df9dbb3b211f987deddc3a4453a562  
- the one without the null strings

You can also switch between the revisions on the GitHub page and observe the
differences.

You can see that the parse tree of the one without the null strings does not
reflect the rule "document":

SyntaxNode+Top0 offset=0, "...234) text2 [ text3] ":
  SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
    SyntaxNode offset=0, "t"
    SyntaxNode offset=1, "e"
    SyntaxNode offset=2, "x"
    SyntaxNode offset=3, "t"
    SyntaxNode offset=4, " "
    SyntaxNode offset=5, "1"
    SyntaxNode offset=6, " "
  SyntaxNode+Trace0 offset=7, "..._1234, bar_foo_1234)" (traceHead,traceBody,traceUpLink,traceId):
    SyntaxNode+TraceId0 offset=7, "[foo_bar_1234]" (label):

> ...which, as I would expect, has the document and top nodes merged;
> which is to say, their is one SyntaxNode and it's getting both modules
> added to it, with top being "outer" and document being "inner".

Your advice to use "super" in a generic as_xml method would pave a
way for me. Thanks.

Did not work either :-(

I tried it with both. There was no difference.

What I do not like (but would certainly accept as long as the stuff works)
is the stereotypical block which I have to add to every node, and which does
not really add further information. If SyntaxNode provided a method to
retrieve the name of the rule, it could all be done in the generic method
as_xml.

I tried it as follows. It seems to work well, even if it is somewhat
heuristic.

    # Indicates whether a meaningful name for the node in the AST
    # is available.
    def has_rule_name?
        not (extension_modules.nil? or extension_modules.empty?)
    end

    # Returns a meaningful name for the node in the AST.
    def rule_name
        if has_rule_name?
            extension_modules.first.name.split("::").last.gsub(/[0-9]/,"")
        else
            "###"
        end
    end
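
One caveat with this heuristic: gsub(/[0-9]/,"") removes every digit, not only the numeric suffix, so a rule whose name itself contains a digit would come back altered. The sketch below shows the difference on stand-in modules. DemoGrammar, Utf8Char1, StubNode and rule_name_suffix_only are made up for illustration, and extension_modules is reimplemented here in a simplified form rather than taken from Treetop.

```ruby
module DemoGrammar
  module NoMarkupText0; end
  module Utf8Char1; end
end

class StubNode
  # Approximation: the modules mixed into this one object.
  def extension_modules
    singleton_class.included_modules - self.class.included_modules
  end

  # Last component of the first module's name, e.g. "Utf8Char1".
  def module_basename
    extension_modules.first.name.split("::").last
  end

  # Heuristic as posted: drops every digit.
  def rule_name
    extension_modules.empty? ? "###" : module_basename.gsub(/[0-9]/, "")
  end

  # Variant: drops only the trailing number.
  def rule_name_suffix_only
    extension_modules.empty? ? "###" : module_basename.sub(/\d+\z/, "")
  end
end

plain  = StubNode.new.extend(DemoGrammar::NoMarkupText0)
tricky = StubNode.new.extend(DemoGrammar::Utf8Char1)

p plain.rule_name               # "NoMarkupText"
p tricky.rule_name              # "UtfChar"
p tricky.rule_name_suffix_only  # "Utf8Char"
p StubNode.new.rule_name        # "###"
```

Going by the dumps in this thread, the module for the rule noMarkupText is named NoMarkupText0, so either variant returns the name with an upper-case first letter.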

 Bernhard


 
markus  
From: markus <mar...@reality.com>
Date: Sat, 28 Jul 2012 11:03:22 -0700
Local: Sat, Jul 28 2012 2:03 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
B --

>         SyntaxNode+Top0+Document0 offset=0, "...234) text2 [ text3]
>         ":
>           SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
>             SyntaxNode offset=0, "t"
>             SyntaxNode offset=1, "e"
>             SyntaxNode offset=2, "x"

> amazing. I do not get this. I am using ruby 1.8.7. I tried it with
> 1.9.2 as well but not difference. By whatever reason I cannot make
> ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7

I think you're right that this is the core difference.  What version of
treetop are you using?  I'm not aware of any version that should do what
you are seeing, but I don't track the details too closely.

> I created a gist for the two cases such that you can see the results

> https://gist.github.com/3192266/d1d0244b56f3770005b9026279a7e9d93f47d949  - the one with the null strings

> https://gist.github.com/3192266/2b2f2bbe36df9dbb3b211f987deddc3a4453a562   - the one without the null strings

Hmm.  I think you switched the descriptions.

So the one without the null string doesn't generate separate nodes for
top and document (which it shouldn't) but it also doesn't accrete  their
modules onto the shared node (which it should).

But with the null string, it looks as if everything is working as
expected.  From your second gist:

SyntaxNode+Top1+Top0 offset=0, "...234) text2 [ text3] " (document):
  SyntaxNode+Document1+Document0 offset=0, "...234) text2 [ text3] ":
    SyntaxNode offset=0, "...234) text2 [ text3] ":
      SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
        SyntaxNode offset=0, "t"
        SyntaxNode offset=1, "e"

This looks like what we want, with an extra bonus node between document
and noMarkupText because you also put a '' inside the definition of
document.

The as_xml stuff should work fine on this tree.

> you see that the parsetree of the one without the null strings does
> not reflect the rule "documentation"

Yeah.  My only idea is that it's a treetop version difference, but
that's just an unfounded guess.

>         SyntaxNode+Top1+Top0 offset=0, "...234) text2 [ text3]
>         " (document):
>           SyntaxNode+Document0 offset=0, "...234) text2 [ text3] ":
>             SyntaxNode+NoMarkupText0 offset=0, "text 1 ":
>               SyntaxNode offset=0, "t"
>               SyntaxNode offset=1, "e"
>               SyntaxNode offset=2, "x"

> Did not work either :-(

Actually, from the gist

https://gist.github.com/3192266/2b2f2bbe36df9dbb3b211f987deddc3a4453a562

it appears that it did.

> What I do not like (but certainly would accept as long as the stuff
> works) is the stereotypical block which I have to add on every node,
> which does not really add further information.

I was giving you the simplest to implement/understand version, but there
are several things you can do to declutter the grammar once you get it
working.

For example, if you change your helpers to look like so:

class Treetop::Runtime::SyntaxNode
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end
end

def wrap_with(tag)
  define_method(:as_xml) { "<#{tag}>#{super()}</#{tag}>" }
end
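
To see helpers of this shape at work outside a grammar, here is a sketch on a stand-in tree. Node, Document0, Top0 and NoMarkupText0 are hypothetical stand-ins for what Treetop would generate, and wrap_with is defined on Module here rather than at top level (an assumption made so the example does not depend on where a top-level def is evaluated). Note that super() needs the explicit parentheses inside define_method; a bare super raises an error there.

```ruby
# Stand-in for Treetop::Runtime::SyntaxNode (illustration only).
class Node
  attr_reader :text_value, :elements

  def initialize(text_value, elements = nil)
    @text_value = text_value
    @elements = elements
  end

  # Generic fallback: concatenate the children, or emit the raw text.
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end
end

class Module
  # Defines an as_xml that wraps whatever the next as_xml in the
  # lookup chain returns.
  def wrap_with(tag)
    define_method(:as_xml) { "<#{tag}>#{super()}</#{tag}>" }
  end
end

# Modules standing in for the ones generated from the rule blocks.
module NoMarkupText0
  wrap_with 'noMarkupText'
end

module Document0
  wrap_with 'document'
end

module Top0
  wrap_with 'top'
end

leaf = Node.new("text 1 ")
leaf.extend(NoMarkupText0)

# One merged node carrying both Document0 and Top0.
root = Node.new("text 1 ", [leaf])
root.extend(Document0)
root.extend(Top0)

puts root.as_xml
# <top><document><noMarkupText>text 1 </noMarkupText></document></top>
```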

...then the grammar rule annotations can be correspondingly reduced:

   rule top
     document '' { wrap_with 'top' }
   end

   rule document
      (noMarkupText / trace / markupAbort)*
       { wrap_with 'document' }
   end

   rule noMarkupText
      [^\[]+ { wrap_with 'noMarkupText' }
   end

which is far less obtrusive.  As it stands, it is still redundant, in
that the tag names are always the same as the rule names, but that is
really an artifact of how we got here.  The rule names should really
reflect the grammatical constructs they define, not the tags that are
used to mark up the output.

Consider a somewhat contrived example:

  rule literal
    digit+ '.' digit+ { wrap_with 'float' } /
    digit+            { wrap_with 'integer' } /
    '"' [^"]* '"'     { wrap_with 'string' }
  end

The rule name (literal) describes their grammatical role, while the
tagging deals with their semantics (we'd probably want to call it
something other than wrap_with in this case).  There's no reason the two
should be bound together, and good reasons for keeping them separate--as
in this example, a single rule might be capable of representing things
with differing semantics and different rules might likewise be capable
of producing things with the same semantics.

> If SyntaxTree would provide a method to retrieve the
> name of the rule, It could all be done in the generic method as_xml.

Yeah, but (IMHO) you really don't want to go there.  :)

-- M


 
Bernhard  
From: Bernhard <bernhard.weic...@googlemail.com>
Date: Sat, 28 Jul 2012 11:44:40 -0700 (PDT)
Local: Sat, Jul 28 2012 2:44 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.

Hi

On Saturday, July 28, 2012 8:03:22 PM UTC+2, Markus wrote:

> > amazing. I do not get this. I am using ruby 1.8.7. I tried it with
> > 1.9.2 as well but not difference. By whatever reason I cannot make
> > ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7

> I think you're right that this is the core difference.  What version of
> treetop are you using?  I'm not aware of any version that should do what
> you are seeing, but I don't track the details too closely.

nor do I, but as it does not seem to be the root cause, I stick with 1.8.7

> > I created a gist for the two cases such that you can see the results

> > https://gist.github.com/3192266/d1d0244b56f3770005b9026279a7e9d93f47d949 - the one with the null strings

> > https://gist.github.com/3192266/2b2f2bbe36df9dbb3b211f987deddc3a4453a562  - the one without the null strings

> Hmm.  I think you switched the descriptions.

grr. sorry.

Yes it does, and I guess I will continue down this path.

> > you see that the parsetree of the one without the null strings does
> > not reflect the rule "documentation"

> Yeah.  My only idea is that it's a treetop version difference, but
> that's just an unfounded guess.

I am using treetop 1.4.10, this is what "gem install treetop" gave me

Yes, you are right, even if in most cases this is the same.
But triggered by your idea, I could use "wrap_with" not as
a wrap instruction but as a semantic notation.

I could then even use this to query the parse tree, if I want
to implement something which picks particular nodes.

A so-called target-driven transformation builds the target
model not by traversing the source tree but by querying
the source model.
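
A rough sketch of what such a query could look like, on stand-in nodes rather than a real Treetop tree. The names tag, semantic_tag, select_tagged, Float0 and Integer0 are invented for this illustration.

```ruby
# Stand-in for a parse-tree node (illustration only).
class Node
  attr_reader :text_value, :elements

  def initialize(text_value, elements = [])
    @text_value = text_value
    @elements = elements
  end

  # Nodes without a semantic annotation report nil.
  def semantic_tag
    nil
  end

  # Depth-first search for all nodes carrying the given tag.
  def select_tagged(wanted)
    found = semantic_tag == wanted ? [self] : []
    found + elements.flat_map { |e| e.select_tagged(wanted) }
  end
end

class Module
  # Records a semantic tag on the node instead of wrapping its output.
  def tag(name)
    define_method(:semantic_tag) { name }
  end
end

module Float0
  tag 'float'
end

module Integer0
  tag 'integer'
end

tree = Node.new("1.5 42 7", [
  Node.new("1.5").extend(Float0),
  Node.new("42").extend(Integer0),
  Node.new("7").extend(Integer0)
])

# The target side asks for what it needs instead of walking the tree.
p tree.select_tagged('integer').map(&:text_value)  # ["42", "7"]
p tree.select_tagged('float').map(&:text_value)    # ["1.5"]
```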

I guess you really helped me out.

the literal example is strong. So I agree.

Thanks Markus, you helped me a lot.

--B


 
markus  
From: markus <mar...@reality.com>
Date: Sat, 28 Jul 2012 12:32:41 -0700
Local: Sat, Jul 28 2012 3:32 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
B --

> But triggered by your idea, I could use "wrap_with" not as
> a wrap instruction but as a semantic notation.
> I could then even use this to query the parse tree, if I want
> to implement something which picks particular nodes.

> A so-called target-driven transformation builds the target
> model not by traversing the source tree but by querying
> the source model.

Yeah, I like that line of thinking.

> Thanks Markus, you helped me a lot.

Glad I was able to, & best wishes going forward.  If you get stuck
again, don't hesitate to ping the list.  We aren't always able to help,
but we try to always try.  :)

-- M


 
Clifford Heath  
From: Clifford Heath <clifford.he...@gmail.com>
Date: Sun, 29 Jul 2012 09:34:34 +1000
Local: Sat, Jul 28 2012 7:34 pm
Subject: Re: Parser trying to extend with a SyntaxNode subclass weirdness.
On 28/07/2012, at 11:39 PM, Bernhard wrote:

> By whatever reason I cannot make ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7

Install the 'debugger' gem - it's a modified version of ruby-debug19 that works.

It's a mess, but this worked for me. Pry does too, and is supposed to be pretty
good, but I haven't made the switch.

Clifford Heath.


 