rule decimal_integer_literal
  '-'? [1-9] [0-9]*
end
rule binary_integer_literal
  '-'? '0b' [0-1]+
end
rule octal_integer_literal
  '-'? '0o' [0-7]+
end
rule hex_integer_literal
  '-'? '0x' [0-9a-fA-F]+
end
However when I try and parse `0` I get:
1.9.3-p0 :001 > Crimson::Parser.parse('0')
TypeError: wrong argument type Class (expected Module)
        from (eval):72:in `extend'
        from (eval):72:in `_nt_integer_literal'
        from (eval):24:in `_nt_literal'
        from /Users/jnh/.rvm/gems/ruby-1.9.3-p0@crimsonscript/gems/treetop-1.4.10/lib/treetop/runtime/compiled_parser.rb:18:in `parse'
        from /Users/jnh/Dev/Toys/CrimsonScript/lib/crimson/parser.rb:11:in `parse'
        from (irb):1
        from /Users/jnh/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'
I'm not sure what I'm doing wrong, but when I compile the grammar to ruby I can see that it's calling #extend on the result of _nt_zero_integer_literal. I'm not sure why that's happening. Is this a bug?
-- James Harton sociable.co.nz @jamesotron +64226803869
Simple answer: IntegerLiteral, NilLiteral, etc. need to be modules, and you've most likely defined them as classes.
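The error can be reproduced without Treetop at all, since it comes from Ruby's Object#extend, which only accepts modules. A minimal sketch, using made-up stand-in names (IntegerLiteralClass and IntegerLiteralModule are hypothetical, not the real Crimson definitions):

```ruby
# Hypothetical stand-ins; the real Crimson definitions are not shown in this thread.
class IntegerLiteralClass; end

module IntegerLiteralModule
  def kind
    :integer
  end
end

node = Object.new

# Object#extend only accepts modules, so passing a class raises TypeError.
begin
  node.extend(IntegerLiteralClass)
  error_message = nil
rescue TypeError => e
  error_message = e.message
end
puts error_message

# Extending with a module works and adds its methods to this one object.
node.extend(IntegerLiteralModule)
puts node.kind
```

The generated parser does the same kind of extend call on the node it has just built, which is why the name given in angle brackets has to refer to a module when the rule is an alternation.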
One thing I've learned in my limited experience with Treetop (maybe I'm doing it the wrong way, but it worked for me): when you define a rule that is just an alternation of other rules, a bit like inheritance, you must wrap the alternatives in parentheses before attaching the class to it. For example, let's fix your rule integer_literal:
Douglas is quite correct. The class specification for a rule binds tighter than the alternation "/" operator:
rule foo
  a <A_In_Foo> / b <B_In_Foo>
end
rule a
  something <A>
end
rule b
  something_else <B>
end
In the case where "foo" matches "something_else", the SyntaxNode for "something_else" will have two mixed-in modules (both "B" and "B_In_Foo"). Use parentheses around a list of alternates if you want to mix in the same module regardless of which possibility is matched.
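To picture what "two mixed-in modules" means, here is a plain-Ruby sketch that does not use Treetop. The modules B and B_In_Foo mirror the names in the grammar above; the describe method and the bare Object standing in for the SyntaxNode are assumptions made purely for illustration, as is the order of the two extend calls:

```ruby
# Plain-Ruby stand-ins for the modules named in the grammar above.
module B
  def describe
    "B"
  end
end

module B_In_Foo
  def describe
    "B_In_Foo wrapping #{super}"
  end
end

# Stand-in for the SyntaxNode that matched "something_else".
node = Object.new
node.extend(B)         # assumed to be mixed in first, by rule b
node.extend(B_In_Foo)  # assumed to be mixed in afterwards, by rule foo

puts node.describe
puts node.is_a?(B) && node.is_a?(B_In_Foo)
```

Both modules end up on the same object, and the one added last is found first during method lookup, so it can call super to reach the other.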
> -- > You received this message because you are subscribed to the Google Groups "Treetop Development" group. > To view this discussion on the web visit https://groups.google.com/d/msg/treetop-dev/-/i618WdSTO9sJ. > To post to this group, send email to treetop-dev@googlegroups.com. > To unsubscribe from this group, send email to treetop-dev+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/treetop-dev?hl=en.
> when you define a rule that's just an aggregation of other two, like > some kind of inheritance, you must use a parenthesis to define the > class for it.
Good catch; this would likely be the next snag he'd hit, once the class/module problem is cleared up (the "0" case doesn't hit it because it's the last alternative, but anything else would).
Thanks folks for the feedback and explanation. I've switched over to modules, so at least it's stopped complaining about that. I've also wrapped the rules for integer_literal and literal in parens. I'm still a little confused, though, because the parser still refuses to recognise any of my patterns:
1.9.3p0 :001 > Crimson::Parser.parse('12345')
Exception: Parse error: Expected one of -, 0x, 0o, 0b at line 1, column 1 (byte 1) after .
        from /Users/jnh/Dev/Toys/CrimsonScript/lib/crimson/parser.rb:13:in `parse'
        from (irb):1
        from /Users/jnh/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'
1.9.3p0 :002 > Crimson::Parser.parse('nil')
Exception: Parse error: Expected one of -, 0x, 0o, 0b, 0 at line 1, column 1 (byte 1) after .
        from /Users/jnh/Dev/Toys/CrimsonScript/lib/crimson/parser.rb:13:in `parse'
        from (irb):2
        from /Users/jnh/.rvm/rubies/ruby-1.9.3-p0/bin/irb:16:in `<main>'
-- James Harton sociable.co.nz @jamesotron +64226803869
On Wednesday, 29 February 2012 at 7:03 PM, markus wrote: > > when you define a rule that's just an aggregation of other two, like > > some kind of inheritance, you must use a parenthesis to define the > > class for it.
On Wed, Feb 29, 2012 at 12:30 AM, James Harton <jamesot...@gmail.com> wrote: > I'm still a little confused about the fact that the parser still refuses to > recognise any of my patterns:
This is just general advice: Take a step back. Can you get some example code to work? If so, you might want to try morphing the example into the code you want, step by step. The other approach is to start with the barest possible grammar, get that to work, and gradually add on to it.
What is unproductive, in my experience, is to write a bunch of code, throw it at the system, and debug the error messages you get. Try to start from a happy place, and change that to where you want to get to.
This isn't quite as off-topic as it might seem, as I've found that parsing (with Citrus rather than Treetop) really requires a go-slow approach.
///ark Web Applications Developer California Academy of Sciences
On Wed, 2012-02-29 at 21:30 +1300, James Harton wrote: > Thanks folks for the feedback and explanation. I've switched over to > modules so at least it's stopped complaining about that. > I've also wrapped the rules for interger_literal and literal in > parens. > I'm still a little confused about the fact that the parser still > refuses to recognise any of my patterns:
When I make those modifications to a grammar consisting of the rules you originally posted, it appears to work:
> One motivation to use subclasses is …
> to clean the parse tree such that it only has the semantically relevant information and no longer keep the syntactical sugar.
This is a really bad idea in general. PEG parsers produce
lots of garbage - all the hidden memoized attempts at
almost every character. If you keep the parse tree, whether
it's semantically structured or not, you keep the garbage.
Sometimes a parse tree is like your semantic model, as in
the s-expressions example, but usually you will write a better
program if you design a semantic model first, then construct
that *after* parsing your input.
On Thursday, June 28, 2012 1:00:01 AM UTC+2, Clifford Heath wrote:
> On 28/06/2012, at 6:02 AM, Bernhard wrote: > > One motivation to use subclasses is … > > to clean the parse tree such that it only has the semantically relevant > information and no longer keep the syntactical sugar.
> This is a really bad idea in general. PEG parsers produce > lots of garbage - all the hidden memorised attempts at > almost every character. If you keep the parse tree, whether > it's semantically-structured or not, you keep the garbage.
> Sometimes a parse tree is like your semantic model, as in > the s-expressions example, but usually you will write a better > program if you design a semantic model first, then construct > that *after* parsing your input.
Yes, this is the usual approach. I am still struggling with finding an appropriate generic strategy to transform the parse tree to my semantic model. I would like to construct my semantic model and populate it by retrieving the relevant information from the parse tree (sometimes called the target driven transformation). But this is very difficult.
But querying the parse tree is also not recommended. Therefore, I don't really know how to proceed.
One approach I used in another system was to annotate the grammar to distinguish which parts
- should be represented in the result tree as DOM nodes (XML elements); in this case, the rule name is the name of the XML element
- should be ignored
- should be XML attributes; in this case, the rule name is the name of the XML attribute.
My task is to query tons of LaTeX - Files in order to generate particular extracts in XML.
It's pretty ugly digging around inside bits of the Treetop structures
assuming you know how to clean it. In fact, I think it's unlikely that
the example code frees very much garbage anyway.
> I am still struggling with finding an appropriate generic strategy to transform the parse tree to my semantic model.
The normal preferred way is to ask the parse tree to descend itself
and build whatever answer you require. Whether you use custom
classes or just methods depends on how hard that is.
> I would like to construct my semantic model and populate it by retrieving the relevant information from the parse tree (sometimes called the target driven transformation). But this is very difficult.
Right, what I said. Why is it so difficult in your case?
> My task is to query tons of LaTeX - Files in order to generate particular extracts in XML.
I think the only engine that can properly parse TeX is TeX - since
you can define new grammar rules that modify the TeX parser.
But if you assume that isn't being done… well, it's still pretty hard.
LaTeX is a real challenge. I admire you for attempting it.
> > I would like to construct my semantic model and populate it by > retrieving the relevant information from the parse tree (sometimes called > the target driven transformation). But this is very difficult.
> Right, what I said. Why is it so difficult in your case?
The difficulty in my case is that Treetop adds nodes where I do not expect them
and does not provide nodes where I do expect them.
I switched to a simpler task: grepping some patterns out of a Markdown file.
My main objective is to find a generic approach to solving this kind of problem.
With this grammar
grammar TraceInMarkdown
  rule top
    document
  end

  rule document
    ( (noMarkupText / trace / markupAbort)* )
  end

  rule noMarkupText
    [^\[]+
  end

  rule markupAbort
    "["
  end

  rule trace
    traceId s? traceHead s? traceBody traceUpLink
  end

  rule traceId
    "[" label "]"
  end

  rule label
    [a-zA-Z]+ "_" [a-zA-Z]+ "_" [0-9]+
  end

  rule traceHead
    '**' (!'*' . / '\*')+ '**'
  end

  rule traceBody
    "{" (nestedBody / [^{}])+ "}"
  end

  rule nestedBody
    "{" (nestedBody / [^{}])+ "}"
  end

  rule traceUpLink
    "(" (","? s? label)* ")"
  end

  rule s
    [ \t]+
  end
end
applied to text="text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum} [nothing] }(foo_bar_1234, bar_foo_1234) text2 [ text3] "
As you see, the root of the syntax tree does not provide "document".
At the same time, there is no "trace" ...
In order to keep the business logic out of the grammar, I tried to extend SyntaxNode with helper methods that should
provide what I want, such as a method "child" that should deliver the child nodes as specified in the production rule.
But even when I am at a given node, I can hardly figure out the corresponding rule.
I also tried an approach in which each rule had a method
delivering the related information. But there were
additional nodes that did not correspond to a rule and therefore raised NoMethodError exceptions.
I could somehow hack it to make it work, but it is pure trial and error
without "knowing what I am doing".
> > My task is to query tons of LaTeX - Files in order to generate > particular extracts in XML.
> I think the only engine that can properly parse TeX is TeX - since > you can define new grammar rules that modify the TeX parser. > But if you assume that isn't being done… well, it's still pretty hard. > LaTeX is a real challenge. I admire you for attempting it.
Yes, difficult enough. But I only want to grep some material out of LaTeX files,
so I do not need a full-blown LaTeX parser.
My problem as of now is that I am struggling with Treetop _and_ with LaTeX.
> >> I would like to construct my semantic model and populate it by
> retrieving the relevant information from the parse tree (sometimes
> called the target driven transformation). But this is very difficult.
> > Right, what I said. Why is it so difficult in your case?
> The difficulty in my case is the fact that treetop adds nodes where I
> did not expect them and does not provide nodes where I expect them.
I'm not sure how helpful this will be, but I'll offer it in the hopes
that it at least provides some insight.
It feels as if you are trying to solve the wrong problem, and in the
process making things harder on yourself than need be. Specifically, it
sounds as if you are trying to have treetop produce the results you want
(and thus you care about where it does/does not produce nodes) rather
than having it produce the AST and then having the AST produce the
results you want.
This is a subtle distinction, but it has powerful consequences, because
it breaks the coupling between the structure of the grammar and the
structure of the results.
I'll try to walk you to the place where I think you ought to be starting
by going through a series of not-right-but-at-least-less-wrong stages.
First, with your grammar in g.tt I write a test rig like so:
This produces the AST, as you showed in your e-mail. But now rather
than having the AST as the output, suppose I define a property "as_xml"
on each node, with a (clearly incorrect) default implementation, and
print that as my result, like so:
class Treetop::Runtime::SyntaxNode
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end
end

p Treetop.load('g.tt').new.parse(text).as_xml
Now the output is just the original input string, but it is being
produced by parsing the input, producing an AST, which we then walk,
reproducing the source.
We can recover some of the structure by adding the option of tagging the
xml like so (this isn't intended as an example of great code, just a
conceptual exercise to lead you to the easier way of thinking of
things):
class Treetop::Runtime::SyntaxNode
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end

  def wrap(tag,body)
    "<#{tag}>#{body}</#{tag}>"
  end
end
...and then mark up the grammar to use it:
rule top
  document { def as_xml; wrap('top',super); end }
end
rule document
  (noMarkupText / trace / markupAbort)* { def as_xml; wrap('document',super); end }
end
rule noMarkupText
  [^\[]+ { def as_xml; wrap('noMarkupText',super); end }
end
You could get the rest of the way (up to indentation) by decorating the
rest of the grammar, and could even do indentation by fussing with the
contents.
But there's nothing saying these methods have to produce strings; you
could just as well produce a tree of objects that has the structure you
want, and have them emit themselves as xml on demand, or have them
update a dictionary, etc. Or, going the other way (as in the obligatory
calculator example) you could have the methods produce a single number.
The key is to decouple the AST and the result by adding methods to the
syntax tree, so that the structure of one isn't bound to the structure
of the other.
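As a rough illustration of methods that return objects rather than strings, here is a plain-Ruby sketch. Nothing in it is Treetop API: Node is a hypothetical stand-in for a syntax node, and Trace, TraceNode and to_model are names invented for this example.

```ruby
# Hypothetical stand-in for a parse-tree node; not Treetop's SyntaxNode.
class Node
  attr_reader :text_value, :elements

  def initialize(text_value, elements = nil)
    @text_value = text_value
    @elements = elements
  end

  # Default: leaves contribute nothing, inner nodes pass on
  # whatever their children produce.
  def to_model
    return [] unless elements
    elements.flat_map(&:to_model)
  end
end

# Semantic model, designed independently of the grammar.
Trace = Struct.new(:label, :head)

# Mixed into the node that matched a trace; it knows how to build a Trace.
module TraceNode
  def to_model
    label, head = elements.map(&:text_value)
    [Trace.new(label, head.strip)]
  end
end

trace = Node.new("foo_bar_1234 header ",
                 [Node.new("foo_bar_1234"), Node.new(" header ")])
trace.extend(TraceNode)
document = Node.new("text foo_bar_1234 header ", [Node.new("text "), trace])

model = document.to_model
puts model.inspect
```

The shape of the result (a flat list of Trace objects) is chosen by the methods, not by how many intermediate nodes the parser happened to create.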
On Saturday, July 28, 2012 12:14:13 AM UTC+2, Markus wrote:
> B --
> I'm not sure how helpful this will be, but I'll offer it in the hopes > that it at least provides some insight.
It _is_ helpful. Thanks a lot.
> It feels as if you are trying to solve the wrong problem, and in the > process making things harder on yourself than need be. Specifically, it > sounds as if you are trying to have treetop produce the results you want > (and thus you care about where it does/does not produce nodes) rather > than having it produce the AST and then having the AST produce the > results you want.
That impression might arise, but it is not my intention to have Treetop produce the intended results. My intention is indeed to get a clean AST as easily as possible.
> This is a subtle distinction, but it has powerful consequences, because > it breaks the coupling between the structure of the grammar and the > structure of the results..
Yes, fully agree with this.
> I'll try to walk you to the place where I think you ought to be starting > by going through a series of not-right-but-at-least-less-wrong stages. > First, with your grammar in g.tt I write a test rig like so:
> This produces the AST, as you showed in your e-mail. But now rather > than having the AST as the output, suppose I define a property "as_xml" > on each node, with a (clearly incorrect) default implementation, and > print that as my result, like so:
> class Treetop::Runtime::SyntaxNode > def as_xml > if elements > elements.map { |e| e.as_xml }.join > else > text_value > end > end > end
> p Treetop.load('g.tt').new.parse(text).as_xml
> Now the output is just the original input string, put it is being > produced by parsing the input, producing an AST, which we then walk, > reproducing the source..
ok so far. I get the same and I do understand the approach.
> We can recover some of the structure by adding the option of tagging the > xml like so (this isn't intended as an example of great code, just a > conceptual exercise to lead you to the easier way of thinking of > things):
> class Treetop::Runtime::SyntaxNode > def as_xml > if elements > elements.map { |e| e.as_xml }.join > else > text_value > end > end > def wrap(tag,body) > "<#{tag}>#{body}</#{tag}>" > end > end
> ...and then mark up the grammar to use it: > rule top > document { def as_xml; wrap('top',super); end } > end
rule document > (noMarkupText / trace / markupAbort)* { def as_xml; > wrap('document',super); end } > end
Here we are at the problem: I annotated this single rule as you describe. And I receive
(rdb:1) p r2.as_xml "text 1 [foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum} [nothing]
}(foo_bar_1234, bar_foo_1234) text2 [ text3] "
No tag for document. The reason is that the parse tree does not have a node here. And this is what drives me to desperation: why not? This is one of the cases where TT does not behave intuitively.
> rule noMarkupText > [^\[]+ { def as_xml; wrap('noMarkupText',super); end } > end
(rdb:1) p r2.as_xml "<noMarkupText>text 1 </noMarkupText>[foo_bar_1234] ** lorem ipsum header ** {lorem {ipsum} [nothing] }(foo_bar_1234, bar_foo_1234)<noMarkupText> text2 </noMarkupText>[<noMarkupText> text3] </noMarkupText>"
If it were not for the document rule, I would not be in trouble, since there would be a clear strategy. But as it stands, whatever I define in the rule "document" will not be used.
> ...which is starting to look at least somewhat like your goal:
yes
> You could get the rest of the way (up to indentation) by decorating the > rest of the grammar, and could even do indentation by fussing with the > contents.
yes .. if all nodes were really delivered.
> But there's nothing saying these methods have to produce strings; you > could just as well produce a tree of objects that has the structure you > want, and have them emit themselves as xml on demand, or have them > update a dictionary, etc. Or, going the other way (as in the obligatory > calculator example) you could have the methods produce a single number.
I finally would like to push the stuff to nokogiri in order to transform and eventually serialize it to the intended output format.
> The key is to decouple the AST and the result by adding methods to the > syntax tree, so that the structure of one isn't bound to the structure > of the other.
Yes ... but hopefully you see my problem with TT on this way. BTW, I deal with model transformations for long time. This is the reason why I started with TT even in these tasks.
> no tag for documentation. The reason is, that the parse tree does not have a node here. And this is > what drives me to desperation: Why not?
You do get a node for 'document', but not for 'top', since that only contains one sub-rule.
When a rule contains only a single node, no new node is created.
Any code block associated is put into a module which is mixed in
to the existing node from the sub-rule.
If you defined "top" like this you'd get a new node which is a sequence of two nodes
rule top
  document ''
end
This has the null rule '' which matches without consuming any characters.
It means that 'top' is defined as a sequence, so must create a new node.
> no tag for documentation. The reason is, that the parse tree does not
> have a node here. And this is what drives me to desperation: Why not?
> This is one of the cases where TT does not behave intuitively.
Going back to the test rig that just dumps the AST, I see:
...which, as I would expect, has the document and top nodes merged;
which is to say, there is one SyntaxNode and it's getting both modules
added to it, with top being "outer" and document being "inner".
If you want, you can force there to be two nodes by adding a null string
to top (so that it isn't just a synonym for document)
rule top
  document '' { def as_xml; wrap('top',super); end }
end
...but before you do that, did you try marking up both top and document
as I showed? When I do that (without adding a '' to force top to get
its own node) I see the generated xml as I pasted it. In ruby, super
will go up the module chain, so for instance:
module A
  def foo
    :foo
  end
end

module B
  def foo
    [super,super]
  end
end

class C
  include A
  include B
  def foo
    super.to_s.reverse
  end
end

p C.new.foo
...produces:
"]oof: ,oof:["
...:foo from A, doubled by B, and converted to a string & reversed by C.
In case it matters, I'm using ruby 1.9.2 & Treetop v1.4.10.
Amazing. I do not get this result. I am using ruby 1.8.7. I tried it with 1.9.2 as well, but there was no difference. For whatever reason I cannot make ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7.
I created a gist for the two cases so that you can see the results.
> ...which, as I would expect, has the document and top nodes merged; > which is to say, their is one SyntaxNode and it's getting both modules > added to it, with top being "outer" and document being "inner".
Your advice to use "super" in a generic as_xml method paves a way forward for me. Thanks.
> ...but before you do that, did you try marking up both top and document > as I showed? When I do that (without adding a '' to force top to get > its own node) I see the generated xml as I pasted it. In ruby, super > will go up the module chain, so for instance:
> module A > def foo > :foo > end > end
> module B > def foo > [super,super] > end > end
> class C > include A > include B > def foo > super.to_s.reverse > end > end
> p C.new.foo
> ...produces:
> "]oof: ,oof:["
> ...:foo from A, doubled by B, and converted to a string & reversed by C.
> In case it matters, I'm using ruby 1.9.2 & Treetop v1.4.10.
I tried it with both. There was no difference.
What I do not like (but would certainly accept as long as it works) is the repetitive block that I have to add to every node, which does not really add further information. If SyntaxNode provided a method to retrieve the name of the rule, it could all be done in the generic method as_xml.
I tried it as follows. It seems to work well, even if it is somehow heuristic.
# Indicates whether a meaningful name for the node in the AST
# is available.
def has_rule_name?
  not (extension_modules.nil? or extension_modules.empty?)
end

# Returns a meaningful name for the node in the AST.
def rule_name
  if has_rule_name?
    extension_modules.first.name.split("::").last.gsub(/[0-9]/,"")
  else
    "###"
  end
end
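To see what the string handling in rule_name does, here is the same expression applied to plain strings. The module names are assumed examples of what a generated name might look like; the real names depend on the grammar and on the Treetop version.

```ruby
# Assumed example of a generated module name; real names depend on the grammar.
generated_name = "TraceInMarkdown::NoMarkupText0"

rule_name = generated_name.split("::").last.gsub(/[0-9]/, "")
puts rule_name

# The heuristic removes every digit, including digits that belong to the
# rule name itself (Level2Heading is a hypothetical rule name).
all_digits_removed = "TraceInMarkdown::Level2Heading1".split("::").last.gsub(/[0-9]/, "")
puts all_digits_removed

# Stripping only a trailing run of digits keeps such names intact.
trailing_only = "TraceInMarkdown::Level2Heading1".split("::").last.sub(/\d+\z/, "")
puts trailing_only
```

This is one reason the approach is heuristic: it only works as long as rule names contain no digits of their own, unless the pattern is narrowed to the trailing digits.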
> amazing. I do not get this. I am using ruby 1.8.7. I tried it with
> 1.9.2 as well but not difference. By whatever reason I cannot make
> ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7
I think you're right that this is the core difference. What version of
treetop are you using? I'm not aware of any version that should do what
you are seeing, but I don't track the details too closely.
> I crated a gist for the two cases such that you can see the results
So the one without the null string doesn't generate separate nodes for
top and document (which it shouldn't) but it also doesn't accrete their
modules onto the shared node (which it should).
But with the null string, it looks as if everything is working as
expected. From your second gist:
> What I do not like (but certainly would accept as long as the stuff
> works) is the stereotypical block which I have to add on every node,
> which does not really add further information.
I was giving you the simplest to implement/understand version, but there
are several things you can do to declutter the grammar once you get it
working.
For example, if you change your helpers to look like so:
class Treetop::Runtime::SyntaxNode
  def as_xml
    if elements
      elements.map { |e| e.as_xml }.join
    else
      text_value
    end
  end
end

def wrap_with(tag)
  define_method(:as_xml) { "<#{tag}>#{super()}</#{tag}>" }
end
...then the grammar rule annotations can be correspondingly reduced:
rule noMarkupText
  [^\[]+ { wrap_with 'noMarkupText' }
end
which is far less obtrusive. As it stands, it is still redundant, in
that the tag names are always the same as the rule names, but that is really
an artifact of how we got here. The rule names should really reflect
the grammatical constructs they define, not the tags that are used to
mark up the output.
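For anyone wondering why a bare wrap_with call inside a block can work, here is a plain-Ruby sketch of the mechanism, run as a standalone script. FakeNode is a hypothetical stand-in for a syntax node, and the NoMarkupText module plays the role of the module that would hold the block's contents; neither is taken from Treetop itself.

```ruby
# Hypothetical stand-in for a syntax node; not Treetop's SyntaxNode.
class FakeNode
  def initialize(text)
    @text = text
  end

  def as_xml
    @text
  end
end

# A def at the top level of a script becomes a private method on Object,
# so it can be called with an implicit receiver inside any module body.
# There, self is the module, so define_method adds as_xml to that module.
def wrap_with(tag)
  # super() needs explicit parentheses inside a define_method block.
  define_method(:as_xml) { "<#{tag}>#{super()}</#{tag}>" }
end

# Plays the role of the module holding a rule's block.
module NoMarkupText
  wrap_with 'noMarkupText'
end

node = FakeNode.new('text 1 ')
node.extend(NoMarkupText)
puts node.as_xml
```

Because the module is mixed into the node with extend, super() inside the generated as_xml reaches the node's own as_xml, which is the same chain that the hand-written blocks rely on.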
The rule name (literal) describes their grammatical role, while the
tagging deals with their semantics (we'd probably want to call it
something other than wrap_with in this case). There's no reason the two
should be bound together, and good reasons for keeping them separate--as
in this example, a single rule might be capable of representing things
with differing semantics and different rules might likewise be capable
of producing things with the same semantics.
> If SyntaxTree would provide a method to retrieve the
> name of the rule, It could all be done in the generic method as_xml.
Yeah, but (IMHO) you really don't want to go there. :)
On Saturday, July 28, 2012 8:03:22 PM UTC+2, Markus wrote:
> > amazing. I do not get this. I am using ruby 1.8.7. I tried it with
> > 1.9.2 as well but no difference. For whatever reason I cannot make
> > ruby-debug work on my Mountain Lion Mac, so I returned to 1.8.7
> I think you're right that this is the core difference. What version of
> treetop are you using? I'm not aware of any version that should do what
> you are seeing, but I don't track the details too closely.
Nor do I, but since it does not seem to be the root cause, I will stick with 1.8.7.
> > I created a gist for the two cases so that you can see the results
> So the one without the null string doesn't generate separate nodes for
> top and document (which it shouldn't) but it also doesn't accrete their
> modules onto the shared node (which it should).
> But with the null string, it looks as if everything is working as
> expected. From your second gist:
> > What I do not like (but certainly would accept as long as the stuff
> > works) is the stereotypical block which I have to add on every node,
> > which does not really add further information.
> I was giving you the simplest to implement/understand version, but there
> are several things you can do to declutter the grammar once you get it
> working.
> For example, if you change your helpers to look like so:
> class Treetop::Runtime::SyntaxNode
>   def as_xml
>     if elements
>       elements.map { |e| e.as_xml }.join
>     else
>       text_value
>     end
>   end
> end
> def wrap_with(tag)
>   define_method(:as_xml) { "<#{tag}>#{super()}</#{tag}>" }
> end
> ...then the grammar rule annotations can be correspondingly reduced:
> rule top
>   document '' { wrap_with 'top' }
> end
> which is far less obtrusive. As it stands, it is still redundant, in
> that the tag names are always the same as the rule names, but that is
> really an artifact of how we got here. The rule names should really
> reflect the grammatical constructs they define, not the tags that are
> used to mark up the output.
Yes, you are right, even if in most cases the two are the same. But triggered by your idea, I could use "wrap_with" not as a wrap instruction but as semantic notation.
I could then even use this to query the parse tree, if I want to implement something which picks particular nodes.
A so-called target-driven transformation builds the target model not by traversing the source tree but by querying the source model.
> The rule name (literal) describes their grammatical role, while the
> tagging deals with their semantics (we'd probably want to call it
> something other than wrap_with in this case). There's no reason the two
> should be bound together, and good reasons for keeping them separate--as
> in this example, a single rule might be capable of representing things
> with differing semantics and different rules might likewise be capable
> of producing things with the same semantics.
> > If SyntaxTree would provide a method to retrieve the
> > name of the rule, It could all be done in the generic method as_xml.
> Yeah, but (IMHO) you really don't want to go there. :)
> But triggered by your idea, I could use "wrap_with" not as
> a wrap instruction but as semantic notation.
> I could then even use this to query the parse tree, if I want
> to implement something which picks particular nodes.
> A so-called target-driven transformation builds the target
> model not by traversing the source tree but by querying
> the source model.
Yeah, I like that line of thinking.
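As a rough sketch of what that could look like (everything here is hypothetical: Node stands in for the real syntax node class so the snippet runs without the gem, and tagged_as, semantic_tag, find_tagged, and the Heading0/Caption0/Body0 modules are made-up names), the helper records a tag instead of wrapping output, and the tree is then queried by tag:

```ruby
# Hypothetical stand-in for a parse-tree node (not the Treetop class).
class Node
  attr_reader :text_value, :elements

  def initialize(text_value, elements = nil)
    @text_value = text_value
    @elements = elements
  end

  # By default a node carries no semantic tag.
  def semantic_tag
    nil
  end

  # Depth-first search for every node carrying the given tag.
  def find_tagged(tag)
    found = semantic_tag == tag ? [self] : []
    (elements || []).each { |e| found.concat(e.find_tagged(tag)) }
    found
  end
end

# Made-up helper: records a semantic tag rather than wrapping the output.
def tagged_as(tag)
  define_method(:semantic_tag) { tag }
end

# Two different "rules" that share one meaning, plus one with another meaning.
module Heading0
  tagged_as :title
end

module Caption0
  tagged_as :title
end

module Body0
  tagged_as :paragraph
end

heading = Node.new("Intro").extend(Heading0)
body    = Node.new("Some text").extend(Body0)
caption = Node.new("Figure 1").extend(Caption0)
root    = Node.new("IntroSome textFigure 1", [heading, body, caption])

titles = root.find_tagged(:title).map(&:text_value)
puts titles.inspect  # prints ["Intro", "Figure 1"]
```

Because the query goes by tag rather than by rule name, the two stay decoupled: different rules can share a tag, and the grammar can be renamed without touching the code that consumes the tree.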
> Thanks Markus, you helped me a lot.
Glad I was able to, & best wishes going forward. If you get stuck
again, don't hesitate to ping the list. We aren't always able to help,
but we try to always try. :)