Maybe it's evident, maybe I am utterly wrong, but...
I am playing with Treetop and the very simple grammar below.
What strikes me is this: I would have assumed that there would be some form of reduction at some point, but with my grammar it looks like every character is being turned into a syntax node.
Looking at the Newline node, I assumed it would be a terminal matching "\n\n".
It clearly is not: it has a child node for each regexp-like sub-expression:
Newline offset=0, "\n\n":
  SyntaxNode+NEWLINE0 offset=0, "\n":
    SyntaxNode offset=0, ""
    SyntaxNode offset=0, "\n"
  SyntaxNode+NEWLINE0 offset=1, "\n":
    SyntaxNode offset=1, ""
    SyntaxNode offset=1, "\n"
Is this the expected behavior?
Wouldn't it be a problem if the parsed source is of significant size?
My initial assumption was that every regexp-like expression would end up being turned into a terminal symbol holding a string (not a set of nodes).
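To put a number on that overhead, here is a minimal sketch that walks a tree and counts its nodes. `FakeNode` is a hypothetical stand-in for `Treetop::Runtime::SyntaxNode` (so the snippet runs without the gem), and it assumes, as far as I understand Treetop, that terminal nodes report `elements` as `nil`. The subtree is rebuilt by hand from the Newline dump above.

```ruby
# Hypothetical stand-in for Treetop::Runtime::SyntaxNode: only the two
# attributes the walk needs, so the sketch runs without the treetop gem.
FakeNode = Struct.new(:text_value, :elements)

# Count a node plus all of its descendants.
# Assumption: terminals expose elements == nil, so nil is treated as "no children".
def count_nodes(node)
  1 + (node.elements || []).sum { |child| count_nodes(child) }
end

# One iteration of ([\t ]* "\n"): an empty repetition node plus the "\n" terminal.
def newline_iteration
  FakeNode.new("\n", [FakeNode.new("", []), FakeNode.new("\n", nil)])
end

# The Newline node for "\n\n": two iterations of the sequence.
newline = FakeNode.new("\n\n", [newline_iteration, newline_iteration])

puts count_nodes(newline)
```

Against a real parse, the same `count_nodes` could presumably be called on the root returned by `parse`, since it relies only on `elements`.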
My very simple grammar:
# python3_parser.treetop
grammar Python3
  rule program
    # Initialise the indent stack with a sentinel:
    &{|s| @indents = [-1] }
    ( NEWLINE / stmt)*
  end

  rule stmt
    indentation text:((!"\n" .)*) "\n" <LineNode>
    {
      def inspect(indent="")
        indent + self.class.to_s + " " + text.text_value
      end
    }
  end

  rule indentation
    ' '*
  end

  rule NEWLINE
    ([\t ]* "\n")+ <Newline>
    {
      def inspect(indent="")
        indent + self.class.to_s + " #{elements.size}"
      end
    }
  end
end
Parser code
# In file parser.rb
require 'treetop'

class LineNode < Treetop::Runtime::SyntaxNode
  def value
    text_value.strip
  end
end

class Newline < Treetop::Runtime::SyntaxNode
end

class Parser
  # Load the Treetop grammar from the 'python3_parser.treetop' file -> produce <grammar name>Parser
  # and then create a new instance of that parser as a class variable so we don't have to re-create
  # it every time we need to parse a string
  Treetop.load('python3_parser.treetop')
  @@parser = Python3Parser.new

  input = <<~TEXT
block
line1
line2
nested
line3
TEXT
  tree = @@parser.parse(input)
  if tree
    puts "Parsed successfully!"
    p tree
  else
    puts "Parsing failed at: #{@@parser.failure_line} #{@@parser.index}: #{@@parser.failure_reason}"
  end
end
The tree I get as a result:
Parsed successfully!
SyntaxNode+Program0 offset=0, "...\n nested\n line3\n":
  SyntaxNode offset=0, ""
  SyntaxNode offset=0, "...\n nested\n line3\n":
    Newline offset=0, "\n\n":
      SyntaxNode+NEWLINE0 offset=0, "\n":
        SyntaxNode offset=0, ""
        SyntaxNode offset=0, "\n"
      SyntaxNode+NEWLINE0 offset=1, "\n":
        SyntaxNode offset=1, ""
        SyntaxNode offset=1, "\n"
    LineNode block
    LineNode line1
    Newline offset=16, "\n":
      SyntaxNode+NEWLINE0 offset=16, "\n":
        SyntaxNode offset=16, ""
        SyntaxNode offset=16, "\n"
    LineNode line2
    LineNode nested
    LineNode line3