Is it me, or does Treetop create a node for each character?

Yves Dufournaud

Jun 20, 2025, 4:59:36 AM

to Treetop Development

Hello,

Maybe it's obvious, or maybe I am utterly wrong, but...

I am playing with Treetop and the very simple grammar below.

What strikes me is that I would have assumed there would be some form of reduction at some point, but with my grammar it looks like every character is being turned into a syntax node.

Looking at the Newline node, I was assuming that it would be a terminal matching "\n\n".
It's clearly not: its children are the individual matches of each regexp-like sub-expression:

Newline offset=0, "\n\n":
  SyntaxNode+NEWLINE0 offset=0, "\n":
    SyntaxNode offset=0, ""
    SyntaxNode offset=0, "\n"
  SyntaxNode+NEWLINE0 offset=1, "\n":
    SyntaxNode offset=1, ""
    SyntaxNode offset=1, "\n"

Is this the expected behaviour?
Wouldn't it be a problem if the parsed source is of significant size?
My initial assumption was that every regexp-like expression would end up as a terminal symbol holding a string (not a set of nodes).

My very simple grammar:
#python3_parser.treetop

grammar Python3

rule program
  # Initialise the indent stack with a sentinel:
  &{|s| @indents = [-1] }
  ( NEWLINE / stmt)*
end

rule stmt
  indentation text:((!"\n" .)*) "\n" <LineNode>
  {
    def inspect(indent="")
      indent + self.class.to_s + " "  + text.text_value
    end
  } 
end

rule indentation
  ' '*
end


rule NEWLINE
  ([\t ]* "\n")+ <Newline>
  {
    def inspect(indent="")
      indent + self.class.to_s + " #{elements.size}"
    end
  }
end

end

Parser code

# In file parser.rb
require 'treetop'

class LineNode < Treetop::Runtime::SyntaxNode
  def value
    text_value.strip
  end
end

class Newline < Treetop::Runtime::SyntaxNode
end


class Parser
  
  # Load the Treetop grammar from the 'python3_parser.treetop' file -> produce <grammar name>Parser
  # and then create a new instance of that parser as a class variable so we don't have to re-create
  # it every time we need to parse a string
  Treetop.load('python3_parser.treetop')
  @@parser = Python3Parser.new

input = <<~TEXT


block
  line1

  line2
    nested
  line3
TEXT

tree = @@parser.parse(input)

if tree
  puts "Parsed successfully!"
  p tree
else
  puts "Parsing failed at: #{@@parser.failure_line} #{@@parser.index}: #{@@parser.failure_reason}"
end
  
end

The tree I am getting as a result:
Parsed successfully!
SyntaxNode+Program0 offset=0, "...\n nested\n line3\n":
  SyntaxNode offset=0, ""
  SyntaxNode offset=0, "...\n nested\n line3\n":
    Newline offset=0, "\n\n":
      SyntaxNode+NEWLINE0 offset=0, "\n":
        SyntaxNode offset=0, ""
        SyntaxNode offset=0, "\n"
      SyntaxNode+NEWLINE0 offset=1, "\n":
        SyntaxNode offset=1, ""
        SyntaxNode offset=1, "\n"
    LineNode block
    LineNode line1
    Newline offset=16, "\n":
      SyntaxNode+NEWLINE0 offset=16, "\n":
        SyntaxNode offset=16, ""
        SyntaxNode offset=16, "\n"
    LineNode line2
    LineNode nested
    LineNode line3
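
To put a number on the size concern raised above, a small recursive counter can walk the tree. This is only a sketch: count_nodes and FakeNode are hypothetical names, and the one thing it assumes about Treetop is that a node's elements method returns an array of children, or nil for a terminal. The stand-in struct mirrors the Newline node from the dump so the example runs without Treetop installed.

```ruby
# Hypothetical helper: counts every node in a parse tree.
# Assumes each node responds to #elements and returns an Array of
# children, or nil when the node is a terminal.
def count_nodes(node)
  children = node.elements || []
  1 + children.sum { |child| count_nodes(child) }
end

# Stand-in for a syntax node, so the sketch runs without Treetop.
FakeNode = Struct.new(:elements)

# Same shape as the Newline node in the dump: two NEWLINE0 sequences,
# each holding an empty [\t ]* repetition and a "\n" terminal.
newline = FakeNode.new([
  FakeNode.new([FakeNode.new([]), FakeNode.new(nil)]),
  FakeNode.new([FakeNode.new([]), FakeNode.new(nil)])
])

puts count_nodes(newline) # prints 7, one per line of the Newline dump
```

With a real parse, calling count_nodes(tree) on the value returned by parse and comparing it with input.length should give a rough idea of how many nodes are built per character.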

Clifford Heath

Jun 20, 2025, 5:01:30 AM

to treetop-dev@googlegroups.com Development

Your grammar asks for each char as a separate atom, so that is what you got.

Use *[^\n] instead.

Clifford Heath


Clifford Heath

Jun 20, 2025, 5:03:43 AM

to treetop-dev@googlegroups.com Development

Ah sorry, I have been working with my prefix regular expressions too long.

Use [^\n]* instead.
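
Applied to the stmt rule from the first message, the suggestion might look like the sketch below. Only the text: sub-expression changes, from the per-character sequence (!"\n" .)* to a repeated character class; the rest is copied from the original grammar. It has not been run against the original input, so treat it as an illustration rather than a verified fix.

```treetop
rule stmt
  indentation text:([^\n]*) "\n" <LineNode>
  {
    def inspect(indent="")
      indent + self.class.to_s + " "  + text.text_value
    end
  }
end
```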

Clifford Heath


Yves Dufournaud

Jun 20, 2025, 6:42:54 AM

to Treetop Development

Hi Clifford,

Thanks for your feedback.
I was about to ask where atoms are defined, but re-reading the docs I found:
https://github.com/cjheath/treetop?tab=readme-ov-file#terminal-symbols

_Terminals are called atomic expressions_

In fact, what I was looking for (I probably read too fast at some point) is this part:
_Treetop now also supports regular expressions as terminals._

Because I intend to have more complex patterns than [.]*.

But I failed to find it, because I didn't see an example using it. Yet in the same paragraph I found that

_regular expressions can be used as atoms_, and I can transform my grammar to use one with the ""r syntax, so:

([\t ]* "\n")+

becomes

"([\t ]*\n)+"r

Which is what I was trying to achieve in my example.

In fact, I thought I was already writing a regexp because I was using only terminals (I was assuming some kind of optimisation there), and I was wrong.
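
Since the text between the quotes of a ""r terminal is a regular expression, the pattern can be tried in plain Ruby before it goes into the grammar. The sketch below does that for the NEWLINE pattern. NEWLINE_RE and newline_run are hypothetical names, and the \A anchor is an assumption standing in for "match at the parser's current position"; it is not taken from the Treetop docs.

```ruby
# The body of the regex terminal "([\t ]*\n)+"r as a plain Ruby Regexp.
# \A anchors the match at the start of the string (an assumption that
# mimics matching at the parser's current index).
NEWLINE_RE = /\A([\t ]*\n)+/

# Returns the run of blank lines at the start of text, or nil if none.
def newline_run(text)
  match = NEWLINE_RE.match(text)
  match && match[0]
end

p newline_run("\n\nblock\n")      # "\n\n"
p newline_run("  \t\n\nline2\n")  # "  \t\n\n"
p newline_run("block\n")          # nil
```

The whole run of blank lines comes back as one string, which is the single-terminal behaviour the original NEWLINE rule was expected to have.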

Thanks again,

Regards,

Yves
