Bootstrap PEG parser generates readable parse tree

3 views
Skip to first unread message

Bill Cox

unread,
Jun 12, 2023, 10:11:29 AM6/12/23
to Rune language discussion
For example, given this:

func factorial(a) {
  if a == 1 {
    return a
  }
  return a*factorial(a - 1)
}

The PEG parser generates this parse tree:

(
  function(factorial "(" a ")" (
      ifPart((a ("==" 1)) a)
      returnStatement(a ("*" (factorial ("(" (a ("-" 1)) ")"))))))
  printlnStatement(factorial ("(" 57u256 ")")))

I think we can easily generate the HIR from the parse tree.

Bill

Aiden Hall

unread,
Jun 12, 2023, 10:20:13 AM6/12/23
to Bill Cox, Rune language discussion
does that imply we're skipping the AST IR?

--
You received this message because you are subscribed to the Google Groups "Rune language discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rune-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rune-discuss/CAH9QtQG5eJzemS1BMOrpe5523PDE0Kj7aMsnSkMk1F%3D_sxZeaw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Bill Cox

unread,
Jun 12, 2023, 11:22:12 AM6/12/23
to Aiden Hall, Rune language discussion
I am leaning that way, but I think it is a good discussion topic.  The difference between an AST and parse tree is the AST is structured better for the processing we do there, which at a minimum includes creating the HIR.  I think transformers should work on the HIR, not AST, so we have less need than say Rust to have an AST.

What changes would you suggest to the parse tree to make it a clean AST?

Aiden Hall

unread,
Jun 12, 2023, 2:54:51 PM6/12/23
to Bill Cox, Rune language discussion
To clarify, in this case AST means a node structure where everything is a string rather than a type right? If so, that seems easy to generate from the parse tree, just give every type in the tree a "tostring" method and call that recursively to get the whole tree and return the result. I wonder if we could implement whatever AST features we want selectively this way without supporting a full conversion to AST:
- Given a node of any type in the parse tree, create AST subtree
- Apply syntactic sugar to the AST subtree
- Convert back to Parse Tree - maybe it would even be possible to convert the AST to code and parse it starting from whatever step the original parser was at. If the types of the parse tree nodes don't work out and we can't parse as a result, we know there's an error

Maybe this would be easier to implement than doing a full conversion to an AST, doing transforms, and then creating an IR. Additionally I think this would provide a good opportunity to catch any errors in the transforms applied to the AST.

Andrew Wilson

unread,
Jun 12, 2023, 3:00:07 PM6/12/23
to Aiden Hall, Bill Cox, Rune language discussion
just for clarification,  I understand an AST to be a node structure that contains only the essential program structure, and ignoring irrelevant parsing issues, often related to white-space processing or precedence parsing. i.e., drop brackets, irrelevant keywords, or the layered levels of syntax rules required to implement arithmetic parsing precedence, for example.

Bill's suggestion of having 'weak' keywords that can be dropped seems to be the right direction.  I don't think converting them to strings is necessary.


For more options, visit https://groups.google.com/d/optout.


--
Andrew Wilson
Software Engineer, Android TV Eng Prod
Reply all
Reply to author
Forward
0 new messages