> On 1 Nov 2018, at 08:22, Dmitry Ponyatov <
dpon...@gmail.com> wrote:
>
> Is it critical for rewrite engine to use only B-tree (not more than 2 branches in a node) representation?
For source code, yes. It’s a matter of simplicity. But only for source code.
>
> Things like
>
> (infix <CR>
> (name "Hello")
> (infix <CR>
> (name "World")
> ................)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
>
> makes my mind boiling,
It’s only a matter of rendering. It’s nothing more than a linked-list, and like for any linked list, this means accessing the N-th element is N hops. It makes sense to structure it that way because during rewrites, you *never* access the N-th element, you *always* access the surrounding ones.
> and does not support attribute grammar in nodes, which may be extra useful for programming language processing.
There is an “info” field in Tree just for that purpose. You can stick whatever you need there. It won’t show during rendering (and neither should it). It’s designed to be accessed by type, e.g. you could have `tree->GetInfo<MyStuff>()` to get whatever you put in there. The reasons for that design is that the compiler is extensible, so any compiler plugin can efficiently attach their own data to any tree node in whatever format they want.
Only the *source* is binary, though, and there’s a reason for that, see my comments below.
>
> In my knowledge base experiments, I use such a like base class:
>
> ## root class
> class Node:
>
> ## construct leaf with value
> ## @param[in] V value
> def __init__(self,V):
> ## primitive value
> self.value = V
> ## attributes (attribute grammar elements, or object slots) # std::map<std::string,Node*>
> self.attr = {}
> ## nested elements = vector & stack # std::vector<Node*>
> self.nest = []
>
> able to hold
> • any count of nested elements in order (array, vector, and stack representation at the same time) and
> • string-keyed attributes (generic associative array, can be used for attribute grammar).
> Does the use of this structure in XLTree class (tree.h) will totally break the rewriting algorithm?
It would certainly require a very different algorithm. The existing algorithm requires the ability to match one tree type to another.
I have tried this kind of approach before. I actually started that way, see
http://mozart-dev.sourceforge.net. In that initial implementation, an “if” would be represented by an “If” node, a “function” by a “function” node, and so on. I’m afraid the source history was lost when SourceForge decommissioned their CVS service.
This approach meant any tiny source transformation plugin had to know about all possible source node type, and could not add their own in a way that others would easily understand. So it proved to be not extensible, and not general enough. Figuring out I could do the same thing with a much simpler tree was a key insight, and happened only after years of tinkering.
I retried limited N-ary recently, see
https://github.com/c3d/xl-c-rewrite. In that version, tree has a “handler”, instead of 8 node types, so you can have an infinite number of tree classes. Some of them like “array” (in array.[ch]) are N-ary. I hoped it would be manageable, and help for things like a sequence of line. It turns out out it is not manageable, and I quickly ran into the same issues I had encountered earlier with Mozart, i.e. any tree rewrite must have deep-knowledge of all other tree rewrites, which is exactly what I managed to avoid with what you call B-Tree.
So for now, I’d stick to my current design, i.e.: source is a simple tree, with the ability to add arbitrary meta-data that is not part of the source.
> I mean not code fixes for this STL-based enhancements, but the core method of rewrite doing. Maybe B-tree only is a critical requirement.
As far as I can tell, it is. Not really “B-tree”, more like “a very small and closed set of tree node types”. This is what made Lisp meta-programming possible. Lisp has essentially two node types. XL has 8, so it’s “worse”, but the reason for that is to match the node types to how humans read and write code.
>
> For example, int main() becomes a clear human-readable structure:
> <function:main>
> args = <block:args>
> <arg:argc>
> type = <type:int>
> <arg:argv>
> type = <type:array>
> <pointer:*>
> <type:char>
> return = <type:int>
> <function:init>
> <name:hello>
> <name:World>
> <function:fini>
> <name:return>
> <integer:0>
I positively *love* your definition of “human-readable” ;-)
In that case, how does an arbitrary plugin know what to do with your “return” field, for example?
Try to imagine this. I write the “parse_tree” function which simply copies its input, but evaluates {{X}} by inserting X from the evaluation context into the node. Then you could write code that unrolss N times the same code as:
unroll 1, Body is Body
unroll N, Body is
parse_tree
{{Body}}
{{unroll N-1, Body}}
Now,
unroll 4, write “Hello”
will generate a code that looks like
write “Hello"
write “Hello"
write “Hello"
write “Hello”
I can do that entirely based on manipulations of the tree structure. The “parse_tree” function is quite simple. It's there:
https://github.com/c3d/xl/blob/master/xlr/runtime.cpp#L335 (parse_tree_inner is just above, line 268).
I invite you to analyse how you could write a function like this with your structure. How would it know what to do with “return” attributes or “function” tags?