That's a good starting point and looks generally ok. Real world usage
will say how applicable this is. Eventually some benchmarks can be
worthwhile too (e.g. compared to hand-written searches).
> :filter (M.parent 'Call') -- nodes whose parents are Calls [...]
> Notice that it would be nice to also select invocation arguments, but it
> can't be done properly without a way to denote a form of parallel visit.
Can't you just replace 'Call' with has_tag{'Call', 'Invoke'} ?
BTW, in has_tag, `node_tag and tags[node_tag]` can be replaced with
`tags[node_tag]`, which should still work even when node_tag is nil.
Eventually some benchmarks can be worthwhile too (e.g. compared to hand-written searches).
> :filter (M.parent 'Call') -- nodes whose parents are Calls [...]
> Notice that it would be nice to also select invocation arguments, but itCan't you just replace 'Call' with has_tag{'Call', 'Invoke'} ?
> can't be done properly without a way to denote a form of parallel visit.
BTW, in has_tag, `node_tag and tags[node_tag]` can be replaced with `tags[node_tag]`
Speed probably usually doesn't matter. One case where it can matter
is real-time processing like in an IDE. Last time I did profiling on
LuaInspect (it was awhile ago), sluggishness on processing a large
file was due not only to parsing but also tree walking + decorating
(though I don't recall how much walking cost in isolation though).
The multiple pairs/ipairs per node traversal I see in your code make
me wonder about that. However, these applications indeed are the
minority, and they will find their own way as you say with appropriate
techniques. (Incidentally, people report sluggishness issues with
Visual Studio and Eclipse too, the latter of which you would know much
better than I.)
One possible alternate implementation of this library is to use code
generation techniques: compile a query into a Lua source code string
passed into loadstring (like my ListComprehensions and ExPattern Lua
libraries do). Regular expression matching libraries tend to be
similar in general: you "compile" the regular expression into some
efficient format and then reuse it any number of times. Such an
optimization is not a priority right now but it may be something to
keep in mind.
> some methods are "positional selectors": nodes are selected according to their position,
> relative to the node which satisfied the predicate. [...]
> :filter(p) is also a selector, it only select nodes which satisfy predicate p.
I think it helps clarify to think of it that way: filter is a special
type of positional selector. In the code, it looked like these were
two distinct concepts. You can probably even express (though maybe
not efficiently) all positional selectors in terms of filter because
each filter is passed the entire path of nodes up to the root. Your
parent and nth filter functions do something like that and thereby
blur the distinction between filter and the positional selectors.
Note: your selectors generally maintain some type of state across node
visits. Something else that maintains state across node visits is
lexical scope. Therefore, these both might share a pattern. Maybe
the maintenance of the "path of nodes up to the root" can likewise
also be a type of walk state that can be moved out of the core
behavior and be made optional for only those queries that need it.
> Another point which hasn't been addressed, but is extremely important, is scoped variable handling.
> [...] The way I see it, variables have a binder which "create" them. For instance,
> "`Function{ {`Id 'a', `Id 'b'}, body }" is the binder of the variables named "a" or "b" which occur
> in "body". Each occurrence of a local variable has one binder; each binder has zero, one,
> or many occurrences of its variable. Global variables are, by definition, the variable occurences
> without a binder.
> Maybe the binder shouldn't be defined as a node, but as a node with an indice. This way,
> each binder represents a single variable. In the example above, the binder of "b" in "body"
> is "{node=`Function{ {`Id 'a', `Id 'b'}, body }; indice=2}". The exact representation of a binder
> will be determined during implementation.
In comparison in LuaInspect, if ast.tag == 'Id', then
ast.localdefinition refers directly to the 'Id' node in the binder
(not to the binder itself). ast.localdefinition is nil for global
variables. It also happens to be the case that ast.localdefinition ==
ast for the 'Id' node in the binder itself. When storing information
about some lexical variable (e.g. a flag indicating whether the
variable is referred to (isused) or mutated (isset) anywhere else),
the ast.localdefinition node is used as the key or the storage
location for this information. This is more efficient as a table key
than trying to use the binder + index position. On the other hand, I
typically don't care about which binder node contains an 'Id' node,
but if I do I can walk up the ancestors (e.g. in my case, ast.parent
links). Highlighting tokens corresponding to `Id nodes of locals
declared as function parameters (`Function binders) differently from
tokens of locals declared via a local statement (`Local binders) is
one example of this.
> var.occurence(b)(n)
Here's another class of use cases to consider, one that I've needed in
more than one project: find all local variables that are constant (not
assigned to outside of the binder), are unused (e.g. for dead-code
elimination or typo detection), or mask another lexical of the same
name. In these cases you're dealing with more than one variable per
walk.
The multiple pairs/ipairs per node traversal I see in your code make
me wonder about that.
One possible alternate implementation of this library is to use code
generation techniques: compile a query into a Lua source code string
passed into loadstring (like my ListComprehensions and ExPattern Lua
libraries do). Regular expression matching libraries tend to be
similar in general: you "compile" the regular expression into some
efficient format and then reuse it any number of times. Such an
optimization is not a priority right now but it may be something to
keep in mind.
> some methods are "positional selectors": nodes are selected according to their position,
> relative to the node which satisfied the predicate. [...]
> :filter(p) is also a selector, it only select nodes which satisfy predicate p.
I think it helps clarify to think of it that way: filter is a special
type of positional selector. In the code, it looked like these were
two distinct concepts.
You can probably even express (though maybe
not efficiently) all positional selectors in terms of filter because
each filter is passed the entire path of nodes up to the root.
Note: your selectors generally maintain some type of state across node
visits. Something else that maintains state across node visits is
lexical scope. Therefore, these both might share a pattern.
Maybe
the maintenance of the "path of nodes up to the root" can likewise
also be a type of walk state that can be moved out of the core
behavior and be made optional for only those queries that need it.
> Another point which hasn't been addressed, but is extremely important, is scoped variable handling.
> [...] The way I see it, variables have a binder which "create" them. For instance,
> "`Function{ {`Id 'a', `Id 'b'}, body }" is the binder of the variables named "a" or "b" which occur
> in "body". Each occurrence of a local variable has one binder; each binder has zero, one,
> or many occurrences of its variable. Global variables are, by definition, the variable occurences
> without a binder.
> Maybe the binder shouldn't be defined as a node, but as a node with an indice. This way,
> each binder represents a single variable. In the example above, the binder of "b" in "body"
> is "{node=`Function{ {`Id 'a', `Id 'b'}, body }; indice=2}". The exact representation of a binder
> will be determined during implementation.
In comparison in LuaInspect, if ast.tag == 'Id', then
ast.localdefinition refers directly to the 'Id' node in the binder
(not to the binder itself).
ast.localdefinition is nil for global
variables. It also happens to be the case that ast.localdefinition ==
ast for the 'Id' node in the binder itself. When storing information
about some lexical variable (e.g. a flag indicating whether the
variable is referred to (isused) or mutated (isset) anywhere else),
the ast.localdefinition node is used as the key or the storage
location for this information.
> var.occurence(b)(n)
Here's another class of use cases to consider, one that I've needed in
more than one project: find all local variables that are constant (not
assigned to outside of the binder), are unused (e.g. for dead-code
elimination or typo detection), or mask another lexical of the same
name. In these cases you're dealing with more than one variable per
walk.
From: Fabien Fleutot <fle...@gmail.com>
To: met...@googlegroups.com
Sent: Wednesday, October 5, 2011 11:48 AM
Subject: Re: [metalua] Re: Some samples using the treequery API prototype
I don't agree: The basic concept is, like a compiler, to translate
some mathematical query expression into Lua source as an intermediate
representation (IR) that is then compiled into bytecode via loadstring
(and optionally further to machine code under JIT). It is possible to
do this all robustly. However, under JIT, I wonder if the trace
recording obviates this. In the case of Metalua, you go straight to
bytecode, bypassing Lua source IR, but despite some additional
flexibility some consider that a disadvantage (e.g. the current state
of Metalua incompatibility with 5.2 and JIT). The addition of goto in
5.2beta partly had this topic in mind. True, this is however somewhat
beside the point at the moment for reasons you state below :)
On Wed, Oct 5, 2011 at 5:48 AM, Fabien Fleutot <fle...@gmail.com> wrote:
> loadstring() hacks cause many problems, most notably of scope, and are
> doomed to remain brittle.
I don't agree: [...] It is possible to do this all robustly.
I follow that, but I had something different in mind, as prototyped here:
https://gist.github.com/4839a28430f3b1ad7a8d .