progress with variable scope handling

10 views
Skip to first unread message

Fabien Fleutot

unread,
Oct 5, 2011, 11:45:36 AM10/5/11
to met...@googlegroups.com
I've added some variable related features to the treequery prototype. Those are:
  • predicate treequery.is_binder(...)
  • predicate treequery.is_bound_by(binder)
  • function treequery.binder(occurrence)
The query module "metalua/treequery.mlua": https://gist.github.com/1264745
Its adapted walker "metalua/treequery/walk.mlua": https://gist.github.com/1264752

Telling the difference between binders and occurrences

is_binder(...) can be used to determine whether a node is binder, as opposed to an occurrence (or a non-`Id{ } node). It works by looking at the node's parent. 

Example

M> Q = require 'metalua.treequery'
M> ast = +{block: 
    no1(a); 
    local a; 
    yes1(a); 
    do 
        yes2(a); 
        local a = yes3(a); 
        no2(a); 
    end; 
    yes4(a); 
    local a; 
    no3(a) } 

M> =ast -- AST view of the code block above
 { `Call{ `Id "no1", `Id "a" }, 
   `Local{ { `Id "a" }, { } },
   `Call{ `Id "yes1", `Id "a" },
   `Do{ `Call{ `Id "yes2", `Id "a" },
        `Local{ { `Id "a" }, `Call{ `Id "yes3", `Id "a" } } },
        `Call{ `Id "no2", `Id "a" } },
   `Call{ `Id "yes4", `Id "a" }, `Local{ { `Id "a" }, { } },
   `Call{ `Id "no3", `Id "a" } }

M> Q(ast) :filter(Q.is_binder) :foreach(|_,parent| table.print(parent))
`Local{ { `Id "a" }, { } }
`Local{ { `Id "a" }, { `Call{ `Id "yes3", `Id "a" } } }
`Local{ { `Id "a" }, { } }

As expected, the nodes creating local vars are found, but those merely using them, here in function calls, are ignored. I chose to print parents rather than the nodes themselves: just getting a list of `Id "a" wouldn't have been very exciting.

You might notice that occurrences of the binder in the first "local a" statement are in functions called yesX(), whereas globals and occurrences of other locals are in functions called noX(). This will help checking that scoping tests below work correctly.

Filtering the occurrences of a given binder

is_bound_by(binder) returns a predicate, which in turns returns true if its argument is an occurrence of the binder `Id{ } passed as argument.

Example

M> b = Q(ast) :filter(Q.is_binder) :list() [1]
M> =Q(ast) :filter(Q.is_bound_by(b)) :foreach(|_,parent| table.print(parent))
`Call{ `Id "yes1", `Id "a" }
`Call{ `Id "yes2", `Id "a" }
`Call{ `Id "yes3", `Id "a" }
`Call{ `Id "yes4", `Id "a" }
M>

From now on, the first binder is called b. It will be reused in the next examples.

Retrieving the binder of a given occurrence

binder(occ) returns the binder associated with a given  occurrence, or nil in other case (if the argument isn't a local variable)).
This function is tricky, because it is non-local and makes sense out of a query. It relies on a weak table, which keeps track of discovered occurrence/binder relationships until one of the nodes becomes collectible. It means that it won't work on a node which has never been through a query. 

An alternative would be binder(occurrence, root_node), which would walk through root_node in o(n). It's conceptually more sound, but asymptotically longer. Also, it would allow to check for variable openness in a given, limited context (useful to move code around in a sound way, e.g. to implement "extract method" refactorings in a sound way). Finally, there's no risk to return garbage if ASTs are modified between queries. To sum up, if I follow my own advice given to David:

I wrote and self-quoted: 
Finding is up to treequery, caching is up to the user.

This function is flimsy and untrustworthy because it relies on caching, and can't guarantee cache consistency. The more I think of it, the more I'm convinced I should implement the proper, slower version. It seems easy for users to cache stuff or treat them in parallel whenever needed.

Example

M> require 'metalua.ast_to_string'
M> Q(ast) :filter('Id') :filter(|_,id| id[1]=='a') :foreach(function(node,parent) 
    printf("parent: %s; binder: %s; is first? %s", 
        ast_to_string(parent),
        table.tostring(Q.binder(node)),
        Q.binder(node)==b and "yes" or "no")
    end)
parent: no1 (a); binder: nil; is first? no
parent: local a; binder: nil; is first? no
parent: yes1 (a); binder: `Id "a"; is first? yes
parent: yes2 (a); binder: `Id "a"; is first? yes
parent: yes3 (a); binder: `Id "a"; is first? yes
parent: local a = yes3 (a); binder: nil; is first? no
parent: no2 (a); binder: `Id "a"; is first? no
parent: yes4 (a); binder: `Id "a"; is first? yes
parent: local a; binder: nil; is first? no
parent: no3 (a); binder: `Id "a"; is first? no

The 6th result might look wrong at first sight: in "local a=yes3(a)", the a parameter is an instance of the first "a" binder, so we would expect "is first? yes". But that's what the 5th result is about. the 6th result is about the left-hand-side a, the binder. Since it's a binder, it's obviously not bound by any other binder.

This exposes a peculiarity of the walker's traversal order: when walking through "local a=b", b will be visited before a. That's because this order is compliant both with sane scoping rules, and with the execution order: first b is computed, then it is stored into a, which is added to the scope.

Next steps
  • rewriting treequery.binder() in a sound way.
  • checking that the sound way is an acceptable substitute to the current one.
  • implement a conditional :map() functor (more about this in a later post).
  • testing those scope-aware services more thoroughly, this is something easy to get subtly wrong.

David Manura

unread,
Oct 5, 2011, 10:49:35 PM10/5/11
to met...@googlegroups.com
On Wed, Oct 5, 2011 at 11:45 AM, Fabien Fleutot <fle...@gmail.com> wrote:
> I've added some variable related features to the treequery prototype. Those are:
> predicate treequery.is_binder(...)
> predicate treequery.is_bound_by(binder)
> function treequery.binder(occurrence)

Noted.

> binder(occ) returns the binder associated with a given  occurrence, or nil
> in other case (if the argument isn't a local variable)).
> This function is tricky, because it is non-local and makes sense out of a
> query. It relies on a weak table, which keeps track of discovered
> occurrence/binder relationships until one of the nodes becomes collectible.

Underlying that, your walk.mlua still maintains its own active stack
of scope tables. I generally do something similar except that instead
of using table.shallow_copy between scope tables, the __index
metamethod of one scope table refers to its parent scope table. I'm
not sure how much performance difference that has in practice though.

https://github.com/davidm/lua-inspect/blob/master/luainspectlib/luainspect/globals.lua
"traverse"

> It means that it won't work on a node which has never been through a query.
> An alternative would be binder(occurrence, root_node), which would walk
> through root_node in o(n). It's conceptually more sound, but asymptotically
> longer.

binder(...) where ... is the list of ancestors of occurrence up to
root could be even faster. You can traverse from occurrence to root
and at each intermediate node examine the direct children (i.e.
examine the preceding siblings of the nodes in the path to root).
That is around O(log n). Moreover, most locals might be expected to
be closer to occurrence than root (under the principle that it is good
programming practice to minimize variable scopes), so in some cases it
might be effectively closer to O(1).

> Finally, there's no risk to return garbage if ASTs are modified between queries.

True, it should be stated to what extent ASTs can be modified during traversal.

Reply all
Reply to author
Forward
0 new messages