I am new to attribute grammars and Kiama alike, but the slides and
examples intrigued me, so that I am now trying to port a parser,
resolver and typechecker for a smallish language
(http://research.microsoft.com/en-us/projects/chalice/) to Kiama.
I use the examples Obr and PicoJava to get started, but I have the
impression that they use two different approaches to name resolution: Obr
appears to use a more classical symbol table, whereas PicoJava seems to
make use of the attribute concept to create backlinks to variable/method
declarations.
Is my perception correct?
Are there pros and cons of each approach, e.g. does one scale better
than the other?
And a second question: The current parser (also built using Scala's
parser library) maintains a set of local variables and uses this set in
several parsing rules to distinguish between local variables and
implicit this-dereferences. That is (roughly), for the rhs of an assignment
x = y
it either creates the node
VariableExpr("y")
or
FieldAccess(ThisExpr, "y").
What is the idiomatic way of doing this in Kiama? Rewriting the AST
after the resolving step?
Thanks in advance,
Malte
Welcome to the Kiama list.
On 09/12/2011, at 7:27 PM, Malte Schwerhoff wrote:
> I am new to attribute grammars and Kiama alike, but the slides and
> examples intrigued me, so that I am now trying to port a parser,
> resolver and typechecker for a smallish language
> (http://research.microsoft.com/en-us/projects/chalice/) to Kiama.
>
> I use the examples Obr and PicoJava to get started, but I have the
> impression that they use two different approaches to name resolution: Obr
> appears to use a more classical symbol table, whereas PicoJava seems to
> make use of the attribute concept to create backlinks to variable/method
> declarations.
>
> Is my perception correct?
>
> Are there pros and cons of each approach, e.g. does one scale better
> than the other?
Yes, you are right. I tend to break the options down as follows:
a) classical environment passing, where the entries in the environment are references to symbol table information, not to tree nodes
b) environment passing where the entries in the environment are references to syntax tree nodes
c) a "lookup" style where a parameterised attribute is used to search the tree for the declaration node
(a) is easy to do but tends to duplicate information and structures that are already present in the tree. (b) works well for applications like compilation where it is likely that you will need access to information about many names at many places in the tree (since you are passing a complete environment around). (c) is more suited to cases where you might need to look up some names, but not all of them, since you only go looking for the information that you need.
A small complication is how to handle information about named entities that are not present in the source (e.g., pre-defined things). To get a uniform approach in (b) and (c) you need to create "fake" tree fragments to represent them so that you can compute their properties from those fragments. Otherwise, you need to treat pre-computed information for pre-defined things differently to computed-on-demand information for user-defined things.
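To make (b) a bit more concrete, here is a minimal sketch in roughly the style of the Kiama examples. The node classes (Program, Block, VarDecl, Use) and the shape of the environment are invented purely for illustration; only Attributable, attr and the -> notation are Kiama API, and on recent versions initTree must be called on the root before the attributes are evaluated. The predefined map also shows the "fake declaration" idea for pre-defined names.

import org.kiama.attribution.Attributable
import org.kiama.attribution.Attribution._

object EnvSketch {

    abstract class Node extends Attributable
    case class VarDecl (name : String) extends Node
    case class Use (name : String) extends Node
    case class Block (decls : List[VarDecl], uses : List[Use]) extends Node
    case class Program (main : Block) extends Node

    type Environment = Map[String, VarDecl]

    // "Fake" declaration nodes for pre-defined names, so their properties
    // can be computed in exactly the same way as user-defined ones.
    val predefined : Environment =
        Map ("true" -> VarDecl ("true"), "false" -> VarDecl ("false"))

    // The environment visible at a node: a block adds its own declarations
    // to whatever is visible outside it; other nodes just inherit.
    val env : Attributable => Environment =
        attr {
            case _ : Program => predefined
            case b : Block   => (b.parent->env) ++ b.decls.map (d => d.name -> d)
            case n           => n.parent->env
        }

    // A use's declaration is simply looked up in the environment; the entry
    // is a reference to the declaration node itself, not to a separate
    // symbol table record.
    val decl : Use => Option[VarDecl] =
        attr {
            case u => (u->env).get (u.name)
        }

}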
FYI, we are developing a larger example in the Obr language family that uses more of the (b) approach. We expect to release that example in a month or so.
> And a second question: The current parser (also built using Scala's
> parser library) maintains a set of local variables and uses this set in
> several parsing rules to distinguish between local variables and
> implicit this-dereferences. That is (roughly), for the rhs of an assignment
> x = y
> it either creates the node
> VariableExpr("y")
> or
> FieldAccess(ThisExpr, "y").
>
> What is the idiomatic way of doing this in Kiama? Rewriting the AST
> after the resolving step?
My preference is to avoid using semantic information (e.g. your set of variables) at parse time, since I prefer a separation so that parsing just deals with structure. The reason is that even though it works in simple cases, it doesn't scale to more complicated languages since you end up having to do tasks like name and type analysis at parse time. It's possible to do it for some languages, but in general the information you need is just not there when you are part-way through a parse.
Kiama has a comprehensive rewriting library [1] that can easily be used to replace general constructs with more specific ones after the AST has been built, using semantic information from attributes where necessary. The more complex example I mentioned earlier contains some desugaring that is implemented using the rewriting facilities of Kiama, but you can also see it used in the lambda*, json and oneohonecompanies Kiama examples. Our rewriting library is based on the Stratego language [2], so any of the available literature on that system can also be used to see what you can do.
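To give a flavour of that style, here is a tiny, self-contained example. The expression node classes are invented for illustration; only rule, everywhere and rewrite come from the Kiama rewriting library. It replaces a generic BinaryOp node by a more specific Plus node, which is the same kind of general-to-specific replacement mentioned above.

import org.kiama.rewriting.Rewriter._

object SpecialiseSketch {

    abstract class Exp
    case class Num (i : Int) extends Exp
    case class BinaryOp (op : String, l : Exp, r : Exp) extends Exp
    case class Plus (l : Exp, r : Exp) extends Exp

    // Replace every BinaryOp ("+", ...) node by the more specific Plus node.
    val specialise =
        everywhere (rule {
            case BinaryOp ("+", l, r) => Plus (l, r)
        })

    def main (args : Array[String]) {
        val e = BinaryOp ("+", Num (1), BinaryOp ("+", Num (2), Num (3)))
        println (rewrite (specialise) (e))   // Plus(Num(1),Plus(Num(2),Num(3)))
    }

}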
regards,
Tony
[1] http://code.google.com/p/kiama/wiki/Rewriting
[2] http://strategoxt.org/
Hi Tony,

thanks for your detailed explanation and the links you provided!
Since I am implementing a verifier I guess the best way is to go with
option b). I'll have a closer look at the Obr example, and please let me
know when the extended example is available; that sounds exciting!
Btw, can you suggest a good book about attribute grammars (and their
application to compiler design and related fields)?
Regards,
Malte
On 14/12/2011, at 4:02 AM, Malte Schwerhoff wrote:
> Hi Tony,
>
> thanks for your detailed explanation and the links you provided!
No problem!
> Since I am implementing a verifier I guess the best way is to go with
> option b). I'll have a closer look at the Obr example, and please let me
> know when the extended example is available; that sounds exciting!
It sounds like (b) is the best option for you. The new example will be
part of an upcoming release of Kiama. I'm hoping to get it out before
Christmas but some work needs to be done so it may slip to January.
> Btw, can you suggest a good book about attribute grammars (and their
> application to compiler design and related fields)?
The most accessible intro that I know is in the book Modern Compiler
Design by Grune et al. For a more in-depth look there is this paper:
http://dl.acm.org/citation.cfm?id=197409
The paper is a bit out of date but is quite good as an overview.
<plug>
Some collaborators and I wrote a book a while back that deals with our
approach to these sorts of topics and makes heavy use of attribute
grammars.
It is fairly heavy on specifications for our Eli system, which isn't
being actively developed anymore. Some of these ideas are slowly
transferring across to Kiama (which has somewhat different aims). But
it might be of interest to you if you can locate a copy.
</plug>
regards,
Tony
Assume I am working with a small object-oriented language that has
classes, fields, methods and local variables. An example would be:
class C {
  var f: int

  def m() {
    var l: int
    f := l
  }
}
Following your suggestions - and the PicoJava example - I implemented
the following (simplified) steps in order to handle implicit 'this'
dereferences, as in 'f := l':
1. The parser creates nodes Field("f") and LocalVar("l") for the
declarations and UnresolvedRef("f") and UnresolvedRef("l") for the use
of the field and variable, respectively.
2. The node attribute decl is computed
- UnresolvedRef("f")->decl == Field("f")
- UnresolvedRef("l")->decl == LocalVar("l")
3. A rewriter replaces
   - UnresolvedRef(x) by FieldAccess(x)
     if UnresolvedRef(x)->decl == Field(x)
   - UnresolvedRef(x) by VariableExpr(x)
     if UnresolvedRef(x)->decl == LocalVar(x)
   (a more complete sketch of this rule is given after step 4 below)

   def rewriteUnresolvedRef: Program => Program =
     rewrite(everywhere (rule {
       ...
     }))

   val p' = rewriteUnresolvedRef(p)
4. An error attribute (as in the Obr example) is computed in order to
check for errors, e.g. decl-attributes that are UnknownDecl.
That is, p'->errors is computed, i.e. the errors are computed AFTER
replacing certain nodes.
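For reference, here is one simplified, self-contained way steps 2 and 3 could be written. The node classes and the way decl finds a declaration (a flat search from the program root) are stand-ins for the real resolver; only Attributable, attr, ->, rule, everywhere and rewrite are Kiama API, and the tree must have had its properties set (e.g. via initTree) before decl is evaluated, since decl climbs via parent.

import org.kiama.attribution.Attributable
import org.kiama.attribution.Attribution._
import org.kiama.rewriting.Rewriter._

object ResolveSketch {

    abstract class Node extends Attributable
    abstract class Decl extends Node
    case class Field (name : String) extends Decl
    case class LocalVar (name : String) extends Decl
    case class UnknownDecl (name : String) extends Decl

    abstract class Expression extends Node
    case class ThisExpr () extends Expression
    case class UnresolvedRef (name : String) extends Expression
    case class VariableExpr (name : String) extends Expression
    case class FieldAccess (receiver : Expression, name : String) extends Expression
    case class Assign (lhs : Expression, rhs : Expression) extends Node

    case class Program (decls : List[Decl], stmts : List[Assign]) extends Node

    // Climb to the root; a real resolver would respect nested scopes.
    val program : Attributable => Program =
        attr {
            case p : Program => p
            case n           => n.parent->program
        }

    // Step 2: decl links a use to its declaration node (or UnknownDecl).
    val decl : UnresolvedRef => Decl =
        attr {
            case r =>
                (r->program).decls.find {
                    case Field (n)    => n == r.name
                    case LocalVar (n) => n == r.name
                    case _            => false
                }.getOrElse (UnknownDecl (r.name))
        }

    // Step 3: replace each UnresolvedRef by a more specific node,
    // consulting its decl attribute.
    def rewriteUnresolvedRef (p : Program) : Program =
        rewrite (everywhere (rule {
            case r @ UnresolvedRef (x) =>
                (r->decl) match {
                    case Field (_)    => FieldAccess (ThisExpr (), x)
                    case LocalVar (_) => VariableExpr (x)
                    case _            => r   // leave unknown refs for the error phase
                }
        })) (p)

}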
I observed that computing the error attribute - which basically means
comparing the decl-attribute - does not always work as expected, unless
I invoke initTree(p') before evaluating p'->errors.
Does that seem to make sense?
Is it related to attribute caching or why would I have to call initTree
again?
Thanks,
Malte
On 21/12/2011, at 3:53 AM, Malte Schwerhoff wrote:
> 3. A rewriter replaces
> - UnresolvedRef(x) by FieldAccess(x)
> if UnresolvedRef(x)->decl == Field(x)
> - UnresolvedRef(x) by VariableExpr(x)
> if UnresolvedRef(x)->decl == LocalVar(x)
>
>   def rewriteUnresolvedRef: Program => Program =
>     rewrite(everywhere (rule {
>       ...
>     }))
>
>   val p' = rewriteUnresolvedRef(p)
>
> 4. An error attribute (as in the Obr example) is computed in order to
> check for errors, e.g. decl-attributes that are UnknownDecl.
> That is, p'->errors is computed, i.e. the errors are computed AFTER
> replacing certain nodes.
>
> I observed that computing the error attribute - which basically means
> comparing the decl-attribute - does not always work as expected, unless
> I invoke initTree(p') before evaluating p'->errors.
>
> Does that seem to make sense?
> Is it related to attribute caching or why would I have to call initTree
> again?
It's not caching, but access to the tree properties such as "parent" and the children iterator. These are not defined on a tree until you call initTree.
(Longer story: in the 1.1 and earlier releases, initTree is not used. We had a "clever" system of automatically setting the node properties when an Attributable node was created. This scheme worked most of the time, but in some cases you could have problems, particularly if a node is "re-parented" for some reason. We decided that the convenience of automatic setting was not worth the confusion for these cases. So, in the current trunk version (and hence in the upcoming 1.2 release), we removed the automatic node property setting and you have to call initTree directly to get the properties.)
So, anytime you create a new tree and want to subsequently use properties on that tree such as parent, you need to call initTree on the root of the tree before you do.
In your code it depends on what rewriteUnresolvedRef does, but from the description it is always going to create a new tree, so you will need to call initTree on p' before you do the error checking.
(Longer story: there are some rewrites that don't actually rewrite the tree. For example, a query that just collects some nodes. In those cases, Kiama is careful to not create a new tree, so the node properties would still be set and valid after the "rewrite".)
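Spelled out with the names from the earlier messages (and assuming initTree is imported from org.kiama.attribution.Attribution, as in the 1.2 line), the ordering is:

import org.kiama.attribution.Attribution.initTree

val p2 = rewriteUnresolvedRef (p)   // the rewrite builds a brand new tree (p' above)
initTree (p2)                       // (re)establish parent/children properties on it
val msgs = p2->errors               // only now will parent-based attributes behave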
cheers,
Tony
thanks a lot for your quick and detailed replies!
I assume that, if you have n arbitrary rewriters that each possibly
traverse the complete tree, you also have to call initTree n times in
order to (re)create a fully traversable tree.
Is that correct?
If so, does it reset the computed attributes? More generally, how does
it affect the performance?
Also, is there a way to "reuse" computed attributes when rewriting
trees? Staying with my previous example, if I have computed
UnresolvedRef(x)->decl and I replace the node UnresolvedRef(x) via a
rewriter with FieldAccess(x), can I somehow copy over
UnresolvedRef(x)->decl to FieldAccess(x)->decl, so that decl does not
need to be recomputed?
Cheers,
Malte
On 22/12/2011, at 1:26 AM, Malte Schwerhoff wrote:
> thanks a lot for your quick and detailed replies!
No worries!
> I assume that, if you have n arbitrary rewriters that each possibly
> traverse the complete tree, you also have to call initTree n times in
> order to (re)create a fully traversable tree.
> Is that correct?
Only if you plan to use the node properties such as parent and the child iterator. You can process rewritten trees in any way you like without calling initTree if you don't need those properties. Ignoring the node properties, there is nothing special about the trees; they are just Scala data structures. In fact, you don't need to use the Attributable trait from Kiama at all if you don't need the properties.
> If so, does it reset the computed attributes? More generally, how does
> it affect the performance?
initTree has no effect on computed attributes. Performance would be affected since initTree essentially does a full traversal of the tree to set up the properties. So, you would want to avoid doing that if you don't need to. Normally you would want to do lots of rewriting in one phase to build a tree that is attributed by attribution in the next phase. Just call initTree once before the second of these phases (again, only if you need the node properties).
BTW, if you *do* want to reset computed attributes, there are two methods at present. Attribution.resetMemo is a "lazily reset all attributes" operation. Attributes that you ask for subsequently will be recomputed. Alternatively, in the trunk version there is a fine-grained reset method on each attribute that just resets its cache.
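In code, the coarse-grained option looks like this; the fine-grained one is shown only as a comment, since the exact method name is an assumption based on the description above:

import org.kiama.attribution.Attribution

// Lazily invalidate every cached attribute value; attributes asked for
// later will be recomputed on demand.
Attribution.resetMemo ()

// Fine-grained variant (trunk only), e.g. for the decl attribute:
// decl.reset ()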
> Also, is there a way to "reuse" computed attributes when rewriting
> trees? Staying with my previous example, if I have computed
> UnresolvedRef(x)->decl and I replace the node UnresolvedRef(x) via a
> rewriter with FieldAccess(x), can I somehow copy over
> UnresolvedRef(x)->decl to FieldAccess(x)->decl, so that decl does not
> need to be recomputed?
I guess I wouldn't copy the values over. Rather, it's easier for the new FieldAccess node to be given a reference to the old UnresolvedRef node at construction time. Then when you need the decl attribute of a FieldAccess you can just ask its UnresolvedRef for it. This method has some negatives since it keeps the old nodes around, but is much simpler than any scheme to copy attributes around.
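As a sketch, reusing the hypothetical node classes from the earlier messages: the new FieldAccess carries the UnresolvedRef it replaced, and its decl simply delegates to the old node, so the memoised value is reused rather than recomputed.

// A variant of the earlier hypothetical FieldAccess that keeps a reference
// to the node it replaced.
case class FieldAccess (receiver : Expression, name : String,
                        original : UnresolvedRef) extends Expression

// decl for the new node just asks the old node; attr memoisation on the
// old node means nothing is recomputed.
val fieldAccessDecl : FieldAccess => Decl =
    attr {
        case fa => fa.original->decl
    }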
cheers,
Tony