ClojureScript analyzer AST changes

171 views
Skip to first unread message

Brandon Bloom

unread,
Dec 12, 2012, 2:10:54 AM12/12/12
to cloju...@googlegroups.com
I've been on-and-off-again working on a Continuation Passing Style AST Transform for ClojureScript. Progress has been slow because I needed to fix a bunch of bugs in the ClojureScript analyzer along the way.

In the process, I've identified some issues with the current AST and would like to make some breaking changes. This message is an attempt to identify stakeholders (who is going to be broken?) and get feedback on my proposed changes.

Let's start with two changes I think are no-brainers and we can work from there:


Change #1) There should be a separate :loop operation

Problem: loops are :let operations with :is-loop true. This came about because the implementation of emit for let and loops are similar, but for other types of AST passes loops are much more similar to functions than to lets. It's completely arbitrary that :loop is the only special form that doesn't have it's own :op.

Solution: Split up the operations.


Change #2) All blocks should expand to a :do node

Problem 2A) A bunch of redundancy happens whenever a special form allows multiple statements in a block. Analyze has the 'analyze-block helper and emit has the 'emit-block helper. Having written two other AST passes (Abstract Normal Form and Continuation Passing Style), I had to create anf-block and cps-block helpers as well. This redundancy leeks out into every single pass by necessity.

Problem 2B) When a block is empty, there is no AST node on which to hang the :env property. When performing an AST transformation, you sometimes need the env at the end of a block, even if the block is empty.

Solutions: Replace ~@exprs with (do ~@exprs) wherever a special form employs an implicit block.
This is another breaking change with a straightforward patch: https://github.com/brandonbloom/clojurescript/compare/distinct-let...implicit-do


Who currently depends on the shape of the AST?

What is the process for making breaking changes to the AST?

Do these changes make sense?


I've started a design page to capture some additional notes regarding the AST: http://dev.clojure.org/display/design/Formalize+AST


Cheers,
Brandon

Ambrose Bonnaire-Sergeant

unread,
Dec 12, 2012, 3:32:34 AM12/12/12
to cloju...@googlegroups.com
Hi Brandon,

First of all, thank you for working on this area, it is very important for Clojure's future.

On Wed, Dec 12, 2012 at 3:10 PM, Brandon Bloom <snpr...@gmail.com> wrote:

Change #1) There should be a separate :loop operation

Problem: loops are :let operations with :is-loop true. This came about because the implementation of emit for let and loops are similar, but for other types of AST passes loops are much more similar to functions than to lets. It's completely arbitrary that :loop is the only special form that doesn't have it's own :op.

Solution: Split up the operations.

I initially agreed with this, but I'm having reservations after thinking about it. It's not really clear whether :is-loop eliminates some redundancy in the AST nodes, or merely passes it to the consumer of analysis.

Could you elaborate on where :loop's are like :fn's? They look more like :let's to me, with a recur point.

We seem to be supporting combinations of these features:

1. Scope local bindings
2. Provide inits to local bindings
3. Make a recur point
4. Create a fn

:let has 1,2
:fn has 1,3,4
:loop has 1,2,3
 


Change #2) All blocks should expand to a :do node

Problem 2A) A bunch of redundancy happens whenever a special form allows multiple statements in a block. Analyze has the 'analyze-block helper and emit has the 'emit-block helper. Having written two other AST passes (Abstract Normal Form and Continuation Passing Style), I had to create anf-block and cps-block helpers as well. This redundancy leeks out into every single pass by necessity.

Problem 2B) When a block is empty, there is no AST node on which to hang the :env property. When performing an AST transformation, you sometimes need the env at the end of a block, even if the block is empty.

Solutions: Replace ~@exprs with (do ~@exprs) wherever a special form employs an implicit block.
This is another breaking change with a straightforward patch: https://github.com/brandonbloom/clojurescript/compare/distinct-let...implicit-do


Can someone speculate as to why this doesn't already happen? It seems like a good idea.
 

Who currently depends on the shape of the AST?

I have some preliminary support for CLJS from Typed Clojure, but the minor differences in some of the AST nodes from Clojure's is annoying. I'm looking forward making them compatible to some degree.
 

What is the process for making breaking changes to the AST?

Do these changes make sense?


I've started a design page to capture some additional notes regarding the AST: http://dev.clojure.org/display/design/Formalize+AST


This is working towards a standard AST that can be applied to all implementations of Clojure.

Here are some of the main differences that I have noticed between Clojure and ClojureScript's AST's.

1. Interop - CLJS has mainly :js* and :dot, Clojure has many interop nodes, some essential, others for optimisations. This is undoubtedly the biggest difference between Clojure implementations.
2. defrecord - CLJS has a :defrecord node, Clojure doesn't
3. local binding "init" nodes are slightly simpler in CLJS
4. Clojure has vars, thus :the-var AST node.
5. Some small differences with try/catch which I don't fully understand.

Again, thank you Brandon for your great work.
Ambrose

Brandon Bloom

unread,
Dec 12, 2012, 3:59:24 AM12/12/12
to cloju...@googlegroups.com, abonnair...@gmail.com
Could you elaborate on where :loop's are like :fn's? They look more like :let's to me, with a recur point.

We seem to be supporting combinations of these features:

1. Scope local bindings
2. Provide inits to local bindings
3. Make a recur point
4. Create a fn

:let has 1,2
:fn has 1,3,4
:loop has 1,2,3

That's a good way to look at it. One thing worth noting is that all of these really are the same thing: lambda abstractions.

Consider the loop:

(loop [x 3] (when (pos? x) (println x) (recur (dec x))))

It's actually equivalent to:

((fn [x] (when (pos? x) (println x) (recur (dec x)))) 3)

Similarly, you could say that:

(let [x 1] (inc x))

Is the same as:

((fn [x] (inc x)) 1)

You know, lambda calculus and all that...


The interesting bit for me is that recur implies a tail call and so when transforming into continuation passing style, I can apply the above trivial transformation and then delegate to the :fn implementation for the recur trampoline. As a result, :loop looks like a :fn with an initial :invoke, instead of looking like a :let.

We don't have any other bi-modal AST nodes, so it seems like a good idea to enumerate all the possible states as true node ops.

 
1. Interop - CLJS has mainly :js* and :dot, Clojure has many interop nodes, some essential, others for optimisations. This is undoubtedly the biggest difference between Clojure implementations.
2. defrecord - CLJS has a :defrecord node, Clojure doesn't

I don't think there's anything wrong with having interop nodes, but I'd like to ultimately segregate them. Both compilers currently pipeline muddles together parsing, macro expansion, higher level analysis, platform concerns, optimizations, etc. It would be nice to have the analyzer made completely agnostic of the target platform in the earliest stage: before macro expansion. The AST before macro expansion would be useful for codeq to find usages of macros, etc. This would enable target-agnostic analysis to occur without having to ignore target-specific nodes.
 
3. local binding "init" nodes are slightly simpler in CLJS

I have some ideas about further improving this, but let's focus on :loop and implict :do right now :-)
 
4. Clojure has vars, thus :the-var AST node.

ClojureScript will likely gain :the-var as we approach self hosting.
 
5. Some small differences with try/catch which I don't fully understand.

JavaScript's catch does not provide conditional catching by type like Java does. There is a 'try macro which expands to the 'try* form. The 'try* form uses a 'cond to simulate type filtering.
 
Again, thank you Brandon for your great work.

My pleasure. I'd really love to see Typed-Clojure working on a common AST for both Clojure and ClojureScript!

Ambrose Bonnaire-Sergeant

unread,
Dec 12, 2012, 4:23:43 AM12/12/12
to cloju...@googlegroups.com
On Wed, Dec 12, 2012 at 4:59 PM, Brandon Bloom <snpr...@gmail.com> wrote:
Could you elaborate on where :loop's are like :fn's? They look more like :let's to me, with a recur point.

We seem to be supporting combinations of these features:

1. Scope local bindings
2. Provide inits to local bindings
3. Make a recur point
4. Create a fn

:let has 1,2
:fn has 1,3,4
:loop has 1,2,3

That's a good way to look at it. One thing worth noting is that all of these really are the same thing: lambda abstractions.

Consider the loop:

(loop [x 3] (when (pos? x) (println x) (recur (dec x))))

It's actually equivalent to:

((fn [x] (when (pos? x) (println x) (recur (dec x)))) 3)

Similarly, you could say that:

(let [x 1] (inc x))

Is the same as:

((fn [x] (inc x)) 1)

You know, lambda calculus and all that...


Right.
 

The interesting bit for me is that recur implies a tail call and so when transforming into continuation passing style, I can apply the above trivial transformation and then delegate to the :fn implementation for the recur trampoline. As a result, :loop looks like a :fn with an initial :invoke, instead of looking like a :let.

We don't have any other bi-modal AST nodes, so it seems like a good idea to enumerate all the possible states as true node ops.

So I assume this is doable without having a separate :loop node: :loop would just be more convenient for some users of analysis. 

Could this change mean more work for some common usages of analysis? Possibly yes for some very simple analysis of the AST, but it seems like a welcome change more complicated libraries like CPS.

For Typed Clojure: handling of :let and :loop are almost identical, except for adding a recur point.
 

 
1. Interop - CLJS has mainly :js* and :dot, Clojure has many interop nodes, some essential, others for optimisations. This is undoubtedly the biggest difference between Clojure implementations.
2. defrecord - CLJS has a :defrecord node, Clojure doesn't

I don't think there's anything wrong with having interop nodes, but I'd like to ultimately segregate them. Both compilers currently pipeline muddles together parsing, macro expansion, higher level analysis, platform concerns, optimizations, etc.

Yes, I like it.

What about other Clojure implementations? Anyone want to chime in about the Lua, Python, C (etc.) implementations? I know CLR has exactly the same analyser as JVM.
 
It would be nice to have the analyzer made completely agnostic of the target platform in the earliest stage: before macro expansion. The AST before macro expansion would be useful for codeq to find usages of macros, etc. This would enable target-agnostic analysis to occur without having to ignore target-specific nodes.

Great idea.
 
 
3. local binding "init" nodes are slightly simpler in CLJS

I have some ideas about further improving this, but let's focus on :loop and implict :do right now :-)

Perhaps add it to the design page for the meantime.
 
5. Some small differences with try/catch which I don't fully understand.

JavaScript's catch does not provide conditional catching by type like Java does. There is a 'try macro which expands to the 'try* form. The 'try* form uses a 'cond to simulate type filtering.

Ah, got it (even though you just explained it to me on IRC). Any issues here? Does CLJS/CLJ need to change here?
 
 
Again, thank you Brandon for your great work.

My pleasure. I'd really love to see Typed-Clojure working on a common AST for both Clojure and ClojureScript!

 That would be awesome.

Ambrose

Brandon Bloom

unread,
Dec 12, 2012, 4:35:57 AM12/12/12
to cloju...@googlegroups.com, abonnair...@gmail.com

So I assume this is doable without having a separate :loop node: :loop would just be more convenient for some users of analysis. 

Could this change mean more work for some common usages of analysis? Possibly yes for some very simple analysis of the AST, but it seems like a welcome change more complicated libraries like CPS.

For Typed Clojure: handling of :let and :loop are almost identical, except for adding a recur point.

It's an 'if vs a 'defmethod and where such an 'if goes exactly :depends on the needs of the AST pass. The main benefit here is having 1-to-1 special forms with op keywords, which is mostly a cosmetic change. However, if we're going to change the AST, we should try to minimize the number of times that downstream consumers need to respond to a break.

 > This would enable target-agnostic analysis to occur without having to ignore target-specific nodes.
Great idea.

There's a bunch more complexity to that whole thing, best we save that for another thread. 

 
Perhaps add it to the design page for the meantime.

Will do when I get a chance. 
 
Ah, got it (even though you just explained it to me on IRC). Any issues here? Does CLJS/CLJ need to change here?

There's a more general "what to do about error handling" problem. Even little things like (throw (Exception. ...)) vs (throw (Error. ...)) really hinder code portability. Unless it gets cleaned up dramatically, I view throw and catch as interop forms :-)
 

Ambrose Bonnaire-Sergeant

unread,
Dec 12, 2012, 4:54:34 AM12/12/12
to cloju...@googlegroups.com
On Wed, Dec 12, 2012 at 5:35 PM, Brandon Bloom <snpr...@gmail.com> wrote:
The main benefit here is having 1-to-1 special forms with op keywords, which is mostly a cosmetic change.

Ok. To reinforce that argument, not having a direct correspondence between AST nodes and special forms is likely to expose implementation details, as we can see with :is-loop.

So to summarise so far:

1. Adding a separate :loop AST node 
 - does not expose the implementation detail of the similarity to :let
 - gives a 1-1 correspondence of special forms to AST nodes
 - is a breaking change, usually requires moving some logic/dispatch one level "up": to the multimethod.

2. Eliminating `analyze-block` relieves consumers of analysis from extra work of special-casing what really should be 'do blocks.

Ambrose

Herwig Hochleitner

unread,
Dec 12, 2012, 8:33:17 AM12/12/12
to cloju...@googlegroups.com
Brandon, thanks for working on that. It's an area of interest for me aswell.

Fortunately I don't have any code that would break, except for an experimental walker that rebuilds :children and other dependent fields on update. 

- Eliminating implicit block nodes would probably simplify said walker.
- I'm indifferent towards :loop nodes, but having a 1-1 correspondence is probably a net win.

The similarities between fn, let, loop, and catch could be handled by deriving them from :lexical-bindings, thoughts on this?

What is the process for making breaking changes to the AST?

In my opinion, ClojureScript currently has the freedom to make breaking AST changes without a lot of ceremony, since we are still at 0.1-SNAPSHOT.

I'm interested in cleaning up and documenting the AST format, which is a requirement to publish the AST format to authors of pluggable optimizations, via compilation units.
In the light of that it might be beneficial to start versioning the AST format.

I will ponder this some more and have a look at the patches when I get home in the (middle-european) evening.

Ambrose: I was thinking about starting a branch to document the different AST nodes with typed-clojure. Do you think that's feasible? 



2012/12/12 Ambrose Bonnaire-Sergeant <abonnair...@gmail.com>

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To post to this group, send email to cloju...@googlegroups.com.
To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.

Ambrose Bonnaire-Sergeant

unread,
Dec 12, 2012, 8:37:41 AM12/12/12
to cloju...@googlegroups.com
On Wed, Dec 12, 2012 at 9:33 PM, Herwig Hochleitner <hhochl...@gmail.com> wrote:
Ambrose: I was thinking about starting a branch to document the different AST nodes with typed-clojure. Do you think that's feasible? 

What do you mean? The AST's are defined in `analyze`.

Ambrose

Herwig Hochleitner

unread,
Dec 12, 2012, 8:56:32 AM12/12/12
to cloju...@googlegroups.com
My thinking was:

- create a heterogenous map type for every AST node type (:let, :fn, :try, ...), which would exactly document its fields
- annotate methods in analyzer and emitter to produce and consume those types

I understand that annotating multimethods is work in progress, but is there some way to annotate just a defmethod?

The main purpose, anyway, would be to have a formal description of AST types that could easily be maintained, extracted to a website and maybe even automatically verified.

Does that sound reasonable?


2012/12/12 Ambrose Bonnaire-Sergeant <abonnair...@gmail.com>

--

Ambrose Bonnaire-Sergeant

unread,
Dec 12, 2012, 9:03:23 AM12/12/12
to cloju...@googlegroups.com
Oh! Here is a start: https://github.com/frenchy64/typed-clojure/blob/master/test/typed/test/compiler.clj#L56

That probably doesn't even compile at the moment, but that's what it'd look like.

Multimethods don't work at all. But I'll consider that a feature request.

This is certainly feasible and a good idea, but for now just writing the pseudocode for the types is all you can really do.
I'll keep you updated.

Ambrose

Ambrose Bonnaire-Sergeant

unread,
Dec 12, 2012, 9:05:20 AM12/12/12
to cloju...@googlegroups.com
BTW:

- (HMap {...}) is equivalent to '{...}, but you can add (HMap {} :optional {...} for optional keys.
- (Rec [x] ...) defines a recursive type, where `x` is the whole type.

David Nolen

unread,
Dec 12, 2012, 11:48:13 AM12/12/12
to cloju...@googlegroups.com
Both of these changes seem sensible to me. Can we get enhancement tickets + squashed patches into JIRA so we test this out?

Thanks,
David


--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To view this discussion on the web visit https://groups.google.com/d/msg/clojure-dev/-/7v5IcvGzTigJ.

Brandon Bloom

unread,
Dec 12, 2012, 2:19:08 PM12/12/12
to cloju...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages