Crazy idea: ASTs as function metadata?

184 views
Skip to first unread message

Mikera

unread,
Feb 4, 2013, 1:36:29 AM2/4/13
to cloju...@googlegroups.com
Consider the following code:

(defn foo [x]
  "Result!")

(reduce + (map (comp count foo) (range 5)))
=> 35

A smart compiler would recognise that it is dealing with a constant expression, and compile down to a constant. But currently in Clojure it compiles to quite an expensive operation that allocates temporary structures, makes some interface calls, boxes various numbers etc.

(time (dotimes [i 1000000] (reduce + (map (comp count foo) (range 5)))))
"Elapsed time: 1049.207845 msecs"

i.e. we are taking about 1050ns for an operation that should be a constant load in less than 1ns. Not very good, and the JVM JIT can't help us much because it doesn't know enough about the structure of the code to perform the necessary higher-level code transformations.

A key root cause of the problem is that functions such as "foo" is opaque to the compiler. foo has important properties (being a pure function, always evaluating to a constant of type java.lang.String) that the compiler would need to see if it were going to optimise the given expression correctly. Likewise the compiler would need to know facts about the other functions involved (e.g. "comp of a pure function like count and a constant function like foo is itself a constant function", "(range n) produces a sequence of length n" etc.)

It seems to me that we could get this kind of optimisation if we added an AST representation as (hidden?) metadata to functions. The compiler could then use this for all sorts of valuable things:
- inlining for small functions
- spotting opportunities to use primitive versions of functions (avoiding boxing)
- avoiding reflection (often eliminating the need for explicit type hints)
- constant expression folding as above
- using core.logic to reason about possible higher level optimisations
- optionally producing type-check errors at compile time

The only downside I can see is some extra memory usage for keeping the ASTs around after initial compilation. But I don't think it would be a big issue, and you could even provide a "clear-compiled-ast" function if you wanted to null out the ASTs and get your memory back at some later stage (e.g. after your app has fully loaded)

It seems that we would want something like this in order to achieve the twin goals of having a high-level, dynamic functional language while still providing the ability to achieve high performance in production code.

Thoughts? A reasonable idea for Clojure 2.0 / Clojure in Clojure?




kovas boguta

unread,
Feb 4, 2013, 3:24:45 AM2/4/13
to cloju...@googlegroups.com
There are a number of use cases for keeping the AST and/or the source around.

There's been some discussion on this wrt distributed computing,
particularly in conjunction with using cascalog from the repl.

Another branch of this is the codeq effort. Once analyzers have been
fully implemented, this will very nearly contain the information you
want. Though integration of that with the compiler is something I
haven't seen discussion about.
> --
> You received this message because you are subscribed to the Google Groups
> "Clojure Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure-dev...@googlegroups.com.
> To post to this group, send email to cloju...@googlegroups.com.
> Visit this group at http://groups.google.com/group/clojure-dev?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Ambrose Bonnaire-Sergeant

unread,
Feb 4, 2013, 6:08:41 AM2/4/13
to cloju...@googlegroups.com
I'm interested in adding support for providing type hints.

With the infrastructure core.typed provides, we can eliminate all reflection calls
in the following function, even _without_ a provided static type signature.

(defn silly-eg [x]
  (cond
    (my-class? x) (.some-method x) ; x is MyClass

    (and (another-class? x)
           (.another-method x))  ; x is AnotherClass
    (.some-more-methods x))) ;same

I just need a way to plug this information into the compiler.

Thanks,
Ambrose

Mikera

unread,
Feb 4, 2013, 6:28:40 AM2/4/13
to cloju...@googlegroups.com, abonnair...@gmail.com
This would be awesome - exactly the kind of metadata that would be good to add to the AST representation. 

I suspect we could do other useful optimisations as well by propagating type information, e.g. proving that a value cannot be of a specific type and therefore eliminating instanceof checks and perhaps whole conditional branches?

Also, would it be possible in core.typed to express predicates of the form "X is a sequence of length N" or "X is a pure function"? Being able to assert compile-time properties of values in this way would open the door to even more optimisation potential

Ambrose Bonnaire-Sergeant

unread,
Feb 4, 2013, 6:38:57 AM2/4/13
to cloju...@googlegroups.com
There is no attempt to measure purity in core.typed.

You can track the length of a sequence using an intersection type.

eg. A non-empty seqable is (I (Seqable Any) (CountRange 1))
A seqable of length 2 is (I (Seqable Any) (ExactCount 2))

core.typed can infer this within functions after tests like (= 2 (count x)).

Thanks,
Ambrose

Mikera

unread,
Feb 4, 2013, 7:16:43 AM2/4/13
to cloju...@googlegroups.com
On Monday, 4 February 2013 16:24:45 UTC+8, kovasb wrote:
There are a number of use cases for keeping the AST and/or the source around.

There's been some discussion on this wrt distributed computing,
particularly in conjunction with using cascalog from the repl.

Interesting idea. I suspect source code forms are best suited for distribution rather than full ASTs though..... the source is likely to be much more compact, also the AST may have things in it that aren't nice for serialisation and/or depend on the local machine context (e.g. macroexpanded syntax that assumes a specific JVM version)
 

Another branch of this is the codeq effort. Once analyzers have been
fully implemented, this will very nearly contain the information you
want. Though integration of that with the compiler is something I
haven't seen discussion about.

I think the AST based approach needs to be pretty closely linked with the compiler in order to be useful for compilation and optimisation (as opposed to static analysis, which I think codeq is more geared towards). Ideally it would share the same AST representation with the compiler, so that future compilation can directly use previously stored AST chunks.

Mikera

unread,
Feb 4, 2013, 7:46:43 AM2/4/13
to cloju...@googlegroups.com, abonnair...@gmail.com
That's clear - thanks Ambrose!

I presume that if you were going to integrate core.typed with the compiler, you would also want to automatically generate type signatures based on what you can infer about functions when they are defined?

e.g. 

(defn get-character [^String s index]
  (str (.charAt s index)))

This presents a few opportunities for automatic type inference
- easy?: infer that it returns a String based on knowldedge of the str function
- harder?: infer that index must be an int, or something that can be cast to an int
- hardest?: infer that the returned string is of length 1 - (I String (ExactCount 1)) or something similar.....

Put the result of this kind of type inference into the AST for get-character, and you could then use it to do things like automatically avoiding boxing of the "index" parameter. It would be much nicer if our Clojure compiler of the future could be relied upon to do this kind of optimisation rather than having to inline stuff manually or explicitly declare primitive functions.

Hugo Duncan

unread,
Feb 4, 2013, 9:39:19 AM2/4/13
to cloju...@googlegroups.com
Mikera <mike_r_...@yahoo.co.uk> writes:

> On Monday, 4 February 2013 16:24:45 UTC+8, kovasb wrote:
>
>> There are a number of use cases for keeping the AST and/or the source
>> around.
>>
>> There's been some discussion on this wrt distributed computing,
>> particularly in conjunction with using cascalog from the repl.
>>
>
> Interesting idea. I suspect source code forms are best suited for
> distribution rather than full ASTs though..... the source is likely to be
> much more compact, also the AST may have things in it that aren't nice for
> serialisation and/or depend on the local machine context (e.g.
> macroexpanded syntax that assumes a specific JVM version)

Adding AST data as metadata on forms can be achieved today using [1] analyze.

>> Another branch of this is the codeq effort. Once analyzers have been
>> fully implemented, this will very nearly contain the information you
>> want. Though integration of that with the compiler is something I
>> haven't seen discussion about.
>>
>
> I think the AST based approach needs to be pretty closely linked with the
> compiler in order to be useful for compilation and optimisation (as opposed
> to static analysis, which I think codeq is more geared towards). Ideally it
> would share the same AST representation with the compiler, so that future
> compilation can directly use previously stored AST chunks.


AST transforms (eg. as provided by muir [2]) would definitely benefit
from not having to output source forms that are re-analyzed.


[1] https://github.com/frenchy64/analyze
[2] https://github.com/hugoduncan/muir

kovas boguta

unread,
Feb 4, 2013, 5:55:52 PM2/4/13
to cloju...@googlegroups.com
On Mon, Feb 4, 2013 at 6:39 AM, Hugo Duncan <dunca...@gmail.com> wrote:
> Adding AST data as metadata on forms can be achieved today using [1] analyze.

The issue with that is that it is a manual process. Rather than
something that happens automatically in conjunction with def/defn.

I think the goal with this line of thinking is that a
clojure-in-clojure compiler would be easier to modify to achieve the
desired behavior. As it is, its somewhat tricky.

Mikera

unread,
Feb 7, 2013, 4:11:13 AM2/7/13
to cloju...@googlegroups.com
This project from the Scala world has some relevance here:

https://github.com/TiarkRompf/lancet

Not exactly the same concept but similar objectives: the ability to give the compiler extra intelligence so that it can perform much more powerful optimisations / partial evaluation etc.
Reply all
Reply to author
Forward
0 new messages