Bypassing the parser

Bypassing the parser Brandon Bloom 11/8/12 11:33 PM
I'm curious if it's possible (or sane) to target the Closure Compiler's AST directly. In particular, I'm interested in building a Node tree with JSType information for use with both TypedCodeGenerator-based printing and all the wonderful optimization and compression features offered by Compiler.

Here's the background:

The ClojureScript compiler currently emits Google Closure compatible JavaScript source code. I am experimenting with migrating towards emitting a JavaScript AST, instead of source code strings. See https://github.com/brandonbloom/clojurescript/blob/js-ast/src/clj/cljs/compiler.clj and https://github.com/brandonbloom/clojurescript/blob/js-ast/src/clj/cljs/js.clj

I have a few goals with this:

1) Eliminate the printing side effect, so that the ClojureScript compiler can be simplified by rearranging the order of various things without worrying about the impact on printing.
2) Speed up CLJS compilation by bypassing writing a file to disk and then immediately reading it back in and re-parsing it.
3) Learn a whole lot of stuff about compilers :-)

To get this up and running quickly, I used reflection to hack together a to-source function, which takes a Node and turns it into JavaScript source. I've got all of the key ClojureScript libraries to compile using my modified compiler. The missing piece, however, is the type annotations. Setting up a JSTypeRegistry, constructing JSTypeExpressions, JSType objects, etc., and attaching them to the AST looks like a pretty complex beast. I also don't see an obvious way to get the Compiler class and the various CompilerInput bits to work directly against an AST instead of a source code string of some sort.
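As a minimal sketch of the "emit a tree, print it later" idea described above: the `Node` class and `to_source` function below are hypothetical stand-ins written for illustration. They are not Closure Compiler's `Node` class or the to-source function from the linked branch, and they cover only a tiny subset of JavaScript.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical node type for illustration; NOT Closure Compiler's Node class.
@dataclass
class Node:
    kind: str                      # "name", "number", "call", or "var"
    value: Optional[str] = None    # identifier or literal text, if any
    children: List["Node"] = field(default_factory=list)


def to_source(node: Node) -> str:
    """Print a node as JavaScript source (tiny subset only)."""
    if node.kind in ("name", "number"):
        return node.value
    if node.kind == "call":
        # First child is the callee, the rest are arguments.
        callee, *args = node.children
        return f"{to_source(callee)}({', '.join(to_source(a) for a in args)})"
    if node.kind == "var":
        # Single child is the initializer expression.
        (init,) = node.children
        return f"var {node.value} = {to_source(init)};"
    raise ValueError(f"unsupported node kind: {node.kind}")


# The code generator builds a tree; printing happens as a separate, pure step.
tree = Node("var", "x", [Node("call", children=[Node("name", "inc"),
                                                Node("number", "41")])])
print(to_source(tree))
```

Because printing is a pure function of the tree, the generator can build nodes in any order without affecting the output text.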

Here's some specific questions:

1) What's the quickest way to get some minimum JSType objects onto my AST? At minimum, I need @constructor annotations on a few functions in the output of TypedCodeGenerator.
2) Is the Compiler class up to the task of working with completely synthetic ASTs in the absence of source files? What issues will I run into? Has someone tried this before?
3) How stable is the interface I'll need to code against? Are the Rhino or JSTypeExpression ASTs changing much? In what ways?
4) Is this a use case the Closure Compiler team is interested in supporting?

Thanks,
Brandon Bloom
Re: [closure-compiler-discuss] Bypassing the parser Nick Santos 11/9/12 5:28 AM
is the goal to write your own coffeescript type inference engine and
attach a type to every node, or to tell closure-compiler about type
contracts, or to just add a few @constructor tags?

If you're tending towards the latter, it might be a lot easier just to
use JSDocInfoBuilder and attach JSDocInfo.
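To illustrate the general shape of "attach doc info to a node and let the printer emit it", here is a small sketch. Everything in it (`FunctionNode`, `mark_constructor`, `print_function`) is a hypothetical stand-in invented for this example; it does not use or reproduce the JSDocInfoBuilder API.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical function node; the jsdoc_tags list stands in for JSDocInfo.
@dataclass
class FunctionNode:
    name: str
    params: List[str]
    body: str = ""
    jsdoc_tags: List[str] = field(default_factory=list)


def mark_constructor(fn: FunctionNode) -> FunctionNode:
    """Record an @constructor tag on the node (idempotent)."""
    if "@constructor" not in fn.jsdoc_tags:
        fn.jsdoc_tags.append("@constructor")
    return fn


def print_function(fn: FunctionNode) -> str:
    """Print the function, preceded by a JSDoc comment if tags are attached."""
    doc = ""
    if fn.jsdoc_tags:
        doc = "/** " + " ".join(fn.jsdoc_tags) + " */\n"
    return f"{doc}function {fn.name}({', '.join(fn.params)}) {{{fn.body}}}"


ctor = mark_constructor(FunctionNode("Point", ["x", "y"]))
print(print_function(ctor))
```

The annotation travels with the node as data, so no comment text has to be spliced into source strings.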

> 2) Is the Compiler class up to the task of working with completely synthetic
> ASTs in the absence of source files? What issues will I run into? Has
> someone tried this before?

A year or two ago, there was a proposal to define a common interchange
format for Javascript syntax trees.
http://code.google.com/p/es-lab/wiki/JsonMLASTFormat
http://code.google.com/p/closure-compiler/source/browse/trunk/src/com/google/javascript/jscomp/jsonml/
We abandoned it due to lack of interest, but the ecmascript committee
people might still be pursuing it.

> 3) How stable is the interface I'll need to code against? Are the Rhino or
> JSTypeExpression ASTs changing much? In what ways?

They change often enough that we softly discourage people writing code
directly against the AST, but we understand that there are sometimes
reasonable engineering reasons to do so. Most of the recent changes
can be seen here:
http://code.google.com/p/closure-compiler/source/list?path=/trunk/src/com/google/javascript/jscomp/parsing/IRFactory.java&start=2295

> 4) Is this a use case the Closure Compiler team is interested in supporting?

We tend to be pretty results-oriented, i.e., if that's the best
solution, then it's a solution worth supporting.

So I'd be interested in hearing more about how effective your solution
is. The SourceFile/SourceAst API is pretty rich, and we have existing
clients that use it to read the source code directly from memory
(bypassing file I/O), or to duplicate/cache the AST during incremental
builds, so that you can use the cached AST if the file hasn't changed.
I'm not sure what kind of savings you get from generating the AST
directly.
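The caching idea mentioned above (reuse the AST when the file has not changed) can be sketched roughly as follows. `AstCache` is a hypothetical class written for this example, not the SourceFile/SourceAst API, and the "parser" passed in is just a placeholder.

```python
import hashlib


class AstCache:
    """Hypothetical cache: reuse a parsed AST while the source is unchanged."""

    def __init__(self, parse):
        self._parse = parse      # function: source text -> AST
        self._entries = {}       # path -> (content digest, AST)
        self.parse_count = 0     # how many real parses were performed

    def get(self, path, source):
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]     # unchanged source: skip parsing
        self.parse_count += 1
        ast = self._parse(source)
        self._entries[path] = (digest, ast)
        return ast


# str.split stands in for a real parser, purely for demonstration.
cache = AstCache(parse=str.split)
first = cache.get("a.js", "var x = 1;")
second = cache.get("a.js", "var x = 1;")   # served from the cache
third = cache.get("a.js", "var x = 2;")    # content changed, parsed again
print(cache.parse_count)
```

Keying on a content digest rather than a timestamp is one design choice among several; it assumes hashing is cheaper than parsing.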

Nick
Re: [closure-compiler-discuss] Bypassing the parser Brandon Bloom 11/9/12 12:30 PM
Thanks for the quick response!


> is the goal to write your own coffeescript type inference engine and
> attach a type to every node, or to tell closure-compiler about type
> contracts, or to just add a few @constructor tags?

Did you mean ClojureScript? Or is there a CoffeeScript type inference project I'm not aware of?

> If you're tending towards the latter, it might be a lot easier just to
> use JSDocInfoBuilder and attach JSDocInfo.

The current CLJS compiler release only really emits @constructor tags and a few @param tags. I think the constructor ones are the only ones that really matter at this stage, so I'll probably just start there.

Clojure has a pretty decent type hinting system, which may make sense to extend down to the Google Closure / JavaScript level in the future.
 
> They change often enough that we softly discourage people writing code
> directly against the AST, but we understand that there are sometimes
> reasonable engineering reasons to do so. Most of the recent changes
> can be seen here:
> http://code.google.com/p/closure-compiler/source/list?path=/trunk/src/com/google/javascript/jscomp/parsing/IRFactory.java&start=2295

Thanks for the info. I'll make sure to factor in your guidance when considering how far to push my little experiment.
 
> So I'd be interested in hearing more about how effective your solution
> is. The SourceFile/SourceAst API is pretty rich, and we have existing
> clients that use it to read the source code directly from memory
> (bypassing file I/O), or to duplicate/cache the AST during incremental
> builds, so that you can use the cached AST if the file hasn't changed.
> I'm not sure what kind of savings you get from generating the AST
> directly.

I haven't dug into measurements yet. Anecdotally, adding the JS AST step (rather than emitting JS source directly) has slowed down the CLJS compiler. That was expected because it's simply an extra step at this point. I was hoping to recover that lost speed, as well as pick up some new gains by bypassing one round-trip to JS source. I'll look more deeply at SourceFile and SourceAst.

We depend pretty heavily on the optimizations made by the Closure Compiler to make ClojureScript fast. Assuming we can tolerate upstream changes to the AST, and if I can recover the lost performance (or make gains!), then working with an AST is much preferable to working with source code strings for a host of reasons. I'd love to be able to tell the ClojureScript community that this is a viable option; hence my experiments with this.
Re: [closure-compiler-discuss] Bypassing the parser Nick Santos 11/9/12 1:16 PM
On Fri, Nov 9, 2012 at 3:30 PM, Brandon Bloom <snpr...@gmail.com> wrote:
> Thanks for the quick response!
>
>
>> is the goal to write your own coffeescript type inference engine and
>> attach a type to every node, or to tell closure-compiler about type
>> contracts, or to just add a few @constructor tags?
>
>
> Did you mean ClojureScript? Or is there a CoffeeScript type inference
> project I'm not aware of?

haha. it was just early in the morning and i needed coffee.

yes, I meant ClojureScript.

It wasn't clear to me how much type information you have when you do
the translation (i.e., if you can differentiate type info by symbol,
or by symbol-reference, or by individual AST node).

>
>> If you're tending towards the latter, it might be a lot easier just to
>> use JSDocInfoBuilder and attach JSDocInfo.
>
>
> The current CLJS compiler release only really emits @constructor tags and
> a few @param tags. I think the constructor ones are the only ones that
> really matter at this stage, so I'll probably just start there.
>
> Clojure has a pretty decent type hinting system, which may make sense to
> extend down to the Google Closure / JavaScript level in the future.

You also might consider a hybrid approach where you construct the AST
for the source code (harder to maintain, but faster), but use the
JsDocInfoParser to construct the JSDoc nodes (easier to maintain, but
slower).

using reflection and/or editing the source to make things public is
the right approach for this kind of experimentation.

> We depend pretty heavily on the optimizations made by the Closure Compiler
> to make ClojureScript fast. Assuming we can tolerate upstream changes to the
> AST, and if I can recover the lost performance (or make gains!), then
> working with an AST is much preferable to working with source code strings
> for a host of reasons. I'd love to be able to tell the ClojureScript
> community that this is a viable option; hence my experiments with this.

OK. I think that merging in the upstream changes is probably viable. I
could imagine it being a big win, or it might not be. Keep us posted
:)

Nick
Re: [closure-compiler-discuss] Bypassing the parser Brandon Bloom 11/9/12 1:56 PM
> It wasn't clear to me how much type information you have when you do 
> the translation (i.e., if you can differentiate type info by symbol, 
> or by symbol-reference, or by individual AST node). 

Right now we have some limited dataflow analysis of types. In ClojureScript, they are used primarily for identifying places to optimize boolean tests where Clojure's notion of truthy/falsey can be simplified to JavaScript's. That type information doesn't make it to the Google Closure Compiler in any interesting way yet. Big brother JVM Clojure has much richer type analysis for use in eliminating reflection in the emitted JVM byte code. I'm not sure how much we would benefit in ClojureScript by beefing up the number of type annotations we provide to the Closure Compiler.
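A rough sketch of the boolean-test simplification described above. Clojure treats only nil and false as falsey, whereas JavaScript also treats values such as 0, "" and NaN as falsy, so a plain JavaScript test is only safe when the value is known to be a boolean. The `emit_test` function and its `inferred_type` parameter are hypothetical names for this example (not the ClojureScript compiler's code), and `expr` is assumed to be a simple identifier so that repeating it is harmless.

```python
from typing import Optional


def emit_test(expr: str, inferred_type: Optional[str] = None) -> str:
    """Return a JavaScript condition that follows Clojure truthiness.

    expr is assumed to be a simple identifier (it may be emitted twice).
    """
    if inferred_type == "boolean":
        # Known boolean: JavaScript's own truthiness gives the same answer.
        return expr
    # Unknown type: only null/undefined and false count as falsey.
    return f"({expr} != null && {expr} !== false)"


print(emit_test("x"))
print(emit_test("done", "boolean"))
```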

> You also might consider a hybrid approach where you construct the AST 
> for the source code (harder to maintain, but faster), but use the 
> JsDocInfoParser to construct the JSDoc nodes (easier to maintain, but 
> slower). 

Good idea. I'll think about that too.

> OK. I think that merging in the upstream changes is probably viable. I 
> could imagine it being a big win, or it might not be. Keep us posted 

Definitely a win for clarity: I managed to shrink the size of the code generator by about 20% and that was just as a straight port from the string emitting approach. I expect even bigger gains there after some additional refactoring. Now I'm trying to see if I can get a comparable performance win :-)
Re: [closure-compiler-discuss] Bypassing the parser John 11/9/12 2:28 PM
I wrote code for the GWT project to create the Closure AST from the GWT JS AST.   That code is available here:


The key thing is that you need to be sure to use Closure's AstValidator to double check you create an AST with the right shape.   As Nick says you either need to pin to a particular release of the compiler or be willing to keep up with the compiler's internal state.  You want to take care to properly set the source locations for each node so that the source maps produced are useful.
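To make the two checks above concrete (tree shape and source locations), here is a small sketch of what such a validation pass might look like. It is not Closure's AstValidator: the `Node` class, the `EXPECTED_CHILDREN` table and the `validate` function are hypothetical, and the shapes listed are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical node type; NOT Closure Compiler's Node class.
@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)
    line: Optional[int] = None
    column: Optional[int] = None


# Invented shape table for this sketch: node kind -> required child count.
EXPECTED_CHILDREN = {"name": 0, "number": 0, "return": 1, "add": 2}


def validate(node: Node, errors: Optional[List[str]] = None) -> List[str]:
    """Collect shape and source-location problems for a whole tree."""
    errors = [] if errors is None else errors
    expected = EXPECTED_CHILDREN.get(node.kind)
    if expected is None:
        errors.append(f"unknown node kind: {node.kind}")
    elif len(node.children) != expected:
        errors.append(
            f"{node.kind}: expected {expected} children, got {len(node.children)}")
    if node.line is None or node.column is None:
        errors.append(f"{node.kind}: missing source location")
    for child in node.children:
        validate(child, errors)
    return errors


good = Node("return", [Node("name", line=1, column=7)], line=1, column=0)
bad = Node("add", [Node("number")], line=2, column=0)
print(validate(good))
print(validate(bad))
```

Running a pass like this right after tree construction surfaces malformed nodes before any optimization pass sees them.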

Unfortunately, we don't have a similar validator for the JSDoc, but it should generally be self-evident (directly reflecting the source), and you can ask if things are not as expected.