For the past 3 months I've been working, as part of my GSoC project,
on a port of the Clojure compiler and analyzer to Clojure.
GSoC ends tomorrow, so this is a report of what I've accomplished
with this project so far; note that I'm not going to stop working on
CinC now that GSoC is over.
First, a link to the project if anybody is interested:
https://github.com/Bronsa/CinC
What's there?
The project is made of 2 major components: the analyzer and the emitter.
The analyzer is based heavily on the clojurescript analyzer; I've
(successfully) used the :children-keys approach to write a generic walker
over the AST and to implement multiple passes that
annotate the AST with all the information needed to compile to
JVM bytecode.
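
To give an idea of what the :children-keys approach buys, here is a
minimal sketch; it is not CinC's actual code. The node shapes, the
`:children` key name and the `annotate-size` pass are assumptions made up
for illustration: each node is a map that lists the keys holding its
child nodes, so one generic walker can run any pass over the whole tree.

```clojure
;; Hypothetical AST shape, for illustration only: each node is a map whose
;; :children entry lists the keys that hold a child node or a vector of nodes.

(defn children
  "Returns the child nodes of ast, in the order given by its :children keys."
  [ast]
  (mapcat (fn [k]
            (let [c (get ast k)]
              (if (vector? c) c [c])))
          (:children ast)))

(defn update-children
  "Applies f to every child node of ast, keeping the node's shape intact."
  [ast f]
  (reduce (fn [node k]
            (update node k (fn [c] (if (vector? c) (mapv f c) (f c)))))
          ast
          (:children ast)))

(defn postwalk
  "Runs the pass f over every node of ast, children before parents."
  [ast f]
  (f (update-children ast #(postwalk % f))))

;; A toy pass: annotate each node with the number of nodes in its subtree.
(defn annotate-size [node]
  (assoc node :size (reduce + 1 (map :size (children node)))))

(def sample-ast
  {:op :if
   :children [:test :then :else]
   :test {:op :const :form true}
   :then {:op :do
          :children [:statements :ret]
          :statements [{:op :const :form 1}]
          :ret {:op :const :form 2}}
   :else {:op :const :form 3}})

(def annotated (postwalk sample-ast annotate-size))
```

The walker never needs to know about `:if` or `:do`; a new node type only
has to declare its child keys, and every existing pass keeps working.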
It should be noted that this analyzer provides all the information
collected by the clojure compiler (info on locals-clearing, loop-locals
invalidation etc) in a much more accessible way, exposing it all as
fields in a hash-map.
The compiler takes this AST and constructs an AST representing the class
to be emitted, expressing the bytecode as a data structure too, and it
subsequently interprets this second AST to evaluate the expression.
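
As a toy illustration of "bytecode as data" (again, not CinC's actual
code), the sketch below emits a vector of instruction vectors from a tiny
AST and then interprets it. The opcode names `:push` and `:add`, the node
shapes and the stack interpreter are all made up; they only stand in for
the real step of turning the data into a JVM class.

```clojure
;; Toy illustration only: opcode names and AST shape are hypothetical.

(defmulti emit
  "Turns an AST node into a vector of instructions, each one plain data."
  :op)

(defmethod emit :const [{:keys [form]}]
  [[:push form]])

(defmethod emit :add [{:keys [args]}]
  ;; Emit the code for every argument, then one instruction that adds them.
  (conj (vec (mapcat emit args))
        [:add (count args)]))

(defn run
  "Interprets instructions on an operand stack and returns the top value."
  [instructions]
  (peek
   (reduce (fn [stack [opcode arg]]
             (case opcode
               :push (conj stack arg)
               ;; Pop the last `arg` values and push their sum.
               :add  (let [n (- (count stack) arg)]
                       (conj (subvec stack 0 n)
                             (reduce + (subvec stack n))))))
           []
           instructions)))

(def sample-expr
  {:op :add
   :args [{:op :const :form 1}
          {:op :add
           :args [{:op :const :form 2}
                  {:op :const :form 3}]}]})

(def bytecode (emit sample-expr))
```

Because `bytecode` is an ordinary vector, it can be printed, diffed or
rewritten by another pass before anything is actually executed.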
The `doc` folder contains some further documentation on how the
analyzer/compiler works; it's not much yet, but more is to come.
What's not there yet?
While `cinc.compiler.jvm.bytecode/eval` is capable of evaluating all of
Clojure's special forms, primitive support is mostly not in place and
should be expected to be broken.
This means that some expressions that need primitive support to work, for
example `defrecord`, will not work.
What's going to happen?
CinC is a project I've wanted to work on for quite some time, and I'm not
going to abandon it now that GSoC is over. I have things I want to
experiment with in CinC (and I hope I'm not the only one); my current
to-do list is:
* get primitive support fully working (including invokePrim)
* refactor :tag/:cast/:box handling
* standardize the AST format between CinC, clojurescript and
jvm.tools.analyzer. David Nolen and Ambrose B.S. are obviously who I'm
most looking forward to talking to about this, but any opinion will
be greatly appreciated.
* have 2 compile targets: one "normal" target that won't do any type
specialization and will emit dynamic bytecode (mostly for REPL
experimentation), and one "optimized" target that will do aggressive
tag inference and compile to more static code, trying to avoid most of
the runtime reflection.
* experiment with dynamically emitting invokePrim interfaces for primitive
types other than long or double
Ideas/feedback/contributions are greatly appreciated!
Thanks,
Nicola