Hello guys,
I am currently a second year MsC Computer Science student at UFMG, the
same university of Rodrigo Sol, who have done Summer of Code project
in TraceMonkey. I am also working with Péricles Alves, who just have
sent a GSoC proposal to IonMonkey. We have been working in IonMonkey
for the last three months writing some optimization passes and
studying its JIT infra-structure.
I would also like to submit the following proposal to GSoC 2012.
Please, tell me if you think it will be a feasible project.
Best regards,
Igor Rafael
--------------------------------------------------------------------------------
Range Analysis and Overflow checks Elimination in IonMonkey
===========================================================
Objective
---------
The objective of this project is to implement a Range Analysis pass
and an overflow check elimination pass that uses the range analysis on
the IonMonkey JavaScript JIT compiler. There is a bug filled on
Mozilla's bugzilla [1] that stated the need of a Range Analysis pass.
And another bug report [2], states the need of overflow checks
elimination to achieve better run times in many JavaScript benchmarks.
How it will be done
-------------------
This project has three main phases:
1) Implementation of e-SSA form
2) Implementation of range analysis
3) Implementation of overflow check elimination.
I will start with the implementation of the extended SSA (e-SSA) form,
described by Bodik et al. in the paper [5]. In the background section
I will explain why the e-SSA form is very useful to range analysis.
This implementation consists in adapting the MIR Generation phase. It
is in this moment that SpiderMonkey's bytecodes are transformed into a
control-flow graph (CFG) and an architecture-independent SSA-form
(IR) is generated. I will modify this phase so that the IR can be
generated in e-SSA form. I already worked in the implementation of an
e-SSA pass in LLVM. Converting a program from SSA form to e-SSA form
is very fast. Furthermore, the overhead is small: the e-SSA form
increases the SSA form program by less than 7% on average, considering
the SPEC CPU 2006 programs in LLVM.
Next, I will implement the Range Analysis algorithm described in [4].
This algorithm has five major steps:
1) Extract constraints from the CFG
2) Build the constraint graph (CG)
3) Compute CG's strongly connected components (SCC)
4) Sort SCCs topologically
5) For each SCC, solve its constraints applying different fix-point iterators
I also worked in the implementation of this algorithm as an LLVM pass,
and know its implementation details very well.
Finally, I will implement an Overflow Checks Elimination pass that
uses the ranges provided by the Range Analysis pass to eliminate
overflow checks.
Background
----------
IonMonkey is the next generation JIT compiler for Mozilla's
SpiderMonkey JavaScript engine. It was built to have much more
organized and explicit data structures, typical of advanced compilers,
than its predecessors (TraceMonkey and JaegerMonkey). It has as goals
to be clean and flexible to enable research and experimentations.
Why overflow check elimination is important?
---
JavaScript uses floating point numbers by default. However, in many
cases the interpreter or JIT compiler can infer, from the syntax of
the program, that some numbers are integers. Operating with integers
is faster than using floating point numbers; hence, this conversion is
desirable whenever possible. However, in order to preserve the
semantics of the original program, the integer operations must be
preceded by an overflow check. These overflow checks are implemented
via conditional branches, so, in addition of slowing down the code,
they make it hard for the compiler to perform code optimizations that
cannot cross basic block boundaries, such as code motion, for
instance.
In many cases, the overflow checks can be eliminated. A typical
example is given in the toy program below, which increments its
arguments if they are less than 10:
inc(x) {
if x < 10
return x + 1;
else
return 0
}
This program clearly cannot lead to an integer overflow; hence, no
check is necessary before the addition. The goal of this project is to
eliminate this kinds of tests.
How can Range Analysis help in the elimination of overflow tests?
---
Range analysis is a compiler analysis whose goal is to determine
statically, for each integer variable, its lower and upper limits.
Range analysis is useful to compilers because it enables many
optimizations, including array bounds check elimination, dead code
elimination, if conversion and, in our particular case, redundant
overflow check elimination. There are many different implementations
of range analysis. The research group in which I am a member has a
very modern implementation, which is publicly available at
http://code.google.com/p/range-analysis/. Together with my colleagues,
I have worked in this algorithm for six months, and today I am very
familiar with it.
I am proposing to implement a sparse version of range analysis. A
dense implementation maps each pair formed by a variable and a program
point to a range interval. Today the compiler community agree that
this approach is not good, because it suffers from scalability issues:
as the number of variables and the program size increases, it tends to
consume too much memory. Nowadays most of the implementations of range
analysis are sparse: they map the range information to the variable
itself. In other words, it finds an interval that is constant along
the entire live range of the variable. Thus, in order of it to be
correct, it must "split" the live ranges of variables. The systematic
way to do this live range splitting results in a program
representation called Extended Static Single Assignment (e-SSA) form.
In order to convert a program to e-SSA form, we must split the live
ranges of variables that are used in conditionals. For instance, given
a program like this one below:
inc(x) {
if x < 10
return x + 1;
else
return 0
}
Its e-SSA representation looks like the program below:
inc(x) {
if x < 10
x0 = x;
x1 = x0 + 1;
return x1;
else
return 0
}
In the program above, which is in e-SSA form, we know that the
variable x0 is always less than 10; thus, an overflow check is not
necessary in the increment.
The e-SSA form is very useful to any compiler analysis that extracts
information from conditionals, and propagates this information
forwardly. Other examples of such analysis include the ABCD algorithm
for array bounds check elimination, and the taint analysis that tracks
the flow of information across the program, trying to prevent attacks
such as cross site scripting, for instance. Therefore, I believe that
not only my project, but also other works that use IonMonkey would
benefit from this implementation.
How does the range analysis algorithm that I am proposing works?
---
The algorithm that we have implemented in the Federal University of
Minas Gerais is a sparse version of range analysis. It has five
phases, that I have mentioned previously. In the first phase,
constraints are extracted from the program. In our LLVM implementation
of range analysis we have 18 different types of constrains. For
instance, in our example program, we would have the constraint: x0 = x
intersection with [-inf, 10], and x1 = x0 + 1.
After extracting constraints, we use them to build a data-structure
called the constraint graph. We have a node nv for each variable v,
and an edge nv0 -> f() -> nv1 if v1 = f(v0) is a constraint in our
constraint system.
The next two phases are optional, but they improve the runtime and
the precision of the range analysis algorithm substantially. We find
strongly connected components (SCCs) in the constraint graph, and then
sort the graph of SCCs topologically. Actually these two steps can be
performed in one single step, via Nuutila's algorithm.
Finally, we must apply widening and narrowing to find the ranges of
variables inside each strongly connected component. These two
operators, widening and narrowing, have been described by Cousot and
Cousot in the seminal paper about abstract interpretation. Widening in
necessary to ensure that an abstract interpretation of the constraints
terminate, and narrowing is necessary to ensure that the ranges found
are precise enough. Both operators run in linear time on the size of
the strongly connected component.
Timeline
--------
- [1 week] Extend the MIR Generation phase to generate code in the e-SSA form.
- [6 weeks] Implement Range Analysis Pass
- [3 weeks] Implement Overflow Checks Elimination Pass
- [1 week] Run extensive tests on the implementation, including
jit-tests and Anion[6]
- [1 week] Write comprehensive report about the results obtained
Expected results
----------------
The main criterion of success will be the improvement in the
performance of IonMonkey, taking into account the compilation latency
and native code runtime.
My background
-------------
I am a second year Computer Science MsC student at the Federal
University of Minas Gerais (UFMG), which is a top-ranked Computer
Science graduate program in Brazil. I have graduated in Computer
Science from this very school on the year of 2010. My MsC's research
is related with Speculative JIT optimizations and I have worked last
year in the TraceMonkey compiler, that started with my GSoC
participation in 2011 under David Mandelin mentoring. I have also
worked with the LLVM compiler, writing optimization passes as one of
my MsC projects.
Why I can do well in this project
---------------------------------
I have being working with Mozilla's SpiderMonkey engine for almost one
year, most of time writing optimizations in TraceMonkey. But for the
last 3 months I have been writing optimization passes for IonMonkey.
In the last year I also worked writing a Range Analysis pass for LLVM
[3], which will be the reference for the implementation of the Range
Analysis pass in IonMonkey. I also have good experience with
programming, having been programing in C++ for the last five years.
References
----------
[1] Range Analysis in Bugzilla -
https://bugzilla.mozilla.org/show_bug.cgi?id=699883
[2] Range Analysis is need to get rid of overflow checks -
https://bugzilla.mozilla.org/show_bug.cgi?id=688078#c8
[3] A Range Analysis Pass for LLVM -
http://code.google.com/p/range-analysis/
[4] Range Analysis Algorithm -
http://range-analysis.googlecode.com/svn/trunk/doc/Manuscript/paper.pdf
[5] R. Bodik, R. Gupta and V. Sarkar. ABCD: Eliminating Array-Bounds
Checks on Demand. In Proceedings of the Conference on Programming
language design and implementation, 2000.
[6] JavaScript fuzzer for IonMonkey -
https://github.com/drakedevel/anion