Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[GSoC] Proposal: Implement Profile Guided Optimization in IonMonkey (Draft)

58 views
Skip to first unread message

Wei WU(吴伟)

unread,
Apr 27, 2013, 4:04:39 AM4/27/13
to dev-tech-js-en...@lists.mozilla.org, Nicolas Pierron, Luke Wagner
Hi,

I have finished the initial version of my proposal and submitted to GSoC
website. I put a copy here in case you don't have permission to comment on
it. You can also review my proposal via this link:
https://github.com/lazyparser/gsoc2013/blob/master/proposal.md

Your comments and suggestions are welcome!
*
*
*Title*
Implement Profile Guided Optimization in IonMonkey

*Abstract*
The aim of the project is to implement a branch profiling module and
improve IonMonkey optimizations by leveraging the information gathered from
the profiling module.

*Project Proposal*
I propose to implement a branch profiling module and a MIR pass for pruning
infrequently used branches in IonMonkey.

*Principles*
Profile Guided Optimizations have been adopted by modern compilers, such as
LLVM and many JIT compilers in Java Virtual Machines. Lots of optimizations
can benefit from the profiling data to generate faster codes. Currently
IonMonkey lacks the mechanism to gather branch profiling information and
has to use heuristics to guess the probability of each conditional jump.
These heuristics are usually conservative and may block further
(aggressive) optimization opportunities.

By introducing the branch profiling data IonMonkey can aggressively remove
the infrequently used basic blocks and therefore simplify subsequent
analyses like type inference and alias analysis. Nicolas Pierron, an
IonMonkey developer, suggested that other optimizations such as GVN & LICM
might benefit from the profiling information even if branches were not
removed.

*Project scope*
Ideally, all analyses and optimizations existed in IonMonkey might be
benefit from using branch profiling data and should be explored. The
profiling module should be able to gather multiple types of running
information other than conditional jumps only. However, this project will
implement the branch profiling functionality in baseline compiler and a MIR
transformation pass, due to the time limitation of a GSoC project.

*Technical details*
This project can be divided into two parts, i.e. the profiling module and
the MIR transformation pass. I will describe the details of them in this
section.

*Branch Profiling Module*
Branch Profiling Module (BPM) collects the targets of conditional jumps by
instrumenting SpiderMonkey engine. There are three potential
instrumentation points in SpiderMonkey. Instrumenting interpreter is the
easiest way to get things work but the the data it collects is fragmented
and less important, since(for) "hot" JavaScript methods are not executed by
interpreter. IonMonkey is another place that can insert profiling code but
the overhead caused by profiling makes it unfeasible. BPM leverages the
Baseline Compiler to generate instrumented machine codes. BPM rewrites the
output of baseline compiler when it generating code for JSOP_IFEQ,
JSOP_IFNE, and JSOP_TABLESWITCH bytecodes.

BPM uses a circular buffer to store data and allocates one buffer for each
IonScript object. Every time BPM is triggered it pushes both the address of
the conditional jump instruction and the target address it jumps to. Main
benefits of this design are that not only frequencies of each branch can be
calculated but also the relationships between jumps are able to be inferred
by advanced algorithms.

BPM can be switched on/off through command line options or browser's
configuration panel. Also it can be turned on/off via JSAPIs. The design of
the interface would be like this:

JS_IsBranchProfilingEnabled(JSContext* cx);
JS_EnableBranchProfiling(JSContext* cx);
JS_DisableBranchProfiling(JSContext* cx);

Although BPM can only be switched on/off at JSContext granularity outside
SpiderMonkey, it stores a profiling status flag in IonScript so that
SpiderMonkey has the ability to profile a JSScript (IonScript)
individually.

*MIR Branch Pruning Pass*
The Branch Pruning Pass (BPP) is invoked right after the MIR is generated
by IonBuilder. Based on the probabilities converted from the branch
profiling buffer BPP speculatively removes unused or infrequently used
basic blocks. After the removal of "cold branches", BPP is able to reduce
the type sets and simplify the work of alias analysis.

The main drawback of pruning branches is that when the pruned branch is
taken a bailout will occur. The more basic blocks are removed the more
bailouts may occur. One goal of this project is to find out the balance
point between the aggressiveness of pruning and the costs of bailout.

*Schedule of Deliverables**Deliverables*
The final deliverables would consist of:
1. A patch that implements Circular Buffer utility and related test cases.
2. A patch that implements Branch Profiling Module and related test cases.
3. A patch that implements MIR Branch Pruning Pass and related test cases.
4. A technical report illustrating the relationships among the
aggressiveness of pruning, type inference, alias analysis and the costs of
bailout.

*Schedule*
Week 1 - Week 2: Discuss the project details with my mentor. Get familiar
with the JS Engine team.
Week 3 - Week 4: Implement the Circular Buffer. Test and fix bugs.
Week 5 - Week 6: Implement the Branch Profiling prototype; instrument the
machine codes generated by baseline compiler.
Week 7 - Week 8: Add the convert function to Branch Profiling class. Modify
related MIR nodes to store the probability data.
Week 9 - Week 11: Implement the prototype of Branch Pruning Pass. Discuss
the implementation details with my mentor.
Week 12 - Week 14: Benchmark the performance impact of BPM and BPP with
different pruning policies. Test and fix the bugs in BPP.
Week 15 - Week 16: Preparing the technical report. Submit patches.

*Open Source Development Experience*
TBD.

*Work/Internship Experience*
TBD.

*Academic Experience*
TBD.

*Why Me*
TBD.

*Why Mozilla*
TBD.
--
Best wishes,
Wei Wu (吴伟)

Jan de Mooij

unread,
Apr 27, 2013, 5:19:57 AM4/27/13
to Wei WU(吴伟), Nicolas Pierron, Luke Wagner, dev-tech-js-en...@lists.mozilla.org
Hi,

This looks really interesting. Not emitting code for cold branches could
not only help LICM and GVN but also register allocation by avoiding
unnecessary spills and moves.

However, my main concerns are:

(1) The circular buffer to collect branch information seems too
complicated. It would be simpler and faster to have the baseline compiler
just increment a counter at the start of every basic block (we can easily
get this information from the bytecode analysis). I think counters also
allow you to keep more useful data (one hot loop will quickly fill the
circular buffer and you lose information about other branches).

(2) Instead of pruning MIR from the graph, IonBuilder could just
immediately terminate "cold" basic blocks (very similar to what it does
when it reaches a JSOP_RETURN or JSOP_THROW). Just add a new MIR
instruction that unconditionally bails out.

(3) Before implementing anything, it would be good to look at some
benchmarks (Kraken, Octane, maybe Emscripten code) and see where we expect
the main perf wins. Then you can hack/prototype something just for that
particular case and see what happens.

Jan
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
>

Nicolas B. Pierron

unread,
Apr 27, 2013, 8:48:28 PM4/27/13
to
On 04/27/2013 01:04 AM, Wei WU(吴伟) wrote:
> I have finished the initial version of my proposal and submitted to GSoC
> website. I put a copy here in case you don't have permission to comment on
> it. You can also review my proposal via this link:
> https://github.com/lazyparser/gsoc2013/blob/master/proposal.md

This is Awesome™.

> Your comments and suggestions are welcome!
> *
> *
> *Title*
> Implement Profile Guided Optimization in IonMonkey

PGO is a catch-all term used by static compilers, and frequently associated
with branch prediction based on profiled code.

In the case of VM, TI might be included as a Profiled Guided Optimization,
as it monitor types during the execution.

May be “Implement Branch Prediction in IonMonkey” would be a better title.
I think it would be better to re-use the JSOPTION defined in jsapi.h,
instead of adding a new set of functions.

> Although BPM can only be switched on/off at JSContext granularity outside
> SpiderMonkey, it stores a profiling status flag in IonScript so that
> SpiderMonkey has the ability to profile a JSScript (IonScript)
> individually.
>
> *MIR Branch Pruning Pass*
> The Branch Pruning Pass (BPP) is invoked right after the MIR is generated
> by IonBuilder. Based on the probabilities converted from the branch
> profiling buffer BPP speculatively removes unused or infrequently used
> basic blocks. After the removal of "cold branches", BPP is able to reduce
> the type sets and simplify the work of alias analysis.
>
> The main drawback of pruning branches is that when the pruned branch is
> taken a bailout will occur. The more basic blocks are removed the more
> bailouts may occur. One goal of this project is to find out the balance
> point between the aggressiveness of pruning and the costs of bailout.
>
> *Schedule of Deliverables**Deliverables*
> The final deliverables would consist of:
> 1. A patch that implements Circular Buffer utility and related test cases.
> 2. A patch that implements Branch Profiling Module and related test cases.
> 3. A patch that implements MIR Branch Pruning Pass and related test cases.
> 4. A technical report illustrating the relationships among the
> aggressiveness of pruning, type inference, alias analysis and the costs of
> bailout.

As Jan suggested, The step 4 would good to have some idea of the relevance
of the approach before the end, and to come with estimates of the
improvements. One way to do it would be to instrument one of the most
promising benchmark (by modifying the JavaScript code) such as we can test
an approximation of the gain.

> *Schedule*
> Week 1 - Week 2: Discuss the project details with my mentor. Get familiar
> with the JS Engine team.
> Week 3 - Week 4: Implement the Circular Buffer. Test and fix bugs.
> Week 5 - Week 6: Implement the Branch Profiling prototype; instrument the
> machine codes generated by baseline compiler.

If you plan on taking 2 weeks for the circular buffer, then we should
probably go for simple counters on each branch target. Such as we can keep
more time for looking into induced issues of the approach.

Also Jan is correct, a really hot loop will erase the profiled information
unless it is summarized and attached to the script when the buffer becomes full.

> Week 7 - Week 8: Add the convert function to Branch Profiling class. Modify
> related MIR nodes to store the probability data.

I guess you mean Basic blocks and not MIR nodes.

> Week 9 - Week 11: Implement the prototype of Branch Pruning Pass. Discuss
> the implementation details with my mentor.
> Week 12 - Week 14: Benchmark the performance impact of BPM and BPP with
> different pruning policies. Test and fix the bugs in BPP.
> Week 15 - Week 16: Preparing the technical report. Submit patches.

Submitting patches to GSoC does not implies that patches have to be kept on
hold before the end of the GSoC ;) I would expect more than 3 large patches.

--
Nicolas B. Pierron

Wei WU(吴伟)

unread,
Apr 29, 2013, 9:56:18 AM4/29/13
to Jan de Mooij, Nicolas Pierron, Luke Wagner, dev-tech-js-en...@lists.mozilla.org
Hi,


On Sat, Apr 27, 2013 at 5:19 PM, Jan de Mooij <jande...@gmail.com> wrote:

> Hi,
>
> This looks really interesting. Not emitting code for cold branches could
> not only help LICM and GVN but also register allocation by avoiding
> unnecessary spills and moves.
>
> However, my main concerns are:
>
> (1) The circular buffer to collect branch information seems too
> complicated. It would be simpler and faster to have the baseline compiler
> just increment a counter at the start of every basic block (we can easily
> get this information from the bytecode analysis). I think counters also
> allow you to keep more useful data (one hot loop will quickly fill the
> circular buffer and you lose information about other branches).
>
To avoid the "hot loop" problem we can allocate a counter for each address
in the buffer and increment the counter if this address have been pushed
into the buffer. We can just compare the new address with two or three
newly pushed addresses so the overhead would not be high. But this design
is more complicated, unfortunately.
Circular buffer can provide some sorts of path profiling data, which might
be useful in some optimizations. On the other hand, I agree that the basic
block counters are easier to implement and able to provide enough
information for branch prediction.

>
> (2) Instead of pruning MIR from the graph, IonBuilder could just
> immediately terminate "cold" basic blocks (very similar to what it does
> when it reaches a JSOP_RETURN or JSOP_THROW). Just add a new MIR
> instruction that unconditionally bails out.
>
This approach is fresh to me and seems easier than pruning. I think they
share some common tasks when a basic block is marked "cold": remove the
instructions in the basic block, update use-def chains, reduce type sets
and phi nodes. Pruning MIR from the graph have more work to do and might
gain extra optimization opportunities.
I would like to implement the "terminate" approach first and see whether it
works or not. Then we can implement the "pruning" approach quickly, based
on the "terminate" approach, if necessary.

>
> (3) Before implementing anything, it would be good to look at some
> benchmarks (Kraken, Octane, maybe Emscripten code) and see where we expect
> the main perf wins. Then you can hack/prototype something just for that
> particular case and see what happens.
>
Currently I'm searching Kraken to find some functions for case study but
the progress is quite slow. I could not find such profiling tools that can
show the frequencies of each branch so I literally read the JavaScript code
to make guesses. I use the Gecko profiler, JIT Inspector and IonGraph to
accelerate my work. Could you recommend me some utilities? That would be
great help.

>
> Jan
>
>
>
>
> On Sat, Apr 27, 2013 at 10:04 AM, Wei WU(吴伟) <lazyp...@gmail.com> wrote:
>
>> Hi,
>>
>> I have finished the initial version of my proposal and submitted to GSoC
>> website. I put a copy here in case you don't have permission to comment on
>> it. You can also review my proposal via this link:
>> https://github.com/lazyparser/gsoc2013/blob/master/proposal.md
>>
>> Your comments and suggestions are welcome!
>> *
>> *
>> *Title*
>>
>> Implement Profile Guided Optimization in IonMonkey
>>
>> Although BPM can only be switched on/off at JSContext granularity outside
>> SpiderMonkey, it stores a profiling status flag in IonScript so that
>> SpiderMonkey has the ability to profile a JSScript (IonScript)
>> individually.
>>
>> *MIR Branch Pruning Pass*
>>
>> The Branch Pruning Pass (BPP) is invoked right after the MIR is generated
>> by IonBuilder. Based on the probabilities converted from the branch
>> profiling buffer BPP speculatively removes unused or infrequently used
>> basic blocks. After the removal of "cold branches", BPP is able to reduce
>> the type sets and simplify the work of alias analysis.
>>
>> The main drawback of pruning branches is that when the pruned branch is
>> taken a bailout will occur. The more basic blocks are removed the more
>> bailouts may occur. One goal of this project is to find out the balance
>> point between the aggressiveness of pruning and the costs of bailout.
>>
>> *Schedule of Deliverables**Deliverables*
>>
>> The final deliverables would consist of:
>> 1. A patch that implements Circular Buffer utility and related test cases.
>> 2. A patch that implements Branch Profiling Module and related test cases.
>> 3. A patch that implements MIR Branch Pruning Pass and related test cases.
>> 4. A technical report illustrating the relationships among the
>> aggressiveness of pruning, type inference, alias analysis and the costs of
>> bailout.
>>
>> *Schedule*
>>
>> Week 1 - Week 2: Discuss the project details with my mentor. Get familiar
>> with the JS Engine team.
>> Week 3 - Week 4: Implement the Circular Buffer. Test and fix bugs.
>> Week 5 - Week 6: Implement the Branch Profiling prototype; instrument the
>> machine codes generated by baseline compiler.
>> Week 7 - Week 8: Add the convert function to Branch Profiling class.
>> Modify
>> related MIR nodes to store the probability data.
>> Week 9 - Week 11: Implement the prototype of Branch Pruning Pass. Discuss
>> the implementation details with my mentor.
>> Week 12 - Week 14: Benchmark the performance impact of BPM and BPP with
>> different pruning policies. Test and fix the bugs in BPP.
>> Week 15 - Week 16: Preparing the technical report. Submit patches.
>>

Nicolas B. Pierron

unread,
Apr 29, 2013, 4:22:49 PM4/29/13
to
On 04/29/2013 06:56 AM, Wei WU(吴伟) wrote:
> Hi,
>
>
> On Sat, Apr 27, 2013 at 5:19 PM, Jan de Mooij <jande...@gmail.com> wrote:
>
>> Hi,
>>
>> This looks really interesting. Not emitting code for cold branches could
>> not only help LICM and GVN but also register allocation by avoiding
>> unnecessary spills and moves.
>>
>> However, my main concerns are:
>>
>> (2) Instead of pruning MIR from the graph, IonBuilder could just
>> immediately terminate "cold" basic blocks (very similar to what it does
>> when it reaches a JSOP_RETURN or JSOP_THROW). Just add a new MIR
>> instruction that unconditionally bails out.
>>
> This approach is fresh to me and seems easier than pruning. I think they
> share some common tasks when a basic block is marked "cold": remove the
> instructions in the basic block, update use-def chains, reduce type sets
> and phi nodes. Pruning MIR from the graph have more work to do and might
> gain extra optimization opportunities.
> I would like to implement the "terminate" approach first and see whether it
> works or not. Then we can implement the "pruning" approach quickly, based
> on the "terminate" approach, if necessary.

I am not a big fan of adding that to IonBuilder, the main reason is that
IonBuilder is currently a huge pieces of code and I wish we could move the
Stack-based type information and MIR specialization out of the way, as
separated MIR phases.

The other reason is that it would be harder to integrate the branch pruning
inside IonBuilder as we are trying to recover information from the bytecode,
and we are extremely tied to the bytecode interpretation, I am afraid this
will imply changes to the re-AST-ify logic.

>>
>> (3) Before implementing anything, it would be good to look at some
>> benchmarks (Kraken, Octane, maybe Emscripten code) and see where we expect
>> the main perf wins. Then you can hack/prototype something just for that
>> particular case and see what happens.
>>
> Currently I'm searching Kraken to find some functions for case study but
> the progress is quite slow. I could not find such profiling tools that can
> show the frequencies of each branch so I literally read the JavaScript code
> to make guesses. I use the Gecko profiler, JIT Inspector and IonGraph to
> accelerate my work. Could you recommend me some utilities? That would be
> great help.
>

I made some profiling on PdfJS back in Novemeber. The idea was to have a
uniq identifier for each basic block of each script to dump the profile
information data. (see masm.printf and CogeGenerator::generateBody)

The raw info that I have does not alias basic blocks between compilations,
but I am sure we can add it on the current results that I put online[1].
This file contains what would have been stored by an infinite buffer
collecting the block numbers visited in Ion.

It would be interesting just to note the number of blocks which are never
visited in these benchmarks as well as if some path are rarely visited and
the impact on performances if we were to bailout on these without the
ability to re-enter. (such as we probably don't want to bail if the next
loop is hot and we do not have an OSR entry-point for it)

[1] http://people.mozilla.org/~npierron/pdfjs-pgo.tar.bz2

--
Nicolas B. Pierron

0 new messages