[LLVMdev] GSOC Adaptive Compilation Framework for LLVM JIT Compiler


Xin Tong Utoronto

Mar 29, 2011, 7:35:07 AM
to llv...@cs.uiuc.edu, Xin Tong

Project Description:


LLVM has gained much popularity in the programming languages and compiler industry since it was first developed. Many researchers have used LLVM as a framework for their research, and many languages have been ported to LLVM IR and interpreted, Just-in-Time compiled, or statically compiled to native code. One of the current drawbacks of the LLVM JIT is the lack of an adaptive compilation system. All the non-adaptive pieces are already there in LLVM: an optimizing compiler with different types of instruction selectors, register allocators, pre-RA schedulers, etc., and a full set of optimizations selectable at runtime. What is left is a system that can track and dynamically look up the hotness of methods and recompile them with more expensive optimizations as they are executed over and over. This should improve both program startup time and execution time, and will benefit all ported languages that intend to use the LLVM JIT as one of their execution methods.


Project Outline:


Currently, the LLVM JIT serves as a management layer for the executed LLVM IR: it manages the compiled code and calls the LLVM code generator to do the real work. The LLVM code generator offers several optimization levels, and depending on how much optimization it is asked to do, the time taken may vary significantly. The adaptive compilation mechanism should be able to detect when a method is getting hot and compile or recompile it at the appropriate optimization level, and this should happen transparently to the running application. To keep track of how many times a JITed function is called, instrumentation code is inserted into the function's LLVM bitcode before it is sent to the code generator. This code increments a counter each time the function is called, and when the counter reaches a threshold, the function gives control back to the LLVM JIT. The JIT then looks at the hotness of all the methods, finds the one that triggered the recompilation threshold, and can choose to raise its optimization level based on the algorithm below or some other algorithm developed later.


IF (getCompilationCount(method) > 50 in the last 100 samples) => Recompile at Aggressive
ELSE => Recompile at the next optimization level.
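
To make this concrete, here is a minimal sketch of what the counter instrumentation could look like, written against a recent LLVM IRBuilder API. The function insertEntryCounter and the hot-notification callee are illustrative names only, not existing code:

#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

// Insert "if (++Counter >= Threshold) OnHot();" at the entry of F.
static void insertEntryCounter(Function &F, GlobalVariable *Counter,
                               FunctionCallee OnHot, uint64_t Threshold) {
  BasicBlock &Entry = F.getEntryBlock();
  IRBuilder<> B(&Entry, Entry.begin());
  Value *Old = B.CreateLoad(B.getInt64Ty(), Counter, "count");
  Value *New = B.CreateAdd(Old, B.getInt64(1), "count.next");
  B.CreateStore(New, Counter);
  Value *IsHot = B.CreateICmpUGE(New, B.getInt64(Threshold), "is.hot");
  // Split the entry block and branch to a small block that notifies the JIT.
  Instruction *ThenTerm = SplitBlockAndInsertIfThen(
      IsHot, &*B.GetInsertPoint(), /*Unreachable=*/false);
  IRBuilder<> ThenB(ThenTerm);
  ThenB.CreateCall(OnHot); // gives control back to the LLVM JIT
}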


Even though the invocation counting introduces a few extra lines of binary, the advantages of adaptive optimization should far outweigh them. Note that the adaptive compilation framework I propose here is orthogonal to LLVM's profile-guided optimizations. Profile-guided optimization is a technique for optimizing code using profiling or other external information, whereas the adaptive compilation framework is concerned with the level of optimization rather than how the optimizations are performed.


Project Timeline:


This is a relatively small project and does not involve a lot of coding, but a good portion of the time will be spent benchmarking, tuning and experimenting with different algorithms, i.e. what the algorithm should be for raising the compilation level when a method's recompilation threshold is reached, whether we can make this algorithm adaptive too, etc. Therefore, my timeline for the project is as follows:


Week 1
Benchmarking the current LLVM JIT compiler, measuring compilation speed differences between the different levels of compilation. This information is required to understand why one heuristic will outperform another.


Week 2
Reading the LLVM Execution Engine and Code Generator code. Designing the LLVM adaptive compilation framework.


Week 3 - 9
Implementing and testing the LLVM adaptive compilation framework. The general idea of the framework is described in the project outline above.


Week 10 - 13
Benchmarking, tuning and experimenting with different recompilation algorithms. Typical benchmarking test cases would be


Week 14
Test and organize code. Documentation


Overall Goals:

 

My main goal by the end of the summer is to have an automated profiling and adaptive compilation framework for the LLVM JIT. Even though the performance improvements are still unclear at this point, I believe this adaptive compilation framework will give noticeable performance benefits, as the current JIT compilation is either too simple to produce reasonably fast code or too expensive to apply to all functions.

 

Background:

 

I have some experience with the Java Just-In-Time compiler and some experience with LLVM. I have included my CV for your reference. I don't have a specific mentor in mind, but I imagine that the existing mentors from LLVM would be extremely helpful.

 

 


Xin Tong

 

Email: x.t...@utoronto.ca

 

 

 

Creative, quality-focused Computer Engineering student brings a strong blend of programming, design and analysis skills. Offers solid understanding of best practices at each stage of the software development lifecycle. Skilled at recognizing and resolving design flaws that have the potential to create downstream maintenance, scalability and functionality issues. Adept at optimizing complex system processes and dataflows, taking the initiative to identify and recommend design and coding modifications to improve overall system performance. Excels in dynamic, deadline-sensitive environments that demand resourcefulness, astute judgement, and self-motivated quick study talents. Utilizes excellent time management skills to balance a demanding academic course of studies with employment and volunteer pursuits, achieving excellent results in all endeavours.

 

STRENGTHS & EXPERTISE

 

Compiler Construction • Compiler Optimization • Computer Architecture • Bottleneck Analysis & Solutions

Coding & Debugging • Workload Prioritization •  Team Collaboration & Leadership   

Software Testing & Integration  • Test-Driven Development

 

EDUCATION & CREDENTIALS

 

BACHELOR OF COMPUTER ENGINEERING

University of Toronto, Toronto, ON, Expected Completion 2011

Compilers · Operating Systems · Computer Architecture

 

 

Cisco Certified Networking Associate, July 2009

 

PROFESSIONAL EXPERIENCE

 

Java Virtual Machine JIT Developer, Aug 2010 - May 2011

IBM, Toronto, Canada

 

  • Working on the PowerPC code generator of the IBM Just-in-Time compiler for the Java Virtual Machine.
  • Benchmarking Just-in-Time compiler performance, analyzing and fixing possible regressions.
  • Triaging and fixing defects in the Just-in-Time compiler.
  • Acquiring hands-on experience with PowerPC assembly and PowerPC binary debugging with gdb and other related tools.

 

 

Java Virtual Machine Developer, Extreme Blue, May 2010 - Aug 2010

IBM, Ottawa, Canada

  • Architected a multi-tenancy solution for the IBM J9 Java Virtual Machine, hosting multiple applications within one Java Virtual Machine. Designed solutions to provide good tenant isolation and resource control for all tenants running in the same Java Virtual Machine.
  • Worked on Java class libraries and different components of J9 Java Virtual Machine, including threading library, garbage collector, interpreter, etc.

 

 

 


 

Graphics Compiler Developer, May 2009 - May 2010

Qualcomm, San Diego, USA

  • Recruited for an internship position with this multinational telecommunications company to work on their C++ compiler project.
  • Developed a static verifier program which automatically generates and adds intermediate-language code to test programs to make them self-verifying. The test programs are then used to test the C++ compiler, ensuring that it compiles code correctly.
  • Utilized in-depth knowledge of LLVM systems and algorithms to generate elegant and robust code.

 

 

 

ACADEMIC PROJECTS

 

COMPILER OPTIMIZER IMPLEMENTATION (Dec. 2010 – Apr 2011) : Implemented a compiler optimizer on the SUIF framework. Implemented control flow analysis, data flow analysis, loop invariant code motion, global value numbering, loop unrolling and various other local optimizations.

 

GPU COMPILER IMPLEMENTATION (Sept. – Dec. 2010): Implemented a GPU compiler that compiles a subset of the GLSL language to the ARB language, which can then be executed on a GPU. Wrote the scanner and parser using Lex and Yacc, and a code generator in an OOP fashion.

 

Malloc Library Implementation (Oct. – Nov. 2008): Leveraged a solid understanding of the best-fit algorithm and linked-list data structures to design a malloc library for dynamic memory allocation. Implemented the library in C, with robust and clear code across roughly 1000 lines. Optimized the library at the code level to obtain a 6% increase in allocation throughput. Used trace files and drivers to test and evaluate the library's throughput and memory utilization.

 

 

 

COMPUTER SKILLS

 

Programming Languages

C · C++ · Java

Operating Systems

Linux

Software Tools

GDB · GCC   

 

 

Extracurricular Activities

 

Elected Officer, Institute of Electrical & Electronics Engineers, University of Toronto Branch, Since May 2009

Member, Institute of Electrical & Electronics Engineers, Since 2007

Member, University of Toronto E-Sports Club, 2007

Member, University of Toronto Engineering Chinese Culture Club, 2007

Member, University of Toronto Robotics Club, 2007

--
Kind Regards

Xin Tong

Reid Kleckner

Mar 29, 2011, 1:25:53 PM
to Xin Tong Utoronto, Xin Tong, llv...@cs.uiuc.edu
Neat project idea!

When we used the JIT for Unladen Swallow, one of our pain points was that code generation took a very long time. Just turning off the IR optimizations had negligible impact on compilation time.  Things may have changed these days, but I would start by having the adaptive JIT splat out code with fast isel and fast regalloc.
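
For reference, requesting the cheapest pipeline is roughly a one-liner through EngineBuilder. This is only a sketch; the exact spelling has shifted across LLVM versions, and error handling is omitted:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
using namespace llvm;

// CodeGenOpt::None selects fast isel and the fast register allocator,
// trading code quality for compile time. M is the Module to be JITed.
ExecutionEngine *EE = EngineBuilder(M)
                          .setEngineKind(EngineKind::JIT)
                          .setOptLevel(CodeGenOpt::None)
                          .create();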

I see that you put it outside the scope of your project, but eventually it would be nice if the first-pass code could be instrumented in a way that gathers information for the as-yet unimplemented profile-guided optimizations.

Reid

On Tue, Mar 29, 2011 at 7:35 AM, Xin Tong Utoronto <x.t...@utoronto.ca> wrote:
> ...

Eric Christopher

Mar 29, 2011, 4:45:34 PM
to Xin Tong Utoronto, Xin Tong, llv...@cs.uiuc.edu
>
> Project Outline:
>
>
>
> Currently, the LLVM JIT serves as a management layer for the executed LLVM IR, it manages the compiled code and calls the LLVM code generator to do the real work. There are levels of optimizations for the LLVM code generator, and depends on how much optimizations the code generator is asked to do, the time taken may vary significantly. The adaptive compilation mechanism should be able to detect when a method is getting hot, compiling or recompiling it using the appropriate optimization levels. Moreover, this should happen transparently to the running application. In order to keep track of how many times a JITed function is called. This involves inserting instrumentational code into the function's LLVM bitcode before it is sent to the code generator. This code will increment a counter when the function is called. And when the counter reaches a threshold, the function gives control back to the LLVM JIT. Then the JIT will look at the hotness of all the methods and find the one that triggered the recompilation threshold. The JIT can then choose to raise the level of optimization based on the algorithm below or some other algorithms developed later.
>
>
> IF (getCompilationCount(method) > 50 in the last 100 samples) = > Recompile at Aggressive
> ELSE Recompile at the next optimization level.
>
>
> Even though the invocation counting introduces a few lines of binary, but the advantages of adaptive optimization should far overweigh the extra few lines of binary introduced. Note the adaptive compilation framework I propose here is orthogonal to the LLVM profile-guided optimizations. The profile-guided optimization is a technique used to optimize code with some profiling or external information. But the adaptive compilation framework is concerned with level of optimizations instead of how the optimizations are to be performed.
>

So, one way that current projects use the JIT is via getPointerToFunction(), which returns an address that can then be cast and called with the appropriate arguments. The compile task itself is often done on a separate thread. How would you deal with the updating problem in the calling application? What sort of use cases for the JIT have you looked at so far?

>
> This is a relatively small project and does not involve a lot of coding, but good portion of the time will be spent benchmarking, tuning and experimenting with different algorithms, i.e. what would be the algorithm to raise the compilation level when a method recompilation threshold is reached, can we make this algorithm adaptive too, etc. Therefore, my timeline for the project is as follow
>
>
> Week 1
> Benchmarking the current LLVM JIT compiler, measuring compilation speed differences for different levels of compilation. This piece of information is required to understand why a heuristic will outperform others
>

> Week 10 - 13
> Benchmarking, tuning and experimenting with different recompilation algorithms. Typically benchmarking test cases would be
>

What do you have in mind for benchmarking? Which of the jitted problems were you looking at, or just running large programs through lli and that interface? (Which isn't threaded and therefore doesn't have the problems I mentioned above - it has other problems).

>
> Week 14
> Test and organize code. Documentation
>

As a general note all of these things would need to be done during the project along with incremental changes made to the repository (on a branch if possible).

> Overall Goals:
>
>
> My main goal at the end of the summer is to have an automated profiling and adaptive compilation framework for the LLVM. Even though the performance improvements are still unclear at this point, I believe that this adaptive compilation framework will definitely give noticeable performance benefits, as the current JIT compilation is either too simple to give a reasonably fast code or too expensive to apply to all functions.

My comments above aside, I think this is a great idea for a project. It is aggressive so the amount of time you put in will likely be larger than a scaled back project.

-eric

Xin Tong Utoronto

Mar 31, 2011, 5:35:36 PM
to Eric Christopher, llv...@cs.uiuc.edu
On Tue, Mar 29, 2011 at 4:45 PM, Eric Christopher <echr...@apple.com> wrote:
> <snip project outline and recompilation heuristic>

So, one way that current projects use the JIT is via getPointerToFunction(), which returns an address that can then be cast and called with the appropriate arguments. The compile task itself is often done on a separate thread. How would you deal with the updating problem in the calling application? What sort of use cases for the JIT have you looked at so far?

I assume the updating problem means the problem when a method gets recompiled. Here is an algorithm to deal with that. Say A calls B. When B gets recompiled, we patch B with a 'br helper' at the beginning of its code. Then, when A calls B, B branches to the helper, and the helper patches the 'br B' in A with 'br newB'. As we don't know all the callers of B, we have to wait until they call B to learn who they are and patch them one by one. The helper can get the address of the 'br B' in A from the link register or some specific registers or memory locations. For newly compiled code, the address of newB can be used. There is another problem with recompilation: obsolete methods (methods that have recompiled copies) need to be recycled. In order to do that, we need to keep a 'br helper' in place of the old method and reclaim the old method body.

As for use cases, the LLVM JIT is used as an execution engine for a number of ported languages, for example the JIT compiler for PHP from the 2008 GSoC. There are also people using the LLVM JIT for industry work: https://llvm.org/svn/llvm-project/www-pubs/trunk/2010-01-Wennborg-Thesis.pdf. As LLVM grows more and more powerful, the LLVM JIT will become more and more attractive to language designers and implementers, and I think that is one of the most important reasons we need an adaptive compilation framework. This framework can also work together with the LLVM profile-guided optimizations to make the LLVM JIT a much faster execution engine.

> <snip project timeline>

What do you have in mind for benchmarking? Which of the jitted problems were you looking at, or just running large programs through lli and that interface? (Which isn't threaded and therefore doesn't have the problems I mentioned above - it has other problems).

Widely known benchmarks, such as SPEC CPU, would be good candidates. In addition to these, we may want to introduce some tests specific to Just-In-Time compilers: ones where a small portion of the methods takes up 80%+ of the time, ones where all the methods spend about the same amount of time, and ones in between the two.

>
> Week 14
> Test and organize code. Documentation
>

As a general note all of these things would need to be done during the project along with incremental changes made to the repository (on a branch if possible).

> <snip overall goals>

My comments above aside, I think this is a great idea for a project. It is aggressive so the amount of time you put in will likely be larger than a scaled back project.

From the questions you asked, I now understand why this project might take more time than I originally anticipated. Thank you.

- Xin
 

-eric

Eric Christopher

Mar 31, 2011, 6:47:53 PM
to Xin Tong Utoronto, llv...@cs.uiuc.edu
>
>> So, one way that current projects use the JIT is via getPointerToFunction() which returns an address that can then be casted and called with the appropriate arguments. The compile task itself is often done on a separate thread. How would you deal with the updating problem in the calling application? What sort of use cases for the JIT have you looked at so far?
>>
> I assume the updating problem means the problem when a method gets recompiled. Here is an algorithm to deal with that. Say A calls B. when B gets recompiled we patch B with br helper at the beginning of its code, then when A calls B, B branches to the helper and the helper patches the br B in A with br newB. as we don't know all the callers of B, we have to wait until they call B to know who they are and patch them one-by-one. The helper can get the address of the br B in A from the link register or some specific registers or memory locations. For newly compiled code, the address of the newB can be used. There is another problem with recompilation. obsolete methods(methods that have recompiled copies) need to be recycled. In order to do that, we will need to keep a br helper in place of the old method and reclaim the old method body.
>

This all assumes that you have control over where the parent (for lack of a better term) calls the function you're compiling. This method of replacement only works when you call a stub in place of the function - since the JIT owns the stub you'll have a relocation to replace. If you are giving an actual address that is the real start of the function this won't work since you'll have no way of updating.

Just some food for thought.

> As for use case, the LLVM JIT is used as an execution engine for a few number of ported languages, for example JIT compiler for PHP, in 2008 GSOC. There are also people using LLVM JIT for industry work, https://llvm.org/svn/llvm-project/www-pubs/trunk/2010-01-Wennborg-Thesis.pdf. As LLVM is growing more and more powerful, LLVM JIT will become more and more attractive to language designer and implementer. And I think that is one of the most important reasons we need to have an adaptive compilation framework. This framework can also work together with the LLVM profile-guided optimizations to make LLVM JIT a much faster execution engine.
>

Heh. Not quite what I meant by use cases :) I meant some "ways that you expect people will take the address of the code you are providing to run".

>> What do you have in mind for benchmarking? Which of the jitted problems were you looking at, or just running large programs through lli and that interface? (Which isn't threaded and therefore doesn't have the problems I mentioned above - it has other problems).
>
> Widely known benchmarks, such as SPEC CPU, would be good candidates. In addition to these benchmarks, we may want to introduce some specific tests for Just-In-Time compilers, like ones with a small portions of the methods taking up 80%+ of the time and ones with all the methods spend about the same amount of time and ones in the middle of the two.
>

*nod* You'll also want to test short lifetime code to make certain that you aren't regressing the performance of that too much as well.

> From the questions you asked, I now understand why this project might take more time than I originally anticipated. Thank You.

Thanks for looking into it. I think it's a great idea for a GSoC project.

Xin Tong Utoronto

Mar 31, 2011, 11:06:41 PM
to Eric Christopher, llv...@cs.uiuc.edu
On Thu, Mar 31, 2011 at 6:47 PM, Eric Christopher <echr...@apple.com> wrote:
> <snip earlier discussion>

This all assumes that you have control over where the parent (for lack of a better term) calls the function you're compiling. This method of replacement only works when you call a stub in place of the function - since the JIT owns the stub you'll have a relocation to replace. If you are giving an actual address that is the real start of the function this won't work since you'll have no way of updating.

Just some food for thought.

No, we will always have control over where the parent calls the functions that we are recompiling, as explained in the example below.

Original Code 

Binary for A:         Binary for B:           
  
...                         ...         
...                         ...
br B                      ...
...                         ...
...

After B is recompiled, we patch the entry of B with br helper

Binary for A:         Binary for B:        Binary for Recompiled B
...                        br helper              ...
...                        ...                        ...
br B                     ...                        ...
...
...


Now when the parent A calls B, B branches to the helper; we get the address of the 'br B' from the return address pushed onto the stack or saved in the link register. The helper then patches the call site in A.

After Patching 

Binary for A:               Binary for B:        Binary for Recompiled B 
...                               br helper             ...
...                               ...                       ...
br Recompiled B          ...                       ...
...                       
...
  

> <snip remainder of quoted message>

Eric Christopher

Mar 31, 2011, 11:07:43 PM
to Xin Tong Utoronto, llv...@cs.uiuc.edu
> <snip the patching example from the previous message>

Your helper is what I'm calling a stub (and what they're called in the jit at the moment).

Xin Tong Utoronto

Mar 31, 2011, 11:13:36 PM
to Eric Christopher, llv...@cs.uiuc.edu
Then we would always have the location of the br B instruction in A, as it is pushed onto the stack or saved in the link register when A calls B.

Eric Christopher

Mar 31, 2011, 11:15:23 PM
to Xin Tong Utoronto, llv...@cs.uiuc.edu
>
>
> Then we would always have the location of the br B instruction in A, as it is pushed onto the stack or saved in link register when A calls B.
>

Right, unless you wanted to go with direct calls in the JIT. I don't know that inspecting a running program's stack for return values while compiling on a separate thread is sane :)

That said, the adaptive compilation strategy can always enforce the use of a stub for calls into jitted functions.

Reid Kleckner

Apr 1, 2011, 12:38:39 PM
to Eric Christopher, Xin Tong Utoronto, llv...@cs.uiuc.edu
On Thu, Mar 31, 2011 at 11:15 PM, Eric Christopher <echr...@apple.com> wrote:
>>
>>
>> Then we would always have the location of the br B instruction in A, as it is pushed onto the stack or saved in link register when A calls B.
>>
>
> Right, unless you wanted to go with direct calls in the JIT. I don't know that inspecting a running program's stack for return values while compiling on a separate thread is sane :)
>
> That said, the adaptive compilation strategy can always enforce the use of a stub for calls into jitted functions.

I think you said what I was going to say, but here's how I see it:

getPointerToFunction always returns the address of the stub, so any
indirect calls to the function hit the stub. The stub will attempt to
backpatch the callsite, and should fail if it is a) outside the JIT or
b) an indirect call. Indirect calls will always go through the stub.
Direct calls can be backpatched.

Alternatively, everything would go through the stub.

Finally, you have to be careful about the memory model of the
architecture you're dealing with if you care about threads. On x86,
if you can align the operand of the call instruction, you can do an
atomic update. You will also have to make sure you don't free the old
machine code while there are still functions executing in it. An easy
(yet inefficient) way to deal with this is to update a refcount on
function entry/exit.
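
For illustration, a sketch of the aligned atomic update on x86-64, assuming a 5-byte "call rel32" whose 4-byte displacement is 4-byte aligned (patchCallSite is a hypothetical helper, and treating code bytes as a std::atomic is an implementation-level trick rather than strictly portable C++):

#include <atomic>
#include <cstdint>

// Retarget a 5-byte x86-64 "call rel32" (opcode E8) at CallInst to NewTarget.
// With the displacement 4-byte aligned, the store is one atomic write, so a
// racing thread sees either the old or the new target, never a mix.
static void patchCallSite(uint8_t *CallInst, uint8_t *NewTarget) {
  int32_t Rel = static_cast<int32_t>(NewTarget - (CallInst + 5));
  auto *Disp = reinterpret_cast<std::atomic<int32_t> *>(CallInst + 1);
  Disp->store(Rel, std::memory_order_release);
}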

Reid

Eric Christopher

Apr 1, 2011, 1:26:28 PM
to Reid Kleckner, Xin Tong Utoronto, llv...@cs.uiuc.edu

On Apr 1, 2011, at 9:38 AM, Reid Kleckner wrote:

> <snip Reid's message above>

Well said :)

-eric

Xin Tong Utoronto

Apr 3, 2011, 3:01:14 PM
to Eric Christopher, llv...@cs.uiuc.edu
On Fri, Apr 1, 2011 at 1:26 PM, Eric Christopher <echr...@apple.com> wrote:

> <snip conversation above>

Another way to do the patching is to first atomically insert a self-looping 'jmp -2' (the jump takes 2 bytes, and a 2-byte write is atomic on x86) at the old branch address, so that any thread reaching this point spins in place; then copy in the new compiled function address; and finally re-patch the 'jmp -2' with the correct bytes.
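
In code, the sequence could look like this sketch (x86 only; 0xEB 0xFE is the 2-byte self-loop "jmp -2"; assumes Entry is 2-byte aligned and Len >= 2, and a real implementation would also need instruction-cache serialization):

#include <atomic>
#include <cstdint>
#include <cstring>

// Patch Len bytes at Entry with NewCode while other threads may be running.
static void patchWithSpinGuard(uint8_t *Entry, const uint8_t *NewCode,
                               size_t Len) {
  auto *First2 = reinterpret_cast<std::atomic<uint16_t> *>(Entry);
  // Step 1: park any thread reaching Entry in a tight self-loop (EB FE).
  First2->store(0xFEEB, std::memory_order_release);
  // Step 2: write everything after the first two bytes.
  std::memcpy(Entry + 2, NewCode + 2, Len - 2);
  // Step 3: publish the real first two bytes, releasing spinning threads.
  uint16_t Real;
  std::memcpy(&Real, NewCode, sizeof(Real));
  First2->store(Real, std::memory_order_release);
}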


Eric Christopher

Apr 3, 2011, 3:11:40 PM
to Xin Tong Utoronto, llv...@cs.uiuc.edu

On Apr 3, 2011, at 12:01 PM, Xin Tong Utoronto wrote:

> Another way to do the patching is to first atomically insert a self-looping 'jmp -2' (the jump takes 2 bytes, and a 2-byte write is atomic on x86) at the old branch address, so that any thread reaching this point spins in place; then copy in the new compiled function address; and finally re-patch the 'jmp -2' with the correct bytes.

Heh. That'd work on x86 at least. I don't know about other architectures.

Stephen Kyle

Apr 4, 2011, 1:19:01 PM
to Xin Tong Utoronto, Xin Tong, llv...@cs.uiuc.edu


Hi Xin,

If I understand the above correctly, this basically means that whenever an application calls a function it's been given by getPointerToFunction(), there's a possibility the function is recompiled with more aggressive optimisations, should that function meet some hotness threshold. Does the application have to wait while this compilation takes place, before the function it called is actually executed?

If so, it's nice that recompilation is transparent to the application, and so functions just magically become faster over time, but stalling the application like this may not be desirable. 

I've added an adaptive optimisation system to an instruction set simulator developed at my university which heavily relies on LLVM for JIT compilation. It performs all the compilation in a separate thread from where the interpretation of the simulated program is taking place, meaning it never needs to wait for any compilation. Adaptive reoptimisation also takes place in a separate thread, and this has caused me a multitude of headaches, but I digress...

Basically: if the initial compilation is done in a separate thread, can you ensure that any adaptive reoptimisation also happens asynchronously, or will such use cases have to do without your system?

Cheers,
Stephen

Xin Tong Utoronto

Apr 4, 2011, 1:42:49 PM
to Stephen Kyle, llv...@cs.uiuc.edu

Functions will have to meet some hotness threshold before they are recompiled at a higher optimization level. The application does not have to wait for the compilation to finish, as the compilation will be done asynchronously in a different thread; the application uses the current (less optimized) copy in the meantime and the more optimized copy later. Thank you for the suggestion.
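
A sketch of that arrangement, assuming callers always enter through an atomically updated entry pointer (all names hypothetical):

#include <atomic>
#include <thread>

using FnPtr = int (*)(int);
static std::atomic<FnPtr> Entry; // callers invoke Entry.load()(arg)

// Kick off an optimized recompile without blocking the caller; until the
// worker finishes, calls keep hitting the existing, less optimized copy.
static void recompileAsync(FnPtr (*CompileOptimized)()) {
  std::thread([CompileOptimized] {
    FnPtr Better = CompileOptimized();              // expensive compile
    Entry.store(Better, std::memory_order_release); // swap in when ready
  }).detach();
}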

Xin

Owen Anderson

Apr 4, 2011, 1:49:53 PM
to LLVMdev List

On Apr 3, 2011, at 12:11 PM, Eric Christopher wrote:

> <snip conversation about call patching>

It seems to me that there's a general feature here that LLVM is lacking, that would be useful in a number of JIT-compilation contexts, namely the ability to mark certain instructions (direct calls, perhaps branches too) as back-patchable.

The thing that stands out to me is that back-patching a call or branch in a JIT'd program is very much like relocation resolution that a dynamic linker does at launch time for a statically compiled program. It seems like it might be possible to build on top of the runtime-dyld work that Jim has been doing for the MC-JIT to facilitate this. Here's the idea:

Suppose we had a means of tagging certain calls (and maybe branches) as explicitly requiring relocations. Any back-patchable call would have a relocation in the generated code, and the MC-JIT would be aware of the location and type of the relocations, and rt-dyld would handle the upfront resolution. Backpatching, then, is just a matter of updating the resolution for a given symbol, and asking rt-dyld to re-link the executable code.

Thoughts?

--Owen

Jim Grosbach

Apr 4, 2011, 3:47:56 PM
to Owen Anderson, LLVMdev List

On Apr 4, 2011, at 10:49 AM, Owen Anderson wrote:

>
> On Apr 3, 2011, at 12:11 PM, Eric Christopher wrote:
>
>> <snip conversation about call patching>
>
> It seems to me that there's a general feature here that LLVM is lacking, that would be useful in a number of JIT-compilation contexts, namely the ability to mark certain instructions (direct calls, perhaps branches too) as back-patchable.
>
> The thing that stands out to me is that back-patching a call or branch in a JIT'd program is very much like relocation resolution that a dynamic linker does at launch time for a statically compiled program. It seems like it might be possible to build on top of the runtime-dyld work that Jim has been doing for the MC-JIT to facilitate this. Here's the idea:
>
> Suppose we had a means of tagging certain calls (and maybe branches) as explicitly requiring relocations. Any back-patchable call would have a relocation in the generated code, and the MC-JIT would be aware of the location and type of the relocations, and rt-dyld would handle the upfront resolution. Backpatching, then, is just a matter of updating the resolution for a given symbol, and asking rt-dyld to re-link the executable code.
>

You nailed it. This is basically the direction MCJIT is heading down for address resolution (code coming soon to a tree near you). You're right that the other half of the problem is making sure that all relevant instructions have explicit relocations. That's part of the motivation for using MachO as a container, at least at first, for the MCJIT, as it effectively has that constraint by virtue of its treatment of Atoms.

This will allow things like compiling a function into one address space, then copying it into another for execution (think a debugger or compile server process needing to inject JITed code into a client process address space). Likewise, moving things around (e.g., recompiling a hot function with more optimization) will result in needing to re-patch the addresses for that symbol.

One catch is that addresses can escape the relocations, so back-patching can't handle everything directly. Specifically, function pointers make things "fun", since two pointers to the same function must always compare equal. We can deal with that by detecting when an address escapes like that, which shouldn't be too hard (hand wave), and allocating a stub containing just a direct branch instruction. Indirect references to the symbol always get the stub, and when the function is re-compiled, the stub is back-patched to branch to the new destination. So things work, at the cost of an additional unconditional branch on the execution path. In the context of a JIT, that should be a relatively small cost to pay.
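
A sketch of such a stub on x86-64 (one "jmp rel32"; hypothetical, ignoring alignment, W^X and icache maintenance):

#include <cstdint>
#include <cstring>

// A stub is a single direct branch. Escaped function pointers all point at
// the stub, so recompiling the function means re-patching one instruction.
struct BranchStub {
  uint8_t Bytes[5]; // E9 <rel32> : jmp rel32

  void retarget(const uint8_t *NewDest) {
    Bytes[0] = 0xE9;
    int32_t Rel = static_cast<int32_t>(NewDest - (Bytes + 5));
    std::memcpy(Bytes + 1, &Rel, sizeof(Rel)); // same atomicity caveats apply
  }
};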

Also of interest is freeing up the allocated memory when it's no longer needed. For functions, this isn't too hard, as we just backpatch references back to a compilation callback (or interpreter or whatever else we want). For global values, including the above function pointer indirection stubs, things are a bit trickier. The most likely answer will be to have those objects be reference counted such that they're automatically deallocated when no more objects remain that reference them. An explicit call to free them becomes a nop. To be completely thorough, we'll need cycle detection, but in practice, I suspect that happens rarely enough that it won't be a big issue if we don't get to it right off the bat.

-Jim
