The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would like
to use the intrinsics but really want to change the interface. At that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.
These are the proposed changes to LangRef, written by Juergen and me.
WebKit Intrinsics
-----------------
This class of intrinsics is used by the WebKit JavaScript compiler to obtain
additional information about the live state of certain variables and/or to
enable the runtime system / JIT to patch the code afterwards.
The use of the following intrinsics always generates a stack map. The purpose
of a stack map is to record the location of function arguments and live
variables at the point of the intrinsic function in the instruction stream.
Furthermore, it records a unique callsite id and the offset from the beginning
of the enclosing function.
'``llvm.webkit.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
::
declare void (i32, i32, ...)* @llvm.webkit.stackmap(i32 <id>, i32 <numShadowBytes>, ...)
Overview:
"""""""""
The '``llvm.webkit.stackmap``' intrinsic records the location of live variables in the stack map without generating any code.
Arguments:
""""""""""
The first argument is a unique id and the second argument is the number of
shadow bytes following the intrinsic. The variable number of arguments after
that are the live variables.
Semantics:
""""""""""
The stackmap intrinsic generates no code in place, but its offset from function
entry is stored in the stack map. Furthermore, it guarantees a shadow of
instructions following its instruction offset during which neither the end of
the function nor another stackmap or patchpoint intrinsic may occur.
This allows the runtime to patch the code at this point in response to an event triggered from outside the code.
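Example:
""""""""

The following call is an illustrative sketch only; the id, shadow size, and
live values are made up, and only the declaration above is part of the
proposal. It records the locations of ``%local`` and ``%obj`` under callsite
id 7 and requests a 5-byte shadow that the runtime may later overwrite:

::

      call void (i32, i32, ...)* @llvm.webkit.stackmap(i32 7, i32 5, i32 %local, i8* %obj)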
----- Original Message -----
On Fri, Oct 18, 2013 at 1:39 AM, Andrew Trick <atr...@apple.com> wrote:
The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would like
to use the intrinsics but really want to change the interface. At that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.
I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.
You’re correct that there is no dependence. I’m fine dropping the webkit name, but only if we can go straight to the final name (no need for “experimental”). Again, the only reason to start with the webkit name is that it’s easy to change webkit later to use different intrinsics. I was waiting to see how much interest there is in using these intrinsics as-is for other clients. So far, there seems to be strong interest. If there isn’t much debate regarding the intrinsic format then I’ll drop the webkit name.
For the record, I wasn’t aware of any precedent for “llvm.experimental”, but if it will help avoid backward compatibility issues then it’s a good thing.
-Chris
On Oct 18, 2013, at 12:39 PM, Chris Lattner <clat...@apple.com> wrote:
On Oct 18, 2013, at 11:08 AM, Andrew Trick <atr...@apple.com> wrote:
I think that Hal's idea of "experimental" is the right approach here. The major thing we want is to avoid having to be backwards compatible with this intrinsic in subsequent llvm releases. "experimental" sends that message, where webkit does not (and is also bad for the reasons Hal mentions).
What would be the criteria for eventually dropping 'experimental' from the intrinsic names?
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:
On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.
On 10/17/13 10:39 PM, Andrew Trick wrote:
These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.
I have a couple of comments on your proposal. None of these are major enough to prevent submission.
- As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?
It's not at all similar. These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp. You won't use these for a GC.
In general, if the expected semantics are the same, a shared implementation would be desirable. This is more a suggestion for future refactoring than anything else.
I think that these stackmaps and GC stackmaps are fairly different beasts. While it's possible to unify the two, this isn't the intent here. In particular, you can use these stackmaps for deoptimization without having to unwind the stack.
I agree that interior pointers are primarily a problem for relocating collectors. (Though I disagree with the characterization of it being *uniquely* a problem for such collectors.) Since I was unaware of what you're using your stackmap mechanism for, I wanted to ask. Sounds like this is not an intended use case for you.
Thanks for the clarification. This is definitely a useful mechanism. Thank you for contributing it back.
In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots. These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.
Er, possibly I'm misunderstanding you. To me, an inline call cache is a mechanism to optimize a dynamic call by adding a typecheck+directcall fastpath.
- Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?
The top use case will be heap access dispatch inline cache, which is not a call.
You can also use it to implement call inline caches, but that's not the only thing you can use it for.
Inline caches don't have to be calls. For example, in JavaScript, the expression "o.f" is fully dynamic but usually does not result in a call. The inline cache - and hence patchpoint - for such an expression will not have a call in the common case.
Similar things arise in other dynamic languages. You can have inline caches for arithmetic. Or for array accesses. Or for any other dynamic operation in your language.
(i.e. avoiding the dynamic dispatch logic in the common case) I'm assuming this is what you mean with the term "call inline cache", but I have never heard of a "heap access dispatch inline cache". I've done a google search and didn't find a definition. Could you point me to a reference or provide a brief explanation?
Every JavaScript engine does it, and usually the term "inline cache" in the context of JS engines implies dispatching on the shape of the object in order to find the offset at which a field is located, rather than dispatching on the class of an object to determine what method to call.
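To make that concrete, here is a hedged IR sketch of such a heap-access inline cache (the patchpoint name, its signature, and the helper @getById_slow are assumptions based on this thread, not a settled interface): the expression "o.f" initially calls a slow-path lookup, and the runtime later patches the reserved bytes with a shape check plus a load at the cached offset.

  ; Inline cache for "o.f": before patching, the call goes to @getById_slow;
  ; after patching, the reserved bytes hold "compare shape; load field at offset".
  %f = call i64 (i64, i32, i8*, i32, ...)*
         @llvm.webkit.patchpoint(i64 42,   ; callsite id recorded in the stack map
                                 i32 15,   ; bytes reserved for later patching
                                 i8* bitcast (i64 (i64)* @getById_slow to i8*),
                                 i32 1,    ; number of call arguments
                                 i64 %o)   ; the receiver object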
On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:
On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:
On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.
I have a couple of comments on your proposal. None of these are major enough to prevent submission.
These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.
- As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.
Thank you for the restatement. You summarized my view well.
It's not at all similar. These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp. You won't use these for a GC.
In general, if the expected semantics are the same, a shared implementation would be desirable. This is more a suggestion for future refactoring than anything else.
I think that these stackmaps and GC stackmaps are fairly different beasts. While it's possible to unify the two, this isn't the intent here. In particular, you can use these stackmaps for deoptimization without having to unwind the stack.
I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn't it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.
Er, I think we're talking past each other again. Let me lay out my current understanding of the terminology and existing infrastructure in LLVM. Please correct me where I go wrong.
The proposed stack maps are very different from LLVM's gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I'm not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
I think this is a restatement of my last paragraph above, which would mean we're actually in agreement.
If someone wanted to use stack maps for GC, I don't know why they wouldn't leverage llvm.stackmap. Maybe Filip can see a problem with this that I can't. The runtime can add GC roots to the stack map just like any other live value, and it should know how to interpret the records. The intrinsic doesn't bake in any particular interpretation of the mapped values.
Oddly enough, I'll be raising the issue of how to go about supporting a relocating collector on list shortly. We've been looking into this independently, but are at the point we'd like to get feedback from others. :)
That said, my proposal deliberately does not cover GC. I think that stack maps are the easy part of the problem. The hard problem is tracking interior pointers, or for that matter exterior/out-of-bounds or swizzled pointers. LLVM's machine IR simply doesn't have the necessary facilities for doing this. But if you don't need a moving collector, then you don't need to track derived pointers as long as the roots are kept live. In that case, llvm.stackmap might be a nice optimization over llvm.gcroot.
Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments.
Agreed. My concern is mostly about naming and documentation of intended usages. Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used. The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.
But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.
Er, not quite true. Your calling convention also influences what registers stay live across the call. But in general, I see your point.
(Again, this is touching an area of LLVM I'm not particularly familiar with.)
In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.
If you were to do that, how would the implementation be different from the new stackmap intrinsic? Does that difference imply a clarification in intended usage or naming?
Adding Gael as someone who has previously discussed vmkit topics on the list. Since I'm assuming this is where the GC support came from, I wanted to draw this conversation to the attention of someone more familiar with the LLVM implementation than myself.
On 10/22/13 4:18 PM, Andrew Trick wrote:
On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:
On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:
On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.
I have a couple of comments on your proposal. None of these are major enough to prevent submission.
These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.
- As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.
Thank you for the restatement. You summarized my view well.
It's not at all similar. These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp. You won't use these for a GC.
In general, if the expected semantics are the same, a shared implementation would be desirable. This is more a suggestion for future refactoring than anything else.
I think that these stackmaps and GC stackmaps are fairly different beasts. While it's possible to unify the two, this isn't the intent here. In particular, you can use these stackmaps for deoptimization without having to unwind the stack.
I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.
Er, I think we're talking past each other again. Let me lay out my current understanding of the terminology and existing infrastructure in LLVM. Please correct me where I go wrong.
The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
stack map - A mapping from "values" to storage locations. Storage locations primarily take the form of registers or stack offsets, but could in principle refer to other well known locations (i.e. offsets into thread local state). A stack map is specific to a particular PC and describes the state at that instruction only.
In a precise garbage collector, stack maps are used to ensure that the stack can be understood by the collector. When a stop-the-world safepoint is reached, the collector needs to be able to identify any pointers to heap objects which may exist on the stack. This explicitly includes both the frame which actually contains the safepoint and any caller frames back to the root of the thread. To accomplish this, a stack map is generated at any call site and a stack map is generated for the safepoint itself.
In LLVM currently, the GCStrategy records "safepoints" which are really points at which stack maps need to be remembered. (i.e. calls and actual stop-the-world safepoints) The GCMetadata mechanism gives a generic way to emit the binary encoding of a stack map in a collector specific way. The current stack maps supported by this mechanism only allow abstract locations on the stack which force all registers to be spilled around "safepoints" (i.e. calls and stop-the-world safepoints). Also, the set of roots (which are recorded in the stack map) must be provided separately using the gcroot intrinsic.
In code:
- GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for a location with a stack map. The SafePoints structure in GCFunctionInfo contains a list of these locations.
- The Ocaml GC is probably the best example of usage. See llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
Note: The summary of existing LLVM details above is based on reading the code. I haven't actually implemented anything which used this mechanism yet. As such, take it with a grain of salt.
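For contrast with the proposal, a minimal IR sketch of the gcroot mechanism described above (the collector name "ocaml" and the helper @allocate_object are placeholders; this is the existing in-tree style of usage, not the new intrinsics):

  define i8* @frame() gc "ocaml" {
  entry:
    %root = alloca i8*                            ; stack slot pinned for the whole function
    call void @llvm.gcroot(i8** %root, i8* null)  ; register the slot with the GC plugin
    %obj = call i8* @allocate_object()            ; hypothetical allocation
    store i8* %obj, i8** %root                    ; the collector finds %obj via this slot
    %v = load i8** %root
    ret i8* %v
  }

  declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
  declare i8* @allocate_object()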
In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching. (Right?) Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM. The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes Register locations which the current implementation in LLVM does not. One other possible difference, are you intending to include "values" which aren't of pointer type?
Before moving on, am I interpreting your proposal and changes correctly?
Assuming I'm still correct so far, how might we combine these implementations? It looks like your implementation is much more mature than what exists in tree at the moment. One possibility would be to express the needed GC stack maps in terms of your new infrastructure. (i.e. convert a GCStrategy request for a safepoint into a StackMap (as you've implemented it) with the list of explicit GC roots as its arguments). What would you think of this?
> On Oct 23, 2013, at 1:17 AM, Gaël Thomas <gael....@lip6.fr> wrote:
>
> Hi all,
>
> I don't know if I understand everything, but it seems really
> interesting for a runtime developer; stackmap and patchpoint look
> perfect for a lot of optimizations :) I just have a few questions to
> verify that I understand what these stackmaps and patchpoints are, and I
> discuss the GC after.
>
> * I have a first very simple scenario (useful in vmkit). Let's imagine
> that we want to lazily build the layout of an object at runtime, i.e.,
> we don't know the layout of the object when we are emitting the code.
> And, we want to access to a field of this object identified by a
> symbol. If I understand correctly, we can use your stackmap to define
> the offset of this field and then patch the code that uses this offset?
> The machine code will look like mov offset(%rax), .., and the stackmap
> will generate a map that contains the location of "offset" in the
> code? If it's the case, it's perfect.
This is one of the use cases of patchpoint. Stackmap doesn't quite work because in IR it doesn't return anything - though you could probably use stackmap for putfield. But patchpoint is more convenient for this, I think.
You'll probably want a wider range of return types of patchpoint. Currently it's just i64.
I think that you'd instead want the whole call and the vtable resolution to be machine code that you generate inside the patchpoint.
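A possible sketch of that first scenario in IR (the patchpoint name, its signature, and @resolve_and_load_field are assumptions, not part of the proposal): before the object layout is known the patchpoint calls the resolver; afterwards the runtime rewrites the reserved bytes into a direct load such as "mov offset(%rax), ..".

  ; Field access through a lazily resolved offset. The stack map entry for id 17
  ; tells the runtime where this site is, so it can patch in the direct load.
  %val = call i64 (i64, i32, i8*, i32, ...)*
           @llvm.webkit.patchpoint(i64 17, i32 15,
                                   i8* bitcast (i64 (i8*)* @resolve_and_load_field to i8*),
                                   i32 1, i8* %object)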
Hi all,
I don't know if I understand everything, but it seems really
interesting for a runtime developer; stackmap and patchpoint look
perfect for a lot of optimizations :) I just have a few questions to
verify that I understand what these stackmaps and patchpoints are, and I
discuss the GC after.
* I have a first very simple scenario (useful in vmkit). Let's imagine
that we want to lazily build the layout of an object at runtime, i.e.,
we don't know the layout of the object when we are emitting the code.
And, we want to access to a field of this object identified by a
symbol. If I understand correctly, we can use your stackmap to define
the offset of this field and then patch the code that uses this offset?
The machine code will look like mov offset(%rax), .., and the stackmap
will generate a map that contains the location of "offset" in the
code? If it's the case, it's perfect.
* Now, for the GC, if I understand correctly, instead of declaring a
variable as a root, you can declare explicitly the safepoints by using
patchpoints with something like
patchpoint ID_safepoint_17, suspendTheThreadForCollection, list of the
alloca (or registers) that contains objects
Then in the suspendTheThreadForCollection, we can see that we are
coming for the safepoint_17 and then find the locations of the
objects? If a patchpoint can work like this, it's probably a good
building block for the gc.
Currently, we have to declare the root objects with the root
intrinsic, then add the appropriate safepoints (it's just a call to
GCFunctionInfo.addSafePoint). As root objects are marked as root,
modifying GCFunctionInfo.addSafepoint to generate a patchpoint with
all the gc roots as arguments (instead of using the current
infrastructure) should not be difficult. And it probably means that
the current gc infrastructure could use patchpoint as a backend. The
only problem that I see is that all the objects will be transmitted as
arguments to suspendTheThreadForCollection, it's maybe not the best
way to do that. Probably, something like:
safepoint ID_safepoint_17, list of alloca that contains objects
patchpoint ID_safepoint_17, suspendTheThreadForCollection
should be better to avoid useless arguments?
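One way to read that split in IR terms (every name and id here is illustrative, and the patchpoint signature is an assumption): record the root locations with a stackmap and keep the patchable poll next to it, so the roots never have to be passed as call arguments.

  ; Safepoint 17: the stack map records where the roots live; the patchpoint
  ; reserves bytes the runtime can patch with a call to
  ; @suspendTheThreadForCollection when a collection is requested.
  call void (i32, i32, ...)* @llvm.webkit.stackmap(i32 17, i32 0, i8** %obj1, i8** %obj2)
  %ignored = call i64 (i64, i32, i8*, i32, ...)*
               @llvm.webkit.patchpoint(i64 17, i32 15,
                                       i8* bitcast (void ()* @suspendTheThreadForCollection to i8*),
                                       i32 0)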
See you,
Gaël
PS: just tell me if the code is already in the trunk, because I would
like to see if these intrinsics can work for vmkit :)
I'll respond to a few questions below. I'll start a new thread for GC discussion.
On Oct 22, 2013, at 6:24 PM, Philip R <list...@philipreames.com> wrote:
Agreed. My concern is mostly about naming and documentation of intended usages. Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used. The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.
I don't think the intrinsic names will be able to capture their semantics. I think that's why we need documentation, which I've been working on: http://llvm-reviews.chandlerc.com/D1981.
For example, the "stackmap" intrinsic isn't really a stack map, it's something that allows generation of a stack map in which the entries don't actually need to be on the stack... confusing, but still a good name I think.
"stack map" is also a fairly well understood term in the GC/compiler world.� It's better to stick with well known terminology where possible.�
But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.
Er, not quite true. Your calling convention also influences what registers stay live across the call. But in general, I see your point.
You get around that by defining a new calling convention. Each patchpoint intrinsic call can be marked with a different calling convention if you choose. For example, we'll be adding a dynamic calling convention called AnyRegCC. You can use that to effectively specify the number of arguments that you want to force into registers. The stack map will tell you which registers were used for arguments. The "call" will preserve most registers, but clobbers one register (on x86) for use within the code.
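A sketch of what that could look like (marking the patchpoint call anyregcc and the intrinsic signature are assumptions based on this description, and @ic_miss is a made-up helper): the two values end up in whatever registers the allocator picks, and the stack map reports which registers were chosen.

  ; Inline-cache miss handler invoked through a patchpoint under the proposed
  ; AnyRegCC convention; %obj and %key are forced into allocator-chosen registers.
  %result = call anyregcc i64 (i64, i32, i8*, i32, ...)*
              @llvm.webkit.patchpoint(i64 3, i32 15,
                                      i8* bitcast (i64 (i64, i64)* @ic_miss to i8*),
                                      i32 2, i64 %obj, i64 %key)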
Nice trick. I'll have to remember that.
Another potential extension is to add an entry to the stackmap marking physical registers that are actually in-use across the stack map or patch point.
It helps me to think of llvm.patchpoint as a replacement for any situation where a JIT would have otherwise needed inline asm to generate the desired code sequence.
(Again, this is touching an area of LLVM I'm not particularly familiar with.)
In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.
If you were to do that, how would the implementation be different from the new stackmap intrinsic? Does that difference imply a clarification in intended usage or naming?
The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more. Let's move on. We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.
We could have defined different intrinsics for all variations of use cases, but I think two is the right number:
- Use llvm.stackmap if you just want a stack map. No code will be emitted. There is no calling convention. If the runtime patches the code here, it will be destructive.
- Use llvm.patchpoint if you want to reserve space for patching the code. When you do that, you can optionally specify a number of arguments that will follow a specified calling convention. You also get a stack map here because it can be useful to fuse the stack map to the patch point location. After all, the runtime needs to know where to patch.
I'm moving this to a different thread. I think the newly proposed intrinsic definitions and their current implementation are valuable regardless of how it gets tied into GC...
That's an excellent description of stack maps, GCStrategy, and safepoints. Now let me explain how I see it.
GCStrategy provides layers of abstraction that allow plugins to specialize GC metadata. Conceptually, a plugin can generate what looks like stack map data to the collector. But there isn't any direct support in LLVM IR for the kind of stack maps that we need.
When I talk about adding stack map support, I'm really talking about support for mapping values to registers, where the set of values and their locations are specific to the "safepoint".
We're adding an underlying implementation of per-safepoint live values. There isn't a lot of abstraction built up around it. Just a couple of intrinsics that directly expose the functionality.
We're also approaching the interface very differently. We're enabling an MCJIT client. The interface to the client is the stack map format.
For the record, I actually prefer your approach to the interface. :)
In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching. (Right?) Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM. The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes Register locations which the current implementation in LLVM does not. One other possible difference, are you intending to include "values" which aren't of pointer type?
Yes, the values will be of various types (although only 32/64 bit types are currently allowed because of DWARF register number weirdness). More importantly, our stack maps record locations of a specific set of values, which may be in registers, at a specific location.
The fact that you're interested in more than information about which locations contain pointers into the heap is the key point here. Your stack map is actually slightly more general than the form used by a garbage collector. For example, your mechanism allows you to describe where the iteration variable ("int i") in a loop lives. This is not something a stack map (in the sense I've been using it to refer to GC usage) would enable.
In fact, that, along with reserving space for code patching, is *all* we're doing. GCRoot doesn't do this at all. So there is effectively no overlap in implementation.
Before moving on, am I interpreting your proposal and changes correctly?
Yes, except I don't see a direct connection between the functionality we're adding and "the implementation currently in LLVM".
Assuming I'm still correct so far, how might we combine these implementations? It looks like your implementation is much more mature than what exists in tree at the moment. One possibility would be to express the needed GC stack maps in terms of your new infrastructure. (i.e. convert a GCStrategy request for a safepoint into a StackMap (as you've implemented it) with the list of explicit GC roots as its arguments). What would you think of this?
I can imagine someone wanting to leverage some of the new implementation without using it end-to-end as-is. Although I'm not entirely sure what the motivation would be. For example:
- A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint calls at custom safepoints after determining GC root liveness at those points.
- Something like a GCStrategy could intercept our implementation of stack map generation and emit a custom format. Keep in mind though that the format that LLVM emits does not need to be the format read by the collector. The JIT/runtime can parse LLVM's stack map data and encode it using its own data structures. That way, the JIT/runtime can change without customizing LLVM.
I think this is a very good point. Alternately, you could frame your encoding as being the default representation provided by LLVM and provide a plugin mechanism to modify it. (Not proposing this should actually be done at the moment. This would be by demand only.)
As far as hooking the new stack map support into the GCMetaData abstraction, I'm not sure how that would work. GCMachineCodeAnalysis is currently a standalone MI pass. We can't generate our stack maps here. Technically, a preEmitPass can come along later and reassign registers invalidating the stack map. That's why we generate the maps during MC lowering.
So, currently, the new intrinsics are serving a different purpose than GCMetaData. I think someone working on GC support needs to be convinced that they really need the new stack map features. Then we can build something on top of the underlying functionality that works for them.
-Andy
On 10/23/13 10:03 PM, Andrew Trick wrote:
On Oct 23, 2013, at 7:26 PM, Philip Reames <list...@philipreames.com> wrote:
On 10/23/13 5:38 PM, Andrew Trick wrote:
On Oct 23, 2013, at 5:27 PM, Philip Reames <list...@philipreames.com> wrote:
The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I responded in the review. The only big thing that might be worth discussion here is the "full resume" semantics which are mentioned at the very end. This seemed to disagree with our previous discussion. Let me know if you're either a) unclear at what I was getting at or b) believe the "full resume" semantics were a key part of the intrinsic. In either case, we should probably hash it out here.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more. Let's move on. We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.
I want this to be clear when we expose the intrinsics to other developers. I’ve been making an effort to improve the docs: http://llvm-reviews.chandlerc.com/D1981. Please let me know where clarification is needed.
Thanks for the review. I'll respond to each of your comments there.
So essentially you're using the shadow region to enable creating either a nop sled which carries you down into deopt code or a forced trap (i.e. patching with an instruction sequence which forces a trap to a signal handler)? Ok, that sounds reasonable. You should spell this example out explicitly in the documentation. It's useful for clarification purposes.
There's one important thing I'm not getting across. llvm.stackmap is not primarily intended for patching. As an additional feature, llvm.stackmap can specify the size of its shadow, but does not need to generate nops in its shadow. The shadow can contain arbitrary code from the same function. The only purpose of the shadow is to allow destructive patching. Destructive means that some arbitrary other code may be overwritten. I think the only use for this is to invalidate compiled code in a way that allows the runtime to recover. That just happens to be a very important use. Note that if we didn't specify a shadow, the runtime would have a hard time knowing how to patch without potentially clobbering something outside the current function.