[LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal

Andrew Trick

Oct 18, 2013, 1:39:36 AM10/18/13
to llvmdev@cs.uiuc.edu Dev
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

A Stackmap is a record of variable locations (registers and stack
offsets) at a particular instruction address.

A Patchpoint is an instruction address at which space is reserved for
patching a new instruction sequence at runtime.

These two features are close friends because it wouldn't be possible
for a runtime to patch LLVM generated code without a stack map telling
it where relevant variables live. However, the proposed stackmaps are
useful without patchpoints. In fact, the typical use-case for
stackmaps is implementing a simple trap to the runtime.

Stackmaps are required for runtime traps because without them the
optimized code would be dominated by instructions for marshaling
values into fixed locations. Even if most of the extra code can be
sunk into cold paths, experiments have shown that the impact on
compile time and code size is enormous.

Explicitly branching to runtime traps handles many situations. But
there are also cases in which the runtime needs to patch the runtime
call or the surrounding code. There are two kinds of patching we need
to support. The first case involves trampling LLVM-generated code to
safely invalidate it. This case needs to have zero impact on
optimization and codegen aside from keeping some values live. A second
case involves dynamically optimizing a short code sequence, for
example, to implement a dynamic inline cache. In this case, the
commonly executed code must be a call-free fast path. Some runtime
events may require rewriting the check guarding the fast path (e.g. to
change a type ID) or even rewriting the code that accesses a field to
change the offset. Completely invalidating the code at these events is
undesirable.

Two proposed intrinsics, llvm.stackmap and llvm.patchpoint, solve all
of the problems outlined above. The LangRef doc section is included at
the end of this proposal. The LLVM implementation of the intrinsics is
quite straightforward as you can see from the patches that I'll be
sending to llvm-commits.

Both intrinsics can generate a stack map. The difference is that a
llvm.stackmap is just a stack map. It doesn't generate any
code. llvm.patchpoint always generates a call. The runtime may
overwrite that call with a dynamically optimized inline cache.

llvm.stackmap is simple. It takes an integer ID for easy reference by
the runtime and a list of live values. It can optionally be given a
number of "shadow" bytes. The shadow byte count may be set to a
nonzero value to ensure that the runtime can safely patch the code
following the stackmap. This is useful for invalidating compiled code
by trapping at
arbitrary points.

The LLVM backend emits stackmaps in a special data section. This
design works for JITs that are confined to the LLVM C API. Each
intrinsic results in a stackmap record with the ID and offset from
function entry. Each record contains an entry for each live value with
its location encoded as a register or stack offset.

llvm.patchpoint is the fusion of a normal call and an
llvm.stackmap. It additionally takes a call target and specifies a
number of call arguments. The call target is an opaque value to LLVM
so the runtime is not required to provide a symbol. The calling
convention can be specified via the normal "cconv" marker on the call
instruction. Instead of merely marking a "shadow" where code can be
patched, it reserves a block of encoding space where the call to the
target will initially be emitted, followed by nop padding.

Everything about the design and implementation of these intrinsics is
as generic as we can conceive at this time. I expect the next client
who wants to optimize their managed runtime to be able to do most, if
not all, of what they want with the existing design. In the meantime,
the open source WebKit project has already added optional support for
llvm.stackmap, and llvm.patchpoint support will follow shortly.

The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would like
to use the intrinsics but really want to change the interface. At that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.

These are the proposed changes to LangRef, written by Juergen and me.

WebKit Intrinsics
-----------------

This class of intrinsics is used by the WebKit JavaScript compiler to obtain
additional information about the live state of certain variables and/or to
enable the runtime system / JIT to patch the code afterwards.

The use of the following intrinsics always generates a stack map. The purpose
of a stack map is to record the location of function arguments and live
variables at the point of the intrinsic function in the instruction stream.
Furthermore, it records a unique callsite id and the offset from the beginning
of the enclosing function.

'``llvm.webkit.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

declare void (i32, i32, ...)* @llvm.webkit.stackmap(i32 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.webkit.stackmap``' intrinsic records the location of live
variables in the stack map without generating any code.

Arguments:
""""""""""

The first argument is a unique id and the second argument is the number of
shadow bytes following the intrinsic. The variable number of arguments after
that are the live variables.

Semantics:
""""""""""

The stackmap intrinsic generates no code in place, but its offset from function
entry is stored in the stack map. Furthermore, it guarantees a shadow of
instructions following its instruction offset during which neither the end of
the function nor another stackmap or patchpoint intrinsic may occur. This
allows the runtime to patch the code at this point in response to an event
triggered from outside the code.
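As a sketch of how this might be used (the id, shadow size, and live
values here are hypothetical), recording two live values with a
4-byte patchable shadow could look like:

::

      ; Record the runtime locations of %a and %b under id 7, and guarantee
      ; 4 bytes of patchable instruction shadow following this point.
      call void (i32, i32, ...)* @llvm.webkit.stackmap(i32 7, i32 4, i64 %a, i64 %b)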

'``llvm.webkit.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

declare void (i32, i32, i8*, i32, ...)* @llvm.webkit.patchpoint.void(i32 <id>, i32 <numBytes>, i8* <target>, i32 <numArgs>, ...)
declare i64 (i32, i32, i8*, i32, ...)* @llvm.webkit.patchpoint.i64(i32 <id>, i32 <numBytes>, i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.webkit.patchpoint.*``' intrinsics create a function call to the
specified <target> and record the location of the live variables in the stack
map.

Arguments:
""""""""""

The first argument is a unique id, the second is the number of bytes
reserved for encoding the intrinsic, the third is the target address
of a function, and the fourth specifies how many of the following
variable arguments participate in the function call. The remaining
variable number of arguments are the live variables.

Semantics:
""""""""""

The patchpoint intrinsic generates the stack map and emits a function call to
the address specified by <target>. The function call and its arguments are
lowered according to the calling convention specified at the callsite of the
intrinsic. The locations of the arguments are not normally recorded in the
stack map. However, special stack-map-specific calling conventions can force
the argument locations to be recorded. The patchpoint also emits nops to cover
at least <numBytes> of instruction encoding space. Hence, the client must
ensure that <numBytes> is enough to encode a call to the target address on the
supported targets. The runtime may patch the code emitted for the patchpoint,
including the call instruction and nops.
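As a sketch (the id, byte count, target, and argument values are all
hypothetical, and <numBytes> must be chosen per target architecture), a
patchpoint calling a two-argument function could look like:

::

      ; Emit a call to %target taking 2 call arguments (%obj and %idx),
      ; padded with nops to fill 15 bytes of encoding space. %a and %b are
      ; additional live values recorded only in the stack map.
      %result = call i64 (i32, i32, i8*, i32, ...)* @llvm.webkit.patchpoint.i64(i32 3, i32 15, i8* %target, i32 2, i64 %obj, i64 %idx, i64 %a, i64 %b)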

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Sean Silva

Oct 18, 2013, 12:37:26 PM10/18/13
to Andrew Trick, llvmdev@cs.uiuc.edu Dev
On Fri, Oct 18, 2013 at 1:39 AM, Andrew Trick <atr...@apple.com> wrote:

The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would like
to use the intrinsics but really want to change the interface. At that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.

This sort of functionality could probably be used to greatly improve the usability of DTrace's USDT tracing. 
 

These are the proposed changes to LangRef, written by Juergen and me.

WebKit Intrinsics
-----------------

This class of intrinsics is used by the WebKit JavaScript compiler to obtain
additional information about the live state of certain variables and/or to
enable the runtime system / JIT to patch the code afterwards.

The use of the following intrinsics always generates a stack map. The purpose
of a stack map is to record the location of function arguments and live
variables at the point of the intrinsic function in the instruction stream.
Furthermore, it records a unique callsite id and the offset from the beginning
of the enclosing function.

'``llvm.webkit.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void (i32, i32, ...)* @llvm.webkit.stackmap(i32 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.webkit.stackmap``' intrinsic records the location of live variables in the stack map without generating any code.

Last I checked LLVM IR doesn't have "variables" in this sense (except globals; it seems like a handful of other places in the LangRef have this wording slip too). Shouldn't the wording be more like "the run time location of the provided values"?
 

Arguments:
""""""""""

The first argument is a unique id and the second argument is the number of
shadow bytes following the intrinsic. The variable number of arguments after
that are the live variables.

The purpose of the "id" isn't described.
 

Semantics:
""""""""""

The stackmap intrinsic generates no code in place, but its offset from function
entry is stored in the stack map. Furthermore, it guarantees a shadow of
instructions following its instruction offset during which neither the end of
the function nor another stackmap or patchpoint intrinsic may occur.
 
It's meaningless to discuss the semantics when important terms are undefined:
* "stack map" (and the format of a stack map, and where it is emitted/how it can be accessed, etc.)
* "shadow": while it's fairly clear roughly what is meant by this, this is Lang*Ref*, not "LangOverview" or "LangTour"

It may be that the inherently experimental nature of these intrinsics do not lend itself to being documented adequately enough for inclusion in LangRef at this point, in which case I would suggest demoting this description to a new page for experimental intrinsics until they have settled enough.


This allows the runtime to patch the code at this point in response to an event triggered from outside the code.

Here and elsewhere, I suggest avoiding saying "the runtime". It is more accurate to describe properties of the code, rather than the runtime (which LLVM doesn't provide and which is not a concept in the LangRef). For example this sentence could be "This permits the code to be safely patched".

-- Sean Silva

Hal Finkel

Oct 18, 2013, 1:09:19 PM10/18/13
to Andrew Trick, Sean Silva, llvmdev@cs.uiuc.edu Dev
----- Original Message -----
>
> On Fri, Oct 18, 2013 at 1:39 AM, Andrew Trick < atr...@apple.com >
> wrote:
>
>
>
> The initial documentation and patches name these intrinsics in a
> "webkit" namespace. This clarifies their current purpose and conveys
> that they haven't been standardized for other JITs yet. If someone on
> the dev list says "yes we want to use these too, just the way
> they are", then we can just drop the "webkit" name. More likely, we
> will continue improving their functionality for WebKit until some
> point in the future when another JIT customer tells us they would
> like
> to use the intrinsics but really want to change the interface. At
> that
> point, we can review this again with the goal of standardization and
> backward compatibility, then promote the name. WebKit is maintained
> against LLVM trunk so can be quickly adjusted to a new interface. The
> same may not be true of other JITs.

I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.
I'd like to second this; we need to document the format of the stack map. We might want to do this in a separate document (like the document which elaborates on the description of the exception handling intrinsics). This document should be seeded with, at least, the description for x86 (and we can add descriptions for the other backends as they're validated).

This looks like a very-useful piece of functionality.

-Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Andrew Trick

Oct 18, 2013, 2:00:52 PM10/18/13
to Sean Silva, llvmdev@cs.uiuc.edu Dev
100% agree with all your comments. I’m pulling the intrinsic docs into a separate page. I’ll create a phabricator diff and we can continue reviewing the docs on llvm-commits independent of this proposal.

-Andy

Andrew Trick

Oct 18, 2013, 2:08:27 PM10/18/13
to Hal Finkel, Sean Silva, llvmdev@cs.uiuc.edu Dev
On Oct 18, 2013, at 10:09 AM, Hal Finkel <hfi...@anl.gov> wrote:

----- Original Message -----

On Fri, Oct 18, 2013 at 1:39 AM, Andrew Trick < atr...@apple.com >
wrote:



The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would
like
to use the intrinsics but really want to change the interface. At
that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.

I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.

You’re correct that there is no dependence. I’m fine dropping the webkit name, but only if we can go straight to the final name (no need for “experimental”).

Again, the only reason to start with the webkit name is that it’s easy to change webkit later to use different intrinsics. I was waiting to see how much interest there is in using these intrinsics as-is for other clients. So far, there seems to be strong interest. If there isn’t much debate regarding the intrinsic format then I’ll drop the webkit name.
I’m moving the intrinsic docs to a separate page, like exception intrinsics, and documenting the stack map format there.

-Andy

Chandler Carruth

Oct 18, 2013, 3:07:52 PM10/18/13
to Andrew Trick, Sean Silva, llvmdev@cs.uiuc.edu Dev

On Fri, Oct 18, 2013 at 11:08 AM, Andrew Trick <atr...@apple.com> wrote:
The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would
like
to use the intrinsics but really want to change the interface. At
that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.

I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.

You’re correct that there is no dependence. I’m fine dropping the webkit name, but only if we can go straight to the final name (no need for “experimental”).

Again, the only reason to start with the webkit name is that it’s easy to change webkit later to use different intrinsics. I was waiting to see how much interest there is in using these intrinsics as-is for other clients. So far, there seems to be strong interest. If there isn’t much debate regarding the intrinsic format then I’ll drop the webkit name.

I'm strongly against naming this "webkit" or anything else to do with any other single consumer of LLVM which is not even an LLVM project. It is really confusing and implies a whole boat of things that aren't true.

I don't understand why you are pushing for "the final name or the webkit name". I think the recommendation of "experimental" is great. It clarifies that the exact interface isn't fully baked and may change, and clients must be prepared to update following LLVM trunk as opposed to expecting full backwards compatibility.

If this feature were *only* applicable to WebKit, I'm not even sure it would belong in the main open source repository. But it isn't, it's a really interesting general purpose feature for doing dynamic patching of call sites, and we should figure out a way to design and evolve it as such.

Chris Lattner

Oct 18, 2013, 3:39:52 PM10/18/13
to Andrew Trick, Sean Silva, Dev
On Oct 18, 2013, at 11:08 AM, Andrew Trick <atr...@apple.com> wrote:


I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.

You’re correct that there is no dependence. I’m fine dropping the webkit name, but only if we can go straight to the final name (no need for “experimental”).

I think that Hal's idea of "experimental" is the right approach here.  The major thing we want is to avoid having to be backwards compatible with this intrinsic in subsequent llvm releases.  "experimental" sends that message, where webkit does not (and is also bad for the reasons Hal mentions).

-Chris

Andrew Trick

Oct 18, 2013, 4:37:08 PM10/18/13
to llvmdev@cs.uiuc.edu Dev, Sean Silva
Done. I’ll update the patches on llvm-commits.

For the record, I wasn’t aware of any precedent for “llvm.experimental”, but if it will help avoid backward compatibility issues then it’s a good thing.

-Andy

Chandler Carruth

Oct 18, 2013, 4:41:56 PM10/18/13
to Andrew Trick, Sean Silva, llvmdev@cs.uiuc.edu Dev

On Fri, Oct 18, 2013 at 1:37 PM, Andrew Trick <atr...@apple.com> wrote:
For the record, I wasn’t aware of any precedent for “llvm.experimental”, but if it will help avoid backward compatibility issues then it’s a good thing.

I don't think we have precedent, but I think it will be really good to establish precedent. =]

Evan Cheng

Oct 18, 2013, 5:59:21 PM10/18/13
to Christopher Lattner, Sean Silva, Dev
On Oct 18, 2013, at 12:39 PM, Chris Lattner <clat...@apple.com> wrote:

What would be the criteria for eventually dropping 'experimental' from the intrinsic names? 

Evan



Chris Lattner

Oct 18, 2013, 11:37:50 PM10/18/13
to Chandler Carruth, Sean Silva, Dev
Right!

-Chris

Chris Lattner

Oct 18, 2013, 11:38:59 PM10/18/13
to Evan Cheng, Sean Silva, Dev

On Oct 18, 2013, at 2:59 PM, Evan Cheng <evan....@apple.com> wrote:


On Oct 18, 2013, at 12:39 PM, Chris Lattner <clat...@apple.com> wrote:

On Oct 18, 2013, at 11:08 AM, Andrew Trick <atr...@apple.com> wrote:


I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.

You’re correct that there is no dependence. I’m fine dropping the webkit name, but only if we can go straight to the final name (no need for “experimental”).

I think that Hal's idea of "experimental" is the right approach here.  The major thing we want is to avoid having to be backwards compatible with this intrinsic in subsequent llvm releases.  "experimental" sends that message, where webkit does not (and is also bad for the reasons Hal mentions).

What would be the criteria for eventually dropping 'experimental' from the intrinsic names? 

At the least, I'd like to get some experience on these.  Having webkit actually ship something based on this seems like a minimal requirement to demonstrate that it will actually work (end to end) in practice.  Beyond that, we'd want to be happy enough with it that we'd be willing to autoupgrade it if it ever evolves in future releases: i.e. we'd be promising backward compatibility with the intrinsic.

-Chris

Eric Christopher

Oct 19, 2013, 9:02:56 PM10/19/13
to Chris Lattner, Sean Silva, Dev
Sounds like excellent criteria for me on the name. Thanks Chris :)

-eric

Philip R

Oct 22, 2013, 12:53:06 PM10/22/13
to Andrew Trick, llv...@cs.uiuc.edu
On 10/17/13 10:39 PM, Andrew Trick wrote:
> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
> first client of these features is the JavaScript compiler within the
> open source WebKit project.
>
I have a couple of comments on your proposal. None of these are major
enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than
a webkit namespace. (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic
duplicates existing functionality already in llvm. In particular, much
of the StackMap construction seems similar to the Safepoint mechanism
used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and
CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if
you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization
pass from manufacturing new pointers which point inside an existing data
structure. (e.g. an interior pointer to an array when blocking a loop)
Does your StackMap mechanism need to be able to inspect/modify these
manufactured temporaries? If so, I don't see how you could generate an
intrinsic which would include this manufactured pointer in the live
variable list. Is there something I'm missing here?
- Your patchpoint mechanism appears to be one very specialized use of a
patchable location. Would you mind renaming it to something like
patchablecall to reflect this specialization?

Yours,
Philip

Filip Pizlo

Oct 22, 2013, 1:34:06 PM10/22/13
to Philip R, llv...@cs.uiuc.edu

On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

> On 10/17/13 10:39 PM, Andrew Trick wrote:
>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
>> first client of these features is the JavaScript compiler within the
>> open source WebKit project.
>>
> I have a couple of comments on your proposal. None of these are major enough to prevent submission.
>
> - As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
> - Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
> - To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?

These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.

In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots. These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.

> - Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?

The top use case will be heap access dispatch inline cache, which is not a call.

You can also use it to implement call inline caches, but that's not the only thing you can use it for.

-Filip

Philip R

Oct 22, 2013, 4:48:34 PM10/22/13
to Filip Pizlo, llv...@cs.uiuc.edu
On 10/22/13 10:34 AM, Filip Pizlo wrote:
> On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:
>
>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
>>> first client of these features is the JavaScript compiler within the
>>> open source WebKit project.
>>>
>> I have a couple of comments on your proposal. None of these are major enough to prevent submission.
>>
>> - As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
>> - Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
>> - To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?
> These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different,
the mechanism is fairly similar. In general, if the expected semantics
are the same, a shared implementation would be desirable. This is more
a suggestion for future refactoring than anything else.

I agree that interior pointers are primarily a problem for relocating
collectors. (Though I disagree with the characterization of it being
*uniquely* a problem for such collectors.) Since I was unaware of what
you're using your stackmap mechanism for, I wanted to ask. Sounds like
this is not an intended use case for you.
>
> In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots. These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.
Thanks for the clarification. This is definitely a useful mechanism.
Thank you for contributing it back.
>
>> - Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?
> The top use case will be heap access dispatch inline cache, which is not a call.
> You can also use it to implement call inline caches, but that's not the only thing you can use it for.
Er, possibly I'm misunderstanding you. To me, an inline call cache is a
mechanism to optimize a dynamic call by adding a typecheck+directcall
fastpath. (i.e. avoiding the dynamic dispatch logic in the common
case) I'm assuming this what you mean with the term "call inline
cache", but I have never heard of a "heap access dispatch inline
cache". I've done a google search and didn't find a definition. Could
you point me to a reference or provide a brief explanation?

Filip Pizlo

Oct 22, 2013, 6:08:22 PM10/22/13
to Philip R, llvmdev
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.


I agree that interior pointers are primarily a problem for relocating collectors. (Though I disagree with the characterization of it being *uniquely* a problem for such collectors.)  Since I was unaware of what you're using your stackmap mechanism for, I wanted to ask.  Sounds like this is not an intended use case for you.

In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots.  These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.
Thanks for the clarification.  This is definitely a useful mechanism.  Thank you for contributing it back.

- Your patchpoint mechanism appears to be one very specialized use of a patchable location.  Would you mind renaming it to something like patchablecall to reflect this specialization?
The top use case will be heap access dispatch inline cache, which is not a call.
You can also use it to implement call inline caches, but that's not the only thing you can use it for.
Er, possibly I'm misunderstanding you.  To me, an inline call cache is a mechanism to optimize a dynamic call by adding a typecheck+directcall fastpath.

Inline caches don't have to be calls.  For example, in JavaScript, the expression "o.f" is fully dynamic but usually does not result in a call.  The inline cache - and hence patchpoint - for such an expression will not have a call in the common case.

Similar things arise in other dynamic languages.  You can have inline caches for arithmetic.  Or for array accesses.  Or for any other dynamic operation in your language.

 (i.e. avoiding the dynamic dispatch logic in the common case)  I'm assuming this is what you mean by the term "call inline cache", but I have never heard of a "heap access dispatch inline cache".  I've done a Google search and didn't find a definition.  Could you point me to a reference or provide a brief explanation?

Every JavaScript engine does it, and usually the term "inline cache" in the context of JS engines implies dispatching on the shape of the object in order to find the offset at which a field is located, rather than dispatching on the class of an object to determine what method to call.

-Filip


Philip

Andrew Trick

Oct 22, 2013, 7:18:19 PM10/22/13
to Filip Pizlo, llvmdev
On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:

On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.

I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.

The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.

If someone wanted to use stack maps for GC, I don’t know why they wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with this that I can't. The runtime can add GC roots to the stack map just like any other live value, and it should know how to interpret the records. The intrinsic doesn’t bake in any particular interpretation of the mapped values. That said, my proposal deliberately does not cover GC. I think that stack maps are the easy part of the problem. The hard problem is tracking interior pointers, or for that matter exterior/out-of-bounds or swizzled pointers. LLVM’s machine IR simply doesn’t have the necessary facilities for doing this. But if you don’t need a moving collector, then you don’t need to track derived pointers as long as the roots are kept live. In that case, llvm.stackmap might be a nice optimization over llvm.gcroot.

Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments. But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention. In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle a NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.

-Andy

Philip R

Oct 22, 2013, 8:25:44 PM10/22/13
to Filip Pizlo, llvmdev
On 10/22/13 3:08 PM, Filip Pizlo wrote:

On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.
I'm going to respond to Andrew Trick's followup for this portion. 


I agree that interior pointers are primarily a problem for relocating collectors. (Though I disagree with the characterization of it being *uniquely* a problem for such collectors.)  Since I was unaware of what you're using your stackmap mechanism for, I wanted to ask.  Sounds like this is not an intended use case for you.

In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots.  These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.
Thanks for the clarification.  This is definitely a useful mechanism.  Thank you for contributing it back.

- Your patchpoint mechanism appears to be one very specialized use of a patchable location.  Would you mind renaming it to something like patchablecall to reflect this specialization?
The top use case will be heap access dispatch inline cache, which is not a call.
You can also use it to implement call inline caches, but that's not the only thing you can use it for.
Er, possibly I'm misunderstanding you.  To me, an inline call cache is a mechanism to optimize a dynamic call by adding a typecheck+directcall fastpath.

Inline caches don't have to be calls.  For example, in JavaScript, the expression "o.f" is fully dynamic but usually does not result in a call.  The inline cache - and hence patchpoint - for such an expression will not have a call in the common case.

Similar things arise in other dynamic languages.  You can have inline caches for arithmetic.  Or for array accesses.  Or for any other dynamic operation in your language.

 (i.e. avoiding the dynamic dispatch logic in the common case)  I'm assuming this is what you mean by the term "call inline cache", but I have never heard of a "heap access dispatch inline cache".  I've done a Google search and didn't find a definition.  Could you point me to a reference or provide a brief explanation?

Every JavaScript engine does it, and usually the term "inline cache" in the context of JS engines implies dispatching on the shape of the object in order to find the offset at which a field is located, rather than dispatching on the class of an object to determine what method to call.
Thank you for the clarification.  I am familiar with the patching optimizations performed for property access, but had not been aware of the modified usage of the term "inline cache".  I was also unaware of the term "heap access dispatch inline cache".  I believe I now understand your intent. 

Taking a step back in the conversation, my original question was about the naming of the patchpoint intrinsic.  I am now convinced that you could use your patchpoint intrinsic for a number of different inline caching schemes (method dispatch, property access, etc..).  Given that, my concern about naming is diminished, but not completely eliminated.  I don't really have a suggestion for a better name, but given that a "stackmap" intrinsic can be patched, the "patchpoint" intrinsic name doesn't seem particularly descriptive.  To put it another way, how are the stackmap and patchpoint intrinsics different?  Can this difference be encoded in a descriptive name for one or the other?

As a secondary point, it would be good to update the proposed documentation with a brief description of the intended usage (i.e. inline caching).  This might prevent a future developer from being confused on the same issues. 

Yours,
Philip

Filip Pizlo

Oct 22, 2013, 9:23:28 PM10/22/13
to Andrew Trick, llvmdev
On Oct 22, 2013, at 4:18 PM, Andrew Trick <atr...@apple.com> wrote:

On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:

On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.

I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.

The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.

If someone wanted to use stack maps for GC, I don’t know why they wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with this that I can't.

You're right, it could work.

If you were happy with spilling all of your GC roots, then you could put them into allocas and then pass the allocas' addresses to a stackmap.  This will give you an FP offset for the roots.

If you were happy with an accurate GC that couldn't move objects referenced from the stack then you could have each safepoint call use patchpoint, and then if you also implemented stack unwinding, you could use the patchpoints' implicit stackmaps to figure out which registers (or stack slots) contained pointers.

These would be niche uses, I think.  If you care about performance then you're not going to use an accurate GC that requires spilling roots; you'll go for some GC algorithm that can handle conservative stack roots.  If you're using accurate GC support for moving objects then it's usually because you need to move *all* objects (after all you can move *most* objects without any GC roots or stackmaps by using Bartlett's algorithm or similar) so the calls-as-patchpoints approach won't work.

I could kind of see some real-time GC's using the alloca+stackmap approach, but it's a bit of a stretch.

So, I don't see stackmaps as being particularly practical for accurate GC, but I do concede that you *could* implement some kind of accurate GC that uses stackmaps for some part of its stack scanning.

Philip R

Oct 22, 2013, 9:24:38 PM10/22/13
to Andrew Trick, llvmdev
Adding Gael as someone who has previously discussed vmkit topics on the list.  Since I'm assuming this is where the GC support came from, I wanted to draw this conversation to the attention of someone more familiar with the LLVM implementation than myself.


On 10/22/13 4:18 PM, Andrew Trick wrote:
On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.

I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn't it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.
Thank you for the restatement.  You summarized my view well.

The proposed stack maps are very different from LLVM's gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I'm not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
Er, I think we're talking past each other again.  Let me lay out my current understanding of the terminology and existing infrastructure in LLVM.  Please correct me where I go wrong.

stack map - A mapping from "values" to storage locations.  Storage locations primarily take the form of registers or stack offsets, but could in principle refer to other well known locations (i.e. offsets into thread local state).  A stack map is specific to a particular PC and describes the state at that instruction only.

In a precise garbage collector, stack maps are used to ensure that the stack can be understood by the collector.  When a stop-the-world safepoint is reached, the collector needs to be able to identify any pointers to heap objects which may exist on the stack.  This explicitly includes both the frame which actually contains the safepoint and any caller frames back to the root of the thread.  To accomplish this, a stack map is generated at any call site and a stack map is generated for the safepoint itself.

In LLVM currently, the GCStrategy records "safepoints" which are really points at which stack maps need to be remembered.  (i.e. calls and actual stop-the-world safepoints)  The GCMetadata mechanism gives a generic way to emit the binary encoding of a stack map in a collector-specific way.  The current stack maps supported by this mechanism only allow abstract locations on the stack, which force all registers to be spilled around "safepoints" (i.e. calls and stop-the-world safepoints).  Also, the set of roots (which are recorded in the stack map) must be provided separately using the gcroot intrinsic.

In code:
- GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for a location with a stack map.  The SafePoints structure in GCFunctionInfo contains a list of these locations.
- The Ocaml GC is probably the best example of usage.  See llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp

Note: The summary of existing LLVM details above is based on reading the code.  I haven't actually implemented anything which used this mechanism yet.  As such, take it with a grain of salt.

In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching.  (Right?)  Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM.  The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes register locations, which the current implementation in LLVM does not.  One other possible difference: are you intending to include "values" which aren't of pointer type?

Before moving on, am I interpreting your proposal and changes correctly?

Assuming I'm still correct so far, how might we combine these implementations?  It looks like your implementation is much more mature than what exists in tree at the moment.  One possibility would be to express the needed GC stack maps in terms of your new infrastructure (i.e. convert a GCStrategy request for a safepoint into a StackMap, as you've implemented it, with the list of explicit GC roots as its arguments).  What would you think of this?

p.s. This discussion has gotten sufficiently abstract that it should in no way block your plan to submit these changes.  I appreciate your willingness to discuss.

If someone wanted to use stack maps for GC, I don't know why they wouldn't leverage llvm.stackmap. Maybe Filip can see a problem with this that I can't. The runtime can add GC roots to the stack map just like any other live value, and it should know how to interpret the records. The intrinsic doesn't bake in any particular interpretation of the mapped values.
I think this is a restatement of my last paragraph above, which would mean we're actually in agreement.
That said, my proposal deliberately does not cover GC. I think that stack maps are the easy part of the problem. The hard problem is tracking interior pointers, or for that matter exterior/out-of-bounds or swizzled pointers. LLVM's machine IR simply doesn't have the necessary facilities for doing this. But if you don't need a moving collector, then you don't need to track derived pointers as long as the roots are kept live. In that case, llvm.stackmap might be a nice optimization over llvm.gcroot.
Oddly enough, I'll be raising the issue of how to go about supporting a relocating collector on the list shortly.  We've been looking into this independently, but are at the point where we'd like to get feedback from others.  :)

Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it's nice to be able to leverage calling conventions to inform the location of arguments.
Agreed.  My concern is mostly about naming and documentation of intended usages.  Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used.  The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.


But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.
Er, not quite true.  Your calling convention also influences what registers stay live across the call.  But in general, I see your point.

(Again, this is touching an area of LLVM I'm not particularly familiar with.)

In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.
If you were to do that, how would the implementation be different from the new stackmap intrinsic?  Does that difference imply a clarification in intended usage or naming?

p.s. The naming discussion has gotten rather abstract and is starting to feel like a "what color is the bikeshed" discussion.  Feel free to just tell me to go away at some point.  :)

Philip

Andrew Trick

Oct 23, 2013, 1:48:10 AM10/23/13
to Philip R, llvmdev
I'll respond to a few questions below. I'll start a new thread for GC discussion.

On Oct 22, 2013, at 6:24 PM, Philip R <list...@philipreames.com> wrote:

Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments.
Agreed.  My concern is mostly about naming and documentation of intended usages.  Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used.  The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.

I don't think the intrinsic names will be able to capture their semantics. I think that's why we need documentation, which I've been working on: http://llvm-reviews.chandlerc.com/D1981.

For example, the "stackmap" intrinsic isn't really a stack map, it's something that allows generation of a stack map in which the entries don't actually need to be on the stack... confusing, but still a good name I think.

But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.
Er, not quite true.  Your calling convention also influences what registers stay live across the call.  But in general, I see your point.

You get around that by defining a new calling convention. Each patchpoint intrinsic call can be marked with a different calling convention if you choose. For example, we'll be adding a dynamic calling convention called AnyRegCC. You can use that to effectively specify the number of arguments that you want to force into registers. The stack map will tell you which registers were used for arguments. The "call" will preserve most registers, but clobbers one register (on x86) for use within the code.
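As a sketch of how that might look in IR (operand values here are illustrative; the convention landed in-tree as anyregcc alongside the llvm.experimental.patchpoint intrinsics):

```llvm
; The result and arguments are forced into registers, but the register
; allocator chooses which ones; the emitted stack map records the
; choice so the runtime knows which registers the patched code may use.
%res = call anyregcc i64 (i64, i32, i8*, i32, ...)
           @llvm.experimental.patchpoint.i64(
               i64 3,        ; client-chosen stack map ID
               i32 15,       ; bytes reserved for patching
               i8* %target,  ; initial call target
               i32 2,        ; number of call arguments
               i64 %obj, i64 %shape)
```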

Another potential extension is to add an entry to the stackmap marking physical registers that are actually in-use across the stack map or patch point.

It helps me to think of llvm.patchpoint as a replacement for any situation where a JIT would have otherwise needed inline asm to generate the desired code sequence.

(Again, this is touching an area of LLVM I'm not particularly familiar with.)
In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.
If you were to do that, how would the implementation be different from the new stackmap intrinsic?  Does that difference imply a clarification in intended usage or naming?

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.

We could have defined different intrinsics for all variations of use cases, but I think two is the right number:

- Use llvm.stackmap if you just want a stack map. No code will be emitted. There is no calling convention. If the runtime patches the code here, it will be destructive.

- Use llvm.patchpoint if you want to reserve space for patching the code. When you do that, you can optionally specify a number of arguments that will follow a specified calling convention. You also get a stack map here because it can be useful to fuse the stack map to the patchpoint location. After all, the runtime needs to know where to patch.
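As a concrete sketch, the two forms might look like this in IR. The `llvm.experimental.*` names, operand order, and types here are assumptions based on the patches under review and may change:

```llvm
; Stack map only: record where %obj and %idx live at this point.
; With 0 shadow bytes, no code is emitted here.
call void (i64, i32, ...)*
    @llvm.experimental.stackmap(i64 7, i32 0, i64 %obj, i64 %idx)

; Patchpoint: reserve 15 bytes of patchable code, initially emitted
; as a call to %target with two call arguments; %x is an extra live
; value recorded only in the stack map.
%r = call i64 (i64, i32, i8*, i32, ...)*
    @llvm.experimental.patchpoint.i64(i64 8, i32 15, i8* %target,
                                      i32 2, i64 %a, i64 %b, i64 %x)
```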

-Andy

Andrew Trick

Oct 23, 2013, 2:12:59 AM10/23/13
to Philip R, llvmdev
I'm moving this to a different thread. I think the newly proposed
intrinsic definitions and their current implementation are valuable
regardless of how it gets tied into GC...

On Oct 22, 2013, at 6:24 PM, Philip R <list...@philipreames.com> wrote:

Adding Gael as someone who has previously discussed vmkit topics on the list.  Since I'm assuming this is where the GC support came from, I wanted to draw this conversation to the attention of someone more familiar with the LLVM implementation than myself.


On 10/22/13 4:18 PM, Andrew Trick wrote:
On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.

I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.
Thank you for the restatement.  You summarized my view well. 

The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
Er, I think we're talking past each other again.  Let me lay out my current understanding of the terminology and existing infrastructure in LLVM.  Please correct me where I go wrong.

stack map - A mapping from "values" to storage locations.  Storage locations primarily take the form of registers or stack offsets, but could in principle refer to other well known locations (i.e. offsets into thread local state).  A stack map is specific to a particular PC and describes the state at that instruction only.

In a precise garbage collector, stack maps are used to ensure that the stack can be understood by the collector.  When a stop-the-world safepoint is reached, the collector needs to be able to identify any pointers to heap objects which may exist on the stack.  This explicitly includes both the frame which actually contains the safepoint and any caller frames back to the root of thread.  To accomplish this, a stack map is generated at any call site and a stack map is generated for the safepoint itself. 

In LLVM currently, the GCStrategy records "safepoints" which are really points at which stack maps need to be remembered.  (i.e. calls and actual stop-the-world safepoints)  The GCMetadata mechanism gives a generic way to emit the binary encoding of a stack map in a collector specific way.  The current stack maps supported by this mechanism only allow abstract locations on the stack which force all registers to be spilled around "safepoints" (i.e. calls and stop-the-world safepoints).  Also, the set of roots (which are recorded in the stack map) must be provided separately using the gcroot intrinsic. 

In code:
- GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for a location with a stack map.  The SafePoints structure in GCFunctionInfo contains a list of these locations.
- The Ocaml GC is probably the best example of usage.  See llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp

Note: The summary of existing LLVM details above is based on reading the code.  I haven't actually implemented anything which used this mechanism yet.  As such, take it with a grain of salt. 

That's an excellent description of stack maps, GCStrategy, and
safepoints. Now let me explain how I see it.

GCStrategy provides layers of abstraction that allow plugins to
specialize GC metadata. Conceptually, a plugin can generate what looks
like stack map data to the collector. But there isn't any direct
support in LLVM IR for the kind of stack maps that we need.

When I talk about adding stack map support, I'm really talking about
support for mapping values to registers, where the set of values and
their locations are specific to the "safepoint".

We're adding an underlying implementation of per-safepoint live
values. There isn't a lot of abstraction built up around it. Just a
couple of intrinsics that directly expose the functionality.

We're also approaching the interface very differently. We're enabling
an MCJIT client. The interface to the client is the stack map format.


In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching.  (Right?)  Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM.  The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes Register locations which the current implementation in LLVM does not.  One other possible difference, are you intending to include "values" which aren't of pointer type? 

Yes, the values will be of various types (although only 32/64 bit
types are currently allowed because of DWARF register number
weirdness). More importantly, our stack maps record locations of a
specific set of values, which may be in registers, at a specific
location. In fact, that, along with reserving space for code patching,
is *all* we're doing. GCRoot doesn't do this at all. So there is
effectively no overlap in implementation.


Before moving on, am I interpreting your proposal and changes correctly?

Yes, except I don’t see a direct connection between the functionality we’re
adding and “the implementation currently in LLVM”.

Assuming I'm still correct so far, how might we combine these implementations?  It looks like your implementation is much more mature than what exists in tree at the moment.  One possibility would be to express the needed GC stack maps in terms of your new infrastructure.  (i.e. convert a GCStrategy request for a safepoint into a StackMap (as you've implemented it) with the list of explicit GC roots as it's arguments).  What would you think of this? 

I can imagine someone wanting to leverage some of the new
implementation without using it end-to-end as-is. Although I'm not
entirely sure what the motivation would be. For example:

- A CodeGenPrepare pass could insert llvm.stackmap or llvm.patchpoint
  calls at custom safepoints after determining GC root liveness at
  those points.

- Something like a GCStrategy could intercept our implementation of
  stack map generation and emit a custom format. Keep in mind though
  that the format that LLVM emits does not need to be the format read
  by the collector. The JIT/runtime can parse LLVM's stack map data
    and encode it using its own data structures. That way, the
  JIT/runtime can change without customizing LLVM.

As far as hooking the new stack map support into the GCMetaData
abstraction, I'm not sure how that would work. GCMachineCodeAnalysis
is currently a standalone MI pass. We can't generate our stack maps
here. Technically, a preEmitPass can come along later and reassign
registers invalidating the stack map. That's why we generate the maps
during MC lowering.

So, currently, the new intrinsics are serving a different purpose than
GCMetaData. I think someone working on GC support needs to be
convinced that they really need the new stack map features. Then we
can build something on top of the underlying functionality that works
for them.

-Andy

Gaël Thomas

Oct 23, 2013, 4:17:35 AM10/23/13
to Andrew Trick, llvmdev
Hi all,

I don't know if I understand everything, but this seems really
interesting for a runtime developer; stackmap and patchpoint look
perfect for a lot of optimizations :) I just have a few questions to
verify that I understand what these stackmaps and patchpoints are, and
I discuss the GC after.

* I have a first very simple scenario (useful in vmkit). Let's imagine
that we want to lazily build the layout of an object at runtime, i.e.,
we don't know the layout of the object when we are emitting the code.
And we want to access a field of this object identified by a
symbol. If I understand correctly, we can use your stackmap to define
the offset of this field and then patch the code that uses this offset?
The machine code will look like mov offset(%rax), .., and the stackmap
will generate a map that contains the location of "offset" in the
code? If that's the case, it's perfect.

* Now, let's imagine that I want to lazily call a virtual method (aka,
a single dispatch call in Java). I have two problems. First, I have to
know the offset of the method in the virtual table (just like for
virtual calls in c++). Second, the entry in the table should contain a
stub able to link/compile the target method at runtime. And the stub
has to know which object has received the call (which drives the
method resolution and the update of the virtual table). With stackmaps
and patchpoints, I can imagine something like that (in pseudo-llvm
without typing)

%r0 = load %obj, 0 ; the virtual table is at offset 0
%r1 = 0
stackmap %r1, ID_OFFSET ; contains the offset of the target method in
the virtual table
%r2 = add %r1, %r0
%r3 = load %r2
patchpoint ID_CALL %r3, %obj, other parameters ; to find %obj in the stub

I should be able to:
- patch ID_OFFSET when I load the description of obj (before the call,
when the object is allocated)
- use ID_CALL to know which object is the target of the call in order
to find the appropriate method. If it could be the case, your
patchpoint and safepoint are very interesting for vmkit. We just need
a function to retrieve, in the caller, which patchpoint we are
coming from (probably by inspecting the stack, we can find the program
counter of the call site and then find the patchpoint descriptor?)

* Now, for the GC, if I understand correctly, instead of declaring a
variable as a root, you can declare explicitly the safepoints by using
patchpoints with something like

patchpoint ID_safepoint_17, suspendTheThreadForCollection, list of the
alloca (or registers) that contains objects

Then in the suspendTheThreadForCollection, we can see that we are
coming from safepoint_17 and then find the locations of the
objects? If a patchpoint can work like this, it's probably a good
building block for the gc.

Currently, we have to declare the root objects with the root
intrinsic, then add the appropriate safepoints (it's just a call to
GCFunctionInfo.addSafePoint). As root objects are marked as root,
modifying GCFunctionInfo.addSafepoint to generate a patchpoint with
all the gc roots as argument (instead of using the current
infrastructure) should not be difficult. And it probably means that
the current gc infrastructure could use patchpoint as a backend. The
only problem that I see is that all the objects will be transmitted as
arguments to suspendTheThreadForCollection, it's maybe not the best
way to do that. Probably, something like:
safepoint ID_safepoint_17, list of alloca that contains objects
patchpoint ID_safepoint_17, suspendTheThreadForCollection
should be better to avoid useless arguments?

See you,
Gaël

PS: just, tell me if the code is already in the trunk, because I would
like to see if these intrinsics can work for vmkit :)


2013/10/23 Andrew Trick <atr...@apple.com>:
--
-------------------------------------------------------------------
Gaël Thomas, Associate Professor, UPMC
http://pagesperso-systeme.lip6.fr/Gael.Thomas/
-------------------------------------------------------------------

Filip Pizlo

Oct 23, 2013, 11:38:14 AM10/23/13
to Gaël Thomas, llvmdev

> On Oct 23, 2013, at 1:17 AM, Gaël Thomas <gael....@lip6.fr> wrote:
>
> Hi all,
>
> I don't know if I understand everything, but this seems really
> interesting for a runtime developer; stackmap and patchpoint look
> perfect for a lot of optimizations :) I just have a few questions to
> verify that I understand what these stackmaps and patchpoints are, and I
> discuss the GC after.
>
> * I have a first very simple scenario (useful in vmkit). Let's imagine
> that we want to lazily build the layout of an object at runtime, i.e.,
> we don't know the layout of the object when we are emitting the code.
> And, we want to access to a field of this object identified by a
> symbol. If I understand correctly, we can use your stackmap to define
> the offset of this field and then patch the code that uses this offset?
> The machine code will look like mov offset(%rax), .., and the stackmap
> will generate a map that contains the location of "offset" in the
> code? If it's the case, it's perfect.

This is one of the use cases of patchpoint. Stackmap doesn't quite work because in IR it doesn't return anything - though you could probably use stackmap for putfield. But patchpoint is more convenient for this, I think.

You'll probably want a wider range of return types of patchpoint. Currently it's just i64.

I think that you'd instead want the whole call and the vtable resolution to be machine code that you generate inside the patchpoint.

Andrew Trick

Oct 23, 2013, 7:37:45 PM10/23/13
to Gaël Thomas, llvmdev
On Oct 23, 2013, at 1:17 AM, Gaël Thomas <gael....@lip6.fr> wrote:

Hi all,

I don't know if I understand everything, but this seems really
interesting for a runtime developer; stackmap and patchpoint look
perfect for a lot of optimizations :) I just have a few questions to
verify that I understand what these stackmaps and patchpoints are, and I
discuss the GC after.

* I have a first very simple scenario (useful in vmkit). Let's imagine
that we want to lazily build the layout of an object at runtime, i.e.,
we don't know the layout of the object when we are emitting the code.
And, we want to access to a field of this object identified by a
symbol. If I understand correctly, we can use your stackmap to define
the offset of this field and then patch the code that uses this offset?
The machine code will look like mov offset(%rax), .., and the stackmap
will generate a map that contains the location of "offset" in the
code? If it's the case, it's perfect.

I agree with Filip's response. I'm not sure exactly what you envision, though.

Ignoring calling conventions for the moment, with the current implementation you'll get something like this for llvm.patchpoint(ID, 12, %resolveGetField, %obj):

mov (%rax), %rcx  ; load object pointer
--- patchpoint ---
movabsq $resolveGetField, $r11
callq *$r11
--- end patchpoint ---
ret             ; field value in %rax

The stack map will indicate that %rcx holds the object pointer.

You can then patch it with:

mov (%rax), %rcx  ; load object pointer
--- patchpoint ---
mov offset(%rcx), $rax ; load field at offset
nop
nop...
--- end patchpoint ---
ret             ; field value in %rax
I think you're envisioning patching instructions that are emitted by LLVM outside of a patchpoint's reserved space. While that is possible to do using llvm.stackmap, it is not a good idea to assume anything about LLVM instruction selection or scheduling.

As Filip said, just use a single patchpoint for the vtable call sequence and only patch within the reserved bytes.

llvm.patchpoint(ID, nbytes, %resolveVCall, %obj):

Say the stackmap locates %obj in $r1 and a return value in $r3, you would simply patch 'nbytes' of code as follows:

--- patchpoint ---
mov (%r1), $r2 ; load vtable ptr
mov $vtableofs($r2), $r3
--- end patchpoint ---

You could potentially expose the vtable ptr load to LLVM IR.

Note that patching at llvm.stackmap (not llvm.patchpoint) would only be done if you don't want to resume execution within the same compiled function after reaching the stack map.

* Now, for the GC, if I understand correctly, instead of declaring a
variable as a root, you can declare explicitly the safepoints by using
patchpoints with something like

patchpoint ID_safepoint_17, suspendTheThreadForCollection, list of the
alloca (or registers) that contains objects

Then in the suspendTheThreadForCollection, we can see that we are
coming for the safepoint_17 and then find the locations of the
objects? If a patchpoint can work like this, it's probably a good
building block for the gc.

Currently, we have to declare the root objects with the root
intrinsic, then add the appropriate safepoints (it's just a call to
GCFunctionInfo.addSafePoint). As root objects are marked as root,
modifying GCFunctionInfo.addSafepoint to generate a patchpoint with
all the gc roots as argument (instead of using the current
infrastructure) should not be difficult. And it probably means that
the current gc infrastructure could use patchpoint as a backend. The

Yes, but I think it's more robust to call addSafepoint during MC lowering when we know that register assignments can't change. I see two options

- Call addSafepoint() within the AsmPrinter where we currently call StackMaps::recordStackMap.

- Run addGCPasses() after addPreEmitPass() in Passes.cpp. I'm not sure if there's a good reason we currently run it before block layout and target-specific passes.

only problem that I see is that all the objects will be transmitted as
arguments to suspendTheThreadForCollection, it's maybe not the best
way to do that. Probably, something like:
safepoint ID_safepoint_17, list of alloca that contains objects
patchpoint ID_safepoint_17, suspendTheThreadForCollection
should be better to avoid useless arguments?

No, just use patchpoint, passing '0' as the number of calling convention arguments. All patchpoint operands after the arguments are identical to stackmap operands:

 patchpoint(ID_safepoint_17, suspendThread, 0, list of alloca)


See you,
Gaël

PS: just, tell me if the code is already in the trunk, because I would
like to see if these intrinsics can work for vmkit :)

The implementation is not final, but a working patch set is up for review. See 
http://llvm-reviews.chandlerc.com/D1996 and its dependent patches.

-Andy

Philip Reames

Oct 23, 2013, 8:27:25 PM10/23/13
to Andrew Trick, llvmdev
On 10/22/13 10:48 PM, Andrew Trick wrote:
I'll respond to a few questions below. I'll start a new thread for GC discussion.
Good idea.  Thanks.

On Oct 22, 2013, at 6:24 PM, Philip R <list...@philipreames.com> wrote:

Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments.
Agreed.  My concern is mostly about naming and documentation of intended usages.  Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used.  The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.

I don't think the intrinsic names will be able to capture their semantics. I think that's why we need documentation, which I've been working on: http://llvm-reviews.chandlerc.com/D1981.

For example, the "stackmap" intrinsic isn't really a stack map, it's something that allows generation of a stack map in which the entries don't actually need to be on the stack... confusing, but still a good name I think.
"stack map" is also a fairly well understood term in the GC/compiler world.  It's better to stick with well known terminology where possible.

As for naming vs documentation, I tend to believe that naming should be as descriptive as is reasonable.  Having said that, we're well past the point of adding value with this discussion.  Please update the documentation to reflect some of the clarifications on usage that have come up here and let's move on.

But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.
Er, not quite true.  Your calling convention also influences what registers stay live across the call.  But in general, I see your point.

You get around that by defining a new calling convention. Each patchpoint intrinsic call can be marked with a different calling convention if you choose. For example, we'll be adding a dynamic calling convention called AnyRegCC. You can use that to effectively specify the number of arguments that you want to force into registers. The stack map will tell you which registers were used for arguments. The "call" will preserve most registers, but clobbers one register (on x86) for use within the code.
Nice trick.  I'll have to remember that.

Another potential extension is to add an entry to the stackmap marking physical registers that are actually in-use across the stack map or patch point.

It helps me to think of llvm.patchpoint as a replacement for any situation where a JIT would have otherwise needed inline asm to generate the desired code sequence.

(Again, this is touching an area of LLVM I'm not particularly familiar with.)
In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.
If you were to do that, how would the implementation be different from the new stackmap intrinsic?  Does that difference imply a clarification in intended usage or naming?

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more.  Let's move on.  We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.

We could have defined different intrinsics for all variations of use cases, but I think two is the right number:

- Use llvm.stackmap if you just want a stack map. No code will be emitted. There is no calling convention. If the runtime patches the code here, it will be destructive.

- Use llvm.patchpoint if you want to reserve space for patching the code. When you do that, you can optionally specify a number of arguments that will follow a specified calling convention. You also get a stack map here because it can be useful to fuse the stack map to the patchpoint location. After all, the runtime needs to know where to patch.
This summary is really helpful.

This summary and the other points you've made in our discussion are exactly what should be in the documentation.  It provides the thought process behind their design, and the intended usage scenarios.  Can you add a section with this meta information?

Yours,
Philip

p.s. Thank you both for taking the time to hash this through.

Andrew Trick

Oct 23, 2013, 8:38:11 PM10/23/13
to Philip Reames, llvmdev
On Oct 23, 2013, at 5:27 PM, Philip Reames <list...@philipreames.com> wrote:

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more.  Let's move on.  We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.  

I want this to be clear when we expose the intrinsics to other developers. I’ve been making an effort to improve the docs: http://llvm-reviews.chandlerc.com/D1981. Please let me know where clarification is needed.

-Andy


Philip Reames

Oct 23, 2013, 9:23:02 PM10/23/13
to Andrew Trick, llvmdev
On 10/22/13 11:12 PM, Andrew Trick wrote:
I'm moving this to a different thread. I think the newly proposed
intrinsic definitions and their current implementation are valuable
regardless of how it gets tied into GC...
Agreed.  As Gaël said, I'm looking forward to being able to play with this in tree.  :)

On Oct 22, 2013, at 6:24 PM, Philip R <list...@philipreames.com> wrote:

Adding Gael as someone who has previously discussed vmkit topics on the list.  Since I'm assuming this is where the GC support came from, I wanted to draw this conversation to the attention of someone more familiar with the LLVM implementation than myself.


On 10/22/13 4:18 PM, Andrew Trick wrote:
On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpi...@apple.com> wrote:
On Oct 22, 2013, at 1:48 PM, Philip R <list...@philipreames.com> wrote:

On 10/22/13 10:34 AM, Filip Pizlo wrote:
On Oct 22, 2013, at 9:53 AM, Philip R <list...@philipreames.com> wrote:

On 10/17/13 10:39 PM, Andrew Trick wrote:
This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

I have a couple of comments on your proposal.  None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.
It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.

I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.
Thank you for the restatement.  You summarized my view well.

The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
Er, I think we're talking past each other again.  Let me lay out my current understanding of the terminology and existing infrastructure in LLVM.  Please correct me where I go wrong.

stack map - A mapping from "values" to storage locations.  Storage locations primarily take the form of registers or stack offsets, but could in principle refer to other well known locations (i.e. offsets into thread local state).  A stack map is specific to a particular PC and describes the state at that instruction only.

In a precise garbage collector, stack maps are used to ensure that the stack can be understood by the collector. When a stop-the-world safepoint is reached, the collector needs to be able to identify any pointers to heap objects which may exist on the stack. This explicitly includes both the frame which actually contains the safepoint and any caller frames back to the root of the thread. To accomplish this, a stack map is generated at every call site and a stack map is generated for the safepoint itself.

In LLVM currently, the GCStrategy records "safepoints" which are really points at which stack maps need to be remembered (i.e. calls and actual stop-the-world safepoints). The GCMetadata mechanism gives a generic way to emit the binary encoding of a stack map in a collector-specific way. The current stack maps supported by this mechanism only allow abstract locations on the stack, which forces all registers to be spilled around "safepoints" (i.e. calls and stop-the-world safepoints). Also, the set of roots (which are recorded in the stack map) must be provided separately using the gcroot intrinsic.

In code:
- GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for a location with a stack map. The SafePoints structure in GCFunctionInfo contains a list of these locations.
- The Ocaml GC is probably the best example of usage. See llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp

Note: The summary of existing LLVM details above is based on reading the code. I haven't actually implemented anything which uses this mechanism yet. As such, take it with a grain of salt.
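Philip's definition of a stack map above can be sketched concretely. The model below is purely illustrative: the names, location kinds, and register numbers are invented for this example, and this is not LLVM's actual encoding.

```python
# Illustrative model of a stack map: for each recorded PC, where each
# live value can be found. Location kinds and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Location:
    kind: str       # "register" or "stack"
    value: int      # DWARF register number, or offset from the frame pointer

# One stack map entry per PC of interest (call site or safepoint).
stack_maps = {
    0x1040: {"obj_a": Location("register", 3),
             "obj_b": Location("stack", -16)},
}

def roots_at(pc, heap_pointer_names):
    """A collector would consult the map to find heap pointers at this PC."""
    entry = stack_maps[pc]
    return {name: entry[name] for name in heap_pointer_names if name in entry}
```

In the GC usage described above, only values known to be heap pointers would be queried; a more general client could record any live value, pointer or not.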

That's an excellent description of stack maps, GCStrategy, and
safepoints. Now let me explain how I see it.

GCStrategy provides layers of abstraction that allow plugins to
specialize GC metadata. Conceptually, a plugin can generate what looks
like stack map data to the collector. But there isn't any direct
support in LLVM IR for the kind of stack maps that we need.

When I talk about adding stack map support, I'm really talking about
support for mapping values to registers, where the set of values and
their locations are specific to the "safepoint".

We're adding an underlying implementation of per-safepoint live
values. There isn't a lot of abstraction built up around it. Just a
couple of intrinsics that directly expose the functionality.

We're also approaching the interface very differently. We're enabling
an MCJIT client. The interface to the client is the stack map format.
For the record, I actually prefer your approach to the interface. :)


In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching. (Right?) Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM. The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes Register locations, which the current implementation in LLVM does not. One other possible difference: are you intending to include "values" which aren't of pointer type?

Yes, the values will be of various types (although only 32/64 bit
types are currently allowed because of DWARF register number
weirdness). More importantly, our stack maps record locations of a
specific set of values, which may be in registers, at a specific
location.
The fact that you're interested in more than information about which locations contain pointers into the heap is the key point here. Your stack map is actually slightly more general than the form used by a garbage collector. For example, your mechanism allows you to describe where the iteration variable ("int i") in a loop lives. This is not something a stack map (in the sense I've been using it to refer to GC usage) would enable.

In fact, that, along with reserving space for code patching,
is *all* we're doing. GCRoot doesn't do this at all. So there is
effectively no overlap in implementation.
I've actually come around to agreeing with you. I think you're slightly misunderstanding the role of "safepoints" and "gcroot" in the current implementation, but the fact that your mechanism is significantly more general than standard GC stackmaps (which LLVM implements currently) is a strong reason for having them as a separate implementation.

(For performance reasons, a GC framework would probably want to use more concise stack maps which only encode pointer roots.)


Before moving on, am I interpreting your proposal and changes correctly?

Yes, except I don't see a direct connection between the functionality we're
adding and "the implementation currently in LLVM".

Assuming I'm still correct so far, how might we combine these implementations? It looks like your implementation is much more mature than what exists in tree at the moment. One possibility would be to express the needed GC stack maps in terms of your new infrastructure (i.e. convert a GCStrategy request for a safepoint into a StackMap, as you've implemented it, with the list of explicit GC roots as its arguments). What would you think of this?

I can imagine someone wanting to leverage some of the new
implementation without using it end-to-end as-is. Although I'm not
entirely sure what the motivation would be. For example:

- A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint
  calls at custom safepoints after determining GC root liveness at
  those points.

- Something like a GCStrategy could intercept our implementation of
  stack map generation and emit a custom format. Keep in mind though
  that the format that LLVM emits does not need to be the format read
  by the collector. The JIT/runtime can parse LLVM's stack map data
  and encode it using its own data structures. That way, the
  JIT/runtime can change without customizing LLVM.
I think this is a very good point. Alternately, you could frame your encoding as being the default representation provided by LLVM and provide a plugin mechanism to modify it. (Not proposing this should actually be done at the moment. This would be by demand only.)
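The parse-and-re-encode approach described above might look like the following sketch. The flat record layout here is invented purely for illustration; it is not the stack map section format LLVM actually emits, and a real consumer would parse whatever format the compiler documents.

```python
# Hypothetical flat record stream: each record is u32 id, u32 pc offset,
# u16 location count, then (u8 kind, i32 value) pairs, little-endian.
# The layout is invented to illustrate parsing and re-encoding only.
import struct

def parse_records(buf):
    """Parse the hypothetical record stream into dictionaries."""
    records, off = [], 0
    while off < len(buf):
        rec_id, pc, nlocs = struct.unpack_from("<IIH", buf, off)
        off += 10
        locs = []
        for _ in range(nlocs):
            kind, value = struct.unpack_from("<Bi", buf, off)
            off += 5
            locs.append((kind, value))
        records.append({"id": rec_id, "pc": pc, "locations": locs})
    return records

def reencode(records):
    """Re-encode into the runtime's own lookup structure, keyed by PC."""
    return {r["pc"]: r["locations"] for r in records}
```

The point of the second step is exactly the one made above: once the runtime owns the re-encoded structure, LLVM's emitted format can change without the runtime's internals changing with it.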


As far as hooking the new stack map support into the GCMetaData
abstraction, I'm not sure how that would work. GCMachineCodeAnalysis
is currently a standalone MI pass. We can't generate our stack maps
here. Technically, a preEmitPass can come along later and reassign
registers invalidating the stack map. That's why we generate the maps
during MC lowering.
I agree. I think this is actually a problem with the existing implementation as well. It gets around it by (I believe) forcing all roots to the stack when a stack map is needed.

So, currently, the new intrinsics are serving a different purpose than
GCMetaData. I think someone working on GC support needs to be
convinced that they really need the new stack map features. Then we
can build something on top of the underlying functionality that works
for them.

-Andy
Just to note, that person working on the GC support is very likely to be me (or one of my coworkers) in the near future. That's why I've been so interested in your changes. :)

As background, we're investigating using LLVM as a JIT compiler for a VM which uses a precise relocating collector. The existing collector support appears problematic with regard to a relocating collector, and we're investigating approaches to enhance it. My coworker will be opening another thread on that topic in the next few days.

Philip

Philip Reames

unread,
Oct 23, 2013, 10:26:13 PM10/23/13
to Andrew Trick, llvmdev
On 10/23/13 5:38 PM, Andrew Trick wrote:
On Oct 23, 2013, at 5:27 PM, Philip Reames <list...@philipreames.com> wrote:

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more. Let's move on. We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.

I want this to be clear when we expose the intrinsics to other developers. I've been making an effort to improve the docs: http://llvm-reviews.chandlerc.com/D1981. Please let me know where clarification is needed.
I responded in the review. The only big thing that might be worth discussion here is the "full resume" semantics which are mentioned at the very end. This seemed to disagree with our previous discussion. Let me know if you're either a) unclear at what I was getting at or b) believe the "full resume" semantics were a key part of the intrinsic. In either case, we should probably hash it out here.

Philip

Andrew Trick

unread,
Oct 24, 2013, 1:03:24 AM10/24/13
to Philip Reames, llvmdev
On Oct 23, 2013, at 7:26 PM, Philip Reames <list...@philipreames.com> wrote:

On 10/23/13 5:38 PM, Andrew Trick wrote:
On Oct 23, 2013, at 5:27 PM, Philip Reames <list...@philipreames.com> wrote:

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more.  Let's move on.  We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.  

I want this to be clear when we expose the intrinsics to other developers. I’ve been making an effort to improve the docs: http://llvm-reviews.chandlerc.com/D1981. Please let me know where clarification is needed
I responded in the review.  The only big thing that might be worth discussion here is the "full resume" semantics which are mentioned at the very end.  This seemed to disagree with our previous discussion.  Let me know if you're either a) unclear at what I was getting at or b) believe the "full resume" semantics were a key part of the intrinsic.  In either case, we should probably hash it out here. 

Thanks for the review. I'll respond to each of your comments there.

There's one important thing I'm not getting across. llvm.stackmap is not primarily intended for patching. As an additional feature, llvm.stackmap can specify the size of its shadow, but does not need to generate nops in its shadow. The shadow can contain arbitrary code from the same function. The only purpose of the shadow is to allow destructive patching. Destructive means that some arbitrary code may be overwritten. I think the only use for this is to invalidate compiled code in a way that allows the runtime to recover. That just happens to be a very important use. Note that if we didn't specify a shadow, the runtime would have a hard time knowing how to patch without potentially clobbering something outside the current function.
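The safety property the shadow provides can be sketched as follows. This is an editorial illustration only: the byte values are x86-style single-byte opcodes chosen for the example, and the nop-fill policy is one plausible choice, not something the proposal mandates.

```python
# Conceptual sketch: the shadow tells the runtime how many bytes at the
# stackmap's PC may be destructively overwritten. Opcodes here are
# illustrative x86-style one-byte encodings.
NOP, TRAP = 0x90, 0xCC

def patch_shadow(code, pc, shadow_bytes, patch):
    """Overwrite up to shadow_bytes at pc; refuse anything larger,
    since bytes past the shadow may belong to unrelated code."""
    if len(patch) > shadow_bytes:
        raise ValueError("patch would clobber code outside the shadow")
    code[pc:pc + len(patch)] = patch
    # Fill the rest of the shadow with nops so execution falls through.
    for i in range(pc + len(patch), pc + shadow_bytes):
        code[i] = NOP
    return code
```

Without the declared shadow size, the bounds check above would have nothing to check against, which is the point made in the paragraph above.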

-Andy

Philip Reames

unread,
Oct 24, 2013, 12:31:06 PM10/24/13
to Andrew Trick, llvmdev
On 10/23/13 10:03 PM, Andrew Trick wrote:
On Oct 23, 2013, at 7:26 PM, Philip Reames <list...@philipreames.com> wrote:

On 10/23/13 5:38 PM, Andrew Trick wrote:
On Oct 23, 2013, at 5:27 PM, Philip Reames <list...@philipreames.com> wrote:

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more. Let's move on. We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.

I want this to be clear when we expose the intrinsics to other developers. I've been making an effort to improve the docs: http://llvm-reviews.chandlerc.com/D1981. Please let me know where clarification is needed.
I responded in the review. The only big thing that might be worth discussion here is the "full resume" semantics which are mentioned at the very end. This seemed to disagree with our previous discussion. Let me know if you're either a) unclear at what I was getting at or b) believe the "full resume" semantics were a key part of the intrinsic. In either case, we should probably hash it out here.

Thanks for the review. I'll respond to each of your comments there.

There's one important thing I'm not getting across. llvm.stackmap is not primarily intended for patching. As an additional feature, llvm.stackmap can specify the size of its shadow, but does not need to generate nops in its shadow. The shadow can contain arbitrary code from the same function. The only purpose of the shadow is to allow destructive patching. Destructive means that some arbitrary code may be overwritten. I think the only use for this is to invalidate compiled code in a way that allows the runtime to recover. That just happens to be a very important use. Note that if we didn't specify a shadow, the runtime would have a hard time knowing how to patch without potentially clobbering something outside the current function.
So essentially you're using the shadow region to enable creating either a nop sled which carries you down into deopt code or a forced trap (i.e. patching with an instruction sequence which forces a trap to a signal handler)? Ok, that sounds reasonable. You should spell out this example explicitly in the documentation. It's useful for clarification purposes.

Yours,
Philip



Andrew Trick

unread,
Oct 31, 2013, 1:11:55 PM10/31/13
to Philip Reames, llvmdev
On Oct 24, 2013, at 9:31 AM, Philip Reames <list...@philipreames.com> wrote:

On 10/23/13 10:03 PM, Andrew Trick wrote:
On Oct 23, 2013, at 7:26 PM, Philip Reames <list...@philipreames.com> wrote:

On 10/23/13 5:38 PM, Andrew Trick wrote:
On Oct 23, 2013, at 5:27 PM, Philip Reames <list...@philipreames.com> wrote:

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does.
I'm slightly confused by this given that stackmap takes an argument indicating the number of nops to emit as well, but it's not worth debating this any more.  Let's move on.  We can revisit this once I'm actually using the new intrinsics and can provide real concrete feedback.  

I want this to be clear when we expose the intrinsics to other developers. I’ve been making an effort to improve the docs: http://llvm-reviews.chandlerc.com/D1981. Please let me know where clarification is needed
I responded in the review.  The only big thing that might be worth discussion here is the "full resume" semantics which are mentioned at the very end.  This seemed to disagree with our previous discussion.  Let me know if you're either a) unclear at what I was getting at or b) believe the "full resume" semantics were a key part of the intrinsic.  In either case, we should probably hash it out here. 

Thanks for the review. I'll respond to each of your comments there.

There's one important thing I'm not getting across. llvm.stackmap is not primarily intended for patching. As an additional feature, llvm.stackmap can specify the size of its shadow, but does not need to generate nops in its shadow. The shadow can contain arbitrary code from the same function. The only purpose of the shadow is to allow destructive patching. Destructive means that some arbitrary code may be overwritten. I think the only use for this is to invalidate compiled code in a way that allows the runtime to recover. That just happens to be a very important use. Note that if we didn't specify a shadow, the runtime would have a hard time knowing how to patch without potentially clobbering something outside the current function.
So essentially you're using the shadow region to enable creating either a nop sled which carries you down into deopt code or a forced trap (i.e. patching with an instruction sequence which forces a trap to a signal handler)? Ok, that sounds reasonable. You should spell out this example explicitly in the documentation. It's useful for clarification purposes.

The latest version of the documentation can be found here:
http://llvm-reviews.chandlerc.com/D1981

-Andy
