[llvm-dev] GC-parseable element atomic memcpy/memmove

Artur Pilipenko via llvm-dev

Sep 18, 2020, 7:51:28 PM

to llvm...@lists.llvm.org, Philip Reames

TLDR: a proposal to add GC-parseable lowering to element atomic

memcpy/memmove intrinsics controlled by a new "requires-statepoint"

call attribute.

Currently llvm.{memcpy|memmove}.element.unordered.atomic calls are

considered GC leaf functions (like most other intrinsics). As a

result, GC cannot occur while a copy operation is in progress. This can

hurt GC latency when large amounts of data are

copied. To avoid this problem, large copies can be

done in chunks with GC safepoints in between. We'd like to be able to

represent such a copy using the existing intrinsics [1].

For that I'd like to propose a new attribute for

llvm.{memcpy|memmove}.element.unordered.atomic calls

"requires-statepoint". This attribute on a call will result in a

different lowering, which makes it possible to have a GC safepoint

during the copy operation.
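
To make the idea of a chunked copy concrete, here is a minimal sketch. It is not taken from the patch: the name chunked_copy, the chunk_size parameter and the safepoint_poll callback are hypothetical, the objects are assumed not to move at the poll, and a plain memcpy per chunk stands in for the element-wise unordered atomic copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical helper: copies `length` bytes in chunks of at most
// `chunk_size` bytes and calls `safepoint_poll` between chunks, so a GC
// gets a chance to run before the whole copy has finished.
// Assumes non-overlapping ranges and objects that stay in place.
// Returns the number of polls performed.
inline std::size_t chunked_copy(char *dest, const char *src,
                                std::size_t length, std::size_t chunk_size,
                                const std::function<void()> &safepoint_poll) {
  assert(chunk_size > 0 && "a zero chunk size would never make progress");
  std::size_t polls = 0;
  std::size_t copied = 0;
  while (copied < length) {
    const std::size_t n = std::min(chunk_size, length - copied);
    std::memcpy(dest + copied, src + copied, n);
    copied += n;
    // Poll only between chunks, not after the last one.
    if (copied < length) {
      safepoint_poll();
      ++polls;
    }
  }
  return polls;
}
```

With a chunk size of 4, a 10-byte copy is split into chunks of 4, 4 and 2 bytes with a poll after each of the first two.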

There are three parts to the new lowering:

1) The calls with the new attribute will be wrapped into a statepoint

by RewriteStatepointsForGC (RS4GC). This way the stack at the calls

will be GC parseable.

2) Currently these intrinsics are lowered to GC leaf calls to the symbols

__llvm_{memcpy|memmove}_element_unordered_atomic_<element_size>.

The calls with the new attribute will be lowered to calls to different

symbols, let's say

__llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>.

This way the runtime can provide copy implementations with safepoints.

3) Currently memcpy/memmove calls take derived pointers as arguments.

If we copy with safepoints we might need to relocate the underlying

source/destination objects at a safepoint. In order to do this we need

to know the base pointers as well. How do we make the base pointers

available in the copy routine? I suggest we add them explicitly as

arguments during lowering.

For example:

__llvm_memcpy_element_unordered_atomic_safepoint_1(

dest_base, dest_derived, src_base, src_derived, length)

It will be up to RS4GC to do the new lowering and prepare the arguments.

RS4GC knows how to compute base pointers for a given derived pointer.

It also already does lowering for deoptimize intrinsics by replacing

an intrinsic call with a symbol call. So there is a precedent here.
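
As an illustration of part 2, the choice between the two symbol families could be sketched as below. This is a toy helper, not LLVM code: CopyKind and runtimeSymbolName are hypothetical names, and the "_safepoint_" infix is only the tentative spelling suggested above.

```cpp
#include <cassert>
#include <string>

enum class CopyKind { Memcpy, Memmove };

// Hypothetical helper: builds the name of the runtime symbol a call would
// be lowered to. Without a safepoint this is the existing GC leaf symbol;
// with one, it is the tentative "_safepoint_" variant from the proposal.
inline std::string runtimeSymbolName(CopyKind kind, unsigned element_size,
                                     bool needs_safepoint) {
  assert(element_size > 0 && "element size must be positive");
  std::string name = "__llvm_";
  name += (kind == CopyKind::Memcpy) ? "memcpy" : "memmove";
  name += "_element_unordered_atomic_";
  if (needs_safepoint)
    name += "safepoint_";
  name += std::to_string(element_size);
  return name;
}
```

For a one-byte-element memcpy that needs a safepoint this yields the symbol used in the example above.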

Other alternatives:

- Change llvm.{memcpy|memmove}.element.unordered.atomic API to accept

base pointers + offsets instead of derived pointers. This would

require autoupgrade of the old representation. Changing the API of a generic

intrinsic to facilitate GC-specific lowering doesn't look like the

best idea. It also would not work if we want to do the same for non-atomic

intrinsics.

- Teach GC infrastructure to record base pointers for all derived

pointer arguments. This looks like overkill for a single use case.

Here is the proposed implementation in a single patch:

https://reviews.llvm.org/D87954

If there are no objections I will split it into individual reviews and

add langref changes.

Thoughts?

Artur

[1] An alternative approach would be to make the frontend generate a

chunked copy loop with a safepoint inside. The downsides are:

- It's harder for the optimizer to see that this loop is just a copy

of a range of bytes.

- It forces one particular lowering with the chunked loop inlined in

compiled code. We can't outline the copy loop into the copy routine.

With the intrinsic representation of a chunked copy we can choose

different lowering strategies if we want.

- In our system we have to outline the copy loop into the copy routine

due to interactions with deoptimization.

Artur Pilipenko via llvm-dev

Sep 24, 2020, 10:28:48 PM

to llvm...@lists.llvm.org, Philip Reames

Ping?

Artur

Philip Reames via llvm-dev

Sep 28, 2020, 1:56:11 PM

to Artur Pilipenko, llvm...@lists.llvm.org, Philip Reames

In general, I am supportive of this direction. It seems like an entirely reasonable solution. I do have some comments below, but they're mostly of the "how do we generalize this?" variety.

First, let's touch on the attribute.

My first concern is naming; I think the use of "statepoint" here is problematic, as the attribute is not about the lowering strategy needed (i.e. statepoints) but about the conceptual support (i.e. a safepoint). This could be resolved by simply renaming it to require-safepoint.

But that brings us to a broader point. We've chosen to build in the assumption that intrinsics don't require safepoints. If all we want is for some intrinsics *to* require safepoints, why isn't this simply a tweak to the existing code? callsGCLeafFunction already has a small list of intrinsics which can have safepoints.

I think you can completely remove the need for this attribute by a) adding the atomic memcpy variants to the exclude list in callsGCLeafFunction, and b) using the existing "gc-leaf-function" on most calls the frontend generates.
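
A toy model of that check might look like the sketch below. It is not the real callsGCLeafFunction: ToyCall, toyCallsGCLeafFunction and the string-based exclude list are hypothetical, and only the two intrinsics discussed in this thread appear in the list.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified view of a call site.
struct ToyCall {
  std::string callee;    // callee name without type suffixes
  bool is_intrinsic;     // true for llvm.* intrinsics
  bool has_gc_leaf_attr; // models the "gc-leaf-function" call attribute
};

// Returns true if the call is treated as a GC leaf, i.e. it gets no
// safepoint. An explicit "gc-leaf-function" attribute wins; otherwise
// intrinsics are leaves unless they are on the exclude list.
inline bool toyCallsGCLeafFunction(const ToyCall &call) {
  assert(!call.callee.empty() && "call must name a callee");
  if (call.has_gc_leaf_attr)
    return true;
  if (call.is_intrinsic) {
    static const std::set<std::string> may_safepoint = {
        "llvm.memcpy.element.unordered.atomic",
        "llvm.memmove.element.unordered.atomic",
    };
    return may_safepoint.count(call.callee) == 0;
  }
  // Ordinary calls are assumed to be able to reach a safepoint.
  return false;
}
```

Under this model a frontend that wants the old behaviour for a particular copy marks that call "gc-leaf-function", and every unmarked atomic memcpy/memmove becomes a candidate for a safepoint.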

Second, let's discuss the signature for the runtime function.

I think you should use a signature for the runtime call which takes base pointers and offsets, not base pointers and derived pointers. Why? Because passing derived pointers in registers for arguments presumes that the runtime knows how to map a record in the stackmap to where a callee might have shuffled the argument to. Some runtimes may support this, others may not. Given that the offset scheme is just as simple to implement, being considerate and minimizing the runtime support required seems worthwhile.

On x86, the cost of a subtract (to produce the offset in the worst case), and an LEA (to produce the derived pointer again inside the runtime routine) is pretty minimal. Particularly since the former is likely to be optimized away and the latter folded into the addressing mode.
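
A rough sketch of a runtime routine with the base + offset signature is shown below. Everything in it is an assumption made for illustration: the name copy_base_offset, the chunk_size parameter, and the Safepoint callback, which stands in for a GC that may move either object and update the base pointers (a real runtime would find and update them through its stack maps).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical stand-in for a GC safepoint: it may relocate either object
// and must update the base pointers it is handed.
using Safepoint = std::function<void(char *&dest_base, char *&src_base)>;

// Copies `length` bytes from src_base + src_offset to
// dest_base + dest_offset in chunks, with a safepoint between chunks.
// Only bases and offsets are kept across a safepoint; the interior
// (derived) pointers are recomputed for every chunk.
inline void copy_base_offset(char *dest_base, std::size_t dest_offset,
                             char *src_base, std::size_t src_offset,
                             std::size_t length, std::size_t chunk_size,
                             const Safepoint &safepoint) {
  assert(chunk_size > 0 && "a zero chunk size would never make progress");
  std::size_t copied = 0;
  while (copied < length) {
    const std::size_t n = std::min(chunk_size, length - copied);
    // Re-derive the interior pointers from the current bases.
    char *dest = dest_base + dest_offset + copied;
    const char *src = src_base + src_offset + copied;
    std::memmove(dest, src, n);
    copied += n;
    if (copied < length)
      safepoint(dest_base, src_base); // bases may change here
  }
}
```

Because the offsets are plain integers, nothing in the routine needs to be treated as a derived pointer when an object moves; the call site only pays for the subtract that produces each offset.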

Finally, it's also worth noting that some (but not all) GCs can convert from an interior derived pointer to the base of the containing object. With the memcpy family we know that either the pointers are all interior derived, or the length must be zero. Since not every GC supports that conversion, we don't want to rely on it.

Philip

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Artur Pilipenko via llvm-dev

Sep 30, 2020, 12:12:07 AM

to Philip Reames, llvm...@lists.llvm.org, Philip Reames

Thanks for the feedback.

I think both of the suggestions are very reasonable. I’ll incorporate them.

Given there were no objections for two weeks, I’m going to go ahead with posting individual patches for review.

One small question inline:

On Sep 28, 2020, at 10:56 AM, Philip Reames <list...@philipreames.com> wrote:

Finally, it's also worth noting that some (but not all) GCs can convert from an interior derived pointer to the base of the containing object. With the memcpy family we know that either the pointers are all interior derived, or the length must be zero. Since not every GC supports that conversion, we don't want to rely on it.

Do you think it makes sense to control this aspect of lowering (derived pointers vs base+offset in memcpy args) using GCStrategy?

Artur

Philip Reames via llvm-dev

Sep 30, 2020, 4:08:31 PM

to Artur Pilipenko, llvm...@lists.llvm.org, Philip Reames

On 9/29/20 9:11 PM, Artur Pilipenko wrote:

Do you think it makes sense to control this aspect of lowering (derived pointers vs base+offset in memcpy args) using GCStrategy?

I would not bother. The performance difference is tiny, and no one is to my knowledge using LLVM for such a use case. If we have a reported regression, we can address it then.

Artur Pilipenko via llvm-dev

Oct 8, 2020, 9:07:59 PM

to Philip Reames, llvm...@lists.llvm.org

I incorporated the feedback and posted the Phabricator review.

https://reviews.llvm.org/D88861

Artur
