[llvm-dev] llvm.memcpy for struct copy

ma jun via llvm-dev

Jan 30, 2018, 2:13:00 AM

to llvm...@lists.llvm.org

Hi all

I'm new here, and I have some questions about the llvm.memcpy intrinsic.

Why does the llvm.memcpy intrinsic only support i8* for its first two arguments? Does clang also transform a struct copy into llvm.memcpy? What does the resulting IR look like?

Thanks !

Regards

Jun

Craig Topper via llvm-dev

Jan 30, 2018, 2:24:14 AM

to ma jun, llvm-dev

The i8 type in the pointers doesn't matter a whole lot. There's a long-term plan to remove the type from all pointers in LLVM IR.

Yes, clang will use memcpy for struct copies. You can see example IR here: https://godbolt.org/g/8gQ18m. You'll see that the struct pointers are bitcast to i8* before the call.
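
As a rough source-level illustration of the point above, here is a minimal sketch. The struct `X` and the helper names are invented for this example, and the IR clang actually emits depends on the compiler version and flags. For a trivially copyable struct, plain assignment gives the same result as copying `sizeof(X)` bytes through untyped pointers:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical 8-byte struct standing in for the X discussed in the thread.
struct X {
  int a;
  int b;
};

static_assert(std::is_trivially_copyable<X>::value,
              "byte-wise copying is only valid for trivially copyable types");

// Plain assignment: the form that clang typically lowers to llvm.memcpy.
void copy_by_assignment(const X &src, X &dst) { dst = src; }

// The same copy written as an explicit byte copy through void pointers.
void copy_by_memcpy(const X &src, X &dst) {
  assert(&src != &dst && "std::memcpy requires non-overlapping objects");
  std::memcpy(&dst, &src, sizeof(X));
}

// Returns true when both copy styles reproduce the source's field values.
bool copies_agree(const X &src) {
  X d1{0, 0};
  X d2{0, 0};
  copy_by_assignment(src, d1);
  copy_by_memcpy(src, d2);
  return d1.a == src.a && d1.b == src.b && d2.a == src.a && d2.b == src.b;
}
```

The pointee type never appears in `copy_by_memcpy`; only the byte count does, which is the same information the intrinsic call carries.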

~Craig

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Hongbin Zheng via llvm-dev

Jan 30, 2018, 2:25:50 AM

to ma jun, llvm-dev

hi

On Mon, Jan 29, 2018 at 11:12 PM, ma jun via llvm-dev <llvm...@lists.llvm.org> wrote:

What do you mean by this? Could you be more specific and give some examples?

Thanks

Hongbin

ma jun via llvm-dev

Jan 30, 2018, 2:36:46 AM

to Craig Topper, llvm-dev

Hi

Thanks !

so for this example

void foo(X &src, X &dst) {
  dst = src;
}

and the IR:

define void @foo(X&, X&)(%struct.X* dereferenceable(8), %struct.X* dereferenceable(8)) #0 {
  %3 = alloca %struct.X*, align 8
  %4 = alloca %struct.X*, align 8
  store %struct.X* %0, %struct.X** %3, align 8
  store %struct.X* %1, %struct.X** %4, align 8
  %5 = load %struct.X*, %struct.X** %3, align 8
  %6 = load %struct.X*, %struct.X** %4, align 8
  %7 = bitcast %struct.X* %6 to i8*
  %8 = bitcast %struct.X* %5 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %7, i8* align 4 %8, i64 8, i1 false)
  ret void
}

How can I transform the llvm.memcpy into a data-move loop in IR and eliminate the bitcast instructions?

Regards

Jun

ma jun via llvm-dev

Jan 30, 2018, 2:45:32 AM

to Craig Topper, llvm-dev

Hi

2018-01-30 15:36 GMT+08:00 ma jun <jun.p...@gmail.com>:

> Hi
> Thanks !
> so for this example
> void foo(X &src, X &dst) {
> dst = src;
> }
> and the IR:
>
> define void @foo(X&, X&)(%struct.X* dereferenceable(8), %struct.X* dereferenceable(8)) #0 {
> %3 = alloca %struct.X*, align 8
> %4 = alloca %struct.X*, align 8
> store %struct.X* %0, %struct.X** %3, align 8
> store %struct.X* %1, %struct.X** %4, align 8
> %5 = load %struct.X*, %struct.X** %3, align 8
> %6 = load %struct.X*, %struct.X** %4, align 8
> %7 = bitcast %struct.X* %6 to i8*
> %8 = bitcast %struct.X* %5 to i8*
> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %7, i8* align 4 %8, i64 8, i1 false)

Also, since dst and src are 4-byte aligned, can we use the IR below instead?

%7 = bitcast %struct.X* %6 to i32*
%8 = bitcast %struct.X* %5 to i32*
call void @llvm.memcpy.p0i32.p0i32.i64(i32* align 4 %7, i32* align 4 %8, i64 8, i1 false)

Craig Topper via llvm-dev

Jan 30, 2018, 3:11:57 AM

to ma jun, llvm-dev

The pointers must always be i8*. The alignment is independent of the pointer type and is controlled by the align attributes on the arguments in the call to memcpy.
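
A source-level analogy may help here. This is only a sketch: `X`, `is_aligned` and `copy_through_byte_pointers` are names made up for the example, and the address arithmetic assumes a conventional flat address space. Casting a struct pointer to a byte pointer changes the static type but not the address, so the object's alignment is still there to be described separately:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct with 4-byte alignment on common ABIs.
struct X {
  int a;
  int b;
};

// True when the address held by p is a multiple of `alignment`.
// Assumes pointers convert to integers as plain addresses.
bool is_aligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Copies src into dst through byte pointers and reports whether both
// addresses still satisfy alignof(X) after the casts.
bool copy_through_byte_pointers(const X &src, X &dst) {
  const unsigned char *s = reinterpret_cast<const unsigned char *>(&src);
  unsigned char *d = reinterpret_cast<unsigned char *>(&dst);
  // The pointee type is now a 1-byte type, but the addresses are unchanged.
  const bool aligned = is_aligned(s, alignof(X)) && is_aligned(d, alignof(X));
  std::memcpy(d, s, sizeof(X));
  return aligned;
}
```

In the IR, the `align 4` attributes on the call play the role that `alignof(X)` plays here: they carry the alignment even though the pointer type says nothing about it.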

~Craig

ma jun via llvm-dev

Jan 30, 2018, 3:22:33 AM

to Craig Topper, llvm-dev

Hi Craig

Thank you very much!

Jakub (Kuba) Kuderski via llvm-dev

Jan 31, 2018, 12:37:00 PM

to ma jun, llvm-dev

Hi Ma,

> how can I transform the llvm.memcpy into data move loop IR and eliminate the bitcast instruction ?

I'm not sure why you are concerned about memcpy and bitcasts, but if you call MCpyInst->getSource() and MCpyInst->getDest() it will look through casts and give you the 'true' source/destination.

If you want to get rid of memcpy altogether, you can take a look at this pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .

Best,
Kuba

--

Jakub Kuderski

ma jun via llvm-dev

Feb 1, 2018, 12:40:45 AM

to Jakub (Kuba) Kuderski, llvm-dev

Hi Jakub

Thanks, I saw that the pass contains this code:

    auto *BufferTy = dyn_cast<StructType>(SrcPtrTy->getPointerElementType());
    if (!BufferTy)
      return false;

Can other types, such as i32 or float, also use this pass to eliminate memcpy?

Regards

Jun

David Chisnall via llvm-dev

Feb 1, 2018, 5:03:45 AM

to Jakub (Kuba) Kuderski, llvm-dev

On 31 Jan 2018, at 17:36, Jakub (Kuba) Kuderski via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> If you want to get rid of memcpy altogether, you can take a look at this pass: https://github.com/seahorn/seahorn/blob/master/lib/Transforms/Scalar/PromoteMemcpy.cc .

There are at least four different places in LLVM where memcpy intrinsics are expanded to either sequences of instructions or calls:

- InstCombine does it for very small memcpys (with a broken heuristic).

- PromoteMemCpy does it mostly to expose other optimisation opportunities.

- SelectionDAG does it (though in a pretty terrible way, because it can’t create new basic blocks and so can’t emit small loops)

- Some back ends do it in cooperation with SelectionDAG to provide their own implementation.

Whether you want a memcpy intrinsic or a sequence of loads and stores depends a little bit on what optimisation you’re doing next - some work better treating individual fields separately, some prefer to have a blob of memory that they can treat as a single entity.
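
To make the two shapes concrete at the source level, here is a small sketch. The struct `X` and the function names are assumptions for illustration, and this is not how any of the passes listed above is implemented. The same 8-byte copy is written as one opaque blob, as per-field loads and stores, and as an explicit data-move loop:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct standing in for the X from earlier in the thread.
struct X {
  int a;
  int b;
};

// Blob form: one call that treats the object as sizeof(X) opaque bytes.
void copy_as_blob(const X &src, X &dst) {
  assert(&src != &dst && "std::memcpy requires non-overlapping objects");
  std::memcpy(&dst, &src, sizeof(X));
}

// Field-wise form: one load and one store per member, each visible
// separately to later optimisations.
void copy_fieldwise(const X &src, X &dst) {
  dst.a = src.a;
  dst.b = src.b;
}

// Loop form: an explicit data-move loop over the object's bytes.
void copy_byte_loop(const X &src, X &dst) {
  const unsigned char *s = reinterpret_cast<const unsigned char *>(&src);
  unsigned char *d = reinterpret_cast<unsigned char *>(&dst);
  for (std::size_t i = 0; i < sizeof(X); ++i)
    d[i] = s[i];
}

// True when both objects hold the same field values.
bool same_fields(const X &l, const X &r) { return l.a == r.a && l.b == r.b; }
```

All three produce the same field values for this struct. They differ in what a later pass gets to see: a single memory operation, individual typed accesses, or a loop.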

It’s also worth noting that LLVM’s handling of padding in structure fields is particularly bad. LLVM IR has two kinds of struct: packed and non-packed. The documentation doesn’t make it clear whether non-packed structs have padding at the end (and clang assumes that they don’t, some of the time). Non-packed structs do have padding in between fields for alignment. When lowering from C (or a language needing to support a C ABI), you sometimes end up with padding fields inserted by the front end. Optimisers have no way of distinguishing these fields from non-padding fields and so we only get rid of them if SROA extracts them and finds that they have no side-effect-free consumers. In contrast, the padding between fields in non-packed structs disappears as soon as SROA runs. This can lead to violations of C semantics, where padding fields should not change (because C defines bitwise comparisons on structs using memcmp). This can lead to subtly different behaviour in C code depending on the target ABI (we’ve seen cases where trailing padding is copied in one ABI but not in another, depending solely on pointer size).
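
The following sketch shows why padding matters for bitwise comparison. The struct `S` and all helper names are hypothetical, and the layout is whatever the host ABI chooses. To stay well-defined it works on raw byte images rather than live objects: two images hold identical field values but were pre-filled with different bytes, so they can compare unequal under memcmp.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical struct; on common ABIs there are 3 padding bytes after c.
struct S {
  char c;
  int i;
};

// Number of bytes in S that belong to no field.
constexpr std::size_t padding_bytes() {
  return sizeof(S) - sizeof(char) - sizeof(int);
}

// Builds an object image: every byte is first set to `fill`, then the
// two fields are written at their real offsets. Padding keeps `fill`.
void build_image(unsigned char (&buf)[sizeof(S)], unsigned char fill,
                 char c, int i) {
  std::memset(buf, fill, sizeof(S));
  std::memcpy(buf + offsetof(S, c), &c, sizeof c);
  std::memcpy(buf + offsetof(S, i), &i, sizeof i);
}

// Compares the two images field by field, ignoring padding.
bool fields_equal(const unsigned char (&l)[sizeof(S)],
                  const unsigned char (&r)[sizeof(S)]) {
  S a;
  S b;
  std::memcpy(&a, l, sizeof(S));
  std::memcpy(&b, r, sizeof(S));
  return a.c == b.c && a.i == b.i;
}

// Compares the two images bitwise, padding included.
bool bytes_equal(const unsigned char (&l)[sizeof(S)],
                 const unsigned char (&r)[sizeof(S)]) {
  return std::memcmp(l, r, sizeof(S)) == 0;
}
```

Building one image with fill `0x00` and another with fill `0xFF` from the same field values makes `fields_equal` true, while `bytes_equal` is false whenever `padding_bytes()` is non-zero. A copy that preserves padding and one that drops it are therefore distinguishable by memcmp.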

David

ma jun via llvm-dev

Feb 1, 2018, 8:25:45 AM

to David Chisnall, llvm-dev

Hi David

Thanks a lot, that makes it much clearer!

Regards

Jun

ma jun via llvm-dev

Feb 1, 2018, 12:57:34 PM

to llvm...@lists.llvm.org

Hi all

Why does llvm.memcpy only support i8*? And how does clang transform a struct copy?

Thanks !

Regards

Jun

ma jun via llvm-dev

Feb 1, 2018, 12:58:15 PM

to llvm...@lists.llvm.org

Hi all

Matthias Braun via llvm-dev

Feb 1, 2018, 1:09:15 PM

to ma jun, llvm...@lists.llvm.org

There's no need for anything else, you can bitcast any struct pointer to an i8 pointer (which is free/needs no instructions in all the targets I know of).

- Matthias

Friedman, Eli via llvm-dev

Feb 1, 2018, 1:39:23 PM

to David Chisnall, Jakub (Kuba) Kuderski, llvm-dev

On 2/1/2018 2:03 AM, David Chisnall via llvm-dev wrote:
> In contrast, the padding between fields in non-packed structs
> disappears as soon as SROA runs. This can lead to violations of C
> semantics, where padding fields should not change (because C defines
> bitwise comparisons on structs using memcmp). This can lead to subtly
> different behaviour in C code depending on the target ABI (we’ve seen
> cases where trailing padding is copied in one ABI but not in another,
> depending solely on pointer size).

The IR type of an alloca isn't supposed to affect the semantics; it's
just a sizeof(type) block of bytes. We haven't always gotten this right
in the past, but it should work correctly on trunk, as far as I know.
If you have an IR testcase where this still doesn't work correctly,
please file a bug.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

ma jun via llvm-dev

Feb 1, 2018, 8:21:28 PM

to Matthias Braun, llvm-dev

tks

Regards

Jun

David Chisnall via llvm-dev

Feb 2, 2018, 5:59:47 AM

to Friedman, Eli, llvm-dev

On 1 Feb 2018, at 18:39, Friedman, Eli <efri...@codeaurora.org> wrote:
>
> On 2/1/2018 2:03 AM, David Chisnall via llvm-dev wrote:
>> In contrast, the padding between fields in non-packed structs disappears as soon as SROA runs. This can lead to violations of C semantics, where padding fields should not change (because C defines bitwise comparisons on structs using memcmp). This can lead to subtly different behaviour in C code depending on the target ABI (we’ve seen cases where trailing padding is copied in one ABI but not in another, depending solely on pointer size).
>
> The IR type of an alloca isn't supposed to affect the semantics; it's just a sizeof(type) block of bytes. We haven't always gotten this right in the past, but it should work correctly on trunk, as far as I know. If you have an IR testcase where this still doesn't work correctly, please file a bug.

It’s not an IR test case. We have a C struct that is {void*, int}. On a system with 8-byte pointers, this becomes an LLVM struct { i8*, i32 }. On a system with 16-byte pointers, clang lowers it to { i8*, i32, [12 x i8] }. From the perspective of SROA, the [12 x i8] is a real field. When a function is called with the struct, it is lowered to taking an explicit [12 x i8] argument, whereas the other version takes only i8* and i32 in registers. This means that if the callee writes the data out to memory and then performs a memcmp, the 8-byte-pointer version may not have the same padding, whereas the 16-byte-pointer version will.

In the code that we were using (the DukTape JavaScript interpreter), the callee didn’t actually look at the padding bytes in either case, so we just ended up with less efficient code in the 16-byte-pointer case, but the same could equally have generated incorrect code for the 8-byte-pointer case.
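
The trailing-padding arithmetic behind the {void*, int} example can be checked with a small sketch. The names are made up here, and the formula assumes that a pointer's alignment equals its size and that the struct is padded up to a multiple of that alignment, which holds on common ABIs but is not guaranteed:

```cpp
#include <cstddef>

// Hypothetical mirror of the C struct from the message above.
struct PtrAndInt {
  void *p;
  int i;
};

// Trailing padding of PtrAndInt as laid out by the host compiler.
constexpr std::size_t trailing_padding() {
  return sizeof(PtrAndInt) - (offsetof(PtrAndInt, i) + sizeof(int));
}

// Trailing padding predicted for a given pointer and int size, under the
// assumption that the struct size is rounded up to the pointer size.
constexpr std::size_t expected_trailing_padding(std::size_t ptr_size,
                                                std::size_t int_size) {
  return (ptr_size - (ptr_size + int_size) % ptr_size) % ptr_size;
}
```

With a 4-byte int, `expected_trailing_padding(8, 4)` is 4 and `expected_trailing_padding(16, 4)` is 12. The second value matches the explicit `[12 x i8]` field in the 16-byte-pointer lowering, while the 4 bytes in the 8-byte-pointer case are left implicit.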

David

Hongbin Zheng via llvm-dev

Feb 2, 2018, 4:27:25 PM

to David Chisnall, llvm-dev

I wonder whether it is possible to explicitly mark the padding bytes, so that later optimizations know which bytes are padding and can optimize accordingly.