For GlobalISel, we’re exploring options for implementing inlining optimizations for memcpy and friends. Looking around the existing implementation, I don’t see anything that would make it particularly problematic for us to do this at the IR level.
The existing TLI hooks that specify how certain memcpy calls should be lowered don’t have anything too SelectionDAG-specific, and an IR lowering pass could in future be shared between SDAG and GISel. Does anyone see issues with this?
Thanks,
Amara
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> Thanks,
> Amara
Roman.
> On Jun 19, 2019, at 11:41 PM, Amara Emerson via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi all,
>
> For GlobalISel, we’re exploring options for implementing inlining optimizations for memcpy and friends. However, looking around the existing implementation, I don’t see anything that would particularly be problematic for us to do it at the IR level.
>
> The existing TLI hooks to specify how certain memcpy calls should be lowered doesn’t have anything too SelectionDAG specific, and an IR lowering pass could be shared in future between SDAG and GISel. Does anyone see issues with this?
>
> Thanks,
> Amara
We already have lib/Transforms/Utils/LowerMemIntrinsics.cpp; there just isn’t a general pass that expands these for targets. AMDGPU already always uses this for memcpy handling.
-Matt
> On Jun 20, 2019, at 3:54 PM, Matt Arsenault <ars...@gmail.com> wrote:
> We already have lib/Transforms/Utils/LowerMemIntrinsics.cpp, there just isn’t a general pass that expands these for targets. AMDGPU already always use this for memcpy handling.
>
>
> -Matt
>
Sure, that might end up sharing some code, but the key thing is to use the TLI hooks to implement the same optimizations that SelectionDAG currently does.
Amara
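To make the kind of optimization under discussion concrete, here is a minimal C++17 sketch of planning an inline expansion for a small constant-size copy: pick access widths, widest first, and give up (leave the call alone) if the plan needs more operations than a target-chosen limit. This is a toy model written for this thread, not LLVM code. `planInlineCopy`, `maxWidth` and `maxOps` are hypothetical names; they only loosely stand in for what hooks such as `findOptimalMemOpLowering` and `getMaxStoresPerMemcpy` decide, and the sketch ignores alignment and everything else the real hooks take into account.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper, not an LLVM API: split a constant-size copy into
// power-of-two access widths (in bytes), widest first. Returns std::nullopt
// when more than maxOps accesses would be needed, meaning "do not inline".
std::optional<std::vector<unsigned>> planInlineCopy(std::uint64_t size,
                                                    unsigned maxWidth,
                                                    unsigned maxOps) {
  // Assumption of this sketch: the widest access is a power of two.
  assert(maxWidth != 0 && (maxWidth & (maxWidth - 1)) == 0);

  std::vector<unsigned> widths;
  unsigned width = maxWidth;
  while (size != 0) {
    // Shrink the access until it fits in the bytes that remain.
    while (width > size)
      width /= 2;
    // Over budget: the caller should keep the memcpy call instead.
    if (widths.size() == maxOps)
      return std::nullopt;
    widths.push_back(width);
    size -= width;
  }
  return widths;
}
```

With these assumptions, a 15-byte copy with an 8-byte maximum width is planned as accesses of 8, 4, 2 and 1 bytes, while a 65-byte copy with 16-byte accesses and a limit of four operations is rejected.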
If you're expanding to loads and stores, then with SelectionDAG it's much
better to do this at the IR level, because there you can insert control
flow (and so can emit loops). I don't believe inserting control flow is a
problem for GlobalISel. Some targets do not expand to loads and stores:
for example, on some x86 variants memcpy is expanded to a single
REP MOVSB instruction. This is probably much easier to implement as a DAG
pattern.
David
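As a rough illustration of why a loop-based expansion needs control flow, the sketch below writes out in plain C++ the shape such an expansion takes when the length is only known at run time: a main loop of wide accesses followed by a byte-wise residual loop. It is a hypothetical example for this thread, not LLVM code. The name `copyWithLoops` and the 8-byte access width are assumptions, and the fixed-size `std::memcpy` calls merely stand in for a single wide load and a single wide store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not LLVM code: copy n bytes using a wide main
// loop plus a byte-wise residual loop. Each loop corresponds to extra basic
// blocks and branches that the expansion has to create.
void copyWithLoops(unsigned char *dst, const unsigned char *src,
                   std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  constexpr std::size_t Wide = sizeof(std::uint64_t); // assumed access width

  std::size_t i = 0;
  // Main loop: one wide load and one wide store per iteration.
  for (; i + Wide <= n; i += Wide) {
    std::uint64_t chunk;
    std::memcpy(&chunk, src + i, Wide); // stands in for a 64-bit load
    std::memcpy(dst + i, &chunk, Wide); // stands in for a 64-bit store
  }
  // Residual loop: copy the remaining n % Wide bytes one at a time.
  for (; i < n; ++i)
    dst[i] = src[i];
}
```

A 19-byte copy, for instance, runs the main loop twice and the residual loop three times.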
On 20/06/2019 23:48, Amara Emerson via llvm-dev wrote:
> I agree that this should be a very late pass. Doing it in the IR would
> simplify the implementation in GlobalISel, but it would also allow us to
> perhaps have one shared expansion/optimization pass between both SDISel
> and GISel.
>
> Volkan may look at upstreaming a partial implementation he has downstream.
>
> Cheers,
> Amara
>
>> On Jun 20, 2019, at 3:22 AM, Sjoerd Meijer <Sjoerd...@arm.com> wrote:
>>
>> Looks like there are a lot of opinions where memcpy expansion/inlining
>> needs to happen: (late) IR, or if it is a backend problem, see also
>> for example https://reviews.llvm.org/D35035. Complicating factor here
>> is that efficient memcpy lowering is crucial for performance and
>> code-size (and they occur a lot).
>>
>> Either way, I agree that the TLI hooks are not SelectionDAG specific,
>> they can be used in an IR lowering pass.
>>
>> Cheers,
>> Sjoerd.
>>
>> ------------------------------------------------------------------------
>> *From:* llvm-dev <llvm-dev...@lists.llvm.org> on behalf of Roman Lebedev
>> via llvm-dev <llvm...@lists.llvm.org>
>> *Sent:* 20 June 2019 08:04
>> *To:* Amara Emerson
>> *Cc:* llvm-dev
>> *Subject:* Re: [llvm-dev] RFC: Memcpy inlining in IR
>> On Thu, Jun 20, 2019 at 6:42 AM Amara Emerson via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>> >
>> > Hi all,
>> >
>> > For GlobalISel, we’re exploring options for implementing inlining optimizations for memcpy and friends. However, looking around the existing implementation, I don’t see anything that would particularly be problematic for us to do it at the IR level.
>> >
>> > The existing TLI hooks to specify how certain memcpy calls should be lowered doesn’t have anything too SelectionDAG specific, and an IR lowering pass could be shared in future between SDAG and GISel. Does anyone see issues with this?
>> Sounds similar to https://reviews.llvm.org/D60318
>> It should be done *really* late in the middle-end pipeline though.
>>
>> > Thanks,
>> > Amara
>> Roman.
>>
Maybe I’m wrong, but it seems to me that as soon as there’s a memcpy on a pointer, the IR is incorrect for your target’s semantics, i.e. the language frontend should never emit a memcpy on pointers, and it would be wrong for your target to synthesize a new memcpy from a load / store on pointers. Clang definitely has the required information to respect these constraints, but I don’t think LLVM IR does.