[llvm-dev] Disable combining of loads and stores in instcombine


Neil Ryan via llvm-dev

Apr 16, 2019, 3:01:11 PM
to llvm...@lists.llvm.org
LLVM's optimizer combines stores to consecutive characters into a write of a single word. For instance, if I have char A[4] and I write some static value to each element, these writes would be combined into a single 32-bit word write. I found this thread <http://llvm.1065342.n5.nabble.com/disabling-combining-load-stores-in-optimizer-td37560.html> from 2009 -- it seems like it wasn't possible then. Has anything changed since?

Neil

Tom Stellard via llvm-dev

Apr 16, 2019, 11:17:05 PM
to Neil Ryan, llvm...@lists.llvm.org
On 04/16/2019 11:38 AM, Neil Ryan via llvm-dev wrote:
> LLVM's optimizer combines stores to consecutive characters into a write of a single word. For instance, if I have char A[4] and I write some static value to each element, these writes would be combined into a single 32-bit word write. I found this thread <http://llvm.1065342.n5.nabble.com/disabling-combining-load-stores-in-optimizer-td37560.html> from 2009 -- it seems like it wasn't possible then. Has anything changed since?
>

Why do you want to disable this optimization?

-Tom


> Neil

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Tom Stellard via llvm-dev

Apr 17, 2019, 8:00:39 AM
to Neil Ryan, llvm...@lists.llvm.org
On 04/16/2019 09:11 PM, Neil Ryan wrote:
> I’m writing a pass for some custom hardware — we’d like to split arrays across hardware elements; this doesn’t work if consecutive writes to characters get combined to a word.

Do you have an LLVM IR example where this happens when you don't want
it to?

-Tom

Arsenault, Matthew via llvm-dev

Apr 17, 2019, 8:02:54 AM
to Neil Ryan, llvm...@lists.llvm.org

This won’t happen with volatile load/store

 

-Matt

JF Bastien via llvm-dev

Apr 17, 2019, 12:35:35 PM
to Arsenault, Matthew, Neil Ryan, llvm...@lists.llvm.org
On Apr 17, 2019, at 5:02 AM, Arsenault, Matthew via llvm-dev <llvm...@lists.llvm.org> wrote:

> This won’t happen with volatile load/store

This is mostly true today, but AFAICT the LLVM memory model doesn’t actually offer this guarantee. It merely says that LLVM treats volatile like C / C++ treats volatile… which isn’t much of a guarantee because C / C++ volatile doesn’t normatively mean anything. Specifically, we cannot really honor this when volatile bitfields are used for which memory operations don’t exist:

struct {
  volatile int a : 12;
  volatile int b : 4;
} s;

As things stand, we haven’t promised that we won’t combine adjacent volatile stores, and C / C++ certainly allow us to do so. I don’t think it would be a good idea to do so, but we certainly could.



Jameson Nash via llvm-dev

Apr 17, 2019, 12:54:21 PM
to JF Bastien, Neil Ryan, llvm...@lists.llvm.org
The lang ref promises that, "the backend should never ... merge target-legal volatile load/store instructions". Which I've usually assumed meant that it was promised? Just curious if you thought this left some wiggle room for some optimizations, since my impression was that someone wanted to let you know that it's dependable. Likewise, there's also the clause that "The optimizers must not change the number of volatile operations", which doesn't seem to explicitly forbid merging, but does seem to me like it would make it difficult to find a case where it would be profitable.

JF Bastien via llvm-dev

Apr 17, 2019, 1:01:53 PM
to Jameson Nash, Neil Ryan, llvm...@lists.llvm.org
On Apr 17, 2019, at 9:53 AM, Jameson Nash <vtj...@gmail.com> wrote:

> The lang ref promises that, "the backend should never ... merge target-legal volatile load/store instructions". Which I've usually assumed meant that it was promised? Just curious if you thought this left some wiggle room for some optimizations, since my impression was that someone wanted to let you know that it's dependable. Likewise, there's also the clause that "The optimizers must not change the number of volatile operations", which doesn't seem to explicitly forbid merging, but does seem to me like it would make it difficult to find a case where it would be profitable.

“Should” and “target-legal” aren’t very good promises :-)
The “must not” bit is interesting, I hadn’t noticed it!

In any case, I agree that LLVM should do what it can to meet expectations w.r.t. volatile, and in this case avoid merging when it can (i.e. only merge if there are no corresponding memory operations).

JF Bastien via llvm-dev

Apr 17, 2019, 1:02:46 PM
to Neil Ryan, llvm...@lists.llvm.org

On Apr 17, 2019, at 10:00 AM, Neil Ryan <neil...@cs.washington.edu> wrote:

> So, volatile’s been a fine solution — the issue is that a volatile pointer performs the load every time; ideally memory accesses could be cached. This is why I’ve been leaning towards disabling the part of instcombine that combines memory accesses instead of using volatile — there should be better performance.

Why do you want this?

JF Bastien via llvm-dev

Apr 17, 2019, 1:11:24 PM
to Neil Ryan, llvm...@lists.llvm.org
On Apr 17, 2019, at 10:06 AM, Neil Ryan <neil...@cs.washington.edu> wrote:

>> Why do you want this?
>
> The goal is to share arrays between multiple tiles on a manycore architecture by splitting arrays between tiles. With a DRF memory model, it makes sense to elide multiple loads to the same memory location between barriers; IIRC the semantics for volatile don’t allow this eliding.

If there hasn’t been synchronization between two loads, then yes, it makes sense to elide loads in a DRF memory model.
Indeed volatile loads cannot be elided.
But why is it desirable to avoid combining adjacent stores? If you’ve got DRF code then the combination can’t be observed.

Amara Emerson via llvm-dev

Apr 17, 2019, 1:43:04 PM
to JF Bastien, Neil Ryan, llvm...@lists.llvm.org

On Apr 17, 2019, at 9:35 AM, JF Bastien via llvm-dev <llvm...@lists.llvm.org> wrote:

> As things stand, we haven’t promised that we won’t combine adjacent volatile stores, and C / C++ certainly allow us to do so. I don’t think it would be a good idea to do so, but we certainly could.
Is this really true? The writes to two volatile variables are sequenced and can’t be re-ordered since they can have side effects, and so merging adjacent volatile stores would break this.

JF Bastien via llvm-dev

Apr 17, 2019, 1:45:18 PM
to Amara Emerson, Neil Ryan, llvm...@lists.llvm.org

On Apr 17, 2019, at 10:42 AM, Amara Emerson <aeme...@apple.com> wrote:

> Is this really true? The writes to two volatile variables are sequenced and can’t be re-ordered since they can have side effects, and so merging adjacent volatile stores would break this.

There’s further discussion of what we promise downthread. It’s definitely true of C / C++, a handwavy promise in LLVM LangRef, and the LLVM memory model just says “same as C / C++".

JF Bastien via llvm-dev

Apr 17, 2019, 1:48:43 PM
to Neil Ryan, llvm...@lists.llvm.org
On Apr 17, 2019, at 10:40 AM, Neil Ryan <neil...@cs.washington.edu> wrote:

>> But why is it desirable to avoid combining adjacent stores? If you’ve got DRF code then the combination can’t be observed.
>
> It’s more that the consecutive stores would be going to different tiles. If multiple stores are combined in IR, I don’t think they’d be able to be decoupled in IR, unless there’s a way to always determine which global object an arbitrary GEP is pointing to.

That does seem like a valid use case, albeit one that C / C++ and LLVM IR can’t really help you with today. Volatile has the limitations you’ve described, and relaxed atomics could be combined as you’d like to avoid (though they probably won’t be right now).

Neil Ryan via llvm-dev

Apr 17, 2019, 3:28:18 PM
to llvm...@lists.llvm.org, tste...@redhat.com
I’m writing a pass for some custom hardware — we’d like to split arrays across hardware elements; this doesn’t work if consecutive writes to characters get combined to a word.

Neil Ryan via llvm-dev

Apr 17, 2019, 3:28:48 PM
to JF Bastien, Jameson Nash, llvm...@lists.llvm.org
So, volatile’s been a fine solution — the issue is that a volatile pointer performs the load every time; ideally memory accesses could be cached. This is why I’ve been leaning towards disabling the part of instcombine that combines memory accesses instead of using volatile — there should be better performance.

Neil Ryan via llvm-dev

Apr 17, 2019, 3:29:28 PM
to JF Bastien, llvm...@lists.llvm.org

Why do you want this?

Neil Ryan via llvm-dev

Apr 17, 2019, 3:30:06 PM
to JF Bastien, llvm...@lists.llvm.org

But why is it desirable to avoid combining adjacent stores? If you’ve got DRF code then the combination can’t be observed. 

Arsenault, Matthew via llvm-dev

Apr 17, 2019, 3:31:49 PM
to Neil Ryan, llvm...@lists.llvm.org, tste...@redhat.com

This is really a codegen problem. You can decompose the load/store however you like in the backend. InstCombine should still combine the loads as a canonicalization.

 

-Matt

Arsenault, Matthew via llvm-dev

Apr 18, 2019, 4:42:48 PM
to Neil Ryan, llvm...@lists.llvm.org, tste...@redhat.com

I’m not sure what you mean by this. The type in memory doesn’t really mean anything, and no information is lost. You can still tell (for optimizations) some information about the underlying IR object from the MachineMemOperand, which is mostly used for alias analysis.

 

-Matt

 

From: Neil Ryan <neil...@cs.washington.edu>
Date: Thursday, April 18, 2019 at 9:59 PM
To: llvm-dev <llvm...@lists.llvm.org>, "tste...@redhat.com" <tste...@redhat.com>, "Arsenault, Matthew" <Matthew....@amd.com>
Subject: Re: [llvm-dev] Disable combining of loads and stores in instcombine

 

IIRC it’s not strictly possible to determine what array a load/store is based on. I don’t believe the decomposition is always possible, as information is lost when accesses are combined.


Neil

Neil Ryan via llvm-dev

Apr 19, 2019, 1:22:54 PM
to llvm...@lists.llvm.org, tste...@redhat.com, Arsenault, Matthew
IIRC it’s not strictly possible to determine what array a load/store is based on. I don’t believe the decomposition is always possible, as information is lost when accesses are combined.

Neil

Neil Ryan via llvm-dev

Apr 19, 2019, 4:54:55 PM
to llvm...@lists.llvm.org, tste...@redhat.com, Arsenault, Matthew
Sorry, I meant that if I have a series of character loads that get combined into a single word load, I don’t know of a way to 1) know that the word load was originally composed of character loads and 2) decompose the word load back into character loads.
Granted, if I have (1), (2) is just a matter of inserting the right ops. I’ve been digging into MachineMemOperand — I’m not entirely sure what methods in this class can get me this information. Any pointers would be much appreciated.

Neil

Sanjay Patel via llvm-dev

Apr 21, 2019, 11:22:41 AM
to Neil Ryan, llvm...@lists.llvm.org
Step back: where exactly in instcombine is this load/store combining happening? I thought we don't do that for precisely the reasons stated in this thread.

We used to have a dedicated IR pass for load combining that was removed.

We definitely do some load/store combining in SelectionDAG, but that should all be preventable using target hooks.

AFAICT, the place this happens in IR is the -memcpyopt pass? If that's the problem, then how about adding a TargetTransformInfo dependence/hook there to stop this transform?
