We have some similar cases and wanted the same thing; what we were doing for a while is using the third "is constant" field of the TBAA metadata and setting that to 1. I'm not 100% sure what the semantics of that are -- LangRef says it means that pointsToConstantMemory() returns true which means that the memory is "impossible ... to be modified", which seems like not quite a fit for this set-exactly-once use case. In practice, looking at the IR after our optimization pipeline, we were getting the results we wanted: if a store and subsequent loads were seen together, the store would remain and the value would be forwarded to all the loads. (I don't think I looked at the "multiple loads with no visible store which should get collapsed to a single load" case.) ymmv
We've disabled the optimization for now, since without an easy way of putting the annotation in the C++ source code and getting it passed through clang, it became a burden to keep the application logic in sync with the tbaa-fixup code that lived in a different place. We'll fix it eventually...
On 09/10/2014 02:42 PM, Kevin Modzelewski wrote:
I hadn't looked at this approach much, but based on the documentation, you're basically just asking for miscompiles here. The semantics seem to be the same as "invariant.load". While the optimizer happens to not be removing the stores, it seems like it would be perfectly legal for it to do so.We have some similar cases and wanted the same thing; what we were doing for a while is using the third "is constant" field of the TBAA metadata and setting that to 1. I'm not 100% sure what the semantics of that are -- LangRef says it means that pointsToConstantMemory() returns true which means that the memory is "impossible ... to be modified", which seems like not quite a fit for this set-exactly-once use case. In practice, looking at the IR after our optimization pipeline, we were getting the results we wanted: if a store and subsequent loads were seen together, the store would remain and the value would be forwarded to all the loads. (I don't think I looked at the "multiple loads with no visible store which should get collapsed to a single load" case.) ymmv
This is also something that I've wanted to get support for, so thanks for
working on it.
I've never realy understood how the llvm.invariant intrinsics could be
put into practice. There is the problem that "end" can occur anywhere
as you suggested fixing with a flag. But there is also the problem
that "start" needs to be visible in-scope. Additionally,
llvm.invariant is inherently control dependent, which doesn't really
allow the optimizer to hoist the invariant loads the way we want it
to. Are you planning to have your frontend emit llvm.invariant.start
intrinsics *everywhere* the field is accessed. If the frontend creates
a bunch of small llvm.invariant scopes, with arbitrary calls in
between, we won't be able to hoist or combine them. Although with your
proposal we could at least eliminate the llvm.invariant calls that are
dominated by another identical one.
I think it would be simpler and more powerful for the frontend to
treat loads from a particular type as invariant and have a safe way to
express the initialization. In other words, the frontend could always
emit the invariant metadata on the loads, but guard the store to avoid
loading before initialization.
The question is how to guard the initial store(s). Because we treat
invariant metadata as effectively meaning "can't alias", I think we
need to express this as a data dependence on the pointer that we load
from.
Conceptually:
%p = unaliased address
store val, %p
%a = llvm.safe_cast %p ; just made this up
val = load %a !invariant
This approach would give optimization maximum freedom in code that
doesn't involve initialization. It would block store-load forwarding
from the initializer itself to the invariant loads, but I think
handling that is a simple peephole.
There are multiple important use-cases for write-once data:
- Constant globals with nontrivial initialization.
- Runtime metadata, including but not limited to object headers.
- Builtin data structures with immutable semantics.
- Maybe even language features for user defined data structures.
In some cases, we want to lazilly initialize a value. Some cases even
have conditionally invariant values. Ideally, we can express all of
these situations in LLVM IR. I think of the generalization of the
problem as:
- some arbitrary condition produces a token
- all loads from the immutable memory must acquire this token
It might look like this:
%p1 = ...
%p2 = llvm.safe_cast %p1
%a = select i1 %is_immutable, %p1, %p2
load %a !invariant
If %is_immutable can be proven to be true, then the safe_cast becomes
a nop. Otherwise, the safe_cast appears to read memory reachable from
its pointer, preventing stores from being reordered across it. The
safe_cast is readonly, so loads can be hoisted and optimized across it
and redundant stores can be eliminated.
Examples:
(1) Immutable fields in builtin data structures (Array.length)
; initialization
%p = <allocate>
store %v0, %p
%a = llvm.safe_cast %p
store %a, %container
...
; access to the immutable data in a different scope
%a = load %container
%v1 = load %a, !invariant
Here the second load is invariant but depends on the first load of the
container object (array), %a. The first load effectively acquires the
token. This load is not generally invariant, but likely loop invariant
and may benefit from TBAA. We would get the same effect by returning
the pointer to the "new" array and wouldn't need the first load in
that case.
(2) Lazilly initialized data
%p1 = ...
if (!%is_initialized) ; some arbitrary condition
store %initval, %p1
%p2 = llvm.safe_cast %p1
%a = select i1 %is_initialized, %p1, %p2
%v = load %a, !invariant
Here we have an invariant load guarded by a condition,
%is_initialized. If the optimizer can reason about the condition,
which may be tested repeatedly in the same scope, it can optimize away
the lazy initialization path. These patterns can potentially be
optimized via correlated value propagation, jump-threading, and
unswitching.
(3) Conditionally immutable data
I realize this case is confusing but I'm laying out an example here anyway as a proof of concept.
; Load the internal state of an object. Once it is immutable, it is
; always immutable. This computation is marked invariant so it can be
; hoisted and combined.
%invariant_internal_state = load %object, !invariant
%invariant_is_immutable_flag = load %invariant_internal_state, !invariant
%invariant_is_immutable_check = cmp %invariant_is_immutable_flag, ...
; Any subsequent load of the internal state after it may have changed
; must be guarded.
%safe_object = llvm.safe_cast %object
; conditionally reload the object
%reload_internal_state = load %safe_object
%internal_state =
select i1 %invariant_is_immutable_check,
%invariant_internal_state,
%reload_internal_state,
The %invariant_is_immutable_check can now be combined and hoisted out
of loops.
This case is tricky because every load of the object's internal state
must be guarded. It's up to the front end to handle this. This
technique may be useful if you need to aggressively optimize for case
in which an object is dynamically determined to be immutable, and that
is expected to be the common case.
—
I think the safe_cast intrinsic would work for your array.length case
and also for initialization of metadata. It wouldn't be as great for
standard global variables because we would either need to introduce a
level of indirection to get at the global, introduce a flag to check
initialization, or add a safe_cast to every read of the variable.
For this technique to be really effective, we also need to give the
invariant loads may-speculate semantics. Do you have a plan for that?
I'm fine with introducing another "may-speculate" metadata type for it
initially. We could then use safe_cast intrinsic to guard these loads
on type checks in addition to initialization.
I would also give the safe_cast a constant ID and semantics so that
only casts with the same ID could be combined. For example, two checks
on the same type can be combined if one dominates the other,
regardless of the intervening code. However, we could have multiple
independent checks on the same pointer value, each with different
control dependencies.
-Andy
%p1 = ...
store %v0, %p1
%p2 = llvm.safe_cast %p1
store %v1, %p1
store %v2, %p2
%v3 = load %p2, !invariant
The load can be moved above both stores (%v1 and %v2). AFAIK, the invariant metadata tag essentially tells alias analysis not to even try, just assume no aliasing. So the semantics effectively guarantee that %v0==%v1==%v3==%v3. The IR is still “legal” in that it is verifiable, but it is nonsense.
Note that it is also totally valid for an alias analysis to return that they MustAlias. That would allow forwarding from store %v0 to the load. This would have to be implemented as a special case, since the invariant tag normally bypasses pointer analysis. I admit this is weird, but I’m don’t know of any problems it will cause. This can already happen with TBAA, and will be a problem with any metadata that can be abused by the frontend or language design. A difficultly with this approach is that the frontend has the burden for emitting guards anywhere the “invariant” data could be modified.
How do you imagine mixing and matching invariant and non-invariant? I showed some examples of doing that safely, but I’m not sure how it’s relevant to you.
-Andy
Sanjoy made a good point. We don't actually need a new variant of
"invariant.start". Simply using an invariant.start with no uses gives
us a notion of an invariant region with no end. (Since the result
doesn't escape, there can be no end hidden inside a function call.)
This seems like a simple notion to exploit and is strict step forward
from where we are, without a new intrinsic.
The notions of "invariant over some range" and "safe to speculate over
some range" are related, but distinct. However, to get the most power
out of the former, you often need the later.
Andy seems to be arguing in favour of data flow mostly because it's
easier for the optimizer. Unless I'm misreading, there's nothing
preventing us from using control flow for the same patterns. For example:
p1 = ...
*p1 = 5
p2 = llvm.safe_cast p1
p3 = select is_constant p2, p1
load p3
Is isomorphic with:
p1 = ...
*p1 = 5
if is_constant:
invariant.start(p1)
load p1
Andy, is this correct? Or am I missing something?
(p.s. You guys seem to be throwing around the terms "!invariant" and
"invariant load" without clearly defining them. I'm unsure if you mean
the same things.)
I suspect this is going to be a topic we should discuss in person at the
developers conference or over a beer at a social. It's a bit
complicated to keep track of in email.
My belief is that a path sensitive invariant indicator for a memory
location is probably sufficient for most use cases. In particular, the
initialization and use logic is usually not in the same compilation for
the cases I care about. I care that such cases are *possible*, but I'm
less concerned about them being *easy*. I'm also having real trouble
find examples of cases where Andy's "conditionally invariant data" is
useful. Given that we *have an existing* mechanism which only needs a
slightly tweak for "endless" invariant regions, I'm tempted to stick
with what we have.
The only change (clarification?) I'd make to the current scheme is to
specify that a location is both invariant *and dereferenceable* at the
point it is listed in an invariant.start. This enables speculative
loading up to that construct. I think this is actually what the current
intrinsic is trying for from the documentation.
(Hm.. it does make it hard to rearrange calls to invariant.start
though... Thoughts?)
I suspect that for the Array.length case, I'd probably do something like
this:
arr = function_that_returns_array();
invariant.start(arr.length)
... arbitrary control flow ...
length = arr.length
... arbitrary control flow ...
length2 = arr.length
My reasoning here is that if something returns an array (in this
compilation) I want the invariant fact to be visible. If that factory
function gets inlined, I'd end up with:
arr = new Array;
arr.length = X;
invariant.start(arr.length)
... arbitrary control flow ...
length = arr.length
... arbitrary control flow ...
length2 = arr.length
Both cases seem nearly ideal here. I know I can lift the loads as far
as the function which returns the base pointer. I know I can value
forward loads (either with or without lifting). I know I can value
forward the store to the load (as I normally would).
In terms of aliasing, it seems like having invariant.start read from
it's arguments memory is sufficient. That prevents reorderings of
stores to that location before the address to after the
invariant.start. Due to conservative aliasing, that might be somewhat
harmful, but I don't really see a way to avoid this... (Or is this the
point Andy's trying to solve with the data flow?) I don't have a sense
for how painful this is. Try it and see?
I could see a useful extension for a function attribute (type?, TBAA?)
for describing the return value's invariant fields (*at the point of the
return*), but that is more minor goodness for IR size then clearly
necessary.
Philip
Yes. Exactly.
> Andy seems to be arguing in favour of data flow mostly because it's easier for the optimizer. Unless I'm misreading, there's nothing preventing us from using control flow for the same patterns. For example:
> p1 = ...
> *p1 = 5
> p2 = llvm.safe_cast p1
> p3 = select is_constant p2, p1
> load p3
>
> Is isomorphic with:
> p1 = ...
> *p1 = 5
> if is_constant:
> invariant.start(p1)
> load p1
>
> Andy, is this correct? Or am I missing something?
Yes, you are missing my main point. In the first case, you can actually mark load p3 with the !invariant tag. The cast and load don’t need to be in the same scope.
> (p.s. You guys seem to be throwing around the terms "!invariant" and "invariant load" without clearly defining them. I'm unsure if you mean the same things.)
Were talking about two things
- loads of data that is known to be immutable, path sensitive
- load that are marked with !invariant metadata (I often say “invariant load” because !invariant looks like “not invariant”)
> I suspect this is going to be a topic we should discuss in person at the developers conference or over a beer at a social. It's a bit complicated to keep track of in email.
Definitely.
I am not at all opposed to an assert_immutable intrinsic. I just don’t think it will be performant for the three reasons I stated in the previous email.
> In terms of aliasing, it seems like having invariant.start read from it's arguments memory is sufficient. That prevents reorderings of stores to that location before the address to after the invariant.start. Due to conservative aliasing, that might be somewhat harmful, but I don't really see a way to avoid this... (Or is this the point Andy's trying to solve with the data flow?) I don't have a sense for how painful this is. Try it and see?
>
> I could see a useful extension for a function attribute (type?, TBAA?) for describing the return value's invariant fields (*at the point of the return*), but that is more minor goodness for IR size then clearly necessary.
invariant.start/assert_immutable has to write to memory to get the semantics you want. “maythrow” is insufficient (see previous message). I proposed new semantics based on TBAA tags as a partial fix for this problem.
-Andy
> My belief is that a path sensitive invariant indicator for a memory location is probably sufficient for most use cases. In particular, the initialization and use logic is usually not in the same compilation for the cases I care about. I care that such cases are *possible*, but I'm less concerned about them being *easy*. I'm also having real trouble find examples of cases where Andy's "conditionally invariant data" is useful. Given that we *have an existing* mechanism which only needs a slightly tweak for "endless" invariant regions, I'm tempted to stick with what we have.
>
> The only change (clarification?) I'd make to the current scheme is to specify that a location is both invariant *and dereferenceable* at the point it is listed in an invariant.start. This enables speculative loading up to that construct. I think this is actually what the current intrinsic is trying for from the documentation.
>
> (Hm.. it does make it hard to rearrange calls to invariant.start though... Thoughts?)
>
> I suspect that for the Array.length case, I'd probably do something like this:
> arr = function_that_returns_array();
> invariant.start(arr.length)
> ... arbitrary control flow ...
> length = arr.length
> ... arbitrary control flow ...
> length2 = arr.length
>
> My reasoning here is that if something returns an array (in this compilation) I want the invariant fact to be visible. If that factory function gets inlined, I'd end up with:
> arr = new Array;
> arr.length = X;
> invariant.start(arr.length)
> ... arbitrary control flow ...
> length = arr.length
> ... arbitrary control flow ...
> length2 = arr.length
>
> Both cases seem nearly ideal here. I know I can lift the loads as far as the function which returns the base pointer. I know I can value forward loads (either with or without lifting). I know I can value forward the store to the load (as I normally would).
>
>
>
On Oct 21, 2014, at 4:03 PM, Philip Reames <list...@philipreames.com> wrote:Sanjoy made a good point. We don't actually need a new variant of "invariant.start". Simply using an invariant.start with no uses gives us a notion of an invariant region with no end. (Since the result doesn't escape, there can be no end hidden inside a function call.) This seems like a simple notion to exploit and is strict step forward from where we are, without a new intrinsic.
On Oct 21, 2014, at 4:03 PM, Philip Reames <list...@philipreames.com> wrote:
Sanjoy made a good point. We don't actually need a new variant of "invariant.start". Simply using an invariant.start with no uses gives us a notion of an invariant region with no end. (Since the result doesn't escape, there can be no end hidden inside a function call.) This seems like a simple notion to exploit and is strict step forward from where we are, without a new intrinsic.
I'm coming back to this because it is a sticking point for me in all of the proposals that were floated informally on the commits list. I would really like this to work, but can't prove that it's safe.
Given:
%a = newArray()%pLen = gep(%a, <length offset>)invariant.start(<size>, %pLen)%len = load %pLen
%o = foo() // may deallocate %astore %somethingload %o
I want to claim that the LLVM optimizer cannot do the following (but don't have a complete line of reasoning yet):
%a = newArray()%pLen = gep(%a, <length offset>)invariant.start(<size>, %pLen)%len = load %pLen
%o = foo() // may deallocate %aif ptrtoint(%o) == ptrtoint(%pLen)load %pLenstore %something // Oops... this might aliaselsestore %somethingload %o
But is there an implicit invariant.end(%pLen) at the call to foo(), at which point the array can be freed and another object allocated in its place? I don’t think so.
The way I interpreted the conclusion of this email thread is that invariant.start can be used without invariant.end.
Since the token returned by the intrinsic never escapes, you can just assume the memory is invariant at all uses. The problem here is that the optimizer could (in theory) introduce new uses of the pointer after the object lifetime. This is the same problem that you yourself have raised. I hoped we could sidestep the problem because it crops up with any representation that attaches invariance to a pointer value. If we have an answer for this, then we can easily debate other representations like broadening !invariant.load metadata semantics and introducing new intrinsics.
On Dec 1, 2014, at 3:44 PM, Philip Reames <list...@philipreames.com> wrote:(Spawning a separate subthread off the 'Optimization hints for "constant" loads' discussion for a related question. )
Looking at TBAA again, I was reminded that TBAA also contains a third field which indicates that "meaning pointsToConstantMemory should return true; see other useful AliasAnalysis methods". Looking at this a bit, it really seems like this flag has the exact same meaning as !invariant.load.
pointsToConstantMemory returns a value for a Location. Since it is entirely legal to have two Locations which describe the same physical memory, it seems like you'd be back to the same semantics as !invariant.load.
The only uncertainty here is that a Location is clearly (??) position/Instruction specific. Does that also imply that the Location is control dependent? What is the semantics of the following code?
if (is_known_invariant) {
load %p, !tbaa is_constant=true
}
Is the optimizer allowed to lift the load above the conditional? (Assuming it can prove the location is known dereferenceable.) The semantics for !invariant.load clearly allow this, but do the semantics for TBAA's constant flag?
I think the answer to these questions is that the load is *not* control dependent on the conditional (assuming it's otherwise known dereferenceable). Given this, why do we have both? Should we canonicalize one into the other?
Looking at the current implementations, it appears that TBAA's constant flag is more broadly implemented. On first glance, I'm really tempted to just deprecate !invariant.load in place of TBAA's constant flag. Thoughts?
Philip
On Dec 1, 2014, at 3:44 PM, Philip Reames <list...@philipreames.com> wrote:
(Spawning a separate subthread off the 'Optimization hints for "constant" loads' discussion for a related question. )
Looking at TBAA again, I was reminded that TBAA also contains a third field which indicates that "meaning pointsToConstantMemory should return true; see other useful AliasAnalysis methods". Looking at this a bit, it really seems like this flag has the exact same meaning as !invariant.load.
pointsToConstantMemory returns a value for a Location. Since it is entirely legal to have two Locations which describe the same physical memory, it seems like you'd be back to the same semantics as !invariant.load.
The only uncertainty here is that a Location is clearly (??) position/Instruction specific. Does that also imply that the Location is control dependent? What is the semantics of the following code?
if (is_known_invariant) {
load %p, !tbaa is_constant=true
}
Is the optimizer allowed to lift the load above the conditional? (Assuming it can prove the location is known dereferenceable.) The semantics for !invariant.load clearly allow this, but do the semantics for TBAA's constant flag?
I think the answer to these questions is that the load is *not* control dependent on the conditional (assuming it's otherwise known dereferenceable). Given this, why do we have both? Should we canonicalize one into the other?
It would be very confusing if the two had different semantics.
In either case, hoisting the load (without dropping metadata) is definitely legit.
But conservatively, the invariance is still a path sensitive property. The load is invariant w.r.t. any store as long as control reaches a use-with-side-effects of the loaded value.
Given:
store %qm1 = load %pif <something> {m2 = load %p, !invariant.load}m = phi(m1, m2)
We can safely convert to:
m2 = load %p, !invariant.loadstore %qm1 = load %pif <something> {}m = phi(m1, m2)
But cannot safely convert to:
m = load %p, !invariant.loadstore %q
I would *really* like to specify more aggressive semantics so that we can do that, but haven’t adequately proved we can do that safely. I’m don’t think the optimizer will do the unsafe thing today, unless there’s an oversight somewhere.
Looking at the current implementations, it appears that TBAA's constant flag is more broadly implemented. On first glance, I'm really tempted to just deprecate !invariant.load in place of TBAA's constant flag. Thoughts?
I don’t have a strong opinion here. I’m fine relying on TBAA being enabled to active these sort of optimizations.
-Andy
Philip