[LLVMdev] Optimization hints for "constant" loads


Philip Reames

Sep 10, 2014, 1:15:07 AM
to llv...@cs.uiuc.edu
I'm looking at how to optimize IR which contains reads from a field
which is known to be initialized exactly once. I've got an idea on how
to approach this, but wanted to see if others have alternate ideas or
similar problems which might spark discussion. It feels like there's a
potentially generally useful optimization hint here if we can generalize
it sufficiently without losing optimization potential.

The problem:
struct array {
private:
  // once initialized 'len' never changes
  int len;
  // data can be modified at will
  char data[0];
public:
  static array* make(int len) {
    array* a = ... allocate uninitialized space
    a->len = len;
    return a;
  }
};

void access(array* a, int idx) {
  if (idx >= 0 && idx < a->len) {
    a->data[idx] = 5;
  }
}

void foo(array* a) {
  for (int i = 0; i < a->len; i++) {
    access(a, i);
  }
}
// assume 'access' is inlined into 'foo' and the loop is unrolled a time or two

To phrase that again in English, I've got a field which is initialized
once, with naive code which reads from it many times. I know at IR
generation time that a load from array::len is special, but I lose this
information as soon as I generate IR. In particular, I run into
aliasing problems where the location a->len is considered 'mayalias'
with unrelated stores thus preventing value forwarding, LICM, and other
desirable optimizations.

Existing Approaches (both sketched in IR below):
1) Use TBAA - By tagging loads and stores to the two fields of the
array struct as disjoint branches in the TBAA tree, I can inform LLVM
that a load of 'len' never aliases with a store through 'data'. This
mostly works, and enables many loads to be forwarded by GVN, but (due to
the intervening stores) is a complete loss in EarlyCSE and (due to
intervening calls) LICM.
a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
improve the situation in EarlyCSE.
2) Use "invariant.load" metadata - This metadata indicates that the
field loaded from is initialized before the execution of the code being
compiled. In particular, "invariant.load" implies that the load is not
control dependent on any branch, is safe to speculate, and that no write
aliases the location read from. This mostly works, but only if
'array::make' is never visible in the IR. As soon as 'array::make'
gets inlined, all bets are off and miscompiles may result.
a) Also, in practice, only LICM really knows about
"invariant.load". It would be pretty straight forward to teach both
EarlyCSE and GVN about them though.
http://llvm.org/bugs/show_bug.cgi?id=20806
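
For concreteness, a minimal IR sketch of both existing approaches on the
array example (the TBAA node names and @allocate are made up; the syntax
matches the IR excerpts later in this thread):

; Approach 1: 'array::len' and 'array::data' are disjoint branches of the
; TBAA tree, so the element store is known not to clobber the length.
define i32 @len_reload(i32* %lenptr, i8* %elemptr) {
  %len1 = load i32* %lenptr, align 4, !tbaa !2
  store i8 5, i8* %elemptr, align 1, !tbaa !3
  %len2 = load i32* %lenptr, align 4, !tbaa !2   ; GVN can forward %len1 here
  %sum = add i32 %len1, %len2
  ret i32 %sum
}

; Approach 2: once array::make is inlined, the initializing store and the
; !invariant.load-tagged read sit in the same function; the metadata's claim
; that no write aliases the location is now false, so miscompiles can follow.
define i32 @make_then_read(i32 %n) {
  %raw = call i8* @allocate(i64 8)
  %lenptr = bitcast i8* %raw to i32*
  store i32 %n, i32* %lenptr, align 4
  %len = load i32* %lenptr, align 4, !invariant.load !0
  ret i32 %len
}

declare i8* @allocate(i64)

!0 = metadata !{}
!1 = metadata !{metadata !"array root"}
!2 = metadata !{metadata !"array::len", metadata !1}
!3 = metadata !{metadata !"array::data", metadata !1}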

New Approaches:
(This isn't so much "being proposed" as "being put forward for discussion".)

1) Introduce a new metadata type "initialized-before-load" which implies
that any two loads tagged with the metadata along any execution path
must yield the same result.

This doesn't give much freedom to the 'first' load; it's still control
dependent, can't be reordered with preceding stores, can't be lifted out
of loops, etc... It can however be reordered with following stores or
sunk out of a loop provided the loop body is known to execute at least
once.

The second load has a lot more freedom. Provided that there is always
another load to the same location (with the metadata) provably preceding
it on all paths, it can be reordered freely, lifted over control
branches, lifted out of loops, etc... Importantly, it is also legal to
forward the value of a preceding load to a later load provided that a)
both have the metadata and b) that one load executes strictly before the
other.

A store marked "initialized-before-load" is undefined if there is a load
with the same metadata on the same location preceding it. There may be
*multiple* stores to the location along a path, provided that the first
load is strictly after *all* of them.

This seems straightforward to implement in EarlyCSE and LICM. I haven't
looked closely at GVN, but expect it's probably not hard either.
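
For illustration, a rough sketch of the intended use (the
!initialized.before.load metadata kind is hypothetical, and @unknown stands
in for an arbitrary call):

define i32 @two_reads(i32* %lenptr) {
  %len1 = load i32* %lenptr, align 4, !initialized.before.load !0
  call void @unknown()               ; may write arbitrary memory
  %len2 = load i32* %lenptr, align 4, !initialized.before.load !0
  ; %len1 strictly precedes %len2 on every path, so %len1 may legally be
  ; forwarded to %len2 despite the intervening call.
  %sum = add i32 %len1, %len2
  ret i32 %sum
}

declare void @unknown()

!0 = metadata !{}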

2) Introduce a slightly different metadata "initialized-once". Semantics
are very similar to the preceding except that there can only be a single
store to the location along any path.

Value forwarding from the store to a following load (with metadata) is
allowed regardless of potentially aliasing intervening stores.

This was actually my original idea, but it has a couple of problems.
First, it breaks on surprisingly common initialization patterns such as
default initialization followed by real initialization. Secondly, I'm
not sure the optimizer would always preserve the "write once" property.
In particular, the optimizer is free to divide a large write into
several smaller ones (assuming the write is not atomic.)
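
For illustration, a rough sketch of the first problem (the !initialized.once
metadata kind is hypothetical): the common default-init-then-real-init
pattern already performs two stores before the first load, violating the
single-store requirement.

define i32 @default_then_real(i32* %lenptr, i32 %n) {
  store i32 0, i32* %lenptr, align 4, !initialized.once !0   ; default initialization
  store i32 %n, i32* %lenptr, align 4, !initialized.once !0  ; real initialization: a second store
  %len = load i32* %lenptr, align 4, !initialized.once !0
  ret i32 %len
}

!0 = metadata !{}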



Thoughts? Suggestions? Similar sounding problems this might be able to
solve?

Philip


Kevin Modzelewski

Sep 10, 2014, 5:49:39 PM
to Philip Reames, llv...@cs.uiuc.edu
We have some similar cases and wanted the same thing; what we were doing for a while is using the third "is constant" field of the TBAA metadata and setting that to 1.  I'm not 100% sure what the semantics of that are -- LangRef says it means that pointsToConstantMemory() returns true which means that the memory is "impossible ... to be modified", which seems like not quite a fit for this set-exactly-once use case.  In practice, looking at the IR after our optimization pipeline, we were getting the results we wanted: if a store and subsequent loads were seen together, the store would remain and the value would be forwarded to all the loads.  (I don't think I looked at the "multiple loads with no visible store which should get collapsed to a single load" case.)  ymmv
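
For reference, a minimal sketch of the construct being described -- the
optional third field of an old-style scalar TBAA node, which marks accesses
tagged with that node as constant memory (names are made up):

define i32 @read_len(i32* %lenptr) {
  %len = load i32* %lenptr, align 4, !tbaa !1
  ret i32 %len
}

!0 = metadata !{metadata !"root"}
!1 = metadata !{metadata !"array::len", metadata !0, i64 1}   ; third field = 1: "is constant"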

We've disabled the optimization for now, since without an easy way of putting the annotation in the C++ source code and getting it passed through clang, it became a burden to keep the application logic in sync with the tbaa-fixup code that lived in a different place.  We'll fix it eventually...

kmod

Philip Reames

Sep 10, 2014, 6:25:05 PM
to Kevin Modzelewski, llv...@cs.uiuc.edu
On 09/10/2014 02:42 PM, Kevin Modzelewski wrote:
We have some similar cases and wanted the same thing; what we were doing for a while is using the third "is constant" field of the TBAA metadata and setting that to 1.  I'm not 100% sure what the semantics of that are -- LangRef says it means that pointsToConstantMemory() returns true which means that the memory is "impossible ... to be modified", which seems like not quite a fit for this set-exactly-once use case.  In practice, looking at the IR after our optimization pipeline, we were getting the results we wanted: if a store and subsequent loads were seen together, the store would remain and the value would be forwarded to all the loads.  (I don't think I looked at the "multiple loads with no visible store which should get collapsed to a single load" case.)  ymmv
I hadn't looked at this approach much, but based on the documentation, you're basically just asking for miscompiles here.  The semantics seem to be the same as "invariant.load".   While the optimizer happens to not be removing the stores, it seems like it would be perfectly legal for it to do so. 

We've disabled the optimization for now, since without an easy way of putting the annotation in the C++ source code and getting it passed through clang, it became a burden to keep the application logic in sync with the tbaa-fixup code that lived in a different place.  We'll fix it eventually...
I can't comment on this part.  I'm assuming the C++ code being compiled is your runtime or something?

Kevin Modzelewski

Sep 11, 2014, 5:33:05 AM
to Philip Reames, llv...@cs.uiuc.edu
On Wed, Sep 10, 2014 at 3:21 PM, Philip Reames <list...@philipreames.com> wrote:
On 09/10/2014 02:42 PM, Kevin Modzelewski wrote:
We have some similar cases and wanted the same thing; what we were doing for a while is using the third "is constant" field of the TBAA metadata and setting that to 1.  I'm not 100% sure what the semantics of that are -- LangRef says it means that pointsToConstantMemory() returns true which means that the memory is "impossible ... to be modified", which seems like not quite a fit for this set-exactly-once use case.  In practice, looking at the IR after our optimization pipeline, we were getting the results we wanted: if a store and subsequent loads were seen together, the store would remain and the value would be forwarded to all the loads.  (I don't think I looked at the "multiple loads with no visible store which should get collapsed to a single load" case.)  ymmv
I hadn't looked at this approach much, but based on the documentation, you're basically just asking for miscompiles here.  The semantics seem to be the same as "invariant.load".   While the optimizer happens to not be removing the stores, it seems like it would be perfectly legal for it to do so. 

Yeah, agreed that it's risky based on the documentation, but it feels like this is the intended use case for that "is constant" TBAA field, since I'm not sure when the described semantics could be useful.

Hal Finkel

Sep 28, 2014, 4:26:09 AM
to Philip Reames, llv...@cs.uiuc.edu
I think that, at least in theory, we already have a solution to this problem. If, after the initialization is complete, you insert a call to the llvm.invariant.start intrinsic (and perhaps also do so at the start of any routine you know can be called only after initialization is complete), that should convey the information you want.

I've never worked with this intrinsic before, but if this does not work, I'd be really curious to know why.
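
For the array example above, that might look roughly like this (@allocate
stands in for the allocation; the intrinsic signature is as documented in the
LangRef):

define i8* @array_make(i32 %n) {
  %raw = call i8* @allocate(i64 8)
  %lenptr = bitcast i8* %raw to i32*
  store i32 %n, i32* %lenptr, align 4
  ; initialization is complete; the 4-byte length is invariant from here on
  %inv = call {}* @llvm.invariant.start(i64 4, i8* %raw)
  ret i8* %raw
}

declare {}* @llvm.invariant.start(i64, i8*)
declare i8* @allocate(i64)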

Thanks again,
Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Philip Reames

Sep 29, 2014, 7:09:36 PM
to Hal Finkel, llv...@cs.uiuc.edu
Huh. You're right, this should work. I'm going to run some experiments
and see what happens in practice.

Thanks for the pointer on this! Not sure how I managed to miss this
approach.

Philip Reames

Oct 10, 2014, 9:12:30 PM
to Hal Finkel, llv...@cs.uiuc.edu

On 09/28/2014 01:22 AM, Hal Finkel wrote:
From some experiments, the existing invariant intrinsics appear nearly
useless for my purposes. The problem is that any call can hide an
invariant.end intrinsic. As a result, the optimizer must conservatively
assume that any call clobbers the "invariant" location. This makes the
intrinsic a non-starter in its current form.

; Function Attrs: uwtable
define zeroext i1 @_Z4testv() #0 {
%1 = tail call noalias i8* @_Znwm(i64 4) #3
%2 = bitcast i8* %1 to i32*
store i32 5, i32* %2, align 4, !tbaa !1
%discard = call {}* @llvm.invariant.start(i64 4, i8* %1)
tail call void @_Z6escapePi(i32* %2)
%3 = load i32* %2, align 4, !tbaa !1
%4 = icmp eq i32 %3, 5     ; <-- this conditional should be folded away and is not
ret i1 %4
}

declare {}* @llvm.invariant.start(i64, i8*)

; Function Attrs: nobuiltin
declare noalias i8* @_Znwm(i64) #1

declare void @_Z6escapePi(i32*) #2

It also appears that the intrinsic has limited implementation in the
optimizer. Even surprisingly simple cases don't appear to kick in.
Consider:
define zeroext i1 @_Z4testv() #0 {
%1 = tail call noalias i8* @_Znwm(i64 4) #4
%2 = bitcast i8* %1 to i32*
store i32 5, i32* %2, align 4, !tbaa !1
%discard = tail call {}* @llvm.invariant.start(i64 4, i8* %1)
%3 = load i32* %2, align 4, !tbaa !1
%4 = icmp eq i32 %3, 5     ; <-- This conditional should be folded
tail call void @_Z6escapePi(i32* %2, i1 %4)
%5 = load i32* %2, align 4, !tbaa !1
%6 = icmp eq i32 %5, 5
ret i1 %6
}


We could extend the existing intrinsic with a notion of invariant.start
that has no end. This could be as simple as adding a boolean parameter
to the intrinsic. I think this could be made to work. There'd be other
implementation work needed to make it actually useful; the
validity-based-on-dominance model used by assumes seems like an obvious candidate.
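
For example, a purely hypothetical extension (not the current signature),
where the extra i1 marks an invariant region that has no end:

declare {}* @llvm.invariant.start(i64, i8*, i1)

%tok = call {}* @llvm.invariant.start(i64 4, i8* %1, i1 true)   ; true = no end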

Thinking through this, I see a couple of places that we'd change:
- isLocationKnownInvariant(Value* Loc, Instruction* Cntx) and related
analysis pass (analogous to assumptions)
- The new "no end" version is easy (dominance)
- The older version is a graph walk looking for paths which include
either an end or a call.
- eagerly propagate known invariant location store values in EarlyCSE
(strictly by dominance for the new intrinsic form)
- Add an InstCombine rule to fold a store to a known invariant location
to undef
- Teach GVN about it (gets the non-dominance cases); could also do fun
things with phis of invariant locations, but not sure that's actually
useful
- Teach LICM to lift loads of known invariant locations out of loops

Does this seem like a workable approach?

Philip

Hal Finkel

Oct 11, 2014, 9:05:57 PM
to Philip Reames, llv...@cs.uiuc.edu
Yes, I think this sounds quite reasonable.

-Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Philip Reames

Oct 13, 2014, 6:05:19 PM
to Hal Finkel, llv...@cs.uiuc.edu
I'll start throwing together some patches. It'll take a couple of weeks
due to current distractions, but I'll get something up for review when I
can.

Philip

Andrew Trick

Oct 20, 2014, 7:39:12 PM
to Philip Reames, llv...@cs.uiuc.edu

This is also something that I've wanted to get support for, so thanks for
working on it.

I've never really understood how the llvm.invariant intrinsics could be
put into practice. There is the problem that "end" can occur anywhere
as you suggested fixing with a flag. But there is also the problem
that "start" needs to be visible in-scope. Additionally,
llvm.invariant is inherently control dependent, which doesn't really
allow the optimizer to hoist the invariant loads the way we want it
to. Are you planning to have your frontend emit llvm.invariant.start
intrinsics *everywhere* the field is accessed? If the frontend creates
a bunch of small llvm.invariant scopes, with arbitrary calls in
between, we won't be able to hoist or combine them. Although with your
proposal we could at least eliminate the llvm.invariant calls that are
dominated by another identical one.

I think it would be simpler and more powerful for the frontend to
treat loads from a particular type as invariant and have a safe way to
express the initialization. In other words, the frontend could always
emit the invariant metadata on the loads, but guard the store to avoid
loading before initialization.

The question is how to guard the initial store(s). Because we treat
invariant metadata as effectively meaning "can't alias", I think we
need to express this as a data dependence on the pointer that we load
from.

Conceptually:

%p = unaliased address
store val, %p
%a = llvm.safe_cast %p ; just made this up
val = load %a !invariant

This approach would give optimization maximum freedom in code that
doesn't involve initialization. It would block store-load forwarding
from the initializer itself to the invariant loads, but I think
handling that is a simple peephole.

There are multiple important use-cases for write-once data:
- Constant globals with nontrivial initialization.
- Runtime metadata, including but not limited to object headers.
- Builtin data structures with immutable semantics.
- Maybe even language features for user defined data structures.

In some cases, we want to lazily initialize a value. Some cases even
have conditionally invariant values. Ideally, we can express all of
these situations in LLVM IR. I think of the generalization of the
problem as:

- some arbitrary condition produces a token
- all loads from the immutable memory must acquire this token

It might look like this:

%p1 = ...
%p2 = llvm.safe_cast %p1
%a = select i1 %is_immutable, %p1, %p2

load %a !invariant

If %is_immutable can be proven to be true, then the safe_cast becomes
a nop. Otherwise, the safe_cast appears to read memory reachable from
its pointer, preventing stores from being reordered across it. The
safe_cast is readonly, so loads can be hoisted and optimized across it
and redundant stores can be eliminated.

Examples:

(1) Immutable fields in builtin data structures (Array.length)

; initialization

%p = <allocate>
store %v0, %p
%a = llvm.safe_cast %p
store %a, %container

...

; access to the immutable data in a different scope

%a = load %container
%v1 = load %a, !invariant

Here the second load is invariant but depends on the first load of the
container object (array), %a. The first load effectively acquires the
token. This load is not generally invariant, but likely loop invariant
and may benefit from TBAA. We would get the same effect by returning
the pointer to the "new" array and wouldn't need the first load in
that case.

(2) Lazily initialized data

%p1 = ...
if (!%is_initialized) ; some arbitrary condition
store %initval, %p1
%p2 = llvm.safe_cast %p1
%a = select i1 %is_initialized, %p1, %p2
%v = load %a, !invariant

Here we have an invariant load guarded by a condition,
%is_initialized. If the optimizer can reason about the condition,
which may be tested repeatedly in the same scope, it can optimize away
the lazy initialization path. These patterns can potentially be
optimized via correlated value propagation, jump-threading, and
unswitching.

(3) Conditionally immutable data

I realize this case is confusing but I'm laying out an example here anyway as a proof of concept.

; Load the internal state of an object. Once it is immutable, it is
; always immutable. This computation is marked invariant so it can be
; hoisted and combined.

%invariant_internal_state = load %object, !invariant
%invariant_is_immutable_flag = load %invariant_internal_state, !invariant
%invariant_is_immutable_check = cmp %invariant_is_immutable_flag, ...

; Any subsequent load of the internal state after it may have changed
; must be guarded.

%safe_object = llvm.safe_cast %object

; conditionally reload the object

%reload_internal_state = load %safe_object
%internal_state =
select i1 %invariant_is_immutable_check,
%invariant_internal_state,
%reload_internal_state,

The %invariant_is_immutable_check can now be combined and hoisted out
of loops.

This case is tricky because every load of the object's internal state
must be guarded. It's up to the front end to handle this. This
technique may be useful if you need to aggressively optimize for the case
in which an object is dynamically determined to be immutable, and that
is expected to be the common case.

I think the safe_cast intrinsic would work for your array.length case
and also for initialization of metadata. It wouldn't be as great for
standard global variables because we would either need to introduce a
level of indirection to get at the global, introduce a flag to check
initialization, or add a safe_cast to every read of the variable.

For this technique to be really effective, we also need to give the
invariant loads may-speculate semantics. Do you have a plan for that?
I'm fine with introducing another "may-speculate" metadata type for it
initially. We could then use safe_cast intrinsic to guard these loads
on type checks in addition to initialization.

I would also give the safe_cast a constant ID and semantics so that
only casts with the same ID could be combined. For example, two checks
on the same type can be combined if one dominates the other,
regardless of the intervening code. However, we could have multiple
independent checks on the same pointer value, each with different
control dependencies.

-Andy

Sanjoy Das

Oct 21, 2014, 3:32:53 AM
to Andrew Trick, LLVM Developers Mailing List
> I've never realy understood how the llvm.invariant intrinsics could be
> put into practice. There is the problem that "end" can occur anywhere
> as you suggested fixing with a flag.

I was under this impression too, but after re-reading the language
reference I'm not so sure -- it says about invariant.start: "This
intrinsic indicates that
until an llvm.invariant.end that uses the return value, the referenced
memory location is constant and unchanging.". I think this implies
that if the result of an `llvm.invariant.start` doesn't escape, we can
safely conclude that there can be no matching `llvm.invariant.end`.

I'm still a little hazy on all of the use cases you want to support,
but one problem with llvm.safe_cast is, as you note, stores to the
pointer passed to safe_cast will not be forwarded to loads from the
pointer it returns. I think this issue can be solved by modeling
`llvm.safe_cast` not as a transformation on a pointer, but as a check.
Specifically, I imagine a readonly intrinsic `llvm.assert_immutable`
that "asserts" that the pointer passed to it is immutable, and unwinds
if that fact isn't true (these are only imaginary semantics -- we know
the intrinsic will never actually unwind). This means
`llvm.assert_immutable` can't be marked nounwind (otherwise it would
just get optimized away); but since it doesn't need to unwind to the
same frame, a CallInst is sufficient (an InvokeInst would've increased
the CFG's complexity).

So

%p = unaliased address
store 42, %p
%a = llvm.safe_cast %p ; just made this up
%val = load %a !invariant

becomes

%p = unaliased address
store 42, %p
call void @llvm.assert_immutable(%p)
%val = load %p !invariant

AFAICT, "load %p" can not, in general, be re-ordered to before the
call to @llvm.assert_immutable because we don't know what condition it
uses to decide if it should unwind. There are cases where I think
such a re-ordering would be legal (an unused %val is one example,
another example is where the only use of %val is a comparison with
undef) but if I understand correctly, re-ordering a load with the call
to @llvm.assert_immutable can only be a performance loss -- it will
change dominance in a way that we'll "forget" that a load was
immutable. And I think in most of the cases that are interesting to
optimize, the re-ordering will not be legal.

It is still legal to value forward the store though:

%p = unaliased address
store 42, %p
call void @llvm.assert_immutable(%p)
%val = 42

because *if* assert_immutable does not unwind, %val will see the
dominating store, because assert_immutable is readonly.

Again,

%p1 = ...
%p2 = llvm.safe_cast %p1
%a = select i1 %is_immutable, %p1, %p2
load %a !invariant

could become

%p1 = ...
if (not %is_immutable) {
call llvm.assert_immutable(%p1)
}
load %p1

or, if we wish to reduce control flow, we could define
`llvm.assert_immutable` to be a no-op on null:

%p1 = ...
%a = select i1 %is_immutable, null, %p1,
call llvm.assert_immutable(%a)
load %p1 !invariant


`llvm.assert_immutable` does with control dependencies what
`llvm.safe_cast` does with data dependencies.

> llvm.invariant is inherently control dependent, which doesn't really
> allow the optimizer to hoist the invariant loads the way we want it
> to.

I'm curious about why you think so -- do you have examples that
demonstrate this? And is this a flaw in how llvm implements llvm.invariant
or something fundamental about control dependencies?

-- Sanjoy

Andrew Trick

Oct 21, 2014, 2:02:56 PM
to Sanjoy Das, LLVM Developers Mailing List
Just so you know, I am not a big fan of adding cast nodes into the use-def chain. It has to be a last resort. I proposed it this time because I don't see another way to fully express the semantics, and in many cases the casts are limited to the initialization code. I think the optimizer could be taught to store->load forward across them with some effort.

AFAICT, the only difference between assert_immutable and invariant.start is that assert_immutable is "maythrow", and we don't need to find all uses of the pointer.

assert_immutable will handle your use case safely as long as you don't mark it "readonly" (loads can move above readonly calls).
http://llvm.1065342.n5.nabble.com/Does-nounwind-have-semantics-td59631.html

The problems that we will face with assert_immutable are:

- Because it is maythrow and is not readonly, there is a dependence between assert_immutable and *all* subsequent loads. This could seriously impact optimization.

- Since the invariant load is not actually marked invariant, it can't be reordered with other stores and nonreadonly calls.

- It does not allow us in the future to speculatively hoist certain invariant loads above maythrow calls and branches.

So, I'm ok with assert_immutable as a solution to some subset of problems. But it is incomplete and suboptimal. I suspect that introducing non-readonly calls will do more harm than the benefit you get from marking some loads invariant.

Also note that this could get interesting when we introduce TBAA on calls:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066214.html

We would like to hoist loads above calls that have non-interfering TBAA tags, but also may throw (patchpoints and stackmaps are marked maythrow).

Say we declare that loads can be hoisted above maythrow calls with TBAA. If TBAA is dropped it becomes conservative. Then you could give your assert_immutable calls TBAA that only interferes with the array length load. That would solve some of your optimization issues, but only really works well if most of your loads and calls have precise TBAA tags.

The more aggressive solution I was proposing was to mark loads that are guarded with explicit safety checks as "maySpeculate". If we do that we definitely need a way to create a data dependence on a cast - this approach is incompatible with assert_immutable.

> `llvm.assert_immutable` does with control dependencies what
> `llvm.safe_cast` does with data dependencies.
>
>> llvm.invariant is inherently control dependent, which doesn't really
>> allow the optimizer to hoist the invariant loads the way we want it
>> to.
>
> I'm curious about why you think so -- do you have examples that
> demonstrate this? And is this a flaw in how llvm implements llvm.invariant
> or something fundamental about control dependencies?

I am saying exactly what you said just above the quote: llvm.invariant and llvm.assert_immutable guarantee safety by relying on implicit control dependence. That doesn't really give us full freedom to optimize the guarded invariant loads. They still can't be optimized across other non-readonly calls or hoisted above branches or out of loops. This is fundamental to the representation - it's not an implementation detail.

-Andy

Sanjoy Das

Oct 21, 2014, 2:50:05 PM
to Andrew Trick, LLVM Developers Mailing List
Thank you for the explanation, I think I misunderstood your solution
initially. Is it accurate to say: by making the definition of the
source pointer of an !invariant load control (or data) dependent on
some initialization event, you can effectively separate the
initialization event (where the location is not invariant) and the
rest of the execution (where the location is invariant).

This approach looks similar to Ruby's "freeze" method:
http://ruby-doc.org/core-2.1.3/Object.html#method-i-freeze

What is the aliasing relationship between a non-invariant load from
the result of a safe_cast and a load from (or store to) the argument
we pass to it? Is it sensible to make them MustAlias? I'm trying to
think if a frontend needs to be careful about using the return value
of safe_cast only for !invariant loads (or some other restriction),
or is it legal (or desirable) to mix and match both kinds of loads
from both kinds of pointers.

Andrew Trick

Oct 21, 2014, 4:49:02 PM
to Sanjoy Das, LLVM Developers Mailing List

> On Oct 21, 2014, at 11:46 AM, Sanjoy Das <san...@playingwithpointers.com> wrote:
>
> Thank you for the explanation, I think I misunderstood your solution
> initially. Is it accurate to say: by making the definition of the
> source pointer of an !invariant load control (or data) dependent on
> some initialization event, you can effectively separate the
> initialization event (where the location is not invariant) and the
> rest of the execution (where the location is invariant).
>
> This approach looks similar to Ruby's "freeze" method:
> http://ruby-doc.org/core-2.1.3/Object.html#method-i-freeze
>
> What is the aliasing relationship between a non-invariant load from
> the result of a safe_cast and a load from (or store to) the argument
> we pass to it? Is it sensible to make them MustAlias? I'm trying to
> think if a frontend needs to be careful about using the return value
> of check_cast only for !invariant loads (or some other restriction),
> or is it legal (or desirable) to mix and match both kinds of loads
> from both kinds of pointers.

%p1 = ...
store %v0, %p1
%p2 = llvm.safe_cast %p1
store %v1, %p1
store %v2, %p2
%v3 = load %p2, !invariant

The load can be moved above both stores (%v1 and %v2). AFAIK, the invariant metadata tag essentially tells alias analysis not to even try, just assume no aliasing. So the semantics effectively guarantee that %v0==%v1==%v2==%v3. The IR is still “legal” in that it is verifiable, but it is nonsense.

Note that it is also totally valid for an alias analysis to return that they MustAlias. That would allow forwarding from store %v0 to the load. This would have to be implemented as a special case, since the invariant tag normally bypasses pointer analysis. I admit this is weird, but I don’t know of any problems it will cause. This can already happen with TBAA, and will be a problem with any metadata that can be abused by the frontend or language design. A difficulty with this approach is that the frontend has the burden of emitting guards anywhere the “invariant” data could be modified.

How do you imagine mixing and matching invariant and non-invariant? I showed some examples of doing that safely, but I’m not sure how it’s relevant to you.

-Andy

Philip Reames

Oct 21, 2014, 7:11:12 PM
to Andrew Trick, Sanjoy Das, LLVM Developers Mailing List
Ok, I'm catching up on this thread. Let me summarize a couple of things
I noticed while reading.

Sanjoy made a good point. We don't actually need a new variant of
"invariant.start". Simply using an invariant.start with no uses gives
us a notion of an invariant region with no end. (Since the result
doesn't escape, there can be no end hidden inside a function call.)
This seems like a simple notion to exploit and is a strict step forward
from where we are, without a new intrinsic.
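
For illustration, a sketch of that idiom (@allocate and @unknown are
stand-ins):

define i32 @init_then_read(i32 %n) {
  %raw = call i8* @allocate(i64 4)
  %lenptr = bitcast i8* %raw to i32*
  store i32 %n, i32* %lenptr, align 4
  %tok = call {}* @llvm.invariant.start(i64 4, i8* %raw)  ; token never used or escaped
  call void @unknown()                ; cannot hide a matching invariant.end
  %len = load i32* %lenptr, align 4   ; safe to forward %n here
  ret i32 %len
}

declare {}* @llvm.invariant.start(i64, i8*)
declare i8* @allocate(i64)
declare void @unknown()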

The notions of "invariant over some range" and "safe to speculate over
some range" are related, but distinct. However, to get the most power
out of the former, you often need the latter.

Andy seems to be arguing in favour of data flow mostly because it's
easier for the optimizer. Unless I'm misreading, there's nothing
preventing us from using control flow for the same patterns. For example:
p1 = ...
*p1 = 5
p2 = llvm.safe_cast p1
p3 = select is_constant p2, p1
load p3

Is isomorphic with:
p1 = ...
*p1 = 5
if is_constant:
invariant.start(p1)
load p1

Andy, is this correct? Or am I missing something?

(p.s. You guys seem to be throwing around the terms "!invariant" and
"invariant load" without clearly defining them. I'm unsure if you mean
the same things.)


I suspect this is going to be a topic we should discuss in person at the
developers conference or over a beer at a social. It's a bit
complicated to keep track of in email.


My belief is that a path sensitive invariant indicator for a memory
location is probably sufficient for most use cases. In particular, the
initialization and use logic is usually not in the same compilation for
the cases I care about. I care that such cases are *possible*, but I'm
less concerned about them being *easy*. I'm also having real trouble
finding examples of cases where Andy's "conditionally invariant data" is
useful. Given that we *have an existing* mechanism which only needs a
slight tweak for "endless" invariant regions, I'm tempted to stick
with what we have.

The only change (clarification?) I'd make to the current scheme is to
specify that a location is both invariant *and dereferenceable* at the
point it is listed in an invariant.start. This enables speculative
loading up to that construct. I think this is actually what the current
intrinsic is trying for from the documentation.

(Hm.. it does make it hard to rearrange calls to invariant.start
though... Thoughts?)

I suspect that for the Array.length case, I'd probably do something like
this:
arr = function_that_returns_array();
invariant.start(arr.length)
... arbitrary control flow ...
length = arr.length
... arbitrary control flow ...
length2 = arr.length

My reasoning here is that if something returns an array (in this
compilation) I want the invariant fact to be visible. If that factory
function gets inlined, I'd end up with:
arr = new Array;
arr.length = X;
invariant.start(arr.length)
... arbitrary control flow ...
length = arr.length
... arbitrary control flow ...
length2 = arr.length

Both cases seem nearly ideal here. I know I can lift the loads as far
as the function which returns the base pointer. I know I can value
forward loads (either with or without lifting). I know I can value
forward the store to the load (as I normally would).

In terms of aliasing, it seems like having invariant.start read from
its argument's memory is sufficient. That prevents stores to that
location from being reordered from before the invariant.start to after
it. Due to conservative aliasing, that might be somewhat
harmful, but I don't really see a way to avoid this... (Or is this the
point Andy's trying to solve with the data flow?) I don't have a sense
for how painful this is. Try it and see?

I could see a useful extension for a function attribute (type?, TBAA?)
for describing the return value's invariant fields (*at the point of the
return*), but that is more a minor bit of goodness for IR size than
something clearly necessary.

Philip

Andrew Trick

Oct 22, 2014, 1:43:00 PM
to Philip Reames, LLVM Developers Mailing List

Yes. Exactly.

> Andy seems to be arguing in favour of data flow mostly because it's easier for the optimizer. Unless I'm misreading, there's nothing preventing us from using control flow for the same patterns. For example:
> p1 = ...
> *p1 = 5
> p2 = llvm.safe_cast p1
> p3 = select is_constant p2, p1
> load p3
>
> Is isomorphic with:
> p1 = ...
> *p1 = 5
> if is_constant:
> invariant.start(p1)
> load p1
>
> Andy, is this correct? Or am I missing something?

Yes, you are missing my main point. In the first case, you can actually mark load p3 with the !invariant tag. The cast and load don’t need to be in the same scope.

> (p.s. You guys seem to be throwing around the terms "!invariant" and "invariant load" without clearly defining them. I'm unsure if you mean the same things.)

We're talking about two things:
- loads of data that is known to be immutable, path sensitive
- loads that are marked with !invariant metadata (I often say “invariant load” because !invariant looks like “not invariant”)

> I suspect this is going to be a topic we should discuss in person at the developers conference or over a beer at a social. It's a bit complicated to keep track of in email.

Definitely.

I am not at all opposed to an assert_immutable intrinsic. I just don’t think it will be performant for the three reasons I stated in the previous email.

> In terms of aliasing, it seems like having invariant.start read from it's arguments memory is sufficient. That prevents reorderings of stores to that location before the address to after the invariant.start. Due to conservative aliasing, that might be somewhat harmful, but I don't really see a way to avoid this... (Or is this the point Andy's trying to solve with the data flow?) I don't have a sense for how painful this is. Try it and see?
>
> I could see a useful extension for a function attribute (type?, TBAA?) for describing the return value's invariant fields (*at the point of the return*), but that is more minor goodness for IR size then clearly necessary.

invariant.start/assert_immutable has to write to memory to get the semantics you want. “maythrow” is insufficient (see previous message). I proposed new semantics based on TBAA tags as a partial fix for this problem.

-Andy

> My belief is that a path sensitive invariant indicator for a memory location is probably sufficient for most use cases. In particular, the initialization and use logic is usually not in the same compilation for the cases I care about. I care that such cases are *possible*, but I'm less concerned about them being *easy*. I'm also having real trouble find examples of cases where Andy's "conditionally invariant data" is useful. Given that we *have an existing* mechanism which only needs a slightly tweak for "endless" invariant regions, I'm tempted to stick with what we have.
>
> The only change (clarification?) I'd make to the current scheme is to specify that a location is both invariant *and dereferenceable* at the point it is listed in an invariant.start. This enables speculative loading up to that construct. I think this is actually what the current intrinsic is trying for from the documentation.
>
> (Hm.. it does make it hard to rearrange calls to invariant.start though... Thoughts?)
>
> I suspect that for the Array.length case, I'd probably do something like this:
> arr = function_that_returns_array();
> invariant.start(arr.length)
> ... arbitrary control flow ...
> length = arr.length
> ... arbitrary control flow ...
> length2 = arr.length
>
> My reasoning here is that if something returns an array (in this compilation) I want the invariant fact to be visible. If that factory function gets inlined, I'd end up with:
> arr = new Array;
> arr.length = X;
> invariant.start(arr.length)
> ... arbitrary control flow ...
> length = arr.length
> ... arbitrary control flow ...
> length2 = arr.length
>
> Both cases seem nearly ideal here. I know I can lift the loads as far as the function which returns the base pointer. I know I can value forward loads (either with or without lifting). I know I can value forward the store to the load (as I normally would).
>
>
>

Andrew Trick

Dec 1, 2014, 2:28:43 PM
to Philip Reames, LLVM Developers Mailing List

On Oct 21, 2014, at 4:03 PM, Philip Reames <list...@philipreames.com> wrote:

Sanjoy made a good point.  We don't actually need a new variant of "invariant.start".  Simply using an invariant.start with no uses gives us a notion of an invariant region with no end.  (Since the result doesn't escape, there can be no end hidden inside a function call.)   This seems like a simple notion to exploit and is strict step forward from where we are, without a new intrinsic.

I'm coming back to this because it is a sticking point for me in all of the proposals that were floated informally on the commits list. I would really like this to work, but can't prove that it's safe.

Given:

%a = newArray()
%pLen = gep(%a, <length offset>)
invariant.start(<size>, %pLen)
%len = load %pLen

%o = foo() // may deallocate %a
store %something
load %o

I want to claim that the LLVM optimizer cannot do the following (but don't have a complete line of reasoning yet):

%a = newArray()
%pLen = gep(%a, <length offset>)
invariant.start(<size>, %pLen)
%len = load %pLen

%o = foo() // may deallocate %a
if ptrtoint(%o) == ptrtoint(%pLen)
  load %pLen
  store %something // Oops... this might alias
else
  store %something
  load %o

Either we need to make a convincing argument, or bail on most of the tentative proposals for expressing invariance, and resort to generating IR like this:

%a = newArray()
%pLen = gep(%a, <length offset>)
%t = invariant.start(<size>, %pLen)
%len = load %pLen

%o = foo() // may deallocate %a
invariant.end(%t, <size>, %pLen)
store %something
load %o

...which is pretty painful to implement and may be very difficult to optimize.

-Andy

Philip Reames

Dec 1, 2014, 5:31:17 PM
to Andrew Trick, LLVM Developers Mailing List

On 12/01/2014 11:14 AM, Andrew Trick wrote:

On Oct 21, 2014, at 4:03 PM, Philip Reames <list...@philipreames.com> wrote:

Sanjoy made a good point.  We don't actually need a new variant of "invariant.start".  Simply using an invariant.start with no uses gives us a notion of an invariant region with no end.  (Since the result doesn't escape, there can be no end hidden inside a function call.)   This seems like a simple notion to exploit and is strict step forward from where we are, without a new intrinsic.

I'm coming back to this because it is a sticking point for me in all of the proposals that were floated informally on the commits list. I would really like this to work, but can't prove that it's safe.
As I said in the other thread, !invariant.load and llvm.invariant.* are not the same.  This thread is discussing the latter, not the former.  The other thread is discussing the former, but not the latter. 

Given:

%a = newArray()
%pLen = gep(%a, <length offset>)
invariant.start(<size>, %pLen)
%len = load %pLen

%o = foo() // may deallocate %a
store %something
load %o

I want to claim that the LLVM optimizer cannot do the following (but don't have a complete line of reasoning yet):

%a = newArray()
%pLen = gep(%a, <length offset>)
invariant.start(<size>, %pLen)
%len = load %pLen

%o = foo() // may deallocate %a
if ptrtoint(%o) == ptrtoint(%pLen)
  load %pLen
  store %something // Oops... this might alias
else
  store %something
  load %o
Just to make sure we're on the same page, it *would* be legal for the optimizer to construct:

%a = newArray()
%pLen = gep(%a, <length offset>)
invariant.start(<size>, %pLen)
%len = load %pLen

%o = foo() // may deallocate %a
if ptrtoint(%o) == ptrtoint(%pLen)
  store %something // Oops... this might alias
  load %pLen <--- This is now after the store
else
  store %something
  load %o

Are we in agreement here? 

What is the argument that you see leading from the second to your problematic example?  I'm missing the reasoning by which you got there.  I agree that the example is problematic, I'm just not sure how it would arise. 

Andrew Trick

Dec 1, 2014, 5:50:42 PM
to Philip Reames, LLVM Developers Mailing List
It’s fine in the sense that memory access has not been reordered… yet.

But is there an implicit invariant.end(%pLen) at the call to foo(), at which point the array can be freed and another object allocated in its place? I don’t think so. The way I interpreted the conclusion of this email thread is that invariant.start can be used without invariant.end. Since the token returned by the intrinsic never escapes, you can just assume the memory is invariant at all uses. The problem here is that the optimizer could (in theory) introduce new uses of the pointer after the object lifetime. This is the same problem that you yourself have raised. I hoped we could sidestep the problem because it crops up with any representation that attaches invariance to a pointer value. If we have an answer for this, then we can easily debate other representations like broadening !invariant.load metadata semantics and introducing new intrinsics.

-Andy

Philip Reames

Dec 1, 2014, 6:29:03 PM
to Andrew Trick, LLVM Developers Mailing List
I think I've actually convinced myself my example is not okay, but for a completely different reason.  :)  See my last comment below.


But is there an implicit invariant.end(%pLen) at the call to foo(), at which point the array can be freed and another object allocated in its place? I don’t think so.
Given the current semantics, if the result of the invariant.start is not passed to foo, we can assume foo does not contain an invariant.end.  However, for foo to allocate a new object in the same memory, it must write to that memory.  Doing so without ending the previous invariant region is ill defined.  As such, your example isn't a valid test. 

We have two cases:
a) The invariant region hasn't ended.  As a result, the input IR is ill defined and the problematic reordering can happen. 
b) The invariant region ended (and we either encountered a llvm.invariant.end, or saw the result of invariant.start passed to foo). In this case, the location is no longer invariant and the reordering isn't legal and wouldn't happen.

As I think came up before, this would imply that an invariant.end can't be dropped without dropping its corresponding invariant.start.  That's problematic. 

The way I interpreted the conclusion of this email thread is that invariant.start can be used without invariant.end.
We had a proposal for this, but I don't believe it's actually been implemented yet.  I believe this proposal is still consistent with everything I said above though. 
Since the token returned by the intrinsic never escapes, you can just assume the memory is invariant at all uses. The problem here is that the optimizer could (in theory) introduce new uses of the pointer after the object lifetime. This is the same problem that you yourself have raised. I hoped we could sidestep the problem because it crops up with any representation that attaches invariance to a pointer value. If we have an answer for this, then we can easily debate other representations like broadening !invariant.load metadata semantics and introducing new intrinsics.
I agree there is a general issue here.  However, I don't actually think it's an issue of invariance.  Instead, I see this as being a problem of *dereferenceability*.  Regardless of the 'invariant-ness' of the memory in question, it's problematic for the optimizer to insert uses after a deallocation because the memory transitions from dereferenceable to non-dereferenceable.  (i.e. Consider a malloc routine that allocates one page per object, and a free which protects the page freed.)  Any transformation which inserts such a load is dangerous. 

What your example really shows is that we can have two names for the same *memory address*.  One of those names can be dereferenceable while the other is not. 

The dynamic check of the pointer values essentially allows the optimizer to chose which of the two names for the same physical address it wants to use.  I think that is valid, but it's odd to say the least.  The particular example you've shown isn't legal since the optimizer is choosing to insert a load to a non-dereferenceable location and thus could introduce a fault. 

Philip Reames

Dec 1, 2014, 6:47:27 PM
to llv...@cs.uiuc.edu
(Spawning a separate subthread off the 'Optimization hints for "constant" loads' discussion for a related question. )

Looking at TBAA again, I was reminded that TBAA also contains a third field which indicates that "meaning pointsToConstantMemory should return true; see other useful AliasAnalysis methods".  Looking at this a bit, it really seems like this flag has the exact same meaning as !invariant.load. 

pointsToConstantMemory returns a value for a Location.  Since it is entirely legal to have two Locations which describe the same physical memory, it seems like you'd be back to the same semantics as !invariant.load. 

The only uncertainty here is that a Location is clearly (??) position/Instruction specific.  Does that also imply that the Location is control dependent?  What is the semantics of the following code?
if (is_known_invariant) {
  load %p, !tbaa is_constant=true
}

Is the optimizer allowed to lift the load above the conditional?  (Assuming it can prove the location is known dereferenceable.)  The semantics for !invariant.load clearly allow this, but do the semantics for TBAA's constant flag? 

I think the answer to these questions is that the load is *not* control dependent on the conditional (assuming it's otherwise known dereferenceable).  Given this, why do we have both?  Should we canonicalize one into the other?

Looking at the current implementations, it appears that TBAA's constant flag is more broadly implemented.  On first glance, I'm really tempted to just deprecate !invariant.load in place of TBAA's constant flag.  Thoughts?
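
For concreteness, the two annotations in question side by side (the TBAA
node names are made up):

define i32 @two_tags(i32* %p) {
  %a = load i32* %p, align 4, !invariant.load !0   ; empty metadata node
  %b = load i32* %p, align 4, !tbaa !2             ; scalar TBAA node with third field = 1
  %s = add i32 %a, %b
  ret i32 %s
}

!0 = metadata !{}
!1 = metadata !{metadata !"root"}
!2 = metadata !{metadata !"const field", metadata !1, i64 1}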

Philip



Andrew Trick

Dec 1, 2014, 9:12:41 PM
to Philip Reames, llv...@cs.uiuc.edu
On Dec 1, 2014, at 3:44 PM, Philip Reames <list...@philipreames.com> wrote:

(Spawning a separate subthread off the 'Optimization hints for "constant" loads' discussion for a related question. )

Looking at TBAA again, I was reminded that TBAA also contains a third field which indicates that "meaning pointsToConstantMemory should return true; see other useful AliasAnalysis methods".  Looking at this a bit, it really seems like this flag has the exact same meaning as !invariant.load. 

pointsToConstantMemory returns a value for a Location.  Since it is entirely legal to have two Locations which describe the same physical memory, it seems like you'd be back to the same semantics as !invariant.load. 

The only uncertainty here is that a Location is clearly (??) position/Instruction specific.  Does that also imply that the Location is control dependent?  What is the semantics of the following code?
if (is_known_invariant) {
  load %p, !tbaa is_constant=true
}

Is the optimizer allowed to lift the load above the conditional?  (Assuming it can prove the location is known dereferenceable.)  The semantics for !invariant.load clearly allow this, but do the semantics for TBAA's constant flag? 

I think the answer to these questions is that the load is *not* control dependent on the conditional (assuming it's otherwise known dereferenceable).  Given this, why do we have both?  Should we canonicalize one into the other?

It would be very confusing if the two had different semantics.

In either case, hoisting the load (without dropping metadata) is definitely legit. But conservatively, the invariance is still a path sensitive property. The load is invariant w.r.t. any store as long as control reaches a use-with-side-effects of the loaded value.

Given:

store %q
m1 = load %p
if <something> {
 m2 = load %p, !invariant.load
}
m = phi(m1, m2)

We can safely convert to:

m2 = load %p, !invariant.load
store %q
m1 = load %p
if <something> {}
m = phi(m1, m2)

But cannot safely convert to:

m = load %p, !invariant.load
store %q

I would *really* like to specify more aggressive semantics so that we can do that, but haven’t adequately proved we can do that safely. I don’t think the optimizer will do the unsafe thing today, unless there’s an oversight somewhere. 

Looking at the current implementations, it appears that TBAA's constant flag is more broadly implemented.  On first glance, I'm really tempted to just deprecate !invariant.load in place of TBAA's constant flag.  Thoughts?

I don’t have a strong opinion here. I’m fine relying on TBAA being enabled to activate these sorts of optimizations.

-Andy

Philip




Andrew Trick

Dec 2, 2014, 12:57:53 PM
to Philip Reames, LLVM Developers Mailing List
Sorry, I didn’t follow your response. I was showing that the optimizer can theoretically substitute the use of one pointer value (%o) with another pointer value (%pLen) after checking value equivalence. Doing that creates multiple uses of the same pointer value where not all uses dereference the same memory object. To me, that means we cannot associate a pointer value/address with an object-specific attribute, like invariance, without tying it to a closed lifetime. I wish we could prevent the optimizer from doing this.

-Andy

Philip Reames

Dec 2, 2014, 1:06:11 PM
to Andrew Trick, llv...@cs.uiuc.edu

On 12/01/2014 06:05 PM, Andrew Trick wrote:

On Dec 1, 2014, at 3:44 PM, Philip Reames <list...@philipreames.com> wrote:

(Spawning a separate subthread off the 'Optimization hints for "constant" loads' discussion for a related question. )

Looking at TBAA again, I was reminded that TBAA also contains a third field which indicates that "meaning pointsToConstantMemory should return true; see other useful AliasAnalysis methods".  Looking at this a bit, it really seems like this flag has the exact same meaning as !invariant.load. 

pointsToConstantMemory returns a value for a Location.  Since it is entirely legal to have two Locations which describe the same physical memory, it seems like you'd be back to the same semantics as !invariant.load. 

The only uncertainty here is that a Location is clearly (??) position/Instruction specific.  Does that also imply that the Location is control dependent?  What is the semantics of the following code?
if (is_known_invariant) {
  load %p, !tbaa is_constant=true
}

Is the optimizer allowed to lift the load above the conditional?  (Assuming it can prove the location is known dereferenceable.)  The semantics for !invariant.load clearly allow this, but do the semantics for TBAA's constant flag? 

I think the answer to these questions is that the load is *not* control dependent on the conditional (assuming it's otherwise known dereferenceable).  Given this, why do we have both?  Should we canonicalize one into the other?

It would be very confusing if the two had different semantics.

In either case, hoisting the load (without dropping metadata) is definitely legit.
Agreed up to here.

But conservatively, the invariance is still a path sensitive property. The load is invariant w.r.t. any store as long as control reaches a use-with-side-effects of the loaded value.
I disagree with this.  The !invariant.load metadata is not a path sensitive property.  It is a property of the pointer value which happens to be described on the load.  Consider how we use the !invariant.load metadata in LICM.  We look at the load (*not* its uses) and decide to skip all remaining alias checks. 

To be clear, the preceding statement is not true of llvm.invariant.start and llvm.invariant.end. 

Given:

store %q
m1 = load %p
if <something> {
 m2 = load %p, !invariant.load
}
m = phi(m1, m2)

We can safely convert to:

m2 = load %p, !invariant.load
store %q
m1 = load %p
if <something> {}
m = phi(m1, m2)

But cannot safely convert to:

m = load %p, !invariant.load
store %q

I would *really* like to specify more aggressive semantics so that we can do that, but haven’t adequately proved we can do that safely. I’m don’t think the optimizer will do the unsafe thing today, unless there’s an oversight somewhere.
We perform nearly exactly this transformation today.  Consider LICM:
load %p
for(int i = 0; i < 1; i++) {
  store %p
  load %p !invariant.load
}

LICM - in isolation - will transform this to:
load %p
load %p, !invariant.load
for(int i = 0; i < 1; i++) { }
store %p

Can you be more explicit about what you mean by "safely"?  Given the semantics I understand for !invariant.load, the transformation you've described is entirely legal.  We might not implement it today - I don't believe we'd forward the non-invariant load to the invariant load in your original example - but that's a limitation of the implementation. 



Looking at the current implementations, it appears that TBAA's constant flag is more broadly implemented.  On first glance, I'm really tempted to just deprecate !invariant.load in place of TBAA's constant flag.  Thoughts?

I don’t have a strong opinion here. I’m fine relying on TBAA being enabled to active these sort of optimizations.
I'll throw together a patch in a few days.

-Andy

Philip





Andrew Trick

Dec 2, 2014, 2:16:23 PM
to Philip Reames, llv...@cs.uiuc.edu
Here's a hopefully more intuitive example:

%p = <dereferenceable address>
store <value>, %q
if <type check> { // check whether %p is an invariant field
  m1 = load %p, !invariant
  foo(m1)
} else {
  m2 = load %p
  foo(m2)
}

==> LEGAL

%p = <dereferenceable address>
m1 = load %p, !invariant
store <value>, %q
m2 = load %p
if <type check> { // check whether %p is an invariant field
  foo(m1)
} else {
  foo(m2)
}

==> NOT CURRENTLY DONE (and potentially unsafe)

%p = <dereferenceable address>
m1 = load %p, !invariant
store <value>, %q
if <type check> { // check whether %p is an invariant field
  foo(m1)
} else {
  foo(m1)
}

---

In the above example, if the type check fails, then the store to %q may interfere with the load; the else path needs the post-store value m2, but the final transformation hands it m1. That loses the stored value.

The invariance of data independently applies to
(a) the scope over which other stores are guaranteed not to interfere 
(b) the paths in which the invariance holds

Given control proceeds along some path in (b), the invariant load cannot interfere with any store in (a).

Here is the most useful interpretation I can come up with that is also closest to being sound:

The invariant scope (a) includes all stores that may occur after the address becomes dereferenceable and before the end of the object lifetime, subject only to existing constraints on code motion. 

The invariant paths are any that include a use-with-side-effect of the loaded value (similar to our poison values).

I don't particularly like this conservative interpretation either. But even being this conservative, we're left with an even more fundamental question: is it valid to use !invariant metadata for any load from an object that doesn't have process lifetime? What prevents an invariant load from sinking below a "free"? i.e. Is this thing really only valid for global constants?

-Andy

Daniel Berlin

Dec 4, 2014, 4:34:27 PM
to Andrew Trick, Philip Reames, LLVM Developers Mailing List
Agreed.


This is actually one of the nice things memory ssa provides/provided GCC.  You got versioning of the global memory state for free, and lifetime info embedded directly into the IR in the normal SSA way.

For a given instruction, the state of the virtual uses for the instruction was sufficient to identify whether the memory state was the same, so value substitution was easy.

You could also do the inverse, and prove a better result than memory ssa already had given you - that the virtual uses were really the same due to non-aliasing.

This made value numbering loads/stores and substitution very easy.