struct Histogram {
int values[256];
int total;
};
Histogram DoIt(const int* image, int size) {
Histogram histogram;
for (int i = 0; i < size; ++i) {
++histogram.values[image[i]]; // (A)
++histogram.total; // (B)
}
return histogram;
}
In theory, inrange should be perfect for this job. If inrange was extended to non-const geps, I think it would work for all cases you care about, right? It’s also the most compact representation, so I would prefer that solution. Better than adding a new call instruction for every single pointer arithmetic instruction.
In terms of BasicAA, supporting inrange shouldn’t be any different from !range metadata I think.
Just my 2c.
Nuno
In theory, inrange should be perfect for this job. If inrange was extended to non-const geps, I think it would work for all cases you care about, right?
It’s also the most compact representation, so I would prefer that solution. Better than adding a new call instruction for every single pointer arithmetic instruction.
In terms of BasicAA, supporting inrange shouldn’t be any different from !range metadata I think.
On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
> Hi everyone,
>
> tl;dr: I would like to teach clang to output range metadata so that LLVM
> can do better alias analysis. I have a proposal as D99248
> <https://reviews.llvm.org/D99248> (clang part) and D99247
> <https://reviews.llvm.org/D99247> (llvm part). But there are other possible
> options that I'm detailing below.
>
> Consider the following code, adapted from brotli
> <https://en.wikipedia.org/wiki/Brotli>:
>
> ```
>
> struct Histogram {
>
> int values[256];
>
> int total;
>
> };
>
> Histogram DoIt(const int* image, int size) {
>
> Histogram histogram;
>
> for (int i = 0; i < size; ++i) {
>
> ++histogram.values[image[i]]; // (A)
>
> ++histogram.total; // (B)
>
> }
>
> return histogram;
>
> }
> ```
>
> In this code, the compiler does not know anything about the values of
> images[i], so it assumes that 256 is a possible value for it. In that case,
> (A) would change the value of histogram.total, so (B) has to load, add one
> and store [godbolt <https://godbolt.org/z/KxE343>].
>
> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use
> values to form a pointer to total and dereference it. What valid C/C++ code
> is allowed to do with values is:
> - Form any pointer in [values, values + 256].
> - Form and dereference any pointer in [values, values + 256)
>
> Note that the LLVM memory model is much laxer than that of C/C++. It has no
> notion of types. In particular, given an LLVM aggregate definition:
>
> ```
> %struct.S = type { [42 x i32], i32, i32 }
> ```
>
> It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep
> reference] representing indexing into the [42 x i32] array to load the i32
> member at index 2. It is also valid for %i to be 43 (though not 44 if an
> inbound GEP is used).
> So clang has to give LLVM more information about the C/C++ rules.
>
> *IR representation:*
> LLVM has several ways of representing ranges of values:
> - *!range* metadata can be attached to integer call and load instructions
> to indicate the allowed range of values of the result. LLVM's ValueTracking
> provides a function for querying the range for any llvm::Variable.
> - The *llvm.assume* intrinsic takes a boolean condition that can also be
> used by ValueTracking to infer range of values.
> - The *inrange* attribute of GEP can be used to indicate C-like semantics
> for the structure field marked with the inrange attribute. It can only be
> used for GEP constantexprs (ie.e. GEPs defined inline), but not for
> standalone GEPs defining instructions. relevant discussion
> <https://reviews.llvm.org/D22793?id=65626#inline-194653>.
>
> Alternatives:
> *(1) *Annotate each array subscript index value with a range, e.g.:
> ```
> %i = i64 …
> %ri = call i64 @llvm.annotation.i64(%index), !range !0
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, i32
> %ri
> ...
> !0 = !{i64 0, i64 42}
> ```
> *(2) *(variant of 1) relax the constraint that !range metadata can only be
> set on call and load instructions, and set the !range metadata on the index
> expression. We still need annotations for function parameters though:
> ```
> %i = i64 … , !range !0
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, i32
> %i
> ...
> !0 = !{i64 0, i64 42}
> ```
> This is slightly more compact.
>
> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations.
> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this will
> interfere with optimizations.
I would very much like not to introduce another way to encode
assumptions other than `llvm.assume`. If you want to avoid the extra
instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`,
which is in line with our move towards operand bundle use.
SCEV should be thought about this (as well), unsure what the problem
is you describe below. If BasicAA needs to know, sure.
~ Johannes
>
> *On the clang side*:
> The clang part is quite trivial as the infrastructure is already in place
> to emit dynamic ubsan guards: D99248 <https://reviews.llvm.org/D99248>
>
> *On the LLVM Side:*
> Alternatives:
> *(A)* - (annotation or assume options) Simply enable scev-aa which knows
> how to handle value ranges in general. IIUC it's not enabled in clang
> because it has issues with invalidation when code changes, and is therefore
> not cacheable. This makes it too slow to be practical.
> *(B) *- (annotation or assume options) Teach BasicAA to honor !range
> metadata (D99247 <https://reviews.llvm.org/D99247>)
> *(C)* - (inrange option) Teach BasicAA to honor inrange attributes of GEP.
>
> I was leaning towards (1) and (B) because:
> - BasicAA already has basic support for value range analysis
> (zero/nonzero), this is a small and natural extension.
> - The BasicAA improvement already benefits some existing code (as
> evidenced by the test changes in D99247 <https://reviews.llvm.org/D99247>)
> - Using range metadata rather than the `inrange` attribute means that
> BasicAA will automatically benefit from improvements in value tracking in
> the future.
>
> Opinions ?
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
SCEV should be thought about this (as well), unsure what the problem
is you describe below. If BasicAA needs to know, sure.
We need to get rid of that assertion. There are other non-attributes
to be used in assume operand bundles in the (near) future, so the this
work has to be done anyway.
>
>> SCEV should be thought about this (as well), unsure what the problem
>> is you describe below. If BasicAA needs to know, sure.
>>
> scev-aa already knows how to use range information. If we add a !range
> metadata in clang right now and use SCEV, there is nothing to do on the
> LLVM side. My point was that scev-aa is not enabled in the default
> pipeline, so we might as well teach BasicAA about this cheap case.
>
> Actually I think it makes sense to teach BasicAA about range information
> anyway (D99247) given that it could already be useful in cases like:
>
> ```
> define dso_local void @DoIt(%struct.Histogram* noalias nocapture sret(
> %struct.Histogram) align 4 %0, i32 %1, *i8 zeroext %2*) local_unnamed_addr
> #0 {
> ...
> *%6 = zext i8 %2 to i64*
> %7 = getelementptr inbounds %struct.Histogram, %struct.Histogram* %0, i64 0
> , i32 0, *i64 %6*
> ```
>
> Where ValueTracking could easily deduce that %6 is in [0,255].
Sounds reasonable.
~ Johannes
On Wed, Mar 24, 2021 at 12:00 PM Nuno Lopes <nunop...@sapo.pt> wrote:In theory, inrange should be perfect for this job. If inrange was extended to non-const geps, I think it would work for all cases you care about, right?
Agreed, but inrange was not extended to non-const geps. I'm not sure exactly why. From what I understand of the previous discussions, there were non-trivial issues with supporting `inrange` for non-const geps, though it's not exactly clear to me what these were. Peter, what was the blocker for that ?
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
It would make users explicit and we will have non-attribute bundles anyway.
I find it also "conceptually nicer", would you prefer explicit instructions?
~ Johannes
> Cheers,
> Florian
In theory, inrange should be perfect for this job. If inrange was extended to non-const geps, I think it would work for all cases you care about, right? It’s also the most compact representation, so I would prefer that solution. Better than adding a new call instruction for every single pointer arithmetic instruction.
In terms of BasicAA, supporting inrange shouldn’t be any different from !range metadata I think.
Roman
On Thu, Mar 25, 2021 at 5:42 PM Clement Courbet via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
>
>
> On Wed, Mar 24, 2021 at 12:00 PM Nuno Lopes <nunop...@sapo.pt> wrote:
>>
>> In theory, inrange should be perfect for this job. If inrange was extended to non-const geps, I think it would work for all cases you care about, right? It’s also the most compact representation, so I would prefer that solution. Better than adding a new call instruction for every single pointer arithmetic instruction.
>>
>> In terms of BasicAA, supporting inrange shouldn’t be any different from !range metadata I think.
>>
>>
>
>
> So I tried implementing this on top of `inrange`, and I found out that you can only have one `inrange` index, which doesn't allow us to represent access into multidimensional arrays : https://reviews.llvm.org/D99341
Are you aware of upcoming Opaque Pointers?
There won't be any multi-dimensional GEP's after that.
there might be some overlap with this proposal:
https://lists.llvm.org/pipermail/llvm-dev/2019-July/134063.html
Michael
Am Do., 25. März 2021 um 08:53 Uhr schrieb Clement Courbet via
llvm-dev <llvm...@lists.llvm.org>:
I think this is only for the internal representation. According to the
reference (https://llvm.org/docs/LangRef.html#id230), it is allowed
for each dimension.
Michael
This seems fine to me, SCEV and others can learn about this.
I assume we'll want the range `llvm.assume` anyway but that doesn't
necessarily preclude this.
> *annotation with range metadata*
> define void @with_annotation(%struct.S* %s, i32 %index) {
> %in_array = call i32 @llvm.annotation.i32(%index), !range !0
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1,
> i32 %in_array
> %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
> ret void
> }
>
> pros:
> - a single added instruction
> - the ValueTracker already knows about range metadata
> - deriving the range in ValueTracker involves a single metadata lookup.
> - The range information will be available for other purposes (e.g. SCEV).
> cons:
> - one extra instruction
> - llvm.annotation is not widely used
This is my least favorite. It introduces a completely new concept
for something that should be attached to the GEP or assume.
>
>
> *assume*
> define void @with_assume(%struct.S* %s, i32 %in_array) {
> %cmp1 = icmp sge i32 %in_array, 0
> %cmp2 = icmp slt i32 %in_array, 2
> %cmp = and i1 %cmp1, %cmp2
> call void @llvm.assume(i1 %cmp)
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1,
> i32 %in_array
> %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
> ret void
> }
>
> pros:
> - The range information will be available for other purposes (e.g. SCEV).
> cons:
> - several extra instructions
> - the ValueTracker cannot currently derive a range from this
> - Deriving the range can be costly
This is the old assume way and OK. I don't think the cost of
looking at assumptions is a good argument because we should
come up with solutions there anyway. The extra instructions
and uses are the real downside here.
>
> *assume with bundle*
> define void @with_assume_bundle(%struct.S* %s, i32 %in_array) {
> call void @llvm.assume(i1 true) ["range"(i32 %in_array, i32 0, i32 2)]
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1,
> i32 %in_array
> %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
> ret void
> }
>
> (Note that this is not currently valid IR, we need to make bundle tags work
> for non-attributes)
> pros:
> - The range information will be available for other purposes (e.g. SCEV).
> - ? (see below)
> cons:
> - one extra instruction
> - ? (see below)
>
> @johannes For the last option, I'm also not sure what becomes of the
> "range" tag in the assume bundle once we remove the assertion ? Does it
> become attached to the value (%in_array), in which case we have the
> advantages of range metadata (option 2), or is this still attached to the
> assume, with the disadvantages of option 3 ?
So, as I said before, we'll have to make bundles work for non-attribute
tags anyway, that should not be a negative point here. The extra instruction
is there but we already know how to combine multiple `llvm.assume`s into
one.
The tag of the bundle describes the kind of assumption. In general, it could
be pretty much anything, the user needs to know how to interpret the
arguments.
All arguments of a bundle are "attached" to the tag, think of the tag as a
function name of an uninterpreted function. The "range" one could look
like this
```
void range(int v, int lb, int ub) {
__builtin_assume(lb <= v && v < ub);
}
```
Later I'm confident we even want such explicit assumption functions, so
you can
write the following, given the "range" definition above is in the module.
`call void @llvm.assume(i1 true) ["assumption_fn"(@range, i32 %in_array,
i32 0, i32 2)]`
For more justification see:
https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
I'm not sure I understand the questions properly. It basically is range
metadata attached
to an arbitrary value for a fixed program point. Could you elaborate on
your comments wrt.
the connection to option 2 and 3, please.
~ Johannes
I don't think this is necessarily accurate. We can, and already do
(https://godbolt.org/z/MaMEb1Koo),
generate bundles from conditions. If we can interpret a condition, why
could we not rewrite it into a bundle?
I'm not sure why this is any different form other normalization we do.
(And bundles have benefits over implicit
instruction encodings, for example use tracking and #instructions.)
> This means we have to handle multiple variants across the codebase, which can lead to situations where only one or the other is handled, which in turn can lead to surprising results (of the form: why does a transformation apply if information provided in a certain way, but does not apply of the equivalent info is provided in a different way).
> Using instruction potentially also allows us to specify more complex ranges, in relation to other values.
I don't see how bundles would restrict us in any way. I mean, if we want
to express property XYZ for %v and %q,
`llvm.assume(i1 true) ["XYZ"(%v, %q)]`, makes it really easy and it is
arguably as generic as you want it to be.
> But I realize that there are some practical consideration that make the instruction approach less appealing and I am all in favor of the more pragmatic & practical solution to start with.
I think bundles, and more generic assumptions, are what we need in the
future. I still believe we should use them
to encode information in assertions [0], among other things, without
running into the risk of having side-effects
that influence the compilation.
~ Johannes
[0] https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
char* test_fill(int size) {
char *test1 = malloc(size)
for (n = 0; n <= size; n++) {
test1[n] = 'A';
}
}
Would it be worth making the "range" information a little richer and
be able to use algebraic expressions as well as numeric ranges.
Note: the above example code has an off by one overflow, and it would
be helpful if one could catch that at compile time.
In this case, it could catch that n must be less than size, and not
less than or equal to size.
Thus putting the range value on the test1 pointer as being from
address of test1 to test1 + (size - 1)
This can only be achieved if algebraic expressions are used for
ranges, and not just constant values.
Actual use cases can get much more complicated with for example,
non-contiguous ranges. e.g. 0,1,4,5 ok, but 2,3,6,7 not ok.
Another useful thing to catch at compile time, would be a warning that
a pointer is being dereferenced, and we were not able to apply a range
expression to it. I.e. warn about unbounded dereferences.
I think it would be useful to at least consider how we would capture
this more complex range information/metadata in LLVM IR.
Kind Regards
James
> >>>> On 3/24/21 9:06 AM, Clement Courbet wrote:
> >>>>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <
> >>>>> johannes...@gmail.com> wrote:
> >>>>>
> >>>>>> I really like encoding more (range) information in the IR,
> >>>>>> more thoughts inlined.
> >>>>>>
> >>>>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
> >>>>>>> struct Histogram {
> >>>>>>>
> >>>>>>> int values[256];
> >>>>>>>
> >>>>>>> int total;
> >>>>>>>
> >>>>>>> };
> >>>>>>>
> >>>>>>> Histogram DoIt(const int* image, int size) {
> >>>>>>>
> >>>>>>> Histogram histogram;
> >>>>>>>
> >>>>>>> for (int i = 0; i < size; ++i) {
> >>>>>>>
> >>>>>>> ++histogram.values[image[i]]; // (A)
> >>>>>>>
> >>>>>>> ++histogram.total; // (B)
> >>>>>>>
> >>>>>>> }
> >>>>>>>
> >>>>>>> return histogram;
> >>>>>>>
> >>>>>>> }
I think what you want is the max object extend attribute, formerly
known as max object size when we only wanted to track an upper bound
next revision shall also include the lower one:
https://reviews.llvm.org/D87975
If we allow values instead of only constants you can "properly"
generate warnings, using SCEV to determine the range of `n` above.
That said, in operand bundles we can generally allow non-constant
values, e.g., `["range"(%p, i32 0, i32 %N)]`
~ Johannes
On 3/28/21 3:49 AM, James Courtier-Dutton wrote:
> Actual use cases can get much more complicated with for example,
> non-contiguous ranges. e.g. 0,1,4,5 ok, but 2,3,6,7 not ok.
That said, in operand bundles we can generally allow non-constant
values, e.g., `["range"(%p, i32 0, i32 %N)]`
Am Do., 25. März 2021 um 09:42 Uhr schrieb Clement Courbet via
llvm-dev <llvm...@lists.llvm.org>:
> So I tried implementing this on top of `inrange`, and I found out that you can only have one `inrange` index, which doesn't allow us to represent access into multidimensional arrays : https://reviews.llvm.org/D99341
I think this is only for the internal representation. According to the
reference (https://llvm.org/docs/LangRef.html#id230), it is allowed
for each dimension.
On Mar 27, 2021, at 20:37, Johannes Doerfert <johannes...@gmail.com> wrote:On 3/27/21 1:30 PM, Florian Hahn wrote:On Mar 24, 2021, at 19:32, Johannes Doerfert <johannes...@gmail.com> wrote:
On 3/24/21 12:47 PM, Florian Hahn wrote:On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm...@lists.llvm.org> wrote:
One disadvantage of using a bundle (or !range metadata) is that we treat ranges for certain values in a special way and differently to how we treat range information expressed by the user e.g. via conditions (or builtin assume).It would make users explicit and we will have non-attribute bundles anyway.We need to get rid of that assertion. There are other non-attributes+1 on trying to use assume, rather than adding another way.
to be used in assume operand bundles in the (near) future, so the this
work has to be done anyway.
But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users?
I find it also "conceptually nicer", would you prefer explicit instructions?
I don't think this is necessarily accurate. We can, and already do (https://godbolt.org/z/MaMEb1Koo),
generate bundles from conditions. If we can interpret a condition, why could we not rewrite it into a bundle?
I'm not sure why this is any different form other normalization we do. (And bundles have benefits over implicit
instruction encodings, for example use tracking and #instructions.)
This means we have to handle multiple variants across the codebase, which can lead to situations where only one or the other is handled, which in turn can lead to surprising results (of the form: why does a transformation apply if information provided in a certain way, but does not apply of the equivalent info is provided in a different way).
Using instruction potentially also allows us to specify more complex ranges, in relation to other values.
I don't see how bundles would restrict us in any way. I mean, if we want to express property XYZ for %v and %q,
`llvm.assume(i1 true) ["XYZ"(%v, %q)]`, makes it really easy and it is arguably as generic as you want it to be.
But I realize that there are some practical consideration that make the instruction approach less appealing and I am all in favor of the more pragmatic & practical solution to start with.
I think bundles, and more generic assumptions, are what we need in the future. I still believe we should use them
to encode information in assertions [0], among other things, without running into the risk of having side-effects
that influence the compilation.
You are right,t sometimes we will need more code to "also" handle the
bundles, especially when the same conditions can occur in regular code
as well.
My point was that we probably want a canonical assumption representation
and bundles generally have more benefits over explicit encodings. This might
require us to teach passes a new encoding but we will then use the new
encoding for all assumptions of a certain kind and start generating
those right
away wherever possible.
>
>>> This means we have to handle multiple variants across the codebase, which can lead to situations where only one or the other is handled, which in turn can lead to surprising results (of the form: why does a transformation apply if information provided in a certain way, but does not apply of the equivalent info is provided in a different way).
>>> Using instruction potentially also allows us to specify more complex ranges, in relation to other values.
>> I don't see how bundles would restrict us in any way. I mean, if we want to express property XYZ for %v and %q,
>> `llvm.assume(i1 true) ["XYZ"(%v, %q)]`, makes it really easy and it is arguably as generic as you want it to be.
>>
> I agree that it is possible to encode more interesting properties with assume bundles, but wouldn’t we end up duplicating all existing compare predicates for example? And for something like %x + %y < %x we would either fall back to instructions again or come up with a way to encode that in bundles as well. If we still use IR instructions for more complex expressions, we’d still need a way to exclude the ‘assume-only’ uses.
For "common assumption" I would strongly suggest "known bundles", e.g.,
for frequent kinds of inequalities maybe.
For "complex assumptions" I would prefer we do outlined assumptions to
deal with the uses problem while also *gaining*
expressiveness. What I mean was described in the email [0] under design
idea 2) and came up a few times since.
In addition to complex instruction based encodings of assumptions it
allows us to deal with calls and side-effects properly.
I would use such an encoding and teach the Attributor to transfer
knowledge about the arguments from the assumption to the
outlined assumption function and back as new "known bundles". So we
normalize and specialize in the outlined function and
transfer back what we know other passes can actually digest.
~ Johannes
[0] https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
>
>>> But I realize that there are some practical consideration that make the instruction approach less appealing and I am all in favor of the more pragmatic & practical solution to start with.
>> I think bundles, and more generic assumptions, are what we need in the future. I still believe we should use them
>> to encode information in assertions [0], among other things, without running into the risk of having side-effects
>> that influence the compilation.
> Again, I am not saying we shouldn’t, just that there are some potential drawbacks in some cases.
>
> Cheers,
> Florian
We can also introduce a new field into
GetElementPtrInst/GetElementPtrConstantExpr. Since it's
variable-sized, use of llvm::TrailingObjects (llvm::User calls it
co-allocation) might be necessary.
On Mar 29, 2021, at 06:43, Clement Courbet via llvm-dev <llvm...@lists.llvm.org> wrote:On Sun, Mar 28, 2021 at 5:44 PM Johannes Doerfert via llvm-dev <llvm...@lists.llvm.org> wrote:
On 3/28/21 3:49 AM, James Courtier-Dutton wrote:
> Actual use cases can get much more complicated with for example,
> non-contiguous ranges. e.g. 0,1,4,5 ok, but 2,3,6,7 not ok.This can actually be represented by !range metadata, but only for constant values.That said, in operand bundles we can generally allow non-constant
values, e.g., `["range"(%p, i32 0, i32 %N)]`That's a good point and it essentially disqualifies !range metadata if we want to one day be able to deal with values..> This means we have to handle multiple variants across the codebase, which can lead to situations where only one or the other is handledI don't think this is really an issue, because whatever the choice of representation is, all these variants should be using `ValueTracking` to abstract the representation anyway.