[PATCH] D108885: [Delinerization] Keep array element byte offset.

7 views
Skip to first unread message

Michael Kruse via Phabricator

unread,
Aug 29, 2021, 12:34:46 AM8/29/21
to whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
Meinersbur created this revision.
Meinersbur added reviewers: Whitney, bmahjour, sebpop, grosser.
Herald added a reviewer: bollu.
Herald added subscribers: javed.absar, hiraditya.
Meinersbur requested review of this revision.
Herald added a project: LLVM.

The offset into the array element cannot just be discarded. In most cases it will be zero, but like any of the detected subscript any memory access must be checked whether it is contained in the byte range of the memory access (or aliases with neighboring array elements). This is because delinearization is just a heuristic, there is not guarantee that the byte offset is a constant, is non-negative, is less than the array element size, or the memory access is entirely contained within the array element (delinerization does not even know the access size).

Fix by removing the special handling of the least significant dimension in ScalarEvolution::computeAccessFunctions. The makes the returned Subscripts array one larger than the Sizes array. This actually would be expected, a subscript each size plus one subscript for the division remainder representing the outermost dimension of unknown size.

This bug caused Polly to miscompile blender (`526.blender_r` from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression incorrectly delinearized has been reduced in the test case `byte_offset.ll`. The dropped offset into the array element of size 4 (a float) is `((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>)`. This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts.

This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image.

| Bug | Reference |
| --- | --------- |
| F18719158: sh5_reduced_0234.org-bug.png <https://reviews.llvm.org/F18719158> | F18719160: sh5_reduced_0234.org-ref.png <https://reviews.llvm.org/F18719160> |


Repository:
rG LLVM Github Monorepo

https://reviews.llvm.org/D108885

Files:
llvm/include/llvm/Analysis/ScalarEvolution.h
llvm/lib/Analysis/Delinearization.cpp
llvm/lib/Analysis/DependenceAnalysis.cpp
llvm/lib/Analysis/LoopCacheAnalysis.cpp
llvm/lib/Analysis/ScalarEvolution.cpp
llvm/test/Analysis/Delinearization/a.ll
llvm/test/Analysis/Delinearization/byte_offset.ll
llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll
llvm/test/Analysis/Delinearization/divide_by_one.ll
llvm/test/Analysis/Delinearization/himeno_1.ll
llvm/test/Analysis/Delinearization/himeno_2.ll
llvm/test/Analysis/Delinearization/iv_times_constant_in_subscript.ll
llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll
llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_nts_3d.ll
llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll
llvm/test/Analysis/Delinearization/multidim_only_ivs_2d.ll
llvm/test/Analysis/Delinearization/multidim_only_ivs_3d.ll
llvm/test/Analysis/Delinearization/multidim_only_ivs_3d_cast.ll
llvm/test/Analysis/Delinearization/parameter_addrec_product.ll
polly/lib/Analysis/ScopDetection.cpp
polly/test/ScopDetect/array_elt_byte_offset.ll

D108885.369298.patch

Michael Kruse via Phabricator

unread,
Aug 29, 2021, 12:43:28 AM8/29/21
to whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
Meinersbur updated this revision to Diff 369304.
Meinersbur added a comment.

[Polly] Also require offset to be zero.


Repository:
rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108885/new/
D108885.369304.patch

Bardia Mahjour via Phabricator

unread,
Aug 31, 2021, 5:18:59 PM8/31/21
to ll...@meinersbur.de, whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
bmahjour added a comment.

> The makes the returned Subscripts array one larger than the Sizes array. This actually would be expected, a subscript each size plus one subscript for the division remainder representing the outermost dimension of unknown size.

The Size array includes the element size, so it already makes up for the lack of a value for the outermost dimension.

Instead of treating the element offset as a subscript, could we make `computeAccessFunctions` return the offset (if non-zero) in a separate output parameter?
Or maybe we should consider delinearization unsuccessful when there is a non-zero remainder after the last real subscript is already computed.



================
Comment at: llvm/include/llvm/Analysis/ScalarEvolution.h:1201
///
- /// Overall, we have: A[][n][m], and the access function: A[j+k][2i][5i].
+ /// Overall, for the address of a memory access, we have: A[][n][m][4], and
+ /// the access function: A[j+k][2i][5i][0] where [4] is the array element size
----------------
address of a memory access -> address of the above memory access


================
Comment at: llvm/lib/Analysis/DependenceAnalysis.cpp:3454
// Fail when there is only a subscript: that's a linearized access function.
- if (SrcSubscripts.size() < 2 || DstSubscripts.size() < 2 ||
+ if (SrcSubscripts.size() <= 2 || DstSubscripts.size() <= 2 ||
SrcSubscripts.size() != DstSubscripts.size())
----------------
I'm worried that having an extra subscript could confuse the analysis. Each separable subscript typically corresponds to a distinct loop level (unless the subscript is ZIV), but if we include the element offset as a "subscript" it would normally not have any corresponding loop levels.


================
Comment at: llvm/test/Analysis/Delinearization/byte_offset.ll:10
+; CHECK: ArrayRef[0][{(6 + (4 * (sext i16 %i401 to i64))<nsw>)<nsw>,+,1}<%for.body.i4567>][((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>)]
+
+%struct.S1 = type { %struct.S1*, i8*, i16, i16, i16, i16 }
----------------
could we have the psudo-c for this IR added here as a comment?

Michael Kruse via Phabricator

unread,
Sep 1, 2021, 12:29:40 AM9/1/21
to whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
Meinersbur updated this revision to Diff 369840.
Meinersbur marked 2 inline comments as done.
Meinersbur added a comment.

- Address review


Repository:
rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108885/new/

https://reviews.llvm.org/D108885

D108885.369840.patch

Michael Kruse via Phabricator

unread,
Sep 1, 2021, 12:29:45 AM9/1/21
to whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
Meinersbur added a comment.

In D108885#2975622 <https://reviews.llvm.org/D108885#2975622>, @bmahjour wrote:

> The Size array includes the element size, so it already makes up for the lack of a value for the outermost dimension.

That's the issue. It is contained in the Size array, but its subscript (which ideally would be between 0 <= subscript < EltSize), unlike for any other size, is dropped.

> Instead of treating the element offset as a subscript, could we make `computeAccessFunctions` return the offset (if non-zero) in a separate output parameter?

This would make handling of the element offset more complicated and error prone, while there is no inherent difference between a array dimension of constant size and the byte offset. As illustrated by this bug and the fact that DependenceAnalysis's subscript checking also just works for the array element offset.

> Or maybe we should consider delinearization unsuccessful when there is a non-zero remainder after the last real subscript is already computed.

Would be possibility, and was already done by computeAccessFunctions if the outermost operation was a SCEVAddRecExpr justified by being "too complicated". How can it be "too complicated" if it is just forgotten about?
It would also make make it impossible do use delinearization for array-of-structs. I would leave that decision what to support to the caller. At least for Polly I was considering supporting array-of-structs.



================
Comment at: llvm/lib/Analysis/DependenceAnalysis.cpp:3454
// Fail when there is only a subscript: that's a linearized access function.
- if (SrcSubscripts.size() < 2 || DstSubscripts.size() < 2 ||
+ if (SrcSubscripts.size() <= 2 || DstSubscripts.size() <= 2 ||
SrcSubscripts.size() != DstSubscripts.size())
----------------
bmahjour wrote:
> I'm worried that having an extra subscript could confuse the analysis. Each separable subscript typically corresponds to a distinct loop level (unless the subscript is ZIV), but if we include the element offset as a "subscript" it would normally not have any corresponding loop levels.
Changed this to pop the last element before continuing unless `isZero()` is false and then bail out, like already done for Polly and LoopCacheAnalysis. We can check later how DA had to be changed to support this. At least the DelinearizationChecks below would just work and I would assume that DA works find as long as the accessed memory falls completely into the elements byte change, which is verified by the Delinearization check.


================
Comment at: llvm/test/Analysis/Delinearization/byte_offset.ll:10
+; CHECK: ArrayRef[0][{(6 + (4 * (sext i16 %i401 to i64))<nsw>)<nsw>,+,1}<%for.body.i4567>][((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>)]
+
+%struct.S1 = type { %struct.S1*, i8*, i16, i16, i16, i16 }
----------------
bmahjour wrote:
> could we have the psudo-c for this IR added here as a comment?
Not sure how useful you find these manually-decompiled IR that come from llvm-reduce, but here it is.

You can also find the origin here: https://github.com/blender/blender/blob/765b842f9520843183bf0a3cdcd071f152bbbf9e/source/blender/blenkernel/intern/CCGSubSurf.c#L2131

Bardia Mahjour via Phabricator

unread,
Sep 1, 2021, 1:13:43 PM9/1/21
to ll...@meinersbur.de, whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
bmahjour added a comment.

In D108885#2976089 <https://reviews.llvm.org/D108885#2976089>, @Meinersbur wrote:

> In D108885#2975622 <https://reviews.llvm.org/D108885#2975622>, @bmahjour wrote:
>
>> Instead of treating the element offset as a subscript, could we make `computeAccessFunctions` return the offset (if non-zero) in a separate output parameter?
>
> This would make handling of the element offset more complicated and error prone, while there is no inherent difference between a array dimension of constant size and the byte offset. As illustrated by this bug and the fact that DependenceAnalysis's subscript checking also just works for the array element offset.

The way "element offset" is being handled in this patch is to isolate it (from the rest of the subscripts) and ignore it everywhere except for `ScopDetection.cpp` where it is still isolated but used to decide if the access is affine. If it needs to be isolated to be handled, then I don't see how returning it as a separate parameter makes it more complicated. The original goal of the delinearization algorithm is to recover source level subscripts. While the "element offset" can be thought of as a byte index into an imaginary inner-most dimension, it is not something that corresponds to a source level array subscript.

>> Or maybe we should consider delinearization unsuccessful when there is a non-zero remainder after the last real subscript is already computed.
>
> Would be possibility, and was already done by computeAccessFunctions if the outermost operation was a SCEVAddRecExpr justified by being "too complicated". How can it be "too complicated" if it is just forgotten about?
> It would also make make it impossible do use delinearization for array-of-structs. I would leave that decision what to support to the caller. At least for Polly I was considering supporting array-of-structs.

Yeah, I'm not sure why it was considered "too complicated" only when the remainder was an AddRec. The paper <https://dl.acm.org/doi/10.1145/2751205.2751248> says that the original polynomial expression (representing the linearized access function) is first divided by the element size, but they don't say what it means if there is a remainder and what to do with it. I would assume that having an access function that doesn't evenly divide the element size cannot be safely delinearized.

I tried a simple example and it seems that the "element offset" is zero regardless of which member of a structure is being accessed, so not sure if it has any bearing on the ability to delinearize arrays of structs:

> cat delin_struct.c
struct MyS {
float a, b;
};

void foo(int n, int m, struct MyS f1[][n][m]) {
for (int i = 0; i < 1024; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < m; k++)
f1[i][j][k].a = f1[i][j][k].b;
}

opt -passes='print<delinearization>' -disable-output delin_struct.simp.ll 2>&1 | grep ArrayRef
ArrayRef[{0,+,2}<%for.body>][{0,+,2}<%for.body4>][{1,+,2}<%for.body8>][0]
ArrayRef[{0,+,2}<%for.body>][{2,+,2}<%for.body4>][-1][0]
ArrayRef[0][{(-1 + (2 * (zext i32 %n to i64) * (zext i32 %m to i64))),+,(2 * (zext i32 %n to i64) * (zext i32 %m to i64))}<%for.body>][0]
ArrayRef[{0,+,2}<%for.body>][{0,+,2}<%for.body4>][{0,+,2}<%for.body8>][0]
ArrayRef[{0,+,2}<%for.body>][{2,+,2}<%for.body4>][-2][0]
ArrayRef[0][{(-2 + (2 * (zext i32 %n to i64) * (zext i32 %m to i64))),+,(2 * (zext i32 %n to i64) * (zext i32 %m to i64))}<%for.body>][0]

Note that the subscript in the last dimension (assumed byte array) is 0! IR attached: F18788796: delin_struct.simp.ll <https://reviews.llvm.org/F18788796>



================
Comment at: llvm/lib/Analysis/DependenceAnalysis.cpp:3453

+ // computeAccessFunctions also returns the by offset into the array element,
+ // which DependenceAnalysis does not handle. Pop that last subscript and bail
----------------
[nit] the by offset -> the byte offset


================
Comment at: llvm/test/Analysis/Delinearization/byte_offset.ll:10
+; CHECK: ArrayRef[0][{(6 + (4 * (sext i16 %i401 to i64))<nsw>)<nsw>,+,1}<%for.body.i4567>][((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>)]
+
+%struct.S1 = type { %struct.S1*, i8*, i16, i16, i16, i16 }
----------------
Meinersbur wrote:
> bmahjour wrote:
> > could we have the psudo-c for this IR added here as a comment?
> Not sure how useful you find these manually-decompiled IR that come from llvm-reduce, but here it is.
>
> You can also find the origin here: https://github.com/blender/blender/blob/765b842f9520843183bf0a3cdcd071f152bbbf9e/source/blender/blenkernel/intern/CCGSubSurf.c#L2131
>
I haven't looked at the original example yet, but this looks like a case where delinearization should be expected to fail. The recovered subscripts above don't look anything like what the source level subscripts are.


================
Comment at: polly/lib/Analysis/ScopDetection.cpp:1029
+ const SCEV *AccOffset = Acc->DelinearizedSubscripts.pop_back_val();
+ const SCEV *ArrSize = Shape->DelinearizedSizes.back();
+ const SCEV *AccSize = Context.ElementSize[BasePointer];
----------------
[nit] `ElementSize` may be a better name than `ArrSize`?

Michael Kruse via Phabricator

unread,
Sep 2, 2021, 2:04:59 AM9/2/21
to whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
Meinersbur added a comment.

In D108885#2977230 <https://reviews.llvm.org/D108885#2977230>, @bmahjour wrote:

> If it needs to be isolated to be handled, then I don't see how returning it as a separate parameter makes it more complicated.

It only is handled differently because the legacy code does not expect an additional parameter. However, fixing the bug a higher priority and the individual passes could be improved making proper uses of the returned subscript at a later point.

> The original goal of the delinearization algorithm is to recover source level subscripts.

[citation needed]

If this was true than any delinearization that does not correspond to the original source subscript needed to be considered wrong. However, I am pretty sure we also want to delinearize `A[i +j*n]` even though the source code subscript has been linearized.

> I tried a simple example and it seems that the "element offset" is zero regardless of which member of a structure is being accessed, so not sure if it has any bearing on the ability to delinearize arrays of structs:

It is delinearized as array element `sizeof(float)`, not `sizeof(MyS)` with one dimension twice as large and SCEVAddRecExpr starting with 1 instead of 0. This has to do with how ScalarEvolution tries to remove the struct index from the GEP, modeling it as a simple addition.

Use different elements of different sizes in the struct:

struct __attribute__((packed)) Pair { int x; long y; };
void foo(unsigned long N, struct Pair A[][N]) {
for (unsigned long i = 0; i < N; i+=1)
A[i][i].y = 0;
}



name=opt -delinearize
AccessFunction: {4,+,(12 + (12 * %N))}<%for.cond>
Base offset: %A
ArrayDecl[UnknownSize][%N][8]
ArrayRef[0][{0,+,1}<%for.cond>][{4,+,(4 + (12 * %N))}<%for.cond>]

This is obviously more complicated than it needs to be. Specifically, it still considers element size to be `sizeof(long)`, because that's the access it sees. If accessing `x` instead of `y` the element size would be 4. Polly tries to combine the shapes from multiple delinearization results into a common one such that subscripts are comparable.

Yes, with the current implementation of delinearization, I don't think there are a lot of cases where the byte offset subscript would improve the dependence analysis. Still, I think the right API that allows improving the delinerization includes the byte offset.

Having derived this example, it is much better understandable than the regression test I got from llvm-reduce, I could replace it.

Bardia Mahjour via Phabricator

unread,
Sep 8, 2021, 2:20:47 PM9/8/21
to ll...@meinersbur.de, whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
bmahjour added a comment.
Thanks for the example. It does make it much easier to understand the problem. I also have concerns about the current implementation not being able to handle arrays of structures. To understand this better, I derived an example based on your example above:

struct __attribute__((packed)) MyS {
float a;
double b;
};

void foo(long long n, long long m, struct MyS f1[][n][m]) {
for (int i = 0; i < 1024; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < m; k++)
f1[i][j][k].b = f1[i-1][j][k].b;
}

Ideally we would want delinearization to recover the subscripts for the load to be `[i-1][j][k]` and the subscripts for the store to be `[i][j][k]`, so that the dependence analysis can produce `[-1, 0, 0]`. With the changes in this patch we get the following delinearization for the load and stores respectively:

Inst: %4 = load double, double* %b, align 1, !tbaa !3
In Loop with Header: for.body11
AccessFunction: {{{(4 + (-12 * %n * %m)),+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,12}<%for.body11>
Base offset: %f1
ArrayDecl[UnknownSize][%n][%m][8]
ArrayRef[0][0][{0,+,1}<nuw><nsw><%for.body11>][{{{(4 + (-12 * %n * %m)),+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,4}<%for.body11>]



Inst: store double %4, double* %b22, align 1, !tbaa !3
In Loop with Header: for.body11
AccessFunction: {{{4,+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,12}<%for.body11>
Base offset: %f1
ArrayDecl[UnknownSize][%n][%m][8]
ArrayRef[0][0][{0,+,1}<nuw><nsw><%for.body11>][{{{4,+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,4}<%for.body11>]

Even though delinearization claims to be successful most of the complexity of the original access function has now moved to the inner most dimension, leaving the outer subscripts as zeros. This won't help dependence analysis in any way, as it now has to further analyze the complicated subscript for the synthesized inner most dimension. It's not clear to me how that can be done.

Note that if the element size passed to the delinearize function was chosen to be 12 (the true element size of the array), then delinearization would have been able to recover more meaningful subscripts for the outer dimensions without requiring a "byte offset". I wonder if we can improve the results for structure of arrays by choosing a better element size.

Bardia Mahjour via Phabricator

unread,
Sep 8, 2021, 4:08:52 PM9/8/21
to ll...@meinersbur.de, whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
bmahjour added a comment.

> Note that if the element size passed to the delinearize function was chosen to be 12 (the true element size of the array), then delinearization would have been able to recover more meaningful subscripts for the outer dimensions without requiring a "byte offset". I wonder if we can improve the results for structure of arrays by choosing a better element size.

Correction: the access functions in my example do not divide 12, so choosing the "true element size" doesn't fix it :(

...but since delinearization is a heuristic, maybe we can heuristically pick something that does divide it evenly (eg a GCD or just choosing 1). If we choose 1, DA is able to produce expected result (with `-da-disable-delinearization-checks`):

ArrayDecl[UnknownSize][%n][%m][1]
ArrayRef[{-12,+,12}<%for.body>][{0,+,12}<%for.body5>][{4,+,12}<%for.body11>][0]
...
ArrayRef[{0,+,12}<%for.body>][{0,+,12}<%for.body5>][{4,+,12}<%for.body11>][0]



Src: %4 = load double, double* %b, align 1, !tbaa !3 --> Dst: store double %4, double* %b22, align 1, !tbaa !3
da analyze - consistent anti [-1 0 0]!

Michael Kruse via Phabricator

unread,
Sep 9, 2021, 1:53:49 PM9/9/21
to whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
Meinersbur added a comment.

> I also have concerns about the current implementation not being able to handle arrays of structures.

D109527 <https://reviews.llvm.org/D109527>

> maybe we can heuristically pick something that does divide it evenly (eg a GCD or just choosing 1). If we choose 1, DA is able to produce expected result (with -da-disable-delinearization-checks)

A dimension of size 1 can be omitted since its only valid with a subscript of 0 and multiplies by 1. What makes delinearization useful is that allowing to assume that memory accesses only alias if all subscripts are equal (and the arrays themselves dot overlap)[*]. For this to work the subscripts must be between 0 and the dimension size. `-da-disable-delinearization-checks` skips this check and therefore may cause miscompilation.

Bardia Mahjour via Phabricator

unread,
Sep 9, 2021, 6:27:49 PM9/9/21
to ll...@meinersbur.de, whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
bmahjour added a comment.

In D108885#2992237 <https://reviews.llvm.org/D108885#2992237>, @Meinersbur wrote:

>> I also have concerns about the current implementation not being able to handle arrays of structures.
>
> D109527 <https://reviews.llvm.org/D109527>

Thanks for that. Looks like using the "true element size" can significantly simplify the remainder expression for the added byte dimension. We may be converging on a solution here.

>> maybe we can heuristically pick something that does divide it evenly (eg a GCD or just choosing 1). If we choose 1, DA is able to produce expected result (with -da-disable-delinearization-checks)
>
> A dimension of size 1 can be omitted since its only valid with a subscript of 0 and multiplies by 1. What makes delinearization useful is that allowing to assume that memory accesses only alias if all subscripts are equal (and the arrays themselves dot overlap)[*]. For this to work the subscripts must be between 0 and the dimension size. `-da-disable-delinearization-checks` skips this check and therefore may cause miscompilation.

Rather than viewing it as an extra dimension I simply view it as the size of the structure member. From the point of view of dependence analysis (after aliasing properties are computed) the dependence between `A[i][j][k].b` and `A[i-1][j][k].b` should yield the same results regardless of the size of `b` (be it 1-byte or otherwise). I used the `-da-disable-delinearization-checks` to illustrate the idea, but it's possible that using this scheme requires different validity checks.

Bardia Mahjour via Phabricator

unread,
Sep 9, 2021, 6:45:59 PM9/9/21
to ll...@meinersbur.de, whitney....@gmail.com, bmah...@ca.ibm.com, seb...@gmail.com, phabr...@grosser.es, siddu...@gmail.com, poll...@googlegroups.com, jav...@graphcore.ai, llvm-c...@lists.llvm.org, bhuvanend...@amd.com, yanli...@intel.com, doug...@gmail.com, ju...@samsung.com, floria...@apple.com, p8u8i7l5...@ibm-systems-z.slack.com, ruilin...@amd.com, artem.rad...@intel.com
bmahjour added inline comments.


================
Comment at: llvm/lib/Analysis/DependenceAnalysis.cpp:3458-3467
+ if (!SrcSubscripts.empty()) {
+ const SCEV *EltOffset = SrcSubscripts.pop_back_val();
+ if (!EltOffset->isZero())
+ return false;
+ }
+ if (!DstSubscripts.empty()) {
+ const SCEV *EltOffset = DstSubscripts.pop_back_val();
----------------
I think we should care more about the difference between the byte offset of the source and destination, than its actual value. If the byte offsets are equal, then the rest of the subscripts should be analyzed. If they are different, then we don't know how to handle them yet.

suggestion: replace it with the following and move it after checking that the subscript arrays are equal in size and contain at least 2 elements each (line 3457 of the base).

```
const SCEV *EltOffsetSrc = SrcSubscripts.pop_back_val();
const SCEV *EltOffsetDst = DstSubscripts.pop_back_val();
if (EltOffsetSrc != EltOffsetDst)
return false;
```
Reply all
Reply to author
Forward
0 new messages