[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts

569 views
Skip to first unread message

Tobias Grosser

unread,
Sep 12, 2012, 4:38:19 AM9/12/12
to Hal Finkel, llv...@cs.uiuc.edu, preston...@gmail.com, poll...@googlegroups.com
On 09/11/2012 03:38 PM, Hal Finkel wrote:
> This is a general problem that also affects, for example, Preston's
> dependence-analysis work. How have you thought about dealing with this?

Hi Hal,

you are right. Preston's dependence-analysis and Polly are sharing the same
problem. As far as I know, non of us implemented a solution yet.
Preston, what is your status on multi-dim arrays?

As stated earlier the problem is that access functions to multi-dimensional arrays
are linearized at LLVM-IR level. Unfortunately it is a lot harder to reason about
linearized accesses. Hence, we want to recover the multi-dimensional access function.

In LLVM-IR words: When we currently analyze a multi-dimensional
variable length memory access, we calculate a single SCEV expression
which describes the memory access. However, we would prefer to have for
each array dimension a separate SCEV* expression. Those individual expressions
will be a lot easier to analyze.

I think there are two major directions we can go:

1) Represent multi-dimensional access functions in LLVM-IR

Instead of linearizing multi-dimensional array accesses to single
dimensional array accesses, we extend LLVM-IR such that it can express
multi-dimensional accesses and we ensure that front-ends directly
generate LLVM-IR multi-dimensional accesses.

2) Recover multi-dimensionality from linearized accesses

We take a linearized single-dimensional SCEV* expression, apply some
magic and recover both the number and size of the original array
dimensions as well as a set of SCEV* expressions describing the individual
access functions of the original dimensions.

Both approaches have advantages/disadvantages:

Approach 1)
Pro:
- Provides exact information about multi-dimensional arrays,
if the multi-dimensionality is explicit in the source program

Contra:
- We need to explicitly express multi-dimensionality in LLVM-IR
- Front-ends need to explicitly generate the multi-dimensional IR
- Existing passes need to maintain and reason about the new IR
- Only C99 supports variable length arrays. Most existing C/C++
programs have a manual implementation of variable length
arrays, which cannot be automatically translated by the front-end.

Approach 2)
Pro:
- Does not distinguish between manually simulated multi-dimensional
arrays and multi-dimensional arrays that have been explicitly
expressed in the source code.
- Changes are limited to an LLVM-IR analysis

Contra:
- The general problem is difficult
- We may need to add some run-time checks as statically proving
the delinearization may not be entirely possible

I personally would first have a look at approach '2'. Access functions of
multi-dimensional follow often a very regular structure, which appears not
only for accesses generated by compiler front-ends, but also for
manual implementations of multi-dimensional arrays. One common pattern
is that there is a single parameter that represents the size of a
dimension [1]. Meaning SCEV* expressions describing access to an array "A[][m][p]" will contain the sizes '%p' and '%m * %p' more or less explicit.

Here some examples I committed in r163619:

multidim_only_ivs_2d.ll:
; void foo(long n, long m, double A[n][m]) {
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; A[i][j] = 1.0;
; }
; Access function: {{0,+,%m}<%for.i>,+,1}<nw><%for.j>

multidim_only_ivs_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o]) {
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; for (long k = 0; k < o; k++)
; A[i][j][k] = 1.0;
; }
; Access function: {{{0,+,(%m * %o)}<%for.i>,+,%o}<%for.j>,+,1}<nw><%for.k>

multidim_ivs_and_integer_offsets_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o]) {
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; for (long k = 0; k < o; k++)
; A[i+3][j-4][k+7] = 1.0;
; }
;
; Access function:
; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,
; (8 * %o)}<%for.j>,+,8}<%for.k>

multidim_only_ivs_3d_cast.ll:
; void foo(int n, int m, int o, double A[n][m][o]) {
; for (int i = 0; i < n; i++)
; for (int j = 0; j < m; j++)
; for (int k = 0; k < o; k++)
; A[i][j][k] = 1.0;
; }
;
; Access function:
; {{{%A,+,(8 * (zext i32 %m to i64) * (zext i32 %o to i64))}<%for.i>,+,
; (8 * (zext i32 %o to i64))}<%for.j>,+,8}<%for.k>

multidim_ivs_and_parameteric_offsets_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o], long p, long q, long r) {
;
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; for (long k = 0; k < o; k++)
; A[i+p][j+q][k+r] = 1.0;
; }
;
; Access function:
; {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+,
; (8 * %o)}<%for.j>,+,8}<%for.k>

Those examples are obviously not exhaustive, but they already cover many common
cases. Analyzing them seems to be possible. As you may realize the size of the
array dimensions are all explicitly available in the 'step' of the SCEVAddRec
repressions. Even in the last example which contains parameters both for the
sizes and the offsets, the parameters for the sizes are the only ones that
appear on the right hand sides ('8 * %m * %o' and '8 * %o').

I would guess that we could write a SCEVIterator, that analyzes the SCEVs and guesses
the dimension sizes. After having written that, it should be possible to write
another SCEVIterator that given the dimension sizes can separate the different
dimensions of a SCEV.

I doubt we will always be able to prove that our guess is correct, but we could
add a run-time check to test the conditions that we can not prove statically. This is
also the point, where I think the front-end (or the user with attributes) could help.
Accesses could be annotated with meta-data providing the size of the array dimensions,
such that our analysis can start from this meta-data. This could allow our analysis
to take some short cuts and to avoid some run-time checks.

Does this sound like a reasonable approach? Any ideas or suggestions on that topic?

Cheers
Tobi

[1] FORTRAN seems to express array size as "max(0, %dim_size)"
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Hal Finkel

unread,
Sep 12, 2012, 10:27:00 AM9/12/12
to Armin Größlinger, preston...@gmail.com, llv...@cs.uiuc.edu, poll...@googlegroups.com
On Wed, 12 Sep 2012 14:18:43 +0200
Armin Größlinger <armin.gro...@uni-passau.de> wrote:

> Hi,
>
> I had a few thoughts about recovering multi-dimensional accesses in
> Polly and maybe it's time to share them.
>
> On 09/12/2012 10:38 AM, Tobias Grosser wrote:
> > [...] Even in the last example which contains parameters both for


> > the sizes and the offsets, the parameters for the sizes are the
> > only ones that appear on the right hand sides ('8 * %m * %o' and '8
> > * %o').
>

> I guess this is the key observation. When we have a multi-dimensional
> access, the coefficients of the iterators have to form an increasing
> chain w.r.t. divisibility. For the example


>
> > multidim_ivs_and_integer_offsets_3d.ll:
> > ; void foo(long n, long m, long o, double A[n][m][o]) {
> > ; for (long i = 0; i < n; i++)
> > ; for (long j = 0; j < m; j++)
> > ; for (long k = 0; k < o; k++)
> > ; A[i+3][j-4][k+7] = 1.0;
> > ; }
>

> with access function


>
> > ; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m *
> > %o)}<%for.i>,+, ; (8 * %o)}<%for.j>,+,8}<%for.k>
>

> we have (writing the SCEV in ordinary notation and in a suggestive
> order)
>
> 8*%o*%m * i + 8*%o * j + 8 * k + 24*%m*%o - 32*%o + %A + 56
>
> and we see that the coefficients of the iterators are 8, 8*%o,
> 8*%o*%m and every coefficient in this chain divides it successor. By
> factoring these coefficients out from all the term they divide, we
> find the subscripts for the dimensions:
>
> 8*%o*%m * (i+3) + 8*%o * (j-4) + 8 * (k+7) + %A
>
> So A[i+4][j-4][k-7] is our candidate for a multi-dimensional access.
> The math tool we'd need to implement this is (multivariate)
> polynomial division or something similar

Agreed. Furthermore, I already have code that does this (I've started
working on a transformation pass which re-factors and optimizes integer
polynomial expressions, so I can repurpose code from there). It sounds
like we just need to extract integer polynomials used by GEPs
containing induction variables, and, order-by-order, factor out the
non-induction variables. That will give us the 'size' of each
dimension. As you've noted below, we need to make sure that no index
exceeds its size, and that all accesses to an array within a given loop
are consistent. Does that sound reasonable?

-Hal

> (coefficients could be
> arbitrary polynomials, e.g., %o+5*%m+7 if somebody declares something
> like double A[][o+5*m+7] or even more complex expressions).
>
> Note that two iterators can have the same coefficients (e.g., for
> A[i+j][.][.]) or one iterator can have two coefficients (e.g.,
> A[k][k][.]). Or a "coefficient" may actually be a constant (e.g.,
> A[42][.][.]) or missing in one access (e.g., A[0][.][.]) so
> determining the coefficients is non-trivial (but not too difficult I
> guess).


>
> > I would guess that we could write a SCEVIterator, that analyzes the
> > SCEVs and guesses the dimension sizes.
>

> I'm not sure if this cane be done on the SCEV directly (instead of
> polynomials) for cases that are more complex than products of
> parameters but I haven't thought about it thoroughly. OTOH, add
> recursions should be of the right structure to check divisibility
> directly; I'm not sure without more thinking...


>
> > After having written that, it should be possible to write
> > another SCEVIterator that given the dimension sizes can separate
> > the different dimensions of a SCEV.
>

> Here we have to be careful because when there are several accesses to
> the same array (base pointer) not only the coefficients (8, 8*%o,
> 8*%o*%m above) have to match but also the range of the subscripts
> must be right (globally over all arrays). For example with two
> accesses
>
> A[i+4][j-4][k-7]
> A[...][i-k][3*j]
>
> we'd need to check that
>
> max{j-4, i-k} - min{j-4, i-k} <= %m - 1
> max{k-7, 3*j} - min{k-7, 3*j} <= %o - 1
>
> so there is no (inconsistent) overrun between the dimensions (%m and
> %o are the factors "between" the dimensions). Note that this also
> ensures that the factors between the dimensions are positive, i.e.,
> no dimension is collapsed.
>
> This check can, of course, be relaxed to checking
>
> 0 <= j-4 <= %m - 1
> 0 <= k-7 <= %o - 1
>
> 0 <= i-k <= %m - 1
> 0 <= 3*j <= %o - 1
>
> (allowing for independent checks for each access) in the assumption
> that subscripts start from 0 but I'm not sure how often this will
> hold on the IR (after loop index and array base pointer shifts during
> optimization).
>
> The problem here is that the factors between the dimensions (%m and
> %o here) could be non-affine.


>
> > I doubt we will always be able to prove that our guess is correct,
> > but we could add a run-time check to test the conditions that we
> > can not prove statically. This is also the point, where I think the
> > front-end (or the user with attributes) could help. Accesses could
> > be annotated with meta-data providing the size of the array
> > dimensions, such that our analysis can start from this meta-data.
> > This could allow our analysis to take some short cuts and to avoid
> > some run-time checks.
>

> Makes sense. I think that it should not be too difficult to implement
> the above approach for affine factors between the dimensions (which
> should cover the common cases) and then we should evaluate how often
> we need run-time checks and whether we need more information from the
> front-ends because other cases occur in practice.
>
> So far my thoughts for now.
>
> Regards,
> Armin

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Preston Briggs

unread,
Sep 12, 2012, 10:51:22 AM9/12/12
to Hal Finkel, llv...@cs.uiuc.edu, poll...@googlegroups.com, Armin Größlinger
I'm travelling right now, so can't fully participate.
But I'd encourage people to read Maslov's paper

Delinearization: an Efficient Way to Break Multiloop Dependence Equations
Vadim Maslov
PLDI '92

Wolfe's book shows the same approach.
My impression was that he can do most, but not all,
of the delinearization we want. At one point, I though that
he couldn't handle coupled subscripts that had been linearized,
but I'll have to re-create my examples.

To get the best results, I thought we should change the GEP
(or have a fancy version in addition to the current one).

Preston

Renato Golin

unread,
Sep 12, 2012, 10:56:25 AM9/12/12
to Tobias Grosser, preston...@gmail.com, poll...@googlegroups.com, llv...@cs.uiuc.edu
On 12 September 2012 09:38, Tobias Grosser <tob...@grosser.es> wrote:
> I personally would first have a look at approach '2'.

While I normally argue to leave the IR as it is (since it's a compiler
IR, not a magical one), I can see some trends going on that should not
be ignored.

This is one example, where the front-end bends its knees to generate
IR that LLVM understands, so that you can revert it to what should
have been in the first place (to analyse parallelism) and convert it
back again to "correct" IR. As you mentioned, there will be cases that
the analysis won't work, ex. when receiving an array from a function,
it could look like an opaque pointer in some architectures.

Last month, in the Cambridge LLVM Social, David Chisnall asked me
about a builder that would validate procedure call standards depending
on the target, so that the front-end could use that to build the
horrible messes we end up doing in that scenario. Another idea is to
create a pass that will convert from high-level function declaration
to low-level target-dependent declaration during the validation phase,
so the front-end produces "good-looking" calls (with types as-is) and
the pass makes them target-valid.

This technique could also apply to your case. If the front-end
supports multi-dimensional access and produces code conforming to
that, your pass can analyse and vectorize. If not, no worries, you
just ignore them. Which means not all front-ends must change at the
same time to accommodate your analysis.

Later on, a validation pass would transform all multi- to
single-dimensional array access and produce IR that the back-ends can
consume.

It might not be the easiest route, but would avoid some dead-ends down
the road...


--
cheers,
--renato

http://systemcall.org/

Armin Größlinger

unread,
Sep 12, 2012, 8:18:43 AM9/12/12
to poll...@googlegroups.com, preston...@gmail.com, llv...@cs.uiuc.edu
Hi,

I had a few thoughts about recovering multi-dimensional accesses in Polly and
maybe it's time to share them.

On 09/12/2012 10:38 AM, Tobias Grosser wrote:
> [...] Even in the last example which contains parameters both for the
> sizes and the offsets, the parameters for the sizes are the only ones that
> appear on the right hand sides ('8 * %m * %o' and '8 * %o').

I guess this is the key observation. When we have a multi-dimensional access,
the coefficients of the iterators have to form an increasing chain w.r.t.
divisibility. For the example

> multidim_ivs_and_integer_offsets_3d.ll:
> ; void foo(long n, long m, long o, double A[n][m][o]) {
> ; for (long i = 0; i < n; i++)
> ; for (long j = 0; j < m; j++)
> ; for (long k = 0; k < o; k++)
> ; A[i+3][j-4][k+7] = 1.0;
> ; }

with access function

> ; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,
> ; (8 * %o)}<%for.j>,+,8}<%for.k>

we have (writing the SCEV in ordinary notation and in a suggestive order)

8*%o*%m * i + 8*%o * j + 8 * k + 24*%m*%o - 32*%o + %A + 56

and we see that the coefficients of the iterators are 8, 8*%o, 8*%o*%m and
every coefficient in this chain divides it successor. By factoring these
coefficients out from all the term they divide, we find the subscripts for the
dimensions:

8*%o*%m * (i+3) + 8*%o * (j-4) + 8 * (k+7) + %A

So A[i+4][j-4][k-7] is our candidate for a multi-dimensional access. The math
tool we'd need to implement this is (multivariate) polynomial division or
something similar (coefficients could be arbitrary polynomials, e.g., %o+5*%m+7
if somebody declares something like double A[][o+5*m+7] or even more complex
expressions).

Note that two iterators can have the same coefficients (e.g., for
A[i+j][.][.]) or one iterator can have two coefficients (e.g., A[k][k][.]). Or
a "coefficient" may actually be a constant (e.g., A[42][.][.]) or missing in
one access (e.g., A[0][.][.]) so determining the coefficients is non-trivial
(but not too difficult I guess).

> I would guess that we could write a SCEVIterator, that analyzes the SCEVs and guesses
> the dimension sizes.

I'm not sure if this cane be done on the SCEV directly (instead of polynomials)
for cases that are more complex than products of parameters but I haven't
thought about it thoroughly. OTOH, add recursions should be of the right
structure to check divisibility directly; I'm not sure without more thinking...

> After having written that, it should be possible to write
> another SCEVIterator that given the dimension sizes can separate the different
> dimensions of a SCEV.

Here we have to be careful because when there are several accesses to the same
array (base pointer) not only the coefficients (8, 8*%o, 8*%o*%m above) have to
match but also the range of the subscripts must be right (globally over all
arrays). For example with two accesses

A[i+4][j-4][k-7]
A[...][i-k][3*j]

we'd need to check that

max{j-4, i-k} - min{j-4, i-k} <= %m - 1
max{k-7, 3*j} - min{k-7, 3*j} <= %o - 1

so there is no (inconsistent) overrun between the dimensions (%m and %o are the
factors "between" the dimensions). Note that this also ensures that the factors
between the dimensions are positive, i.e., no dimension is collapsed.

This check can, of course, be relaxed to checking

0 <= j-4 <= %m - 1
0 <= k-7 <= %o - 1

0 <= i-k <= %m - 1
0 <= 3*j <= %o - 1

(allowing for independent checks for each access) in the assumption that
subscripts start from 0 but I'm not sure how often this will hold on the IR
(after loop index and array base pointer shifts during optimization).

The problem here is that the factors between the dimensions (%m and %o here)
could be non-affine.

> I doubt we will always be able to prove that our guess is correct, but we could
> add a run-time check to test the conditions that we can not prove statically. This is
> also the point, where I think the front-end (or the user with attributes) could help.
> Accesses could be annotated with meta-data providing the size of the array dimensions,
> such that our analysis can start from this meta-data. This could allow our analysis
> to take some short cuts and to avoid some run-time checks.

Makes sense. I think that it should not be too difficult to implement the above
approach for affine factors between the dimensions (which should cover the
common cases) and then we should evaluate how often we need run-time checks and
whether we need more information from the front-ends because other cases occur
in practice.

So far my thoughts for now.

Regards,
Armin

David Tweed

unread,
Sep 12, 2012, 12:56:59 PM9/12/12
to Armin Größlinger, poll...@googlegroups.com, preston...@gmail.com, llv...@cs.uiuc.edu
| Here we have to be careful because when there are several accesses to the
same
| array (base pointer) not only the coefficients (8, 8*%o, 8*%o*%m above)
have to
| match but also the range of the subscripts must be right (globally over
all
| arrays). For example with two accesses
|
| A[i+4][j-4][k-7]
| A[...][i-k][3*j]
|
| we'd need to check that
|
| max{j-4, i-k} - min{j-4, i-k} <= %m - 1
| max{k-7, 3*j} - min{k-7, 3*j} <= %o - 1

An interjection from the peanut gallery...

One thing that would be nice would be a way to indicate in LLVM instructions
that _by construction_ a given variable will be a valid index for a given
dimension of a given array. This isn't so much use in C-style languages
where arrays don't have size data canonically associated with them, but for
languages with constructs like (in the simplest case)

for i=0:dimension(A,0): acc+=A[i]

or other more complicated "index generator" expressions like generating
indices from stencils. At the moment (as I understand the LLVM IR) a
language can only specify that an entire linearised access expression is
within the linearised bounds. This would provide a way to hopefully boost
the "safe to apply" rate of LLVM optimisers like polly for languages that
are amenable to such constructs. This presumably requires a way to link
array pointers, dimensions, bounds and indexes so it's not obvious how to do
it within the LLVM context of course... (There's also the issue of exposing
transitivity: the source language may know i is a by construction index into
A, and B is the same shape as A hence i is also guaranteed to be
within-bounds index into B.)

Just a vague thought,
Regards,
Dave

Hal Finkel

unread,
Sep 12, 2012, 6:27:33 PM9/12/12
to Renato Golin, preston...@gmail.com, poll...@googlegroups.com, Tobias Grosser, llv...@cs.uiuc.edu
On Wed, 12 Sep 2012 15:56:25 +0100
Renato Golin <reng...@systemcall.org> wrote:

> On 12 September 2012 09:38, Tobias Grosser <tob...@grosser.es> wrote:
> > I personally would first have a look at approach '2'.
>
> While I normally argue to leave the IR as it is (since it's a compiler
> IR, not a magical one), I can see some trends going on that should not
> be ignored.
>
> This is one example, where the front-end bends its knees to generate
> IR that LLVM understands, so that you can revert it to what should
> have been in the first place (to analyse parallelism) and convert it
> back again to "correct" IR. As you mentioned, there will be cases that
> the analysis won't work, ex. when receiving an array from a function,
> it could look like an opaque pointer in some architectures.

I agree that this seems suboptimal: information known to the frontend
is lost, and then must be guessed by the backend. This might also be a
case where metadata is helpful.

I also think that it is important to keep in mind that this particular
case is one in which guessing is important. This is because there is a
lot of existing C/C++ code which does manual indexing of
multidimensional arrays, and we should try to support iteration-space
transformations involving those arrays as well.

>
> Last month, in the Cambridge LLVM Social, David Chisnall asked me
> about a builder that would validate procedure call standards depending
> on the target, so that the front-end could use that to build the
> horrible messes we end up doing in that scenario. Another idea is to
> create a pass that will convert from high-level function declaration
> to low-level target-dependent declaration during the validation phase,
> so the front-end produces "good-looking" calls (with types as-is) and
> the pass makes them target-valid.
>
> This technique could also apply to your case. If the front-end
> supports multi-dimensional access and produces code conforming to
> that,

Do you mean that there is some kind of canonical form that all of the
frontends will use, and that LLVM provides some kind of utility for
generating this canonical form?

-Hal

> your pass can analyse and vectorize. If not, no worries, you
> just ignore them. Which means not all front-ends must change at the
> same time to accommodate your analysis.
>
> Later on, a validation pass would transform all multi- to
> single-dimensional array access and produce IR that the back-ends can
> consume.
>
> It might not be the easiest route, but would avoid some dead-ends down
> the road...
>
>



--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Wojciech Meyer

unread,
Sep 21, 2012, 7:38:56 PM9/21/12
to llv...@cs.uiuc.edu
This all looks like a common problem of single IR, this will happen in
every compiler that uses single centralised IR. It's usually easier to
describe a problem in terms of higher level representation that is even
tailored for the domain or a specific language. Meta data in IR
workarounds many of these problems, but not all of them.

Some of the compilers do employ this strategy of gradual rewrites, but
not all of them. LLVM IR hits the ultimate sweet spot of being flexible
enough and high level enough for most cases, but this discussion shows
that we either want to grow it up to the mid end or down towards the
machine.

There is an awful lot of different classical IRs: LLVM SSA IR, CPS, C--,
various RTLs etc. The domain specific optimisations can be performed,
and then these IRs can be lowered into something else. In the end the
resulting RTL should be only as abstract to decouple from the machine,
but yet contain enough information for efficient instruction selection.

Sounds wonderful but only in theory!

two cents,

--
Wojciech Meyer
http://danmey.org
Reply all
Reply to author
Forward
0 new messages