AST generation: The simplifying will erase for loops with range [0,1).

80 views
Skip to first unread message

李佳琦

unread,
Sep 27, 2022, 3:28:04 AM9/27/22
to isl Development
I had a problem about ast generation: The simplifying will erase for loops with range [0,1).
And actually I don't want it to be simplified.

For example:
Forloops like  `for (int i = 0; i <= 0; i += 1)` will be erased

Eliminating the for loop causes problems because we need to preserve the structure of the completed for loop.
I wonder if there is a way to tell ISL not to simplify forloops in the range[0,1] scenario.

Sven Verdoolaege

unread,
Sep 27, 2022, 4:50:37 PM9/27/22
to 李佳琦, isl Development
On Tue, Sep 27, 2022 at 12:28:04AM -0700, 李佳琦 wrote:
> I had a problem about ast generation: The simplifying will erase for loops
> with range [0,1).
> And actually I don't want it to be simplified.
>
> For example:
> Forloops like `for (int i = 0; i <= 0; i += 1)` will be erased
>
> Eliminating the for loop causes problems because we need to preserve the
> structure of the completed for loop.

If you want to preserve the structure of the loop
then why are you doing AST generation?

You'll have to explain in a bit more detail what
it is you are trying to achieve,
before I can try and offer some suggestions.

> I wonder if there is a way to tell ISL not to simplify forloops in the
> range[0,1] scenario.

If you are so sensitive to the structure of the loops,
then maybe you shouldn't be doing any AST generation,
at least not using isl.
The schedule only prescribes a relative order and
isl exploits this in various ways.

skimo

Steve Luo

unread,
Dec 22, 2022, 11:38:16 AM12/22/22
to isl Development
I have two similar problems, and hope you can give me some hints. I'm using isl to generate code for gpu(from schedule tree to ast using isl_ast_build_node_from_schedule_map), and take matrix multiply as an example, the original inputs looks like below

for (int i = 0; i < 128; i++)
    for (int j = 0; j < 32; j++)
        for (int k = 0; k < 32; k++)
            C[i][j] += A[i][k] * B[j][k]

I will first do tiling: split i with 64 to (i0, i1), j with 32 to (j0, j1) and k with 32 to (k0, k1). And it will look like

for (int i0 = 0; i0 < 128/64; i0++)
    for (int j0 = 0; j0 < 32/32; j0++)
        for (int k0 = 0; k0 < 32/32; k0++)
            for (int i1 = 0; i1 < 64; i1++)
                for (int j1 = 0; j1 < 32; j1++)
                    for (int k1 = 0; k1 < 32; k1++)
                        C[i][j] += A[i][k] * B[j][k]

And bind the outer loop (i0, j0) to gpu's thread block (blockIdx.y, blockIdx.x). 

Since j0's extent is 1, the output ast will not contain blockIdx.x, and since k0's extent is 1, it will be erased directly. I'm wondering if there exists any options to keep these two things in the generated ast? if not exist, what's the simplest way to adapt isl to support it.

Thanks for your reply

Sven Verdoolaege

unread,
Dec 22, 2022, 1:36:40 PM12/22/22
to Steve Luo, isl Development
On Thu, Dec 22, 2022 at 08:16:38AM -0800, Steve Luo wrote:
> I have two similar problems, and hope you can give me some hints. I'm using
> isl to generate code for gpu(from schedule tree to ast using
> isl_ast_build_node_from_schedule_map), and take matrix multiply as an
> example, the original inputs looks like below
>
> for (int i = 0; i < 128; i++)
> for (int j = 0; j < 32; j++)
> for (int k = 0; k < 32; k++)
> C[i][j] += A[i][k] * B[j][k]
>
> I will first do tiling: split i with 64 to (i0, i1), j with 32 to (j0, j1)
> and k with 32 to (k0, k1). And it will look like
>
> for (int i0 = 0; i0 < 128/64; i0++)
> for (int j0 = 0; j0 < 32/32; j0++)
> for (int k0 = 0; k0 < 32/32; k0++)
> for (int i1 = 0; i1 < 64; i1++)
> for (int j1 = 0; j1 < 32; j1++)
> for (int k1 = 0; k1 < 32; k1++)
> C[i][j] += A[i][k] * B[j][k]
>
> And bind the outer loop (i0, j0) to gpu's thread block (blockIdx.y,
> blockIdx.x).

You should bind to block ids _before_ generating an AST.

> Since j0's extent is 1, the output ast will not contain blockIdx.x, and
> since k0's extent is 1, it will be erased directly. I'm wondering if there
> exists any options to keep these two things in the generated ast? if not
> exist, what's the simplest way to adapt isl to support it.

It's probably best if you explain why you need the k0-loop
to be preserved and then maybe I can help you solve
that problem instead.

skimo

Steve Luo

unread,
Dec 22, 2022, 10:22:21 PM12/22/22
to isl Development
Sorry, maybe I do not clarify my process clearly. Surely tiling and binding are all processed in schedule tree before ast generation. And the for loops above are just the corresponding ast of the schedule tree (that I want) after each operation. I need blockIdx.x exist even though it's gridDim.x = 1, because I want to reuse the generated code for other workloads like

for (int i = 0; i < 128; i++)
    for (int j = 0; j < 64; j++)
        for (int k = 0; k < 32; k++)
            C[i][j] += A[i][k] * B[j][k]

here j changes from 32 to 64, but if it uses the same tiling size, it will share the generated code same with (128, 32, 32)'s, and all I need is to configure a different launch parameters(change gridDim.x) without compiling the workload again. And to keep k0-loop is for the same reason. If I can preserve the k0-loop even though it's extent is 1, then I can adjust the k0-loop's extent to K/32(outside of isl) as below

for (int i0 = 0; i0 < 128/64; i0++)
    for (int j0 = 0; j0 < 32/32; j0++)
        for (int k0 = 0; k0 < K/32; k0++)
            for (int i1 = 0; i1 < 64; i1++)
                for (int j1 = 0; j1 < 32; j1++)
                    for (int k1 = 0; k1 < 32; k1++)
                        C[i][j] += A[i][k] * B[j][k]

then this generated code can be shared with (128, 32, 32*i)

Sven Verdoolaege

unread,
Dec 23, 2022, 5:46:21 AM12/23/22
to Steve Luo, isl Development
On Thu, Dec 22, 2022 at 07:22:20PM -0800, Steve Luo wrote:
> Sorry, maybe I do not clarify my process clearly. Surely tiling and binding
> are all processed in schedule tree before ast generation.

If you had done the binding before the AST generation,
the AST would look something like this:

if (BX == 0 && BY >= 0 && BY <= 1)
for (int c3 = 0; c3 <= 63; c3 += 1)
for (int c4 = 0; c4 <= 31; c4 += 1)
for (int c5 = 0; c5 <= 31; c5 += 1)
S(64 * BY + c3, c4, c5);

and it would already have blockIdx.x (BX) exactly where it is needed
(I'm assuming here that you want to map the outer dimensions to
blockIdx.y and blockIdx.x in that order).

> And the for loops
> above are just the corresponding ast of the schedule tree (that I want)
> after each operation. I need blockIdx.x exist even though it's gridDim.x =
> 1, because I want to reuse the generated code for other workloads like
>
> for (int i = 0; i < 128; i++)
> for (int j = 0; j < 64; j++)
> for (int k = 0; k < 32; k++)
> C[i][j] += A[i][k] * B[j][k]

Then you should generate an AST for the generic case

for (int i = 0; i < 128; i++)
for (int j = 0; j < J; j++)
for (int k = 0; k < 32; k++)
C[i][j] += A[i][k] * B[j][k]

and you get something like

if (BX >= 0 && BX <= 31 && BY >= 0 && BY <= 1)
for (int c1 = BX; c1 <= floord(J - 1, 32); c1 += 32)
for (int c3 = 0; c3 <= 63; c3 += 1)
for (int c4 = 0; c4 <= min(31, J - 32 * c1 - 1); c4 += 1)
for (int c5 = 0; c5 <= 31; c5 += 1)
S(64 * BY + c3, 32 * c1 + c4, c5);

and then you can plug in specific values of J afterwards.

> here j changes from 32 to 64, but if it uses the same tiling size, it will
> share the generated code same with (128, 32, 32)'s, and all I need is to
> configure a different launch parameters(change gridDim.x) without compiling
> the workload again. And to keep k0-loop is for the same reason. If I can
> preserve the k0-loop even though it's extent is 1, then I can adjust the
> k0-loop's extent to K/32(*outside of isl*) as below

You fundamentally cannot first generate code for one specific value of J and
then hope to be able to generalize this to other values of J.
It may look like it could be done in very specific cases,
but you'll run into all sorts of issues very quickly. You already have.
isl will exploit the specific value of J in ways that you may not have imagined.
isl doen't even know about the concept of any J parameter since you've
only told it about a specific value (32).

skimo
Reply all
Reply to author
Forward
0 new messages