Generating "backwar" loops

87 views
Skip to first unread message

Emil Vatai

unread,
Sep 8, 2022, 11:42:14 PM9/8/22
to isl Development
If I use PET to read a function with a loop like this:
```
  for (int i = n - 1; i >= 0; i--) {
    a[i] = data[i] * data[i];
  }
```
and then use `isl_ast_build_node_from_schedule()` and `isl_ast_node_to_C_str(ast)` I get output like this:

```
for (int c0 = -n + 1; c0 <= 0; c0 += 1)
  S_1(-c0); 
```
that is, the loop is still going "forward" (and the negative iterator is used in S_1). How can I change this? I.e. can ISL generate a loop like the first one, with `i--` (or equivalent `i -=1`)?

Additionally, another question I'm looking into is how to actually generate compilable code instead of `S_1` like statements.

Best,
Emil

Sven Verdoolaege

unread,
Sep 14, 2022, 4:33:01 PM9/14/22
to Emil Vatai, isl Development
On Thu, Sep 08, 2022 at 08:42:14PM -0700, Emil Vatai wrote:
> If I use PET to read a function with a loop like this:
> ```
> for (int i = n - 1; i >= 0; i--) {
> a[i] = data[i] * data[i];
> }
> ```
> and then use `isl_ast_build_node_from_schedule()` and
> `isl_ast_node_to_C_str(ast)` I get output like this:
>
> ```
> for (int c0 = -n + 1; c0 <= 0; c0 += 1)
> S_1(-c0);
> ```
> that is, the loop is still going "forward" (and the negative iterator is
> used in S_1). How can I change this? I.e. can ISL generate a loop like the
> first one, with `i--` (or equivalent `i -=1`)?

I don't think so.

> Additionally, another question I'm looking into is how to actually generate
> compilable code instead of `S_1` like statements.

If you're using isl_ast_node_print,
then you can use isl_ast_print_options_set_print_user to set a function
to print the actual statement.
If you want to print a statement parsed by pet, then it's a bit involved.
See attachment for an example.

skimo
test.c

Emil Vatai

unread,
Sep 14, 2022, 7:25:56 PM9/14/22
to isl Development
Thanks!


> I don't think so.

Would this (i.e. the impossibility reverse `i--` loop generation) be due to the part of the definition of polyhedral model about visiting each element in *increasing* lexicographic order? (this is more or less a "shower thought", i.e. no need to answer:)

Sven Verdoolaege

unread,
Sep 17, 2022, 12:21:48 PM9/17/22
to Emil Vatai, isl Development
The schedule just determines the relative order of
the statement instances.
The AST generator is free to generate the code in any form
that respects that relative order.
So, for example, the AST generator could iterate down over
the opposite of the schedule dimension.
It's just that the isl AST generator never does this.

skimo

Sven Verdoolaege

unread,
Jan 2, 2023, 4:13:21 PM1/2/23
to Emil Vatai, isl Development
On Wed, Sep 14, 2022 at 10:32:59PM +0200, Sven Verdoolaege wrote:
> If you want to print a statement parsed by pet, then it's a bit involved.
> See attachment for an example.

I put a cleaned-up and documented version of the example in the git repo.

skimo

Emil Vatai

unread,
Aug 2, 2023, 11:08:48 PM8/2/23
to sven.ver...@gmail.com, isl Development
Hola,

Sorry for necrobumping this. 

I'm trying to use the pet_loopback (that the cleaned-up and documented version, right?) program (with --autodetect!!!) on an input file (see below), and it basically breaks after printing a few lines. The input files in intentionally made "hard", i.e. I wanted to see how well PET autodetect works, and how I can handle multiple scops in one function. I guess my questions are: Is this expected? Am I missing something/doing something wrong? what is the proper way to handle this (if there is any)?

Best,
Emil

The input file:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

void f(double *data, size_t M, size_t N) {
  // first scop: horizontal stencil
  for (size_t i = 0; i < M; i++) {
    for (size_t j = 1; j < N - 1; j++) {
      size_t idx = i * M + j;
      data[idx] = (data[idx - 1] + data[idx] + data[idx + 1]) / 3;
    }
  }

  // non-scop part;
  int r = min(rand(), M * N);
  for (size_t t = 1; t < r; t++) {
    // int r = rand();
    if (r == 0) {
      data[t] = 0;
    } else if (r % 2 == 0) {
      data[t] += -data[t];
    } else {
      data[t] += 2 * data[t];
    }
  }

  // second scop: vertical stencil
  for (size_t i = 1; i < M - 1; i++) {
    for (size_t j = 0; j < N; j++) {
      size_t idx = i * M + j;
      data[idx] = (data[idx - M] + data[idx] + data[idx + M]) / 3;
    }
  }
}

void g(double *data, size_t M, size_t N) {
  // third stencil
  for (size_t i = 1; i < M - 1; i++) {
    for (size_t j = 0; j < N; j++) {
      size_t idx = i * M + j;
      data[idx] = (data[idx - M] + data[idx - 1] + data[idx] + data[idx + 1] +
                   data[idx + M]) /
                  5;
    }
  }
}

void init(double *data, size_t M, size_t N) {
  for (size_t i = 0; i < M * N; i++) {
    data[i] = rand();
  }
}

void print_matrix(double *data, size_t M, size_t N) {

  for (size_t i = 0; i < M; i++) {
    for (size_t j = 1; j < N - 1; j++) {
      size_t idx = i * M + j;
      printf("%15.2lf ", data[idx]);
    }
    printf("\n");
  }
}

int main(int argc, char *argv[]) {
  size_t M = 10, N = 10;
  double *data = malloc(M * N * sizeof(double));

  init(data, M, N);
  print_matrix(data, M, N);
  f(data, M, N);
  g(data, M, N);
  printf("====\n");
  print_matrix(data, M, N);
  free(data);
  printf("Done\n");
  return 0;
}
--
Emil Vatai

Sven Verdoolaege

unread,
Aug 3, 2023, 2:31:29 PM8/3/23
to Emil Vatai, isl Development
On Thu, Aug 03, 2023 at 12:08:36PM +0900, Emil Vatai wrote:
> Hola,
>
> Sorry for necrobumping this.
>
> I'm trying to use the pet_loopback (that the cleaned-up and documented
> version, right?) program (with --autodetect!!!) on an input file (see
> below), and it basically breaks after printing a few lines. The input files
> in intentionally made "hard", i.e. I wanted to see how well PET autodetect
> works, and how I can handle multiple scops in one function. I guess my
> questions are: Is this expected?

No. I guess I overlooked some case(s).
I'll try and have a look in the next couple of days.

> Am I missing something/doing something
> wrong? what is the proper way to handle this (if there is any)?

Well...

> void f(double *data, size_t M, size_t N) {

... you're not going to get very good results
if you linearize the array you're working on.
You should pass it as

void f(size_t M, size_t N, double data[M][N]) {

instead.

> // first scop: horizontal stencil
> for (size_t i = 0; i < M; i++) {
> for (size_t j = 1; j < N - 1; j++) {
> size_t idx = i * M + j;

Actually, even if you linearize your accesses,
this should probably be i * N + j.

skimo

Sven Verdoolaege

unread,
Aug 6, 2023, 4:52:45 AM8/6/23
to Emil Vatai, isl Development
On Thu, Aug 03, 2023 at 08:31:27PM +0200, Sven Verdoolaege wrote:
> On Thu, Aug 03, 2023 at 12:08:36PM +0900, Emil Vatai wrote:
> > Hola,
> >
> > Sorry for necrobumping this.
> >
> > I'm trying to use the pet_loopback (that the cleaned-up and documented
> > version, right?) program (with --autodetect!!!) on an input file (see
> > below), and it basically breaks after printing a few lines. The input files
> > in intentionally made "hard", i.e. I wanted to see how well PET autodetect
> > works, and how I can handle multiple scops in one function. I guess my
> > questions are: Is this expected?
>
> No. I guess I overlooked some case(s).
> I'll try and have a look in the next couple of days.

The error should be fixed now, but as explained before,
you probably won't be able to do anything useful
with linearized arrays.

skimo

Emil Vatai

unread,
Aug 6, 2023, 9:31:04 PM8/6/23
to isl Development
Huge thanks for looking into this! It works!

But now I found (perhaps?) another bug when I add this one line to the `transform` function in pet_loopback.c (detailed diff at the below).
```
p = isl_printer_print_schedule_constraints(p, schedule);
```
I get a segfault with the following error message preceding it:
```
isl_output.c:1710: invalid output format for isl_union_set
```
The input file is same as in my previous email, with the fixed index `idx = i * N + j` (correct) instead of `idx = i * M + j` (wrong). File added to the end of the email for completeness.


My question is similar to the one before, is this expected/unexpected? Am I the one doing something wrong?

Best,
E

git diff pet_loopback.c
diff --git a/pet_loopback.c b/pet_loopback.c
index fe22129..de89f2b 100644
--- a/pet_loopback.c
+++ b/pet_loopback.c
@@ -36,6 +36,7 @@
 #include <isl/id_to_id.h>
 #include <isl/options.h>
 #include <isl/printer.h>
+#include <isl/schedule.h>
 #include <isl/val.h>
 
 #include <pet.h>
@@ -442,6 +443,7 @@ static __isl_give isl_printer *transform(__isl_take isl_printer *p,
        p = print_macros(p, node);
        p = isl_ast_node_print(node, p, print_options);
        p = print_end_declarations(p, indent);
+        p = isl_printer_print_schedule_constraints(p, schedule);
        isl_ast_node_free(node);
        isl_ast_build_free(build);
        isl_id_to_id_free(id2stmt);

// input.c ///////////////////////////////////////////////////////////////////////////////////////////////
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

void f(size_t M, size_t N, double data[M][N]) {
  // first scop: horizontal stencil
  for (size_t i = 0; i < M; i++) {
    for (size_t j = 1; j < N - 1; j++) {
      data[i][j] = (data[i - 1][j] + data[i][j] + data[i + 1][j]) / 3;
    }
  }

  // non-scop part;
  int r = rand() % (N * M);

  for (size_t t = 1; t < r; t++) {
    // int r = rand();
    size_t i = r / N, j = r % N;
    if (r == 0) {
      data[i][j] = 0;

    } else if (r % 2 == 0) {
      data[i][j] -= 0.5 * data[i][j];
    } else {
      data[i][j] -= 1.5 * data[i][j];

    }
  }

  // second scop: vertical stencil
  for (size_t i = 1; i < M - 1; i++) {
    for (size_t j = 0; j < N; j++) {
      data[i][j] = (data[i][j - 1] + data[i][j] + data[i][j + 1]) / 3;
    }
  }
}

void g(size_t M, size_t N, double data[M][N]) {
  // third stencil
  for (size_t i = 1; i < M - 1; i++) {

    for (size_t j = 1; j < N - 1; j++) {
      data[i][j] = (data[i][j - 1] + data[i - 1][j] + data[i][j] +
                    data[i + 1][j] + data[i][j + 1]) /
                   5;
    }
  }
}

void init(size_t M, size_t N, double data[M][N]) {

  for (size_t i = 0; i < M; i++) {
    for (size_t j = 0; j < N; j++) {
      data[i][j] = rand();
    }
  }
}

void print_matrix(size_t M, size_t N, double data[M][N]) {


  for (size_t i = 0; i < M; i++) {
    for (size_t j = 1; j < N - 1; j++) {
      size_t idx = i * N + j;
      printf("%15.2lf ", data[i][j]);

    }
    printf("\n");
  }
}

int main(int argc, char *argv[]) {
  size_t M = 10, N = 10;
  double data[M][N];

  init(M, N, data);
  print_matrix(M, N, data);
  f(M, N, data);
  g(M, N, data);
  printf("====\n");
  print_matrix(M, N, data);

  printf("Done\n");
  return 0;
}

Emil Vatai

unread,
Aug 6, 2023, 9:37:41 PM8/6/23
to isl Development
Sorry, I relied too much on auto/tab-completion and by accident used the wrong function! But after correcting my mistake I still get an error:

The error:
isl_output.c:1710: invalid output format for isl_union_set

The added line:
p = isl_printer_print_schedule_node(p, isl_schedule_get_root(schedule));

Diff:
git diff pet_loopback.c
diff --git a/pet_loopback.c b/pet_loopback.c
index fe22129..42ec8d3 100644
--- a/pet_loopback.c
+++ b/pet_loopback.c
@@ -36,6 +36,8 @@

 #include <isl/id_to_id.h>
 #include <isl/options.h>
 #include <isl/printer.h>
+#include <isl/schedule.h>
+#include <isl/schedule_node.h>
 #include <isl/val.h>
 
 #include <pet.h>
@@ -442,6 +444,7 @@ static __isl_give isl_printer *transform(__isl_take isl_printer *p,

        p = print_macros(p, node);
        p = isl_ast_node_print(node, p, print_options);
        p = print_end_declarations(p, indent);
+        p = isl_printer_print_schedule_node(p, isl_schedule_get_root(schedule));
        isl_ast_node_free(node);
        isl_ast_build_free(build);
        isl_id_to_id_free(id2stmt);

Emil Vatai

unread,
Aug 6, 2023, 9:41:00 PM8/6/23
to isl Development
Ok. It's too early in the morning I guess - I keep on piling small mistakes.

I see now that the input is not the one with flat arrays and corrected indices, but the 2d array version. But I guess this is less relevant. The input.c I posted in 2 messages back is exactly what I'm using...

Sorry for the mini spam torrent. :)

Best,
E

Sven Verdoolaege

unread,
Aug 7, 2023, 12:49:05 PM8/7/23
to Emil Vatai, isl Development
On Sun, Aug 06, 2023 at 06:31:04PM -0700, Emil Vatai wrote:
> git diff pet_loopback.c
> diff --git a/pet_loopback.c b/pet_loopback.c
> index fe22129..de89f2b 100644
> --- a/pet_loopback.c
> +++ b/pet_loopback.c
> @@ -36,6 +36,7 @@
> #include <isl/id_to_id.h>
> #include <isl/options.h>
> #include <isl/printer.h>
> +#include <isl/schedule.h>
> #include <isl/val.h>
>
> #include <pet.h>
> @@ -442,6 +443,7 @@ static __isl_give isl_printer *transform(__isl_take
> isl_printer *p,
> p = print_macros(p, node);
> p = isl_ast_node_print(node, p, print_options);
> p = print_end_declarations(p, indent);
> + p = isl_printer_print_schedule_constraints(p, schedule);

You're not just calling the wrong function,
you're also calling on a freed data structure.

skimo

Emil Vatai

unread,
Aug 7, 2023, 9:12:49 PM8/7/23
to sven.ver...@gmail.com, isl Development
I corrected the function (I hope), and moved it immediately below the where the schedule is obtained (also copied the schedule to be on the safe side -- see below), but it still fails.

Best,
Emil


diff --git a/pet_loopback.c b/pet_loopback.c
index fe22129..1a61cfe 100644
--- a/pet_loopback.c
+++ b/pet_loopback.c
@@ -36,6 +36,8 @@

 #include <isl/id_to_id.h>
 #include <isl/options.h>
 #include <isl/printer.h>
+#include <isl/schedule.h>
+#include <isl/schedule_node.h>
 #include <isl/val.h>
 
 #include <pet.h>
@@ -431,6 +433,7 @@ static __isl_give isl_printer *transform(__isl_take isl_printer *p,
 
        ctx = isl_printer_get_ctx(p);
        schedule = isl_schedule_copy(scop->schedule);
+        p = isl_printer_print_schedule_node(p, isl_schedule_get_root(isl_schedule_copy(schedule)));
        id2stmt = set_up_id2stmt(scop);
        build = isl_ast_build_alloc(ctx);
        build = isl_ast_build_set_at_each_domain(build, &at_domain, id2stmt);

--
Emil Vatai

Emil Vatai

unread,
Aug 7, 2023, 9:34:20 PM8/7/23
to sven.ver...@gmail.com, isl Development
Ok! It is working!

I learned something new. The `output_format` is a property of the printer, and not all objects support all formats (e.g. `isl_union_map` only supports ISL_FORMAT_ISL, and ISL_FORMAT_LATEX, while `isl_multi_union_pw_aff` only supports ISL_FORMAT_ISL). These can be set with `isl_printer_set_output_format()` (and inspected with `isl_printer_get_output_format()`). So the code below works!

Possible suggestion for improvement: instead of error messages like "invalid output format for isl_union_set", be more explicit that the "output format" is the printer's property (e.g. "the printer's output format is invalid for isl_union_set")

Best,
E

diff --git a/pet_loopback.c b/pet_loopback.c
index fe22129..f6f2f15 100644

--- a/pet_loopback.c
+++ b/pet_loopback.c
@@ -36,6 +36,8 @@
 #include <isl/id_to_id.h>
 #include <isl/options.h>
 #include <isl/printer.h>
+#include <isl/schedule.h>
+#include <isl/schedule_node.h>
 #include <isl/val.h>
 
 #include <pet.h>
@@ -431,6 +433,11 @@ static __isl_give isl_printer *transform(__isl_take isl_printer *p,

 
        ctx = isl_printer_get_ctx(p);
        schedule = isl_schedule_copy(scop->schedule);
+        int old_isl_format = isl_printer_get_output_format(p);
+        printf("// %d\n", old_isl_format == ISL_FORMAT_C);
+        p = isl_printer_set_output_format(p, ISL_FORMAT_ISL);

+        p = isl_printer_print_schedule_node(p, isl_schedule_get_root(isl_schedule_copy(schedule)));
+        p = isl_printer_set_output_format(p, old_isl_format);

        id2stmt = set_up_id2stmt(scop);
        build = isl_ast_build_alloc(ctx);
        build = isl_ast_build_set_at_each_domain(build, &at_domain, id2stmt);
--
Emil Vatai

Emil Vatai

unread,
Mar 8, 2024, 12:59:29 AMMar 8
to sven.ver...@gmail.com, isl Development
Dear Skimo,

Any suggestions on how to improve code generation of pet_loopback.c? More concretely for polybench/linalg/kernel/2mm it generates something like the code below. TL;DR: it adds extra `j = 0` and `j = c2 + 1` kind of statements (when they are not actually needed - to the best of my knowledge).

Best,
E

Code below:

### pet_loopback OUTPUT ###
```  int i, j, k;

    #define min(x,y)    ((x) < (y) ? (x) : (y))
    {
      i = 0;
      for (int c0 = 0; c0 < ni; c0 += 10)
        for (int c1 = 0; c1 <= min(9, ni - c0 - 1); c1 += 1) {
          j = 0;
          for (int c2 = 0; c2 < nj; c2 += 1) {
            tmp[c0 + c1][c2] = 0.;
            k = 0;
            for (int c3 = 0; c3 < nk; c3 += 1) {
              tmp[c0 + c1][c2] += ((alpha * A[c0 + c1][c3]) * B[c3][c2]);
              k = (c3 + 1);
            }
            j = (c2 + 1);
          }
          i = (c0 + c1 + 1);
        }
// ... snip ...
```
### ORIGINAL ###
```
  int i, j, k;

#pragma scop
  /* D := alpha*A*B*C + beta*D */
  for (i = 0; i < _PB_NI; i++)
    for (j = 0; j < _PB_NJ; j++)
      {
        tmp[i][j] = SCALAR_VAL(0.0);
        for (k = 0; k < _PB_NK; ++k)
          tmp[i][j] += alpha * A[i][k] * B[k][j];
      }
// ... snip ...
```

--
Emil Vatai

Sven Verdoolaege

unread,
Mar 8, 2024, 3:39:16 PMMar 8
to Emil Vatai, isl Development
On Fri, Mar 08, 2024 at 02:59:16PM +0900, Emil Vatai wrote:
> Dear Skimo,
>
> Any suggestions on how to improve code generation of pet_loopback.c? More
> concretely for polybench/linalg/kernel/2mm it generates something like the
> code below. TL;DR: it adds extra `j = 0` and `j = c2 + 1` kind of
> statements (when they are not actually needed - to the best of my
> knowledge).

pet_loopback.c just prints the code as pet extract it.
Such statements are added by pet (in scop_from_for_init)
because some code that comes after the scop might depend
on the loop iterators.
However, pet will also add kill statements for the loop iterators
if it can easily figure out that these iterators are not
being used after the scop. A full-fledged code optimizer
such as PPCG will exploit those kills to remove those assignments
(along with other dead code). In PPCG, this happens inside
eliminate_dead_code.
I suppose in theory it would be possible for peti itself to perform
a light-weight dead code elimination. I just didn't bother
because PPCG already takes care of this.

Note that you could also change the input to declare the loop iterators
inside the loop statement and then pet will not add such statements.

skimo
Reply all
Reply to author
Forward
0 new messages