how to let two statements be scheduled in the same loop nest?

71 views
Skip to first unread message

le yin

unread,
Aug 17, 2023, 1:26:26 AM8/17/23
to isl Development

for (int i = 0; i < 1024; i++)
   for (int j = 0; j < 1024; j++)
   {
        tmp0[i] += input[i][j];
        tmp1[i] += input[i][j];
   }

will get the schedule as follows:

domain: "{ S_1[i, j] : 0 <= i <= 1023 and 0 <= j <= 1023; S_0[i, j] : 0 <= i <= 1023 and 0 <= j <= 1023 }"                                                                                                                                                 
 child:                                                                                                                                                        set:                                                                                                                                                         - filter: "{ S_0[i, j] }"
      child:                                                                                                                                                        schedule: "[{ S_0[i, j] -> [(i)] }, { S_0[i, j] -> [(j)] }]"
          permutable: 1                                                                                                                                       coincident: [ 1, 0 ]                                                                                                                            - filter: "{ S_1[i, j] }"                                                                                                                             child:                                                                                                                                                        schedule: "[{ S_1[i, j] -> [(i)] }, { S_1[i, j] -> [(j)] }]"
            permutable: 1                                                                                                                                      coincident: [ 1, 0 ]  

S_0 and S_1 be scheduled in different loops. And two kernels will generated.

Can the two statements be scheduled in the same loop ?

Sven Verdoolaege

unread,
Aug 17, 2023, 3:01:19 PM8/17/23
to le yin, isl Development
On Wed, Aug 16, 2023 at 10:26:25PM -0700, le yin wrote:
>
> for (int i = 0; i < 1024; i++)
> for (int j = 0; j < 1024; j++)
> {
> tmp0[i] += input[i][j];
> tmp1[i] += input[i][j];
> }
>
> will get the schedule as follows:

What do you mean? Where did you get this schedule?

> domain: "{ S_1[i, j] : 0 <= i <= 1023 and 0 <= j <= 1023; S_0[i, j] : 0 <=
> i <= 1023 and 0 <= j <= 1023 }"
>
>
> child:
>
> set:
>
> - filter: "{ S_0[i, j] }"
> child:
>
> schedule: "[{ S_0[i, j] -> [(i)] }, { S_0[i, j] -> [(j)] }]"
> permutable: 1
>
> coincident: [ 1, 0 ]
>
> - filter: "{ S_1[i, j] }"
>
> child:
>
> schedule: "[{ S_1[i, j] -> [(i)] }, { S_1[i, j] -> [(j)] }]"
> permutable: 1
>
> coincident: [ 1, 0 ]

There is some serious issue with the formatting here,
making it almost impossible to parse the schedule.

> Can the two statements be scheduled in the same loop ?

Something like this should do it:

domain: "{ S_1[i, j] : 0 <= i <= 1023 and 0 <= j <= 1023; S_0[i, j] : 0 <= i <= 1023 and 0 <= j <= 1023 }"
child:
schedule: "[{ S_0[i, j] -> [(i)]; S_1[i, j] -> [(i)] }, { S_0[i, j] -> [(j)]; S_1[i, j] -> [(j)] }]"

skimo

le yin

unread,
Aug 17, 2023, 11:44:50 PM8/17/23
to isl Development
Sorry for the formatting when I copy text from command term to this session.

Let me use a simpler case.
The input source file is 

#pragma scop

 for (int i = 0; i < 1024; i++)
 {
     c[i] = a[i] + b[i];                                // S_0
     d[i] = a[i] + b[i];                                // S_1
 }
#pragma endscop


ppcg processed the above file and produced a isl_schedule after calling get_schedule().
The schedule is as the following:

domain: ...
child:
    set:
    - filter:  S_0
      child:
           schedule:  S_0[i] -> [(i)]
    - filter:  S_1
      child:
           schedule:  S_1[i] -> [(i)]

S_0 and S_1 are put in different loops.
How can I get a isl schedule that S_0 and S_1 are in the same loop?

Thanks,
-leo

Sven Verdoolaege

unread,
Aug 20, 2023, 5:41:56 PM8/20/23
to le yin, isl Development
On Thu, Aug 17, 2023 at 08:44:50PM -0700, le yin wrote:
> Sorry for the formatting when I copy text from command term to this session.
>
> Let me use a simpler case.
> The input source file is
>
> #pragma scop
> for (int i = 0; i < 1024; i++)
> {
> c[i] = a[i] + b[i]; // S_0
> d[i] = a[i] + b[i]; // S_1
> }
> #pragma endscop
>
>
> ppcg processed the above file and produced a isl_schedule after calling
> get_schedule().
> The schedule is as the following:
>
> domain: ...
> child:
> set:
> - filter: S_0
> child:
> schedule: S_0[i] -> [(i)]
> - filter: S_1
> child:
> schedule: S_1[i] -> [(i)]
>
> S_0 and S_1 are put in different loops.
> How can I get a isl schedule that S_0 and S_1 are in the same loop?

You can override the schedule computed by isl
by using the --load-schedule option.

If you think isl should compute such a schedule,
then you'll have to explain based on what criteria
it should do that and why you would prefer such a schedule.

skimo

le yin

unread,
Aug 20, 2023, 11:29:07 PM8/20/23
to isl Development

For the practical purperse, I modify the case a little as the following:
  #pragma scop
 for (int i = 0; i < 1024; i++)
 {
 c[i] = a[i] + b[i];                       // S_0
 d[i] = a[i] * b[i];                        // S_1
 }
 #pragma endscop

S_0 and S_1 both read a[] and b[] to compute different values, stored in c[] and d[] respectively.

If S_0 and S_1 are in the same loop, then a[i] and b[i] are read once for the computation of S_0 and S_1;
if they are in two different loops (so in different kernels), a[i] and b[i] need be read twice.
For the hardware of gpu architecture, reading data from global device memory to gpu thread private memory 
for computation is a huge cost. So I prefer that S_0 and S_1 are put in the same loop, which will save much
memory copying and affect the generated code performance greately.

Thanks,
-leo

Sven Verdoolaege

unread,
Sep 16, 2023, 2:37:47 PM9/16/23
to le yin, isl Development
On Sun, Aug 20, 2023 at 08:29:07PM -0700, le yin wrote:
>
> For the practical purperse, I modify the case a little as the following:
> #pragma scop
> for (int i = 0; i < 1024; i++)
> {
> c[i] = a[i] + b[i]; // S_0
> d[i] = a[i] * b[i]; // S_1
> }
> #pragma endscop
>
> S_0 and S_1 both read a[] and b[] to compute different values, stored in
> c[] and d[] respectively.
>
> If S_0 and S_1 are in the same loop, then a[i] and b[i] are read once for
> the computation of S_0 and S_1;
> if they are in two different loops (so in different kernels), a[i] and b[i]
> need be read twice.
> For the hardware of gpu architecture, reading data from global device
> memory to gpu thread private memory
> for computation is a huge cost. So I prefer that S_0 and S_1 are put in the
> same loop, which will save much
> memory copying and affect the generated code performance greately.

Try the attached patch (and use --input-dependences).
This is very simplistic in that it just tries to put any pairs
of reads of the same array element close to each other.
This may result in telling the isl scheduler to put too many
statement instances close to each other, resulting
in other proximity dependences essentially getting ignored
(or perhaps even worse side effect).

See if it works for your intended use cases and report back here.

skimo
0001-input-dependences.patch
Reply all
Reply to author
Forward
0 new messages