help understanding how cloog works

30 views
Skip to first unread message

christian

unread,
Apr 28, 2011, 4:36:01 AM4/28/11
to CLooG
Hello all:

I am playing with cloog, just to learn how to use it. I have borrowed
the following example from a well known paper (Amy Lim and Monica Lam
"Maximizing Parallelism and Minimizing Synchronization with Affine
Transforms"):

for( i=1; i < N ; i++ )
for( j=1; j < N; j++ ){
A[i][j] = A[i][j] + B[i-1][j];
B[i][j] = A[i][j-1] * B[i][j];
}

and a want to apply the partitioning they find:

f1(i,j) = i - j
f2(i,j) = i - j + 1

I can generate a valid code with the following scattering functions:

# of Scattering functions
2

# Scattering function
2 7
0 1 0 -1 1 0 0 ## p = i-j
0 0 1 -1 0 0 0 ## l = i


# Scattering function
2 7
0 1 0 -1 1 0 -1 ## p = i-j+1
0 0 1 -1 0 0 0 ## l = i

# we will set the scattering dimension names
2
p l

obtaining:

if (N >= 2) {
S1(1,N-1);
for (p=-N+3;p<=N-2;p++) {
if (p >= 1) {
S2(p,1);
}
for (l=max(1,p+1);l<=min(N-1,p+N-2);l++) {
S1(l,-p+l);
S2(l,-p+l+1); from
}
if (p <= 0) {
S1(p+N-1,N-1);
}
}
S2(N-1,1);
}

What I do not understand is why this alternative schedules do not
generate the same code:

# of Scattering functions
2

# Scattering function
1 6
0 1 -1 1 0 0 ## p = i-j

# Scattering function
1 6
0 1 -1 1 0 -1 ## p = i-j+1

# we will set the scattering dimension names
1
p

I was assuming, following C. Bastoul papers and Ph.D. Thesis, that the
original domain was added to form a new polyhedron in which the
scattering variables were at the leading dimensions. This polyhedron
was then scanned lexicographically. Thus adding the extra scattering
dimension i (leading dimension in the original domains) should make
no difference, right? The code generated by cloog from the latter
scattering functions is:

[CLooG] INFO: 1 domains have been blocked.
/* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
0.00s. */
if (N >= 1) {
for (p=-N+1;p<=N-1;p++) {
for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
S1(i,-p+i);
S2(i,-p+i);
}
}
}

Can anyone explain me why am I wrong? By the way, I do not know what
cloog means by "1 domains have been blocked."

Furthermore, notice that the schedule for S2 is not respected (if I
understood it well). For p = i - j + 1, then given i and p the
operation to be executed should be S2(i,i-p+1) (as in the first code).
Am I wrong?

If I add a scalar dimension in the leading position to split both
domains:
# of Scattering functions
2

# Scattering function
2 7
0 1 0 0 0 0 0 ## 0
0 0 1 -1 1 0 0 ## p = i-j

# Scattering function
2 7
0 1 0 0 0 0 -1 ## 0
0 0 1 -1 1 0 -1 ## p = i-j+1


# we will set the scattering dimension names
2
s p

the code generated by cloog is:

[CLooG] INFO: 1 dimensions (over 2) are scalar.
/* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
0.01s. */
if (N >= 1) {
for (p=-N+1;p<=N-1;p++) {
for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
S1(i,-p+i);
}
}
for (p=-N+2;p<=N;p++) {
for (i=max(0,p-1);i<=min(N-1,p+N-2);i++) {
S2(i,-p+i+1);
}
}
}

In this case both schedules are respected. It seams that the previous
result is just a fusion of these two loops by shifting the second p
loop by -1. I cannot understand why cloog does it. I was assuming that
the scattering dimensions were common for all the statements and fixed
by the scattering function, and cloog would not change them. I am
certainly missing something.

Any help on this would be appreciated. Thanks in advance.


Regards,

Christian.

Sven Verdoolaege

unread,
Apr 28, 2011, 5:02:02 AM4/28/11
to christian, CLooG
On Thu, Apr 28, 2011 at 01:36:01AM -0700, christian wrote:
> I am playing with cloog, just to learn how to use it.

For playing, I would strongly recommend using iscc instead of
calling CLooG directly.

> I was assuming, following C. Bastoul papers and Ph.D. Thesis, that the
> original domain was added to form a new polyhedron in which the
> scattering variables were at the leading dimensions. This polyhedron
> was then scanned lexicographically. Thus adding the extra scattering
> dimension i (leading dimension in the original domains) should make
> no difference, right? The code generated by cloog from the latter
> scattering functions is:

These are implementation details. As a user, you should not depend
on this. CLooG is only required to respect the schedule as specified
by the "scattering functions". It is free to order the iterations
mapped to the same scheduling iteration in any way it sees fit.

> [CLooG] INFO: 1 domains have been blocked.
> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
> 0.00s. */
> if (N >= 1) {
> for (p=-N+1;p<=N-1;p++) {
> for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
> S1(i,-p+i);
> S2(i,-p+i);
> }
> }
> }

Now, this does look like something went wrong.
Please send the complete input for this example.

skimo

Christian Tenllado

unread,
Apr 28, 2011, 5:29:16 AM4/28/11
to sk...@kotnet.org, CLooG
Skimo, thank you for your answer. I will try iscc.

> These are implementation details.  As a user, you should not depend
> on this.  CLooG is only required to respect the schedule as specified
> by the "scattering functions".  It is free to order the iterations
> mapped to the same scheduling iteration in any way it sees fit.

I know, but ones you know it... In any case, the biggest confusion arrives
from "not respecting the schedule for S2"

>> [CLooG] INFO: 1 domains have been blocked.
>> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
>> 0.00s. */
>> if (N >= 1) {
>>   for (p=-N+1;p<=N-1;p++) {
>>     for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
>>       S1(i,-p+i);
>>       S2(i,-p+i);
>>     }
>>   }
>> }
>
> Now, this does look like something went wrong.
> Please send the complete input for this example.

I attach you 2 cloog files, the test.limlam2.cloog is the one that
generates the "wrong?" schedule for S2, and the test.limlam3.cloog
is the one with the extra scalar dimensions added, to split both domains.


Thank you!

test.limlam2.cloog
test.limlam3.cloog

Sven Verdoolaege

unread,
Apr 28, 2011, 6:28:21 AM4/28/11
to Christian Tenllado, CLooG
On Thu, Apr 28, 2011 at 11:29:16AM +0200, Christian Tenllado wrote:
> > These are implementation details. �As a user, you should not depend
> > on this. �CLooG is only required to respect the schedule as specified
> > by the "scattering functions". �It is free to order the iterations
> > mapped to the same scheduling iteration in any way it sees fit.
>
> I know, but ones you know it... In any case, the biggest confusion arrives
> from "not respecting the schedule for S2"

Actually, the schedule is respected.
Note that the schedule only specifies the order in which iterations
are executed and the code below does respect that order.

> >> [CLooG] INFO: 1 domains have been blocked.
> >> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
> >> 0.00s. */
> >> if (N >= 1) {
> >> � for (p=-N+1;p<=N-1;p++) {
> >> � � for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
> >> � � � S1(i,-p+i);
> >> � � � S2(i,-p+i);
> >> � � }
> >> � }
> >> }
> >
> > Now, this does look like something went wrong.

But it is correct.
What happens is that CLooG notices that each iteration
of S2 is to executed immediately after the same iteration
of S1 and then treats the block {S1;S2} as a single statement.

skimo

Christian Tenllado

unread,
Apr 28, 2011, 7:04:00 AM4/28/11
to sk...@kotnet.org, CLooG
> Actually, the schedule is respected.
> Note that the schedule only specifies the order in which iterations
> are executed and the code below does respect that order.

I agree that the schedule specifies the order in which iterations are executed,
but also imposes relative order between the operations of the two
statements, right?

if schedule for S1 is p = i-j
and schedule for S2 is p = i-j+1

then for any valid value of p we must compute
S1(i,i-p) for all valid values of i (no order is specified)
S2(i,i-p+1) for all valid values of i (no order is specified)

I am assuming that the value of p is common for both statements,
so you can impose relative order between operations.

>> >> [CLooG] INFO: 1 domains have been blocked.
>> >> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
>> >> 0.00s. */
>> >> if (N >= 1) {
>> >>   for (p=-N+1;p<=N-1;p++) {
>> >>     for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
>> >>       S1(i,-p+i);
>> >>       S2(i,-p+i);
>> >>     }
>> >>   }
>> >> }
>> >
>> > Now, this does look like something went wrong.
>
> But it is correct.
> What happens is that CLooG notices that each iteration
> of S2 is to executed immediately after the same iteration
> of S1 and then treats the block {S1;S2} as a single statement.

Uhmm, let us see...

p = -1
S1(i,i+1) for all 0<=i<=N-2 and
S2(i',i'+2) for all 0<=i'<=N-3 (no order specified)

p = 0
S1(i,i) for all 0<=i<=N-1 and
S2(i',i'+1) for all 0<=i'<=N-2 (no order specified)

p = 1
S1(i,i-1) for all 1<=i<=N-1 and
S2(i',i') for all 0<=i'<=N-1 (no order specified)

You mean that cloog reorders it shifting S2:

p=-2

S2(i',i'+2) for all 0<=i'<=N-3 (no order specified)

p = -1
S1(i,i+1) for all 0<=i<=N-2 and
S2(i',i'+1) for all 0<=i'<=N-2 (no order specified)

p = 0
S1(i,i) for all 0<=i<=N-1 and
S2(i',i') for all 0<=i'<=N-1 (no order specified)

p = 1
S1(i,i-1) for all 1<=i<=N-1 and
S2(i',i'-1) for all 1<=i'<=N-1 (no order specified)

But then what I scheduled for S2 to be in the same
partition than S1 is not respected. S2(i',i'+1) should
be in partition 0, with S1(i,i).

I would have expected a different behavior and I still do
not understand what am I missing.

Thank you once more.


Christian.

Sven Verdoolaege

unread,
Apr 28, 2011, 7:31:36 AM4/28/11
to Christian Tenllado, CLooG
On Thu, Apr 28, 2011 at 01:04:00PM +0200, Christian Tenllado wrote:
> > Actually, the schedule is respected.
> > Note that the schedule only specifies the order in which iterations
> > are executed and the code below does respect that order.
>
> I agree that the schedule specifies the order in which iterations are executed,
> but also imposes relative order between the operations of the two
> statements, right?

Exactly.

> if schedule for S1 is p = i-j
> and schedule for S2 is p = i-j+1
>
> then for any valid value of p we must compute
> S1(i,i-p) for all valid values of i (no order is specified)
> S2(i,i-p+1) for all valid values of i (no order is specified)

You also didn't specify an order between those two groups of
iterations.

You only specified that S1(i,i) and S2(i',i'+1) should
be executed before S1(i,i-1) and S2(i',i').
CLooG respects your specification.

skimo

Christian Tenllado

unread,
Apr 28, 2011, 5:21:04 PM4/28/11
to sk...@kotnet.org, CLooG
>> if schedule for S1 is p = i-j
>> and schedule for S2 is p = i-j+1
>>
>> then for any valid value of p we must compute
>>      S1(i,i-p) for all valid values of i (no order is specified)
>>      S2(i,i-p+1) for all valid values of i (no order is specified)
>
> You also didn't specify an order between those two groups of
> iterations.

Right.

>> But then what I scheduled for S2 to be in the same
>> partition than S1 is not respected. S2(i',i'+1) should
>> be in partition 0, with S1(i,i).
>
> You only specified that S1(i,i) and S2(i',i'+1) should
> be executed before S1(i,i-1) and S2(i',i').
> CLooG respects your specification.

Uhmm, here I don't get it.

For S1 I used p=i-j as schedule. Then for any value
of p we have to perform the operations:

S1(i,i-p) for max(1,p+1) <= i <= min(N-1,p+N-1)

For S2 I used p=i-j+1 as schedule. Then for any value
of p we have to perform the operations:

S2(i',i'-p+1) for max(1,p) <= i' <= min(N-1,p+N-2)

Consider for instance N=4, then

S1(2,3) is to be executed at p=-1

and

S2(1,2) is to be executed at p=0

thus S1(2,3) should be executed before S2(1,2). Am I right?

However, the code generated by CLooG computes both operations at:

S2(1,2) p=-1 and i = 1
S1(2,3) p=-1 and i = 2

Thus S2(1,2) is executed before S1(2,3) which, if I was right, contradicts
the schedules.

Could you point me where is my error?

Thanks a lot skimo.

Christian.

Sven Verdoolaege

unread,
Apr 28, 2011, 5:36:00 PM4/28/11
to Christian Tenllado, CLooG
On Thu, Apr 28, 2011 at 11:21:04PM +0200, Christian Tenllado wrote:
> However, the code generated by CLooG computes both operations at:
>
> S2(1,2) p=-1 and i = 1
> S1(2,3) p=-1 and i = 2
>
> Thus S2(1,2) is executed before S1(2,3) which, if I was right, contradicts
> the schedules.
>
> Could you point me where is my error?

You're right. The optimization only works for (statement-wise) injective
schedules. I'm not used to thinking about non-injective schedules.
Thanks for persisting. I'll work on a fix.

skimo

Uday Kumar Reddy B

unread,
Apr 29, 2011, 1:41:26 AM4/29/11
to Christian Tenllado, sk...@kotnet.org, CLooG

Yes, the ordering specified by your schedule makes sure that all
S1(i,i-p) and S2(i,i-p+1) appear in the same iteration of the
outermost loop (since i-(i-p) = (i-(i-p+1)+1). Within that iteration,
they may appear in any order since you don't specify any more
scattering functions. OTOH, the above code executes S1(i,i-p) and
S2(i,i-p) in the same iteration of the outermost loop and is incorrect
as you point out; all statements are mapped to the same global time
space. I tried cloog-polylib, it does generate correct code (below).
Perhaps this is something specific to cloog-isl.

-------------
if (N >= 1) {
S1(0,N-1);
for (p=-N+2;p<=N-1;p++) {
if (p >= 1) {
S2(p-1,0);
}
for (i=max(0,p);i<=min(N-1,p+N-2);i++) {
S1(i,-p+i);
S2(i,-p+i+1);


}
if (p <= 0) {
S1(p+N-1,N-1);
}
}

S2(N-1,0);
}
-----------------------------

-- Uday

>
> I would have expected a different behavior and I still do
> not understand what am I missing.
>
> Thank you once more.
>
>
> Christian.
>

> --
> You got this message because you subscribed to the CLooG mailing list.
> To send messages to this list, use cl...@googlegroups.com
> To stop subscribing, send a mail to cloog+un...@googlegroups.com
> For more options and to visit the group, http://groups.google.fr/group/cloog?hl=en
>

Christian Tenllado

unread,
Apr 29, 2011, 1:47:07 AM4/29/11
to sk...@kotnet.org, CLooG
Thank you for your help skimo.

I had tried previously with single-statement examples, and I guess
that even non injective schedules where working. This was my first try
with two statements.

With the injective schedules (full decision, processor and time
schedule, by adding i as second schedule dimension) CLooG is doing
perfect.

As I told you, I was trying to verify if I was understanding how
schedules work. Therefore, thank you very much for your effort
understanding the problem. I will also try using iscc.

Christian

Christian Tenllado

unread,
Apr 29, 2011, 1:50:57 AM4/29/11
to Uday Kumar Reddy B, sk...@kotnet.org, CLooG
Hi Uday:

Thanks for participating. Yes, it might be related to cloog-isl, I
think (don't remember exactly) that I compiled with isl instead of
polylib. Thanks for the confirmation, I will also try using
cloog-polylib.

Christian.

Sven Verdoolaege

unread,
Apr 29, 2011, 4:41:20 AM4/29/11
to Cédric Bastoul, Christian Tenllado, CLooG
On Thu, Apr 28, 2011 at 11:36:00PM +0200, Sven Verdoolaege wrote:
> You're right. The optimization only works for (statement-wise) injective
> schedules. I'm not used to thinking about non-injective schedules.

Actually, I shouldn't have put in that "statement-wise".
It may also break for statement-wise injective schedules that are
not injective as a whole.
The attached case schedules S1 at 0, S2 at 1 and S3 at 0 and 1,
yet CLooG generates:

[CLooG] INFO: 1 domains have been blocked.

/* Generated from test_block.cloog by CLooG 0.16.2-2-gbff486a gmp bits in 0.00s. */
S1();
S2();
for (c1=0;c1<=1;c1++) {
S3(c1);
}

Note that this problem isn't "new". The old PolyLib based blocking
also got it wrong:

skimo@purples:~/src/cloog-0.14.1$ ./cloog ~/obj/cloog-isl/test_block.cloog

[CLooG]INFO: 1 domains have been blocked.

/* Generated from /home/skimo/obj/cloog-isl/test_block.cloog by CLooG v0.14.1 64 bits in 0.00s. */
S1 ;
S2 ;
S3(i = 0) ;
S3(i = 1) ;

However, it also got some other cases wrong and was therfore
"temporarily" removed. According to the commit message of
bdccce9 (PolyLib backend: disable cloog_scattering_lazy_block,
Tue May 5 21:03:03 2009 +0200), Cedric Bastoul is working on a fix.

So, Cedric, what should we do?
I could add a "isl_union_map_fast_is_injective" to isl and only
do blocking if this returns true. This would mean that the fix
depends on a new version of isl.
Or, I could just disable blocking completely.
(I tried that on swim.cloog, and it doesn't seem to affect the
running time in any significant way.)

skimo

test_block.cloog

Cédric Bastoul

unread,
May 1, 2011, 8:04:18 AM5/1/11
to sk...@kotnet.org, Christian Tenllado, CLooG


2011/4/29 Sven Verdoolaege <skimo...@kotnet.org>
Haha this good old bug is still there :-D ! This optimization may really be useful. It brought major speedup to regenerate the SCoPs extracted by WRAP-IT (old cloog-benchmarks, but most of them were too long because of bugs inside WRAP-IT). It is not useful for us today because (1) GRAPHITE considers basic blocks as a single statement, thus the optimization is rarely useful and (2) users of high level tools like Pluto/PolyCC and PoCC are still optimizing small kernels. IMHO it should not be useful at all : if a scheduling algorithm decides that several statements should be processed as a block, it should provide to CLooG only one statement... But that's not what they do :-(. So I, personally, would prefer to keep it and go for a isl_union_map_fast_is_injective.

Ced.

Sven Verdoolaege

unread,
May 5, 2011, 10:53:19 AM5/5/11
to Cédric Bastoul, Christian Tenllado, CLooG
On Sun, May 01, 2011 at 02:04:18PM +0200, C�dric Bastoul wrote:
> So I, personally, would prefer to keep it and go for
> a isl_union_map_fast_is_injective.

How about the attached, then?

skimo

0001-cloog_scattering_lazy_block-check-whether-entire-sch.patch

Cédric Bastoul

unread,
May 6, 2011, 11:53:07 AM5/6/11
to sk...@kotnet.org, Christian Tenllado, CLooG
On Thu, May 5, 2011 at 4:53 PM, Sven Verdoolaege <skimo...@kotnet.org> wrote:
On Sun, May 01, 2011 at 02:04:18PM +0200, Cédric Bastoul wrote:
> So I, personally, would prefer to keep it and go for
> a isl_union_map_fast_is_injective.

How about the attached, then?

Sounds good. Thanks Skimo !
Reply all
Reply to author
Forward
0 new messages