For playing, I would strongly recommend using iscc instead of
calling CLooG directly.
> I was assuming, following C. Bastoul papers and Ph.D. Thesis, that the
> original domain was added to form a new polyhedron in which the
> scattering variables were at the leading dimensions. This polyhedron
> was then scanned lexicographically. Thus adding the extra scattering
> dimension i (leading dimension in the original domains) should make
> no difference, right? The code generated by cloog from the latter
> scattering functions is:
These are implementation details. As a user, you should not depend
on this. CLooG is only required to respect the schedule as specified
by the "scattering functions". It is free to order the iterations
mapped to the same scheduling iteration in any way it sees fit.
> [CLooG] INFO: 1 domains have been blocked.
> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
> 0.00s. */
> if (N >= 1) {
> for (p=-N+1;p<=N-1;p++) {
> for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
> S1(i,-p+i);
> S2(i,-p+i);
> }
> }
> }
Now, this does look like something went wrong.
Please send the complete input for this example.
skimo
> These are implementation details. As a user, you should not depend
> on this. CLooG is only required to respect the schedule as specified
> by the "scattering functions". It is free to order the iterations
> mapped to the same scheduling iteration in any way it sees fit.
I know, but ones you know it... In any case, the biggest confusion arrives
from "not respecting the schedule for S2"
>> [CLooG] INFO: 1 domains have been blocked.
>> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
>> 0.00s. */
>> if (N >= 1) {
>> for (p=-N+1;p<=N-1;p++) {
>> for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
>> S1(i,-p+i);
>> S2(i,-p+i);
>> }
>> }
>> }
>
> Now, this does look like something went wrong.
> Please send the complete input for this example.
I attach you 2 cloog files, the test.limlam2.cloog is the one that
generates the "wrong?" schedule for S2, and the test.limlam3.cloog
is the one with the extra scalar dimensions added, to split both domains.
Thank you!
Actually, the schedule is respected.
Note that the schedule only specifies the order in which iterations
are executed and the code below does respect that order.
> >> [CLooG] INFO: 1 domains have been blocked.
> >> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
> >> 0.00s. */
> >> if (N >= 1) {
> >> � for (p=-N+1;p<=N-1;p++) {
> >> � � for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
> >> � � � S1(i,-p+i);
> >> � � � S2(i,-p+i);
> >> � � }
> >> � }
> >> }
> >
> > Now, this does look like something went wrong.
But it is correct.
What happens is that CLooG notices that each iteration
of S2 is to executed immediately after the same iteration
of S1 and then treats the block {S1;S2} as a single statement.
skimo
I agree that the schedule specifies the order in which iterations are executed,
but also imposes relative order between the operations of the two
statements, right?
if schedule for S1 is p = i-j
and schedule for S2 is p = i-j+1
then for any valid value of p we must compute
S1(i,i-p) for all valid values of i (no order is specified)
S2(i,i-p+1) for all valid values of i (no order is specified)
I am assuming that the value of p is common for both statements,
so you can impose relative order between operations.
>> >> [CLooG] INFO: 1 domains have been blocked.
>> >> /* Generated from test.limlam2.cloog by CLooG 0.16.1 gmp bits in
>> >> 0.00s. */
>> >> if (N >= 1) {
>> >> for (p=-N+1;p<=N-1;p++) {
>> >> for (i=max(0,p);i<=min(N-1,p+N-1);i++) {
>> >> S1(i,-p+i);
>> >> S2(i,-p+i);
>> >> }
>> >> }
>> >> }
>> >
>> > Now, this does look like something went wrong.
>
> But it is correct.
> What happens is that CLooG notices that each iteration
> of S2 is to executed immediately after the same iteration
> of S1 and then treats the block {S1;S2} as a single statement.
Uhmm, let us see...
p = -1
S1(i,i+1) for all 0<=i<=N-2 and
S2(i',i'+2) for all 0<=i'<=N-3 (no order specified)
p = 0
S1(i,i) for all 0<=i<=N-1 and
S2(i',i'+1) for all 0<=i'<=N-2 (no order specified)
p = 1
S1(i,i-1) for all 1<=i<=N-1 and
S2(i',i') for all 0<=i'<=N-1 (no order specified)
You mean that cloog reorders it shifting S2:
p=-2
S2(i',i'+2) for all 0<=i'<=N-3 (no order specified)
p = -1
S1(i,i+1) for all 0<=i<=N-2 and
S2(i',i'+1) for all 0<=i'<=N-2 (no order specified)
p = 0
S1(i,i) for all 0<=i<=N-1 and
S2(i',i') for all 0<=i'<=N-1 (no order specified)
p = 1
S1(i,i-1) for all 1<=i<=N-1 and
S2(i',i'-1) for all 1<=i'<=N-1 (no order specified)
But then what I scheduled for S2 to be in the same
partition than S1 is not respected. S2(i',i'+1) should
be in partition 0, with S1(i,i).
I would have expected a different behavior and I still do
not understand what am I missing.
Thank you once more.
Christian.
Exactly.
> if schedule for S1 is p = i-j
> and schedule for S2 is p = i-j+1
>
> then for any valid value of p we must compute
> S1(i,i-p) for all valid values of i (no order is specified)
> S2(i,i-p+1) for all valid values of i (no order is specified)
You also didn't specify an order between those two groups of
iterations.
You only specified that S1(i,i) and S2(i',i'+1) should
be executed before S1(i,i-1) and S2(i',i').
CLooG respects your specification.
skimo
Right.
>> But then what I scheduled for S2 to be in the same
>> partition than S1 is not respected. S2(i',i'+1) should
>> be in partition 0, with S1(i,i).
>
> You only specified that S1(i,i) and S2(i',i'+1) should
> be executed before S1(i,i-1) and S2(i',i').
> CLooG respects your specification.
Uhmm, here I don't get it.
For S1 I used p=i-j as schedule. Then for any value
of p we have to perform the operations:
S1(i,i-p) for max(1,p+1) <= i <= min(N-1,p+N-1)
For S2 I used p=i-j+1 as schedule. Then for any value
of p we have to perform the operations:
S2(i',i'-p+1) for max(1,p) <= i' <= min(N-1,p+N-2)
Consider for instance N=4, then
S1(2,3) is to be executed at p=-1
and
S2(1,2) is to be executed at p=0
thus S1(2,3) should be executed before S2(1,2). Am I right?
However, the code generated by CLooG computes both operations at:
S2(1,2) p=-1 and i = 1
S1(2,3) p=-1 and i = 2
Thus S2(1,2) is executed before S1(2,3) which, if I was right, contradicts
the schedules.
Could you point me where is my error?
Thanks a lot skimo.
Christian.
You're right. The optimization only works for (statement-wise) injective
schedules. I'm not used to thinking about non-injective schedules.
Thanks for persisting. I'll work on a fix.
skimo
Yes, the ordering specified by your schedule makes sure that all
S1(i,i-p) and S2(i,i-p+1) appear in the same iteration of the
outermost loop (since i-(i-p) = (i-(i-p+1)+1). Within that iteration,
they may appear in any order since you don't specify any more
scattering functions. OTOH, the above code executes S1(i,i-p) and
S2(i,i-p) in the same iteration of the outermost loop and is incorrect
as you point out; all statements are mapped to the same global time
space. I tried cloog-polylib, it does generate correct code (below).
Perhaps this is something specific to cloog-isl.
-------------
if (N >= 1) {
S1(0,N-1);
for (p=-N+2;p<=N-1;p++) {
if (p >= 1) {
S2(p-1,0);
}
for (i=max(0,p);i<=min(N-1,p+N-2);i++) {
S1(i,-p+i);
S2(i,-p+i+1);
}
if (p <= 0) {
S1(p+N-1,N-1);
}
}
S2(N-1,0);
}
-----------------------------
-- Uday
>
> I would have expected a different behavior and I still do
> not understand what am I missing.
>
> Thank you once more.
>
>
> Christian.
>
> --
> You got this message because you subscribed to the CLooG mailing list.
> To send messages to this list, use cl...@googlegroups.com
> To stop subscribing, send a mail to cloog+un...@googlegroups.com
> For more options and to visit the group, http://groups.google.fr/group/cloog?hl=en
>
I had tried previously with single-statement examples, and I guess
that even non injective schedules where working. This was my first try
with two statements.
With the injective schedules (full decision, processor and time
schedule, by adding i as second schedule dimension) CLooG is doing
perfect.
As I told you, I was trying to verify if I was understanding how
schedules work. Therefore, thank you very much for your effort
understanding the problem. I will also try using iscc.
Christian
Thanks for participating. Yes, it might be related to cloog-isl, I
think (don't remember exactly) that I compiled with isl instead of
polylib. Thanks for the confirmation, I will also try using
cloog-polylib.
Christian.
Actually, I shouldn't have put in that "statement-wise".
It may also break for statement-wise injective schedules that are
not injective as a whole.
The attached case schedules S1 at 0, S2 at 1 and S3 at 0 and 1,
yet CLooG generates:
[CLooG] INFO: 1 domains have been blocked.
/* Generated from test_block.cloog by CLooG 0.16.2-2-gbff486a gmp bits in 0.00s. */
S1();
S2();
for (c1=0;c1<=1;c1++) {
S3(c1);
}
Note that this problem isn't "new". The old PolyLib based blocking
also got it wrong:
skimo@purples:~/src/cloog-0.14.1$ ./cloog ~/obj/cloog-isl/test_block.cloog
[CLooG]INFO: 1 domains have been blocked.
/* Generated from /home/skimo/obj/cloog-isl/test_block.cloog by CLooG v0.14.1 64 bits in 0.00s. */
S1 ;
S2 ;
S3(i = 0) ;
S3(i = 1) ;
However, it also got some other cases wrong and was therfore
"temporarily" removed. According to the commit message of
bdccce9 (PolyLib backend: disable cloog_scattering_lazy_block,
Tue May 5 21:03:03 2009 +0200), Cedric Bastoul is working on a fix.
So, Cedric, what should we do?
I could add a "isl_union_map_fast_is_injective" to isl and only
do blocking if this returns true. This would mean that the fix
depends on a new version of isl.
Or, I could just disable blocking completely.
(I tried that on swim.cloog, and it doesn't seem to affect the
running time in any significant way.)
skimo
How about the attached, then?
skimo
On Sun, May 01, 2011 at 02:04:18PM +0200, Cédric Bastoul wrote:How about the attached, then?
> So I, personally, would prefer to keep it and go for
> a isl_union_map_fast_is_injective.