$ cloog mvt.pluto.cloog
/* Generated from mvt.pluto.cloog by CLooG 0.17.0-2-g3f88770 gmp bits in 0.01s. */
if (N >= 1) {
  for (t1=0;t1<=N-1;t1++) {
    for (t2=0;t2<=N-1;t2++) {
      if (t1 == t2) {
        S1(t1,t1);
        S2(t1,t1);
      }
      if (t1 <= t2-1) {
        S1(t1,t2);
      }
      if (t1 <= t2-1) {
        S2(t2,t1);
      }
      if (t1 >= t2+1) {
        S2(t2,t1);
      }
      if (t1 >= t2+1) {
        S1(t1,t2);
      }
    }
  }
}
The generated code could instead have been just:
for (t1=0;t1<=N-1;t1++) {
  for (t2=0;t2<=N-1;t2++) {
    S1(t1,t2);
    S2(t2,t1);
  }
}
The .cloog file specifies (i,j) as the scattering for S1 and (j,i) for
S2, so no order between S1 and S2 is specified at the innermost level.
CLooG uses (or ends up using) the original (i,j) to order them: {S1,S2}
when t1 < t2 (since S1(i,j) << S2(j,i) for t1 < t2, '<<' being
lexicographically less than), {S2,S1} when t1 > t2, and {S1,S2} when
t1 == t2 (arbitrary in the absence of any ordering).
Can it just be improved to generate "better" code whenever it has such
freedom?
-Uday
Hi Uday,
you should already get this behavior if you set '-l' to the number of
scattering dimensions. For the example you posted, I get:
$ cloog /tmp/mvt.pluto.cloog -l 2
/* Generated from /tmp/mvt.pluto.cloog by CLooG 0.17.0-6-g088669b gmp bits in 0.01s. */
if (N >= 1) {
  for (t1=0;t1<=N-1;t1++) {
    for (t2=0;t2<=N-1;t2++) {
      S1(t1,t2);
      S2(t2,t1);
    }
  }
}
Tobi
I see, thanks. In this case, it seems counter-intuitive that optimizing
inner levels (depth > 2) by separation actually leads to more control
overhead. I suppose this is because CLooG doesn't know that i and j have
no further iterations once t1 and t2 are scanned. Using the number of
scattering functions as -l setting appears to be the right choice for
full-ranked scatterings.
-Uday
As far as I understand, this is because CLooG orders otherwise unordered
iterations by first extending the scattering with the original iteration
dimensions and then appending the statement number as the very last
dimension. In this case that is a very poor heuristic, as it yields a
different order of S1 and S2 in different subspaces of the iteration
space (which in turn yields ugly code).
> Using the number of
> scattering functions as -l setting appears to be the right choice for
> full-ranked scatterings.
Yes. Alternatively, one can provide a schedule that leaves no ordering
freedom.
Cheers
Tobi