Hello
I recently picked up an old project that I worked on using polly (I was using polly on llvm 9.0 at the time) with the goal of expanding the functionality that I had. The general idea is that I to detect when I have two functions that have a producer / consumer relationship and pipeline them. This is part of a bigger project where I am building a compiler for FPGA code-generation under the HPVM project (hpvm.cs.illinois.edu). I use polly to determine when the access patterns of the producer/consumer do not match and transform the consumer to make the pipeline generation legal. My original implementation handled very basic cases with lots of limitations, and at the time the approach I took after getting feedback and advice from this mailing list was to combine the producer and consumer functions into a single function so that the Scops and Relations in the producer/consumer have the same ISL context allowing me to manipulate and compare the schedules and access relations. The issue I am facing now, as I try to expand the functionality to support more general loops (including nesting) and more complex access patterns, is that Polly is not always detecting the Scops, and in some cases it is detecting a single Scop spanning both loops.
I have the following example:
Producer:
for (size_t i = 0; i < row_size; ++i) {
out1_in2[i] = in[i] * 2;
}
Consumer:
for (size_t i = 0; i < row_size; ++i) {
size_t Left = i - 1;
size_t Middle = i;
size_t Right = i + 1;
float in_L = (i > 0) ? out1_in2[Left] : 0;
float in_M = out1_in2[Middle];
float in_R = (i < row_size) ? out1_in2[Right] : 0;
out2_in3[i] = (in_L + in_M + in_R) /
3;
}
My pass first combines the two functions into a single merged function, then runs the ScopInfoAnalysis on the merged function as such:
auto &MergedSI = FAM.getResult<ScopInfoAnalysis>(*FMerged);
In the above example, I noticed that there is only one Scop generated for the merged function. I ran through gdb and noticed that in the function ScopDetection::FindScops(), first the two subregions are detected as valid regions, but then, it tries to expand the region in the following code:
for (Region *CurrentRegion : ToExpand) {
// Skip invalid regions. Regions may become invalid, if they are element of
// an already expanded region.
if (!ValidRegions.count(CurrentRegion))
continue;
// Skip regions that had errors.
bool HadErrors = lookupRejectionLog(CurrentRegion)->hasErrors();
if (HadErrors)
continue;
Region *ExpandedR = expandRegion(*CurrentRegion);
if (!ExpandedR)
continue;
R.addSubRegion(ExpandedR, true);
ValidRegions.insert(ExpandedR);
removeCachedResults(*CurrentRegion);
removeCachedResultsRecursively(*ExpandedR);
}
This leads to it merging the two regions of the two loops into a single region and generating a Scop for that region instead. I cannot seem to understand why this is happening, and I was wondering if there is a way to enforce that it will return two separate Scops for the separate loops? Would it be better if I manually built the Scops for the loops that I am interested in instead of using the ScopInfoAnalysis pass?
One thing that I have noticed through some debugging and testing I’ve done is that the merging of the regions is occurring when the loops do not have loop guard branches. In this example (and the one below) the loop guard branches for these loops are being explicitly removed by some of our passes. I’ve tested a different case where the loops have guard branches and I do get two separate scops for the producer and the consumer. I am wondering if I should instead be manually creating scops using ScopBuilder while specifying the regions that I need the Scop for to enforce having two separate Scops for the producer/consumer in these scenarios. Any advice here would be appreciated!
Another problem I am facing when using nested loops as such:
Producer:
for (size_t i = 0; i < row_size; ++i) {
for (size_t j = 0; j < col_size; ++j) {
out1_in2[i * col_size + j] = in[i * col_size + j] * 2;
}
}
Consumer:
for (size_t i = 0; i < row_size; ++i) {
for (size_t j = 0; j < col_size; ++j) {
size_t TopLeft = (i - 1) * col_size + j - 1;
size_t TopMiddle = (i - 1) * col_size + j;
size_t TopRight = (i - 1) * col_size + j + 1;
size_t CenterLeft = i * col_size + j - 1;
size_t CenterMiddle = i * col_size + j;
size_t CenterRight = i * col_size + j + 1;
size_t BottomLeft = (i + 1) * col_size + j - 1;
size_t BottomMiddle = (i + 1) * col_size + j;
size_t BottomRight = (i + 1) * col_size + j + 1;
float in_TL = (i > 0 && j > 0) ? out1_in2[TopLeft] : 0;
float in_TM = (i > 0) ? out1_in2[TopMiddle] : 0;
float in_TR = (i > 0 && j < col_size) ? out1_in2[TopRight] : 0;
float in_CL = (j > 0) ? out1_in2[CenterLeft] : 0;
float in_CM = out1_in2[CenterMiddle];
float in_CR = (j < col_size) ? out1_in2[CenterRight] : 0;
float in_BL = (i < row_size && j > 0) ? out1_in2[BottomLeft] : 0;
float in_BM = (i < row_size) ? out1_in2[BottomMiddle] : 0;
float in_BR = (i < row_size && j < col_size) ? out1_in2[BottomRight] : 0;
out2_in3[i * col_size + j] = (in_TL + in_TM + in_TR + in_CL + in_CM +
in_CR + in_BL + in_BM + in_BR) /
9;
}
}
In this scenario the regions are also getting merged, but now no Scops are generated. Instead, ScopBuilder is generating a message: “SCoP detected but dismissed” which seems to be coming from the ScopBuilder() constructor. Any ideas about this as well will also be helpful.
Thanks
-Adel