Hello,
I am working on a project where I lift Scops to SDFGs and then use DaCe to optimize them and generate code into a separate source file (essentially a library). I want to remove the Scop from the original source file and replace it with calls to the library functions.
I have a basic implementation for the lifting to SDFGs and for replacing Scops, but I have many, many questions:
- Lifting: I export the Scop via JScop and parse the JSON into an SDFG in Python using the SDFG API. Domains become SDFG maps, and LLVM instructions become Tasklets inside those maps (purely data-centric). To anyone who also knows DaCe: would it make sense to instead lift the Scop to an SDFG using state transitions and then apply the LoopToMap transformation? I am worried that the instructions and the domain in the JScop are insufficient to represent the algorithm.
- Cutout: I use the terminator of the entering block to redirect the control flow around the Scop to the exit block. The Scop is then removed automatically by dead code elimination. Calls to the SDFG are inserted before that terminator. On PolyBench, I get errors that the terminator cannot be found. There should always be a terminator inside the entering block, right? Is there a smarter way to do it?
- PyISL: I would like to use PyISL to parse the ISL strings in the JScop into ISL objects rather than writing a new string parser. Is there an example of how to use PyISL with JScop output?
- False Domain: I have seen domains where the JSON contains false. Are those empty domains or invalid domains?
- Strategy: Since a Scop may contain multiple statements, the whole approach fails if even one of them cannot be lifted. E.g., a matmul with initialization consists of two statements; I still want to optimize the matmul if the initialization fails to be lifted. Therefore, I would like to replace the code at the statement level rather than at the Scop level. Could anyone please give me some input here? It seems possible to get only the blocks related to one statement. Are blocks unique to a statement, or can one block contain multiple statements?
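To make the statement-level idea from the Strategy point concrete, here is a minimal sketch of what I have in mind on the Python side. It is not my actual lifter: `LiftError`, `partition_statements`, `toy_lift`, `SIMULATED_FAILURES`, and the embedded JScop excerpt (statement names, ISL strings) are all made up for illustration, and `toy_lift` only stands in for the real construction of SDFG maps and Tasklets. The only thing it relies on is the usual JScop layout with a `statements` list whose entries have `name`, `domain`, `schedule`, and `accesses`.

```python
import json

# Hypothetical JScop excerpt for a matmul with initialization.
# The field names follow the JScop export; the contents are invented.
JSCOP_TEXT = """
{
  "name": "%for.cond---%for.end",
  "context": "[n] -> {  : n >= 1 }",
  "statements": [
    {
      "name": "Stmt_init",
      "domain": "[n] -> { Stmt_init[i, j] : 0 <= i < n and 0 <= j < n }",
      "schedule": "[n] -> { Stmt_init[i, j] -> [i, j, 0, 0] }",
      "accesses": [
        {"kind": "write", "relation": "[n] -> { Stmt_init[i, j] -> MemRef_C[i, j] }"}
      ]
    },
    {
      "name": "Stmt_matmul",
      "domain": "[n] -> { Stmt_matmul[i, j, k] : 0 <= i < n and 0 <= j < n and 0 <= k < n }",
      "schedule": "[n] -> { Stmt_matmul[i, j, k] -> [i, j, 1, k] }",
      "accesses": [
        {"kind": "read", "relation": "[n] -> { Stmt_matmul[i, j, k] -> MemRef_C[i, j] }"},
        {"kind": "read", "relation": "[n] -> { Stmt_matmul[i, j, k] -> MemRef_A[i, k] }"},
        {"kind": "read", "relation": "[n] -> { Stmt_matmul[i, j, k] -> MemRef_B[k, j] }"},
        {"kind": "write", "relation": "[n] -> { Stmt_matmul[i, j, k] -> MemRef_C[i, j] }"}
      ]
    }
  ]
}
"""

# Statements for which the toy lifter pretends to fail (assumption, for the demo only).
SIMULATED_FAILURES = {"Stmt_init"}


class LiftError(Exception):
    """Raised by a lifter when a single statement cannot be converted."""


def partition_statements(jscop, lift):
    """Lift each statement independently and return (lifted, failed) dicts."""
    lifted, failed = {}, {}
    for stmt in jscop["statements"]:
        try:
            lifted[stmt["name"]] = lift(stmt)
        except LiftError as err:
            # Keep the reason so the statement can be left untouched in the IR.
            failed[stmt["name"]] = str(err)
    return lifted, failed


def toy_lift(stmt):
    """Stand-in for the real lifter; it only records what a map would need."""
    if stmt["name"] in SIMULATED_FAILURES:
        raise LiftError("simulated lifting failure")
    return {
        "map_domain": stmt["domain"],
        "access_kinds": [access["kind"] for access in stmt["accesses"]],
    }


jscop = json.loads(JSCOP_TEXT)
lifted, failed = partition_statements(jscop, toy_lift)
print("lifted:", list(lifted))
print("left in place:", list(failed))
```

With this split, only the statements in `lifted` would be cut out and replaced by library calls, while those in `failed` stay in the original code. Whether that is sound depends on the block question above, since the cutout works on blocks and not on statements.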
My motivation is to use transfer tuning and also to apply non-polyhedral optimizations (e.g., local storage), so I cannot use the integrated code generator for ISL-based schedules.
Any ideas, comments, and input are very welcome!
Cheers,
Lukas