Currently, all NVPTX kernels generated by Polly-ACC are named "kernel_#" (# = 0, 1, 2, 3, ...). While profiling, we would have to find the host function that called a kernel by searching the source code for the point where the kernel is launched. This task becomes even more arduous as the number of kernels increases.
Furthermore, it might become impossible to find the kernel-host function association when Polly is used through a JIT, i.e. different functions with GPU-friendly SCoPs would end up calling kernels of the same name, although with different definitions. E.g. nvprof on Julia running PolyBench.jl:
==4776== Profiling application: usr/bin/julia --check-bounds=no -g0
==4776== Profiling result:
Time(%) Time Calls Avg Min Max Name
76.16% 24.1500s 25701 939.65us 115.55us 46.950ms kernel_0
19.28% 6.11307s 25054 244.00us 115.46us 22.412ms kernel_1
[...]