On Thu, Dec 22, 2022 at 07:22:20PM -0800, Steve Luo wrote:
> Sorry, maybe I do not clarify my process clearly. Surely tiling and binding
> are all processed in schedule tree before ast generation.
If you had done the binding before the AST generation,
the AST would look something like this:
    if (BX == 0 && BY >= 0 && BY <= 1)
      for (int c3 = 0; c3 <= 63; c3 += 1)
        for (int c4 = 0; c4 <= 31; c4 += 1)
          for (int c5 = 0; c5 <= 31; c5 += 1)
            S(64 * BY + c3, c4, c5);
and it would already have blockIdx.x (BX) exactly where it is needed
(I'm assuming here that you want to map the outer dimensions to
blockIdx.y and blockIdx.x in that order).
> And the for loops
> above are just the corresponding ast of the schedule tree (that I want)
> after each operation. I need blockIdx.x exist even though it's gridDim.x =
> 1, because I want to reuse the generated code for other workloads like
>
>     for (int i = 0; i < 128; i++)
>       for (int j = 0; j < 64; j++)
>         for (int k = 0; k < 32; k++)
>           C[i][j] += A[i][k] * B[j][k]
Then you should generate an AST for the generic case
    for (int i = 0; i < 128; i++)
      for (int j = 0; j < J; j++)
        for (int k = 0; k < 32; k++)
          C[i][j] += A[i][k] * B[j][k]
and you get something like
    if (BX >= 0 && BX <= 31 && BY >= 0 && BY <= 1)
      for (int c1 = BX; c1 <= floord(J - 1, 32); c1 += 32)
        for (int c3 = 0; c3 <= 63; c3 += 1)
          for (int c4 = 0; c4 <= min(31, J - 32 * c1 - 1); c4 += 1)
            for (int c5 = 0; c5 <= 31; c5 += 1)
              S(64 * BY + c3, 32 * c1 + c4, c5);
and then you can plug in specific values of J afterwards.
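
To make "plug in specific values of J afterwards" concrete, here is a rough
host-side sketch in plain C that runs the generic AST above once per
(BX, BY) pair and checks that every (i, j, k) is executed exactly once.
The helper names (generic_block, generic_covers_exactly_once, imin) and the
grid_x/grid_y arguments are made up for this illustration; they stand in
for a real kernel launch and are not something isl or PPCG produces.

```c
#include <assert.h>
#include <stdlib.h>

/* Floor division as used in isl-generated code (assumes d > 0). */
static int floord(int n, int d)
{
    return n >= 0 ? n / d : -((-n + d - 1) / d);
}

static int imin(int a, int b) { return a < b ? a : b; }

/* The generic AST for one block; S(i, j, k) is replaced by a hit counter. */
static void generic_block(int BX, int BY, int J, unsigned char *hits)
{
    if (BX >= 0 && BX <= 31 && BY >= 0 && BY <= 1)
        for (int c1 = BX; c1 <= floord(J - 1, 32); c1 += 32)
            for (int c3 = 0; c3 <= 63; c3 += 1)
                for (int c4 = 0; c4 <= imin(31, J - 32 * c1 - 1); c4 += 1)
                    for (int c5 = 0; c5 <= 31; c5 += 1) {
                        size_t i = 64 * BY + c3, j = 32 * c1 + c4, k = c5;
                        hits[(i * J + j) * 32 + k] += 1;
                    }
}

/* Hypothetical stand-in for a launch with gridDim = (grid_x, grid_y).
 * Returns 1 if every point of 0 <= i < 128, 0 <= j < J, 0 <= k < 32
 * is executed exactly once, 0 otherwise. */
int generic_covers_exactly_once(int J, int grid_x, int grid_y)
{
    assert(J >= 1);
    size_t n = (size_t)128 * J * 32;
    unsigned char *hits = calloc(n, 1);
    if (!hits)
        return 0;
    for (int BY = 0; BY < grid_y; BY++)
        for (int BX = 0; BX < grid_x; BX++)
            generic_block(BX, BY, J, hits);
    int ok = 1;
    for (size_t p = 0; p < n; p++)
        if (hits[p] != 1)
            ok = 0;
    free(hits);
    return ok;
}
```

With this sketch, generic_covers_exactly_once(32, 1, 2) and
generic_covers_exactly_once(64, 2, 2) both succeed, so the same code serves
both workloads and only the launch parameters change. Launching J = 64 with
grid_x = 1 fails the check, because c1 = 1 is only reached by BX = 1.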
> here j changes from 32 to 64, but if it uses the same tiling size, it will
> share the generated code same with (128, 32, 32)'s, and all I need is to
> configure a different launch parameters(change gridDim.x) without compiling
> the workload again. And to keep k0-loop is for the same reason. If I can
> preserve the k0-loop even though it's extent is 1, then I can adjust the
> k0-loop's extent to K/32(*outside of isl*) as below
You fundamentally cannot first generate code for one specific value of J and
then hope to be able to generalize this to other values of J.
It may look like it could be done in very specific cases,
but you'll run into all sorts of issues very quickly. You already have.
isl will exploit the specific value of J in ways that you may not have imagined.
isl doesn't even know about the concept of any J parameter since you've
only told it about a specific value (32).
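
As a small illustration of what goes wrong, the following C sketch takes
the AST that was generated for J = 32 (the first one in this message) and
"launches" it on a wider grid, recording which values of j are ever touched.
The function names and the grid_x argument are invented for the example and
only simulate a launch on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* The AST specialized for J = 32; S(i, j, k) is replaced by marking j. */
static void specialized_block(int BX, int BY, int J, unsigned char *j_seen)
{
    if (BX == 0 && BY >= 0 && BY <= 1)
        for (int c3 = 0; c3 <= 63; c3 += 1)
            for (int c4 = 0; c4 <= 31; c4 += 1)
                for (int c5 = 0; c5 <= 31; c5 += 1) {
                    int j = c4;          /* second argument of S */
                    if (j < J)
                        j_seen[j] = 1;
                }
}

/* Hypothetical stand-in for a launch with gridDim = (grid_x, 2).
 * Returns how many distinct j in [0, J) the specialized code reaches,
 * or -1 if allocation fails. */
int columns_reached_by_specialized(int J, int grid_x)
{
    assert(J >= 1);
    unsigned char *j_seen = calloc((size_t)J, 1);
    if (!j_seen)
        return -1;
    for (int BY = 0; BY < 2; BY++)
        for (int BX = 0; BX < grid_x; BX++)
            specialized_block(BX, BY, J, j_seen);
    int count = 0;
    for (int j = 0; j < J; j++)
        count += j_seen[j];
    free(j_seen);
    return count;
}
```

For J = 64 this reaches only 32 of the 64 columns no matter how large
grid_x is: the guard BX == 0 discards every other block, and the second
argument of S no longer depends on any block or tile index. Both are
simplifications that are only valid because J was fixed to 32.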
skimo