Making outermost loop sequential


Shilpa B

Sep 21, 2021, 10:47:11 AM
to Polly Development
Hi,

For the "gemm" kernel, Polly generates the following parallel, tiled code (problem size 4096, tile size 32):

    #pragma omp parallel for
    for (int c0 = 0; c0 <= 127; c0 += 1)
      for (int c1 = 0; c1 <= 127; c1 += 1)
        #pragma minimal dependence distance: 1
        for (int c2 = 0; c2 <= 127; c2 += 1)
          for (int c3 = 0; c3 <= 31; c3 += 1)
            for (int c4 = 0; c4 <= 31; c4 += 1) {
              if (c2 == 0)
                Stmt1(32 * c0 + c3, 32 * c1 + c4);
              #pragma minimal dependence distance: 1
              for (int c5 = 0; c5 <= 31; c5 += 1)
                Stmt2(32 * c0 + c3, 32 * c1 + c4, 32 * c2 + c5);
            }

However, my requirements call for the following code instead:

    for (int i0 = 0; i0 <= 4095; i0 += 1)
      #pragma omp parallel for
      for (int c1 = 0; c1 <= 127; c1 += 1)
        #pragma minimal dependence distance: 1
        for (int c2 = 0; c2 <= 127; c2 += 1)
          for (int c4 = 0; c4 <= 31; c4 += 1) {
            if (c2 == 0)
              Stmt1(i0, 32 * c1 + c4);
            #pragma minimal dependence distance: 1
            for (int c5 = 0; c5 <= 31; c5 += 1)
              Stmt2(i0, 32 * c1 + c4, 32 * c2 + c5);
          }

Here the outermost loop is sequential and untiled, while the second-outermost loop is parallelized.
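For reference, the requested loop shape can be written by hand and checked against a plain triple loop. The sketch below is only an illustration under assumptions that are not in the thread: the bodies of `Stmt1` and `Stmt2` are modeled on a PolyBench-style gemm (scale `C` by a constant, then accumulate the scaled product of `A` and `B`), the constants 0.5 and 1.5 are arbitrary, and the problem size is shrunk from 4096 to 64 so the check runs quickly. The helper names are hypothetical.

```c
#include <assert.h>

#define N 64    /* assumed small size for a quick check; the thread uses 4096 */
#define TILE 32 /* tile size from the thread */

static double A[N][N], B[N][N], C[N][N], Ref[N][N];

/* Assumed statement bodies, modeled on a PolyBench-style gemm. */
static void Stmt1(int i, int j) { C[i][j] *= 0.5; }
static void Stmt2(int i, int j, int k) { C[i][j] += 1.5 * A[i][k] * B[k][j]; }

/* Fill the inputs and compute the untransformed reference result in Ref. */
static void init_and_reference(void) {
  assert(N % TILE == 0);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      A[i][j] = (i + j) % 7;
      B[i][j] = (i * j) % 5;
      C[i][j] = Ref[i][j] = (i - j) % 3;
    }
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      Ref[i][j] *= 0.5;
      for (int k = 0; k < N; k++)
        Ref[i][j] += 1.5 * A[i][k] * B[k][j];
    }
}

/* Requested shape: sequential, untiled i0; parallel tile loop c1. */
static void gemm_requested_shape(void) {
  for (int i0 = 0; i0 < N; i0++) {
    #pragma omp parallel for
    for (int c1 = 0; c1 < N / TILE; c1++)
      for (int c2 = 0; c2 < N / TILE; c2++)
        for (int c4 = 0; c4 < TILE; c4++) {
          if (c2 == 0)
            Stmt1(i0, TILE * c1 + c4);
          for (int c5 = 0; c5 < TILE; c5++)
            Stmt2(i0, TILE * c1 + c4, TILE * c2 + c5);
        }
  }
}

/* Largest absolute difference between C and the reference result. */
static double max_abs_error(void) {
  double err = 0.0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      double d = C[i][j] - Ref[i][j];
      if (d < 0) d = -d;
      if (d > err) err = d;
    }
  return err;
}
```

Different values of `c1` write disjoint columns of `C`, so running the `c1` loop in parallel is race-free, while the `c2` loop accumulates into the same elements and stays sequential. Without OpenMP enabled the pragma is ignored and the nest runs sequentially.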
I tried options such as modifying the JScop file, but I could only generate the loops shown in (a) and (b) below:

a)

    #pragma omp parallel for
    for (int i0 = 0; i0 <= 4095; i0 += 1)
      for (int c1 = 0; c1 <= 127; c1 += 1)
        #pragma minimal dependence distance: 1
        for (int c2 = 0; c2 <= 127; c2 += 1)
          for (int c4 = 0; c4 <= 31; c4 += 1) {
            if (c2 == 0)
              Stmt1(i0, 32 * c1 + c4);
            #pragma minimal dependence distance: 1
            for (int c5 = 0; c5 <= 31; c5 += 1)
              Stmt2(i0, 32 * c1 + c4, 32 * c2 + c5);
          }

and

b)

    for (int i0 = 0; i0 <= 1; i0 += 1)
      #pragma omp parallel for
      for (int c1 = 0; c1 <= 127; c1 += 1)
        #pragma minimal dependence distance: 1
        for (int c2 = 0; c2 <= 127; c2 += 1)
          for (int c4 = 0; c4 <= 31; c4 += 1) {
            if (c2 == 0)
              Stmt1(0, 32 * c1 + c4);
            #pragma minimal dependence distance: 1
            for (int c5 = 0; c5 <= 31; c5 += 1)
              Stmt2(0, 32 * c1 + c4, 32 * c2 + c5);
          }

In (b), the problem size for i0 was initially 1. I then tried changing the domain in the JScop file to grow the iteration space of i0 from 1 to 4096, and updated the schedule and access functions to include i0. When I import the modified JScop file, Polly detects the change in the access functions, but the changes to the domain, schedule, and access functions are not reflected in the generated code.

Is there any other way I can generate the loop as I have indicated?

Thanks,
Shilpa

Michael Kruse

Sep 21, 2021, 11:47:37 AM
to Shilpa B, Polly Development
Generally, I recommend against changing the schedule by modifying
the JScop file. It is intended for regression testing. Only the schedule
and access functions can be modified, not domains or loop
properties (such as #pragma omp parallel for). If you don't observe
changes in the schedule, it is probably because you are running the
ScheduleOptimizer after importing the JScop file, which throws away
your imported schedule and computes a new one.

Michael

On Tue, Sep 21, 2021 at 09:47, Shilpa B <shil...@gmail.com> wrote:
> Is there any other way I can generate the loop as I have indicated?

Short answer: Not with a JScop file. Which loop is marked with #pragma
omp parallel for is computed, not determined by the JScop file.
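As a rough illustration of what "computed" means here: a loop can run in parallel when no data dependence is carried by it, that is, when no dependence has a source and sink that agree on all outer loop counters but differ in that loop's counter. The sketch below brute-forces this check for the requested loop nest. It is a toy model, not Polly's implementation: the function names are hypothetical, the schedule is written out by hand, and only the accumulation dependence Stmt2[i, j, k] -> Stmt2[i, j, k + 1] is modeled.

```c
#include <assert.h>

#define TILE 32 /* tile size from the thread */
#define DIMS 5  /* loop depth of the requested nest: i0, c1, c2, c4, c5 */

/* Schedule of the requested nest for Stmt2[i, j, k]:
   (i0, c1, c2, c4, c5) = (i, j / 32, k / 32, j % 32, k % 32). */
static void schedule(int i, int j, int k, int t[DIMS]) {
  t[0] = i;
  t[1] = j / TILE;
  t[2] = k / TILE;
  t[3] = j % TILE;
  t[4] = k % TILE;
}

/* Returns 1 if schedule dimension `dim` carries no dependence of the form
   Stmt2[i, j, k] -> Stmt2[i, j, k + 1] for an n x n x n problem, else 0.
   A dimension carries a dependence when source and sink agree on all outer
   dimensions but differ in this one. */
static int loop_is_parallel(int dim, int n) {
  assert(dim >= 0 && dim < DIMS);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k + 1 < n; k++) {
        int src[DIMS], dst[DIMS];
        schedule(i, j, k, src);
        schedule(i, j, k + 1, dst);
        int outer_equal = 1;
        for (int d = 0; d < dim; d++)
          if (src[d] != dst[d])
            outer_equal = 0;
        if (outer_equal && src[dim] != dst[dim])
          return 0; /* this loop carries the dependence */
      }
  return 1;
}
```

In this model the c2 and c5 dimensions (indices 2 and 4) carry the dependence, which lines up with the `#pragma minimal dependence distance: 1` annotations on those loops in the thread, while i0 and c1 (indices 0 and 1) are both free of carried dependences. That is consistent with variant (a), where the pragma ended up on the i0 loop.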

Michael

--
Tardyzentrismus verboten!

Shilpa B

Sep 21, 2021, 12:45:45 PM
to re...@meinersbur.de, Polly Development
Hi Michael,

Thank you for your input. Is there any other way to achieve this? Could you give me some pointers?

Thanks,
Shilpa

Michael Kruse

Sep 21, 2021, 4:55:06 PM
to Shilpa B, Michael Kruse, Polly Development
Such a transformation would be done on the schedule tree. See e.g.
ScheduleOptimizer.cpp for how it adds tiling, applies loop interchange,
marks loops as parallel, etc.
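To make the difference between the two loop nests concrete, it can help to view each one as a schedule: a mapping from a statement instance (i, j, k) to a tuple of loop counters. The sketch below uses a hypothetical helper, `tile_band`, that tiles a band with a per-dimension tile size, where 0 means "leave this dimension untiled". It is plain C and does not use isl or Polly. In isl terms the partial tiling would presumably correspond to splitting the band (`isl_schedule_node_band_split`) and tiling only the inner part (`isl_schedule_node_band_tile`), but that mapping is an assumption, not something stated in the thread.

```c
#include <assert.h>

/* Hypothetical helper: map a point of an n-dimensional band to its schedule
   tuple after tiling. tile[d] == 0 leaves dimension d untiled. Tile-loop
   coordinates (or the untiled coordinate itself) come first, followed by the
   point-loop coordinates of the tiled dimensions. Coordinates are assumed
   non-negative. Returns the length of the tuple written to `out`. */
static int tile_band(const int *point, const int *tile, int n, int *out) {
  int len = 0;
  assert(n > 0);
  /* Outer part: one entry per original dimension. */
  for (int d = 0; d < n; d++)
    out[len++] = tile[d] ? point[d] / tile[d] : point[d];
  /* Inner part: point loops, only for the dimensions that were tiled. */
  for (int d = 0; d < n; d++)
    if (tile[d])
      out[len++] = point[d] % tile[d];
  return len;
}
```

With tile sizes {32, 32, 32}, the instance (37, 40, 70) maps to the six counters (c0, ..., c5) = (1, 1, 2, 5, 8, 6) of the nest Polly generated. With {0, 32, 32} it maps to the five counters (i0, c1, c2, c4, c5) = (37, 1, 2, 8, 6) of the requested nest, so the requested code amounts to tiling only the two inner band members.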

Michael
--
Tardyzentrismus verboten!

Shilpa B

Sep 22, 2021, 12:54:00 PM
to re...@meinersbur.de, Polly Development
Thanks for your input, Michael. I am exploring the implementation and will get back to you if I need any further help.

Thanks,
Shilpa