RFC : OpenMP Dialect in MLIR


kiran

Dec 6, 2019, 8:28:07 AM
to MLIR, Alex Zinenko, esch...@nvidia.com, jdoe...@anl.gov, cleme...@gmail.com, rofi...@gmail.com, Mehdi AMINI, David...@arm.com
Hi all,

This email introduces the MLIR dialect for OpenMP. The dialect was briefly discussed on the MLIR mailing list before (link below). The primary user of this dialect will be the Flang/F18 compiler, which is currently under construction. It is hoped that other frontends can also use this dialect as and when they are ready to use MLIR. The primary target for the dialect is LLVM IR, and the design intends to re-use code from Clang to achieve this.
https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/4Aj_eawdHiw

The proposed design for adding OpenMP support to Flang/F18 can be seen in slide 10 of the presentation (link below) given to the Flang/F18 community. 

https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view?usp=sharing
The design uses the following two components.
a) MLIR: The Flang/F18 compiler uses the MLIR-based FIR dialect as its IR. FIR models the Fortran language portion but does not have a representation for OpenMP constructs. By using MLIR for OpenMP as well, we have a common representation for OpenMP and Fortran constructs in the MLIR framework, and can thereby take advantage of optimisations and avoid black boxes.
b) OpenMP IRBuilder: This is for reusing the OpenMP codegen of Clang. The OpenMP IRBuilder project refactors the codegen for OpenMP directives out of Clang and places it in the LLVM directory, so that both Clang and Flang can share the code. For details see the link below.
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html 

ii) Current and Proposed Flow in F18/Flang

a) The current sequential code flow in Flang (Slide 5 of the presentation) can be summarised as follows:
https://drive.google.com/file/d/1vU6LsblsUYGA35B_3y9PmBvtKOTXj1Fu/view?usp=sharing
[Fortran code] -> Parser -> [AST] -> Lowering -> [FIR MLIR] -> Conversion -> [LLVM MLIR] -> Translation -> [LLVM IR]
b) The modified flow with OpenMP (Slide 10) will lower the AST to a mix of the FIR and OpenMP dialects. These are then optimised and finally converted to a mix of the OpenMP and LLVM MLIR dialects. This mix is translated to LLVM IR using the existing translation library for LLVM MLIR and the OpenMP IRBuilder currently under construction.
[Fortran code] -> Parser -> [AST] -> Lowering -> [FIR + OpenMP MLIR] -> Conversion -> [LLVM + OpenMP MLIR] -> Translation (Use OpenMP IRBuilder) -> [LLVM IR]
c) The MLIR infrastructure provides many optimisation passes for loops, and it is desirable that we take advantage of some of these. But the LLVM infrastructure also provides several optimisations, so there are open questions about where the optimisations should be carried out. We will decide which framework to use only after some experimentation. If we decide that an OpenMP construct (e.g. collapse) can be handled fully in MLIR and that this is the best place to do it (based on experiments), then we will not use the OpenMP IRBuilder for that construct.

iii) OpenMP MLIR dialect
Operations of the dialect will be a mix of fine- and coarse-grained ones, e.g. coarse: omp.parallel, omp.target; fine: omp.flush. Operations in MLIR can have regions, hence there is no need for outlining at the MLIR level. While the detailed design of the dialect is TBD, the next section provides some walkthrough examples which summarise the full flow as well as the use of MLIR operations for OpenMP directives and of attributes for representing clauses which are constant. The proposed plan involves a) lowering the F18 AST with OpenMP directly to a mix of the OpenMP and FIR dialects, and b) finally converting this to a mix of the OpenMP and LLVM dialects. This requires that the OpenMP dialect can coexist and operate with other dialects. The design is also intended to be modular so that other frontends (C/C++) can reuse the OpenMP dialect in the future.
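
As a rough illustration of operations that carry regions and constant attributes, here is a minimal Python sketch. It is not MLIR's API: the `Op` class, its fields, and the `walk`/`dialects` helpers are hypothetical names chosen for this example, and the dialect prefix is simply taken from the operation name.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str                                        # e.g. "omp.parallel" or "fir.do"
    attributes: dict = field(default_factory=dict)   # constant clauses, e.g. collapse
    region: list = field(default_factory=list)       # nested operations, any dialect

def walk(op):
    # Visit an operation and everything nested in its region, outermost first.
    yield op
    for nested in op.region:
        yield from walk(nested)

def dialects(op):
    # Dialect prefix of every operation reachable from `op`.
    return {o.name.split(".")[0] for o in walk(op)}

# omp.parallel { omp.do {collapse = 2} { fir.do { fir.do { } } } }
example = Op("omp.parallel", region=[
    Op("omp.do", attributes={"collapse": 2}, region=[
        Op("fir.do", region=[Op("fir.do")]),
    ]),
])
```

Because the nested code stays inside the region of the OpenMP operation, operations from two dialects sit in one tree and nothing has to be outlined at this level.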

iv) Examples
A few example walkthroughs were sent to the flang mailing list before. Each walkthrough illustrates the flow for one construct (parallel, target, collapse, simd). I am including the parallel and collapse walkthroughs here and will leave pointers to the others.
Example 1: Parallel construct

1) Example OpenMP code

<Fortran code>

!$omp parallel
c = a + b
!$omp end parallel

<More Fortran code>

 


2) Parse tree (Copied relevant section from -fdebug-dump-parse-tree)

<Fortran parse tree>

| | ExecutionPartConstruct -> ExecutableConstruct -> OpenMPConstruct -> OpenMPBlockConstruct
| | | OmpBlockDirective -> Directive = Parallel
| | | OmpClauseList ->
| | | Block
| | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> AssignmentStmt
| | | | | Variable -> Designator -> DataRef -> Name = 'c'
| | | | | Expr -> Add
| | | | | | Expr -> Designator -> DataRef -> Name = 'a'
| | | | | | Expr -> Designator -> DataRef -> Name = 'b'
| | | OmpEndBlockDirective -> OmpBlockDirective -> Directive = Parallel

<More Fortran parse tree>


3) The first lowering will be to a mix of FIR dialect and OpenMP dialect. The OpenMP dialect has an operation called parallel with a nested region of code. The nested region will have FIR (and standard dialect) operations.


Mlir.region(…) {
  %1 = fir.x(…)
  %20 = omp.parallel {
    %1 = addf %2, %3 : f32
  }
  %21 = <more fir>
}



4) The next lowering will be to a mix of the OpenMP and LLVM dialects.

Mlir.region(…) {
  %1 = llvm.xyz(...)
  %20 = omp.parallel {
    %1 = llvm.fadd %2, %3 : !llvm.float
  }
  %21 = <more llvm dialect>
}



5) The next conversion will be to LLVM IR. Here the OpenMP dialect will be lowered using the OpenMP IRBuilder together with the translation library of the LLVM dialect. The IRBuilder will see that there is a region under the OpenMP construct omp.parallel. It will collect all the basic blocks inside that region and then generate outlined code using those basic blocks. Suitable calls to the OpenMP runtime API will be inserted.

 

define @outlined_parallel_fn(...)
{
  ....
  %1 = fadd float %2, %3
  ...
}

define @xyz(…)
{
  %1 = alloca float
  ....
  call __kmpc_fork_call(...,outlined_parallel_fn,...)
}
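
To make the outlining step more concrete, here is a minimal Python sketch of the idea. It is not the OpenMP IRBuilder or the OpenMP runtime: `fork_call` is a hypothetical stand-in for the runtime fork call, `num_threads` and the `results` list are assumptions added so that the effect can be observed, and the real runtime entry point has a different signature.

```python
import threading

def outlined_parallel_fn(thread_id, shared, results):
    # Body of the original parallel region: c = a + b.
    # Every thread in the team executes the same outlined body.
    results[thread_id] = shared["a"] + shared["b"]

def fork_call(outlined_fn, num_threads, *args):
    # Hypothetical stand-in for the runtime fork call: start a team of
    # threads, run the outlined function on each, then wait for all of them.
    team = [threading.Thread(target=outlined_fn, args=(tid, *args))
            for tid in range(num_threads)]
    for t in team:
        t.start()
    for t in team:
        t.join()

def xyz(a, b, num_threads=4):
    # The caller after outlining: variables used inside the region are
    # passed explicitly to the outlined function through the fork call.
    shared = {"a": a, "b": b}
    results = [None] * num_threads
    fork_call(outlined_parallel_fn, num_threads, shared, results)
    return results
```

The point of the sketch is the shape of the transformation: the region body becomes a separate function, and the original function is left with a single call that hands that function to the runtime.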

For the simd and target constructs, refer to the links at the end of Example 2.


Example 2: Collapse construct

A walkthrough for the collapse clause on an OpenMP loop construct is given below. This is an example where the transformation (collapse) is performed in the MLIR layer itself.


1) Fortran OpenMP code with collapse

!$omp parallel do private(j) collapse(2)
do i=lb1,ub1
  do j=lb2,ub2
    ...
    ...
  end do
end do


2) The Fortran source with OpenMP will be converted to an AST by the F18 parser. Parse tree not shown here to keep it short.


3)

a) The parse tree will be lowered to a mix of the FIR and OpenMP dialects. The OpenMP dialect has omp.parallel and omp.do operations, which represent the parallel and loop constructs respectively. The omp.do operation has an attribute "collapse" which specifies the number of loops to be collapsed.
Mlir.region(…) {
  omp.parallel {
    omp.do {collapse = 2} {
      fir.do %i = %lb1 to %ub1 : !fir.integer {
        fir.do %j = %lb2 to %ub2 : !fir.integer {
          ...
        }
      }
    }
  }
}

b) A transformation pass in MLIR will perform the collapsing. The collapse attribute will cause the loop directly under omp.do to be coalesced with the loop nested immediately inside it. Note: MLIR already has a loop coalescing pass among its transformation passes, and we should try to make use of it.
Mlir.region(…) {
  omp.parallel {
    omp.do {
      fir.do %i = 0 to %ub3 : !fir.integer {
        ...
      }
    }
  }
}
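
The effect of collapsing two loops into one can be sketched in Python as follows. This is only an illustration of the index arithmetic, not the MLIR pass: the function names are hypothetical, unit stride is assumed, and the collapsed upper bound (written %ub3 in the walkthrough) is assumed here to be the product of the two trip counts minus one.

```python
def nested_indices(lb1, ub1, lb2, ub2):
    # The original loop nest. Fortran do-loop bounds are inclusive.
    for i in range(lb1, ub1 + 1):
        for j in range(lb2, ub2 + 1):
            yield i, j

def collapsed_indices(lb1, ub1, lb2, ub2):
    # Trip counts of the two loops (zero if a loop does not execute).
    n1 = max(ub1 - lb1 + 1, 0)
    n2 = max(ub2 - lb2 + 1, 0)
    # One loop over the combined iteration space, from 0 to n1 * n2 - 1.
    for k in range(n1 * n2):
        # Recover the original induction variables from the linear index.
        i = lb1 + k // n2
        j = lb2 + k % n2
        yield i, j
```

Both generators visit the same (i, j) pairs in the same order, but the collapsed form exposes a single, larger iteration space that can be divided among threads.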


4) The next conversion will be to a mix of the LLVM and OpenMP dialects.
Mlir.region(…) {
  omp.parallel {
    %ub3 =
    omp.do %i = 0 to %ub3 : !llvm.integer {
      ...
    }
  }
}


5) Finally, LLVM IR will be generated for this code. The translation to LLVM IR can make use of the OpenMP IRBuilder. LLVM IR not shown here to keep it short.

For the simd and target constructs, refer to the links below.
simd: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000278.html
target: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html

v) Progress
i) OpenMP MLIR 
-> The first patch, which registers the OpenMP dialect with MLIR, has been submitted and merged.
https://github.com/tensorflow/mlir/pull/244
-> An implementation of a minimal OpenMP dialect with a single construct (barrier) is now available. It will need to be extended to generate LLVM IR.
https://github.com/tensorflow/mlir/pull/275
ii) OpenMP IRBuilder
Johannes Doerfert has a series of patches introducing preliminary support for the OpenMP IRBuilder, which are either approved or under review. The initial set adds support for the parallel and barrier constructs. Others (Roger Ferrer, Fady Ghanim, Kiran) have tried it for constructs like taskwait, flush etc.
https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0
https://reviews.llvm.org/D69785, https://reviews.llvm.org/D70290, https://reviews.llvm.org/D69853

vi) Next Steps
-> Experiment with the co-existence of the OpenMP and LLVM dialects and their lowering to LLVM IR using the OpenMP IRBuilder.
-> Implement on a construct-by-construct basis, starting with the barrier, flush and parallel constructs. For each construct:
   -> Represent the construct in OpenMP MLIR
   -> Refactor the code for the construct in the OpenMP IRBuilder
   -> Set up the translation library for OpenMP in MLIR to call the OpenMP IRBuilder
   -> Set up the transformation from the frontend to OpenMP MLIR for this construct
   -> Upstream the changes

Thanks,
Kiran

Valentin Clément

Dec 6, 2019, 9:18:46 AM
to MLIR
Hi Kiran, 

This looks interesting to us, as we are exploring an OpenACC dialect for MLIR and I can imagine a lot of similarity with your proposal. I have added a few questions/remarks inline.
Do you also plan to be able to work with other loops (the ones from the Loop or Affine dialects)? Our prototype for OpenACC would be very similar for this construct. So, for example, it could accept FIR loops or the "standard" loops.
acc.parallel {
  acc.loop {
    loop.for %arg3 = %lb2 to %ub1 step %c1 {
      loop.for %arg4 = %lb1 to %ub2 step %c1 {
        ...
      }
    }
  } attributes { collapse = 2 }
} attributes { async = 1 }

I can imagine the OMP target offloading to be pretty similar to OpenACC and I could imagine a lowering to a common "accelerator" dialect or something like that. 

Have you already thought about how to model host/device transfers for omp target?

kiran

Dec 6, 2019, 11:24:23 AM
to Valentin Clément, MLIR
Thanks Clement for your reply.

Yes, the design intends for other loops to work with the OpenMP dialect as well. The loops and operations in the acc and OpenMP dialects look very similar.

For the target construct, we will depend on the OpenMP IRBuilder (which will have refactored code from Clang) to generate the code.
http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-September/000285.html

--
You received this message because you are subscribed to the Google Groups "MLIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mlir+uns...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/mlir/f8cc935e-35c3-4657-b153-736eed0fddce%40tensorflow.org.


--
regards,
Kiran C

Kiran Chandramohan

Jan 21, 2020, 6:55:28 PM
to MLIR
Hi,

Should this RFC be posted on Discourse, since not much discussion happened here and this is not the right group anymore?

Also please let me know if I should be posting this RFC differently or adding more information.

I missed one point in the RFC regarding who will be maintaining the OpenMP dialect. The Fortran teams in Nvidia (PGI), Arm, AMD and some members from the US National Labs who are part of the flang community will be maintainers. Should this information be more detailed like the names of the people?

The initial patches to add one construct (barrier) to the dialect and lower it to LLVM IR are under review now.
https://reviews.llvm.org/D72400
https://reviews.llvm.org/D72962

Thanks,
Kiran

Alex Zinenko

Jan 22, 2020, 3:37:01 AM
to Kiran Chandramohan, MLIR
Yes, please post this on discourse.



--
Alex