[llvm-dev] MLIR for clang

Prashanth N R via llvm-dev

Feb 16, 2020, 4:16:57 AM
to llvm-dev, flan...@lists.llvm.org
Starting in May-June, we at "Compiler Tree" will begin porting the Clang compiler to use MLIR as its middle-end target. If someone has already started a similar effort, we would love to collaborate with them. If anyone would like to work with us, we are ready to form a group and collaborate. If there are opportunities to share work with the Fortran side, we would like to consider them.

We are in the early design phase for the "C" part of the work. Based on our experience with the (FC+MLIR) compiler, we estimate that an early cut of the compiler will handle non-trivial workloads within a quarter of starting work.

Please ping me with any queries or concerns.

Regards,
-Prashanth

Prashanth N R via llvm-dev

Feb 16, 2020, 4:22:41 AM
to llvm-dev, flan...@lists.llvm.org, cfe...@lists.llvm.org
+cfe-dev

Nicolai Hähnle via llvm-dev

Feb 16, 2020, 6:04:57 AM
to Prashanth N R, llvm-dev, cfe-dev, flan...@lists.llvm.org
Hi Prashanth,

On Sun, Feb 16, 2020 at 10:22 AM Prashanth N R via llvm-dev
<llvm...@lists.llvm.org> wrote:
>> Starting in May-June, we at "Compiler Tree" will begin porting the Clang compiler to use MLIR as its middle-end target. If someone has already started a similar effort, we would love to collaborate with them. If anyone would like to work with us, we are ready to form a group and collaborate. If there are opportunities to share work with the Fortran side, we would like to consider them.

That's a rather vague statement, considering the flexibility of MLIR.
Could you explain your plans in more detail, and what specifically you
hope to achieve with them?

Cheers,
Nicolai


--
Learn how the world really is,
but never forget how it should be.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Michael Kruse via llvm-dev

Feb 16, 2020, 3:41:36 PM
to Prashanth N R, llvm-dev, flan...@lists.llvm.org
Can you elaborate on what your approach is? Do you intend to fork clang
for MLIR at a specific version, keep up to date with master, and/or try
to upstream this?

Do you think MLIR has all the semantics required, such as for
representing exceptions?

Michael

On Sun, Feb 16, 2020 at 03:16, Prashanth N R via flang-dev
<flan...@lists.llvm.org> wrote:

> _______________________________________________
> flang-dev mailing list
> flan...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/flang-dev

Prashanth N R via llvm-dev

Feb 17, 2020, 11:58:41 AM
to Michael Kruse, llvm-dev, flan...@lists.llvm.org
Hi Michael-

1. We intend to fork clang for MLIR at a particular release and develop from there. We will most likely merge with master as soon as we reach a good milestone. Most of the development is expected to happen on GitHub or a similar version control system.
2. MLIR is extensible, and we are hoping that constructs like exceptions can be represented in MLIR. As we dive deeper into the design, we might be able to answer the question in detail.

thanks,
-Prashanth

C Bergström via llvm-dev

Feb 17, 2020, 12:13:21 PM
to Prashanth N R, llvm-dev, Michael Kruse, flan...@lists.llvm.org
On Tue, Feb 18, 2020 at 12:58 AM Prashanth N R via llvm-dev <llvm...@lists.llvm.org> wrote:
> Hi Michael-
>
> 1. We intend to fork clang for MLIR at a particular release and develop from there. We will most likely merge with master as soon as we reach a good milestone. Most of the development is expected to happen on GitHub or a similar version control system.
> 2. MLIR is extensible, and we are hoping that constructs like exceptions can be represented in MLIR. As we dive deeper into the design, we might be able to answer the question in detail.

Speaking first-hand here.

Note: There are a number of internal lowering processes between clang and llvm, so I'll use general terms to describe them for simplicity.

When "we" (PathScale) made clang emit High WHIRL instead of "whatever", it really wasn't as bad as some people around here may think. I'd guess it only took us about 2 years to go from zero to production quality and self-hosting. This was multiple engineers working concurrently and dealing with a lot of legacy. I could easily see it taking less time if you don't have to bring up advanced loop optimizations or care too much about EH stuff.

I'm not sure how much use or value it would be to anyone, but I have and control all of that code. Briefly, we hook into clang directly after the AST and then swap out the IR "codegen" for what we coined WhirlGen.

I must admit that I feel a bit smug that design choices I made 10 years ago are finally being taken seriously around here.


Prashanth N R via llvm-dev

Feb 17, 2020, 12:14:44 PM
to Nicolai Hähnle, llvm-dev, cfe-dev, flan...@lists.llvm.org
Currently, LLVM uses a low-level IR to represent programs. Memory disambiguation is not accurate for constructs like multidimensional arrays. One of the ways we currently work around this in LLVM is multiversioning of the code. By supporting a mid-level IR like MLIR, we intend to keep the access indices of multidimensional arrays and do better disambiguation.
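
To make the disambiguation point concrete, here is a minimal sketch in C++. The helper names are hypothetical and this is not code from LLVM or MLIR; it only assumes row-major layout. It shows that two accesses which are trivially distinct by their subscripts can become ambiguous once only the flattened offset is left and the bound on the inner index is unknown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; not code from Clang, LLVM, or MLIR.

// Row-major flattening of A[i][j] for an array with `cols` columns.
// After lowering, a low-level IR typically sees only this flat offset.
constexpr std::size_t linearize(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// With the subscripts preserved, two in-bounds accesses are provably
// distinct as soon as any one dimension differs.
constexpr bool distinct_by_subscripts(std::size_t i1, std::size_t j1,
                                      std::size_t i2, std::size_t j2) {
    return i1 != i2 || j1 != j2;
}

// With only flat offsets, the question becomes whether
// i1*cols + j1 == i2*cols + j2. Unless j1, j2 < cols is known,
// the answer can be "yes" even for different subscripts.
constexpr bool flat_offsets_collide(std::size_t i1, std::size_t j1,
                                    std::size_t i2, std::size_t j2,
                                    std::size_t cols) {
    return linearize(i1, j1, cols) == linearize(i2, j2, cols);
}

// (0, 8) and (1, 0) differ as subscripts but share flat offset 8 when
// cols == 8, because j == 8 is out of bounds for the inner dimension.
static_assert(distinct_by_subscripts(0, 8, 1, 0), "subscripts differ");
static_assert(flat_offsets_collide(0, 8, 1, 0, 8), "flat offsets collide");
```

An analysis working on flat offsets has to prove the inner index is in range before it can separate the two accesses; when it cannot, a runtime check plus a second code version is one fallback.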

thanks,
-Prashanth

Prashanth N R via llvm-dev

Feb 17, 2020, 12:25:19 PM
to Nicolai Hähnle, llvm-dev, cfe-dev, flan...@lists.llvm.org
Fred Chow is a well-known name in the compiler community. He was the architect of the Open64 compiler.
His comment on LLVM IR from the Open64 mailing list can be seen at: https://sourceforge.net/p/open64/mailman/message/23829398/
"From their name, LLVM roughly corresponds to Low WHIRL. I wonder how
LLVM tackles the compilation problems Open64 has tackled. People with
exposure to LLVM are welcome to chime in."


C Bergström via llvm-dev

Feb 17, 2020, 12:46:41 PM
to Prashanth N R, llvm-dev, flan...@lists.llvm.org, cfe-dev
When you actually start to solve and implement this, you'll find that "LLVM IR" is actually Mid WHIRL on an almost 1:1 basis (super close). However, we don't exactly do IR<>IR translation and instead hook in at an API level. So it's more about matching constructs and trying to "get in where you fit in". Since we had overlapping Mid WHIRL optimizations, we had to figure out which to turn on/off on each side.

We also skipped VH WHIRL for a number of reasons and just kind of cut it out. I don't think Fred is subscribed to the list, but he isn't the only smart person who worked on the compiler. There were a few managers, brilliant people and unsung heroes who worked at SGI.

Dror Maydan and Sun Chan were more intimately involved with the actual implementation of loop optimizations iirc. Fred's best known for his low level codegen work and overall vision of compiler architecture.

_______________________________________________
cfe-dev mailing list
cfe...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Hal Finkel via llvm-dev

Feb 17, 2020, 1:28:00 PM
to Prashanth N R, llvm-dev, cfe...@lists.llvm.org, Richard Smith

Hi, Prashanth,

I definitely recommend that we have a discussion first on design goals for this. You've mentioned modeling of multidimensional arrays, and I know you've also been thinking about OpenMP, and it would be good to lay out the desired end state.

Part of the reason I say this is because there are significant design decisions that I suspect will appear up front. Handling of multidimensional arrays is a good example. C/C++ certainly do have multidimensional arrays of static extent, but these are largely irrelevant for a significant fraction of production C++ use cases. This is because, in many cases, the array bounds are not known statically, or at least they're not all known statically, and so programmers make use of C++ wrapper libraries which provide the interface of multidimensional arrays implemented on top of one-dimensional heap-allocated data. If we create an infrastructure that works well for static multidimensional arrays but does not contain any provision for recognizing appropriate loop nests and also treating them using the multidimensional-array optimization infrastructure, we won't really improve the compiler in practice for many, if not most, relevant production users.
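
As an illustration of the wrapper pattern described above, here is a minimal sketch. The class `Matrix2D` and the function `fill_and_sum` are hypothetical names invented for this example and are not taken from any particular library. The extents are only known at run time and the storage is a single contiguous allocation, so after inlining the compiler sees `data_[i * cols_ + j]` rather than a two-dimensional subscript.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a 2-D interface over one contiguous heap
// allocation whose extents are only known at run time.
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Once inlined, this is just data_[i * cols_ + j]; no static
    // multidimensional array type survives for the optimizer to see.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// A loop nest that a multidimensional-array optimization would have to
// recognize from the flattened accesses alone.
double fill_and_sum(Matrix2D& m) {
    double total = 0.0;
    for (std::size_t i = 0; i < m.rows(); ++i) {
        for (std::size_t j = 0; j < m.cols(); ++j) {
            m(i, j) = static_cast<double>(i + j);
            total += m(i, j);
        }
    }
    return total;
}
```

An infrastructure that only handles arrays declared with static extents would not apply to `fill_and_sum`, even though the loop nest is semantically a plain 2-D traversal.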

It's also going to be important that we optimize loops that only look like loops after coroutines are analyzed and inlined. Regardless, there certainly are areas in which we could do a better job optimizing constructs (e.g., more devirtualization, optimization of exception handling and uses of RTTI), and it would be good to put everything out on the table so that decisions can be made based on use cases as opposed to being driven by the desire to use a particular tool.

Thanks again,

Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

John McCall via llvm-dev

Feb 17, 2020, 2:27:43 PM
to Hal Finkel, llvm-dev, Richard Smith, cfe...@lists.llvm.org
On 17 Feb 2020, at 13:27, Hal Finkel wrote:
> Hi, Prashanth,
>
> I definitely recommend that we have a discussion first on design goals
> for this. You've mentioned modeling of multidimensional arrays, and I
> know you've also been thinking about OpenMP, and it would be good to
> lay out the desired end state.

Is the goal for this to be an out-of-tree proof of concept, or is the
goal to eventually integrate this into LLVM and have Clang compile by
emitting MLIR as an intermediate stage? The latter would be a huge
project with a lot of uncertain trade-offs, but I think it would be very
interesting; whereas I’m afraid the former is not something I can
spare any time to think about.

John.

Michael Kruse via llvm-dev

Feb 17, 2020, 6:04:51 PM
to Prashanth N R, llvm-dev, Michael Kruse, flan...@lists.llvm.org
On Mon, Feb 17, 2020 at 10:58, Prashanth N R
<prasha...@gmail.com> wrote:

> 1. We intend to fork clang for MLIR at a particular release and develop from there. We will most likely merge with master as soon as we reach a good milestone. Most of the development is expected to happen on GitHub or a similar version control system.

Once you have done the conversion, I think it becomes infeasible to merge in
LLVM master on a regular basis. For instance, you will have changed
every occurrence of llvm::Instruction to mlir::Operation and its API,
affecting just about every line of code starting with clangCodeGen. Even if
you introduce a compatibility layer, upstream LLVM will continue to use
llvm::Instruction.

Another case is MLIR's use of BasicBlock arguments instead of PHINode.
This is a change we would potentially want in LLVM as well, but it would
require changing every line that assumes PHI nodes at once, so it has not
been done yet.
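
To show what the PHI-versus-block-argument difference amounts to, here is a toy model in C++. All types and the function `toBlockArguments` are hypothetical and deliberately do not mirror the real llvm:: or mlir:: classes. It only illustrates that the same merge information is stored in a different place: on the merge block itself (PHI style) or on each incoming branch (block-argument style).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only; these types are hypothetical and do not mirror the
// real llvm:: or mlir:: class hierarchies.

// PHI style: the merge block owns one incoming value per predecessor.
struct PhiNode {
    std::string result;                           // value defined by the phi
    std::map<std::string, std::string> incoming;  // predecessor -> value
};
struct PhiBlock {
    std::string name;
    std::vector<PhiNode> phis;
};

// Block-argument style: the block declares parameters, and each
// predecessor's branch supplies the matching operands.
struct ArgBlock {
    std::string name;
    std::vector<std::string> params;
};
struct Branch {
    std::string from, to;
    std::vector<std::string> operands;
};
struct Converted {
    ArgBlock block;
    std::vector<Branch> branches;
};

// Moves the per-predecessor values out of the phis and onto the branches.
// Assumes every phi lists the same set of predecessors.
Converted toBlockArguments(const PhiBlock& in) {
    Converted out;
    out.block.name = in.name;
    std::map<std::string, Branch> byPred;
    for (const PhiNode& phi : in.phis) {
        out.block.params.push_back(phi.result);
        for (const auto& entry : phi.incoming) {
            Branch& br = byPred[entry.first];
            br.from = entry.first;
            br.to = in.name;
            br.operands.push_back(entry.second);
        }
    }
    for (const auto& entry : byPred) out.branches.push_back(entry.second);
    return out;
}
```

Even in this toy form, code that walks `phis` has nothing to walk after the conversion, which hints at why every consumer that assumes PHI nodes would need to change together.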


Michael

C Bergström via llvm-dev

Feb 17, 2020, 7:30:51 PM
to Michael Kruse, llvm-dev, flan...@lists.llvm.org
On Tue, Feb 18, 2020 at 7:04 AM Michael Kruse via llvm-dev <llvm...@lists.llvm.org> wrote:
> On Mon, Feb 17, 2020 at 10:58, Prashanth N R
> <prasha...@gmail.com> wrote:
>> 1. We intend to fork clang for MLIR at a particular release and develop from there. We will most likely merge with master as soon as we reach a good milestone. Most of the development is expected to happen on GitHub or a similar version control system.
>
> Once you have done the conversion, I think it becomes infeasible to merge in
> LLVM master on a regular basis. For instance, you will have changed
> every occurrence of llvm::Instruction to mlir::Operation and its API,
> affecting just about every line of code starting with clangCodeGen. Even if
> you introduce a compatibility layer, upstream LLVM will continue to use
> llvm::Instruction.
>
> Another case is MLIR's use of BasicBlock arguments instead of PHINode.
> This is a change we would potentially want in LLVM as well, but it would
> require changing every line that assumes PHI nodes at once, so it has not
> been done yet.

I don't know how they are doing it, but if they hook in after the AST like we do and make a whole new SomethingGen/, regular rebasing should be possible. That internal API isn't stable, but I don't remember a ton of code churn on either side of the API that we connected with. Of course this is anecdotal, but it's exactly the kind of feedback that may be helpful to others trying to achieve what I think they are aiming for.

dum dee doo

Prashanth N R via llvm-dev

Feb 18, 2020, 10:56:41 AM
to Hal Finkel, llvm-dev, Richard Smith, cfe-dev
Hal-

Thanks for raising these critical issues to ponder. We will get back to you once we have more clarity on the task.

thanks,
-Prashanth

Michael Kruse via llvm-dev

Feb 18, 2020, 11:16:37 AM
to C Bergström, llvm-dev, Michael Kruse, flan...@lists.llvm.org
On Mon, Feb 17, 2020 at 18:30, C Bergström
<cberg...@pathscale.com> wrote:

> I don't know how they are doing it, but if they hook in after the AST like we do and make a whole new SomethingGen/, regular rebasing should be possible. That internal API isn't stable, but I don't remember a ton of code churn on either side of the API that we connected with. Of course this is anecdotal, but it's exactly the kind of feedback that may be helpful to others trying to achieve what I think they are aiming for.

I think we are talking about two different things here. Your approach
is to create a new SomethingGen next to (or replacing) clangCodeGen to
emit a different IR.
I was assuming Compiler Tree wants to change the entire mid-end
pipeline to MLIR's LLVM dialect. To begin with, though, only clangCodeGen
could be converted, with MLIR's LLVM-IR conversion used to pass
the result to the mid-end pipeline. One would not want to 'fork'
clangCodeGen, since it's mostly a mechanical change and merge conflicts
are still the easiest way to reflect upstream changes directly.

Chris Lattner via llvm-dev

Feb 20, 2020, 10:19:26 PM
to Prashanth N R, llvm-dev, flan...@lists.llvm.org

> On Feb 16, 2020, at 1:16 AM, Prashanth N R via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Starting in May-June, we at "Compiler Tree" will begin porting the Clang compiler to use MLIR as its middle-end target. If someone has already started a similar effort, we would love to collaborate with them. If anyone would like to work with us, we are ready to form a group and collaborate. If there are opportunities to share work with the Fortran side, we would like to consider them.
>
> We are in the early design phase for the "C" part of the work. Based on our experience with the (FC+MLIR) compiler, we estimate that an early cut of the compiler will handle non-trivial workloads within a quarter of starting work.

Hi Prashanth,

I’d love to see this.

In terms of staging this in over time, have you considered starting by tackling the Clang “CFG” representation first? It is used for source level analysis (-Wunreachable, clang static analyzer) and would be much better as a “CIL” implemented in MLIR. From there, you could port Clang’s CodeGen/IRGen to be based on that IR instead of the AST. From there, you could factor other parts of IRGen out into their own independent MLIR lowering phases (e.g. ABI lowering etc).

The advantage of starting with the CFG representation is that the bar to getting it accepted into the tree is lower (Clang CFG isn’t complete), and swapping out one IR for another should not create a compile-time regression, whereas adding another phase could.
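
For readers unfamiliar with what a source-level CFG is used for, here is a minimal sketch of the kind of analysis an unreachable-code warning is built on. `ToyCFG` and `unreachableBlocks` are hypothetical names for this example; this is not clang::CFG's real API and not how Clang implements the warning, only a plain reachability walk over a toy graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: block i lists the indices of its successor blocks.
// Hypothetical type; unrelated to clang::CFG's real API.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Returns the blocks that cannot be reached from `entry`, in index order.
std::vector<std::size_t> unreachableBlocks(const ToyCFG& cfg, std::size_t entry) {
    std::vector<bool> seen(cfg.size(), false);
    std::vector<std::size_t> worklist;
    if (entry < cfg.size()) {
        seen[entry] = true;
        worklist.push_back(entry);
    }
    // Depth-first walk over successor edges.
    while (!worklist.empty()) {
        std::size_t block = worklist.back();
        worklist.pop_back();
        for (std::size_t succ : cfg[block]) {
            if (succ < cfg.size() && !seen[succ]) {
                seen[succ] = true;
                worklist.push_back(succ);
            }
        }
    }
    std::vector<std::size_t> dead;
    for (std::size_t i = 0; i < cfg.size(); ++i)
        if (!seen[i]) dead.push_back(i);
    return dead;
}
```

For example, with blocks `0 -> 1 -> 3` and `2 -> 3`, block 2 has no path from the entry block 0 and is reported as unreachable.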

-Chris

Renato Golin via llvm-dev

Feb 21, 2020, 6:49:48 PM
to Hal Finkel, llvm-dev, Richard Smith, cfe...@lists.llvm.org
+1

This is a very big, long and complex task, and if we don't take it on as a community project, the out-of-tree project will bit rot before it can be merged. It has happened so many times before and will certainly happen again if we're not careful.

Converting OpenMP and known library calls into memref and affine.for would be super cool.

Cheers, 
Renato 
