[LLVMdev] "Graphite" for llvm


ether zhhb

Dec 26, 2009, 7:06:33 AM
to LLVM Developers Mailing List
hi,

does anyone plan to add something like
Graphite (http://gcc.gnu.org/wiki/Graphite) from gcc to llvm (or should
that be implemented at the level of clang)?

thanks

--ether
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Tobias Grosser

Dec 26, 2009, 4:43:22 PM
to ether zhhb, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, LLVM Developers Mailing List
Hi ether,

On 12/26/09 13:06, ether zhhb wrote:
> hi,
>
> does anyone plan to add something like
> Graphite (http://gcc.gnu.org/wiki/Graphite) from gcc to llvm (or should
> that be implemented at the level of clang)?

I have already looked into implementing something like Graphite for LLVM.
However, I started only recently, so I have not released any code yet. As
soon as some code is available I will post patches.

Anybody who wants to work on the polyhedral model in LLVM is invited to
the Graphite mailing list, so we can share ideas.

Here is some information about Graphite-like optimizations in LLVM.

A short introduction to the current state of GCC/Graphite:
-----------------------------------------------------------------------
Graphite is a project in GCC that uses a mathematical representation,
the polytope model, to represent and transform loops and other control
flow structures. Using an abstract representation it is possible to
reason about transformations in a more general way and to use highly
optimized linear programming libraries to figure out the optimal loop
structures. These transformations can be used to do constant propagation
through arrays, remove dead loop iterations, optimize loops for cache
locality, optimize arrays, apply advanced automatic parallelization, or
to drive vectorization.
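As a concrete illustration of the cache-locality transformations mentioned above (my example, not part of the original mail), loop interchange turns a stride-N traversal into a stride-1 traversal without changing the result; a minimal sketch in C++:

```cpp
#include <vector>

// Sum a row-major N x N matrix two ways. The first loop nest walks
// column by column (stride N per access); the interchanged nest walks
// row by row (stride 1), which is the cache-friendly order a polyhedral
// optimizer could derive. The interchange is legal here because every
// iteration is independent.
constexpr int N = 64;

long sumColumnOrder(const std::vector<long> &A) {
  long S = 0;
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      S += A[i * N + j]; // stride-N accesses
  return S;
}

long sumRowOrder(const std::vector<long> &A) {
  long S = 0;
  for (int i = 0; i < N; ++i) // loops interchanged
    for (int j = 0; j < N; ++j)
      S += A[i * N + j]; // stride-1 accesses
  return S;
}
```

Both functions return the same sum; the transformation only reorders the iteration space, which is the kind of reasoning the polytope model makes mechanical.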

The current state of Graphite and the polyhedral model in general is
somewhere between research and production. Over the last 20 years
there has been a lot of research in the area of the polyhedral model;
however, it was never used in real-world compilers (that I know of) until
Sebastian Pop started to implement the required analyses and Graphite
itself. Graphite has shown that it is possible to convert a low level
imperative language into the polyhedral model and generate working code
back from it with reasonable effort. Now several people from INRIA, IBM,
AMD, the University of Passau and China are working on making the
optimizations that have been found during the 20 years of research
available to GCC. The latest news about Graphite will be presented at
the GROW workshop 2010 in Pisa by Konrad Trifunovic.

[Advertisement end] ;-)
-----------------------------------------------------------------------


A general plan to implement polyhedral transformations in LLVM:

1. The identity transformation (LLVM->polyhedral->LLVM)
======================================================

Create the polyhedral representation of the LLVM IR, do nothing with it,
and generate LLVM IR from the polyhedral representation. (Enough to
attach external optimizers)

1.1 Detect regions
1.2 Translate LLVM IR to polyhedral model
-----------------------------------------

The first step will be to analyze the LLVM intermediate language and
extract control flow regions that can be analyzed using the polyhedral
model. This is mainly based on the scalar evolution analysis and should
be more or less like the detection in Graphite.
One point I do not yet fully understand is how to get array access
functions from the LLVM-IR. (Probably based on getElementPtr)

Another question is which polyhedral library can be used inside LLVM.
One option would be ISL from Sven Verdoolaege (LGPL); another, the PPL
from Roberto Bagnara (GPLv3).

1.3 Generate LLVM IR from the polyhedral model
-----------------------------------------

For code generation the CLooG/isl library can be used. It is LGPL
licensed. Sven will also work on a CLooG version using the PPL, so this
could also be an option.

2. Optimize on the polyhedral representation
============================================

2.1 Use external optimizers
---------------------------

The polyhedral loop description is simple and not compiler dependent.
Therefore external tools like LooPo (automatic parallelization), Pluto
(optimization in general) or even Graphite might be used to optimize
code. This could give a first impression what to expect from the
polyhedral model in LLVM.

There are also efforts to establish an interchangeable polyhedral format
(scoplib - Louis-Noel Pouchet) and to generate a polyhedral compilation
package. These will allow optimizations to be shared/exchanged between
different compilers and research tools.

Furthermore an external interface will enable researchers to use LLVM
for their work on polyhedral optimizations. This might be useful as
there is no polyhedral compiler for any dynamic language yet. Also
recent work on optimizing for GPUs has started in the polyhedral
community, so the LLVM OpenCL implementation might be interesting too.

2.2 Implement optimizations in LLVM
-----------------------------------

Useful optimizations could be imported into / rewritten in LLVM. For
optimizations that transform the LLVM-IR, like vectorization or automatic
parallelization, at least some part of the optimization has to be in
LLVM anyway.

This is just a rough overview. If anybody is interested in working on
any of these topics, as mentioned above, I would be glad to help.

Enjoy your holidays

Tobi

P.S.: I do not see any reason to implement this in Clang; the LLVM IR is
the right place to do this.

ether

Dec 27, 2009, 4:18:01 AM
to Tobias Grosser, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, LLVM Developers Mailing List
hi Tobi ,

that sounds great :D

On 2009-12-27 5:43, Tobias Grosser wrote:
> I already looked into implementing something like Graphite for LLVM.
> However just recently, so I have not released any code yet. As soon as
> some code is available I will post patches.

What's its status? Do you need any help?

> A general plan to implement polyhedral transformations in LLVM:
>
> 1. The identity transformation (LLVM->polyhedral->LLVM)
> ======================================================
>
> Create the polyhedral representation of the LLVM IR, do nothing with
> it, and generate LLVM IR from the polyhedral representation. (Enough
> to attach external optimizers)
>
> 1.1 Detect regions
> 1.2 Translate LLVM IR to polyhedral model
> -----------------------------------------
>
> The first step will be to analyze the LLVM intermediate language and
> extract control flow regions that can be analyzed using the polyhedral
> model. This is mainly based on the scalar evolution analysis and
> should be more or less like the detection in Graphite.
> One point I do not yet fully understand is how to get array access
> functions from the LLVM-IR. (Probably based on getElementPtr)

As far as I know, getelementptr combines the base pointer and the offset
to generate the pointer to the given element, like:
&(b[i]) ==> %8 = getelementptr i32* %b, i32 %i.03 ;1-dimension
and &(c[i][j]) ==> %4 = getelementptr [4 x i32]* %c, i32
%i.0.reg2mem.0.ph.ph, i32 %j.0.reg2mem.0.ph ;2-dimension
so maybe you could check the non-pointer operands of the getelementptr
instruction.
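To make the address arithmetic concrete, here is a small sketch of my own (not part of the original mail; the helper names `gep1d`/`gep2d` are made up) showing the affine functions the two getelementptr examples above compute, checked against real C arrays:

```cpp
#include <cstddef>
#include <cstdint>

// &b[i] with element type i32:        base + i * sizeof(i32)
// &c[i][j] with c of type [4 x i32]*: base + (i * 4 + j) * sizeof(i32)
// Both are affine in the induction variables i and j, which is exactly
// what a polyhedral analysis needs to extract from the non-pointer
// operands of the getelementptr.
std::uintptr_t gep1d(std::uintptr_t Base, long I) {
  return Base + static_cast<std::uintptr_t>(I) * sizeof(std::int32_t);
}

std::uintptr_t gep2d(std::uintptr_t Base, long I, long J) {
  return Base + (static_cast<std::uintptr_t>(I) * 4 +
                 static_cast<std::uintptr_t>(J)) *
                    sizeof(std::int32_t);
}
```

Evaluating these helpers against `&b[i]` and `&c[i][j]` for concrete arrays reproduces the addresses the two instructions above would compute.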

> Another question is which polyhedral library can be used inside LLVM.
> One option would be ISL from Sven Verdoolaege (LGPL); another, the PPL
> from Roberto Bagnara (GPLv3).
>
> 1.3 Generate LLVM IR from the polyhedral model
> -----------------------------------------
>
> For code generation the CLooG/isl library can be used. It is LGPL
> licensed. Sven will also work on a CLooG version using the PPL, so this
> could also be an option.
>
> 2. Optimize on the polyhedral representation
> ============================================
>
> 2.1 Use external optimizers
> ---------------------------
>
> The polyhedral loop description is simple and not compiler dependent.
> Therefore external tools like LooPo (automatic parallelization),
> Pluto (optimization in general)

I contacted the author a week ago, and if we use it, we need an IR
generator on the llvm side to extract SCoPs, a corresponding parser on the
Pluto side to read them, then a backend for CLooG and a corresponding
parser in llvm to read that output back into llvm. :)
> or even Graphite
What's the status of PCP? Could we just use the PCP language with annotations?


> might be used to optimize code. This could give a first impression
> what to expect from the polyhedral model in LLVM.
>
> There are also efforts to establish an interchangeable polyhedral
> format (scoplib - Louis-Noel Pouchet) and to generate a polyhedral
> compilation package. These will allow optimizations to be shared/exchanged
> between different compilers and research tools.
>
> Furthermore an external interface will enable researchers to use LLVM
> for their work on polyhedral optimizations. This might be useful as
> there is no polyhedral compiler for any dynamic language yet. Also
> recent work on optimizing for GPUs has started in the polyhedral
> community, so the LLVM OpenCL implementation might be interesting too.

If that targets entertainment applications instead of scientific
computing, implementing the polyhedral framework on top of the llvm
"Support" and "System" libraries, so those things can run as native
Windows applications, would be a good idea, I think.


>
> 2.2 Implement optimizations in LLVM
> -----------------------------------
>
> Useful optimizations could be imported into / rewritten in LLVM. For
> optimizations that transform the LLVM-IR, like vectorization or
> automatic parallelization, at least some part of the optimization has
> to be in LLVM anyway.

The information computed in the polyhedral framework, such as iteration
domains and affine schedules, may allow llvm to do some interesting
optimizations, like generating stream processing code from normal C code
(and optimizing pointer access boundary checks?)
>
> Enjoy your holidays
>
the same to you guys :)

best regards

--ether

Sebastian Pop

Dec 28, 2009, 11:25:14 AM
to Albert Cohen, Jan Sjodin, GCC GRAPHITE, LLVM Developers Mailing List, lo...@infosun.fim.uni-passau.de
On Mon, Dec 28, 2009 at 05:05, Albert Cohen <Albert...@inria.fr> wrote:
> PCP is only partially implemented: conversion out of PCP to Graphite is not
> implemented,

Actually Gimple to PCP to Graphite is implemented to some extent,
but there still are plenty of bugs and we should work on the conversion
out of Graphite to PCP to Gimple/LLVM if we want to get rid of all these
bugs. Also the patches for these are not yet in the Graphite branch,
just in local trees for now.

> but the existing code would definitely help anybody working in
> interfacing other tools with PCP. The main people to contact are
> "Sjodin, Jan" <Jan.S...@amd.com>
> Sebastian Pop <seb...@gmail.com>
>
> Work on PCP has stalled because Jan has left for another group.
>

Jan, what's the plan to integrate PCP in GCC and/or LLVM?

> In the short term, Tobias is right that the best way to interface tools is to
> use the scoplib format and library from Louis-Noel Pouchet (PoCC).
>
> In any case, work on either PCP or scoplib will be beneficial. The community
> of polyhedral compilation experts is small, and its openly available output
> is already diminished by two proprietary development projects at IBM and
> Reservoir Labs. I strongly wish that any effort within LLVM will be wise
> enough to join forces with Graphite :-)
>
> Albert

Jan Sjodin

Dec 28, 2009, 2:15:53 PM
to Sebastian Pop, Albert Cohen, GCC GRAPHITE, LLVM Developers Mailing List, lo...@infosun.fim.uni-passau.de
> Jan, what's the plan to integrate PCP in GCC and/or LLVM?

Currently there is no concrete plan, or at least nothing immediate.
I believe a modular approach to loop transformations is desirable.
The main purpose of PCP is to get a common interface between a lower level
IR like SSA and higher level optimizations done on structured code, but one
could also directly translate from a front-end to PCP. On the 
analysis/transformation side there could be several implementations,
including the polyhedral model. What needs to be done is to clearly specify
how different information should be encoded: debug info, costs for
heuristics, etc. The second benefit of having PCP is being able to read and
write code fragments to a file. LLVM has some of these features already,
but SSA is not the right level. Implementing views on top of SSA might be
possible, using program structure trees and a lot of bookkeeping, but that
will also reduce the modularity of the code.

Front End <-             -> Polyhedral Model
            \           /
             ==> PCP <==
            /     ^     \
     SSA  <-      |      -> Classical Loop transforms
                  |
                  v
                Text

 
I believe it would not be too much work to LLVM-ify the current
implementation of PCP since it is all written in C++. Most of the work
would be to translate the PCP data ADTs to use the LLVM ones.
There are however a few areas that need to be addressed:
 
1. Memory management.
2. Error handling.
3. Improved type system, since we need to be able to reason better about
   induction variables and size of data. Alignment may come in here as well.  
4. Different properties must be encoded and some should perhaps put into
   the PCP language itself e.g. reduction operations.
 
I can look into how the current code can be released, or contributed in order
for other people to work on it.
 
- Jan
 

Albert Cohen

Dec 28, 2009, 6:05:19 AM
to ether, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, LLVM Developers Mailing List
ether wrote:
>> The polyhedral loop description is simple and not compiler dependent.
>> Therefore external tools like LooPo (automatic parallelization), Pluto
>> (optimization in general)
> I contacted the author a week ago, and if we use it, we need an IR
> generator on the llvm side to extract SCoPs, a corresponding parser on the
> Pluto side to read them, then a backend for CLooG and a corresponding
> parser in llvm to read that output back into llvm. :)
>> or even Graphite
> What's the status of PCP? Could we just use the PCP language with annotations?

Hi all,

PCP is only partially implemented: conversion out of PCP to Graphite is
not implemented, but the existing code would definitely help anybody
working on interfacing other tools with PCP. The main people to contact are
"Sjodin, Jan" <Jan.S...@amd.com>
Sebastian Pop <seb...@gmail.com>

Work on PCP has stalled because Jan has left for another group.

In the short term, Tobias is right that the best way to interface tools
is to use the scoplib format and library from Louis-Noel Pouchet (PoCC).

In any case, work on either PCP or scoplib will be beneficial. The
community of polyhedral compilation experts is small, and its openly
available output is already diminished by two proprietary development
projects at IBM and Reservoir Labs. I strongly wish that any effort
within LLVM will be wise enough to join forces with Graphite :-)

Albert

Tobias Grosser

Dec 28, 2009, 7:24:52 PM
to ether, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, LLVM Developers Mailing List
On 12/27/09 10:18, ether wrote:
> hi Tobi ,
>
> that sounds greate :D
>
> On 2009-12-27 5:43, Tobias Grosser wrote:
>> I already looked into implementing something like Graphite for LLVM.
>> However just recently, so I have not released any code yet. As soon as
>> some code is available I will post patches.
> whats its status? do you need any help?

Very recently started. That's why I did not talk about this on my own.

However, help is always needed. As already said, getting affine access
functions from the memory accesses would be a big help.

>> A general plan to implement polyhedral transformations in LLVM:
>>
>> 1. The identity transformation (LLVM->polyhedral->LLVM)
>> ======================================================
>>
>> Create the polyhedral representation of the LLVM IR, do nothing with
>> it, and generate LLVM IR from the polyhedral representation. (Enough
>> to attach external optimizers)
>>
>> 1.1 Detect regions
>> 1.2 Translate LLVM IR to polyhedral model
>> -----------------------------------------
>>
>> The first step will be to analyze the LLVM intermediate language and
>> extract control flow regions that can be analyzed using the polyhedral
>> model. This is mainly based on the scalar evolution analysis and
>> should be more or less like the detection in Graphite.
>> One point I do not yet fully understand is how to get array access
>> functions from the LLVM-IR. (Probably based on getElementPtr)

> As far as I know, getelementptr combines the base pointer and the offset
> to generate the pointer to the given element, like:
> &(b[i]) ==> %8 = getelementptr i32* %b, i32 %i.03 ;1-dimension
> and &(c[i][j]) ==> %4 = getelementptr [4 x i32]* %c, i32
> %i.0.reg2mem.0.ph.ph, i32 %j.0.reg2mem.0.ph ;2-dimension
> so maybe you could check the non-pointer operands of the getelementptr
> instruction.

Probably. I think for single-dimensional arrays it will not be too
difficult to get the access functions using scalar evolution. I think
multi-dimensional arrays will get complicated.

>> Another question is which polyhedral library can be used inside LLVM.
>> One option would be ISL from Sven Verdoolaege (LGPL); another, the PPL
>> from Roberto Bagnara (GPLv3).
>>
>> 1.3 Generate LLVM IR from the polyhedral model
>> -----------------------------------------
>>
>> For code generation the CLooG/isl library can be used. It is LGPL
>> licensed. Sven will also work on a CLooG version using the PPL, so this
>> could also be an option.
>>
>> 2. Optimize on the polyhedral representation
>> ============================================
>>
>> 2.1 Use external optimizers
>> ---------------------------
>>
>> The polyhedral loop description is simple and not compiler dependent.
>> Therefore external tools like LooPo (automatic parallelization), Pluto
>> (optimization in general)

> I contacted the author a week ago, and if we use it, we need an IR
> generator on the llvm side to extract SCoPs, a corresponding parser on the
> Pluto side to read them, then a backend for CLooG and a corresponding
> parser in llvm to read that output back into llvm. :)

The way to go is the scoplib format (probably extended by quantified
variables). This format could be extracted from Graphite easily and
could also be created in LLVM.
What we need to get back into LLVM is only the new optimized schedule,
described e.g. as CLooG-like scattering functions. These can be parsed
easily. The real code generation could be done internally, so it is not
necessary to parse the code generated by external tools.

>> or even Graphite
> What's the status of PCP? Could we just use the PCP language with annotations?

See Jan's mail.
The first steps we have to take do not yet depend on PCP; later on it
might be useful for debugging and better readability of the test cases.

>> might be used to optimize code. This could give a first impression
>> what to expect from the polyhedral model in LLVM.
>>
>> There are also efforts to establish an interchangeable polyhedral
>> format (scoplib - Louis-Noel Pouchet) and to generate a polyhedral
>> compilation package. These will allow optimizations to be shared/exchanged
>> between different compilers and research tools.
>>
>> Furthermore an external interface will enable researchers to use LLVM
>> for their work on polyhedral optimizations. This might be useful as
>> there is no polyhedral compiler for any dynamic language yet. Also
>> recent work on optimizing for GPUs has started in the polyhedral
>> community, so the LLVM OpenCL implementation might be interesting too.

> If that targets entertainment applications instead of scientific
> computing, implementing the polyhedral framework on top of the llvm
> "Support" and "System" libraries, so those things can run as native
> Windows applications, would be a good idea, I think.

Sure.

>>
>> 2.2 Implement optimizations in LLVM
>> -----------------------------------
>>
>> Useful optimizations could be imported into / rewritten in LLVM. For
>> optimizations that transform the LLVM-IR, like vectorization or
>> automatic parallelization, at least some part of the optimization has
>> to be in LLVM anyway.

> The information computed in the polyhedral framework, such as iteration
> domains and affine schedules, may allow llvm to do some interesting
> optimizations, like generating stream processing code from normal C code
> (and optimizing pointer access boundary checks?)

Yes

See you

Tobi

Albert Cohen

Dec 29, 2009, 7:12:10 AM
to Tobias Grosser, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, LLVM Developers Mailing List
Tobias Grosser wrote:
> The way to go is the scoplib format (probably extended by quantified
> variables). This format could be extracted from Graphite easily and
> could also be created in LLVM.
> What we need to get back into LLVM is only the new optimized schedule,
> described e.g. as CLooG-like scattering functions. These can be parsed
> easily. The real code generation could be done internally, so it is not
> necessary to parse the code generated by external tools.

By the way, Konrad Trifunovic is interested (for his own research) in
improving the current read/write capabilities of Graphite to process
the full scoplib format instead. Dealing with the iteration domain and
schedule/scattering is easy, but processing array subscripts will
require more work. We will keep you informed of the progress on the
Graphite mailing list, but collaboration outside the Graphities is welcome.

Albert

Dan Gohman

Jan 4, 2010, 2:44:24 PM
to Tobias Grosser, lo...@infosun.fim.uni-passau.de, LLVM Developers Mailing List, GCC GRAPHITE

On Dec 28, 2009, at 4:24 PM, Tobias Grosser wrote:
>
> Probably. I think for single dimensional arrays it will not be too
> difficult using scalar evolution to get the access functions. I think
> multi dimensional arrays will get complicated.

If you want to know how the address is calculated as a function of
each enclosing loop, ScalarEvolution should be quite usable for
multiple dimensions. This is represented with an add-recurrence
with another add-recurrence as its "start" operand. For example:

for (i=0; i<m; ++i)     // X
  for (j=0; j<n; ++j)   // Y
    A[i][j];

The store address has this expression:

{{A,+,sizeof(double)*n}<X>,+,sizeof(double)}<Y>

This says that the value starts at A, steps by sizeof(double)*n
(address units) with the iteration of loop X, and steps by
sizeof(double) with the iteration of loop Y.
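This recurrence can be checked numerically; the sketch below is mine (not from Dan's mail), assuming double elements as in his expression, and evaluates {{A,+,sizeof(double)*n}<X>,+,sizeof(double)}<Y> at a point (i, j) against the direct address of A[i][j]:

```cpp
#include <cstddef>
#include <cstdint>

// Evaluate the add-recurrence at iteration (I, J): the outer recurrence
// starts at A and steps by sizeof(double)*N per iteration of loop X; its
// value at I is the start of the inner recurrence, which then steps by
// sizeof(double) per iteration of loop Y.
std::uintptr_t recurrenceAddr(std::uintptr_t A, long N, long I, long J) {
  std::uintptr_t StartX =
      A + static_cast<std::uintptr_t>(I) * sizeof(double) * N;
  return StartX + static_cast<std::uintptr_t>(J) * sizeof(double);
}
```

For a 5x7 array of doubles, evaluating the recurrence at (3, 4) yields the same address as &A[3][4], confirming the two-level structure.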

However, if you want to recognize this as a high-level array
reference on A with subscripts "i" and then "j", there are some
missing pieces.

On a related note, analyzing getelementptr yourself directly is
doable, but there are several major complications. Using a helper
library such as ScalarEvolution can protect you from many
low-level artifacts (though not all).

Dan

Tobias Grosser

Jan 4, 2010, 5:31:31 PM
to Dan Gohman, LLVM Developers Mailing List, GCC GRAPHITE
On 01/04/10 20:44, Dan Gohman wrote:
>
> On Dec 28, 2009, at 4:24 PM, Tobias Grosser wrote:
>>
>> Probably. I think for single dimensional arrays it will not be too
>> difficult using scalar evolution to get the access functions. I think
>> multi dimensional arrays will get complicated.
>
> If you want to know how the address is calculated as a function of
> each enclosing loop, ScalarEvolution should be quite usable for
> multiple dimensions. This is represented with an add-recurrence
> with another add-recurrence as its "start" operand. For example:
>
> for (i=0; i<m; ++i)     // X
>   for (j=0; j<n; ++j)   // Y
>     A[i][j];
>
> The store address has this expression:
>
> {{A,+,sizeof(double)*n}<X>,+,sizeof(double)}<Y>
>
> This says that the value starts at A, steps by sizeof(double)*n
> (address units) with the iteration of loop X, and steps by
> sizeof(double) with the iteration of loop Y.
>
> However, if you want to recognize this as a high-level array
> reference on A with subscripts "i" and then "j", there are some
> missing pieces.

You are right, starting with ScalarEvolution is the right approach.
However, I believe a high-level array analysis might be useful to get
simpler expressions. I will have to think about this later on.

> On a related note, analyzing getelementptr yourself directly is
> doable, but there are several major complications. Using a helper
> library such as ScalarEvolution can protect you from many
> low-level artifacts (though not all).

Sure, it is a great tool.

Tobias

Grigori Fursin

Jan 4, 2010, 7:24:13 PM
to Albert Cohen, Tobias Grosser, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, ctuning-d...@googlegroups.com, LLVM Developers Mailing List
Dear colleagues,

Happy New Year!

Just wanted to mention that the cTuning community is interested in
performance/code size/power tuning using empirical feedback-directed
techniques with different compilers including GCC and LLVM, so we would
be interested in adding support for fine-grain optimizations from GRAPHITE
to the Interactive Compilation Interface at some point at the beginning
of this year, to be able to use the cTuning optimization framework directly
and share optimization cases with the community.

Will keep in touch,
Grigori

ether

Jan 5, 2010, 8:45:05 AM
to Tobias Grosser, lo...@infosun.fim.uni-passau.de, GCC GRAPHITE, LLVM Developers Mailing List
hi Tobi,

I just added the Poly
library (http://wiki.llvm.org/Polyhedral_optimization_framework) to the llvm
build system, which only contains a toy pass "Poly".
I think we could add the polyhedral optimization stuff into this library.

It was tested under cmake+visual studio 2009, and I also added the library
build rule to the MAKEFILEs, but I am not sure if it works under
linux/cygwin/mac, sorry.
Hope this helps

best regards

--ether


polylib.patch

Tobias Grosser

Jan 5, 2010, 8:29:06 PM
to ether, LLVM Developers Mailing List
On 01/05/10 14:45, ether wrote:
> hi Tobi,
>
> I just added the Poly
> library (http://wiki.llvm.org/Polyhedral_optimization_framework) to the llvm
> build system, which only contains a toy pass "Poly".
> I think we could add the polyhedral optimization stuff into this library.
>
> It was tested under cmake+visual studio 2009, and I also added the library
> build rule to the MAKEFILEs, but not sure if it works under
> linux/cygwin/mac, sorry.
> Hope this helps
>
> best regards
>
> --ether

hi ether,

I pushed your work to our git repository at
http://repo.or.cz/w/llvm-complete/pofl.git


So we can work on a first version that could be committed to the LLVM svn
repository.

I just had some discussions on the LLVM IRC channel concerning the
integration of Poly.

Tobias Grosser

Jan 5, 2010, 8:39:54 PM
to ether, LLVM Developers Mailing List
On 01/05/10 14:45, ether wrote:
> hi Tobi,
>
> I just added the Poly
> library (http://wiki.llvm.org/Polyhedral_optimization_framework) to the llvm
> build system, which only contains a toy pass "Poly".
> I think we could add the polyhedral optimization stuff into this library.
>
> It was tested under cmake+visual studio 2009, and I also added the library
> build rule to the MAKEFILEs, but not sure if it works under
> linux/cygwin/mac, sorry.
> Hope this helps
>
> best regards
>
> --ether

[the complete mail]

hi ether,

great start.

I pushed your work to our git repository at
http://repo.or.cz/w/llvm-complete/pofl.git

This repository is meant to track work on a first version of LLVM Poly
that could be proposed for integration into LLVM. Feel free to register as
a user and to commit to the repository.

I just had some discussions on the LLVM IRC channel concerning the
integration of Poly.

The preferred way to implement this seemed to be like the tools/clang
integration in llvm: a separate repository (that might also be hosted
on the llvm svn server) that is only built if available. So users can
decide if they want to try LLVMPoly and check it out on demand.

What we need to achieve before proposing a patchset:

1. Integrate ClooG/isl
2. A working Frontend/Backend pass (very limited, no optimizations, but
able to handle real-world code)

Thanks for working on this

Tobi

ether

Jan 5, 2010, 9:17:20 PM
to Tobias Grosser, LLVM Developers Mailing List
hi Tobi,

On 2010-1-6 9:39, Tobias Grosser wrote:
> On 01/05/10 14:45, ether wrote:
>> hi Tobi,
>>
>> I just added the Poly
>> library (http://wiki.llvm.org/Polyhedral_optimization_framework) to the llvm
>> build system, which only contains a toy pass "Poly".
>> I think we could add the polyhedral optimization stuff into this
>> library.
>>
>> It was tested under cmake+visual studio 2009, and I also added the library
>> build rule to the MAKEFILEs, but not sure if it works under
>> linux/cygwin/mac, sorry.
>> Hope this helps
>>
>> best regards
>>
>> --ether
>
> [the complete mail]
>
> hi ether,
>
> great start.
>
> I pushed your work to our git repository at
> http://repo.or.cz/w/llvm-complete/pofl.git
>
> This repository is meant to track work on a first version of LLVM Poly
> that could be proposed for integration into LLVM. Feel free to register
> as a user and to commit to the repository.

ok :) I just learned how to use git yesterday


>
> I just had some discussions on the LLVM IRC channel concerning the
> integration of Poly.
>
> The preferred way to implement this seemed to be like the tools/clang
> integration in llvm: a separate repository (that might also be hosted
> on the llvm svn server) that is only built if available. So users can
> decide if they want to try LLVMPoly and check it out on demand.

Got it. I think we could move it out of opt after we finish the first
implementation. Right now we can play with this dirty library first.


>
> What we need to achieve before proposing a patchset:
>
> 1. Integrate ClooG/isl
> 2. A working pass Frontend/Backend (very limited, no optimizations,
> but able to handle real world code)
>
> Thanks for working on this

you are welcome.
>
> Tobi
>
>
best regards

--ether

ether

Jan 8, 2010, 8:16:34 AM
to John Mosby, LLVM Developers Mailing List, Tobias Grosser
hi all,

On 2010-1-7 0:11, John Mosby wrote:
> In LLVM we could add support for generalized CFG regions and
> RegionPasses. A region is a part of the CFG. The only information we
> have is that it has one entry and one exit, thus it can be optimized
> separately.
> I think this is the best way to add region analysis. I must admit
> this approach
> helps me on another, similar project I'm working on in parallel (no
> pun intended).
> Tobias, is this how you are architecting your region analysis already?
>
> John
>

I am just implementing the skeleton of Region/RegionInfo like LoopBase and
LoopInfoBase[1] in the existing llvm code, and I found that there is a lot
in common between "Region" and "Loop":

1. both of them consist of several BasicBlocks
2. both of them have some kind of nested structure, so both a loop and
a region can have a parent or children
3. both of them have a BasicBlock (the header of a loop and the "entry" of
a region) that dominates all the others

and the Region class will have most of its members very similar to
LoopBase, like: ParentRegion, SubRegions, Blocks, getRegionDepth(),
getExitBlock(), getExitingBlock() ......

So, could we just treat "Loop" as some kind of general "Region" of
BasicBlocks, and make Loop and Region inherit from "RegionBase"?
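This shared-base idea can be sketched like this (entirely hypothetical names of my own; the real LLVM classes differ):

```cpp
#include <vector>

// Hypothetical sketch of the proposed hierarchy: RegionBase owns the
// nesting structure shared by loops and single-entry/single-exit
// regions; Loop and Region only add what is specific to each.
class RegionBase {
  RegionBase *Parent = nullptr;
  std::vector<RegionBase *> Children;

public:
  void addChildRegion(RegionBase *R) {
    R->Parent = this;
    Children.push_back(R);
  }
  // Depth 0 for a top-level region, +1 per enclosing region.
  unsigned getRegionDepth() const {
    unsigned Depth = 0;
    for (const RegionBase *P = Parent; P; P = P->Parent)
      ++Depth;
    return Depth;
  }
};

// A loop would be a region whose entry block is the target of a back edge.
class Loop : public RegionBase {};
// A generic single-entry, single-exit region.
class Region : public RegionBase {};
```

With this layout, the nesting helpers (addChildRegion, getRegionDepth) are written once and work for loops and regions alike, which is the point of the proposal.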


[1] http://llvm.org/doxygen/LoopInfo_8h-source.html

best regards
--ether

ether

Jan 8, 2010, 8:20:24 AM
to ether, LLVM Developers Mailing List, Tobias Grosser
sorry that I forgot to change the subject

Tobias Grosser

Jan 8, 2010, 2:27:58 PM
to ether, LLVM Developers Mailing List
On 01/08/10 14:20, ether wrote:
> Sorry that I forgot to change the subject.

Hi ether,

Sounds interesting. A loop actually is (or may be) some kind of region. If you
want, you can have a look at the analysis that I wrote. It is not yet
finished, not completely documented, and a work in progress. However, the
first big comment might be interesting for you. Or see the results of
opt -regions -analyze

The git repo to see it is here:
http://repo.or.cz/w/llvm-complete/tobias-sandbox.git/shortlog/refs/heads/region

I will think about this and maybe reply again.

Tobi

ether

unread,
Jan 8, 2010, 9:43:02 PM1/8/10
to Tobias Grosser, LLVM Developers Mailing List
hi Tobi

On 2010-1-9 3:27, Tobias Grosser wrote:
> On 01/08/10 14:20, ether wrote:
>> Sorry that I forgot to change the subject.
>
> Hi ether,
>
> sounds interesting. Actually is/may be some kind of region. If you
> want you can have a look at the analysis, that I wrote. It is not yet
> finished, not completely documented and work in progress. However the
> first big comment might be interesting for you. Or seeing the results of
> opt -regions -analyze
>
> The git repo to see it is here:
> http://repo.or.cz/w/llvm-complete/tobias-sandbox.git/shortlog/refs/heads/region
>

That makes sense to me. And if you make your Region class a subclass of
LoopBase, code like "addChildLoop" and "getLoopDepth()" from
LoopBase may help you a lot when manipulating regions in later
optimization passes (of course, we should give them more meaningful names
like "addChildRegion") :)

And I think if we ignore "goto" statements and "return" statements (I
remember there is a pass in LLVM that makes a function return in only
one basic block) in loops, loops will also have only one exit block, so
we can treat a loop as a special region that has a back edge, and we can
say a loop must be a region, but a region is not necessarily a loop.

We can read something about this in "Compilers: Principles, Techniques,
and Tools, Second Edition", Section 9.7, region-based analysis.

best regards
--ether

Tobias Grosser

unread,
Jan 12, 2010, 12:59:45 PM1/12/10
to ether, LLVM Developers Mailing List
On 01/08/10 14:20, ether wrote:
> Sorry that I forgot to change the subject.
>
>
> hi all,
Hi ether,

Now, a somewhat more complete answer.

> On 2010-1-7 0:11, John Mosby wrote:
>> In LLVM we could add support for generalized CFG regions and
>> RegionPasses. A region is a part of the CFG. The only information we
>> have is that it has one entry and one exit; thus it can be optimized
>> separately.
>> I think this is the best way to add region analysis. I must admit this
>> approach
>> helps me on another, similar project I'm working on in parallel (no
>> pun intended).
>> Tobias, is this how you are architecting your region analysis already?
>>
>> John
>>
>
> i just implementing the skeleton of Region/RegionInfo like LoopBase and
> LoopInfoBase[1] in the llvm existing codes, and found that theres lots
> of common between "Region" and "Loop":
>
> 1. both of them are consist of several BasicBlocks

Correct.

> 2. both of them have some kind of nested structures, so both a loop and
> a region could have parent or childrens

Correct.

> 3. both of them have a BasicBlocks(header of a loop and "entry" of a
> region) that dominates all others

Correct.


> and the Region class will have the most stuffs very similar in LoopBase,
> like: ParentRegion, SubRegions, Blocks, getRegionDepth(),

Correct.

> getExitBlock(), getExitingBlock() ......
This might need some thought.


> so, could us just treat "Loop" as some kind of general "Region" of
> BasicBlocks, and make Loop and Region inherit from "RegionBase"?

I would like to do so, as I like the structure of this approach.
However until now my pass was written on the side, as a proof of concept.

I wrote two Passes:

1. Regions
Detect the regions and print the regions tree. Try it with:
opt -regions -analyze file.bc

2. RegionsWithoutLoops
Find the maximal regions that do not contain any loops. Try it with:
opt -regions-without-loops file.bc
opt -view-regions-without-loops file.bc (needs graphviz)

Both ATM only work on BasicBlocks. However I have seen the patches in
your sandbox and I really like the idea to keep the analysis general.

If you are interested you could have a look at my sandbox (not yet well
documented and cleanly formatted).

We might want to think about, how to merge our work.

Tobi

Jan Sjodin

unread,
Jan 12, 2010, 6:56:57 PM1/12/10
to Tobias Grosser, ether, LLVM Developers Mailing List
Why not use the "standard" algorithm for detecting SESE-regions and building a program structure tree?
It should handle everything you want. It also becomes much simpler to specify a connected SESE-region
by entry/exit edges, while a disconnected region is specified by entry/exit blocks. Only defining regions on
blocks is not enough to be able to quickly determine how to replace/move a region in a CFG.

The algorithm can be found in this paper:
The Program Structure Tree: Computing Control Regions in Linear Time
by Richard Johnson, David Pearson, Keshav Pingali

- Jan

Tobias Grosser

unread,
Jan 12, 2010, 8:14:33 PM1/12/10
to Jan Sjodin, LLVM Developers Mailing List
On 01/13/10 00:56, Jan Sjodin wrote:
> Why not use the "standard" algorithm for detecting SESE-regions and building a program structure tree?
> It should handle everything you want. It also becomes much simpler to specify a connected SESE-region
> by entry/exit edges, while a disconnected region is specified by entry/exit blocks. Only defining regions on
> blocks is not enough to be able to quickly determine how to replace/move a region in a CFG.
>
> The algorithm can be found in this paper:
> The Program Structure Tree: Computing Control Regions in Linear Time
> by Richard Johnson, David Pearson, Keshav Pingali

Hi Jan,

great to read you again.

And thanks for pointing me to this paper. I read it, but also did some further
research. I like the idea of using edges to define the regions; however,
it does not catch all regions.

Defining regions using just edges keeps us from detecting a lot of very
common regions.

Example:

A very common CFG:

\ /
1
/ \
2 3
| |
4 6
| |
5 7
\ /
8
/ \
9 10
| |
11 12
\ /
13
/ \

I would detect these two regions:

Region A: 1 -> 8 containing {1,2,3,4,5,6,7}
Region B: 8 -> 13 containing {8,9,10,11,12}

If I use edges to define the regions, this detection is not possible at all.

After region detection the CFG can always be split up to create single
entry single exit edges, if they are needed e.g. for code generation.


\ /
1_a
|
1
/ \
2 3
| |
4 6
| |
5 7
\ /
8_a
|
8
/ \
9 10
| |
11 12
\ /
13_a
|
13
/ \

Now the regions can be defined using edges:

Region A: (1_a,1) -> (8_a, 8) containing {1,2,3,4,5,6,7,8_a}
Region B: (8_a, 8) -> (13_a, 13) containing {8,9,10,11,12,13_a}

In general this approach saves a preliminary pass that has to insert new
bbs to generate these edges. As we do not have to modify the CFG, other
analyses like dominance information stay valid, and we do not have to
create a lot of auxiliary bbs to be able to detect all regions. This
saves memory and runtime. In general it is probably not easy to decide
where to insert these bbs either:

CFG:
0
|
1
/ |
2 |
/ \ 3
4 5 |
| | |
6 7 8
\ | /
\ |/ region A: 1 -> 9 {1,2,3,4,5,6,7,8}
9 region B: 2 -> 9 {2,4,5,6,7}

So we need one bb that joins 6 and 7 and one that joins the two regions

CFG: 0
|
1
/ |
2 |
/ \ 3
4 5 |
| | |
6 7 8
\ | |
\ | | region A: (0,1) -> (9b,9) {1,2,3,4,5,6,7,8,9a,9b}
9a | region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
\ /
9b
|
9

My approach is comparable to this paper:

The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
Jana Koehler

The implementation however takes advantage of the existence of
Dominance/PostDominance information. Therefore it is simpler and
hopefully faster. At the moment run time is comparable to dominance tree
calculation.
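The dominance-based criterion can be illustrated on the 13-block example above. The following is my own toy sketch, not the code in the sandbox: a pair (Entry, Exit) is accepted when Entry dominates and Exit post-dominates every block of the candidate region. Dominance here is computed naively (B dominates A iff A becomes unreachable from the function entry once B is removed), which is fine for a sketch but far slower than a real dominator tree.

```cpp
#include <map>
#include <set>
#include <vector>

using CFG = std::map<int, std::vector<int>>;

static CFG reverseCFG(const CFG &G) {
  CFG R;
  for (const auto &KV : G)
    for (int Succ : KV.second)
      R[Succ].push_back(KV.first);
  return R;
}

// All nodes reachable from `From` when node `Skip` is removed.
static std::set<int> reachable(const CFG &G, int From, int Skip) {
  std::set<int> Seen;
  if (From == Skip)
    return Seen;
  std::vector<int> Work{From};
  while (!Work.empty()) {
    int N = Work.back();
    Work.pop_back();
    if (!Seen.insert(N).second)
      continue;
    auto It = G.find(N);
    if (It == G.end())
      continue;
    for (int Succ : It->second)
      if (Succ != Skip)
        Work.push_back(Succ);
  }
  return Seen;
}

// B dominates A iff A is unreachable from the entry once B is removed.
static bool dominates(const CFG &G, int FnEntry, int B, int A) {
  return A == B || !reachable(G, FnEntry, B).count(A);
}

// Checks only the dominance criterion; it does not verify that Blocks
// is exactly the set of blocks between Entry and Exit.
static bool isRefinedRegion(const CFG &G, const CFG &RevG, int FnEntry,
                            int FnExit, int Entry, int Exit,
                            const std::set<int> &Blocks) {
  for (int BB : Blocks)
    if (!dominates(G, FnEntry, Entry, BB) ||  // Entry dominates BB
        !dominates(RevG, FnExit, Exit, BB))   // Exit post-dominates BB
      return false;
  return true;
}

// The 13-block example CFG from the mail.
const CFG G = {{1, {2, 3}}, {2, {4}},   {3, {6}},   {4, {5}},
               {5, {8}},    {6, {7}},   {7, {8}},   {8, {9, 10}},
               {9, {11}},   {10, {12}}, {11, {13}}, {12, {13}}};
const CFG RevG = reverseCFG(G);
```

On this graph the check accepts the two regions named above (1 -> 8 and 8 -> 13) without any edge-splitting prepass.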

If you want, have a look into some results I got with a pass extracting
maximal non trivial regions that do not contain loops from libbz2:

http://tobias.osaft.eu/llvm/region/bz2NoLoops/

Interesting ones:

regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
has a lot of exit edges.

regions_without_loops.bzopen_or_bzdopen.dot.png
the first region has two entry edges. One is the loop latch.

(Keep in mind all regions have the same color, so if it seems there is
an edge into a region, there are just two regions close by)

Without a prepass that exposes the edges almost no region could be
detected with the "standard" approach.

It would be great to hear your opinion about this.

See you

Tobias

Jan Sjodin

unread,
Jan 12, 2010, 11:09:10 PM1/12/10
to Tobias Grosser, LLVM Developers Mailing List
Hi Tobias

> In general this approach saves a preliminary pass that has to insert new

> bbs, to generate these edges. As we do not have to modify the CFG other
> passes like dominance information are still valid, and we do not have to
> create a lot of auxiliary bbs, to be able to detect all regions. This
> saves memory and runtime. In general it is probably not too easy to
> decide where to insert these bbs either:

The general rule is to split all blocks with multiple in-edges and multiple out-edges
into blocks with either multiple in-edges or multiple out-edges, but not both.
One option is to keep this as an invariant throughout the compiler and make use
of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code
in regular basic blocks. There are different variations on this that may or may not be
useful.
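The splitting rule can be sketched as a tiny transformation on an integer-labelled CFG. This is my own toy version of the rule, not code from either implementation; the id + 1000 offset for the new branch half is an arbitrary convention of the sketch.

```cpp
#include <map>
#include <vector>

// A block with multiple in-edges AND multiple out-edges is split into a
// merge half (keeps the id and the in-edges) that falls through to a
// new branch half (id + 1000) holding the out-edges.
using CFG = std::map<int, std::vector<int>>;

static CFG splitMixedBlocks(const CFG &G) {
  std::map<int, int> NumPreds;
  for (const auto &KV : G)
    for (int Succ : KV.second)
      ++NumPreds[Succ];

  CFG Out;
  for (const auto &KV : G) {
    int B = KV.first;
    if (NumPreds[B] > 1 && KV.second.size() > 1) {
      Out[B] = {B + 1000};       // merge half: single fall-through edge
      Out[B + 1000] = KV.second; // branch half: the original out-edges
    } else {
      Out[B] = KV.second;        // unchanged; in-edges still target B
    }
  }
  return Out;
}
```

After the split, every block either merges control flow or branches it, never both, which is the invariant described above.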

> CFG:
> 0
> |
> 1
> / |
> 2 |
> / \ 3
> 4 5 |
> | | |
> 6 7 8
> \ | /
> \ |/ region A: 1 -> 9 {1,2,3,4,5,6,7,8}
> 9 region B: 2 -> 9 {2,4,5,6,7}
>
> So we need one bb that joins 6 and 7 and one that joins the two regions
>
> CFG: 0
> |
> 1
> / |
> 2 |
> / \ 3
> 4 5 |
> | | |
> 6 7 8
> \ | |
> \ | | region A: (0,1) -> (9b,9) {1,2,3,4,5,6,7,8,9a,9b}
> 9a | region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
> \ /
> 9b
> |
> 9

It is fairly simple to use the information from the algorithm to decide
where those merges should be inserted to get the expected regions.
This may be needed in the cases where a sub-region is too complicated
to be represented and must be abstracted into a "black box".

> My approach is comparable to this paper:
> The Refined Process Structure Tree by JussiVanhatalo, Hagen Völzer,
> Jana Koehler

I was looking through some slides that described their algorithm. One case that
seems to be legal is this:

|
Entry
/ \
R0 R1
\ /
Exit
|

With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
that a fragment cannot be identified using only the entry and exit blocks, but
the internal blocks or edges will also need to be listed. I don't know if this is
relevant to your implementation.

> The implementation however takes advantage of the existence of
> Dominance/PostDominance information. Therefore it is simpler and
> hopefully faster. At the moment run time is comparable to dominance tree
> calculation.

Both algorithms are linear so there is really no big difference in time imo.
I believe the biggest difference that you mention is that you can capture more
complicated regions without having to modify the CFG with the current
algorithm.

> If you want, have a look into some results I got with a pass extracting
> maximal non trivial regions that do not contain loops from libbz2:
>
> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>
> Interesting ones:
>
> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
> has a lot of exit edges.

I think this example proves the strengths and weaknesses of both
approaches. Making that region into structured control flow would add a lot
of additional blocks. This will also happen after generating code
from the polyhedral model, so either way the cost is there if the optimization
is successful.

The second case is where the optimization fails (no profitable
transformation found) and the CFG can remain untouched.

The third case is if one of those blocks contains something complicated.
I believe the current algorithm simply fails and cannot detect the region. If the
CFG is modified this would allow an internal SESE-region to become a
black box, and the outer regions could be optimized.

> regions_without_loops.bzopen_or_bzdopen.dot.png
> the first region has two entry edges. One is the loop latch.
> (Keep in mind all regions have the same color, so if it seems there is
> an edge into a region, there are just two regions close by)
>
> Without a prepass that exposes the edges almost no region could be
> detected with the "standard" approach.

Indeed, the CFG will have to be modified for these cases. It seems to me that the trade-off
between the two approaches is that the algorithm you currently have is cheaper up
front but may be less capable in some cases, while the "standard" algorithm is more
expensive but can handle problematic regions better. Would you agree?

- Jan

Tobias Grosser

unread,
Jan 13, 2010, 4:19:09 AM1/13/10
to llv...@cs.uiuc.edu
On 01/13/10 05:09, Jan Sjodin wrote:
> Hi Tobias

>
>> In general this approach saves a preliminary pass that has to insert new
>
>> bbs, to generate these edges. As we do not have to modify the CFG other
>> passes like dominance information are still valid, and we do not have to
>> create a lot of auxiliary bbs, to be able to detect all regions. This
>> saves memory and runtime. In general it is probably not too easy to
>> decide where to insert these bbs either:
>
> The general rule is to split all blocks with multiple in-edges and multiple out-edges
> into blocks with either multiple in-edges or multiple out-edges, but not both.
This is not sufficient, as shown in the example below. It would allow
only one region.

> One option is to keep this as an invariant throughout the compiler and make use
> of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code
> in regular basic blocks. There are different variations on this that may or may not be
> useful.

This might be possible, however probably doubling the number of bbs.

>
>> CFG:
>> 0
>> |
>> 1
>> / |
>> 2 |
>> / \ 3
>> 4 5 |
>> | | |
>> 6 7 8
>> \ | /
>> \ |/ region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>> 9 region B: 2 -> 9 {2,4,5,6,7}
>>
>> So we need one bb that joins 6 and 7 and one that joins the two regions
>>
>> CFG: 0
>> |
>> 1
>> / |
>> 2 |
>> / \ 3
>> 4 5 |
>> | | |
>> 6 7 8
>> \ | |
>> \ | | region A: (0,1) -> (9b,9) {1,2,3,4,5,6,7,8,9a,9b}
>> 9a | region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>> \ /
>> 9b
>> |
>> 9
>

> It is fairly simple to use the information from the algorithm to decide
> where those merges should be inserted to get the expected regions.
> This may be needed in the cases where a sub-region is too complicated
> to be represented and must be abstracted into a "black box".

From which algorithm? The program structure tree does not give this
information, does it?

>> My approach is comparable to this paper:

>> The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
>> Jana Koehler
>
> I was looking through some slides that described their algorithm. One case that
> seems to be legal is this:
>
> |
> Entry
> / \
> R0 R1
> \ /
> Exit
> |
>
> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
> that a fragment cannot be identified using only the entry and exit blocks, but
> the internal blocks or edges will also need to be listed. I don't know if this is
> relevant to your implementation.

No. The ideas are comparable, however I believe their implementation is
a little complicated. ;-)

I would mark the regions as:

Region A: R0 -> Exit, containing {R0}
Region B: R1 -> Exit, containing {R1}

>> The implementation however takes advantage of the existence of
>> Dominance/PostDominance information. Therefore it is simpler and
>> hopefully faster. At the moment run time is comparable to dominance tree
>> calculation.
>

> Both algorithms are linear so there is really no big difference in time imo.

Sure. However, in terms of maintainability it is nice to be able to reuse
an existing analysis instead of writing another triconnected-component
analysis up front.

> I believe the biggest difference that you mention is that you can capture more
> complicated regions without having to modify the CFG with the current
> algorithm.

Yes.

>> If you want, have a look into some results I got with a pass extracting
>> maximal non trivial regions that do not contain loops from libbz2:
>>
>> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>>
>> Interesting ones:
>>
>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
>> has a lot of exit edges.
>

> I think this example proves the strengths and weaknesses of both
> approaches. Making that region into structured control flow would add a lot
> of additional blocks. This will also happen after generating code
> from the polyhedral model, so either way the cost is there if the optimization
> is successful.

Yes, but just in this case and not for regions where the model cannot be
applied. If the regions pass is used for analysis purposes, nothing has
to be touched.

> The second case is where the optimization fails (no profitable
> transformation found) and the CFG can remain untouched.
>
> The third case is if one of those blocks contains something complicated.
> I believe the current algorithm simply fails and cannot detect the region.

Which algorithm? The one in Graphite? Or the region detection I wrote
here? This is just plain region detection, that does not even look at
the content but builds a region tree (program structure tree). It just
detects every possible region.
The selection would be a later pass.

> If the
> CFG is modified this would allow an internal SESE-region to become a black box, and
> the outer regions could be optimized.

This is an optimization, however I think it is orthogonal to the region
detection problem. Say it works with any algorithm.


>> regions_without_loops.bzopen_or_bzdopen.dot.png
>> the first region has two entry edges. One is the loop latch.
>> (Keep in mind all regions have the same color, so if it seems there is
>> an edge into a region, there are just two regions close by)
>>
>> Without a prepass that exposes the edges almost no region could be
>> detected with the "standard" approach.
>

> Indeed the CFG will have to be modified for these cases. I it seems to me that the trade-off
> between the two approaches is that the algorithm that you currently have is a cheaper up
> front, but may be less capable in some cases, while the "standard" algorithm will be more
> expensive, but can handle problematic regions better. Would you agree?

I agree that the algorithm I have is cheaper up front, but I do not yet
see a case where it is less capable. Would you mind giving an example or
highlighting the relevant part of the discussion?

Thanks a lot

ether

unread,
Jan 14, 2010, 9:03:49 PM1/14/10
to Jan Sjodin, LLVM Developers Mailing List, Tobias Grosser
hi,

About the Region class: I think providing a method to extract the
top-level loops from a given region would be useful in Polly (our
polyhedral optimization framework), because we only do optimization on
regions that contain loops.

And a long-term consideration about a "region pass":

If we want to integrate a region analysis and optimization framework into
LLVM, I think we can use an approach similar to loop analysis and
optimization: write a class "RegionPass" inheriting from "Pass", and a
corresponding pass manager "RegionPassManager" (a kind of function pass).
If we follow this approach, we need to push the region pass manager onto
the LLVM pass manager stack.
The first question about this approach is: what is the relationship
between the "loop pass manager" and the "region pass manager"?

way 1: make region pass manager below loop pass manager in the stack

pass manager stack:

bb pass manager <---top
loop pass manager
region pass manager
function pass manager
... <---bottom

In this way the region hierarchy needs to be reconstructed when a loop
transform changes it.

way 2: make region pass manager above loop pass manager in the stack

pass manager stack:

bb pass manager <---top
region pass manager
loop pass manager
function pass manager
... <---bottom

In this way the loop hierarchy needs to be reconstructed when a region
pass changes it.

Now we need to choose the way that minimizes loop reconstruction or
region reconstruction. I think the chance that a region transform
affects the loop structure is smaller, so maybe way 2 is better.
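The proposed structure might look roughly like this. All names are hypothetical (mirroring the existing LoopPass/LoopPassManager pattern), and the manager here simply walks the region tree bottom-up, the same innermost-first order the loop pass manager uses for nested loops.

```cpp
#include <vector>

// Hypothetical skeleton (not existing LLVM API) of the RegionPass idea:
// a pass run once per region, driven by a RegionPassManager.
struct Region {
  std::vector<Region *> Children;  // nested sub-regions
};

struct RegionPass {
  virtual ~RegionPass() = default;
  // Return true if the region was modified, so the manager knows when
  // loop/region info may need to be reconstructed.
  virtual bool runOnRegion(Region &R) = 0;
};

struct RegionPassManager {
  std::vector<RegionPass *> Passes;

  // Run all passes on R and its sub-regions, innermost first.
  bool runOnRegionTree(Region &R) {
    bool Changed = false;
    for (Region *C : R.Children)
      Changed |= runOnRegionTree(*C);
    for (RegionPass *P : Passes)
      Changed |= P->runOnRegion(R);
    return Changed;
  }
};
```

The returned Changed flag is where either ordering (way 1 or way 2) would hook in its reconstruction of the other hierarchy.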


Finally, I have an idea about finding interesting regions (though it may
make the region analysis too complex):

We can introduce something like a "region filter" that determines a
property of a region. The region filter would work like a "pass", which
can run on an instruction at a time, a basic block at a time, or even a
sub-region at a time. Then we can write a "filter manager", like the
"pass manager", to stream the filtering process, so that we can improve
the performance of the region-finding process.

Comments and suggestions are appreciated.

best regards
--ether

Tobias Grosser

unread,
Jan 15, 2010, 3:55:46 AM1/15/10
to ether, LLVM Developers Mailing List
On 01/15/10 03:03, ether wrote:
> hi,
Hi,

> about region class, i think provide some method to extract the top level
> loop from a given region is useful in polly(our polyhedral optimization
> framework), because the we are only doing optimization on the region
> with loop.

Hi ether,

I think this should be implemented as a RegionFilter that checks if a
region contains a loop and that can be asked for further information.
In general I do not think this kind of analysis belongs in a region,
but, as you proposed, some kind of filter could be applied. In the short
term the passes that need this information could get it on their own.

The ordering question needs some thought. Ideally, I think, we would not
order the pass managers, but if a region changed, just reconstruct the
loops that are in this region, and if a loop changed, just reconstruct
the regions in this loop.

> at last, i have some idea about finding a interesting region: (maybe
> make the region analysis too complex)
>
> we can introduce some thing like "region filter" that determine the
> property of a region, the region filter will like a "pass", which can
> run on an instruction at a time, a basic block at a time, or even a sub
> region at a time, then we can write a "filter manager" like "pass
> manager " to stream the filtering process, so that we can promote the
> the performance of the region finding process.

Yes, I like this idea.
So the basic design would be that we have some passes like:
Maximal Region regarding an Analysis/Filter
Minimal Region regarding an Analysis/Filter
All Regions regarding an Analysis/Filter

So a pass can ask the region pass manager for a specific kind of region.
It is then just invoked for regions that fulfill this requirement.
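A minimal sketch of how such a filter query could work. RegionFilter and collectMaximal are invented names, and "maximal" here means a region that satisfies the filter while its parent does not, so the walk stops at the first satisfying region on each path down the tree.

```cpp
#include <functional>
#include <vector>

// Sketch of the filter idea (all names hypothetical): a RegionFilter is
// a predicate over regions; the manager walks the region tree and hands
// a pass only the maximal regions that satisfy the filter.
struct Region {
  Region *Parent = nullptr;
  std::vector<Region *> Children;
  bool ContainsLoop = false;  // stand-in for a real analysis result
};

using RegionFilter = std::function<bool(const Region &)>;

static void collectMaximal(Region &R, const RegionFilter &F,
                           std::vector<Region *> &Out) {
  if (F(R)) {
    Out.push_back(&R);  // maximal match: do not descend further
    return;
  }
  for (Region *C : R.Children)
    collectMaximal(*C, F, Out);
}
```

A "minimal regions" or "all regions" query would be the same walk with a different stopping rule.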

Tobias

Jan Sjodin

unread,
Jan 20, 2010, 12:05:50 PM1/20/10
to Tobias Grosser, llv...@cs.uiuc.edu
>>> bbs, to generate these edges. As we do not have to modify the CFG other
>>> passes like dominance information are still valid, and we do not have to
>>> create a lot of auxiliary bbs, to be able to detect all regions. This
>>> saves memory and runtime. In general it is probably not too easy to
>>> decide where to insert these bbs either:
>>
>> The general rule is to split all blocks with multiple in-edges and multiple out-edges
>> into blocks with either multiple in-edges or multiple out-edges, but not both.
> This is not sufficient, as shown in the example below. It would allow
> only one region.

Yes, but with the insertion of merge-blocks it will allow more (see below).


>> One option is to keep this as an invariant throughout the compiler and make use
>> of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code
>> in regular basic blocks. There are different variations on this that may or may not be
>> useful.
> This might be possible, however probably doubling the number of bbs.

I don't know if it will be double, but there will be more basic blocks for sure.
The algorithm that computes the SESE-regions can be used to determine
where the merge-nodes should be inserted. There are a couple of ways
of doing it, but if the bracket sets on two edges can be intersected to
match a third edge (which dominates the first two), you can insert a
merge block for the two edges. You don't have to compute the
dominators, but it helps to explain the problem that way.


>>>> My approach is comparable to this paper:
>>>> The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
>>>> Jana Koehler
>>>
>>> I was looking through some slides that described their algorithm. One case that
>>> seems to be legal is this:
>>>
>>>       |
>>>      Entry
>>>     /    \
>>>   R0      R1
>>>     \     /
>>>      Exit
>>>        |
>>>
>>> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
>>> that a fragment cannot be identified using only the entry and exit blocks, but
>>> the internal blocks or edges will also need to be listed. I don't know if this is
>>> relevant to your implementation.
>
> No. The ideas are comparable, however I believe their implementation is
> a little complicated. ;-)

Do you have the same definition of a region and entry/exit blocks as they do?


> I would mark the regions as:
>
> Region A: R0 -> Exit, containing {R0}
> Region B: R1 -> Exit, containing {R1}

Is the entry always contained and is the exit never contained, or is that specified
per region? Depending on the restrictions of entry and exit blocks a loop with a single
basic block cannot be an entry or exit by itself. Example:

    |
    A
   /| _
  / |/ \
 B  R  |
  \ |\_/
   \|
    C
    |

If you only care about R in this case how is the region formed?
My assumption was that there is a selection in there somewhere.
Do you plan to refine the regions in the selection phase in any way?


>> If the
>> CFG is modified this would allow an internal SESE-region to become a black box, and
>> the outer regions could be optimized.
>This is an optimization, however I think it is orthogonal to the region
>detection problem. Say it works with any algorithm.

I believe that creating a black-box will map a lot more cleanly to an edge-based
region definition, since block-based may include multiple entry/exit sub-regions
that will not encapsulate control flow in a reasonable way.


>>> regions_without_loops.bzopen_or_bzdopen.dot.png
>>> the first region has two entry edges. One is the loop latch.
>>> (Keep in mind all regions have the same color, so if it seems there is
>>> an edge into a region, there are just two regions close by)
>>>
>>> Without a prepass that exposes the edges almost no region could be
>>> detected with the "standard" approach.
>>
>> Indeed the CFG will have to be modified for these cases. I it seems to me that the trade-off
>> between the two approaches is that the algorithm that you currently have is a cheaper up
>> front, but may be less capable in some cases, while the "standard" algorithm will be more
>> expensive, but can handle problematic regions better. Would you agree?
>
> I agree that the algorithm I have is cheaper upfront, but I do not yet
> see a case where the algorithm is less capable. Would you mind to give
> an example or to highlight the relevant part of the discussion?

With the insertion of extra merge-blocks the code becomes more structured and the PST
can be refined further. A more fine-grained PST may allow more cases to be handled.

Thanks

Jan

Jan Sjodin

unread,
Jan 20, 2010, 12:16:37 PM1/20/10
to Tobias Grosser, ether, LLVM Developers Mailing List

Imo, a loop is simply a special kind of region, so a "filter" is perhaps the way to
go if you are interested in loops. Regions containing loops will have to be inspected
using the PST.

>> at last, i have some idea about finding a interesting region: (maybe
>> make the region analysis too complex)
>>
>> we can introduce some thing like "region filter" that determine the
>> property of a region, the region filter will like a "pass", which can
>> run on an instruction at a time, a basic block at a time, or even a sub
>> region at a time, then we can write a "filter manager" like "pass
>> manager " to stream the filtering process, so that we can promote the
>> the performance of the region finding process.
> Yes, I like this idea.
> So the basic design would be that we have some passes like:
> Maximal Region regarding an Analysis/Filter
> Minimal Region regarding an Analysis/Filter
> All Regions regarding an Analysis/Filter
> So a pass can ask the regionpass manager for a specific kind of regions.
> It is than just invoked for regions, that fulfill this requirement.

If you want to be able to manipulate specific regions you can have a
generic region class, and then subclasses for loop, if, unstructured, etc.
That way it is easy to ask for the body of a loop or the true or false regions
of an if-region. It will also allow you to have different kinds of loops for/do-while,
but still treat them in a uniform way in some cases.
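Jan's suggestion can be sketched as a small hierarchy. The names here (IfRegion, LoopRegion, the Kind enum) are hypothetical, not an existing LLVM interface; the point is that shape-specific parts like a loop body or the true/false arms are directly addressable while every region is still usable through the common base.

```cpp
#include <vector>

// Hypothetical region hierarchy: a generic base plus one subclass per
// control-flow shape.
struct Region {
  enum Kind { Generic, If, Loop, Unstructured };
  explicit Region(Kind K) : RKind(K) {}
  virtual ~Region() = default;
  Kind getKind() const { return RKind; }
  std::vector<Region *> SubRegions;

private:
  Kind RKind;
};

struct IfRegion : Region {
  IfRegion(Region *T, Region *F)
      : Region(If), TrueRegion(T), FalseRegion(F) {}
  Region *TrueRegion, *FalseRegion;  // the two arms of the condition
};

struct LoopRegion : Region {
  enum LoopKind { For, DoWhile };
  LoopRegion(LoopKind LK, Region *B) : Region(Loop), LKind(LK), Body(B) {}
  LoopKind LKind;  // for/do-while distinguished, but...
  Region *Body;    // ...the body is accessed uniformly for any loop kind
};
```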

- Jan

Tobias Grosser

unread,
Jan 20, 2010, 6:07:44 PM1/20/10
to Jan Sjodin, llv...@cs.uiuc.edu
On 01/20/10 18:05, Jan Sjodin wrote:
>>>> bbs, to generate these edges. As we do not have to modify the CFG other
>>>> passes like dominance information are still valid, and we do not have to
>>>> create a lot of auxiliary bbs, to be able to detect all regions. This
>>>> saves memory and runtime. In general it is probably not too easy to
>>>> decide where to insert these bbs either:
>>>
>>> The general rule is to split all blocks with multiple in-edges and multiple out-edges
>>> into blocks with either multiple in-edges or multiple out-edges, but not both.
>> This is not sufficient, as shown in the example below. It would allow
>> only one region.
>
> Yes, but with the insertion of merge-blocks it will allow more (see below).

Sure. If you insert the right merge blocks you could be optimal, in
terms of finding all regions that could be created by inserting merge
blocks.

>>> One option is to keep this as an invariant throughout the compiler and make use
>>> of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code
>>> in regular basic blocks. There are different variations on this that may or may not be
>>> useful.
>> This might be possible, however probably doubling the number of bbs.
>

> I don't know if it will be double, but there will be more basic blocks for sure.

Sure.

> The algorithm that computes the SESE-regions can be used to determine
> where the merge-nodes should be inserted. There are a couple of ways
> of doing it, but if the bracket sets on two eges can be intersected to
> match a third edge (which dominates the first two), you can insert a
> merge block for the two edges. You don't have to compute the
> dominators, but it helps to explain the problem that way.

Might be possible. I did not reason about this too much, since I found a
reasonable algorithm that did not require these blocks.

>>>>> My approach is comparable to this paper:
>>>>> The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
>>>>> Jana Koehler
>>>>
>>>> I was looking through some slides that described their algorithm. One case that
>>>> seems to be legal is this:
>>>>
>>>> |
>>>> Entry
>>>> / \
>>>> R0 R1
>>>> \ /
>>>> Exit
>>>> |
>>>>
>>>> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
>>>> that a fragment cannot be identified using only the entry and exit blocks, but
>>>> the internal blocks or edges will also need to be listed. I don't know if this is
>>>> relevant to your implementation.
>>
>> No. The ideas are comparable, however I believe their implementation is
>> a little complicated. ;-)
>

> Do you have the same definition of a region and entry/exit blocks as they do?

No. They define a region based on all the edges in the region. This is a
very verbose definition, as all edges have to be saved, and it is quite
expensive to compare regions.

However, their description allows one to talk about all possible regions.

>> I would mark the regions as:
>>
>> Region A: R0 -> Exit, containing {R0}
>> Region B: R1 -> Exit, containing {R1}
>

> Is the entry always contained and is the exit never contained, or is that specified
> per region? Depending on the restrictions of entry and exit blocks a loop with a single
> basic block cannot be an entry or exit by itself. Example:

The entry is always contained and dominates all bbs in the region. The
exit is never contained, but it postdominates all bbs in the region.
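
This definition can be illustrated with a small, self-contained Python
sketch (not the actual LLVM implementation; the CFG encoding, node names,
and helper functions are all made up for the example). It checks the two
conditions on a toy CFG: the entry is contained and dominates every block
of the region, the exit is not contained but post-dominates every block.

```python
def dominators(succ, entry):
    """Iteratively compute the dominator sets of a CFG given as a
    successor map {node: [successor, ...]}."""
    nodes = set(succ)
    pred = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def postdominators(succ, cfg_exit):
    """Post-dominators are the dominators of the reversed CFG."""
    rev = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            rev[s].append(n)
    return dominators(rev, cfg_exit)

def is_sese_region(succ, cfg_entry, cfg_exit, entry, exit_bb, body):
    """The entry is contained and dominates all blocks of the region;
    the exit is not contained but post-dominates all of them."""
    dom = dominators(succ, cfg_entry)
    pdom = postdominators(succ, cfg_exit)
    return (entry in body and exit_bb not in body
            and all(entry in dom[b] for b in body)
            and all(exit_bb in pdom[b] for b in body))

# The diamond CFG from earlier in the thread: Entry -> {R0, R1} -> Exit.
cfg = {'Entry': ['R0', 'R1'], 'R0': ['Exit'], 'R1': ['Exit'], 'Exit': []}
print(is_sese_region(cfg, 'Entry', 'Exit', 'R0', 'Exit', {'R0'}))  # True
print(is_sese_region(cfg, 'Entry', 'Exit', 'R0', 'R1', {'R0'}))    # False
```

The second call fails because R1 does not post-dominate R0, matching the
two regions Tobias lists above (R0 -> Exit and R1 -> Exit).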


>
> |
> A
> /| _
> / |/ \
> B R |
> \ |\_/
> \|
> C
> |
> If you only care about R in this case how is the region formed?

This is: R -> C, containing {R}

So R is the entry BB with the entry edge A->R and C is the exit BB with
the exit edge R->C. R is in the SCoP, C is not.

> My assumption was that there is a selection in there somewhere.
> Do you plan to refine the regions in the selection phase in any way?

Sure. The region tree can be processed and several sets of regions can
be filtered out, e.g. the maximal regions that fulfill a certain
restriction, or the minimal regions that fulfill it.

Or some more complicated ones where some problems (like irregular
control flow) can be hidden in SESE subcomponents.
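
One way such a filter could look, as a hedged sketch (the Region class
and the loop-free predicate are invented for the example; the real region
tree has a richer interface): a recursive walk that returns the maximal
regions fulfilling a restriction, i.e. the topmost matching regions,
without descending into a region once it matches.

```python
class Region:
    """A node of a region tree: a name, a flag telling whether the
    region itself contains a loop, and its child regions."""
    def __init__(self, name, has_loop=False, children=()):
        self.name = name
        self.has_loop = has_loop
        self.children = list(children)

def is_loop_free(region):
    """The example restriction: neither the region nor any subregion
    contains a loop."""
    return (not region.has_loop
            and all(is_loop_free(c) for c in region.children))

def maximal_regions(region, pred):
    """Topmost regions fulfilling pred; a matching region is not
    descended into, so the result is the set of maximal ones."""
    if pred(region):
        return [region]
    matches = []
    for child in region.children:
        matches.extend(maximal_regions(child, pred))
    return matches

# A toy tree: the function region contains a region A with a loop (and a
# loop-free subregion A1) and a loop-free region B.
tree = Region('fn', children=[
    Region('A', has_loop=True, children=[Region('A1')]),
    Region('B', children=[Region('B1')]),
])
print([r.name for r in maximal_regions(tree, is_loop_free)])  # ['A1', 'B']
```

Swapping the predicate, or recursing into matches as well, yields the
minimal regions instead.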

>>> If the
>>> CFG is modified this would allow an internal SESE-region to become a black box, and
>>> the outer regions could be optimized.
>> This is an optimization, however I think it is orthogonal to the region
>> detection problem. Say it works with any algorithm.
>

> I believe that creating a black-box will map a lot more cleanly to an edge-based
> region definition, since block-based may include multiple entry/exit sub-regions
> that will not encapsulate control flow in a reasonable way.

The block-based definition works as well as any edge-based definition to
define a SESE region, and it should be able to hide stuff in black boxes.

>
>>>> regions_without_loops.bzopen_or_bzdopen.dot.png
>>>> the first region has two entry edges. One is the loop latch.
>>>> (Keep in mind all regions have the same color, so if it seems there is
>>>> an edge into a region, there are just two regions close by)
>>>>
>>>> Without a prepass that exposes the edges almost no region could be
>>>> detected with the "standard" approach.
>>>
>>> Indeed the CFG will have to be modified for these cases. It seems to me that the trade-off
>>> between the two approaches is that the algorithm that you currently have is cheaper up
>>> front, but may be less capable in some cases, while the "standard" algorithm will be more
>>> expensive, but can handle problematic regions better. Would you agree?
>>
>> I agree that the algorithm I have is cheaper upfront, but I do not yet
>> see a case where the algorithm is less capable. Would you mind giving
>> an example or highlighting the relevant part of the discussion?
>

> With the insertion of extra merge-blocks the code becomes more structured and the PST
> can be refined further. A more fine-grained PST may allow more cases to be handled.

That is actually the difference between my algorithm and the PST bracket
algorithm. Mine should find the same regions that the PST one finds after
inserting the best merge blocks possible, just without requiring CFG
changes.

I am currently writing some documentation about this that we could
discuss later on.

In case you are interested, I ran my analysis on aermod from polyhedron.
The results can be found here:
http://tobias.osaft.eu/llvm/region/aermod/

There is a .regtree.txt file containing the region tree for every
function of the binary.

Furthermore, there are the results of an example analysis that finds the
biggest regions without loops (the .dot and .svg files).

If you have the impression some regions are not SESE, keep in mind that
I used the same color for two regions, so they may just be next to each
other.

Thanks for discussing this with me. It helped me to get a feeling for
where the differences are. I hope we can have these discussions more
often.

Tobi

Tobias Grosser
Jan 21, 2010, 4:12:18 AM
to Jan Sjodin, LLVM Developers Mailing List
> Imo, a loop is simply a special kind of region, so a "filter" is perhaps the way to
> go if you are interested in loops. Regions containing loops will have to be inspected
> using the PST.

Except loops that have multiple exits. They are not necessarily (single
entry, single exit) regions if the exits do not jump to the same exit bb.
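
A minimal sketch of this point (the CFG is made up for the example): a
loop whose two exit edges leave to different blocks has no single exit
block, so the loop body alone cannot form a single entry single exit
region.

```python
# Loop {H, B}: the header H exits to X1, the latch B exits to X2,
# and X1 != X2, so there is no unique exit block.
succ = {'H': ['B', 'X1'], 'B': ['H', 'X2'],
        'X1': ['End'], 'X2': ['End'], 'End': []}
loop = {'H', 'B'}

# Collect the edges leaving the loop and the blocks they target.
exit_edges = [(n, s) for n in sorted(loop) for s in succ[n] if s not in loop]
exit_blocks = {s for _, s in exit_edges}

print(exit_edges)        # [('B', 'X2'), ('H', 'X1')]
print(len(exit_blocks))  # 2 distinct exit blocks -> not a SESE region
```

If both exits jumped to the same block instead, the loop body would again
be a SESE region with that block as its exit.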

Jan Sjodin
Jan 21, 2010, 2:42:26 PM
to Tobias Grosser, LLVM Developers Mailing List
>> Imo, a loop is simply a special kind of region, so a "filter" is perhaps the way to
>> go if you are interested in loops. Regions containing loops will have to be inspected
>> using the PST.
>
> Except loops that have multiple exits. They are not necessarily (single
> entry, single exit) regions if the exits do not jump to the same exit bb.

Yes, my assumption was to classify structured loops, but that does not exclude unstructured
regions that contain unstructured loops.

Jan Sjodin
Jan 21, 2010, 3:38:38 PM
to Tobias Grosser, llv...@cs.uiuc.edu
>>> No. The ideas are comparable, however I believe their implementation is
>>> a little complicated. ;-)
>>
>> Do you have the same definition of a region and entry/exit blocks as they do?
>
> No. They define a region based on all the edges in the region. This is a
> very verbose definition, as all edges have to be saved, and it is quite
> expensive to compare regions.
>
> However, their description allows one to talk about all possible regions.
I agree, their definition did not seem practical for our purposes.

>> Is the entry always contained and is the exit never contained, or is that specified
>> per region? Depending on the restrictions of entry and exit blocks a loop with a single
>> basic block cannot be an entry or exit by itself. Example:
>
> The entry is always contained and dominates all bbs in the region. The
> exit is never contained, but it postdominates all bbs in the region.
Okay, that makes sense.

>> With the insertion of extra merge-blocks the code becomes more structured and the PST
>> can be refined further. A more fine-grained PST may allow more cases to be handled.
>
> That is actually the difference between my algorithm and the PST bracket
> algorithm. Mine should find the same regions that the PST one finds after
> inserting the best merge blocks possible, just without requiring CFG
> changes.
With the definition of a SESE-region you mention above I can see how that would work.
It was just not clear to me what the definition was to begin with. Back-edges are another
matter, but I am sure they are handled one way or another. I will have to look more into the
algorithm.

> I am currently writing some documentation about this that we could discuss
> later on.

Sure.
> Thanks for discussing this with me. It helped me to get a feeling for
> where the differences are. I hope we can have these discussions more often.
Yes, there are always interesting details to be discussed between different algorithms.

- Jan

Tobias Grosser
Jan 22, 2010, 4:49:29 AM
to Jan Sjodin, LLVM Developers Mailing List
On 01/21/10 20:42, Jan Sjodin wrote:
>>> Imo, a loop is simply a special kind of region, so a "filter" is perhaps the way to
>>> go if you are interested in loops. Regions containing loops will have to be inspected
>>> using the PST.
>>
>> Except loops that have multiple exits. They are not necessarily (single
>> entry, single exit) regions if the exits do not jump to the same exit bb.
>
> Yes, my assumption was to classify structured loops, but that does not exclude unstructured
> regions that contain unstructured loops.

Can you explain your definition of structured/unstructured loops/regions?

Does a structured loop have just one exit?

Does an unstructured region have multiple exits, or does it just contain
unstructured code?

Tobi

Jan Sjodin
Jan 22, 2010, 11:30:11 AM
to Tobias Grosser, LLVM Developers Mailing List
On 01/21/10 20:42, Jan Sjodin wrote:
>>>> Imo, a loop is simply a special kind of region, so a "filter" is perhaps the way to
>>>> go if you are interested in loops. Regions containing loops will have to be inspected
>>>> using the PST.
>>>
>>> Except loops that have multiple exits. They are not necessarily (single
>>> entry, single exit) regions if the exits do not jump to the same exit bb.
>>
>> Yes, my assumption was to classify structured loops, but that does not exclude unstructured
>> regions that contain unstructured loops.
>
> Can you explain your definition of structured/unstructured loops/regions?
An unstructured region in this case means there is control flow in the region that is not
interesting to look into or classify any further.
 
> Does a structured loop have just one exit?
When you say "structured loop" I believe that means it has a single exit and does not use break
or goto to exit the loop. A loop contained within a region may have multiple exits. If you need
to reason about these kinds of loops, you can create a special kind of region for them. If they
are not interesting, then you would simply treat them as unstructured SESE-regions. This mostly
relates to creating convenience functions around regions so that you can easily extract the body,
exit edge[s], then/else portions of conditionals etc.
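
One possible shape for such a convenience function, as a hedged sketch
(the successor-map CFG encoding and the function itself are invented for
the example, not taken from any implementation): given a diamond-shaped
SESE region, extract the then/else successors of the conditional.

```python
def conditional_parts(succ, entry, exit_bb):
    """If the region entry -> {then, else} -> exit is a plain diamond,
    return its (then, else) blocks; otherwise return None."""
    branches = succ[entry]
    if len(branches) == 2 and all(succ[b] == [exit_bb] for b in branches):
        return tuple(branches)
    return None

# A simple if/else diamond: A branches to T and E, both fall through to C.
cfg = {'A': ['T', 'E'], 'T': ['C'], 'E': ['C'], 'C': []}
print(conditional_parts(cfg, 'A', 'C'))  # ('T', 'E')

# A region that is not a plain diamond (T jumps back to A) is rejected.
cfg2 = {'A': ['T', 'E'], 'T': ['A'], 'E': ['C'], 'C': []}
print(conditional_parts(cfg2, 'A', 'C'))  # None
```

An unstructured region would simply be one where none of these shape
classifiers match.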

> Does an unstructured region have multiple exits, or does it just contain
> unstructured code?
It just contains unstructured code that cannot be decomposed into smaller
SESE-regions.