CLAST API

Uday K Bondhugula

May 21, 2012, 2:46:03 AM
to cloog-de...@googlegroups.com

A few questions regarding the CLAST API.

1. Are there functions that help with traversing and post-processing the
Clast? Getting all statement numbers under a particular clast node is
one example. The documentation doesn't expose anything like this,
but I was wondering whether something is used internally or by
external tools. Such functions would be really useful for
post-processing the Clast, for example
get_clast_for_by_name or get_stmts_under_clast. If there is no such
support, I can contribute these very soon (next few days). Traversing
and post-processing the Clast is the only good way I can think of to
mark loops parallel, privatize variables, etc.

2. Why does cloog_clast_create_from_input / cloog_clast_create return
'struct clast_stmt *' instead of 'struct clast_root *'?

3. Are cloog_program_read, cloog_program_generate now deprecated?

Thanks,
Uday

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Tobias Grosser

May 21, 2012, 10:22:14 AM
to Uday K Bondhugula, cloog-de...@googlegroups.com
On 05/21/2012 08:46 AM, Uday K Bondhugula wrote:
>
> A few questions reg. the CLAST API.
>
> 1. Are there functions that can help in traversal and post-processing
> Clast? Getting all statement numbers under a particular clast node is
> one example. The documentation itself doesn't expose anything like this
> but I was wondering if there was something used internally or by any
> external tools. I think such functions are going to be really useful to
> perform post-processing on the Clast - for example,
> get_clast_for_by_name or get_stmts_under_clast etc. If there is no such
> support, I can contribute these very soon (next few days). Traversals
> and post-processing the Clast is the only good way I can think of to
> mark loops parallel, privatize variables, etc.

I don't know of any functions that exist internally. Both Polly and
Graphite have functionality that is built on top of the clast. But this
functionality is intentionally very limited. I see the clast as a
read-only data structure.

What would a get_clast_for_by_name do?

Instead of using a get_stmts_under_clast function, I use the
clast_for->domain field in my parallelism check:

http://repo.or.cz/w/polly-mirror.git/blob/5295547c651c33a1652011a03020191138939514:/lib/Analysis/Dependences.cpp#l221

> 2. Why does cloog_clast_create_from_input / cloog_clast_create return
> 'struct clast_stmt *' instead of 'struct clast_root *'?

I have no answer here. I agree with you that a clast_root would make
more sense. As Sven wrote the code, I don't know the original reason.

> 3. Are cloog_program_read, cloog_program_generate now deprecated?

I don't know. At least in my projects, I don't use them any more. I
switched to the CloogInput interface.

Tobi

Uday K Bondhugula

May 21, 2012, 11:00:34 AM
to Tobias Grosser, cloog-de...@googlegroups.com
On 05/21/2012 07:52 PM, Tobias Grosser wrote:
> I don't know any functions that exist internally. Both Polly and
> graphite have functionality that is build on top of the clast. But this
> functionality is intentionally very limited. I see the clast as a
> read-only data structure.

But wouldn't you find it convenient to mark the clast nodes in some way and
then use that information when generating LLVM IR from the clast? For example,
a list of private variables, reduction variables, or just whether a loop is
parallel.

>
> What would a get_clast_for_by_name do?

It would return a list of clast loops (struct clast_for *). You can
then, for example, mark them as parallel based on other information computed
during the polyhedral optimization process.

- Uday

Tobias Grosser

May 21, 2012, 11:06:02 AM
to ud...@csa.iisc.ernet.in, Uday K Bondhugula, cloog-de...@googlegroups.com
On 05/21/2012 05:00 PM, Uday K Bondhugula wrote:
> On 05/21/2012 07:52 PM, Tobias Grosser wrote:
>> I don't know any functions that exist internally. Both Polly and
>> graphite have functionality that is build on top of the clast. But this
>> functionality is intentionally very limited. I see the clast as a
>> read-only data structure.
>
> But won't you find it convenient to mark the clast nodes in some way and
> then use that information when generating LLVM IR from clast? For eg.,
> list of private variables, reduction variables, or just loop being
> parallel.

Sure. I do this on the fly during code generation. I can also see that
other people may want to first mark certain nodes and later use these marks
when translating the clast to their target language.

>> What would a get_clast_for_by_name do?
>
> It would return you a list of clast loops (struct clast_for *). You can
> then, for eg., mark them as parallel based on other information computed
> during the polyhedral optimization process.

But why by_name? Will it return all clast_fors in the scop or will it
filter the list by name? Do clast_fors have names?

Tobi

Uday K Bondhugula

May 21, 2012, 11:12:06 AM
to Tobias Grosser, Uday K Bondhugula, cloog-de...@googlegroups.com
Correct. Doing it on the fly may reduce the number of AST passes, but would
add more code to the clast-to-IL translation part. Anyway, I assume AST
passes are nearly free.

>
>>> What would a get_clast_for_by_name do?
>>
>> It would return you a list of clast loops (struct clast_for *). You can
>> then, for eg., mark them as parallel based on other information computed
>> during the polyhedral optimization process.
>
> But why by_name? Will it return all clast_fors in the scop or will it
> filter the list by name? Do clast_fors have names?

Sorry, I should have explained better. By name, I meant the iterator
name, for example "t2", and yes, it will filter by name if one is
provided. The user would have the scattering function depth and may want
to look for, say, all t2 loops that contain statements S1 and S2.
This will be clearer if I send you the function I propose to add; I'll
do that in a moment.

- Uday

>
> Tobi

Uday K Bondhugula

May 21, 2012, 11:15:12 AM
to ud...@csa.iisc.ernet.in, Tobias Grosser, cloog-de...@googlegroups.com
>> But why by_name? Will it return all clast_fors in the scop or will it
>> filter the list by name? Do clast_fors have names?

clast_for->iterator has the time iterator name.

-Uday

Uday K Bondhugula

May 21, 2012, 11:40:07 AM
to Tobias Grosser, cloog-de...@googlegroups.com

Here it is. Comments are welcome.

/*
 * A multi-purpose function to traverse and get information on Clast
 * loops
 *
 * node: clast node where processing should start
 *
 * Returns:
 *
 * A list of loops under clast_stmt 'node' filtered in two ways: (1) it
 * contains statements appearing in 'stmt_filter' based on filter type
 * (see below), (2) loops with iterator name 'iter' are matched.
 * If 'iter' is set to NULL, no filtering based on iterator name is done
 *
 * iter: loop iterator name
 * stmt_filter: list of statement numbers for filtering (1-indexed)
 * nstmts_filter: number of statements in stmt_filter
 *
 * FilterType: exact (i.e., loops containing only and all those statements
 * in stmt_filter) or subset, i.e., loops which have only those statements
 * that appear in stmt_filter
 *
 * To disable all filtering, set 'iter' to NULL, provide all statement
 * numbers in 'stmt_filter' and set FilterType to subset
 *
 * Return fields
 *
 * stmts: statement numbers under clast node 'node'
 * nstmts: number of stmt numbers pointed to by stmts
 * loops: list of clast loops
 * nloops: number of clast loops in loops
 *
 * stmts and loops should be freed with 'free'
 *
 */
void clast_traverse(struct clast_stmt *node,
        const char *iter, const int *stmt_filter, int nstmts_filter,
        struct clast_for ***loops, int *nloops,
        int **stmts, int *nstmts, FilterType filter_type);
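
A minimal sketch of the iterator-name part of such a traversal, to make the idea concrete. It is not the proposed clast_traverse: struct demo_node is a hypothetical, simplified stand-in for CLooG's clast nodes (only loops and user statements, no guards or statement filtering), and collect_loops is an illustrative name. As in the proposal, the returned array is freed with free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for clast nodes (not CLooG's types). */
enum demo_kind { DEMO_FOR, DEMO_USER };

struct demo_node {
    enum demo_kind kind;
    struct demo_node *next;  /* next sibling in the statement list */
    struct demo_node *body;  /* loop body (DEMO_FOR only) */
    const char *iterator;    /* iterator name, e.g. "t2" (DEMO_FOR only) */
    int stmt_number;         /* statement number (DEMO_USER only) */
};

/* A loop matches when no name filter is given or the names are equal. */
static int loop_matches(const struct demo_node *n, const char *iter)
{
    return iter == NULL || strcmp(n->iterator, iter) == 0;
}

/* Count matching loops in the list starting at 'n', including nested ones. */
static size_t count_loops(const struct demo_node *n, const char *iter)
{
    size_t count = 0;
    for (; n != NULL; n = n->next) {
        if (n->kind != DEMO_FOR)
            continue;
        if (loop_matches(n, iter))
            count++;
        count += count_loops(n->body, iter);
    }
    return count;
}

/* Store matching loops in 'out' in pre-order; returns the next free slot. */
static size_t fill_loops(struct demo_node *n, const char *iter,
                         struct demo_node **out, size_t pos)
{
    for (; n != NULL; n = n->next) {
        if (n->kind != DEMO_FOR)
            continue;
        if (loop_matches(n, iter))
            out[pos++] = n;
        pos = fill_loops(n->body, iter, out, pos);
    }
    return pos;
}

/* Return a malloc'd array of matching loops and set *nloops.
 * Returns NULL (with *nloops == 0) when nothing matches or malloc fails. */
struct demo_node **collect_loops(struct demo_node *root, const char *iter,
                                 size_t *nloops)
{
    struct demo_node **out;

    assert(nloops != NULL);
    *nloops = count_loops(root, iter);
    if (*nloops == 0)
        return NULL;
    out = malloc(*nloops * sizeof *out);
    if (out == NULL) {
        *nloops = 0;
        return NULL;
    }
    fill_loops(root, iter, out, 0);
    return out;
}
```

Counting first and filling second keeps the allocation to a single malloc, at the cost of walking the tree twice; since such passes are cheap, that seems a reasonable trade-off for a sketch.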

Uday K Bondhugula

May 22, 2012, 5:11:17 AM
to ud...@csa.iisc.ernet.in, Tobias Grosser, cloog-de...@googlegroups.com, Cédric Bastoul
This summarizes what I propose adding to clast loops. pprint_for is
modified accordingly to print out OpenMP pragmas.

--- cloog/include/cloog/clast.h 2011-04-29 11:30:13.447035000 +0530
+++ cloog-isl/include/cloog/clast.h 2012-05-22 14:30:50.416839585 +0530
@@ -98,6 +98,12 @@
struct clast_expr * UB;
cloog_int_t stride;
struct clast_stmt * body;
+ int omp_parallel;
+ int mpi_parallel;
+ /* Comma separated list of loop private variables for OpenMP parallelization */
+ char *private_vars;
+ /* Comma separated list of reduction variable/operators for OpenMP parallelization */
+ char *reduction_vars;
};


For example, I can now get code where the same iterator (depth) is
parallel in one place and not in another; this was a bad limitation in
Pluto earlier. I assume this isn't an issue for Polly, as its
parallelization is already clast-based.

/* Start of CLooG code */
if (N >= 1) {
  lbp=0;
  ubp=N-1;
#pragma omp parallel for private(lbv,ubv)
  for (t2=lbp;t2<=ubp;t2++) {
    A[t2]=B[t2];;
  }
  for (t2=1;t2<=N-1;t2++) {
    A[t2]=A[t2-1]+1;;
  }
}

- Uday

Cédric Bastoul

May 22, 2012, 6:26:13 AM
to Tobias Grosser, Uday K Bondhugula, cloog-de...@googlegroups.com
Hi Uday,

On Mon, May 21, 2012 at 4:22 PM, Tobias Grosser <tob...@grosser.es> wrote:
> On 05/21/2012 08:46 AM, Uday K Bondhugula wrote:
>> A few questions reg. the CLAST API.
>>
>> 1. Are there functions that can help in traversal and post-processing
>> Clast? Getting all statement numbers under a particular clast node is
>> one example. The documentation itself doesn't expose anything like this
>> but I was wondering if there was something used internally or by any
>> external tools. I think such functions are going to be really useful to
>> perform post-processing on the Clast - for example,
>> get_clast_for_by_name or get_stmts_under_clast etc. If there is no such
>> support, I can contribute these very soon (next few days). Traversals
>> and post-processing the Clast is the only good way I can think of to
>> mark loops parallel, privatize variables, etc.
>
> I don't know of any functions that exist internally. Both Polly and
> Graphite have functionality that is built on top of the clast. But this
> functionality is intentionally very limited. I see the clast as a
> read-only data structure.

I don't think any such function exists. There is some debate about whether we should do it inside or outside CLooG. Louis-Noël wrote a "clastlib" (or something like this) for this purpose. I dislike the current state: CLooG defines Clast but does not provide management functions. I see two solutions and I'm open to both of them: (1) we add such functionality inside CLooG directly, or (2) we create a Clast library with all the functionality outside CLooG and CLooG links it. I think (1) is a better short-term solution, plus it is backward-compatible.

So I would welcome the functions you'll contribute!

> What would a get_clast_for_by_name do?
>
> Instead of using a get_stmts_under_clast function, I use the
> clast_for->domain field in my parallelism check:
>
> http://repo.or.cz/w/polly-mirror.git/blob/5295547c651c33a1652011a03020191138939514:/lib/Analysis/Dependences.cpp#l221

>> 2. Why does cloog_clast_create_from_input / cloog_clast_create return
>> 'struct clast_stmt *' instead of 'struct clast_root *'?
>
> I have no answer here. I agree with you that a clast_root would make
> more sense. As Sven wrote the code, I don't know the original reason.

No idea either. But no function is called cloog_clast_generate, so there is some room for such a function, and it would fit the old interface.

>> 3. Are cloog_program_read, cloog_program_generate now deprecated?
>
> I don't know. At least in my projects, I don't use them any more. I
> switched to the CloogInput interface.

I would recommend switching to cloog_input_read / cloog_clast_create_from_input / clast_pprint (bad name btw, I'll add a cloog_clast_pprint alias), but the old interface is used internally, so it's not going to vanish anytime soon.

Ced.

Uday K Bondhugula

May 22, 2012, 6:33:42 AM
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
> whether we should do it inside or outside CLooG. Louis-Noël wrote a
> "clastlib" (or something like this) for this purpose. I dislike the
> current state: CLooG is defining Clast but it does not provide
> management functions. I see two solutions and I'm open to both of them:
> (1) we add such functionality inside CLooG directly, or (2) we create a
> Clast library with all the functionality outside CLooG and CLooG links
> it. I think (1) is a better short-term solution, plus it is
> backward-compatible.

Yes, I fully agree. Having it in another library is unnecessary
overkill given the engineering / maintenance overhead. I feel Clast
traversal functions should be provided from within CLooG (as a
long-term solution as well).


>
> So I would welcome the functions you'll contribute !

I already have them ready. Will send patches later today.

Thanks for your other answers.

-Uday



Sven Verdoolaege

May 22, 2012, 6:34:14 AM
to Cédric Bastoul, Tobias Grosser, Uday K Bondhugula, cloog-de...@googlegroups.com
On Tue, May 22, 2012 at 12:26:13PM +0200, Cédric Bastoul wrote:
> On Mon, May 21, 2012 at 4:22 PM, Tobias Grosser <tob...@grosser.es> wrote:
> > On 05/21/2012 08:46 AM, Uday K Bondhugula wrote:
> > 2. Why does cloog_clast_create_from_input / cloog_clast_create return
> >> 'struct clast_stmt *' instead of 'struct clast_root *'?
> >
> > I have no answer here. I agree with you that a clast_root would make more
> > sense. As Sven wrote the code, I don't know the original the answer.
>
> No idea as well.

The reason is probably just that
when cloog_clast_create was introduced there was no clast_root.

skimo

Uday K Bondhugula

May 22, 2012, 6:41:34 AM
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com

> I would recommend to switch to cloog_input_read
> / cloog_clast_create_from_input / clast_pprint (bad name btw, I'll do a
> cloog_clast_pprint alias) but the old interface is used internally so
> it's not going to vanish anytime soon.

If you plan to do renaming, I feel clast_stmt should be renamed;
clast_stmt / struct clast_stmt is misleading. They should really be
called clast nodes, i.e., clast_node / struct clast_node, since a node
here can be a loop, a guard/conditional, a remapping, a user statement,
etc. clast_user_stmt is fine, but clast_stmt is easily confused with
it. A simple clast_stmt -> clast_node rename should do it, I think.

-Uday

Uday K Bondhugula

May 23, 2012, 6:59:01 AM
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com

A (preliminary) patch to allow clast-based parallelization is attached.

-Uday
0002-clast-based-loop-parallelization-support.patch

Uday K Bondhugula

May 23, 2012, 7:15:28 AM
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
On 05/23/2012 04:29 PM, Uday K Bondhugula wrote:
>
> A (preliminary) patch to allow clast-based parallelization is attached.

The same thing is pasted below. -Uday

commit 2fec83ad63bbe1d0b2535d5fff669e27e09f1561
Author: Uday Bondhugula <uday...@gmail.com>
Date: Wed May 23 16:27:37 2012 +0530

clast-based loop parallelization support

This patch provides support for printing OMP parallel loops if marked
appropriately. If clast_for->omp_parallel is set, the loop will be
annotated with an omp parallel for pragma; list of private and
reduction vars can be set as well.

Signed-off-by: Uday Bondhugula <uday...@gmail.com>

diff --git a/include/cloog/clast.h b/include/cloog/clast.h
index b455369..0d83a84 100644
--- a/include/cloog/clast.h
+++ b/include/cloog/clast.h
@@ -98,6 +98,12 @@ struct clast_for {
struct clast_expr * UB;
cloog_int_t stride;
struct clast_stmt * body;
+ int omp_parallel;
+ int mpi_parallel;
+ /* Comma separated list of loop private variables for OpenMP parallelization */
+ char *private_vars;
+ /* Comma separated list of reduction variable/operators for OpenMP parallelization */
+ char *reduction_vars;
};

struct clast_equation {
diff --git a/source/clast.c b/source/clast.c
index 0b67532..2752039 100644
--- a/source/clast.c
+++ b/source/clast.c
@@ -211,6 +211,8 @@ static void free_clast_for(struct clast_stmt *s)
free_clast_expr(f->UB);
cloog_int_clear(f->stride);
cloog_clast_free(f->body);
+ if (f->private_vars) free(f->private_vars);
+ if (f->reduction_vars) free(f->reduction_vars);
free(f);
}

@@ -226,6 +228,10 @@ struct clast_for *new_clast_for(CloogDomain *domain, const char *it,
f->LB = LB;
f->UB = UB;
f->body = NULL;
+ f->omp_parallel = 0;
+ f->mpi_parallel = 0;
+ f->private_vars = NULL;
+ f->reduction_vars = NULL;
cloog_int_init(f->stride);
if (stride)
cloog_int_set(f->stride, stride->stride);
diff --git a/source/pprint.c b/source/pprint.c
index 9c7f1d4..dd151c9 100644
--- a/source/pprint.c
+++ b/source/pprint.c
@@ -405,6 +405,56 @@ void pprint_guard(struct cloogoptions *options, FILE *dst, int indent,
void pprint_for(struct cloogoptions *options, FILE *dst, int indent,
struct clast_for *f)
{
+ if (options->language == CLOOG_LANGUAGE_C) {
+ if (f->omp_parallel && !f->mpi_parallel) {
+ if (f->LB) {
+ fprintf(dst, "lbp=");
+ pprint_expr(options, dst, f->LB);
+ fprintf(dst, ";\n");
+ }
+ if (f->UB) {
+ fprintf(dst, "%*s", indent, "");
+ fprintf(dst, "ubp=");
+ pprint_expr(options, dst, f->UB);
+ fprintf(dst, ";\n");
+ }
+ fprintf(dst, "#pragma omp parallel for%s%s%s%s%s%s\n",
+ (f->private_vars)? " private(":"",
+ (f->private_vars)? f->private_vars: "",
+ (f->private_vars)? ")":"",
+ (f->reduction_vars)? " reduction(": "",
+ (f->reduction_vars)? f->reduction_vars: "",
+ (f->reduction_vars)? ")": "");
+ fprintf(dst, "%*s", indent, "");
+ }
+ if (f->mpi_parallel) {
+ if (f->LB) {
+ fprintf(dst, "_lb_dist=");
+ pprint_expr(options, dst, f->LB);
+ fprintf(dst, ";\n");
+ }
+ if (f->UB) {
+ fprintf(dst, "%*s", indent, "");
+ fprintf(dst, "_ub_dist=");
+ pprint_expr(options, dst, f->UB);
+ fprintf(dst, ";\n");
+ }
+ fprintf(dst, "%*s", indent, "");
+ fprintf(dst, "polyrt_loop_dist(_lb_dist, _ub_dist, nprocs, my_rank, &lbp, &ubp);\n");
+ if (f->omp_parallel) {
+ fprintf(dst, "#pragma omp parallel for%s%s%s%s%s%s\n",
+ (f->private_vars)? " private(":"",
+ (f->private_vars)? f->private_vars: "",
+ (f->private_vars)? ")":"",
+ (f->reduction_vars)? " reduction(": "",
+ (f->reduction_vars)? f->reduction_vars: "",
+ (f->reduction_vars)? ")": "");
+ }
+ fprintf(dst, "%*s", indent, "");
+ }
+
+ }
+
if (options->language == CLOOG_LANGUAGE_FORTRAN)
fprintf(dst, "DO ");
else
@@ -412,7 +462,11 @@ void pprint_for(struct cloogoptions *options, FILE *dst, int indent,

if (f->LB) {
fprintf(dst, "%s=", f->iterator);
+ if (f->omp_parallel || f->mpi_parallel) {
+ fprintf(dst, "lbp");
+ }else{
pprint_expr(options, dst, f->LB);
+ }
} else if (options->language == CLOOG_LANGUAGE_FORTRAN)
cloog_die("unbounded loops not allowed in FORTRAN.\n");

@@ -424,6 +478,11 @@ void pprint_for(struct cloogoptions *options, FILE *dst, int indent,
if (f->UB) {
if (options->language != CLOOG_LANGUAGE_FORTRAN)
fprintf(dst,"%s<=", f->iterator);
+
+ if (f->omp_parallel || f->mpi_parallel) {
+ fprintf(dst, "ubp");
+ }else{
pprint_expr(options, dst, f->UB);
+ }
} else if (options->language == CLOOG_LANGUAGE_FORTRAN)
cloog_die("unbounded loops not allowed in FORTRAN.\n");

Cédric Bastoul

May 23, 2012, 9:46:07 AM
to Uday K Bondhugula, Tobias Grosser, cloog-de...@googlegroups.com
Hi Uday,
I'm fine with the changes. I would suggest:

- only one "parallel" field in clast_for, with some #define flags:
#define CLAST_PARALLEL_NOT 0
#define CLAST_PARALLEL_VEC 2
#define CLAST_PARALLEL_OMP 4
#define CLAST_PARALLEL_MPI 8
So we can manage more options, e.g., ivdep. In fact a more general "decoration" field could be used, not explicitly linked to parallelism (since unroll, prefetch etc. could be managed through this field as well).

- the code in pprint is somewhat hardcoded, I would suggest some #define PPRINT_ for lbp, _lb_dist etc., but I admit it's readable this way, so I don't care much.
Thank you,

Ced.

--
You got this message because you subscribed to the CLooG Development mailing list.
To send messages to this list, use cloog-development@googlegroups.com
To stop subscribing, send a mail to cloog-development+unsubscribe@googlegroups.com
For more options and to visit the group, http://groups.google.fr/group/cloog-development?hl=en

Uday K Bondhugula

May 23, 2012, 10:21:20 AM
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
On 05/23/2012 07:16 PM, Cédric Bastoul wrote:
> Hi Uday,
> I'm fine with the changes. I would suggest :
>
> - only one "parallel" field in clast_for, with some #define flags:
> #define CLAST_PARALLEL_NOT 0
> #define CLAST_PARALLEL_VEC 2
> #define CLAST_PARALLEL_OMP 4
> #define CLAST_PARALLEL_MPI 8

But a loop can be both MPI parallel and OpenMP parallel. The code
already handles this; if you set both mpi_parallel and omp_parallel,
it'll distribute the loop for MPI and mark the local loop OpenMP
parallel. Maybe another

#define CLAST_PARALLEL_MPIOMP 16

> So we can manage more options, e.g., ivdep. In fact a more general
> "decoration" field could be used, not explicitly linked to parallelism
> (since unroll, prefetch etc. could be managed through this field as well).

Yes, I was thinking about ivdep and unroll as well; they are next.

>
> - the code in pprint is somewhat hardcoded, I would suggest some #define
> PPRINT_ for lbp, _lb_dist etc., but I admit it's readable this way, so I
> don't care much.

Okay.

Thanks,
-Uday

Tobias Grosser

May 23, 2012, 10:41:42 AM
to Uday K Bondhugula, Cédric Bastoul, cloog-de...@googlegroups.com
On 05/23/2012 12:59 PM, Uday K Bondhugula wrote:
>
> A (preliminary) patch to allow clast-based parallelization is attached.

Hi Uday,

I like the idea of extending the CLooG pprinter to support OpenMP,
vectorization & Co. However, I am not sure about the approach taken.

By defining PARALLEL_NOT, PARALLEL_VEC, ... we hard-code a lot of logic
into CLooG, which makes it difficult to add new extensions and adds
complex logic to the core of CLooG (e.g., how to print a loop when both
MPI and OpenMP are set).

What about defining a more flexible interface? We could, e.g., add a
pprint_options struct to the pretty printer that allows defining a
callback to print the for node. By default this callback just uses
pprint_for_sequential, but the user can define their own callback. We can
provide default implementations such as pprint_for_openmp,
pprint_for_mpi, ..., which a user could use in their callback.

The user callback can then flexibly decide what kind of loop should be
created.
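
A rough sketch of what such a callback interface could look like, purely for illustration. None of this exists in CLooG: struct demo_for, struct demo_pprint_options and every demo_* function are hypothetical names, and the loop bounds are plain strings rather than clast expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified loop node: bounds are kept as plain strings. */
struct demo_for {
    const char *iterator;
    const char *lb;
    const char *ub;
};

/* Signature of a user-replaceable loop printer. */
typedef void (*demo_for_printer)(FILE *dst, int indent,
                                 const struct demo_for *f);

/* Hypothetical options struct carrying the callback. */
struct demo_pprint_options {
    demo_for_printer print_for; /* NULL selects the sequential default */
};

/* Default: print a plain sequential loop header. */
void demo_pprint_for_sequential(FILE *dst, int indent,
                                const struct demo_for *f)
{
    fprintf(dst, "%*sfor (%s=%s;%s<=%s;%s++) {\n", indent, "",
            f->iterator, f->lb, f->iterator, f->ub, f->iterator);
}

/* Provided helper: an OpenMP pragma followed by the sequential header. */
void demo_pprint_for_openmp(FILE *dst, int indent, const struct demo_for *f)
{
    fprintf(dst, "#pragma omp parallel for\n");
    demo_pprint_for_sequential(dst, indent, f);
}

/* Pick the user's callback if one is set, else the sequential default. */
demo_for_printer demo_select_printer(const struct demo_pprint_options *opts)
{
    if (opts != NULL && opts->print_for != NULL)
        return opts->print_for;
    return demo_pprint_for_sequential;
}

/* Entry point the pretty printer would call for every for node. */
void demo_pprint_for(const struct demo_pprint_options *opts, FILE *dst,
                     int indent, const struct demo_for *f)
{
    assert(dst != NULL && f != NULL);
    demo_select_printer(opts)(dst, indent, f);
}
```

With this shape, a user who wants OpenMP loops everywhere would set print_for to demo_pprint_for_openmp, while a user with per-loop analysis results would install their own function and call the provided helpers from it.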

Cheers
Tobi

Cédric Bastoul

May 23, 2012, 10:43:48 AM
to ud...@csa.iisc.ernet.in, Tobias Grosser, cloog-de...@googlegroups.com


On Wed, May 23, 2012 at 4:21 PM, Uday K Bondhugula <uday...@gmail.com> wrote:
> On 05/23/2012 07:16 PM, Cédric Bastoul wrote:
>> Hi Uday,
>> I'm fine with the changes. I would suggest :
>>
>> - only one "parallel" field in clast_for, with some #define flags:
>> #define CLAST_PARALLEL_NOT 0
>> #define CLAST_PARALLEL_VEC 2
>> #define CLAST_PARALLEL_OMP 4
>> #define CLAST_PARALLEL_MPI 8
>
> But a loop can be both MPI parallel and OpenMP parallel. The code
> already handles this; if you set both mpi_parallel and omp_parallel,
> it'll distribute the loop for MPI and mark the local loop OpenMP
> parallel. May be another

Sure, I was thinking about bit flags, so you can have many options together:
if ((f->decoration & CLAST_DECORATION_OMP) || (f->decoration & CLAST_DECORATION_MPI))

(well, "decoration" or anything better)
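
To illustrate the bit-flag idea, here is a small sketch. The flag values mirror the CLAST_PARALLEL_* values suggested earlier in the thread; CLAST_DECORATION_NONE and CLAST_DECORATION_VEC, struct demo_for and the two helper functions are assumptions made up for this example and are not part of CLooG.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, one bit each, so they can be OR-ed together. */
#define CLAST_DECORATION_NONE 0u
#define CLAST_DECORATION_VEC  2u
#define CLAST_DECORATION_OMP  4u
#define CLAST_DECORATION_MPI  8u

/* Simplified stand-in for struct clast_for: only the decoration field. */
struct demo_for {
    unsigned decoration;
};

/* True when the loop is OpenMP- or MPI-parallel, i.e. when the printer
 * would use the lbp/ubp bounds instead of the raw LB/UB expressions. */
int demo_uses_shared_bounds(const struct demo_for *f)
{
    assert(f != NULL);
    return (f->decoration &
            (CLAST_DECORATION_OMP | CLAST_DECORATION_MPI)) != 0;
}

/* True only when the loop is both MPI-distributed and OpenMP-parallel. */
int demo_is_mpi_plus_omp(const struct demo_for *f)
{
    unsigned both = CLAST_DECORATION_OMP | CLAST_DECORATION_MPI;

    assert(f != NULL);
    return (f->decoration & both) == both;
}
```

Because each property has its own bit, the MPI + OpenMP case is just CLAST_DECORATION_OMP | CLAST_DECORATION_MPI, and no separate combined constant is needed.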



Tobias Grosser

May 23, 2012, 10:47:36 AM
to Cédric Bastoul, ud...@csa.iisc.ernet.in, cloog-de...@googlegroups.com
On 05/23/2012 04:43 PM, Cédric Bastoul wrote:
>
>
> On Wed, May 23, 2012 at 4:21 PM, Uday K Bondhugula <uday...@gmail.com
> <mailto:uday...@gmail.com>> wrote:
>
> On 05/23/2012 07:16 PM, Cédric Bastoul wrote:
>
> Hi Uday,
> I'm fine with the changes. I would suggest :
>
> - only one "parallel" field in clast_for, with some #define flags:
> #define CLAST_PARALLEL_NOT 0
> #define CLAST_PARALLEL_VEC 2
> #define CLAST_PARALLEL_OMP 4
> #define CLAST_PARALLEL_MPI 8
>
>
> But a loop can be both MPI parallel and OpenMP parallel. The code
> already handles this; if you set both mpi_parallel and omp_parallel,
> it'll distribute the loop for MPI and mark the local loop OpenMP
> parallel. May be another
>
>
> Sure, I was thinking about bit flags, so you can have many options together:
> if ((f->decoration & CLAST_DECORATION_OMP) || (f->decoration &
> CLAST_DECORATION_MPI))
>
> (well, "decoration" or anything better)

An alternative approach would be to not add anything pprinter-related to
the clast itself, but to add a single user void*. The user can store
their analysis results there and use them in the callback I
proposed before.
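
A small sketch of the user-pointer idea, under the assumption that the loop node simply carries an opaque field the library never interprets. struct demo_for, struct demo_loop_info and the demo_* functions are hypothetical and only illustrate the ownership split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analysis result, defined and owned by the user. */
struct demo_loop_info {
    int parallel;             /* non-zero if the loop may run in parallel */
    const char *private_vars; /* e.g. "lbv,ubv", or NULL */
};

/* Simplified stand-in for struct clast_for with an opaque user pointer. */
struct demo_for {
    const char *iterator;
    void *user; /* never interpreted by the library itself */
};

/* Attach analysis results to a loop; the loop does not take ownership. */
void demo_for_set_user(struct demo_for *f, void *user)
{
    assert(f != NULL);
    f->user = user;
}

/* What a user-written printer callback might do: recover its own data
 * and decide whether the loop should be printed as parallel. */
int demo_for_is_parallel(const struct demo_for *f)
{
    const struct demo_loop_info *info;

    assert(f != NULL);
    info = f->user;
    return info != NULL && info->parallel;
}
```

The clast stays free of printer-specific fields this way; the cost is that every consumer has to agree out of band on what the pointer refers to.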

Tobi

Uday K Bondhugula

May 23, 2012, 10:50:56 AM
to cloog-de...@googlegroups.com
On 05/23/2012 08:11 PM, Tobias Grosser wrote:
> On 05/23/2012 12:59 PM, Uday K Bondhugula wrote:
>>
>> A (preliminary) patch to allow clast-based parallelization is attached.
>
> Hi Uday,
>
> I like the idea of extending the CLooG pprinter to support OpenMP,
> vectorization & Co. However, I am not sure about the approach taken.
>
> By defining PARALLEL_NOT, PARALLEL_VEC, ... we hard code a lot of logic
> into CLooG. Which makes it difficult to add new extensions and adds
> complex logic into the core of CLooG. (How to print both MPI and OpenMP
> exist).

The code I sent already handles MPI+OpenMP; you can get OpenMP, MPI, or
MPI+OpenMP.

>
> What about defining a more flexible interface? We could e.g. add a
> pprint_options struct to the pretty printer, that allows to define a
> callback to print the for node. By default this callback just uses
> pprint_for_sequential, but the user can define its own call back. We can
> provide default implementations such as pprint_for_openmp,
> pprint_for_mpi, ..., which a user could use in his callback.

Such an interface would be flexible and generic, but I think it's too
heavyweight and overkill for what we may want; I'm not sure it's
warranted. In addition, OpenMP code is something that many people might
want from CLooG, and I feel CLooG should have reusable functionality
like this. Marking loops OpenMP parallel and providing lists of private
and reduction vars is already quite powerful. As for adding logic to
the CLooG code, it does increase the code size there, but the logic
itself appears simple to follow and work on. On top of all this, if a
user wants to do more pretty-printing or custom post-processing, having
such a callback interface is a good idea; so I'm suggesting having this
OpenMP support inside CLooG and, in addition, providing those custom
callbacks.

-Uday

Tobias Grosser

May 23, 2012, 11:10:09 AM5/23/12
to cloog-de...@googlegroups.com, Uday K Bondhugula
I agree that having this functionality in CLooG is a good thing. I have
no strong opinion on how you implement it, I just wanted to point out
the alternative. If what you have works for you and Cedric agrees,
that's all you need.

Tobi

Cédric Bastoul

May 23, 2012, 11:47:43 AM5/23/12
to Tobias Grosser, ud...@csa.iisc.ernet.in, cloog-de...@googlegroups.com


On Wed, May 23, 2012 at 4:47 PM, Tobias Grosser <tob...@grosser.es> wrote:
On 05/23/2012 04:43 PM, Cédric Bastoul wrote:


On Wed, May 23, 2012 at 4:21 PM, Uday K Bondhugula <uday...@gmail.com
<mailto:uday...@gmail.com>> wrote:


   On 05/23/2012 07:16 PM, Cédric Bastoul wrote:

       Hi Uday,
       I'm fine with the changes. I would suggest :

       - only one "parallel" field in clast_for, with some #define flags:
       #define CLAST_PARALLEL_NOT 0
       #define CLAST_PARALLEL_VEC 2
       #define CLAST_PARALLEL_OMP 4
       #define CLAST_PARALLEL_MPI 8


   But a loop can be both MPI parallel and OpenMP parallel. The code
   already handles this; if you set both mpi_parallel and omp_parallel,
   it'll distribute the loop for MPI and mark the local loop OpenMP
   parallel.  May be another


Sure, I was thinking about bit flags, so you can have many options together:
if ((f->decoration & CLAST_DECORATION_OMP) || (f->decoration &
CLAST_DECORATION_MPI))

(well, "decoration" or anything better)
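
A small sketch of the bit-flag idea, assuming hypothetical `DEMO_DECORATION_*` names and values (each a distinct power of two so that several can be OR-ed into one field). It also shows a pitfall worth keeping in mind when testing such masks: the logical `||` collapses its operands to 0 or 1, so it must not be used to build a mask.

```c
#include <assert.h>

/* Hypothetical flag names; each non-zero value is a distinct power of two. */
#define DEMO_DECORATION_NONE 0
#define DEMO_DECORATION_VEC  1
#define DEMO_DECORATION_OMP  2
#define DEMO_DECORATION_MPI  4

/* Returns 1 if at least one bit of 'mask' is set in 'decoration'. */
static int demo_has_any(int decoration, int mask)
{
    return (decoration & mask) != 0;
}

/* Returns 1 only if every bit of 'mask' is set in 'decoration'. */
static int demo_has_all(int decoration, int mask)
{
    return (decoration & mask) == mask;
}

/* Pitfall: logical OR yields 1 here, not the combined mask 6, so
 * 'decoration & (OMP || MPI)' would only test the lowest bit. */
static int demo_logical_or_mask(void)
{
    return DEMO_DECORATION_OMP || DEMO_DECORATION_MPI;
}
```

A loop that is both distributed and thread-parallel would then carry `DEMO_DECORATION_OMP | DEMO_DECORATION_MPI` in a single field.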

An alternative approach would be to not add anything pprinter related to the clast itself, but to add a single user void*. The user can  store its analysis results there and can use the results in the call back I proposed before.

Since CLooG is a loop generator, I think it's OK (and simple) to explicitly support some loop properties. Providing a void * field in addition for the user's convenience is a good idea as well; I'm doing it everywhere now (btw I think some users are already using Clast's char * fields as void *; they can do it with Uday's changes as well). The callback idea may be more complex in the case of compositions (like MPI + OpenMP), so I think Uday's solution is OK: it's simple, it helps to offer an anticipated feature and it makes one of the main users happy :) !

Uday, can you point me to polyrt_loop_dist() ?

Uday Reddy

May 23, 2012, 11:52:11 AM5/23/12
to Cédric Bastoul, Tobias Grosser, ud...@csa.iisc.ernet.in, cloog-de...@googlegroups.com
On Wed, May 23, 2012 at 9:17 PM, Cédric Bastoul <cedric....@u-psud.fr> wrote:



Uday, can you point me to polyrt_loop_dist() ?

polyrt_loop_dist provides a lower and upper bound for each process. Here is how it would look for a load-balanced block distribution.

void polyrt_loop_dist(int lb, int ub, int nprocs, int my_rank, int *my_start, int *my_end)
{
    int p;
    /* compute_start/compute_end were not declared in the snippet as posted;
     * local C99 variable-length arrays are assumed here */
    int compute_start[nprocs], compute_end[nprocs];

    /* number of iterations to distribute */
    long n = ub - lb + 1;

    for (p=0; p<nprocs; p++)    {
        if (p < n%nprocs)  {
            compute_start[p] =  lb + (n/nprocs)*p + p;
            /* procs with id < n%nprocs get an extra iteration to distribute
             * the remainder */
            compute_end[p] = compute_start[p] + (n/nprocs)-1 + 1;
        }else{
            compute_start[p] =  lb + (n/nprocs)*p + n%nprocs;
            compute_end[p] = compute_start[p] + (n/nprocs) - 1;
        }
    }

    *my_start = compute_start[my_rank];
    *my_end = compute_end[my_rank];
}


 

--
You got this message because you subscribed to the CLooG Development mailing list.
To send messages to this list, use cloog-development@googlegroups.com
To stop subscribing, send a mail to cloog-development+unsubscribe@googlegroups.com
For more options and to visit the group, http://groups.google.fr/group/cloog-development?hl=en


Uday Reddy

May 23, 2012, 11:56:51 AM5/23/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com

Since CLooG is a loop generator, I think it's OK (and simple) to explicitly support some loop properties. Providing a void * field in addition for the user's convenience is a good idea as well; I'm doing it everywhere now (btw I think some users are already using Clast's char * fields as void *; they can do it with Uday's changes as well). The callback idea may be more complex in the case of compositions (like MPI + OpenMP), so I think Uday's solution is OK: it's simple, it helps to offer an anticipated feature and it makes one of the main users happy :) !

Uday, can you point me to polyrt_loop_dist() ?

polyrt_loop_dist provides a lower and upper bound for each process. Here is how it would look for a load-balanced block distribution.

Here's a more simplified version.

------------

void polyrt_loop_dist(int lb, int ub, int nprocs, int my_rank, int *my_start, int *my_end)
{
    /* number of iterations to distribute */
    long n = ub - lb + 1;

    if (my_rank < n%nprocs)  {
        *my_start =  lb + (n/nprocs)*my_rank + my_rank;

        /* procs with id < n%nprocs get an extra iteration to distribute
         * the remainder */
        *my_end = *my_start + (n/nprocs)-1 + 1;
    }else{
        *my_start =  lb + (n/nprocs)*my_rank + n%nprocs;
        *my_end = *my_start + (n/nprocs) - 1;
    }
}
------------
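
As a quick sanity check of the block distribution, here is a small sketch that verifies the per-rank ranges tile [lb, ub] with no gaps or overlaps. The copy of `polyrt_loop_dist` mirrors the simplified version in this message; `demo_blocks_tile_range` is a hypothetical helper added only for illustration. A rank that gets no iterations is expected to receive an empty range with `end == start - 1`.

```c
#include <assert.h>

/* Copy of the simplified routine from this message. */
static void polyrt_loop_dist(int lb, int ub, int nprocs, int my_rank,
                             int *my_start, int *my_end)
{
    long n = ub - lb + 1;

    if (my_rank < n%nprocs) {
        *my_start = lb + (n/nprocs)*my_rank + my_rank;
        /* the first n%nprocs ranks take one extra iteration */
        *my_end = *my_start + (n/nprocs)-1 + 1;
    } else {
        *my_start = lb + (n/nprocs)*my_rank + n%nprocs;
        *my_end = *my_start + (n/nprocs) - 1;
    }
}

/* Hypothetical helper: returns 1 if the blocks of ranks 0..nprocs-1,
 * taken in order, cover [lb, ub] exactly once; 0 otherwise. */
static int demo_blocks_tile_range(int lb, int ub, int nprocs)
{
    int rank, start, end;
    int next = lb;    /* first iteration not yet assigned */

    assert(nprocs > 0 && ub >= lb - 1);
    for (rank = 0; rank < nprocs; rank++) {
        polyrt_loop_dist(lb, ub, nprocs, rank, &start, &end);
        if (start != next || end < start - 1)
            return 0;
        next = end + 1;
    }
    return next == ub + 1;
}
```

For example, with lb=0, ub=9 and 4 processes, the ranks get [0,2], [3,5], [6,7] and [8,9].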

-Uday

Cédric Bastoul

May 23, 2012, 12:19:09 PM5/23/12
to Uday Reddy, Tobias Grosser, cloog-de...@googlegroups.com
I see. Thank you. CLooG should provide an example implementation of this function somewhere (documentation, -compilable option...) at some point. Also, how do you mark the loops? Beta-vector-based? I was planning a loop-level-based decoration extension to OpenScop, but I guess you want something more elaborate. We should discuss such an extension; I need it anyway.

(NB: I started today integrating Prasanth Symbols extension and I will use it in Clan, then I'll help you to use it in Pluto, I need Pluto/PoCC with the new parser).

--

Uday Reddy

May 23, 2012, 12:57:41 PM5/23/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
On Wed, May 23, 2012 at 9:49 PM, Cédric Bastoul <cedric....@u-psud.fr> wrote:
I see. Thank you. CLooG should provide an example implementation of this function somewhere (documentation, -compilable option...) at some point.

Yes, sure; I can update the documentation.
 
Also, how do you mark the loops ? Beta-vector-based ? I was planning a

Actually, it's more general than beta-vector-based since not everyone may have transformations in a form where a loop (scattering tree loop actually) can be identified based on beta-vectors. I do it based on {depth, <stmt list>}, and I'm going to send a function that given {depth, <stmt list>}, can return all loops at that depth that have only those statements that appear in <stmt list>. So, if you have a beta-vector, if you provide its size as depth and the list of statements that share that beta-vector, that'll do.

The function is more generic and can do more things.

Thanks,
Uday
 

Uday Reddy

May 23, 2012, 1:05:34 PM5/23/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
On Wed, May 23, 2012 at 10:27 PM, Uday Reddy <uday...@gmail.com> wrote:


On Wed, May 23, 2012 at 9:49 PM, Cédric Bastoul <cedric....@u-psud.fr> wrote:
I see. Thank you. CLooG should provide an example implementation of this function somewhere (documentation, -compilable option...) at some point.

Yes, sure; I can update the documentation.
 
Also, how do you mark the loops ? Beta-vector-based ? I was planning a

Actually, it's more general than beta-vector-based since not everyone may have transformations in a form where a loop (scattering tree loop actually) can be identified based on beta-vectors. I do it based on {depth, <stmt list>}, and I'm going to send a function that given {depth, <stmt list>}, can return all loops at that depth that have only those statements that appear in <stmt list>. So, if you have a beta-vector, if you provide its size as depth and the list of statements that share that beta-vector, that'll do.

The function is more generic and can do more things.

Patch for clast traversal pasted below.

-Uday


From d255400679150ed91c39bdaa730d200435388e5c Mon Sep 17 00:00:00 2001
From: Uday Bondhugula <uday...@gmail.com>
Date: Wed, 23 May 2012 22:32:57 +0530
Subject: [PATCH 2/2] clast traversal support

See comment for function clast_traverse.

Signed-off-by: Uday Bondhugula <uday...@gmail.com>
---
 Makefile.am              |    1 +
 include/cloog/clast.h    |    2 +
 source/clast_traversal.c |  192 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 195 insertions(+), 0 deletions(-)
 create mode 100644 source/clast_traversal.c

diff --git a/Makefile.am b/Makefile.am
index 1749c95..d45c7ae 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -73,6 +73,7 @@ SOURCES_CORE = \
     $(GET_MEMORY_FUNCTIONS) \
     source/block.c \
     source/clast.c \
+    source/clast_traversal.c \
     source/matrix.c \
     source/state.c \
     source/input.c \
diff --git a/include/cloog/clast.h b/include/cloog/clast.h
index 0d83a84..3b16649 100644
--- a/include/cloog/clast.h
+++ b/include/cloog/clast.h
@@ -154,6 +154,8 @@ int clast_expr_equal(struct clast_expr *e1, struct clast_expr *e2);
 struct clast_expr *clast_bound_from_constraint(CloogConstraint *constraint,
                            int level, CloogNames *names);
 
+typedef enum filterType {exact, subset} ClastFilterType;
+
 #if defined(__cplusplus)
   }
 #endif
diff --git a/source/clast_traversal.c b/source/clast_traversal.c
new file mode 100644
index 0000000..fdeea51
--- /dev/null
+++ b/source/clast_traversal.c
@@ -0,0 +1,192 @@
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include "../include/cloog/cloog.h"
+
+
+/* Adds to the list if not already in it; returns 1 if added, 0 otherwise */
+static int add_if_new(void **list, int num, void *new, int size)
+{
+    int i;
+
+    for (i=0; i<num; i++) {
+        if (!memcmp((char *)*list + i*size, new, size)) break;
+    }
+
+    if (i==num) {
+        *list = realloc(*list, (num+1)*size);
+        memcpy((char *)*list + num*size, new, size);
+        return 1;
+    }
+
+    return 0;
+}
+
+
+/* Concatenates all elements of list2 that are not in list1;
+ * Returns the new size of the list */
+int concat_if_new(void **list1, int num1, void *list2, int num2, int size)
+{
+    int i, ret;
+
+    for (i=0; i<num2; i++) {
+        ret = add_if_new(list1, num1, (char *)list2 + i*size, size);
+        if (ret) num1++;
+    }
+
+    return num1;
+}
+
+/* Compares list1 to list2
+ * Returns 0 if both have the same elements; returns -1 if all elements of
+ * list1 are strictly contained in list2; 1 otherwise
+ */
+int list_compare(const int *list1, int num1, const int *list2, int num2)
+{
+    int i, j;
+
+    for (i=0; i<num1; i++) {
+        for (j=0; j<num2; j++) {
+            if (list1[i] == list2[j]) break;
+        }
+        if (j==num2) break;
+    }
+    if (i==num1) {
+       if (num1 == num2) {
+        return 0;
+       }
+       return -1;
+    }
+
+    return 1;
+}
+
+
+
+/*
+ * A multi-purpose function to traverse and get information on Clast
+ * loops
+ *
+ * node: clast node where processing should start
+ *
+ * Returns:
+ *
+ * A list of loops under clast_stmt 'node' filtered in two ways: (1) it contains
+ * statements appearing in 'stmt_filter', (2) loop iterator's name is 'iter'.
+ * If 'iter' is set to NULL, no filtering based on iterator name is done
+ *
+ * A list of statements (statement numbers) under clast node 'node'
+ *
+ * iter: loop iterator name
+ * stmt_filter: list of statement numbers for filtering (1-indexed)
+ * nstmts_filter: number of statements in stmt_filter
+ *
+ * FilterType: match exact (i.e., loops containing only and all those statements
+ * in stmt_filter) or subset, i.e., loops which have only those statements
+ * that appear in stmt_filter
+ *
+ * To disable all filtering, set 'iter' to NULL, provide all statement
+ * numbers in 'stmt_filter' and set FilterType to subset
+ *
+ * Return fields
+ *
+ * stmts: statement numbers under node
+ * nstmts: number of stmt numbers pointed to by stmts
+ * loops: list of clast loops
+ * nloops: number of clast loops in loops
+ *
+ */
+void clast_traverse(struct clast_stmt *node,
+        const char *iter, const int *stmt_filter, int nstmts_filter,
+        struct clast_for ***loops, int *nloops,
+        int **stmts, int *nstmts, ClastFilterType filter_type)
+{
+    int num_next_stmts, num_next_loops, ret, *stmts_next;
+    struct clast_for **loops_next;
+
+    *loops = NULL;
+    *nloops = 0;
+    *nstmts = 0;
+    *stmts = NULL;
+
+    if (node == NULL) {
+        return;
+    }
+
+    if (CLAST_STMT_IS_A(node, stmt_root)) {
+        // printf("root stmt\n");
+        struct clast_root *root = (struct clast_root *) node;
+        clast_traverse((root->stmt).next, iter, stmt_filter, nstmts_filter, &loops_next,
+                &num_next_loops, &stmts_next, &num_next_stmts, filter_type);
+        *nstmts = concat_if_new((void **)stmts, *nstmts, stmts_next, num_next_stmts, sizeof(int));
+        *nloops = concat_if_new((void **)loops, *nloops, loops_next, num_next_loops,
+                sizeof(struct clast_stmt *));
+        free(loops_next);
+        free(stmts_next);
+    }
+
+    if (CLAST_STMT_IS_A(node, stmt_guard)) {
+        // printf("guard stmt\n");
+        struct clast_guard *guard = (struct clast_guard *) node;
+        clast_traverse(guard->then, iter, stmt_filter, nstmts_filter, &loops_next,
+                &num_next_loops, &stmts_next, &num_next_stmts, filter_type);
+        *nstmts = concat_if_new((void **)stmts, *nstmts, stmts_next, num_next_stmts, sizeof(int));
+        *nloops = concat_if_new((void **)loops, *nloops, loops_next, num_next_loops,
+                sizeof(struct clast_stmt *));
+        free(loops_next);
+        free(stmts_next);
+        clast_traverse((guard->stmt).next, iter, stmt_filter, nstmts_filter, &loops_next,
+                &num_next_loops, &stmts_next, &num_next_stmts, filter_type);
+        *nstmts = concat_if_new((void **)stmts, *nstmts, stmts_next, num_next_stmts, sizeof(int));
+        *nloops = concat_if_new((void **)loops, *nloops, loops_next, num_next_loops,
+                sizeof(struct clast_stmt *));
+        free(loops_next);
+        free(stmts_next);
+    }
+
+    if (CLAST_STMT_IS_A(node, stmt_user)) {
+        struct clast_user_stmt *user_stmt = (struct clast_user_stmt *) node;
+        // printf("user stmt: S%d\n", user_stmt->statement->number);
+        ret = add_if_new((void **)stmts, *nstmts, &user_stmt->statement->number, sizeof(int));
+        if (ret) (*nstmts)++;
+        clast_traverse((user_stmt->stmt).next, iter, stmt_filter, nstmts_filter, &loops_next,
+                &num_next_loops, &stmts_next, &num_next_stmts, filter_type);
+        *nstmts = concat_if_new((void **)stmts, *nstmts, stmts_next, num_next_stmts, sizeof(int));
+        *nloops = concat_if_new((void **)loops, *nloops, loops_next, num_next_loops,
+                sizeof(struct clast_stmt *));
+        free(loops_next);
+        free(stmts_next);
+    }
+    if (CLAST_STMT_IS_A(node, stmt_for)) {
+        struct clast_for *for_stmt = (struct clast_for *) node;
+        // printf("for stmt: %s\n", for_stmt->iterator);
+       
+        clast_traverse(for_stmt->body, iter, stmt_filter, nstmts_filter, &loops_next,
+                &num_next_loops, &stmts_next, &num_next_stmts, filter_type);
+        *nstmts = concat_if_new((void **)stmts, *nstmts, stmts_next, num_next_stmts, sizeof(int));
+        *nloops = concat_if_new((void **)loops, *nloops, loops_next, num_next_loops,
+                sizeof(struct clast_stmt *));
+
+        if (iter == NULL || !strcmp(for_stmt->iterator, iter)) {
+            if (stmt_filter == NULL ||
+                    (filter_type == subset && list_compare(stmts_next, num_next_stmts,
+                                                           stmt_filter, nstmts_filter) <= 0)
+                    || (filter_type == exact && list_compare(stmts_next, num_next_stmts,
+                            stmt_filter, nstmts_filter) == 0 )) {
+                ret = add_if_new((void **)loops, *nloops, &for_stmt, sizeof(struct clast_for *));
+                if (ret) (*nloops)++;
+            }
+        }
+        free(loops_next);
+        free(stmts_next);
+
+        clast_traverse((for_stmt->stmt).next, iter, stmt_filter, nstmts_filter, &loops_next,
+                &num_next_loops, &stmts_next, &num_next_stmts, filter_type);
+        *nstmts = concat_if_new((void **)stmts, *nstmts, stmts_next, num_next_stmts, sizeof(int));
+        *nloops = concat_if_new((void **)loops, *nloops, loops_next, num_next_loops,
+                sizeof(struct clast_stmt *));
+        free(loops_next);
+        free(stmts_next);
+    }
+}
+
--
1.7.4.4
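
The core of the traversal (collecting the distinct statement numbers under a node) can be tried out in isolation. The following is only a sketch on a mock tree: `demo_node`, `demo_add_if_new` and `demo_collect_stmts` are hypothetical and much simpler than the clast types in the patch, and siblings are walked with a loop instead of recursion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for the clast node types. */
enum demo_kind { DEMO_FOR, DEMO_USER };

struct demo_node {
    enum demo_kind kind;
    int number;                /* statement number, DEMO_USER only */
    struct demo_node *body;    /* first child, DEMO_FOR only */
    struct demo_node *next;    /* next sibling */
};

/* Appends 'value' to the growable array unless it is already present.
 * Returns the new length, or -1 if the allocation failed. */
static int demo_add_if_new(int **list, int num, int value)
{
    int i;
    int *grown;

    for (i = 0; i < num; i++)
        if ((*list)[i] == value)
            return num;

    grown = realloc(*list, (size_t)(num + 1) * sizeof(int));
    if (grown == NULL)
        return -1;
    grown[num] = value;
    *list = grown;
    return num + 1;
}

/* Collects the distinct statement numbers found under 'node' and its
 * siblings. Returns the list length, or -1 on allocation failure. */
static int demo_collect_stmts(const struct demo_node *node, int **stmts, int nstmts)
{
    for (; node != NULL && nstmts >= 0; node = node->next) {
        if (node->kind == DEMO_USER)
            nstmts = demo_add_if_new(stmts, nstmts, node->number);
        else
            nstmts = demo_collect_stmts(node->body, stmts, nstmts);
    }
    return nstmts;
}
```

The caller starts with `*stmts == NULL` and `nstmts == 0`, and frees the returned array when done.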



 

Cédric Bastoul

May 24, 2012, 3:56:11 AM5/24/12
to Uday Reddy, Tobias Grosser, cloog-de...@googlegroups.com
Hi Uday,
my main suggestion is to create a struct clast_filter to embed all the filtering criteria, so the function can be extended easily to new criteria without modifying its prototype. You may call the function clast_filtrate which is closer to its purpose. Lastly, you should put it in clast.c directly.
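
To make the suggestion concrete, here is a rough sketch of what such a filter struct could look like. All names (`demo_filter`, `demo_filter_type`, `demo_loop_matches`) are hypothetical and only illustrate the idea that new criteria become new fields while the function prototype stays the same; it assumes the statement lists contain no duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_filter_type { DEMO_EXACT, DEMO_SUBSET };

/* Hypothetical bundle of filtering criteria. */
struct demo_filter {
    const char *iter;     /* iterator name to match, NULL matches any */
    const int *stmts;     /* allowed statement numbers, NULL matches any */
    int nstmts;           /* number of entries in stmts */
    enum demo_filter_type type;
};

/* Returns 1 if every element of a[0..na) appears in b[0..nb). */
static int demo_is_subset(const int *a, int na, const int *b, int nb)
{
    int i, j;

    for (i = 0; i < na; i++) {
        for (j = 0; j < nb; j++)
            if (a[i] == b[j]) break;
        if (j == nb) return 0;
    }
    return 1;
}

/* Decides whether a loop with iterator 'loop_iter' and the distinct
 * statement numbers 'loop_stmts' in its body passes the filter. */
static int demo_loop_matches(const struct demo_filter *f, const char *loop_iter,
                             const int *loop_stmts, int nloop_stmts)
{
    assert(f != NULL && loop_iter != NULL);

    if (f->iter != NULL && strcmp(f->iter, loop_iter) != 0)
        return 0;
    if (f->stmts == NULL)
        return 1;
    if (!demo_is_subset(loop_stmts, nloop_stmts, f->stmts, f->nstmts))
        return 0;
    /* An exact match additionally needs the same number of statements. */
    return f->type == DEMO_SUBSET || nloop_stmts == f->nstmts;
}
```

Adding, say, a depth criterion later would then only mean one more field in the struct and one more test in the matching function.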
Thank you,

Ced.

--

Uday Reddy

May 24, 2012, 6:37:42 AM5/24/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
On Thu, May 24, 2012 at 1:26 PM, Cédric Bastoul <cedric....@u-psud.fr> wrote:
Hi Uday,
my main suggestion is to create a struct clast_filter to embed all the filtering criteria, so the function can be extended easily to new criteria without modifying its prototype. You may call the function clast_filtrate which is closer to its purpose. Lastly, you should put it in clast.c directly.

Thanks, I agree.

-Uday

 

Uday K Bondhugula

May 28, 2012, 4:47:49 AM5/28/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com
On 05/23/2012 08:13 PM, Cédric Bastoul wrote:
>
>
> On Wed, May 23, 2012 at 4:21 PM, Uday K Bondhugula <uday...@gmail.com
> <mailto:uday...@gmail.com>> wrote:
>
> On 05/23/2012 07:16 PM, Cédric Bastoul wrote:
>
> Hi Uday,
> I'm fine with the changes. I would suggest :
>
> - only one "parallel" field in clast_for, with some #define flags:
> #define CLAST_PARALLEL_NOT 0
> #define CLAST_PARALLEL_VEC 2
> #define CLAST_PARALLEL_OMP 4
> #define CLAST_PARALLEL_MPI 8
>
>
> But a loop can be both MPI parallel and OpenMP parallel. The code
> already handles this; if you set both mpi_parallel and omp_parallel,
> it'll distribute the loop for MPI and mark the local loop OpenMP
> parallel. May be another
>
>
> Sure, I was thinking about bit flags, so you can have many options together:
> if ((f->decoration & CLAST_DECORATION_OMP) || (f->decoration &
> CLAST_DECORATION_MPI))

I've made changes along these lines. Patch attached and also pasted below.

-Uday

From 7d3f8bd33788d3e58e97a34b540fae234f9a98ce Mon Sep 17 00:00:00 2001
From: Uday Bondhugula <uday...@gmail.com>
Date: Wed, 23 May 2012 16:27:37 +0530
Subject: [PATCH 1/2] clast-based loop parallelization support

This patch provides support for printing OMP parallel loops if marked
appropriately. clast_for->parallel can be set to CLAST_PARALLEL_OMP,
CLAST_PARALLEL_MPI, or CLAST_PARALLEL_OMP + CLAST_PARALLEL_MPI. The loop
will be annotated with omp parallel pragmas and/or distributed via MPI; list of
private and reduction variables for omp parallelization can be set as
well.

Signed-off-by: Uday Bondhugula <uday...@gmail.com>
---
 include/cloog/clast.h |    9 +++++++
 source/clast.c        |    5 ++++
 source/pprint.c       |   63 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/include/cloog/clast.h b/include/cloog/clast.h
index b455369..675c5ff 100644
--- a/include/cloog/clast.h
+++ b/include/cloog/clast.h
@@ -31,6 +31,10 @@ struct clast_term {
struct clast_expr *var;
};

+#define CLAST_PARALLEL_NOT 0
+#define CLAST_PARALLEL_OMP 1
+#define CLAST_PARALLEL_MPI 2
+
enum clast_red_type { clast_red_sum, clast_red_min, clast_red_max };
struct clast_reduction {
struct clast_expr expr;
@@ -98,6 +102,11 @@ struct clast_for {
struct clast_expr * UB;
cloog_int_t stride;
struct clast_stmt * body;
+ int parallel;
+ /* Comma separated list of loop private variables for OpenMP parallelization */
+ char *private_vars;
+ /* Comma separated list of reduction variable/operators for OpenMP parallelization */
+ char *reduction_vars;
};

struct clast_equation {
diff --git a/source/clast.c b/source/clast.c
index 0b67532..7cd1e45 100644
--- a/source/clast.c
+++ b/source/clast.c
@@ -211,6 +211,8 @@ static void free_clast_for(struct clast_stmt *s)
free_clast_expr(f->UB);
cloog_int_clear(f->stride);
cloog_clast_free(f->body);
+ if (f->private_vars) free(f->private_vars);
+ if (f->reduction_vars) free(f->reduction_vars);
free(f);
}

@@ -226,6 +228,9 @@ struct clast_for *new_clast_for(CloogDomain *domain, const char *it,
f->LB = LB;
f->UB = UB;
f->body = NULL;
+ f->parallel = CLAST_PARALLEL_NOT;
+ f->private_vars = NULL;
+ f->reduction_vars = NULL;
cloog_int_init(f->stride);
if (stride)
cloog_int_set(f->stride, stride->stride);
diff --git a/source/pprint.c b/source/pprint.c
index 9c7f1d4..32c6a02 100644
--- a/source/pprint.c
+++ b/source/pprint.c
@@ -405,6 +405,56 @@ void pprint_guard(struct cloogoptions *options, FILE *dst, int indent,
void pprint_for(struct cloogoptions *options, FILE *dst, int indent,
struct clast_for *f)
{
+ if (options->language == CLOOG_LANGUAGE_C) {
+ if ((f->parallel & CLAST_PARALLEL_OMP) && !(f->parallel & CLAST_PARALLEL_MPI)) {
+ if (f->LB) {
+ fprintf(dst, "lbp=");
+ pprint_expr(options, dst, f->LB);
+ fprintf(dst, ";\n");
+ }
+ if (f->UB) {
+ fprintf(dst, "%*s", indent, "");
+ fprintf(dst, "ubp=");
+ pprint_expr(options, dst, f->UB);
+ fprintf(dst, ";\n");
+ }
+ fprintf(dst, "#pragma omp parallel for%s%s%s%s%s%s\n",
+ (f->private_vars)? " private(":"",
+ (f->private_vars)? f->private_vars: "",
+ (f->private_vars)? ")":"",
+ (f->reduction_vars)? " reduction(": "",
+ (f->reduction_vars)? f->reduction_vars: "",
+ (f->reduction_vars)? ")": "");
+ fprintf(dst, "%*s", indent, "");
+ }
+ if (f->parallel & CLAST_PARALLEL_MPI) {
+ if (f->LB) {
+ fprintf(dst, "_lb_dist=");
+ pprint_expr(options, dst, f->LB);
+ fprintf(dst, ";\n");
+ }
+ if (f->UB) {
+ fprintf(dst, "%*s", indent, "");
+ fprintf(dst, "_ub_dist=");
+ pprint_expr(options, dst, f->UB);
+ fprintf(dst, ";\n");
+ }
+ fprintf(dst, "%*s", indent, "");
+ fprintf(dst, "polyrt_loop_dist(_lb_dist, _ub_dist, nprocs, my_rank, &lbp, &ubp);\n");
+ if (f->parallel & CLAST_PARALLEL_OMP) {
+ fprintf(dst, "#pragma omp parallel for%s%s%s%s%s%s\n",
+ (f->private_vars)? " private(":"",
+ (f->private_vars)? f->private_vars: "",
+ (f->private_vars)? ")":"",
+ (f->reduction_vars)? " reduction(": "",
+ (f->reduction_vars)? f->reduction_vars: "",
+ (f->reduction_vars)? ")": "");
+ }
+ fprintf(dst, "%*s", indent, "");
+ }
+
+ }
+
if (options->language == CLOOG_LANGUAGE_FORTRAN)
fprintf(dst, "DO ");
else
@@ -412,7 +462,11 @@ void pprint_for(struct cloogoptions *options, FILE *dst, int indent,

if (f->LB) {
fprintf(dst, "%s=", f->iterator);
+ if (f->parallel & (CLAST_PARALLEL_OMP | CLAST_PARALLEL_MPI)) {
+ fprintf(dst, "lbp");
+ }else{
pprint_expr(options, dst, f->LB);
+ }
} else if (options->language == CLOOG_LANGUAGE_FORTRAN)
cloog_die("unbounded loops not allowed in FORTRAN.\n");

@@ -424,8 +478,13 @@ void pprint_for(struct cloogoptions *options, FILE *dst, int indent,
if (f->UB) {
if (options->language != CLOOG_LANGUAGE_FORTRAN)
fprintf(dst,"%s<=", f->iterator);
- pprint_expr(options, dst, f->UB);
- } else if (options->language == CLOOG_LANGUAGE_FORTRAN)
+
+ if (f->parallel & (CLAST_PARALLEL_OMP | CLAST_PARALLEL_MPI)) {
+ fprintf(dst, "ubp");
+ }else{
+ pprint_expr(options, dst, f->UB);
+ }
+ }else if (options->language == CLOOG_LANGUAGE_FORTRAN)
cloog_die("unbounded loops not allowed in FORTRAN.\n");

if (options->language == CLOOG_LANGUAGE_FORTRAN) {
--
1.7.4.4
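
For reference, the way the pragma line is assembled from the two optional, comma separated strings can be sketched on its own. `demo_format_omp_pragma` is a hypothetical helper, not part of the patch; it uses the same format string as the `fprintf` calls above but writes into a caller-supplied buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the OpenMP pragma line into buf.
 * private_vars and reduction_vars may each be NULL, in which case the
 * corresponding clause is omitted. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if it did not fit. */
static int demo_format_omp_pragma(char *buf, size_t len,
                                  const char *private_vars,
                                  const char *reduction_vars)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "#pragma omp parallel for%s%s%s%s%s%s\n",
                 private_vars ? " private(" : "",
                 private_vars ? private_vars : "",
                 private_vars ? ")" : "",
                 reduction_vars ? " reduction(" : "",
                 reduction_vars ? reduction_vars : "",
                 reduction_vars ? ")" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The reduction string is expected in OpenMP's `operator:list` form, for example `+:sum`.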






0001-clast-based-loop-parallelization-support.patch

Uday K Bondhugula

May 28, 2012, 5:02:16 AM5/28/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com

Patch attached.

-Uday

On 05/24/2012 01:26 PM, Cédric Bastoul wrote:
> Hi Uday,
> my main suggestion is to create a struct clast_filter to embed all the
> filtering criteria, so the function can be extended easily to new
> criteria without modifying its prototype. You may call the function
> clast_filtrate which is closer to its purpose. Lastly, you should put it
> in clast.c directly.
> Thank you,
>
> Ced.
>
> On Wed, May 23, 2012 at 7:05 PM, Uday Reddy <uday...@gmail.com
> <mailto:uday...@gmail.com>> wrote:
>
>
>
> On Wed, May 23, 2012 at 10:27 PM, Uday Reddy <uday...@gmail.com
> <mailto:uday...@gmail.com>> wrote:
>
>
>
> On Wed, May 23, 2012 at 9:49 PM, Cédric Bastoul
> <cedric....@u-psud.fr <mailto:cedric....@u-psud.fr>> wrote:
>
> I see. Thank you. CLooG should provide an example
> implementation of this function somewhere (documentation,
> -compilable option...) at some point.
>
>
> Yes, sure; I can update the documentation.
>
> Also, how do you mark the loops ? Beta-vector-based ? I was
> planning a
>
>
> Actually, it's more general than beta-vector-based since not
> everyone may have transformations in a form where a loop
> (scattering tree loop actually) can be identified based on
> beta-vectors. I do it based on {depth, <stmt list>}, and I'm
> going to send a function that given {depth, <stmt list>}, can
> return all loops at that depth that have only those statements
> that appear in <stmt list>. So, if you have a beta-vector, if
> you provide its size as depth and the list of statements that
> share that beta-vector, that'll do.
>
> The function is more generic and can do more things.
>
>
> Patch for clast traversal pasted below.
>
> -Uday
>
>
> From d255400679150ed91c39bdaa730d200435388e5c Mon Sep 17 00:00:00 2001
> From: Uday Bondhugula <uday...@gmail.com <mailto:uday...@gmail.com>>
> Date: Wed, 23 May 2012 22:32:57 +0530
> Subject: [PATCH 2/2] clast traversal support
>
> See comment for function clast_traverse.
>
> Signed-off-by: Uday Bondhugula <uday...@gmail.com
> <mailto:uday...@gmail.com>>
0002-clast-traversal-and-node-filtering-support.patch

Uday Reddy

May 28, 2012, 3:49:53 PM5/28/12
to Cédric Bastoul, Tobias Grosser, cloog-de...@googlegroups.com

The previous 0001-clast-based-loop-parallelization... was buggy; I've fixed it now. Here are the two patches again.

-Uday
0001-clast-based-loop-parallelization-support.patch
0002-clast-traversal-and-node-filtering-support.patch

Uday Reddy

May 30, 2012, 2:35:55 PM5/30/12
to Cédric Bastoul, CLooG Development

White space errors in the 0002-... patch are now fixed.

Thanks,
-Uday
0001-clast-based-loop-parallelization-support.patch
0002-clast-traversal-and-node-filtering-support.patch

Cédric Bastoul

Jun 1, 2012, 3:37:30 AM6/1/12
to Uday Reddy, CLooG Development
Pushed. Thank you Uday !

Ced.
