Autotuning and user control

39 views
Skip to first unread message

Hal Finkel

unread,
Sep 1, 2017, 8:42:50 PM9/1/17
to poll...@googlegroups.com, protonu basu, Cy Chan, Mary Hall
Hi everyone,

Some of my colleagues and I are interested in understanding how we might
have fine-grained control over the transformations that Polly performs
in two contexts:

1. For tuning by autotuning software (e.g., CHiLL).

2. For tuning by the programmer (via pragmas, attributes, or similar).

The best mechanism for both categories might be the same, but I can
easily imagine it being different. For example, autotuning software
might simply insert some pragma in the source code in order to name a
particular loop nest and/or region and then provide a detailed
transformation recipe via some other input file (something like using
-polly-import/-polly-export).

For programmer-directed tuning, I can imagine extending Clang's existing
loop pragma
(http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations)
with additional clauses for transformations that Polly supports, plus
potentially teaching Polly to understand some of the existing clauses
(e.g., for unrolling/interleaving or vectorization). For
unrolling/interleaving, Polly might perform the optimization itself, or
as with vectorization, it might simply prepare for the later
transformation. This means that a loop annotated with #pragma clang loop
vectorize_width(n) would be akin to setting
-polly-vectorize=stripmine/-polly-prevect-width=n for the relevant loop
nest.

To the existing pragma clauses, we'd need to add something that allows
control over tiling. For example, tile(enable/disable) and tile_size(n).
We could also add pragmas for interchange, fusion, etc. For fusion, we
could use a loop-level pragma (and then potentially fuse loops all
marked for fusion), use some region-marking scheme (such that we might
fuse all loops within the region), or some naming scheme with a pragma
taking a list of names (as IBM did in [1]). Finally, we should add some
clause for enabling/disabling polyhedral transformations in total.

These would be translated into metadata, similar to the current loop
hints, and then read by Polly (as an alternative to using settings from
defaults or command-line parameters).

I'd appreciate any thoughts or pointers to relevant/useful prior art
(the only thing I know about in this space is [1]).

Thanks in advance,
Hal

[1]
http://scv.bu.edu/computation/bluegene/IBMdocs/compiler/xlc-8.0/html/compiler/ref/rnpgbklp.htm

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Tobias Grosser

unread,
Sep 2, 2017, 2:00:43 AM9/2/17
to Hal Finkel, poll...@googlegroups.com, protonu basu, Cy Chan, Mary Hall
On Sat, Sep 2, 2017, at 02:42, Hal Finkel wrote:
> Hi everyone,

Hi Hal! Protonu and Mary, good to meet you here! Hi Cy!

> Some of my colleagues and I are interested in understanding how we might
> have fine-grained control over the transformations that Polly performs
> in two contexts:
>
> 1. For tuning by autotuning software (e.g., CHiLL).
>
> 2. For tuning by the programmer (via pragmas, attributes, or similar).

That's a very good idea. We have been thinking in this direction for a
long time, I would certainly like to get involved.

> The best mechanism for both categories might be the same, but I can
> easily imagine it being different. For example, autotuning software
> might simply insert some pragma in the source code in order to name a
> particular loop nest and/or region and then provide a detailed
> transformation recipe via some other input file (something like using
> -polly-import/-polly-export).

You can use -polly-import / -polly-export already today. Each file will
have a unique name per scop, and already provides information which
function it is derived from. We could likely add 'marking metadata' to
identify specific
source code loop nests and add this information into the exported files.

> For programmer-directed tuning, I can imagine extending Clang's existing
> loop pragma
> (http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations)
> with additional clauses for transformations that Polly supports, plus
> potentially teaching Polly to understand some of the existing clauses
> (e.g., for unrolling/interleaving or vectorization). For
> unrolling/interleaving, Polly might perform the optimization itself, or
> as with vectorization, it might simply prepare for the later
> transformation.

A related project is loopy, which allows for C/C++ pragmas to drive loop
transformations. It is based on Polly, but not available upstream:

https://github.com/nimit-singhania/loopy

> This means that a loop annotated with #pragma clang loop
> vectorize_width(n) would be akin to setting
> -polly-vectorize=stripmine/-polly-prevect-width=n for the relevant loop
> nest.

That's possible.

> To the existing pragma clauses, we'd need to add something that allows
> control over tiling. For example, tile(enable/disable) and tile_size(n).
> We could also add pragmas for interchange, fusion, etc. For fusion, we
> could use a loop-level pragma (and then potentially fuse loops all
> marked for fusion), use some region-marking scheme (such that we might
> fuse all loops within the region), or some naming scheme with a pragma
> taking a list of names (as IBM did in [1]). Finally, we should add some
> clause for enabling/disabling polyhedral transformations in total.

Sounds all very good!

> These would be translated into metadata, similar to the current loop
> hints, and then read by Polly (as an alternative to using settings from
> defaults or command-line parameters).
>
> I'd appreciate any thoughts or pointers to relevant/useful prior art
> (the only thing I know about in this space is [1]).

There is Vikram's autotuning presented at EuroLLVm this year.

There is somewhere a patch in reviews.llvm.org to build up a search
space tree for optimizations in LLVM (AFAIK it was submitted to do code
size tuning). I was listed as reviewer, but could not find it quickly

As mentioned in the thread on the LLVM mailing list, we have now a
incremental scheduler in isl. This means, it should be possible to add
an interface that allows to tune specific loop scheduling decisions.

Best,
Tobias

> Thanks in advance,
> Hal
>
> [1]
> http://scv.bu.edu/computation/bluegene/IBMdocs/compiler/xlc-8.0/html/compiler/ref/rnpgbklp.htm
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> --
> You received this message because you are subscribed to the Google Groups
> "Polly Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to polly-dev+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Hal Finkel

unread,
Sep 3, 2017, 10:52:46 AM9/3/17
to Tobias Grosser, poll...@googlegroups.com, protonu basu, Cy Chan, Mary Hall

On 09/02/2017 01:00 AM, Tobias Grosser wrote:
> On Sat, Sep 2, 2017, at 02:42, Hal Finkel wrote:
>> Hi everyone,
> Hi Hal! Protonu and Mary, good to meet you here! Hi Cy!
>
>> Some of my colleagues and I are interested in understanding how we might
>> have fine-grained control over the transformations that Polly performs
>> in two contexts:
>>
>> 1. For tuning by autotuning software (e.g., CHiLL).
>>
>> 2. For tuning by the programmer (via pragmas, attributes, or similar).
> That's a very good idea. We have been thinking in this direction for a
> long time, I would certainly like to get involved.

Great!

>
>> The best mechanism for both categories might be the same, but I can
>> easily imagine it being different. For example, autotuning software
>> might simply insert some pragma in the source code in order to name a
>> particular loop nest and/or region and then provide a detailed
>> transformation recipe via some other input file (something like using
>> -polly-import/-polly-export).
> You can use -polly-import / -polly-export already today. Each file will
> have a unique name per scop, and already provides information which
> function it is derived from. We could likely add 'marking metadata' to
> identify specific
> source code loop nests and add this information into the exported files.

Sounds good. This will work for a polyhedral autotuner (e.g., CHiLL) in
the sense that it knows how to explore some space of transformations.
This is definitely a useful point of integration. It also, in a
significant sense, replaces parts of Polly with the autotuner instead of
using the autotuner to drive Polly. I think that the latter is also
something that we should explore. The loopy project you mention looks
like it provides an interface more along those lines.
I'm inclined toward a scheme similar to what IBM used because it a)
seems easy to implement and b) also solves the naming problem mentioned
above. This could look like a naming clause that can be attached to a loop:

#pragma clang loop name(foo)

and then some kind of multi-loop pragma. Maybe something like:

#pragma clang loops fuse(foo, bar, andmore)

>
>> These would be translated into metadata, similar to the current loop
>> hints, and then read by Polly (as an alternative to using settings from
>> defaults or command-line parameters).
>>
>> I'd appreciate any thoughts or pointers to relevant/useful prior art
>> (the only thing I know about in this space is [1]).
> There is Vikram's autotuning presented at EuroLLVm this year.

Good point. The module-extraction technique could be useful in this context.

>
> There is somewhere a patch in reviews.llvm.org to build up a search
> space tree for optimizations in LLVM (AFAIK it was submitted to do code
> size tuning). I was listed as reviewer, but could not find it quickly

https://reviews.llvm.org/D12199 and related patches.

>
> As mentioned in the thread on the LLVM mailing list, we have now a
> incremental scheduler in isl. This means, it should be possible to add
> an interface that allows to tune specific loop scheduling decisions.

Great. Can you sketch how this works (I see test_schedule_incremental in
lib/External/isl/isl_test.c, but that only tests the incremental feature
in a trivial sense?).

Thanks again,
Hal

Tobias Grosser

unread,
Sep 3, 2017, 11:21:58 AM9/3/17
to Hal Finkel, poll...@googlegroups.com, protonu basu, Cy Chan, Mary Hall
Sure, that sounds reasonable. Polly has indeed already a very simple
pragma to ignore entire functions. We call it 'polly-optimized' and use
it to not re-optimize functions we already optimized earlier. We should
probably rename it to 'disable-polly'.

> >> These would be translated into metadata, similar to the current loop
> >> hints, and then read by Polly (as an alternative to using settings from
> >> defaults or command-line parameters).
> >>
> >> I'd appreciate any thoughts or pointers to relevant/useful prior art
> >> (the only thing I know about in this space is [1]).
> > There is Vikram's autotuning presented at EuroLLVm this year.
>
> Good point. The module-extraction technique could be useful in this
> context.
>
> >
> > There is somewhere a patch in reviews.llvm.org to build up a search
> > space tree for optimizations in LLVM (AFAIK it was submitted to do code
> > size tuning). I was listed as reviewer, but could not find it quickly
>
> https://reviews.llvm.org/D12199 and related patches.

Right.

> > As mentioned in the thread on the LLVM mailing list, we have now a
> > incremental scheduler in isl. This means, it should be possible to add
> > an interface that allows to tune specific loop scheduling decisions.
>
> Great. Can you sketch how this works (I see test_schedule_incremental in
> lib/External/isl/isl_test.c, but that only tests the incremental feature
> in a trivial sense?).

See: https://lirias.kuleuven.be/handle/123456789/586108

The incremental scheduling part is what I am referring to.

Best,
Tobias

Mary Hall

unread,
Sep 14, 2017, 10:28:54 AM9/14/17
to Tobias Grosser, Hal Finkel, poll...@googlegroups.com, protonu basu, Cy Chan
Hi,

This all sounds interesting.   I have been thinking about autotuning of pragmas (like OpenMP and OpenACC), but it had not occurred to me a similar idea could direct compiler transformations.  So it looks like we could either insert pragmas into code and rely on Clang/Polly to consume them, or integrate CHiLL with Polly.  One limitation with either is the AST transformations we find we need to do in CHiLL for stencil and sparse matrix computations.  We have an experimental Clang frontend for CHiLL, but have found it difficult to modify the Clang AST.   What are next steps? 

Mary

Tobias Grosser

unread,
Oct 1, 2017, 4:41:31 AM10/1/17
to Mary Hall, Hal Finkel, poll...@googlegroups.com, protonu basu, Cy Chan
On Thu, Sep 14, 2017, at 16:28, Mary Hall wrote:
> Hi,
>
> This all sounds interesting. I have been thinking about autotuning of
> pragmas (like OpenMP and OpenACC), but it had not occurred to me a
> similar idea could direct compiler transformations. So it looks like we
> could either insert pragmas into code and rely on Clang/Polly to consume
> them, or integrate CHiLL with Polly. One limitation with either is the
> AST transformations we find we need to do in CHiLL for stencil and sparse
> matrix computations. We have an experimental Clang frontend for CHiLL,
> but have found it difficult to modify the Clang AST. What are next
> steps?

I feel we should look into concrete examples we care about, define the
kind of transformations we need, and then discuss how best to implement
them as part of LLVM. We could also have a brainstorming phone call.

Best,
Tobias
> > See: https://lirias.kuleuven.be/handle/123456789/586108 <https://lirias.kuleuven.be/handle/123456789/586108>

Mary Hall

unread,
Oct 1, 2017, 1:56:45 PM10/1/17
to Tobias Grosser, Hal Finkel, poll...@googlegroups.com, protonu basu, Cy Chan
Perhaps as a starting point we can try a simple stencil example that uses only standard transformations (tile, permute, skew) and work through the details.  Eventually we would need some AST transformations, but we can do without for now.

Mary

Tobias Grosser

unread,
Oct 4, 2017, 8:03:08 AM10/4/17
to Mary Hall, Hal Finkel, poll...@googlegroups.com, protonu basu, Cy Chan
On Sun, Oct 1, 2017, at 19:56, Mary Hall wrote:
> Perhaps as a starting point we can try a simple stencil example that uses
> only standard transformations (tile, permute, skew) and work through the
> details. Eventually we would need some AST transformations, but we can
> do without for now.

That sounds like a great idea. As pointed out earlier,
https://github.com/nimit-singhania/loopy might be a good start. I would
also be happy to give advice in case any of you wants to give this a
shot.

Best,
Tobias
> >>> See: https://lirias.kuleuven.be/handle/123456789/586108 <https://lirias.kuleuven.be/handle/123456789/586108> <https://lirias.kuleuven.be/handle/123456789/586108 <https://lirias.kuleuven.be/handle/123456789/586108>>
Reply all
Reply to author
Forward
0 new messages