LLVM Polly


Alex Mason

Feb 9, 2017, 1:27:01 PM
to Haskell Repa
Hi All,

Has there been any investigation into using LLVM's Polly project with Repa? It seems like it could provide some nice benefits if the loops produced by the Haskell code are visible to those passes. I'm sure it's much more difficult than it's worth, but it seems like some of the results from Polly could feed back into Repa for things like tiling and parallelism.

Cheers,
Alex

Ben Lippmeier

Feb 11, 2017, 4:30:13 AM
to Alex Mason, Haskell Repa

> On 9 Feb 2017, at 1:00 AM, Alex Mason <axm...@gmail.com> wrote:
>
> Has there been any investigation into using LLVM's Polly project with Repa? It seems like it could provide some nice benefits if the presence of loops produced by the Haskell code is visible to those passes. I'm sure it's much more difficult than it worth it, but it seems like some of the results from Polly could feed back into Repa for things like tiling and parallelism.

The compilation path for Repa was always intended to be via GHC. It depends on the particular way in which the GHC simplifier transforms code. GHC does have an LLVM backend, but we don’t have much control over exactly what LLVM instructions are produced from the higher level source language. I think if we deeply cared about getting code into a form that the LLVM polyhedral optimiser could deal with we’d probably need to write our own code generator.

Ben.




Dominic Steinitz

Feb 11, 2017, 12:37:01 PM
to Haskell Repa, axm...@gmail.com
You mean a code generator for GHC that somehow targets Polly? I am vaguely interested, as I am just about to submit a patch to improve the performance of floating point abs. Or something that compiles repa directly?

Alex Mason

Feb 12, 2017, 10:23:36 PM
to Dominic Steinitz, Haskell Repa, Trevor McDonell
Thanks for the info Trevor, I’d half assumed that the output of GHC would look loopy enough to get picked up by things like the vectoriser pass(es). If these sorts of optimisations aren’t being fired, what is LLVM doing to improve the performance of numerical code in Repa and other Haskell code? If there’s a paper I’m happy to check that out.

Cheers,
Alex

On 12 Feb 2017, 12:43 PM +1100, Trevor McDonell <trevor....@gmail.com>, wrote:

It is important to note that GHC does not generate loops.

At least, not in the sense that a C compiler like LLVM recognises. Thus, questions surrounding optimisations such as Polly or vectorisation are complete non-starters.

More concretely, consider this program:

test1 :: Array U DIM1 Double -> IO (Array U DIM1 Double)
test1 xs = computeUnboxedP $ R.map (+1) xs

Compile with the usual swath of options and add -keep-llvm-files, or just look at the output I saved here (as the generated names are likely to change): https://gist.github.com/tmcdonell/490e336274d8bff943b0259b13382907
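(For anyone following along, a typical invocation might look something like the line below; -O2, -fllvm, and -keep-llvm-files are standard GHC flags, but the exact option set and the file name are assumptions on my part:)

```shell
# Assumed example invocation: optimise, use the LLVM backend,
# and keep the intermediate .ll files alongside the output.
ghc -O2 -fllvm -threaded -keep-llvm-files Test.hs
```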

The work happens with the fadd instruction, for example in function @siOS_info$def. That function then does a tail call to @siOR_info$def, which itself then tail calls back to @siOS_info$def. This kind of looping structure is, demonstrably, entirely opaque to LLVM’s optimisations.
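To make the shape of that control flow concrete, here is a minimal, hypothetical pure-Haskell sketch of the same pattern (the names are invented; this is not the repa-generated code): the "loop body" and the "loop test" live in two separate functions that tail call each other, mirroring the @siOS_info$def / @siOR_info$def structure in the gist.

```haskell
-- Hypothetical sketch of GHC's looping-by-tail-call structure.
-- 'body' does the arithmetic (the fadd) and tail calls 'test';
-- 'test' checks the bound and tail calls back into 'body'.
body :: Int -> Int -> Double -> Double
body i n acc = test (i + 1) n (acc + 1.0)  -- the work, then a tail call

test :: Int -> Int -> Double -> Double
test i n acc
  | i >= n    = acc           -- done: return the accumulator
  | otherwise = body i n acc  -- tail call back into the body
```

Each function compiles to its own LLVM function, so loop-level passes (Polly, the loop vectoriser) never see a single loop with an induction variable to analyse.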

As Ben mentioned above, if you really care about this sort of thing you need to write your own code generator. Here is what happens if you give LLVM the above in a form it recognises: https://gist.github.com/tmcdonell/3efb7345b227e4b980b36fe4ca315ba5

-Trevor


Dominic Steinitz

Feb 13, 2017, 3:13:29 AM
to Alex Mason, Dominic Steinitz, Haskell Repa, Trevor McDonell
Maybe this is relevant? Type-safe Runtime Code Generation: Accelerate to LLVM. But of course it’s not repa.

Trevor McDonell

Feb 13, 2017, 4:34:33 AM
to Dominic Steinitz, Alex Mason, Dominic Steinitz, Haskell Repa
Actually, the vectorised loop example I posted above is the output of Accelerate. Felt a little inappropriate to point that out on the repa mailing list though (and, besides, the point is that you should be able to get that from a properly structured C loop).

-Trev

Trevor McDonell

Feb 13, 2017, 4:34:33 AM
to Alex Mason, Dominic Steinitz, Haskell Repa

I don’t know if anybody has explicitly studied the difference, but if you come across anything send it my way. My intuition tells me that LLVM is just generally better, but the NCG (GHC’s native code generator) knows more about the peculiarities of the code GHC produces. Depending on your program, one of those is more important than the other.

LLVM has many different optimisation passes and you are free to specify as many or as few as you want, in any order and multiple times. Actually, if you just feed the output of my previous email through opt -O3 again it turns the tail calls into local branches. Still doesn’t vectorise though…
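That re-run is just the standalone LLVM optimiser applied to the kept .ll file; something along these lines (the file names are placeholders):

```shell
# Re-optimise the textual IR that -keep-llvm-files left behind;
# -S keeps the output as readable .ll rather than bitcode.
opt -O3 -S Test.ll -o Test.opt.ll
```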

Repeating this experiment may be interesting:
https://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/

-Trev


Ben Lippmeier

Feb 13, 2017, 4:56:33 AM
to Dominic Steinitz, 'Dominic Steinitz' via Haskell Repa, axm...@gmail.com

> On 11 Feb 2017, at 6:37 PM, 'Dominic Steinitz' via Haskell Repa <haskel...@googlegroups.com> wrote:
>
> You mean a code generator for ghc that somehow targets polly? I am vaguely interested as I am just about to submit a patch to improve the performance of floating point abs. Or something that compiles repa directly?

Either.

The GHC simplifier is very successful at optimising standard-looking Haskell code, but writing “high performance” numeric code in Haskell is not standard. It may be “possible” to get what you want by compiling source Haskell programs, but with the standard simplifier and code generator you’ll be swimming uphill all the way.

Ben.
