+ AOSP LLVM's public email list, CBL
On Wed, Mar 11, 2020 at 4:40 PM Danny Lin <
da...@kdrag0n.dev> wrote:
>
> Hi Nick,
Hi Danny, great to hear from you. Thank you for taking the time to
write all this up! I can tell a lot of care was put into it.
>
> Polly is LLVM's polyhedral loop optimizer, which analyzes loops and optimizes them for cache locality as well as reduced memory accesses, similar to GCC Graphite. It can also perform automatic loop parallelization using OpenMP, but that's definitely out-of-scope for the kernel and most likely for Android as well. You can read more about it here:
https://polly.llvm.org/
I admit I don't know too much about Polly (though I can describe it
better than most people); my opinion is limited to my experience on
TensorFlow's XLA compiler, which intentionally didn't use Polly,
though that team has largely been succeeded by folks working on MLIR
who are now actively researching Polly.
>
> LLVM needs to be built with the "polly" project enabled in LLVM_ENABLE_PROJECTS for it to be available. After that you need to enable the use of Polly by passing flags when invoking Clang. These are LLVM flags, not Clang ones, so each one needs to have "-mllvm" before it. Below is a list of the ones I've tested, along with the explanations to the best of my knowledge:
It's probably low-hanging fruit for us to enable Polly in the
configuration of AOSP LLVM, since making use of it at compile time is
also gated on the flags you describe below. That should give us
fine-grained control to disable it for individual projects should they
miscompile with it enabled, if any.
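For anyone following along, a build configuration with Polly enabled might look roughly like this (the generator, build type, and paths are my assumptions, not something specified in this thread):

```shell
# Sketch: configure an LLVM build with the Polly project enabled.
# Run from a build directory alongside a monorepo checkout; the
# generator and build type here are illustrative choices.
cmake -G Ninja ../llvm-project/llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang;polly"
ninja clang
```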
>
> -polly: Base flag to enable Polly itself
> -polly-run-dce: Polyhedral dead code elimination, analyzes and eliminates statements that can be proven dead (
https://polly.llvm.org/doxygen/DeadCodeElimination_8cpp_source.html)
> -polly-run-inliner: Run an early LLVM inlining pass before running Polly
> -polly-opt-fusion=max: Loop fusion strategy (default: min)
> -polly-ast-use-context: Pass context around loops to the optimizer so that it can make better decisions
> -polly-detect-keep-going: Don't fail on the first error encountered (this is probably a bad idea)
> -polly-vectorizer=stripmine: Generate vector code automatically (
https://polly.llvm.org/docs/UsingPollyWithClang.html#automatic-vector-code-generation)
> -polly-invariant-load-hoisting: Hoist loads of invariant memory values out of loops, when possible (
https://reviews.llvm.org/D31842)
Thanks for the research (and testing of D31842, though seeing Johannes
resign is curious); this is a great starting point for folks looking
to try Polly!
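For reference, a full Clang invocation using the flags above might look like this (a sketch; the input file name and -O2 are placeholders of mine, and I've left out -polly-detect-keep-going per the caveat above):

```shell
# Sketch: each Polly option is an LLVM flag, so every one must be
# prefixed with -mllvm on the Clang command line.
clang -O2 \
    -mllvm -polly \
    -mllvm -polly-run-dce \
    -mllvm -polly-run-inliner \
    -mllvm -polly-opt-fusion=max \
    -mllvm -polly-ast-use-context \
    -mllvm -polly-vectorizer=stripmine \
    -mllvm -polly-invariant-load-hoisting \
    -c loops.c -o loops.o
```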
>
> I have a kernel commit that exposes the ones I deemed useful through a Kconfig option:
https://github.com/kdrag0n/proton_zf6/commit/00f711eead423
> And a prebuilt toolchain with Polly support that can be used for preliminary testing and evaluation:
https://github.com/kdrag0n/proton-clang
So no new compiler warnings, boot issues, or otherwise noticeable
runtime issues? That's impressive, and worth paying attention to.
Thanks, too, for all of the work you do on Proton kernels (and
toolchains), which I've watched from afar. The Android ROM scene on
XDA is a vibrant and competitive edge that Android has in the market.
> While it hasn't provided much of a performance improvement for kernels in the past, I've recently done some new tests and it looks like that's changed drastically. On my 4.14 kernel, Polly is now showing a larger performance improvement than LTO in terms of hackbench times. Without LTO the improvement is 14% over not using Polly, and with LTO it's 10% — still substantial. The results are available here:
https://docs.google.com/spreadsheets/d/1mhjyshujZz8jYI7dMoCe-yFbxymW-fWaC08vMhBbEmQ/edit
That's a good start, though hackbench is not the be-all and end-all of
benchmarks, and N=3 isn't statistically significant. Internally, we
have a composite of numerous first- and third-party performance test
suites. Quantifying it as a speedup relative to what we saw with LTO
is brilliant, and eye-opening, though!
> At least with the kernel, I haven't observed any noticeable differences in compile times.
Impressive as well. Another thing we measure is binary size; kernel
boot time is strongly correlated with kernel image size (decompression
time scales with image size). But everything is a tradeoff;
quantifying the tradeoff is important.
> I'd be willing to submit an AOSP LLVM patch to enable Polly given instructions on doing so.
Great! Personally, it would make me very happy to help you do this; I
think Google and Android in general could be doing more outreach to
the fantastic XDA developers who sacrifice a lot of their time and
money to improve the Android ecosystem. I'm committed to changing the
status quo in that regard.
If you clone this repo:
https://android.googlesource.com/toolchain/llvm_android/
and modify the stage 2 CMake flags (stage2_extra_defines) in do_build.py:
https://android.googlesource.com/toolchain/llvm_android/+/refs/heads/master/do_build.py#1371
Send me a patch on
https://android-review.googlesource.com/ and we'll
go from there.
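A rough sketch of that workflow (the Gerrit push syntax is standard usage, not something specified above, and the commit message is a placeholder):

```shell
# Sketch only: clone the AOSP LLVM build scripts, edit the stage 2
# defines, and upload the change to android-review for code review.
git clone https://android.googlesource.com/toolchain/llvm_android
cd llvm_android
# ...edit stage2_extra_defines in do_build.py to add Polly...
git commit -a -m "Enable Polly in the stage 2 build"
git push origin HEAD:refs/for/master
```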
>
> At any rate, I'd be curious to see the results of your testing and whether there are any stability issues with it in production. Let me know if you need more info!
--
Thanks,
~Nick Desaulniers