Link time optimization


Tony Gentilcore

May 6, 2013, 11:50:20 AM
to chromium-dev
The Windows official build builds with LinkTimeCodeGeneration (LTCG).
Recently LTCG was experimentally disabled for WebCore and that led to
a 30% regression on some Dromaeo benchmarks. Granted Dromaeo has some
really sensitive tight loops, but I was still quite surprised by how
big of an impact link time optimization can have.

That got me thinking about whether we can get this sweetness on other
platforms (particularly Android where it seems to be supported now in
the freshly minted Android NDK r8e).

Has anyone played around with gcc or llvm's link time optimization
(-flto)? Are there reasons we don't enable it for official builds? If
not, I'd like to experiment with enabling it.

-Tony

Nico Weber

May 6, 2013, 11:56:12 AM
to Tony Gentilcore, chromium-dev
I played with it in llvm about a year ago. I ran into several compiler crashes (some got fixed). Linking took ~16GB and crashed after 2h (http://llvm.org/bugs/show_bug.cgi?id=12400), so I eventually gave up on it. It's probably worth looking at this again.

Rafael Espindola played with this during his time at Mozilla too; he got it working but the results weren't that impressive: https://blog.mozilla.org/respindola/2011/03/04/lto-on-os-x/ (also from a while ago).

If you want to play with it, I think these are the CLs I used:
https://codereview.chromium.org/9903020/
https://codereview.chromium.org/9789003/

Nico

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
    http://groups.google.com/a/chromium.org/group/chromium-dev




Reid Kleckner

May 6, 2013, 12:19:34 PM
to Nico Weber, Tony Gentilcore, chromium-dev
On Mon, May 6, 2013 at 11:56 AM, Nico Weber <tha...@chromium.org> wrote:
> If you want to play with it, I think these are the CLs I used:
> https://codereview.chromium.org/9903020/
> https://codereview.chromium.org/9789003/

IIRC Rafael recently helped fix some quadratic behavior in linking
together LLVM modules, which should impact LTO times, although maybe
not memory usage. It's probably worth trying again.

On the other hand, if there are such low-hanging problems lying
around, it suggests that it's still early days for LLVM-based LTO.

Christian Biesinger

May 6, 2013, 5:11:20 PM
to r...@google.com, Nico Weber, Tony Gentilcore, chromium-dev
Doesn't gcc also support LTO? Does it work any better? But I guess
that's only useful on Linux, not Mac.

-christian

Tony Gentilcore

May 7, 2013, 6:47:12 PM
to Christian Biesinger, Reid Kleckner, Nico Weber, chromium-dev
I managed to get -flto working on the android build, but did not see a
significant enough perf improvement to justify turning it on.

The problem is that the current default, GCC 4.6, produces internal
compiler errors with -flto, and switching to GCC 4.7 alone resulted in
about a 3% hit on Octane and 6% on Dromaeo. Enabling -flto on GCC 4.7
was then practically a wash on those benchmarks, so we were left with
only the perf hit.

On the plus side, -flto did reduce the .so size by 5% (1.6M).
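
For a sense of scale, the size figures above imply a baseline. This is a minimal sketch of that arithmetic only, assuming "M" means megabytes and that the 5% is relative to the size before -flto; the helper name is hypothetical and no new measurements are involved.

```python
def implied_baseline(saved_mb: float, saved_fraction: float) -> float:
    """Size before the change, given the absolute and the relative saving."""
    return saved_mb / saved_fraction


# Figures quoted in the post: 1.6M saved, which was 5% of the .so.
baseline_mb = implied_baseline(1.6, 0.05)
after_mb = baseline_mb - 1.6

print(f"before -flto: ~{baseline_mb:.0f}M, after -flto: ~{after_mb:.1f}M")
```

So the library would have been roughly 32M before LTO and a little over 30M after, if those assumptions hold.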

-Tony

Stephen White

May 8, 2013, 11:09:06 AM
to Tony Gentilcore, chromium-dev
I've noticed in the past that MSVC is rather reluctant to inline even small functions marked "inline". Using __forceinline (MSVC-specific) showed obvious perf gains in those cases (perhaps because we set size-over-speed?). So I'm guessing that the cross-module inlining in LTCG is making up a chunk of this loss by inlining at link time. Just a guess, but this may be why gcc/clang don't show as great a speedup from LTCG.

Stephen


Šechtl Voseček

Sep 16, 2013, 8:03:35 AM
to chromi...@chromium.org, Christian Biesinger, Reid Kleckner, Nico Weber
GCC 4.9 should give significant improvements with LTO, both in compile time and memory usage and in code quality. A number of internal bottlenecks were removed, especially the type merging, which was rewritten by Richard Biener. My SoC student Martin Liska got Chromium, LibreOffice and Firefox working with GCC LTO.
I would like to try to get LTO production ready for 4.9, or in the worst case for the next release, for projects like these. Some results are at http://www.ucw.cz/~hubicka/slides/labs2013.pdf (we collected numbers mostly for Firefox, but I think Chromium will behave similarly). Compilation now fits into 4GB of RAM.

The main benefit of LTO for projects like Chromium is, IMO, the code size difference. GCC often inlines too much with current settings. It seems that most apps compile well with --param inline-unit-growth=5, where one gets an -Os sized binary with performance better than -O3 without LTO. Profile feedback also helps significantly. Martin has a new feedback-directed code layout patch (not merged yet, but hopefully this month) that reduces the number of pages touched in the binary to about 20-30%.
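
To make the flag combination above concrete, here is a minimal sketch that assembles such a GCC command line. The helper name and the file names are hypothetical, and the -O2 default is an assumption, since the post does not say which -O level was paired with the --param; only -flto and --param inline-unit-growth=5 come from the thread.

```python
import shlex


def gcc_lto_command(sources, output, opt_level="-O2", inline_unit_growth=5):
    """Assemble a g++ command line with LTO and a capped inline unit growth.

    opt_level is an assumption for illustration, not a setting from the post.
    """
    cmd = [
        "g++",
        opt_level,
        "-flto",  # one invocation compiles and links, so -flto reaches both steps
        "--param",
        f"inline-unit-growth={inline_unit_growth}",
    ]
    cmd += list(sources)
    cmd += ["-o", output]
    return cmd


# Hypothetical two-file program, just to show the resulting command.
cmd = gcc_lto_command(["a.cc", "b.cc"], "demo")
print(shlex.join(cmd))
```

In a real build the compile and link steps are separate, and -flto has to be passed to both.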

I would definitely like to know if GCC LTO misses some optimizations in comparison to MSVC. One of the problems of LTO is that it will stay immature until people start using it. It would be nice to set up regular testing and chase away the remaining problems.

Honza


Tony Gentilcore

Sep 16, 2013, 1:07:04 PM
to sechtl....@gmail.com, chromium-dev, Christian Biesinger, Reid Kleckner, Nico Weber
Currently the latest GCC included in the Android NDK is 4.8. I'm not
sure there is a good way for us to experiment with 4.9 until it is in
the NDK.

I'm also not sure how much effort it would be worth to play with it on
Linux desktop.

Maybe someone from CrOS land would be interested in playing with 4.9?

Šechtl Voseček

Sep 17, 2013, 3:11:06 AM
to chromi...@chromium.org, sechtl....@gmail.com, Christian Biesinger, Reid Kleckner, Nico Weber

> Currently the latest GCC included in the Android NDK is 4.8. I'm not
> sure there is a good way for us to experiment with 4.9 until it is in
> the NDK.

4.9 is still in stage1 of development, so it is months away from the release...

The problem I am trying to solve is really the chicken-and-egg scenario. Until bigger
apps are patched to easily build with LTO and LTO gets regularly tested, we won't be
able to chase bugs out of it... I think there is a realistic chance to get LTO+FDO useful
for the real world in 4.9, but it needs work and testing.

Honza

Nico Weber

Sep 17, 2013, 1:08:29 PM
to Šechtl Voseček, Chromium-dev, Christian Biesinger, Reid Kleckner
Hi Honza,

Chromium should probably build fine with gcc 4.9 without too many issues (you might need to disable warnings as errors). I'm happy to help you find the places you need to tweak build settings to enable LTO. You can email me directly with questions, and I'm thakis on #chromium on freenode.

Nico
 



Martin Liška

Mar 12, 2014, 9:38:27 AM
to chromi...@chromium.org, to...@chromium.org
Hello,
   I would like to compile Chromium with LTO, but there is a lot of linker magic around. You use a hard-coded gold binary (located with -B added to ldflags in common.gypi). As you probably know, link time optimization needs a linker with plugin support (-plugin option), and I would like to use my system-installed gold linker. I am testing GCC 4.9 with the latest gold linker (both located on $PATH). I tried commenting out '-B ...' and encountered the following error:

ninja -j1 -v -C out/Release chrome
ninja: Entering directory `out/Release'
[1/10721] cd ../../native_client/src/trusted/service_runtime/linux; python nacl_bootstrap_munge_phdr.py ../../../../../out/Release/nacl_bootstrap_munge_phdr ../../../../../out/Release/nacl_bootstrap_raw ../../../../../out/Release/nacl_helper_bootstrap
FAILED: cd ../../native_client/src/trusted/service_runtime/linux; python nacl_bootstrap_munge_phdr.py ../../../../../out/Release/nacl_bootstrap_munge_phdr ../../../../../out/Release/nacl_bootstrap_raw ../../../../../out/Release/nacl_helper_bootstrap
../../../../../out/Release/nacl_bootstrap_munge_phdr: elf_update: invalid section alignment
Traceback (most recent call last):
  File "nacl_bootstrap_munge_phdr.py", line 39, in <module>
    sys.exit(Main(sys.argv))
  File "nacl_bootstrap_munge_phdr.py", line 33, in Main
    subprocess.check_call([munger, tmpfile, segment_num])
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../../../../../out/Release/nacl_bootstrap_munge_phdr', '../../../../../out/Release/nacl_helper_bootstrap.tmp', '2']' returned non-zero exit status 2

Is it possible to build chrome JUST with gold linker (or do you rely on any BFD-specific feature)?
I would welcome help with the Chrome build system configuration; it would be nice if you introduced something like 'build/gyp_chromium -D enable_lto=yes'.
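
A switch like the one proposed above could map to extra flags roughly as follows. This is only a sketch under stated assumptions: enable_lto is the hypothetical variable from this message, not an existing Chromium build setting, and the function name and dict layout are made up for illustration. -flto and -fuse-linker-plugin are existing GCC options.

```python
def lto_flags(defines: dict) -> dict:
    """Map a hypothetical 'enable_lto' define to extra compiler/linker flags."""
    enabled = defines.get("enable_lto", "no") == "yes"
    if not enabled:
        return {"cflags": [], "ldflags": []}
    return {
        # Compile step: emit LTO bytecode into the object files.
        "cflags": ["-flto"],
        # Link step: -flto again, plus a request to go through the linker plugin.
        "ldflags": ["-flto", "-fuse-linker-plugin"],
    }


# Equivalent of passing '-D enable_lto=yes' on the command line.
print(lto_flags({"enable_lto": "yes"}))
```

Keeping the switch off by default would leave existing builds unchanged while making LTO easy to test.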

I am ready to test any suggestions,
Thank you

Roland McGrath

Mar 12, 2014, 12:07:50 PM
to marxin...@gmail.com, chromium-dev, to...@chromium.org
The nacl_helper_bootstrap program (linked as nacl_bootstrap_raw) has very
particular requirements. You really don't want to do anything that changes
the compiler flags or linking procedure for that binary.
native_client/src/trusted/service_runtime/linux/ld_bfd.py attempts to
ensure that it's using a non-gold linker that works correctly for its case.
If your environment is such that this script fails to find an ld.bfd that
works, that will break your build.
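
To illustrate the kind of check involved, the sketch below classifies a linker from the first line of its `--version` output. It is not the contents of ld_bfd.py; the function names are hypothetical, and the sample version strings are illustrative rather than taken from a real run.

```python
import subprocess


def linker_flavor(version_text: str) -> str:
    """Classify a linker as 'gold', 'bfd' or 'unknown' from its --version text."""
    lines = version_text.splitlines()
    first = lines[0] if lines else ""
    if first.startswith("GNU gold"):
        return "gold"
    if first.startswith("GNU ld"):
        return "bfd"
    return "unknown"


def probe_linker(ld: str = "ld") -> str:
    """Run '<ld> --version' and classify the result (needs the linker installed)."""
    result = subprocess.run([ld, "--version"], capture_output=True,
                            text=True, check=True)
    return linker_flavor(result.stdout)


# Illustrative version banners, not captured output.
print(linker_flavor("GNU gold (GNU Binutils 2.24) 1.11"))
print(linker_flavor("GNU ld (GNU Binutils) 2.24"))
```

If `probe_linker("ld")` reports gold on a given machine, the plain `ld` on $PATH would not be the non-gold linker that the bootstrap binary needs.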

Victor Miura

Oct 2, 2014, 8:27:30 PM
to chromi...@chromium.org, marxin...@gmail.com, to...@chromium.org
Has anyone tried building with LTO on Android recently?

Tony Gentilcore

Oct 6, 2014, 3:09:42 PM
to Victor Miura, chromium-dev, marxin...@gmail.com, Fabrice de Gans-Riberi, pa...@chromium.org
Fabrice or Egor would probably have the most up to date info about LTO on Android.

Fabrice de Gans-Riberi

Oct 7, 2014, 6:01:17 AM
to Tony Gentilcore, Victor Miura, chromium-dev, marxin...@gmail.com, Egor Pasko
There are still many issues with LTO for GCC ARM. Mainly, GCC sometimes crashes or generates incorrect code with LTO. We have a tracking bug for enabling LTO (http://crbug.com/407544). It will be behind a flag (use_lto) for the time being. There are already patches in other repos, and I am going to push the one for chromium this week. You can follow the bug if you want to know when that will happen.
This is still a work in progress and there are many many toolchain bugs to iron out before we can deem this configuration stable.

Cheers!
Fabrice