PSA: Enabling -O2 selectively on Android.

Fabrice de Gans-Riberi

Sep 17, 2014, 8:55:20 AM
to Chromium-dev, perf-s...@chromium.org
Hi all,

tl;dr: we're switching base/ cc/ and skia/ to build with -O2 on Android.

CL in flight, landing soon: https://codereview.chromium.org/555373004/

We have been working on enabling -O2 selectively in the Android build to improve performance without increasing the binary size too much. Our results show that on most high-level benchmarks we get almost the full gains of a full -O2 build by enabling -O2 on base/, cc/ and skia/ only, at a cost of less than 600 KiB (<2%) of binary size in official release mode. Here is a document summing up our findings:

Cheers!
Fabrice

Daniel Bratell

Sep 17, 2014, 11:27:31 AM
to Chromium-dev, perf-s...@chromium.org, Fabrice de Gans-Riberi
On Wed, 17 Sep 2014 14:53:05 +0200, Fabrice de Gans-Riberi <fde...@chromium.org> wrote:

> Hi all,
>
> tl;dr: we're switching base/ cc/ and skia/ to build with -O2 on Android.

Quite often the amount of code that benefits from maximum unrolling and inlining is just a tiny fraction of a program. It would be nice if there were a way to mark specific code blocks (or whole files) as "this should be optimized with no other concerns than raw performance".

I think the discussion has been up before, and honestly I am not sure it would work in reality. Way too likely that every developer would mark his favourite code with that flag to make a (globally) irrelevant performance test faster.
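
For illustration only: GCC does offer a per-function knob that is close to the marking described above, the `optimize` function attribute. The sketch below is not something Chromium uses. `HOT_PATH` and `SumBytes` are hypothetical names, Clang does not support this attribute (so the macro expands to nothing there), and GCC's own documentation cautions against relying on it in production code.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: HOT_PATH is a hypothetical macro, not a Chromium one.
// GCC applies the optimize attribute per function; Clang does not
// support it, so the macro is empty for every other compiler.
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_PATH __attribute__((optimize("O2")))
#else
#define HOT_PATH
#endif

// Under GCC this function is compiled as if with -O2, even when the
// rest of the translation unit is built with -Os.
HOT_PATH uint32_t SumBytes(const uint8_t* data, size_t size) {
  uint32_t sum = 0;
  for (size_t i = 0; i < size; ++i)
    sum += data[i];
  return sum;
}
```

This also shows the hoarding concern in practice: nothing stops every function from being tagged the same way.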

But nice job identifying areas to get good bang for the buck! If you do it again, you could try checking blink/wtf since that is kind of equivalent to base where you had some success.

/Daniel

Tom Hudson

Sep 17, 2014, 11:35:02 AM
to Daniel Bratell, Chromium-dev, perf-s...@chromium.org, Fabrice de Gans-Riberi
A few months ago we went through a few test scenarios in Blink on Android with a CPU profiler and tried to identify places that were marked inline but weren't being inlined in practice because of "-Os"; ALWAYS_INLINE works. Unfortunately, that is the sort of performance tweaking that isn't stable long-term; it ought to be revisited every few months, and I don't think anybody has that on their plate.
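
A minimal sketch of what such a macro typically looks like, assuming GCC or Clang. The definition below is a stand-in written for this example and may differ from the real one in Blink. `ClampToByte` and `SaturatingAdd` are hypothetical functions.

```cpp
// Assumed definition for illustration; the real macro may differ.
#ifndef ALWAYS_INLINE
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define ALWAYS_INLINE __forceinline
#else
#define ALWAYS_INLINE inline
#endif
#endif

// With plain `inline`, a size-optimizing build may still emit an
// out-of-line call here. always_inline asks the compiler to inline
// the body regardless of its size heuristics.
ALWAYS_INLINE int ClampToByte(int value) {
  return value < 0 ? 0 : (value > 255 ? 255 : value);
}

int SaturatingAdd(int a, int b) {
  return ClampToByte(a + b);
}
```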

Tom

Fabrice de Gans-Riberi

Sep 17, 2014, 11:44:12 AM
to Daniel Bratell, Chromium-dev, perf-s...@chromium.org
On Wed, Sep 17, 2014 at 5:26 PM, Daniel Bratell <bra...@opera.com> wrote:
> On Wed, 17 Sep 2014 14:53:05 +0200, Fabrice de Gans-Riberi <fde...@chromium.org> wrote:
>
>> Hi all,
>>
>> tl;dr: we're switching base/ cc/ and skia/ to build with -O2 on Android.
>
> Quite often the amount of code that benefits from maximum unrolling and inlining is just a tiny fraction of a program. It would be nice if there was a way to mark specific code blocks (or at the file level) as "this should be optimized with no other concerns than raw performance".
>
> I think the discussion has been up before, and honestly I am not sure it would work in reality. Way too likely that every developer would mark his favourite code with that flag to make a (globally) irrelevant performance test faster.

Yes, that was one of the reasons why we did not go with that approach. The other is maintenance: you can't know that a specific piece of code is always going to benefit from being optimized for speed. A full component is more likely to keep the same impact over time. It is also less likely that people are going to "hoard" it by adding files that don't belong there.

> But nice job identifying areas to get good bang for the buck! If you do it again, you could try checking blink/wtf since that is kind of equivalent to base where you had some success.

Thanks! It seems we are lacking in terms of high-level blink-oriented benchmarks. My own tests have shown that for the metrics that do benefit from blink (namely record_time), the improvements seem to come from blink_web, which is too large right now.

Fabrice

Fabrice de Gans-Riberi

Sep 17, 2014, 11:50:56 AM
to Tom Hudson, Daniel Bratell, Chromium-dev, perf-s...@chromium.org
On Wed, Sep 17, 2014 at 5:34 PM, Tom Hudson <tomh...@google.com> wrote:
> [SNIP]
>
> A few months ago we went through a few test scenarios in Blink on Android with a CPU profiler and tried to identify places that were marked inline but weren't being inlined in practice because of "-Os"; ALWAYS_INLINE works. Unfortunately, that is the sort of performance tweaking that isn't stable long-term; it ought to be revisited every few months, and I don't think anybody has that on their plate.


I am curious: what tools did you use to identify which functions would benefit from inlining? Could the process be automated on a bot?
We are working on using FDO (feedback-directed optimization) for Android, which should be able to make good inlining decisions, maybe, if all goes well. But I'd like to hear more about how you did it.

Thanks!
Fabrice

Chris Harrelson

Sep 17, 2014, 12:23:55 PM
to fde...@chromium.org, chri...@chromium.org, Tom Hudson, Daniel Bratell, Chromium-dev, perf-s...@chromium.org
I assume it's not possible to use -O2 on a per-file basis? Why is that?

Fabrice de Gans-Riberi

Sep 17, 2014, 12:54:58 PM
to Chris Harrelson, chri...@chromium.org, Tom Hudson, Daniel Bratell, Chromium-dev, perf-s...@chromium.org
On Wed, Sep 17, 2014 at 6:22 PM, Chris Harrelson <chri...@google.com> wrote:
> I assume it's not possible to use -O2 on a per-file basis? Why is that?


I experimented with that back in January. There are several reasons why we didn't go through with it:
- Gyp does not support it. You can either have one-file targets or use a "pseudo-gcc" script that replaces command-line arguments for specific files before calling the real gcc. Neither option is very pretty.
- You cannot really automate the selection of files to optimize. I tried going with the "most used functions during a run", but these do not necessarily benefit from being optimized for speed.
- If you go with a manual approach, it is unmaintainable. You don't know how the codebase is going to evolve or whether your local speed optimization is going to remain relevant.
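
To make the "pseudo-gcc" option concrete, here is a rough sketch of the argument rewriting such a wrapper would do before handing off to the real compiler. It is not the script from the experiment. `RewriteOptLevel` and the hot-file list are hypothetical, and matching files by exact argument string is an assumption made for brevity.

```cpp
#include <set>
#include <string>
#include <vector>

// Sketch of the rewriting step of a hypothetical "pseudo-gcc" wrapper.
// If any argument names a file in hot_files, every -Os is replaced by
// -O2. Otherwise the arguments are returned unchanged.
std::vector<std::string> RewriteOptLevel(
    const std::vector<std::string>& args,
    const std::set<std::string>& hot_files) {
  bool is_hot = false;
  for (const std::string& arg : args) {
    if (hot_files.count(arg) > 0) {
      is_hot = true;
      break;
    }
  }

  std::vector<std::string> result = args;
  if (!is_hot)
    return result;

  for (std::string& arg : result) {
    if (arg == "-Os")
      arg = "-O2";
  }
  return result;
}
```

A real wrapper would then invoke the actual compiler with the rewritten arguments. The hand-maintained `hot_files` set is exactly the part that the last bullet calls unmaintainable.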

Cheers!
Fabrice

Chris Harrelson

Sep 17, 2014, 1:16:48 PM
to Fabrice de Gans-Riberi, Tom Hudson, Daniel Bratell, Chromium-dev, perf-s...@chromium.org
On Wed, Sep 17, 2014 at 9:53 AM, Fabrice de Gans-Riberi <fde...@chromium.org> wrote:
> On Wed, Sep 17, 2014 at 6:22 PM, Chris Harrelson <chri...@google.com> wrote:
>> I assume it's not possible to use -O2 on a per-file basis? Why is that?
>
> I experimented with that back in January. There are several reasons why we didn't go through with it:
> - Gyp does not support it. You can either have one-file-targets or use a "pseudo-gcc" script to replace command-line arguments for specific files before calling the real gcc. Neither option are very pretty.

I see. But that can be fixed with effort, if it is justified.

> - You cannot really automatize the file set to optimize. I tried going with the "most used functions during a run" but these do not necessarily benefit from being optimized for speed.
> - If you go with a manual approach, it is unmaintainable. You don't know how the codebase is going to evolve and if your local speed optimization is going to remain relevant.

I'm pretty sure the Blink engineers can identify which pieces of code are hotspots. Though point taken that it's a bit of a guessing game and requires attention over time. I'll ask around to see what others think.
 

Nico Weber

Sep 17, 2014, 1:35:13 PM
to Chris Harrelson, Fabrice de Gans-Riberi, Tom Hudson, Daniel Bratell, Chromium-dev, perf-s...@chromium.org
FWIW, I'm pretty strongly exposed to doing this on a per-file basis, or even more granularly. This isn't stable over time, and it's better to try and find better algorithms than to spend a lot of time on micro-optimizations – these tend to suck up a lot of time, usually don't produce wins in macro benchmarks, make code and build files confusing, slow down other people trying to change code, etc. Doing this (very conservatively!) per target was the compromise we arrived at.

Nico Weber

Sep 17, 2014, 2:00:42 PM
to Chris Harrelson, Fabrice de Gans-Riberi, Tom Hudson, Daniel Bratell, Chromium-dev, perf-s...@chromium.org
On Wed, Sep 17, 2014 at 10:32 AM, Nico Weber <tha...@chromium.org> wrote:
> On Wed, Sep 17, 2014 at 10:15 AM, Chris Harrelson <chri...@chromium.org> wrote:
>> On Wed, Sep 17, 2014 at 9:53 AM, Fabrice de Gans-Riberi <fde...@chromium.org> wrote:
>>> On Wed, Sep 17, 2014 at 6:22 PM, Chris Harrelson <chri...@google.com> wrote:
>>>> I assume it's not possible to use -O2 on a per-file basis? Why is that?
>>>
>>> I experimented with that back in January. There are several reasons why we didn't go through with it:
>>> - Gyp does not support it. You can either have one-file-targets or use a "pseudo-gcc" script to replace command-line arguments for specific files before calling the real gcc. Neither option are very pretty.
>>
>> I see. But that can be fixed with effort, if it is justified.
>>
>>> - You cannot really automatize the file set to optimize. I tried going with the "most used functions during a run" but these do not necessarily benefit from being optimized for speed.
>>> - If you go with a manual approach, it is unmaintainable. You don't know how the codebase is going to evolve and if your local speed optimization is going to remain relevant.
>>
>> I'm pretty sure the Blink engineers can identify some pieces of code are hotspots. Though point taken that it's a bit of a guessing game, and requires attention over time. I'll ask around to see what others think.
>
> FWIW, I'm pretty strongly exposed

(and also opposed)