Re: Why -O2 in Linux gcc and -O3 in OSX gcc?

Mark Mentovai

May 11, 2011, 12:28:35 AM
to chromium-dev
[This came up semi-privately but it was suggested that chromium-dev
would be a better place for discussion.]

>> For Release builds, we use -O2 in Linux gcc and -O3 in OSX gcc. -O3 may
>> dramatically increase the elf size with no guarantees of increased
>> performance. The increased binary size makes it possible to reduce
>> performance due to increased cache misses.

Thanks for your concern. It’s not clear whether I’ve got the full
context of this question or if that’s even relevant here, but rest
assured that we’re well aware of the tradeoffs and made an informed
decision to do things as we’ve done.

The Mac build started out at -Os, and when we had enough of the
product up and running and enough perf testing in place, I
experimented with -O2 and -O3, among other more specific tunables. I
watched all of the perf data that we track very closely when I made
this change. My initial hypothesis was that -O3 wouldn’t be worth the
speed/size tradeoff, and expected to settle on -O2. I was wrong. At
the time, -O3 gave us the best results for startup time, pageload test
time, and V8 benchmark score. The advantages over -O2 and -Os were not
insignificant.

Speed was one of Chrome’s esses. Size wasn’t. Speed won.

The size difference in those days was about a megabyte in the
compressed .dmg. It’d probably be more now since we’ve gotten so
bloaty.

I’m not aware of anyone taking -O level experimentation on Linux as
seriously as I did on the Mac. That doesn’t mean it didn’t happen,
though.

>> The official chrome downloads seem to indicate the size diff: (hopefully
>> not comparing apples to oranges here)

But you absolutely are. There are obvious differences in all of the
platform-specific code, and features that are present on one platform
but not another. For example, you’re looking at a .deb that doesn’t
even include a Flash plug-in, since there is no x86_64 Flash plug-in.
Linux versions don’t carry their own auto-update code, and rely on the
system to provide certain libraries that we build and bundle on the
Mac. The Mac version carries more icons in more resolutions than the
Linux one. The Mac version carries more translated strings, because
Mac-only strings are usually excluded from other platforms, but
strings that aren’t used on the Mac aren’t excluded there. The Mac
version carries most of its user interface in resource files and not
code, because that’s how the Mac UI toolkit works best. The Mac
version bundles more metadata for various OS-integration features that
are entirely absent in the Linux package, including scripting, managed
preferences, and code signature data. You’re looking at an x86_64
Linux package, but all Mac packages are x86.

More importantly, I assume you’ve tried to “level the playing field”
by wrapping each package in a .tar.gz. Our .dmg is already
bzip2-compressed, and most of the contents of our .debs are
lzma-compressed, and as I’m sure you’re aware, the extra layer of gzip
compression on each at that point doesn’t actually do anything to make
the comparison fair.
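
To illustrate the point about layered compression, here is a minimal Python sketch. It assumes a synthetic payload (seeded random words from a small vocabulary) as a stand-in for package contents, and `wrap_in_gzip` is a hypothetical helper name; real .dmg and .deb files will give different numbers, but the effect should be the same: gzip gains almost nothing on data that bzip2 or lzma has already compressed.

```python
import bz2
import gzip
import lzma
import random

# Synthetic, compressible stand-in for package contents (an assumption;
# real packages mix code, resources, and already-compressed assets).
rng = random.Random(0)
vocabulary = [f"word{i:03d}" for i in range(256)]
payload = " ".join(rng.choice(vocabulary) for _ in range(100_000)).encode()


def wrap_in_gzip(blob: bytes) -> bytes:
    """Apply an extra gzip layer, as wrapping a package in a .tgz would."""
    return gzip.compress(blob, compresslevel=9)


bz2_blob = bz2.compress(payload, compresslevel=9)  # like the .dmg contents
lzma_blob = lzma.compress(payload)                 # like most .deb contents

for name, blob in (("bzip2", bz2_blob), ("lzma", lzma_blob)):
    rewrapped = wrap_in_gzip(blob)
    print(
        f"{name}: {len(payload)} -> {len(blob)} bytes; "
        f"extra gzip layer: {len(rewrapped)} bytes "
        f"({len(rewrapped) / len(blob):.1%} of the inner size)"
    )
```

The outer gzip pass leaves each blob at roughly its inner size, so the .tgz sizes mostly reflect how well the inner format already compressed its contents.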

>> googlechrome.dmg.tgz = 37MB
>> google-chrome-amd64.deb.tgz = 22MB

Yes, -O3 is bigger. No, it’s not 2/3 bigger. A more apt first-cut
comparison would be to take the x86 Linux version, install both the
Mac and Linux versions, and then compare the (uncompressed or
similarly-compressed) sizes of each. Using the current dev versions,
/Applications/Google Chrome.app is 99MB uncompressed and 34MB in a
.tar.bz2 (with -9); /opt/google/chrome is 87MB uncompressed and 31MB
in a similar .tar.bz2. In these terms, 3MB for added -O3 size overhead
plus all of the other stuff the Mac version bundles that the Linux
version doesn’t have actually seems pretty reasonable.
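
The arithmetic behind those comparisons can be checked with a short Python sketch. The sizes are the ones quoted in this thread; `relative_excess` is a hypothetical helper name introduced only for illustration.

```python
# Sizes quoted in this thread, in MB.
DOUBLE_COMPRESSED = {"mac": 37, "linux_amd64": 22}   # .dmg.tgz vs .deb.tgz
INSTALLED = {"mac": 99, "linux_x86": 87}             # uncompressed on disk
INSTALLED_TAR_BZ2 = {"mac": 34, "linux_x86": 31}     # .tar.bz2 with -9


def relative_excess(bigger: float, smaller: float) -> float:
    """How much larger `bigger` is than `smaller`, as a fraction of `smaller`."""
    return (bigger - smaller) / smaller


comparisons = {
    "double-compressed downloads": DOUBLE_COMPRESSED,
    "installed, uncompressed": INSTALLED,
    "installed, .tar.bz2": INSTALLED_TAR_BZ2,
}
for label, sizes in comparisons.items():
    mac, linux = sizes.values()
    print(f"{label}: Mac is {relative_excess(mac, linux):.0%} larger")
```

The first comparison gives the roughly 2/3 figure (about 68%), while the like-for-like comparisons come out at about 14% uncompressed and about 10% in a .tar.bz2.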

Lei Zhang

May 11, 2011, 12:50:42 AM
to ma...@chromium.org, chromium-dev
On Tue, May 10, 2011 at 9:28 PM, Mark Mentovai <ma...@chromium.org> wrote:
> I’m not aware of anyone taking -O level experimentation on Linux as
> seriously as I did on the Mac. That doesn’t mean it didn’t happen,
> though.

I did -O2 vs -Os here: http://code.google.com/p/chromium/issues/detail?id=76288.

Evan Martin

May 11, 2011, 12:12:34 PM
to ma...@chromium.org, chromium-dev
On Tue, May 10, 2011 at 9:28 PM, Mark Mentovai <ma...@chromium.org> wrote:
> Yes, -O3 is bigger. No, it’s not 2/3 bigger. A more apt first-cut
> comparison would be to take the x86 Linux version, install both the
> Mac and Linux versions, and then compare the (uncompressed or
> similarly-compressed) sizes of each.

Less accurate, but much less work: go to build.chromium.org, pick
"sizes" in the upper left, pick the appropriate chart.
Looks like both ('chrome' / 'Chromium.framework') are around 59 MB.

I was going to say that Mark left out that OS X uses an ancient gcc,
but I guess we actually use roughly the same ancient gcc on Linux. I
took a look at an Ubuntu build, which uses a gcc that was released
after the iPhone, and it's 50 MB. (But they also have different build
settings.)

I think Mark's characterization of there being a lot more stuff in the
Mac version isn't exactly accurate -- we also have plenty of
Linux-only goop, like Skia, multiple implementations of password
storage for KDE/Gnome, multiple fallback paths for proxy settings, man
pages, etc. -- but I don't think that matters too much.

I'm probably as responsible as anyone for using -O2 on Linux and I
think I might have played with -O3 once but I just can't be bothered
to worry about it too much. Reporters make headlines out of 1%
differences on SunSpider on non-Linux platforms but realistically I
expect the performance is similar with whichever compiler flags. (I
see in Mark's experimentation he found it was a not-insignificant
difference, so maybe my intuition is plain wrong.) I would welcome a
patch to use -O3 if you could make a convincing argument (sizes + perf
numbers + interpretation) that it is better.

Another way of putting this is that my intuition is that our large
download size is more damaging to our long-term success than another
5% of performance. Or just that I have too much other stuff to do.

Mark Mentovai

May 11, 2011, 12:30:13 PM
to Evan Martin, chromium-dev
Evan Martin wrote:
> I think Mark's characterization of there being a lot more stuff in the
> Mac version isn't exactly accurate -- we also have plenty of
> Linux-only goop, like Skia,

We do build that one on the Mac too.

> Or just that I have too much other stuff to do.

:)

Nico Weber

May 11, 2011, 12:38:23 PM
to ma...@chromium.org, Evan Martin, chromium-dev
On Wed, May 11, 2011 at 9:30 AM, Mark Mentovai <ma...@chromium.org> wrote:
> Evan Martin wrote:
>> I think Mark's characterization of there being a lot more stuff in the
>> Mac version isn't exactly accurate -- we also have plenty of
>> Linux-only goop, like Skia,
>
> We do build that one on the Mac too.

And we even use it for a few things ;-)

>> Or just that I have too much other stuff to do.
>
> :)

--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
   http://groups.google.com/a/chromium.org/group/chromium-dev

Frank Barchard

May 20, 2011, 3:32:47 PM
to tha...@chromium.org, Mark Mentovai, Evan Martin, chromium-dev
I've looked into O3 vs O2 for 2 things: ffmpeg and Google Talk Video.

On ffmpeg, O2 is about 30% smaller.
Performance (fps) is ~5 better on Pentium 4, marginally worse on Core, and ~5% worse on Atom.
Download size, DLL load time, and memory use for the ffmpeg DLLs are proportional to DLL size, so we went with O2.

On talk, code size is 20% larger with O3 and performance is 15% faster.
We decided 20% larger overall was too much, but only a small percentage of the code needed speed.
So we did O3 on image processing and codec related code, and O2 on everything else, resulting in a negligible size increase.
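
A rough Python sketch of that mixed approach follows. The path patterns are hypothetical (the directories are not named here), and the size estimate assumes the 20% O3 growth applies uniformly to whatever fraction of the code is compiled with O3, which is a simplification.

```python
from fnmatch import fnmatch

# Hypothetical path patterns for the speed-critical code; only "image
# processing and codec related code" is named above, not real directories.
HOT_PATTERNS = ["*/image_processing/*", "*/codec/*"]


def opt_flag(source_path: str) -> str:
    """Pick -O3 for speed-critical sources and -O2 for everything else."""
    if any(fnmatch(source_path, pattern) for pattern in HOT_PATTERNS):
        return "-O3"
    return "-O2"


def estimated_growth(hot_fraction: float, o3_growth: float = 0.20) -> float:
    """Overall size growth if only `hot_fraction` of the code is built at -O3.

    Assumes the O3 growth is spread uniformly over the code (an assumption).
    """
    return hot_fraction * o3_growth


for path in ("talk/codec/vp8_decoder.cc", "talk/ui/window.cc"):
    print(path, opt_flag(path))

print(f"estimated overall growth at 5% hot code: {estimated_growth(0.05):.1%}")
```

Under that assumption, compiling 5% of the code at O3 costs about 1% in overall size, which is consistent with the increase being negligible.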

In video games, memory (DRAM) is an even tighter constraint than performance.  So we use Os or O2, even with a large performance hit, and write the critical code in assembly.

I suggest O2; you can make up for the lost inlining and unrolling with hand-tuned code.