Firefox and clang-cl

Ehsan Akhgari

Aug 12, 2017, 9:41:18 PM
to dev-pl...@lists.mozilla.org
Hi everyone,

As you may have heard by now, Chromium has started to switch their Windows
builds to use clang-cl instead of MSVC [1]. This has improved their
Speedometer v2 benchmark score on x86 (but not on x86-64) by about 30%
according to AWFY [2]. Over the past few days, several people have reached
out to me to ask about how close we are to switching to clang-cl on
Windows, I think due to seeing this improvement on their side. I thought
I'd write up a summary of the current state of affairs to my knowledge in
the hopes that many people would find it interesting.

First things first, please note that just because Chromium has seen this
improvement shouldn't make us automatically expect a similar improvement in
Speedometer v2 scores in Firefox when built with clang-cl. Both Chromium
and Firefox are large enough codebases that we shouldn't assume any such
results to transfer from one side to the other. Also, you should note that
the Chrome team has also seen some regressions in some other benchmarks as a
result of this change, and those regressions are currently being
investigated. Those who are curious can see the dependency list of this
bug [3].

About our current status with clang-cl, right now Firefox builds with
clang-cl. We use these builds for the purpose of static analysis using our
custom clang plugin (similarly to Mac and Linux). These builds are stood
up on TreeHerder under "Windows 2012 opt" and "Windows 2012 x64 opt" marked
as "S" jobs.

That being said, we still have a lot of work ahead of us before we can get
to a point where we can consider switching to clang-cl for the builds that
we ship to our users. Below is a rough list of things we need to look
into before we can consider doing so.

* Keeping up with the LLVM trunk.
Any serious attempt for us to switch from MSVC to clang-cl will involve
fixing bugs on the LLVM side in addition to on the Firefox side (as the
long history of the work done so far [4] demonstrates.) Right now, the
current LLVM version we use on Windows [5] was last updated in February and
is outdated. glandium recently tried building with LLVM trunk and there is
a regression on the LLVM side causing the build to fail. :-( But in the
periods of time when we have been actively working on the clang-cl port, we
have tried to follow the LLVM trunk as closely as possible in order to
reduce the amount of work involved in cherry-picking the fixes we need.

* Ensuring the correctness of the resulting build.
clang-cl implements Microsoft's ABI and attempts to produce object files
that are compatible with those produced by MSVC. As such, even though we
already build and ship our code with clang, it is possible that we still
have bugs lurking either on our side or on the LLVM side that we need to
find and fix (not to speak of all of the Windows specific code we have
which hasn't been exercised in a shipping environment with clang.) The
first step here would be to stand up all of our tests on the clang-cl
builds and make them green. Ensuring that things like crash rates are
similar to MSVC builds, etc. would be the next steps.

* Ensuring the performance of the resulting build.
The MSVC builds that we ship are compiled using the PGO compiler. In order
to perform a fair comparison with clang-cl, we should probably try to get
PGO builds with clang-cl to work [6]. There is no theoretical reason why
this can't work, but this isn't something that we currently support.
Failing that, there is another option which is using LTO [7] but that
requires us to also port Firefox to link with lld [8] as well. Another
open question is how to compare the performance. The obvious answers would
be to run our Talos benchmarks, and AWFY benchmarks against the two
builds. Whether that would be enough is an open question.

* Ensuring debuggability of the builds.
I haven't paid much close attention to the recent LLVM developments for
CodeView debug info support, but clang-cl has some support for -Z7 and -Zi
flags [9]. We need to ensure that the generated debug info works well for
our stackwalking needs (both locally using a debugger/programmatically and
on the server side for crash-stats) and the generated builds are usefully
debuggable on Windows.

* (If there are other potential details I'm not thinking of right now,
please feel free to mention them here.)


Last but not least, you may ask yourself why would we want to spend this
much effort to switch to clang-cl on Windows? I believe this is an
important long term shift that is beneficial for us. First and foremost,
clang is a vibrant open source compiler, and being able to use open source
toolchains on our most important platforms is really important for us in
terms of being able to contribute to the compiler where needed (anyone
remember the issues we had a few years back with regards to MSVC PGO
compiler hitting the maximum address space limit on Win32 when linking
Firefox?).

But more importantly, clang supports many exciting features that MSVC
lacks. For example, clang usually implements newer language features
faster than MSVC (sometimes by years), and our usage of MSVC holds us back
in terms of the adoption of such features. Also, it has features such as
various sanitizers, some of which [10] we may want to consider turning on
by default in the builds that we ship to our users.

I hope this is helpful.

Cheers,
Ehsan


[1]
https://groups.google.com/a/chromium.org/forum/#!msg/chromium-dev/Y3OEIKkdlu0/TCcT1SvwAwAJ
[2]
https://arewefastyet.com/#machine=37&view=single&suite=speedometer-misc&subtest=score&start=1501023532&end=1501883073
[3] https://bugs.chromium.org/p/chromium/issues/detail?id=82385
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=winclang
[5] See llvm_revision in
https://searchfox.org/mozilla-central/source/build/build-clang/clang-win32.json
and
https://searchfox.org/mozilla-central/source/build/build-clang/clang-win64.json
[6] While it is possible to compare the MSVC PGO build's performance to
clang non-PGO build's performance, that is really an apples vs. oranges
comparison since the types of optimizations done by the compiler would be
quite different.
[7] http://llvm.org/docs/LinkTimeOptimization.html
[8] https://lld.llvm.org/
[9] https://msdn.microsoft.com/en-us/library/958x11bc.aspx
[10] For example, ubsan <
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html>, CPI <
https://clang.llvm.org/docs/SafeStack.html> and CFI <
https://clang.llvm.org/docs/ControlFlowIntegrity.html>.

--
Ehsan

Jeff Muizelaar

Aug 12, 2017, 11:22:41 PM
to Ehsan Akhgari, Mozilla
On Sat, Aug 12, 2017 at 9:40 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Last but not least, you may ask yourself why would we want to spend this
> much effort to switch to clang-cl on Windows? I believe this is an
> important long term shift that is beneficial for us. First and foremost,
> clang is a vibrant open source compiler, and being able to use open source
> toolchains on our most important platforms is really important for us in
> terms of being able to contribute to the compiler where needed

It's worth emphasizing the value of using an open source compiler.
Being able to find and fix bugs in the compiler instead of having to
work around them without knowing the true cause is enormously
valuable. A recent example of this happened to me yesterday with
https://bugzilla.mozilla.org/show_bug.cgi?id=1382857. Once I had
reported the issue (https://bugs.llvm.org/show_bug.cgi?id=34163) a fix
was committed to clang trunk in less than 6 hours. That's something
that would never be possible with MSVC.

-Jeff

Mike Hommey

Aug 13, 2017, 1:20:31 AM
to Jeff Muizelaar, Ehsan Akhgari, Mozilla
That bugs can be fixed in a few hours is nice, but that's not the main
advantage. (also, not all bugs are fixed in a few hours. The one that
makes Firefox fail to build with current clang trunk has been open for 2
weeks and hasn't been fixed yet).

The main advantage is that you can take that fix, and apply it to your
compiler *right now*. As opposed to "whenever the vendor makes a new
release". We're regularly applying patches to clang and GCC. We can't
do the same to MSVC, even when we file bugs to Microsoft and they fix
them.

Another advantage is that if you're so inclined, you can fix it
yourself.

Mike

PS: And once clang-cl+lld actually works, it will also be possible to
build for Windows on Linux.

cosinus...@gmail.com

Aug 13, 2017, 9:32:52 AM
to
Haven't you been able to do that with MinGW on Linux since about 1998?

Thanks
Liam Wilson

Joshua Cranmer 🐧

Aug 13, 2017, 12:38:55 PM
to
On 8/13/2017 8:32 AM, cosinus...@gmail.com wrote:
> Haven't you been able to do that with MinGW on Linux since about 1998?

MinGW doesn't follow the MSVC ABI, as I recall, which makes any MS
interface that uses C++ unusable. I believe this causes issues in places
like accessibility or graphics.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

Julian Seward

Aug 14, 2017, 3:28:12 AM
to Ehsan Akhgari, dev-pl...@lists.mozilla.org
On 13/08/17 03:40, Ehsan Akhgari wrote:
> As you may have heard by now, Chromium has started to switch their Windows
> builds to use clang-cl instead of MSVC [1]. This has improved their
> Speedometer v2 benchmark score on x86 (but not on x86-64) by about 30%
> according to AWFY [2]. [..]

Do we have any insight into why the Clang version is so much faster?
30% strikes me as a large difference for two supposedly mature optimizing
compilers. And stranger still that it applies only for the 32-bit case.
So I'm curious to know what's changed.

J

Till Schneidereit

Aug 14, 2017, 6:11:23 AM
to jse...@acm.org, Ehsan Akhgari, dev-platform
AFAICT, the real change is about 19%: shortly before the jump to ~103,
their score regressed from 86 to 78. I think using 86 as the baseline makes
much more sense. 19% is obviously still a substantial improvement from a
compiler change.

Ben Kelly

Aug 14, 2017, 9:58:17 AM
to Till Schneidereit, jse...@acm.org, dev-platform, Ehsan Akhgari
Google ran a bisect on the regression and improvements. Here is some
explanation of the regression:

https://bugs.chromium.org/p/chromium/issues/detail?id=749359#c4

The comment suggests that the change would see an improvement in clang
since the code would then use clang's "overflow builtins".

It's hard to tell if all of the clang-on-windows improvement is due to this
same set of code or not.

The bisects were run from this issue:

https://bugs.chromium.org/p/chromium/issues/detail?id=750672#c12

Ben

Tristan Bourvon

Aug 14, 2017, 10:44:37 AM
to dev-pl...@lists.mozilla.org
Here's the RFC of the overflow builtins:
http://clang-developers.42468.n3.nabble.com/RFC-Introduce-overflow-builtins-td3838320.html
Along with the tracking issue: https://bugs.llvm.org/show_bug.cgi?id=12290
And the patch:
https://github.com/llvm-mirror/clang/commit/98d1ec1e99625176626b0bcd44cef7df6e89b289

There's also another patch that was added on top of this one which adds
more overflow builtins:
https://github.com/llvm-mirror/clang/commit/c41c63fbf84cc904580e733d1123d3b03bb5584c

It seems clear that this optimization could bring big performance
improvements on hot functions. It could also reduce binary size
substantially (we're talking about 14->5 instructions in their case).

Tristan


tha...@chromium.org

Aug 14, 2017, 1:12:46 PM
to
Hi,

we (Chromium) are also happy to answer questions if there's interest. We've looked at most of these issues in some detail.

In bullet points:
* Correctness: You might have some UB here and there but I wouldn't expect this to be a big problem.
* Performance: We switched from msvc+pgo to clang without pgo and got comparable perf. We did have to use an order file (/order: flag to link.exe) to get comparable startup perf.
* Debuggability: Basically works, see blockers of https://crbug.com/636111 for in-progress work. link.exe can produce pdbs with clang's codeview debug info. (-Z7 and -Zi are aliased to each other in clang-cl; we don't do mspdbsrv.)

You can find us on #chromium on freenode, or over email.

Nico

Ehsan Akhgari

Aug 14, 2017, 4:36:37 PM
to tha...@chromium.org, dev-pl...@lists.mozilla.org
On 08/14/2017 01:12 PM, tha...@chromium.org wrote:
> Hi,
>
> we (Chromium) are also happy to answer questions if there's interest. We've looked at most of these issues in some detail.
Thanks Nico, much appreciated!

(For the record, we have already gotten a lot of help from the Google
compiler folks with the LLVM side fixes for the bugs that we discovered
while porting Firefox to build with clang-cl. Getting to where we are
now would certainly not be possible without that help! I wish I had done
a better job at compiling a full list of all of those contributions over
the years...)
> In bullet points:
> * Correctness: You might have some UB here and there but I wouldn't expect this to be a big problem.
Yes, we have indeed found and fixed bugs in our Windows specific code
where MSVC has been too lenient in accepting bad C++ code, e.g.
https://bugzilla.mozilla.org/show_bug.cgi?id=1251226.

> * Performance: We switched from msvc+pgo to clang without pgo and got comparable perf. We did have to use an order file (/order: flag to link.exe) to get comparable startup perf.
That is very interesting! This is one of the aspects that we have been
worried about a lot. We should probably also think about using /order
as well.

Does Chromium plan to switch to use clang with PGO on Windows by any chance?
> *Debuggability: Basically works, see blockers of https://crbug.com/636111 for in-progress work. link.exe can produce pdbs with clang's codeview debug info.
Wow, it looks like things have improved quite a bit on this front since
the last time I looked at this closely. Really impressive work!
> -Z7 and -Zi are aliased to each other in clang-cl, we don't do mspdbsrv)
I think this should be sufficient for Firefox's needs as well.

Cheers,
Ehsan
> You can find us on #chromium on freenode, or over email.
>
> Nico

Hans Wennborg

Aug 14, 2017, 4:47:02 PM
to Ehsan Akhgari, Nico Weber, dev-pl...@lists.mozilla.org
On Mon, Aug 14, 2017 at 1:36 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Does Chromium plan to switch to use clang with PGO on Windows by any chance?

Yes, we want to do LTO+PGO builds eventually. (In particular, we'd
like to use ThinLTO for more manageable build times.) That requires
switching to using the lld linker, which in turn requires adding
support for writing pdb files, an area that's been making lots of
progress lately.

Cheers,
Hans

Ted Mielczarek

Aug 15, 2017, 12:31:30 PM
to Ehsan Akhgari, tha...@chromium.org, dev-pl...@lists.mozilla.org
On Mon, Aug 14, 2017, at 04:36 PM, Ehsan Akhgari wrote:
> > * Performance: We switched from msvc+pgo to clang without pgo and got comparable perf. We did have to use an order file (/order: flag to link.exe) to get comparable startup perf.
> That is very interesting! This is one of the aspects that we have been
> worried about a lot. We should probably also think about using /order
> as well.

It seems plausible that we could use our existing PGO build steps to
capture the proper ordering and then re-link using that as input to
/order. We already instrument the order of access of omni.ja entries
during that step and use that to produce an optimized omni.ja in the
second build pass.
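
As a rough illustration of the "capture the ordering, then re-link" idea, here is a minimal C++ sketch. It assumes a profiling step has already produced a trace of decorated symbol names in the order they were first hit; how that trace is captured is not covered, and nothing like this exists in the Firefox build today as far as this thread says. The helper names (FirstCallOrder, FormatOrderFile) and the symbols used with them are made up for the example. The output is meant for link.exe's /ORDER:@file option, which, as I understand it, takes one decorated name per line.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: reduce a call trace (symbols in the order they were
// hit during a profiling run) to unique symbols in first-call order.
std::vector<std::string> FirstCallOrder(const std::vector<std::string>& trace) {
  std::vector<std::string> order;
  std::unordered_set<std::string> seen;
  for (const std::string& symbol : trace) {
    if (symbol.empty()) {
      continue;  // skip blank trace entries
    }
    if (seen.insert(symbol).second) {
      order.push_back(symbol);  // first time this symbol shows up
    }
  }
  return order;
}

// Hypothetical helper: render the order as text, one symbol per line.
std::string FormatOrderFile(const std::vector<std::string>& order) {
  std::string text;
  for (const std::string& symbol : order) {
    // Each entry must fit on a single line of the order file.
    assert(symbol.find('\n') == std::string::npos);
    text += symbol;
    text += '\n';
  }
  return text;
}
```

The resulting text would be written to a file and passed to the linker as /ORDER:@thatfile on the second link. Whether this recovers startup performance for Firefox the way it did for Chromium would have to be measured.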

> > *Debuggability: Basically works, see blockers of https://crbug.com/636111 for in-progress work. link.exe can produce pdbs with clang's codeview debug info.
> Wow, it looks like things have improved quite a bit on this front since
> the last time I looked at this closely. Really impressive work!
> > -Z7 and -Zi are aliased to each other in clang-cl, we don't do mspdbsrv)
> I think this should be sufficient for Firefox's needs as well.

We already build all of our non-PGO Windows builds with -Z7 for
compatibility with sccache anyway:
https://dxr.mozilla.org/mozilla-central/rev/b95b1638db48fc3d450b95b98da6bcd2f9326d2f/build/mozconfig.cache#137


-Ted

Zachary Turner

Aug 20, 2017, 9:19:15 PM
to
On Monday, August 14, 2017 at 1:47:02 PM UTC-7, Hans Wennborg wrote:
> Yes, we want to do LTO+PGO builds eventually. (In particular, we'd
> like to use ThinLTO for more manageable build times.) That requires
> switching to using the lld linker, which in turn requires adding
> support for writing pdb files, an area that's been making lots of
> progress lately.
>
> Cheers,
> Hans

A quick update on this: I wrote a blog post about the state of PDBs in lld today.

http://blog.llvm.org/2017/08/llvm-on-windows-now-supports-pdb-debug.html

TL;DR is that all the hard stuff is done, PDBs are happening, and now it's just a matter of finding debug quality issues. I haven't tried linking Chromium with lld yet, but I have tried linking clang with lld and I'm seeing /DEBUG:FULL link times with /Z7 being similar to MSVC /DEBUG:FASTLINK times with /Zi.

(With all the obvious disclaimers about how as we find / fix debug quality issues, things could slow down, since PDB generation without /Zi is a pretty big contributor to link time).

If anyone here is up for dogfooding, I would love to have bug reports against LLD's PDB debug info quality.

Ehsan Akhgari

Aug 23, 2017, 12:47:09 AM
to Zachary Turner, dev-pl...@lists.mozilla.org
This is great news! For those who are interested in the very young
project to port Firefox to link with lld,
https://bugzilla.mozilla.org/show_bug.cgi?id=linker-lld tracks that work.

Cheers,
Ehsan

Ben Kelly

Sep 7, 2017, 10:04:43 AM
to Tristan Bourvon, dev-pl...@lists.mozilla.org
On Mon, Aug 14, 2017 at 10:44 AM, Tristan Bourvon <tbou...@mozilla.com>
wrote:

> Here's the RFC of the overflow builtins:
> http://clang-developers.42468.n3.nabble.com/RFC-Introduce-overflow-builtins-td3838320.html
> Along with the tracking issue: https://bugs.llvm.org/show_bug.cgi?id=12290
> And the patch:
> https://github.com/llvm-mirror/clang/commit/98d1ec1e99625176626b0bcd44cef7df6e89b289
>
> There's also another patch that was added on top of this one which adds
> more overflow builtins:
> https://github.com/llvm-mirror/clang/commit/c41c63fbf84cc904580e733d1123d3b03bb5584c
>
> It seems clear that this optimization could bring big performance
> improvements on hot functions. It could also reduce binary size
> substantially (we're talking about 14->5 instructions in their case).
>

Do we have a bug filed to investigate these overflow builtins? Should we
file one?

Even if we can't use them on all platforms yet, it might be a nice win on
mac where we lack other optimizations like PGO right now.

Thanks.

Ben

Nathan Froyd

Sep 7, 2017, 10:09:37 AM
to Ben Kelly, dev-pl...@lists.mozilla.org, Tristan Bourvon
On Thu, Sep 7, 2017 at 10:04 AM, Ben Kelly <bke...@mozilla.com> wrote:
> On Mon, Aug 14, 2017 at 10:44 AM, Tristan Bourvon <tbou...@mozilla.com>
> wrote:
>
>> Here's the RFC of the overflow builtins:
>> http://clang-developers.42468.n3.nabble.com/RFC-Introduce-overflow-builtins-td3838320.html
>> Along with the tracking issue: https://bugs.llvm.org/show_bug.cgi?id=12290
>> And the patch:
>> https://github.com/llvm-mirror/clang/commit/98d1ec1e99625176626b0bcd44cef7df6e89b289
>>
>> There's also another patch that was added on top of this one which adds
>> more overflow builtins:
>> https://github.com/llvm-mirror/clang/commit/c41c63fbf84cc904580e733d1123d3b03bb5584c
>>
>> It seems clear that this optimization could bring big performance
>> improvements on hot functions. It could also reduce binary size
>> substantially (we're talking about 14->5 instructions in their case).
>>
>
> Do we have a bug filed to investigate these overflow builtins? Should we
> file one?

There is bug 1356936 for mozilla::CheckedInt; I don't know how many
saturating-style arithmetic implementations we have in the tree, or
whether similar bugs exist for those.

-Nathan

Ben Kelly

Sep 7, 2017, 10:12:44 AM
to Nathan Froyd, dev-pl...@lists.mozilla.org
I guess my impression was this would be something we would want the jit to
emit in its bytecode. But maybe I don't fully understand.

Ehsan Akhgari

Sep 7, 2017, 11:24:10 AM
to Ben Kelly, Nathan Froyd, dev-pl...@lists.mozilla.org
These are LLVM/gcc builtins, which allow the C++ compiler code generator
to understand that a high-level operation like
<https://searchfox.org/mozilla-central/rev/67f38de2443e6b613d874fcf4d2cd1f2fc3d5e97/mfbt/CheckedInt.h#256>
is actually just checking whether an addition overflows, so that it can
generate efficient machine code based on that. For example, on x86 you
can use the overflow bit alongside instructions such as JO (jump on
overflow) or JNO (jump on not overflow) to do these checks more
efficiently, but without the builtins the compiler won't be able to
decipher what the IsAddValid function is trying to do and would instead
generate more inefficient code that does bit manipulation.
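
To make the difference concrete, here is a small C++ sketch of the two styles for a 32-bit signed addition. The first function is a simplified bit-manipulation check in the spirit of IsAddValid, not the actual CheckedInt code; the second uses __builtin_add_overflow, which clang and GCC provide (MSVC does not). The function names are made up for this example, and the machine code you actually get depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstdint>

// Manual check: do the addition in unsigned arithmetic (which wraps without
// undefined behavior) and inspect the sign bits. A signed overflow happened
// if and only if the result's sign differs from the sign of both operands.
inline bool IsAddValidManual(int32_t x, int32_t y) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t uy = static_cast<uint32_t>(y);
  uint32_t result = ux + uy;
  return ((result ^ ux) & (result ^ uy) & 0x80000000u) == 0;
}

// Builtin check: tells the compiler directly that we only want to know
// whether the addition overflows. The builtin returns true on overflow.
inline bool IsAddValidBuiltin(int32_t x, int32_t y) {
  int32_t result;
  return !__builtin_add_overflow(x, y, &result);
}

// Debug-only cross-check that both strategies agree on a given input.
inline void AssertSameAnswer(int32_t x, int32_t y) {
  assert(IsAddValidManual(x, y) == IsAddValidBuiltin(x, y));
}
```

Both functions give the same answer; the point of the builtin is that the code generator no longer has to reverse-engineer the intent from the XOR/AND sequence.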

If our JIT code emits overflow checks, it can directly emit the more
efficient machine code as it has an assembler and it doesn't need to
rely on C++ compiler intrinsics like this.

Hope this helps!

Cheers,
Ehsan