Engineering Effectiveness Newsletter (August and September 2022)

Marco Castelluccio

Oct 19, 2022, 7:03:18 AM

to dev-pl...@mozilla.org, Firefox Dev

Welcome to the August and September edition of the Engineering Effectiveness Newsletter! The Engineering Effectiveness org makes it easy to develop, test and release Mozilla software at scale. See below for some highlights, then read on for more detailed info!

Highlights

All Hands!
We significantly improved the experience of users by shifting 50-60% of parent process crashes on Windows to content process crashes.
PDF editing was released in 106, and the new feature was picked up by several press outlets!

Detailed Project Updates

Bugzilla and Bugbug

Suhaib implemented a feature for autonag to reduce bugmail by aggregating related bot changes before posting them on Bugzilla.
Suhaib implemented a feature for autonag to detect mozregression results from bug comments and automatically set the "regressed by" field when possible.
Suhaib implemented a script that gradually requests re-triaging bugs with old high-severity values (i.e., blocker, critical, or major).
Suhaib implemented a feature for autonag to detect bugs with low crash volume to drop topcrash keywords and decrease the severity.

Build System and Mach Environment

Starting with Bug 1746462, you can use the mold linker to link Mozilla Firefox on most Linux distributions. macOS support will come shortly.

CI and Treeherder

arai added a mitten icon to tasks on Treeherder to indicate that a task has both failed and successful runs on the same push (intermittent, if you will).
ahal implemented a public_restricted pull request policy in Taskcluster GitHub and the Firefox-CI cluster.

This allows projects to run non-sensitive tasks on contributor pull requests without exposing sensitive scopes.

Brenden landed a patch to reduce the retention of Taskcluster artifacts depending on the branch, which will result in huge cost savings.

Next steps will be to reduce retention even further for artifacts we don’t need to store that long, e.g. crash symbols.

Jmaher added a “new” button to Treeherder, which should help give a signal for new failures not seen in the last 3 weeks.
masterwayz made lots of progress on the GCP migration, moving shippable builds and Android emulator tests from AWS over to GCP.

Crash Management

All of the Ubuntu Snap channels (stable, beta, edge and ESR) should now have proper symbols and are integrated with the pipeline to keep them updated.

Lint, Static Analysis and Code Coverage

code-review bot now has a retry mechanism for when a push to a remote repository fails. It also detects try closures and retries once try is open again.
Integration between Heroku and code-review bot is more streamlined when shutdowns happen.
When a base revision is missing for the code-review bot, a message will be displayed in Phabricator informing the developer about this.
code-review bot now supports Mozilla Dockerflow.

OS Integration and Security

Gabriele Svelto landed a change to our locking on macOS that significantly improves Firefox responsiveness on loaded systems.
Raymond Kraesig, in coordination with the Windows Spotlight team and Gabriele Svelto’s insight specifically, was able to shift 50-60% of parent process crashes on Windows to content process crashes, which significantly improves the experience for our users. Instead of bringing down the whole browser, a user typically only needs to reload a tab. The team is looking at reducing the total number of OOM crashes next. This graph illustrates the steep dropoff in parent process crashes (blue):

[Graph: parent process crashes on Windows (blue); image not available]

Bob Owen landed a change to use the WER runtime exception module to catch early crashes.
The Windows Spotlight team continued to address stability (1, 2, 3, 4, 5, 6), fullscreen (1), multi-monitor (1) and sizing (1, 2) issues.
Yannis Juglaret published mitimon, a tool that will help us collect useful debug information from bug reporters when their issue is related to security mitigations and we have trouble reproducing it.

PDF.js

PDF editing is enabled by default in 106, which was just released!

Next steps will be the ability to import signatures from images and the ability to highlight and comment.

Calixte fixed some serious PDF.js accessibility issues for HCM users and keyboard users.

Power use

The new power profiling feature got some attention when it shipped in Firefox 104, including a blog post that reverse engineered it and wondered if/when a similar feature will appear in other browsers.
In addition to showing the instantaneous power, the profiler tooltip for power tracks now shows the energy used over the selected time range.
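
As a rough illustration of how energy over a selected range relates to instantaneous power, the sketch below integrates power samples over time with the trapezoidal rule. It is not the Firefox Profiler's implementation; the function names and the sample format (parallel lists of timestamps in seconds and power in watts) are assumptions made for this example.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Approximate the energy (J) used over a time range.

    Integrates power (W) over time (s) with the trapezoidal rule.
    The input format is hypothetical: two parallel sequences of samples.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Average of two adjacent power samples times the elapsed time.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_to_milliwatt_hours(joules: float) -> float:
    """Convert joules to milliwatt-hours (1 mWh = 3.6 J)."""
    return joules / 3.6


if __name__ == "__main__":
    # A constant 2 W draw sampled over 2 seconds.
    joules = energy_joules([0.0, 1.0, 2.0], [2.0, 2.0, 2.0])
    print(joules, "J =", joules_to_milliwatt_hours(joules), "mWh")
```

Selecting a narrower time range corresponds to passing only the samples that fall inside it, so the reported energy shrinks with the selection while the instantaneous power values stay the same.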

Release Engineering and Management

S3 storage investigation

Archive.m.o cleanup

aki presented a Chain of Trust deep dive to help cross-train on this important topic
gabriel and ahal automated more of the VPN release pipeline with signing and beetmover tasks
gabriel and ahal also created a release pipeline for the new VPN addons by adding signing, beetmover and a release promotion action
String freeze date is now available in product-details
The l10n bumper now runs on autoland instead of mozilla-central

mozregression

zeid made improvements to the CI workflow, including automating PyPI deployments and moving all CI to GitHub Actions, including the building of Windows packages. zeid also upgraded PySide2 to PySide6, dropped support for Python 3.6, and cleaned up outdated and unmaintained packages.

Other

ahal and Sylvestre proposed and created a new module for Firefox Source Docs!
zeid moved the module owner and peer definitions for Desktop Firefox, Toolkit, Core, and Testing in tree; the wiki pages are no longer used.

Thanks for reading and see you next month!

ISHIKAWA,chiaki

Oct 24, 2022, 1:48:16 PM

to dev-pl...@mozilla.org

> Build System and Mach Environment
>
> Starting with Bug 1746462 you can use the mold linker for linking Mozilla Firefox on most Linux distributions. MacOS support will come shortly.

I tested "mold" for building C-C Thunderbird.

I was impressed.

The I/O speed while linking "libxul", the large library, is near the maximum bandwidth of my local disk setup (on a Linux guest within VirtualBox hosted under Windows 10).
I watched xosview's disk I/O activity with amusement.
Then, as described in Bugzilla, all of a sudden the multiple CPUs got busy (I have assigned 7 vCPUs and 16GB of memory to my Linux guest) and linking was over.
In contrast, with GNU gold, which I have been using, I see prolonged I/O well below the maximum bandwidth and a single CPU staying busy during the long link.

Of course, the linking is only a small portion of the whole build process, but it *IS* a lengthy process.

This is great work.

Keep up the good work!

Chiaki

ISHIKAWA,chiaki

Oct 25, 2022, 1:57:25 AM

to dev-pl...@mozilla.org

I think this has a large impact on tryserver.

Previously, I noticed that a typical build had a very long tail of single-CPU usage at the end. This was the long linking process, I think.
I once mentioned on one of the mailing lists (or Bugzilla?) that the link ought to get started as early as possible to avoid this.
It seems it was started rather early after all, but slow linking produced this behavior.
However, with mold, the link process actually ends rather early, and such a tail of single-CPU usage is not visible at all now.

In my local setup, where I monitor the compilation through an emacs shell buffer, the verbose output of housekeeping commands (it seems the build was creating jar files under various directories)
is printed very quickly for several seconds at the end (with very light CPU consumption, well after the burst of heavy parallel usage is gone), and then the build ends.
I have never seen a build end like this before. There used to be a very long link process at the end, and these housekeeping commands finished well before the link did.

I believe someone ought to check the job workload profile on the tryserver once mold becomes widely used.
It could end the jobs sooner, or the parallel CPU usage and large memory footprint could impact the farm negatively.
It all depends on the workload profile. In my single build environment, mold performs very well to my pleasant surprise.

I wonder if it will be available under Windows (!?).
THAT would change the CPU/IO workload of the tryserver computer farm.

Chiaki

Mike Hommey

Oct 25, 2022, 2:03:01 AM

to ISHIKAWA,chiaki, dev-pl...@mozilla.org

On Tue, Oct 25, 2022 at 03:01:07PM +0900, Mike Hommey wrote:

> "tryserver" is using lld, which is not that much slower than mold.
> Linking is not what's costly there.

Local builds default to lld too, by the way.

Mike

ISHIKAWA,chiaki

Oct 25, 2022, 4:11:53 AM

to dev-pl...@mozilla.org, Mike Hommey

I should have mentioned the background of my local build.

I wanted to use GCC's -gsplit-dwarf option (yes, I use GCC to compile
C-C TB), and this necessitated the use of GNU gold.

Using -gsplit-dwarf makes starting gdb much, much quicker (on my
local CPU, at least a few years ago).

Unfortunately, lld did not produce a complete gdb-index (it initially did
not handle it at all, IIRC).
I am not sure whether it does today. See this post from early last year:
https://groups.google.com/g/llvm-dev/c/hxnPll-6de0

Thus I was forced to use GNU gold for that reason alone.
mold is a complete replacement in this situation (-gsplit-dwarf passed to
the compiler, and the linker passed an option to create the gdb-index).
I checked, and I can use the produced binary with gdb without an issue in a
few simple tests.

For that situation, mold is a life saver.

(BTW, separating the debug info into independent files seems to speed up
GNU gold link time as well. That is why I tolerated
GNU gold, but I cannot recall the detailed performance numbers.
I DID compare GNU gold and lld, but I think I reverted to GNU gold to
enjoy the fast gdb startup when -gsplit-dwarf is passed to GCC.
The only annoyance I experience is that, during local mochitest and
xpcshell tests, the dumper that prints the stack backtrace does not
understand the split debug symbol files and thus prints only numeric
addresses when MOZ_ASSERT is hit.
A VERY OLD dumper script handled them, but a couple of years back it was
replaced, and the new one no longer handles split debug symbol files.)

For developers who often need gdb in their edit-compile-debug cycle,
GCC's -gsplit-dwarf makes gdb startup very quick, and I like it.
Highly recommended, though I know people may not drop down to gdb that
often.
Even I don't use gdb every day, but when it comes to a really
subtle/complex bug, I have to use gdb repeatedly :-(

(Note: recent fast CPUs may have made the speedup much less
noticeable after all.
Their absolute speed may shrink the long initial startup time of
gdb without -gsplit-dwarf down to a level that is tolerable to the owners
of such very fast CPUs.)

I should have mentioned this feature of mold as a complete replacement
for gnu gold from the viewpoint of "-gsplit-dwarf" and necessary
gdb-index creation.

TIA

Chiaki
