Engineering Effectiveness Newsletter (April and May 2024 Edition)

Andrew Halberstadt

Jun 6, 2024, 2:17:14 PM
to dev-pl...@mozilla.org, Firefox Dev

Welcome to the April and May edition of the Engineering Effectiveness Newsletter! The Engineering Effectiveness org makes it easy to develop, test and release Mozilla software at scale. See below for some highlights, then read on for more detailed info!

Highlights

  • The Select Translations MVP has landed in Nightly, and the feature is scheduled to ride the trains to ship in Firefox 128. This allows users to translate selected text via the context menu.

  • A bug in Identical Code Folding detection was fixed for Firefox Desktop and Android builds. This leads to a 20MB reduction on Firefox Desktop build size and a 2MB reduction on Android!

  • We published Linux ARM64 Nightlies, which have seen a steady increase in DAU/MAU since launch. The deb package already represents 45% of ARM64 MAU.

  • Mozillians across many teams (both within EE and without) successfully rotated the Certification Authority we use to sign Firefox plugins and addons! This prevented a third “Armag-addon” (only this one would have been much worse).

  • We kicked off our first big parallel translations training run! This follows a long effort to stabilize the incredibly complex pipeline such that it can run hundreds of training tasks in parallel.

  • We added support for running tests on try that match tags in the manifest. Now you can run ./mach try fuzzy --tag <tag> and only tests annotated with that tag will be selected (WPT and Reftest-based suites are not yet supported).
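
To illustrate the idea behind tag-based selection, here is a minimal sketch in Python. It is not the mach implementation: the ManifestEntry class, the select_by_tag helper, the test paths and the tag names are all hypothetical and exist only to show how a tag annotation narrows down a set of tests.

```python
from dataclasses import dataclass, field


@dataclass
class ManifestEntry:
    """One test listed in a manifest (hypothetical structure for illustration)."""
    path: str
    tags: set[str] = field(default_factory=set)


def select_by_tag(tests: list[ManifestEntry], tag: str) -> list[str]:
    """Return the paths of tests annotated with `tag`, preserving manifest order."""
    return [t.path for t in tests if tag in t.tags]


# Hypothetical manifest contents; real manifests and tags will differ.
TESTS = [
    ManifestEntry("browser/test_a.js", {"translations"}),
    ManifestEntry("browser/test_b.js", {"devtools", "translations"}),
    ManifestEntry("browser/test_c.js"),  # no tags, never selected by a tag filter
]

if __name__ == "__main__":
    for path in select_by_tag(TESTS, "translations"):
        print(path)
```

In this sketch an untagged test is never selected, which matches the description above that only tests annotated with the given tag are chosen.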

Detailed Project Updates

Bugzilla and Bugbug

Build System and Mach Environment

  • Serge Guelton fixed a bug in Identical Code Folding (ICF) detection for Firefox desktop and Android builds. This leads to a 20MB reduction on Firefox desktop build size and a 2MB reduction on Android!

  • Serge Guelton reduced the execution time of mach configure + mach export by ~25%, mostly through parallelisation of various operations.

CI and Treeherder

Crash Management

  • Suhaib Mujahid and Marco Castelluccio published a paper titled “Predicting the Impact of Crashes Across Release Channels” at the MSR conference. The paper was published in collaboration with Diego Elias Costa from Concordia University.

  • Many issues were fixed in the new crash reporter client, including: improved localization, better Thunderbird support and superior backwards compatibility with the old client.

  • Gabriele Svelto ensured crash reports intercepted by the Windows Error Reporting runtime exception module now always contain an install time.

  • The Linux symbol scrapers have been expanded to cover more packages and no longer fail when presented with huge amounts of debug information in a single pass.

  • Crash Pings submitted over Glean on desktop now contain the full telemetry environment and the crash stack.

Lint, Static Analysis and Code Coverage

  • Marco Castelluccio, Christian Holler and Jason Kratzer published a study about code coverage gaps and automatic generation of tests, titled “Mind the Gap: What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps”. This study was published at the ICSE conference in collaboration with Carolin Brandt and Andy Zaidman from Delft University of Technology and with Alberto Bacchelli from the University of Zurich.

OS Integration and Security

  • QA has begun testing our integration of the DLP (data loss prevention) SDK support in Nightly. This is an enterprise feature allowing data loss prevention vendors such as Broadcom and Trellix to integrate with Firefox in a more reliable and stable manner.

PDF.js

Firefox Translations

  • We kicked off our first big parallel training run! This follows a long effort to stabilize the incredibly complex pipeline such that it can run hundreds of training tasks in parallel.

  • Greg Tatum created a dashboard that shows the current training run’s progress. Updates are also manually tracked in this spreadsheet (which also contains a link to the most recent dashboard).

  • We will train the first half of the model pipeline (up to and including a single teacher training) and look at the initial evaluation results. If the models are good enough to continue, we'll trigger the rest of the training to produce the final production-ready models.

  • The first wave will be the models going into English, because there is a lot of English monolingual data available. After the first wave, we'll continue with a second wave going from English. We can bootstrap this second wave with our xx-en models we trained in the first wave.

  • A full training run for a single language direction takes about 3-4 weeks. The first stage, where we're stopping, takes about 1 week of training. Both figures depend on data size and will vary.

  • Evgeny Pavlov has been leading a big part of the work on developing our training recipe and coordinating with Teklia contractors to set up our experiment tracking integration with Weights and Biases.

  • Ben Hearsum has done significant work to ensure that we can train new language pairs on preemptible GCP instances, which will greatly lower the financial cost of training them.

  • Erik Nordin has nearly completed the implementation of the Select Translations MVP, and the feature is scheduled to ride the trains to ship in Firefox 128.

Phabricator, moz-phab, and Lando

  • Connor Sheehan completed a migration of the Treestatus tool from a standalone service owned by RelEng into a feature of Lando. The new Lando Treestatus has a proper test suite and the UI is implemented in technologies familiar to our engineering teams.

  • Connor Sheehan implemented several of the hook checks on hg.mozilla.org as checks within Lando, which is required for the hg->git migration.

  • Connor Sheehan added support for the cypress project branch to Lando/Phabricator.

Release Engineering and Management

Version Control

  • Van Le, Greg Cox and Connor Sheehan worked to increase the amount of RAM on hg.mozilla.org, eliminating many OOM issues and making the service more stable.

  • Connor Sheehan added a pushchangedfiles endpoint to hg.mozilla.org, which is a minimal and more performant version of the json-automationrelevance endpoint used by various tasks in CI, and Andrew Halberstadt updated CI to use it.

Other


Thanks for reading and see you next time!
