Welcome to the February & March edition of the Engineering Effectiveness & OS integration Newsletter! The Engineering Effectiveness org makes it easy to develop, test and release Mozilla software at scale. See below for some highlights, then read on for more detailed info!
PDF.js will ship in Fenix 111! Say goodbye to downloading PDFs on Android and enjoy reading them in Firefox with maximum security!
We are also one of the few readers on Android to support filling forms, JS in PDF and soon editing
A huge performance issue in Windows Defender’s Real-time Protection feature has been identified and reported. The patch from Microsoft will be released soon and appears to reduce CPU usage from MsMpEng.exe by 75% when Firefox is running. There were also additional fixes and workarounds for various crashes and performance issues caused by third-party software on Windows.
Code to provide Glean Crash Ping Telemetry in Fenix has landed! This means Fenix now has reliable stability metrics.
The minimum Python version for Mach and all commands has been bumped to 3.7! This unblocks a ton of pain points and bugs that were held up due to dependency conflicts.
Windows 11 is now a top-tier test platform, running the full gamut of tests in CI.
John Pangas (jpangas)
Xuanwo
Yogesh Singla (singla007)
Anurag Bhandari (WhiteWolf47)
Elena McLeod (ElusiveEllie)
Joshua Hassan (skynette)
Prerna Dabi (prernadabi23)
Srishti Gupta (srishtig2412)
Suhaib Mujahid reduced the swinging on topcrash bugs by making autonag more restrictive when re-adding topcrash keywords.
Suhaib Mujahid refactored autonag to use the Firefox Trains API instead of parsing a wiki page. Thanks to Pascal Chevrel for building the API!
Marco Castelluccio presented @ FOSDEM 2023 - Teaching machines to handle bugs and test Firefox more efficiently.
John Pangas (first contribution🌟) prepared Bugbug to upgrade to a newer version of scikit-learn.
Suhaib Mujahid gave a talk @ Montreal Software Test & Automation Meetup - Automate tedious tasks in the bug management process at Mozilla.
glob updated the Bugzilla home page, adding links to download Firefox Beta and Nightly and refreshing the design.
Dave Lawrence added an endpoint to Bugzilla that automatically comments on bugs when an associated pull request on GitHub lands
Suhaib Mujahid implemented a new feature for autonag that requests missing information when moving a bug to the Core::Performance component.
We kicked off an effort to reduce open S2 platform bugs by 75%, closing out around 80 bugs so far.
Alex Hochheiden added caching to configure compiler checks (15-50% speed-up on subsequent runs of configure with the same compiler version).
Alex Hochheiden bumped the minimum Python version for Mach to 3.7
Mike Hommey updated the Rust compiler to 1.68
Mike updated the build system to build libunwind & compiler-rt for Android and upgraded the Android NDK to r23c.
Serge Guelton improved null build run time
Alex Hochheiden landed work that restores the ability to vendor Python code into mozilla-central
Sebastian Hengst, Joel Maher and Suhaib Mujahid identified one of the main reasons for Treeherder performance issues (you know…those pesky times when it closes the trees), landed some fixes and planned some work.
Joel Maher has already reduced the volume of errors Treeherder parses by more than 50%, which helps with Treeherder database performance.
Geoff Brown, Joel Maher and Jonathan Moss migrated Windows 7 from AWS -> Azure.
Joel Maher finished the pixel2 -> pixel5 migration (thanks to Jamie Nicol for the fix to wrench jobs!)
Joel Maher and Marco Castelluccio got consensus to turn off Windows 10/Aarch64 tests in CI.
Joel Maher and Jonathan Moss have turned on win11 tests in CI; we will continue to run a subset on win10.
Suhaib Mujahid and Joel Maher had the first cycle of variant expiration. In Bug 1816141, we tracked the first expiration/renewal cycle:
17 Renewed for 6 months
3 Removed
100% response rate!
Sylvestre Ledru created an sccache GitHub Action.
It allows very simple usage of sccache with GitHub storage for faster builds.
For example, Servo is now using the sccache action for its builds.
Sylvestre Ledru released version 0.4.0 of sccache. This version brings many more error checks, leverages OpenDAL for storage access and includes other improvements. Thanks to Xuanwo for all the hard work.
Marco Castelluccio made some CI artifacts expire sooner, saving dozens of TBs of cloud storage
glob updated the Build Telemetry Dashboard to include “mach try” latency
Andrew Halberstadt implemented a taskgraph init subcommand to help projects get bootstrapped with Taskgraph more quickly. He also created a firefox-ci-playground repository for anyone who wants to try Taskgraph out.
Andrew Halberstadt created a generic “mozilla” trust domain and pools. This will allow new projects to get set up much quicker by using pre-defined resources instead of blocking on project specific ones.
Alex Franchuk landed the code to provide Glean Crash Ping Telemetry in Fenix. This means Fenix now will also have reliable stability metrics.
Valentin Rigal has been making steady progress towards before/after analysis in the code review bot
Michelle Goossens has migrated the code review and the code coverage services from AWS to GCP
Andrew Halberstadt replaced our Python linters (flake8, isort, pylint) with Ruff, a comprehensive linter written in Rust. Python linting now takes one or two seconds instead of minutes!
Bob Owen landed the code to enable Low Privileged Application Container support which we will use to sandbox some media decoders.
Yannis Juglaret identified and reported a huge performance issue in Windows Defender’s Real-time Protection feature. The patch from Microsoft will be released soon and appears to reduce CPU usage from MsMpEng.exe by 75% when Firefox is running. He also landed a series of fixes and workarounds for various crashes and performance issues caused by third-party software on Windows.
Stephen A Pohl landed support for macOS session resume after restarts of macOS, for example as a result of OS updates. We plan to use telemetry to measure how this affects users getting back into Firefox after macOS updates. Currently, we are observing that roughly 1% of Firefox starts occur after a macOS restart rather than after users manually starting Firefox.
Greg Stoll wrote a blog post on Mozilla Hacks about third-party DLLs: Letting users block injected third-party DLLs in Firefox (reddit discussion 1, reddit discussion 2)
PDF.js will ship in Fenix 111! Say goodbye to downloading PDFs on Android and enjoy reading them in Firefox with maximum security!
We are also one of the few readers on Android to support filling forms, JS in PDF and soon editing
Florian Quèze talked at FOSDEM to explain the work we are doing to understand and reduce Firefox’s power use. He also explained how power profiling works in the Firefox Profiler.
Zeid Zabaneh significantly reduced the time it takes Lando to load large stacks.
Zeid Zabaneh implemented various changes in Lando in preparation for revision worker, including fixes to merge conflict detection and improvements to how Phabricator data is cached
Ben Hearsum fixed mach try so that --artifact implies --disable-pgo, which makes it easier to get tests or other jobs that depend on shippable builds to run quickly.
Geoff Brown and Johan Lorenzo migrated Fenix (Android) to the new Android Monorepo (bug 1803130). Now android-components, Focus, and Fenix all reside and are all built in the same repo.
RelMan identified a major regression with the Denmark Digital ID system via the Local Firefox project started last quarter. One day later, we shipped a planned dot release with an additional fix for this regression.
RelEng, Release SRE and RelMan handled a Chain of Trust rotation incident immediately prior to shipping Firefox 110. These rotations normally take months, but we were able to get it done in two days and avoid delaying the release.
Johan Lorenzo ensured that Geckoview nightly builds run twice daily. He also made Android Fenix/Focus Nightlies and Betas follow the same cadence as desktop Firefox.
Johan Lorenzo gave a talk (in French) at PyCon France - Can a bunch of Python make Firefox less prone to supply chain attacks? It details our use of Taskcluster, Taskgraph and Chain of Trust.
Pascal Chevrel refreshed and migrated the Release Management Calendar page from https://wiki.mozilla.org/Release_Management/Calendar (manually edited) to https://whattrainisitnow.com/calendar/ (automated). The redirect from the wiki will be set up soon.
Pascal Chevrel wrote a proof of concept called BzKarma. It creates an impact field for bugs that gives a value score for uplifting patches based on a bug's metadata. Early testing looks interesting and seems to highlight some potentially valuable uplifts.
Connor Sheehan, Zeid Zabaneh, and Frida Kiriakos performed a security audit of pash (the code that runs when you ssh into hg.mozilla.org). While no new issues were found, a number of code quality and test coverage issues were identified and resolved by Connor Sheehan.
Connor Sheehan and Christopher Knowles added rate limiting to hg.mozilla.org, which will reduce the load impact of spam and improper service use.
Zeid Zabaneh fixed a codesign issue when using mozregression with Thunderbird.
Rob Lemley added markdown support when exporting mots.yaml.
Zeid Zabaneh improved support for subcommands in mots when using them via mach.
Zeid Zabaneh fixed an issue with rs-parserpatch builds that were failing when using the latest Rust toolchain.
The fuzzing team moved to the Security organization
Johan Lorenzo is now the manager of the Release Engineering team
Andrew Halberstadt has been promoted to P5 (Sr Staff Software Engineer)
Gabriele Svelto has been promoted to P5 (Sr Staff Software Engineer)
Stephen Pohl has been promoted to P5 (Sr Staff Software Engineer)
Ray Kraesig has been promoted to P3 (Sr Software Engineer)
Alex Hochheiden has been promoted to P3 (Sr Software Engineer)
Thanks for reading and see you next month!