Engineering Effectiveness Newsletter (February and March 2023 Edition)

Andrew Halberstadt

Apr 5, 2023, 2:16:14 PM
to dev-platform, Firefox Dev

Welcome to the February & March edition of the Engineering Effectiveness & OS Integration Newsletter! The Engineering Effectiveness org makes it easy to develop, test and release Mozilla software at scale. See below for some highlights, then read on for more detailed info!

Highlights

  • PDF.js will ship in Fenix 111! Say goodbye to downloading PDFs on Android and enjoy reading them in Firefox with maximum security!

    • We are also one of the few readers on Android to support filling forms and running JavaScript in PDFs, with editing support coming soon.

  • A huge performance issue in Windows Defender’s Real-time Protection feature has been identified and reported. The patch from Microsoft will be released soon and appears to reduce CPU usage from MsMpEng.exe by 75% when Firefox is running. There were also additional fixes and workarounds for various crashes and performance issues caused by third-party software on Windows.

  • Code to provide Glean Crash Ping Telemetry in Fenix has landed! This means Fenix now has reliable stability metrics.

  • The minimum Python version for Mach and all its commands has been bumped to 3.7! This unblocks fixes for many pain points and bugs that were held up by dependency conflicts.

  • Windows 11 is now a top-tier test platform, running the full gamut of tests in CI.

Contributors

  • John Pangas (jpangas)

  • Xuanwo

  • Yogesh Singla (singla007)

  • Anurag Bhandari (WhiteWolf47)

  • Elena McLeod (ElusiveEllie)

  • Joshua Hassan (skynette)

  • Prerna Dabi (prernadabi23)

  • Srishti Gupta (srishtig2412)

Detailed Project Updates

Bugzilla and Bugbug

Build System and Mach Environment

CI and Treeherder

  • Sebastian Hengst, Joel Maher and Suhaib Mujahid identified one of the main causes of Treeherder performance issues (you know, those pesky times when it closes the trees), landed some fixes, and planned further work.

  • Joel Maher has already reduced the volume of errors Treeherder parses by more than 50%, which helps Treeherder database performance.

  • Geoff Brown, Joel Maher and Jonathan Moss migrated Windows 7 from AWS to Azure.

  • Joel Maher finished the Pixel 2 to Pixel 5 migration (thanks to Jamie Nicol for the fix to the wrench jobs!).

  • Joel Maher and Marco Castelluccio got consensus to turn off Windows 10/Aarch64 tests in CI.

  • Joel Maher and Jonathan Moss have turned on Windows 11 tests in CI; we will continue to run a subset on Windows 10.

  • Suhaib Mujahid and Joel Maher completed the first cycle of variant expiration, tracked in Bug 1816141:

    • 17 Renewed for 6 months

    • 3 Removed

    • 100% response rate!

  • Sylvestre Ledru created an sccache GitHub Action.

    • It makes it very simple to use sccache with GitHub storage for faster builds.

    • For example, Servo now uses the sccache action for its builds.

  • Sylvestre Ledru released version 0.4.0 of sccache. This version brings many more error checks, leverages OpenDAL for storage access and other improvements. Thanks to Xuanwo for all the hard work.

  • Marco Castelluccio made some CI artifacts expire sooner, saving dozens of TBs of cloud storage.

  • Glob updated the Build Telemetry Dashboard to include “mach try” latency.

  • Andrew Halberstadt implemented a “taskgraph init” subcommand to help projects get bootstrapped with Taskgraph more quickly. He also created a firefox-ci-playground repository for anyone who wants to try Taskgraph out.

  • Andrew Halberstadt created a generic “mozilla” trust domain and worker pools. This will allow new projects to get set up much more quickly by using predefined resources instead of blocking on project-specific ones.
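The variant expiration cycle mentioned above boils down to checking each variant's expiration date and either renewing or removing it. The Python sketch below illustrates that idea under loudly stated assumptions: the variant names and dates are hypothetical, and the functions expired_variants and renew are illustrative helpers, not the actual Firefox CI code.

```python
import calendar
from datetime import date

# Hypothetical variant definitions: name -> expiration date.
# These are illustrative values, not real Firefox CI variants.
VARIANTS = {
    "example-variant-a": date(2023, 3, 1),
    "example-variant-b": date(2023, 9, 1),
    "example-variant-c": date(2023, 2, 15),
}


def expired_variants(variants, today):
    """Return the sorted names of variants whose expiration date is before `today`."""
    return sorted(name for name, expires in variants.items() if expires < today)


def renew(variants, name, months=6):
    """Return a copy of `variants` with `name` extended by `months` months."""
    expires = variants[name]
    month_index = expires.month - 1 + months
    year = expires.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day so that e.g. Aug 31 + 6 months lands on the last day of February.
    day = min(expires.day, calendar.monthrange(year, month)[1])
    updated = dict(variants)
    updated[name] = date(year, month, day)
    return updated


if __name__ == "__main__":
    print(expired_variants(VARIANTS, date(2023, 4, 1)))
    print(renew(VARIANTS, "example-variant-a")["example-variant-a"])
```

Renewing returns a new mapping rather than mutating the input, which keeps each expiration/renewal decision easy to review in isolation.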

Crash Management

Lint, Static Analysis and Code Coverage

  • Valentin Rigal has been making steady progress towards before/after analysis in the code review bot.

  • Michelle Goossens has migrated the code review and code coverage services from AWS to GCP.

  • Andrew Halberstadt replaced our Python linters (flake8, isort, pylint) with Ruff, a comprehensive linter written in Rust. Python linting now takes one or two seconds instead of minutes!

OS Integration and Security

  • Bob Owen landed the code to enable Less Privileged AppContainer (LPAC) support, which we will use to sandbox some media decoders.

  • Yannis Juglaret identified and reported a huge performance issue in Windows Defender’s Real-time Protection feature. The patch from Microsoft will be released soon and appears to reduce CPU usage from MsMpEng.exe by 75% when Firefox is running. He also landed a series of fixes and workarounds for various crashes and performance issues caused by third-party software on Windows.

  • Stephen A Pohl landed support for macOS session resume after restarts of macOS, for example as a result of OS updates. We plan to use telemetry to measure how this affects users getting back into Firefox after macOS updates. Currently, we observe that roughly 1% of Firefox starts occur after a macOS restart, as opposed to users manually starting Firefox.

PDF.js

  • PDF.js will ship in Fenix 111! Say goodbye to downloading PDFs on Android and enjoy reading them in Firefox with maximum security!

    • We are also one of the few readers on Android to support filling forms and running JavaScript in PDFs, with editing support coming soon.

Power use

Phabricator, moz-phab, and Lando

  • Zeid Zabaneh significantly reduced the time it takes Lando to load large stacks.

  • Zeid Zabaneh implemented various changes in Lando in preparation for the revision worker, including fixes to merge conflict detection and improvements to how Phabricator data is cached.


Release Engineering and Management

  • Ben Hearsum fixed mach try so that --artifact implies --disable-pgo, which makes it easier to get tests or other jobs that depend on shippable builds to run quickly.

  • Geoff Brown and Johan Lorenzo migrated Fenix (Android) to the new Android Monorepo (bug 1803130). Now android-components, Focus, and Fenix all reside and are all built in the same repo.

  • RelMan identified a major regression with the Denmark Digital ID system via the Local Firefox project started last quarter. One day later, we shipped a planned dot release with an additional fix for this regression.

  • RelEng, Release SRE and RelMan handled a Chain of Trust rotation incident immediately prior to shipping Firefox 110. These rotations normally take months, but we were able to get it done in two days and avoid delaying the release.

  • Johan Lorenzo ensured GeckoView Nightly is built twice daily. He also made the Android Fenix/Focus Nightlies and Betas follow the same cadence as desktop Firefox.

  • Johan Lorenzo gave a talk (in French) at PyCon France titled “Can a bunch of Python make Firefox less prone to supply chain attacks?”, which details our use of Taskcluster, Taskgraph and Chain of Trust.

  • Pascal Chevre refreshed and migrated the Release Management Calendar page from https://wiki.mozilla.org/Release_Management/Calendar (manually edited) to https://whattrainisitnow.com/calendar/ (automated). The redirect from the wiki will be set up soon.

  • Pascal Chevre wrote a proof of concept called BzKarma. It creates an impact field for bugs that gives a value score for uplifting patches based on bug metadata. Early testing is promising and seems to highlight some potentially valuable uplifts.
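The mach try change above is a flag implication: asking for --artifact also turns on --disable-pgo. The Python sketch below shows one way such an implication can be expressed with argparse. It is a minimal illustration only; the program name "try-sketch" and the helpers build_parser and parse_try_args are hypothetical and are not mach's actual code.

```python
import argparse


def build_parser():
    """Build a toy parser with the two flags discussed above (not mach's real parser)."""
    parser = argparse.ArgumentParser(prog="try-sketch")
    parser.add_argument("--artifact", action="store_true",
                        help="use artifact builds")
    parser.add_argument("--disable-pgo", action="store_true",
                        help="disable profile-guided optimization")
    return parser


def parse_try_args(argv):
    """Parse argv, then apply the implication: --artifact also sets --disable-pgo."""
    args = build_parser().parse_args(argv)
    if args.artifact:
        # Mirror the behavior described above: requesting artifact builds disables PGO.
        args.disable_pgo = True
    return args


if __name__ == "__main__":
    print(vars(parse_try_args(["--artifact"])))
```

Applying the implication after parsing keeps both flags independently usable while guaranteeing that --artifact never runs with PGO enabled.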

Version Control

  • Connor Sheehan, Zeid Zabaneh, and Frida Kiriakos performed a security audit of pash (the code that runs when you ssh into hg.mozilla.org).  While no new issues were found, a number of code quality and test coverage issues were identified and resolved by Connor Sheehan.

  • Connor Sheehan and Christopher Knowles added rate limiting to hg.mozilla.org, which will reduce the load impact of spam and improper service use.
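The newsletter does not say how rate limiting on hg.mozilla.org is implemented. As a generic illustration of the technique, the Python sketch below uses a per-client token bucket, one common approach; the class names, capacity and refill rate are hypothetical and do not describe the actual hg.mozilla.org setup.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Consume one token if available; return False when the request should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keep one hypothetical token bucket per client identifier (e.g. an IP address)."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}

    def allow(self, client):
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

Because each client has its own bucket, one misbehaving client exhausts only its own tokens while other clients keep their full allowance. Injecting the clock makes the behavior easy to test deterministically.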

mozregression

Other

Org Changes

  • The fuzzing team moved to the Security organization

  • Johan Lorenzo is now the manager of the Release Engineering team

  • Andrew Halberstadt has been promoted to P5 (Sr Staff Software Engineer)

  • Gabriele Svelto has been promoted to P5 (Sr Staff Software Engineer)

  • Stephen Pohl has been promoted to P5 (Sr Staff Software Engineer)

  • Ray Kraesig has been promoted to P3 (Sr Software Engineer)

  • Alex Hochheiden has been promoted to P3 (Sr Software Engineer)


Thanks for reading and see you next month!
