Post-mortem of 0.25.0 release

Andreas Tolfsen

Sep 10, 2019, 2:18:33 PM
to tools-marionette, Johan Lorenzo, Nathan Froyd
Hi all,

Quite a few things have changed since the last geckodriver release
back in January, and I want to spend a few moments looking into
some of the challenges I faced making today’s 0.25.0 release.

Crate dependencies specified by relative path cause problems when we
try to publish, as crates.io needs a specific semver range. For
each release, this necessitates local changes to Cargo.toml coupled
with "cargo publish --allow-dirty", and this is bad for a
number of really good reasons. I intend to fix this and I’ve gone
into more detail on the bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=1579902#c0

0.25.0 was also released entirely without Travis. Earlier today
we decided to disable all Travis hooks on GitHub, and I’m working
on removing any leftover configuration:

https://bugzilla.mozilla.org/show_bug.cgi?id=1580265

The primary motivation for using binaries built on TaskCluster has
been code signing. With 0.25.0 we are shipping builds signed with
the same key as Firefox on Windows and macOS. We should in the
future set up a process that also publishes the ASC (PGP) key so
downloads can be verified against Mozilla’s public chain of trust.

Releases on GitHub cannot be drafted without creating a new git
tag. To circumvent this I created v0.25.0 which points to the exact
same thing as v0.24.0. We should revisit the discussion to wean
us off GitHub before the 1.0 release.

The Linux builds produced by the TaskCluster signing jobs appear
to be debug builds (>60M in size), which makes them unsuited for
publication:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=geckodriver&group_state=expanded

I don’t know if we have a bug that tracks this?

For 0.25.0 I built the Linux binaries locally and stripped the debug
symbols manually. The 32-bit Linux build using cross-compilation
worked without a hitch:

% cargo build --target=i686-unknown-linux-musl --release
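
A minimal sketch of the manual stripping step, assuming a `strip` binary on the PATH (for example from GNU binutils) and Cargo's default `target/<triple>/release/` output layout. The helper names are hypothetical and not part of any geckodriver release tooling:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Cargo places cross-compiled release binaries under target/<triple>/release/.
release_binary_path() {
    printf 'target/%s/release/geckodriver\n' "$1"
}

# Strip symbols from the release binary for the given target triple,
# printing its size in bytes before and after.
strip_release_binary() {
    binary=$(release_binary_path "$1")
    wc -c < "$binary"
    strip "$binary"
    wc -c < "$binary"
}
```

After the cargo build above, this would be run as `strip_release_binary i686-unknown-linux-musl`.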

One concern we should follow up on is that the Linux binaries on
TaskCluster are not built using static linking and libmusl. I
haven’t yet filed a bug for this because I’m time-constrained, and
before we do so we should confirm with the build team whether it
actually matters that we ship statically linked binaries.

Following the elimination of the libbzip2 dependency, I can’t see
anything of particular concern in a dynamically linked build:

% ldd obj-x86_64-pc-linux-gnu/dist/bin/geckodriver
linux-vdso.so.1 (0x00007ffea78e0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5e351af000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5e351aa000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5e351a0000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5e35186000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5e34fc5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5e35965000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5e34e42000)
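
To tell a statically linked build from a dynamically linked one like the above, a rough check along these lines may help. It is only a sketch: it assumes glibc's `ldd`, which reports "not a dynamic executable" or "statically linked" for static binaries, and other `ldd` implementations may word this differently. The function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if ldd reports the binary as static.
# Relies on the messages printed by glibc's ldd; treat as an assumption.
is_static() {
    ldd "$1" 2>&1 | grep -Eq 'not a dynamic executable|statically linked'
}
```

For example, `is_static obj-x86_64-pc-linux-gnu/dist/bin/geckodriver || echo "dynamically linked"`.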

In the past the commit containing the geckodriver version bump has
been deemed the canonical commit from where we have tagged and made
the release. This is no longer the case when we have twice-daily
signing tasks on TaskCluster.

We sign releases twice daily for Nightly, and for 0.25.0 I picked
up the first subsequent signing job following the version bump. I
feel there is an argument to be made that releasing geckodriver
should be more flexible than having to wait for the twice-daily
Nightly.

This also has some unforeseen consequences. As whimboo pointed out
to me yesterday, in the interim between the version bump and the
next signing I had rebased and landed a stale patch from a contributor:

https://bugzilla.mozilla.org/show_bug.cgi?id=1529296

This caused an additional change to be included in 0.25.0. In this
particular case there was no cause for concern as the change was
not user-facing, but it’s not hard to imagine a case where an
experimental change intended for the next release lands between the
version bump and the next signing.

I see two possible solutions to this problem. One is to sign _every_
geckodriver build, which means we could return to the version bump
being the canonical release commit, but I don’t know how easy it
would be, or what impact it would have, to sign every build for
every push to central. Alternatively we could have a “special” file
that, when touched, causes a TaskCluster signing job to be triggered.
I feel input from jlorenzo and RelEng is needed.

When we start development on the next milestone for geckodriver we
should make it a habit to bump the version number immediately after
the release similarly to sccache:

https://github.com/mozilla/sccache/commit/d5b98e7e4b6d75c98d1e275c75922e22e28cf829

This would prevent subsequent builds produced on TaskCluster being
mistakenly branded with the previous version number despite containing
several new changes.

Finally, it comes as no surprise after reading this wall of text
that the instructions for releasing geckodriver are now completely
out of date:

https://firefox-source-docs.mozilla.org/testing/geckodriver/Releasing.html

I don’t feel compelled to update those until we’ve addressed at
least a few of the points raised above.

Johan Lorenzo

Sep 11, 2019, 5:19:16 AM
to Andreas Tolfsen, tools-marionette, Nathan Froyd
Congratulations on this big release! It looks really exciting \o/

> The solution I see to this problem is to either sign _every_ geckodriver
> build, which means we could return to the version bump being the canonical
> release commit. But I don’t know how easy or what impact it would have to
> sign every build for every push to central.
>

Running the signing jobs per push is doable, from an implementation
perspective. I don't think it has a big implementation cost. That said, I
don't think it's worth doing it, for now. Here's my current thinking:

- We currently trigger nightlies twice a day, which translates into 1
nightly every 2 pushes, roughly. Signing per push would only slightly
increase the granularity.
- For this particular case, building on every push wouldn't have solved the
problem, because both commits were part of the same push to central:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=d8ea766a18e5c135e093a4585407619b1a7bd00c

> Alternatively we could have a “special” file that, when touched, causes a
> TaskCluster signing job to be triggered.
>
That may work. Do you want to stop having signed builds if this special
file hasn't changed? For what it's worth, changing such a file wouldn't
have solved the problem either, for the same reason.

In my opinion, the problem that was faced is more within the realm of
Release Management than Release Engineering. Please let me know what you
all think 🙂

Andreas Tolfsen

Sep 11, 2019, 7:09:26 AM
to Johan Lorenzo, tools-marionette, Nathan Froyd
Thanks for your feedback, Johan!

Also sprach Johan Lorenzo:

>> The solution I see to this problem is to either sign _every_
>> geckodriver build, which means we could return to the version bump
>> being the canonical release commit. But I don’t know how easy or
>> what impact it would have to sign every build for every push to
>> central.
>
> Running the signing jobs per push is doable, from an implementation
> perspective. I don't think it has a big implementation cost. That
> said, I don't think it's worth doing it, for now. Here's my current
> thinking:
>

> • We currently trigger nightlies twice a day, which translates into
> 1 nightly every 2 pushes, roughly. The granularity is just slightly
> increased.
>

> • For this particular case, build on every push wouldn't have solved
> the problem, because both commits were part of the same push to
> central:
> https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=d8ea766a18e5c135e093a4585407619b1a7bd00c

I didn’t consider either of these things, but you’re of course
entirely correct!

Thinking more carefully about this, I wonder whether the current
situation is actually fine. We’re generally careful about what
reaches autoland and in which order, and I suppose the complication
here was the unpredictability of the inbound merge.

This does mean, however, that we have to be careful not to push any
more changes that would impact the release until after the next
merge to central.

Henrik Skupin

Sep 11, 2019, 5:28:24 PM
to mozilla-tool...@lists.mozilla.org
Andreas Tolfsen wrote on 10.09.19 20:18:

> The Linux builds produced on the TaskCluster signing jobs appear
> to be debug builds (>60M in size) which make them unsuited for
> publication:
>
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=geckodriver&group_state=expanded
>
> I don’t know if we have a bug that tracks this?

As we know, this is only a problem for geckodriver builds produced by
the build job. The newly created toolchain build jobs (bug 1534533) for
geckodriver don't suffer from that huge binary problem (bug 1442253).

> For 0.25.0 I built the Linux binaries locally and stripped the debug
> symbols manually. The 32-bit Linux build using cross-compilation
> worked without a hitch:
>
> % cargo build --target=i686-unknown-linux-musl --release

The 64-bit build is also done via the toolchain build job. So for the
next release we would only have to trigger that manually on
mozilla-central. The 32-bit builds aren't produced at the moment; if we
want to continue shipping those, please file a bug to get them added.

> I have a concern we should follow up on that the Linux binaries are
> not built using static linking and libmusl on TaskCluster. I didn’t
> yet file a bug for this because I’m time-constrained, and before
> we do so we should confirm with the build team if it actually matters
> that we ship statically linked binaries.

I would suggest you just file the bug. It doesn't take long and gives us
a bit of time to get the required feedback. Alternatively, send an
email.

> In the past the commit containing the geckodriver version bump has
> been deemed the canonical commit from where we have tagged and made
> the release. This is no longer the case when we have twice-daily
> signing tasks on TaskCluster.
>
> We sign release twice daily for Nightly, and for 0.25.0 I picked
> up the first subsequent signing job following the version bump. I
> feel there is an argument to be made that releasing geckodriver
> should be more flexible than having to wait for the twice-daily
> Nightly.

That's why we have the toolchain build jobs, which we should be able to
trigger for each commit on central. I hope that with signing it will all
be fine. See bug 1577110.

> The solution I see to this problem is to either sign _every_
> geckodriver build, which means we could return to the version bump
> being the canonical release commit. But I don’t know how easy or
> what impact it would have to sign every build for every push to
> central. Alternatively we could have a “special” file that, when
> touched, causes a TaskCluster signing job to be triggered. I feel
> the input from jlorenzo and RelEng is needed.

Toolchain build jobs should only run when there are changes to the code
under testing/geckodriver, testing/webdriver, and testing/mozbase/rust,
so they should not run for each and every commit. But as I just
checked, this is not what happens at the moment. I filed bug 1580622
to discuss and investigate it.
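
The intended path-based filtering can be illustrated with a small sketch. This is not how the TaskCluster configuration expresses it; the function name is hypothetical and only the three directories come from the paragraph above:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any of the given changed paths lies
# under one of the directories that should trigger a geckodriver build.
touches_geckodriver() {
    for path in "$@"; do
        case "$path" in
            testing/geckodriver/*|testing/webdriver/*|testing/mozbase/rust/*)
                return 0
                ;;
        esac
    done
    return 1
}
```

Given the list of files changed by a push, `touches_geckodriver "$@" && echo "run toolchain job"` would then decide whether a build is needed.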

> When we start development on the next milestone for geckodriver we
> should make it a habit to bump the version number immediately after
> the release similarly to sccache:
>
> https://github.com/mozilla/sccache/commit/d5b98e7e4b6d75c98d1e275c75922e22e28cf829
>
> This would prevent subsequent builds produced on TaskCluster being
> mistakenly branded with the previous version number despite containing
> several new changes.

This should go into the same commit as, or a follow-up to, the one that
records the changeset number for the GitHub releases page. It should be
documented.

> Finally, it comes as no surprise after reading this wall of text
> that the instructions for releasing geckodriver are now completely
> out of date:
>
> https://firefox-source-docs.mozilla.org/testing/geckodriver/Releasing.html
>
> I don’t feel compelled to update those until we’ve addressed at
> least a few of the points raised above.

Sure.

--
Henrik Skupin
Senior Software Engineer
Mozilla Corporation