Hi Alex, and thanks for the quick reply!
> As you know, the RC marks when we release Go to Google. We aim to cut
> the release 4 weeks after the RC in order to ensure that there is
> enough time for new software at Google to be deployed with the latest
> version of Go. When we find an issue that needs to be fixed, it means
> we need to cut a new RC and give that enough time to be deployed to
> production as well.
>
> Once the first RC was out, we ran into a challenging issue that was
> difficult to predict or reproduce,
> https://golang.org/issue/47441.
> It's hard to say how we could have handled this particular issue
> better, as it took time to investigate, form a proper fix, then also
> roll out to Google. While the issue has since been resolved, we're
> still investigating this across the team.
That makes sense. I think we have to work with the assumption that most
releases will have at least one challenging issue that doesn't get
spotted until production testing begins.
I'm particularly worried about upcoming releases, because generics will
bring far more invasive changes than we've seen for a while. And, as the
project grows, I also imagine we'll tend to merge more changes into each
release.
> For scheduling the tree opening, we could consider opening the tree
> before the final release, as we have a branch created as part of the
> RC. There's some challenges if we have to backport fixes to the
> release branch, but it's not insurmountable. The biggest risk is
> distracting from fixing RC issues, but I don't see anyone mistaking
> priorities.
That's a good idea, and I believe recent releases have already opened
the tree a few days before the final release. Though it's worth
remembering that it's only a minor mitigation for one of the symptoms.
> Another idea may involve releasing to Google earlier outside of the RC
> cycle, which should help us discover these issues earlier in the cycle.
That's what I was getting at when I suggested we could release
rc1 earlier - to have an extra week or two to discover and iron out
last-minute tricky issues. I don't think the details matter as long as
we have the extra time to help prevent delays.
> As you mentioned, we love the idea of Go users and companies testing
> betas and RCs. The feedback is incredibly valuable, especially for
> learning about edge cases in the runtime and the go command.
>
> I think that improving our communication after the first RC, as well
> as the possible process changes to tree-opening and internal releases,
> will be a step in the right direction to improving this.
A big thumbs up to better communication. I think each release should
gain two "calls to action" to drive that involvement:
1) When beta1 is out, email golang-nuts and post a tweet telling people
about the main new features in the release, and asking users to
download it and *run all their tests on it*.
I don't think it's fair to ask downstream users to start deploying
beta1 to production, but running their test suites (e.g. as an extra
job on CI, or once on a laptop) could already flag plenty of
regressions a whole two months before the final release is due.
2) When rc1 is out, send a similar email, this time asking
enterprises and personal users to start running it in one of their
live environments, ideally production.
The email should make clear why they should do that:
* Google has been running it in production for a week
* Early widespread testing will ensure a release on time
* A late release will make the next merge window smaller
The last two points, I think, will help convince many downstreams to do
their part. I know plenty of enterprise users who simply "can't wait"
for a particular new feature to release, like go:embed or generics.
We should make it clear that less testing contributes to a later release
and a smaller changelog in the following release.
Plus, for many downstreams it can be hard to contribute during the
freeze months, unless they have specific experience with a release bug
or its fix. The testing would be invaluable to the project, and allow
third parties to get involved by just investing a couple of hours.
I think both emails should also encourage enterprise downstreams to
reply in-thread with their experience, even if it's just to say "we've
been running it for two weeks and there are no problems". That should
help you gauge how many people are helping, and what rough portion of
them are running into issues.
I'll let you get back to the release work now :)