Hi Luca,
On Thu, May 21, 2020 at 1:40 AM Luca Milanesio <luca.mi...@gmail.com> wrote:
>
> On 21 May 2020, at 03:47, David Pursehouse <david.pu...@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 11:03 AM Elijah Newren <new...@gmail.com> wrote:
>>
>> > Why do an offline reindex? There are *NO SCHEMA CHANGES* in v3.1.4.
>> > (See https://www.gerritcodereview.com/3.1.html#schema-changes)
>>
>> Because it was mandatory; 3.1.4 would not start until I did. Perhaps
>> if you're upgrading from 3.0.x to 3.1.4 you don't need to reindex, but
>> going from 2.15->2.16->3.1 (with 2.16 only intermediate and gerrit
>> still offline),
>
>
> Ah, I missed the fact that you skipped 3.0. Yes, of course if you skip steps, you need to go through the pain of off-line reindexing :-(
>
>> it refused to start and told me I needed the changes
>> index built before it would run. I spent a week or two trying to find
>> ways around the offline index,
>
> Did you write to the mailing list and we did not answer for a week?
No, I had read in several places over the years that upgrading through
individual versions was required and that offline reindexing was
mandatory if you skip major versions. I saw it so much, I just
assumed the answer without asking.
(In the past, I once successfully did a test upgrade where I started
the offline reindexing, almost immediately aborted it, marked the
super-incomplete index as active, and then manually fired off a lot of
indexing jobs. In fact, it was the only way I could get the upgrade
to work since the offline reindexing never completed and the plugin
story was an even bigger mess back then. But some other backward
incompatible change, I think with the hooks, prevented us from upgrading
at the time, and then Gerrit-2.15 made the offline reindexing
dramatically faster and actually complete before being terminated. I
didn't take good notes on how I did that
kinda-sorta-bypass-the-reindexing step, though, and didn't know how to
replicate it or if it was even possible.)
> There are also companies offering Gerrit Enterprise Support with a support SLA; that would definitely have saved you time.
Yes; currently there's a long ongoing battle between GHE and Gerrit
proponents within our company (Stash and Gitolite lost already years
ago, though there was a time when all four were active); nearly all
repositories have moved to GHE, but one big super-important repo
hasn't and has lots of developers that do not want it to move. The
GHE side seems to have the upper hand, at least among leadership
(among developers it's probably more of a draw, although maybe that's
just my proximity to fellow proponents showing through). The "cost of
conversion" is probably what has kept a mandate from coming down, but
any question about putting resources into Gerrit (e.g. "we need to
update off an EOL version") or even announcements of downtimes for
upgrades is often met with "Would it make sense to just switch to GHE
now?"
I might try to play the GerritForge/GerritHub "Gerrit+GHE" card if
we're forced to make GHE the source of truth and even ask for approval
for Enterprise Support (they did so with Reviewable.io on top of GHE
after all), but for the most part I like to avoid political battles as
much as I possibly can.
Your point is well taken, though, and I am certainly keeping this in mind.
>> or trying to reduce the amount of
>> downtime needed while waiting for an offline reindex, all to no avail.
>> It sounds like I was close with my idea of reindexing an old version
>> of the data and copying it over based on Matthias' comments in another
>> thread, but he has some secret sauce that I was missing (and still
>> am).
>>
>> Yes, I know I could have run every intermediate version of Gerrit in
>> prod for a while, but I vastly preferred a half day outage on a
>> weekend to dragging out the upgrade over multiple weeks and playing
>> roulette with whether I could get all the plugins working on all
>> intermediate Gerrit versions. (Huge thanks to David for fixing up
>> find-owners and saml on 3.1 for me, by the way.)
>
>
> We typically make sure that all the most popular plugins are in workable state in the past supported releases.
> The issue I believe with plugins is that sometimes we allow breaking changes on stable branches (we shouldn’t do that!!!) and that breaks plugins on stable branches.
> Those breakages are unnoticed because we don’t rebuild 100s of plugins for every single Gerrit change.
>
>>
>> I know _you_ can't take a half day outage and even a 15 second outage
>> is huge; but while a half day outage on a weekend was big enough to
>> make us squirm very uncomfortably, for _me_ the really huge thing was
>> worrying about getting plugins to function on that many different
>> versions of Gerrit.
>
>
> What are the plugins that gave you trouble?
1) Auth has been a historical painpoint, across multiple auth systems.
In the Gerrit-2.5 era, the changes to be strict about capitalization
with LDAP as the backend (not noted in the release notes IIRC) caused
us a fair amount of debugging work (sadly someone had capitalized a
domain component somewhere). Granted, that wasn't a plugin, but kind
of falls under auth.
Around the 2.10/2.11 era I was given a dictate to move Gerrit to the
cloud and to not use LDAP anymore. I don't think the saml plugin
existed back then; I certainly didn't find it in searches. We used
the GitHub plugin against an internal GHE, but _only_ for auth. At
the time, the plugin was on some random github repo or something, not
closely associated with the Gerrit project. The documentation was bad
and assumed you'd use github.com, and also only showed how to build
against 2.10 (and I think used pre-bazel instructions in contrast to
2.11 talking about bazel). And it seemed to remain that way for a few
years, making me concerned about upgrades. Of course, there were also
the breakages in the hooks API (some of which I thought made things
better so I was sympathetic, but meant a lot of work for us to convert
our hooks over) around that time, and the massive reindexing problems
trying to skip from 2.11 to 2.13 or further.
When we switched to 2.15, the saml plugin existed and we picked it up
(most Gerrit outages had been caused by GHE outages, and it felt weird
using GitHub OAuth). Good ol'
com.thesamet.gerrit.plugins.saml.SamlWebFilter, which was also on some
random github repo, had horrible documentation, didn't even build, and
took approximately a full day for some ADFS engineer I suckered in to
helping out to figure out how to configure the ADFS servers to talk to
this plugin.
With 2.16, the plugin story was getting better. The github and saml
plugins were both on gerrit-ci.gerritforge.com. The saml plugin was
now com.googlesource.gerrit.plugins.saml.SamlWebFilter. Was very
encouraging, but upgrading to it broke auth. Had to get another ADFS
engineer involved and he spent a couple hours figuring out what
changes were needed on the server.
With Gerrit-3.1, I wanted to try the gerrit-ci.gerritforge.com saml
plugin too. Too bad the build was red and had been for a while. David
Ostrovsky helped me out. Ran with that plugin, but auth broke. Roped
in the same ADFS engineer who had helped with 2.16. Found out it was
the plugin that was broken this time, rather than more server config
needing to be changed; the plugin itself on the master branch was
missing an important change that had been included in the 2.16 branch.
David Ostrovsky again helped me out, merging that change in to master
and giving me a new build. That one worked perfectly for us.
2) Owners was a huge problem.
We first installed the owners plugin with 2.15, where it was just for
a specific user with a special use case. Discovered that the plugin
literally made CRs inaccessible with nasty stacktraces in the
error_log if you used certain features of the plugin. Filed some bugs
in the gerrit tracker (never got a response). Ended up using the
plugin anyway because of the importance of the use case after first
verifying that the plugin would only be used in a very specific manner
that wouldn't trigger the bugs I saw on staging. Almost immediately
saw the plugin being adopted by dozens of folks for other cases, and
had to repeatedly warn folks to avoid certain constructs. People
complained about the owners plugin for years, especially the default
+2 requirement (and how unclear the docs were about how to allow
just a +1 requirement). The docs weren't the best in general, assuming
that you'd use a submit_rule instead of a submit_filter. Had lots of
stacktraces in the error_log over the past couple years from this
plugin, but things seemed to mostly work. I was super uneasy with
this plugin the whole time.
Gerrit-2.16 made this worse. I wanted to upgrade shortly after
release, but that release broke something with the owners plugin so it
needed to be updated. Watched and checked frequently at first, then
stopped at some point and 8-10 months later noticed there was an email
from Luca to this list saying there was now a CR which updated it. I
went to the CR and saw that it was abandoned with no link to any new
CRs. (I think it did get fixed not long after that specific time I
checked, but it was fixed differently and in a different CR. At the
very least, I didn't find it at that time.)
There were a couple times I started investigating an upgrade again (I
typically attempt a gerrit upgrade like 2-3 times for every time we
actually do the upgrade, but just run into issues and abort), and the
plugins as found on gerrit-ci.gerritforge.com suggested to me that it
wasn't building for various stable branches (or there was only a
master branch that I'd have to try my luck with). Of course, by the
time we were finally ready to upgrade, and the building of the owners
plugin had long since been sorted out upstream, I was at the point
where I wanted very badly to not ever touch that plugin again.
With the upgrade from 2.15->3.1, we decided to switch from owners to
find-owners. People have been *extremely* happy with the change, but
find-owners didn't have releases that worked with 3.1 on
gerrit-ci.gerritforge.com when I went looking. David Ostrovsky built
one for me when I asked; super helpful.
3) The general fact that gerrit-ci.gerritforge.com will often have
plugins without a version for a given branch, and will often have red
builds, some dating back as much as a year. (And when I attempt to
"just use master" like I did with find-owners, I find it's been
adapted to some API break and thus will only work with development
versions of Gerrit, not with the stable releases I am trying.)
4) gitiles was once upon a time a concern. The configuration needed
to get it or gitblit or cgit going was kinda painful once upon a time,
but having gitiles pulled in as a core plugin was hugely beneficial in
reducing upgrade concerns.
5) The movement of hooks into a plugin, but more so the various
backward incompatible changes made to the hooks at the time, delayed
previous upgrades. (On the positive side, those backward incompatible
changes did allow me to make the hooks cleaner and easier to maintain
in some cases.) I think we found a bug somewhere that still causes
hooks to fire when they shouldn't (e.g. the hooks say that one type of
hook fires for direct pushes and a different hook fires for pushes to
refs/for/, but the actual behavior I observed was the refs/for/ hook
fired in both cases), but I didn't record the details and we just
temporarily worked around it at the time by disabling the hook for a
brief window. I know, lame report and I should dig up the details but
I'm just trying to answer "What plugins gave you trouble?"
Anyway, hope that helps somehow.