Reducing size of multiple Linux kernel repos


Vitaliy Lotorev

Jun 25, 2022, 4:16:40 PM
to Repo and Gerrit Discussion
Hi,
Currently we have one Linux kernel repo, but access permissions for multiple branches/teams/contractors are hard to maintain and just don't fit Gerrit's *-Projects inheritance model.
So the only solution would be to split branches+teams into separate Linux projects, but that will increase disk usage and the cost of GC, backups, etc.
Are there any techniques to reduce the size of multiple Linux kernel repos on Gerrit? (Maybe it could be done with 'git reference'.)

--
Regards, Vitaliy

Vitaliy Lotorev

Jun 25, 2022, 5:51:00 PM
to Repo and Gerrit Discussion
On Jun 25, 2022 at 23:16, Vitaliy Lotorev <lot...@gmail.com> wrote:
Answering myself:
GitLab uses 'git alternates' for object deduplication (forked repos) [1], but Gerrit/JGit do not support git alternates [2].
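
For readers unfamiliar with the mechanism, here is a minimal sketch of how alternates work in plain git (nothing Gerrit-specific). The repo names base.git and fork.git and the temporary directory are made up for illustration; the only real interface involved is the objects/info/alternates file, which lists extra object directories for git to search.

```shell
#!/bin/sh
set -eu

# Hypothetical layout: two bare repos in a scratch directory.
work=$(mktemp -d)
git init --quiet --bare "$work/base.git"
git init --quiet --bare "$work/fork.git"

# Store one blob in base.git only.
blob=$(echo "shared content" | git --git-dir="$work/base.git" hash-object -w --stdin)

# Point fork.git at base.git's object directory (absolute path).
echo "$work/base.git/objects" > "$work/fork.git/objects/info/alternates"

# fork.git can now read the blob although it holds no copy of its own.
git --git-dir="$work/fork.git" cat-file -p "$blob"

# count-objects reports only fork.git's own loose objects.
git --git-dir="$work/fork.git" count-objects
```

The space saving comes from fork.git never storing the shared objects itself, which is also why objects must not disappear from base.git while fork.git depends on them.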

Martin Fick

Jun 25, 2022, 8:18:03 PM
to Vitaliy Lotorev, Repo and Gerrit Discussion

We use alternates with Gerrit on our busiest repos and it works. We have one of the largest multi-primary Gerrit sites in the world (and it's on Linux kernel repos), so this gets some serious use.

There are many ways to set up alternates, and I suspect that it takes some expertise to come up with a strategy that matches your use case, in particular for gc. On an always-live site (like most Gerrit instances), this is likely easy to get wrong. So #2 may not really reflect issues with Gerrit; rather, it could reflect issues with managing alternates appropriately in a server environment (if the problems in #2 are even alternates problems). But it's hard to tell, because #2 seems to have become something of a dumping ground for people with issues who happen to be running alternates. Without a clear resolution, or even a clear identification of an actual known alternates issue or a specific alternates problem (all the error messages in #2 are things that happen on busy sites, even without alternates), I wouldn't give #2 much weight in your decision to try/use alternates,

-Martin


Oswald Buddenhagen

Jun 26, 2022, 3:54:05 AM
to repo-d...@googlegroups.com
On Sat, Jun 25, 2022 at 06:17:56PM -0600, Martin Fick wrote:
>There are many ways to setup alternates, and I suspect that it takes
>some expertise to come up with a strategy that matches your use case,
>in particular for gc.
>
i like to think that i have that expertise. ;)
also, i think i provided sufficient evidence that the setup was working
at a pure git level.

the only hard requirement is that the alternate repo does not accept
non-fast-forwards of any kind (incl. ref deletions), so no objects
disappear. (on a non-bare repo, amends, rebases, and reflogs (incl.
stashes) would complicate matters more.)
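
In plain git, the requirement above could be approximated with the receive.denyNonFastForwards and receive.denyDeletes settings on the alternate repo. This is only a sketch with made-up repo names and a throwaway identity; under Gerrit, pushes are governed by its own access rules, so the equivalent there would presumably be withholding force-push and ref-deletion permissions rather than relying on these settings.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical alternate repo that other repos borrow objects from.
git init --quiet --bare "$work/base.git"
git --git-dir="$work/base.git" config receive.denyNonFastForwards true
git --git-dir="$work/base.git" config receive.denyDeletes true

# Throwaway client used only to exercise the two settings.
git init --quiet "$work/client"
cd "$work/client"
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit --quiet --allow-empty "$@"
}
commit -m "first"
git push --quiet "$work/base.git" HEAD:refs/heads/main

# Rewrite history locally, then try to force it onto the alternate repo.
commit --amend -m "rewritten"
if git push --quiet --force "$work/base.git" HEAD:refs/heads/main 2>/dev/null; then
    rewrite=accepted
else
    rewrite=rejected
fi

# Try to delete the branch on the alternate repo.
if git push --quiet "$work/base.git" :refs/heads/main 2>/dev/null; then
    deletion=accepted
else
    deletion=rejected
fi

echo "forced rewrite: $rewrite, ref deletion: $deletion"
```

With both pushes refused, every object that was ever reachable in base.git stays reachable, so a gc there has nothing to drop that a borrowing repo might still need.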

>We use alternates with Gerrit on our busiest repos and it works. We have
>one of the largest multi-primary Gerrit sites in the world (and it's on
>linux kernel repos), so this gets some serious use.
>
since when are you doing that? does the version chronology permit the
conclusion that jgit was simply fixed since i ran into the issue? i
can't recheck it without major effort, as i have no access to the site
in question any more.
vitaliy giving it a shot would certainly provide a useful datapoint,
esp. for a simple mainline setup of gerrit (which i'm assuming his is,
unlike yours).

Martin Fick

Jun 26, 2022, 3:52:56 PM
to repo-d...@googlegroups.com
On 6/26/22 1:53 AM, Oswald Buddenhagen wrote:
> On Sat, Jun 25, 2022 at 06:17:56PM -0600, Martin Fick wrote:
>> There are many ways to setup alternates, and I suspect that it takes
>> some expertise to come up with a strategy that matches your use case,
>> in particular for gc.
>>
> i like to think that i have that expertise. ;)
> also, i think i provided sufficient evidence that the setup was
> working at a pure git level.

I don't even see any assertions of this in the bug report, let alone
evidence of any operations that worked with pure git, did I miss them?
The assertions that it works in cgit and jgit are about a different
operation, browsing (what does that mean for jgit?), than the ones
claimed not to work in this bug report using Gerrit: pulling and pushing.

I see no reproduction steps mentioned in that bug report, did I miss
them? There are no hints as to how the alternates are set up, or what
the alternates setup is trying to achieve. It sounds like you
believe you had the expertise to set this up right, and you may have.
Unfortunately, that was not very well communicated, and it was not
conveyed to me when I read that bug report yesterday. As a bug report
reader, it appeared to me that not very much was done to actually
communicate what problem you encountered, and that not much was
done to try to debug or resolve the issue.

So let me rephrase things without using the word "expertise". I do
strongly suggest that, if you want to use alternates (or any tool for
that matter) and run into issues, you be willing to file a clear,
comprehensive bug report and to follow through with good questions. :)

> the only hard requirement is that the alternate repo does not accept
> non-fast-forwards of any kind (incl. ref deletions), so no objects
> disappear. (on a non-bare repo, amends, rebases, and reflogs (incl. 
> stashes) would complicate matters more.)

I have no idea what the statement above means, or is about?


>> We use alternates with Gerrit on our busiest repos and it works. We
>> have one of the largest multi-primary Gerrit sites in the world (and
>> it's on linux kernel repos), so this gets some serious use.
>>
> since when are you doing that?

We have likely been doing this since sometime around 2014.

> does the version chronology permit the conclusion that jgit was simply
> fixed since i ran into the issue?

That certainly could be, and we may have had to make a fix. Perusing
jgit quickly, I see

e4714a2a5faa2d5cc8c9b129f96296dc2d6d26f8 Prevent alternates loop

which was mine. We have a bit of a FrankenGerrit/JGit running, and my
recollection of any difficulties we ran into ~10 years ago may be poor.
At the time we were not too scared to dig into the jgit codebase to
make things work if they didn't. I do think we had to do that. My
recollection is that the code base for alternates was already pretty
good at that time, but it looks like it needed that one tweak.

> i can't recheck it without major effort, as i have no access to the
> site in question any more.
> vitaliy giving it a shot would certainly provide a useful datapoint,
> esp. for a simple mainline setup of gerrit (which i'm assuming his is,
> unlike yours).

I do believe that the jgit community has the expertise to fix any issues
in this area if he runs into any, so I certainly would encourage people
to try alternates with Gerrit,

-Martin


Oswald Buddenhagen

Jun 27, 2022, 4:53:24 AM
to repo-d...@googlegroups.com
On Sun, Jun 26, 2022 at 01:52:48PM -0600, Martin Fick wrote:
>On 6/26/22 1:53 AM, Oswald Buddenhagen wrote:
>> On Sat, Jun 25, 2022 at 06:17:56PM -0600, Martin Fick wrote:
>>> There are many ways to setup alternates, and I suspect that it takes
>>> some expertise to come up with a strategy that matches your use case,
>>> in particular for gc.
>>>
>> i like to think that i have that expertise. ;)
>> also, i think i provided sufficient evidence that the setup was
>> working at a pure git level.
>
>I don't even see any assertions of this in the bug report, let alone
>evidence of any operations that worked with pure git, did I miss them?
>The assertion that it works in cgit and jgit are about different
>operations, browsing (what does that mean for jgit?) than what is
>claimed to not work in this bug report using Gerrit, pull and pushing?
>
the fact that cgit and jgit can browse the repos (by which i presumably
meant that equivalents of `git log -p` work, but unfortunately you're
seven years late with that question) and that `git repack -a` works
fine, are sufficient evidence for the allegedly missing objects being in
fact present, and therefore the thrown exceptions being bogus.
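
A rough sketch of that kind of pure-git check, on a toy pair of repos where superset.git borrows objects from subset.git through alternates. The repo names, the single test commit and the throwaway identity are assumptions for illustration; git fsck is added here as an extra connectivity check and is not something the original report mentions.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Throwaway source repo with a single commit.
git init --quiet "$work/src"
echo "hello" > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet -m "add file"

# subset.git holds the objects; superset.git borrows them.
# --shared makes git write objects/info/alternates in the new clone.
git clone --quiet --bare "$work/src" "$work/subset.git"
git clone --quiet --bare --shared "$work/subset.git" "$work/superset.git"

# Show where superset.git looks for borrowed objects.
cat "$work/superset.git/objects/info/alternates"

# Walking history with patches touches commits, trees and blobs, so it
# fails if any object reachable from HEAD is missing; fsck verifies the
# object graph as a whole.
if git --git-dir="$work/superset.git" log -p >/dev/null 2>&1 &&
   git --git-dir="$work/superset.git" fsck >/dev/null 2>&1; then
    status=ok
else
    status=broken
fi
echo "object check: $status"
```

If checks like these pass on the real repos, a "missing object" error from the server points at how the server resolves objects rather than at the repository contents.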

>There are no hints as to how the alternates are setup,
>
"some repos are (restricted) supersets of other repos". as nothing else
is said, it's reasonable to assume that no other repos are involved.
from here, the only reasonable conclusion is that the superset repos use
the subset repos as alternates.

>or what is trying to be achieved with the alternates setup.
>
the report says "to avoid duplicating all the shared data, i tried to
use alternates". also, why would that even matter?

>It sounds like you
>believe you had the expertise to set this up right, and you may have.
>Unfortunately, that was not very well communicated, and it was not
>conveyed to me when I read that bug report yesterday.
>
what you're saying here is that you are approaching reports with the
presumption that the reporter is clueless. you may want to rethink that.

>So let me rephrase things without using the word "expertise". I do
>strongly suggest that you be willing to file a comprehensive clear bug
>report, and to follow through with good questions, if you do run into
>issues if you want to use alternates (or any tool for that matter). :)
>
i'm positive that if anyone from the team had shown *any* interest at
that time, we would have been able to fill in any missing pieces. now,
your "instructions" sound just a wee bit cynical.

>> the only hard requirement is that the alternate repo does not accept
>> non-fast-forwards of any kind (incl. ref deletions), so no objects
>> disappear. (on a non-bare repo, amends, rebases, and reflogs (incl. 
>> stashes) would complicate matters more.)
>
>I have no idea what the statement above means, or is about?
>
you made a statement about git alternates being easy to get wrong, i
documented what getting it right actually means.

>e4714a2a5faa2d5cc8c9b129f96296dc2d6d26f8 Prevent alternates loop
>
does it seem like it might match the backtraces i posted?

anyway, the question about jgit might have been wrong to start with - as
noted, jgit alone was perfectly capable of working with the repo. it
must have had something to do with how gerrit used it.
