Good question; I have some ideas for you, but I have to tell you that
this won't be a straightforward or one-size-fits-all solution.
First of all, git gc never prunes objects that are still reachable
from a ref. If only experimental/feature (now stale) branches contain
the large objects, deleting those branches and then running gc helps.
But if most of the large objects are reachable from the 'main'
branches/tags as well, you'll have to take other measures.
The simplest option, client only:
As a client cloning the project, use *shallow* clones/fetches [1]. The
server still has to operate on the full repository, so this doesn't
reduce the size of the repository there, but clients cloning it get a
truncated version of the history. Unfortunately, while Gerrit supports
shallow clones using --depth, it doesn't yet support
--shallow-since=<date>/--shallow-exclude=<rev>. See Issue 11564 [2],
which I just filed as a feature request.
Also, shallow clones only really make sense for single-branch clones,
which is the default when using --depth. I.e. specifying
--no-single-branch with --depth=100 still shows me full history on
the other (non-HEAD) branches.
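To make the shallow-clone behaviour concrete, here is a minimal sketch
you can run locally. It builds a throwaway repository as a stand-in for
the real project (the temporary paths and the five-commit history are
made up for illustration) and clones it with --depth. Against a real
Gerrit server you would pass the project's clone URL instead of the
file:// path.

```shell
set -eu
work=$(mktemp -d)

# Throwaway "server" repository with five empty commits (illustration only).
git init -q "$work/origin"
for i in 1 2 3 4 5; do
  git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $i"
done

# Shallow clone: only the 2 most recent commits are transferred.
# file:// makes git use its regular transport, so --depth is honoured.
git clone -q --depth=2 "file://$work/origin" "$work/shallow"
git -C "$work/shallow" rev-list --count HEAD

# Deepen the existing shallow clone later if more history is needed.
git -C "$work/shallow" fetch -q --depth=4
git -C "$work/shallow" rev-list --count HEAD
```

The two rev-list calls print how many commits are visible before and
after deepening, which is a quick way to check what a client actually
received.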
Fixing this on the Gerrit server is much harder. Your server probably
has references to many of those commits if you've used Gerrit changes
over all those years. If I am correct, removing the history will
result in inconsistent data (in either ReviewDb or NoteDb) and cause
more problems than you want to deal with.
A nice solution for that doesn't really exist, but I have a few ideas:
Set up replication, and make it replicate only the currently active
branches and a pattern of tags. Before doing so, make sure you have
manually created a shallow clone. Then have users clone from the
replica instead of directly from Gerrit. Clones should then only
include the active branches, and if needed users can add the original
Gerrit as a remote in their git client to fetch the full history.
This is basically a more user-friendly setup of the first option, but
the major downsides are the complicated replication setup and the
Gerrit server still having the full history.
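As a rough sketch of what the restricted replication could look like:
the commands below write a replication.config using git config,
assuming the replication plugin's remote.<name>.url and
remote.<name>.push settings. The remote name 'replica', the host, the
path and the branch/tag patterns are all hypothetical; check the plugin
documentation for your Gerrit version before relying on the refspec
patterns.

```shell
set -eu
cfg=$(mktemp -d)/replication.config

# Hypothetical replica host. ${name} is a placeholder the plugin replaces
# with the project name, so it is single-quoted to keep the shell away.
git config -f "$cfg" remote.replica.url \
  'git@replica.example.com:/srv/git/${name}.git'

# Push only the active branches and release tags (example patterns).
git config -f "$cfg" --add remote.replica.push \
  '+refs/heads/master:refs/heads/master'
git config -f "$cfg" --add remote.replica.push \
  '+refs/heads/release/*:refs/heads/release/*'
git config -f "$cfg" --add remote.replica.push \
  '+refs/tags/release/*:refs/tags/release/*'

# Show the resulting file.
cat "$cfg"
```

On a real server the file would live in the site's etc/ directory
rather than in a temporary one.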
A third option could work if you don't have that many commits and
simply don't care about most of the large files anymore (e.g.
images). Try setting up a local clone and filtering the history with a
git filter-branch [3] script (deleting the bloat from each commit)
until you're satisfied. As this rewrites history and thus changes the
commit hashes, you'll lose the references to the original revisions,
and signatures will no longer validate. Moreover, you'll have to push
to a new project on Gerrit and lose all the review metadata as well.
You may want to have the filter-branch run also append the original
commit hash to each commit message, and then push to
refs/for/<branch> with the 'submit' push option [4] (perhaps also with
'skip-validation') so that Gerrit creates 'merged' changes and
includes all data in the index, making the history at least
searchable (but you won't have the original review data any longer).
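The filter-branch idea can be sketched as follows. This is a minimal,
self-contained example on a throwaway repository, not a script for
your real history: the file name big.bin and the footer name
'Original-Commit' are my own placeholders. It removes one unwanted
file from every commit and appends the pre-rewrite hash (available to
the filters as $GIT_COMMIT) to each commit message.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip the warning pause on newer git
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

# Throwaway history: one source file plus one unwanted large binary.
g init -q
printf 'int main(void) { return 0; }\n' > "$repo/main.c"
head -c 1048576 /dev/zero > "$repo/big.bin"
g add main.c big.bin
g commit -q -m "add code and a large binary"
printf '/* tweak */\n' >> "$repo/main.c"
g commit -q -am "change code"

# Rewrite every commit: drop big.bin from the index and record the old hash
# in a footer line of the commit message.
g filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
  --msg-filter 'cat; printf "\nOriginal-Commit: %s\n" "$GIT_COMMIT"' \
  -- --all >/dev/null

# Inspect the rewritten history.
g log --format='%h %s%n%b'
```

On a real project you would run this in a fresh clone and only then
push the result to the new Gerrit project, for example with something
like

    git push <remote> HEAD:refs/for/<branch>%submit

as described in [4].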
An alternative to git filter-branch is Copybara [5]. It allows you to
set up transformations (in your case, to reduce the size, ignoring
some paths via origin_files/destination_files [6]) and it supports
adding the original commit revision to the commit message (the
'GitOrigin-RevId' footer, via the set_rev_id workflow setting). Then,
when you come across a reference to an old git commit SHA-1 somewhere,
you should be able to find the new change using Gerrit search. It's
not ideal, but it could work. The hardest part is handling the
plethora of branches/tags here...
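For orientation, a Copybara workflow along those lines might look
roughly like the config written by the snippet below. Treat it as an
untested sketch: the URLs, the branch name, the excluded path and the
default author are placeholders, and only a single branch is covered.
The snippet merely writes the file; running Copybara itself is left
out, and the first migration into an empty destination usually needs
extra command-line flags (see the Copybara documentation).

```shell
set -eu
dir=$(mktemp -d)

# Write a hypothetical copy.bara.sky (all values below are placeholders).
cat > "$dir/copy.bara.sky" <<'EOF'
core.workflow(
    name = "default",
    origin = git.origin(
        url = "https://gerrit.example.com/old-project",
        ref = "master",
    ),
    destination = git.destination(
        url = "https://gerrit.example.com/new-project",
        push = "master",
    ),
    # Copy everything except the paths that hold the bloat.
    origin_files = glob(["**"], exclude = ["assets/images/**"]),
    authoring = authoring.pass_thru("Migration <migration@example.com>"),
    # One destination commit per origin commit.
    mode = "ITERATIVE",
    # Adds the GitOrigin-RevId footer to each migrated commit.
    set_rev_id = True,
)
EOF

cat "$dir/copy.bara.sky"
```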
For the future, consider limiting the size of the Git objects the
server accepts on push, to avoid taking in more bloat in your
repositories ("Maximum Git object size limit" project setting).
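If you prefer to set the limit outside the web UI: as far as I know
the project setting corresponds to receive.maxObjectSizeLimit in the
project's project.config (on refs/meta/config). A small sketch,
writing the value into a scratch file; the 10m value is only an
example.

```shell
set -eu
cfg=$(mktemp -d)/project.config

# Reject any pushed object larger than 10 MiB (example value).
git config -f "$cfg" receive.maxObjectSizeLimit 10m

# Read the value back to confirm what was written.
git config -f "$cfg" receive.maxObjectSizeLimit
```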
HTH
Gert
[1]: https://git-scm.com/docs/git-clone/2.23.0#Documentation/git-clone.txt---depthltdepthgt
[2]: https://bugs.chromium.org/p/gerrit/issues/detail?id=11564
[3]: https://git-scm.com/docs/git-filter-branch/2.23.0
[4]: https://gerrit-documentation.storage.googleapis.com/Documentation/3.0.2/user-upload.html#auto_merge
[5]: https://github.com/google/copybara
[6]: https://github.com/google/copybara/blob/15f519039076ff49562ee96420c970356fea48b6/docs/reference.md#coreworkflow