Would a large number of refs in refs/changes/* impact git fetch/clone?

Musab Shakeel

Sep 5, 2024, 5:32:52 PM
to Repo and Gerrit Discussion
Fact: refs/changes/* stick around forever. They are not removed, moved or "archived" [1].

Questions
1. Deleting a Gerrit change (by an admin, via the UI for example) and then a subsequent GC run should delete the associated refs/changes/* for that particular change, right?

2. Let's say refs/changes/* has millions of refs. Would this be expected to have any impact on `git clone/fetch`, assuming we're not specifying a refspec as part of the `fetch` and are thus not fetching refs/changes explicitly? 

For 2), it feels like the answer should be "no" because we're not actually fetching the change refs (refs/changes/*), but one thought comes to mind:
- During upload-pack, the first step of the "negotiation" phase is the "ref advertisement", where Gerrit sends back all the refs it has for the repo, including refs/changes/*. Wouldn't this add to clone/fetch overhead?

I asked this question in GerritMeets today, and the response covered a different angle:
- If you have a bloated refs/changes/* space, make sure you're using JGit GC and not CGit GC, because the latter does not recognize the special meaning of refs/changes/* refs; JGit GC is optimized for this because it is Gerrit-aware.
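
Before weighing any of this, it can help to measure how many refs actually live under refs/changes/* compared with heads and tags. The following is a minimal sketch, not an official Gerrit tool: `summarize_refs` and the sample ref names are hypothetical, and it assumes the usual Gerrit layout refs/changes/<last two digits>/<change number>/<patch set or "meta">.

```python
from collections import Counter


def summarize_refs(ref_names):
    """Tally ref names by top-level namespace and count distinct Gerrit changes."""
    by_namespace = Counter()
    changes = set()
    for name in ref_names:
        parts = name.strip().split("/")
        if len(parts) < 3 or parts[0] != "refs":
            continue  # skip HEAD, blank lines, and anything outside refs/
        by_namespace["refs/" + parts[1]] += 1
        # Assumed layout: refs/changes/<NN>/<change number>/<patch set or "meta">
        if parts[1] == "changes" and len(parts) == 5:
            changes.add(parts[3])
    return by_namespace, len(changes)


# Hypothetical sample; real input would come from the repository itself.
sample = [
    "refs/heads/master",
    "refs/tags/v1.0",
    "refs/changes/34/1234/1",
    "refs/changes/34/1234/2",
    "refs/changes/34/1234/meta",
    "refs/changes/35/1235/1",
]
counts, distinct_changes = summarize_refs(sample)
print(dict(counts), distinct_changes)
```

On a real server the list of names could be produced by running `git for-each-ref --format='%(refname)'` inside the bare repository and passing the output lines to the function.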

Musab Shakeel

Sep 5, 2024, 5:43:24 PM
to Repo and Gerrit Discussion
I guess another piece to consider is the actual objects themselves. The refs/changes/* space for the repo in question is so big partly because there are tens of thousands of abandoned / unmerged-and-will-never-merge Gerrit changes that need to be cleaned up. Once these are deleted, not only will refs/changes/* reduce in number, but the underlying objects will presumably reduce in number as well. This will translate to leaner packfiles, and this in turn might improve clone/fetch times considerably?
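
For the cleanup itself, the abandoned changes could be listed and deleted through Gerrit's REST API. The sketch below only builds the request URLs and decodes a response body; it sends nothing. The host name, project name, and helper functions are hypothetical, and it assumes the documented `GET /changes/?q=...` query endpoint, the `DELETE /changes/{change-id}` endpoint with a `project~number` identifier, and the `)]}'` prefix that Gerrit puts in front of JSON responses. Deleting changes requires the appropriate permission on the server.

```python
import json
from urllib.parse import quote

XSSI_PREFIX = ")]}'"  # Gerrit prepends this line to JSON responses


def parse_gerrit_json(body):
    """Strip the anti-XSSI prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def abandoned_query_url(base_url, project, limit=100):
    """URL for an authenticated query listing abandoned changes of one project."""
    query = quote(f"project:{project} status:abandoned", safe="")
    return f"{base_url}/a/changes/?q={query}&n={limit}"


def delete_change_url(base_url, project, change_number):
    """URL that would receive an authenticated HTTP DELETE for one change."""
    return f"{base_url}/a/changes/{quote(project, safe='')}~{change_number}"


# Hypothetical server, project, and response body.
base = "https://gerrit.example.com"
body = ")]}'\n" + '[{"_number": 1234, "status": "ABANDONED"}]'
for change in parse_gerrit_json(body):
    print(delete_change_url(base, "tools/demo", change["_number"]))
```

Trying this on a handful of changes in a test project first, and then checking the ref count again after a GC run, would be a cautious way to confirm the behaviour before a bulk cleanup.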

Luca Milanesio

Sep 5, 2024, 6:03:51 PM
to Repo and Gerrit Discussion, Luca Milanesio

On 5 Sep 2024, at 22:43, 'Musab Shakeel' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

I guess another piece to consider is the actual objects themselves. The refs/changes/* space for the repo in question is so big partly because there are tens of thousands of abandoned / unmerged-and-will-never-merge Gerrit changes that need to be cleaned up.

You may consider using the git-refs-filter module which would automatically hide them at Git level, see [1].

Once these are deleted, not only will refs/changes/* reduce in number, but the underlying objects will presumably reduce in number as well. This will translate to leaner packfiles, and this in turn might improve clone/fetch times considerably?

Not really, because a clone is not affected by refs other than heads and tags.


On Thursday, September 5, 2024 at 5:32:52 PM UTC-4 Musab Shakeel wrote:
Fact: refs/changes/* stick around forever. They are not removed, moved or "archived" [1].

Questions
1. Deleting a Gerrit change (by an admin, via the UI for example) and then a subsequent GC run should delete the associated refs/changes/* for that particular change, right?

Yes, assuming nothing in the reflog still references it.

2. Let's say refs/changes/* has millions of refs. Would this be expected to have any impact on `git clone/fetch`, assuming we're not specifying a refspec as part of the `fetch` and are thus not fetching refs/changes explicitly? 

No, it shouldn’t … unless you use git gc, which may “mix up” objects of refs/changes/* with objects of refs/heads/* in the same packfiles.
If you use JGit's gc, they are placed in separate packfiles.

For 2), it feels like the answer should be "no" because we're not actually fetching the change refs (refs/changes/*), but one thought comes to mind:
- During upload-pack, the first step of the "negotiation" phase is the "ref advertisement", where Gerrit sends back all the refs it has for the repo, including refs/changes/*. Wouldn't this add to clone/fetch overhead?

No. In Git protocol v2 the server does not advertise every ref up front; the client asks only for the refs it is interested in, and the negotiation covers just those filtered refs.
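
To make the protocol v2 point concrete, here is a minimal sketch of the `ls-refs` request a v2 client sends in place of receiving a full ref advertisement. It follows the pkt-line framing and the `ref-prefix` argument from the Git protocol v2 documentation; the helper names are hypothetical, and a real client also sends capability lines and further arguments (such as `peel` and `symrefs`) that are omitted here.

```python
DELIM_PKT = b"0001"  # separates the command/capabilities from its arguments
FLUSH_PKT = b"0000"  # ends the request


def pkt_line(payload: str) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then the data."""
    data = payload.encode("utf-8")
    return f"{len(data) + 4:04x}".encode("ascii") + data


def ls_refs_request(prefixes):
    """Build a simplified protocol v2 ls-refs request limited to the given prefixes."""
    parts = [pkt_line("command=ls-refs\n"), DELIM_PKT]
    parts += [pkt_line(f"ref-prefix {prefix}\n") for prefix in prefixes]
    parts.append(FLUSH_PKT)
    return b"".join(parts)


# Only heads and tags are requested here; refs/changes/* is never asked for.
request = ls_refs_request(["refs/heads/", "refs/tags/"])
print(request)
```

The server is expected to list only refs matching the requested prefixes, so the size of refs/changes/* should not inflate the listing for such a request. With the older protocol versions the server advertises its refs up front, which is where the concern about millions of change refs would apply. Running `GIT_TRACE_PACKET=1 git ls-remote <url>` against a server shows which of the two exchanges is actually taking place.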

I asked this question in GerritMeets today, the response covered a different angle. The response was:
- If you have a bloated refs/changes/* space, make sure you're using JGit GC and not CGit GC, because the latter does not recognize the special meaning of refs/changes/* refs; JGit GC is optimized for this because it is Gerrit-aware.

Yes, that is the answer I gave. It is based on an analysis of what the JGit gc code does and how differently it behaves from git gc.
Real-life testing on large repos also confirmed it.

HTH

Luca.



Luca Milanesio

Sep 5, 2024, 6:04:22 PM
to Repo and Gerrit Discussion, Luca Milanesio

On 5 Sep 2024, at 23:03, Luca Milanesio <luca.mi...@gmail.com> wrote:




You may consider using the git-refs-filter module which would automatically hide them at Git level, see [1].
