Git clone of a large repo is very slow

Gavin

May 31, 2025, 1:53:39 PM
to repo-d...@googlegroups.com, Gavin
Hi team,

When I clone a remote repository, git clone stays in the following stages for more than an hour:

remote: counting objects : (*/2000000)
remote: finding sources:  (*/14000000)

The number * increases by about 4,000 per second on average.

How can I speed up the git clone? Ideally, I would like to tune this through the Gerrit configuration file.

Gerrit version: v3.11

Thanks

Matthias Sohn

May 31, 2025, 4:04:49 PM
to Gavin, repo-d...@googlegroups.com
What's the output of git count-objects -vH on the remote repository?
Does the remote repository have a bitmap index?
Is it served by Gerrit?

-Matthias

Gavin

May 31, 2025, 9:05:50 PM
to Matthias Sohn, repo-d...@googlegroups.com
Yes, it is Gerrit. I have two questions:
1. Should I run git count-objects in the /var/gerrit/git/<repo.git> directory on the Gerrit server?
2. How do I check whether the repo has a bitmap index? My understanding is that this is a default option in Gerrit.

Thanks 



Gavin

May 31, 2025, 9:31:38 PM
to Matthias Sohn, repo-d...@googlegroups.com

repo.git]# git count-objects -vH
count: 124171
size: 507.07 MiB
in-pack: 15112959
packs: 1072
size-pack: 5.63 GiB
prune-packable: 8221
garbage: 0
size-garbage: 0



Luca Milanesio

Jun 1, 2025, 3:27:29 AM
to repo-d...@googlegroups.com, Luca Milanesio
Hi Gavin,

You have 1072 packs and 15M objects to search for in them.

The “finding sources” phase is where the objects are looked at for their best representation across the 1072 packs (search-for-reuse), and in your case the complexity of the operation is roughly 15M x 1072 = 16 billion operations.
I am not surprised that this phase is slow.
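
As a rough check of the arithmetic above, here is a minimal Python sketch. It uses only the figures posted in this thread; the function names are hypothetical, and it assumes the 4,000 per second rate applies to the whole "finding sources" counter.

```python
def search_for_reuse_upper_bound(objects_in_pack, packs):
    """Worst case: every object is looked up in every pack."""
    return objects_in_pack * packs


def phase_minutes(total, rate_per_second):
    """Minutes needed for a progress counter to reach `total` at a steady rate."""
    return total / rate_per_second / 60


# "in-pack" and "packs" values from the git count-objects output in this thread
print(f"{search_for_reuse_upper_bound(15_112_959, 1072):,}")  # 16,201,092,048

# "finding sources" total and observed rate from the original question
print(f"{phase_minutes(14_000_000, 4_000):.1f} minutes")  # 58.3 minutes
```

The second figure is in line with the reported wait of more than an hour once the "counting objects" phase is added on top.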

Please look at the research we’ve done on how to speed up the “search-for-reuse” phase at [1].

At this point, the only thing you can do is:

a) Shut down Gerrit
b) Perform a full GC of the repository with prune=now (use JGit and *not* C Git, as the packfile organisation produced by the C Git implementation won’t be ideal for Gerrit)
c) Start Gerrit

After the full GC, the number of packfiles should be *exactly* 2 and you should have *exactly* 1 bitmap file.
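
One way to verify that outcome is to count the files in the repository's objects/pack/ directory. The following is a minimal Python sketch, not a Gerrit or JGit tool; the function names are hypothetical, and it assumes packfiles end in .pack and bitmap files end in .bitmap inside objects/pack/.

```python
from pathlib import Path


def pack_summary(repo_dir):
    """Count packfiles and bitmap files under <repo_dir>/objects/pack."""
    pack_dir = Path(repo_dir) / "objects" / "pack"
    packs = list(pack_dir.glob("*.pack"))
    bitmaps = list(pack_dir.glob("*.bitmap"))
    return {"packs": len(packs), "bitmaps": len(bitmaps)}


def looks_fully_gced(summary):
    """True when the counts match the state described above: 2 packs, 1 bitmap."""
    return summary["packs"] == 2 and summary["bitmaps"] == 1
```

For example, `looks_fully_gced(pack_summary(repo_dir))` with `repo_dir` set to the bare repository path on the Gerrit server should return True after a successful full GC.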

HTH

Luca.



Matthias Sohn

Jun 1, 2025, 3:16:26 PM
to Gavin, repo-d...@googlegroups.com
On Sun, Jun 1, 2025 at 3:31 AM Gavin <gavin....@gmail.com> wrote:

> repo.git]# git count-objects -vH
> count: 124171

The repo has 124k loose objects

> size: 507.07 MiB
> in-pack: 15112959
> packs: 1072

and a ton of pack files.

> size-pack: 5.63 GiB
> prune-packable: 8221
> garbage: 0
> size-garbage: 0


You need to schedule running jgit gc or git gc on all repositories on a regular basis.
We run it on most repositories once a day and on busy repositories every couple of hours
and on All-Users every 30 minutes.
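
The schedule described above can be sketched as a small decision function. This is an illustration only, not a Gerrit feature: the function names and the set of busy repositories are hypothetical, and 2 hours is an assumed value for "every couple of hours".

```python
from datetime import datetime, timedelta

DEFAULT_INTERVAL = timedelta(days=1)        # most repositories: once a day
BUSY_INTERVAL = timedelta(hours=2)          # assumption for "every couple of hours"
ALL_USERS_INTERVAL = timedelta(minutes=30)  # All-Users: every 30 minutes


def gc_interval(repo_name, busy_repos):
    """Pick the GC interval for a repository by its name and activity."""
    if repo_name == "All-Users":
        return ALL_USERS_INTERVAL
    if repo_name in busy_repos:
        return BUSY_INTERVAL
    return DEFAULT_INTERVAL


def gc_due(repo_name, last_gc, now, busy_repos=frozenset()):
    """True when at least one full interval has passed since the last GC."""
    return now - last_gc >= gc_interval(repo_name, busy_repos)
```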

If a repository has more than a couple of thousand loose objects and/or more than a few dozen packfiles,
performance will degrade, since JGit runs a binary search on the indexes of all pack files to look up the objects
it wants to load and, in addition, checks for loose objects with the id of the object it's looking for.
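
A minimal Python sketch of that rule of thumb, applied to the text printed by git count-objects -v. The thresholds are assumptions standing in for "a couple of thousand" and "a few dozen", and the function names are hypothetical.

```python
MAX_LOOSE_OBJECTS = 2000  # assumed value for "a couple of thousand"
MAX_PACKS = 50            # assumed value for "a few dozen"


def parse_count_objects(text):
    """Turn 'key: value' lines from git count-objects -v into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def needs_gc(stats):
    """True when loose objects ('count') or packfiles ('packs') exceed the thresholds."""
    return int(stats["count"]) > MAX_LOOSE_OBJECTS or int(stats["packs"]) > MAX_PACKS


# Output posted earlier in this thread
sample = """count: 124171
size: 507.07 MiB
in-pack: 15112959
packs: 1072
size-pack: 5.63 GiB
prune-packable: 8221
garbage: 0
size-garbage: 0"""

print(needs_gc(parse_count_objects(sample)))  # True
```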

The Git bitmap index uses the file extension .bitmap and is stored in the objects/pack/ directory.

Luca Milanesio

Jun 1, 2025, 3:26:57 PM
to repo-d...@googlegroups.com, Luca Milanesio

On 1 Jun 2025, at 21:15, Matthias Sohn <matthi...@gmail.com> wrote:

> You need to schedule running jgit gc or git gc on all repositories on a regular basis.

Our tests show that jgit gc provides *much better results* than git gc, for three reasons:
1) jgit gc separates the heads and non-heads into two different packfiles, leaving a single packfile ready to be sent ‘as-is’ for serving clones
2) jgit gc can preserve the packfiles to be pruned in a separate /preserved directory, therefore making it compatible with incoming traffic
3) jgit gc produces a much higher quality bitmap, and also allows you to choose which ref prefixes to include/exclude

> We run it on most repositories once a day and on busy repositories every couple of hours
> and on All-Users every 30 minutes.

I reckon this repo hasn’t been GCed for a while :-O

> If a repository has more than a couple of thousand loose objects and/or more than a few dozen packfiles,
> performance will degrade, since JGit runs a binary search on the indexes of all pack files to look up the objects
> it wants to load and, in addition, checks for loose objects with the id of the object it's looking for.

Also, the filesystem plays a big role. Are the repos on NFS or on a local filesystem?
NFS is very slow at listing a large number of files in a directory, so having more than 100 packfiles means that the repo is basically dead.

> The Git bitmap index uses the file extension .bitmap and is stored in the objects/pack/ directory.

Also consider using the git-repo-metrics plugin [1], which keeps track of the main metrics of the repo so that you can set a more intelligent GC policy and get alerts when some of them are going out of control.

HTH

Luca.


