Checkout large repository from gerrit is slow.

Mtip

Nov 8, 2016, 1:45:30 AM
to Repo and Gerrit Discussion
Hi,

    I have one large repository, over 12 GB in size.
    If only one task checks it out, it takes 20 minutes.

    But when more than 4 tasks check out at the same time, it becomes very slow, and I do not know how long it takes to succeed.

    Does anyone have a suggestion for improving this?

clone info:

remote: Counting objects: 6542695, done
remote: Finding sources:  17% (1353788/7962985)


cpu info :
    

Matthias Sohn

Nov 8, 2016, 4:28:19 AM
to Mtip, Repo and Gerrit Discussion
On Tue, Nov 8, 2016 at 7:45 AM, Mtip <stmao...@gmail.com> wrote:
> I have one large repository, over 12 GB in size.
> If only one task checks it out, it takes 20 minutes.

Do you mean clone? Checkout is a local operation in git.

> But when more than 4 tasks check out at the same time, it becomes very slow,
> and I do not know how long it takes to succeed.
>
> Does anyone have a suggestion for improving this?
>
> clone info:
>
> remote: Counting objects: 6542695, done
> remote: Finding sources:  17% (1353788/7962985)

How large is the server you are running Gerrit on (cores, memory, etc.)?
What is the available network bandwidth between client and server?
Do you regularly run git gc or gerrit gc on all repositories?
What does your Gerrit server's configuration look like?

--
--
To unsubscribe, email repo-discuss+unsubscribe@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mtip

Nov 8, 2016, 7:26:35 AM
to Repo and Gerrit Discussion, stmao...@gmail.com
Hi,
   I mean clone to my local machine.
   My Gerrit server:
         CPU: E5-2650, 20 cores (2 chips, 10 cores/chip, 2 threads/core), 40 threads in total
         Memory: 64 GB
         Network bandwidth: 100M
         git gc runs every week.
         I use the default configuration for my Gerrit server.
    

Matthias Sohn

Nov 8, 2016, 7:43:55 AM
to Mtip, Repo and Gerrit Discussion
It looks like the clone speed you observe is dominated by the available network bandwidth.
Downloading 12 GB over a 100 Mbit/s link needs at least 16 min:

12 *10^9 Byte * 8 bit/Byte / 10^8 bit/sec = 960 sec = 16 min  
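
The same estimate can be sketched in shell. The sizes are the ones from this thread; the even split of bandwidth between parallel clones is a simplifying assumption, not a measurement, and protocol overhead is ignored:

```shell
# Lower bound on transfer time for one clone over the link.
repo_bytes=$((12 * 1000 * 1000 * 1000))    # 12 GB repository
link_bits_per_sec=$((100 * 1000 * 1000))   # 100 Mbit/s link

seconds=$((repo_bytes * 8 / link_bits_per_sec))
minutes=$((seconds / 60))
echo "one clone: ${seconds} s (about ${minutes} min)"

# Assumption: N parallel clones share the link evenly,
# so each one takes roughly N times as long.
clients=4
parallel_minutes=$((minutes * clients))
echo "${clients} parallel clones: about ${parallel_minutes} min each"
```

With 4 clients this already gives more than an hour per clone from bandwidth alone, before any server-side cost is counted.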

With the default configuration you are running with a pretty small JGit object cache.
You should consider increasing core.packedGitLimit (the JGit object cache),
ideally to a size big enough to cache all your busy repositories in memory.
Note that you also need to increase the maximum JVM heap size (container.heapLimit) accordingly,
as the JGit object cache is allocated on the JVM's heap.
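
A minimal sketch of those two settings, written with git config against a scratch file so it can be tried safely. The sizes (16g cache, 32g heap) are illustrative assumptions for a 64 GB host, not values recommended in this thread. On a real server the file would be etc/gerrit.config under the Gerrit site directory, and Gerrit has to be restarted for the change to take effect:

```shell
# Scratch file standing in for <gerrit-site>/etc/gerrit.config (hypothetical location).
cfg="$(mktemp)"

# JGit object cache size. The value is an assumption; size it to your busy repositories.
git config -f "$cfg" core.packedGitLimit 16g

# Maximum JVM heap. It must be larger than the cache, because the cache lives on the heap.
git config -f "$cfg" container.heapLimit 32g

# Show the resulting config sections.
cat "$cfg"
```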

git gc should be configured to generate bitmap indexes which speed up cloning
and reduce server load.

-Matthias


Mtip

Nov 8, 2016, 8:28:18 AM
to Repo and Gerrit Discussion, stmao...@gmail.com
Thanks for your suggestion.

I will try it on my Gerrit server.

I think it will solve some of our current problems.

Thanks!

vista...@gmail.com

Sep 3, 2018, 4:43:51 PM
to Repo and Gerrit Discussion
Hi Matthias,

    How do I configure git gc to generate bitmap indexes?

Thank you!

Sven Selberg

Sep 4, 2018, 2:13:33 AM
to vista...@gmail.com, Repo and Gerrit Discussion

git repack -b


"
       -b, --write-bitmap-index
           Write a reachability bitmap index as part of the repack. This only makes sense when used with -a or -A, as the bitmaps must be able to refer to all reachable objects. This option overrides the setting of repack.writeBitmaps. This option has no effect if multiple packfiles are created.
"



Matthias Sohn

Sep 4, 2018, 5:19:00 AM
to Sven Selberg, vista...@gmail.com, Repo and Gerrit Discussion
I think JGit gc always writes a bitmap index for each packfile.

Matthias Sohn

Sep 4, 2018, 5:40:10 AM
to Sven Selberg, vista...@gmail.com, Repo and Gerrit Discussion
On Tue, Sep 4, 2018 at 11:18 AM Matthias Sohn <matthi...@gmail.com> wrote:
> I think JGit gc always writes a bitmap index for each packfile.

No, I was wrong: in JGit, writing bitmap indexes is controlled by the boolean option pack.buildbitmaps [1],
which is true by default. This option is equivalent to native git's option repack.writeBitmaps [2] which defaults to false.
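
For repositories that are repacked with native git, a sketch of turning bitmaps on could look like this. It runs against a throwaway repository (the paths and the demo identity are made up for the example); on a server you would point -C at the bare repository instead:

```shell
# Throwaway repository so the commands can be tried without touching a server.
repo="$(mktemp -d)/demo"
git init -q "$repo"
echo "hello" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"

# Make bitmap writing the default for later repack and gc runs in this repository.
git -C "$repo" config repack.writeBitmaps true

# Repack everything into a single pack (-a -d) and write the bitmap index (-b).
git -C "$repo" repack -a -d -b -q

# A .bitmap file next to the .pack and .idx files confirms the index was written.
ls "$repo/.git/objects/pack/"
```

The -a is what matters here: as the man page quoted earlier says, the bitmap is only written when all reachable objects end up in one pack.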

euphxenos

Sep 4, 2018, 6:45:11 PM
to Repo and Gerrit Discussion
If you're repeatedly cloning a large repo to the same client (for example in build automation, where you're likely to clone and delete the same repo repeatedly), you might also want to consider using a reference repo on the client side for your clone operations.  On that client, clone that repo with the "--mirror" option added to get all the refs.  Put that repo in a location and keep it on that system (don't check branches out there, and don't delete it).  That's your reference repo.  You can update it to keep it current by doing a "git remote update" in it.  Then when you want to clone and use a repo, do your git clone with "--reference <path to that reference repo>".  This will then only pull down refs from the server over the network if they aren't already in your reference repo.  If you update your reference repo before each use, you will then only download each new ref once over the network.

I have a project where we have a 20gig repo, and the build automation will frequently clone it to 48 clients simultaneously at the start of a multi-way build with lots of different targets building in parallel.  Using reference repos like this, that barely causes a flicker in network traffic (as opposed to almost a terabyte of data moving around the network at the start of a build if we didn't).  It's working very well for us.
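
A sketch of that workflow, using a local bare repository as a stand-in for the Gerrit server so it runs without network access. All paths and the demo identity are hypothetical; in real use the server URL would be your ssh:// or https:// clone URL:

```shell
work="$(mktemp -d)"

# Stand-in for the repository hosted on the Gerrit server.
git init -q "$work/seed"
echo "hello" > "$work/seed/file.txt"
git -C "$work/seed" add file.txt
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"
git clone -q --bare "$work/seed" "$work/server.git"
server_url="file://$work/server.git"

# 1. One time per client: create the long-lived reference repo with all refs.
git clone -q --mirror "$server_url" "$work/reference.git"

# 2. Before each build: bring the reference repo up to date.
git -C "$work/reference.git" remote update

# 3. Per build: clone from the server, borrowing objects from the reference repo.
git clone -q --reference "$work/reference.git" "$server_url" "$work/build"

# The build clone has a normal working tree and can be deleted after the build.
ls "$work/build"
```

The build clone depends on the reference repo for its objects, so the reference repo must stay in place for as long as any clone made from it is in use.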


--Andrew

vista...@gmail.com

Sep 5, 2018, 4:56:32 AM
to Repo and Gerrit Discussion
Thank you very much, euphxenos.

vista...@gmail.com

Sep 5, 2018, 4:56:46 AM
to Repo and Gerrit Discussion
Thank you very much, Matthias.

vista...@gmail.com

Sep 6, 2018, 10:48:07 PM
to Repo and Gerrit Discussion
Hi euphxenos,

     I tried your plan like this:
         1. Ran 'repo init' with '--mirror', then 'repo sync', to create the reference repo.
         2. On the client, ran 'repo init' with '--reference=/mirror', where '/mirror' is the reference repo.

     After that, I ran 'repo sync', and I still saw projects being fetched from the Gerrit server and fetch messages in ssh_log.

    Why?

Thank you!

vista...@gmail.com

Sep 10, 2018, 10:04:25 PM
to Repo and Gerrit Discussion
Hi Matthias,

From the jstack output, I found that one CPU core of the Gerrit server is 100% used by 'SSH-Interactive-Worker'.

What can I do about that?


euphxenos

Sep 11, 2018, 7:13:53 PM
to Repo and Gerrit Discussion
I don't use repo.  When I do this, it's "git clone --mirror <repo>", "git clone --reference /path/to/that/mirror <repo on gerrit server>", and "git remote update" from the directory that mirror is stored in on the client.  I don't know what the equivalent commands are with repo.


--Andrew

vista...@gmail.com

Sep 12, 2018, 4:13:49 AM
to Repo and Gerrit Discussion
OK.

Thank you very much, euphxenos.

Sven Selberg

Sep 12, 2018, 5:10:17 AM
to Repo and Gerrit Discussion
When working with a reference mirror, conceptually you can think about it like this:
You create the mirror by cloning the project to your mirror (let's call it mirror-repo).

When you clone the repository you are going to work in with repo init --reference:
1. git first links the object store from 'mirror-repo', so your git is now in the same state as the mirror.
2. then git asks the remote for any additional objects that are not in 'mirror-repo'.
You can verify this by checking the size of the repository you cloned with --reference: it should be significantly smaller than a repository that you cloned without it (this is because the bulk of the objects don't live in the repository; they are only linked from the mirror).

Cloning with a reference is network-wise equivalent to fetching into an already existing repository. That is probably why you still see a fetch command on the server side: git is checking whether there are additional objects that are not in the mirror.
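
The check described above can be sketched with plain git. A local bare repository stands in for the Gerrit server, and all paths and the demo identity are made up for the example:

```shell
work="$(mktemp -d)"

# Stand-in for the server-side repository, plus a mirror of it.
git init -q "$work/seed"
echo "hello" > "$work/seed/file.txt"
git -C "$work/seed" add file.txt
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"
git clone -q --bare "$work/seed" "$work/server.git"
git clone -q --mirror "file://$work/server.git" "$work/mirror-repo.git"

# One clone without the mirror, one that links its object store.
git clone -q "file://$work/server.git" "$work/plain"
git clone -q --reference "$work/mirror-repo.git" "file://$work/server.git" "$work/linked"

# The alternates file records where the borrowed objects live.
cat "$work/linked/.git/objects/info/alternates"

# Compare the object stores; count-objects also lists the alternate.
du -sk "$work/plain/.git/objects" "$work/linked/.git/objects"
git -C "$work/linked" count-objects -v
```

On a repository this small the size difference is negligible; on a multi-gigabyte repository the linked clone's object directory should be a small fraction of the plain one.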

/Sven

vista...@gmail.com

Nov 29, 2018, 8:12:06 PM
to Repo and Gerrit Discussion
Thank you very much!