How to run gerrit implementation with native git

Abhishek Patel

Sep 25, 2019, 6:56:59 PM
to Repo and Gerrit Discussion
I would like to know whether there is any performance gain from running Gerrit with native git as opposed to JGit. I am not sure what steps (configuration) are needed to actually start Gerrit with native git.
Currently using Gerrit 2.14.20, with JGit 4.7.9.x

Gert van Dijk

Sep 25, 2019, 7:07:44 PM
to Abhishek Patel, Repo and Gerrit Discussion
On Thu, Sep 26, 2019 at 12:57 AM Abhishek Patel <abh...@gmail.com> wrote:
> I would like to know whether there is any performance gain from running Gerrit with native git as opposed to JGit. I am not sure what steps (configuration) are needed to actually start Gerrit with native git.
> Currently using Gerrit 2.14.20, with JGit 4.7.9.x

I'm not quite sure what you're asking, or why you think that would be
possible with just a configuration change.

You should understand that Gerrit is an application written in Java.
It uses JGit as a library and relies heavily on it to do its work,
while at the same time allowing us to 'inject' code at the places
where Gerrit needs customizations, such as enforcement of ACLs.
'Native git' is written in C and does not offer the same level of
functionality. Even if you were able to put everything in a Java
wrapper or a JNI interface to call 'native git', this would be a huge
effort and would most likely affect the architecture as a whole.

Now, please take a step back: is there a specific (performance)
problem that you're trying to solve? Perhaps we could help you with
that. :-)
Some larger installations have put read-only slaves in place to
off-load most of the traffic to those locations and scale out that
way. IOW, what exactly is your problem? Please provide some numbers,
and tell us what you have tried.

HTH

Gert

Abhishek Patel

Sep 25, 2019, 7:16:18 PM
to Repo and Gerrit Discussion
I see. I had assumed that JGit was a pluggable part. If JGit is the only way to run Gerrit as of now, then my original question is moot. Thank you for the clarification.

Matthias Sohn

Sep 25, 2019, 8:23:54 PM
to Abhishek Patel, Repo and Gerrit Discussion
On Thu, Sep 26, 2019 at 12:57 AM Abhishek Patel <abh...@gmail.com> wrote:
> I would like to know whether there is any performance gain from running Gerrit with native git as opposed to JGit. I am not sure what steps (configuration) are needed to actually start Gerrit with native git.

You can't, as Gert wrote, since Gerrit is based on JGit and uses its Java API.

> Currently using Gerrit 2.14.20, with JGit 4.7.9.x

If you want better performance, upgrade to Gerrit 2.16 or, better, 3.0, and always use the latest bugfix release
of a given minor release (as you are doing now with 2.14.20).
If you are on 2.16 or higher, use a filesystem with high file timestamp resolution (e.g. ext4, btrfs, xfs, zfs,
which all provide 1 ns resolution; in Java this is reduced, depending on OS and Java version).
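
To get a rough feel for the timestamp precision a given filesystem exposes, something like the following Python sketch could be used. It is only a heuristic under stated assumptions: the function name `apparent_mtime_precision_ns` is made up for this illustration, it reports the finest decimal precision seen in stored modification times (not the kernel's clock tick), and it says nothing about what a particular Java version will observe.

```python
import os
import tempfile
import time


def apparent_mtime_precision_ns(directory, samples=20):
    """Return the finest power-of-ten precision (in ns) seen in file mtimes.

    Hypothetical helper: creates short-lived files in `directory` and checks
    how many trailing decimal zeros their nanosecond mtimes have.
    """
    precision = 10**9  # start at one second and refine downwards
    for _ in range(samples):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.close(fd)
            mtime_ns = os.stat(path).st_mtime_ns
        finally:
            os.unlink(path)
        # Largest power of ten (capped at 1 s) that divides this timestamp.
        step = 1
        while step < 10**9 and mtime_ns % (step * 10) == 0:
            step *= 10
        precision = min(precision, step)
        time.sleep(0.001)  # spread the samples out a little
    return precision


if __name__ == "__main__":
    # Point this at the filesystem that holds the repositories.
    print(apparent_mtime_precision_ns(tempfile.gettempdir()))
```

A result of 1 suggests nanosecond digits are being stored; 1000000 or larger suggests millisecond or coarser timestamps.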
Observe your caches using the show-caches SSH command to learn which caches need to be tuned.

Can you share your gerrit.config?

It is very important to have a sufficiently large core.packedGitLimit, which sets the size of the JGit cache that maps pack files into memory.
Ideally it matches the total size of actively used repositories on that server. Max heap size should be around twice this size.
If this cache is too small, you'll have a lot of IO to read objects from packfiles.

Do you run gc on a regular basis on all repositories?

Can you provide some numbers about the size of the repositories you are serving and an idea of the load?
Install the javamelody plugin to simplify monitoring.

-Matthias 

Sven Selberg

Sep 26, 2019, 6:24:06 AM
to Repo and Gerrit Discussion

> It is very important to have a sufficiently large core.packedGitLimit, which sets the size of the JGit cache that maps pack files into memory.
> Ideally it matches the total size of actively used repositories on that server. Max heap size should be around twice this size.

Did you mean "Ideally it matches the total size of the *packfiles* in the actively used repositories ..."?

Matthias Sohn

Sep 26, 2019, 7:05:43 AM
to Sven Selberg, Repo and Gerrit Discussion
On Thu, Sep 26, 2019 at 12:24 PM Sven Selberg <sven.s...@axis.com> wrote:

>> It is very important to have a sufficiently large core.packedGitLimit, which sets the size of the JGit cache that maps pack files into memory.
>> Ideally it matches the total size of actively used repositories on that server. Max heap size should be around twice this size.
>
> Did you mean "Ideally it matches the total size of the *packfiles* in the actively used repositories ..."?

yes 
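
The rule of thumb above (core.packedGitLimit about the total packfile size of the actively used repositories, max heap about twice that) can be turned into a quick estimate. The Python below is only a sketch under assumptions: the helper names `total_pack_bytes` and `suggest_sizes` are invented for this illustration, it counts every `*.pack` file under a root directory rather than only the "actively used" repositories, and the resulting numbers are a starting point to check against your own monitoring, not a recommendation.

```python
import os


def total_pack_bytes(repos_root):
    """Sum the sizes of all *.pack files found in 'pack' directories."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(repos_root):
        # Git keeps pack files in <repo>/objects/pack/.
        if os.path.basename(dirpath) != "pack":
            continue
        for name in filenames:
            if name.endswith(".pack"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


def suggest_sizes(pack_bytes):
    """Apply the thread's rule of thumb: limit ~ pack size, heap ~ 2x that."""
    return {
        "packedGitLimit_bytes": pack_bytes,
        "max_heap_bytes": 2 * pack_bytes,
    }


if __name__ == "__main__":
    # "/path/to/gerrit/git" is a placeholder for the site's repository base path.
    root = "/path/to/gerrit/git"
    print(suggest_sizes(total_pack_bytes(root)))
```

To restrict the estimate to actively used repositories, run it on a directory that contains only those, or filter `dirpath` against a list of project names.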

-Matthias 

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/7883b3a1-fef3-4217-8559-b787625ff03d%40googlegroups.com.

Abhishek Patel

Sep 27, 2019, 3:11:32 PM
to Repo and Gerrit Discussion
We are looking at a number of issues with the Gerrit + JGit combination.

First, we saw a repository that was about 75 GB on disk, even though a clone of it is no more than 1 MB. On investigation, around 70 GB of the contents were in preserved/<sha1>.old-pack and preserved/<sha1>.old-idx files. This was partially fixed by using only "jgit gc" with the "--prune-preserved" option, which brought the repo down to 6 GB. I found that native git has not yet implemented the "--prune-preserved" option, but "git gc" was able to clean up the repo and reduce its size to 1 MB when run after "jgit --git-dir=<repo> gc --prune-preserved".
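
To find other repositories affected in the same way, one could scan for preserved pack files before deciding where to run gc. The Python below is a minimal sketch, assuming only what is described above (directories named `preserved` holding `.old-pack` and `.old-idx` files); the function name `preserved_bytes_by_dir` is invented for this illustration.

```python
import os


def preserved_bytes_by_dir(repos_root):
    """Map each 'preserved' directory under repos_root to the bytes it holds.

    Only files ending in .old-pack or .old-idx are counted.
    """
    usage = {}
    for dirpath, _dirnames, filenames in os.walk(repos_root):
        if os.path.basename(dirpath) != "preserved":
            continue
        usage[dirpath] = sum(
            os.path.getsize(os.path.join(dirpath, name))
            for name in filenames
            if name.endswith((".old-pack", ".old-idx"))
        )
    return usage


if __name__ == "__main__":
    # "/path/to/gerrit/git" is a placeholder for the site's repository base path.
    report = preserved_bytes_by_dir("/path/to/gerrit/git")
    for path, size in sorted(report.items(), key=lambda item: -item[1]):
        print(f"{size:>15} {path}")
```

The script only reports sizes; it does not delete anything.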

Second, we saw that loading around 15000 repos into Gerrit's "projects cache" took roughly 20-30 GB of JVM heap, possibly due to the way JGit loads the cache into the JVM.

Third, we found that some JGit versions, used with gc-conductor, take too long to run GC on a repo, though the latest JGit version (4.7.9) doesn't have that issue.

Apart from these three, we often get too many SSH requests for git clone, which causes a JVM heap spike and a JVM full GC. This ultimately makes Gerrit unstable, so we restart it. If we don't restart Gerrit, all subsequent SSH clone requests hang, and even administrative Gerrit commands don't respond.
[JVM heap 128 GB, repo sizes ranging from 3 to 15 GB, SSH request rate: 10 requests within 10 seconds]

So I was curious whether native git would behave differently in the above cases.

Matthias Sohn

Sep 27, 2019, 3:39:06 PM
to Abhishek Patel, Repo and Gerrit Discussion
On Fri, Sep 27, 2019 at 9:11 PM Abhishek Patel <abh...@gmail.com> wrote:
> We are looking at a number of issues with the Gerrit + JGit combination.
>
> First, we saw a repository that was about 75 GB on disk, even though a clone of it is no more than 1 MB. On investigation, around 70 GB of the contents were in preserved/<sha1>.old-pack and preserved/<sha1>.old-idx files. This was partially fixed by using only "jgit gc" with the "--prune-preserved" option, which brought the repo down to 6 GB. I found that native git has not yet implemented the "--prune-preserved" option, but "git gc" was able to clean up the repo and reduce its size to 1 MB when run after "jgit --git-dir=<repo> gc --prune-preserved".

Do you use NFS? These options are meant as a workaround for issues that may occur on NFS.

> Second, we saw that loading around 15000 repos into Gerrit's "projects cache" took roughly 20-30 GB of JVM heap, possibly due to the way JGit loads the cache into the JVM.

Gerrit's project cache is not implemented in JGit.

> Third, we found that some JGit versions, used with gc-conductor, take too long to run GC on a repo, though the latest JGit version (4.7.9) doesn't have that issue.

You didn't specify which version you are talking about, and I don't know what gc-conductor is.
4.7.9 is not the latest JGit version; the latest version is 5.5.0.

> Apart from these three, we often get too many SSH requests for git clone, which causes a JVM heap spike and a JVM full GC. This ultimately makes Gerrit unstable, so we restart it. If we don't restart Gerrit, all subsequent SSH clone requests hang, and even administrative Gerrit commands don't respond.
> [JVM heap 128 GB, repo sizes ranging from 3 to 15 GB, SSH request rate: 10 requests within 10 seconds]

These are very large git repositories; running many concurrent clones of repositories of that size is a heavy load.
Why are these repositories so large, and why do you need so many clone commands?

You didn't answer the questions in my last reply.

> So I was curious whether native git would behave differently in the above cases.

You could try.
 

Matthias Sohn

Sep 27, 2019, 3:40:24 PM
to Abhishek Patel, Repo and Gerrit Discussion
On Fri, Sep 27, 2019 at 9:38 PM Matthias Sohn <matthi...@gmail.com> wrote:
> These are very large git repositories; running many concurrent clones of repositories of that size is a heavy load.
> Why are these repositories so large, and why do you need so many clone commands?

You can use git-sizer to get an idea.

Marco Miller

Sep 28, 2019, 7:06:05 PM
to Repo and Gerrit Discussion
On Friday, September 27, 2019 at 3:40:24 PM UTC-4, Matthias Sohn wrote:
(...)

> These are very large git repositories; running many concurrent clones of repositories of that size is a heavy load.
> Why are these repositories so large, and why do you need so many clone commands?
>
> You can use git-sizer to get an idea.

Abhishek is part of the Ericsson internal Gerrit DevOps team.
The question that came from him internally this week was whether native git is usable in lieu of JGit on the client or repository maintenance side,
i.e. not to replace JGit with git within the server, unless I misunderstood his original question internally.
Swapping JGit out of the Gerrit server is indeed inconceivable to date.

Trying git instead of JGit as a client for maintenance proved more problematic than improving on JGit, for us internally some years ago.
I think this thread was initiated to revisit that topic in today's context; right @Abhishek?
Also, @Abhishek: I have already added git-sizer and a side repository cleaner to the backlog.
We'd also first need to revisit our git_tags cache configuration on slaves, I think.

Thanks to Gert, Sven and Matthias for having helped on this thread :)