Gerrit performance

Dmytro Rodionov

Mar 14, 2024, 9:41:45 AM
to Repo and Gerrit Discussion
Good day everyone!
At my company we have an HA Gerrit installation, and we are struggling to configure Gerrit so that it uses the host's full potential.
Please find our typical 'high' load during nightly builds in the attachment.
As you can see in the screenshot, nearly half of the cores are doing nothing (25% total load at peak).
On the other hand, all of our sshd.threads are busy with git clones, `gerrit ls-projects` takes forever to return a result, and even the web UI starts returning 504 errors and takes a long time to show changes and repos.
Also, it seems related, and concerning, that only around 800 file descriptors are open during the nightly fetch, while the maximum is 8192.

How can we improve this situation?
Any opinions would be very helpful. Thank you.

Host specs:
60 CPUs
180 GB RAM
SSD on NFS mount

Here's our config:
[core]
        packedGitWindowSize = 64k
        packedGitLimit = 16g
        packedGitOpenFiles = 8192
[container]
        javaOptions = -Xms160g
        javaOptions = -Xmx160g
        javaOptions = -XX:-UseAdaptiveSizePolicy
        javaOptions = -XX:+AlwaysPreTouch
        javaOptions = -XX:+UseParallelGC
        user = root
        heapLimit = 160g
[sshd]
        batchThreads = 110
        commandStartThreads = 10
        waitTimeout = 60m
        idleTimeout = 60m
        maxConnectionsPerUser = 1000
        threads = 120
[sshd]
        batchThreads = 110
        commandStartThreads = 10
        waitTimeout = 60m
        idleTimeout = 60m
        maxConnectionsPerUser = 1000
        threads = 120

P.S. There was a discussion about adding a cheat sheet to the Gerrit repo, but I failed to find one. It would have been very useful.
Screenshot 2024-03-14 at 14.24.08.png

Matthias Sohn

Mar 14, 2024, 10:05:26 AM
to Dmytro Rodionov, Repo and Gerrit Discussion

First of all, install a monitoring solution, e.g. https://gerrit.googlesource.com/gerrit-monitoring/+/refs/heads/master
Monitor the time the JVM spends running Java GC. If the percentage of time spent on Java GC goes through the roof,
you are overloading the process. In that case it's typically better to reduce the sshd thread pool sizes to
prevent overload.

Check the hit rates of Gerrit caches and increase their size if necessary.
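
As an illustration of the cache sizing mentioned above, a gerrit.config fragment might look like the sketch below. The `projects` cache and its `memoryLimit` option are taken from the Gerrit configuration documentation; the value is an arbitrary assumption and should be derived from your own hit-rate statistics. For this particular cache the limit counts entries, so it should be at least the number of projects on the server.

```
[cache "projects"]
        # Assumed value, not from this thread: size it from the observed hit rate
        memoryLimit = 4096
```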

Run git gc on a regular schedule on all repos. Run it more frequently on busy repositories.
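
One possible way to schedule this from within Gerrit is the `[gc]` section of gerrit.config. The sketch below assumes the `gc.startTime` and `gc.interval` options described in the Gerrit configuration documentation; the schedule itself is only an example. It applies one schedule to all repositories, so busier repositories would still need an additional, more frequent job.

```
[gc]
        # Assumed schedule: first run on Saturday at 02:00, then repeat weekly
        startTime = Sat 02:00
        interval = 1 week
```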

We found it's not possible to tune Java GC for high throughput (needed to serve bulky upload-pack requests)
and at the same time avoid the stop-the-world GC pauses that hurt the REST requests used by the UI.
What can help is offloading upload-pack requests to Gerrit replicas, using ParallelGC for the replica JVMs
and G1GC for the Gerrit primaries. If primary and replica have access to the same file system where the git repositories are stored,
you don't need to replicate, which would come with some lag.
 
> Host specs:
> 60 CPUs
> 180 GB RAM
> SSD on NFS mount

If repositories are on NFS make sure they are always well-packed and there are not a large number of empty directories.
Traversing a large file tree on NFS can take a lot of time.
 
> Here's our config:
> [core]
>         packedGitWindowSize = 64k
>         packedGitLimit = 16g
>         packedGitOpenFiles = 8192
> [container]
>         javaOptions = -Xms160g
>         javaOptions = -Xmx160g
>         javaOptions = -XX:-UseAdaptiveSizePolicy
>         javaOptions = -XX:+AlwaysPreTouch
>         javaOptions = -XX:+UseParallelGC
>         user = root
>         heapLimit = 160g

Set either -Xmx or heapLimit, but not both.
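
A minimal sketch of what that could look like, assuming you keep `heapLimit` (which Gerrit translates into the JVM's `-Xmx` flag) and drop the explicit `-Xmx160g` line. Keeping `javaOptions = -Xmx160g` and removing `heapLimit` instead would work equally well; all other values are copied unchanged from the config quoted above.

```
[container]
        javaOptions = -Xms160g
        javaOptions = -XX:-UseAdaptiveSizePolicy
        javaOptions = -XX:+AlwaysPreTouch
        javaOptions = -XX:+UseParallelGC
        user = root
        # heapLimit becomes -Xmx, so the separate -Xmx160g option is removed
        heapLimit = 160g
```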
 

Sven Selberg

Mar 14, 2024, 10:07:20 AM
to Repo and Gerrit Discussion

Two things spring to mind:
* What does the I/O situation look like? Is that the bottleneck? What sort of disk-performance do you get?
* How are the caches configured? What is the hit-rate for the caches?

A long-shot: are you running Gerrit in docker on a virtualized server?
* Check that docker daemon "knows" about all cores.[1]
   `$ cat /sys/fs/cgroup/cpuset/docker/cpuset.cpus`

[1] https://superuser.com/questions/1440602/how-does-the-docker-daemon-know-about-the-available-hardware-resources

Sven Selberg

Mar 14, 2024, 10:13:28 AM
to Repo and Gerrit Discussion
On Thursday, March 14, 2024 at 3:05:26 PM UTC+1 Matthias Sohn wrote:
> First of all, install a monitoring solution, e.g. https://gerrit.googlesource.com/gerrit-monitoring/+/refs/heads/master
> Monitor the time the JVM spends running Java GC. If the percentage of time spent on Java GC goes through the roof,
> you are overloading the process. In that case it's typically better to reduce the sshd thread pool sizes to
> prevent overload.

Another monitoring tool that can be very helpful is the javamelody plugin: https://gerrit.googlesource.com/plugins/javamelody/+/refs/heads/master/src/main/resources/Documentation/about.md
 

> Check the hit rates of Gerrit caches and increase their size if necessary.
>
> Run git gc on a regular schedule on all repos. Run it more frequently on busy repositories.

+1, the usual suspect and well worth investigating (but I don't think it should affect `ls-projects`, which smells more like a cache issue).

Nasser Grainawi

Mar 14, 2024, 11:30:56 AM
to Sven Selberg, Repo and Gerrit Discussion
On Thu, Mar 14, 2024 at 2:41 PM Dmytro Rodionov <smpli...@gmail.com> wrote:
> As you can see in the screenshot, nearly half of the cores are doing nothing (25% total load at peak).
> On the other hand, all of our sshd.threads are busy with git clones, `gerrit ls-projects` takes forever to return a result, and even the web UI starts returning 504 errors and takes a long time to show changes and repos.
> Also, it seems related, and concerning, that only around 800 file descriptors are open during the nightly fetch, while the maximum is 8192.

Sven and Matthias touched on this indirectly, but you should expect Gerrit workloads to be more heap/RAM-intensive than CPU-intensive. Given the high number of SSHD threads you have configured, it is quite possible that a few large git repos are being cloned by many of those threads at once, overloading the heap and causing long JVM GC pauses. I would use the monitoring/debug methods mentioned by Sven and Matthias to better understand your current problem and limitations, and then apply solutions (either already mentioned here or suggested later based on monitoring/debug feedback) incrementally to improve the situation.
 

Dmytro Rodionov

Mar 20, 2024, 3:50:11 AM
to Repo and Gerrit Discussion
Good day
Thank you for your advice. We already had a monitoring tool, but didn't know where to look.


> +1 the usual suspect, well worth investigating (but I don't think it should affect `ls-projects` which smells more like a cache issue).
This was indeed true. All the lag started when the JGit cache miss ratio became high. Increasing core.packedGitLimit to 60g fixed the problem.
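
For reference, the resulting `[core]` section would look roughly like the sketch below. It assumes the other two values from the original config were left unchanged; the thread only confirms the new `packedGitLimit`.

```
[core]
        packedGitWindowSize = 64k
        # Raised from 16g; this is the change reported to fix the cache misses
        packedGitLimit = 60g
        packedGitOpenFiles = 8192
```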
