On 15 Feb 2024, at 14:38, bhanu prakash <vbhanupr...@gmail.com> wrote:

Hello Team,

I have a query about Gerrit Garbage Collection: which option is preferable, configuring Git garbage collection in gerrit.config or using JGit GC?

Additionally, is there a recommended best practices document for Gerrit maintainers? I came across a document titled "Gerrit Performance Tuning Talk at Google Summit". However, it appears to be somewhat outdated.
https://www.slideshare.net/JohannesNicolai1/gerrit-performance-tuning-talk-at-google-summit

Could you please suggest if there is any such document available?

Thanks,
Bhanu

On Thu, Feb 15, 2024 at 3:38 PM bhanu prakash <vbhanupr...@gmail.com> wrote:

> Hello Team,
> I have a query about Gerrit Garbage Collection: which option is preferable, configuring Git garbage collection in gerrit.config or using JGit GC?

Running gc in the Gerrit process is a convenient way to get started. If you are hosting large repos and have many users, I'd recommend running it in a separate process so that it doesn't degrade other requests being processed in the Gerrit process.

Use a recent JGit version, configure a sufficiently large heap (try with your largest repos), and configure the JVM used to run JGit gc to use parallelGC, which has a higher throughput than the other Java gc algorithms.

> Additionally, is there a recommended best practices document for Gerrit maintainers? I came across a document titled "Gerrit Performance Tuning Talk at Google Summit". However, it appears to be somewhat outdated.
> https://www.slideshare.net/JohannesNicolai1/gerrit-performance-tuning-talk-at-google-summit
> Could you please suggest if there is any such document available?

The most important prerequisite for performance tuning is a decent monitoring setup.

If you increase the load, the CPU% spent on Java GC and the request latencies go up. If the server is overloaded, reduce thread pool sizes to throttle the load and avoid overload situations.

Monitor cache hit rates and adjust their configuration if needed.

Run git gc regularly; on busy repos you may need to run it more frequently. Each push request adds another pack file, and as the number of packs grows, JGit has to scan more and more pack indexes to find the objects it wants to load, which slows down all git requests for the affected repository.

The load is typically dominated by CI systems fetching large repositories (>500MB). Set up Gerrit replica(s) to serve this load and configure their JVM to use parallelGC: large fetch/clone requests take minutes, so stop-the-world pauses of several seconds caused by Java GC have little impact. The Gerrit primary should rather use G1GC to avoid long pauses for REST requests, which would impact the responsiveness of the Gerrit UI.

Tuning JVM gc for a mixed workload of large fetch/clone requests (max throughput) and small REST requests (low latency) in a single JVM isn't really possible without compromises. We have had good experience running a primary and a replica side by side on the same server, serving repos from the same shared directory (they still need separate site directories, so you need to symlink the shared git directory). This sidecar replica deployment has the advantage that no replication is involved, hence there is no replication lag, and you can tune JVM GC for the different workloads.

Put a load balancer in front, e.g. HAProxy. If you use http, the load balancer can route requests automatically: configure it to route fetch/clone requests to the replica and all other traffic to the primary. For ssh, the two sshd daemons need to use different ports, and clients need to configure different ports to send fetch/clone requests to the replica and push requests to the primary.

Use the replication plugin or the pull-replication plugin to replicate from a primary to replicas if you want to deploy them on different hosts.
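
To illustrate the http routing rule above (fetch/clone to the replica, everything else to the primary), here is a minimal Python sketch of the decision only. It relies on the Git smart HTTP protocol, where a fetch or clone starts with a GET on `info/refs?service=git-upload-pack` followed by a POST to `git-upload-pack`. The backend names and the function name are assumptions made for this example; in a real deployment the same rule would be written in the load balancer's own configuration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical backend names, used only in this sketch.
PRIMARY = "primary"
REPLICA = "replica"


def choose_backend(method: str, url: str) -> str:
    """Return REPLICA for Git smart-HTTP fetch/clone requests, else PRIMARY."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")

    # Ref advertisement that starts a fetch or clone.
    if method == "GET" and path.endswith("/info/refs"):
        services = parse_qs(parts.query).get("service", [])
        if services == ["git-upload-pack"]:
            return REPLICA

    # Negotiation and pack transfer of a fetch or clone.
    if method == "POST" and path.endswith("/git-upload-pack"):
        return REPLICA

    # Pushes (git-receive-pack), REST calls and the UI stay on the primary.
    return PRIMARY
```

Pushes use `git-receive-pack` instead of `git-upload-pack`, so they fall through to the primary together with REST and UI traffic.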

On Fri, 16 Feb 2024 at 00:21, Matthias Sohn <matthi...@gmail.com> wrote:

> On Thu, Feb 15, 2024 at 3:38 PM bhanu prakash <vbhanupr...@gmail.com> wrote:
>> Hello Team,
>> I have a query about Gerrit Garbage Collection: which option is preferable, configuring Git garbage collection in gerrit.config or using JGit GC?
>
> Running gc in the Gerrit process is a convenient way to get started. If you are hosting large repos and have many users, I'd recommend running it in a separate process so that it doesn't degrade other requests being processed in the Gerrit process.

Thank you, Matthias, for providing such thorough information. By "separate process," are you referring to pack.threads?

> Use a recent JGit version, configure a sufficiently large heap (try with your largest repos), and configure the JVM used to run JGit gc to use parallelGC, which has a higher throughput than the other Java gc algorithms.

Would using ParallelGC slow down Gerrit's response time? If not, I will give it a try. I think the configuration below will be effective.

    [container]
      javaoptions = -XX:+UseParallelGC

> The most important prerequisite for performance tuning is a decent monitoring setup.

Yes, this setup is already in place.

> If you increase the load, the CPU% spent on Java GC and the request latencies go up. If the server is overloaded, reduce thread pool sizes to throttle the load and avoid overload situations.

We're nearing the CPU usage limits, as it seems our garbage collector is active throughout the day, resulting in high CPU utilization. Is there a method to verify in the GC logs when the garbage collection process has finished for all repositories? For instance, can we identify when the GC has completed by a specific message like "GC is completed!"?

On Mon, Feb 19, 2024 at 2:08 PM bhanu prakash <vbhanupr...@gmail.com> wrote:

> On Fri, 16 Feb 2024 at 00:21, Matthias Sohn <matthi...@gmail.com> wrote:
>> Running gc in the Gerrit process is a convenient way to get started. If you are hosting large repos and have many users, I'd recommend running it in a separate process so that it doesn't degrade other requests being processed in the Gerrit process.
>
> Thank you, Matthias, for providing such thorough information. By "separate process," are you referring to pack.threads?

No, that configures how many threads gc uses for packing. I meant that you should run gc in a different process, at the OS level, than the Gerrit process.

>> Use a recent JGit version, configure a sufficiently large heap (try with your largest repos), and configure the JVM used to run JGit gc to use parallelGC, which has a higher throughput than the other Java gc algorithms.
>
> Would using ParallelGC slow down Gerrit's response time? If not, I will give it a try. I think the configuration below will be effective.
>
>     [container]
>       javaoptions = -XX:+UseParallelGC

parallelGC has a higher throughput but can cause longer stop-the-world pauses than G1GC. A jgit gc run in a separate process isn't directly observable by a user of your Gerrit server. Hence I think that as long as the stop-the-world pauses are shorter than the typical jgit gc run, they shouldn't cause problems, and the higher throughput helps finish faster overall when you run gc on many repositories.
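
As a rough illustration of running gc outside the Gerrit process across many repositories, the Python sketch below starts one OS process per repository and prints its own completion line for each, plus a summary at the end. Everything here is an assumption made for the example: native `git gc` stands in for whatever gc command you actually run (for instance a JGit launcher started with its own JVM options), repositories are assumed to be directories ending in `.git`, and the log messages are produced by this script, not by Gerrit or JGit.

```python
import os
import subprocess
import time
from pathlib import Path

# Assumption: native "git gc" is a stand-in; replace it with your own
# gc command, e.g. a JGit launcher configured with its own JVM options.
GC_COMMAND = ["git", "gc"]


def find_repos(base: Path) -> list[Path]:
    """Return all directories named *.git below base, sorted by path."""
    repos = []
    for root, dirs, _files in os.walk(base):
        for name in list(dirs):
            if name.endswith(".git"):
                repos.append(Path(root) / name)
                dirs.remove(name)  # do not descend into the repository itself
    return sorted(repos)


def count_packs(repo: Path) -> int:
    """Count pack files; a growing number suggests the repo is due for gc."""
    return len(list((repo / "objects" / "pack").glob("*.pack")))


def gc_all(base: Path, command=GC_COMMAND, run=subprocess.run, log=print) -> list[Path]:
    """Run gc for each repo in its own OS process; return the repos that failed."""
    failed = []
    # Handle the repositories with the most pack files first.
    repos = sorted(find_repos(base), key=count_packs, reverse=True)
    for repo in repos:
        before = count_packs(repo)
        start = time.monotonic()
        result = run(command, cwd=repo)
        elapsed = time.monotonic() - start
        if result.returncode == 0:
            after = count_packs(repo)
            log(f"gc completed: {repo} ({before} -> {after} packs, {elapsed:.1f}s)")
        else:
            failed.append(repo)
            log(f"gc FAILED: {repo} (exit code {result.returncode})")
    log(f"gc run finished: {len(repos) - len(failed)} ok, {len(failed)} failed")
    return failed
```

Called from a scheduler such as cron with the base directory holding your repositories, the final "gc run finished" line marks the point where every repository has been processed, and the per-repository durations show where the time goes.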