Gerrit Performance issue

pshetty

Jun 24, 2021, 3:21:29 AM
to Repo and Gerrit Discussion
Hello,

I recently upgraded Gerrit from 2.15.17 to 3.3.0, following the steps described in groups.google.com/u/2/g/repo-discuss/c/G5wucKJg9Ag. Since the upgrade, performance has been very poor and the CPU is heavily loaded all the time. Jenkins and Gerrit run on the same server, which has 12 CPU cores and 64 GB of RAM. I have set the Gerrit heap size to 52 GB. All CPU cores are constantly at 100% utilisation, and htop shows the Gerrit process always above 1000% CPU. When there are many push and pull requests, the load increases further and Jenkins jobs get stuck in the queue. Each time this happens I have to restart the Gerrit service, after which things return to normal. This is our gerrit.config:

[container]
        user = sdf
        javaHome = /usr/bin/java
        heapLimit = 52g
        javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
        javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
        listenAddress = *:29418
        threads = 24

[httpd]
        listenUrl = proxy-https://*:8080/
        acceptorThreads = 10
        maxThreads = 400
[cache]
        directory = cache
[download]
        command = checkout
        scheme = ssh
[index]
        type = lucene
[core]
        packedGitLimit = 20g
        packedGitWindowSize = 16k
        packedGitOpenFiles = 12000
[receive]
        enableSignedPush = false

Matthias Sohn

Jun 24, 2021, 8:53:22 AM
to pshetty, Repo and Gerrit Discussion

  • Always update to the latest service release of the release line you are using so that you don't miss bug fixes made in service releases;
    the latest one for 3.3.x is 3.3.4 [1].
  • Do you run git gc on all repositories on a regular schedule?
    • Check whether any of your large, high-traffic repositories have a large number of loose objects or packs, e.g. using "git count-objects -vH".
    • If so, either you don't run gc at all or you need to run it more frequently.
  • Configure Java gc logging and check whether Java gc consumes a lot of CPU.
  • You didn't describe the size of your site or how much traffic you have. As a rule of thumb, you need one core per concurrent fetch/clone request.
  • Typically the load is dominated by fetch/clone requests for large repositories (large is anything >1GB).
  • If Java gc cannot keep pace with your load, consider using ParallelGC instead of G1GC; it has higher throughput, but stop-the-world pauses can be longer.
  • Ensure that clients use git protocol version 2, which improves performance. This requires at least git v2.18.0 and
    needs to be configured on the client side (git config --global protocol.version 2); since v2.26.0 it is used by default.
  • Set up monitoring, e.g. using [2], to get more insight.
  • Try to separate read load from CI systems from interactive requests from end users by configuring a separate thread pool for CI.
    This helps ensure that enough threads are available for end-user requests; CI systems can typically tolerate some wait time if their requests are queued for a while.
    • Assign CI users to the service user group [3] and configure sshd.batchThreads [4]. All service users will be scheduled in this separate thread pool.
      If this still overloads your system, reduce sshd.batchThreads to lower the system load by queuing some CI requests.
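
The protocol version 2 point above depends on the client's git version, which can be checked mechanically. The following is a minimal Python sketch, not part of Gerrit or git: the function names and the three status labels are hypothetical, and only the version thresholds (v2.18.0 and v2.26.0) come from the list above. It classifies the text printed by "git --version".

```python
import re


def parse_git_version(text):
    """Extract (major, minor, patch) from text such as 'git version 2.25.1'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def protocol_v2_status(version_text):
    """Classify a client by the thresholds named in the list above.

    The returned labels are hypothetical names chosen for this sketch:
      'default'      - v2.26.0 or newer, protocol v2 is used by default
      'needs-config' - v2.18.0 or newer, protocol.version 2 must be set
      'unsupported'  - older than v2.18.0
    """
    version = parse_git_version(version_text)
    if version >= (2, 26, 0):
        return "default"
    if version >= (2, 18, 0):
        return "needs-config"
    return "unsupported"
```

A CI host could feed the output of "git --version" into protocol_v2_status and, for "needs-config", run the git config command quoted above.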

pshetty

Jun 28, 2021, 12:07:27 AM
to Repo and Gerrit Discussion
Hello Matthias,

1. Do you run git gc on all repositories on a regular schedule?    - We do not currently run gc on our repositories. I added the following to the gerrit.config file to enable gc logging:

javaOptions = -Xloggc:/path/to/gerrit/logs/javagc.log
javaOptions = -Xlog:gc*
javaOptions = -Xlog::::filecount=5,filesize=1024
javaOptions = -Xlog:safepoint
javaOptions = -Xlog:age*=debug
javaOptions = -Xlog:ergo*=debug
javaOptions = -Xlog:ref*=debug
javaOptions = -XX:+UnlockDiagnosticVMOptions
javaOptions = -XX:+UseG1GC

We currently have only 2 repositories in Gerrit. I ran "git count-objects -vH" on the Gerrit server and it gave the following result. Is this what you were looking for?

count: 138
size: 1.18 MiB
in-pack: 1037622
packs: 24
size-pack: 529.41 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
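
Output like the above can be checked automatically for each repository. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the default thresholds are illustrative values loosely modelled on git's gc.auto and gc.autoPackLimit settings, not a recommendation from this thread. It parses the "key: value" lines and flags a repository whose loose-object count or pack count exceeds a limit.

```python
def parse_count_objects(output):
    """Turn 'git count-objects -v' style output into a dict of strings."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'key: value'
            stats[key.strip()] = value.strip()
    return stats


def needs_gc(stats, max_loose=6700, max_packs=50):
    """Return True when loose objects or packs exceed the given limits.

    max_loose and max_packs are assumed, illustrative thresholds;
    tune them to how often gc is expected to run on the site.
    """
    loose = int(stats.get("count", 0))
    packs = int(stats.get("packs", 0))
    return loose > max_loose or packs > max_packs
```

Only the integer fields "count" and "packs" are interpreted; the size fields stay as strings because the -H flag prints them in human-readable units.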

This is the gc log after enabling it:

[8460.135s][info][gc] GC(808) Pause Young (Prepare Mixed) (GCLocker Initiated GC) 3464M->2887M(4400M) 45.385ms
[8461.592s][info][gc] GC(809) Pause Young (Mixed) (G1 Evacuation Pause) 2991M->2542M(4400M) 43.297ms
[8468.300s][info][gc] GC(810) Pause Young (Mixed) (G1 Evacuation Pause) 2726M->2520M(4400M) 46.350ms
[8490.302s][info][gc] GC(811) Pause Young (Normal) (G1 Evacuation Pause) 3336M->2802M(4400M) 59.092ms
[8490.711s][info][gc] GC(812) Pause Young (Normal) (G1 Evacuation Pause) 3474M->2854M(4400M) 32.449ms
[8510.676s][info][gc] GC(813) Pause Young (Concurrent Start) (G1 Evacuation Pause) 3542M->3127M(4400M) 60.916ms
[8510.676s][info][gc] GC(814) Concurrent Cycle
[8510.999s][info][gc] GC(814) Pause Remark 3453M->3253M(4400M) 6.148ms
[8511.169s][info][gc] GC(814) Pause Cleanup 3257M->3257M(4400M) 0.237ms
[8511.175s][info][gc] GC(814) Concurrent Cycle 498.528ms
[8517.070s][info][gc] GC(815) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 3407M->3027M(4400M) 43.500ms
[8522.090s][info][gc] GC(816) Pause Young (Mixed) (G1 Evacuation Pause) 3171M->2679M(4400M) 35.117ms
[8871.218s][info][gc] GC(864) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 3508M->3055M(4400M) 32.869ms
[8871.291s][info][gc] GC(865) Pause Young (Mixed) (G1 Evacuation Pause) 3191M->2649M(4400M) 23.285ms
[8871.395s][info][gc] GC(866) Pause Young (Mixed) (G1 Evacuation Pause) 2841M->2334M(4400M) 39.246ms
[8884.593s][info][gc] GC(867) Pause Young (Normal) (G1 Evacuation Pause) 3158M->2543M(4400M) 47.816ms
[8900.239s][info][gc] GC(868) Pause Young (Normal) (G1 Evacuation Pause) 3303M->2717M(4400M) 119.216ms
[8909.072s][info][gc] GC(869) Pause Young (Concurrent Start) (G1 Humongous Allocation) 2893M->2809M(4400M) 100.257ms
[8909.073s][info][gc] GC(870) Concurrent Cycle
[8909.518s][info][gc] GC(870) Pause Remark 2839M->2783M(4400M) 21.737ms
[8909.856s][info][gc] GC(870) Pause Cleanup 2812M->2812M(4400M) 0.841ms
[8909.862s][info][gc] GC(870) Concurrent Cycle 789.182ms
[8913.640s][info][gc] GC(871) Pause Young (Prepare Mixed) (GCLocker Initiated GC) 3306M->2840M(4400M) 28.903ms
[8915.470s][info][gc] GC(872) Pause Young (Mixed) (G1 Evacuation Pause) 2976M->2507M(4400M) 49.899ms
[8922.028s][info][gc] GC(873) Pause Young (Mixed) (G1 Evacuation Pause) 2691M->2477M(4400M) 54.924ms
[8942.004s][info][gc] GC(874) Pause Young (Normal) (G1 Evacuation Pause) 3173M->2712M(4400M) 54.207ms
[8955.817s][info][gc] GC(875) Pause Young (Normal) (G1 Evacuation Pause) 3336M->2907M(4400M) 55.824ms
[8968.690s][info][gc] GC(876) Pause Young (Concurrent Start) (G1 Evacuation Pause) 3451M->3107M(4400M) 51.610ms
[8968.690s][info][gc] GC(877) Concurrent Cycle
[8969.134s][info][gc] GC(877) Pause Remark 3124M->2852M(4400M) 7.101ms
[8969.292s][info][gc] GC(877) Pause Cleanup 2853M->2853M(4400M) 0.242ms
[8969.298s][info][gc] GC(877) Concurrent Cycle 608.383ms
[8985.128s][info][gc] GC(878) Pause Young (Prepare Mixed) (G1 Evacuation Pause) 3267M->3000M(4400M) 50.497ms
[8989.467s][info][gc] GC(879) Pause Young (Mixed) (GCLocker Initiated GC) 3152M->2643M(4400M) 34.153ms
[8991.538s][info][gc] GC(880) Pause Young (Mixed) (G1 Evacuation Pause) 2827M->2456M(4400M) 39.838ms
[8999.418s][info][gc] GC(881) Pause Young (Normal) (G1 Evacuation Pause) 3128M->2590M(4400M) 31.502ms
[9009.196s][info][gc] GC(882) Pause Young (Normal) (G1 Evacuation Pause) 3254M->2736M(4400M) 40.132ms
[9019.289s][info][gc] GC(883) Pause Young (Normal) (G1 Evacuation Pause) 3375M->2848M(4400M) 47.809ms
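
Whether Java gc consumes a significant share of time can be estimated from a log in this format. The following is a minimal Python sketch, assuming every stop-the-world entry contains the word "Pause" and ends with a duration in milliseconds, as in the lines above; the function name and the regular expression are hypothetical, not part of any JDK tool. It sums the pause durations and divides by the time between the first and last pause entry. Concurrent Cycle lines are skipped because they are not pauses.

```python
import re

# Matches e.g. "[8460.135s][info][gc] GC(808) Pause Young (...) 3464M->2887M(4400M) 45.385ms"
PAUSE_LINE = re.compile(
    r"^\[(?P<uptime>\d+\.\d+)s\]\[info\]\[gc\] GC\(\d+\) Pause .* (?P<pause>\d+\.\d+)ms$"
)


def gc_pause_fraction(log_text):
    """Return total pause time divided by the window spanned by the pauses."""
    uptimes = []
    total_pause_ms = 0.0
    for line in log_text.splitlines():
        match = PAUSE_LINE.match(line.strip())
        if match:
            uptimes.append(float(match.group("uptime")))
            total_pause_ms += float(match.group("pause"))
    if len(uptimes) < 2:
        raise ValueError("need at least two pause entries to measure a window")
    window_s = uptimes[-1] - uptimes[0]
    if window_s <= 0:
        raise ValueError("pause entries do not span a positive time window")
    return (total_pause_ms / 1000.0) / window_s


# The first three pause entries from the log above, plus one non-pause line.
sample = """[8460.135s][info][gc] GC(808) Pause Young (Prepare Mixed) (GCLocker Initiated GC) 3464M->2887M(4400M) 45.385ms
[8461.592s][info][gc] GC(809) Pause Young (Mixed) (G1 Evacuation Pause) 2991M->2542M(4400M) 43.297ms
[8468.300s][info][gc] GC(810) Pause Young (Mixed) (G1 Evacuation Pause) 2726M->2520M(4400M) 46.350ms
[8510.676s][info][gc] GC(814) Concurrent Cycle
"""
fraction = gc_pause_fraction(sample)
```

This measures only pause time, not the CPU used by concurrent gc threads, so it is a rough lower bound on gc cost rather than a full picture.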


2. Regarding traffic to the site, could you please let me know how I can measure it? We currently have 45 Gerrit users and two repositories.

3. "as a rule of thumb you need one core per concurrent fetch/clone request" -> How do we make sure that one core is used per concurrent fetch/clone request?

4. I have set sshd.batchThreads to 2, and there is still no improvement in the performance of the Gerrit server.

pshetty

Jun 28, 2021, 6:13:59 AM
to Repo and Gerrit Discussion
Hi Matthias,

Could you please help me tune the Gerrit service? Due to the increased load we are unable to push/fetch and need to restart the Gerrit service every time. Please help.