Increased Gerrit rest api connections caused system hang


Sehen

Jun 10, 2025, 11:27:02 AM
to Repo and Gerrit Discussion
Hi,
Our Gerrit server has recently become slow and prone to stalls. We are investigating but are not sure how to proceed, so we would like to ask for your help.
I would be grateful if you could look into it with us.

Gerrit version: v3.5.2
Situation:
In recent months, Gerrit has been slowing down during business hours, and at times the web UI cannot be reached and repositories cannot be cloned.
I suspect the cause is the recent increase in traffic (httpd REST API calls): the moments when connections are cut off coincide with the times when the number of REST API calls shown in the metrics increases.
Action:
I have been experimenting with higher and lower values of httpd.maxThreads, httpd.minThreads, and httpd.maxQueued in gerrit.config.
We found that a large share of the REST API calls go to /a/changes/xxxx, so we have also been experimenting with higher and lower memoryLimit values for the changes and change_notes caches.
However, nothing has improved and the outages continue.
Request:
Please help me check whether anything is wrong with my settings or whether there is something more I should do.

The attached file shows the metrics recorded while performance was slow or connections failed.
My observation is that the number of httpd active threads is normally small, but at some point it rises, and the moment it reaches the maxThreads value, connections are dropped.
How does Gerrit behave when the active thread count reaches maxThreads? I wonder whether it queues additional requests and processes them sequentially, or whether the system goes into an unresponsive waiting state.
[Attachment: gerrit_metrics.png]

Other details:
1) The system load on the server running Gerrit is high (min 6, avg 15, max 50).
2) The Gerrit server has the following specification:
    Memory = 128GB
    CPU = 48 cores
3) The contents of gerrit.config are as follows:
[sshd]
    listenAddress = *:29xxx
    threads = 1000
    loginGraceTime = 600s
    maxConnectionsPerUser = 0
    waitTimeout = 30m
    idleTimeout = 5h
[httpd]
    listenUrl = proxy-https://xxx.xxx.xxx.xxx:xxxx/
    requestLog = true
    maxThreads = 500
    minThreads = 500
    maxQueued = 3000
[container]
    heapLimit = 60000m
[cache]
    directory = cache
[cache "git_file_diff"]
    diskLimit = 3g
[cache "gerrit_file_diff"]
    diskLimit = 3g

Matthias Sohn

Jun 10, 2025, 11:50:09 AM
to Sehen, Repo and Gerrit Discussion
You are probably overloading your server: sshd.threads is the upper limit for concurrent git requests, and you have set it to 1000.
Don't allow more than 2 threads per CPU core.
See https://groups.google.com/g/repo-discuss/c/BVii174v_JI/m/F0iCzIl6AwAJ
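
To make the sizing rule concrete, here is a minimal Python sketch that derives a thread limit from the core count and prints it as a gerrit.config fragment. It assumes the limit in question is sshd.threads and uses the 48-core server described in the original post; the helper names are hypothetical and not part of Gerrit.

```python
def max_git_threads(cpu_cores: int, threads_per_core: int = 2) -> int:
    """Upper bound for concurrent git requests: N threads per CPU core."""
    return cpu_cores * threads_per_core


def sshd_section(cpu_cores: int) -> str:
    """Render a gerrit.config [sshd] fragment with the derived limit.

    Assumption: sshd.threads is the option being sized here.
    """
    return f"[sshd]\n    threads = {max_git_threads(cpu_cores)}"


# 48 cores, as reported for the server in this thread.
print(sshd_section(48))
```

For the 48-core machine this yields a limit of 96 threads, compared with the 1000 currently configured.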
 


Sehen

Jun 11, 2025, 12:57:22 AM
to Repo and Gerrit Discussion
Thanks for the answer, Matthias Sohn.

How does Gerrit behave when more HTTPD or SSHD requests arrive at once than the configured thread limit (e.g. maxThreads) allows?
Are they queued and processed sequentially?
In my experience, when the limit is reached, the Gerrit system seems to stop responding.
What happens if you allocate 2 threads per CPU core (= 96 threads) and 200 requests arrive at once?

On Wednesday, June 11, 2025 at 12:50:09 AM UTC+9, Matthias Sohn wrote:

Luca Milanesio

Jun 11, 2025, 6:13:44 AM
to Repo and Gerrit Discussion, Luca Milanesio


> On 11 Jun 2025, at 05:57, Sehen <ohse...@gmail.com> wrote:
>
> Thanks for the answer Matthias Sohn.
>
> How does Gerrit work if HTTPD or SSHD Threads are added above the maxThreads setting at once?
> Is it queued and processed sequentially?

If you have more concurrent users than 2x the number of CPUs, you should be looking at a more scalable and elastic setup, such as K8s-Gerrit or AWS-Gerrit.
If you are on-premises, you should look at multiple Gerrit primaries in HA.

Lastly, if you are referring to a temporary burst of requests, they’ll be queued in the Jetty HTTP queue. You can configure how many of them should be queued.

> In my experience, when the limit is reached, the gerrit system seems to stop responding.

That’s not my experience, unless your Gerrit server is misconfigured or you have reached its maximum capacity, in which case it is expected.
If you have a motorway of 4 lanes and they are all fully occupied, the motorway is stuck. How can that be different?

> What happens if you allocate 2 threads per CPU core = 96, and you get 200 threads at once?

With a pool of 96 threads, 200 - 96 = 104 requests will be queued until an executor becomes available.

HTH

Luca.
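
The queueing arithmetic can be sketched in a few lines of Python. This is a simplified model, not Jetty's implementation: it assumes a burst arrives all at once, that requests up to the thread limit run immediately, that up to maxQueued more wait in the queue, and that anything beyond that is not accepted (this last point is an assumption made for illustration). The function name is hypothetical.

```python
def split_burst(requests: int, max_threads: int, max_queued: int) -> tuple[int, int, int]:
    """Return (running, queued, not_accepted) for a burst of simultaneous requests."""
    running = min(requests, max_threads)
    queued = min(requests - running, max_queued)
    not_accepted = requests - running - queued
    return running, queued, not_accepted


# 96 threads (2 per core on the 48-core server) and maxQueued = 3000, as in the posted config.
print(split_burst(200, 96, 3000))    # (96, 104, 0)
# A much larger burst fills the queue completely.
print(split_burst(4000, 96, 3000))   # (96, 3000, 904)
```

In this model a burst of 200 requests against 96 threads leaves 104 waiting; a pool of 192 threads would leave 8 waiting. Queued requests are only delayed, so a full thread pool on its own should look like slowness rather than a hang.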


Matthias Sohn

Jun 11, 2025, 8:14:53 AM
to Luca Milanesio, Repo and Gerrit Discussion

If you overload the system by configuring thread pools that are too large, all threads will suffer, and the JVM starts choking when the Java GC can't keep pace with the excessive workload.

Typically the load is dominated by upload-pack requests for large repositories. You can offload them to a Gerrit replica, which can run co-located on the same machine as the Gerrit primary. Use a load balancer to route https traffic automatically (upload-pack to the replica, all other requests to the primary); for ssh you can use different ports or host names.
Use parallelGC for the replica to optimize for throughput, and generational ZGC (if you run on Java 21 or higher) for the primary to optimize response time.
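
As a rough illustration of the routing rule, the following Python sketch classifies an HTTP request as a fetch (upload-pack) or anything else. It relies on the git smart HTTP convention that fetches use `GET .../info/refs?service=git-upload-pack` followed by `POST .../git-upload-pack`. The function name and the backend labels "replica" and "primary" are hypothetical; in practice this rule would live in the load balancer's own configuration rather than in Python.

```python
from urllib.parse import parse_qs, urlsplit


def choose_backend(method: str, url: str) -> str:
    """Send git fetch traffic to the replica and everything else to the primary."""
    parts = urlsplit(url)
    service = parse_qs(parts.query).get("service", [""])[0]

    # Fetch data transfer: POST <repo>/git-upload-pack
    if method == "POST" and parts.path.endswith("/git-upload-pack"):
        return "replica"
    # Fetch ref advertisement: GET <repo>/info/refs?service=git-upload-pack
    if (method == "GET" and parts.path.endswith("/info/refs")
            and service == "git-upload-pack"):
        return "replica"
    # Pushes (git-receive-pack), REST API calls, and the web UI stay on the primary.
    return "primary"


print(choose_backend("GET", "/a/my/project/info/refs?service=git-upload-pack"))
print(choose_backend("GET", "/a/changes/12345"))
```

The first call is routed to the replica and the second to the primary, so REST API traffic such as /a/changes/... no longer competes with clones and fetches for the same threads.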

If you need more insights into the runtime behavior install a monitoring solution like

-Matthias


Sehen

Jun 15, 2025, 1:29:52 PM
to Repo and Gerrit Discussion
Your answers have been very helpful. Thank you.
Following your advice, we are considering an HA configuration, redundancy, and so on. However, reviewing and changing the current configuration will take some time.
So for now I want to set httpd.maxThreads to 2 threads per CPU core.

In addition, we are considering installing the quota plugin to restrict REST API calls to Gerrit, so that excessive calls don't cause trouble.
- https://gerrit.googlesource.com/plugins/quota/
- https://gerrit-ci.gerritforge.com/plugin-manager/

However, I'm running Gerrit 3.5.2, and the latest quota plugin doesn't work on this version, so I don't know which version to build.
Can you help me with the following?
- Gerrit: 3.5.2
- Which tag of the quota plugin should I build?
- I can't build with the Bazel version listed in the plugin's .bazelversion file, so I'd appreciate it if you could tell me which Bazel version is suitable.


On Wednesday, June 11, 2025 at 9:14:53 PM UTC+9, Matthias Sohn wrote:

Luca Milanesio

Jun 15, 2025, 4:10:09 PM
to Repo and Gerrit Discussion, Luca Milanesio
Sehen,

Firstly, please do respect the etiquette of using this mailing list and adopt the interleaved answer style [1].

> On 15 Jun 2025, at 18:29, Sehen <ohse...@gmail.com> wrote:
>
> Your answers have been very helpful. Thank you.
> Referring to the answers, we are considering HA configuration, redundancy, etc. However, it will take some time to review the current configuration to change.

Moving to HA is quite simple. Should you need an expert to help you move to HA quickly, you can get in touch with companies providing Enterprise Support at [2].

> So I want to change the value of httpd.maxThreads to '2 threads per CPU core'.
>
> In addition, we are considering installing the quota plugin to restrict RestAPI calls to Gerrit so that we don't get into trouble with excessive calls.
> - https://gerrit.googlesource.com/plugins/quota/
> - https://gerrit-ci.gerritforge.com/plugin-manager/
>
> However, I'm running Gerrit 3.5.2, and the latest quota plugin doesn't work on this version, so I don't know which version to build.

The quota plugin was also built with Gerrit v3.5.x; see the archived artifacts at [3].

> Can you help me in this area?
> - Gerrit : 3.5.2
> - What tag version of the quota plugin should I build with?
> - I can't build with the Bazel version listed in the .bazelversion in the quota code, so I'd appreciate it if you could tell me which version of Bazel is suitable.

Matthias Sohn

Jun 15, 2025, 4:24:08 PM
to Luca Milanesio, Repo and Gerrit Discussion
On Sun, Jun 15, 2025 at 10:10 PM Luca Milanesio <luca.mi...@gmail.com> wrote:
> On 15 Jun 2025, at 18:29, Sehen <ohse...@gmail.com> wrote:
>
> > So I want to change the value of httpd.maxThreads to '2 threads per CPU core'.

As I said earlier, the load on Gerrit servers is typically dominated by upload-pack (fetch) requests on large repositories.
Hence you should limit sshd.threads to 2 threads per CPU core. This option limits the number of concurrent git requests over both ssh and http. httpd.maxThreads limits the number of concurrent requests via http, including git requests over http and REST API requests.

> > However, I'm running Gerrit 3.5.2, and the latest quota plugin doesn't work on this version, so I don't know which version to build.

You should consider upgrading at least to the latest 3.5 service release, which is 3.5.6.
Upgrading to a higher service release of the same minor release should be straightforward, since service releases mostly fix bugs.

> The quota plugin was also built with Gerrit v3.5.x, see the archived artifacts at [3].
>
> > - I can't build with the Bazel version listed in the .bazelversion in the quota code, so I'd appreciate it if you could tell me which version of Bazel is suitable.

Use bazelisk to start the build; it should automatically select the correct Bazel version for the respective Gerrit source code version.
 