On 26 May 2025, at 18:25, lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hello Experts,
We have a 2-node Gerrit cluster running 3.6.8. We know it is EOL (upgrade planned).
I have frequently observed the number of HTTP requests on one node climbing very high. At the same time, the queue shows repository index tasks being triggered for certain repos, and they fail to sync the changes while the HTTP queue is high.
This is seriously impacting access to the server through the UI. Any idea how to fix this? I suspect HA is not replicating the changes. When I check the queue, events like the one below can be seen; sometimes there are few and sometimes many, but the HTTP queues are going very high.
index change:dev%2Fman%2Fsam~517760 => http://x.x.x.:8080 (try #0)
After a service restart everything returns to normal; I had to restart each node twice today.
Before Restart:
Gerrit Code Review 3.6.8 now 10:07:28 UTC
uptime 6 days 15 hrs
  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
  adv_bases                     |                     |         |         |
  change_notes                  |   123               |  29.6ms | 97%     |
  changeid_project              |  1024               |         | 85%     |
  changes_by_project            |     1               |         |  0%     |
  default_preferences           |                     |         |         |
  external_ids_map              |     1               |         | 99%     |
  groups                        |                     |         |         |
  groups_bymember               |  1024               | 290.8us | 99%     |
  groups_byname                 |     1               | 823.5us | 99%     |
  groups_bysubgroup             |   357               | 258.0us | 99%     |
  groups_byuuid                 |  1339               |   1.6ms | 99%     |
  groups_external               |     1               |    1.7s | 99%     |
  groups_external_persisted     |                     |    1.6s |  0%     |
  ldap_group_existence          |     7               | 338.3ms | 87%     |
  ldap_groups                   |   653               | 862.9ms | 99%     |
  ldap_groups_byinclude         |  1024               |         | 96%     |
  ldap_usernames                |  1024               | 101.7us | 97%     |
  permission_sort               |  1024               |  91.9us | 97%     |
  plugin_resources              |     7               |         |  2%     |
  project_list                  |     1               |   40.2s | 98%     |
  projects                      |  1024               |   2.9ms | 37%     |
  prolog_rules                  |                     |         |         |
  soy_sauce_compiled_templates  |     1               |  35.5ms | 99%     |
  sshkeys                       |   637               |  60.5ms | 99%     |
  static_content                |    34               |   1.4ms | 85%     |
  lfs-lfs_project_locks         |                     |         |         |
  plugin-manager-plugins_list   |     1               |    3.0s |  0%     |
D accounts                      |  1024  22626  10.95m| 365.1us | 99% 100%|
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/repo-discuss/b5ad845d-1abe-404f-80c9-1b60d6e05cb7n%40googlegroups.com.
Hi Luca,
Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
basePath = /shared/git
[index]
type = lucene
ramBufferSize = 4096m
maxTerms = 8192
maxBufferedDocs = 3000
threads = 16
batchThreads = 16
maxMergeCount = 100
reuseExistingDocuments = true
defaultLimit = 100
maxLimit = 500
cacheQueryResultsByChangeNum = true
[receive]
enableSignedPush = false
timeout = 150min
[transfer]
timeout = 120s
[sendemail]
smtpServer = localhost
smtpServerPort = 25
smtpUser = gerrit
[container]
user = gerrit
#javaHome = /usr/java/latest
javaHome = /usr/lib/jvm/jre
heapLimit = 190g
javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
listenAddress = *:29418
idleTimeout = 10m
backend = MINA
threads = 200
batchThreads = 70
commandStartThreads = 24
maxConnectionsPerUser = 64
[httpd]
listenUrl = proxy-https://*:8080/
maxThreads = 1000
requestLog = true
acceptorThreads = 48
minThreads = 49
maxQueued = 2000
[cache]
directory = cache
threads = 0
[cache "project_list"]
maxAge = 130s
[lfs]
plugin = lfs
[accountPatchReviewDb]
url = jdbc:postgresql://postgres:5432/and_ha?user=mohan&password=krishna
[database "h2"]
autoServer = true
and
HA.config
[main]
sharedDirectory = /shared/git
[peerInfo]
strategy = static
[autoReindex]
enabled = false
delay = 10
pollInterval = 0
[peerInfo "static"] (on Node2)
url = http://Node1:8080
url = http://Node2:8080
[peerInfo "static"] (on Node1)
url = http://Node2:8080
url = http://Node1:8080
[http]
maxTries = 360
retryInterval = 10000
connectionTimeout = 5000
socketTimeout = 5000
[healthcheck]
enable = true
[cache]
threadPoolSize = 8
[index]
threadPoolSize = 12
numStripedLocks = 10000
maxTries = 10
[websession]
cleanupInterval = 24 hours
On Wednesday, 28 May 2025 at 12:53:14 UTC+5:30 Luca Milanesio wrote:
On 26 May 2025, at 18:25, lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hello Experts,
We have a 2-node Gerrit cluster running 3.6.8. We know it is EOL (upgrade planned).
Yes, please do upgrade. There are many issues that we fixed in the HA plugin, and those fixes just do not exist in your version.
I have frequently observed the number of HTTP requests on one node climbing very high. At the same time, the queue shows repository index tasks being triggered for certain repos, and they fail to sync the changes while the HTTP queue is high.
This is seriously impacting access to the server through the UI. Any idea how to fix this? I suspect HA is not replicating the changes. When I check the queue, events like the one below can be seen; sometimes there are few and sometimes many, but the HTTP queues are going very high.
First: upgrade to get most of the fixes in the HA plugin.
Second: check your change.mergeabilityComputationBehavior setting in gerrit.config and make sure it is disabled (the default).
I don’t believe you have just two nodes: how can two nodes in an HA configuration schedule replication events?
Can you please share the full picture with us?
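The second point refers to the [change] section of gerrit.config. A minimal sketch of what an explicitly disabled setting might look like, assuming NEVER is the value that corresponds to "disabled" on the release in use (worth confirming against the config documentation for that version):

```
[change]
  # Assumption: NEVER is the "disabled" value for this option.
  # Leaving the option unset should behave the same way if, as stated
  # above, disabled is the default.
  mergeabilityComputationBehavior = NEVER
```

If the option is absent or commented out, nothing needs to be added; the fragment only shows how to pin the behaviour explicitly.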
On 28 May 2025, at 16:37, Matthias Sohn <matthi...@gmail.com> wrote:
On Wed, May 28, 2025 at 2:58 PM lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hi Luca,
Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
basePath = /shared/git
[index]
type = lucene
ramBufferSize = 4096m
This option doesn't exist, but there is a per-index option called index.<index name>.ramBufferSize.
maxTerms = 8192
maxBufferedDocs = 3000
threads = 16
batchThreads = 16
maxMergeCount = 100
reuseExistingDocuments = true
defaultLimit = 100
maxLimit = 500
cacheQueryResultsByChangeNum = true
[receive]
enableSignedPush = false
timeout = 150min
Why do you think you need a 2.5 hour timeout for receiving push requests?
[transfer]
timeout = 120s
[sendemail]
smtpServer = localhost
smtpServerPort = 25
smtpUser = gerrit
[container]
user = gerrit
#javaHome = /usr/java/latest
javaHome = /usr/lib/jvm/jre
Which Java version are you using?
heapLimit = 190g
javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
listenAddress = *:29418
idleTimeout = 10m
backend = MINA
threads = 200
sshd.threads limits the number of concurrently executed SSH requests and also the number of concurrently executed git requests (both SSH and HTTP);
see https://gerrit-documentation.storage.googleapis.com/Documentation/3.6.8/config-gerrit.html#sshd.threads
As a rule of thumb, fetching from a repo can keep one CPU core busy, hence allowing up to 200 concurrent git requests on a 64 CPU machine may overload it.
batchThreads = 70
commandStartThreads = 24
maxConnectionsPerUser = 64
[httpd]
listenUrl = proxy-https://*:8080/
maxThreads = 1000
requestLog = true
acceptorThreads = 48
minThreads = 49
maxQueued = 2000
[cache]
directory = cache
threads = 0
[cache "project_list"]
maxAge = 130s
Why do you think you need that? This means Gerrit potentially has to scan the file system to find projects every 2 minutes.
[gitweb]
cgi = /var/www/git/gitweb.cgi
type = gitweb
[core]
packedGitLimit = 10g
packedGitWindowSize = 16k
packedGitOpenFiles = 10240
#[gc]
# startTime = Sun 00:00
# interval = 1 w
[pack]
threads = 24
windowMemory = 16g
[plugins]
allowRemoteAdmin = true
[hooks]
path = /data/gerrit/hooks
syncHookTimeout = 900
[change]
# mergeabilityComputationBehavior = API_REF_UPDATED_AND_CHANGE_REINDEX
cumulativeCommentSizeLimit = 10m
maxComments = 40000
maxUpdates = 10000
conflictsPredicateEnabled = true
submitWholeTopic = true
maxPatchSets = 1000000
These options are meant to protect your Gerrit server from a runaway script or a crazy user, preventing issues caused by an excessive size or number of objects related to a change.
You are effectively switching off these limits. A more cautious approach would be to start with the defaults and increase a limit when a user hits it with a reasonable use case you want to support.
Though it would be an interesting test to push 1 million patch sets for a single change and see how Gerrit copes with it.
[lfs]
plugin = lfs
[accountPatchReviewDb]
url = jdbc:postgresql://postgres:5432/and_ha?user=mohan&password=krishna
[database "h2"]
The database section ceased to exist in Gerrit 3.x. It was used to configure ReviewDb in Gerrit 2.x.
autoServer = true
and
HA.config
[main]
sharedDirectory = /shared/git
Which type of filesystem are you using for the sharedDirectory?
[peerInfo]
strategy = static
[autoReindex]
enabled = false
delay = 10
pollInterval = 0
[peerInfo "static"] (on Node2)
url = http://Node1:8080
url = http://Node2:8080
[peerInfo "static"] (on Node1)
url = http://Node2:8080
url = http://Node1:8080
[http]
maxTries = 360
retryInterval = 10000
connectionTimeout = 5000
socketTimeout = 5000
[healthcheck]
enable = true
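Taken together, the inline comments above point toward something like the following gerrit.config fragment. It is only a sketch under stated assumptions: the index name changes_open and every numeric value are illustrative and not taken from the thread, and the right numbers depend on the hardware and traffic of the installation.

```
[index]
  type = lucene

# ramBufferSize is configured per index rather than in the top-level
# [index] section. Assumption: the open-changes index is named
# "changes_open"; 48m is an illustrative value only.
[index "changes_open"]
  ramBufferSize = 48m

[sshd]
  # Assumption: about one concurrent git request per CPU core, following
  # the rule of thumb above for the 64 CPU machine mentioned there.
  threads = 64

# The [change] max* overrides and the [database "h2"] section are left out
# on purpose, so the built-in limits apply and the unused 2.x section is gone.
```

Starting from the defaults and raising individual limits only when a concrete use case requires it keeps the protective limits in place.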
On 28 May 2025, at 16:37, Matthias Sohn <matthi...@gmail.com> wrote:
On Wed, May 28, 2025 at 2:58 PM lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hi Luca,
Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
basePath = /shared/git
[index]
type = lucene
ramBufferSize = 4096m
This option doesn't exist, but there is a per-index option called index.<index name>.ramBufferSize.
I am curious: where did you get this suggestion from? What were you trying to achieve? Did you experience some Lucene slowdowns? Do we have some outdated documentation mentioning it somewhere?
maxTerms = 8192
maxBufferedDocs = 3000
threads = 16
batchThreads = 16
maxMergeCount = 100
reuseExistingDocuments = true
This option is not available on Gerrit v3.8x., where did you find it?
[…]
I confirm my final statement: I don’t believe you have *one* problem but a *series of problems*, due mainly to misconfiguration of the system.
Based on the traffic you have, the data on the repositories and the users, it should be pretty straightforward to arrive at a much more balanced set of settings.
Finding the optimal configuration is challenging with hundreds of Gerrit options.
Can we make this easier ?
I have been advocating for years for having safer defaults :-) and I’m still all for it.
Some examples:
1) Infinite timeouts (still today, Gerrit waits indefinitely for the SMTP server to respond)
2) Unsafe defaults which may cause repository corruption (we have warnings in the release notes, but I am unsure how many people actually read them)