java.lang.OutOfMemoryError: Java heap space - Gerrit 3.11.3


Guy Levkowitz

Aug 26, 2025, 11:14:27 AM
to Repo and Gerrit Discussion
Hey

We are getting a lot of the error below, and we need to restart the Gerrit service because it gets stuck and developers can't work. This is not the first time it has happened; it occurs almost every 2-4 days.

We have a server with:
free -g
              total        used        free      shared  buff/cache   available
Mem:            161          42          24           2          94         115
Swap:            23           0          23

and 48 CPUs, which come under high load when it happens. The attached screenshot was taken after we restarted the service.


This is the container config in the gerrit.config file:
[container]
    user = git
    javaHome = /usr/lib/jvm/java-21-openjdk-21.0.6.0.7-1.0.1.el8.x86_64
    startupTimeout = 120
    heapLimit = 48g
    javaOptions = "-Xms8g"
    #javaOptions = "-Xmx64g"
    javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
    javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
    javaOptions = "-XX:+UseG1GC"
    javaOptions = "-XX:MaxGCPauseMillis=200"
    javaOptions = "-XX:+UseStringDeduplication"
    #javaOptions = "-XX:+HeapDumpOnOutOfMemoryError"
    javaOptions = "-XX:HeapDumpPath=/opt/gerrit/logs"
    javaOptions = "-Xlog:gc*:file=/opt/gerrit/logs/jvm_gc.log:time,uptime,tags:filecount=10,filesize=50M"

What else can be done?

[2025-08-26T17:50:20.144+03:00] [SSH gerrit query --current-patch-set --format json 6035441 (builder)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user builder account 1000503) during gerrit query --current-patch-set --format=json 6035441
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.search.comparators.LongComparator.<init>(LongComparator.java:37)
        at org.apache.lucene.search.SortField.getComparator(SortField.java:529)
        at org.apache.lucene.search.FieldValueHitQueue.<init>(FieldValueHitQueue.java:137)
        at org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue.<init>(FieldValueHitQueue.java:98)
        at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:161)
        at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:454)
        at org.apache.lucene.search.IndexSearcher$2.newCollector(IndexSearcher.java:652)
        at org.apache.lucene.search.IndexSearcher$2.newCollector(IndexSearcher.java:639)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:684)
        at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:667)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:583)
        at com.google.gerrit.lucene.LuceneChangeIndex$QuerySource.doRead(LuceneChangeIndex.java:438)
        at com.google.gerrit.lucene.LuceneChangeIndex$QuerySource$1.call(LuceneChangeIndex.java:362)
        at com.google.gerrit.lucene.LuceneChangeIndex$QuerySource$1.call(LuceneChangeIndex.java:359)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
        at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:912)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.runWith(Thread.java:1596)
        at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-08-26T17:50:20.145+03:00] [[events-log] LocalEventsDb housekeeper] WARN  com.zaxxer.hikari.pool.HikariPool : [events-log] LocalEventsDb - Thread starvation or clock leap detected (housekeeper delta=50s35ms308µs508ns).
[2025-08-26T17:50:39.956+03:00] [SSH git-receive-pack /gsoapis (builder)] WARN  com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker did not call end() before returning
[2025-08-26T17:50:39.958+03:00] [SSH git-receive-pack /gsoapis (builder)] ERROR com.google.gerrit.server.git.receive.AsyncReceiveCommits : error while processing push
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at com.google.gerrit.server.git.receive.AsyncReceiveCommits.preReceive(AsyncReceiveCommits.java:407)
        at com.google.gerrit.server.git.receive.AsyncReceiveCommits.lambda$asHook$0(AsyncReceiveCommits.java:351)
        at org.eclipse.jgit.transport.ReceivePack.service(ReceivePack.java:2287)
        at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:2200)
        at com.google.gerrit.sshd.commands.Receive.runImpl(Receive.java:98)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:109)
Attachment: 48_cpus_2025_08_26_18_11_55_mRemoteNG_confCons.xml_perkins.png

Matthias Sohn

Aug 26, 2025, 11:47:11 AM
to Guy Levkowitz, Repo and Gerrit Discussion
On Tue, Aug 26, 2025 at 5:14 PM Guy Levkowitz <sil...@gmail.com> wrote:
Hey

We are getting a lot of the error below, and we need to restart the Gerrit service because it gets stuck and developers can't work. This is not the first time it has happened; it occurs almost every 2-4 days.

We have a server with:
free -g
              total        used        free      shared  buff/cache   available
Mem:            161          42          24           2          94         115
Swap:            23           0          23

and 48 CPUs, which come under high load when it happens. The attached screenshot was taken after we restarted the service.


This is the container config in the gerrit.config file:
[container]
    user = git
    javaHome = /usr/lib/jvm/java-21-openjdk-21.0.6.0.7-1.0.1.el8.x86_64
    startupTimeout = 120
    heapLimit = 48g
    javaOptions = "-Xms8g"
    #javaOptions = "-Xmx64g"

Did you intentionally comment out this option setting the maximum heap size?
 
    javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
    javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
    javaOptions = "-XX:+UseG1GC"
    javaOptions = "-XX:MaxGCPauseMillis=200"
    javaOptions = "-XX:+UseStringDeduplication"
    #javaOptions = "-XX:+HeapDumpOnOutOfMemoryError"
    javaOptions = "-XX:HeapDumpPath=/opt/gerrit/logs"
    javaOptions = "-Xlog:gc*:file=/opt/gerrit/logs/jvm_gc.log:time,uptime,tags:filecount=10,filesize=50M"

How did you configure thread pool sizes?
How is the JGit cache configured?
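For reference, the JGit buffer cache is set in gerrit.config under [core]. A minimal sketch with placeholder values (the option names are the standard ones, but the numbers below are only illustrative and need to be sized against your repositories and the 48g heap):

[core]
    # total memory used for caching packfile data
    packedGitLimit = 8g
    # size of a single read window into a packfile
    packedGitWindowSize = 64k
    # number of packfiles kept open at the same time
    packedGitOpenFiles = 4096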
 
What else can be done?
  • Install a monitoring solution to record metrics; I recommend https://gerrit.googlesource.com/gerrit-monitoring/+/refs/heads/master
  • Check whether the Java GC uses a large share of the CPU - does it use more than 20% of the available CPUs? Excessive GC activity can trigger OOM errors.
  • Find out which requests have a large memory allocation in the request logs (httpd_log and sshd_log), see https://gerrit-review.googlesource.com/Documentation/logs.html
  • Reduce thread pool sizes until this doesn't happen anymore.
  • Move fetch (upload-pack) requests to a Gerrit replica; it can also be deployed co-located on the same server. This helps to separate large fetch requests from the other requests in different JVMs, and it allows using the parallel GC for the replica, which has higher throughput but longer pauses.
  • Do you host repositories containing large (most often binary) blobs?
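One more observation on the [container] section you posted: -XX:+HeapDumpOnOutOfMemoryError is commented out, so the next OOM won't leave a heap dump behind for analysis. A sketch of how that part could look, assuming /opt/gerrit/logs has enough free space for a dump roughly the size of the heap:

[container]
    # write a heap dump on OutOfMemoryError so the allocation source can be analyzed offline
    javaOptions = "-XX:+HeapDumpOnOutOfMemoryError"
    javaOptions = "-XX:HeapDumpPath=/opt/gerrit/logs"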

Guy Levkowitz

Aug 27, 2025, 1:21:13 AM
to Repo and Gerrit Discussion
Hey

Regarding the binary blobs - yes, we do push binary files to our repos.

About "
  • reduce thread pool sizes until this doesn't happen anymore - how can I do this 
Today we have defined:
minThreads = 25
maxThreads = 150

Should I reduce it to:
maxThreads = 120





On Tuesday, August 26, 2025 at 18:47:11 UTC+3, Matthias Sohn wrote:

Matthias Sohn

Aug 27, 2025, 5:47:33 AM
to Guy Levkowitz, Repo and Gerrit Discussion
On Wed, Aug 27, 2025 at 7:21 AM Guy Levkowitz <sil...@gmail.com> wrote:
Hey

Regarding the binary blobs - yes, we do push binary files to our repos.

How large are these binary files, typically?

About "
  • reduce thread pool sizes until this doesn't happen anymore - how can I do this 
Today we have defined:
minThreads = 25
maxThreads = 150

Should I reduce it to:
maxThreads = 120

That depends on your server's workload.

The most important thread pool size is sshd.threads. It defines the maximum number of concurrent SSH requests
and also applies to all git requests (both via HTTPS and SSH). It shouldn't be larger than 2 * the number of available CPUs.
A fetch or clone request can keep a CPU core busy for the full runtime of the request. So if your server has 48 CPU cores,
start with 48 and check the metrics to see whether the server can carry your load. If not, reduce it; if it looks like you have some
headroom, you can try to increase it. If you use the gerrit-monitoring solution, you can check the process and queue
dashboards to see the current load and whether requests are queueing up when more concurrent requests arrive than can
be handled by the available worker threads.
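As an illustration only (a starting value derived from your 48 cores, to be tuned against the metrics, not a recommendation):

[sshd]
    # worker threads for SSH commands and git requests
    threads = 48

If the minThreads/maxThreads values you mentioned come from the [httpd] section, that is a separate worker pool for the HTTP server and can be tuned independently of sshd.threads.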
 
Please don't use top-posting on this list. We prefer interleaved posting, which makes it easier to follow the conversation
in a mail thread.
