Jetty Thread Starvation


m33...@gmail.com

Aug 21, 2020, 10:50:17 AM
to Repo and Gerrit Discussion

Hi, we are quite often hitting thread starvation on what seems to be the Jetty thread pool. This ends up making Gerrit unusable and requires a restart.

[2020-08-21 08:06:49,105] [HTTP-163264] WARN  org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@73852720{STARTED,5<=25<=25,i=0,q=256}[ReservedThreadExecutor@2f63d51f{s=2/2,p=0}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@5e5d1383


We have seen some very large changes affecting many files (deleting ~5605 files) around the time of the thread starvation, which might be the cause. I have read that diffing a large file can cause this issue.

We are running Gerrit 2.16.13 using Docker on a CentOS VM with 20 CPUs and 60 GB RAM. I have attached our configuration.

1. Are there any improvements or issues with our Gerrit configuration file?
2. What can we do to prevent the thread pool from being exhausted?

Thanks, /M

[receive]
    enableSignedPush = true
    checkReferencedObjectsAreReachable = false

[gerrit]
    basePath = git
    editGpgKeys = true
    canonicalWebUrl = <REDACTED>
    serverId = <REDACTED>

[container]
    heapLimit = 32g
    user = gerrit
    javaHome = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre

[core]
    packedGitLimit = 8g
    packedGitWindowSize = 16k
    packedGitOpenFiles = 1024

[sshd]
    listenAddress = *:29418
    threads = 24
    waitTimeout = 120s
    maxConnectionsPerUser = 256
    batchThreads = 28
    advertisedAddress = <REDACTED>

[gc]
    startTime = Sun 04:00
    interval = 3 days

[auth]
    <REDACTED>

[ldap]
    <REDACTED>

[plugins]
    allowRemoteAdmin = true

[plugin "events-log"]
    storeUrl = jdbc:h2:/var/gerrit/db/ChangeEvents

[index]
    type = LUCENE

[httpd]
    listenUrl = proxy-https://*:8081/
    maxQueued = 256
    filterClass = com.googlesource.gerrit.plugins.saml.SamlWebFilter

[cache]
    directory = cache

[user]
    <REDACTED>

[sendemail]
    enable = false
    smtpServer = localhost

[noteDb "changes"]
    autoMigrate = false
    trial = false
    write = true
    read = true
    sequence = true
    primaryStorage = NOTE_DB
    disableReviewDb = true

[database]
    type = h2
    database = /var/gerrit/db/ReviewDB

[theme]
    topMenuColor = B1CAF2

[saml]
    keystorePath = /var/gerrit/etc/samlKeystore.jks
    metadataPath = file:///var/gerrit/FederationMetadata.xml
    useNameQualifier = false

Luca Milanesio

Aug 21, 2020, 11:01:55 AM
to m33...@gmail.com, Luca Milanesio, Repo and Gerrit Discussion

On 21 Aug 2020, at 09:40, m33...@gmail.com <m33...@gmail.com> wrote:


> Hi, we are quite often hitting thread starvation on what seems to be the Jetty thread pool. This ends up making Gerrit unusable and requires a restart.
>
> [2020-08-21 08:06:49,105] [HTTP-163264] WARN  org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@73852720{STARTED,5<=25<=25,i=0,q=256}[ReservedThreadExecutor@2f63d51f{s=2/2,p=0}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@5e5d1383


That is typically the consequence of something blocking all threads at the same time.
In my experience, it is usually an external factor (LDAP, DBMS, something else?).
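
For example, a quick way to spot-check the LDAP leg from the Gerrit host is a timed search (a sketch only: the host, base DN, filter and the anonymous bind are placeholders to adapt, and it assumes the OpenLDAP client tools are installed):

    time ldapsearch -H ldap://<your-ldap-host> -x -b "dc=example,dc=com" "(sAMAccountName=jdoe)" dn

If that intermittently takes seconds rather than milliseconds, you have a prime suspect.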


> We have seen some very large changes affecting many files (deleting ~5605 files) around the time of the thread starvation, which might be the cause. I have read that diffing a large file can cause this issue.

That would have blocked one thread, but not all of them.


> We are running Gerrit 2.16.13 using Docker on a CentOS VM with 20 CPUs and 60 GB RAM. I have attached our configuration.
>
> 1. Are there any improvements or issues with our Gerrit configuration file?
> 2. What can we do to prevent the thread pool from being exhausted?

You should next time get a JVM thread dump to understand where all the threads are blocked.
Once you’ve identified the root cause, you can fix or mitigate it.
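
For example, something like this grabs a few dumps to compare (a sketch only: it assumes your container is named "gerrit", that pgrep exists in the image, and that a JDK is available inside it; note that jstack ships with the JDK, while your javaHome points at a JRE):

    # find the Gerrit JVM PID inside the container
    PID=$(docker exec gerrit pgrep -f java)
    # take three dumps 10 seconds apart so they can be compared;
    # jstack must run as the same user as the JVM, hence -u gerrit
    for i in 1 2 3; do
      docker exec -u gerrit gerrit jstack $PID > thread-dump.$i.txt
      sleep 10
    done

If no JDK is available in the image, docker exec gerrit kill -3 $PID makes the JVM print the dump to its own stdout instead, and you can read it with docker logs gerrit.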

Making changes without knowing the problem is like shooting in the dark: it may work, or you may hurt someone :-(

P.S. A general comment on your config: there are way too many “<REDACTED>” entries, which makes it difficult to understand how you configured things. Can you just replace the name of your company with “<redacted>.com” and leave everything else?

HTH

Luca.


m33...@gmail.com

Aug 21, 2020, 12:04:29 PM
to Repo and Gerrit Discussion
On Friday, August 21, 2020 at 5:01:55 PM UTC+2 lucamilanesio wrote:

> On 21 Aug 2020, at 09:40, m33...@gmail.com <m33...@gmail.com> wrote:


>> Hi, we are quite often hitting thread starvation on what seems to be the Jetty thread pool. This ends up making Gerrit unusable and requires a restart.
>>
>> [2020-08-21 08:06:49,105] [HTTP-163264] WARN  org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@73852720{STARTED,5<=25<=25,i=0,q=256}[ReservedThreadExecutor@2f63d51f{s=2/2,p=0}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@5e5d1383


> That is typically the consequence of something blocking all threads at the same time.
> In my experience, it is usually an external factor (LDAP, DBMS, something else?).

That makes a lot of sense; we have had plenty of issues with LDAP in the past, so I wouldn't be surprised if that is the cause.


>> We have seen some very large changes affecting many files (deleting ~5605 files) around the time of the thread starvation, which might be the cause. I have read that diffing a large file can cause this issue.

> That would have blocked one thread, but not all of them.


>> We are running Gerrit 2.16.13 using Docker on a CentOS VM with 20 CPUs and 60 GB RAM. I have attached our configuration.
>>
>> 1. Are there any improvements or issues with our Gerrit configuration file?
>> 2. What can we do to prevent the thread pool from being exhausted?

> You should next time get a JVM thread dump to understand where all the threads are blocked.
> Once you’ve identified the root cause, you can fix or mitigate it.
>
> Making changes without knowing the problem is like shooting in the dark: it may work, or you may hurt someone :-(

Of course! Will report back when I have a thread dump in hand.


> P.S. A general comment on your config: there are way too many “<REDACTED>” entries, which makes it difficult to understand how you configured things. Can you just replace the name of your company with “<redacted>.com” and leave everything else?

Absolutely, please see below,

Thanks /M

[receive]
    enableSignedPush = true
    checkReferencedObjectsAreReachable = false

[gerrit]
    basePath = git
    editGpgKeys = true
    canonicalWebUrl = https://gerrit.company.com/
    serverId = dc3d245d-b898-4b72-972f-6a612be4f327

[container]
    heapLimit = 32g
    user = gerrit
    javaHome = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre

[core]
    packedGitLimit = 8g
    packedGitWindowSize = 16k
    packedGitOpenFiles = 1024

[sshd]
    listenAddress = *:29418
    threads = 24
    waitTimeout = 120s
    maxConnectionsPerUser = 256
    batchThreads = 28
    advertisedAddress = gerrit.company.com:29418

[gc]
    startTime = Sun 04:00
    interval = 3 days

[auth]
    type = HTTP_LDAP
    logoutUrl = https://fs.company.com/adfs/ls/?wa=wsignout1.0
    httpHeader = X-SAML-UserName
    httpDisplaynameHeader = X-SAML-DisplayName
    httpEmailHeader = X-SAML-EmailHeader
    httpExternalIdHeader = X-SAML-ExternalId

[ldap]
    server = ldap://ad.company.com
    sslVerify = false
    referral = follow
    accountBase = ou=Locations,dc=ad,dc=company,dc=com
    accountPattern = (&(objectClass=user)(userPrincipalName=${username}))
    accountFullName = displayName
    accountEmailAddress = mail
    accountSshUserName = ${sAMAccountName.toLowerCase}
    accountMemberField = memberOf
    fetchMemberOfEagerly = true
    groupBase = ou=Locations,dc=ad,dc=company,dc=com
    groupPattern = (&(objectClass=group)(cn=${groupname}))
    username = service...@company.com

[plugins]
    allowRemoteAdmin = true

# plugin in use by zuul2 in ci_service_environment repo
[plugin "events-log"]
    storeUrl = jdbc:h2:/var/gerrit/db/ChangeEvents

[index]
    type = LUCENE

[httpd]
    listenUrl = proxy-https://*:8081/
    # based on sshd.maxConnectionsPerUser (default 200)
    maxQueued = 256
    filterClass = com.googlesource.gerrit.plugins.saml.SamlWebFilter

[cache]
    directory = cache

[user]
    email = svc-...@company.com

Matthias Sohn

Aug 21, 2020, 12:11:58 PM
to Luca Milanesio, m33...@gmail.com, Repo and Gerrit Discussion
On Fri, Aug 21, 2020 at 5:01 PM Luca Milanesio <luca.mi...@gmail.com> wrote:


> On 21 Aug 2020, at 09:40, m33...@gmail.com <m33...@gmail.com> wrote:


>> Hi, we are quite often hitting thread starvation on what seems to be the Jetty thread pool. This ends up making Gerrit unusable and requires a restart.
>>
>> [2020-08-21 08:06:49,105] [HTTP-163264] WARN  org.eclipse.jetty.util.thread.QueuedThreadPool : QueuedThreadPool[HTTP]@73852720{STARTED,5<=25<=25,i=0,q=256}[ReservedThreadExecutor@2f63d51f{s=2/2,p=0}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@5e5d1383


> That is typically the consequence of something blocking all threads at the same time.
> In my experience, it is usually an external factor (LDAP, DBMS, something else?).


>> We have seen some very large changes affecting many files (deleting ~5605 files) around the time of the thread starvation, which might be the cause. I have read that diffing a large file can cause this issue.

> That would have blocked one thread, but not all of them.


>> We are running Gerrit 2.16.13 using Docker on a CentOS VM with 20 CPUs and 60 GB RAM. I have attached our configuration.
>>
>> 1. Are there any improvements or issues with our Gerrit configuration file?
>> 2. What can we do to prevent the thread pool from being exhausted?

> You should next time get a JVM thread dump to understand where all the threads are blocked.
> Once you’ve identified the root cause, you can fix or mitigate it.

Whenever you have a situation where a request, or Gerrit as a whole, hangs, create a couple of thread dumps.
From the stack traces for each thread you can see what's going on and where it is stuck.
By comparing multiple of these thread dumps you can see whether everything is completely stuck or whether something is still moving.
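
For example, assuming you captured the dumps with jstack as in the example earlier in the thread (the Thread.State lines are standard HotSpot thread-dump output):

    # per-dump count of threads blocked on a monitor
    grep -c 'java.lang.Thread.State: BLOCKED' thread-dump.*.txt

If the same threads sit BLOCKED or WAITING on the same stack in every dump, that stack is your bottleneck.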

As Luca pointed out already, such global problems are often caused by some resource needed by many threads (LDAP, database, filesystem, etc.) being very slow. If, for example, a cache needs to be updated from disk and multiple threads need that update, only one thread will do the update, and if that is slow all the other threads are blocked waiting for the update to happen.

If you need a larger httpd thread pool, increase httpd.maxThreads [1]; the default is 25.

[1] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#httpd.maxThreads
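
For example (50 is only an illustrative value; size it to what your backends, LDAP in particular, can actually serve in parallel):

[httpd]
    maxThreads = 50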
 
