Gerrit spawning several HTTP threads causing gerrit to crash


hari

May 31, 2019, 4:31:40 AM
to Repo and Gerrit Discussion
I went through one of the earlier posts where this issue was discussed, but unfortunately no solution was provided. I am facing the exact same issue very frequently, at least twice a week. Whenever this happens, the only fix is to restart Gerrit, after which everything is back to normal. This is causing a huge productivity loss.

I am using Gerrit 2.13.9, running on HTTPS/SSL. Whenever this happens, I see that the httpd thread count is maxed out [256 threads] and CPU is at 100%. My JavaMelody plugin shows a huge spike in JDBC threads whenever we see this issue; I am not sure whether this is due to this Gerrit bug. All the JavaMelody graphs are attached below.

jm-gerri.PNG

As you can see, the white vertical gaps in these graphs are when Gerrit went unresponsive and needed to be restarted. Other system-related charts for the same timeline are below.

other-gerrit.PNG

My Gerrit config is as follows:

[database]
        type = mysql
        hostname = localhost
        database = gerrit12001
        username = git12001
        port = 32001
        connectionPool = false
        poolLimit = 5000
        poolMinIdle = 10
        poolMaxIdle = 90
        ## Enable Javamelody plugin to monitor DB and HTTP connexns ##
        dataSourceInterceptorClass = "com.googlesource.gerrit.plugins.javamelody.MonitoringDataSourceInterceptor"

[Note: we currently do not use database.ConnectionPool ]

[sshd]
        listenAddress = xxxxxxxx
        advertisedAddress = gerrit-master
        #threads = 2x num CPUs (DEFAULT) (currently 32x2 = 64)
        # NEw section -- MWS
        threads = 1024
        batchThreads = 512
        maxConnectionsPerUser = 1024
        CommandStartThreads = 5
        idleTimeout = 30m #Set idle ssh connections to timeout.
        backend = MINA


[httpd]
        requestLog = true
        listenUrl = proxy-https://xx.xx.x.x/r
        maxThreads = 256
        minThreads = 20
        Timeout = 60


The server that runs this Gerrit instance is a ProLiant DL360 Gen9 with 48 CPU cores and 250 GB of memory. We have around 3K active users and the instance is heavily used: around 500-1K fetches are running on the primary server at any given time.


hari

May 31, 2019, 4:35:48 AM
to Repo and Gerrit Discussion
And following is the stack-trace when gerrit goes unresponsive. 

Stack-trace
com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Cannot open ReviewDb
at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70) (via modules: com.google.gerrit.server.config.GerritGlobalModule -> com.google.gerrit.server.util.ThreadLocalRequestContext$1)
while locating com.google.gerrit.reviewdb.server.ReviewDb
1 error
      at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1028)
      at com.google.gerrit.httpd.GitOverHttpServlet$UploadFilter.doFilter(GitOverHttpServlet.java:266)
      at org.eclipse.jgit.http.server.SmartServiceInfoRefs$Chain.doFilter(SmartServiceInfoRefs.java:160)
      at org.eclipse.jgit.http.server.SmartServiceInfoRefs.doFilter(SmartServiceInfoRefs.java:112)
      at org.eclipse.jgit.http.server.glue.UrlPipeline$Chain.doFilter(UrlPipeline.java:235)
      at org.eclipse.jgit.http.server.RepositoryFilter.doFilter(RepositoryFilter.java:151)
      at org.eclipse.jgit.http.server.glue.UrlPipeline$Chain.doFilter(UrlPipeline.java:235)
      at org.eclipse.jgit.http.server.NoCacheFilter.doFilter(NoCacheFilter.java:80)
      at org.eclipse.jgit.http.server.glue.UrlPipeline$Chain.doFilter(UrlPipeline.java:235)
      at org.eclipse.jgit.http.server.glue.UrlPipeline.service(UrlPipeline.java:215)
      at org.eclipse.jgit.http.server.glue.SuffixPipeline.service(SuffixPipeline.java:101)
      at org.eclipse.jgit.http.server.glue.MetaFilter.doFilter(MetaFilter.java:175)
      at org.eclipse.jgit.http.server.glue.MetaServlet.service(MetaServlet.java:133)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
      at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286)
      at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276)
      at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181)
      at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
      at com.google.gerrit.httpd.GetUserFilter.doFilter(GetUserFilter.java:82)
      at com.google.gerrit.httpd.RequireSslFilter.doFilter(RequireSslFilter.java:77)
      at com.google.gwtexpui.server.CacheControlFilter.doFilter(CacheControlFilter.java:73)
      at com.google.gerrit.httpd.RunAsFilter.doFilter(RunAsFilter.java:122)
      at com.google.gerrit.httpd.ProjectBasicAuthFilter.doFilter(ProjectBasicAuthFilter.java:105)
      at com.google.gerrit.httpd.RequestMetricsFilter.doFilter(RequestMetricsFilter.java:60)
      at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:136)
      at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
      at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
      at com.googlesource.gerrit.plugins.javamelody.GerritMonitoringFilter.doFilter(GerritMonitoringFilter.java:67)
      at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:132)
      at com.google.gerrit.httpd.AllRequestFilter$FilterProxy.doFilter(AllRequestFilter.java:105)
      at com.google.gerrit.httpd.RequestContextFilter.doFilter(RequestContextFilter.java:75)
      at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
      at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
      at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
      at org.eclipse.jetty.server.Server.handle(Server.java:499)
      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
      at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
      at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.gwtorm.server.OrmException: Cannot open database connection
      at com.google.gwtorm.jdbc.Database.newConnection(Database.java:130)
      at com.google.gwtorm.jdbc.JdbcSchema.<init>(JdbcSchema.java:40)
      at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$16.<init>(Unknown Source)
      at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$16_Factory_GwtOrm$$17.open(Unknown Source)
      at com.google.gwtorm.jdbc.Database.open(Database.java:122)
      at com.google.gerrit.server.schema.NotesMigrationSchemaFactory.open(NotesMigrationSchemaFactory.java:40)
      at com.google.gerrit.server.schema.NotesMigrationSchemaFactory.open(NotesMigrationSchemaFactory.java:25)
      at com.google.gerrit.server.config.RequestScopedReviewDbProvider.get(RequestScopedReviewDbProvider.java:46)
      at com.google.gerrit.server.config.RequestScopedReviewDbProvider.get(RequestScopedReviewDbProvider.java:27)
      at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70)
      at com.google.gerrit.server.util.ThreadLocalRequestContext$1$$FastClassByGuice$$75e0eb90.invoke(<generated>)
      at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
      at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
      at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
      at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
      at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
      at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
      ... 50 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 21,952 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
      at sun.reflect.GeneratedConstructorAccessor162.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
      at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
      at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
      at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:919)
      at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1694)
      at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1244)
      at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2397)
      at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2430)
      at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2215)
      at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:813)
      at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
      at sun.reflect.GeneratedConstructorAccessor50.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
      at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:399)
      at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:334)
      at com.google.gwtorm.jdbc.SimpleDataSource.getConnection(SimpleDataSource.java:104)
      at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at net.bull.javamelody.JdbcWrapper$3.invoke(JdbcWrapper.java:781)
      at net.bull.javamelody.JdbcWrapper$DelegatingInvocationHandler.invoke(JdbcWrapper.java:294)
      at com.sun.proxy.$Proxy16.getConnection(Unknown Source)
      at com.google.gwtorm.jdbc.Database.newConnection(Database.java:128)
      ... 66 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
      at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3017)
      at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
      ... 90 more

Luca Milanesio

May 31, 2019, 4:40:28 AM
to hari, Luca Milanesio, Repo and Gerrit Discussion
Hi,
I believe you have two separate issues.

1) The "blank space" in your JavaMelody graphs: that looks like a "stop-the-world" GC cycle
2) The stacktrace below: MySQL connectivity issues

With regards to 1), you have an excessive heap utilization: are you using slaves for the Git clone traffic? How big are your repos? Have you tried to get a heap analysis in JavaMelody to see where you consume all that memory?
With regards to 2), looks like you have some peaks of 60k queries per minute !!! (up to 1k queries a second) I am not surprised you end up eating all your TCP ports and then not able to connect to MySQL anymore. Why don't you use a co-located MySQL on localhost?

HTH

Luca.

-- 
-- 
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

--- 
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/fc96d2a8-3efa-4f50-8821-d844b74f943c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthias Sohn

May 31, 2019, 5:06:22 AM
to hari, Repo and Gerrit Discussion
On Fri, May 31, 2019 at 10:31 AM hari <bhh...@gmail.com> wrote:
I went through one of the earlier post where this issue was discussed but unfortunately no solutions were provided. I am facing the exact same issue very frequently, at least twice a week. Whenever this happens, only solution is to restart gerrit after which everything is back to normal. This is causing huge productivity loss. 

I am using Gerrit 2.13.9 version that is running on HTTPS/SSL. Whenever this happens, I see that httpd thread count is maxed out [256 threads] and CPU is at 100%.  My java melody plugin shows a huge spike in JDBC threads whenever we see this issue, not sure if this is due to this gerrit bug. All the java melody graphs are attached below. 

Why didn't you upgrade at least to the latest 2.13.x release, which is 2.13.14?

Note that the oldest version which is still maintained in the Open Source project is 2.15 [2].

The bug you referenced was fixed in 2.14.6, so if you need that fix you can either upgrade to a higher version or apply this patch on stable-2.13 and roll your own build.
Are you using suexec [1]? Users can only use it if you granted them the corresponding capability.

What are your JVM settings, and how did you configure the JGit cache?

Maybe you should consider offloading reads to slaves.

-Matthias

hari

May 31, 2019, 5:10:16 AM
to Repo and Gerrit Discussion
Hi Luca,

Thanks for your quick reply. Regarding your questions, 

1) you have an excessive heap utilization: are you using slaves for the Git clone traffic? How big are your repos? Have you tried to get a heap analysis in JavaMelody to see where you consume all that memory?

          -    Yes, we are using read-only slaves worldwide for git clone traffic. However, we have certain super users who fetch directly from the Gerrit master server (these super users contribute around 3K-4K fetches every day on the Gerrit master). Regarding the heap analysis, I am trying to generate a heap dump as we speak. I can share it once it is generated and analyzed.


2)looks like you have some peaks of 60k queries per minute !!! (up to 1k queries a second) I am not surprised you end up eating all your TCP ports and then not able to connect to MySQL anymore. Why don't you use a co-located MySQL on localhost?

-   Pardon my ignorance, but what exactly do you mean by a co-located MySQL on localhost? Do you mean having multi-master?


hari

May 31, 2019, 5:12:14 AM
to Repo and Gerrit Discussion
Also, our repos are quite big. The biggest repo is around 60 GB, and all repos together total around 1000 GB.

Luca Milanesio

May 31, 2019, 5:14:04 AM
to hari, Luca Milanesio, Repo and Gerrit Discussion

On 31 May 2019, at 10:10, hari <bhh...@gmail.com> wrote:

Hi Luca,

Thanks for your quick reply. Regarding your questions, 

1) you have an excessive heap utilization: are you using slaves for the Git clone traffic? How big are your repos? Have you tried to get a heap analysis in JavaMelody to see where you consume all that memory?

          -    Yes, we are using read-only slaves worldwide for git clone traffic. However we have certain super users who will fetch directly from the gerrit master server ( these super users contribute to around 3K-4K fetches every day on gerrit master). And regarding the heap analysis, I am trying to generate heap dump as we speak. I can share it once it is generated and analyzed. 

4k fetches a day *isn't much* and I am surprised they raise so much heap, if the JGit cache is the problem, a big *IF* :-)
Do you have metrics or analytics?
Do you have a single master or several masters in HA in the center?



2)looks like you have some peaks of 60k queries per minute !!! (up to 1k queries a second) I am not surprised you end up eating all your TCP ports and then not able to connect to MySQL anymore. Why don't you use a co-located MySQL on localhost?

-   Pardon my ignorance, but what you exactly mean by co-located mysql on localhost? Do you mean having multi-master ? 

In your gerrit.config it looks like your MySQL server isn't on the same host but is remote. If you do 1k SQL queries per *second*, you'll be opening 1k sockets per second and you will very soon run out of ephemeral ports.
You should either use pooling (not suggested on MySQL, you should move to PostgreSQL) OR move MySQL to the same host and use it through the loopback.
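
To put rough numbers on the port exhaustion, here is a minimal Python sketch. It assumes Linux defaults that are not stated anywhere in this thread (an ephemeral port range of 32768-60999 and a 60-second TIME_WAIT), that every query opens and closes its own connection, and that the client side closes first. The function names are made up for illustration.

```python
# Back-of-the-envelope estimate of how fast un-pooled connections use up ephemeral ports.
# ASSUMPTIONS (not from the thread): Linux default port range 32768-60999 and a
# 60-second TIME_WAIT. Check net.ipv4.ip_local_port_range on the real host.

def usable_ports(port_low=32768, port_high=60999):
    """Number of ephemeral ports available for outgoing connections."""
    return port_high - port_low + 1


def max_sustained_connects_per_sec(port_low=32768, port_high=60999, time_wait_s=60):
    """Highest steady rate of new connections to one server address and port
    before every ephemeral port is parked in TIME_WAIT."""
    return usable_ports(port_low, port_high) / time_wait_s


def seconds_until_exhaustion(connects_per_sec, port_low=32768, port_high=60999,
                             time_wait_s=60):
    """Seconds until the ports run out at the given rate, or None if sustainable."""
    ports = usable_ports(port_low, port_high)
    if connects_per_sec * time_wait_s <= ports:
        return None
    # No port is released during the first time_wait_s seconds, so the pool
    # simply drains at the connection rate.
    return ports / connects_per_sec


if __name__ == "__main__":
    print(round(max_sustained_connects_per_sec()))    # 471
    print(round(seconds_until_exhaustion(1000), 1))   # 28.2
```

Under those assumptions, anything above roughly 470 new connections per second to a single MySQL address cannot be sustained, and 1k per second runs dry in under half a minute.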

HTH

Luca.


hari

May 31, 2019, 5:18:16 AM
to Repo and Gerrit Discussion
Yes, an upgrade is in the pipeline. We are about to upgrade to 2.16.8 next quarter. But I want to make sure we do not hit the same issue again, which may be due to a poor workflow. Regarding 'suexec', we have indeed given the 'RunAs' capability to a few super-users (mostly service accounts). But I am unsure how to trace their fetch calls to confirm whether they are using suexec parameters.

Luca Milanesio

May 31, 2019, 5:26:10 AM
to hari, Luca Milanesio, Repo and Gerrit Discussion

On 31 May 2019, at 10:18, hari <bhh...@gmail.com> wrote:

Yes, upgrade is in pipeline. We are about to upgrade to 2.16.8 next quarter.

Plan small steps, not skipping any release in between.

But I am just making sure, we are not hitting the same issue again may be due to poor workflow.

Don't expect an upgrade to solve your scalability issues: first understand what the problem is and how to change your setup to make it more scalable. Then take small steps to improve things.
An upgrade is one of them, but it isn't the "silver bullet" for scalability issues.

HTH

Luca.


Matthias Sohn

May 31, 2019, 5:26:16 AM
to hari, Repo and Gerrit Discussion
On Fri, May 31, 2019 at 11:18 AM hari <bhh...@gmail.com> wrote:
Yes, upgrade is in pipeline. We are about to upgrade to 2.16.8 next quarter. But I am just making sure, we are not hitting the same issue again may be due to poor workflow. And regarding the 'suexec', we indeed have given the 'RunAs' to few super-users (mostly service accounts). But I am unsure how to trace their fetch calls to confirm if they are using suexec parameters. 

Withdraw this permission (after telling those who have it) to test whether this fixes your issue?

Matthias Sohn

May 31, 2019, 5:27:14 AM
to hari, Repo and Gerrit Discussion
On Fri, May 31, 2019 at 11:12 AM hari <bhh...@gmail.com> wrote:
Also our repos are quite big. The biggest repo is around 60 G and entire repos constitute around 1000Gigs

How did you configure the max heap size and core.packedGitLimit?
 

hari

May 31, 2019, 5:40:38 AM
to Repo and Gerrit Discussion
Here are our current settings.

[core]
        packedGitOpenFiles = 4096
        packedGitLimit = 8g
        packedGitWindowSize = 16k

[container]
        user = git12001
        javaHome = /scm/apps/jdk1.8.0_111/jre
        javaOptions = -Djavamelody.log=true
        heapLimit = 48g

I have also disabled the 'RunAs' capability. I will monitor for a week or so and report back on whether that resolves the issue.

Matthias Sohn

May 31, 2019, 6:00:29 AM
to hari, Repo and Gerrit Discussion
On Fri, May 31, 2019 at 11:40 AM hari <bhh...@gmail.com> wrote:
Here's our current settings. 

[core]
        packedGitOpenFiles = 4096
        packedGitLimit = 8g
        packedGitWindowSize = 16k

Your packedGitLimit is way too small if you have 1TB of repositories and your largest repo is 60GB.

This means you can only serve 8GB worth of git objects from the cache; for all objects not in the cache
you need to read from the filesystem and (re-)parse git objects. If someone clones your 60GB
repository, the cache content will be displaced at least 5 times, and all requests for other repositories
need to go back and read from disk and (re-)parse objects.

If you increase it, don't forget to also increase container.heapLimit.
packedGitOpenFiles is probably also too small.
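
The displacement argument can be checked with a little arithmetic. This is only a sketch: parse_size and the other helpers are hypothetical names, and it assumes that a clone streams the repository's pack data through the cache once, which simplifies what JGit really does.

```python
# Rough arithmetic for the cache-displacement argument.
# All helper names are hypothetical; sizes use gerrit.config-style suffixes.

UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '8g' or '16k' (or a plain number) to bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def cache_turnovers(repo_bytes, packed_git_limit_bytes):
    """How many times the whole cache is replaced when repo_bytes of pack data
    pass through it once (simplifying assumption: one sequential pass)."""
    return repo_bytes / packed_git_limit_bytes


def cache_fraction_of_heap(packed_git_limit_bytes, heap_limit_bytes):
    """Share of the JVM heap that the pack cache may occupy."""
    return packed_git_limit_bytes / heap_limit_bytes


if __name__ == "__main__":
    print(cache_turnovers(parse_size("60g"), parse_size("8g")))           # 7.5
    print(cache_fraction_of_heap(parse_size("96g"), parse_size("256g")))  # 0.375
```

For the settings in this thread, a 60g repository through an 8g cache is 7.5 turnovers, and 8g of cache in a 48g heap is about 17% of the heap.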

We use

core.packedGitOpenFiles = 30000
core.packedGitLimit = 96g
container.heapLimit = 256g

Why did you increase the window size?
 

luca.mi...@gmail.com

May 31, 2019, 6:07:52 AM
to Matthias Sohn, hari, Repo and Gerrit Discussion


Wow, what JVM are you using? With Java 8 and 256 GB of heap, the STW GC would take minutes !!!!

Luca

Matthias Sohn

May 31, 2019, 8:15:43 AM
to Luca Milanesio, hari, Repo and Gerrit Discussion
We use sapjvm 8.1 [1] with G1GC and the following options:

    javaOptions = -Xms256g
    javaOptions = -Xmx256g
    javaOptions = -XX:+UnlockExperimentalVMOptions
    javaOptions = -XX:G1NewSizePercent=35
    javaOptions = -XX:MaxGCPauseMillis=500
    javaOptions = -XX:+UseGCLogFileRotation
    javaOptions = -XX:GCLogFileSize=40M
    javaOptions = -XX:NumberOfGCLogFiles=20
    javaOptions = -XX:+PrintGCDateStamps
    javaOptions = -XX:+PrintGCDetails
    javaOptions = -XX:+PrintGCApplicationStoppedTime
    javaOptions = -XX:+PrintTenuringDistribution
    javaOptions = -XX:+PrintAdaptiveSizePolicy
    javaOptions = -XX:+PrintReferenceGC
    javaOptions = -XX:+UnlockDiagnosticVMOptions
    javaOptions = -XX:+G1SummarizeRSetStats
    javaOptions = -XX:+GCHistory

During weekdays we serve around 100k HTTP requests plus 100k SSH requests per hour
from the master (we also use slaves, mostly to offload traffic from build servers),
and GC pauses typically range from 10 ms to around 1 s:

Screenshot 2019-05-31 at 13.34.07.png
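
If you want the same kind of pause numbers from your own logs, the sketch below pulls the stop times that -XX:+PrintGCApplicationStoppedTime writes on Java 8 HotSpot. The helper names and the sample lines are made up for illustration, and note that these lines cover every safepoint pause, not only GC.

```python
import re

# Line written by -XX:+PrintGCApplicationStoppedTime on Java 8 HotSpot.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def stopped_times(lines):
    """Return every application-stopped duration (in seconds) found in the lines."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses


def summarize(pauses, threshold_s=1.0):
    """Count the pauses, find the longest, and count those above threshold_s."""
    return {
        "count": len(pauses),
        "max_s": max(pauses, default=0.0),
        "over_threshold": sum(1 for p in pauses if p > threshold_s),
    }


# Hypothetical sample lines; the timestamps and durations are invented.
SAMPLE = [
    "2019-05-31T13:34:07.123+0200: 10.500: Total time for which application "
    "threads were stopped: 0.0123456 seconds, Stopping threads took: 0.0000789 seconds",
    "2019-05-31T13:34:09.456+0200: 12.833: [GC pause (G1 Evacuation Pause) (young)",
    "2019-05-31T13:34:11.000+0200: 14.377: Total time for which application "
    "threads were stopped: 1.5000000 seconds, Stopping threads took: 0.0001000 seconds",
]

if __name__ == "__main__":
    print(summarize(stopped_times(SAMPLE)))
```

To run it against a real log, pass the open file instead of SAMPLE, for example stopped_times(open(path)) with path pointing at the file given to -Xloggc.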

Nasser Grainawi

May 31, 2019, 11:41:34 AM
to Luca Milanesio, hari, Repo and Gerrit Discussion
On May 31, 2019, at 3:26 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:



On 31 May 2019, at 10:18, hari <bhh...@gmail.com> wrote:

Yes, upgrade is in pipeline. We are about to upgrade to 2.16.8 next quarter.

Plan small steps, not skipping any release in between.

But I am just making sure, we are not hitting the same issue again may be due to poor workflow.

Don't expect an upgrade to solve your scalability issues: first understand what is the problem and how to change your setup to make it more scalable. Then, do small steps to improve things out.
Upgrade is one of them, but isn't the "silver bullet" to solve scalability issues.

I don't presume to know whether this is your issue, but if you have a repo with many commits (millions), many tags (10k+), and many non-tag/non-change refs (10k+?), you could be running into the excess memory usage we discovered in TagSet. See https://gerrit-review.googlesource.com/c/gerrit/+/224772 for a partial improvement. We had one repo that could not run ls-remote without getting an OOM with anything less than a 70GB heap; with this change it worked with a heap of 20GB or larger. We have a further improvement in progress that we hope will get it under 1GB for our use case.

Nasser



-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

Luca Milanesio

May 31, 2019, 11:51:10 AM
to Matthias Sohn, Luca Milanesio, hari, Repo and Gerrit Discussion
Is SAP-JVM 8.1 open source? I believe the above parameters are specific to that implementation and won't work on OpenJDK :-(

Luca.



Matthias Sohn

May 31, 2019, 8:37:01 PM
to Luca Milanesio, hari, Repo and Gerrit Discussion
No, AFAIK it's not.

SapMachine is open source, though; find it here.
It has been available since Java 11.

Almost all of these options are also available in OpenJDK [1].
Only -XX:+GCHistory seems to be a sapjvm-specific option.

I could successfully start Gerrit 2.16 using

$ java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)

and this JVM configuration in gerrit.config

[container]
    javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
    javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
    javaOptions = -Xloggc:/Users/xxx/gerrit-site-2.16/logs/javagc.log
    javaOptions = -XX:+UnlockExperimentalVMOptions
    javaOptions = -XX:G1NewSizePercent=35
    javaOptions = -XX:MaxGCPauseMillis=500
    javaOptions = -XX:+UnlockDiagnosticVMOptions
    javaOptions = -XX:+G1SummarizeRSetStats
    javaOptions = -XX:+UseG1GC
    javaHome = /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre


-Matthias

Matthias Sohn

May 31, 2019, 8:46:16 PM
to Luca Milanesio, hari, Repo and Gerrit Discussion
Also, the full set of options that we use on sapjvm, except the proprietary GCHistory, works on AdoptOpenJDK:

    javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
    javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
    javaOptions = -Xloggc:/Users/d029788/tmp/gerrit-site-2.16/logs/javagc.log
    javaOptions = -XX:+UseG1GC
    javaOptions = -XX:+UnlockExperimentalVMOptions
    javaOptions = -XX:G1NewSizePercent=35
    javaOptions = -XX:MaxGCPauseMillis=500
    javaOptions = -XX:+UseGCLogFileRotation
    javaOptions = -XX:GCLogFileSize=40M
    javaOptions = -XX:NumberOfGCLogFiles=20
    javaOptions = -XX:+PrintGCDateStamps
    javaOptions = -XX:+PrintGCDetails
    javaOptions = -XX:+PrintGCApplicationStoppedTime
    javaOptions = -XX:+PrintTenuringDistribution
    javaOptions = -XX:+PrintAdaptiveSizePolicy
    javaOptions = -XX:+PrintReferenceGC
    javaOptions = -XX:+UnlockDiagnosticVMOptions
    javaOptions = -XX:+G1SummarizeRSetStats
    javaHome = /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre
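
A small sketch for anyone copying such a list between JVMs: it collects the javaOptions values from gerrit.config-style text and drops the ones known to be vendor-specific. The helper names are hypothetical, and the SAPJVM_ONLY set holds just the one flag identified in this thread.

```python
# Flags reported in this thread as sapjvm-specific; extend as needed.
SAPJVM_ONLY = {"-XX:+GCHistory"}


def java_options(config_text):
    """Collect every javaOptions value from gerrit.config-style text, in order."""
    options = []
    for raw in config_text.splitlines():
        # Split at the first '=' only, since option values may contain '=' too.
        key, sep, value = raw.strip().partition("=")
        if sep and key.strip().lower() == "javaoptions":
            options.append(value.strip().strip('"'))
    return options


def portable_options(options, vendor_only=SAPJVM_ONLY):
    """Drop the options listed as vendor-specific, keeping the original order."""
    return [opt for opt in options if opt not in vendor_only]


# Abbreviated sample built from options quoted in the thread.
SAMPLE_CONFIG = """
[container]
    javaOptions = -XX:+UseG1GC
    javaOptions = -XX:G1NewSizePercent=35
    javaOptions = -XX:+GCHistory
    heapLimit = 256g
"""

if __name__ == "__main__":
    for option in portable_options(java_options(SAMPLE_CONFIG)):
        print(option)
```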