Thread locks on ExponentiallyDecayingReservoir


Nuno Costa

Oct 15, 2025, 6:43:48 AM
to Repo and Gerrit Discussion
Hi All,

After we upgraded from 3.4.8 to 3.9.11 (Java 11), we noticed some thread locks that caused thread exhaustion, and a Gerrit restart was needed to clear them.

This happens under both high and low usage of Gerrit.
One of the occurrences was during the weekend, when the server had a load average below 5 on a machine with 256 CPU cores.

We focused our investigation especially on this occurrence, to take high load out of the equation.

The Prometheus metrics and log parsing did not clearly point to any specific action that could trigger this behaviour.

On the last occurrence of the issue, we took a thread dump, which showed threads blocked on a lock inside com.codahale.metrics.ExponentiallyDecayingReservoir.
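Some background on why this reservoir takes a lock at all on the update path. ExponentiallyDecayingReservoir uses forward decay: a sample recorded t seconds after a landmark gets weight exp(alpha * t). Because that weight grows without bound, update() periodically calls rescaleIfNeeded(), which rewrites the stored priorities against a new landmark; the trace below shows that this rescale goes through lockForRescale() and the write lock. The small Java sketch below only does the arithmetic. It assumes the library defaults alpha = 0.015 per second and a one-hour rescale threshold (our reading of the Dropwizard source, not checked against the exact version bundled with Gerrit 3.9.11), and the class and method names are hypothetical.

```java
/** Arithmetic sketch of forward-decay weights; not Dropwizard code. */
public class ForwardDecayWeights {
    // Assumed defaults of Dropwizard's ExponentiallyDecayingReservoir.
    static final double DEFAULT_ALPHA = 0.015;           // decay factor, per second
    static final long RESCALE_THRESHOLD_SECONDS = 3600;  // rescale roughly once per hour

    /** Weight of a sample recorded t seconds after the landmark: exp(alpha * t). */
    static double weight(double alpha, double secondsSinceLandmark) {
        return Math.exp(alpha * secondsSinceLandmark);
    }

    /** Seconds after the landmark at which exp(alpha * t) overflows a double to Infinity. */
    static double overflowSeconds(double alpha) {
        return Math.log(Double.MAX_VALUE) / alpha;
    }

    public static void main(String[] args) {
        System.out.printf("weight after one threshold: %.3e%n",
                weight(DEFAULT_ALPHA, RESCALE_THRESHOLD_SECONDS));
        System.out.printf("weights overflow after:     %.1f hours%n",
                overflowSeconds(DEFAULT_ALPHA) / 3600);
    }
}
```

With these assumed values a weight reaches about 2.8e23 after one hour and would overflow to Infinity after roughly 13 hours without a rescale, so the reservoir cannot simply skip rescaling; every rescale, however, needs exclusive access to the reservoir.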

The blocked threads were handling different types of operations. Below is just one example:

"HTTP GET /gerrit/config/server/version (N/A from some.ip)" #47448 prio=5 os_prio=0 cpu=0.68ms elapsed=425.32s tid=0x00007ea694009000 nid=0x2d332a waiting on condition  [0x00007e97207f4000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java...@11.0.25/Native Method)
- parking to wait for  <0x00007ebd0ed60f20> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java...@11.0.25/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java...@11.0.25/AbstractQueuedSynchronizer.java:885)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java...@11.0.25/AbstractQueuedSynchronizer.java:917)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java...@11.0.25/AbstractQueuedSynchronizer.java:1240)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java...@11.0.25/ReentrantReadWriteLock.java:959)
at com.codahale.metrics.ExponentiallyDecayingReservoir.lockForRescale(ExponentiallyDecayingReservoir.java:197)
at com.codahale.metrics.ExponentiallyDecayingReservoir.rescale(ExponentiallyDecayingReservoir.java:164)
at com.codahale.metrics.ExponentiallyDecayingReservoir.rescaleIfNeeded(ExponentiallyDecayingReservoir.java:122)
at com.codahale.metrics.ExponentiallyDecayingReservoir.update(ExponentiallyDecayingReservoir.java:94)
at com.codahale.metrics.ExponentiallyDecayingReservoir.update(ExponentiallyDecayingReservoir.java:84)
at com.codahale.metrics.Histogram.update(Histogram.java:41)
at com.codahale.metrics.Timer.update(Timer.java:199)
at com.codahale.metrics.Timer.update(Timer.java:94)
at com.google.gerrit.metrics.dropwizard.DropWizardMetricMaker$TimerImpl.doRecord(DropWizardMetricMaker.java:462)
at com.google.gerrit.metrics.Timer0.record(Timer0.java:85)
at com.google.gerrit.metrics.dropwizard.TimerImplN$2.doRecord(TimerImplN.java:53)
at com.google.gerrit.metrics.Timer3.record(Timer3.java:124)
at com.google.gerrit.metrics.Timer3$Context.record(Timer3.java:59)
at com.google.gerrit.metrics.TimerContext.stop(TimerContext.java:47)
at com.google.gerrit.metrics.Timer3$Context.stop(Timer3.java:44)
at com.google.gerrit.metrics.TimerContext.close(TimerContext.java:56)
at com.google.gerrit.metrics.Timer3$Context.close(Timer3.java:44)
at com.google.gerrit.server.plugincontext.PluginContext.runLogExceptions(PluginContext.java:215)
at com.google.gerrit.server.plugincontext.PluginSetContext.lambda$runEach$1(PluginSetContext.java:148)
at com.google.gerrit.server.plugincontext.PluginSetContext$$Lambda$370/0x00007eb3991410b0.accept(Unknown Source)
at java.lang.Iterable.forEach(java...@11.0.25/Iterable.java:75)
at com.google.gerrit.server.plugincontext.PluginSetContext.runEach(PluginSetContext.java:148)
at com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:333)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:293)
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:283)
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:184)
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:89)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
at com.google.gerrit.httpd.raw.StaticModule$PolyGerritFilter.doFilter(StaticModule.java:406)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.GetUserFilter.doFilter(GetUserFilter.java:92)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.RequireSslFilter.doFilter(RequireSslFilter.java:72)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.RunAsFilter.doFilter(RunAsFilter.java:120)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.SetThreadNameFilter.doFilter(SetThreadNameFilter.java:62)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:139)
at com.googlesource.gerrit.plugins.readonly.ReadOnly.doFilter(ReadOnly.java:99)
at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:135)
at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
at com.googlesource.gerrit.plugins.javamelody.GerritMonitoringFilter.doFilter(GerritMonitoringFilter.java:66)
at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:135)
at com.xxx.gerrit.plugins.cloneblock.CloneblockHTTPFilter.doFilter(CloneblockHTTPFilter.java:115)
at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:135)
at com.google.gerrit.httpd.AllowRenderInFrameFilter.doFilter(AllowRenderInFrameFilter.java:56)
at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:135)
at com.google.gerrit.httpd.AllRequestFilter$FilterProxy.doFilter(AllRequestFilter.java:141)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.RequestCleanupFilter.doFilter(RequestCleanupFilter.java:60)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.RequestMetricsFilter.doFilter(RequestMetricsFilter.java:92)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.gerrit.httpd.RequestContextFilter.doFilter(RequestContextFilter.java:64)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:121)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:133)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:54)
at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at org.eclipse.jetty.server.HttpChannel$$Lambda$1301/0x00007ea690684c40.dispatch(Unknown Source)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.lang.Thread.run(java...@11.0.25/Thread.java:829)
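A minimal sketch of one way a trace like this can arise, under assumptions and not a diagnosis: lockForRescale() parks on the write lock of a non-fair ReentrantReadWriteLock, and a write lock is only granted once every read-lock holder has released it. The Java 11 program below models that with plain java.util.concurrent classes. The "slow-reader" thread is hypothetical (we do not know which thread, if any, would play that role in Gerrit). It also shows two side effects that may be worth looking for in a real dump: ThreadMXBean reports no owner for the parked writer, because read-lock holders are not recorded as owners, and a later thread that only needs the read lock also blocks, because in the JDK's non-fair implementation new readers wait behind a writer at the head of the queue.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the rescale path in the trace; not Gerrit or Dropwizard code. */
public class RescaleLockDemo {

    static final class Result {
        String lockClass;           // class of the synchronizer the "rescaler" parks on
        String reportedOwner;       // owner as reported by ThreadMXBean (null = no owner)
        boolean updateBlocked;      // did a later read-lock "update" time out?
        boolean updateAfterRelease; // does "update" succeed once the slow reader is gone?
    }

    static Result run() throws InterruptedException {
        // Default constructor = non-fair lock, i.e. ReentrantReadWriteLock$NonfairSync.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHolding = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);
        Result r = new Result();

        // Hypothetical thread that holds the read lock for a long time.
        Thread slowReader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHolding.countDown();
                releaseReader.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "slow-reader");
        slowReader.start();
        readerHolding.await();

        // Thread that decided a rescale is due: it needs the write lock, so it parks.
        Thread rescaler = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        }, "rescaler");
        rescaler.start();

        // Poll until the rescaler is parked, then read what the JVM reports about it.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info;
        do {
            Thread.sleep(1);
            info = mx.getThreadInfo(new long[] {rescaler.getId()}, true, true)[0];
        } while (info.getThreadState() != Thread.State.WAITING || info.getLockInfo() == null);
        r.lockClass = info.getLockInfo().getClassName();
        r.reportedOwner = info.getLockOwnerName();

        // A new "update" needs only the read lock, but waits behind the queued writer.
        r.updateBlocked = !lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (!r.updateBlocked) {
            lock.readLock().unlock(); // never keep the lock, or the rescaler would never run
        }

        // Let the slow reader go: the rescaler runs, then readers can proceed again.
        releaseReader.countDown();
        slowReader.join();
        rescaler.join();
        r.updateAfterRelease = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (r.updateAfterRelease) {
            lock.readLock().unlock();
        }
        return r;
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run();
        System.out.println("writer parked on:     " + r.lockClass);
        System.out.println("reported owner:       " + r.reportedOwner);
        System.out.println("new update blocked:   " + r.updateBlocked);
        System.out.println("update after release: " + r.updateAfterRelease);
    }
}
```

On a HotSpot JDK 11 this should report java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync (the same class as in the dump above), a null owner, a blocked update, and a successful update once the slow reader releases the lock. In other words, if this model applies, the thread that actually matters is the one holding the read lock, and it will not appear as an owner of the lock in the dump.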

We decided to set metrics.reservoir = SlidingTimeWindowArray (we had no such configuration before, so we were using the default, ExponentiallyDecaying) to check whether this would avoid the locked threads.

Gerrit has been running without any issues since then.
System load is what we expect during business hours.
There is no visible difference in the quality of the Prometheus metrics.

We reviewed all the release notes between 3.5 and 3.9.11 and did not see anything that might cause this lock. Two metrics were implicitly added, but we do not expect them to be the cause of this situation.
We also ran Gatling tests, and they did not trigger the issue.

What could have changed in the Gerrit code that might be triggering this behaviour?

Any tips to understand the root cause are appreciated :)

Thanks,
Nuno