Gerrit hangs on ssh commands (2.7-rc4)

568 views
Skip to first unread message

Tomas Hellberg

unread,
Aug 27, 2013, 7:42:09 AM8/27/13
to repo-d...@googlegroups.com
I've had rock-solid performance on Gerrit 2.5 and I recently upgraded to 2.6.1, but a problem with the diff view of files with CRLF encoding forced me to upgrade to 2.7-rc4 (which fixes that problem).

I've been running 2.7-rc4 for three days now and both yesterday and today Gerrit stopped serving commands over ssh. If I connect without giving a command I get the "****    Welcome to Gerrit Code Review    ****" message, but both the "gerrit" and "git" commands hang indefinitely. Both times I had to restart the Gerrit service.

I don't see anything related to this in the error log. Is there anything clever I can do the next time this happens to try to understand what is happening?


Matthias Sohn

unread,
Aug 27, 2013, 9:45:09 AM8/27/13
to Tomas Hellberg, Repo and Gerrit Discussion
create a thread dump this should reveal where the ssh commands hang

--
Matthias 

Tomas Hellberg

unread,
Aug 28, 2013, 7:53:35 AM8/28/13
to repo-d...@googlegroups.com, Tomas Hellberg
Today, at the same time of day, Gerrit stopped working again, but today with different symptoms. The previous two days, the web front end worked, and it was only the ssh commands that didn't work. Today "git push" stopped working while other commands such as "fetch" worked. The web didn't work att all and responded with:

HTTP ERROR: 500

Problem accessing /. Reason:

    Guice provision errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:71)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error


The dump shows lots of LDAP entries, so I'm suspecting something related to that.

Doug Kelly

unread,
Aug 28, 2013, 2:38:32 PM8/28/13
to repo-d...@googlegroups.com, Tomas Hellberg
On Wednesday, August 28, 2013 6:53:35 AM UTC-5, Tomas Hellberg wrote:
Today, at the same time of day, Gerrit stopped working again, but today with different symptoms. The previous two days, the web front end worked, and it was only the ssh commands that didn't work. Today "git push" stopped working while other commands such as "fetch" worked. The web didn't work att all and responded with:

HTTP ERROR: 500

Problem accessing /. Reason:

    Guice provision errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:71)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error
This means a database connection failed.  Usually there's a more specific error in the logs.
 


The dump shows lots of LDAP entries, so I'm suspecting something related to that.
The number of LDAP connections being made does look curious, but doesn't necessarily mean it's causing your DB issue.  I'd also suggest reviewing your caching settings?  Maybe a cache value can be tweaked to give you slightly better performance.

--Doug
 

Tomas Hellberg

unread,
Sep 3, 2013, 4:10:50 AM9/3/13
to repo-d...@googlegroups.com, Tomas Hellberg
Gerrit just stopped responding to git commands again over ssh. Here's the thread dump: https://gist.github.com/tomhe/6420933

Not sure if it's related but the error log shows some of these:

[2013-09-03 09:49:43,384] WARN  org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
        at sun.nio.ch.IOUtil.read(IOUtil.java:224)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:280)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:677)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:651)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:640)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1111)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

On Tuesday, August 27, 2013 3:45:09 PM UTC+2, Matthias Sohn wrote:

Tomas Hellberg

unread,
Sep 3, 2013, 9:50:50 AM9/3/13
to repo-d...@googlegroups.com, Tomas Hellberg
It worked for a couple of hours and now git push stopped working, while git fetch appears to work fine. I see database connection errors in the log, like below, but I have no clue as to what might be causing them:

[2013-09-03 15:47:03,001] WARN  com.google.gerrit.server.account.AccountCacheImpl : Cannot load AccountState for 1000090
java.util.concurrent.ExecutionException: com.google.gwtorm.server.OrmException: Cannot open database connection
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
        at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
        at com.google.gerrit.server.account.AccountCacheImpl.get(AccountCacheImpl.java:83)
        at com.google.gerrit.server.IdentifiedUser.state(IdentifiedUser.java:210)
        at com.google.gerrit.server.IdentifiedUser.getEffectiveGroups(IdentifiedUser.java:262)
        at com.google.gerrit.server.project.ProjectControl.match(ProjectControl.java:427)
        at com.google.gerrit.server.project.ProjectControl.match(ProjectControl.java:420)
        at com.google.gerrit.server.project.RefControl.access(RefControl.java:521)
        at com.google.gerrit.server.project.RefControl.canPerform(RefControl.java:467)
        at com.google.gerrit.server.project.RefControl.isVisible(RefControl.java:111)
        at com.google.gerrit.server.project.ChangeControl.isRefVisible(ChangeControl.java:171)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl$3$2.isVisible(SuggestServiceImpl.java:245)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl.addSuggestion(SuggestServiceImpl.java:174)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl.suggestAccount(SuggestServiceImpl.java:158)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl.access$100(SuggestServiceImpl.java:58)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl$3.run(SuggestServiceImpl.java:250)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl$3.run(SuggestServiceImpl.java:220)
        at com.google.gerrit.httpd.rpc.BaseServiceImplementation.run(BaseServiceImplementation.java:68)
        at com.google.gerrit.httpd.rpc.SuggestServiceImpl.suggestChangeReviewer(SuggestServiceImpl.java:220)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at com.google.gwtjsonrpc.server.MethodHandle.invoke(MethodHandle.java:91)
        at com.google.gwtjsonrpc.server.JsonServlet.doService(JsonServlet.java:379)
        at com.google.gwtjsonrpc.server.JsonServlet.service(JsonServlet.java:265)
        at com.google.gerrit.httpd.rpc.GerritJsonServlet.service(GerritJsonServlet.java:120)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
        at com.google.gerrit.pgm.http.jetty.GetUserFilter.doFilter(GetUserFilter.java:76)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.gwtexpui.server.CacheControlFilter.doFilter(CacheControlFilter.java:70)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:64)
        at com.google.gerrit.httpd.AllRequestFilter$FilterProxy.doFilter(AllRequestFilter.java:57)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.gerrit.httpd.RequestContextFilter.doFilter(RequestContextFilter.java:75)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:67)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:365)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:679) 

Matthias Sohn

unread,
Sep 3, 2013, 10:08:54 AM9/3/13
to Tomas Hellberg, Repo and Gerrit Discussion
maybe your connection pool is exhausted or you need to increase max number of 
connections on the database

--
Matthias

Tomas Hellberg

unread,
Sep 3, 2013, 2:59:12 PM9/3/13
to repo-d...@googlegroups.com, Tomas Hellberg
Thanks. I'll try first with increasing the number of connections and see what that gives.

Tomas Hellberg

unread,
Sep 4, 2013, 4:01:47 AM9/4/13
to repo-d...@googlegroups.com, Tomas Hellberg
Nope, that didn't help. I increased the number from 100 to 300, but I still get database connection errors (and Gerrit stops responding). I'm running Postgresql, but I'm far from a wizard on this. I would appreciate any suggestion on how to try to find the problem, as my users are getting more and more frustrated.

Would connection pooling software be something for me to look into? I've got around 100 users where maybe 75 % are frequent users of the system.

Tomas Hellberg

unread,
Sep 4, 2013, 6:23:01 AM9/4/13
to repo-d...@googlegroups.com, Tomas Hellberg
It's gotten a lot worse now. Currently I'm getting 30-60 minutes uptime until the database connection errors starts and then I have to restart Gerrit.

Edwin Kempin

unread,
Sep 4, 2013, 6:42:12 AM9/4/13
to Tomas Hellberg, Repo and Gerrit Discussion



2013/9/4 Tomas Hellberg <tom...@gmail.com>

It's gotten a lot worse now. Currently I'm getting 30-60 minutes uptime until the database connection errors starts and then I have to restart Gerrit.

On Wednesday, September 4, 2013 10:01:47 AM UTC+2, Tomas Hellberg wrote:
Nope, that didn't help. I increased the number from 100 to 300,
Where did you increase the number of connections? Did you also do it in postgresql.conf?
 

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
 
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Tomas Hellberg

unread,
Sep 4, 2013, 6:54:01 AM9/4/13
to repo-d...@googlegroups.com, Tomas Hellberg
I increased it in my postgresql.conf. Is there another place?

Edwin Kempin

unread,
Sep 4, 2013, 7:10:35 AM9/4/13
to Tomas Hellberg, Repo and Gerrit Discussion



2013/9/4 Tomas Hellberg <tom...@gmail.com>

I increased it in my postgresql.conf. Is there another place?
IIRC it is important that a database connection is available for every thread.
This means the configuration for the max number of threads in gerrit config (threads for SSH, HTTP, batch etc.) in total must not be higher than the available database connections.
 

Matthias Sohn

unread,
Sep 4, 2013, 7:25:02 AM9/4/13
to Edwin Kempin, Tomas Hellberg, Repo and Gerrit Discussion
pool limit should be higher than all threads needing a DB connection:
 
database.poolLimit > sshd.threads + sshd.batchThreads + httpd.maxThreads

I remember we once had such problems when our LDAP server was slow.

--
Matthias

Blewitt, Alex

unread,
Sep 4, 2013, 7:30:04 AM9/4/13
to Matthias Sohn, Edwin Kempin, Tomas Hellberg, Repo and Gerrit Discussion

Should Gerrit print out a warning at startup when this condition isn’t met?

 

Since http.maxThreads = 25, and database.poolLikmit = 8 by default, shouldn’t these values be changed?

 

Alex

Tomas Hellberg

unread,
Sep 4, 2013, 7:38:46 AM9/4/13
to repo-d...@googlegroups.com, Edwin Kempin, Tomas Hellberg
Thank you Mathias and Edwin. I will try to increase these settings and see what that gives. I know that our LDAP server is rather slow, so this might very well be what's causing my problem.

Matthias Sohn

unread,
Sep 4, 2013, 7:43:53 AM9/4/13
to Tomas Hellberg, Repo and Gerrit Discussion, Edwin Kempin
we are monitoring response time of the LDAP server so that we get alerts if it gets too bad

--
Matthias

Edwin Kempin

unread,
Sep 4, 2013, 7:47:27 AM9/4/13
to Blewitt, Alex, Matthias Sohn, Tomas Hellberg, Repo and Gerrit Discussion



2013/9/4 Blewitt, Alex <Alex.B...@gs.com>

Should Gerrit print out a warning at startup when this condition isn’t met?

 

Since http.maxThreads = 25, and database.poolLikmit = 8 by default, shouldn’t these values be changed?

If I understand [1] it is not disallowed to have more threads than database connections. If a thread needs a database connection
and none is free, it will wait for a free connection, until the configured timeout [2] is reached.
Reply all
Reply to author
Forward
0 new messages