After upgrade to 2.11.1, the "Cannot open ReviewDb" error become more frequent occurring, which can not be self recovery

624 views
Skip to first unread message

Chunlin Zhang

unread,
Jun 30, 2015, 10:03:34 PM6/30/15
to repo-d...@googlegroups.com
This type of error happen in previous versions (I used 2.1.7 2.5 2.5.2 2.6.1 2.8 2.8.5), but it is no so frequent before, now after upgrade to 2.11.1, it happens several times one week.
I even write a dedicated tool to watch the error log and restart gerrit after detect this error, because if it occur it will not recovery, only restart gerrit can solve.
But this error can cause jenkins output error for many projects, it is really annoying.

I have 1 master and 4 slave gerrits to maintain ,this error only happen in slave gerrit.
The database type is postgresql, the master gerrit use a read-write user to connect database, and the 4 slave gerrits use a read-only user to connect.
The postgresql database is in the same server with master gerrit, 3 slave gerrit in different data center, 1 slave gerrit in the same LAN and under same network switch with master gerrit ( so that will not be the network problem to this slave), but the error happen in this closer slave gerrit frequently after upgrade to 2.11.1

---------- 
I search in this group:

In "Gerrit often open ReviewDb error."( https://groups.google.com/d/topic/repo-discuss/GlQwrHRujnE/discussion ), Shawn said "In general you need one for every sshd.threads, one for every httpd.maxThreads, plus a few spare."

According the method above it will be 32*1.5+25+1= 74 , my config is poolLimit = 100

Part of my gerrit.config in slave gerrit:
```
[database]
type = postgresql
hostname = 10.125.159.101
database = reviewdb
username = gerrit2_readonly
        connectionPool = true
        poolLimit = 100
        poolMaxIdle = 16
[sshd]
listenAddress = *:29418
[httpd]
listenUrl = http://*:8080/
[cache]
directory = cache
```

---------
There is 2 kind of error/warning in error_log when this issue happen.

One is:
```
[2015-07-01 09:02:24,641] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user midhzdgj account 1000591) during git-upload-pack '/projects/mt6592_6290/mediatek-bsp'
com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70) (via modules: com.google.gerrit.server.config.GerritGlobalModule -> com.google.gerrit.server.util.ThreadLocalRequestContext$1)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1018)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:69)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:445)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:379)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.gwtorm.server.OrmException: Cannot open database connection
at com.google.gwtorm.jdbc.Database.newConnection(Database.java:130)
at com.google.gwtorm.jdbc.JdbcSchema.<init>(JdbcSchema.java:40)
at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$22.<init>(Unknown Source)
at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$22_Factory_GwtOrm$$23.open(Unknown Source)
at com.google.gwtorm.jdbc.Database.open(Database.java:122)
at com.google.gerrit.server.config.RequestScopedReviewDbProvider.get(RequestScopedReviewDbProvider.java:45)
at com.google.gerrit.server.config.RequestScopedReviewDbProvider.get(RequestScopedReviewDbProvider.java:27)
at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70)
at com.google.gerrit.server.util.ThreadLocalRequestContext$1$$FastClassByGuice$$75e0eb90.invoke(<generated>)
at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:230)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:148)
at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:54)
at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:46)
at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1009)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1059)
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1005)
... 13 more
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at com.google.gwtorm.jdbc.Database.newConnection(Database.java:128)
... 30 more
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1144)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 32 more
```

The other is a a warning:
```
[2015-07-01 09:02:24,960] WARN  com.google.gerrit.server.git.ScanningChangeCacheImpl : Cannot fetch changes for platform/hardware/akm
java.util.concurrent.ExecutionException: com.google.gwtorm.server.OrmException: Cannot open database connection
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2348)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2320)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.google.gerrit.server.git.ScanningChangeCacheImpl.get(ScanningChangeCacheImpl.java:78)
at com.google.gerrit.server.git.VisibleRefFilter.visibleChanges(VisibleRefFilter.java:180)
at com.google.gerrit.server.git.VisibleRefFilter.filter(VisibleRefFilter.java:91)
at com.google.gerrit.server.git.VisibleRefFilter.filter(VisibleRefFilter.java:169)
at com.google.gerrit.server.git.VisibleRefFilter.getAdvertisedRefs(VisibleRefFilter.java:158)
at org.eclipse.jgit.transport.AbstractAdvertiseRefsHook.advertiseRefs(AbstractAdvertiseRefsHook.java:62)
at org.eclipse.jgit.transport.UploadPack.sendAdvertisedRefs(UploadPack.java:785)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:664)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:630)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:80)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:445)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:379)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.gwtorm.server.OrmException: Cannot open database connection
at com.google.gwtorm.jdbc.Database.newConnection(Database.java:130)
at com.google.gwtorm.jdbc.JdbcSchema.<init>(JdbcSchema.java:40)
at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$22.<init>(Unknown Source)
at com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$22_Factory_GwtOrm$$23.open(Unknown Source)
at com.google.gwtorm.jdbc.Database.open(Database.java:122)
at com.google.gerrit.server.util.ManualRequestContext.<init>(ManualRequestContext.java:37)
at com.google.gerrit.server.util.OneOffRequestContext.open(OneOffRequestContext.java:48)
at com.google.gerrit.server.git.ScanningChangeCacheImpl$Loader.load(ScanningChangeCacheImpl.java:99)
at com.google.gerrit.server.git.ScanningChangeCacheImpl$Loader.load(ScanningChangeCacheImpl.java:85)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
... 27 more
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at com.google.gwtorm.jdbc.Database.newConnection(Database.java:128)
... 37 more
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1144)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
... 39 more
```

How can I do with this issue?
Thanks

Edwin Kempin

unread,
Jul 1, 2015, 2:21:56 AM7/1/15
to Chunlin Zhang, Repo and Gerrit Discussion
2015-07-01 4:03 GMT+02:00 Chunlin Zhang <zhangc...@gmail.com>:
This type of error happen in previous versions (I used 2.1.7 2.5 2.5.2 2.6.1 2.8 2.8.5), but it is no so frequent before, now after upgrade to 2.11.1, it happens several times one week.
I even write a dedicated tool to watch the error log and restart gerrit after detect this error, because if it occur it will not recovery, only restart gerrit can solve.
But this error can cause jenkins output error for many projects, it is really annoying.

I have 1 master and 4 slave gerrits to maintain ,this error only happen in slave gerrit.
The database type is postgresql, the master gerrit use a read-write user to connect database, and the 4 slave gerrits use a read-only user to connect.
The postgresql database is in the same server with master gerrit, 3 slave gerrit in different data center, 1 slave gerrit in the same LAN and under same network switch with master gerrit ( so that will not be the network problem to this slave), but the error happen in this closer slave gerrit frequently after upgrade to 2.11.1

---------- 
I search in this group:

In "Gerrit often open ReviewDb error."( https://groups.google.com/d/topic/repo-discuss/GlQwrHRujnE/discussion ), Shawn said "In general you need one for every sshd.threads, one for every httpd.maxThreads, plus a few spare."

According the method above it will be 32*1.5+25+1= 74 , my config is poolLimit = 100
Just setting the poolLimit is not sufficient. You must make sure that your database is configured to have that many connections available.
For postgres you would configure this in postgresql.conf.
 

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chunlin Zhang

unread,
Jul 1, 2015, 4:11:21 AM7/1/15
to Repo and Gerrit Discussion
(Resend for repo-discuss)

On Wed, Jul 1, 2015 at 2:21 PM, Edwin Kempin <edwin....@gmail.com> wrote:
Just setting the poolLimit is not sufficient. You must make sure that your database is configured to have that many connections available.
For postgres you would configure this in postgresql.conf.
Yes, but my config in postgresql.conf is: "max_connections = 500" ,or I should change it to more?
And the master gerrit seldom come out this error( I don't remember this issue happen in the master gerrit ever)

Luca Milanesio

unread,
Jul 1, 2015, 4:58:49 AM7/1/15
to Chunlin Zhang, Repo and Gerrit Discussion
Hi Chunlin,
to be honest it is not normal for Gerrit to use so many connections: there is something else that is rotten over there.

Typically Gerrit should use no more than 10-16 connections for a medium-loaded server (a few hundreds of active users).
How many users you have? How many HTTP or SSH hits per minutes you have?

Luca.

Chunlin Zhang

unread,
Jul 1, 2015, 6:08:15 AM7/1/15
to Luca Milanesio, Repo and Gerrit Discussion
On Wed, Jul 1, 2015 at 4:58 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Hi Chunlin,
to be honest it is not normal for Gerrit to use so many connections: there is something else that is rotten over there.

Typically Gerrit should use no more than 10-16 connections for a medium-loaded server (a few hundreds of active users).
How many users you have? How many HTTP or SSH hits per minutes you have?
How to estimate this?
There are not only developer user, there are many continuous integration accounts to obtain source code concurrently.
For example, this error occur this morning, and I ask a SCM and he told me before the error happened he just triggered 4 jenkins jobs, 2 is "repo sync -j40", the other 2 is "repo sync -j8", and there may be other user to do repo sync.


Luca.

Luca Milanesio

unread,
Jul 1, 2015, 6:18:10 AM7/1/15
to Chunlin Zhang, Repo and Gerrit Discussion
Hi Chunlin,
you can run some basic Gerrit commands [1] for that:
- gerrit show-queue
- gerrit show-connections
- gerrit show-caches

Luca.

Luca Milanesio

unread,
Jul 1, 2015, 7:02:40 AM7/1/15
to Chunlin Zhang, Repo and Gerrit Discussion

Martin Fick

unread,
Jul 1, 2015, 1:20:58 PM7/1/15
to repo-d...@googlegroups.com, Chunlin Zhang
Could you have a per-user postgress connection limitthat you
are running up against?

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Chunlin Zhang

unread,
Jul 1, 2015, 10:13:34 PM7/1/15
to Luca Milanesio, Repo and Gerrit Discussion
I will write a tools to log the 3 command output (for example log once for 1 minute), so I can give result when gerrit being high-load or just before the error happen.
But wait some days before I can give the result.

Luca Milanesio

unread,
Jul 2, 2015, 2:07:00 AM7/2/15
to Chunlin Zhang, Repo and Gerrit Discussion
Hi Chinlin,
thanks for letting us know, looking forward to see the graphs at the connection pool limit problem interval !

Luca

Sent from my iPhone

Chunlin Zhang

unread,
Jul 2, 2015, 2:54:20 AM7/2/15
to Luca Milanesio, Repo and Gerrit Discussion
One of a high-load output:

2015-07-02 12:48:52,975 INFO cmd stdout: Task     State        StartTime         Command
2015-07-02 12:48:52,975 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-02 12:48:52,976 INFO cmd stdout: ca8774ad              12:50:47.639      git-upload-pack '/VIBEUI/Apps' (huangzh3
2015-07-02 12:48:52,976 INFO cmd stdout: b9f704bc              12:51:33.065      git-upload-pack '/platform/external/tiny
2015-07-02 12:48:52,976 INFO cmd stdout: 39355480              12:51:35.094      git-upload-pack '/platform/external/elfu
2015-07-02 12:48:52,976 INFO cmd stdout: 797aec2b              12:51:35.126      git-upload-pack '/platform/external/jdif
2015-07-02 12:48:52,976 INFO cmd stdout: 5995a871              12:51:35.151      git-upload-pack '/platform/external/tags
2015-07-02 12:48:52,976 INFO cmd stdout: 79618c83              12:51:35.161      git-upload-pack '/platform/external/esd'
2015-07-02 12:48:52,976 INFO cmd stdout: b95b84d3              12:51:35.181      git-upload-pack '/platform/external/tcpd
2015-07-02 12:48:52,977 INFO cmd stdout: 396794a0              12:51:35.191      git-upload-pack '/platform/prebuilts/cla
2015-07-02 12:48:52,977 INFO cmd stdout: 19bdf00b              12:51:35.430      git-upload-pack '/platform/hardware/libh
2015-07-02 12:48:52,977 INFO cmd stdout: f9b7bc2c              12:51:35.430      git-upload-pack '/platform/hardware/libh
2015-07-02 12:48:52,977 INFO cmd stdout: 59c7e891              12:51:35.430      git-upload-pack '/platform/external/java
2015-07-02 12:48:52,977 INFO cmd stdout: 39c23484              12:51:35.431      git-upload-pack '/platform/packages/apps
2015-07-02 12:48:52,977 INFO cmd stdout: 79b3cc47              12:51:35.461      git-upload-pack '/platform/external/time
2015-07-02 12:48:52,977 INFO cmd stdout: b9adc4e2              12:51:35.467      git-upload-pack '/platform/external/libs
2015-07-02 12:48:52,977 INFO cmd stdout: f99edcac              12:51:35.473      git-upload-pack '/platform/external/apac
2015-07-02 12:48:52,977 INFO cmd stdout: 790e6cf1              12:51:37.284      git-upload-pack '/platform/packages/apps
2015-07-02 12:48:52,978 INFO cmd stdout: 39247479              12:51:37.354      git-upload-pack '/device/generic/x86' (h
2015-07-02 12:48:52,978 INFO cmd stdout: 5929284f              12:51:37.353      git-upload-pack '/platform/external/ant-
2015-07-02 12:48:52,978 INFO cmd stdout: 79350ca9              12:51:37.358      git-upload-pack '/platform/external/andr
2015-07-02 12:48:52,978 INFO cmd stdout: b92f0459              12:51:37.376      git-upload-pack '/platform/frameworks/op
2015-07-02 12:48:52,978 INFO cmd stdout: 393b14a6              12:51:37.663      git-upload-pack '/platform/external/netc
2015-07-02 12:48:52,978 INFO cmd stdout: 99dd607e              12:51:37.663      git-upload-pack '/platform/packages/inpu
2015-07-02 12:48:52,978 INFO cmd stdout: 79e0ac38              12:51:37.664      git-upload-pack '/platform/packages/apps
2015-07-02 12:48:52,978 INFO cmd stdout: b9eaa418              12:51:37.668      git-upload-pack '/platform/hardware/qcom
2015-07-02 12:48:52,979 INFO cmd stdout: 39d6b469              12:51:37.679      git-upload-pack '/platform/external/dhcp
2015-07-02 12:48:52,979 INFO cmd stdout: 59db688f              12:51:37.679      git-upload-pack '/platform/vendor/partne
2015-07-02 12:48:52,979 INFO cmd stdout: b90144dc              12:51:37.686      git-upload-pack '/platform/external/libp
2015-07-02 12:48:52,979 INFO cmd stdout: d9fe78e6              12:51:37.686      git-upload-pack '/platform/external/prog
2015-07-02 12:48:52,979 INFO cmd stdout: f9f25c1e              12:51:37.726      git-upload-pack '/platform/external/guav
2015-07-02 12:48:52,979 INFO cmd stdout: 39ed543d              12:51:38.858      git-upload-pack '/platform/packages/scre
2015-07-02 12:48:52,979 INFO cmd stdout: 79b2ec6c              12:51:38.874      git-upload-pack '/device/google/accessor
2015-07-02 12:48:52,979 INFO cmd stdout: b9bce43f              12:51:38.955      git-upload-pack '/platform/external/jmdn
2015-07-02 12:48:52,979 INFO cmd stdout: f9bd7c3d              12:51:39.096      git-upload-pack '/platform/external/libu
2015-07-02 12:48:52,980 INFO cmd stdout: 39c8f4d0              12:51:39.114      git-upload-pack '/platform/packages/wall
2015-07-02 12:48:52,980 INFO cmd stdout: 79998cf0              12:51:39.123      git-upload-pack '/platform/packages/apps
2015-07-02 12:48:52,980 INFO cmd stdout: f9a49c31              12:51:39.127      git-upload-pack '/platform/hardware/qcom
2015-07-02 12:48:52,980 INFO cmd stdout: 19aad0fb              12:51:39.127      git-upload-pack '/platform/packages/apps
2015-07-02 12:48:52,980 INFO cmd stdout: 399f94dd              12:51:39.135      git-upload-pack '/platform/external/apac
2015-07-02 12:48:52,980 INFO cmd stdout: 79842c90              12:51:39.145      git-upload-pack '/platform/external/harf
2015-07-02 12:48:52,980 INFO cmd stdout: b98e24b0              12:51:39.160      git-upload-pack '/platform/external/libv
2015-07-02 12:48:52,980 INFO cmd stdout: f96fbcb9              12:51:39.164      git-upload-pack '/platform/external/comp
2015-07-02 12:48:52,981 INFO cmd stdout: 397a3481              12:51:39.165      git-upload-pack '/platform/external/qemu
2015-07-02 12:48:52,981 INFO cmd stdout: 195c101b              12:51:39.166      git-upload-pack '/platform/frameworks/op
2015-07-02 12:48:52,981 INFO cmd stdout: 3951d408              12:51:39.167      git-upload-pack '/platform/external/nist
2015-07-02 12:48:52,981 INFO cmd stdout: f956dcfd              12:51:39.166      git-upload-pack '/platform/frameworks/op
2015-07-02 12:48:52,981 INFO cmd stdout: 59560801              12:51:39.167      git-upload-pack '/platform/packages/serv
2015-07-02 12:48:52,981 INFO cmd stdout: 79666cf6              12:51:39.185      git-upload-pack '/platform/frameworks/op
2015-07-02 12:48:52,981 INFO cmd stdout: b97064ad              12:51:40.246      git-upload-pack '/platform/hardware/qcom
2015-07-02 12:48:52,981 INFO cmd stdout: f971fcb2 waiting .... 12:51:41.319      git-upload-pack '/platform/packages/apps
2015-07-02 12:48:52,981 INFO cmd stdout: 397c748a waiting .... 12:51:41.343      git-upload-pack '/platform/external/opro
2015-07-02 12:48:52,982 INFO cmd stdout: 798d0cca waiting .... 12:51:41.354      git-upload-pack '/platform/external/kern
2015-07-02 12:48:52,982 INFO cmd stdout: 199e50fd waiting .... 12:51:41.356      git-upload-pack '/platform/external/bsdi
2015-07-02 12:48:52,982 INFO cmd stdout: f9981c13 waiting .... 12:51:41.357      git-upload-pack '/platform/hardware/qcom
2015-07-02 12:48:52,982 INFO cmd stdout: 399314fb waiting .... 12:51:41.421      git-upload-pack '/platform/bootable/boot
2015-07-02 12:48:52,982 INFO cmd stdout: d9bf5861 waiting .... 12:51:41.794      git-upload-pack '/platform/external/jsil
2015-07-02 12:48:52,982 INFO cmd stdout: f4ecf1ed 10:03:22.576 Jul-01 09:03      Log File Compressor
2015-07-02 12:48:52,982 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-02 12:48:52,982 INFO cmd stdout:   56 tasks
2015-07-02 12:48:53,780 INFO cmd stdout: Session     Start     Idle   User            Remote Host
2015-07-02 12:48:53,780 INFO cmd stdout: --------------------------------------------------------------
2015-07-02 12:48:53,780 INFO cmd stdout: 06b00061 10:37:16 26:13:52   zhangrc2        ?
2015-07-02 12:48:53,780 INFO cmd stdout: d61a4c94 13:50:44 23:00:58   liaohj          xmws37
2015-07-02 12:48:53,780 INFO cmd stdout: dc32f466 15:25:05 21:25:32   wuzb1           ?
2015-07-02 12:48:53,780 INFO cmd stdout: 1095e825 15:36:11 21:14:53   yuqing1         ?
2015-07-02 12:48:53,780 INFO cmd stdout: b2ed28da 15:54:06 20:55:56   yanqd1          ?
2015-07-02 12:48:53,780 INFO cmd stdout: 1c630219 16:21:21 20:29:35   midhzdgj        ?
2015-07-02 12:48:53,780 INFO cmd stdout: 4ab336d5 17:46:19 19:05:16   midhzdgj        ?
2015-07-02 12:48:53,780 INFO cmd stdout: 813d9e97 17:57:23 18:53:33   oujf            ?
2015-07-02 12:48:53,780 INFO cmd stdout: 23656896 21:50:39 14:59:54   huangzh3        ?
2015-07-02 12:48:53,780 INFO cmd stdout: 71719758 05:06:34 07:41:21   midhzdgj        ?
2015-07-02 12:48:53,780 INFO cmd stdout: 60265284 11:25:28 01:25:45   wengjun1        ?
2015-07-02 12:48:53,780 INFO cmd stdout: 73e2dbf4 12:05:37 00:45:29   zhuan1          ?
2015-07-02 12:48:53,780 INFO cmd stdout: ea2cb865 12:50:46 00:00:00   huangzh3        xmws22
2015-07-02 12:48:53,781 INFO cmd stdout: cfea261a 12:50:58 00:00:01   midhzdgj        xmws54
2015-07-02 12:48:53,781 INFO cmd stdout: 19a9702e 12:51:42 00:00:00   zhangclb        swplnew
2015-07-02 12:48:53,781 INFO cmd stdout: --
2015-07-02 12:48:53,781 INFO cmd stdout: SSHD Backend: mina
2015-07-02 12:48:55,808 INFO cmd stdout: Gerrit Code Review        2.11.1                    now    12:51:44   CST
2015-07-02 12:48:55,808 INFO cmd stdout:                                                  uptime    1 days  3 hrs
2015-07-02 12:48:55,808 INFO cmd stdout: 
2015-07-02 12:48:55,808 INFO cmd stdout:   Name                          |Entries              |  AvgGet |Hit Ratio|
2015-07-02 12:48:55,808 INFO cmd stdout:                                 |   Mem   Disk   Space|         |Mem  Disk|
2015-07-02 12:48:55,808 INFO cmd stdout: --------------------------------+---------------------+---------+---------+
2015-07-02 12:48:55,808 INFO cmd stdout:   accounts                      |    17               |  23.3ms | 99%     |
2015-07-02 12:48:55,809 INFO cmd stdout:   accounts_byemail              |                     |         |         |
2015-07-02 12:48:55,809 INFO cmd stdout:   accounts_byname               |    17               |         |         |
2015-07-02 12:48:55,809 INFO cmd stdout:   changes                       |                     |  22.9ms |  0%     |
2015-07-02 12:48:55,809 INFO cmd stdout:   groups                        |   243               | 938.0us | 47%     |
2015-07-02 12:48:55,809 INFO cmd stdout:   groups_byinclude              |                     |         |         |
2015-07-02 12:48:55,809 INFO cmd stdout:   groups_byname                 |                     |         |         |
2015-07-02 12:48:55,809 INFO cmd stdout:   groups_byuuid                 |                     |         |         |
2015-07-02 12:48:55,809 INFO cmd stdout:   groups_external               |                     |         |         |
2015-07-02 12:48:55,810 INFO cmd stdout:   groups_members                |   459               |   5.2ms | 99%     |
2015-07-02 12:48:55,810 INFO cmd stdout:   ldap_group_existence          |                     |         |         |
2015-07-02 12:48:55,810 INFO cmd stdout:   ldap_groups                   |                     |         |         |
2015-07-02 12:48:55,810 INFO cmd stdout:   ldap_groups_byinclude         |                     |         |         |
2015-07-02 12:48:55,810 INFO cmd stdout:   ldap_usernames                |                     |         |         |
2015-07-02 12:48:55,810 INFO cmd stdout:   permission_sort               |  1024               |         | 60%     |
2015-07-02 12:48:55,810 INFO cmd stdout:   project_list                  |                     |         |         |
2015-07-02 12:48:55,810 INFO cmd stdout:   projects                      |   760               |  51.3ms | 99%     |
2015-07-02 12:48:55,810 INFO cmd stdout:   sshkeys                       |    16               |  13.3ms | 97%     |
2015-07-02 12:48:55,811 INFO cmd stdout: D change_kind                   |                0.00k|         |         |
2015-07-02 12:48:55,811 INFO cmd stdout: D conflicts                     |                0.00k|         |         |
2015-07-02 12:48:55,811 INFO cmd stdout: D diff                          |                0.00k|         |         |
2015-07-02 12:48:55,811 INFO cmd stdout: D diff_intraline                |                0.00k|         |         |
2015-07-02 12:48:55,811 INFO cmd stdout: D git_tags                      |   674    674  52.09m|         | 95% 100%|
2015-07-02 12:48:55,811 INFO cmd stdout: D mergeability                  |                0.00k|         |         |
2015-07-02 12:48:55,811 INFO cmd stdout: 
2015-07-02 12:48:55,811 INFO cmd stdout: SSH:     15  users, oldest session started    1 days  2 hrs ago
2015-07-02 12:48:55,811 INFO cmd stdout: Tasks:   53  total =   48 running +      4 ready +    1 sleeping
2015-07-02 12:48:55,812 INFO cmd stdout: Mem: 27.21g total = 15.87g used + 9.34g free + 2.00g buffers
2015-07-02 12:48:55,812 INFO cmd stdout:      27.21g max
2015-07-02 12:48:55,812 INFO cmd stdout:           29 open files
2015-07-02 12:48:55,812 INFO cmd stdout: 
2015-07-02 12:48:55,812 INFO cmd stdout: Threads: 32 CPUs available, 91 threads
2015-07-02 12:48:55,812 INFO cmd stdout: 

Luca Milanesio

unread,
Jul 2, 2015, 3:05:52 AM7/2/15
to Chunlin Zhang, Repo and Gerrit Discussion
Hold on … you have ONLY 17 active users (from your caches output) generating 48 running tasks and 243 groups?
How can you have 243 groups out of 17 users?

Most of them are clone/fetch only as well …

Do you have lots of anonymous access through HTTP?
How your http_log looks like?

From what I see, there is no justification of running of our connection pool with the DB, unless there is an untold story on the HTTP side.

Luca.

Luca Milanesio

unread,
Jul 2, 2015, 3:08:02 AM7/2/15
to Chunlin Zhang, Repo and Gerrit Discussion
One thing I’ve noticed: you are running short of permissions cache.
2015-07-02 12:48:55,810 INFO cmd stdout:   permission_sort               |  1024               |         | 60%     |

This means for each operation you make, 40% of them need to fetch permissions on the DB, including the clone/fetch operations you see in the active tasks.
*THIS* could be a problem for your DB.

Please raise your permission_sort to at least 2048 and see if the load on the DB ease a bit.

Luca.

Chunlin Zhang

unread,
Jul 2, 2015, 3:25:46 AM7/2/15
to Luca Milanesio, Repo and Gerrit Discussion
On Thu, Jul 2, 2015 at 3:05 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Hold on … you have ONLY 17 active users (from your caches output) generating 48 running tasks and 243 groups?
How can you have 243 groups out of 17 users?
Hmm...Yes, there are too much groups and permission settings in the All-Projects. Because there are too much android projects and branches ...

Most of them are clone/fetch only as well …

Do you have lots of anonymous access through HTTP?
No.
How your http_log looks like?
There is no http_log for slave gerrit, because the slave mode. 

Chunlin Zhang

unread,
Jul 2, 2015, 3:33:11 AM7/2/15
to Luca Milanesio, Repo and Gerrit Discussion
On Thu, Jul 2, 2015 at 3:07 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
One thing I’ve noticed: you are running short of permissions cache.
2015-07-02 12:48:55,810 INFO cmd stdout:   permission_sort               |  1024               |         | 60%     |

This means for each operation you make, 40% of them need to fetch permissions on the DB, including the clone/fetch operations you see in the active tasks.
*THIS* could be a problem for your DB.

Please raise your permission_sort to at least 2048 and see if the load on the DB ease a bit.
So I should increase the  "permission_sort" and "groups" in cache config, is it?
Is it right for the config?
```
[cache "permission_sort"]
memoryLimit = 3072
```

Luca Milanesio

unread,
Jul 2, 2015, 3:41:48 AM7/2/15
to Chunlin Zhang, Repo and Gerrit Discussion
OK, then you definitely need to increase the size of your groups* and permission* cache otherwise your slaves are going to hammer the DB most of the times.

Luca.

Luca Milanesio

unread,
Jul 2, 2015, 3:45:59 AM7/2/15
to Chunlin Zhang, Repo and Gerrit Discussion

Chunlin Zhang

unread,
Jul 2, 2015, 4:04:52 AM7/2/15
to Luca Milanesio, Repo and Gerrit Discussion
I add config to one slave gerrit and restart it
```
[cache "permission_sort"]
        memoryLimit = 3072
[cache "groups"]
        memoryLimit = 2048
```
now the cache output of  this slave gerrit became:
```
2015-07-02 15:57:35,319 INFO cmd stdout: --------------------------------+---------------------+---------+---------+
2015-07-02 15:57:35,319 INFO cmd stdout:   accounts                      |     5               |  19.3ms | 99%     |
2015-07-02 15:57:35,320 INFO cmd stdout:   accounts_byemail              |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   accounts_byname               |     5               |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   changes                       |                     |  24.0ms |  0%     |
2015-07-02 15:57:35,320 INFO cmd stdout:   groups                        |    79               | 847.2us | 10%     |
2015-07-02 15:57:35,320 INFO cmd stdout:   groups_byinclude              |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   groups_byname                 |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   groups_byuuid                 |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   groups_external               |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   groups_members                |   416               |   2.8ms | 99%     |
2015-07-02 15:57:35,320 INFO cmd stdout:   ldap_group_existence          |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   ldap_groups                   |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   ldap_groups_byinclude         |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   ldap_usernames                |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   permission_sort               |  3072               |         | 93%     |
2015-07-02 15:57:35,320 INFO cmd stdout:   project_list                  |                     |         |         |
2015-07-02 15:57:35,320 INFO cmd stdout:   projects                      |   476               |  49.9ms | 94%     |
2015-07-02 15:57:35,321 INFO cmd stdout:   sshkeys                       |     5               |   9.2ms | 89%     |
```

The hit ratio of "permission_sort" became 93% now, hope it will solve this issue.
I have no idea why the "groups" cache only 10% but the memory still 79.

Luca Milanesio

unread,
Jul 2, 2015, 4:54:17 AM7/2/15
to Chunlin Zhang, Repo and Gerrit Discussion
The groups line is very strange indeed, but possibly only because the system has just being restarted and thus is quite normal that the majority of fetches are actually a miss.
You need to see how that numbers grows over time, it should eventually go towards the 99% ;-)

Luca.

Chunlin Zhang

unread,
Jul 2, 2015, 9:28:25 PM7/2/15
to Luca Milanesio, Repo and Gerrit Discussion
Now still 50%.
2015-07-03 09:06:06,469 INFO cmd stdout:   groups                        |   231               | 776.7us | 50%     |

Chunlin Zhang

unread,
Jul 2, 2015, 9:44:58 PM7/2/15
to Martin Fick, Repo and Gerrit Discussion
No, I don't remember I did this type of config, and I don't know how to do this config.

On Fri, Jul 3, 2015 at 4:42 AM, Martin Fick <mf...@codeaurora.org> wrote:
On Thursday, July 02, 2015 10:48:30 AM Chunlin Zhang wrote:

> On Thu, Jul 2, 2015 at 1:20 AM, Martin Fick
<mf...@codeaurora.org> wrote:
> > Could you have a per-user postgress connection limitthat
> > you are running up against?
>
> I don't know what you mean exactly. You want me to limit
> the connections of readonly user ?

No! :)  I am aksing if you perhaps already done that?  Do
you have some postgress per-user limit configured somewhere
without realizing it?  All the gerrit connections are likely
comming from the same user,  so even though you have a large
connection limit on your pgserver,  you may still be limited
by this user.  Also, do you have any users that are not
Gerrit hitting your pgdb directly that might not be
accounted for?

Chunlin Zhang

unread,
Jul 6, 2015, 9:37:42 PM7/6/15
to Luca Milanesio, Repo and Gerrit Discussion
The "Cannot open ReviewDb" error came out once after I add the cache config.

log 1(my log watch tool output):
```
2015-07-06 16:23:38 INFO cmd stdout:Stopping Gerrit Code Review: OK
2015-07-06 16:23:52 INFO cmd stdout:Starting Gerrit Code Review: OK
```

log 2(gerrit error_log,there are many,I only paste one):
```
[2015-07-06 16:23:03,058] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user yangyang19 account 1000543) during git-upload-pack '/platform/ndk'
com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70) (via modules: com.google.gerrit.server.config.GerritGlobalModule -> com.google.gerrit.server.util.ThreadLocalRequestContext$1)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error
        at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1018)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:69)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
```

log 3(gerrit cmd output):
```
2015-07-06 16:21:38,249 INFO run cmd: ssh smamirr_sh gerrit show-queue;ssh smamirr_sh gerrit show-connections;ssh smamirr_sh gerrit show-caches
2015-07-06 16:21:39,136 INFO cmd stdout: Task     State        StartTime         Command
2015-07-06 16:21:39,136 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:21:39,136 INFO cmd stdout: 8fb92dc0              16:24:24.025      git-upload-pack '/platform/frameworks/ba
2015-07-06 16:21:39,136 INFO cmd stdout: bea22067 17:23:51.771 16:23:51.769      Log File Compressor
2015-07-06 16:21:39,136 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:21:39,136 INFO cmd stdout:   2 tasks
2015-07-06 16:21:39,635 INFO cmd stdout: Session     Start     Idle   User            Remote Host
2015-07-06 16:21:39,635 INFO cmd stdout: --------------------------------------------------------------
2015-07-06 16:21:39,635 INFO cmd stdout: 4f10159d 16:24:23 00:00:06   yangyang19      shws23
2015-07-06 16:21:39,635 INFO cmd stdout: 2faa8164 16:24:30 00:00:00   zhangclb        swplnew
2015-07-06 16:21:39,635 INFO cmd stdout: --
2015-07-06 16:21:39,635 INFO cmd stdout: SSHD Backend: mina
2015-07-06 16:21:40,803 INFO cmd stdout: Gerrit Code Review        2.11.1                    now    16:24:31   CST
2015-07-06 16:21:40,803 INFO cmd stdout:                                                  uptime     0 min 39 sec
2015-07-06 16:21:40,803 INFO cmd stdout: 
2015-07-06 16:21:40,803 INFO cmd stdout:   Name                          |Entries              |  AvgGet |Hit Ratio|
2015-07-06 16:21:40,803 INFO cmd stdout:                                 |   Mem   Disk   Space|         |Mem  Disk|
2015-07-06 16:21:40,803 INFO cmd stdout: --------------------------------+---------------------+---------+---------+
2015-07-06 16:21:40,804 INFO cmd stdout:   accounts                      |     2               | 537.4ms | 99%     |
2015-07-06 16:21:40,804 INFO cmd stdout:   accounts_byemail              |                     |         |         |
2015-07-06 16:21:40,804 INFO cmd stdout:   accounts_byname               |     2               |         |         |
2015-07-06 16:21:40,804 INFO cmd stdout:   changes                       |                     |  54.6ms |  0%     |
2015-07-06 16:21:40,804 INFO cmd stdout:   groups                        |    17               |  49.5ms |  0%     |
2015-07-06 16:21:40,804 INFO cmd stdout:   groups_byinclude              |                     |         |         |
2015-07-06 16:21:40,804 INFO cmd stdout:   groups_byname                 |                     |         |         |
2015-07-06 16:21:40,804 INFO cmd stdout:   groups_byuuid                 |                     |         |         |
2015-07-06 16:21:40,805 INFO cmd stdout:   groups_external               |                     |         |         |
2015-07-06 16:21:40,805 INFO cmd stdout:   groups_members                |   149               |  66.8ms | 99%     |
2015-07-06 16:21:40,805 INFO cmd stdout:   ldap_group_existence          |                     |         |         |
2015-07-06 16:21:40,805 INFO cmd stdout:   ldap_groups                   |                     |         |         |
2015-07-06 16:21:40,805 INFO cmd stdout:   ldap_groups_byinclude         |                     |         |         |
2015-07-06 16:21:40,805 INFO cmd stdout:   ldap_usernames                |                     |         |         |
2015-07-06 16:21:40,805 INFO cmd stdout:   permission_sort               |  3072               |         | 98%     |
2015-07-06 16:21:40,805 INFO cmd stdout:   project_list                  |                     |         |         |
2015-07-06 16:21:40,806 INFO cmd stdout:   projects                      |   525               |  29.0ms | 83%     |
2015-07-06 16:21:40,806 INFO cmd stdout:   sshkeys                       |     2               |  79.9ms | 99%     |
2015-07-06 16:21:40,806 INFO cmd stdout: D change_kind                   |                0.00k|         |         |
2015-07-06 16:21:40,806 INFO cmd stdout: D conflicts                     |                0.00k|         |         |
2015-07-06 16:21:40,806 INFO cmd stdout: D diff                          |                0.00k|         |         |
2015-07-06 16:21:40,806 INFO cmd stdout: D diff_intraline                |                0.00k|         |         |
2015-07-06 16:21:40,806 INFO cmd stdout: D git_tags                      |   485    692  52.80m|         |  0% 100%|
2015-07-06 16:21:40,806 INFO cmd stdout: D mergeability                  |                0.00k|         |         |
2015-07-06 16:21:40,806 INFO cmd stdout: 
2015-07-06 16:21:40,807 INFO cmd stdout: SSH:      2  users, oldest session started  0 min  7 sec ago
2015-07-06 16:21:40,807 INFO cmd stdout: Tasks:    3  total =    2 running +      0 ready +    1 sleeping
2015-07-06 16:21:40,807 INFO cmd stdout: Mem: 8.81g total = 3.67g used + 5.11g free + 24.90m buffers
2015-07-06 16:21:40,807 INFO cmd stdout:      26.67g max
2015-07-06 16:21:40,807 INFO cmd stdout:          535 open files
2015-07-06 16:21:40,807 INFO cmd stdout: 
2015-07-06 16:21:40,807 INFO cmd stdout: Threads: 40 CPUs available, 107 threads
2015-07-06 16:21:40,807 INFO cmd stdout: 
2015-07-06 16:23:40,817 INFO run cmd: ssh smamirr_sh gerrit show-queue;ssh smamirr_sh gerrit show-connections;ssh smamirr_sh gerrit show-caches
2015-07-06 16:23:41,358 INFO cmd stdout: Task     State        StartTime         Command
2015-07-06 16:23:41,358 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:23:41,358 INFO cmd stdout: bea22067 17:23:51.769 16:23:51.769      Log File Compressor
2015-07-06 16:23:41,358 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:23:41,358 INFO cmd stdout:   1 tasks
2015-07-06 16:23:42,125 INFO cmd stdout: Session     Start     Idle   User            Remote Host
2015-07-06 16:23:42,125 INFO cmd stdout: --------------------------------------------------------------
2015-07-06 16:23:42,126 INFO cmd stdout: 0863fbaa 16:26:32 00:00:00   zhangclb        swplnew
2015-07-06 16:23:42,126 INFO cmd stdout: --
2015-07-06 16:23:42,126 INFO cmd stdout: SSHD Backend: mina
2015-07-06 16:23:42,678 INFO cmd stdout: Gerrit Code Review        2.11.1                    now    16:26:33   CST
2015-07-06 16:23:42,678 INFO cmd stdout:                                                  uptime     2 min 41 sec
2015-07-06 16:23:42,678 INFO cmd stdout: 
2015-07-06 16:23:42,679 INFO cmd stdout:   Name                          |Entries              |  AvgGet |Hit Ratio|
2015-07-06 16:23:42,679 INFO cmd stdout:                                 |   Mem   Disk   Space|         |Mem  Disk|
2015-07-06 16:23:42,679 INFO cmd stdout: --------------------------------+---------------------+---------+---------+
2015-07-06 16:23:42,679 INFO cmd stdout:   accounts                      |     2               | 537.4ms | 99%     |
2015-07-06 16:23:42,679 INFO cmd stdout:   accounts_byemail              |                     |         |         |
2015-07-06 16:23:42,679 INFO cmd stdout:   accounts_byname               |     2               |         |         |
2015-07-06 16:23:42,679 INFO cmd stdout:   changes                       |                     | 285.1ms |  0%     |
2015-07-06 16:23:42,680 INFO cmd stdout:   groups                        |    17               |  49.5ms |  0%     |
2015-07-06 16:23:42,680 INFO cmd stdout:   groups_byinclude              |                     |         |         |
2015-07-06 16:23:42,680 INFO cmd stdout:   groups_byname                 |                     |         |         |
2015-07-06 16:23:42,680 INFO cmd stdout:   groups_byuuid                 |                     |         |         |
2015-07-06 16:23:42,680 INFO cmd stdout:   groups_external               |                     |         |         |
2015-07-06 16:23:42,680 INFO cmd stdout:   groups_members                |   284               |  91.1ms | 99%     |
2015-07-06 16:23:42,680 INFO cmd stdout:   ldap_group_existence          |                     |         |         |
2015-07-06 16:23:42,680 INFO cmd stdout:   ldap_groups                   |                     |         |         |
2015-07-06 16:23:42,681 INFO cmd stdout:   ldap_groups_byinclude         |                     |         |         |
2015-07-06 16:23:42,681 INFO cmd stdout:   ldap_usernames                |                     |         |         |
2015-07-06 16:23:42,681 INFO cmd stdout:   permission_sort               |  3072               |         | 97%     |
2015-07-06 16:23:42,681 INFO cmd stdout:   project_list                  |                     |         |         |
2015-07-06 16:23:42,681 INFO cmd stdout:   projects                      |   565               |  37.7ms | 91%     |
2015-07-06 16:23:42,681 INFO cmd stdout:   sshkeys                       |     2               |  79.9ms | 99%     |
2015-07-06 16:23:42,681 INFO cmd stdout: D change_kind                   |                0.00k|         |         |
2015-07-06 16:23:42,681 INFO cmd stdout: D conflicts                     |                0.00k|         |         |
2015-07-06 16:23:42,682 INFO cmd stdout: D diff                          |                0.00k|         |         |
2015-07-06 16:23:42,682 INFO cmd stdout: D diff_intraline                |                0.00k|         |         |
2015-07-06 16:23:42,682 INFO cmd stdout: D git_tags                      |   523    692  52.81m|         | 48% 100%|
2015-07-06 16:23:42,682 INFO cmd stdout: D mergeability                  |                0.00k|         |         |
2015-07-06 16:23:42,682 INFO cmd stdout: 
2015-07-06 16:23:42,682 INFO cmd stdout: SSH:      1  users, oldest session started 484 ms ago
2015-07-06 16:23:42,682 INFO cmd stdout: Tasks:    2  total =    1 running +      0 ready +    1 sleeping
2015-07-06 16:23:42,682 INFO cmd stdout: Mem: 11.20g total = 4.48g used + 6.69g free + 36.03m buffers
2015-07-06 16:23:42,682 INFO cmd stdout:      26.67g max
2015-07-06 16:23:42,683 INFO cmd stdout:          584 open files
2015-07-06 16:23:42,683 INFO cmd stdout: 
2015-07-06 16:23:42,683 INFO cmd stdout: Threads: 40 CPUs available, 89 threads
2015-07-06 16:23:42,683 INFO cmd stdout: 
2015-07-06 16:25:42,692 INFO run cmd: ssh smamirr_sh gerrit show-queue;ssh smamirr_sh gerrit show-connections;ssh smamirr_sh gerrit show-caches
2015-07-06 16:25:43,280 INFO cmd stdout: Task     State        StartTime         Command
2015-07-06 16:25:43,280 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:25:43,280 INFO cmd stdout: bea22067 17:23:51.769 16:23:51.769      Log File Compressor
2015-07-06 16:25:43,280 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:25:43,280 INFO cmd stdout:   1 tasks
2015-07-06 16:25:43,840 INFO cmd stdout: Session     Start     Idle   User            Remote Host
2015-07-06 16:25:43,840 INFO cmd stdout: --------------------------------------------------------------
2015-07-06 16:25:43,840 INFO cmd stdout: 088a1b79 16:28:34 00:00:00   zhangclb        swplnew
2015-07-06 16:25:43,840 INFO cmd stdout: --
2015-07-06 16:25:43,840 INFO cmd stdout: SSHD Backend: mina
2015-07-06 16:25:44,476 INFO cmd stdout: Gerrit Code Review        2.11.1                    now    16:28:35   CST
2015-07-06 16:25:44,476 INFO cmd stdout:                                                  uptime     4 min 43 sec
2015-07-06 16:25:44,476 INFO cmd stdout: 
2015-07-06 16:25:44,476 INFO cmd stdout:   Name                          |Entries              |  AvgGet |Hit Ratio|
2015-07-06 16:25:44,477 INFO cmd stdout:                                 |   Mem   Disk   Space|         |Mem  Disk|
2015-07-06 16:25:44,477 INFO cmd stdout: --------------------------------+---------------------+---------+---------+
2015-07-06 16:25:44,477 INFO cmd stdout:   accounts                      |     2               | 537.4ms | 99%     |
2015-07-06 16:25:44,477 INFO cmd stdout:   accounts_byemail              |                     |         |         |
2015-07-06 16:25:44,477 INFO cmd stdout:   accounts_byname               |     2               |         |         |
2015-07-06 16:25:44,477 INFO cmd stdout:   changes                       |                     | 285.1ms |  0%     |
2015-07-06 16:25:44,477 INFO cmd stdout:   groups                        |    17               |  49.5ms |  0%     |
2015-07-06 16:25:44,478 INFO cmd stdout:   groups_byinclude              |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   groups_byname                 |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   groups_byuuid                 |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   groups_external               |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   groups_members                |   284               |  91.1ms | 99%     |
2015-07-06 16:25:44,478 INFO cmd stdout:   ldap_group_existence          |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   ldap_groups                   |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   ldap_groups_byinclude         |                     |         |         |
2015-07-06 16:25:44,478 INFO cmd stdout:   ldap_usernames                |                     |         |         |
2015-07-06 16:25:44,479 INFO cmd stdout:   permission_sort               |  3072               |         | 97%     |
2015-07-06 16:25:44,479 INFO cmd stdout:   project_list                  |                     |         |         |
2015-07-06 16:25:44,479 INFO cmd stdout:   projects                      |   565               |  37.7ms | 91%     |
2015-07-06 16:25:44,479 INFO cmd stdout:   sshkeys                       |     2               |  79.9ms | 99%     |
2015-07-06 16:25:44,479 INFO cmd stdout: D change_kind                   |                0.00k|         |         |
2015-07-06 16:25:44,479 INFO cmd stdout: D conflicts                     |                0.00k|         |         |
2015-07-06 16:25:44,479 INFO cmd stdout: D diff                          |                0.00k|         |         |
2015-07-06 16:25:44,479 INFO cmd stdout: D diff_intraline                |                0.00k|         |         |
2015-07-06 16:25:44,480 INFO cmd stdout: D git_tags                      |   523    692  52.81m|         | 48% 100%|
2015-07-06 16:25:44,480 INFO cmd stdout: D mergeability                  |                0.00k|         |         |
2015-07-06 16:25:44,480 INFO cmd stdout: 
2015-07-06 16:25:44,480 INFO cmd stdout: SSH:      1  users, oldest session started 565 ms ago
2015-07-06 16:25:44,480 INFO cmd stdout: Tasks:    2  total =    1 running +      0 ready +    1 sleeping
2015-07-06 16:25:44,480 INFO cmd stdout: Mem: 11.20g total = 4.61g used + 6.55g free + 36.03m buffers
2015-07-06 16:25:44,480 INFO cmd stdout:      26.67g max
2015-07-06 16:25:44,480 INFO cmd stdout:          584 open files
2015-07-06 16:25:44,481 INFO cmd stdout: 
2015-07-06 16:25:44,481 INFO cmd stdout: Threads: 40 CPUs available, 89 threads
2015-07-06 16:25:44,481 INFO cmd stdout: 
```

Luca Milanesio

unread,
Jul 7, 2015, 3:23:33 AM7/7/15
to Chunlin Zhang, Repo and Gerrit Discussion
Hi Chunlin,
the situation you are showing here is quite strange.

I do suggest someone having a look on the OS-level situation of your host.
I see a very powerful machine (40 CPUs) with not much memory use and quite high Get times for the cached entries.
I would be interesting having a look on your DB-server as well so see the full picture.

Doesn’t seems however a problem with the load or a bug in Gerrit but rather something wrong in your set-up.

See below my comments:

On 7 Jul 2015, at 02:37, Chunlin Zhang <zhangc...@gmail.com> wrote:

The "Cannot open ReviewDb" error came out once after I add the cache config.

log 1(my log watch tool output):
```
2015-07-06 16:23:38 INFO cmd stdout:Stopping Gerrit Code Review: OK
2015-07-06 16:23:52 INFO cmd stdout:Starting Gerrit Code Review: OK
```

log 2(gerrit error_log,there are many,I only paste one):
```
[2015-07-06 16:23:03,058] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user yangyang19 account 1000543) during git-upload-pack '/platform/ndk'
com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70) (via modules: com.google.gerrit.server.config.GerritGlobalModule -> com.google.gerrit.server.util.ThreadLocalRequestContext$1)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error
        at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1018)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:69)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
```


… and ? can you include all the stack traces associated?

log 3(gerrit cmd output):
```
2015-07-06 16:21:38,249 INFO run cmd: ssh smamirr_sh gerrit show-queue;ssh smamirr_sh gerrit show-connections;ssh smamirr_sh gerrit show-caches
2015-07-06 16:21:39,136 INFO cmd stdout: Task     State        StartTime         Command
2015-07-06 16:21:39,136 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:21:39,136 INFO cmd stdout: 8fb92dc0              16:24:24.025      git-upload-pack '/platform/frameworks/ba
2015-07-06 16:21:39,136 INFO cmd stdout: bea22067 17:23:51.771 16:23:51.769      Log File Compressor
2015-07-06 16:21:39,136 INFO cmd stdout: ------------------------------------------------------------------------------
2015-07-06 16:21:39,136 INFO cmd stdout:   2 tasks
2015-07-06 16:21:39,635 INFO cmd stdout: Session     Start     Idle   User            Remote Host
2015-07-06 16:21:39,635 INFO cmd stdout: --------------------------------------------------------------
2015-07-06 16:21:39,635 INFO cmd stdout: 4f10159d 16:24:23 00:00:06   yangyang19      shws23
2015-07-06 16:21:39,635 INFO cmd stdout: 2faa8164 16:24:30 00:00:00   zhangclb        swplnew

So you had *ONLY* one running task and 2 active users? … and the DB was already out of connections?
This is definitely not a normal situation, have you checked any anomaly on your DB log?
Have you checked the status of the running connections?

2015-07-06 16:21:39,635 INFO cmd stdout: --
2015-07-06 16:21:39,635 INFO cmd stdout: SSHD Backend: mina
2015-07-06 16:21:40,803 INFO cmd stdout: Gerrit Code Review        2.11.1                    now    16:24:31   CST
2015-07-06 16:21:40,803 INFO cmd stdout:                                                  uptime     0 min 39 sec
2015-07-06 16:21:40,803 INFO cmd stdout: 
2015-07-06 16:21:40,803 INFO cmd stdout:   Name                          |Entries              |  AvgGet |Hit Ratio|
2015-07-06 16:21:40,803 INFO cmd stdout:                                 |   Mem   Disk   Space|         |Mem  Disk|
2015-07-06 16:21:40,803 INFO cmd stdout: --------------------------------+---------------------+---------+---------+
2015-07-06 16:21:40,804 INFO cmd stdout:   accounts                      |     2               | 537.4ms | 99%     |

You are able to saturate the DB connections with 2 active users?

2015-07-06 16:21:40,804 INFO cmd stdout:   accounts_byemail              |                     |         |         |
2015-07-06 16:21:40,804 INFO cmd stdout:   accounts_byname               |     2               |         |         |
2015-07-06 16:21:40,804 INFO cmd stdout:   changes                       |                     |  54.6ms |  0%     |
2015-07-06 16:21:40,804 INFO cmd stdout:   groups                        |    17               |  49.5ms |  0%     |

0% for Groups? You are basically not able to cache anything !
40 CPUs, memory almost free … but I see very high AvgGet times ! But again not justified for a 2 users set-up.

Chunlin Zhang

unread,
Jul 7, 2015, 5:56:41 AM7/7/15
to Luca Milanesio, Repo and Gerrit Discussion
"So you had *ONLY* one running task and 2 active users? … and the DB was already out of connections?"
Actually there is only 1 user, because 1 is my test user.
But the time is not so match, because I run gerrit command once with 2 minutes as interval.

"This is definitely not a normal situation, have you checked any anomaly on your DB log?"
I can not see anything useful from postgresql log, the error of "unexpected EOF on client connection" maybe cause by gerrit restart
```
2015-07-06 13:36:45 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:23:38 CST LOG:  unexpected EOF on client connection
2015-07-06 16:53:09 CST LOG:  unexpected EOF on client connection
2015-07-06 17:03:59 CST LOG:  unexpected EOF on client connection

```

"Have you checked the status of the running connections?"
I will try to learn how to do this, but now I do not know how to check.

"0% for Groups? You are basically not able to cache anything !"
The data of "Groups" cache is always strange, I config with cache.groups.memoryLimit = 2048 ,but it use alway little for example 17.

"… and ? can you include all the stack traces associated?"
```
[2015-07-06 16:23:03,058] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user yangyang19 account 1000543) during git-upload-pack '/platform/ndk'
com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Cannot open ReviewDb
  at com.google.gerrit.server.util.ThreadLocalRequestContext$1.provideReviewDb(ThreadLocalRequestContext.java:70) (via modules: com.google.gerrit.server.config.GerritGlobalModule -> com.google.gerrit.server.util.ThreadLocalRequestContext$1)
  while locating com.google.gerrit.reviewdb.server.ReviewDb

1 error
        at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1018)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:69)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
Caused by: org.postgresql.util.PSQLException: 尝试连线已失败。
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:152)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
        at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:32)
        at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:393)
        at org.postgresql.Driver.connect(Driver.java:267)
        at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
        at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
        at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1158)
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
        at com.google.gwtorm.jdbc.Database.newConnection(Database.java:128)
        ... 30 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:145)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:114)
        at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
        at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:274)
        at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:271)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:108)
        ... 44 more

```

Chunlin Zhang

unread,
Jul 7, 2015, 6:05:22 AM7/7/15
to Luca Milanesio, Repo and Gerrit Discussion
Another possible cause of "Cannot open ReviewDb", because my mirror gerrit and the master gerrit(postgresql with it) are in different city, maybe sometimes the network connection is not so stable?
I think the odds of "Cannot open ReviewDb" decrease after I add the cache config, thank you very much for the advice.

On Tue, Jul 7, 2015 at 5:56 PM, Chunlin Zhang <zhangc...@gmail.com> wrote:

Luca Milanesio

unread,
Jul 7, 2015, 6:10:17 AM7/7/15
to Chunlin Zhang, Repo and Gerrit Discussion
Hi Chunlin,
the error here is not overload but rather your PostgreSQL resetting the connection :-)

… and yes, if the DB is so far away (that could be the reason behind the high execution times of the cache load) then it could actually create a waterfall effect on the connection pool as well.
(queries are taking longer, they accumulate, the pool gets full !)

You should look at DB replication as well across the cities (see https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling)
Gerrit here is out of the picture, you have basic DBMS problems.

Luca.
Reply all
Reply to author
Forward
0 new messages