Hi,
Our Gerrit server has been stalling and crashing alarmingly often, so we tried several things to fix it, to no avail.
First, we upgraded Gerrit from 2.13.7 to 2.13.9.
Then we installed OpenJDK 7 and used it instead of OpenJDK 8, which is this server's system default.
Then we tuned some parameters in gerrit.config, based on information found here:
[database]
type = h2
database = /var/lib/gerrit/db/ReviewDB
poolLimit = 50
poolMinIdle = 4
poolMaxIdle = 16
[receive]
enableSignedPush = false
checkReferencedObjectsAreReachable = false
timeout = 4min
[container]
user = gerrit
javaHome = /usr/lib/jvm/java-7-openjdk-amd64/jre
heapLimit = 64g
[sshd]
listenAddress = *:29418
threads = 8
batchThreads = 2
commandStartThreads = 2
[httpd]
listenUrl = https://*:8080/
sslKeyStore = etc/keystore
sslKeyPassword = keystore
maxThreads = 25
[cache]
directory = cache
[download]
scheme = ssh
[core]
packedGitLimit = 1g
packedGitWindowSize = 8k
packedGitOpenFiles = 1024
Then we decided to switch from the H2 reviewdb to a MySQL one (the procedure is described at the bottom of this message), because we assumed that database performance, combined with the size of our repositories, was the source of our problems.
After the switch and site init, we tried to reindex.
Account indexing completes, but at the change indexing stage we get errors for what appears to be any and all of our projects.
Often, reindexing apparently just halts, and the process needs to be killed and restarted.
Giving it more resources seems to help it push through, but even when it finishes, it always fails to produce a changes index.
Our index/gerrit_index.config after the attempted reindex:
[index "accounts_0003"]
ready = true
[index "changes_0032"]
ready = false
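For anyone who wants to check the same thing on their own site, here is a minimal Python sketch that reads text in the format of index/gerrit_index.config and reports which index versions are marked ready. The function name index_readiness is made up for illustration, and the parser only understands the two kinds of lines shown above; it is not a general git-config parser.

```python
import re

def index_readiness(config_text):
    """Map each [index "name"] section to the boolean value of its 'ready' key."""
    status = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        header = re.match(r'^\[index\s+"([^"]+)"\]$', line)
        if header:
            current = header.group(1)
            status[current] = False  # treated as not ready until a 'ready = true' line is seen
            continue
        if line.startswith("["):
            current = None  # some other section; ignore its keys
            continue
        key_value = re.match(r"^ready\s*=\s*(\S+)$", line)
        if key_value and current is not None:
            status[current] = key_value.group(1).lower() == "true"
    return status

# Same content as our index/gerrit_index.config after the attempted reindex.
sample = '''[index "accounts_0003"]
    ready = true
[index "changes_0032"]
    ready = false
'''

for name, ready in index_readiness(sample).items():
    print(name, "ready" if ready else "NOT ready")
```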
The error logs are the same for all projects:
gerrit@<server>:~$ java -Xms64G -Xmx64G -jar /var/lib/gerrit/bin/gerrit.war reindex --threads 8 -d /var/lib/gerrit
[<timestamp>] [main] INFO com.google.gerrit.server.git.LocalDiskRepositoryManager : Defaulting core.streamFileThreshold to 2047m
[<timestamp>] [main] INFO com.google.gerrit.server.cache.h2.H2CacheFactory : Enabling disk cache /var/lib/gerrit/cache
Reindexing accounts: 100% (<account number>/<account number>)
Reindexed <account number> documents in accounts index in 0.4s (34.3/s)
Collecting projects: <number of projects>
[<timestamp>] [Index-Batch-<n>] ERROR com.google.gerrit.server.index.SiteIndexer : Failed to index project <any of our projects>
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:435)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
at com.google.gerrit.server.index.SiteIndexer$ErrorListener.run(SiteIndexer.java:110)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:817)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:753)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:634)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:110)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:417)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
at org.eclipse.jgit.lib.ObjectId.fromString(ObjectId.java:232)
at com.google.gerrit.server.notedb.NoteDbChangeState.parse(NoteDbChangeState.java:88)
at com.google.gerrit.server.notedb.NoteDbChangeState.parse(NoteDbChangeState.java:77)
at com.google.gerrit.server.notedb.ChangeNotes.openHandle(ChangeNotes.java:553)
at com.google.gerrit.server.notedb.AbstractChangeNotes.load(AbstractChangeNotes.java:149)
at com.google.gerrit.server.notedb.ChangeNotes$Factory.createFromChangeOnlyWhenNoteDbDisabled(ChangeNotes.java:221)
at com.google.gerrit.server.notedb.ChangeNotes$Factory.scanDb(ChangeNotes.java:311)
at com.google.gerrit.server.notedb.ChangeNotes$Factory.scan(ChangeNotes.java:297)
at com.google.gerrit.server.index.change.AllChangesIndexer$2.call(AllChangesIndexer.java:225)
at com.google.gerrit.server.index.change.AllChangesIndexer$2.call(AllChangesIndexer.java:215)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
... 10 more
< ... repeated six more times with different timestamps, Index-Batch numbers and projects ... >
Reindexing changes: projects: 100% (<number of projects>/<number of projects>), done
Reindexed 0 documents in changes index in 0.7s (0.0/s)
[<timestamp>] [DiskCache-Store-0] WARN com.google.gerrit.server.cache.h2.H2CacheImpl : Cannot build BloomFilter for jdbc:h2:file:///var/lib/gerrit/cache/diff: Error opening database: "Sleep interrupted" [8000-176]
[<timestamp>] [main] INFO com.google.gerrit.server.cache.h2.H2CacheFactory : Finishing 4 disk cache updates
gerrit@<server>:~$
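Since the same stack trace repeats for every project, a short script makes it easier to see which projects failed and whether they all share one root cause. This is a rough Python sketch, assuming the log lines look like the ones above; the function name summarize and the project names and timestamps in the sample are made up for illustration.

```python
import re
from collections import Counter

# Matches the "Failed to index project <name>" lines written by SiteIndexer.
FAILED = re.compile(r"ERROR .*SiteIndexer : Failed to index project (.+)$")
# Matches the exception class on each "Caused by:" line.
CAUSED_BY = re.compile(r"^Caused by: ([\w.$]+)")

def summarize(log_text):
    """Return (failed project names, Counter of exception classes from 'Caused by' lines)."""
    projects = []
    causes = Counter()
    for raw in log_text.splitlines():
        line = raw.strip()
        failed = FAILED.search(line)
        if failed:
            projects.append(failed.group(1).strip())
            continue
        cause = CAUSED_BY.match(line)
        if cause:
            causes[cause.group(1)] += 1
    return projects, causes

# Hypothetical excerpt in the same shape as the reindex output above.
sample_log = """\
[2017-01-01 10:00:00,000] [Index-Batch-1] ERROR com.google.gerrit.server.index.SiteIndexer : Failed to index project project-a
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
Caused by: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
[2017-01-01 10:00:00,100] [Index-Batch-2] ERROR com.google.gerrit.server.index.SiteIndexer : Failed to index project project-b
Caused by: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
"""

failed_projects, root_causes = summarize(sample_log)
print("failed projects:", failed_projects)
for exception_class, count in root_causes.items():
    print(count, "x", exception_class)
```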
The number of projects is below 500, but many of them are hefty ones.
Changes in review are few, though.
The changes are concentrated in only a few projects; the rest are a mirror of a large project snapshot that we need but have never changed so far.
We can switch the site back to the H2 database. Reindexing after the init often chokes there too, but when it gets to the end, it creates usable indexes and at least we can then start the service (even though it fails soon, as it is flaky).
At the moment, I don't know where to look for the problem source.
Is it the database, or maybe the git repository?
Was the moving of data from one db to the other done wrong?
Is the data itself corrupted if it was transferred faithfully?
In the latter case, wouldn't it fail more subtly, instead of blocking completely?
Thanks to everyone who has read this rant this far :)
-----------------------------------
How we transferred the database contents from H2 to MySQL
(but we don't know if this is a correct method, because our site still doesn't work with MySQL reviewdb):
We made a CSV dump of all tables from the H2 reviewdb, using a simplified version (chopping down, hardcoding and commenting out almost everything in main) of a Perl script found here.
We checked the content of the CSV files against the SQL script obtained using the "SCRIPT TO" command at the H2 command prompt.
We created a new empty reviewdb database on the server, and gave all permissions for it to a new MySQL user "gerrit".
We created another, dummy review site in a separate directory, changed $reviewsite/etc/gerrit.config and $reviewsite/etc/secure.config to point to the MySQL reviewdb, copied them to $dummy_site_directory/etc/ and ran init on that dummy site to initialize the MySQL reviewdb:
[database]
# type = h2
# database = /var/lib/gerrit/db/ReviewDB
type = mysql
hostname = localhost
port = 3306
database = reviewdb
username = gerrit
poolLimit = 50
poolMinIdle = 4
poolMaxIdle = 16
Then, using phpMyAdmin, we imported the CSV tables into the MySQL reviewdb one by one, skipping the first lines of the CSV files, with the option "Replace table data with file" checked.
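One thing that may be worth checking in a CSV-based transfer like this is whether NULL values survived, since plain CSV has no built-in way to tell NULL from an empty string. The "Invalid id:" message in the stack trace has nothing after the colon, which would be consistent with an empty string being parsed, but whether that is what actually happened here is only a guess. A minimal Python sketch that counts empty fields per column in a dump file, assuming each file starts with a header row; the function name empty_field_counts and the column names in the sample are made up for illustration and are not the real reviewdb schema.

```python
import csv
import io
from collections import Counter

def empty_field_counts(csv_file):
    """Return (data row count, Counter of empty-string fields per column name)."""
    reader = csv.reader(csv_file)
    header = next(reader)  # assumption: the first line holds the column names
    counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for name, value in zip(header, row):
            # csv.reader yields "" for both an unquoted empty field and a quoted "",
            # so a NULL written as an empty field cannot be told apart from ''.
            if value == "":
                counts[name] += 1
    return rows, counts

# Hypothetical dump with made-up column names.
sample_dump = io.StringIO(
    "id,topic,state_hash\n"
    "1,feature-x,\n"
    "2,,\n"
    "3,bugfix,abc123\n"
)

row_count, empties = empty_field_counts(sample_dump)
for column, count in sorted(empties.items()):
    print(f"{column}: {count} of {row_count} rows empty")
```

Columns that are empty in most rows are the ones to compare against the original H2 data, to see whether they held NULL or a real empty string before the export.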