Gerrit 2.13.9 reindex of changes index fails after switching reviewdb from H2 to MySQL


Sasha C.

Dec 8, 2017, 2:20:03 AM
to Repo and Gerrit Discussion
Hi,

Our Gerrit server has been stalling and crashing alarmingly often, so we tried several things to fix it, to no avail.

First, we upgraded Gerrit from 2.13.7 to 2.13.9.
Then we installed OpenJDK 7 and used it instead of OpenJDK 8, which is this server's system default.
Then we tuned some parameters in gerrit.config, based on information found here:

[database]
    type = h2
    database = /var/lib/gerrit/db/ReviewDB
    poolLimit = 50
    poolMinIdle = 4
    poolMaxIdle = 16
[receive]
    enableSignedPush = false
    checkReferencedObjectsAreReachable = false
    timeout = 4min
[container]
    user = gerrit
    javaHome = /usr/lib/jvm/java-7-openjdk-amd64/jre
    heapLimit = 64g
[sshd]
    listenAddress = *:29418
    threads = 8
    batchThreads = 2
    commandStartThreads = 2
[httpd]
    listenUrl = https://*:8080/
    sslKeyStore = etc/keystore
    sslKeyPassword = keystore
    maxThreads = 25
[cache]
    directory = cache
[download]
    scheme = ssh
[core]
    packedGitLimit = 1g
    packedGitWindowSize = 8k
    packedGitOpenFiles = 1024

Then we decided to switch from the H2 reviewdb to a MySQL one (the procedure is described at the bottom of this post), because we assumed that database performance, combined with the size of our repositories, was the source of our problems.

After the switch and site init, we tried to reindex.
Account indexing completes, but at the change indexing stage we get errors for apparently every one of our projects.

Often the reindex simply hangs, and the process has to be killed and restarted.

Giving it more resources seems to help it push through, but even when it finishes, it never produces a changes index.

Our index/gerrit_index.config after the attempted reindex:

[index "accounts_0003"]
    ready = true
[index "changes_0032"]
    ready = false
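
For anyone scripting around this, here is a minimal Python sketch that reads text in the format shown above and reports which index versions are not marked ready. The function name `index_readiness` is made up for illustration, and the parser only understands the two-line pattern seen in this file, not the full git config syntax.

```python
import re

# Matches section headers such as: [index "changes_0032"]
SECTION_RE = re.compile(r'^\[index\s+"([^"]+)"\]\s*$')
# Matches the ready flag inside a section: ready = true / ready = false
READY_RE = re.compile(r'^\s*ready\s*=\s*(\S+)\s*$')


def index_readiness(config_text):
    """Return {index_name: bool} from gerrit_index.config-style text."""
    status = {}
    current = None
    for line in config_text.splitlines():
        section = SECTION_RE.match(line.strip())
        if section:
            current = section.group(1)
            continue
        ready = READY_RE.match(line)
        if ready and current is not None:
            status[current] = ready.group(1).lower() == "true"
    return status


# Sample copied from the config shown in this post.
sample = '''[index "accounts_0003"]
    ready = true
[index "changes_0032"]
    ready = false
'''

not_ready = [name for name, ok in index_readiness(sample).items() if not ok]
print(not_ready)  # ['changes_0032']
```
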

The error logs are the same for all projects:

gerrit@<server>:~$ java -Xms64G -Xmx64G -jar /var/lib/gerrit/bin/gerrit.war reindex --threads 8 -d /var/lib/gerrit
[<timestamp>] [main] INFO  com.google.gerrit.server.git.LocalDiskRepositoryManager : Defaulting core.streamFileThreshold to 2047m
[<timestamp>] [main] INFO  com.google.gerrit.server.cache.h2.H2CacheFactory : Enabling disk cache /var/lib/gerrit/cache
Reindexing accounts:    100% (<account number>/<account number>)
Reindexed <account number> documents in accounts index in 0.4s (34.3/s)
Collecting projects:    <number of projects>
[<timestamp>] [Index-Batch-<n>] ERROR com.google.gerrit.server.index.SiteIndexer : Failed to index project <any of our projects>
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:435)
    at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
    at com.google.gerrit.server.index.SiteIndexer$ErrorListener.run(SiteIndexer.java:110)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:817)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:753)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:634)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:110)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:417)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
    at org.eclipse.jgit.lib.ObjectId.fromString(ObjectId.java:232)
    at com.google.gerrit.server.notedb.NoteDbChangeState.parse(NoteDbChangeState.java:88)
    at com.google.gerrit.server.notedb.NoteDbChangeState.parse(NoteDbChangeState.java:77)
    at com.google.gerrit.server.notedb.ChangeNotes.openHandle(ChangeNotes.java:553)
    at com.google.gerrit.server.notedb.AbstractChangeNotes.load(AbstractChangeNotes.java:149)
    at com.google.gerrit.server.notedb.ChangeNotes$Factory.createFromChangeOnlyWhenNoteDbDisabled(ChangeNotes.java:221)
    at com.google.gerrit.server.notedb.ChangeNotes$Factory.scanDb(ChangeNotes.java:311)
    at com.google.gerrit.server.notedb.ChangeNotes$Factory.scan(ChangeNotes.java:297)
    at com.google.gerrit.server.index.change.AllChangesIndexer$2.call(AllChangesIndexer.java:225)
    at com.google.gerrit.server.index.change.AllChangesIndexer$2.call(AllChangesIndexer.java:215)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    ... 10 more
< ... repeated six more times with different timestamps, Index-Batch numbers and projects ... >
Reindexing changes: projects: 100% (<number of projects>/<number of projects>), done
Reindexed 0 documents in changes index in 0.7s (0.0/s)
[<timestamp>] [DiskCache-Store-0] WARN  com.google.gerrit.server.cache.h2.H2CacheImpl : Cannot build BloomFilter for jdbc:h2:file:///var/lib/gerrit/cache/diff: Error opening database: "Sleep interrupted" [8000-176]
[<timestamp>] [main] INFO  com.google.gerrit.server.cache.h2.H2CacheFactory : Finishing 4 disk cache updates
gerrit@<server>:~$
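
With output this repetitive, it can help to condense a saved copy of the log into a list of failed projects and a count of root causes. The following is a rough Python sketch that assumes the line layout shown above; the function name, timestamps and project names in the sample are invented for illustration.

```python
import re
from collections import Counter

# "ERROR <logger> : Failed to index project <name>" as printed by the reindex run
FAILED_RE = re.compile(r"ERROR\s+\S+\s+:\s+Failed to index project\s+(.+)$")
# "Caused by: <exception class>: <message>" lines inside the stack traces
CAUSE_RE = re.compile(r"^Caused by:\s+([\w.$]+)")


def summarize_reindex_log(log_text):
    """Return (list of failed project names, Counter of root-cause classes)."""
    failed_projects = []
    causes = Counter()
    for line in log_text.splitlines():
        failed = FAILED_RE.search(line)
        if failed:
            failed_projects.append(failed.group(1).strip())
            continue
        cause = CAUSE_RE.match(line.strip())
        if cause:
            causes[cause.group(1)] += 1
    return failed_projects, causes


# Hypothetical excerpt in the same shape as the log in this post.
sample_log = """\
[2017-12-08 02:00:00,000] [Index-Batch-1] ERROR com.google.gerrit.server.index.SiteIndexer : Failed to index project demo/alpha
java.util.concurrent.ExecutionException: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
Caused by: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
    at org.eclipse.jgit.lib.ObjectId.fromString(ObjectId.java:232)
[2017-12-08 02:00:01,000] [Index-Batch-2] ERROR com.google.gerrit.server.index.SiteIndexer : Failed to index project demo/beta
Caused by: org.eclipse.jgit.errors.InvalidObjectIdException: Invalid id:
"""

failed, causes = summarize_reindex_log(sample_log)
print(failed)
print(causes.most_common())
```

If every failure shares one root cause, as it appears to here, that points at a single systematic problem rather than per-project damage.
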


The number of projects is below 500, but many of them are hefty.
Few changes are in review, though.
The changes are concentrated in only a few projects; the rest are mirrors of a large project snapshot that we need but have never changed so far.

We can switch the site back to H2 database. Reindexing after the init also often chokes, but when it gets to the end, it creates usable indexes and at least we can then start the service (even though it fails soon, as it is flaky).

At the moment, I don't know where to look for the problem source.
Is it the database, or maybe the git repository?
Was the moving of data from one db to the other done wrong?
Is the data itself corrupted if it was transferred faithfully?
In the latter case, wouldn't it fail more subtly instead of blocking completely?

Thanks to everyone who read this rant this far :) .



-----------------------------------

How we transferred the database contents from H2 to MySQL
(but we don't know if this is a correct method, because our site still doesn't work with MySQL reviewdb):


We made a CSV dump of all tables from the H2 reviewdb, using a simplified version (chopping down, hardcoding and commenting out almost everything in main) of a Perl script found here.

We checked the content of the CSV files against the SQL script obtained with the "SCRIPT TO" command at the H2 command prompt.

We created a new, empty reviewdb database on the server and gave all permissions for it to a new MySQL user "gerrit".

We made another, dummy review site in another directory, changed $reviewsite/etc/gerrit.config and $reviewsite/etc/secure.config to point to the MySQL reviewdb, copied them to $dummy_site_directory/etc/ and ran init on that dummy site to initialize the MySQL reviewdb.

[database]
#    type = h2
#    database = /var/lib/gerrit/db/ReviewDB

    type = mysql
    hostname = localhost
    port = 3306
    database = reviewdb
    username = gerrit

    poolLimit = 50
    poolMinIdle = 4
    poolMaxIdle = 16

Then, using phpMyAdmin, we imported the CSV tables into the MySQL reviewdb one by one, skipping the first lines of the CSV files, with the option "Replace table data with file" checked.
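
One thing that may be worth checking after a CSV import like this (an assumption, not a confirmed diagnosis for this site): the "Invalid id:" message in the stack trace has nothing after the colon, which would be consistent with an empty string reaching ObjectId.fromString from NoteDbChangeState.parse. If a column that was NULL in H2 turned into an empty string during the CSV round trip, that could produce such an error. The Python sketch below lists the rows of a CSV dump whose value in a given column is empty. The column name `note_db_state`, the key column `change_id` and the sample data are hypothetical; the real dump layout depends on the export script.

```python
import csv
import io


def rows_with_empty_value(csv_text, column, key_column):
    """Return the key_column values of rows whose `column` is an empty string.

    Assumes the first CSV line is a header naming the columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[key_column] for row in reader if (row.get(column) or "") == ""]


# Hypothetical dump of a changes table; real dumps have many more columns.
sample_csv = """\
change_id,status,note_db_state
101,n,
102,n,NULL
103,M,
"""

suspects = rows_with_empty_value(sample_csv, "note_db_state", "change_id")
print(suspects)  # ['101', '103']
```

Rows reported this way are only candidates: whether an empty value is actually wrong depends on what the original H2 row held, so it would need to be compared against the source database before changing anything.
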

Sasha C.

Jan 18, 2018, 9:33:50 AM
to Repo and Gerrit Discussion
To anyone experiencing similar stability problems:

After trying different approaches and running tests, it turned out that the problem exists only when Gerrit is hosted on one particular server machine, which is the only Xeon server we had.

I don't know whether it was the CPU architecture (we don't have another Xeon to try it on) that prevented our Gerrit from working reliably, nor by what mechanism the bug acts, but as soon as we moved the Gerrit server to another, mediocre Pentium machine, stability ensued.

No other problems were reported for this Xeon machine.




Aditya kumar Madhira

Jan 10, 2019, 1:07:24 PM
to Repo and Gerrit Discussion
May I know how you fixed it? Thanks

Sasha C.

Jan 12, 2019, 11:19:39 AM
to Repo and Gerrit Discussion
We moved it to another computer.

Aditya kumar Madhira

Jan 12, 2019, 11:20:55 AM
to Sasha C., Repo and Gerrit Discussion
What cpu is used? What about memory?

Thanks 


Sasa Crnobrnja

Jan 17, 2019, 3:13:26 PM
to Aditya kumar Madhira, Repo and Gerrit Discussion
We moved it to a machine weaker than the one on which we had the problem.

Hardware was not critical for a server used by a team as small as ours.

We moved it around a few times: from the initial virtual machine used for proof of concept, to an i7 workstation with 16MB RAM, then to the Xeon server with 32MB, where it gave us headaches, then back to the workstation, then to a dual-core Pentium with 2MB, then to an i5 with 8MB RAM ...

Aditya kumar Madhira

Jan 17, 2019, 3:24:17 PM
to Sasa Crnobrnja, Repo and Gerrit Discussion
Ok, thanks for the clarification. 

Matthias Sohn

Jan 17, 2019, 3:29:08 PM
to Sasa Crnobrnja, Aditya kumar Madhira, Repo and Gerrit Discussion
I guess you meant GB