Gerrit 2.16 -> Gerrit 3.0: change X has no note_db_state; rebuild it first

JB

Sep 5, 2019, 3:19:22 AM
to Repo and Gerrit Discussion
We are looking at migrating to Gerrit 3.0.x from Gerrit 2.16.x and are running into hundreds of errors with the notedb migration.

Our logs contain lots of entries such as:

[2019-09-04 09:54:52,210] [RebuildChange-1] ERROR com.google.gerrit.server.notedb.rebuild.NoteDbMigrator : Error migrating primary storage for 1
com.google.gerrit.server.notedb.PrimaryStorageMigrator$NoNoteDbStateException: change 1 has no note_db_state; rebuild it first
    at com.google.gerrit.server.notedb.PrimaryStorageMigrator$1.update(PrimaryStorageMigrator.java:299)
    at com.google.gerrit.server.notedb.PrimaryStorageMigrator$1.update(PrimaryStorageMigrator.java:289)
    at com.google.gwtorm.server.AbstractAccess.atomicUpdate(AbstractAccess.java:80)
    at com.google.gerrit.server.notedb.PrimaryStorageMigrator.setReadOnlyInReviewDb(PrimaryStorageMigrator.java:287)
    at com.google.gerrit.server.notedb.PrimaryStorageMigrator.migrateToNoteDbPrimary(PrimaryStorageMigrator.java:254)
    at com.google.gerrit.server.notedb.rebuild.NoteDbMigrator.lambda$setNoteDbPrimary$2(NoteDbMigrator.java:637)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:83)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:646)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


But there is no information on how to fix this issue. I have seen the thread about someone else experiencing it, which led me to this: https://github.com/jameingh/gerrit/blob/master/gerrit-server/src/main/java/com/google/gerrit/server/notedb/PrimaryStorageMigrator.java#L282, where the complexity of rebuilding is explained (with a diff where this reference is altered: https://github.com/GerritCodeReview/gerrit/commit/602370152f86fba9a4c801da3435be9ae826635c#diff-17019ad58a6efc6705a951dd804e439e).

I have come across multiple references to this issue already:

Does anyone know how to rebuild these items?

Alan Tokaev

Sep 6, 2019, 3:32:28 AM
to Repo and Gerrit Discussion
Experiencing same issue.
Can not migrate changes to noteDB.

Migration from 2.15.3 -> 2.16.10.

JB

Sep 6, 2019, 7:23:03 AM
to Repo and Gerrit Discussion
I have spent a lot of effort working through this, and the Gerrit documentation is less than helpful.

There seem to be no references anywhere on what to do if you run into this problem. Here are some things I have tried, however:

On 2.16.10:

    Attempt to run the NoteDB migration with:
        java -jar review_site/bin/gerrit.war migrate-to-note-db -d review_site/ 2>&1 | tee /tmp/notedb.txt
    I thought this may have been caused by corruption somewhere in our MySQL database, so I attempted to remove all related entries from it:
        I obtained a list of all affected changes from the aforementioned log file:
            grep 'has no note_db_state' /tmp/notedb.txt | awk '{print $3}'
        Made a shell script to remove these from the database (I had to format them into a single line, too):
            #!/bin/bash

            # Replace the placeholder with the space-separated change IDs from the step above.
            for i in _PUT_ALL_CHANGES_HERE; do
              mysql --database="gerritdb" --execute="DELETE FROM change_messages WHERE change_id=$i"
            done

        Then tried to re-run the migration:
            java -jar review_site/bin/gerrit.war migrate-to-note-db -d review_site/ 2>&1 | tee /tmp/notedb.txt
            Same issue.
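
The ID-extraction step above relies on the change ID being the third whitespace-separated field of the log line. A slightly more defensive sketch, assuming the log lines look like the NoNoteDbStateException line quoted at the top of this thread, matches the message text itself and de-duplicates the result. The function name extract_change_ids is made up for illustration.

```shell
#!/bin/bash
# Print the unique numeric change IDs from lines such as
#   ...NoNoteDbStateException: change 1 has no note_db_state; rebuild it first
# Assumption: the migration output was captured to a file, as with the
# "tee /tmp/notedb.txt" above. extract_change_ids is a hypothetical helper name.
extract_change_ids() {
  local log_file="$1"
  # Keep only matching lines, reduce each one to its change ID, then sort
  # numerically and drop duplicates.
  sed -n -E 's/.*change ([0-9]+) has no note_db_state.*/\1/p' "$log_file" | sort -n -u
}

# Example: put the IDs on a single line, ready for a "for i in ..." loop.
# extract_change_ids /tmp/notedb.txt | tr '\n' ' '
```

Matching on the message text means the extraction keeps working even if extra fields appear in front of the exception line.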

I searched for a long time, reattempting migrations and deleting things (and finding all those links above).

I then noticed a line in the NoteDB documentation which said:

"In general, users should not set the options described below manually; this section serves primarily as a reference."

and

"noteDb.changes.disableReviewDb=true: All access to Changes or related tables is disabled; reads return no results, and writes are no-ops. Assumes the state of all changes in NoteDb is accurate, and so is only safe once all changes are NoteDb primary. Otherwise, reading changes only from NoteDb might result in inaccurate results, and writing to NoteDb would compound the problem."

On the off chance, I tried the following steps:

On 2.16.10:

    Attempted to migrate:
        java -jar review_site/bin/gerrit.war migrate-to-note-db -d review_site/ 2>&1 | tee /tmp/notedb.txt
            Got errors
    Modified the notedb.config to look like this (primaryStorage changed from "review db" to NOTE_DB, and disableReviewDb from false to true):
        [noteDb "changes"]
           autoMigrate = false
           trial = false
           write = true
           read = true
           sequence = true
           primaryStorage = NOTE_DB
           disableReviewDb = true
    After the first migration it looked like this:
        [noteDb "changes"]
           autoMigrate = false
           trial = false
           write = true
           read = true
           sequence = true
           primaryStorage = review db
           disableReviewDb = false
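
Because notedb.config uses Git's config file syntax, the migration state recorded in it can be read with git itself before and after each attempt, without editing anything. This is a minimal sketch, assuming git is on the PATH and that the file lives at review_site/etc/notedb.config; show_notedb_state is a hypothetical helper name.

```shell
#!/bin/bash
# Print every noteDb.changes.* setting from a Gerrit notedb.config file.
# Assumptions: git is installed, and the file lives at <site>/etc/notedb.config.
# show_notedb_state is a hypothetical helper name; it only reads the file.
show_notedb_state() {
  local config_file="$1"
  # git lowercases section and key names before matching and printing them,
  # so primaryStorage shows up as notedb.changes.primarystorage.
  git config --file "$config_file" --get-regexp '^notedb\.changes\.'
}

# Example:
# show_notedb_state review_site/etc/notedb.config
```

Comparing the output before and after a migration run shows which state the migrator believes the site is in.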

Again on the off chance, I tried to migrate once more, still on 2.16.10:

    java -jar review_site/bin/gerrit.war migrate-to-note-db -d review_site/ 2>&1 | tee /tmp/notedb.txt
        This took a while and 15-20 minutes later it had finished reindexing.

At this point I could upgrade to version 3.0.x by running the init command:

    java -jar review_site/bin/gerrit.war init --batch --install-all-plugins -d review_site/

Then it showed me:

Migrating data to schema 180 ...
Migrating data to schema 181 ...
Rebuild GPGP note map to build subkey to master key map

..and started okay on Gerrit version 3.0.x.

But now I wonder:

    Why did this work?
    Am I missing something?
    Is there really no documentation for this?

Doug Luedtke

Sep 6, 2019, 10:43:58 AM
to Repo and Gerrit Discussion
We hit this with Gerrit 2.16.10 and an online migration while testing in a lower environment. The migration process seemed to heal itself after another pass. The additional pass was automatic during the online migration. It completed the first pass in about 3 hours with that error. Then, after another 2.5 hours, it had corrected itself and marked the migration as complete.

Do I know what the migration did to fix it? No. And the fact that others report they were not able to get past the problem worries me for when we go to production with 2.16.10/11/etc.

Luca Milanesio

Sep 6, 2019, 11:24:25 AM
to Doug Luedtke, Luca Milanesio, Repo and Gerrit Discussion

On 6 Sep 2019, at 15:43, Doug Luedtke <douglas...@gmail.com> wrote:

> We hit this with Gerrit 2.16.10 and an online migration while testing in a lower environment. The migration process seemed to heal itself after another pass. The additional pass was automatic during the online migration. It completed the first pass in about 3 hours with that error. Then, after another 2.5 hours, it had corrected itself and marked the migration as complete.

From my experience of migration to NoteDb on GerritHub.io, you need to treat the Gerrit v2.16 version as two separate migrations:

a) Migration to v2.16 / ReviewDb
b) (OnLine)Migration from ReviewDb to NoteDb

When we tried to do a) and b) all at once on GerritHub.io (test environment) we failed miserably with similar errors.
When we broke them down into two phases, (a) then (b), we succeeded.
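
The two-phase approach can be sketched as a script that runs each phase as its own step and stops at the first failure. This is only a sketch under stated assumptions: it uses the offline init, reindex and migrate-to-note-db programs of gerrit.war, whereas the migrations described here were run online; the site layout and the DRY_RUN switch are made up for illustration, as are the helper names run_step and upgrade_in_two_phases.

```shell
#!/bin/bash
# Sketch: run the 2.16 upgrade and the NoteDb migration as two separate phases.
# Assumptions: Gerrit is stopped, the 2.16 gerrit.war is at <site>/bin/gerrit.war,
# and the offline programs are used (the thread describes online runs).
# run_step and upgrade_in_two_phases are hypothetical helper names.
run_step() {
  echo "+ $*"
  # With DRY_RUN=1 the command is only printed, not executed.
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

upgrade_in_two_phases() {
  local site="$1"
  local war="$site/bin/gerrit.war"
  # Phase a: upgrade to 2.16 while staying on ReviewDb, then rebuild the indexes.
  run_step java -jar "$war" init --batch -d "$site" || return 1
  run_step java -jar "$war" reindex -d "$site" || return 1
  # Phase b: only after phase a has succeeded, migrate ReviewDb to NoteDb.
  run_step java -jar "$war" migrate-to-note-db -d "$site" || return 1
}

# Example (prints the three commands without running them):
# DRY_RUN=1 upgrade_in_two_phases review_site
```

Keeping the phases as separate commands makes it obvious which one failed, which is the point of not doing both at once.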

HTH

Luca.


--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/fd18f476-aade-4caa-ba6d-b947bafa0e1b%40googlegroups.com.

Martin Fick

Sep 6, 2019, 11:33:57 AM
to repo-d...@googlegroups.com, Luca Milanesio, Doug Luedtke
On Friday, September 6, 2019 4:24:18 PM MDT Luca Milanesio wrote:
> From my experience of migration to NoteDb on GerritHub.io
> <http://gerrithub.io/>, you need to treat the Gerrit v2.16 version as two
> separate migrations:
>
> a) Migration to v2.16 / ReviewDb
> b) (OnLine)Migration from ReviewDb to NoteDb
>
> When we tried to do a) and b) all at once on GerritHub.io
> <http://gerrithub.io/> (test environment) we miserably failed with similar
> errors. When we broke them down into two phases (a) (b) then we succeeded.

This is rather a scary premonition to those looking to upgrade. It sounds like
gerrit 2.16 is not yet ready for consumption in production? Should we really
be recommending people upgrade to it if ?

-Martin


--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Luca Milanesio

Sep 6, 2019, 11:37:58 AM
to Martin Fick, Luca Milanesio, repo-d...@googlegroups.com, Doug Luedtke

On 6 Sep 2019, at 16:33, Martin Fick <mf...@codeaurora.org> wrote:

> This is rather a scary premonition to those looking to upgrade. It sounds like
> gerrit 2.16 is not yet ready for consumption in production?

We have been running v2.16 in prod since it became available, and then migrated to v3.0 ... and we have 100% availability on GerritHub.io :-)
With versions prior to v2.16, we actually had outages due to pretty serious problems, which have now been fixed.

> Should we really
> be recommending people upgrade to it if ?

100% yes, as I would be a lot more scared using an older version of Gerrit, like v2.13 or v2.14.

That's my experience :-)

Luca.

Luca Milanesio

Sep 6, 2019, 11:45:21 AM
to Martin Fick, Luca Milanesio, repo-d...@googlegroups.com, Doug Luedtke

I have proposed a talk specifically about Gerrit upgrades for the Gerrit User Summit 2019 agenda :-)

I'll go step-by-step through how to upgrade Gerrit from v2.13 (or earlier) up to v3.0 with (almost) zero downtime.

I hope that session will be interesting for many people and will trigger lots of Q&A on the topic.

Luca.

Martin Fick

Sep 6, 2019, 11:56:07 AM
to Luca Milanesio, repo-d...@googlegroups.com
On Friday, September 6, 2019 4:49:17 PM MDT Luca Milanesio wrote:
> Hi Martin,
> I'm replying privately, as you wrote this just for me :-)
>
> > On 6 Sep 2019, at 16:45, Martin Fick <mf...@codeaurora.org> wrote:
> >> <http://gerrithub.io/> :-) With versions prior to
> >> v2.16, we actually had outages due to pretty serious problems, that have
> >> been now fixed.
> >>
> >>> Should we really
> >>> be recommending people upgrade to it if ?
> >>
> >> 100% yes, as I would be a lot more scared using an older version of
> >> Gerrit,
> >> like v2.13 or v2.14.
> >>
> >> That's my experience :-)
> >
> > I am hearing contradictory statements. It sounds like online migration did
> > not work for you.
>
> Nope, it did work. I just broke it down into two steps:
>
> a) Index migration (online)
> b) NoteDb migration (online)

OK, thank you for clarifying, this seems different than what you listed above.

> It did work, when I did it in two steps.
>
> > It sounds like 2.16 might be fine for new sites, but for anyone wishing to
> > upgrade, the online upgrade process is currently broken?
>
> Our site has 500k changes, 40k repos and 16k active users and it worked,
> when broken down into two steps.

Is this the standard method recommended in the documentation?

luca.mi...@gmail.com

Sep 6, 2019, 12:00:50 PM
to Martin Fick, repo-d...@googlegroups.com


The documentation doesn’t really guide you through the process :-(

I found out by myself the hard way :-(

I believe we need more of a step-by-step section for upgrades, especially for ones like this.

Luca

JB

Sep 6, 2019, 12:06:15 PM
to Repo and Gerrit Discussion
Out of interest, are there going to be any supported and up to date configuration management options? I have written a Puppet module for managing Gerrit, but I would clearly rather have one that is community supported. Until I use this I've been performing the upgrades with Ansible playbooks.

Alan Tokaev

Sep 8, 2019, 11:55:25 AM
to Repo and Gerrit Discussion

On Friday, September 6, 2019 at 9:32:28 AM UTC+2, Alan Tokaev wrote:
> Experiencing same issue.
> Can not migrate changes to noteDB.
>
> Migration from 2.15.3 -> 2.16.10.

After debugging the problem I was able to identify the root cause.
We had ca. 11K "orphan changes" without git repos (the repos were deleted from the file system).
This is why note_db_state couldn't be written.
For all missing projects there is one warning in the log that could be easily overlooked:
Repository foo not found
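
One way to spot such orphan changes ahead of the migration is to compare the projects referenced by changes in ReviewDb with the repositories on disk. The sketch below assumes its input is a tab-separated list of change ID and project name, and that repositories are stored as <git_dir>/<project>.git. The query in the example, including the column names change_id and dest_project_name, is an assumption about the ReviewDb schema, and find_orphan_changes is a hypothetical helper name.

```shell
#!/bin/bash
# Read "change_id<TAB>project" lines on stdin and print the IDs of changes
# whose repository directory is missing under the given git base directory.
# Assumption: repositories are stored as <git_dir>/<project>.git.
# find_orphan_changes is a hypothetical helper name.
find_orphan_changes() {
  local git_dir="$1"
  local tab change_id project
  tab="$(printf '\t')"
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS="$tab" read -r change_id project || [ -n "$change_id" ]; do
    # Skip blank or malformed lines.
    [ -n "$change_id" ] && [ -n "$project" ] || continue
    if [ ! -d "$git_dir/$project.git" ]; then
      echo "$change_id"
    fi
  done
}

# Example (the query is an assumption about the ReviewDb schema):
# mysql --database="gerritdb" --batch --skip-column-names \
#   --execute="SELECT change_id, dest_project_name FROM changes" \
#   | find_orphan_changes review_site/git
```

The function only reads the file system and prints IDs, so it is safe to run against a live site.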


The solution was to clean up the database for the corrupted changes by deleting them from the tables.
CHANGES, PATCH_SETS, etc. 

To summarize, the missing NoteDb state exception during the NoteDb migration indicates corruption in the database. The only question is what kind of corruption it is.
So I would recommend to debug the migration process.
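
A cautious way to approach the table cleanup described above is to generate the DELETE statements first and review them before running anything. The sketch below only prints SQL. It assumes every table passed in has a change_id column, and it leaves the table list to the caller because the full set of affected tables is not spelled out here. print_delete_statements and the file names in the example are hypothetical.

```shell
#!/bin/bash
# Read change IDs on stdin (one per line) and print one DELETE statement per
# table and ID. Nothing is executed; review the output before feeding it to mysql.
# Assumption: each table named on the command line has a change_id column.
# print_delete_statements is a hypothetical helper name.
print_delete_statements() {
  local change_id table
  while read -r change_id; do
    # Only accept purely numeric IDs so nothing unexpected ends up in the SQL.
    case "$change_id" in
      ''|*[!0-9]*) continue ;;
    esac
    for table in "$@"; do
      echo "DELETE FROM $table WHERE change_id = $change_id;"
    done
  done
}

# Example:
# print_delete_statements change_messages patch_sets changes < orphan_ids.txt > cleanup.sql
```

Writing the statements to a file first gives a record of exactly what was removed.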

JB

Sep 9, 2019, 2:28:14 AM
to Repo and Gerrit Discussion
"The solution was to clean up the database for the corrupted changes by deleting them from the tables.
CHANGES, PATCH_SETS, etc. " - What is the best way of doing this? Any documentation?

"So I would recommend to debug the migration process." - Are there any documented ways of doing this?

David Ostrovsky

Sep 9, 2019, 3:05:06 AM
to Repo and Gerrit Discussion

On Sunday, September 8, 2019 at 5:55:25 PM UTC+2, Alan Tokaev wrote:

> After debugging the problem I was able to identify the root cause.
> We had ca. 11K "orphan changes" without git repos (the repos were deleted from the file system).
> This is why note_db_state couldn't be written.
> For all missing projects there is one warning in the log that could be easily overlooked:
> Repository foo not found
>
> The solution was to clean up the database for the corrupted changes by deleting them from the tables.
> CHANGES, PATCH_SETS, etc.
>
> To summarize, the missing NoteDb state exception during the NoteDb migration indicates corruption in the database. The only question is what kind of corruption it is.

Thanks for tracking this down!

I went ahead and added "orphan project" corruption detection to the known
recovery scenarios of the NoteDb migration process.

We already supported one kind of corruption detection: changes without patch sets.
The case of changes without git repositories was not detected until now, and the
migration process was indeed broken.

Can you test this change [1] with your 11k orphan projects scenario and verify
whether it fixes the migration for you? (You can fetch the gerrit.war from the CI once
the change is built.)


Edwin Kempin

Sep 9, 2019, 3:21:46 AM
to David Ostrovsky, Repo and Gerrit Discussion
I think we are aware that upgrade strategies are not properly documented:

It would be great if anyone could help with that!

Please also add further upgrade questions that need clarification to that issue.

Luca Milanesio

Sep 9, 2019, 5:10:08 AM
to Edwin Kempin, Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion

On 9 Sep 2019, at 08:21, 'Edwin Kempin' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

> I think we are aware that upgrade strategies are not properly documented:
>
> It would be great if anyone could help with that!

It's already on my backlog: will work and present on that in November at the Gerrit User Summit 2019 in Sunnyvale.

Luca.
 

Edwin Kempin

Sep 9, 2019, 5:32:33 AM
to Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion
On Mon, Sep 9, 2019 at 11:10 AM Luca Milanesio <luca.mi...@gmail.com> wrote:

> It's already on my backlog: will work and present on that in November at the Gerrit User Summit 2019 in Sunnyvale.

Perfect! Thank you very much!!

 

David Ostrovsky

Sep 9, 2019, 7:06:53 AM
to Repo and Gerrit Discussion

On Monday, September 9, 2019 at 8:28:14 AM UTC+2, JB wrote:
> "The solution was to clean up the database for the corrupted changes by deleting them from the tables.
> CHANGES, PATCH_SETS, etc. " - What is the best way of doing this? Any documentation?

Can you try with the patch referenced in my previous comment?

> "So I would recommend to debug the migration process." - Are there any documented ways of doing this?

Set up the Bazel build tool [1], generate the Eclipse .project/.classpath files [2], set up Eclipse [3],
and refer to the debug section [4]. In the daemon launch configuration [5], replace line 14
with "MigrateToNoteDb -d <your_gerrit_site>", press the debug button, and step through the
NoteDbMigrator#rebuildProject() method.

JB

Sep 9, 2019, 8:54:26 AM
to Repo and Gerrit Discussion
Could you link to the CI job so that I can find the correct file?

David Ostrovsky

Sep 9, 2019, 9:16:50 AM
to Repo and Gerrit Discussion

On Monday, September 9, 2019 at 2:54:26 PM UTC+2, JB wrote:
> Could you link to the CI job so that I can find the correct file?



JB

Oct 29, 2019, 3:42:50 AM
to Repo and Gerrit Discussion
Thanks for this, I was able to upgrade to Gerrit 3.x with this, but I have not had the chance to do any extensive testing as of yet.

For now I think it would be good to merge this change.

Thank you for your help!