[Need help] Gerrit recovery steps after power outage


Mateusz Grzechociński

Jul 5, 2018, 4:36:37 PM
to Repo and Gerrit Discussion
Hi,

last night we had an unexpected power outage, which caused some really complicated problems with our Gerrit instance. I did my best to recover it, but have failed so far.

My setup is as follows:

After starting the Docker container, at first glance everything looks fine. Gerrit is up, and both UI and ssh access are available. Unfortunately, 5 of 7 pending changes are no longer accessible through the UI. When I open them, the messages are:
  • [Old UI] The page you requested was not found, or you do not have permission to view this page.
  • [PolyGerrit] 404 Not Found: + Server error: Not found: my-project~1634
There is no additional log on stdout. Here's how it looks in the UI (yellow changes are not accessible):


When browsing the 'CHANGES' table in the DB, all "yellow" changes are missing:

gerrit> select change_key, last_updated_on, dest_project_name, subject from changes order by last_updated_on asc;
 CHANGE_KEY                                | LAST_UPDATED_ON         | DEST_PROJECT_NAME       | SUBJECT
 ------------------------------------------+-------------------------+-------------------------+---------
 [...irrelevant rows...]
 If1c5d0accc249eff0d0a93b64f38714b0a6c63d3 | 2018-07-05 09:18:21.507 | **************          | Add new button style
 I43a429ca2a4de8d2d65d6eea382922db5e75ebf6 | 2018-07-05 09:29:32.184 | **************          | Allow user to choose virtual card with new buttons
 Ibdcf1eab2ef78d6e11322c192bc4060755eca599 | 2018-07-05 10:15:08.852 | **************          | Set new buttons all over the place
 I83a41b8d6e3232317408434570408a0b73121c61 | 2018-07-05 12:44:32.612 | **************          | Add trailing commas to support code reformat

Fortunately, they are still accessible and downloadable through ssh:

> git fetch ssh://gerrit/my-project refs/changes/34/1634/1 && git checkout FETCH_HEAD
Resolving deltas: 100% (4491/4491), done.
From ssh://gerrit:29418/my-project
 * branch              refs/changes/70/1370/1 -> FETCH_HEAD
You are in 'detached HEAD' state. You can look around, [...]
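
For reference, Gerrit shards change refs by the last two digits of the change number, which is why change 1634, patch set 1 lives at refs/changes/34/1634/1. A minimal sketch of that mapping follows; the function name change_ref is hypothetical and not part of Gerrit or git.

```shell
#!/bin/sh
# change_ref is a hypothetical helper, not a Gerrit or git command.
# It builds refs/changes/<last two digits>/<change number>/<patch set>.
# $1: change number, $2: patch set number (defaults to 1)
change_ref() {
    change="$1"
    patchset="${2:-1}"
    # Shard directory: change number modulo 100, zero-padded to two digits.
    shard=$(printf '%02d' $((change % 100)))
    printf 'refs/changes/%s/%s/%s\n' "$shard" "$change" "$patchset"
}

change_ref 1634 1
```

The result can be passed straight to a fetch, e.g. git fetch ssh://gerrit/my-project "$(change_ref 1634 1)".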

AFAIU, due to the unexpected system shutdown, the Gerrit DB is now corrupted and has lost information about a few pending changes. Since I'm able to download them locally and all other repositories are up to date, I'm fine with getting rid of those broken changes and uploading them again. Unfortunately, only 4 of the 5 changes could be published again. For the last one I get the mysterious:

To ssh://gerrit/my-project
 ! [remote rejected]   HEAD -> refs/for/master (internal server error) 

I'm also unable to abandon those changes, either from the UI or through ssh:
↪  ssh gerrit gerrit review --abandon 24028834b999698f93fedf8f8ae3958b98d3c174
error: fatal: Cannot parse change

fatal: one or more reviews failed; review output above

I'm looking for any hints to dig deeper and find possible fixes. I'd also welcome any steps for restoring a fully working Gerrit, even without the history of all my merged/abandoned changes, as long as the current Git repos, user configs, ACLs etc. are kept.

Thanks in advance for any help.

MG

Edwin Kempin

Jul 6, 2018, 2:28:26 AM
to mateusz.gr...@gmail.com, Repo and Gerrit Discussion
Any internal server error should write a log to the Gerrit error_log file.
It would be good to know what it says.

I would have a look at the server-side repository of "my-project" (in the filesystem) and check if there are any refs/changes/ refs for that change.
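
A minimal sketch of such a check, assuming shell access to the server-side bare repository. The helper name list_change_refs and the repository path in the usage note are hypothetical; git for-each-ref reads both loose refs and packed-refs, so empty output means no ref exists for that change.

```shell
#!/bin/sh
# list_change_refs is a hypothetical helper.
# $1: path to the server-side bare repository (location is an assumption)
# $2: change number
list_change_refs() {
    repo="$1"
    change="$2"
    # Gerrit shards change refs by the last two digits of the change number.
    shard=$(printf '%02d' $((change % 100)))
    # The pattern matches every ref below refs/changes/<shard>/<change>/.
    git --git-dir="$repo" for-each-ref "refs/changes/$shard/$change"
}
```

Usage would look like list_change_refs /path/to/site/git/my-project.git 1634, where the path depends on how the site is laid out.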

 


Mateusz Grzechociński

Jul 6, 2018, 3:24:53 AM
to Repo and Gerrit Discussion
Hi Edwin. Thanks for replying.


On Friday, July 6, 2018 at 08:28:26 UTC+2, Edwin Kempin wrote:
Any internal server error should write a log to the Gerrit error_log file.
It would be good to know what it says.

Here's the output when I try to push a change and Gerrit returns the internal server error in the console:
 
gerrit-container           | [2018-07-06 06:57:16,001] [ReceiveCommits-2] ERROR com.google.gerrit.server.git.receive.ReceiveCommits : [my-project-1530860235977-d9c2c287]Error collecting groups for changes
gerrit-container           | com.google.gerrit.server.project.NoSuchChangeException: 1576
gerrit-container           | at com.google.gerrit.server.notedb.ChangeNotes$Factory.createChecked(ChangeNotes.java:133)
gerrit-container           | at com.google.gerrit.server.git.GroupCollector$1.lookup(GroupCollector.java:120)
gerrit-container           | at com.google.gerrit.server.git.GroupCollector.resolveGroup(GroupCollector.java:294)
gerrit-container           | at com.google.gerrit.server.git.GroupCollector.resolveGroups(GroupCollector.java:269)
gerrit-container           | at com.google.gerrit.server.git.GroupCollector.getGroups(GroupCollector.java:233)
gerrit-container           | at com.google.gerrit.server.git.receive.ReceiveCommits.selectNewAndReplacedChangesFromMagicBranch(ReceiveCommits.java:1977)
gerrit-container           | at com.google.gerrit.server.git.receive.ReceiveCommits.processCommands(ReceiveCommits.java:548)
gerrit-container           | at com.google.gerrit.server.git.receive.AsyncReceiveCommits$Worker.run(AsyncReceiveCommits.java:117)
[...]
gerrit-container           | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
gerrit-container           | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
gerrit-container           | at java.lang.Thread.run(Thread.java:748)
gerrit-container           | [2018-07-06 06:57:16,489] [SSH git-receive-pack /my-project (mateusz.grzechocinski)] WARN  com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker did not call end() before returning
gerrit-container           | [2018-07-06 06:57:16,490] [SSH git-receive-pack /my-project (mateusz.grzechocinski)] WARN  com.google.gerrit.server.git.receive.AsyncReceiveCommits : Error in ReceiveCommits while processing changes for project my-project
gerrit-container           | java.util.concurrent.ExecutionException: java.lang.NullPointerException
gerrit-container           | at java.util.concurrent.FutureTask.report(FutureTask.java:122)
gerrit-container           | at java.util.concurrent.FutureTask.get(FutureTask.java:206)
gerrit-container           | at com.google.gerrit.server.git.WorkQueue$Task.get(WorkQueue.java:405)
gerrit-container           | at com.google.gerrit.server.git.MultiProgressMonitor.waitFor(MultiProgressMonitor.java:245)
gerrit-container           | at com.google.gerrit.server.git.receive.AsyncReceiveCommits.onPreReceive(AsyncReceiveCommits.java:266)
gerrit-container           | at org.eclipse.jgit.transport.ReceivePack.service(ReceivePack.java:266)
gerrit-container           | at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:208)
[...]
gerrit-container           | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
gerrit-container           | at java.lang.Thread.run(Thread.java:748)
gerrit-container           | Caused by: java.lang.NullPointerException
gerrit-container           | at com.google.gerrit.server.git.receive.ReceiveCommits.preparePatchSetsForReplace(ReceiveCommits.java:2290)
gerrit-container           | at com.google.gerrit.server.git.receive.ReceiveCommits.processCommands(ReceiveCommits.java:550)
gerrit-container           | at com.google.gerrit.server.git.receive.AsyncReceiveCommits$Worker.run(AsyncReceiveCommits.java:117)
gerrit-container           | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
gerrit-container           | at com.google.gerrit.server.util.RequestScopePropagator.lambda$cleanup$1(RequestScopePropagator.java:212)
gerrit-container           | at com.google.gerrit.server.util.RequestScopePropagator.lambda$context$0(RequestScopePropagator.java:191)
gerrit-container           | at com.google.gerrit.server.util.ThreadLocalRequestScopePropagator.lambda$wrapImpl$0(ThreadLocalRequestScopePropagator.java:50)
gerrit-container           | at com.google.gerrit.server.util.RequestScopePropagator$1.call(RequestScopePropagator.java:94)
gerrit-container           | at com.google.gerrit.server.util.RequestScopePropagator$2.run(RequestScopePropagator.java:125)
gerrit-container           | ... 8 more

 
I would have a look at the server-side repository of "my-project" (in the filesystem) and check if there are any refs/changes/ refs for that change.

For almost all of the inaccessible changes, I can't find anything under refs/changes either. One change looks interesting: it has ID 1576, and there's a folder refs/changes/76/1576, but it's empty. AFAIK, there should be a file named after the patch set number which contains the commit ID of that patch set, right?
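
A rough sketch for spotting other leftovers of this kind in the server-side repository. The helper name find_broken_change_refs is hypothetical, and -empty is a GNU/BSD find extension rather than POSIX. An empty directory is only a hint, since a ref can also be stored in the packed-refs file instead of as a loose file.

```shell
#!/bin/sh
# find_broken_change_refs is a hypothetical helper.
# $1: path to the server-side bare repository (location is an assumption)
find_broken_change_refs() {
    repo="$1"
    # Change directories with no patch set file inside.
    find "$repo/refs/changes" -type d -empty
    # Zero-length ref files, which cannot hold a commit ID.
    find "$repo/refs/changes" -type f -size 0
}
```

Each printed path is a candidate to cross-check with git for-each-ref before concluding that the ref is really gone.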

What's the best way to get rid of all those broken changes completely and keep my repos, users, acls only, so that I'll be able to push them as new changes and continue my work eventually?


 

Edwin Kempin

Jul 6, 2018, 3:44:24 AM
to mateusz.gr...@gmail.com, Repo and Gerrit Discussion
Yes, looks like you lost the change refs due to the crash. 

What's the best way to get rid of all those broken changes completely and keep my repos, users, acls only, so that I'll be able to push them as new changes and continue my work eventually?
If you have the local commits for these changes (e.g. in the local repositories of the change owners) you can just repush them with a new Change-Id (amend the commit and remove the Change-Id line, amend again to get a new Change-Id from the hook).
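
A non-interactive sketch of that amend step, assuming Gerrit's commit-msg hook is installed in the local clone. The helper names strip_change_id and reset_change_id are hypothetical. Because the hook runs on every commit, including an amend with -F, a single amend with the old line removed should already yield a fresh Change-Id.

```shell
#!/bin/sh
# strip_change_id is a hypothetical helper: it drops every Change-Id
# footer line from a commit message read on stdin.
strip_change_id() {
    sed '/^Change-Id:/d'
}

# reset_change_id is a hypothetical helper: it rewrites HEAD with the same
# message minus the old Change-Id. If the commit-msg hook is installed,
# it appends a new Change-Id during this amend.
reset_change_id() {
    git log -1 --format=%B | strip_change_id | git commit --amend -F -
}
```

After reset_change_id, the commit can be pushed to refs/for/master again as a new change.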
 
If there are no change refs, these changes only exist in the change index. Unfortunately we don't have any command to remove single entries from the change index. But you can run the reindex program [1] to reindex *all* changes. Gerrit must be offline while this command runs, and it can take a long time if your server has many changes. So it's up to you whether you want to do that.
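
A minimal sketch of an offline reindex, assuming a standard site layout with bin/gerrit.sh and bin/gerrit.war. The helpers run and reindex_offline are hypothetical, and in a Docker setup stopping and starting Gerrit would likely be done through the container instead. The reindex invocation itself is java -jar gerrit.war reindex -d <site>.

```shell
#!/bin/sh
# run is a hypothetical wrapper: with DRY_RUN set it only prints the command.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reindex_offline is a hypothetical helper.
# $1: Gerrit site directory (layout with bin/gerrit.sh and bin/gerrit.war assumed)
reindex_offline() {
    site="$1"
    # Gerrit must not be running while the offline reindex rebuilds the index.
    run "$site/bin/gerrit.sh" stop &&
    run java -jar "$site/bin/gerrit.war" reindex -d "$site" &&
    run "$site/bin/gerrit.sh" start
}
```

Setting DRY_RUN=1 before calling reindex_offline with the site path prints the three commands without executing them, which is a cheap way to check the paths first.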

Mateusz Grzechociński

Jul 8, 2018, 4:08:41 PM
to Repo and Gerrit Discussion
On Friday, July 6, 2018 at 09:44:24 UTC+2, Edwin Kempin wrote:
 
What's the best way to get rid of all those broken changes completely and keep my repos, users, acls only, so that I'll be able to push them as new changes and continue my work eventually?
If you have the local commits for these changes (e.g. in the local repositories of the change owners) you can just repush them with a new Change-Id (amend the commit and remove the Change-Id line, amend again to get a new Change-Id from the hook).

That's what I've already done. For a few changes it worked pretty well. For one, there was the internal server error which I mentioned previously.
 
 
If there are no change refs, these changes only exist in the change index. Unfortunately we don't have any command to remove single entries from the change index. But you can run the reindex program [1] to reindex *all* changes. Gerrit must be offline while this command runs, and it can take a long time if your server has many changes. So it's up to you whether you want to do that.


Reindex reports many errors saying that "Table 'CHANGES' does not exist". Strange, since I can select from this table through 'gsql'.

I eventually restored my Gerrit from backup. Fortunately, I had a full Gerrit site backup, so recovery was as simple as moving the folder to a new place.

Thanks for your help.

-- 
MG