Backing up Gerrit


Swindells, Thomas

Aug 24, 2010, 9:52:40 AM
to repo-d...@googlegroups.com

We’re just starting to use Gerrit, and as part of that we need to work out a disaster recovery plan to minimize the impact if the server dies.

What is the safest way to back up the state of a Gerrit server so that it can be reliably restored?

I’ve looked at the replication support, and while it seems to provide a solution for backing up the repositories, it doesn’t cover Gerrit’s current state: which reviews and review comments are outstanding, the merge queue, etc.

We are currently using the default configuration with an internal H2 database, but we can change to another database if that makes the problem simpler. If the repositories and the database are backed up separately, I assume there is a risk that the backups won’t be in sync? Or is Gerrit designed in such a way that this isn’t a problem?

Thanks,

Thomas

Shawn Pearce

Aug 24, 2010, 1:13:17 PM
to Swindells, Thomas, repo-d...@googlegroups.com

Use the replication system to push repositories to another server.
Your exposure to losing source code is then bounded by the replication
delay, which is configurable. The default 30 second delay is probably
acceptable: if the server dies before something can replicate, it
should still be on the desktop/laptop of the user who just gave it to
the server. :-)

The metadata database is another story altogether. You can back up the
H2 database by logging in over SSH and executing a BACKUP query. I
think this works:

ssh -p 29418 localhost gerrit gsql -c \" BACKUP TO \'/tmp/backup.zip\' \"

The quoting here is ugly because you need to get 3 levels of quotes
in. The first level is eaten by the shell running ssh, which we
handle by putting \ in front of every quote character. The second
level is eaten by the Gerrit Code Review server when it parses the
command; this is the \" we have around the string. The last level is
eaten by H2 as it parses the path where the backup will go; this is
the \'.
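One way to sidestep the first quoting level is to skip the local shell entirely. The following is a minimal Python sketch, not part of Gerrit, assuming the `ssh` client is on the PATH, that your account may run `gerrit gsql`, and that the backup path contains no double quotes. The helper names `gsql_backup_argv` and `run_gsql_backup` are hypothetical. Because `subprocess.run` receives a list, only the Gerrit and H2 levels of quoting remain.

```python
import subprocess
from typing import List


def gsql_backup_argv(host: str, backup_path: str, port: int = 29418) -> List[str]:
    """Build the ssh argv for an H2 BACKUP TO statement run via `gerrit gsql`."""
    if '"' in backup_path:
        # A double quote would break the Gerrit-level quoting below.
        raise ValueError("backup path must not contain double quotes")
    # Level 3 (H2): the path is a single-quoted SQL string literal;
    # SQL escapes an embedded single quote by doubling it.
    sql = "BACKUP TO '{}'".format(backup_path.replace("'", "''"))
    # Level 2 (Gerrit): the server splits the command line on whitespace,
    # so the whole SQL statement is wrapped in double quotes.
    remote_command = 'gerrit gsql -c "{}"'.format(sql)
    # Level 1 (local shell) is gone: the list goes straight to ssh, which
    # sends the single remote_command argument to the server unchanged.
    return ["ssh", "-p", str(port), host, remote_command]


def run_gsql_backup(host: str, backup_path: str, port: int = 29418) -> str:
    """Run the backup and return what gsql prints (e.g. an UPDATE line)."""
    result = subprocess.run(
        gsql_backup_argv(host, backup_path, port),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `run_gsql_backup("localhost", "/tmp/backup.zip")` should send the same statement as the one-liner above, and the backup file is still written on the server side.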

Unfortunately there is no real safeguard to ensure the metadata is
consistent with the Git repositories themselves. If you use
replication to archive the Git repositories in near-real time, and
back up the H2 database nightly, then yes, a restore will be
inconsistent.

If you restore this inconsistent state, you will find two things:

Changes not merged, but actually are: Changes that were merged into
the project may not show as merged in Gerrit. If you encounter such a
change, submitting it again will be a no-op (but will fix the
metadata) if your project's submit type is anything *except*
cherry-pick. If it's cherry-pick, you would need to manually locate
the final commit SHA-1 for that change, upload it as a replacement
patch set for the change, then temporarily change the submit type to,
say, "merge if necessary" and submit the change. Gerrit will realize
it's already been submitted and close the change.

New change uploads fail: The change_id sequence will start handing
out duplicate change numbers, because it rewound in the database while
later change_id values are already used in the Git repositories.
Uploads will continue to fail until the change_id sequence in the
database is altered or advanced beyond the largest value already used.
You can find that value by scanning the `git for-each-ref refs/changes/`
output of each project, taking the second-to-last path component of
each ref, sorting those numbers and picking the largest.
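The scan described above might look like the following minimal Python sketch. It assumes `git` is on the PATH, that you pass in the repository (`--git-dir`) paths yourself, and that change refs follow the `refs/changes/<last two digits>/<change number>/<patch set>` layout; the function names are hypothetical. It only reports the largest change number in use; advancing the sequence in the database is left to you.

```python
import subprocess
from typing import Iterable, Iterator


def change_numbers(for_each_ref_output: str) -> Iterator[int]:
    """Yield change numbers found in `git for-each-ref refs/changes/` output."""
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        # The ref name is the last field in both the default
        # "<sha1> <type>\t<ref>" format and in --format=%(refname).
        parts = line.split()[-1].split("/")
        # refs/changes/<NN>/<change number>/<patch set>: the change
        # number is the second-to-last component.
        if len(parts) == 5 and parts[:2] == ["refs", "changes"] and parts[3].isdigit():
            yield int(parts[3])


def largest_change_number(git_dirs: Iterable[str]) -> int:
    """Return the largest change number used across the given repositories."""
    largest = 0
    for git_dir in git_dirs:
        out = subprocess.run(
            ["git", "--git-dir=" + git_dir, "for-each-ref",
             "--format=%(refname)", "refs/changes/"],
            check=True, capture_output=True, text=True,
        ).stdout
        largest = max([largest, *change_numbers(out)])
    return largest
```

The next change number the database hands out then needs to be larger than the value this returns.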


Yeah, this is an area we need to work on. The fact that the
metadata can be so wildly inconsistent is a good reason to move it
out of the database and into Git itself. We keep talking about doing
it, but it hasn't been done yet.

Swindells, Thomas

Aug 25, 2010, 5:07:18 AM
to repo-d...@googlegroups.com
Hi Shawn,

Thanks for such a comprehensive answer.

It sounds like the simplest and safest solution for now is to stop Gerrit each night, rsync (or equivalent) the review site directory and then start it again. This ensures that the files are all in sync, and it is no worse than what our current svn solution does.
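A minimal Python sketch of that nightly cold backup, assuming the `bin/gerrit.sh` start/stop script in the review site, `rsync` on the PATH, and Git repositories stored under the site directory (the default layout); `cold_backup` and the paths in the usage are hypothetical. The `run` parameter exists only so the command sequence can be checked without touching a real server.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def _run(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def cold_backup(site: str, dest: str, run: Runner = _run) -> None:
    """Stop Gerrit, copy the whole review site, then start Gerrit again."""
    gerrit_sh = str(Path(site) / "bin" / "gerrit.sh")
    # With Gerrit stopped, neither the repositories under the site (assumed
    # location) nor the H2 files change while they are being copied.
    run([gerrit_sh, "stop"])
    try:
        # The trailing slash makes rsync copy the site's contents into dest;
        # --delete removes files that no longer exist in the site.
        run(["rsync", "-a", "--delete", site.rstrip("/") + "/", dest])
    finally:
        # Restart even if the copy failed, so an error does not leave
        # the review server down.
        run([gerrit_sh, "start"])
```

For example, `cold_backup("/srv/review_site", "/backup/review_site")` could be scheduled from cron during a quiet period; the downtime lasts as long as the rsync does.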

In the long run it would be good to have a more seamless way to do this. As you say, one option is to store all the information in Git; another would be a "gerrit backup" command that iterates through each project, locks it to prevent updates, backs up the repository and the relevant parts of the database, and then unlocks the project. I haven't looked at the Gerrit code yet, so I don't know how feasible this would be.

Thomas

Shawn Pearce

Aug 25, 2010, 10:20:51 AM
to Swindells, Thomas, repo-d...@googlegroups.com

It's not going to be easy. Server-side code just sort of blind-writes
to the database for most operations, in too many places. By the time
you invest the effort to go around and add guards to create a "backup
lock", it's going to be close to the effort required to convert the
majority of the data storage to Git. :-)

The more I hack on the exp-nosql branch (which is planned to
support storing everything in a NoSQL clustered database like Apache
Cassandra), the more I realize we just want to move data (even the
files in site_path/etc) into Git.

Michelle Pogado

Jul 11, 2018, 1:23:21 AM
to Repo and Gerrit Discussion

Hello Shawn,

I tried this method, but:
$ ssh -p 29418 xxxx@xxxxx gerrit gsql -c \" BACKUP TO \'testBackUp.zip\' \"
UPDATE 0; 10 ms

I was expecting the backup to be in my current directory, but I don't see it.

Where is the backup actually located?

Jonathan Nieder

Jul 11, 2018, 1:31:00 AM
to Michelle Pogado, Repo and Gerrit Discussion
Hi,

The backup would end up on the xxxxx machine, not the client machine you are running ssh from.

My advice is to pass an absolute path, since that way you know where the backup was written. Otherwise I believe it would go in Gerrit's current working directory, which is likely the root of your Gerrit site.
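Because the file is written on the server, getting it onto your own machine is a separate copy step. A minimal Python sketch, assuming you also have ordinary shell (scp) access to the server host, which is separate from Gerrit's own SSH port 29418; `scp_backup_argv`, `fetch_backup` and the login name in the usage are hypothetical. Following the advice above, it rejects relative paths.

```python
import posixpath
import subprocess
from typing import List


def scp_backup_argv(server_login: str, remote_path: str, local_dir: str = ".") -> List[str]:
    """Build an scp command that copies a server-side backup file locally."""
    # Per the advice above, a relative BACKUP TO path is probably resolved
    # against the Gerrit process's working directory, so require an
    # absolute path to know exactly where the file is.
    if not posixpath.isabs(remote_path):
        raise ValueError("use an absolute path for the backup file")
    return ["scp", "{}:{}".format(server_login, remote_path), local_dir]


def fetch_backup(server_login: str, remote_path: str, local_dir: str = ".") -> None:
    """Copy the backup from the server into local_dir using scp."""
    subprocess.run(scp_backup_argv(server_login, remote_path, local_dir), check=True)
```

For example, after a `BACKUP TO '/tmp/backup.zip'`, `fetch_backup("admin@review.example.com", "/tmp/backup.zip")` would copy the file into the local working directory.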

Hope that helps,
Jonathan


David Pursehouse

Jul 11, 2018, 1:31:29 AM
to Michelle Pogado, Repo and Gerrit Discussion
I just tried it against my test site and the backup file is located in my current directory.

David Pursehouse

Jul 11, 2018, 1:32:41 AM
to Michelle Pogado, Repo and Gerrit Discussion
Jonathan answered at the same time as me.  I guess in my case it ends up in the current directory because I'm running Gerrit on the same machine.

Michelle Pogado

Jul 11, 2018, 2:26:34 AM
to Repo and Gerrit Discussion
Hi, 

Indeed, the backup did end up on the xxxxx machine for me.
Thank you so much for the responses!