Gerrit Backup and Restore advise


Mai Waly

Mar 24, 2019, 3:07:49 AM
to Repo and Gerrit Discussion
Hi All,

Could you please advise on the below, as surely someone has run into this before.

We have Gerrit 2.13.2.

For the last 2 years we have been following these backup steps:

At midnight:
- take a dump of reviewdb and of all our Gerrit data hosted under /gerrit

Then we do a restore test on a different machine to make sure the backup is OK.

We restore reviewdb and everything under /gerrit, then run:

- $ java -jar gerrit.war init --> to re-initialise the restored VM with a different URL and ID, and also clear all caches
- $ java -jar gerrit.war reindex --> to catch any inconsistency between metadata and repos
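For what it's worth, the midnight procedure above could be sketched as a single script. Everything in it is an assumption about a particular setup (PostgreSQL as the ReviewDb backend, the site under /gerrit, the destination path), so treat it as an illustration rather than a drop-in:

```shell
#!/bin/sh
# Rough sketch of the nightly backup described above -- not a drop-in script.
# Assumes ReviewDb lives in PostgreSQL and the Gerrit site is /gerrit.
set -eu

STAMP=$(date +%Y%m%d)
DEST=/backup/gerrit-$STAMP
mkdir -p "$DEST"

# 1. Dump ReviewDb (use mysqldump instead if ReviewDb is on MySQL).
pg_dump reviewdb > "$DEST/reviewdb.sql"

# 2. Archive everything under the site directory (repos, index, caches).
tar -czf "$DEST/gerrit-site.tar.gz" -C / gerrit

# On the restore-test machine, after restoring both pieces:
#   java -jar gerrit.war init    -d /gerrit   # re-init with the test URL/ID
#   java -jar gerrit.war reindex -d /gerrit   # rebuild the secondary index
```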

The problem is we don't stop the service, as we rely on there being no or only minimal changes at midnight.


Reindexing takes 23 hours.

Has anyone followed different or better steps? Is stopping the service for the backup a must?


Thanks a lot,
Mai

luca.mi...@gmail.com

Mar 24, 2019, 12:14:18 PM
to Mai Waly, Repo and Gerrit Discussion



On 24 Mar 2019, at 07:07, Mai Waly <mai....@gmail.com> wrote:

[...]

Even if reindexing is fine, you may still have inconsistencies, and then have problems if you ever had to restore a real production system from backup.

Example: if the ReviewDb backup happens before the repos backup, you may have new changes and patch sets in the repos that are not pointed to by a corresponding ReviewDb record.

If the repos backup happens before the ReviewDb backup, you may have ReviewDb records pointing to non-existent SHA-1s.

Only the second is detected by a reindex, not the first.

To have a consistent backup in 2.13, you need to shut down Gerrit.

In 2.14 you could use the readonly plugin, and in 2.15 you can move to NoteDb, which resolves the problem.

I'm afraid in 2.13 there isn't much you can do.

HTH

Luca



--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Luca Milanesio

Mar 24, 2019, 5:30:50 PM
to Mai Waly, Luca Milanesio, Repo and Gerrit Discussion

On 24 Mar 2019, at 16:14, luca.mi...@gmail.com wrote:

[...]

I’m afraid in 2.13 there isn’t much you can do.

Let me clarify *what* you can do in v2.13: accept the inconsistencies and develop some custom scripting that detects the differences between the two (ReviewDb and the repos) and fixes the situation, accepting that some data will be lost anyway. At the end of the day, a "reduced loss" of data is better than an inconsistent data store.
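As a rough illustration of the kind of custom scripting meant here (the input files, the `psql` query, and the ref parsing are all assumptions about a particular setup), one could diff the change numbers ReviewDb knows about against those present as refs/changes/* in the repos:

```shell
# Hypothetical sketch of such a consistency check.  The two input files
# hold one change number per line and could come from, e.g.:
#   db_changes.txt:    psql reviewdb -Atc 'SELECT change_id FROM changes'
#   repo_changes.txt:  git -C repo.git for-each-ref refs/changes \
#                          --format='%(refname)' | cut -d/ -f4
report_mismatches() {
    db_sorted=$(mktemp); repo_sorted=$(mktemp)
    sort -u "$1" > "$db_sorted"
    sort -u "$2" > "$repo_sorted"
    echo "In repos but missing from ReviewDb (lost if the DB was dumped first):"
    comm -13 "$db_sorted" "$repo_sorted"
    echo "In ReviewDb but pointing at data missing from the repos (caught by reindex):"
    comm -23 "$db_sorted" "$repo_sorted"
    rm -f "$db_sorted" "$repo_sorted"
}
```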

That is the reason why the community, and Dave Borowitz in the first place, invested so much effort in NoteDb: we did not want inconsistent backups anymore :-)

With NoteDb (Gerrit v2.15 or later), you have 0% data loss and 100% consistent online backups.

Luca.

Gert van Dijk

Mar 25, 2019, 5:41:32 AM
to Repo and Gerrit Discussion
On Sunday, 24 March 2019 22:30:50 UTC+1, lucamilanesio wrote:
That is the reason why the community and Dave Borowitz in the first place invested so much effort in NoteDb, because we did not want anymore inconsistent backups :-)

With NoteDb (Gerrit v2.15 or later), you have 0% data loss and 100% consistent online backups.

I believe you can't say that. If you're taking a copy of the Git repos of a running Gerrit, you'll still make inconsistent copies of the data, at several levels:
  • You'd have to read-lock individual files in each repository, because a write can land in a file while you are reading it for the copy.
  • You'd have to read-lock each repository as a whole, so that references from one file to another don't change while you are reading.
  • You'd have to read-lock all repositories as a whole, so that references from one repository to another don't change while you are reading (e.g. draft comments in All-Users, setting topics, etc.).
So, in fact, NoteDb does not solve this issue; it only takes away the RDBMS as a separate component to back up.

As far as I understand, the only way to create fully consistent backups is to use a form of snapshotting that allows you to take a full consistent "picture" of the state at a certain point in time.

Use whatever you feel comfortable with, e.g. LVM, ZFS, Btrfs, or something else that's native to your OS. I've mentioned it several times as an off-topic side note, but I guess it's relevant here for Mai's original question: use a modern filesystem that allows you to take incremental streaming backups. For example, with ZFS and an older Gerrit, make sure the RDBMS data and the Gerrit site data are part of the same parent dataset and create a recursive snapshot. This will then include *everything*, including an up-to-date index, and only requires transferring the changed data, so you don't waste time/money/space on all of that. Your RDBMS will be able to recover (because it should be ACID and thus crash-recoverable).
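A minimal sketch of that approach, assuming a ZFS parent dataset named tank/gerrit that holds both the RDBMS data and the Gerrit site as child datasets (the dataset names and the backup host are placeholders):

```shell
# Hypothetical ZFS example; 'tank/gerrit' and 'backuphost' are placeholders.
SNAP="backup-$(date +%Y%m%d-%H%M)"

# Atomic, recursive snapshot of everything under tank/gerrit:
zfs snapshot -r "tank/gerrit@$SNAP"

# First run: full replication stream to the backup host.
zfs send -R "tank/gerrit@$SNAP" | ssh backuphost zfs receive -F backup/gerrit

# Later runs: incremental against the previous snapshot, so only the
# changed blocks cross the wire:
#   zfs send -R -i "tank/gerrit@$PREV" "tank/gerrit@$SNAP" | \
#       ssh backuphost zfs receive backup/gerrit
```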

Or am I missing something, Luca? Is there something magic I don't know about to solve the three levels of potential inconsistencies with NoteDb?

Thanks!

Luca Milanesio

Mar 25, 2019, 5:46:41 AM
to Gert van Dijk, Luca Milanesio, Repo and Gerrit Discussion

On 25 Mar 2019, at 09:41, Gert van Dijk <gert...@gmail.com> wrote:

On Sunday, 24 March 2019 22:30:50 UTC+1, lucamilanesio wrote:
That is the reason why the community and Dave Borowitz in the first place invested so much effort in NoteDb, because we did not want anymore inconsistent backups :-)

With NoteDb (Gerrit v2.15 or later), you have 0% data loss and 100% consistent online backups.

I believe you can't say that. If you're taking a copy of the Git repos of a running Gerrit you'll still make inconsistent copies of the data, at several levels:
  [...]
Backup is typically done on a slave kept aligned through replication; that gives you all the time you need to make a consistent snapshot.
If your filesystem supports snapshotting, you can do it on your master as well.

So, in fact NoteDb does not solve this issue other than taking away the RDBMS backup as a separate component to backup.

See above.


As far as I understand, the only way to create fully consistent backups is to use a form of snapshotting that allows you to take a full consistent "picture" of the state at a certain point in time.

[...]

Or am I missing something, Luca? Is there something magic I don't know about to solve the three levels of potential inconsistencies with NoteDb?

With NoteDb you have the full picture of the data, at any point in time, on any replica you want.
You can either do it on a separate slave or just on another repository on the same filesystem. We do that all the time and it is consistent.

Furthermore, with the multi-site plugin, you can have all the masters you want, all eventually aligned.
It's like having multiple backups around the network, all available to have consistent snapshots.

Just take one node off, do a full snapshot, and put it back into the cluster, and there you go :-)

HTH

Luca.


Thanks!

Gert van Dijk

Mar 25, 2019, 6:20:55 AM
to Repo and Gerrit Discussion
On Monday, 25 March 2019 10:46:41 UTC+1, lucamilanesio wrote:
With NoteDb you have the full picture of the data at any point of time to any replica you want.

Yeah, agreed. But having replicas is the big requirement here. The original post by Mai does not mention they are using replicas, so I assumed a single-node Gerrit. Suggesting that copying data on a running instance with NoteDb produces consistent backups is plainly false and dangerous, hence my previous reply in this thread. :)
 
You can either do it on a separate slave or just another repository on the same filesystem. We do that all the time and it is consistent.

If you do that, you will need to pause replication while you're copying the data, right?
 
Furthermore, with the multi-site plugin, you can have all the masters you want, all eventually aligned.
It's like having multiple backups around the network, all available to have consistent snapshots.

Just take one off, do a full snapshot, and put it back to the cluster, and there you go :-)

Yes, that's really great, but then you mention snapshots again, so we're back to the requirement of doing snapshots?

Anyway, doing full copies of the data regularly is a waste of resources, IMO (and I guess it's a lot of data if it takes 23 hours to reindex). So, I still really enjoy my incremental snapshots :-)

HTH

Luca Milanesio

Mar 25, 2019, 6:27:08 AM
to Gert van Dijk, Luca Milanesio, Repo and Gerrit Discussion

On 25 Mar 2019, at 10:20, Gert van Dijk <gert...@gmail.com> wrote:

On Monday, 25 March 2019 10:46:41 UTC+1, lucamilanesio wrote:
With NoteDb you have the full picture of the data at any point of time to any replica you want.

Yeah, agree. But having replicas is the big requirement here. The original post by Mai does not mention they are using replicas, so I assumed the use of a single node Gerrit. Suggesting that copying data on a running instance with NoteDb would produce consistent backups is plain false and dangerous, hence my previous reply in this thread. :)

When I say "replica", it doesn't have to be a Gerrit server; it could well be a folder on a shared drive.
Replication supports the file protocol as well :-)

Having a "location to put files" isn't a big requirement, is it?
If you need to back up data, you need a filesystem to put the data onto.

 
You can either do it on a separate slave or just another repository on the same filesystem. We do that all the time and it is consistent.

If you do that, you will need to pause replication while you're copying the data, right?

Yes, exactly. A backup is a "snapshot" of your data at a point in time. The moment you suspend replication to that folder is your "backup timestamp".


 
Furthermore, with the multi-site plugin, you can have all the masters you want, all eventually aligned.
It's like having multiple backups around the network, all available to have consistent snapshots.

Just take one off, do a full snapshot, and put it back to the cluster, and there you go :-)

Yes, that's really great, but then you mention snapshot again and we're back to the requirement of doing snapshots?

The snapshot can be done either at the filesystem level, or by pausing replication and taking a regular .tar.gz or similar.
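A minimal sketch of the pause-and-tar variant, assuming the replication plugin pushes over the file protocol to a local folder you can temporarily freeze (the helper name and paths are made up for illustration):

```shell
# Hypothetical helper: archive a filesystem replica while replication to it
# is suspended.
backup_replica() {
    replica=$1   # folder the replication plugin pushes to (file protocol)
    archive=$2   # where to write the .tar.gz

    # 1. Before calling this, suspend replication to $replica (e.g. disable
    #    that remote in replication.config and reload the plugin).
    #    That moment is your "backup timestamp".

    # 2. Archive the frozen replica:
    tar -czf "$archive" -C "$replica" .

    # 3. Afterwards, re-enable replication; the plugin will catch the
    #    replica back up with the master.
}
```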


Anyway, doing full copies of data regularly is a waste of resources imo (I guess it's a lot if it takes 23H to reindex). So, I still really enjoy my incremental snaphots :-)

Yes, you could also do incremental backups. You choose the strategy you want; the whole point here is that with NoteDb *everything you need* is on the filesystem, and thus you can have a 100% guarantee of a consistent and complete backup operation.

Luca.


HTH

Gert van Dijk

Mar 25, 2019, 6:48:37 AM
to Repo and Gerrit Discussion
On Monday, 25 March 2019 11:27:08 UTC+1, lucamilanesio wrote:
When I say "replica" doesn't have to be a Gerrit server, it could well be a folder on shared drive.
Replication supports the filesystem protocol also :-)

Having a "location to put files" isn't a big requirement I believe, isn't it?
If you need to backup data, you need a filesystem to put the data onto.

Yeah, well, with replication I need that on the same server, online. Plus, I need a place to copy it to. That basically doubles my storage requirements on the Gerrit server, on top of the initial data on another piece of hardware.
Offsite storage is much cheaper and can be kept offline while at rest.
 
Yes, exactly. A backup is a "snapshot" of a point in time of your data. The moment you suspend replication on that folder, that is your "backup timestamp".
[...]
Snapshot can be done either at filesystem level or by pausing replication and doing a regular .tar.gz or similar.

Agree, thanks for clarifying :)
 
Yes, you could do incremental backups also. You chose the strategy you want, the whole point here is that with NoteDb *everything you need* is on the filesystem and thus you can have 100% guarantee of a consistent and complete backup operations.

I could agree with your last point there, but that guarantee means nothing unless you really know what you're doing. The big caveat is that people think they can just copy files on a running system. I've seen it fail so many times in practice: shit hits the fan and Gert has to help, only to find out their rsync copy was taken while the app was running, and there's not much I can do...
So, quite on the contrary, an RDBMS helps you take consistent backups (of the RDBMS data only!): when invoking an SQL dump it will snapshot the data for you, and you can't really do it the wrong way. I guess it just depends on how you look at it: the actual operation of copying data is easier with plain files, but also more error-prone. And worse, you will only notice something got corrupted when it's too late. Taking the snapshot approach is much safer for all kinds of data; the only requirement is running an application that fsyncs properly (ACID-compliant).

HTH

Luca Milanesio

Mar 25, 2019, 6:53:35 AM
to Gert van Dijk, Luca Milanesio, Repo and Gerrit Discussion

On 25 Mar 2019, at 10:48, Gert van Dijk <gert...@gmail.com> wrote:

On Monday, 25 March 2019 11:27:08 UTC+1, lucamilanesio wrote:
[...]

Yeah, well, with replication I need that on the same server, online. Plus, I need a place to copy it to. That basically doubles my storage requirements on a Gerrit server together with the initial data on another piece of hardware.
Offsite storage is much cheaper and can be kept offline while at rest.

I typically use an NFS mount from a different machine.

 
Yes, exactly. A backup is a "snapshot" of a point in time of your data. The moment you suspend replication on that folder, that is your "backup timestamp".
[...]
Snapshot can be done either at filesystem level or by pausing replication and doing a regular .tar.gz or similar.

Agree, thanks for clarifying :)
 
Yes, you could do incremental backups also. You chose the strategy you want, the whole point here is that with NoteDb *everything you need* is on the filesystem and thus you can have 100% guarantee of a consistent and complete backup operations.

I could agree with your last point on that, but that guarantee means nothing unless you really know what you're doing. The big caveat is that people think they can just copy files on a running system.

Well, that is true for anything, not strictly related to Gerrit or Git.
If you want a "consistent set of data", you cannot copy files while they are being modified on the filesystem, can you?
You need to "freeze" the content (e.g. by suspending replication) or have the underlying filesystem manage the snapshots for you.

I've seen it fail so many times in practice when shit hit the fan and Gert has to help, only to find out their rsync data was taken while the app was running and there's not much I can do...

Oh yes, that's a common mistake. People believe that rsync and replication are the same thing... and they end up with corrupted Git repos because of rsync :-(

So, quite on the contrary, an RDBMS helps you to take consistent backups (of the RDBMS data only!); when invoking an SQL dump it will snapshot the data for you and you can't really do it the wrong way. I guess it just depends on how you look at it: the actual operation to copy data is easier with plain files, but also more error prone. And worse, you will only notice something got corrupt when it's too late. Taking the snapshot approach is much safer for all kinds of data - the only requirement is running an application that fsync's properly (ACID compliant).

People also use virtualisation with Gerrit, which makes snapshots a lot easier to manage, too.
Lots of options, then, with NoteDb :-)

Matthias Sohn

Mar 26, 2019, 6:26:52 AM
to Luca Milanesio, Gert van Dijk, Repo and Gerrit Discussion
On Mon, Mar 25, 2019 at 6:53 AM Luca Milanesio <luca.mi...@gmail.com> wrote:


[...]

I tried to summarise this discussion and add it to the documentation here.
Please review and provide feedback.

-Matthias 

Matthias Sohn

Dec 9, 2019, 4:00:47 PM
to Repo and Gerrit Discussion

luca.mi...@gmail.com

Dec 10, 2019, 3:42:21 AM
to Matthias Sohn, Repo and Gerrit Discussion



On 9 Dec 2019, at 21:00, Matthias Sohn <matthi...@gmail.com> wrote:



Great job Matthias.

Luca
