Merging 2 servers without losing review history


sporzio

Feb 25, 2021, 3:40:10 PM
to Repo and Gerrit Discussion
We currently have 2 independent Gerrit servers running to host our projects.  One is based on some historical projects and we would like to migrate all those to our newer server which has better capabilities and more storage and will result in less maintenance overall.  Is it possible to somehow merge the two together without losing the existing review history from either server?

Thanks in advance for any help.
-S

Luca Milanesio

Feb 25, 2021, 4:51:05 PM
to Repo and Gerrit Discussion

On 25 Feb 2021, at 20:38, sporzio <spo...@gmail.com> wrote:

> Is it possible to somehow merge the two together without losing the existing review history from either server?

Which version are you running?
Are both servers running the SAME version?

Luca.


--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/994086ee-3cbc-4f3b-a8b8-950573ff8e44n%40googlegroups.com.

sporzio

Mar 2, 2021, 2:23:06 AM
to Repo and Gerrit Discussion
Both servers are running the same version.  Currently we are on version 3.3.2.

-S

Han-Wen Nienhuys

Mar 2, 2021, 8:31:42 AM
to sporzio, Repo and Gerrit Discussion
On Thu, Feb 25, 2021 at 9:40 PM sporzio <spo...@gmail.com> wrote:
> Is it possible to somehow merge the two together without losing the existing review history from either server?


With NoteDb, this is technically possible. There are a couple of things that need to be resolved:

* All NoteDb commits have to have their server ID (the host that the server uses in the author/committer fields of commits) rewritten
* All user IDs change, so a mapping needs to be applied to the NoteDb data
* New numeric change IDs have to be allocated to avoid conflicts.

It's something we have considered implementing, but we never had anyone need it badly enough to get it prioritized. 
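
As a rough illustration of the first two items in the list above, here is a minimal Python sketch that rewrites a single author/committer identity. It assumes NoteDb identities have the shape `Name <account-id@server-id>`; that shape, the function name, the sample IDs and the mapping are all assumptions for illustration and should be checked against real `refs/changes/*/meta` commits. A real migration would also have to rewrite every commit carrying such an identity, plus anything else in NoteDb that stores account IDs.

```python
import re

# Assumed identity shape: "Display Name <ACCOUNT_ID@SERVER_ID>".
IDENT_RE = re.compile(r"^(?P<name>.*?)\s*<(?P<account>\d+)@(?P<server>[^>]+)>$")


def rewrite_ident(ident, account_map, old_server_id, new_server_id):
    """Return ident with the account ID remapped and the server ID replaced.

    Identities that do not match the assumed shape, or that belong to a
    different server, are returned unchanged.
    """
    match = IDENT_RE.match(ident)
    if match is None or match.group("server") != old_server_id:
        return ident
    old_account = int(match.group("account"))
    if old_account not in account_map:
        # Refuse to guess: every source account needs an explicit mapping.
        raise KeyError(f"no destination account for source account {old_account}")
    new_account = account_map[old_account]
    return f"{match.group('name')} <{new_account}@{new_server_id}>"


if __name__ == "__main__":
    # Hypothetical mapping from source-server account IDs to destination IDs.
    mapping = {1000042: 2000007}
    print(rewrite_ident("Jane Doe <1000042@old-server>", mapping,
                        "old-server", "new-server"))
```

Raising on an unmapped account instead of passing it through is a deliberate choice in this sketch: silently keeping an old ID could attribute a review to whoever owns that number on the destination server.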

--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Andrew Grimberg

Mar 2, 2021, 2:49:44 PM
to Han-Wen Nienhuys, sporzio, Repo and Gerrit Discussion

I'll note that I miss the import plugin. That made it so easy to do
this sort of thing in the past.

My team used it (back when it was still something we could use) several
times to migrate repositories between Gerrit systems. We've had to stop
doing that since it stopped working back in the 2.16 days.

-Andy-

Luca Milanesio

Mar 2, 2021, 5:24:44 PM
to Repo and Gerrit Discussion, Luca Milanesio


> On 2 Mar 2021, at 19:49, Andrew Grimberg <grim...@gmail.com> wrote:
>
> On 3/2/21 5:31 AM, 'Han-Wen Nienhuys' via Repo and Gerrit Discussion wrote:
>>
>> With NoteDb, this is technically possible. There are a couple of things
>> that need to be resolved:
>>
>> * All NoteDb commits have to have their server ID (the host that the
>> server uses in the author/committer fields of commits) rewritten

Yeah, I never understood why we decided to include the server-id in NoteDb.

>> * All user IDs change, so a mapping needs to be applied to the NoteDb data

And here we could have used e-mails, which aren’t linked to the internals of the old-times user-id sequences.

>> * New numeric change IDs have to be allocated to avoid conflicts.

See my sad comment below :-(

>>
>> It's something we have considered implementing, but we never had anyone
>> need it badly enough to get it prioritized.
>
> I'll note that I miss the import plugin. That made it so easy to do
> these sort of things in the past.
>
> My team used it (back when it was still something we could use) several
> times to migrate repositories between Gerrit systems. We've had to stop
> doing that since it stopped working back in the 2.16 days.

The “promise” of NoteDb was that we *could* move the reviews without needing the importer plugin, because all the metadata is in NoteDb, which means in the repository itself.

However, the devil is in the details, and some of the ReviewDb “legacy” stuck with us, such as the change numbers, which were tied to a DBMS sequence that is obviously not portable.
When we moved to NoteDb, we just moved that concept to All-Projects.git without getting rid of it.

Bottom line: we no longer have ReviewDb and the “shortcuts” of updating the DBMS and, at the same time, we haven’t resolved the change number issue either.

It is a very unfortunate position, I know :-(

Luca.


Han-Wen Nienhuys

Mar 2, 2021, 5:44:51 PM
to Luca Milanesio, Repo and Gerrit Discussion
On Tue, Mar 2, 2021 at 11:24 PM Luca Milanesio <luca.mi...@gmail.com> wrote:
>> * All NoteDb commits have to have their server ID (the host that the
>> server uses in the author/committer fields of commits) rewritten
>
> Yeah, I never understood why we decided to include the server-id in NoteDb.
>
>> * All user IDs change, so a mapping needs to be applied to the NoteDb data
>
> And here we could have used e-mails, which aren’t linked to the internals of the old-times user-id sequences.

They are considered personal data, so persisting the email is problematic for GDPR purposes. Also, email addresses can actually move between accounts, while account IDs can't.

> However, the devil is in the details and some of the “legacy” of ReviewDb stuck with us, such as the change numbers, which were linked to a sequence on the DBMS, that is obviously not portable.
> When we moved to NoteDb we just moved that concept to All-Projects.git without getting rid of it.

The numbers are the easiest to fix. You just have to allocate a new one. The change number is not persisted in the NoteDb data for that change, so you could easily assign a change to a new number. The real problem is in the data that has to be rewritten to fit within the destination server (i.e. accounts and server ID).

David Ostrovsky

Mar 3, 2021, 4:33:46 AM
to Repo and Gerrit Discussion
han...@google.com wrote on Tuesday, March 2, 2021 at 23:44:51 UTC+1:
> The numbers are the easiest to fix. You just have to allocate a new one. The change number is not persisted in the NoteDb data for that change, so you could easily assign a change to a new number. The real problem is in the data that has to be rewritten to fit within the destination server (i.e. accounts and server ID).

If I look at change number 298465 on gerrit-review: [1]
  
  $ git ls-remote | grep 298465
  ce0ebd524edab6292e407c0122725077429374b7 refs/changes/65/298465/1
  cb48f1bebd1f58e86a3720749cbdcf75c0345bcd refs/changes/65/298465/2
  422b89e29a1827cf159a8901965fb3ebfb032923 refs/changes/65/298465/meta
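
The listing above shows the ref layout: the shard directory is the last two digits of the change number, followed by the number itself and then the patch set (or `meta`). A minimal Python sketch of how those refs could be renamed when a change gets a new number follows. The helper names and the new number 400012 are made up for illustration; on a real server the number would have to be allocated from the destination's sequence, which this sketch does not attempt.

```python
def change_ref_prefix(change_number):
    """Return the ref prefix for a change, e.g. 298465 -> refs/changes/65/298465."""
    shard = change_number % 100  # last two digits, zero-padded below
    return f"refs/changes/{shard:02d}/{change_number}"


def renumber_refs(refs, old_number, new_number):
    """Map each ref of change old_number to the equivalent ref of new_number.

    Refs that do not belong to old_number are ignored.
    """
    old_prefix = change_ref_prefix(old_number) + "/"
    new_prefix = change_ref_prefix(new_number) + "/"
    return {
        ref: new_prefix + ref[len(old_prefix):]
        for ref in refs
        if ref.startswith(old_prefix)
    }


if __name__ == "__main__":
    source_refs = [
        "refs/changes/65/298465/1",
        "refs/changes/65/298465/2",
        "refs/changes/65/298465/meta",
    ]
    # 400012 is a hypothetical number on the destination server.
    for old, new in renumber_refs(source_refs, 298465, 400012).items():
        print(old, "->", new)
```

This only covers ref naming; the account and server ID data inside the `meta` commits still needs rewriting.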

Why does it have to be unique across all projects on the Gerrit site?

Atlassian's Jira does it too: you have FOO-42 and BAR-42 issues.

GitHub does this too: PR 28 is a different pull request in the bazelbuild and gerrit projects:


Of course, multi-tenancy Gerrit installations could emulate that setup, so that
the changes with the same number could be created:


An alternative approach would be to have a change number (and sequence) per project.
As a consequence, we would have multiple changes with the same number on
one Gerrit site.
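
To make the per-project idea concrete, a minimal Python sketch with one counter per project, in the spirit of FOO-42 and BAR-42. The class name and the in-memory storage are purely illustrative assumptions; this is not how Gerrit allocates numbers.

```python
from collections import defaultdict


class PerProjectSequence:
    """Hand out change numbers from an independent counter per project."""

    def __init__(self):
        # Last number issued for each project; unseen projects start at 0.
        self._last = defaultdict(int)

    def next(self, project):
        """Return the next change number for the given project."""
        self._last[project] += 1
        return self._last[project]


if __name__ == "__main__":
    seq = PerProjectSequence()
    print("foo", seq.next("foo"))
    print("foo", seq.next("foo"))
    print("bar", seq.next("bar"))
```

With such a scheme a change is only identified by the pair (project, number), which is why the number on its own stops being unique.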

We already adapted the URL to include project name:


The query predicate "number:42" wouldn't be unique, though.
 
The migration path wouldn't be trivial, but we should consider resolving
this technical debt from the global database sequence era in the next major Gerrit release, 4.0.

Luca Milanesio

Mar 3, 2021, 4:40:42 AM
to Repo and Gerrit Discussion, Luca Milanesio

On 3 Mar 2021, at 09:33, David Ostrovsky <david.o...@gmail.com> wrote:

> If I look at change number 298465 on gerrit-review: [1]
>
>   $ git ls-remote | grep 298465
>   ce0ebd524edab6292e407c0122725077429374b7 refs/changes/65/298465/1
>   cb48f1bebd1f58e86a3720749cbdcf75c0345bcd refs/changes/65/298465/2
>   422b89e29a1827cf159a8901965fb3ebfb032923 refs/changes/65/298465/meta
>
> Why does it have to be unique across all projects on the Gerrit site?

This was a requirement *a long time ago*, when we first introduced the project prefix: it is needed because of the primary key of the changeid_project cache, which does the redirect.

Example:
HTTP/2 302
location: /c/gerrit/+/297272/

If we *could* relax that requirement, then I don’t believe Gerrit would have any problem with duplicate change numbers.
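
To illustrate why the number-only redirect depends on global uniqueness, a minimal Python sketch in which a plain dict stands in for the changeid_project lookup. The function names and the "bazel" project are hypothetical, and this is not Gerrit's actual implementation.

```python
def build_number_index(changes):
    """Build a change-number -> project index from (project, number) pairs.

    Raises ValueError when two projects use the same number, because a
    number-only URL could then no longer be redirected unambiguously.
    """
    index = {}
    for project, number in changes:
        if number in index and index[number] != project:
            raise ValueError(
                f"change {number} exists in both {index[number]} and {project}"
            )
        index[number] = project
    return index


def redirect_location(index, number):
    """Return the project-qualified path a number-only request redirects to."""
    return f"/c/{index[number]}/+/{number}/"


if __name__ == "__main__":
    index = build_number_index([("gerrit", 297272)])
    print("location:", redirect_location(index, 297272))
```

Dropping the number-only redirect removes the need for this index, which is the relaxation discussed above.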


> An alternative approach would be to have a change number (and sequence) per project.

I believe that would be good, but not necessarily a requirement, as long as we drop support for the initial redirection.

> As a consequence, we would have multiple changes with the same number on
> one Gerrit site.
>
> We already adapted the URL to include the project name:

Yep.

> The query predicate "number:42" wouldn't be unique, though.

Change queries are fine to return more than one result, so it won’t be an issue IMHO.

Luca.

Han-Wen Nienhuys

Mar 3, 2021, 4:42:55 AM
to David Ostrovsky, Repo and Gerrit Discussion
On Wed, Mar 3, 2021 at 10:33 AM David Ostrovsky <david.o...@gmail.com> wrote:
> Why does it have to be unique across all projects on the Gerrit site?
>
> Atlassian's Jira does it too: you have FOO-42 and BAR-42 issues.

The change number has always been globally unique, and Dave at the time took a lot of care to avoid encoding the change number in NoteDb data.

If you break that assumption, you break any and all clients that operate on that assumption. 

It's already hard enough to upgrade Gerrit; let's not make it even harder.
 

Luca Milanesio

Mar 3, 2021, 5:02:31 AM
to Repo and Gerrit Discussion, Luca Milanesio

On 3 Mar 2021, at 09:42, 'Han-Wen Nienhuys' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

> The change number has always been globally unique, and Dave at the time took a lot of care to avoid encoding the change number in NoteDb data.

Are you aware of any reasons why two changes with the same number wouldn’t work?
(Assuming they are always used with a project name prefix, and leaving aside the changeid_project cache.)

> If you break that assumption, you break any and all clients that operate on that assumption.

That is very true: a client asking for a change ONLY using its number would certainly break.

> It's already hard enough to upgrade Gerrit; let's not make it even harder.

You have a good point: we are in the middle of a tremendous effort to get most of the community onto Gerrit v3, and we don’t want to make yet another major breaking change and release a v4.
However, ideas on how to get this to work would be more than welcome :-)

Luca.

sporzio

Mar 8, 2021, 4:33:22 PM
to Repo and Gerrit Discussion
Thank you all for the detailed technical discussion around this. It was very enlightening to see some of the complexities behind the scenes that we, as end users, largely take for granted.

On Tuesday, March 2, 2021 at 8:31:42 AM UTC-5 han...@google.com wrote:
> With NoteDb, this is technically possible. There are a couple of things that need to be resolved:
>
> * All NoteDb commits have to have their server ID (the host that the server uses in the author/committer fields of commits) rewritten
> * All user IDs change, so a mapping needs to be applied to the NoteDb data
> * New numeric change IDs have to be allocated to avoid conflicts.


Is it possible to do these things manually in some way, or is this something that would need to be done via a coded plugin or directly within the source due to the complexities involved?
 


Thanks,
-S