Warning message on Reindex


adrien...@gmail.com

Jun 18, 2021, 10:43:27 AM
to Repo and Gerrit Discussion
Hi,

We set up a Gerrit HA cluster consisting of Server1 and Server2 in Data Center 1. Both are up and running.
We also set up a cold standby Gerrit (shut down) in Data Center 2, for DR purposes.

Our load balancer routes all traffic to DataCenter1->Server1 only. If the health check detects that Server1 is down, the load balancer routes the traffic to DataCenter1->Server2.
The same automated detection restores traffic to DataCenter1->Server1 once it is up and running again.

When Data Center 1 is not available, we manually switch to Data Center 2 by starting the Gerrit instance there and enabling it in the load balancer.
Restoring operation to Data Center 1 is also manual, since we need to ensure that at least one cluster member is up and running again before we switch back.

For the file system, we use NFS, shared by the cluster nodes in Data Center 1.
It replicates every 5 minutes to another NFS that is mounted in Data Center 2.

Similarly, a common Postgres database is shared by the cluster nodes in Data Center 1.
It also replicates to another Postgres database in Data Center 2.

When we fail over to DR, the replication flow for both NFS and the database is reversed.

We set this up using Gerrit 3.1.8.

Sorry for the long introduction; my questions follow.

We are now at the stage of refreshing the data from the current non-HA Gerrit production to the HA cluster.
As part of our 'refresh' process, there is a step that triggers a re-index.

Question #1
When I ran the re-index, it took about 2 hours 15 minutes. Is this normal, given that the cluster uses NFS while the current non-HA Gerrit uses a local filesystem?
When we refresh another non-HA Gerrit dev environment, the re-index takes 25 minutes at most.

Question #2
During the re-index, I saw a lot of warnings like the ones below. Could this be because Gerrit on Server2 is down?
And is there anything I need to do, given that the re-index completed?

[2021-06-17 14:48:15,258] [RestForwarderScheduler-1] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling send event:ref-updated => https://server2:8443 (try #31) for retry after 30000 msec [CONTEXT PLUGIN="gerrit" PLUGIN="high-availability" ]
[2021-06-17 14:48:15,258] [RestForwarderScheduler-4] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling invalidate cache accounts:accounts => https://server2:8443 (try #31) for retry after 30000 msec [CONTEXT PLUGIN="high-availability" ]
[2021-06-17 14:48:15,684] [RestForwarderScheduler-3] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling index account:1000020 => https://server2:8443 (try #31) for retry after 30000 msec
[2021-06-17 14:48:24,655] [RestForwarderScheduler-2] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling invalidate cache accounts:accounts => https://server2:8443 (try #15) for retry after 30000 msec [CONTEXT PLUGIN="high-availability" ]

I also ran the following checks on Server1 and Server2, and both indicate that there is nothing more to re-index:
[gerrit@server1]$ ssh -p 29418 localhost gerrit index start accounts
Nothing to reindex, index is already the latest version
[gerrit@server1]$ ssh -p 29418 localhost gerrit index start changes
Nothing to reindex, index is already the latest version

[gerrit@server2]$ ssh -p 29418 localhost gerrit index start accounts
Nothing to reindex, index is already the latest version
[gerrit@server2]$ ssh -p 29418 localhost gerrit index start changes
Nothing to reindex, index is already the latest version


Hope to hear from anyone soon.

Regards,

Adrien

Luca Milanesio

Jun 18, 2021, 1:58:51 PM
to Repo and Gerrit Discussion, Luca Milanesio
Hi Adrien,
Thanks for writing to the mailing list about Gerrit HA.

See my feedback below.

On 18 Jun 2021, at 15:43, adrien...@gmail.com <adrien...@gmail.com> wrote:

Hi,

We setup a Gerrit HA Cluster composing of Server1 and Server2 on our Data Center1. Both are up and running.
We also setup a Cold standby Gerrit (in shutdown mode) in another Data Center2 (for DR purpose).

The way we configured our Load Balancer is that all traffic routes to DataCenter1->Server1 only. And incase it went down as detected by the health check, LoadBalancer will route the traffic to DataCenter1->Server2. 
The same automated detection is applied to restore operation back to DataCenter1->Server1 when it comes back (up and running).

Out of curiosity, which load balancer do you use?
Is there any reason why you don't want to use Server2 in parallel with Server1 (e.g. to offload part of the traffic from Server1, such as Git clones)?


Now when Data Center 1 is not available, we will manually operate on Data Center2 by turning on the Gerrit instance and Enabling the instance in the Load Balancer.
Restoring operation back to Data Center 1 is also manual since we need to ensure that at least one of the member cluster is up running again before we do the  switch.

In terms of file system, we are using NFS which is shared  by the Cluster nodes in Data Center 1.
It replicates every 5 minutes to another NFS that is mounted on Data Center 2.

Why every 5 minutes? It would be best to just configure replication from Server1/Server2 to Data Center 2, using the replication plugin.


Similarly, there is a common Postgres Database that is shared by the Cluster nodes in Data Center 1. 
It also replicates to another Postgres Database in Data Center 2.

Why have a PostgreSQL DB with Gerrit v3.1.8? What is its purpose?

In terms of going to the DR, Replications flow for both NFS and DB will be reversed.

You could also configure replication from Data Center 2 back to Data Center 1.


We set this up using Gerrit 3.1.8.

Gerrit v3.1.8 doesn't need an RDBMS at all, so there is no reason to have PostgreSQL.


Sorry for the long introduction but my query is next.

Actually it was very useful to understand the context.


We are at a stage where we refresh the data from current non-ha Gerrit production to the HA cluster.
As part of our 'refresh' process, there is a step to trigger a re-index.

Question#1
As I run the re-index the elapsed time took like 2hrs and 15 mins. Is this normal since we are using NFS in the cluster while the current non-ha Gerrit uses a local filesystem?
When we do a refresh to another non-ha Gerrit dev environment, the re-index takes only like 25 mins tops.

That is expected, as NFS latency is typically much higher than that of a local filesystem.
However, you may NOT need to perform a full reindex but just a delta-reindex, which is typically *a lot* faster.


Question#2
During re-index, I saw a lot of warning like the below. Could this be because Gerrit on Server2 is down? 
And is there anything that I need to do since the re-index was completed?

[2021-06-17 14:48:15,258] [RestForwarderScheduler-1] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling send event:ref-updated => https://server2:8443 (try #31) for retry after 30000 msec [CONTEXT PLUGIN="gerrit" PLUGIN="high-availability" ]
[2021-06-17 14:48:15,258] [RestForwarderScheduler-4] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling invalidate cache accounts:accounts => https://server2:8443 (try #31) for retry after 30000 msec [CONTEXT PLUGIN="high-availability" ]
[2021-06-17 14:48:15,684] [RestForwarderScheduler-3] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling index account:1000020 => https://server2:8443 (try #31) for retry after 30000 msec
[2021-06-17 14:48:24,655] [RestForwarderScheduler-2] WARN  com.ericsson.gerrit.plugins.highavailability.forwarder.rest.RestForwarderScheduler : Rescheduling invalidate cache accounts:accounts => https://server2:8443 (try #15) for retry after 30000 msec [CONTEXT PLUGIN="high-availability" ]

Yep, that is correct. Server1 wants to propagate reindexing, cache evictions and all other events to Server2, but Server2 is down, so Server1 goes through a series of retry cycles.
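
To see at a glance which peer the retries are aimed at, the warnings can be summarized per target. This is a minimal sketch, assuming the log lines keep the format quoted in this thread; the function name summarize_retries and the log path in the usage line are illustrative and not part of Gerrit or the plugin:

```shell
# Summarize high-availability "Rescheduling" warnings per target node.
# Prints one line per target: number of warnings and highest try number seen.
# Assumes the log line layout shown in this thread.
summarize_retries() (
  log=${1:?usage: summarize_retries /path/to/error_log}
  grep 'RestForwarderScheduler : Rescheduling' "$log" \
    | sed -E 's/.*Rescheduling (.*) => ([^ ]+) \(try #([0-9]+)\).*/\2 \3 \1/' \
    | awk '{ n[$1]++; if ($2 + 0 > max[$1] + 0) max[$1] = $2 }
           END { for (t in n) printf "%s warnings=%d max_try=%d\n", t, n[t], max[t] }' \
    | sort
)
```

For example: summarize_retries "$GERRIT_SITE/logs/error_log". For the four warnings quoted above it prints "https://server2:8443 warnings=4 max_try=31", which matches the explanation that every retry is aimed at the node that is down.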


I also run the following checks in Server1 and  Server2 and both indicate that there is nothing more to re-index: 
 [gerrit@server1]$ ssh -p 29418 localhost gerrit index start accounts             
Nothing to reindex, index is already the latest version
[gerrit@server1]$ ssh -p 29418 localhost gerrit index start changes
Nothing to reindex, index is already the latest version

If you want to re-trigger a full online reindex you need to use the ‘--force’ option, otherwise Gerrit won’t reindex an existing index version.
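
For reference, the forced online reindex uses the same SSH command as the checks quoted above, with --force appended. A small sketch that only prints the commands unless RUN=1 is set; the wrapper name force_online_reindex and the RUN switch are made up for illustration, and the host, port and index names are taken from the commands in this thread:

```shell
# Print (default) or run (RUN=1) a forced online reindex of both indexes.
# "Online" means Gerrit stays up; the command goes through its SSH interface.
force_online_reindex() (
  host=${1:-localhost}
  port=${2:-29418}
  for index in accounts changes; do
    if [ "${RUN:-0}" = 1 ]; then
      ssh -p "$port" "$host" gerrit index start "$index" --force
    else
      echo "ssh -p $port $host gerrit index start $index --force"
    fi
  done
)
```

Calling force_online_reindex with no arguments shows what would be executed; RUN=1 force_online_reindex executes it against localhost:29418.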

HTH

Luca.




adrien...@gmail.com

Jun 18, 2021, 5:28:34 PM
to Repo and Gerrit Discussion
Hi Luca,

Thanks for the response. I should have mentioned that I took over the project when everything was already architected and set up.
I have only just started on the configuration part, to ensure that HA works.
Please find my answers to your queries below; I hope they help. I still have some questions inline.

Luca:Out of curiosity, what do you use as Load Balancer?
Any reasons why you don’t want to use the Server2 in parallel with Server1? (e.g. off-loading part of the traffic from Server1, such as Git clones)

AZ: We use F5.
I believe it is all about session 'stickiness', which is why the decision was that only Server1 serves traffic while Server2 is just a fallback. Both are running simultaneously, but F5 is configured to prioritize rather than round-robin.

Luca  :Why every 5 minutes? It would be best to just configuring the replication from Server1/Server2 to Data Center 2, using the replication plugin.

AZ: The NFS replication is not even supposed to be visible from the application's point of view. The underlying implementation uses an alias, so Gerrit does not know where the actual NFS is mounted. DC1 and DC2 therefore effectively use the same filesystem, but the mounted NFS could be from DC1 or DC2, and it is replicated every 5 minutes.
I have not read about the replication plugin and, as far as I remember, the decision not to follow a multi-master approach was made before I took over.

Luca: Why having a PostgreSQL DB with Gerrit v3.1.8? What’s the purpose?

AZ: I believe that, since we are coming from a non-HA production (also on Gerrit v3.1.8) that still uses Postgres with ReviewDb, the design was to move to an HA platform with as few differences as possible. I can foresee that Postgres and ReviewDb will be gone in our future upgrades.

Luca:You could also configure replication from Data Center 2 back to Data Center 1.

AZ: Yes as you mentioned using the replication plugin. But as the design was already in place, I am just implementing and will propose the use of such plugin in the near future.

Luca: Gerrit v3.1.8 doesn’t need a RDBMS at all, so no reasons to have PostgreSQL.

AZ: Agreed. You are preaching to the choir here.

Luca: That is expected, as the NFS latency is typically much higher than the local filesystem.
However, you may NOT need to perform the full reindex but just a delta-reindex, which is typicallly *a lot* faster.

AZ Question: How do you trigger a delta-reindex? Does it look like running this command?
    java -jar gerrit.war delta-reindex -d $GERRIT_SITE


Luca: Yep, that is correct. Server1 wants to propagate reindexing, events, cache evictions and all other events to Server2, but that is down, so Server1 has a series of retry cycles.

AZ: I assume nothing needs to be done at this point

Luca: If you want to re-trigger a full online reindex you need to use the ‘--force’ option, otherwise Gerrit won’t reindex an existing index version.

AZ Question: When you say "online", do you mean re-indexing while users are accessing Gerrit?
I got a bit lost here. Does the command look like the one below?
    java -jar gerrit.war reindex --force -d $GERRIT_SITE

Regards,
Adrien


Luca Milanesio

Jun 18, 2021, 5:37:05 PM
to Repo and Gerrit Discussion, Luca Milanesio, adrien...@gmail.com

On 18 Jun 2021, at 22:28, adrien...@gmail.com <adrien...@gmail.com> wrote:


Luca: Why having a PostgreSQL DB with Gerrit v3.1.8? What’s the purpose?

AZ: I  believe that since we are coming from a non-ha Production (also with Gerrit v3.1.8) that still uses Postgres with Reviewdb, the design is to move into an HA platform with less differences. I could foresee that in our future upgrades, that Postgres and Reviewdb will be gone.

PostgreSQL and ReviewDb are already GONE in Gerrit v3.1.8: you don't need them.


Luca:You could also configure replication from Data Center 2 back to Data Center 1.

AZ: Yes as you mentioned using the replication plugin. But as the design was already in place, I am just implementing and will propose the use of such plugin in the near future.

+1


Luca: Gerrit v3.1.8 doesn’t need a RDBMS at all, so no reasons to have PostgreSQL.

AZ: Agreed. You are preaching the choir here.

Luca: That is expected, as the NFS latency is typically much higher than the local filesystem.
However, you may NOT need to perform the full reindex but just a delta-reindex, which is typicallly *a lot* faster.

AZ Question: How do you trigger a delta-reindex, does it look like running this command?  
                         java -jar gerrit.war  delta-reindex -d $GERRIT_SITE

See [1] in the high-availability plugin documentation: just define the "start date/time" of the reindex, and the high-availability plugin will do the job automatically at startup.
There are no commands to run; it just works like magic :-)
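
As a rough illustration of what that setup might look like: the sketch below switches on the plugin's automatic reindex in its config file. It assumes the plugin reads $GERRIT_SITE/etc/high-availability.config and that the option is named autoReindex.enabled, as I recall it from the plugin's configuration documentation; the helper name enable_auto_reindex is hypothetical. Please check the documentation shipped with your plugin version, including how the last-indexed timestamps under the site's data/high-availability directory are handled, before relying on it.

```shell
# Sketch: enable the high-availability plugin's automatic reindex at startup.
# Assumptions (verify against your plugin version): config file location and
# the autoReindex.enabled option name.
enable_auto_reindex() (
  site=${1:?usage: enable_auto_reindex GERRIT_SITE}
  # git config edits Gerrit-style config files and creates the file if it is missing
  git config -f "$site/etc/high-availability.config" autoReindex.enabled true
)
```

Usage would be enable_auto_reindex "$GERRIT_SITE" followed by a Gerrit restart, since the reindex is described as happening at startup.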




Luca: Yep, that is correct. Server1 wants to propagate reindexing, events, cache evictions and all other events to Server2, but that is down, so Server1 has a series of retry cycles.

AZ: I assume nothing needs to be done at this point

Yep.


Luca: If you want to re-trigger a full online reindex you need to use the ‘--force’ option, otherwise Gerrit won’t reindex an existing index version.

AZ Question: When you say "online", do you mean to re-index even when users are accessing Gerrit?
                        Just got lost here. Does the command needed to do it looks like the below?
                          java -jar gerrit.war  reindex --force -d $GERRIT_SITE

Forget about re-triggering yet another on-line reindex: using [1] would do the job for you.

HTH

Luca.

adrien...@gmail.com

Jun 29, 2021, 3:40:29 PM
to Repo and Gerrit Discussion
Hi Luca,

I just wanted to circle back to your statement below about Postgres and ReviewDb.

While running some tests on the new HA setup, we found that Gerrit (true to itself) was up and running without ever connecting to the Postgres database. I asked the DBAs, and they found no connections from Gerrit with the schema user we had asked them to create. We were still able to work on the HA cluster and simulate our HA and DR exercises.

    " AZ: I  believe that since we are coming from a non-ha Production (also with Gerrit v3.1.8) that still uses Postgres with Reviewdb, the design is to move into an HA platform with less differences. I could foresee that in our future upgrades, that   
            Postgres and Reviewdb will be gone.

     Luca: PostreSQL and ReviewDb are GONE already in Gerrit v3.1.8: you don’t need them. "


How can we check whether everything we need has already been moved out of the database? It looks like, when our Gerrit was upgraded to 3.1.8, nobody noticed whether the database was still in use.

Below is our current gerrit.config. It looks like the 'database' configuration is no longer even relevant, as I cannot find these parameters in the Gerrit documentation.
Can I assume that these database settings are unnecessary and that we no longer need to bother with the database?

[gerrit]
        basePath = git
        canonicalWebUrl = https://ourinternalgerriturl/
        serverId = server_id_number
[database]
        type = postgresql
        hostname = database_server_name
        database = reviewdb
        username = gerritdatabaseuser

Hope to hear from you soon,

Regards,

Adrien

Luca Milanesio

Jun 29, 2021, 3:52:21 PM
to adrien...@gmail.com, Luca Milanesio, Repo and Gerrit Discussion
On 29 Jun 2021, at 20:40, adrien...@gmail.com <adrien...@gmail.com> wrote:



I just wanted to know like how are we going to check whether everything we need were already moved out of the database? Because it looks like when our Gerrit was upgraded to 3.1.8, it was not even noticed that the database is even in use.

The migration to v3.0.x would already have failed *IF* the ReviewDb conversion had not been performed beforehand.


Below is our current gerrit.config and it looks like the 'database' configuration is even non-trivial anymore as I cannot see these parameters in the Gerrit documentation. 
Can I assume that these database configs are not necessary and we don't even need to bother with the database anymore?

[gerrit]
        basePath = git
        canonicalWebUrl = https://ourinternalgerriturl/
        serverId = server_id_number
[database]
        type = postgresql
        hostname = database_server_name
        database = reviewdb
        username = gerritdatabaseuser

Yes, confirmed. The Gerrit v3.1.x documentation does not even have a ‘database’ section anymore:

The section in your gerrit.config is 100% ignored by Gerrit.
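
Since the section is ignored, deleting it is optional housekeeping rather than a required step. If you do want to tidy up, a minimal sketch follows; the helper name remove_database_section and the .bak backup are illustrative choices, and it relies only on the fact that gerrit.config uses the git config file format:

```shell
# Sketch: drop the unused [database] section from a gerrit.config file.
# Keeps a backup copy next to the original before editing.
remove_database_section() (
  cfg=${1:?usage: remove_database_section /path/to/gerrit.config}
  cp "$cfg" "$cfg.bak" &&
  git config -f "$cfg" --remove-section database
)
```

For example: remove_database_section "$GERRIT_SITE/etc/gerrit.config". Other sections, such as [gerrit], are left untouched.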

Luca.

Nasser Grainawi

Jun 29, 2021, 4:04:55 PM
to Luca Milanesio, adrien...@gmail.com, Repo and Gerrit Discussion
On Jun 29, 2021, at 1:52 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:



Yes, confirmed. The Gerrit v3.1.x documentation does not even have a ‘database’ section anymore:

The section in your gerrit.config is 100% ignored by Gerrit.

There is the accountPatchReviewDb section [1], which *could* be configured to use PostgreSQL, but if that section has no configuration, Gerrit uses a local H2 database by default.

[1] https://gerrit-documentation.storage.googleapis.com/Documentation/3.1.15/config-gerrit.html#accountPatchReviewDb

Luca Milanesio

Jun 29, 2021, 4:08:12 PM
to Repo and Gerrit Discussion, Luca Milanesio
True, but that has nothing to do with ReviewDb:
it is a completely separate DB and did not even use the [database] section.

Luca.

adrien...@gmail.com

Jun 29, 2021, 4:34:06 PM
to Repo and Gerrit Discussion
Hi Luca,

Thanks for the response. It is reassuring that the Postgres database (which apparently has not been in use since the last upgrade) is no longer required. That is one less component in our HA architecture.

When we exercised a DR failover, we noticed that the Gerrit UI did not show the commits/changes from the HA cluster. Note that our DR instance is non-HA and cold (stopped), but it does share the same NFS used by the cluster, which hosts the /repos and /share folders.
After I ran a re-index, which took 2 hours 45 minutes, we were able to see the changes in the UI.

Do we really need to re-index every time we do a DR failover?
Or is there a way to configure Gerrit to move the /index directory from $GERRIT_SITE onto the same NFS filesystem, so that after refreshing the data on our first server (during cutover) we run the re-index once and reuse the same /index in DR?

Regards,

Adrien 

Luca Milanesio

Jun 29, 2021, 4:54:21 PM
to Repo and Gerrit Discussion, Luca Milanesio
On 29 Jun 2021, at 21:34, adrien...@gmail.com <adrien...@gmail.com> wrote:


Now we were thinking whether we really need to do the re-index every time we do a DR failover? 
Or is there a way to configure Gerrit and move the /index from the $GERRIT_SITE into the same NFS filesystem so that after we refresh the data on our First Server (during cutover), we will run the re-index once and can reuse the same /index in DR by taking advantage of the NFS?

Could you start a different topic for this question?
We typically separate topics in the mailing list, otherwise the discussions are difficult to follow for other readers.

Luca.
