Pull-replication with broker (Kafka): replicating data created while the replica was shut down takes a very long time

Sehen

Sep 19, 2023, 10:59:04 AM
to Repo and Gerrit Discussion
Dear all & Matthias Sohn,  Luca Milanesio

I am testing primary and replica Gerrit instances with pull-replication.
They are connected over a WAN.
Gerrit has about 1,800 repositories, and pull-replication usually works well.

However, after the replica was shut down and restarted, it takes a very long time to replicate the refs created while the replica was down (a large repository, 30 GB).

1. Is there a way to measure the time it takes to replicate data through a broker (Kafka)?
When replication goes through the broker, there are no logs in logs/pull-replication_log.

2. Is there a way to shorten the replication time through the broker?

This is my configuration.

Primary : 
Gerrit : v3.5.2
pull-replication  : v3.3.0-245-g2acb33b45c(Version)   3.6.6-SNAPSHOT(Version)
events-kafka : cbf7fe0662(Version) 3.6.6-SNAPSHOT(Version)
lib/events-broker.jar
etc/replication.config
[gerrit]
        autoReload = true
        replicateOnStartup = false
[replication]
        lockErrorMaxRetries = 5
        eventBrokerTopic = gerrit
        consumeStreamEvents = false
        syncRefs = "ALL REFS ASYNC"
        maxApiPayloadSize = 100000000
[remote "replica_lgsi_1"]
        apiUrl = http://xx.xx.xx.xx:xxxx/
        fetch = +refs/*:refs/* 
        createMissingRepositories = true 
        replicateProjectDeletions = true 
        replicatePermissions = true 
        replicateHiddenProjects = true
        mirror = true
        connectionTimeout = 360000
        rescheduleDelay = 15
        replicationDelay = 1
        tagopt = --no-tags

etc/gerrit.config
[plugin "events-kafka"]
    sendAsync = true
    bootstrapServers = 156.147.61.49:9092
    groupId = $INSTANCE_ID
    numberOfSubscribers = 6
    securityProtocol = PLAINTEXT
    pollingIntervalMs = 1000
    enableAutoCommit = true
    autoCommitIntervalMs = 1000
    autoOffsetReset = latest
    sendStreamEvents = true


Replica :
Gerrit : v3.5.2
pull-replication : v3.3.0-245-g2acb33b45c(Version)   3.6.6-SNAPSHOT(Version)
etc/replication.config
[gerrit]
        autoReload = true
        replicateOnStartup = true
[replication]
        consumeStreamEvents = false
        eventBrokerTopic = gerrit
        lockErrorMaxRetries = 5
        syncRefs = "ALL REFS ASYNC"
[remote "primary"]
        url = ssh://aaa...@xxx.xxx.com:xxxx/${name}.git 
        fetch = +refs/*:refs/*
        threads = 400
        mirror = true
        lockErrorMaxRetries = 3
        replicationRetry = 5m
        rescheduleDelay = 15
        replicationDelay = 1
        tagopt = --no-tags


Thank you in advance.

Luca Milanesio

Sep 19, 2023, 5:01:02 PM
to Repo and Gerrit Discussion, Luca Milanesio, Sehen
Hi Sehen,

On 19 Sep 2023, at 15:59, Sehen <ohse...@gmail.com> wrote:

> I am testing primary and replica Gerrit instances with pull-replication.
> They are connected over a WAN.
> Gerrit has about 1,800 repositories, and pull-replication usually works well.

Happy to hear that :-)

> However, after the replica was shut down and restarted, it takes a very long time to replicate the refs created while the replica was down (a large repository, 30 GB).

How many refs do you have?
Typically the catch-up based on the broker messages relies purely on git fetch, which is influenced by the number of refs in both repositories.

> 1. Is there a way to measure the time it takes to replicate data through a broker (Kafka)?

The problem isn't the broker, but the Git protocol and the JGit client/server implementation of it.

> When replication goes through the broker, there are no logs in logs/pull-replication_log.

The events-broker produces messages in the message_log.

> 2. Is there a way to shorten the replication time through the broker?

I've developed some plugins that shorten some of the phases, including the ref advertisement.
See [1] for instance.

We have also submitted many JGit fixes for review that significantly improve execution times. I presented them last year at the Gerrit User Summit, see [2].

HTH

Luca.
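
Editor's note: the question about the number of refs can be answered on each node with git itself. The following is only a minimal sketch, assuming `git` is on the PATH and the bare repository path is known; the function names `list_refs` and `count_refs_by_namespace` are hypothetical helpers, not part of Gerrit or pull-replication.

```python
import subprocess
from collections import Counter


def count_refs_by_namespace(ref_names):
    """Group ref names by their first two path components, e.g. 'refs/changes'."""
    counts = Counter()
    for name in ref_names:
        name = name.strip()
        if not name:
            continue  # skip empty lines in the git output
        namespace = "/".join(name.split("/")[:2])
        counts[namespace] += 1
    return counts


def list_refs(git_dir):
    """Return all ref names of a repository using `git for-each-ref`."""
    result = subprocess.run(
        ["git", f"--git-dir={git_dir}", "for-each-ref", "--format=%(refname)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling `count_refs_by_namespace(list_refs(path))` on the same repository on the primary and on the replica gives a rough picture of how many refs a catch-up fetch has to negotiate, and whether most of them live under `refs/changes`.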



Daniele Sassoli

Sep 19, 2023, 9:25:50 PM
to Repo and Gerrit Discussion
Hi Sehen,

On Tuesday, 19 September 2023 at 14:01:02 UTC-7 Luca Milanesio wrote:

> > 1. Is there a way to measure the time it takes to replicate data through a broker (Kafka)?
>
> The problem isn't the broker, but the Git protocol and the JGit client/server implementation of it.

You might also want to upgrade to the latest version of pull-replication, as a lot of new metrics have been added, as documented [1].
It might give you a lot more insight into what's going on.

Sehen

Sep 20, 2023, 4:28:47 AM
to Repo and Gerrit Discussion
Hi, Luca

Thank you for your kind reply, and I apologize for my lack of knowledge about pull-replication and the broker.
I will try to measure how many refs need to be replicated.
However, neither Gerrit instance (primary or replica) has a 'message_log' file under $Gerrit_home/logs/.
Could you please check whether my systems are misconfigured?

1. The broker (ZooKeeper & Kafka) is installed only on the primary.
2. The events-kafka.jar plugin is installed only on the primary.
3. The [plugin "events-kafka"] config is only in the primary's gerrit.config.
4. events-broker.jar is installed on both the primary and the replica (under lib/).

Thank you in advance.
On Wednesday, September 20, 2023 at 10:25:50 AM UTC+9, Daniele Sassoli wrote:

Matthias Sohn

Sep 20, 2023, 5:46:51 AM
to Sehen, Repo and Gerrit Discussion
On Wed, Sep 20, 2023 at 10:28 AM Sehen <ohse...@gmail.com> wrote:
> However, neither Gerrit instance (primary or replica) has a 'message_log' file under $Gerrit_home/logs/.
> Could you please check whether my systems are misconfigured?
>
> 1. The broker (ZooKeeper & Kafka) is installed only on the primary.
> 2. The events-kafka.jar plugin is installed only on the primary.

If the replica is to use Kafka, you need to install events-kafka.jar on the replica as well.
events-broker only contains the API, not its Kafka implementation.

> 3. The [plugin "events-kafka"] config is only in the primary's gerrit.config.

You will also need the events-kafka config on the replica; otherwise it doesn't know how to connect to Kafka.

> 4. events-broker.jar is installed on both the primary and the replica (under lib/).

Depending on the configuration, the pull-replication plugin is also required on both primary and replica.

Sehen

Sep 20, 2023, 12:25:33 PM
to Repo and Gerrit Discussion
Thank you, Matthias.

My pull-replication is configured one-way, from primary to replica.
pull-replication.jar, events-broker.jar, and events-kafka.jar are now installed on both the primary and the replica.
Brokers are also installed on both Gerrit hosts.

1. Should 'bootstrapServers' in the events-kafka config (in gerrit.config) of both the primary and the replica point to the same broker (the primary's), or should each point to its own broker?
2. Should 'groupId' be different on each node, or the same?
3. Is this javaOptions setting in gerrit.config required?
  -  javaOptions = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" (primary), javaOptions = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5006" (replica)

Even though I have tried many approaches, logs/message_log is still not created on either Gerrit instance.

Thank you.
On Wednesday, September 20, 2023 at 6:46:51 PM UTC+9, Matthias Sohn wrote:

Sehen

Sep 26, 2023, 12:38:18 PM
to Repo and Gerrit Discussion

Dear all & Matthias Sohn,  Luca Milanesio

I'm sorry for asking such detailed questions again.
As I said, I'm testing pull-replication with a broker (events-kafka).
However, I have a couple of concerns about it, and I would appreciate any help.

1. I am concerned that when using the Kafka broker there is a risk of replication in the opposite direction (from replica to primary).
    The broker is installed only on the primary server, and 'bootstrapServers' on both the primary and the replica is set to the same IP and port.
2. Should message_log be created inside Gerrit? I cannot find message_log. The only message log is in the broker container (/var/lib/kafka/data/gerrit/).

Primary's gerrit.config
[plugin "events-kafka"]
    sendAsync = true
    bootstrapServers = $primary_broker:9092
    groupId = primary

    numberOfSubscribers = 6
    securityProtocol = PLAINTEXT
    pollingIntervalMs = 1000
    enableAutoCommit = true
    autoCommitIntervalMs = 1000
    autoOffsetReset = latest
    sendStreamEvents = true
 
Replica's gerrit.config
[plugin "events-kafka"]
    sendAsync = true
    bootstrapServers = $primary_broker:9092
    groupId = replica

    numberOfSubscribers = 6
    securityProtocol = PLAINTEXT
    pollingIntervalMs = 1000
    enableAutoCommit = true
    autoCommitIntervalMs = 1000
    autoOffsetReset = latest
    sendStreamEvents = true
    restApiThreads = 400

On Thursday, September 21, 2023 at 1:25:33 AM UTC+9, Sehen wrote:

Fabio Ponciroli

Sep 26, 2023, 1:16:23 PM
to Sehen, Repo and Gerrit Discussion
Hi Sehen,

On Tue, 26 Sept 2023 at 09:38, Sehen <ohse...@gmail.com> wrote:

> 1. I am concerned that when using the Kafka broker there is a risk of replication in the opposite direction (from replica to primary).
>     The broker is installed only on the primary server, and 'bootstrapServers' on both the primary and the replica is set to the same IP and port.

Replicas are read-only nodes, hence they won't receive any write traffic. Thus replication will only happen from the primary to the secondary node(s).
For the replica to consume messages, you will need the broker to be installed there as well. You can find an example of a primary-replica setup with a broker here [1].
Bear in mind that the data replicated over the broker are only events: indexing, cache eviction, ref-updates, etc. The git data is replicated by the pull-replication plugin via a git fetch and/or apply-object.

> 2. Should message_log be created inside Gerrit? I cannot find message_log. The only message log is in the broker container (/var/lib/kafka/data/gerrit/).

message_log will be in $GERRIT_SITE/logs along with the other log files.
You should have it on both the primary and the replica node.
HTH,
Ponch

P.S.: Please avoid top-posting on this list; instead use the interleaved posting style [2], which makes a conversation easier to follow.

 

Matthias Sohn

Sep 26, 2023, 6:21:39 PM
to Fabio Ponciroli, Sehen, Repo and Gerrit Discussion
On Tue, Sep 26, 2023 at 7:16 PM Fabio Ponciroli <pon...@gmail.com> wrote:
> Bear in mind that the data replicated over the broker are only events: indexing, cache eviction, ref-updates, etc. The git data is replicated by the pull-replication plugin via a git fetch and/or apply-object.

Only events are transmitted via the message broker, since git data is potentially large and
message brokers are typically not designed to transfer bulk data.

Hence git data is replicated via git transport (push or fetch) using the replication or pull-replication plugin.
If a replication task is lost, e.g. due to an error, it can be repeated, since the data to be
replicated is still persisted in the git repo initiating the replication.
  

Sehen

Sep 27, 2023, 8:47:35 PM
to Repo and Gerrit Discussion
Hi Fabio Ponciroli, Matthias Sohn,
Thank you for your replies. I now have some understanding of how pull-replication and brokers work.

However, whenever something changes in Gerrit (e.g. review comments), logs in binary format accumulate inside the broker (/var/lib/kafka/data/gerrit-0/).
But message_log still does not exist in $GERRIT_SITE/logs.
There are only pull-replication_log, error_log, sshd_log, and httpd_log.
Did I miss some settings? As I mentioned earlier, I am using a combination of pull-replication, a broker, and ZooKeeper, and I do not use the multi-site plugin.

If there's anything else you need me to share, please let me know.

Thank you.

On Wednesday, September 27, 2023 at 7:21:39 AM UTC+9, Matthias Sohn wrote:

Fabio Ponciroli

Sep 28, 2023, 11:03:41 AM
to Sehen, Repo and Gerrit Discussion
Hi Sehen,

On Wed, 27 Sept 2023 at 17:47, Sehen <ohse...@gmail.com> wrote:
> But message_log still does not exist in $GERRIT_SITE/logs.
> There are only pull-replication_log, error_log, sshd_log, and httpd_log.
> Did I miss some settings? As I mentioned earlier, I am using a combination of pull-replication, a broker, and ZooKeeper, and I do not use the multi-site plugin.

That's the reason why you don't see the message_log. It is produced by the multi-site plugin, not by the pull-replication plugin.

Thanks,
Ponch

P.S.: Please avoid top-posting on this list; instead use the interleaved posting style [1], which makes a conversation easier to follow.

 

David Åkerman

Sep 29, 2023, 7:28:36 AM
to Repo and Gerrit Discussion
On Thursday, September 28, 2023 at 5:03:41 PM UTC+2 Fabio Ponciroli wrote:
> On Wed, 27 Sept 2023 at 17:47, Sehen <ohse...@gmail.com> wrote:
> > Did I miss some settings? As I mentioned earlier, I am using a combination of pull-replication, a broker, and ZooKeeper, and I do not use the multi-site plugin.

Hi Sehen,

Do you need ZooKeeper? As far as I know, you only need ZooKeeper to solve the split-brain problem that arises in a multi-site setup with multiple primaries.
Hence, ZooKeeper should not be needed when you have only one primary.

Best regards,
David

Sehen

Sep 29, 2023, 12:06:48 PM
to Repo and Gerrit Discussion
Thank you  Ponch,

On Friday, September 29, 2023 at 12:03:41 AM UTC+9, Fabio Ponciroli wrote:
> That's the reason why you don't see the message_log. It is produced by the multi-site plugin, not by the pull-replication plugin.

Until I received your answer, I thought my configuration was wrong and that was why message_log was not being created.
Now I know that the log is not created unless I use the multi-site plugin.
As mentioned earlier, my setup consists only of pull-replication, a Kafka broker, and ZooKeeper. With this configuration, can the replica catch up on the refs it missed while it was offline?
I'm not sure about that.
Also, should I use 'consumeStreamEvents = true' or 'eventBrokerTopic = gerrit' in the pull-replication configuration to avoid missing refs while the replica is offline?

I am referring to the content below [1].

Sehen

Sep 29, 2023, 12:17:59 PM
to Repo and Gerrit Discussion
Thank you David.

On Friday, September 29, 2023 at 8:28:36 PM UTC+9, David Åkerman wrote:
> Do you need ZooKeeper? As far as I know, you only need ZooKeeper to solve the split-brain problem that arises in a multi-site setup with multiple primaries.
> Hence, ZooKeeper should not be needed when you have only one primary.

Sorry, but your suggestion is something I hadn't thought about.
As mentioned earlier, I'm using pull-replication, a Kafka broker, and ZooKeeper to avoid missing refs while the replica is offline.

Fabio Ponciroli

Oct 3, 2023, 8:44:58 PM
to Sehen, Repo and Gerrit Discussion
Hi Sehen,


On Fri, 29 Sept 2023 at 09:06, Sehen <ohse...@gmail.com> wrote:
> As mentioned earlier, my setup consists only of pull-replication, a Kafka broker, and ZooKeeper. With this configuration, can the replica catch up on the refs it missed while it was offline?

Yes.

> I'm not sure about that.
> Also, should I use 'consumeStreamEvents = true' or 'eventBrokerTopic = gerrit' in the pull-replication configuration to avoid missing refs while the replica is offline?

The correct configuration is actually:

[replication]
  consumeStreamEvents = false
  eventBrokerTopic = gerrit

From the documentation [1]:

"When `eventBrokerTopic` is enabled gerrit.instanceId
instead of replication.instanceLabel must be used.

Bear in mind that if consumeStreamEvents is set to true this
parameter will be ignored."

Hope it makes sense.

Ponch


Sehen

Oct 18, 2023, 8:59:53 AM
to Repo and Gerrit Discussion
Thank you for your help, Ponch, Matthias Sohn, Daniele Sassoli, and Luca Milanesio.
With your help, pull-replication (with the broker) is working well. Thank you again.

I have a question. I would like to see how long it takes for replication to complete, but there are two different times (ms) for the same ref in logs/pull_replication_log:
- E2E 5357ms
- completed in 12ms

What is the difference between the two? How can I find out exactly how long replication takes?

---- logs/pull_replication_log -----
[95651bca] Replication from ssh://gerrit.com:29418/abc.git started for refs [refs/changes/70/455070/1] ...
[95651bca] Fetch references [+refs/changes/70/455070/1:refs/changes/70/455070/1] from ssh://gerrit.com:29418/abc.git
[95651bca] Replication from ssh://gerrit.com:29418/abc.git completed in 4140ms, 1000ms delay, 0 retries, E2E 5357ms
Apply object API from abc for primary:refs/changes/70/455070/1 - {commitObject=aaaaabbbb ...
Apply object from primary for abc:refs/changes/70/455070/1 - [{commitObject=aaaaabbbb ...
Apply object from primary for project abc, ref name refs/changes/70/455070/1 completed in 12ms

Thank you.



On Wednesday, October 4, 2023 at 9:44:58 AM UTC+9, Fabio Ponciroli wrote:

Marcin Czech

Oct 18, 2023, 12:49:00 PM
to Sehen, Repo and Gerrit Discussion
Hi Sehen,

> [95651bca] Replication from ssh://gerrit.com:29418/abc.git completed in 4140ms, 1000ms delay, 0 retries, E2E 5357ms

This shows the E2E time for the git fetch operation from the ssh://gerrit.com:29418/abc.git node.

> Apply object from primary for project abc, ref name refs/changes/70/455070/1 completed in 12ms

This shows how long it took to write the data to the repository using apply-object.
If you want to see the E2E time for apply-object, you have to check pull_replication_log on the other node (gerrit.com:29418/).

If you want to understand the difference between apply-object and git fetch, I recommend watching:
https://youtu.be/6rubGQE9v2E?si=BcSF1NgnWK1IAdd8

Marcin
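
Editor's note: the two kinds of timing lines discussed above can be pulled out of the replica's pull_replication_log with a small script. This is only a sketch: the regular expressions are modelled on the log lines quoted in this thread and may need adjusting for other plugin versions, and `parse_timings` is a hypothetical helper name.

```python
import re

# Fetch summary line, e.g.
# "... completed in 4140ms, 1000ms delay, 0 retries, E2E 5357ms"
FETCH_SUMMARY = re.compile(
    r"Replication from (?P<source>\S+) completed in (?P<fetch_ms>\d+)ms, "
    r"(?P<delay_ms>\d+)ms delay, (?P<retries>\d+) retries,\s+E2E (?P<e2e_ms>\d+)ms"
)

# Apply-object line, e.g.
# "Apply object from primary for project abc, ref name <ref> completed in 12ms"
APPLY_SUMMARY = re.compile(
    r"Apply object from (?P<source>\S+) for project (?P<project>\S+), "
    r"ref name (?P<ref>\S+)\s+completed in (?P<apply_ms>\d+)ms"
)


def parse_timings(lines):
    """Return one dict per recognised summary line, with integer millisecond fields."""
    timings = []
    for line in lines:
        fetch = FETCH_SUMMARY.search(line)
        if fetch:
            timings.append({
                "kind": "fetch",
                "source": fetch["source"],
                "fetch_ms": int(fetch["fetch_ms"]),
                "delay_ms": int(fetch["delay_ms"]),
                "retries": int(fetch["retries"]),
                "e2e_ms": int(fetch["e2e_ms"]),
            })
            continue
        applied = APPLY_SUMMARY.search(line)
        if applied:
            timings.append({
                "kind": "apply-object",
                "source": applied["source"],
                "project": applied["project"],
                "ref": applied["ref"],
                "apply_ms": int(applied["apply_ms"]),
            })
    return timings
```

Feeding the log file line by line into `parse_timings` yields separate records for fetch and apply-object, so the two numbers are not confused with each other when summing or averaging.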

Sehen

Oct 20, 2023, 2:55:01 AM
to Repo and Gerrit Discussion
Hi Marcin,
Thanks for helping me understand the difference between fetch and apply-object.

> If you want to see the E2E time for apply-object, you have to check pull_replication_log on the other node (gerrit.com:29418/).

According to your comment, the time (time:11496.518613 ms) in logs/pull_replication_log on the primary (gerrit.com:29418) is the exact, real fetch-and-apply time, right?

---- Replica's logs/pull_replication_log -----

[95651bca] Replication from ssh://gerrit.com:29418/abc.git started for refs [refs/changes/70/455070/1] ...
[95651bca] Fetch references [+refs/changes/70/455070/1:refs/changes/70/455070/1] from ssh://gerrit.com:29418/abc.git
[95651bca] Replication from ssh://gerrit.com:29418/abc.git completed in 4140ms, 1000ms delay, 0 retries,  E2E 5357ms
Apply object API from abc for primary:refs/changes/70/455070/1 - {commitObject=aaaaabbbb ...
Apply object from primary for abc:refs/changes/70/455070/1 - [{commitObject=aaaaabbbb ...
Apply object from primary for project abc, ref name refs/changes/70/455070/1  completed in 12ms

---- Primary's logs/pull_replication_log -----
Pull replication REST API apply object to http://gerrit.com/ COMPLETED for abc:refs/changes/70/455070/1 - [{commitObject=aaaaabbbb ... (COMMIT) treeObject=cccddd... , HTTP Result: OK - time:11496.518613 ms

Thank you in advance.
Sehen




On Thursday, October 19, 2023 at 1:49:00 AM UTC+9, Marcin Czech wrote:

Marcin Czech

Oct 20, 2023, 3:45:55 AM
to Sehen, Repo and Gerrit Discussion
Hi Sehen,

You are correct.

> Pull replication REST API apply object to http://gerrit.com/ COMPLETED for abc:refs/changes/70/455070/1 - [{commitObject=aaaaabbbb ... (COMMIT) treeObject=cccddd... , HTTP Result: OK - time:11496.518613 ms

This shows the E2E time for apply-object. Another option is to use the metrics exposed by pull-replication.

In my opinion, you should look into why apply-object took so long compared to fetch; usually apply-object is 100x faster than fetch. Then you can check why both apply-object and fetch were performed: pull-replication has a cache that helps filter out already replicated refs [1]. Make sure you have the newest version of pull-replication.jar. Also make sure that apply-object is executed before fetch; you can do that by renaming events-kafka.jar to z-events-kafka.jar.


Marcin
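
Editor's note: to follow up on slow apply-object calls, the E2E times can be extracted from the primary's pull_replication_log. A minimal sketch, assuming the line format quoted in this thread; the pattern and the helper name `parse_apply_object_e2e` are illustrative assumptions and may not match other plugin versions.

```python
import re

# Modelled on the primary-side line quoted above:
# "... apply object to <url> COMPLETED for <project>:<ref> - ... HTTP Result: OK - time:11496.518613 ms"
APPLY_OBJECT_E2E = re.compile(
    r"apply object to (?P<target>\S+) COMPLETED for (?P<project>[^:\s]+):(?P<ref>\S+) "
    r".*HTTP Result: (?P<result>\w+) - time:(?P<ms>[\d.]+) ms"
)


def parse_apply_object_e2e(lines):
    """Return (project, ref, http_result, milliseconds) for each completed apply-object call."""
    records = []
    for line in lines:
        match = APPLY_OBJECT_E2E.search(line)
        if match:
            records.append(
                (match["project"], match["ref"], match["result"], float(match["ms"]))
            )
    return records
```

Sorting the returned records by the millisecond field is a quick way to find the refs whose apply-object call took unusually long.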
