Cannot replicate due to TransportException: null


motorhe...@gmail.com

Jan 13, 2024, 2:31:29 PM
to Repo and Gerrit Discussion
We started with an rsync snapshot copy of our Gerrit 3.2.7 environment on a separate test Gerrit master, which we then upgraded to 3.5.2 and then on to 3.6.6. From there we created a set of remote 3.6.6 mirror servers and got them replicating smoothly.

We are trying to streamline this process so that we can migrate production to the Gerrit 3.6 system. To that end, we currently refresh this master with a weekly rsync snapshot from the 3.2.7 environment, using the same original script that reindexes 3.2.7->3.5.2->3.6.6. This process is fairly well understood and has worked well so far. The Gerrit 3.6.6 servers exist in parallel to the production systems. We do not touch the production Gerrit 3.2 servers for this purpose and leave them running throughout this process.

The problem arises when we try to populate the Gerrit 3.6 proxies from the 3.6 master after the weekly snapshot. The methods we have tried so far are as follows:
  • rsync is reliable, but takes too long to complete
  • using replication, we see many jobs rejected with REJECTED_NONFASTFORWARD
  • when we set defaultForceUpdate=true we get TransportException: null (stack trace below), presumably for the same projects that were getting REJECTED_NONFASTFORWARD before
    • rsyncing the affected repos works around this exception so that they replicate smoothly again, but tracking down and correcting the failures takes manual work
Since we perform tests on the Gerrit 3.6 systems during the week and since routine work is ongoing in the production environment, it is expected that they will get out of sync. 
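
One way to reduce the manual work of tracking down failures might be to pull the failing targets out of the replication log. The sketch below assumes every failure is logged on a line shaped like the `Cannot replicate to ...` line in the stack trace below; the name `failed_targets` and the regular expression are illustrative and are not part of Gerrit or the replication plugin.

```python
import re
from collections import Counter

# Assumed line shape, copied from the log excerpt in this thread:
#   [2024-01-13 13:39:09,028] Cannot replicate to <redacted>.git [CONTEXT pushOneId="00f8be0c" ]
FAILURE_LINE = re.compile(r"^\[[^\]]+\] Cannot replicate to (?P<target>\S+)")


def failed_targets(lines):
    """Count 'Cannot replicate to' log lines per target URI."""
    counts = Counter()
    for line in lines:
        match = FAILURE_LINE.match(line)
        if match:
            counts[match.group("target")] += 1
    return counts
```

Passing it the lines of the replication log (any iterable of strings, such as an open file) gives a per-target failure count, which could then drive an rsync of only the affected repositories.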

Is there a way to configure the replication plugin so that the proxies get clobbered by the current state of the master?

Regards,
Robert.

stack trace follows:

[2024-01-13 13:39:09,028] Cannot replicate to <redacted>.git [CONTEXT pushOneId="00f8be0c" ]
org.eclipse.jgit.errors.TransportException: <redacted>.git: null
        at org.eclipse.jgit.transport.BasePackPushConnection.doPush(BasePackPushConnection.java:209)
        at org.eclipse.jgit.transport.BasePackPushConnection.push(BasePackPushConnection.java:139)
        at org.eclipse.jgit.transport.PushProcess.execute(PushProcess.java:179)
        at org.eclipse.jgit.transport.Transport.push(Transport.java:1537)
        at org.eclipse.jgit.transport.Transport.push(Transport.java:1583)
        at com.googlesource.gerrit.plugins.replication.PushOne.pushInBatches(PushOne.java:591)
        at com.googlesource.gerrit.plugins.replication.PushOne.pushVia(PushOne.java:584)
        at com.googlesource.gerrit.plugins.replication.PushOne.runImpl(PushOne.java:555)
        at com.googlesource.gerrit.plugins.replication.PushOne.doRunPushOperation(PushOne.java:437)
        at com.googlesource.gerrit.plugins.replication.PushOne.runPushOperation(PushOne.java:405)
        at com.googlesource.gerrit.plugins.replication.PushOne.lambda$run$2(PushOne.java:391)
        at com.google.gerrit.server.util.RequestScopePropagator.lambda$cleanup$1(RequestScopePropagator.java:186)
        at com.google.gerrit.server.util.RequestScopePropagator.lambda$context$0(RequestScopePropagator.java:174)
        at com.google.gerrit.server.git.PerThreadRequestScope$Propagator.lambda$scope$0(PerThreadRequestScope.java:70)
        at com.googlesource.gerrit.plugins.replication.PushOne.run(PushOne.java:394)
        at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException

Martin Fick

Jan 13, 2024, 2:51:26 PM
to motorhe...@gmail.com, Repo and Gerrit Discussion
On Sat, Jan 13, 2024 at 12:31 PM motorhe...@gmail.com <motorhe...@gmail.com> wrote:
Is there a way to configure the replication plugin so that the proxies get clobbered by the current state of the master?


remote.NAME.mirror : If true, replication will remove remote branches that are absent locally or invisible to the replication (for example read access denied via authGroup option). 
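
For reference, a replication.config fragment combining this option with the forced-update setting mentioned earlier in the thread might look like the following. This is only a sketch: the remote name `mirror-example`, the host, and the path are placeholders, not values from the poster's setup. `mirror` belongs to the remote section, while `defaultForceUpdate` belongs to the `gerrit` section.

```
[gerrit]
	defaultForceUpdate = true

[remote "mirror-example"]
	# Placeholder URL; ${name} is replaced with the project name.
	url = gerrit@mirror.example.com:/path/to/repositories/${name}.git
	mirror = true
```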

-Martin

motorhe...@gmail.com

Jan 13, 2024, 4:11:38 PM
to Repo and Gerrit Discussion
Thank you for the link, Martin; that is helpful.
We already set mirror=true for these replication targets. Should we still be getting TransportException: null?

Martin Fick

Jan 17, 2024, 2:07:00 PM
to motorhe...@gmail.com, Repo and Gerrit Discussion
On Sat, Jan 13, 2024 at 2:11 PM motorhe...@gmail.com <motorhe...@gmail.com> wrote:
Thank you for the link, Martin; that is helpful.
We already set mirror=true for these replication targets. Should we still be getting TransportException: null?

Definitely not. I am not sure what would cause that. Since you have a stack trace, perhaps you can look at the code to see what is null?

-Martin

motorhe...@gmail.com

Jan 31, 2024, 5:14:18 PM
to Repo and Gerrit Discussion
We have inspected the source code for
Gerrit v3.6.8 (f82751c957b3a3576df9d03f83d2f5a5e4104936)
 - Replication plugin (47ee3dab0dd96900e85662adf0d5f48a33d17733)
 - JGit (82e277c813398c9f519f16e83d080a94fa29a27c)

From what we can tell, the TransportException is caused by an underlying NullPointerException that occurs at org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/pack/DeltaWindow.java#206:

    ObjectToPack srcObj = bestBase.object;
    ObjectToPack resObj = res.object;
    if (srcObj.isEdge()) // Here, meaning bestBase.object is null?

Does this help?

Regards,
Robert.


org.eclipse.jgit.errors.TransportException: gerritcr@<affected_mirror_server>:/home/gerritcr/repositories/<affected_project>.git: null

        at org.eclipse.jgit.transport.BasePackPushConnection.doPush(BasePackPushConnection.java:209)
        at org.eclipse.jgit.transport.BasePackPushConnection.push(BasePackPushConnection.java:139)
        at org.eclipse.jgit.transport.PushProcess.execute(PushProcess.java:179)
        at org.eclipse.jgit.transport.Transport.push(Transport.java:1537)
        at org.eclipse.jgit.transport.Transport.push(Transport.java:1583)
        at com.googlesource.gerrit.plugins.replication.PushOne.pushInBatches(PushOne.java:591)
        at com.googlesource.gerrit.plugins.replication.PushOne.pushVia(PushOne.java:584)
        at com.googlesource.gerrit.plugins.replication.PushOne.runImpl(PushOne.java:555)
        at com.googlesource.gerrit.plugins.replication.PushOne.doRunPushOperation(PushOne.java:437)
        at com.googlesource.gerrit.plugins.replication.PushOne.runPushOperation(PushOne.java:405)
        at com.googlesource.gerrit.plugins.replication.PushOne.lambda$run$2(PushOne.java:391)
        at com.google.gerrit.server.util.RequestScopePropagator.lambda$cleanup$1(RequestScopePropagator.java:186)
        at com.google.gerrit.server.util.RequestScopePropagator.lambda$context$0(RequestScopePropagator.java:174)
        at com.google.gerrit.server.git.PerThreadRequestScope$Propagator.lambda$scope$0(PerThreadRequestScope.java:70)
        at com.googlesource.gerrit.plugins.replication.PushOne.run(PushOne.java:394)
        at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException
        at org.eclipse.jgit.internal.storage.pack.DeltaWindow.searchInWindow(DeltaWindow.java:206)
        at org.eclipse.jgit.internal.storage.pack.DeltaWindow.search(DeltaWindow.java:144)
        at org.eclipse.jgit.internal.storage.pack.DeltaTask.runWindow(DeltaTask.java:295)
        at org.eclipse.jgit.internal.storage.pack.DeltaTask.call(DeltaTask.java:270)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        ... 3 more

motorhe...@gmail.com

Feb 8, 2024, 4:18:58 PM
to Repo and Gerrit Discussion
Ping. Can anyone advise where this null pointer might be originating?

On a separate note, I tried cloning an affected repo from a proxy after seeing this error, and found that the affected change was present after all, even though the replication log stated that the replication was "cancelled after maximum number of retries". In that case perhaps we can safely ignore this error, but that is one hell of a stretch when the log states that replication was cancelled. This is why we *really* want to know.

One of my concerns is that the repos might be somehow corrupted by the migration process (upgrade 3.2->3.5->3.6) or that the replication can be genuinely affected under specific circumstances that we don't know of. I am deathly afraid to roll Gerrit 3.6 out to production in this state without having any answer to why this is happening.

Can anyone suggest a tool (or set of tools) that would check the integrity of a Gerrit repo to make sure it is internally consistent? Maybe that would at least let us safely ignore these exceptions.

Any advice appreciated.

Regards,
Robert.

Sven Selberg

Feb 9, 2024, 1:48:26 AM
to Repo and Gerrit Discussion
On Thursday, February 8, 2024 at 10:18:58 PM UTC+1 motorhe...@gmail.com wrote:
Ping. Can anyone advise where this null pointer might be originating?

On a separate note, I tried cloning an affected repo from a proxy after seeing this error, and found that the affected change was present after all, even though the replication log stated that the replication was "cancelled after maximum number of retries". In that case perhaps we can safely ignore this error, but that is one hell of a stretch when the log states that replication was cancelled. This is why we *really* want to know.

One of my concerns is that the repos might be somehow corrupted by the migration process (upgrade 3.2->3.5->3.6) or that the replication can be genuinely affected under specific circumstances that we don't know of. I am deathly afraid to roll Gerrit 3.6 out to production in this state without having any answer to why this is happening.

Can anyone suggest a tool (or set of tools) that would check the integrity of a Gerrit repo to make sure it is internally consistent? Maybe that would at least let us safely ignore these exceptions.

I'm guessing that `git fsck` is what you are after.
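
As a rough sketch of how that check could be run across a whole site, the script below walks a repository root, treats every directory ending in `.git` as a bare repository, and runs `git fsck --no-dangling` on each. The helper names (`find_bare_repos`, `fsck_repo`, `check_all`) and the directory layout are assumptions for illustration; only the `git fsck` invocation itself is standard Git.

```python
import os
import subprocess
from pathlib import Path


def find_bare_repos(root):
    """Yield every directory ending in .git under root, without descending into it."""
    for dirpath, dirnames, _ in os.walk(root):
        repos = sorted(d for d in dirnames if d.endswith(".git"))
        # Prune repositories from the walk so their internals are not scanned.
        dirnames[:] = [d for d in dirnames if not d.endswith(".git")]
        for name in repos:
            yield Path(dirpath) / name


def fsck_repo(repo, run=subprocess.run):
    """Run git fsck on one repository; return (ok, combined output)."""
    result = run(
        ["git", f"--git-dir={repo}", "fsck", "--no-dangling"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, (result.stdout + result.stderr).strip()


def check_all(root, run=subprocess.run):
    """Return {repo path: fsck output} for every repository where fsck failed."""
    failures = {}
    for repo in find_bare_repos(root):
        ok, output = fsck_repo(repo, run)
        if not ok:
            failures[str(repo)] = output
    return failures
```

Calling `check_all` with the path to the site's repository directory returns a dict of failing repositories; an empty dict means `git fsck` exited cleanly for every repository it found.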