Got internal server error (user xxx account xxx) during git-upload-pack

265 views
Skip to first unread message

Makson Lee

unread,
Jun 23, 2022, 11:41:41 PM6/23/22
to Repo and Gerrit Discussion
recently we upgraded our gerrit server from 3.5.x to 3.6.1, then we need to restart the server each a few days, otherwise, we will get the follow error while repo sync,

[2022-06-23T18:23:28.784+08:00] [SSH git-upload-pack /platform/prebuilts/clang/host/linux-x86 (tedwu)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxx account xxx) during git-upload-pack '/platform/prebuilts/clang/host/linux-x86'
org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=388, recipient=30]-ServerSessionImpl[xxx@/172.17.100.22:40114]] SSH_MSG_CHANNEL_DATA) len=65520 - channel already closed
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:132)
        at org.eclipse.jgit.transport.UploadPack$ResponseBufferedOutputStream.write(UploadPack.java:2476)
        at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:141)
        at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:120)
        at org.eclipse.jgit.internal.storage.io.CancellableDigestOutputStream.write(CancellableDigestOutputStream.java:110)
        at org.eclipse.jgit.internal.storage.file.Pack.copyAsIs2(Pack.java:568)
        at org.eclipse.jgit.internal.storage.file.Pack.copyAsIs(Pack.java:388)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:188)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1768)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1745)
        at org.eclipse.jgit.internal.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:85)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.writeObjects(WindowCursor.java:196)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1733)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1721)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writePack(PackWriter.java:1270)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:2417)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:2246)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:1100)
        at org.eclipse.jgit.transport.UploadPack.uploadWithExceptionPropagation(UploadPack.java:843)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:757)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:101)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:109)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:74)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:491)
        at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833) 

Makson Lee

unread,
Jun 23, 2022, 11:47:13 PM6/23/22
to Repo and Gerrit Discussion
and here is part of our gerrit.config,

[container]
        user = gerrit2
        javaHome = /usr/lib/jvm/java-17-openjdk-amd64
        heapLimit = 48g
        javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
        javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
        javaOptions = "--add-opens java.base/java.net=ALL-UNNAMED"
        slave = true
        daemonOpt = --enable-httpd
[core]
        packedGitLimit = 24g
        packedGitWindowSize = 16k
        packedGitOpenFiles = 2048
[sshd]
        listenAddress = *:29418
        threads = 40
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776

Luca Milanesio

unread,
Jun 24, 2022, 4:00:36 AM6/24/22
to Repo and Gerrit Discussion, Luca Milanesio, Makson Lee

On 24 Jun 2022, at 04:41, Makson Lee <cdle...@gmail.com> wrote:

recently we upgraded our gerrit server from 3.5.x to 3.6.1, then we need to restart the server each a few days, otherwise, we will get the follow error while repo sync,

[2022-06-23T18:23:28.784+08:00] [SSH git-upload-pack /platform/prebuilts/clang/host/linux-x86 (tedwu)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxx account xxx) during git-upload-pack '/platform/prebuilts/clang/host/linux-x86'
org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=388, recipient=30]-ServerSessionImpl[xxx@/172.17.100.22:40114]] SSH_MSG_CHANNEL_DATA) len=65520 - channel already closed

This says that the SSH connection has been closed, nothing more.

Do you collect metrics?
Can you check the SSH thread pool utilisation over time? 
Can you check your SSH log and see the execution timings?

Luca.


--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/433f95ff-4523-4790-9b80-42d7fcb97dabn%40googlegroups.com.

Makson Lee

unread,
Jun 24, 2022, 4:10:20 AM6/24/22
to Repo and Gerrit Discussion
On Friday, June 24, 2022 at 4:00:36 PM UTC+8 lucamilanesio wrote:

On 24 Jun 2022, at 04:41, Makson Lee <cdle...@gmail.com> wrote:

recently we upgraded our gerrit server from 3.5.x to 3.6.1, then we need to restart the server each a few days, otherwise, we will get the follow error while repo sync,

[2022-06-23T18:23:28.784+08:00] [SSH git-upload-pack /platform/prebuilts/clang/host/linux-x86 (tedwu)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxx account xxx) during git-upload-pack '/platform/prebuilts/clang/host/linux-x86'
org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=388, recipient=30]-ServerSessionImpl[xxx@/172.17.100.22:40114]] SSH_MSG_CHANNEL_DATA) len=65520 - channel already closed

This says that the SSH connection has been closed, nothing more.

Do you collect metrics?
Can you check the SSH thread pool utilisation over time? 
Can you check your SSH log and see the execution timings?

Can you give me some instructions about how to collect these? 

Luca Milanesio

unread,
Jun 24, 2022, 4:48:16 AM6/24/22
to Repo and Gerrit Discussion, Luca Milanesio, Makson Lee

On 24 Jun 2022, at 09:10, Makson Lee <cdle...@gmail.com> wrote:



On Friday, June 24, 2022 at 4:00:36 PM UTC+8 lucamilanesio wrote:

On 24 Jun 2022, at 04:41, Makson Lee <cdle...@gmail.com> wrote:

recently we upgraded our gerrit server from 3.5.x to 3.6.1, then we need to restart the server each a few days, otherwise, we will get the follow error while repo sync,

[2022-06-23T18:23:28.784+08:00] [SSH git-upload-pack /platform/prebuilts/clang/host/linux-x86 (tedwu)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxx account xxx) during git-upload-pack '/platform/prebuilts/clang/host/linux-x86'
org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=388, recipient=30]-ServerSessionImpl[xxx@/172.17.100.22:40114]] SSH_MSG_CHANNEL_DATA) len=65520 - channel already closed

This says that the SSH connection has been closed, nothing more.

Do you collect metrics?

See the metrics-reporter plugin at [1].

Can you check the SSH thread pool utilisation over time? 

The two metrics:
queue_ssh_batch_worker_active_threads
queue_ssh_interactive_worker_active_threads

Can you check your SSH log and see the execution timings?


Can you give me some instructions about how to collect these? 

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Makson Lee

unread,
Jun 24, 2022, 4:56:31 AM6/24/22
to Repo and Gerrit Discussion
i will try it, thanks.

Matthias Sohn

unread,
Jun 24, 2022, 6:04:47 AM6/24/22
to Luca Milanesio, Repo and Gerrit Discussion, Makson Lee
On Fri, Jun 24, 2022 at 10:48 AM Luca Milanesio <luca.mi...@gmail.com> wrote:


On 24 Jun 2022, at 09:10, Makson Lee <cdle...@gmail.com> wrote:



On Friday, June 24, 2022 at 4:00:36 PM UTC+8 lucamilanesio wrote:

On 24 Jun 2022, at 04:41, Makson Lee <cdle...@gmail.com> wrote:

recently we upgraded our gerrit server from 3.5.x to 3.6.1, then we need to restart the server each a few days, otherwise, we will get the follow error while repo sync,

[2022-06-23T18:23:28.784+08:00] [SSH git-upload-pack /platform/prebuilts/clang/host/linux-x86 (tedwu)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxx account xxx) during git-upload-pack '/platform/prebuilts/clang/host/linux-x86'
org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=388, recipient=30]-ServerSessionImpl[xxx@/172.17.100.22:40114]] SSH_MSG_CHANNEL_DATA) len=65520 - channel already closed

This says that the SSH connection has been closed, nothing more.

Do you collect metrics?

See the metrics-reporter plugin at [1].

Can you check the SSH thread pool utilisation over time? 

The two metrics:
queue_ssh_batch_worker_active_threads
queue_ssh_interactive_worker_active_threads

You may consider to use the gerrit monitoring setup which gives you a nice collection of 
Grafana dashboards including thread pool metrics

 

Can you check your SSH log and see the execution timings?

See [2] for the details of the sshd_log file data.

HTH

Luca.



Can you give me some instructions about how to collect these? 

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/9d67ce36-2eb4-48aa-928b-e201adde2445n%40googlegroups.com.

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.

Makson Lee

unread,
Jun 24, 2022, 6:29:12 AM6/24/22
to Repo and Gerrit Discussion
On Friday, June 24, 2022 at 6:04:47 PM UTC+8 Matthias Sohn wrote:
You may consider to use the gerrit monitoring setup which gives you a nice collection of 
Grafana dashboards including thread pool metrics

I will try it too, thanks. 

Makson Lee

unread,
Jun 24, 2022, 12:07:50 PM6/24/22
to Repo and Gerrit Discussion
On Friday, June 24, 2022 at 4:00:36 PM UTC+8 lucamilanesio wrote:

On 24 Jun 2022, at 04:41, Makson Lee <cdle...@gmail.com> wrote:

recently we upgraded our gerrit server from 3.5.x to 3.6.1, then we need to restart the server each a few days, otherwise, we will get the follow error while repo sync,

[2022-06-23T18:23:28.784+08:00] [SSH git-upload-pack /platform/prebuilts/clang/host/linux-x86 (tedwu)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxx account xxx) during git-upload-pack '/platform/prebuilts/clang/host/linux-x86'
org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=388, recipient=30]-ServerSessionImpl[xxx@/172.17.100.22:40114]] SSH_MSG_CHANNEL_DATA) len=65520 - channel already closed

This says that the SSH connection has been closed, nothing more.

maybe i have misunderstood the problem, our user indicated that the repo init command will just hang there, but i can't find any related error message.
 

Thomas Wolf

unread,
Jun 24, 2022, 7:52:39 PM6/24/22
to Repo and Gerrit Discussion
On Friday, June 24, 2022 at 5:47:13 AM UTC+2 cdle...@gmail.com wrote:
and here is part of our gerrit.config,
[...]
[sshd]
[...]
        rekeyTimeLimit = 0 
        rekeyBytesLimit = 1099511627776

Don't know if this might be part of the problem. You are trying to avoid re-keying (that rekeyBytesLimit is 1TB). Presumably /platform/prebuilts/clang/host/linux-x86 is a large repository with more than 1GB of data transferred ?

I'm not sure this is effective; Apache MINA sshd has two more settings: rekeyPacketsLimit (default 2^31), and rekeyBlocksLimit (default 0, which means auto-computed). rekeyBlocksLimit is in measures of cipher blocks. For the chacha20-poly1305 cipher chosen by default with Apache MINA sshd 2.8.0 it is auto-computed to re-key every 1GB of data. For other ciphers (like the AES ciphers, which have a block size of 16 bytes) the default comes out as 2^36 bytes (2^32 blocks).

Gerrit doesn't have a property to set the rekeyBlocksLimit. OpenSSH has this to say about it: "ChaCha20 must never reuse a {key, nonce} for encryption nor may it be used to encrypt more than 2^70 bytes under the same {key, nonce}. The SSH Transport protocol (RFC4253) recommends a far more conservative rekeying every 1GB of data sent or received. If this recommendation is followed, then chacha20-poly1305[] requires no special handling in this area."[2] The "nonce" is the packet sequence number. See also RFC 4344.[3] OpenSSH chooses 1GB for chacha20-poly1305. From the aforementioned statement it appears that a much larger value would be feasible provided that that the rekeyPacketsLimit is not more than 2^31.

I also don't see chacha20-poly1305 mentioned in the Gerrit 3.6.1 docs at [1], even though it is supported by Apache MINA sshd 2.8.0.

You could try disabling the chacha20-poly1305 cipher via the sshd.cipher Gerrit configuration, so that one of the AES ciphers with a higher auto-computed rekey limit would be used.

If the fetch just hangs, as you wrote, it's perhaps because of issue 12758 [4] (which is SSHD-966 [5]), which is indeed about problems with re-keying. It should be fixed in Apache MINA sshd 2.9.0.

Makson Lee

unread,
Jun 24, 2022, 8:55:03 PM6/24/22
to Repo and Gerrit Discussion
On Saturday, June 25, 2022 at 7:52:39 AM UTC+8 Thomas Wolf wrote:
You could try disabling the chacha20-poly1305 cipher via the sshd.cipher Gerrit configuration, so that one of the AES ciphers with a higher auto-computed rekey limit would be used.

so a sshd.cipher setting like following will disable all ciphers including chacha20-poly1305 except the AES-CTR modes?

[sshd]
        listenAddress = *:29418
        threads = 40
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776
        cipher = +aes128-ctr,+aes192-ctr,+aes256-ctr,+aes128-cbc,+aes192-cbc,+aes256-cbc 

Makson Lee

unread,
Jun 24, 2022, 9:56:46 PM6/24/22
to Repo and Gerrit Discussion
sorry, fixed the wrong cipher setting,

[sshd]
        listenAddress = *:29418
        threads = 40
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776
        cipher = -chacha20-poly1305

Thomas Wolf

unread,
Jun 25, 2022, 2:53:45 AM6/25/22
to Repo and Gerrit Discussion
On Saturday, June 25, 2022 at 3:56:46 AM UTC+2 cdle...@gmail.com wrote:
sorry, fixed the wrong cipher setting,

[sshd]
        listenAddress = *:29418
        threads = 40
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776
        cipher = -chacha20-poly1305

The name of the cipher is "chacha20-poly1305", followed by an "@", followed by "openssh.com". This discussion board mangles this because it looks like an e-mail address.

If you can post a jstack -l -e thread dump of your Gerrit server when such a fetch hangs, I could take a look to see if it might indeed by issue 12758.

David Ostrovsky

unread,
Jun 25, 2022, 7:45:07 AM6/25/22
to Repo and Gerrit Discussion
Thomas Wolf schrieb am Samstag, 25. Juni 2022 um 01:52:39 UTC+2:
On Friday, June 24, 2022 at 5:47:13 AM UTC+2 cdle...@gmail.com wrote:
and here is part of our gerrit.config,
[...]
[sshd]
[...]
        rekeyTimeLimit = 0 
        rekeyBytesLimit = 1099511627776

Don't know if this might be part of the problem. You are trying to avoid re-keying (that rekeyBytesLimit is 1TB). Presumably /platform/prebuilts/clang/host/linux-x86 is a large repository with more than 1GB of data transferred ?

I'm not sure this is effective; Apache MINA sshd has two more settings: rekeyPacketsLimit (default 2^31), and rekeyBlocksLimit (default 0, which means auto-computed). rekeyBlocksLimit is in measures of cipher blocks. For the chacha20-poly1305 cipher chosen by default with Apache MINA sshd 2.8.0 it is auto-computed to re-key every 1GB of data. For other ciphers (like the AES ciphers, which have a block size of 16 bytes) the default comes out as 2^36 bytes (2^32 blocks).

Gerrit doesn't have a property to set the rekeyBlocksLimit. OpenSSH has this to say about it: "ChaCha20 must never reuse a {key, nonce} for encryption nor may it be used to encrypt more than 2^70 bytes under the same {key, nonce}. The SSH Transport protocol (RFC4253) recommends a far more conservative rekeying every 1GB of data sent or received. If this recommendation is followed, then chacha20-poly1305[] requires no special handling in this area."[2] The "nonce" is the packet sequence number. See also RFC 4344.[3] OpenSSH chooses 1GB for chacha20-poly1305. From the aforementioned statement it appears that a much larger value would be feasible provided that that the rekeyPacketsLimit is not more than 2^31.

I also don't see chacha20-poly1305 mentioned in the Gerrit 3.6.1 docs at [1], even though it is supported by Apache MINA sshd 2.8.0.

Thanks for flagging this, should be fixed in: [6].

  

You could try disabling the chacha20-poly1305 cipher via the sshd.cipher Gerrit configuration, so that one of the AES ciphers with a higher auto-computed rekey limit would be used.

If the fetch just hangs, as you wrote, it's perhaps because of issue 12758 [4] (which is SSHD-966 [5]), which is indeed about problems with re-keying. It should be fixed in Apache MINA sshd 2.9.0.

To add to this, I uploaded this pending change: [7], with the update
to upcoming sshd 2.9.0, with the fix to 12758 [4]. I would be great
if someone could try running gerrit from this change to confirm it
indeed fixed the deadlock during key exchange protocol.

Makson Lee

unread,
Jun 25, 2022, 10:47:30 AM6/25/22
to Repo and Gerrit Discussion
On Saturday, June 25, 2022 at 2:53:45 PM UTC+8 Thomas Wolf wrote:
On Saturday, June 25, 2022 at 3:56:46 AM UTC+2 cdle...@gmail.com wrote:
sorry, fixed the wrong cipher setting,

[sshd]
        listenAddress = *:29418
        threads = 40
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776
        cipher = -chacha20-poly1305

The name of the cipher is "chacha20-poly1305", followed by an "@", followed by "openssh.com". This discussion board mangles this because it looks like an e-mail address.
 
I have fixed it with the right cipher name, thanks. 

 
If you can post a jstack -l -e thread dump of your Gerrit server when such a fetch hangs, I could take a look to see if it might indeed by issue 12758.

 
Will provide it to you if a fetch hangs again. 

Makson Lee

unread,
Jun 30, 2022, 7:55:46 PM6/30/22
to Repo and Gerrit Discussion
Just for your information, no fetch hangs again since we disabled cipher chacha20-poly1305 five days ago.
Reply all
Reply to author
Forward
0 new messages