Gerrit checkout/checkin "Internal server error" after upgrading to version 3.8

sunyl...@gmail.com

Mar 14, 2024, 6:13:17 AM
to Repo and Gerrit Discussion
Can anyone help?

We have a Gerrit server at our headquarters. After we upgraded from 2.12 to 3.8.3, "git fetch" from other regional offices often fails with "Internal server error". The failures occur at random.

We found that the problem is most likely to occur when the download speed is below 150 KB/s. Currently it mainly affects our European offices and is less common in other regions.

We have checked the affected repositories and found no problems with them. Is there a parameter we can adjust to solve this?

The other issue is that after Gerrit has been running for a while, say a month or so, "git push" and "git pull" often get stuck. We then have to restart the Gerrit service to reset all connections, which fixes the problem only temporarily. Do you have any suggestions on how to deal with this?

Any advice is appreciated!

   [2024-03-13T05:54:54.800-07:00] [SSH git-upload-pack /cv/eva/cvtask/DeepOF/DeepOF (useri)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user useri account 1000102) during git-upload-pack '/cv/eva/cvtask/DeepOF/DeepOF'
[2024-03-13T14:00:58.733-07:00] [SSH git-upload-pack /cv/cv_libraries (qaauto)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user qaauto account 1000225) during git-upload-pack '/cv/cv_libraries'

collecting garbage for "cv/eva/cvtask/DeepOF/DeepOF":
Pack refs:              100% (50/50)
Counting objects:       787
Finding sources:        100% (787/787)
Getting sizes:          100% (271/271)
Compressing objects:    100% (217940/217943)
Writing objects:        100% (787/787)
Selecting commits:      100% (58/58)
Building bitmaps:       100% (58/58)
Finding sources:        100% (9/9)
Getting sizes:          100% (6/6)
Compressing objects:    100% (443/443)
Writing objects:        100% (9/9)
Prune loose objects also found in pack files: 100% (11/11)
Prune loose, unreferenced objects: 100% (11/11)
done.

TRACE_ID: 1710409836914-8bac8a0d

collecting garbage for "cv/cv_libraries":
Pack refs:              100% (768/768)
Counting objects:       13257
Finding sources:        100% (13257/13257)
Getting sizes:          100% (1059/1059)
Compressing objects:    100% (10232/10243)
Writing objects:        100% (13257/13257)
Selecting commits:      100% (2724/2724)
Building bitmaps:       100% (171/171)
Finding sources:        100% (43/43)
Getting sizes:          100% (12/12)
Compressing objects:    100% (58953/58953)
Writing objects:        100% (43/43)
Prune loose objects also found in pack files: 100% (2/2)
Prune loose, unreferenced objects: 100% (2/2)
done.

TRACE_ID: 1710409679705-8bac8a0d
=================================================================
gerrit.config:
[sendemail]
        smtpServer = localhost
[container]
        user = gerrit
        javaHome = /usr/lib/jvm/java-11-openjdk-amd64
        javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
        javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
        listenAddress = *:29418
        maxConnectionsPerUser = 0
[httpd]
        listenUrl = proxy-https://127.0.0.1:8081/
[cache]
        directory = cache
[index]
        type = lucene
[receive]
        enableSignedPush = false
[core]
        packedGitLimit = 4g
        packedGitWindowSize = 16k
        packedGitOpenFiles = 2048
[lfs]
        plugin = lfs
[gitweb]
        type = gitweb
        cgi = /usr/lib/cgi-bin/gitweb.cgi

Matthias Sohn

Mar 14, 2024, 7:45:13 AM
to sunyl...@gmail.com, Repo and Gerrit Discussion
On Thu, Mar 14, 2024 at 11:13 AM sunyl...@gmail.com <sunyl...@gmail.com> wrote:
Can anyone help?
We have a Gerrit server at our headquarters. After we upgraded from 2.12 to 3.8.3, "git fetch" from other regional offices often fails with "Internal server error". The failures occur at random.
We found that the problem is most likely to occur when the download speed is below 150 KB/s. Currently it mainly affects our European offices and is less common in other regions.
We have checked the affected repositories and found no problems with them. Is there a parameter we can adjust to solve this?

Check the server's error_log. It should contain stack traces with details about what failed and why.
 
The other issue is that after Gerrit has been running for a while, say a month or so, "git push" and "git pull" often get stuck. We then have to restart the Gerrit service to reset all connections, which fixes the problem only temporarily. Do you have any suggestions on how to deal with this?

Do you run git gc on all repositories served by the Gerrit server at a regular interval?
If a request is stuck, create a couple of thread dumps on the server to get insight into where it is stuck.
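
A regular gc can be scheduled in gerrit.config itself. The fragment below is a minimal sketch, assuming Gerrit's built-in gc section is used; the start time and interval are illustrative values, not a recommendation from this thread.

```ini
[gc]
        # Illustrative schedule: first run on Saturday at 02:00 server time, then weekly.
        startTime = Sat 02:00
        interval = 1 week
```

A quiet time window is usually preferable, since gc competes with git requests for CPU and I/O.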
 

sunyl...@gmail.com

Mar 15, 2024, 4:30:52 AM
to Repo and Gerrit Discussion
Hi Matthias,

Thanks a lot for replying.

Here is the error message on the user's side:
-------------------------------------------------------------------
remote: Counting objects: 117647, done
remote: Finding sources: 100% (117647/117647)
fatal: internal server error
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

=============================================================

Here is the server's error log.

Unfortunately, I can't fully understand this error, so we may need your help and experience to analyze it.

     [2024-03-14T09:30:41.418-07:00] [SSH git-upload-pack /cv/arm_scheduler (qaauto)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user qaauto account 1000225) during git-upload-pack '/cv/arm_scheduler'
org.apache.sshd.common.channel.WindowClosedException: Already closed: RemoteWindow[server](ChannelSession[id=3, recipient=6]-ServerSessionImpl[qaauto:59056])
        at org.apache.sshd.common.channel.RemoteWindow.waitForCondition(RemoteWindow.java:230)
        at org.apache.sshd.common.channel.RemoteWindow.waitForSpace(RemoteWindow.java:187)
        at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:278)
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:201)
        at org.eclipse.jgit.transport.UploadPack$ResponseBufferedOutputStream.write(UploadPack.java:2553)
        at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:141)
        at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:120)
        at org.eclipse.jgit.internal.storage.io.CancellableDigestOutputStream.write(CancellableDigestOutputStream.java:110)
        at org.eclipse.jgit.internal.storage.file.ByteArrayWindow.write(ByteArrayWindow.java:58)
        at org.eclipse.jgit.internal.storage.file.Pack.copyAsIs2(Pack.java:535)
        at org.eclipse.jgit.internal.storage.file.Pack.copyAsIs(Pack.java:388)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:196)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1813)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeBase(PackWriter.java:1855)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1806)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeBase(PackWriter.java:1855)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1806)
        Suppressed: org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=3, recipient=6]-ServerSessionImpl[qaauto@]] SSH_MSG_CHANNEL_DATA) len=26 - channel already closed
                at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:146)
                at org.eclipse.jgit.transport.UploadPack$ResponseBufferedOutputStream.write(UploadPack.java:2553)
                at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:141)
                at org.eclipse.jgit.transport.SideBandOutputStream.flushBuffer(SideBandOutputStream.java:94)
                at org.eclipse.jgit.transport.SideBandOutputStream.flush(SideBandOutputStream.java:100)
                at org.eclipse.jgit.transport.UploadPack$SideBandErrorWriter.writeError(UploadPack.java:2586)
                at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:799)
                ... 12 more
    ==================================================================================
     Do you run git gc on all repositories served by the Gerrit server at a regular interval?
If a request is stuck, create a couple of thread dumps on the server to get insight into where it is stuck.
    ===============
We are using a VM with 8 virtual CPU cores and 64 GB of RAM. What would be a reasonable thread count for us? Our peak load will be over 200 tasks. I have read an earlier post on this topic but did not find a clear answer.
Many people say that the SSHD version is the cause of this problem. Is there a more definitive explanation?

Matthias Sohn

Mar 15, 2024, 4:50:29 AM
to sunyl...@gmail.com, Repo and Gerrit Discussion
This looks like the client disconnected, possibly due to a timeout.
Check the timeout and keepalive configuration of the failing SSH clients.
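
On the client side, keepalives can be set per user in git's own configuration. The fragment below is a minimal sketch, assuming OpenSSH is the SSH client and a git version that supports core.sshCommand; the interval and count are illustrative values, not taken from this thread.

```ini
[core]
        # Send an SSH keepalive every 30 s; give up after 4 unanswered probes.
        sshCommand = ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=4
```

The same two options can instead go under a Host entry in ~/.ssh/config. Whether keepalives help depends on what is actually closing the connection.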
 
    ==================================================================================
     Do you run git gc on all repositories served by the Gerrit server at a regular interval?
If a request is stuck, create a couple of thread dumps on the server to get insight into where it is stuck.
    ===============
We are using a VM with 8 virtual CPU cores and 64 GB of RAM. What would be a reasonable thread count for us? Our peak load will be over 200 tasks. I have read an earlier post on this topic but did not find a clear answer.

As a rule of thumb, you can run 1-2 concurrent git requests per CPU core.
This means that with 8 cores you should not expect to run 200 concurrent git requests on a machine of that size.
Set sshd.threads to at most 16 on this machine to ensure you don't overload it.
The memory needed to process these requests typically depends on the size of the repositories they run on.
Cloning a large repository can keep a core busy for many minutes.
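
In gerrit.config this corresponds to the threads setting in the existing [sshd] section. A minimal sketch based on the 8-core figure above; only the threads line is new, the other two lines are from the configuration posted earlier in this thread.

```ini
[sshd]
        listenAddress = *:29418
        maxConnectionsPerUser = 0
        # Upper bound from the 1-2 requests per core rule of thumb: 2 x 8 cores.
        threads = 16
```

Gerrit needs a restart to pick up the changed value.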
 

sunyl...@gmail.com

Mar 22, 2024, 2:22:53 AM
to Repo and Gerrit Discussion
Matthias,

Thanks a lot for your help.

After double-checking, we confirmed that the root cause of this issue is mainly the download speed on the client-side machines.

We resolved the network congestion on the client side and then enabled "bbr" on the Gerrit cluster servers, which greatly improved the situation.

Thanks again for your great help.

The sysctl settings we applied:

net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
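
To keep these two settings across reboots, they can be placed in a sysctl drop-in file. A minimal sketch, assuming a Linux kernel with BBR available (4.9 or later); the file name is hypothetical.

```ini
# /etc/sysctl.d/90-tcp-bbr.conf (hypothetical file name)
# Load with:   sysctl --system
# Verify with: sysctl net.ipv4.tcp_congestion_control
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
```

If bbr is not listed by sysctl net.ipv4.tcp_available_congestion_control, the tcp_bbr kernel module may need to be loaded first.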
