On Mon, 4 Sept 2023 at 15:08, Christian Gagneraud <chg...@gmail.com> wrote:
> So it looks like the waiting time is large: using "cut -d' ' -f 10,11
> logs/sshd_log" to grab the wait and exec times, the wait time goes up to 500s.
> Could this indicate that the server is short of SSH connections?
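Instead of eyeballing the cut output, the wait and exec columns can be summarized with a few lines of Python. This is only a sketch: it assumes the two values sit in space-separated fields 10 and 11 (as in the cut command above) and that they are integers, optionally suffixed with "ms". Both are assumptions about the log layout and should be checked against the actual sshd_log. The function names are made up for illustration.

```python
from statistics import median


def parse_ms(token):
    """Return an int for tokens like '1500ms' or '1500', else None."""
    token = token.strip()
    if token.endswith("ms"):
        token = token[:-2]
    return int(token) if token.isdigit() else None


def summarize_waits(lines, wait_field=10, exec_field=11):
    """Summarize wait/exec times from sshd_log lines.

    Field numbers are 1-based and lines are split on single spaces,
    mirroring cut -d' ' -f 10,11.
    """
    waits, execs = [], []
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) < max(wait_field, exec_field):
            continue  # line too short to hold both fields
        wait = parse_ms(fields[wait_field - 1])
        exe = parse_ms(fields[exec_field - 1])
        if wait is None or exe is None:
            continue  # skip entries without numeric timings
        waits.append(wait)
        execs.append(exe)
    if not waits:
        return None
    return {
        "count": len(waits),
        "median_wait_ms": median(waits),
        "max_wait_ms": max(waits),
        "max_exec_ms": max(execs),
    }
```

It could be used as summarize_waits(open("logs/sshd_log")). A large gap between the median and the maximum wait would suggest that commands queue up only during bursts.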
Using jstack, I can see that most of the SSH threads (and other threads
too) are parked:
"SSH git-upload-pack /aosp/platform/external/iw (jenkins.bot)" #181 prio=1 os_prio=0 cpu=6567240.07ms elapsed=1557960.01s tid=0x00007f41640ca000 nid=0x14e waiting on condition [0x00007f3ff50fe000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java...@11.0.19/Native Method)
- parking to wait for <0x000000078d804410> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java...@11.0.19/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java...@11.0.19/AbstractQueuedSynchronizer.java:2081)
at org.apache.sshd.common.channel.ChannelPipedInputStream.read(ChannelPipedInputStream.java:144)
at org.eclipse.jgit.util.IO.readFully(IO.java:201)
at org.eclipse.jgit.transport.PacketLineIn.readLength(PacketLineIn.java:316)
at org.eclipse.jgit.transport.PacketLineIn.readString(PacketLineIn.java:180)
at org.eclipse.jgit.transport.ProtocolV0Parser.recvWants(ProtocolV0Parser.java:66)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:1062)
at org.eclipse.jgit.transport.UploadPack.uploadWithExceptionPropagation(UploadPack.java:873)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:781)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:101)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:109)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:74)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:492)
- locked <0x000000078c9ef880> (a com.google.gerrit.sshd.BaseCommand$TaskThunk)
at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:113)
at java.util.concurrent.Executors$RunnableAdapter.call(java...@11.0.19/Executors.java:515)
at java.util.concurrent.FutureTask.run(java...@11.0.19/FutureTask.java:264)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java...@11.0.19/ScheduledThreadPoolExecutor.java:304)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:675)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java...@11.0.19/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java...@11.0.19/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java...@11.0.19/Thread.java:829)
This condition object is probably analogous to pthread_cond_wait, so it
seems the pool is too small, yet only 1 git unpack thread is in the
RUNNABLE state...
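To get the same picture without reading the whole dump, the thread states can be counted with a short script. This is a sketch, assuming the jstack output has been saved as text, that each thread header line starts with the quoted thread name, and that the SSH command threads have names beginning with "SSH " as in the dump above. The function name count_states is hypothetical.

```python
import re
from collections import Counter

# Matches the state line that jstack prints under each Java thread header.
STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>[A-Z_]+)")


def count_states(dump_text, name_prefix="SSH "):
    """Count thread states for threads whose name starts with name_prefix."""
    counts = Counter()
    current = None  # name of the thread whose header was seen last
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith('"'):
            end = line.find('"', 1)  # closing quote of the thread name
            current = line[1:end] if end > 0 else None
            continue
        match = STATE.search(line)
        if match and current is not None:
            if current.startswith(name_prefix):
                counts[match.group("state")] += 1
            current = None  # one state line per thread
    return counts
```

Calling count_states(open("jstack.txt").read()) (the file name is an assumption) returns a Counter keyed by state. Many WAITING entries next to a single RUNNABLE one would match the observation above.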
Candidates seem to be:
pack.threads
sshd.threads
sshd.batchThreads
and sshd.batchThreads is 2 by default, so this might be it. I need to try...
So I ended up with:
[sshd]
  listenAddress = *:29418
  maxConnectionsPerUser = 64
  threads = 64
  batchThreads = 64
  waitTimeout = 5m
The waitTimeout helps with our network issue.
And now it's working much better.
A clone on localhost went from 9m30 to 7m40, which gives 45MB/s.
CI agents now have improved clone reliability and speed.
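As a back-of-the-envelope check of those numbers, here is a small sketch. It assumes the 45MB/s figure refers to the 7m40 clone and that the amount of data transferred was the same in both runs. Neither is stated explicitly, so the derived values are only estimates.

```python
def seconds(minutes, secs):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + secs


before_s = seconds(9, 30)   # clone time before the config change
after_s = seconds(7, 40)    # clone time after the config change
rate_after_mb_s = 45        # throughput reported for the faster clone

# Assumption: same transfer size in both runs.
size_mb = rate_after_mb_s * after_s
rate_before_mb_s = size_mb / before_s
time_saved_pct = 100 * (before_s - after_s) / before_s

print(f"estimated transfer size: {size_mb} MB")
print(f"estimated earlier throughput: {rate_before_mb_s:.1f} MB/s")
print(f"clone time reduced by: {time_saved_pct:.1f}%")
```

Under those assumptions the transfer is about 20700 MB, the earlier clone ran at roughly 36 MB/s, and the clone time dropped by about 19%.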
Good enough for now, that's all folks! :)
Chris
PS: Thanks to everyone who has answered similar questions on this ML
in the past! :)