Number of interactive ssh threads is leaking, growing forever and stalling


Erik ht

Apr 27, 2022, 2:13:14 AM
to Repo and Gerrit Discussion
Hi.
We are seeing an issue with Gerrit 3.5.0.1 where the number of active interactive threads in Gerrit just keeps growing. We had the same problem with 3.4 as well, and probably even earlier, but never investigated it deeply.
As a result, very few threads remain available to serve requests, and the ssh queue frequently stalls.

See the following graph of Gerrit's queue_ssh_batch_worker_active_threads metric over several days. The number of batch threads jumps up sharply every time a build server does a clean repo sync, but once the sync is done it does not go down by the same amount:
gerrit_threads.png

Our gerrit.config has the following timeout parameters set:
[sshd]
    idleTimeout = 10 minutes
[transfer]
    timeout = 6 minutes
[deadline "general"]
    timeout = 4h

Are we missing any other timeout parameters?

See also the attached thread dump:
gerrit_thread_dump.txt

Matthias Sohn

Apr 27, 2022, 4:26:26 AM
to Erik ht, Repo and Gerrit Discussion
In the thread dump I see:
  • a ton of concurrent UploadPack (fetch) requests
  • some deadlocked threads
How did you configure the thread pools, and on what size of hardware do you run this?
Did you check the sshd_log for anomalies? Usually each session shows a LOGIN, one or more commands, and then a LOGOUT.
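
One way to look for such anomalies is to scan the sshd_log for sessions that logged in but never logged out. The following is only a minimal sketch: it assumes each line looks roughly like `[timestamp] session_id user a/account_id OPERATION ...`, and the helper name `find_open_sessions` and the sample lines are made up for illustration, so check the layout against your own log first.

```python
import re

# Assumed line layout (verify against your own sshd_log):
#   [timestamp] session_id user a/account_id OPERATION ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<session>\S+)\s+(?P<user>\S+)"
    r"\s+(?P<account>\S+)\s+(?P<op>.+)$"
)


def find_open_sessions(lines):
    """Return {session_id: (login_timestamp, user)} for every session
    that has a LOGIN entry but no later LOGOUT entry."""
    open_sessions = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        session = match.group("session")
        op = match.group("op")
        if op.startswith("LOGIN"):
            open_sessions[session] = (match.group("ts"), match.group("user"))
        elif op.startswith("LOGOUT"):
            open_sessions.pop(session, None)
    return open_sessions


# Made-up sample lines, only to show the idea.
sample = [
    "[2022-04-27 08:00:00,000 +0200] aaaa1111 builder a/1000001 LOGIN FROM 10.0.0.5",
    "[2022-04-27 08:00:01,000 +0200] aaaa1111 builder a/1000001 git-upload-pack./some/project 2ms 150ms 0",
    "[2022-04-27 08:00:02,000 +0200] aaaa1111 builder a/1000001 LOGOUT",
    "[2022-04-27 08:05:00,000 +0200] bbbb2222 builder a/1000001 LOGIN FROM 10.0.0.5",
    "[2022-04-27 08:05:01,000 +0200] bbbb2222 builder a/1000001 git-upload-pack./some/project 3ms 90ms 0",
]

for session_id, (ts, user) in find_open_sessions(sample).items():
    print(f"session {session_id} ({user}) logged in at {ts} and never logged out")
```

To run it against a real log, pass an open file handle instead of `sample`. Sessions that are still legitimately active will show up too, so the result is more useful on a log that covers a period well in the past.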

-Matthias

Thomas Wolf

Apr 27, 2022, 7:19:09 AM
to Repo and Gerrit Discussion
On Wednesday, April 27, 2022 at 10:26:26 AM UTC+2 Matthias Sohn wrote:
in the thread dump I see 
  • a ton of concurrent UploadPack (fetch) requests
  • some deadlocked threads

The deadlocks look very much like SSHD-966 / Gerrit issue 12758.

The root cause lies in bugs in the SSH key exchange (KEX) implementation of Apache MINA sshd: it has race conditions and is prone to deadlocks caused by lock inversion if exceptions occur during KEX.
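
For readers unfamiliar with the term, lock inversion means that two threads take the same two locks in opposite order. The sketch below is a generic illustration and not the Apache MINA sshd code: the function name `attempt_inverted_locking` and the two locks are hypothetical. It uses a timeout on the second acquisition so that the demo reports the failure instead of hanging the way a real deadlock would.

```python
import threading


def attempt_inverted_locking(timeout=0.2):
    """Run two threads that take two locks in opposite order and
    report whether each one managed to get its second lock."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    barrier = threading.Barrier(2)
    acquired_second = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            barrier.wait()
            # With a plain acquire() this call would block forever.
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            acquired_second[name] = got
            # Keep holding `first` until both attempts have finished.
            barrier.wait()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return acquired_second


if __name__ == "__main__":
    print(attempt_inverted_locking())
```

Neither thread can get its second lock, because the other thread is holding it while waiting for the first one. Without the timeout both threads would be stuck for good, which is the kind of state that shows up as deadlocked threads in a thread dump.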

I've prepared a fix in Apache MINA sshd (PR 217, hopefully to be included in Apache MINA sshd 2.9.0). It is a major, non-trivial rewrite of fairly central parts of Apache MINA sshd, so it could benefit from much more extensive testing, especially since we didn't succeed in creating an automated test that reliably exhibits the problem. The Apache MINA sshd tests, the JGit SSH tests, and the Gerrit tests all succeed with the fix.
 
Cheers,

  Thomas

Erik ht

May 31, 2022, 7:29:51 AM
to Repo and Gerrit Discussion
Thank you for the fix Thomas.
Do you have a rough idea how long it will take for this to make it into a release, all the way from MINA to JGit to Gerrit? As I understand it, the fix has not been released in MINA yet, so it cannot be part of the latest Gerrit 3.6 either?

David Ostrovsky

May 31, 2022, 10:45:32 AM
to Repo and Gerrit Discussion
On Tuesday, May 31, 2022 at 1:29:51 PM UTC+2 Erik ht wrote:
Thank you for the fix Thomas.
Do you have any rough idea how long it would take for this to make it into a release all the way from MINA to JGit to Gerrit?

It's available for gerrit@HEAD: [1].
