Hi,
We're having the same issue with Gerrit 2.8.5: ssh-worker threads get stuck at this exact stack trace (on average one gets stuck per day, with 4 active stream-events listeners), and when all of them are stuck, no further events are pushed to subscribers. The threads do not recover if the listeners get disconnected.
Yesterday I upgraded sshd-core from 0.11.1-atlassian to 0.12.0 (simply updated the jar inside the war, no migration/rebuild required), and so far I haven't seen any thread get stuck. I suspected this change [1] could be related, but failed to find any hard proof.
I'll update this thread whether or not the issue recurs.
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi,
Looking at the Gerrit thread dump below:
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(J)V(Native Method)
- waiting on <0x00007f22a7b62290> (a org.apache.sshd.common.channel.Window)
at java.lang.Object.wait()V(Object.java:503)
at org.apache.sshd.common.channel.Window.waitForSpace()I(Window.java:148)
- locked <0x00007f22a7b62290> (a org.apache.sshd.common.channel.Window)
at org.apache.sshd.common.channel.ChannelOutputStream.flush()V(ChannelOutputStream.java:116)
- locked <0x00007f22a7b62478> (a org.apache.sshd.common.channel.ChannelOutputStream)
at sun.nio.cs.StreamEncoder.implFlush()V(StreamEncoder.java:297)
at sun.nio.cs.StreamEncoder.flush()V(StreamEncoder.java:141)
- locked <0x00007f22a7b83618> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flush()V(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush()V(BufferedWriter.java:254)
- locked <0x00007f22a7b83618> (a java.io.OutputStreamWriter)
at java.io.PrintWriter.flush()V(PrintWriter.java:320)
- locked <0x00007f22a7b835d0> (a java.io.BufferedWriter)
at java.io.PrintWriter.checkError()Z(PrintWriter.java:357)
at com.google.gerrit.sshd.commands.StreamEvents.writeEvents()V(StreamEvents.java:186)
at com.google.gerrit.sshd.commands.StreamEvents.access$100(Lcom/google/gerrit/sshd/commands/StreamEvents
In the class org.apache.sshd.common.channel.Window, at line 148:
at org.apache.sshd.common.channel.Window.waitForSpace()(Window.java:148)
the thread keeps waiting as long as the window size is 0 and the window is not closed.
If you can enable debug-level logging for the “Apache MINA SSHD Project” library, you can try to confirm the size of the window. If the window is exhausted, a message like “Waiting for some space on …” will be logged. Perhaps too many events are streamed to the client and the window is too small. The Javadoc for this class states:
“A Window for a given channel.
Windows are used to not overflow the client or server when sending datas. Both clients and servers have a local and remote window and won't send anymore data until the window has been expanded.”
I think a solution for this problem could be to make the window size configurable, so that a bigger window can be set for the anticipated load, or to allow expanding the window at run time without restarting Gerrit.
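To illustrate the waiting behaviour described above, here is a minimal sketch of a flow-control window. It is not the MINA SSHD implementation: SimpleWindow and its methods are hypothetical stand-ins, and the only thing taken from the trace is the idea that a writer blocks while the size is 0 and the window is not closed.

```java
import java.io.IOException;

class WindowSketch {
    /** Hypothetical, simplified window; NOT org.apache.sshd.common.channel.Window. */
    static class SimpleWindow {
        private int size;
        private boolean closed;

        SimpleWindow(int initialSize) {
            this.size = initialSize;
        }

        /** Blocks while the window is empty and still open, then returns the free space. */
        synchronized int waitForSpace() throws InterruptedException, IOException {
            while (size == 0 && !closed) {
                wait(); // released only by expand() or close()
            }
            if (closed) {
                throw new IOException("window closed");
            }
            return size;
        }

        /** The peer granted more space: wake up any blocked writer. */
        synchronized void expand(int len) {
            size += len;
            notifyAll();
        }

        /** The channel went away: wake up any blocked writer so it can fail fast. */
        synchronized void close() {
            closed = true;
            notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleWindow window = new SimpleWindow(0);
        Thread writer = new Thread(() -> {
            try {
                window.waitForSpace();
                System.out.println("writer: got space");
            } catch (IOException e) {
                System.out.println("writer: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "ssh-stream-worker-sketch");
        writer.start();

        // Wait until the writer is parked inside waitForSpace().
        while (writer.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.println("writer state: " + writer.getState());

        // Without this call (or an expand), the writer would wait forever.
        window.close();
        writer.join();
    }
}
```

In this sketch the writer is released only by expand() or close(). If neither is ever called for a channel, the thread stays in WAITING indefinitely, which would be consistent with the thread dump above, although it does not prove that this is what happens in Gerrit.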
Regards,
Olga Grinberg
Sasa, were you able to find a workaround for this issue other than restarting the instance?
Is it still recurring on your end?
Hi Saša,
We are planning to deploy 2.9 in production soon, but we want to avoid restarting the server, so I added an ssh command that resizes the pool of ssh-stream-workers as a temporary solution in case we hit the same problem. This will allow us to allocate more worker threads to compensate for the ones that are stuck.
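For anyone who wants to try something similar, a rough sketch of the resizing idea follows. It assumes the worker pool is a plain java.util.concurrent.ThreadPoolExecutor with core size equal to maximum size; PoolResizeSketch and resize are made-up names, and this is not the actual command.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolResizeSketch {
    /**
     * Resizes a fixed-size pool at run time. The order of the two setters matters:
     * recent JDKs reject a core size above the maximum, and a maximum below the core.
     */
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize < 1) {
            throw new IllegalArgumentException("pool size must be positive");
        }
        if (newSize > pool.getMaximumPoolSize()) {
            // Growing: raise the ceiling first, then the core size.
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            // Shrinking (or same size): lower the core size first, then the ceiling.
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the ssh-stream-workers pool.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        resize(pool, 4);
        System.out.println("core=" + pool.getCorePoolSize()
                + " max=" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

Growing the pool only adds fresh threads; the stuck ones stay blocked and keep holding their resources, so this remains a stopgap.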
How many threads do you have for ssh-stream-workers? (I actually want to know how many threads get stuck over a 1 to 2 week period.)
Did you try reverting the sshd library?
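On the question of how many threads get stuck: one way to count them is to look for the waitForSpace frame from the thread dump in each thread's stack. The sketch below uses only standard JDK calls; StuckThreadCounter is a made-up name, and the code would have to run inside the Gerrit JVM (for example from a plugin, which is an assumption on my side) to see the ssh worker threads.

```java
import java.util.Map;

class StuckThreadCounter {
    // Class and method names taken from the thread dump earlier in this thread.
    static final String WINDOW_CLASS = "org.apache.sshd.common.channel.Window";
    static final String WAIT_METHOD = "waitForSpace";

    /** Returns true if any frame of the stack is Window.waitForSpace. */
    static boolean isWaitingForSpace(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (WINDOW_CLASS.equals(frame.getClassName())
                    && WAIT_METHOD.equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    /** Counts the threads in the given snapshot that are waiting for window space. */
    static int countWaiting(Map<Thread, StackTraceElement[]> traces) {
        int count = 0;
        for (StackTraceElement[] stack : traces.values()) {
            if (isWaitingForSpace(stack)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in the current JVM.
        int waiting = countWaiting(Thread.getAllStackTraces());
        System.out.println("threads in Window.waitForSpace: " + waiting);
    }
}
```

A thread that shows up here once is not necessarily stuck, since waiting for space is normal under load. Only threads that appear in repeated samples taken minutes apart are good candidates.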
We are now using 2.9 in production and we have this issue :(
I investigated and found out that, as Shawn suspected, the stuck thread is waiting for space but the client is disconnected. I tried to reproduce the problem by disconnecting the client while the server was waiting for space, but the server handled the disconnection and stopped waiting for space.
Any idea how to reproduce/fix that problem?
I was wondering if this issue has been fixed. If so, what was the change that fixed it? Thanks.
Yes, this issue was solved.
On Saturday, November 15, 2014, Khai Do <zaro...@gmail.com> wrote:
We are on 2.8.4 and we use stream events extensively, so I can confirm that this isn't a problem in 2.8. We are thinking of upgrading, so I wanted to make sure that it's fixed before we attempt an upgrade. Thanks for the info, David.
On Saturday, November 15, 2014 8:24:24 AM UTC-8, David Ostrovsky wrote:
On Saturday, November 15, 2014 16:48:06 UTC+1, Khai Do wrote:
I was wondering if this issue has been fixed? if so what was the change to fix it? Thanks.
These changes are needed to upgrade SSHD to version 0.13. We haven't tried to backport them to 2.8, though. Also note that nobody has confirmed yet that the problem was actually fixed.
Vlad (evlacan) said it was solved, but actually we reverted the sshd library in 2.9, which solved our problem, and we have not yet upgraded to a newer sshd version.
So, like David said, nobody has confirmed that the problem is fixed. Somebody reported this issue in the tracker [1]; that is another user who could possibly confirm whether it is fixed.
[1] https://code.google.com/p/gerrit/issues/detail?id=3013
I did that before you contributed the downgrade, so I simply reverted the 4 sshd-related commits that were made after 2.8.4. I wanted to go back to exactly what we had in 2.8.4, since it was a known stable state for us (we were not experiencing any of the issues that were fixed in later versions of sshd).
On Monday, November 17, 2014 18:59:45 UTC+1, Hugo Arès wrote:
I did that before you contributed the downgrade so I simply reverted the 4 commits that were done after 2.8.4 regarding sshd. I wanted to go back to exactly what we had in 2.8.4, it was a known stable state for us (we were not experiencing any issues that were fixed in later version of sshd).
Thanks. I would suggest not waiting for confirmation (it looks like only you, Gustaf or Saša can confirm that it works) and releasing 2.9.2 ASAP with SSHD upgraded to 0.13. We know that the downgrade path works.