Threads hanging on socketRead0


Luke Engle

Dec 16, 2019, 4:06:30 PM
to Repo and Gerrit Discussion
Since upgrading to Gerrit 3.0.3 and Java 11.0.5 from Gerrit 2.16.7 and 8.0.0_172, we've noticed a lot of threads hanging, which eventually causes Gerrit to exceed its open file limit because of the open file descriptors the hung threads accumulate. The thread dumps don't make it clear what the threads are doing, aside from the fact that they're clearly trying to read *something*. Does anyone have any ideas?

I've tried to set 30-minute read/connect timeouts via Java options with the following, which has not worked:
javaOptions = -Dsun.net.client.defaultReadTimeout=1800000
javaOptions = -Dsun.net.client.defaultConnectTimeout=1800000

All our enabled java options:
javaOptions = -Djava.rmi.server.hostname=<our hostname> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=<port>
javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
javaOptions = -Dsun.net.client.defaultReadTimeout=1800000
javaOptions = -Dsun.net.client.defaultConnectTimeout=1800000
javaOptions = --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED


Example thread dump:
"Thread-63" #194 daemon prio=5 os_prio=0 cpu=0.14ms elapsed=224528.26s tid=0x00007ef7e8052800 nid=0x676a runnable  [0x00007ef6df6f5000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(java...@11.0.5/Native Method)
        at java.net.SocketInputStream.socketRead(java...@11.0.5/SocketInputStream.java:115)
        at java.net.SocketInputStream.read(java...@11.0.5/SocketInputStream.java:168)
        at java.net.SocketInputStream.read(java...@11.0.5/SocketInputStream.java:140)
        at sun.security.ssl.SSLSocketInputRecord.read(java...@11.0.5/SSLSocketInputRecord.java:448)
        at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(java...@11.0.5/SSLSocketInputRecord.java:68)
        at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java...@11.0.5/SSLSocketImpl.java:1104)
        at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java...@11.0.5/SSLSocketImpl.java:823)
        - locked <0x00007efa8a0580c0> (a sun.security.ssl.SSLSocketImpl$AppInputStream)
        at java.io.BufferedInputStream.fill(java...@11.0.5/BufferedInputStream.java:252)
        at java.io.BufferedInputStream.read1(java...@11.0.5/BufferedInputStream.java:292)
        at java.io.BufferedInputStream.read(java...@11.0.5/BufferedInputStream.java:351)
        - locked <0x00007efa8a05a1c0> (a java.io.BufferedInputStream)
        at com.sun.jndi.ldap.Connection.run(java....@11.0.5/Connection.java:793)
        at java.lang.Thread.run(java...@11.0.5/Thread.java:834)

Thanks,
Luke
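
For anyone who wants to track how many threads are stuck this way over time, here is a rough sketch in Python. It counts the threads in a jstack-style dump whose top frame is socketRead0 and whose stack passes through com.sun.jndi.ldap.Connection.run, as in the dump above. The function names, the regular expressions and the shortened sample dump (including the "HTTP-worker" thread) are assumptions made for illustration; they are not part of Gerrit or the JDK.

```python
import re

# Thread header in a jstack dump, e.g.: "Thread-63" #194 daemon prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Stack frame line, e.g.: at java.net.SocketInputStream.socketRead0(...)
FRAME = re.compile(r'^\s+at\s+(?P<method>[\w.$]+)\(')

# Shortened, simplified sample; real dumps carry module/version prefixes.
SAMPLE_DUMP = '''\
"Thread-63" #194 daemon prio=5 os_prio=0 runnable
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at com.sun.jndi.ldap.Connection.run(Connection.java:793)
        at java.lang.Thread.run(Thread.java:834)

"HTTP-worker" #12 prio=5 os_prio=0 waiting on condition
   java.lang.Thread.State: WAITING (parking)
        at java.lang.Object.wait(Native Method)
'''


def parse_threads(dump_text):
    """Return a list of (thread_name, [frame methods, top frame first])."""
    threads = []
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = (header.group("name"), [])
            threads.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current[1].append(frame.group("method"))
    return threads


def count_ldap_reads(dump_text):
    """Count threads sitting in socketRead0 underneath the JNDI LDAP reader."""
    hung = 0
    for _name, frames in parse_threads(dump_text):
        if not frames:
            continue
        top_is_read = frames[0] == "java.net.SocketInputStream.socketRead0"
        in_ldap = "com.sun.jndi.ldap.Connection.run" in frames
        if top_is_read and in_ldap:
            hung += 1
    return hung


if __name__ == "__main__":
    print(count_ldap_reads(SAMPLE_DUMP))
```

Feeding it successive dumps from the same process gives a number that can be compared against the open file limit. Note that a thread matching this pattern is not necessarily hung; an idle but healthy LDAP connection reader looks the same in a single dump, so the trend over time matters more than one reading.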

Matthias Sohn

Dec 16, 2019, 4:58:44 PM
to Luke Engle, Repo and Gerrit Discussion
The prerequisite is Java 8 [1]; running on Java 11 is not yet supported.

 
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/17f8a32e-411e-4029-abb3-196a5b10879a%40googlegroups.com.

Luca Milanesio

Dec 16, 2019, 5:02:46 PM
to Luke Engle, Repo and Gerrit Discussion, Luca Milanesio, Matthias Sohn

One of the reasons why it is officially support, is the known issues of Java11 with LDAPS (see [2]).
I believe the problem you are seeing is *exactly* on the SSL handshake hanging on LDAP, isn’t it? You nailed it :-)

Luca.


Matthias Sohn

Dec 16, 2019, 5:25:11 PM
to Luca Milanesio, Luke Engle, Repo and Gerrit Discussion
You can try the workaround given in [2] and add this Java system property, which disables TLSv1.3:

Disabling TLS 1.3 completely with -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 

Matthias Sohn

Dec 16, 2019, 5:26:25 PM
to Luca Milanesio, Luke Engle, Repo and Gerrit Discussion
Though it is better not to enable the older TLSv1 and TLSv1.1, and to use -Dhttps.protocols=TLSv1.2 instead.

Luca Milanesio

Dec 16, 2019, 5:27:28 PM
to Luke Engle, Repo and Gerrit Discussion, Luca Milanesio, Matthias Sohn

Sorry, there was a typo in my previous message: Gerrit on Java 11 is NOT officially supported (yet).

P.S. These are not Gerrit’s fault, but they are problems that we know about. There could potentially be more, though.

Luke Engle

Dec 16, 2019, 5:28:01 PM
to Repo and Gerrit Discussion
Ah darn :( I saw the thread on [1] and wrongly assumed 'So, to clarify, Gerrit can be built with Java 11, and produce byte code major number 55, and also all tests are passing on Java 11' meant it was 'unofficially' supported with java 11. In fact, it *does* seem to work perfectly fine for everything except that ssl handshake hang. I even added ldap connect/read timeouts to the gerrit.config with no luck there as well.

The interesting thing is that it doesn't cause any noticeable problems, even though since Friday we've had >1500 ldap ssl handshake threads that have hung. It seems like we could bump the ulimit to a very high number and bypass any open file limit issues for many weeks/months.

So, reverting to java 8 is our only viable option?

Thanks,
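
The ulimit idea above can be put into rough numbers. The Python sketch below assumes that each hung thread holds exactly one file descriptor, that the leak is roughly linear, and that "since Friday" means about three days. The limit of 65536 is a made-up example and not a value from this thread, and the calculation ignores the descriptors Gerrit needs for normal operation.

```python
def leak_rate_per_day(leaked_fds, days_elapsed):
    """Average number of file descriptors leaked per day."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return leaked_fds / days_elapsed


def days_until_limit(fd_limit, fds_in_use, rate_per_day):
    """Days left before fds_in_use reaches fd_limit at a constant leak rate."""
    if rate_per_day <= 0:
        raise ValueError("rate_per_day must be positive")
    return max(fd_limit - fds_in_use, 0) / rate_per_day


if __name__ == "__main__":
    # 1500 hung threads over an assumed 3 days (Friday to Monday).
    rate = leak_rate_per_day(leaked_fds=1500, days_elapsed=3)
    # Hypothetical limit; substitute the value reported by `ulimit -n`.
    remaining = days_until_limit(fd_limit=65536, fds_in_use=1500, rate_per_day=rate)
    print(f"~{rate:.0f} fds/day, ~{remaining:.0f} days of headroom")
```

Under these assumptions a higher limit buys months rather than days, which matches the "many weeks/months" estimate, but it only postpones the problem and a restart resets the count anyway.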


Luca Milanesio

Dec 16, 2019, 5:33:06 PM
to Luke Engle, Luca Milanesio, Repo and Gerrit Discussion

You could try the workaround proposed by Matthias (disabling TLS 1.3) and see if that helps.
There could be more issues that we don’t know about yet, so we cannot tell.

I am planning to move to Java 11 on GerritHub.io very soon, as we are heading to Gerrit v3.2.
Gerrit master is already built and validated for Java 11 *as well* on Gerrit-CI.

DavidO did a fantastic job in solving a lot of issues on Java 11; however, others are beyond our reach as they are JVM-related.

Luca.



Luke Engle

Jan 6, 2020, 3:06:15 PM
to Repo and Gerrit Discussion
I added

javaOptions = -Dhttps.protocols=TLSv1.2

to the container section of the gerrit.config, and I'm still experiencing the same thread hangs. I've verified via jps -lvm that the running Gerrit instance does have the option enabled:

$ jps -lvm
28181 ... -Dhttps.protocols=TLSv1.2 ...

I can significantly increase the ulimit for the running process so that we don't hit the issue for a while, which should hold us over until 3.2 is released. Does anyone have any other suggestions for workarounds?

Thanks,
Luke
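
A small Python sketch for checking which -D system properties a JVM was started with, given one line of jps -lvm output like the one above. The sample line and the function names are hypothetical. One caveat, offered as my understanding of the JDK documentation rather than something confirmed in this thread: https.protocols is documented as applying to connections made through HttpsURLConnection, so it may not reach the LDAPS sockets opened by JNDI, whereas jdk.tls.client.protocols is the JSSE-wide client setting. Whether switching properties helps with this particular hang is untested here.

```python
# Properties that restrict TLS protocol versions, with the scope each one is
# documented to cover (scope notes are a summary, not an authoritative list).
TLS_PROPERTIES = {
    "https.protocols": "HttpsURLConnection clients",
    "jdk.tls.client.protocols": "default JSSE client sockets",
}

# Hypothetical line in the style of `jps -lvm` output.
SAMPLE_JPS_LINE = (
    "28181 gerrit.war daemon "
    "-Dhttps.protocols=TLSv1.2 -Dcom.sun.management.jmxremote"
)


def system_properties(jvm_args):
    """Map each -Dkey=value token to its value; bare -Dkey maps to ''.

    Splits on whitespace, so values containing spaces are not handled.
    """
    props = {}
    for token in jvm_args.split():
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props


def tls_protocol_settings(jvm_args):
    """Return only the TLS protocol properties that are actually set."""
    props = system_properties(jvm_args)
    return {key: props[key] for key in TLS_PROPERTIES if key in props}


if __name__ == "__main__":
    for key, value in tls_protocol_settings(SAMPLE_JPS_LINE).items():
        print(f"{key}={value} (scope: {TLS_PROPERTIES[key]})")
```

If only https.protocols shows up, as in the sample, it may be worth repeating the experiment with the JSSE-wide property before concluding that disabling TLS 1.3 does not help.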
