Hi,
We’re leveraging netty (v4.1.92) via Vertx (v4.3.8) configured to use OpenJdkSSLEngine with a deletegatedTaskExecutor() when creating the ssl_handler for our webserver. In general, we see SSL operations (e.g., certificate validation) occurring on a worker thread, but we have run into a deadlock due to our current implementation and wanted some feedback on options.
The crux of the issue is that during our certificate validation via our custom TrustManager, we need to communicate with another internal component (which is running on a typical event loop thread) to complete the validation processing, but other SSL operations block the event loop on which that internal service is running by grabbing a lock. Its my understanding that not all blocking SSL operations are offloaded, so there are likely more cases like these floating around.
What we are specifically seeing is that while we’re waiting on a response from the internal component to validate the certificate, a simultaneous request to close a TLS connection comes in. The closing of the session is running on the event loop and blocks by waiting on a lock, but unfortunately, that lock is held by the worker thread waiting on a response from the internal component. IOW, the internal service cannot respond to the worker thread because it its event loop is blocked by the close connection processing. At this point, our entire webserver is dead in the water until we restart the JVM.
Sample traces are listed below from this example:
We’ve looked at several options:
Any thoughts on other/better ways to handle this?
Thanks in advance,
Chris
# close connection handler running on event loop thread 0 is blocked
Sep 15 10:13:43.470794 APM00203802612-A control-path[37578]: "vert.x-eventloop-thread-0" #25 prio=5 os_prio=0 cpu=595313.19ms elapsed=28293.78s tid=0x00007f5bb8021800 nid=0x22c waiting for monitor entry [0x00007f5c0034d000]
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: java.lang.Thread.State: BLOCKED (on object monitor)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at sun.security.ssl.SSLEngineImpl.closeOutbound(java...@11.0.19/SSLEngineImpl.java:749)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: - waiting to lock <0x00000000dca1d0f8> (a sun.security.ssl.SSLEngineImpl)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1874)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1078)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Sep 15 10:13:43.471494 APM00203802612-A control-path[37578]: at java.lang.Thread.run(java...@11.0.19/Thread.java:829)
# Trust manager waiting on a response from our internal component which is running on event loop thread 0. Note that the lock that the close connection handler is waiting on is held by this thread.
Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: "vert.x-internal-blocking-1" #42 prio=5 os_prio=0 cpu=23498.06ms elapsed=28290.74s tid=0x00007f5bb02b1000 nid=0x23b waiting on condition [0x00007f5bc4e59000]Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: java.lang.Thread.State: WAITING (parking)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at jdk.internal.misc.Unsafe.park(java...@11.0.19/Native Method)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: - parking to wait for <0x00000000dca24ec0> (a java.util.concurrent.CountDownLatch$Sync)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at java.util.concurrent.locks.LockSupport.park(java...@11.0.19/LockSupport.java:194)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java...@11.0.19/AbstractQueuedSynchronizer.java:885)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java...@11.0.19/AbstractQueuedSynchronizer.java:1039)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java...@11.0.19/AbstractQueuedSynchronizer.java:1345)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at java.util.concurrent.CountDownLatch.await(java...@11.0.19/CountDownLatch.java:232)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at io.reactivex.rxjava3.internal.observers.BlockingMultiObserver.blockingGet(BlockingMultiObserver.java:86)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at io.reactivex.rxjava3.core.Single.blockingGet(Single.java:3645)Sep 15 10:13:43.479257 APM00203802612-A control-path[37578]: at com.emc.cyclone.contexts.security.credstore.CycTrustOptions$CycTrustManager.validateCertificateWithCmfa(CycTrustOptions.java:442)
--
You received this message because you are subscribed to the Google Groups "Netty discussions" group.
To unsubscribe from this group and stop receiving emails from it, send an email to netty+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/netty/8f31febf-5aaa-47b5-8a37-b71d63edf6acn%40googlegroups.com.