Hi guys, we are in the process of upgrading a very old Quarkus application, the first one we wrote, which went into production in summer 2019.
It's a long process because every time we hit a blocking problem we are forced to roll back one step to understand it.
We now also have a canary environment, but we were unable to spot the problem there, given the low number of users routed to it.
Yesterday we upgraded from 1.4.1.Final to 1.8.0.Final, the last step before 1.13.x (and RESTEasy Reactive), after which we plan to upgrade to Quarkus 2.
We had to roll back because all event loop threads were blocked with this WARNING:
06/07/2021 13:48:10,858 WARNING [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-0,5,main]=Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4025547 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked
at java...@11.0.1/java.lang.Object.wait(Native Method)
at java...@11.0.1/java.lang.Object.wait(Object.java:328)
at app//io.undertow.vertx.VertxHttpExchange.awaitWriteable(VertxHttpExchange.java:586)
at app//io.undertow.vertx.VertxHttpExchange.writeBlocking0(VertxHttpExchange.java:533)
at app//io.undertow.httpcore.HttpExchangeBase.writeBlocking(HttpExchangeBase.java:236)
at app//io.undertow.server.HttpServerExchange.writeBlocking(HttpServerExchange.java:1006)
at app//io.undertow.servlet.spec.ServletOutputStreamImpl.flush(ServletOutputStreamImpl.java:209)
at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.flush(HttpServletResponseWrapper.java:158)
at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$FlushOperation.doWork(HttpServletResponseWrapper.java:99)
at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$AsyncOperation.work(HttpServletResponseWrapper.java:41)
at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.flushQueue(HttpServletResponseWrapper.java:230)
at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.queue(HttpServletResponseWrapper.java:210)
at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.asyncFlush(HttpServletResponseWrapper.java:171)
at app//org.jboss.resteasy.plugins.providers.sse.SseEventOutputImpl.writeEvent(SseEventOutputImpl.java:352)
at app//org.jboss.resteasy.plugins.providers.sse.SseEventOutputImpl.send(SseEventOutputImpl.java:294)
at app//org.jboss.resteasy.plugins.providers.sse.SseBroadcasterImpl.lambda$broadcast$4(SseBroadcasterImpl.java:156)
at app//org.jboss.resteasy.plugins.providers.sse.SseBroadcasterImpl$$Lambda$973/0x0000000100bd4440.apply(Unknown Source)
at java...@11.0.1/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
at java...@11.0.1/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
at java...@11.0.1/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:143)
at app//org.jboss.resteasy.plugins.providers.sse.SseBroadcasterImpl.broadcast(SseBroadcasterImpl.java:154)
at app//it.esselunga.ordbar.server.boundary.StreamResource.lambda$updateDevices$4(StreamResource.java:182)
at app//it.esselunga.ordbar.server.boundary.StreamResource$$Lambda$898/0x0000000100b72c40.accept(Unknown Source)
at java...@11.0.1/java.util.Optional.ifPresent(Optional.java:183)
at app//it.esselunga.ordbar.server.boundary.StreamResource.updateDevices(StreamResource.java:181)
.....
.....
.....
The code that triggered this is (simplified, with the business logic unrelated to Quarkus removed):
    @Incoming("ordini")
    public void updateDevices(String deviceKeyJson) {
        Optional.ofNullable(statusUpdateRequests.get(deviceKey))
                .ifPresent(l -> {
                    l.broadcast(event);
                });
    }
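One workaround we are evaluating is to keep the blocking write off the event loop by handing the broadcast to a worker thread. Below is a minimal standalone sketch of the idea. It uses plain java.util.concurrent rather than any Quarkus API, and the class name, the pool and the "sse-worker" thread name are all hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadSketch {

    // Hypothetical worker pool; daemon threads so the sketch never keeps the JVM alive.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "sse-worker");
        t.setDaemon(true);
        return t;
    });

    // Runs the potentially blocking call on a worker thread and returns at once.
    // The future completes with the name of the thread that did the work.
    static CompletableFuture<String> broadcastOffEventLoop(Runnable blockingBroadcast) {
        return CompletableFuture.supplyAsync(() -> {
            blockingBroadcast.run();
            return Thread.currentThread().getName();
        }, WORKERS);
    }

    public static void main(String[] args) {
        // The empty lambda stands in for l.broadcast(event).
        String worker = broadcastOffEventLoop(() -> { }).join();
        System.out.println("broadcast ran on: " + worker);
    }
}
```

This would only hide the symptom: a worker thread could still hang in the same wait, but at least the event loop would stay responsive.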
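The waiting thread is never notified! It stays parked on this call, which as far as I can tell is the wait inside VertxHttpExchange.awaitWriteable from the trace above:
request.connection().wait();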
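To illustrate what I suspect, here is a standalone sketch (not the Undertow code; every name in it is hypothetical). A wait() with no timeout parks the thread until some other thread calls notify, so if that notify never arrives the thread hangs forever. A bounded wait on a condition flag would at least give up:

```java
class BoundedWaitSketch {

    private final Object lock = new Object();
    private boolean writable = false;

    // Waits until the flag is set or the timeout expires.
    // Returns true if the flag was set, false on timeout or interruption.
    boolean awaitWritable(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (lock) {
            while (!writable) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    // Never call wait(0): that would wait with no time limit.
                    return false;
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    // Sets the flag and wakes up every waiter.
    void markWritable() {
        synchronized (lock) {
            writable = true;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) {
        BoundedWaitSketch sketch = new BoundedWaitSketch();
        System.out.println("no notify sent: " + sketch.awaitWritable(100));
        sketch.markWritable();
        System.out.println("after notify:   " + sketch.awaitWritable(100));
    }
}
```

In our case the equivalent of markWritable() apparently never runs for that connection, which would explain a blocked time of more than an hour.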
Consider that with the current version this code is running for all our customers without this problem; I would like to understand what is going on so that I can avoid it in the future.
Is it a lost socket? I've found some notify calls on that object; maybe there is a code path that loses the connection, so that everything is "lost in space"?
Thank you for your help.