[RESTEasy] Blocked server upgrading to a new Quarkus version

Luca Masini

Jul 7, 2021, 1:51:40 AM
to Quarkus Development mailing list
Hi guys, we are in the process of upgrading a very old Quarkus application, the first one we wrote, which went into production in summer 2019.

It's a long process because every time we hit a blocking problem we have to roll back one step to understand it.

We now also have a canary environment, but we were unable to spot the problem there, given the low number of users routed to it.

Yesterday we upgraded from 1.4.1.Final to 1.8.0.Final, the last step before 1.13.x (and RESTEasy Reactive), after which we plan an upgrade to Quarkus 2.

We had to roll back because all event loop threads were blocked, with this warning:

06/07/2021 13:48:10,858 WARNING [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-0,5,main]=Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4025547 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked
    at java...@11.0.1/java.lang.Object.wait(Native Method)
    at java...@11.0.1/java.lang.Object.wait(Object.java:328)
    at app//io.undertow.vertx.VertxHttpExchange.awaitWriteable(VertxHttpExchange.java:586)
    at app//io.undertow.vertx.VertxHttpExchange.writeBlocking0(VertxHttpExchange.java:533)
    at app//io.undertow.httpcore.HttpExchangeBase.writeBlocking(HttpExchangeBase.java:236)
    at app//io.undertow.server.HttpServerExchange.writeBlocking(HttpServerExchange.java:1006)
    at app//io.undertow.servlet.spec.ServletOutputStreamImpl.flush(ServletOutputStreamImpl.java:209)
    at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.flush(HttpServletResponseWrapper.java:158)
    at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$FlushOperation.doWork(HttpServletResponseWrapper.java:99)
    at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$AsyncOperation.work(HttpServletResponseWrapper.java:41)
    at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.flushQueue(HttpServletResponseWrapper.java:230)
    at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.queue(HttpServletResponseWrapper.java:210)
    at app//org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.asyncFlush(HttpServletResponseWrapper.java:171)
    at app//org.jboss.resteasy.plugins.providers.sse.SseEventOutputImpl.writeEvent(SseEventOutputImpl.java:352)
    at app//org.jboss.resteasy.plugins.providers.sse.SseEventOutputImpl.send(SseEventOutputImpl.java:294)
    at app//org.jboss.resteasy.plugins.providers.sse.SseBroadcasterImpl.lambda$broadcast$4(SseBroadcasterImpl.java:156)
    at app//org.jboss.resteasy.plugins.providers.sse.SseBroadcasterImpl$$Lambda$973/0x0000000100bd4440.apply(Unknown Source)
    at java...@11.0.1/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
    at java...@11.0.1/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
    at java...@11.0.1/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:143)
    at app//org.jboss.resteasy.plugins.providers.sse.SseBroadcasterImpl.broadcast(SseBroadcasterImpl.java:154)
    at app//it.esselunga.ordbar.server.boundary.StreamResource.lambda$updateDevices$4(StreamResource.java:182)
    at app//it.esselunga.ordbar.server.boundary.StreamResource$$Lambda$898/0x0000000100b72c40.accept(Unknown Source)
    at java...@11.0.1/java.util.Optional.ifPresent(Optional.java:183)
    at app//it.esselunga.ordbar.server.boundary.StreamResource.updateDevices(StreamResource.java:181)
.....
.....
.....

The code that triggers it is (simplified, with the business logic unrelated to Quarkus removed):

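@Incoming("ordini")
public void updateDevices(String deviceKeyJson) {
    // deviceKey and event come from business logic omitted here;
    // l is the SSE broadcaster (SseBroadcasterImpl in the trace above)
    Optional.ofNullable(statusUpdateRequests.get(deviceKey))
            .ifPresent(l -> {
                l.broadcast(event);
            });
}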

The waiting thread is never notified! It stays stuck on this call:

request.connection().wait();

Note that with the current version this code runs for all our customers without the problem. I would like to understand what is going on so that I can avoid it in the future.

Is it a lost socket? I've found some notify calls on that object; maybe there is a code path that loses the connection, so that everything is "lost in space"?

Thank you for your help.

Stuart Douglas

Jul 7, 2021, 2:09:42 AM
to Luca Masini, Quarkus Development mailing list
That thread should never be used to do blocking IO, as it is the event loop thread. The way blocking IO is implemented, it actually needs the event loop thread to notify the waiting thread to unblock, so if the event loop thread itself is the one blocked, you deadlock.

There is supposed to be a guard against this which will throw an IOException, but it looks like if you are using Undertow and you do a 'flush' on the IO thread instead of a 'write' the guard is missed, so you get a deadlock.
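To make the idea of such a guard concrete, here is a minimal plain-Java sketch. It is not Undertow's actual code: the class, the method names and the thread-name check are all assumptions made for illustration (the name prefix is taken from the log line above). The point it shows is that the check has to sit on every blocking entry point, 'flush' as well as 'write'.

```java
import java.io.IOException;

public class IoThreadGuardSketch {

    // Assumption: event loop threads are recognised by name, as in the log
    // line "vert.x-eventloop-thread-0". Real code would use the runtime's own check.
    public static boolean onEventLoop() {
        return Thread.currentThread().getName().startsWith("vert.x-eventloop-thread-");
    }

    // Shared guard: fail fast instead of waiting for a notification
    // that only the current thread could deliver.
    private static void failIfOnEventLoop(String operation) throws IOException {
        if (onEventLoop()) {
            throw new IOException("Blocking " + operation + " attempted on an event loop thread");
        }
    }

    // Hypothetical blocking write: guarded.
    public static void writeBlocking(byte[] data) throws IOException {
        failIfOnEventLoop("write");
        // A real implementation would block here until the data is written.
    }

    // Hypothetical blocking flush: needs the same guard as the write.
    public static void flushBlocking() throws IOException {
        failIfOnEventLoop("flush");
        // A real implementation would block here until buffers are drained.
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(() -> {
            try {
                flushBlocking();
                System.out.println("flush allowed");
            } catch (IOException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }, "vert.x-eventloop-thread-0");
        t.start();
        t.join();
    }
}
```

With a guard like this in place the mistake surfaces as an immediate exception rather than as a thread that hangs.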

The root cause here is that l.broadcast is a blocking operation with RESTEasy, so you can't do it from an IO thread; you need to dispatch to a worker. This may be as simple as adding @Blocking to the @Incoming method, but I am not sure in which version that was added.
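The mechanism can be reproduced without Quarkus. The sketch below is plain Java under stated assumptions: a single-threaded executor stands in for the event loop, a CountDownLatch stands in for the "socket is writable" notification, and the class and method names are made up for illustration. The wait is bounded to 500 ms so the demo terminates; the real wait in the trace has no timeout, which is why the thread stayed blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class EventLoopDeadlockSketch {

    /**
     * Runs a simulated blocking write on the caller executor. The write waits
     * for a notification that only a task queued on the event loop can deliver.
     * Returns true if the notification arrived before the timeout.
     */
    public static boolean blockingWriteCompletes(ExecutorService eventLoop,
                                                 ExecutorService caller) throws Exception {
        CountDownLatch writable = new CountDownLatch(1);
        Future<Boolean> write = caller.submit(() -> {
            // Queue the notification on the event loop, like a socket becoming writable.
            eventLoop.execute(writable::countDown);
            // Block until notified (bounded here so the demo can finish).
            return writable.await(500, TimeUnit.MILLISECONDS);
        });
        return write.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Called from a worker: the event loop is free to deliver the notification.
            System.out.println("from worker: " + blockingWriteCompletes(eventLoop, worker));
            // Called from the event loop itself: the notification is queued behind the waiter.
            System.out.println("from event loop: " + blockingWriteCompletes(eventLoop, eventLoop));
        } finally {
            eventLoop.shutdownNow();
            worker.shutdownNow();
        }
    }
}
```

The first call prints `from worker: true` and the second prints `from event loop: false`. Moving the call to a worker thread, which is what @Blocking is suggested for above, is what lets the event loop stay free to deliver the notification.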

Stuart




clement escoffier

Jul 7, 2021, 2:10:22 AM
to luca....@gmail.com, Quarkus Development mailing list
Hello,

What’s happening in l.broadcast(event)? 
Did you try adding the @Blocking annotation on the updateDevices method? (1.8.0 is quite old, I don’t remember if we added that already).

Clement


Stuart Douglas

Jul 7, 2021, 2:12:58 AM
to Luca Masini, Quarkus Development mailing list
https://github.com/quarkusio/quarkus-http/pull/79 should make it throw an exception now.

Stuart

Luca Masini

Jul 7, 2021, 2:18:49 AM
to Stuart Douglas, clement escoffier, Quarkus Development mailing list
Hi Clement and Stuart !!!

Thank you for the support in understanding this. @Stuart Douglas I like that exception, so that in the future we won't miss the @Blocking annotation, and yes @clement escoffier, it was present in 1.8, so we added it.

What I don't understand is why everything works in 1.4: we don't have any flush or direct access to the stream; the only interaction is through that broadcast method.

And of course I can't reproduce it on my workstation, or even in the canary environment.


--
****************************************
http://www.lucamasini.net
http://twitter.com/lmasini
http://www.linkedin.com/pub/luca-masini/7/10/2b9
****************************************

Stuart Douglas

Jul 7, 2021, 11:38:13 PM
to Luca Masini, clement escoffier, Quarkus Development mailing list
At one point we had multiple vert.x instances, so the messaging and the HTTP vert.x instance would never share an IO thread. In previous versions this code might have blocked an IO thread and potentially caused a perf issue, but it would not deadlock.

It's hard to reproduce without a lot of load, as you need to have both the message and the HTTP request share the same IO thread (and there are 2*cores by default), and IO needs to actually block rather than just being written out immediately.
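A rough back-of-the-envelope sketch of why load matters. It rests on a strong simplifying assumption that is not taken from Vert.x itself: each open HTTP connection is treated as landing independently and uniformly on one of the 2*cores event loops. Under that assumption, the chance that at least one of n connections sits on the same event loop as the message handler is 1 - (1 - 1/k)^n for k event loops. The class and method names are made up for illustration, and the IO still has to actually block for the deadlock to occur, which this sketch does not model.

```java
public class SharedEventLoopOdds {

    /**
     * Probability that at least one of the given connections is on one
     * specific event loop, assuming independent uniform assignment
     * (a simplification, not Vert.x's actual assignment policy).
     */
    public static double atLeastOneShares(int eventLoops, int connections) {
        return 1.0 - Math.pow(1.0 - 1.0 / eventLoops, connections);
    }

    public static void main(String[] args) {
        // 2*cores event loops by default, as stated above.
        int eventLoops = 2 * Runtime.getRuntime().availableProcessors();
        for (int connections : new int[] {1, 5, 50}) {
            System.out.printf("%d event loops, %d connections: %.3f%n",
                    eventLoops, connections, atLeastOneShares(eventLoops, connections));
        }
    }
}
```

With only a handful of open connections, as on a workstation or a lightly used canary, the overlap is unlikely; with production traffic it becomes close to certain.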

Stuart

Luca Masini

Jul 8, 2021, 2:47:34 AM
to Stuart Douglas, clement escoffier, Quarkus Development mailing list
Thank you Stuart, I hope we will not hit other "show stoppers" on the migration path, so that we can use reactive without blocking the I/O threads.

Have a nice day.