HttpClient deadlock


Nat

Nov 17, 2015, 1:48:46 PM
to vert.x
It looks like HttpClient can deadlock, especially when it is used from outside a Vert.x context, because HttpClientRequest and ClientConnection take their locks in opposite orders. It happens often enough that it looks like a major bug.


main is waiting to lock io.vertx.core.http.impl.ClientConnection@701253e3 which is held by vert.x-eventloop-thread-1
vert.x-eventloop-thread-1 is waiting to lock io.vertx.core.http.impl.HttpClientRequestImpl@51f9223b which is held by main


Thread stacks


main [BLOCKED; waiting to lock io.vertx.core.http.impl.ClientConnection@701253e3]
 io.vertx.core.net.impl.ConnectionBase.writeToChannel(ConnectionBase.java:101)
 io.vertx.core.http.impl.HttpClientRequestImpl.write(HttpClientRequestImpl.java:665)
 io.vertx.core.http.impl.HttpClientRequestImpl.write(HttpClientRequestImpl.java:182)
 io.vertx.core.http.impl.HttpClientRequestImpl.write(HttpClientRequestImpl.java:56)


vert.x-eventloop-thread-1 [BLOCKED; waiting to lock io.vertx.core.http.impl.HttpClientRequestImpl@51f9223b]
 io.vertx.core.http.impl.HttpClientRequestImpl.handleDrained(HttpClientRequestImpl.java:320)
 io.vertx.core.http.impl.ClientConnection.handleInterestedOpsChanged(ClientConnection.java:256)
 io.vertx.core.net.impl.VertxHandler$$Lambda$138/1356631125.run(unknown source)
 io.vertx.core.impl.ContextImpl.lambda$wrapTask$16(ContextImpl.java:333)
 io.vertx.core.impl.ContextImpl$$Lambda$11/544152609.run(unknown source)
 io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:225)
 io.vertx.core.net.impl.VertxHandler.channelWritabilityChanged(VertxHandler.java:68)
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:376)
 io.netty.channel.AbstractChannelHandlerContext.fireChannelWritabilityChanged(AbstractChannelHandlerContext.java:358)
 io.netty.channel.ChannelInboundHandlerAdapter.channelWritabilityChanged(ChannelInboundHandlerAdapter.java:119)
 io.netty.channel.CombinedChannelDuplexHandler.channelWritabilityChanged(CombinedChannelDuplexHandler.java:202)
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:376)
 io.netty.channel.AbstractChannelHandlerContext.fireChannelWritabilityChanged(AbstractChannelHandlerContext.java:358)
 io.netty.channel.DefaultChannelPipeline.fireChannelWritabilityChanged(DefaultChannelPipeline.java:861)
 io.netty.channel.ChannelOutboundBuffer$2.run(ChannelOutboundBuffer.java:583)
 io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
 io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
 java.lang.Thread.run(Thread.java:744)
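
The two traces above show a classic lock-order inversion: one thread holds the request monitor and wants the connection monitor, while the other holds the connection monitor and wants the request monitor. A minimal sketch of that pattern in plain Java follows. It does not use Vert.x at all; the `request` and `connection` objects, the thread names, and the class name are hypothetical stand-ins, and the latch only exists to force the unlucky interleaving every time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderSketch {

    // Returns true if the JVM reports both sketch threads as monitor-deadlocked.
    public static boolean reproducesDeadlock() {
        final Object request = new Object();     // stand-in for the request monitor
        final Object connection = new Object();  // stand-in for the connection monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors the "main" trace: request first, then connection.
        Thread writer = new Thread(() -> {
            synchronized (request) {
                awaitOther(bothHoldFirstLock);
                synchronized (connection) {
                    // never reached once the other thread holds connection
                }
            }
        }, "sketch-writer");

        // Mirrors the event-loop trace: connection first, then request.
        Thread drainer = new Thread(() -> {
            synchronized (connection) {
                awaitOther(bothHoldFirstLock);
                synchronized (request) {
                    // never reached once the other thread holds request
                }
            }
        }, "sketch-eventloop");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        writer.setDaemon(true);
        drainer.setDaemon(true);
        writer.start();
        drainer.start();

        // Poll the JVM's own deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, writer.getId()) && contains(ids, drainer.getId())) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    // Signals that this thread holds its first lock, then waits for the other one.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock reproduced: " + reproducesDeadlock());
    }
}
```

In the real client the interleaving depends on timing, which would explain why it only shows up intermittently.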



Tim Fox

Nov 17, 2015, 1:53:53 PM
to ve...@googlegroups.com
Which version of Vert.x is this?
--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at http://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/ba861d5b-deb4-4e67-b048-9dd2ecf3bb2a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nat

Nov 17, 2015, 2:09:09 PM
to vert.x
Vert.x 3.1

Tim Fox

Nov 17, 2015, 2:12:04 PM
to ve...@googlegroups.com
Thanks Nat.

I think you know the next question. Do you have a reproducer? ;)

Nat

Nov 17, 2015, 2:28:39 PM
to vert.x
Not consistently. It only happens if handleDrained is called during the write. The simplest reproducer is to create a new HttpClient and write a lot of data from the main thread.

Tim Fox

Nov 18, 2015, 10:40:35 AM
to ve...@googlegroups.com
Could you provide the complete stack dump please?



Nat

Nov 18, 2015, 10:59:09 AM
to vert.x
That is pretty much the entire stack trace. The frames you see on the main thread are the code that writes out the data.

Tim Fox

Nov 18, 2015, 11:46:43 AM
to ve...@googlegroups.com
I've changed the locking scheme in the client_deadlock branch - could you try that out and see if it works for you?

Nat

Nov 18, 2015, 12:21:00 PM
to vert.x
The problem seems to happen a lot less. From looking at the code, though, it does not seem to be safe enough: the conn variable might not be visible to the main thread.
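
The visibility concern can be illustrated in plain Java: a field written by one thread without synchronization is not guaranteed to be seen by another thread, whereas a `volatile` write happens-before any later read of that field. The sketch below is not the code from the client_deadlock branch; the class, the `conn` field, and the thread name are hypothetical stand-ins.

```java
import java.util.concurrent.TimeUnit;

public class VisibilitySketch {
    // Hypothetical stand-in for a request's connection field.
    // volatile guarantees that a write by one thread is visible to later reads elsewhere.
    private volatile Object conn;

    // Returns true once the calling thread observes the value assigned by the other thread.
    public static boolean callerSeesConnection() {
        VisibilitySketch request = new VisibilitySketch();

        // Stand-in for the event loop assigning the connection.
        Thread eventLoop = new Thread(() -> request.conn = new Object(), "sketch-eventloop");
        eventLoop.setDaemon(true);
        eventLoop.start();

        // Spin until the write becomes visible, with a five second safety limit.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (request.conn == null) {
            if (System.nanoTime() > deadline) {
                return false;
            }
            Thread.yield();
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("caller saw conn: " + callerSeesConnection());
    }
}
```

Without `volatile` (or a lock shared by both threads) the loop above has no guarantee of ever observing the assignment, which is the kind of gap being described.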

Tim Fox

Nov 18, 2015, 1:19:36 PM
to ve...@googlegroups.com
On 18/11/15 17:21, Nat wrote:
The problem seems to happen a lot less.

Does it still happen? If so, can you provide the stack trace?

Nat

Nov 18, 2015, 1:49:25 PM
to vert.x
I only caught a glimpse of it once, and I didn't manage to capture it.

Tim Fox

Nov 19, 2015, 3:15:14 AM
to ve...@googlegroups.com
Ok will take another look.

BTW.... I'd strongly recommend you write your code so it plays nicely with the Vert.x threading model, e.g. if you always use your HttpClientRequest from the same context then you will avoid issues like this, and performance will be better too. You get that for free if you use verticles.

Nat

Nov 19, 2015, 10:36:03 AM
to vert.x
I agree, but if you only use Vert.x to write a client application or library, you are likely to run into this scenario because you do not control the caller.

Tim Fox

Nov 20, 2015, 5:17:56 AM
to ve...@googlegroups.com
Even if you're writing a client you can use vertx.runOnContext to ensure the Vert.x API is always executed on the correct context.
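
The idea behind that advice can be sketched without Vert.x: callers on any thread hand their work to the one thread that owns the request state, instead of touching that state themselves. The example below is only an analogue built on a single-thread executor; the class, method, and thread names are hypothetical, and in a real client the hand-off would go through vertx.runOnContext as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConfinedWriterSketch {
    // Stand-in for an event loop: a single daemon thread owns all request state.
    private final ExecutorService loop = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "sketch-eventloop");
        t.setDaemon(true);
        return t;
    });

    // Touched only from the loop thread, so no locks are needed.
    private final StringBuilder body = new StringBuilder();
    private final List<String> writerThreads = new ArrayList<>();

    // Safe to call from any thread: the write is queued for the owning thread
    // rather than executed by the caller.
    public void write(String chunk) {
        loop.execute(() -> {
            body.append(chunk);
            writerThreads.add(Thread.currentThread().getName());
        });
    }

    // Reads the bookkeeping on the loop thread too, after all queued writes have run.
    public List<String> writerThreads() {
        Callable<List<String>> snapshot = () -> new ArrayList<>(writerThreads);
        try {
            return loop.submit(snapshot).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("loop thread did not answer", e);
        }
    }

    public static void main(String[] args) {
        ConfinedWriterSketch writer = new ConfinedWriterSketch();
        writer.write("hello ");
        writer.write("world");
        System.out.println(writer.writerThreads());
    }
}
```

Because only one thread ever touches the state, there is no second lock for a caller to hold, so the inversion shown in the original traces cannot arise in this arrangement.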

Tim Fox

Nov 20, 2015, 6:22:48 AM
to ve...@googlegroups.com
I've made some more changes - could you try again please?


Message has been deleted

Tim Fox

Nov 20, 2015, 1:52:22 PM
to ve...@googlegroups.com
Could you elaborate?

On 20/11/15 18:10, Nat wrote:
It looks like you haven't called getLock() from most of the places. Did you miss that?

Nat

Nov 20, 2015, 1:53:51 PM
to vert.x
I haven't been able to reproduce it yet. That's a good sign. I will try to find a way to consistently repro it.

Jochen Mader

Nov 23, 2015, 3:02:35 AM
to ve...@googlegroups.com
Just a suggestion for reproducing the problem:
I use Byteman for debugging such issues.
It's a great tool that gives you fine-grained control over individual threads.
Creating the conditions needed to reproduce race conditions and deadlocks is far easier with it than doing it manually.





--
Jochen Mader | Lead IT Consultant
