Blocking the event loop from worker verticle!?!


Steve Hummingbird

Aug 2, 2018, 11:46:56 AM
to vert.x
I just came across a pretty strange situation which I haven't been able to fully figure out yet (I'm pretty short on time at the moment).

Basically, the situation is as follows:
- I have a worker verticle
- the worker verticle makes an async call to the mongo driver
- the callback of the mongo driver calls vertx.runOnContext and inside that, we just call Thread.sleep() to block

This leads to the following warning:

 Thread Thread[vert.x-eventloop-thread-14,5,main] has been blocked for 5457 ms, time limit is 2000


Note: this seems to be the main event loop (!!) complaining, not the worker complaining about a long-running task.

To make this go away, I found two things that work:
- Replace the call to the mongo driver with my own code, which just spins up a new thread and sleeps before calling the callback (so the mongo driver is probably doing something different from just invoking the callback on another thread). This is pretty useless in practice, though.
- Remove vertx.runOnContext from the mongodb callback


In the end the question is: How is it possible to start on a worker verticle and end up blocking the main event loop? And why do I only see this when I use vertx.runOnContext (in combination with the mongo driver)?

I can try to put up a simple reproducer in case this is helpful, but I wanted to get in touch first, since maybe there is a misunderstanding about the general concept here.

Blake

Aug 2, 2018, 4:04:34 PM
to vert.x
Hey Steve,

It makes sense that you're blocking the event loop here, because you've already said that you're calling Thread.sleep() on the event loop, yeah? You're sleeping in the mongo callback, and the mongo callback is calling runOnContext, which means it runs that handler on the Vert.x context (i.e., the event loop).

Steve Hummingbird

Aug 2, 2018, 4:23:20 PM
to vert.x
Not exactly. I am calling Thread.sleep() from within a *worker* verticle, which is designed to allow blocking code without breaking the main event loop. However, I am seeing a situation where a blocking worker thread blocks the main event loop under certain circumstances (mongodb callback + vertx.runOnContext). So I am wondering: in which cases could a worker thread impair the main event loop (other than when we are running out of system resources)?

Blake

Aug 2, 2018, 4:35:15 PM
to vert.x
I'm not sure how your mongo call is set up and all that, but maybe this will help:

If you run the Thread.sleep() WITHIN the vertx.runOnContext then you're blocking the event loop. If you call Thread.sleep() before the vertx.runOnContext then you shouldn't be blocking.

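vertx.executeBlocking(fut -> {
  // worker stuff here: this runs on a worker thread, so it can block
  try {
    Thread.sleep(1000); // not blocking the event loop
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
  vertx.runOnContext(v -> {
    // this handler runs back on the caller's event-loop context
    try {
      Thread.sleep(1000); // blocking the event loop
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  });
  fut.complete();
}, res -> {});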


I'm not sure which mongo library you're using, but Vert.x provides a mongo-client that will take care of all this for you too.

If you were using the Vert.x mongo client and called Thread.sleep() within the callback, you would be blocking too, because the callback runs on that verticle's context (event loop).

Steve Hummingbird

Aug 2, 2018, 4:47:24 PM
to vert.x
I am afraid this is a misunderstanding. I am able to call Thread.sleep after vertx.runOnContext within a worker verticle just fine without blocking. And I think this behaviour is intended, as a worker verticle should not impact the main event loop. However, I found this weird case where this does not hold true. I will need to dig into this further when I find time to do so. So I was wondering if there were known cases when a worker verticle could cause the main event loop to block.

In reality, the sleep is a complex call that results in a JNI invocation. But I wanted to rule out that this was causing what I was seeing, so I replaced it all with a simple sleep.

I think your example runs on the main event loop but uses executeBlocking, so it makes sense that the context you are put back on is one that affects the main event loop. However, I am running on a dedicated worker thread, so I should be put back on the worker context, which usually seems to work fine.

Blake

Aug 2, 2018, 4:54:35 PM
to vert.x
Gotcha, Steve, I didn't know you were running it all from a worker verticle. This does sound strange. I don't know if you can put up a quick reproducer, but if you can, I or someone else might be able to spot something from it. Otherwise this is just a lot of text and we have to imagine how it's all interconnected.

Julien Viet

Aug 2, 2018, 5:15:03 PM
to ve...@googlegroups.com
+1 for a reproducer.

thanks.

-- 
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/e7320f26-a2f9-4b02-8032-e0c183bb4160%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Steve Hummingbird

Aug 2, 2018, 7:43:51 PM
to vert.x
I put together the simplest reproducer I could imagine. Brief setup requirements are listed in the readme:


It basically just consists of a main verticle, which deploys a worker verticle. The worker verticle loads a document from the db and simulates blocking processing by invoking Thread.sleep(). This causes the main verticle to block.
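As a rough, self-contained stand-in with the same shape, here is a minimal sketch assuming Vert.x core 3.x/4.x on the classpath. A plain `new Thread` plays the role of the MongoDB driver's callback thread (an assumption, since the driver internals are not shown in the thread), and the class and thread names are made up. It reports what kind of context `vertx.runOnContext` lands on when it is called from that foreign thread inside a worker verticle.

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ForeignThreadReproducer {

  // Worker verticle that hands work to a plain JDK thread. The thread is a
  // hypothetical stand-in for the MongoDB driver's callback thread.
  static class WorkerVerticle extends AbstractVerticle {
    private final CompletableFuture<String> report;

    WorkerVerticle(CompletableFuture<String> report) {
      this.report = report;
    }

    @Override
    public void start() {
      new Thread(() -> {
        // Not a Vert.x thread: vertx.runOnContext cannot find our worker
        // context here, so the handler runs on a new event-loop context.
        vertx.runOnContext(v -> {
          boolean eventLoop = Vertx.currentContext().isEventLoopContext();
          report.complete((eventLoop ? "event-loop" : "worker")
              + " context on " + Thread.currentThread().getName());
        });
      }, "fake-driver-thread").start();
    }
  }

  public static String runDemo() throws Exception {
    Vertx vertx = Vertx.vertx();
    try {
      CompletableFuture<String> report = new CompletableFuture<>();
      vertx.deployVerticle(new WorkerVerticle(report),
          new DeploymentOptions().setWorker(true));
      return report.get(10, TimeUnit.SECONDS);
    } finally {
      vertx.close();
    }
  }

  public static void main(String[] args) throws Exception {
    // Prints which kind of context ran the handler, and on which thread
    System.out.println(runDemo());
  }
}
```

A Thread.sleep() placed inside that handler would therefore occupy an event-loop thread, even though everything started in a worker verticle.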

sameer uddin

Aug 2, 2018, 7:59:50 PM
to ve...@googlegroups.com
I get this when I debug the code. I wonder if we could increase the time limit for the process?


Blake

Aug 2, 2018, 8:19:40 PM
to vert.x
Alright, so here's what's happening. When you run the mongo stuff, its callback runs on another thread that has nothing to do with Vert.x. So when you call vertx.runOnContext, it looks at the thread you're on, sees that it's not a "context thread", and so Vert.x creates a new event loop and then runs your handler.

To get back on context you need to do something like this (you need to get the context when you're still on a vert.x context, not when you're on another thread):

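// Capture the context while still on a Vert.x thread (here: the worker verticle)
Context vertxContext = vertx.getOrCreateContext();
collection.find().forEach(o -> {
  // This callback runs on a MongoDB driver thread, not on a Vert.x thread
  vertxContext.runOnContext(event1 -> {
    try {
      // doing something that blocks a while here
      Thread.sleep(10000);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  });
});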

Blake

Aug 2, 2018, 8:30:02 PM
to vert.x
You can set the blocked thread checker interval and warning time on VertxOptions. So in your dev environment you could turn this up so you don't get it.
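For reference, a sketch of what that could look like, assuming Vert.x core 3.x/4.x. The setters below exist on `VertxOptions`; the values and the class name are arbitrary examples. The check interval is in milliseconds, while the max execute times are in nanoseconds by default. This only silences the warning; it does not stop the event loop from being blocked, so it belongs in a debugging setup only.

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

import java.util.concurrent.TimeUnit;

public class DebugVertxOptions {

  // Relaxed blocked-thread checking, intended for a local debugging session only
  public static VertxOptions debugOptions() {
    return new VertxOptions()
        // How often the checker looks for blocked threads (milliseconds)
        .setBlockedThreadCheckInterval(TimeUnit.SECONDS.toMillis(5))
        // How long an event-loop task may run before a warning (nanoseconds)
        .setMaxEventLoopExecuteTime(TimeUnit.SECONDS.toNanos(60))
        // How long a worker task may run before a warning (nanoseconds)
        .setMaxWorkerExecuteTime(TimeUnit.MINUTES.toNanos(10));
  }

  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx(debugOptions());
    // ... deploy verticles as usual ...
    vertx.close();
  }
}
```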


On Thursday, August 2, 2018 at 6:59:50 PM UTC-5, sameer uddin wrote:
i get this when i debug the code i wonder if we could increase the time limit for the process

Jez P

Aug 3, 2018, 3:59:33 AM
to vert.x
Wow, that's an omission from the blog post you referenced (I wrote it, and I definitely didn't cover what happens when you run vertx.runOnContext on a non-Vert.x thread). I've never run into that scenario before, so it didn't figure in my thinking. I might update the original blog post to cover that and then pass it on to the Vert.x guys to update their reproduction. Thanks! I hope you found it useful even with the omission.

You can store the context as a member of the verticle when it starts up, since a verticle's context never changes, and then reference it in your handlers as you suggest (in other words, getOrCreateContext can be called during initialization of the verticle).
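A minimal sketch of that suggestion, assuming Vert.x core 3.x/4.x. The verticle captures its context once in `start()` and later hops back onto it from a foreign thread; the plain `new Thread` stands in for a third-party library's callback thread and, like the class names, is hypothetical. (`AbstractVerticle` also keeps the context in its protected `context` field, set during `init`, which could be used instead.)

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.Context;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class StoredContextExample {

  static class WorkerVerticle extends AbstractVerticle {
    private final CompletableFuture<String> report;
    // Captured once in start(); a verticle's context never changes
    private Context ownContext;

    WorkerVerticle(CompletableFuture<String> report) {
      this.report = report;
    }

    @Override
    public void start() {
      ownContext = vertx.getOrCreateContext();
      // Hypothetical third-party callback thread
      new Thread(() -> ownContext.runOnContext(v -> {
        // Back on this verticle's worker context, where blocking is acceptable
        boolean worker = Vertx.currentContext().isWorkerContext();
        report.complete((worker ? "worker" : "event-loop")
            + " context on " + Thread.currentThread().getName());
      }), "third-party-thread").start();
    }
  }

  public static String runDemo() throws Exception {
    Vertx vertx = Vertx.vertx();
    try {
      CompletableFuture<String> report = new CompletableFuture<>();
      vertx.deployVerticle(new WorkerVerticle(report),
          new DeploymentOptions().setWorker(true));
      return report.get(10, TimeUnit.SECONDS);
    } finally {
      vertx.close();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runDemo());
  }
}
```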

Steve Hummingbird

Aug 3, 2018, 7:28:40 AM
to vert.x
I am not sure we really are talking about the same thing here. But in any case this looks quite bad.

First off, we should be very clear about whether this is intended behaviour. If it is intended that invoking vertx.runOnContext from a worker thread ends up on the main event loop, and therefore blocks the event loop and with it the whole vertx application, this is really bad!

Not only does this not seem to be documented anywhere, but it is recommended all over the web that one should use runOnContext (often without making clear whether it's the Vertx or the Context one) to get back on the correct event loop. In reality, calling runOnContext might do more harm than good, as in the worst case it will block the whole application. Not calling runOnContext, however, will at worst lead to degraded performance, as became clear in this previous conversation: https://groups.google.com/d/msg/vertx/KLbQs0eUSbU/XykWtuTtBwAJ 
I myself added vertx.runOnContext to a library I maintain due to a recommendation from the vertx team. But as I can see now, this can have horrible consequences when the library is invoked from within a worker verticle!

If, however, this is not intended behaviour, which I really hope, then this is a bug that should be fixed as soon as possible, as people might be running code like this without even knowing it until the moment comes when there are too many invocations and their app breaks.

@Blake
Yes, this change seems to resolve the issue. However, I think it is really dangerous that the vertx.runOnContext method might move you from a worker thread onto the main event loop without at least clearly warning people about that in the docs.

@sameer
You probably should only do this while debugging something on the main event loop. The warning is there for all the right reasons in this case, because what is actually happening is that this breaks the whole vertx application! We never intended to end up on the main event loop in the first place.

Julien Viet

Aug 3, 2018, 9:10:16 AM
to ve...@googlegroups.com
Hi,

You are not executing runOnContext from a Vert.x worker thread.

You are executing runOnContext from a non-Vert.x thread (a Mongo thread, because you are using the real Mongo client and not the Vert.x Mongo client), which results in creating an event-loop context.

If you want to be on the context thread of your worker then, as said previously, you should not call vertx.runOnContext but instead capture the current context and then run on that context.

In short what you are doing now is:

// Vert.x worker verticle thread
new Thread() {
  public void run() {
    vertx.runOnContext(v -> {
      // Run on event loop
    });
  }
}.start();

which is equivalent to:

// Vert.x worker verticle thread
new Thread() {
  public void run() {
    // Get the current context, since we are not on vertx thread then it creates a new context
    Context ctx = vertx.getOrCreateContext();
    ctx.runOnContext(v -> {
      // Run on event loop
    });
  }
}.start();

what you should do instead is:

// Vert.x worker verticle thread
// Get the current context, since we are on the worker thread then we get the worker context
Context ctx = vertx.getOrCreateContext();
new Thread() {
  public void run() {
    ctx.runOnContext(v -> {
      // run on the worker thread which can block fine
    });
  }
}.start();

HTH

Julien



Steve Hummingbird

Aug 3, 2018, 9:40:43 AM
to vert.x
Thanks, Julien. This is clear by now. Yes, I am not calling vertx.runOnContext from a vertx worker thread, but all the method calls started on a worker thread. It is still a little weird that code started from a worker thread can end up blocking the main event loop, even though it is easy to explain why that happens.

The actual issue here is that it is extremely difficult to find out how code that makes use of threads should be dealt with in vertx. There are really two sets of problems:

- how the internals of third party code works. One basically needs to look through every library and figure out if there is something involved which would impact the thread or the vertx context and take appropriate measures
- how vertx and application code actually works and in which context they are used

I have put quite a lot of time into reading everything I could find on this topic, asked quite a few questions here on the forum, and yet every few months something still crops up that is totally unexpected and very often not clearly documented.

I think vertx.runOnContext should almost never be used, unless one can make absolutely sure that this code will never ever be called from a worker context. But I really don't know what would be an appropriate measure to ensure that.

Secondly, one should never ever recommend using runOnContext without specifying that it is the context.runOnContext method, or without making sure the code can never be called from a worker context. The consequences are just too horrific :/

Blake

Aug 3, 2018, 9:42:39 AM
to vert.x
Hey Steve,

I agree with you that it's more of an omission from the docs and it's unclear that this would happen, because, as you say, it's recommended in many places to just call vertx.runOnContext() to get back on the context, and that doesn't work if you're NOT on a vert.x thread.
However, I think you may be misunderstanding? It's NOT because you're running on a worker verticle. It's because you're running on a non-vertx thread entirely. When you call collection.find().forEach({ // this is a different thread that has nothing to do with vert.x}) and when you call runOnContext in it, it determines the thread is a non-vertx thread and creates a new event loop.
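The distinction Blake draws can be observed directly with the static `Vertx.currentContext()`, which returns the current context on a Vert.x thread and `null` on a thread Vert.x knows nothing about. A small sketch, assuming Vert.x core 3.x/4.x; the class name is made up.

```java
import io.vertx.core.Vertx;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class CurrentContextCheck {

  // Returns {hasContextOnVertxThread, hasContextOnPlainThread}
  public static boolean[] runDemo() throws Exception {
    Vertx vertx = Vertx.vertx();
    try {
      CompletableFuture<Boolean> onVertxThread = new CompletableFuture<>();
      CompletableFuture<Boolean> onPlainThread = new CompletableFuture<>();

      // Inside a Vert.x handler there is always a current context
      vertx.runOnContext(v -> onVertxThread.complete(Vertx.currentContext() != null));

      // A plain JDK thread (like a driver callback thread) has none, which is
      // why vertx.runOnContext has to create a new context when called there
      new Thread(() -> onPlainThread.complete(Vertx.currentContext() != null)).start();

      return new boolean[] {
          onVertxThread.get(5, TimeUnit.SECONDS),
          onPlainThread.get(5, TimeUnit.SECONDS)
      };
    } finally {
      vertx.close();
    }
  }

  public static void main(String[] args) throws Exception {
    boolean[] r = runDemo();
    System.out.println("Vert.x thread has context: " + r[0]);
    System.out.println("plain thread has context:  " + r[1]);
  }
}
```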

So technically you're NOT blocking the "main" event loop either. It's starting an entirely NEW event loop (and then it informs you that you're blocking it with the sleep call). (Just tacking this on because I see you wrote a response while I was writing this; it seems you just figured out the above too.)

Cheers,
Blake

Steve Hummingbird

Aug 3, 2018, 10:16:52 AM
to vert.x
So technically you're NOT blocking the "main" event loop either. 

Are you sure about this? If I issue multiple requests against the server, I can see that it quickly becomes unresponsive (using the code I put up on github). Once the thread stops sleeping, the server becomes responsive again. So I think something is blocking the main event loop after all. But maybe I am just running out of fresh event loops :D

Thank you for clarifying.

Blake

Aug 3, 2018, 10:22:39 AM
to vert.x
I'm not so sure you need to know that much about the underlying third-party library? Generally, if it's async and you have to pass in a callback, then that callback is going to run on their thread (a non-Vert.x thread, whether you called it from a worker or a regular event loop).

I'm not sure if you meant vertx worker context when you said called from a "worker context", but if you did: it IS okay to call runOnContext from a worker context as it will put you back on that worker's context and you're good to go. It's calling it from a non-vertx thread that is the problem as that will create a new event loop.

**Steve, you're right, I misspoke about creating an entirely new event loop. It will create a new event loop CONTEXT, but not a new event loop (it will use the same old event loop). My bad!
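That reuse is consistent with the unresponsiveness Steve observed: a context created from a foreign thread is bound to one of the same pooled event-loop threads that serve the verticles. A sketch that makes this visible by shrinking the pool to a single thread, assuming Vert.x core 3.x/4.x; `setEventLoopPoolSize(1)` and the class name are purely for illustration.

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class SharedEventLoopDemo {

  // Returns {verticle's thread name, foreign-thread handler's thread name}
  public static String[] runDemo() throws Exception {
    // A single event-loop thread makes the sharing obvious
    Vertx vertx = Vertx.vertx(new VertxOptions().setEventLoopPoolSize(1));
    try {
      CompletableFuture<String> verticleThread = new CompletableFuture<>();
      CompletableFuture<String> foreignThread = new CompletableFuture<>();

      // A regular (event-loop) verticle, standing in for the HTTP server verticle
      vertx.deployVerticle(new AbstractVerticle() {
        @Override
        public void start() {
          verticleThread.complete(Thread.currentThread().getName());
        }
      });

      // vertx.runOnContext from a non-Vert.x thread gets a NEW context,
      // but that context still runs on the shared event-loop thread
      new Thread(() -> vertx.runOnContext(v ->
          foreignThread.complete(Thread.currentThread().getName()))).start();

      return new String[] {
          verticleThread.get(5, TimeUnit.SECONDS),
          foreignThread.get(5, TimeUnit.SECONDS)
      };
    } finally {
      vertx.close();
    }
  }

  public static void main(String[] args) throws Exception {
    String[] names = runDemo();
    System.out.println("verticle ran on:       " + names[0]);
    System.out.println("foreign handler ran on: " + names[1]);
  }
}
```

With the default pool size the two may or may not land on the same thread, but any blocking handler on such a context still ties up a thread that other verticles depend on.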

Steve Hummingbird

Aug 3, 2018, 10:43:19 AM
to vert.x
Well, yes. In the end it is not very complex: whenever in doubt, store the context and afterwards put yourself on the context again. And then never ever use the vertx.runOnContext method unless you do find a good reason to.
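That rule of thumb can be packaged as a small helper. A minimal sketch, assuming Vert.x core 3.x/4.x; `ContextBound.bind` is a hypothetical name, not a Vert.x API. It captures the context at the moment the callback is created (which must happen on a Vert.x thread) and returns a callback that hops back onto that context whenever some other thread invokes it.

```java
import io.vertx.core.Context;
import io.vertx.core.Handler;
import io.vertx.core.Vertx;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public final class ContextBound {

  private ContextBound() {}

  // Hypothetical helper: capture the current context NOW (call this on a
  // Vert.x thread) and return a callback that runs the handler on it later.
  public static <T> Consumer<T> bind(Vertx vertx, Handler<T> handler) {
    Context ctx = vertx.getOrCreateContext();
    return value -> ctx.runOnContext(v -> handler.handle(value));
  }

  // Usage: bind while on an event-loop context, complete from a JDK pool thread
  public static String runDemo() throws Exception {
    Vertx vertx = Vertx.vertx();
    try {
      CompletableFuture<String> seenOn = new CompletableFuture<>();
      vertx.runOnContext(v -> {
        Consumer<String> callback =
            bind(vertx, s -> seenOn.complete(Thread.currentThread().getName()));
        CompletableFuture.supplyAsync(() -> "result").thenAccept(callback);
      });
      return seenOn.get(5, TimeUnit.SECONDS);
    } finally {
      vertx.close();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("handler ran on " + runDemo());
  }
}
```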

The thing about third-party stuff is that there is a lot of code out there which makes use of threads where you probably would not expect it. CompletableFuture is one example.
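As an illustration of that point (plain JDK, no Vert.x involved): `CompletableFuture.supplyAsync` without an explicit executor runs its task on `ForkJoinPool.commonPool()`, or on a new thread per task when the common pool's parallelism is below two, so callbacks chained onto it typically run on a thread Vert.x does not know about. The class and method names are made up for this sketch.

```java
import java.util.concurrent.CompletableFuture;

public class HiddenThreadsDemo {

  // Returns the name of the thread that executed the async task
  public static String asyncThreadName() {
    // No executor argument: the JDK picks the thread (common pool, or a new
    // thread per task if the pool's parallelism is below two)
    return CompletableFuture
        .supplyAsync(() -> Thread.currentThread().getName())
        .join();
  }

  public static void main(String[] args) {
    System.out.println("caller thread: " + Thread.currentThread().getName());
    System.out.println("task thread:   " + asyncThreadName());
  }
}
```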

I think one thing would be totally awesome: a section in the docs about how to deal with code that uses threads. For example, what can happen if you run something on a different context or thread, and what is not safe to do on a different context/thread (and what will happen if you do it anyway). I think most people only stumble upon this stuff when they are already hitting an issue and have most likely already written a lot of code they now need to refactor. Then they start reading various postings or blogs and often only get an incomplete picture.

Julien Viet

Aug 4, 2018, 4:31:55 AM
to ve...@googlegroups.com
I fully agree with you about Vertx#runOnContext.

We should at least add a note in the javadoc that it should be used with care when called from a non-Vert.x thread,

and perhaps deprecate it.

Julien

Julien Viet

Aug 4, 2018, 4:33:11 AM
to ve...@googlegroups.com

On 3 Aug 2018, at 16:43, Steve Hummingbird <Steve.Hu...@yandex.com> wrote:

I think one thing would be totally awesome: A section in the docs that mentions how to deal with code which uses threads. Like what can happen if you run something on a different context or thread. What is not safe to do when on a different context/thread (e.g. what will happen if you do it anyways). I think most people only stumble upon that stuff when they are already hitting an issue and most likely have already written a lot of code they now need to refactor. And then they start reading various postings or blogs and often only get an incomplete picture.

Yes, that would be a good thing.

Steve Hummingbird

Aug 4, 2018, 8:26:13 AM
to vert.x
I refactored all the code to properly store the context before ending up on another thread and replaced all invocations of vertx.runOnContext with the context's method. It looks like everything works as it should now.

Thank you for all your help and comments. Hopefully this will be helpful to others in dealing with all the context stuff.

Steve Hummingbird

Aug 4, 2018, 8:31:03 AM
to vert.x
+1 for deprecating vertx.runOnContext, as it can be really dangerous when it is built into libraries or hidden away in the implementation details of methods.

Julien Viet

Aug 4, 2018, 10:40:58 AM
to ve...@googlegroups.com
It will also give you the benefit of better performance.


Steve Hummingbird

Aug 4, 2018, 12:12:31 PM
to vert.x
Are there any details available somewhere about what causes performance penalties when code is executed on a different thread/context? I would really like to read up on that.