multiple javascript verticle instances and thread safety

Hoobajoob

Sep 21, 2018, 10:41:42 AM9/21/18
to vert.x
We recently found a problem making use of a third-party javascript library in a module that is used by a javascript verticle. The problem only manifests when there are multiple instances of this js verticle, two or more of the instances are assigned to different threads, and at least two simultaneous requests are forwarded to verticle instances on different threads.

My question is: given that Nashorn has limitations in multithreaded usage, how has the vertx-lang-js implementation approached the use of Nashorn to protect against these issues? It is very common for javascript libraries to presume that module-global data will not be accessed from multiple threads.

The fact that we encountered this problem indicates that something is missing in the current implementation. Has anyone else out there encountered similar issues?

Hoobajoob

Sep 21, 2018, 1:37:49 PM9/21/18
to vert.x
Here is a description of a simple way I have found to reproduce this issue. I started with a javascript module called 'onePlusOne.js' that uses module global data, like so (which I adapted from this article):

var i = 0;
module.exports = {
  add: function() {
    i = 0;
    i += 1;
    var shortly_later = new Date()/1000 + Math.random();
    while ((new Date()/1000) < shortly_later) { Math.random(); } // busy-wait to prevent optimizations
    i += 1;
    return i;
  }
};
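To make the race concrete, here is a plain-JS sketch (runnable in Node, no vert.x needed) that manually interleaves the two halves of add() the way two preempting threads could; the split point is hypothetical, standing in for the busy-wait above:

```javascript
// Module-global counter, as in onePlusOne.js.
let i = 0;

// add() split at the busy-wait: the code before it, and the code after it.
function addFirstHalf()  { i = 0; i += 1; }
function addSecondHalf() { i += 1; return i; }

// One possible interleaving of two concurrent add() calls, A and B:
addFirstHalf();                  // A runs its first half: i == 1
addFirstHalf();                  // B preempts and runs its first half: i == 1
const resultA = addSecondHalf(); // A resumes: i == 2, A gets 2 (looks fine)
const resultB = addSecondHalf(); // B resumes: i == 3, B gets 3 (wrong!)
console.log(resultA, resultB);   // → 2 3
```

Whether a given pair of calls actually collides depends on thread timing, which is why the runs below report a different mix of right and wrong answers each time.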


Now, use the vertx-lang-js require implementation to create a variable bound to this module from inside a js verticle's vertxStart function, and use it to produce results for event bus requests to add 1+1:

var adder = require('./onePlusOne');

vertx.eventBus().consumer('doOnePlusOne', function(message) {
  var onePlusOne = adder.add();
  message.reply({ result: onePlusOne });
});


Now, from somewhere else in the program, deploy the above verticle with, say, 8 instances, and then launch a bunch of threads that all ask it to add 1+1. The snippet below is from a JUnit test using vertx-unit's TestContext.async() to provide synchronous Callables.

        ExecutorService executor = Executors.newCachedThreadPool();
        ArrayList<Future<Integer>> results = new ArrayList<>();

        for(int i = 0; i < 50; i++) {
            results.add(executor.submit(() -> {
                Async added = theContext.async();
                JsonObject addition = new JsonObject();
                vertx.eventBus().<JsonObject>send("doOnePlusOne", new JsonObject(), ar -> {
                    if ( ar.succeeded() ) {
                        addition.put("onePlusOne", ar.result().body().getInteger( "result" ));
                    }
                    added.complete();
                });
                added.awaitSuccess();

                return addition.getInteger( "onePlusOne" );
            }));
        }


When I run the above code against a deployment with 8 instances of the js verticle, I get summary results like this:

Overall: 26 wrong values for 1 + 1, 24 right ones.
Overall: 16 wrong values for 1 + 1, 34 right ones.
Overall: 18 wrong values for 1 + 1, 32 right ones.
Overall: 25 wrong values for 1 + 1, 25 right ones.


If I change the instance count for the js verticle to 1, no problem:

Overall: 0 wrong values for 1 + 1, 50 right ones.
Overall: 0 wrong values for 1 + 1, 50 right ones.
Overall: 0 wrong values for 1 + 1, 50 right ones.
Overall: 0 wrong values for 1 + 1, 50 right ones.

The issue, of course, is that the entire javascript ecosystem of libraries is designed and implemented for traditional javascript engine hosts, all of which promise that module-global data will not be accessed from multiple threads.

At the moment, unless I have missed something about how best to implement a js verticle, it appears that vertx-lang-js does not create independent global spaces that are thread-safe for different js verticle instances.

Please let me know if I have missed something or can use an approach that won't constrain all non-trivial js verticle instance counts to 1.

Blake

Sep 21, 2018, 6:41:54 PM9/21/18
to vert.x
I'm not sure what JS library you're using that has globals being modified in a way that messes up each verticle, but I don't think you're going to have a very good workaround here. I hope someone else comments and can prove that statement wrong. The obvious option is that if they're in different VMs you're fine, but that's not so practical. In your example, you could have modified the addition to work with a locally scoped variable, but that's about as far as my recommendation goes. It doesn't matter whether you're running JS or Java: if you modify a global from multiple threads, you're going to have a problem unless you implement some type of synchronization.
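For the toy example, the locally scoped version Blake describes would look something like this (plain-JS sketch):

```javascript
// Thread-safe variant of add(): the accumulator is a local variable,
// so each invocation gets its own copy and concurrent calls cannot
// observe each other's intermediate state.
function add() {
  let i = 0;
  i += 1;
  i += 1;
  return i;
}

console.log(add()); // → 2, regardless of how many callers run at once
```

Of course, this only works when the state really can be made per-call; it doesn't help with third-party libraries that keep module-global state by design.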

Hoobajoob

Sep 21, 2018, 7:02:18 PM9/21/18
to vert.x
This isn’t a programming problem with the individual js libraries.

Nashorn has some facilities to isolate module global data that it appears vertx-lang-js isn’t using when verticle instances are created.

It would be impractical to constrain the use of js within vertx to only those libs that were designed to be re-entrant. You're likely to find about zero libs even mentioning reentrancy. There is no practical reason for any js lib author to do so when not a single standard js engine exposes module-global data to access from multiple threads.

It’s likewise impractical to constrain js verticles to deployments where the instance count is 1.

This is not to criticize either js or vertx-lang-js, but to highlight what I found and ask whether there is any current workaround. If not, I'm happy to add a new issue for vertx-lang-js. I just wanted to get feedback on these findings.

I'm not a nashorn expert either, so I don't know what options there are with the as-is jdk 8 libs.

Paulo Lopes

Sep 22, 2018, 3:28:38 PM9/22/18
to vert.x
Hoobajoob, I guess you're thinking of Nashorn's loadWithNewGlobal function. That function loads a script with an isolated new context, or global if you wish. However, we can't use it, as no module would be able to see other modules.

Javascript has no spec for how to behave in a multi-threaded world, and it's assumed that it always runs on a single thread. So the synchronization issues you're seeing are not related to vertx or nashorn, but arise because the lib you're using never considered the case where it would run in a multi-threaded environment, and so never implemented proper synchronization.

Now, you can use vertx shared data to help you, or avoid using threads yourself (as in the executor code you have) and just use the vertx event loop.

Hoobajoob

Sep 22, 2018, 5:21:11 PM9/22/18
to vert.x
I agree the factory couldn't just load the verticle js by itself using loadWithNewGlobal, but it seems it should be possible to bring all of the other vertx globals and jvm-npm definitions into that same context by including them in the scope that's loaded by loadWithNewGlobal.

I don't know Nashorn from a hole in the ground, but maybe something like this in the verticle factory?

String loadJvmNpm = "load({name: 'jvm-npm.js', script: \"" + jvmNpmAsString + "\"});";
String loadVertxGlobals = "load({name: 'vertx-globals', script: \"" + vertxGlobalDefsAsString + "\"});";
String verticleRequire = "require.noCache('" + verticleName + "', null, true);";

String fullContext = loadJvmNpm + loadVertxGlobals + verticleRequire;
String loadFullContext = "loadWithNewGlobal({script: \"" + fullContext + "\", name: \"" + verticleName + "\"});";

engine.eval(loadFullContext);




Paulo Lopes

Sep 23, 2018, 1:37:33 AM9/23/18
to vert.x
You can't do that, as you can't serialize all "globals" to a string: you would miss every memory reference. Also, for clarification, vertx-lang-js only has one global variable set by the verticle factory: the vertx instance.

Any other global comes from your libraries, so if you want to avoid global state you'll need to fix the whole ecosystem, I'm afraid.

Anyway, you should look at shared data in vertx, as it includes structures that allow you to work safely across threads or verticles.

Hoobajoob

Sep 23, 2018, 1:54:08 AM9/23/18
to vert.x
Out of curiosity, I downloaded the source distribution for vertx-lang-js (for version 3.4.1, the version we're using from here) and tweaked the verticle loading process to make use of loadWithNewGlobal and populate it with the vertx globals and the jvm-npm implementation. The unit tests all seem to still pass with this modification, but I haven't tried using the revised jar to see if it solves the onePlusOne example above. I'll try that next.

The approach I used was:

1) load jvm-npm.js into the new scope using Nashorn's classpath 'load' and 'loadWithNewGlobal', like so
2) add the vertx instance to the new scope, like so
3) add the remaining vertx globals to the new scope like so
4) load the requested verticle script to the new scope like so, and proceed with calling its start function

The contents of the additional resource files referenced in the different load statements are also in the same gist.

Hoobajoob

Sep 23, 2018, 2:02:59 AM9/23/18
to vert.x
With this re-built jar substituted in, the example above now consistently reports no errors under a deployment of 8 instances of that js verticle.

Paulo Lopes

Sep 23, 2018, 4:27:43 AM9/23/18
to vert.x
For simple scripts it should work, but you will probably run into trouble when you require modules which have state on their own. In that case the state won't be visible to the caller modules and your behavior will be undefined.

Paulo Lopes

Sep 23, 2018, 4:33:06 AM9/23/18
to vert.x
What could work, and would be more future-proof (say, compatible with graaljs), would be having a script engine per thread instead of one per verticle factory.

Hoobajoob

Sep 23, 2018, 9:36:18 AM9/23/18
to vert.x
An engine instance per verticle seems like it might be just as straightforward, maybe simpler.

Do you have an example of a scenario that would break under the approach in this example? I’m not sure I understand what you mean by requiring modules that “have state on their own”?

Hoobajoob

Sep 23, 2018, 9:47:14 AM9/23/18
to vert.x
...misread your comment about engine-per-thread; I was thinking instance-per-verticle. It's not clear to me that it's important or necessary to establish an engine per thread. There are a couple of exchanges with the nashorn team out there that seem to suggest so. I think the link below is one that addresses that question specifically:

https://stackoverflow.com/a/30159424

Hoobajoob

Sep 23, 2018, 10:54:50 AM9/23/18
to vert.x
@Paulo: I need to determine the most viable near-term approach to fixing our current problem with 3.4.1, where we need js-verticle instances to have isolated module-global spaces. Can you see any obstacle to using the example gist here as our own custom verticle factory, using a different prefix (say 'js-mt' or something)? The only possible coupling I see in vertx-lang-js with the current verticle factory is in jvm-npm.js here:
https://github.com/vert-x3/vertx-lang-js/blob/master/src/main/resources/vertx-js/util/jvm-npm.js#L39

...I'm not sure, but it seems like that reference to the current js verticle factory is used mostly for 'require' module bookkeeping rather than an implementation dependency. Is that right?

I'm going to give it a shot at any rate - please let me know if there are any complications with this approach.

I understand everybody's busy digging into graal, but is it worth adding an issue to vertx-lang-js for this?

Paulo Lopes

Sep 23, 2018, 2:06:25 PM9/23/18
to vert.x
Yes, having one engine per verticle is a safer bet because of threading. Say the js code never calls java code that spawns threads; then one engine per thread would always be safe, as event loops don't overlap when handling requests. But in cases like you illustrated there can be threads involved, and in that case it can be problematic.

I'd like to see your test on https://github.com/reactiverse/es4x as I'm proposing it as the future direction of js in vertx.

Hoobajoob

Sep 23, 2018, 5:20:01 PM9/23/18
to vert.x

Paulo Lopes

Sep 25, 2018, 8:24:10 AM9/25/18
to vert.x
Hi,

I've added your tests, and I have both good and bad news :)

The good news is that verticle isolation works out of the box with ES4X, as I've added many extra fixes to the jvm-npm loader and use a context per verticle instance.

The bad news is that accessing a context from 2 different threads on graal is not allowed at all (https://github.com/graalvm/graaljs/blob/master/docs/user/JavaInterop.md#multithreading). This means that if you're planning to upgrade in the future to use graal as the base runtime, because you want to take advantage of ES6/7 features such as classes, promises, generators, async/await, etc., your current code is not supported.

I think you could propose a pull request to vertx-lang-js to create a new context per verticle, as that will probably be useful for other users facing the same issue you reported.

Hoobajoob

Sep 25, 2018, 10:23:57 AM9/25/18
to vert.x
I'm not familiar at all with graal, but from what little I can pick up from the link you provided, I'm not sure where there would be a problem:

"Graal JavaScript supports multithreading by creating several Context objects from Java code. Multiple JavaScript engines can be created from a Java application, and can be safely executed in parallel on multiple threads."

Why would vertx on graal intentionally be accessing the same graaljs context from multiple threads?

To be clear, our current issue isn't coming from our own code creating threads. We don't create threads or thread pools of our own. The only thread management going on in our code is provided entirely by vertx.

Our scenario is this:

1. 4 instances of our js REST verticle are deployed by vertx. Each of these uses vertx-web to set up routes and handle http requests.
2. 4 instances of a different js verticle are deployed. Each instance of this js verticle listens for event bus requests, calls into some js libraries to compute the result, and sends a response over the event bus.
3. Somewhere else in the network, a bunch of REST clients start issuing http requests that are handled by the REST verticles in #1 above.
4. At least 2 requests are received nearly simultaneously in at least 2 of the REST verticles, and each issues an event bus request to the address that the verticles from #2 are listening on.
5. Because vertx has placed at least 2 of the verticles from #2 on separate event loop threads, at least 2 simultaneous requests can result in two different vertx-managed threads delivering event bus messages to 2 different js verticle instances simultaneously.
6. Because the verticles were not created with separate global spaces, bad results occur.

If, under the graal implementation, vertx can guarantee that no 2 vertx-managed threads will call into the same graaljs context, then it doesn't seem like there would be a problem.

Paulo Lopes

Sep 25, 2018, 10:39:36 AM9/25/18
to vert.x
Your scenario works fine with graal as long as there are no executeBlocking calls. In that case, because the context is linked to the instance, if one function is executed on the event loop while another is running on the worker pool, 2 threads are trying to access the same context, and that is not allowed. To bypass the issue, all blocking code should be isolated and run in independent verticles.

Since vertx has no knowledge of what language it is running, it knows nothing about graal, so there's no state/context propagated across calls that would let us properly isolate them.

Hoobajoob

Sep 25, 2018, 10:49:13 AM9/25/18
to vert.x
Ok - I gotcha. We're not currently making executeBlocking calls in any of our js verticles, but I can see why that would lead to trouble. Duly noted, Paulo.

Thanks!