Re: [vertx:40306] Can a single verticle instance use all cores in a machine?

Thomas SEGISMONT

Mar 27, 2017, 4:37:27 AM

to ve...@googlegroups.com

Each verticle instance is assigned its own context. The context is an execution abstraction which can be backed either by an event loop (standard verticle) or by a pool of worker threads (worker verticle). Different verticles (and thus contexts) can be backed by the same event loop. An event loop is really just a thread. The operating system decides where the thread will run: it could start on one core and move to another core (or socket) later. Vert.x does not set thread affinity (there is not even a standard Java API for this).

So, your verticle instance could run on different cores over time, but it seems strange that it consumes 100% CPU on all cores.

As a rule of thumb, you can deploy a verticle instance per core. But since verticle code is very different from one project to another, it's best to measure the results with some performance testing.
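As a minimal sketch of that rule of thumb, the instance count can be derived from the number of cores the JVM reports. This is only a starting point for measurement, not a recommendation; `suggestedInstances` is an illustrative helper and `com.example.MyVerticle` is a hypothetical class name, neither comes from this thread.

```java
final class InstanceCountSketch {

    // Rule-of-thumb starting point: one verticle instance per available core,
    // never less than one. The right number still has to be measured.
    static int suggestedInstances(int availableCores) {
        return Math.max(1, availableCores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int instances = suggestedInstances(cores);
        System.out.println("cores=" + cores + ", suggested verticle instances=" + instances);
        // With Vert.x on the classpath, the value would be passed along these lines
        // (com.example.MyVerticle is a hypothetical class name):
        //   vertx.deployVerticle("com.example.MyVerticle",
        //           new DeploymentOptions().setInstances(instances));
    }
}
```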

2017-03-27 8:34 GMT+02:00 Pradp <pradeep....@gmail.com>:

Hi,

I want to understand the relationship between verticle instance, event loop, context and CPU cores. From the documentation I understood that each verticle instance has one event loop mapped to it. In a sample program with a single verticle instance I was able to max out all the CPU cores in the machine. So, does that mean a single event loop is able to use all the CPU cores in the machine? If so, when should I increase the number of verticle instances when I start the Vert.x instance? Is there a guide or formula to figure out the optimum number of verticle instances to deploy based on the number of cores, or anything similar?

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/4f37b14a-20ca-409e-ae04-2b4b5598c7ae%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jez P

Mar 27, 2017, 8:37:43 AM

to vert.x

Doesn't the RxJava stuff happen on the fork/join thread pool by default? If so, you may be involving additional threads (and hence additional CPUs). In your flatMap call, could you try outputting the thread name?

On Monday, March 27, 2017 at 12:32:13 PM UTC+1, Pradp wrote:

Below is the program I used to check whether Vert.x with a single event loop can make use of all the cores in the machine. When I ran it and looked at the CPU usage in "htop", I could see 8 cores put to work. So is it all happening on one thread/one event loop?

public class VertxPerformanceTest extends AbstractVerticle {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        DeploymentOptions deploymentOptions = new DeploymentOptions();
        deploymentOptions.setInstances(1);
        vertx.deployVerticle("com.tesco.cec.VertxPerformanceTest", deploymentOptions);
    }

    @Override
    public void start() throws Exception {
        super.start();
        HttpClient httpClient = vertx.createHttpClient();
        IntStream.rangeClosed(1, 100).forEach(delay -> {
            vertx.setPeriodic(1, idx -> {
                HttpClientRequest req = httpClient.request(HttpMethod.GET, 80, "www.amazon.in", "/");
                req.toObservable().
                        flatMap(resp -> {
                            //System.out.println(resp.statusCode());
                            return resp.toObservable();
                        }).
                        subscribe();
                req.end();
            });
        });
    }
}

Y Ramesh Rao

Mar 27, 2017, 8:53:34 AM

to vert.x

I tried the program after changing the deployVerticle class path and "Thread.currentThread().getName()" is printing -- "vert.x-eventloop-thread-0"

Y Ramesh Rao

Mar 27, 2017, 8:59:12 AM

to vert.x

To give more context on where the System.out.println (SOP) was placed, here it is:

req.toObservable().
        flatMap(resp -> {
            System.out.println(Thread.currentThread().getName());
            return resp.toObservable();
        }).subscribe();

Jez P

Mar 27, 2017, 11:09:27 AM

to vert.x

One thread should not be able to use all the CPU on a multicore machine. It can only be running on one core at a time.
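One way to check this claim from inside the JVM is to compare the CPU time a single thread consumed with the wall-clock time that passed. The sketch below is a plain-JDK illustration, not Vert.x code; `SingleThreadCpuSketch` and `spin` are made-up names. It assumes the JVM supports per-thread CPU timing and reports -1 otherwise.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class SingleThreadCpuSketch {

    // Written at the end so the busy loop is not optimised away.
    static volatile long blackhole;

    /** Spins on the calling thread for about wallMillis; returns {wallNanos, cpuNanos}. */
    static long[] spin(long wallMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean supported = threads.isCurrentThreadCpuTimeSupported();

        // The wall-clock interval is measured around the CPU-time interval.
        long wallStart = System.nanoTime();
        long cpuStart = supported ? threads.getCurrentThreadCpuTime() : -1L;

        long deadline = wallStart + wallMillis * 1_000_000L;
        long sink = 0;
        while (System.nanoTime() < deadline) {
            sink += sink * 31 + 7; // pure CPU work, no blocking
        }
        blackhole = sink;

        long cpuEnd = supported ? threads.getCurrentThreadCpuTime() : -1L;
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = (cpuStart < 0 || cpuEnd < 0) ? -1L : cpuEnd - cpuStart;
        return new long[] {wallNanos, cpuNanos};
    }

    public static void main(String[] args) {
        long[] r = spin(500);
        System.out.println("wall ms=" + r[0] / 1_000_000 + ", thread cpu ms=" + r[1] / 1_000_000);
    }
}
```

If the thread CPU time stays at or below the wall time, that thread is using at most one core's worth of CPU, so a process-level figure far above 100% has to come from other threads.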

Jez P

Mar 27, 2017, 11:11:02 AM

to vert.x

I note also you've got 12G of memory usage. Not much of that will be your Java program. Have you tried running just "top"?

Pradp

Mar 27, 2017, 11:57:45 AM

to vert.x

Below is the program I used to check whether Vert.x with a single event loop can make use of all the cores in the machine. When I ran it and looked at the CPU usage in "htop", I could see 8 cores put to work. So is it all happening on one thread/one event loop?

public class VertxPerformanceTest extends AbstractVerticle {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        DeploymentOptions deploymentOptions = new DeploymentOptions();
        deploymentOptions.setInstances(1);
        vertx.deployVerticle("VertxPerformanceTest", deploymentOptions);
    }

    @Override
    public void start() throws Exception {
        super.start();
        HttpClient httpClient = vertx.createHttpClient();
        IntStream.rangeClosed(1, 100).forEach(delay -> {
            vertx.setPeriodic(1, idx -> {
                HttpClientRequest req = httpClient.request(HttpMethod.GET, 80, "localhost", "/");
                req.toObservable().
                        flatMap(resp -> {
                            //System.out.println(resp.statusCode());
                            return resp.toObservable();
                        }).
                        subscribe();
                req.end();
            });
        });
    }
}

Jez P

Mar 27, 2017, 1:54:09 PM

to vert.x

Show the rest of the output of top, please. I reiterate: you were using only one thread in the Vert.x application. Unless maybe GC was maxing things out, but I doubt that would use 7 threads. That demo does not prove that one application is using all 8 cores.

I notice you're now calling localhost whereas previously you were using google.in. What is the server you're calling into? Where is the code for that? What does the process list in htop tell you? The CPU meter visualisation alone is insufficient to tell you what a single application is doing.

Jez P

Mar 27, 2017, 1:55:43 PM

to vert.x

And again, you haven't explained what's using 12G of your memory. By default the JVM caps its heap at about 25% of your RAM (4G), so something else is consuming 8G. How do you know that's not contributing to your CPU load?
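To see what heap limit a given JVM actually picked, rather than relying on the 25% figure, the limit can be read at runtime. This is a small sketch using only the JDK; `HeapLimitSketch` is a made-up name, and the value depends on the JVM version, its flags (for example `-Xmx`) and any container limits.

```java
final class HeapLimitSketch {

    // Maximum heap the JVM will attempt to use, in MiB.
    // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap MiB   = " + maxHeapMiB());
        System.out.println("total heap MiB = " + rt.totalMemory() / (1024L * 1024L));
        System.out.println("free heap MiB  = " + rt.freeMemory() / (1024L * 1024L));
    }
}
```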

Jez P

Mar 31, 2017, 2:50:29 PM

to vert.x

Did you ever find out what was using the other 8G of your RAM?

Pradp

Apr 4, 2017, 3:30:20 AM

to vert.x

I'm more interested in CPU usage than in RAM. My intention here is to figure out how many verticle instances I need to deploy to utilise all the cores of the machine or container. I rewrote the program without RxJava to make sure it doesn't do anything additional that would use more cores. Below is the code:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.ext.web.Router;

import java.util.stream.IntStream;

public class RestCallPerformanceTest extends AbstractVerticle {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        DeploymentOptions deploymentOptions = new DeploymentOptions();
        deploymentOptions.setInstances(1);
        vertx.deployVerticle("RestCallPerformanceTest", deploymentOptions);
    }

    @Override
    public void start() throws Exception {
        super.start();
        Router router = Router.router(vertx); // unused in this test
        HttpClient httpClient = vertx.createHttpClient();
        // Registers 70000 periodic timers, each firing every 1 ms.
        IntStream.rangeClosed(1, 70000).forEach(delay -> {
            vertx.setPeriodic(1, idx -> {
                httpClient.getAbs("http://www.google.in", response -> {
                    response.bodyHandler(body -> body.toString());
                }).end();
            });
        });
    }
}

Below are the top results from running the above program. The Java process shows 738.4 %CPU. Does this indicate that it uses all 8 cores in the machine?

Below is the JVisualVM screenshot for the Java process.

Only one event loop thread was created. These were the only threads attached to the Java process; no other threads were shown in JVisualVM.

Does this imply that CPU core utilisation may not depend on the number of verticle instances I deploy? Can a simple operation like a REST call at high volume make use of all cores?
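For cross-checking a thread view like this from inside the process, the Java-level threads can be listed directly. This is a minimal JDK-only sketch with a made-up class name. One caveat, stated as an assumption: JVM-internal worker threads such as GC threads are native VM threads and, as far as I know, do not show up in this kind of list, so a single visible event loop thread does not rule out other threads burning CPU.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LiveThreadsSketch {

    // Names of all live Java threads known to the Thread API, sorted for readability.
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        for (String name : liveThreadNames()) {
            System.out.println(name);
        }
    }
}
```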

Jez P

Apr 4, 2017, 4:22:43 AM

to vert.x

Look at your CPU/GC trace. Do you see a correlation? (I do.)

Every time your GC stops, your CPU usage drops massively. What does that tell you? It tells me that you're doing a lot of GC, and that while GC is running your CPU usage is very high. When GC gets a break (always very brief), your CPU usage drops to almost nil.

You're constantly triggering GC. I didn't know GC could use more than one core, but it looks to me like that's what's happening.
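The same correlation can be checked without a profiler by reading the collector statistics the JVM exposes over JMX. The sketch below uses only the JDK and a made-up class name. Note that `getCollectionTime()` reports approximate accumulated elapsed time, not CPU time, so it shows how busy the collectors are but not how many cores they occupy.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcActivitySketch {

    // Sum of the accumulated collection time (ms) over all collectors.
    // Collectors that report -1 (undefined) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time ms=" + gc.getCollectionTime());
        }
        System.out.println("total GC ms=" + totalGcMillis());
    }
}
```

Sampling these counters periodically while the test runs would show whether collection counts climb in step with the CPU spikes.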

Jez P

Apr 4, 2017, 4:27:06 AM

to vert.x

By the way, are you aware that you're firing up 17000 periodic timers, each of which hits google.in every millisecond? That's probably why your GC load is immense: you're creating 70000 requests every millisecond, and then throwing them away once completed.

Jez P

Apr 4, 2017, 4:27:22 AM

to vert.x

70000 not 17000

Pradp

Apr 4, 2017, 5:16:33 AM

to vert.x

You are right. The GC activity and the CPU utilisation are correlated, so GC was using all the cores. To make sure, I changed the code to keep the CPU busy with less GC.

import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.ext.web.Router;

import java.math.BigInteger;
import java.util.stream.IntStream;


/**
 * Created by gts9 on 04/04/17.
 */
public class PerformanceTest extends AbstractVerticle {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        DeploymentOptions deploymentOptions = new DeploymentOptions();
        deploymentOptions.setInstances(1);
        vertx.deployVerticle("PerformanceTest", deploymentOptions);
    }

    @Override
    public void start() throws Exception {
        super.start();
        Router router = Router.router(vertx);
        HttpClient httpClient = vertx.createHttpClient();
        IntStream.rangeClosed(1, 100000).forEach(delay -> {
            vertx.setPeriodic(1, idx -> {
                keepCpuBusy();
            });
        });
    }

    private void keepCpuBusy() {
        BigInteger factValue = BigInteger.ONE;
        long t1 = System.nanoTime();
        for (int i = 2; i <= 100000; i++) {
            factValue = factValue.multiply(BigInteger.valueOf(i));
        }
        long t2 = System.nanoTime();
        String result = "CPU Id: Service Time(ms)=" + ((double) (t2 - t1) / 1000000);
        System.out.println(result);
    }

}

Now the CPU utilisation roughly corresponds to the number of verticle instances.

Thanks a lot Jez P.

To end this discussion, I'd like to clarify one point. It is advertised that Vert.x uses the multi-reactor pattern and, unlike Node.js, uses all the cores of the machine. But here I see that only one event loop is created when there is only one verticle instance. If I increase the number of instances, the event loop count increases and all the cores are used. From reading the docs and presentations that show multiple event loops, my understanding was that they would be spun up by default based on the number of cores, so as to utilise all of them. At least that is the impression I got, and a few of my colleagues got it too. If we have to increase the instances to make this happen, how is it different from Node.js? I have to do the same in Node.js: spin up more processes using a compute node module to make use of all cores. The only difference I can see is that in Vert.x this is built in natively, but the developer effort remains much the same.

Does it mean that a single process here can use all the cores, whereas Node needs multiple processes? How does the comparison work?

Jez P

Apr 4, 2017, 5:40:55 AM

to vert.x

Answering your question about number of cores: yes you have to increase instances. A verticle instance is bound to only one event loop thread - this guarantees single-threaded access to the state of that verticle instance. But you can break down your logic into multiple verticles (since the deployment unit is the verticle) - this enables you to scale verticles independently from one another for example. So those which do most work deploy with multiple instances, those which do least have fewest instances.
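The binding described above can be illustrated with a plain-JDK analogy. This is not Vert.x code and not how Vert.x is implemented: each stand-in "instance" gets its own single-thread executor as its "event loop", so its state is only ever touched by one thread and needs no locking, while several instances together can keep several cores busy. All names here are made up for the illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MultiReactorAnalogy {

    /** Stand-in for a verticle instance: plain, unsynchronised state. */
    static final class Instance {
        int handled; // only ever touched by this instance's own loop thread
    }

    /** Dispatches eventsPerInstance events to each instance; returns the per-instance counts. */
    static int[] run(int instances, int eventsPerInstance) {
        Instance[] state = new Instance[instances];
        ExecutorService[] loops = new ExecutorService[instances];
        for (int i = 0; i < instances; i++) {
            state[i] = new Instance();
            loops[i] = Executors.newSingleThreadExecutor(); // one "event loop" per instance
        }
        try {
            for (int e = 0; e < eventsPerInstance; e++) {
                for (int i = 0; i < instances; i++) {
                    Instance target = state[i];
                    loops[i].execute(() -> target.handled++); // runs on that instance's thread only
                }
            }
            int[] counts = new int[instances];
            for (int i = 0; i < instances; i++) {
                Instance target = state[i];
                // Read the counter on the owning thread, after all queued events have run.
                counts[i] = loops[i].submit(() -> target.handled).get();
            }
            return counts;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            for (ExecutorService loop : loops) {
                loop.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] counts = run(cores, 10_000);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("instance " + i + " handled " + counts[i] + " events");
        }
    }
}
```

Because every increment for a given instance runs on the same thread, no count is lost even though `handled` is an ordinary field. That is the kind of single-threaded guarantee one verticle instance gets, and it is also why a single instance cannot spread its own work over more than one core.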

In answer to your last question, primary differences from node (the ones that count for me, anyway):-

(1) JVM-based (tuning benefits and control)

(2) Transparent eventbus communication for clustering - easy horizontal scaling (IPC effectively managed by vert.x)

(3) What you hit on: single process using multiple cores as opposed to multiple processes using multiple cores - scale within process until you have to scale horizontally.

(4) Polyglot (yes, I know Node supports any language which transpiles to JS; however, the communities for e.g. PureScript are much smaller than those for, say, Scala)

(5) Independently scalable deployment units, within one process to optimise use of machine resources within process.

Others may have other opinions or point out things I didn't think of but should have :)