service discovery & circuit breaker best practice

Gadi Eichhorn

Jan 6, 2018, 2:06:59 PM

to vert.x

Hi,

This may have been answered before...

I would like to use a circuit breaker (CB) with services obtained from service discovery.

My question is: do you look up all services matching your filter on every call and then use the CB to call them, or do you cache the discovered services (loaded once) and only search again when the CB opens?

Microservice A (gateway)

Microservice B (REST)

Does A load all services at start and then use them to call B,

OR

does A discover the services each time it needs to call B?

If I use a CB, is there a way to reload services from the discovery service when the CB opens?

Any examples would be great.

Thanks,

G.

Clement Escoffier

Jan 7, 2018, 3:31:19 AM

to ve...@googlegroups.com

Hi,

It depends on what you want to achieve. If your platform does not support round-robin, you may want to retrieve all the services and iterate over them, re-executing the discovery on failure. If your platform supports some kind of round-robin, I generally re-discover the service in the half-open state:

https://github.com/redhat-developer/reactive-microservices-in-java/blob/master/openshift/hello-microservice-consumer-openshift-circuit-breaker/src/main/java/io/vertx/book/openshift/HelloConsumerWithCircuitBreakerHttpVerticle.java#L49

Notice that the cost of discovery depends on your service discovery backend. By default it uses a distributed map (so all nodes should have their own up-to-date copy), but if you use the Redis backend you are going to pay the round-trip price.
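As a rough, library-independent illustration of "cache the endpoints, iterate round-robin, and re-discover in the half-open state", here is a minimal sketch in plain Java. It is not the code from the linked example and does not use the Vert.x circuit breaker API; the class `RediscoveringBreaker`, its methods, and the `b1:8080` / `b2:8080` endpoints are hypothetical names chosen for illustration, and the discovery lookup is modeled as a simple `Supplier`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Simplified, single-threaded model of a breaker that re-discovers when half-open. */
final class RediscoveringBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int maxFailures;
    private final long resetTimeoutMs;
    private final Supplier<List<String>> discovery; // stand-in for a discovery lookup
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0L;
    private List<String> endpoints;
    private int next = 0;

    RediscoveringBreaker(int maxFailures, long resetTimeoutMs, Supplier<List<String>> discovery) {
        this.maxFailures = maxFailures;
        this.resetTimeoutMs = resetTimeoutMs;
        this.discovery = discovery;
        this.endpoints = discovery.get(); // initial lookup, cached afterwards
    }

    State state() {
        return state;
    }

    /** Returns the endpoint to call, or null while the circuit is open. */
    String acquire(long nowMs) {
        if (state == State.OPEN) {
            if (nowMs - openedAt < resetTimeoutMs) {
                return null; // fail fast, no lookup
            }
            state = State.HALF_OPEN;
            endpoints = discovery.get(); // re-discover before the trial call
            next = 0;
        }
        if (endpoints.isEmpty()) {
            return null;
        }
        String endpoint = endpoints.get(next % endpoints.size());
        next = (next + 1) % endpoints.size(); // round-robin over the cached list
        return endpoint;
    }

    void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    void onFailure(long nowMs) {
        failures++;
        // A failed trial call, or too many failures, (re)opens the circuit.
        if (state == State.HALF_OPEN || failures >= maxFailures) {
            state = State.OPEN;
            openedAt = nowMs;
        }
    }

    public static void main(String[] args) {
        RediscoveringBreaker breaker = new RediscoveringBreaker(
                2, 1000, () -> Arrays.asList("b1:8080", "b2:8080"));
        breaker.onFailure(0);
        breaker.onFailure(10); // second failure opens the circuit
        System.out.println(breaker.acquire(500));  // circuit still open
        System.out.println(breaker.acquire(1500)); // half-open, after a fresh lookup
    }
}
```

With this shape the lookup cost is only paid at start-up and once per reset timeout while the target is failing, which matters most for backends where each lookup is a network round trip.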

Clement

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/07e6f9a4-68f8-4380-a5f4-9f1cc492aa86%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gadi Eichhorn

Jan 7, 2018, 4:00:02 AM

to vert.x

Thanks Clement,

I will try that idea/example.

I am using ZooKeeper as the cluster manager. Does this support round-robin?

* docker discovery link

* zookeeper cluster manager

A strange related issue I get with ZK is that when I restart (Docker) I get a caching issue: I can still see the old services. Is this because:

1. my service records didn't get unpublished?

2. ZK keeps some cache internally?

The only way to remove the old records is to clean Docker with

> docker system prune

That removes all cache, and then I see just the single record deployed.

I am using embedded Vert.x (in Docker), so maybe I am not getting the SIGTERM properly, which keeps the records on ZK.

Thanks.

G

Clement Escoffier

Jan 7, 2018, 5:05:46 AM

to ve...@googlegroups.com

On 7 Jan 2018, at 10:00, 'Gadi Eichhorn' via vert.x <ve...@googlegroups.com> wrote:

> I am using zookeeper as cluster manager. does this supports round-robin?
> * docker discovery link
> * zookeeper cluster manager

Are you using something like Docker Swarm? If so yes. If not… No I don’t think pure docker does round-robin.

> A strange related issue I get with ZK is that when I restart (docker) I get some caching issue, I can still see the old services. is this because:
> 1. my services records didnt get unpublished?
> 2. ZK keep some cache internally

Are you using the Zookeeper Service Discovery Backend or just the Zookeeper cluster manager? In both cases, your code must unpublish the services.

> the only way to remove the old records is to clean docker with
> > docker system prune
> That removes all cache and then I see just the single record deployed
> I am using embedded vertx (docker) so maybe I am not getting the SIGTERM properly which keeps the records on ZK.

Do you unpublish the services in the `stop` method (from your verticle)?

Clement


Gadi Eichhorn

Jan 7, 2018, 4:48:46 PM

to vert.x

On Sunday, January 7, 2018 at 10:05:46 AM UTC, clement escoffier wrote:

> Are you using something like Docker Swarm? If so yes. If not… No I don’t think pure docker does round-robin.

Not using Swarm yet. I see your point about the implementation: with a Swarm service, Swarm will handle that for me.

> Are you using the Zookeeper Service Discovery Backend or just the Zookeeper cluster manager? In both case, your code must unpublish the services.

So far just the cluster manager. I didn't see a special backend for ZooKeeper. I use this:

compile group: 'io.vertx', name: 'vertx-service-discovery', version: vertx

compile group: 'io.vertx', name: 'vertx-service-discovery-bridge-docker-links', version: vertx

> Do you unpublish the services in the `stop` method (from your verticle) ?

Yes. With Hazelcast it worked fine, so it is something to do with the ZK implementation that doesn't clean up properly.

Clement Escoffier

Jan 8, 2018, 1:45:26 AM

to ve...@googlegroups.com

On 7 Jan 2018, at 22:48, 'Gadi Eichhorn' via vert.x <ve...@googlegroups.com> wrote:

> Yes. With Hazelcast it worked fine, so it is something to do with the ZK implementation that doesn't clean up properly.

Thanks for the report. Can you open an issue on the Vert.x Zookeeper project: https://github.com/vert-x3/vertx-zookeeper/issues? Ideally, provide a short reproducer.

Clement


Gadi Eichhorn

Jan 9, 2018, 3:38:34 AM

to vert.x

After more investigation, I think the main difference is that the Hazelcast cluster manager is embedded rather than external, so it cleans up better in a Docker container.

To make ZK work I had to stop the containers properly with compose up & down. CTRL+C just stops the containers in compose without sending the proper SIGTERM.
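For an embedded JVM, the clean-up only has a chance to run if the process is actually told to shut down: a JVM shutdown hook runs on normal exit and on SIGTERM, but never on SIGKILL. A minimal sketch of that idea in plain Java follows; `RegistrySketch` is a hypothetical in-memory stand-in for the external registry (it is not the ZooKeeper or Vert.x API), and the `dashboard` name and location are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an external registry such as a discovery backend. */
final class RegistrySketch {
    private final Map<String, String> records = new ConcurrentHashMap<>();

    void publish(String name, String location) {
        records.put(name, location);
    }

    void unpublish(String name) {
        records.remove(name);
    }

    int size() {
        return records.size();
    }
}

final class GracefulShutdownSketch {
    /** Builds the clean-up task separately from the hook so it can also be called directly. */
    static Runnable cleanupFor(RegistrySketch registry, String name) {
        return () -> registry.unpublish(name);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.publish("dashboard", "dashboard:8080");

        // Runs on normal exit and on SIGTERM; it does not run on SIGKILL.
        Runtime.getRuntime().addShutdownHook(
                new Thread(cleanupFor(registry, "dashboard")));

        System.out.println("published records: " + registry.size());
    }
}
```

In a real embedded setup the hook would presumably close the Vert.x instance so that each verticle's `stop` method gets to unpublish its records; that wiring is left out here.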

I have now switched from Docker links to the ZK backend:

```
compile group: 'io.vertx', name: 'vertx-service-discovery', version: vertx
compile group: 'io.vertx', name: 'vertx-service-discovery-backend-zookeeper', version: vertx
```

I will use ZK only for service discovery and the event bus. Using ZK for config is a mystery ...

G.

Gadi Eichhorn

Jan 12, 2018, 5:22:45 PM

to vert.x

Hi Clement,

Is there a more standard example that does not use RxJava?

So far I have managed to make the CB work, and indeed I get the error quickly when the service is not there.

Step 1:

I get all the records for the HttpEndpoint type I am after and cache them in a map. Since I will use Swarm in the end, this is fine for now.

```
vertx.setPeriodic(10000, t -> {
    serviceDiscovery.getRecords(r -> r.getType().equals(HttpEndpoint.TYPE), ar -> {
        if (ar.succeeded()) {
            log.debug("Lookup... {}", ar.result().size());
            ar.result().forEach(r -> records.computeIfAbsent(r.getName(), k -> r));
        } else {
            log.error("Lookup failed", ar.cause());
        }
    });
    log.debug("Records: {}", records);
});
```
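One thing to watch in a periodic lookup like this: `computeIfAbsent` only ever adds names, so a record that has been unpublished, or republished at a different location, stays in the map as it was first seen. A possible alternative is to replace the whole snapshot on every successful lookup. The sketch below shows that in plain Java; `RecordCacheSketch` and `Endpoint` are hypothetical stand-ins (not the Vert.x `Record` class), and the host names are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RecordCacheSketch {
    /** Minimal stand-in for a discovery record (hypothetical). */
    static final class Endpoint {
        final String name;
        final String host;
        final int port;

        Endpoint(String name, String host, int port) {
            this.name = name;
            this.host = host;
            this.port = port;
        }
    }

    // Readers always see one complete snapshot, never a half-updated map.
    private volatile Map<String, Endpoint> byName = Collections.emptyMap();

    /** Replaces the whole snapshot, so records missing from the lookup disappear. */
    void refresh(List<Endpoint> lookupResult) {
        Map<String, Endpoint> fresh = new HashMap<>();
        for (Endpoint e : lookupResult) {
            fresh.put(e.name, e); // the last record wins when a name repeats
        }
        byName = fresh;
    }

    Endpoint get(String name) {
        return byName.get(name);
    }

    public static void main(String[] args) {
        RecordCacheSketch cache = new RecordCacheSketch();
        cache.refresh(Collections.singletonList(new Endpoint("dashboard", "old-host", 8080)));
        cache.refresh(Collections.singletonList(new Endpoint("dashboard", "new-host", 8080)));
        System.out.println(cache.get("dashboard").host);
    }
}
```

Keeping only one record per name also means any load balancing has to happen elsewhere (for example in Swarm, as planned here).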

Step 2: I delegate any calls to the relevant endpoint. It's a gateway pattern.

Now I want to proxy my REST call to the relevant service endpoint, wrapped in the CB. This is my route handler method. I am very unsure about this.

```
private void onDashboard(RoutingContext context) {
    HttpServerRequest request = context.request();
    if (records.containsKey(WebClientProvider.SERVICE_NAME)) {

        // I have a record to match the name I am after
        circuitBreaker.<HttpClientResponse>execute(future -> {
            try (ServiceReference serviceReference = serviceDiscovery.getReference(records.get(WebClientProvider.SERVICE_NAME))) {
                HttpClient client = serviceReference.getAs(HttpClient.class);

                // here I try to delegate the request to the endpoint
                client.request(request.method(), request.uri(), dashboard -> {
                    dashboard.bodyHandler(body -> {
                        context.response().end(body);
                    });
                    future.complete();
                });
                serviceDiscovery.release(serviceReference);
            } catch (Exception ex) {
                log.error("Exception", ex);
                future.fail(ex);
            }
        }).setHandler(result -> {
            if (result.succeeded()) {
                // do I return anything here?
            } else {
                context.response().setStatusCode(500).end(Json.encode(result.cause()));
            }
        });
    } else {
        log.warn("Missing client service reference: {}", WebClientProvider.SERVICE_NAME);
        log.debug("Clients: {}", records);
    }
}
```

This always results in:

{
cause: null,
stackTrace: [ ],
message: "operation timeout",
localizedMessage: "operation timeout",
suppressed: [ ]
}

I am also getting these strange warnings:

gateway_1 | Jan 12, 2018 10:13:59 PM io.vertx.core.http.impl.ConnectionManager
gateway_1 | WARNING: Reusing a connection with a different context: an HttpClient is probably shared between different Verticles

Any help is appreciated.

G.


Clement Escoffier

Jan 13, 2018, 3:15:40 AM

to ve...@googlegroups.com

Hi,

This should be close to what you are looking for:

https://github.com/redhat-developer/reactive-microservices-in-java/blob/master/openshift/hello-microservice-consumer-openshift-circuit-breaker/src/main/java/io/vertx/book/openshift/HelloConsumerWithCircuitBreakerHttpVerticle.java

Clement


Gadi Eichhorn

Jan 13, 2018, 1:51:13 PM

to vert.x

thanks!!!

Finally it works...

I used this example -> http://escoffier.me/vertx-hol/ :)

private void onDashboard(RoutingContext context) {
    HttpServerRequest request = context.request();

    log.debug("Method: {}", request.method().name());
    log.debug("URI: {}", request.uri());

    if (records.containsKey(WebClientProvider.SERVICE_NAME)) {
        Record record = records.get(WebClientProvider.SERVICE_NAME);

        circuitBreaker.executeWithFallback(
            future ->
                // Forward the incoming request to the cached endpoint.
                vertx.createHttpClient().request(
                    request.method(),
                    record.getLocation().getInteger("port"),
                    record.getLocation().getString("host"),
                    request.uri(),
                    response -> {
                        response
                            .exceptionHandler(future::fail)
                            .bodyHandler(future::complete);
                    }).exceptionHandler(future::fail).setTimeout(5000).end(),
            // Fallback body, used when the call fails or the circuit is open.
            t -> Buffer.buffer("{\"message\":\"No web client service, or unable to call it\"}")
        ).setHandler(ar -> context.response().end(ar.result()));
    }
}
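For anyone reading later, the "result or fallback" part of this handler can be modeled without Vert.x. The sketch below uses only `CompletableFuture` from the JDK (Java 9 or later, for `orTimeout` and `failedFuture`); `FallbackSketch` and `callWithFallback` are hypothetical names, and this covers only the timeout-and-fallback behavior, not the failure counting or open/half-open states of a real circuit breaker.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class FallbackSketch {
    static final String FALLBACK =
            "{\"message\":\"No web client service, or unable to call it\"}";

    /** Completes with the call's result, or with FALLBACK on failure or timeout. */
    static CompletableFuture<String> callWithFallback(
            Supplier<CompletableFuture<String>> call, long timeoutMs) {
        CompletableFuture<String> attempt;
        try {
            attempt = call.get();
        } catch (RuntimeException e) {
            // A call that throws synchronously is treated like a failed call.
            attempt = CompletableFuture.failedFuture(e);
        }
        return attempt
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS) // fail if no answer in time
                .exceptionally(t -> FALLBACK);               // map any failure to the fallback
    }

    public static void main(String[] args) {
        // A call that answers immediately.
        System.out.println(callWithFallback(
                () -> CompletableFuture.completedFuture("dashboard body"), 1000).join());
        // A call that never answers: the fallback is returned after the timeout.
        System.out.println(callWithFallback(CompletableFuture::new, 100).join());
    }
}
```

Because every path ends in a value, the caller can always write a response, which is the same property the `executeWithFallback` version relies on when it ends the response with `ar.result()`.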
