Semantics of the CDI fault tolerance approach


sst...@redhat.com

May 10, 2017, 2:57:38 PM
to MicroProfile
I'm trying to understand how this ends up being processed by a container:

@ApplicationScoped
public class FaultToleranceBean {
   int i = 0;

   @Retry(maxRetries = 2)
   public Runnable doWork() {
      // This unreliable service sometimes succeeds but
      // sometimes throws a RuntimeException
      Runnable mainService = () -> serviceA();
      return mainService;
   }
}
I'm assuming that this creates an interceptor that has a RetryPolicy equivalent to this builder code:

RetryPolicy rp = FaultToleranceFactory.getInstance(RetryPolicy.class)
      .retryOn(RuntimeException.class)
      .withDelay(2, TimeUnit.SECONDS)
      .withMaxRetries(2);
I'm trying to understand how this will work when one is composing a workflow of microservice calls. In the builder-style case, I'm going to combine the policy with the Callables/Runnables that represent the services I'm composing.

For the CDI model, what would one do?

Emily Jiang

May 11, 2017, 11:39:33 AM
to MicroProfile
Hi Scott,

The implementer will have to provide an interceptor to apply the retry policy, which can use executor.withRetry(retryPolicy).run(mainService). Any fault tolerance annotation will trigger the interceptor being added on the method or the class level depending on the annotation specified on the method or class.

Basically, the implementer provides two interceptors, one synchronous and the other asynchronous. The default is synchronous invocation. With the annotation of asynchronous, the asynchronous interceptor will be applied.
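
To make the interceptor idea concrete, here is a minimal plain-Java sketch that uses a JDK dynamic proxy in place of a container-provided CDI interceptor. Everything in it is an assumption for illustration: the local Retry annotation only stands in for the proposed one, and Worker and RetryInterceptorSketch are hypothetical names. A real implementation would use the container's interceptor machinery rather than java.lang.reflect.Proxy.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the proposed @Retry annotation (not the spec's type).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Retry {
    int maxRetries() default 3;
}

// Hypothetical business interface; a real container would proxy the bean class itself.
interface Worker {
    @Retry(maxRetries = 2)
    String doWork();
}

final class RetryInterceptorSketch {
    private RetryInterceptorSketch() {}

    // Wraps 'target' so that calls to @Retry-annotated methods are re-run on failure.
    static Worker intercept(Worker target) {
        return (Worker) Proxy.newProxyInstance(
            Worker.class.getClassLoader(),
            new Class<?>[] {Worker.class},
            (proxy, method, args) -> {
                Retry retry = method.getAnnotation(Retry.class);
                // One initial attempt plus maxRetries further attempts.
                int attemptsLeft = (retry == null) ? 1 : retry.maxRetries() + 1;
                while (true) {
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        if (--attemptsLeft == 0) {
                            throw e.getCause(); // retries exhausted: surface the failure
                        }
                    }
                }
            });
    }
}
```

The caller only sees a Worker; the retry loop lives entirely in the wrapper, which is the role the interceptor would play in the CDI model. The retry is applied only because the call goes through the wrapper.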

Thanks
Emily

John D. Ament

May 16, 2017, 9:41:17 AM
to MicroProfile
But even then, I'm not sure this is the right API signature.  I would expect the interceptor to be applied like this:

@ApplicationScoped
public class FaultToleranceBean {
   int i = 0;

   @Retry(maxRetries = 2)
   public void doWork() {
      // call some long remote service that may fail
   }
}

E.g. it shouldn't return a runnable that you execute, but instead do the actual execution.

Emily Jiang

May 17, 2017, 5:26:25 PM
to MicroProfile
Maybe Runnable is not a good return type. All I want to emphasize is that the fault tolerance is supposed to wrap the call to the unreliable service, not the client method itself. The return value can be anything. We must not force the method to return void.



In the non-CDI approach, you have


public void myMethod() {
    ....
    // Set up a Duration of 1 second
    Duration duration = Duration.ofSeconds(1);

    // FaultTolerance retry policy with a max of 3 retries and 1 second between retries
    retryPolicy = retryPolicy.retryOn(Exception.class).withDelay(duration).withMaxRetries(3);

    // Create an Executor object that will run the call to "Primary" with our RetryPolicy
    Executor executor = FaultToleranceProvider.getFaultToleranceType(Executor.class);

    // Main Service
    Callable<Connection> mainService = () -> connectToPrimary();

    // Run the main service under the retry policy
    executor.with(retryPolicy).get(mainService);

    ...
}

protected Connection connectToPrimary() throws ConnectException {
    System.out.println("Main Service has been called");
    if (true)
        throw new ConnectException();
    // Shouldn't get here
    return null;
}

In the CDI-based approach, the annotations apply to the method connectToPrimary, not myMethod.

public void myMethod() {
    ...
    Connection c = connectToPrimary();
}

@Retry(maxRetries=3, delay=1)
protected Connection connectToPrimary() throws ConnectException {
    System.out.println("Main Service has been called");
    if (true)
        throw new ConnectException();
    // Shouldn't get here
    return null;
}

Ken Finnigan

May 19, 2017, 2:23:06 PM
to Emily Jiang, MicroProfile
Emily,

I'm not sure I see how your example for CDI would work if myMethod() made an HTTP call instead of calling an internal method, connectToPrimary().

Or are we suggesting that anything you want to "circuit break" must be in a wrapped method?

Ken


Emily Jiang

May 19, 2017, 5:32:27 PM
to MicroProfile, emij...@googlemail.com
Ken,

Yes, anything you want to apply a fault tolerance policy to must be in a wrapped method.
Emily

Ondrej Mihályi

May 19, 2017, 7:29:43 PM
to MicroProfile
I was also puzzled by the Runnable return type at first. But I think it also makes sense: the interceptor would wrap such a runnable with a higher order function that would intercept the runnable when it is called some time later.

E.g.:

@Retry
public Runnable remoteAction() {...}

Calling remoteAction().run() would apply retry policy around the action automatically.

This adds flexibility and supports functional programming nicely. The same concept could be applied with JavaSE by providing the higher order functions (functional interceptors) e.g. as static methods to wrap runnables.

With JavaSE:

Runnable actionWithRetries = Intercept.retry(new Runnable() {...});
actionWithRetries.run(); // would be intercepted by the retry higher order function
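
A minimal sketch of what such a higher order function could look like in plain Java SE follows. The Intercept class name is taken from the example above, but the body and the explicit maxRetries parameter are assumptions made for illustration and are not part of any proposed API.

```java
final class Intercept {
    private Intercept() {}

    // Returns a Runnable that runs 'action' once and, if it throws a
    // RuntimeException, re-runs it up to maxRetries more times.
    // The last failure is rethrown when all attempts are used up.
    static Runnable retry(Runnable action, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    action.run();
                    return; // success: stop retrying
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }
}
```

Nothing is executed when retry(...) is called; the policy only takes effect when run() is invoked on the returned Runnable, which matches the deferred behavior described above.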


I'll try creating a pull request to give more examples.

--Ondrej

sst...@redhat.com

May 25, 2017, 5:44:43 PM
to MicroProfile
Ok, I'm beginning to see the syntax. I feel there needs to be more of a description of the workflow a method is invoking, and this need has only increased given the announcement of a service mesh for microservices by Google, IBM, and Lyft, with support from Red Hat and others:
https://github.com/istio

One question is how the MicroProfile effort fits into a service mesh architecture. I can see both externalization of fault tolerance and injection of either policies or service proxies needing to be supported.

Here are a couple of brain dumps on trying to capture a notion of providing metadata about the workflow an edge service is involved with. This first mockup just describes the wrapped calls being made:
public class MyGateway {
    @Workflow(
        name = "display-book-info",
        calls = {
            @CallReference("#connectToPrimary()"),
            @CallReference("#restCall()")
        }
    )
    @GET()
    @Path("frontend-gateway")
    @Produces("application/json")
    public String myGatewayMethod() {
        Callable<Connection> dbCall = this::connectToPrimary;
        Callable<String> restCall = this::restCall;

        // ...

        return null;
    }

    @Retry(maxRetries=3, delay=1)
    protected Connection connectToPrimary() throws ConnectException {
        return null;
    }

    @Asynchronous
    @CircuitBreaker(delay = 5, delayUnit = ChronoUnit.SECONDS)
    protected String restCall() throws Exception {
        return null;
    }
}

and here is another based on some type of injection of service proxies that are registered with the mesh:
public class MyAbstractGateway {
    @ServiceReference(name="dbResults", endpoint = "/books")
    WebTarget dbResults;

    @Asynchronous
    @ServiceReference(name="images", endpoint = "/books/images")
    WebTarget images;

    @ServiceReference(name="reviews", endpoint = "/books/reviews")
    WebTarget reviews;

    @Workflow(
        name = "display-book-info",
        services = {
            @ServiceReference(name="dbResults", endpoint = "/books"),
            @ServiceReference(name="images", endpoint = "/books/images"),
            @ServiceReference(name="reviews", endpoint = "/books/reviews")
        }
    )
    @GET
    @Path("frontend-gateway")
    @Produces("application/json")
    public String myGatewayMethod() {
        dbResults.request().get(...);

        return null;
    }
}


The concept of a workflow is also something needed for tracing and for integration with Hystrix-type dashboards, I believe.

This would come after the 1.1 release.

sst...@redhat.com

May 25, 2017, 6:15:47 PM
to MicroProfile
I forgot the fault-tolerance annotations on the service reference injection sites:

public class MyAbstractGateway {
    @Retry(maxRetries=3, delay=1)
    @ServiceReference(name="dbResults", endpoint = "/books")
    WebTarget dbResults;

    @Asynchronous
    @CircuitBreaker(delay = 5, delayUnit = ChronoUnit.SECONDS)
    @ServiceReference(name="images", endpoint = "/books/images")
    WebTarget images;

    @Retry(maxRetries=3, delay=1)
    @ServiceReference(name="reviews", endpoint = "/books/reviews")
    WebTarget reviews;

    // ...
}


Ken Finnigan

May 26, 2017, 3:20:49 PM
to Scott M Stark, MicroProfile
On Thu, May 25, 2017 at 5:44 PM, <sst...@redhat.com> wrote:
Ok, I'm beginning to see the syntax. I feel there needs to be more of a description about the workflow a method is invoking, and this has only increased given the service mesh for micro services announcement by Google, IBM, and Lyft, with support from Red Hat and others:
https://github.com/istio

A question is how the MicroProfile effort fits into a service mesh architecture? I can see both externalization of fault-tolerance and injection of either policies or service proxies needing to be supported.

+1. MicroProfile will need to support both approaches as not everyone would necessarily be running in an environment where service mesh is present.

I'm not sure I fully understand how @Workflow fits into the picture.

I see that in this instance it's defining the set of calls that will be made by a service, but that only seems applicable in an aggregation service sense.

How would @Workflow be defined when the workflow is a sequence of chained service calls? Would each service have its own @Workflow annotation with the same name?

Is @Workflow equivalent to HystrixCommandKey? Essentially the name identifier of a "circuit". Though this seems broader than that, so not sure how it fits.

Ken
  


sst...@redhat.com

May 26, 2017, 3:59:49 PM
to MicroProfile, sst...@redhat.com
I was viewing the @Workflow as something more like the HystrixCommandGroupKey that represents the nodes of a call graph being made by the given service. Each service endpoint would in turn define its associated calls, so in the sequence-of-calls scenario where A calls B calls C, A & B would have @Workflow annotations and C would not.

I'm still trying to figure out how the Hystrix dashboard shows such a group to understand what the mapping would be. There is a relationship to Zipkin tracing as well.

Ken Finnigan

May 26, 2017, 4:05:09 PM
to Scott M Stark, MicroProfile
Makes sense.

Been a while since I've played with the dashboard so can't recall exactly, but I think each circuit "box" in the UI is a separate HCGK.


Emily Jiang

May 26, 2017, 6:41:54 PM
to MicroProfile, sst...@redhat.com
Scott,
What you proposed sounds quite encouraging. I can see FT has a great area for expansion. Integration with distributed tracing and the FT event stream should fit together nicely. In order not to lose this use case, I raised an issue in the FT area and marked it as an enhancement. It can also potentially work in OSGi bundles. Let's get the essential API out first and then expand and integrate with more pieces.

Thanks
Emily

Ian Robinson

Jun 16, 2017, 9:17:35 AM
to MicroProfile, sst...@redhat.com
The opportunity for application-level and infrastructure-level fault tolerance to clash or collaborate seems like something we should get ahead of with MicroProfile FT, with Istio as a concrete example of where this might occur. Both provide aspects of fault tolerance: MicroProfile's focus is on the Java application programming model (including the app configuration) whereas Istio's is on the routing infrastructure. Both obviously have to be able to function entirely in the absence of the other, but there are some separations of concern and opportunities for interaction we could work on.

For separation of concern, the fallback aspects of FT are exclusively in the domain of the application rather than the domain of the infrastructure (and the Istio proposal asserts this too), whereas policies for timeout, retries, bulkhead and circuit-breaking could be defined by both and there is a question over which should take precedence and how an application-level policy would even be aware of an infrastructure one (and vice-versa). For example, if an app uses 
@Retry(maxRetries = 5)

but Istio also has a simpleRetry policy of 3 then what should happen? The right answer is either 3 retries or 5 but it is not 15.
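
To see where the 15 comes from, here is a small simulation sketch in which an application-level policy wraps a proxy-level policy and the backend always fails. It assumes, for simplicity, that both numbers are counted as total attempts; if each were counted as retries on top of an initial attempt, the product would be larger still. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class NestedRetryCount {
    private NestedRetryCount() {}

    // Runs 'call' until it reports success or 'attempts' tries are used up.
    static boolean withAttempts(int attempts, BooleanSupplier call) {
        for (int i = 0; i < attempts; i++) {
            if (call.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    // Counts how often an always-failing backend is hit when the application
    // policy (outer loop) wraps the proxy policy (inner loop).
    static int backendHits(int appAttempts, int proxyAttempts) {
        AtomicInteger hits = new AtomicInteger();
        withAttempts(appAttempts, () ->
            withAttempts(proxyAttempts, () -> {
                hits.incrementAndGet();
                return false; // the backend never succeeds in this simulation
            }));
        return hits.get();
    }
}
```

With uncoordinated policies, backendHits(5, 3) reports 15 backend calls, which is the multiplication effect neither side intended.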

For timeout, Istio recognizes there may be different sources of truth for timeout and supports an app-provided (or app-framework-provided) header that enables the application to influence a routing timeout policy. It is far less clear how retries could be coordinated since Istio-driven retries are transparent to the application. I don't think we want the burden of some protocol to expose and resolve policy conflicts (that would all start to get a bit too WS-Policy for my liking and nobody wants to go there...) but we might want to think about a strategy for defining a portable way for a MicroProfile FT configuration to result in the generation of artefacts for an Istio route-rule to which the MicroProfile FT framework would then delegate. Or alternatively a means to tell Istio that a specific set of FT policies (including retry) is already in effect for a deployed microservice, managed at the application layer, so that Istio (Envoy) defers driving retries to the app even if a route-rule is configured.

To get much further we're going to need to have parts of this discussion in both the MicroProfile and Istio groups, but we have a good deal of overlap in participation in both.
One starting point here in MicroProfile could be to look at which parts of our current FT proposal could be extended to indicate that application-framework-level handling (for example of retries) is being delegated to the infrastructure. In that case our framework-level implementation of those policies would need to be aware at runtime that it was not in control of that policy, and we'd need a means to generate an external policy (e.g. for Istio route-rule config) from application config/annotations. The goal of this would be a self-contained Java programming model for FT regardless of underlying infrastructure (the path we're already on), and one that also then defines the Java bindings for the FT parts of Istio route-rules. Is that reasonable?

- Ian

Emily Jiang

Jun 16, 2017, 12:09:18 PM
to MicroProfile, sst...@redhat.com
Thanks for bringing this up, Ian!

MicroProfile FT should work well in Istio. We can use Istio to configure/influence the FT policies, e.g. retries, timeout, circuit breaker. In this way, the app FT policies are the same as the Istio FT policies if they match. Istio is just there to provide configuration for FT policies in the apps. With the idea mentioned in this thread, I think it is feasible.

As per your note: if an app uses @Retry(maxRetries = 5)
but Istio also has a simpleRetry policy of 3, then what should happen? The right answer is either 3 retries or 5 but it is not 15.

I would think the environment configuration should have the final say, which means the Retry should be 3.

Below is an example of istio config:

destination: "ratings.default.svc.cluster.local"
route:
- tags:
    version: v1
httpReqRetries:
  simpleRetry:
    attempts: 3


We can use the above config in Istio as a dynamic config source and give it a higher priority, so that it has overall control. Of course, we need to transform the property names according to the naming convention, e.g. transform attempts to maxRetries. In this way, it is nice and easy, without any confusion. The parameters are visible in the console as well.
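
As a rough sketch of that precedence rule, assuming the mesh settings have already been read into a flat map: the key "attempts" comes from the Istio snippet above, while the class name, the method names and the flat-map representation are all hypothetical and not part of the Config API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RetryConfigSketch {
    private RetryConfigSketch() {}

    // Translates mesh-style keys into the annotation's parameter names.
    // Only the attempts -> maxRetries mapping mentioned above is handled.
    static Map<String, String> fromMesh(Map<String, String> meshConfig) {
        Map<String, String> translated = new LinkedHashMap<>();
        String attempts = meshConfig.get("attempts");
        if (attempts != null) {
            translated.put("maxRetries", attempts);
        }
        return translated;
    }

    // The environment (mesh) value wins over the annotation value when present.
    static int effectiveMaxRetries(int annotationValue, Map<String, String> meshConfig) {
        String override = fromMesh(meshConfig).get("maxRetries");
        return (override != null) ? Integer.parseInt(override) : annotationValue;
    }
}
```

For the example in this thread, effectiveMaxRetries(5, Map.of("attempts", "3")) resolves to 3, and the annotation's 5 is used only when the mesh supplies no value.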

Emily

sst...@redhat.com

Jun 19, 2017, 12:21:38 PM
to MicroProfile, sst...@redhat.com
I would agree that the service mesh infrastructure should be able to override most of the behavior specified by developers at the programming language level. What is unclear is what the canonical configuration description that integrates both service mesh and bean level behavior specifications would be. For example, one issue is that the Istio configuration example uses the FQDN of the service as well as a label "version: v1". Neither of these is currently a concept in the MicroProfile world.

It seems to me that we will have to have a service mesh abstraction specification that will define the concepts container runtimes need to support to be able to integrate into a service mesh like Istio.

Emily Jiang

Jun 19, 2017, 6:28:19 PM
to MicroProfile, sst...@redhat.com
Hi Scott,

I noticed the v1 concept in Istio. My understanding is that we don't have the corresponding capability in MicroProfile. I don't quite understand your concern about having a service mesh abstraction. Shouldn't that be handled by the infrastructure, without interference from the MP programming model?

Emily

sst...@redhat.com

Jun 21, 2017, 3:10:38 PM
to MicroProfile, sst...@redhat.com
The concern is that mapping to/from the service mesh configuration down to a bean instance in a runtime container does not line up well, as far as I can see, because the namespaces come from different perspectives and levels. We would need a notion of a mapping from the bean instance name to its service name in the service mesh in order to be able to map between the configurations. Once you have that, you can attach attributes to it, which is the next level of being able to filter on a service.

In terms of CDI semantics, we essentially need a new scope @ServiceMeshScoped that is above @ApplicationScoped. I would expect @ServiceMeshScoped to be where the FQDN portion of the service's name would be defined. When running in an environment with an Istio type of proxy, this information has to be imported by the container from its runtime environment.

In the case where a devop is composing a workflow across N MicroProfile compatible runtimes, it could be a logical configuration.

A related element is some service catalog notion. It is one thing if every microservice is a standalone functional element that is only invoked, but it is another level of complexity when one microservice relies on others. The problem is how to determine the location of the services, as needed when creating a WebTarget endpoint for example.

Kevin Sutter

Jun 23, 2017, 9:46:18 AM
to MicroProfile, sst...@redhat.com
Scott,
I like the idea of the @ServiceMeshScoped scope addition to CDI.  This could help with surfacing the application's intent to the service mesh.

(Sorry I'm coming late to the party here, but I didn't realize that the Istio integration discussion was happening in this thread...)

Thanks, Kevin

Emily Jiang

Jul 4, 2017, 7:27:17 AM
to MicroProfile, sst...@redhat.com
I think the notion of any new service mesh scope is orthogonal to FT integration with an external service mesh FT config. CDI scopes might be useful for a CDI context spanning multiple inbound Envoy-proxied requests. But what about a simple outbound scenario where an FT-annotated CDI method makes a service request that is proxied by Istio with its own FT configuration? Istio provides Timeout, Retry and CircuitBreaker.
CircuitBreaker is not an issue as it controls its own circuit.
The only issues are Timeout and the max retries.
I think we have 3 options.
  1. When an MP FT app is deployed in Istio, don't use Istio's fault handling.
  2. When an MP FT app is deployed in Istio, MP FT can be switched off except for fallback. Basically, in the FT spec we state that the implementer must provide a way to switch FT off. This is feasible as the implementation is done by a CDI interceptor.
  3. Istio provides two special HTTP headers, “x-envoy-upstream-rq-timeout-ms” and “x-envoy-max-retries”. If the runtime (e.g. Liberty, WildFly Swarm) detects that MP FT is present, it puts these HTTP headers on the request with a value of 0. In this way, it can turn off Istio's timeout and max retries while keeping the rest of Istio's FT capabilities.

The MP fault tolerance config, for each of these 3 cases, could be specified as config properties via the Config API (there is a separate thread on this pattern). #1 and #2 are simply achieved by setting zero retries in either the Istio or the application config and essentially represent no integration.
#3 is the most interesting because it allows Istio FT configuration to be provided for a proxied service but overridden by an application-provided configuration using the existing Istio headers mentioned. In this case we need to determine the simplest way of adding the 2 headers to the application's outbound HTTP call to the proxied service. For example, we could provide a helper method that returns a javax.ws.rs.core.MultivaluedMap containing both these headers with a value of 0 for use with the JAX-RS WebTarget.
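
A possible shape for such a helper is sketched below. To keep it free of JAX-RS dependencies it returns a plain Map<String, List<Object>>, which is the shape MultivaluedMap<String, Object> extends; a real helper would presumably return a MultivaluedHashMap instead. The class and method names are hypothetical, and the effect of the value 0 on Envoy is taken from the description above rather than verified here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EnvoyOverrideHeaders {
    // Header names as given in option 3 above.
    static final String TIMEOUT_MS = "x-envoy-upstream-rq-timeout-ms";
    static final String MAX_RETRIES = "x-envoy-max-retries";

    private EnvoyOverrideHeaders() {}

    // Builds the two headers, each with the single value "0", so that the
    // application-level FT policies stay in charge of timeout and retries.
    static Map<String, List<Object>> disableMeshRetryAndTimeout() {
        Map<String, List<Object>> headers = new LinkedHashMap<>();
        headers.put(TIMEOUT_MS, List.<Object>of("0"));
        headers.put(MAX_RETRIES, List.<Object>of("0"));
        return headers;
    }
}
```

With JAX-RS, the equivalent MultivaluedMap could be passed to Invocation.Builder.headers(...) on the request built from the WebTarget; note that this call replaces any headers already set on the builder, so adding the two headers individually with header(name, value) may be the safer choice.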

Emily

Emily Jiang

Jul 7, 2017, 6:06:59 AM
to MicroProfile, sst...@redhat.com
Following up on this thread, I have raised two issues in the Fault Tolerance repo.

  1. Istio integration - switch off FT policy except retry
  2. Istio Integration - overwrite Istio's retry and timeout


We are going to prototype the solutions under the above two issues. If you have any thoughts, please comment on the corresponding issues.

Emily