[grpc-java] ClientInterceptor Order

711 views
Skip to first unread message

Philippe Laflamme

unread,
Feb 16, 2021, 9:38:31 AM2/16/21
to grpc.io
Hi,

I've noticed a behaviour recently with ClientInterceptor that was a bit surprising and I'm wondering if the documentation needs to be adjusted or whether some other adjustment needs to happen.

The issue is around the ordering of calls when multiple interceptors are present. If one uses ClientInterceptors.intercept(a,b); the documentations states that b will be invoked first.

This is true for calls to the ClientCall methods, but is not for calls to ClientCalls.Listener. The calls to the ClientCall.Listener methods are in the reverse order as the calls to ClientCall methods. So if both a and b above "intercept" the ClientCall.Listener, then the ordering is something like this:

// startCall: b called first
a.startCall(b.startCall())

// onMessage: a called first
b.onMessage(a.onMessage())

This makes it impossible for an interceptor to do something like:

* setup some ThreadLocal state
* invoke the next interceptor's method (e.g.: start or onMessage)
* teardown ThreadLocal state

This is a common pattern for setting up tracing or logging context or integrating with other frameworks that use ThreadLocal.

I'm assuming this is the expected behaviour, but it is rather surprising when writing a "chain" of interceptors. Especially since the ServerInterceptor don't behave this way. Perhaps the documentation should be adjusted to reflect this more explicitly?

Thanks

Chengyuan Zhang

unread,
Feb 17, 2021, 5:22:27 PM2/17/21
to grpc.io
I think existing documentation is appropriate as it is documented from the caller of interceptCall's perspective. The order of interceptors being invoked follows with the network direction of the call. One interceptor instance wraps the call and its corresponding listener. When you do ClientInterceptors.intercept(A, B), interceptor B is closer to you and interceptor A is closer to the network. So the workflow of the call is

   Application -> Interceptor B's decorating code for call -> Interceptor A's decorating code for call -> ClientCall (gRPC's code) -> Network

   Network -> ClientCall.Listener (gRPC's code) -> Interceptor A's decorating code for call listener -> interceptor B's decorating code for call listener -> Application

You feel the server side is the opposite because you are viewing it from client application's perspective. If you think about it from server application's perspective, it flows in the same way. Another thing that could make you feel server interceptor operates oppositely is that server calls are started by the remote peer.


When intercepting a call, you should save the state in the interceptor, so that both the call and call listener at the same wrapping layer share the same view of the state. You can see how we use interceptors for recording OpenCensus metrics: https://github.com/grpc/grpc-java/blob/e73f31a561aa5849dc1d7746f916b79134d0973a/census/src/main/java/io/grpc/census/CensusStatsModule.java#L683-L708

Eric Anderson

unread,
Feb 17, 2021, 5:48:37 PM2/17/21
to Philippe Laflamme, grpc.io
On Tue, Feb 16, 2021 at 6:38 AM Philippe Laflamme <phil...@hopper.com> wrote:

This is true for calls to the ClientCall methods, but is not for calls to ClientCalls.Listener. The calls to the ClientCall.Listener methods are in the reverse order as the calls to ClientCall methods. So if both a and b above "intercept" the ClientCall.Listener, then the ordering is something like this:

// startCall: b called first
a.startCall(b.startCall())

// onMessage: a called first
b.onMessage(a.onMessage())

This is 100% the behavior you want; anything else leads to madness. Interceptors inject themselves to the boundary because the application and the gRPC library and allow you to do "whatever you want" with the API. This produces a layering, and to preserve the layering "who calls who" must be the exact opposite for callbacks as normal method calls.

It may be easier to think about it not as which is called "first" but by its placement; think about whether an interceptor is nearer the application or the network.

This makes it impossible for an interceptor to do something like:

* setup some ThreadLocal state
* invoke the next interceptor's method (e.g.: start or onMessage)
* teardown ThreadLocal state

How so? Each method call may be on a different thread, so you have to setup and teardown the ThreadLocal every method:

public void onMessage(ReqT req) {
  setupThreadLocal();
  try {
    super.onMessage(req);
  } finally {
    teardownThreadLocal();
  }
}

Don't create and throw exceptions inside interceptors, but if an exception happens it is okay to propagate it. This sort of code probably doesn't want the ThreadLocal to "leak," thus the try-finally.

This is a common pattern for setting up tracing or logging context or integrating with other frameworks that use ThreadLocal.

We actually do this ourselves for the io.grpc.Context which is stored in a ThreadLocal.

Philippe Laflamme

unread,
Feb 17, 2021, 9:17:38 PM2/17/21
to Eric Anderson, grpc.io
This is 100% the behavior you want; anything else leads to madness. Interceptors inject themselves to the boundary because the application and the gRPC library and allow you to do "whatever you want" with the API. This produces a layering, and to preserve the layering "who calls who" must be the exact opposite for callbacks as normal method calls.

I believe this can make sense in some cases, but I'm not sure that's the case in general. I believe there are cases where you want the interceptor to be at the same "logical" place on both paths. For example, a tracing interceptor wants to be first on the outbound path so create a new span and wants to be first on the callback path to make it the "active span" again. As far as I can tell, it is not possible to have this setup with the current API.

Furthermore, the ServerInterceptor doesn't behave like this, i.e.: the calls to ServerCall.Listener are in the same order as the calls to interceptCall. So on that point, perhaps some small documentation updates would help people figure this out.
 
This makes it impossible for an interceptor to do something like:

* setup some ThreadLocal state
* invoke the next interceptor's method (e.g.: start or onMessage)
* teardown ThreadLocal state

How so? Each method call may be on a different thread, so you have to setup and teardown the ThreadLocal every method:

Hmm, right, my example was incomplete. What I meant to say was that the same interceptor cannot do this kind of setup for both the outbound and callback paths.

Again, taking the tracing interceptor as an example, I believe you would want something like this:

ClientCall interceptCall(...) {
  Span span = newSpan();
  try(activateSpan(span)) {
    return new SimpleForwardingClientCall(next.newCall(...)) {
      void start(...) {
        try(activateSpan(span)) {
          super.start(
                     new SimpleForwardingClientCallListener(...) {
             void onMessage(...) {
               try(activateSpan(span)) { super.onMessage(...) }
             }
          }
        }
      }
    }
  }
}

I believe you want the tracing to be setup before all other interceptors on both paths. For example, so that log statements related to that request are attached to the same span.

In fact, both the OpenTracing and OpenTelemetry interceptors do exactly that. The OpenTracing's documentation[1] also mentions that it should be able to intercept other interceptors, which is not the case if you follow their suggested setup. I cannot find documentation for the OpenTelemetry interceptor but it seems to be making a similar assumption...


Eric Anderson

unread,
Feb 18, 2021, 1:41:06 PM2/18/21
to Philippe Laflamme, grpc.io
On Wed, Feb 17, 2021 at 6:17 PM Philippe Laflamme <phil...@hopper.com> wrote:
This is 100% the behavior you want; anything else leads to madness. Interceptors inject themselves to the boundary because the application and the gRPC library and allow you to do "whatever you want" with the API. This produces a layering, and to preserve the layering "who calls who" must be the exact opposite for callbacks as normal method calls.

I believe this can make sense in some cases, but I'm not sure that's the case in general. I believe there are cases where you want the interceptor to be at the same "logical" place on both paths. For example, a tracing interceptor wants to be first on the outbound path so create a new span and wants to be first on the callback path to make it the "active span" again. As far as I can tell, it is not possible to have this setup with the current API.

No, it really doesn't. What if there is a retrying interceptor? How about fault injection? How about binary logging? From our own experience of creating a tracing interceptor, I will strongly assert that it should not be split as you suggest. Initially it was unclear, but as time has gone on it has only been more obvious that it needs to be a single point. Otherwise most other features have to have custom tracing support to make everything cooperate.

I completely agree there are competing requirements for where best to place it. But if you do split it as you suggest, it makes the data fuzzy and inconsistent.

If you need low-level events when something happens, we do have an API for that. But it is a "read-only" API. It is primarily for stats-gathering. It is run on arbitrary threads at important times, so it must execute quickly. The Interceptor+StreamTracer model uses two components that each have their own consistent view of what occurs, and discrepancies are dealt with when the results are combined. Note that a single RPC may have multiple streams created for it.

Furthermore, the ServerInterceptor doesn't behave like this, i.e.: the calls to ServerCall.Listener are in the same order as the calls to interceptCall. So on that point, perhaps some small documentation updates would help people figure this out.

That is factually incorrect. Rerun whatever test made you come to that decision.

Hmm, right, my example was incomplete. What I meant to say was that the same interceptor cannot do this kind of setup for both the outbound and callback paths.

It can, by intercepting both the Call+Listener. But generally it is nonsensical to do so, since in one case it would call the library and in the other the application.

Again, taking the tracing interceptor as an example, I believe you would want something like this:

             void onMessage(...) {
               try(activateSpan(span)) { super.onMessage(...) }
             }

It took us a while to realize this the hard way, but you don't actually want to use the span for the callbacks. That can produce an infinite cascading of spans, as completing some work can cause other work to begin. You may want to propagate the initial span though, the span that was active before being replaced with `span`.

The easiest way to realize "this isn't what you want" is to consider a simpler world where we could use blocking APIs everywhere:

doRpc1();
doRpc2();

In that situation the two RPCs are related only by whatever span the code was called with. Now consider this non-blocking, callback-based approach:

doRpc1().whenComplete(::doRpc2);

You don't actually want doRpc2 within rpc1's span. The "best" span to use is the one active when whenComplete() is called, which here would be the same span for rpc1 and rpc2. Depending on the specific APIs in play that may be difficult to plumb, so sometimes you may have to accept the active span when doRpc1() was called.

Maybe it still isn't obvious. Consider how ridiculous the results would be if this were made async but all the spans chained:

for (int i = 0; i < 100; i++)
  doRpc();

Ignore the fact that you could do those RPCs in parallel. There's lots of cases where you need results from one RPC before starting another RPC and that still shouldn't impact the spans.

I believe you want the tracing to be setup before all other interceptors on both paths. For example, so that log statements related to that request are attached to the same span.

I think different people may come to different decisions depending on what is important to them. I honestly don't know what log statements an interceptor would do such that I'd care about the span, with the exception of a "log every RPC" interceptor.

Eric Anderson

unread,
Feb 19, 2021, 6:04:30 PM2/19/21
to Philippe Laflamme, grpc-io
+cc:grcp-io

On Fri, Feb 19, 2021 at 10:18 AM Philippe Laflamme <phil...@hopper.com> wrote:
No, it really doesn't. What if there is a retrying interceptor? How about fault injection? How about binary logging? From our own experience of creating a tracing interceptor, I will strongly assert that it should not be split as you suggest. Initially it was unclear, but as time has gone on it has only been more obvious that it needs to be a single point. Otherwise most other features have to have custom tracing support to make everything cooperate.

Ok, this is interesting and it would be great if the experience of creating that tracing interceptor was shared since it is pretty clear from my own assumptions and looking at existing tracing libraries that they're all doing it "wrong". Is there an existing write-up of this somewhere? I'd be happy to contribute something in https://grpc.io/docs/guides/ if such a thing exists. 

I don't know of any libraries doing this wrong. No, there's no write-up. I honestly don't know what would be put in a general-purpose document. I do think there could be something like "this is how to integrate gRPC with MDC". But really, it seems best to just have a repo with some code somewhere with utilities instead of each user reinventing the wheel.

Furthermore, the ServerInterceptor doesn't behave like this, i.e.: the calls to ServerCall.Listener are in the same order as the calls to interceptCall. So on that point, perhaps some small documentation updates would help people figure this out.

That is factually incorrect. Rerun whatever test made you come to that decision.

I've written this test: https://github.com/plaflamme/grpc-java/pull/1 to replicate what I was seeing. The test checks that the order is the same and according to the CI, the test passes. I'm certainly missing something, so I'd appreciate a quick look if you can.

Oh, I see. You're comparing Listener to interceptCall. Yes, the behavior you see is correct; my statement was wrong.

ServerCall and ServerCall.Listener are reversed from each other just like ClientCall and ClientCall.Listener. But on client side the interceptor aligns with ClientCall where on server side it aligns with ServerCall.Listener; in both cases it is the thing returned by the interceptor.

I will note that there is no magic nor infrastructure with interceptors; they are just convenience functions for making Channels and ServerCallHandlers/ServerMethodDefinitions. When you make an interceptor you are just wrapping the gRPC API with a part of your own implementation. Maybe the documentation could make that more clear. Maybe we should also downplay which runs "first" and emphasize placement further or closer to network/application.

It can, by intercepting both the Call+Listener. But generally it is nonsensical to do so, since in one case it would call the library and in the other the application.

Yes, I agree with that. But my particular goal is to instrument the gRPC interceptors themselves. The application is "shielded" from this because it has its own "request-local" state that it will restore when the callback reaches it.

For context, my particular goal is to make sure that some logging interceptor has visibility into the current span so it can add the span's ID to the log statements for correlation purposes. In practice, we do this through the "MDC" logging feature, but it's conceptually the same. So it seems important that this "tracing interceptor" be "before" the logging one in both directions.

I see. MDC. I do think you should split the problem into two different parts: MDC-propagation and Span-creation. Those should be separate components and be considered separately. And they have to be considered separately for client-side and server-side.

For client-side, I think you could have an MDC-propagation interceptor that sits very close to the network. Any interceptor that modifies MDC would modify the downward calls (Interceptor, ClientCall) and restore the parent value on the upward calls (Listener) (as discussed elsewhere).

For server-side, I think you set the Span on the downward calls (Interceptor, Listener)... and you could restore the original value on the upward calls (ServerCall), but it seems nothing may care. It seems there wouldn't be any MDC propagation here.

Does that sound like it would work?

 

Again, taking the tracing interceptor as an example, I believe you would want something like this:

             void onMessage(...) {
               try(activateSpan(span)) { super.onMessage(...) }
             }

It took us a while to realize this the hard way, but you don't actually want to use the span for the callbacks. That can produce an infinite cascading of spans, as completing some work can cause other work to begin. You may want to propagate the initial span though, the span that was active before being replaced with `span`.

Agreed. This is the case in our application stack: it will correctly re-activate the initial / parent span "on top", so the "callback span" isn't really problematic. But it does mean that the child span covers a part of the application that it shouldn't, so I definitely see your point here. We probably need to adjust this so that the child span is not just hidden, but actually "closed". Thanks for this!

I think different people may come to different decisions depending on what is important to them. I honestly don't know what log statements an interceptor would do such that I'd care about the span, with the exception of a "log every RPC" interceptor.

In our case, we want to log the status code and description when the status is !OK. Furthermore, we need this statement to be attached to the request's span which was setup at the very beginning of the call (so that the span covers the whole request). We don't log anything on the outbound path, so I suppose we could more strategically place the tracing interceptor, but that seems pretty brittle. Any thoughts on a recommended approach for such a setup?

Okay, so it is a "log the RPC" interceptor. I don't know if this is on client-side or server-side. I think splitting the MDC propagation from the span creation addresses this on client-side. Server-side is harder because there's nothing that would guarantee the "correct" span to be active in the MDC when the server responds; the interceptor may need to save the span in the request path to use it in the response path.

Philippe Laflamme

unread,
Feb 23, 2021, 9:04:01 AM2/23/21
to Eric Anderson, grpc-io

I don't know of any libraries doing this wrong.

 
No, there's no write-up. I honestly don't know what would be put in a general-purpose document. I do think there could be something like "this is how to integrate gRPC with MDC". But really, it seems best to just have a repo with some code somewhere with utilities instead of each user reinventing the wheel.

Well my first suggestion was to improve the Javadoc to clarify the placement / ordering of the client interceptor (i.e.: the fact that a client interceptor's method will be ). This, I believe, would also qualify as a "general-purpose" document, see below.
 

 Maybe the documentation could make that more clear. Maybe we should also downplay which runs "first" and emphasize placement further or closer to network/application.

Right, I think this might be what a "general-purpose" document might cover. Thinking about interceptors as a stack is probably the best way, e.g.: interceptors are (conceptually) pushed onto a stack on the outbound path and then popped on the inbound path. With experience, it's evident that this is desirable, but it's also not necessarily intuitive for newcomers. Furthermore, interceptors are typically part of "framework-type" code so they're usually pretty important to "get right" and a unit test of a single interceptor will not make this behaviour evident; so documentation would go a long way IMO.
 

It can, by intercepting both the Call+Listener. But generally it is nonsensical to do so, since in one case it would call the library and in the other the application.

Yes, I agree with that. But my particular goal is to instrument the gRPC interceptors themselves. The application is "shielded" from this because it has its own "request-local" state that it will restore when the callback reaches it.

For context, my particular goal is to make sure that some logging interceptor has visibility into the current span so it can add the span's ID to the log statements for correlation purposes. In practice, we do this through the "MDC" logging feature, but it's conceptually the same. So it seems important that this "tracing interceptor" be "before" the logging one in both directions.

I see. MDC. I do think you should split the problem into two different parts: MDC-propagation and Span-creation. Those should be separate components and be considered separately. And they have to be considered separately for client-side and server-side. 

For client-side, I think you could have an MDC-propagation interceptor that sits very close to the network. Any interceptor that modifies MDC would modify the downward calls (Interceptor, ClientCall) and restore the parent value on the upward calls (Listener) (as discussed elsewhere).

Right, this is our current state for client interception (in interceptForward order): tracing | MDC | logging (each one is a separate ClientInterceptor).

The issue is that the logging interceptor is first on the inbound path, so it doesn't have its MDC setup. So we ended up having to setup another interceptor to re-initialize the MDC on the inbound path, e.g.: tracing | MDC | logging | MDC'

It does indeed make sense to have an interceptor as close to the network as possible, but it also does not seem sufficient to have a single interceptor if we also expect the logging interceptor to also do things on the outbound path.
 
For server-side, I think you set the Span on the downward calls (Interceptor, Listener)... and you could restore the original value on the upward calls (ServerCall), but it seems nothing may care. It seems there wouldn't be any MDC propagation here.

Right, similarly, our setup looks like this (in interceptForward order): tracing | MDC | logging, but that ended up working fine since we don't currently do anything with ServerCall per-say. But indeed, it seems like we'd have the same problem here.
 

Does that sound like it would work?

It certainly clarifies things. In our case, I believe some interceptor concerns do need to be split in 2 pieces that coordinate with one-another. This is obviously a design decision of ours and isn't directly applicable to other users...
 
Okay, so it is a "log the RPC" interceptor. I don't know if this is on client-side or server-side. I think splitting the MDC propagation from the span creation addresses this on client-side. Server-side is harder because there's nothing that would guarantee the "correct" span to be active in the MDC when the server responds; the interceptor may need to save the span in the request path to use it in the response path. 

Yes, thanks for the help, this did in fact clarify things enough for us to address the problem.

Cheers 
Reply all
Reply to author
Forward
0 new messages