Ping (grpc-java)

922 views
Skip to first unread message

Josh Humphries

unread,
May 7, 2015, 11:32:30 PM5/7/15
to grp...@googlegroups.com
HTTP/2 includes a lightweight to check connections via a ping frame.

I see both Netty and OkHttp implementations in GRPC automatically acknowledge pings. But they don't expose a way to issue them. I think this would be valuable for interesting/advanced connection management. Ideally, the ClientTransport would expose a method that would also be accessible via the ChannelImpl:

ListenableFuture<Void> ping();

I have a branch where I've started adding this. I'm curious what folks think about it. I actually have some other local changes that add even more to ChannelImpl, to expose things for additional connection management usages:

// Currently, transport is started lazily as needed from
// newCall(...). This allows the transport to be started
// eagerly to avoid connection delays.
void tryStartTransport();

// Stops active transport (if there is one) and starts a
// new one. This can be useful if the endpoint is a load
// balancer and you want to get a persistent connection to
// a different backend behind the load balancer.
void recreateTransport();

// Listeners, for knowing when a channel is connected or not.
boolean addListener(Listener);
boolean removeListener(Listener);

interface Listener {
  // transport started
  channelTransportStarted(ClientTransport transport);

  // transport is ready (e.g. connection established)
  // also requires a new method on ClientTransport.Listener
  channelTransportReady(ClientTransport transport);

  // transport is shutting down
  channelTransportShutdown(ClientTransport transport);
}

Thoughts?
----
Josh Humphries
Manager, Shared Systems  |  Platform Engineering
Atlanta, GA  |  678-400-4867

Eric Anderson

unread,
May 8, 2015, 1:15:25 PM5/8/15
to Josh Humphries, grpc-io
On Thu, May 7, 2015 at 8:32 PM, Josh Humphries <j...@squareup.com> wrote:
I see both Netty and OkHttp implementations in GRPC automatically acknowledge pings. But they don't expose a way to issue them. I think this would be valuable for interesting/advanced connection management. Ideally, the ClientTransport would expose a method that would also be accessible via the ChannelImpl:

ListenableFuture<Void> ping();

This idea sounds good, and doing it at the transport layer is appropriate.

We have been avoiding ListenableFuture as of late in APIs in order permit reducing dependency on Guava in the future. Since this is in the transport, which we plan to give much weaker API guarantees for, that can be fine though.

We may want the API to provide the actual RTT discovered from the ping.

I have a branch where I've started adding this. I'm curious what folks think about it. I actually have some other local changes that add even more to ChannelImpl, to expose things for additional connection management usages:

// Currently, transport is started lazily as needed from
// newCall(...). This allows the transport to be started
// eagerly to avoid connection delays.
void tryStartTransport();

We'd like Channel to expose higher-level APIs, not ones for low-level connection management. We know we will need user-custom load balancing, so we will already be wanting to figure out how to plumb alternative channels with advanced details or plugging in alternative algorithms. We could have these various methods you are making protected, for instance, but that still leaves problems with constructing with ChannelBuilder.



For the "normal-user" experience here, I was thinking more along the terms of an option/setting that enabled eager (instead of lazy) connecting. It seems many users who would want the transport to start connecting now would also want it to reconnect semi-immediately if it gets disconnected. Alternatively, there could be a "waitForConnected" call that implicitly starts connecting (and a timeout of 0 would start connecting, but not block).

// Stops active transport (if there is one) and starts a
// new one. This can be useful if the endpoint is a load
// balancer and you want to get a persistent connection to
// a different backend behind the load balancer.
void recreateTransport();
 
// Listeners, for knowing when a channel is connected or not.
boolean addListener(Listener);
boolean removeListener(Listener);

interface Listener {
  // transport started
  channelTransportStarted(ClientTransport transport);

  // transport is ready (e.g. connection established)
  // also requires a new method on ClientTransport.Listener
  channelTransportReady(ClientTransport transport);

  // transport is shutting down
  channelTransportShutdown(ClientTransport transport);
}

The high-level version of this sort of stuff was going to be discussed in https://github.com/grpc/grpc-java/issues/28 . Importantly, it wouldn't want it to expose transports, as we want those to stay within Channel.

We were thinking of just exposing what state we were in: idle, connecting, ready, failure. Idle means "no active transport." Connecting means "active transport but it doesn't work yet." Ready means "working active transport." And failure means "transport failed; we may wait until trying to connect again". This was being discussed in an internal doc; I'd start working on putting it on GitHub, but it is out-of-date :'(. I'll try to push things in the right direction.


If the high-level APIs won't do what you need, then creating low level Channel interaction sounds good, although I don't know what form that should look like.

Josh Humphries

unread,
May 8, 2015, 1:30:22 PM5/8/15
to Eric Anderson, grpc-io
On Fri, May 8, 2015 at 1:15 PM, Eric Anderson <ej...@google.com> wrote:
On Thu, May 7, 2015 at 8:32 PM, Josh Humphries <j...@squareup.com> wrote:
I see both Netty and OkHttp implementations in GRPC automatically acknowledge pings. But they don't expose a way to issue them. I think this would be valuable for interesting/advanced connection management. Ideally, the ClientTransport would expose a method that would also be accessible via the ChannelImpl:

ListenableFuture<Void> ping();

This idea sounds good, and doing it at the transport layer is appropriate.

We have been avoiding ListenableFuture as of late in APIs in order permit reducing dependency on Guava in the future. Since this is in the transport, which we plan to give much weaker API guarantees for, that can be fine though.


So maybe having it accept a Runnable (callback) is better than adding dep on Guava. BTW, I have most of this already implemented in a branch. Should I try to clean it up and send a pull request? Or is this work that someone else is already doing, related to the health-checking API stuff you mentioned?
 
We may want the API to provide the actual RTT discovered from the ping.

Maybe, but this is quite easy for the caller to measure. Maybe the callback could accept the duration instead of just being a Runnable.
 

I have a branch where I've started adding this. I'm curious what folks think about it. I actually have some other local changes that add even more to ChannelImpl, to expose things for additional connection management usages:

// Currently, transport is started lazily as needed from
// newCall(...). This allows the transport to be started
// eagerly to avoid connection delays.
void tryStartTransport();

We'd like Channel to expose higher-level APIs, not ones for low-level connection management. We know we will need user-custom load balancing, so we will already be wanting to figure out how to plumb alternative channels with advanced details or plugging in alternative algorithms. We could have these various methods you are making protected, for instance, but that still leaves problems with constructing with ChannelBuilder.



For the "normal-user" experience here, I was thinking more along the terms of an option/setting that enabled eager (instead of lazy) connecting. It seems many users who would want the transport to start connecting now would also want it to reconnect semi-immediately if it gets disconnected. Alternatively, there could be a "waitForConnected" call that implicitly starts connecting (and a timeout of 0 would start connecting, but not block).

Yeah, that was how I started when I was toying with this. But even an "always connected" strategy will likely need some customization -- like how long to delay between connection attempts (including support for exponential back-off), how long to optionally block for the transport to become connected/ready when trying to send a request, and how to auto-reply to the call if the channel never became available (e.g. immediate Status.UNAVAILABLE).

Trying to make it flexible and pluggable while also keeping a sane and simple API was challenging. So I pivoted with the thought that all of this could be layered on top (like via ClientInterceptor) if we just exposed a few simple low-level primitives like these.
 

// Stops active transport (if there is one) and starts a
// new one. This can be useful if the endpoint is a load
// balancer and you want to get a persistent connection to
// a different backend behind the load balancer.
void recreateTransport();
 
// Listeners, for knowing when a channel is connected or not.
boolean addListener(Listener);
boolean removeListener(Listener);

interface Listener {
  // transport started
  channelTransportStarted(ClientTransport transport);

  // transport is ready (e.g. connection established)
  // also requires a new method on ClientTransport.Listener
  channelTransportReady(ClientTransport transport);

  // transport is shutting down
  channelTransportShutdown(ClientTransport transport);
}

The high-level version of this sort of stuff was going to be discussed in https://github.com/grpc/grpc-java/issues/28 . Importantly, it wouldn't want it to expose transports, as we want those to stay within Channel.

We were thinking of just exposing what state we were in: idle, connecting, ready, failure. Idle means "no active transport." Connecting means "active transport but it doesn't work yet." Ready means "working active transport." And failure means "transport failed; we may wait until trying to connect again". This was being discussed in an internal doc; I'd start working on putting it on GitHub, but it is out-of-date :'(. I'll try to push things in the right direction.

I definitely like the idea of exposing a state and not exposing the underlying transport. But I still think the listener is valuable (just altered to emit state change events) so that things can promptly react to state changes without polling. The set of states you list sounds perfect.
 

If the high-level APIs won't do what you need, then creating low level Channel interaction sounds good, although I don't know what form that should look like.

I have a branch where I've made all of these changes, but it sounds like I might be stepping on toes or pushing in a different direction than you already had in mind. Let's continue to discuss, and maybe I can get that branch into the right shape and open a pull request?


Eric Anderson

unread,
May 8, 2015, 4:29:33 PM5/8/15
to Josh Humphries, grpc-io
On Fri, May 8, 2015 at 10:30 AM, Josh Humphries <j...@squareup.com> wrote:
On Fri, May 8, 2015 at 1:15 PM, Eric Anderson <ej...@google.com> wrote:
On Thu, May 7, 2015 at 8:32 PM, Josh Humphries <j...@squareup.com> wrote:
We have been avoiding ListenableFuture as of late in APIs in order permit reducing dependency on Guava in the future. Since this is in the transport, which we plan to give much weaker API guarantees for, that can be fine though.


So maybe having it accept a Runnable (callback) is better than adding dep on Guava.

We already have the Guava dep, and it isn't leaving soon, but multiple people have expressed dissatisfaction with the dep, so we are just giving ourselves options in the future.

A Runnable or Callback both sound fine. Since this is in the transport, we can change it as necessary, so we don't worry as much about making the API "perfect."

BTW, I have most of this already implemented in a branch. Should I try to clean it up and send a pull request?

Ping is easy, feel free to have a PR for it. Reconnection would need a small discussion about API (like we've been doing here), but should also be easy.

Anything you are able to do without exposing public methods in Channel will be much easier, as we would then be more free to change it.

I'll try to push for getting some of these docs public so we know where we are aligned and not, and then take it from there.

Or is this work that someone else is already doing, related to the health-checking API stuff you mentioned?

Nobody is currently working on this stuff in Java. Some designs have been written for grpc in general, but I'm certain they aren't public yet simply because they were written in Google Docs.

I've had thoughts about the high-level API surface in the past, but no code. I think Louis has had ideas on how the pieces can fit together.

We may want the API to provide the actual RTT discovered from the ping.

Maybe, but this is quite easy for the caller to measure. Maybe the callback could accept the duration instead of just being a Runnable.

Yeah, it is really easy to measure. I guess since someone may be doing PING just to keep the connection alive/test the connection not checking how long it takes makes sense. Meh. It was just a mild thought, and it can make sense either way.

Yeah, that was how I started when I was toying with this. But even an "always connected" strategy will likely need some customization -- like how long to delay between connection attempts (including support for exponential back-off), how long to optionally block for the transport to become connected/ready when trying to send a request, and how to auto-reply to the call if the channel never became available (e.g. immediate Status.UNAVAILABLE).

I'm not sure to what degree we were wanting to allow customization of backoff. For circumstances that you have more information (like client-side load balancing with service to mediate) we would want to be able to make use of that information, though. The current "be lazy and just try to connect each RPC if one isn't in progress" was really just a stop-gap until we Did It Right™.

Timeouts was the way we were going to expose for the clients to give up. If there isn't a timeout specified, we would keep trying forever.

Trying to make it flexible and pluggable while also keeping a sane and simple API was challenging. So I pivoted with the thought that all of this could be layered on top (like via ClientInterceptor) if we just exposed a few simple low-level primitives like these.

I completely understand. And going that way may be best. I'll see if I can get some of Louis's opinions about it (assuming he doesn't reply here).

 I definitely like the idea of exposing a state and not exposing the underlying transport. But I still think the listener is valuable (just altered to emit state change events) so that things can promptly react to state changes without polling. The set of states you list sounds perfect.

Yeah, the listener is valuable. The (currently internal) doc is opinionated about interesting details of the exact look of the listener, it's on my plate to may it public. I'd be sooo happy if I get it done today.

If the high-level APIs won't do what you need, then creating low level Channel interaction sounds good, although I don't know what form that should look like.

I have a branch where I've made all of these changes, but it sounds like I might be stepping on toes or pushing in a different direction than you already had in mind. Let's continue to discuss, and maybe I can get that branch into the right shape and open a pull request?

That sounds good. You can share the branch when you think it is useful (maybe not even make a PR). I'll make sure to try preventing the "discuss" getting drawn out.

Josh Humphries

unread,
May 8, 2015, 10:53:43 PM5/8/15
to Eric Anderson, grpc-io
Thanks for all of the input, Eric.

I figured I'd go ahead and get the easy pieces going (adding ClientTransport#ping() and a couple of extra methods on ChannelImpl).

For now I left the API using ListenableFuture. The fact that it can aggregate listeners and be cancelled makes it better than just having the API accept a Runnable and Executor (at least until it's decided to drop the Guava dep).

I added tests and thought everything would be good, but the build fails due to the "animal sniffer". I'd never heard of it, but doing some digging, I'm guessing it's barfing because I added a method to the ClientTransport interface and thus broke the public API. It's output is actually not very helpful at all at telling me the problem. It just throws an IllegalArgumentException with no message (stack trace indicates it's using asm to read the new classes).

My digging only reveals how to rebuild the signature definitions with maven, and I am no gradle guru (not that I'm a maven guru either).

Any tips on what I should do to generate new signatures and get a clean build?

Here's the branch I have:



----
Josh Humphries
Manager, Shared Systems  |  Platform Engineering
Atlanta, GA  |  678-400-4867

Eric Anderson

unread,
May 9, 2015, 10:49:16 AM5/9/15
to Josh Humphries, grpc-io

I'll have to look into it. Animal sniffer always gave clear messages when it failed in the past. We use animal sniffer to ensure we maintain Java 6 compatibility for Android.

Josh Humphries

unread,
May 9, 2015, 11:36:29 AM5/9/15
to Eric Anderson, grpc-io
Ah, that makes sense. I'm pretty sure I didn't add anything that would eliminate Java 6 compatibility since my IntelliJ project is setup for Java 6 language level and a Java 6 JDK.

So... I just re-ran "./gradlew build" on master and got the same thing. So apparently it's an issue with my environment. 

I ran "./gradlew clean" and set my JAVA_HOME to a Java 7 JDK (I tried Java 6, but there is code compiled w/ 7 [class file version 51] that is needed to build).

When I subsequently ran the build, the animal sniffer problems went away. (I now have to get past problems in compiling the plugin since it can't find the protobuf header files.)

Sorry for the false alarm!

----
Josh Humphries
Manager, Shared Systems  |  Platform Engineering
Atlanta, GA  |  678-400-4867

Eric Anderson

unread,
May 9, 2015, 11:56:56 AM5/9/15
to Josh Humphries, grpc-io

-PskipCodegen=true was added this week. Check out the updated README.md that describes it.

Eric Anderson

unread,
May 9, 2015, 12:02:20 PM5/9/15
to Josh Humphries, grpc-io
And you will probably experience test failures on Java 7. Java 7 doesn't have any ciphers that are required by HTTP/2, and blah-blah-poor-documentation-blah.

Eric Anderson

unread,
May 11, 2015, 2:04:09 PM5/11/15
to Josh Humphries, grpc-io
On Fri, May 8, 2015 at 1:29 PM, Eric Anderson <ej...@google.com> wrote:
The (currently internal) doc is opinionated about interesting details of the exact look of the listener, it's on my plate to may it public. I'd be sooo happy if I get it done today.

The docs I had in mind are now public, but not yet committed:

There are likely other docs that would be helpful. I'm trying to find such docs and see what I can do with them.

Eric Anderson

unread,
May 11, 2015, 2:07:08 PM5/11/15
to Josh Humphries, grpc-io
On Mon, May 11, 2015 at 11:03 AM, Eric Anderson <ej...@google.com> wrote:
The docs I had in mind are now public, but not yet committed:

And do note: I know that the connectivity doc is slightly out-of-date. As a team we decided to add an IDLE state, but it was not added to the doc. There are also some changes that need to be made for clarity. But it is better than nothing :-)
Reply all
Reply to author
Forward
0 new messages