How can you tell if a connection is broken? Unless you receive a packet saying the host isn't reachable, it's possible the remote endpoint is just taking a long time. It can't be distinguished from radio silence.
The deadline mechanism isn't really for connection level usage, it's for RPC level usage.
If you can listen for OS updates on Android, why not just kill the RPCs yourself when you get notified?
And, even if you did do this, how can you tell if the connection is failed? For example, if the OS tells you the antenna is turned off, it may be temporary and could turn on again with neither endpoint being the wiser. The connection is still active.
Carl, thanks for your input.

In client-server development, especially for mobile, I can control certain things better than others. I have control over the server and can make it very reliable, so I can mostly rule it out as a point of failure. By contrast, the network is chaotic. In my case, communication goes over the cellular network, and I have to design for substantial connectivity problems because the app is generally used in remote locations. So the blame for communication problems usually lies with the client-side network.

> How can you tell if a connection is broken? Unless you receive a packet saying the host isn't reachable, it's possible the remote endpoint is just taking a long time. It can't be distinguished from radio silence.

That's the crux of the problem: when a Deadline expires, I usually can't tell in a timely manner whether the connection is broken.
So I have to use good heuristics (informed guesses) in a way that minimizes the impact on app users. If I take an optimistic stance and reissue calls on an existing yet broken connection, I will waste time, because all I may get at the application level is silence and another expired Deadline. If instead I take a pessimistic stance and assume that the connection is broken (whether or not that actually is the case), I can try a new connection (if it appears that I am online) and minimize the downtime. Doing one retry on the old connection can be a good idea, but if it fails I am still facing the same problem.

The real strategy may be a little more complex, but in the end I still need the ability to request a reconnect when my strategy dictates it (and when the Channel has no indication of recent data received). The current Channel design is overbearing when connections are not reliable.

> The deadline mechanism isn't really for connection level usage, it's for RPC level usage.

Agreed. From the application-level perspective, however, an expired Deadline can simultaneously be an indicator of a broken or unavailable connection. To take action, it is important to be able to signal this problem to the connection/channel level right away, without having to wait for a keepalive ping and the lack of a response at its regularly scheduled interval. That kind of wait may be fine on the server side, but it's not feasible for my interactive, user-facing app. And expired Deadlines can cancel out KeepAlive pings because no calls remain active, which exacerbates the problem.

I mean, I can work around the problem on the client side, but for my use case this means potentially issuing very frequent pings and doing so regardless of the existence of open calls. It's not a great design.
I wouldn't have to do either of these things if I could signal to the Channel to reconnect when the Deadline expires.

> If you can listen for OS updates on Android, why not just kill the RPCs yourself when you get notified?

Good idea: more proactive OS listening and killing RPCs could be useful. My question in this regard was more about how best to kill all existing RPCs in grpc-java while minimizing races, and about the general feasibility of hiding the messiness of Channel recreation by using Subchannels.
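
To make the "kill all existing RPCs while minimizing races" part concrete, here is a rough sketch of one possible approach in plain Java. `CallRegistry`, `register`, and `cancelAll` are hypothetical names of my own, not grpc-java API; each handle is assumed to wrap whatever cancels a single call (for example a lambda around `ClientCall.cancel`). The idea is to snapshot and clear the set under a lock, then cancel outside the lock, so calls started after the snapshot are left alone.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of grpc-java): tracks cancel handles of in-flight calls. */
final class CallRegistry {
    private final Set<Runnable> cancelHandles = new HashSet<>();

    /**
     * Registers a cancel handle for a newly started call.
     * Returns a token the caller runs once the call has completed on its own.
     */
    synchronized Runnable register(Runnable cancel) {
        cancelHandles.add(cancel);
        return () -> unregister(cancel);
    }

    private synchronized void unregister(Runnable cancel) {
        cancelHandles.remove(cancel);
    }

    /**
     * Cancels every call registered so far and returns how many were cancelled.
     * The set is copied and cleared under the lock; the handles run outside it,
     * so a handle may safely call back into the registry.
     */
    int cancelAll() {
        List<Runnable> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(cancelHandles);
            cancelHandles.clear();
        }
        for (Runnable cancel : snapshot) {
            cancel.run();
        }
        return snapshot.size();
    }
}
```

In this sketch an OS connectivity callback would simply call `cancelAll()`. Calls registered after the snapshot belong to the "new generation" and are untouched, which is the property I am after; whether this maps cleanly onto Subchannels is still an open question for me.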
> And, even if you did do this, how can you tell if the connection is failed? For example, if the OS tells you the antenna is turned off, it may be temporary and could turn on again with neither endpoint being the wiser. The connection is still active.

Very true, there is no telling if the connection broke.

So in summary, I think the ability to signal to the Channel to reconnect is still needed for the Android use case. I recall that one of the official gRPC design documents (perhaps the connection backoff document) suggests simply continuing to use the old connection if it recovers before the new connection is ready; the same could be true here.

There are already enhancement requests for RPC retries and reconnection backoff improvements, which I also sorely need.

Should I create an issue about the undocumented problem of expiring Deadlines interfering with KeepAlives? Maybe even just to add something to the Javadoc? While logical, I found this quite surprising.
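
For reference, the heuristic I described earlier in this reply (one retry on the old connection, then a reconnect request if nothing has been heard recently) could be sketched roughly as below. This is only an illustration in plain Java: `DeadlinePolicy`, its `Action` values, and the "quiet threshold" are names and assumptions of my own, not grpc-java API, and how a reconnect request would actually be delivered to the Channel is exactly the missing piece.

```java
import java.time.Duration;

/** Hypothetical decision helper (not part of grpc-java). */
final class DeadlinePolicy {
    enum Action { RETRY_ON_EXISTING, REQUEST_RECONNECT, WAIT_FOR_NETWORK }

    // Assumption: data seen more recently than this means the connection is probably alive.
    private final Duration quietThreshold;
    private int consecutiveExpiries = 0;

    DeadlinePolicy(Duration quietThreshold) {
        this.quietThreshold = quietThreshold;
    }

    /** Call whenever any data arrives on the connection. */
    synchronized void onDataReceived() {
        consecutiveExpiries = 0;
    }

    /**
     * Decides what to do after a Deadline expired.
     *
     * @param deviceOnline  whether the OS currently reports connectivity
     * @param sinceLastData time elapsed since data was last received
     */
    synchronized Action onDeadlineExpired(boolean deviceOnline, Duration sinceLastData) {
        consecutiveExpiries++;
        if (!deviceOnline) {
            return Action.WAIT_FOR_NETWORK; // a new connection cannot succeed either
        }
        boolean heardRecently = sinceLastData.compareTo(quietThreshold) < 0;
        if (heardRecently || consecutiveExpiries == 1) {
            return Action.RETRY_ON_EXISTING; // optimistic: one retry on the old connection
        }
        return Action.REQUEST_RECONNECT; // pessimistic: assume the connection is broken
    }
}
```

With a 10 second threshold, for example, the first expiry after 30 seconds of silence yields `RETRY_ON_EXISTING` and the second yields `REQUEST_RECONNECT`.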